DEEM: Workshop on Data Management for End-to-End Machine Learning @ ACM SIGMOD 2026

About

DEEM brings together researchers and practitioners at the intersection of applied machine learning, data management, and systems research to discuss the data management issues that arise in ML application scenarios. The workshop will be held in person in Bengaluru on Friday, June 5th, in conjunction with SIGMOD/PODS 2026.

The workshop solicits regular research papers (8 pages plus unlimited references) describing preliminary or completed research results, as well as short papers (up to 4 pages) such as reports on applications and tools, preliminary results, interesting use cases, problems, datasets, benchmarks, visionary ideas, and descriptions of system components and tools related to end-to-end ML pipelines. Submissions should follow the same guidelines as SIGMOD submissions, i.e., use the sigconf template for the ACM proceedings format.

Follow us on Twitter @deem_workshop or Bluesky @deem-workshop.bsky.social, or contact the organizers via email. We also provide archived websites of previous editions of the workshop: DEEM 2017, DEEM 2018, DEEM 2019, DEEM 2020, DEEM 2021, DEEM 2022, DEEM 2023, DEEM 2024, and DEEM 2025.

Schedule

June 5th, 2026 (all times are in Bangalore Time / IST)

09:00 - 10:30

Session 1

09:00

Workshop Intro

09:10

Keynote: From Sight to Insight: Visual Memory for Smarter Assistants
Xin Luna Dong (Meta)

Imagine a personal assistant that, with user permission, persistently remembers moments from daily life—answering questions like “When and where did I see this lady?” or offering personalized suggestions like “You might enjoy The Little Prince—it relates to the statue you liked in Lyon.” Realizing this vision requires overcoming major challenges: capturing visual memories under hardware constraints (e.g., memory, battery, thermal limits, bandwidth), extracting meaningful personalization signals from noisy, task-agnostic visual histories, and supporting real-time question answering and recommendations under tight latency requirements. In this talk, we present our early work toward this goal. Pensieve, our memory-based QA system, improves accuracy by 11% over state-of-the-art multimodal RAG baselines. VisualLens infers user interests from casual photos, outperforming leading recommendation systems by 5–10%. We also share initial results on efficient, event-triggered memory capture and compression. Our work points to a broad landscape of research opportunities in building richer, more context-aware personal assistants capable of learning from and reasoning over users’ visual experiences.

10:00

Break / Buffer

10:05

Paper Talk: Data Understanding for Agents using Schema Grounding
Udayan Khurana (IBM Research)

10:30 - 11:00

Coffee Break

11:00 - 12:30

Session 2

11:00

Panel: 10th year DEEM: Looking back, and looking forward
Moderation: Aditya Parameswaran (UC Berkeley)
Panelists: Luna Dong (Meta), Shreya Shankar (UC Berkeley), Sudeepa Roy (Duke University), Immanuel Trummer (Cornell University), Stefan Grafberger (Snowflake)

11:50

Paper Talk: End-to-End Auditing of ML Pipelines through Data Provenance and Model Explainability
Pasquale Leonardo Lazzaro (Università degli Studi Roma Tre), Elia Guglielmi (Università degli Studi Roma Tre), Paolo Missier (University of Birmingham), Riccardo Torlone (Università degli Studi Roma Tre)

12:10

Invited Talk: Stochastic Submodular Data Forgetting
Ramon Rico Cuevas (Utrecht University)

12:30 - 13:30

Lunch Break

13:30 - 15:00

Session 3

13:30

Keynote: Do we still need Databases in an AI World?
Carsten Binnig (TU Darmstadt & DFKI)

14:20

Invited Talk: SemBench: A Benchmark for Semantic Query Processing Engines
Immanuel Trummer (Cornell University)

14:40

Invited Talk: Cortex AISQL: A Production SQL Engine for Unstructured Data
Pawel Liskowski (Snowflake), Benjamin Han (Snowflake)

15:00 - 15:30

Coffee Break

15:30 - 17:00

Session 4