Data Curation for Multimodal AI - Detailed Analysis & Overview

This page collects talks and explainers on data curation for multimodal AI. At Ray Summit 2025, Avin Regmi and Matan Appelbaum from Netflix share architectural patterns for petabyte-scale processing. Jacob Huffman and Hao Wang from NVIDIA share how Roblox built a modern ML platform on Ray. Pablo Delgado from Netflix and Lei Xu from LanceDB discuss scaling multimodal data curation with Ray and LanceDB. Other entries cover the MINT-1T open-source multimodal dataset, the development of a standard data model for multi-modal cohort datasets (DPUK), and careful curation for medical AI, since high-quality medical data is essential for building reliable medical AI systems.

Further talks include Ethan Rosenthal on building a data foundation for multimodal foundation models, active data curation for distilling large-scale multimodal models (knowledge distillation (KD) is the de facto standard for compressing large-scale models into smaller ones), an ICPSR 101 introduction to data curation, and the part universities and the open-source community play in generative AI through DataComp.

The list closes with practical material: Ari Morcos, cofounder and CEO of Datology, on algorithmic data curation for LLMs, and Encord's platform for multimodal curation and annotation, built on the premise that generative AI thrives on diverse, well-curated data.

NVIDIA NeMo Curator: Scaling Multi-Modal Data Curation Workflows | Ray Summit 2025

At Ray Summit 2025, Avin Regmi and Matan Appelbaum from Netflix share architectural patterns for processing petabyte-scale, ...

NVIDIA’s Framework for Scalable Data Curation | Ray Summit 2025

At Ray Summit 2025, Jacob Huffman and Hao Wang from NVIDIA share how Roblox built a modern ML platform on Ray and ...

[2024 Best AI Paper] MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with

Join the Discord to discuss this paper/channel: https://discord.gg/nPUm3ThuBc
Title: MINT-1T: Scaling Open-Source ...

The Daft distributed Python data engine: multimodal data curation at any scale // Jay Chia // DE4AI

Abstract: It's 2024, but ...

Data curation for multi-modal cohort datasets: The DPUK data model | A/Professor Sarah Bauermeister

This talk describes the development of a standard ...

Scaling Multimodal Data Curation with Ray and LanceDB | Ray Summit 2025

At Ray Summit 2025, Pablo Delgado from Netflix and Lei Xu from LanceDB share how they are transforming the construction and ...

Tim Ossowski - OctoMed: Data Recipes for State of the Art Multimodal Medical Reasoning

High-quality medical data is essential for building reliable medical AI systems. In this work, we explore how careful ...

Building a Data Foundation for Multimodal Foundation Models

[2025 - Day 2 - Foundation Models] Ethan Rosenthal shares insights from building a petabyte-scale ...

2025 Scaling Multimodal Data Curation with Ray and LanceDB RaySummit 2025

Computer Vision Meetup: Active Data Curation Effectively Distills Large-Scale Multimodal Models

Knowledge distillation (KD) is the de facto standard for compressing large-scale models into smaller ones. Prior works have ...

ICPSR 101: What is Data Curation?

This handy ICPSR 101 video quickly explains the intricacies of the work our ...

How do Multimodal AI models work? Simple explanation

Multimodality is the ability of an AI ...

Data Curation Examples

This video shows examples of ...

GenAI and Datacomp: Creating the Largest Public Multimodal Dataset in Academia

In this engaging talk, we explore the crucial part universities and the open-source community contribute to the Generative AI ...

Richard Decal - Data Curation: Transforming Bytes into AI Gold

Everyone obsesses about big ...

Master Encord's Multimodal AI Data Platform: Multimodal Curation & Annotation

In this video, we'll guide you through the process of creating effective and well-...

MedAI #155: Multimodal AI for Precision Oncology: From Data Integration to CDS | Asim & Aakash

Title: From ...

Algorithmic Data Curation for LLMs | Ari Morcos

Ari Morcos is the cofounder and CEO of Datology, an automated ...

Best Data Curation Tools for Generative AI (GenAI) | Encord vs others

Generative AI thrives on diverse, well-curated data. Encord offers one of the best ...

[QA] Data curation via joint example selection further accelerates multimodal learning

Jointly selecting batches of ...