Important Dates
January 3, 2025: Submissions open (see Call for Papers)
February 3, 2025: Submission deadline
March 5, 2025: Decisions released
April 27 or 28, 2025 (TBD): MLDPR 2025 takes place at ICLR in Singapore
Submission Portal
About MLDPR
Datasets are a central pillar of machine learning (ML) research—from pretraining to evaluation and benchmarking. However, a growing body of work highlights serious issues throughout the ML data ecosystem, including the under-valuing of data work, ethical issues in datasets that go undiscovered, a lack of standardized dataset deprecation procedures, the (mis)use of datasets out-of-context, an overemphasis on single metrics rather than holistic model evaluation, and the overuse of the same few benchmark datasets. Thus, developing guidelines, goals, and standards for data practices is critical; beyond this, many researchers have pointed to a need for a more fundamental culture shift surrounding data and benchmarking in ML.
This workshop aims to facilitate a broad conversation about the impact of ML datasets on research, practice, and education, working to identify current issues, propose new techniques, and establish best practices throughout the ML dataset lifecycle. In particular, we highlight the role of data repositories in ML. Administrators of these repositories, including OpenML, HuggingFace Datasets, and the UCI ML Repository, will contribute their perspectives on how ML datasets are created, documented, and used, and will discuss the practical challenges of implementing and enforcing best practices on their platforms. By involving representatives from three major ML repositories and influential researchers from ML, law, governance, and the social sciences, we intend this workshop to serve as a catalyst for real, positive change in the ML data ecosystem.