BigDocs · 2025

BigDocs is a large-scale, open, and permissively licensed dataset for training multimodal models on document understanding and code generation tasks. Published at ICLR 2025.

Last updated on Apr 8, 2026
Multimodal AI Datasets
Authors: David Vázquez