This dataset pairs TikZ drawings with natural language captions to support the automated generation of LaTeX-based diagrams. The public version excludes some drawings for licensing reasons, but the full dataset can be recreated with the tools in the DaTikZ repository.
Use Cases
- Train a text-to-code model to generate TikZ source code from natural language captions.
- Fine-tune language models on LaTeX syntax to improve the generation of technical diagrams.
- Develop automated captioning systems that generate text descriptions from TikZ code snippets.
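The first and third use cases differ only in which side of a pair is the input. A minimal sketch of both directions follows. It assumes each record exposes the caption and the TikZ source under the keys `caption` and `code`; those key names, the prompt wording, and the helper names `to_text_to_code` and `to_code_to_text` are illustrative assumptions, not something the dataset prescribes.

```python
from typing import Dict


def to_text_to_code(record: Dict[str, str],
                    caption_key: str = "caption",
                    code_key: str = "code") -> Dict[str, str]:
    """Caption -> TikZ: the caption is the prompt, the source is the target.

    The key names are assumptions; pass different ones if the records differ.
    """
    return {
        "prompt": f"Caption: {record[caption_key].strip()}\nTikZ:\n",
        "completion": record[code_key].strip(),
    }


def to_code_to_text(record: Dict[str, str],
                    caption_key: str = "caption",
                    code_key: str = "code") -> Dict[str, str]:
    """TikZ -> caption: the same pair with the roles reversed."""
    return {
        "prompt": f"TikZ:\n{record[code_key].strip()}\nCaption: ",
        "completion": record[caption_key].strip(),
    }


# Hand-written toy record for illustration only; it is not taken from the dataset.
toy = {
    "caption": "A red circle of radius 1 centered at the origin.",
    "code": "\\begin{tikzpicture}\n  \\draw[red] (0,0) circle (1);\n\\end{tikzpicture}\n",
}

example = to_text_to_code(toy)
print(example["prompt"] + example["completion"])
```

Keeping the formatting in small pure functions makes it easy to swap the prompt template for whatever a given fine-tuning framework expects.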
Strengths
- Pairs TikZ vector graphics source code with descriptive natural language captions.
- Includes tools and methods for dataset recreation via the DaTikZ repository.
- Available through the Hugging Face datasets library under the identifier 'nllg/datikz-v3'.
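Loading by identifier might look like the sketch below. The identifier comes from the text above; the `train` split name and the `caption` and `code` column names are assumptions that should be checked against the dataset page, and `load_datikz` and `preview` are hypothetical helper names. The `datasets` package must be installed and the Hugging Face Hub must be reachable for the load itself to work.

```python
from itertools import islice
from typing import Iterable, List, Mapping, Tuple


def load_datikz(split: str = "train", streaming: bool = True):
    """Open the dataset by its Hub identifier.

    The split name is an assumption. streaming=True iterates over records
    without downloading the whole dataset first.
    """
    from datasets import load_dataset  # third-party: pip install datasets
    return load_dataset("nllg/datikz-v3", split=split, streaming=streaming)


def preview(records: Iterable[Mapping[str, str]],
            n: int = 3,
            caption_key: str = "caption",
            code_key: str = "code") -> List[Tuple[str, str]]:
    """Return (caption, first line of TikZ source) for the first n records.

    The column names are assumptions; override them if the schema differs.
    """
    pairs = []
    for record in islice(records, n):
        lines = record[code_key].strip().splitlines()
        pairs.append((record[caption_key].strip(), lines[0] if lines else ""))
    return pairs
```

With network access, `preview(load_datikz())` would return the first three caption and code-line pairs, which is a quick way to confirm the column names before building a training pipeline.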