Reference the original paper: Drossos, K., Lipping, S., & Virtanen, T. (2020). "Clotho: an Audio Captioning Dataset." Proc. IEEE ICASSP, pp. 736-740.

The dataset can be accessed through platforms like Zenodo.

Mention the diversity of the audio (natural sounds, urban environments, etc.) and the linguistic variety of the captions.

Categorized into development, validation, and evaluation sets for training and testing machine learning models.

📥 How to Download