Generative audio model that can produce highly realistic speech, music, and sound effects. Supports non-verbal sounds such as laughter and sighing.