The Higgs-tts-2-3b-base Model: A Text-to-Speech Foundation Model

higgs-tts-2-3b-base is a text-to-speech (TTS) foundation model from Boson AI (bosonai) that generates expressive, natural-sounding speech from text. It has 5.8 billion parameters in total: a 3.6B-parameter Llama-3.2-3B backbone plus a 2.2B-parameter DualFFN audio adapter, which runs at the same training and inference computational cost as the base LLM.
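How the DualFFN adapter can add parameters without adding per-token compute is easier to see in code. The sketch below assumes one common reading of a DualFFN design, and it is an assumption rather than Boson AI's published implementation: each transformer layer gets a second, audio-specific feed-forward block of the same size as the original text FFN, and every token is routed to exactly one of the two blocks by modality. Attention, which would stay shared, is not shown. The names `dual_ffn_layer` and `make_toy_ffn`, and the toy elementwise "FFNs", are hypothetical stand-ins.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
FFN = Callable[[Vector], Vector]


def make_toy_ffn(scale: float) -> FFN:
    """Hypothetical stand-in for a feed-forward block: elementwise ReLU, then scale."""
    def ffn(x: Vector) -> Vector:
        return [max(0.0, v) * scale for v in x]
    return ffn


def dual_ffn_layer(
    hidden: Sequence[Vector],
    is_audio: Sequence[bool],
    text_ffn: FFN,
    audio_ffn: FFN,
) -> Tuple[List[Vector], int]:
    """Send each token through exactly one FFN, chosen by its modality.

    Returns the new hidden states (with a residual add) and the total number
    of FFN evaluations, which always equals the number of tokens.
    """
    if len(hidden) != len(is_audio):
        raise ValueError("hidden and is_audio must have the same length")
    outputs: List[Vector] = []
    ffn_calls = 0
    for h, audio in zip(hidden, is_audio):
        ffn = audio_ffn if audio else text_ffn  # modality-based routing
        y = ffn(h)
        ffn_calls += 1
        outputs.append([a + b for a, b in zip(h, y)])  # residual connection
    return outputs, ffn_calls


if __name__ == "__main__":
    text_ffn = make_toy_ffn(1.0)
    audio_ffn = make_toy_ffn(2.0)
    hidden = [[1.0, -1.0], [0.5, 2.0], [-3.0, 4.0]]
    is_audio = [False, True, True]  # one text token, two audio tokens
    out, calls = dual_ffn_layer(hidden, is_audio, text_ffn, audio_ffn)
    print(out)    # [[2.0, -1.0], [1.5, 6.0], [-3.0, 12.0]]
    print(calls)  # 3
```

Under these assumptions, the FFN evaluation count equals the token count no matter how text and audio tokens are mixed, so the extra parameters increase model capacity without increasing per-token FFN work. That is one way to read the claim that the adapter keeps the base LLM's computational cost.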
