TalkNet-ASD is an audio-visual active speaker detection model that labels speaking faces and outputs JSON speaker tracks for real-world footage.