ProsodyLM: Uncovering the Emerging Prosody Processing Capabilities in Speech Language Models - Audio Demo

This page presents audio examples from ProsodyLM, a speech language model that models prosody explicitly. The demos below correspond to Section 4.2 of the paper and show how ProsodyLM generates prosody appropriate to the text context.

Intro animation

Two groups of methods are compared: