ProsodyLM: Uncovering the Emerging Prosody Processing Capabilities in Speech Language Models - Audio Demo
This webpage presents audio examples for ProsodyLM, a speech language model that explicitly models
prosody. The demos below correspond to Section 4.2 of the original paper and show
how ProsodyLM generates prosody that is appropriate to the text context.
ProsodyLM is compared against two groups of baselines:
- Group A baselines, which are pre-trained on audiobook data comparable to ProsodyLM's
- Group B baselines, which are state-of-the-art or commercial systems trained
on substantially larger and higher-quality data and which undergo additional tuning steps