Vol. 19, No. 8, August 31, 2025
DOI: 10.3837/tiis.2025.08.012
Abstract
Music generation has advanced rapidly, particularly with the introduction of MusicLM, a state-of-the-art system that uses a three-tiered language model to process semantic, coarse acoustic, and fine acoustic representations in sequence. Despite its impressive performance, MusicLM’s reliance on multi-stage processing makes it computationally expensive and unsuitable for real-time applications. To address this, we propose LGAS (Language-Guided Audio Synthesis), a novel framework that reduces generation time by up to 95.7% for 10-second audio and 99.6% for 30-second audio while maintaining comparable output quality. LGAS retains the highest-level semantic modeling of MusicLM and introduces a dual-path diffusion (DPD) architecture integrated with an audio VAE-GAN decoder. This design captures coarse and fine acoustic features simultaneously via cross-attention at each denoising step, embedding semantic information into the latent space. Experimental results show that LGAS outperforms prior models in generation speed, continuity, musical fidelity, and prompt relevance, setting a new benchmark for efficient text-to-music generation.
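The cross-attention conditioning mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `cross_attention`, the single-head layout, the residual connection, the random projection matrices, and all tensor sizes are assumptions chosen only for illustration. It shows the general mechanism by which latent audio frames (queries) could attend to semantic tokens (keys and values) inside one denoising step.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latent, semantic, w_q, w_k, w_v):
    """Single-head cross-attention (hypothetical helper, not from the paper).

    latent:   (n_frames, d) noisy latent audio frames, used as queries
    semantic: (n_tokens, d) semantic tokens, used as keys and values
    """
    q = latent @ w_q
    k = semantic @ w_k
    v = semantic @ w_v
    # Scaled dot-product scores between every frame and every semantic token.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    # Residual connection (an assumption): add the attended semantic context.
    return latent + weights @ v, weights

# Illustrative sizes only; none of these values come from the paper.
rng = np.random.default_rng(0)
d = 16
latent = rng.normal(size=(8, d))     # 8 latent frames
semantic = rng.normal(size=(5, d))   # 5 semantic tokens
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

out, weights = cross_attention(latent, semantic, w_q, w_k, w_v)
```

In a diffusion sampler, a step like this would typically be applied once per denoising iteration, so the semantic tokens steer the latent at every stage of refinement.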
Cite this article
[IEEE Style]
D. Zihan, M. R. Islam, N. Alam, "LGAS: Language-Guided Audio Synthesis via Dual-Path Diffusion and Semantic Modeling," KSII Transactions on Internet and Information Systems, vol. 19, no. 8, pp. 2630-2649, 2025. DOI: 10.3837/tiis.2025.08.012.
[ACM Style]
Ding Zihan, Md Rakibul Islam, and Nur Alam. 2025. LGAS: Language-Guided Audio Synthesis via Dual-Path Diffusion and Semantic Modeling. KSII Transactions on Internet and Information Systems, 19, 8, (2025), 2630-2649. DOI: 10.3837/tiis.2025.08.012.
[BibTeX Style]
@article{tiis:103081, title="LGAS: Language-Guided Audio Synthesis via Dual-Path Diffusion and Semantic Modeling", author="Ding Zihan and Md Rakibul Islam and Nur Alam", journal="KSII Transactions on Internet and Information Systems", DOI={10.3837/tiis.2025.08.012}, volume={19}, number={8}, year="2025", month={August}, pages={2630-2649}}