LGAS: Language-Guided Audio Synthesis via Dual-Path Diffusion and Semantic Modeling

Ding Zihan; Md Rakibul Islam; Nur Alam

Vol. 19, No. 8, August 31, 2025

DOI: 10.3837/tiis.2025.08.012

Abstract

Music generation has advanced rapidly in recent years, notably with the introduction of MusicLM, a state-of-the-art system that uses a three-tiered language model to process semantic, coarse acoustic, and fine acoustic representations in sequence. Despite its strong performance, MusicLM’s reliance on multi-stage processing makes it computationally expensive and unsuitable for real-time applications. To address this, we propose LGAS (Language-Guided Audio Synthesis), a framework that reduces generation time by up to 95.7% for 10-second audio and 99.6% for 30-second audio while maintaining comparable output quality. LGAS retains the highest-level semantic modeling of MusicLM and introduces a dual-path diffusion (DPD) architecture integrated with an audio VAE-GAN decoder. This design captures coarse and fine acoustic features simultaneously via cross-attention at each denoising step, embedding semantic information into the latent space. Experimental results show that LGAS outperforms prior models in generation speed, continuity, musical fidelity, and prompt relevance, setting a new benchmark for efficient text-to-music generation.
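
To put the reported time reductions in more familiar terms, a percentage reduction in generation time can be converted into a speed-up factor as 1 / (1 - reduction). The short Python sketch below does this for the two figures quoted in the abstract; the function name `speedup_factor` is ours, not the authors', and because the abstract reports "up to" values, the resulting factors (roughly 23x and 250x) should be read as upper bounds relative to whichever baseline the authors measured against.

```python
def speedup_factor(reduction_percent: float) -> float:
    """Convert a percentage reduction in run time into a speed-up factor.

    Hypothetical helper for illustration only; it is not part of LGAS.
    """
    if not 0.0 <= reduction_percent < 100.0:
        raise ValueError("reduction_percent must be in [0, 100)")
    # Fraction of the original run time that remains after the reduction.
    remaining_fraction = 1.0 - reduction_percent / 100.0
    return 1.0 / remaining_fraction


# Reductions quoted in the abstract, keyed by clip length in seconds.
reported_reductions = {10: 95.7, 30: 99.6}

for clip_seconds, reduction in reported_reductions.items():
    factor = speedup_factor(reduction)
    print(f"{clip_seconds}-second audio: {reduction}% less time, about {factor:.0f}x faster")
```

The conversion makes clear that the gap between the two figures is larger than the percentages suggest: moving from 95.7% to 99.6% corresponds to about a tenfold further increase in speed.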

Cite this article

[IEEE Style]

D. Zihan, M. R. Islam, N. Alam, "LGAS: Language-Guided Audio Synthesis via Dual-Path Diffusion and Semantic Modeling," KSII Transactions on Internet and Information Systems, vol. 19, no. 8, pp. 2630-2649, 2025. DOI: 10.3837/tiis.2025.08.012.

[ACM Style]

Ding Zihan, Md Rakibul Islam, and Nur Alam. 2025. LGAS: Language-Guided Audio Synthesis via Dual-Path Diffusion and Semantic Modeling. KSII Transactions on Internet and Information Systems, 19, 8, (2025), 2630-2649. DOI: 10.3837/tiis.2025.08.012.

[BibTeX Style]

@article{tiis:103081, title="LGAS: Language-Guided Audio Synthesis via Dual-Path Diffusion and Semantic Modeling", author="Ding Zihan and Md Rakibul Islam and Nur Alam", journal="KSII Transactions on Internet and Information Systems", DOI={10.3837/tiis.2025.08.012}, volume={19}, number={8}, year="2025", month={August}, pages={2630-2649}}
