FastSVC: Fast Cross-Domain Singing Voice Conversion with Feature-wise Linear Modulation

Songxiang Liu *, Yuewen Cao *, Na Hu, Dan Su, Helen Meng
Human-Computer Communications Laboratory, The Chinese University of Hong Kong
Tencent AI Lab

*Work done during internship at Tencent AI Lab.

Abstract

This paper presents FastSVC, a light-weight cross-domain singing voice conversion (SVC) system that achieves high conversion performance with inference 4x faster than real time on CPUs. FastSVC uses a Conformer-based phoneme recognizer to extract singer-agnostic linguistic features from singing signals. A generator based on feature-wise linear modulation synthesizes waveforms directly from the linguistic features, leveraging information from sine-excitation signals and loudness features. The waveform generator can be trained conveniently using a multi-resolution spectral loss and an adversarial loss. Experimental results show that, compared with a computationally heavy baseline system, the proposed FastSVC system achieves comparable conversion performance in some scenarios and significantly better conversion performance in others. Moreover, the proposed FastSVC system achieves desirable cross-lingual singing conversion performance. Inference with the FastSVC system is 3x and 70x faster than with the baseline system on GPUs and CPUs, respectively.
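As a rough illustration of the feature-wise linear modulation named in the abstract, the sketch below applies a per-channel scale and shift to a block of hidden features. It shows only the generic operation (output = gamma * features + beta). This page does not describe how FastSVC computes the scale and shift from the sine-excitation and loudness inputs, so the `gamma` and `beta` arrays, the function name `film`, and all shapes here are placeholder assumptions, not the authors' implementation.

```python
import numpy as np


def film(features, gamma, beta):
    """Generic feature-wise linear modulation: scale and shift each channel.

    features: array of shape (channels, time)
    gamma, beta: arrays broadcastable to features, e.g. (channels, 1)
    """
    return gamma * features + beta


# Hypothetical hidden features: 4 channels, 8 time steps.
rng = np.random.default_rng(0)
hidden = rng.standard_normal((4, 8))

# Placeholder modulation parameters. In a real conditional generator these
# would be predicted by a network from the conditioning signals; here they
# are constants chosen only for illustration.
gamma = np.full((4, 1), 2.0)
beta = np.full((4, 1), -1.0)

modulated = film(hidden, gamma, beta)

# With gamma = 1 and beta = 0 the modulation leaves the features unchanged.
identity = film(hidden, np.ones((4, 1)), np.zeros((4, 1)))
```

Because the scale and shift are applied per feature channel, the conditioning can change the behavior of a layer without altering its weights, which is the general motivation for this kind of modulation.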

Brief introduction

Compared systems

Any-to-One Cross-domain (A2O-CD) singing voice conversion

Target speech reference samples from LJ-Speech.

LJ002-0271 LJ010-0295 LJ028-0335 LJ031-0224
Source UCD-SVC FastSVC (Ours)

Any-to-Many Cross-domain (A2M-CD) singing voice conversion

Source References (VCTK) UCD-SVC FastSVC (Ours)

Any-to-Many In-domain (A2M-ID) singing voice conversion

Female source singer

Source sample from ADIZ (NUS-48E)
References (NUS-48E) UCD-SVC FastSVC (Ours)

Male source singer

Source sample from VKOW (NUS-48E)
References (NUS-48E) UCD-SVC FastSVC (Ours)

Cross-lingual (CL) singing voice conversion

Female source singer

Chinese Source sample
References (NUS-48E) UCD-SVC FastSVC (Ours)

Male source singer

Chinese Source sample
References (NUS-48E) UCD-SVC FastSVC (Ours)