Six pre-trained base models for SoVITS 4.0 voice conversion, built around 768-dimensional content vectors taken from layer 12 of the feature extractor. The models were trained on the m4singer and vctk datasets for up to 320,000 training steps, with loss values as low as 14.1.
Use Cases
- Fine-tune a singing voice conversion model using the m4singer-trained base weights
- Initialize a SoVITS 4.0 training session by renaming the 216k step checkpoint to G_0.pth and D_0.pth
- Compare voice synthesis quality between the model with a loss of 14.75 (294k steps) and the model with a loss of 14.1 (144k steps)
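
The renaming step mentioned above can be sketched in Python roughly as follows. This is a minimal illustration, not part of the model release: the helper name `install_base_model`, the source file names, and the target directory are all assumptions, so adjust them to match the files you downloaded and the directory your SoVITS 4.0 setup reads its checkpoints from. The sketch copies rather than renames, so the original downloads stay untouched.

```python
import shutil
from pathlib import Path


def install_base_model(generator_ckpt, discriminator_ckpt, train_dir):
    """Copy a generator/discriminator checkpoint pair into train_dir
    under the names G_0.pth and D_0.pth.

    All three arguments are hypothetical paths chosen by the caller;
    nothing here is prescribed by the model release itself.
    """
    train_dir = Path(train_dir)
    train_dir.mkdir(parents=True, exist_ok=True)

    # Map each target name to the checkpoint file that should supply it.
    targets = {
        "G_0.pth": Path(generator_ckpt),
        "D_0.pth": Path(discriminator_ckpt),
    }

    installed = []
    for target_name, source in targets.items():
        if not source.is_file():
            raise FileNotFoundError(f"checkpoint not found: {source}")
        destination = train_dir / target_name
        # Copy instead of rename so the downloaded file is preserved.
        shutil.copy2(source, destination)
        installed.append(destination)
    return installed
```

A call might look like `install_base_model("G_216000.pth", "D_216000.pth", "logs/44k")`, where both file names and the directory are placeholders for illustration.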
Strengths
- Includes checkpoints at 100k, 216k, and 320k training steps
- Includes a model trained on NVIDIA A10 hardware that reached a loss of 14.1
- Uses 768-dimensional vector embeddings from layer 12 of the feature extractor
- Integrates into the SoVITS 4.0 framework by renaming the checkpoint files to D_0.pth and G_0.pth