Fine-tuning the Model
Single-Card Training
Run:

```bash
weclone-cli train-sft
```

For single-card training in a multi-card environment, first run:

```bash
export CUDA_VISIBLE_DEVICES=0
```
The training script reads its configuration from settings.jsonc and starts fine-tuning. Watch the terminal output to confirm that the loss is decreasing normally.
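For orientation, the training-related part of settings.jsonc looks roughly like the sketch below. The section name train_sft_args and the individual field names and values are assumptions for illustration; check the settings.jsonc shipped with your WeClone version for the authoritative names.

```jsonc
// Illustrative sketch only -- field names and values are assumptions,
// not the definitive schema of settings.jsonc.
{
  "train_sft_args": {
    "output_dir": "model_output",         // where the LoRA adapter weights are saved
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 8,
    "num_train_epochs": 2,
    "learning_rate": 1e-4,
    "lora_rank": 8,                       // LoRA adapter rank
    "lora_dropout": 0.1
  }
}
```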
Multi-Card Training
If you have multiple NVIDIA GPUs and want to perform multi-card training:
Install Deepspeed:
```bash
uv pip install deepspeed
```

Configure Deepspeed:
In settings.jsonc, find the deepspeed configuration item and uncomment it, or fill in the path to a Deepspeed JSON configuration file as needed (a minimal sketch of such a file follows these steps).

Start multi-card training:
```bash
deepspeed --num_gpus=<number_of_gpus_to_use> weclone/train/train_sft.py
```
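If you point the deepspeed setting at a standalone file, a minimal ZeRO stage-2 configuration looks roughly like the sketch below. The file name and the specific values are assumptions; the "auto" entries rely on the HuggingFace Trainer integration filling them in from the training arguments, and DeepSpeed's documentation covers the full option set.

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "allgather_partitions": true,
    "reduce_scatter": true
  },
  "bf16": {
    "enabled": "auto"
  }
}
```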
After the training is complete, the fine-tuned LoRA adapter weights will be saved in the output_dir you specified in settings.jsonc.
Enable QLoRA (Optional Configuration)
If you want to further reduce GPU memory (VRAM) consumption, you can enable QLoRA quantized training.
WARNING
Weights fine-tuned on a quantized model cannot be merged back into the original model.
Add the following configuration to the common_args field in settings.jsonc:
"quantization_bit": 4,
"quantization_type": "nf4",
"double_quantization": true,
"quantization_method": "bitsandbytes"NOTE
quantization_bit supports the values 2 / 4 / 8. The lower the value, the more VRAM is saved, but inference speed and quality may be slightly reduced.

If you encounter the error ImportError: Please install bitsandbytes>=0.45.3, you can try reinstalling bitsandbytes:

```bash
uv pip install "bitsandbytes>=0.45.3"
```