# Preface

Environment:

- Anaconda 3
- Python 3.8
- PyTorch 1.13.1
- Windows 10 or Ubuntu 18.04

# Features

1. Supported models: EcapaTdnn, TDNN, Res2Net, ResNetSE
2. Supported pooling layers: AttentiveStatsPool (ASP), SelfAttentivePooling (SAP), TemporalStatisticsPooling (TSP), TemporalAveragePooling (TAP)
3. Supported loss functions: AAMLoss, AMLoss, ARMLoss, CELoss
4. Supported preprocessing methods: MelSpectrogram, Spectrogram, MFCC

## Installation

- First install the GPU build of PyTorch. Skip this step if it is already installed.

```shell
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
```

- Install the mvector library.

Install it with pip:

```shell
python -m pip install mvector -U -i https://pypi.tuna.tsinghua.edu.cn/simple
```

# Usage Guide

## 1. Environment Setup
### 1.1 Install Dependencies
```shell
# Create an environment with conda (optional)
conda create -n voiceprint python=3.8
conda activate voiceprint

# Install the project dependencies
pip install -r requirements.txt
```

### 1.2 Prepare Audio Data
- Store enrollment audio in the `audio_db/` directory (16 kHz mono WAV recommended)
- Test audio is best placed in the `test_audio/` directory
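
The recommended format (16 kHz, mono, WAV) can be checked before enrollment. The following is a minimal sketch using only Python's standard `wave` module; the helper name `check_wav` and its default values are illustrative assumptions, not part of this project.

```python
import wave


def check_wav(path, expected_rate=16000, expected_channels=1):
    """Return a list of problems found in a PCM WAV file (empty if none).

    `check_wav` is a hypothetical helper for illustration only;
    it is not provided by this project.
    """
    problems = []
    # Read only the header fields; the audio frames are not decoded.
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != expected_channels:
        problems.append(f"{channels} channel(s), expected {expected_channels}")
    return problems
```

Calling `check_wav("audio_db/user1.wav")` returns an empty list when the file matches the recommendation. The `wave` module handles uncompressed PCM WAV files; other encodings raise `wave.Error`.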

## 2. Core Features

### 2.1 Train the Voiceprint Model
```shell
python train.py \
    --config_path configs/ecapa_tdnn.yml \
    --augmentation_config configs/augmentation.json \
    --save_dir models/
```

### 2.2 Enroll a Voiceprint in the Database
```python
from mvector import MVector

mvector = MVector()
# Enroll the speaker "user1" from a reference recording
mvector.register_user(name="user1", audio_path="audio_db/user1.wav")
```

### 2.3 Real-Time Voiceprint Recognition
```shell
python infer_recognition.py \
    --model_path models/ecapa_tdnn.pth \
    --audio_path test_audio/unknown.wav
```

### 2.4 Voiceprint Comparison and Verification
```shell
python infer_contrast.py \
    --audio1 audio_db/user1.wav \
    --audio2 test_audio/sample.wav \
    --threshold 0.7
```
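
A comparison like this typically reduces to scoring two speaker embeddings and checking the score against the threshold. The sketch below assumes the score is cosine similarity, which is a common choice but is not confirmed by this document; the function names are illustrative and do not come from `infer_contrast.py`.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(emb1, emb2, threshold=0.7):
    """Assumed decision rule: accept when the score reaches the threshold."""
    return cosine_similarity(emb1, emb2) >= threshold
```

Under this rule, raising the threshold makes false accepts rarer at the cost of more false rejects.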

## 3. Noise Reduction Preprocessing
```python
from Reduction_Noise import NoiseReducer

# Load the denoising model weights
reducer = NoiseReducer("Reduction_Noise/pytorch_model.bin")
# Denoise a noisy recording
clean_audio = reducer.process("noisy_audio.wav")
```

## 4. Model Evaluation
```shell
python eval.py \
    --model_path models/ecapa_tdnn.pth \
    --test_csv eval_samples.csv \
    --batch_size 32
```
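
Speaker verification is commonly evaluated with the equal error rate (EER), the operating point where the false accept rate equals the false reject rate. Whether `eval.py` reports this metric is not stated here, so the following is only a sketch of the idea; the function names are hypothetical and the threshold sweep is a simple approximation.

```python
def error_rates(scores, labels, threshold):
    """Return (false accept rate, false reject rate) at a threshold.

    labels: 1 for same-speaker pairs, 0 for different-speaker pairs.
    A pair is accepted when its score is >= threshold.
    """
    positives = sum(1 for label in labels if label == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one pair of each class")
    false_accepts = sum(
        1 for s, label in zip(scores, labels) if label == 0 and s >= threshold
    )
    false_rejects = sum(
        1 for s, label in zip(scores, labels) if label == 1 and s < threshold
    )
    return false_accepts / negatives, false_rejects / positives


def approximate_eer(scores, labels):
    """Try each observed score as a threshold and return (eer, threshold)
    for the point where the two error rates are closest."""
    best = None
    for threshold in sorted(set(scores)):
        far, frr = error_rates(scores, labels, threshold)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, threshold)
    return best[1], best[2]
```

The threshold returned by such a sweep is one way to choose a value for `--threshold`, provided the evaluation pairs resemble the audio seen in deployment.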