<p>Deploying a digital human involves five steps in total:</p>
<p>1. Split the original video into an audio track and a video track. This can be done with PHP + ffmpeg and runs on the CPU (a sketch follows this list).</p>
<p>2. Transcribe the audio into a script; this is mainly used to copy someone else's copywriting.</p>
<p>3. Synthesize the script with a chosen voice, done with CosyVoice (GPU + Ubuntu + conda).</p>
<p>4. Matte the video (remove the background), done with https://github.com/PeterL1n/RobustVideoMatting</p>
<p>5. Composite the matted figure with the background and the audio, done with https://github.com/Holasyb918/HeyGem-Linux-Python-Hack</p>
<p>miniforge3 is an open-source build of conda with identical usage; look it up online if needed.</p>
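<p><br></p>
<p>Step 1 above only needs two ffmpeg invocations. The note does it with PHP + ffmpeg (the same commands can be called from PHP's exec()); below is a minimal sketch in Python instead, with illustrative file names. The 16 kHz mono WAV settings are an assumption, chosen to match the 16000 Hz rate the CosyVoice test script later in this note loads prompts at.</p>
<p>import subprocess</p>
<p><br></p>
<p># Step 1 sketch: split a source video into an audio track and a silent video track with ffmpeg.</p>
<p>def split_video(src, audio_out='audio.wav', video_out='video_only.mp4'):</p>
<p>    # -vn drops the video stream; write 16 kHz mono PCM WAV (assumed rate, see lead-in).</p>
<p>    subprocess.run(['ffmpeg', '-y', '-i', src, '-vn', '-acodec', 'pcm_s16le', '-ar', '16000', '-ac', '1', audio_out], check=True)</p>
<p>    # -an drops the audio stream; -c:v copy avoids re-encoding, keeping the CPU cost low.</p>
<p>    subprocess.run(['ffmpeg', '-y', '-i', src, '-an', '-c:v', 'copy', video_out], check=True)</p>
<p><br></p>
<p>split_video('input.mp4')</p>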
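<p><br></p>
<p>The note does not name a tool for step 2 (recovering the script from the extracted audio). As a labeled assumption, one common open-source choice is openai-whisper (pip install -U openai-whisper); the sketch below is not part of the original setup.</p>
<p>import whisper</p>
<p><br></p>
<p># Step 2 sketch (assumed tool, not from the original note): transcribe the audio from step 1.</p>
<p>model = whisper.load_model('base')  # small multilingual model</p>
<p>result = model.transcribe('audio.wav')</p>
<p>print(result['text'])  # the recovered script</p>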
<p><br></p>
<p><br></p>
<p><b>I. Installing and using CosyVoice (Ubuntu + conda)</b></p>
<p>git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git</p>
<p>cd CosyVoice</p>
<p>git submodule update --init --recursive</p>
<p><br></p>
<p>conda create -n cosyvoice -y python=3.10</p>
<p>conda activate cosyvoice</p>
<p>Outside Tencent Cloud: pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com</p>
<p>On Tencent Cloud you can run pip install -r requirements.txt directly, since downloads from inside Tencent Cloud are fast.</p>
<p>sudo apt-get install sox libsox-dev</p>
<p>mkdir -p pretrained_models</p>
<p>git clone https://www.modelscope.cn/iic/CosyVoice2-0.5B.git pretrained_models/CosyVoice2-0.5B</p>
<p>git clone https://www.modelscope.cn/iic/CosyVoice-ttsfrd.git pretrained_models/CosyVoice-ttsfrd</p>
<p>Only two models are downloaded above, out of the original five. Since we use CosyVoice2, the other three are probably not needed; import them later if something fails or they turn out to be required.</p>
<p>cd pretrained_models/CosyVoice-ttsfrd/</p>
<p>unzip resource.zip -d .</p>
<p>pip install ttsfrd_dependency-0.1-py3-none-any.whl</p>
<p>pip install ttsfrd-0.4.2-cp310-cp310-linux_x86_64.whl</p>
<p>That completes the installation. Write a test file to check that voice synthesis works, then adapt the logic from it.</p>
<p><br></p>
<p>// code start</p>
<p>import sys</p>
<p>sys.path.append('third_party/Matcha-TTS')</p>
<p>from cosyvoice.cli.cosyvoice import CosyVoice, CosyVoice2</p>
<p>from cosyvoice.utils.file_utils import load_wav</p>
<p>import torchaudio</p>
<p>cosyvoice = CosyVoice2('pretrained_models/CosyVoice2-0.5B', load_jit=False, load_trt=False, load_vllm=False, fp16=False)</p>
<p><br></p>
<p># NOTE if you want to reproduce the results on https://funaudiollm.github.io/cosyvoice2, please add text_frontend=False during inference</p>
<p># zero_shot usage</p>
<p>prompt_speech_16k = load_wav('./asset/zero_shot_prompt.wav', 16000)</p>
<p>for i, j in enumerate(cosyvoice.inference_zero_shot('收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐,笑容如花儿般绽放。', '希望你以后能够做的比我还好呦。', prompt_speech_16k, stream=False)):</p>
<p>    torchaudio.save('1zero_shot_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)</p>
<p><br></p>
<p># save zero_shot spk for future usage</p>
<p>assert cosyvoice.add_zero_shot_spk('希望你以后能够做的比我还好呦。', prompt_speech_16k, 'my_zero_shot_spk') is True</p>
<p>for i, j in enumerate(cosyvoice.inference_zero_shot('收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐,笑容如花儿般绽放。', '', '', zero_shot_spk_id='my_zero_shot_spk', stream=False)):</p>
<p>    torchaudio.save('2zero_shot_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)</p>
<p>cosyvoice.save_spkinfo()</p>
<p><br></p>
<p># fine grained control, for supported control, check cosyvoice/tokenizer/tokenizer.py#L248</p>
<p>for i, j in enumerate(cosyvoice.inference_cross_lingual('在他讲述那个荒诞故事的过程中,他突然[laughter]停下来,因为他自己也被逗笑了[laughter]。', prompt_speech_16k, stream=False)):</p>
<p>    torchaudio.save('3fine_grained_control_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)</p>
<p><br></p>
<p># instruct usage</p>
<p>for i, j in enumerate(cosyvoice.inference_instruct2('收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐,笑容如花儿般绽放。', '用四川话说这句话', prompt_speech_16k, stream=False)):</p>
<p>    torchaudio.save('4instruct_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)</p>
<p><br></p>
<p># bistream usage, you can use generator as input, this is useful when using text llm model as input</p>
<p># NOTE you should still have some basic sentence split logic because llm can not handle arbitrary sentence length</p>
<p>def text_generator():</p>
<p>    yield '收到好友从远方寄来的生日礼物,'</p>
<p>    yield '那份意外的惊喜与深深的祝福'</p>
<p>    yield '让我心中充满了甜蜜的快乐,'</p>
<p>    yield '笑容如花儿般绽放。'</p>
<p>for i, j in enumerate(cosyvoice.inference_zero_shot(text_generator(), '希望你以后能够做的比我还好呦。', prompt_speech_16k, stream=False)):</p>
<p>    torchaudio.save('zero_shot_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)</p>
<p>// code end, then save as test.py and run python test.py</p>
<p><br></p>
<p><br></p>
<p><b>II. Installing the matting tool</b></p>
<p>git clone https://github.com/PeterL1n/RobustVideoMatting.git</p>
<p>Rename the directory to kouxiang (e.g. mv RobustVideoMatting kouxiang), then:</p>
<p>cd kouxiang</p>
<p>conda create -n kouxiang -y python=3.9</p>
<p>conda activate kouxiang</p>
<p>pip install -r requirements_inference.txt</p>
<p>This step fails on the first run; switch the package sources as shown below, then rerun it.</p>
<p>av==8.0.3 simply would not install via pip; running conda install -c conda-forge av=8.0.3 succeeded.</p>
<p>On machines in mainland China this step is very slow. After configuring the mirrors below, retry and expect to wait about 20 minutes; if you hit problems, check the error trace.</p>
<p>When you see "Collecting package metadata (repodata.json): done" followed by "Solving environment:", it is nearly finished.</p>
<p>conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/</p>
<p>conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/</p>
<p>conda config --set show_channel_urls yes</p>
<p>In a CPU-only environment access to these mirrors was blocked, yet it worked after switching to a GPU machine; the reason is unknown.</p>
<p>After av=8.0.3 is installed, get back on the main track and rerun pip install -r requirements_inference.txt</p>
<p>Once everything is installed, download rvm_mobilenetv3.pth and upload it to the project root, then pick any video, rename it input.mp4, and upload it to the root as well. Then write a test.py with the following content:</p>
<p>import torch</p>
<p>from model import MattingNetwork</p>
<p><br></p>
<p>model = MattingNetwork('mobilenetv3').eval().cuda()  # or "resnet50"</p>
<p>model.load_state_dict(torch.load('rvm_mobilenetv3.pth'))</p>
<p><br></p>
<p>from inference import convert_video</p>
<p><br></p>
<p>convert_video(</p>
<p>    model,  # The model, can be on any device (cpu or cuda).</p>
<p>    input_source='input.mp4',  # A video file or an image sequence directory.</p>
<p>    output_type='video',  # Choose "video" or "png_sequence"</p>
<p>    output_composition='com.mp4',  # File path if video; directory path if png sequence.</p>
<p>    output_alpha='pha.mp4',  # [Optional] Output the raw alpha prediction.</p>
<p>    output_foreground='fgr.mp4',  # [Optional] Output the raw foreground prediction.</p>
<p>    output_video_mbps=4,  # Output video mbps. Not needed for png sequence.</p>
<p>    downsample_ratio=None,  # A hyperparameter to adjust or use None for auto.</p>
<p>    seq_chunk=12,  # Process n frames at once for better parallelism.</p>
<p>)</p>
<p><br></p>
<p>Save the file as test.py and run python ./test.py from the project root. input.mp4 and rvm_mobilenetv3.pth must be in the root directory. For other usage, see the GitHub README.</p>
<p><br></p>
<p><br></p>
<p><b>III. Compositing the video</b></p>
<p>Download the source: git clone https://github.com/Holasyb918/HeyGem-Linux-Python-Hack</p>
<p>Create the environment:</p>
<p>conda create -n hecheng -y python=3.8</p>
<p>conda activate hecheng</p>
<p>In requirements.txt, change onnxruntime-gpu==1.9 to onnxruntime-gpu==1.11.1.</p>
<p>Run pip install -r requirements.txt</p>
<p>If it errors out, run pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113</p>
<p>bash download.sh</p>
<p>Then run pip install -r requirements.txt twice more to surface any remaining problems.</p>
<p>At this point running the examples may report that cv2 is missing. That is a system-level issue; fix it with:</p>
<p>sudo apt update</p>
<p>sudo apt install -y libgl1-mesa-glx</p>
<p>sudo apt install -y libglib2.0-0 libsm6 libxext6 libxrender-dev libgomp1</p>
<p>pip install -U flask (installs flask)</p>
<p>apt install -y libsndfile1</p>
<p>pip install einops</p>
<p>apt install -y ffmpeg</p>
<p>If all of the above succeeds, the setup is done. Then run the test entry points: python run.py, python run.py --audio_path example/audio.wav --video_path example/video.mp4, and python app.py. These are three variants of the same test program and are all similar; study the code yourself.</p>
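<p><br></p>
<p>Putting the parts together: the rough driver below chains the steps using only commands and file names that already appear in this note (test.py in CosyVoice and kouxiang, run.py in HeyGem-Linux-Python-Hack). The glue itself is an assumption: it uses conda run to execute each script in its environment, the paths are illustrative, and in a real pipeline you would point run.py at the outputs of steps 3 and 4 instead of the repo's example files.</p>
<p>import subprocess</p>
<p><br></p>
<p># Small helper: echo each command and stop on the first failure.</p>
<p>def run(cmd, cwd=None):</p>
<p>    print('+', ' '.join(cmd))</p>
<p>    subprocess.run(cmd, cwd=cwd, check=True)</p>
<p><br></p>
<p># Step 1: split the original video (same ffmpeg calls as the sketch near the top).</p>
<p>run(['ffmpeg', '-y', '-i', 'input.mp4', '-vn', '-acodec', 'pcm_s16le', '-ar', '16000', '-ac', '1', 'audio.wav'])</p>
<p>run(['ffmpeg', '-y', '-i', 'input.mp4', '-an', '-c:v', 'copy', 'video_only.mp4'])</p>
<p><br></p>
<p># Step 2 (transcribing the audio) is left out here; see the whisper sketch above.</p>
<p><br></p>
<p># Step 3: synthesize the voice-over with CosyVoice (the test.py from part I).</p>
<p>run(['conda', 'run', '-n', 'cosyvoice', 'python', 'test.py'], cwd='CosyVoice')</p>
<p><br></p>
<p># Step 4: matte the video (the test.py from part II).</p>
<p>run(['conda', 'run', '-n', 'kouxiang', 'python', './test.py'], cwd='kouxiang')</p>
<p><br></p>
<p># Step 5: composite with HeyGem (part III); swap the example paths for the</p>
<p># audio from step 3 and the matted video from step 4 in a real run.</p>
<p>run(['conda', 'run', '-n', 'hecheng', 'python', 'run.py',</p>
<p>     '--audio_path', 'example/audio.wav', '--video_path', 'example/video.mp4'],</p>
<p>    cwd='HeyGem-Linux-Python-Hack')</p>
<p><br></p>
<p>Run it from the directory that contains the three cloned repos; each step can also be run by hand exactly as described in parts I to III.</p>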