<p>Deploying a digital human involves five steps in total:</p>
<p>1. Split the original video into an audio track and a video track. This can be done with PHP + ffmpeg and runs on the CPU (a sketch follows this list).</p>
<p>2. Transcribe the audio into a script; this is mainly used to copy someone else's copywriting.</p>
<p>3. Synthesize the script with a chosen voice, done with CosyVoice (GPU + Ubuntu + conda).</p>
<p>4. Matte the video (remove the background), done with https://github.com/PeterL1n/RobustVideoMatting</p>
<p>5. Composite the matted figure with the background and the audio, done with https://github.com/Holasyb918/HeyGem-Linux-Python-Hack</p>
<p>miniforge3 is an open-source build of conda with identical usage; look it up online if needed.</p>
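<p><br></p>
<p>Step 1 above only needs two ffmpeg invocations. The note does it with PHP + ffmpeg (the same commands can be called from PHP's exec()); below is a minimal sketch in Python instead, with illustrative file names. The 16 kHz mono WAV settings are an assumption, chosen to match the 16000 Hz rate the CosyVoice test script later in this note loads prompts at.</p>
<p>import subprocess</p>
<p><br></p>
<p># Step 1 sketch: split a source video into an audio track and a silent video track with ffmpeg.</p>
<p>def split_video(src, audio_out='audio.wav', video_out='video_only.mp4'):</p>
<p>    # -vn drops the video stream; write 16 kHz mono PCM WAV (assumed rate, see lead-in).</p>
<p>    subprocess.run(['ffmpeg', '-y', '-i', src, '-vn', '-acodec', 'pcm_s16le', '-ar', '16000', '-ac', '1', audio_out], check=True)</p>
<p>    # -an drops the audio stream; -c:v copy avoids re-encoding, keeping the CPU cost low.</p>
<p>    subprocess.run(['ffmpeg', '-y', '-i', src, '-an', '-c:v', 'copy', video_out], check=True)</p>
<p><br></p>
<p>split_video('input.mp4')</p>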
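<p><br></p>
<p>The note does not name a tool for step 2 (recovering the script from the extracted audio). As a labeled assumption, one common open-source choice is openai-whisper (pip install -U openai-whisper); the sketch below is not part of the original setup.</p>
<p>import whisper</p>
<p><br></p>
<p># Step 2 sketch (assumed tool, not from the original note): transcribe the audio from step 1.</p>
<p>model = whisper.load_model('base')  # small multilingual model</p>
<p>result = model.transcribe('audio.wav')</p>
<p>print(result['text'])  # the recovered script</p>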
<p><br></p>
<p><br></p>
<p><b>I. Installing and using CosyVoice (Ubuntu + conda)</b></p>
<p>git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git</p>
<p>cd CosyVoice</p>
<p>git submodule update --init --recursive</p>
<p><br></p>
<p>conda create -n cosyvoice -y python=3.10</p>
<p>conda activate cosyvoice</p>
<p>Outside Tencent Cloud: pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com</p>
<p>On Tencent Cloud you can run pip install -r requirements.txt directly, since downloads from inside Tencent Cloud are fast.</p>
<p>sudo apt-get install sox libsox-dev</p>
<p>mkdir -p pretrained_models</p>
<p>git clone https://www.modelscope.cn/iic/CosyVoice2-0.5B.git pretrained_models/CosyVoice2-0.5B</p>
<p>git clone https://www.modelscope.cn/iic/CosyVoice-ttsfrd.git pretrained_models/CosyVoice-ttsfrd</p>
<p>Only two models are downloaded above, out of the original five. Since we use CosyVoice2, the other three are probably not needed; import them later if something fails or they turn out to be required.</p>
<p>cd pretrained_models/CosyVoice-ttsfrd/</p>
<p>unzip resource.zip -d .</p>
<p>pip install ttsfrd_dependency-0.1-py3-none-any.whl</p>
<p>pip install ttsfrd-0.4.2-cp310-cp310-linux_x86_64.whl</p>
<p>That completes the installation. Write a test file to check that voice synthesis works, then adapt the logic from it.</p>
<p><br></p>
<p>// code start</p>
<p>import sys</p>
<p>sys.path.append('third_party/Matcha-TTS')</p>
<p>from cosyvoice.cli.cosyvoice import CosyVoice, CosyVoice2</p>
<p>from cosyvoice.utils.file_utils import load_wav</p>
<p>import torchaudio</p>
<p>cosyvoice = CosyVoice2('pretrained_models/CosyVoice2-0.5B', load_jit=False, load_trt=False, load_vllm=False, fp16=False)</p>
<p><br></p>
<p># NOTE if you want to reproduce the results on https://funaudiollm.github.io/cosyvoice2, please add text_frontend=False during inference</p>
<p># zero_shot usage</p>
<p>prompt_speech_16k = load_wav('./asset/zero_shot_prompt.wav', 16000)</p>
<p>for i, j in enumerate(cosyvoice.inference_zero_shot('收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐,笑容如花儿般绽放。', '希望你以后能够做的比我还好呦。', prompt_speech_16k, stream=False)):</p>
<p>    torchaudio.save('1zero_shot_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)</p>
<p><br></p>
<p># save zero_shot spk for future usage</p>
<p>assert cosyvoice.add_zero_shot_spk('希望你以后能够做的比我还好呦。', prompt_speech_16k, 'my_zero_shot_spk') is True</p>
<p>for i, j in enumerate(cosyvoice.inference_zero_shot('收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐,笑容如花儿般绽放。', '', '', zero_shot_spk_id='my_zero_shot_spk', stream=False)):</p>
<p>    torchaudio.save('2zero_shot_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)</p>
<p>cosyvoice.save_spkinfo()</p>
<p><br></p>
<p># fine grained control, for supported control, check cosyvoice/tokenizer/tokenizer.py#L248</p>
<p>for i, j in enumerate(cosyvoice.inference_cross_lingual('在他讲述那个荒诞故事的过程中,他突然[laughter]停下来,因为他自己也被逗笑了[laughter]。', prompt_speech_16k, stream=False)):</p>
<p>    torchaudio.save('3fine_grained_control_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)</p>
<p><br></p>
<p># instruct usage</p>
<p>for i, j in enumerate(cosyvoice.inference_instruct2('收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐,笑容如花儿般绽放。', '用四川话说这句话', prompt_speech_16k, stream=False)):</p>
<p>    torchaudio.save('4instruct_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)</p>
<p><br></p>
<p># bistream usage, you can use generator as input, this is useful when using text llm model as input</p>
<p># NOTE you should still have some basic sentence split logic because llm can not handle arbitrary sentence length</p>
<p>def text_generator():</p>
<p>    yield '收到好友从远方寄来的生日礼物,'</p>
<p>    yield '那份意外的惊喜与深深的祝福'</p>
<p>    yield '让我心中充满了甜蜜的快乐,'</p>
<p>    yield '笑容如花儿般绽放。'</p>
<p>for i, j in enumerate(cosyvoice.inference_zero_shot(text_generator(), '希望你以后能够做的比我还好呦。', prompt_speech_16k, stream=False)):</p>
<p>    torchaudio.save('zero_shot_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)</p>
<p>// code end, then save as test.py and run python test.py</p>
<p><br></p>
<p><br></p>
<p><b>II. Installing the matting tool</b></p>
<p>git clone https://github.com/PeterL1n/RobustVideoMatting.git</p>
<p>Rename the directory to kouxiang (e.g. mv RobustVideoMatting kouxiang), then:</p>
<p>cd kouxiang</p>
<p>conda create -n kouxiang -y python=3.9</p>
<p>conda activate kouxiang</p>
<p>pip install -r requirements_inference.txt</p>
<p>This step fails on the first run; switch the package sources as shown below, then rerun it.</p>
<p>av==8.0.3 simply would not install via pip; running conda install -c conda-forge av=8.0.3 succeeded.</p>
<p>On machines in mainland China this step is very slow. After configuring the mirrors below, retry and expect to wait about 20 minutes; if you hit problems, check the error trace.</p>
<p>When you see "Collecting package metadata (repodata.json): done" followed by "Solving environment:", it is nearly finished.</p>
<p>conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/</p>
<p>conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/</p>
<p>conda config --set show_channel_urls yes</p>
<p>In a CPU-only environment access to these mirrors was blocked, yet it worked after switching to a GPU machine; the reason is unknown.</p>
<p>After av=8.0.3 is installed, get back on the main track and rerun pip install -r requirements_inference.txt</p>
<p>Once everything is installed, download rvm_mobilenetv3.pth and upload it to the project root, then pick any video, rename it input.mp4, and upload it to the root as well. Then write a test.py with the following content:</p>
<p>import torch</p>
<p>from model import MattingNetwork</p>
<p><br></p>
<p>model = MattingNetwork('mobilenetv3').eval().cuda()  # or "resnet50"</p>
<p>model.load_state_dict(torch.load('rvm_mobilenetv3.pth'))</p>
<p><br></p>
<p>from inference import convert_video</p>
<p><br></p>
<p>convert_video(</p>
<p>    model,  # The model, can be on any device (cpu or cuda).</p>
<p>    input_source='input.mp4',  # A video file or an image sequence directory.</p>
<p>    output_type='video',  # Choose "video" or "png_sequence"</p>
<p>    output_composition='com.mp4',  # File path if video; directory path if png sequence.</p>
<p>    output_alpha='pha.mp4',  # [Optional] Output the raw alpha prediction.</p>
<p>    output_foreground='fgr.mp4',  # [Optional] Output the raw foreground prediction.</p>
<p>    output_video_mbps=4,  # Output video mbps. Not needed for png sequence.</p>
<p>    downsample_ratio=None,  # A hyperparameter to adjust or use None for auto.</p>
<p>    seq_chunk=12,  # Process n frames at once for better parallelism.</p>
<p>)</p>
<p><br></p>
<p>Save the file as test.py and run python ./test.py from the project root. input.mp4 and rvm_mobilenetv3.pth must be in the root directory. For other usage, see the GitHub README.</p>
<p><br></p>
<p><br></p>
<p><b>III. Compositing the video</b></p>
<p>Download the source: git clone https://github.com/Holasyb918/HeyGem-Linux-Python-Hack</p>
<p>Create the environment:</p>
<p>conda create -n hecheng -y python=3.8</p>
<p>conda activate hecheng</p>
<p>In requirements.txt, change onnxruntime-gpu==1.9 to onnxruntime-gpu==1.11.1.</p>
<p>Run pip install -r requirements.txt</p>
<p>If it errors out, run pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113</p>
<p>bash download.sh</p>
<p>Then run pip install -r requirements.txt twice more to surface any remaining problems.</p>
<p>At this point running the examples may report that cv2 is missing. That is a system-level issue; fix it with:</p>
<p>sudo apt update</p>
<p>sudo apt install -y libgl1-mesa-glx</p>
<p>sudo apt install -y libglib2.0-0 libsm6 libxext6 libxrender-dev libgomp1</p>
<p>pip install -U flask (installs flask)</p>
<p>apt install -y libsndfile1</p>
<p>pip install einops</p>
<p>apt install -y ffmpeg</p>
<p>If all of the above succeeds, the setup is done. Then run the test entry points: python run.py, python run.py --audio_path example/audio.wav --video_path example/video.mp4, and python app.py. These are three variants of the same test program and are all similar; study the code yourself.</p>
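<p><br></p>
<p>Putting the parts together: the rough driver below chains the steps using only commands and file names that already appear in this note (test.py in CosyVoice and kouxiang, run.py in HeyGem-Linux-Python-Hack). The glue itself is an assumption: it uses conda run to execute each script in its environment, the paths are illustrative, and in a real pipeline you would point run.py at the outputs of steps 3 and 4 instead of the repo's example files.</p>
<p>import subprocess</p>
<p><br></p>
<p># Small helper: echo each command and stop on the first failure.</p>
<p>def run(cmd, cwd=None):</p>
<p>    print('+', ' '.join(cmd))</p>
<p>    subprocess.run(cmd, cwd=cwd, check=True)</p>
<p><br></p>
<p># Step 1: split the original video (same ffmpeg calls as the sketch near the top).</p>
<p>run(['ffmpeg', '-y', '-i', 'input.mp4', '-vn', '-acodec', 'pcm_s16le', '-ar', '16000', '-ac', '1', 'audio.wav'])</p>
<p>run(['ffmpeg', '-y', '-i', 'input.mp4', '-an', '-c:v', 'copy', 'video_only.mp4'])</p>
<p><br></p>
<p># Step 2 (transcribing the audio) is left out here; see the whisper sketch above.</p>
<p><br></p>
<p># Step 3: synthesize the voice-over with CosyVoice (the test.py from part I).</p>
<p>run(['conda', 'run', '-n', 'cosyvoice', 'python', 'test.py'], cwd='CosyVoice')</p>
<p><br></p>
<p># Step 4: matte the video (the test.py from part II).</p>
<p>run(['conda', 'run', '-n', 'kouxiang', 'python', './test.py'], cwd='kouxiang')</p>
<p><br></p>
<p># Step 5: composite with HeyGem (part III); swap the example paths for the</p>
<p># audio from step 3 and the matted video from step 4 in a real run.</p>
<p>run(['conda', 'run', '-n', 'hecheng', 'python', 'run.py',</p>
<p>     '--audio_path', 'example/audio.wav', '--video_path', 'example/video.mp4'],</p>
<p>    cwd='HeyGem-Linux-Python-Hack')</p>
<p><br></p>
<p>Run it from the directory that contains the three cloned repos; each step can also be run by hand exactly as described in parts I to III.</p>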