
WhisperX speech recognition: local deployment

WhisperX is an excellent open-source Python speech recognition library built on Whisper.
The notes below record how to deploy WhisperX on Windows 10.
1. Install Python
2. Install CUDA (needed for GPU inference with device = "cuda")
3. Install Anaconda or Miniconda
4. Download and install ffmpeg
Download a Windows release-builds package of ffmpeg.
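Extract the package to a path of your choice, then add the folder that contains ffmpeg.exe (usually its bin folder) to the system Path: This PC -> Advanced system settings -> Environment Variables -> Path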

After saving the settings, open a new cmd window and run

ffmpeg

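If ffmpeg prints its version banner instead of "'ffmpeg' is not recognized as an internal or external command", the Path is set up correctly.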
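The same check can be done from Python, which matters because whisperx.load_audio calls the ffmpeg executable to decode audio. This is a minimal sketch using only the standard library; the function name ffmpeg_available is hypothetical.

```python
import shutil
import subprocess


def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable is on PATH and runs successfully."""
    path = shutil.which("ffmpeg")  # searches the directories listed in PATH
    if path is None:
        return False
    try:
        # "-version" prints the version banner and exits with status 0
        completed = subprocess.run(
            [path, "-version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return completed.returncode == 0


if __name__ == "__main__":
    print("ffmpeg found" if ffmpeg_available() else "ffmpeg not on PATH")
```

If this reports that ffmpeg is missing right after editing Path, start a new terminal or IDE: environment variable changes only reach processes started afterwards.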
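5. Create a conda virtual environment at a specified location (using --prefix instead of a name; the Python version below is an assumption, choose one your WhisperX release supports)

conda create --prefix D:\Projects\LiimouDemo\WhisperX\Code\whisperX\whisperXVenv python=3.10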

6. Activate the virtual environment

conda activate D:\Projects\LiimouDemo\WhisperX\Code\whisperX\whisperXVenv
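To make sure the following pip installs land in this environment, you can check which interpreter is actually running. A small sketch, assuming conda sets the CONDA_PREFIX environment variable on activation (it does for conda activate); the helper name in_expected_env is illustrative.

```python
import os
import sys
from pathlib import Path
from typing import Optional


def in_expected_env(expected_prefix: str, prefix: Optional[str] = None) -> bool:
    """Return True if the interpreter prefix equals expected_prefix."""
    current = Path(prefix if prefix is not None else sys.prefix).resolve()
    expected = Path(expected_prefix).resolve()
    # normcase folds case on Windows, where paths are case-insensitive
    return os.path.normcase(str(current)) == os.path.normcase(str(expected))


if __name__ == "__main__":
    print("sys.prefix:  ", sys.prefix)
    print("CONDA_PREFIX:", os.environ.get("CONDA_PREFIX"))
    print(in_expected_env(r"D:\Projects\LiimouDemo\WhisperX\Code\whisperX\whisperXVenv"))
```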

7. Install the WhisperX library (pip installs it directly from GitHub, so git must be installed and on the Path)

pip install git+https://github.com/m-bain/whisperx.git

8. Update the WhisperX library

pip install git+https://github.com/m-bain/whisperx.git --upgrade
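To see which version is installed before and after upgrading, the standard library can read the package metadata. A sketch; the function name installed_version is hypothetical, and because this install comes from git, the reported version may stay the same across commits if the project did not bump its version field.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "whisperx") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("whisperx:", installed_version() or "not installed")
```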

9. Use it from Python

import time

import whisperx

device = "cuda"
audio_file = "data/test.mp3"
batch_size = 16  # reduce if low on GPU memory
compute_type = "float16"  # change to "int8" if low on GPU memory (may reduce accuracy)
# compute_type = "int8"

print("Loading model...")
start = time.time()
# 1. Transcribe with the batched Whisper pipeline (faster-whisper backend)
model = whisperx.load_model("large-v2", device, compute_type=compute_type)
# model = whisperx.load_model("small", device, compute_type=compute_type)  # smaller, faster model
end = time.time()
print("Model load time:", end - start, "s")

start = time.time()
audio = whisperx.load_audio(audio_file)  # decoded by ffmpeg to 16 kHz mono
result = model.transcribe(audio, batch_size=batch_size)

# Print every segment (text before alignment); longer audio yields several segments
for segment in result["segments"]:
    print(segment["text"])
end = time.time()
print("Transcription time:", end - start, "s")
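The script above hard-codes device = "cuda" and float16. A hedged sketch of choosing both at runtime: it assumes PyTorch is installed (WhisperX depends on it) and falls back to CPU with int8, because the CTranslate2 backend does not support float16 on CPU. The helper name is hypothetical.

```python
from typing import Tuple


def pick_device_and_compute_type() -> Tuple[str, str]:
    """Return a (device, compute_type) pair to pass to whisperx.load_model."""
    try:
        import torch  # installed as a WhisperX dependency
    except ImportError:
        return "cpu", "int8"
    if torch.cuda.is_available():
        return "cuda", "float16"
    # No usable GPU: float16 is not available on CPU, int8 is the usual choice
    return "cpu", "int8"


if __name__ == "__main__":
    device, compute_type = pick_device_and_compute_type()
    print(device, compute_type)
```

The result can be passed straight to whisperx.load_model("large-v2", device, compute_type=compute_type); on CPU a smaller batch_size and model are also worth trying.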

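Wrap the step 9 script in a class: call loadModel() once at initialization, then call asr(path) for each file, so the slow model load is not repeated on every request.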

import time

import whisperx
import zhconv
from whisperx.asr import FasterWhisperPipeline


class WhisperXTool:
    device = "cuda"
    batch_size = 16  # reduce if low on GPU memory
    compute_type = "float16"  # change to "int8" if low on GPU memory (may reduce accuracy)
    fast_model: FasterWhisperPipeline

    def loadModel(self):
        # Load the batched Whisper pipeline once; this is the slow step
        self.fast_model = whisperx.load_model("large-v2", self.device, compute_type=self.compute_type)
        print("Model loaded")

    def asr(self, filePath: str) -> str:
        start = time.time()
        audio = whisperx.load_audio(filePath)
        result = self.fast_model.transcribe(audio, batch_size=self.batch_size)
        # Join all segments (not only the first), then convert to Simplified Chinese
        text = "".join(segment["text"] for segment in result["segments"])
        text = zhconv.convert(text, "zh-cn")
        print(text)
        end = time.time()
        print("Transcription time:", end - start, "s")
        return text
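asr() returns plain text only. If timing is also needed, each entry of result["segments"] carries start and end times in seconds next to text. A sketch that renders them as lines; the function name format_segments and the hand-written sample segments are illustrative, not real WhisperX output.

```python
from typing import Dict, List


def format_segments(segments: List[Dict]) -> str:
    """Render segments as '[start - end] text' lines (times in seconds)."""
    lines = []
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue  # skip empty segments
        lines.append(f"[{seg['start']:.2f}s - {seg['end']:.2f}s] {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hand-written stand-in for model.transcribe(...)["segments"]
    sample_segments = [
        {"start": 0.0, "end": 2.5, "text": " 你好"},
        {"start": 2.5, "end": 4.0, "text": " 世界"},
    ]
    print(format_segments(sample_segments))
```

Inside asr(), it could be called as format_segments(result["segments"]) before or instead of joining the text.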

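zhconv is a library for converting between Simplified and Traditional Chinese. Whisper may output Traditional characters for Mandarin audio, which is why asr() converts its result to Simplified ("zh-cn"). Install it with: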

pip install zhconv
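A small example of the conversion step used in asr(). "zh-cn" is the mainland Simplified locale from the original code; zhconv also accepts locales such as "zh-tw". The fallback that returns the text unchanged when zhconv is missing is an addition for illustration only, and the function name is hypothetical.

```python
def to_simplified(text: str) -> str:
    """Convert Chinese text to mainland Simplified form using zhconv if present."""
    try:
        import zhconv
    except ImportError:
        # zhconv not installed: return the text unchanged
        return text
    return zhconv.convert(text, "zh-cn")


if __name__ == "__main__":
    print(to_simplified("語音識別"))  # Traditional-character input
```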
