Open-source Fish Speech v1.5.0 optimized build: text-to-speech with voice cloning, one-click integrated package, demo included


Fish Speech is a text-to-speech (TTS) solution developed by the Fish Audio team. It uses modern machine-learning and deep-learning techniques to turn text into high-quality, natural-sounding speech. The project is released under the CC-BY-NC-SA-4.0 license, which means anyone may freely use, modify, and share the code and models as long as they comply with its terms (including the non-commercial clause).

Technical highlights

Fish Speech draws on a range of modern AI techniques, including Transformer architectures, VQ-GAN, Llama, and ideas from VITS. The Transformer backbone lets the model understand and generate long speech sequences, and its self-attention mechanism improves both the accuracy and the efficiency of generation. Fish Speech also combines multi-task learning with a modern neural vocoder, allowing it to handle complex synthesis tasks and produce natural, fluent speech.

Features

  1. Multilingual support: Fish Speech handles Chinese, Japanese, English, and other languages, giving users strong multilingual synthesis capability.

  2. Emotional expression: the model can generate speech with different emotional tones, such as happiness, sadness, or anger, making the output more expressive.

  3. Voice cloning: by learning a speaker's vocal characteristics from only a small amount of sample audio, Fish Speech can produce personalized synthetic voices for a wide range of needs.

  4. Real-time synthesis: low-latency generation suits applications that need immediate feedback, such as chatbots and automated customer-service systems.

  5. Efficient and lightweight: despite these capabilities, Fish Speech is designed to be light on hardware; about 4 GB of GPU VRAM is enough to run it, which keeps the barrier to entry low.
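As a rough illustration of the voice-cloning workflow described in item 3, here is a sketch of how a cloning request to a locally hosted inference server could be assembled. The payload structure and field names here are assumptions for illustration only, not the project's documented API; check the Fish Speech repository for the real interface.

```python
import base64
import json


def build_clone_request(text, reference_audio, reference_text):
    """Assemble a JSON payload pairing target text with reference audio.

    Few-shot voice cloning is typically parameterized this way: the audio
    clip to imitate travels alongside its transcript. All field names
    here are hypothetical.
    """
    payload = {
        "text": text,  # what the cloned voice should say
        "references": [
            {
                # binary audio is base64-encoded for JSON transport
                "audio": base64.b64encode(reference_audio).decode("ascii"),
                "text": reference_text,  # transcript of the reference clip
            }
        ],
        "format": "wav",
    }
    return json.dumps(payload).encode("utf-8")
```

The key design point is that the reference clip and its transcript go together: the model needs to know both how the voice sounds and what was said in the sample.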

Use cases

Its versatility and flexibility make Fish Speech suitable for many scenarios, including smart assistants and chatbots, accessibility technology, education, content creation, game development, and customer service. Across these use cases it can both improve the user experience and help spread speech technology into more domains.

Open source and community

Fish Speech is fully open source: the code and models are free to use, and users can modify and extend them to fit their own needs. The Fish Audio team and the open-source community continue to improve and optimize the project.

In short, Fish Speech is a capable, efficient, lightweight, and easy-to-use open-source text-to-speech project with broad application potential. If you are interested in speech synthesis, go check out the Fish Speech project on GitHub!


Last night I found some time to build an integrated package for this project: unzip it and you are ready to go.

I spent this afternoon playing with the project on my RTX 4070 Ti Super. Generating a 29-second audio file takes about 130 seconds. VRAM usage is modest; I estimate any NVIDIA card with 6 GB or more can run it. While it was generating, my GPU fan speed barely changed.
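For reference, those timings correspond to a real-time factor (seconds of compute per second of generated audio) of roughly 4.5:

```python
# Real-time factor (RTF), using the numbers reported above:
# 130 seconds of compute to generate 29 seconds of speech.
generation_time_s = 130
audio_duration_s = 29
rtf = generation_time_s / audio_duration_s
print(f"RTF = {rtf:.2f}")  # 4.48: about 4.5 s of compute per 1 s of audio
```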

Audio demos are attached below. I wonder if you can tell whose voice was used as the reference.

Finally, the video demo:



v20241220 changelog

1 Removed some unused files, saving roughly 4 GB of disk space.

2 Added more reference-audio examples.

3 Added a text-preprocessing tool that helps produce better-quality speech files.

4 Various small optimizations; GPU usage is noticeably lower and processing seems faster (though that may be a placebo effect).

If you have questions about the download, see here.


19 comments so far

  1. 稻草人

    backend='inductor' raised:
    RuntimeError: Failed to find C compiler. Please specify via CC environment variable.
    Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information.
    You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True

  2. 调皮爱飞鸟

    Help, please!

    All checkpoints downloaded
    Traceback (most recent call last):
    File "E:\projectE\vits\fish-speech1.5-jian27\app.py", line 24, in
    import gradio as gr
    File "E:\projectE\vits\fish-speech1.5-jian27\jian27\lib\site-packages\gradio\__init__.py", line 3, in
    import gradio._simple_templates
    File "E:\projectE\vits\fish-speech1.5-jian27\jian27\lib\site-packages\gradio\_simple_templates\__init__.py", line 1, in
    from .simpledropdown import SimpleDropdown
    File "E:\projectE\vits\fish-speech1.5-jian27\jian27\lib\site-packages\gradio\_simple_templates\simpledropdown.py", line 7, in
    from gradio.components.base import Component, FormComponent
    File "E:\projectE\vits\fish-speech1.5-jian27\jian27\lib\site-packages\gradio\components\__init__.py", line 1, in
    from gradio.components.annotated_image import AnnotatedImage
    File "E:\projectE\vits\fish-speech1.5-jian27\jian27\lib\site-packages\gradio\components\annotated_image.py", line 14, in
    from gradio import processing_utils, utils
    File "E:\projectE\vits\fish-speech1.5-jian27\jian27\lib\site-packages\gradio\processing_utils.py", line 120, in
    sync_client = httpx.Client(transport=sync_transport)
    File "E:\projectE\vits\fish-speech1.5-jian27\jian27\lib\site-packages\httpx\_client.py", line 690, in __init__
    self._transport = self._init_transport(
    File "E:\projectE\vits\fish-speech1.5-jian27\jian27\lib\site-packages\httpx\_client.py", line 733, in _init_transport
    return HTTPTransport(
    File "E:\projectE\vits\fish-speech1.5-jian27\jian27\lib\site-packages\httpx\_transports\default.py", line 153, in __init__
    ssl_context = create_ssl_context(verify=verify, cert=cert, trust_env=trust_env)
    File "E:\projectE\vits\fish-speech1.5-jian27\jian27\lib\site-packages\httpx\_config.py", line 35, in create_ssl_context
    ctx = ssl.create_default_context(cafile=os.environ["SSL_CERT_FILE"])
    File "E:\projectE\vits\fish-speech1.5-jian27\jian27\lib\ssl.py", line 766, in create_default_context
    context.load_verify_locations(cafile, capath, cadata)
    FileNotFoundError: [Errno 2] No such file or directory

    1. 调皮爱飞鸟

      Strange, the original project throws the same error. Running v1.4 and earlier worked fine.

    2. 剑心

      Don't put so many fancy symbols in the path.

      1. 调皮爱飞鸟

        You mean SSL_CERT_FILE is missing?

        1. 剑心

          Nothing is missing. Don't put so many symbols in the path, and turn off your VPN/proxy.

          1. 调皮爱飞鸟

            Thanks, but I don't have any proxy running, everything is normal. HTTPS needs a certificate anyway, right?
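The FileNotFoundError in the traceback above is raised at ssl.create_default_context(cafile=os.environ["SSL_CERT_FILE"]): Python's ssl module (and therefore httpx) honors the SSL_CERT_FILE environment variable and fails if it points at a certificate file that no longer exists. Proxy and VPN tools sometimes set this variable, which is likely why turning them off was suggested. A minimal diagnostic sketch (the function name is mine, not part of the project):

```python
import os


def clear_stale_ssl_cert_file():
    """If SSL_CERT_FILE points at a missing file, unset it and return the path.

    With the variable unset, ssl.create_default_context() falls back to the
    system's default certificate store instead of raising FileNotFoundError.
    """
    cafile = os.environ.get("SSL_CERT_FILE")
    if cafile and not os.path.isfile(cafile):
        os.environ.pop("SSL_CERT_FILE")
        return cafile
    return None
```

Running this before `import gradio` (or simply unsetting SSL_CERT_FILE in the launcher script) should let httpx build its default SSL context normally.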

  3. 百战十年归

    It is fast, but the voice-cloning quality really leaves something to be desired.

    1. 剑心

      Then your reference audio is not good enough.

  4. 天地學

    I don't have CUDA. However, the python.exe process only uses 1 core (my CPU has 4 cores). How can I make the python.exe process use all cores?

    1. 剑心

      Sorry, an NVIDIA graphics card is required to run it.

  6. 酸奶腼腆

    --------More AI tools and free open-source software at https://www.jian27.com--------
    Follow my WeChat official account: 剑二十七
    "HF_ENDPOINT: !HF_ENDPOINT!"
    "NO_PROXY: !no_proxy!"

    [main 2024-10-17T08:35:45.235Z] update#setState idle
    [15868:1017/163546.985:ERROR:ssl_client_socket_impl.cc(970)] handshake failed; returned -1, SSL error code 1, net_error -101
    [15868:1017/163547.084:ERROR:ssl_client_socket_impl.cc(970)] handshake failed; returned -1, SSL error code 1, net_error -101
    [15868:1017/163549.130:ERROR:ssl_client_socket_impl.cc(970)] handshake failed; returned -1, SSL error code 1, net_error -101
    [15868:1017/163549.227:ERROR:ssl_client_socket_impl.cc(970)] handshake failed; returned -1, SSL error code 1, net_error -101
    [main 2024-10-17T08:36:15.245Z] update#setState checking for updates
    [main 2024-10-17T08:36:15.255Z] update#setState downloading
    [main 2024-10-17T08:36:31.544Z] Extension host with pid 2464 exited with code: 0, signal: unknown.
    2024-10-17 16:36:47.430 | INFO | __main__::523 - Loading Llama model...
    F:\fish-speech\fish_speech\models\text2semantic\llama.py:370: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
    weights = torch.load(
    2024-10-17 16:36:55.762 | INFO | tools.llama.generate:load_model:347 - Restored model from checkpoint
    2024-10-17 16:36:55.762 | INFO | tools.llama.generate:load_model:351 - Using DualARTransformer
    2024-10-17 16:36:55.762 | INFO | __main__::530 - Llama model loaded, loading VQ-GAN model...
    F:\fish-speech\jian27\lib\site-packages\vector_quantize_pytorch\vector_quantize_pytorch.py:457: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
    @autocast(enabled = False)
    F:\fish-speech\jian27\lib\site-packages\vector_quantize_pytorch\vector_quantize_pytorch.py:642: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
    @autocast(enabled = False)
    F:\fish-speech\jian27\lib\site-packages\vector_quantize_pytorch\finite_scalar_quantization.py:162: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
    @autocast(enabled = False)
    F:\fish-speech\tools\vqgan\inference.py:26: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
    state_dict = torch.load(
    2024-10-17 16:36:57.598 | INFO | tools.vqgan.inference:load_model:44 - Loaded model:
    2024-10-17 16:36:57.598 | INFO | __main__::538 - Decoder model loaded, warming up...
    2024-10-17 16:36:57.598 | INFO | tools.api:encode_reference:117 - No reference audio provided
    2024-10-17 16:36:57.644 | INFO | tools.llama.generate:generate_long:432 - Encoded text: Hello, world!
    2024-10-17 16:36:57.644 | INFO | tools.llama.generate:generate_long:450 - Generating sentence 1/1 of sample 1/1
    F:\fish-speech\fish_speech\models\text2semantic\llama.py:655: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
    y = F.scaled_dot_product_attention(
    0%| | 0/4080 [00:00

    1. 剑心

      Turn off your VPN/proxy tool.

  7. 专一与摩托

    Thanks for sharing. Much appreciated!

  8. xwgod

    Thanks for sharing 🙏 it should be good.

  9. 认真踢烤鸡

    Will it run on AMD cards?

    1. 剑心

      No.