How to fix the pain of re-uploading and re-preprocessing the dataset, and running out of disk, on every Colab training run #764

Open
HexBanana opened this issue Oct 10, 2022 · 10 comments

Comments

@HexBanana

HexBanana commented Oct 10, 2022

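Many of us have weak hardware at home and get poor training results, so we end up training on Colab, the world's largest gathering place for freeloading model trainers. But for anyone who can't expand Google Drive or upgrade Colab, uploading a dataset is hell: the connection is slow, there isn't enough space, every reset means uploading again, and preprocessing is a headache. It took me 9 days to finally solve this, and here is my solution.
First, register an account on Kaggle and get an API token.
I have already uploaded the preprocessed dataset (built from aidatatang_200zh) there, but downloading it requires a token, and a token requires an account. Please search online for how to get a token; I won't go into it here.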

Then open Colab.
Go to Edit -> Notebook settings and, under the runtime's hardware accelerator, change None to GPU.
Enter the following code:

!pip install kaggle
import json
token = {"username":"your_username","key":"your_api_token"}
with open('/content/kaggle.json', 'w') as file:
  json.dump(token, file)
!mkdir -p ~/.kaggle
!cp /content/kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!kaggle config set -n path -v /content

Fill in the third line with your Kaggle username and the token you obtained earlier.
This step sets up the Kaggle command-line tool.
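
For reference, the same credential setup can also be written in pure Python instead of shell commands. This is only a minimal sketch, not part of the steps above: `write_kaggle_token` is a hypothetical helper name, and it assumes the Kaggle CLI's default credential location of `~/.kaggle/kaggle.json`.

```python
import json
from pathlib import Path


def write_kaggle_token(username, key, home=None):
    """Write kaggle.json under <home>/.kaggle with owner-only permissions.

    Hypothetical helper for illustration; `home` defaults to the current
    user's home directory.
    """
    base = Path(home) if home is not None else Path.home()
    kaggle_dir = base / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    token_path = kaggle_dir / "kaggle.json"
    token_path.write_text(json.dumps({"username": username, "key": key}))
    token_path.chmod(0o600)  # same effect as `chmod 600`
    return token_path
```

Calling `write_kaggle_token("your_username", "your_api_token")` would replace the `json.dump`, `mkdir`, `cp`, and `chmod` lines; the `kaggle config set` line is still needed to control where downloads land.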

Next, download and extract the dataset.

!kaggle datasets download -d bjorndido/sv2ttspart1
!unzip "/content/datasets/bjorndido/sv2ttspart1/sv2ttspart1.zip" -d "/content/aidatatang_200zh"
!rm -rf /content/datasets
!kaggle datasets download -d bjorndido/sv2ttspart2
!unzip "/content/datasets/bjorndido/sv2ttspart2/sv2ttspart2.zip" -d "/content/aidatatang_200zh"
!rm -rf /content/datasets

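Some of you are probably on the free tier like me, where starting from the raw, unprocessed dataset would blow out the disk, so I uploaded the preprocessed dataset to Kaggle.
Each zip is also deleted right after it is extracted (the rm -rf lines), which is very considerate.
In my tests the download speed reached 200 MB/s, and even on a slower connection it was 50 MB/s, which is very fast.
This step takes less than 10 minutes.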
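
The download, unzip, delete order is what keeps disk usage down: each archive is gone before the next one is fetched. As a rough Python sketch of the extract-then-delete part (the function name and the free-space check are illustrative additions, not part of the commands above):

```python
import shutil
import zipfile
from pathlib import Path


def extract_and_delete(zip_path, dest_dir):
    """Extract zip_path into dest_dir, then delete the archive to free disk."""
    zip_path, dest_dir = Path(zip_path), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Sum of uncompressed sizes: the space extraction will need.
        needed = sum(info.file_size for info in archive.infolist())
        free = shutil.disk_usage(dest_dir).free
        if needed > free:
            raise OSError(f"need {needed} bytes, only {free} free")
        archive.extractall(dest_dir)
    zip_path.unlink()  # remove the archive only after a successful extract
    return dest_dir
```

Checking the uncompressed size first means a full disk is reported before extraction starts, rather than partway through.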

!git clone https://github.com/babysor/MockingBird.git
!pip install -r /content/MockingBird/requirements.txt
Clone the git repository and install the dependencies; nothing more to say about this step.

Then edit hparams.

%%writefile /content/MockingBird/synthesizer/hparams.py
import ast
import pprint
import json

class HParams(object):
    def __init__(self, **kwargs): self.__dict__.update(kwargs)
    def __setitem__(self, key, value): setattr(self, key, value)
    def __getitem__(self, key): return getattr(self, key)
    def __repr__(self): return pprint.pformat(self.__dict__)

    def parse(self, string):
        # Overrides hparams from a comma-separated string of name=value pairs
        if len(string) > 0:
            overrides = [s.split("=") for s in string.split(",")]
            keys, values = zip(*overrides)
            keys = list(map(str.strip, keys))
            values = list(map(str.strip, values))
            for k in keys:
                self.__dict__[k] = ast.literal_eval(values[keys.index(k)])
        return self

    def loadJson(self, dict):
        print(f"\nLoading the json with {dict}\n")
        for k in dict.keys():
            if k not in ["tts_schedule", "tts_finetune_layers"]: 
                self.__dict__[k] = dict[k]
        return self

    def dumpJson(self, fp):
        print(f"\nSaving the json with {fp}\n")
        with fp.open("w", encoding="utf-8") as f:
            json.dump(self.__dict__, f)
        return self

hparams = HParams(
        ### Signal Processing (used in both synthesizer and vocoder)
        sample_rate = 16000,
        n_fft = 800,
        num_mels = 80,
        hop_size = 200,                             # Tacotron uses 12.5 ms frame shift (set to sample_rate * 0.0125)
        win_size = 800,                             # Tacotron uses 50 ms frame length (set to sample_rate * 0.050)
        fmin = 55,
        min_level_db = -100,
        ref_level_db = 20,
        max_abs_value = 4.,                         # Gradient explodes if too big, premature convergence if too small.
        preemphasis = 0.97,                         # Filter coefficient to use if preemphasize is True
        preemphasize = True,

        ### Tacotron Text-to-Speech (TTS)
        tts_embed_dims = 512,                       # Embedding dimension for the graphemes/phoneme inputs
        tts_encoder_dims = 256,
        tts_decoder_dims = 128,
        tts_postnet_dims = 512,
        tts_encoder_K = 5,
        tts_lstm_dims = 1024,
        tts_postnet_K = 5,
        tts_num_highways = 4,
        tts_dropout = 0.5,
        tts_cleaner_names = ["basic_cleaners"],
        tts_stop_threshold = -3.4,                  # Value below which audio generation ends.
                                                    # For example, for a range of [-4, 4], this
                                                    # will terminate the sequence at the first
                                                    # frame that has all values < -3.4

        ### Tacotron Training
        tts_schedule = [(2,  1e-3,  10_000,  32),   # Progressive training schedule
                    (2,  5e-4,  15_000,  32),   # (r, lr, step, batch_size)
                    (2,  2e-4,  20_000,  32),   # (r, lr, step, batch_size)
                    (2,  1e-4,  30_000,  32),   #
                    (2,  5e-5,  40_000,  32),   #
                    (2,  1e-5,  60_000,  32),   #
                    (2,  5e-6, 160_000,  32),   # r = reduction factor (# of mel frames
                    (2,  3e-6, 320_000,  32),   #     synthesized for each decoder iteration)
                    (2,  1e-6, 640_000,  32)],  # lr = learning rate

        tts_clip_grad_norm = 1.0,                   # clips the gradient norm to prevent explosion - set to None if not needed
        tts_eval_interval = 500,                    # Number of steps between model evaluation (sample generation)
                                                    # Set to -1 to generate after completing epoch, or 0 to disable
        tts_eval_num_samples = 1,                   # Makes this number of samples

        ## For finetune usage, if set, only selected layers will be trained, available: encoder,encoder_proj,gst,decoder,postnet,post_proj
        tts_finetune_layers = [], 

        ### Data Preprocessing
        max_mel_frames = 900,
        rescale = True,
        rescaling_max = 0.9,
        synthesis_batch_size = 16,                  # For vocoder preprocessing and inference.

        ### Mel Visualization and Griffin-Lim
        signal_normalization = True,
        power = 1.5,
        griffin_lim_iters = 60,

        ### Audio processing options
        fmax = 7600,                                # Should not exceed (sample_rate // 2)
        allow_clipping_in_normalization = True,     # Used when signal_normalization = True
        clip_mels_length = True,                    # If true, discards samples exceeding max_mel_frames
        use_lws = False,                            # "Fast spectrogram phase recovery using local weighted sums"
        symmetric_mels = True,                      # Sets mel range to [-max_abs_value, max_abs_value] if True,
                                                    #               and [0, max_abs_value] if False
        trim_silence = True,                        # Use with sample_rate of 16000 for best results

        ### SV2TTS
        speaker_embedding_size = 256,               # Dimension for the speaker embedding
        silence_min_duration_split = 0.4,           # Duration in seconds of a silence for an utterance to be split
        utterance_min_duration = 1.6,               # Duration in seconds below which utterances are discarded
        use_gst = True,                             # Whether to use global style token    
        use_ser_for_gst = True,                     # Whether to use speaker embedding referenced for global style token  
        )

I use a batch size of 32; adjust it to suit your own setup.
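
Because the batch size is the fourth field of every `tts_schedule` entry, changing it means editing each tuple. Here is a small sketch of doing that programmatically, assuming the `(r, lr, step, batch_size)` layout shown in the hparams above; `with_batch_size` is a hypothetical helper, not something that exists in MockingBird.

```python
def with_batch_size(schedule, batch_size):
    """Return a copy of a (r, lr, step, batch_size) schedule with a new batch size."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [(r, lr, step, batch_size) for r, lr, step, _ in schedule]


# First three entries of the schedule above, used as sample input
schedule = [(2, 1e-3, 10_000, 32), (2, 5e-4, 15_000, 32), (2, 2e-4, 20_000, 32)]
smaller = with_batch_size(schedule, 16)
print(smaller[0])  # (2, 0.001, 10000, 16)
```

A smaller value generally lowers GPU memory use, so it is the first thing to try if training runs out of memory.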

Start training.

%cd "/content/MockingBird/"
!python synthesizer_train.py train "/content/aidatatang_200zh" -m /content/drive/MyDrive/

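Note: mount Google Drive before starting this step. If you don't want to mount it, change the path after -m.
I chose Drive because the saved progress is still there next time, so training can resume from where it stopped.
Then it's happy freeloading time.
Paying users can run !nvidia-smi to check which GPU they got; free-tier users all get a Tesla T4 with 16 GB of VRAM.
In my run, the attention alignment curve started to appear at around 9k steps, with a loss of 0.45.
Warning! On the free tier, Colab disconnects automatically if you leave the computer untouched for a long time.
When you reopen it, the environment is restored to its initial state.
This is where saving to Drive pays off: you don't have to worry about the model being wiped by a reset.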
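
In a Colab notebook, Drive is normally mounted by calling `drive.mount('/content/drive')` from the `google.colab` module before the training cell runs. The sketch below shows one way to choose the `-m` directory, falling back to local storage when Drive is not mounted; `pick_model_dir` and the fallback path are assumptions for illustration only.

```python
from pathlib import Path


def pick_model_dir(drive_dir="/content/drive/MyDrive",
                   fallback_dir="/content/saved_models"):
    """Return drive_dir if Drive is mounted there, else create and use fallback_dir."""
    drive_path = Path(drive_dir)
    if drive_path.is_dir():
        return drive_path  # persists across runtime resets
    fallback = Path(fallback_dir)
    fallback.mkdir(parents=True, exist_ok=True)
    print(f"Drive not mounted; models in {fallback} are lost on reset")
    return fallback
```

The returned path can then be passed after `-m` in the training command.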

This is my first write-up, so please forgive any rough edges.
I hope this tutorial helps you.

@twj515895394

Very helpful, thumbs up!

@twj515895394

On the Colab free tier the browser session keeps disconnecting, which is a bit annoying...

@HexBanana
Author

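On the Colab free tier the browser session keeps disconnecting, which is a bit annoying...

Are the training results better than on a 3060? Also, what batch size are you using?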

@twj515895394

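On the Colab free tier the browser session keeps disconnecting, which is a bit annoying...

Are the training results better than on a 3060? Also, what batch size are you using?

I'm running with 40 right now. More VRAM really is better, haha.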

@HexBanana
Author

I'm running with 40 right now. More VRAM really is better, haha.
If you push it to the limit you can use 60; I have been using 50.

@babysor
Owner

babysor commented Nov 16, 2022

Marking this as a featured post.

@babysor babysor pinned this issue Nov 16, 2022
@HexBanana
Author

Marking this as a featured post.

Wow, I was only sharing my solution and didn't expect it to be featured. I hope my method helps more people.

@jessemoe

Colab keeps running out of GPU memory. How should I configure PyTorch's memory allocation to improve this?

CUDA out of memory. Tried to allocate 3.51 GiB (GPU 0; 14.75 GiB total capacity; 10.81 GiB already allocated; 1.65 GiB free; 11.99 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

@HexBanana
Author

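Is your batch size too large? It's also possible you were unlucky and got assigned a GPU with less VRAM. I haven't logged in to Colab for a while, so I don't know whether the GPU allocation has changed.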
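
The error message itself points at `PYTORCH_CUDA_ALLOC_CONF` and `max_split_size_mb`. A hedged sketch of setting it from a notebook cell follows; the value 128 is an arbitrary example rather than a tested recommendation, and the variable has to be set before PyTorch initializes CUDA. Lowering the batch size is usually the more direct fix.

```python
import os

# Must be set before PyTorch initializes its CUDA allocator, e.g. in a cell
# that runs before the training command (child processes inherit it).
# 128 is an illustrative value, not a tuned one.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```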

@jessemoe

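Is your batch size too large? It's also possible you were unlucky and got assigned a GPU with less VRAM. I haven't logged in to Colab for a while, so I don't know whether the GPU allocation has changed.

Thanks, adjusting the batch size fixed it, but Colab still keeps disconnecting.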
