Our LoRA version is more convenient and gives better quality: https://tensor.art/models/782173940702393925/TA-Animagine-DMD2_lora-v1 (also available at https://tusiart.com/models/782173940702393925)
We applied DMD2 distillation to Animagine XL 3.0. The distilled model generates high-quality anime images in about one second using four denoising steps, which significantly improves the efficiency and quality of image generation.
sampler: lcm
scheduler: simple
steps: 5 (step=4 will be supported in the future)
cfg: 1.0
----------------------------------------------------------------------------------------------------------
This work builds on the open-source project https://github.com/tianweiy/DMD2. DMD2 training does not require each noise sample to map to one specific target image; instead, a batch of noise samples is used to produce a batch of images, and the batch as a whole is matched to the data. The generator is updated using the difference between the real score function and the fake score function, so that the generated images are both realistic and diverse.
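To make the score-difference update concrete, here is a minimal 1-D toy sketch. It is not the DMD2 training code: in DMD2 both score functions are diffusion networks, while here the real and fake distributions are assumed to be unit-variance Gaussians whose scores are known in closed form. The names `dmd_toy_update` and `train` are hypothetical.

```python
import random

def dmd_toy_update(theta, mu_real, batch_size, lr, rng):
    """One generator update in a 1-D toy version of distribution matching."""
    # Toy generator: x = z + theta with z ~ N(0, 1).
    xs = [rng.gauss(0.0, 1.0) + theta for _ in range(batch_size)]
    # Stand-in for the fake score network: a unit-variance Gaussian
    # fitted to the current batch of generated samples.
    mu_fake = sum(xs) / batch_size
    # The score of N(mu, 1) at x is -(x - mu).
    # Generator gradient = fake score - real score, averaged over the batch.
    grads = [(-(x - mu_fake)) - (-(x - mu_real)) for x in xs]
    grad_theta = sum(grads) / batch_size  # dx/dtheta = 1 for this generator
    return theta - lr * grad_theta

def train(theta=-3.0, mu_real=2.0, steps=200, batch_size=256, lr=0.1, seed=0):
    rng = random.Random(seed)
    for _ in range(steps):
        theta = dmd_toy_update(theta, mu_real, batch_size, lr, rng)
    return theta

theta = train()  # theta drifts from -3.0 toward the real mean 2.0
```

No individual sample is paired with a target here; only batch-level statistics drive the update, which is the point made in the paragraph above.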
Total compute: 3x L20 for 3 days, or 7x RTX4080S-32g.
We made the following main improvements over the original project:
We replaced the Adam optimizer with its 8-bit variant (adam8bit). This greatly reduced GPU memory usage compared with the original setup and made it possible to continue the experiments.
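A rough back-of-the-envelope calculation shows why the optimizer change matters. Adam keeps two state values per parameter; the sketch below compares storing them in 32-bit versus 8-bit form. The parameter count of about 2.6 billion is an assumption (approximate size of an SDXL-class UNet), and the small per-block quantization constants that 8-bit optimizers also store are ignored.

```python
def optimizer_state_gib(n_params, bytes_per_value, n_states=2):
    """Memory for Adam's two moment buffers, in GiB."""
    return n_params * bytes_per_value * n_states / 2**30

# Assumption: roughly 2.6e9 trainable parameters (SDXL-class UNet).
N_PARAMS = 2_600_000_000

fp32_gib = optimizer_state_gib(N_PARAMS, bytes_per_value=4)  # 32-bit states
int8_gib = optimizer_state_gib(N_PARAMS, bytes_per_value=1)  # 8-bit states

print(f"fp32 Adam states: {fp32_gib:.1f} GiB")
print(f"8-bit Adam states: {int8_gib:.1f} GiB")
```

Under these assumptions the optimizer state shrinks by a factor of four per trained network. Since DMD2 trains more than one network, the saving applies to each of them.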
The original code only accepted input images at 1024x1024. We modified it to encode the crop information and time and pass them to the model, so that images with a wider range of resolutions can be used for training.
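The source does not say exactly how the crop information is encoded. One common convention for SDXL-style models (Animagine XL is SDXL-based) is to pass the original size, the crop offset, and the target size as six extra conditioning integers. The sketch below assumes that convention; the helper names are hypothetical and the (height, width) ordering is an assumption.

```python
def center_crop_offset(original_size, target_size):
    """(top, left) offset of a center crop; sizes are (height, width)."""
    oh, ow = original_size
    th, tw = target_size
    return (max((oh - th) // 2, 0), max((ow - tw) // 2, 0))

def build_size_crop_ids(original_size, crop_top_left, target_size):
    """Concatenate original size, crop offset and target size into 6 ints."""
    return list(original_size) + list(crop_top_left) + list(target_size)

# Example: a 1400x900 image center-cropped to a 1216x832 training bucket.
original = (1400, 900)
target = (1216, 832)
crop = center_crop_offset(original, target)
ids = build_size_crop_ids(original, crop, target)
```

Because the model sees these values for every sample, non-square buckets such as 1216x832 can be mixed with 1024x1024 in the same training run.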
We adopted a new aesthetic filter that uses CLIP-IQA scores (https://github.com/IceClear/CLIP-IQA) to remove low-quality images. CLIP-IQA assesses image quality with a pre-trained CLIP model and no quality-specific supervision, by comparing each image against descriptive texts such as "high quality" and "low quality".
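The idea behind this kind of scoring can be sketched with plain vectors: compare the image embedding with a "good" and a "bad" text embedding and take a softmax over the two cosine similarities. This is a simplified illustration, not the CLIP-IQA implementation; the embeddings are made-up 2-D vectors, and `logit_scale`, `threshold`, and the function names are assumptions.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def quality_score(image_emb, good_emb, bad_emb, logit_scale=100.0):
    """Softmax probability of the 'good' prompt over the (good, bad) pair."""
    s_good = cosine(image_emb, good_emb)
    s_bad = cosine(image_emb, bad_emb)
    # Two-way softmax written as a sigmoid of the scaled difference.
    return 1.0 / (1.0 + math.exp(logit_scale * (s_bad - s_good)))

def filter_by_quality(samples, good_emb, bad_emb, threshold=0.5):
    """Keep the names of samples whose score reaches the threshold."""
    return [name for name, emb in samples
            if quality_score(emb, good_emb, bad_emb) >= threshold]

# Toy embeddings standing in for CLIP text/image features.
GOOD, BAD = [1.0, 0.0], [0.0, 1.0]
kept = filter_by_quality([("a", [1.0, 0.1]), ("b", [0.1, 1.0])], GOOD, BAD)
```

In a real pipeline the embeddings would come from CLIP's image and text encoders, and the threshold would be tuned on the dataset.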
We rebuilt the dataloader. Data was previously loaded from LMDB, a memory-mapped key-value store, which required converting every image to a latent with the VAE beforehand; this was cumbersome and took a lot of storage. After switching to the webdataset library, VAE encoding is deferred and data is loaded lazily, which reduces preprocessing, supports larger datasets in theory, and speeds up training. Training speed improved from 12.5 to 8.5 (reported as it/s, although a speedup implies the unit is s/it). During loading, the data is also filtered again: samples matching the negative tag set ["bad", "nfsw", "text", "negative"...] are removed as unsuitable for training.
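The combination of tag filtering and deferred encoding can be sketched with a plain Python generator. This is only an illustration of the pattern, not the project's webdataset code: the captions are assumed to be comma-separated tags, the tag set contains only the four tags listed above (the original list is truncated), and `encode` is a hypothetical stand-in for the VAE encoder.

```python
# Only the tags shown in the text; the full list is truncated there.
NEGATIVE_TAGS = {"bad", "nfsw", "text", "negative"}

def has_negative_tag(caption, negative_tags=NEGATIVE_TAGS):
    """True if any comma-separated tag in the caption is in the negative set."""
    tags = {t.strip().lower() for t in caption.split(",")}
    return not tags.isdisjoint(negative_tags)

def lazy_pipeline(samples, encode):
    """Yield (latent, caption) pairs, encoding only samples that pass the filter."""
    for image, caption in samples:
        if has_negative_tag(caption):
            continue  # dropped before any encoding work is done
        yield encode(image), caption

# Usage with a fake encoder that records which images it was asked to encode.
encoded = []
def fake_encode(image):
    encoded.append(image)
    return ("latent", image)

samples = [("img0", "1girl, solo"), ("img1", "1girl, text"), ("img2", "bad, scenery")]
pipeline = lazy_pipeline(samples, fake_encode)
```

Nothing is encoded until the pipeline is iterated, and filtered samples never reach the encoder, so no latents need to be precomputed or stored.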
The prompt dataset has grown to 400k entries and now includes prompts longer than 2000 words. The paired image-text dataset has grown to 100k entries.