Update: New Version Released!
I have retrained Anime Vision V1 with a completely new setup. This new version is far better than the beta version, incorporating the following enhancements:
Dataset: 10K images
Target Epoch: 300
Total Training Steps Completed: 150K
NSFW: Still not possible
Trigger Words: No longer needed
Important Note: Avoid including NSFW-related/mature words in your prompts. Doing so may result in unreliable image outcomes. Also, avoid overly long prompts, as shorter prompts work better on SD3.
Configuration I used for training:
GPU: A6000x2
Batch Size: 8
Optimizer: AdamW
Scheduler: Cosine with restarts
Captioning: WD14
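As a rough sanity check on the numbers above, the sketch below converts completed steps into passes over the dataset. It is an illustration, not the actual training script: the function name is made up, and it assumes no gradient accumulation and no dataset repeats. The list does not say whether the batch size of 8 is global or per GPU, so both readings are computed.

```python
def epochs_completed(steps: int, batch_size: int, num_gpus: int, dataset_size: int) -> float:
    """Return how many passes over the dataset a given number of steps covers.

    Assumes one optimizer step consumes batch_size * num_gpus images
    (no gradient accumulation, no dataset repeats).
    """
    samples_seen = steps * batch_size * num_gpus
    return samples_seen / dataset_size


DATASET_SIZE = 10_000   # "Dataset: 10K images"
STEPS_DONE = 150_000    # "Total Training Steps Completed: 150K"
BATCH_SIZE = 8          # "Batch Size: 8"
TARGET_EPOCHS = 300     # "Target Epoch: 300"

# Reading 1: 8 is the global batch size across both GPUs.
global_reading = epochs_completed(STEPS_DONE, BATCH_SIZE, 1, DATASET_SIZE)

# Reading 2: 8 is the per-GPU batch size on 2 GPUs.
per_gpu_reading = epochs_completed(STEPS_DONE, BATCH_SIZE, 2, DATASET_SIZE)

print(f"global batch of 8:  {global_reading:.0f} of {TARGET_EPOCHS} epochs")
print(f"per-GPU batch of 8: {per_gpu_reading:.0f} of {TARGET_EPOCHS} epochs")
```

Under either reading, 150K steps falls short of the 300-epoch target, which fits the wording "Target Epoch" versus "Steps Completed".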
I hope you enjoy the improvements in this new version. As always, feel free to share your feedback in the comments or join my Discord server for more information.
SD3 Anime Vision Beta Version is Finally Here!
Introducing Anime Vision, my first SD3 anime model. My friend and I started working on fine-tuning SD3, and this is the first draft of the SD3 anime model. Is this model really good for anime? Not entirely. As mentioned in the title, it's a beta version, an experimental model. We are currently testing to ensure everything is going well. I trained this model for 20k steps. After some testing, I noticed that the results were sometimes not great, so we decided to pause the SD3 training for now. We will resume training when the new SD3 medium update is released.
Does this mean the model isn't good?
Not at all. If you use this model along with my previously published LoRA model, "Anime Vision | Detail Enhancer," you will get pretty good results. Additionally, this model doesn't require trigger words if you write your prompts in the WD14 captioning style. Otherwise, you can use the trigger words described in my LoRA model description.
Here is a quick LoRA model preview and some guidelines to use this LoRA to its full potential:
If you are trying to create any specific subject or object, use the trigger word 'anime style' in your prompt.
If you're targeting a character, you can ignore the keyword and go with something like this:
For a male character: 'anime boy'
For a female character: 'anime girl'
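The guidelines above can be sketched as a small prompt helper. This is only an illustration: the function name and the comma-separated layout are assumptions, and the only strings taken from the guidelines are 'anime style', 'anime boy', and 'anime girl'.

```python
from typing import Optional

# Keywords taken from the guidelines above.
STYLE_TRIGGER = "anime style"
CHARACTER_KEYWORDS = {"male": "anime boy", "female": "anime girl"}


def build_prompt(description: str, character: Optional[str] = None) -> str:
    """Prefix a description with the keyword the guidelines suggest.

    character=None     -> subject/object prompt, uses the 'anime style' trigger.
    character='male'   -> character prompt, uses 'anime boy'.
    character='female' -> character prompt, uses 'anime girl'.
    Joining with a comma is an assumption of this sketch.
    """
    if character is None:
        prefix = STYLE_TRIGGER
    else:
        prefix = CHARACTER_KEYWORDS[character]
    return f"{prefix}, {description}"


print(build_prompt("a lighthouse at dusk"))
print(build_prompt("silver hair, school uniform", character="female"))
```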
Important Note: Avoid including NSFW-related/mature words in your prompts. Doing so may result in unreliable image outcomes.
For better results, try using ComfyUI
Here is a workflow that is low-cost and efficient. Right now, upscaling is not possible because of an unresolved issue. I have already reported it to the TA team, and hopefully they will fix it soon.
Here is a quick guide and parameters:
Clip Encoder: Not needed
VAE: Not needed
Sampler: dpmpp_2m
Scheduler: sgm_uniform
Sampling Steps: 25+
CFG Scale: 3+
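For anyone driving ComfyUI through its API-format workflow JSON, the sampler parameters above map onto a KSampler node roughly as sketched below. This is a minimal sketch, not the published workflow: the helper name is hypothetical, the seed and the denoise value of 1.0 are assumptions, and the node's links to the model, conditioning, and latent image are left out. Treating "25+" and "3+" as hard minimums is also a choice made for this sketch.

```python
import json


def ksampler_inputs(steps: int = 25, cfg: float = 3.0, seed: int = 0) -> dict:
    """Build the sampler-related inputs of a KSampler node.

    Sampler and scheduler names come from the parameter list above.
    The link inputs (model, positive, negative, latent_image) are omitted.
    """
    if steps < 25:
        raise ValueError("the guide suggests 25 or more sampling steps")
    if cfg < 3:
        raise ValueError("the guide suggests a CFG scale of 3 or more")
    return {
        "seed": seed,                # assumption: any fixed seed
        "steps": steps,
        "cfg": cfg,
        "sampler_name": "dpmpp_2m",
        "scheduler": "sgm_uniform",
        "denoise": 1.0,              # assumption: plain text-to-image
    }


node = {"class_type": "KSampler", "inputs": ksampler_inputs()}
print(json.dumps(node, indent=2))
```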
Here are some aspect ratios for demo purposes:
1:1 [1024x1024 square]
8:5 [1216x768 landscape]
4:3 [1152x896 landscape]
3:2 [1216x832 landscape]
7:5 [1176x840 landscape]
16:9 [1344x768 landscape]
21:9 [1536x640 landscape]
19:9 [1472x704 landscape]
3:4 [896x1152 portrait]
2:3 [832x1216 portrait]
5:7 [840x1176 portrait]
9:16 [768x1344 portrait]
9:21 [640x1536 portrait]
5:8 [768x1216 portrait]
9:19 [704x1472 portrait]
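The ratio labels in the list above are nominal: several sizes only approximate the named ratio (1152x896, for example, is closer to 9:7 than 4:3). The sketch below computes the actual ratio and pixel count for each landscape size and derives the portrait size by swapping width and height. The helper names are made up for this illustration.

```python
# Square and landscape sizes from the list above; portrait sizes are the same pairs swapped.
RESOLUTIONS = {
    "1:1": (1024, 1024),
    "8:5": (1216, 768),
    "4:3": (1152, 896),
    "3:2": (1216, 832),
    "7:5": (1176, 840),
    "16:9": (1344, 768),
    "21:9": (1536, 640),
    "19:9": (1472, 704),
}


def nominal_ratio(label: str) -> float:
    """Turn a label such as '16:9' into a width/height ratio."""
    w, h = label.split(":")
    return int(w) / int(h)


def describe(label: str, width: int, height: int) -> dict:
    """Compare the labeled ratio with the actual one and report the pixel count."""
    return {
        "label": label,
        "nominal_ratio": round(nominal_ratio(label), 3),
        "actual_ratio": round(width / height, 3),
        "megapixels": round(width * height / 1e6, 3),
    }


def portrait(size: tuple) -> tuple:
    """Swap width and height to get the portrait counterpart."""
    return (size[1], size[0])


for label, (w, h) in RESOLUTIONS.items():
    print(describe(label, w, h), "portrait:", portrait((w, h)))
```

All of the listed sizes stay close to one megapixel, so picking a different ratio changes the shape of the image rather than the amount of work per sample.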
When will the next version be available?
Stability AI has announced that they will release a better version of SD3 medium soon. We are waiting for this update, and I will resume training as soon as it is released.
A big thanks to Shopon SKP for working so hard with me.
Note: This is not a merged or modified model. It is the original Anime Vision fine-tuned model. If you notice similarities with the base model, please don't spread misinformation. Some users have been spreading incorrect information in the model's comment section. If you have any questions or want to know more, join my Discord server or share your thoughts in the comment section. Thank you for your time.