Beginner Basics By Beginner - posted Halloween 2024
Beginners' Basics By Beginner (PART 1)

Preamble

First off, I am still a beginner, so please be gentle ;-P However, since I'm a beginner, I still have the beginner frame of reference and should be able to offer some insights for beginners, compared to the experts who may have gotten so used to things that details of interest to beginners are taken for granted and not covered. Hence this article, to give my 2 cents' worth.

This article is written from the Tensor.art frame of reference. Different AI sites have different interfaces and may use different terminology; however, the concepts generally still hold true. In other words, if you can figure out tensor.art, you should be able to use what you learn on other sites.

This is NOT a "do this and do that" type of article. I aim to cover the general background so that you know what you are doing ... hopefully :-P, so that it carries across all your image generation and is of longer-term help to you.

Even for a beginner there is a lot to cover, so I am breaking the whole article into parts.

What I will be covering in this part 1:
1) Model, Base Model, Checkpoint, LoRA, Embedding, ControlNet - what is what and what's the deal?

Plan for future parts:
Part 2:
2) VAE
3) Aspect Ratio
4) Sampler, Scheduler, Sampling Steps, Guidance Scale, Seed, Clip Encoder
Part 3:
5) Prompt vs Negative Prompt
6) How to prompt

PART 1

I am covering this from the text-to-image generation perspective - throw in words, out pops the image.

1) Model, Base Model, Checkpoint, LoRA, Embedding, ControlNet

A model is the fundamental resource called upon with your prompts to generate the image. A model can be a base model or a checkpoint.

You can think of a base model as the original version, and a checkpoint as a modified form. Base models generally tend to be created by a company or organization; checkpoints are mostly by individuals.

Another way to think of the base model and checkpoint relationship is that base models are a "language" and checkpoints are the same "language" but a dialect or local flavor of it. This is not factually correct, but it is a useful way of thinking which I will come back to later.

Functionally speaking, the base model or checkpoint will be the same to you when you prompt; you only need to choose one. The issue is choosing which one.

Different models tend to have their own set of strengths and limits, and most of the time later versions are generally better - if it is a base model. For checkpoints, because they are by individuals, the "versioning" is typically more ad hoc and subjective. The individual creating the checkpoint may call it a new version because, to their thinking, something was significantly different. Whether you agree with that creator's judgment is another matter. For example, it could be a general improvement, a change in look and feel, a change in the image sizes supported, a specialization in a certain type of image, etc. Every creator has their own thinking, and there is little coordinating random individuals. Hence, the latest version of a checkpoint is not necessarily the best version for you, and different versions may be suitable for achieving different images.

Therefore, the only way to choose a suitable model is frankly to go through the samples. If you see generated images you like, take note of what they use, find the model, link through to the model's information page, and go through the posted works, if any.

Choosing the model, however, is only picking up the first piece of the puzzle.
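Before moving on, a quick aside for the technically curious. Tensor.art handles all of this through its web interface, so no code is ever needed. But as a minimal sketch of what "picking a model" amounts to under the hood, here is how the open-source Hugging Face diffusers library loads a base model versus a checkpoint. The model IDs are just illustrative examples, not a recommendation:

```python
# A minimal sketch of "choosing a model" outside a web UI, using the
# Hugging Face diffusers library. Tensor.art does all of this for you;
# this only shows that a base model and a checkpoint are loaded and
# prompted the same way. Repo IDs below are illustrative examples.
import torch
from diffusers import StableDiffusionPipeline

# A base model (Stable Diffusion 1.5, released by an organization)...
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# ...or a community checkpoint (a fine-tuned "dialect" of the same base).
# pipe = StableDiffusionPipeline.from_pretrained(
#     "Lykon/dreamshaper-8", torch_dtype=torch.float16
# ).to("cuda")

# Either way, prompting works identically: you only ever pick one model.
image = pipe("a watercolor fox in a misty forest").images[0]
image.save("fox.png")
```

The point of the sketch is simply that, whichever you pick, base model or checkpoint, the way you prompt it does not change.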
Next up is the LoRA.

The easiest way to think of LoRAs is that they are add-on modifications to the model. They are usually designed to produce some specific effect or range of effects on the image. Examples include an identifiable face, a colour range, a feel to the image, etc. LoRA stands for LOw Rank Adaptation.

LoRAs are often designed to work with specific base models. Sometimes they are designed to work with specific checkpoints. Remember the analogy of models being a language? If a base model and a checkpoint belong to the same language group, a LoRA that works for one in that group will generally work for another. How well, or whether there will be unexpected results, is another matter. As mentioned, this same-language analogy is not factually correct; it's just to make it easier to understand that a base model and its related checkpoints form a group, and that a specific LoRA is meant to work with a specific group. LoRAs differ greatly in quality and in the images they are able to generate.

LoRAs and models are not so much "created" as "trained"; the end result is the same either way. Using "train" acknowledges that the grunt work in producing a model or LoRA is mainly number-crunching by a computer. The creator's role is more about preparing the inputs, from seeking and selecting images to labeling them. The images used as input vary greatly, since different creators search for and include different source images. This impacts not just image quality but also what can be generated later, and the look and feel. For example, if every single input image was dark, then very likely you would end up generating only dark images. Another example: if all the input images are very mundane, then you are unlikely to be able to generate images with special effects such as magical auras. The advice about checking samples and posts for models applies to LoRAs too.

The creators also need to "label & tag" the images - sort of like describing them - so that a prompt can pull from the LoRA to help generate the image. Since everyone thinks differently, what is considered worth labeling, and the words chosen to label something with, will differ. This is why a prompt sometimes works for one LoRA but not for another. Some creators will kindly provide some key prompts, but even when they do, the list is unlikely to be complete. So after finding a LoRA that produces images you like, the next step is to look at the prompts used to see what works.

Often you will see a prompted word or string of words that has nothing to do with the image. You can leave the unrelated word/string out, but do note that a word/string not working in one output image does not mean the LoRA or model does not support it. It could be the way the prompt was crafted, or that particular generation run simply not picking up the word/string this time, even though it would affect some other generation. You can check for this by looking for more images posted by the same user with the same model/LoRA/prompt, if there are any. Otherwise, you have to decide whether you want to experiment for yourself.

Training a model or LoRA is not just putting stuff in so that the same stuff can be drawn out for use. The "training" is the AI processing the inputs in such a way that far more outputs are possible than what was put in. A simple way to think of it is that the AI is mixing, matching and coming up with new variations and permutations.
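To make the "add-on modification" idea concrete: the "low rank" in the name refers to how a LoRA stores its changes. Instead of shipping a whole new copy of the model's weights, it ships two small matrices whose product gets added on top of the original weights. A toy sketch in Python (all sizes and numbers invented purely for illustration):

```python
# Toy illustration of the "LOw Rank Adaptation" idea with made-up sizes.
# A LoRA does not replace the model's weight matrix W; it ships two much
# smaller matrices A and B whose product is added on top of W at run time.
import numpy as np

d, k, r = 1024, 1024, 8           # original weight size vs. tiny rank r
W = np.random.randn(d, k)         # frozen weight matrix inside the model
A = np.random.randn(r, k) * 0.01  # small LoRA matrix (trained)
B = np.random.randn(d, r) * 0.01  # small LoRA matrix (trained)
scale = 0.8                       # the "LoRA weight" slider you see in UIs

W_adapted = W + scale * (B @ A)   # model behaves as if its weights changed

print(W.size, "original parameters")
print(A.size + B.size, "extra LoRA parameters (a tiny add-on)")
```

This is also why LoRA files are so much smaller than models, and why the "weight" slider you see for a LoRA in the UI exists: it scales how strongly the add-on is mixed into the base model.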
Different models have different usage costs to generate, so do note this if you want to generate more images with a fixed amount of credits. Other things affect cost as well, which I will mention when I reach them. The other thing to note is that cost levels for text-to-image generation may differ from image-to-image and animation. So a "cheap" model for the initial generation may end up being credit-expensive when you need to fix or enhance the image later.

A model+LoRA presented on its information page as photographic, drawn or whatever may not necessarily produce what it is presented as producing. The particular combination of model and LoRA, as well as the prompting, affects the image greatly. So it is possible, for example, to have an object or clothes from a "drawn" LoRA appear as photographic. Once again, looking at samples and posts is the best way to see this, although unfortunately samples and posts are not always available, in which case you have to decide whether to experiment.

One well-known family of base models is the Stable Diffusion (SD) series. You will find them named "SD something something". The first "something" is usually the version number. Each version is a new base model, so a specific LoRA working for an old version should not be expected to work for a later version. A LoRA updated to support the new version is considered a new LoRA. The creator may indicate what the LoRA works with, or there may be a clue in the name. Otherwise, you can click the star to save the LoRA to your starred list, then select a model in your generation page. If the LoRA works for that group of models, when you open the starred tab of the LoRA selection window, usable LoRAs for the model will be brighter and selectable. In the "all" tab of LoRA selection, LoRAs that do not work with the model you selected will not appear at all. The second "something" is typically some variant. For example, you may see an "XL"; this basically means the model can create bigger images without separately increasing the image size (which costs credits).

Other base models include Midjourney, Hunyuan, Flux, Kolors, etc. Each has its strengths, weaknesses and peculiarities. Spend some time going through others' posts, spot what you like, then look at what they used. Over time, you should slowly come to appreciate the differences.

Embeddings are somewhat like add-on prompts (or negative prompts) and reference images combined. You still need to provide the basic prompt to nudge the image in the direction you want. For the beginner, embeddings are more of an optional add-on. You can use them either because it's more convenient or because you are not confident of doing better completely on your own.

ControlNet is not for the beginner. It is not necessarily expert stuff, but at least getting there. So if you have not grasped the beginner stuff yet, don't go into this; you would only end up more confused, or at least not get the end result you want. For the curious beginner, you can think of it as custom sequencing or programming of how the generation is done.
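To close off this part, one last aside for the technically curious, tying the pieces together. Below is a sketch, again with the diffusers library, of stacking a LoRA and an embedding onto one base model, which is roughly what tensor.art assembles for you when you pick these in the UI. The LoRA and embedding repo IDs and the trigger word are placeholders I made up for illustration; real ones come from each resource's creator:

```python
# Sketch of stacking this article's add-ons onto one base model, using
# the diffusers library. The LoRA/embedding repo IDs and the trigger
# word are placeholders, not real resources.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A LoRA trained for this base model's "language group"...
pipe.load_lora_weights("some-user/some-sd15-lora")  # placeholder repo ID

# ...and an embedding, which acts like a pre-packaged add-on prompt.
pipe.load_textual_inversion("sd-concepts-library/some-style")  # placeholder

# The embedding's trigger word (chosen by its creator) still has to
# appear in your prompt; the embedding does not act on its own.
image = pipe("portrait of a knight, <some-style>").images[0]
image.save("knight.png")
```

Note how this mirrors the article: one model, LoRAs that must match the model's group, and embeddings that still need a basic prompt to steer them.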