Stable Diffusion


Discuss matters related to our favourite AI Art generation technology

MEGATHREAD (lemmy.dbzer0.com)
submitted 2 years ago by db0 to c/stable_diffusion
 
 

This is a copy of the /r/StableDiffusion wiki to help people who need access to that information.


Howdy and welcome to r/stablediffusion! I'm u/Sandcheeze and I have collected these resources and links to help enjoy Stable Diffusion whether you are here for the first time or looking to add more customization to your image generations.

If you'd like to show support, feel free to send us kind words or check out our Discord. Donations are appreciated, but not necessary as you being a great part of the community is all we ask for.

Note: The community resources provided here are not endorsed, vetted, or provided by Stability AI.

# Stable Diffusion

Local Installation

Active Community Repos/Forks to install on your PC and keep it local.

Online Websites

Websites with usable Stable Diffusion right in your browser. No need to install anything.

Mobile Apps

Stable Diffusion on your mobile device.

Tutorials

Learn how to improve your skills with Stable Diffusion, whether you're a beginner or an expert.

DreamBooth

How to train a custom model, plus resources for doing so.

Models

Specially trained towards certain subjects and/or styles.

Embeddings

Tokens trained on specific subjects and/or styles.

Bots

Bots you can self-host, or bots you can use directly on various websites and services such as Discord, Reddit, etc.

3rd Party Plugins

SD plugins for programs such as Discord, Photoshop, Krita, Blender, Gimp, etc.

Other useful tools

# Community

Games

  • PictionAIry: (Video | 2-6 Players) - The image-guessing game where AI does the drawing!

Podcasts

Databases or Lists

Still updating this with more links as I collect them all here.

FAQ

How do I use Stable Diffusion?

  • Check out our guides section above!

Will it run on my machine?

  • Stable Diffusion needs a GPU with at least 4 GB of VRAM to run locally, and much beefier graphics cards (10-, 20-, or 30-series Nvidia cards) are necessary to generate high-resolution or high-step images. Alternatively, anyone can run it online through DreamStudio or by hosting it on their own GPU compute cloud server. (A quick way to check your own GPU is sketched below this list.)
  • Only Nvidia cards are officially supported.
  • Unofficial AMD support is available here.
  • Unofficial Apple M1 chip support is available here.
  • Intel-based Macs currently do not work with Stable Diffusion.
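
For readers who want to check the first point before installing anything, here is a minimal sketch that reads the local GPU's VRAM with PyTorch. The thresholds are rough rules of thumb, not official requirements.

```python
# Minimal sketch: check whether a local Nvidia GPU meets the ~4 GB VRAM guideline.
# Requires PyTorch (pip install torch); the thresholds below are rough rules of thumb.
import torch

if not torch.cuda.is_available():
    print("No CUDA-capable GPU detected; consider an online service such as DreamStudio.")
else:
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")
    if vram_gb < 4:
        print("Below the ~4 GB minimum; local generation will likely fail or be very slow.")
    elif vram_gb < 8:
        print("Enough for basic 512x512 generation; higher resolutions may need optimizations.")
    else:
        print("Comfortable headroom for higher resolutions and step counts.")
```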

How do I get a website or resource added here?

If you have a suggestion for a website or a project to add to our list, or if you would like to contribute to the wiki, please don't hesitate to reach out to us via modmail or message me.


Abstract

Existing literature typically treats style-driven and subject-driven generation as two disjoint tasks: the former prioritizes stylistic similarity, whereas the latter insists on subject consistency, resulting in an apparent antagonism. We argue that both objectives can be unified under a single framework because they ultimately concern the disentanglement and re-composition of content and style, a long-standing theme in style-driven research. To this end, we present USO, a Unified Style-Subject Optimized customization model. First, we construct a large-scale triplet dataset consisting of content images, style images, and their corresponding stylized content images. Second, we introduce a disentangled learning scheme that simultaneously aligns style features and disentangles content from style through two complementary objectives, style-alignment training and content-style disentanglement training. Third, we incorporate a style reward-learning paradigm denoted as SRL to further enhance the model's performance. Finally, we release USO-Bench, the first benchmark that jointly evaluates style similarity and subject fidelity across multiple metrics. Extensive experiments demonstrate that USO achieves state-of-the-art performance among open-source models along both dimensions of subject consistency and style similarity. Code and model: this https URL

Technical Report: https://arxiv.org/abs/2508.18966

Code: https://github.com/bytedance/USO

USO in ComfyUI tutorial: https://docs.comfy.org/tutorials/flux/flux-1-uso

Project Page: https://bytedance.github.io/USO/
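
As a rough illustration of the triplet data described in the abstract (a content image, a style image, and the corresponding stylized result), a training sample could be organized like the sketch below. This is not USO's actual training code; the class names, file layout, and transforms are assumptions.

```python
# Illustrative sketch of a content/style/stylized triplet dataset, as described in the
# USO abstract. Not the authors' code; paths and transforms are assumptions.
from dataclasses import dataclass
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms


@dataclass
class Triplet:
    content_path: Path   # subject/content reference
    style_path: Path     # style reference
    stylized_path: Path  # content rendered in the reference style


class TripletDataset(Dataset):
    def __init__(self, triplets, size=512):
        self.triplets = triplets
        self.tf = transforms.Compose([
            transforms.Resize((size, size)),
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.triplets)

    def __getitem__(self, idx):
        t = self.triplets[idx]
        return {
            "content": self.tf(Image.open(t.content_path).convert("RGB")),
            "style": self.tf(Image.open(t.style_path).convert("RGB")),
            "target": self.tf(Image.open(t.stylized_path).convert("RGB")),
        }
```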


Abstract

Text-to-image (T2I) diffusion models excel at generating photorealistic images but often fail to render accurate spatial relationships. We identify two core issues underlying this common failure: 1) the ambiguous nature of data concerning spatial relationships in existing datasets, and 2) the inability of current text encoders to accurately interpret the spatial semantics of input descriptions. We propose CoMPaSS, a versatile framework that enhances spatial understanding in T2I models. It first addresses data ambiguity with the Spatial Constraints-Oriented Pairing (SCOP) data engine, which curates spatially-accurate training data via principled constraints. To leverage these priors, CoMPaSS also introduces the Token ENcoding ORdering (TENOR) module, which preserves crucial token ordering information lost by text encoders, thereby reinforcing the prompt's linguistic structure. Extensive experiments on four popular T2I models (UNet and MMDiT-based) show CoMPaSS sets a new state of the art on key spatial benchmarks, with substantial relative gains on VISOR (+98%), T2I-CompBench Spatial (+67%), and GenEval Position (+131%). Code is available at this https URL.

Paper: https://arxiv.org/abs/2412.13195

Code: https://github.com/blurgyy/CoMPaSS

Project Page: https://compass.blurgy.xyz/
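
To make the SCOP idea of "principled constraints" concrete, here is a toy sketch of one such check: accepting an "A to the left of B" sample only when the two objects' bounding boxes are unambiguously separated. It is an illustration only, not the paper's implementation, and the margin threshold is a made-up parameter.

```python
# Toy illustration of a spatial-constraint check in the spirit of SCOP (not the paper's code).
# Boxes are (x_min, y_min, x_max, y_max) in normalized image coordinates; the margin is an assumption.

def center(box):
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def is_left_of(box_a, box_b, margin=0.1, image_width=1.0):
    """Accept the pair only if A's center is clearly to the left of B's center."""
    ax, _ = center(box_a)
    bx, _ = center(box_b)
    return (bx - ax) > margin * image_width

# Example: keep a "cat to the left of a dog" sample only if the layout is unambiguous.
cat_box, dog_box = (0.05, 0.3, 0.35, 0.7), (0.6, 0.25, 0.95, 0.75)
print(is_left_of(cat_box, dog_box))  # True -> keep this training pair
```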


QwenEdit InStyle is a LoRA fine-tune for QwenEdit that significantly improves its ability to generate images based on a style reference. While the base model has style transfer capabilities, it often misses the nuances of styles and can transplant unwanted details from the input image. This LoRA addresses these limitations to provide more accurate style-based image generation.
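
If the LoRA is distributed as standard diffusers-compatible weights, loading it on top of a Qwen image-editing pipeline could look roughly like the sketch below. The pipeline class name, repository id, and weight filename are assumptions (the repository id is purely hypothetical), so follow the model card's instructions for the real setup.

```python
# Rough sketch of loading a style-reference LoRA on top of a Qwen image-editing pipeline
# with diffusers. The pipeline class, repo id, and weight filename are assumptions;
# check the LoRA's model card for the exact loading instructions.
import torch
from diffusers import QwenImageEditPipeline  # assumed class name
from PIL import Image

pipe = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16
).to("cuda")

# Attach the InStyle LoRA (hypothetical repo id and filename).
pipe.load_lora_weights("some-user/qwenedit-instyle-lora", weight_name="instyle.safetensors")

style_ref = Image.open("style_reference.png").convert("RGB")
result = pipe(
    image=style_ref,
    prompt="Redraw a lighthouse at dusk in this style",
    num_inference_steps=30,
).images[0]
result.save("styled_output.png")
```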


Major Updates


Chroma1-Base: 512x512 model

Chroma1-HD: 1024x1024 model

Chroma1-Flash: A fine-tuned Chroma1-Base experimental model

Chroma1-Radiance [WIP]: Chroma1-Base pixel space model

submitted 3 weeks ago* (last edited 3 weeks ago) by Even_Adder to c/stable_diffusion
 
 

Without paywall: https://archive.is/4oEi2

Qwen Image Edit (qianwen-res.oss-cn-beijing.aliyuncs.com)
submitted 4 weeks ago* (last edited 4 weeks ago) by Even_Adder to c/stable_diffusion
 
 

Introduction

We are excited to introduce Qwen-Image-Edit, the image editing version of Qwen-Image. Built upon our 20B Qwen-Image model, Qwen-Image-Edit successfully extends Qwen-Image’s unique text rendering capabilities to image editing tasks, enabling precise text editing. Furthermore, Qwen-Image-Edit simultaneously feeds the input image into Qwen2.5-VL (for visual semantic control) and the VAE Encoder (for visual appearance control), achieving capabilities in both semantic and appearance editing. To experience the latest model, visit Qwen Chat and select the "Image Editing" feature.

Technical Report: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/Qwen_Image.pdf

Code: https://github.com/QwenLM/Qwen-Image

Hugging Face: https://huggingface.co/Qwen/Qwen-Image-Edit

GGUFs: https://huggingface.co/QuantStack/Qwen-Image-Edit-GGUF
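
Below is a minimal usage sketch with diffusers, focused on the text-editing capability highlighted above. The pipeline class name and call arguments are assumptions based on typical diffusers conventions; defer to the linked repo and Hugging Face page for the documented usage.

```python
# Minimal sketch of prompting Qwen-Image-Edit for a text edit via diffusers.
# The pipeline class name and arguments are assumptions; see the linked repo for
# the officially documented usage.
import torch
from diffusers import QwenImageEditPipeline  # assumed class name
from PIL import Image

pipe = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16
).to("cuda")

source = Image.open("storefront.png").convert("RGB")
edited = pipe(
    image=source,
    prompt='Change the sign text to read "GRAND OPENING" while keeping the font and lighting',
    num_inference_steps=50,
).images[0]
edited.save("storefront_edited.png")
```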


SD.Next Release 2025-08-15

A new release two weeks after the last one, and it's a big one with over 150 commits!

  • Several new models: Qwen-Image (plus Lightning variant) and FLUX.1-Krea-Dev
  • Several updated models: Chroma, SkyReels-V2, Wan-VACE, HunyuanDiT
  • Plus continued major UI work: a new embedded Docs/Wiki search, redesigned real-time hints, a wildcards UI selector, a built-in GPU monitor, CivitAI integration, and more!


An open-source implementation by FlyMy.AI for training LoRA (Low-Rank Adaptation) layers for Qwen/Qwen-Image models.
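
For anyone unfamiliar with what training LoRA layers involves, the sketch below attaches low-rank adapters to a toy module with the PEFT library. It is not FlyMy.AI's trainer; the rank, alpha, and target module names are illustrative assumptions that vary by model.

```python
# Generic sketch of what "training LoRA layers" means, using the PEFT library on a toy
# attention-like module. Not FlyMy.AI's trainer; rank, alpha, and target module names
# are illustrative assumptions.
import torch.nn as nn
from peft import LoraConfig, get_peft_model


class ToyBlock(nn.Module):
    """Stand-in for a transformer block whose projections we want to adapt."""
    def __init__(self, dim=64):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)

    def forward(self, x):
        return self.to_q(x) + self.to_k(x) + self.to_v(x)


config = LoraConfig(
    r=16,                 # low-rank dimension
    lora_alpha=16,        # scaling factor
    lora_dropout=0.05,
    target_modules=["to_q", "to_k", "to_v"],  # which layers get adapters
)
model = get_peft_model(ToyBlock(), config)
model.print_trainable_parameters()  # only the small LoRA matrices are trainable
```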

submitted 1 month ago* (last edited 1 month ago) by Even_Adder to c/stable_diffusion
 
 

Abstract

Scalable Vector Graphics (SVG) is an important image format widely adopted in graphic design because of its resolution independence and editability. The study of generating high-quality SVG has continuously drawn attention from both designers and researchers in the AIGC community. However, existing methods either produce unstructured outputs with huge computational cost or are limited to generating monochrome icons of over-simplified structures. To produce high-quality and complex SVG, we propose OmniSVG, a unified framework that leverages pre-trained Vision-Language Models (VLMs) for end-to-end multimodal SVG generation. By parameterizing SVG commands and coordinates into discrete tokens, OmniSVG decouples structural logic from low-level geometry for efficient training while maintaining the expressiveness of complex SVG structure. To further advance the development of SVG synthesis, we introduce MMSVG-2M, a multimodal dataset with two million richly annotated SVG assets, along with a standardized evaluation protocol for conditional SVG generation tasks. Extensive experiments show that OmniSVG outperforms existing methods and demonstrates its potential for integration into professional SVG design workflows.

Paper: https://arxiv.org/abs/2504.06263

Code: https://github.com/OmniSVG/OmniSVG/

Weights: https://huggingface.co/OmniSVG/OmniSVG

Project Page: https://omnisvg.github.io/

Demo: https://huggingface.co/spaces/OmniSVG/OmniSVG-3B
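
The abstract's central idea of parameterizing SVG commands and coordinates into discrete tokens can be pictured with a toy tokenizer like the one below. The command vocabulary and coordinate binning are assumptions for illustration, not OmniSVG's actual scheme.

```python
# Toy illustration of turning SVG path commands and coordinates into discrete tokens,
# in the spirit of the OmniSVG abstract. The vocabulary and binning are assumptions,
# not the paper's actual parameterization.
CMD_TOKENS = {"M": 0, "L": 1, "C": 2, "Z": 3}   # move, line, cubic curve, close path
NUM_BINS = 256                                   # coordinates quantized into 256 bins
COORD_OFFSET = len(CMD_TOKENS)                   # coordinate tokens follow command tokens


def quantize(value, lo=0.0, hi=200.0):
    """Map a coordinate in [lo, hi] to a discrete bin token."""
    bin_id = int((value - lo) / (hi - lo) * (NUM_BINS - 1))
    return COORD_OFFSET + max(0, min(NUM_BINS - 1, bin_id))


def tokenize_path(commands):
    """commands: list of (letter, [coords]) tuples parsed from an SVG <path> d attribute."""
    tokens = []
    for letter, coords in commands:
        tokens.append(CMD_TOKENS[letter])
        tokens.extend(quantize(c) for c in coords)
    return tokens


# "M 10 20 L 150 80 Z" -> one command token plus binned coordinate tokens per command.
print(tokenize_path([("M", [10, 20]), ("L", [150, 80]), ("Z", [])]))
```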
