1. Input Image & Mask
Nodes: LoadImage, ImageCrop+, Mask Crop Region
The user uploads or selects an image. A mask is drawn in the mask editor or loaded from a mask file, then refined and cropped to the region of interest so that later nodes only process the relevant area.
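As a rough illustration of what the crop step does outside the node graph, here is a minimal Python sketch using Pillow; the file names are placeholders for whatever the user supplies:

```python
from PIL import Image

# Load the source image and a pre-drawn mask (white = region to edit).
image = Image.open("input.png").convert("RGB")
mask = Image.open("mask.png").convert("L")

# Crop both to the mask's bounding box, mirroring what a
# "Mask Crop Region"-style node does: find the non-zero extent
# of the mask and cut the same window from image and mask.
bbox = mask.getbbox()  # (left, upper, right, lower) of non-zero pixels
if bbox is not None:
    image_crop = image.crop(bbox)
    mask_crop = mask.crop(bbox)
```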
2. Segmentation & Masking
Nodes: SAMModelLoader (segment anything), GroundingDinoModelLoader, GroundingDinoSAMSegment, InvertMask, AddMask, SubtractMask, MaskBlur+, ImpactDilateMask
Advanced segmentation models are loaded and combined: GroundingDINO locates the region described by a text prompt, and SAM turns that detection into a precise mask. The resulting masks can be combined, inverted, blurred, or dilated for fine control over exactly which pixels get regenerated.
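The sketch below shows the same idea with Meta's segment-anything package plus OpenCV for the refinement ops. The checkpoint path and click coordinates are placeholders, and the GroundingDINO text-to-box step is omitted for brevity (a point prompt stands in for it):

```python
import cv2
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

# Load SAM (checkpoint path is a placeholder) and segment around a
# point, roughly what GroundingDinoSAMSegment does after GroundingDINO
# has converted a text prompt into a detection.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
predictor = SamPredictor(sam)

image_rgb = np.array(Image.open("input.png").convert("RGB"))
predictor.set_image(image_rgb)  # HxWx3 uint8 RGB array

masks, scores, _ = predictor.predict(
    point_coords=np.array([[512, 384]]),  # placeholder click location
    point_labels=np.array([1]),           # 1 = foreground point
)
mask = (masks[scores.argmax()] * 255).astype(np.uint8)

# The refinement nodes map onto simple array operations:
inverted = 255 - mask                                   # InvertMask
dilated = cv2.dilate(mask, np.ones((9, 9), np.uint8))   # ImpactDilateMask
blurred = cv2.GaussianBlur(dilated, (15, 15), 0)        # MaskBlur+ (soft edge)
ring = cv2.subtract(dilated, mask)                      # SubtractMask
grown = np.maximum(mask, ring)                          # AddMask (union)
```

Dilating then blurring is the usual trick: the generated region extends slightly past the object so the seam lands on a feathered edge rather than a hard boundary.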
3. Prompt Encoding
Nodes: Text, Text Find and Replace, CLIPTextEncode, CLIPVisionLoader, IPAdapterModelLoader, IPAdapterAdvanced
The user's text prompts (main subject and context) are cleaned up with find-and-replace and encoded by CLIP into conditioning tensors. The IPAdapter path additionally encodes a reference image through CLIP Vision so its style and content can steer the result. Together these encodings tell the diffusion model what to generate in the masked area.
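Conceptually, CLIPTextEncode produces the per-token embedding sequence the diffusion model cross-attends to. A standalone sketch with Hugging Face transformers, using the public openai/clip-vit-large-patch14 weights purely for illustration (in ComfyUI the text encoder comes bundled with the loaded checkpoint):

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

def encode(prompt: str) -> torch.Tensor:
    tokens = tokenizer(prompt, padding="max_length",
                       max_length=tokenizer.model_max_length,
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        # last_hidden_state is the token embedding sequence that the
        # UNet cross-attends to; this is the "conditioning".
        return text_encoder(**tokens).last_hidden_state

positive = encode("a red leather armchair, studio lighting")  # example prompts
negative = encode("blurry, low quality")
```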
4. AI Inpainting & Generation
Nodes: CheckpointLoaderSimple, ControlNetLoaderAdvanced, ACN_AdvancedControlNetApply, KSampler, VAEDecode, DifferentialDiffusion
The diffusion checkpoint (here juggernautXL_v9Rdphoto2Lightning.safetensors, a distilled SDXL Lightning model) is loaded, with ControlNet and IPAdapter supplying structural and style guidance. DifferentialDiffusion lets the mask act as a per-pixel denoising-strength map for smoother transitions. KSampler then generates new content in the masked region, conditioned on the prompt and segmentation, and VAEDecode converts the result from latent space back to pixels.
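ComfyUI wires these nodes together internally; a rough standalone equivalent of the CheckpointLoaderSimple → KSampler → VAEDecode chain can be sketched with diffusers. ControlNet and IPAdapter guidance are omitted to keep it short, and the prompts, mask file, step count, and CFG value are illustrative assumptions:

```python
import torch
from diffusers import StableDiffusionXLInpaintPipeline
from PIL import Image

# Load the single-file SDXL checkpoint (CheckpointLoaderSimple equivalent).
pipe = StableDiffusionXLInpaintPipeline.from_single_file(
    "juggernautXL_v9Rdphoto2Lightning.safetensors",
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("input.png").convert("RGB")
mask = Image.open("mask_blurred.png").convert("L")

# Lightning checkpoints are distilled for few sampling steps at low CFG,
# hence the small num_inference_steps and guidance_scale.
result = pipe(
    prompt="a red leather armchair, studio lighting",
    negative_prompt="blurry, low quality",
    image=image,
    mask_image=mask,
    num_inference_steps=6,
    guidance_scale=1.5,
    strength=0.99,
).images[0]
result.save("inpainted.png")
```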
5. Compositing & Output
Nodes: ImageCompositeMasked, ImageResize+, PreviewImage, Image Save, Display Any (rgthree)
The generated content is composited back onto the original image through the mask, so only the masked pixels change. The result is resized if needed, previewed, and saved for the user.
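The compositing node reduces to a masked blend; a minimal Pillow sketch (file names assumed from the earlier steps):

```python
from PIL import Image

original = Image.open("input.png").convert("RGB")
generated = Image.open("inpainted.png").convert("RGB").resize(original.size)
mask = Image.open("mask_blurred.png").convert("L").resize(original.size)

# ImageCompositeMasked equivalent: Image.composite takes pixels from the
# first image where the mask is white and from the second elsewhere.
# A blurred mask yields a soft, feathered seam.
final = Image.composite(generated, original, mask)
final.save("final.png")
final.show()  # quick local preview, akin to PreviewImage
```

Because only the masked pixels are replaced, the untouched parts of the photo stay bit-identical to the original, which is the main advantage of compositing over saving the raw sampler output.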