This document describes the modifications made to enable parallel GPU processing across multiple graphics cards in SD.Next. These changes allow you to utilize all available GPUs as a single logical unit, combining their VRAM and processing power for generating high-resolution images.
The modifications enhance SD.Next's device management to support data parallelism using PyTorch's `DataParallel`. When enabled, models (particularly the UNet and VAE) are automatically distributed across all available GPUs. This approach splits each batch across devices transparently, with no changes required to the model code itself.
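To illustrate the underlying mechanism, here is a self-contained `DataParallel` example in plain PyTorch, independent of SD.Next: the wrapper splits the batch dimension across visible GPUs and gathers the results on the primary device, and on a CPU-only or single-GPU machine it degrades to a plain forward pass.

```python
import torch
import torch.nn as nn

# Plain-PyTorch illustration (not SD.Next code): DataParallel wraps any
# nn.Module; at forward time it scatters the batch dimension across the
# visible GPUs and gathers the outputs on the primary device.
model = nn.Linear(8, 4)

if torch.cuda.device_count() >= 2:
    model = nn.DataParallel(model)  # uses all visible GPUs by default

x = torch.randn(16, 8)  # a batch of 16 is split across the GPUs
y = model(x)
print(y.shape)  # torch.Size([16, 4])
```

With two GPUs, each device would receive a sub-batch of 8; the per-sample results are identical to the single-device case.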
## modules/devices.py Additions

The following functions and variables were added to the end of the file, preserving all original functions:
```python
_parallel_enabled = False    # Tracks if parallel mode is active
_parallel_device_ids = []    # List of GPU IDs being used
_parallel_models = {}        # Stores wrapped parallel models
```
| Function | Description |
|---|---|
| `enable_parallel_gpus(device_ids=None)` | Activates parallel mode. If `device_ids` is `None`, it uses all detected GPUs. Returns `True` on success. |
| `disable_parallel_gpus()` | Deactivates parallel mode and clears GPU memory. |
| `is_parallel_enabled()` | Returns the current status of parallel mode. |
| `get_parallel_device_ids()` | Returns the list of GPU IDs currently used in parallel. |
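The patch itself is not reproduced here, so as a hedged sketch, the state-tracking helpers above could look roughly like this. Only the function names and module-level variables come from this document; the bodies are illustrative assumptions, not SD.Next's actual implementation.

```python
import torch

# Illustrative sketch only; the real SD.Next helpers may differ.
_parallel_enabled = False
_parallel_device_ids = []

def get_device_count():
    return torch.cuda.device_count()

def enable_parallel_gpus(device_ids=None):
    global _parallel_enabled, _parallel_device_ids
    ids = device_ids if device_ids is not None else list(range(get_device_count()))
    if len(ids) < 2:
        return False  # nothing to parallelize with fewer than two GPUs
    _parallel_device_ids = ids
    _parallel_enabled = True
    return True

def disable_parallel_gpus():
    global _parallel_enabled, _parallel_device_ids
    _parallel_enabled = False
    _parallel_device_ids = []
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # release cached VRAM on deactivation

def is_parallel_enabled():
    return _parallel_enabled

def get_parallel_device_ids():
    return list(_parallel_device_ids)
```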
| Function | Description |
|---|---|
| `parallelize_model(model, model_name="default", device_ids=None)` | Wraps a given PyTorch model (`torch.nn.Module`) with `DataParallel`, distributing it across the specified (or all) GPUs. |
| `parallelize_unet(sd_model)` | Specifically targets a loaded Stable Diffusion model, automatically parallelizing its UNet and VAE components. |
- `torch_gc`: the original garbage-collection function was extended to also clear the cache on all parallel GPUs.
- `autocast`: the original autocast context was extended to work correctly with the primary GPU in parallel mode.

## webui.py Integration

Two key sections were added to the main launch script to automate the process:
Inside the start_common() function, we added a block that detects available GPUs and enables parallel mode if two or more are found:
```python
def start_common():
    log.debug('Entering start sequence')
    # === AUTO-ENABLE PARALLEL GPUS ===
    try:
        gpu_count = modules.devices.get_device_count()
        if gpu_count >= 2:
            modules.devices.enable_parallel_gpus()
            log.info(f"🎮 {gpu_count} GPU parallel mode active!")
        else:
            log.info(f"🎮 {gpu_count} GPU(s) detected, running in single-GPU mode")
    except Exception as e:
        log.debug(f"Parallel GPU initialization failed: {e}")
    # === END OF PARALLEL GPU SETUP ===
    # ... rest of the original function ...
```
The threshold is >=2 GPUs: if you have 5, all five will be used.
After the model is loaded in the `webui()` function, we automatically parallelize its core components:
```python
def webui(restart=False):
    # ... existing code ...
    load_model()
    # === PARALLELIZE LOADED MODEL ===
    try:
        if modules.devices.is_parallel_enabled() and shared.sd_model is not None:
            modules.devices.parallelize_unet(shared.sd_model)
            log.info("Model UNet/VAE converted to parallel mode")
    except Exception as e:
        log.debug(f"Model parallelization error: {e}")
    # === END OF MODEL PARALLELIZATION ===
    # ... rest of the original function ...
```
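As a quick sanity check, a helper like the following (hypothetical, not part of the patch) can confirm from a debug console that the wrap took effect; it reuses the assumed `model` attribute name for the UNet:

```python
import torch.nn as nn

# Hypothetical post-load check: if parallel mode activated, the UNet
# attribute should now be a DataParallel wrapper.
def check_parallelized(sd_model):
    return isinstance(getattr(sd_model, "model", None), nn.DataParallel)
```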
## How It Works

1. At startup, `get_device_count()` queries PyTorch for the number of available CUDA GPUs.
2. If two or more are found, `enable_parallel_gpus()` is called to activate parallel mode and record the device IDs in use.
3. After the model loads, `parallelize_unet(sd_model)` is invoked, wrapping the UNet and VAE in `torch.nn.DataParallel`, which handles splitting the input data across GPUs and gathering the results.
4. The extended `torch_gc` function ensures that when garbage collection is triggered (e.g., after generation), it clears VRAM on all parallel GPUs, preventing memory fragmentation.

## Further Reading

For more general information about SD.Next features, configuration, and advanced usage, please refer to the official documentation:
➡️ SD.Next Official Documentation
Key sections that might be helpful after enabling multi-GPU:
- `--medvram`, `--lowvram`, and compile settings: Performance Tuning Guide