Overview

For inference, we provide a function called model_map designed for auto-parallel sampling across GPUs.

model_map Signature

```python
def model_map(worker, data, model_required_gpus):
```
  • worker: The function that implements your inference logic.
  • data: The list of items to process.
  • model_required_gpus: The number of GPUs each worker requires.
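
For orientation, a call might look like the following sketch. The prompt data is illustrative and worker stands for a function like the one developed later on this page; only model_map's own signature comes from the text above.

```python
# Hypothetical usage: the data items and the worker are placeholders.
data = [{"prompt": "Hello"}, {"prompt": "World"}, {"prompt": "Bonjour"}]

# Give each worker 2 GPUs; with 8 visible GPUs this yields 4 parallel workers.
results = model_map(worker, data, model_required_gpus=2)
```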

How model_map Works

  1. GPU Detection: Reads CUDA_VISIBLE_DEVICES from the environment and raises an error if it is not set.
  2. GPU Grouping: Divides the visible GPUs into n groups of model_required_gpus each, guided by CUDA_VISIBLE_DEVICES and the interconnect topology reported by nvidia-smi topo -m (see the sketch after this list).
  3. Data Splitting: Splits the data into n chunks, one per GPU group.
  4. Worker Execution: Calls the worker function on each group in parallel.
  5. Result Sorting: Restores the results to their original input order (via the __index__ key described below) and returns them.
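
To make steps 1-3 concrete, here is a minimal sketch of the grouping logic, assuming contiguous groups and omitting the topology-aware ordering; _plan_groups is a hypothetical helper, not part of the public API.

```python
import os

def _plan_groups(model_required_gpus: int) -> list[list[str]]:
    # Step 1: GPU detection -- raise if CUDA_VISIBLE_DEVICES is unset.
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        raise RuntimeError("CUDA_VISIBLE_DEVICES is not set")
    gpus = visible.split(",")
    # Step 2: n contiguous groups of model_required_gpus GPUs each; the real
    # implementation also consults `nvidia-smi topo -m` when forming groups.
    n = len(gpus) // model_required_gpus
    return [gpus[i * model_required_gpus:(i + 1) * model_required_gpus]
            for i in range(n)]
```

With CUDA_VISIBLE_DEVICES="0,1,2,3" and model_required_gpus=2, this yields [['0', '1'], ['2', '3']].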

Worker Function Signature

```python
def worker(cuda_devices: list[str], data: list[dict[str, Any]]):
```
  • cuda_devices: A list of GPU IDs (e.g., ['1', '2']).
  • data: The data for the worker to process.
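
To make the contract concrete, model_map invokes each worker roughly as in the sketch below; the GPU IDs and the data item are made up for illustration.

```python
# Each worker receives its group's GPU IDs (as strings) and its data chunk
# (model_map has already added the __index__ key), and returns the processed items.
chunk = worker(["1", "2"], [{"__index__": 0, "prompt": "Hello"}])
```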

Inside the Worker

  1. Set CUDA_VISIBLE_DEVICES:
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(cuda_devices)
  2. Implement Inference Logic: Write your custom inference logic; a complete worker sketch follows this list.
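
Putting the two steps together, a complete worker might look like the sketch below. The vLLM backend, the placeholder model name, and the "prompt"/"response" keys are assumptions for illustration; the only requirements from this page are to set CUDA_VISIBLE_DEVICES first and to return the items with __index__ intact.

```python
import os
from typing import Any

def worker(cuda_devices: list[str], data: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # Step 1: pin this process to its assigned GPUs. Do this before any
    # CUDA-using library is imported or initialized.
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(cuda_devices)

    # Step 2: custom inference logic. vLLM is used purely as an example backend,
    # and the model name below is a placeholder.
    from vllm import LLM, SamplingParams

    llm = LLM(model="your/model-name", tensor_parallel_size=len(cuda_devices))
    outputs = llm.generate([item["prompt"] for item in data],
                           SamplingParams(max_tokens=256))

    # Attach outputs without dropping any original keys -- in particular
    # __index__, which model_map uses to restore the input order.
    for item, out in zip(data, outputs):
        item["response"] = out.outputs[0].text
    return data
```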

Important Note

  • Index Preservation: model_map assigns each item in data an __index__ key and uses it to restore the original order of the results. Do not remove this key inside your worker.
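
For intuition, the reordering in step 5 presumably amounts to something like this sketch (not the actual implementation):

```python
# Results arrive in whatever order the workers finish; __index__ recovers
# the original input order.
results = [{"__index__": 2, "response": "c"},
           {"__index__": 0, "response": "a"},
           {"__index__": 1, "response": "b"}]
ordered = sorted(results, key=lambda item: item["__index__"])
# ordered now lists responses "a", "b", "c" in input order.
```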