## Overview
For inference, we provide a function called `model_map` that automatically parallelizes sampling across GPUs.
## `model_map` Signature

```python
def model_map(worker, data, model_required_gpus):
```

- `worker`: the inference logic function.
- `data`: the data to be processed.
- `model_required_gpus`: the number of GPUs required per worker.
## How `model_map` Works

- GPU Detection: reads `CUDA_VISIBLE_DEVICES` from the environment. If it is not set, an error is raised.
- GPU Grouping: divides the GPUs into `n` groups based on `CUDA_VISIBLE_DEVICES`, `nvidia-smi topo -m`, and `model_required_gpus`.
- Data Splitting: splits the data into `n` chunks.
- Worker Execution: calls the `worker` function on each group.
- Result Sorting: sorts the results back into input order and returns them.
## Worker Function Signature

```python
def worker(cuda_devices: list[str], data: list[dict[str, Any]])
```

- `cuda_devices`: a list of GPU IDs (e.g., `['1', '2']`).
- `data`: the chunk of data for this worker to process.
## Inside the Worker

- Set CUDA Visible Devices: `os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(cuda_devices)`
- Implement Inference Logic: write your custom inference logic for the assigned GPUs.
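A minimal worker might look like the sketch below. The `prompt` and `output` field names and the placeholder inference step are assumptions for illustration; a real worker would load a model and run generation after setting `CUDA_VISIBLE_DEVICES`.

```python
import os
from typing import Any


def worker(cuda_devices: list[str], data: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # Restrict this process to its assigned GPUs *before* loading any model.
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(cuda_devices)

    results = []
    for item in data:
        # Placeholder for real inference (e.g. loading a model and generating).
        # Mutating the item in place keeps the "__index__" key intact.
        item["output"] = f"processed:{item['prompt']}"
        results.append(item)
    return results
```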
## Important Note

- Index Preservation: `model_map` assigns each item in `data` a key `__index__`. Do not remove this key; it is what allows the results to be sorted back into the original order.
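The contrast below (using a hypothetical `prompt` field) shows why this matters: rebuilding result dicts from scratch silently drops `__index__`, while copying or mutating the input items keeps it.

```python
def bad_worker(cuda_devices, data):
    # BUG: building fresh dicts drops "__index__", so model_map
    # can no longer restore the original order of the results.
    return [{"output": item["prompt"].upper()} for item in data]


def good_worker(cuda_devices, data):
    # Copying each item (or mutating it in place) keeps "__index__" intact.
    return [{**item, "output": item["prompt"].upper()} for item in data]
```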