## Overview
For inference, we provide a function called `model_map`, designed for auto-parallel sampling across GPUs.
## `model_map` Signature
```python
def model_map(worker, data, model_required_gpus):
```
- `worker`: The function containing the inference logic.
- `data`: The list of data items to be processed.
- `model_required_gpus`: The number of GPUs required per worker (a minimal usage sketch follows this list).
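For illustration, a minimal call might look like the following. This is a sketch: the `prompt` field and the trivial `my_worker` are placeholders, and we assume `model_map` has already been imported from the library.

```python
from typing import Any

def my_worker(cuda_devices: list[str], data: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # Placeholder worker; see "Inside the Worker" below for a realistic body.
    return data

# 100 items; model_map splits them across workers that each own 2 GPUs.
data = [{"prompt": f"question {i}"} for i in range(100)]
results = model_map(my_worker, data, model_required_gpus=2)
```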
## How `model_map` Works
- GPU Detection: Reads `CUDA_VISIBLE_DEVICES` from the environment. If it is not set, an error is raised.
- GPU Grouping: Divides the visible GPUs into `n` groups based on `CUDA_VISIBLE_DEVICES`, `nvidia-smi topo -m`, and `model_required_gpus` (see the sketch after this list).
- Data Splitting: Splits the data into `n` groups.
- Worker Execution: Calls the `worker` function on each group.
- Result Sorting: Sorts the results back into their original order and returns them.
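As a rough illustration of the grouping arithmetic: the real grouping also consults `nvidia-smi topo -m`, so the contiguous pairing shown here is only one possible outcome.

```python
# Suppose CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" and model_required_gpus=2.
visible = "0,1,2,3,4,5,6,7".split(",")
model_required_gpus = 2

n = len(visible) // model_required_gpus  # n = 4 worker groups
groups = [
    visible[i * model_required_gpus:(i + 1) * model_required_gpus]
    for i in range(n)
]
# groups == [['0', '1'], ['2', '3'], ['4', '5'], ['6', '7']]
# The data is likewise split into 4 chunks, one per worker invocation.
```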
## Worker Function Signature
```python
def worker(cuda_devices: list[str], data: list[dict[str, Any]])
```
- `cuda_devices`: A list of GPU IDs (e.g., `['1', '2']`).
- `data`: The data for this worker to process.
## Inside the Worker
- Set CUDA Visible Devices: `os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(cuda_devices)`
- Implement Inference Logic: Write your custom inference logic (a sketch follows this list).
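A sketch of a worker body following these two steps. The PyTorch import and the trivial tensor op stand in for a real model, and the `prompt`/`output` field names are assumptions, not part of the API.

```python
import os
from typing import Any

def worker(cuda_devices: list[str], data: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # Set the mask before anything initializes CUDA (importing torch,
    # loading a model, etc.); changes made after CUDA initialization
    # are ignored by the driver.
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(cuda_devices)

    import torch  # deliberately imported after CUDA_VISIBLE_DEVICES is set

    device = "cuda" if torch.cuda.is_available() else "cpu"
    for item in data:
        _ = torch.zeros(1, device=device)  # stand-in for a real forward pass
        item["output"] = f"processed: {item.get('prompt', '')}"
    return data
```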
## Important Note
- Index Preservation: `model_map` assigns a `__index__` key to each item in `data`. To ensure the results can be sorted back into their original order, do not remove this key (see the sketch below).
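For example, if the worker builds fresh result dicts instead of mutating the input items, it should carry the key through. The field names other than `__index__` are illustrative.

```python
from typing import Any

def worker(cuda_devices: list[str], data: list[dict[str, Any]]) -> list[dict[str, Any]]:
    import os
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(cuda_devices)

    results = []
    for item in data:
        results.append({
            "__index__": item["__index__"],    # keep, so model_map can reorder
            "output": item["prompt"].upper(),  # stand-in for real inference
        })
    return results
```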