Running Inference

SLEAP-NN provides powerful inference capabilities for pose estimation with support for multiple model types, tracking, and legacy SLEAP model compatibility. SLEAP-NN supports inference on:

  • Videos: Any format supported by OpenCV
  • SLEAP Labels: .slp files

Using uv workflow

This section assumes you have sleap-nn installed. If not, refer to the installation guide.

  • If you're using the uvx workflow, you do not need to install anything. (See installation using uvx for more details.)

  • If you are using uv sync or uv pip installation methods, add uv run as a prefix to all CLI commands shown below, for example:

    uv run sleap-nn track ...

Run Inference with CLI

sleap-nn track \
    --data_path video.mp4 \
    --model_paths models/ckpt_folder/

To run inference on a specific CUDA device (e.g., cuda:0, the first GPU):

sleap-nn track \
    --data_path video.mp4 \
    --model_paths models/ckpt_folder/ \
    --device cuda:0

To run inference on specific frames of a video file:

sleap-nn track \
    --data_path video.mp4 \
    --frames "0-100,200-300" \
    --model_paths models/ckpt_folder/

To run inference with backbone weights different from those in models/ckpt_folder/:

sleap-nn track \
    --data_path video.mp4 \
    --frames "0-100,200-300" \
    --model_paths models/ckpt_folder/ \
    --backbone_ckpt_path models/backbone.ckpt

Inference on TopDown models

For two-stage models (topdown and multiclass topdown), both the centroid and centered-instance model checkpoints must be provided, as shown below:

sleap-nn track \
    --data_path video.mp4 \
    --model_paths models/centroid_unet/ \
    --model_paths models/centered_instance_unet/

Centroid-only / Centered-instance-only inference

You can run inference using only the centroid model or only the centered-instance model. However, both modes require ground-truth (GT) instances to be present in the .slp file, so this type of inference can only be performed on datasets that already have GT labels.

Note: In centroid-only inference, each predicted centroid (single-keypoint) is matched (by Euclidean distance) to the nearest ground-truth instance, and the ground-truth keypoints are copied for display (thus, the pipeline always expects ground-truth pose data to be available when running inference on just the centroid model). As a result, an OKS mAP of 1.0 for a centroid model simply means all instances were detected—it does not measure pose/keypoint accuracy. To evaluate keypoints, run the second stage (the pose model) instead of centroid-only inference.
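
For example, a centroid-only run on a labels file that already contains GT instances (a minimal sketch; the checkpoint folder path is a placeholder):

sleap-nn track \
    --data_path labels.slp \
    --model_paths models/centroid_unet/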

Arguments for CLI

Essential Parameters

Parameter | Description | Default
--data_path / -i | Path to the .slp file or .mp4 to run inference on | Required
--model_paths / -m | List of paths to the directories where best.ckpt and training_config.yaml are saved | Required
--output_path / -o | Output filename for the predicted data | <data_path>.predictions.slp
--device / -d | Device on which torch.Tensor will be allocated. One of ('cpu', 'cuda', 'mps', 'auto', 'opencl', 'ideep', 'hip', 'msnpu'). With 'auto', cuda, mps, or cpu is chosen based on the available backend | auto
--batch_size / -b | Number of frames to predict at a time. Larger values give faster inference but require more memory | 4
--max_instances / -n | Maximum number of instances in multi-instance models. Not available for ID models | None
--tracking / -t | If True, runs tracking on the predicted instances | False
--peak_threshold | Minimum confidence map value to consider a peak as valid | 0.2
--integral_refinement | If None, returns the grid-aligned peaks with no refinement. If 'integral', peaks are refined with integral regression | integral
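
For example, several of these parameters can be combined in one call (the batch size, instance count, and threshold below are illustrative, not recommendations):

sleap-nn track \
    --data_path video.mp4 \
    --model_paths models/ckpt_folder/ \
    --batch_size 16 \
    --max_instances 2 \
    --peak_threshold 0.3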

Model Configuration

Parameter | Description | Default
--backbone_ckpt_path | Path to a .ckpt file to run inference with, instead of best.ckpt from the model_paths dir | None
--head_ckpt_path | Path to a .ckpt file if a different set of head-layer weights should be used. If None, the weights from best.ckpt in the model_paths dir are used (or from backbone_ckpt_path if provided) | None
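
A sketch of running inference with a different set of head-layer weights (the .ckpt path is a placeholder):

sleap-nn track \
    --data_path video.mp4 \
    --model_paths models/ckpt_folder/ \
    --head_ckpt_path models/head_weights.ckpt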

Image Pre-processing

Parameter | Description | Default
--max_height | Maximum height the image should be padded to. If not provided, the value from the training config is used | None
--max_width | Maximum width the image should be padded to. If not provided, the value from the training config is used | None
--input_scale | Scale factor to apply to the input image. If not provided, the value from the training config is used | None
--ensure_rgb | True if the input image should have 3 channels (RGB). If the input has only one channel and this is set to True, the single channel is replicated along the channel axis. If the image has three channels and this is set to False, the three channels are retained. If not provided, the value from the training config is used | False
--ensure_grayscale | True if the input image should have a single channel. If the input has three channels (RGB) and this is set to True, the image is converted to a grayscale (single-channel) image. If the source image has only one channel and this is set to False, the single-channel input is retained. If not provided, the value from the training config is used | False
--crop_size | Crop size. If not provided, the crop size from training_config.yaml is used | None
--anchor_part | Node name to use as the anchor for the centroid. If not provided, the anchor part from training_config.yaml is used | None
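
For example, to downscale frames and cap the padded size (the values below are illustrative):

sleap-nn track \
    --data_path video.mp4 \
    --model_paths models/ckpt_folder/ \
    --input_scale 0.5 \
    --max_height 1024 \
    --max_width 1024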

Data Selection

Parameter | Description | Default
--only_labeled_frames | True if inference should be run only on user-labeled frames | False
--only_suggested_frames | True if inference should be run only on unlabeled suggested frames | False
--video_index | Integer index of the video in the .slp file to predict on. Use with an .slp path as an alternative to specifying the video path | None
--video_dataset | The dataset name for HDF5 videos | None
--video_input_format | The input format for HDF5 videos | channels_last
--frames | List of frame indices. If None, all frames in the video are used | All frames
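
For example, to predict on a range of frames from the second video stored in a .slp file (the index and range are illustrative):

sleap-nn track \
    --data_path labels.slp \
    --model_paths models/ckpt_folder/ \
    --video_index 1 \
    --frames "0-500"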

Performance

Parameter | Description | Default
--queue_maxsize | Maximum size of the frame buffer queue | 8

Run Inference with API

To return predictions as a list of dictionaries:

from sleap_nn.predict import run_inference

# Run inference
labels = run_inference(
    data_path="video.mp4",
    model_paths=["models/unet/"],
    make_labels=False,
    return_confmaps=True
)

Setting return_confmaps=True will also return the raw confidence maps generated by the model, allowing you to inspect or analyze the model's outputs alongside the predicted poses.
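
With make_labels=False, the return value is a list of dictionaries of model outputs rather than a Labels object. A minimal sketch for inspecting what each dictionary contains (the exact keys depend on the model type):

# Each element is a dictionary of model outputs; list its keys
for output in labels:
    print(output.keys())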

To return predictions as a sleap_io.Labels object:

from sleap_nn.predict import run_inference

# Run inference
labels = run_inference(
    data_path="video.mp4",
    model_paths=["models/unet/"],
    output_path="predictions.slp",
    make_labels=True,
)

For a detailed explanation of all available parameters for run_inference, see the API docs for run_inference().

Tracking

SLEAP-NN provides advanced tracking capabilities for multi-instance pose estimation. When running inference, the output labels object (or .slp file) will include track IDs assigned to the predicted instances (or user-labeled instances).

When using the sleap-nn track CLI command with both --model_paths and --tracking specified, the tool will perform both pose prediction and track assignment in a single step—automatically generating pose predictions and associating them into tracks.

Tracking Methods

Tracking Parameters

Parameter | Description | Default
--tracking / -t | If True, runs tracking on the predicted instances | False
--tracking_window_size | Number of frames to look back for candidate instances to match with the current detections | 5
--min_new_track_points | Minimum number of points an instance must have for a new track to be spawned for it | 0
--candidates_method | One of fixed_window or local_queues. In the fixed-window method, candidates are taken from the last window_size frames. In local queues, the last window_size instances for each track ID are considered for matching against the current detections | fixed_window
--min_match_points | Minimum number of non-NaN points for match candidates | 0
--features | Feature representation used to match candidates against current detections. One of [keypoints, centroids, bboxes, image] | keypoints
--scoring_method | Method to compute the association score between features from the current frame and the previous tracks. One of [oks, cosine_sim, iou, euclidean_dist] | oks
--scoring_reduction | Method to aggregate and reduce multiple scores when several detections are associated with the same track. One of [mean, max, robust_quantile] | mean
--robust_best_instance | If the value is between 0 and 1 (exclusive), use a robust quantile similarity score for the track. If the value is 1, use the max similarity (non-robust). 0.95 is a good value for a robust score | 1.0
--track_matching_method | Track matching algorithm. One of hungarian, greedy | hungarian
--max_tracks | Maximum number of new tracks to create, to avoid redundant tracks (only for the local_queues candidates method) | None
--use_flow | If True, FlowShiftTracker is used, where poses are matched using optical-flow shifts | False
--of_img_scale | Factor to scale the images by when computing optical flow. Decrease this to increase performance at the cost of finer accuracy. Sometimes decreasing the image scale can improve performance with fast movements (only if use_flow is True) | 1.0
--of_window_size | Optical flow window size to consider at each pyramid scale level (only if use_flow is True) | 21
--of_max_levels | Number of pyramid scale levels to consider. This is different from the of_img_scale parameter, which determines the initial image scaling (only if use_flow is True) | 3
--post_connect_single_breaks | If True and max_tracks is not None with the local_queues candidates method, connects track breaks when exactly one track is lost and exactly one new track is spawned in the frame | False
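
For example, tracking with centroid features, IoU scoring, and greedy matching (a sketch; all values are illustrative choices from the options above):

sleap-nn track \
    -i video.mp4 \
    -m models/bottomup_unet/ \
    -t \
    --features centroids \
    --scoring_method iou \
    --track_matching_method greedy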

Fixed Window Tracking

This method maintains a fixed-size queue with the last N frames and uses all instances from those frames as candidates for matching.

sleap-nn track \
    -i video.mp4 \
    -m models/bottomup_unet/ \
    -t \
    --candidates_method fixed_window \
    --tracking_window_size 10

Local Queues Tracking

This method maintains separate queues for each track ID, keeping the last N instances per track. It's more robust to track breaks but requires more memory and computation.

sleap-nn track \
    -i video.mp4 \
    -m models/bottomup_unet/ \
    -t \
    --candidates_method local_queues \
    --tracking_window_size 5

Optical Flow Tracking

This method uses optical flow to shift the candidates onto the frame to be tracked and then associates the untracked instances to the shifted instances.

sleap-nn track \
    -i video.mp4 \
    -m models/bottomup_unet/ \
    -t \
    --use_flow

Track-only workflow

You can perform tracking on existing user-labeled instances, without running inference to get new predictions, by enabling tracking (--tracking) and omitting the --model_paths argument.

sleap-nn track \
    -i labels.slp \
    -t

To run the track-only workflow on selected frames of a specific video in a .slp file:

sleap-nn track \
    -i labels.slp \
    -t \
    --frames 0-100 \
    --video_index 0

Legacy SLEAP Model Support

SLEAP-NN supports running inference with models trained using the original SLEAP TensorFlow/Keras backend (sleap ≤ 1.4). Use the same CLI or API workflows described above: pass the path to the directory containing both best_model.h5 and training_config.json from your SLEAP training run as the model_paths argument. SLEAP-NN will automatically detect and load the legacy model for inference.
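
A sketch of a legacy-model invocation (the checkpoint directory path is a placeholder for a folder holding best_model.h5 and training_config.json):

sleap-nn track \
    --data_path video.mp4 \
    --model_paths models/legacy_sleap_model/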

Legacy support Limitation

Currently, only UNet backbone models are supported for inference when converting legacy SLEAP (≤1.4) models.

Evaluation Metrics

SLEAP-NN provides comprehensive evaluation capabilities to assess model performance against ground truth labels.

Using CLI:

sleap-nn eval \
    --ground_truth_path gt_labels.slp \
    --predicted_path pred_labels.slp \
    --save_metrics pred_metrics.npz

Parameter | Description | Default
--ground_truth_path / -g | Path to the ground truth labels file (.slp) | Required
--predicted_path / -p | Path to the predicted labels file (.slp) | Required
--save_metrics / -s | Path to save the computed metrics (.npz file) | None
--oks_stddev | Standard deviation for the OKS calculation | 0.05
--oks_scale | Scale factor for the OKS calculation | None
--match_threshold | Threshold for instance matching | 0.0
--user_labels_only | Only evaluate user-labeled frames | False
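
Metrics saved with --save_metrics are a standard NumPy archive, so they can be loaded back for later analysis (a minimal sketch; the stored array names depend on the metrics computed):

import numpy as np

# Open the saved archive and list the metric arrays it contains
metrics = np.load("pred_metrics.npz", allow_pickle=True)
print(metrics.files)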

Using Evaluator API:

import sleap_io as sio
from sleap_nn.evaluation import Evaluator

# Load ground truth and predictions
gt_labels = sio.load_slp("gt_labels.slp")
pred_labels = sio.load_slp("pred_labels.slp")

# Create evaluator and compute metrics
evaluator = Evaluator(gt_labels, pred_labels)
metrics = evaluator.evaluate()

# Print results
print(f"OKS mAP: {metrics['voc_metrics']['oks_voc.mAP']}")
print(f"mOKS: {metrics['mOKS']}")
print(f"Dist. error @90th percentile: {metrics['distance_metrics']['p90']}")

Troubleshooting

Common Issues

Out of Memory

  • Reduce --batch_size
  • Use --device cpu for CPU-only inference

Slow Inference

  • Increase --batch_size (if memory allows)
  • Use --device cuda for GPU acceleration

Poor Predictions

  • Adjust --peak_threshold
  • Check model compatibility
  • Verify input preprocessing matches training

Tracking Issues

  • Adjust --tracking_window_size
  • Try different feature representations (--features) and scoring methods (--scoring_method).

Next Steps