Sometimes you need to run an object detector as part of a larger system. One of the best tools for bringing in a pre-trained detector may be the detectron2 framework from Meta.

Installation

To install detectron2, it’s as easy as a pip install:

python -m pip install git+https://github.com/facebookresearch/detectron2.git

Detectron2 API

It’s amazingly easy to being to extract objects from images using their pre-trained models. First, you get a model from the model zoo:

from detectron2 import model_zoo

# get a model from the model zoo, in this case a RegNetY-4GF model trained on COCO
model = model_zoo.get("new_baselines/mask_rcnn_regnety_4gf_dds_FPN_400ep_LSJ.py", trained=True)
model = model.eval().to("cuda") # or "cpu", if you don't have a GPU

(More models can be found in the model zoo)

Then, you can use the model to detect objects in an image:

from PIL import Image
import numpy as np
import torch

# Load the frame to a numpy array
frame = np.array(Image.open("input.jpg").convert("RGB"))

# Convert to a torch tensor, and transpose to channels-first (HWC -> CHW)
frame = torch.from_numpy(frame.transpose(2, 0, 1)).float()

# Run the model on the frame
outputs =  model(
                [{"image": frame, "height": frame.shape[1], "width": frame.shape[2], "file_name": "input.jpg"}]
            )

We can then see our results:


# get the boxes, object-masks, classes, and scores from the model output
boxes = outputs[0]["instances"].pred_boxes.cpu().numpy()
masks = outputs[0]["instances"].pred_masks.cpu().numpy()
class_ids = outputs[0]["instances"].pred_classes.cpu().numpy()
scores = outputs[0]["instances"].scores.cpu().numpy()

# We can decode the class IDs using the COCO dataset
from detectron2.data import MetadataCatalog
coco_metadata = MetadataCatalog.get("coco_2017_val")
class_names = [coco_metadata.thing_classes[i] for i in class_ids]

In this way, we can easily extract objects from images using the detectron2 API, and use them in our own systems.