Object Tracking in Videos with OpenCV: A Beginner’s Guide

Learn how to track objects in videos using OpenCV, the most popular computer vision library. Follow our step-by-step guide with code examples to understand the theory behind object tracking and explore techniques such as template matching, mean shift, and Kalman filtering. Start your journey in computer vision today and unlock the potential of video processing!

Updated March 20, 2023


Hey! If you love Computer Vision and OpenCV as much as I do, let's connect on Twitter or LinkedIn. I talk about this stuff all the time and build cool projects.


Welcome to this tutorial on how to track objects in videos with OpenCV. Object tracking is a fundamental task in computer vision, with a wide range of applications such as surveillance, traffic monitoring, and robotics.

We will walk through the theory behind object tracking and then implement three classic techniques, template matching, mean shift, and Kalman filtering, with complete code examples.

Theory

Object tracking involves identifying and following the motion of an object over time in a video sequence. This can be achieved using various techniques such as template matching, mean shift, and Kalman filtering.

Template matching searches each video frame for the sub-image that best matches a template of the target. Mean shift iteratively shifts a search window toward the densest region (the mode) of a probability map, such as a color-histogram back projection, until the window converges on the target. Kalman filtering models the target's motion with a state-space model and uses Bayesian inference to combine motion predictions with noisy measurements of the target's position.

OpenCV provides a range of functions and algorithms to track objects in videos. These include the cv2.matchTemplate() function for template matching, the cv2.meanShift() function for mean shift, and the cv2.KalmanFilter() class for Kalman filtering.

Now that we have a basic understanding of the theory, let’s move on to the code examples.

Code Examples

We will use Python for our examples, but the concept applies to other programming languages supported by OpenCV.

First, let’s start by importing the necessary libraries:

import cv2
import numpy as np

Next, let’s load a sample video file and define the initial position of the object to be tracked:

# Open the video and read the first frame
cap = cv2.VideoCapture('sample_video.mp4')
ret, frame = cap.read()

# Let the user draw a box around the object; returns (x, y, w, h)
bbox = cv2.selectROI('Select Object', frame, False)
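Before the selectROI() call, a quick sanity check can save some debugging time: if the file path is wrong or the codec is unsupported, cap.read() returns False and frame is None, which makes selectROI() fail with a cryptic error. A minimal guard looks like this:

# Fail fast if the video could not be opened or the first frame is empty
if not cap.isOpened():
    raise IOError('Could not open sample_video.mp4')
if not ret:
    raise IOError('Could not read the first frame of the video')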

Template Matching

To track an object using template matching, we can use the following code:

# Define the template image
template = frame[int(bbox[1]):int(bbox[1]+bbox[3]), int(bbox[0]):int(bbox[0]+bbox[2])]

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Perform template matching
    result = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
    min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)

    # Draw a bounding box around the tracked object
    top_left = max_loc
    bottom_right = (top_left[0] + int(bbox[2]), top_left[1] + int(bbox[3]))
    cv2.rectangle(frame, top_left, bottom_right, (0, 255, 0), 2)

    # Display the tracked object
    cv2.imshow('Tracked Object', frame)

    # Exit if the Esc key is pressed
    k = cv2.waitKey(30) & 0xff
    if k == 27:
        break

In the above code, we first define the template image as the region of interest (ROI) selected by the user using the selectROI() function.

Next, we loop through the video frames, perform template matching with the matchTemplate() function, and take the location of the highest normalized correlation score from minMaxLoc(). (With the TM_SQDIFF family of methods, lower scores are better, so you would use min_loc instead.)

Finally, we draw a bounding box around the tracked object using the rectangle() function and display the tracked object using the imshow() function.
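One practical refinement, not shown in the minimal example above: TM_CCOEFF_NORMED scores fall in the range [-1, 1], so a low peak value usually means the object is occluded or has left the frame. Rather than drawing a box on noise, you can gate the result on a threshold (the 0.7 below is an illustrative value you would tune for your own footage):

# Only trust the match when the peak score is high enough (0.7 is illustrative)
MATCH_THRESHOLD = 0.7
if max_val >= MATCH_THRESHOLD:
    cv2.rectangle(frame, top_left, bottom_right, (0, 255, 0), 2)
else:
    cv2.putText(frame, 'Lost track', (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)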

Mean Shift

To track an object using mean shift, we can use the following code:

# Define the initial tracking window from the selected ROI
x, y, w, h = bbox
track_window = (x, y, w, h)

# Compute the hue histogram of the ROI once, before tracking starts
hsv_roi = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2HSV)
roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

# Stop after 10 iterations or when the window moves by less than 1 pixel
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Back-project the ROI histogram onto the current frame
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)

    # Shift the tracking window to the peak of the back projection
    ret, track_window = cv2.meanShift(dst, track_window, term_crit)

    # Draw a bounding box around the tracked object
    x, y, w, h = track_window
    cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)

    # Display the tracked object
    cv2.imshow('Tracked Object', frame)

    # Exit if the Esc key is pressed
    k = cv2.waitKey(30) & 0xff
    if k == 27:
        break

In the above code, we first define the initial tracking window as the ROI selected by the user with the selectROI() function, and compute the hue histogram of that region once, before tracking starts. This histogram serves as the color model of the object.

Next, we loop through the video frames, convert each frame to the HSV color space, back-project the histogram onto it with the calcBackProject() function, and shift the tracking window to the peak of the resulting probability map with the meanShift() function.

Finally, we draw a bounding box around the tracked object using the rectangle() function and display the tracked object using the imshow() function.
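A closely related option worth knowing about is cv2.CamShift(), which takes exactly the same arguments as cv2.meanShift() but also adapts the window's size and orientation as the object moves closer or rotates. As a rough sketch, the meanShift() call in the loop above can be swapped out like this:

# CamShift returns a rotated rectangle in addition to the updated window
rot_rect, track_window = cv2.CamShift(dst, track_window, term_crit)

# Draw the rotated bounding box instead of an axis-aligned one
pts = cv2.boxPoints(rot_rect).astype(np.int32)
cv2.polylines(frame, [pts], True, (0, 255, 0), 2)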

Kalman Filtering

To track an object using Kalman filtering, we can implement the predict and update equations directly with NumPy (the cv2.KalmanFilter class mentioned earlier wraps the same math; a sketch of it appears at the end of this section). A Kalman filter needs a measurement on every frame, so this example uses template matching to locate the object and then filters that noisy position estimate:

# Define a constant-velocity state-space model (state: x, y, vx, vy)
dt = 1/30.0  # time step, assuming a 30 fps video
A = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]])  # state transition matrix
C = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]])  # measurement matrix: we observe position only
Q = np.eye(4) * 0.1           # process noise covariance
R = np.eye(2) * 10            # measurement noise covariance
x = np.array([[bbox[0] + bbox[2]/2], [bbox[1] + bbox[3]/2], [0], [0]])  # initial state: ROI center, zero velocity
P = np.eye(4)                 # initial state covariance

# Use the selected ROI as a template to generate measurements
template = frame[int(bbox[1]):int(bbox[1]+bbox[3]), int(bbox[0]):int(bbox[0]+bbox[2])]

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Predict the next state and covariance from the motion model
    x = A.dot(x)
    P = A.dot(P).dot(A.T) + Q

    # Measure the object's center in the current frame via template matching
    result = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
    _, _, _, max_loc = cv2.minMaxLoc(result)
    z = np.array([[max_loc[0] + bbox[2]/2], [max_loc[1] + bbox[3]/2]])

    # Update the state estimate with the measurement via the Kalman gain
    y = z - C.dot(x)
    S = C.dot(P).dot(C.T) + R
    K = P.dot(C.T).dot(np.linalg.inv(S))
    x = x + K.dot(y)
    P = (np.eye(4) - K.dot(C)).dot(P)

    # Draw a bounding box centered on the filtered position estimate
    x_, y_, w_, h_ = map(int, [x[0, 0] - bbox[2]/2, x[1, 0] - bbox[3]/2, bbox[2], bbox[3]])
    cv2.rectangle(frame, (x_, y_), (x_+w_, y_+h_), (0, 255, 0), 2)

    # Display the tracked object
    cv2.imshow('Tracked Object', frame)

    # Exit if the Esc key is pressed
    k = cv2.waitKey(30) & 0xff
    if k == 27:
        break

In the above code, we first define a constant-velocity state-space model for the Kalman filter. We then loop through the video frames, predict the next state with the state transition matrix, measure the object's center in the current frame with template matching, and update the state estimate using the measurement and the Kalman gain. The filter smooths the raw template-matching positions, which is especially helpful when individual measurements are noisy.

Finally, we draw a bounding box around the tracked object using the state estimate and display the tracked object using the imshow() function.
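For completeness, here is a rough sketch of the same constant-velocity filter built with the cv2.KalmanFilter class mentioned earlier; the matrices mirror A, C, Q, and R above. Note that cv2.KalmanFilter expects float32 arrays:

# Equivalent setup with OpenCV's built-in Kalman filter (4 states, 2 measurements)
kf = cv2.KalmanFilter(4, 2)
kf.transitionMatrix = np.array([[1, 0, dt, 0],
                                [0, 1, 0, dt],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], dtype=np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], dtype=np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 0.1
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 10

# Inside the frame loop: predict, then correct with the measured center
prediction = kf.predict()
kf.correct(z.astype(np.float32))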

Conclusion

In this tutorial, we’ve explored how to track objects in videos with OpenCV, one of the most widely used computer vision libraries. We discussed the theory behind object tracking and provided multiple code examples to illustrate the concept.

Object tracking is an essential task in many computer vision applications, from surveillance to robotics. By mastering the techniques and algorithms of object tracking, you can unlock the potential of computer vision and explore the fascinating world of video processing.

We hope this tutorial has been helpful for beginners and anyone getting started with computer vision and video processing. For further information, refer to the OpenCV documentation and explore the many other image and video processing techniques it offers.

