Object Tracking in Videos with OpenCV: A Beginner’s Guide
Learn how to track objects in videos using OpenCV, the most popular computer vision library. Follow our step-by-step guide with code examples to understand the theory behind object tracking and explore techniques such as template matching, mean shift, and Kalman filtering. Start your journey in computer vision today and unlock the potential of video processing!
Updated March 20, 2023
Welcome to this tutorial on how to track objects in videos with OpenCV. Object tracking is a fundamental task in computer vision, with applications such as surveillance, traffic monitoring, and robotics.
We will cover the theory behind three classic techniques (template matching, mean shift, and Kalman filtering) and give a code example for each.
Theory
Object tracking involves identifying and following the motion of an object over time in a video sequence. This can be achieved using various techniques such as template matching, mean shift, and Kalman filtering.
Template matching slides a template image over the video frame and picks the sub-image that matches it best. Mean shift iteratively moves a search window toward the densest region of a probability image (typically a color-histogram back-projection) until the window stops moving. Kalman filtering models the target object’s motion with a state-space model and uses Bayesian inference to combine the model’s prediction with noisy position measurements.
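To make the mean shift idea concrete, here is a minimal NumPy-only sketch of the iteration on a synthetic weight map. The function name mean_shift_window, the window format (x, y, w, h), and the stopping values are assumptions chosen for illustration; this is not how cv2.meanShift() is implemented internally, only the same basic idea.

```python
import numpy as np

def mean_shift_window(weights, window, max_iter=10, eps=1.0):
    """Move an (x, y, w, h) window toward the weighted centroid until it stops moving."""
    x, y, w, h = window
    rows, cols = weights.shape
    for _ in range(max_iter):
        patch = weights[y:y + h, x:x + w]
        total = patch.sum()
        if total == 0:  # nothing to follow inside the window
            break
        ys, xs = np.mgrid[0:patch.shape[0], 0:patch.shape[1]]
        cx = (xs * patch).sum() / total  # weighted centroid in window coordinates
        cy = (ys * patch).sum() / total
        # Shift so that the window center lands on the centroid
        dx = int(round(cx - (patch.shape[1] - 1) / 2))
        dy = int(round(cy - (patch.shape[0] - 1) / 2))
        new_x = min(max(x + dx, 0), cols - w)  # keep the window inside the map
        new_y = min(max(y + dy, 0), rows - h)
        moved = abs(new_x - x) + abs(new_y - y)
        x, y = new_x, new_y
        if moved < eps:  # converged
            break
    return x, y, w, h

# Synthetic weight map with one bright 20x20 blob at x=50, y=40
weights = np.zeros((100, 100))
weights[40:60, 50:70] = 1.0
result = mean_shift_window(weights, (35, 30, 20, 20))
print(result)
```

Starting from a window that only partly overlaps the blob, each step moves the window toward the blob until the shift rounds to zero. In a real tracker the weight map would be the histogram back-projection of the current frame.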
OpenCV provides a range of functions and algorithms to track objects in videos. These include the cv2.matchTemplate() function for template matching, the cv2.meanShift() function for mean shift, and the cv2.KalmanFilter class for Kalman filtering.
Now that we have a basic understanding of the theory, let’s move on to the code examples.
Code Examples
We will use Python for our examples, but the concepts apply to the other programming languages supported by OpenCV.
First, let’s start by importing the necessary libraries:
import cv2
import numpy as np
Next, let’s load a sample video file and define the initial position of the object to be tracked:
cap = cv2.VideoCapture('sample_video.mp4')
ret, frame = cap.read()
bbox = cv2.selectROI('Select Object', frame, False)
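The snippet above assumes that the video opens, that the first frame is read, and that the user actually draws a box. As far as we know, selectROI() returns an all-zero rectangle when the selection is cancelled, so it is worth validating the box before slicing the frame with it. The helper below is a hypothetical sketch (clamp_bbox is not an OpenCV function) that clips a box to the frame and rejects empty selections.

```python
def clamp_bbox(bbox, frame_shape):
    """Clip an (x, y, w, h) box to the frame; return None if nothing is left."""
    x, y, w, h = (int(v) for v in bbox)
    frame_h, frame_w = frame_shape[:2]
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, frame_w), min(y + h, frame_h)
    if x1 <= x0 or y1 <= y0:  # empty or fully outside the frame
        return None
    return (x0, y0, x1 - x0, y1 - y0)

# Example with a 640x480 color frame shape (rows, cols, channels)
shape = (480, 640, 3)
print(clamp_bbox((10, 20, 30, 40), shape))      # box already inside the frame
print(clamp_bbox((600, 400, 100, 100), shape))  # box clipped at the frame border
print(clamp_bbox((0, 0, 0, 0), shape))          # cancelled selection
```

In the tutorial code this would be called as clamp_bbox(bbox, frame.shape) right after selectROI(), after first checking that ret is True.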
Template Matching
To track an object using template matching, we can use the following code:
# Define the template image from the selected ROI
x, y, w, h = [int(v) for v in bbox]
template = frame[y:y+h, x:x+w].copy()
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # Perform template matching
    result = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
    min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
    # Draw a bounding box around the best match
    top_left = max_loc
    bottom_right = (top_left[0] + w, top_left[1] + h)
    cv2.rectangle(frame, top_left, bottom_right, (0, 255, 0), 2)
    # Display the tracked object
    cv2.imshow('Tracked Object', frame)
    # Wait up to 30 ms for a key press; Esc (27) stops the loop
    k = cv2.waitKey(30) & 0xff
    if k == 27:
        break
# Release the video and close the display window
cap.release()
cv2.destroyAllWindows()
In the above code, we first define the template image as the region of interest (ROI) selected by the user using the selectROI() function.
Next, we loop through the video frames, perform template matching using the matchTemplate() function, and find the location of the maximum correlation coefficient using the minMaxLoc() function.
Finally, we draw a bounding box around the tracked object using the rectangle() function and display the tracked object using the imshow() function.
Mean Shift
To track an object using mean shift, we can use the following code:
# Define the initial tracking window
x, y, w, h = [int(v) for v in bbox]
track_window = (x, y, w, h)
# Calculate the hue histogram of the ROI once, from the first frame
hsv_roi = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2HSV)
roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)
# Stop after 10 iterations or when the window moves by less than 1 pixel
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # Convert the frame to HSV color space
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Back-project the ROI histogram onto the frame
    dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    # Perform mean shift tracking
    ret, track_window = cv2.meanShift(dst, track_window, term_crit)
    # Draw a bounding box around the tracked object
    x, y, w, h = track_window
    cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)
    # Display the tracked object
    cv2.imshow('Tracked Object', frame)
    # Wait up to 30 ms for a key press; Esc (27) stops the loop
    k = cv2.waitKey(30) & 0xff
    if k == 27:
        break
# Release the video and close the display window
cap.release()
cv2.destroyAllWindows()
In the above code, we first define the initial tracking window as the ROI selected by the user using the selectROI() function, and calculate the hue histogram of that region once, from the first frame, using the calcHist() function. The histogram is the model of the object, so it must be built before the loop: recomputing it from wherever the window currently sits would make the tracker follow its own drift.
Next, we loop through the video frames, convert each frame to the HSV color space, back-project the histogram onto the frame using the calcBackProject() function, and perform mean shift tracking on the result using the meanShift() function.
Finally, we draw a bounding box around the tracked object using the rectangle() function and display the tracked object using the imshow() function.
Kalman Filtering
To track an object using Kalman filtering, we can use the following code:
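# Define the state-space model: state = [cx, cy, vx, vy], measurement = [cx, cy]
dt = 1/30.0
A = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]])
C = np.array([[1, 0, 0, 0], [0, 1, 0, 0]])
Q = np.eye(4)*0.1
R = np.eye(2)*10
bx, by, bw, bh = [int(v) for v in bbox]
# The filter needs a position measurement per frame; template matching is one possible source
template = frame[by:by+bh, bx:bx+bw].copy()
# Start at the center of the ROI with zero velocity
x = np.array([[bx + bw/2], [by + bh/2], [0], [0]])
P = np.eye(4)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # Predict the next state using the state transition matrix
    x = A.dot(x)
    P = A.dot(P).dot(A.T) + Q
    # Measure the object center in the current frame
    result = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
    _, _, _, max_loc = cv2.minMaxLoc(result)
    z = np.array([[max_loc[0] + bw/2], [max_loc[1] + bh/2]])
    # Compute the innovation and the Kalman gain
    y = z - C.dot(x)
    S = C.dot(P).dot(C.T) + R
    K = P.dot(C.T).dot(np.linalg.inv(S))
    # Update the state estimate using the measurement
    x = x + K.dot(y)
    P = (np.eye(4) - K.dot(C)).dot(P)
    # Draw a bounding box around the tracked object using the state estimate
    x_, y_ = int(x[0, 0] - bw/2), int(x[1, 0] - bh/2)
    cv2.rectangle(frame, (x_, y_), (x_+bw, y_+bh), (0, 255, 0), 2)
    # Display the tracked object
    cv2.imshow('Tracked Object', frame)
    # Wait up to 30 ms for a key press; Esc (27) stops the loop
    k = cv2.waitKey(30) & 0xff
    if k == 27:
        break
# Release the video and close the display window
cap.release()
cv2.destroyAllWindows()
In the above code, we first define the state-space model for the Kalman filter: a constant-velocity model whose state holds the center of the object and its velocity. There is no control input, so the prediction step uses only the state transition matrix.
A Kalman filter does not find the object in the image by itself; it smooths a sequence of position measurements. Here the measurement for each frame comes from template matching, but any detector that returns a position would do. We then loop through the video frames, predict the next state, measure the object’s center in the current frame, and update the state estimate using the measurement and the Kalman gain.
Finally, we draw a bounding box around the tracked object using the state estimate and display the tracked object using the imshow() function.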
Conclusion
In this tutorial, we’ve explored how to track objects in videos with OpenCV, one of the most widely used computer vision libraries. We discussed the theory behind object tracking and provided multiple code examples to illustrate the concept.
Object tracking is an essential task in many computer vision applications, from surveillance to robotics. By mastering the techniques and algorithms of object tracking, you can unlock the potential of computer vision and explore the fascinating world of video processing.
We hope that this tutorial has been helpful and informative for beginners and those looking to explore the world of computer vision and video processing. For further information, please refer to the OpenCV documentation and explore the different image and video processing techniques and their applications.