Milan Ghimire - Software Developer, ML/AI Enthusiast, and Tech Visionary

Overview

This project does something that feels like it should need a heavy model, but doesn't: it watches a camera feed and reacts the instant something moves. There is no neural network here at all - just the difference between two frames and a little pixel maths.

The script reads the webcam frame by frame, keeps a short rolling buffer of recent frames, and compares the oldest against the newest. Where nothing changed the difference is near black; where something moved it lights up. Those bright regions get boxed in red, the frame is stamped with a "Burglar Detected!" alert, and a timestamped snapshot is saved to disk - throttled so it doesn't write hundreds of near-identical images.

This is the lightweight counterpart to the Computer Vision with OpenCV & YOLOv8 project, which covers the OpenCV basics this one leans on as well as YOLOv8 detection - and a reminder that not every vision problem needs a model.

Frame differencing flags the moving region in red and prints the alert in real time - no model required, just the difference between two frames. This is a screen recording of the script below actually running.

A few core concepts first

Before the code, the three ideas this project rests on.

Frame differencing - motion as a subtraction

The whole detector is built on one observation: if you subtract one frame from another a few steps later, the parts that stayed still cancel out to near black, and only the parts that moved are left bright. OpenCV's cv2.absdiff(a, b) gives that absolute pixel-by-pixel difference. No training, no model - motion is just a subtraction.
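To make the subtraction concrete, here is a minimal sketch on two tiny hand-made grids. The frames and the pure-Python absdiff helper are illustrative stand-ins (not part of the script) for what cv2.absdiff does on real images.

```python
def absdiff(a, b):
    """Per-pixel absolute difference of two equal-sized grayscale grids."""
    return [[abs(pa - pb) for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

# Two hypothetical 3x4 grayscale frames (0 = black, 255 = white).
earlier = [
    [10, 10, 10, 10],
    [10, 200, 10, 10],   # bright object in column 1
    [10, 10, 10, 10],
]
later = [
    [10, 10, 10, 10],
    [10, 10, 10, 200],   # same object, now in column 3
    [10, 10, 10, 10],
]

diff = absdiff(earlier, later)
for row in diff:
    print(row)
```

The static background cancels to 0, while both the place the object left and the place it arrived come out bright. Taking the absolute value means it makes no difference whether a pixel got brighter or darker.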

Grayscale and blur - killing the noise

Cameras are noisy: even a perfectly still scene flickers slightly pixel to pixel, and that flicker would register as fake "motion". So before comparing, each frame is converted to grayscale (motion only needs brightness, not colour) and Gaussian blurred to smooth out tiny flicker. Only real, sizeable movement survives that smoothing.
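A rough sketch of why the blur matters, using single rows of pixels instead of whole images. The helper names, the 5-tap kernel and the pixel values are assumptions chosen for illustration; the script itself uses cv2.cvtColor and a 21x21 cv2.GaussianBlur.

```python
def to_gray(b, g, r):
    """Brightness of one BGR pixel, using the standard luma weights."""
    return round(0.114 * b + 0.587 * g + 0.299 * r)

def blur_row(row, kernel=(1, 4, 6, 4, 1)):
    """Smooth one row with a small bell-shaped kernel, repeating edge pixels."""
    half, total = len(kernel) // 2, sum(kernel)
    out = []
    for i in range(len(row)):
        acc = 0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - half, 0), len(row) - 1)   # clamp at the borders
            acc += weight * row[j]
        out.append(round(acc / total))
    return out

def peak_diff(a, b):
    """Largest absolute difference between the two rows after blurring both."""
    return max(abs(x - y) for x, y in zip(blur_row(a), blur_row(b)))

still    = [100] * 9
flicker  = [100, 100, 100, 100, 160, 100, 100, 100, 100]   # one noisy pixel
movement = [100, 100, 160, 160, 160, 160, 160, 100, 100]   # a wide, real change

flicker_peak = peak_diff(still, flicker)
movement_peak = peak_diff(still, movement)
print(to_gray(255, 255, 255), flicker_peak, movement_peak)
```

Both changes start with a raw difference of 60. After smoothing, the single-pixel flicker falls below the script's cutoff of 30, while the wide change keeps its full strength and would still register as motion.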

Threshold → dilate → contour

The raw difference is a fuzzy gray image. To turn it into clean "this region moved" boxes it goes through three steps: threshold (anything brighter than a cutoff becomes pure white, the rest black), dilate (fatten the white blobs so nearby motion pixels merge into one solid region), and findContours (trace the outline of each white region). A minimum-area filter then ignores small blobs - a leaf, a shadow - so only something sizeable triggers the alert.
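The first two steps can be sketched on a tiny difference image. The threshold and dilate helpers below are simplified pure-Python stand-ins written for this example (a fixed 3x3 neighbourhood, hypothetical pixel values), not OpenCV's implementations.

```python
def threshold(gray, cutoff=30):
    """Binary mask: 255 where the pixel is above the cutoff, else 0."""
    return [[255 if p > cutoff else 0 for p in row] for row in gray]

def dilate(mask, iterations=1):
    """Grow white regions by one pixel in all eight directions per pass."""
    h, w = len(mask), len(mask[0])
    for _ in range(iterations):
        grown = [[0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                # A pixel turns white if anything in its 3x3 neighbourhood is white.
                if any(mask[ny][nx]
                       for ny in range(max(y - 1, 0), min(y + 2, h))
                       for nx in range(max(x - 1, 0), min(x + 2, w))):
                    grown[y][x] = 255
        mask = grown
    return mask

def show(mask):
    """Print a mask with '#' for white and '.' for black."""
    for row in mask:
        print("".join("#" if p else "." for p in row))
    print()

# Hypothetical difference image: two strong responses with faint noise between.
diff = [
    [0,  0,  0, 0,  0,  0, 0],
    [0, 90, 12, 0, 12, 80, 0],
    [0,  0,  0, 0,  0,  0, 0],
]
mask = threshold(diff)
once = dilate(mask, iterations=1)
twice = dilate(mask, iterations=2)
for m in (mask, once, twice):
    show(m)
```

The faint values of 12 disappear at the threshold. One dilation pass still leaves a gap between the two blobs; the second pass closes it, which is the merging effect the script relies on with iterations=2.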

Because there is no model, this detector doesn't know what moved - only that something did. That is the trade-off: near-zero cost and no training, in exchange for not telling a person from a passing cat. Pairing it with YOLOv8 would add the "what".

The full script

The complete program - open the camera, diff each pair of frames, box the motion, alert and snapshot. The walkthrough underneath takes it line by line.

import cv2, os, time
os.makedirs("captures", exist_ok=True)

cam = cv2.VideoCapture(0)
if not cam.isOpened():
    print("Cannot open camera")
    raise SystemExit(1)  # exit() is meant for the interactive shell; a non-zero status signals failure

frames, gap, last_saved = [], 5, 0

while True:
    ok, frame = cam.read()
    if not ok:
        break

    gray = cv2.GaussianBlur(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), (21, 21), 0)
    frames.append(gray)
    if len(frames) > gap + 1:
        frames.pop(0)

    motion = False
    if len(frames) >= gap:
        diff = cv2.absdiff(frames[0], frames[-1])
        _, thresh = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)
        thresh = cv2.dilate(thresh, None, iterations=2)
        contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

        for c in contours:
            if cv2.contourArea(c) < 500:
                continue
            motion = True
            x, y, w, h = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)

    if motion:
        cv2.putText(frame, "Burglar Detected!", (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
        if time.time() - last_saved > 3:
            cv2.imwrite(f"captures/burglar_{int(time.time())}.jpg", frame)
            print("Image saved.")
            last_saved = time.time()

    cv2.imshow("Burglar Detection", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cam.release()
cv2.destroyAllWindows()

Line by line

Setup

import cv2, os, time brings in OpenCV (frames, drawing, the diff), os (make the snapshot folder) and time (timestamps and the save throttle).
os.makedirs("captures", exist_ok=True) makes the folder for saved snapshots if it isn't there yet; exist_ok=True means it won't error if it already is.
cv2.VideoCapture(0) opens the default camera (device 0). Pass a filename instead and the same loop runs on a recorded video.
if not cam.isOpened(): guards against a missing or busy camera and exits with a clear message instead of failing silently.
frames, gap, last_saved = [], 5, 0 - frames is a rolling buffer of recent grayscale frames, gap is how many frames apart we compare, last_saved throttles how often we write to disk.

The frame loop

ok, frame = cam.read() grabs the next frame; ok says whether it worked. if not ok: break exits cleanly when the stream ends.
gray = cv2.GaussianBlur(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), (21, 21), 0) converts to grayscale and blurs it with a 21x21 kernel (the trailing 0 lets OpenCV derive the sigma from the kernel size) - blurring kills tiny pixel flicker so only real motion survives.
frames.append(gray) then if len(frames) > gap + 1: frames.pop(0) caps the buffer at gap + 1 frames (6 here) - once it is full, the oldest frame falls off the front as each new one arrives.
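As an alternative to the append-and-pop pattern, a collections.deque with maxlen drops the oldest entry on its own. This is a sketch of that variant, not what the script does; plain integers stand in for grayscale frames so the indices are easy to follow.

```python
from collections import deque

gap = 5
frames = deque(maxlen=gap + 1)   # the oldest entry is discarded automatically

for frame_number in range(10):   # integers stand in for grayscale frames
    frames.append(frame_number)
    if len(frames) == frames.maxlen:
        oldest, newest = frames[0], frames[-1]
        print(f"compare frame {oldest} with frame {newest}")
```

Every comparison here is exactly gap frames apart, because nothing is compared until the buffer is completely full. At six entries the cost of list.pop(0) is negligible, so the deque is mainly a readability choice.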

Detecting motion

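if len(frames) >= gap: waits until the buffer holds at least gap frames before comparing - the first few frames have nothing far enough back to diff against. Because the full buffer is gap + 1 frames long, the very first comparison spans gap - 1 frames and every later one spans exactly gap.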
cv2.absdiff(frames[0], frames[-1]) is the core idea: the absolute pixel difference between the oldest and newest frame. Where nothing moved → near black; where something moved → bright.
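cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY) turns that into pure black/white: any change above 30 becomes white (motion), the rest black. It returns a pair, which is why the script unpacks it as _, thresh.
cv2.dilate(thresh, None, iterations=2) fattens the white blobs so nearby motion pixels merge into one solid region; passing None as the kernel uses OpenCV's default 3x3 neighbourhood.
cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) finds the outlines of those white regions. RETR_EXTERNAL keeps only the outermost outline of each blob, and CHAIN_APPROX_SIMPLE stores straight runs as just their end points.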

Per-region drawing

for c in contours: walks each moving region.
if cv2.contourArea(c) < 500: continue ignores small blobs - noise, a leaf, a shadow - and only reacts to something sizeable.
cv2.boundingRect(c) gets a box around the motion; cv2.rectangle(...) draws it in red ((0, 0, 255) in OpenCV's BGR colour order).
motion = True records that this frame had real movement, which triggers the alert below.
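The per-region logic can be sketched without OpenCV as a flood fill over a binary mask. The find_regions helper is a hypothetical stand-in for findContours plus boundingRect, and it counts pixels for the size check, whereas cv2.contourArea measures the area enclosed by the outline, so the numbers will not match exactly.

```python
from collections import deque

def find_regions(mask, min_area=4):
    """Return (x, y, w, h) boxes for 8-connected white regions of at least min_area pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Flood-fill one region, collecting its pixels.
            seen[y0][x0] = True
            queue, pixels = deque([(x0, y0)]), []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for ny in range(max(y - 1, 0), min(y + 2, h)):
                    for nx in range(max(x - 1, 0), min(x + 2, w)):
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            if len(pixels) < min_area:
                continue                      # too small: treat as noise
            xs = [p[0] for p in pixels]
            ys = [p[1] for p in pixels]
            boxes.append((min(xs), min(ys),
                          max(xs) - min(xs) + 1, max(ys) - min(ys) + 1))
    return boxes

# Hypothetical mask: one 3x2 blob and one isolated noise pixel.
mask = [
    [0,   0,   0,   0, 0,   0, 0],
    [0, 255, 255, 255, 0,   0, 0],
    [0, 255, 255, 255, 0, 255, 0],
    [0,   0,   0,   0, 0,   0, 0],
]
boxes = find_regions(mask, min_area=4)
motion = bool(boxes)
print(boxes, motion)
```

Only the larger blob survives the size filter and yields a box; the lone pixel is dropped. The motion flag is simply whether any box remains, which mirrors how the script sets motion = True inside the loop.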

Alerting and snapshotting

cv2.putText(frame, "Burglar Detected!", ...) stamps the alert in red across the top of the frame.
if time.time() - last_saved > 3: is the throttle - only save a snapshot if more than 3 seconds have passed since the last one, so we don't write hundreds of near-identical frames.
cv2.imwrite(f"captures/burglar_{int(time.time())}.jpg", frame) writes a timestamped JPG, then last_saved = time.time() resets the throttle.
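The throttle can also be pulled out into a small helper, which makes it testable without a camera. The SaveThrottle class and its injectable clock are assumptions made for this sketch; it defaults to time.monotonic, which is unaffected by system clock changes, while the snapshot filename would still use time.time() for a wall-clock timestamp.

```python
import time

class SaveThrottle:
    """Allow an action at most once every `interval` seconds."""

    def __init__(self, interval=3.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self.last = None          # no save has happened yet

    def ready(self):
        """Return True (and restart the timer) if enough time has passed."""
        now = self.clock()
        if self.last is None or now - self.last > self.interval:
            self.last = now
            return True
        return False

# Drive it with a fake clock so the behaviour is visible without waiting.
fake_times = iter([0.0, 1.0, 2.9, 3.5, 4.0, 7.0])
throttle = SaveThrottle(interval=3.0, clock=lambda: next(fake_times))
decisions = [throttle.ready() for _ in range(6)]
print(decisions)
```

In the main loop this would read as if motion and throttle.ready(): followed by the cv2.imwrite call. The first detection always saves, and later ones are skipped until more than 3 seconds have passed since the last save.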

Showing and cleanup

cv2.imshow("Burglar Detection", frame) displays the annotated frame.
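if cv2.waitKey(1) & 0xFF == ord('q'): break waits about 1 ms for a key press and quits on q - the standard way to make a real-time OpenCV window closable. The call also gives the window a chance to redraw, so the loop needs it even if you never press a key.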
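The & 0xFF mask keeps only the low byte of the key code before comparing. A small sketch of that comparison follows; is_quit_key is a hypothetical helper, and the value with extra high bits is a made-up example of what some platforms are reported to return.

```python
def is_quit_key(raw_key, quit_char="q"):
    """True if the low byte of a waitKey-style return value matches quit_char."""
    return raw_key & 0xFF == ord(quit_char)

plain = is_quit_key(ord("q"))                  # an ordinary 'q' press
with_high_bits = is_quit_key(0x100000 | ord("q"))   # same key, extra bits set
no_key = is_quit_key(-1)                       # waitKey returns -1 on timeout
print(plain, with_high_bits, no_key)
```

In Python the & operator binds tighter than ==, so the expression reads as (raw_key & 0xFF) == ord("q") without extra parentheses.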
cam.release() and cv2.destroyAllWindows() free the camera and close the window when the loop ends.

Key insight

The lesson here is that not every vision problem needs a neural network. A genuinely useful burglar detector is just the difference between two frames: absdiff to find what moved, threshold + dilate + findContours to clean it into regions, and a minimum-area filter to ignore noise. It runs on anything, needs no training data, and costs almost nothing.

The flip side is what it can't do: with no model it knows that something moved, never what. It can't tell a person from a pet or a swaying branch from an intruder. The natural upgrade is to feed the motion regions into the YOLOv8 detector from the companion Computer Vision with OpenCV & YOLOv8 project - cheap motion-gating first, then a model only when something actually moves.
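One way the motion-gating idea could be wired up, as a sketch only: gate_detections and fake_classifier are hypothetical names, and the stub stands in for a real detector rather than reproducing YOLOv8's API. The point is that the expensive call runs only on regions the frame differencing has already flagged.

```python
def gate_detections(frame, boxes, classify, wanted=("person",)):
    """Run `classify` only on motion boxes and keep the labels we care about."""
    hits = []
    for x, y, w, h in boxes:
        crop = [row[x:x + w] for row in frame[y:y + h]]   # region of interest
        label = classify(crop)
        if label in wanted:
            hits.append(((x, y, w, h), label))
    return hits

calls = []

def fake_classifier(crop):
    """Stand-in for a real model: tall regions are 'person', the rest 'cat'."""
    calls.append(crop)
    return "person" if len(crop) >= 3 else "cat"

frame = [[0] * 6 for _ in range(6)]            # placeholder 6x6 image
motion_boxes = [(0, 0, 2, 4), (3, 4, 3, 2)]    # (x, y, w, h) from the motion step

hits = gate_detections(frame, motion_boxes, fake_classifier)
quiet = gate_detections(frame, [], fake_classifier)   # no motion: model never runs
print(hits, quiet, len(calls))
```

With a NumPy frame from OpenCV the crop would be frame[y:y + h, x:x + w] instead of the nested-list slice. On a quiet scene the list of boxes is empty and the classifier is never called, which is where the savings come from.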

Tech stack

Python 3.12
OpenCV (cv2) - camera I/O, grayscale, blur, absdiff, threshold, dilate, findContours, drawing and display
No model / no training - pure frame differencing
os and time - snapshot folder, timestamps and the save throttle

Reference

OpenCV - absdiff - Core array operations
OpenCV - Contours - Contours: Getting Started
OpenCV - Image Thresholding - Thresholding tutorial

Motion-Based Burglar Detection