Compare Perceptual Similarity between two Videos with Feluda

This notebook demonstrates how to use the VideoHash operator to generate perceptual hashes for two videos and compare their similarity. It downloads two sample videos, computes a TMK hash for each, and compares the hashes to estimate how similar the videos are.

Install dependencies conditionally based on whether the notebook is running in Colab or locally.

%%time
import sys

IN_COLAB = "google.colab" in sys.modules
print("Running Notebook in Google Colab" if IN_COLAB else "Running Notebook locally")

if IN_COLAB:
    # Since Google Colab has preinstalled libraries like tensorflow and numba, we create a folder called feluda_custom_venv and isolate the environment there.
    # This is done to avoid any conflicts with the preinstalled libraries.
    %pip install uv
    !mkdir -p /content/feluda_custom_venv
    !uv pip install --target=/content/feluda_custom_venv --prerelease allow feluda "feluda-video-hash-tmk" > /dev/null 2>&1

    sys.path.insert(0, "/content/feluda_custom_venv")
else:
    !uv pip install feluda "feluda-video-hash-tmk" > /dev/null 2>&1
Running Notebook locally
Using Python 3.10.12 environment at: /home/aatman/Aatman/Tattle/feluda/.venv
Audited 6 packages in 11ms
CPU times: user 6.38 ms, sys: 4.13 ms, total: 10.5 ms
Wall time: 138 ms
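
If you want to confirm which environment the packages were actually resolved from (particularly on Colab, where the isolated /content/feluda_custom_venv directory is prepended to sys.path), a minimal optional sanity check is to import feluda and print where it was loaded from. This snippet is an added check, not part of the original walkthrough.

import feluda

# On Colab this should point inside /content/feluda_custom_venv;
# locally it should point inside your project's virtual environment.
print(feluda.__file__)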

We’ll use one operator, VideoHash, for this example, together with the VideoFactory helper to download the sample videos.

from feluda.factory import VideoFactory
from feluda.operators import VideoHash

hasher = VideoHash()
Downloading TMK binary to /var/folders/4p/bw6h5x8x1nb_17vsgfc12dz00000gn/T/tmk-hash-video

VIDEO_URLS = [
    "https://github.com/tattle-made/feluda_datasets/blob/main/feluda-sample-media/en-speech.mp4",
    "https://github.com/tattle-made/feluda_datasets/blob/main/feluda-sample-media/hi-speech.mp4",
]

In the code block below, we compute a perceptual hash for each of the two videos using the feluda-video-hash-tmk operator. Under the hood, the operator calls the compiled TMK binary from Facebook Research.

hashes = []
for i, video_url in enumerate(VIDEO_URLS, 1):
    # Convert GitHub blob URL to CDN raw URL for direct download
    raw_url = video_url.replace("/blob/", "/raw/")

    # Download video using VideoFactory
    video_obj = VideoFactory.make_from_url(raw_url)
    video_path = video_obj["path"]

    # Generate TMK hash for the video
    # Returns Base64-encoded pure average feature vector
    hash_value = hasher.run(video_path)
    hashes.append(hash_value)

    # Display hash information
    print(f"Video {i} URL: {video_url}")
    print(f"TMK Hash: {hash_value[:50]}...")  # Show first 50 characters
    print(f"Hash Length: {len(hash_value)} characters")
    print()

# Compare the two hashes for similarity
import base64

hash1, hash2 = hashes

# Decode Base64 hashes to compare raw feature vectors
try:
    decoded1 = base64.b64decode(hash1)
    decoded2 = base64.b64decode(hash2)

    # Calculate similarity metrics
    hash_length = len(decoded1)
    print(f"Feature vector length: {hash_length} bytes")

    # Simple byte-by-byte comparison
    identical_bytes = sum(1 for a, b in zip(decoded1, decoded2, strict=False) if a == b)
    similarity_percentage = (identical_bytes / hash_length) * 100

    print(f"Identical bytes: {identical_bytes}/{hash_length}")
    print(f"Similarity: {similarity_percentage:.2f}%")

    # Interpret similarity
    if similarity_percentage > 80:
        print("Result: HIGH SIMILARITY - Videos are likely very similar")
    elif similarity_percentage > 50:
        print("Result: MODERATE SIMILARITY - Videos share some characteristics")
    else:
        print("Result: LOW SIMILARITY - Videos are likely different")

except Exception as e:
    print(f"Error comparing hashes: {e}")
# Clean up resources when you're done

hasher.cleanup()