VideoHash Operator
Description
The VideoHash operator generates perceptual hashes from video files using the TMK (Temporal Match Kernel) algorithm. It extracts the pure average feature from each video and returns it as a Base64-encoded string. This makes the operator useful for video similarity detection, duplicate detection, and content fingerprinting.
Model Information
Algorithm: TMK (Temporal Match Kernel) + PDQF (the floating-point variant of the PDQ perceptual hash)
Source: Facebook AI Research
Hash Type: Pure average feature vector
Output Format: Base64-encoded string
Usage: The operator processes video files to generate perceptual hashes that can be used for video similarity comparison, duplicate detection, and content fingerprinting.
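The similarity comparison mentioned above can be sketched as follows. This is a minimal illustration, not the operator's own comparison routine: it assumes the Base64 string wraps a packed array of little-endian 32-bit floats, which this page does not confirm, and the helpers decode_feature, encode_feature, and cosine_similarity are hypothetical names introduced only for this example.

```python
import base64
import math
import struct


def decode_feature(hash_b64: str) -> list[float]:
    """Decode a Base64 hash string into a list of floats.

    ASSUMPTION: the payload is a packed array of little-endian float32
    values. Check the operator's source before relying on this layout.
    """
    raw = base64.b64decode(hash_b64)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


def encode_feature(values: list[float]) -> str:
    """Hypothetical inverse of decode_feature, used only to build demo input."""
    packed = struct.pack(f"<{len(values)}f", *values)
    return base64.b64encode(packed).decode("ascii")


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Similarity is undefined for a zero vector; treat it as no match.
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Synthetic stand-ins for two hashes returned by the operator.
    hash_a = encode_feature([1.0, 0.5, 2.0])
    hash_b = encode_feature([1.0, 0.5, 1.5])
    score = cosine_similarity(decode_feature(hash_a), decode_feature(hash_b))
    print(f"Cosine similarity: {score:.4f}")
```

A score close to 1.0 suggests near-duplicate content; the threshold at which two videos count as a match is a tuning decision and is not specified here.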
Dependencies
FFmpeg
On Windows, you have two options:
Download FFmpeg from ffmpeg.org and add it to your PATH
Run winget install ffmpeg from an elevated PowerShell (make sure winget is installed first)
On Linux/macOS, install FFmpeg via your package manager (e.g., sudo apt install ffmpeg)
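Whichever install method you use, it can help to confirm that FFmpeg is actually reachable before running the operator. The sketch below uses only the Python standard library; find_ffmpeg and ffmpeg_version_line are hypothetical helper names for this example and are not part of feluda.

```python
import shutil
import subprocess
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg executable on PATH, or None."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line(executable: str) -> str:
    """Return the first line printed by `ffmpeg -version`."""
    result = subprocess.run(
        [executable, "-version"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("FFmpeg was not found on PATH; install it before running the operator.")
    else:
        print(f"Found FFmpeg at {path}")
        print(ffmpeg_version_line(path))
```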
How to Run the Tests
Ensure that you are in the root directory of the feluda project.
Install dependencies (in your virtual environment):
uv pip install "./operators/video_hash"
uv pip install "feluda[dev]"
Ensure FFmpeg is installed and available in your PATH.
Run the tests:
pytest operators/video_hash/test.py
Usage
Using the Class-based Operator (Recommended)
from feluda.factory import VideoFactory
from feluda.operators import VideoHash
# Initialize the operator
operator = VideoHash()
url = "https://github.com/tattle-made/feluda_datasets/raw/main/feluda-sample-media/sample-cat-video.mp4"
# Load a video
video = VideoFactory.make_from_url(url)
video_path = video["path"]
# Generate hash
hash_value = operator.run(video_path)
print(f"Hash: {hash_value}")
- class operators.video_hash.video_hash.VideoHash
Bases: Operator
Operator to hash video files using the TMK+PDQF binary.
- static extract_pure_average_feature(tmk_data: bytes) -> list[float]
Extract the pure average feature from TMK binary data.