VidVecRep Operator
Description
The VidVecRep operator extracts vector representations from videos using the CLIP-ViT-B-32 model. It works by extracting I-frames (keyframes) from a video file using FFmpeg, then generating a 512-dimensional feature vector for each frame using the CLIP model. The operator yields both the average vector for the video and vectors for each I-frame.
Model Information
Model: CLIP ViT-B/32
Source: OpenAI, via HuggingFace Transformers
Vector Size: 512
Usage: The model is used to generate embeddings for video frames, enabling downstream tasks such as video similarity, clustering, and zero-shot classification.
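For illustration, a single video frame can be embedded with HuggingFace Transformers as sketched below (a minimal sketch, assuming the openai/clip-vit-base-patch32 checkpoint and a hypothetical frame path; this is not the operator's internal code):
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load the pre-trained CLIP ViT-B/32 checkpoint (assumed model id).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

frame = Image.open("frame_001.jpg")  # hypothetical example frame
inputs = processor(images=frame, return_tensors="pt")
with torch.no_grad():
    features = model.get_image_features(**inputs)  # image branch yields a 512-dim vector
print(features.shape)  # torch.Size([1, 512])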
System Dependencies
Python >= 3.10
FFmpeg
On Windows, you have two options:
- Download FFmpeg from ffmpeg.org and add it to your PATH.
- Run winget install ffmpeg from an elevated PowerShell (make sure winget is installed first).
On Linux/macOS, install via your package manager (e.g., sudo apt install ffmpeg).
Operator Dependencies
PyTorch >= 2.6.0
Torchvision >= 0.21.0
Transformers >= 4.51.1
Pillow >= 11.1.0
How to Run the Tests
Ensure that you are in the root directory of the feluda project.
Install dependencies (in your virtual environment):
uv pip install "./operators/vid_vec_rep"
uv pip install "feluda[dev]"
Ensure FFmpeg is installed and available in your PATH.
Run the tests:
pytest operators/vid_vec_rep/test.py
Usage
from feluda.factory import VideoFactory
from feluda.operators import VidVecRep
# Initialize the operator
operator = VidVecRep()
# Load a video
video = VideoFactory.make_from_file_on_disk("example.mp4")
# Extract features
frames = operator.run(video, remove_after_processing=False)
for image in frames:
    print(image.keys())
# Cleanup
operator.cleanup()
- class operators.vid_vec_rep.vid_vec_rep.VidVecRep[source]
Bases: Operator
Operator to extract video vector representations using CLIP-ViT-B-32.
- static validate_system() → None[source]
Validate that required system dependencies are available.
Checks if FFmpeg is installed and accessible in the system PATH.
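A minimal sketch of such a check (illustrative only, not the operator's actual implementation):
import shutil

# Fail early if the ffmpeg binary is not on the system PATH.
if shutil.which("ffmpeg") is None:
    raise RuntimeError("FFmpeg is not installed or not available in PATH")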
- get_mean_feature() → torch.Tensor[source]
Compute the mean feature vector from the feature matrix.
- Returns:
Mean feature vector
- Return type:
torch.Tensor
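Conceptually, this is a mean over the frame axis of the (batch, 512) feature matrix, e.g. (illustrative sketch with random stand-in features):
import torch

feature_matrix = torch.randn(10, 512)      # stand-in for 10 I-frame embeddings
mean_feature = feature_matrix.mean(dim=0)  # shape: (512,)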
- analyze(fname: str) → None[source]
Analyze the video file and extract features.
- Parameters:
fname (str) – Path to the video file
- static extract_frames(fname: str) → list[PIL.Image.Image][source]
Extract I-frames from the video file using ffmpeg.
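One way to select I-frames with FFmpeg's select filter is sketched below (a rough illustration; the output pattern, temporary directory, and exact ffmpeg flags are assumptions and may differ from the operator's invocation):
import glob
import subprocess
import tempfile
from PIL import Image

def extract_iframes(fname: str) -> list[Image.Image]:
    # Keep only I-frames via the select filter and write them out as JPEGs.
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.run(
            [
                "ffmpeg", "-loglevel", "error", "-i", fname,
                "-vf", "select='eq(pict_type,I)'",
                "-vsync", "vfr",
                f"{tmp}/frame_%03d.jpg",
            ],
            check=True,
        )
        images = []
        for path in sorted(glob.glob(f"{tmp}/frame_*.jpg")):
            with Image.open(path) as im:
                images.append(im.copy())  # copy() loads pixel data before the temp dir is removed
        return images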
- extract_features(images: list) → torch.Tensor[source]
Extract features from a list of images using pre-trained CLIP-ViT-B-32.
- Parameters:
images (list) – List of PIL Images
- Returns:
Feature matrix of shape (batch, 512)
- Return type:
torch.Tensor
- gendata() → Generator[dict, None, None][source]
Yield video vector representations from the VidVecRep prototype.
- Yields:
dict –
- A dictionary containing:
vid_vec (list): Vector representation
is_avg (bool): A flag indicating whether the vector is the average vector or an I-frame vector
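For example, the yielded dictionaries can be consumed as sketched below (an illustrative sketch that assumes analyze() is called before gendata(), based on the signatures above):
from feluda.operators import VidVecRep

operator = VidVecRep()
operator.analyze("example.mp4")  # assumed prerequisite: populates the feature matrix

# Separate the average vector from the per-I-frame vectors.
avg_vec = None
frame_vecs = []
for item in operator.gendata():
    if item["is_avg"]:
        avg_vec = item["vid_vec"]            # the average vector for the whole video
    else:
        frame_vecs.append(item["vid_vec"])   # one vector per I-frame

print(len(avg_vec), len(frame_vecs))  # 512, number of I-frames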