ImageVecRep-Resnet Operator

Description

The ImageVecRep-Resnet operator extracts vector representations from images using the ResNet18 model. For each input image it produces a 512-dimensional feature vector, which can be used for downstream tasks such as image similarity, clustering, and classification. The operator loads ResNet18 with pre-trained ImageNet weights and takes its features from the average pooling layer, which is the last layer before the classification head.
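To illustrate what "features from the average pooling layer" means, the sketch below applies global average pooling to a stand-in feature map. It assumes ResNet18's usual final convolutional output of 512 channels on a 7x7 grid (for a 224x224 input) and uses random values instead of real activations. It is an illustration, not the operator's code. The float16 cast mirrors the documented output dtype.

```python
import numpy as np


def global_average_pool(feature_map: np.ndarray) -> np.ndarray:
    """Average a (channels, height, width) feature map over its spatial axes.

    ResNet18's last convolutional stage has 512 channels, so pooling
    leaves one value per channel: a 512-dimensional vector.
    """
    if feature_map.ndim != 3:
        raise ValueError("expected a (channels, height, width) array")
    return feature_map.mean(axis=(1, 2))


# Stand-in for ResNet18's final convolutional output on a 224x224 image:
# 512 channels on a 7x7 spatial grid (random values, for illustration only).
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((512, 7, 7), dtype=np.float32)

vector = global_average_pool(feature_map).astype(np.float16)
print(vector.shape, vector.dtype)  # (512,) float16
```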

Model Information

  • Model: ResNet18

  • Source: PyTorch Vision Models

  • Vector Size: 512

  • Usage: The model is used to generate embeddings for images, enabling downstream tasks such as image similarity, clustering, and classification.
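For the image-similarity use case, a common approach is to compare two embeddings by cosine similarity. The minimal sketch below assumes two vectors shaped like the operator's output (512 elements, float16). The helper name `cosine_similarity` is hypothetical, and random vectors stand in for real embeddings.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    # Upcast from float16 so the dot product and norms avoid float16 precision loss.
    a = a.astype(np.float32)
    b = b.astype(np.float32)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        raise ValueError("cannot compare a zero vector")
    return float(np.dot(a, b) / denom)


# Hypothetical 512-d embeddings standing in for two operator outputs.
rng = np.random.default_rng(42)
emb_a = rng.random(512).astype(np.float16)
emb_b = emb_a.copy()  # same image twice, so similarity is 1.0
emb_c = rng.random(512).astype(np.float16)

print(round(cosine_similarity(emb_a, emb_b), 3))  # 1.0
print(cosine_similarity(emb_a, emb_c))
```

Upcasting to float32 before the arithmetic is a deliberate choice. The stored float16 vectors save memory, but accumulating a 512-term dot product in float16 loses noticeable precision.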

System Dependencies

  • Python >= 3.10

How to Run the Tests

  1. Ensure that you are in the root directory of the feluda project.

  2. Install dependencies (in your virtual environment):

    uv pip install "./operators/image_vec_rep"
    uv pip install "feluda[dev]"
    
  3. Run the tests:

    pytest operators/image_vec_rep/test.py
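The contents of `test.py` are not shown on this page. If you add your own checks, the documented output contract (a NumPy array of shape `(512,)` with dtype float16) can be captured in a small helper. `check_feature_vector` below is hypothetical. It is exercised on a dummy vector so it runs without downloading the model.

```python
import numpy as np


def check_feature_vector(features: np.ndarray, dim: int = 512) -> None:
    """Raise AssertionError if `features` does not match the documented output."""
    assert isinstance(features, np.ndarray), "expected a NumPy array"
    assert features.shape == (dim,), f"expected shape ({dim},), got {features.shape}"
    assert features.dtype == np.float16, f"expected float16, got {features.dtype}"
    assert np.all(np.isfinite(features)), "vector contains NaN or inf"


# Validate a dummy vector in place of a real operator.run(...) result.
check_feature_vector(np.zeros(512, dtype=np.float16))
```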
    

Usage

    from feluda.factory import ImageFactory
    from feluda.operators import ImageVecRep

    # Initialize the operator
    operator = ImageVecRep()

    # Load an image
    image_obj = ImageFactory.make_from_url("https://example.com/image.jpg")

    # Extract features
    features = operator.run(image_obj)
    print(f"Feature vector shape: {features.shape}")  # (512,)
    print(f"Feature vector dtype: {features.dtype}")  # float16

    # Cleanup
    operator.cleanup()
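A typical next step after extracting one vector is comparing it against a collection of previously extracted vectors. The sketch below stacks vectors into an N x 512 matrix and returns the index of the closest one by cosine similarity. The helper `most_similar` is hypothetical, and random vectors stand in for real `operator.run(...)` results.

```python
import numpy as np


def most_similar(query: np.ndarray, gallery: np.ndarray) -> int:
    """Return the row index in `gallery` (N x 512) closest to `query` by cosine similarity."""
    q = query.astype(np.float32)
    g = gallery.astype(np.float32)
    # L2-normalize so that a plain dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    g = g / np.linalg.norm(g, axis=1, keepdims=True)
    scores = g @ q
    return int(np.argmax(scores))


# Hypothetical gallery: three embeddings as operator.run(...) might return them.
rng = np.random.default_rng(7)
gallery = rng.random((3, 512)).astype(np.float16)
# A query that is a slightly perturbed copy of gallery row 1.
query = gallery[1] + rng.normal(0, 0.01, 512).astype(np.float16)

print(most_similar(query, gallery))  # 1
```

For large collections, a dedicated vector index would replace the brute-force matrix product, but the scoring idea is the same.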

class operators.image_vec_rep.image_vec_rep.ImageVecRep

Bases: Operator

Operator to extract image vector representations using ResNet18.

__init__() -> None

Initializes the ImageVecRep operator with a pre-trained ResNet18 model.

extract_feature(img: PIL.Image.Image) -> numpy.ndarray

Extracts a 512-dimensional feature vector from a PIL Image using ResNet18.

Parameters:

img (Image.Image) – Input image (must be a PIL Image).

Returns:

512-dimensional feature vector (float16).

Return type:

np.ndarray
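This page does not document how `extract_feature` preprocesses its input. torchvision's ImageNet-pretrained models are conventionally fed RGB images scaled to [0, 1] and normalized with the ImageNet per-channel mean and standard deviation (usually after resizing and center-cropping to 224x224). Assuming the operator follows that convention, the normalization step looks roughly like the NumPy sketch below. `normalize_rgb` is a hypothetical name, and resizing and cropping are omitted.

```python
import numpy as np

# Standard ImageNet channel statistics used with torchvision's pretrained models.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_rgb(pixels: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) uint8 RGB array into a (3, H, W) normalized float32 array."""
    if pixels.ndim != 3 or pixels.shape[2] != 3:
        raise ValueError("expected an (H, W, 3) RGB array")
    scaled = pixels.astype(np.float32) / 255.0  # [0, 255] -> [0, 1]
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD  # per-channel standardization
    return normalized.transpose(2, 0, 1)  # HWC -> CHW, the layout PyTorch models expect


# A 224x224 mid-grey image stands in for np.asarray(pil_image.convert("RGB")).
grey = np.full((224, 224, 3), 128, dtype=np.uint8)
chw = normalize_rgb(grey)
print(chw.shape)  # (3, 224, 224)
```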

run(image_obj: ImageFactory) -> numpy.ndarray

Runs the operator on an image object created by ImageFactory (for example, with ImageFactory.make_from_url) and returns its feature vector.

Parameters:

image_obj (dict) – Dictionary with key ‘image’ containing a PIL Image.

Returns:

512-dimensional feature vector (float16).

Return type:

np.ndarray
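Judging from the parameter description, `run` unwraps the dictionary's `'image'` entry and passes it on for feature extraction. The sketch below models that flow with a hypothetical `run_sketch` function and a stub extractor in place of `extract_feature`. It is an assumption about the control flow, not the operator's actual source.

```python
from typing import Any, Callable

import numpy as np


def run_sketch(image_obj: dict, extract: Callable[[Any], np.ndarray]) -> np.ndarray:
    """Pull the image out of an ImageFactory-style dict and embed it."""
    if "image" not in image_obj:
        raise KeyError("image_obj must contain an 'image' key")
    return extract(image_obj["image"])


# Stub extractor in place of ImageVecRep.extract_feature (no model needed).
def fake_extract(img: Any) -> np.ndarray:
    return np.zeros(512, dtype=np.float16)


image_obj = {"image": object()}  # a real call would hold a PIL Image here
print(run_sketch(image_obj, fake_extract).shape)  # (512,)
```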

state() -> dict

Returns the current state of the operator.

Returns:

A dictionary describing the operator's current state.

Return type:

dict

cleanup() -> None

Cleans up resources used by the operator.
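Because `cleanup()` releases the operator's resources, it is worth making sure it runs even when feature extraction raises an error. One way is a small context manager built on `contextlib.contextmanager`. `managed` and `StubOperator` below are hypothetical helpers. Real code would pass an `ImageVecRep` instance instead of the stub.

```python
from contextlib import contextmanager
from typing import Iterator, Protocol, TypeVar


class SupportsCleanup(Protocol):
    def cleanup(self) -> None: ...


T = TypeVar("T", bound=SupportsCleanup)


@contextmanager
def managed(operator: T) -> Iterator[T]:
    """Yield the operator and always call cleanup(), even if an exception is raised."""
    try:
        yield operator
    finally:
        operator.cleanup()


# Stub standing in for ImageVecRep so the example runs without the model.
class StubOperator:
    def __init__(self) -> None:
        self.cleaned = False

    def run(self, image_obj: dict) -> list:
        return [0.0] * 512

    def cleanup(self) -> None:
        self.cleaned = True


op = StubOperator()
with managed(op) as active:
    features = active.run({"image": None})
print(op.cleaned)  # True
```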