Dimension Reduction Operator

Description

The Dimension Reduction operator reduces high-dimensional embeddings to a lower-dimensional space using either the t-SNE or the UMAP algorithm. It is typically used to project embeddings to 2D or 3D for visualization and analysis while preserving as much of the neighborhood structure of the data as possible.

Model Information

  • t-SNE: t-Distributed Stochastic Neighbor Embedding for non-linear dimensionality reduction

  • UMAP: Uniform Manifold Approximation and Projection for scalable dimension reduction

  • Output Vector Size: Configurable through n_components (typically 2 or 3 dimensions for visualization)

  • Usage: Reduces high-dimensional embeddings to lower dimensions for visualization, clustering, and analysis
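For a concrete sense of what the reduction step does, the sketch below calls scikit-learn's TSNE directly on a small random matrix. It does not go through the operator, and the array shape and parameter values are illustrative assumptions rather than recommended settings.

```python
import numpy as np
from sklearn.manifold import TSNE

# Ten 16-dimensional random vectors standing in for real embeddings (assumed data).
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(10, 16))

# perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=3, random_state=42)
reduced = tsne.fit_transform(embeddings)

print(reduced.shape)  # (10, 2)
```

Each input row maps to one output row, so payloads can be matched back to their reduced vectors by position.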

Dependencies

  • scikit-learn >= 1.6.1

  • numpy >= 1.26,<2.2.0

  • umap-learn >= 0.5.0

How to Run the Tests

  1. Ensure that you are in the root directory of the feluda project.

  2. Install dependencies (in your virtual environment):

    uv pip install "./operators/dimension_reduction"
    uv pip install "feluda[dev]"
    
  3. Run the tests:

    pytest operators/dimension_reduction/test.py
    

Usage

from feluda.operators import DimensionReduction

# Initialize with t-SNE
operator = DimensionReduction("tsne", {
    "n_components": 2,
    "perplexity": 2,
    "random_state": 42
})

# Prepare input data
input_data = [
    {"payload": "sample_1", "embedding": [1.0, 2.0, 3.0, 4.0, 5.0]},
    {"payload": "sample_2", "embedding": [2.0, 3.0, 4.0, 5.0, 6.0]},
    {"payload": "sample_3", "embedding": [3.0, 4.0, 5.0, 6.0, 7.0]}
]


# Run dimension reduction
result = operator.run(input_data)

# Access results
for item in result:
    print(f"Payload: {item['payload']}")
    print(f"Reduced embedding: {item['reduced_embedding']}")

class operators.dimension_reduction.dimension_reduction.DimensionReduction(model_type: str, params: dict[str, Any] | None = None)

Bases: Operator

Main interface for dimensionality reduction.

__init__(model_type: str, params: dict[str, Any] | None = None) -> None

Initialize the dimension reduction operator.

Parameters:
  • model_type – Type of model to use (‘tsne’ or ‘umap’)

  • params – Optional dictionary of parameters for the model

Raises:

ValueError – If the model type is not supported or initialization fails

static get_reduction_model(model_type: str, params: dict[str, Any]) -> ReductionModel

Create a dimension reduction model based on the model type.

Parameters:
  • model_type – Type of model (‘tsne’ or ‘umap’)

  • params – Dictionary of parameters for the model

Returns:

A dimension reduction model instance
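A factory of this kind can be sketched as a simple dispatch on the model type. The function below is a hypothetical illustration, not the operator's actual code: the name make_reducer is invented here, and it returns the raw scikit-learn or umap-learn estimator rather than a ReductionModel wrapper.

```python
from typing import Any


def make_reducer(model_type: str, params: dict[str, Any]):
    """Hypothetical factory: return an unfitted reducer for the given model type."""
    if model_type == "tsne":
        from sklearn.manifold import TSNE  # imported lazily, only when requested

        return TSNE(**params)
    if model_type == "umap":
        import umap  # provided by the umap-learn package

        return umap.UMAP(**params)
    raise ValueError(f"Unsupported model type: {model_type!r}")


reducer = make_reducer("tsne", {"n_components": 2, "perplexity": 2, "random_state": 42})
print(type(reducer).__name__)  # TSNE
```

Importing each backend inside its own branch means a caller who only uses t-SNE does not need umap-learn to be importable.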

static gen_data(payloads: list, reduced_embeddings: numpy.ndarray) -> list[dict]

Generates the formatted output.

Parameters:
  • payloads (list) – List of payloads.

  • reduced_embeddings (numpy.ndarray) – An array of reduced embeddings.

Returns:

A list of dictionaries, each containing a payload and its corresponding reduced embedding.

Return type:

list
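The formatting step amounts to pairing each payload with the row at the same position in the reduced array. The sketch below uses a hypothetical function name, format_output, and assumes each row is converted to a plain Python list; the operator's own implementation may differ.

```python
import numpy as np


def format_output(payloads: list, reduced_embeddings: np.ndarray) -> list[dict]:
    """Hypothetical stand-in for gen_data: pair payloads with reduced rows."""
    return [
        {"payload": payload, "reduced_embedding": row.tolist()}
        for payload, row in zip(payloads, reduced_embeddings)
    ]


rows = format_output(["123", "124"], np.array([[1.0, 2.0], [1.0, 0.0]]))
print(rows)
# [{'payload': '123', 'reduced_embedding': [1.0, 2.0]},
#  {'payload': '124', 'reduced_embedding': [1.0, 0.0]}]
```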

run(input_data: list[dict]) -> list[dict]

Reduce the dimensionality of the provided embeddings using the initialized model.

Parameters:

input_data (list) – A list of dictionaries containing payload and embeddings to be reduced.

Example:

    [
        {"payload": "123", "embedding": [1, 2, 3]},
        {"payload": "124", "embedding": [1, 0, 1]}
    ]

Returns:

The reduced embeddings and the corresponding payload as a list of dictionaries.

Return type:

list

Example:

    [
        {"payload": "123", "reduced_embedding": [1, 2]},
        {"payload": "124", "reduced_embedding": [1, 0]}
    ]

Raises:
  • ValueError – If input_data is not a list or is an empty list.

  • KeyError – If an item in input_data is missing a required key ("payload" or "embedding").
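The two error conditions can be illustrated with a small validation helper. This is a sketch under assumptions: split_input is a hypothetical name, and the exact checks and messages in the operator are not shown in this document.

```python
def split_input(input_data: list[dict]) -> tuple[list, list]:
    """Hypothetical helper: validate input and separate payloads from embeddings."""
    if not isinstance(input_data, list) or len(input_data) == 0:
        raise ValueError("input_data must be a non-empty list")
    payloads, embeddings = [], []
    for index, item in enumerate(input_data):
        # Both keys are required on every item.
        if "payload" not in item or "embedding" not in item:
            raise KeyError(f"item {index} needs 'payload' and 'embedding' keys")
        payloads.append(item["payload"])
        embeddings.append(item["embedding"])
    return payloads, embeddings


payloads, embeddings = split_input([
    {"payload": "123", "embedding": [1, 2, 3]},
    {"payload": "124", "embedding": [1, 0, 1]},
])
print(payloads)    # ['123', '124']
print(embeddings)  # [[1, 2, 3], [1, 0, 1]]
```

Checking the input before any model call keeps malformed items from surfacing as less readable errors inside the reduction library.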

cleanup() -> None

Cleans up resources used by the operator.

state() -> dict

Returns the current state of the operator.

Returns:

State of the operator

Return type:

dict