DetectTextInImage Operator
Description
The DetectTextInImage operator extracts text from images using Tesseract OCR (Optical Character Recognition). It supports multiple languages including English, Hindi, Tamil, and Telugu. The operator processes image files and returns the detected text as a string.
Model Information
OCR Engine: Tesseract OCR
Source: Tesseract (open source; development was long sponsored by Google), via the pytesseract Python wrapper
Supported Languages: English (eng), Hindi (hin), Tamil (tam), Telugu (tel)
Usage: The operator uses Tesseract’s OCR capabilities to extract text from images, enabling downstream tasks such as text analysis, content moderation, and document processing.
System Dependencies
Tesseract OCR
On Windows:
Download the installer from UB-Mannheim’s Tesseract builds
Install it and add the installation directory to PATH
Install the language packs for Hindi, Tamil, and Telugu
On Linux:
sudo apt install tesseract-ocr tesseract-ocr-hin tesseract-ocr-tam tesseract-ocr-tel
On macOS:
brew install tesseract tesseract-lang
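Once the packages above are installed, a quick check can confirm that the tesseract binary is on PATH and that the four language packs are present. The following is a minimal sketch using only the Python standard library. The helper names (parse_list_langs, missing_languages, check_tesseract) are hypothetical and are not part of the operator, which runs its own checks in validate_system() and validate_languages().

```python
import shutil
import subprocess

# Language codes this operator expects (taken from the list above).
REQUIRED_LANGS = ("eng", "hin", "tam", "tel")


def parse_list_langs(output: str) -> set[str]:
    """Turn the output of `tesseract --list-langs` into a set of language codes."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # The first line is a header such as 'List of available languages ... (4):'.
    return {line for line in lines if not line.startswith("List of")}


def missing_languages(installed, required=REQUIRED_LANGS) -> list[str]:
    """Return the required language codes that are not installed, in order."""
    return [lang for lang in required if lang not in installed]


def check_tesseract() -> list[str]:
    """Hypothetical helper: raise if the binary is absent, else list missing packs."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("Tesseract is not installed or not in PATH")
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Some Tesseract versions may print the list to stderr, so read both streams.
    return missing_languages(parse_list_langs(result.stdout + result.stderr))
```

Calling check_tesseract() returns an empty list when all four language packs are available, and otherwise the codes that still need to be installed.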
Operator Dependencies
pytesseract >= 0.3.10
Pillow >= 11.1.0
How to Run the Tests
Ensure that you are in the root directory of the feluda project.
Install dependencies (in your virtual environment):
uv pip install "./operators/detect_text_in_image"
uv pip install "feluda[dev]"
Ensure Tesseract OCR is installed and available in your PATH.
Run the tests:
pytest operators/detect_text_in_image/test.py
Usage
from feluda.factory import ImageFactory
from feluda.operators import DetectTextInImage
# Initialize the operator
operator = DetectTextInImage()
# Load an image
image = ImageFactory.make_from_url_to_path("https://example.com/image.png")
# Extract text
text = operator.run(image, remove_after_processing=False)
print(text)
# Check operator state
state = operator.state()
print(f"PSM: {state['psm']}, OEM: {state['oem']}")
# Cleanup resources
operator.cleanup()
Configuration
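The operator accepts the following configuration parameters:
psm (int): Page segmentation mode (default: 6, which in Tesseract assumes a single uniform block of text)
oem (int): OCR engine mode (default: 1, which in Tesseract selects the LSTM neural-network engine)
tesseract_cmd (str | None): Listed in the constructor signature (default: None); presumably the path to the Tesseract executable when it is not on PATH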
These can be set during initialization:
operator = DetectTextInImage(psm=8, oem=3)
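To make the two numeric settings concrete: Tesseract's command line takes them as the --psm and --oem flags, and pytesseract forwards such flags through its config string. The sketch below builds that string and validates the ranges Tesseract documents (PSM 0 to 13, OEM 0 to 3). The function name build_tesseract_config is hypothetical, and how the operator passes these values internally is an assumption that the source does not confirm.

```python
def build_tesseract_config(psm: int = 6, oem: int = 1) -> str:
    """Build a Tesseract config string from page segmentation and engine modes.

    Hypothetical helper for illustration; defaults mirror the operator's.
    """
    if not 0 <= psm <= 13:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    if not 0 <= oem <= 3:
        raise ValueError(f"oem must be between 0 and 3, got {oem}")
    return f"--oem {oem} --psm {psm}"


# Tesseract accepts several languages at once, joined with '+'.
LANGS = "+".join(["eng", "hin", "tam", "tel"])

# With pytesseract installed, these values would typically be used as:
#   pytesseract.image_to_string(image, lang=LANGS, config=build_tesseract_config())
print(build_tesseract_config(psm=8, oem=3))
```

As a rule of thumb, PSM 6 suits images containing a block of text, while PSM 8 treats the image as a single word; OEM 3 lets Tesseract pick the default engine from what is available.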
- class operators.detect_text_in_image.detect_text_in_image.DetectTextInImage(psm: int = 6, oem: int = 1, tesseract_cmd: str | None = None)[source]
Bases:
Operator
Operator to detect text in images using Tesseract OCR.
- __init__(psm: int = 6, oem: int = 1, tesseract_cmd: str | None = None) None[source]
Initialize the DetectTextInImage class.
- validate_system() None[source]
Validate that Tesseract OCR is installed and accessible.
- Raises:
RuntimeError – If Tesseract is not installed or not in PATH.
- validate_languages() None[source]
Validate that required language packs are installed.
Checks for English, Hindi, Tamil, and Telugu language support.
- run(file: ImageFactory, remove_after_processing: bool = False) str[source]
Run the text detection operator.
- Parameters:
file (ImageFactory) – ImageFactory object
remove_after_processing (bool) – Whether to remove the file after processing
- Returns:
Detected text from the image
- Return type: