Utils¶

Shared utilities for logging, validation, error handling, and common operations.

🎯 Overview¶

The Utils Module provides shared utilities used throughout all Semantica modules. It includes logging, error handling, validation, progress tracking, and common helper functions.

What is the Utils Module?¶

The Utils module provides:

Logging: Structured logging with performance tracking
Error Handling: Custom exception hierarchy and error formatting
Validation: Data validation for entities, relationships, and configuration
Progress Tracking: Track long-running operations
Helpers: Common functions for text cleaning, hashing, file I/O
Type Definitions: Shared TypedDicts and Enums for type safety

Why Use the Utils Module?¶

Consistency: Shared utilities ensure consistent behavior across modules
Error Handling: Standardized error handling and reporting
Logging: Unified logging across all modules
Validation: Reusable validation functions
Type Safety: Shared type definitions for better IDE support

How It Works¶

The Utils module is used internally by all Semantica modules. You typically don't use it directly, but it provides:

Logging Functions: `get_logger()`, `setup_logging()`
Validation Functions: `validate_entity()`, `validate_relationship()`
Error Classes: Custom exceptions for different error types
Progress Tracking: `ProgressTracker` for long operations
Helper Functions: Text cleaning, hashing, file operations

Logging

Structured logging with performance tracking and error reporting
Error Handling

Custom exception hierarchy and error formatting
Validation

Data validation for entities, relationships, and configuration
Progress Tracking

Track long-running operations with console or file output
Helpers

Common functions for text cleaning, hashing, and file I/O
Type Definitions

Shared TypedDicts and Enums for type safety

When to Use

Development: Use setup_logging to configure output.
Data Cleaning: Use clean_text and normalize_entities.
Validation: Use validate_data before processing external input.
Debugging: Use log_performance to find bottlenecks.

⚙️ Key Components¶

Logging¶

Structured Output: JSON or formatted text logs.
Performance Metrics: Decorators for timing functions.
Quality Logging: Specialized loggers for data quality issues.

Validation¶

Schema Validation: Check dictionary structure against requirements.
Type Checking: Runtime type validation.
Constraint Checking: Numeric ranges, string lengths, regex patterns.

Progress Tracking¶

Multi-Environment: Supports Console (tqdm), Jupyter, and File logging.
Module Awareness: Tracks progress per module.

Main Classes¶

Logger¶

Centralized logging configuration.

Functions:

Function	Description
`setup_logging(level)`	Configure global logging
`get_logger(name)`	Get named logger instance
`log_performance(func)`	Decorator for timing

Example:

from semantica.utils import setup_logging, get_logger, log_performance

setup_logging(level="INFO")
logger = get_logger(__name__)

@log_performance
def process_data(data):
    logger.info(f"Processing {len(data)} items")

Validators¶

Data validation functions.

Functions:

Function	Description
`validate_entity(data)`	Check entity structure
`validate_config(cfg)`	Check configuration

Example:

from semantica.utils import validate_entity, ValidationError

try:
    validate_entity({"id": "1", "type": "PERSON"})
except ValidationError as e:
    print(f"Invalid entity: {e}")

ProgressTracker¶

Tracks execution progress.

Classes:

Class	Description
`ProgressTracker`	Main tracker interface
`ConsoleProgressDisplay`	CLI output

Example:

from semantica.utils import track_progress

for item in track_progress(items, desc="Processing"):
    process(item)

Convenience Functions¶

from semantica.utils import clean_text, hash_data, safe_filename

# Text cleaning
clean = clean_text("  Hello   World  ")  # "Hello World"

# Hashing
id = hash_data({"key": "value"})

# File safety
fname = safe_filename("My File?.txt")  # "My_File_.txt"

Configuration¶

Environment Variables¶

export SEMANTICA_LOG_LEVEL=DEBUG
export SEMANTICA_LOG_FORMAT=json
export SEMANTICA_PROGRESS_BAR=true

Best Practices¶

Use get_logger: Always use get_logger(__name__) instead of print for production code.
Validate Early: Validate input data at the boundary (Ingest/Parse) using validate_data.
Handle Exceptions: Catch SemanticaError for framework-specific errors.
Clean Text: Use clean_text before embedding or extraction to improve quality.

Cookbook¶

The Utils module is used throughout all Semantica modules. See any cookbook tutorial for examples of logging, validation, and error handling in practice.

Welcome to Semantica: See utils in action across all modules
Topics: Framework overview, all modules, utilities
Difficulty: Beginner
Use Cases: Understanding utility functions used throughout Semantica

Utils¶

🎯 Overview¶

What is the Utils Module?¶

Why Use the Utils Module?¶

How It Works¶

⚙️ Key Components¶

Logging¶

Validation¶

Progress Tracking¶

Main Classes¶

Logger¶

Validators¶

ProgressTracker¶

Convenience Functions¶

Configuration¶

Environment Variables¶

Best Practices¶

Cookbook¶

See Also¶