Utils¶
Shared utilities for logging, validation, error handling, and common operations.
🎯 Overview¶
The Utils module provides the shared utilities used by every Semantica module: logging, error handling, validation, progress tracking, and common helper functions.
What is the Utils Module?¶
The Utils module provides:
- Logging: Structured logging with performance tracking
- Error Handling: Custom exception hierarchy and error formatting
- Validation: Data validation for entities, relationships, and configuration
- Progress Tracking: Track long-running operations
- Helpers: Common functions for text cleaning, hashing, file I/O
- Type Definitions: Shared TypedDicts and Enums for type safety
Why Use the Utils Module?¶
- Consistency: Shared utilities ensure consistent behavior across modules
- Error Handling: Standardized error handling and reporting
- Logging: Unified logging across all modules
- Validation: Reusable validation functions
- Type Safety: Shared type definitions for better IDE support
How It Works¶
The Utils module is used internally by all Semantica modules. You typically don't use it directly, but it provides:
- Logging Functions: `get_logger()`, `setup_logging()`
- Validation Functions: `validate_entity()`, `validate_relationship()`
- Error Classes: Custom exceptions for different error types
- Progress Tracking: `ProgressTracker` for long operations
- Helper Functions: Text cleaning, hashing, file operations
- Logging: Structured logging with performance tracking and error reporting
- Error Handling: Custom exception hierarchy and error formatting
- Validation: Data validation for entities, relationships, and configuration
- Progress Tracking: Track long-running operations with console or file output
- Helpers: Common functions for text cleaning, hashing, and file I/O
- Type Definitions: Shared TypedDicts and Enums for type safety
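To make the "shared TypedDicts and Enums" idea concrete, here is a minimal sketch using only the standard library. `EntityType`, `EntityDict`, and `count_by_type` are hypothetical names invented for this example; the actual type definitions shipped in Semantica are not shown on this page and may differ.

```python
from enum import Enum
from typing import Dict, List, TypedDict


class EntityType(str, Enum):
    """Hypothetical enum of entity kinds (illustration only)."""
    PERSON = "PERSON"
    ORGANIZATION = "ORGANIZATION"


class EntityDict(TypedDict):
    """Hypothetical shape of an entity record (illustration only)."""
    id: str
    type: str
    text: str


def count_by_type(entities: List[EntityDict]) -> Dict[str, int]:
    """Count entities per type, rejecting types the enum does not know."""
    counts: Dict[str, int] = {}
    for entity in entities:
        kind = EntityType(entity["type"])  # raises ValueError for unknown types
        counts[kind.value] = counts.get(kind.value, 0) + 1
    return counts
```

The TypedDict gives editors and type checkers a shape to check against, while the Enum turns an unexpected type string into an immediate `ValueError` at runtime.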
When to Use
- Development: Use `setup_logging` to configure output.
- Data Cleaning: Use `clean_text` and `normalize_entities`.
- Validation: Use `validate_data` before processing external input.
- Debugging: Use `log_performance` to find bottlenecks.
⚙️ Key Components¶
Logging¶
- Structured Output: JSON or formatted text logs.
- Performance Metrics: Decorators for timing functions.
- Quality Logging: Specialized loggers for data quality issues.
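The structured-output and timing ideas above can be sketched with Python's standard `logging` module alone. `JsonFormatter`, `timed`, and `run_demo` are hypothetical names used for illustration; this is not Semantica's implementation, which may structure its records differently.

```python
import functools
import json
import logging
import time


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: renders each log record as one JSON object."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Include the timing field when it was attached via `extra=`.
        if hasattr(record, "elapsed_ms"):
            payload["elapsed_ms"] = record.elapsed_ms
        return json.dumps(payload)


def timed(logger):
    """Hypothetical decorator factory: logs how long the wrapped call took."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000.0
                logger.info("%s finished", func.__name__,
                            extra={"elapsed_ms": round(elapsed_ms, 3)})

        return wrapper

    return decorator


def run_demo():
    """Wire the formatter and decorator together and time one call."""
    logger = logging.getLogger("utils_logging_sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)

    @timed(logger)
    def process_data(data):
        return [x * 2 for x in data]

    return process_data([1, 2, 3])
```

Calling `run_demo()` writes one JSON line with an `elapsed_ms` field to standard error and returns the doubled list.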
Validation¶
- Schema Validation: Check dictionary structure against requirements.
- Type Checking: Runtime type validation.
- Constraint Checking: Numeric ranges, string lengths, regex patterns.
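The three kinds of check listed above can be combined in one pass. The sketch below is a stand-alone illustration, assuming a simple schema format of field name mapped to an expected type and a constraint function; `SketchValidationError`, `ENTITY_SCHEMA`, and `validate_against_schema` are hypothetical and do not reflect Semantica's own validators.

```python
import re


class SketchValidationError(ValueError):
    """Hypothetical error type used only in this example."""


# Hypothetical schema: field name -> (expected type, constraint check).
ENTITY_SCHEMA = {
    "id": (str, lambda v: 1 <= len(v) <= 64),                          # string length
    "type": (str, lambda v: re.fullmatch(r"[A-Z_]+", v) is not None),  # regex pattern
    "confidence": (float, lambda v: 0.0 <= v <= 1.0),                  # numeric range
}


def validate_against_schema(data, schema, required=("id", "type")):
    """Return True if `data` passes all checks, else raise SketchValidationError."""
    # Schema validation: every required key must be present.
    missing = [key for key in required if key not in data]
    if missing:
        raise SketchValidationError(f"missing fields: {missing}")

    for key, value in data.items():
        if key not in schema:
            continue  # unknown fields are ignored in this sketch
        expected_type, constraint = schema[key]
        # Type checking at runtime.
        if not isinstance(value, expected_type):
            raise SketchValidationError(
                f"{key!r} should be {expected_type.__name__}, got {type(value).__name__}"
            )
        # Constraint checking: range, length, or pattern.
        if not constraint(value):
            raise SketchValidationError(f"{key!r} violates its constraint: {value!r}")
    return True
```

Running the type check before the constraint check means each constraint function can assume it receives a value of the right type.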
Progress Tracking¶
- Multi-Environment: Supports Console (tqdm), Jupyter, and File logging.
- Module Awareness: Tracks progress per module.
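As a rough picture of per-module progress reporting to either a console or a file, the generator below writes one line per completed item to any text stream. `track` and its parameters are hypothetical; Semantica's `ProgressTracker` is not shown here and presumably does more (for example, tqdm or Jupyter rendering).

```python
import io
import sys


def track(iterable, desc="Working", stream=None, module="default"):
    """Hypothetical progress wrapper: yield items and report progress to a text stream."""
    stream = stream if stream is not None else sys.stderr
    items = list(iterable)  # materialize so the total is known up front
    total = len(items)
    for index, item in enumerate(items, start=1):
        yield item
        # Runs once the consumer has finished with `item` and asks for the next one.
        stream.write(f"[{module}] {desc}: {index}/{total}\n")


# Usage: an in-memory buffer stands in for a log file here.
buffer = io.StringIO()
for _ in track(["a", "b", "c"], desc="Processing", stream=buffer, module="parse"):
    pass
```

Passing an open file object as `stream` gives file output, and leaving it unset falls back to the console via standard error.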
Main Classes¶
Logger¶
Centralized logging configuration.
Functions:
| Function | Description |
|---|---|
| `setup_logging(level)` | Configure global logging |
| `get_logger(name)` | Get named logger instance |
| `log_performance(func)` | Decorator for timing |
Example:
from semantica.utils import setup_logging, get_logger, log_performance

# Configure logging once, then fetch a module-level logger.
setup_logging(level="INFO")
logger = get_logger(__name__)

# Time each call to process_data.
@log_performance
def process_data(data):
    logger.info(f"Processing {len(data)} items")
Validators¶
Data validation functions.
Functions:
| Function | Description |
|---|---|
| `validate_entity(data)` | Check entity structure |
| `validate_config(cfg)` | Check configuration |
Example:
from semantica.utils import validate_entity, ValidationError

try:
    validate_entity({"id": "1", "type": "PERSON"})
except ValidationError as e:
    # Report the problem instead of letting the exception propagate.
    print(f"Invalid entity: {e}")
ProgressTracker¶
Tracks execution progress.
Classes:
| Class | Description |
|---|---|
| `ProgressTracker` | Main tracker interface |
| `ConsoleProgressDisplay` | CLI output |
Example:
from semantica.utils import track_progress

# `items` is any iterable and `process` is your own per-item function.
for item in track_progress(items, desc="Processing"):
    process(item)
Convenience Functions¶
from semantica.utils import clean_text, hash_data, safe_filename
# Text cleaning
clean = clean_text(" Hello World ") # "Hello World"
# Hashing
data_id = hash_data({"key": "value"})  # avoid shadowing the built-in id()
# File safety
fname = safe_filename("My File?.txt") # "My_File_.txt"
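For readers curious what helpers of this kind typically do, the following stand-ins are built from the standard library. They are assumptions for illustration only: the `_sketch` functions are hypothetical, and the real `clean_text`, `hash_data`, and `safe_filename` may normalize, hash, or substitute characters differently.

```python
import hashlib
import json
import re


def clean_text_sketch(text):
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def hash_data_sketch(data):
    """Hash JSON-serializable data; sorted keys make equal dicts hash identically."""
    encoded = json.dumps(data, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def safe_filename_sketch(name):
    """Replace every character outside letters, digits, '.', '-', '_' with '_'."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", name)
```

Sorting keys before hashing is the design choice that matters most here: without it, two dictionaries with the same contents but different insertion order would produce different identifiers.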
Configuration¶
Environment Variables¶
export SEMANTICA_LOG_LEVEL=DEBUG
export SEMANTICA_LOG_FORMAT=json
export SEMANTICA_PROGRESS_BAR=true
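If you want to inspect or mirror these settings in your own code, a small reader like the one below is one way to do it. This is a sketch under stated assumptions: `logging_settings_from_env` is a hypothetical helper, and the defaults (`INFO`, `text`, progress bar on) and accepted truthy strings are guesses, not documented Semantica behavior.

```python
import logging
import os


def logging_settings_from_env(environ=None):
    """Hypothetical helper: read the three variables above into a settings dict."""
    environ = os.environ if environ is None else environ

    # Map the level name to its numeric value; fall back to INFO (assumed default).
    level_name = environ.get("SEMANTICA_LOG_LEVEL", "INFO").upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        level = logging.INFO

    # Assumed default output format is plain text.
    log_format = environ.get("SEMANTICA_LOG_FORMAT", "text").lower()

    # Treat a few common spellings as "on" (assumption).
    raw_flag = environ.get("SEMANTICA_PROGRESS_BAR", "true").strip().lower()
    progress_bar = raw_flag in {"1", "true", "yes", "on"}

    return {"level": level, "format": log_format, "progress_bar": progress_bar}
```

Accepting an explicit mapping instead of always reading `os.environ` keeps the function easy to test without touching the real environment.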
Best Practices¶
- Use `get_logger`: Always use `get_logger(__name__)` instead of `print` for production code.
- Validate Early: Validate input data at the boundary (Ingest/Parse) using `validate_data`.
- Handle Exceptions: Catch `SemanticaError` for framework-specific errors.
- Clean Text: Use `clean_text` before embedding or extraction to improve quality.
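The advice to catch a single framework-level error relies on an exception hierarchy with a common base class. The sketch below shows that pattern in isolation; `FrameworkError`, its subclasses, `format_error`, `run_step`, and `safe_run` are hypothetical stand-ins and are not the actual `SemanticaError` hierarchy.

```python
class FrameworkError(Exception):
    """Hypothetical base class, standing in for a framework-wide error type."""

    def __init__(self, message, context=None):
        super().__init__(message)
        self.message = message
        self.context = dict(context or {})


class FrameworkValidationError(FrameworkError):
    """Hypothetical subclass raised when input data is malformed."""


class FrameworkProcessingError(FrameworkError):
    """Hypothetical subclass raised when a processing step fails."""


def format_error(error):
    """Render an error and its context as one readable line."""
    parts = [f"{type(error).__name__}: {error.message}"]
    for key in sorted(error.context):
        parts.append(f"{key}={error.context[key]!r}")
    return " | ".join(parts)


def run_step(entity):
    """Toy processing step that validates at the boundary."""
    if "id" not in entity:
        raise FrameworkValidationError("entity has no id", {"fields": sorted(entity)})
    return entity["id"]


def safe_run(entity):
    """Catch the base class so every framework-specific subclass is handled."""
    try:
        return run_step(entity)
    except FrameworkError as exc:
        return format_error(exc)
```

Because `safe_run` catches only the base class, unrelated errors such as `KeyError` or `TypeError` still propagate, which keeps genuine bugs visible.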
Cookbook¶
The Utils module is used throughout all Semantica modules. See any cookbook tutorial for examples of logging, validation, and error handling in practice.
- Welcome to Semantica: See utils in action across all modules
- Topics: Framework overview, all modules, utilities
- Difficulty: Beginner
- Use Cases: Understanding utility functions used throughout Semantica
See Also¶
- Core Module - Uses Utils for infrastructure
- Pipeline Module - Uses ProgressTracker