Core¶
Framework infrastructure, lifecycle management, and plugin system.
🎯 Overview¶
The Core Module provides framework infrastructure, lifecycle management, configuration, and orchestration capabilities. It's the foundation that enables coordination between all Semantica modules.
What is the Core Module?¶
The Core module provides:
- Orchestration: The `Semantica` class coordinates multiple modules for complex workflows
- Lifecycle Management: Handles initialization, startup, shutdown, and state transitions
- Configuration: Unified configuration management via YAML and environment variables
- Plugin System: Extensible plugin registry for custom modules and capabilities
- Method Registry: Registry for custom orchestration methods
When to Use the Core Module¶
Primary Approach: Individual Modules
For most use cases, use individual modules directly (e.g., semantica.ingest, semantica.kg). This gives you full control and transparency.
When to Use Orchestration
Use the `Semantica` orchestration class when you need:
- Complex Workflows: Multi-step pipelines that span multiple modules
- Lifecycle Management: Application-level initialization and shutdown
- Centralized Configuration: Global settings that affect multiple modules
- Plugin Integration: Custom plugins that need framework coordination

Components:
- Semantica: Orchestration class for coordinating complex workflows across modules
- Lifecycle Management: Manage initialization, startup, shutdown, and state transitions
- Configuration: Unified configuration management via YAML and environment variables
- Plugin System: Extensible plugin registry for adding custom modules and capabilities
- Method Registry: Registry for custom orchestration methods and extensibility
⚙️ Algorithms Used¶
Lifecycle Management¶
What is Lifecycle Management?
Lifecycle management handles the initialization, startup, running, and shutdown phases of the Semantica framework. It ensures that all components are properly initialized, resources are managed correctly, and cleanup happens gracefully.
How it works:
- State Machine: `UNINITIALIZED -> INITIALIZING -> READY -> RUNNING -> STOPPING -> STOPPED`
- Priority-based Hooks: Startup and shutdown hooks are executed in priority order (lower numbers run earlier)
- Graceful Shutdown: Ensures that all resources (DB connections, thread pools) are closed properly
Why it matters:
- Prevents resource leaks (database connections, file handles)
- Ensures proper initialization order (dependencies are ready before use)
- Enables clean application shutdown
- Supports health monitoring and status tracking
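
The state sequence and priority-ordered hooks described above can be sketched in a few lines of plain Python. This is an illustrative stand-in, not the Semantica implementation: `MiniLifecycle`, `State`, and `TRANSITIONS` are hypothetical names, and the sketch assumes that hooks with equal priority run in registration order.

```python
from enum import Enum


class State(Enum):
    UNINITIALIZED = "uninitialized"
    INITIALIZING = "initializing"
    READY = "ready"
    RUNNING = "running"
    STOPPING = "stopping"
    STOPPED = "stopped"


# Allowed transitions, following the linear state sequence listed above.
TRANSITIONS = {
    State.UNINITIALIZED: {State.INITIALIZING},
    State.INITIALIZING: {State.READY},
    State.READY: {State.RUNNING},
    State.RUNNING: {State.STOPPING},
    State.STOPPING: {State.STOPPED},
    State.STOPPED: set(),
}


class MiniLifecycle:
    """Hypothetical stand-in, not the Semantica LifecycleManager."""

    def __init__(self):
        self.state = State.UNINITIALIZED
        self._startup_hooks = []   # entries: (priority, registration index, hook)
        self._shutdown_hooks = []

    def register_startup_hook(self, hook, priority=50):
        self._startup_hooks.append((priority, len(self._startup_hooks), hook))

    def register_shutdown_hook(self, hook, priority=50):
        self._shutdown_hooks.append((priority, len(self._shutdown_hooks), hook))

    def _transition(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise RuntimeError(
                f"Illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state

    def startup(self):
        self._transition(State.INITIALIZING)
        # Lower priority numbers run first; ties keep registration order.
        for _, _, hook in sorted(self._startup_hooks, key=lambda h: h[:2]):
            hook()
        self._transition(State.READY)
        self._transition(State.RUNNING)

    def shutdown(self):
        self._transition(State.STOPPING)
        for _, _, hook in sorted(self._shutdown_hooks, key=lambda h: h[:2]):
            hook()
        self._transition(State.STOPPED)


calls = []
lifecycle = MiniLifecycle()
lifecycle.register_startup_hook(lambda: calls.append("cache"), priority=20)
lifecycle.register_startup_hook(lambda: calls.append("database"), priority=10)
lifecycle.register_shutdown_hook(lambda: calls.append("close database"), priority=10)
lifecycle.startup()
lifecycle.shutdown()
print(calls)  # ['database', 'cache', 'close database']
```

Rejecting illegal transitions (for example, calling `shutdown()` before `startup()`) is what keeps hooks from running against half-initialized resources.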
Configuration¶
What is Configuration Management?
Configuration management provides a unified way to configure all Semantica modules. It supports multiple configuration sources with a clear priority order, ensuring consistent settings across your application.
How it works:
- Layered Loading: Defaults -> Config File -> Environment Variables (priority order)
- Schema Validation: Validating config structure against defined schemas
- Nested Access: Dot notation for accessing nested configuration values
Why it matters:
- Centralized configuration for all modules
- Environment-specific settings (dev, staging, production)
- Secure credential management (via environment variables)
- Validation prevents configuration errors
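
Layered loading and dot-notation access can be illustrated with a small standalone sketch. It assumes that later layers override earlier ones (defaults, then file, then environment); `deep_merge` and `get_path` are hypothetical helpers written for this example, not Semantica functions.

```python
import copy


def deep_merge(base, override):
    """Return a new dict in which values from `override` win over `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def get_path(config, key_path, default=None):
    """Look up a nested value with dot notation, e.g. 'processing.batch_size'."""
    node = config
    for part in key_path.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


defaults = {
    "processing": {"batch_size": 16, "max_workers": 2},
    "logging": {"level": "INFO"},
}
from_file = {"processing": {"batch_size": 32}}
from_env = {"logging": {"level": "DEBUG"}}

# Assumed layering: defaults first, then the file, then the environment.
config = deep_merge(deep_merge(defaults, from_file), from_env)

print(get_path(config, "processing.batch_size"))                # 32
print(get_path(config, "processing.max_workers"))               # 2
print(get_path(config, "logging.level"))                        # DEBUG
print(get_path(config, "quality.min_confidence", default=0.7))  # 0.7
```

Merging recursively rather than replacing whole sections is what lets a file override `batch_size` without discarding the default `max_workers`.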
Plugin System¶
What is the Plugin System?
The plugin system allows you to extend Semantica with custom modules and capabilities. Plugins can add new functionality, modify existing behavior, or integrate with external systems.
How it works:
- Discovery: Auto-discovery of plugins via directory scanning
- Registration: Dynamic registration of classes and functions
- Dependency Resolution: Automatic loading of plugin dependencies
Why it matters:
- Extend Semantica with custom functionality
- Integrate with external systems and APIs
- Modify or enhance existing modules
- Share custom functionality across projects
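
As a rough illustration of registration and dependency resolution, the sketch below loads a plugin's dependencies before the plugin itself and reports circular dependencies. `MiniPluginRegistry` and its `dependencies` argument are assumptions made for this example; how Semantica declares plugin dependencies is not shown here, and directory-based discovery is left out.

```python
class MiniPluginRegistry:
    """Hypothetical stand-in, not the Semantica PluginRegistry."""

    def __init__(self):
        self._classes = {}       # plugin name -> class
        self._dependencies = {}  # plugin name -> names it depends on
        self._loaded = {}        # plugin name -> instance

    def register(self, name, plugin_class, dependencies=()):
        self._classes[name] = plugin_class
        self._dependencies[name] = tuple(dependencies)

    def load(self, name, _loading=None):
        if name in self._loaded:
            return self._loaded[name]
        if name not in self._classes:
            raise KeyError(f"Unknown plugin: {name}")
        loading = _loading if _loading is not None else []
        if name in loading:
            cycle = " -> ".join(loading + [name])
            raise RuntimeError(f"Circular plugin dependency: {cycle}")
        loading.append(name)
        # Load dependencies first so they exist before the plugin is created.
        for dependency in self._dependencies[name]:
            self.load(dependency, loading)
        loading.pop()
        instance = self._classes[name]()
        self._loaded[name] = instance
        return instance

    def loaded_names(self):
        return list(self._loaded)


class StoragePlugin:
    pass


class ExportPlugin:
    pass


registry = MiniPluginRegistry()
registry.register("export", ExportPlugin, dependencies=["storage"])
registry.register("storage", StoragePlugin)
registry.load("export")
print(registry.loaded_names())  # ['storage', 'export']
```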
Main Classes¶
Semantica¶
The Semantica class is an orchestration class that coordinates multiple modules for complex workflows. It's designed for applications that need lifecycle management, centralized configuration, and multi-step pipeline coordination.
Not a Convenience Wrapper
The Semantica class is not a convenience wrapper. It's an orchestration tool for complex workflows. For most use cases, use individual modules directly for better control and transparency.
What it does:
- Coordinates multiple modules (ingest, parse, extract, kg, etc.)
- Manages application lifecycle (initialization, shutdown)
- Provides centralized configuration
- Enables plugin integration
- Handles complex multi-step workflows
When to use it:
- You are building an application with multiple components
- You need lifecycle management (startup/shutdown hooks)
- Your workflows span multiple modules
- You want centralized configuration
- You are integrating custom plugins
Methods:
| Method | Description |
|---|---|
| `__init__(config=None, **kwargs)` | Initialize framework with optional configuration |
| `initialize()` | Initialize all framework components and modules |
| `build_knowledge_base(sources, **kwargs)` | Orchestrate building a knowledge base from data sources |
| `run_pipeline(pipeline, data)` | Execute a processing pipeline |
| `get_status()` | Get system health and status |
| `shutdown(graceful=True)` | Shut down the framework gracefully |
Example - Orchestration for Complex Workflow:
from semantica.core import Semantica
# Initialize framework for orchestration
framework = Semantica()
framework.initialize()
# Build knowledge base (orchestrates multiple modules)
result = framework.build_knowledge_base(
    sources=["doc1.pdf", "doc2.docx"],
    embeddings=True,
    graph=True
)
# Check status
status = framework.get_status()
print(f"System state: {status['state']}")
# Shutdown gracefully
framework.shutdown()
Alternative - Using Individual Modules (Recommended):
from semantica.ingest import FileIngestor
from semantica.parse import DocumentParser
from semantica.semantic_extract import NERExtractor, RelationExtractor
from semantica.kg import GraphBuilder
from semantica.embeddings import TextEmbedder
# Use modules directly for full control
ingestor = FileIngestor()
parser = DocumentParser()
ner = NERExtractor()
rel_extractor = RelationExtractor()
builder = GraphBuilder()
embedder = TextEmbedder()
# Build your pipeline step by step
docs = ingestor.ingest_file("doc1.pdf")
parsed = parser.parse_document("doc1.pdf")
entities = ner.extract_entities(parsed.get("full_text", ""))
relationships = rel_extractor.extract_relations(parsed.get("full_text", ""), entities=entities)
kg = builder.build_graph(entities=entities, relationships=relationships)
embeddings = embedder.embed_batch([e.text for e in entities])
ConfigManager¶
Manages global configuration loading, validation, and merging.
Methods:
| Method | Description |
|---|---|
| `load_from_file(file_path, validate=True)` | Load config from a YAML or JSON file |
| `load_from_dict(config_dict, validate=True)` | Load config from a dictionary |
| `merge_configs(*configs, validate=True)` | Merge multiple configurations |
| `get_config()` | Get current configuration |
| `set_config(config, validate=True)` | Set current configuration |
| `reload(file_path=None)` | Reload configuration from file |
Example:
from semantica.core import ConfigManager
manager = ConfigManager()
config = manager.load_from_file("config.yaml")
# Merge configurations
config1 = manager.load_from_file("base_config.yaml")
config2 = manager.load_from_file("override_config.yaml")
merged = manager.merge_configs(config1, config2)
Config¶
Configuration data class with validation and nested access.
Methods:
| Method | Description |
|---|---|
| `get(key_path, default=None)` | Get nested configuration value by key path |
| `set(key_path, value)` | Set nested configuration value |
| `update(updates, merge=True)` | Update configuration with new values |
| `validate()` | Validate configuration settings |
| `to_dict()` | Convert configuration to dictionary |
Example:
from semantica.core import Config, ConfigManager
manager = ConfigManager()
config = manager.load_from_dict({"processing": {"batch_size": 32}})
# Access nested values
batch_size = config.get("processing.batch_size", default=16)
# Update values
config.set("processing.batch_size", 64)
config.update({"quality": {"min_confidence": 0.9}})
# Validate
config.validate()
LifecycleManager¶
System lifecycle management with hooks and health monitoring.
Methods:
| Method | Description |
|---|---|
| `startup()` | Execute startup sequence with registered hooks |
| `shutdown(graceful=True)` | Execute shutdown sequence |
| `register_startup_hook(hook_fn, priority=50)` | Register a startup hook |
| `register_shutdown_hook(hook_fn, priority=50)` | Register a shutdown hook |
| `register_component(name, component)` | Register component for health monitoring |
| `health_check()` | Perform comprehensive system health check |
| `get_health_summary()` | Get summary of system health |
| `get_state()` | Get current system state |
| `is_ready()` | Check if system is ready |
| `is_running()` | Check if system is running |
Example:
from semantica.core import LifecycleManager
manager = LifecycleManager()
# Register hooks
def init_db():
    print("Initializing database...")

manager.register_startup_hook(init_db, priority=10)
manager.startup()

# Register component for health monitoring
class DatabaseComponent:
    def health_check(self):
        return {"healthy": True, "message": "Connected"}

db = DatabaseComponent()
manager.register_component("database", db)
# Check health
health = manager.health_check()
summary = manager.get_health_summary()
manager.shutdown(graceful=True)
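
To make the health-monitoring idea concrete, here is a minimal sketch of how per-component checks could be aggregated into a summary. It assumes each component exposes `health_check()` returning a dict with a `healthy` key, as in the example above; `summarize_health` and the shape of its return value are hypothetical and may differ from what `get_health_summary()` actually returns.

```python
def summarize_health(components):
    """Call health_check() on each component and aggregate the results."""
    results = {}
    for name, component in components.items():
        try:
            results[name] = component.health_check()
        except Exception as exc:
            # A failing check is recorded instead of aborting the other checks.
            results[name] = {"healthy": False, "message": str(exc)}
    unhealthy = [name for name, result in results.items() if not result.get("healthy")]
    return {
        "healthy": not unhealthy,
        "total": len(results),
        "unhealthy": unhealthy,
        "details": results,
    }


class Database:
    def health_check(self):
        return {"healthy": True, "message": "Connected"}


class Cache:
    def health_check(self):
        raise ConnectionError("cache unreachable")


summary = summarize_health({"database": Database(), "cache": Cache()})
print(summary["healthy"], summary["unhealthy"])  # False ['cache']
```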
PluginRegistry¶
Plugin registry and management system for dynamic plugin discovery and loading.
Methods:
| Method | Description |
|---|---|
| `__init__(plugin_paths=None)` | Initialize with optional plugin paths for auto-discovery |
| `register_plugin(plugin_name, plugin_class, version="1.0.0", **metadata)` | Manually register a plugin |
| `load_plugin(plugin_name, **config)` | Load and initialize a plugin |
| `unload_plugin(plugin_name)` | Unload a plugin |
| `list_plugins()` | List all available plugins |
| `get_plugin_info(plugin_name)` | Get information about a plugin |
| `is_plugin_loaded(plugin_name)` | Check if a plugin is loaded |
| `get_loaded_plugin(plugin_name)` | Get loaded plugin instance |
Example:
from semantica.core import PluginRegistry
# Auto-discover plugins
registry = PluginRegistry(plugin_paths=["./plugins"])
# Load plugin with configuration
plugin = registry.load_plugin("my_plugin", api_key="xxx")
# List all plugins
plugins = registry.list_plugins()
for plugin_info in plugins:
    print(f"{plugin_info['name']}: {plugin_info['version']}")
# Get plugin info
info = registry.get_plugin_info("my_plugin")
MethodRegistry¶
Registry for custom orchestration methods.
Methods:
| Method | Description |
|---|---|
| `register(task, name, method_func)` | Register a custom orchestration method |
| `get(task, name)` | Get method by task and name |
| `list_all(task=None)` | List all registered methods |
| `unregister(task, name)` | Unregister a method |
| `clear(task=None)` | Clear all registered methods |
Example:
from semantica.core import method_registry
def custom_kb_builder(sources, **kwargs):
    # Custom logic
    return {"knowledge_graph": {}}

method_registry.register("knowledge_base", "custom", custom_kb_builder)
# Use custom method
method = method_registry.get("knowledge_base", "custom")
result = method(sources=["doc.pdf"])
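
One way to picture a registry keyed by task and name is a dictionary of dictionaries plus a small dispatch helper that resolves a method name at call time. The sketch below is a simplified stand-in under that assumption; `MiniMethodRegistry` and `dispatch` are hypothetical and do not describe Semantica's internals.

```python
from collections import defaultdict


class MiniMethodRegistry:
    """Hypothetical stand-in, not the Semantica MethodRegistry."""

    def __init__(self):
        self._methods = defaultdict(dict)  # task -> {name: callable}

    def register(self, task, name, method_func):
        if not callable(method_func):
            raise TypeError("method_func must be callable")
        self._methods[task][name] = method_func

    def get(self, task, name):
        # dict.get on the outer mapping avoids creating empty task entries.
        return self._methods.get(task, {}).get(name)

    def list_all(self, task=None):
        if task is not None:
            return {task: sorted(self._methods.get(task, {}))}
        return {t: sorted(names) for t, names in self._methods.items()}

    def unregister(self, task, name):
        self._methods.get(task, {}).pop(name, None)


registry = MiniMethodRegistry()


def dispatch(task, method, *args, **kwargs):
    """Resolve `method` for `task` and call it with the given arguments."""
    func = registry.get(task, method)
    if func is None:
        raise ValueError(f"No method '{method}' registered for task '{task}'")
    return func(*args, **kwargs)


def count_sources(sources, **kwargs):
    return {"sources": len(sources)}


registry.register("knowledge_base", "custom", count_sources)
result = dispatch("knowledge_base", "custom", sources=["doc.pdf"])
print(result)                               # {'sources': 1}
print(registry.list_all("knowledge_base"))  # {'knowledge_base': ['custom']}
```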
Orchestration Methods¶
Convenience functions for common orchestration tasks.
build_knowledge_base()¶
Build knowledge base from data sources.
from semantica.core.methods import build_knowledge_base
result = build_knowledge_base(
    sources=["doc1.pdf", "doc2.docx"],
    method="default",
    embeddings=True,
    graph=True
)
run_pipeline()¶
Execute a processing pipeline.
from semantica.core.methods import run_pipeline
result = run_pipeline(
    pipeline={"steps": ["parse", "extract"]},
    data="sample text",
    method="default"
)
initialize_framework()¶
Initialize Semantica framework.
from semantica.core.methods import initialize_framework
framework = initialize_framework(
    config={"llm_provider": {"name": "openai"}},
    method="default"
)
get_status()¶
Get system status.
from semantica.core.methods import get_status
# my_framework is an already initialized Semantica instance
status = get_status(framework=my_framework, method="detailed")
get_orchestration_method()¶
Get orchestration method by task and name.
from semantica.core.methods import get_orchestration_method
method = get_orchestration_method("knowledge_base", "custom")
list_available_methods()¶
List all available orchestration methods.
from semantica.core.methods import list_available_methods
all_methods = list_available_methods()
kb_methods = list_available_methods("knowledge_base")
Configuration¶
Environment Variables¶
Configuration can be loaded from environment variables that use the `SEMANTICA_` prefix:
export SEMANTICA_PROCESSING_BATCH_SIZE=64
export SEMANTICA_LLM_PROVIDER_MODEL=gpt-4
export SEMANTICA_QUALITY_MIN_CONFIDENCE=0.8
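
Because both the separator and the key names contain underscores, a name such as `SEMANTICA_PROCESSING_BATCH_SIZE` does not split into a key path unambiguously. The sketch below shows one possible approach, resolving names against the set of known configuration keys. It is an assumption for illustration only: `env_to_config`, `_match_path`, `_coerce`, and the `schema` dict are hypothetical, and Semantica's actual mapping rules may differ.

```python
def env_to_config(environ, schema, prefix="SEMANTICA_"):
    """Map PREFIX_A_B_C variables onto nested keys that exist in `schema`."""
    config = {}
    for name, raw in environ.items():
        if not name.startswith(prefix):
            continue
        tokens = name[len(prefix):].lower().split("_")
        path = _match_path(tokens, schema)
        if path is None:
            continue  # unknown setting; ignored in this sketch
        node = config
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = _coerce(raw)
    return config


def _match_path(tokens, schema):
    """Find a key path in `schema` whose keys, joined by '_', spell `tokens`."""
    for i in range(1, len(tokens) + 1):
        key = "_".join(tokens[:i])
        if key not in schema:
            continue
        rest = tokens[i:]
        if not rest:
            return [key]
        if isinstance(schema[key], dict):
            tail = _match_path(rest, schema[key])
            if tail is not None:
                return [key] + tail
    return None


def _coerce(raw):
    """Convert a string to int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


# Known keys; the leaf values are placeholders and are not used.
schema = {
    "processing": {"batch_size": None},
    "llm_provider": {"model": None},
    "quality": {"min_confidence": None},
}
environ = {
    "SEMANTICA_PROCESSING_BATCH_SIZE": "64",
    "SEMANTICA_LLM_PROVIDER_MODEL": "gpt-4",
    "SEMANTICA_QUALITY_MIN_CONFIDENCE": "0.8",
    "PATH": "/usr/bin",
}
overrides = env_to_config(environ, schema)
print(overrides)
# {'processing': {'batch_size': 64}, 'llm_provider': {'model': 'gpt-4'}, 'quality': {'min_confidence': 0.8}}
```

In a real process, `os.environ` would be passed in place of the hand-written `environ` dict.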
YAML Configuration¶
llm_provider:
  name: openai
  model: gpt-4
  api_key: ${OPENAI_API_KEY}

embedding_model:
  name: openai
  model: text-embedding-ada-002

processing:
  batch_size: 32
  max_workers: 4

quality:
  min_confidence: 0.7

logging:
  level: INFO

plugins:
  my_plugin:
    enabled: true
    config_key: config_value
JSON Configuration¶
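
Assuming a JSON configuration file mirrors the structure of the YAML example above, the equivalent document can be generated with Python's standard `json` module. The nesting shown and the `${OPENAI_API_KEY}` placeholder are carried over from the YAML example; whether placeholder substitution also applies to JSON files is an assumption.

```python
import json

# Same settings as the YAML example, expressed as a Python dict.
config = {
    "llm_provider": {
        "name": "openai",
        "model": "gpt-4",
        "api_key": "${OPENAI_API_KEY}",  # placeholder copied from the YAML example
    },
    "embedding_model": {"name": "openai", "model": "text-embedding-ada-002"},
    "processing": {"batch_size": 32, "max_workers": 4},
    "quality": {"min_confidence": 0.7},
    "logging": {"level": "INFO"},
    "plugins": {"my_plugin": {"enabled": True, "config_key": "config_value"}},
}

json_text = json.dumps(config, indent=2)
print(json_text)

# Round-trip check: parsing the JSON text gives back the same structure.
assert json.loads(json_text) == config
```

The printed text can be saved as a `.json` file and passed to `load_from_file`, which is documented as accepting YAML or JSON.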
Integration Examples¶
Basic Usage¶
from semantica.core import Semantica, ConfigManager
# 1. Load configuration
config_manager = ConfigManager()
config = config_manager.load_from_file("config.yaml")
# 2. Initialize framework
framework = Semantica(config=config)
framework.initialize()
try:
    # 3. Build knowledge base
    result = framework.build_knowledge_base(
        sources=["doc1.pdf", "doc2.docx"],
        embeddings=True,
        graph=True
    )

    # 4. Check status
    status = framework.get_status()
    print(f"System state: {status['state']}")
finally:
    # 5. Shutdown gracefully
    framework.shutdown(graceful=True)
Custom Plugin¶
from semantica.core import PluginRegistry
class MyPlugin:
    def initialize(self):
        print("Plugin initialized")

    def execute(self, data):
        return {"processed": True}

registry = PluginRegistry()
registry.register_plugin(
    plugin_name="my_plugin",
    plugin_class=MyPlugin,
    version="1.0.0"
)
plugin = registry.load_plugin("my_plugin")
result = plugin.execute("sample data")
Lifecycle Hooks¶
from semantica.core import LifecycleManager
manager = LifecycleManager()
def init_database():
    print("Initializing database...")

def cleanup_database():
    print("Cleaning up database...")

manager.register_startup_hook(init_database, priority=10)
manager.register_shutdown_hook(cleanup_database, priority=10)
manager.startup()
# ... do work ...
manager.shutdown(graceful=True)
Custom Orchestration Method¶
from semantica.core import method_registry, Semantica
def fast_kb_builder(sources, **kwargs):
    framework = Semantica()
    framework.initialize()
    try:
        return framework.build_knowledge_base(
            sources=sources,
            embeddings=False,  # Skip for speed
            graph=True,
            **kwargs
        )
    finally:
        framework.shutdown()

method_registry.register("knowledge_base", "fast", fast_kb_builder)
# Use custom method
from semantica.core.methods import build_knowledge_base
result = build_knowledge_base(sources=["doc.pdf"], method="fast")
Best Practices¶
- Always Initialize: Always call `initialize()` after creating a `Semantica` instance and before using it.
- Graceful Shutdown: Always call `shutdown(graceful=True)` in a `finally` block to ensure proper cleanup.
- Configuration Management: Use `ConfigManager` for loading and managing configurations. Prefer YAML files for complex configurations.
- Error Handling: Wrap framework operations in try-except blocks to handle `ConfigurationError` and `ProcessingError` appropriately.
- Health Monitoring: Register components with `LifecycleManager` for health monitoring and use `health_check()` regularly.
- Plugin Development: Follow the plugin interface (must have `initialize()` and `execute()` methods) when creating custom plugins.
- Method Registration: Use `MethodRegistry` for extensibility. Register custom methods for knowledge base building, pipeline execution, etc.
- Hook Priorities: Use appropriate priorities for lifecycle hooks. Lower numbers execute first.
- Configuration Validation: Always validate configurations using `config.validate()` before using them.
- Resource Cleanup: Ensure all resources are properly cleaned up in shutdown hooks.
See Also¶
- Pipeline Module - Pipeline execution and orchestration
- Utils Module - Shared utilities used by Core
- Getting Started Guide - Learn the basics
Cookbook¶
Interactive tutorials to learn orchestration and lifecycle management:
- Welcome to Semantica: Comprehensive introduction to all Semantica modules and architecture
  - Topics: Framework overview, all modules, architecture, configuration, lifecycle
  - Difficulty: Beginner
  - Use Cases: Understanding the framework structure, first-time users
- Pipeline Orchestration: Build robust, automated data processing pipelines
  - Topics: Workflows, automation, error handling, pipeline orchestration
  - Difficulty: Advanced
  - Use Cases: Complex multi-step workflows, production pipelines