Modules & Architecture¶
Semantica is built with a modular architecture, designed to be flexible, extensible, and scalable. This guide provides a comprehensive overview of all modules, their responsibilities, key features, and components.
About This Guide
This guide covers all 20+ core modules in Semantica, organized by their functional layer. Each module can be used independently or combined into powerful pipelines.
Module Overview¶
Semantica's modules are organized into six logical layers:
| Layer | Modules | Description |
|---|---|---|
| Input Layer | Ingest, Parse, Split, Normalize | Data ingestion, parsing, chunking, and cleaning |
| Core Processing | Semantic Extract, Knowledge Graph, Ontology, Reasoning | Entity extraction, graph construction, inference |
| Storage | Embeddings, Vector Store, Graph Store, Triplet Store | Vector, graph, and triplet persistence |
| Quality Assurance | Deduplication, Conflicts | Data quality and consistency |
| Context & Memory | Context, Seed, LLM Providers | Agent memory, foundation data, and LLM integration |
| Output & Orchestration | Export, Visualization, Pipeline | Export, visualization, and workflow management |
Input Layer¶
These modules handle data ingestion, parsing, chunking, and preparation.
Ingest Module¶
Purpose
The entry point for data ingestion. Connects to various data sources including files, web, databases, and MCP servers.
Key Features:
- Multiple file format support (PDF, DOCX, HTML, JSON, CSV, etc.)
- Web scraping with JavaScript rendering
- Database integration (SQL, NoSQL)
- Real-time streaming support
- MCP (Model Context Protocol) server integration
- Batch processing capabilities
- Metadata extraction and preservation
Components:
- `FileIngestor`: Read files (PDF, DOCX, HTML, JSON, CSV, etc.)
- `WebIngestor`: Scrape and ingest web pages
- `FeedIngestor`: Process RSS/Atom feeds
- `StreamIngestor`: Real-time data streaming
- `DBIngestor`: Database queries and ingestion
- `EmailIngestor`: Process email messages
- `RepoIngestor`: Git repository analysis
- `MCPIngestor`: Connect to MCP servers for resource and tool-based ingestion
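A minimal usage sketch, assuming an `ingest()` entry point on each ingestor (the exact method names and return types may differ; see the cookbook and API reference below):

```python
# Hedged sketch: file and web ingestion. The ingest() calls and their
# arguments are assumptions for illustration; consult the API reference
# for the exact signatures.
from semantica.ingest import FileIngestor, WebIngestor

file_docs = FileIngestor().ingest("data/reports/")        # assumed: path to a file or directory
web_docs = WebIngestor().ingest("https://example.org")    # assumed: page URL

print(f"Loaded {len(file_docs) + len(web_docs)} documents")
```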
Try It:
- Data Ingestion Cookbook: Learn to ingest from multiple sources
- Topics: File, web, feed, stream, database ingestion
- Difficulty: Beginner
- Time: 15-20 minutes
- Use Cases: Loading data from various sources
API Reference: Ingest Module
Parse Module¶
Purpose
Extracts raw text and metadata from ingested documents. Supports OCR, table extraction, and structured data parsing.
Key Features:
- Multiple file format support
- OCR for images and scanned documents
- Table extraction from PDFs and spreadsheets
- Metadata preservation
- Automatic format detection
- Structured data parsing (JSON, CSV, XML)
- Code file parsing with syntax awareness
Components:
- `DocumentParser`: Main parser orchestrator
- `PDFParser`: Extract text, tables, images from PDFs
- `DOCXParser`: Parse Word documents
- `HTMLParser`: Extract content from HTML
- `JSONParser`: Parse structured JSON data
- `ExcelParser`: Process spreadsheets
- `ImageParser`: OCR and image analysis
- `CodeParser`: Parse source code files
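A minimal sketch of running the main orchestrator over ingested documents; `DocumentParser.parse()` and the `text`/`metadata` attributes are assumptions for illustration:

```python
# Hedged sketch: parsing ingested documents with the main orchestrator.
# parse() and the returned document attributes are assumed names.
from semantica.parse import DocumentParser

parser = DocumentParser()
parsed_docs = parser.parse(file_docs)        # documents produced by the Ingest step

for doc in parsed_docs[:3]:
    print(len(doc.text), doc.metadata)       # assumed attributes: text, metadata
```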
Try It:
- Document Parsing Cookbook: Learn to parse various document formats
- Topics: PDF, DOCX, HTML, JSON parsing, OCR, table extraction
- Difficulty: Beginner
- Time: 15-20 minutes
- Use Cases: Extracting text from different file formats
API Reference: Parse Module
Split Module¶
Purpose
Comprehensive document chunking and splitting for optimal processing. Provides 15+ splitting methods including KG-aware chunking.
Key Features:
- Multiple standard splitting methods (recursive, token, sentence, paragraph)
- Semantic-based chunking using NLP and embeddings
- Entity-aware chunking for GraphRAG workflows
- Relation-aware chunking for KG preservation
- Graph-based and ontology-aware chunking
- Hierarchical multi-level chunking
- Community detection-based splitting
- Sliding window chunking with overlap
- Table-specific chunking
- Chunk validation and quality assessment
- Provenance tracking for data lineage
Components:
- `TextSplitter`: Unified text splitter with method parameter
- `SemanticChunker`: Semantic-based chunking coordinator
- `StructuralChunker`: Structure-aware chunking (headings, lists)
- `SlidingWindowChunker`: Fixed-size sliding window chunking
- `TableChunker`: Table-specific chunking
- `EntityAwareChunker`: Entity boundary-preserving chunker
- `RelationAwareChunker`: Triplet-preserving chunker
- `GraphBasedChunker`: Graph structure-based chunker
- `OntologyAwareChunker`: Ontology concept-based chunker
- `HierarchicalChunker`: Multi-level hierarchical chunker
- `ChunkValidator`: Chunk quality validation
- `ProvenanceTracker`: Chunk provenance tracking
Supported Methods:
| Category | Methods |
|---|---|
| Standard | recursive, token, sentence, paragraph, character, word, semantic_transformer, llm |
| KG/Ontology | entity_aware, relation_aware, graph_based, ontology_aware, hierarchical, community_detection, centrality_based |
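A minimal sketch of the unified splitter, selecting a method from the table above; the `chunk_size`/`chunk_overlap` parameters and the `split()` call are assumptions for illustration:

```python
# Hedged sketch: unified TextSplitter driven by a method name from the table above.
from semantica.split import TextSplitter

# Standard recursive chunking (parameter names are assumptions)
splitter = TextSplitter(method="recursive", chunk_size=512, chunk_overlap=64)
chunks = splitter.split(parsed_docs)

# KG-aware chunking that avoids cutting through entity mentions
entity_chunks = TextSplitter(method="entity_aware").split(parsed_docs)
```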
Try It:
- Text Splitting Cookbook: Learn different splitting methods
- Topics: Recursive, token, sentence splitting, entity-aware chunking
- Difficulty: Beginner
- Time: 15-20 minutes
- Use Cases: Document chunking for processing
Normalize Module¶
Purpose
Cleans, standardizes, and prepares text for semantic extraction. Handles encoding, entity names, dates, and numbers.
Key Features:
- Text cleaning and noise removal
- Encoding normalization (Unicode handling)
- Entity name standardization
- Date and number formatting
- Language detection
- Whitespace normalization
- Special character handling
Components:
- `TextNormalizer`: Main normalization orchestrator
- `TextCleaner`: Remove noise, fix encoding
- `DataCleaner`: Clean structured data
- `EntityNormalizer`: Normalize entity names
- `DateNormalizer`: Standardize date formats
- `NumberNormalizer`: Normalize numeric values
- `LanguageDetector`: Detect document language
- `EncodingHandler`: Handle character encoding
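A minimal sketch of cleaning chunked text before extraction, assuming a `normalize()` entry point on `TextNormalizer` (see the API reference below for the exact signature):

```python
# Hedged sketch: normalization ahead of semantic extraction.
# normalize() is an assumed entry point.
from semantica.normalize import TextNormalizer

normalizer = TextNormalizer()
normalized_docs = normalizer.normalize(chunks)   # chunks from the Split step
```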
Try It:
- Data Normalization Cookbook: Learn text normalization
- Topics: Text cleaning, encoding normalization, entity standardization
- Difficulty: Beginner
- Time: 15-20 minutes
- Use Cases: Preparing text for processing
API Reference: Normalize Module
Core Processing Layer¶
These modules form the intelligence core: extracting meaning, building relationships, and inferring knowledge.
Semantic Extract Module¶
Purpose
The brain of Semantica. Uses LLMs and NLP to extract entities, relationships, and semantic meaning from text.
Key Features:
- Multiple NER methods (rule-based, ML, LLM)
- Relationship extraction with confidence scoring
- Event extraction
- Custom entity type support
- Multi-language support
- Semantic network extraction
- Coreference resolution
Components:
- `NERExtractor`: Named Entity Recognition
- `RelationExtractor`: Extract relationships between entities
- `SemanticAnalyzer`: Deep semantic analysis
- `SemanticNetworkExtractor`: Extract semantic networks
- `EventExtractor`: Extract events from text
- `CoreferenceResolver`: Resolve entity coreferences
Try It:
- Entity Extraction Cookbook: Learn entity extraction
- Topics: Named entity recognition, entity types, extraction methods
- Difficulty: Beginner
- Time: 15-20 minutes
- Use Cases: Understanding entity extraction
- Relation Extraction Cookbook: Learn relationship extraction
- Topics: Relationship extraction, dependency parsing, semantic role labeling
- Difficulty: Beginner
- Time: 15-20 minutes
- Use Cases: Building rich knowledge graphs
Quick Example:
```python
from semantica.semantic_extract import NERExtractor, RelationExtractor

# Extract entities
extractor = NERExtractor(method="llm", model="gpt-4")
entities = extractor.extract(normalized_docs)

# Extract relationships
relation_extractor = RelationExtractor()
relationships = relation_extractor.extract(normalized_docs, entities=entities)

for rel in relationships[:5]:
    print(f"{rel.subject.text} --[{rel.predicate}]--> {rel.object.text}")
```
API Reference: Semantic Extract Module
Knowledge Graph (KG) Module¶
Purpose
Constructs and manages knowledge graphs from extracted entities and relationships. Supports multiple backends and advanced analytics.
Key Features:
- Graph construction from entities/relationships
- Multiple backend support (NetworkX, Neo4j)
- Temporal graph support
- Graph analytics and metrics
- Entity resolution and deduplication
- Community detection
- Centrality calculations
- Path finding algorithms
- Graph validation
Components:
- `GraphBuilder`: Construct knowledge graphs
- `GraphAnalyzer`: Analyze graph structure and properties
- `EntityResolver`: Resolve entity conflicts and duplicates
- `ConflictDetector`: Detect conflicting information
- `CentralityCalculator`: Calculate node importance metrics
- `CommunityDetector`: Detect community structure
- `ConnectivityAnalyzer`: Analyze graph connectivity
- `SeedManager`: Manage seed data for KG initialization
- `TemporalQuery`: Query temporal knowledge graphs
- `Deduplicator`: Remove duplicate entities/relationships
Quick Example:
```python
from semantica.kg import GraphBuilder, GraphAnalyzer

# Build graph
builder = GraphBuilder(backend="networkx", temporal=True)
kg = builder.build(entities, relationships)

# Analyze graph
analyzer = GraphAnalyzer()
metrics = analyzer.analyze(kg)
print(f"Nodes: {metrics['nodes']}, Edges: {metrics['edges']}")
print(f"Density: {metrics['density']:.3f}")
```
API Reference: Knowledge Graph Module
Ontology Module¶
Purpose
Defines schema and structure for your knowledge domain. Generates and validates ontologies with OWL/RDF export.
Key Features:
- Automatic ontology generation (6-stage pipeline)
- OWL/RDF/Turtle export
- Class and property inference
- Ontology validation
- Symbolic reasoning (HermiT, Pellet)
- Version management
- SHACL constraint support
- Ontology merging and alignment
Components:
- `OntologyGenerator`: Generate ontologies from knowledge graphs
- `OntologyValidator`: Validate ontologies
- `OWLGenerator`: Generate OWL format ontologies
- `PropertyGenerator`: Generate ontology properties
- `ClassInferrer`: Infer ontology classes
- `OntologyMerger`: Merge multiple ontologies
- `ReasonerInterface`: Interface with symbolic reasoners
Quick Example:
```python
from semantica.ontology import OntologyEngine

# Initialize engine
engine = OntologyEngine(base_uri="https://example.org/ontology/")

# Generate ontology from data
ontology = engine.from_data({
    "entities": [...],
    "relationships": [...]
})

# Validate ontology
result = engine.validate(ontology)
if result.valid:
    print("Ontology is valid!")

# Export to OWL
owl_content = engine.to_owl(ontology, format="turtle")
print(f"Generated {len(owl_content)} lines of OWL")
```
API Reference: Ontology Module
Reasoning Module¶
Purpose
Infers new facts and validates existing knowledge using logical rules. Supports forward-chaining, high-performance pattern matching, and explanation generation.
Key Features:
- Forward-chaining inference engine
- IF-THEN rule support with variable substitution
- High-performance Rete algorithm for large-scale rule matching
- Natural language explanation generation for inferred facts
- SPARQL query expansion for RDF graphs
- Conflict detection in inferences
- Priority-based rule execution
Components:
- `Reasoner`: High-level facade for all reasoning tasks
- `ReteEngine`: High-performance pattern matching (Rete algorithm)
- `ExplanationGenerator`: Generate justifications for inferred facts
- `SPARQLReasoner`: Query expansion for triplet stores
Quick Example:
```python
from semantica.reasoning import Reasoner

reasoner = Reasoner()

# Add rules and facts
reasoner.add_rule("IF Person(?x) AND Parent(?x, ?y) THEN ParentOfPerson(?x, ?y)")
reasoner.add_fact("Person(Alice)")
reasoner.add_fact("Parent(Alice, Bob)")

# Perform inference
inferred = reasoner.infer_facts(["Person(Alice)", "Parent(Alice, Bob)"])
# Inferred: ["ParentOfPerson(Alice, Bob)"]
```
API Reference: Reasoning Module
Storage Layer¶
These modules handle persistence and retrieval of vectors, graphs, and triplets.
Embeddings Module¶
Purpose
Generates vector embeddings for text, images, and audio. Supports multiple providers with caching and batch processing.
Key Features:
- Multiple provider support (OpenAI, Cohere, HuggingFace, Sentence Transformers)
- Text, image, and audio embeddings
- Multimodal embeddings
- Batch processing
- Caching support
- Custom models
- Similarity calculations
Components:
- `EmbeddingGenerator`: Main embedding orchestrator
- `TextEmbedder`: Generate text embeddings
- `ImageEmbedder`: Generate image embeddings
- `AudioEmbedder`: Generate audio embeddings
- `MultimodalEmbedder`: Combine multiple modalities
- `EmbeddingOptimizer`: Optimize embedding quality
- `ProviderStores`: Support for OpenAI, Cohere, etc.
Quick Example:
```python
from semantica.embeddings import EmbeddingGenerator

generator = EmbeddingGenerator(
    provider="openai",
    model="text-embedding-3-small"
)
embeddings = generator.generate(documents)

# Calculate similarity
similarity = generator.similarity(embeddings[0], embeddings[1])
print(f"Similarity: {similarity:.3f}")
```
API Reference: Embeddings Module
Vector Store Module¶
Purpose
Manages storage and retrieval of high-dimensional vectors. Supports hybrid search combining vector and keyword search.
Key Features:
- Multiple backend support (FAISS, Weaviate, Qdrant, Milvus)
- Hybrid search (vector + keyword)
- Metadata filtering
- Batch operations
- Similarity search with scoring
- Index management
- Namespace support
Components:
- `VectorStore`: Main vector store interface
- `FAISSStore`: FAISS integration
- `WeaviateStore`: Weaviate integration
- `HybridSearch`: Combine vector and keyword search
- `VectorRetriever`: Retrieve relevant vectors
Quick Example:
```python
from semantica.vector_store import VectorStore, HybridSearch

vector_store = VectorStore(backend="faiss")
vector_store.store(embeddings, documents, metadata)

# Hybrid search
hybrid_search = HybridSearch(vector_store)
results = hybrid_search.search(
    query="machine learning",
    top_k=10,
    filters={"category": "AI"}
)
```
API Reference: Vector Store Module
Graph Store Module¶
Purpose
Integration with property graph databases for storing and querying knowledge graphs.
Key Features:
- Multiple backend support (Neo4j, FalkorDB)
- Cypher query language
- Graph algorithms and analytics
- Transaction support
- Index management
- High-performance queries
- Batch operations
Components:
- `GraphStore`: Main graph store interface
- `Neo4jStore`: Neo4j database integration
- `FalkorDBStore`: FalkorDB (Redis-based) integration
- `NodeManager`: Node CRUD operations
- `RelationshipManager`: Relationship CRUD operations
- `QueryEngine`: Cypher query execution
- `GraphAnalytics`: Graph algorithms and analytics
Quick Example:
```python
from semantica.graph_store import GraphStore

store = GraphStore(backend="neo4j", uri="bolt://localhost:7687")
store.connect()

# Create nodes and relationships
alice = store.create_node(
    labels=["Person"],
    properties={"name": "Alice", "age": 30}
)
bob = store.create_node(
    labels=["Person"],
    properties={"name": "Bob", "age": 25}
)
store.create_relationship(
    start_node_id=alice["id"],
    end_node_id=bob["id"],
    rel_type="KNOWS",
    properties={"since": 2020}
)

# Query with Cypher
results = store.execute_query("MATCH (p:Person) RETURN p.name")
```
API Reference: Graph Store Module
Triplet Store Module¶
Purpose
RDF triplet store integration for semantic web applications. Supports SPARQL queries and multiple backends.
Key Features:
- Multi-backend support (Blazegraph, Jena, RDF4J)
- CRUD operations for RDF triplets
- SPARQL query execution and optimization
- Bulk data loading with progress tracking
- Query caching and optimization
- Transaction support
- Store backend pattern
Components:
- `TripletStore`: Main triplet store interface
- `QueryEngine`: SPARQL query execution and optimization
- `BulkLoader`: High-volume data loading with progress tracking
- `BlazegraphStore`: Blazegraph integration
- `JenaStore`: Apache Jena integration
- `RDF4JStore`: Eclipse RDF4J integration
- `QueryPlan`: Query execution plan dataclass
- `LoadProgress`: Bulk loading progress tracking
Algorithms:
| Category | Algorithms |
|---|---|
| Query Optimization | Cost estimation, query rewriting, LIMIT injection |
| Caching | MD5-based cache keys, LRU eviction |
| Bulk Loading | Batch processing, retry with exponential backoff |
Quick Example:
```python
from semantica.triplet_store import TripletStore

store = TripletStore(backend="blazegraph", endpoint="http://localhost:9999/blazegraph")

# Add triplet
result = store.add_triplet({
    "subject": "http://example.org/Alice",
    "predicate": "http://example.org/knows",
    "object": "http://example.org/Bob"
})

# Execute SPARQL
query_result = store.execute_query("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10")
```
API Reference: Triplet Store Module
Quality Assurance Layer¶
These modules ensure data quality, handle duplicates, and resolve conflicts.
Deduplication Module¶
Purpose
Comprehensive entity deduplication and merging. Detects duplicates using multiple similarity methods and merges them intelligently.
Key Features:
- Multiple similarity methods (exact, Levenshtein, Jaro-Winkler, cosine, embedding)
- Advanced String Matching: Jaro-Winkler by default for better company/person name resolution
- Smart Property Handling: Neutral scoring for disjoint properties to prevent false negatives
- Duplicate detection with confidence scoring
- Entity merging with configurable strategies
- Cluster-based batch deduplication
- Provenance preservation during merges
- Relationship preservation
- Incremental processing support
Components:
- `DuplicateDetector`: Detects duplicate entities using similarity metrics
- `EntityMerger`: Merges duplicate entities using configurable strategies
- `SimilarityCalculator`: Multi-factor similarity between entities
- `MergeStrategyManager`: Manages merge strategies and conflict resolution
- `ClusterBuilder`: Builds clusters for batch deduplication
Merge Strategies:
| Strategy | Description |
|---|---|
"keep_first" | Preserve first entity, merge others |
"keep_last" | Preserve last entity, merge others |
"keep_most_complete" | Preserve entity with most properties |
"keep_highest_confidence" | Preserve entity with highest confidence |
"merge_all" | Combine all properties and relationships |
Quick Example:
```python
from semantica.deduplication import DuplicateDetector, EntityMerger

# Detect duplicates
detector = DuplicateDetector(similarity_threshold=0.8)
duplicate_groups = detector.detect_duplicate_groups(entities)

# Merge duplicates
merger = EntityMerger(preserve_provenance=True)
merge_operations = merger.merge_duplicates(
    entities,
    strategy="keep_most_complete"
)
merged_entities = [op.merged_entity for op in merge_operations]
print(f"Reduced from {len(entities)} to {len(merged_entities)} entities")
```
Conflicts Module¶
Purpose
Detects and resolves conflicts from multiple data sources. Provides investigation guides and source tracking.
Key Features:
- Multi-source conflict detection (value, type, relationship, temporal, logical)
- Source tracking and provenance management
- Conflict analysis and pattern identification
- Multiple resolution strategies (voting, credibility-weighted, recency)
- Investigation guide generation
- Source credibility scoring
- Conflict reporting and statistics
Components:
- `ConflictDetector`: Detects conflicts from multiple sources
- `ConflictResolver`: Resolves conflicts using various strategies
- `ConflictAnalyzer`: Analyzes conflict patterns and trends
- `SourceTracker`: Tracks source information and provenance
- `InvestigationGuideGenerator`: Generates investigation guides
Resolution Strategies:
| Strategy | Algorithm |
|---|---|
| Voting | Majority value selection using frequency counting |
| Credibility Weighted | Weighted average using source credibility scores |
| Temporal Selection | Newest/oldest value based on timestamps |
| Confidence Selection | Maximum confidence value selection |
Quick Example:
```python
from semantica.conflicts import ConflictDetector, ConflictResolver

detector = ConflictDetector()
conflicts = detector.detect_value_conflicts(entities, "name")

resolver = ConflictResolver()
results = resolver.resolve_conflicts(conflicts, strategy="voting")
```
KG Quality Assurance Module¶
Purpose
Comprehensive quality assessment, validation, and automated fixes for knowledge graphs.
Key Features:
- Quality metrics calculation (overall, completeness, consistency)
- Consistency checking (logical, temporal, hierarchical)
- Completeness validation (entity, relationship, property)
- Automated fixes (duplicates, inconsistencies, missing properties)
- Quality reporting with issue tracking
- Validation engine with rules and constraints
- Improvement suggestions
Components:
- `KGQualityAssessor`: Overall quality assessment coordinator
- `ConsistencyChecker`: Consistency validation engine
- `CompletenessValidator`: Completeness validation engine
- `QualityMetrics`: Quality metrics calculator
- `ValidationEngine`: Rule and constraint validation
- `RuleValidator`: Rule-based validation
- `ConstraintValidator`: Constraint-based validation
- `QualityReporter`: Quality report generation
- `IssueTracker`: Issue tracking and management
- `ImprovementSuggestions`: Improvement suggestions generator
- `AutomatedFixer`: Automated issue fixing
- `AutoMerger`: Automatic merging of duplicates
- `AutoResolver`: Automatic conflict resolution
Note: The KG quality assessment module has been temporarily removed and will be reintroduced in a future release.
Context & Memory Layer¶
These modules provide context engineering for agents and foundation data management.
Context Module¶
Purpose
Context engineering infrastructure for agents. Formalizes context as a graph of connections with RAG-enhanced memory. Features GraphRAG with multi-hop reasoning and LLM-generated responses.
Key Features:
- Context graph construction from entities, relationships, and conversations
- Agent memory management with RAG integration
- Entity linking across sources with URI assignment
- Hybrid context retrieval (vector + graph + memory)
- Multi-hop reasoning through knowledge graphs
- LLM-generated responses grounded in graph context
- Reasoning trace showing entity relationship paths
- Conversation history management
- Context accumulation and synthesis
- Graph-based context traversal
Components:
- `ContextGraph`: In-memory context graph store and builder methods
- `ContextNode`: Context graph node data structure
- `ContextEdge`: Context graph edge data structure
- `AgentMemory`: Manages persistent agent memory with RAG
- `AgentContext`: High-level context interface with GraphRAG capabilities
- `ContextRetriever`: Retrieves relevant context with multi-hop reasoning
- `MemoryItem`: Memory item data structure
- `EntityLinker`: Links entities across sources with URI assignment
Algorithms:
| Category | Algorithms |
|---|---|
| Graph Construction | BFS/DFS traversal, type-based indexing |
| Memory Management | Vector embedding, similarity search, retention policies |
| Context Retrieval | Vector similarity, multi-hop graph expansion, hybrid scoring |
| Multi-Hop Reasoning | BFS traversal up to N hops, reasoning path construction |
| LLM Integration | Prompt engineering with context and reasoning paths |
| Entity Linking | Hash-based URI generation, text similarity matching |
Quick Example:
```python
from semantica.context import AgentContext, ContextGraph, AgentMemory
from semantica.llms import Groq
from semantica.vector_store import VectorStore
import os

# Using AgentContext with GraphRAG reasoning
context = AgentContext(
    vector_store=VectorStore(backend="faiss"),
    knowledge_graph=kg
)

# Configure LLM provider
llm_provider = Groq(
    model="llama-3.1-8b-instant",
    api_key=os.getenv("GROQ_API_KEY")
)

# Query with multi-hop reasoning and LLM-generated response
result = context.query_with_reasoning(
    query="What IPs are associated with security alerts?",
    llm_provider=llm_provider,
    max_results=10,
    max_hops=2
)
print(f"Response: {result['response']}")
print(f"Reasoning Path: {result['reasoning_path']}")
print(f"Confidence: {result['confidence']:.3f}")

# Traditional context graph and memory
graph = ContextGraph()
graph_data = graph.build_from_entities_and_relationships(entities, relationships)

vs = VectorStore(backend="faiss")  # vector store backing agent memory
memory = AgentMemory(vector_store=vs, knowledge_graph=kg)
memory_id = memory.store("User asked about Python", metadata={"type": "conversation"})
results = memory.retrieve("Python", max_results=5)
```
API Reference: Context Module
LLM Providers Module¶
Purpose
Unified interface for LLM providers. Supports Groq, OpenAI, HuggingFace, and LiteLLM (100+ LLMs) with clean imports and consistent API.
Key Features:
- Unified Interface: Same `generate()` and `generate_structured()` methods across all providers
- Multiple Providers: Groq, OpenAI, HuggingFace, and LiteLLM (100+ LLMs)
- Clean Imports: Simple `from semantica.llms import Groq, OpenAI, HuggingFaceLLM, LiteLLM`
- Structured Output: JSON generation support
- API Key Management: Environment variable and direct key support
- Error Handling: Graceful fallback when providers unavailable
Components:
- `Groq`: Groq API provider for fast inference
- `OpenAI`: OpenAI API provider (GPT-3.5, GPT-4, etc.)
- `HuggingFaceLLM`: HuggingFace Transformers for local LLM inference
- `LiteLLM`: Unified interface to 100+ LLM providers (OpenAI, Anthropic, Azure, Bedrock, Vertex AI, etc.)
Supported Providers via LiteLLM:
- OpenAI, Anthropic, Groq, Azure, Bedrock, Vertex AI, Cohere, Mistral, and 90+ more
Quick Example:
```python
from semantica.llms import Groq, OpenAI, HuggingFaceLLM, LiteLLM
import os

# Groq - Fast inference
groq = Groq(
    model="llama-3.1-8b-instant",
    api_key=os.getenv("GROQ_API_KEY")
)
response = groq.generate("What is AI?")

# OpenAI
openai = OpenAI(
    model="gpt-4",
    api_key=os.getenv("OPENAI_API_KEY")
)
response = openai.generate("What is AI?")

# HuggingFace - Local models
hf = HuggingFaceLLM(model_name="gpt2")  # or model="gpt2" for consistency
response = hf.generate("What is AI?")

# LiteLLM - Unified interface to 100+ LLMs
litellm = LiteLLM(
    model="openai/gpt-4o",  # or "anthropic/claude-sonnet-4-20250514", etc.
    api_key=os.getenv("OPENAI_API_KEY")
)
response = litellm.generate("What is AI?")

# Structured output
structured = groq.generate_structured("Extract entities from: Apple Inc. was founded by Steve Jobs.")
```
API Reference: LLM Providers Module
Seed Module¶
Purpose
Seed data management for initial knowledge graph construction. Builds on verified knowledge from multiple sources.
Key Features:
- Multi-source seed data loading (CSV, JSON, Database, API)
- Foundation graph creation from seed data
- Seed data quality validation
- Integration with extracted data using configurable merge strategies
- Version management for seed sources
- Export capabilities (JSON, CSV)
- Schema template validation
Components:
- `SeedDataManager`: Main coordinator for seed data operations
- `SeedDataSource`: Seed data source definition
- `SeedData`: Seed data container
Merge Strategies:
| Strategy | Description |
|---|---|
| seed_first | Seed data takes precedence, extracted fills gaps |
| extracted_first | Extracted data takes precedence, seed fills gaps |
| merge | Property merging, seed takes precedence for conflicts |
Quick Example:
```python
from semantica.seed import SeedDataManager

manager = SeedDataManager()
manager.register_source("entities", "json", "data/entities.json")
foundation = manager.create_foundation_graph()
validation = manager.validate_quality(foundation)
```
Output & Orchestration Layer¶
These modules handle export, visualization, and workflow management.
Export Module¶
Purpose
Export knowledge graphs and data to various formats for use in external tools.
Key Features:
- Multiple export formats (JSON, RDF, CSV, OWL, GraphML, GEXF)
- Custom export formats
- Batch export
- Metadata preservation
- Streaming export for large graphs
- Vector export support
Components:
- `JSONExporter`: Export to JSON
- `RDFExporter`: Export to RDF/XML
- `CSVExporter`: Export to CSV
- `GraphExporter`: Export to graph formats (GraphML, GEXF)
- `OWLExporter`: Export to OWL
- `VectorExporter`: Export vectors
Quick Example:
```python
from semantica.export import JSONExporter, RDFExporter, CSVExporter

# Export to multiple formats
JSONExporter().export(kg, "output.json")
RDFExporter().export(kg, "output.rdf")
CSVExporter().export(kg, "output.csv")
```
API Reference: Export Module
Visualization Module¶
Purpose
Visual exploration of knowledge graphs, embeddings, and analytics data.
Key Features:
- Interactive graph visualization
- Embedding visualization (t-SNE, PCA, UMAP)
- Temporal data visualization
- Ontology visualization
- Multiple output formats (HTML, PNG, SVG)
- Custom styling
Components:
- `KGVisualizer`: Visualize knowledge graphs
- `EmbeddingVisualizer`: Visualize embeddings (t-SNE, PCA, UMAP)
- `AnalyticsVisualizer`: Visualize graph analytics
- `TemporalVisualizer`: Visualize temporal data
- `OntologyVisualizer`: Visualize ontology structure
- `SemanticNetworkVisualizer`: Visualize semantic networks
Quick Example:
```python
from semantica.visualization import KGVisualizer, EmbeddingVisualizer

# Visualize knowledge graph
KGVisualizer().visualize(kg, output_format="html", output_path="graph.html")

# Visualize embeddings
EmbeddingVisualizer().visualize(embeddings, method="tsne", output_path="embeddings.png")
```
API Reference: Visualization Module
Pipeline Module¶
Purpose
Orchestrates workflows, connecting modules into robust, executable pipelines.
Key Features:
- Pipeline construction DSL
- Parallel execution
- Error handling and recovery
- Resource scheduling
- Pipeline validation
- Monitoring and logging
- Checkpoint support
Components:
- `PipelineBuilder`: Build complex pipelines
- `ExecutionEngine`: Execute pipelines
- `FailureHandler`: Handle pipeline failures
- `ParallelismManager`: Enable parallel processing
- `ResourceScheduler`: Schedule resources
- `PipelineValidator`: Validate pipeline configuration
Quick Example:
```python
from semantica.pipeline import PipelineBuilder
from semantica.ingest import FileIngestor
from semantica.parse import DocumentParser
from semantica.semantic_extract import NERExtractor

builder = PipelineBuilder()
pipeline = builder \
    .add_step("ingest", FileIngestor()) \
    .add_step("parse", DocumentParser()) \
    .add_step("extract", NERExtractor()) \
    .build()

result = pipeline.execute(sources=["data/"], parallel=True)
```
API Reference: Pipeline Module
Integration Patterns¶
Pattern 1: Complete Knowledge Graph Pipeline¶
Build a complete knowledge graph from documents using the full pipeline; a minimal sketch follows the cookbook pointers below.
For complete examples, see:
- Your First Knowledge Graph Cookbook: Complete pipeline walkthrough
- Topics: Ingestion, parsing, extraction, graph building, embeddings
- Difficulty: Beginner
- Time: 20-30 minutes
- Use Cases: Learning the complete workflow
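The following sketch chains the module APIs shown earlier in this guide. The `FileIngestor.ingest()` and `DocumentParser.parse()` calls are assumptions for illustration; the cookbook is authoritative for the exact entry points.

```python
# Hedged sketch of Pattern 1: documents -> entities/relationships -> graph + vectors.
from semantica.ingest import FileIngestor
from semantica.parse import DocumentParser
from semantica.semantic_extract import NERExtractor, RelationExtractor
from semantica.kg import GraphBuilder
from semantica.embeddings import EmbeddingGenerator
from semantica.vector_store import VectorStore

docs = FileIngestor().ingest("data/")          # assumed entry point
parsed = DocumentParser().parse(docs)          # assumed entry point

entities = NERExtractor(method="llm", model="gpt-4").extract(parsed)
relationships = RelationExtractor().extract(parsed, entities=entities)

kg = GraphBuilder(backend="networkx").build(entities, relationships)

embeddings = EmbeddingGenerator(provider="openai", model="text-embedding-3-small").generate(parsed)
VectorStore(backend="faiss").store(embeddings, parsed, {})   # metadata left empty in this sketch
```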
Pattern 2: Custom Pipeline with Module Selection¶
Build custom pipelines with specific module selections and quality assurance, as sketched below.
For examples, see:
- Building Knowledge Graphs Cookbook: Advanced graph construction
- Topics: Custom pipelines, entity merging, conflict resolution
- Difficulty: Intermediate
- Time: 30-45 minutes
- Use Cases: Production graph construction
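A minimal sketch of this pattern, reusing only the deduplication, conflict, and graph APIs demonstrated above; the threshold and strategy values are illustrative.

```python
# Hedged sketch of Pattern 2: quality assurance before graph construction.
# entities and relationships come from the Semantic Extract step (see above).
from semantica.deduplication import DuplicateDetector, EntityMerger
from semantica.conflicts import ConflictDetector, ConflictResolver
from semantica.kg import GraphBuilder

# Detect and merge duplicate entities
DuplicateDetector(similarity_threshold=0.85).detect_duplicate_groups(entities)
merge_ops = EntityMerger(preserve_provenance=True).merge_duplicates(
    entities, strategy="keep_most_complete"
)
clean_entities = [op.merged_entity for op in merge_ops]

# Resolve value conflicts across sources before building the graph
conflicts = ConflictDetector().detect_value_conflicts(clean_entities, "name")
ConflictResolver().resolve_conflicts(conflicts, strategy="voting")

kg = GraphBuilder(backend="networkx").build(clean_entities, relationships)
```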
Pattern 3: GraphRAG with Hybrid Search¶
Build GraphRAG systems with hybrid search combining vector and graph retrieval; see the sketch below.
For complete examples, see:
- GraphRAG Complete Cookbook: Production GraphRAG system
- Topics: GraphRAG, hybrid retrieval, graph traversal, LLM integration
- Difficulty: Advanced
- Time: 1-2 hours
- Use Cases: Production RAG applications
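A minimal sketch combining `HybridSearch` with `AgentContext.query_with_reasoning()` as shown in the Vector Store and Context sections; the query strings and parameters are placeholders.

```python
# Hedged sketch of Pattern 3: hybrid retrieval plus graph-grounded answering.
# embeddings, documents, metadata, and kg come from earlier steps (see above).
import os
from semantica.vector_store import VectorStore, HybridSearch
from semantica.context import AgentContext
from semantica.llms import Groq

vector_store = VectorStore(backend="faiss")
vector_store.store(embeddings, documents, metadata)

# Hybrid (vector + keyword) retrieval
hits = HybridSearch(vector_store).search(query="security alerts", top_k=10)

# Multi-hop GraphRAG answer grounded in the knowledge graph
context = AgentContext(vector_store=vector_store, knowledge_graph=kg)
answer = context.query_with_reasoning(
    query="What IPs are associated with security alerts?",
    llm_provider=Groq(model="llama-3.1-8b-instant", api_key=os.getenv("GROQ_API_KEY")),
    max_hops=2
)
print(answer["response"])
```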
Pattern 4: Temporal Graph with Reasoning¶
Build temporal graphs with logical reasoning capabilities, as sketched below.
For examples, see:
- Temporal Graphs Cookbook: Temporal graph construction
- Topics: Time-stamped entities, temporal relationships, historical queries
- Difficulty: Intermediate
- Time: 30-45 minutes
- Use Cases: Time-aware knowledge graphs
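A minimal sketch using `GraphBuilder(temporal=True)` and the `Reasoner` API from earlier sections; the rule and fact strings are illustrative, and the KG module's `TemporalQuery` component is omitted because its API is not shown in this guide.

```python
# Hedged sketch of Pattern 4: temporal graph construction plus rule-based inference.
# entities and relationships come from the Semantic Extract step (see above).
from semantica.kg import GraphBuilder
from semantica.reasoning import Reasoner

# Build a time-aware graph
kg = GraphBuilder(backend="networkx", temporal=True).build(entities, relationships)

# Infer new facts over the extracted knowledge (rule/fact strings are illustrative)
reasoner = Reasoner()
reasoner.add_rule("IF Employee(?x) AND WorksAt(?x, ?y) THEN AffiliatedWith(?x, ?y)")
reasoner.add_fact("Employee(Alice)")
reasoner.add_fact("WorksAt(Alice, Acme)")
inferred = reasoner.infer_facts(["Employee(Alice)", "WorksAt(Alice, Acme)"])
```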
Quick Reference: All Modules¶
| Module | Import | Main Class | Purpose |
|---|---|---|---|
| Ingest | semantica.ingest | FileIngestor | Data ingestion |
| Parse | semantica.parse | DocumentParser | Document parsing |
| Split | semantica.split | TextSplitter | Text chunking |
| Normalize | semantica.normalize | TextNormalizer | Data cleaning |
| Semantic Extract | semantica.semantic_extract | NERExtractor | Entity extraction |
| KG | semantica.kg | GraphBuilder | Graph construction |
| Ontology | semantica.ontology | OntologyGenerator | Ontology generation |
| Reasoning | semantica.reasoning | Reasoner | Logical inference |
| Embeddings | semantica.embeddings | EmbeddingGenerator | Vector generation |
| Vector Store | semantica.vector_store | VectorStore | Vector storage |
| Graph Store | semantica.graph_store | GraphStore | Graph database |
| Triplet Store | semantica.triplet_store | TripletStore | RDF storage |
| Deduplication | semantica.deduplication | DuplicateDetector | Duplicate removal |
| Conflicts | semantica.conflicts | ConflictDetector | Conflict resolution |
| Context | semantica.context | AgentContext | Agent context & GraphRAG |
| LLM Providers | semantica.llms | Groq, OpenAI, HuggingFaceLLM, LiteLLM | LLM integration |
| Seed | semantica.seed | SeedDataManager | Foundation data |
| Export | semantica.export | JSONExporter | Data export |
| Visualization | semantica.visualization | KGVisualizer | Visualization |
| Pipeline | semantica.pipeline | PipelineBuilder | Workflow orchestration |
Next Steps¶
- Core Concepts: Understand the fundamental concepts
- Use Cases: See real-world applications
- Examples: Practical code examples
- Cookbook: Interactive Jupyter notebook tutorials
- API Reference: Detailed API documentation
Recommended Cookbooks¶
- Welcome to Semantica: Comprehensive introduction to all modules
- Topics: Framework overview, all modules, architecture
- Difficulty: Beginner
- Time: 30-45 minutes
- Use Cases: Understanding the complete framework
- Your First Knowledge Graph: Build your first knowledge graph
- Topics: Complete pipeline from ingestion to graph construction
- Difficulty: Beginner
- Time: 20-30 minutes
- Use Cases: Hands-on practice with all modules
Contribute
Found an issue or want to improve this guide? Contribute on GitHub