🧠 Semantica
A Framework for Building Context Graphs and Decision Intelligence Layers for AI
Transform Chaos into Intelligence. Build AI systems with context graphs, decision tracking, and advanced knowledge engineering that are explainable, traceable, and trustworthy — not black boxes.
The Problem
AI agents today are capable but not trustworthy:
- No memory structure — agents store embeddings, not meaning. Retrieval is fuzzy; there's no way to ask why something was recalled.
- No decision trail — agents make decisions continuously but record nothing. When something goes wrong, there's no history to debug or audit.
- No provenance — outputs cannot be traced back to source facts. In regulated industries, this is a compliance blocker.
- No reasoning transparency — black-box answers with no explanation of how a conclusion was reached.
- No conflict detection — contradictory facts silently coexist in vector stores, producing unpredictable answers.
These aren't edge cases. They are the reason AI cannot be deployed in healthcare, finance, legal, and government without custom guardrails built from scratch.
The Solution
Semantica is the context and intelligence layer you add to your AI stack:
- Context Graphs — structured graph of entities, relationships, and decisions your agent builds as it works. Queryable, traceable, persistent.
- Decision Intelligence — every decision is a first-class object: recorded, linked causally, searchable by precedent, and analyzable for downstream impact.
- Provenance — every fact links to its source. W3C PROV-O compliant. Full lineage from ingestion to inference.
- Reasoning engines — forward chaining, Rete networks, deductive, abductive, and SPARQL reasoning. Explainable inference paths, not black-box answers.
- Deduplication & QA — conflict detection, entity resolution, and validation built into the pipeline.
Works alongside LangChain, LlamaIndex, AutoGen, CrewAI, and any LLM provider. Semantica is not a replacement; it is the accountability layer on top.
⚡ Quick Installation
```python
from semantica.context import AgentContext, ContextGraph
from semantica.vector_store import VectorStore

context = AgentContext(
    vector_store=VectorStore(backend="inmemory"),
    knowledge_graph=ContextGraph(advanced_analytics=True),
    decision_tracking=True,
)

# Store a memory
context.store("GPT-4 outperforms GPT-3.5 on reasoning benchmarks by 40%")

# Record a decision
decision_id = context.record_decision(
    category="model_selection",
    scenario="Choose LLM for production reasoning pipeline",
    reasoning="GPT-4 benchmark advantage justifies 3x cost increase",
    outcome="selected_gpt4",
    confidence=0.91,
)

# Find similar past decisions and analyze downstream impact
precedents = context.find_precedents("model selection reasoning", limit=5)
influence = context.analyze_decision_influence(decision_id)
```
Full Quick Start • Cookbook • Join Discord
What's New in v0.3.0
First stable release (`Production/Stable` on PyPI).
| Area | Highlights |
|---|---|
| Context Graphs | Temporal validity windows, weighted BFS, cross-graph navigation with save/load persistence |
| Decision Intelligence | Full lifecycle: record → trace → impact → precedent; PolicyEngine with versioned rules |
| KG Algorithms | PageRank, betweenness, Louvain community detection, Node2Vec, link prediction |
| Semantic Extraction | LLM extraction fixed (no silent drops), duplicate relation bug removed, "llm_typed" metadata corrected |
| Deduplication v2 | blocking_v2/hybrid_v2 — 63.6% faster; semantic v2 — 6.98x faster |
| Delta Processing | SPARQL-based incremental diff, delta_mode pipelines, snapshot versioning |
| Export | RDF aliases ("ttl", "json-ld"), ArangoDB AQL, Apache Parquet (Spark/BigQuery/Databricks) |
| Pipeline | FailureHandler with LINEAR/EXPONENTIAL/FIXED backoff; PipelineValidator returning ValidationResult |
| Graph Backends | Apache AGE (SQL injection fixed), AWS Neptune, FalkorDB, PgVector (HNSW/IVFFlat) |
| Tests | 886+ passing, 0 failures — 335 context, ~430 KG, 70 semantic extraction, 85 real-world E2E |
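The Delta Processing row describes incremental diffs between graph snapshots. Semantica's own implementation is described as SPARQL-based; the sketch below only illustrates the underlying idea with plain Python sets of triples. The `Triple` type and the `diff_snapshots` helper are hypothetical names chosen for illustration, not part of Semantica's API.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def diff_snapshots(old: set[Triple], new: set[Triple]) -> tuple[set[Triple], set[Triple]]:
    """Return (added, removed) triples between two graph snapshots."""
    added = new - old    # present in the new snapshot only
    removed = old - new  # present in the old snapshot only
    return added, removed


v1 = {
    Triple("gpt4", "type", "LLM"),
    Triple("pipeline", "uses", "gpt35"),
}
v2 = {
    Triple("gpt4", "type", "LLM"),
    Triple("pipeline", "uses", "gpt4"),
}

added, removed = diff_snapshots(v1, v2)
print(sorted(added))    # [Triple(subject='pipeline', predicate='uses', obj='gpt4')]
print(sorted(removed))  # [Triple(subject='pipeline', predicate='uses', obj='gpt35')]
```

Only the changed triples need to flow into downstream stages, which is what makes incremental updates cheaper than a full recompute.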
Core Value Proposition
| Trustworthy | Explainable | Auditable |
|---|---|---|
| Conflict detection & validation | Transparent reasoning paths | Complete provenance tracking |
| Rule-based governance | Entity relationships & ontologies | W3C PROV-O compliant lineage |
| Production-grade QA | Multi-hop graph reasoning | Source tracking & integrity verification |
Features
Context & Decision Intelligence
- Context Graphs — structured, persistent graph of entities, relationships, and decisions
- Decision tracking — `add_decision()`, `record_decision()` for full lifecycle management
- Causal chains — `add_causal_relationship()`, `trace_decision_chain()`
- Precedent search — hybrid similarity search over past decisions via `find_similar_decisions()`
- Influence analysis — `analyze_decision_impact()`, `analyze_decision_influence()`
- Policy engine — `check_decision_rules()` with versioned, automated compliance rules
- Agent memory — `AgentMemory` with short/long-term storage and conversation history
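The decision lifecycle listed above (record, link causally, trace, search by precedent) can be illustrated without Semantica. The following is a minimal sketch in plain Python under stated assumptions: `Decision`, `DecisionLog`, and its methods are hypothetical names, not Semantica's `record_decision()` or `trace_decision_chain()` API, and precedent search uses simple token Jaccard overlap in place of the hybrid similarity that `find_similar_decisions()` is described as using.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    decision_id: str
    category: str
    scenario: str
    outcome: str
    caused_by: list[str] = field(default_factory=list)  # ids of upstream decisions


class DecisionLog:
    """Hypothetical in-memory decision store (not Semantica's API)."""

    def __init__(self) -> None:
        self.decisions: dict[str, Decision] = {}

    def record(self, decision: Decision) -> str:
        self.decisions[decision.decision_id] = decision
        return decision.decision_id

    def trace_chain(self, decision_id: str) -> list[str]:
        """Follow causal links upstream depth-first; ids are returned in visit order."""
        chain: list[str] = []
        stack, seen = [decision_id], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            chain.append(current)
            stack.extend(self.decisions[current].caused_by)
        return chain

    def find_precedents(self, query: str, limit: int = 5) -> list[str]:
        """Rank past decisions by Jaccard overlap of query and scenario tokens."""
        q = set(query.lower().split())
        scored = []
        for d in self.decisions.values():
            s = set(d.scenario.lower().split())
            union = q | s
            score = len(q & s) / len(union) if union else 0.0
            if score > 0:
                scored.append((score, d.decision_id))
        # Highest score first; ties broken alphabetically by id for stable output.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [decision_id for _, decision_id in scored[:limit]]


log = DecisionLog()
log.record(Decision("d1", "budget", "Approve GPU budget for reasoning pipeline", "approved"))
log.record(Decision("d2", "model_selection", "Choose LLM for production reasoning pipeline",
                    "selected_gpt4", caused_by=["d1"]))
print(log.trace_chain("d2"))                              # ['d2', 'd1']
print(log.find_precedents("LLM for reasoning pipeline"))  # ['d2', 'd1']
```

A production context graph would persist these records and score precedents with embeddings as well as keywords; the sketch only shows the shape of the data and the two traversals.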
Knowledge Graphs
- Graph construction — entities, relationships, properties, typed edges
- Algorithms — PageRank, betweenness centrality, clustering coefficient, community detection
- Node embeddings — Node2Vec via `NodeEmbedder`; cosine similarity via `SimilarityCalculator`
- Link prediction — score potential edges via `LinkPredictor`
- Temporal graphs — time-aware nodes and edges with validity windows
- Delta processing — incremental updates without full recompute
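PageRank, the first algorithm listed above, is easiest to understand as a power iteration. The sketch below is a self-contained plain-Python version written for illustration; it is not Semantica's implementation, and the function name `pagerank` and its parameters are assumptions. Nodes with no outgoing edges ("dangling" nodes) have their rank spread uniformly so the scores keep summing to 1.

```python
def pagerank(edges: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> dict[str, float]:
    """Power-iteration PageRank over a directed adjacency list."""
    nodes = set(edges) | {t for targets in edges.values() for t in targets}
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-edges) is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not edges.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for source, targets in edges.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


graph = {
    "pipeline": ["gpt4"],
    "gpt4": ["openai"],
    "gpt35": ["openai"],
}
scores = pagerank(graph)
print(max(scores, key=scores.get))  # openai
```

The node with the most incoming links from well-ranked nodes ends up with the highest score, which is why PageRank is a common first pass for finding central entities in a knowledge graph.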
Semantic Extraction
- NER — named entity recognition, normalization, classification
- Relation extraction — triplet generation via LLMs or rule-based methods, with `"llm_typed"` metadata
- Deduplication v1/v2 — Jaro-Winkler, `blocking_v2`, `hybrid_v2`, `semantic_v2`; `dedup_triplets()` for triples
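Jaro-Winkler, named above as one of the deduplication similarity measures, scores two strings by how many characters match within a sliding window, penalises out-of-order matches, and then boosts pairs that share a prefix. The following is a plain-Python sketch of the standard metric, not Semantica's code; the prefix scale of 0.1 and the 4-character prefix cap are the conventional Winkler defaults.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: matching characters within a window, penalised for transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(0, max(len1, len2) // 2 - 1)
    s1_matched = [False] * len1
    s2_matched = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not s2_matched[j] and s2[j] == ch:
                s1_matched[i] = s2_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the matched characters that appear in a different order.
    mismatched, k = 0, 0
    for i in range(len1):
        if s1_matched[i]:
            while not s2_matched[k]:
                k += 1
            if s1[i] != s2[k]:
                mismatched += 1
            k += 1
    transpositions = mismatched / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Boost the Jaro score for strings sharing a common prefix (capped at 4 chars)."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * prefix_scale * (1 - sim)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

In an entity-resolution pipeline, pairs scoring above a chosen threshold become merge candidates; the threshold itself is a tuning decision and is not specified here.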
Reasoning
- Forward chaining — `Reasoner` with IF/THEN string rules and dict facts
- Rete network — `ReteEngine` for high-throughput production rule matching
- Deductive / Abductive — `DeductiveReasoner`, `AbductiveReasoner`
- SPARQL — `SPARQLReasoner` for query-based inference over RDF graphs
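Forward chaining repeatedly fires every rule whose conditions are all known facts until no new fact appears (a fixpoint), and recording which rule produced each fact gives an explanation path for free. The sketch below is a generic illustration, not Semantica's `Reasoner`: the `IF a AND b THEN c` syntax, the `parse_rule` and `forward_chain` names, and the use of a set of atomic fact strings (rather than dict facts) are all assumptions.

```python
import re

RULE = re.compile(r"^IF\s+(.+?)\s+THEN\s+(.+)$", re.IGNORECASE)
AND_SPLIT = re.compile(r"\s+AND\s+", re.IGNORECASE)


def parse_rule(rule: str) -> tuple[list[str], str]:
    """Split 'IF a AND b THEN c' into (['a', 'b'], 'c')."""
    match = RULE.match(rule.strip())
    if match is None:
        raise ValueError(f"not an IF/THEN rule: {rule!r}")
    conditions = [c.strip() for c in AND_SPLIT.split(match.group(1))]
    return conditions, match.group(2).strip()


def forward_chain(rules: list[str], facts: set[str]) -> tuple[set[str], list[tuple[str, str]]]:
    """Fire rules until a fixpoint; return all facts plus a (fact, rule) explanation trace."""
    parsed = [(rule, *parse_rule(rule)) for rule in rules]
    known = set(facts)
    trace: list[tuple[str, str]] = []
    changed = True
    while changed:
        changed = False
        for rule, conditions, conclusion in parsed:
            if conclusion not in known and all(c in known for c in conditions):
                known.add(conclusion)
                trace.append((conclusion, rule))
                changed = True
    return known, trace


rules = [
    "IF large_transfer AND new_account THEN requires_review",
    "IF requires_review AND missing_kyc THEN block_transfer",
]
facts, trace = forward_chain(rules, {"large_transfer", "new_account", "missing_kyc"})
for fact, rule in trace:
    print(f"{fact} <- {rule}")
```

This naive loop re-checks every rule on each pass; a Rete network, also listed above, avoids that cost by caching partial matches between passes.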
Provenance & Auditability
- Entity provenance — `ProvenanceTracker.track_entity()`
- Algorithm provenance — `AlgorithmTrackerWithProvenance`
- W3C PROV-O compliant — lineage tracking across all modules
- Change management — version control with checksums, audit trails, compliance support
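To make "lineage" and "checksums" concrete, the sketch below records derivations using the W3C PROV-O terms `prov:Entity`, `prov:Activity`, `prov:wasDerivedFrom`, `prov:wasGeneratedBy`, and `prov:used`, then walks a fact back to its sources and verifies a source against its stored SHA-256 digest. It is an illustration only: `LineageGraph` and its methods are hypothetical, not Semantica's `ProvenanceTracker`, and the `doc:`/`fact:` identifiers are placeholders rather than real IRIs.

```python
import hashlib
from collections import defaultdict, deque

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class LineageGraph:
    """Hypothetical PROV-O-style lineage store (not Semantica's ProvenanceTracker)."""

    def __init__(self) -> None:
        self.triples: set[tuple[str, str, str]] = set()
        self.parents: dict[str, list[str]] = defaultdict(list)
        self.checksums: dict[str, str] = {}

    def add_source(self, source_id: str, content: str) -> None:
        self.triples.add((source_id, RDF_TYPE, PROV + "Entity"))
        # Checksum recorded at ingestion so later audits can detect changed sources.
        self.checksums[source_id] = sha256(content)

    def add_derivation(self, entity_id: str, source_id: str, activity_id: str) -> None:
        self.triples |= {
            (entity_id, RDF_TYPE, PROV + "Entity"),
            (activity_id, RDF_TYPE, PROV + "Activity"),
            (entity_id, PROV + "wasDerivedFrom", source_id),
            (entity_id, PROV + "wasGeneratedBy", activity_id),
            (activity_id, PROV + "used", source_id),
        }
        self.parents[entity_id].append(source_id)

    def lineage(self, entity_id: str) -> list[str]:
        """Every upstream entity, nearest first (breadth-first over wasDerivedFrom)."""
        order: list[str] = []
        seen: set[str] = set()
        queue = deque(self.parents.get(entity_id, []))
        while queue:
            current = queue.popleft()
            if current in seen:
                continue
            seen.add(current)
            order.append(current)
            queue.extend(self.parents.get(current, []))
        return order

    def verify(self, source_id: str, content: str) -> bool:
        return self.checksums.get(source_id) == sha256(content)


REPORT = "GPT-4 outperforms GPT-3.5 on reasoning benchmarks by 40%"
g = LineageGraph()
g.add_source("doc:benchmark_report", REPORT)
g.add_derivation("fact:gpt4_better", "doc:benchmark_report", "act:llm_extraction")
g.add_derivation("decision:selected_gpt4", "fact:gpt4_better", "act:model_review")
print(g.lineage("decision:selected_gpt4"))  # ['fact:gpt4_better', 'doc:benchmark_report']
print(g.verify("doc:benchmark_report", REPORT))  # True
```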
Vector Store
- Backends — FAISS, Pinecone, Weaviate, Qdrant, Milvus, PgVector, in-memory
- Search modes — semantic top-k, hybrid (vector + keyword), metadata-filtered
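One common way to implement hybrid (vector + keyword) search is a linear blend of the two scores, applied after any metadata filter. The sketch below shows that variant in plain Python with hand-written 3-dimensional toy vectors standing in for real embeddings; `hybrid_search`, the `alpha` weight, and the `where` filter are illustrative assumptions, not Semantica's `VectorStore` API, and real backends may use other fusion schemes such as reciprocal rank fusion.

```python
import math
from typing import Any, Optional


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear in the text."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def hybrid_search(query: str, query_vec: list[float], docs: list[dict[str, Any]],
                  alpha: float = 0.7, top_k: int = 3,
                  where: Optional[dict[str, Any]] = None) -> list[tuple[float, str]]:
    """Score = alpha * cosine + (1 - alpha) * keyword, over docs passing the filter."""
    results = []
    for doc in docs:
        if where and any(doc["meta"].get(k) != v for k, v in where.items()):
            continue  # metadata filter excludes this document
        score = (alpha * cosine(query_vec, doc["vec"])
                 + (1 - alpha) * keyword_score(query, doc["text"]))
        results.append((score, doc["id"]))
    results.sort(reverse=True)
    return results[:top_k]


docs = [
    {"id": "a", "text": "GPT-4 reasoning benchmark results", "vec": [0.9, 0.1, 0.0], "meta": {"year": 2024}},
    {"id": "b", "text": "Quarterly cloud cost report", "vec": [0.1, 0.9, 0.2], "meta": {"year": 2024}},
    {"id": "c", "text": "Older reasoning benchmark notes", "vec": [0.8, 0.2, 0.1], "meta": {"year": 2022}},
]
print([doc_id for _, doc_id in hybrid_search("reasoning benchmark", [1.0, 0.0, 0.0], docs)])
print([doc_id for _, doc_id in hybrid_search("reasoning benchmark", [1.0, 0.0, 0.0], docs,
                                              where={"year": 2024})])
```

Raising `alpha` favours semantic closeness; lowering it favours exact term matches, which helps with identifiers and codes that embeddings handle poorly.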
Data Ingestion
- Files — PDF, DOCX, HTML, JSON, CSV, Excel, PPTX, archives
- Sources — web crawl, SQL databases, Snowflake, feeds, email, repositories
- Docling — advanced parsing with table and layout extraction
- Media — image OCR, audio/video metadata
Export
- RDF — Turtle, JSON-LD, N-Triples, XML via `RDFExporter`
- Parquet — `ParquetExporter` for Spark/BigQuery/Databricks pipelines
- ArangoDB AQL — ready-to-run INSERT statements
- OWL ontologies — Turtle or RDF/XML
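N-Triples, one of the RDF formats listed above, is the simplest to emit by hand: one `<subject> <predicate> object .` statement per line, with IRIs in angle brackets and string literals in double quotes. The sketch below is a minimal hand-rolled serializer for illustration, not Semantica's `RDFExporter`; the `Literal` wrapper and `to_ntriples` helper are hypothetical, and IRIs are assumed to be already valid (they are not escaped or checked).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Literal:
    value: str


def escape_literal(value: str) -> str:
    """Escape backslash, double quote, and line breaks inside a quoted literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def to_ntriples(triples: list[tuple[str, str, Union[str, Literal]]]) -> str:
    lines = []
    for subject, predicate, obj in triples:
        # Plain strings are treated as IRIs; Literal objects become quoted strings.
        if isinstance(obj, Literal):
            obj_term = f'"{escape_literal(obj.value)}"'
        else:
            obj_term = f"<{obj}>"
        lines.append(f"<{subject}> <{predicate}> {obj_term} .")
    return "\n".join(lines) + "\n"


EX = "http://example.org/"
doc = to_ntriples([
    (EX + "gpt4", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", EX + "LLM"),
    (EX + "gpt4", "http://www.w3.org/2000/01/rdf-schema#label", Literal('GPT-4 "turbo"')),
])
print(doc, end="")
```

Because every statement is self-contained on one line, N-Triples files can be split, concatenated, and streamed, which is why the format suits bulk loading into triple stores.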
Pipeline & Ontology
- Pipeline DSL — `PipelineBuilder` with stage chaining, parallel workers, retry policies
- Ontology — auto-generate OWL from KGs, import OWL/RDF/Turtle/JSON-LD, HermiT/Pellet validation
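The retry policies above, and the FIXED/LINEAR/EXPONENTIAL backoff strategies named in the v0.3.0 notes, differ only in how the wait before each retry grows. The sketch below is a generic illustration, not Semantica's `FailureHandler`: `Backoff`, `backoff_delay`, and `run_with_retries` are hypothetical names, the 30-second cap is an arbitrary assumption, and `sleep` is injectable so the delays can be inspected without actually waiting.

```python
import time
from enum import Enum
from typing import Callable, TypeVar

T = TypeVar("T")


class Backoff(Enum):
    FIXED = "fixed"
    LINEAR = "linear"
    EXPONENTIAL = "exponential"


def backoff_delay(strategy: Backoff, attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (1-based), capped at `cap`."""
    if strategy is Backoff.FIXED:
        delay = base
    elif strategy is Backoff.LINEAR:
        delay = base * attempt
    else:
        delay = base * 2 ** (attempt - 1)
    return min(delay, cap)


def run_with_retries(stage: Callable[[], T], strategy: Backoff, max_retries: int = 3,
                     base: float = 1.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `stage`, retrying failures up to `max_retries` times before re-raising."""
    failures = 0
    while True:
        try:
            return stage()
        except Exception:
            failures += 1
            if failures > max_retries:
                raise
            sleep(backoff_delay(strategy, failures, base))


calls: list[int] = []


def flaky_stage() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("transient failure")
    return "ok"


waits: list[float] = []
print(run_with_retries(flaky_stage, Backoff.EXPONENTIAL, sleep=waits.append))  # ok
print(waits)  # [1.0, 2.0]
```

Exponential backoff is the usual choice for remote services under load, since it quickly reduces pressure on a struggling dependency; fixed delays are simpler when failures are rare and independent.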
Modules
| Module | What it provides |
|---|---|
| `semantica.context` | Context graphs, agent memory, decision tracking, causal analysis, precedent search, policy engine |
| `semantica.kg` | KG construction, graph algorithms, centrality, community detection, embeddings, link prediction |
| `semantica.semantic_extract` | NER, relation extraction, event extraction, coreference, triplet generation, LLM extraction |
| `semantica.reasoning` | Forward chaining, Rete network, deductive, abductive, SPARQL reasoning, explanation generation |
| `semantica.vector_store` | FAISS, Pinecone, Weaviate, Qdrant, Milvus, PgVector; hybrid & filtered search |
| `semantica.export` | RDF, Parquet, ArangoDB AQL, CSV, YAML, OWL, graph formats |
| `semantica.ingest` | Files, web crawl, feeds, databases, Snowflake, MCP, email, repositories |
| `semantica.ontology` | Auto-generation, OWL/RDF export, import, validation, versioning |
| `semantica.pipeline` | Pipeline DSL, parallel workers, validation, retry policies, failure handling |
| `semantica.graph_store` | Neo4j, FalkorDB, Apache AGE, Amazon Neptune; Cypher queries |
| `semantica.embeddings` | Sentence-Transformers, FastEmbed, OpenAI, BGE; similarity calculation |
| `semantica.deduplication` | Entity deduplication, similarity scoring, merging, clustering |
| `semantica.provenance` | W3C PROV-O lineage tracking, source attribution, audit trails |
| `semantica.parse` | PDF, DOCX, PPTX, HTML, code, email, structured data, OCR |
| `semantica.split` | Recursive, semantic, entity-aware, relation-aware, graph-based chunking |
| `semantica.normalize` | Text, entities, dates, numbers, quantities, languages, encodings |
| `semantica.conflicts` | Multi-source conflict detection (value, type, temporal, logical) with resolution |
| `semantica.change_management` | Version storage, change tracking, checksums, audit trails |
| `semantica.triplet_store` | Blazegraph, Jena, RDF4J; SPARQL queries and bulk loading |
| `semantica.visualization` | Interactive/static KG, ontology, embedding, and temporal graph visualization |
| `semantica.core` | Framework orchestration, configuration, plugin system |
| `semantica.llms` | Groq, OpenAI, Novita AI, HuggingFace, LiteLLM integrations |
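The `semantica.conflicts` row mentions value, type, temporal, and logical conflicts. As a minimal sketch of only the value case: group claims by (subject, attribute), flag groups where sources disagree, and resolve by majority vote. The function names are hypothetical, and majority voting is just one simple resolution strategy chosen for illustration; it is not documented here as Semantica's method.

```python
from collections import defaultdict


def find_value_conflicts(claims):
    """Report (subject, attribute) keys whose sources disagree on the value.

    claims: iterable of (subject, attribute, value, source) tuples.
    Returns {(subject, attribute): {value: [sources, ...]}} for conflicting keys only.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for subject, attribute, value, source in claims:
        grouped[(subject, attribute)][value].append(source)
    return {
        key: dict(values)
        for key, values in grouped.items()
        if len(values) > 1  # more than one distinct value means a conflict
    }


def resolve_by_majority(conflict):
    """Pick the value backed by the most sources (ties go to the first value seen)."""
    return max(conflict, key=lambda value: len(conflict[value]))


claims = [
    ("acme", "headquarters", "Berlin", "crm"),
    ("acme", "headquarters", "Berlin", "website"),
    ("acme", "headquarters", "Munich", "old_filing"),
    ("acme", "ceo", "J. Doe", "crm"),
]
conflicts = find_value_conflicts(claims)
print(conflicts)  # {('acme', 'headquarters'): {'Berlin': ['crm', 'website'], 'Munich': ['old_filing']}}
print(resolve_by_majority(conflicts[("acme", "headquarters")]))  # Berlin
```

Keeping the losing values and their sources, rather than silently overwriting them, is what lets a later audit explain why one value won.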
Built for High-Stakes Domains
Where every decision must be accountable and mistakes have real consequences:
- 🏥 Healthcare & Life Sciences — Clinical decision support, drug interactions, patient safety
- 💰 Finance & Risk — Fraud detection, SOX/GDPR/MiFID II compliance, risk assessment
- ⚖️ Legal & Compliance — Evidence-backed research, contract analysis, regulatory tracking
- 🔒 Cybersecurity — Threat attribution, incident response, security audit trails
- 🏛️ Government & Defense — Policy decisions, classified information handling, defense intelligence
- 🏭 Critical Infrastructure — Power grids, transportation safety, emergency response
- 🚗 Autonomous Systems — Self-driving, robotics safety, industrial automation
Choose Your Path
- Quick Start: Up and running in minutes.
- Core Concepts: Knowledge graphs, ontologies, and semantic reasoning explained.
- API Reference: Full technical documentation for every module and class.
- Cookbook: 14 domain-specific cookbooks with real-world examples.
Installation
Now Available on PyPI!
Install with a single command.
Why Semantica?
- 🆓 Open Source: MIT licensed. No vendor lock-in.
- 🚀 Production Ready: Battle-tested with QA, conflict resolution, and validation built in.
- 🧩 Modular: Use only what you need. Swap components easily.
- 🌍 Community Driven: Built by developers, for developers. Active Discord.
- 📚 End-to-End: From ingestion to reasoning, no duct-taping required.
- 🔬 Research-Backed: Grounded in knowledge graph, ontology, and semantic web research.
Learn More
- Getting Started — your first knowledge graph in 5 minutes
- Core Concepts — knowledge graphs, ontologies, and semantic reasoning
- Cookbook — 14 domain-specific cookbooks with Jupyter notebooks
- API Reference — complete technical documentation