
🧠 Semantica

Python 3.8+ • MIT License • Available on PyPI

A Framework for Building Context Graphs and Decision Intelligence Layers for AI

⭐ Give us a Star  •  🍴 Fork us  •  💬 Join our Discord  •  🐦 Follow on X

Transform Chaos into Intelligence. Build AI systems with context graphs, decision tracking, and advanced knowledge engineering that are explainable, traceable, and trustworthy — not black boxes.

Get Started View on GitHub


The Problem

AI agents today are capable but not trustworthy:

  • No memory structure — agents store embeddings, not meaning. Retrieval is fuzzy; there's no way to ask why something was recalled.
  • No decision trail — agents make decisions continuously but record nothing. When something goes wrong, there's no history to debug or audit.
  • No provenance — outputs cannot be traced back to source facts. In regulated industries, this is a compliance blocker.
  • No reasoning transparency — black-box answers with no explanation of how a conclusion was reached.
  • No conflict detection — contradictory facts silently coexist in vector stores, producing unpredictable answers.

These aren't edge cases. They are the reason AI cannot be deployed in healthcare, finance, legal, and government without custom guardrails built from scratch.


The Solution

Semantica is the context and intelligence layer you add to your AI stack:

  • Context Graphs — structured graph of entities, relationships, and decisions your agent builds as it works. Queryable, traceable, persistent.
  • Decision Intelligence — every decision is a first-class object: recorded, linked causally, searchable by precedent, and analyzable for downstream impact.
  • Provenance — every fact links to its source. W3C PROV-O compliant. Full lineage from ingestion to inference.
  • Reasoning engines — forward chaining, Rete networks, deductive, abductive, and SPARQL reasoning. Explainable inference paths, not black-box answers.
  • Deduplication & QA — conflict detection, entity resolution, and validation built into the pipeline.

Works alongside LangChain, LlamaIndex, AutoGen, CrewAI, and any LLM provider — Semantica is not a replacement, it's the accountability layer on top.


⚡ Quick Installation

pip install semantica

from semantica.context import AgentContext, ContextGraph
from semantica.vector_store import VectorStore

context = AgentContext(
    vector_store=VectorStore(backend="inmemory"),
    knowledge_graph=ContextGraph(advanced_analytics=True),
    decision_tracking=True,
)

# Store a memory
context.store("GPT-4 outperforms GPT-3.5 on reasoning benchmarks by 40%")

# Record a decision
decision_id = context.record_decision(
    category="model_selection",
    scenario="Choose LLM for production reasoning pipeline",
    reasoning="GPT-4 benchmark advantage justifies 3x cost increase",
    outcome="selected_gpt4",
    confidence=0.91,
)

# Find similar past decisions and analyze downstream impact
precedents = context.find_precedents("model selection reasoning", limit=5)
influence  = context.analyze_decision_influence(decision_id)

Full Quick Start  •  Cookbook  •  Join Discord


What's New in v0.3.0

First stable release (Production/Stable on PyPI).

  • Context Graphs — temporal validity windows, weighted BFS, cross-graph navigation with save/load persistence
  • Decision Intelligence — full lifecycle: record → trace → impact → precedent; PolicyEngine with versioned rules
  • KG Algorithms — PageRank, betweenness, Louvain community detection, Node2Vec, link prediction
  • Semantic Extraction — LLM extraction fixed (no silent drops), duplicate relation bug removed, "llm_typed" metadata corrected
  • Deduplication v2 — blocking_v2/hybrid_v2 63.6% faster; semantic v2 6.98x faster
  • Delta Processing — SPARQL-based incremental diff, delta_mode pipelines, snapshot versioning
  • Export — RDF aliases ("ttl", "json-ld"), ArangoDB AQL, Apache Parquet (Spark/BigQuery/Databricks)
  • Pipeline — FailureHandler with LINEAR/EXPONENTIAL/FIXED backoff; PipelineValidator returning ValidationResult
  • Graph Backends — Apache AGE (SQL injection fixed), AWS Neptune, FalkorDB, PgVector (HNSW/IVFFlat)
  • Tests — 886+ passing, 0 failures: 335 context, ~430 KG, 70 semantic extraction, 85 real-world E2E

Core Value Proposition

  • Trustworthy — conflict detection & validation, rule-based governance, production-grade QA
  • Explainable — transparent reasoning paths, entity relationships & ontologies, multi-hop graph reasoning
  • Auditable — complete provenance tracking, W3C PROV-O compliant lineage, source tracking & integrity verification

Features

Context & Decision Intelligence

  • Context Graphs — structured, persistent graph of entities, relationships, and decisions
  • Decision tracking — add_decision(), record_decision() for full lifecycle management
  • Causal chains — add_causal_relationship(), trace_decision_chain()
  • Precedent search — hybrid similarity search over past decisions via find_similar_decisions()
  • Influence analysis — analyze_decision_impact(), analyze_decision_influence()
  • Policy engine — check_decision_rules() with versioned, automated compliance rules
  • Agent memory — AgentMemory with short/long-term storage and conversation history
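To make the decision-tracking idea concrete, here is a minimal pure-Python sketch of a decision log with causal links and precedent search. It is not Semantica's implementation — all names here (DecisionLog, record, find_precedents, trace_chain) are hypothetical, and the keyword-overlap ranking stands in for the hybrid vector search the real library advertises:

```python
from dataclasses import dataclass, field

@dataclass
class Decision:
    id: int
    scenario: str
    outcome: str
    causes: list = field(default_factory=list)  # ids of upstream decisions

class DecisionLog:
    """Toy decision store: record, search precedents, trace causal chains."""
    def __init__(self):
        self.decisions = {}

    def record(self, scenario, outcome, causes=()):
        did = len(self.decisions) + 1
        self.decisions[did] = Decision(did, scenario, outcome, list(causes))
        return did

    def find_precedents(self, query, limit=5):
        # naive keyword-overlap ranking; the real thing would use embeddings
        q = set(query.lower().split())
        scored = [(len(q & set(d.scenario.lower().split())), d)
                  for d in self.decisions.values()]
        return [d for s, d in sorted(scored, key=lambda t: -t[0]) if s][:limit]

    def trace_chain(self, did):
        # walk causal links back to root decisions
        chain, stack = [], [did]
        while stack:
            d = self.decisions[stack.pop()]
            chain.append(d.id)
            stack.extend(d.causes)
        return chain

log = DecisionLog()
a = log.record("choose LLM for reasoning pipeline", "gpt4")
b = log.record("set retry policy for pipeline", "exponential", causes=[a])
# log.find_precedents("LLM reasoning") ranks decision `a` first;
# log.trace_chain(b) walks b -> a
```

The point of making decisions first-class objects, as the bullets above describe, is exactly this: once every decision is a record with links, precedent search and causal tracing become plain graph queries.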

Knowledge Graphs

  • Graph construction — entities, relationships, properties, typed edges
  • Algorithms — PageRank, betweenness centrality, clustering coefficient, community detection
  • Node embeddings — Node2Vec via NodeEmbedder; cosine similarity via SimilarityCalculator
  • Link prediction — score potential edges via LinkPredictor
  • Temporal graphs — time-aware nodes and edges with validity windows
  • Delta processing — incremental updates without full recompute
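As a rough illustration of the graph algorithms listed above, here is a self-contained PageRank over an edge list — a sketch of the technique, not Semantica's code (the function name and signature are invented for this example):

```python
def pagerank(edges, damping=0.85, iters=50):
    """Iterative PageRank over a directed edge list [(src, dst), ...]."""
    nodes = {n for e in edges for n in e}
    out = {n: [d for s, d in edges if s == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = out[n] or list(nodes)  # dangling nodes spread rank evenly
            share = damping * rank[n] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank

scores = pagerank([("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")])
# "c" receives links from both "a" and "b", so it ranks highest
```

Centrality scores like these are what let a knowledge graph answer "which entities matter most" rather than just "which entities exist".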

Semantic Extraction

  • NER — named entity recognition, normalization, classification
  • Relation extraction — triplet generation via LLMs or rule-based methods, with "llm_typed" metadata
  • Deduplication v1/v2 — Jaro-Winkler similarity, blocking_v2, hybrid_v2, semantic_v2; dedup_triplets() for triples
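The "blocking" strategy mentioned above is worth a quick sketch: instead of comparing every entity pair (quadratic), you bucket entities by a cheap key and only compare within buckets. This is a generic illustration of the technique — the function names and the toy comparator are invented here, not Semantica's API:

```python
def block_key(name):
    """Cheap blocking key: first 3 alphanumeric chars, lowercased."""
    return "".join(c for c in name.lower() if c.isalnum())[:3]

def dedup_by_blocking(names, are_same):
    """Group candidates by blocking key, then compare only within blocks."""
    blocks = {}
    for n in names:
        blocks.setdefault(block_key(n), []).append(n)
    merged = []
    for group in blocks.values():
        clusters = []
        for n in group:
            for c in clusters:
                if are_same(c[0], n):  # compare against cluster representative
                    c.append(n)
                    break
            else:
                clusters.append([n])
        merged.extend(clusters)
    return merged

# toy comparator: case-insensitive prefix match (stand-in for Jaro-Winkler)
same = lambda a, b: a.lower().startswith(b.lower()) or b.lower().startswith(a.lower())
clusters = dedup_by_blocking(["OpenAI", "openai Inc", "Google", "GOOGLE LLC"], same)
# -> two clusters: the OpenAI variants and the Google variants
```

Swapping the prefix comparator for a real string-similarity metric such as Jaro-Winkler gives the v1 behavior; smarter keys and hybrid scoring give the v2 speedups claimed in the changelog.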

Reasoning

  • Forward chaining — Reasoner with IF/THEN string rules and dict facts
  • Rete network — ReteEngine for high-throughput production rule matching
  • Deductive / Abductive — DeductiveReasoner, AbductiveReasoner
  • SPARQL — SPARQLReasoner for query-based inference over RDF graphs
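Forward chaining is simple enough to show in full. This is a generic fixpoint loop, not the Reasoner class itself, and the rule format here (condition set, conclusion) is an assumption made for the sketch:

```python
def forward_chain(facts, rules):
    """Apply IF/THEN rules until no new facts are derived (fixpoint)."""
    facts = set(facts)
    derived = {}  # fact -> (premises, conclusion): the explanation path
    changed = True
    while changed:
        changed = False
        for cond, concl in rules:
            if cond <= facts and concl not in facts:
                facts.add(concl)
                derived[concl] = (cond, concl)
                changed = True
    return facts, derived

rules = [
    ({"has_fever", "has_cough"}, "flu_suspected"),
    ({"flu_suspected"}, "recommend_test"),
]
facts, why = forward_chain({"has_fever", "has_cough"}, rules)
# facts now includes "flu_suspected" and "recommend_test";
# `why` records which premises produced each derived fact
```

Note that the `derived` map is what makes the inference explainable: every conclusion carries the premises that produced it, which is the "explainable inference paths, not black-box answers" promise in miniature.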

Provenance & Auditability

  • Entity provenance — ProvenanceTracker.track_entity()
  • Algorithm provenance — AlgorithmTrackerWithProvenance
  • W3C PROV-O compliant — lineage tracking across all modules
  • Change management — version control with checksums, audit trails, compliance support
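The core PROV idea — every entity records the activity that generated it and the source it was derived from — can be sketched in a few lines. This toy uses PROV-O property names as dictionary keys for flavor; the class and methods are hypothetical, not Semantica's ProvenanceTracker:

```python
from datetime import datetime, timezone

class ProvLog:
    """Minimal PROV-style log: entity -> generating activity and source."""
    def __init__(self):
        self.records = {}

    def track(self, entity, activity, source):
        self.records[entity] = {
            "wasGeneratedBy": activity,    # cf. prov:wasGeneratedBy
            "wasDerivedFrom": source,      # cf. prov:wasDerivedFrom
            "generatedAtTime": datetime.now(timezone.utc).isoformat(),
        }

    def lineage(self, entity):
        """Follow wasDerivedFrom links back to an original source."""
        chain = [entity]
        while chain[-1] in self.records:
            chain.append(self.records[chain[-1]]["wasDerivedFrom"])
        return chain

log = ProvLog()
log.track("triple:gpt4-outperforms", "llm_extraction", "doc:benchmark_report")
log.track("doc:benchmark_report", "pdf_ingestion", "file:report.pdf")
# lineage("triple:gpt4-outperforms") walks triple -> document -> file
```

Full lineage "from ingestion to inference" is exactly this chain, extended across every module that touches a fact.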

Vector Store

  • Backends — FAISS, Pinecone, Weaviate, Qdrant, Milvus, PgVector, in-memory
  • Search modes — semantic top-k, hybrid (vector + keyword), metadata-filtered
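Semantic top-k search reduces to cosine similarity over stored vectors. A minimal sketch (invented names, toy 3-dimensional vectors in place of real embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query, store, k=2):
    """Rank stored (id, vector) pairs by similarity to the query vector."""
    ranked = sorted(store, key=lambda item: cosine(query, item[1]), reverse=True)
    return [item[0] for item in ranked[:k]]

store = [("cat", [1.0, 0.0, 0.1]),
         ("dog", [0.9, 0.1, 0.0]),
         ("car", [0.0, 1.0, 0.9])]
hits = top_k([1.0, 0.0, 0.0], store, k=2)
# -> ["cat", "dog"]: the two vectors closest in direction to the query
```

Backends like FAISS or PgVector's HNSW index do the same ranking, just with approximate nearest-neighbor structures instead of a full sort.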

Data Ingestion

  • Files — PDF, DOCX, HTML, JSON, CSV, Excel, PPTX, archives
  • Sources — web crawl, SQL databases, Snowflake, feeds, email, repositories
  • Docling — advanced parsing with table and layout extraction
  • Media — image OCR, audio/video metadata

Export

  • RDF — Turtle, JSON-LD, N-Triples, XML via RDFExporter
  • Parquet — ParquetExporter for Spark/BigQuery/Databricks pipelines
  • ArangoDB AQL — ready-to-run INSERT statements
  • OWL ontologies — Turtle or RDF/XML

Pipeline & Ontology

  • Pipeline DSL — PipelineBuilder with stage chaining, parallel workers, retry policies
  • Ontology — auto-generate OWL from KGs, import OWL/RDF/Turtle/JSON-LD, HermiT/Pellet validation
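The stage-chaining-with-retries pattern can be sketched in miniature. This is a generic illustration, not PipelineBuilder — class name, method names, and the backoff constants are all invented here (the EXPONENTIAL strategy mirrors one of the backoff modes the changelog mentions):

```python
import time

class Pipeline:
    """Toy pipeline: chainable stages, per-stage retries, exponential backoff."""
    def __init__(self, retries=2, base_delay=0.01):
        self.stages, self.retries, self.base_delay = [], retries, base_delay

    def stage(self, fn):
        self.stages.append(fn)
        return self  # enables chaining: Pipeline().stage(a).stage(b)

    def run(self, data):
        for fn in self.stages:
            for attempt in range(self.retries + 1):
                try:
                    data = fn(data)
                    break
                except Exception:
                    if attempt == self.retries:
                        raise  # retries exhausted: surface the failure
                    time.sleep(self.base_delay * 2 ** attempt)  # exponential backoff
        return data

result = Pipeline().stage(str.strip).stage(str.upper).run("  hello ")
# -> "HELLO"
```

Returning `self` from `stage` is what makes the fluent DSL style possible; the retry loop wraps each stage independently so one flaky step doesn't restart the whole pipeline.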

Modules

  • semantica.context — Context graphs, agent memory, decision tracking, causal analysis, precedent search, policy engine
  • semantica.kg — KG construction, graph algorithms, centrality, community detection, embeddings, link prediction
  • semantica.semantic_extract — NER, relation extraction, event extraction, coreference, triplet generation, LLM extraction
  • semantica.reasoning — Forward chaining, Rete network, deductive, abductive, SPARQL reasoning, explanation generation
  • semantica.vector_store — FAISS, Pinecone, Weaviate, Qdrant, Milvus, PgVector; hybrid & filtered search
  • semantica.export — RDF, Parquet, ArangoDB AQL, CSV, YAML, OWL, graph formats
  • semantica.ingest — Files, web crawl, feeds, databases, Snowflake, MCP, email, repositories
  • semantica.ontology — Auto-generation, OWL/RDF export, import, validation, versioning
  • semantica.pipeline — Pipeline DSL, parallel workers, validation, retry policies, failure handling
  • semantica.graph_store — Neo4j, FalkorDB, Apache AGE, Amazon Neptune; Cypher queries
  • semantica.embeddings — Sentence-Transformers, FastEmbed, OpenAI, BGE; similarity calculation
  • semantica.deduplication — Entity deduplication, similarity scoring, merging, clustering
  • semantica.provenance — W3C PROV-O lineage tracking, source attribution, audit trails
  • semantica.parse — PDF, DOCX, PPTX, HTML, code, email, structured data, OCR
  • semantica.split — Recursive, semantic, entity-aware, relation-aware, graph-based chunking
  • semantica.normalize — Text, entities, dates, numbers, quantities, languages, encodings
  • semantica.conflicts — Multi-source conflict detection (value, type, temporal, logical) with resolution
  • semantica.change_management — Version storage, change tracking, checksums, audit trails
  • semantica.triplet_store — Blazegraph, Jena, RDF4J; SPARQL queries and bulk loading
  • semantica.visualization — Interactive/static KG, ontology, embedding, and temporal graph visualization
  • semantica.core — Framework orchestration, configuration, plugin system
  • semantica.llms — Groq, OpenAI, Novita AI, HuggingFace, LiteLLM integrations

Built for High-Stakes Domains

Where every decision must be accountable and mistakes have real consequences:

  • 🏥 Healthcare & Life Sciences — Clinical decision support, drug interactions, patient safety
  • 💰 Finance & Risk — Fraud detection, SOX/GDPR/MiFID II compliance, risk assessment
  • ⚖️ Legal & Compliance — Evidence-backed research, contract analysis, regulatory tracking
  • 🔒 Cybersecurity — Threat attribution, incident response, security audit trails
  • 🏛️ Government & Defense — Policy decisions, classified information handling, defense intelligence
  • 🏭 Critical Infrastructure — Power grids, transportation safety, emergency response
  • 🚗 Autonomous Systems — Self-driving, robotics safety, industrial automation

Choose Your Path

  • Quick Start — Up and running in minutes.

    ➡ Start Here

  • Core Concepts — Knowledge graphs, ontologies, and semantic reasoning explained.

    ➡ Learn Concepts

  • API Reference — Full technical documentation for every module and class.

    ➡ View API

  • Cookbook — 14 domain-specific cookbooks with real-world examples.

    ➡ Explore Cookbook


Installation

Now Available on PyPI!

Install with a single command:

# From PyPI
pip install semantica

# With all optional dependencies
pip install semantica[all]

# From source
git clone https://github.com/Hawksight-AI/semantica.git
cd semantica
pip install -e .          # core
pip install -e ".[all]"   # all extras

# Development install
git clone https://github.com/Hawksight-AI/semantica.git
cd semantica
pip install -e ".[dev]"

# Docker
docker pull semantica/semantica:latest
docker run -it semantica/semantica

Why Semantica?

  • 🆓 Open Source — MIT licensed. No vendor lock-in.

  • 🚀 Production Ready — Battle-tested with QA, conflict resolution, and validation built in.

  • 🧩 Modular — Use only what you need. Swap components easily.

  • 🌍 Community Driven — Built by developers, for developers. Active Discord.

  • 📚 End-to-End — From ingestion to reasoning, no duct-taping required.

  • 🔬 Research-Backed — Grounded in knowledge graph, ontology, and semantic web research.


Learn More

  • Getting Started — your first knowledge graph in 5 minutes
  • Core Concepts — knowledge graphs, ontologies, and semantic reasoning
  • Cookbook — 14 domain-specific cookbooks with Jupyter notebooks
  • API Reference — complete technical documentation