🍳 Semantica Cookbook
Welcome to the Semantica Cookbook!
This collection of Jupyter notebooks is designed to take you from beginner to expert in building semantic AI applications. Whether you're looking for quick recipes or deep-dive tutorials, you'll find them here.
How to use this Cookbook
- Beginners: Start with the Core Tutorials to learn the basics.
- Developers: Check out Advanced Concepts for deep dives into specific features.
- Architects: Explore Industry Use Cases for end-to-end solutions.
Prerequisites
Before running these notebooks, ensure you have:
- Python 3.8+ installed
- A basic understanding of Python and Jupyter
- An OpenAI API key (for most examples)
Installation
Install Semantica from PyPI (recommended):
For more installation options, see the Installation Guide.
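To confirm the installation worked, the following sketch checks whether the package can be imported and reports its installed version using only the standard library (`importlib.metadata` ships with Python 3.8+). It assumes both the PyPI distribution name and the import name are `semantica`; adjust the arguments if the Installation Guide says otherwise.

```python
from importlib import metadata, util
from typing import Optional


def installed_version(dist_name: str, import_name: Optional[str] = None) -> Optional[str]:
    """Return the installed version of a distribution, or None if unavailable.

    Assumption: the import name equals the distribution name unless given.
    """
    module = import_name or dist_name
    # find_spec returns None when the top-level module cannot be found.
    if util.find_spec(module) is None:
        return None
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Importable (e.g. a source checkout on sys.path) but no
        # distribution metadata is registered, so no version to report.
        return None


if __name__ == "__main__":
    version = installed_version("semantica")  # assumed package name
    print(f"semantica {version}" if version else "semantica is not installed")
```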
Featured Recipes
Hand-picked tutorials to show you the power of Semantica.
-
GraphRAG Complete --- Build a production-ready Graph Retrieval-Augmented Generation system.
New Features: Graph Validation, Logical Inference, Hybrid Context.
Topics: RAG, LLMs, Vector Search, Graph Traversal
Difficulty: Advanced
-
RAG vs. GraphRAG Comparison --- Side-by-side comparison of Standard RAG vs. GraphRAG using real-world data.
New Features: Inference-Enhanced GraphRAG, Reasoning Gap Analysis.
Topics: RAG, GraphRAG, Benchmarking, Visualization
Difficulty: Intermediate
-
Your First Knowledge Graph --- Go from raw text to a queryable knowledge graph in 20 minutes.
Topics: Extraction, Graph Construction, Visualization
Difficulty: Beginner
-
Real-Time Anomaly Detection --- Detect anomalies in streaming data using dynamic graphs.
Topics: Streaming, Security, Dynamic Graphs
Difficulty: Advanced
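The RAG vs. GraphRAG recipes contrast plain chunk retrieval with retrieval that follows graph edges. The graph side of that idea can be sketched in plain Python: seed on entities named in the query, expand breadth-first for a fixed number of hops, and collect the triples encountered. The sample triples and the `build_index`/`graph_context` helpers are hypothetical illustrations, not Semantica's GraphRAG API, and the naive substring match stands in for real entity linking.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy knowledge graph of (subject, relation, object) triples.
TRIPLES: List[Triple] = [
    ("Aspirin", "inhibits", "COX-1"),
    ("Aspirin", "treats", "Headache"),
    ("COX-1", "produces", "Thromboxane"),
    ("Ibuprofen", "inhibits", "COX-2"),
]


def build_index(triples: List[Triple]) -> Dict[str, List[Triple]]:
    """Map every node to the triples it participates in (either direction)."""
    index: Dict[str, List[Triple]] = defaultdict(list)
    for s, r, o in triples:
        index[s].append((s, r, o))
        index[o].append((s, r, o))
    return index


def graph_context(query: str, triples: List[Triple], hops: int = 1) -> List[Triple]:
    """Collect facts within `hops` edges of entities named in the query."""
    index = build_index(triples)
    # Seed with nodes whose name appears in the query (naive string match).
    frontier: Set[str] = {n for n in index if n.lower() in query.lower()}
    visited = set(frontier)
    facts: List[Triple] = []
    for _ in range(hops):
        next_frontier: Set[str] = set()
        for node in frontier:
            for triple in index[node]:
                if triple not in facts:
                    facts.append(triple)
                for neighbor in (triple[0], triple[2]):
                    if neighbor not in visited:
                        visited.add(neighbor)
                        next_frontier.add(neighbor)
        frontier = next_frontier
    return facts


print(graph_context("What does aspirin inhibit?", TRIPLES, hops=2))
```

In a full pipeline, the collected triples would be serialized into the LLM prompt, optionally alongside vector-search hits, which is roughly the kind of hybrid context these recipes discuss.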
🏁 Core Tutorials
Essential guides to master the Semantica framework.
-
Welcome to Semantica --- An interactive introduction to the framework's core philosophy and to all of its modules, including ingestion, parsing, extraction, knowledge graphs, embeddings, and more.
Topics: Framework Overview, Architecture, All Modules
Difficulty: Beginner
-
Data Ingestion --- Techniques for loading data from multiple sources using FileIngestor, WebIngestor, FeedIngestor, StreamIngestor, RepoIngestor, EmailIngestor, DBIngestor, and MCPIngestor.
Topics: File Ingestion, Web Scraping, Database Integration, Streams, Feeds, Repositories, Email, MCP
Difficulty: Beginner
-
Document Parsing --- Extracting clean text from complex formats like PDF, DOCX, and HTML.
Topics: OCR, PDF Parsing, Text Extraction
Difficulty: Beginner
-
Data Normalization --- Pipelines for cleaning, normalizing, and preparing text.
Topics: Text Cleaning, Unicode, Formatting
Difficulty: Beginner
-
Entity Extraction --- Using NER to identify people, organizations, and custom entities.
Topics: NER, spaCy, LLM Extraction
Difficulty: Beginner
-
Relation Extraction --- Discovering and classifying relationships between entities.
Topics: Relation Classification, Dependency Parsing
Difficulty: Beginner
-
Embedding Generation --- Creating and managing vector embeddings for semantic search.
Topics: Embeddings, OpenAI, HuggingFace
Difficulty: Intermediate
-
Vector Store --- Setting up vector stores for similarity search and retrieval.
Difficulty: Intermediate
-
Graph Store --- Persisting knowledge graphs in Neo4j or FalkorDB.
Topics: Neo4j, Cypher, Persistence
Difficulty: Intermediate
-
Ontology --- Defining domain schemas and ontologies to structure your data.
Topics: OWL, RDF, Schema Design
Difficulty: Intermediate
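The Embedding Generation and Vector Store tutorials both rest on similarity search: store one vector per document, then return the stored vectors closest to a query vector. Below is a brute-force cosine-similarity sketch; `InMemoryVectorStore` and the hand-written 3-dimensional vectors are hypothetical stand-ins for real embeddings and a production vector store, not Semantica's classes.

```python
import math
from typing import List, Tuple


class InMemoryVectorStore:
    """Hypothetical brute-force vector store ranked by cosine similarity."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, List[float]]] = []

    def add(self, doc_id: str, vector: List[float]) -> None:
        self._items.append((doc_id, vector))

    @staticmethod
    def cosine(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        # Treat a zero vector as dissimilar to everything.
        return dot / norm if norm else 0.0

    def search(self, query: List[float], k: int = 2) -> List[Tuple[str, float]]:
        # Score every stored vector, then keep the k highest scores.
        scored = [(doc_id, self.cosine(query, vec)) for doc_id, vec in self._items]
        return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


store = InMemoryVectorStore()
store.add("doc-graphs", [0.9, 0.1, 0.0])
store.add("doc-vectors", [0.1, 0.9, 0.0])
store.add("doc-cooking", [0.0, 0.1, 0.9])
print(store.search([1.0, 0.2, 0.0], k=2))
```

Brute force scans every vector on each query, which is fine for small collections; dedicated libraries such as FAISS also offer approximate indexes for larger ones.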
🧠 Advanced Concepts
Deep dive into advanced features, customization, and complex workflows.
-
Advanced Extraction --- Custom extractors, LLM-based extraction, and complex pattern matching.
Topics: Custom Models, Regex, LLMs
Difficulty: Advanced
-
Advanced Graph Analytics --- Centrality, community detection, and pathfinding algorithms.
Topics: PageRank, Louvain, Shortest Path
Difficulty: Advanced
-
Advanced Context Engineering --- Build a production-grade memory system for AI agents using persistent Vector (FAISS) and Graph (Neo4j) stores.
Topics: Agent Memory, GraphRAG, Entity Injection, Lifecycle Management
Difficulty: Advanced
-
Complete Visualization Suite --- Creating interactive, publication-ready visualizations of your graphs.
Topics: PyVis, NetworkX, D3.js
Difficulty: Intermediate
-
Conflict Resolution --- Strategies for handling contradictory information from multiple sources.
Topics: Truth Discovery, Voting, Confidence
Difficulty: Advanced
-
Multi-Format Export --- Exporting to RDF, OWL, JSON-LD, and NetworkX formats.
Topics: Serialization, Interoperability
Difficulty: Intermediate
-
Multi-Source Integration --- Merging data from disparate sources into a unified graph.
Topics: Entity Resolution, Merging, Fusion
Difficulty: Advanced
-
Pipeline Orchestration --- Building robust, automated data processing pipelines.
Topics: Workflows, Automation, Error Handling
Difficulty: Advanced
-
Reasoning and Inference --- Using logical reasoning to infer new knowledge from existing facts.
Topics: Logic Rules, Inference Engines
Difficulty: Advanced
-
Semantic Layer Construction --- Building a semantic layer over your data warehouse or lake.
Topics: Semantic Layer, Data Warehouse
Difficulty: Advanced
-
Temporal Knowledge Graphs --- Modeling and querying data that changes over time.
Topics: Time Series, Temporal Logic
Difficulty: Advanced
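Reasoning and Inference derives new facts from existing ones by applying rules until nothing new appears (a fixpoint). The minimal forward-chaining sketch below uses a single hypothetical transitivity rule, "if part_of(a, b) and part_of(b, c), then part_of(a, c)"; it illustrates the fixpoint idea only and does not reflect Semantica's rule syntax or inference engine.

```python
from typing import Set, Tuple

Fact = Tuple[str, str, str]


def infer_transitive(facts: Set[Fact], relation: str) -> Set[Fact]:
    """Apply r(a, b) and r(b, c) -> r(a, c) until a fixpoint is reached."""
    closure = set(facts)
    while True:
        edges = [(s, o) for s, r, o in closure if r == relation]
        new_facts = {
            (a, relation, d)
            for a, b in edges
            for c, d in edges
            # Chain a->b with b->d; skip reflexive facts produced by cycles.
            if b == c and a != d and (a, relation, d) not in closure
        }
        if not new_facts:  # Nothing new derived: fixpoint reached.
            return closure
        closure |= new_facts


facts = {
    ("Lyon", "part_of", "Auvergne-Rhone-Alpes"),
    ("Auvergne-Rhone-Alpes", "part_of", "France"),
    ("France", "part_of", "Europe"),
}
for fact in sorted(infer_transitive(facts, "part_of") - facts):
    print(fact)
```

Because facts are only ever added and the set of entities is finite, the loop is guaranteed to terminate; real engines add indexing so that each round does not rescan every pair.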
🏭 Industry Use Cases
Real-world examples and end-to-end applications across various industries.
Biomedical
-
Drug Discovery Pipeline --- Accelerating drug discovery by connecting genes, proteins, and drugs using PubMed RSS feeds, entity-aware chunking, GraphRAG, and vector similarity search.
Topics: Bioinformatics, KG Construction, GraphRAG, Vector Search
Difficulty: Advanced
-
Genomic Variant Analysis --- Analyzing genomic variants and their implications for disease using bioRxiv RSS feeds, temporal knowledge graphs, deduplication, and pathway analysis.
Topics: Genomics, Variant Calling, Temporal KGs, Graph Analytics
Difficulty: Advanced
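Several of these pipelines mention entity-aware chunking. One plausible reading, sketched here as an assumption rather than Semantica's actual chunker, is to split only at sentence boundaries that do not fall inside a known entity mention (so "Dr. Smith" is never cut at its period), then pack whole sentences into size-limited chunks. `entity_aware_chunks` and its `max_chars` parameter are hypothetical.

```python
import re
from typing import List


def entity_aware_chunks(text: str, entities: List[str], max_chars: int = 200) -> List[str]:
    """Chunk text at sentence boundaries that do not cut through an entity."""
    # Character spans occupied by every mention of a known entity.
    protected = [
        (m.start(), m.end()) for e in entities for m in re.finditer(re.escape(e), text)
    ]
    # Candidate boundaries: just after sentence-ending punctuation plus whitespace.
    boundaries = [m.end() for m in re.finditer(r"[.!?]\s+", text)]
    safe = [b for b in boundaries if not any(s < b < e for s, e in protected)]

    # Split into sentences at the safe boundaries only.
    pieces, start = [], 0
    for b in safe:
        pieces.append(text[start:b].strip())
        start = b
    pieces.append(text[start:].strip())

    # Greedily pack sentences; a single over-long sentence becomes its own chunk.
    chunks, current = [], ""
    for piece in filter(None, pieces):
        candidate = f"{current} {piece}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


doc = "Dr. Smith studied BRCA1 variants. The gene regulates DNA repair. Aspirin was not tested."
for chunk in entity_aware_chunks(doc, ["Dr. Smith", "BRCA1"], max_chars=40):
    print(chunk)
```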
Finance
-
Financial Data Integration MCP --- Merging financial data from the Alpha Vantage API, MCP servers, RSS feeds, and market feeds with seed data integration.
Topics: Finance, Data Fusion, MCP Integration, Real-Time Ingestion
Difficulty: Intermediate
-
Fraud Detection --- Identifying fraudulent activities and patterns in transaction networks using temporal knowledge graphs, conflict detection, and pattern recognition.
Topics: Anomaly Detection, Graph Mining, Temporal Analysis, Pattern Detection
Difficulty: Advanced
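Fraud Detection relies on temporal patterns in transaction networks. One classic signal, shown here as a standalone sketch rather than Semantica's pattern detector, is transaction velocity: an account making more than a set number of transactions within a sliding time window. The function name, thresholds, and sample data are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple


def velocity_alerts(
    transactions: Iterable[Tuple[str, float]],
    window_seconds: float = 60.0,
    max_count: int = 3,
) -> List[Tuple[str, float]]:
    """Flag (account, timestamp) when an account exceeds max_count
    transactions within window_seconds. Input must be sorted by timestamp."""
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    alerts: List[Tuple[str, float]] = []
    for account, ts in transactions:
        window = recent[account]
        window.append(ts)
        # Drop timestamps older than the window (the boundary is inclusive).
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_count:
            alerts.append((account, ts))
    return alerts


txs = [("acct-A", 0), ("acct-A", 10), ("acct-B", 12),
       ("acct-A", 20), ("acct-A", 30), ("acct-A", 200)]
print(velocity_alerts(txs, window_seconds=60, max_count=3))
```

Each account keeps only the timestamps inside its current window, so memory stays bounded per account even on a long-running stream.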
Blockchain
-
DeFi Protocol Intelligence --- Analyzing decentralized finance protocols and transaction flows using CoinDesk RSS feeds, ontology-aware chunking, conflict detection, and ontology generation.
Topics: Blockchain, DeFi, Smart Contracts, Ontology, Conflict Resolution
Difficulty: Advanced
-
Transaction Network Analysis --- Mapping and analyzing blockchain transaction networks using blockchain APIs, deduplication, and network pattern detection.
Topics: Blockchain Analytics, Network Analysis, Deduplication, Pattern Detection
Difficulty: Advanced
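Transaction Network Analysis looks for structural patterns in transfers. One such pattern is a directed cycle in which funds return to the originating address, which can indicate round-tripping. The sketch below finds such cycles with a depth-limited DFS; the addresses are placeholders, `find_cycles` is hypothetical, and the search grows exponentially with `max_len`, so it only suits small neighborhoods.

```python
from typing import Dict, List, Set, Tuple


def find_cycles(edges: List[Tuple[str, str]], max_len: int = 4) -> Set[Tuple[str, ...]]:
    """Return simple directed cycles with at most max_len addresses."""
    graph: Dict[str, List[str]] = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    cycles: Set[Tuple[str, ...]] = set()

    def dfs(start: str, node: str, path: List[str]) -> None:
        for nxt in graph.get(node, []):
            if nxt == start:
                # Rotate so the smallest address comes first: one key per cycle.
                i = path.index(min(path))
                cycles.add(tuple(path[i:] + path[:i]))
            elif nxt not in path and len(path) < max_len:
                dfs(start, nxt, path + [nxt])

    for node in graph:
        dfs(node, node, [node])
    return cycles


transfers = [("0xA", "0xB"), ("0xB", "0xC"), ("0xC", "0xA"),
             ("0xC", "0xD"), ("0xD", "0xE")]
print(find_cycles(transfers))
```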
Cybersecurity
-
Real-Time Anomaly Detection --- Detecting anomalies in real-time network traffic streams using CVE RSS feeds, Kafka streams, temporal knowledge graphs, and sentence chunking.
Topics: Network Security, Streaming, Temporal KGs, Pattern Detection
Difficulty: Advanced
-
Threat Intelligence Hybrid RAG --- Combining enhanced GraphRAG with threat intelligence for security insights using security RSS feeds, entity-aware chunking, deduplication, and temporal knowledge graphs.
Topics: Threat Intelligence, GraphRAG, Security, Hybrid Retrieval
Difficulty: Advanced
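Real-Time Anomaly Detection watches a stream for points that deviate sharply from recent behavior. A common baseline, sketched here independently of Semantica and Kafka, scores each new value against the mean and standard deviation of a sliding window of preceding values. The function name, window size, and threshold are illustrative assumptions.

```python
import math
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def rolling_zscore_anomalies(
    values: Iterable[float], window: int = 5, threshold: float = 3.0
) -> Iterator[Tuple[int, float, float]]:
    """Yield (index, value, z) for points far from the preceding window's mean."""
    history: Deque[float] = deque(maxlen=window)
    for i, x in enumerate(values):
        if len(history) == window:
            mean = sum(history) / window
            std = math.sqrt(sum((h - mean) ** 2 for h in history) / window)
            # Skip scoring when the window is perfectly flat (std == 0).
            if std > 0:
                z = (x - mean) / std
                if abs(z) >= threshold:
                    yield i, x, z
        history.append(x)  # deque(maxlen=...) evicts the oldest value


traffic = [100, 102, 98, 101, 99, 100, 500, 101, 99]  # e.g. requests per second
for index, value, z in rolling_zscore_anomalies(traffic):
    print(f"t={index}: {value} (z={z:.1f})")
```

Note that an anomalous point enters the window and inflates the standard deviation for the next few steps; robust variants use the median and median absolute deviation instead.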
Intelligence
-
Criminal Network Analysis --- Analyzing criminal networks with graph analytics and key-player detection using OSINT RSS feeds, deduplication, and network centrality analysis.
Topics: Forensics, Social Network Analysis, Deduplication, Graph Analytics
Difficulty: Advanced
-
Intelligence Analysis Orchestrator Worker --- Comprehensive intelligence analysis using a pipeline orchestrator with multiple RSS feeds, conflict detection, and multi-source integration.
Topics: Intelligence Analysis, Pipeline Orchestration, Multi-Source Integration, Conflict Resolution
Difficulty: Advanced
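Criminal Network Analysis mentions network centrality for key-player detection. The simplest such measure is degree centrality: a node's degree divided by n - 1, where n is the number of nodes. The stdlib-only sketch below computes it for an undirected network; the names are hypothetical, isolated nodes are not counted in n here, and Semantica's own analytics (or a library such as NetworkX) may offer richer measures such as betweenness.

```python
from collections import Counter
from typing import List, Tuple


def key_players(edges: List[Tuple[str, str]], top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank nodes by normalized degree centrality in an undirected network."""
    # Treat (a, b) and (b, a) as the same tie and ignore self-loops.
    unique = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    degree = Counter()
    for a, b in unique:
        degree[a] += 1
        degree[b] += 1
    n = len(degree)
    if n < 2:
        return [(node, 0.0) for node in degree]
    # Fraction of the other n - 1 nodes each node is directly tied to.
    scores = {node: d / (n - 1) for node, d in degree.items()}
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


ties = [("Boss", "Lt1"), ("Boss", "Lt2"), ("Boss", "Lt3"),
        ("Lt1", "Runner1"), ("Lt2", "Runner2"), ("Lt1", "Boss")]
print(key_players(ties))
```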
Renewable Energy
-
Energy Market Analysis --- Analyzing trends and pricing in the renewable energy market using energy RSS feeds, EIA API, temporal knowledge graphs, TemporalPatternDetector, and seed data integration.
Topics: Energy, Time Series, Temporal Analysis, Trend Prediction
Difficulty: Intermediate
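Energy Market Analysis is about trends over time. As a baseline for what a trend measure computes (not a description of how `TemporalPatternDetector` works), the sketch fits an ordinary least-squares line to an evenly spaced price series; a negative slope indicates a falling trend. The function name and sample values are hypothetical.

```python
from typing import List, Tuple


def linear_trend(values: List[float]) -> Tuple[float, float]:
    """Fit values[t] ~ slope * t + intercept by ordinary least squares."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    xs = range(n)
    mean_x = (n - 1) / 2  # mean of 0, 1, ..., n - 1
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


monthly_price = [50, 48, 47, 45, 44, 42]  # hypothetical $/MWh readings
slope, intercept = linear_trend(monthly_price)
print(f"trend: {slope:+.2f} per month, intercept {intercept:.2f}")
```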
Supply Chain
-
Supply Chain Data Integration --- Integrating supply chain data to optimize logistics and reduce risk using logistics RSS feeds, deduplication, and multi-source relationship mapping.
Topics: Logistics, Risk Management, Data Integration, Deduplication
Difficulty: Advanced
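Supply Chain Data Integration relies on deduplication, since the same supplier often appears under slightly different names across feeds. Below is a stdlib-only sketch of fuzzy name matching with `difflib.SequenceMatcher`; the legal-suffix list, the 0.9 threshold, and the greedy first-match clustering are illustrative assumptions, not Semantica's deduplication algorithm.

```python
import re
from difflib import SequenceMatcher
from typing import List

# Hypothetical list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "ltd", "llc", "co", "corp", "gmbh"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop legal suffixes."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(t for t in cleaned.split() if t not in LEGAL_SUFFIXES)


def dedupe(names: List[str], threshold: float = 0.9) -> List[List[str]]:
    """Greedily group names whose normalized forms are similar enough."""
    clusters: List[List[str]] = []
    keys: List[str] = []
    for name in names:
        key = normalize(name)
        for i, existing in enumerate(keys):
            if SequenceMatcher(None, key, existing).ratio() >= threshold:
                clusters[i].append(name)
                break
        else:  # No existing cluster matched: start a new one.
            keys.append(key)
            clusters.append([name])
    return clusters


suppliers = ["Acme Logistics Inc.", "ACME Logistics, Inc",
             "Acme Logistic Ltd", "Globex Shipping GmbH"]
print(dedupe(suppliers))
```

Comparing each name against one representative per cluster keeps the sketch short, but results then depend on input order; production entity resolution typically adds blocking and pairwise scoring.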
🛠️ How to Run
To run these notebooks locally:
-
Install Semantica from PyPI (recommended):
-
Or install from source (for development):
-
Launch Jupyter: