
🍳 Semantica Cookbook

Welcome to the Semantica Cookbook!

This collection of Jupyter notebooks is designed to take you from beginner to expert in building semantic AI applications. Whether you're looking for quick recipes or deep-dive tutorials, you'll find them here.

How to use this Cookbook

Prerequisites

Before running these notebooks, ensure you have:

  • Python 3.8+ installed
  • A basic understanding of Python and Jupyter
  • An OpenAI API key (for most examples)
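
As a quick sanity check before opening a notebook, the prerequisites can be verified with a few lines of standard-library Python. This is a minimal sketch, not part of Semantica: the helper name `check_prerequisites` is hypothetical, and it assumes the API key is supplied through the `OPENAI_API_KEY` environment variable, which individual notebooks may or may not rely on.

```python
import os
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the prerequisites


def check_prerequisites(version_info=sys.version_info, environ=os.environ):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        found = ".".join(str(part) for part in version_info[:2])
        problems.append(f"Python 3.8+ is required, found {found}")
    # Assumption: the key is read from the OPENAI_API_KEY environment variable.
    if not environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set (needed for most examples)")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("Missing:", issue)
    if not issues:
        print("Prerequisites look fine.")
```

Notebooks that do not call an LLM should still run without the key, so a missing-key message is a warning rather than a hard failure.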

Installation

Install Semantica from PyPI (recommended):

pip install semantica
# Or with all optional dependencies:
pip install "semantica[all]"

For more installation options, see the Installation Guide.
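
To confirm that the install landed in the environment your Jupyter kernel uses, you can query the package metadata from Python. This is a small sketch using only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is hypothetical, and the distribution name `semantica` is taken from the `pip install` command above.

```python
from importlib import metadata


def installed_version(distribution="semantica"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("semantica is not installed in this environment")
    else:
        print(f"semantica {version} is installed")
```

Running this in a notebook cell rather than a separate terminal checks the kernel's own environment, which is where a mismatch usually shows up.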


Hand-picked tutorials to show you the power of Semantica.

  • GraphRAG Complete --- Build a production-ready Graph Retrieval Augmented Generation system.

    New Features: Graph Validation, Logical Inference, Hybrid Context.

    Topics: RAG, LLMs, Vector Search, Graph Traversal

    Difficulty: Advanced

    Open Notebook

  • RAG vs. GraphRAG Comparison --- Side-by-side comparison of Standard RAG vs. GraphRAG using real-world data.

    New Features: Inference-Enhanced GraphRAG, Reasoning Gap Analysis.

    Topics: RAG, GraphRAG, Benchmarking, Visualization

    Difficulty: Intermediate

    Open Notebook

  • Your First Knowledge Graph --- Go from raw text to a queryable knowledge graph in 20 minutes.

    Topics: Extraction, Graph Construction, Visualization

    Difficulty: Beginner

    Open Notebook

  • Real-Time Anomaly Detection --- Detect anomalies in streaming data using dynamic graphs.

    Topics: Streaming, Security, Dynamic Graphs

    Difficulty: Advanced

    Open Notebook


🏁 Core Tutorials

Essential guides to master the Semantica framework.

  • Welcome to Semantica --- An interactive introduction to the framework's core philosophy and all of its modules, including ingestion, parsing, extraction, knowledge graphs, embeddings, and more.

    Topics: Framework Overview, Architecture, All Modules

    Difficulty: Beginner

    Open Notebook

  • Data Ingestion --- Techniques for loading data from multiple sources using FileIngestor, WebIngestor, FeedIngestor, StreamIngestor, RepoIngestor, EmailIngestor, DBIngestor, and MCPIngestor.

    Topics: File Ingestion, Web Scraping, Database Integration, Streams, Feeds, Repositories, Email, MCP

    Difficulty: Beginner

    Open Notebook

  • Document Parsing --- Extracting clean text from complex formats like PDF, DOCX, and HTML.

    Topics: OCR, PDF Parsing, Text Extraction

    Difficulty: Beginner

    Open Notebook

  • Data Normalization --- Pipelines for cleaning, normalizing, and preparing text.

    Topics: Text Cleaning, Unicode, Formatting

    Difficulty: Beginner

    Open Notebook

  • Entity Extraction --- Using NER to identify people, organizations, and custom entities.

    Topics: NER, spaCy, LLM Extraction

    Difficulty: Beginner

    Open Notebook

  • Relation Extraction --- Discovering and classifying relationships between entities.

    Topics: Relation Classification, Dependency Parsing

    Difficulty: Beginner

    Open Notebook

  • Embedding Generation --- Creating and managing vector embeddings for semantic search.

    Topics: Embeddings, OpenAI, HuggingFace

    Difficulty: Intermediate

    Open Notebook

  • Vector Store --- Setting up vector stores for similarity search and retrieval.

    Difficulty: Intermediate

    Open Notebook

  • Graph Store --- Persisting knowledge graphs in Neo4j or FalkorDB.

    Topics: Neo4j, Cypher, Persistence

    Difficulty: Intermediate

    Open Notebook

  • Ontology --- Defining domain schemas and ontologies to structure your data.

    Topics: OWL, RDF, Schema Design

    Difficulty: Intermediate

    Open Notebook


🧠 Advanced Concepts

Deep dive into advanced features, customization, and complex workflows.

  • Advanced Extraction --- Custom extractors, LLM-based extraction, and complex pattern matching.

    Topics: Custom Models, Regex, LLMs

    Difficulty: Advanced

    Open Notebook

  • Advanced Graph Analytics --- Centrality, community detection, and pathfinding algorithms.

    Topics: PageRank, Louvain, Shortest Path

    Difficulty: Advanced

    Open Notebook

  • Advanced Context Engineering --- Build a production-grade memory system for AI agents using persistent vector (FAISS) and graph (Neo4j) stores.

    Topics: Agent Memory, GraphRAG, Entity Injection, Lifecycle Management

    Difficulty: Advanced

    Open Notebook

  • Complete Visualization Suite --- Creating interactive, publication-ready visualizations of your graphs.

    Topics: PyVis, NetworkX, D3.js

    Difficulty: Intermediate

    Open Notebook

  • Conflict Resolution --- Strategies for handling contradictory information from multiple sources.

    Topics: Truth Discovery, Voting, Confidence

    Difficulty: Advanced

    Open Notebook

  • Multi-Format Export --- Exporting to RDF, OWL, JSON-LD, and NetworkX formats.

    Topics: Serialization, Interoperability

    Difficulty: Intermediate

    Open Notebook

  • Multi-Source Integration --- Merging data from disparate sources into a unified graph.

    Topics: Entity Resolution, Merging, Fusion

    Difficulty: Advanced

    Open Notebook

  • Pipeline Orchestration --- Building robust, automated data processing pipelines.

    Topics: Workflows, Automation, Error Handling

    Difficulty: Advanced

    Open Notebook

  • Reasoning and Inference --- Using logical reasoning to infer new knowledge from existing facts.

    Topics: Logic Rules, Inference Engines

    Difficulty: Advanced

    Open Notebook

  • Semantic Layer Construction --- Building a semantic layer over your data warehouse or lake.

    Topics: Semantic Layer, Data Warehouse

    Difficulty: Advanced

    Open Notebook

  • Temporal Knowledge Graphs --- Modeling and querying data that changes over time.

    Topics: Time Series, Temporal Logic

    Difficulty: Advanced

    Open Notebook


🏭 Industry Use Cases

Real-world examples and end-to-end applications across various industries.

Biomedical

  • Drug Discovery Pipeline --- Accelerating drug discovery by connecting genes, proteins, and drugs using PubMed RSS feeds, entity-aware chunking, GraphRAG, and vector similarity search.

    Topics: Bioinformatics, KG Construction, GraphRAG, Vector Search

    Difficulty: Advanced

    Open Notebook

  • Genomic Variant Analysis --- Analyzing genomic variants and their implications for disease using bioRxiv RSS feeds, temporal knowledge graphs, deduplication, and pathway analysis.

    Topics: Genomics, Variant Calling, Temporal KGs, Graph Analytics

    Difficulty: Advanced

    Open Notebook

Finance

  • Financial Data Integration MCP --- Merging financial data from the Alpha Vantage API, MCP servers, RSS feeds, and market feeds with seed data integration.

    Topics: Finance, Data Fusion, MCP Integration, Real-Time Ingestion

    Difficulty: Intermediate

    Open Notebook

  • Fraud Detection --- Identifying fraudulent activities and patterns in transaction networks using temporal knowledge graphs, conflict detection, and pattern recognition.

    Topics: Anomaly Detection, Graph Mining, Temporal Analysis, Pattern Detection

    Difficulty: Advanced

    Open Notebook

Blockchain

  • DeFi Protocol Intelligence --- Analyzing decentralized finance protocols and transaction flows using CoinDesk RSS feeds, ontology-aware chunking, conflict detection, and ontology generation.

    Topics: Blockchain, DeFi, Smart Contracts, Ontology, Conflict Resolution

    Difficulty: Advanced

    Open Notebook

  • Transaction Network Analysis --- Mapping and analyzing blockchain transaction networks using blockchain APIs, deduplication, and network pattern detection.

    Topics: Blockchain Analytics, Network Analysis, Deduplication, Pattern Detection

    Difficulty: Advanced

    Open Notebook

Cybersecurity

  • Real-Time Anomaly Detection --- Detecting anomalies in real-time network traffic streams using CVE RSS feeds, Kafka streams, temporal knowledge graphs, and sentence chunking.

    Topics: Network Security, Streaming, Temporal KGs, Pattern Detection

    Difficulty: Advanced

    Open Notebook

  • Threat Intelligence Hybrid RAG --- Combining enhanced GraphRAG with threat intelligence for security insights using security RSS feeds, entity-aware chunking, deduplication, and temporal knowledge graphs.

    Topics: Threat Intelligence, GraphRAG, Security, Hybrid Retrieval

    Difficulty: Advanced

    Open Notebook

Intelligence

  • Criminal Network Analysis --- Analyzing criminal networks with graph analytics and key player detection using OSINT RSS feeds, deduplication, and network centrality analysis.

    Topics: Forensics, Social Network Analysis, Deduplication, Graph Analytics

    Difficulty: Advanced

    Open Notebook

  • Intelligence Analysis Orchestrator Worker --- Comprehensive intelligence analysis using a pipeline orchestrator with multiple RSS feeds, conflict detection, and multi-source integration.

    Topics: Intelligence Analysis, Pipeline Orchestration, Multi-Source Integration, Conflict Resolution

    Difficulty: Advanced

    Open Notebook

Renewable Energy

  • Energy Market Analysis --- Analyzing trends and pricing in the renewable energy market using energy RSS feeds, EIA API, temporal knowledge graphs, TemporalPatternDetector, and seed data integration.

    Topics: Energy, Time Series, Temporal Analysis, Trend Prediction

    Difficulty: Intermediate

    Open Notebook

Supply Chain

  • Supply Chain Data Integration --- Integrating supply chain data to optimize logistics and reduce risk using logistics RSS feeds, deduplication, and multi-source relationship mapping.

    Topics: Logistics, Risk Management, Data Integration, Deduplication

    Difficulty: Advanced

    Open Notebook


🛠️ How to Run

To run these notebooks locally:

  1. Install Semantica from PyPI (recommended):

    pip install "semantica[all]"
    pip install jupyter
    

  2. Or install from source (for development):

    git clone https://github.com/Hawksight-AI/semantica.git
    cd semantica
    pip install -e ".[all]"
    pip install jupyter
    

  3. Launch Jupyter:

    jupyter notebook
    

Using Docker

You can also run the cookbook using Docker:

docker run -p 8888:8888 hawksight/semantica-cookbook