February 15, 2024
Integrating Neo4j with AI: Building Knowledge Graphs (That Actually Help)
I like Neo4j because it lets me say the quiet part out loud: most data is relationships. When a vector search keeps handing me “similar stuff” but misses the why behind it, I reach for a graph. This post is the playbook I use when I want AI systems to reason with structure, not just proximity.
When graphs beat vectors
- You care about explanations: “why is this relevant?” -> path queries make it visible
- You need typed relationships: teaches, requires, authored_by, cites, etc.
- You want multi-hop reasoning: “find topics I should learn next based on what I know now”
- You need governance: explicit nodes/edges are easier to audit than fuzzy embeddings
I still use embeddings—often both: vectors for recall, graph for reasoning.
Minimal setup I reuse
from neo4j import GraphDatabase

NEO4J_URI = "bolt://localhost:7687"
NEO4J_USER = "neo4j"
NEO4J_PASS = "password"  # use env vars/secrets in real code

driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASS))

# simple write helper
def run_write(tx, query, **params):
    tx.run(query, **params)

# simple read helper
def run_read(tx, query, **params):
    return [r.data() for r in tx.run(query, **params)]

with driver.session() as session:
    # upsert a tiny learning graph
    session.execute_write(
        run_write,
        """
        MERGE (p:Person {name: $name})
        MERGE (t1:Topic {name: $knows})
        MERGE (t2:Topic {name: $next})
        MERGE (p)-[:KNOWS]->(t1)
        MERGE (t2)-[:REQUIRES]->(t1)
        """,
        name="Amit", knows="RAG", next="LangGraph",
    )

    # query: what should Amit learn next?
    results = session.execute_read(
        run_read,
        """
        MATCH (p:Person {name: $name})-[:KNOWS]->(t:Topic)
        MATCH (n:Topic)-[:REQUIRES]->(t)
        RETURN DISTINCT n.name AS next_topic
        """,
        name="Amit",
    )
    print(results)
That’s the whole idea: explicit relationships, simple queries, actionable outputs.
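One thing I add almost immediately: uniqueness constraints, so those MERGE upserts stay truly idempotent and hit an index instead of scanning. A minimal sketch, assuming Neo4j 5.x constraint syntax (older 4.x releases used ON ... ASSERT instead of FOR ... REQUIRE) and reusing the driver from above; the constraint names are arbitrary:

# one-time schema setup; each statement runs in its own auto-commit transaction
with driver.session() as session:
    session.run(
        "CREATE CONSTRAINT person_name IF NOT EXISTS "
        "FOR (p:Person) REQUIRE p.name IS UNIQUE"
    ).consume()
    session.run(
        "CREATE CONSTRAINT topic_name IF NOT EXISTS "
        "FOR (t:Topic) REQUIRE t.name IS UNIQUE"
    ).consume()

With those in place, MERGE (t:Topic {name: ...}) is both a fast lookup and a guarantee that you never end up with duplicate topics.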
Patterns I rely on
- Upserts everywhere: MERGE keeps ingestion idempotent
- Keep schema boring: a few node labels, a few relation types, strong conventions
- Attach embeddings to nodes for hybrid search (vector recall -> graph reasoning; sketch below)
- Observe provenance: store source, chunk_id, and url for every content node
- Write CQRS-style helpers: small read/write functions for clean app code
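To make the embedding and provenance bullets concrete, here is the shape of the ingestion helper I'd write, reusing run_write from the setup above. The Chunk label, upsert_chunk, and embed() are hypothetical names, and the embedding is stored as a plain list property:

# hypothetical ingestion helper: every content node carries provenance
# (source, chunk_id, url) plus its embedding, keyed by chunk_id so re-runs
# update in place instead of duplicating nodes
def upsert_chunk(session, chunk_id, text, source, url, embedding):
    session.execute_write(
        run_write,
        """
        MERGE (c:Chunk {chunk_id: $chunk_id})
        SET c.text = $text,
            c.source = $source,
            c.url = $url,
            c.embedding = $embedding
        """,
        chunk_id=chunk_id, text=text, source=source, url=url, embedding=embedding,
    )

# usage sketch: embed() stands in for whatever embedding model you call
# upsert_chunk(session, "post-42#3", chunk_text, "blog", "https://example.com/post-42", embed(chunk_text))

Recent Neo4j 5.x releases can also index that embedding property natively for vector search; on older versions I keep the vectors in a separate store and join on chunk_id.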
Useful Cypher queries
- Paths that explain a recommendation
MATCH p = (a:Article {id: $id})-[:CITES|REFERS_TO*1..3]->(b:Article)
RETURN b.id AS related, relationships(p) AS why
- Learning path from current skills
MATCH (p:Person {name: $name})-[:KNOWS]->(t:Topic)
MATCH path = (t)<-[:REQUIRES*1..2]-(n:Topic)
RETURN n.name AS next, path
- Hybrid: vector recall -> graph filter (pseudo; a fuller Python sketch follows below)
# 1) Vector search gets candidates
candidates = vector_store.similarity_search(query, k=25)
# 2) Pass candidate IDs into Cypher
MATCH (a:Article)
WHERE a.id IN $candidate_ids AND a.domain = $domain
RETURN a ORDER BY a.rank LIMIT 5
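Glued together, the hybrid pattern looks roughly like this. It's a sketch, not a recipe: vector_store and the metadata["id"] field are stand-ins for whatever retriever you use (the similarity_search call mirrors LangChain-style stores), and run_read is the helper from the setup section:

def hybrid_search(session, vector_store, query, domain, k=25):
    # 1) vector recall: fuzzy, high-recall candidate set
    candidates = vector_store.similarity_search(query, k=k)
    candidate_ids = [doc.metadata["id"] for doc in candidates]

    # 2) graph filter/rank: explicit properties decide what survives
    return session.execute_read(
        run_read,
        """
        MATCH (a:Article)
        WHERE a.id IN $candidate_ids AND a.domain = $domain
        RETURN a.id AS id, a.title AS title
        ORDER BY a.rank
        LIMIT 5
        """,
        candidate_ids=candidate_ids, domain=domain,
    )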
Where this fits in an AI system
- RAG: vectors fetch, graphs justify and chain related facts (sketch below)
- Agents: tools become nodes; decisions become edges; traces map to paths
- Analytics/BI: queries read naturally because relationships are explicit
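For the RAG bullet, the step people skip is "justify": once vectors have fetched an article, ask the graph why it's connected to the rest of your corpus and put that in the prompt. A hedged sketch, reusing run_read and the citation-path query from the previous section (explain_relevance is my name, not a library call):

# turn citation paths into plain-text "why" strings for a RAG prompt
def explain_relevance(session, article_id):
    rows = session.execute_read(
        run_read,
        """
        MATCH p = (a:Article {id: $id})-[:CITES|REFERS_TO*1..3]->(b:Article)
        RETURN b.id AS related,
               [r IN relationships(p) | type(r)] AS hops
        """,
        id=article_id,
    )
    return [
        f"{article_id} -[{' -> '.join(row['hops'])}]-> {row['related']}"
        for row in rows
    ]

Each string reads like "a1 -[CITES -> REFERS_TO]-> a7", which is exactly the kind of breadcrumb an LLM can cite back to the user.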
Final thoughts
Neo4j earns its keep when you need structure, explanations, and multi‑hop logic. Use it alongside embeddings, not instead of them. Keep the graph small but intentional, and your AI will sound smarter—not because it memorized more tokens, but because it finally understands the connections.