2 min read

Graph Traversal with Recursive SQL

I didn’t know this was possible but apparently one can do recursive traversal of graphs with SQL.

Consider the following graph:

import networkx as nx
import matplotlib.pyplot as plt
edges = [
    ('A', 'C'), 
    ('B', 'C'), 
    ('C', 'A'), 
    ('C', 'D'),
    ('B', 'D'), 
    ('B', 'E'), 
    ('E', 'D'), 
    ('D', 'F'), 
    ('A', 'G')
]
G = nx.DiGraph(directed=True)
G.add_edges_from(edges)
fig = plt.figure(figsize=(10, 10))
ax = fig.subplots()
nx.draw_networkx(
	G, 
	pos=nx.spring_layout(G, seed=62), 
	with_labels=True, 
	arrows=True, 
	ax=ax, 
	**{
    	'arrowstyle': '-|>',
    	'arrowsize': 12,
    	'node_size': 1500,
    	'connectionstyle': 'arc3,rad=0.1'
	}
)
plt.show()

Representing the above graph as a list of edges:

import pandas as pd
edges_frame = pd.DataFrame.from_records(edges, columns=['src', 'dst'])
print(edges_frame)
##   src dst
## 0   A   C
## 1   B   C
## 2   C   A
## 3   C   D
## 4   B   D
## 5   B   E
## 6   E   D
## 7   D   F
## 8   A   G

We can perform a query to recursively retrieve all nodes reachable from A:

import duckdb
con = duckdb.connect(database=':memory:')
con.register('graph', edges_frame)
con.execute('''
WITH RECURSIVE reached AS (
    SELECT dst AS node FROM graph WHERE src = 'A'
    UNION
    SELECT graph.dst AS node FROM graph, reached WHERE graph.src = reached.node
)
SELECT * FROM reached
''').fetchall()
## [('C',), ('G',), ('A',), ('D',), ('F',)]

The general form of a recursive query looks like the following:

WITH RECURSIVE accumulated AS (
	[SELECT THE INITIAL ELEMENTS]
	UNION / UNION ALL
	[SELECT UPCOMING ELEMENTS TO BE ADDED TO accumulated BASED ON CURRENT ELEMENTS IN accumulated]
)
SELECT * FROM accumulated;

See SQLite’s documentation on the WITH statement for more examples of SQL Wizardry involving recursive queries.