I didn’t know this was possible but apparently one can do recursive traversal of graphs with SQL.
Consider the following graph:
import networkx as nx
import matplotlib.pyplot as plt
edges = [
('A', 'C'),
('B', 'C'),
('C', 'A'),
('C', 'D'),
('B', 'D'),
('B', 'E'),
('E', 'D'),
('D', 'F'),
('A', 'G')
]
G = nx.DiGraph(directed=True)
G.add_edges_from(edges)
fig = plt.figure(figsize=(10, 10))
ax = fig.subplots()
nx.draw_networkx(
G,
pos=nx.spring_layout(G, seed=62),
with_labels=True,
arrows=True,
ax=ax,
**{
'arrowstyle': '-|>',
'arrowsize': 12,
'node_size': 1500,
'connectionstyle': 'arc3,rad=0.1'
}
)
plt.show()
Representing the above graph as a list of edges:
import pandas as pd
edges_frame = pd.DataFrame.from_records(edges, columns=['src', 'dst'])
print(edges_frame)
## src dst
## 0 A C
## 1 B C
## 2 C A
## 3 C D
## 4 B D
## 5 B E
## 6 E D
## 7 D F
## 8 A G
We can perform a query to recursively retrieve all nodes reachable from A
:
import duckdb
con = duckdb.connect(database=':memory:')
con.register('graph', edges_frame)
con.execute('''
WITH RECURSIVE reached AS (
SELECT dst AS node FROM graph WHERE src = 'A'
UNION
SELECT graph.dst AS node FROM graph, reached WHERE graph.src = reached.node
)
SELECT * FROM reached
''').fetchall()
## [('C',), ('G',), ('A',), ('D',), ('F',)]
The general form of a recursive query looks like the following:
WITH RECURSIVE accumulated AS (
[SELECT THE INITIAL ELEMENTS]
UNION / UNION ALL
[SELECT UPCOMING ELEMENTS TO BE ADDED TO accumulated BASED ON CURRENT ELEMENTS IN accumulated]
)
SELECT * FROM accumulated;
See SQLite’s documentation on the WITH
statement for more examples of SQL Wizardry involving recursive queries.