Graph Theory: Introduction

Before heading into details of how we store, represent, and traverse various kinds of graphs, this post is more of a ramp-up to better understand what graphs are and the different kinds from a computer science point of view, rather than a mathematical one. So, no proofs and equations, mostly just diagrams and implementation details, with an emphasis on how to apply graph theory to real-world applications.

Graph theory is the mathematical theory of the properties and applications of graphs/networks, which is just a collection of objects that are all interconnected.

Graph theory is a broad enough topic to say it can be applied to almost any problem—first (maybe not first, make it 21st) thing in the morning, choosing what to wear - given all of the wardrobe, how many sets of clothes can I make by choosing one from each category (by category, I mean tops, bottoms, shoes, hats, and glasses)? While this sounds like a math problem to find permutations, using graphs to visualize each clothing item as a node and edges to represent relationships between them can be helpful.

Another everyday example is the social network. A graph representation answers questions such as how many mutual friends or how many degrees of separation exist between two people.

1. Types of Graphs

There are a lot of types of graphs, and it's important to understand the kind of graph you are dealing with. Let's go over the most commonly known graph variants.

1.1. Undirected Graph

The most simple kind of graph, where the edges have no orientation (bi-directional). i.e., edge (u, v) is identical to edge (v, u).

Example: A city interconnected by bi-directional roads. You can drive from one city to another and can retrace the same path back.

1.2. Directed Graph/Digraph

In contrast to an undirected graph, directed graphs or digraphs have edges that are directed/have orientation. Edge (u, v) represents that you can only go from node u to node v and not the other way around. As shown in the figure below, the edges are directed, indicated by the arrowheads on the edges between nodes.

Example: This graph could represent people who bought each other gifts. C and D got gifts for each other, E didn't get any nor give any, B got one from A, gave a gift to D, and sent a gift to itself.

1.3. Weighted Graphs

So far, we have seen unweighted graphs, but edges on graphs can contain weights to represent arbitrary values such as distance, cost, quantity, etc.

Weighted graphs can again be directed or undirected. An edge of a weighted graph can be denoted with (u, v, w), where w is the weight.

2. Special Graphs

While directed, undirected and weighted graphs covers the basic types, there are many other types of graphs governed by rules and restrictions.

2.1. Trees

A tree is simply a collection of nodes connected by directed (or undirected) edges with no cycles or loops (no node can be its own ancestor). A tree has N nodes and N-1 edges.

All of the above are indeed trees, even the left-most graph, which has no cycles and N-1 edges.

2.2. Rooted Trees

A related but totally different kind of graph is a rooted tree. It has a designated root node, where every edge either points away from or towards the root node. When edges point away from the root, it's called an out-tree (arborescence) and an in-tree (anti-arborescence) otherwise.

Out-trees are more commonly used than in-trees, so much so that out-trees are often referred to as just "trees."

2.3. Directed Acyclic Graphs (DAGs)

DAGs are directed acyclic graphs, i.e., with directed edges and no cycles or loops. DAGs play an important role and are very common in computer science, including dependency management, workflows, schedulers, and many more.

When dealing with DAGs, commonly used algorithms include finding the shortest path and topological sort (how to process nodes in a graph in the correct order considering dependencies).

Fun Fact: All out-trees are DAGs, but not all DAGs are out-trees.

DAG nodes can have multiple parents, meaning there can be multiple paths that eventually merge. Out-trees are DAGs with the restriction that a child can only have one parent. Another way to see it is that a tree is like single-class inheritance, and a DAG is like multiple-class inheritance.

2.4. Bipartite Graph

A bipartite graph is one whose vertices can be split into two independent groups, U and V, such that every edge connects between U and V. A bipartite graph is two-colorable, in other words, it is a graph in which every edge connects a vertex of one set (Example, set 1: red color) to a vertex of the other set (Example, set 2: blue color).

A common question is to find the maximum matching that can be created on a bipartite graph (covered in a follow-up post). For example, say red nodes are jobs and blue nodes are people. The problem is to determine how many people can be matched to jobs.

2.5. Complete Graph

In a complete graph, there is a unique edge between every pair of nodes, i.e., every node is connected to every other node except itself. A complete graph with n vertices is denoted by the graph K_n.

A complete graph is often seen as the worst-case possible graph and is used for performance testing.

3. Graph Representation

The next important aspect is the data structure we use to represent a graph, which can have a huge impact on performance. The simplest and most common way is using an adjacency matrix.

3.1. Adjacency Matrix

An adjacency matrix m represents a graph, where m[i][j] is the edge weight of going from node i to node j. Unless specified, it's often assumed that the edge of going from a node to itself has zero cost. Which is why the diagonal of the matrix has all zeroes.

For example, the weight of the edge going from node D to node B is 5, as represented in the matrix.

Pros:

Space efficient for representing dense graphs.
Edge weight lookup is constant time: O(1).
Simplest graph representation.

Cons:

Requires O(V²) space, where V is the number of nodes/vertices.
Iterating over all edges requires O(V²) time.

The quadratic space complexity becomes less feasible when dealing with networks with nodes in the order of thousands or more.

3.2. Adjacency List

The other alternative to the adjacency matrix is the adjacency list. This is a way to represent the graph as a map from nodes to lists of outgoing edges. In other words, each node tracks all its outgoing edges. i.e., N₁ = [(N_x, W), (N_y, W), ...]

For example, Node C has 3 outgoing edges, so the map entry for Node C has those 3 entries, each represented by the combination of the destination node and edge weight/cost.

Pros:

Space efficient for representing sparse graphs (no extra space for unused edges).
Iterating over all edges is efficient.

Cons:

Less space efficient for dense graphs.
Edge weight lookup is O(E), where E is the number of edges of a node.
Slightly more complex graph representation.

Adjacency lists are still very commonly used, since edge weight lookup is not a common use case and many real-world use cases involve sparse graphs.

3.3. Edge List

The edge list takes an overly simplified approach to represent a graph simply as an unordered list of edges with the source node, destination node, and the weight. For example, (u, v, w) represents the cost from node u to node v as w.

Pros:

Space efficient for representing sparse graphs.
Iterating over all edges is efficient.
Overly simple structure/representation.

Cons:

Less space efficient for dense graphs.
Edge weight lookup is O(E), where E is the number of edges.

Despite the seeming simplicity and lack of structure, edge lists do come in handy for a variety of problems and algorithms.

4. Graph Problems

One of the best approaches to dealing with graph problems is to better understand and familiarize yourself with common graph theory algorithms. Many other problems can be reduced to a known graph problem.

Does the graph already exist, or is it to be derived/constructed?
Is the graph directed or undirected?
Is it a weighted graph (edges)?
Is it a sparse graph or a dense graph?
Based on all of the above, should I use an adjacency matrix, adjacency list, edge list, or other structures?

4.1. Shortest Path Problem

Given a weighted graph, find the shortest path of edges from Node A to Node B (source and destination nodes).

Algorithms: Breadth First Search (unweighted graph), Dijkstra's, Bellman-Ford, Floyd-Warshall, A*, and many more.

In the example, to find the shortest path from Node A to Node H, the sum of all the weights/costs of the path taken should be the least.

4.2. Connectivity

Along the same lines, to determine if connectivity exists from Node A to Node B. In other words, given the nodes, do they exist in the same network/graph? This is quite commonly used in communication networks such as WiFi, Thread, Zigbee, etc.

Algorithms: Any search algorithm such as BFS (Breadth First Search) or DFS (Depth First Search).

4.3. Negative Cycles

To detect negative cycles in a directed graph. Also known as a negative-weight cycle, it is a cycle in a graph whose edges sum to a negative value.

In the example, nodes B, C, and D form a negative cycle, where the sum of costs is -1, which can lead to cycling endlessly with a smaller cost for every iteration. For instance, finding the shortest path without detecting negative cycles would be a trap, never escaping out of it.

Detecting negative cycles has other applications, such as currency arbitrage. In this context, assign currencies to different vertices, and let the edge weight represent the exchange rate.

Algorithms to detect negative cycles: Bellman-Ford and Floyd-Warshall.

4.4. Strongly Connected Components

SSCs are self-contained cycles within a directed graph, i.e., every vertex/node in a cycle can reach every other vertex in the same cycle.

If each strongly connected component is contracted to a single vertex, the resulting graph is a directed acyclic graph (DAG), the condensation of Graph G.

Algorithms: Tarjan's SSC and Kosaraju's algorithm.

4.5. Traveling Salesman Problem

or the travelling salesperson problem (TSP) asks "Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?" It is an NP-hard problem.

For the above graph, the TSP (Traveling Salesman Problem) solution has a cost of 9 to travel from Node A to all the other nodes and back to Node A.

Algorithms: Held-Karp, Brand and Bound, Approximation (Ex: Ant Colony) algorithms

4.6. Bridges

A bridge, cut-edge, or cut-arc is an edge of a graph whose deletion increases the graph's number of connected components (islands or clusters).

Detecting bridges is important as they often signify bottlenecks, weak points, or vulnerabilities in a graph. For instance, it's common to ensure that a mesh network is a bridgeless graph.

4.7. Articulation Points

An articulation point, or cut vertex, is similar to a bridge, but instead of edges, they are nodes. When removed, they increase the number of connected components.

In the same graph as for bridges, the nodes connected by the bridges are articulation points.

4.8. Minimum Spanning Tree (MST)

A minimum spanning tree (MST) or minimum weight spanning tree is a subset of the edges of a connected, edge-weighted undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight/cost.

A graph can have multiple minimum spanning trees with the same cost, but the resulting trees (MSTs) are not unique. Common use cases include designing a least-cost network, transportation networks, and more.

Algorithms: Kruskal's, Prim's and Boruvka's algorithm.

4.7. Flow Network

Flow network or the transportation network is a directed graph where the edge weight represents "capacity." The amount of flow on an edge cannot exceed the capacity of the edge. Capacity can represent fluids in a pipe, currents in an electrical circuit, cars on a road, etc.

Problem: For an infinite input to reach the sink, what's the max flow? With this, it's easier to see bottlenecks in the network that slow the flow. Correlating to the example, max flow would be the number of cars, volume of fluid, etc.

Also, there cannot be blockages in the network/flow, the amount of flow into a node equals the amount of flow out of it.

5. Conclusion

With the basics of graph theory covered, including various types of graphs and their representations, we've laid the groundwork for understanding how to efficiently store, represent, and traverse graphs in real-world applications. The next set of posts on Graph Theory will be a deep dive into specific problems and algorithms.

6. References

[1] W. Fiset, "Algorithms repository," GitHub, 2017. [Online]. Available: https://github.com/williamfiset/Algorithms.
[2] V. Schwartz, "Currency Arbitrage and Graphs (2)," Reasonable Deviations, Apr. 21, 2019. [Online]. Available: https://reasonabledeviations.com/2019/04/21/currency-arbitrage-graphs-2/. 
[3] Wikipedia, "Graph theory," Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Graph_theory.
[4] Wikipedia, "Bidirected graph," Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Bidirected_graph.
[5] Wikipedia, "Directed graph," Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Directed_graph.
[6] Wikipedia, "Tree (graph theory)," Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Tree_(graph_theory).
[7] Wikipedia, "Directed acyclic graph," Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Directed_acyclic_graph.
[8] Wikipedia, "Bipartite graph," Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Bipartite_graph.
[9] Wikipedia, "Complete graph," Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Complete_graph.
[10] Wikipedia, "Adjacency matrix," Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Adjacency_matrix.
[11] Wikipedia, "Adjacency list," Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Adjacency_list.
[12] Wikipedia, "Breadth-first search," Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Breadth-first_search.
[13] Wikipedia, "Depth-first search," Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Depth-first_search.
[14] Wikipedia, "Bellman–Ford algorithm," Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Bellman%E2%80%93Ford_algorithm.
[15] Wikipedia, "Floyd–Warshall algorithm," Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm.
[16] Wikipedia, "Strongly connected component," Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Strongly_connected_component.
[17] Wikipedia, "Travelling salesman problem," Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Travelling_salesman_problem.
[18] Wikipedia, "Held–Karp algorithm," Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Held%E2%80%93Karp_algorithm.
[19] Wikipedia, "Bridge (graph theory)," Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Bridge_(graph_theory).
[20] Wikipedia, "Minimum spanning tree," Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Minimum_spanning_tree.
[21] Wikipedia, "Flow network," Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Flow_network.

1. Types of Graphs

1.1. Undirected Graph

1.2. Directed Graph/Digraph

1.3. Weighted Graphs

2. Special Graphs

2.1. Trees

2.2. Rooted Trees

2.3. Directed Acyclic Graphs (DAGs)

2.4. Bipartite Graph

2.5. Complete Graph

3. Graph Representation

3.1. Adjacency Matrix

3.2. Adjacency List

3.3. Edge List

4. Graph Problems

4.1. Shortest Path Problem

4.2. Connectivity

4.3. Negative Cycles

4.4. Strongly Connected Components

4.5. Traveling Salesman Problem

4.6. Bridges

4.7. Articulation Points

4.8. Minimum Spanning Tree (MST)

4.7. Flow Network

5. Conclusion

6. References

#index table of contents