Adventures in Modularity

I’ve been more engaged with community detection in networks lately.  Community detection, what’s listed as the Modularity option in Gephi’s statistics toolbox and referred to as such in the literature, relies on a variety of methods to try to identify self-similar regions within a network.  I’ve only used Gephi to look at Modularity, which relies on the Louvain or Blondel method described here (which provides several useful references to other types of community detection).   These algorithms, as described by the authors, work by, “decomposing the networks into sub-units or communities, which are sets of highly inter-connected nodes” such that you have sub-graphs in your network that more highly resemble themselves than they do the rest of the network.  Modularity can be measured by comparing the density of links within a module to the density of links between modules, and while that measurement indicates the strength of such communities from a network perspective, it does not obviate interpretation, as demonstrated below.

It’s a seductive tool, because it quickly and easily draws regions in a network, and regions of a network are a form of compression of a very large dataset into a few more easily annotated or theorized units than the thousands or millions of nodes and edges one has to begin with.  You can also take these sub-graphs and turn them into meta-nodes, visually representing that compression, or deal with them separately (very useful for networks that are so densely connected as to defy easy interpretation of structure).  I’ve never quite been comfortable with modularity, despite how straightforward it looks in code, and I’m particularly piqued now that it has proved useful and confounding in two separate network datasets that I’m examining.  I’ll focus on the confounding one below.

To properly model the effect of changing wind conditions on sea transport in the Roman Empire, I developed a network cost surface by creating a grid of .1 degree by .1 degree centroids across the study area.  That is, to put it mildly, quite a few points.

40,000 points at .1 degree spacingThe hazy gray area is what the resultant 40,000+ points look like zoomed out.  Each of these points has a directed connection to the eight points around it, with an ordinal or cardinal distance and the actual length of the connection (for those of you that have forgotten your geography class or, like Stanford, misplaced your geography department, remember that degrees vary in length depending on where they are on the globe).  The result, up close, looks like this:

Directed cost network gridAs is immediately obvious, this allows for simple directional costs to be applied that can be modified based on the time of the query (to account for changing winds by month) to give you a least cost path from one part of the Mediterranean/Black Sea/Atlantic to another.

Directed path through a networkIn some cases, such as the above, such a least cost path involves overshooting the target and coming back around.  In the real thing, you see variation such as that found below, which track paths to and from various points, indicating in some cases the same path to and from a point, in other cases minor to significant variation.

Draft paths for MarchKeep in mind this is just a draft and the above paths don’t realize you can sail between Sicily and the Italian peninsula, among other issues.  I’ll get into building and using this mesh in a later post, though.  What I want to point out is that this enormous yet simple grid exhibits extremely high (~.95) modularity based solely on the boundary regions created by the coasts.  When looking at those communities (and not taking into account differences in cost based on wind speed) we can see the sea divided into clear, tantalizing regions.

Modules in the MediterraneanWhether these network modules correspond to actual regions is another question entirely (one that I am not yet in a position to answer).  Dropping out the edges between these regions shows a board game-like set of sea zones of varying sizes and connection:

European sea zones

Italy ZonesSicily ZonesAs alluded to above, we can collapse these communities into meta-nodes and look at the connections between zones to see another tantalizing compression of the sea into 44 interconnected areas (far more manageable than even 40,000 points in a grid, much less the actual seas themselves).

Connections between sea zonesNow, before I start naming them and digging up historical and historiographical evidence of the legitimacy of this algorithmically generated view of the sea, if I run that modularity test again, I will get another set of different modules.  There’s always some overlap, and it seems that there are typical fractures and cleavages among the modules that may indicate this is not completely random.  As I said above, I’m confounded by it all.  Perhaps these network communities are real cohesive areas that correspond with historical and cultural regions.  Or perhaps it’s an artifact of a method not designed for a giant grid.  I built this network for the purpose of running directed, dynamic cost paths, and for that purpose it works splendidly, and only ran a check for modularity because it’s easy (centrality and betweenness and closeness show no such significant patterns, except perhaps giving a simple method for defining coastal zones).  If anyone reading this happens to have a ready explanation for either such reason (or another possible explanation) please feel free to let me know.

And if anyone wants to update their World in Flames map using network analysis, it’s worth considering.

This entry was posted in Graph Data Model, Spatial Humanities, Visualization. Bookmark the permalink.

Comments are closed.