So long and thanks for all the digital humanities

This site will no longer be updated after this post. There will be no new blog posts, no explorations of TVTropes or topic modeling, and no occasional interviews with indie videogame developers or webcomic authors.

My first post here was back in August 2010, and was a short review of some work out of the Spatial History Project (which has since become part of the Center for Spatial and Textual Analysis). Since then, I’ve had 241 blog posts (and nearly as many comments–that’s a joke) on subjects ranging from Gephi to Gephi to Gephi, with occasional forays into reviewing digital scholarship and exploring new media. I wrote not so much to be read as to maintain a sort of CV through blogging. The material produced in this kind of work is not amenable to traditional academic acknowledgment, so my blog became my record of gray output. In fact, I was convinced no one was reading this blog, except for some good-natured Stanford types like Matt Jockers and Glen Worthey, until DH11, when I realized I should probably turn on analytics.

Digital Humanities Specialist Blog Stats

It was exciting to find out that I had 3000 visits a month, and later to see that grow to 5000-6000. Not exactly big numbers for the Internet, but pretty good for a WordPress site where the closest thing to a cat picture was a grainy shot of Richard Stallman.

But this was always meant to be temporary, until there was a more established and centralized place for many people to post, and we finally have that here at Stanford. So I will continue to write posts, but from now on I will publish them on digitalhumanities.stanford.edu, where it’s my hope the DH community at Stanford will coalesce and grow.

I’ll sign off with a link to my favorite piece on this site: Tecmo Bo Jackson as subject for the 2011 Digital Humanities Conference poster, and a reminder from BrokeP: “We think the internet is real.”

Posted in Natural Law | Comments Off

Is Digital Humanities Too Text-Heavy?

Last week was the marvelous international conference for digital humanities, held this year at the beautiful University of Nebraska-Lincoln. Over the course of four days, I tried desperately to meet people I only knew from tiny Twitter pictures or GitHub or even citations, and in between attempted to catch as many presentations as I could. The work on display, both in the presentations and the posters, ranged from information visualization used to examine poetry to the use of network analysis to study rhetoric, with projects deploying facial recognition, fuzzy GIS, topic modeling, and various other techniques and methods seen in digital humanities scholarship.

But what I took away from DH13 was something else entirely, a feeling that crystallized when I listened to Willard McCarty give his acceptance speech for the Roberto Busa Award, which is given to “recognise outstanding lifetime achievements in the application of information and communications technologies to humanistic research”. It was named for Father Busa, whose work with IBM on the Index Thomisticus is held out as one of the pioneering works in humanities computing, and later digital humanities.

But that transition in name wasn’t simply corporate rebranding. As Willard noted in his speech, the shift from calling the endeavor “humanities computing” to referring to it as “digital humanities” also came with a dramatic increase in popularity. It wasn’t the name that brought in all the new faces; rather, the change in name signaled a shift from a practice involving a few scholars focused on analyzing literature to a messy “big tent” that roughly holds digital libraries, historical GIS, information visualization, network analysis, new media, and post-colonial digital theory.

Even without the sudden inclusion of so many different scholarly agendas and methods, the increase in popularity would not be so simple. The drastically increased output of humanities scholars using computational methods brings with it new modes of practice, and the significantly increased accessibility of the tools used to enact those methods brings with it the practical and cultural effects seen in open source software and commons-based peer production. Ten years ago, someone “doing humanities computing” would have required much more in the way of technical resources and fallen into a much smaller convex hull of possible activities than someone “doing digital humanities” in 2013.

But a quick look at the abstracts shows how much the analysis of English literature dominates a conference attended by archaeologists, area studies professors, librarians, network scientists, historians, and more. It seemed, at one point, that there was a four-day author attribution/stylometrics track, while all the geospatial work had to be presented in a single, standing-room-only session. To be clear, that’s an exaggeration, and I haven’t done a serious analysis of the abstracts to support it, but I know through conversation and through attending some exciting but extremely low-attendance sessions on the far side of the conference that I’m not the only one who felt it.

What makes this a difficult thing to measure and consider is that there’s text analysis and then there’s text analysis. Everyone does text analysis now, whether they’re looking at Korean kinship networks or Jane Austen, but there’s a difference between the well-established humanities computing approach that relies almost exclusively upon it and the more synthetic one that sees text analysis as one component of several.

I put a question mark in the title because I’m not sure, and this is based more on feeling than it is on empirical evidence. And even if it is the case that English literary analysis gets overrepresented because of its long history with humanities computing, I don’t think that means it should be rooted out by “good digital humanists” reporting on “known text wranglers”. But an international conference for a vibrant and diverse community of practice should be as reflective of that community as possible, and if that means we lose a couple authorship attribution sessions in favor of a few information visualization for post-colonial geographic network analysis sessions, then I’m okay with that.[1]

[1] I’d be okay with it even if those new sessions didn’t involve information visualization.

 

 

 

Posted in Algorithmic Literacy, Big Data, Natural Law, New Literature, Spatial Humanities, Text Analysis | 3 Comments

How Collaboration Works and How It Can Fail

I’ve been working on research-oriented digital humanities projects ever since Ruth Mostern decided to pursue a database version of Hope Wright’s An Alphabetical List of Geographical Names in Sung China in 2007. The goals have varied: sometimes the purpose was to explore data and corpora, and other times the intention from the very beginning was to produce an interactive publication. But regardless of the end result, my experience of collaboration between someone who was technically more savvy (myself) and someone who was deeply embedded in their discipline (generally a tenure-track faculty member, sometimes a grad student or librarian) has generated a few lessons on how collaboration in digital humanities projects can succeed, and how it may not.

There’s been much written about collaboration and the growth of collaborative projects involving humanities scholars, including the excellent Collaborators’ Bill of Rights as well as rumination on what dangers collaboration may pose, such as my own article in JDH 1.1. My writing here is focused on a more specific kind of collaboration than the general phenomenon covered by these. My work typically involves direct collaboration with an individual scholar over a long but fixed period, focused on a single project with some kind of defined end product. This is in contrast to the large-scale, structured lab support provided by entities like the Scholars’ Lab at UVa, as well as the more generalized support provided by a typical post-doc attached to a project, lab, or scholar. I can’t say if my experience is typical or not, though I’ve had the opportunity to see a few different support situations here at Stanford and at various universities, and it would seem that what follows has at least broad applicability.

Staff is an absurd category; alt-ac is a step in the right direction

Almost exactly two years ago, I offered up a tripartite view of academic production that presented librarians, faculty, and undergraduates as occupying different areas of a data (or content) ecosystem, with digital humanities representing a blurring of the boundaries between them. As much as digital humanities is defined by information visualization or computational approaches, it is also defined by faculty building collections, librarians doing research, and students wandering free between content consumption, content creation, and content management. Of these three groups, it is the definition of a modern librarian, and by extension a university staff member in any unit supporting research, that is the most difficult to figure.*

The main reason for this is that there’s a professionalization of staff positions that does not exist in the other two groups. Anyone who has worked with undergraduate and graduate research assistants knows that their effort and engagement is not demanded but negotiated. Such is obviously the case with faculty working with other faculty. Staff, on the other hand, with their various layers of management and leadership, are service providers embedded in a more formal hierarchy. This distinction can be the source of tension in situations where faculty, students, and staff are working together to advance digital humanities scholarship.

The development of the concept of alt-ac as a kind of staff that engages more visibly in research has helped to advance our understanding of what staff can do in digital humanities, but I find many of the arguments wrapped up in concepts of fairness, labor equity, and social justice. All of these are important to me, but even a cursory understanding of how the academy as a whole has completely failed adjunct faculty will remind us that such lines of argumentation will fail to resonate.

Collaboration is more than good, it’s necessary

Fortunately, there are other reasons to foster healthy collaboration where faculty and alt-ac staff are on equal footing. The first and most obvious is that someone with expertise in the tools and methods being deployed in digital humanities research will by necessity take part in establishing and furthering the research agenda. In my experience, the most effective digital humanities work is done when a scholar has an innovative, sophisticated agenda that can be furthered by application of computational methods, digital publication, and/or the host of tools now available for analysis and representation. But that line of inquiry has, in every project that I’ve been part of and every project that I’ve been exposed to, mutated and grown to include more sophisticated and more methodologically rich concepts provided by the nominal technical experts. There may come a time when research and publication of the kind being done in digital humanities is so well-established, or the techniques and methods so generally understood, that this is not the case, but until that time comes, you want a peer collaborator, not an employee.

The second reason for peer collaboration is that a peer will be honest about what can and can’t be done. Imposter syndrome is on full display in alt-ac, and most everyone I know in a position like mine whom I’ve had a chance to read or work with is highly critical of her own capabilities when it comes to programming, statistics, and formal types of analysis. But many of us have been doing this kind of thing for quite some time, in a variety of ways and quite rigorously. I was creating data models to represent complex definitions of place back when George W. Bush was president. Despite the common protests of various digital humanities facilitators (for lack of a better term), they typically have the experience and expertise, or the awareness of channels to discover it, to be able to provide a scholar with a good indication of whether a line of research will be successful or not. And, critically, they won’t assume that the explanation would be too difficult for that scholar to understand.

The third reason is more cold-blooded, but long proven here in Silicon Valley: if you can’t pay a person what the position merits, you need to entice them with ownership. I work on digital humanities projects the way someone works at a start-up, including long hours and work during periods not nominally designated for it. While I was on vacation for a week last year, I finally had enough time without meetings to nail down the route-finding code for ORBIS. I didn’t do this because I was afraid of losing my job, quite the opposite. I did it because my name was on ORBIS, both literally and figuratively, given the visibility of my work on the project. Being invested in the projects I work on provides the same incentive that stock options do. If I can successfully complete a project that is high-profile and sophisticated and lauded as an advancement in the field, then the professional benefit more than makes up for the extra time I spent to make it work.

The kind of investment and effort necessary to develop the innovative research and publication practices for these projects cannot be afforded by your typical scholar with a typical grant. The only way you can defray that cost is to provide some level of ownership of the creative process. That means something more than just a person’s name prominently displayed on a website, or co-authorship of a paper, or casual platitudes during a presentation about the cast-of-thousands necessary to build a thing. It means partnership and peer interaction.

Digital humanities scholarship continues to be a process of discovery rather than an established practice. I’ve written about Interactive Scholarly Works, and I consider this to be an emerging genre, but it has not yet emerged. There is no established blueprint for an ISW that some research can be molded into. Neither is there a “right” way to engage with humanities topics using the many other digital approaches available. To get there, to provide those first examples of what will one day, presumably, become patterns that can be stamped out in professional, assembly-line fashion, we in the library need to provide support for ambitious humanities scholars who want to pursue innovative research, and we also need to be clear that the only way it will work is with real, healthy collaboration.

 

* Maybe I just think that’s the case because I work in the library, though when I introduce myself as a librarian I get laughed at by faculty and “real” librarians (in a good-natured way, mind you).

 

Posted in Natural Law | Comments Off

Software Development

Diminishing elegance in coding

I’m coming to the end of a long digital humanities project that involves much coding at the database level and in JavaScript for the user-facing frontend. It uses D3.js heavily, does a few things that I think are innovative and exciting, and leverages a large, esoteric database that features geospatial, network, temporal, and textual data. And I’ll be quite proud of it if it’s ever finished.

Except when it comes to the actual code.

Which is a mess.

We have only so much time to put into these projects, and they require investment across such a broad surface that it seems only natural that the better and more reusable and readable the code is, the more the content and user experience will suffer–and vice versa. Looking over the code, I can see the inefficient, primitive way I made JavaScript do what I wanted, until I found out how to properly leverage array functions or d3.insert() syntax. There are places where I refactored the code to take advantage of these things–sometimes for speed and sometimes because of the deep, neurotic understanding of what open source really means: that someone will actually look at this.
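To make that concrete, here is a small, invented example of the kind of cleanup I mean–the same filtering task written first the primitive way, with a manual loop and accumulator, and then with array functions:

```javascript
// Invented data and function names, for illustration only.
var routes = [
  {from: "Roma", to: "Ostia", cost: 1},
  {from: "Roma", to: "Capua", cost: 3},
  {from: "Capua", to: "Neapolis", cost: 1}
];

// Before: manual loop with an accumulator variable.
function cheapRoutesOld(maxCost) {
  var results = [];
  for (var i = 0; i < routes.length; i++) {
    if (routes[i].cost <= maxCost) {
      results.push(routes[i].from + " -> " + routes[i].to);
    }
  }
  return results;
}

// After: the same logic expressed with filter() and map().
function cheapRoutes(maxCost) {
  return routes
    .filter(function (r) { return r.cost <= maxCost; })
    .map(function (r) { return r.from + " -> " + r.to; });
}

console.log(cheapRoutes(1)); // ["Roma -> Ostia", "Capua -> Neapolis"]
```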

Posted in Algorithmic Literacy, D3 | Comments Off

Mesoanalysis

With Matt Jockers’ new book out, and the reviews already coming in, I’m starting to find the macroanalysis/microanalysis framework a little lacking. It’s not that I don’t think it a good approach; it takes many forms in digital humanities scholarship. There are numerous examples of distant reading paired with close reading, as per the classic Moretti definition in literature, or the HGIS flavor seen when Ruth Mostern contrasted state-level change with local administrative practices in Dividing the Realm in Order to Govern, or my own attempt to contrast strategic cartograms with an Oregon Trail-inspired “situated perspective” within the network, as I did with ORBIS|via. But while you can arbitrarily define macro and micro based on the research questions of a DH project, I think there needs to be a formal mesoanalytical layer defined somewhere between them.

It’s not that I think an arbitrary middle layer is needed, or that we need an exhaustive formal hierarchy like that found in ecology. I think that the meso layer is a necessary complement to the existing approaches. I think of distant reading or macroanalysis as focused on patterns in the data, and close reading or microanalysis as focused on sophisticated interpretation of case studies and their context. Data visualization and analysis patterns are well-established for dealing with these two cases, regardless of whether your macro scale is 3000 novels or terabytes of Twitter data. But, as I was touching on in the Visualization Across Disciplines forum at HASTAC, there’s a fertile space for the representation and analysis of functions and processes.

Partly, I’m in favor of developing mesoanalysis because I find I’m more capable of engaging meaningfully in the representation and analysis of these functions than in the micro and macro patterns of the objects being studied. This is natural for most of alt-ac, I suspect, where we’re asked to contribute to research and technical development for projects that are outside our fields of study. But I think it also grows out of digital humanities scholars’ constant unease with computational methods. At first I thought it was enough to know that something worked well for certain sets of data; then I figured I needed to know how it was working so that I could explain the function to the scholar I was working with; now I realize that there is never a best process, only a contingent, chosen process for showing or analyzing data. As such, it’s incumbent upon me to represent the variations in that process and expose it as a formal construct so that scholars can modify and critique it.
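As a minimal sketch of what that can look like in practice (with invented field names and weights), here is a document-scoring function whose weighting choices are parameters a collaborator can inspect, perturb, and critique, rather than choices buried in the code:

```javascript
// The fields and default weights here are made up for illustration.
var defaultWeights = {title: 3, abstract: 2, body: 1};

function scoreDocument(doc, term, weights) {
  weights = weights || defaultWeights;
  // Count occurrences of the term in a piece of text.
  function count(text) {
    return text ? text.toLowerCase().split(term.toLowerCase()).length - 1 : 0;
  }
  return weights.title * count(doc.title) +
         weights.abstract * count(doc.abstract) +
         weights.body * count(doc.body);
}

var doc = {
  title: "Trade networks in the Roman world",
  abstract: "Networks of trade and travel...",
  body: "The road network connected ports and cities..."
};

// The same document, the same term, two different contingent processes.
console.log(scoreDocument(doc, "network"));                              // default weights
console.log(scoreDocument(doc, "network", {title: 1, abstract: 1, body: 5}));
```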

This goes back to building models, since models are aggregates of processes paired with data and informed by close reading of the data. And it’s driven by the increasing availability of interactive data visualization that lets one perturb such functions to see how they may or may not inflect the results. Xueqiao Xu’s pathfinding piece makes me think of my own Network Analysis toy and how useful it would be to have something like that to explain topic modeling, community detection, and tasseled cap transformations in GIS.

Anyone who codes, or plays video games, or has a particularly awesome refrigerator knows that functions come in many sizes. They can be rather complex, like the pathfinding functions Xueqiao Xu so deftly explores, or interesting more for their formal nature than their complexity, such as the choice of weights for full-text searches. For one thing, the sheer variety of functional approaches to data is overwhelming. There’s a hundred-page paper on different community detection algorithms for network analysis that I like to use to scare people who think the Modularity button in Gephi is a standard. Ben Schmidt’s recent call for more than one pattern detection method in computational approaches to text is in reaction to the assumption that topic modeling is the only approach to corpora. I don’t think this is a result of the naïveté of the DH community. These functions are usually so wedded to the data and the choice of representation that, when they are explained at all, they’re presented as simply the “best” or “most optimal” choice. But as I settle down to write about how we computed a Tragedy Index for an upcoming project, or how different centrality measures reveal not better or worse but different patterns within a network, I realize that our approach to functions needs to be a bit less pragmatic and a bit more post-modern.
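As a toy illustration of that last point, here is a small invented network in which the node with the highest degree is not the node with the highest closeness–neither measure is wrong; they answer different questions:

```javascript
// A made-up network: two tight clusters (A-D and F-I) joined by a bridge node E.
var graph = {
  A: ["B", "C", "D", "E"],
  B: ["A", "C", "D"],
  C: ["A", "B", "D"],
  D: ["A", "B", "C"],
  E: ["A", "F"],
  F: ["E", "G", "H", "I"],
  G: ["F", "H", "I"],
  H: ["F", "G", "I"],
  I: ["F", "G", "H"]
};

// Degree centrality: how many neighbors a node has.
function degree(node) {
  return graph[node].length;
}

// Closeness centrality: (n - 1) divided by the sum of shortest-path distances
// to every other node, computed here with a simple breadth-first search.
function closeness(node) {
  var dist = {};
  dist[node] = 0;
  var queue = [node];
  while (queue.length) {
    var current = queue.shift();
    graph[current].forEach(function (neighbor) {
      if (!(neighbor in dist)) {
        dist[neighbor] = dist[current] + 1;
        queue.push(neighbor);
      }
    });
  }
  var total = Object.keys(dist).reduce(function (sum, n) { return sum + dist[n]; }, 0);
  return (Object.keys(graph).length - 1) / total;
}

Object.keys(graph).forEach(function (node) {
  console.log(node, "degree:", degree(node), "closeness:", closeness(node).toFixed(2));
});
// A and F win on degree (4 each), but the bridge E, with the lowest degree (2),
// has the highest closeness (~0.57). Neither ranking is "better".
```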

This tripartite formalization of digital humanities research also highlights the need for more engagement with the adoption, in scholarly research, of tools developed for pragmatic industry goals. We’re using Google’s patented formula to determine the importance of literary works, information retrieval aids for the study of poetry, and GIS techniques created for planning highways to understand environmental refugees. There’s more and more publication of code, and detailed explanation of that code, such as Jockers’ explanation of how he optimizes data for topic modeling, but I’d like to see more functions expressed visually and interactively, giving readers the same capacity to manipulate them that they have to work with published data and analyze the narrative. This kind of research cannot be adequately reviewed if we limit ourselves to the macro and micro scales in our reading, because doing so forces the reviewer to choose between accepting and dismissing the functional component rather than tweaking it or engaging with it in a more rigorous critique.

The original version of this post spelled it “mezoanalysis”, as you can see in the URL.

Posted in Algorithmic Literacy, Big Data | Comments Off

The Digital Humanities as a Big Data Conference

The IEEE International Conference on Big Data in July will feature a workshop on Big Data in digital humanities scholarship–which its organizers refer to as Big Humanities.

It’s hard to tell what big data means these days. Is 30,000 British people enough? How about 50,000 species in a biodiversity database? Or 2600 diaries from an archaeological dig? What about ORBIS? There are only 2000 or so official route segments, but given all the permutations, there are trillions of possible “routes”. The simple fact is that big data in the big data world is very big (Twitter is up to 400 million tweets a day at last count) and that means you need to deal with the sum total of human writing or all the anime ever produced to even hope to claim that kind of big data status.
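To see why the numbers explode, here is a back-of-the-envelope sketch–a toy grid of segments, not the actual ORBIS network–that counts the simple paths between two corners:

```javascript
// Build an n x n grid of nodes; edges ("segments") connect horizontal and
// vertical neighbors. Node keys are "row,col" strings.
function gridGraph(n) {
  var edges = {};
  function key(r, c) { return r + "," + c; }
  for (var r = 0; r < n; r++) {
    for (var c = 0; c < n; c++) {
      edges[key(r, c)] = [];
      if (r > 0) edges[key(r, c)].push(key(r - 1, c));
      if (r < n - 1) edges[key(r, c)].push(key(r + 1, c));
      if (c > 0) edges[key(r, c)].push(key(r, c - 1));
      if (c < n - 1) edges[key(r, c)].push(key(r, c + 1));
    }
  }
  return edges;
}

// Count simple (non-repeating) paths from start to end by backtracking search.
function countSimplePaths(graph, start, end, visited) {
  visited = visited || {};
  if (start === end) return 1;
  visited[start] = true;
  var total = 0;
  graph[start].forEach(function (next) {
    if (!visited[next]) total += countSimplePaths(graph, next, end, visited);
  });
  delete visited[start];
  return total;
}

// A 5x5 grid has only 40 segments but 8512 simple corner-to-corner paths;
// a 6x6 grid (60 segments) has over a million.
var g = gridGraph(5);
console.log(countSimplePaths(g, "0,0", "4,4"));
```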

But I’m starting to think “big data” is just another way of saying “distant reading” or “macroanalysis” or “global perspective”. If that’s the case, then the you-must-be-this-tall bar for big data isn’t really much of a restriction at all, because these techniques and methods can be deployed on quite small data. The real identifying trait is that all of these represent a computational approach to newly available data that are infeasible to analyze item by item and that require some processing to transform them from relatively unstructured data into aggregated, categorized, and/or quantified data.

Posted in Big Data, The Digital Humanities as... | Comments Off

The Digital Humanities as a Movement Expressed in a Method Enshrined In a Tool

Today marks the release of the Journal of Digital Humanities 2.1, focused on topic modeling, with Scott Weingart and me as guest editors. It is an excellent collection of material about topic models and topic modeling and their application to humanities research. It even includes a section for community responses, to be collected later, topic modeled, and added to the issue in a month’s time.

Before I get into what I think is most interesting about a publication like this, I want to thank Matt Jockers, who is noticeably absent from this issue but, with the release of Macroanalysis, will have more than enough publication this month to make up for it. I have a feeling Matt’s book will become required reading for a host of new DH scholars. Matt introduced me to topic modeling nearly three years ago, explained how it worked, taught me how to pronounce “Dirichlet”, and most importantly made clear that you couldn’t simply accept the results of the model without understanding how it worked and how it could fail. I also need to thank Glen Worthey, our digital humanities librarian here at Stanford, for helping me realize what I was trying to get at in the intro to the issue by explaining to me the concept of a “synecdoche”, which is embedded in the claim Scott and I make about topic modeling and its relationship to this endeavor we clumsily refer to as digital humanities.

What I find so interesting about topic modeling is that it is an example of both the promise and the problems of digital humanities research. While I’ve used the technique to support digital humanities research by modeling free-text content in databases, traditional corpora, and Wikipedia entries, I’ve always been more intrigued by the model itself. My piece on Comprehending the Digital Humanities, written over two years ago, was not about trying to use text analysis to understand digital humanities scholars and their self-definition so much as it was about demonstrating, explicitly and visually, that taking part in the use of such a computational method is something that defines the digital humanities.

It’s the troubled relationship with computational methods such as topic modeling that makes digital humanities so valuable. As is already stated in this issue of JDH–and demonstrated throughout–it is precisely the digital humanities scholars using computational methods who are most critical of those methods. Their critiques are far more substantial, and subtle, than those put forth by traditional humanities scholars who don’t understand how these tools and methods really work. But that wariness drives more sophisticated and more interesting application of computational methods in humanities scholarship, as I think is also demonstrated in this issue.

I want to thank everyone at JDH, especially Joan Troyano, who made this such a rewarding experience.

Posted in Text Analysis, The Digital Humanities as... | Comments Off

The Cutting Room Floor

The recent release of City Nature leaves behind several static, dynamic, and interactive pieces that, for one reason or another, were not integrated into the final site. One of the reasons I created this blog was to showcase the work I was doing which, often, would not otherwise make it into the world.

Many of the models and data visualizations that get produced are not specifically designed for publication, but rather serve as objects of consideration and discussion during meetings and workshops. Others are built for presentations. Finally, even some of those designed to be final, whether as a figure accompanying an article or as a standalone work in a site like ORBIS or City Nature, end up being discarded. Whether due to performance issues, lack of content, or more generic inaccessibility, various digital objects created during the course of a digital humanities project may not make it into the final publication.

One object that didn’t make it into City Nature was a representation of global water resource issues that displayed the 6000+ cities with a population over 50,000, rated by water security according to several different measures. From these, a dozen cities that typified the issues facing their region were selected and described in a short narrative. All of this was delivered on a D3-based globe or a traditionally projected map:

Global Water Challenge App for City Nature

But the performance was poor except in Chrome and, more importantly, the content and narrative never came together, which meant you had a cool interactive data visualization that wasn’t driven by a robust research agenda. The cutting room floor is littered with these, which can sometimes be dusted off, tuned up (this was built before Bostock changed everything again with topojson) and used for new research.

Global Water Challenge App for City Nature (Zoomed)
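For reference, a minimal sketch of the globe-plus-points pattern the piece used, written against the current promise-based D3 API and TopoJSON (the original predated both), with an invented cities array and assuming a world-110m.json file and an <svg> element of at least 600x600 on the page:

```javascript
// Orthographic globe with cities drawn as circles sized by a made-up score.
var projection = d3.geoOrthographic().scale(280).translate([300, 300]);
var path = d3.geoPath().projection(projection);
var svg = d3.select("svg");

var cities = [
  {name: "Lima", coords: [-77.03, -12.04], waterRisk: 0.9},
  {name: "Sao Paulo", coords: [-46.63, -23.55], waterRisk: 0.7}
];

d3.json("world-110m.json").then(function (world) {
  // topojson.feature converts the compact topology into GeoJSON for rendering.
  svg.append("path")
    .datum(topojson.feature(world, world.objects.countries))
    .attr("d", path)
    .attr("fill", "#ddd");

  svg.selectAll("circle")
    .data(cities)
    .enter()
    .append("circle")
    .attr("cx", function (d) { return projection(d.coords)[0]; })
    .attr("cy", function (d) { return projection(d.coords)[1]; })
    .attr("r", function (d) { return 3 + d.waterRisk * 5; })
    .attr("fill", "steelblue");
});
```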

Oftentimes they’re simply discarded. But that doesn’t mean they were a waste. Coding skills improve as a result, of course, and the researchers working with these objects come away with a more sophisticated understanding of the data, the research, and the mechanisms to represent it. I think of this as the Salon Model of digital humanities, where working with digital objects, even with no plan to use them for high-level analysis or publication, results in the significantly improved digital literacy necessary to perform quality digital humanities scholarship.

Forgotten Billion Globe View

Another interactive data visualization piece was created to show a city composed not of the neighborhoods it is actually made of, but of the neighborhoods most similar to them, based on the characteristics we measured.

Voronoi Frankencity for City Nature

You can see the interactive version here:

http://bl.ocks.org/emeeks/5218414

It’s a great concept, I think, but it proved illegible to most users and, as with many of these objects, performed poorly when integrated into the larger site.
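The matching logic behind the concept is simple enough to sketch with made-up neighborhoods and characteristics (the real project used measured greenness and demographic variables):

```javascript
// Invented neighborhoods with normalized, made-up characteristics.
var neighborhoods = [
  {name: "Mission", greenness: 0.2, density: 0.9, income: 0.6},
  {name: "Noe Valley", greenness: 0.5, density: 0.5, income: 0.8},
  {name: "Hyde Park", greenness: 0.6, density: 0.4, income: 0.7},
  {name: "Logan Square", greenness: 0.3, density: 0.8, income: 0.5}
];

// Euclidean distance across the measured characteristics.
function distance(a, b) {
  return Math.sqrt(
    Math.pow(a.greenness - b.greenness, 2) +
    Math.pow(a.density - b.density, 2) +
    Math.pow(a.income - b.income, 2)
  );
}

// For each neighborhood, find the most similar other neighborhood.
function mostSimilar(target) {
  return neighborhoods
    .filter(function (n) { return n.name !== target.name; })
    .reduce(function (best, n) {
      return distance(target, n) < distance(target, best) ? n : best;
    });
}

neighborhoods.forEach(function (n) {
  console.log(n.name, "->", mostSimilar(n).name);
});
```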

There’s more on the floor. For instance, I’ve pointed before to a topic model browser I made in Protovis that allows the reader to compare writing about species in the IUCN Red List with the entries for those species on Wikipedia, highlighting the differences in language between the two species databases.

Topic Model Browser built in Protovis

In progress is a new version of this, using D3 and my much-improved skills, which will hopefully leverage the lessons folks have learned about representing the results of topic modeling.

It’s my hope that at some point we’ll have a better process for releasing and annotating these objects, and that there’s a larger solution to raise the visibility and understanding of this process for those currently on the outside. Until then, I’ll try to keep posting the code on GitHub and giving some explanation of it here.

Posted in D3, Digital Scholarly Work, Spatial Humanities, Visualization | Comments Off

City Nature

Screenshot from the Stanford City Nature Project site
Today we’re releasing City Nature, the result of work exploring natural environments in urban areas using topic modeling, GIS, and data visualization. The site has rich interactivity, including an amazing parallel coordinates plot that allows you to explore the greenness and demography of 2600 US neighborhoods. I created a few of the dataviz elements and helped with the research, but the majority of the digital humanities development work was by Karl Grossner. PIs on the project are Jon Christensen and Michael Kahan here at Stanford (though Jon has since moved to UCLA), and the project included a summer research component involving a dozen Stanford undergraduates who had the chance to work with remote sensing, GIS, topic modeling, and information visualization.

City Nature Parallel Coordinates Plot

Posted in Digital Humanities at Stanford, Spatial Humanities, Text Analysis, Visualization | Comments Off

Toward a Connected Humanities

A network turn -- Background graph from Christian Swinehart's "Pattern Recognition"

Zephyr Frank and Erik Steiner were kind enough to give me a chance to discuss networks in humanities scholarship for the Visualizing Evidence course here at Stanford. Here’s the talk.

Posted in Algorithmic Literacy, Graph Data Model, Visualization | Comments Off