October 12, 2004

Press Release

This is pretty exciting, I think. I’m one of two visualization leads on the project, which will involve collaboration between humanities faculty and humanities computing research institutes, a library school, computer scientists, and the NCSA. Most of all it’s a chance to work with some really great people.

The Andrew W. Mellon Foundation has granted nearly $600,000 over two years to a multi-institutional project directed by John Unsworth, Dean of the Graduate School of Library and Information Science at the University of Illinois, Urbana-Champaign. The project builds on the D2K (Data to Knowledge) software developed by Michael Welge’s Automated Learning Group at the National Center for Supercomputing Applications, and it will include partners in humanities research computing at the University of Georgia, the University of Maryland, and the University of Virginia. The project will produce software for discovering, visualizing, and exploring significant patterns across large collections of full-text humanities resources in existing digital libraries and collections at Tufts University, the University of Illinois, Indiana University, the University of Michigan, the University of North Carolina, the University of Virginia, and other institutions.

“In search-and-retrieval,” Unsworth says, “we pose specific queries and get back answers to those queries; by contrast, the goal of data-mining is to produce new knowledge by exposing unanticipated patterns. Over the last decade, many millions of dollars have been invested in creating digital library collections: the software tools we’ll produce in this project will make those collections significantly more useful for research and teaching.”

Stephen Ramsay, the University of Georgia’s representative on the project and a member of the UGA English Department, agrees: “literary criticism and data mining share an important common ground: both are concerned with the isolation of patterns in data. Students of literature are often trying to detect patterns of change in the language or structure of literary works. Sometimes, this search for pattern is ordered toward the demonstration of some interpretive insight, but this order is just as often reversed—we notice patterns in texts and those patterns inspire interpretive insight.”

Matthew Kirschenbaum, faculty member in the University of Maryland’s English department and Fellow at the Maryland Institute for Technology in the Humanities (MITH), says that “information visualization will be the essential scholarly genre of the 21st century. It is already commonplace in astronomy, biology, chemistry, economics, engineering, environmental sciences and geology, geography, meteorology, physics, and mathematics. The basic intellectual and imaginative leap for information visualization in the humanities will be the leap from documentary to algorithmic forms of evidence. At the same time, we must understand the ‘iconology’ of these visual displays, their roots in long-standing traditions of image-making, cognitive design, and knowledge representation.”

Martha Nell Smith, Director of MITH, observes that “the cross-institutional collaboration in this initiative will help ensure that we build tools that are widely usable, that are standards-based, and that will advance the production and preservation of digital scholarship in the humanities, in all its diversity.” Bernard Frischer, Director of the University of Virginia’s Institute for Advanced Technology in the Humanities (IATH), points out that “digital scholarship in the humanities requires extensive multimedia collections, and it seeks to explore and document the complex relationships among items in such collections. This, in turn, requires a close collaboration between humanists and computing specialists.” Tom Horton, of the University of Virginia’s Computer Science Department, will oversee a distributed software development process for this project. He notes that “developing successful software tools to work effectively in such complex situations is always a challenge, so we’ll follow principles of user-centered software design in order to create data mining and visualization tools that will give scholars what they need to be effective, efficient and creative as they work with digital library materials.”

The Mellon Foundation provided a $56,000 planning grant for this project in 2003.

Posted by mgk at October 12, 2004 12:23 PM
Comments

Great news, Matt. I'm curious as to how much cog sci is brought to bear on this project. It's quite clear that the search for "full patterns of text" is, in essence, the search for full patterns of thought or, at least, expressed thought. How will issues of materiality be dealt with here? Are minings going to be conducted across media, within media sets or without concern for media? Regardless, this has me quite excited. Congratulations.

Posted by: Marc at October 12, 2004 12:49 PM | Link to Comment

Marc,

What makes the project unique, I think, is the rich descriptive markup that characterizes humanities text collections. While there's been a lot done with plain text data mining in other domains, here we'll be able to take advantage of the markup denoting formal, material, historical, and other aspects of the texts. Stay tuned!

Posted by: MGK at October 12, 2004 01:16 PM | Link to Comment

Congratulations again Matt. Very exciting stuff.

Posted by: Jason at October 12, 2004 01:27 PM | Link to Comment

That's what I assumed was happening-- I was just interested, I guess, as to how these markups would be segregated. As you know quite well, this is a huge part of the process. I was simply wondering what the categorizations might be.

Posted by: Marc at October 12, 2004 01:53 PM | Link to Comment

Wow, what great news, Matt! Congratulations. These tools you'll be developing -- will they be applicable to primary sources in languages other than English? I'll join Marc and undoubtedly others in the excitement.

Posted by: vika at October 12, 2004 08:24 PM | Link to Comment

Thanks for the well-wishing, everyone. Vika, several of the repositories we'll be working with have substantial holdings in non-English texts. Marc, the kinds of decisions about materiality and categorization to which you refer will ultimately be made by the user, not the software . . .

Posted by: MGK at October 13, 2004 08:43 AM | Link to Comment

Again, I was only referring to the tagging that each text would receive. Tagging is a sorting, I think, that is quite different from that of front end user decisions.

Posted by: Marc at October 13, 2004 11:01 AM | Link to Comment

We're not going to be doing any tagging ourselves. We'll be working with the existing collections of the institutions listed in the announcement.

Posted by: MGK at October 13, 2004 11:10 AM | Link to Comment

Got it. Finally.

Posted by: Marc at October 13, 2004 11:15 AM | Link to Comment

Guess I should probably have read the description a bit more carefully, eh?

Posted by: Marc at October 13, 2004 11:23 AM | Link to Comment

Well, you can keep us honest ;-)

Posted by: MGK at October 13, 2004 08:42 PM | Link to Comment

This is *very* exciting stuff, Matt! Congratulations to you and all the participants.

Posted by: George at October 14, 2004 11:48 PM | Link to Comment

Thanks again, George, everyone. Nice to come home after a trip, clear away the comment spam, and find this under the accumulated crud.

Posted by: MGK at October 17, 2004 07:38 PM | Link to Comment
Due to the proliferation of comment spam, I've had to close comments on this entry. If you would like to leave a comment, please send email to me at mgk =at= umd =dot= edu. Thank you.