[Update (March 19): Google turned down the Berkman Center application, which included our projects :( ]

We've put in for a couple of Summer of Code projects, in conjunction with the Berkman Center:

1. Syllabus parser. Design, structure and populate an open repository of the information in college syllabi.

  • Assuming we get permission, figure out how to retrieve syllabi from Google. (If we don't get permission, we have a starter set of 500,000+ syllabi.)

  • Figure out how to parse the many free-form formats in which syllabi are found.

  • Design an appropriate and open data model for the information in syllabi.

  • Build a Web site that provides useful end-user and API access to the syllabus data.
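To make the parsing and data-model steps above a bit more concrete, here is a minimal sketch of what a first pass might look like. Everything in it is an assumption for illustration: the `Syllabus` fields, the `parse_syllabus` helper, and the two regular expressions are hypothetical, and real syllabi would need far more careful handling than a couple of patterns.

```python
import re
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class Syllabus:
    """Hypothetical record type; the field names are illustrative only."""
    course_code: Optional[str] = None
    title: Optional[str] = None
    isbns: List[str] = field(default_factory=list)


# Assumed patterns: a department prefix plus a number (e.g. "PHIL 101"),
# and an "ISBN" label followed by digits, hyphens, or spaces.
COURSE_CODE = re.compile(r"\b([A-Z]{2,4})\s?-?(\d{3,4}[A-Z]?)\b")
ISBN = re.compile(r"ISBN(?:-1[03])?:?\s*([0-9][0-9Xx\- ]{8,16}[0-9Xx])")


def parse_syllabus(text: str) -> Syllabus:
    """Pull a few fields out of free-form syllabus text (best effort)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    code = COURSE_CODE.search(text)  # first match wins; crude by design
    # Normalize ISBNs by dropping hyphens and spaces.
    isbns = [re.sub(r"[\s-]", "", raw).upper() for raw in ISBN.findall(text)]
    return Syllabus(
        course_code=f"{code.group(1)} {code.group(2)}" if code else None,
        title=lines[0] if lines else None,  # assume the first line is the title
        isbns=isbns,
    )
```

Calling `asdict(parse_syllabus(text))` gives a plain dictionary that could be serialized to JSON, which is one simple way an open API might expose the records.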

2. Scholarly semantic web builder. The aim is to crawl the Google Books corpus looking for useful relationships among scholarly works. Such relationships only begin with citations/footnotes. What other semantic cues can be unearthed to see how scholarly books relate?

  • Research the sorts of relations between books that would be of high value to scholars and researchers, in addition to footnotes.

  • Crawl the Google Books corpus to discover these relations [if Google grants permission].

  • Make these relations accessible in an open way, especially in conjunction with the ShelfLife app that provides community-based wayfaring through Harvard Library's holdings for scholars and researchers.

  • Create interesting and understandable analytics based on the discovered relationships.
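As a rough illustration of the last two steps, relations between works could be stored as typed edges and then summarized. This is only a sketch under assumptions: the `Relation` type, the relation labels such as "cites" and "quotes", and the `inbound_counts` helper are all hypothetical, and the book identifiers in any usage would be placeholders rather than real data.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class Relation(NamedTuple):
    """One directed link between two works (hypothetical structure)."""
    source: str  # identifier of the work that makes the reference
    target: str  # identifier of the work being referred to
    kind: str    # assumed label, e.g. "cites" or "quotes"


def inbound_counts(relations: Iterable[Relation],
                   kind: Optional[str] = None) -> Counter:
    """Count how often each work is the target of a relation.

    If `kind` is given, only relations of that kind are counted.
    """
    counts: Counter = Counter()
    for rel in relations:
        if kind is None or rel.kind == kind:
            counts[rel.target] += 1
    return counts
```

A simple analytic such as "most referenced works" then falls out of `inbound_counts(relations).most_common(10)`; richer measures would depend on which relation types turn out to matter to scholars.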

We're now waiting to see if the proposals get accepted.

The rest of the Berkman proposals—many fun ones—are here.