A week and a half ago, I was invited to speak at a Coalition for Networked Information (CNI) workshop in Washington, DC, entitled Authors, Identity Management and the Scholarly Communication System. I was there to present our work on the MPACT project at UNC-Chapel Hill and how it relates to the sticky world of name disambiguation in academia.
Attendees included representatives from Elsevier, ProQuest, OCLC, Library of Congress, Internet2, Shibboleth, Thomson Scientific (ISI), the JISC Names Project, Mellon Foundation, CrossRef, IFLA, ISO, ICPSR, MIT, NLM, NIH, NCBI, American Physical Society, Association of Research Libraries, and the Thomas Jefferson Foundation. I may have missed one or two in there…
MPACT is looking at how mentoring as a scholarly activity can be better measured and quantified. Our larger goal is to make the argument that mentoring is not being rewarded enough when faculty members are evaluated on their productivity. Usually, at research institutions, research, teaching, and service are quantified and evaluated through a variety of metrics. These metrics are part of the cultural and institutional infrastructure and have been built up over time to reflect what the university values in its faculty members. Arguably, this leaves out mentoring as a scholarly activity – and that’s a mistake.
I presented our work to date and pointed out to the room, mostly VPs, CTOs, and CEOs, that the engineering effort behind MPACT would have been much smaller if I hadn’t had to research and build a system to manage ‘people’ and could instead have focused only on the mentorship connections between them (advisorships and committeeships at the dissertation level). I offered MPACT as a willing beta tester for whatever interface the large companies and organizations in the room put together for public or limited querying.
One of the most interesting developments that I was privy to last week was a potential collaboration between OCLC and Elsevier.
OCLC currently has over 100 million bibliographic records with over 25 million of those having pages in WorldCat. This represents over 1,200 million library records across 20k libraries. And they’ve identified characters and authors at WorldCat Identities. These are largely book items and manuscripts as represented via MARC records. I also learned that OCLC is ingesting some Wikipedia content as source material to augment/supplement some of their records. I had no idea and hadn’t heard that anywhere else.
Elsevier currently has over 32 million identity records that they are 99%+ sure map to individual people. This represents the world of journal articles.
The interesting collaboration point would be to collide these bibliographic records and these identity records and see where the overlap occurs and how much automatic disambiguation could be realized. Being able to click on a single person and see their books and articles in one place would be a great leap forward – and that’s only the first-order benefit.
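To make "colliding" two sets of records a little more concrete, here is a minimal sketch in Python. It is purely illustrative and is not how OCLC or Elsevier actually match records; the function names, the "surname, first initial" matching key, and the sample names are all my own assumptions. It reduces each name to a crude key and reports which book-side names have candidate matches on the article side.

```python
import unicodedata
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Reduce a personal name to a crude key: 'surname, first-initial'.

    Hypothetical helper for illustration only; real systems use far
    richer evidence (dates, affiliations, co-authors, subjects).
    """
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower().replace(".", " ")

    if "," in text:
        # "Surname, Given Names" form, as in many catalog records.
        surname, _, given = text.partition(",")
    else:
        # "Given Names Surname" form, as on many article bylines.
        parts = text.split()
        if not parts:
            return ""
        surname, given = parts[-1], " ".join(parts[:-1])

    given_parts = given.split()
    initial = given_parts[0][0] if given_parts else ""
    return f"{surname.strip()}, {initial}"


def candidate_matches(book_authors, article_authors):
    """Map each book-side name to article-side names sharing its key."""
    index = defaultdict(list)
    for name in article_authors:
        index[normalize_name(name)].append(name)

    matches = {}
    for name in book_authors:
        key = normalize_name(name)
        if key in index:
            matches[name] = list(index[key])
    return matches


# Made-up sample data, not drawn from any real catalog.
books = ["Smith, Jane A.", "Müller, Hans", "Okafor, Ada"]
articles = ["J. Smith", "Hans Muller", "Pat Jones"]
print(candidate_matches(books, articles))
```

A key this coarse will happily merge two different J. Smiths, which is exactly the disambiguation problem; the overlap it finds is only a list of candidates to be confirmed with other evidence.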
Another project I had not been aware of is the Virtual International Authority File (VIAF) (also at viaf.org). This is a work in progress to virtually disambiguate and connect the bibliographic records from the major national authority files around the world.
VIAF is a joint project of the Library of Congress (LC), the Deutsche Nationalbibliothek (DNB), the Bibliothèque nationale de France (BnF), and OCLC. The project’s goal is to match and link the library authority files.
They’re doing personal names and geographic names – but not topics. It’s all going to be publicly available and dereferenceable. This is a very big deal.
Thanks to Cliff Lynch and Joan Lippincott for inviting me and bringing together the players in this area who had never before all been in the same room. It was a thrill to have some candid conversations with the people who can move these large datasets into position and continue to change the way we interact with so much information.