Sourcing the Best Practices of Digital Crowdsourcing & Transcription

At first glance, publicly crowdsourced projects, whether in the humanities or sciences, may not seem that novel in concept – especially when one considers the inherently collaborative & social nature of so many corners of the internet. But as initiatives removed from the hierarchy and controlled access of formal collections in the library, archive, & museum world, they do represent a radical departure from conventional practice and, to some of the more entrenched members of the LAM establishment, “best practices”. While the realities of 2020 may have thrown out many of these collections access best practices, crowdsourcing represents a potential public history platform (in one of the most literal invocations of the phrase) for institutions seeking to develop new paths for community engagement and collections use. In particular, my focus is on the opportunities afforded by volunteer-driven digital transcription projects, with the anticipated end result of a crowdsourced transcription site for the Judd Papers, which are currently housed by the Forbes Library in Northampton, Massachusetts.

With the goal of articulating a set of the most, well, practical of best practices, I examined the transcription interfaces of the Papers of the War Department, 1784-1800 (at the Roy Rosenzweig Center for History & New Media, George Mason University), the Library of Congress By the People initiative, and the Smithsonian Digital Volunteers Transcription Center, three established (and presumably well-executed) crowdsourcing projects.

My approach to each began with a rather simple question: How long will it take me, starting from the project home page, to create an account and begin transcribing a record? As it turns out, it took me less than three minutes to get started on all three occasions! That time would have been reduced to “under two minutes”, but the Smithsonian Transcription Center Browse Projects landing page is a bit more difficult to navigate than the others. Here are some of the best features I noticed, in roughly chronological order:

The LoC’s By the People has a shortcut that immediately takes you to a random page in need of either transcription or review, a neat feature that streamlines the process if volunteers are interested more in the process of transcription, rather than the subject matter (currently, there are twelve different “Campaigns” active on the site, with the majority of them focused on some aspect of women’s history). Clicking “jump right in” immediately brought me to “The Second Battle of Concord,” a record found in the National American Woman Suffrage Association Records.

Following the path at the top left brings up a Subject File for Edward H. James, the record’s author, which features an overview layout of the record’s contents, including a page-by-page status breakdown (“Transcribe”, “Review”, & “Complete”), a progress bar for the full record, and a handy little progress color chart. I do have to wonder, though: is this layout approach useful for records that have dozens or hundreds of pages?

One thing that is useful, regardless, is the ability for multiple contributors to collaborate on records through both transcription and review.

Edward H. James NAWSAR Subject File

Beyond the record pages, I found the ways that these services provide guidelines and resources for their transcription volunteers to be particularly intriguing. The Papers of the War Department has a 3-path approach to their Guides page, providing instructions for use of the collection, historical context and paleography resources, and detailed guidelines for completing the transcription process. I find this to be a comprehensive approach, if through a somewhat physically underwhelming interface. However, as an Early Americanist particularly focused on the “Late Eighteenth Century” who also has undertaken a good deal of transcription across various academic and museum collections projects, I do have to acknowledge that I am evaluating these resources from very much an experienced perspective. I do think, regardless, they will be a useful reference point as this semester’s project curates our own set of resources.

Papers of the War Department Guides Page

The Smithsonian takes a much more direct approach to their guidelines; upon opening a transcription record, you are immediately prompted to consult the project-specific (i.e. content-specific) tutorial, though you do have the option to just continue through to transcription.

The Freedmen’s Bureau Papers instructions page is organized, comprehensive, and includes examples of common transcription elements and links to outside resources, as well as the general instructions page for the Transcription Center as a whole.

The Frequently Asked Questions section of the page is also downloadable as a PDF, which I thought was a nice touch.

While these three projects certainly have their own strengths and drawbacks, I found them all to be quite user-friendly, with comprehensive resources and a peer-edited revision & review step, through which volunteers can check each other’s work and collaboratively produce a transcription that will then be reviewed by project staff before being marked “complete.” This feels like a built-in QA process, and also ensures that no single volunteer’s quirks (especially if they’re detrimental ones) will mark the project’s entire tenor.
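For anyone thinking about how that review chain might be modeled for our own site, here is a minimal Python sketch. It is only an illustration under my own assumptions, not how any of the three projects actually implement it: the `Page` class, the separate “Staff review” status, and the rule that volunteers cannot review their own work are all hypothetical names and choices of mine.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    TRANSCRIBE = "Transcribe"      # open for a volunteer to transcribe
    PEER_REVIEW = "Review"         # waiting on a second volunteer
    STAFF_REVIEW = "Staff review"  # hypothetical label for the staff check
    COMPLETE = "Complete"


@dataclass
class Page:
    text: str = ""
    status: Status = Status.TRANSCRIBE
    last_editor: Optional[str] = None

    def submit(self, volunteer: str, text: str) -> None:
        """A volunteer saves a transcription and sends it to peer review."""
        if self.status is not Status.TRANSCRIBE:
            raise ValueError("page is not open for transcription")
        self.text = text
        self.last_editor = volunteer
        self.status = Status.PEER_REVIEW

    def peer_review(self, volunteer: str, accept: bool) -> None:
        """A different volunteer accepts the text or reopens it for edits."""
        if self.status is not Status.PEER_REVIEW:
            raise ValueError("page is not awaiting peer review")
        if volunteer == self.last_editor:
            # Assumed rule: nobody signs off on their own transcription.
            raise ValueError("volunteers cannot review their own work")
        self.status = Status.STAFF_REVIEW if accept else Status.TRANSCRIBE

    def staff_approve(self) -> None:
        """Project staff sign off and mark the page complete."""
        if self.status is not Status.STAFF_REVIEW:
            raise ValueError("page has not passed peer review")
        self.status = Status.COMPLETE


def percent_complete(pages: list) -> float:
    """Share of a record's pages marked complete, as a percentage."""
    if not pages:
        return 0.0
    done = sum(1 for p in pages if p.status is Status.COMPLETE)
    return 100.0 * done / len(pages)
```

The point of the sketch is that a page can only move forward when someone other than its last editor looks at it, which is the “no single volunteer’s quirks” safeguard in miniature. The `percent_complete` helper is the kind of figure a per-record progress bar could be drawn from.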

The Papers of the War Department and My (Great-Great-Great-Great-Great-Great-)Grandpa

Most of my discussion of transcription best practices so far has not been focused on the Papers of the War Department, but that experience ended up being my most memorable and personal. This past winter, I finally started tackling a family archival project that had been put off since my great-grandmother’s death in 2011 – sorting through her numerous photos and papers, which had been hanging out in a sturdy box in my parents’ guest bedroom closet, which thankfully provided a very environmentally-controlled space in the interim.

The (half-emptied) box, circa January 2020

In an attempt to figure out who, exactly, was who in the hundreds of images she either took herself or accumulated over the years (many of which included captions in her neat cursive on the reverse, which my own advocacy for responsible collections metadata practices appreciated so, so much), I did what any reasonable research-happy historian would do – I turned to Ancestry.com. And thanks to the convenient facts that 1. my family has been, literally, in New England for as long as there have been white people in the region, and 2. if there’s a people who love a meticulously-maintained public record structure, it’s New Englanders, I’ve been able to piece together a document record for the Robinsons that goes back 10 generations, or roughly to the 1630s (while my preference has always been to discover more about the women in this family record, unfortunately traditional Western familial conventions mean that I’m stuck with the dudes, so to speak).

One of the more interesting ancestors I’ve been able to turn up has been Elijah Robinson (1750-1826), about whom I know:

  • Both Elijah and his wife Mary Dike (1748-1822) were born in Dudley, MA, but had moved to Killingly, CT (right on the state’s northeastern border with Massachusetts) sometime prior to 1775.
  • After “the shot heard ’round the world” was fired at Lexington, MA on April 19th, 1775, Elijah and the Killingly town militia were some of the first Connecticut minutemen to arrive in support of the Massachusetts rebels. They quickly organized under their neighbor, Major General Israel Putnam (who lived about 8 miles west in Brooklyn, CT) and Elijah spent 18 days “mustered out” during this time, according to the muster rolls in the Record of Service of Connecticut Men in the War of the Revolution.
  • I also have enlistment records for various periods and various ranks in 1777, 1778, and 1780, all based out of Connecticut, and what I thought was probably a red herring record from 1799 in Vermont.

Turns out it was my Elijah Robinson after all! Searching for him in the Papers of the War Department turned up two records, both related to a review of mustered recruits in Windsor, Vermont in January of 1799. Since neither of them had been transcribed yet, I figured I might as well just go ahead and do it! The whole thing took maybe fifteen minutes total (a chunk of which was me reading through supported HTML codes to double-check I wasn’t neglecting to format some of the text properly), and I really liked how smoothly the document viewer pane functioned; it has a crisp zoom & move feature that responds to a mouse scroll wheel, a “reset” button to restore the original display, and rotational arrows, which are immensely helpful for deciphering text written on a slant or perpendicular to the image’s default orientation.

One thing I will especially note from this experience is that the ability of volunteers to produce accurate transcriptions depends greatly on the quality of the original image they are working from. While I had hardly any issue with the Orders for Elijah Robinson, I did have a more difficult time deciphering the faint text visible in the Certification of payments documenting the disbursal of funds by the War Department Accountants Office. That record suffered from a grainy black & white scanned image (which itself may have been a photocopy), and also undoubtedly more natural fading of ink than the Orders, which have been digitized at a very high DPI (dots per inch) and in full greyscale, which (at least in this case) allows for much greater image clarity.

Ultimately, I think my biggest takeaway from my War Department experience would be that sense of engagement (and fulfillment?) that the transcription process gave me as a volunteer. I absolutely felt like I had made a contribution to the project, and it kept me intrigued enough that I’ll most likely go back and try my hand at transcribing some more records (eventually. probably not until Thanksgiving if I’m being realistic.)

I’m going to end this post with a stray observation I did just realize as I was writing the captions for this image gallery – Findagrave.com is ALSO a crowdsourced transcription project in its own way; albeit transcribing headstones and grave markers (and uploading images) that are “out in the wild” and without a focused project goal in mind, rather than a collection managed by a specific institution. Which, to me at least, makes it seem even more on the fringes than a conventional transcription project.

5 Thoughts on “Sourcing the Best Practices of Digital Crowdsourcing & Transcription”

  1. Thanks for taking the time to actually sign up for one of these so we can see the interface “in action.” But even more so, thanks for using this as an opportunity to share an amazing personal story. I am from Connecticut and greatly appreciated your research into their involvement at the first moments of the Revolution – I had no idea the CT militia responded in that way.

    Our old box of documents does not go back nearly that far, but we have many old photographs from my grandmother that date back to New York City during the depression. It has been a longstanding family goal to go through it. My grandfather came through Ellis Island on a boat. I will never forget my mom’s happiness when we found my great grandparents on the Ellis Island digital registry, but my Grandfather is only listed as “Baby Anagnostopoulos.” It speaks to how treacherous that journey was, that he would be referred to on the manifest in this way. We have nothing from before their immigration from Greece. They left the Greek Islands at the end of WWI with absolutely nothing.

    I am extremely familiar with findagrave.com, and I guess I never realized it was a crowdsourced effort but that makes perfect sense. My best friend in high school, his dad went through this phase where he spent HOURS going through the grave locations of famous people, pictures of the headstone etc. on there. It would be 8:30 at night and all of a sudden we’d hear from downstairs: “PETE, DAVE, I FOUND HENDRIX! YOU GUYS GOTTA COME DOWN HERE AND SEE THIS GRAVESTONE. WOW!” The different ways stuff like this gets used can be pretty amusing. I will be texting my buddy Pete tonight to needle him about findagrave.com.

    1. Oh my gosh, thank you for sharing your family’s own experience – that must have been amazing to make that discovery in the Ellis Island records! I know one of my great-great-grandmothers also passed through as a baby, but I haven’t attempted to search for those records yet. And I’m glad to hear that other people know about FindAGrave! I think their requesting system is such a great concept, especially as a means for accessibility for family members or researchers who can’t physically get to visit a grave, but want to know more about the inscription/location. It’s definitely not a “traditional” historical resource, but I really like the interactivity built into it.

  2. What a fantastic way to figure out the effectiveness of a site! I would have never thought of that, but the fact that you were able to get registered in under three minutes is definitely a testament to good formatting. If people have trouble registering or figuring out how to begin, that would be a huge setback for the crowdsource project as a whole. I am also so impressed that you were able to find and transcribe your relative! That story helps me understand the importance of crowdsourcing and feeling connected to your work. I really enjoyed reading your post this week!

  3. That’s interesting that you timed how long it took to make accounts and get started transcribing – I feel like that would be a good test for institutions to incorporate when they’re designing their websites. It made me think, are there other usability tests that could be done before launching a crowdsource project?

    And how cool that you were able to make this family connection while also transcribing some of the War Department Papers. I feel like soliciting volunteers from genealogical societies or from people who could have connections to the materials in some way would be a great strategy, as you clearly prove that a family connection can motivate people to try their hand at transcribing.

    I also hadn’t heard of findagrave.com so that’s a great resource to know about, thanks!

  4. The quality of content is such an important aspect to consider. I have also experienced frustrations with low quality images that complicate other issues like bad handwriting and misspellings when I am trying to track down a specific person.

    One thing that I had considered was enabling volunteers to make notes about the handwriting of specific people. The quality of an image can improve legibility, and a section for notes on specific quirks, like misshapen letters or a tendency to forget to cross a “t”, would allow transcribers to support future transcribers of a certain author. What are others’ thoughts on transcriber support and training in understanding styles of handwriting?
