Christina Harlow DataOps or Whatevs


Notes On Being A Metadata Supervisor

I’ve been meaning to write this post up for a while. It is still very much a work in progress, so please forgive the winding, rambling nature this post will take. I’m trying to pull together and process ideas and experiences that I can eventually use in my own improvement, or maybe as an essay or article proposal on ‘reskilling catalogers’ and how it is part of a larger re-imagining of library metadata work beyond just teaching catalogers to code or calling cataloging ‘metadata’. If you have feedback on this, please let me know: or @cm_harlow on Twitter. Thanks!

My Background and Goals as a Supervisor

First, a bit on me, my work background briefly, and my current job, as well as my idealism for metadata work.

My current position is both my first ‘librarian’ position (although, FYI, I think the term ‘entry-level librarian’ has serious flaws, and it is a really sore spot with me personally) and my first time as a supervisor in a library. I supervise the Cataloging Unit (5 f/t staff members), sometimes referred to by the catalogers themselves (but nobody else at present) as the ‘Cataloging & Metadata Unit’, in a medium-sized academic library. Before this, I was temporarily a ‘professional’ but non-librarian metadata munger, and before that, a support staff or paraprofessional in a large academic library in a variety of posts. Some of those posts involved supervising students, but not officially - I’d be there to assign/guide work, check hours, schedule, do all the on-the-ground stuff, but wasn’t the person who would sign the timesheets or do the hiring. Often, and more frequently in recent years, I was a bit of an unofficial liaison, tutor, whatever you want to call it, for some of the librarians looking to expand technical practices and/or skills, but very much unofficially. A lot of this kind of work came to me because I love exploring new technology and ideas, and I absolutely love informal workshops/skillshares. Outside of libraries, I’ve got some supervisory experience, as well as a year as a public NYC middle school math teacher, under my belt.

A lot went into my decision to take my current position, but one reason was that I was actually pretty excited to take on being a Cataloging & Metadata Unit supervisor (as well as pretty nervous, of course). I wanted to see how I would adapt to this position and how I could adapt the position to me. I continue to hope I have a lot to offer to the catalogers I work with because I spent years as a libraries paraprofessional before deciding to get my MLIS and move ‘up the ladder’, and I’m highly suspicious of that ladder.

Additionally, I hope this can be a way for me to lead library data work into a new imagining and model through example and experience. Many people talk about how Cataloging == Metadata, and we see more and more traditional MARC cataloging positions being called ‘Metadata’ positions. They might even involve some non-MARC metadata work, but usually remaining divorced from MARC work by differing platforms, standards, data models, or other. There are plenty of people declaring (rightfully in my opinion!) metadata and cataloging to be the same work, yet these statements are usually from one side of the still-existent fence unfortunately. Actually integrating decades of data silos, distinct sets of standards and communities, toolsets/editors, functional units, workflows and procedures, among so many other divisions both real and perceived, is something I want to make actually happen, though I freely admit how daunting it can be. Trying my hand at being a supervisor was one way for me to help us as a library technology and data community work towards this integration.

A lot of what I’ve focused on in the first months of this job is assessing what already exists - catalogers’ areas of expertise and interests, workflows, toolsets, communication lines, expectations - then trying to lay down foundations for where I hope we as a unit can go. As it stands, there was a lot of change going on around my arrival in this position, especially for the catalogers. My library migrated ILSes (in an over-rushed fashion, but hindsight is 20-20) a few months before my arrival. Cataloging procedures had been haphazardly moved to RDA according to particular areas of MARC expertise and interest (for example, Music and Video cataloging moved to RDA policies because the catalogers focused on that area are invested in learning RDA). The digital collections metadata work was partially given to the catalogers via a very much locked-down MODS metadata editor, before being taken back over by digital library developers, digitization staff, and archivists (and now managed by me). And there is an imminent but yet-to-be-well-defined (due to a number of reasons, including many retirements) technical services re-organization going on, both of department structure and space. As regards non-MARC metadata, though not the metadata work the catalogers were involved in before my arrival, there is a migration of multiple digital library platforms to one, an IR platform migration in the works, and migration/remediation of all the previous digital projects metadata from varying versions of DC to MODS.

So, a lot of change to walk into as the new cataloging & metadata unit supervisor, as well as the only cataloging and/or metadata librarian. Even more changes for the catalogers to endure with now a new and relatively green supervisor.

I was pretty prepped to expect that I would be taking on a new sort of library data leadership role that works across departments - to re-imagine, as I understand it, where, how and why cataloging and metadata expertise/work can be applied. And to make sure that all of our library data practices are not just interoperable, but accessible to metadata enhancement and remediation work by the catalogers. This has meant the creation of new workflows, data pipelines, tools, and most importantly, comfort areas for the catalogers. Working with them at the forefront of my change efforts has really forced me to develop new skills rather quickly, including trying to situate not just myself, but a team of talented people with varying experiences and goals in a rapidly changing field. Change doesn’t scare me, but it’s not just about me now.

Stop Dumping on Technical Services & Stop Holding onto the Past, Technical Services

Beyond all of these local changes, it is pretty well documented that libraries’ technical services departments, and academic libraries’ in particular, are changing. Some might say shrinking, and I understand that, but I want to see it as positive change - we can take our metadata skills and expertise, and generalize them outside of MARC and the ILSes that so many catalogers associate directly with their work. That generalized skillset - and I hesitate at using the word generalized, perhaps something like more easily transferable, or integrated, or interoperable is better - can then be applied to many different and new library workflows; in particular, all the areas growing around data work writ large in libraries.

In a presentation from a while ago, I made a case for optimism in library technical services, if we can be imaginative and ready to adapt, as well as libraries at a higher level be prepared for what can be best described as more modular and integrated data workflows - no more data/workflow/functional/platform silos. I try not just to say that ‘cataloging is metadata work’, but involve metadata work across data platforms and pipelines, and show the value of making this work responsive and iterative - almost agile, though I feel uncomfortable taking that term from a context I’m less familiar with (agile development). I especially want to divorce cataloging expertise from knowing how to work with a particular ILS or OCLC Connexion editor.

In the Ithaka S+R US Library Survey 2013, the question “Will your library add or reduce staff resources in any of the following areas over the next 5 years?” showed a steep decline of staff resources for technical services in response - close to 30%, and far more of a decline than any other academic library area mentioned in the context of this question. However, we see a lot of growth in response to that question for areas that can use the data expertise currently under-tapped in cataloging and metadata work: areas such as Digital preservation and archiving; Archives, rare books, and special collections; Assessment and data analytics; Specialized faculty research support (including data management); and Electronic resources management. This all uses the skills of cataloging and metadata workers in different ways, but we also need to recognize that there are different and varied skills represented in cataloging and metadata work as it exists now. One way to conceptualize this is the divide in skills required between original MARC cataloging, where the focus is very much on the details of a single object and following numerous standards, versus what may have previously been called ‘database maintenance’ and is more generally seen, to me, now as batch library data munging - where it is necessary to understand the data models involved and how to target enhancements to a set of records while avoiding errors in data outliers.

Cataloging versus Metadata & Where Semantics Hit Institutional Culture

A note on ‘cataloging’ versus ‘metadata’ as a term to describe the work: yes, I agree that it’s all metadata, and that continuing to support the divide between MARC and non-MARC work is a problem. However, I also recognize that departmental and institutional organizations and culture are not going to change overnight, and that these terms are very much tied into those. There is disruption, then there is alienation, and as a supervisor, I’ve been very aware of the tense balance required therein. I don’t want to isolate the catalogers; I really cannot afford to isolate the administration that helps to decide the catalogers’ professional futures (if job lines remain upon vacancy; if their work continues to be recognized and supported; if they get reassigned to other units with easier to explain areas of operation and outreach; etc.). But I know things need to change. This explains in part why I am wary of the use of new terms (metadata is not a new term, but its use for describing MARC work has only recently grown exponentially) because they can carry the possibility of turning people away from changes, as folks might see the new labels as part of a gimmick and not real, substantive change. I will generally go with describing all of this work as metadata in most contexts, because I do feel like we are beginning to integrate our data work in a way that the catalogers now buy into what is meant really by saying metadata. Yet in certain contexts, I do continue to use cataloging to mean MARC cataloging and metadata to mean non-MARC work, because it is admittedly an easy shorthand as well as tied into other (perhaps political, perhaps not) considerations.

Back to the post at hand, what I’ve started to build, and see some forward-movement on (as well as some hesitation), is a more integrated cataloging & metadata unit. The catalogers did do some metadata work before I arrived, by which I mean non-MARC metadata creation. However, this was severely limited to simply working with descriptive metadata in a vacuum - namely, a metadata editor made explicitly for a particular project. From what I can tell, the metadata model and application profile were created outside the realm of the catalogers; they were just brought in to fill in the form for one object at a time. This is not unusual, but hardly touches on what metadata work can be. Worse, the metadata work the catalogers did ended up not being meaningfully used in any platform or discovery layer, resulting in some disenchantment with non-MARC metadata work as a whole (seeing it as not as important as ‘traditional MARC cataloging’, or as unappreciated work). I can absolutely understand how this limited-view editor and these metadata work decisions can make things more efficient; I somewhat understand the constant changes in project management that left a lot of metadata work unused; but I am trying to unravel now just what this means for the catalogers’ understanding of high-level data processes outside of MARC and how the work they do in MARC records can apply similarly to the work done elsewhere for descriptive metadata. I also need to rebuild their trust that their work is appreciated and used in contexts beyond the MARC catalog. The jury is still out on how this is going.

Cataloging/Metadata Reskilling Workflows So Far

So yeah, yeah, lots of thoughts and hot air on what I am trying to do, what I hope happens. What have I tried? And how is it going? How are the catalogers reacting? Here are a few examples.

Metadata Remediation Sprint

When I first arrived, we had a ‘metadata remediation sprint’. This was a chance for us all to get to know each other in a far less formal work environment - as well as a chance for the catalogers to get to know some of my areas of real interest in data work, in particular, non-MARC metadata remediation using OpenRefine, a set of Python scripts, and GitHub for metadata versioning. This event built on the excitement of the recently announced Digital Library of Tennessee, a DPLA Service Hub with aggregation and metadata work happening at UTK (I’m the primary metadata contact for this work). The catalogers knew something about what this meant, and not only did they want to learn more, but they wanted to get involved. I tried my best to build a data remediation and transformation pipeline for our own UTK collections that could involve them in this work, but some groundwork for batch metadata remediation had to be laid first, and this sprint helped with that.

The day involved having an 8:30 AM meeting (with coffee and pie for breakfast) where I explained the metadata sets, OAI-PMH feeds of XML records, the remediation foci - moving DC to MODS, reconciling certain fields against chosen vocabularies, cleaning up data outliers - and working with this metadata in OpenRefine. There was some talk about the differences between working with data record by record versus working with a bunch of records in batch, as we had at that point about 80,000 DC records needing to be pulled, reviewed, remediated and transformed, collection by collection. Then, each cataloger was given a particular dataset (chosen according to topical interest), and given the day to play around with this metadata migration work. It was seen as a group focus on a particular project, so a kind of ‘sprint’.
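As a rough illustration of the kind of DC to MODS mapping mentioned above (and not the actual scripts or mappings used for the sprint), here is a minimal Python sketch using only the standard library. The crosswalk, the sample record, and the function name `dc_to_mods` are all assumptions made up for this example; a real mapping would cover far more fields and edge cases.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# Assumed, deliberately tiny crosswalk: DC element -> (MODS wrapper, MODS leaf).
CROSSWALK = {
    "title": ("titleInfo", "title"),
    "creator": ("name", "namePart"),
    "subject": ("subject", "topic"),
    "date": ("originInfo", "dateCreated"),
}


def dc_to_mods(dc_record):
    """Build a MODS element from the DC children of one record element."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    for child in dc_record:
        # ElementTree tags look like "{namespace}localname".
        namespace, _, local = child.tag[1:].partition("}")
        text = (child.text or "").strip()
        if namespace != DC_NS or local not in CROSSWALK or not text:
            continue  # unmapped or empty fields are skipped in this sketch
        wrapper_name, leaf_name = CROSSWALK[local]
        wrapper = ET.SubElement(mods, f"{{{MODS_NS}}}{wrapper_name}")
        leaf = ET.SubElement(wrapper, f"{{{MODS_NS}}}{leaf_name}")
        leaf.text = text
    return mods


# Hypothetical sample record, invented for illustration.
DC_SAMPLE = f"""<record xmlns:dc="{DC_NS}">
  <dc:title> Letter to a friend </dc:title>
  <dc:subject>Correspondence</dc:subject>
  <dc:date>1863</dc:date>
  <dc:rights>Unmapped in this sketch</dc:rights>
</record>"""

mods_record = dc_to_mods(ET.fromstring(DC_SAMPLE))
print(ET.tostring(mods_record, encoding="unicode"))
```

Fields that fall outside the crosswalk (here, `dc:rights`) are silently dropped, which is exactly the sort of decision that needs a human reviewing each collection before anything is migrated.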

The sprint was also a way for me to gauge each cataloger’s possible interest in doing more of this batch metadata work, who really wanted to dive into learning new tools, and the ability each had for working with metadata sets. This is not to say at all that each cataloger couldn’t learn and excel at batch metadata work, using new tools, or metadata work generally; but matching different aspects of metadata work to folks’ work personalities was key in my admittedly limited opinion. In assigning new projects and reskilling, I didn’t want to throw anyone into new areas of work that they wouldn’t be a good fit for or have some sort of overlapping expertise with, as there was already enough change going on. Cataloging & metadata work is not always consistent or uniform, so there are and remain different types of projects to be better integrated into workflows and given to the person best able to really take ownership (in a positive way) of that project and excel with it.

The catalogers had so much untapped expertise already that the sprint went very well. Some catalogers warmed to OpenRefine right away, with the ability to facet, see errors, and repair/normalize across records. Other catalogers preferred to stick with using Excel and focusing in on details for each record. All the datasets, each a collection pulled from the OAI-PMH feed and prepared as CSV and as OpenRefine projects by me beforehand, were pulled from GitHub repositories, giving the catalogers a view of version control and one possible use of Git (without me saying, ‘Hey, I’m going to teach you version control and coding stuff’ - the focus was on their area of work, metadata). Better yet, I was able to get their work into either migration paths for our new digital collections platform or even into the first group of records for the DPLA in Tennessee work, meaning the catalogers saw immediately that their work was being used and greatly appreciated (if only by me at first, though others have taken note of this work as well).
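For readers who have not worked with OAI-PMH feeds, the ‘pulled from the feed and prepared as CSV’ step can be sketched roughly as below. This is an illustration under assumptions rather than the scripts actually used: the sample response, the identifier, the field list, the `||` separator for repeated values, and the function names are all invented for this example, and a real harvest would fetch the XML over HTTP and handle resumption tokens.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Standard namespaces for OAI-PMH responses carrying oai_dc records.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical sample response standing in for a harvested ListRecords page.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:coll_1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Letter to a friend</dc:title>
          <dc:subject>Correspondence</dc:subject>
          <dc:subject>Tennessee</dc:subject>
          <dc:date>1863</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

FIELDS = ["title", "creator", "subject", "date"]  # assumed subset of DC


def records_to_rows(xml_text):
    """Flatten each OAI-PMH record into one dict, one key per DC field."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iterfind(".//oai:record", NS):
        row = {
            "identifier": record.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS
            )
        }
        for field in FIELDS:
            values = [
                el.text.strip()
                for el in record.iterfind(f".//dc:{field}", NS)
                if el.text
            ]
            # Repeated fields are joined so they can be split again later.
            row[field] = "||".join(values)
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize the flattened rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["identifier"] + FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = records_to_rows(SAMPLE)
print(rows_to_csv(rows))
```

A CSV like this opens equally well in Excel or OpenRefine, which matters when some people on the team prefer one and some the other, and it diffs cleanly enough to be kept under version control.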

From that day, beyond getting the catalogers comfortable with asking me questions and attacking new projects (and new types of projects), the catalogers were able to claim ownership of new kinds of work for a broader view of metadata, helping generate buy-in with some of the metadata migration and integration work I was talking about earlier. Some of the harder to migrate and map collections, due to bad original metadata creation practices, were handed off to the catalogers who are more record-focused; other collections, needing more transformation and reconciliation work, were handed off to the catalogers who really enjoyed working with OpenRefine and batch editing. In particular, OpenRefine’s GREL (Google Refine Expression Language, think kind of JavaScript but for data editing) has warmed 2 of the catalogers to the idea of scripting (but again, without someone explicitly saying ‘hey, you need to learn scripting’). They all are aware of GitHub now and some have even begun using a GitHub client on their workstations to access new datasets for migration work.
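To give a feel for the facet-then-normalize pattern that made OpenRefine click for some of the catalogers, here is a small Python approximation. It is only a sketch: the `fingerprint` function is loosely modeled on OpenRefine’s fingerprint key-collision clustering rather than a faithful copy of it, and the sample subject values and function names are invented for this example.

```python
from collections import Counter, defaultdict


def fingerprint(value):
    """Lowercase, drop punctuation, then sort and de-duplicate the tokens."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in value.lower()
    )
    return " ".join(sorted(set(cleaned.split())))


def facet(values):
    """Count how often each distinct value occurs, like a text facet."""
    return Counter(values)


def cluster(values):
    """Group variant spellings that share a fingerprint."""
    groups = defaultdict(Counter)
    for value in values:
        groups[fingerprint(value)][value] += 1
    # Only fingerprints with more than one variant need attention.
    return {key: counts for key, counts in groups.items() if len(counts) > 1}


def normalize(values):
    """Replace each variant with the most frequent form in its cluster."""
    replacements = {}
    for counts in cluster(values).values():
        preferred = counts.most_common(1)[0][0]
        for variant in counts:
            replacements[variant] = preferred
    return [replacements.get(value, value) for value in values]


# Hypothetical subject values, invented for illustration.
subjects = [
    "Tennessee", "tennessee.", "Tennessee",
    "Civil War", "War, Civil", "Civil War",
    "Letters",
]

print(facet(subjects))
print(cluster(subjects))
print(normalize(subjects))
```

Picking the most frequent variant as the preferred form is a blunt rule; in practice a cataloger would review each cluster before merging, which is where the existing expertise comes in.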

The catalogers have done amazingly well with all of this, and I know how lucky I am to work with a team that is this open to change.

Moving Some to Batch Data Work

This partial movement towards batch metadata work and remediation doesn’t just stick with the original focus on non-MARC metadata for that sprint day. In particular, 2 of the catalogers have really taken on a lot of the batch metadata normalization and enhancement with our MARC data as well, informed perhaps by seeing batch data work outside of the context of MARC/non-MARC or specific platforms during that day or in other such new projects given to them. Though, to be fair, I need to admit two things (at least):

  1. One of the catalogers is already the ‘database maintenance’ person, or what I’d call data administrator, though her position (not HR) title was, upon my arrival, still blank. This fact is tied to ideas in administration of this database maintenance work not being ‘cataloging’ in a traditional understanding - highlighting the record by record creation versus data munging divide that seems to exist in too many places still. I think this work will lead metadata work in the future, especially as content specialists are more often the metadata creators in digital collections, and catalogers need to be brought in increasingly for data review, remediation, enhancement, and education/outreach. Don’t think this will happen with MARC records? I think it already is when we consider the poor state of most vendor MARC records we often accept. We need to find better ways to review/enhance these records while balancing that against the possibility they’ll be overwritten. Leading to my second admission…
  2. The MARC/non-MARC work is still very much tied to platforms, especially the Alma ILS which our department has really bought into at a high level. One of the catalogers who did very well with OpenRefine is now working with the vendor records for electronic resources using MarcEdit outside of the ILS. She has done very well in being able to review these records in MarcEdit in batch, apply some normalization routines, and only then import those into our Alma ILS. While these do eventually end up in the ILS, it is my hope that the work with the data itself outside of Alma gives the non-MARC data work outside of other platforms and editors more context for her. I don’t know if this is the case, however.

For the catalogers who are more record-focused, we’ve gotten some cleanup projects requiring more manual review lined up - this includes reviewing local records where RDA conversion scripts/rules cannot be applied automatically because they need a closer review, or sets of metadata where fields are used too inconsistently to have metadata mappings applied in batch. This work is not pressing/urgent, so it can be worked on when a break from traditional MARC cataloging is needed, or the platforms for traditional MARC cataloging are down (which seems to occur more and more often).

Centralized, Public, Group-created Documentation

In all of this, one of the key things I’ve needed to do is to get centralized, responsive (as in changing according to new needs and use cases), and open/transparent documentation somewhere. There was some documentation stored in various states in a shared drive when I arrived, but a lot of it had not been updated since the previous supervisor. There were multiple versions of procedures floating about in the shared drive as well as in print-outs, leading to other points of confusion. Additionally, it was difficult, sometimes impossible, for other UTK staff who sometimes need to perform minor cataloging work or understand how cataloging happens to access these documents in that shared drive.

Upon my arrival, the digital initiatives department was already planning a move to Confluence wikis for their own documentation; I immediately signed up for a Cataloging wiki space as well. In getting this wiki set up, a lot of the issue was (and remains) buy-in - not just for reading the wiki, but for using and updating the wiki documentation. Documentation can be a pain to write up, and there can be fear about ‘writing the wrong thing’ for everyone to see, particularly in a unit that has had many different workflows and communication whirlpools about.

I’ve tried my best to get wiki documentation buy-in by setting an example and creating an open atmosphere, though I worry about how successful I’ve been with this. I link to everything in the wiki: procedures, legacy documentation in process of being updated, data dictionaries, mappings, meeting notes, and unit goals. Catalogers are asked to fill in lacunae that I can’t fill myself either due to lack of UTK-specific knowledge/experience or time. I try to acknowledge their work on documentation wherever possible - meetings, group emails, etc. Other staff members outside of the Cataloging Unit are often pointed to the wiki documentation for questions and evolving workflows. I hope this gives the catalogers a sense that their documentation work is appreciated.

Documentation and wiki buy-in remain a struggle, not because the catalogers don’t see the value of this work (I believe), but because documentation takes time and can be hard to create. To avoid risking burnout, I’ve not pushed on rewriting all possible policies and procedures at once, despite there being many standing documentation gaps. Instead, we aim to focus on documenting areas that we run across in projects, that we are working through currently, or that the catalogers are particularly interested in (like music cataloging, special collections procedures, etc.). I’m heartened to say that they are increasingly sharing their expertise in the wiki.

To be continued…

I have outstanding ideas and actions to discuss, including our policy on cataloger statistics (and how they are used), the recent experience of revising job descriptions, and the difficulty of being both a metadata change agent and the advocate for the catalogers when cataloging work is often overlooked or underestimated by either administration or other departments (particularly as more metadata enhancement instead of or in tandem with metadata creation is done). But this will need to be part of a follow-up post.

I’m new to all this, and I’m trying my best to be both a good colleague and supervisor while wanting to move the discussion on what metadata work is in our library technology communities. I have a lot of faults and weaknesses, and as such, if you’re reading this and have ideas, recommendations, criticisms, or other, please get in touch - or @cm_harlow on Twitter (and thanks for doing so). Whatever happens in the future, whether I stay a supervisor or not in the years to come (I do sorely miss having my primary focus on metadata ‘research and development’ so to speak), this has been a really engaging experience so far.