Introduction To Metadata And Cataloging
The following notes were created for one class of Library Science 653, taught by Dr. Starr Hoffman for Pratt Institute School of Library and Information Science. I had a wonderful time visiting with the class, talking through some high-level issues with cataloging and metadata, and going through a record in detail. The original notes shared with the class are available at this Google Document: http://bit.ly/lis653
In this blog post, I have cleaned up those notes a bit and created some screencasts reviewing what we covered, as well as what we did not have time to cover in depth. I hope that they help.
Metadata Introduction Class
Original Notes at http://bit.ly/lis653
Prepared for Starr Hoffman’s class
Christina Harlow, firstname.lastname@example.org, @cm_harlow
Goal: I hope to guide students through the creation of a MARC record, a MODS record, and a Dublin Core record (in the class’ Omeka interface). Time permitting, we will then explore options for transforming records from one schema/format to different schema/formats (crosswalking and XSLT transformations).
Introductory Screencast (quick review and updates on some of what we discussed in class):
Last week, Dr. Hoffman gave you an introduction to metadata. Building upon that, I will just discuss a few key points before we dive into examining a few records and how they were created.
We are focusing today on descriptive metadata. This is exactly what it sounds like - data that describes the object in hand (or on your screen). It gives content and context for that object. Descriptive metadata is created or updated using a suite of standards and guidelines:
- Record Format: Specific encodings for a set of elements (or a record). Examples include MARC, METS, Dublin Core…
- Schema: System of recording and structuring descriptive information for use in a record. A metadata schema creates and defines elements and any rules governing their use. Examples include MODS, Dublin Core…
- Model: High-level (or more generalized) approach to object description. Data models define the entities of description and their relationships to one another. Models can be very domain-specific. Examples include FRBR, RDF…
- Encoding: Conversion of metadata into a definable syntax or coded form. Examples include XML (eXtensible Markup Language), RDF N-Triples…
- Vocabularies: Usually domain-specific lists of allowable values for certain elements. Classification schemes are often connected to a chosen vocabulary. Examples include LCSH, Getty AAT, LCNAF, VIAF…
- Content Standards: Guidelines on the creation of data for certain elements, often defining what the primary source of a particular element should be. Examples include RDA…
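To make the difference between a schema and an encoding a little more concrete, here is a minimal Python sketch. It takes one small description that uses Dublin Core element names (the schema) and writes it out in two different syntaxes (the encodings). The record values and the wrapping `record` element are invented for illustration; only the Dublin Core element names and namespace come from the standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical description: the element names come from the Dublin Core
# schema, the values are made up for this example.
description = [
    ("title", "An Example Book"),
    ("creator", "Doe, Jane"),
    ("subject", "Cataloging"),
    ("date", "2014"),
]

def to_xml(pairs):
    """Encode (element, value) pairs as namespaced XML."""
    root = ET.Element("record")  # wrapper element name is an assumption
    for name, value in pairs:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")

def to_text(pairs):
    """Encode the same pairs as plain 'element: value' lines."""
    return "\n".join(f"dc:{name}: {value}" for name, value in pairs)

xml_record = to_xml(description)
text_record = to_text(description)
print(xml_record)
print(text_record)
```

The description itself does not change between the two outputs. Only the syntax used to carry it does, which is the distinction the list above draws between a schema and an encoding.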
Metadata is what enables people to find a resource - it is the backbone of the library catalog and discovery interface. One of the main difficulties is providing metadata guidelines that are specific to the needs of particular domains or collections while also keeping the metadata interoperable in a larger context. I will describe some of my own job, and work by my colleagues, to highlight this point.
- Columbia’s Blacklight Discovery Interface:
  - Note the ‘bento box’ model, where each box shows results from a different datastore.
  - This means each search is sent to every datastore in a form specific to that datastore’s metadata standards and context.
- Columbia’s Catalog, CLIO:
  - This is the traditional, MARC-based library catalog of the Columbia University Libraries.
  - Note the data model(s) - in CLIO, each record is generally for a manifestation (a particular publication, for example), not a work. This is not always the case, but is the guiding principle most of the time.
- Columbia’s Institutional Repository, Academic Commons:
  - The items in Academic Commons have MODS metadata for each descriptive record.
  - In the screencast I show you the current ingest form for creating metadata to go into Academic Commons.
- Columbia’s Online Exhibitions:
  - We often (but not always) use Omeka for our online exhibitions.
  - However, unlike your class’ Omeka instance, our particular Omeka installation has been modified to allow MODS metadata instead of the Dublin Core that comes out of the box with Omeka.
- E-resources often have metadata provided from vendors like Serials Solutions.
- Preservation metadata specific to digitization projects can use local or other schema.
- Other projects use the schema and format that best fits the discovery interface that they prefer, like how our Human Rights Web Archive uses MARC records.
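The ‘bento box’ idea mentioned above can be sketched in a few lines of Python. This is only an illustration of the pattern, not how Blacklight or CLIO is actually implemented: the three datastores, their field names, and their records are all hypothetical. The point is that one search term is sent to every datastore, and each datastore is queried on the field that its own schema uses for titles.

```python
# Hypothetical datastores. Each one keeps its titles under a different
# field name because each one uses a different metadata schema.
catalog = [{"245a": "Metadata basics"}, {"245a": "History of printing"}]
repository = [{"mods_title": "Metadata for repositories"}]
exhibitions = [{"dc_title": "Printing exhibition"}]

def make_searcher(records, title_field):
    """Build a search function that knows which field this datastore uses."""
    def search(term):
        return [
            record[title_field]
            for record in records
            if term.lower() in record[title_field].lower()
        ]
    return search

# One 'box' per datastore, each with its own schema-specific searcher.
DATASTORES = {
    "Catalog": make_searcher(catalog, "245a"),
    "Repository": make_searcher(repository, "mods_title"),
    "Exhibitions": make_searcher(exhibitions, "dc_title"),
}

def bento_search(term):
    """Send the same term to every datastore and keep the results separate."""
    return {box: search(term) for box, search in DATASTORES.items()}

results = bento_search("metadata")
for box, titles in results.items():
    print(box, titles)
```

Keeping the results in separate boxes, rather than merging them into one ranked list, means no one has to decide how a MARC record should rank against a MODS or Dublin Core record.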
In my experience, a main difference between ‘cataloging’ and ‘metadata’ is that cataloging has firmly established guidelines and tools, while metadata (here meaning non-MARC metadata) does not have a confirmed set of tools and workflows. This will be shown in the examples in the screencast.
MARC (MAchine Readable Cataloging) Bibliographic Records
MARC Screencast (doesn’t recreate classwork, but does show OCLC Connexion Web Interface and method for copy cataloging):
- MARC Official Page Bibliographic Fields: http://www.loc.gov/marc/bibliographic/ecbdhome.html
- RDA Toolkit: http://access.rdatoolkit.org/ (need subscription)
- Library of Congress Authorities: http://authorities.loc.gov/
- PCC Guidelines: http://www.loc.gov/aba/pcc/bibco/index.html
- Classification Web: http://classificationweb.net/ (need subscription)
- OCLC Connexion: http://connexion.oclc.org (need subscription)
- Many Integrated Library Systems (or ILS) have a cataloging module as well
We went through creating a sample record for this book in class. I recreate this record in the screencast but using the OCLC Connexion online interface. I include a copy of this record below.
Even though we accessed the item as an e-book, we cataloged it as if it were a physical book. Creating a record for the electronic version would have required a somewhat more complicated MARC record.
- Depending on your system, a lot of the MARC record can be generated (and validated) for you by the software.
- Also, depending on where you work, a lot of the time you will be modifying existing records (i.e. copy cataloging), not creating the records from scratch (i.e. original cataloging).
- Note that traditional cataloging still very much follows a manifestation-focused record model. In other words, these records describe a particular manifestation of a work. There are people who have ‘FRBR-ized’ their catalogs, however, so that each record focuses on a work with manifestations hanging off of it. Check out this catalog for one such example: http://catalog.perseus.org/
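For anyone who has not looked inside a MARC record before, the following Python sketch shows the basic shape of a MARC field: a three-digit tag, two indicators, and a list of coded subfields. It is not the record we made in class, and it is not a real MARC library. The `Field` class and the bibliographic values are invented for illustration, while the tags (100, 245, 264) and their subfield codes follow the MARC 21 bibliographic format.

```python
from dataclasses import dataclass, field

@dataclass
class Field:
    """One MARC variable field (hypothetical helper, not a real library)."""
    tag: str                     # three-digit field tag, e.g. "245"
    indicators: str = "  "       # two characters; a space means undefined
    subfields: list = field(default_factory=list)  # (code, value) pairs

    def display(self):
        # Show blank indicators as '#', a common convention in documentation.
        ind = self.indicators.replace(" ", "#")
        body = " ".join(f"${code} {value}" for code, value in self.subfields)
        return f"{self.tag} {ind} {body}"

# Invented description of a single manifestation (one published book).
record = [
    Field("100", "1 ", [("a", "Doe, Jane,"), ("e", "author.")]),
    Field("245", "10", [("a", "An example book /"), ("c", "Jane Doe.")]),
    Field("264", " 1", [("a", "New York :"), ("b", "Example Press,"),
                        ("c", "2014.")]),
]

lines = [f.display() for f in record]
for line in lines:
    print(line)
```

The punctuation at the end of each subfield is part of the data. That is one of the details cataloging software can often supply or validate for you.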
MODS Bibliographic Record
MODS Screencast: Shows creation of a MODS record in Hypatia (a metadata ingest tool at Columbia), then the MODS XML record in Oxygen XML Editor. Briefly discusses XML schema namespaces and transformations.
- Library of Congress Official MODS Documentation: http://www.loc.gov/standards/mods/
- Library of Congress Linked Data Authorities: http://id.loc.gov/
- MODS schema: http://www.loc.gov/standards/mods/v3/mods-3-5.xsd
The tools used can really vary from project to project and institution to institution, depending on the involvement of developers and the needs of the metadata team. I show in the screencast a few ways we can create or modify MODS records at Columbia, then an example of a MODS/XML record.
- Example of Hypatia (Hydra-head ingest tool): http://hypatia.cul.columbia.edu (requires login)
- Oxygen XML Editor (there are many XML editors, but this one is well known and works well for our purposes).
- OpenRefine (http://www.openrefine.org/) is used a lot to clean up data in a flat format (e.g. a spreadsheet) before a transformation is applied to turn it into MODS.
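The last step in that list, turning cleaned-up flat data into MODS, might look something like the following Python sketch. This is not the workflow we use at Columbia, and the spreadsheet columns and values are invented. The MODS namespace and element names (`titleInfo`, `title`, `name`, `namePart`, `originInfo`, `dateIssued`) are from the MODS schema, but the output here is not validated against it.

```python
import csv
import io
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# Hypothetical spreadsheet export; column names and values are invented.
CSV_DATA = """title,author,date
An Example Article,"Doe, Jane",2014
"""

def q(name):
    """Return the namespace-qualified form of a MODS element name."""
    return f"{{{MODS_NS}}}{name}"

def row_to_mods(row):
    """Map one spreadsheet row onto a small MODS record."""
    mods = ET.Element(q("mods"))
    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = row["title"].strip()
    name = ET.SubElement(mods, q("name"), {"type": "personal"})
    ET.SubElement(name, q("namePart")).text = row["author"].strip()
    origin = ET.SubElement(mods, q("originInfo"))
    ET.SubElement(origin, q("dateIssued")).text = row["date"].strip()
    return mods

records = [row_to_mods(row) for row in csv.DictReader(io.StringIO(CSV_DATA))]
mods_xml = ET.tostring(records[0], encoding="unicode")
print(mods_xml)
```

In practice you would still open the result in an XML editor and validate it against the MODS schema, since a script like this only guarantees well-formed XML.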
MODS can be used to describe a whole work (like the book above) or a part of a work (like just one page from a book). Note that, while technically MARC records can do the same, they generally do not. MODS records, however, are often used in both situations (and others).
In the screencast I show how to create a MODS record for an article. Then I take the MODS/XML of an article already in Academic Commons and put it in Oxygen to show how the MODS validation works. Then I show how XSLT can be used to transform a collection of records. The MODS/XML record we created/used (http://academiccommons.columbia.edu/catalog/ac%3A173687) appears below:
Dublin Core - Omeka Descriptive Record
Dublin Core Screencast: This shows the Omeka Dublin Core interface that we saw in class, with a bit more instruction on each of the elements.
- Dublin Core Metadata Initiative - Core Elements Set: http://dublincore.org/documents/dces/
- Dublin Core Metadata - Terms (includes core elements): http://dublincore.org/documents/dcmi-terms/
- Your class’ Omeka site: https://starr.omeka.net/admin/
In class, I showed you how to log into your class’ Omeka site and where to create a Dublin Core record. In the screencast, I walk through this process again, creating a Dublin Core metadata record for the item available at this link. Yes, this item already has some metadata attached, but we are not entirely cheating by using that for the example since the metadata attached is a Dublin Core - MODS mix (as we have modified our Columbia Omeka installation to include MODS metadata, being a ‘MODS-shop’).
Omeka Item Creation Instructions
Finally, I’m leaving these instructions written up for use by your class as you start creating items and metadata records for your projects. Perhaps they will be of help.
- Log in at https://starr.omeka.net/admin/
- Once there, go ahead and ‘add an item to your archive’.
- Dublin Core section: This is where the descriptive metadata goes. At least adding a Title, Subject (if applicable), Creator, Source, Publisher and Date is recommended, depending on the resource. Note that all of these fields are repeatable (and, in theory, optional).
- As a handy note, since these two are often confused or misused: Source (the physical item that the digitized image came from) and Relation (the ‘work’ that the item is a part of or otherwise related to) are used for linking between records and resources.
- Item Type section: Select the type of item you’re uploading. You can add item-specific and file-specific information here as well. This should complement, not exactly duplicate, your Dublin Core data. This section is optional but usually helpful for matching up Omeka descriptive records with digitized files and directories.
- Collection section: Here you will choose the collection that matches your group.
- Files section: Here you upload the actual file(s). Be wary of issues of copyright with this.
- Tags section: Here you can enter tags. This area is useful if a particular curator, archivist or librarian would like to use a local or department-specific vocabulary for items, but you don’t want this local vocabulary as part of the Dublin Core metadata.
- After you have entered all of the metadata for the Item, go back to the top of the Add Item page.
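If you end up describing a lot of items, a quick check like the following Python sketch can flag records that are missing the Dublin Core fields recommended in the instructions above. It has nothing to do with Omeka’s own software. The item is invented, and each field holds a list of values because the fields are repeatable. Subject is left out of the check since it is only recommended where applicable.

```python
# Fields recommended in the instructions above (Subject is 'if applicable').
RECOMMENDED = ["Title", "Creator", "Source", "Publisher", "Date"]

def missing_recommended(item):
    """Return the recommended fields that are absent or only blank."""
    return [
        name
        for name in RECOMMENDED
        if not any(value.strip() for value in item.get(name, []))
    ]

# Hypothetical item: every field is a list because fields are repeatable.
item = {
    "Title": ["Example photograph"],
    "Creator": ["Doe, Jane", "Roe, Richard"],
    "Date": ["2014"],
    "Publisher": [""],  # present in the form but left blank
}

missing = missing_recommended(item)
print(missing)
```

Since all of the fields are optional in theory, a check like this is best treated as a reminder and not as a hard rule.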
Questions? Corrections? Follow-up discussion?
I’m always available via email (cmharlow(at)gmail(dot)edu) or on Twitter (@cm_harlow). Thank you for letting me invade your class for one night.