I’m really interested in how GLAMs (galleries, libraries, archives, and museums) are using AI and LLMs (large language models) for content management labor, especially for identifying people or places in images and for generating image subjects, tags, and descriptions.

I watched the recording of Ben Zhao’s keynote from the OR2025 conference on how LLMs and GenAI work, whether we can trust their output (authenticity), and the dangers/harms AI models pose to open repositories/libraries.

Ben’s talk essentially asked whether LLMs are the next/right interface for info/data access. What I got from it was that, by design, today’s LLMs can’t by themselves be much help in automating cultural heritage data labor. LLMs are complex black-box tools, with billions of parameters trained on trillions of tokens, that predict what should/could happen next without any memory, knowledge, or logic behind those predictions. A lot of the labor we’d like to automate requires logic, knowledge, and discernment.

Our content management/DAM systems also have underlying issues: the systems are outdated; metadata is missing, incorrect, or incomplete; search is disjointed across multiple systems that don’t talk to each other; and reconciliation is hard when complex data needs diverge or are incompatible.

Introducing AI for descriptions or image identification of people in photos presents really compelling questions about how to address the underlying issues with an AI-powered interface for our labor:

  1. Will AI-descriptions eventually fall into the current patterns of labor, where descriptions are iterative processes meant to be remediated over time as data needs/standards change?

  2. How big should the human quality assurance sample size be for AI-described assets? How many outputs do we sample before it’s “good enough”?

  3. How do we stay transparent about AI-descriptions vs human-descriptions when in the end the database treats all descriptions the same? Data is lumped into relational tables, and a CSV export will likely lose the context of whether a description is human- or AI-driven unless we literally build it in as a discrete column/field in the item record templates.
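
To make question 3 a bit more concrete, here is a minimal sketch of what "a discrete column/field" could look like in practice. Everything in it is an assumption for illustration: the `ItemRecord` template, the `description_source` and `description_agent` field names, and the three allowed source values are hypothetical and not taken from any particular DAM or collections system. The only point is that provenance survives a CSV export when it is stored as its own column instead of being implied by context.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields

# Hypothetical vocabulary for who or what wrote a description.
ALLOWED_SOURCES = {"human", "ai", "ai_reviewed"}


@dataclass
class ItemRecord:
    """Hypothetical item record template with provenance as discrete fields."""
    item_id: str
    description: str
    description_source: str  # assumed field: "human", "ai", or "ai_reviewed"
    description_agent: str   # assumed field: cataloger initials or model name


def export_csv(records):
    """Return CSV text in which every row carries its description provenance."""
    for record in records:
        if record.description_source not in ALLOWED_SOURCES:
            raise ValueError(f"unknown description_source: {record.description_source!r}")
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ItemRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


def read_csv(text):
    """Parse exported CSV text back into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


records = [
    ItemRecord("img-001", "Two people standing on a pier", "ai", "example-model"),
    ItemRecord("img-002", "Portrait, seated, facing left", "human", "jd"),
]
exported = export_csv(records)
rows = read_csv(exported)

# After a round trip through CSV, AI-written rows can still be picked out for review.
ai_rows = [row["item_id"] for row in rows if row["description_source"] == "ai"]
```

With a column like this in place, the export itself can answer "which descriptions were AI-written?", and the same filter could be used to pull the human QA sample from question 2.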