
Beware Library Cobras…

This post is a short excerpt from my upcoming Library Technology Report on Smart Buildings. I’m just back from LITA Forum 2017, where I had a fantastic experience. My one disappointment was the lack of problematization of data collection, retention, and analysis…especially as it relates to the “Internet of Things” and the coming flood of data it will generate.

This excerpt contains no solutions, only questions, concerns, and possible directions. If anyone has thoughts or would like to start a dialogue about these issues, I’d love to talk. The full Library Technology Report on Smart Libraries will be published by ALA TechSource in the next few months.


The end-game of the Internet of Things is computing power and connectivity so cheap that they are built into literally every manufactured object. Everything will have the ability to be “smart”: every chair, every table, every book, every pencil, every piece of clothing, every disposable coffee cup. Eventually the expectation will be that objects in the world know where they are, and are trackable and/or addressable in some way. The way we interact with objects will likely change as a result, and our understanding of the things in our spaces will become far more nuanced and detailed than it is now.

For example, once the marginal cost of sensors drops below the average cost of human-powered shelf-reading, it becomes an easy decision to sprinkle magic connectivity sensors over our books, making each of them a sensor and an agent of data collection. Imagine being able to query your entire collection, at any time, for mis-shelved objects. Each book will be able to communicate with the books around it, with the wifi base stations in the building, and with the shelves, and will know when it is out of place. Even more radically, maybe the entire concept of place falls away, because the book (or other object) will be able to tell the patron where it is, no matter where it happens to be shelved in the building. Ask for a book, and it will not only tell you where it is, it will mesh with all the other books to lead you to it. No more “lost books” for patrons, either, since they will be able to look at a map, see where the book is in their house, and have it reveal itself via an augmented reality overlay on their phone.
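To make that shelf-reading query concrete, here is a minimal sketch in Python of how it might work, assuming each smart book can report its own call number and the call numbers of its immediate shelf neighbors. Everything here is invented for illustration; no such sensor API exists today, and real call-number collation is considerably messier than string comparison.

```python
# Hypothetical sketch: flagging mis-shelved books in a "smart" collection.
# Assumes each tagged book reports its call number and its neighbors'.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SmartBook:
    call_number: str
    left_neighbor: Optional[str]   # call number reported to the book's left
    right_neighbor: Optional[str]  # call number reported to the book's right

def is_misshelved(book: SmartBook) -> bool:
    """A book is suspect if it doesn't sort between its reported neighbors.
    Naive string comparison stands in for real call-number collation."""
    if book.left_neighbor and book.call_number < book.left_neighbor:
        return True
    if book.right_neighbor and book.call_number > book.right_neighbor:
        return True
    return False

# "Query the entire collection" then reduces to a filter over sensor reports:
shelf_reports = [
    SmartBook("PS3545 .H16", "PS3537 .T32", "PS3552 .R34"),  # in order
    SmartBook("QA76 .G75", "PS3552 .R34", "PS3566 .L27"),    # out of place
]
print([b.call_number for b in shelf_reports if is_misshelved(b)])
# ['QA76 .G75']
```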

The world of data that will be available to us in 10–20 years will be as large as we wish it to be. In fact, it may be too large for us to make sense of directly. My guess is that we will need machine learning systems to sort through the enormous mounds of data and help us understand the patterns and links between different points of data. The advantage is that if we can sort and analyze it appropriately, the data will be able to answer many, many questions about our spaces that we’ve not even dreamed of yet, hopefully allowing us to design better, more effective, and more useful spaces for our patrons.
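As a purely illustrative sketch of what that machine-assisted sense-making might look like: the toy example below fabricates three “usage profiles” for a study space and shows an off-the-shelf clustering algorithm (scikit-learn’s k-means) recovering them from the raw numbers. The data and features are entirely invented; the point is only that unsupervised methods can surface structure that would be hard to spot by eye.

```python
# Illustrative only: clustering synthetic "space usage" sensor readings.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

# Pretend each row is one day for a study room:
# [avg occupancy, noise level (dB), hours of use]. Three fabricated profiles.
quiet_study  = rng.normal([2, 40, 10], [0.5, 3, 1], size=(50, 3))
group_work   = rng.normal([6, 65, 6],  [1.0, 5, 1], size=(50, 3))
event_nights = rng.normal([20, 75, 3], [3.0, 5, 1], size=(20, 3))
days = np.vstack([quiet_study, group_work, event_nights])

# Unsupervised clustering recovers the three usage profiles from raw numbers.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(days)
for cluster in range(3):
    print(cluster, days[labels == cluster].mean(axis=0).round(1))
```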

At the same time, we need to be wary of measurements becoming targets. I opened the larger Report with Goodhart’s Law, credited to economist Charles Goodhart and phrased by anthropologist Marilyn Strathern: “When a measure becomes a target, it ceases to be a good measure.” We can see this over and over, not just in libraries, but in any organization: an organization will optimize for the measures it is rewarded on, often with negative effects in other areas. This is captured in the idea of perverse incentives, where an organization rewards the achievement of a metric, only to realize that the achievement undermines the original goal. The classic example is known colloquially as the “cobra effect,” named after the probably-apocryphal story of British colonizers in India rewarding citizens for bringing in dead cobras, in an attempt to control the snakes’ deadly numbers in cities. Of course, the clever people of India were then incentivized to breed cobras in secret, in order to maximize their profits….

Libraries should be wary of the data they gather, especially as we move into the next decade or two of technological development. The combination of data that is toxic to the privacy of our patrons and the risk of perverse incentives skewing decisions as measures become targets is actively dangerous to libraries. Libraries that wish to implement data-heavy decision making or planning need to be extraordinarily aware of these risks, both acute and chronic. I believe strongly in the power of data analysis to build a better future for libraries and our patrons. But used poorly or unthinkingly, the data we choose to collect could be secretly breeding its own set of cobras.


Harper Collins and some numbers

So after the Harper Collins incident of the last couple of weeks, I thought it would be interesting to see, based on my library’s data, what the numbers look like for books that have circulated more than 26 times. First, all the caveats, in hopes of heading off some of the questions I’m sure this data will raise:

  • This is, roughly, 10 years’ worth of circulation data. Our last major ILS migration happened about 10 years ago, and circulation data from the decades prior to that is either non-trivial to access or non-existent.
  • UTC has about 10K FTE students
  • Our circulation is, compared to peer institutions, ridiculously low. We are working on fixing part of that problem.

Now, the numbers: removing AV materials (DVD/VHS, audiobooks, CDs), reserve items, and things that don’t circ (journals, etc.), we have 409,213 things in our catalog that qualify, mostly, as “books” and are available for circulation. That includes Reference items, which circulate only to faculty, but which seemed worth including. Of those 409,213 items, the total number that have circulated more than 26 times in 10 years is:

126

Yep, that’s right: 126 books, or just about 0.03079% of our collection. And that count even includes multiple copies of the same work (we have three copies of A rhetoric and composition handbook that are all on the >26 list, for example).

If you take the total number of times each of these books circulated and divide it by 26 to determine how many additional copies the library would have had to purchase IF they had all been eBooks under the Harper Collins rules, my library would have had to purchase an additional 148 books to meet the demand. That’s under 15 titles a year, on average. I don’t have average costs of Harper Collins eBooks handy, but if they followed the Amazon pricing model for eBooks, they would run between $9.99 and $14.99 each. Let’s split the difference and call the average price $12.49…that means my library would have to find roughly an extra $185 a year to keep up.
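For anyone who wants to poke at the arithmetic, here is a small Python sketch of the calculation. The 148-book and $12.49 figures come from the paragraph above; the per-title circulation count in the example is made up.

```python
# Back-of-the-envelope check of the numbers above.
import math

CIRC_CAP = 26      # Harper Collins lending cap per ebook license
YEARS = 10         # span of the circulation data
AVG_PRICE = 12.49  # midpoint of the assumed $9.99-$14.99 range

def additional_copies(total_circs: int, copies_owned: int = 1) -> int:
    """Extra licenses needed if every 26 circulations exhausts one license."""
    licenses_needed = math.ceil(total_circs / CIRC_CAP)
    return max(licenses_needed - copies_owned, 0)

# A title that circulated 80 times on one copy needs ceil(80/26) = 4
# licenses, i.e. 3 additional purchases under the cap.
print(additional_copies(80))                       # 3

# The 148 additional purchases reported above, spread over the decade:
print(f"${148 * AVG_PRICE / YEARS:.2f} per year")  # $184.85 per year
```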

I understand that eBooks have the potential to circulate more often than print…the decrease in access time alone should push them to be more popular choices, if what we’ve seen happen to our print journals is any indication. I also know that one small academic library is the equivalent of anecdata in the grand scheme of libraries. But if we only look at rhetoric and never at numbers, I think we’re doing ourselves a disservice.

I still disagree with Harper Collins’ new eBook rules, but for a lot of reasons that don’t necessarily come down to “it’s horrible for my library.” It is, I think, a bad idea to change the rules of the game midstream, at least without a lot of input from all the concerned parties (and no, I don’t actually think that many libraries were consulted about this change). But it’s also a bad idea, as I’ve said a few times now, to simply assume that the digital needs to act like the physical. We need to find new ways of dealing with these things, and I hope that situations like #hcod are just growing pains.