
Privacy

Over at Miskatonic University Press, William Denton writes up a good summary of concerns about the Measure the Future project and privacy. In that post, he says:

Measure the Future has all the best intentions and will use safe methods, but still, it vibes hinky, this idea of putting sensors all over the library to measure where people walk and talk and, who knows, where body temperature goes up or which study rooms are the loudest … and then that would get correlated with borrowing or card swipes at the gate … and knowing that the spy agencies can hack into anything unless the most extreme security measures are taken and there’s never a moment’s lapse … well, it makes me hope they’ll be in close collaboration with the Library Freedom Project.

A fine, fine point, and one I felt was important enough that I made sure my announcement presentation (which will be going up here on the blog ASAP) included a slide about privacy and how very aware I am of the potential issues here. I have spent a decent amount of time thinking about the threat models for data of this type, and about how to properly anonymize and aggregate the data collected by our sensors.

While we are still early in thinking about how best to collect the data we need, my best guess right now is that we will use low-resolution, machine-vision-based image sensors as our “counters”. There are many, many ways to count people: laser tripwire sensors, infrared sensors, ultrasonic sensors, and more. But machine vision gives us the most flexibility of placement, and it handles two very tricky problems (the multiple-body problem and counting the direction of movement) pretty well. Plus, machine-vision-based sensors can get better faster via software updates than any of the other types.
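To make the directionality point concrete, here is a minimal sketch of one common way a vision-based counter can tell "in" from "out": draw a virtual line across the doorway and watch blobs cross it. It assumes the vision software has already reduced each frame to anonymous, short-lived track ids with a vertical position; `count_crossings`, `LINE_Y`, and that frame format are hypothetical illustrations, not Measure the Future's actual design. Separate track ids are what let two people walking side by side count as two.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical: y-coordinate (in pixels) of a virtual line across the doorway.
LINE_Y = 50.0

# Assumed frame format: a list of (track_id, y) pairs. Track ids only link
# one anonymous blob across consecutive frames; they identify no one.
Frame = List[Tuple[int, float]]


def count_crossings(frames: Iterable[Frame], line_y: float = LINE_Y) -> Tuple[int, int]:
    """Return (into_space, out_of_space) counts for tracks crossing line_y.

    Moving from y < line_y to y >= line_y counts as "in";
    moving from y >= line_y to y < line_y counts as "out".
    """
    last_y: Dict[int, float] = {}  # most recent position of each track
    into, out = 0, 0
    for frame in frames:
        for track_id, y in frame:
            prev = last_y.get(track_id)
            if prev is not None:
                if prev < line_y <= y:
                    into += 1
                elif y < line_y <= prev:
                    out += 1
            last_y[track_id] = y
    return into, out
```

For example, two people walking in side by side while a third walks out, seen over three frames `[(1, 10.0), (2, 20.0), (3, 90.0)]`, `[(1, 40.0), (2, 45.0), (3, 60.0)]`, `[(1, 70.0), (2, 80.0), (3, 30.0)]`, yields `(2, 1)`. Only the two totals and a timestamp ever need to leave the sensor.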

Using these types of sensors, the data collected will be something like a timestamped integer along with a directional measure (into the space or out of the space). There won’t be any form of individualized tracking… we don’t need to know who the patron is, and we don’t care. Correlation with circulation data, which I do think will be incredibly valuable for seeing whether browsing behaviors correlate with circulation, can be done without any patron information at all. All we need to know is whether a book circulated (any book) in the range that was browsed (by anyone). For our purposes it doesn’t matter whether the same person did both things, since we’re only looking at correlation across large data sets.
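A minimal Python sketch of what that record and the correlation step might look like follows. The `CountEvent` fields, the hourly bucketing, and the use of Pearson correlation are assumptions made for illustration, not the project's settled design; the circulation input is assumed to be an anonymous per-hour count of checkouts from the browsed range.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from math import sqrt
from typing import Dict, Iterable, Sequence


@dataclass(frozen=True)
class CountEvent:
    """Hypothetical sensor record: when, how many bodies, which way.

    There is deliberately no field that could hold a patron identifier.
    """
    timestamp: datetime
    count: int
    direction: str  # "in" or "out"


def hourly_entries(events: Iterable[CountEvent]) -> Dict[datetime, int]:
    """Sum the "in" counts per hour, truncating timestamps to the hour."""
    totals: Counter = Counter()
    for e in events:
        if e.direction == "in":
            hour = e.timestamp.replace(minute=0, second=0, microsecond=0)
            totals[hour] += e.count
    return dict(totals)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / (sx * sy)


def browse_circ_correlation(entries: Dict[datetime, int],
                            checkouts: Dict[datetime, int]) -> float:
    """Correlate hourly foot traffic with hourly checkouts from the same range.

    Hours missing from either side are treated as zero.
    """
    hours = sorted(entries.keys() | checkouts.keys())
    xs = [entries.get(h, 0) for h in hours]
    ys = [checkouts.get(h, 0) for h in hours]
    return pearson(xs, ys)
```

Because both inputs are plain per-hour totals, nothing in this pipeline needs, or has anywhere to store, information about who walked in or who borrowed what.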

Believe me, I understand how badly this data could be abused. Not only do I plan to build measurement tools that respect patron privacy, I’m going to try to build tools whose data structures and management make it impossible for libraries to misuse the data at all. And I’m totally happy to work with the Library Freedom Project or anyone else who can help me make sure that our data is clean and free of concern.