Measure the Future: Privacy and More Details…

Measure the Future: Privacy and More Details…

I realized a few weeks ago that I haven’t said nearly enough about the technology and plans for Measure the Future. Mostly I haven’t because I’m in crazy ramp-up mode, trying to get technology sorted, read All The Things about computer vision and OpenCV and SimpleCV, get some input from my Alpha testers, and generally keep the fires stoked and burning on the project. If I expect the library community to be excited about the work I’m doing, I need to get some of the things out of my head where you can all see them and, hopefully, help make them better.

The thing that I’ve gotten the most comments and emails about is the degree to which Measure the Future is “creepy.” There is both and implicit and explicit expectation of privacy in information seeking in a library, and when someone says they are thinking about putting cameras in and watching patron behavior…well, I totally see why some people would characterize that as creepy.

So here’s why what I am planning isn’t creepy. At least, I don’t think so.

What I want to do is measure movement of people through buildings, as a first step towards better understanding how the library building is used. There are dozens of ways this can be done, and among the ones that I’ve had suggested to me or asked about in the last month include:

  • Infared sensors
  • Lasers
  • Ultrasound sensors
  • Wifi skimming
  • iBeacons

I rule out the first three because they don’t give enough data. You need multiple sensors to get things like direction of movement, and they all have serious trouble dealing with people moving in groups. I ruled out the last two because they ignore huge segments of the population that are integral to the library, only counting people who happen to have certain technology on their person as they move around. I also think that gathering data by passively skimming info from cell phones is inherently identifiable. Both of these last methodologies have the capacity to capture information that is unique to the person, and I want to avoid that can of worms. A core tenet of the system I’m building is Do Not Know Who You Are Counting.

The method that I am going to test first is machine vision…the use of cameras and algorithms to count people as they move through the frame of the camera. WAIT JUST A MINUTE, I hear you screaming. DID YOU JUST SAY YOU WERE GOING TO BE USING CAMERAS ALL OVER THE LIBRARY? Yep, I did. And here’s why doing so doesn’t harm privacy: the cameras aren’t going to be taking pictures in any way that you are probably thinking about “taking pictures.”

A Quick Primer on Computer Vision

Computer Vision is just what it sounds like…a computer using a camera as an input device and being capable of parsing the contents of said image into inputs. The examples that people have likely seen on procedural crime dramas of machine vision are not what we will be doing, however. No facial recognition, no body-heat tracking, no reading-license-plates-from-orbit (that’s reserved for the NSA). What we are focusing on is a feature of computer vision called “blob detection.” The computer watches a space, and ignores the static background, focusing instead only on the moving pixels in the image. By tracking moving, contiguous sets of pixels, you can count people, tell which direction and how quickly they are moving, and get information about how long they stay in an area. In theory, you can look at the size of the “blob” and interpolate from it whether it is one or many people.

simple blog detection

The thing you can’t do from blob detection is identify who the blob in question actually is. In fact, the pictures that are analyzed aren’t even saved. They are examined by the computer vision system, blobs are counted and otherwise noted in a database, and then the photograph is gone and the system moves on to the next one. By storing only these incremented counts, along with a few bits of data like directionality and speed, we can derive a huge amount of information about behavior in the library without resorting to the identification of the individual, and without any lingering data trail for misuse. Photos of individuals are not saved, and they can’t be retrieved for later examination, nor will they ever be saved through our setup.

This is how I plan to gather important and useful data while maintaining the anonymity of the individuals in question. Measure the Future is committed to this as a grounding principle, and our code and hardware will always be open so that others can ensure that we are treating the data in question properly, safely, and responsibly.

Data structure

As a result of these concepts, I think that the structure of the data will be something like:

  • Time/Date Stamp
    • Location of Camera
    • Number of People
    • Direction of movement
      • Speed of movement

Each of these data points will be captured by each individual camera that is “watching” an area. I envision libraries putting cameras in any area where they are interested in recording patron activity, such as the entry doors, meeting room entrances, in common areas, and in the stacks/new book displays. These stats would be saved, and available locally for the library to view via a browser-based interface in the style of Google Analytics, or to query via database calls or an API. The length of time that the stats are saved would be up to the individual library. And it would be up to the individual library whether or not to share the data with other libraries. The data will not be available to the Measure the Future project directly. It will be entirely in the control of the library that has implemented the system.

I won’t promise this for the 1.0 release, but one of my goals is also for the data to be encrypted while stored on the device and in transit to the local interface server.

Hardware

I don’t have nearly all the answers just yet, but I can wave in the direction of what I’d like to use. Another of the goals of the project is to use as much open hardware as possible in putting together the system. This ensures that the project can be replicated without issue, anywhere that you can source the hardware components…there is no “lock in” with proprietary hardware that has to be upgraded or fixed only by a vendor. It also will allow individual libraries and librarians to experiment freely with the systems, hopefully improving them and sharing said improvements with the community of users.

The equipment used will be Raspberry Pi based, and each camera unit will contain a camera + a Raspberry Pi that will act as the local brain for the computer vision. My plan is for all of these sensors to talk to a central statistics server that will act as the hub for collection and reporting of stats to the librarians. You will be able to have many sensors for every server (how many? still working on that…). The server will also likely be Raspberry Pi based, and my preference is for the sensors and the server to talk to each other through a private network, either via Xbee or standard wifi, but (this is important) not connected to the existing network of the library in question. It doesn’t need it, getting networking permission in some libraries is crazy difficult, and it simplifies the design and building. It’s possible that, during testing, we will discover that there is a better platform for this than the Raspberry Pi…the Intel Galileo, for example. Only testing will tell.

My hope is that this entire system, sensor clients and statistics server, will communicate between each other directly, and not over the wider Internet. There are advantages and disadvantages to this plan, but I feel like the benefits of not having to fight with IT issues regarding implementation of the system will make it worth it. This is the most tenuous portion of the overal plan, as there are a myriad of ways that it could be architected and choices made about wireless protocols. But this is the initial road down which I am heading vis a vis development, and if it is necessary at some point to curve with the road, I will let the community know.

Conclusion

We take patron privacy very seriously at Measure the Future. And we reject the notion that gathering useful statistics necessarily puts patron privacy in jeopardy. It is possible to both observe and collect information about patron browsing habits and activity inside the library without doing so in a way that is identifiable to an individual patron.

So that’s exactly what we plan to do.

Keep watching. And let me know what’s wrong with my plan.