Unusually for me, I’ve been heads down in a single project of late. A tool. An actual piece of software that I hope might actually be useful to someone. Not comment, not analysis, but a tool.
I’ve been hanging out in the Quantified Self world for a little while now. I hang out both as a scholar and as part of Intel’s extended effort to understand the “data economy”. In this project, we’ve been asking what kind of openness should exist if data is going to circulate in ways that actually benefit people. “People” as in the breathing kind, not the kind invented by legal fiat. There’s a lot of talk in the QS world about facilitating data openness and sharing, and people have various views on what that could look like. Even just getting data out of a service into a standard format is still a huge challenge. But what I learned after spending time with the community is that there is a more fundamental piece still missing, something that has to happen even before we can imagine what could be useful about data sharing: the stuff has to make sense. If you don’t have data that actually makes sense, you don’t really have anything to share. Something personally meaningful is not going to magically emerge from plonking data into a giant pot and hoping the correlations aren’t spurious. (Something may in fact emerge from the giant pot strategy, but my point is that it takes more than just making a pot.)
When I did my own show-and-tell talk at a local QS meetup, I learned firsthand the invisible labor it takes to really get insight out of data. Behind the scenes, many talks involve some serious hours, coding skills and good statistical knowledge. I’m not entirely new to the old spreadsheet, but I’m neither a math person nor a visualization person. Frankly, I had to ask for help. Help came in the form of the patience-of-a-saint Steven Jonas, the Portland QS organizer, who not only turned my bad Excel charts into something compelling but also suggested new kinds of calculations that I had not thought to do. He suggested I bin my “meal healthiness” scores according to day of week. It looked something like this:
This turned out to be hugely significant for me. I could immediately spot my partner’s teaching schedule, and how that had an effect on my propensity to eat out, and therefore the healthiness of my meals. The amazing thing was, the calculation Steven did was one of the very calculations that we have been building into the tool (“we” really being the computer scientists and designers; I just make sure the ship remains pointed in an ethnographically sound direction). Seeing my data in this way caught me off guard. For literally the past three months I had been banging on about how important it is to be able to pick out temporal cycles in any dataset, and how hard it is to do that for people without data wizarding skills. So it’s not like I wasn’t sensitized to this form of analysis. I knew it existed, and that it was possible. In fact, it was front and center on my radar. I even knew multiple people who might be kind enough to show me how to do it if I asked. But I didn’t even think to ask. I just didn’t think in those terms, because it wasn’t among the tools that I saw as being at my disposal. Without the tools at the ready, it wasn’t possible for me.
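For anyone curious what “binning by day of week” amounts to in practice, here is a minimal sketch in Python. It is not the code from our tool, and it is not what Steven did in Excel; the function name `mean_by_weekday` and the sample scores are invented purely for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical (date, meal healthiness score) entries, made up for this sketch.
meals = [
    (date(2013, 9, 2), 4),   # Monday
    (date(2013, 9, 3), 2),   # Tuesday
    (date(2013, 9, 9), 5),   # Monday
    (date(2013, 9, 10), 1),  # Tuesday
    (date(2013, 9, 14), 3),  # Saturday
]

def mean_by_weekday(entries):
    """Group scores by weekday name and return the mean score of each group."""
    groups = defaultdict(list)
    for day, score in entries:
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        groups[WEEKDAYS[day.weekday()]].append(score)
    return {name: mean(scores) for name, scores in groups.items()}

weekday_means = mean_by_weekday(meals)
```

With real data, a low average on the same weekday every week is the kind of temporal cycle that made my partner’s teaching schedule jump out at me.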
Thankfully there are others who are also helping to make data much easier to work with: Datafist, Fluxstream, Tictrac, Project AddApp, ManyEyes, etc. In fact, we collaborated with Evan Savage in making our own contribution. All of these tools take various approaches, and we have our own. Ours is to try to help people make the most of what they already know about their personal context to make sense of their data. This means providing space for qualitative annotations, offering data processing techniques in ways that map to human experiences, not the underlying mathematical function (“show me some temporal cycles” not “make a histogram”), and making it possible to edit data out. If that holiday you took is artificially skewing things, you should be able to just get rid of it for the purposes of calculating an average. That’s not “cheating,” that’s making sense of a daily routine. You can also just look for patterns in missing data, if there’s some signal in there for you. Frustratingly, the tool does not yet have a name.
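To make the “edit data out” and “missing data” ideas a bit more concrete, here is a rough sketch in Python. Again, this is an illustration under my own assumptions rather than how the tool is built: the function names `mean_excluding` and `missing_days`, the dates, and the scores are all hypothetical.

```python
from datetime import date, timedelta
from statistics import mean

# Hypothetical daily scores; the two high values stand in for a holiday.
entries = [
    (date(2013, 8, 1), 3),
    (date(2013, 8, 2), 4),
    (date(2013, 8, 5), 9),   # holiday
    (date(2013, 8, 6), 10),  # holiday
    (date(2013, 8, 8), 2),
]

def mean_excluding(entries, start, end):
    """Average the scores, ignoring entries dated from start to end inclusive."""
    kept = [score for day, score in entries if not (start <= day <= end)]
    return mean(kept) if kept else None

def missing_days(entries, first, last):
    """List every date from first to last (inclusive) that has no entry."""
    recorded = {day for day, _ in entries}
    span = (last - first).days + 1
    candidates = (first + timedelta(days=i) for i in range(span))
    return [day for day in candidates if day not in recorded]

overall = mean(score for _, score in entries)
routine = mean_excluding(entries, date(2013, 8, 5), date(2013, 8, 6))
gaps = missing_days(entries, date(2013, 8, 1), date(2013, 8, 8))
```

In this made-up example the holiday pulls the overall average well above the everyday one, and the list of gaps could itself be binned by day of week to see whether the days you skip logging follow a pattern.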
We’re not done yet (a beta version is targeted for early 2014), but I can definitely say I learned some things, perhaps the most powerful of which was that knowing things in the abstract (“temporal cycles really matter!”) is very different from putting them into practice at a personal level. Steven’s willingness to share his data skills with me in turn gave me something more meaningful to share. I also re-ignited my commitment to the “participant” side of participant observation. Getting stuck into the building process has been an amazing anthropological adventure, though I’ll save that for another post. I’m even slowly making peace with correlation, that statistical trick used for the last few hundred years as an epistemological trump card to beat up other forms of knowledge making. I’m learning from the computer scientists that there’s more you can do with correlation than just declare victory. Perhaps correlation doesn’t just have to be for the purpose of making scientific generalizations, the importance of which is highly controversial in QS. With more granular ways of playing with data, could correlation be reclaimed as a way to make matters of concern rather than matters of fact? One way or another, we’ll find out.
Speaking of finding things out for yourself, if you want to be one of our beta testers, or just have an opinion on the name, send me a note using the form below and I’ll make sure you are included.