Data mining is about how our data is being used. It gives businesses and companies (and the government) an avenue to predict our future interests based on past trends and patterns, through both description and prediction. It is a little scary to me that computers, not humans, are the ones making this information understandable for humans. I have to admit, I like what “Association Learning” does. I’ve actually had a professor suggest we look for books via Amazon’s recommendation tool. Netflix always knows. There’s no escaping it.
When it comes to universities, I think I feel a little differently about data mining, particularly in some of the specific cases the New York Times article discusses. Sure, there are benefits to looking at student data and interests, but what about the assumptions that are being made? The article says, “Mr. Lange built a system, rolled out in 2009, that sent professors frequently updated alerts about how well each student was predicted to do, based on their course performance and online behavior.” Wow. There are so many implications in doing this, not to mention the ethics involved. It’s as if students’ futures are completely pre-determined by technology, not allowing for exploration of individual interests if they don’t fit neatly into the path of a particular major. Also, nice Betta fish analogy, Mr. Lange. Not reductionist at all. There are immense consequences to attempting to determine how a student will do in a particular course. The attempt privileges a notion of objective truth and makes many assumptions, not to mention the potential for technology glitches, skewed calculations, and little real context about students. From this, connections between professors and students also become calculated. In my own college experience, I learned many things about myself by getting my butt kicked in a pre-med track, and by suffering my way through a few intro to philosophy courses. Exploration is an essential part of the university experience. Sure, it’s important to know background information about your students, but the predictions are what strike me as problematic in this case. Is it appropriate for computers to be giving humans these kinds of predictions?
In the Forbes essay “How a Deviant Philosopher Built Palantir, a CIA-Funded Data-Mining Juggernaut,” I found it interesting how Karp values his own sense of privacy. I did have a question about this article: What, exactly, is a “need-to-know” system when it comes to access to data? According to the article, those gathering data always leave a trail, but that doesn’t stop the privacy invasion from happening in the first place. What does that mean, in concrete terms, for images being captured by license plate cameras?
As David Goldberg showed us a few weeks ago, we can actually (visually) see the different parties who are following us from site to site, tracking us, our patterns, and our data. Who are the different groups of people who gain access to our data, and in what ways will they use it? Many of these things we won’t know. Determining “degree of membership” within Rauhauser’s “Organization, Relationship, & Contact Analyzer” was an interesting concept, and I wonder how much of this analysis is based on data and how much is assumption. I guess that is where I see one of the main tensions within data mining. What do we win and what do we lose by having our data accessed? Is this technology going to be focused mostly on gangs and terrorists, or will it be used to investigate white-collar criminals too?
Here are a few of my working definitions:
Database= a collection of data that is organized in a particular way
Relational database= data organized into a collection of tables; the tables can be linked so that pieces of data are grouped in relation to one another (sorry, I am trying to think of different words to use here but I am struggling)
SQL= Structured Query Language, a programming language for managing the data in relational databases; it is subdivided into different language elements
NoSQL= a kind of database that allows for data storage and retrieval without relying on the tables of a relational database. The design is simpler, and SQL-like queries can sometimes be used here too. Helpful for big data because things become simplified.
The Cloud= where our data floats around in a real-time network. Good or bad? Both. Still working on this one. We want to be in it so we can access things all the time and so that other people can have access to our things.
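
To make the database, relational database, and SQL definitions a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. Everything in it is made up for illustration: the table names (students, enrollments), the sample rows, and the helper functions build_demo_db and courses_by_student are my own assumptions, not anything from the readings. It only tries to show two tables that relate to one another and one SQL query that joins them.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory relational database (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    # Two tables: enrollments points back to students through student_id.
    conn.executescript(
        """
        CREATE TABLE students (
            id   INTEGER PRIMARY KEY,
            name TEXT
        );
        CREATE TABLE enrollments (
            student_id INTEGER REFERENCES students(id),
            course     TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO students (id, name) VALUES (?, ?)",
        [(1, "Ana"), (2, "Ben")],
    )
    conn.executemany(
        "INSERT INTO enrollments (student_id, course) VALUES (?, ?)",
        [
            (1, "Intro to Philosophy"),
            (1, "Organic Chemistry"),
            (2, "Intro to Philosophy"),
        ],
    )
    conn.commit()
    return conn


def courses_by_student(conn):
    """Use SQL to relate the two tables: which student takes which course."""
    query = """
        SELECT s.name, e.course
        FROM students AS s
        JOIN enrollments AS e ON e.student_id = s.id
        ORDER BY s.name, e.course
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for name, course in courses_by_student(conn):
        print(f"{name}: {course}")
    conn.close()
```

The JOIN is the "in relation to one another" part: neither table alone says which named student is in which course, but the shared id lets the query put them together.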