
Data vs Web

Wednesday, January 17, 2018

In 1998, when I graduated from university, the internet was in its infancy and the dot-com bubble was just ramping up. At the time I thought the internet would change the world by making information easily accessible, and I was excited to be a part of something revolutionary.

That started to change a few years ago. The focus of internet companies had shifted from providing useful services to collecting as much data as possible about their users in order to better target advertisements. Rather than providing useful and informative content and services, the emphasis was on keeping users online for as long as possible. While the negative effects of this business model on users and society are becoming more and more noticeable, notably in the recent US election, the tech companies continue to ignore them. This is no longer the industry that I signed up to work for, and I no longer want to be a part of it.

Anyone who is familiar with the work of Kahneman and Tversky knows that the human brain is very poor at processing and analyzing data. Most of our decisions are made using heuristics, or rules of thumb, that allow us to make quick and easy judgements. These result in cognitive biases, which are ways in which our brains distort reality for the purpose of making decisions. One of the most famous cognitive biases is "confirmation bias" - the tendency to interpret new information in a way that supports our existing beliefs. Kahneman and Tversky conducted experiments on people ranging from college undergraduates to statistics professors, and everyone was subject to these biases - even PhD-level statisticians who should know better. This is why data science is so important.

Our brains are not designed to gather and analyze large amounts of data and we are incredibly bad at doing so. We tend to draw conclusions from small, isolated, but memorable bits of information rather than looking at the overall big picture. One example is how Americans are all very worried about terrorism even though on average only six Americans die per year from foreign terrorism. The media likes to report these stories because they are sensational and memorable, but doing so greatly exaggerates the real risks. There are also numerous medications which are commonly prescribed despite having minimal positive effects, or having no benefits at all.

Data science is a way to draw knowledge from actual observation of the world, rather than from whatever thoughts happen to be strung together in our heads, or whatever sound bites on a given subject most easily come to mind. I can come up with whatever theories and ideas I want, but unless they actually reflect the real world they are meaningless. This is the basis of scientific inquiry, and this is why I am getting out of web development and into data science.

Labels: personal, coding, data_science

Discogs.com is one of my favorite websites. I discovered it 15 years ago and have used it ever since to research records and music. I recently discovered that they have an API, so I used it to search for each record in my collection. When the search found results, I saved a link to the Discogs page and a link to the thumbnail image in my database.

The process was rather convoluted. To start, I just searched using the data as it was in my database - artist, title, label and catalog number. But Discogs often has multiple entries for each record - maybe it was released in different countries, or re-released, or has separate entries for promos, test pressings and white labels. So my starting algorithm was as follows:

  • Match the full data in my database to a search request to Discogs API.
  • If one and only one result is returned, take that one and link it.
  • If more than one result is returned, filter out the ones for promos, mispresses, white labels, etc.
  • If there is still just one result, use that one. Otherwise mark the number of results returned in the database and move on to the next record.
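The steps above can be sketched roughly as follows. This is a minimal illustration, not the code actually used: `search` stands in for whatever function wraps the Discogs API search call, the `format` field and the `EXCLUDED_FORMATS` set are assumed names, and storing the count of candidates left after filtering is one reading of "mark the number of results".

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Release = Dict[str, Any]

# Assumed markers for pressings to skip. The post names promos, mispresses
# and white labels ("etc."); test pressings are mentioned just before the list.
EXCLUDED_FORMATS = {"promo", "mispress", "white label", "test pressing"}


def is_regular_release(release: Release) -> bool:
    """True when none of the release's format tags is an excluded pressing."""
    formats = {str(tag).lower() for tag in release.get("format", [])}
    return not (formats & EXCLUDED_FORMATS)


def match_record(
    record: Dict[str, str],
    search: Callable[..., List[Release]],
) -> Tuple[Optional[Release], int]:
    """Return (match, candidate_count) for one record in the collection.

    `search` is a hypothetical wrapper around the API search request that
    accepts keyword arguments and returns a list of result dicts.
    """
    results = search(
        artist=record["artist"],
        title=record["title"],
        label=record["label"],
        catno=record["catno"],
    )
    # Exactly one result: take it and link it.
    if len(results) == 1:
        return results[0], 1
    # Several results: drop promos, mispresses, white labels and the like.
    regular = [r for r in results if is_regular_release(r)]
    if len(regular) == 1:
        return regular[0], 1
    # Still ambiguous (or nothing found): no match, just record the count.
    return None, len(regular)
```

Passing the search function in as an argument keeps the matching logic separate from the network call, so it can be tried out against canned results before touching the real API.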

This matched a couple hundred out of the couple thousand records in my database. Most of my records got 0 matches on Discogs, and some still had multiple matches - anywhere from 2 to 35. So I started reviewing the ones with multiple matches by hand, and I realized that for most of the records with fewer than 5 matches the matches were pretty much equivalent. For those I just took the first match and assigned it. This matched another couple hundred records.

Next I put aside the few remaining records with 5 to 35 potential matches and focused on the thousand or so that had no matches. Reviewing some of them manually, I found that many of the misses were due to typos in my database. So my next step was to omit the artist field and just check the title, label and catalog number. I got another couple hundred matches using this method. Then I went on and searched using only the catalog number. This method matched about half of the remaining unmatched records - but I had to manually verify each match because some catalog numbers are not unique.
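The fallback strategy described here, searching with progressively fewer fields, might look something like the sketch below. Again this is only an illustration under assumptions: `search` is a hypothetical wrapper around the API request, the field names are made up for the example, and skipping a pass when one of its fields is empty is my own simplification.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Release = Dict[str, Any]

# Search passes from strictest to loosest, in the order described in the post.
SEARCH_PASSES: List[Tuple[str, ...]] = [
    ("artist", "title", "label", "catno"),
    ("title", "label", "catno"),
    ("catno",),
]


def cascade_search(
    record: Dict[str, str],
    search: Callable[..., List[Release]],
) -> Tuple[List[Release], Optional[Tuple[str, ...]], bool]:
    """Try each pass in turn and stop at the first one that returns results.

    Returns (results, fields_used, needs_review). needs_review is True for
    catalog-number-only hits, since catalog numbers are not always unique.
    """
    for fields in SEARCH_PASSES:
        # Skip a pass if the record lacks any field it needs,
        # for example a record with no catalog number.
        if not all(record.get(name) for name in fields):
            continue
        results = search(**{name: record[name] for name in fields})
        if results:
            needs_review = fields == ("catno",)
            return results, fields, needs_review
    return [], None, False
```

Returning the fields that produced the hit makes it easy to see later which matches came from the looser passes and deserve a second look.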

Unfortunately I do not have a catalog number for every record in my collection, and as of now about a third of the records in my database are still unmatched. For those that are matched, the record information page now shows a link to the discogs.com page for that record, as well as a thumbnail pulled from discogs.com if available. For anyone interested in collecting records I highly recommend discogs.com, as it is by far the most comprehensive database of music releases I know of.

Labels: personal, music

Democracy for Realists

Wednesday, November 2, 2016

This election in the US has got me thinking a lot about democracy and how it works, or in this case, doesn't seem to work too well. I get the impression that people don't choose their candidates based on the candidate's policy positions matching their own, but the opposite - they choose their policies based on which candidate or political party they support. Well I just read this book, Democracy for Realists, by C. Achen and L. Bartels, which confirms my fears and goes far beyond that to totally demolish what they call the "folk theory of democracy" using statistics and facts.

What they refer to as the "folk theory of democracy" is basically what you are taught in school - that democracies are responsive to the will of the people and allow people to shape the policies and laws of the government; that the people decide what the government will do. The authors take a number of theories about how democracies allow the people to express their will, test them against election results and other statistics, and find them all woefully lacking. It turns out that only one theory holds up: voters reward or punish their representatives based on the voters' own economic prosperity. But the voters are extremely myopic, only taking into account the few months prior to an election when casting their votes and disregarding the rest of the preceding couple of years.

The Founding Fathers of the US set up a representative democracy because they understood that ordinary people wouldn't know enough about politics or policy to make well-informed decisions. So instead of the people voting on the laws, the people would elect representatives they trust to vote on the laws. The representatives would devote their time to studying and debating the issues and would make well-informed decisions. However, the Founding Fathers never anticipated the rise of political parties, which today are so firmly entrenched that most people don't even realize they were never part of the plan.

The folk theory says that people will choose their party based on their political ideology or policy preferences, but in reality it is just as often the other way around - people develop their policy preferences based on their partisan identity. The authors go beyond this to say that party affiliation is mostly based on a person's "social identity" and has little to nothing to do with their political ideology. The way the book describes it, people choose their party affiliation based on the kind of person they consider themselves to be and the kind of people they think belong to the political party. As far as I can tell this is basically a fancy way of saying "peer pressure" - if your family is Republican and your friends are Republican, you are likely to be a Republican even if you disagree with Republican policies. In fact, people will often either change their ideology to match their party's, or convince themselves that their party's ideology is closer to their own than it actually is.

Politics today has become so complex that it is nearly impossible for any normal working person to really understand or make well-informed decisions about all of the policies. In order to be able to handle issues this complex we need to simplify them greatly into mental models which unfortunately omit most of the detail and nuance. Instead of having to consider the myriad sides of an issue and the numerous approaches, we take the talking points that the political parties and the mass media give us and just accept and repeat them. It's a lot easier than having to gather massive amounts of information, sort through it, analyze it and come up with our own opinions. One theory is that political parties provide us with easy cues to figure out what our opinions would be if we had enough time and information for us to come up with them on our own, but this theory is also analyzed and largely debunked.

So if the results of elections have little to do with the policy positions of the candidates and the policy preferences of the voters, then what does drive the elections? Well it turns out it's largely random. Voters will reliably vote out the party in power if the economic wellbeing of the voter has decreased in the months before the election, and vote to keep the party in power if their economic wellbeing has increased just before the election. Voters will also vote out the party in power as a result of things beyond the power of any human to control like floods, droughts, and even shark attacks. But the policy preferences of voters really have little to no effect on elections, other than the fact that many people only develop their policy preferences based on adopting those of the party or candidate they support.

This isn't to say that democracy doesn't work at all. It just doesn't work the way it is supposed to work, or the way I was taught in school that it works. Because politicians have to be re-elected, they must avoid the appearance of impropriety and appear to have the best interests of the people in mind. This at least prevents the gross abuses that are typical in dictatorships. But as to whether the people really have much say in determining government policy, it would seem that the answer is no.

Personally I think that the party system in the US is a major factor in this. With only two parties dominating the government, they get their voters worked up about silly issues that aren't really all that important and then once they are in power they are largely indistinguishable, except that they keep their members constantly angry with the other party over these wedge issues which will never be addressed. The only people who really have a say in the government are the wealthy donors and corporations who fund the elections and pay the lobbyists. But that is a different book.

I'm sure this book will upset a lot of people because it challenges some basic assumptions people have about America and about democracy in America. People tend to accept facts that confirm the opinions they already have, and get upset when facts contradict their existing opinions. This book really makes you think about democracy, how it works and how it doesn't. I think this is a book that everyone needs to read.

Labels: personal, politics