#52WeeksOfCode Week 3 – Twitter API

Week: 3

Language: Twitter API

IDE(s): vi

Background:

Once again, it seems that the story is not really about the code. The good news is that it’s more interesting than that.

HTTP is one of the underappreciated aspects of the Web. For those who were not aware, HTTP (Hypertext Transfer Protocol) was originally designed as a simple, standard way for Web servers to talk to Web clients. But it grew to be much more than that. It serves as a base for other networking protocols like WebDAV (Web Distributed Authoring and Versioning), and it has also been put into service as a ‘wrapper’ for other kinds of network traffic through the addition of SOAP (Simple Object Access Protocol). Thanks to the simplicity and flexibility of HTTP, web services have exploded.

One such service is Twitter. Originally designed as a simple way to post a blog entry (seriously), Twitter now hosts running discussions, including ‘live-tweeting’ (discussing an event as it happens). Its users exchange photos and videos and even raise money and promote their favorite causes. Sure, there’s a lot of crap there as well (see Sturgeon’s Law) but that’s people for you.

Anyway, now robots are getting in on the action. Well, technically, I’m talking about software. Twitter allows its users to let software programs use their accounts to monitor message streams and even send out tweets and direct messages. Of course, there’s a definite risk of abuse with this, which is why Twitter has to make sure that the software application is actually authorized to use your account.

That’s where my story starts.

Setup:

For this week I wanted to keep things simple. As the World’s Slowest Coder (™), I really didn’t have time for anything elaborate so I decided to build a script that would do a quick ‘hello world’. I have two Twitter accounts (one for business and one personal) so I wanted a script that would post a DM (Direct Message) from one of my accounts to the other. Fortunately the Twitter API (Application Programming Interface) is pretty open (for certain definitions of open) so that you can use almost any programming language to communicate with Twitter. I chose Python, since I felt that a language designed for simplicity should be used with a platform also designed for simplicity. My development platform of choice this time was Linux (specifically Debian Linux and I may discuss my choice at some other time) because Linux is extremely easy to set up for development.

So I’ve got Linux installed (easy-peasy) and I’ve loaded up all of my Python Twitter libraries and I’m psyched to get going. A bit of research gives me a quick guide to get started.

In order to get Twitter to allow my program to use my account, I have to give it a way to authenticate itself as my designated robot representative. Twitter uses a protocol known as OAuth. This involves registering my application with Twitter and getting a set of keys that my software will use to identify itself to the Twitter service.
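To make the idea of "keys that identify my software" a bit more concrete, here is a minimal sketch of how a request signature can be computed, assuming the OAuth 1.0a flavour with HMAC-SHA1 signatures. This is not the script I was working on, and it does not talk to Twitter at all. The function names, the URL, the keys and the parameter values are all made-up placeholders for illustration; a real client would normally leave this to an OAuth library.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(value):
    """Percent-encode a value; only letters, digits, '-', '.', '_' and '~' are left alone."""
    return quote(str(value), safe="-._~")


def sign_request(method, url, params, consumer_secret, token_secret=""):
    """Return a base64 HMAC-SHA1 signature for one request (OAuth 1.0a style)."""
    # 1. Encode every parameter, sort them, and join them as key=value pairs.
    encoded = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in encoded)

    # 2. The signature base string is METHOD & encoded URL & encoded parameters.
    base_string = "&".join(
        [method.upper(), percent_encode(url), percent_encode(param_string)]
    )

    # 3. The signing key combines the application's secret and the user's token secret.
    signing_key = f"{percent_encode(consumer_secret)}&{percent_encode(token_secret)}"

    digest = hmac.new(
        signing_key.encode("ascii"), base_string.encode("ascii"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical values only: none of these are real keys or a real endpoint.
example_params = {
    "oauth_consumer_key": "my-app-key",
    "oauth_token": "my-user-token",
    "oauth_nonce": "abc123",
    "oauth_timestamp": "1400000000",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_version": "1.0",
    "text": "hello world",
}
signature = sign_request(
    "POST",
    "https://api.example.com/1/messages",
    example_params,
    consumer_secret="my-app-secret",
    token_secret="my-user-token-secret",
)
print(signature)
```

The point of the exercise is that the secrets never travel over the wire. Only the signature does, and the service can recompute it on its side to check that the request really came from the registered application.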

Now as part of the registration process, Twitter wants you to prove that you’re a real human and not some kind of spam-bot out to do evil. The standard way to do this is with a CAPTCHA (Completely Automated Public Turing Test to tell Computers and Humans Apart, possibly one of my favorite acronyms ever). You’ve seen these all over, even if you didn’t know what they were called. An image is displayed on the Web page and you’re asked to type out the letters and numbers pictured. (The idea is that a computer program will just see 1’s and 0’s here.)

So far, so obvious. I filled out the online form describing my application, agreed to the terms of service and down at the bottom of the page was a section labelled CAPTCHA.

It was empty.

I’d never seen this before. I pressed the Submit button just in case (I’m an optimist) but it didn’t work. I reloaded the page and still no image. I signed out and signed back in: no image. I did some research and found out that I wasn’t the only person with this problem.

It seems that Twitter (and a lot of other web sites) uses Google’s reCAPTCHA service. Ironically enough, I stumbled onto a Twitter feed with the hashtag #reCAPTCHA, which turned out to be where people report on how the service is working. I read through the comments and reCAPTCHA wasn’t working for ANYBODY.

I dug further and found out that the service was actually under attack by a site that was trying to develop software that would automate solving CAPTCHAs. So their software was slamming Google so hard that it was swamping reCAPTCHA and nobody else could use it.

I want to explain at this point that CAPTCHAs have been under attack by spammers and various other weasel-people since they were first introduced. You may also be interested to know that it doesn’t exactly take Dr. Evil-level genius to beat them. Here’s the simple way to do it:

  • Set up a pornography site

  • Get people to visit your porn site. (Make it free!)

  • When your bot software comes across a CAPTCHA on another site, it pops up a window on the porn user’s browser.

  • The user is asked to solve the CAPTCHA in order to continue viewing pornography.

  • The answer is sent back to the bot and it continues registering at the other site.

Combine this with the fact that about 60 percent of Internet traffic is driven by bots and you can see that there’s a lot more stuff going on out there while you’re busy downloading cat videos.

But this isn’t even the most alarming thing for me.

The thing that alarms me most is that the loss of a single service broke millions of web sites worldwide. We like to think of the Internet as a resource that, in John Gilmore’s famous phrase, interprets censorship as damage and routes around it. However, it really is what we nerds refer to as a scale-free network. If you’ve heard of the Pareto principle (the 80/20 rule), then you already have an understanding of scale-free networks. Simply put, 20 percent of the nodes (Internet sites) have 80 percent of the connections to other nodes. So there are a relatively small number of choke points where Internet traffic can be blocked or censored, and no amount of re-routing will make that less true.
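If you want to see how that kind of lopsidedness can arise, here is a small toy model, assuming a "rich get richer" rule where each new site links to an existing site with a probability proportional to how many links that site already has. This is a sketch of one common way to grow a scale-free network, not a measurement of the real Internet, and the function names are my own. The exact share it reports depends on the model and will not necessarily land on 80 percent.

```python
import random


def grow_network(node_count, seed=0):
    """Grow a network by preferential attachment; return each node's link count."""
    rng = random.Random(seed)
    degrees = [1, 1]    # start with two nodes joined by one link
    endpoints = [0, 1]  # one entry per link end, so busy nodes appear more often
    for new_node in range(2, node_count):
        # Picking a random link end makes well-connected nodes more likely targets.
        target = rng.choice(endpoints)
        degrees[target] += 1
        degrees.append(1)
        endpoints.extend([target, new_node])
    return degrees


def top_share(degrees, fraction=0.2):
    """Return the share of all link ends held by the best-connected fraction of nodes."""
    ranked = sorted(degrees, reverse=True)
    top_count = max(1, int(len(ranked) * fraction))
    return sum(ranked[:top_count]) / sum(ranked)


degrees = grow_network(10_000)
print(f"Top 20% of nodes hold {top_share(degrees):.0%} of all link ends")
print(f"Busiest node has {max(degrees)} links")
```

Even in this tiny simulation a handful of early, popular nodes end up far better connected than everyone else, which is exactly why knocking out one of them hurts so much.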

So do I have a solution to this? Not really. I like to think that the Internet has engaged so many hands and brains that it will continue to evolve and change and I’m interested in seeing where that goes.

Epilogue:

The reCAPTCHA service eventually came back up and I was able to register my Twitter app. However, I’m having additional issues with the authentication process that I won’t be able to resolve in time for this week’s post. I’ll be coming back to this in the future and will hopefully reach some closure.
