During CASCON this year, I participated in a hackathon organized for IBMers. The hackathon brought in several charitable organizations (Scott Mission, World Vision Canada, aboutkidshealth, and Sick Kids Hospital), who described some social media problems they were having, and we tried to solve some of them.
I worked on a problem Sick Kids was having: they had trouble filtering noise and false positives out of their Twitter search queries (e.g., searching for “sick kids” returned tweets like “My kids are sick at home”). I thought it would be interesting to play with the location data embedded in tweets; for example, we could narrow the results to tweets located around the Toronto area.
While building this we ran into a couple of problems. The first is that most tweets don’t have location data! There seems to be a big push to tag status updates with location data (I’ve seen it on Facebook as well), so I expected more tweets to be tagged with a location. In fact they weren’t, and I had to fall back to checking the user’s profile location (which is unreliable).
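To make the fallback concrete, here is a rough sketch in Python (not the code from the hackathon). The tweet dictionaries, the `coords` and `profile_location` keys, the `geocode_profile` callable, and the 50 km radius are all made up for illustration; the real Twitter payload looks different.

```python
import math

TORONTO = (43.6532, -79.3832)  # approximate latitude/longitude of downtown Toronto


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def tweet_coords(tweet, geocode_profile):
    """Prefer the tweet's own geotag; otherwise geocode the profile location."""
    if tweet.get("coords") is not None:
        return tuple(tweet["coords"]), "geotag"
    place = (tweet.get("profile_location") or "").strip()
    if place:
        coords = geocode_profile(place)  # hypothetical lookup, may return None
        if coords is not None:
            return coords, "profile"
    return None, "unknown"


def near_toronto(tweets, geocode_profile, radius_km=50.0):
    """Keep (text, source) for tweets whose best-known location is near Toronto."""
    kept = []
    for tweet in tweets:
        coords, source = tweet_coords(tweet, geocode_profile)
        if coords is not None and haversine_km(coords, TORONTO) <= radius_km:
            kept.append((tweet["text"], source))
    return kept
```

Returning the source alongside each tweet lets the page flag matches that came from the less reliable profile location.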
Second, we ran into rate limit issues. We wanted to calculate a “reach” for each tweet, which is basically the number of Twitter users a tweet reached (either from the original update or through retweets). Calculating this reach for our search results quickly ran us into the 150 requests per hour limit, and we had to stop working on it for a while. In the end we calculated the reach only at the user’s request, updating the page via Ajax using the Dojo library (which we are supposed to use at IBM).
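A minimal sketch of why reach eats through the limit so quickly, assuming reach is approximated as the author's follower count plus each retweeter's follower count, and that every follower lookup costs one API request. `RequestBudget`, `estimate_reach`, and the `follower_count` callable are hypothetical names for illustration, not Twitter API calls. The sum also counts users who follow several of these accounts more than once, so it is an upper bound rather than an exact figure.

```python
class RequestBudget:
    """Counts requests against an hourly limit (150 in our case)."""

    def __init__(self, limit=150):
        self.limit = limit
        self.used = 0

    def spend(self):
        if self.used >= self.limit:
            raise RuntimeError("request budget exhausted")
        self.used += 1


def estimate_reach(author, retweeters, follower_count, budget):
    """Sum follower counts for the author and every retweeter.

    follower_count is a hypothetical callable that performs one API
    lookup per user, so each call is charged to the budget first.
    """
    total = 0
    for user in [author, *retweeters]:
        budget.spend()
        total += follower_count(user)
    return total
```

Under these assumptions, a results page of 50 tweets with two retweets each needs 50 × 3 = 150 lookups, which is the whole hourly allowance. Calculating reach only for the tweet the user clicks on spends three requests instead.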
I also ran into a rate limit trying to geocode locations (both from the search and from Twitter users’ profiles). The Google Maps API has a rate limit of 2,500 requests per day. This ended up being problematic because I hit the threshold once we had completed the prototype and were ready to demo! There was some mad researching before I found out that the Yahoo PlaceFinder API does the same thing and has a rate limit of 50,000 requests per day. I switched to that and had no further issues.
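In hindsight, caching lookups and falling back to a second provider automatically would have avoided the demo-day scramble. Here is a rough sketch of that idea, not what we actually built: the `Geocoder` class is hypothetical, and each provider is just a callable you supply that takes a place name and returns coordinates (or `None`), along with its daily limit.

```python
class QuotaExceeded(Exception):
    """Raised when every configured provider has used up its daily quota."""


class Geocoder:
    def __init__(self, providers):
        # providers: list of (name, lookup_callable, daily_limit) tuples,
        # tried in order. The callables are stand-ins for real API clients.
        self.providers = providers
        self.used = {name: 0 for name, _, _ in providers}
        self.cache = {}

    def geocode(self, place):
        key = place.strip().lower()
        if key in self.cache:
            # Repeated profile locations cost no requests at all.
            return self.cache[key]
        for name, lookup, daily_limit in self.providers:
            if self.used[name] >= daily_limit:
                continue  # this provider is out of quota, try the next one
            self.used[name] += 1
            result = lookup(place)
            self.cache[key] = result
            return result
        raise QuotaExceeded("all geocoding providers are over their daily limit")
```

With the limits from this post, the provider list would hold one entry capped at 2,500 and a second capped at 50,000. The counters here never reset, so a real version would need to clear them each day.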