Random Thoughts by Fabien Penso

How to filter spam on Twitter efficiently: TwitteRBL?

I’m working on a service that uses Twitter’s streaming API, a great feature because it gives you instant access to tweets. But you quickly get overloaded, and because I’m looking for tweets about money, I get a lot of noise.

After looking at TwitBlock, I filtered out a lot of it with the following rules:

  1. Ignore tweets from recent users (account created less than 24 hours ago)
  2. Ignore tweets from users with default profile image
  3. Ignore if fewer than 10 followers
  4. Ignore if user description and name are blank
  5. Ignore if the user has fewer than 100 followers and friends count is > (2*followers_count)
  6. Ignore if the user has over 100 followers and friends count is > (5*followers_count)
  7. Ignore if the user has sent more than 20 tweets per day on average since the account was created
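
To make the rules concrete, here is a minimal sketch of them as a single Python function. It assumes each user arrives as a dict whose keys mirror the usual Twitter user fields (followers_count, friends_count, statuses_count, default_profile_image, description, name) and that created_at has already been parsed into a timezone-aware datetime. The function name and the exact dict shape are my own assumptions for illustration, not something TwitBlock or Twitter provides.

```python
from datetime import datetime, timedelta, timezone


def looks_like_spam(user, now=None):
    """Return True if the account trips any of the seven rules above.

    `user` is assumed to be a dict with Twitter-style user fields and a
    timezone-aware `created_at` datetime (an assumption of this sketch).
    """
    now = now or datetime.now(timezone.utc)
    age = now - user["created_at"]
    followers = user["followers_count"]
    friends = user["friends_count"]
    description = (user.get("description") or "").strip()
    name = (user.get("name") or "").strip()

    if age < timedelta(hours=24):                    # rule 1: recent account
        return True
    if user["default_profile_image"]:                # rule 2: default avatar
        return True
    if followers < 10:                               # rule 3: too few followers
        return True
    if not description and not name:                 # rule 4: blank profile
        return True
    if followers < 100 and friends > 2 * followers:  # rule 5
        return True
    # As the rules are written, exactly 100 followers matches neither 5 nor 6.
    if followers > 100 and friends > 5 * followers:  # rule 6
        return True
    # Rule 1 already rejected accounts younger than a day, so age_days >= 1.
    age_days = age.total_seconds() / 86400
    if user["statuses_count"] / age_days > 20:       # rule 7: average rate
        return True
    return False
```

The thresholds are hard-coded to match the list; in a real service you would probably want them configurable, since what counts as noise depends on what you are searching for.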

Some of these rules work for me but might not work for you at all. Still, I think there could be a better approach for mass use. I ran mail servers for years, and a very reliable way to handle spam there is a DNSBL (also called an RBL), a blocklist published over DNS. You could have different RBLs for different uses, and any Twitter client could implement this very easily. Note that this probably would not work for Direct Messages unless Twitter granted the service specific access, which they probably never will.
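
To show how little a client would need, here is a rough sketch of such a lookup in Python. A mail DNSBL is queried by building a hostname under the list's zone and checking whether it resolves: an answer means "listed", no such name means "not listed". The sketch applies the same idea to a numeric Twitter user ID. The zone spam.twitterbl.example and the "<user_id>.<zone>" naming scheme are purely hypothetical; no such service exists as far as this post is concerned.

```python
import socket


def rbl_query_name(user_id, zone="spam.twitterbl.example"):
    """Build the DNS name to look up for a user ID (hypothetical scheme)."""
    return f"{int(user_id)}.{zone}"


def is_listed(user_id, zone="spam.twitterbl.example",
              resolve=socket.gethostbyname):
    """Return True if the hypothetical RBL zone has a record for this user.

    `resolve` is injectable so the logic can be tried without a network.
    """
    try:
        resolve(rbl_query_name(user_id, zone))
    except socket.gaierror:
        # The name did not resolve. Note that this sketch also treats
        # temporary resolver failures as "not listed".
        return False
    return True
```

A client could then call is_listed(tweet_author_id) before displaying a tweet, and check several zones if it subscribes to more than one list.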
