Just posting a few basic bits of info about how the TwitterBot Service is currently running.
So the Twitter bot currently requests the last 20 tweets from the public timeline every minute, via a cron job that runs every minute. At the moment those public timeline requests come from the main server IP, and at 60 requests per hour they sit comfortably below the 100/hr limit. However, I am currently looking into ideas on how to increase the rate to catch more tweets! (Ideas welcome, please leave a comment or email me.)
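As a rough check on that budget, here is a small sketch that works out how many requests per hour a polling interval costs and how short the interval could get under a 100 requests/hour limit. The 100/hr figure comes from the post; the function names are hypothetical, and the sketch assumes no other API calls share the same limit.

```python
import math

RATE_LIMIT_PER_HOUR = 100  # limit quoted in the post


def requests_per_hour(interval_seconds: float) -> float:
    """Requests per hour made by a poller that fires once every interval_seconds."""
    return 3600 / interval_seconds


def min_interval_seconds(limit_per_hour: int = RATE_LIMIT_PER_HOUR) -> int:
    """Shortest whole-second polling interval that stays within the hourly limit."""
    return math.ceil(3600 / limit_per_hour)


current = requests_per_hour(60)  # the every-minute cron job
print(f"every-minute cron: {current:.0f} requests/hr of {RATE_LIMIT_PER_HOUR}")
print(f"fastest safe interval: {min_interval_seconds()} s")
```

A standard cron schedule cannot fire more often than once a minute, so getting below 60 seconds would mean looping and sleeping inside the script (or running it as a daemon) rather than relying on cron alone.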
The tweets caught by the tweet grabber bot are saved into a database to be processed. A second bot, “ProcessingBot” (I am imaginative with my script names, I know), runs every 5 minutes and loops through each unprocessed tweet and each bot in the system, checking for matches against that bot's 3 phrases stored in the DB (I will be changing this part to make it more flexible ASAP). If a tweet matches, it gets retweeted from the bot account in the format (@SCREEN_NAME) TWEET. You can see this on @JacksonBot, which is currently retweeting tweets containing the key phrases “Jackson”, “MJ” and “Michael Jackson” and seems to be working very well: 500+ tweets in 12 hours and 32 followers. (Yes, I am hoping this doesn't hit the post limit, and I have applied for whitelisting.)
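The matching pass can be sketched roughly as follows. This is not the actual ProcessingBot code: the Tweet/Bot classes, the case-insensitive substring match and the post callback are assumptions for illustration, and in-memory lists stand in for the database.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tweet:
    """A stored tweet awaiting processing (hypothetical schema)."""
    screen_name: str
    text: str
    processed: bool = False


@dataclass
class Bot:
    """A bot account and its key phrases (hypothetical schema)."""
    account: str
    phrases: List[str]


def format_retweet(tweet: Tweet) -> str:
    """Build the (@SCREEN_NAME) TWEET format the bot accounts post."""
    return f"(@{tweet.screen_name}) {tweet.text}"


def process_tweets(tweets: List[Tweet], bots: List[Bot],
                   post: Callable[[str, str], None]) -> None:
    """Check each unprocessed tweet against every bot's phrases."""
    for tweet in tweets:
        if tweet.processed:
            continue
        lowered = tweet.text.lower()
        for bot in bots:
            # Assumption: a case-insensitive substring match counts as a hit.
            if any(phrase.lower() in lowered for phrase in bot.phrases):
                post(bot.account, format_retweet(tweet))
        # Mark the tweet so the next 5-minute run skips it.
        tweet.processed = True


# Usage with in-memory data standing in for the database.
posted = []
bots = [Bot("JacksonBot", ["Jackson", "MJ", "Michael Jackson"])]
tweets = [Tweet("alice", "Listening to Michael Jackson"),
          Tweet("bob", "Lunch time")]
process_tweets(tweets, bots, lambda account, text: posted.append((account, text)))
print(posted)  # [('JacksonBot', '(@alice) Listening to Michael Jackson')]
```

A plain substring test will also fire on “MJ” inside longer words; a word-boundary regex such as re.search(r"\bMJ\b", text, re.IGNORECASE) is one way to tighten it.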
For anyone who is interested 😉
Dan
Rather than catching (or attempting to catch) every tweet via the public timeline, you could use the search API. That way you only pull out the tweets that matter to the bot.
For example, say only one in every 10,000 tweets you grab from the public timeline contains the phrase you are looking for; then only 0.01% of the data stored in your DB is useful to the bot.
If you use the search API to grab only the tweets of interest, that rises to 100% 🙂
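To make the arithmetic concrete, the useful share is simply matching tweets divided by stored tweets. The 1-in-10,000 hit rate is the commenter's illustrative figure, not a measurement, and the helper name is hypothetical.

```python
def useful_percentage(matching: int, stored: int) -> float:
    """Percentage of stored tweets that actually match a bot's phrases."""
    return 100 * matching / stored


print(useful_percentage(1, 10_000))  # public timeline: 0.01
print(useful_percentage(100, 100))   # search API, only matches stored: 100.0
```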
Ah, but the issue with that is I would then have to run a search API query for every phrase of every bot.
As this system is already running 3 bots with (currently) 3 phrases each, that's 9 search requests per check versus 1 public timeline request.
And I am hoping to build on this to eventually run tens (maybe even hundreds) of bots, which, as far as I can tell from the rate limits, would be totally undoable 🙁
Also, the more bots/phrases there are, the higher the percentage of useful tweets ^^
Happy to be corrected if I am wrong though 🙂
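The scaling concern can be put in numbers. This sketch assumes one search query per phrase per check, one public timeline request per check, checks every minute, and the 100 requests/hour limit from the post; the 50-bot figure is purely illustrative.

```python
RATE_LIMIT_PER_HOUR = 100


def hourly_requests(bots: int, phrases_per_bot: int, per_phrase_search: bool,
                    polls_per_hour: int = 60) -> int:
    """Requests/hour for per-phrase searching versus a single timeline poll."""
    per_poll = bots * phrases_per_bot if per_phrase_search else 1
    return per_poll * polls_per_hour


for bots in (3, 50):
    search = hourly_requests(bots, 3, per_phrase_search=True)
    timeline = hourly_requests(bots, 3, per_phrase_search=False)
    print(f"{bots} bots: search {search}/hr, public timeline {timeline}/hr "
          f"(limit {RATE_LIMIT_PER_HOUR}/hr)")
```

With checks every minute, even the current 3 bots would need 540 search requests an hour, while the public timeline approach stays at 60 regardless of bot count.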
You can merge multiple phrases into a single request using OR operators. I wouldn't suggest doing it for hundreds or thousands of phrases, but it would definitely work for 3 phrases, so that's only 3 requests rather than 9. Also, as you do not need to grab every tweet, you could reduce the frequency of your requests. So if you were grabbing the public timeline once per minute, but your key phrases are only uttered once an hour, you could reduce your requests to once every 4 days (search brings back a maximum of 100 results, so at 1 match per hour those 100 results would cover 100 hours, about 4.17 days. An extreme example, but you get what I mean.)
But you are correct that it would not work with hundreds of bots; none of the rate-limited services would. Once you reach those kinds of traffic levels, though, maybe you could make a good pitch for access to the ‘Firehose’: http://blog.twitter.com/2008/07/twitter-and-xmpp-drinking-from-fire.html
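Both suggestions can be sketched in a few lines: joining one bot's phrases into a single OR query, and working out how long a 100-result page lasts at a given match rate. The quoting of multi-word phrases and the build_or_query/max_poll_interval_hours helpers are assumptions for illustration; the exact operator syntax and result cap should be checked against the search API documentation.

```python
def build_or_query(phrases: list[str]) -> str:
    """Join phrases with OR, quoting any multi-word phrase so it matches as a unit."""
    parts = [f'"{p}"' if " " in p else p for p in phrases]
    return " OR ".join(parts)


def max_poll_interval_hours(max_results: int, matches_per_hour: float) -> float:
    """How long one page of results lasts before older matches start being missed."""
    return max_results / matches_per_hour


print(build_or_query(["Jackson", "MJ", "Michael Jackson"]))
# Jackson OR MJ OR "Michael Jackson"
hours = max_poll_interval_hours(100, 1)
print(f"{hours:.0f} hours = {hours / 24:.2f} days")
# 100 hours = 4.17 days
```

In practice you would poll well inside that window, since the match rate is an estimate and a burst of matching tweets would push older ones off the single page.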