spam – blog.flo.cx

how to catch 336 twitter bots in 12 hours…

yesterday our API (the API of qr.cx) returned rubbish for about 12 hours. i apologize for that, this will not happen again. we are working on a reimplementation which should be far more reliable.

however the thing had an upside. we were able to expose twitter bots who published this rubbish without checking. in total we found 336 twitter bots who did so. they included

1	<br /><b>Notice</b>: Undefined variable: [...] in <b>/[...]/qr.cx/htdocs/api/index.php</b>[...]"

in their tweets. a human being would not do that. firstly the API is made for automated use, so why would one use that on a regular basis; secondly the error is apparent to a human user. one would not publish a tweet with the full nonsense. the bots did.

so now we can search twitter for this perfidious string and see which account is a bot. this is good, this could help twitter™ to identify malicious users/bots and protect their normal human users.

but it also helps us, the urlshortener, to safeguard the system. we can identify spam links. we can search the twitter bot’s stream for links it has shortened before. those links are most likely links to spam or fraudulent pages. disabling those would be no harm.

i’m looking forward to implementing these security features. it will definitely require a little more thinking to setup a nice safe system.

loops and stretching with urlshorteners…

there has been some discussion (via @mstrohm) on the security of urlshorteners and i have been thinking about this the past days.

putting the problem of the bottle neck aside it leaves us with the possibility of spamming and/or loops and missing transparency when looking at urlshortening services. let’s say the advantage of shortening urls has to compensate one of the disadvantages; let’s take the bottle neck. it’s clear that one cannot shorten a url and expect the link to be independent or maybe distributed like DNS at the same time.

still there are 3 problems which have to be solved:

spamming: there are concepts which we know from mail services that can narrow this issue down. is.gd uses the surbl blacklisting service to check for spams. with a little fine tuning this is manageable.

loops: similar to the spamming problem, there must be a blocklist of sites that are not accepted for shortening. qr.cx already implements a list of about 200 services that are blocked from shortening. is.gd is saying so too, although they accepted qr.cx links and others at the time of writing. this is really easy to implement and should be done by every shortening service.

transparency: the problem here is that users cannot see where they are going when clicking a shortlink. the solution is again very easy to implement. tinyurl implements it by putting ‘preview’ as subdomain http://preview.tinyurl.com/m5l96j and qr.cx by putting ‘/get’ behind the shortlink: http://qr.cx/1r8/get.

curious as i am i decided

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31