<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>David Simpson &#187; bitlybot</title>
	<atom:link href="http://davidsimpson.me/tag/bitlybot/feed/" rel="self" type="application/rss+xml" />
	<link>http://davidsimpson.me</link>
	<description>Developing the web, one page at a time.</description>
	<lastBuildDate>Thu, 02 Feb 2012 13:02:50 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1</generator>
		<item>
		<title>Bitten a lot by a bitlybot</title>
		<link>http://davidsimpson.me/2010/06/22/bitten-a-lot-by-a-bitlybot/</link>
		<comments>http://davidsimpson.me/2010/06/22/bitten-a-lot-by-a-bitlybot/#comments</comments>
		<pubDate>Mon, 21 Jun 2010 23:29:14 +0000</pubDate>
		<dc:creator>David</dc:creator>
				<category><![CDATA[web]]></category>
		<category><![CDATA[bad coding]]></category>
		<category><![CDATA[bitly]]></category>
		<category><![CDATA[bitlybot]]></category>

		<guid isPermaLink="false">http://davidsimpson.me/?p=829</guid>
		<description><![CDATA[This website and one or two others I run recently experienced what appeared to be a denial-of-service attack. Looking at the access logs, I could see several tens of thousands of requests all originating from a range of amazonaws.com IP addresses. All with the useragent &#8220;bitlybot&#8221;. This post is a quick postmortem of what went [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://bit.ly/"><img src="/wp-content/uploads/2010/06/blowfish_twtr.png" alt="Bitly" title="Bitly blowfish" width="73" height="73" class="alignleft size-full wp-image-841" /></a></p>
<p>This website and one or two others I run recently experienced what appeared to be a denial-of-service attack.   </p>
<p>Looking at the access logs, I could see several tens of thousands of requests, all originating from a range of amazonaws.com IP addresses and all with the user agent &#8220;bitlybot&#8221;.</p>
<p>This post is a quick postmortem of what went wrong and why.<br />
<span id="more-829"></span>	</p>
<h2>So what happened?</h2>
<p>I&#8217;ve been happily using the excellent <a href="http://bit.ly/">bit.ly</a> URL shortening API on the <a href="http://www.read-able.com/">Readability Test Tool</a> website for over a year with no problems at all.  Whenever a user checks the readability of a web page using the Readability Test Tool, a convenient &#8220;tweet this&#8221; link is provided for the results page.  </p>
<p>My bit.ly link also innocently appended a query string &mdash; <strong>&#038;utm_source=twitter&#038;utm_medium=retweet</strong> &mdash; so that I could track click-throughs from Twitter in <a href="http://www.google.com/analytics/">Google Analytics</a>.  </p>
<p>Looking back at this, it wasn&#8217;t that clever a thing to do, but it only took a couple of minutes to implement, so was very little effort for a nice bit of analytics/measurement return.</p>
<p>All was good for a year.  Google Analytics tracking worked well.  There were no problems.  Indeed looking back at the access logs, the bitlybot user agent had not so much as sniffed the website once in that time. </p>
<p>One day, something changed.  Overnight, bitlybot started crawling my website for all the links it had created over the year.  Unfortunately, for every link it crawled, it also created another link with more parameters appended to the query string. </p>
<p>Which it then crawled. Creating another link with more appended query parameters.  Ouch.</p>
<p>e.g.</p>
<pre class="brush: plain; title: ;">

http://www.read-able.com/check.php?uri=http%3A%2F%2Fwww.example.com%2F&#038;utm_source=twitter&#038;utm_medium=retweet

http://www.read-able.com/check.php?uri=http%3A%2F%2Fwww.example.com%2F&#038;utm_source=twitter&#038;utm_medium=retweet&#038;utm_source=twitter&#038;utm_medium=retweet

http://www.read-able.com/check.php?uri=http%3A%2F%2Fwww.example.com%2F&#038;utm_source=twitter&#038;utm_medium=retweet&#038;utm_source=twitter&#038;utm_medium=retweet&#038;utm_source=twitter&#038;utm_medium=retweet
</pre>
<p>And so on.</p>
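<p>One way to avoid this kind of runaway growth is to make the tagging step idempotent: remove any existing utm_source and utm_medium parameters before appending them, so that tagging an already-tagged URL returns the same URL. The following is only a minimal sketch, written in Python rather than the PHP this site uses. The helper name add_tracking is made up for illustration, and it assumes the tracking values live in an ordinary query string.</p>
<pre class="brush: python; title: ;">
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: the tracking values mirror the ones used in this post.
TRACKING = [('utm_source', 'twitter'), ('utm_medium', 'retweet')]


def add_tracking(url):
    # Hypothetical helper: return url with the tracking parameters present exactly once.
    parts = urlsplit(url)
    tracked = {name for name, _ in TRACKING}
    # Keep every existing parameter except earlier copies of the tracking ones.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in tracked]
    query = urlencode(kept + TRACKING)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


once = add_tracking('http://www.read-able.com/check.php?uri=http%3A%2F%2Fwww.example.com%2F')
twice = add_tracking(once)
print(once == twice)  # tagging a tagged URL should not change it
</pre>
<p>With a step like this, a crawler that re-submits an already-tagged link gets the same link back instead of a longer one.</p>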
<h2>What did I do?</h2>
<p>Initially I ranted on Twitter.  </p>
<p>Then I removed the &#8220;tweet this&#8221; link to prevent further bit.ly URLs from being created.  This wouldn&#8217;t stop things for a while, but would at least prevent the problem from getting any worse.</p>
<p>Then I edited robots.txt:</p>
<pre class="brush: bash; title: ;">
# Tell &quot;bitlybot&quot; not to come here at all
User-agent: bitlybot
Disallow: /
</pre>
<p>This did not help straight away &#8211; bitlybot only checks robots.txt once a day, so the change could not improve matters instantly.</p>
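<p>The rules themselves can be sanity-checked offline. The sketch below uses the robots.txt parser from the Python standard library against the same two rules; the page URL and the second bot name are placeholders chosen for illustration. It only confirms what the rules say. It cannot tell you when a crawler will next re-read the file, which was the real limitation here.</p>
<pre class="brush: python; title: ;">
from urllib.robotparser import RobotFileParser

# The same rules as the robots.txt shown above.
RULES = [
    'User-agent: bitlybot',
    'Disallow: /',
]

parser = RobotFileParser()
parser.parse(RULES)

# Hypothetical page used only for this check.
PAGE = 'http://www.read-able.com/check.php'

print(parser.can_fetch('bitlybot', PAGE))      # bitlybot is disallowed everywhere
print(parser.can_fetch('SomeOtherBot', PAGE))  # no rule names this bot, so it is allowed
</pre>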
<p>Then I redirected the traffic to bit.ly:</p>
<pre class="brush: php; title: ;">
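// Send bitlybot back to bit.ly with a permanent redirect
if (isset($_SERVER['HTTP_USER_AGENT']) &#038;&#038; $_SERVER['HTTP_USER_AGENT'] == 'bitlybot')
{
	header('Location: http://bit.ly/', true, 301);
	exit; // stop here so the rest of the page is not built for the bot
}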
</pre>
<p>That slowed it a bit.  Admittedly, I was still blaming them at this point.</p>
<p>Then <a href="http://twitter.com/dvdsmpsn/status/12973263701">I reached out to @bitly</a>.</p>
<p>They were very responsive. I sent them a detailed email with a section of access logs and they fixed it.  Quickly.</p>
<p>They disabled my account, preventing me from causing any further mischief.  They stopped bitlybot from its crawling activity and reported the progress back to me.</p>
<p>Each contact with bit.ly via twitter or email resulted in a positive response &mdash; they were very quick to respond and my websites were soon back to their usual somewhat diminutive volume of traffic.</p>
<h2>Conclusions</h2>
<ul>
<li>Bit.ly has excellent support &#8211; they are very responsive and my little server was soon back to normal</li>
<li>Think before you write code that uses other people&#8217;s APIs &#8211; you may not fully understand the consequences of your actions</li>
<li>My Google Analytics tracking parameters were added with little thought &#8211; I really ought to have tried a bit harder to weigh up the implications</li>
<li>My small VPS does nicely, thank you, for the limited traffic that it experiences.  Given some decent volumes of traffic, it will fail.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://davidsimpson.me/2010/06/22/bitten-a-lot-by-a-bitlybot/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

