Friday, April 18, 2008
Ever heard the phrase "that's a nice problem to have"? It's the kind of problem like having too much money (I've yet to experience this one :-) ), or - for an Internet service - so much traffic that your site is brought to its knees. Congratulations on being successful, and by the way your site is down.
A nice problem is still, well, a problem.
So last week we weren't crushed, but FeedBlitz has been much busier than usual, and performance has been, at times (let's be charitable), sluggish. Why? We had the "nice problem" of seeing usage ramp up significantly. On April 15th we sent nearly 3.1 million messages, easily our busiest day ever. Our next busiest day was the day before, at over 2.4 million messages. I haven't written a monthly update for a while, but March's output was 20% greater than February's, at nearly 48 million messages sent. So far this month we've already sent 32 million, so we're on track to blow away March's record as well, probably by a similar margin. That's the "nice" part. But the cost of all this goodness was that performance on the web site started to deteriorate. That would be the "problem" side of the equation.
The good news is that as this hit, we were already working on performance improvements knowing that Gawker was about to go live. Gawker being a very busy site, we didn't want them to go live and then have us, their shiny new email service, go and disappear under all the traffic that they might send our way. As it turns out, the timing was coincidental; we just hit a critical mass of publishers around the same time that pushed our servers too close to the edge for comfort. But we were ready - and I'm glad we were.
We added an extra web server and an extra mail server to handle the load better, but (more importantly) we have made other changes that have now made the site much snappier to use and the mailings much less stressful to our infrastructure. We're supporting more subscribers, publishers and visitors with less effort than 10 days ago. FeedBlitz is now much faster and much more responsive than before (possibly ever, in fact).
So far, so good. If you're no nerdling then please disembark this post now as we have reached your destination. Otherwise, read on to find out what we did beyond enabling the extra iron.
Amazon S3
One of the biggest challenges we were facing last week, as it turns out, was not that our servers were running out of steam per se - they all had CPU, bandwidth, memory and disk to spare. What we were starting to run out of was sockets for Internet connectivity. Between WWW access, mail deliveries, TCP timeouts and HTTP keep-alives (see, I told you this would be techie), some of the servers were occasionally exhausting their socket pools, effectively taking them offline for a few seconds during busy times of the day, apparently at random. Adding extra servers helped, to be sure, but doing so only defers the problem, and is also a relatively expensive, high-maintenance fix. We needed a longer-term solution to managing our day-to-day WWW usage.
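As an aside, you can watch this happening by counting connections in each TCP state; a pile-up of sockets stuck in TIME_WAIT is the classic symptom. FeedBlitz isn't written in Python, but here's a minimal, hedged sketch of that kind of check (it assumes the third-party psutil package and sufficient privileges to enumerate sockets):

    # socket_states.py - rough sketch: count TCP connections by state.
    # Assumes the third-party psutil package is installed; on some platforms
    # enumerating other processes' sockets requires elevated privileges.
    from collections import Counter

    import psutil

    def tcp_state_counts():
        """Return a Counter of TCP connection states (ESTABLISHED, TIME_WAIT, ...)."""
        return Counter(c.status for c in psutil.net_connections(kind="tcp"))

    if __name__ == "__main__":
        for state, n in tcp_state_counts().most_common():
            print(f"{state:15s} {n}")
        # Thousands of TIME_WAIT entries means the ephemeral port pool is
        # being chewed up faster than the OS recycles it.

Run something like that during a busy mailing and the numbers tell the story long before the server actually starts refusing connections.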
And we found it. A little late to the party, perhaps, but over the last 7 days we have offloaded all the work that doesn't reflect our core value-add but is important to the web site, such as image and script file serving, to Amazon's S3 service. Our static images and scripts are now served from the cloud, and none from our servers. This is the proverbial win-win situation, as Amazon serves the images faster than we ever could, which in turn makes feedblitz.com much more responsive as well as improving our connectivity. So we solved our immediate need and got snappier site performance in the bargain.
Most of our pages now need exactly one socket connection per page, to pull down the core HTML; all the other baggage comes from S3, saving about three connections per web site visitor - roughly a 75% reduction. And right now it's costing us a whopping ~$3 a day for what is, effectively, an infinitely large image / asset server. It's serving our images more cheaply, more quickly, more reliably and much more manageably than we ever could with a self-managed dedicated server.
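For the curious, pushing static assets into S3 is about as simple as it sounds. FeedBlitz isn't written in Python, but a minimal sketch along these lines - using Amazon's boto3 SDK, with a made-up bucket name and local folder - is roughly all it takes to mirror a directory of images and scripts into a bucket, make them world-readable and give them a sensible cache lifetime:

    # push_assets.py - hedged sketch: mirror a local static asset folder into S3.
    # Assumes the boto3 SDK, AWS credentials already configured, and a bucket
    # ("static.example.com" is hypothetical) that permits public-read ACLs.
    import mimetypes
    from pathlib import Path

    import boto3

    BUCKET = "static.example.com"   # hypothetical bucket name
    ASSET_DIR = Path("static")      # hypothetical local folder of images/scripts

    s3 = boto3.client("s3")

    for path in ASSET_DIR.rglob("*"):
        if not path.is_file():
            continue
        key = path.relative_to(ASSET_DIR).as_posix()
        content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        s3.upload_file(
            str(path), BUCKET, key,
            ExtraArgs={
                "ACL": "public-read",             # let S3 serve the file directly
                "ContentType": content_type,
                "CacheControl": "max-age=86400",  # browsers may cache for a day
            },
        )
        print(f"uploaded {key} as {content_type}")

After that, it's just a matter of pointing the site's image and script tags at the bucket's URL instead of at feedblitz.com.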
Obviously, I'm a fan. If you're running a site or service that is going to get big, I'm now of the opinion that you're nuts not to outsource to S3 or a similar service to store and serve objects that aren't core to your value add. It's faster, better, cheaper and a whole lot less hassle. Do it!
That said, those in the know will no doubt be asking: what about the complementary Amazon services, EC2 and SimpleDB? Well, we're not such a good fit there. Our code doesn't work on the EC2 infrastructure, so that's a non-starter. There may in fact be a role for SimpleDB in future service offerings, but we've no plans to move our current database over to it right now. I will say this, though: if we were creating FeedBlitz today, there's no doubt in my mind that I'd use SimpleDB (or a similar service) and S3 as the back end from the get-go.
FastCGI
I also mentioned that we're not a good fit for EC2 because of the way our application is built. I won't bore you with the details. Up until this week it delivered the goods to the WWW via an old back-end protocol called CGI, which worked nicely but isn't known for its performance. This week we deployed new versions of the core HTML newsletter application and our email ad server using FastCGI, which (as the name implies) is like CGI, only faster. Since we're a custom app, it wasn't trivial to add FastCGI to what we do, but now we have, and it's working a treat. It's reduced the load on our systems all along the stack and increased both throughput and responsiveness.
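To give a flavour of why it's faster: a classic CGI program is launched afresh for every single request, whereas a FastCGI program is a long-running process that the web server hands requests to over a socket, so the startup cost is paid once. Our code isn't Python, but a minimal sketch of a FastCGI responder (using the third-party flup package, with an arbitrary port number) looks something like this:

    # hello_fcgi.py - minimal FastCGI sketch using the third-party flup package.
    # The process stays resident and answers many requests over one socket,
    # instead of being spawned once per request the way classic CGI is.
    from flup.server.fcgi import WSGIServer

    def app(environ, start_response):
        # A trivial WSGI application; a real one would dispatch on PATH_INFO etc.
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"Hello from FastCGI\n"]

    if __name__ == "__main__":
        # Listen on a local TCP port (9000 is arbitrary); the front-end web
        # server proxies FastCGI requests to this process.
        WSGIServer(app, bindAddress=("127.0.0.1", 9000)).run()

The front-end web server is configured to forward matching requests to that port, and the application process just keeps on serving.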
So there you have it: some re-engineering (FastCGI), selective outsourcing (S3) and extra servers have made the pain go away. Until the next nice problem to have, anyway ...
Labels: Amazon S3, FastCGI, FeedBlitz