FeedBlitz News - Email Contact and Social Media Marketing Automation: October 2009

RSS Metrics: The Good, the Bad and the Invisible

Tuesday, October 06, 2009

Metrics matter - it's rule #1 for marketing programs. If you can't measure the success of a program (whatever success means to you) then you have no idea how well it's working and whether you should invest in it, tune it, or kill it. If you can't measure it, you can't manage it. Simple, really.

So online marketers invest a lot of time setting up measurement systems for their online programs, such as Google Analytics for websites and open and click-through tracking for email marketing programs. It's even possible to get stats-addicted, endlessly micromanaging and tuning online programs while the bigger picture - and the bigger opportunities - pass you by.

But that aside, getting to know your basic metrics - and their trends - is fundamental. That's true for bloggers and new media sites, too, because we all feel good when our stats go up and feel it in our guts when a key metric burps.

So how come the state of RSS metrics is so parlous? Most bloggers focus on their RSS circulation as the key metric, and the only other useful metric commonly available is reach (more on both below). RSS services like FeedBlitz and FeedBurner give you that top-line circulation number, even though it's almost certainly meaningless.

RSS Metrics: The Bad

Screeeeech. Rewind. Circulation is "meaningless"?

Yup. Not because the number is inaccurate. It's certainly the best number the relevant service can come up with, and it can be tuned or altered from time to time to "improve" it.

No, circulation is mostly rot because it's not a particularly useful metric to track (although, man, we could debate its accuracy until the proverbial cows come home). It's analogous to the total number of subscribers on your email list. As any decent email marketer will (or should) tell you, size doesn't matter; it's quality that counts. And quality is largely measured by metrics such as open and click-through rates. In other words, it's how many recipients interact with your mailing that determines how successful each mailing is. Same with advertising - better targeting yields better response rates which, in turn, command higher prices. And so it should be with feeds.

RSS Metrics: The Good

So where are we in RSS-land? We have subscribers (what we call circulation), the total(-ish) number of people subscribed to your RSS feed, and then there is reach, a measure of how many of your subscribers have interacted with your feed. Reach is a good quality metric, as it tells you how much activity your feed is generating. While reach will vary from day to day and post to post, its trend will tell you whether you are gaining or losing your readers' attention. Reach, not circulation, is what you should really care about.

The FeedBlitz RSS service calculates your reach on any given day by determining how many unique readers interacted with your feed (either opened it or clicked through). Useful, yes. But sadly reach, too, has its problems. Your reach total probably under-reports activity, because it shares a problem with email open rates: if the subscriber doesn't have images displayed, you can't track the act of opening the email or reading the RSS article, because what gets counted is the request to the server for a tracking image. No images, no counting. Instead, for that subscriber, you have to catch them when (or, rather, if) they click through.
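To make that definition concrete, here is a minimal Python sketch of a daily reach count. It assumes a hypothetical event log of (subscriber, day, kind) records, where "open" means the tracking image was fetched and "click" means a tracked link was followed; it illustrates the idea and is not FeedBlitz's actual implementation. Note that a subscriber with images off who never clicks produces no event at all, so as far as the count is concerned they don't exist.

```python
from datetime import date

# Hypothetical event log: (subscriber_id, day, kind).
# "open"  = the tracking image was fetched; "click" = a tracked link was followed.
events = [
    ("alice", date(2009, 10, 6), "open"),
    ("alice", date(2009, 10, 6), "click"),  # same reader twice: counted once
    ("bob",   date(2009, 10, 6), "click"),  # images off, but caught on a click
    ("carol", date(2009, 10, 6), "open"),
    ("dave",  date(2009, 10, 5), "open"),   # a different day
    # erin read the post with images off and clicked nothing: no event at all
]


def daily_reach(events, day):
    """Unique subscribers who opened or clicked on the given day."""
    return len({sub for sub, d, kind in events if d == day and kind in ("open", "click")})


print(daily_reach(events, date(2009, 10, 6)))  # 3
```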

RSS Metrics: The Invisible

Here's where the bad news kicks in. Did you know that you are simply missing out on almost all subscriber clicks on your RSS feed? That you're not tracking the vast majority of these subscriber interactions?

It's true. Unless our invisible subscriber above happens to click through to the source article (usually by clicking on the post's title), dollars to doughnuts you're going to miss them. And that means you're not getting the big picture at all. In fact, you're probably getting only a very small fraction of it. Why? Because you're not counting clicks on links within the post. If our invisible subscriber clicks on anything inside the post - anything at all - they won't be tracked, and that subscriber's activity is missed.

By way of example I offer you TechCrunch's RSS feed at http://feeds.feedburner.com/techcrunch (not to pick on TechCrunch, by the way; I'm just using a well-known technology feed to make my point).

TechCrunch, like most new media companies, liberally sprinkles its articles with links, some internal to the site and some to the third-party sites mentioned in the post. Looking at the current feed, I see the article R.I.P. Good Times: One Year Later, which has sixteen (16) links in the post, not including ads and FeedFlares (remember, you're accessing the posts via the feed at http://feeds.feedburner.com/techcrunch, not the website). TechCrunch has link tracking enabled via FeedBurner, so the link back to the site via the article's title is tracked. You can tell because, if you hover over it with your mouse pointer, the link doesn't look like a TechCrunch site link; it has codes and funny characters (a tilde, ~) in it.

Not so those 16 links inside the post. They're regular URLs - no tracking. Now, let's say you're one of the bajillion people who subscribe to TechCrunch in your RSS aggregator. What do you click on? Do you click through to the post itself, just to read exactly the same article online (only with more ads)?

Nah. More than likely you click on the links inside the article and head off to read about Sequoia Capital or the deadpool or CrunchBase or whatever else is linked to in the post.

Which means that when you do, the RSS feed metrics under-record the feed's reach, because your click isn't counted. The good folks at TC simply can't tell you how many of their RSS readers click on links inside any post, because they're not collecting that data. Nor are you, for your blog. If you use links extensively in your posts, you're absolutely, positively, no-doubt-about-it missing out on feed-based activity.
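A toy comparison shows how much the choice of tracked links can matter. This Python sketch assumes a hypothetical click log for a single post, with each click tagged as either the post-title link or a link in the body; the data is invented purely for illustration and does not come from TechCrunch or FeedBlitz.

```python
# Hypothetical click log for one post: (subscriber_id, link_kind).
# "title" = the post-title link back to the site (tracked by default);
# "body"  = a link inside the post text (untracked unless rewritten).
clicks = [
    ("alice", "title"),
    ("bob",   "body"),
    ("carol", "body"),
    ("carol", "body"),
    ("dave",  "body"),
]


def reach(clicks, tracked_kinds):
    """Unique subscribers with at least one click on a tracked link."""
    return len({sub for sub, kind in clicks if kind in tracked_kinds})


title_only = reach(clicks, {"title"})
all_links = reach(clicks, {"title", "body"})
print(f"title-only tracking sees {title_only} of {all_links} clicking readers")
# title-only tracking sees 1 of 4 clicking readers
```

The real ratio depends entirely on how your readers behave; the point is only that title-only tracking has no way of telling you what that ratio is.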

If these are, in fact, the links subscribers are most likely clicking on, can you imagine how much intelligence, how much useful information, is being lost? Does that translate into lost revenue? Remember, if you can't measure it, you can't manage it. You're optimizing your online programs with massively incomplete information, and (worse) you don't even know how incomplete your knowledge is.

RSS Metrics @ FeedBlitz: Making the Invisible Visible

Wow, talk about burying the lead. Anyway, as of now, the FeedBlitz RSS service ALSO tracks the links inside feed posts automatically. These links will appear on the RSS report and will also make the reach figures larger (because we'll be capturing more activity) and more accurate (for the same reason). How much larger depends on how often you use links inside your posts; the more links you add within a post (I think TechCrunch, with double-figure counts, is fairly extreme), the better the metrics and the more likely you are to see your reach rise as a result.
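One common way to track in-post links is to rewrite each one so it points at a redirect service that records the click first. The Python sketch below assumes a hypothetical endpoint (https://track.example.com/r) with made-up query parameters u (target URL) and p (post ID); it is not FeedBlitz's real URL scheme. The regular expression only handles double-quoted absolute href attributes, so a production version would use a proper HTML parser.

```python
import html
import re
from urllib.parse import urlencode

# Hypothetical tracking endpoint; not FeedBlitz's real URL scheme.
TRACKER = "https://track.example.com/r"

# Simplification: matches only double-quoted, absolute http(s) hrefs.
HREF = re.compile(r'href="(https?://[^"]+)"')


def track_links(post_html, post_id):
    """Point every absolute link in a post at the tracking redirector."""
    def wrap(match):
        target = html.unescape(match.group(1))  # attribute text, e.g. &amp; -> &
        tracked = f"{TRACKER}?{urlencode({'u': target, 'p': post_id})}"
        return f'href="{html.escape(tracked)}"'  # & -> &amp; inside the attribute
    return HREF.sub(wrap, post_html)


post = '<p>Read <a href="https://example.com/story">this</a>. <a href="/about">About</a></p>'
print(track_links(post, 42))
```

Relative links such as /about are left alone by this sketch; whether to track them too is a policy choice.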

As an example, read this article at http://feeds.feedblitz.com/feedblitz (why not subscribe while you're there!). Look at the links inside. Coded. Tracked. This will be the first article - possibly ever - on an RSS feed where I, the feed owner, will be able to see all the activity the post generates. Finally, fret not, SEO mavens: all the links are 301 redirects, giving you the full Google-juice benefit.
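The receiving end of a tracked link can be equally small: record the click, then answer with a permanent (301) redirect to the original target. This Python sketch assumes a hypothetical tracking URL that carries the target in a u query parameter and a post ID in p; click_log is a stand-in for a real datastore, and none of it is FeedBlitz's actual code.

```python
from urllib.parse import parse_qs, urlsplit

click_log = []  # hypothetical stand-in for a real click store


def handle_tracking_request(path):
    """Record a click, then return (status, headers) for a 301 redirect."""
    params = parse_qs(urlsplit(path).query)
    target = params.get("u", [""])[0]
    if not target.startswith(("http://", "https://")):
        return 404, {}
    post_id = params.get("p", [""])[0]
    click_log.append((post_id, target))
    # 301 = Moved Permanently, the redirect type the post relies on for SEO credit.
    return 301, {"Location": target}


print(handle_tracking_request("/r?u=https%3A%2F%2Fexample.com%2Fstory&p=42"))
# (301, {'Location': 'https://example.com/story'})
```

Only http and https targets are accepted here; a real redirector should go further (for example, by signing its URLs) so it can't be abused as an open redirect.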

Better yet, if you use your FeedBlitz feed to power your email marketing, you get the same benefits too. This is a first for online social media marketing.

Cool.

Marketing metrics matter. Get the whole story and start a trial today.

FeedBlitz RSS Traffic up 17x over last week

Thursday, October 01, 2009

If you were affected by the ho-hum availability of the FeedBlitz RSS service recently (we're A-OK now, by the way), here's why: Traffic jumped by a factor of 17 over this time last week and it took us too long to get to grips with it, for which I apologize. I hope this post will help explain why that happened, what we did and didn't do, and what we have learned from the experience.

Chronology

Late last Friday (September 25th) we noticed that incoming traffic had - very suddenly - started to clog the RSS service. We dug around a little, found out why, and added a new server. That seemed to handle things nicely and that was that - or so we thought at the time. Over the weekend the service was doing OK - a couple of flags here and there but nothing that seemed to merit urgent attention - and this continued through into Monday. So far, so good, and I was able to talk about the new integrated comment feature instead. Swell!

Come Tuesday morning, though, things weren't so happy. So we spun up yet more servers to hold the fort (let's hear it for Amazon EC2 and cloudy services!), which seemed to settle things down.

Yesterday (Wednesday), however, the traffic was again swamping the virtual server array, and it was clear that simply spinning up more of the servers we had been using wasn't cutting it. So, instead, yesterday morning we hauled out a series of much larger servers into (from?) the cloud. At one point we had seven times as many servers running as we did a week ago. Right now, the computing capacity deployed to serve feeds is more like 30x what it was this time last week.

Results

The good news is that the current infrastructure is now holding its own really well - in fact, we're starting to take some of the servers down now that we better understand the nature of this flood and how to handle it. There's plenty of headroom to cope with at least a doubling of traffic should that happen, and RSS delivery is currently extremely stable and responsive.

From a numbers perspective, comparing yesterday to a week ago, the RSS traffic we served went from just under 15 requests per second to 251 requests per second - that's the 17-fold increase. That's a huge change in what had otherwise been a predictable service with predictable traffic patterns.

Questions, Questions.

So that's what we did - but where did all this traffic come from? And how come we were so surprised?

Let me take the second question first. After about six months running the RSS offering, we knew (or thought we knew) how it worked and how it changed. Basically, as we incrementally acquire new customers for the service, the load on the service grows incrementally in parallel. Most of the feeds we see have circulations in the thousands and tens of thousands: nothing scary about that at all, and well within the ability of last week's infrastructure to handle for the foreseeable future. We had no idea last Friday that this was about to change.

At the end of last week a new client started using FeedBlitz to serve their feed. Their feed wasn't being accessed via traditional aggregators, though. Instead, it was being polled by an automatically updating browser toolbar, with a circulation, apparently, of around 7 million individual users. So whenever one of those users fires up their browser, the toolbar fetches the feed (and because the toolbar itself isn't particularly well written, it keeps asking for the full feed, despite the headers we send back. Grrr!). Literally one minute things were fine, and the next, overload central.
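The post doesn't say exactly which headers FeedBlitz sends; a typical choice is an entity tag (ETag), the mechanism mentioned below for well-behaved aggregators. As a minimal Python sketch, assuming the server derives the ETag from a hash of the feed body (one possible choice, not necessarily FeedBlitz's), a conditional GET looks like this: a client that echoes the tag back in If-None-Match gets a body-less 304 Not Modified instead of the full feed. The comparison is simplified (no weak W/ tags) and assumes header names arrive in canonical case.

```python
import hashlib


def serve_feed(feed_xml, request_headers):
    """Return (status, headers, body) for a feed request, honoring If-None-Match."""
    body = feed_xml.encode("utf-8")
    # One possible ETag: a quoted hash of the exact bytes being served.
    etag = '"' + hashlib.sha1(body).hexdigest() + '"'
    headers = {"ETag": etag}
    # Simplified comparison: strong tags only, header name in canonical case.
    sent = [tag.strip() for tag in request_headers.get("If-None-Match", "").split(",")]
    if etag in sent or "*" in sent:
        return 304, headers, b""  # Not Modified: headers only, no body
    headers["Content-Type"] = "application/rss+xml"
    return 200, headers, body


status, headers, body = serve_feed("<rss>...</rss>", {})
print(status)  # 200
print(serve_feed("<rss>...</rss>", {"If-None-Match": headers["ETag"]})[0])  # 304
```

A client that never sends If-None-Match, like the toolbar, takes the 200 path and downloads the whole feed on every request.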

The toolbar factor is important. By way of background, most bloggers and content publishers have large portions of their audiences either consolidated by web aggregators such as Google Reader (or by email services like FeedBlitz) or reading through well-behaved desktop aggregators that know not to keep hitting the feed every 10 seconds. So even if a blog network comes to us with an audience of, say, a million RSS subscribers, it's more than likely that a good half of those will be handled by a few large aggregators, and the load from the remaining personal readers might add a million or so hits a day. That's easily handled. RSS aggregators are well behaved, too; they understand things like entity tags (ETags) in HTTP headers, which help minimize traffic.
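The client side of that good behavior is just as simple. Here is a hypothetical Python poller (the class name and the 30-minute minimum interval are illustrative assumptions) that remembers the last ETag it received, sends it back on the next request, and refuses to poll more often than its minimum interval. HTTP transport is left out so the logic stands alone.

```python
import time


class PoliteFeedPoller:
    """Hypothetical well-behaved feed client: conditional GETs plus a poll floor."""

    def __init__(self, min_interval_s=1800):  # 30 minutes, an illustrative choice
        self.min_interval_s = min_interval_s
        self.etag = None
        self.last_poll = None

    def due(self, now=None):
        """True if enough time has passed since the last poll."""
        now = time.time() if now is None else now
        return self.last_poll is None or now - self.last_poll >= self.min_interval_s

    def request_headers(self):
        """Echo the last ETag so an unchanged feed costs a 304, not a full body."""
        return {"If-None-Match": self.etag} if self.etag else {}

    def record_response(self, status, headers, now=None):
        """Remember when we polled and, on a 200, the feed's new ETag."""
        self.last_poll = time.time() if now is None else now
        if status == 200 and "ETag" in headers:
            self.etag = headers["ETag"]
```

The toolbar described above skipped both halves: it ignored the headers it got back and asked for the full feed every time.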

The toolbar added an extra 20 million hits per day to the infrastructure. It's therefore roughly equivalent to the load generated by a blog with a multi-million subscriber RSS feed. Does anyone in the real world have a feed that size, anyway? I don't think even TechCrunch's or Scoble's RSS audiences are that big. So, no, we just didn't expect this kind of load appearing, unannounced, overnight.
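As a quick sanity check on those figures, the arithmetic below (in Python) converts the toolbar's daily load into requests per second and compares it with the before-and-after rates quoted earlier. It treats the 20 million hits as evenly spread over the day, which understates the peaks, and uses 15 for the "just under 15" baseline.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

before_rps = 15                    # "just under 15 requests per second" a week earlier
after_rps = 251                    # requests per second on the Wednesday
toolbar_hits_per_day = 20_000_000  # "an extra 20 million hits per day"

fold = after_rps / before_rps                         # about 16.7, i.e. roughly 17x
toolbar_rps = toolbar_hits_per_day / SECONDS_PER_DAY  # about 231 requests per second
print(f"{fold:.1f}x increase; toolbar alone ~ {toolbar_rps:.0f} req/s")
print(f"baseline + toolbar ~ {before_rps + toolbar_rps:.0f} req/s vs {after_rps} observed")
# 16.7x increase; toolbar alone ~ 231 req/s
# baseline + toolbar ~ 246 req/s vs 251 observed
```

Spread evenly, the toolbar alone accounts for about 231 requests per second, which together with the old baseline lands within a few percent of the 251 requests per second actually observed.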

Looking Ahead

So that, anyway, is what caused the performance issues earlier this week. We strive to do our best here, but for a while we weren't able to keep up. That's personally disappointing for me, and I am certainly sorry for the inconvenience this caused some of our clients. The good news is that not only is the RSS infrastructure now better able to cope with a similar-sized surge in the future, but we're also better able to ramp it up quickly and aggressively as and when we have to. One key lesson learned is that we scaled too conservatively when the initial hits came in, which forced us to scramble later on: weekend traffic is always lighter than weekday traffic, and we failed to take that into account adequately. With hindsight, we should have overscaled at first, then pulled back as the full traffic profile became clearer. We won't be making that mistake again.
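The overscaling lesson can be written down as a simple sizing rule. This Python sketch is hypothetical: the per-server capacity and the weekday factor are placeholders you would measure for your own service, and the default headroom of 2.0 mirrors the "at least a doubling" margin mentioned earlier in this post.

```python
import math


def servers_to_launch(observed_rps, per_server_rps, weekday_factor=1.0, headroom=2.0):
    """Size a fleet from observed load, scaled for the expected weekday peak and
    multiplied by a safety margin, rounded up to whole servers."""
    expected_rps = observed_rps * weekday_factor
    return math.ceil(expected_rps * headroom / per_server_rps)


# Placeholder numbers: 100 req/s seen on a weekend, 30 req/s per server.
print(servers_to_launch(100, 30, weekday_factor=1.0, headroom=1.0))  # 4  (sized to what we saw)
print(servers_to_launch(100, 30, weekday_factor=1.5, headroom=2.0))  # 10 (overscaled first)
```

Once the full weekday profile is known, rerunning the same calculation with measured numbers tells you how many servers can safely come down.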

Geekery: AWS, EC2, S3 and Memcached

Technically, there were a few surprises, too, as we drilled into how to improve performance. You may not realize this, but when we serve a feed to you - say http://feeds.feedblitz.com/feedblitz - we're not simply sending a static file. The feed is subtly different for each individual user, which is why we need beefy servers to handle the load: FeedBlitz RSS is CPU-limited, not bandwidth-limited. In other words, serving a feed is real work for the computer involved; we're not simply yanking the file out of a local cache and declaring victory.

But talking of caches, here's another interesting technical factoid we discovered. Like many services, we use a technology called memcached to help scale. We'd assumed that it would work well in the EC2 cloud because there was no reason to think otherwise. And we were wrong. Our instrumentation in the middle of this episode clearly showed that caching and retrieving data from Amazon's S3 service was consistently an order of magnitude faster for servers under stress than fetching the same data from an EC2-hosted memcached server. My assumption is that there are environmental optimizations within the AWS environment that grease the electronic skids for S3 requests. If you're using memcached inside EC2 my recommendation is that you instrument the code you're using - memcached might not be working as effectively for you as you think it is. For what it's worth, we had been using memcached as a primary cache with S3 as a backstop. Now we just go directly to S3 instead. It's way faster.
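Instrumenting a cache doesn't need to be elaborate. The Python sketch below wraps any object with a get(key) method and records the latency of each lookup, so a memcached client and an S3-backed fetch (via a small adapter exposing get) can be compared side by side under real load. The TimedCache name and the tiered_get helper, which reads a primary cache first and a backstop second like the original setup described above, are illustrative and not FeedBlitz code.

```python
import statistics
import time


class TimedCache:
    """Hypothetical wrapper: times every get() on an underlying cache backend."""

    def __init__(self, name, backend):
        self.name = name
        self.backend = backend  # any object with a get(key) method
        self.latencies_ms = []

    def get(self, key):
        start = time.perf_counter()
        try:
            return self.backend.get(key)
        finally:
            # Record the latency even if the lookup raised.
            self.latencies_ms.append((time.perf_counter() - start) * 1000.0)

    def median_ms(self):
        return statistics.median(self.latencies_ms) if self.latencies_ms else None


def tiered_get(key, primary, backstop):
    """Read from the primary cache, falling back to the backstop on a miss."""
    value = primary.get(key)
    if value is None:
        value = backstop.get(key)
    return value


# Demo with plain dicts standing in for memcached and S3.
primary = TimedCache("memcached-stand-in", {})
backstop = TimedCache("s3-stand-in", {"feed:42": "<rss/>"})
print(tiered_get("feed:42", primary, backstop))  # <rss/>
print(primary.name, primary.median_ms(), backstop.name, backstop.median_ms())
```

Comparing the two medians (or higher percentiles) while the servers are under load is exactly the kind of measurement that exposed the memcached surprise.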

The Bottom Line

So there's the scoop. Big flood of traffic, didn't scale fast enough, but we're good now.

Post Script

Oh, and by the way: If you're planning on dropping a few million subscribers on us overnight, please do. We can handle it now. Just give us a heads up first, okay? Thanks!
