Thursday, November 03, 2005
*sigh* It's been something of a bumpy ride this week. Last night's run was stopped by previously unencountered feed date format problems. FeedBlitz has been updated this morning to gracefully handle this class of feed coding error and the run is now under way. So apologies to everyone, again.
Dates, for what it's worth, are the most important and most common cause of problems (usually manifesting themselves as "My emails didn't come through!" emails to support) after junk mail filters. Last night's problems came as a result of dates in feeds representing 1/1/1970 and a date in 2038. Clearly, these dates are bogus (most programmers will recognize them as the Unix epoch and, most likely, the 2038 limit of 32-bit timestamps), but they were nevertheless present in a couple of feeds polled by FeedBlitz last night. These failures, unfortunately, caused the polling process to halt. It's off and running now, but clearly there's work to be done here.
So, as of today, job #1 is to put much more work into improving FeedBlitz's resistance to failure during long-running operations like the nightly poll. Everything else will be put on hold until this has been done. We recognize that reliability is key and that reliability recently has not been where it needs to be. This will be dealt with.
Update 11/3 11:46 pm - Changes have been made to significantly reduce the risk of a single problem derailing an entire run. We'll watch stability over the next few days to ensure that the service is consistently running at the levels you expect.
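To illustrate the general idea of keeping one bad feed from derailing a whole run, here is a minimal sketch in Python. It is not FeedBlitz's actual code; the names `poll_all` and `poll_one` are hypothetical, and the only assumption is that each feed can be polled independently of the others.

```python
from typing import Callable, Iterable


def poll_all(feeds: Iterable[str], poll_one: Callable[[str], int]) -> dict:
    """Poll every feed, recording failures instead of letting one stop the run."""
    sent, failed = {}, {}
    for url in feeds:
        try:
            # poll_one is a hypothetical per-feed worker that returns the
            # number of posts queued for that feed.
            sent[url] = poll_one(url)
        except Exception as exc:
            # Deliberately broad: any single feed may be malformed in a way
            # nobody anticipated, and the remaining feeds should still run.
            failed[url] = repr(exc)
    return {"sent": sent, "failed": failed}
```

The failed feeds end up in their own list, so they can be logged and looked at later while everyone else's mail still goes out.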
But back to dates for a minute. FeedBlitz is a date and time-driven application. Here, in brief, is how it works when deciding which posts to send out:
1) FeedBlitz asks the blog server if the feed has changed.
2) If the feed is downloaded, FeedBlitz inspects the datestamp in the feed container envelope.
3) If the container indicates entries changed in the timeframe that FeedBlitz is looking for, FeedBlitz goes on to examine the individual entries.
4) If the time stamp in the entry XML matches the time range FeedBlitz is checking, the post is included.
5) If there is no timestamp (e.g. RSS 0.9x, horrible format, please change to something smarter if you're still using it, there's really no excuse), FeedBlitz goes back to the source server and pulls each entry's link if the server says that the link changed in the relevant time frame. Data about the most recent entry is cached to limit bandwidth usage on future polls.
6) For posts with multiple date entries (typically created, modified and published dates), FeedBlitz uses created if available, modified if not. This means that if you edit a post the next day it won't be re-sent (it didn't use to be that way, but this prevents accidentally spamming all your subscribers if you make a wholesale change to your blog template).
7) After all this, FeedBlitz has a list of entries from the feed that match the date criteria. These are added to the outbound mail.
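The steps above can be sketched roughly as follows. This is an illustration only, not the real implementation: the dictionary keys (`updated`, `entries`, `created`, `modified`), the function names, and the half-open time window are all assumptions made for the example, and the per-link check for undated entries in step 5 is left out.

```python
from datetime import datetime, timezone
from typing import Optional


def entry_date(entry: dict) -> Optional[datetime]:
    """Step 6: prefer the created date, fall back to modified."""
    return entry.get("created") or entry.get("modified")


def select_entries(feed: dict, start: datetime, end: datetime) -> list:
    """Return the entries whose date falls in the window [start, end)."""
    container = feed.get("updated")
    # Steps 2-3: if the container date predates the window, skip the feed.
    if container is not None and container < start:
        return []
    selected = []
    for entry in feed.get("entries", []):
        when = entry_date(entry)
        # Step 4: keep entries dated inside the window. Undated entries
        # (step 5) would need the per-link check, which is omitted here.
        if when is not None and start <= when < end:
            selected.append(entry)
    return selected


if __name__ == "__main__":
    utc = timezone.utc
    feed = {
        "updated": datetime(2005, 11, 3, 9, 0, tzinfo=utc),
        "entries": [
            {"title": "new", "created": datetime(2005, 11, 3, 8, 0, tzinfo=utc)},
            {"title": "edited",
             "created": datetime(2005, 11, 1, 8, 0, tzinfo=utc),
             "modified": datetime(2005, 11, 3, 8, 30, tzinfo=utc)},
        ],
    }
    window = (datetime(2005, 11, 3, tzinfo=utc), datetime(2005, 11, 4, tzinfo=utc))
    print([e["title"] for e in select_entries(feed, *window)])
```

Note how the "edited" post is skipped: its created date is outside the window, so a later edit does not cause it to be re-sent.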
So when will a post not show up in your email? Most likely when:
1) The date does not conform to any reasonable formatting scheme (usually a problem with manually crafted feeds).
2) The feed wasn't updated at the time FeedBlitz scanned it.
3) The container date is not updated even if there are more recent posts.
4) The dates continually change when you republish, which continually moves the post out of the relevant scanning window.
5) Correctly formatted but meaningless dates are used (e.g. 1970 or 2038).
6) Your server couldn't be reached by FeedBlitz, or it failed to respond in a timely fashion.
7) There's a problem with FeedBlitz itself.
8) There's a sp*m filter blocking emails.
For what it's worth, #7 happens the least (although I appreciate that it doesn't feel like that this week) and #8 the most.
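For reasons 1 and 5 in particular, a feed reader can defend itself by checking that a date both parses and is plausible before trusting it. The sketch below is one way to do that, under stated assumptions: it only handles RFC 822 style dates (as used in RSS 2.0), the function name `plausible_date` and the one-day tolerance for future dates are made up for the example, and it is not how FeedBlitz itself does it.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def plausible_date(raw: str, now: datetime,
                   max_future: timedelta = timedelta(days=1)) -> Optional[datetime]:
    """Parse an RFC 822 style date; return None if unparseable or implausible."""
    try:
        when = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        return None  # reason 1: not a recognizable date format
    if when.tzinfo is None:
        # Assumption for this sketch: treat dates without a zone as UTC.
        when = when.replace(tzinfo=timezone.utc)
    # Reason 5: well formed but meaningless, e.g. the epoch or a far-future date.
    if when <= EPOCH or when > now + max_future:
        return None
    return when
```

A caller would pass the current time as `now` and treat a `None` result the same way as a missing timestamp, rather than letting the bad value stop the run.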
So. There you have it. The bad news is that there are, again, delays. The good news (such as it is) is that we're more or less on top of the situation, the fixes are in, and the mail will be delivered.
Thanks for bearing with us,
Phil
1 Comment:
Glad to hear you have a clear idea of what is causing the problem and a viable solution forthcoming.