Thursday, 5 September 2013

Task-execution-ETA in AppEngine Push Queues given lack of server clock synchronization


AppEngine Push Queues allow Tasks to be scheduled for future execution by adding them with the TaskOptions.etaMillis(...) option. This method expects a long parameter specifying when the task should run, as an absolute time in milliseconds since the epoch, i.e. the same kind of value returned by System.currentTimeMillis().

Given that AppEngine makes no guarantees about server clock synchronization, and that clocks can be off by something on the order of HOURS (see "Google I/O 2010 - Data pipelines with Google App Engine" at 0:36:07), how can this be reliable?

Let's consider the following example:
- An HTTP request comes in and gets routed to an instance whose clock happens to be 30 minutes ahead.
- While handling the request, I would like to defer some batch processing to a background task.
- I would like the results to be available to report back to the user within roughly 10 seconds.
- So I schedule the task with an ETA of System.currentTimeMillis() + 10,000.
- Given the 30-minute clock skew, this ETA actually corresponds to 30 minutes and 10 seconds from now.
- Thus, if the task is then processed by a different instance whose clock is accurate, it might be on hold for over 30 minutes.
- Needless to say, to the user it would look as if my service had died.
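
To make the arithmetic of this example concrete, here is a small, self-contained Java model of it. It is a sketch under stated assumptions, not App Engine code: each server's clock is modelled as "true time + skew", and the names SkewedEtaDemo, etaFromLocalClock and effectiveDelayMillis are hypothetical.

```java
/**
 * Hypothetical clock-skew model (not the App Engine SDK). Each server's wall
 * clock reads "true time + skew". The enqueuing instance builds an absolute
 * ETA from its own clock; the executing instance compares that ETA with ITS
 * own clock.
 */
public class SkewedEtaDemo {

    /** ETA computed on the enqueuing instance: its local clock plus the delay. */
    static long etaFromLocalClock(long trueNowMillis, long enqueuerSkewMillis, long delayMillis) {
        // What System.currentTimeMillis() would return on the skewed instance.
        long localNowMillis = trueNowMillis + enqueuerSkewMillis;
        return localNowMillis + delayMillis;
    }

    /** Real time that passes before the executing instance considers the task due. */
    static long effectiveDelayMillis(long trueNowMillis, long etaMillis, long executorSkewMillis) {
        // The executor's clock reaches etaMillis at true time etaMillis - executorSkewMillis.
        long trueTimeWhenDue = etaMillis - executorSkewMillis;
        return Math.max(0L, trueTimeWhenDue - trueNowMillis);
    }

    public static void main(String[] args) {
        long trueNow = 1_378_339_200_000L;    // arbitrary instant: 2013-09-05T00:00:00Z
        long thirtyMinutes = 30L * 60 * 1000; // enqueuing instance is 30 minutes fast
        long eta = etaFromLocalClock(trueNow, thirtyMinutes, 10_000L);
        long delay = effectiveDelayMillis(trueNow, eta, 0L); // executor's clock is accurate
        System.out.println("Intended delay: 10000 ms, effective delay: " + delay + " ms");
        // Prints: Intended delay: 10000 ms, effective delay: 1810000 ms
    }
}
```

Note that the skew only cancels out when the executing instance happens to be off by exactly the same amount as the enqueuing one.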

Is this somehow prevented by the underlying API? If not, how can task ETAs be useful at all? Wouldn't the ETA have to be specified as a relative time rather than an absolute one for this to work?
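
One way the relative-time idea could work is sketched below as a hypothetical design, not as anything the App Engine API offers: the client sends only a delay, and a single queue service turns it into an absolute ETA using its own clock when the task arrives, so the enqueuing instance's skew never enters the calculation. RelativeDelayQueue, add and pollDue are invented names, and the service clock is injected as a LongSupplier so that it can be faked.

```java
import java.util.PriorityQueue;
import java.util.function.LongSupplier;

/**
 * Hypothetical design sketch (NOT the App Engine API): clients submit a
 * relative delay, and the queue service stamps the absolute ETA using its
 * own clock when the task arrives.
 */
public class RelativeDelayQueue {

    /** A task plus its absolute ETA, measured on the service clock. */
    private static final class ScheduledTask {
        final long etaMillis;
        final String name;

        ScheduledTask(long etaMillis, String name) {
            this.etaMillis = etaMillis;
            this.name = name;
        }
    }

    private final LongSupplier serviceClock; // the only clock ETAs are compared against
    private final PriorityQueue<ScheduledTask> tasks =
            new PriorityQueue<>((a, b) -> Long.compare(a.etaMillis, b.etaMillis));

    RelativeDelayQueue(LongSupplier serviceClock) {
        this.serviceClock = serviceClock;
    }

    /** Stamps the ETA on receipt; the sender's System.currentTimeMillis() is never used. */
    void add(String name, long delayMillis) {
        tasks.add(new ScheduledTask(serviceClock.getAsLong() + delayMillis, name));
    }

    /** Removes and returns the earliest task that is due on the service clock, or null. */
    String pollDue() {
        ScheduledTask head = tasks.peek();
        if (head != null && head.etaMillis <= serviceClock.getAsLong()) {
            return tasks.poll().name;
        }
        return null;
    }

    public static void main(String[] args) {
        long[] now = {1_000_000L}; // fake service clock, advanced by hand
        RelativeDelayQueue queue = new RelativeDelayQueue(() -> now[0]);
        queue.add("batch-job", 10_000L);
        System.out.println(queue.pollDue()); // null: not due yet
        now[0] += 10_000L;
        System.out.println(queue.pollDue()); // batch-job
    }
}
```

The point of this design is that only one clock is involved, so skew between application instances cannot shift the ETA; whether the service clock itself is accurate is a separate question.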

The really sad part is that there actually is a method, TaskOptions.countdownMillis(...), that does take a relative time, but looking at the source code that ultimately handles this value, one sees that it is simply converted into an absolute time based on the same highly unreliable System.currentTimeMillis().

Worse still: if you specify neither an ETA nor a countdown, that code simply uses the current system time as the ETA rather than 0, so even a task that you expect to execute immediately might end up on hold for an hour or more!
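
The conversion described in the last two paragraphs can be modelled as follows. This is a deliberately simplified reconstruction of the behaviour as described in this post, not the SDK's actual source; CountdownResolutionModel, resolveEta and realWaitMillis are hypothetical names, and "one hour fast" is just an example skew.

```java
import java.util.OptionalLong;

/** Hypothetical model of how a countdown or missing ETA turns into an absolute ETA. */
public class CountdownResolutionModel {

    /**
     * Countdown -> local clock + countdown; explicit ETA -> used as-is;
     * neither -> local clock (rather than 0, i.e. "as soon as possible").
     */
    static long resolveEta(OptionalLong etaMillis, OptionalLong countdownMillis, long localNowMillis) {
        if (countdownMillis.isPresent()) {
            return localNowMillis + countdownMillis.getAsLong();
        }
        return etaMillis.orElse(localNowMillis);
    }

    /** Real waiting time when the executing server's clock is off by executorSkewMillis. */
    static long realWaitMillis(long etaMillis, long trueNowMillis, long executorSkewMillis) {
        return Math.max(0L, etaMillis - executorSkewMillis - trueNowMillis);
    }

    public static void main(String[] args) {
        long trueNow = 0L;
        long enqueuerSkew = 60L * 60 * 1000; // enqueuing instance runs one hour fast
        long localNow = trueNow + enqueuerSkew;

        long withCountdown = resolveEta(OptionalLong.empty(), OptionalLong.of(10_000L), localNow);
        long withNothing = resolveEta(OptionalLong.empty(), OptionalLong.empty(), localNow);

        // Executing server's clock assumed accurate (skew 0).
        System.out.println(realWaitMillis(withCountdown, trueNow, 0L)); // 3610000
        System.out.println(realWaitMillis(withNothing, trueNow, 0L));   // 3600000
    }
}
```

In this model, the "immediate" task waits exactly as long as the enqueuing clock is fast, and the relative countdown inherits the same error.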

Is this a major bug, or am I missing something?
Also, the same should apply to leases of Tasks in Pull Queues, right?
