2006-11-27

Catalyst Job Queue - Part 2 - Issues

Differentiating between Job and HTTP Requests

The first issue raised was how the application code can determine whether a request came from the JobQueue or from a real user.

The solution to this one is simple: let the JobQueue set a custom HTTP header, say X-Via-JobQueue: yes, and have your code check for the presence or absence of the header and act accordingly.
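As a language-neutral illustration of that check (written in Python rather than as Catalyst code), here is a minimal sketch. It assumes the application sees a CGI-style environment, where a request header such as X-Via-JobQueue shows up as the variable HTTP_X_VIA_JOBQUEUE. The function name is_job_request is made up for this example.

```python
# Hypothetical helper: only the header name X-Via-JobQueue comes from the post;
# the function name and the dict-shaped environment are assumptions.
def is_job_request(environ):
    """Return True if the CGI-style environment carries the job queue marker."""
    # CGI exposes the request header "X-Via-JobQueue" as HTTP_X_VIA_JOBQUEUE.
    value = environ.get("HTTP_X_VIA_JOBQUEUE", "")
    return value.strip().lower() == "yes"


if __name__ == "__main__":
    print(is_job_request({"HTTP_X_VIA_JOBQUEUE": "yes"}))
    print(is_job_request({"REQUEST_METHOD": "GET"}))
```

A real client could send the same header, so anything that faces users would probably have to strip it before the request reaches the application.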

Job Results

The second problem is to find a more sensible way to deal with job results.

For now, the result is printed to STDOUT, but it would be nice to be able to store it and serve it (perhaps via HTTP) and/or send it by email.

Also, the way a job is removed from the queue based on its result status should be configurable (i.e. it should be possible to provide a list of status codes for which the job is to be removed from the queue).
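One possible shape for that configuration, sketched in Python purely as an illustration: when no list is given, fall back to the rule from Part 1 (remove on status 0 or on anything greater than or equal to 400); otherwise remove only on the listed codes. The name should_remove and the remove_on parameter are invented for this sketch.

```python
# Hypothetical sketch: should_remove and remove_on are made-up names.
def should_remove(status, remove_on=None):
    """Decide whether a job is dropped from the queue after a run."""
    if remove_on is None:
        # Default: the Part 1 rule, status 0 or an HTTP error status.
        return status == 0 or status >= 400
    # Configured: remove only for the explicitly listed status codes.
    return status in remove_on


if __name__ == "__main__":
    print(should_remove(200))                    # default rule
    print(should_remove(200, remove_on={200}))   # run-once style configuration
```

With remove_on set to {200}, a job would be dropped after its first successful run, which is one way to get run-once behaviour without faking an error.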

One-time Jobs

Another issue is with jobs that should run a limited number of times. Currently a job will run forever at the specified intervals. Running a job just once is possible by making it return an error code, but that is clumsy, and it doesn't allow for running jobs at "now + x minutes/hours/days".

What is needed to solve this issue is the ability to schedule a job using an at-like syntax or a natural time specification (like "next Friday", using DateTime::Format::Natural).
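To make the "now + x minutes/hours/days" form concrete, here is a small Python sketch that turns such a string into an absolute run time. It covers only that one form. Natural specifications like "next Friday" are deliberately rejected, since they would need a real parser such as DateTime::Format::Natural. The function name parse_at_spec and the exact accepted grammar are assumptions made for this example.

```python
import re
from datetime import datetime, timedelta

# Hypothetical grammar: "now + <number> <minute(s)|hour(s)|day(s)>".
_SPEC = re.compile(r"^now\s*\+\s*(\d+)\s*(minute|hour|day)s?$")
_TIMEDELTA_KEY = {"minute": "minutes", "hour": "hours", "day": "days"}


def parse_at_spec(spec, now=None):
    """Return the datetime at which a one-time job should run."""
    if now is None:
        now = datetime.now()
    match = _SPEC.match(spec.strip().lower())
    if match is None:
        raise ValueError(f"unsupported time specification: {spec!r}")
    amount = int(match.group(1))
    unit = match.group(2)
    return now + timedelta(**{_TIMEDELTA_KEY[unit]: amount})


if __name__ == "__main__":
    print(parse_at_spec("now + 90 minutes"))
```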

Job Scheduling

We would also like to programmatically schedule jobs from within the main application (initiated, of course, by a user HTTP request). This requires a communication channel or, more exactly, a framework for communicating between the job queue server and the HTTP server.

Continuation Jobs

An easier way to set up a job in the job queue is to save the state of a request with Catalyst::Plugin::Continuation and have the job queue server retrieve it. Unfortunately this would also require a communication channel to transmit the session_id of the request, but it would save us the effort of setting up a request.

2006-11-24

Website Rings

Today I (re-)discovered that Website Rings still exist. In my mind these go together with AltaVista and GeoCities.

2006-11-21

Micromanaging zombies

From the "I misread it on the internet" category:

Micromanaging zombies creates employees

(misread on the Creating Passionate Users blog)

... No further comments

2006-11-20

Catalyst Job Queue - Part 1 - First Implementation

Overview

The JobQueue's current incarnation is a proof-of-concept (i.e. no tests, no docs) Engine based on Catalyst::Engine::HTTP::POE.

Job

Right now, a job is a simple affair, which is translated into an application HTTP request. Internally the job is a hash associated with an ID (the ID is the refaddr of the hash). The hash is built from a crontab (5) line and contains the job data:

  • cronspec - a crontab (5) periodicity specification, passed to POE::Component::Cron
  • user - the user the job should be run as (not necessarily a system user). Currently unused, will be used for the authentication process
  • request - the request URI passed to the application code, may contain params.
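The post does not show the crontab line itself, so the following Python sketch assumes the system crontab layout: five time fields, then the user, then the request URI in place of the command. The sample line and the name parse_job_line are invented, and Python's id() stands in for Perl's refaddr as the job ID.

```python
# Hypothetical sketch: assumes "<5 cron fields> <user> <request URI>" per line.
def parse_job_line(line):
    """Build the job data (as a dict) from one crontab-style line."""
    fields = line.strip().split(None, 6)
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    job = {
        "cronspec": " ".join(fields[:5]),  # periodicity, e.g. "*/5 * * * *"
        "user": fields[5],                 # currently unused by the engine
        "request": fields[6],              # request URI, may contain params
    }
    # The ID is tied to the object's identity, like refaddr on a Perl hash.
    return id(job), job


if __name__ == "__main__":
    job_id, job = parse_job_line("*/5 * * * * admin /reports/daily?format=txt")
    print(job_id, job)
```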

Running a job

What the engine does is schedule a job_run routine to be run according to the cronspec. This routine gets the job ID passed as param. It then sets these %ENV vars (a la CGI engine):

  • REMOTE_ADDR - 127.0.0.1 (technically true, although we have no TCP/IP connection)
  • REMOTE_HOST - localhost (see above comment)
  • REQUEST_METHOD - GET (should be configurable though it is not)
  • SERVER_NAME - 127.0.0.1 (doesn't make much sense, no TCP/IP connection)
  • SERVER_PORT - 80 (see above comment)
  • SERVER_PROTOCOL - HTTP/1.0 (should be configurable too)
  • PATH_INFO - the path part of the request URI configured in the crontab
  • QUERY_STRING - the rest of the request URI

After this the request almost looks like a real HTTP request, so it can be processed like a usual request (by the engine methods inherited from Catalyst::Engine::CGI and the rest of the application); the engine just goes through the motions (as shown in C::E::HTTP::POE).
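The environment setup listed above can be sketched as follows. This is a Python illustration, not the engine's actual Perl code. It returns the variables as a dictionary instead of touching the real process environment, and the function name build_job_env is made up.

```python
from urllib.parse import urlsplit


# Hypothetical sketch: values mirror the list above, the function is invented.
def build_job_env(request_uri):
    """Return the CGI-style variables for a job's request URI."""
    parts = urlsplit(request_uri)
    return {
        "REMOTE_ADDR": "127.0.0.1",
        "REMOTE_HOST": "localhost",
        "REQUEST_METHOD": "GET",          # hard-coded for now
        "SERVER_NAME": "127.0.0.1",
        "SERVER_PORT": "80",              # environment values are strings
        "SERVER_PROTOCOL": "HTTP/1.0",
        "PATH_INFO": parts.path,          # path part of the request URI
        "QUERY_STRING": parts.query,      # everything after the "?"
    }


if __name__ == "__main__":
    for name, value in build_job_env("/reports/daily?format=txt").items():
        print(f"{name}={value}")
```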

The response from the application code is printed by the CGI Engine to STDOUT. Now, in a CGI environment this would send the response back to the client, but here we have no client so the response is just printed to the console (or /dev/null depending on how you started the job queue server).

If the job status (returned by the application code) is either 0 or greater than or equal to 400 (which would translate to an HTTP client or server error status code), the job is removed from the queue. If not, it will be cleaned up and re-run at the appropriate time.

(to be continued)