happiest unalice ever

October 8, 2008

Rebootable Talos Machines (and why that’s a good thing)

Filed under: mozilla, talos — alice @ 3:27 pm

Last quarter we optimistically added Bug 447696 (make talos machines rebootable) to the Release Engineering goals list.  We wanted to solve this consistently repeated order of events:

  1. Talos box starts behaving oddly (low/high numbers, stopped reporting altogether, etc.), developer notices and files bug
  2. IT looks at bug; the Talos documentation is incredibly complex and doesn’t inspire confidence
  3. IT hands bug to Release Eng
  4. Release Eng attempts to fix the machine without having to resort to a reboot
  5. Reboot deemed necessary, IT bug filed for manual reboot
  6. Machine rebooted, IT passes bug back to Release Eng
  7. Release Eng goes through manual configuration steps to get machine back to stable state
  8. Release Eng restarts buildbot slave
  9. Success!

The worst of this is that all those steps with ‘Release Eng’ really end up meaning ‘Alice’.  The Talos project suffers somewhat from having all of its knowledge centralized in a single human - and this human wants to take a holiday now and then.

Thankfully, we are now in a place where all active Talos boxes are fully rebootable.  Go ahead - reboot whatever you want.  It should come back up clean and ready to test.  The new order of operations is thus:

  1. Talos box starts behaving oddly (low/high numbers, stopped reporting altogether, etc.), developer notices and files bug
  2. IT looks at bug; whatever the problem is, IT reboots the machine
  3. Success!

We got here by having me carefully tease out the various configuration settings necessary on our five supported platforms (WinXP, Vista, Tiger, Leopard, Ubuntu) and then learning all about how automation works under various frameworks (batch files! plists! rc.local! etc, etc, etc).  Once I had created a plan of attack for each platform, I moved on to manually updating our 58 active Talos machines.  Big thanks to John O’Duinn and Nick Thomas for helping out with the machine updates; it would have taken a lot longer without their assistance.  You can dig around from that main bug to all the various bits and pieces that it took to put this together.  It was time-consuming.  It was painful.  It was totally worth it.

Did I mention that I’m going on holiday next month…?

September 30, 2008

Standalone Talos, V1.3.1

Filed under: mozilla, talos — alice @ 4:14 pm

I recently updated Standalone Talos to version 1.3.1.  This version includes updates to Talos and Pageloader code along with some simple fixes:

  • Upon starting, Standalone Talos now warns the user that all open browser windows must be closed before testing can begin and then exits.  This was in response to the not terribly pleasant behavior described in Bug 454999 - Standalone talos kills all running Firefox processes.  Be warned that Talos enters the browser kill/clean up code after each individual test is run - so if you start a secondary Firefox during testing it will get summarily killed.
  • Removed some bad pages from the included web page test set used by Tp.  A couple of the included pages displayed Flash warnings and could kill the browser.  Since these problems were most likely introduced in the attempt to clean up the pages for testing purposes, they aren’t considered ‘real’ browser crashes and we’d rather just ignore them.
  • Better ability to recognize browser freeze up/crash.  Way back with Bug 416911 - per-test timeout in talos, code was added to Talos to monitor the browser for activity - if it doesn’t load a page in a given amount of time it is considered to be busted. This is finally making its way into Standalone Talos.
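
For the curious, the timeout idea in that last bullet can be sketched in a few lines of Python.  This is not the actual Talos code: the function name run_with_timeout, the timeout values and the stand-in "browser" command are all made up for illustration.  It is also simplified to a single deadline for the whole process, rather than watching for per-page activity.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd and wait at most timeout_seconds for it to exit.

    Returns (finished, return_code). finished is False when the process
    was still running at the deadline and had to be killed.
    Hypothetical helper, not part of Talos.
    """
    proc = subprocess.Popen(cmd)
    try:
        return_code = proc.wait(timeout=timeout_seconds)
        return True, return_code
    except subprocess.TimeoutExpired:
        # No exit within the allowed time: treat the process as hung.
        proc.kill()
        proc.wait()  # reap the killed process so it doesn't linger
        return False, None


if __name__ == "__main__":
    # Stand-in for a browser that never finishes loading its page.
    hung_browser = [sys.executable, "-c", "import time; time.sleep(30)"]
    finished, _ = run_with_timeout(hung_browser, timeout_seconds=1)
    print("finished normally" if finished else "considered busted, killed")
```

A real harness would more likely reset the clock every time the browser reports a loaded page, so that a long but healthy run isn't killed by mistake.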

Currently, Standalone Talos is simply a zipped-up package of code, manifest files and test pages.  In the future it would be smarter to have it checked into its own directory or have its own branch - as is, it only gets updated if I happen to remember that it has fallen out of date.  That said, if you see a fix go into Talos that hasn’t yet been included in a version of Standalone Talos, feel free to file a bug under the component “Release Engineering: Talos”.
