happiest unalice ever

January 9, 2009

Investigating Unthrottling Talos Boxes

Filed under: mozilla, talos — alice @ 3:12 pm

Way back in the day, when the Talos project was just getting off the ground, we had concerns about the granularity of JavaScript timers and their ability to accurately measure performance results. We quickly decided to go with a CPU throttling system, whereby we could slow down the testing and get repeatable test results (Bug 393940 - throttle mac mini XP speed down to something slower). Since that day we’ve had throttling on all Talos WinXP, Vista and Ubuntu boxes, and would have had it on Tiger and Leopard too if we’d managed to figure out a way to do it.

John Resig recently put together a very thorough blog post about the accuracy of JavaScript time. From his graphs you can see that Firefox 3 does a fine job of reporting timing to the millisecond, and it appears to have been doing so since Bug 363258 - bad millisecond resolution for (new Date).getTime() / Date.now() on Windows was fixed back in the summer of 2007. Now, I could be wrong here, but it was my understanding that the timing was still bad on Firefox 2, which Talos continued to test until just last month, when the boxes for that branch were retired.
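For anyone who wants to check timer resolution on their own machine, here is a minimal sketch, not the method used in John Resig's post or in Talos. It busy-waits until Date.now() reports a new value and keeps the smallest step it sees. The function name estimateTimerGranularity and its parameters are made up for illustration; the clock is passed in as an argument only so the logic can be tried against a fake clock.

```javascript
// Estimate the smallest step by which a millisecond clock advances.
// `samples` and `now` are illustrative parameters, not part of any Talos code.
function estimateTimerGranularity(samples = 50, now = Date.now) {
  let smallest = Infinity;
  for (let i = 0; i < samples; i++) {
    const start = now();
    let next = now();
    // Busy-wait until the reported time moves forward.
    while (next <= start) {
      next = now();
    }
    smallest = Math.min(smallest, next - start);
  }
  return smallest;
}

// On a clock with good resolution this should report a step of about 1 ms;
// a coarse clock would report a much larger step.
console.log(`Smallest observed Date.now() step: ${estimateTimerGranularity()} ms`);
```

The result depends on the browser or runtime and on the operating system's clock, so treat the number as a rough indication rather than a measurement of anything Talos reports.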

In either case, we now have good timing on Firefox 3 and we no longer have to worry about the Firefox 2 question, so it’s time to examine whether we want to keep throttling Talos boxes. There are several obvious pluses to not running throttled: faster test results, cross-comparison of performance results across all platforms (at the moment, we cannot compare Tiger/Leopard results to anything else as they are unthrottled), a closer match between the test environment and end users’ machines, and a simpler setup for the Talos boxes themselves. We will lose the ability to compare current results to any historic data that we have already collected - turning off throttling doesn’t simply halve the results, so there isn’t any easy way to tell what a new unthrottled result “means” in comparison to an old throttled result.

I’ve started to take steps to get us to an unthrottled state (Bug 468680 - unthrottle talos winxp, vista and ubuntu boxes.). For now that just means turning off throttling on all of our Talos staging machines and letting them run for a while. In a week I’ll look over the numbers and see if we are getting consistent, low-variance results out of the newly unthrottled machines. If all that goes smoothly, a downtime will be scheduled, the scripts that control throttling will be removed from all the Talos machines, and a drop will be seen in all performance results on WinXP/Vista/Ubuntu.
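One way to put a number on "consistent, low-variance" is the coefficient of variation (standard deviation divided by mean) for each test on each machine. The sketch below is only an illustration of that idea, not the analysis Talos actually runs; the function names, the 2% threshold and the sample numbers are all assumptions.

```javascript
// Summarize repeated results for one test on one machine.
// Uses the sample standard deviation (divides by n - 1).
function summarizeRuns(runs) {
  if (runs.length < 2) {
    throw new Error("need at least two runs to estimate variance");
  }
  const mean = runs.reduce((sum, x) => sum + x, 0) / runs.length;
  const variance =
    runs.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (runs.length - 1);
  const stddev = Math.sqrt(variance);
  return { mean, stddev, cv: stddev / mean };
}

// maxCv = 0.02 (2%) is an arbitrary, hypothetical threshold.
function isConsistent(runs, maxCv = 0.02) {
  return summarizeRuns(runs).cv <= maxCv;
}

// Made-up page-load times in milliseconds, purely for illustration.
const exampleRuns = [100, 101, 99];
console.log(summarizeRuns(exampleRuns), isConsistent(exampleRuns));
```

Comparing the coefficient of variation rather than the raw standard deviation makes throttled and unthrottled runs comparable in terms of noise even though their absolute numbers differ.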

Just for the record, this does break my heart just a little bit. Controlling throttling on Talos boxes has been one of the biggest headaches of the system. Turns out that modern operating systems don’t really want you to do anything to touch the CPU, and are quite happy to hide it very deep and ignore requests when they see fit. That throttling is on consistently across three operating systems and is correctly restarted on reboot is due to weeks of effort. Now that you have all shed a single tear for my loss I’ll get back to removing all my throttling scripts. Sunrise, sunset and all that.

2 Comments »

  1. >Just for the record, this does break my heart just a little bit.

    The tears will be overcompensated by the love that you get from all the people waiting for the tinderbox to finally deliver its results. Just try to estimate how many hours you alone will save the project.

    Comment by bernd — January 10, 2009 @ 7:35 am

  2. [...] Although in these cases, the regression was actually a win in terms of performance, it shows that the algorithm works. The second regression is due to Alice unthrottling the Talos boxes. [...]

    Pingback by Automated Talos Analysis | chris' random ramblings — February 5, 2009 @ 8:45 pm
