If you’ve been following along with Bug 480413 - design test to monitor browser shut down time, you’ll know that I attempted to roll out Tshutdown on the 18th. Unfortunately, there were a couple of bugs that weren’t discovered in staging, and I spent 6 or 7 hours on a beautiful Saturday afternoon attempting to fix them on the fly. Unable to make it work, I had to back it all out and fume.
With a clearer head I approached the bugs on Monday morning and resolved both of them. Then it was just a question of waiting for an appropriate downtime. Though I’m as excited about 3.5b4 as anyone, it really gets in the way of arranging Talos code changes. Finally, yesterday afternoon we were able to shut down everything and check things in.
I’m glad to say that this landing was successful. There’s a good chance that Tshutdown will clean up the issues in Bug 478603 - intermittent orange on Windows mozilla-central talos Ts and Tdhtml tests (“failed to initialize browser”), as we’ll start to record the longer shutdown times instead of freezing up and reporting orange.
It’ll be a few more days till we have enough data for Tshutdown to look like anything but a scatter graph, but you can check out the reported results at graphs-new.mozilla.org. There’s actually more to Tshutdown than a single test. You’ll see “Tp3 Shutdown”, “Tp3 Nochrome Shutdown”, “Tp3 Fast Shutdown” and “Ts Shutdown”. Basically, we record the shutdown time after running the given test. The shutdown after Ts gives a ‘clean’ shutdown time, as we run Ts with an empty, new profile; the shutdown after Tp3 gives a ‘dirty’ shutdown time, as the browser has just completed cycling 10 times through 400 web pages. The post-Tp3 results will also show greater variance, because we only run Tp3 once per full test cycle (the test does take a good hour or more to complete depending on the platform) and so we only have that single value to report. With Ts we rapidly open and close the browser 20 times, so we have a data set that we can average to get a more consistent value.
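To make the single-value versus averaged distinction concrete, here is a rough sketch of the idea. It is not the actual Talos code: the function names are made up for illustration, and `terminate()` is only a stand-in for however the harness really asks the browser to quit.

```python
import statistics
import subprocess
import sys
import time


def measure_shutdown_ms(proc, timeout=60.0):
    """Ask a running process to exit; return milliseconds until it is gone.

    Hypothetical helper: terminate() stands in for a real quit request.
    """
    start = time.monotonic()
    proc.terminate()
    proc.wait(timeout=timeout)
    return (time.monotonic() - start) * 1000.0


def summarize(samples):
    """Return (mean, sample standard deviation) for a list of timings.

    With a single sample (the post-Tp3 case) there is no spread to
    compute, so the second value is None.
    """
    mean = statistics.mean(samples)
    spread = statistics.stdev(samples) if len(samples) > 1 else None
    return mean, spread
```

Something like `summarize([measure_shutdown_ms(p) for p in procs])` over 20 Ts-style runs gives a mean plus a spread, while a single Tp3-style run can only report its one raw number, which is why those graphs look noisier.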
I’m very pleased to get the whole mess put to bed. There were non-threadsafe Python libraries (subprocess, I’m looking at you) to deal with, twisted banana errors (I kid you not), and a whole mess of timing issues. You can’t build what you don’t monitor and shutdown is an important part of our users’ experience - hopefully we’ll be able to start to trim down our shutdown time now that we are reporting results from all our active Talos boxes.
[...] Basically, each Talos test assumed that the previous suite had already ended and exited the browser successfully. However, sometimes (usually Vista!), we found that closing a healthy browser took longer than expected. This would cause the next Talos suite to fail out because of the lingering process left by the previous Talos suite. This fix should greatly reduce intermittent oranges from Talos in mozilla-central, mozilla-1.9.1 and tracemonkey. In the few days since it’s been enabled, things look much better already! 2) Users care about shutdown times. Just like we measure startup time, it feels right to measure shutdown time. It was never measured before, but once the idea came up, this felt like a good thing to measure. There are also some edge cases where users exit-and-quickly-restart Firefox, which can become unhappy if the browser process is still slowly closing down. The curious can find more details in Alice’s blogpost here. [...]
Pingback by John O’Duinn’s Soapbox » Talos now measures shutdown times — May 4, 2009 @ 12:39 pm