happiest unalice ever

May 6, 2011

Addon Performance Testing: Updates and Future Work

Filed under: addons, mozilla, talos — alice @ 4:36 pm

The addons performance testing system has been up and live for a few weeks now.  With so many more eyes than mine on the system I’ve seen a bunch of bug filings, which is awesome.  With each bug fixed the Talos system works better, both for addons and for general Firefox performance testing.

Here’s what’s already fixed and rolled out:

Here’s what’s fixed but waiting on deployment (which will probably happen early next week):

There are still more bug fixes in the works. Next on my list is Bug 648225 - Performance of platform-dependent add-ons is not tested, which will improve the testing system’s simulation of the real world.

In terms of future plans, we are going forward with a second quarter goal of completing an on-demand addon testing service.  Basically, this would allow an addon author to request that their addon be tested at any time, instead of waiting for our weekly tests of the 100 most downloaded addons.  This will give us coverage of more addons along with a means to double or triple check results.  Did your addon perform poorly?  Retest!  Are you suspicious of the results?  Retest!  Did the addon fail to download or install?  Retest!  If you want to follow along, the bugs that will lead to this system are:

Once the on-demand system is in place, we’ll be working to introduce a greater variety of tests.  The ts (test startup) was an easy test to begin with, but it can be of limited meaning for a lot of addons.  I’d like us to cover far more of the available Talos tests, concentrating on the tp (test pageload) tests.  Tp is interesting because it uses a set of collected, local web pages (100 culled from the Alexa top 300 list of worldwide top used sites) that are then cycled through ten times.  With a given addon installed and active (for some meaning of ‘active’ which will be different for different addons) this will give a greater idea of how real world page load time is impacted.  As a side benefit of Tp, Talos will monitor the memory footprint and CPU usage during this test.  By comparing an addon run to a no-addon run we’ll be able to observe memory and CPU usage differences.
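As a rough sketch of the addon versus no-addon comparison described above (not the actual Talos code): the function name `percent_change` and the sample numbers are made up for illustration, and using the median is an assumption rather than the way Talos summarizes its results.

```python
from statistics import median


def percent_change(baseline, with_addon):
    """Percent change of the with-addon median relative to the baseline median."""
    base = median(baseline)
    if base == 0:
        raise ValueError("baseline median is zero")
    return (median(with_addon) - base) / base * 100.0


# Hypothetical page load times in milliseconds, for illustration only.
baseline_ms = [200, 210, 190, 205, 195]
addon_ms = [220, 231, 209, 226, 214]

change = percent_change(baseline_ms, addon_ms)
print(f"page load time changed by {change:.1f}% with the addon installed")
```

The same function would work for memory or CPU samples, as long as both runs come from the same machine and page set.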

I believe that we are also going to need to put effort into providing some testing hooks/prefs for Talos to use.  One example is Bug 459965 - Add standardized support for first-run pages to install.rdf.  Talos doesn’t react well to first-run pages and it would be great to have the means to disable them with a single pref, instead of a customized pref per addon, especially since users normally see the first-run page only once, right after installation, and never again.  I believe that there are probably some other settings like this that would standardize creating a Talos testing environment, and thus make addons better suited to the type of bulk testing that Talos does.

Testing addons has been a whole new area of Talos testing, and it has its own unique set of challenges.  I’ve spent most of my time at Mozilla concentrating on automated browser performance testing and the addons world is still quite new to me.  Each addon affects the browser in its own way; while I’ve grown accustomed to standardizing tests across browser versions, platforms and machines, this is definitely a new horizon.  The Talos tests are just one way to look at performance impact, not the final word.  They may never be appropriate for some types of addons, but I most definitely want to work with addon developers to try and get the best coverage that we can.

October 13, 2010

Plans for Tp5

Filed under: mozilla, talos — alice @ 4:13 pm

When I was writing the history of Tp tests for the Talos wiki page I realized that it is time to update Tp.

Good to start with some background first.  Tp stands for ‘test pageload’.  It is a basic browser test that cycles through a number of web pages.  The web pages are culled from the Alexa Top 500. The current version of Tp is Tp4.  To create it I grabbed a local copy of as many of the top 500 as I could (some pages always seem to be unreachable). The pages were then narrowed down to the top 100 by following these guidelines:

  • Does the page correctly load from a local copy?  Are all the images/video present?  Does the layout/css still work?  Drop if the page is broken in any way.
  • Is the page a duplicate?  The Alexa Top 500 contains many localized copies of the Google home page and one is enough.  Drop if it is a dupe or there is already a similar page in the set.
  • Is the page ‘interesting’?  Does it contain a font that isn’t covered by another page?  Does it have many images?  Complicated or large layout?  Keep if the page meets any of these criteria.
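The guidelines above can be read as a simple filter. This is only a sketch: the checks were judgment calls made by hand, so the boolean flags below stand in for a human review, and the `Candidate` class, its `site` grouping key, and the choice to drop working but uninteresting pages are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    url: str
    site: str            # hypothetical grouping key used to spot duplicates
    loads_locally: bool  # local copy renders with images, video and CSS intact
    interesting: bool    # new font, many images, or a complicated/large layout


def select_pages(candidates, limit=100):
    """Apply the three guidelines in order and stop once `limit` pages are kept."""
    kept = []
    seen_sites = set()
    for page in candidates:
        if not page.loads_locally:
            continue  # broken in some way: drop
        if page.site in seen_sites:
            continue  # duplicate or too similar to a kept page: drop
        if not page.interesting:
            continue  # assumption: pages meeting none of the criteria are dropped
        seen_sites.add(page.site)
        kept.append(page)
        if len(kept) == limit:
            break
    return kept


# Hypothetical candidates, for illustration only.
candidates = [
    Candidate("google.com", "google", True, True),
    Candidate("google.de", "google", True, True),
    Candidate("broken.example", "broken", False, True),
    Candidate("plain.example", "plain", True, False),
]
selected = select_pages(candidates)
```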

The question now is whether we want to repeat these steps to create Tp5.  The main complaint against Tp4 is that the Alexa Top 500 points to the home pages of sites - say the login screen for Facebook or the basic en-US Google page.  Yet these are not the things that people actually use each site for.  If we wanted a representative Facebook page it should be someone’s logged-in home page; if we wanted a representative Google page it should be a page of search results, or, even better, a Gmail user’s inbox.  So, how do we go about doing this?  How do we choose the best representation of a given site?

Bug 601798 - create tp5 pageset has been filed to track requirements for the new Tp5 test web page set development.  I have some crazy theory that I could post instructions of how to make local copies of web pages and then post a ‘Most Wanted’ list and see if anybody out there is willing to send me content.  Anybody up for that?  Do you want to have your Gmail inbox be the standard for Mozilla testing?  Is that awesome - or terrifying?

October 7, 2010

Talos in Hg! Plus New Version of Standalone Talos

Filed under: mozilla, talos — alice @ 4:25 pm

Bug 556530 - move Talos from cvs into Mercurial has landed and there is much rejoicing. The version of Talos in CVS will no longer be maintained.

As part of the switchover I’ve updated Standalone Talos to V2. The only change in this version is that it is based upon the hg talos repo.

September 27, 2010

Talos Code in Hg (almost)

Filed under: mozilla, talos — alice @ 5:00 pm

The repository has been created, the buildbot patches have been r+ed and we are now so, so close to having a full switch over of Talos from CVS to Mercurial.  Talos has long lagged behind the rest of the Mozilla universe by stubbornly staying with CVS.  The main sticking point was a belief that we would have to install Mercurial on all Talos boxes to get parity with the current CVS set up - as Talos is checked out from CVS per-test run by all Talos boxes.  In the end, I found that I could avoid the requirement of getting Mercurial installed on upwards of 300 Talos boxes running 7 different operating systems.  The new design has Talos downloaded as a ZIP per-run, making the system version control agnostic.
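A minimal sketch of the ZIP-per-run idea, assuming nothing about the real buildbot steps: the function names and the URL argument are hypothetical, and the point is only that a test box needs an HTTP client and an unzip step, with no version control tool installed.

```python
import io
import os
import tempfile
import urllib.request
import zipfile


def fetch_archive(url):
    """Download the archive bytes from `url` (a hypothetical Talos ZIP location)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def unpack_archive(zip_bytes, dest=None):
    """Extract the archive into a fresh directory and return that directory."""
    if dest is None:
        dest = tempfile.mkdtemp(prefix="talos-run-")
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(dest)
    return dest


def prepare_run(url):
    """Fetch and unpack a clean copy of the harness for a single test run."""
    return unpack_archive(fetch_archive(url))
```

Because every run starts from a freshly unpacked directory, a box never carries state over from a previous checkout.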

Right now we are only blocked on Releng Downtime - unscheduled for week of Sept 27, 2010.  This is a mega-downtime bug that has been accumulating blocked bugs for weeks while Release Engineering tries to find a gap in the Firefox 4 beta 7 code freeze schedule to safely close the trees and do landings.  You can play along at home with the code freeze by watching the beta 7 blocking bugs.

For now all Talos patches are being landed in both Mercurial and CVS.  I wish good luck to all Release Engineering and developer folks working hard on beta 7, for the simple selfish reason that I’m very much looking forward to never, ever having to check in a patch to CVS again.

September 24, 2010

Standalone Talos V1.9

Filed under: mozilla, talos — alice @ 3:14 pm

V1.8 was unable to run the twinopen tests due to Bug 589194 - Make twinopen work when XUL is disabled.  The main Talos code was patched but the change wasn’t ported to Standalone Talos.  V1.9 fixes that oversight.

September 2, 2010

Talos Documentation Updates

Filed under: mozilla, talos — alice @ 3:43 pm

I recently put some time into updating the Talos documentation. There are now sections describing how numbers are calculated and the history of the Tp test, along with updated descriptions of the tests and the Talos hardware, where to file bugs, and so on.

Any comments and suggestions on what needs addition or further clarification would be great.  I’m mostly going by the questions that are directed at me the most frequently.

May 7, 2009

Standalone Talos V1.5

Filed under: mozilla, talos — alice @ 11:30 am

In this version:

See Standalone Talos documentation for download and installation instructions.

April 30, 2009

Tshutdown Goes Live

Filed under: mozilla, talos — alice @ 4:42 pm

If you’ve been following along with Bug 480413 - design test to monitor browser shut down time you’ll know that I attempted to roll out Tshutdown on the 18th. Unfortunately, there were a couple of bugs that weren’t discovered in staging and I spent 6 or 7 hours on a beautiful Saturday afternoon attempting to fix them on the fly. Unable to make it work, I had to back it all out and fume.

With a clearer head I approached the bugs on Monday morning and resolved both of them. Then it was just a question of waiting for an appropriate downtime. Though I’m as excited about 3.5b4 as anyone, it really gets in the way of arranging Talos code changes. Finally, yesterday afternoon we were able to shut down everything and check things in.

I’m glad to say that this landing was successful. There’s a good chance that Tshutdown will clean up the issues in Bug 478603 - intermittent orange on Windows mozilla-central talos Ts and Tdhtml tests (”failed to initialize browser”), as we’ll start to record the longer shutdown times instead of freezing up and reporting orange.

It’ll be a few more days till we have enough data for Tshutdown to look like anything but a scatter graph, but you can check out the reported results at graphs-new.mozilla.org. There’s actually more to Tshutdown than a single test. You’ll see “Tp3 Shutdown”, “Tp3 Nochrome Shutdown”, “Tp3 Fast Shutdown” and “Ts Shutdown”. Basically, we record the shutdown time after running the given test. Post Ts results in a ‘clean’ shutdown time as we run Ts with an empty, new profile; post Tp3 results in a ‘dirty’ shutdown time as the browser has just completed cycling 10 times through 400 web pages. The post Tp3 results will also show greater variance because we only run Tp3 once per full test cycle (the test does take a good hour or more to complete depending on the platform), so we only have that single value to report. With Ts we rapidly open and close the browser 20 times, so we have a data set that we can average to get a more consistent value.
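To see why 20 Ts samples give a steadier number than one Tp3 sample, here is a small sketch. It is not how Talos computes its reported value; `summarize` and the sample times are invented for illustration. The standard error of a mean shrinks with the square root of the sample count, while a single sample gives no way to estimate the spread at all.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(samples):
    """Return (mean, standard error); the standard error is None for one sample."""
    if len(samples) == 1:
        return samples[0], None
    return mean(samples), stdev(samples) / sqrt(len(samples))


# Hypothetical shutdown times in milliseconds, for illustration only.
ts_shutdown_ms = [410, 395, 402, 420, 399, 405, 412, 398, 401, 407,
                  415, 396, 403, 409, 400, 411, 397, 404, 408, 406]
tp3_shutdown_ms = [1250]

ts_mean, ts_error = summarize(ts_shutdown_ms)
tp3_value, tp3_error = summarize(tp3_shutdown_ms)
print(f"Ts shutdown: {ts_mean:.1f} ms, standard error {ts_error:.1f} ms")
print(f"Tp3 shutdown: {tp3_value} ms, spread unknown from a single run")
```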

I’m very pleased to get the whole mess put to bed.  There were non-threadsafe python libraries (subprocess, I’m looking at you) to deal with, twisted banana errors (I kid you not), and a whole mess of timing issues.  You can’t build what you don’t monitor and shutdown is an important part of our users’ experience - hopefully we’ll be able to start to trim down our shutdown time now that we are reporting results from all our active Talos boxes.

April 23, 2009

Sheriffs Take Notice: We Can Retest Builds With Talos Sendchange

Filed under: mozilla, talos — alice @ 6:01 pm

I’m a little late in posting anything about this, but Bug 468731 - talos testing of builds using sendchange is a big deal. I had initially thought that we wouldn’t be able to make use of Buildbot’s sendchange system due to the weird hacking that Talos does to the Buildbot change object to move around all the pieces of information required by Talos.  Thankfully, catlee wasn’t nearly so pessimistic and found a way to make it work.

Mostly it sounds like a bunch of Buildbot nonsense that shouldn’t really interest anyone outside of the Release Engineering team, but it has huge benefits for sheriffs and developers.  With Talos buildbot now supporting sendchange we can push builds through the Talos testing infrastructure given only a correctly formatted link to the build in question.  I’ve already used this to force a re-test of a build that had failed; it immediately failed again on different Talos test boxes and proved that the issue was in the build and not in Talos.

If a build is on stage and downloadable we can make Talos test it as many times as we want.  We still don’t have the means to push any build we like through, but retesting pretty much any build on staging can be very useful when trying to narrow down a regression range or figure out if a regression is ‘real’.  If you are in a situation where you would like to retest a build contact the Release Engineering team member on buildduty (identified as nick-buildduty on irc) and they’ll get it going for you.

Thanks, catlee!

April 22, 2009

Towards Switching To New Graph Server Full Time

Filed under: graphs, mozilla, talos — alice @ 4:53 pm

Bug 487329 - Graph server migration tracking is almost fixed and complete. What does this mean for people who use the graph server?

  1. graphs-new.mozilla.org will become graphs.mozilla.org
  2. The current graphs.mozilla.org will become graphs-old.mozilla.org
  3. No new data will be sent to graphs-old.mozilla.org
  4. graphs-old.mozilla.org will remain up so that older data can be viewed/searched

We’ve been seeding the new graph server with data for a month now and I think that the majority of people are already using it as their main means of viewing performance data; for the most part the switch over should be painless.

What may be slightly more controversial is that there is no current plan to migrate any data from the old graph server to the new.  There are some very good reasons for this:

  • Data in the old graph server was generated using throttled test slaves. We no longer throttle slaves, so the numbers would not be comparable
  • We are close to rolling out a new Tp test page set. We did not test with this new page set before, so the numbers in the old graph server would not be comparable
  • Most of the numbers on the old graph server were collected before we rolled out reboot-every-test-slave-post-every-test - the numbers have greater variance and there are large swaths of data that aren’t trustworthy or useful in any way (basically, long periods of time when the box in question was in serious need of a reboot)

Instead of banging our heads against shifting data from the old, poorly thought out schema to the new, super fast schema we’re going to consider designing a system whereby we can pick up old builds of interest and push them through the current testing harness. That way we save ourselves headaches and get data that is actually comparable to current results.

Once all the dependent bugs filed against graph server migration have been fixed we’ll roll all this out.  I’m hoping that will be in the next week or two.  Right now I’m more interested in whether anyone has strong feelings about which builds are ‘interesting’ enough that we should come up with the means to re-test them with our current test harness.  Any favorites out there?  Top ten?

Belorussian translation, added February 2011 by Martha Ruszkowski.
