happiest unalice ever

May 6, 2011

Addon Performance Testing: Updates and Future Work

Filed under: addons, mozilla, talos — alice @ 4:36 pm

The addons performance testing system has been up and live for a few weeks now. With so many more eyes than mine on the system I’ve seen a bunch of bug filings, which is awesome. With each bug fixed the Talos system works better for both addons and general Firefox performance testing.

Here’s what’s already fixed and rolled out:

Here’s what’s fixed but waiting on deployment (which will probably happen early next week):

There are still more bug fixes in the works. Next on my list is Bug 648225 - Performance of platform-dependent add-ons is not tested, which will improve the testing system’s simulation of the real world.

In terms of future plans, we are going forward with a second quarter goal of completing an on-demand addon testing service. Basically, this would allow an addon author to request that their addon be tested at any time, instead of waiting for our weekly tests of the 100 most downloaded addons. This will give us coverage of more addons along with a means to double or triple check results. Did your addon perform poorly? Retest! Are you suspicious of the results? Retest! Did the addon fail to download or install? Retest! If you want to follow along, the bugs that will lead to this system are:

Once the on-demand system is in place, we’ll be working to introduce a greater variety of tests.  The ts (test startup) was an easy test to begin with, but it can be of limited meaning for a lot of addons.  I’d like us to cover far more of the available Talos tests, concentrating on the tp (test pageload) tests.  Tp is interesting because it uses a set of collected, local web pages (100 culled from the Alexa top 300 list of worldwide top used sites) that are then cycled through ten times.  With a given addon installed and active (for some meaning of ‘active’ which will be different for different addons) this will give a greater idea of how real world page load time is impacted.  As a side benefit of Tp, Talos will monitor the memory footprint and CPU usage during this test.  By comparing an addon run to a no-addon run we’ll be able to observe memory and CPU usage differences.
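A rough sketch of the addon versus no-addon comparison described above, assuming each run yields per-page load times in milliseconds for every cycle. The data layout, the function names, and the use of a per-page median are illustrative assumptions, not how Talos actually calculates its numbers.

```python
from statistics import median


def page_medians(run):
    """Collapse each page's per-cycle load times (ms) to a single median."""
    return {page: median(times) for page, times in run.items()}


def addon_overhead(baseline_run, addon_run):
    """Return the per-page percent change of the addon run vs. the baseline.

    Both arguments map page name -> list of load times, one per cycle.
    Only pages present in both runs are compared.
    """
    base = page_medians(baseline_run)
    addon = page_medians(addon_run)
    return {
        page: 100.0 * (addon[page] - base[page]) / base[page]
        for page in base
        if page in addon
    }


# Hypothetical numbers: two pages, three cycles each
# (the real Tp set is 100 pages cycled ten times).
baseline = {"page-a": [100, 110, 105], "page-b": [200, 210, 190]}
with_addon = {"page-a": [120, 126, 130], "page-b": [200, 205, 195]}

overhead = addon_overhead(baseline, with_addon)
for page, pct in sorted(overhead.items()):
    print(f"{page}: {pct:+.1f}%")
```

The same shape of comparison would apply to memory and CPU samples, with the sampled values in place of load times.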

I believe that we are also going to need to put effort into providing some testing hooks/prefs for Talos to use, as in Bug 459965 - Add standardized support for first-run pages to install.rdf. Talos doesn’t react well to first-run pages and it would be great to have the means to disable them with a single pref, instead of a customized pref per-addon - especially since the standard use case is that users do not see the first-run page on a regular basis, as they only see it post-installation and never again. I believe that there are probably some other settings like this that would standardize creating a Talos testing environment, and thus make addon testing more applicable to the type of bulk testing that Talos does.

Testing addons has been a whole new area of Talos testing, and it has its own unique set of challenges. I’ve spent most of my time at Mozilla concentrating on automated browser performance testing and the addons world is still quite new to me. Each addon affects the browser in its own way; while I’ve grown accustomed to standardizing tests across browser versions, platforms and machines, this is definitely a new horizon. The Talos tests are just one way to look at performance impact, not the final word. They may never be appropriate for a given type of addon, but I most definitely want to work with addon developers to try and get the best coverage that we can.

October 13, 2010

Plans for Tp5

Filed under: mozilla, talos — alice @ 4:13 pm

When I was writing the history of Tp tests for the Talos wiki page I realized that it is time to update Tp.

It’s good to start with some background. Tp stands for ‘test pageload’. It is a basic browser test that cycles through a number of web pages. The web pages are culled from the Alexa Top 500. The current version of Tp is Tp4. To create it I grabbed a local copy of as many of the top 500 as I could (some pages always seem to be unreachable). The pages were then narrowed down to the top 100 by following these guidelines:

  • Does the page correctly load from a local copy?  Are all the images/video present?  Does the layout/css still work?  Drop if the page is broken in any way.
  • Is the page a duplicate?  The Alexa Top 500 contains many localized copies of the Google home page and one is enough.  Drop if it is a dupe or there is already a similar page in the set.
  • Is the page ‘interesting’?  Does it contain a font that isn’t covered by another page?  Does it have many images?  Complicated or large layout?  Keep if the page meets any of these criteria.
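The three guidelines can be read as a filter over the candidate pages. The sketch below is only an illustration: the `Page` record, its fields, and the decision to treat ‘interesting’ as a requirement are assumptions, and in practice each judgment was made by hand.

```python
from dataclasses import dataclass


@dataclass
class Page:
    site: str            # canonical site name, shared by localized copies
    loads_locally: bool  # local copy renders with images/video/CSS intact
    interesting: bool    # distinctive font, many images, or complex layout


def select_pages(candidates, limit=100):
    """Apply the three guidelines in order, preserving the input ranking."""
    seen_sites = set()
    selected = []
    for page in candidates:
        if not page.loads_locally:   # broken in any way: drop
            continue
        if page.site in seen_sites:  # dupe or similar page already kept: drop
            continue
        if not page.interesting:     # nothing distinctive (assumed to mean drop)
            continue
        seen_sites.add(page.site)
        selected.append(page)
        if len(selected) == limit:
            break
    return selected


# Hypothetical candidates, already ordered by rank.
candidates = [
    Page("google", True, True),
    Page("google", True, True),      # localized duplicate
    Page("brokensite", False, True),
    Page("plain", True, False),
    Page("news", True, True),
]
picked = [page.site for page in select_pages(candidates)]
print(picked)
```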

The question now is if we want to repeat these steps to create Tp5.  The main complaint against Tp4 is that the Alexa Top 500 points to the home pages of sites - say the login screen for Facebook or the basic en-US Google page.  Yet, these are not things that people are actually using each site for.  If we wanted a representative Facebook page it should be someone’s logged in home page; if we wanted a representative Google page it should be a page of search results, or, even better, a Gmail user’s inbox.  So, how do we go about doing this?  How do we choose the best representation of a given site?

Bug 601798 - create tp5 pageset has been filed to track requirements for developing the new Tp5 web page set. I have some crazy theory that I could post instructions on how to make local copies of web pages and then post a ‘Most Wanted’ list and see if anybody out there is willing to send me content. Anybody up for that? Do you want to have your Gmail inbox be the standard for Mozilla testing? Is that awesome - or terrifying?

October 7, 2010

Talos in Hg! Plus New Version of Standalone Talos

Filed under: mozilla, talos — alice @ 4:25 pm

Bug 556530 - move Talos from cvs into Mercurial has landed and there is much rejoicing. The version of Talos in CVS will no longer be maintained.

As part of the switchover I’ve updated Standalone Talos to V2. The only change in this version is that it is based upon the hg talos repo.

September 27, 2010

Talos Code in Hg (almost)

Filed under: mozilla, talos — alice @ 5:00 pm

The repository has been created, the buildbot patches have been r+ed and we are now so, so close to having a full switch over of Talos from CVS to Mercurial.  Talos has long lagged behind the rest of the Mozilla universe by stubbornly staying with CVS.  The main sticking point was a belief that we would have to install Mercurial on all Talos boxes to get parity with the current CVS set up - as Talos is checked out from CVS per-test run by all Talos boxes.  In the end, I found that I could avoid the requirement of getting Mercurial installed on upwards of 300 Talos boxes running 7 different operating systems.  The new design has Talos downloaded as a ZIP per-run, making the system version control agnostic.
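A minimal sketch of the download-a-ZIP-per-run idea, assuming the archive is reachable at some URL. The function name, file names, and directory layout are hypothetical; the actual buildbot steps are not shown in this post. The point is that the test box only needs to fetch and unpack a file, not run a version control client.

```python
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def fetch_talos(zip_url, work_dir):
    """Download a Talos ZIP and unpack it under work_dir; return the unpack path."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    archive = work_dir / "talos.zip"
    urllib.request.urlretrieve(zip_url, archive)
    target = work_dir / "talos"
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target


# Demo against a throwaway local archive, so no network or real URL is needed.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    source = tmp / "source.zip"
    with zipfile.ZipFile(source, "w") as zf:
        zf.writestr("talos/sample.txt", "placeholder file\n")
    unpacked = fetch_talos(source.as_uri(), tmp / "run1")
    extracted = sorted(
        p.relative_to(unpacked).as_posix()
        for p in unpacked.rglob("*")
        if p.is_file()
    )
print(extracted)
```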

Right now we are only blocked on Releng Downtime - unscheduled for week of Sept 27, 2010. This is a mega-downtime bug that has been accumulating blocked bugs for weeks while Release Engineering tries to find a gap in the Firefox 4 beta 7 code freeze schedule to safely close the trees and do landings. You can play along at home with the code freeze by watching the beta 7 blocking bugs.

For now all Talos patches are being landed in both Mercurial and CVS.  I wish good luck to all Release Engineering and developer folks working hard on beta 7, for the simple selfish reason that I’m very much looking forward to never, ever having to check in a patch to CVS again.

September 24, 2010

Standalone Talos V1.9

Filed under: mozilla, talos — alice @ 3:14 pm

V1.8 was unable to run the twinopen tests due to Bug 589194 - Make twinopen work when XUL is disabled.  The main Talos code was patched but the change wasn’t ported to Standalone Talos.  V1.9 fixes that oversight.

September 2, 2010

Talos Documentation Updates

Filed under: mozilla, talos — alice @ 3:43 pm

I recently put some time into updating the talos documentation. There are now sections describing how numbers are calculated, the history of the tp test along with updates to descriptions of tests, description of talos hardware, where to file bugs and so on.

Any comments and suggestions on what needs addition or further clarification would be great.  I’m mostly going by the questions that are directed at me the most frequently.

June 18, 2010

Update to Standalone Talos Yet Again (Now 1.7.1)

Filed under: mozilla — alice @ 3:07 pm

I introduced a not-quite-an-error in 1.7 in that the tests were attempting to load test pages through localhost.  This wouldn’t work unless you have a local web server up and working and with the correct document root set.  As Standalone Talos is designed to have minimal requirements I’ve updated it to use file:// instead of http://.  Now it will run without any webservice.
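To illustrate the difference, here is a small sketch that maps a localhost test URL onto a file:// URL under a local directory. The helper name, the example path, and the assumption that test pages were addressed as `http://localhost/...` relative to a document root are illustrative only; the actual Standalone Talos change is not shown here.

```python
from pathlib import Path


def to_file_url(http_url, document_root, prefix="http://localhost/"):
    """Rewrite a localhost URL to a file:// URL under document_root."""
    if not http_url.startswith(prefix):
        raise ValueError(f"not a localhost test URL: {http_url!r}")
    relative = http_url[len(prefix):]
    # as_uri() needs an absolute path, so resolve the document root first.
    return (Path(document_root).resolve() / relative).as_uri()


# Hypothetical test page path.
url = to_file_url("http://localhost/page_load_test/index.html", "talos")
print(url)
```

A file:// URL is read straight from disk, which is why no web server or document root configuration is needed.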

Again, if you use Standalone Talos for local testing please update to latest.

June 11, 2010

Standalone Talos V1.7

Filed under: mozilla — alice @ 10:59 am

I’ve updated Standalone Talos to version 1.7 - available here.  It’s been quite a while since a standalone talos update so there are several changes:

If you use Standalone talos for local testing please download and update to latest.

June 3, 2010

Universal Manifest For Unit Tests: A Proposal

Filed under: mozilla — alice @ 11:10 am

I’ve come up with a proposed universal manifest format for unit tests based upon the many good comments and suggestions that I received after my last two blog posts. The simplest case is still very simple and the more complicated cases are now easier to read and understand. I’ve put together a list of examples from the current manifests and then converted them to the new manifest. We are still not totally finalized here, but I think that this is a good working format that can be refined.

I’m still working from the stance that script doesn’t belong in manifest files so the format handles && and || cases without code. With the help of my auto-tools chums we searched the code base for difficult examples to ensure that this was still flexible enough to handle the expressions that we currently support.

Also, having worked on the conversions from old to new I have to say that it was really easy to get this going - admittedly, I’d done some reading on JSON and was immersed in the problem set - but I think that it is a good indication that this could make reading and writing manifests a lot nicer.

I’m once again happy to get any comments on the current proposal. If things look good it will be time to get everyone together for a Brown Bag to hash this out and finalize. I’ll get that in the works once I see if I have to take this back to the drawing board.

  1. simple case
     old
     == 399209-1.html 399209-1-ref.html
     == 399209-2.html 399209-2-ref.html
     == 399384-1.html 399384-1-ref.html
     proposed
     {   "tests" :
         [
             {  "==" : ["399209-1.html", "399209-1-ref.html"]  },
             {  "==" : ["399209-2.html", "399209-2-ref.html"]  },
             {  "==" : ["399384-1.html", "399384-1-ref.html"]  }
         ]
     }
  2. ref test with two fails-if clauses and one random-if clause
     old
     fails-if(MOZ_WIDGET_TOOLKIT=="windows") fails-if(MOZ_WIDGET_TOOLKIT=="cocoa") random-if(MOZ_WIDGET_TOOLKIT!="cocoa"&&MOZ_WIDGET_TOOLKIT!="windows") != 399636-quirks-html.html 399636-quirks-ref.html # windows failure bug 429017, mac failure bug 429019
     proposed
     {   "tests" :
         [
             {  "!=" : ["399636-quirks-html.html", "399636-quirks-ref.html"],
                "fails" : { "if" : ["windows"]},
                "fails" : { "if" : ["cocoa"]},
                "random" : { "if" : ["!cocoa", "!windows"]}  }
         ]
     }
  3. ref test with two random-if clauses
     old
     random-if(MOZ_WIDGET_TOOLKIT=="gtk2") random-if(MOZ_WIDGET_TOOLKIT=="cocoa") == mirroring-02.html mirroring-02-ref.html
     proposed
     {   "tests" :
         [
             {   "==" : ["mirroring-02.html", "mirroring-02-ref.html"],
                 "random" : { "if" : ["gtk2"]},
                 "random" : { "if" : ["cocoa"]} }
         ]
     }
  4. ref test with one fails-if clause and one skip-if clause
     old
     fails-if(!haveTestPlugin) skip-if(!prefs.getBoolPref("dom.ipc.plugins.enabled")) == pluginproblemui-direction-1.html pluginproblemui-direction-ref.html
     proposed
     {   "tests" :
         [
             {   "==" : ["pluginproblemui-direction-1.html", "pluginproblemui-direction-ref.html"],
                 "fails" : { "if" : ["!haveTestPlugin"]},
                 "skip" : { "prefs" : { "dom.ipc.plugins.enabled" : "False" } }  }
         ]
     }
  5. js test with one fails-if clause and one skip-if clause
     old
     fails-if(!xulRuntime.shell&&!isDebugBuild) skip-if(!xulRuntime.shell&&isDebugBuild) script regress-455464-04.js # bug xxx - hangs reftests in debug, ### bug xxx - NS_ERROR_DOM_NOT_SUPPORTED_ERR in opt
     proposed
     {   "tests" :
         [
             {  "script"  : "regress-455464-04.js",
                "fails" : { "if" : ["!shell", "!isDebugBuild"]},
                "skip" : { "if" : ["!shell", "isDebugBuild"]}  }
         ]
     }
  6. js test with one random-if clause and one asserts-if(count) clause
     old
     random-if(!xulRuntime.shell&&xulRuntime.OS=="WINNT") asserts-if(!xulRuntime.shell&&xulRuntime.OS=="WINNT",1) script regress-344804.js # bug 524732
     proposed
     {   "tests" :
         [
             {  "script" :  "regress-344804.js",
                "random" : { "if" : ["!shell", "WINNT"]},
                "asserts"  : { "if" :  ["!shell", "WINNT"], "count" : 1} }
         ]
     }
  7. js test with one random-if clause, one fails-if clause and one skip-if clause
     old
     fails-if(xulRuntime.OS=="WINNT") random-if(xulRuntime.OS=="Linux"&&!XPCOMABI.match(/x86_64/)) skip-if(xulRuntime.OS=="Linux"&&XPCOMABI.match(/x86_64/)) script regress-3649-n.js # No test results on windows, sometimes no test results on 32 bit linux, hangs os/consumes ram/swap on 64bit linux.
     proposed
     {   "tests" :
         [
             {  "script" :  "regress-3649-n.js",
                "fails" : { "if" : ["WINNT"]},
                "random" : { "if" : ["Linux", "!x86_64"]},
                "skip" : { "if" : ["Linux", "x86_64"]}  }
         ]
     }
  8. ref test with one asserts(min,max) clause
     old
     asserts(0-1) == background-draw-nothing-malformed-images.html background-draw-nothing-ref.html
     proposed
     {   "tests" :
         [
             {  "==" : ["background-draw-nothing-malformed-images.html", "background-draw-nothing-ref.html"],
                "asserts" : {"min" : 0, "max" : 1}  }
         ]
     }
  9. ref test with one asserts-if(count) clause
     old
     asserts-if(MOZ_WIDGET_TOOLKIT=="gtk2",1) == 355548-3.xml 355548-3-ref.xml # bug 456899
     proposed
     {   "tests" :
         [
             {  "==" : ["355548-3.xml", "355548-3-ref.xml"],
                "asserts" : { "if" : ["gtk2"], "count" : 1}  }
         ]
     }
  10. load test with one asserts-if(count) and one asserts(count) clause
     old
     asserts(10) asserts-if(MOZ_WIDGET_TOOLKIT=="windows",8) load 265986-1.html
     proposed
     {   "tests" :
         [
             {  "load" : "265986-1.html",
                "asserts" : { "if" : ["windows"], "count" : 8},
                "asserts" : { "count" : 10}  }
         ]
     }
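One way a harness might read the proposed entries, as a hedged sketch rather than an agreed design. It assumes that the terms inside one 'if' list are ANDed, that repeated clauses of the same kind are ORed (mirroring the repeated fails-if clauses in the old syntax), and that skip takes precedence over fails, which takes precedence over random. The function names and the set of "facts" describing the current platform are hypothetical.

```python
import json


def keep_duplicates(pairs):
    """object_pairs_hook that collects every value of a repeated key in a list."""
    merged = {}
    for key, value in pairs:
        merged.setdefault(key, []).append(value)
    return merged


def condition_holds(terms, facts):
    """Every term must hold (&&); a leading '!' negates the term."""
    return all(
        (term[1:] not in facts) if term.startswith("!") else (term in facts)
        for term in terms
    )


def expectation(entry, facts):
    """Return the first of skip/fails/random with a matching clause, else 'pass'."""
    for status in ("skip", "fails", "random"):
        for clause in entry.get(status, []):      # repeated clauses act as ||
            for terms in clause.get("if", []):
                if condition_holds(terms, facts):
                    return status
    return "pass"


entry = json.loads(
    """
    { "!=" : ["399636-quirks-html.html", "399636-quirks-ref.html"],
      "fails" : { "if" : ["windows"] },
      "fails" : { "if" : ["cocoa"] },
      "random" : { "if" : ["!cocoa", "!windows"] } }
    """,
    object_pairs_hook=keep_duplicates,
)

for facts in ({"windows"}, {"cocoa"}, {"gtk2"}):
    print(sorted(facts), expectation(entry, facts))
```

The custom hook matters because Python's `json.loads` keeps only the last value when a key is repeated within an object, so the first "fails" clause would otherwise be silently lost. A parser for this format would need to handle repeated keys deliberately, or the format could hold clauses in a list.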

May 14, 2010

Universal Manifest Format For Unit Tests (part 2)

Filed under: mozilla — alice @ 4:14 pm

There was a lot of good feedback from the post introducing the concept of universal manifest formats for all unit tests.  I want to continue the discussion based upon those questions/comments.

Why aren’t we just going to take one of our existing manifest formats and roll it out to the rest of the tests?  Why re-invent the wheel?
Most suggestions were for adopting the reftest manifest format across the board. While the reftest manifests are excellent at what they are designed for, we are hitting the point where they can’t handle the growth in terms of 32 vs. 64 bit tests, qt vs. gtk vs. android (which all end up in the same ‘linux’ bucket), shell tests vs. full browser, etc. Here are some of the worst offenders that are already in use:

./js/src/tests/js1_5/extensions/jstests.list:
random-if(xulRuntime.OS=="WINNT"&&!isDebugBuild) skip-if(xulRuntime.OS=="Linux") script regress-342960.js # slow
./js/src/tests/js1_5/Regress/jstests.list:
fails-if(xulRuntime.OS=="WINNT") random-if(xulRuntime.OS=="Linux"&&!XPCOMABI.match(/x86_64/)) skip-if(xulRuntime.OS=="Linux"&&XPCOMABI.match(/x86_64/)) script regress-3649-n.js # No test results on windows, sometimes no test results on 32 bit linux, hangs os/consumes ram/swap on 64bit linux.
./js/src/tests/js1_6/extensions/jstests.list:
fails-if(!xulRuntime.shell&&!isDebugBuild) skip-if(!xulRuntime.shell&&isDebugBuild) script regress-455464-04.js # bug xxx - hangs reftests in debug, ### bug xxx - NS_ERROR_DOM_NOT_SUPPORTED_ERR in opt
./layout/reftests/bugs/reftest.list:
fails-if(MOZ_WIDGET_TOOLKIT=="windows") fails-if(MOZ_WIDGET_TOOLKIT=="cocoa") random-if(MOZ_WIDGET_TOOLKIT!="cocoa"&&MOZ_WIDGET_TOOLKIT!="windows") != 399636-quirks-html.html 399636-quirks-ref.html # windows failure bug 429017, mac failure bug 429019
./modules/plugin/test/reftest/reftest.list:
fails-if(!haveTestPlugin) skip-if(!prefs.getBoolPref("dom.ipc.plugins.enabled")) == pluginproblemui-direction-1.html pluginproblemui-direction-ref.html

It’s all pretty unreadable, and things are just going to get worse as more mobile platforms come online and we have to handle exceptions for each one.

Adding tests is currently easy and I don’t want to lose that.
Totally in agreement here! There may end up being a few more keystrokes, but overall things really aren’t changing much for the simplest case. Something like this:
#old syntax
== file1.html file1-ref.html
#new syntax
{"type" : "==",
"files" : ["file1.html", "file1-ref.html"]
}
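For the simplest case the conversion is mechanical. A small sketch, assuming only plain `==` and `!=` lines with no conditions; the function name is hypothetical and anything more complex is rejected rather than guessed at.

```python
import json


def convert_simple_line(line):
    """Convert '== test.html ref.html' (or '!=') to a new-style object."""
    # Drop any trailing '# comment' before splitting on whitespace.
    parts = line.split("#", 1)[0].split()
    if len(parts) != 3 or parts[0] not in ("==", "!="):
        raise ValueError(f"not a simple reftest line: {line!r}")
    op, test, ref = parts
    return {"type": op, "files": [test, ref]}


converted = convert_simple_line("== file1.html file1-ref.html")
print(json.dumps(converted))
# prints {"type": "==", "files": ["file1.html", "file1-ref.html"]}
```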

What’s the benefit here?

  • Greater granularity
  • Make it possible to target specific sections of code for test runs.  Specialized manifests would be easy to create around an area of interest. For instance, if I make a manifest that includes all content tests, then I can do something like:
    > python runtests.py --manifest=mycontentmanifest.json
    and it can run only the tests that I want based on my patch.

  • Better benchmarks
  • Enable us to easily benchmark things - run all known [orange] tests, run specific performance intensive tests several times for profiling diagnostic, etc.

  • Finer filtering
  • Want to enable a test on maemo only? Want to never run a test on android? We want to make this simpler and easier without having to do a bunch of inline javascript hacking.

Why are you picking on reftests?
We chose reftest to be used as a proof of concept. We wanted to roll the new manifest format out to reftest first to ensure that everything worked the way that we wanted (kick the tires a bit) and then slowly convert over our other test harnesses.

What about inline js calls like isDebugBuild or checks against MOZ_WIDGET_TOOLKIT?
We no longer want to support js code in manifest files. We believe that that sort of code should live outside the manifest. Support would be built into the manifest for the states that these calls check (i.e., lists of available platforms to test against, lists of build types, native theme types, etc). This way every test would use the same check, and if the check needs to be updated to support a new platform/arch/build/theme/etc it can be done in a central location. This will come in handy for the mobile platforms, which would all currently end up identified as ‘linux’.

Where’s the new format?
This is probably the biggest question.  Truth is, we aren’t finalized here yet.  We are pretty sure that json is the way to go for the flexibility that we want - but we are still very open to suggestions.  Right now this is what we are considering:

{   "tests" :
    [
        {    "type": "==",
             "files": ["file1.html", "file2.html"],
             "skip-if": ["maemo", "android"],
             "prefs": {
                          "pref1" : true,
                          "pref2" : false
                      }
        },
        {    "type" : "js",
             "files" : ["jsfile.html"],
             "random-if" : ["winnt"],
             "skip-if": ["isdebug"]
        },
        {    "group" : "withTestplugin",
             "plugins": ["plugin1.xpi"],
             "prefs": {
                          "enablePlugin" : true
                      },
             "tests" :
             [
                 {    "type" : "!=",
                      "files" : ["file1.html", "file6.html"]
                 }
             ]
         }
    ]
}

This format would allow you to easily set prefs either per-test or per-group of tests.  There is also a means to install/enable plugins and then group tests together that use that plugin.  A lot has been borrowed from reftest manifest (skip-if, random-if) as that is stuff that works - we want to take concepts that work and keep them around.  What do you like about it?  What don’t you like?  What would you like included as an option?  Would you prefer xml?  yaml?  We are in very early stages here and the end result should be a manifest format that everyone is happy with.
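As a hedged illustration of how a harness could consume a manifest of this shape, the sketch below flattens groups so that each test inherits its group's prefs and plugins, and drops tests whose skip-if list names the current platform. The `flatten` function, the platform strings, and the reading of skip-if as "skip if any listed name matches" are assumptions, not part of the proposal.

```python
import json

MANIFEST = """
{ "tests": [
    { "type": "==", "files": ["file1.html", "file2.html"],
      "skip-if": ["maemo", "android"],
      "prefs": {"pref1": true} },
    { "group": "withTestplugin",
      "plugins": ["plugin1.xpi"],
      "prefs": {"enablePlugin": true},
      "tests": [
        { "type": "!=", "files": ["file1.html", "file6.html"] }
      ] }
] }
"""


def flatten(entries, platform, prefs=None, plugins=None):
    """Yield runnable tests for platform, inheriting group prefs and plugins."""
    prefs = prefs or {}
    plugins = plugins or []
    for entry in entries:
        # Per-test or per-group settings override what was inherited.
        merged_prefs = {**prefs, **entry.get("prefs", {})}
        merged_plugins = plugins + entry.get("plugins", [])
        if "group" in entry:
            yield from flatten(entry["tests"], platform, merged_prefs, merged_plugins)
        elif platform not in entry.get("skip-if", []):
            yield {
                "type": entry["type"],
                "files": entry["files"],
                "prefs": merged_prefs,
                "plugins": merged_plugins,
            }


tests = json.loads(MANIFEST)["tests"]
android_tests = list(flatten(tests, "android"))
winnt_tests = list(flatten(tests, "winnt"))
print(len(android_tests), len(winnt_tests))
```

Keeping the platform names as plain strings in the manifest means the logic that decides what "android" or "winnt" means lives in one place in the harness.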
