Unit tests on Maemo (linux-arm) Fennec are live, viewable on the Mobile tinderbox page. This wouldn't have been possible without the hard work of both Ted and Joel.
Previously, I had six Nokia N810s running Talos performance tests (3 running the Tp3 page load test; the other 3 running all the other tests). I triggered all 6 at the same time on each build, which resulted in a lot of data for those builds and no data at all for the builds that landed in between test runs. I knew this wouldn't scale once I added branches and unit tests.
I now have 17 Nokia N810s running Talos + unit tests (14 production, 3 staging). 14 is more than enough to dedicate each N810 to a specific test suite, but:
With test suites of varying lengths, single-threading the longer-running suites results in far fewer runs of those tests (while the devices running shorter suites sit idle), and
There is no depth: if one or two devices fall over, there are no runs of their test suites until someone intervenes. Devices falling over is, unfortunately, not an uncommon event.
Instead, I image each N810 identically, so each one is capable of running any of the test suites. Once pooled, they're available for any buildbot builder to use for pending tests. As long as one N810 remains functional, there will be test data (though with much lower throughput than with all 14).
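As a rough sketch of the pooling idea (my own illustration with hypothetical names, not the actual buildbot configuration): any pending test suite grabs the first free device, so throughput degrades gracefully as devices drop out rather than a suite going dark.

```python
import queue

# Hypothetical illustration of the device-pool idea: every N810 is
# imaged identically, so any free device can service any pending suite.
class DevicePool:
    def __init__(self, device_names):
        self.free = queue.Queue()
        for name in device_names:
            self.free.put(name)

    def run(self, suite, runner):
        device = self.free.get()      # block until some device is free
        try:
            return runner(device, suite)
        finally:
            self.free.put(device)     # return the device to the pool

pool = DevicePool(["n810-%02d" % i for i in range(1, 15)])
result = pool.run("crashtest", lambda dev, s: "%s ran %s" % (dev, s))
```

With 14 devices in the pool, losing a device just shrinks the pool; pending suites queue up against whatever remains.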
If these concepts seem familiar, it's because they're the same as what we're doing for Firefox builds and tests.
We split the unit tests up out of necessity.
A single run of just the Mochitest suite can take 4-6 hours or more on an N810. Adding other test suites to the same builder would delay test results even further.
The N810 reboots itself when it runs out of memory. Joel had to "chunk" certain test suites to avoid OOM errors/reboots, just to get the tests to run at all. Splitting the long suites (namely Mochitest) into multiple parallel jobs was a natural progression.
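A hedged sketch of the chunking idea (my own illustration, not Maemkit's actual code): split one long suite's test list into a fixed number of roughly equal chunks, so each chunk can run in its own invocation and release memory before the next one starts, and each chunk can be handed to a different device in parallel.

```python
def chunk(tests, n_chunks):
    """Split a test list into n_chunks roughly equal pieces, so each
    piece can run (and free its memory) independently."""
    size, rem = divmod(len(tests), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # the first `rem` chunks absorb one extra test each
        end = start + size + (1 if i < rem else 0)
        chunks.append(tests[start:end])
        start = end
    return chunks

# e.g. 10 mochitests split 4 ways -> pieces of size 3, 3, 2, 2
pieces = chunk(["test_%d.html" % i for i in range(10)], 4)
```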
When pooled, running each test suite individually and splitting Mochitest up into four pieces gives us results much faster than long-running single-threaded tasks.
(The Firefox version of split tests is currently shown on the Firefox-Unittest tree.)
(I have to mention that this capability, running just the unit tests without having to build first, is due to Ted's packaged unit test work in Q1. We are all reaping the benefits now.)
I had to write mobile-specific unit test parsers rather than rely on the existing ones.
Since Joel's Maemkit calls run_tests.py multiple times per test suite, the total passed/failed/known numbers at the end of each run are not comprehensive.
In order to have a baseline green, we need an additional per-suite threshold for tests that are expected to pass on desktop Firefox but are known to fail on Maemo Fennec. This isn't the ideal solution, but it is a relatively fast way to get the unit tests live and green. We've been discussing longer-term solutions (packaging different tests for different products/platforms? manifests of known failures per product/platform?)
Since we allow for n failed tests per suite, the presence of a "TEST-UNEXPECTED-" string doesn't automatically mean WARNINGS. Instead, I need to count the occurrences and compare that count against n:

    m = re.findall("TEST-UNEXPECTED-", log)
    if len(m) > n:
        status = WARNINGS
Finally, the numbers on the waterfall (passed/failed/known) are massaged to account for n: I subtract n from the failed count and add it to known, clamping so the failed count (hopefully) never goes negative.
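The massaging step above can be sketched like this (hypothetical function and variable names, mine rather than the production parser's):

```python
def adjust_counts(passed, failed, known, n):
    """Fold up to n expected-on-Maemo failures into the 'known' bucket.

    n is the per-suite threshold of tests that pass on desktop Firefox
    but are known to fail on Maemo Fennec. Clamp the amount moved so
    'failed' never goes negative when fewer than n failures occurred.
    """
    moved = min(failed, n)            # don't move more than actually failed
    return passed, failed - moved, known + moved

# e.g. 900 passed, 12 failed, 30 known, threshold n=15 -> (900, 0, 42)
```

The `min()` clamp is the "(hopefully) avoid going negative" part: if only 12 tests failed against a threshold of 15, only 12 get reclassified as known.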
This is all a workaround until we figure out how to solve the different numbers of known failures for the disparate products/platforms.
There are still some outstanding issues and wanted features beyond what's described above: static IPs for the N810s, tracking timeouts, populating a unit test database like Ted's, and otherwise figuring out what causes the reds/crashes and fixing them.
I think things are in a state where I can leave it alone for a bit, though. I foresee a whole lot of Windows Mobile and Windows CE in my near-term future.