escapewindow | brainstorm: splitting mozharness (Reply)

[stating the problem]

Mozharness currently handles a lot of complexity. (It was designed to be able to, but the ideal is still elegantly simple scripts and configs.)

Our production-oriented scripts take (and sometimes expect) config inputs from multiple locations, some of them dynamic; and they contain infrastructure-oriented behavior like clobberer, mock, and tooltool, which don't apply to standalone users.

We want mozharness to be able to handle the complexity of our infrastructure, but make it elegantly simple for the standalone user. These are currently conflicting goals, and automating jobs in infrastructure often wins out over making the scripts user friendly. We've brainstormed some ideas on how to fix this, but first, some more details:

[complex configs]

A lot of the current complexity involves config inputs from many places:

buildbot-configs, through
- environment variables,
- command line options, and
- buildbot properties sent through buildprops.json
mozharness, through
- one or more mozharness config files (e.g. configs/b2g/releng-fota-eng.py),
- the script itself (e.g. b2g_build.py)
in-tree config files (e.g. testing/config/mozharness/linux_config.py or b2g/config/emulator/config.json), and
web-based resources, whether through hgweb or a service like mapper.

We want to lock the running config at the beginning of the script run, but we also don't want to have to clone a repo or make external calls to web resources during __init__(). Our current solution has been to populate runtime configs during one of our script actions, but then to support those runtime configs we have to check multiple config locations for our script logic. (self.buildbot_config, self.test_config, self.config, ...)

We're able to handle this complexity in mozharness, and we end up with a single config dict that we then dump to the log + to a json file on disk, which can then be reused to replicate that job's config. However, this has a negative effect on humans who need to either change something in the running configs, or who want to simplify the config to work locally.

[in-tree vs out-of-tree]

We also want some of mozharness' config and logic to ride the trains, but other portions need to be able to handle outside-of-tree processes and config, for various reasons:

some processes are volatile enough that they need to change across the board across all trees on a frequent basis;
some processes act across multiple trees and revisions, like the bumper scripts and vcs-sync;
some infrastructure-oriented code needs to be able to change across all processes, including historical-revision-based processes; and
some processes have nothing to do with the gecko tree at all.

[brainstorming solutions]

Part of the solution is to move logic out of mozharness. Desktop Firefox builds and repacks moving to mach makes sense, since they're

configurable by separate mozconfigs,
tasks completely shared by developers, and
completely dependent on the tree, so tying them to the tree has no additional downside.

However, Andrew Halberstadt wanted to write the in-tree test harnesses in mozharness, and have mach call the mozharness scripts. This broke some of the above assumptions, until we started thinking along the lines of splitting mozharness: a portion in-tree running the test harnesses, and a portion out-of-tree doing the pre-test-run machine setup.

(I'm leaning towards both splitting mozharness and using helper objects, but am open to other brainstorms at this point...)

[splitting mozharness]

In effect, the wrapper, out-of-tree portion of mozharness would be taking all of the complex inputs, simplifying them for the in-tree portion, and setting up the environment (mock, tooltool, downloads+installs, etc.); the in-tree portion would take a relatively simple config and run the tests.

We could do this by having one mozharness script call another. We'd have to fix the logging bug that causes us to double-log lines when we instantiate a second BaseScript, but that's not an insurmountable problem. We could also try execing the second script, though I'd want to verify how that works on Windows. We could also modify our buildbot ScriptFactory to be able to call two scripts consecutively, after the first script dynamically generates the simplified config for the second script.

We could land the portions of mozharness needed to run test harnesses in-tree, and leave the others out-of-tree. There will be some duplication, especially in the mozharness.base code, but that's changing less than the scripts and mozharness.mozilla modules.

We would be able to present a user-friendly "inner" script with limited inputs that rides the trains, while also allowing for complex inputs and automation-oriented setup beforehand in the "outer" script. We'd most likely still have to allow for automation support in the inner script, if there's some reporting or error checking or other automation task that's needed after the handoff, but we'd still be able to limit the complexity of that inner script. And we could wrap that inner script in a mach command for easy developer use.

[helper objects]

Currently, most of mozharness' logic is encapsulated in self. We do have helper objects: the BaseConfig and the ReadOnlyDict self.config for config; the MultiFileLogger self.log_obj that handles all logging; MercurialVCS for cloning, ADBDeviceHandler and SUTDeviceHandler for mobile device wrangling. But a lot of what we do is handled by mixins inherited by self.

A while back I filed a bug to create a LocalLogger and BaseHelper to enable parallelization in mozharness scripts. Instead of cloning 90 locale repos serially, we could create 10 helper objects that each clone a repo in parallel, and launch new ones as the previous ones finish. This would have simplified Armen's parallel emulator testing code. But even if we're not planning on running parallel processes, creating a helper object allows us to simplify the config and logic in that object, similar to the "inner" script if we split mozharness into in-tree and out-of-tree instances, which could potentially also be instantiated by other non-mozharness scripts.

Essentially, as long as the object has a self.log_obj, it will use that for logging. The LocalLogger would log to memory or disk, outside of the main script log, to avoid parallel log interleaving; we would use this if we were going to run the helper objects in parallel. If we wanted the helper object to stream to the main log, we could set its log_obj to our self.log_obj. Similarly with its config. We could set its config to our self.config, or limit what config we pass to simplify.

(Mozharness' config locking is a feature that promotes easier debugging and predictability, but in practice we often find ourselves trying to get around it somehow. Other config dicts, self.variables, editing self.config in _pre_config_lock() ... Creating helper objects lets us create dynamic config at runtime without violating this central principle, as long as it's logged properly.)

Because this "helper object" solution overlaps considerably with the "splitting mozharness" solution, we could use a combination of the two to great efficacy.

[functions and globals]

This idea completely alters our implementation of mozharness, by moving self.config to a global config, directly calling logging methods (or wrapped logging methods). By making each method a standalone function that's only slightly different from a standard python function, it lowers the bar for contribution or re-use of mozharness code. It does away with both the downsides and benefits of objects.

The first, large downside I see is this solution appears incompatible with the "helper objects" solution. By relying on a global config and logging in our functions, it's difficult to create standalone helpers that use minimized configs or alternate logging configurations. I also think the global logging may make the double-logging bug more prevalent.

It's quite possible I'm downplaying the benefit of importing individual functions like a standard python script. There are decorators to transform functions into class methods and vice versa, which might allow for both standalone functions and object-based methods with the same code.

[related links]

Jordan Lund has some ideas + wip patches linked from bug 753547 comment 6.

Andrew Halberstadt's Sharing code not always a good thing and How to deal with IFFY requirements

My mozharness core principles example scripts+configs and video

Lars Lohn's Crouching Argparse Hidden Configman. Afaict configman appears to solve similar problems to mozharness' BaseConfig, but Argparse requires python 2.7 and mozharness locks the config.