Jul. 27th, 2013

escapewindow: escape window (Default)
  • What is gecko.git?
  • What is beagle?
  • Problems encountered / lessons learned: master-only conversion
  • Problems encountered / lessons learned: full branchlist conversion
  • Still to do

  • What is gecko.git?

    gecko.git is a git repo on git.mozilla.org that Mozilla's B2G partners view as the Repository of Record (RoR) for Mozilla B2G gecko code. It's a read-only synced version of a handful of Mozilla's mercurial gecko repos.

    Hal Wine set this repository up, using his vcs-sync repos; his vcs-sync docs are well written and comprehensive for the single-repo variety of job.

    As Hal noted in his RelEngCon talk, this repository has a number of requirements we don't have elsewhere (see the "Challenge Areas (Con't)" section):

    • all changes must be fast-forwardable; no deletes
    • the conversion is not foolproof
    • git email validation is stricter than hg
    • commits already live on hg.mozilla.org, so we have to change hg-git to allow for special cases

    We have identified some upstream processes and behaviors that would cause problems, should they show up in gecko.git. For example, historically our merge day process has involved closing the tip of the downstream branch (e.g. Aurora), and landing the upstream branch (e.g. Central -> Aurora) in, creating a new default head. This process violates the "only fast-forwardable changes" rule above. We were avoiding this issue previously by not converting the aurora/beta/release repos, and only pushing the b2g18* repos, which don't ride trains.

    Since then, we identified hg debugsetparents as a potential solution to make the merge day landings artificially fast-forward; it's unclear if we currently have that implemented. We would need to before we could successfully push aurora/beta/release to gecko.git without violating this fast-forward rule.

    Also, our current release process for Firefox desktop and Firefox Mobile (aka Fennec) involves creating a BUILDn tag and a RELEASE tag. The former generally doesn't change; the latter changes on any respin. Because of this, we use a -f in our hg tag command. Here's a changeset where we move a RELEASE tag for a build 2.

    However, Git doesn't allow for re-tagging like mercurial does. If you push a new tag with the same name as an old tag, any downstream repository owners will need to take action to move their tags. This is problematic on sensitive release repositories that multiple downstream partners rely on.

    I can imagine two longer-term solutions here: one is to continue to severely limit which tags can be pushed to gecko.git; the other is to change how Mozilla's release automation determines which revision to pull. Our current strategy of not pushing aurora/beta/release has delayed this decision.

    As noted in a recent thread (Whoops, I'm bad with git...), it's safest to not make changes to git history if there are enough users downstream to form a lynch mob.


    What is beagle?

    I'm not entirely sure about the code name, but the purpose is an official replacement of Ehsan's github/mozilla/mozilla-central repo with shas that match gecko.git. (Due to differences in the conversion processes and toolchains, the current two repos' shas do not match.)

    As I understand it, this is a developer-oriented repo, with important mercurial repos pulled in as git branches. This allows for easier cross-branch diffing for git-based developers, though landings still happen in hg.

    There are a number of branches tracked by the existing github/mozilla/mozilla-central repo that don't necessarily follow the above rules for gecko.git, which complicates having a single git repo that serves both purposes. Notably, larch, birch, alder, and cypress are project branches, which, if they follow the standard project branch life cycle, will get hard-reset at some point. This violates both the expectation of fast-forwardability as well as no deletes. And, as noted above, it appears as though the movable *_RELEASE tags from the beta+release repos are getting populated here, which would be problematic for gecko.git.

    Also, if the "Whoops..." thread was any indication (as well as the occasional "please purge this revision from hg.m.o" bug that pops up): in this world of many downstream users, we have to become better at not needing to purge revisions or rewrite history. But given a split between a partner-oriented gecko.git and a developer-oriented beagle, we're at least allowed some additional leeway in the latter.

    I tackled the beagle project with an eye to being able to support both gecko.git and beagle.


    Problems encountered / lessons learned: master-only conversion

    I came in with the assumption that this was a fully solved problem, and I would merely be making existing processes more robust, more scalable, and maintainable. RelEng is currently converting and supporting over 300? repos on git.mozilla.org, using early prototype code currently live in production, so this needs to be improved. And certainly, many of the issues were already fixed and upstreamed, and many of the processes were well documented, just not all. I definitely underestimated how much I would have to learn about the process.

    However, I knew that I would be changing the workflow and process (as Hal is fond of saying, his scripts are a proof of concept running in production). I wasn't going to just tweak existing scripts; I wanted to make the entire process automated and config-driven, rather than human-intensive.

    How best to test these changes? The most data we have to test with is in m-c history. So converting m-c from scratch, and verifying that everything looks the way we want afterwards, was the best test for my new process.

    The project was down-scoped to just hg->git conversions. (Previously I had been aiming for a generic, config-driven vcs sync project that could convert hg -> git or git -> hg, or sync hg -> hg or git -> git (e.g., github -> git.m.o) to cover all of our vcs-sync needs.) Once I had a machine to test on, I started my first m-c conversion to verify my script + configs. About a week in, gps blogged about faster hg-git.

    As I noted earlier, I switched over. But that wasn't seamless; Hal was running on a forked hg-git 0.3.2; gps' changes were landed as a part of 0.4.0. Rather than try to backport gps' changes, we thought I should use the latest hg-git (0.4.0), since as far as we knew all of the forked changes had been upstreamed.

    However, as noted in bug 835202 comment 9, my conversion failed using the latest upstream hg-git; our timezone fix had never been upstreamed because of missing unit tests. When I forked hg-git with this patch, I was able to continue and convert everything as expected... except the shas diverged. Which I consider to be a major no-no, a violation of a central goal of this project.

    It turns out that hg-git 0.4.0 deals with this revision, which I fondly refer to as <h<surkov, differently than Hal's 0.3.2-moz2 fork. These two revisions change the second angle bracket '<' into a question mark '?', turning the initial portion of the email address into <h?surkov. The previous behavior was to drop the second angle bracket '<' entirely, as seen in gecko.git's <hsurkov. Both are valid ways to change <h<surkov into a well-formed git email address, but only one of these resulted in the same shas as gecko.git. I backed those commits out of my hg-git fork, and started verifying that my latest 0.4.0-moz2 fork gave us expected behavior. And it did, for a master-branch-only conversion, like we currently have in gecko.git. (Phew!)

    (While it may seem like I was spending an inordinate amount of time converting and re-converting mozilla-central, I was, in fact, testing against the largest data set we had in terms of unexpected, real-world commit data. Shaking out bugs in the conversion process before pushing live avoids massive headaches in the future for developers, partners, and maintainers. If the conversion scripts could robustly handle anything we'd seen so far, we'd have greater confidence about these systems in live use.)


    Problems encountered / lessons learned: full branchlist conversion

    A master-branch-only conversion was good for gecko.git, but I was looking to support beagle as well. Ehsan's repo currently has all of mozilla-central's tags and branches, and not only did he gexport each of those branches, he also ran git filter-branch on them after the cvs prepend, and translated each of the tags. Had I proceeded without doing this work, I could foresee a future where we would be asked to shoehorn these changes onto a live production repo, well after the downstream audience had crossed lynch-mob-levels. Worse still, I wasn't able to push to github without running a git gc aggressive, which would make future conversion of these branches and tags very difficult. I couldn't use gecko.git as the base if I wanted these branches and tags; I couldn't use Ehsan's repo as the base if I wanted the shas to match gecko.git. It wasn't difficult for me to add this added workflow to the script, just compute-time-intensive. So I added it, and started the full conversion at that point.

    This took a bit longer to complete than I predicted, because I had to either use the -f option in git filter-branch and potentially overwrite revisions already created in the map directory, or not halt on failure should one of the git filter-branch runs exit non-zero. Neither of these seemed like comfortable sensitive-production-service decisions to make. (In the end, the -f turned out to be the lesser evil.)

    As a side note: certainly at this point (if not before), a second machine or a fast local disk with enough inodes+space to handle a second parallel conversion would have helped speed up the process immensely, as I could have run both conversion workflows in parallel, rather than serially as I have been (with lots of tar backups and restores to try to save time). I did finally get that second machine last week (yay!), though that was well after most of my time-consuming compute churn was over. (I estimate that had I had this second machine at the start of the project, I would have saved at least a month of calendar time; very likely more.) Still, I'm making good use of it now.

    Unbeknownst to me, this seemingly innocuous commit landed between my successful master-only conversion and the first full-branch conversion attempt that made it through without halting-on-failure. During the cvs prepending step, for some unknown reason, git filter-branch changes "Carlos G." into "Carlos G" sans-period; gecko.git preserves the original name. My theory was that hg-git was dealing with this user's name appropriately, and git filter-branch was munging it. Enter hg strip: I theorized that if I stripped the problem commit before cvs prepending, and then updated and converted via hg-git normally afterwards, we'd be golden. And that's almost true. But not quite.

    The main problem, as I found out after a second full conversion (and having my conversion machine reboot from under me, which only cost me 5 compute days) pass later, is that the hg strip didn't leave default as the tip of its own branch; due to unnamed branches+merges as DVCS allows, I had to hg strip three times to get to a single tip-of-default head. Without that, git filter-branch would move the master branch to cvs-based shas, but the additional forks stayed on the old, non-cvs-based shas. Once I landed this change, (and once I restarted this conversion pass after the machine reboot), we now have a full conversion of mozilla-central, with all mozilla-central branches and tags, with cvs history, with shas that match up with gecko.git. The toolchain, config, and script have been tested against a huge number of historical revisions and held up; this is a good sign for future robustness. A bonus side effect is we can predictably recreate our git history now, which is great, but was never the primary goal.

    I pushed the resulting repo to this test repository on github; please feel free to poke at it (but be forewarned that I reserve the right to reset it in the future). I started populating the repo branch list to include branches that we think make the most sense given the use cases for this repo and all the gotchas described above. During this process, I came across some things that need fixing in the code+configs (for instance, you'll note that the mozilla-central branches+tags aren't there, and the mozilla-beta+mozilla-release release tags. I have the former in my conversion repo, and have a partial plan for the latter). But we're definitely past the long compute time hump.

    (As another side note -- I fixed some of these cvs prepend issues via specific revision hardcodes rather than generic solutions. This isn't ideal; I'd love to write this tool in a way that could apply to any project, anywhere. However, given some of the unique constraints (cvs prepending, matching previous shas that were built with a different process and toolchain), and limited time, I decided hardcodes in this section of code, which in theory should only need to be run once barring some sort of emergency, was the lesser of evils.)


    Still to do

    • From the start, even before we down-scoped the project, I was planning on being able to push a subset of repos, branches, and tags to a number of different locations. My configs for doing that are a bit ugly currently, but they work! So if I'm pulling mozilla-b2g18 into my conversion repo, I can push the master branch of that repo to gecko.git:gecko-18, beagle:b2g18, and create a standalone mozilla-b2g18 repo on github or git.m.o where it's b2g18:master. Since these would be pushed from the same conversion directory, they would be guaranteed to have the same shas.

      And since I've made the effort to make my code modular, config-driven, and well tested, I would have no qualms about running two instances of the script in parallel: one for gecko.git, one for beagle. After painstaking and time-consuming effort to try to guarantee matching shas, and with the same toolchains and code paths, I wouldn't be worried about the two repos diverging. The split would mainly serve to avoid pushing any bad changes to the more sensitive, partner-oriented gecko.git. It's easier to guarantee a change won't be pushed if the change doesn't exist in the conversion repo at all, rather than only relying on push-time checks.

      Currently I'm leaning towards the split (for more stable gecko.git), with the multiple-target logic in the former living in the beagle process.

    • As I was pushing the various repos to my test-beagle github repo, I noticed a few things about tags and branches. The branch and tag configs are not creating the {FIREFOX,FENNEC}_*_{BUILDn,RELEASE} tags, among others, even when I changed the tag_regexes. This is probably because the release tags are all created on relbranches, and outside of mozilla-central, I'm only converting the branches listed in an explicit branch whitelist.

      I imagine I need branch and tag configs for incoming branches+tags (branches and tags designated for conversion); these would need to be a superset of all branches and tags that we would want to push. Limiting this set would help prevent pushing any unwanted branches and tags, which could prove problematic if they either violate the no-deletes or fast-forward-only rules, or if the tags in question move on the mercurial side. A whitelist would work, but by itself would require a lot of manual intervention. I need to add regex or wildcard support. The outgoing branch and tag configs would be similar, but could be a subset of available branches and tags, and continue to involve different configs for different push targets (e.g., the above gecko.git:gecko-18, beagle:b2g18, b2g18:master example).

      Since we'll be adding to this list and modifying it on the fly when this script is in production, a config file validator and a more sane format become less optional and more production best practice.

    • For bug 848908 - prevent repo corruption due to bad pulls for hg-git conversion, I already have three types of repos on local disk: clean staging clones, the conversion repo, and test target repos.

      The conversion repo is the same as gecko.git and Ehsan's repo. The staging clones are clean clones designed to catch any hg.mozilla.org repo corruption from a pull, as noted in in the bug. Basically, it's a matter of minutes to blow away and reclone a clean clone. It's a matter of hours or days if we corrupt the conversion repo.

      I haven't caught any repo corruption yet, but I do know we've seen at least three or more instances of repo corruption in Hal's production conversion repos, each time a multi-hour restoration process, each delaying further changes from making their way downstream. I knew that beagle, with its massively larger matrix of repos, branches, and tags, would be even harder and likely more time consuming to un-corrupt.

      The local disk test target repos are there to give debuggable local repos to push to. I imagine we could set them up in such a way that they would need to pass certain tests before allowing the script to push to a public repo.

      To do here: catch staging clone repo corruption, and make sure the automated response deals with it appropriately; potentially add test target repo validation to avoid pushing bad commits to public repos. I'm not sure these are blockers, but they would go a long way to making the process more robust.

    • For bug 799845 - For git mirrored repositories, please provide status on what the last successfully sync time was, I create a json file with what I think is adequate information about the latest pull/push times, and the configuration involved. It currently looks like this.

      I need to make sure the format works for downstream users, and find a place for this file (and the logs, etc.) to get uploaded to.

    • When bug 892563 - Add a timeout parameter to mozharness' run_command landed, I added output_timeout settings to various commands I felt might hang at some point in the future; they also run through the retry() method, and each successful and unsuccessful run sends email (configurable, of course). I have yet to see timeout/retry code successfully exercised; I'd love some verification before production.

    • Split the patch up for review!


    I hope that's been useful. Feel free to send me feedback / ideas / questions, or to take a look at my work in progress.

    The bug is here.
    The script is here; I went through and added comments and docstrings, to hopefully make it more readable.
    The config is here.
    My test github push repo is here. (This repo may be reset in the future!)
    My previous blog post (hg-git part i) is here.

    February 2017

    S M T W T F S

    Most Popular Tags

    Style Credit

    Expand Cut Tags

    No cut tags
    Page generated Mar. 27th, 2017 08:27 am
    Powered by Dreamwidth Studios