escapewindow | Entries tagged with vcs-sync

This is a long overdue blost post to say that gecko-dev and gecko-projects are fully live and cut over (and renamed, for those who have been looking at the now-defunct links in my previous blog posts).

Gecko-dev is on git.mozilla.org and on github; gecko-projects is on github only until it's clear that we won't need regular branch and repo resets. We have retired the following github repos, which we will probably remove after a period of time:

The mapfiles are still in http://people.mozilla.org/~asasaki/vcs2vcs/, which is not the final location. There's a combined mapfile in here for convenience, but the gecko-mapfile is the canonical mapfile.

I've been working on a db-based mapper app, which, given a git or hg sha, would allow us to query for the corresponding sha. It also allows for downloading the full mapfile for a repo (gecko.git, gecko-dev, gecko-projects) or the combined full mapfile of two or more of those repos. It also allows for downloading the shas inserted since a particular datetime. Screenshots of my development instance are here.

Bear with me; I've got lots on my plate and not enough time. Mapper is in the works, however, and I think this should smooth over some of the rough edges when dealing with both hg and git. Additional eyeballs and/or patience would be appreciated!

My db-based mapper repo is here.
The db-based mapper bug is here.
The permanent upload location bug is here.

(you missed these, a tiny bit, didn't you.)

As I've mentioned here and here, non-fastforward commits are problematic for downstream partners, and are to be strictly avoided. And as I mentioned here, we had identified that our original merge day mechanics introduced non-fastforward commits to our Repositories of Record on a periodic basis.

I was under the impression that this was a Solved Problem, but I noticed a non-fastforward commit in mozilla-release after the mozilla-beta -> mozilla-release merge for Firefox 25.0 (thanks to the vcs-sync emails I'm getting now). I hadn't updated the instructions on the Merge Documentation wiki page, and the merge scripts followed that document.

The scripts are now available in this github repo. One pull request later, plus :lsblakk's fixes from the long Firefox 26 merge day, and we now use hg debugsetparents to "merge" in old heads in a fast-forwardable way. One more step towards our goal.

(Resuming the shorter blog posts about features in the new vcs sync process...)

After I enabled beagle conversions in cron, I started getting some intermittent non-fastforward emails. When I poked around, I noticed that some changes had landed on GECKO90_2011121217_RELBRANCH on mozilla-release, but not mozilla-beta. When syncing mozilla-release, the tip of this branch stayed in place; when syncing mozilla-beta, the tip of this branch was behind several commits, resulting in a non-fastforward push. We'll continue to see problems like this, if we sync branches with shared names, across mercurial repos, which contain different histories, but we're good for now.

I was able to solve this by transplanting some changesets, and I recorded my actions for this and one other branch here.

(This is more a blog post about manual investigation+actions taken after automated emails highlighted the issue, rather than a feature of the new vcs sync process, but the emails themselves were key...)

:bhearsum filed filed bug 924024 (kill relbranches), which should help reduce the frequency of this happening.

(Interrupting the shorter blog posts about features in the new vcs sync process, to talk a bit about discussions I've been having...)

We have three official git repos now:

releases/gecko.git, which is partner-oriented, and our highest priority to keep sane (currently lives on git.m.o only),
integration/gecko-dev, which is developer-oriented, and we want to offer a strong SLA for. It contains all release and inbound branches (currently lives on both github and git.m.o), and
integration/gecko-projects, which contains mercurial repos without strict pre-commit hooks, and are periodically reset; RelEng reserves the right to reset the repo if these cause vcs-sync issues. (Currently on github only, due to the ease of resetting branches (or the entire repo).)

All three share the same SHAs for shared commits.

As for this recent, valid question: "Wow, that's a lot of complexity. Not worth just making a clean once-off break of renaming branches?" (In regards to how we name branches differently in gecko-dev and gecko.git.)

I originally wanted to name the branches after the trains, so gecko.git would be largely hands-off. If they want 1.2 code now, it's in aurora; if they want it after merge day, they'll need to pull from b2g26_v1_2. Easy enough. However, merge day would require careful communication and process timing across languages, companies, and timezones. All work is easier when it's someone else's, but on an objective level, communication and logistics can be more difficult than a technical solution. So now we point the v1.2 branch at trunk, then aurora, then b2g26_v1_2, and our partners can keep pointing at v1.2. And my vcs-sync configs are a little complex as a result of those branch names; a decent trade-off.

(So that's the why of the naming of things. I'm impressed you've read your way through nine of these (I assume). Insomnia?)

(Interrupting the shorter blog posts about features in the new vcs sync process, to talk a bit about discussions I've been having...)

In Firefox desktop browser and Firefox Mobile, the shipped product is a binary that Release Engineering produces. The process of creating these binaries, as well as the venues for distributing them — the Application Update Service (AUS), the ftp server+website, and various Android marketplaces — loom large in our world. While it is also important to support our developers and make it easier for them to write and maintain the products we ship, if we're forced to prioritize between the two, we know the answer.

Shipping product has been, is, and will continue to be Release Engineering's highest priority.

In B2G land, we don't ship a binary; partners build the binaries and ship them on devices. However, their binaries are based off of code that we deliver to them. And the vehicle for that code delivery is gecko.git; gecko.git is effectively our shipping product.

So if we're in a conversation or heated debate, and someone says the equivalent of "I don't care about partners" or "I don't care about gecko.git", I often hear that as "I don't care about Mozilla's capability to ship Firefox OS". It's not conscious, and I'm trying to get better about that. Most likely that isn't what the other party is intending to say. But if a conversation starts that way, most likely the conversation will end up running face-first into The First Law of Release Engineering.

(This is one of several shorter blog posts about features in the new vcs sync process.)

At first blush, pushing all branches and tags from a converted mercurial repo to its git mirror seems like a logical choice; no renaming, no filtering. Except, of course, default -> master. Easy enough: a blanket conversion, with one exception.

However, as noted earlier, we move tags in hg land. Mercurial makes it easy to do, and moving tags is currently part of our official release process. Git, on the other hand, makes it impossible to move tags invisibly; to pick up a moved tag, every downstream user would need to explicitly delete and re-fetch the tag. Otherwise, their tag will continue to point at the old revision. Given the choice between no tags and tags pointing at the wrong revision, I much prefer no tags. We do have a small subset of tags in our partner-oriented gecko.git, though, so in my mind we needed a tag whitelist. With wildcards/regexes, so we wouldn't have to keep updating a static list.

Branches could have remained a bit simpler, but I had several types of git repo to support. For example, mozilla-b2g18: in gecko.git the default branch is gecko-18. In github/mozilla/mozilla-central, it's b2g18. Neither of those repositories have other, non-default branches from mozilla-b2g18. However, with the lack of {FIREFOX,FENNEC}.*{BUILDn,RELEASE} tags on the git side, I wanted to at least support the GECKO_.*RELBRANCH branches from mozilla-beta and mozilla-release. So not only do we need a whitelist here, but also a map, to say that mozilla-aurora is aurora in gecko-dev, but v1.2 in gecko.git. (We were even considering supporting the standalone git repos; mozilla-aurora:default would have been master in releases-mozilla-aurora. We're going to nuke those in bug 847643, however.)

In addition to the above, we want to strictly avoid noise (e.g., unwanted branches+tags, or branches+tags with the wrong names) in the partner-oriented gecko.git. I can control this by strictly limiting on push. However, a single layer of safety like that feels a bit dicey; a loose regex or a bug can push the wrong thing. Also, I'm not currently keeping track of which branches+tags I convert for each mercurial repo, so a loose regex for one mercurial repo's conversion whitelist, coupled with a loose regex for another mercurial repo's push whitelist, can result in pushing the former mercurial repo's unwanted branches+tags during the latter mercurial repo's push. It seems easier to not convert unwanted branches+tags in the first place. Because of this, I'm restricting what we convert at all via strict whitelists, and then again restricting at the push-per-target level, for an additional level of safety.

(Does that make sense? I almost feel like drawing a picture.)

When I look at the config files, they don't seem very elegant, but given the complexity we're trying to encapsulate, I think they're a pretty decent first draft.

(This is one of several shorter blog posts about features in the new vcs sync process.)

We've seen instances of hg corruption before, like

abort: data/dom/network/interfaces/nsIDOMNetworkStats.idl.i@50a5a9362723: unknown parent!
abort: data/mobile/android/base/DoorHangerPopup.java.i@62e6137d125c: no match found!
abort: data/b2g/config/gaia.json.i@327bccdb70e8: no node!
abort: unknown revision '177bf37a49f5'!

People have theorized that at least one of these may be caused by a race condition while pulling from an http head during an ssh push (edit: bug 737865); others seem a bit more random. We deal with this in our TBPL build/test CI by detecting these types of failures, and nuking/recloning the repo.

We also see these in the legacy vcs-sync process. With a single-, non-cvs-prepended-, mozilla-central-based- repo, recovering from hg corruption in the working conversion directory is a manual process that can take multiple hours. I saw this as a non-starter for a repo like beagle or gecko.git, where rebuilding the entire thing from scratch can take over a week.

As I mentioned here, the new process has an intermediate stage_source clone:

hg.mozilla.org -> stage_source clone -> conversion_dir

When we detect corruption in the stage_source clone, we don't have to worry very much; just clobber and reclone. The time to recreate a fresh clone of a single mercurial repo is a matter of a few minutes, not multiple hours. This approach uses more disk, but helps prevent long downtimes.

Previously I had been running hg verify in each stage_source clone before proceeding, but that slowed each pull/convert/push cycle down by ~5 minutes per source mercurial repo (and doesn't always fix the problem), making it a non-viable option.

(This is one of several shorter blog posts about features in the new vcs sync process.)

Focusing on beagle first turned out to be the right call (thanks :joduinn) -- I severely underestimated the time it would take to solve the initial mozilla-central cvs-prepend step in an automated, repeatable fashion, as noted here and here. However, this meant that after all my beagle-specific testing, I had to refactor to support the other vcs-sync processes, and re-test.

One consequence: my ~6 minute estimate ballooned to ~9+ minutes for each conversion job when I changed from a single conversion_dir push to a push-per-source-repo. With each job cron'ed every 5 minutes, a commit could take up to 20-some minutes to show up in git (if it happened right after the previous conversion started).

:nbp had given me a shell script to look at, and :hwine had suggested we use hg incoming to check for any changes before proceeding with the pull/convert/push loop. The latter seemed simple to add, so I did.

The average no-op conversion time dropped from ~400-550 seconds to ~12. This includes rsyncing the updated status json and ~600 log files to an upload server. (That number of logfiles will go down exponentially when we have dated upload dirs, so we don't have to keep so many backups in the same directory.) This is dependent on hg.mozilla.org load, and can spike up to ~40 seconds.

The average conversion time dropped from ~400-550 seconds to a little over a minute, and additional repos don't add much more time.. sometimes ~2 minutes for 4 repos' worth of conversion. I also bumped the cron job frequency up to once per minute, so on average mercurial commits should show up in git within a minute or three. Plus, it's harder to find multiple repos' worth of changes within a single minute, so it keeps the incoming changes down. The longest delays tend to come when we hit hg corruption (uncommon), and even then we're auto-fixing within 8-9 minutes (see a later blog post about this). I still want to get more built-in parallelization support into mozharness, but with these numbers it's a lot less urgent.

One side effect of this: sometimes we skip over repos that need to be synced. For example, if we add a new target to a repo (e.g., a git.m.o repo, when we had previously only been pushing to github), or a new branch or tag regex (there will be a later blog post on these). If the repo in question has a lot of activity, this would populate on the next push. If it's a closed or low-activity repo, that might not happen for days, or weeks, or ever.

I added a --no-check-incoming commandline option, with a corresponding global check_incoming config setting to skip this behavior, and convert/sync everything. Also, I added per-repo check_incoming flags (defaulting to True) for more granularity. This helped in debugging the relbranch issue on mozilla-beta (see later blog post).

(This is the first of several shorter blog posts about features in the new vcs sync process.)

Back when the vcs sync project was first dropped in my lap, I quickly decided the initial implementation was going to push to a local repo. On disk, not on a network server. This has the nice properties of

ruling out any server-permission- or network- related issues,
allowing for development without an active network connection,
speeding up the testing process to a small degree,
and allowing for immediate inspection of the pushed repo's contents.

I named this a test_push, though I'm waffling on that name.

When it became clearer that non-fastforward pushes and deletions would be an issue for downstream partners, we were looking for ways to prevent that.

Ideally we prevent this at the RoR (repository of record), with pre-commit hooks. However, not all of our upstream repos have pre-commit hooks (see github), so this can't be a blanket solution. (I think our single-head hook on our release branches catches a lot of this, but there might be more we can do on hg.m.o).

Next, it would be good to have these denied at the partner repository level (we've done this on gecko.git and gaia.git). This is less ideal than the pre-commit hook, because a deletion or non-fast-forward commit can land upstream, and then the sync process has nowhere to go. Also, this is the last place we can catch this issue -- if this is set incorrectly, or missed, or if others have administrative rights to the repo and unset this, then we're in a bad position. (hg debugsetparents can fix non-fastforward commits. I don't know how to recover from deletions, other than track that revision down somewhere and re-push; luckily deletions look to be difficult to do in mercurial.)

I finally decided to add receive.denyNonFastForwards to the local test_push repo, via the --shared=true option in git init. If the test_push happens before any network pushes, and test_push failure prevents any network pushes for that branch, we get another layer of safety. It's still not as good a solution as preventing that change in the first place, but it's something we can control locally.

It looks like I'm missing receive.denyDeletes from the test_push... I added that to my development branch so we get that check in each test_push soon as well.

live

As of about 7pm PDT last Friday (October 11), gecko-dev (née beagle) and gecko-projects are live. Here's the newsgroup post. Here's gecko-dev on github and git.mozilla.org. Here's gecko-projects (with a README.md on how to use it) on github. The logs, repo_update.json files, and mapfiles are temporarily living here. The bug is here.

Both of these repositories are RelEng-supported, and have SHAs that match gecko.git.

If you use git for your gecko development, please start using these repos and make sure they work for you. AreWeFastYet and some developers have already switched over just fine. If you hit any problems, please let us know.

coming shortly

We need to move the logs, repo_update.json files, and mapfiles to a permanent location. We'd like to tie this in with mapper -- hopefully that app can grow to support most mapfile needs, but I'm not sure what those needs are currently. If you use the mapfile and have special needs around it, could you let me know?
To reduce confusion, we need to retire the earlier, experimental mozilla-central-based github repos:
- https://github.com/mozilla/releases-mozilla-central, which has no CVS history and divergent SHAs from gecko.git. Same with https://github.com/mozilla/releases-mozilla-aurora, https://github.com/mozilla/releases-mozilla-beta, and https://github.com/mozilla/releases-mozilla-release. Lots of people are using these repos, so this will take plenty of communication to migrate those users over to the new supported repos.
- At some point we need to retire https://github.com/mozilla/mozilla-central, which Ehsan has been kind enough to set up and maintain for years now.

what's next

Beagle was definitely the largest piece in RelEng's vcs-sync puzzle, but it's not the final piece.

gecko.git (hg->git): I already have configs and a test repo. Once we're confident that gecko-dev and gecko-projects are solid and solve our developers' needs, we can switch our partner-oriented repo over from the legacy system to being converted by this new production system. This switchover does not require changing any SHAs, and should hopefully be an invisible, seamless cutover.
l10n (hg->git) I already have configs and have tested this locally. The workflow I went with here involves reading the b2g/locales/all-locales and languages_dev.json files on various branches, and building the list of repos-to-sync dynamically that way. Along with the ability to create new repos on git.mozilla.org, this allows us to sync new locale repos on demand, rather than requiring manual Release Engineering + IT intervention every time.
git->git sync support. We have a large number of b2g github repos that need to be populated in git.mozilla.org. If I follow the l10n model, we would dynamically create this list from b2g-manifests, rather than require manual Release Engineering + IT intervention.
git->hg sync support. This is needed for gaia currently.
hg->hg sync support. We've had some, but limited, use of the repos mirrored on bitbucket.com

As we add support for these, we can cut over legacy processes over to the new process; stay tuned for announcements for each of these switchovers.

Also, I plan to write a few shorter blog posts about some of what went into this project.

What is gecko.git?

What is beagle?

Problems encountered / lessons learned: master-only conversion

Problems encountered / lessons learned: full branchlist conversion

Still to do

What is gecko.git?

gecko.git is a git repo on git.mozilla.org that Mozilla's B2G partners view as the Repository of Record (RoR) for Mozilla B2G gecko code. It's a read-only synced version of a handful of Mozilla's mercurial gecko repos.

Hal Wine set this repository up, using his vcs-sync repos; his vcs-sync docs are well written and comprehensive for the single-repo variety of job.

As Hal noted in his RelEngCon talk, this repository has a number of requirements we don't have elsewhere (see the "Challenge Areas (Con't)" section):

all changes must be fast-forwardable; no deletes
the conversion is not foolproof
git email validation is stricter than hg
commits already live on hg.mozilla.org, so we have to change hg-git to allow for special cases

We have identified some upstream processes and behaviors that would cause problems, should they show up in gecko.git. For example, historically our merge day process has involved closing the tip of the downstream branch (e.g. Aurora), and landing the upstream branch (e.g. Central -> Aurora) in, creating a new default head. This process violates the "only fast-forwardable changes" rule above. We were avoiding this issue previously by not converting the aurora/beta/release repos, and only pushing the b2g18* repos, which don't ride trains.

Since then, we identified hg debugsetparents as a potential solution to make the merge day landings artificially fast-forward; it's unclear if we currently have that implemented. We would need to before we could successfully push aurora/beta/release to gecko.git without violating this fast-forward rule.

Also, our current release process for Firefox desktop and Firefox Mobile (aka Fennec) involves creating a BUILDn tag and a RELEASE tag. The former generally doesn't change; the latter changes on any respin. Because of this, we use a -f in our hg tag command. Here's a changeset where we move a RELEASE tag for a build 2.

However, Git doesn't allow for re-tagging like mercurial does. If you push a new tag with the same name as an old tag, any downstream repository owners will need to take action to move their tags. This is problematic on sensitive release repositories that multiple downstream partners rely on.

I can imagine two longer-term solutions here: one is to continue to severely limit which tags can be pushed to gecko.git; the other is to change how Mozilla's release automation determines which revision to pull. Our current strategy of not pushing aurora/beta/release has delayed this decision.

As noted in a recent thread (Whoops, I'm bad with git...), it's safest to not make changes to git history if there are enough users downstream to form a lynch mob.

Top

What is beagle?

I'm not entirely sure about the code name, but the purpose is an official replacement of Ehsan's github/mozilla/mozilla-central repo with shas that match gecko.git. (Due to differences in the conversion processes and toolchains, the current two repos' shas do not match.)

As I understand it, this is a developer-oriented repo, with important mercurial repos pulled in as git branches. This allows for easier cross-branch diffing for git-based developers, though landings still happen in hg.

There are a number of branches tracked by the existing github/mozilla/mozilla-central repo that don't necessarily follow the above rules for gecko.git, which complicates having a single git repo that serves both purposes. Notably, larch, birch, alder, and cypress are project branches, which, if they follow the standard project branch life cycle, will get hard-reset at some point. This violates both the expectation of fast-forwardability as well as no deletes. And, as noted above, it appears as though the movable *_RELEASE tags from the beta+release repos are getting populated here, which would be problematic for gecko.git.

Also, if the "Whoops..." thread was any indication (as well as the occasional "please purge this revision from hg.m.o" bug that pops up): in this world of many downstream users, we have to become better at not needing to purge revisions or rewrite history. But given a split between a partner-oriented gecko.git and a developer-oriented beagle, we're at least allowed some additional leeway in the latter.

I tackled the beagle project with an eye to being able to support both gecko.git and beagle.

Top

Problems encountered / lessons learned: master-only conversion

I came in with the assumption that this was a fully solved problem, and I would merely be making existing processes more robust, more scalable, and maintainable. RelEng is currently converting and supporting over 300? repos on git.mozilla.org, using early prototype code currently live in production, so this needs to be improved. And certainly, many of the issues were already fixed and upstreamed, and many of the processes were well documented, just not all. I definitely underestimated how much I would have to learn about the process.

However, I knew that I would be changing the workflow and process (as Hal is fond of saying, his scripts are a proof of concept running in production). I wasn't going to just tweak existing scripts; I wanted to make the entire process automated and config-driven, rather than human-intensive.

How best to test these changes? The most data we have to test with is in m-c history. So converting m-c from scratch, and verifying that everything looks the way we want afterwards, was the best test for my new process.

The project was down-scoped to just hg->git conversions. (Previously I had been aiming for a generic, config-driven vcs sync project that could convert hg -> git or git -> hg, or sync hg -> hg or git -> git (e.g., github -> git.m.o) to cover all of our vcs-sync needs.) Once I had a machine to test on, I started my first m-c conversion to verify my script + configs. About a week in, gps blogged about faster hg-git.

As I noted earlier, I switched over. But that wasn't seamless; Hal was running on a forked hg-git 0.3.2; gps' changes were landed as a part of 0.4.0. Rather than try to backport gps' changes, we thought I should use the latest hg-git (0.4.0), since as far as we knew all of the forked changes had been upstreamed.

However, as noted in bug 835202 comment 9, my conversion failed using the latest upstream hg-git; our timezone fix had never been upstreamed because of missing unit tests. When I forked hg-git with this patch, I was able to continue and convert everything as expected... except the shas diverged. Which I consider to be a major no-no, a violation of a central goal of this project.

It turns out that hg-git 0.4.0 deals with this revision, which I fondly refer to as <h<surkov, differently than Hal's 0.3.2-moz2 fork. These two revisions change the second angle bracket '<' into a question mark '?', turning the initial portion of the email address into <h?surkov. The previous behavior was to drop the second angle bracket '<' entirely, as seen in gecko.git's <hsurkov. Both are valid ways to change <h<surkov into a well-formed git email address, but only one of these resulted in the same shas as gecko.git. I backed those commits out of my hg-git fork, and started verifying that my latest 0.4.0-moz2 fork gave us expected behavior. And it did, for a master-branch-only conversion, like we currently have in gecko.git. (Phew!)

(While it may seem like I was spending an inordinate amount of time converting and re-converting mozilla-central, I was, in fact, testing against the largest data set we had in terms of unexpected, real-world commit data. Shaking out bugs in the conversion process before pushing live avoids massive headaches in the future for developers, partners, and maintainers. If the conversion scripts could robustly handle anything we'd seen so far, we'd have greater confidence about these systems in live use.)

Top

Problems encountered / lessons learned: full branchlist conversion

A master-branch-only conversion was good for gecko.git, but I was looking to support beagle as well. Ehsan's repo currently has all of mozilla-central's tags and branches, and not only did he gexport each of those branches, he also ran git filter-branch on them after the cvs prepend, and translated each of the tags. Had I proceeded without doing this work, I could foresee a future where we would be asked to shoehorn these changes onto a live production repo, well after the downstream audience had crossed lynch-mob-levels. Worse still, I wasn't able to push to github without running a git gc aggressive, which would make future conversion of these branches and tags very difficult. I couldn't use gecko.git as the base if I wanted these branches and tags; I couldn't use Ehsan's repo as the base if I wanted the shas to match gecko.git. It wasn't difficult for me to add this added workflow to the script, just compute-time-intensive. So I added it, and started the full conversion at that point.

This took a bit longer to complete than I predicted, because I had to either use the -f option in git filter-branch and potentially overwrite revisions already created in the map directory, or not halt on failure should one of the git filter-branch runs exit non-zero. Neither of these seemed like comfortable sensitive-production-service decisions to make. (In the end, the -f turned out to be the lesser evil.)

As a side note: certainly at this point (if not before), a second machine or a fast local disk with enough inodes+space to handle a second parallel conversion would have helped speed up the process immensely, as I could have run both conversion workflows in parallel, rather than serially as I have been (with lots of tar backups and restores to try to save time). I did finally get that second machine last week (yay!), though that was well after most of my time-consuming compute churn was over. (I estimate that had I had this second machine at the start of the project, I would have saved at least a month of calendar time; very likely more.) Still, I'm making good use of it now.

Unbeknownst to me, this seemingly innocuous commit landed between my successful master-only conversion and the first full-branch conversion attempt that made it through without halting-on-failure. During the cvs prepending step, for some unknown reason, git filter-branch changes "Carlos G." into "Carlos G" sans-period; gecko.git preserves the original name. My theory was that hg-git was dealing with this user's name appropriately, and git filter-branch was munging it. Enter hg strip: I theorized that if I stripped the problem commit before cvs prepending, and then updated and converted via hg-git normally afterwards, we'd be golden. And that's almost true. But not quite.

The main problem, as I found out after a second full conversion (and having my conversion machine reboot from under me, which only cost me 5 compute days) pass later, is that the hg strip didn't leave default as the tip of its own branch; due to unnamed branches+merges as DVCS allows, I had to hg strip three times to get to a single tip-of-default head. Without that, git filter-branch would move the master branch to cvs-based shas, but the additional forks stayed on the old, non-cvs-based shas. Once I landed this change, (and once I restarted this conversion pass after the machine reboot), we now have a full conversion of mozilla-central, with all mozilla-central branches and tags, with cvs history, with shas that match up with gecko.git. The toolchain, config, and script have been tested against a huge number of historical revisions and held up; this is a good sign for future robustness. A bonus side effect is we can predictably recreate our git history now, which is great, but was never the primary goal.

I pushed the resulting repo to this test repository on github; please feel free to poke at it (but be forewarned that I reserve the right to reset it in the future). I started populating the repo branch list to include branches that we think make the most sense given the use cases for this repo and all the gotchas described above. During this process, I came across some things that need fixing in the code+configs (for instance, you'll note that the mozilla-central branches+tags aren't there, and the mozilla-beta+mozilla-release release tags. I have the former in my conversion repo, and have a partial plan for the latter). But we're definitely past the long compute time hump.

(As another side note -- I fixed some of these cvs prepend issues via specific revision hardcodes rather than generic solutions. This isn't ideal; I'd love to write this tool in a way that could apply to any project, anywhere. However, given some of the unique constraints (cvs prepending, matching previous shas that were built with a different process and toolchain), and limited time, I decided hardcodes in this section of code, which in theory should only need to be run once barring some sort of emergency, was the lesser of evils.)

Top

Still to do

From the start, even before we down-scoped the project, I was planning on being able to push a subset of repos, branches, and tags to a number of different locations. My configs for doing that are a bit ugly currently, but they work! So if I'm pulling mozilla-b2g18 into my conversion repo, I can push the master branch of that repo to gecko.git:gecko-18, beagle:b2g18, and create a standalone mozilla-b2g18 repo on github or git.m.o where it's b2g18:master. Since these would be pushed from the same conversion directory, they would be guaranteed to have the same shas.

And since I've made the effort to make my code modular, config-driven, and well tested, I would have no qualms about running two instances of the script in parallel: one for gecko.git, one for beagle. After painstaking and time-consuming effort to try to guarantee matching shas, and with the same toolchains and code paths, I wouldn't be worried about the two repos diverging. The split would mainly serve to avoid pushing any bad changes to the more sensitive, partner-oriented gecko.git. It's easier to guarantee a change won't be pushed if the change doesn't exist in the conversion repo at all, rather than only relying on push-time checks.

Currently I'm leaning towards the split (for more stable gecko.git), with the multiple-target logic in the former living in the beagle process.
As I was pushing the various repos to my test-beagle github repo, I noticed a few things about tags and branches. The branch and tag configs are not creating the {FIREFOX,FENNEC}_*_{BUILDn,RELEASE} tags, among others, even when I changed the tag_regexes. This is probably because the release tags are all created on relbranches, and outside of mozilla-central, I'm only converting the branches listed in an explicit branch whitelist.

I imagine I need branch and tag configs for incoming branches+tags (branches and tags designated for conversion); these would need to be a superset of all branches and tags that we would want to push. Limiting this set would help prevent pushing any unwanted branches and tags, which could prove problematic if they either violate the no-deletes or fast-forward-only rules, or if the tags in question move on the mercurial side. A whitelist would work, but by itself would require a lot of manual intervention. I need to add regex or wildcard support. The outgoing branch and tag configs would be similar, but could be a subset of available branches and tags, and continue to involve different configs for different push targets (e.g., the above gecko.git:gecko-18, beagle:b2g18, b2g18:master example).

Since we'll be adding to this list and modifying it on the fly when this script is in production, a config file validator and a more sane format become less optional and more production best practice.
For bug 848908 - prevent repo corruption due to bad pulls for hg-git conversion, I already have three types of repos on local disk: clean staging clones, the conversion repo, and test target repos.

The conversion repo is the same as gecko.git and Ehsan's repo. The staging clones are clean clones designed to catch any hg.mozilla.org repo corruption from a pull, as noted in in the bug. Basically, it's a matter of minutes to blow away and reclone a clean clone. It's a matter of hours or days if we corrupt the conversion repo.

I haven't caught any repo corruption yet, but I do know we've seen at least three or more instances of repo corruption in Hal's production conversion repos, each time a multi-hour restoration process, each delaying further changes from making their way downstream. I knew that beagle, with its massively larger matrix of repos, branches, and tags, would be even harder and likely more time consuming to un-corrupt.

The local disk test target repos are there to give debuggable local repos to push to. I imagine we could set them up in such a way that they would need to pass certain tests before allowing the script to push to a public repo.

To do here: catch staging clone repo corruption, and make sure the automated response deals with it appropriately; potentially add test target repo validation to avoid pushing bad commits to public repos. I'm not sure these are blockers, but they would go a long way to making the process more robust.
For bug 799845 - For git mirrored repositories, please provide status on what the last successfully sync time was, I create a json file with what I think is adequate information about the latest pull/push times, and the configuration involved. It currently looks like this.

I need to make sure the format works for downstream users, and find a place for this file (and the logs, etc.) to get uploaded to.
When bug 892563 - Add a timeout parameter to mozharness' run_command landed, I added output_timeout settings to various commands I felt might hang at some point in the future; they also run through the retry() method, and each successful and unsuccessful run sends email (configurable, of course). I have yet to see timeout/retry code successfully exercised; I'd love some verification before production.
Split the patch up for review!

Top

I hope that's been useful. Feel free to send me feedback / ideas / questions, or to take a look at my work in progress.

The bug is here.
The script is here; I went through and added comments and docstrings, to hopefully make it more readable.
The config is here.
My test github push repo is here. (This repo may be reset in the future!)
My previous blog post (hg-git part i) is here.

I was about a week into my m-c hg-git conversion^* when gps blogged about faster hg-git.

I was about two weeks into the initial conversion when I tried the latest upstream hg-git, with gps' patches. That conversion finished in less than a day, while the initial conversion was still chugging along on a different core. However, the git shas differed from our existing pre-cvs conversion repositories. This was due to different behavior when faced with a bogus email address.

Once I backed out the corresponding email munging commits, and landed Ehsan's timezone fix for a separate issue that halted the conversion process, we're now back to a converted m-c with the same shas.

At this point I killed the still-running initial conversion.

I'm still working on automating the cvs prepending and pulling in other repos/branches, but this is a good sanity-check point.

^* Note, I had pulled both mozilla-central and mozilla-b2g18 into the same working directory as a proof-of-concept workflow test before converting. In hindsight this was a terrible decision, as it more than doubled the estimated conversion time.