Oct. 22nd, 2013

escapewindow: escape window (Default)

(This is one of several shorter blog posts about features in the new vcs sync process.)

We've seen instances of hg corruption before, like

abort: data/dom/network/interfaces/nsIDOMNetworkStats.idl.i@50a5a9362723: unknown parent!
abort: data/mobile/android/base/DoorHangerPopup.java.i@62e6137d125c: no match found!
abort: data/b2g/config/gaia.json.i@327bccdb70e8: no node!
abort: unknown revision '177bf37a49f5'!

People have theorized that at least one of these may be caused by a race condition while pulling from an http head during an ssh push (edit: bug 737865); others seem a bit more random. We deal with this in our TBPL build/test CI by detecting these types of failures, and nuking/recloning the repo.

We also see these in the legacy vcs-sync process. With a single-, non-cvs-prepended-, mozilla-central-based- repo, recovering from hg corruption in the working conversion directory is a manual process that can take multiple hours. I saw this as a non-starter for a repo like beagle or gecko.git, where rebuilding the entire thing from scratch can take over a week.

As I mentioned here, the new process has an intermediate stage_source clone:

hg.mozilla.org -> stage_source clone -> conversion_dir

When we detect corruption in the stage_source clone, we don't have to worry very much; just clobber and reclone. The time to recreate a fresh clone of a single mercurial repo is a matter of a few minutes, not multiple hours. This approach uses more disk, but helps prevent long downtimes.

Previously I had been running hg verify in each stage_source clone before proceeding, but that slowed each pull/convert/push cycle down by ~5 minutes per source mercurial repo (and doesn't always fix the problem), making it a non-viable option.

escapewindow: escape window (Default)

(This is one of several shorter blog posts about features in the new vcs sync process.)

At first blush, pushing all branches and tags from a converted mercurial repo to its git mirror seems like a logical choice; no renaming, no filtering. Except, of course, default -> master. Easy enough: a blanket conversion, with one exception.

However, as noted earlier, we move tags in hg land. Mercurial makes it easy to do, and moving tags is currently part of our official release process. Git, on the other hand, makes it impossible to move tags invisibly; to pick up a moved tag, every downstream user would need to explicitly delete and re-fetch the tag. Otherwise, their tag will continue to point at the old revision. Given the choice between no tags and tags pointing at the wrong revision, I much prefer no tags. We do have a small subset of tags in our partner-oriented gecko.git, though, so in my mind we needed a tag whitelist. With wildcards/regexes, so we wouldn't have to keep updating a static list.

Branches could have remained a bit simpler, but I had several types of git repo to support. For example, mozilla-b2g18: in gecko.git the default branch is gecko-18. In github/mozilla/mozilla-central, it's b2g18. Neither of those repositories have other, non-default branches from mozilla-b2g18. However, with the lack of {FIREFOX,FENNEC}.*{BUILDn,RELEASE} tags on the git side, I wanted to at least support the GECKO_.*RELBRANCH branches from mozilla-beta and mozilla-release. So not only do we need a whitelist here, but also a map, to say that mozilla-aurora is aurora in gecko-dev, but v1.2 in gecko.git. (We were even considering supporting the standalone git repos; mozilla-aurora:default would have been master in releases-mozilla-aurora. We're going to nuke those in bug 847643, however.)

In addition to the above, we want to strictly avoid noise (e.g., unwanted branches+tags, or branches+tags with the wrong names) in the partner-oriented gecko.git. I can control this by strictly limiting on push. However, a single layer of safety like that feels a bit dicey; a loose regex or a bug can push the wrong thing. Also, I'm not currently keeping track of which branches+tags I convert for each mercurial repo, so a loose regex for one mercurial repo's conversion whitelist, coupled with a loose regex for another mercurial repo's push whitelist, can result in pushing the former mercurial repo's unwanted branches+tags during the latter mercurial repo's push. It seems easier to not convert unwanted branches+tags in the first place. Because of this, I'm restricting what we convert at all via strict whitelists, and then again restricting at the push-per-target level, for an additional level of safety.

(Does that make sense? I almost feel like drawing a picture.)

When I look at the config files, they don't seem very elegant, but given the complexity we're trying to encapsulate, I think they're a pretty decent first draft.

November 2022

S M T W T F S
  12345
67 89101112
13141516171819
20212223242526
27282930   

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags
Page generated Apr. 23rd, 2025 09:54 am
Powered by Dreamwidth Studios