# gerritcp

A utility to copy changesets from one gerrit instance to another

## Theory of operation

The tool is built primarily for the use case of downstreaming coreboot into Chromium OS. If feasible, extension of its scope to other scenarios is welcome.

The tool must:

* copy submitted changes from upstream to reviewable change sets in downstream;
* mask out changes to a configured set of files;
* keep a configured set of files around that exist only in downstream;
* provide multiple work streams with well-defined semantics so that changes to a given subdirectory (originally: util/crossgcc) can be handled separately from everything else;
* un-abandon change sets in downstream if they would be revived by some transaction of this tool;
* skip change sets that are marked WIP in downstream;
* leave already downstreamed change sets alone, e.g. if they have been reordered.

gerritcp keeps a bare git repo for tracking both upstream and downstream. Its new commits are created by directly adding objects to the git repo so there's no current checked-out tree that can get out of sync.

It works its way backward in the source branch, collecting information about which work stream a change belongs to and whether it is already present in downstream and usable. As soon as all work streams have a root change to work from, gerritcp works forward through the collected changes, creating new change sets and immediately pushing them to downstream gerrit (marking them unabandoned as necessary).

A change is "usable" if it is not marked WIP and if it contains metadata of upstream's change submission process. The WIP marker allows skipping changes, even the top-of-patchtree change, in case they need to be put aside for a while, e.g. when waiting for an accompanying fix. Checking for upstream's metadata ensures that the downstream change in question is a downstreamed change: there are sometimes changes that have been pushed (and reviewed) downstream first and then put upstream for submission when ready.
These should be integrated with the history, but they shouldn't derail other upstream changes.

When applying a change to downstream, several things need to be taken care of:

- Since it's possible that changes have been taken out of downstream's patch train, gerritcp can't simply adopt the toplevel `tree` object of the upstream commit. Even files can't be adopted 1:1 but need to be merged because there might be changes to skip.
- Since downstream handles a few files differently, these might have to be skipped entirely, by dropping the downstream object ids into the upstream-ish `tree` objects that result from the prior step.
- For work streams, the right parent commit needs to be chosen: it must be the work stream specific parent (if that one isn't merged already) or the upstream parent (in other cases) to account for situations in which commits in the work stream touch files outside its direct responsibility.

Pushes to downstream need to be authenticated and there needs to be enough access to gerrit to be able to unabandon changes, while pulls from upstream can be anonymous. Future additions such as watching upstream's event stream might require authentication on the upstream side as well.

## Caching opportunities

There are fewer caching opportunities than one might think: every run to push changes to downstream needs to newly figure out the root changes to apply work streams to because downstream might have changed: commits might have been submitted, patches been marked WIP or reordered, ...

However, even though each run starts from scratch, if not much happened between two runs, the scan needn't go deep into upstream's history (and therefore doesn't have to mess with downstream a whole lot). The root change should be found pretty soon.
For smaller work streams there might be nothing to do at all (if no commit matching its specification has been collected while going back until a common point in time has been found).

## Limited work streams

With `onlyIfTouching` configured, finding the right root change to work from is slightly more involved than just going back through git history.

Once the old commit that needs to be processed has been found (as determined by gerrit metadata in commit messages and commits lining up in upstream and downstream), the commit to use as parent needs to be identified.

For this, the next-oldest commit touching any of the `onlyIfTouching` paths needs to be determined, which can be done locally (`git log $those $paths` on upstream history).

The next thing to determine is whether that commit is already submitted downstream. If not, it becomes the parent commit for the work stream. Otherwise the upstream's parent commit (no matter if it's touching any of these files or not) becomes the downstream parent change.

## Configuration format

Configuration is stored in a single file in YAML format:

```yaml
sites:
  upstream|downstream:
    url: string
    repo: string
    branch: string
    authentication: none|cookie
    cookieName: string
    cookieVal: string
workstreams:
  {name}:
    onlyIfTouching:
      - string # paths
      - ...
    neverModify:
      - string # paths
      - ...
```

There's a template [coreboot.org -> chromiumos configuration](chromium-coreboot.yaml) in this repo.
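
To illustrate how an `onlyIfTouching` list from the configuration could drive the parent selection described under "Limited work streams", here is a minimal Python sketch. It is not gerritcp's actual code: the `Commit` type, the `touches` and `pick_parent` helpers, the prefix-based path matching, and the fallback to the upstream parent when no touching commit exists are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set, Tuple


@dataclass(frozen=True)
class Commit:
    """Hypothetical stand-in for an upstream commit."""
    change_id: str           # gerrit Change-Id taken from the commit message
    paths: Tuple[str, ...]   # files touched by the commit


def touches(commit: Commit, only_if_touching: Sequence[str]) -> bool:
    """True if the commit touches a configured path or anything below it.

    Assumption: configured paths are matched as directory prefixes.
    """
    for path in commit.paths:
        for configured in only_if_touching:
            prefix = configured.rstrip("/")
            if path == prefix or path.startswith(prefix + "/"):
                return True
    return False


def pick_parent(
    history: Sequence[Commit],        # upstream history, newest first
    index: int,                       # position of the commit to process
    only_if_touching: Sequence[str],
    submitted_downstream: Set[str],   # Change-Ids already submitted downstream
) -> Optional[Commit]:
    """Choose the parent for the commit at history[index]."""
    older = history[index + 1:]
    if not older:
        return None  # no parent available in the scanned history
    upstream_parent = older[0]
    for candidate in older:
        if touches(candidate, only_if_touching):
            # Next-oldest commit touching the work stream's paths.
            if candidate.change_id not in submitted_downstream:
                return candidate  # work stream specific parent
            break  # already submitted: use the upstream parent instead
    # Assumption: also used when no older commit touches the paths.
    return upstream_parent


history = [
    Commit("I3", ("util/crossgcc/Makefile",)),
    Commit("I2", ("src/main.c",)),
    Commit("I1", ("util/crossgcc/buildgcc",)),
    Commit("I0", ("README",)),
]
```

With this sample history, `pick_parent(history, 0, ["util/crossgcc"], set())` selects `I1`, the next-oldest commit in the work stream, while passing `{"I1"}` as the submitted set selects the plain upstream parent `I2`.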