gerritcp/README.md
2021-10-29 19:38:21 +02:00


# gerritcp
A utility to copy changesets from one gerrit instance to another
## usage
`gerritcp -d [dir] -c [config.yaml]`

`dir` specifies a directory to keep the git repo in (because it's usually too
large to keep in RAM); `config.yaml` points to the configuration for upstream,
downstream and workstreams.
## Theory of operation
The tool is built primarily for the use case of downstreaming coreboot into
Chromium OS. If feasible, extension of its scope to other scenarios is welcome.
The tool must:
* copy submitted changes from upstream to reviewable change sets
in downstream;
* mask out changes to a configured set of files;
* keep a configured set of files around that only exists in downstream;
* provide multiple work streams with well-defined semantics so that changes
to a given subdirectory (originally: util/crossgcc) can be handled
separately from everything else;
* un-abandon change sets in downstream if they would be revived by some
transaction of this tool;
* skip change sets that are marked WIP in downstream;
* leave already downstreamed change sets alone, e.g. if they have been
reordered.

gerritcp keeps a bare git repo for tracking both upstream and downstream.
Its new commits are created by directly adding objects to the git repo so
there's no current checked-out tree that can get out of sync. It works its
way backward in the source branch, collecting information about which work
stream a change belongs to and if it is already present in downstream and
usable. As soon as all work streams have a root change to work from, gerritcp
works forward through the collected changes, creating new change sets and
immediately pushing them to downstream gerrit (marking them unabandoned
as necessary).

A change is "usable" if it is not marked WIP and if it contains metadata
of upstream's change submission process: WIP allows skipping changes, even
the top-of-patchtree change, in case they need to be put aside for a while,
e.g. when waiting for an accompanying fix. Checking for upstream's metadata
ensures that the downstream change in question is a downstreamed change:
There are sometimes changes that have been pushed (and reviewed) downstream
first and then put upstream for submission when ready. These should be
integrated with the history, but they shouldn't derail other upstream changes.
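The backward scan and the "usable" test described above can be sketched in Python as follows. This is an illustration only, not gerritcp's code: the `Change` record, its fields and `plan_run` are hypothetical names, history is reduced to a newest-first list, and leaving WIP changes out of the work list is just one reading of the rules above.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """Hypothetical, simplified view of one upstream commit."""
    sha: str
    stream: str                          # work stream the commit belongs to
    in_downstream: bool = False          # a counterpart exists in downstream
    wip: bool = False                    # that counterpart is marked WIP
    has_upstream_metadata: bool = False  # it carries upstream's submission metadata


def usable(change):
    """A downstream change can serve as a root only if it is a real,
    non-WIP downstreamed copy of an upstream submission."""
    return change.in_downstream and not change.wip and change.has_upstream_metadata


def plan_run(history, streams):
    """Walk `history` (newest first) until every stream has a root change.

    Returns (roots, todo): the root per stream and the changes to create,
    ordered oldest first so they can be applied going forward.
    """
    roots = {}
    todo = []
    for change in history:
        if len(roots) == len(streams):
            break  # every work stream has a root, no need to go deeper
        if change.stream not in streams or change.stream in roots:
            continue
        if usable(change):
            roots[change.stream] = change
        elif not (change.in_downstream and change.wip):
            # Assumption: WIP changes are put aside rather than re-created.
            todo.append(change)
    todo.reverse()
    return roots, todo
```

With this shape, a run in which little happened upstream stops after a few commits, because each stream's root is found close to the tip.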
When applying a change to downstream, several things need to be taken care of:
- Since it's possible that changes have been taken out of downstream's patch
train, gerritcp can't simply adopt the toplevel `tree` object of the
upstream commit. Even files can't be adopted 1:1 but need to be merged
because there might be changes to skip.
- Since downstream handles a few files differently, these might have to be
skipped entirely, by dropping the downstream object ids into the upstream-ish
`tree` objects that result from the prior step.
- For work streams, the right parent commit needs to be chosen: It must be
the work stream specific parent (if that one isn't merged already) or
the upstream parent (in other cases) to account for situations in which
commits in the work stream touch files outside its direct responsibility.
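The second point, dropping downstream object ids into the upstream-derived tree, could look roughly like the sketch below. It models a tree as a flat `path -> object id` dictionary instead of real git `tree` objects, and it assumes that `neverModify` entries name either a file or a directory prefix; both simplifications, the function name and the sample paths and ids are made up for illustration.

```python
def overlay_protected(upstream_tree, downstream_tree, never_modify):
    """Return the upstream tree with protected paths taken from downstream.

    Both trees are dicts mapping file paths to object ids. Paths matching
    `never_modify` keep downstream's object id, or stay absent if
    downstream does not have them.
    """
    def protected(path):
        return any(
            path == entry or path.startswith(entry.rstrip("/") + "/")
            for entry in never_modify
        )

    # Start from upstream, minus everything downstream handles on its own.
    result = {p: oid for p, oid in upstream_tree.items() if not protected(p)}
    # Re-add downstream's version of those paths, including files that
    # exist only in downstream.
    for path, oid in downstream_tree.items():
        if protected(path):
            result[path] = oid
    return result


# Sample data: object ids are shortened placeholders.
upstream = {"Makefile": "u1", "src/a.c": "u2", "OWNERS": "u3"}
downstream = {"Makefile": "d1", "src/a.c": "d2", "OWNERS": "d3", "local/only.cfg": "d4"}
merged = overlay_protected(upstream, downstream, ["OWNERS", "local"])
```

Because only object ids are swapped, no file contents need to be read or checked out for this step.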

Pushes to downstream need to be authenticated, and the account needs enough
access to gerrit to unabandon changes, while pulls from upstream
can be anonymous. Future additions such as watching upstream's event stream
might require authentication on the upstream side as well.
## Caching opportunities
There are fewer caching opportunities than one might think:
Every run that pushes changes to downstream needs to figure out anew the root
changes to apply work streams to, because downstream might have changed:
commits might have been submitted, patches marked WIP or reordered, ...

However, even when starting from scratch, if not much happened between
two runs, the scan needn't go deep into upstream's history (and therefore
doesn't have to mess with downstream a whole lot). The root change should be
found pretty soon. For smaller work streams there might be nothing to do at
all (if no commit matching its specification has been collected while going
back until a common point in time has been found).
## Limited work streams
With `onlyIfTouching` configured, finding the right root change to work from
is slightly more involved than just going back through git history. Once
the old commit that needs to be processed has been found (as determined by
gerrit metadata in commit messages and commits lining up in upstream and
downstream), the commit to use as its parent needs to be identified.

For this, the next-oldest commit touching any of the `onlyIfTouching`
paths needs to be determined, which can be done locally
(`git log $those $paths` on upstream history). The next thing to determine is whether that commit
is already submitted downstream. If not, it becomes the parent commit for
the work stream. Otherwise the upstream's parent commit (no matter if it's
touching any of these files or not) becomes the downstream parent change.
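The parent selection described above can be sketched as follows. The sketch assumes a linear upstream history given as a newest-first list of `(sha, touched_files)` pairs and a set of upstream commits already submitted downstream; the function names, the prefix matching of `onlyIfTouching` paths and the sample data are assumptions, not gerritcp's implementation. It returns the upstream commit whose downstream counterpart would serve as parent.

```python
def touches(files, prefixes):
    """True if any file equals or lies below one of the configured paths."""
    return any(
        f == p or f.startswith(p.rstrip("/") + "/")
        for f in files
        for p in prefixes
    )


def next_oldest_touching(history, index, prefixes):
    """Find the closest older commit that touches one of the paths."""
    for sha, files in history[index + 1:]:
        if touches(files, prefixes):
            return sha
    return None


def choose_parent(history, index, prefixes, submitted_downstream):
    """Pick the parent for the commit at `index` in a limited work stream."""
    upstream_parent = history[index + 1][0] if index + 1 < len(history) else None
    candidate = next_oldest_touching(history, index, prefixes)
    if candidate is not None and candidate not in submitted_downstream:
        # Work stream specific parent that is still pending downstream.
        return candidate
    # Otherwise fall back to upstream's parent, whatever files it touches.
    return upstream_parent


# Sample history, newest first; shas and file names are placeholders.
history = [
    ("c4", ["util/crossgcc/buildgcc"]),
    ("c3", ["src/main.c"]),
    ("c2", ["util/crossgcc/Makefile"]),
    ("c1", ["README"]),
]
pending_parent = choose_parent(history, 0, ["util/crossgcc"], set())
merged_parent = choose_parent(history, 0, ["util/crossgcc"], {"c2"})
```

In the sample, `c4` is stacked on `c2` while `c2` is still under review downstream, and on `c3` once `c2` has been submitted.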
## Configuration format
Configuration is stored in a single file in yaml format:
```yaml
sites:
  upstream|downstream:
    url: string
    repo: string
    branch: string
    authentication: none|cookie
    cookieName: string
    cookieVal: string
workstreams:
  {name}:
    onlyIfTouching:
      - string # paths
      - ...
    neverModify:
      - string # paths
      - ...
```
There's a template [coreboot.org ->
chromiumos configuration](chromium-coreboot.yaml) in this repo.
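As a rough aid for writing such a file, the following Python sketch checks an already parsed configuration (a plain dictionary, however it was loaded) against the format above. The function name is hypothetical, and the rules that `url`, `repo` and `branch` are mandatory and that cookie authentication needs both `cookieName` and `cookieVal` are assumptions drawn from the field names, not documented behavior.

```python
def check_config(cfg):
    """Return a list of human-readable problems found in a parsed config."""
    problems = []
    sites = cfg.get("sites") or {}
    for side in ("upstream", "downstream"):
        site = sites.get(side)
        if not site:
            problems.append(f"sites.{side} is missing")
            continue
        for key in ("url", "repo", "branch"):
            if not site.get(key):
                problems.append(f"sites.{side}.{key} is missing")
        auth = site.get("authentication", "none")
        if auth not in ("none", "cookie"):
            problems.append(f"sites.{side}.authentication must be none or cookie")
        elif auth == "cookie":
            # Assumption: cookie authentication needs both cookie fields.
            for key in ("cookieName", "cookieVal"):
                if not site.get(key):
                    problems.append(f"sites.{side}.{key} is missing")
    for name, stream in (cfg.get("workstreams") or {}).items():
        for key in ("onlyIfTouching", "neverModify"):
            paths = (stream or {}).get(key, [])
            if not isinstance(paths, list) or not all(isinstance(p, str) for p in paths):
                problems.append(f"workstreams.{name}.{key} must be a list of paths")
    return problems


# Sample configuration with placeholder hosts and names.
sample = {
    "sites": {
        "upstream": {"url": "https://upstream.example.org", "repo": "project",
                     "branch": "master", "authentication": "none"},
        "downstream": {"url": "https://downstream.example.org", "repo": "project",
                       "branch": "main", "authentication": "cookie",
                       "cookieName": "o"},
    },
    "workstreams": {"crossgcc": {"onlyIfTouching": ["util/crossgcc"]}},
}
sample_problems = check_config(sample)
```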