How to get the branch a commit is on? - libgit2

Is there a way (using the libgit2 library) to get information about the branch a commit is on, ideally a git_reference* from a git_commit*?
What I'm trying to do is basically get information about a branch from a commit.
Thank you

No, there is no way to do this. Git commits are not "on" a branch; instead, a branch is a pointer to a commit, so the data storage goes in the opposite direction. As a result, many branches may point to the same commit, so there is no way to get a single branch from a commit.
You could find all branches that contain a given commit, but due to the rather open-ended nature of this, it's not functionality that libgit2 provides. You could certainly implement it yourself using libgit2, by revwalking the branches that you're interested in to see if the given commit is found. But it will be very disappointing, performance-wise.
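To illustrate, here's a minimal sketch of that approach in C with libgit2 - an outline rather than a definitive implementation. It assumes an already-opened repository and the OID of the commit you're after, and elides most error handling.

#include <git2.h>
#include <stdio.h>

/* Print the name of every local branch whose history contains `target`. */
int print_branches_containing(git_repository *repo, const git_oid *target)
{
    git_branch_iterator *iter = NULL;
    git_reference *ref = NULL;
    git_branch_t type;
    int error;

    if ((error = git_branch_iterator_new(&iter, repo, GIT_BRANCH_LOCAL)) < 0)
        return error;

    while (git_branch_next(&ref, &type, iter) == 0) {
        git_revwalk *walk = NULL;
        git_oid oid;

        /* Walk from this branch's tip back through its ancestry. */
        git_revwalk_new(&walk, repo);
        git_revwalk_push(walk, git_reference_target(ref));

        while (git_revwalk_next(&oid, walk) == 0) {
            if (git_oid_equal(&oid, target)) {
                const char *name = NULL;
                git_branch_name(&name, ref);
                printf("%s\n", name);
                break;
            }
        }

        git_revwalk_free(walk);
        git_reference_free(ref);
    }

    git_branch_iterator_free(iter);
    return 0;
}

If you only need a yes/no answer per branch, git_graph_descendant_of should be cheaper than a full walk, though you would still have to treat the branch tip being the target commit itself as a separate case.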

Related

Entity Framework Migrations - Managing In Branches

I've been using Entity Framework code-first migrations for a while now on a brand new project, and they've been working fine so far on the main trunk version of the application.
However, we're now at a point in the project where branches are being created as we have multiple workstreams going on. As part of the last lot of work, we realised that using migrations across branches can be problematic - so my question is: what have people found to be the best way of managing this?
For example (I've obviously simplified these for the sake of discussion):
Branch A:
Developer 1 adds an 'Add-UserDateCreated' migration which adds a field to the User entity. The migration file contains the code to add the field, and has the model state at the time.
Branch B:
Developer 2 adds an 'Add-UserMiddleName' migration which adds another field to the User entity. The migration file contains the code to add the field, and has the model state at the time (but it obviously DOESN'T have the field added in the other migration).
These migrations work fine on their branches, but when you merge them back into the trunk, you get stuck:
You can't just keep the individual migration files as the stored model state won't be correct. For example, the 'Add-UserMiddleName' migration SHOULD have the state of the model with the 'Add-UserDateCreated' field added - but it won't.
You can't merge the migrations into one file as you hit the same issue*
Which means the only way to truly avoid these issues is to work on a different copy of the database for each branch, ignore the migrations when you do the merge into the trunk, and add a one-shot master migration when the merge is complete (on the trunk version of the database) - but then you potentially end up with lots of changes in one migration, which is never a good idea (and you also lose any custom code you've written in your migration classes).
So how do other people manage this kind of situation? I'd be curious to know other people's opinions.
*I'm curious what issues this would actually cause in the real world, if anyone knows?
When you are working on a branch and require a migration, you always need to consider the state of the other (active) branches. For example, if there are migrations on trunk that you do not have on the branch, you should merge them to the branch before creating your new migration. Once you have created the migration on the branch, it is probably a good idea to merge it back to trunk.
So in your example, developers 1 and 2 need to communicate with each other. Developer 2, realizing that a migration is needed, should check whether other migrations have been created and merge them into her branch before creating the 'Add-UserMiddleName' migration.
I find it useful to think of migrations as a lock shared by the entire team (if that makes sense to you). New migrations, or plans to create them, should be mentioned in daily stand-ups. Sometimes, when things are hectic, it might be useful to keep a list of all migrations on a whiteboard for all team members to see.
I also find it crucial that migrations are small commits, making them easily portable between branches.
It should also be noted that it helps to use a version control system that is good with branching, merging and cherry-picking, like Git, for example.

Relational database backend for mercurial or git

What I like about fossil is that it uses plain old sqlite to store changesets, files, etc. I can use its command line tool to query the repository, but if I want something not supported by it, I can fallback to writing an sql query.
Mercurial and git are more mature, they have more libraries, more momentum, but they use their own repository format. I wonder if it's possible to have sqlite as their repository backend. (I know there are tools to query a mercurial or git repo directly, but sql seems easier.)
As Jefromi writes, Mercurial also uses a custom format to achieve high compression and fast access to any revision. This is the revlog format, an append-only data structure that takes advantage of the immutability of changesets in Mercurial.
However, it is of course possible to replace this storage format with another if you like. Google did this when they put Mercurial on Bigtable for code.google.com. One funny consequence of them using their own backend format is that you don't see any revision numbers in their web interface. In normal Mercurial, the revision numbers (the local-only integer you can use instead of the full changeset hash) are the index of the changesets in the revlog. When changesets are not stored in revlogs, there is no natural index and therefore Google shows you no revision numbers.
With git, the repository format is a pretty fundamental part of the way everything works. You'd have to do a lot of work to change that.
I haven't read any of mercurial's source, but I imagine the situation isn't much different.
As I suggested in my comment, I'm not really sure why you'd want to do this. For git to still be able to have all of its advantages, you'd have to store git objects in your sqlite database. You'd still need all of the low-level git tools to access and manipulate them - you're not going to be just looking up blobs and trees by their SHA1s and doing all the rest of the work yourself. (And even if for some reason you wanted to, you could do that just as easily by looking in the git objects directory.)
My suggestion would be that, if you find that there are operations you want to perform in git that are unsupported, you familiarize yourself with some of the plumbing commands and figure out how to write them as scripts. Git really does expose pretty much the lowest level of operations you could want.
P.S. If you should find a specific unsupported operation you want to do, and are having trouble finding the plumbing you need to perform it, or with the scripting necessary to implement it, post a question here! No reason to get stuck just because you can't use SQL.
It's possible with libgit2 backends:
https://github.com/libgit2/libgit2-backends/blob/master/sqlite/sqlite.c
I haven't done any measurements, but performance should suffer a bit. However, it's also more convenient (a single file for the entire repo history, a classic SQL query language, etc.).
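For the curious, wiring that backend into a repository only takes a few calls. A rough sketch, assuming you link against libgit2-backends and that its sqlite.c exposes git_odb_backend_sqlite as in the file linked above:

#include <git2.h>

/* Provided by the sqlite backend in libgit2-backends (assumed signature). */
extern int git_odb_backend_sqlite(git_odb_backend **backend_out, const char *sqlite_db);

int open_repo_with_sqlite_odb(git_repository **out, const char *db_path)
{
    git_odb *odb = NULL;
    git_odb_backend *backend = NULL;
    int error;

    if ((error = git_odb_new(&odb)) < 0)
        return error;
    if ((error = git_odb_backend_sqlite(&backend, db_path)) < 0)
        return error;
    /* Register the backend so object reads and writes go through sqlite. */
    if ((error = git_odb_add_backend(odb, backend, 1)) < 0)
        return error;
    /* Wrap the object database in a bare repository handle. */
    return git_repository_wrap_odb(out, odb);
}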
Speaking for Git, you cannot use a different backend with the official binaries. The libgit2 project, however, allows you to use different backends to store the database. You'll have to build all the binaries you wish to use for committing, merging, pushing, pulling, rebasing, etc. yourself. Also, you won't be able to modify your repository with the official binaries; you'll have to push it to a standard repo first.

What is a branch specification in Perforce?

It seems to me that keeping the "branch" object in Perforce may not be entirely necessary after an integration has been submitted.
I.e. the "real" branch is actually a folder path, and even if you delete the branch object that was created to perform the integ, the folder path is still valid, and all the files in this path are still there (with versioning restarted from #1, etc.).
What puzzled me is that when I try to edit a branch (object) name, it instead creates a new branch which is the copy of the previous one with a new name. But if I delete the previous one, no harm seems to have been done (at first glance).
Is a branch object in Perforce in reality just a convenient mechanism for the tool that can be destroyed and recreated at will, as long as the mapping it describes is kept identical?
Thomas
By "branch object", I assume you mean "branch specification"? Branch specifications are what you create on the tab labeled "Branches" in P4V. Yes, these are just a convenience and in no way effect the actual branched files. You can delete them and the actual branch they describe will not be touched.
A branch specification is not necessary to perform branching and integration operations. That can all be performed via the Integrate... item that is available on the context menu of files and folders in the Depot Tree. The branch specifications allows you to do that more easily by setting up mappings between trunk and branch. We typically don't use them because our branch specs would typically consist of something like this:
//depot/foo/dev/... //depot/foo/v1.5/...
Creating a branch spec for something this simple doesn't really save us any time. It's when the mapping between the trunk files and the branch gets more complicated that branch specifications prove to be useful.
Branch specifications are entirely separate from actual branch/integration operations; however, branch specifications allow more complicated integrations than are easy to do with straight paths (e.g. gathering multiple paths together, or re-arranging a tree).
I'm not sure if there's anything that couldn't be expressed as a sequence of integrations of filepaths?
The advantage of keeping a complicated branch specification around is that it makes it easier to perform incremental integrations.

Incremental linearizing of git DAG

I'm the author of GitX. One of the features GitX has is the visualization of branches, as can be seen here.
This visualization is currently done by reading commits which are emitted from git in the correct order. For each commit the parents are known, so it's fairly easy to build up the lanes in the correct way.
I'd like to speed up this process by using my own commit pool and linearizing the commits myself. This allows me to reuse existing loaded commits and allows git to emit commits faster because it doesn't have to emit them in the correct order.
However, I'm not sure what algorithm to use to accomplish this. It is important that the building is incremental, as the loading of commits can take a long time (>5 seconds for 100,000 commits, which should all be displayed).
Gitk has gone the same way, and there's a patch here that shows how it is implemented, but my Tcl skills are weak, and the patch isn't very thoroughly commented and is a bit hard to follow.
I'd also like this algorithm to be efficient, as it'll have to handle hundreds of thousands of commits. It also has to be displayed in a table, so it's important that access to specific rows is fast.
I'll describe the input I have so far, the output that I want and a few observations.
Input:
I have a current pool of commits in the form of a hash table that maps commit ids to commit objects. This pool does not have to be complete (it need not yet contain every commit)
I have a separate thread loading in new commits from git, with a callback that can be called every time a new commit is loaded. There is no guaranteed order in which the commits come in, but in most cases the next commit is a parent of the previous commit.
A commit object has its own revision id and the revision ids of all its parents
I have a list of branch heads that should be listed. That is, there isn't a single 'top' of the DAG that should be displayed. There also does not have to be a single graph root.
Output:
I'll need to linearize these commits in topological order. That is, a commit must be listed before any of its parents.
I also need the 'branch lines' that can be seen in the screenshot above. These probably need to be precomputed as most of them depend on their children.
A few remarks:
It may be necessary to relocate a list of commits. For example, we might have two commits (branch heads) that appear unrelated, until a commit shows up which makes one head an ancestor of the other.
Multiple branch tips must be shown
It's important that this process is incremental, so that at least a partial view is available while the data is still loading. This means that new data has to be inserted halfway and that the branch lines have to be readjusted.
The standard topological sort is O(n) (OK, O(V+E)), i.e. you should be able to sort a million commits in memory in a fraction of a second. No incremental hack like the one in the Tcl patch is needed.
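For concreteness, here is a minimal sketch of that standard algorithm (Kahn's) in C, over a hypothetical array-indexed commit pool; a real implementation would first map SHA1s to pool indices.

#include <stdlib.h>

/* A commit in the pool, with its parents given as pool indices. */
typedef struct {
    int nparents;
    const int *parents;
} commit_t;

/* Kahn's O(V+E) topological sort: returns a malloc'd array of pool
   indices such that every commit appears before all of its parents. */
int *toposort(const commit_t *pool, int n)
{
    int *nchildren = calloc(n, sizeof *nchildren);
    int *order = malloc(n * sizeof *order);
    int *queue = malloc(n * sizeof *queue);
    int head = 0, tail = 0, emitted = 0;

    /* A commit may be emitted only once all its children are emitted. */
    for (int i = 0; i < n; i++)
        for (int p = 0; p < pool[i].nparents; p++)
            nchildren[pool[i].parents[p]]++;

    /* Seed with the branch tips: commits no other commit points to. */
    for (int i = 0; i < n; i++)
        if (nchildren[i] == 0)
            queue[tail++] = i;

    while (head < tail) {
        int c = queue[head++];
        order[emitted++] = c;
        for (int p = 0; p < pool[c].nparents; p++)
            if (--nchildren[pool[c].parents[p]] == 0)
                queue[tail++] = pool[c].parents[p];
    }

    free(nchildren);
    free(queue);
    return order; /* emitted == n for a complete, acyclic pool */
}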
BTW, I use GitX (it looks much better than Gitk on OS X) every day and don't have any issues with it (maybe because I don't have those crazy merges in my repositories) :)
OK, so I'm having a similarly hard time reading the entirety of that patch, but let's see if I can piece it together from what I did figure out.
To start with, gitk simplifies things by condensing a string of commits into an arc, containing a series of commits that each only have one parent and one child. Aside from anything else, doing this should cut down pretty dramatically on the number of nodes you have to consider for your sort, which will help out any algorithm you use. As a bonus, related commits will end up grouped together.
This does introduce some complexity in terms of finding an arc when you read a new commit. There are a few situations:
The new commit has a single parent, or no parents. It extends a (possibly empty) arc. Most of the time, you'll just extend the most recent arc. There are a few interesting subcases:
It may cause an existing arc to be split, if its parent already has a child (i.e. its parent turns out to be a branch point, which I gather you don't know ahead of time).
It could be a "missing link" that connects two arcs together.
You may already know that this commit has multiple children.
The new commit has multiple parents (a merge commit).
You may want to include the multi-child or multi-parent commits in arcs, or it may make more sense to keep them separate. Either way, it shouldn't be too difficult to build up this set of arcs incrementally.
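As a rough illustration (the field names are made up, not gitk's), an arc might look something like this in C, with a helper to grow it as single-parent commits stream in:

#include <stddef.h>
#include <stdlib.h>

struct commit; /* opaque: your existing commit object */

/* An arc: a maximal run of commits that each have exactly one parent
   and one child, collapsed into a single node of the simplified graph. */
typedef struct arc {
    struct commit **commits;   /* the run, newest first */
    size_t ncommits, capacity;
    struct arc **parents;      /* older arcs this run leads to */
    struct arc **children;     /* newer arcs that lead into this run */
    size_t nparents, nchildren;
    int first_row;             /* table row of commits[0] once placed */
} arc;

/* Append a newly loaded single-parent commit to the end of an arc. */
static void arc_extend(arc *a, struct commit *c)
{
    if (a->ncommits == a->capacity) {
        a->capacity = a->capacity ? a->capacity * 2 : 16;
        a->commits = realloc(a->commits, a->capacity * sizeof *a->commits);
    }
    a->commits[a->ncommits++] = c;
}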
Once you have these arcs, you're still left with trying to linearize them. In your case, the first algorithm described on Wikipedia's topological sorting page (Kahn's, as sketched above) sounds useful, as you have a known set of branch heads to use as your initial set S.
Other notes:
Relocating commits should be manageable. First of all, you only have to care when you connect two arcs, either through a new merge commit, a newly-discovered branch point, or combining two arcs into one. Any given arc can easily maintain its current row number range (assuming you're fine with putting an arc on sequential rows), so traversing up the tree checking that all new ancestors show up later should be pretty quick.
I don't know enough to say much about drawing the graph lines, but I imagine it won't be too different from what you do now.
Anyway, I hope that helps. It was interesting to think about, at least.
Do you really need to display 100k commits at once? What kind of user can soak up that kind of info?
Have you thought about paging? I.e. just compute for ~100 commits or something. If a branch line goes way back (off-page), you could use something like GitHub's back-pointing arrow to show that.
I haven't used GitX, so maybe I'm missing something, but it seems like you could walk back from child to parent(s) from the head of each current branch until you can draw a few screens of the graph.
That might not give you the optimal visual layout of branches that are rooted earlier. But it seems like responsiveness would be more important than waiting to draw a graph with the fewest crossings, since most users are likely to be interested in recent activity.

Migrate ClearCase to Perforce

I have a large quantity of ClearCase data which needs to be migrated into Perforce. The revisions span the better part of a decade, and I need to preserve as much branch and tag information as possible. Additionally, we make extensive use of symbolic links, supported in ClearCase but not in Perforce. What advice or tools can you suggest that might make this easier?
The first step is to decide whether you need to migrate everything, or just certain key versions. If you only migrate the important versions (releases and major milestones) you'll end up with a much simpler history in Perforce, without losing anything important. ClearCase can then be kept as a historical archive in case it is ever needed. (Unless IBM has changed things, ClearCase licenses do not expire when maintenance runs out; you just lose the right to new upgrades and patches and access to support.)
Keep in mind that Perforce does not version-control directories and does not keep a full per-element version tree - this means a 1:1 migration with exact results is going to be impossible. Recreating the important snapshots is a much more achievable goal; keeping everything may be impossible, as Perforce lacks features ClearCase relies upon.
To see what Perforce says about the migration, check out
http://perforce.com/perforce/ccaseconv.html
This explains the key differences and covers a few approaches you can take.
Start by doing a Google search on "clearcase to perforce conversion".
Then read the ClearCase to Perforce Conversion Guide.
Once you're done crying, you're going to have to decide (1) how much effort you can afford, and (2) what you really need to capture as part of the conversion. You're not going to get it all, so you might as well just focus on getting the important branches.
Another consideration would be to just capture the current state of each supported branch as a snapshot, import that into Perforce, and then turn off the old ClearCase server, saving it in a known good state for that day when you need to access something from the deep, dark, pre-Perforce days...
The other answers are outdated. You can now import from ClearCase to Perforce with many options, also preserving history.
http://www.perforce.com/sites/default/files/pdf/migration-planning-guide-clearcase-to-perforce.pdf
What you also have to keep in mind is that your importer script may commit in a slightly different sequence than the original ClearCase commits (maybe you are traversing directories, maybe histories of individual files, etc.).
So, unless you gather all version information into a (large) database and sort it afterwards, you will end up with commits which are not very useful to look into (except, of course, the history of single files). If you (hopefully) change your commit policy to commit atomic changes into Perforce, it will be visible when that policy started: the commits before it just do not make any sense at project scope.
So you really should think about leaving the ClearCase history behind. Recreating tags and branches is also a separate problem, as you need your old config specs for your old branches.
In the end you will get wrong filenames in old tags (as Perforce does not support directory versioning), so you will keep using ClearCase for this (and it is very tricky to get the correct filename for each version of a file!).
The last problem you will encounter is importer run time: if you have large VOBs (e.g. 10 years, 50 GB), you will wait days for the importer to gather all the information and convert it into a nice shiny Perforce repo. All these days your dev team will be unable to work.
Just a quick note on the one import I saw from ClearCase to Perforce.
As noted in the ClearCase to Perforce Conversion Guide:
Perforce supports atomic change transactions; ClearCase doesn't.
Note that labels are often used to simply denote a snapshot in time for a particular easily-specified set of files; this is inherently easy to do in Perforce without using a label, due to Perforce's use of atomic change transactions and file naming syntax.
For example, the state of all the files in //depot/projecta as of change 42 can be obtained with
p4 sync //depot/projecta/...#42
That means the ClearCase project that got imported was a UCM one, since the concept of a baseline closely follows that of a global revision.
Only files with a baseline on them were imported; the other versions were discarded.