Git Checkout Speed - Under the Hood - file-io

NOTE: This is not about how to use Git. This is about how it works under the hood.
Can you explain how Git can check out files so quickly from one branch to another?
For example, given I have two branches, each with 10 GB of files...
I am able to check out from one branch to the other in under a second.
When I check out, the new files are on disk and ready to go.
However, if I were to delete 10 GB and then restore 10 GB of files, it would take a few minutes.
I understand the basics of Git objects and how commit-ish objects, trees, and blobs work. However, when checking out a different branch, the files need to be restored from the blobs and any existing untracked files removed. How is this happening so quickly?
I am assuming this is done using some form of HDD pointers.
I read over the Git code but my C skills are rusty.
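For the curious, the object model in question can be inspected directly (the blob hash below is a placeholder you would copy from the tree listing):
git cat-file -p HEAD              # the commit object: points at a tree
git cat-file -p 'HEAD^{tree}'     # the tree: file names mapped to blob/subtree hashes
git cat-file -p <blob-hash>       # a blob: the raw file content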

Related

Local repository taking huge space

I am using GitLab for my project. There are 3 or 4 different repositories which take up huge space. Is there any better way to handle the large space taken up? I have huge performance issues with my computer. The local repository has to be deleted every time branch work is completed, to free up space. This means I am cloning the repo every time I need to work on a new branch, which sometimes takes 30 minutes, which again is not helping and consumes huge amounts of time. I am also working on all three repositories sequentially, which means cloning and deleting 4 times for one assigned piece of work, which doesn't seem efficient.
Is it possible at all to keep all 4 repos locally and still be efficient with space and computer performance?
I am using VS Code.
Any suggestions appreciated.
What are the best strategies to keep local repositories, stay efficient with space, and avoid deleting them every time?
I've had similar problems in the past with repos that contained many branches, tags, and commits. What helped me was using the --single-branch option of the clone command.
Since you mentioned you're using VS Code and GitLab, I'm also assuming you're using the GitLab WorkFlow VS Code extension. Sadly, I don't know how to specify the --single-branch option with that extension, but there should be a way.
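From the command line, that option looks like this (the URL and branch name are placeholders for your own):
git clone --single-branch --branch main https://gitlab.com/yourgroup/yourproject.git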

Git: merge two working copies without committing

I have a clone for server development and another for client development. Both sets of material will eventually make it into the same branch, but I want to synchronize them, and I want it to perform a merge as though I had committed, pushed, and pulled, but without actually doing that.
I'm able to make a patch with this script I wrote:
git diff --cached > patch.patch
git diff >> patch.patch
on the server, but applying that to the client is much harder.
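As an aside, a single diff against HEAD should capture both the staged and unstaged changes in one patch (add --binary if any of the changed files are binary):
git diff HEAD > patch.patch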
I've tried the Unix patch command; for some reason, it keeps asking me what files to patch, as if it can't find them. (Yes, they're there.) I've tried
git apply -3 patch.patch
but that gives a lot of errors like "with conflicts" (without making any changes) and "does not match index". It doesn't even seem to be trying to patch the other half of the files.
Stashing, then applying the patch, and then popping from the stash doesn't work, because unstashing refuses to do merges.
It looks like doing it without pulling isn't going to work; I haven't found a way to do it conveniently and safely. However, my problem with committing was that I didn't want to spam the git log with garbage like:
Sync'ing to client
Sync'ing back to server
Oops! Sync'ing something I forgot to the server again!
etc.
But I can avoid all this by committing, then pulling from the remote repos. In the end, I wouldn't have to push those commits, since I would use reset to remove them all from the local repo and then, with all my changes in the working directory, do a proper commit and push it.
Gotchas
There are many.
It's commonly known that you shouldn't reset your local repo if someone has already pulled from it. This is probably because of the obvious confusion that results when one repo deletes commits that another repo believes were there. For that reason, it's important that the same reset is performed on both repos before they start sharing code again.
If, after making the commits that you later want to reset, you then pull/merge, you can make things very difficult for yourself. There should be a way to manage it, but I haven't figured it out yet. One idea is to reset, stash, pull, merge, and commit again. Another involves revert with the -n option.
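For that last idea, revert with -n applies the inverse of a commit (or a range of commits) to the working tree and index without committing it, so the cleanup might hypothetically look like this (the range endpoints are placeholders):
git revert -n <first-sync-commit>^..<last-sync-commit>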
Instructions
The following example assumes you have 2 clones: one called "client" and the other "server".
1. Following https://help.github.com/articles/adding-a-remote, set up your client's and server's repos on each other's systems so they can pull from each other.
2. When you want to sync, just commit on the donor system, then instead of pulling from the origin, pull from the remote. Say the client wanted a commit from the server. On the client: git pull myserver-repo mybranch.
3. Merge and conflict-resolve as necessary.
4. Loop back to step 2 as many times as necessary.
5. After several iterations of steps 2-4, you arrive at the point where you are ready to push your changes to the server. Go to whichever local repo has all the changes you want pushed, then run git log. Find the commit before the first commit you made in step 2. Copy its hash to the clipboard.
6. Then git reset: git reset <hash you copied in step 5>.
7. You should then see all the commits you don't want disappear from the log, and all the changes therein appear in your working directory. Commit and push.
It's important that you do a cleanup on the repo from which you didn't perform steps 5-7. So if you pushed from your server repo, you need to perform the same reset operation on your client, then dispense with the changes as you see fit. My preferred method is git stash save "delete_me".
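To make steps 1 and 2 concrete, the client side might look like this (the remote path and branch name are placeholders):
git remote add myserver-repo ssh://server-host/path/to/server-repo
git pull myserver-repo mybranch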

Possible to branch in Perforce without creating a new folder?

Is it possible to create branches in Perforce in a similar style to Git? I.e. without creating a new folder.
I would prefer for my client to manage the branches transparently whilst I work against a single copy of the directory tree on disk.
It seems awfully wasteful for the client to create an exact copy of the entire tree if you're only modifying, say, a couple of files. I much prefer Git's workflow in this regard.
If it's not possible using straight Perforce I'm happy to move to GitSwarm.
For info I'm running Perforce version 2015.1/1233444.
Possible yes, but with the centralized version of the system it involves a bit of 'magic'. Basically, the branch part doesn't need to involve the client at all anymore. Take a peek at p4 populate. That'll create another folder on the server, but won't do anything locally. Then you can edit your client workspace to map the branched files instead of the trunk files, and it'll just re-sync over the top of the files on your disk.
Now, having said that, if you wanted to take a look at our DVCS version of working, then you can just do "p4 switch -c <branch>" and it'll create a new branch locally, switch your workspace over to it (shelving any open current work in the process), and away you go.
My original answer was deleted because I thought a link was a better idea than repeating content. My mistake.
At any rate, I believe the DVCS features in Perforce Helix supply exactly the sort of thing you're after. In a blog post I wrote on the subject (link here for reference), I explained how to create a new in-place branch with a single command:
p4 switch -c newBranchName
That will create a new branch with the name "newBranchName" and save any existing work in progress by default. To discover on which branch you're working you can use the switch command with the list argument as follows:
p4 switch -l
That would show you output like this, with the asterisk showing that you're now working on the newBranchName branch.
newBranchName *
main
You can switch back and forth, changing contexts as often as you like. Your work in progress will continue to be saved on each branch. When you're ready to merge your work back to main and push it back to the server, you can use the following sequence of commands:
p4 switch main
p4 merge --from newBranchName
p4 resolve -as
The first command switches back to the main branch, the second merges your work from the newly created branch into main, and the third resolves any potential conflicts automatically. If there are any conflicts that can't automatically be merged, then you can use the usual commands to walk through the resolution process.
Alternatively, if you prefer to stick with Git, you can use that directly with our Helix Versioning Engine through our Git Fusion technology, or use Git directly with our new GitSwarm technology. That is a pretty amazing option (in my opinion) as it makes it possible to mirror content automatically and bidirectionally between GitSwarm and the back end server. That way you get all the features of Git with GitSwarm (which itself is based on GitLab) and all the goodies from the rest of Helix.
Hope that helps!
If you use streams (Perforce's "managed" version of a branch, as opposed to doing completely ad hoc inter-file branching with arbitrary paths), it's pretty simple. As P4Gabe said, "switch -c" is a one-shot option on a local server.
On a shared server it's only a little more complicated because you have to do the "populate" explicitly (this is to keep naive users from accidentally branching lots of files lots of times on a shared server), but it's still only a few steps and it's something that you as an advanced user could script easily:
p4 stream -P <current stream> -t development <new stream name>
p4 populate -r -S <new stream name>
p4 switch <new stream name>
The equivalent is possible using ad hoc ("classic") branches as well, if you have a good understanding of how client views work: use populate to create the new branch, modify your client view to map the new branch into the namespace currently occupied by the old branch, and sync.
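As a rough sketch of that classic-branch flow (the depot paths are made-up placeholders):
p4 populate //depot/main/... //depot/dev/mybranch/...   # branch on the server; no local file I/O
p4 client                                               # edit the View: map //depot/dev/mybranch/... where main was mapped
p4 sync                                                 # re-sync the branched files over the same locations on disk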
This blog post on what exactly "p4 switch" does might help if you're trying to engineer your own solution that's similar-to-but-not-quite the "switch" command: https://www.perforce.com/blog/150428/p4-switch-switching-it

bazaar pull special usage

I have a local folder that is a branch of formal_versions.
My workflow is:
Make changes and then commit them.
The integrator merges them into his local branch.
The integrator pushes his local branch to formal_versions.
I use pull to make my local branch identical to formal_versions.
This is working fine.
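In bzr commands, my side of that loop is roughly the following (the branch location is a placeholder):
bzr commit -m "my change"
bzr pull bzr+ssh://host/path/to/formal_versions   # after the integrator has pushed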
However, what should I do in the following scenario:
After pulling from formal_versions, I compile the code. As a result, some workspace and object files are changed (i.e., the date and time of compilation), and of course Bazaar Explorer informs me of the modified files.
I again want to make my branch a mirror of formal_versions. What should I do?
A. Why does pulling again say "nothing to pull", even if
I use the --overwrite switch? It is supposed to make my local branch a mirror of the pulled branch...
B. Is my only option to revert the working tree?
It is generally considered best practice (as well as good for one's sanity) not to version files that are the result of the build process. Executables, shared libraries, and even source files generated by a 4GL are examples. You can ignore files by using bzr ignore <pattern>, for example bzr ignore *.exe. If the files are already versioned, you will also have to remove them using bzr remove.
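Concretely, that cleanup could look something like this (the file path is a placeholder):
bzr ignore "*.obj"
bzr remove --keep path/to/built.obj   # unversion the file but keep it on disk
bzr commit -m "Stop versioning build outputs"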
bzr pull says there is nothing to pull because the formal version has had no new commits since your last pull.
If you must version the files in question, bzr revert is the only way I know of when bzr pull does not find new revisions. If there had been new revisions in the formal branch, the files should be updated (and will potentially be reported as conflicts).
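For reference, revert can target the whole tree or just the offending files (the path is a placeholder):
bzr revert                    # restore the entire working tree to the last revision
bzr revert path/to/file.obj   # or revert only specific files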

Storing Drupal SQL in Git

I have a Drupal site, and I am storing the codebase in a Git repository. This seems to be working out well, but I'm also making changes to the database. I'm considering doing periodic dumps of the database and committing them to Git. I had a few questions about this.
1. If I overwrite the file, will Git think it is a brand new file, or will it recognize that it is an altered version of the same file?
2. Will this potentially make my repo huge? (The database is 16 MB.)
3. Can I zip this file, or will this mess Git up? (The zipped version is only 3 MB.)
4. Any other suggestions?
If you have enough space, a non-compressed dump in source control is pretty handy, because you can use a diff program to see which rows were added/modified/deleted.
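One tip along those lines: assuming a MySQL database (the database name is a placeholder), dumping with one INSERT per row keeps those diffs readable:
mysqldump --skip-extended-insert drupal_db > dump.sql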
Another solution is to use the Features module, which is supposed to capture Drupal config in code. It stores this captured data as a feature module, which you can put into version control.
For my database applications, I store scripts of DDL statements (like CREATE TABLE) in some sort of version control system. These scripts sometimes include static "seed" data as well. All the version control systems I use are good at recognizing differences in these files, and they are much smaller than the full database with data.
For the dynamically-generated data, I store backups (e.g. from mysqldump) in an appropriate location (depending on the importance of the data, that may include offsite backups).
1) It's all text, so Git will just see it as it would any other file.
2) No; due to the above, it should add 16 MB to the repo (or less, due to Git's own compression). It won't add a new file every time, just the changes, so the repo will grow by the size of the additions.
3) No, or Git won't be able to see the differences. Git does its own compression anyway.
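If you want to see how well that compression is doing, the repository's on-disk object size can be checked like so:
git gc                  # repack, letting Git delta-compress the dump revisions
git count-objects -vH   # show pack sizes in human-readable form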