Can I optimize a Mercurial clone?

My Mercurial clone has become incredibly slow, presumably due to on-disk fragmentation. Is there a way to optimize it?
The obvious way is to make a new clone, then copy my MQ patches, saved bundles, hgrc, and so on to the new clone and delete the old one. But it seems like someone might have run into this problem before and written an extension to do it?

If the manifest gets particularly large, it can result in slow performance. Mercurial has an alternative repository format - generaldelta - that often produces much smaller manifests.
You can check the size of your manifest using:
ls -lh .hg/store/*manifest*
To get maximum value from generaldelta:
1. Install Mercurial 2.7.2 or later (2.7.2 includes a fix for a generaldelta bug that could result in larger manifest sizes - though there's a good chance you won't hit the bug with an earlier version).
2. Execute hg --config format.generaldelta=1 clone --pull orig orig.gd. This may give some improvement in the manifest size, but not the full benefit.
3. Execute hg --config format.generaldelta=1 clone --pull orig.gd orig.gd.gd. The clone of the clone may give a much greater improvement in the manifest size, because when pulling from a generaldelta repo the deltas are reordered to optimise the manifest size.
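Putting the steps together, a whole session looks something like this (a sketch; orig is a placeholder for your repository path):
ls -lh orig/.hg/store/*manifest*          # manifest size before
hg --config format.generaldelta=1 clone --pull orig orig.gd
hg --config format.generaldelta=1 clone --pull orig.gd orig.gd.gd
ls -lh orig.gd.gd/.hg/store/*manifest*    # manifest size after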
As an example of the potential benefits of generaldelta, I recently converted a repo that was ~55000 SVN commits (pulled using hgsubversion) plus ~1000 Mercurial commits/merges/grafts, etc. The manifest in the original repo was ~1.4GB. The manifest in the first clone was ~600MB. The manifest in the clone of the clone was ~30MB.
There isn't a lot of information about generaldelta online - there's still work to be done before it can become the default format, but it works well for many projects. The first few Google search results have some information from when it was first introduced, and there was some recent discussion on the mercurial-dev mailing list.

I deleted the repo and recloned, and that improved performance.

Turn off real-time anti-virus monitoring of the folder the repo is cloned to, and defragment the drive. There's not much else you can do.


Can a change to package-lock.json ever affect the deployment?

I'm reading the NPM docs about package-lock.json and my interpretation is that a committed change to it can never cause issues in the deployed version.
During the roll-out we run npm install, which creates (or overwrites) the lock file anyway. In my mind, the lock file is more of a receipt of the state of the current world while installing, rather than a pointer to how the installation should be performed.
However, I haven't been successful convincing my team that it is so. They feel uneasy relying on the statement above (not contradicting it nor arguing against it, just not entirely convinced to the degree that they would bet a testicle on it).
Is it at all possible that package-lock.json might affect the actual installation?
Since I'm new at the company, my track record of 10+ years has limited impact. And I'm myself humbly considering that even though the lock file never caused me any issues before, my experience might be irrelevant if the local environment is configured in a way I'm not familiar with yet. So I'm too cautious to bet my reputation as we're about to make a very important release.
In my mind, the lock file is more of a receipt of the state of the current world while installing, rather than a pointer to how the installation should be performed.
Maybe I am interpreting your statement wrongly, but package-lock is a pointer for future installations, in a way. See the general documentation on lock files (a different link than the one you shared); the following statement from that doc might be helpful:
This file describes an exact, and more importantly reproducible node_modules tree. Once it's present, any future installation will base its work off this file, instead of recalculating dependency versions off package.json.
Reading the following discussion on this topic might be helpful to you too. Thanks!
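For what it's worth, the difference is visible on the command line (assuming npm 5.7 or later, where npm ci is available):
npm install   # may update package-lock.json to satisfy package.json
npm ci        # installs exactly what package-lock.json records, and fails if it disagrees with package.json
So a committed lock file can affect what gets installed, whichever of the two commands the deployment uses.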

Multi-Version Code Support in Git

I have been working on a big SQL-based project that is taking an increasing amount of time and effort to maintain across its versions. Let's keep it simple: I have three folders, one for each version of the code, called Ver1, Ver2, and Ver3. All three version folders contain exactly the same filenames, but their content differs from version to version. If I make a change to a particular file in Ver3 that also exists in Ver2 and Ver1, how can I use Git not necessarily to make the same changes in those other versions (not always practical due to partial rewrites for performance or logic changes), but to let me know that the other two versions of the file need to be updated to catch up to Ver3? If Git isn't suited for this task, or if you have any experience with a similar issue, I would much appreciate any suggestions.
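One low-tech way to spot the drift, assuming Git tracks all three folders (file names here are hypothetical):
git log --oneline -- Ver3/report.sql                  # commits that touched the Ver3 copy
git diff --no-index Ver1/report.sql Ver3/report.sql   # compare two copies directly
This only reports divergence; it cannot know which version is "ahead".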

Xcode mixing Branches

This may sound strange, but I am having issues with Xcode mixing branches, or at least messing everything up.
I created a new branch (v1.4), then created a new data model and renamed an entity. I had to switch back to the previous branch (v1.3) to check something, and I get errors at run time on v1.3: it's looking for the new entity name from branch v1.4 - what the %$^#% is happening? I searched the files; the new entity name is nowhere in v1.3.
I switched branches again, to v1.2, and it ran fine. So I switched to v1.3 again - no go. Switched back to v1.2, and now it has the same issue: a runtime error because it can't find the new entity name.
What is happening? Anyone else have this issue?
Any thoughts/suggestions would be greatly appreciated.
OS X 10.11.6
Xcode 7.1.1
==[EDIT]==
I am not really familiar with Git, just starting to learn. I ran the couple of commands mentioned and got nothing from either git diff command.
Running git status --ignored I do get multiple files listed as untracked - still working on understanding why (separate issue) - a couple of object files and 2 data models (was 3, but I manually added one to a commit).
Also I get 3 ignored files:
.DS_Store
projectName.xcodeproj/project.xcworkspace/xcuserdata/
projectName/.DS_Store
That's as far as I've gotten. I'm not familiar enough with Git to know if these ignored files are the ones I should delete.
Second option - will restoring from Time Machine fix this? It may be a little extra work for me to recreate v1.4, but probably less time than I've already spent trying to figure out how to fix it.
I do appreciate both comments so far - thank you.
==[SECOND EDIT]==
Thanks again for your comments.
However, due to time and schedule I performed a Time Machine restore before ElpieKay posted the last comment, so I will not be able to test it.
Reverting did "fix" it, as you'd expect, though I lost several hours of work - but life happens. I will keep this around for if/when it happens again and try git clean -df to see if it fixes it.
On a side note - while I was switching back and forth between v1.3 and v1.4 trying to figure this out, 2 of the model versions disappeared from v1.4 - i.e. the name turned red in Xcode, and when I viewed the contents of the file they were missing. I don't know if this is related, but I thought I would mention it. This happened one other time, and I thought maybe I had done something wrong - I did a Time Machine restore to fix it that time. I wonder if git clean -df would have fixed it.
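If it happens again, it is safer to dry-run the cleanup first; the -n flag prints what would be deleted without touching anything:
git clean -nd   # preview the untracked files and directories that would be removed
git clean -df   # actually remove them
Note that git clean only removes untracked files; anything committed is safe.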

How to stop or limit indexing in IntelliJ 13?

My IntelliJ 13.1.5 constantly indexes my project, which really slows my machine down. It does it when I rebuild my project as well as when I start my Jetty server.
Does anybody know how to disable or at least limit that behavior?
The previous version didn't do that so often.
Actually, I found what was wrong.
One of my modules didn't have the target folder excluded. That caused IntelliJ to re-index it constantly, and since that module is big, indexing would take forever.
Solution:
Go to "Project Structure" -> "Modules" and excluded all target folders.
Starting from IntelliJ 2017.2, indexing can at least be paused.
To other unfortunate souls working for an enterprise, mostly on VDIs without an SSD: IDEA actually parses/indexes a lot more than your project folders. Likely candidates that make your whole day a rant session:
Libraries and linters specified at a global level, for example "Languages & Frameworks / JavaScript / Libraries" or "TypeScript / TsLint / TsLint Packages". If you work in multiple languages, this can bloat your index quite a lot. It's usually much better to open just the one small part of a project related to what you are working on, to keep the index as small as possible.
As mentioned before: target and node_modules folders.
dist, mock, and resource folders.
Do not open multiple projects/modules in the same project scope. In theory this saves you time because you don't have to wait to reopen the given module in another window, but in reality you are just adding more stuff to index. If you happen to git pull a project with 5-6 different modules, your IDEA will go into stasis for half an hour to index all the changes.
Try invalidating the caches and restarting IntelliJ.
I had a similar issue; it was solved with the following:
IntelliJ IDEA caches a great number of files, therefore the system cache may one day become overloaded. In certain situations the caches will never be needed again, for example, if you work with frequent short-term projects. Also, the only way to solve some conflicts is to clean out the cache.
To clean out the system caches:
On the main menu, choose File | Invalidate Caches/Restart. The Invalidate Caches dialog opens.

Mercurial practices: use with IDEs and scalability

I am not an experienced user of SCM tools, even though I am convinced of their usefulness, of course.
I used some obscure commercial tool in a former job, Perforce in the current one, and played a bit with TortoiseSVN for my little personal projects, but I disliked having lots of .svn folders all over the place, making searches, backups and such more difficult.
Then I discovered the appeal of distributed SCM and chose to go the apparently simpler (than Git) Mercurial way, still for my personal, individual needs. I am in the process of learning to use it properly, having read part of the wiki and being in the middle of the excellent PDF book.
I see often repeated, for example in Mercurial working practices, "don't hesitate to use multiple trees locally. Mercurial makes this fast and light-weight." and "for each feature you work on, create a new tree.".
This is interesting and sensible advice, but it clashes a bit with my habits from centralized SCM, where we have a "holy" central repository whose branches are carefully planned (and handled by administrators), where changelists must be checked by (senior) peers and must not break the builds, etc. :-) Starting to work on a new branch takes quite some time...
So I have two questions in the light of above:
How practical is it to do lots of clones, in the context of IDEs and such? What if the project has configuration/settings files, makefiles, Ant scripts, shell scripts or whatever that need path updates? (Yes, probably a bad idea...) For example, in Eclipse, if I want to compile and run a clone, I have to create yet another project, tweaking the Java build path, the Run/Debug targets, and so on - unless an Eclipse plugin eases that task. Am I missing some facility here?
How does that scale? I have read that Hg is OK for large code bases, but I am perplexed. At my job, we have a Java application (well, several around a big common kernel) of some 2 million lines, weighing some 110MB for code alone. Doing a clean compile on my old (2004) Windows workstation takes some 15 minutes to generate the 50MB of class files! I don't see myself cloning the whole project to change 3 files. So what are the practices here?
I haven't yet seen these questions addressed in my readings, so I hope this will make a useful thread.
You raise some good points!
How practical is it to do lot of clones, in the context of IDEs and such?
You're right that it can be difficult to manage many clones when IDEs and other tools depend on absolute paths. Part of it can be solved by always using relative paths in your configuration files -- making sure that a source checkout can compile from any location is a good goal in itself, no matter what revision control system you use :-)
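A small illustration of the idea, assuming a shell-based build script (the names are made up):
# resolve everything relative to the script's own location,
# so the checkout can live anywhere on disk
BASEDIR=$(cd "$(dirname "$0")" && pwd)
CLASSES="$BASEDIR/build/classes"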
But when you cannot, or don't want to, bother with several clones, note that a single clone can cope with multiple branches. The "hgbook" emphasizes many clones since this is a conceptually simple and very safe way of working. When you get more experience you'll see that you can use multiple heads in a single repository (perhaps naming them with bookmarks) to do the same.
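A minimal sketch of the bookmark style (names are placeholders):
hg bookmark feature-x   # name the current head; the bookmark follows your commits
hg update default       # jump back to the main line
hg update feature-x     # return to the feature later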
How does that scale?
Cloning a 110 MB repository should be quite fast: it depends on how long it takes to write 110 MB to your disk. In a recent message to the Mercurial mailing list it was reported that cloning 6.3 GB took 4 minutes -- scaling that down to 110 MB gives about 4 seconds. That should be fast enough that your tea is still warm :-) Part of the trick is that the history data are simply hard-linked (yes, also on Windows), so it is only a matter of writing out the files in the working copy.
PhiLo: I'm new at this, but Mercurial also has "internal branches" that you can use within a single repository instead of cloning it.
Instead of
hg clone toto toto-bug-434
you can do
cd toto
hg branch bug-434    # the next commit will start the new branch
...
hg commit
hg update default    # switch back to the main line
hg update bug-434    # and return to the branch later
to create a branch and switch back and forth. Your built files that are not under revision control won't go away when you switch branches; some of them will just go out of date as the underlying sources are modified. Your IDE will rebuild what's needed and no more. It works much like CVS or Subversion.
You should still have clean 'incoming' and 'outgoing' repositories in addition to your 'work' repository - it's just that your 'work' repository can serve multiple purposes.
That said, you should clone your work repo before attempting anything intricate. If anything goes wrong you can throw the clone away and start over.
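That safety net is a single command (a sketch; the names are placeholders):
hg clone work work-backup   # a local clone is cheap - the history is hard-linked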
Question 1:
PIDA IDE has pretty good Mercurial integration. We also use Mercurial for developing PIDA itself. Personally I have about 15 concurrent clones of some projects going, and the IDE copes fine. We don't have the trouble of tweaking build scripts etc.; we can "clone and go".
It is so easy that in many cases I will clone to the bug number like:
hg clone http://pida.co.uk/hg pida-345
for bug #345, and I am ready to fix it.
If you are having to tweak build scripts depending on the actual checkout directory of your application, I would suggest that your build scripts use some kind of project-relative path rather than hard-coded paths.