Is libgit2 automatically packing repositories?

I did not see a garbage collection command in libgit2, so I was wondering whether it automatically packs files in a local repository.

There is no automatic repacking, and it is something you absolutely never want the library to do on its own. All objects start off as loose objects and remain that way until some tool decides it would like to do housekeeping.
Repacking (and gc operations in general) is 90% policy, which is not something the library should be deciding. Whatever tool wants to do this should choose an appropriate time to create a packfile out of the loose objects, based on its specific knowledge of how the repository is used.
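For reference, here is a minimal sketch of what such a housekeeping step could look like using libgit2's packbuilder API. The calls and signatures are recalled from memory and may differ slightly between libgit2 versions, and most error handling is omitted; treat it as an outline of the mechanism, not as the recommended policy.

```c
/* Sketch: a tool (not libgit2 itself) deciding to pack loose objects. */
#include <git2.h>
#include <stdio.h>

int main(void)
{
    git_repository *repo = NULL;
    git_revwalk *walk = NULL;
    git_packbuilder *pb = NULL;

    git_libgit2_init();

    if (git_repository_open(&repo, ".") < 0) {
        fprintf(stderr, "not a git repository\n");
        git_libgit2_shutdown();
        return 1;
    }

    /* Walk every commit reachable from HEAD ... */
    git_revwalk_new(&walk, repo);
    git_revwalk_push_head(walk);

    /* ... and hand the walk to the packbuilder, which pulls in those
     * commits plus the trees and blobs they reference. */
    git_packbuilder_new(&pb, repo);
    git_packbuilder_insert_walk(pb, walk);

    /* Write a .pack/.idx pair into the object database.  Deciding when
     * (and how aggressively) to do this is the calling tool's policy,
     * not libgit2's. */
    git_packbuilder_write(pb, ".git/objects/pack", 0, NULL, NULL);

    git_packbuilder_free(pb);
    git_revwalk_free(walk);
    git_repository_free(repo);
    git_libgit2_shutdown();
    return 0;
}
```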

Related

Objective-C - Finding directory size without iterating contents

I need to find the size of a directory (and its sub-directories). I can do this by iterating through the directory tree and summing up the file sizes etc. There are many examples on the internet but it's a somewhat tedious and slow process, particularly when looking at exceptionally large directory structures.
I notice that Apple's Finder application can instantly display a directory size for any given directory. This implies that the operating system is maintaining this information in real time. However, I've been unable to determine how to access this information. Does anyone know where this information is stored and if it can be retrieved by an Objective-C application?
IIRC Finder iterates too. In the old days it used FSGetCatalogInfo (an old File Manager call) to do this quickly. I think there's a newer POSIX call that is now the fastest, lowest-level API for this, especially if you're not interested in any info besides the size and really need blazing speed over easily maintainable code.
That said, if it is cached somewhere in a publicly accessible place, it is probably Spotlight. Have you checked whether the spotlight info for a folder includes its size?
PS - One important thing to remember when determining the size of a file: Mac files can have two "forks", the data fork and the resource fork (where, e.g., Finder keeps the info when you override which application opens a particular file, and any custom icon assigned to it). So make sure you add up both forks' sizes, or your measurements will be off.
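If you do end up iterating, here is a minimal plain-C sketch of the brute-force approach using POSIX nftw(). It only sums st_size of regular files, so it ignores resource forks and hard links and is an approximation rather than Finder's exact number; the 16-descriptor limit is an arbitrary choice.

```c
#define _XOPEN_SOURCE 500   /* needed for nftw() on some platforms */
#include <ftw.h>
#include <sys/stat.h>
#include <stdio.h>
#include <stdint.h>

static uint64_t total_bytes;   /* accumulated across callbacks */

static int add_size(const char *path, const struct stat *sb,
                    int typeflag, struct FTW *ftwbuf)
{
    (void)path; (void)ftwbuf;
    if (typeflag == FTW_F)          /* regular files only */
        total_bytes += (uint64_t)sb->st_size;
    return 0;                       /* keep walking */
}

int main(int argc, char **argv)
{
    const char *root = argc > 1 ? argv[1] : ".";

    /* FTW_PHYS: don't follow symlinks; 16 = max open fds for the walk */
    if (nftw(root, add_size, 16, FTW_PHYS) != 0) {
        perror("nftw");
        return 1;
    }
    printf("%s: %llu bytes\n", root, (unsigned long long)total_bytes);
    return 0;
}
```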

Why Does a .dll file allow programs to be modularized?

Excerpt from Microsoft's "What is a .dll?":
"By using a DLL, a program can be modularized into separate
components. For example, an accounting program may be sold by module.
Each module can be loaded into the main program at run time if that
module is installed. Because the modules are separate, the load time
of the program is faster, and a module is only loaded when that
functionality is requested. Additionally, updates are easier to apply
to each module without affecting other parts of the program. For
example, you may have a payroll program, and the tax rates change each
year. When these changes are isolated to a DLL, you can apply an
update without needing to build or install the whole program again."
Ref: http://support.microsoft.com/kb/815065
DLLs are:
- loaded at run time
- able to be "dynamically loaded" by multiple programs at the same time, which
  - allows sharing/saving of resources
  - lowers disk space requirements

But why do they promote "modularizing" programs? What would happen if there weren't .dll files? Could someone provide or expand on the example?
Modular programs provide a way of making a particular functionality available to many programs without having to include the same code in all of them. Also, they allow greater compatibility between programs since they would essentially use the same methods in common DLLs to obtain the same results.
One would write a program in a modular fashion such that different parts of the program could be maintained separately. Say you had some clever way of reading and writing your own data format to files. Say you make improvements to that technique. If the code for reading and writing the files lived in a DLL, you would only need to update the DLL. The program itself would remain unchanged.
If you have one monolithic EXE, you have to:
- pay for all the extra time relinking it, even if only one source file changed (this is painful when the EXE is > 80 MB, as is the case in large projects), and
- ship the entire EXE for patches/updates, when you could instead ship a single DLL that is a fraction of the size.
By breaking it up into DLLs you:
- get pluggability: the EXE is the host application and others can write DLLs that "plug into" the host via a well-defined interface. DLLs can be interchanged as long as they conform to the interface (see the sketch after this list).
- can share code across other DLLs and EXEs.
- can have some DLLs loaded optionally on demand, only if they're used, and unloaded when they're not needed.
- similarly, can have optional functionality. With a single EXE you have to download everything, even if some components are rarely used. With DLLs, you could have a system that downloads and installs features as needed.
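As a concrete (hypothetical) illustration of the plug-in point in plain C on Windows: the host only knows the interface, and the exported function name and DLL file names below are made up for the example.

```c
/* Sketch of the "host + plug-in" pattern on Windows.  We assume each
 * plug-in DLL exports a function named payroll_calculate_tax with the
 * signature below; that interface is invented for illustration. */
#include <windows.h>
#include <stdio.h>

/* The well-defined interface every plug-in must implement. */
typedef double (*calc_tax_fn)(double gross_pay);

int main(void)
{
    /* Swapping tax2023.dll for tax2024.dll updates the rates without
     * rebuilding or reshipping the host EXE. */
    HMODULE mod = LoadLibraryA("tax2024.dll");
    if (mod == NULL) {
        fprintf(stderr, "could not load plug-in (error %lu)\n", GetLastError());
        return 1;
    }

    calc_tax_fn calc = (calc_tax_fn)GetProcAddress(mod, "payroll_calculate_tax");
    if (calc == NULL) {
        fprintf(stderr, "plug-in does not implement the interface\n");
        FreeLibrary(mod);
        return 1;
    }

    printf("tax owed: %.2f\n", calc(50000.0));

    FreeLibrary(mod);
    return 0;
}
```

Any DLL that exports the agreed function can be dropped in; the host EXE never has to change.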
The biggest advantage of DLLs is probably during development of the original program. Without DLLs you wouldn't be able to integrate with existing libraries without including their original source code. By including an existing library as a DLL you don't need the source, since it's all encapsulated in the DLL. It would be a nightmare to develop in frameworks like .NET without DLLs, since you constantly include other libraries...
The alternative to breaking your program down into n > 1 pieces is to keep it in n == 1 piece. Why is this bad? Well, it isn't always bad (maybe the BIOS is a good example?), but for user programs it usually is. Why? First we need to define what a program is.
What is a program?
A simple "program", roughly speaking, consists of an entry point (i.e. offset to the main function), functions and global variables. A function consists of instructions and information about what local variables are needed to run the function. To be executed a program must be loaded in primary memory/RAM (the aforementioned information). Because our program has functions (and not just jump statements), that implies the existence of a stack, which implies the existence of a containing environment managing the stack. (I suppose you could have a program that manages its own stack but I'd argue then your program is not a program anymore but an environment.) This environment contains the program, starts in the entry point and executes each instruction, be it "go to this part of the RAM and add it to whatever is in this register" or "If this register is all 0 then jump ahead this many instructions and resume execution there" indefinitely or until the program gives control back to its environment. (This is somewhat simplified - context switches in multi-process environments, illegal memory access, illegal instructions, etc. can also cause control to be taken from the program.)
Anyway, so we have two options: either load the entire program at once or have it stored and loaded in pieces.
n == 1
There are some advantages to doing it all at once:
- Once the program is in memory, no further disk access is required to execute (unless the program explicitly asks for it).
- Since the program is compiled/linked before execution begins, you can do everything without any string names/comparisons - go directly to the address (or an offset).
- Functions are never out of sync with one another.
n > 1
There are some disadvantages to the single-piece approach, though, which mirror its advantages:
- Most programs don't execute all code paths most of the time. I think there are studies showing that in most programs, most of the execution time is spent in a small fraction of the instructions present in the program. In other words, something like 20% of the program is executed 80% of the time (I just made that particular figure up - but you get the idea). If we divide our program up enough and only load instruction sets (i.e. functions) as they are needed, then we won't waste time loading the 80% we'll never use during this execution of the program. Along these lines, we can ultimately fit more concurrently executing programs into our RAM at once if we only load the fraction of each program we need.
- Most programs share similar functions (storing data/trees/hashes, sorting, reading input, writing output, etc.), and if each program has its own local copy then you can't reuse instruction code.
- Many programs depend on the existence of others and are maintained by separate companies/groups/individuals. By releasing versioned modules we don't have to synchronize releases all the time.
Conclusion
These aren't the only points to consider, but they are the first ones that came to my mind. I'd recommend reading about compilers, linkers and operating systems; that will answer this question, and the other questions I'm sure it has raised, more thoroughly than I can. To recap, DLLs aren't the "best" way of packaging executable programs in all situations and circumstances - they have particular uses, advantages and disadvantages.

Lock Embedded Resources in Cocoa

I'm developing on Cocoa using Xcode. I was wondering whether there is a way to lock the embedded resources of my app - logos, images, sounds, ... - so that nobody can change them?
Probably the easiest way would be to check timestamps on the resources, but this is also easy to circumvent. A better way would be to compute the hash of your application's resources directory on launch, and compare to a known value.
If any of the resources have been modified then the hash will differ and you can show a message and quit. You could use a custom build script step in Xcode to calculate the hash and have it available at compile time so that the process is all automated.
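For reference, here is a rough sketch of that launch-time check in plain C (which Cocoa code can call directly), using CommonCrypto for the hashing. The bundle path, file list and kExpectedDigest below are hypothetical placeholders that you would generate in the custom build-script step mentioned above.

```c
/* Minimal sketch (macOS, CommonCrypto): fold a fixed list of bundle
 * resources into one SHA-256 and compare against a digest baked in at
 * build time.  kResources and kExpectedDigest are placeholders. */
#include <CommonCrypto/CommonDigest.h>
#include <stdio.h>
#include <string.h>

static const char *kResources[] = { "Contents/Resources/logo.png",
                                    "Contents/Resources/startup.wav" };
static const char kExpectedDigest[] =
    "0000000000000000000000000000000000000000000000000000000000000000";

static int resources_look_untouched(const char *bundle_path)
{
    CC_SHA256_CTX ctx;
    unsigned char md[CC_SHA256_DIGEST_LENGTH];
    char hex[2 * CC_SHA256_DIGEST_LENGTH + 1];
    unsigned char buf[4096];
    char path[1024];
    size_t i, n;

    CC_SHA256_Init(&ctx);
    for (i = 0; i < sizeof kResources / sizeof kResources[0]; i++) {
        snprintf(path, sizeof path, "%s/%s", bundle_path, kResources[i]);
        FILE *f = fopen(path, "rb");
        if (f == NULL)
            return 0;                       /* missing file: treat as tampered */
        while ((n = fread(buf, 1, sizeof buf, f)) > 0)
            CC_SHA256_Update(&ctx, buf, (CC_LONG)n);
        fclose(f);
    }
    CC_SHA256_Final(md, &ctx);

    for (i = 0; i < CC_SHA256_DIGEST_LENGTH; i++)
        snprintf(hex + 2 * i, 3, "%02x", (unsigned)md[i]);

    return strcmp(hex, kExpectedDigest) == 0;   /* 1 = matches known value */
}

int main(void)
{
    /* In a real app you would pass the running bundle's own path here. */
    if (!resources_look_untouched("/Applications/MyApp.app"))
        fprintf(stderr, "resources changed - refusing to run\n");
    return 0;
}
```

Keep in mind this only raises the bar: anyone determined enough can patch the executable to skip the comparison itself.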

What's the best approach to incremental compilation when building a DSL using Eclipse?

As suggested by the Eclipse documentation, I have an org.eclipse.core.resources.IncrementalProjectBuilder that compiles each source file and separately I also have a org.eclipse.ui.editors.text.TextEditor that can edit each source file. Each source file is compiled into its own compilation unit, but it can reference types from other (already compiled) source files.
Two tasks for which this is important are:
Compiling (to make sure the types we're using actually exist)
Autocomplete (to look up the type so we can see what properties/methods are present on it)
To accomplish this, I want to store a representation of all the compiled types in memory (referred to below as my "type store").
My question is two fold:
Task one above is performed by the builder and task two by the editor. So that they both have access to this type store, should I create a static store somewhere that they can both access, or does Eclipse provide a neater way to deal with this problem? Note that it is Eclipse, not me, that instantiates the builders and editors when they are needed.
When opening eclipse, I don't want to have to rebuild the whole project just so I can re-populate my type store. My best solution so far is to persist this data somewhere and then repopulate my store from that (perhaps upon project open). Is this how other incremental compilers typically do this? I believe Java's approach is to use a special parser that efficiently extracts this data from the class files.
Any insights would be really appreciated. This is my first DSL.
This is an interesting question and one that doesn't have a simple solution. I'll try to describe a potential solution and also describe in a little bit more detail how JDT accomplishes incremental compilation.
First, a bit about JDT:
Yes, JDT does read class files for some of its information, but only for libraries that don't have source code. And this information is really only used for editing assistance (content assist, navigation, etc).
JDT computes incremental compilation by keeping track of dependencies between compilation units as they are compiled. This state information is stored on disk and retrieved and updated after each compile.
As a more complete example, let's say that after a full build, JDT determines that A.java depends on B.java, which depends on C.java.
If there is a structural change in C.java (a structural change being one that can affect outside files, e.g. adding or removing a non-private field or method), then B.java will be recompiled. A.java will not be recompiled, since there was no structural change in B.java.
After this bit of clarification on how JDT works, here are some possible answers to your questions:
Yes. This must be done through statically accessible global objects. JDT does this through the JavaCore and JavaModelManager objects. If you don't want to use global singletons, then you can make your type store accessible through your plug-in's bundle activator instance. The e4 project does allow dependency injection, which is probably even better (but it is not really a part of the core Eclipse APIs).
I think persisting the information on the file system is your best bet. The only real way to determine incremental compile dependencies is to do a full build, so you need to persist the information somewhere. Again, this is how JDT does it. The information is stored in your workspace's .metadata directory, somewhere under the org.eclipse.core.resources plugin. You can have a look at the org.eclipse.jdt.internal.core.builder.State class to see the implementation.
So, this may not be the answer you are looking for, but I think this is the most promising way to approach your problem.

Software configuration management tool for hundreds of binary files, many are large

Note: I've tried searching; Stack Overflow's search was near useless. I am not sure what kind of tool I need.
At my organization we need to keep track of the software configuration for many types of computers, including the binary installers and automation scripts. Change is infrequent, but the latest version of the configuration is several gigabytes in size.
We are trying to use Mercurial to store changes, but it is just too slow, even without many revisions at all. I ran hg status but killed it after it had run for 10 minutes without finishing.
We are looking for a way to store the current configuration as well as keeping the old configurations around just in case. I have never done anything like this before and do not know what tools are available or even suitable for such tasks. Can someone point me in the right direction or tell me how they are solving this problem? Thanks
Since hard disk space is cheap and being able to view binary differences isn't very helpful, perhaps the best option you have is to store each configuration in a new directory that is indexed somehow. Example below:
/software/configs/2009-03-15
/software/configs/2009-09-28
/software/configs/2009-09-30
Given the size of your files and how infrequently they change, this would allow you to pick a configuration from a given 'tag' without the overhead of revision control.
If you pack your files into a single tar file and generate a SHA-512 hash, then you can be reasonably sure that no one has tampered with your files since they were archived.
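As a sketch of the hashing half (essentially what `shasum -a 512 configs.tar` gives you), here is a small C program assuming OpenSSL 1.1 or later is available; the tar file name is just an example.

```c
/* Sketch: compute the SHA-512 of an archived configuration so it can be
 * recorded alongside the tarball and checked later. */
#include <openssl/evp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "configs-2009-09-30.tar";
    unsigned char md[EVP_MAX_MD_SIZE];
    unsigned char buf[1 << 16];
    unsigned int md_len = 0;
    size_t n;

    FILE *f = fopen(path, "rb");
    if (f == NULL) { perror(path); return 1; }

    EVP_MD_CTX *ctx = EVP_MD_CTX_new();
    EVP_DigestInit_ex(ctx, EVP_sha512(), NULL);
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        EVP_DigestUpdate(ctx, buf, n);          /* stream the file through */
    EVP_DigestFinal_ex(ctx, md, &md_len);
    EVP_MD_CTX_free(ctx);
    fclose(f);

    for (unsigned int i = 0; i < md_len; i++)
        printf("%02x", md[i]);
    printf("  %s\n", path);                      /* same layout as shasum output */
    return 0;
}
```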
While I don't know the specific details of how to implement this strategy in Mercurial, I have been working with git and git-fat. It sets up a general procedure that is likely to be feasible in Mercurial as well. Basically, the idea is that whenever you add a binary file to the repository, under the hood the repo keeps only a lightweight stand-in (a symlink or small pointer file), while the actual data is stored in another location as a checksummed object.
This allows large files to be tracked by the repo without storing the actual data inside it. It does require the data to be stored in some other location (perhaps in a binary management system).
It might take some configuration to do it in Mercurial, but I think it's an elegantly simple solution.
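To make the idea concrete, here is a small hypothetical C sketch of that stand-in mechanism. The digest is assumed to be computed elsewhere, and the store path, pointer style (a plain symlink) and file names are made up for illustration; real tools like git-fat have their own store layouts and integration hooks rather than raw symlinks.

```c
/* Sketch: park the real bytes in a content-addressed store and leave
 * only a tiny stand-in (here, a symlink) where the file used to be. */
#include <stdio.h>
#include <unistd.h>     /* symlink */

/* Move `path` into `store_root/<digest>` and replace it with a symlink.
 * `digest` is assumed to be the file's checksum, computed elsewhere. */
static int stash_large_file(const char *path, const char *digest,
                            const char *store_root)
{
    char stored[4096];
    snprintf(stored, sizeof stored, "%s/%s", store_root, digest);

    if (rename(path, stored) != 0)      /* assumes store is on the same filesystem */
        return -1;
    if (symlink(stored, path) != 0)     /* the repo now tracks only the link */
        return -1;
    return 0;
}

int main(void)
{
    /* Hypothetical usage: the digest would come from hashing the file. */
    if (stash_large_file("installers/setup.iso",
                         "3b0c44298fc1c149afbf4c8996fb92427ae41e46",
                         "/srv/fatstore") != 0)
        perror("stash_large_file");
    return 0;
}
```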