When running srb init on a large Rails application, the process uses a lot of memory (10GB+) and takes a long time (upwards of 10 or 15 minutes) to complete. Is it possible to update hidden definitions for a single file or sub-directory in order to speed up this process?
I am especially thinking of the case where a new gem or a file change requires an update to the hidden definitions, but I don't want to re-initialize the entire project.
Computing hidden-definitions.rbi is necessarily a whole-program operation. The algorithm is:
load all code in your project, including gems
run sorbet over all code in your project, including RBIs that were already created for gems
output an RBI containing the diff of the previous two steps.
So fundamentally, hidden-definitions.rbi must be computed for an entire project.
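For reference, and assuming a reasonably recent version of the sorbet gem (the subcommand name below may differ between releases), the hidden-definitions step can be re-run on its own without repeating all of srb init. It still loads and type-checks the whole project, per the above, but it skips regenerating the per-gem RBIs:

    # Re-run only the hidden-definitions step; still whole-project, as described above.
    bundle exec srb rbi hidden-definitions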
Related
I've been using AWS SAM without much issue for the last 6 months, with about 40 functions split across 4 projects.
Just today I was cleaning up some hard disk space and noticed that my codebase folder for one project was 4.5 GB, and I realized it was the .aws-sam folder. Looking further, I could see a build folder inside, with each function's directory containing a full copy of every dependency. I am wondering if I am doing something wrong, or if not, why this is necessary, considering I don't use this folder at all when I'm building locally.
I usually find hidden subfolders in working directories which, I suppose, were produced by the Perl 6 compiler, e.g.:
.precomp/0717742595706FA8D59800F9F9F7074236546DE7.1505852292.23535/0B/0BDF8C54D33921FEA066491D8D13C96A7CB144B9.repo-id
So, I have two questions:
Is it normal?
Is it indispensable for the compiler, or is there a way to avoid it?
The .precomp folder houses the precompiled form of Perl 6 modules.
The first time you use a module it gets compiled and stored in .precomp so that it doesn't have to be compiled again. (Currently only modules are precompiled, not programs.)
You can delete the directory and your code will continue to function; it will just be slower. Note that it will be recreated the next time you use a module, unless the directory can't be written to. I occasionally delete it myself, though that is because I regularly rebuild Rakudo from git; I do it just to clean out the remnants of older installs.
The long, seemingly arbitrary directory names are due to the fact that multiple versions of a module, from multiple authors, may be installed at once, and that module names may contain Unicode. There has been talk of using another system that would give the files and directories more reasonable names; it just hasn't happened yet.
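For example, reclaiming the space is just a matter of deleting the directory from the project root; a minimal sketch, assuming a Unix-like shell:

    # The compiler rebuilds .precomp lazily the next time a module is loaded
    # (provided the directory is writable), so the only cost is a slower first run.
    rm -rf .precomp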
Excerpt from Microsoft's "What is a DLL?":
"By using a DLL, a program can be modularized into separate
components. For example, an accounting program may be sold by module.
Each module can be loaded into the main program at run time if that
module is installed. Because the modules are separate, the load time
of the program is faster, and a module is only loaded when that
functionality is requested. Additionally, updates are easier to apply
to each module without affecting other parts of the program. For
example, you may have a payroll program, and the tax rates change each
year. When these changes are isolated to a DLL, you can apply an
update without needing to build or install the whole program again."
Ref: http://support.microsoft.com/kb/815065
DLLs are:
loaded at runtime
able to be "dynamically loaded" by multiple programs at the same time
- which allows saving of resources
- lowers disk space requirements
But why do they promote "modularizing" programs? What would happen if there weren't .dll files? Could someone provide or expand on the example?
Modular programs provide a way of making a particular functionality available to many programs without having to include the same code in all of them. Also, they allow greater compatibility between programs since they would essentially use the same methods in common DLLs to obtain the same results.
One would write a program in a modular fashion such that different parts of the program could be maintained separately. Say you had some clever way of reading and writing your own data format to files. Say you make improvements to that technique. If the code for reading and writing the files lived in a DLL, you would only need to update the DLL. The program itself would remain unchanged.
If you have one monolithic EXE, you have to
pay for all the extra time relinking it, even if 1 source file changed (this is painful if it's > 80 MB, as is the case in large projects),
ship the entire EXE, when you could instead ship a single DLL that is a fraction of the size (for patches/updates).
Breaking it up into DLLs, you
have pluggability: The EXE is the host application and others can write DLLs that "plug into" the host via a well-defined interface. DLLs can be interchanged as long as they conform to the interface.
can share code across other DLLs and EXEs.
can have some DLLs be optionally loaded on demand, only if they're used, and unloaded when they're not needed (see the sketch after this list)
similar to above, have optional functionality. With a single EXE you have to download everything, even if some components are rarely used. With DLLs, you could have a system that downloads and installs features as needed.
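To make the "loaded on demand" point concrete, here is a minimal sketch of runtime loading on Windows via the Win32 API; report.dll and its exported GenerateReport function are hypothetical names used purely for illustration:

    /* Minimal sketch of on-demand module loading on Windows.
     * "report.dll" and "GenerateReport" are made-up names for illustration. */
    #include <windows.h>
    #include <stdio.h>

    typedef int (*GenerateReportFn)(const char *outputPath);

    int main(void)
    {
        /* Nothing from report.dll occupies memory until this call. */
        HMODULE lib = LoadLibraryA("report.dll");
        if (lib == NULL) {
            printf("Reporting module not installed; feature unavailable.\n");
            return 0;
        }

        GenerateReportFn generate =
            (GenerateReportFn)GetProcAddress(lib, "GenerateReport");
        if (generate != NULL) {
            generate("monthly.pdf");
        }

        /* Unload the module once the feature is no longer needed. */
        FreeLibrary(lib);
        return 0;
    }

The host EXE only needs to agree with the DLL on that exported function's name and signature, which is what the "well-defined interface" point above amounts to in practice.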
The biggest advantage of DLLs is probably during development of the original program. Without DLLs you wouldn't be able to integrate with existing libraries without including the original source code. By including an existing library as a DLL you don't need the source, since it's all encapsulated in the DLL. It would be a nightmare to develop in frameworks like .NET without DLLs, since you constantly reference other libraries...
The alternative to breaking your program down into n > 1 pieces is to keep it in n == 1 piece. Why is this bad? Well, it isn't always bad (maybe the BIOS is a good example?). But for user programs it usually is. Why? First we need to define what a program is.
What is a program?
A simple "program", roughly speaking, consists of an entry point (i.e. offset to the main function), functions and global variables. A function consists of instructions and information about what local variables are needed to run the function. To be executed a program must be loaded in primary memory/RAM (the aforementioned information). Because our program has functions (and not just jump statements), that implies the existence of a stack, which implies the existence of a containing environment managing the stack. (I suppose you could have a program that manages its own stack but I'd argue then your program is not a program anymore but an environment.) This environment contains the program, starts in the entry point and executes each instruction, be it "go to this part of the RAM and add it to whatever is in this register" or "If this register is all 0 then jump ahead this many instructions and resume execution there" indefinitely or until the program gives control back to its environment. (This is somewhat simplified - context switches in multi-process environments, illegal memory access, illegal instructions, etc. can also cause control to be taken from the program.)
Anyway, so we have two options: either load the entire program at once or have it stored and loaded in pieces.
n == 1
There are some advantages to doing it all at once:
Once the program is in memory no disk access is required to execute further (unless the program explicitly asks for it).
Since the program is compiled/linked before execution begins you can do everything without any sort of string names/comparisons - go directly to the address (or an offset).
Functions are never out of sync with one another.
n > 1
The monolithic approach has some disadvantages, though, which mirror its advantages:
Most programs don't execute all code paths most of the time. I think there are studies showing that in most programs, most of the execution time is spent in a small fraction of the instructions in the program. In other words, something like 20% of the program is executed 80% of the time (I just made that particular figure up - but you get the idea). If we divide our program up enough and only load instruction sets (i.e. functions) as they are needed, then we won't waste time loading the 80% we'll never use in this execution of the program. Along these lines, we can ultimately fit more concurrently executing programs in our RAM at once if we only load the fraction of each program we need.
Most programs share similar functions (i.e. storing data/trees/hashes/sorting/etc., reading input, writing output, etc.) and if each program has its own local copy then you can't reuse instruction code.
Many programs depend on the existence of others and are maintained by separate companies/groups/individuals. By releasing versioned modules we don't have to synchronize releases all the time.
Conclusion
These aren't the only points to consider, but they're the first ones that came to my mind. I'd recommend reading about compilers, linkers and operating systems; that will answer this question, and the other questions I'm sure it has raised, more thoroughly than I can. To recap, DLLs aren't the "best" way of packaging executable programs in all situations and circumstances - they have particular uses, advantages and disadvantages.
I am trying to set up a central symbol server for my organization and its various products. Each product has a nightly build, as well as "one-off" beta, RC, and release builds.
The goal I have is to keep about a month's worth of nightly build symbols, as we do a lot of "dogfooding" here, so people use internal builds, and we'd like to easily debug files we get from our internal WinQual when possible.
I also need to be able to permanently keep all beta, RC, and release build symbols.
After doing much research, I think the best approach here is to have two symbol servers: one for the nightly builds (which have the previous ~30 builds registered), and another to permanently store the beta, RC, and release symbols. I would have the build scripts add to the symbol store using the product and version tags to record the product and build number. After a successful build, a script would use history.txt from the symbol server to identify the oldest build not deleted, then delete it from the symstore.
In the case of the "one off" builds for betas, RCs, and release versions, they would be identified by a build & install person once they're created, and added to the 2nd symbol server (for permanent storage) as well.
So I have a few questions: Does this seem at all reasonable? There must be an easier way to do this - won't most organizations with a symbol server need to tackle this problem?
Secondly, if I am to go ahead with this approach, is there a fool-proof way to identify the oldest known symbol set registered with the server? I'd thought about using last-modified dates; history.txt seems most appropriate, but a script parsing it may be error-prone. I was hoping it'd be possible to just add a symbol set with product & version info, and likewise delete one by product & version info.
Thanks in advance for any help. I'll gladly answer any questions anyone may have, or provide any clarifications.
I think two separate symbol stores are indeed going to be your best bet. For managing the store for your nightly builds, I'd recommend taking a look at AgeStore: http://msdn.microsoft.com/en-us/library/ff560046(v=vs.85).aspx
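As a rough sketch of how the two pieces fit together (store paths, product name and version number are made up, and the flags are from memory of the Debugging Tools documentation, so verify them against your version):

    rem Nightly build: register symbols, tagged with the product and build number.
    symstore add /f .\build\*.pdb /s \\symbols\nightly /t MyProduct /v 1.0.42.0 /c "nightly build"

    rem Prune files not accessed in ~30 days from the nightly store.
    rem -l only lists what would be deleted; drop it to actually prune.
    agestore \\symbols\nightly -days=30 -s -y -l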
I wrote a widget in Wise Script that runs on every build and does the following.
Assuming our versioning is 1.0.1.0
1.) 1.0.1.0 through 1.0.6.0 symbols are added to the store (5 runs)
2.) every time there's a build, the symbol store history file is parsed...
3.) when 1.0.7.0 is built and the symbols are added, 1.0.1.0 symbols are deleted from the symbol store.
I'm basically parsing out the version number, and if the third place is more than 5 less than the current one, I parse the transaction ID and run symstore.exe del /i %TRANS_ID%
This prunes my symbols for daily/CI builds to only the last 5 build symbols.
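The deletion step itself is one command per expired transaction; for example (the transaction ID and store path here are hypothetical, with the ID taken from the ID column of the store's history.txt):

    rem Delete the whole transaction for the expired build from the store.
    symstore del /i 0000000042 /s \\symbols\nightly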
For any notable symbols, such as a release, hotfix, or patch, I simply change the product name and add the symbols manually; that way, I am only pruning the dailies.
If you'd like, I could cut/paste the code in here as SMS Installer code (same as Wise). I also wrote a similar widget that keeps my local archive 5 builds deep, so I'm not wasting space in my local archive for CI builds. I use both, but you could use either. They both use a simple .INI file for runtime navigation; that way, I can put them both in my Jenkins/Jobs/ folder and simply edit the .INI file for each. They're very lightweight, as the .EXE is only 161K for the archive_prunerator and 161K for the symstore_widget, plus a 4- or 5-line .INI file.
AJ
Compiling my current project (one EXE with ~90,000 LOC plus ~100 DLLs) takes about a half hour or more, depending on the speed of the workstation.
The build process consists of running devenv from PowerShell scripts. This works very well, with no problems.
The problem is that it is slow. I want to speed up this build process.
MSBuild (with VS 2005) is one option, but there's a bug when specifying icons to the VB compiler/linker on the command line, such that it won't successfully link.
What other options are there to "make" VB.NET programs?
(Faster workstation is not an option.)
Do you absolutely have to compile the whole solution every time? With that many assemblies it seems unlikely that they all need to be built unless they actually change. If your solution is made up of multiple projects, you might consider creating multiple solutions in your build environment: one master solution containing all the projects, and another that includes only the ones that change most often. You can then configure your build process to focus on the projects that have changed. Depending on the source control system you use, you may be able to query the system to determine which projects have changed since the last build, and only build those projects.
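Since the build already runs devenv from scripts, one low-effort version of this is to build only a project that changed rather than the whole solution; a sketch, with hypothetical solution and project names:

    # Build a single changed project instead of the entire solution.
    devenv MySolution.sln /build Release /project Accounting.vbproj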
There's NAnt, and CruiseControl.NET for continuous builds.
You mentioned that getting a faster PC is not an option, but how much memory do you have? 2GB should be the minimum for a developer machine. Also, using a fast 10K RPM hard disk makes a big difference.
Have you tried disabling any virus scanner during your build?
If you can, upgrade to the 3.5 version of MSBuild. It can build solution files and supports multiprocessor builds (this is also available if you host MSBuild yourself), enabling it to build projects in parallel.
The caveat is that you need to be using project references so it knows what to build.
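For example, a typical invocation under MSBuild 3.5 or later (the solution name and configuration are placeholders):

    # /m (maxcpucount) lets MSBuild schedule independent projects on separate cores.
    msbuild MySolution.sln /m /p:Configuration=Release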
Also, how long is it taking now? Have you looked at the CPU/Memory Usage (using something like PerfMon) to see if it is a bottleneck?
There's not much you can do to make the build process any faster short of adding more cores, CPU power, and memory to your machine, but that isn't an option in your case.
Most large projects are not self-contained in a single EXE. More often, logical units are moved into separate assemblies, each of which can be either a DLL or an EXE. The end result is a whole bunch of little assemblies, instead of one enormous one.
To cite one example, one project that I worked on was enormous, consisting of 700+ forms and tens of thousands of classes. Functionally related forms, such as those related to printing, report generation, user interrogation, etc., were self-contained in their own EXEs. If I was working on the reports, I'd exclude all projects not related to reports from the build process, which helped bring the compilation time down from a half hour to a few seconds.
This programming style can be tricky, but when it's done right, it simply works and works flawlessly.
If you have a large number of projects, you should try to reduce it; you can always split them up into DLLs later. The fewer projects, the faster the build, especially if it has to build them in a certain order.
Breaking them up into smaller solutions is also an option.