Roslyn slower than CodeDom - vb.net

I am using Roslyn to generate a runtime DLL from a number of VB files on disk.
I first create syntax trees from all those files, then use them to create a VBCompilation, and then emit the assembly. The whole process takes ~10 min. Creating the syntax trees takes up a significant portion of that time.
If I compile the files using CodeDom, it takes ~3 min.
Is there a way to make Roslyn compile files faster?
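For reference, a minimal sketch of the pipeline described above, written against the current Microsoft.CodeAnalysis.VisualBasic API names (the source folder, assembly name, and reference list are placeholders); it parses the trees in parallel, which is one obvious thing to try given that parsing is the slow step:

    Imports System
    Imports System.IO
    Imports System.Linq
    Imports Microsoft.CodeAnalysis
    Imports Microsoft.CodeAnalysis.VisualBasic

    Module RoslynEmitSketch
        Sub Main()
            Dim sourceDir As String = "C:\MyVbSources"   ' placeholder folder of .vb files
            Dim files As String() = Directory.GetFiles(sourceDir, "*.vb")

            ' Parse all files in parallel instead of one after another.
            Dim trees As SyntaxTree() = files.
                AsParallel().
                Select(Function(f) VisualBasicSyntaxTree.ParseText(File.ReadAllText(f), path:=f)).
                ToArray()

            ' One fixed set of metadata references (mscorlib shown as an example).
            Dim references As MetadataReference() = {
                MetadataReference.CreateFromFile(GetType(Object).Assembly.Location)
            }

            Dim compilation As VisualBasicCompilation = VisualBasicCompilation.Create(
                "MyRuntimeAssembly",
                syntaxTrees:=trees,
                references:=references,
                options:=New VisualBasicCompilationOptions(OutputKind.DynamicallyLinkedLibrary))

            Using peStream As New FileStream("MyRuntimeAssembly.dll", FileMode.Create)
                Dim result = compilation.Emit(peStream)
                Console.WriteLine("Emit succeeded: " & result.Success)
            End Using
        End Sub
    End Module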

Related

Fortran .mod files becoming too large

In our Fortran project we are making extensive use of modules, since we are doing object-oriented programming in Fortran 2003. We now have a hierarchy of objects with approximately 7 different levels. The problem for us is that the modules and files at the high levels take quite a long time to compile, even if they consist of only a few lines.
Looking at the created files, I think the probable cause of this is the very large .mod files being written. Exploring these files (for gfortran 4.6, for instance) I can see that the complete interfaces for all the objects and modules used recursively are included in each high-level .mod file. These files are much smaller in gfortran 4.9, where they are stored in compressed format, but they are still quite large and slow down compile times.
We have tried using private statements in each module (so that the underlying used modules cannot be seen just by using the current module) but this doesn't affect the size of the resulting .mod files.
Is there any programming practice or compiler directive which can solve this issue?

System.IO.Directory.GetFiles VS My.Computer.FileSystem.GetFiles

I wanted to get the names of all the files in a specific directory which includes over 25,000 files. I tried using these methods:
System.IO.Directory.GetFiles and My.Computer.FileSystem.GetFiles
I've found out that System.IO is significantly faster than My.Computer.
By significantly, I mean around 20 seconds faster.
Could anyone explain to me the difference between these two methods?
If I remember this correctly, System.IO.Directory.GetFiles uses an indexed file method that is stored in the registry, while My.Computer builds the index as it iterates through the directory, so the second time you use the same My.Computer method it should (if I remember right) be faster.
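To measure the difference yourself, here is a rough timing sketch comparing the two calls on the same folder (the path is a placeholder). Note that My.Computer.FileSystem.GetFiles returns a ReadOnlyCollection(Of String), so it does some extra work on top of the raw directory enumeration:

    Imports System
    Imports System.Diagnostics
    Imports System.IO

    Module GetFilesTiming
        Sub Main()
            Dim folder As String = "C:\BigFolder"   ' placeholder: your 25,000-file directory

            Dim sw As Stopwatch = Stopwatch.StartNew()
            Dim ioFiles As String() = Directory.GetFiles(folder)
            sw.Stop()
            Console.WriteLine("System.IO.Directory.GetFiles:    " & ioFiles.Length &
                              " files in " & sw.ElapsedMilliseconds & " ms")

            sw = Stopwatch.StartNew()
            Dim myFiles = My.Computer.FileSystem.GetFiles(folder)
            sw.Stop()
            Console.WriteLine("My.Computer.FileSystem.GetFiles: " & myFiles.Count &
                              " files in " & sw.ElapsedMilliseconds & " ms")
        End Sub
    End Module

If you only need to loop over the names once, System.IO.Directory.EnumerateFiles (available from .NET 4.0) avoids building the whole array up front.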

What form is DLL & what makes it processor dependent

I know a DLL contains one or more exported functions that are compiled, linked, and stored separately.
My question is not about how to create one, but about the form in which it is stored. Is it going to be in the form of 0's & 1's, or in assembly commands like ADD, MUL, DIV, MOV, CALL, RETURN, etc.?
Also, what makes it processor dependent (like the x86, x87, or IBM 700 instruction sets)?
Can someone please explain this briefly?
First of all, everything in a computer is in the form of "0's & 1's". The fact that the computer can display some of these as text, pictures, sounds, 3D models, etc. is just a matter of how you interpret them. But down there, at the metal, it's all just "0's & 1's" (also known as bits). Note though that they are always grouped together in groups of 8, and these are called "bytes". That's really for the sake of efficiency, because operating with every bit individually would be too tedious. Actually, today's computers don't even operate on single bytes anymore (or rather, they do it very rarely). Mostly you operate on 4 or 8 bytes at a time, depending on whether you have a 32-bit or 64-bit CPU (that's in layman's terms; it's actually a bit more complicated than that).
As for a .DLL file - like an .EXE file, it contains bytes that describe instructions that a CPU can execute. The CPU takes these bytes directly from the .DLL/.EXE and executes them without any further modifications. That's why these files are CPU-specific. In different CPU architectures the same combination of bytes means different things, so a .DLL/.EXE will run correctly only on the CPU for which it was designed. On other CPUs these bytes will mean some other instructions, and when run, the program will most likely do some utter nonsense and crash immediately.
The assembly commands you mentioned also deserve an explanation. "Assembler" is not a language that a CPU can understand; it's a language a human can understand. It was created because writing directly in machine code (the bytes that the CPU actually understands) is very difficult: viewed as text it's utter gibberish on the screen (try opening some .EXE file in Notepad!), yet every bit has to be precisely set for it to work.
So assembly language is basically the same thing, except the instructions are written in text that humans can read. For every machine instruction that a CPU can understand, there is an instruction with a human-friendly name. An assembly compiler simply reads these instructions and replaces them with the bytes that represent the actual instructions for the CPU to execute. It's a 1:1 operation. Every command in assembly language matches a single machine instruction (again, in layman's terms).
So you see, there isn't even a single assembly language. Every CPU architecture has its own assembly language, because they each have different instructions.
Note though that all this applies to native .DLL/.EXE files. .NET files are different - they don't contain machine code, but rather instructions for an abstract, nonexistent CPU. It's like Java bytecodes. When a .NET .DLL/.EXE is run, the .NET runtime translates it from the abstract instructions to the instructions that the specific CPU can understand. They use a lot of tricks to make this very fast, so these files run almost as fast as simple .DLL/.EXE files.
Does this clear things up? :)
Native DLLs (not .NET assemblies) usually contain machine code that can only be run on a certain platform. The machine code is a sequence of bytes that the processor treats as instructions (ADD, MOV, etc.).
In Windows, DLLs are stored in the PE format, which is basically a collection of sections that hold the information about how to map the file into memory. Some sections contain the program's code (which is of course processor dependent), others contain the program's data, others the exported and imported functions, and so on.
Managed code is compiled to an intermediate language that is JITed by the runtime as it is executed. Therefore, your DLL won't contain any processor-dependent code, and you'll be able to execute your program on any platform with the relevant runtime.
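To make the "processor dependent" part concrete, here is a small VB.NET sketch that reads the Machine field out of a DLL's PE header to see which CPU architecture its code targets (the path is just an example):

    Imports System
    Imports System.IO

    Module PeMachineSketch
        Sub Main()
            Dim path As String = "C:\Windows\System32\user32.dll"   ' example DLL

            Using br As New BinaryReader(File.OpenRead(path))
                br.BaseStream.Seek(&H3C, SeekOrigin.Begin)     ' e_lfanew: offset of the PE header
                Dim peOffset As Integer = br.ReadInt32()

                br.BaseStream.Seek(peOffset, SeekOrigin.Begin)
                Dim signature As UInteger = br.ReadUInt32()    ' should be &H4550 ("PE\0\0")
                If signature <> &H4550UI Then
                    Console.WriteLine("Not a PE file")
                    Return
                End If

                Dim machine As UShort = br.ReadUInt16()        ' IMAGE_FILE_HEADER.Machine

                Select Case machine
                    Case &H14C
                        Console.WriteLine("x86 (32-bit) code")
                    Case &H8664
                        Console.WriteLine("x64 (AMD64) code")
                    Case &HAA64
                        Console.WriteLine("ARM64 code")
                    Case Else
                        Console.WriteLine("Other machine type: &H" & machine.ToString("X"))
                End Select
            End Using
        End Sub
    End Module

This is the same field that dumpbin /headers reports as the "machine" value.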
It depends on your DLL. Generally, a DLL contains executable code just as an EXE file does. Those code DLLs are processor dependent, since the code can only be executed on a specific platform. The code is stored using the same "format" as in an EXE file (binary machine code).
However, a DLL can sometimes contain only data: it is then called a "resource DLL" and is not processor dependent at all. It acts as a container for data files used by applications.
Note that many DLLs are hybrids: they contain both code and resources. For example, most of the DLLs which comprise the user part of the Windows operating system are hybrids: you can open them with Visual Studio or a resource explorer to see the resources (the data segments) they contain, or open them with Dependency Walker or dumpbin to see the functions (the code segments) they contain.
(Of course, this answer is really Windows specific; I don't know about .so files, which are the Linux equivalent of a DLL.)
Both a DLL and an EXE contain executable code.
In the case of a DLL, it doesn't have the necessary parts to be directly executable. It must be called from another piece of executable code. One DLL can call another, but all must ultimately be called from an EXE.
So the rules about what's compatible with what processor that apply to EXEs also apply to DLLs.

DLL and LIB files - what and why?

I know very little about DLL's and LIB's other than that they contain vital code required for a program to run properly - libraries. But why do compilers generate them at all? Wouldn't it be easier to just include all the code in a single executable? And what's the difference between DLL's and LIB's?
There are static libraries (LIB) and dynamic libraries (DLL) - but note that .LIB files can be either static libraries (containing object files) or import libraries (containing symbols to allow the linker to link to a DLL).
Libraries are used because you may have code that you want to use in many programs. For example if you write a function that counts the number of characters in a string, that function will be useful in lots of programs. Once you get that function working correctly you don't want to have to recompile the code every time you use it, so you put the executable code for that function in a library, and the linker can extract and insert the compiled code into your program. Static libraries are sometimes called 'archives' for this reason.
Dynamic libraries take this one step further. It seems wasteful to have multiple copies of the library functions taking up space in each of the programs. Why can't they all share one copy of the function? This is what dynamic libraries are for. Rather than building the library code into your program when it is compiled, it can be run by mapping it into your program as it is loaded into memory. Multiple programs running at the same time that use the same functions can all share one copy, saving memory. In fact, you can load dynamic libraries only as needed, depending on the path through your code. No point in having the printer routines taking up memory if you aren't doing any printing. On the other hand, this means you have to have a copy of the dynamic library installed on every machine your program runs on. This creates its own set of problems.
As an example, almost every program written in 'C' will need functions from a library called the 'C runtime library', though few programs will need all of its functions. The C runtime comes in both static and dynamic versions, so you can choose which version your program uses depending on your particular needs.
Another aspect is security (obfuscation). Once a piece of code is extracted from the main application and put in a "separate" dynamic-link library, it is easier to attack and analyse (reverse-engineer) that code, since it has been isolated. When the same piece of code is kept in a LIB library, it is part of the compiled (linked) target application, and it is thus harder to isolate (differentiate) that piece of code from the rest of the target binaries.
One important reason for creating a DLL/LIB rather than just compiling the code into an executable is reuse and relocation. The average Java or .NET application (for example) will most likely use several 3rd party (or framework) libraries. It is much easier and faster to just compile against a pre-built library, rather than having to compile all of the 3rd party code into your application. Compiling your code into libraries also encourages good design practices, e.g. designing your classes to be used in different types of applications.
A DLL is a library of functions that are shared among other executable programs. Just look in your Windows\System32 directory and you will find dozens of them. When you build a DLL, the build normally also creates a .lib file so that the application's *.exe program can resolve symbols that are declared in the DLL.
A .lib is a library of functions that are statically linked into a program -- they are NOT shared by other programs. Each program that links with a *.lib file has all the code from that file. If you have two programs A.exe and B.exe that link with C.lib, then A and B will both contain the code from C.lib.
How you create DLLs and libs depends on the compiler you use. Each compiler does it differently.
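As a small illustration of the dynamic case from managed code, the VB.NET sketch below calls a function exported by user32.dll; the DLL is mapped into the process when it is loaded, and its code is shared with every other process that uses it:

    Imports System
    Imports System.Runtime.InteropServices

    Module DynamicLinkSketch
        ' MessageBeep is one of the functions exported by user32.dll.
        ' The body stays empty; the runtime resolves the call against the DLL.
        <DllImport("user32.dll")>
        Private Function MessageBeep(ByVal uType As UInteger) As Boolean
        End Function

        Sub Main()
            MessageBeep(0)   ' plays the default system sound
        End Sub
    End Module

For native callers, the import .lib mentioned above plays the equivalent role at link time: it tells the linker which symbols live in which DLL.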
One other difference lies in performance.
As a DLL is loaded at runtime by the .exe(s), the .exe(s) and the DLL work on a shared-memory basis, and hence the performance is lower relative to static linking.
On the other hand, a .lib is code that is linked statically at compile time into every process that requests it. Hence each .exe has its own copy of that code in memory, thus increasing the performance of the process.

Compiling Massive VB.NET Project

Compiling my current project, one EXE with ~90,000 LOC plus ~100 DLLs, takes about half an hour or more depending on the speed of the workstation.
The build process is one of running devenv from Powershell scripts. This works very well with no problems.
The problem is that it is slow. I want to speed up this build process.
MSBuild (using VS 2005) is one option, but there's a bug with specifying icons to the VB compiler/linker on the command line such that it won't link successfully.
What other options are there to "make" VB.NET programs?
(Faster workstation is not an option.)
Do you absolutely have to compile the whole solution every time? With that many assemblies it seems unlikely that they all need to be built unless they actually change. If your solution is made up of multiple projects, you might consider creating multiple solutions in your build environment. One master solution could contain all the projects, another that includes the ones that change most often. You can then configure your build process to focus on the projects that have changed. Depending on the source control system you use, you may be able to query the system to determine which projects have changed since the last build, and only build those projects.
There's NAnt, and Cruisecontrol.NET for continuous build.
You mentioned that getting a faster PC is not an option, but how much memory do you have? 2GB should be the minimum for a developer machine. Also, using a fast 10K RPM hard disk makes a big difference.
Have you tried disabling any virus scanner during your build?
If you can, upgrade to the 3.5 version of MSBuild. It can build solution files and supports multiprocessor builds (also available if you host the MSBuild engine yourself), enabling it to build projects in parallel.
The caveat is that you need to be using project references so it knows what to build.
Also, how long is it taking now? Have you looked at the CPU/Memory Usage (using something like PerfMon) to see if it is a bottleneck?
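To follow up on the MSBuild suggestion: since the build already runs from PowerShell scripts, switching that step from devenv to MSBuild with the /m (multiprocessor) switch is mostly a matter of changing the command that gets launched. A hypothetical VB.NET driver for that step might look like this (the MSBuild path, solution name, and configuration are placeholders):

    Imports System
    Imports System.Diagnostics

    Module ParallelBuildSketch
        Sub Main()
            ' Placeholder paths; MSBuild 3.5 ships with the .NET Framework 3.5.
            Dim msbuildExe As String = "C:\Windows\Microsoft.NET\Framework\v3.5\MSBuild.exe"
            Dim arguments As String = "MySolution.sln /m /p:Configuration=Release"

            Dim psi As New ProcessStartInfo(msbuildExe, arguments)
            psi.UseShellExecute = False

            Using proc As Process = Process.Start(psi)
                proc.WaitForExit()
                Console.WriteLine("MSBuild exited with code " & proc.ExitCode)
            End Using
        End Sub
    End Module

The /m switch only parallelizes across projects, which is why the project-reference caveat matters: MSBuild needs the reference graph to know what it can safely build concurrently.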
There's not much you can do to make the build process any faster short of adding more cores, CPU power, and memory to your machine, but that isn't an option in your case.
Most large projects are not self-contained in a single EXE. More often, logical units are moved into separate assemblies, which can be either DLLs or EXEs. The end result is a whole bunch of little assemblies instead of one enormous one.
To cite one example, one project that I worked on was enormous, consisting of 700+ forms and tens of thousands of classes. Functionally related forms, such as those related to printing, report generation, user interrogation, etc., were self-contained in their own EXEs. If I was working on the reports, I'd exclude all projects not related to reports from the build process, and this helped bring the compilation time down from half an hour to a few seconds.
This programming style can be tricky, but when it's done right, it simply works, and works flawlessly.
If you have a big number of projects, you should try to reduce it. You can always split things up into DLLs later. The fewer projects you have, the faster the build, especially if the projects have to be built in a certain order.
Breaking them up into smaller solutions is also an option.