Visual Studio & compile performance issues for a large number of files - vb.net

Our current solutions/projects have several classes combined into one file; I'm told this was done because of slow compile times in VS.
Is this a confirmed problem and solution?
Can we break these apart now that we are using VS2008 Team System? Has anyone else separated the classes into different files and still had good performance?

I work on the VB.Net IDE team and I can tell you that putting everything in 1 file will make VS run slower, not faster. VB.Net works just fine with classes in different files.
The only time this would ever make a difference is if you had an unbelievably slow hard drive and files scattered across very different parts of the physical disk (resulting in more, and longer, seek operations). In general this shouldn't be a problem, and for the VB.Net IDE it would only matter during initial startup; we have several layers of caching that help eliminate even these types of problems.
You may be able to discover some minimal benefit to this approach if you only consider the raw time it takes the command-line compiler to operate. IMHO, the more important numbers are the responsiveness of Visual Studio and the relative build time for Visual Studio. VS responsiveness will decrease if you have extremely long files (which is what will eventually happen if you put every class into a single file).

What is your hardware like?
The big bottleneck with VS is that it needs to read and write loads of little files.
A fast hard drive or two can improve performance loads!

I'm using C# rather than VB.NET, but I have never encountered compiler performance problems. It seems that this is some kind of premature optimization. I don't care how long my build server needs to build the application. Don't sacrifice clarity for build-time performance.

Sounds like you might want to try to persuade your team to look into the solution design; it might not be optimal to have one giant solution with all projects in there. If build times are indeed a problem, I would rather solve it by breaking the solution down to logical parts, if possible.

I do not believe that grouping many classes into a single file will significantly improve build performance for normal projects. It should be safe to break them out into more files (one-per-class is standard).
The one exception I've run into is "Web Site" projects. I don't have any scientific data, but these do seem to get slower as the number of files increases. To fix that you can convert it to a "Web Application" project.
Having too many projects in a solution is also known to make VS slow, but that doesn't sound like your problem.

This is a confirmed problem and a confirmed solution. At Tech Ed this year they said that it will no longer be a problem in VS2010, which you could download a beta of and try out.

Automation & piping of diverse tasks

I am looking for recommendations for a very generic automation/task-execution tool. The scope is somewhere between a script, a build system like make, and orchestration tools like Ansible or Puppet. The best I can do is describe my rather vague 'requirements' and hope for clues about how others have solved these problems. Sorry for the long description; I guess I don't really know what exactly I want the solution to do. I profit from programming answers on SO all the time, but I am not entirely sure if my open-ended question is acceptable here.
--
We work as data analysts/system validators in a corporate setting. We perform a range of diverse tasks and interact with lots of ever-changing systems. Each little step we do is arguably mundane/easy, but the bigger picture only forms if lots of iterations with slightly different inputs or combinations are repeated. It is a bit like looking for a needle in a haystack, except the concrete problem is slightly different every time. This makes it hard to use a normal script or automation tool, which requires more structure to work. But doing things semi-manually without a big team does not allow us to cover all the analysis/cases we want/need.
To give an applied example: a typical task could involve setting up a big calculation in a vendor system, extracting its ASCII output from a web server and parsing it. Then we would suck raw input data from a set of configuration files and databases. This is piped into some of our home-grown replication tools/models living in C++. Then both the system's results and our replication are scanned for interesting outliers (e.g. regression-tested), and only this subset is uploaded for human analysts to investigate, nicely presented in an Excel sheet.
We can do all these things easily by hand as a one-off, or maybe using ad-hoc tools/scripts. We just can't do it repeatedly for ever-so-slightly different settings. We seem to need a library of 'common tasks' that are just specialized by a few inputs (e.g. a task to download a time series and scan for outliers; parameters would be DB access/login and maybe parameters defining what an outlier is in that context). And then I need to chain these tasks together to make complex tasks repeatable and simple to build up from atomic steps.
I have not found anything that really does something like this. There seems to be specialist scripting or tooling available for each niche, but not something combining all the different tasks I need to perform.
I have been toying on and off with a minimalist SQLite database which controls a set of Python 'scripts'/wrappers. These scripts take input parameters from the database, and they are chained/piped based on the database. The scripts write their results back to the database, mostly as plain text and floats/ints. This kind of DB interface is very error-prone and complicated for humans; the idea is to have (template) scripts writing (concrete/parametrised) scripts to the DB for execution, like the system rolling itself out before executing. Not sure if this is a smart idea, but the DB drives the scripts, without much interaction among these building-block scripts, rather than the conventional bunch of scripts calling each other and dumping some data into a DB as an afterthought. So far we have lots of separate wrappers (scripts) to talk to all the systems and do the work; what is really missing is something tying it all together and controlling it.
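To make the idea concrete, here is a stripped-down sketch of the kind of driver I mean (the table schema and the run(params, upstream) wrapper convention are just illustrative, not what we actually have):

    import importlib
    import sqlite3

    def run_pending(db_path="tasks.db"):
        con = sqlite3.connect(db_path)
        con.row_factory = sqlite3.Row
        # Snapshot pending work up front so rows can be updated as we go.
        tasks = con.execute(
            "SELECT id, module, params, depends_on FROM tasks "
            "WHERE status = 'pending' ORDER BY id"
        ).fetchall()
        for task in tasks:
            upstream = None
            if task["depends_on"] is not None:
                # Chaining: feed the upstream task's stored result into this one.
                row = con.execute(
                    "SELECT result FROM tasks WHERE id = ?", (task["depends_on"],)
                ).fetchone()
                upstream = row["result"]
            # Each wrapper is an ordinary module exposing run(params, upstream).
            wrapper = importlib.import_module(task["module"])
            result = wrapper.run(task["params"], upstream)
            con.execute(
                "UPDATE tasks SET status = 'done', result = ? WHERE id = ?",
                (str(result), task["id"]),
            )
            con.commit()
        con.close()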
I am interested (obviously) more in data/flow transparency, repeatability, and chaining mini-programs together into bigger units than in speed or scaling to larger data sets. All the heavier lifting is either done in the systems we interact with or delegated to C++ called from these Python scripts. This is not a production system with stability requirements and fixed goals, but rather a flexible analysis/investigation helper.
I really hope someone here has previously run into exactly this problem severely limiting their productivity, so we can just piggyback off your solution or ideas.
I would suggest that you consider STAF (Software Testing Automation Framework). It's open source, distributed, and cross-platform. It will run just about any task on just about any platform. It has a variety of plugin "Services" available for specific purposes, or you can create your own custom Service. You can also extend the functionality through scripting (Jython). It's also well documented and reasonably well supported through user forums by IBM.

Does [Aptana] have a minimize function, and if so, where is it?

I am working in Aptana Studio 3 and am wondering if there is a quick, easy way to minimise your code when you've finished developing.
Say you have a long SQL insert that you have separated out over several lines for ease of editing during development. At production time, you could go through and remove all the line breaks to compress the code onto fewer lines, but this could be a very time-consuming manual task.
Just wondering if there is an easier way?
I don't know whether Aptana has this feature, but I doubt doing this will give you any benefit.
Generally, putting code onto fewer lines doesn't give any benefit except when the code must be transferred over a network (e.g. to a web browser), in which case you save a few bytes of traffic; line breaks in server-side code such as a SQL statement never leave your machine.
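If you ever do need to collapse a statement like that mechanically, it's a few lines of script rather than an editor feature. A naive sketch (it assumes the SQL contains no string literals with meaningful whitespace):

    import re

    def collapse_sql(sql):
        # Collapse all runs of whitespace (including line breaks) to one space.
        return re.sub(r"\s+", " ", sql).strip()

    query = """
        INSERT INTO orders (id, customer, total)
        VALUES (?, ?, ?)
    """
    print(collapse_sql(query))
    # -> INSERT INTO orders (id, customer, total) VALUES (?, ?, ?)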

How to compare the performance of web frameworks

I want to compare the performance of some web frameworks (Ruby on Rails and ASP.NET MVC 3) but I don't know how to get started... Should I measure how fast each framework renders a 10k-iteration loop, or how fast it renders 10,000 lines of HTML? Are there maybe programs that can help you with this? Also, how can the server load be monitored? Any help is appreciated!
Thijs
With respect, this is an unanswerable question. Is a Porsche faster than a Prius? Well, no, not when the Porsche is in the shop :-).
The answer depends on what you're trying to accomplish, how you do it, and how you code it. For example, Rails goes out of its way to transparently cache as much as it can, and then makes it trivially easy to cache stuff on your command. Of course there's a way to do the same in ASP MVC3, but is it as easy?
Can you find, hire, and train a suitable team that knows how to use the framework? What's the culture of the organization (Windows or Unix)? I could write a really fast application in MS Access and the same application poorly in Rails against a high-performance database, and the MS Access app would win. It's far from a given that an application will be written well, optimized, or whatever.
These days, a well-written application is typically performance-bound on data I/O, and if this is the case, then it's the database you use that might matter. The loop test you propose would tell you almost nothing, unless you're writing an application that calculates pi to the billionth place, or something.
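If you really want raw numbers, at least measure complete round trips against a realistic page in each framework rather than a synthetic loop. A minimal sketch (the URL is a placeholder for your own test endpoint, and the server should be warmed up first):

    import statistics
    import time
    from urllib.request import urlopen

    URL = "http://localhost:3000/products"  # placeholder: your own test page

    def measure(n=100):
        timings = []
        for _ in range(n):
            start = time.perf_counter()
            urlopen(URL).read()  # full round trip: framework, rendering, I/O
            timings.append(time.perf_counter() - start)
        timings.sort()
        print("median %.1f ms, p95 %.1f ms"
              % (statistics.median(timings) * 1000,
                 timings[int(n * 0.95)] * 1000))

    measure()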
I am sure there are published benchmarks of application frameworks available, but again, they need to make assumptions about what the application actually has to do.
The reality is that any reasonable framework (which includes both of the two you mention) is likely to be as fast as necessary for most scenarios, and again, what you do, and how you architect and implement it are the far more likely culprits for performance problems.
Once you do choose, there's a great (awesome) tool called NewRelic RPM which works with several frameworks -- I use it with Rails, and it gives you internal metrics at a level of detail that is beyond belief.
I don't mean to be glib, or unhelpful. But this is a little bit of a sore spot for me -- in so many cases people say "we should use foo instead of bar because foo's faster", and weeks go by as bar is replaced by foo. And then there are little incompatibilities. And an unexpected bug. And then, well, for some reason the new one is a little slower. And then after it gets optimized, it's finally just as fast.
I'll step down from my soapbox now :-)

Should I choose Hiberlite for integrating SQLite into my Win/iOS application?

I am a composer by profession and my computer science skills are limited though I program quite a bit of the software that I use.
What are the most reasonable ways to approach SQLite integration as a file format and database in an iOS app (it also needs to run on windows, but that is a secondary concern)?
I have been researching Hiberlite, which looks fantastic, but it seems to be little used; apparently it doesn't run well on embedded systems (iOS?) and chokes when thousands of objects are in play. I haven't been able to get a sense of how severe those bottlenecks are under those conditions.
The settings of thousands of objects (~50,000, though that number could expand) would be read every 1-10 seconds and written periodically. Read performance is more critical, as write operations can stutter without affecting the core operation of the app.
Given those conditions, how should I approach SQLite? My understanding is that without something like Hiberlite, the entire database (many millions of entries) must be read and rewritten for every entry; is that less efficient? If that is the best approach, is there a good resource to follow for implementing it?
Any advice would be greatly appreciated. My current software that I rely on is beyond buggy and needs refactoring, but due to my inexperience I am having a difficult time finding information about a reasonable approach.
I'm guessing you've probably found a solution for this by now, but I've been interested myself in embedding SQLite on Android and iOS, and I came across many C++-based ORM solutions.
Hiberlite looked possibly not fully mature (I didn't readily see a method for returning subsets of data, which is fairly standard). A framework which did draw my attention was the POCO::Data ORM library. It's based on the stream-based mechanism used in the SOCI ORM. The POCO library is modular and optimised for embedded environments (I believe it also has minimal external dependencies). Wikipedia has an article on it and outlines some of its users, of which openFrameworks is one.
The Wt ORM also looked pretty interesting.
I'm listing some of the other C++ ORM frameworks I found here, in no particular order:
http://soci.sourceforge.net
webtoolkit WT DBO ORM
http://debea.net
http://www.qxorm.com
http://sourceforge.net/apps/trac/litesql
http://otl.sourceforge.net
http://cppcms.com/sql/cppdb
http://dtemplatelib.sourceforge.net
http://code.google.com/p/qdjango
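Incidentally, regarding the worry above that the entire database must be read and rewritten for every entry: it doesn't work that way. SQLite is page-based, so an indexed read or a single-row write only touches a handful of pages, never the whole file. A quick illustration (shown in Python's built-in sqlite3 for brevity, with an invented schema; the C/C++ API behaves the same way):

    import sqlite3

    con = sqlite3.connect("settings.db")  # hypothetical file/schema
    con.execute("CREATE TABLE IF NOT EXISTS objects "
                "(id INTEGER PRIMARY KEY, setting REAL)")
    con.executemany(
        "INSERT OR REPLACE INTO objects (id, setting) VALUES (?, ?)",
        ((i, i * 0.5) for i in range(50000)),
    )
    con.commit()

    # An indexed read touches only the relevant B-tree pages, not the whole file.
    (value,) = con.execute(
        "SELECT setting FROM objects WHERE id = ?", (12345,)
    ).fetchone()

    # Likewise, a single-row write rewrites a few pages, not millions of entries.
    con.execute("UPDATE objects SET setting = ? WHERE id = ?", (99.0, 12345))
    con.commit()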

Are code generators bad?

I use MyGeneration along with NHibernate to create the basic POCO objects and XML mapping files. I have heard some people say they think code generators are not a good idea. What is the current best thinking? Is it just that code generation is bad when it generates thousands of lines of incomprehensible code?
Code generated by a code generator should not (as a generalisation) be used in a situation where it is subsequently edited by human intervention. Some systems, such as the wizards on various incarnations of Visual C++, generated code that the programmer was then expected to edit by hand. This was not popular, as it required developers to pick apart the generated code, understand it, and make modifications. It also meant that the generation process was one-shot.
Generated code should live in separate files from other code in the system and only be produced by the generator. The generated code should be clearly marked as such to indicate that people shouldn't modify it. I have had occasion to build quite a few code-generation systems of one sort or another, and all of the code so generated has something like this in the preamble:
-- =============================================================
-- === Foobar Module ===========================================
-- =============================================================
--
-- === THIS IS GENERATED CODE. DO NOT EDIT. ===
--
-- =============================================================
Code Generation in Action is quite a good book on the subject.
Code generators are great, bad code is bad.
Most of the other responses on this page are along the lines of "No, because often the generated code is not very good."
This is a poor answer because:
1) Generators are a tool like anything else - if you misuse them, don't blame the tool.
2) Developers tend to pride themselves on their ability to write great code one time, but you don't use code generators for one-off projects.
We use a Code Generation system for persistence in all our Java projects and have thousands of generated classes in production.
As a manager I love them because:
1) Reliability: There are no significant remaining bugs in that code. It has been so exhaustively tested and refined over the years that when debugging I never worry about the persistence layer.
2) Standardisation: Every developer's code is identical in this respect, so there is much less for a guy to learn when picking up a new project from a coworker.
3) Evolution: If we find a better way to do things we can update the templates and update thousands of classes quickly and consistently.
4) Revolution: If we switch to a different persistence system in the future then the fact that every single persistent class has an exactly identical API makes my job far easier.
5) Productivity: It is just a few clicks to build a persistent object system from metadata - this saves thousands of boring developer hours.
Code generation is like using a compiler - on an individual case basis you might be able to write better optimised assembly language, but over large numbers of projects you would rather have the compiler do it for you right?
We employ a simple trick to ensure that classes can always be regenerated without losing customisations: every generated class is abstract. Then the developer extends it with a concrete class, adds the custom business logic and overrides any base class methods he wants to differ from the standard. If there is a change in metadata he can regenerate the abstract class at any time, and if the new model breaks his concrete class the compiler will let him know.
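In outline, the trick looks like this (sketched here in Python rather than our Java, purely for brevity; class and member names are invented):

    # customer_base.py -- GENERATED from metadata. DO NOT EDIT.
    # Overwritten wholesale on every regeneration.
    class CustomerBase:
        def __init__(self, name, credit_limit):
            self.name = name
            self.credit_limit = credit_limit

        def discount(self):
            # Standard default; concrete subclasses may override.
            return 0.0

    # customer.py -- hand-written; survives regeneration untouched.
    class Customer(CustomerBase):
        def discount(self):
            # Custom business logic lives only here. (In our Java version the
            # base class is abstract, so the compiler flags any mismatch
            # between the regenerated base and the concrete subclass.)
            return 0.10 if self.credit_limit > 10000 else 0.0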
The biggest problem I've had with code generators is during maintenance. If you modify the generated code and then make a change to your schema or template and try to regenerate you can have problems.
One problem is if the tool doesn't allow you to protect changes you've made to the modified code then your changes will be overwritten.
Another problem I've seen, particularly with code generators in RSA for web services, if you change the generated code too much the generator will complain that there is a mismatch and refuse to regenerate the code. This can happen for something as simple as changing the type of a variable. Then you are stuck generating the code to a different project and merging the results back into your original code.
Code generators can be a boon for productivity, but there are a few things to look for:
Let you work the way you want to work.
If you have to bend your non-generated code to fit around the generated code, then you should probably choose a different approach.
Run as part of your regular build.
The output should be generated to an intermediates directory, and not be checked in to source control. The input must be checked in to source control, however.
No install
Ideally, you check the tool in to source control, too. Making people install things when preparing a new build machine is bad news. For example, if you branch, you want to be able to version the tools with the code.
If you must, make a single script that will take a clean machine with a copy of the source tree, and configure the machine as required. Fully automated, please.
No editing output
You shouldn't have to edit the output. If the output isn't useful enough as-is, then the tool isn't working for you.
Also, the output should clearly state that it is a generated file & should not be edited.
Readable output
The output should be written & formatted well. You want to be able to open the output & read it without a lot of trouble.
#line
Many languages support something like a #line directive, which lets you map the contents of the output back to the input, for example when producing compiler error messages or when stepping in the debugger. This can be useful, but it can also be annoying unless done really well, so it's not a requirement.
My stance is that code generators are not bad, but MANY uses of them are.
If you are using a code generator that writes good code to save time, then great; but oftentimes the output is not optimized, or adds a lot of overhead, and in those cases I think it is bad.
Code generation might cause you some grief if you like to mix behaviour into your classes. An equally productive alternative might be attributes/annotations and runtime reflection.
Compilers are code generators, so they are not inherently bad unless you only like to program in raw machine code.
I believe, however, that code generators should always completely encapsulate the generated code. I.e. you should never have to modify the generated code by hand; any change should be done by modifying the input to the generator and regenerating the code.
If it's a mainframe COBOL code generator that Fran Tarkenton is trying to sell you, then absolutely yes!
I've written a few code generators before - and to be honest they saved my butt more than once!
Once you have a clearly defined object - collection - user control design, you can use a code generator to build the basics for you, allowing your time as a developer to be used more effectively in building the complex stuff. After all, who really wants to write 300+ public property declarations and variable instantiations? I'd rather get stuck into the business logic than all the mindless repetitive tasks.
The mistake many people make when using code generation is to edit the generated code. If you keep in mind that when you feel like you need to edit the code, you actually need to be editing the code-generation tool, it's a boon to productivity. If you are constantly fighting the code that gets generated, it's going to end up costing productivity.
The best code generators I've found are those that allow you to edit the templates that generate the code. I really like Codesmith for this reason, because it's template-based and the templates are easily editable. When you find there is a deficiency in the code that gets generated, you just edit the template and regenerate your code and you are forever good after that.
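CodeSmith's template language is its own thing, but the template-and-regenerate idea is easy to show with a toy generator (invented entity names, Python used purely for illustration) that stamps out one C# class per entity and can be re-run whenever the template changes:

    from string import Template

    CLASS_TEMPLATE = Template(
        "// THIS IS GENERATED CODE. DO NOT EDIT.\n"
        "public class ${name}Repository {\n"
        "    public ${name} FindById(int id) { /* ... */ return null; }\n"
        "}\n"
    )

    # Entity names would normally come from schema metadata.
    for entity in ["Customer", "Order", "Invoice"]:
        with open(entity + "Repository.generated.cs", "w") as f:
            f.write(CLASS_TEMPLATE.substitute(name=entity))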
The other thing that I've found is that a lot of code generators aren't super easy to use with a source control system. The way we've gotten around this is to check in the templates rather than the code and the only thing we check into source control that is generated is a compiled version of the generated code (DLL files, mostly). This saves you a lot of grief because you only have to check in a few DLLs rather than possibly hundreds of generated files.
Our current project makes heavy use of a code generator. That means I've seen both the "obvious" benefits of generating code for the first time - no coder error, no typos, better adherence to a standard coding style - and, after a few months in maintenance mode, the unexpected downsides. Our code generator did, indeed, improve our codebase quality initially. We made sure that it was fully automated and integrated with our automated builds. However, I would say that:
(1) A code generator can be a crutch. We have several massive, ugly blobs of tough-to-maintain code in our system now, because at one point in the past it was easier to add twenty new classes to our code generation XML file, than it was to do proper analysis and class refactoring.
(2) Exceptions to the rule kill you. We use the code generator to create several hundred Screen and Business Object classes. Initially, we enforced a standard on what methods could appear in a class, but like all standards, we started making exceptions. Now, our code generation XML file is a massive monster, filled with special-case snippets of Java code that are inserted into select classes. It's nearly impossible to parse or understand.
(3) Since so much of our code is generated, using values from a database, it's proven difficult for developers to maintain a consistent code base on their individual workstations (since there can be multiple versions of the database). Debugging and tracing through the software is a lot harder, and newbies to the team take much longer to figure out the "flow" of the code, because of the extra abstraction and implicit relationships between classes. IDEs cannot pick up relationships between two classes that communicate via a code-generated class.
That's probably enough for now. I think Code Generators are great as part of a developer's individual toolkit; a set of scripts that write out your boilerplate code make starting a project a lot easier. But Code Generators do not make maintenance problems go away.
In certain (not many) cases they are useful. Such as if you want to generate classes based on lookup-type data in the database tables.
Code generation is bad when it makes programming more difficult (i.e., poorly generated code, or a maintenance nightmare), but it is good when it makes programming more efficient.
They probably don't always generate optimal code, but depending on your need, you might decide that developer manhours saved make up for a few minor issues.
All that said, my biggest gripe with ORM code generators is that maintaining the generated code can be a PITA if the schema changes.
Code generators are not bad, but sometimes they are used in situations where another solution exists (i.e., instantiating a million objects when an array of objects would have been more suitable and accomplished in a few lines of code).
The other situation is when they are used incorrectly, or coded badly. Too many people swear off code generators because they've had bad experiences due to bugs, or their misunderstanding of how to correctly configure it.
But in and of themselves, code generators are not bad.
-Adam
They are like any other tool. Some give better results than others, but it is up to the user to know when to use them or not. A hammer is a terrible tool if you are trying to screw in a screw.
This is one of those highly contentious issues. Personally, I think code generators are really bad due to the unoptimized crap code most of them put out.
However, the question is really one that only you can answer. In a lot of organizations, development time is more important than project execution speed or even maintainability.
We use code generators for generating data-entity classes, database objects (like triggers and stored procs), service proxies, etc. Anywhere you see a lot of repetitive code following a pattern and a lot of manual work involved, code generators can help. But you should not use them so much that maintainability becomes a pain. Some issues also arise if you want to regenerate the code.
Tools like Visual Studio and CodeSmith have their own templates for most of the common tasks and make this process easier. But it is also easy to roll your own.
It can really become a maintainability issue when you have to come back and can't understand what is going on in the code. Therefore you often have to weigh how important it is to get the project done fast against easy maintainability.
maintainability <> easy or fast coding process
I use MyGeneration with EntitySpaces and I don't have any issues with it. If I have a schema change, I just regenerate the classes and it all works out just fine.
They serve as a crutch that can disable your ability to maintain the program long-term.
The first C++ compilers were code generators that spat out C code (Cfront).
I'm not sure if this is an argument for or against code generators.
I think that Mitchel has hit it on the head.
Code generation has its place. There are some circumstances where it's more effective to have the computer do the work for you!
It can give you the freedom to change your mind about the implementation of a particular component when the time cost of making the code changes is small. Of course, it is still probably important to understand the output of the code generator, but not always.
We had an example on a project we just finished where a number of C++ apps needed to communicate with a C# app over named pipes. It was better for us to use small, simple files that defined the messages, and have all the classes and code generated for each side of the transaction. When a programmer was working on problem X, the last thing they needed was to worry about the implementation details of the messages and the inevitable cache hit that would entail.
This is a workflow question. ASP.NET is a code generator. The XAML parsing engine actually generates C# before it gets converted to MSIL. When a code generator becomes an external product like CodeSmith that is isolated from your development workflow, special care must be taken to keep your project in sync. For example, if the generated code is ORM output and you make a change to the database schema, you will have to either completely abandon the code generator or take advantage of C#'s partial classes (which let you add members and functionality to an existing class without inheriting from it).
I personally dislike the isolated / Alt-Tab nature of generator workflows; if the code generator is not part of my IDE then I feel like it's a kludge. Some code generators, such as Entity Spaces 2009 (not yet released), are more integrated than previous generations of generators.
I think the promise of code generators is best realized in precompilation routines. C# and other .NET languages lack these, although ASP.NET has them, and that's why, say, SubSonic works so well for ASP.NET but not much else. SubSonic generates C# code at build time, just before the normal ASP.NET compilation kicks in.
Ask your tools vendor (i.e. Microsoft) to support pre-build routines more thoroughly, so that code generators can be integrated into the workflow of your solutions using metadata, rather than managed manually as externally output code files that have to be maintained in isolation.
Jon
The best application of a code generator is when the entire project is a model and all of the project's source code is generated from that model. I am not talking about UML and related crap. In this case, the project model also contains custom code.
Then the only thing developers have to care about is the model. A simple architectural change may result in instant modification of thousands of source code lines. But everything remains in sync.
This is IMHO the best approach. Sounds utopian? At least I know it's not ;) The near future will tell.
In a recent project we built our own code generator. We generated all the data base stuff, and all the base code for our view and view controller classes. Although the generator took several months to build (mostly because this was the first time we had done this, and we had a couple of false starts) it paid for itself the first time we ran it and generated the basic framework for the whole app in about ten minutes.
This was all in Java, but Ruby makes an excellent code-writing language particularly for small, one-off type projects.
The best thing was the consistency of the code and the project organization. In addition you kind of have to think the basic framework out ahead of time, which is always good.
Code generators are great, assuming it is a good code generator; especially when working in C++/Java, which are very verbose.