Filtering code elements when analyzing source code - code-analysis

Currently I am making a survey about source code analysis, and what puzzles me greatly is what project managers and developers would like to filter when analyzing source code (especially when applying OOP metrics - e.g. skipping insignificant methods and classes during analysis, or filtering context-based elements according to the type of project). If you have any suggestions based on your experience with code analysis, I would greatly appreciate it if you could share some ideas about filtering of elements.
Thanks, Martin

You might want to filter out third-party code and libraries since that's none of your concern.
You might also want to filter out code that is not of interest for a particular form of analysis. This might be useful if you have a huge code base and do not want to re-run the analysis on all of it, some of which has not been modified in decades.
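As a rough sketch of the two filters described above (skip third-party code, and skip files that have not changed since the last run), something like the following could sit in front of an analysis tool. The directory names, the `changed_since_last_run` set and the `exclude_patterns` parameter are assumptions made for illustration, not part of any particular analyzer.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical directory names that usually hold third-party code.
THIRD_PARTY_DIRS = {"vendor", "node_modules", "third_party"}


def is_third_party(path: str) -> bool:
    """Return True if any directory component marks the file as third-party."""
    return any(part in THIRD_PARTY_DIRS for part in PurePosixPath(path).parts[:-1])


def select_for_analysis(paths, changed_since_last_run, exclude_patterns=()):
    """Keep files that are ours, recently changed, and not explicitly excluded."""
    selected = []
    for path in paths:
        if is_third_party(path):
            continue  # not our code, not our concern
        if path not in changed_since_last_run:
            continue  # unchanged code does not need to be re-analyzed
        if any(fnmatch(path, pattern) for pattern in exclude_patterns):
            continue  # not of interest for this particular analysis
        selected.append(path)
    return selected
```

The set of changed files would typically come from version control; how it is obtained is left out here.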

Related

Standard library in ABAP

Is there something similar to a standard library for modern ABAP (maybe even OO-Abap)? For example a curated list of objects that address some of the most common programming tasks like
high-level data structures (not just plain internal tables)
working with file paths and directories
working with files (reading, saving, ...)
working with different file types (text, csv, xml, ...)
regular expressions
working with the environment (client, application server)
...
My current workflow is to stumble upon a problem like getting the extension of a file from a filename (or something fairly similar and easy). Then I have three options:
Dig through a ton of (mostly old and lacking) posts on SDN until I maybe find a pointer to solve the problem
Hack away and create a one-off solution to the problem
Take my time and implement a good and well documented solution
Many times I feel a bit lost. A lot of the available information is old, bad or both. Is there a more structured approach to tackle the problem of finding a suitable abstraction in the ABAP-world?
To answer your first question: no. Unlike C, C#, or Java, there is no need to include a library, since all the functions are always available to you, so in that regard it might be simpler. What you are asking, though, is a great question; I am sure you see tons of queries on SDN along the lines of "Is there a function module for ...?"
There isn't an easy answer, but in SAP ABAP I think the easiest way to find this might be by looking at packages. A package is similar to a library: looking at the package for the type of function you're looking for might get you there. For example, if I am looking for file handling, I might look at the control framework package, and there I can see all the available functions/classes/methods/BAPIs etc. that are related to front end controls, file handling and so on, and might be able to find what I am looking for. Note that it's not perfect, as the way packages are used has changed from time to time, so it's actually better for finding functions related to, for example, purchasing or sales, but it's one way that we use.
Just as in other languages you need to know which library to link in to get the function you need, in ABAP you have to find the related package. Hope it helps a little; I know it's not perfect. Example package for front end controls
For working with the environment:
If you have access to SAP there is a transaction code called BAPI
Here you could find a hierarchical list of all the main objects in SAP (i.e. Material, Purchase Order, etc)
In this list you can find documentation and the function modules used for each object (i.e. create/get detail/update etc)
By digging into the function modules you can take a look at the structures, receiving parameters, etc
The other questions are a little more complex. I am not aware of any comprehensive list, but by digging into SCN it is usually easy to find a solution for the most common things like handling files, etc
In the particular case of regular expressions, SAP's native language, ABAP, has keywords for handling them, but you also have a class in SAP called CL_JAVA_SCRIPT which you could use for doing things the "JavaScript way"
For example, I used this class in the past to evaluate a simple formula provided in a string (e.g. 3 + 2 * 5)
This is an operation really complex to do in ABAP but easy to do in JS.
Hope it helps
SAP Reuse Library (SE83) is the closest thing to what you are looking for. It provides common development tasks grouped in a hierarchy (UI controls, standard dialogs, confirmation prompts) and contains code snippets for each, with commonly used classes/modules:
Though, it is incomplete and lacks many popular things.
Consider also DWDM, BIBS, LIBS transactions and other packages in this link.

Are code generators bad?

I use MyGeneration along with nHibernate to create the basic POCO objects and XML mapping files. I have heard some people say they think code generators are not a good idea. What is the current best thinking? Is it just that code generation is bad when it generates thousands of lines of not understandable code?
Code generated by a code-generator should not (as a generalisation) be used in a situation where it is subsequently edited by human intervention. Some systems, such as the wizards in various incarnations of Visual C++, generated code that the programmer was then expected to edit by hand. This was not popular, as it required developers to pick apart the generated code, understand it and make modifications. It also meant that the generation process was one shot.
Generated code should live in separate files from other code in the system and only be produced by the generator. The generated code should be clearly marked as such to indicate that people shouldn't modify it. I have had occasion to do quite a few code-generation systems of one sort or another, and all of the code so generated has something like this in the preamble:
-- =============================================================
-- === Foobar Module ===========================================
-- =============================================================
--
-- === THIS IS GENERATED CODE. DO NOT EDIT. ===
--
-- =============================================================
Code Generation in Action is quite a good book on the subject.
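To make the "clearly marked" point concrete, here is a minimal sketch of a generator step that stamps every output file with such a preamble. The function names, the banner width and the `--` comment prefix are assumptions chosen to resemble the banner quoted above; a real generator would use whatever comment syntax its target language needs.

```python
BANNER_WIDTH = 61  # assumed width, chosen only to resemble the banner above


def banner(module_name: str, comment_prefix: str = "--") -> str:
    """Build a 'do not edit' preamble for a generated file."""
    rule = "=" * BANNER_WIDTH
    title = f"=== {module_name} Module ".ljust(BANNER_WIDTH, "=")
    lines = [rule, title, rule, "", "=== THIS IS GENERATED CODE. DO NOT EDIT. ===", "", rule]
    # rstrip() keeps the empty banner lines as a bare comment prefix.
    return "\n".join(f"{comment_prefix} {line}".rstrip() for line in lines) + "\n"


def render(module_name: str, body: str) -> str:
    """Prefix generated source with the banner so every output file is marked."""
    return banner(module_name) + body
```

Calling `render("Foobar", generated_sql)` would yield the generated text with the warning at the top, so the marker cannot be forgotten on individual files.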
Code generators are great, bad code is bad.
Most of the other responses on this page are along the lines of "No, because often the generated code is not very good."
This is a poor answer because:
1) Generators are a tool like anything else - if you misuse them, don't blame the tool.
2) Developers tend to pride themselves on their ability to write great code one time, but you don't use code generators for one-off projects.
We use a Code Generation system for persistence in all our Java projects and have thousands of generated classes in production.
As a manager I love them because:
1) Reliability: There are no significant remaining bugs in that code. It has been so exhaustively tested and refined over the years that when debugging I never worry about the persistence layer.
2) Standardisation: Every developer's code is identical in this respect, so there is much less for a guy to learn when picking up a new project from a coworker.
3) Evolution: If we find a better way to do things we can update the templates and update 1000's of classes quickly and consistently.
4) Revolution: If we switch to a different persistence system in the future then the fact that every single persistent class has an exactly identical API makes my job far easier.
5) Productivity: It is just a few clicks to build a persistent object system from metadata - this saves thousands of boring developer hours.
Code generation is like using a compiler - on an individual case basis you might be able to write better optimised assembly language, but over large numbers of projects you would rather have the compiler do it for you, right?
We employ a simple trick to ensure that classes can always be regenerated without losing customisations: every generated class is abstract. Then the developer extends it with a concrete class, adds the custom business logic and overrides any base class methods he wants to differ from the standard. If there is a change in metadata he can regenerate the abstract class at any time, and if the new model breaks his concrete class the compiler will let him know.
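The answer describes this for Java; the sketch below shows the same split in Python, purely as an illustration. The class names, fields and the `validate` hook are hypothetical. Note one difference: Python has no compile step, so a mismatch between the regenerated base class and the hand-written subclass shows up at run time (or in a type checker) rather than as a compiler error.

```python
from abc import ABC, abstractmethod


class CustomerBase(ABC):
    """GENERATED CODE. DO NOT EDIT. Regenerated from metadata (hypothetical)."""

    def __init__(self, customer_id: int, name: str):
        self.customer_id = customer_id
        self.name = name

    def to_row(self) -> dict:
        """Standard persistence mapping shared by every generated class."""
        return {"id": self.customer_id, "name": self.name}

    @abstractmethod
    def validate(self) -> bool:
        """Hook the hand-written subclass must supply (an assumption here)."""


class Customer(CustomerBase):
    """Hand-written code: lives in its own file and survives regeneration."""

    def validate(self) -> bool:
        return bool(self.name.strip())

    def to_row(self) -> dict:
        # Override one generated method to differ from the standard behaviour.
        row = super().to_row()
        row["name"] = self.name.strip().title()
        return row
```

Regenerating `CustomerBase` never touches `Customer`, which is the point of the trick: customisations sit only in the subclass.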
The biggest problem I've had with code generators is during maintenance. If you modify the generated code and then make a change to your schema or template and try to regenerate you can have problems.
One problem is if the tool doesn't allow you to protect changes you've made to the modified code then your changes will be overwritten.
Another problem I've seen, particularly with code generators in RSA for web services, if you change the generated code too much the generator will complain that there is a mismatch and refuse to regenerate the code. This can happen for something as simple as changing the type of a variable. Then you are stuck generating the code to a different project and merging the results back into your original code.
Code generators can be a boon for productivity, but there are a few things to look for:
Let you work the way you want to work.
If you have to bend your non-generated code to fit around the generated code, then you should probably choose a different approach.
Run as part of your regular build.
The output should be generated to an intermediates directory, and not be checked in to source control. The input must be checked in to source control, however.
No install
Ideally, you check the tool in to source control, too. Making people install things when preparing a new build machine is bad news. For example, if you branch, you want to be able to version the tools with the code.
If you must, make a single script that will take a clean machine with a copy of the source tree, and configure the machine as required. Fully automated, please.
No editing output
You shouldn't have to edit the output. If the output isn't useful enough as-is, then the tool isn't working for you.
Also, the output should clearly state that it is a generated file & should not be edited.
Readable output
The output should be written & formatted well. You want to be able to open the output & read it without a lot of trouble.
#line
Many languages support something like a #line directive, which lets you map the contents of the output back to the input, for example when producing compiler error messages or when stepping in the debugger. This can be useful, but it can also be annoying unless done really well, so it's not a requirement.
My stance is that code generators are not bad, but MANY uses of them are.
If you are using a code generator for time savings that writes good code, then great, but often times it is not optimized, or adds a lot of overhead, in those cases I think it is bad.
Code generation might cause you some grief if you like to mix behaviour into your classes. An equally productive alternative might be attributes/annotations and runtime reflection.
Compilers are code generators, so they are not inherently bad unless you only like to program in raw machine code.
I believe however that code generators should always completely encapsulate the generated code. I.e. you should never have to modify the generated code by hand, any change should be done by modifying the input to the generator and regenerate the code.
If its a mainframe cobol code generator that Fran Tarkenton is trying to sell you then absolutely yes!
I've written a few code generators before - and to be honest they saved my butt more than once!
Once you have a clearly defined object - collection - user control design, you can use a code generator to build the basics for you, allowing your time as a developer to be used more effectively in building the complex stuff. After all, who really wants to write 300+ public property declarations and variable instantiations? I'd rather get stuck into the business logic than all the mindless repetitive tasks.
The mistake many people make when using code generation is to edit the generated code. If you keep in mind that when you feel you need to edit the code you actually need to be editing the code generation tool, it's a boon to productivity. If you are constantly fighting the code that gets generated, it's going to end up costing productivity.
The best code generators I've found are those that allow you to edit the templates that generate the code. I really like Codesmith for this reason, because it's template-based and the templates are easily editable. When you find there is a deficiency in the code that gets generated, you just edit the template and regenerate your code and you are forever good after that.
The other thing that I've found is that a lot of code generators aren't super easy to use with a source control system. The way we've gotten around this is to check in the templates rather than the code and the only thing we check into source control that is generated is a compiled version of the generated code (DLL files, mostly). This saves you a lot of grief because you only have to check in a few DLLs rather than possibly hundreds of generated files.
Our current project makes heavy use of a code generator. That means I've seen both the "obvious" benefits of generating code for the first time - no coder error, no typos, better adherence to a standard coding style - and, after a few months in maintenance mode, the unexpected downsides. Our code generator did, indeed, improve our codebase quality initially. We made sure that it was fully automated and integrated with our automated builds. However, I would say that:
(1) A code generator can be a crutch. We have several massive, ugly blobs of tough-to-maintain code in our system now, because at one point in the past it was easier to add twenty new classes to our code generation XML file, than it was to do proper analysis and class refactoring.
(2) Exceptions to the rule kill you. We use the code generator to create several hundred Screen and Business Object classes. Initially, we enforced a standard on what methods could appear in a class, but like all standards, we started making exceptions. Now, our code generation XML file is a massive monster, filled with special-case snippets of Java code that are inserted into select classes. It's nearly impossible to parse or understand.
(3) Since so much of our code is generated, using values from a database, it's proven difficult for developers to maintain a consistent code base on their individual workstations (since there can be multiple versions of the database). Debugging and tracing through the software is a lot harder, and newbies to the team take much longer to figure out the "flow" of the code, because of the extra abstraction and implicit relationships between classes. IDEs cannot pick up relationships between two classes that communicate via a code-generated class.
That's probably enough for now. I think Code Generators are great as part of a developer's individual toolkit; a set of scripts that write out your boilerplate code make starting a project a lot easier. But Code Generators do not make maintenance problems go away.
In certain (not many) cases they are useful. Such as if you want to generate classes based on lookup-type data in the database tables.
Code generation is bad when it makes programming more difficult (i.e., poorly generated code, or a maintenance nightmare), but it is good when it makes programming more efficient.
Generators probably don't always produce optimal code, but depending on your need, you might decide that the developer man-hours saved make up for a few minor issues.
All that said, my biggest gripe with ORM code generators is that maintaining the generated code can be a PITA if the schema changes.
Code generators are not bad, but sometimes they are used in situations when another solution exists (e.g., instantiating a million objects when an array of objects would have been more suitable and accomplished in a few lines of code).
The other situation is when they are used incorrectly, or coded badly. Too many people swear off code generators because they've had bad experiences due to bugs, or their misunderstanding of how to correctly configure it.
But in and of themselves, code generators are not bad.
-Adam
They are like any other tool. Some give better results than others, but it is up to the user to know when to use them or not. A hammer is a terrible tool if you are trying to screw in a screw.
This is one of those highly contentious issues. Personally, I think code generators are really bad due to the unoptimized crap code most of them put out.
However, the question is really one that only you can answer. In a lot of organizations, development time is more important than project execution speed or even maintainability.
We use code generators for generating data entity classes, database objects (like triggers, stored procs), service proxies etc. Anywhere you see a lot of repetitive code following a pattern and a lot of manual work involved, code generators can help. But you should not use them so much that maintainability becomes a pain. Some issues also arise if you want to regenerate the code.
Tools like Visual Studio and Codesmith have their own templates for most of the common tasks and make this process easier. But it is easy to roll your own.
It can really become an issue with maintainability when you have to come back and can't understand what is going on in the code. Therefore, many times you have to weigh how important it is to get the project done fast compared to easy maintainability.
maintainability <> easy or fast coding process
I use My Generation with Entity Spaces and I don't have any issues with it. If I have a schema change I just regenerate the classes and it all works out just fine.
They serve as a crutch that can disable your ability to maintain the program long-term.
The first C++ compilers were code generators that spit out C code (CFront).
I'm not sure if this is an argument for or against code generators.
I think that Mitchel has hit it on the head.
Code generation has its place. There are some circumstances where it's more effective to have the computer do the work for you!
It can give you the freedom to change your mind about the implementation of a particular component when the time cost of making the code changes is small. Of course, it is still probably important to understand the output of the code generator, but not always.
We had an example on a project we just finished where a number of C++ apps needed to communicate with a C# app over named pipes. It was better for us to use small, simple files that defined the messages and have all the classes and code generated for each side of the transaction. When a programmer was working on problem X, the last thing they needed was to worry about the implementation details of the messages and the inevitable cache hit that would entail.
This is a workflow question. ASP.NET is a code generator. The XAML parsing engine actually generates C# before it gets converted to MSIL. When a code generator becomes an external product like CodeSmith that is isolated from your development workflow, special care must be taken to keep your project in sync. For example, if the generated code is ORM output, and you make a change to the database schema, you will have to either completely abandon the code generator or else take advantage of C#'s capacity to work with partial classes (which let you add members and functionality to an existing class without inheriting from it).
I personally dislike the isolated / Alt-Tab nature of generator workflows; if the code generator is not part of my IDE then I feel like it's a kludge. Some code generators, such as Entity Spaces 2009 (not yet released), are more integrated than previous generations of generators.
I think the panacea to the purpose of code generators can be enjoyed in precompilation routines. C# and other .NET languages lack this, although ASP.NET enjoys it and that's why, say, SubSonic works so well for ASP.NET but not much else. SubSonic generates C# code at build-time just before the normal ASP.NET compilation kicks in.
Ask your tools vendor (i.e. Microsoft) to support pre-build routines more thoroughly, so that code generators can be integrated into the workflow of your solutions using metadata, rather than manually managed as externally outputted code files that have to be maintained in isolation.
Jon
The best application of a code generator is when the entire project is a model, and all the project's source code is generated from that model. I am not talking UML and related crap. In this case, the project model also contains custom code.
Then the only thing developers have to care about is the model. A simple architectural change may result in instant modification of thousands of source code lines. But everything remains in sync.
This is IMHO the best approach. Sound utopic? At least I know it's not ;) The near future will tell.
In a recent project we built our own code generator. We generated all the data base stuff, and all the base code for our view and view controller classes. Although the generator took several months to build (mostly because this was the first time we had done this, and we had a couple of false starts) it paid for itself the first time we ran it and generated the basic framework for the whole app in about ten minutes.
This was all in Java, but Ruby makes an excellent code-writing language particularly for small, one-off type projects.
The best thing was the consistency of the code and the project organization. In addition you kind of have to think the basic framework out ahead of time, which is always good.
Code generators are great, assuming it is a good code generator, especially when working in C++/Java, which are very verbose.

Understanding code metrics

I recently installed the Eclipse Metrics Plugin and have exported the data for one of our projects.
It's all very good having these nice graphs, but I'd really like to understand in more depth what they all mean. The definitions of the metrics only go so far in telling you what they really mean.
Does anyone know of any good resources, books, websites, etc, that can help me better understand what all the data means and give an understanding of how to improve the code where necessary?
I'm interested in things like Efferent Coupling, and Cyclomatic Complexity, etc, rather than lines of code or lines per method.
I don't think that code metrics (sometimes referred to as software metrics) provide valuable data in terms of where you can improve.
With code metrics it is sort of nice to see how much code you write in an hour etc., but beyond that they tell you nada about the quality of the code written, its documentation and code coverage. They are pretty much a weak attempt to measure what you cannot really measure.
Code metrics also discriminate against the programmers who solve the harder problems, because they obviously managed to code less. Yet they solved the hard issues, while a junior programmer whipping out lots of crap code looks good.
Another example for using metrics is the very popular Ohloh. They employ metrics to put a price tag on an opensource project (using number of lines, etc.), which in itself is an attempt which is flawed as hell - as you can imagine.
Having said all that the Wikipedia entry provides some overall insight on the topic, sorry to not answer your question in a more supportive way with a really great website or book, but I bet you got the drift that I am not a huge fan. :)
Something to employ to help you improve would be continuous integration and adhering to some sort of standard when it comes to code, documentation and so on. That is how you can improve. Metrics are just eye candy for meetings - "look we coded that much already".
Update
Ok, well my point being efferent coupling or even cyclomatic complexity can indicate something is wrong - it doesn't have to be wrong though. It can be an indicator to refactor a class but there is no rule of thumb that tells you when.
IMHO a rule such as "500+ lines of code, refactor" or the DRY principle is more applicable in most cases. Sometimes it's as simple as that.
I'll grant you that, since cyclomatic complexity can be graphed as a flow chart, it can be an eye opener. But again, use it carefully.
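For readers who want a feel for what cyclomatic complexity actually counts, here is a simplified sketch that approximates it for Python source as 1 plus the number of decision points. It is not how the Eclipse Metrics Plugin computes the value (that tool analyzes Java), and the set of node types counted is an assumption that ignores several constructs, such as conditions inside comprehensions.

```python
import ast

# Node types counted as one extra decision point each (a simplification).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' adds two extra paths through the condition.
            complexity += len(node.values) - 1
    return complexity
```

A straight-line function scores 1, and every `if`, loop or boolean operator adds another independent path. That is why a high value hints at code that is hard to test, without proving that anything is wrong.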
In my opinion metrics are an excellent way to find pain points in your codebase. They are very useful also to show your manager why you should spend time improving it.
This is a post I wrote about it: http://blog.jorgef.net/2011/12/metrics-in-brownfield-applications.html
I hope it helps

Time estimate for ABAP development

I'm looking for a table or list of standard time estimations for developments in ABAP, something customizable in some variables according to the development team, complexity of project, etc...
Something similar to:
Simple Module Pool -> 10 hours
Complex Module Pool -> 30 hours
Definition of Dictionary -> (0,4 * number_of_tables * average_fields ) hours
ALV Report -> (2 * number_of_parameters) hours
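Written out as code, the illustrative formulas above might look like the sketch below. The numbers are the made-up ones from this question, not an official or standard table, and the `calibrate` helper (scaling by a team's past actual-to-estimate ratio) is a hypothetical way to make the result "customizable according to the development team".

```python
def estimate_hours(kind: str, **params) -> float:
    """Evaluate the illustrative formulas from the question (not a standard)."""
    if kind == "simple_module_pool":
        return 10.0
    if kind == "complex_module_pool":
        return 30.0
    if kind == "dictionary":
        # "0,4" in the question is the decimal-comma form of 0.4.
        return 0.4 * params["number_of_tables"] * params["average_fields"]
    if kind == "alv_report":
        return 2.0 * params["number_of_parameters"]
    raise ValueError(f"no formula for {kind!r}")


def calibrate(estimate: float, past_estimates, past_actuals) -> float:
    """Scale an estimate by the team's historical actual/estimate ratio."""
    factor = sum(past_actuals) / sum(past_estimates)
    return estimate * factor
```

For example, a dictionary definition with 5 tables averaging 10 fields comes out at about 20 hours before calibration.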
I've searched but haven't found anything yet. I found AboveSoft Adaptive Estimator, which looks like a software tool to do what I need, but I prefer something... manual, an official or standard table.
Do you know anything like that?
Thank you in advance.
Updated, as requested in comments by Rob S., to provide more information for future similar questions:
What I'm looking for is a bunch of formulas, any metric system that can be applicable to (or even created for) time estimations on SAP development.
I'm looking for a technique/tool/method to estimate SAP work, duration, cost, something similar to COCOMO II, FP, ESTIMACS or SLIM for SAP development.
If I am reading this right, you are looking for something to estimate how long it would take someone to program an application. I doubt an official table actually exists.
Development time is highly variable. Programmer experience, complexity of requirements, clarity of requirements, and dozens of other factors affect how much time development takes. So even if an official table exists, it may not be accurate.
the formulas you made up for illustration purposes in your question are as good as any others - in other words, you are asking for something that is pointless.
the reason is that no formula can account for the truly important variables:
your team
your customer
your environment
your standards and best practices
all of which will have a much larger drag coefficient than any other terms
if you want accurate estimates, ask your developers, and track their accuracy
if you truly think that this sort of thing can be reduced to formulas, please resign as a project manager immediately
I'm not a project manager, I'm only an intern on an SAP team. From my experience in other languages I DO know that there are so many variables that it's impossible to automate an estimation of development time.
But I've been given the task of searching for a "standard table of estimated times" for SAP/ABAP developments and, being a newbie in SAP, I imagined that some standardized metric might exist.
I think I've been the victim of a rough joke from my manager...
Sorry for the inconvenience of my question.
You can use Excel, Numbers or Gantt charts to do it manually, but you won't be able to find ANY automated thing for that, you'll have to do it yourself!
Let me guess... you're a project manager?
There is no "one way" in programming, especially not in the highly specialized world of ABAP.
Hi, I understand your need...
I'm a project manager and estimation specialist, and what you're looking for is a table for estimating the effort to develop ABAP components...
You need a lookup table where, based on the complexity of the component and the complexity of the change, you can get an estimated effort. (This is based on an estimation method called Object Points http://yunus.hacettepe.edu.tr/~sencer/objectp.html)
This effort covers only coding & unit testing. You as project manager (or your project manager!) must take this estimation as input, but you need to estimate other project factors to get the complete "project estimation"...
I didn't find this table, or any standard table to benchmark against, so I'm working on a project within my software factory to build our own table...
I hope this will be helpful for you..
Regards,
This simply doesn't exist. General metrics are so general that they are useless. Project managers should find other ways to make their life easier, like resigning, but not try to quantify development as if it were peeling potatoes.
Try AboveSoft's new tool, named AboveSoft Predictor. You can download it here www.abovesoft.com
It connects to an SAP system and you can easily (graphically) generate your own estimation templates which are saved in SAP.
Dave.

How to convince my co-workers not to use datasets for enterprise development (.NET 2.0+)

Everyone I work with is obsessed with the data-centric approach to enterprise development and hates the idea of using custom collections/objects. What is the best way to convince them otherwise?
Do it by example and tread lightly. Anything stronger will just alienate you from the rest of the team.
Remember to consider the possibility that they're onto something you've missed. Being part of a team means taking turns learning & teaching.
No single person has all the answers.
If you are working on legacy code (e.g., apps ported from .NET 1.x to 2.0 or 3.5) then it would be a bad idea to depart from datasets. Why change something that already works?
If you are, however, creating new apps, there are a few things that you can cite:
Appeal to experiencing pain in maintaining apps that stick with DataSets
Cite performance benefits for your new approach
Bait them with a good middle ground. Move to .NET 3.5 and promote LINQ to SQL, for instance: while still sticking to a data-driven architecture, it is a huge, huge departure from string-indexed data sets, and enforces... voila! Custom collections -- in a manner that is hidden from them.
What is important is that whatever approach you use you remain consistent, and you are completely honest with the pros and cons of your approaches.
If all else fails (e.g., you have a development team that utterly refuses to budge from old practices and is skeptical of learning new things), this is a very, very clear sign that you've outgrown your team and it's time to leave your company!
Remember to consider the possibility that they're onto something you've missed. Being part of a team means taking turns learning & teaching.
Seconded. The whole idea that "enterprise development" is somehow distinct from (and usually the implication is 'more important than') normal development really irks me.
If there really is a benefit for using some technology, then you'll need to come up with a considered list of all the pros and cons that would occur if you switched.
Present this list to your co workers along with explanations and examples for each one.
You have to be realistic when creating this list. You can't just say "Saves us lots of time!!! WIN!!" without addressing the fact that sometimes it is going to take MORE time, will require X months to come up to speed on the new tech, etc. You have to show concrete examples where it will save time, and exactly how.
Likewise you can't just skirt over the cons as if they don't matter, your co-workers will call you on it.
If you don't do these things, or come across as just pushing what you personally like, nobody is going to take you seriously, and you'll just get a reputation for being the guy who's full of enthusiasm and energy but has no idea about anything.
BTW. Look out for this particular con. It will trump everything, unless you have a lot of strong cases for all your other stuff:
Requires 12+ months work porting our existing code. You lose.
Of course, "it depends" on the situation. Sometimes DataSets or DataTables are more suited, like if it really is pretty light business logic, flat hierarchy of entities/records, or featuring some versioning capabilities.
Custom object collections shine when you want to implement a deep hierarchy/graph of objects that cannot be efficiently represented in flat 2D tables. What you can demonstrate is a large graph of objects and getting certain events to propagate down the correct branches without invoking inappropriate objects in other branches. That way it is not necessary to loop or Select through each and every DataTable just to get the child records.
For example, in a project I got involved in two and a half years ago, there was a UI module that was supposed to display questions and answer controls in a single WinForms DataGrid (to be more specific, it was Infragistics' UltraGrid). Some of the trickier requirements:
The answer control for a question can be anything - text box, check box options, radio button options, drop-down lists, or even to pop up a custom dialog box that may pull more data from a web service.
Depending on what the user answered, it can trigger more sub-questions to appear directly under the parent question. If a different answer is given later, it should expose another set of sub-questions (if any) related to that answer.
The original implementation was written entirely in DataSets, DataTables, and arrays. The amount of looping through hundreds of rows across multiple tables was purely mind-bending. It did not help that the programmer came from a C++ background and attempted to ref everything (hello, objects living in the heap use reference variables, like pointers!). Nobody, not even the original programmer, could explain why the code did what it did. I came onto the scene more than six months after this, and it was still flooded with bugs. No wonder the 2nd-generation developer I took over from decided to quit.
After two months of trying to fix the chaotic mess, I took it upon myself to redesign the entire module into an object-oriented graph to solve this problem. Yep, complete with abstract classes (to render a different answer control in a grid cell depending on question type), delegates and eventing. The end result was a 2D DataGrid bound to a deep hierarchy of questions, naturally sorted according to the parent-child arrangement. When a parent question's answer changed, it would raise an event to the child questions and they would automatically show/hide their rows in the grid according to the parent's answer. Only question objects down that path were affected. The UI responsiveness of this solution was better than the old method by orders of magnitude.
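To make the idea concrete, here is a minimal sketch of that kind of question tree. It is written in Python rather than C# purely for brevity, and it is not the original code: the `Question` class, its `shown_when` field and the sample questions are all made up for illustration. A plain recursive call stands in for the delegates and events described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Question:
    """One node in the question tree (hypothetical model, not the original code)."""
    text: str
    # Parent answer that makes this question visible; None means "always shown".
    shown_when: Optional[str] = None
    answer: Optional[str] = None
    visible: bool = True
    children: list[Question] = field(default_factory=list)

    def add_child(self, child: Question) -> Question:
        # A new child starts out visible only if the current answer already matches.
        child.visible = self.visible and child.shown_when in (None, self.answer)
        self.children.append(child)
        return child

    def set_answer(self, answer: str) -> None:
        """Record the answer and notify only this question's own subtree."""
        self.answer = answer
        self._refresh_children()

    def _refresh_children(self) -> None:
        for child in self.children:
            child.visible = self.visible and child.shown_when in (None, self.answer)
            child._refresh_children()  # hidden parents hide their descendants too

    def visible_rows(self, depth: int = 0) -> list[str]:
        """Flatten the visible part of the tree into grid rows, parents first."""
        if not self.visible:
            return []
        rows = ["  " * depth + self.text]
        for child in self.children:
            rows.extend(child.visible_rows(depth + 1))
        return rows


root = Question("Do you own a car?")
root.add_child(Question("Which make?", shown_when="yes"))
root.add_child(Question("How do you commute?", shown_when="no"))
root.set_answer("yes")
print(root.visible_rows())
```

Changing the answer touches only the children of the question that changed, so no other branch (and no unrelated table) has to be scanned.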
Ironically, I wanted to post a question that was the exact opposite of this. Most of the programmers I've worked with have gone with the custom data objects/collections approach. It breaks my heart to watch someone with their SQL Server table definition open on one monitor, slowly typing up a matching row-wrapper class in Visual Studio in another monitor (complete with private properties and getters-setters for each column). It's especially painful if they're also prone to creating 60-column tables. I know there are ORM systems that can build these classes automagically, but I've seen the manual approach used much more frequently.
Engineering choices always involve trade-offs between the pros and cons of the available options. The DataSet-centric approach has its advantages (db-table-like in-memory representation of actual db data, classes written by people who know what they're doing, familiar to a large pool of developers etc.), as do custom data objects (compile-time checking, users don't need to learn SQL etc.). If everyone else at your company is going the DataSet route, it's at least technically possible that DataSets are the best choice for what they're doing.
DataSets/DataTables aren't so bad, are they?
The best advice I can give is to use it as much as you can in your own code; hopefully, through peer reviews and bug fixes, the other developers will see how the code becomes more readable. (Make sure to push the point when these occurrences happen.)
Ultimately, my view is that if the code works, the rest is semantics.
I guess you can try selling the idea of O/R mapping and mapper tools. The benefit of treating rows as objects is pretty powerful.
I think you should focus on performance. See if you can create an application that shows the performance difference between DataSets and custom entities. Also, try to show them Domain-Driven Design principles and how they fit with entity frameworks.
Don't make it a religion or faith discussion. Those are hard to win (and are not what you want anyway).
Don't frame it the way you just did in your question. The issue is not getting anyone to agree that this way or that way is the general way they should work. You should talk about how each person needs to think in order to make the right choice at any given time. Give an example of when to use a DataSet, and when not to.
I had developers using DataTables to store data they fetched from the database, with business logic code then working on that DataTable. I showed them how I reduced the time to load a page from 7 seconds of 100% CPU (on the web server) to not being able to see the CPU line move at all, just by changing the in-memory object from a DataTable to a hash table.
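The answer doesn't say how the DataTable was being used, but a common cause of this kind of CPU burn is repeated lookups by key that scan every row. A rough sketch of the difference, in Python for brevity and with made-up column names and data (an assumption, not the poster's actual code):

```python
# Rows as they might come back from a query; the columns are hypothetical.
rows = [{"id": i, "name": f"customer-{i}"} for i in range(10_000)]


def find_by_scan(rows, customer_id):
    """Table-style lookup: walk the rows until one matches (O(n) per call)."""
    for row in rows:
        if row["id"] == customer_id:
            return row
    return None


# Build the index once, right after loading the data.
by_id = {row["id"]: row for row in rows}


def find_by_key(index, customer_id):
    """Hash-table lookup: a single hashed probe per call (O(1) on average)."""
    return index.get(customer_id)
```

Both functions return the same row; the second one just stops paying for a full pass over the table on every call, which adds up quickly when the lookup sits inside a loop.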
So take an example or case that you think is better implemented differently, and win that battle. Don't fight the high-level war.
If interoperability is or will be a concern down the line, DataSet is definitely not the right direction to go in. You CAN expose DataSets/DataTables over a service, but whether you SHOULD is debatable. If you are talking .NET->.NET you're probably OK; otherwise you are going to have a very unhappy client developer on the other side of the fence consuming your service.
You can't convince them otherwise. Pick a smaller challenge or move to a different organization. If your manager respects you, see if you can do a project in the domain-driven style as a sort of technology trial.
If you can profile, just do it and profile. DataSets are heavier than a simple Collection<T>.
DataReaders are faster than using Adapters...
Changing behavior in an object is much easier than massaging a DataSet.
Anyway: Just Do It, ask for forgiveness not permission.
Most programmers don't like to stray out of their comfort zones (note that the intersection of the 'most programmers' set and the 'Stack Overflow' set is probably the empty set). "If it worked before (or even just worked) then keep on doing it". The project I'm currently on required a lot of argument to get the older programmers to use XML/schemas/data sets instead of just CSV files (the previous version of the software used CSVs). It's not perfect; the schemas aren't robust enough at validating the data. But it's a step in the right direction. The code I develop uses OO abstractions on the data sets rather than passing data set objects around. Generally, it's best to teach by example, one small step at a time.
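As a small illustration of "OO abstractions on the data sets", here is one possible shape, sketched in Python with invented names (`OrderBook`, the column names and the sample rows are all hypothetical). The raw table stays private and callers only see domain-level methods:

```python
class OrderBook:
    """Domain-facing wrapper around raw table rows (hypothetical example)."""

    def __init__(self, rows):
        self._rows = list(rows)  # the raw data set never leaves this class

    def customers(self):
        """Distinct customer names, sorted."""
        return sorted({row["customer"] for row in self._rows})

    def total_for(self, customer):
        """Order total in cents for one customer."""
        return sum(
            row["quantity"] * row["unit_price_cents"]
            for row in self._rows
            if row["customer"] == customer
        )


orders = OrderBook([
    {"customer": "acme", "quantity": 2, "unit_price_cents": 500},
    {"customer": "acme", "quantity": 1, "unit_price_cents": 250},
    {"customer": "globex", "quantity": 4, "unit_price_cents": 100},
])
print(orders.customers(), orders.total_for("acme"))
```

If the storage later changes (CSV, XML, a real database), only the wrapper has to change, which is also what makes it an easy first step to demonstrate to sceptical colleagues.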
There is already some very good advice here, but you'll still have a job convincing your colleagues if all you have to back you up is a few supportive comments on Stack Overflow.
And, if they are as sceptical as they sound, you are going to need more ammo.
First, get a copy of Martin Fowler's "Patterns of Enterprise Application Architecture", which contains a detailed analysis of a variety of data access techniques.
Read it.
Then force them all to read it.
Job done.
Data-centric means less code complexity.
Custom objects mean potentially hundreds of additional objects to organize, maintain, and generally live with. It's also going to be a bit faster.
I think it's really a code-complexity vs performance question, which can be answered by the needs of your app.
Start small. Is there a utility app you can use to illustrate your point?
For instance, at a place where I worked, the main application had a complicated build process, involving changing config files, installing a service, etc.
So I wrote an app to automate the build process. It had a rudimentary WinForms UI. But since we were moving towards WPF, I changed it to a WPF UI, while keeping the WinForms UI as well, thanks to Model-View-Presenter. For those who weren't familiar with Model-View-Presenter, it was an easily-comprehensible example they could refer to.
Similarly, find something small where you can show them what a non-DataSet app would look like without having to make a major development investment.