How to quickly analyse the impact of a program change? - code-analysis

Lately I need to do an impact analysis on changing a DB column definition of a widely used table (like PRODUCT, USER, etc). I find it is a very time consuming, boring and difficult task. I would like to ask if there is any known methodology to do so?
The question also apply to changes on application, file system, search engine, etc. At first, I thought this kind of functional relationship should be pre-documented or some how keep tracked, but then I realize that everything can have changes, it would be impossible to do so.
I don't even know what should be tagged to this question, please help.
Sorry for my poor English.

Sure. One can technically at least know what code touches the DB column (reads or writes it), by determining program slices.
Methodology: Find all SQL code elements in your sources. Determine which ones touch the column in question. (Careful: SELECT ALL may touch your column, so you need to know the schema). Determine which variables read or write that column. Follow those variables wherever they go, and determine the code and variables they affect; follow all those variables too. (This amounts to computing a forward slice). Likewise, find the sources of the variables used to fill the column; follow them back to their code and sources, and follow those variables too. (This amounts to computing a backward slice).
All the elements of the slice are potentially affecting/affected by a change. There may be conditions in the slice-selected code that are clearly outside the conditions expected by your new use case, and you can eliminate that code from consideration. Everything else in the slices you may have inspect/modify to make your change.
Now, your change may affect some other code (e.g., a new place to use the DB column, or combine the value from the DB column with some other value). You'll want to inspect up and downstream slices on the code you change too.
You can apply this process for any change you might make to the code base, not just DB columns.
Manually this is not easy to do in a big code base, and it certainly isn't quick. There is some automation to do for C and C++ code, but not much for other languages.
You can get a bad approximation by running test cases that involve you desired variable or action, and inspecting the test coverage. (Your approximation gets better if you run test cases you are sure does NOT cover your desired variable or action, and eliminating all the code it covers).

Eventually this task cannot be automated or reduced to an algorithm, otherwise there would be a tool to preview refactored changes. The better you wrote code in the beginning, the easier the task.
Let me explain how to reach the answer: isolation is the key. Mapping everything to object properties can help you automate your review.
I can give you an example. If you can manage to map your specific case to the below, it will save your life.
The OR/M change pattern
Like Hibernate or Entity Framework...
A change to a database column may be simply previewed by analysing what code uses a certain object's property. Since all DB columns are mapped to object properties, and assuming no code uses pure SQL, you are good to go for your estimations
This is a very simple pattern for change management.
In order to reduce a file system/network or data file issue to the above pattern you need other software patterns implemented. I mean, if you can reduce a complex scenario to a change in your objects' properties, you can leverage your IDE to detect the changes for you, including code that needs a slight modification to compile or needs to be rewritten at all.
If you want to manage a change in a remote service when you initially write your software, wrap that service in an interface. So you will only have to modify its implementation
If you want to manage a possible change in a data file format (e.g. length of field change in positional format, column reordering), write a service that maps that file to object (like using BeanIO parser)
If you want to manage a possible change in file system paths, design your application to use more runtime variables
If you want to manage a possible change in cryptography algorithms, wrap them in services (e.g. HashService, CryptoService, SignService)
If you do the above, your manual requirements review will be easier. Because the overall task is manual, but can be aided with automated tools. You can try to change the name of a class's property and see its side effects in the compiler
Worst case
Obviously if you need to change the name, type and length of a specific column in a database in a software with plain SQL hardcoded and shattered in multiple places around the code, and worse many tables present similar column namings, plus without project documentation (did I write worst case, right?) of a total of 10000+ classes, you have no other way than manually exploring your project, using find tools but not relying on them.
And if you don't have a test plan, which is the document from which you can hope to originate a software test suite, it will be time to make one.

Just adding my 2 cents. I'm assuming you're working in a production environment so there's got to be some form of unit tests, integration tests and system tests already written.
If yes, then a good way to validate your changes is to run all these tests again and create any new tests which might be necessary.
And to state the obvious, do not integrate your code changes into the main production code base without running these tests.
Yet again changes which worked fine in a test environment may not work in a production environment.
Have some form of source code configuration management system like Subversion, GitHub, CVS etc.
This enables you to roll back your changes

Related

Forking (postgre)SQL database structure

I have been developing a network security application for several years now, as the lead developer at my company. It is a split-architecture design, where one component resides on the customer's network, and the other component in our own cloud. We have developed our own custom versioning system that keeps both sides synchronized at each patch (per customer), but until now it has only allowed incremental changes to be made, and rollbacks are not possible.
We'd like to move to a forkable git-like solution for our code, so that we can develop and test multiple features simultaneously, but the thing that's holding us back from that is our database. We use PostgreSQL (currently 9.3.12), and I've written a custom script to calculate the deltas between the "old" and "new" database structure, each time we "make a patch". It spits out a list of SQL commands necessary to update the "old" database structure to look like the "new", including tables, functions, sequences, triggers, you name it. It's very elegant and pretty much never fails anymore, even with complicated deltas.
However, I realize that in order to have a git-like solution for this (check-out, check-in, merge changes into test and production code, etc.) while also keeping database changes in sync with application code, we'll need to have something a lot more advanced than just "old" vs "new". Note that we don't need to modify database data for the most part, only table structure, which is altered in place on existing customer databases.
So my question is this: Any ideas for a git-like SQL version control system, which allows forking and merging, and can be easily kept in sync with application code changes? Our custom tool is already a bit more advanced than some open-source tools we've looked into (such as sqlt-diff), and tools like Red Gate are a bit out of our price range as a startup (not to mention that I haven't heard anybody mention forking in context with Red Gate). We're open to writing a custom tool, if that's what we need to do, but we're scratching our heads about where to start with something like that. We know how to calculate deltas, but we don't know how to manage all those things across different forks.
Free or open-source tools, frameworks we can adapt, or general guiding principles for building such tools are all appreciated!
One way of solving this problem is with migrations. A couple of lightweight tools, but there are many others:
http://sequel.jeremyevans.net/rdoc/files/doc/migration_rdoc.html
https://flywaydb.org/
Rather than calculating deltas between versions after the fact, migrations can be used to evolve the schema in a controlled way. You can create feature-specific migrations that can be tracked (and forked/merged) along with the rest of your code.
Depending on how fancy you want to get, you may need to extend the default naming/numbering schemes.

Automatic verification of database contents

Background
I have a software component that writes data to a postgres database (into several tables) and I want to write an automatic functional test for this component. I already have a host of unit tests in place that check the subcomponents, but I'd like a test that checks the whole system end-to-end.
For each test run, I use a clean database (actually a completely new, this-test-run-only database). The software component is stable in the sense that given the same input, it will always write the same user data to the database.
The database design is relational, such that most tables contain foreign keys. Obviously, I don't want to check the value of these keys, because I don't want to rely on the fact that these keys are generated in a predictive manner by postgres.
Assume that there are no issues regarding user rights on the database, connection issues etc. Also disregard development/production disparities.
I currently use a number of select statements to produce a textual "dump" of the database and compare it to a reference dump (ignoring whitespace and so on), but this seems rather clumsy. Also, this doesn't take into account the relationships between the tables. Extending the current approach to deal with this doesn't strike me as maintainable at all, should the database layout ever change.
My software as well as the testing framework is written in C++, the testing scripts are simple bash scripts. I'm open to use any language to achieve this.
Question
How can I automatically verify the database contents in "the database way"?
Even better would be an approach that doesn't rely on postgres as the backend.
pgTap is a testing framework for PostgreSQL. You can use it to test both the structure and the content of a PostgreSQL database. I've used it on projects that had to meet certain contractual standards for seeded data (data for "lookup" tables like state codes and abbreviations, delivery carriers, user roles, etc.). It has worked well for that purpose.
But I don't yet see a compelling reason to abandon your current method, which is already written and working. Text dumps of single tables are supported by all current SQL dbms, as far as I know. If you move to a different dbms, you'll have to change the name of the dump program and the arguments to it. I can't imagine why you'd need to change the reference file, but I suppose that could happen.
The "database way" is really just to select the data you expect to be in the database, and see if it's really there. That's pretty much what you're doing now, and what pgTap does with perhaps greater flexibility.
To increase maintainability (to reduce duplication), you could generate the INSERT statements from the reference data, or you could generate the reference data from the INSERT statements. I can imagine development environments where that would be a wise thing to do, but I don't know whether yours is one of them.

How to functionally test an extremely complex system?

I've got a legacy system that processes extremely complex data that's changing every second. The modularity of the system is quite poor so I can't split the business logic into smaller modules to ease functional testing.
The actual test system is: "close your eyes click and pray", which is not acceptable at all. I want to be confident on the changes we commit on the code.
What are the test good practices, the bibles to read, the changes to operate, to increase confidence in such a system.
The question is not about unit testing, the system wasn't designed for that and it takes too much time to decouple, mock and stub all the dependencies and most of all, we sadly don't have the time and budget for that. I don't want to a philosophic debate about functional testing: I want facts that work in real life.
It sounds like you have yourself a black box as regards testing.
http://en.wikipedia.org/wiki/Black-box_testing
To put it simply, it's horrible, but may be all you can do if you can't isolate the system in any way.
You need to insert known data into your system and compare the result with the known output.
You really need known data & output for
normal values - normal data - you'll find out that it can at least seem to do the right thing
erroneous values - spelling errors, invalid values - so you know that it will tell you if the input is rubbish
out of range - -1 on signed integers, values greater than 2.7 billion (assuming 32bit), and so on - so you know it won't crash out on seriously mis-inputted or corrupted data
dangerous - input that would break the SQL, simulate SQL injection
Lastly make sure that all errors are being carefully handled rather than getting logged and the bad/corrupt/null value getting passed on through the system.
Any processes you can isolate and test that way will make debugging easier, as black box testing can't tell you where the error occurred. This means then you need to diagnose the errors based on what happened, more in the style of House MD than a normal debugging session.
Once you have the different data types listed above, you can test all changes in isolation with them, and then in the system as a whole. Over time as you eventually touch most aspect of the system, you'll have test cases for all areas, and be able to say where failure was most likely to have occurred more easily.
Also: make sure you put tracers in your known data so you don't accidentally indicate a stockmarket crash when you're testing the range limits on a module, so you can take it out of the result flow before it ends up on a CEO's desk.
I hope that's some help
http://www.amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/0131177052 seems to be the book for these situations.

Scripting your database first versus building the database via SQL Server Management Studio and then generating the script

I had a (friendly but heated) argument with my lead developer the other day because our project has TSQL Scripts that I code directly into SQL files which I then run against the database. I find that when I do this, it's easy to work out the schema in advance without fiddly pointing and clicking and then there's no opportunity to forget to generate a script to put into source control as generating the script no longer becomes a chore you have to do after the fact, but is an implicit part of the process (and also leads to cleaner scripts without the extra crap that SQL Server Management Studio inserts into the scripts it generates).
My lead developer insists that having to manually script it out is a pain in the arse and that he absolutely refuses to write his scripts by hand when there are perfectly good tools to do it without coding. I've noticed that the copying of his changes into the actual scripts tends to get delayed a bit as a result though.
What are your thoughts on the pros and/or cons of doing it one way vs the other? Am I being too rigid/old-school in my sticking to hand coding schema scripts or is he being too reliant on third party tools and losing something in the process?
I always script stuff myself because the wizards sometimes don't script things in a way that I like it and will also give funky names to defaults
scripting things yourself is also good in case you get laid off and you have to go for an interview where they ask you to script DDL on the whiteboard
As I usually collaborate with a colleague during the schema design, I tend to design the schema using the GUI tools, as its easier to discuss it with a diagram of the tables in front of you. I then generate the scripts, being careful to select the exact options that I want to avoid having to make manual changes post-export.
I think a decision on the relative merits of the two approaches might take into account factors such as
the frequency of changes to the schema
the frequency with which changes need to be propagated to other schemas (test, user acceptance, production, clients * n, etc)
the degree to which the schema may vary across development branches
how well-known in advance your various changes can be scheduled
whether or not you can generate SQL "diff" scripts between schemas.
On balance, I tend to prefer to work with a script for each change (or "migration"). It lets me resequence change releases as priorities shift.
Just because you can create tables in a graphical tool doesn't necessarily mean you should.
I find its as quick to write a script as it is to use SQLMS. You still have to type names in SQLMS, and the time spent moving from keyboard and mouse could be used writing the proper script anyway.
The two of you are almost working with two sets of code. Consistency seems to be a key factor on these types of decisions. In your case, if you create a script, your boss uses the gui to add a field, how do you stay in sync? You can't use your script to rebuild the table without editing it (Chance for error.).
Maybe he should pull rank and force you to format your scripts the same way the GUI creates them - just kidding.
I think you should flip on it..........

NHibernate - Usefulness

I work in a software and hardware development farm. Today one of my colleagues told me that NHibernate is only useful for small projects, and for complex or large scale projects it must be avoided. Also, it makes code harder to change.
Are those statements true?
Ebay uses Hibernate (the Java version that NHibernate is ported from). I don't consider that a small project.
As far as changing code goes, consider this: Let's assume we need to add a new property to an object.
Here is what has to be done with a hand-rolled data access layer:
Add the column to the db table.
Change every stored procedure that
deals with that object / table.
This is usually several stored
procedures in my experience.
Change the code in the mapping layer
Add the property to the Object
Here is what has to be done with NHibernate:
Add the column to the db table.
Add the property to the HBM file
Add the property to the object.
Have to agree with Daniel Augur on the first point.
On the second, "does it make code harder to change?", I'll provide a general view. Any time you use something ready-rolled you're going to run into restrictions that might not be easier to overcome. Even when the source is available, you may not wish to modify it for fear of deviating to the point of a breaking change.
Part of a software developer's job is determining whether the merits outweigh the drawbacks with 3rd party code.