Is there a way to delete outdated code without manually cross-checking? - vba

I'm working on a VBA file which has become extremely congested with test sub-routines that I no longer need. I have three routines which Call a number of other routines, and everything that isn't called in these routines is of no use.
Is there any way of identifying these routines and deleting them, without manually cross-checking them against my Calls?
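One way to avoid the manual cross-check is to treat the routines as a call graph and keep only what is reachable from your three entry points. The Python sketch below is a rough, text-based approximation. It assumes you have exported your modules as text (File > Export File in the VBE) and concatenated them into one string. It treats any word in a routine's body that matches another routine's name as a call, and it ignores comments and string literals. The function names are hypothetical and not part of any existing tool.

```python
import re
from collections import deque

# Declarations such as "Sub Foo", "Private Function Bar", "Public Property Get Baz".
DECL = re.compile(
    r"^\s*(?:(?:Public|Private|Friend)\s+)?(?:Static\s+)?"
    r"(?:Sub|Function|Property\s+(?:Get|Let|Set))\s+(\w+)",
    re.IGNORECASE | re.MULTILINE,
)
END = re.compile(r"^\s*End\s+(?:Sub|Function|Property)\b", re.IGNORECASE | re.MULTILINE)


def split_routines(source):
    """Return {routine name: body text} for each Sub/Function/Property in `source`."""
    routines = {}
    for match in DECL.finditer(source):
        end = END.search(source, match.end())
        routines[match.group(1)] = source[match.end():end.start() if end else len(source)]
    return routines


def strip_strings_and_comments(text):
    """Blank out "string literals" first, then drop ' comments to end of line."""
    text = re.sub(r'"[^"\n]*"', '""', text)
    return re.sub(r"'[^\n]*", "", text)


def unreachable_routines(source, entry_points):
    """Names of routines that cannot be reached from `entry_points` (case-insensitive)."""
    routines = split_routines(source)
    by_lower = {name.lower(): name for name in routines}  # VBA names ignore case
    calls = {}
    for name, body in routines.items():
        words = set(re.findall(r"\w+", strip_strings_and_comments(body).lower()))
        calls[name.lower()] = {w for w in words if w in by_lower and w != name.lower()}

    # Breadth-first search over the call graph, starting from the entry points.
    seen = {e.lower() for e in entry_points}
    queue = deque(seen)
    while queue:
        for callee in calls.get(queue.popleft(), ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return sorted(name for key, name in by_lower.items() if key not in seen)
```

Calling `unreachable_routines(source, [...])` with the names of your three top-level routines returns candidates for deletion. Treat the result as a review list, not a delete list. Routines invoked indirectly (event handlers, `Application.Run "Name"`, button `OnAction` macros) look unreachable to this heuristic and should be added as entry points, and routines with the same name in different modules are merged. Keep a backup before deleting anything.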


Can a program have a fault that dynamic testing does not reveal as a failure?

Is it possible for a program to have a fault that dynamic testing does not reveal as a failure? Is there a simple example?
Please help! Thanks.
Yes. Testing can only prove the absence of bugs for what you tested. Dynamic testing cannot cover all possible inputs and outputs in all environments with all dependencies.
The first way is simply not to test the code in question. This can be checked by measuring the coverage of your tests. Even if you achieve 100% coverage, there can still be flaws.
Next is not checking all possible types and ranges of inputs. For example, if you have a function that scans for a word in a string, you need to check for...
The word at the start of the string.
The word at the end of the string.
The word in the middle of the string.
A string without the word.
The empty string.
These are known as boundary conditions and include things like:
0
Negative numbers
Empty strings
Null
Extremely large values
Decimals
Unicode
Empty files
Extremely large files
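As a simple example of a fault that passes all five word-scanning checks above, consider a naive substring test, sketched in Python with illustrative function names. It finds the word at the start, end and middle, and correctly rejects the other two strings, yet it reports "cat" as present in "concatenate". Only a test with the word embedded in a longer word would expose the fault.

```python
def contains_word_naive(text: str, word: str) -> bool:
    # Substring check: looks right, but also matches "cat" inside "concatenate".
    return word in text


def contains_word(text: str, word: str) -> bool:
    # Whole-word check: split on whitespace and compare whole tokens.
    return word in text.split()


# The five cases listed above, as (text, expected result) for the word "cat".
CASES = [
    ("cat sat", True),      # word at the start
    ("the cat", True),      # word at the end
    ("a cat sat", True),    # word in the middle
    ("a dog sat", False),   # string without the word
    ("", False),            # the empty string
]

for text, expected in CASES:
    assert contains_word_naive(text, "cat") == expected
    assert contains_word(text, "cat") == expected
```

The whole-word version fixes that case but has its own untested fault: with `split()`, "cat." at the end of a sentence does not match "cat". Each new class of input can reveal a fault that the earlier tests could not.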
If the code in question keeps state, maybe in an object, maybe in global variables, you have to test that state does not become corrupted or interfere with subsequent runs.
If you're doing parallel processing you must test any number of possibilities for deadlocks or corruption resulting from trying to do the same thing at the same time. For example, two processes trying to write to the same file. Or two processes both waiting for a lock on the same resource. Do they lock only what they need? Do they give up their locks ASAP?
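For the case of two workers waiting on each other's locks, one common discipline is to acquire locks in a single global order. Here is a minimal Python sketch using threads rather than processes; ordering the locks by `id()` is an arbitrary but consistent choice assumed for illustration. Two workers can request the same pair of locks in opposite orders; without the sorting step, each could hold one lock while waiting forever for the other.

```python
import threading
from contextlib import ExitStack


def acquire_in_order(stack, *locks):
    # Take every lock in one fixed global order (by id() here) so that two threads
    # needing the same locks can never each hold one while waiting for the other.
    for lock in sorted(locks, key=id):
        stack.enter_context(lock)


def worker(first, second, counts, name, iterations):
    for _ in range(iterations):
        with ExitStack() as stack:          # releases the locks in reverse order on exit
            acquire_in_order(stack, first, second)
            counts[name] += 1               # critical section using both resources
```

A test that starts two workers with the locks passed in opposite orders and joins them with a timeout can detect a hang in the runs you actually executed, but, as with every dynamic test, a clean run does not prove that a deadlock is impossible.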
Once you test all the ways the code is supposed to work, you have to test all the ways it can fail: whether it fails gracefully with an exception (instead of returning garbage), whether an error leaves it in a corrupted state, and so on. How does it handle resource failure, like failing to connect to a database? This becomes particularly important when working with databases and files, to ensure a failure doesn't leave things partially altered.
For example, if you're transferring money from one account to another you might write:
```perl
my $from_balance = get_balance($from);
my $to_balance = get_balance($to);
set_balance($from, $from_balance - $amount);
set_balance($to, $to_balance + $amount);
```
What happens if the program crashes after the first set_balance? What happens if another process changes either balance between get_balance and set_balance? These sorts of concurrency issues must be thought of and tested.
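One way to address both questions is to let the database do the work: run both updates in one transaction, so a crash or error after the first update rolls it back, and compute the new balance inside the UPDATE instead of reading it first, so no stale value is written back. The sketch below uses Python's built-in `sqlite3` module and a hypothetical `accounts` table; it replaces the `get_balance`/`set_balance` calls from the Perl snippet rather than reproducing them. On a multi-user server database you would still need to check how its isolation level and locking behave.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical schema for illustration: one row per account.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def balance(conn: sqlite3.Connection, account: str) -> int:
    return conn.execute("SELECT balance FROM accounts WHERE id = ?", (account,)).fetchone()[0]


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    # Using the connection as a context manager commits on success
    # and rolls back if anything inside the block raises.
    with conn:
        # The read-modify-write happens inside a single UPDATE statement,
        # so there is no gap between a separate "get" and "set".
        debited = conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?",
            (amount, src, amount),
        )
        if debited.rowcount != 1:
            raise ValueError("unknown source account or insufficient funds")
        credited = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, dst),
        )
        if credited.rowcount != 1:
            # Raising here undoes the debit above.
            raise ValueError("unknown destination account")
```

A test for the crash scenario can then transfer to an account that does not exist and check that the source balance is unchanged afterwards.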
Then there are all the different environments the code could run in: different operating systems, different compilers, different dependencies, different databases, all with different versions. All of these have to be tested.
The test can simply be wrong. It can be a mistake in the test. It can be a mistake in the spec. Generally one tests the same code in different ways to avoid this problem.
The test can be right, the spec can be right, but the feature is wrong. It could be a bad design. It could be a bad idea. You can argue this isn't a "bug", but if the users don't like it, it needs to be fixed.
If your testing makes heavy use of mocking, your mocks may not reflect how the thing being mocked actually behaves.
And so on.
Despite all these flaws, dynamic testing remains the best we've got for testing more than a few dozen lines of code.

How to quickly analyse the impact of a program change?

Recently I needed to do an impact analysis on changing the definition of a DB column in a widely used table (like PRODUCT, USER, etc.). I find it a very time-consuming, boring and difficult task. Is there any known methodology for this?
The question also applies to changes in applications, file systems, search engines, etc. At first, I thought this kind of functional relationship should be documented in advance or somehow tracked, but then I realized that everything can change, so it would be impossible to do so.
I don't even know what this question should be tagged with, please help.
Sorry for my poor English.
Sure. One can technically at least know what code touches the DB column (reads or writes it), by determining program slices.
Methodology: Find all SQL code elements in your sources. Determine which ones touch the column in question. (Careful: SELECT * may touch your column without naming it, so you need to know the schema.) Determine which variables read or write that column. Follow those variables wherever they go, determine the code and variables they affect, and follow all those variables too. (This amounts to computing a forward slice.) Likewise, find the sources of the variables used to fill the column; follow them back to their code and sources, and follow those variables too. (This amounts to computing a backward slice.)
All the elements of the slices potentially affect, or are affected by, the change. There may be conditions in the slice-selected code that are clearly outside the conditions expected by your new use case, and you can eliminate that code from consideration. Everything else in the slices you may have to inspect or modify to make your change.
Now, your change may affect some other code (e.g., a new place that uses the DB column, or code that combines the value from the DB column with some other value). You'll want to inspect the upstream and downstream slices of the code you change, too.
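At its core, computing these slices is graph reachability. The Python sketch below assumes you have already extracted "value flows from A to B" facts from the code (the hard part, which this does not do); the variable names and the PRODUCT.PRICE column are hypothetical. A forward slice follows the edges from the column, and a backward slice follows them in reverse. Real slicers also track control dependencies, which this data-flow-only sketch ignores.

```python
from collections import deque


def reachable(seeds, edges):
    """All nodes reachable from `seeds` by following `edges` (node -> list of successors)."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        for nxt in edges.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(edges):
    """Flip every edge, so that reachability runs backward along the data flow."""
    flipped = {}
    for src, dsts in edges.items():
        for dst in dsts:
            flipped.setdefault(dst, []).append(src)
    return flipped


# Hypothetical "value flows from A to B" facts extracted from a code base.
flows = {
    "form.price_input": ["price_var"],
    "price_var": ["PRODUCT.PRICE"],            # INSERT/UPDATE writes the column
    "PRODUCT.PRICE": ["row.price"],            # SELECT reads the column
    "row.price": ["invoice_total", "report_line"],
    "tax_rate": ["invoice_total"],
    "invoice_total": ["email_body"],
}

forward_slice = reachable(["PRODUCT.PRICE"], flows)
backward_slice = reachable(["PRODUCT.PRICE"], reverse(flows))
```

Here `tax_rate` ends up in neither slice: it feeds `invoice_total`, but no value flows between it and the column. To follow the last step above, you would rerun both searches seeded with the code you changed.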
You can apply this process for any change you might make to the code base, not just DB columns.
Done manually, this is not easy in a big code base, and it certainly isn't quick. There is some tool support for doing this on C and C++ code, but not much for other languages.
You can get a rough approximation by running test cases that involve your desired variable or action and inspecting the test coverage. (Your approximation gets better if you also run test cases you are sure do NOT involve your desired variable or action, and eliminate all the code they cover.)
Ultimately this task cannot be fully automated or reduced to an algorithm; otherwise there would be a tool to preview refactored changes. The better you wrote the code in the beginning, the easier the task.
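The coverage approximation is a set difference. Assuming you can export each test run's coverage as a set of (file, line) pairs (how you obtain them depends on your coverage tool, so that part is assumed here), a Python sketch with a hypothetical helper looks like this:

```python
def candidate_lines(relevant_runs, unrelated_runs):
    """Lines covered by tests that exercise the variable or action,
    minus lines covered by tests known not to exercise it."""
    touched = set().union(*relevant_runs)
    background = set().union(*unrelated_runs)
    return touched - background


# Hypothetical coverage data: each run is a set of (file, line) pairs.
relevant = [{("order.py", 10), ("order.py", 11), ("db.py", 40)}]
unrelated = [{("db.py", 40)}, {("main.py", 1)}]
print(sorted(candidate_lines(relevant, unrelated)))  # [('order.py', 10), ('order.py', 11)]
```

Shared infrastructure such as connection code (`db.py` above) drops out, which makes the result less noisy, but anything the unrelated tests happen to touch is hidden too, so the result remains an approximation.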
Let me explain how to reach the answer: isolation is the key. Mapping everything to object properties can help you automate your review.
I can give you an example. If you can manage to map your specific case to the below, it will save your life.
The OR/M change pattern
Like Hibernate or Entity Framework...
A change to a database column may simply be previewed by analysing which code uses a certain object's property. Since all DB columns are mapped to object properties, and assuming no code uses raw SQL, you are good to go for your estimates.
This is a very simple pattern for change management.
In order to reduce a file system, network, or data file issue to the above pattern, you need other software patterns in place. I mean, if you can reduce a complex scenario to a change in your objects' properties, you can leverage your IDE to detect the changes for you, including code that needs a slight modification to compile or needs to be rewritten entirely.
If you want to manage a change in a remote service, wrap that service in an interface when you initially write your software, so that you will only have to modify its implementation.
If you want to manage a possible change in a data file format (e.g. a field length change in a positional format, or column reordering), write a service that maps that file to objects (e.g. using a parser such as BeanIO).
If you want to manage a possible change in file system paths, design your application to use more runtime variables.
If you want to manage a possible change in cryptography algorithms, wrap them in services (e.g. HashService, CryptoService, SignService).
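For the positional data file case, the point is that field offsets and lengths live in exactly one place. BeanIO is a Java library; as a language-neutral illustration, here is a hand-rolled Python sketch with a hypothetical product record layout. If a field length changes, you edit the layout table, and callers keep working with `ProductRecord` attributes.

```python
from dataclasses import dataclass

# Hypothetical record layout: (field name, start offset, length).
# A field length change in the file format means editing only this table.
PRODUCT_LAYOUT = [
    ("code", 0, 6),
    ("name", 6, 20),
    ("price_cents", 26, 8),
]


@dataclass
class ProductRecord:
    code: str
    name: str
    price_cents: int


def parse_product_line(line: str) -> ProductRecord:
    """Slice one positional line into named fields using PRODUCT_LAYOUT."""
    raw = {name: line[start:start + length].strip() for name, start, length in PRODUCT_LAYOUT}
    return ProductRecord(code=raw["code"], name=raw["name"], price_cents=int(raw["price_cents"]))
```

Because the rest of the application only sees `ProductRecord`, renaming or retyping one of its attributes is exactly the kind of change your IDE or type checker can trace for you.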
If you do the above, your manual requirements review will be easier. The overall task is still manual, but it can be aided by automated tools: for example, you can change the name of a class's property and see its side effects in the compiler.
Worst case
Obviously, if you need to change the name, type and length of a specific column in a database, in software with plain SQL hardcoded and scattered in multiple places around the code, where, worse, many tables have similarly named columns, and with no project documentation (I did say worst case, right?), across a total of 10000+ classes, you have no other way than manually exploring your project, using find tools but not relying on them.
And if you don't have a test plan, which is the document from which you can hope to derive a software test suite, it is time to make one.
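Even in that worst case, a small script can at least produce a checklist. This Python sketch (the function name and the default file suffixes are assumptions; adjust them to your project) lists every line that mentions the column as a whole word, case-insensitively, plus every `SELECT *`, since those read the column without naming it. It will still miss dynamically built SQL and will flag same-named columns in other tables, which is exactly why you use find tools but don't rely on them.

```python
import re
from pathlib import Path


def find_column_references(root, column, suffixes=(".java", ".sql", ".xml")):
    """List (path, line number, text) for lines that mention `column` as a whole word
    or contain SELECT *, which may read the column implicitly."""
    word = re.compile(rf"\b{re.escape(column)}\b", re.IGNORECASE)
    star = re.compile(r"\bselect\s+\*", re.IGNORECASE)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if word.search(line) or star.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```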
Just adding my 2 cents. I'm assuming you're working in a production environment so there's got to be some form of unit tests, integration tests and system tests already written.
If yes, then a good way to validate your changes is to run all these tests again and create any new tests which might be necessary.
And to state the obvious, do not integrate your code changes into the main production code base without running these tests.
Then again, changes that work fine in a test environment may not work in a production environment.
Have some form of source code configuration management system like Subversion, Git, CVS, etc.
This enables you to roll back your changes.

SQL files management

Most of my day is spent on writing SQL queries to perform small tasks, mainly to get information from the database and manipulate it somehow for data visualization building reports for others.
At the end of the day I try to have a nice folder scheme to help me reuse code and so on, but it's becoming harder to handle so many files and keep track of everything I've done so far.
I don't want to have huge SQL files because I might want to …
In the end it's hard to avoid a war zone on my desktop and in these folders. It's also a mess to handle so many folders and scripts.
For version control we're using a Git server, but there is plenty of code that is not in production that we would like to keep track of and reuse somehow.
We're using IPython Notebook, RStudio and SSMS to write our code, and I'm wondering whether there are more efficient ways to work.
There must be an efficient way to work out there. What do you use to keep track of your (SQL) code and, more importantly, to reuse it?
Thanks in advance,
Rafael
I just use a folder system. I keep the "shell" scripts, so to speak (the generic code to do X), as the first files, and the specific code, where I take X and apply dates and other conditions, in the bottom half of the folder.
What do you use to keep track of your (SQL) codes? and more importantly reuse it.
For ease of reuse, I have all my running SQL code backed up on an SQL server through routine INFORMATION_SCHEMA dumps. For all development code that I need to reuse with others, I have a Git server that gets automatic updates throughout the day. For reuse on my laptop itself, I have a local backup through Time Machine.
As for directory or folder structure, all code starts as project based and eventually I migrate the best and most useful code to a personal folder structure that is topic based (date arithmetic, indexing, etc.). No matter how they are stored, all these folders are indexed using local and remote indexing features so I can search and retrieve them with just a few keystrokes when needed. Ultimately what's needed for optimum reuse is ease of retrieval. The quicker I can retrieve, the more reuse I get.
Lastly, it's not just SQL code, but all the supporting documents that led to that code solution. Sometimes this collection may include code from other languages, code from other servers, emails, text documents, images, workflows, etc. Keeping them all together enhances the value of reuse.

Getting a significant amount of data into a SQL Server (Express) database at time of deployment

For most database-backed projects I've worked on, there is a need to get "startup" or test data into the database before deploying the project. Examples of startup data: a table that lists all the countries in the world or a table that lists a bunch of colors that will be used to populate a color palette.
I've been using a system where I store all my startup data in an Excel spreadsheet (with one table per worksheet), then I have a utility script in SQL that (1) creates the database, (2) creates the schemas, (3) creates the tables (including primary and foreign keys), (4) connects to the spreadsheet as a linked server, and (5) inserts all the data into the tables.
I mostly like this system. I find it very easy to lay out columns in Excel, verify foreign key relationships using simple lookup functions, perform concatenation operations, copy in data from web tables or other spreadsheets, etc. One major disadvantage of this system is the need to sync up the columns in my worksheets any time I change a table definition.
I've been going through some tutorials to learn new .NET technologies or design patterns, and I've noticed that these typically involve using Visual Studio to create the database and add tables (rather than scripts), and the data is typically entered using the built-in designer. This has me wondering if maybe the way I'm doing it is not the most efficient or maintainable.
Questions
In general, do you find it preferable to build your whole database via scripts or a GUI designer, such as SSMSE or Visual Studio?
What method do you recommend for populating your database with startup or test data and why?
Clarification
Judging by the answers so far, I think I should clarify something. Assume that I have a significant amount of data (hundreds or thousands of rows) that needs to find its way into the database. This data could be sourced from various places, such as text files, spreadsheets, web tables, etc. I've received several suggestions to script this process using INSERT statements, but is this really viable when you're talking about a lot of data?
Which leads me to...
New questions
How would you write a SQL script to take the country data on this page and insert it into the database?
With Excel, I could just copy/paste the table into a worksheet and run my utility script, and I'd basically be done.
What if you later realized you needed a new column, CapitalCity?
With Excel, I could take that information from this page, paste it into Excel, and with a quick text-to-column manipulation, I'd have the data in the format I need.
I honestly didn't write this question to defend Excel as the best way or even a good way to get data into a database, but the answers so far don't seem to be addressing my main concern: how to get all this data into your database. Writing a script with hundreds of INSERT statements by hand would be extremely time-consuming and error-prone. Somehow, this script needs to be machine-generated, but how?
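One answer to "machine-generated, but how?" is a small script that turns whatever tabular data you have (saved as CSV from Excel, copied from a web table, etc.) into INSERT statements, which can then live in source control next to the schema scripts. A Python sketch follows. The table name, the `N'...'` Unicode literals, bracket-quoted column names and the "empty cell means NULL" rule are assumptions made for SQL Server and for this example. Every value is emitted as a string literal and left to implicit conversion, so numeric or date columns may need explicit handling.

```python
import csv
import io


def sql_literal(value: str) -> str:
    # Empty cells become NULL; everything else becomes an N'...' string literal
    # with embedded single quotes doubled.
    if value == "":
        return "NULL"
    return "N'" + value.replace("'", "''") + "'"


def csv_to_inserts(csv_text: str, table: str) -> str:
    """One INSERT per data row; the first CSV row supplies the column names."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f"[{name}]" for name in header)
    lines = []
    for row in reader:
        values = ", ".join(sql_literal(cell) for cell in row)
        lines.append(f"INSERT INTO {table} ({columns}) VALUES ({values});")
    return "\n".join(lines)
```

If you later need a `CapitalCity` column, you add it to the CSV and regenerate the script, much as you would with the spreadsheet. Column names containing `]` would need escaping, which this sketch does not do.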
I think your current process is fine for seeding the database with initial data. It's simple, easy to maintain, and works for you. If you've got a good database design with adequate constraints then it doesn't really matter how you seed the initial data. You could use an intermediate tool to generate scripts but why bother?
SSIS has a steep learning curve, doesn't work well with source control (impossible to tell what changed between versions), and is very finicky about type conversions from Excel. There's also an issue with how many rows it reads ahead to determine the data type -- you're in deep trouble if your first x rows contain numbers stored as text.
1) I prefer to use scripts for several reasons.
• Scripts are easy to modify, and when I'm ready to deploy my application to a production environment, I already have the scripts written, so I'm all set.
• If I need to deploy my database to a different platform (like Oracle or MySQL) then it's easy to make minor modifications to the scripts to work on the target database.
• With scripts, I'm not dependent on a tool like Visual Studio to build and maintain the database.
2) I like good old-fashioned INSERT statements in a script. Again, at deployment time scripts are your best friend. At our shop, when we deploy our applications we have to have scripts ready for the DBAs to run, as that's what they expect.
I just find that scripts are simple, easy to maintain, and the "least common denominator" when it comes to creating a database and loading data into it. By least common denominator, I mean that most people (i.e. DBAs, other people in your shop who might not have Visual Studio) will be able to use them without any trouble.
The other important thing about scripts is that they force you to learn SQL, and more specifically DDL (data definition language). While the hand-holding GUI tools are nice, there's no substitute for taking the time to learn SQL and DDL inside out. I've found those skills invaluable in almost any shop.
Frankly, I find the concept of using Excel here a bit scary. It obviously works, but it's creating a dependency on an ad-hoc data source that won't be resolved until much later. Last thing you want is to be in a mad rush to deploy a database and find out that the Excel file is mangled, or worse, missing entirely. I suppose the severity of this would vary from company to company as a function of risk tolerance, but I would be actively seeking to remove Excel from the equation, or at least remove it as a permanent fixture.
I always use scripts to create databases, because scripts are portable and repeatable - you can use (almost) the same script to create a development database, a QA database, a UAT database, and a production database. For this reason it's equally important to use scripts to modify existing databases.
I also always use a script to create bootstrap data (AKA startup data), and there's a very important reason for this: there's usually more scripting to be done afterward. Or at least there should be. Bootstrap data is almost invariably read-only, and as such, you should be placing it on a read-only filegroup to improve performance and prevent accidental changes. So you'll generally need to script the data first, then make the filegroup read-only.
On a more philosophical level, though, if this startup data is required for the database to work properly - and most of the time, it is - then you really ought to consider it part of the data definition itself, the metadata. For that reason, I don't think it's appropriate to have the data defined anywhere but in the same script or set of scripts that you use to create the database itself.
Test data is a little different, but in my experience you're usually trying to auto-generate that data in some fashion, which makes it even more important to use a script. You don't want to have to manually maintain an ad-hoc database of millions of rows for testing purposes.
If your problem is that the test or startup data comes from an external source (a web page, a CSV file, etc.), then I would handle this with an actual "configuration database." This way you don't have to validate references with VLOOKUPs as in Excel; you can actually enforce them.
Use SQL Server Integration Services (formerly DTS) to pull your external data from CSV, Excel, or wherever, into your configuration database - if you need to periodically refresh the data, you can save the SSIS package so it ends up being just a couple of clicks.
If you need to use Excel as an intermediary, i.e. to format or restructure some data from a web page, that's fine, but the important thing IMO is to get it out of Excel as soon as possible, and SSIS with a config database is an excellent repeatable method of doing that.
When you are ready to migrate the data from your configuration database into your application database, you can use SQL Server Management Studio to generate a script for the data (in case you don't already know: right-click the database, go to Tasks, Generate Scripts, and turn on "Script Data" in the Script Options). If you're really hardcore, you can actually script the scripting process, but I find that this usually takes less than a minute anyway.
It may sound like a lot of overhead, but in practice the effort is minimal. You set up your configuration database once, create an SSIS package once, and refresh the config data maybe once every few months or maybe never (this is the part you're already doing, and this part will become less work). Once that "setup" is out of the way, it's really just a few minutes to generate the script, which you can then use on all copies of the main database.
Since I use an object-relational mapper (Hibernate; there is also a .NET version), I prefer to generate such data in my programming language. The ORM then takes care of writing things into the database. I don't have to worry about changing column names in the data because I need to fix the mapping anyway. If refactoring is involved, it usually takes care of the startup/test data too.
Excel is an unnecessary component of this process.
Script the current version of the database components that you want to reuse, and add the script to your source control system. When you need to make changes in the future, either modify the entities in the database and regenerate the script, or modify the script and regenerate the database.
Avoid mixing Visual Studio's db designer and Excel as they only add complexity. Scripts and SQL Management Studio are your friends.

Programmatically check for Access database corruption?

Is there a way to programmatically check for database object corruption in Access 2003?
My development project has gotten complex enough that it's hard to manually check all the objects after a day of programming to see if some small control, form, report, query, or code object has been corrupted somehow. I already have the data split off into a separate SQL Database stored on another machine, and this project is merely a front-end application to work with the data.
Mostly an academic musing, as I just don't want to get far along, only to have corruption set me back several weeks because some seldom-used object got corrupted way back when.
Any ideas out there? Thanks in advance for any pointers!
EDITED 12/03/2009 # 11:51
Sadly, I can only accept one answer - though I got a few very good ones, thank you for all the pointers!
You might like to look at: Is it possible to programmatically detect corrupt Access 2007 database tables?
I am inclined to keep a copy of important databases at each compact & repair and to compare the new database against the previous one. You can also check for non-standard characters.
Neither Compact/Repair nor Decompile/Recompile catches all corruption problems, although you should be doing this anyway.
I use a function to export all Container Docs (and QueryDefs) using SaveAsText into a date/time stamped folder, and use it regularly throughout the day. If I suspect any corruption, I create a new mdb, and use LoadFromText to recreate the objects.
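Building on the date/time-stamped SaveAsText exports (and on the idea of comparing against the previous copy), one quick way to see which objects changed between two export folders is to hash each file. This Python sketch assumes one exported text file per object in each folder; the folder layout and the function names are assumptions. A changed file does not mean corruption, since most differences will be your own edits from that day, but the list narrows down which objects to open and inspect.

```python
import hashlib
from pathlib import Path


def snapshot(folder):
    """Map each exported file name in `folder` to the SHA-256 of its contents."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in Path(folder).iterdir()
        if path.is_file()
    }


def compare_exports(old_folder, new_folder):
    """Report objects added, removed or changed between two export folders."""
    old, new = snapshot(old_folder), snapshot(new_folder)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(name for name in old.keys() & new.keys() if old[name] != new[name]),
    }
```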
Proper compilation practices will prevent corruption of the VBA project (which is what you're talking about here).
That entails:
use OPTION EXPLICIT in all modules.
turn off COMPILE ON DEMAND in the VBE options.
compile your code regularly, while working.
periodically (e.g., once a day after a full day of coding) decompile and recompile the code.
If you do this, you'll never encounter corruption in the first place, so you won't need to test for it (which is impossible anyway).