Which is better practice: complex SQL statements or Recordset manipulation in Access VBA?

I'm doing some VBA development and I've found that building SQL statements is quite an efficient way of getting everything done (selecting and updating).
But I've got to a stage where my SQL statements contain complex Switch calls and WHERE conditions with nested SELECTs to update the appropriate records. So I build these SQL strings and simply run them via "CurrentDb.Execute strSQL", and it does everything fine.
The question is: why would I declare ADODB.Connection objects etc., set recordsets, loop through them and manipulate the data one by one?
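For illustration, here's the kind of statement I mean (the table and field names are made up):

UPDATE Orders
SET Priority = Switch(Amount > 1000, "High", Amount > 100, "Normal", True, "Low")
WHERE OrderID IN (SELECT OrderID FROM OrderDetails WHERE Discontinued = True);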

why would I declare ADODB.Connection objects etc., set recordsets, loop through them and manipulate the data one by one?
No reason. If you can do it in plain SQL, you can stick with it.
But in MS Access even the SQL is often evaluated on the client, so from a performance perspective there is no big difference.
If you were using a SQL Server database as a backend, then it would make a difference.
Anyway, if your SQL gets too complex to understand (you will need to read it later, won't you?), then you could split it into smaller chunks and functions.

I don't think I've seen a solution in Access that uses pure SQL to create a Rank or even a row number without using a VBA function.

When the recordsets are small and the requirements are continually evolving, then using the recordsets in VBA makes sense. I think that's particularly true when complex decisions or parsing must take place line-by-line and a form/report or two will be involved.
If the requirements are known, and the SQL works, then there's no real reason to convert.

Related

Does Table-Valued Function (SQL) create table on each call? [performance]

Okay, this might sound like a noob question, but SQL isn't really my strength, so I'm asking for some help here.
I am trying to implement something, but I am concerned about performance issues.
The problem I am trying to fix is something like this:
I have a column with a lot of data separated by commas ","
Something like this: data1,data2,data3,data57
What I need is to loop through each piece of data separated by commas, for all the records, and then do something with that single piece of data. Do you get the idea?
I found a solution that can actually help me, but I am worried about performance, because I might need to make multiple calls to this function with different parameters!
Is a table created on each call I make to the table-valued function (UDF), or does SQL Server cache it? [Or would I rather need a temporary table?]
Thank you for your help in advance!
Note: The data is not mine and I have to use it as is, so suggesting a change to the database is out of the question (although I know that would be the best scenario).
Note 2: The purpose of this question/problem is to import initial data into the database. Performance may not be a serious problem since it won't run many times, but I still want to address the issue and do it the best way I can!
User-defined, table-valued functions that are composed of multiple statements, as the one you found is, will create an object in the tempdb system database, populate it, and then dispose of it when the object goes out of scope.
If you want to run this multiple times over the same parameters, you might consider creating a table variable and caching the result in that yourself. If you're going to be calling it on different lists of comma-separated values, though, there's no great way of avoiding the overhead. SQL Server isn't really built for lots of string manipulation.
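A rough sketch of that caching idea (dbo.SplitCsv stands in for whichever splitting function you found):

DECLARE @csv nvarchar(max) = N'data1,data2,data3,data57';
DECLARE @parts TABLE (value nvarchar(4000));

-- call the splitting function once and cache the rows
INSERT INTO @parts (value)
SELECT value FROM dbo.SplitCsv(@csv);

-- ...then reuse @parts as often as needed without re-splitting
SELECT value FROM @parts;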
Generally, for one-off jobs, the performance implications of this tempdb usage are not going to be a major concern for you. It's more of a concern when it's a common pattern in the day-to-day life of the database.
I'd suggest trying it, if you can, on a suitably sized subset of the data to gauge the performance of your solution.
Since you say you're on SQL Server 2016, you can make use of the new STRING_SPLIT function, something like
SELECT t.Column1, t.Column2, s.value
FROM dbo.MyTable t
CROSS APPLY STRING_SPLIT(t.CsvColumn, ',') s
That may get you close to where you want, without the need to define a new function. Note: your database needs to be running under the 2016 compatibility level (130) for this to be available; simply running on SQL Server 2016 isn't enough (they often do this with new features to avoid the risk of backwards-compatibility-breaking changes).
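If you're unsure, you can check the level and, if you have permission, raise it yourself:

SELECT name, compatibility_level FROM sys.databases WHERE name = DB_NAME();
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 130;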

Migrate calculations from VBA to SQL?

I manage an application built on Access with some VBA code that takes its data from:
Inputs by the user through Access forms
Tables in Sybase (that are linked through Access)
Local tables in Access
The application is used to make financial calculations. Our calculations need a lot of conditions and consist mostly of complex arithmetic (fractions, multiplications...).
My question is: is VBA faster than Sybase at doing the calculations?
(Please note that our calculations draw on all three kinds of data sources.)
I was thinking about migrating all of the calculations to Sybase as stored procedures, calling them from the VBA code with parameters, and waiting for the output from Sybase.
PS: Another reason I am asking is that we are considering, as a long-term project, migrating our Access application to a thin client (probably web-based), and if all the calculations are already on the server/database side, it could maybe be easier? What do you think?
Thanks a lot for your help
IMO, I would pass the form based variables (user entry) as parameters into a stored procedure, then fetch the other variables as needed from tables within the SP. This avoids sending too much data to the client as the form is opening. This abstracts the logic from VBA code (or any specific front-end language), making it easier to eventually move to a thin layer. You can also recompile independent stored procedures as needed, instead of deploying another instance of your code (much harder usually).
If there are a lot of parameters coming from the form or local tables, consider passing them in as a structured data type within Sybase. The procedure cache within Sybase is extremely powerful, and after initial compilation a stored procedure is as fast as any other procedural language.
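A minimal sketch of what I mean (the names and the formula are invented for illustration):

create procedure calc_interest
    @principal numeric(12,2),   -- from the Access form
    @annual_rate numeric(8,6),  -- from the Access form
    @months int
as
begin
    -- conditions and arithmetic live here, next to the data
    if @months <= 0
        return 1
    select @principal * (@annual_rate / 12.0) * @months as interest
end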
It depends on the calculations. Sybase will be better at doing calculations that involve grouping data, but complex calculations like fractions, etc... would be faster to do in code. Also it's just better practice to separate out business logic from data.

Query design practices in SQL

I am building queries for a database in MS Access 2007 and I am wondering if my current design practices are up to par. Basically, the database was configured before I came, but I have been given the responsibility of building efficient queries to extract the data.
My current queries are small and simple, each accomplishing 2-3 tasks (sometimes only 1) at a time. The reason I am taking this approach is because I am completely new to SQL, and I find it easier to work with many, simple queries and use reports to consolidate the data, as opposed to building extremely complex queries which are 1) hard to build (for me, anyways) and 2) hard to maintain.
I was just curious if anyone had any best practices for query design, and if you could give me some specific feedback on the approach listed above, and whether or not I should start making complex queries, or just stick to simple queries and reports to consolidate the relevant data.
Thanks.
The people answering this question are not coming to it from an Access point of view, so I'll offer some observations as somebody who has been creating Access applications professionally full-time since 1996.
First off, there are several places where you'll have SQL in an Access application:
stored queries.
stored properties of forms, reports, combo boxes and list boxes.
in VBA code where you are writing SQL on the fly.
Managing all of these SQL statements in an organized fashion is difficult, if not impossible. But I'm not sure it's worth it!
First off, consider just stored queries. If you follow the advice of saving a query for every individual task so that each SQL statement is used in only one place, you'll soon have a mess in the list of queries, and you'll be forced into some kind of naming convention to keep track of what's what. Because of this, I generally don't save queries EXCEPT where they MUST be saved, or where the optimization that comes with a saved query is going to be helpful (i.e., large dataset or complex joins/filtering).
For example, when I first started programming in Access, I'd save all the rowsources of my combo boxes as saved queries. I developed a naming convention so they wouldn't be mixed in with the other queries in the list of queries, so it wasn't too hard to manage. At first, I thought I'd be re-using the saved queries, but it quickly became clear that I needed to make changes for individual circumstances, and changing a query that was used elsewhere might alter its results in other contexts, so really, there was no "shared code" benefit to the saved queries (as I thought there would be). The only place where it was helpful was where I had the same combo box on multiple forms; then I could save the rowsource as a saved query, and if I needed to alter it, I could do it in just one place. However, that was really only an advantage for a relatively complex rowsource -- a simple SELECT on a couple of fields doesn't really benefit from that kind of sharing, particularly when it's used in only a couple of different places.
In short, I quickly concluded that it was just easier to save the SQL statements where they were used -- since there was very little re-use in the first place (once I gained enough experience to realize the pitfalls of trying to re-use them), this worked much better, and it kept the SQL close to where it was being used.
For forms and reports, I do some of the same things, but in general I use saved queries to avoid having to write too many complex subselects for use as derived tables. Where I needed those, it was always easier to write the subselect and save it, then use it with a JOIN in another SQL statement, than to use the subselect inline as a derived table (which just makes for complicated SQL that's hard to read -- particularly when you can't comment or format your SQL, as is the case with saved Access queries).
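For instance, if qryCustomerTotals is a saved aggregate query, the consuming SQL stays readable (names invented):

SELECT c.CustomerID, c.CompanyName, q.TotalSales
FROM Customers AS c
INNER JOIN qryCustomerTotals AS q ON c.CustomerID = q.CustomerID;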
In general, I don't save the recordsources of forms or reports except where there is real re-use going on (a report will often use the same recordsource as a form, so in that case, it's useful to save it, so that when you change the SQL of the form, the report that goes with it inherits the alteration).
That all leaves dynamic SQL assembled in VBA code. I use lots of this, from dynamically setting the rowsources of combo/listboxes, to setting the recordsources of subforms for filtering purposes. This is harder to manage, and sometimes I use string constants in the module to make that easier. For instance, in a case where you're writing dynamic SQL where everything remains the same except the WHERE clause, a constant with the SELECT and a second constant with the ORDER BY makes it a lot easier to assemble the complete SQL statement.
I don't know if this really answers your questions, but I have learned over the years that the benefits of re-using SQL statements are vastly outweighed by the uncertainty that comes from the inability to easily track where that SQL statement may be used. I find that storing the SQL statement as close to where it is used as possible is the best practice, as that is a form of "self-documentation" (though not a great one!).
I do make many exceptions and save queries when there is a real and demonstrable benefit in terms of performance or managing what would otherwise become much more complex SQL. However, I would also note that one should not go too far in the other direction, using tons of nested saved queries, because then you run into other problems (i.e., the "too many databases" problem, which is actually caused by using up the 2048 table handles available at one time -- it's done more easily than you might think).
In my humble opinion, it doesn't matter whether the DB engine is big and monstrous like MSSQL or Oracle, or tiny and simple like SQLite: every query (or stored procedure, or any other unit of data processing) should be responsible for only one function. I use this principle everywhere (not only in DB development) and I can say it works.
If you are not sure, try reading books about refactoring, Fowler for example. I suppose his principles are applicable to any area of development.
If you are storing your data in MS Access, then your database cannot be very large, and any optimization you do is limited by the constraints MS Access imposes. If better (more optimized) queries are the goal, then migrating the data out of Access and into SQL Server may give you better flexibility in development going forward. You can leverage cached execution plans, stored procedures, and views.
This may mean that you will need to enhance your T-SQL skills to accomplish this.
So weigh out the options you propose in your question:
1. Keep code simple (comfortable at your current skill level)
2. Meet the responsibility to create efficient queries for data extraction.
SQL Server Express could be a good starting point (it's free).

Is this a valid benefit of using embedded SQL over stored procedures?

Here's an argument for SPs that I haven't heard. Flamers, be gentle with the down-tick.
Since there is overhead associated with each trip to the database server, I would suggest that a POSSIBLE reason for placing your SQL in SPs rather than in embedded code is that you are more insulated from change without taking a performance hit.
For example, let's say you need to perform Query A, which returns a scalar integer.
Then, later, the requirements change, and you decide that if the scalar result is > x, then, and only then, you need to perform another query. If you performed the first query in an SP, you could easily check the result of the first query and conditionally execute the second SQL statement in the same SP.
How would you do this efficiently in embedded SQL without performing a separate query or an unnecessary query?
Here's an example:
-- This SP may return one or two result sets.
DECLARE @CustCount int;
SELECT @CustCount = COUNT(*) FROM CUSTOMER;
IF @CustCount > 10
    SELECT * FROM PRODUCT;
Can this be done in embedded SQL, and if so, what is the best way to do it?
A very persuasive article
SQL and stored procedures will be there for the duration of your data.
Client languages come and go, and you'll have to re-implement your embedded SQL every time.
In the example you provide, the time saved is sending a single scalar value and a single follow-up query over the wire. This is insignificant in any reasonable scenario. That's not to say there might not be other valid performance reasons to use SPs; just that this isn't such a reason.
I would generally never put business logic in SPs; I like it to be in my native language of choice, outside the database. The only time I agree SPs are better is when there is a lot of data movement that doesn't need to come out of the DB.
So to answer your question: I'd rather have two queries in my code than embed that in an SP. In my view I am trading a small performance hit for something a lot clearer.
How would you do this efficiently in embedded SQL without performing a separate query or an unnecessary query?
Depends on the database you are using. In SQL Server, this is a simple CASE statement.
Perhaps include the WHERE clause in that sproc:
WHERE (all your regular conditions)
AND myScalar > myThreshold
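One way to spell that out against the CUSTOMER/PRODUCT example from the question, folding the scalar check into the same statement:

SELECT *
FROM PRODUCT
WHERE (SELECT COUNT(*) FROM CUSTOMER) > 10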
Lately I prefer not to use SPs (except when uber-complexity arises where a proc would just be better... or CLR would be better). I have been using the Repository pattern with LINQ to SQL, where my query is written in my data layer in a strongly typed LINQ expression. The key here is that the query is strongly typed, which means when I refactor I am refactoring properties of a class that is directly generated from the database table (which makes carrying changes from the DB all the way forward super easy and accurate).

While my SQL is generated for me and sent to the server, I still have the option of sticking to DRY principles, as the Repository pattern allows me to break things down into their smallest components. I do have the issue that I might make a trip to the server and, based on the results of the query, find that I need to make another trip to the server. I don't worry about this up front; if I find later that it becomes an issue, then I may refactor that code into something more performant.

The overall key here is that there is no one magic bullet. I tend to work on greenfield applications, which allows this method of development to be most efficient for me.
Benefits of SPs:
Performance (they are precompiled)
Easy to change (without recompiling the application)
SQL's set-based features make really difficult data tasks easy
Drawbacks:
They depend heavily on the database engine used
They make deployment of upgrades a little harder (you have to deploy the app + the scripts)
My 2 cents...
About your example, it can be done like this:
select * from products where (select count(*) from customers) > 10

Code reuse and modularity in SQL

Is code reuse and modularity a good idea for SQL stored procedure programming?
And if so, what's the best way to add these features to a SQL stored procedure code base?
I usually create scalar-valued functions for tasks that are common and repeated. I find that this not only eases the development of new procedures similar to existing ones, but also aids a lot in bug tracking and troubleshooting.
I try to stay away from table valued functions though, due to performance issues.
My rule of thumb is that if it is a calculation, and it's used in several places, then I create a scalar valued function.
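A minimal example of the sort of scalar function I mean (the names and formula are made up):

CREATE FUNCTION dbo.NetAmount (@gross money, @vatRate decimal(5,2))
RETURNS money
AS
BEGIN
    -- the shared calculation lives in exactly one place
    RETURN @gross / (1 + @vatRate / 100);
END

Then any procedure can call SELECT dbo.NetAmount(GrossPrice, 25) instead of repeating the formula.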
You are going to find that using functions within your queries is a disaster for performance. The functions become a black box for the optimizer, so you will end up re-coding the function call back into the query to make it run fast once you get up to a large number of rows in your tables.
A better way to deal with common calculations is to insert them into a new column with a trigger, or in your insert/update queries. That way you can index the calculated value and use it directly instead of figuring it out each time you need it.
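A rough sketch of that trigger approach (table and column names are invented):

CREATE TRIGGER trg_OrderLines_Total ON OrderLines
AFTER INSERT, UPDATE
AS
BEGIN
    -- persist the calculated value so it can be indexed and read directly
    UPDATE ol
    SET LineTotal = i.Qty * i.UnitPrice
    FROM OrderLines AS ol
    INNER JOIN inserted AS i ON ol.OrderLineID = i.OrderLineID;
END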
SQL doesn't give you a lot of flexibility when it comes to code reuse. I usually create functions for calculations or other tasks that don't involve modifying tables, but for tasks that involve writing to tables and that sort of thing I usually use a stored procedure, to get better control of the transactions.
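For example, a write that must stay atomic fits naturally in a procedure (invented names, error handling omitted):

CREATE PROCEDURE dbo.TransferStock
    @fromLocation int, @toLocation int, @qty int
AS
BEGIN
    -- both updates succeed or fail together
    BEGIN TRANSACTION;
    UPDATE Stock SET Quantity = Quantity - @qty WHERE LocationID = @fromLocation;
    UPDATE Stock SET Quantity = Quantity + @qty WHERE LocationID = @toLocation;
    COMMIT TRANSACTION;
END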
You can break code into separate stored procedures to help break down complex stored procs into more manageable chunks. You can also do the same to break out common logic that won't work in a function. Think of it similar to an Extract Method refactoring.
Another way to look at it, from the application's side, is to use binding to reuse your SQL queries. But that's probably not what you're looking for.
To follow up on this, I did run into some performance problems, and it seems that the optimizer is not able to pick the correct index for code inside the functions.
So I had to specify the correct index using index hints (the WITH keyword) to solve the performance issue.
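In T-SQL that looks something like this (the table and index names are placeholders):

SELECT CustomerID, OrderDate
FROM dbo.Orders WITH (INDEX(IX_Orders_OrderDate))
WHERE OrderDate >= '20240101';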