Does Table-Valued Function (SQL) create table on each call? [performance] - sql

Okay, this might sound like a noob question, but SQL isn't really my strength, so I am asking for some help here.
I am trying to implement something, but I am concerned about performance issues.
The problem I am trying to fix is something like this:
I have a column with a lot of data separated by commas ","
Something like this: data1,data2,data3,data57
What I need is to loop through each comma-separated piece of data for every record, and then do something with that single piece of data.
I found a solution that can actually help me, but I am worried about system performance, because I might need to make multiple calls to this function using different parameters!
Is a table created on each call I make to the table-valued function (UDF), or does SQL Server cache it? [Maybe I would rather need a temporary table?]
Thank you for your help in advance!
Note: The data is not mine, and I should use it as is, so suggesting to change the database is out of the question (however, I know that would be the best scenario).
Note 2: The purpose of this question/problem is to import initial data into the database. Performance may not be a serious problem since it won't run many times, but I still want to take that issue into account and do it the best way I can!

User-defined table-valued functions that are composed of multiple statements, like the one you found, will create an object in the tempdb system database, populate it, and then dispose of it when the object goes out of scope.
If you want to run this multiple times over the same parameters, you might consider creating a table variable and caching the result in that yourself (see the sketch below). If you're going to be calling it on different lists of comma-separated values, though, there's no great way of avoiding the overhead. SQL Server isn't really built for lots of string manipulation.
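For illustration, a minimal sketch of that table-variable caching; dbo.fn_SplitString here stands in for whichever splitting function you found, it is not a built-in:
DECLARE @Split TABLE (value nvarchar(4000));

-- Call the splitting UDF once and keep its rows in the table variable...
INSERT INTO @Split (value)
SELECT value FROM dbo.fn_SplitString('data1,data2,data3,data57', ',');

-- ...then reuse @Split in later statements without re-splitting the string.
SELECT value FROM @Split;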
Generally, for one-off jobs, the performance implications of this tempdb usage are not going to be a major concern for you. It's more of a concern when it's a common pattern in the day-to-day life of the database.
I'd suggest trying, if you can, on a suitably sized subset of the data to gauge the performance of your solution.
Since you say you're on SQL Server 2016, you can make use of the new STRING_SPLIT function, something like
SELECT t.Column1, t.Column2, s.value
FROM dbo.MyTable t
CROSS APPLY STRING_SPLIT(t.CsvColumn, ',') s
may get you close to where you want to be, without the need to define a new function. Note that your database needs to be running under the 2016 compatibility level (130) for this to be available; simply running on SQL Server 2016 isn't enough (new features are often gated this way to avoid the risk of backwards-compatibility-breaking changes).

Related

Which is better practice: complex SQL statements or Recordset manipulation in Access VBA?

I'm doing some VBA development and I have found creating SQL statements to be quite an efficient way of getting everything done (selecting and updating).
But I have got to the stage where my SQL statements contain complex Switch expressions and WHERE conditions in which I have other SELECTs to update the appropriate records. I build these SQL statements and simply run them via "CurrentDb.Execute strSQL", and it does everything fine.
The question is, why would I declare ADODB.connections etc, set recordsets, loop through it and manipulate the data one by one?
why would I declare ADODB.connections etc, set recordsets, loop through it and manipulate the data one by one?
No reason. If you can do it in plain SQL you can stick with it.
But in MS Access even the SQL is often evaluated on the client, so from the performance perspective there is no big difference.
If you would use SQL Server Database as a backend, then that would make a difference.
Anyway, if your SQL gets too complex to understand (you will need to read it later, won't you?), then you could split it into smaller chunks and functions.
I don't think I've seen a solution in Access that uses pure SQL to create a Rank or even a row number without using a VBA function.
When the recordsets are small and the requirements are continually evolving, then using the recordsets in VBA makes sense. I think that's particularly true when complex decisions or parsing must take place line-by-line and a form/report or two will be involved.
If the requirements are known, and the SQL works, then there's no real reason to convert.

metaprogramming with stored procedures?

This is going to be both a direct question and a discussion point. I'll ask the direct question first:
Can a stored procedure create another stored procedure dynamically? (Personally I'm interested in SQL Server 2008, but in the interests of wider discussion will leave it open)
Now to the reason I'm asking. Briefly (you can read more elsewhere), User Defined Scalar Functions in SQL Server are performance bottlenecks, at best. I've seen uses in our code base that slow the total query down by 3-4x, but from what I've read the local impact of the S-UDF can be 10x+
However, UDFs are, potentially, great for raising abstraction levels, reducing lots of tedious boilerplate, centralising logic rules, etc. In most cases they boil down to simple expressions that could easily be expanded inline - but they're not (I'm really only thinking of non-querying functions - e.g. string manipulations). I've seen a bug report for this to be addressed in a future release - with some buy-in from MS. But for now we have to live with the (IMHO) broken implementation.
One workaround is to use a table-valued UDF instead; however, these complicate the client code in ways you don't always want to deal with (especially when the UDF just computes the result of an expression).
So my crazy idea, at first, was to write the procs with C Preprocessor directives, then pass it through a preprocessor before submitting to the RDBMS. This could work, but has its own problems.
That led me to my next crazy idea, which was to define the "macros" in the DB itself, and have a master proc that accepts a string containing an unprocessed SP with macros, expands the macros inline, then submits it on to the RDBMS. This is not what SPs are good at, but I think it could work - assuming you can do this in the first place - hence my original question.
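To answer the direct question first: yes, a stored procedure can create another stored procedure through dynamic SQL. A minimal sketch (the procedure names are invented):
CREATE PROCEDURE dbo.CreateGeneratedProc
    @body nvarchar(max)
AS
BEGIN
    -- CREATE PROCEDURE must be the only statement in its batch,
    -- so it has to be issued through dynamic SQL (EXEC or sp_executesql).
    DECLARE @sql nvarchar(max) = N'CREATE PROCEDURE dbo.GeneratedProc AS ' + @body;
    EXEC (@sql);
END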
However, now I have explained my path to the question, I'd also like to leave it open for other ideas. I'm sure I'm not the only one who has been thinking along these lines. Perhaps there are third-party solutions already out there? My googling has not turned up much yet.
Also I thought it would be a fun discussion topic.
[edit]
This blog post I found in my research describes the same issue I'm seeing. I'd be happy if anyone could point out something that I, or the blog poster, might be doing wrong that leads to the overhead.
I should also add that I am using WITH SCHEMABINDING on my S-UDF, although it doesn't seem to be giving me any advantage
Your string-processing UDF won't be a perf problem. Scalar UDFs are a problem only when they perform selects, and those selects are done for every row; this in turn spikes the IO.
String manipulation, on the other hand, is done in memory and is fast.
As for your idea, I can't really see any benefit to it. Creating and dropping objects like that can be an expensive operation and may lead to schema locking.

Is this a valid benefit of using embedded SQL over stored procedures?

Here's an argument for SPs that I haven't heard. Flamers, be gentle with the down tick.
Since there is overhead associated with each trip to the database server, I would suggest that a POSSIBLE reason for placing your SQL in SPs over embedded code is that you are more insulated from change without taking a performance hit.
For example. Let's say you need to perform Query A that returns a scalar integer.
Then, later, the requirements change and you decide that if the result of the scalar is > x then, and only then, you need to perform another query. If you performed the first query in an SP, you could easily check the result of the first query and conditionally execute the second query in the same SP.
How would you do this efficiently in embedded SQL without performing a separate query or an unnecessary query?
Here's an example:
--This SP may return one or two result sets.
DECLARE @CustCount int;
SELECT @CustCount = COUNT(*) FROM CUSTOMER;
IF @CustCount > 10
    SELECT * FROM PRODUCT;
Can this be done in embedded SQL, and if so, what is the best way to do it?
A very persuasive article:
SQL and stored procedures will be there for the duration of your data.
Client languages come and go, and you'll have to re-implement your embedded SQL every time.
In the example you provide, the time saved is sending a single scalar value and a single follow-up query over the wire. This is insignificant in any reasonable scenario. That's not to say there might not be other valid performance reasons to use SPs; just that this isn't such a reason.
I would generally never put business logic in SPs; I like it to be in my native language of choice outside the database. The only time I agree SPs are better is when there is a lot of data movement that doesn't need to come out of the DB.
So to answer your question, I'd rather have two queries in my code than embed that in an SP; in my view I am trading a small performance hit for something a lot clearer.
How would you do this efficiently in embedded SQL without performing a separate query or an unnecessary query?
Depends on the database you are using. In SQL Server, this is a simple CASE statement.
Perhaps include the WHERE clause in that sproc:
WHERE (all your regular conditions)
AND myScalar > myThreshold
Lately I prefer not to use SPs (except when uber complexity arises where a proc would just be better... or CLR would be better). I have been using the Repository pattern with LINQ to SQL, where my query is written in my data layer in a strongly typed LINQ expression. The key here is that the query is strongly typed, which means when I refactor I am refactoring properties of a class that is directly generated from the database table (which makes carrying changes from the DB all the way forward super easy and accurate).
While my SQL is generated for me and sent to the server, I still have the option of sticking to DRY principles, as the repository pattern allows me to break things down into their smallest components. I do have the issue that I might make a trip to the server and, based on the results of the query, find that I need to make another trip to the server. I don't worry about this up front; if I find later that it becomes an issue, then I may refactor that code into something more performant.
The overall key here is that there is no one magic bullet. I tend to work on greenfield applications, which allows this method of development to be most efficient for me.
Benefits of SPs:
Performance (they are precompiled)
Easy to change (without recompiling the application)
SQL's set-based features make really difficult data tasks very easy
Drawbacks:
Depend heavily on the database engine used
Makes deployment of upgrades a little harder (you have to deploy the App + the scripts)
My 2 cents...
About your example, it can be done like this:
SELECT * FROM products WHERE (SELECT COUNT(*) FROM customers) > 10

Code reuse and modularity in SQL

Are code reuse and modularity a good idea for SQL stored procedure programming?
And if so, what's the best way to add these features to a SQL stored procedure code base?
I usually create scalar-valued functions for tasks that are common and repeated. I find that it not only eases development of new procedures similar to existing ones, but also aids a lot in bug tracking and troubleshooting.
I try to stay away from table valued functions though, due to performance issues.
My rule of thumb is that if it is a calculation, and it's used in several places, then I create a scalar valued function.
You are going to find that using functions within your queries is a disaster for performance. The functions become a black box for the optimizer, so you will end up re-coding the function call back into the query to make it run fast once you get up to a large number of rows in your tables.
A better way to deal with common calculations is to insert them into a new column with a trigger, or in your insert/update queries. That way you can index the calculated value and use it directly instead of figuring it out each time you need it.
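To illustrate that trigger approach, a minimal sketch with invented table and column names:
CREATE TRIGGER trg_Orders_LineTotal
ON Orders
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- Maintain the calculated value in a real column...
    UPDATE o
    SET o.LineTotal = i.Quantity * i.UnitPrice
    FROM Orders o
    JOIN inserted i ON i.OrderID = o.OrderID;
END
GO
-- ...so it can be indexed and read directly instead of recomputed in every query.
CREATE INDEX IX_Orders_LineTotal ON Orders (LineTotal);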
SQL doesn't give you a lot of flexibility when it comes to code reuse. I usually create functions for calculations or other tasks that don't involve modifying tables. But for tasks that involve writing to tables and that sort of thing, I usually use a stored procedure to get better control of the transactions.
You can break code into separate stored procedures to help break down complex stored procs into more manageable chunks. You can also do the same to break out common logic that won't work in a function. Think of it similar to an Extract Method refactoring.
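A trivial sketch of that style (the procedure names are made up):
CREATE PROCEDURE usp_MonthEndClose
AS
BEGIN
    -- Common logic pulled out into smaller procedures so each piece can be
    -- maintained and reused on its own.
    EXEC usp_RecalculateBalances;
    EXEC usp_ArchiveClosedAccounts;
END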
Another way to look at it from the application's side is to use binding to reuse your SQL queries. But that's probably not what you're looking for.
To follow up on this, I did run into some performance problems, and it seems that the optimizer is not able to pick the correct index for code inside the functions.
So I had to specify the correct index using index hints (the WITH keyword) to solve the performance issue.
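For reference, an index hint looks something like this in SQL Server (table and index names invented):
SELECT OrderID, OrderDate
FROM Orders WITH (INDEX (IX_Orders_CustomerID))
WHERE CustomerID = 42;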

How can my application benefit from temporary tables?

I've been reading a little about temporary tables in MySQL but I'm an admitted newbie when it comes to databases in general and MySQL in particular. I've looked at some examples and the MySQL documentation on how to create a temporary table, but I'm trying to determine just how temporary tables might benefit my applications and I guess secondly what sorts of issues I can run into. Granted, each situation is different, but I guess what I'm looking for is some general advice on the topic.
I did a little googling but didn't find exactly what I was looking for on the topic. If you have any experience with this, I'd love to hear about it.
Thanks,
Matt
Temporary tables are often valuable when you have a fairly complicated SELECT you want to perform and then perform a bunch of queries on that...
You can do something like:
CREATE TEMPORARY TABLE myTopCustomers
SELECT customers.*, COUNT(*) AS num FROM customers JOIN purchases USING (customerID)
JOIN items USING (itemID) GROUP BY customers.customerID HAVING num > 10;
And then do a bunch of queries against myTopCustomers without having to do the joins to purchases and items on each query. Then when your application no longer needs the database handle, no cleanup needs to be done.
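For example, the follow-up queries can hit the pre-joined result directly (these are just illustrative):
-- No joins to purchases/items needed here; the temp table already has the totals.
SELECT customerID, num FROM myTopCustomers ORDER BY num DESC;
SELECT AVG(num) FROM myTopCustomers;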
Almost always you'll see temporary tables used for derived tables that were expensive to create.
First a disclaimer - my job is reporting so I wind up with far more complex queries than any normal developer would. If you're writing a simple CRUD (Create Read Update Delete) application (this would be most web applications) then you really don't want to write complex queries, and you are probably doing something wrong if you need to create temporary tables.
That said, I use temporary tables in Postgres for a number of purposes, and most will translate to MySQL. I use them to break up complex queries into a series of individually understandable pieces. I use them for consistency: by generating a complex report through a series of queries, and then offloading some of those queries into modules I use in multiple places, I can make sure that different reports are consistent with each other. (And make sure that if I need to fix something, I only need to fix it once.) And, rarely, I deliberately use them to force a specific query plan. (Don't try this unless you really understand what you are doing!)
So I think temp tables are great. But that said, it is very important for you to understand that databases generally come in two flavors. The first is optimized for pumping out lots of small transactions, and the other is optimized for pumping out a smaller number of complex reports. The two types need to be tuned differently, and a complex report run on a transactional database runs the risk of blocking transactions (and therefore making web pages not return quickly). Therefore you generally want to avoid using one database for both purposes.
My guess is that you're writing a web application that needs a transactional database. In that case, you shouldn't use temp tables. And if you do need complex reports generated from your transactional data, a recommended best practice is to take regular (eg daily) backups, restore them on another machine, then run reports against that machine.
The best place to use temporary tables is when you need to pull a bunch of data from multiple tables, do some work on that data, and then combine everything to one result set.
In MS SQL, Temporary tables should also be used in place of cursors whenever possible because of the speed and resource impact associated with cursors.
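A rough sketch of that set-based pattern in T-SQL (all table and column names invented):
-- Stage an expensive aggregate once in a temp table...
SELECT CustomerID, SUM(Total) AS TotalSpend
INTO #CustomerTotals
FROM Orders
GROUP BY CustomerID;

-- ...then apply it in one set-based statement instead of looping with a cursor.
UPDATE c
SET c.LifetimeValue = t.TotalSpend
FROM Customers c
JOIN #CustomerTotals t ON t.CustomerID = c.CustomerID;

DROP TABLE #CustomerTotals;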
If you are new to databases, there are some good books by Joe Celko that review best practices for ANSI SQL. SQL For Smarties will describe in great detail the use of temp tables, the impact of indexes, WHERE clauses, etc. It's a great reference book with in-depth detail.
I've used them in the past when I needed to create evaluated data. That was before the time of views and subselects in MySQL, though, and I generally use those now where I would have needed a temporary table. The only time I might use them is if the evaluated data took a long time to create.
I haven't done them in MySQL, but I've done them on other databases (Oracle, SQL Server, etc).
Among other tasks, temporary tables provide a way for you to create a queryable (and returnable, say from a sproc) dataset that's purpose-built. Let's say you have several tables of figures -- you can use a temporary table to roll those figures up to nice, clean totals (or other math), then join that temp table to others in your schema for final output. (An example of this, in one of my projects, is calculating how many scheduled calls a given sales-related employee must make per week, bi-weekly, monthly, etc.)
I also often use them as a means of "tilting" the data -- turning columns to rows, etc. They're good for advanced data processing -- but only use them when you need to. (My golden rule, as always, applies: If you don't know why you're using x, and you don't know how x works, then you probably shouldn't use it.)
Generally, I wind up using them most in sprocs, where complex data processing is needed. I'd love to give a concrete example, but mine would be in T-SQL (as opposed to MySQL's more standard SQL), and also they're all client/production code which I can't share. I'm sure someone else here on SO will pick up and provide some genuine sample code; this was just to help you get the gist of what problem domain temp tables address.