In Query that haves 3 tables variables joined with 2 tables in some cases where those variable tables are empty (no rows) in the Execution Plan still appears with 16% cost for the scan of each table variable.
From searching found this rule
If a table variable is joined with other tables in SQL Server, it may
result in slow performance due to inefficient query plan selection
because SQL Server does not support statistics or track number of rows
in a table variable while compiling a query plan
from https://sqlperformance.com/2014/06/t-sql-queries/table-variable-perf-fix
So my question is it worth it before creating the table variable and inserting rows into it, validate if there is any rows to insert in the first place and ignore the creation of the table variable and in the query where have 3 variable table ignore the joins for table variable that weren't created?
Related
Is there an efficient way to update rows of a table that has no indexes and no partitions (and ~50millions rows)?
I have a date field LOAD_DTTM and values of this field for rows that require update (around 2000 distinct dates).
WIll update be faster if i specify a date in a WHERE clause along with the UNIQUE_ID of a row?
If you want to update all, or a large number, of the rows then the quickest way is:
create table my_table_copy as
select ... -- all the columns, updating values as required
from my_table;
drop table my_table;
rename my_table_copy to my_table;
If your table had any indexes, constraints or triggers you would now need to re-add them - but it seems you don't have that issue!
You could create a PL/SQL procedure that loops and update and commit the table every n row count -- Say every 20.000 rows. I do not advise to update the full table as it will create a lock for a looong time and expose you to data loss in case of external factors.
The answer is NO.
Even if you specify both conditions in your WHERE clause as you stated, it won't help you to avoid a full scan of your table.
Even if one of your criteria will uniquely identify the row, it still won't help.
There is a real example tested on Oracle 12C ver.2 similar to your case. No indexes, no partitions, nothing. Just plain table with 4 columns
I have a table with 18mn records.
I also have CUSTOMER_ID which is a UNIQUE identifier for a row.
I also have ORDER_DATE column there.
Even if I do the query that you mentioned
update hit set status = 1 where customer_id = 408518625844 and order_date = '09-DEC-19';
it won't help me to avoid a full table scan. See below Execution Plan. Therefore under conditions, you've specified, you will be always getting the slowest execution time possible. Full Table Scan on 50mn rows is actually the worst-case scenario.
And pay attention to that Cost, it is 26539 on 18mn rows.
So if you have 50mn rows we can easily expect much more Cost for your query
I have a table named [cwbOrder] that currently has 1.277.469 rows. I am using SQL Server 2014 and I am doing these tests on a UAT environment, on production this query takes a little bit longer.
If I try selecting all of the rows like using:
SELECT * FROM cwbOrder
It takes 24 seconds to retrieve all of the data from the table. I have read about how it is important to index columns used in the predicates (WHERE), but I still cannot understand how does a simple select take 24 seconds.
Using this table in other more complex queries generates a lot of extra workload for the query, although I have created the JOINs on indexed columns. Additionally I have selected only 2 columns from this table then JOINED it to another table and this operation still takes a significantly long amount of time. As an example please consider the below query:
Below I have attached the index structure of both tables, to illustrate the matter:
PK_cwbOrder is the index on the id_cwbOrder column in the cwbOrder table.
Edit 1: I have added the execution plan for the query in which I join the cwbOrder table with the cwbAction table.
Is there any way, considering the information above, that I can make this query faster?
There are many reasons why such a select could be slow:
The row size or number of rows could be very large, requiring a lot of time to transport or delay.
Other operations on the table could have locks on the table.
The database server or network could be very busy.
The "table" could really be a view that is running a complicated query.
You can test different aspects. For instance:
SELECT TOP 10 <one column here>
FROM cwbOrder o
This returns a very small result set and reads just a small part of the table. This reads the entire table but returns a small result set:
SELECT COUNT(*)
FROM cwbOrder o
I have a temp table created in DB2 using java and then some 2000 rows inserted the same.
Now I am using this table in a select query with few joins. The query involves other 3 tables say A, B and C. Somehow this select query returns result very slow it almost takes a 20 seconds to provide results.
Below are the details of this query,
table A has 200000 records. B and C has only 100-200 records.
All these 3 tables have enough indexes defined on columns involved in join and the where clause. The Explain plan tool etc. did not show any new indexes needed.
When I run the query removing session table and its use in where clause, the query returns results in milliseconds. And as I mentioned this session table has only around 2000 records.
I have also declared indexes on each column of this session table.
I am not really sure about terminology here but when I say session table it is a temporary table created using db connection and the table gets dropped when the DB connection is closed. Also when the program runs with 15 threads, no thread is capable of looking at table created by other thread.
Where could the issue be? Please let me know some suggestions here.
Some suggestions (I assume LUW since you don't mention platform)
a) You say that you have indexes on each column of your session table, I assume this means a set of 1 column indexes. This is in most cases not optimal, you can probably replace those with a composite index. Check what advis suggests by creating a real table like:
create table temp.t ( ... )
insert into temp.t (...) values (...)
runstats on temp.t with distribution
then run: db2advis -d -m I -s "your query but with temp.t instead of session table"
b) After data is loaded and - eventually - new indexes are created, do a runstats on the session table
I have been searching the internet for hours trying to figure out how to improve the performance of my query using table-valued parameters (TVP).
After hours of searching, I finally determined what I believe is the root of the problem. Upon examining the Estimated Execution plan of my query, I discovered that the estimated number of rows for my query is 1 anytime I use a TVP. If I exchange the TVP for a query that selects the data I am interested in, then the estimated number of rows is much more accurate at around 7400. This significantly increases the performance.
However, in the real scenario, I cannot use a query, I must use a TVP. Is there any way to have SQL Server more accurately predict the number of rows when using a TVP so that a more appropriate plan will be used?
TVPs are Table Variables which don't maintain statistics and hence report only have 1 row. There are two ways to improve statistics on TVPs:
If you have no need to modify any of the values in the TVP or add columns to it to track operational data, then you can do a simple, statement-level OPTION (RECOMPILE) on any query that uses a table variable (TVP or locally created) and is doing more with that table variable than a simple SELECT (i.e. doing INSERT INTO RealTable (columns) SELECT (columns) FROM #TVP; does not need the statement-level recompile). Do the following test in SSMS to see this behavior in action:
DECLARE #TableVariable TABLE (Col1 INT NOT NULL);
INSERT INTO #TableVariable (Col1)
SELECT so.[object_id]
FROM [master].[sys].[objects] so;
-- Control-M to turn on "Include Actual Execution Plan"
SELECT * FROM #TableVariable; -- Estimated Number of Rows = 1 (incorrect)
SELECT * FROM #TableVariable
OPTION (RECOMPILE); -- Estimated Number of Rows = 91 (correct)
SELECT * FROM #TableVariable; -- Estimated Number of Rows = 1 (back to incorrect)
Create a local temporary table (single #) and copy the TVP data to that. While this does duplicate the data in tempdb, the benefits are:
better statistics for a temp table as opposed to table variable (i.e. no need for statement-level recompiles)
ability to add columns
ability to modify values
Bounty open:
Ok people, the boss needs an answer and I need a pay rise. It doesn't seem to be a cold caching issue.
UPDATE:
I've followed the advice below to no avail. How ever the client statistics threw up an interesting set of number.
#temp vs #temp
Number of INSERT, DELETE and UPDATE statements
0 vs 1
Rows affected by INSERT, DELETE, or UPDATE statements
0 vs 7647
Number of SELECT statements
0 vs 0
Rows returned by SELECT statements
0 vs 0
Number of transactions
0 vs 1
The most interesting being the number of rows affected and the number of transactions. To remind you, the queries below return identical results set, just into different styles of tables.
The following query are basicaly doing the same thing. They both select a set of results (about 7000) and populate this into either a temp or var table. In my mind the var table #temp should be created and populated quicker than the temp table #temp however the var table in the first example takes 1min 15sec to execute where as the temp table in the second example takes 16 seconds.
Can anyone offer an explanation?
declare #temp table (
id uniqueidentifier,
brand nvarchar(255),
field nvarchar(255),
date datetime,
lang nvarchar(5),
dtype varchar(50)
)
insert into #temp (id, brand, field, date, lang, dtype )
select id, brand, field, date, lang, dtype
from view
where brand = 'myBrand'
-- takes 1:15
vs
select id, brand, field, date, lang, dtype
into #temp
from view
where brand = 'myBrand'
DROP TABLE #temp
-- takes 16 seconds
I believe this almost completely comes down to table variable vs. temp table performance.
Table variables are optimized for having exactly one row. When the query optimizer chooses an execution plan, it does it on the (often false) assumption that that the table variable only has a single row.
I can't find a good source for this, but it is at least mentioned here:
http://technet.microsoft.com/en-us/magazine/2007.11.sqlquery.aspx
Other related sources:
http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=125052
http://databases.aspfaq.com/database/should-i-use-a-temp-table-or-a-table-variable.html
Run both with SET STATISTICS IO ON and SET STATISTICS TIME ON. Run 6-7 times each, discard the best and worst results for both cases, then compare the two average times.
I suspect the difference is primarily from a cold cache (first execution) vs. a warm cache (second execution). The output from STATISTICS IO would give away such a case, as a big difference in the physical reads between the runs.
And make sure you have 'lab' conditions for the test: no other tasks running (no lock contention), databases (including tempdb) and logs are pre-grown to required size so you don't hit any log growth or database growth event.
This is not uncommon. Table variables can be (and in a lot of cases ARE) slower than temp tables. Here are some of the reasons for this:
SQL Server maintains statistics for queries that use temporary tables but not for queries that use table variables. Without statistics, SQL Server might choose a poor processing plan for a query that contains a table variable
Non-clustered indexes cannot be created on table variables, other than the system indexes that are created for a PRIMARY or UNIQUE constraint. That can influence the query performance when compared to a temporary table with non-clustered indexes.
table variables use internal metadata in a way that prevents the engine from using a table variable within a parallel query (this means that it wont take advantage of multi-processor machines).
A table variable is optimized for one row, by SQL Server (it assumes 1 row will be returned).
I'm not 100% that this is the cause, but the table var will not have any statistics whereas the temp table will.
SELECT INTO is a non-logged operation, which would likely explain most of the performance difference. INSERT creates a log entry for every operation.
Additionally, SELECT INTO is creating the table as part of the operation, so SQL Server knows automatically that there are no constraints on it, which may factor in.
If it takes over a full minute to insert 7000 records into a temp table (persistent or variable), then the perf issue is almost certainly in the SELECT statement that's populating it.
Have you run DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS before profiling? I'm thinking that maybe it's using some cached results for the second query.