Shorter Date Interval results in longer execution time - sql

I have this query:
SELECT
COUNT(*) AS 'RedactedCount'
,s.Redacted1
,s.[Redacted2]
,s.[Redacted3] AS 'Redacted3'
FROM RedactedTable1 s
LEFT OUTER JOIN RedactedTable2 g ON s.Redacted5= g.Redacted5
WHERE g.Redacted6= 31013 AND s.DateTime >= '2013-03-02 00:00:00'
GROUP BY s.Redacted1,s.Redacted2, s.Redacted3
Which have a very odd behavior. This query takes a whole 1min30secs to complete. Should I change de date to 2013-04-02 00:00:00 (today as I'm writing this post), it is near instant, which is the expected behavior.
But if I change the date to 2013-02-02 (a 2 months time span instead of 1), the query takes only 20 secs.
Does Anyone has encountered this problem ? I am completely stunned on the result. It will also be an important SQL request of a web application that I am working on.
Microsoft SQL Server Management Studio 11.0.3128.0
Microsoft Data Access Components (MDAC) 6.1.7601.17514
Microsoft MSXML 3.0 6.0
Microsoft Internet Explorer 9.0.8112.16421
Microsoft .NET Framework 4.0.30319.296
Système d'exploitation 6.1.7601
Note: The database is poorly designed, and contains absolutely no indexes. Yes, this is bad. Unfortunately, this is a commercial software and I have no rights to make changes on the database model. However, I do not think that the problem I have is caused by this.
P.S.: Sorry if my query is heavily redacted as I am on a strict NDA. I tried to made it as readable as possible.
Thanks !

First of all, it is pointless to put a predicate condition on a table on the outer side of an outer join. As soon as you do this, all rows in the final result set that do not have a match in that table are eliminated, effectively making the overall query behave as though it was an inner join.
The condition on RedactedTable2.Redacted6 should be part of the join conditions if you want the join to include rows where there is no matching row in table RedactedTable2.
SELECT COUNT(*) AS 'RedactedCount',
s.Redacted1, s.[Redacted2],
s.[Redacted3] AS 'Redacted3'
FROM RedactedTable1 s
LEFT JOIN RedactedTable2 g
ON g.Redacted5= s.Redacted5
And g.Redacted6= 31013
WHERE s.DateTime >= '2013-03-02 00:00:00'
GROUP BY s.Redacted1,s.Redacted2, s.Redacted3
As to why the difference in performance, my suspicion is that your issue is caused by something in the data in the tables that is causing the query processor to use a different execution plan in one case than it was in the other. This can easily happen. If the optimizer "guesses" that it would need to examine more than a certain percentage of data rows using one query execution plan, based on database statistics about the distribution of data values in the tables), then it will switch to a different plan.
Run both queries with the ShowPlan option turned on, and look and see what the differences are.

In addition to what Charles suggested you might ask the database administrator (assuming you have one) to run UPDATE STATISTICS on at least RedactedTable1 and RedactedTable2. UPDATE STATISTICS requires you to have ALTER permissions on the table/view so I doubt you have permissions to run it. But you can probably ask for it to be done. Problems like what you are describing are frequently caused by out of date statistics.

Aaron Bertrand got the answer in a comment.
This problem was caused by parameter sniffing done by MSSQL.
Declaring and using dummy variables prevents MSSQL to falsely optimize the query using past optimizations.
The following link helped me learn about parameter sniffing http://blogs.technet.com/b/mdegre/archive/2012/03/19/what-is-parameter-sniffing.aspx

Related

ms access query (ms access freezes)

I have this report and need to add totals for each person (the red circle)
existing report
new report
I cannot change the existing report so I export data from MS SQL to MS Access and create a new report there. I got it working for one employee but have trouble with a query which would for multiple employees.
This query extract data use as input:
SELECT [TIME].[RCD_NUM], [TIME].[EMP_ID], [TIME].[PPERIOD], [TIME].[PRUN], [TIME].[TDATE], [TIME].[PC], [TIME].[RATE], [TIME].[HOURS], [TIME].[AMOUNT], [TIME].[JOB_ID], [TIME].[UPDATED], [TIME].[UPDATED_BY], [TIME].[LOG_DATE], [TIME].[ORIGINAL_REC_NUM]
FROM [TIME]
WHERE ((([TIME].[EMP_ID])=376) And (([TIME].[TDATE])<=#12/31/2006# And ([TIME].[TDATE])>=#1/1/2006#) And (([TIME].[PC])<599));
this query populates the report:
SELECT *
FROM TIME1
WHERE RCD_NUM = (SELECT Max(RCD_NUM) FROM [TIME1] UQ WHERE UQ.PPERIOD = [TIME1].PPERIOD AND UQ.PC = [TIME1].PC);
the problem is if I remove EMP_ID from the first query like this
SELECT [TIME].[RCD_NUM], [TIME].[EMP_ID], [TIME].[PPERIOD], [TIME].[PRUN], [TIME].[TDATE], [TIME].[PC], [TIME].[RATE], [TIME].[HOURS], [TIME].[AMOUNT], [TIME].[JOB_ID], [TIME].[UPDATED], [TIME].[UPDATED_BY], [TIME].[LOG_DATE], [TIME].[ORIGINAL_REC_NUM]
FROM [TIME]
WHERE ((([TIME].[TDATE])<=#12/31/2006# And ([TIME].[TDATE])>=#1/1/2006#) And (([TIME].[PC])<599));
then the second query doesn't work and ms access freezes when running this query.
any help/idea please?
Caveat: I won't pretend to know the precise cause of the problem, but I have had to repeatedly refactor queries in Access to get them working even though the original SQL statements are completely valid in regards to syntax and logic. Sometimes I've had to convolute a sequence of queries just to avoid bugs in Access. Access is often rather dumb and will simply (re)execute queries and subqueries exactly as given without optimization. At other times Access will attempt to combine queries by performing some internal optimizations, but sometimes those introduce frustrating bugs. Something as simple as a name change or column reordering can be the difference between a functioning query and one that crashes or freezes Access.
First consider:
Can you leave the data on SQL Server and link to the results in Access (rather than export/importing it into Access)? Even if you need or prefer to use Access for creating the actual report, you could use all the power of SQL Server for querying the data--it is likely less buggy and more efficient.
Common best practice is to create SQL Server stored procedures that return just what data you need in Access. A pass-through query is created in Access to retrieve the data, but all data operations are performed on the server.
Perhaps this is just a performance issue where limiting the set by [EMP_ID] selects a small subset, but the full table is large enough to "freeze" Access.
How long have you let Access remain frozen before killing the process? Be patient... like many, many minutes (or hours). Start it in the morning and check after lunch. :) It might eventually return a result set. This does not imply it is tolerable or that there is no other solution, but it can be useful to know if it eventually returns data or not.
How many possible records are there?
Are the imported data properly indexed? Add indexes to all key fields and those which are used in WHERE clauses.
Is the database located on a network share or is it local? Try copying the database to a local drive.
Other hints:
Try the BETWEEN operator for dates in the WHERE clause.
Try refactoring the "second" query by performing a join in the FROM clause rather than the WHERE clause. In doing this, you may also want to save the subquery as a named query (just as [TIME1] is saved). Whether or not a query is saved or embedded in another statement CAN change the behavior of Access (see caveat) even though the results should be identical.
Here's a version with the embedded aggregate query. Notice how all column references are qualified with the source. Some of the original query's columns do not have a source alias prefixing the column name. Remember the caveat... such picky details can affect Access behavior.:
SELECT TIME1.*
FROM TIME1 INNER JOIN
(SELECT UQ.PPERIOD, UQ.PC, Max(UQ.RCD_NUM) As Max_RCD_NUM
FROM [TIME1] UQ
GROUP BY UQ.PPERIOD, UQ.PC) As TIMEAGG
ON (TIME1.PPERIOD = TIMEAGG.PPERIOD) And (TIME1.PC = TIMEAGG.PC)
AND (TIME1.RCD_NUM = TIMEAGG.Max_RCD_NUM)

Subtract hours from SQL Server 2012 query result

I am running queries on an alarm system signal automation platform database in SQL Server 2012 Management Studio, and I am running into some hiccups.
My queries run just fine, but I am unable to refine my results to the level that I would like.
I am selecting some columns that are formatted as DATETIME, and I simply want to take the value in the column and subtract 4 hours from it (i.e., from GMT to EST) and then output that value into the query results.
All of the documentation I can find regarding DATESUB() or similar commands are showing examples with a specific DATETIME in the syntax, and I don't have anything specific, just 138,000 rows with columns I want to adjust time zones for.
Am I missing something big or will I just need to continue to adjust manually after I being manipulating my query result? Also, in case it makes a difference, I have a read-only access level, and am not interested in altering table data in any way.
Well, for starters, you need to know that you aren't restricted to use functions only on static values, you can use them on columns.
It seems that what you want is simply:
SELECT DATEADD(HOUR,-4,YourColumnWithDateTimes)
FROM dbo.YourTable
Maybe it will be helpful
SELECT DATEPART(HOUR, GETDATE());
DATEPART docs from MSDN

Why are dot-separated prefixes ignored in the column list for INSERT statements?

I've just come across some SQL syntax that I thought was invalid, but actually works fine (in SQL Server at least).
Given this table:
create table SomeTable (FirstColumn int, SecondColumn int)
The following insert statement executes with no error:
insert SomeTable(Any.Text.Here.FirstColumn, It.Does.Not.Matter.What.SecondColumn)
values (1,2);
The insert statement completes without error, and checking select * from SomeTable shows that it did indeed execute properly. See fiddle: http://sqlfiddle.com/#!6/18de0/2
SQL Server seems to just ignore anything except the last part of the column name given in the insert list.
Actual question:
Can this be relied upon as documented behaviour?
Any explanation about why this is so would also be appreciated.
It's unlikely to be part of the SQL standard, given its dubious utility (though I haven't checked specifically (a)).
What's most likely happening is that it's throwing away the non-final part of the column specification because it's superfluous. You have explicitly stated what table you're inserting into, with the insert into SomeTable part of the command, and that's the table that will be used.
What you appear to have done here is to find a way to execute SQL commands that are less readable but have no real advantage. In that vein, it appears similar to the C code:
int nine = 9;
int eight = 8;
xyzzy = xyzzy + nine - eight;
which could perhaps be better written as xyzzy++; :-)
I wouldn't rely on it at all, possibly because it's not standard but mostly because it makes maintenance harder rather than easier, and because I know DBAs all over the world would track me down and beat me to death with IBM DB2 manuals, their choice of weapon due to the voluminous size and skull-crushing abilities :-)
(a) I have checked non-specifically, at least for ISO 9075-2:2003 which dictates the SQL03 language.
Section 14.8 of that standard covers the insert statement and it appears that the following clause may be relevant:
Each column-name in the insert-column-list shall identify an updatable column of T.
Without spending a huge amount of time (that document is 1,332 pages long and would take several days to digest properly), I suspect you could argue that the column could be identified just using the final part of the column name (by removing all the owner/user/schema specifications from it).
Especially since it appears only one target table is possible (updatable views crossing table boundaries notwithstanding):
<insertion target> ::= <table name>
Fair warning: I haven't checked later iterations of the standard so things may have changed. But I'd consider that unlikely since there seems to be no real use case for having this feature.
This was reported as a bug on Connect and despite initially encouraging comments the current status of the item is closed as "won't fix".
The Order by clause used to behave in a similar fashion but that one was fixed in SQL Server 2005.
I get errors when I try to run the script on SQL Server 2012 as well as SQL Server 2014 and SQL Server 2008 R2. So you can certainly not rely on the behavior you see with sqlfiddle.
Even if this were to work, I would never rely on undocumented behavior in production code. Microsoft will include notice of breaking changes with documented features but not undocumented ones. So if this were an actual T-SQL parsing bug that was later fixed, it would break the mal-formed code.

IBM Informix SQL Syntax assistance

Wondering if anyone may be able to help me with some SQL here. I'm tasked with retrieving some data from a legacy DB system - It's an IBM Informix DB running v7.23C1. It may well be that what I'm trying to do here is pretty simple, but for the life of me I can't figure it out.
I'm used to MS SQL Server, rather than any other DB system and this one seems quite old: http://publib.boulder.ibm.com/epubs/pdf/3731.pdf (?)
Basically, I just want to run a query that includes nesting, but I can't seem to figure out how to do this. So for example, I have a query that looks like this:
SELECT cmprod.cmp_product,
(stock.stk_stkqty - stock.stk_allstk) stk_bal,
stock.stk_ospurch,
stock.stk_backord,
'Current Sales Period',
'Current Period -1',
'Current Period -2',
cmprod.cmp_curcost,
stock.stk_credate,
stock.stk_lastpurch,
stock.stk_binref
FROM informix.stock stock,
informix.cmprod cmprod
WHERE stock.stk_product = cmprod.cmp_product
AND (cmp_category = 'VOLV'
OR cmp_category = 'VOLD'
OR cmp_category = 'VOLA')
AND stk_loc = 'ENG';
Now, basically where I have values like 'Current Period -1' I want to include a nested field which will run a query to get the sales within a given date range. I'm sure I can put those together separately, but can't seem to get the compiler to be happy with my code when executed altogether.
Probably something like (NB, this specific query is for another column, but you get the idea):
SELECT s.stmov_product, s.stmov_trandate, s.stmov_qty
FROM informix.stmove s
WHERE s.stmov_product = '1066823'
AND s.stmov_qty > 0
AND s.stmov_trandate IN (
SELECT MAX(r.stmov_trandate)
FROM informix.stmove r
WHERE r.stmov_product = '1066823'
AND r.stmov_qty > 0)
What makes things a little worse is I don't have access to the server that this DB is running on. At the moment I have a custom C# app that connects via an ODBC driver and executes the raw SQL, parsing the results back into a .CSV.
Any and all help appreciated!
Under all circumstances, Informix 7.23 is so geriatric that it is unkind to be still running it. It is not clear whether this is an OnLine (Informix Dynamic Server, IDS) or SE (Standard Engine) database. However, 7.23 was the version prior to the Y2K-certified 7.24 releases, so it is 15 years old or thereabouts, maybe a little older.
The syntaxes supported by Informix servers back in the days of 7.23 were less comprehensive than they are in current versions. Consequently, you'll need to be careful. You should have the manuals for the server — someone, somewhere in your company should. If not, you'll need to try finding it in the graveyard manuals section of the IBM Informix web pages (start at http://www.informix.com/ for simplicity of URL; however, archaic manuals take some finding, but you should be able to get there from http://pic.dhe.ibm.com/infocenter/ifxhelp/v0/index.jsp choosing 'Servers' in the LHS).
If you are trying to write:
SELECT ...
(SELECT ... ) AS 'Current - 1',
(SELECT ... ) AS 'Current - 2',
...
FROM ...
then you need to study the server SQL Syntax for 7.23 to know whether it is allowed. AFAICR, OnLine (Informix Dynamic Server) would allow it and SE probably would not, but that is far from definitive. I simply don't remember what the limitations were in that ancient a version.
Judging from the 7.2 Informix Guide to SQL: Syntax manual (dated April 1996 — 17 years old), you cannot put a (SELECT ...) in the select-list in this version of Informix.
You may have to create a temporary table holding the results you want (along with appropriate key information), and then select from the temporary table in the main query.
This sort of thing is one of the problems with not updating your server for so long.
Sorry to be blunt, but can you at least have mercy on us by shortening the syntax?.. table.columns can be presented AS aliases. Meanwhile, if you cant upgrade to a newer version of Informix, you will have to rely on one or more SELECT INTO temp table queries in order to achieve your objective, which BTW, would make your coding more portable across different versions. You also have to evaluate whether using TEMP tables implies unacceptable processing times.

Providex Query Performance

I am running a query against a providex database that we use in MAS 90. The query has three tables joined together, and has been slow but not unbearably, taking about 8 minutes per run. The query has a fair number of conditions in the where clause:
I'm going to omit the select part of the query as its long and simple, just a list of fields from the three tables that are to be used in the results.
But the tables and the where clauses in the 8 minute run time version are:
(The first parameter is the lower bound of the user-selected date range, the second is the upper bound.)
FROM "AR_InvoiceHistoryDetail" "AR_InvoiceHistoryDetail",
"AR_InvoiceHistoryHeader" "AR_InvoiceHistoryHeader", "IM1_InventoryMasterfile"
"IM1_InventoryMasterfile"
WHERE "AR_InvoiceHistoryDetail"."InvoiceNo" = "AR_InvoiceHistoryHeader"."InvoiceNo"
AND "AR_InvoiceHistoryDetail"."ItemCode" = "IM1_InventoryMasterfile"."ItemNumber"
AND "AR_InvoiceHistoryHeader"."SalespersonNo" = 'SMC'
AND "AR_InvoiceHistoryHeader"."OrderDate" >= #p_dr
AND "AR_InvoiceHistoryHeader"."OrderDate" <= #p_d2
However, it turns out that another date field in the same table needs to be the one that the Date Range is compared with. So I changed the Order Dates at the end of the WHERE clause to InvoiceDate. I haven't had the query run successfully at all yet. And I've waited over 40 minutes. I have no control over indexing because this is a MAS 90 database which I don't believe I can directly change the database characteristics of.
What could cause such a large (at least 5 fold) difference in performance. Is it that OrderDate might have been indexed while InvoiceDate was not? I have tried BETWEEN clauses but it doesn't seem to work in the providex dialect. I am using the ODBC interface through .NET in my custom report engine. I have been debugging the report and it is running at the database execution point when I asked VS to Break All, at the same spot where the 8 minute report was waiting, so it is almost certainly either something in my query or something in the database that is screwed up.
If its just the case that InvoiceDates aren't indexed, what else can I do in the providex dialect of SQL to optimize the performance of these queries? Should I change the order of my criteria? This report gets results for a specific salesperson which is why the SMC clause exists. The prior clauses are for the inner joins, and the last clause is for the date range.
I used an identical date range in both the OrderDate and InvoiceDate versions and have ran them all mulitiple times and got the same results.
I still don't know exactly why it was so slow, but we had another problem with the results coming from the query (we switched back to using OrderDate). We weren't getting some of the results because of the nature of the IM1 table.
So I added a Left Outer Join once I figured out Providex's syntax for that. And for some reason, even though we still have 3 tables, it runs a lot faster now.
The new query criteria are:
FROM "AR_InvoiceHistoryHeader" "AR_InvoiceHistoryHeader",
{OJ "AR_InvoiceHistoryDetail" "AR_InvoiceHistoryDetail"
LEFT OUTER JOIN "IM1_InventoryMasterfile" "IM1_InventoryMasterfile"
ON "AR_InvoiceHistoryDetail"."ItemCode" =
"IM1_InventoryMasterfile"."ItemNumber" }
WHERE "AR_InvoiceHistoryDetail"."InvoiceNo" =
"AR_InvoiceHistoryHeader"."InvoiceNo" AND
"AR_InvoiceHistoryHeader"."SalespersonNo" = 'SMC'
AND "AR_InvoiceHistoryHeader"."InvoiceDate" >= ?
AND "AR_InvoiceHistoryHeader"."InvoiceDate" <= ?
Strange, but at least I learned more of the world of Providex Sql in the process.
I've never used providex before.
A search turned up this reference article on the syntax for creating an index.
Looking over your query, there's three tables and five criteria. Two of the criteria are "join criteria", and three criteria are filtering criteria:
AND "AR_InvoiceHistoryHeader"."SalespersonNo" = 'SMC'
AND "AR_InvoiceHistoryHeader"."OrderDate" >= #p_dr
AND "AR_InvoiceHistoryHeader"."OrderDate" <= #p_d2
I don't know how good SalespersonNo is for limiting return results, but it might be good to add an index on that.
I haven't used .NET so my question may show ignorance, but in Access you must use a SQL Pass-Through query to wring any results from ProvideX, if more than one table is involved.