How can I optimise this Query? - sql-server-2005

How can I optimize this query if given the following query returns either all entries in the table or entries that match only up to current date ?
btw: The Query is targeted to a Oracle Linked Server on MS Sql 2005 as an Inline function.. Do not want this to be a table value function..
ALTER function [dbo].[ftsls031nnnHades](#withExpiredEntries bit =0)
returns table as return
select *
from openQuery(Hades ,"select '010' comno,
trim(t$cuno) t$cuno,
trim(t$cpgs) t$cpgs,
t$dile,
t$qanp,
to_char(t$stdt,'dd Mon yy') t$stdt,
to_char(t$tdat,'dd Mon yy') t$tdat,
to_char(t$disc,'999.99') t$disc,
t$damt,
t$cdis,
t$gnpr,
t$refcntd,
t$refcntu
from baan.ttdsls031010
where (to_char(t$Tdat,'yyyy-mm-dd') >= To_char(current_date,'yyyy-mm-dd'))
and (to_char(t$stdt,'yyyy-mm-dd') <= To_char(current_date,'yyyy-mm-dd'))
union all
select '020' comno,
trim(t$cuno) t$cuno,
trim(t$cpgs) t$cpgs,
t$dile,t$qanp,
to_char(t$stdt,'dd Mon yy') t$stdt,
to_char(t$tdat,'dd Mon yy') t$tdat,
to_char(t$disc,'999.99') t$disc,
t$damt,
t$cdis,
t$gnpr,
t$refcntd,
t$refcntu
from baan.ttdsls031020
where (to_char(t$tdAt,'yyyy-mm-dd') >= To_char(current_date,'yyyy-mm-dd'))
and (to_char(t$stdt,'yyyy-mm-dd') <= To_char(current_date,'yyyy-mm-dd')) ")
p.s: Column naming conventions may be alien to those who are of non BaaN .. Please excuese me for bringing up 'BaaN' conventions into StackOverflow.

Never perform any functional processing of your date column (t$Tdat and t$stdt are of this type, aren't they?) unless you have the corresponding function-based index. This approach doesn't allow you to use indexes on t$stdt and t$Tdat and drops the perfomance dramatically.
Instead, I would rewrite the where clause in the following way:
where t$Tdat >= current_date and t$stdt <= current_date
if current_date is of date type. If it's not, then you can use, for example, to_date(current_date, 'DD-MM-YYYY') instead of it.

Just in case be here now's tip - which is a good one - doesn't work:
you'll need to collect some data to know where time is being spent. Please read this OTN-thread to see how to do this for Oracle: http://forums.oracle.com/forums/thread.jspa?messageID=1812597. For SQL Server, the same principles apply: use their tools to find out where this query is spending time on.
Some general information you can share is:
How many rows are in those two tables
How many rows are returned by that query
Which indexes are present on those two tables
How long does the query currently take
What response time is acceptable, i.e. when are we done tuning
Regards,
Rob.

Not sure how much this will improve performance, but the first thing I'd do is replace the date to string conversion with just date functions. That is, use trunc() instead of to_char().

In the below way you can optimize the Baan Query
In Where condition use indexes and combine field if possible.
In where condition Use "Between/Inrange" when upper and lower limit specified.
Use "Refers To" if reference is available in data dictionary
Use few overlapping "Or" condition as possible
Use only selected field of table in select statement, Which is actually required.
Use "Order by" to get record in correct sorting format
If possible Don't use NOT INRANGE,BETWEEN,IN operators because that operator can scan full table.
Use commit.transaction() to prevent line being print twice.

Related

SQL partial date comparison

Which method is most efficient when comparing PARTS of date/datetime values? Example for comparing month of datetimes:
where insdate =DATEADD(month, DATEDIFF(month, 0, #insdate), 0)
or
where year(insdate)=year(#insdate) and month(insdate)=month(#insdate)
I'm using sql server
I disagree with Damien_The_Unbeliever's assertion that you should just use whichever reads cleaner, as there are objective reasons why one approach will be better than the other. The most pertinent of these is what is known as SARGability.
In essence this refers to whether SQL Server can use your values in the efficient manners it is designed to do, such as utilising indexes.
The differences in your two examples are nicely outlines here.
In short, if you have functions or calculated values on both sides of your equality conditions, SQL Server is definitely going to have to check every single value returned, whereas if you apply the principles of SARGability from the off, even if you don't see any significant benefits immediately you are at least in a better position to realise those benefits later on if required.
In my opinion, the best way to implement Year or YearMonth check is to cast date in this format YYYYMMDD and then work with that.
This is an example:
Filter by YearMonthDay
SELECT * FROM myTable
WHERE CONVERT(VARCHAR,MyField,112) = 20170607
Filter by YearMonth
SELECT * FROM myTable
WHERE CONVERT(VARCHAR,MyField,112) / 100 = 201706
Filter by Year
SELECT * FROM myTable
WHERE CONVERT(VARCHAR,MyField,112) / 10000 = 2017
For sure this perfomrs better than using Year() ,Month() , DateAdd(), DateDiff() functions.

applying knowledge of SQL for everyday workplace activities

My question is how to properly write a SQL query for the below highlighted/bold question.
There is a table in HMO database which stores doctor's working
hours.Table has following fields
"FirstName","LastName","Date","HoursWorked". write a sql statement
which retrieves average working hours for period January-March for a
doctor with name Joe Doe.
so far i have
SELECT HoursWorked
FROM Table
WHERE DATE = (January - March) AND
SELECT AVG(HoursWorked) FROM Table WHERE FirstName="Joe",LastName="Doe"*
A few pointers as this sounds like a homework question (which we don't answer for you here, but we can try to give you some guidance).
You want to put all the things you want to return from your select first and you want to have all your search conditions at the end.
So the general format would be :
SELECT Column1,
Column2,
Column3,
FROM YourTable
WHERE Column4 = Restriction1
AND Column5 = Restriction2
The next thing you need to think about is how the dates are formatted in your database table. Hopefully they're kept in a column of type datetime or date (options will depend on the database engine you're using, eg, Microsoft SQL Server, Oracle or MySql). In reality some older databases people use can store dates in all sorts of formats which makes this much harder, but since I'm assuming it's a homework type question, lets assume it's a datetime format.
You specify restrictions by comparing columns to a value, so if you wanted all rows where the date was after midnight on the 2nd of March 2012, you would have the WHERE clause :
WHERE MyDateColumn >= '2012-03-02 00:00:00'
Note that to avoid confusion, we usually try to format dates as "Year-Month-Day Hour:Minute:Second". This is because in different countries, dates are often written in different formats and this is considered a Universal format which is understood (by computers at least) everywhere.
So you would want to combine a couple of these comparisons in your WHERE, one for dates AFTER a certain date in time AND one for dates before another point in time.
If you give this a go and see where you get to, update your question with your progress and someone will be able to help get it finished if you have problems.
If you don't have access to an actual database and need to experiment with syntax, try this site : http://sqlfiddle.com/
you already have the answer written
SELECT AVG(HoursWorked) FROM Table WHERE FirstName="Joe",LastName="Doe"*
you only need to fix the query
SELECT AVG(HoursWorked) as AVGWORKED FROM Table WHERE FirstName='Joe' AND LastName='Doe'
That query will give you the average hours worked for Joe Doe, however you only need to get between some time you add the next "AND", if you are using SQL server you can use the built in function DateFromParts(year,month,day) to create a new Date, or if you are using another Database Engine you can convert a string to a DateColumn Convert(Date,'MM/dd/yyyy')
Example
SELECT AVG(HoursWorked) as AVGWORKED FROM Table WHERE FirstName='Joe' AND LastName='Doe' AND DateColumn between DateFromParts(year,month,day) and Convert(Date,'MM/dd/yyyy')
In the example i showed both approaches (datefromparts for the initial date, and convert(date) for the ending date).

How to get all rows from a table inserted in a particular date.

I am trying to write a query that gets all the rows of a table for a particular date.
SELECT * FROM MY_TABLE WHERE COLUMN_CONTAINING_DATE='2013-05-07'
However that does not work, because in the table the COLUMN_CONTAINING_DATE contains data like '2013-05-07 00:00:01' etc. So, this would work
SELECT * FROM MY_TABLE WHERE COLUMN_CONTAINING_DATE>='2013-05-07' AND COLUMN_CONTAINING_DATE<'2013-05-08'
However, I dont want to go for option 2 because that feels like a hacky way. I would rather put a query that says get me all the rows for a give date and somehow not bother about the minutes and hours in the COLUMN_CONTAINING_DATE.
I am trying to have this query run on both H2 and DB2.
Any suggestions?
You can do:
select *
from MY_Table
where trunc(COLUMN_CONTAINING_DATE) = '2013-05-07';
However, the version that you describe as a "hack" is actually better. By wrapping a function around the data, many SQL optimizers will not use indexes. With just direct comparisons, an index would definitely be used.
Use something like this
SELECT * FROM MY_TABLE WHERE COLUMN_CONTAINING_DATE=DATE('2013-05-07')
You can ease this if you use the Temporal data management capability from DB2 10.1.
For more information:
http://www.ibm.com/developerworks/data/library/techarticle/dm-1204db2temporaldata/
If your concerns are related to the different data types (timestamp in the column, and a string containing a date), you can do this:
SELECT * FROM MY_TABLE
WHERE
COLUMN_CONTAINING_DATE >= '2013-05-07 00:00:00'
and COLUMN_CONTAINING_DATE < '2013-05-08 00:00:00'
and I'd pay attention to the formatting of the where clause, because this will improve readability a lot, if you have to look at your queries two months later. Just pick a style you prefer for ranges like "a <= x < b". Unfortunately SQL's between does not support this.
One could argue that the milliseconds are still missing, so perfectionists may append another ".0" in the timestamp ...

Is SQL DATEDIFF(year, ..., ...) an Expensive Computation?

I'm trying to optimize up some horrendously complicated SQL queries because it takes too long to finish.
In my queries, I have dynamically created SQL statements with lots of the same functions, so I created a temporary table where each function is only called once instead of many, many times - this cut my execution time by 3/4.
So my question is, can I expect to see much of a difference if say, 1,000 datediff computations are narrowed to 100?
EDIT:
The query looks like this :
SELECT DISTINCT M.MID, M.RE FROM #TEMP INNER JOIN M ON #TEMP.MID=M.MID
WHERE ( #TEMP.Property1=1 ) AND
DATEDIFF( year, M.DOB, #date2 ) >= 15 AND DATEDIFF( year, M.DOB, #date2 ) <= 17
where these are being generated dynamically as strings (put together in bits and pieces) and then executed so that various parameters can be changed along each iteration - mainly the last lines, containing all sorts of DATEDIFF queries.
There are about 420 queries like this where these datediffs are being calculated like so. I know that I can pull them all into a temp table easily (1,000 datediffs becomes 50) - but is it worth it, will it make any difference in seconds? I'm hoping for an improvement better than in the tenths of seconds.
It depends on exactly what you are doing to be honest as to the extent of the performance hit.
For example, if you are using DATEDIFF (or indeed any other function) within a WHERE clause, then this will be a cause of poorer performance as it will prevent an index being used on that column.
e.g. basic example, finding all records in 2009
WHERE DATEDIFF(yyyy, DateColumn, '2009-01-01') = 0
would not make good use of an index on DateColumn. Whereas a better solution, providing optimal index usage would be:
WHERE DateColumn >= '2009-01-01' AND DateColumn < '2010-01-01'
I recently blogged about the difference this makes (with performance stats/execution plan comparisons), if you're interested.
That would be costlier than say returning DATEDIFF as a column in the resultset.
I would start by identifying the individual queries that are taking the most time. Check the execution plans to see where the problem lies and tune from there.
Edit:
Based on the example query you've given, here's an approach you could try out to remove the use of DATEDIFF within the WHERE clause. Basic example to find everyone who was 10 years old on a given date - I think the maths is right, but you get the idea anyway! Gave it a quick test, and seems fine. Should be easy enough to adapt to your scenario. If you want to find people between (e.g.) 15 and 17 years old on a given date, then that's also possible with this approach.
-- Assuming #Date2 is set to the date at which you want to calculate someone's age
DECLARE #AgeAtDate INTEGER
SET #AgeAtDate = 10
DECLARE #BornFrom DATETIME
DECLARE #BornUntil DATETIME
SELECT #BornFrom = DATEADD(yyyy, -(#AgeAtDate + 1), #Date2)
SELECT #BornUntil = DATEADD(yyyy, -#AgeAtDate , #Date2)
SELECT DOB
FROM YourTable
WHERE DOB > #BornFrom AND DOB <= #BornUntil
An important note to add, is for age caculates from DOB, this approach is more accurate. Your current implementation only takes the year of birth into account, not the actual day (e.g. someone born on 1st Dec 2009 would show as being 1 year old on 1st Jan 2010 when they are not 1 until 1st Dec 2010).
Hope this helps.
DATEDIFF is quite efficient compared to other methods of handling of datetime values, like strings. (see this SO answer).
In this case, it sounds like you going over and over the same data, which is likely more expensive than using a temp table. For example, statistics will be generated.
One thing you might be able do to improve performance might be to put an index on the temp table on MID.
Check your execution plan to see if it helps (may depend on the number of rows in the temp table).

How Does Dateadd Impact the Performance of a SQL Query?

Say for instance I'm joining on a number table to perform some operation between two dates in a subquery, like so:
select n
,(select avg(col1)
from table1
where timestamp between dateadd(minute, 15*n, #ArbitraryDate)
and dateadd(minute, 15*(n+1), #ArbitraryDate))
from numbers
where n < 1200
Would the query perform better if I, say, constructed the date from concatenating varchars than using the dateadd function?
Keeping data in the datetime format using DATEADD is most likely to be quicker
Check this question: Most efficient way in SQL Server to get date from date+time?
The accepted answer (not me!) demonstrates DATEADD over string conversions. I've seen another too many years ago that showed the same
Be careful with between and dates, take a look at How Does Between Work With Dates In SQL Server?
I once optmized a query to run from over 24 hours to 36 seconds. Just don't use date functions or conversions on the column , see here: Only In A Database Can You Get 1000% + Improvement By Changing A Few Lines Of Code
to see what query performs better, execute both queries and look at execution plans, you can also use statistics io and statistics time to get how many reads and the time it took to execute the queries
I would NOT go with concatenating varchars.
DateAdd will def be better performace than string contatenation, and casting to DATETIME.
As always, you best bet would be to profile the 2 options, and determine the best result, as no DB is specified.
most likely there will be no differenfce one way or another.
I would run this:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
followed by both variants of your query, so that you see and compare real execution costs.
As long as your predicate calculations do not include references to the columns of the table you're querying, your approach shouldn't matter either way (go for clarity).
If you were to include something from Table1 in the calculation, though, I'd watch out for table scans or covering index scans as it may no longer be sargable.
In any case, check (or post!) the execution plan to confirm.
Why would you ever use a correlated subquery to begin with? That's going to slow you up far more than dateadd. They are like cursors, they work row by row.
Will something like this work?
select n.n , avgcol1
from numbers n
left outer join
(
select avg(col1) as avgcol1, n
from table1
where timestamp between dateadd(minute, 15*n, #ArbitraryDate)
and dateadd(minute, 15*(n+1), #ArbitraryDate)
Group by n
) t
on n.n = t.n
where n < 1200