How to get all rows from a table inserted on a particular date - sql

I am trying to write a query that gets all the rows of a table for a particular date.
SELECT * FROM MY_TABLE WHERE COLUMN_CONTAINING_DATE='2013-05-07'
However, that does not work, because COLUMN_CONTAINING_DATE in the table contains values like '2013-05-07 00:00:01'. So, this would work:
SELECT * FROM MY_TABLE WHERE COLUMN_CONTAINING_DATE>='2013-05-07' AND COLUMN_CONTAINING_DATE<'2013-05-08'
However, I don't want to go with the second option because it feels hacky. I would rather write a query that says "get me all the rows for a given date" and somehow not bother about the hours and minutes in COLUMN_CONTAINING_DATE.
I am trying to have this query run on both H2 and DB2.
Any suggestions?

You can do:
select *
from MY_Table
where trunc(COLUMN_CONTAINING_DATE) = '2013-05-07';
However, the version that you describe as a "hack" is actually better. When a function is wrapped around the column, many SQL optimizers will not use an index on it. With direct comparisons on the bare column, an index can be used.
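To see the index point in practice, here is a minimal sketch that uses SQLite (through Python's standard sqlite3 module) as a stand-in for H2 and DB2; the table, the index name idx_date and the rows are hypothetical, and SQLite's date() plays the role of trunc(). EXPLAIN QUERY PLAN should report an index SEARCH for the plain range but only a SCAN for the function-wrapped column. Other engines word their plans differently, so treat this as an illustration, not a benchmark.

```python
import sqlite3


def compare_plans():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (id INTEGER PRIMARY KEY, column_containing_date TEXT)"
    )
    conn.execute("CREATE INDEX idx_date ON my_table (column_containing_date)")
    conn.executemany(
        "INSERT INTO my_table VALUES (?, ?)",
        [
            (1, "2013-05-06 23:59:59"),
            (2, "2013-05-07 00:00:01"),
            (3, "2013-05-07 18:30:00"),
            (4, "2013-05-08 00:00:00"),
        ],
    )

    # Direct comparisons on the bare column: the index can be used for a seek.
    range_sql = (
        "SELECT id FROM my_table "
        "WHERE column_containing_date >= '2013-05-07' "
        "AND column_containing_date < '2013-05-08'"
    )
    # A function wrapped around the column: no index seek is possible.
    func_sql = (
        "SELECT id FROM my_table "
        "WHERE date(column_containing_date) = '2013-05-07'"
    )

    def plan(sql):
        # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
        rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
        return " | ".join(row[-1] for row in rows)

    def ids(sql):
        return sorted(row[0] for row in conn.execute(sql))

    return ids(range_sql), ids(func_sql), plan(range_sql), plan(func_sql)


range_ids, func_ids, range_plan, func_plan = compare_plans()
print(range_ids, range_plan)
print(func_ids, func_plan)
```

Both queries return the same rows; only the plan differs.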

Use something like this:
SELECT * FROM MY_TABLE WHERE DATE(COLUMN_CONTAINING_DATE) = '2013-05-07'

You may be able to simplify this by using the temporal data management capability in DB2 10.1.
For more information:
http://www.ibm.com/developerworks/data/library/techarticle/dm-1204db2temporaldata/

If your concerns are related to the different data types (timestamp in the column, and a string containing a date), you can do this:
SELECT * FROM MY_TABLE
WHERE
COLUMN_CONTAINING_DATE >= '2013-05-07 00:00:00'
and COLUMN_CONTAINING_DATE < '2013-05-08 00:00:00'
I'd also pay attention to the formatting of the WHERE clause, because it improves readability a lot when you have to look at your queries two months later. Just pick a style you prefer for ranges like "a <= x < b". Unfortunately, SQL's BETWEEN cannot express this, because it is inclusive at both ends.
One could argue that the milliseconds are still missing, so perfectionists may append another ".0" to the timestamp...
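The difference between BETWEEN and a half-open range can be checked with a small SQLite sketch (hypothetical table and rows, run through Python's sqlite3 module). Because the timestamps here are stored as ISO-formatted text, string comparison orders them chronologically; a real TIMESTAMP column in H2 or DB2 compares natively.

```python
import sqlite3


def between_vs_half_open():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (id INTEGER, column_containing_date TEXT)")
    conn.executemany(
        "INSERT INTO my_table VALUES (?, ?)",
        [
            (1, "2013-05-07 00:00:00"),
            (2, "2013-05-07 23:59:59.500"),  # fractional seconds
            (3, "2013-05-08 00:00:00"),      # midnight of the NEXT day
        ],
    )

    def ids(where):
        sql = "SELECT id FROM my_table WHERE " + where + " ORDER BY id"
        return [row[0] for row in conn.execute(sql)]

    # BETWEEN is inclusive at both ends, so next-day midnight leaks in.
    between_midnight = ids(
        "column_containing_date BETWEEN '2013-05-07 00:00:00' AND '2013-05-08 00:00:00'"
    )
    # Ending at 23:59:59 instead silently drops the fractional-second row.
    between_2359 = ids(
        "column_containing_date BETWEEN '2013-05-07 00:00:00' AND '2013-05-07 23:59:59'"
    )
    # Half-open range a <= x < b: exactly one calendar day at any precision.
    half_open = ids(
        "column_containing_date >= '2013-05-07 00:00:00' "
        "AND column_containing_date < '2013-05-08 00:00:00'"
    )
    return between_midnight, between_2359, half_open


print(between_vs_half_open())
```

The BETWEEN variants either pick up the next day's midnight row or lose the fractional-second row; the half-open form has neither problem.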

Related

Why does using CONVERT(DATETIME, [date], [format]) in WHERE clause take so long?

I'm running the following code on a dataset of 100M rows to test some things out before I eventually join the entire range (not just the top 10) on another table to make it even smaller.
SELECT TOP 10 *
FROM Table
WHERE CONVERT(datetime, DATE, 112) BETWEEN '2020-07-04 00:00:00' AND '2020-07-04 23:59:59'
The table isn't mine but a client's, so unfortunately I'm not responsible for the data types of the columns. The DATE column, along with the rest of the data, is in varchar. As for the dates in the BETWEEN clause, I just put in a relatively small range for testing.
I have heard that CONVERT shouldn't be in the WHERE clause, but I need to convert it to dates in order to filter. What is the proper way of going about this?
Going to summarise my comments here, as they are "second class citizens" and thus could be removed.
Firstly, the reason your query is slow is the CONVERT on the column DATE in your WHERE. Applying functions to a column in your WHERE will almost always make your query non-SARGable (there are some exceptions, but that doesn't make them a good idea). As a result, the entire table must be scanned to find the rows that satisfy your WHERE; an index can't be used to help.
The real problem, therefore, is that you are storing a date (and time) value in your table as a non-date (and time) data type; presumably a (n)varchar. This is, in truth, a major design flaw and needs to be fixed. String values aren't validated as dates, so someone could easily insert the "date" '20210229' or even '20211332'. Fixing the design not only stops this, but also makes your data smaller (a date is 3 bytes in size, a varchar(8) would be 10 bytes), and you could pass strongly typed date and time values to your query and it would be SARGable.
"Fortunately" it appears your data is in the style code 112, which is yyyyMMdd; this at least means that the ordering of the dates is the same as if it were a strongly typed date (and time) data type. This means that the below query will work and return the results you want:
SELECT TOP 10 * --Ideally don't use * and list your columns properly
FROM dbo.[Table]
WHERE [DATE] >= '20200704' AND [DATE] < '20200705'
ORDER BY {Some Column};
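A sketch of why the string range works for style-112 data, and of what the string type fails to catch, using SQLite via Python's sqlite3 as a stand-in for SQL Server (client_table, date_col and the rows are hypothetical). Invalid values are detected afterwards with datetime.strptime, since the column itself accepts anything.

```python
import sqlite3
from datetime import datetime


def query_varchar_dates():
    conn = sqlite3.connect(":memory:")
    # The "date" lives in a VARCHAR column in style 112 (yyyyMMdd), as in the question.
    conn.execute("CREATE TABLE client_table (id INTEGER, date_col VARCHAR(8))")
    conn.execute("CREATE INDEX ix_date ON client_table (date_col)")
    conn.executemany(
        "INSERT INTO client_table VALUES (?, ?)",
        [(1, "20200703"), (2, "20200704"), (3, "20200705"),
         (4, "20210229"), (5, "20211332")],
    )
    # Fixed-width yyyyMMdd strings sort in date order, so a plain string range
    # on the bare column selects the right day and stays index-friendly.
    hits = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM client_table "
            "WHERE date_col >= '20200704' AND date_col < '20200705' ORDER BY id"
        )
    ]
    # The string type happily accepted impossible "dates"; find them after the fact.
    invalid = []
    for _, value in conn.execute("SELECT id, date_col FROM client_table ORDER BY id"):
        try:
            datetime.strptime(value, "%Y%m%d")
        except ValueError:
            invalid.append(value)
    return hits, invalid


print(query_varchar_dates())
```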
You can write it like this instead:
SELECT TOP 10 *
FROM Table
WHERE cast(DATE as date) BETWEEN '2020-07-04' AND '2020-07-04' and cast(DATE as time) BETWEEN '00:00:00' AND '23:59:59'
There's no need to include the time portion if you want to search a full day.

Applying knowledge of SQL for everyday workplace activities

My question is how to properly write a SQL query for the assignment below.
There is a table in HMO database which stores doctor's working hours. Table has following fields "FirstName", "LastName", "Date", "HoursWorked". Write a SQL statement which retrieves average working hours for period January-March for a doctor with name Joe Doe.
So far I have:
SELECT HoursWorked
FROM Table
WHERE DATE = (January - March) AND
SELECT AVG(HoursWorked) FROM Table WHERE FirstName="Joe",LastName="Doe"*
A few pointers as this sounds like a homework question (which we don't answer for you here, but we can try to give you some guidance).
You want to put all the things you want to return from your select first and you want to have all your search conditions at the end.
So the general format would be:
SELECT Column1,
Column2,
Column3
FROM YourTable
WHERE Column4 = Restriction1
AND Column5 = Restriction2
The next thing you need to think about is how the dates are formatted in your database table. Hopefully they're kept in a column of type datetime or date (the options depend on the database engine you're using, e.g. Microsoft SQL Server, Oracle or MySQL). In reality, some older databases store dates in all sorts of formats, which makes this much harder, but since I'm assuming it's a homework-type question, let's assume it's a datetime format.
You specify restrictions by comparing columns to a value, so if you wanted all rows where the date was after midnight on the 2nd of March 2012, you would have the WHERE clause :
WHERE MyDateColumn >= '2012-03-02 00:00:00'
Note that to avoid confusion, we usually try to format dates as "Year-Month-Day Hour:Minute:Second". This is because in different countries, dates are often written in different formats and this is considered a Universal format which is understood (by computers at least) everywhere.
So you would want to combine a couple of these comparisons in your WHERE, one for dates AFTER a certain date in time AND one for dates before another point in time.
If you give this a go and see where you get to, update your question with your progress and someone will be able to help get it finished if you have problems.
If you don't have access to an actual database and need to experiment with syntax, try this site : http://sqlfiddle.com/
You already have the answer written:
SELECT AVG(HoursWorked) FROM Table WHERE FirstName="Joe",LastName="Doe"*
You only need to fix the query:
SELECT AVG(HoursWorked) as AVGWORKED FROM Table WHERE FirstName='Joe' AND LastName='Doe'
That query will give you the average hours worked by Joe Doe. Because you only want a certain period, add another AND condition. If you are using SQL Server, you can use the built-in function DATEFROMPARTS(year, month, day) to create a new date, or you can convert a string to a date with CONVERT(Date, 'MM/dd/yyyy') (other database engines have their own conversion functions).
Example
SELECT AVG(HoursWorked) as AVGWORKED FROM Table WHERE FirstName='Joe' AND LastName='Doe' AND DateColumn between DateFromParts(year,month,day) and Convert(Date,'MM/dd/yyyy')
In the example I showed both approaches (DATEFROMPARTS for the start date and CONVERT(date) for the end date).
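A runnable sketch of the finished query, using SQLite via Python's sqlite3 (the table name doctor_hours, the year 2013 and the rows are assumptions, since the assignment gives neither). It swaps BETWEEN for a half-open range (on or after 1 January and before 1 April), which selects the same period for pure dates and stays correct if the column also stores a time of day.

```python
import sqlite3


def build_sample_db():
    conn = sqlite3.connect(":memory:")
    # Column names follow the assignment; "Date" is quoted because DATE is a
    # keyword in standard SQL.
    conn.execute(
        'CREATE TABLE doctor_hours '
        '(FirstName TEXT, LastName TEXT, "Date" TEXT, HoursWorked REAL)'
    )
    conn.executemany(
        "INSERT INTO doctor_hours VALUES (?, ?, ?, ?)",
        [
            ("Joe", "Doe", "2013-01-15", 8.0),
            ("Joe", "Doe", "2013-02-20", 6.0),
            ("Joe", "Doe", "2013-03-31", 10.0),
            ("Joe", "Doe", "2013-04-01", 12.0),  # April: outside the period
            ("Jane", "Roe", "2013-02-01", 4.0),  # a different doctor
        ],
    )
    return conn


def average_hours(conn, first_name, last_name, start, end_exclusive):
    # Half-open range: "Date" >= start AND "Date" < end_exclusive.
    row = conn.execute(
        'SELECT AVG(HoursWorked) AS AvgWorked FROM doctor_hours '
        'WHERE FirstName = ? AND LastName = ? AND "Date" >= ? AND "Date" < ?',
        (first_name, last_name, start, end_exclusive),
    ).fetchone()
    return row[0]  # None when no rows match


conn = build_sample_db()
print(average_hours(conn, "Joe", "Doe", "2013-01-01", "2013-04-01"))
```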

How to combine the LIKE function with a DATE_PART function in PostgreSQL?

Using Postgres, but if someone knows how to do this in standard SQL that would be a great start. I am joining to a table via a character varying column. This column contains values such as:
PC11941.2004
PC14151.2004
PC21213.2003
SPC21434.2003
PC17715.04V1
PC18733.2002
0MRACCT_ALL.GLFUNCT
A lot of the numbers after the periods correspond to years. I want to join the table via the current year. So, for example, I could JOIN on the condition LIKE '%2015'.
But I want to create this view and never return to it, so I would need to join it against something like get_fy_part('YEAR', clock_timestamp()).
Not sure how I go about writing that. I haven't had success, yet.
You can get the current year with date_part('year', CURRENT_DATE)
Something like this should work:
SELECT * FROM mytable WHERE mycolumn LIKE ('%' || date_part('year', CURRENT_DATE))
The || operator concatenates the percent-sign with the year.
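The same idea can be tried in SQLite via Python's sqlite3, where strftime('%Y', 'now') takes the place of date_part('year', CURRENT_DATE) and returns text rather than double precision (the table and sample values are hypothetical, loosely based on the question). The sample rows are generated from the current year so the check works whenever it runs.

```python
import sqlite3


def rows_for_current_year():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (mycolumn TEXT)")
    # Ask SQLite for the current year so the sample data and the query agree.
    year = conn.execute("SELECT strftime('%Y', 'now')").fetchone()[0]
    last_year = str(int(year) - 1)
    samples = [
        f"PC11941.{year}",
        f"PC14151.{last_year}",
        "PC17715.04V1",
        "0MRACCT_ALL.GLFUNCT",
        f"SPC21434.{year}",
    ]
    conn.executemany("INSERT INTO mytable VALUES (?)", [(s,) for s in samples])
    # '%' || <current year> builds the LIKE pattern at run time,
    # so the query never has to be edited when the year changes.
    rows = conn.execute(
        "SELECT mycolumn FROM mytable "
        "WHERE mycolumn LIKE ('%' || strftime('%Y', 'now')) ORDER BY mycolumn"
    ).fetchall()
    return year, [row[0] for row in rows]


print(rows_for_current_year())
```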
I hope that helps!
Use the function RIGHT().
SELECT originalColumn, RIGHT(originalColumn,4)
FROM table;
This will get you the years you are interested in.
If you want everything after the dot, then something like:
SELECT originalColumn, RIGHT(originalColumn,length(originalColumn)-position('.' in originalColumn))
FROM table
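SQLite has no RIGHT(), but substr() with a negative start and instr() give the same two results, which makes it easy to see how they behave on sample values from the question (a sketch via Python's sqlite3; the table name codes is hypothetical). Note that the last four characters are not always a year.

```python
import sqlite3


def split_codes():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE codes (originalColumn TEXT)")
    conn.executemany(
        "INSERT INTO codes VALUES (?)",
        [("PC11941.2004",), ("PC17715.04V1",), ("0MRACCT_ALL.GLFUNCT",)],
    )
    # substr(x, -4) behaves like RIGHT(x, 4);
    # substr(x, instr(x, '.') + 1) returns everything after the first dot.
    return conn.execute(
        "SELECT originalColumn, "
        "substr(originalColumn, -4), "
        "substr(originalColumn, instr(originalColumn, '.') + 1) "
        "FROM codes ORDER BY rowid"
    ).fetchall()


for row in split_codes():
    print(row)
```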
It depends on the exact rules, and on the CHECK constraints actually implemented for the column.
If there is always a single dot in your column col and all your years have 4 digits:
Basic solution
SELECT * FROM tbl
WHERE col LIKE to_char(now(), '"%."YYYY');
Why?
It's most efficient to compare to the same data type. Since the column is a character type (varchar), rather use to_char() (returns text, which is effectively the same as varchar) than EXTRACT or date_part() (return double precision).
More importantly, this expression is sargable. That's generally cheapest and allows (optional) index support. In your case, a trigram index would work:
PostgreSQL LIKE query performance variations
Optimize
If you want to be as fast (read performance) and accurate as possible, and your table has more than a trivial number of rows, go with a specialized partial expression index:
CREATE INDEX tbl_year_idx ON tbl (cast(right(col, 4) AS int) DESC)
WHERE col ~ '\.\d{4}$'; -- ends with a dot and 4 digits
Matching query:
SELECT * FROM tbl
WHERE col ~ '\.\d{4}$' -- repeat index condition
AND right(col, 4)::int = EXTRACT(year FROM now())::int;
Test performance with EXPLAIN ANALYZE.
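SQLite also supports partial indexes on expressions, so the shape of this suggestion can be tried locally (a sketch via Python's sqlite3, not PostgreSQL). Assumptions: SQLite has no ~ regex operator by default, so GLOB '*.[0-9][0-9][0-9][0-9]' stands in for '\.\d{4}$', and CAST(substr(col, -4) AS INTEGER) stands in for cast(right(col, 4) AS int); the rows are hypothetical. As in the answer, the query repeats the index condition verbatim and uses the indexed expression exactly as written, which SQLite requires before it will consider the index.

```python
import sqlite3

# Stand-in for col ~ '\.\d{4}$': ends with a dot and four digits.
COND = "col GLOB '*.[0-9][0-9][0-9][0-9]'"
# Stand-in for cast(right(col, 4) AS int).
YEAR_EXPR = "CAST(substr(col, -4) AS INTEGER)"


def partial_expression_index_demo(year):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (tbl_id INTEGER PRIMARY KEY, col TEXT)")
    conn.executemany(
        "INSERT INTO tbl (col) VALUES (?)",
        [("PC11941.2004",), ("PC21213.2003",), ("SPC21434.2003",),
         ("PC17715.04V1",), ("0MRACCT_ALL.GLFUNCT",)],
    )
    # Partial index on an expression: only rows that end in ".dddd" are indexed.
    conn.execute(f"CREATE INDEX tbl_year_idx ON tbl ({YEAR_EXPR}) WHERE {COND}")
    # Repeat the index condition and the indexed expression verbatim.
    sql = f"SELECT col FROM tbl WHERE {COND} AND {YEAR_EXPR} = ? ORDER BY col"
    rows = [row[0] for row in conn.execute(sql, (year,))]
    plan = " | ".join(
        row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (year,))
    )
    return rows, plan


print(partial_expression_index_demo(2003))
```

Inspecting the plan here plays the role that EXPLAIN ANALYZE plays in PostgreSQL.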
You could even go one step further and tailor the index for the current year:
CREATE INDEX tbl_year2015_idx ON tbl (tbl_id) -- any (useful?) column
WHERE col LIKE '%.2015';
Works with the first "basic" query.
You would have to (re-)create the index for each year. A simple solution would be to create indexes for a couple of years ahead and append another one each year automatically ...
This is also the point where you consider the alternative: store the year as redundant integer column in your table and simplify the rest.
That's what I would do.

How can I optimise this Query?

How can I optimize the following query, which returns either all entries in the table or only the entries that are valid up to the current date?
BTW: the query targets an Oracle linked server from MS SQL Server 2005, as an inline function. I do not want this to be a table-valued function.
ALTER function [dbo].[ftsls031nnnHades](@withExpiredEntries bit = 0)
returns table as return
select *
from openQuery(Hades ,"select '010' comno,
trim(t$cuno) t$cuno,
trim(t$cpgs) t$cpgs,
t$dile,
t$qanp,
to_char(t$stdt,'dd Mon yy') t$stdt,
to_char(t$tdat,'dd Mon yy') t$tdat,
to_char(t$disc,'999.99') t$disc,
t$damt,
t$cdis,
t$gnpr,
t$refcntd,
t$refcntu
from baan.ttdsls031010
where (to_char(t$Tdat,'yyyy-mm-dd') >= To_char(current_date,'yyyy-mm-dd'))
and (to_char(t$stdt,'yyyy-mm-dd') <= To_char(current_date,'yyyy-mm-dd'))
union all
select '020' comno,
trim(t$cuno) t$cuno,
trim(t$cpgs) t$cpgs,
t$dile,t$qanp,
to_char(t$stdt,'dd Mon yy') t$stdt,
to_char(t$tdat,'dd Mon yy') t$tdat,
to_char(t$disc,'999.99') t$disc,
t$damt,
t$cdis,
t$gnpr,
t$refcntd,
t$refcntu
from baan.ttdsls031020
where (to_char(t$tdAt,'yyyy-mm-dd') >= To_char(current_date,'yyyy-mm-dd'))
and (to_char(t$stdt,'yyyy-mm-dd') <= To_char(current_date,'yyyy-mm-dd')) ")
P.S.: The column naming conventions may be alien to those who are not familiar with BaaN. Please excuse me for bringing BaaN conventions into StackOverflow.
Never apply functions to your date columns (t$Tdat and t$stdt are of this type, aren't they?) unless you have a corresponding function-based index. Doing so prevents the use of indexes on t$stdt and t$Tdat and can degrade performance dramatically.
Instead, I would rewrite the where clause in the following way:
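where t$Tdat >= trunc(current_date) and t$stdt < trunc(current_date) + 1
This assumes current_date is of date type (in Oracle it is, and it carries the time of day, which is why it is truncated here; trunc() is applied to current_date rather than to the columns, so the indexes stay usable and the day-level meaning of the original to_char() comparison is kept). If it's not, then you can use, for example, to_date(current_date, 'DD-MM-YYYY') instead of it.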
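The day-level semantics of the original to_char() predicates can be checked in plain Python, with datetime standing in for Oracle DATE (the helpers day_string and trunc are hypothetical mirrors of to_char(d, 'yyyy-mm-dd') and trunc()). The sketch counts, over a range of sample timestamps, where a predicate that leaves the column untouched disagrees with the string comparison.

```python
from datetime import datetime, timedelta


def day_string(d):
    # Mirrors Oracle's to_char(d, 'yyyy-mm-dd').
    return d.strftime("%Y-%m-%d")


def trunc(d):
    # Mirrors Oracle's trunc(d) on a DATE: midnight of the same day.
    return d.replace(hour=0, minute=0, second=0, microsecond=0)


def check_rewrite(now, samples):
    today = trunc(now)
    tomorrow = today + timedelta(days=1)
    mismatches_tdat = mismatches_stdt = naive_tdat = 0
    for x in samples:
        by_string_tdat = day_string(x) >= day_string(now)
        by_string_stdt = day_string(x) <= day_string(now)
        # t$Tdat: to_char(x) >= to_char(now)  versus  x >= trunc(now)
        if by_string_tdat != (x >= today):
            mismatches_tdat += 1
        # t$stdt: to_char(x) <= to_char(now)  versus  x < trunc(now) + 1
        if by_string_stdt != (x < tomorrow):
            mismatches_stdt += 1
        # Untruncated comparison x >= now disagrees for earlier times today.
        if by_string_tdat != (x >= now):
            naive_tdat += 1
    return mismatches_tdat, mismatches_stdt, naive_tdat


now = datetime(2010, 3, 15, 14, 30)
samples = [now + timedelta(hours=h) for h in range(-72, 73)]
print(check_rewrite(now, samples))
```

Comparing the column to trunc(current_date), and the start date to the day after, reproduces the to_char() results, while comparing against the untruncated current time disagrees for rows earlier on the same day.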
In case be here now's tip (which is a good one) doesn't work:
you'll need to collect some data to know where time is being spent. Please read this OTN-thread to see how to do this for Oracle: http://forums.oracle.com/forums/thread.jspa?messageID=1812597. For SQL Server, the same principles apply: use their tools to find out where this query is spending time on.
Some general information you can share is:
How many rows are in those two tables
How many rows are returned by that query
Which indexes are present on those two tables
How long does the query currently take
What response time is acceptable, i.e. when are we done tuning
Regards,
Rob.
Not sure how much this will improve performance, but the first thing I'd do is replace the date to string conversion with just date functions. That is, use trunc() instead of to_char().
You can optimize the Baan query in the following ways:
In the WHERE condition, use indexes and combine fields if possible.
In the WHERE condition, use "between"/"inrange" when both upper and lower limits are specified.
Use "refers to" if a reference is available in the data dictionary.
Use as few overlapping "or" conditions as possible.
Select only the fields of the table that are actually required.
Use "order by" to get the records in the correct sort order.
If possible, don't use the NOT INRANGE, NOT BETWEEN or NOT IN operators, because they can cause a full table scan.
Use commit.transaction() to prevent a line from being printed twice.

How Does Dateadd Impact the Performance of a SQL Query?

Say, for instance, I'm joining on a numbers table to perform some operation between two dates in a subquery, like so:
select n
,(select avg(col1)
from table1
where timestamp between dateadd(minute, 15*n, @ArbitraryDate)
and dateadd(minute, 15*(n+1), @ArbitraryDate))
from numbers
where n < 1200
Would the query perform better if I, say, constructed the date from concatenating varchars than using the dateadd function?
Keeping data in the datetime format using DATEADD is most likely to be quicker.
Check this question: Most efficient way in SQL Server to get date from date+time?
The accepted answer (not me!) demonstrates DATEADD over string conversions. I saw another one many years ago that showed the same.
Be careful with between and dates, take a look at How Does Between Work With Dates In SQL Server?
I once optimized a query from running over 24 hours to 36 seconds. Just don't use date functions or conversions on the column; see here: Only In A Database Can You Get 1000% + Improvement By Changing A Few Lines Of Code
To see which query performs better, execute both queries and look at the execution plans. You can also use SET STATISTICS IO and SET STATISTICS TIME to see how many reads each query performed and how long it took to execute.
I would NOT go with concatenating varchars.
DATEADD will definitely perform better than string concatenation followed by a cast to DATETIME.
As always, your best bet would be to profile the two options and determine the best result, as no DB is specified.
Most likely there will be no difference one way or the other.
I would run this:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
followed by both variants of your query, so that you see and compare real execution costs.
As long as your predicate calculations do not include references to the columns of the table you're querying, your approach shouldn't matter either way (go for clarity).
If you were to include something from Table1 in the calculation, though, I'd watch out for table scans or covering index scans as it may no longer be sargable.
In any case, check (or post!) the execution plan to confirm.
Why would you use a correlated subquery to begin with? That's going to slow you down far more than DATEADD. They are like cursors: they work row by row.
Will something like this work?
select n.n, avg(t.col1) as avgcol1
from numbers n
left outer join table1 t
on t.timestamp between dateadd(minute, 15*n.n, @ArbitraryDate)
and dateadd(minute, 15*(n.n+1), @ArbitraryDate)
where n.n < 1200
group by n.n
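A small SQLite sketch (via Python's sqlite3) comparing the correlated subquery from the question with a LEFT JOIN plus GROUP BY over the numbers table. SQLite has no DATEADD, so datetime(:d, '+N minutes') stands in for it and :d stands in for @ArbitraryDate; the sample rows are hypothetical. Both forms should return the same per-bucket averages, with NULL (None) for empty buckets.

```python
import sqlite3

ARBITRARY_DATE = "2020-01-01 00:00:00"  # stands in for @ArbitraryDate


def bucket_averages():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE numbers (n INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO numbers VALUES (?)", [(i,) for i in range(8)])
    conn.execute("CREATE TABLE table1 (ts TEXT, col1 REAL)")
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?)",
        [
            ("2020-01-01 00:05:00", 10.0),
            ("2020-01-01 00:10:00", 20.0),
            ("2020-01-01 00:20:00", 30.0),
            ("2020-01-01 01:01:00", 40.0),
        ],
    )
    # 15-minute bucket bounds; datetime() with a '+N minutes' modifier plays DATEADD's role.
    lo = "datetime(:d, '+' || (15 * n.n) || ' minutes')"
    hi = "datetime(:d, '+' || (15 * (n.n + 1)) || ' minutes')"
    correlated = f"""
        SELECT n.n,
               (SELECT avg(col1) FROM table1 WHERE ts BETWEEN {lo} AND {hi})
        FROM numbers n
        WHERE n.n < 8
        ORDER BY n.n"""
    joined = f"""
        SELECT n.n, avg(t.col1)
        FROM numbers n
        LEFT OUTER JOIN table1 t ON t.ts BETWEEN {lo} AND {hi}
        WHERE n.n < 8
        GROUP BY n.n
        ORDER BY n.n"""
    params = {"d": ARBITRARY_DATE}
    return (
        conn.execute(correlated, params).fetchall(),
        conn.execute(joined, params).fetchall(),
    )


correlated_rows, joined_rows = bucket_averages()
print(correlated_rows)
print(joined_rows)
```

Whether the join form is actually faster in SQL Server is what the execution plans and SET STATISTICS output mentioned above would show.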