SQL DateDifference in a where clause

I'm running the following query:
SELECT
*
FROM a
WHERE DATEDIFF(D, a.DateValue, DateTimeNow) < 3;
and it's not working.
I'm trying to get the data that's not older than 3 days, on SQL Server.
How can I do this? DATEDIFF also seems too slow...

DateDiff is extremely fast... Your problem is that you are running it on the table column value, so the query processor must run the function on every row in the table, even if there is an index on that column. This means it has to load the entire table from disk.
Instead, use the DateAdd function on today's date, and compare the table column to the result of that single calculation. Now DateAdd() runs only once, and the engine can use an index (if one exists) to load only the rows that match the predicate.
Where a.DateValue > DateAdd(day,-3,getdate())
Writing the predicate this way makes your query SARGable; a full sketch follows.
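Putting it together, a minimal sketch using the table and column from the question (the index name is an illustrative assumption):
-- Index on the date column so the engine can seek rather than scan
CREATE INDEX IX_a_DateValue ON a (DateValue);

-- SARGable: DATEADD runs once on the constant side, the column stays bare
SELECT *
FROM a
WHERE a.DateValue > DATEADD(day, -3, GETDATE());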

Microsoft's documentation at http://msdn.microsoft.com/en-us/library/aa258269%28v=sql.80%29.aspx suggests that instead of DateTimeNow you should use getdate(). Does it work any better that way?

Your query doesn't seem too bad. Another way to tackle it would be:
SELECT * FROM a WHERE a.DateValue > DATEADD(dd,-3,GETDATE())

Related

Query a time that is less than a specific one

I need to query a time that is less than 10 seconds.
I have this table:
CREATE TABLE Results (
    id_result SERIAL PRIMARY KEY,
    score REAL,
    temps TIME
);
Thank you
You would just use a where clause:
select r.*
from results r
where r.temps < '00:00:10'::time
If you are using SQL Server 2008 or a later version, take a look at the DATEPART() function. You can use DATEPART() to get the SECOND part of a DATETIME in SQL Server; this should also give you an idea of what to do if you want to work with milliseconds, minutes, etc. in the future. So, in your case, you could use any of the conditions below in the WHERE clause:
WHERE DATEPART(second, temps) < 10
OR
WHERE DATEPART(ss, temps) < 10
OR
WHERE DATEPART(s, temps) < 10
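Note that DATEPART(second, ...) looks only at the seconds component, so it would also match a value like 00:01:05. If the intent is that the whole time value is under ten seconds, comparing against a time literal is safer; a minimal sketch, assuming a SQL Server TIME column:
-- Matches only times strictly below ten seconds total
WHERE temps < CAST('00:00:10' AS time)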

What is the fastest way to perform a date query in Oracle SQL?

We have a 6B row table that is giving us challenges when retrieving data.
Our query returns values instantly when doing a...
SELECT * WHERE Event_Code = 102225120
That type of instant result is exactly what we need. We now want to filter to receive values for just a particular year - but the moment we add...
AND EXTRACT(YEAR FROM PERFORMED_DATE_TIME) = 2017
...the query takes over 10 minutes to begin returning any values.
Another SO post mentions that indexes don't necessarily help date queries when pulling many rows as opposed to an individual row. There are other approaches like using TRUNC, or BETWEEN, or specifying the datetime in YYYY-MM-DD format for doing comparisons.
Of note, we do not have the option to add indexes to the database as it is a vendor's database.
What is the way to add a date filtering query and enable Oracle to begin streaming the results back in the fastest way possible?
Another SO post mentions that indexes don't necessarily help date queries when pulling many rows as opposed to an individual row
That question is quite different from yours. Firstly, your statement above applies to any data type, not only dates. Also, the word "many" is relative to the number of records in the table: if the optimizer decides that the query will return a large fraction of all records in the table, it may decide that a full scan of the table is faster than using the index. In your situation, this translates to: how many records are in 2017, out of all records in the table? This calculation gives you the cardinality of your query, which in turn gives you an idea of whether an index will be faster or not.
Now, if you decide that an index will be faster based on the above, the next step is to know how to build the index. For the optimizer to use it, the index must match the condition you're using. You are not comparing dates in your query; you are only comparing the year part, so an index on the date column will not be used by this query. You need to create an index on the year expression, using the same expression as the query's condition; a sketch follows.
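For example, a hedged sketch of such a function-based index (the table name is an assumption, since the question does not give one):
CREATE INDEX ix_events_year
    ON events_table (EXTRACT(YEAR FROM PERFORMED_DATE_TIME));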
we do not have the option to add indexes to the database as it is a vendor's database.
If you cannot modify the database, there is no way to optimize your query. You need to talk to the vendor and get access to modify the database or ask them to add the index for you.
A function applied to the column can also cause slowness, given the number of records involved. A function-based index might help here, though you would have to try it.
Have you tried adding a year column to the table? If not, try adding one and populating it with the code below.
UPDATE table
SET year = EXTRACT(YEAR FROM PERFORMED_DATE_TIME);
This will take time though.
But after this, you can run the query below.
SELECT *
FROM table
WHERE Event_Code = 102225120 AND year = 2017;
Also, consider table partitioning for data this big (a sketch follows). For starters, see https://oracle-base.com/articles/8i/partitioned-tables-and-indexes
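As a rough illustration of what range partitioning by date looks like in Oracle (all names here are assumptions, not the vendor's schema):
CREATE TABLE events_partitioned (
    event_code          NUMBER,
    performed_date_time DATE
)
PARTITION BY RANGE (performed_date_time) (
    PARTITION p2016 VALUES LESS THAN (DATE '2017-01-01'),
    PARTITION p2017 VALUES LESS THAN (DATE '2018-01-01'),
    PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);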
Your question is a bit ambiguous IMHO:
but the moment we add...
AND EXTRACT(YEAR FROM PERFORMED_DATE_TIME) = 2017
...the query takes over 10 minutes to begin returning any values.
Do you mean that
SELECT * WHERE Event_Code = 102225120
is fast, but
SELECT * WHERE Event_Code = 102225120 AND EXTRACT(YEAR FROM PERFORMED_DATE_TIME) = 2017
is slow?
For starters, I'll agree with Mitch Wheat that you should try PERFORMED_DATE_TIME between Jan 1, 2017 and Dec 31, 2017 instead of Year(field) = 2017. Even if you had an index on the field, the latter would hardly be able to make use of it, while the first method would benefit enormously.
I'm also hoping you want to be more specific than just 'give me all of 2017', because returning over 1B rows is NEVER going to be fast.
Next, if you can't make changes to the database, would you be able to maintain a 'shadow' in another database? This would require that you create a table with all date values AND the PK of the original table in another database, query it to find the relevant PK values, and then JOIN those back to your original table to find whatever you need (a sketch follows). The biggest problem with this is that you need to keep the shadow in sync with the original table. If you know the original table only changes overnight, you could merge the changes in the morning and query all day. If the application is 'real-time(ish)', this probably won't work without some clever thinking... And yes, your initial load of 6B values will be rather heavy =)
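A rough sketch of that shadow approach, assuming Oracle and a database link back to the vendor system (every name here is illustrative):
-- Shadow table in a database you control, indexed on the date
CREATE TABLE shadow_events (
    pk_id               NUMBER PRIMARY KEY,
    performed_date_time DATE
);
CREATE INDEX ix_shadow_date ON shadow_events (performed_date_time);

-- Find the relevant keys locally, then join back to the vendor table
SELECT v.*
FROM   vendor_table@vendor_link v
JOIN   shadow_events s ON s.pk_id = v.pk_id
WHERE  s.performed_date_time >= DATE '2017-01-01'
AND    s.performed_date_time <  DATE '2018-01-01';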
This could be useful, because you avoid functions (a cause of context switching), and if you have an index on your date field, it can be used:
with dt as (
    -- boundary dates for the year of interest
    select
        to_date('01/01/2017', 'DD/MM/YYYY') as d1,
        to_date('31/12/2017', 'DD/MM/YYYY') as d2
    from dual
),
dates as (
    -- generate one row per day between d1 and d2
    select dt.d1 + rownum - 1 as d
    from dt
    connect by dt.d1 + rownum - 1 <= dt.d2
)
select *
from your_table, dates
where dates.d = PERFORMED_DATE_TIME
-- note: this matches only rows whose PERFORMED_DATE_TIME has no time-of-day part
Move the date literals to the right-hand side of the comparison:
AND PERFORMED_DATE_TIME >= date '2017-01-01'
AND PERFORMED_DATE_TIME < date '2018-01-01'
But without an (undisclosed) appropriate index on PERFORMED_DATE_TIME, the query is unlikely to be any faster.
One option for creating indexes in third-party databases is to script the index in, and then, before any vendor upgrade, run a script to remove any indexes you've added (a sketch follows). If the index is important, ask the vendor to add it to their database design.
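A minimal sketch of that pattern (index and table names are illustrative):
-- add_indexes.sql: run after each vendor upgrade
CREATE INDEX ix_perf_dt ON vendor_table (PERFORMED_DATE_TIME);

-- drop_indexes.sql: run before each vendor upgrade
DROP INDEX ix_perf_dt;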

How to improve a SQL timestamp range query?

I have a table records with four fields:
id - the row id
value - the row value
source - the source of the value
timestamp - the time when the row was inserted (should this be a unix timestamp or a datetime?)
And I want to perform a query like this:
SELECT timestamp, value FROM records WHERE timestamp >= a AND timestamp <= b
However, in a table with millions of records this query is super inefficient!
I am using Azure SQL Server as the DBMS. Can this be optimised?
If so, can you provide a step-by-step guide to do it (please don't skip "small" steps)? Be it creating indexes, redesigning the query statement, redesigning the table (partitioning?)...
Thanks!
After creating an index on the field you want to search, you can use the BETWEEN operator so the filter is a single range predicate, which is the most efficient form for SQL.
SELECT XXX FROM ABC WHERE DateField BETWEEN '2015-01-01' AND '2015-12-31'
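One caveat if the column is a datetime rather than a date: BETWEEN compares against midnight on the end date, so rows later in the day on 2015-12-31 would be missed. A half-open range avoids that:
SELECT XXX FROM ABC
WHERE DateField >= '2015-01-01' AND DateField < '2016-01-01'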
Also, in SQL Server 2016 you can create range indexes for use on things like timestamps using memory-optimized tables (a sketch follows). That's really the way to do it.
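A hedged sketch of what that looks like (the table definition and durability setting are assumptions):
CREATE TABLE records_inmem (
    id          INT IDENTITY PRIMARY KEY NONCLUSTERED,
    [value]     FLOAT,
    [timestamp] DATETIME2 NOT NULL,
    INDEX ix_ts NONCLUSTERED ([timestamp])  -- range index on the timestamp
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);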
I would recommend using the datetime, or even better the datetime2, data type to store the date data (datetime2 is preferable as it has a higher level of precision, and at lower precision levels it uses less storage).
As for your query, based upon the statement you posted you would want the timestamp to be the key column, and then include the value. This is because you are using the timestamp as your predicate, and returning the value along with it.
CREATE NONCLUSTERED INDEX IX_Records_Timestamp on Records (Timestamp) INCLUDE (Value)
This being said, be careful with your column names. I would highly recommend not using reserved keywords as column names, as they can be a lot more difficult to work with; see the example below.
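If you do keep names like timestamp and value, you will need to bracket-quote them in SQL Server; a sketch of the question's query with quoted identifiers (@a and @b stand in for the bounds):
SELECT [timestamp], [value]
FROM records
WHERE [timestamp] >= @a AND [timestamp] <= @b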

SQL Select, different than the last 10 records

I have a table called "dutyroster". I want to make a random selection from this table's "names" column, but I want the selection to be different from the last 10 records, so that the same guy is not given a second duty within 10 days. Is that possible?
Create a temporary table with only one column, called oldnames, which initially has no records. For each selection, execute a query like
select names from dutyroster where dutyroster.names not in (select oldnames from temporarytable) limit 1
and when execution is done, add the result to the temporary table. A fuller sketch follows.
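A fuller sketch of that approach in MySQL flavour (all names and the VARCHAR length are assumptions):
-- Temporary table remembering previous picks
CREATE TEMPORARY TABLE recent_picks (
    oldnames  VARCHAR(100),
    picked_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Pick one name at random that is not among the remembered picks
SELECT d.names
FROM dutyroster d
WHERE d.names NOT IN (SELECT oldnames FROM recent_picks)
ORDER BY RAND()
LIMIT 1;

-- Record the pick ('picked name here' is a placeholder), then keep only the last 10
INSERT INTO recent_picks (oldnames) VALUES ('picked name here');
DELETE FROM recent_picks
WHERE oldnames NOT IN (
    SELECT oldnames
    FROM (SELECT oldnames FROM recent_picks
          ORDER BY picked_at DESC
          LIMIT 10) keep_these
);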
The other answer here addresses the portion of the question about avoiding duplicate selections.
To accomplish the random part of the selection, leverage newid() directly within your select statement:
SELECT TOP 10
newid() AS [RandomSortColumn],
*
FROM
dutyroster
ORDER BY
[RandomSortColumn] ASC
Keep executing the query, and you'll keep getting different results. Use the technique in the other answer to avoid doubling a guy up.
The basic idea is to use a subquery to exclude the users chosen in the last ten days, then sort the rest randomly:
select dr.*
from dutyroster dr
where dr.name not in (select dr2.name
from dutyroster dr2
where dr2.datetimecol >= date_sub(curdate(), interval 10 day)
)
order by rand()
limit 1;
Different databases may have different syntax for limit, rand(), and for the date/time functions. The above gives the structure of the query, but the functions may differ.
If you have a large amount of data and performance is a concern, there are other (more complicated) ways to take a random sample, for example:
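For instance, a cheap approximate sample that avoids sorting the whole table (MySQL flavour; the 1% rate is arbitrary):
SELECT *
FROM dutyroster
WHERE RAND() < 0.01;  -- keeps roughly 1% of rows without a full sort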
For SQL Server you could use the TOP clause, and for MySQL the LIMIT clause.
Maybe this would help...
SELECT TOP number|percent column_name(s)
FROM table_name;
Source: http://www.w3schools.com/sql/sql_top.asp

How do I optimize DB2 query - joining million rows with one row

I have a DB2 query which joins a fact table (300M rows) with a date table (1 row). The dates from the date table are used in the WHERE condition to fetch only that date's data. But the query runs for 3 hours.
Select * from fact, date
where fact.procdate between date.lastdate and date.currdate
Is there a way to optimize this query without using plsql?
If you feed that query to db2expln you will see that all 300M rows get evaluated, probably several times. You are asking DB2 to build a Cartesian product and only then evaluate the WHERE clause.
In any case, that query might not even give you the results you are expecting; you should study the output more carefully to determine that. You more likely want to do something like
Select * from fact
where fact.procdate between DATE("firstdate") and DATE("seconddate")
You should supply firstdate and seconddate from your application logic (probably via separate queries against the date table). Alternatively, you could use subqueries to retrieve the beginning and end dates, as sketched below.
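A sketch of that subquery variant against the tables from the question (assuming the date table really has exactly one row):
SELECT f.*
FROM fact f
WHERE f.procdate BETWEEN (SELECT lastdate FROM date)
                     AND (SELECT currdate FROM date);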