My basic query takes a long time to run - sql

I use MS SQL Server. I have a "jobs" table which has 140 columns and more than 4 million records. The table's columns are mostly varchar and bit.
40 of the table's columns reference other tables, like "issuerid" from "issuers", "fileid" from "files"...
The only index on the table is on "fileid", which is non-unique and non-clustered.
My basic query looks like this:
select issuerid,count(id) as total , sum(case when X_Status=1 then 1 else 0 end) P_Count
from jobs where 1=1 and issuerid='1001' and creationdate between '01/01/2019 12:00:01 AM' and '06/30/2019 11:59:59 PM' group by issuerid
The query takes 1 min 20 s (the PC has an SSD and 4 GB of RAM).
I tried adding an index on issuerid, but it didn't help much.
I have a lot of queries on this table for my ASP page. Mostly, the SUM(CASE ...) part changes, for example:
sum(case when Y_Status=1 then 1 else 0 end) P_Count
I even tried keeping only 2 columns in a new table (newjobs) and ran this query:
select count(id) as total, sum(case when X_Status=1 then 1 else 0 end) P_Count from newjobs where 1=1
and it still took around 30 seconds.
I have read many threads and articles on improving query performance, but nothing worked. Does anyone have any ideas to share?
Thank you.

The following should work for your exact query:
CREATE NONCLUSTERED INDEX IX_Jobs__IssuerID_CreationDate ON dbo.Jobs (IssuerID, CreationDate)
INCLUDE (X_Status);
Since your query filters on IssuerID and CreationDate, these are the key columns. I have then added X_Status as a non-key (included) column so that the whole query can be answered from this index, with no chance of a bookmark lookup or an index scan.
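To see the covering-index idea in miniature, here is a sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. Assumptions: SQLite has no INCLUDE clause, so X_Status is appended as a trailing key column instead, and the toy table holds only the columns the query needs. EXPLAIN QUERY PLAN should report a COVERING INDEX search, meaning the base table is never touched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, issuerid TEXT, "
    "creationdate TEXT, x_status INTEGER, fileid INTEGER)"
)
# SQLite has no INCLUDE clause, so x_status becomes a trailing key column.
conn.execute(
    "CREATE INDEX ix_jobs_issuer_date ON jobs (issuerid, creationdate, x_status)"
)
conn.executemany(
    "INSERT INTO jobs (issuerid, creationdate, x_status) VALUES (?, ?, ?)",
    [
        ("1001", "2019-02-10 09:00:00", 1),
        ("1001", "2019-05-01 12:00:00", 0),
        ("1001", "2019-08-01 00:00:00", 1),  # outside the date range
        ("1002", "2019-03-01 00:00:00", 1),  # different issuer
    ],
)

QUERY = """
SELECT issuerid, COUNT(id) AS total,
       SUM(CASE WHEN x_status = 1 THEN 1 ELSE 0 END) AS p_count
FROM jobs
WHERE issuerid = ? AND creationdate >= ? AND creationdate < ?
GROUP BY issuerid
"""
params = ("1001", "2019-01-01", "2019-07-01")

# The last column of each EXPLAIN QUERY PLAN row is a human-readable description.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, params)]
result = conn.execute(QUERY, params).fetchall()
print(plan)
print(result)
```

On SQL Server the equivalent check is the actual execution plan: you want an Index Seek on IX_Jobs__IssuerID_CreationDate with no Key Lookup or RID Lookup operator next to it.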
As an aside, your current WHERE clause will always exclude things that happen in the first second of the first day and the last second of the last day (i.e. between 00:00:00 and 00:00:01 on 1 January, and between 06/30/2019 23:59:59 and 07/01/2019 00:00:00). This may be deliberate, but I suspect it isn't. It is usually much better, and also clearer about your intentions, to use an open-ended date range:
WHERE CreationDate > '20190101'
AND CreationDate < '20190701'
Or, more likely:
WHERE CreationDate >= '20190101'
AND CreationDate < '20190701'
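A small sketch of the difference, again using sqlite3 as a stand-in (assumption: timestamps stored as ISO-8601 text, which compares correctly as strings). A row at exactly midnight on 1 January and one half a second before July are missed by the BETWEEN bounds from the question, but caught by the half-open range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, creationdate TEXT)")
rows = [
    "2019-01-01 00:00:00",      # very first instant of the range
    "2019-03-15 10:30:00",
    "2019-06-30 23:59:59.500",  # half a second before July
    "2019-07-01 00:00:00",      # first instant of the next range
]
conn.executemany("INSERT INTO jobs (creationdate) VALUES (?)", [(r,) for r in rows])

# The bounds from the question, converted to ISO text.
between = conn.execute(
    "SELECT COUNT(*) FROM jobs "
    "WHERE creationdate BETWEEN '2019-01-01 00:00:01' AND '2019-06-30 23:59:59'"
).fetchone()[0]

# Inclusive start, exclusive end: no guessing about the last representable instant.
half_open = conn.execute(
    "SELECT COUNT(*) FROM jobs "
    "WHERE creationdate >= '2019-01-01' AND creationdate < '2019-07-01'"
).fetchone()[0]
print(between, half_open)
```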
I have also switched to a culture-invariant date/time format, so that the date literal is interpreted as the same date on every machine. For more reading see:
What do BETWEEN and the devil have in common?
Bad habits to kick : mis-handling date / range queries
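As a rough analogy for the culture-invariant format point (Python's strptime, not SQL Server's parser), the same ambiguous literal yields two different dates depending on which convention is assumed, while the unseparated year-month-day form has only one reading. In SQL Server the reading of a literal like '06/07/2019' depends on the session's SET DATEFORMAT / SET LANGUAGE settings, whereas 'YYYYMMDD' is interpreted the same way regardless.

```python
from datetime import datetime

literal = "06/07/2019"
# The same characters read under two common conventions.
month_first = datetime.strptime(literal, "%m/%d/%Y").date()  # US style
day_first = datetime.strptime(literal, "%d/%m/%Y").date()    # UK style
# An unseparated year-month-day string has only one sensible reading.
compact = datetime.strptime("20190607", "%Y%m%d").date()
print(month_first, day_first, compact)
```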

Related

SQL Server: compare only month and day - SARGable

I have a table storing a datetime column, which is indexed. I'm trying to find a way to compare ONLY the month and day (ignoring the year entirely).
Just for the record, I'm already using MONTH() and DAY(). But my current implementation uses an Index Scan instead of an Index Seek, because the column is wrapped directly in both functions to get the month and day.
There could be 2 types of references for comparison: a fixed given date and today (GETDATE()). The date will be converted based on time zone, and then have its month and day extracted, e.g.
DECLARE @monthValue INT = MONTH(@ConvertDateTimeFromServer_TimeZone);
DECLARE @dayValue INT = DAY(@ConvertDateTimeFromServer_TimeZone);
Another point is that the column stores datetime with different years, e.g.
1989-06-21 00:00:00.000
1965-10-04 00:00:00.000
1958-09-15 00:00:00.000
1965-10-08 00:00:00.000
1942-01-30 00:00:00.000
Now here comes the problem. How do I create a SARGable query that returns the rows in the table matching the given month and day regardless of the year, without wrapping the column in any functions? Existing examples on the web use years and/or date ranges, which doesn't help in my case.
A sample query:
Select t0.pk_id
From dob t0 WITH(NOLOCK)
WHERE ((MONTH(t0.date_of_birth) = @monthValue AND DAY(t0.date_of_birth) = @dayValue))
I've also tried DATEDIFF() and DATEADD(), but they all end up with an Index Scan.
Adding to the comment I made about a Calendar Table:
This will, probably, be the easiest way to get a SARGable query. As you've discovered, MONTH([YourColumn]) and DATEPART(MONTH,[YourColumn]) both cause your query to become non-SARGable.
Considering that all your columns, at least in your sample data, have a time of 00:00:00 this "works" to our advantage, as they are effectively just dates. This means we can easily JOIN onto a Calendar Table using something like:
SELECT dob.[YourColumn]
FROM dob
JOIN CalendarTable CT ON dob.DateOfBirth = CT.CalendarDate;
Now, if we're using the table from the above article, you will have created some extra columns (MonthNo and CDay, though you can call them whatever you like). You can then add those columns to your query:
SELECT dob.[YourColumn]
FROM dob
JOIN CalendarTable CT ON dob.DateOfBirth = CT.CalendarDate
WHERE CT.MonthNo = @MonthValue
AND CT.CDay = @DayValue;
This, as you can see, is a more SARGable query.
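Here is a minimal, runnable sketch of the calendar-table join using sqlite3 as a stand-in for SQL Server. The table and column names (calendar_table, calendar_date, month_no, c_day) are hypothetical equivalents of CalendarTable, CalendarDate, MonthNo and CDay, and dates are stored as ISO text because SQLite has no date type. The month/day filter touches only the calendar table's indexed columns, and dob is reached through an equality join on its indexed date column.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dob (pk_id INTEGER PRIMARY KEY, date_of_birth TEXT)")
conn.execute("CREATE INDEX ix_dob_date ON dob (date_of_birth)")
conn.execute(
    "CREATE TABLE calendar_table ("
    " calendar_date TEXT PRIMARY KEY, month_no INTEGER, c_day INTEGER)"
)
conn.execute("CREATE INDEX ix_cal_month_day ON calendar_table (month_no, c_day)")

# One row per day; ISO text so it matches dob.date_of_birth exactly.
d, end = date(1940, 1, 1), date(2000, 12, 31)
cal_rows = []
while d <= end:
    cal_rows.append((d.isoformat(), d.month, d.day))
    d += timedelta(days=1)
conn.executemany("INSERT INTO calendar_table VALUES (?, ?, ?)", cal_rows)

births = ["1989-06-21", "1965-10-04", "1958-09-15",
          "1965-10-08", "1942-01-30", "1971-10-04"]
conn.executemany("INSERT INTO dob (date_of_birth) VALUES (?)", [(b,) for b in births])

def born_on(month, day):
    # The filter is on calendar_table columns only; dob is joined by equality.
    return [r[0] for r in conn.execute(
        "SELECT dob.date_of_birth FROM calendar_table ct "
        "JOIN dob ON dob.date_of_birth = ct.calendar_date "
        "WHERE ct.month_no = ? AND ct.c_day = ? "
        "ORDER BY dob.date_of_birth",
        (month, day),
    )]

print(born_on(10, 4))
```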
If you want to deal with Leap Years, you could add a little more logic using a CASE expression:
SELECT dob.[YourColumn]
FROM dob
JOIN CalendarTable CT ON dob.DateOfBirth = CT.CalendarDate
WHERE CT.MonthNo = @MonthValue
AND CASE WHEN DATEPART(YEAR, GETDATE()) % 4 != 0 AND CT.CDay = 29 AND CT.MonthNo = 2 THEN 28 ELSE CT.CDay END = @DayValue;
This treats someone's birthday on 29 February as 28 February on years that aren't leap years (when DATEPART(YEAR, GETDATE()) % 4 != 0).
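The leap-year rule can be sketched in Python as below. One deliberate difference: calendar.isleap applies the full Gregorian rule (century years such as 2100 are not leap years unless divisible by 400), whereas the % 4 test above treats every fourth year as a leap year. For any year between 1901 and 2099 the two agree.

```python
import calendar
from datetime import date

def birthday_matches(dob, today):
    """True if `dob` falls on today's month/day, ignoring the year.

    A 29 February birthday is treated as 28 February when today's year
    is not a leap year, mirroring the CASE expression above.
    """
    day = dob.day
    if dob.month == 2 and dob.day == 29 and not calendar.isleap(today.year):
        day = 28
    return (dob.month, day) == (today.month, today.day)

print(birthday_matches(date(1992, 2, 29), date(2023, 2, 28)))
```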
It's also probably worthwhile changing your DateOfBirth column to date. Dates of birth aren't at a given time, only on a given date; this also means there's no implicit conversion from datetime to date when joining to your Calendar Table.
Edit: Also, I just noticed: why are you using NOLOCK? You do know what that does, right? Unless you're happy with dirty reads and ghost data?

Wrong records returned when using a datetime parameter in an MS Access query

I am working with an MS Access 2007 database.
I am trying to write a query on a DateTime column. I want to get the records between 14 December and 16 December, so I wrote the query below:
SELECT * FROM Expense WHERE CreatedDate > #14-Dec-15# and CreatedDate < #16-Dec-15#
(I have to use both dates in the query.)
But it returns records whose CreatedDate is 14 December...
What's wrong with the query?
As @vkp mentions in the comments, a date also has a time part. If it is not specified, it defaults to midnight (00:00:00). As 14-Dec-2015 6:46:56 is after 14-Dec-2015 00:00:00, it is included in the result set. You can use >= #15-Dec-15# to get around this; it will still include all records from 15-Dec-2015. The same goes for the end date.
It seems you want only records from Dec 15th regardless of the time of day stored in CreatedDate. If so, this query should give you what you want with excellent performance assuming an index on CreatedDate ...
SELECT *
FROM Expense
WHERE CreatedDate >= #2015-12-15# and CreatedDate < #2015-12-16#;
Beware of applying functions to your target field in the WHERE clause, such as CDATE(INT(CreatedDate)). Although logically correct, it would force a full table scan. That might not be a problem if your Expense table contains only a few rows, but for a huge table you really should try to avoid a full table scan.
You must include the time in your thinking:
EDIT: I wrote this under the misunderstanding that you wanted to include rows from the 14th to the 16th of December (three full days).
If you wrote < #17-Dec-15#, you would get the whole of the 16th. Otherwise you'd have to write <= #16-Dec-15 23:59:59#.
A DateTime on 16 December with a time part of, say, 12:30 is greater than #16-Dec-15#...
Just some background: in MS Access a DateTime is stored as a day number plus a fractional part for the time. 0.5 is midday, 0.25 is 6 in the morning...
So comparing DateTime values really means comparing Double values.
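To make the "dates are Doubles" point concrete, here is a Python sketch of the serial number Access uses. Assumptions: day 0 is 30 December 1899, and the helper (hypothetical name to_access_serial) only handles dates on or after that day, since Access encodes earlier dates differently. A time of 06:46:56 on 14 December produces a larger number than 14 December at midnight, which is why the original query returned those rows, and truncating with int() plays the role of Access's Int().

```python
from datetime import datetime

ACCESS_EPOCH = datetime(1899, 12, 30)  # day 0 of the Access date serial

def to_access_serial(dt):
    """Whole days since the epoch plus the time of day as a fraction of a day.

    Only meant for dates on or after 30 December 1899.
    """
    delta = dt - ACCESS_EPOCH
    return delta.days + (delta.seconds + delta.microseconds / 1e6) / 86400

midnight = to_access_serial(datetime(2015, 12, 14))
morning = to_access_serial(datetime(2015, 12, 14, 6, 46, 56))
# The time part makes the value larger; int() strips it back to the day.
print(midnight, morning, morning > midnight, int(morning) == midnight)
```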
Just add one day to your end date and exclude this:
SELECT * FROM Expense WHERE CreatedDate >= #2015/12/14# AND CreatedDate < #2015/12/17#
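The "add one day and exclude it" rule works for any inclusive range of whole days. A Python sketch, where the helper name inclusive_days_to_half_open is hypothetical:

```python
from datetime import date, datetime, timedelta

def inclusive_days_to_half_open(first, last):
    """Turn an inclusive range of whole days into [start, stop) datetimes."""
    start = datetime.combine(first, datetime.min.time())
    stop = datetime.combine(last + timedelta(days=1), datetime.min.time())
    return start, stop

start, stop = inclusive_days_to_half_open(date(2015, 12, 14), date(2015, 12, 16))
created = [
    datetime(2015, 12, 13, 23, 59),    # before the range
    datetime(2015, 12, 14, 6, 46, 56),
    datetime(2015, 12, 16, 12, 30),    # later on the last day: still included
    datetime(2015, 12, 17, 0, 0),      # the excluded upper bound
]
kept = [c for c in created if start <= c < stop]
print(kept)
```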
Thanks a lot for your help, everyone.
I finally ended up with the solution given by Darren Bartrup-Cook and Gustav.
My previous query was:
SELECT * FROM Expense WHERE CreatedDate > #14-Dec-15# and CreatedDate < #16-Dec-15#
And the new working query is:
SELECT * FROM Expense WHERE CDATE(INT(CreatedDate)) > #14-Dec-15# and CDATE(INT(CreatedDate)) < #16-Dec-15#

PostgreSQL query between date ranges

I am trying to query my PostgreSQL db to return results where a date is in a certain month and year. In other words, I would like all the values for a month-year.
The only way I've been able to do it so far is like this:
SELECT user_id
FROM user_logs
WHERE login_date BETWEEN '2014-02-01' AND '2014-02-28'
The problem with this is that I have to calculate the first and last dates before querying the table. Is there a simpler way to do this?
Thanks
With dates (and times) many things become simpler if you use >= start AND < end.
For example:
SELECT
user_id
FROM
user_logs
WHERE
login_date >= '2014-02-01'
AND login_date < '2014-03-01'
In this case you still need to calculate the start date of the month you need, but that should be straightforward in any number of ways.
The end date is also simplified; just add exactly one month. No messing about with 28th, 30th, 31st, etc.
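Computing the bounds for an arbitrary month is short in Python. The helper below (hypothetical name month_bounds) returns the first day of the month and the first day of the following month, which can then be passed as query parameters for login_date >= start AND login_date < stop.

```python
from datetime import date

def month_bounds(year, month):
    """Return (first day of the month, first day of the next month)."""
    start = date(year, month, 1)
    # December rolls over into January of the next year.
    stop = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, stop

print(month_bounds(2014, 2))
```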
This structure also has the advantage of being able to use indexes.
Many people may suggest a form such as the following, but they do not use indexes:
WHERE
date_part('year', login_date) = 2014
AND date_part('month', login_date) = 2
This involves calculating the conditions for every single row in the table (a scan) instead of using an index to find the range of rows that will match (a range seek).
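The scan-versus-seek difference can be observed with sqlite3's EXPLAIN QUERY PLAN, used here as a stand-in for PostgreSQL's EXPLAIN. Assumption: SQLite has no date_part, so strftime plays the role of the function applied to the column. The function-based filter should produce a SCAN of the table, the range filter a SEARCH using the index on login_date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_logs (id INTEGER PRIMARY KEY, user_id INTEGER, login_date TEXT)"
)
conn.execute("CREATE INDEX ix_user_logs_login_date ON user_logs (login_date)")

def plan(sql):
    """Return the detail text of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Functions applied to the column: every row must be evaluated.
by_parts = plan(
    "SELECT user_id FROM user_logs "
    "WHERE strftime('%Y', login_date) = '2014' "
    "AND strftime('%m', login_date) = '02'"
)
# Bare column compared with constants: the index can be searched by range.
by_range = plan(
    "SELECT user_id FROM user_logs "
    "WHERE login_date >= '2014-02-01' AND login_date < '2014-03-01'"
)
print(by_parts)
print(by_range)
```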
From PostgreSQL 9.2, range types are supported, so you can write this as:
SELECT user_id
FROM user_logs
WHERE '[2014-02-01, 2014-03-01)'::daterange @> login_date
This is more concise than writing out both comparisons separately.
Just in case somebody lands here: since 8.1 you can simply use:
SELECT user_id
FROM user_logs
WHERE login_date BETWEEN SYMMETRIC '2014-02-01' AND '2014-02-28'
From the docs:
BETWEEN SYMMETRIC is the same as BETWEEN except there is no requirement that the argument to the left of AND be less than or equal to the argument on the right. If it is not, those two arguments are automatically swapped, so that a nonempty range is always implied.
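The semantics described in the docs amount to ordering the two bounds before an ordinary inclusive check. A Python sketch (hypothetical helper name between_symmetric) that ignores SQL's NULL handling:

```python
def between_symmetric(value, a, b):
    """Mimic `value BETWEEN SYMMETRIC a AND b`: swap the bounds if needed."""
    low, high = (a, b) if a <= b else (b, a)
    return low <= value <= high

# Bounds given "backwards" still form the range 2014-02-01 .. 2014-02-28.
print(between_symmetric("2014-02-10", "2014-02-28", "2014-02-01"))
```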
SELECT user_id
FROM user_logs
WHERE login_date BETWEEN '2014-02-01' AND '2014-03-01'
BETWEEN works exceptionally well for dates. It assumes the time is 00:00:00 (i.e. midnight) for dates.
Read the documentation.
http://www.postgresql.org/docs/9.1/static/functions-datetime.html
I used a query like this:
WHERE
(
date_trunc('day',table1.date_eval) = '2015-02-09'
)
or
WHERE (date_trunc('day', table1.date_eval) >= '2015-02-09' AND date_trunc('day', table1.date_eval) < '2015-02-10')

Find rows in a database with no time in a datetime column

During testing I failed to notice an incorrect date/time entry in the database for certain orders: instead of entering the date and time, I had only been entering the date. I was using the correct timestamp, createodbcdatetime(now()), but I was using cfsqltype="cf_sql_date" to insert it into the database.
I am lucky enough to have the order date/time correctly recorded, meaning I can use the time from the order date/time field.
My question is: can I filter for all rows in the table that have only a date entered? My data is below:
Table Name: tbl_orders
uid_orders dte_order_stamp
2000 02/07/2012 03:02:52
2001 03/07/2012 01:24:21
2002 03/07/2012 08:34:00
Table Name: tbl_payments
uid_payment dte_pay_paydate uid_pay_orderid
1234 02/07/2012 03:02:52 2000
1235 03/07/2012 2001
1236 03/07/2012 2002
I need to be able to select all payments with no time entered from tbl_payments; I can then loop over the results, grab the time from my order table, add it to the date from my payment table and update the field with the new date/time.
I can handle re-inserting the date/time; it's just selecting the rows with no time that I'm not sure about.
Any help would be appreciated.
For reference, here are the SELECT statements for orders and payments, and the join between them in case it's needed.
SQL Server 2008, Cold Fusion 9
SELECT
dbo.tbl_orders.uid_orders,
dbo.tbl_orders.dte_order_stamp,
dbo.tbl_payment.dte_pay_paydate,
dbo.tbl_payment.uid_pay_orderid
FROM
dbo.tbl_orders
INNER JOIN dbo.tbl_payment ON (dbo.tbl_orders.uid_orders = dbo.tbl_payment.uid_pay_orderid)
SELECT
dbo.tbl_orders.uid_orders,
dbo.tbl_orders.dte_order_stamp
FROM dbo.tbl_orders
SELECT
uid_paymentid,
uid_pay_orderid,
dte_pay_paydate
FROM
dbo.tbl_payment
Select the records where the hour, minute, second and millisecond values are all zero:
select *
from table
where datePart(hour, datecolumn) = 0
and datePart(minute, datecolumn) = 0
and datePart(second, datecolumn) = 0
and datePart(millisecond, datecolumn) = 0
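Once the midnight-only rows are found, the repair described in the question (take the date from the payment and the time from the order) can be sketched in Python. The day/month/year reading of the sample dates is an assumption, and date-only values are represented as midnight, which is how a datetime column stores them.

```python
from datetime import datetime, time

FMT = "%d/%m/%Y %H:%M:%S"  # assumption: the sample dates are day/month/year

orders = {
    2000: datetime.strptime("02/07/2012 03:02:52", FMT),
    2001: datetime.strptime("03/07/2012 01:24:21", FMT),
    2002: datetime.strptime("03/07/2012 08:34:00", FMT),
}
# payment id -> (dte_pay_paydate, uid_pay_orderid); date-only values are midnight.
payments = {
    1234: (datetime.strptime("02/07/2012 03:02:52", FMT), 2000),
    1235: (datetime.strptime("03/07/2012 00:00:00", FMT), 2001),
    1236: (datetime.strptime("03/07/2012 00:00:00", FMT), 2002),
}

def has_no_time(dt):
    """Hours, minutes, seconds and fractional seconds are all zero."""
    return dt.time() == time(0)

# Keep the payment's date, borrow the order's time of day.
repaired = {
    pay_id: datetime.combine(paid.date(), orders[order_id].time())
    for pay_id, (paid, order_id) in payments.items()
    if has_no_time(paid)
}
print(repaired)
```

One caveat that applies equally to the SQL answers: a payment genuinely recorded at exactly 00:00:00.000 would also be flagged.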
You can probably get those values by casting to time and checking for 0:
SELECT * FROM table WHERE CAST(datetimecolumn AS TIME) = '00:00'
That may not be particularly efficient, though: wrapping the column in CAST generally prevents SQL Server from using an index seek on it.
Something like this should work:
....
WHERE CAST(CONVERT(VARCHAR, dbo.tbl_payment.dte_pay_paydate, 101) AS DATETIME) =
dbo.tbl_payment.dte_pay_paydate
This will return all rows where the time is missing.

How to filter table to date when it has a timestamp with time zone format?

I have a very large dataset - records in the hundreds of millions/billions.
I would like to filter the data in this column (I am only showing 2 of the millions of records):
arrival_time
2019-04-22 07:36:09.870+00
2019-06-07 09:46:09.870+00
How can I filter the data in this column by the date part only? For example, I would like to filter for arrival_time on 2019-04-22, which would give me the first record and any other records on that date.
I have tried casting the column with ::timestamp::date = '2019-04-22', but this is costly and doesn't work well given the vast number of records.
Sample code:
select
*
from
mytable
where
arrival_time::timestamp::date = '2019-09-30'
Again, it's very costly to cast to date, as the cast is done for every row before the filtering!
Any ideas? I am using PostgreSQL and pgAdmin 4.
This query:
where (arrival_time::timestamp)::date = '2019-09-30'
converts arrival_time to another type. That generally precludes the use of an index and makes it harder for the optimizer to choose the best execution path.
Instead, compare to same data type:
where arrival_time >= '2019-09-30'::timestamp and
arrival_time < ('2019-09-30'::timestamp + interval '1 day')
You can try to filter for the upper and lower bounds of that day.
...
WHERE arrival_time >= '2019-04-22'::timestamp
AND arrival_time < '2019-04-23'::timestamp
...
That way, an index on arrival_time can be used, which should improve performance.
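The same half-open idea in Python, with timezone-aware values (assumption: the day of interest is a UTC day, matching the +00 offsets in the sample; the helper name utc_day_bounds is hypothetical). In PostgreSQL, a plain timestamp such as '2019-04-22'::timestamp is converted using the session's TimeZone setting when compared with a timestamp with time zone column, so the session time zone decides which instants count as "2019-04-22".

```python
from datetime import datetime, timedelta, timezone

def utc_day_bounds(year, month, day):
    """Half-open [start, stop) bounds of one calendar day in UTC."""
    start = datetime(year, month, day, tzinfo=timezone.utc)
    return start, start + timedelta(days=1)

arrivals = [
    datetime(2019, 4, 22, 7, 36, 9, 870000, tzinfo=timezone.utc),
    datetime(2019, 6, 7, 9, 46, 9, 870000, tzinfo=timezone.utc),
]
start, stop = utc_day_bounds(2019, 4, 22)
# The stored values are compared as-is; nothing is cast per row.
matches = [a for a in arrivals if start <= a < stop]
print(matches)
```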