Operations on a Calculated Date Field - sql

Can I do WHERE operations on a calculated date field?
I have a lookup field that was badly designed in SQL and unfortunately I can't change it. Basically it stores dates as character strings such as "July-2010" or "June-2009" (along with other non-date data). I want to extract the dates first (which I did using a LIKE operator) and then extract data based on a date range.
SELECT
BusinessUnit,
Lookup,
ReleaseDate
FROM
(
SELECT TOP 10
LookupColumn As Lookup,
BU as BusinessUnit,
CONVERT(DATETIME, REPLACE(LookupColumn,'-',' ')) as ReleaseDate
FROM
[dbo].[LookupTable]
WHERE
LookupColumn LIKE N'%-2010'
) MyTable
ORDER BY ReleaseDate
WHERE ReleaseDate = '2010-02-01'
I'm having issues with the WHERE operator. I assumed that wrapping the calculated field in a subquery would let me use it in operations such as WHERE, but maybe I'm wrong. Bottom line: is it possible to do operations on calculated fields?
UPDATE: indeed I had the order mixed up, and furthermore the LIKE operator was also returning non-date values such as "TBD-2010", which were tripping me up.

You have the ORDER BY and WHERE clauses round the wrong way. Try switching them round:
...
) MyTable
WHERE ReleaseDate = '2010-02-01'
ORDER BY ReleaseDate

You'd probably have an easier time encapsulating this table in a view and then querying the view. This way, your view alone contains the logic to convert this bit of awfulness into a normal DATETIME column.
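A minimal sketch of such a view, reusing the question's table and column names and assuming SQL Server 2012+ for TRY_CONVERT (which returns NULL instead of erroring on non-date values like "TBD-2010" mentioned in the update):
CREATE VIEW dbo.LookupReleaseDates
AS
SELECT
    BU AS BusinessUnit,
    LookupColumn AS Lookup,
    TRY_CONVERT(DATETIME, REPLACE(LookupColumn, '-', ' ')) AS ReleaseDate -- NULL for non-date values
FROM dbo.LookupTable;
The query then becomes a plain filter on the view:
SELECT BusinessUnit, Lookup, ReleaseDate
FROM dbo.LookupReleaseDates
WHERE ReleaseDate = '2010-02-01'
ORDER BY ReleaseDate;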

What "issues" are you having?
In your example, you can't have a WHERE clause AFTER an ORDER BY clause...

Related

Why does using CONVERT(DATETIME, [date], [format]) in WHERE clause take so long?

I'm running the following code on a dataset of 100M rows to test some things out before I eventually join the entire range (not just the top 10) to another table to make it even smaller.
SELECT TOP 10 *
FROM Table
WHERE CONVERT(datetime, DATE, 112) BETWEEN '2020-07-04 00:00:00' AND '2020-07-04 23:59:59'
The table isn't mine but a client's, so unfortunately I'm not responsible for the data types of the columns. The DATE column, along with the rest of the data, is in varchar. As for the dates in the BETWEEN clause, I just put in a relatively small range for testing.
I have heard that CONVERT shouldn't be in the WHERE clause, but I need to convert it to dates in order to filter. What is the proper way of going about this?
Going to summarise my comments here, as they are "second class citizens" and thus could be removed.
Firstly, the reason your query is slow is the CONVERT on the column DATE in your WHERE. Applying functions to a column in your WHERE will almost always make your query non-SARGable (there are some exceptions, but that doesn't make them a good idea). As a result, the entire table must be scanned to find rows that match your WHERE; it can't use an index to help it.
The real problem, therefore, is that you are storing a date (and time) value in your table as a non-date (and time) data type; presumably a (n)varchar. This is, in truth, a major design flaw and needs to be fixed. String-type values aren't validated to be valid dates, so someone could easily insert the "date" '20210229' or even '20211332'. Fixing the design not only stops this, but also makes your data smaller (a date is 3 bytes in size, while a varchar(8) would be 10 bytes), and you could pass strongly typed date and time values to your query and it would be SARGable.
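As a sketch of that design fix, assuming every existing value really is a valid yyyyMMdd string (validate before running; dbo.[Table] is the placeholder name from the question):
ALTER TABLE dbo.[Table] ALTER COLUMN [DATE] date NOT NULL; -- fails if any value is not a valid date
-- With a strongly typed column you can pass typed parameters and stay SARGable:
DECLARE @day date = '2020-07-04';
SELECT TOP 10 *
FROM dbo.[Table]
WHERE [DATE] >= @day AND [DATE] < DATEADD(DAY, 1, @day);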
"Fortunately" it appears your data is in the style code 112, which is yyyyMMdd; this at least means that the ordering of the dates is the same as if it were a strongly typed date (and time) data type. This means that the below query will work and return the results you want:
SELECT TOP 10 * --Ideally don't use * and list your columns properly
FROM dbo.[Table]
WHERE [DATE] >= '20200704' AND [DATE] < '20200705'
ORDER BY {Some Column};
You can use something like this to get better performance:
SELECT TOP 10 *
FROM Table
WHERE CAST([DATE] AS date) BETWEEN '2020-07-04' AND '2020-07-04'
There is no need to include the time portion if you want to search a full day.

Unexpected result with ORDER BY

I have the following query:
SELECT
D.[Year] AS [Year]
, D.[Month] AS [Month]
, CASE
WHEN f.Dept IN ('XSD') THEN 'Marketing'
ELSE f.Dept
END AS DeptS
, COUNT(DISTINCT f.OrderNo) AS CountOrders
FROM Sales.LocalOrders AS l
INNER JOIN Sales.FiscalOrders AS f
ON l.ORDER_NUMBER = f.OrderNo
INNER JOIN Dimensions.Date_Dim AS D
ON CAST(D.[Date] AS DATE) = CAST(f.OrderDate AS DATE)
WHERE YEAR(f.OrderDate) = 2019
AND f.Dept IN ('XSD', 'PPM', 'XPP')
GROUP BY
D.[Year]
, D.[Month]
, f.Dept
ORDER BY
D.[Year] ASC
, D.[Month] ASC
I get the following result; the ORDER BY isn't giving the right result, as the Month column is not ordered numerically:
Year Month Depts CountOrders
2019 1 XSD 200
2019 10 PPM 290
2019 10 XPP 150
2019 2 XSD 200
2019 3 XPP 300
The expected output:
Year Month Depts CountOrders
2019 1 XSD 200
2019 2 XSD 200
2019 3 XPP 300
2019 10 PPM 290
2019 10 XPP 150
Your query
It is ordered by Month, but your D.[Month] is being treated as a text string in the ORDER BY clause.
You could do one of two things to fix this:
Use a two-digit month number (e.g. 01... 12)
Use a data type for the ORDER BY clause that will be recognized as representing a month
A quick fix
You can correct this in your code by quickly changing the ORDER BY clause to analyze those columns as though they are numbers, which is done by converting ("casting") them to an integer data type like this:
ORDER BY
CAST(D.[Year] AS INT) ASC
,CAST(D.[Month] AS INT) ASC
This will correct your unexpected query results, but does not address the root cause, which is your underlying data (more on that below).
Your underlying data
The root cause of your issue is how your underlying data is stored and/or surfaced.
Your Month seems to be appearing as a default data type (VarChar), rather than something more specifically suited to a month or date.
If you administer or have access to or control over the database, it is a good idea to consider correcting this.
In considering this, be mindful of potential context and change management issues, including:
Is this underlying data, or just a representation of upstream data that is elsewhere? (e.g. something that is refreshed periodically using a process that you do not control, or a view that is redefined periodically)
What other queries or processes rely on how this data is currently stored or surfaced (including data types), that may break if you mess with it?
Might there be validation issues if correcting it? (such as from the way zero, null, non-numeric or non-date data is stored, even if invalid)
What change management practices should be followed in your environment?
Is the data source under high transactional load?
Is it a production dataset?
Are other reporting processes dependent on it?
None of these issues are a good excuse to leave something set up incorrectly forever, which will likely compound the issue and introduce others. However, that is only part of the story.
The appropriate approach (correct it, or leave it) will depend on your situation. In a perfect textbook world, you'd correct it. In your world, you will have to decide.
A better way?
The above solution is a bit of a quick and nasty way to force your query to work.
The fact that the solution CASTs late in the query syntax, after the results have been selected and filtered, hints that it is not the most elegant way to achieve this.
Ideally you can convert data types as early as possible in the process:
If done in underlying data, not the query, this is the ultimate but may not suit the situation (see below)
If done in the query, try to do it earlier.
In your case, your GROUP BY and ORDER BY both use columns that appear to be redundant data in the original query results; that is, you are getting a DATE as well as a MONTH and a YEAR. Ideally you would just get a DATE and then derive the MONTH or YEAR from that date. Your issue is that your dates are not actually dates (see "underlying data" above), which:
In the case of DATE, is converted in your INNER JOIN line ON CAST(D.[Date] AS DATE) = CAST(f.OrderDate AS DATE) (likely to minimise issues with the join)
In the case of D.[year] and D.[month], are not converted (which is why we still need to convert them further down, in ORDER BY)
You could consider ignoring D.[month] and using the MONTH DATEPART computed from DATE, which would avoid the need to use CAST in the ORDER BY clause; see the sketch below.
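A sketch of that middle-ground rewrite, assuming f.OrderDate is a datetime (the original WHERE YEAR(f.OrderDate) = 2019 suggests it is) and that the Date_Dim join was only needed for the Year/Month columns (everything else is taken from the question):
SELECT
    YEAR(f.OrderDate) AS [Year]
    , MONTH(f.OrderDate) AS [Month]
    , CASE WHEN f.Dept IN ('XSD') THEN 'Marketing' ELSE f.Dept END AS DeptS
    , COUNT(DISTINCT f.OrderNo) AS CountOrders
FROM Sales.LocalOrders AS l
INNER JOIN Sales.FiscalOrders AS f
    ON l.ORDER_NUMBER = f.OrderNo
WHERE YEAR(f.OrderDate) = 2019
    AND f.Dept IN ('XSD', 'PPM', 'XPP')
GROUP BY
    YEAR(f.OrderDate)
    , MONTH(f.OrderDate)
    , f.Dept
ORDER BY
    YEAR(f.OrderDate) ASC
    , MONTH(f.OrderDate) ASC;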
In your instance, this approach is a middle ground. The quick fix is included at the top of this answer, and the best fix is to correct the underlying data. This last section considers optimizing the quick fix, but does not correct the underlying issue. It is only mentioned for awareness and to avoid promoting the use of CAST in an ORDER BY clause as the most legitimate way of addressing your issue with good clean query syntax.
There are also potential performance trade-offs around how many columns you select that you don't need (e.g. all of the ones in D?), whether to compute the month from the date or from a separate month column, whether to cast to date before filtering, and so on. These are beyond the scope of this solution.
So:
The immediate solution: use the quick fix
The optimal solution: after it's working, consider the underlying data (in your situation)
The real problem is your object Dimensions.Date_Dim here. As you are simply ordering on the values of D.[Year] and D.[Month] without manipulating them at all, this means the object is severely flawed; you are storing numerical data as a varchar. varchar and numerical data types sort completely differently. For example, 2 is less than 10, but '2' is greater than '10': strings are compared character by character, and because '2' is greater than '1', '2' must also be greater than '10'.
The real solution, therefore, is fixing your object. Assuming that both Month and Year are incorrectly stored as a varchar, don't have any non-integer values (another and different flaw if so), and not a computed column then you could just do:
ALTER TABLE Dimensions.Date_Dim ALTER COLUMN [Year] int NOT NULL;
ALTER TABLE Dimensions.Date_Dim ALTER COLUMN [Month] int NOT NULL;
You could, however, also make the columns a PERSISTED computed column, which might well be easier, in my opinion, as DATEPART already returns a strongly typed int value.
ALTER TABLE Dimensions.Date_Dim DROP COLUMN [Month];
ALTER TABLE Dimensions.Date_Dim ADD [Month] AS DATEPART(MONTH,[Date]) PERSISTED;
Of course, for both solutions, you'll need to (first) DROP and (afterwards) reCREATE any indexes and constraints on the columns.
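For instance, a sketch with a hypothetical index name:
DROP INDEX IX_Date_Dim_Month ON Dimensions.Date_Dim; -- hypothetical index name
ALTER TABLE Dimensions.Date_Dim ALTER COLUMN [Month] int NOT NULL;
CREATE INDEX IX_Date_Dim_Month ON Dimensions.Date_Dim ([Month]);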
As long as your "Month" is always 1-12, you can use
SELECT ..., TRY_CAST(D.[Month] AS INT) AS [Month],...
ORDER BY TRY_CAST(D.[Month] AS INT)
The simplest solution is:
ORDER BY MIN(D.[Date])
or:
ORDER BY MIN(f.OrderDate)
Fiddling with the year and month columns is totally unnecessary when you have a date column that is available.
A very common issue when you store numerical data as a varchar/nvarchar.
Try to cast Year and Month to INT.
ORDER BY
CAST(D.[Year] AS INT) ASC
,CAST(D.[Month] AS INT) ASC
If you try using the <, > and BETWEEN operators, you will get some really "weird" results.
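A quick way to see this behaviour for yourself:
SELECT CASE WHEN '2' > '10' THEN 'string comparison' ELSE 'numeric comparison' END AS result;
-- returns 'string comparison', because '2' > '1' when compared character by character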

Select and manipulate SQL data, DISTINCT and SUM?

I'm trying to make a small report for myself to see how much time gets entered into my system every day.
The goal is to have my SQL sum up, per name, the total time worked and the total NG products found for one specific day.
In this order:
1.) Sort out my data for a specific 'date'. I.E 2016-06-03
2.) Present a DISTINCT value for 'operators'
3.) SUM() all time registered at this 'date' and by this 'operator' under 'total_working_time_h'
4.) SUM() all no_of_defects registered at this 'date' and by this 'operator' under 'no_of_defects'
date, operator, total_working_time_h, no_of_defects
Currently I get the data I want by using the query below. But now I need both the DISTINCT value of the operator and the SUM of the information. Can I use sub-queries for this, or should it be done with a loop? Any other hints on where I can learn more about how to solve this?
If I run the DISTINCT function I don't get the opportunity to sum my data the way I'm trying to.
SELECT date, operator, total_working_time_h, no_of_defects FROM {$table_work_hours} WHERE date = '2016-06-03'
Without knowing the table structure or contents, the following query is only a good guess. The bits to notice and work with are SUM() and GROUP BY. Actual syntax will vary a bit depending on which RDBMS you are using.
SELECT
date
,operator
,SUM(total_working_time_h) AS total_working_time_h
,SUM(no_of_defects) AS no_of_defects
FROM {$table_work_hours}
WHERE date = '2016-06-03'
GROUP BY
date
,operator
(Take out the WHERE clause or replace it with a range of dates to get results per operator per date.)
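For example, the same report per operator per day across a month (the date range here is illustrative):
SELECT
date
,operator
,SUM(total_working_time_h) AS total_working_time_h
,SUM(no_of_defects) AS no_of_defects
FROM {$table_work_hours}
WHERE date BETWEEN '2016-06-01' AND '2016-06-30'
GROUP BY
date
,operator
ORDER BY date, operator;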
I'm not sure why you are trying to do DISTINCT. You want to know the date, number of hours, etc. for a specific date.
Do this:
Select Date, Operator, 'SumWorkHrs'=sum(total_working_time_h),
'SumDefects'=sum(no_of_defects) from {$table_work_hours}
Where date='2016-06-03'
Group by Date, Operator
Try this (grouping by operator gives you each distinct operator along with its sums):
SELECT operator,
SUM(total_working_time_h) AS total_working_time_h,
SUM(no_of_defects) AS no_of_defects
FROM {$table_work_hours}
WHERE date = '2016-06-03'
GROUP BY operator

How to combine the LIKE function with a DATE_PART function in PostgreSQL?

Using Postgres, but if someone knows how to do this in standard SQL that would be a great start. I am joining to a table via a character varying column. This column contains values such as:
PC11941.2004
PC14151.2004
PC21213.2003
SPC21434.2003
PC17715.04V1
PC18733.2002
0MRACCT_ALL.GLFUNCT
A lot of the numbers after the periods correspond to years. I want to join the table via the current year. So, for example, I could JOIN on the condition LIKE '%2015'.
But I want to create this view and never return to it, so I would need to join it against something like get_fy_part('YEAR', clock_timestamp()).
Not sure how I go about writing that. I haven't had success, yet.
You can get the current year with date_part('year', CURRENT_DATE)
Something like this should work:
SELECT * FROM mytable WHERE mycolumn LIKE ('%' || date_part('year', CURRENT_DATE))
The || operator concatenates the percent-sign with the year.
I hope that helps!
Use the function RIGHT().
SELECT originalColumn, RIGHT(originalColumn,4)
FROM table;
This will get you the years you are interested in.
If you want everything after the dot, then something like:
SELECT originalColumn, RIGHT(originalColumn, length(originalColumn) - position('.' in originalColumn))
FROM table
Depends on the exact rules - and actually implemented CHECK constraints for the column.
If there is always a single dot in your column col and all your years have 4 digits:
Basic solution
SELECT * FROM tbl
WHERE col LIKE to_char(now(), '"%."YYYY');
Why?
It's most efficient to compare to the same data type. Since the column is a character type (varchar), rather use to_char() (returns text, which is effectively the same as varchar) than EXTRACT or date_part() (return double precision).
More importantly, this expression is sargable. That's generally cheapest and allows (optional) index support. In your case, a trigram index would work:
PostgreSQL LIKE query performance variations
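A sketch of such a trigram index (requires the pg_trgm extension; the index name is illustrative):
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX tbl_col_trgm_idx ON tbl USING gin (col gin_trgm_ops);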
Optimize
If you want to be as fast (read performance) and accurate as possible, and your table has more than a trivial number of rows, go with a specialized partial expression index:
CREATE INDEX tbl_year_idx ON tbl (cast(right(col, 4) AS int) DESC)
WHERE col ~ '\.\d{4}$'; -- ends with a dot and 4 digits
Matching query:
SELECT * FROM tbl
WHERE col ~ '\.\d{4}$' -- repeat index condition
AND right(col, 4)::int = EXTRACT(year FROM now());
Test performance with EXPLAIN ANALYZE.
You could even go one step further and tailor the index for the current year:
CREATE INDEX tbl_year2015_idx ON tbl (tbl_id) -- any (useful?) column
WHERE col LIKE '%.2015';
Works with the first "basic" query.
You would have to (re-)create the index for each year. A simple solution would be to create indexes for a couple of years ahead and append another one each year automatically ...
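A sketch of that automation in PL/pgSQL, assuming Postgres 9.5+ for CREATE INDEX IF NOT EXISTS (all names are illustrative):
DO
$$
DECLARE
   y int;
BEGIN
   FOR y IN 2015..2020 LOOP -- a few years ahead
      EXECUTE format(
         'CREATE INDEX IF NOT EXISTS tbl_year%s_idx ON tbl (tbl_id) WHERE col LIKE %L',
         y, '%.' || y);
   END LOOP;
END
$$;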
This is also the point where you consider the alternative: store the year as a redundant integer column in your table and simplify the rest.
That's what I would do.
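A sketch of that redundant-column alternative (column and index names are illustrative):
ALTER TABLE tbl ADD COLUMN col_year int;
UPDATE tbl SET col_year = right(col, 4)::int WHERE col ~ '\.\d{4}$';
CREATE INDEX tbl_col_year_idx ON tbl (col_year);
-- the filter then simplifies to:
SELECT * FROM tbl WHERE col_year = EXTRACT(year FROM now());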

PostgreSQL - GROUP BY timestamp values?

I've got a table with purchase orders stored in it. Each row has a timestamp indicating when the order was placed. I'd like to be able to create a report indicating the number of purchases each day, month, or year. I figured I would do a simple SELECT COUNT(xxx) FROM tbl_orders GROUP BY tbl_orders.purchase_time and get the value, but it turns out I can't GROUP BY a timestamp column.
Is there another way to accomplish this? I'd ideally like a flexible solution so I could use whatever timeframe I needed (hourly, monthly, weekly, etc.) Thanks for any suggestions you can give!
This does the trick without the date_trunc function (easier to read).
-- 2014
select created_on::DATE from users group by created_on::DATE
-- updated September 2018 (thanks to @wegry)
select created_on::DATE as co from users group by co
What we're doing here is casting the original value into a DATE, rendering the time data in this value inconsequential.
Grouping by a timestamp column works fine for me here, keeping in mind that even a 1-microsecond difference will prevent two rows from being grouped together.
To group by larger time periods, group by an expression on the timestamp column that returns an appropriately truncated value. date_trunc can be useful here, as can to_char.
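For instance, using the table and column from the question, daily and monthly rollups might look like this:
-- purchases per day
SELECT date_trunc('day', purchase_time) AS day, count(*) AS purchases
FROM tbl_orders
GROUP BY 1
ORDER BY 1;
-- purchases per month, labelled like '2010-07'
SELECT to_char(purchase_time, 'YYYY-MM') AS month, count(*) AS purchases
FROM tbl_orders
GROUP BY 1
ORDER BY 1;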