cross join to get all dates and hours and avoid duplicate values - sql

We have 2 tables:
sales
hourt (only 1 field (hourt) of numbers: 0 to 23)
The goal is to list all dates and all 24 hours for each day and group hours that have sales. For hours that do not have sales, zero will be shown.
This query cross joins the sales table with the hourt table and does list all dates and 24 hours. However, there are also many duplicate rows. How can we avoid the duplicates?
We're using Amazon Redshift (based on Postgres 8.0).
with h as (
SELECT
a.purchase_date,
CAST(DATE_PART("HOUR", AT_TIME_ZONE(AT_TIME_ZONE(CAST(a.purchase_date AS
DATETIME), "0:00"), "PST")) as INTEGER) AS Hour,
COUNT(a.quantity) AS QtyCount,
SUM(a.quantity) AS QtyTotal,
SUM((a.price) AS Price
FROM sales a
GROUP BY CAST(DATE_PART("HOUR",
AT_TIME_ZONE(AT_TIME_ZONE(CAST(a.purchase_date AS DATETIME), "0:00"),
"PST")) as INTEGER),
DATE_FORMAT(AT_TIME_ZONE(AT_TIME_ZONE(CAST(a.purchase_date AS DATETIME),
"0:00"), "PST"), "yyyy-MM-dd")
ORDER by a.purchase_date
),
hr as (
SELECT
CAST(hourt AS INTEGER) AS hourt
FROM hourt
),
joined as (
SELECT
purchase_date,
hourt,
QtyCount,
QtyTotal,
Price
FROM h
cross JOIN hr
)
SELECT *
FROM joined
Order by purchase_date,hourt
Sample Tables:
Before the cross join, query returned correct sales and grouped hours, as seen in the below table.
Desired results table:

Need to create a series of all the hour values and left join your data back to that. Comments inline explain the logic.
WITH data AS (-- Do the basic aggregation first
SELECT DATE_TRUNC('hour',a.purchase_date) purchase_hour --Truncate timestamp to the hour is simpler
,COUNT(a.quantity) AS QtyCount
,SUM(a.quantity) AS QtyTotal
,SUM((a.price) AS Price
FROM sales a
GROUP BY DATE_TRUNC('hour',a.purchase_date)
ORDER BY DATE_TRUNC('hour',a.purchase_date)
-- SELECT '2017-01-13 12:00:00'::TIMESTAMP purchase_hour, 1 qty_count, 1 qty_total, 119 price
-- UNION ALL SELECT '2017-01-13 15:00:00'::TIMESTAMP purchase_hour, 1 qty_count, 1 qty_total, 119 price
-- UNION ALL SELECT '2017-01-14 21:00:00'::TIMESTAMP purchase_hour, 1 qty_count, 1 qty_total, 119 price
)
,time_range AS (--Calculate the start and end **date** values
SELECT DATE_TRUNC('day',MIN(purchase_hour)) start_date
, DATE_TRUNC('day',MAX(purchase_hour))+1 end_date
FROM data
)
,hr AS (--Generate all hours between start and end
SELECT (SELECT start_date
FROM time_range
LIMIT 1) --Limit 1 so the optimizer knows it's not a correlated subquery
+ ((n-1) --Make the series start at zero so we don't miss the starting value
* INTERVAL '1 hour') AS "hour"
FROM (SELECT ROW_NUMBER() OVER () n
FROM stl_query --Can use any table here as long as it enough rows
LIMIT 100) series
WHERE "hour" < (SELECT end_date FROM time_range LIMIT 1)
)
--Use NVL to replace missing values with zeroes
SELECT hr.hour AS purchase_hour --Timestamp like `2017-01-13 12:00:00`
, NVL(data.qty_count, 0) AS qty_count
, NVL(data.qty_total, 0) AS qty_total
, NVL(data.price, 0) AS price
FROM hr
LEFT JOIN data
ON hr.hour = data.purchase_hour
ORDER BY hr.hour
;

I achieved the desired results by using Left Join (table A with table B) instead of Cross Join of these two tables:
Table A has all the dates and hours
Table B is the first part of the original query

Related

Detect if a month is missing and insert them automatically with a select statement (MSSQL)

I am trying to write a select statement which detects if a month is not existent and automatically inserts that month with a value 0. It should insert all missing months from the first entry to the last entry.
Example:
My table looks like this:
After the statement it should look like this:
You need a recursive CTE to get all the years in the table (and the missing ones if any) and another one to get all the month numbers 1-12.
A CROSS join of these CTEs will be joined with a LEFT join to the table and finally filtered so that rows prior to the first year/month and later of the last year/month are left out:
WITH
limits AS (
SELECT MIN(year) min_year, -- min year in the table
MAX(year) max_year, -- max year in the table
MIN(DATEFROMPARTS(year, monthnum, 1)) min_date, -- min date in the table
MAX(DATEFROMPARTS(year, monthnum, 1)) max_date -- max date in the table
FROM tablename
),
years(year) AS ( -- recursive CTE to get all the years of the table (and the missing ones if any)
SELECT min_year FROM limits
UNION ALL
SELECT year + 1
FROM years
WHERE year < (SELECT max_year FROM limits)
),
months(monthnum) AS ( -- recursive CTE to get all the month numbers 1-12
SELECT 1
UNION ALL
SELECT monthnum + 1
FROM months
WHERE monthnum < 12
)
SELECT y.year, m.monthnum,
DATENAME(MONTH, DATEFROMPARTS(y.year, m.monthnum, 1)) month,
COALESCE(value, 0) value
FROM months m CROSS JOIN years y
LEFT JOIN tablename t
ON t.year = y.year AND t.monthnum = m.monthnum
WHERE DATEFROMPARTS(y.year, m.monthnum, 1)
BETWEEN (SELECT min_date FROM limits) AND (SELECT max_date FROM limits)
ORDER BY y.year, m.monthnum
See the demo.
You should not be storing date components in two separate columns; instead, you should have just one column, with a proper date-like datatype.
One approach is to use a recursive query to generate all starts of month between the earliest and latest date in the table, then brin the table with a left join.
In SQL Server:
with cte as (
select min(datefromparts(year, monthnum, 1)) as dt,
max(datefromparts(year, monthnum, 1)) as dt_max
from mytable
union all
select dateadd(month, 1, dt)
from cte
where dt < dt_max
)
select c.dt, coalesce(t.value, 0) as value
from cte c
left join mytable t on datefromparts(t.year, t.month, 1) = c.dt
If your data spreads over more that 100 months, you need to add option(maxrecursion 0) at the end of the query.
You can extract the date components in the final select if you like:
select
year(c.dt) as yr,
month(c.dt) as monthnum,
datename(month, c.dt) as monthname,
coalesce(t.value, 0) as value
from ...

Calculate average days between orders The last three records tsql

I trying to take an average per customer, but you're not grouping by customer.
I would like to calculate the average days between several order dates from a table called invoice. For each BusinessPartnerID, what is the average days between orders i want average days last three records orders .
I got the average of all order for each user but need days last three records orders
The sample table is as below
;WITH temp (avg,invoiceid,carname,carid,fullname,mobail)
AS
(
SELECT AvgLag = AVG(Lag) , Lagged.idinvoice,
Lagged.carname ,
Lagged.carid ,Lagged.fullname,Lagged.mobail
FROM
(
SELECT
(car2.Name) as carname ,
(car2.id) as carid ,( busin.Name) as fullname, ( busin.Mobile) as mobail , INV.Id as idinvoice , Lag = CONVERT(int, DATEDIFF(DAY, LAG(Date,1)
OVER (PARTITION BY car2.Id ORDER BY Date ), Date))
FROM [dbo].[Invoice] AS INV
JOIN [dbo].[InvoiceItem] AS INITEM on INV.Id=INITEM.Invoiceid
JOIN [dbo].[BusinessPartner] as busin on busin.Id=INV.BuyerId and Type=5
JOIN [dbo].[Product] as pt on pt.Id=INITEM.ProductId and INITEM.ProductId is not null and pt.ProductTypeId=3
JOIN [dbo].[Car] as car2 on car2.id=INv.BusinessPartnerCarId
) AS Lagged
GROUP BY
Lagged.carname,
Lagged.carid,Lagged.fullname,Lagged.mobail, Lagged.idinvoice
-- order by Lagged.fullname
)
SELECT * FROM temp where avg is not null order by avg
I don't really see how your query relate to your question. Starting from a table called invoice that has columns businesspartnerid, and date, here is how you would take the average of the day difference between the last 3 invoices of each business partner:
select businesspartnerid,
avg(1.0 * datediff(
day,
lag(date) over(partition by businesspartnerid order by date),
date
) avg_diff_day
from (
select i.*,
row_number() over(partiton by businesspartnerid order by date desc) rn
from invoice i
) i
where rn <= 3
group by businesspartnerid
Note that 3 rows gives you 2 intervals only, that will be averaged.

Calculating business days in Teradata

I need help in business days calculation.
I've two tables
1) One table ACTUAL_TABLE containing order date and contact date with timestamp datatypes.
2) The second table BUSINESS_DATES has each of the calendar dates listed and has a flag to indicate weekend days.
using these two tables, I need to ensure business days and not calendar days (which is the current logic) is calculated between these two fields.
My thought process was to first get a range of dates by comparing ORDER_DATE with TABLE_DATE field and then do a similar comparison of CONTACT_DATE to TABLE_DATE field. This would get me a range from the BUSINESS_DATES table which I can then use to calculate count of days, sum(Holiday_WKND_Flag) fields making the result look like:
Order# | Count(*) As DAYS | SUM(WEEKEND DATES)
100 | 25 | 8
However this only works when I use a specific order number and cant' bring all order numbers in a sub query.
My Query:
SELECT SUM(Holiday_WKND_Flag), COUNT(*) FROM
(
SELECT
* FROM
BUSINESS_DATES
WHERE BUSINESS.Business BETWEEN (SELECT ORDER_DATE FROM ACTUAL_TABLE
WHERE ORDER# = '100'
)
AND
(SELECT CONTACT_DATE FROM ACTUAL_TABLE
WHERE ORDER# = '100'
)
TEMP
Uploading the table structure for your reference.
SELECT ORDER#, SUM(Holiday_WKND_Flag), COUNT(*)
FROM business_dates bd
INNER JOIN actual_table at ON bd.table_date BETWEEN at.order_date AND at.contact_date
GROUP BY ORDER#
Instead of joining on a BETWEEN (which always results in a bad Product Join) followed by a COUNT you better assign a bussines day number to each date (in best case this is calculated only once and added as a column to your calendar table). Then it's two Equi-Joins and no aggregation needed:
WITH cte AS
(
SELECT
Cast(table_date AS DATE) AS table_date,
-- assign a consecutive number to each busines day, i.e. not increased during weekends, etc.
Sum(CASE WHEN Holiday_WKND_Flag = 1 THEN 0 ELSE 1 end)
Over (ORDER BY table_date
ROWS Unbounded Preceding) AS business_day_nbr
FROM business_dates
)
SELECT ORDER#,
Cast(t.contact_date AS DATE) - Cast(t.order_date AS DATE) AS #_of_days
b2.business_day_nbr - b1.business_day_nbr AS #_of_business_days
FROM actual_table AS t
JOIN cte AS b1
ON Cast(t.order_date AS DATE) = b1.table_date
JOIN cte AS b2
ON Cast(t.contact_date AS DATE) = b2.table_date
Btw, why are table_date and order_date timestamp instead of a date?
Porting from Oracle?
You can use this query. Hope it helps
select order#,
order_date,
contact_date,
(select count(1)
from business_dates_table
where table_date between a.order_date and a.contact_date
and holiday_wknd_flag = 0
) business_days
from actual_table a

How to select all dates in SQL query

SELECT oi.created_at, count(oi.id_order_item)
FROM order_item oi
The result is the follwoing:
2016-05-05 1562
2016-05-06 3865
2016-05-09 1
...etc
The problem is that I need information for all days even if there were no id_order_item for this date.
Expected result:
Date Quantity
2016-05-05 1562
2016-05-06 3865
2016-05-07 0
2016-05-08 0
2016-05-09 1
You can't count something that is not in the database. So you need to generate the missing dates in order to be able to "count" them.
SELECT d.dt, count(oi.id_order_item)
FROM (
select dt::date
from generate_series(
(select min(created_at) from order_item),
(select max(created_at) from order_item), interval '1' day) as x (dt)
) d
left join order_item oi on oi.created_at = d.dt
group by d.dt
order by d.dt;
The query gets the minimum and maximum date form the existing order items.
If you want the count for a specific date range you can remove the sub-selects:
SELECT d.dt, count(oi.id_order_item)
FROM (
select dt::date
from generate_series(date '2016-05-01', date '2016-05-31', interval '1' day) as x (dt)
) d
left join order_item oi on oi.created_at = d.dt
group by d.dt
order by d.dt;
SQLFiddle: http://sqlfiddle.com/#!15/49024/5
Friend, Postgresql Count function ignores Null values. It literally does not consider null values in the column you are searching. For this reason you need to include oi.created_at in a Group By clause
PostgreSql searches row by row sequentially. Because an integral part of your query is Count, and count basically stops the query for that row, your dates with null id_order_item are being ignored. If you group by oi.created_at this column will trump the count and return 0 values for you.
SELECT oi.created_at, count(oi.id_order_item)
FROM order_item oi
Group by io.created_at
From TechontheNet (my most trusted source of information):
Because you have listed one column in your SELECT statement that is not encapsulated in the count function, you must use a GROUP BY clause. The department field must, therefore, be listed in the GROUP BY section.
Some info on Count in PostgreSql
http://www.postgresqltutorial.com/postgresql-count-function/
http://www.techonthenet.com/postgresql/functions/count.php
Solution #1 You need Date Table where you stored all date data. Then do a left join depending on period.
Solution #2
WITH DateTable AS
(
SELECT DATEADD(dd, 1, CONVERT(DATETIME, GETDATE())) AS CreateDateTime, 1 AS Cnter
UNION ALL
SELECT DATEADD(dd, -1, CreateDateTime), DateTable.Cnter + 1
FROM DateTable
WHERE DateTable.Cnter + 1 <= 5
)
Generate Temporary table based on your input and then do a left Join.

Select Consecutive Numbers in SQL

This feels simple, but I can't find an answer anywhere.
I'm trying to run a query by time of day for each hour. So I'm doing a Group By on the hour part, but not all hours have data, so there are some gaps. I'd like to display every hour, regardless of whether or not there's data.
Here's a sample query:
SELECT DATEPART(HOUR, DATEADD(HH,-5, CreationDate)) As Hour,
COUNT(*) AS Count
FROM Comments
WHERE UserId = ##UserId##
GROUP BY DATEPART(HOUR, DATEADD(HH,-5, CreationDate))
My thought was to Join to a table that already had numbers 1 through 24 so that the incoming data would get put in it's place.
Can I do this with a CTE?
WITH Hours AS (
SELECT i As Hour --Not Sure on this
FROM [1,2,3...24]), --Not Sure on this
CommentTimes AS (
SELECT DATEPART(HOUR, DATEADD(HH,-5, CreationDate)) AS Hour,
COUNT(*) AS Count
FROM Comments
WHERE UserId = ##UserId##
GROUP BY DATEPART(HOUR, DATEADD(HH,-5, CreationDate))
)
SELECT h.Hour, c.Count
FROM Hours h
JOIN CommentTimes c ON h.Hour = c.Hour
###Here's a sample Query From Stack Exchange Data Explorer
You can use a recursive query to build up a table of whatever numbers you want. Here we stop at 24. Then left join that to your comments to ensure every hour is represented. You can turn these into times easily if you wanted. I also changed your use of hour as a column name as it is a keyword.
;with dayHours as (
select 1 as HourValue
union all select hourvalue + 1
from dayHours
where hourValue < 24
)
,
CommentTimes As (
SELECT DATEPART(HOUR, DATEADD(HH,-5, CreationDate)) As HourValue,
COUNT(*) AS Count
FROM Comments
WHERE UserId = ##UserId##
GROUP BY DATEPART(HOUR, DATEADD(HH,-5, CreationDate)))
SELECT h.Hour, c.Count
FROM dayHours h
left JOIN CommentTimes c ON h.HourValue = c.HourValue
You can use a table value constructor:
with hours as (
SELECT hr
FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11), (12)) AS b(hr)
)
etc..
You can also use a permanent auxilliary numbers table.
http://dataeducation.com/you-require-a-numbers-table/
Use a recursive CTE to generate the hours:
with hours as (
select 1 as hour
union all
select hour + 1
from hours
where hour < 24
)
. . .
Then your full query needs a left outer join:
with hours as (
select 1 as hour
union all
select hour + 1
from hours
where hour < 24
)
CommentTimes As (
SELECT DATEPART(HOUR, DATEADD(HH,-5, CreationDate)) As Hour,
COUNT(*) AS Count
FROM Comments
WHERE UserId = ##UserId##
GROUP BY DATEPART(HOUR, DATEADD(HH,-5, CreationDate))
)
SELECT h.Hour, c.Count
FROM Hours h LEFT OUTER JOIN
CommentTimes c
ON h.Hour = c.Hour;
Below is demo without using recursive CTE for sql-server
select h.hour ,c.count
from (
select top 24 number + 1 as hour from master..spt_values
where type = 'P'
) h
left join (
select datepart(hour, creationdate) as hour,count(1) count
from comments
where userid = '9131476'
group by datepart(hour, creationdate)
) c on h.hour = c.hour
order by h.hour;
online demo link : consecutive number query demo - Stack Exchange Data Explorer
The basic idea is correct, but you will want to perform a left join instead of a standard join. The reason for the left join is because you want the answers from the left-hand side.
With respect to how to create the original hours table, you can either directly create it with something like:
SELECT 1 as hour
UNION ALL
SELECT 2 as hour
...
UNION ALL
SELECT 24 as hour
or, you can create a permanent table populated with these values. (I do not recall immediately on SqlServer if there is a better way to do this, or if selecting a value but not from a table is allowed. On Oracle, you could select from the built-in table 'dual' which is a table containing a single row).
As a more general abstraction of this issue, you can create consecutive numbers Brad and Gordon have suggested with a recursive CTE like this:
WITH Numbers AS (
SELECT 1 AS Number
UNION ALL SELECT Number + 1
FROM Numbers
WHERE Number < 1000
)
SELECT * FROM Numbers
OPTION (MaxRecursion 0)
As a note, if you plan to go over 100 numbers, you'll need to add OPTION (MaxRecursion 0) to the end of your query to prevent the error The maximum recursion 100 has been exhausted before statement completion
This technique can commonly be seen when populating or using a Tally Table in TSQL