Interpolate missing values in a query by date - sql

Given the following query
SELECT
DATEADD(DAY, DATEDIFF(DAY, 0, [Created]), 0) [Date],
[Type], COUNT(*) as [Total]
FROM
Submissions
WHERE
[Offer] = 'template1'
GROUP BY
DATEADD(DAY, DATEDIFF(DAY, 0, [Created]), 0),
[Type]
ORDER BY 1;
I get the following output:
Date Type Total
----------------------- -------------------- -----------
2021-04-30 00:00:00.000 Online 1
2021-05-01 00:00:00.000 Mail 1
2021-05-01 00:00:00.000 Online 2
2021-05-10 00:00:00.000 Mail 1
My goal is to ensure that for each date, both types are summarized. If no rows exist for a given type, I'd like to show 0 instead of omitting the row entirely. How can I rewrite the query so that, for example, 2 rows exist for 2021-04-30: one with type Online as shown, and one with type Mail with a total of 0?
I got it working using something like below, but this seems like a pretty brute force way of going about it.
SELECT [Date], [Type], [Total] FROM
(
SELECT
DATEADD(DAY, DATEDIFF(DAY, 0, [Created]), 0) [Date],
[Type]
FROM
Submissions
WHERE [Offer] = 'template1'
) t1
PIVOT (
COUNT([Type])
FOR [Type] in ([Mail],[Online])
) p
UNPIVOT
(
[Total] FOR [Type] in ([Mail],[Online])
) p2
This results in what I am looking for:
Date Type Total
----------------------- ------------------- -----------
2021-04-30 00:00:00.000 Mail 0
2021-04-30 00:00:00.000 Online 1
2021-05-01 00:00:00.000 Mail 1
2021-05-01 00:00:00.000 Online 2
2021-05-10 00:00:00.000 Mail 1
2021-05-10 00:00:00.000 Online 0

Even your brute force approach doesn't work if the submission table has no rows for a particular date.
The standard approach is to use dimension tables to create a template of all the rows you desire, then left join your fact table on to it.
SELECT
calendar.date,
type.label,
COUNT(fact.id)
FROM
calendar
CROSS JOIN
type
LEFT JOIN
submissions AS fact
ON fact.created >= calendar.date
AND fact.created < DATEADD(DAY, 1, calendar.date)
AND fact.type = type.label
AND fact.offer = 'template1'
WHERE
calendar.date BETWEEN ? AND ?
AND type.label IN ('Mail', 'Online')
GROUP BY
calendar.date,
type.label
Please excuse typos, I'm on my phone
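To make the calendar/type template idea concrete, here is a minimal runnable sketch. It uses SQLite through Python's sqlite3 module as a stand-in for SQL Server, so the date arithmetic is SQLite's `date(..., '+1 day')` rather than `DATEADD`. The table and column names (`calendar.day`, `submission_type.label`, `submissions.id`) and the sample rows are assumptions made for illustration, not the asker's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical dimension tables: one row per day, one row per type.
    CREATE TABLE calendar (day TEXT PRIMARY KEY);
    CREATE TABLE submission_type (label TEXT PRIMARY KEY);
    -- Hypothetical fact table.
    CREATE TABLE submissions (
        id INTEGER PRIMARY KEY, created TEXT, type TEXT, offer TEXT
    );

    INSERT INTO calendar VALUES ('2021-04-30'), ('2021-05-01'), ('2021-05-02');
    INSERT INTO submission_type VALUES ('Mail'), ('Online');
    INSERT INTO submissions (created, type, offer) VALUES
        ('2021-04-30 09:15:00', 'Online', 'template1'),
        ('2021-05-01 10:00:00', 'Mail',   'template1'),
        ('2021-05-01 11:30:00', 'Online', 'template1'),
        ('2021-05-01 16:45:00', 'Online', 'template1'),
        ('2021-05-01 17:00:00', 'Online', 'other');
""")

# Build every (day, type) pair first, then left join the facts onto it.
# COUNT(s.id) counts only matched rows, so unmatched pairs yield 0.
rows = conn.execute("""
    SELECT c.day, t.label, COUNT(s.id) AS total
    FROM calendar AS c
    CROSS JOIN submission_type AS t
    LEFT JOIN submissions AS s
           ON s.created >= c.day
          AND s.created <  date(c.day, '+1 day')
          AND s.type  = t.label
          AND s.offer = 'template1'
    GROUP BY c.day, t.label
    ORDER BY c.day, t.label
""").fetchall()

for row in rows:
    print(row)
```

Note that the filter on `offer` sits in the `ON` clause, not in `WHERE`. Moving it to `WHERE` would discard the unmatched template rows and the zero counts with them.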

Related

how to aggregate one record multiple times based on condition

I have a bunch of records in the table below.
product_id produced_date expired_date
123 2010-02-01 2012-05-31
234 2013-03-01 2014-08-04
345 2012-05-01 2018-02-25
... ... ...
I want the output to show how many unexpired products we have in each month. (Say, if a product expires on August 04, we still count it in August stock.)
Month n_products
2010-02-01 10
2010-03-01 12
...
2022-07-01 25
2022-08-01 15
How should I do this in Presto or Hive? Thank you!
You can use below SQL.
Here we use CASE WHEN to check whether a product is expired (produced_date >= expired_date); if it is, we add it to the count of expired products, and then group that data by expiry month.
select
TRUNC(expired_date, 'MM') expired_month,
SUM( case when produced_date >= expired_date then 1 else 0 end) n_products
from mytable
group by 1
We can use the unnest and sequence functions to create a derived table of months; joining our table with this derived table should give the desired result.
select m.month, count(t.product_id) as n_products
from (
select x as month
from (
select min(date_trunc('month', produced_date)) as lo,
max(date_trunc('month', expired_date)) as hi
from mytable
) r
cross join unnest(sequence(r.lo, r.hi, interval '1' month)) as u(x)
) m
left join mytable t
on m.month >= date_trunc('month', t.produced_date)
and m.month <= t.expired_date
group by 1
order by 1
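The month-series approach can be tried without a Presto cluster. The sketch below swaps in a different technique: SQLite (via Python's sqlite3) has no `sequence`/`unnest`, so a recursive CTE generates the months instead. The table name `products` and the three sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER, produced_date TEXT, expired_date TEXT);
    INSERT INTO products VALUES
        (123, '2022-05-01', '2022-06-15'),
        (234, '2022-06-01', '2022-08-04'),
        (345, '2022-07-01', '2022-07-31');
""")

rows = conn.execute("""
    WITH RECURSIVE
    bounds AS (
        -- First month of production and last month of expiry.
        SELECT date(MIN(produced_date), 'start of month') AS lo,
               date(MAX(expired_date),  'start of month') AS hi
        FROM products
    ),
    months(month) AS (
        -- One row per month start between lo and hi, inclusive.
        SELECT lo FROM bounds
        UNION ALL
        SELECT date(month, '+1 month') FROM months, bounds WHERE month < hi
    )
    SELECT m.month, COUNT(p.product_id) AS n_products
    FROM months AS m
    LEFT JOIN products AS p
           ON m.month >= date(p.produced_date, 'start of month')
          AND m.month <= p.expired_date
    GROUP BY m.month
    ORDER BY m.month
""").fetchall()

for row in rows:
    print(row)
```

Product 234 expires on 2022-08-04 and is still counted for August, because the month start 2022-08-01 is not later than its expiry date.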

Using Distinct and MAX(date) in a large data

I have a table that stores the list of users who have accessed a product(with the accessed date).
I have written the below query to get the list of users who have accessed the product B between '2021-02-01' and '2021-02-26'.
SELECT DISTINCT UserName, Country, ADate, Product FROM Report WHERE UserName != '-' AND Product = 'B' AND (CAST(ADate AS DATE) BETWEEN @startdate AND @enddate)
then it gives the below result:
UserName Country ADate Product
-------- ------ -------- ---------
asson IN 2021-02-10 00:00:00.000 B
rajan US 2021-02-23 00:00:00.000 B
rajan US 2021-02-25 00:00:00.000 B
moody US 2021-02-14 00:00:00.000 B
rajon US 2021-02-01 00:00:00.000 B
lukman US 2021-02-10 00:00:00.000 B
Since the user rajan accessed the product on 2 different days, the result shows 2 entries for rajan even though I added DISTINCT. So I modified the query as below:
SELECT DISTINCT UserName,Country,max(ADate),Product FROM Report WHERE UserName != '-' and Product='B' and (CAST(ADate AS DATE) BETWEEN @startdate AND @enddate) group by Username,product
This query gives me the required result. But the problem I am facing now is that when I query a date range longer than a month (say, data spanning 2 months), I miss some data in the output. I believe it might be due to the MAX(ADate). Can anyone suggest a good way to get rid of this issue?
This will give you the latest access date of each user by month:
SELECT UserName, Country, MONTH(ADate) AS [month], MAX(ADate) AS LastAccess, Product FROM Report WHERE UserName != '-' AND Product = 'B' GROUP BY UserName, Country, MONTH(ADate), Product

Finding records that are only before a given date (SQL)

I've tried multiple queries but none of them work.
It's probably really simple.
Here's an example table :
ordernr debnaam debnr orddat
1 Coca-Cola 123 2019-02-07
12 Altec 456 2019-02-07
123 Coca-Cola 123 2016-01-01
1234 Brady 789 2015-03-18
So the point is to find the clients (debnaam) that haven't ordered in the last 2 years.
In my example the only record should be Brady.
I've tried following query :
SELECT a.ordernr, a.debnaam, a.debnr, a.orddat
FROM orkrg as a
WHERE NOT EXISTS(SELECT b.debnr
FROM orkrg as b
WHERE a.ordernr = b.ordernr
AND b.orddat > CONVERT(date, dateadd(year,-2,getdate())))
Or with a Left outer join :
SELECT *
FROM (
SELECT orkrg.ordernr, orkrg.debnaam, orkrg.debnr, orkrg.orddat
FROM orkrg
WHERE orkrg.orddat < CONVERT(date, dateadd(year,-2,getdate()))
) AS a
LEFT OUTER JOIN
(
SELECT orkrg.ordernr, orkrg.debnaam, orkrg.debnr, orkrg.orddat
FROM orkrg
WHERE orkrg.orddat > CONVERT(date, dateadd(year,-2,getdate()))
) as b
ON a.ordernr = b.ordernr
But I always get following result :
ordernr debnaam debnr orddat
123 Coca-Cola 123 2016-01-01
1234 Brady 789 2015-03-18
Could someone please help me?
Thanks!
select a.*
from orkrg as a
where a.orddat < dateadd(year,-2,getdate()) -- this is kinda not needed
and not exists (select 1 -- NOT EXISTS is a safer option than NOT IN, where a null result can cause issues
from orkrg as b
where a.debnaam = b.debnaam and
b.orddat > dateadd(year,-2,getdate()))
I use not exists over not in as default, see here for why
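The NULL pitfall mentioned above can be reproduced in a few lines. This is a sketch using SQLite through Python's sqlite3, not SQL Server. It makes three assumptions that are not in the question: the cutoff is a fixed date instead of `dateadd(year,-2,getdate())`, customers are matched on `debnr` rather than `debnaam`, and an extra recent order with a NULL `debnr` has been added to the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orkrg (ordernr INTEGER, debnaam TEXT, debnr INTEGER, orddat TEXT);
    INSERT INTO orkrg VALUES
        (1,     'Coca-Cola', 123,  '2019-02-07'),
        (12,    'Altec',     456,  '2019-02-07'),
        (123,   'Coca-Cola', 123,  '2016-01-01'),
        (1234,  'Brady',     789,  '2015-03-18'),
        (12345, 'Unknown',   NULL, '2019-01-15');  -- hypothetical row with a NULL key
""")

CUTOFF = "2017-02-08"  # assumed fixed cutoff, two years before the sample "today"

# NOT EXISTS: the correlated subquery finds no recent order for Brady.
not_exists_rows = conn.execute("""
    SELECT a.debnaam
    FROM orkrg AS a
    WHERE a.orddat < ?
      AND NOT EXISTS (SELECT 1
                      FROM orkrg AS b
                      WHERE b.debnr = a.debnr
                        AND b.orddat > ?)
""", (CUTOFF, CUTOFF)).fetchall()

# NOT IN: the subquery returns a NULL, so every comparison evaluates to
# unknown and no row passes the filter.
not_in_rows = conn.execute("""
    SELECT a.debnaam
    FROM orkrg AS a
    WHERE a.orddat < ?
      AND a.debnr NOT IN (SELECT b.debnr FROM orkrg AS b WHERE b.orddat > ?)
""", (CUTOFF, CUTOFF)).fetchall()

print(not_exists_rows)
print(not_in_rows)
```

With the NULL row present, only the NOT EXISTS form returns Brady; the NOT IN form returns nothing.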
You need to use DATEDIFF() to filter out the older dates:
SELECT a.ordernr, a.debnaam, a.debnr, a.orddat
FROM orkrg as a
WHERE DATEDIFF(year, a.orddat, GETDATE()) > 2
AND A.debnr NOT IN (SELECT b.debnr FROM orkrg as b WHERE
DATEDIFF(year, b.orddat, GETDATE()) <= 2)
select *
from orders as o
where o.debnr not in (select debnr
from orders as u
where orddat > CONVERT(date, dateadd(year,-2,getdate())))

SQL Query - Design struggle

I am fairly new to SQL Server (2012) but I was assigned the project where I have to use it.
The database consists of one table (counted in millions of rows) which looks mainly like this:
Number (float) Date (datetime) Status (nvarchar(255))
999 2016-01-01 14:00:00.000 Error
999 2016-01-02 14:00:00.000 Error
999 2016-01-03 14:00:00.000 Ok
999 2016-01-04 14:00:00.000 Error
888 2016-01-01 14:00:00.000 Error
888 2016-01-02 14:00:00.000 Ok
888 2016-01-03 14:00:00.000 Error
888 2016-01-04 14:00:00.000 Error
777 2016-01-01 14:00:00.000 Error
777 2016-01-02 14:00:00.000 Error
I have to create a query which will show me only the phone numbers (one number per row, so probably GROUP BY number?) that meet these conditions:
The number appears at least 3 times
The last two occurrences (based on date; the records are not originally sorted by date) both have the status Error
For example, in the table above the only phone number that meets the criteria is 888, because for 999 the 2nd newest status is Ok and number 777 occurs only 2 times.
I will appreciate any kind of help!
Thanks in advance!
You can use row_number() and conditional aggregation:
select number
from (select t.*,
row_number() over (partition by number order by date desc) as seqnum
from t
) t
group by number
having count(*) >= 3 and
max(case when seqnum = 1 then status end) = 'Error' and
max(case when seqnum = 2 then status end) = 'Error';
Note: float is a really, really bad type to use for the "number" column. In particular, two numbers can look the same but differ in low-order bits. They will produce different rows in the group by.
You should probably use varchar() for telephone numbers. That gives you the most flexibility. If you need to store the number as a number, then decimal/numeric is a much, much better choice than float.
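The row_number() plus conditional aggregation query can be checked against the sample data from the question. The sketch below runs it on SQLite through Python's sqlite3, which needs SQLite 3.25 or newer for window functions. The table name `calls` is an assumption, the number is stored as text as suggested above, and the rows are deliberately inserted out of date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calls (number TEXT, date TEXT, status TEXT);
    INSERT INTO calls VALUES
        ('999', '2016-01-03 14:00:00', 'Ok'),
        ('999', '2016-01-01 14:00:00', 'Error'),
        ('999', '2016-01-04 14:00:00', 'Error'),
        ('999', '2016-01-02 14:00:00', 'Error'),
        ('888', '2016-01-04 14:00:00', 'Error'),
        ('888', '2016-01-02 14:00:00', 'Ok'),
        ('888', '2016-01-01 14:00:00', 'Error'),
        ('888', '2016-01-03 14:00:00', 'Error'),
        ('777', '2016-01-02 14:00:00', 'Error'),
        ('777', '2016-01-01 14:00:00', 'Error');
""")

# seqnum = 1 is the newest row per number, seqnum = 2 the second newest.
# The HAVING clause then checks the row count and both statuses per number.
rows = conn.execute("""
    SELECT number
    FROM (SELECT c.*,
                 ROW_NUMBER() OVER (PARTITION BY number ORDER BY date DESC) AS seqnum
          FROM calls AS c) AS ranked
    GROUP BY number
    HAVING COUNT(*) >= 3
       AND MAX(CASE WHEN seqnum = 1 THEN status END) = 'Error'
       AND MAX(CASE WHEN seqnum = 2 THEN status END) = 'Error'
""").fetchall()

print(rows)
```

Number 999 is rejected because its second newest status is Ok, and 777 because it has only two rows.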
select ABC.Number
from
(
select Number, Status,
ROW_NUMBER() OVER(partition by Number order by [Date] desc) as times
from MyTable
where Number in
(
select Number
from MyTable
group by Number
having count(*) >= 3
)
) as ABC
WHERE ABC.times in (1,2) and ABC.Status = 'Error'
group by ABC.Number
having count(*) = 2
with CTE as
(
select t1.*, row_number() over(partition by t1.Number order by t1.date desc) as r_ord
from MyTable t1
)
select C1.*
from CTE C1
inner join
(
select Number
from CTE
group by Number
having max(r_ord) >=3
) C2
on C1.Number = C2.Number
where C1.r_ord in (1,2)
and C1.Status = 'Error'

Fill rows for missing data by last day of month

I have a table that looks like
UserID LastDayofMonth Count
1234 2015-09-30 00:00:00 12
1237 2015-09-30 00:00:00 5
3233 2015-09-30 00:00:00 3
8336 2015-09-30 00:00:00 22
1234 2015-10-31 00:00:00 8
1237 2015-10-31 00:00:00 5
3233 2015-10-31 00:00:00 7
8336 2015-11-30 00:00:00 52
1234 2015-11-30 00:00:00 8
1237 2015-11-30 00:00:00 5
3233 2015-11-30 00:00:00 7
(with around 10,000 rows). As you can see in the example, UserID 8336 has no record for October 31st (dates are monthly but always the last day of the month, which I want to keep). How do I return a table that fills in the missing records over a period of four months, so that users like 8336 get records like
8336 2015-10-31 00:00:00 0
I do have a calendar table with all days that I can use.
If I understand correctly, you want a record for each user and for each end of month. And, if the record does not currently exist, then you want the value of 0.
This is a two-step process. Generate all the rows first, using cross join. Then use left join to get the values.
So:
select u.userId, l.LastDayofMonth, coalesce(t.[Count], 0) as cnt
from (select distinct userId from t) u cross join
(select distinct LastDayofMonth from t) l left join
t
on t.userId = u.userId and t.LastDayofMonth = l.LastDayofMonth;
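A quick way to see the two steps in action is to run them on a cut-down copy of the data. This sketch uses SQLite through Python's sqlite3; the table name `monthly` and its column names are assumptions, and only two of the users are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly (user_id INTEGER, last_day TEXT, cnt INTEGER);
    INSERT INTO monthly VALUES
        (1234, '2015-09-30', 12),
        (8336, '2015-09-30', 22),
        (1234, '2015-10-31', 8),
        (1234, '2015-11-30', 8),
        (8336, '2015-11-30', 52);   -- 8336 has no October row
""")

# Step 1: every user paired with every month end that occurs in the data.
# Step 2: left join the real rows and replace missing counts with 0.
rows = conn.execute("""
    SELECT u.user_id, l.last_day, COALESCE(m.cnt, 0) AS cnt
    FROM (SELECT DISTINCT user_id FROM monthly) AS u
    CROSS JOIN (SELECT DISTINCT last_day FROM monthly) AS l
    LEFT JOIN monthly AS m
           ON m.user_id = u.user_id
          AND m.last_day = l.last_day
    ORDER BY l.last_day, u.user_id
""").fetchall()

for row in rows:
    print(row)
```

The month ends come from the data itself here, so a month in which no user has a row would not appear in the output.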
This solution uses a couple of CTEs, not knowing your calendar table layout. The only advantage this solution has over Gordon Linoff's is it doesn't assume at least one user per possible month. I've provided test data per your example with an extra record for the month of July, skipping August entirely.
/************** TEST DATA ******************/
IF OBJECT_ID('MonthlyUserCount','U') IS NULL
BEGIN
CREATE TABLE MonthlyUserCount
(
UserID INT
, LastDayofMonth DATETIME
, [Count] INT
)
INSERT MonthlyUserCount
VALUES (1234,'2015-07-31 00:00:00',12),--extra record
(1234,'2015-09-30 00:00:00',12),
(1237,'2015-09-30 00:00:00',5),
(3233,'2015-09-30 00:00:00',3),
(8336,'2015-09-30 00:00:00',22),
(1234,'2015-10-31 00:00:00',8),
(1237,'2015-10-31 00:00:00',5),
(3233,'2015-10-31 00:00:00',7),
(8336,'2015-11-30 00:00:00',52),
(1234,'2015-11-30 00:00:00',8),
(1237,'2015-11-30 00:00:00',5),
(3233,'2015-11-30 00:00:00',7)
END
/************ END TEST DATA ***************/
DECLARE @Start DATETIME;
DECLARE @End DATETIME;
--establish a date range
SELECT @Start = MIN(LastDayofMonth) FROM MonthlyUserCount;
SELECT @End = MAX(LastDayofMonth) FROM MonthlyUserCount;
--create a custom calendar of days using the date range above and identify the last day of the month
--if your calendar table does this already, modify the next cte to mimic this functionality
WITH cteAllDays AS
(
SELECT @Start AS [Date], CASE WHEN DATEPART(mm, @Start) <> DATEPART(mm, @Start+1) THEN 1 ELSE 0 END [Last]
UNION ALL
SELECT [Date]+1, CASE WHEN DATEPART(mm,[Date]+1) <> DatePart(mm, [Date]+2) THEN 1 ELSE 0 END
FROM cteAllDays
WHERE [Date]< @End
),
--cte using calendar of days to associate every user with every end of month
cteUserAllDays AS
(
SELECT DISTINCT m.UserID, c.[Date] LastDayofMonth
FROM MonthlyUserCount m, cteAllDays c
WHERE [Last]=1
)
--left join the cte to evaluate the NULL and present a 0 count for that month
SELECT c.UserID, c.LastDayofMonth, ISNULL(m.[Count],0) [Count]
FROM cteUserAllDays c
LEFT JOIN MonthlyUserCount m ON m.UserID = c.UserID
AND m.LastDayofMonth =c.LastDayofMonth
ORDER BY c.LastDayofMonth, c.UserID
OPTION ( MAXRECURSION 0 )