SQL: Filling in Missing Records with Conditional - sql

I need to count the number of products that existed in inventory by date. In the database however, a product is only recorded when it was viewed by a consumer.
For example consider this basic table structure:
date | productId | views
July 1 | A | 8
July 2 | A | 6
July 2 | B | 4
July 3 | A | 2
July 4 | A | 8
July 4 | B | 6
July 4 | C | 4
July 5 | C | 2
July 10 | A | 17
Using the following query, I attempt to determine the number of products in inventory on a given date:
select date, count(distinct productId) as Inventory, sum(views) as views
from (
select date, productId, count(*) as views
from SomeTable
group by date, productID
order by date asc, productID asc
)
group by date
This is the output
date | Inventory | views
July 1 | 1 | 8
July 2 | 2 | 10
July 3 | 1 | 2
July 4 | 3 | 18
July 5 | 1 | 2
July 10 | 1 | 17
My output is not an accurate reflection of how many products were in inventory due to missing rows.
The correct understanding of inventory is as follows:
- Product A was present in inventory from July 1 - July 10.
- Product B was present in inventory from July 2 - July 4.
- Product C was in inventory from July 4 - July 5.
The correct SQL output should be:
date | Inventory | views
July 1 | 1 | 8
July 2 | 2 | 10
July 3 | 2 | 2
July 4 | 3 | 18
July 5 | 2 | 2
July 6 | 1 | 0
July 7 | 1 | 0
July 8 | 1 | 0
July 9 | 1 | 0
July 10 | 1 | 17
If you are following along, let me confirm that I am comfortable defining "in inventory" as the date difference between the first & last view.
I have followed the following faulty process:
First I created a table which was the cartesian product of every productID & every date.
with Dates as (
select date
from SomeTable
group by date
),
Products as (
select productId
from SomeTable
group by productId
)
select Dates.date, Products.productId
from Dates cross join Products
Then I attempted to do a right outer join to reduce this to just the missing records:
with Records as (
select date, productId, count(*) as views
from SomeTable
group by date, productId
),
Cartesian as (
{See query above}
)
Select Cartesian.date, Cartesian.productId, 0 as views #for upcoming union
from Cartesian right outer join Records
on Cartesian.date = Records.date
where Records.productId is null
Then, with the missing rows in hand, I union them back onto the Records.
In doing so, I create a new problem: extra rows.
date | productId | views
July 1 | A | 8
July 1 | B | 0
July 1 | C | 0
July 2 | A | 6
July 2 | B | 4
July 2 | C | 0
July 3 | A | 2
July 3 | B | 0
July 3 | C | 0
July 4 | A | 8
July 4 | B | 6
July 4 | C | 4
July 5 | A | 0
July 5 | B | 0
July 5 | C | 2
July 6 | A | 0
July 6 | B | 0
July 6 | C | 0
July 7 | A | 0
July 7 | B | 0
July 7 | C | 0
July 8 | A | 0
July 8 | B | 0
July 8 | C | 0
July 9 | A | 0
July 9 | B | 0
July 9 | C | 0
July 10 | A | 17
July 10 | B | 0
July 10 | C | 0
And when I run my simple query
select date, count(distinct productId) as Inventory, sum(views) as views
on that table I get the wrong output again:
date | Inventory | views
July 1 | 3 | 8
July 2 | 3 | 10
July 3 | 3 | 2
July 4 | 3 | 18
July 5 | 3 | 2
July 6 | 3 | 0
July 7 | 3 | 0
July 8 | 3 | 0
July 9 | 3 | 0
July 10 | 3 | 17
My next thought would be to iterate through each productId, determine its first & last date, then union that with the Cartesian table with the condition that Cartesian.date falls between the first & last date for each specific product.
There's got to be an easier way to do this. Thanks.

Below is for BigQuery Standard SQL
#standardSQL
WITH dates AS (
SELECT day FROM (
SELECT MIN(day) min_day, MAX(day) max_day
FROM `project.dataset.table`
), UNNEST(GENERATE_DATE_ARRAY(min_day, max_day, INTERVAL 1 DAY)) day
), ranges AS (
SELECT productId, MIN(day) min_day, MAX(day) max_day
FROM `project.dataset.table` t
GROUP BY productId
)
SELECT day, COUNT(DISTINCT productId) Inventory, SUM(IFNULL(views, 0)) views
FROM dates d, ranges r
LEFT JOIN `project.dataset.table` USING(day, productId)
WHERE day BETWEEN min_day AND max_day
GROUP BY day
You can test the above with the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT DATE '2019-07-01' day, 'A' productId, 8 views UNION ALL
SELECT '2019-07-02', 'A', 6 UNION ALL
SELECT '2019-07-02', 'B', 4 UNION ALL
SELECT '2019-07-03', 'A', 2 UNION ALL
SELECT '2019-07-04', 'A', 8 UNION ALL
SELECT '2019-07-04', 'B', 6 UNION ALL
SELECT '2019-07-04', 'C', 4 UNION ALL
SELECT '2019-07-05', 'C', 2 UNION ALL
SELECT '2019-07-10', 'A', 17
), dates AS (
SELECT day FROM (
SELECT MIN(day) min_day, MAX(day) max_day
FROM `project.dataset.table`
), UNNEST(GENERATE_DATE_ARRAY(min_day, max_day, INTERVAL 1 DAY)) day
), ranges AS (
SELECT productId, MIN(day) min_day, MAX(day) max_day
FROM `project.dataset.table` t
GROUP BY productId
)
SELECT day, COUNT(DISTINCT productId) Inventory, SUM(IFNULL(views, 0)) views
FROM dates d, ranges r
LEFT JOIN `project.dataset.table` USING(day, productId)
WHERE day BETWEEN min_day AND max_day
GROUP BY day
-- ORDER BY day
The result is:
Row | day | Inventory | views
1 | 2019-07-01 | 1 | 8
2 | 2019-07-02 | 2 | 10
3 | 2019-07-03 | 2 | 2
4 | 2019-07-04 | 3 | 18
5 | 2019-07-05 | 2 | 2
6 | 2019-07-06 | 1 | 0
7 | 2019-07-07 | 1 | 0
8 | 2019-07-08 | 1 | 0
9 | 2019-07-09 | 1 | 0
10 | 2019-07-10 | 1 | 17
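For readers without BigQuery at hand, here is a minimal sketch of the same idea (a calendar of days, a first/last range per product, and a LEFT JOIN back to the recorded views), run through Python's built-in sqlite3 module. The table name product_views is an assumption made for illustration, and a recursive CTE stands in for GENERATE_DATE_ARRAY, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_views (day TEXT, productId TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO product_views VALUES (?, ?, ?)",
    [
        ("2019-07-01", "A", 8), ("2019-07-02", "A", 6), ("2019-07-02", "B", 4),
        ("2019-07-03", "A", 2), ("2019-07-04", "A", 8), ("2019-07-04", "B", 6),
        ("2019-07-04", "C", 4), ("2019-07-05", "C", 2), ("2019-07-10", "A", 17),
    ],
)

QUERY = """
WITH RECURSIVE dates(day) AS (
    -- one row per calendar day between the first and last recorded view
    SELECT MIN(day) FROM product_views
    UNION ALL
    SELECT date(day, '+1 day') FROM dates
    WHERE day < (SELECT MAX(day) FROM product_views)
),
ranges AS (
    -- first and last view per product define its time "in inventory"
    SELECT productId, MIN(day) AS min_day, MAX(day) AS max_day
    FROM product_views
    GROUP BY productId
)
SELECT d.day,
       COUNT(DISTINCT r.productId) AS inventory,
       SUM(COALESCE(v.views, 0)) AS views
FROM dates d
JOIN ranges r ON d.day BETWEEN r.min_day AND r.max_day
LEFT JOIN product_views v ON v.day = d.day AND v.productId = r.productId
GROUP BY d.day
ORDER BY d.day
"""

inventory_rows = conn.execute(QUERY).fetchall()
for row in inventory_rows:
    print(row)
```

Joining the calendar to the per-product ranges first, and only then to the view records, is what keeps product B counted on July 3 even though it has no row for that day.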

Related

sql group by monthly sum and sum year by month

RDBMS - Latest Oracle
I'm out of my element here. I need to organize account transaction information by account and by month, and also use another column to show summed transactions for year to date. Here is a depiction of what I'm trying to get
ACCT_ID | ACCT_MM | FISCAL_YYYY | FISCAL_MM_AMT | YTD_AMT
------------------------------------------------------------
1 | 11 | 2018 | 25 | 100
1 | 12 | 2018 | 50 | 150
1 | 01 | 2019 | 20 | 20
I know you can get FISCAL_MM_AMT with a GROUP BY on ACCT_MM, FISCAL_YYYY.
This is all I have figured out so far:
SELECT ACCT_ID,ACCT_MM,FISCAL_YYYY,SUM(NVL(ACCT_TRNSCTN_AMT,0))
FROM TBL_ACCT_DETAIL
GROUP BY ACCT_ID,ACCT_MM,FISCAL_YYYY
Now, how to combine this with the additional column YTD_AMT, which adds up all totals for that year up to the current month, is what has me baffled. sql noob ftw.
Try the analytic function SUM as follows:
SELECT T.*,
SUM(FISCAL_MM_AMT)
OVER (PARTITION BY ACCT_ID, FISCAL_YYYY
ORDER BY ACCT_MM) AS YTD_AMT
FROM
(SELECT ACCT_ID,ACCT_MM,FISCAL_YYYY,SUM(NVL(ACCT_TRNSCTN_AMT,0)) AS FISCAL_MM_AMT
FROM TBL_ACCT_DETAIL
GROUP BY ACCT_ID,ACCT_MM,FISCAL_YYYY);
Cheers!!
You can use the cumulative sum window function:
SELECT ACCT_ID, ACCT_MM, FISCAL_YYYY,
COALESCE(SUM(ACCT_TRNSCTN_AMT), 0),
SUM(SUM(ACCT_TRNSCTN_AMT)) OVER (PARTITION BY ACCT_ID, FISCAL_YYYY ORDER BY ACCT_MM) AS YTD
FROM TBL_ACCT_DETAIL
GROUP BY ACCT_ID, ACCT_MM, FISCAL_YYYY
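To see the running total in action without an Oracle instance, the following is a small sketch using Python's sqlite3 module (it assumes a bundled SQLite of 3.25 or later, which is where window functions arrived). The transaction rows are made up for illustration; only the table and column names come from the question. It uses the subquery form: monthly totals first, then a cumulative SUM over them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_acct_detail ("
    "acct_id INTEGER, acct_mm INTEGER, fiscal_yyyy INTEGER, acct_trnsctn_amt INTEGER)"
)
# Hypothetical transactions: two per month so the GROUP BY has work to do
conn.executemany(
    "INSERT INTO tbl_acct_detail VALUES (?, ?, ?, ?)",
    [
        (1, 11, 2018, 10), (1, 11, 2018, 15),
        (1, 12, 2018, 20), (1, 12, 2018, 30),
        (1, 1, 2019, 5), (1, 1, 2019, 15),
    ],
)

QUERY = """
SELECT m.*,
       -- ORDER BY inside OVER turns the SUM into a running total per year
       SUM(fiscal_mm_amt) OVER (
           PARTITION BY acct_id, fiscal_yyyy
           ORDER BY acct_mm
       ) AS ytd_amt
FROM (SELECT acct_id, acct_mm, fiscal_yyyy,
             SUM(acct_trnsctn_amt) AS fiscal_mm_amt
      FROM tbl_acct_detail
      GROUP BY acct_id, acct_mm, fiscal_yyyy) m
ORDER BY fiscal_yyyy, acct_mm
"""

ytd_rows = conn.execute(QUERY).fetchall()
for row in ytd_rows:
    print(row)
```

Because this made-up data has only two months in 2018, the year-to-date values there are 25 and 75 rather than the 100 and 150 shown in the question; the total restarts at 20 for 2019 because the partition includes FISCAL_YYYY.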
Do you want to have one additional column bringing the sum of the year with it, and a year-to-date column?
OLAP functions are a prerequisite. But every respectable RDBMS should offer those by now.
Then, I think (adding what I presume should be the input) ... you should go:
WITH
---- this is just the input so I have example data
input( acct_id,acct_mm,fiscal_yyyy,fiscal_mm_amt) AS (
SELECT 1, 1, 2018, 5
UNION ALL SELECT 1, 3, 2018, 5
UNION ALL SELECT 1, 4, 2018, 5
UNION ALL SELECT 1, 5, 2018, 10
UNION ALL SELECT 1, 6, 2018, 10
UNION ALL SELECT 1, 7, 2018, 10
UNION ALL SELECT 1, 8, 2018, 10
UNION ALL SELECT 1, 9, 2018, 10
UNION ALL SELECT 1, 10, 2018, 10
UNION ALL SELECT 1, 11, 2018, 25
UNION ALL SELECT 1, 12, 2018, 50
UNION ALL SELECT 1, 01, 2019, 20
)
---- end of input -----
SELECT
*
, SUM(fiscal_mm_amt) OVER(
PARTITION BY acct_id,fiscal_yyyy
) AS fiscal_yy_amt
, SUM(fiscal_mm_amt) OVER(
PARTITION BY acct_id,fiscal_yyyy
ORDER BY acct_mm
) AS ytd_amt
FROM input;
-- out acct_id | acct_mm | fiscal_yyyy | fiscal_mm_amt | fiscal_yy_amt | ytd_amt
-- out ---------+---------+-------------+---------------+---------------+---------
-- out 1 | 1 | 2018 | 5 | 150 | 5
-- out 1 | 3 | 2018 | 5 | 150 | 10
-- out 1 | 4 | 2018 | 5 | 150 | 15
-- out 1 | 5 | 2018 | 10 | 150 | 25
-- out 1 | 6 | 2018 | 10 | 150 | 35
-- out 1 | 7 | 2018 | 10 | 150 | 45
-- out 1 | 8 | 2018 | 10 | 150 | 55
-- out 1 | 9 | 2018 | 10 | 150 | 65
-- out 1 | 10 | 2018 | 10 | 150 | 75
-- out 1 | 11 | 2018 | 25 | 150 | 100
-- out 1 | 12 | 2018 | 50 | 150 | 150
-- out 1 | 1 | 2019 | 20 | 20 | 20
-- out (12 rows)
-- out
-- out Time: First fetch (12 rows): 76.235 ms. All rows formatted: 76.288 ms

How to get last value for each user_id (postgreSQL)

A user's current ratio is the last ratio inserted for that user in the table "Ratio History"
user_id | year | month | ratio
For example if user with ID 1 has two rows
1 | 2019 | 2 | 10
1 | 2019 | 3 | 15
his ratio is 15.
Here is a slice of the development table:
user_id | year | month | ratio
1 | 2018 | 7 | 10
2 | 2018 | 8 | 20
3 | 2018 | 8 | 30
1 | 2019 | 1 | 40
2 | 2019 | 2 | 50
3 | 2018 | 10 | 60
2 | 2019 | 3 | 70
I need a query which will select grouped rows by user_id and their last ratio.
As a result of the request, the following entries should be selected
user_id | year | month | ratio
1 | 2019 | 1 | 40
2 | 2019 | 3 | 70
3 | 2018 | 10 | 60
I tried using this query:
select rh1.user_id, ratio, rh1.year, rh1.month from ratio_history rh1
join (
select user_id, max(year) as maxYear, max(month) as maxMonth
from ratio_history group by user_id
) rh2 on rh1.user_id = rh2.user_id and rh1.year = rh2.maxYear and rh1.month = rh2.maxMonth
but I got only one row.
Use distinct on:
select distinct on (user_id) rh.*
from ratio_history rh
order by user_id, year desc, month desc;
distinct on is a very convenient Postgres extension. It returns one row for each combination of the key values in parentheses. Which row? The first row based on the sort criteria. Note that the sort criteria need to start with the expressions in parentheses.
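The sketch below, run through Python's sqlite3 module, shows why the query from the question returns a single row and what a "latest row per user" query returns instead. SQLite has no distinct on, so ROW_NUMBER() is used here as a stand-in (an assumption of this sketch, not the Postgres syntax above); it needs SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ratio_history (user_id INTEGER, year INTEGER, month INTEGER, ratio INTEGER)"
)
conn.executemany("INSERT INTO ratio_history VALUES (?, ?, ?, ?)", [
    (1, 2018, 7, 10), (2, 2018, 8, 20), (3, 2018, 8, 30), (1, 2019, 1, 40),
    (2, 2019, 2, 50), (3, 2018, 10, 60), (2, 2019, 3, 70),
])

# The question's approach: MAX(year) and MAX(month) are computed independently,
# so user 1 is looked up as (2019, 7), a combination that never occurs.
BROKEN = """
SELECT rh1.user_id, rh1.year, rh1.month, rh1.ratio
FROM ratio_history rh1
JOIN (SELECT user_id, MAX(year) AS max_year, MAX(month) AS max_month
      FROM ratio_history GROUP BY user_id) rh2
  ON rh1.user_id = rh2.user_id
 AND rh1.year = rh2.max_year
 AND rh1.month = rh2.max_month
ORDER BY rh1.user_id
"""

# Rank each user's rows newest first and keep the top one.
LATEST = """
SELECT user_id, year, month, ratio
FROM (SELECT rh.*,
             ROW_NUMBER() OVER (PARTITION BY user_id
                                ORDER BY year DESC, month DESC) AS rn
      FROM ratio_history rh)
WHERE rn = 1
ORDER BY user_id
"""

broken_rows = conn.execute(BROKEN).fetchall()
latest_rows = conn.execute(LATEST).fetchall()
print(broken_rows)
print(latest_rows)
```

Only user 3 survives the first query, because 2018 and 10 happen to be both that user's latest year and highest month.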

SQL Query to find products with consecutive increase in their prices

I have a table with history data of product prices, which fluctuates every minute.
This is one day's snapshot:
ProductName | Iteration | Price | Date
----------------------------------------------
A | 1 | 10 | 1st Feb 2019 12:01 AM
B | 1 | 10 | 1st Feb 2019 12:01 AM
C | 1 | 10 | 1st Feb 2019 12:01 AM
A | 2 | 12 | 1st Feb 2019 12:02 AM
B | 2 | 9 | 1st Feb 2019 12:02 AM
C | 2 | 15 | 1st Feb 2019 12:02 AM
A | 3 | 15 | 1st Feb 2019 12:03 AM
B | 3 | 9 | 1st Feb 2019 12:03 AM
C | 3 | 14 | 1st Feb 2019 12:03 AM
A | 4 | 14 | 1st Feb 2019 12:04 AM
B | 4 | 11 | 1st Feb 2019 12:04 AM
C | 4 | 14 | 1st Feb 2019 12:04 AM
And I want to find out the product name (for each day) which shows a consecutive increase in its price in consecutive iterations, along with the number of occurrences.
In the given sample data above, price of product A increased consecutively.
I want the output like below:
ProductName | Occurrence
------------------------
A | 3
I tried self joining like below:
SELECT A.ProductName, A.Iteration as LastIteration, B.Iteration as CurrentIteration, A.Price as LastPrice, B.Price as CurrentPrice FROM
ProductDetails (NOLOCK) A
INNER JOIN ProductDetails (NOLOCK) B ON A.ProductName = B.ProductName AND B.Iteration=A.Iteration+1 AND B.Price>A.Price AND Convert(Date, A.Date)=Convert(Date, B.Date)
But this does not give me all the consecutive occurrences.
Can somebody help?
You can use window functions for this. Find the boundaries where the price does not increase. Then use this to define groups -- and aggregate to find the length of the groups.
The following gets all the durations of periods of increasing prices:
select productname, count(*) as num_prices,
min(price) as first_price, max(price) as last_price
from (select t.*,
sum(case when prev_price < price then 0 else 1 end) over (partition by productname order by iteration) as grp
from (select t.*,
lag(price) over (partition by productname order by iteration) as prev_price
from t
) t
) t
group by productname, grp
having count(*) > 1;
If you want the largest, you can add:
select top (1) with ties . . .
. . .
order by row_number() over (partition by productname order by count(*) desc)
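The grouping step described above (compare each price with the previous one, then keep a running count of "no increase" boundaries) can be tried on the question's sample data with Python's sqlite3 module. This is a sketch under the assumption that SQLite 3.25 or later is available; the table name t follows the answer, and the Date column is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (productname TEXT, iteration INTEGER, price INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("A", 1, 10), ("B", 1, 10), ("C", 1, 10),
    ("A", 2, 12), ("B", 2, 9), ("C", 2, 15),
    ("A", 3, 15), ("B", 3, 9), ("C", 3, 14),
    ("A", 4, 14), ("B", 4, 11), ("C", 4, 14),
])

QUERY = """
SELECT productname, COUNT(*) AS num_prices,
       MIN(price) AS first_price, MAX(price) AS last_price
FROM (SELECT s.*,
             -- every row that is NOT an increase starts a new group
             SUM(CASE WHEN prev_price < price THEN 0 ELSE 1 END)
                 OVER (PARTITION BY productname ORDER BY iteration) AS grp
      FROM (SELECT t.*,
                   LAG(price) OVER (PARTITION BY productname
                                    ORDER BY iteration) AS prev_price
            FROM t) s) g
GROUP BY productname, grp
HAVING COUNT(*) > 1
ORDER BY productname, grp
"""

runs = conn.execute(QUERY).fetchall()
for run in runs:
    print(run)
```

On this data every product has one increasing run: three prices for A (10 to 15) and two each for B and C, so picking the longest run is what singles out A.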

Rolling Average SQL

Hi, I have a dataset with Year, Month and Output variables, with values as follows:
Year | Month | Output
2015 | 1 | 12
2015 | 2 | 24
2015 | 3 | 2
2015 | 4 | 3
2015 | 5 | 7
2015 | 6 | 3
2015 | 7 | 7
2015 | 8 | 6
2015 | 9 | 7
2015 | 10 | 8
2015 | 11 | 3
2015 | 12 | 6
2016 | 1 | 3
2016 | 2 | 6
2016 | 3 | 8
2016 | 4 | 9
2016 | 5 | 4
......... and so on...
I want to add a new column in the dataset as Rolling_Average
Rolling_Average = (sum of the previous 12 months' Output) / (Output of this month)
For example:
Rolling_Average (for 2015-7) = (output (2015-01) + output (2015-02) + output (2015-03) + output (2015-04) + output (2015-05) + output (2015-06)) / output (2015-07)
I tried a couple of queries I found online, but they didn't work for me. Can someone please help me?
Output Required is as follows:
Year | Month | Output | Rolling Average
2015 | 1 | 12 | 12
2015 | 2 | 24 | 0.5
2015 | 3 | 2 | 18
2015 | 4 | 3 | 38/3
2015 | 5 | 7 | 45/7
2015 | 6 | 3 | 48/3
2015 | 7 | 7 | 55/7
2015 | 8 | 6 | 61/6
2015 | 9 | 7 | 68/7
2015 | 10 | 8 | 74/8
2015 | 11 | 3 | 77/3
2015 | 12 | 6 | 83/6
2016 | 1 | 3 | 86/3
2016 | 2 | 6 | 92/6
2016 | 3 | 8 | 100/8
2016 | 4 | 9 | 109/9
2016 | 5 | 4 | 113/4
The Query I tried is :
SELECT DISTINCT
//CALCULATIONS
Year,
Month,
Output,
(sum(CAST(Output) AS DOUBLE)))
over(order by year,month rows between 12 preceding and 1 preceding )
as Rolling_Average
from my_table
group by Year,Month
order by Year,Month
It gives me error :
Syntax error: OVER keyword must follow a function call
Also I have tried other things
Can someone please help me in an easy way? I am using SQL Plx; it is similar to SQL.
Thank You!
You might have misplaced some parentheses:
(sum( CAST(Output) AS DOUBLE ))) over (order by year, month rows between 12 preceding and 1 preceding ) as Rolling_Average
Versus:
SUM( CAST(Output AS DOUBLE) ) OVER (order by year, month rows between 12 preceding and 1 preceding) as Rolling_Average
You can also ROUND that result.
And those records already seem to be unique by Year and Month.
So there's not really a need to group on those.
SELECT
t.Year, t.Month, t.Output,
ROUND(SUM(CAST(t.Output AS INT)) OVER (ORDER BY t.Year, t.Month ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING)*1.0 / CAST(t.Output AS INT), 1) as Rolling_Average
FROM my_table t
ORDER BY t.Year, t.Month;
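As a quick check of the window frame, here is a sketch that runs the same expression through Python's sqlite3 module (assuming SQLite 3.25 or later) on the 2015 values from the question plus January 2016. The lowercase column names are a choice made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (year INTEGER, month INTEGER, output INTEGER)")
outputs_2015 = [12, 24, 2, 3, 7, 3, 7, 6, 7, 8, 3, 6]
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(2015, m, v) for m, v in enumerate(outputs_2015, start=1)] + [(2016, 1, 3)],
)

QUERY = """
SELECT year, month, output,
       -- frame = up to 12 rows before the current one, excluding the current row
       ROUND(SUM(output) OVER (ORDER BY year, month
                               ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING)
             * 1.0 / output, 1) AS rolling_average
FROM my_table
ORDER BY year, month
"""

rolling_rows = conn.execute(QUERY).fetchall()
for row in rolling_rows:
    print(row)
```

The first row has an empty frame, so its result is NULL (None in Python) rather than 12; wrap the SUM in COALESCE if a different value is wanted there.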
And if the window functions aren't supported, then this will work:
SELECT
t1.Year, t1.Month, t1.Output,
ROUND(SUM(CAST(t2.Output AS INT))*1.0 / CAST(t1.Output AS INT), 1) as Rolling_Average
FROM my_table t1
LEFT JOIN my_table t2 ON ((t2.Year = t1.Year AND t2.Month < t1.Month) OR
(t2.Year = t1.Year - 1 AND t2.Month >= t1.Month))
GROUP BY t1.Year, t1.Month, t1.Output
ORDER BY t1.Year, t1.Month;
Try this (if you use SQL Server):
Select *
from tableName T
outer apply (
select sum(output) Rolling_Average
from tableName T_in where T_in.year = T.year and T_in.Month <= T.Month
)x

How to insert additional values in between a GROUP BY

I am currently making a monthly report using MySQL. I have a table named "monthly" that looks something like this:
id | date | amount
10 | 2009-12-01 22:10:08 | 7
9 | 2009-11-01 22:10:08 | 78
8 | 2009-10-01 23:10:08 | 5
7 | 2009-07-01 21:10:08 | 54
6 | 2009-03-01 04:10:08 | 3
5 | 2009-02-01 09:10:08 | 456
4 | 2009-02-01 14:10:08 | 4
3 | 2009-01-01 20:10:08 | 20
2 | 2009-01-01 13:10:15 | 10
1 | 2008-12-01 10:10:10 | 5
Then, when I make a monthly report (grouped by month of each year), I get something like this:
yearmonth | total
2008-12 | 5
2009-01 | 30
2009-02 | 460
2009-03 | 3
2009-07 | 54
2009-10 | 5
2009-11 | 78
2009-12 | 7
I used this query to achieve the result:
SELECT substring( date, 1, 7 ) AS yearmonth, sum( amount ) AS total
FROM monthly
GROUP BY substring( date, 1, 7 )
But I need something like this:
yearmonth | total
2008-01 | 0
2008-02 | 0
2008-03 | 0
2008-04 | 0
2008-05 | 0
2008-06 | 0
2008-07 | 0
2008-08 | 0
2008-09 | 0
2008-10 | 0
2008-11 | 0
2008-12 | 5
2009-01 | 30
2009-02 | 460
2009-03 | 3
2009-04 | 0
2009-05 | 0
2009-06 | 0
2009-07 | 54
2009-08 | 0
2009-09 | 0
2009-10 | 5
2009-11 | 78
2009-12 | 7
Something that would display zeroes for the months that don't have any value. Is it even possible to do that in a MySQL query?
You should generate a dummy rowsource and LEFT JOIN with it:
SELECT *
FROM (
SELECT 1 AS month
UNION ALL
SELECT 2
…
UNION ALL
SELECT 12
) months
CROSS JOIN
(
SELECT 2008 AS year
UNION ALL
SELECT 2009 AS year
) years
LEFT JOIN
mydata m
ON m.date >= CONCAT_WS('.', year, month, 1)
AND m.date < CONCAT_WS('.', year, month, 1) + INTERVAL 1 MONTH
GROUP BY
year, month
You can create these as tables on disk rather than generate them each time.
MySQL is the only system of the major four that does not offer an easy way to generate arbitrary resultsets.
Oracle, SQL Server and PostgreSQL do have those (CONNECT BY, recursive CTEs and generate_series, respectively).
Quassnoi is right, and I'll add a comment about how to recognize when you need something like this:
You want '2008-01' in your result, yet nothing in the source table has a date in January, 2008. Result sets have to come from the tables you query, so the obvious conclusion is that you need an additional table - one that contains each month you want as part of your result.
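To make the "additional table of months" idea concrete, here is a sketch using Python's sqlite3 module with the question's data. It generates the months with a recursive CTE and SQLite's date() function, both of which are assumptions of this sketch rather than the MySQL syntax above; in a real schema the months could just as well live in a permanent calendar table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (id INTEGER, date TEXT, amount INTEGER)")
conn.executemany("INSERT INTO monthly VALUES (?, ?, ?)", [
    (10, "2009-12-01 22:10:08", 7), (9, "2009-11-01 22:10:08", 78),
    (8, "2009-10-01 23:10:08", 5), (7, "2009-07-01 21:10:08", 54),
    (6, "2009-03-01 04:10:08", 3), (5, "2009-02-01 09:10:08", 456),
    (4, "2009-02-01 14:10:08", 4), (3, "2009-01-01 20:10:08", 20),
    (2, "2009-01-01 13:10:15", 10), (1, "2008-12-01 10:10:10", 5),
])

QUERY = """
WITH RECURSIVE months(first_day) AS (
    -- one row per month from January 2008 through December 2009
    SELECT '2008-01-01'
    UNION ALL
    SELECT date(first_day, '+1 month') FROM months
    WHERE first_day < '2009-12-01'
)
SELECT substr(c.first_day, 1, 7) AS yearmonth,
       COALESCE(SUM(m.amount), 0) AS total   -- months with no rows become 0
FROM months c
LEFT JOIN monthly m ON substr(m.date, 1, 7) = substr(c.first_day, 1, 7)
GROUP BY c.first_day
ORDER BY c.first_day
"""

report = conn.execute(QUERY).fetchall()
for row in report:
    print(row)
```

The LEFT JOIN starts from the generated months, so every month appears once whether or not the monthly table has a matching row.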