SQL: How to deal with NULL and PARTITION BY?


I've got a question, if you don't mind terribly.
So suppose I have this kind of a table here – Products (amount sold by quarter in 2000, only there are multiple entries for the same product and quarter (with different dates)):
product  quarter  amount sold
-------  -------  -----------
Jeans    1        20
Jeans    2        40
Jeans    3        60
Jeans    4        5
Skirt    1        10
Skirt    2        5
Skirt    3        30
Blouse   1        15
Blouse   2        40
Blouse   3        60
Blouse   4        15
I want to present it as follows:
product  quarter1  quarter2  quarter3  quarter4
-------  --------  --------  --------  --------
Jeans    20        40        60        5
Skirt    10        5         30        Null
Blouse   15        40        60        15
I decided to do it with PARTITION BY (it's not exactly that simple: there are different rows with the same quarter for the same product but with different amounts sold, which is why I use sum(amount_sold), but you get the idea, I hope):
WITH quater_sales as(
SELECT DISTINCT pro.product, pro.quarter, to_char (sum(pro.amount_sold) OVER (PARTITION BY pro.product, pro.quarter)) AS quater
FROM products pro
ORDER BY pro.product)
SELECT quater_sales.product, quater_sales.quater AS "Q1", qu2.quater AS "Q2", qu3.quater AS "Q3", qu4.quater AS "Q4"
FROM quater_sales
JOIN quater_sales qu2 ON quater_sales.product=qu2.product
JOIN quater_sales qu3 ON quater_sales.product=qu3.product
JOIN quater_sales qu4 ON quater_sales.product=qu4.product
WHERE quater_sales.quarter=1 and qu2.quarter=2 and qu3.quarter=3 and qu4.quarter=4
The problem is with partition (or maybe it's the condition of select) that the product that was not sold in all the 4 quarters is just discarded. What I basically get in the end is this:
product  quarter1  quarter2  quarter3  quarter4
-------  --------  --------  --------  --------
Jeans    20        40        60        5
Blouse   15        40        60        15
So how do I make "skirts" appear there too? I am a bit stuck with this.

Have you considered using a PIVOT statement?
WITH
quarter_sales (product, quarter, amount_sold)
AS
(SELECT 'Jeans', 1, 20 FROM DUAL
UNION ALL
SELECT 'Jeans', 2, 40 FROM DUAL
UNION ALL
SELECT 'Jeans', 3, 60 FROM DUAL
UNION ALL
SELECT 'Jeans', 4, 5 FROM DUAL
UNION ALL
SELECT 'Skirt', 1, 10 FROM DUAL
UNION ALL
SELECT 'Skirt', 2, 5 FROM DUAL
UNION ALL
SELECT 'Skirt', 3, 30 FROM DUAL
UNION ALL
SELECT 'Blouse', 1, 15 FROM DUAL
UNION ALL
SELECT 'Blouse', 2, 40 FROM DUAL
UNION ALL
SELECT 'Blouse', 3, 60 FROM DUAL
UNION ALL
SELECT 'Blouse', 4, 15 FROM DUAL)
SELECT *
FROM (SELECT *
FROM quarter_sales qs)
PIVOT (SUM (amount_sold)
FOR quarter
IN (1 AS quarter1, 2 AS quarter2, 3 AS quarter3, 4 AS quarter4));
PRODUCT    QUARTER1    QUARTER2    QUARTER3    QUARTER4
__________ ___________ ___________ ___________ ___________
Blouse     15          40          60          15
Jeans      20          40          60          5
Skirt      10          5           30

Try PIVOT. This is how you would pivot in T-SQL (a table variable is declared with @, not #):
declare @tmp table(product varchar(20),quarter int,[amount sold] int);
insert into @tmp values
('Jeans', 1, 20)
,('Jeans', 2, 40)
,('Jeans', 3, 60)
,('Jeans', 4, 5)
,('Skirt', 1, 10)
,('Skirt', 2, 5)
,('Skirt', 3, 30)
,('Blouse', 1, 15)
,('Blouse', 2, 40)
,('Blouse', 3, 60)
,('Blouse', 4, 15);
select product, [1] as quarter1,[2] as quarter2,[3] as quarter3,[4] as quarter4
from
(
select product,quarter,[amount sold] from @tmp)p
pivot
(
sum([amount sold])
for quarter in([1],[2],[3],[4])
) as pvt;
Output:
product  quarter1  quarter2  quarter3  quarter4
Blouse   15        40        60        15
Jeans    20        40        60        5
Skirt    10        5         30        NULL
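
For engines without a PIVOT clause, the same reshaping can be sketched with conditional aggregation: one SUM(CASE ...) per quarter, grouped by product. A product with no rows for a quarter then simply gets NULL instead of being dropped, which is what the self-join in the question loses. The snippet below is only an illustration, not the method used in the answers above: it runs the SQL through Python's sqlite3 module (an assumption made so the example is self-contained), and the helper name pivot_by_quarter is hypothetical.

```python
import sqlite3

# Rows copied from the question's table: (product, quarter, amount_sold).
ROWS = [
    ("Jeans", 1, 20), ("Jeans", 2, 40), ("Jeans", 3, 60), ("Jeans", 4, 5),
    ("Skirt", 1, 10), ("Skirt", 2, 5), ("Skirt", 3, 30),
    ("Blouse", 1, 15), ("Blouse", 2, 40), ("Blouse", 3, 60), ("Blouse", 4, 15),
]

# Conditional aggregation: each CASE keeps only one quarter's amounts.
# SUM over no matching rows yields NULL, so Skirt keeps a NULL quarter4.
PIVOT_SQL = """
SELECT product,
       SUM(CASE WHEN quarter = 1 THEN amount_sold END) AS quarter1,
       SUM(CASE WHEN quarter = 2 THEN amount_sold END) AS quarter2,
       SUM(CASE WHEN quarter = 3 THEN amount_sold END) AS quarter3,
       SUM(CASE WHEN quarter = 4 THEN amount_sold END) AS quarter4
FROM products
GROUP BY product
ORDER BY product
"""


def pivot_by_quarter(rows):
    """Load rows into an in-memory SQLite table (hypothetical helper) and pivot them."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (product TEXT, quarter INTEGER, amount_sold INTEGER)"
    )
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)
    result = conn.execute(PIVOT_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in pivot_by_quarter(ROWS):
        print(row)
```

Because the aggregation is a SUM, several rows for the same product and quarter are added up automatically, so no separate PARTITION BY step is needed.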

Related

SQL Implementing Forward Fill logic

I have a dataset within a date range which has three columns, Product_type, date and metric. For a given product_type, data is not available for all days. For the missing rows, we would like to do a forward date fill for next n days using the last value of the metric.
Product_type  date        metric
------------  ----------  ------
A             2019-10-01  10
A             2019-10-02  12
A             2019-10-03  15
A             2019-10-04  5
A             2019-10-05  5
A             2019-10-06  5
A             2019-10-16  12
A             2019-10-17  23
A             2019-10-18  34
Here, the data from 2019-10-04 to 2019-10-06 has been forward filled. There might be bigger gaps in the dates, but we only want to fill the first n days.
Here, n=2, so rows 5 and 6 have been forward filled.
I am not sure how to implement this logic in SQL.

Here's one option. Read comments within code.
Sample data:
WITH
   test (product_type, datum, metric)
   AS
      (SELECT 'A', DATE '2019-10-01', 10 FROM DUAL
       UNION ALL
       SELECT 'A', DATE '2019-10-02', 12 FROM DUAL
       UNION ALL
       SELECT 'A', DATE '2019-10-03', 15 FROM DUAL
       UNION ALL
       SELECT 'A', DATE '2019-10-04', 5 FROM DUAL
       UNION ALL
       SELECT 'A', DATE '2019-10-16', 12 FROM DUAL
       UNION ALL
       SELECT 'A', DATE '2019-10-18', 23 FROM DUAL),
Query begins here:
   temp
   AS
      -- CB_FWD_FILL = 1 if difference between two consecutive dates is larger than 1 day
      -- (i.e. that's the gap to be forward filled)
      (SELECT product_type,
              datum,
              metric,
              LEAD (datum) OVER (PARTITION BY product_type ORDER BY datum) next_datum,
              CASE
                 WHEN LEAD (datum) OVER (PARTITION BY product_type ORDER BY datum) - datum > 1
                 THEN 1
                 ELSE 0
              END cb_fwd_fill
         FROM test)
-- original data from the table
SELECT product_type, datum, metric FROM test
UNION ALL
-- DATUM is the last date which is OK; add LEVEL pseudocolumn to it to fill the gap
-- with PAR_N number of rows
SELECT product_type, datum + LEVEL, metric
  FROM (SELECT product_type, datum, metric
          FROM (-- RN = 1 means that that's the first gap in data set - that's the one
                -- that has to be forward filled
                SELECT product_type,
                       datum,
                       metric,
                       ROW_NUMBER () OVER (PARTITION BY product_type ORDER BY datum) rn
                  FROM temp
                 WHERE cb_fwd_fill = 1)
         WHERE rn = 1)
CONNECT BY LEVEL <= &par_n
ORDER BY datum;
Result:
Enter value for par_n: 2
PRODUCT_TYPE    DATUM      METRIC
--------------- ---------- ----------
A               2019-10-01 10
A               2019-10-02 12
A               2019-10-03 15
A               2019-10-04 5
A               2019-10-05 5   --> newly added
A               2019-10-06 5   --> rows
A               2019-10-16 12
A               2019-10-18 23
8 rows selected.

Another solution:
WITH test (product_type, datum, metric) AS
(
SELECT 'A', DATE '2019-10-01', 10 FROM DUAL
UNION ALL
SELECT 'A', DATE '2019-10-02', 12 FROM DUAL
UNION ALL
SELECT 'A', DATE '2019-10-03', 15 FROM DUAL
UNION ALL
SELECT 'A', DATE '2019-10-04', 5 FROM DUAL
UNION ALL
SELECT 'A', DATE '2019-10-16', 12 FROM DUAL
UNION ALL
SELECT 'A', DATE '2019-10-18', 23 FROM DUAL
),
minmax(mindatum, maxdatum) AS (
SELECT MIN(datum), max(datum) from test
),
alldates (datum, product_type) AS
(
-- every calendar day from mindatum to maxdatum (inclusive), for every product type;
-- the days are generated from the single minmax row first, then cross joined
SELECT d.datum, t.product_type
FROM (select mindatum + level - 1 as datum
      from minmax
      connect by mindatum + level - 1 <= maxdatum) d
cross join (select distinct product_type from test) t
),
grouped as (
-- grp grows by 1 on every date that has real data, so a gap shares the grp of the row before it
select a.datum, a.product_type, t.metric,
count(t.product_type) over(partition by a.product_type order by a.datum) as grp
from alldates a
left join test t on t.datum = a.datum and t.product_type = a.product_type
),
final_table as (
-- rn = 0 is the real row, rn = 1, 2, ... are the missing days that follow it
select g.datum, g.product_type, g.grp, g.rn,
last_value(g.metric ignore nulls) over(partition by g.product_type order by g.datum) as metric
from (
select g.*, row_number() over(partition by product_type, grp order by datum) - 1 as rn
from grouped g
) g
)
select datum, product_type, metric
from final_table
where rn <= &par_n
order by datum
;
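
The forward fill can also be sketched with a recursive CTE instead of CONNECT BY. This is a hedged illustration only: it runs on SQLite through Python's sqlite3 module (an assumption, the answers above target Oracle), stores dates as ISO text, and the names FILL_SQL and forward_fill are hypothetical. Like the second answer, it fills up to n days after every gap, not only the first one.

```python
import sqlite3

# Sample rows from the answers above: (product_type, datum, metric).
ROWS = [
    ("A", "2019-10-01", 10), ("A", "2019-10-02", 12), ("A", "2019-10-03", 15),
    ("A", "2019-10-04", 5), ("A", "2019-10-16", 12), ("A", "2019-10-18", 23),
]

# gaps: every real row together with the next real date of the same product.
# fill: starts from the real rows (step 0) and keeps adding one day with the
#       same metric while fewer than :n days were added and the next real
#       date has not been reached.
FILL_SQL = """
WITH RECURSIVE
gaps AS (
    SELECT product_type, datum, metric,
           LEAD(datum) OVER (PARTITION BY product_type ORDER BY datum) AS next_datum
    FROM test
),
fill (product_type, datum, metric, next_datum, step) AS (
    SELECT product_type, datum, metric, next_datum, 0 FROM gaps
    UNION ALL
    SELECT product_type, date(datum, '+1 day'), metric, next_datum, step + 1
    FROM fill
    WHERE step < :n
      AND next_datum IS NOT NULL
      AND date(datum, '+1 day') < next_datum
)
SELECT product_type, datum, metric
FROM fill
ORDER BY product_type, datum
"""


def forward_fill(rows, n):
    """Return the rows plus up to n forward-filled days after each gap."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (product_type TEXT, datum TEXT, metric INTEGER)")
    conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)
    result = conn.execute(FILL_SQL, {"n": n}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in forward_fill(ROWS, 2):
        print(row)
```

With n = 2 this adds 2019-10-05 and 2019-10-06 (metric 5) and also 2019-10-17 (metric 12), because the one-day hole before 2019-10-18 counts as a gap too. Nothing is added after the last real date, since there is no later row to bound the fill.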

How to find the row with the highest value cell based on another column from within a group of values?

I have this table:
Site_ID  Volume  RPT_Date    RPT_Hour
-------  ------  ----------  --------
1        10      01/01/2021  1
1        7       01/01/2021  2
1        13      01/01/2021  3
1        11      01/16/2021  1
1        3       01/16/2021  2
1        5       01/16/2021  3
2        9       01/01/2021  1
2        24      01/01/2021  2
2        16      01/01/2021  3
2        18      01/16/2021  1
2        7       01/16/2021  2
2        1       01/16/2021  3
I need to select the RPT_Hour with the highest Volume for each set of dates
Needed Output:
Site_ID  Volume  RPT_Date    RPT_Hour
-------  ------  ----------  --------
1        13      01/01/2021  1
1        11      01/16/2021  1
2        24      01/01/2021  2
2        18      01/16/2021  1
SELECT site_id, volume, rpt_date, rpt_hour
FROM (SELECT t.*,
ROW_NUMBER()
OVER (PARTITION BY site_id, rpt_date ORDER BY volume DESC) AS rn
FROM MyTable) t
WHERE rn = 1;
I cannot figure out how to group the table into like date groups. If I could do that, I think the rn = 1 will return the highest volume row for each date.

The way I see it, your query is OK (but rpt_hour in desired output is not).
with test (site_id, volume, rpt_date, rpt_hour) as
  (select 1, 10, date '2021-01-01', 1 from dual union all
   select 1, 7, date '2021-01-01', 2 from dual union all
   select 1, 13, date '2021-01-01', 3 from dual union all
   select 1, 11, date '2021-01-16', 1 from dual union all
   select 1, 3, date '2021-01-16', 2 from dual union all
   select 1, 5, date '2021-01-16', 3 from dual union all
   --
   select 2, 9, date '2021-01-01', 1 from dual union all
   select 2, 24, date '2021-01-01', 3 from dual union all
   select 2, 16, date '2021-01-01', 3 from dual union all
   select 2, 18, date '2021-01-16', 1 from dual union all
   select 2, 7, date '2021-01-16', 2 from dual union all
   select 2, 1, date '2021-01-16', 3 from dual
  ),
temp as
  (select t.*,
          row_number() over (partition by site_id, rpt_date order by volume desc) rn
   from test t
  )
select site_id, volume, rpt_date, rpt_hour
from temp
where rn = 1;
SITE_ID    VOLUME     RPT_DATE   RPT_HOUR
---------- ---------- ---------- ----------
1          13         01/01/2021 3
1          11         01/16/2021 1
2          24         01/01/2021 3
2          18         01/16/2021 1

Another option is the MAX(..) KEEP (DENSE_RANK ..) OVER (PARTITION BY ..) analytic function, which needs no subquery:
SELECT DISTINCT
site_id,
MAX(volume) KEEP (DENSE_RANK FIRST ORDER BY volume DESC) OVER
(PARTITION BY site_id, rpt_date) AS volume,
rpt_date,
MAX(rpt_hour) KEEP (DENSE_RANK FIRST ORDER BY volume DESC) OVER
(PARTITION BY site_id, rpt_date) AS rpt_hour
FROM t
GROUP BY site_id, rpt_date, volume, rpt_hour
ORDER BY site_id, rpt_date
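
One detail the ROW_NUMBER approach leaves open is what happens when two hours of the same site and date share the highest volume: without a tie-breaker, which row gets rn = 1 is arbitrary. A minimal sketch, assuming SQLite via Python's sqlite3 module rather than Oracle, ISO-formatted dates, and hypothetical names TOP_SQL and top_hour_per_day, adds rpt_hour as a second sort key so the earliest hour wins a tie.

```python
import sqlite3

# Rows from the question: (site_id, volume, rpt_date, rpt_hour).
ROWS = [
    (1, 10, "2021-01-01", 1), (1, 7, "2021-01-01", 2), (1, 13, "2021-01-01", 3),
    (1, 11, "2021-01-16", 1), (1, 3, "2021-01-16", 2), (1, 5, "2021-01-16", 3),
    (2, 9, "2021-01-01", 1), (2, 24, "2021-01-01", 2), (2, 16, "2021-01-01", 3),
    (2, 18, "2021-01-16", 1), (2, 7, "2021-01-16", 2), (2, 1, "2021-01-16", 3),
]

# rn = 1 marks the highest volume per (site_id, rpt_date);
# rpt_hour breaks ties so the result is deterministic.
TOP_SQL = """
SELECT site_id, volume, rpt_date, rpt_hour
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY site_id, rpt_date
                                ORDER BY volume DESC, rpt_hour) AS rn
      FROM mytable t)
WHERE rn = 1
ORDER BY site_id, rpt_date
"""


def top_hour_per_day(rows):
    """Return the highest-volume row for each site and date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (site_id INTEGER, volume INTEGER, rpt_date TEXT, rpt_hour INTEGER)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(TOP_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in top_hour_per_day(ROWS):
        print(row)
```

If all tied rows should be returned instead of one, RANK() in place of ROW_NUMBER() with only volume DESC in the ORDER BY would be the usual substitution.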

How to group sales by month, quarter and year in the same row using case?

I'm trying to return the total number of sales for every month, every quarter, for the year 2016. I want to display annual sales on the first month row, and not on the other rows. Plus, I want to display the quarter sales on the first month of each quarter, and not on the others.
To further explain this, here's what I want to achieve:
MONTH MONTH_SALES QUARTER_SALES YEAR_SALES
1 2183 5917 12505
2 1712 - -
3 1972 - -
4 2230 6588 -
5 2250 - -
6 2108 - -
Here's my SQL query so far:
SELECT
Time.month,
SUM(Sales.sales) AS MONTH_SALES, -- display monthly sales.
CASE
WHEN MOD(Time.month, 3) = 1 THEN ( -- first month of quarter
SELECT
SUM(Sales.sales)
FROM
Sales,
Time
WHERE
Sales.Time_id = Time.Time_id
AND Time.year = 2016
GROUP BY
Time.quarter
FETCH FIRST 1 ROW ONLY
)
END AS QUARTER_SALES,
CASE
WHEN Time.month = 1 THEN ( -- display annual sales.
SELECT
SUM(Sales.sales)
FROM
Sales,
Time
WHERE
Sales.Time_id = Time.Time_id
AND Time.year = 2016
GROUP BY
Time.year
)
END AS YEAR_SALES
FROM
Sales,
Time
WHERE
Sales.Time_id = Time.Time_id
AND Time.year = 2016
GROUP BY
Time.month
ORDER BY
Time.month
I'm almost getting the desired output, but I'm getting the same duplicated 6588 value in quarter sales for the first and fourth month (because I'm fetching the first row that comes from first quarter).
MONTH MONTH_SALES QUARTER_SALES YEAR_SALES
1 2183 6588 12505
2 1712 - -
3 1972 - -
4 2230 6588 -
5 2250 - -
6 2108 - -
I even tried to put WHERE Time.quarter = ((Time.month * 4) / 12) but the month value from the outer query doesn't get passed in the subquery.
Unfortunately I don't have enough experience with CASE WHEN expressions to know how to pass the month row. Any tips would be awesome.

How about this?
Sample data:
with
time (time_id, month, quarter, year) as
  -- months 1 to 9, three per quarter, matching the output shown below
  (select 1, 1, 1, 2016 from dual union all
   select 2, 2, 1, 2016 from dual union all
   select 3, 3, 1, 2016 from dual union all
   select 4, 4, 2, 2016 from dual union all
   select 5, 5, 2, 2016 from dual union all
   select 6, 6, 2, 2016 from dual union all
   select 7, 7, 3, 2016 from dual union all
   select 8, 8, 3, 2016 from dual union all
   select 9, 9, 3, 2016 from dual
  ),
sales (time_id, sales) as
  (select 1, 100 from dual union all
   select 1, 100 from dual union all
   select 2, 200 from dual union all
   select 3, 300 from dual union all
   select 4, 400 from dual union all
   select 5, 500 from dual union all
   select 6, 600 from dual union all
   select 7, 700 from dual union all
   select 8, 800 from dual union all
   select 9, 900 from dual
  ),
Query begins here; it uses sum aggregate in its analytic form; partition by clause says what to compute. row_number, similarly, sorts rows in each quarter/year - it is later used in CASE expression to decide whether to show quarterly/yearly total or not.
temp as
  (select t.month, t.quarter, t.year, sum(s.sales) month_sales
   from time t join sales s on s.time_id = t.time_id
   where t.year = 2016
   group by t.month, t.quarter, t.year
  ),
temp2 as
  (select month, quarter, month_sales,
          sum(month_sales) over (partition by quarter) quarter_sales,
          sum(month_sales) over (partition by year   ) year_sales,
          -- order by month, so that rn = 1 is always the first month of the quarter/year
          row_number() over (partition by quarter order by month) rnq,
          row_number() over (partition by year    order by month) rny
   from temp
  )
select month,
       month_sales,
       case when rnq = 1 then quarter_sales end quarter_sales,
       case when rny = 1 then year_sales    end year_sales
from temp2
order by month;
MONTH      MONTH_SALES QUARTER_SALES YEAR_SALES
---------- ----------- ------------- ----------
1          200         700           4600
2          200
3          300
4          400         1500
5          500
6          600
7          700         2400
8          800
9          900
9 rows selected.
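
The same layout can be sketched without ROW_NUMBER by comparing each month with the first month of its partition. This is only an illustration under stated assumptions: it uses SQLite through Python's sqlite3 module rather than Oracle, a single hypothetical table sales_2016(month, sales) instead of the Sales/Time pair, and it derives the quarter from the month number.

```python
import sqlite3

# Hypothetical sales rows for one year: (month, sales). Month 1 has two sales.
ROWS = [(1, 100), (1, 100), (2, 200), (3, 300), (4, 400), (5, 500), (6, 600)]

# monthly: one row per month with its total and a derived quarter number.
# The outer query shows the quarter total only on the first month that has
# data in that quarter, and the year total only on the first month overall.
REPORT_SQL = """
WITH monthly AS (
    SELECT month,
           (month - 1) / 3 + 1 AS quarter,   -- integer division: 1..3 -> 1, 4..6 -> 2
           SUM(sales) AS month_sales
    FROM sales_2016
    GROUP BY month
)
SELECT month,
       month_sales,
       CASE WHEN month = MIN(month) OVER (PARTITION BY quarter)
            THEN SUM(month_sales) OVER (PARTITION BY quarter) END AS quarter_sales,
       CASE WHEN month = MIN(month) OVER ()
            THEN SUM(month_sales) OVER () END AS year_sales
FROM monthly
ORDER BY month
"""


def sales_report(rows):
    """Return (month, month_sales, quarter_sales, year_sales) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_2016 (month INTEGER, sales INTEGER)")
    conn.executemany("INSERT INTO sales_2016 VALUES (?, ?)", rows)
    result = conn.execute(REPORT_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in sales_report(ROWS):
        print(row)
```

Using MIN(month) rather than a fixed test such as MOD(month, 3) = 1 means the quarter total still shows up when the first calendar month of a quarter has no sales at all.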

SQL - How to remove from the total what has already been counted

I'm asking for your help.
Here is my data set:
ID date_answered
---------- --------------
1 16/09/19
2 16/09/19
3 16/09/19
4 16/09/19
5 16/09/19
6 16/09/19
7 16/09/19
8 16/09/19
9 16/09/19
10 17/09/19
11 17/09/19
12 17/09/19
13 18/09/19
14 18/09/19
15 18/09/19
16 18/09/19
17 19/09/19
18 19/09/19
19 19/09/19
20 19/09/19
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
As you can see:
on 16/09/2019, 9 people answered
on 17/09/2019, 7 people answered
on 18/09/2019, 4 people answered
on 19/09/2019, 4 people answered
and there are still 20 people who didn't answer.
To calculate how many people answered per day, I have done:
nb_answered = count(id) over (partition by date_answered order by date_answered)
Now here is my problem. I'm trying to get this:
date_answered nb_answered nb_left
--------------- -------------- --------
16/09/2019 9 40
17/09/2019 7 31(40-9)
18/09/2019 4 24(31-7)
19/09/2019 4 20(24-4)
I have tried:
count(id) over (order by date_answered rows between UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), which gives me 40 (the total number of people).
That works for the first date, but when I move to the second date I don't know how to get 31.
How can I do that, i.e. remove from the total, for every day, the number of people who have already answered?
Do you have any suggestions?

Another option might be correlated subquery in SELECT statement.
Example is a little bit simplified (didn't feel like typing that much).
with test (id, da) as
  (select 1, 16092019 from dual union all
   select 2, 16092019 from dual union all
   select 3, 16092019 from dual union all
   select 4, 16092019 from dual union all
   select 5, 16092019 from dual union all
   --
   select 6, 17092019 from dual union all
   select 7, 17092019 from dual union all
   select 8, 17092019 from dual union all
   --
   select 9, 19092019 from dual union all
   --
   select 10, null from dual union all
   select 11, null from dual union all
   select 12, null from dual union all
   select 13, null from dual
  )
select a.da date_answered,
       count(a.id) nb_answered,
       (select count(*) from test b
        where b.da >= a.da
           or b.da is null
       ) nb_left
from test a
group by a.da
order by a.da;
DATE_ANSWERED NB_ANSWERED NB_LEFT
------------- ----------- ----------
16092019      5           13
17092019      3           8
19092019      1           5
              4           4

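You want to subtract the cumulative count from the overall count. Note that in a GROUP BY query, count(*) over () would count the groups rather than the rows, so the overall count has to be sum(count(*)) over ():
select date_answered, count(*) as answered_on_date,
( sum(count(*)) over () -
sum(count(*)) over (order by date_answered nulls last)
) as remaining
from t
group by date_answered
order by date_answered;
This gives the number still remaining after each date. If the current date's answers should not be subtracted yet (so that the first date shows the full total, as in the question), add them back:
select date_answered, count(*) as answered_on_date,
( sum(count(*)) over () -
sum(count(*)) over (order by date_answered nulls last) +
count(*)
) as remaining
from t
group by date_answered
order by date_answered;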
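
As a rough check of the arithmetic, here is a sketch that runs the "total minus what was answered before this date" logic on the rows listed in the question. It is an illustration under assumptions: SQLite via Python's sqlite3 module instead of Oracle, ISO-formatted dates, a hypothetical table name answers, and the daily counts are aggregated in a CTE first so that the window function runs over plain columns. The listed rows contain 3 answers on 17/09, not the 7 mentioned in the question's text, so the later figures differ from the question's table.

```python
import sqlite3

# Rows as listed in the question: ids 1-20 have a date, ids 21-40 are unanswered.
ANSWERS = (
    [(i, "2019-09-16") for i in range(1, 10)]
    + [(i, "2019-09-17") for i in range(10, 13)]
    + [(i, "2019-09-18") for i in range(13, 17)]
    + [(i, "2019-09-19") for i in range(17, 21)]
    + [(i, None) for i in range(21, 41)]
)

# daily: answers per date (unanswered rows are skipped here).
# nb_left = total people - running total of answers + this date's answers,
# i.e. the number of people who had not answered before this date.
LEFT_SQL = """
WITH daily AS (
    SELECT date_answered, COUNT(*) AS nb_answered
    FROM answers
    WHERE date_answered IS NOT NULL
    GROUP BY date_answered
)
SELECT date_answered,
       nb_answered,
       (SELECT COUNT(*) FROM answers)
         - SUM(nb_answered) OVER (ORDER BY date_answered)
         + nb_answered AS nb_left
FROM daily
ORDER BY date_answered
"""


def remaining_per_day(answers):
    """Return (date_answered, nb_answered, nb_left) for each answer date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE answers (id INTEGER, date_answered TEXT)")
    conn.executemany("INSERT INTO answers VALUES (?, ?)", answers)
    result = conn.execute(LEFT_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in remaining_per_day(ANSWERS):
        print(row)
```

Dropping the final "+ nb_answered" term would instead give the number of people still left after each date.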

How can update a column based on the value of another column in SQL?

Basically I have Product table like this:
date price
--------- -----
02-SEP-14 50
03-SEP-14 60
04-SEP-14 60
05-SEP-14 60
07-SEP-14 71
08-SEP-14 45
09-SEP-14 45
10-SEP-14 24
11-SEP-14 60
I need to update the table in this form
date price id
--------- ----- --
02-SEP-14 50 1
03-SEP-14 60 2
04-SEP-14 60 2
05-SEP-14 60 2
07-SEP-14 71 3
08-SEP-14 45 4
09-SEP-14 45 4
10-SEP-14 24 5
11-SEP-14 60 6
What I have tried:
CREATE SEQUENCE user_id_seq
START WITH 1
INCREMENT BY 1
CACHE 20;
ALTER TABLE Product
ADD (ID number);
UPDATE Product SET ID = user_id_seq.nextval;
This is updating the ID in the usual way like 1,2,3,4,5..
I have no idea how to do it using basic SQL commands. Please suggest how can I make it. Thank you in advance.

Here is one way to create a view from your base data. I assume you have more than one product (identified by product id), and that the price dates aren't necessarily consecutive. The sequence is separate for each product id. (Also, product should be the name of a different table - where the product id is primary key, and you have other information such as product name, category, etc. The table in your post would be more properly called something like price_history.)
alter session set nls_date_format='dd-MON-rr';
create table product ( prod_id number, dt date, price number );
insert into product ( prod_id, dt, price )
select 101, '02-SEP-14', 50 from dual union all
select 101, '03-SEP-14', 60 from dual union all
select 101, '04-SEP-14', 60 from dual union all
select 101, '05-SEP-14', 60 from dual union all
select 101, '07-SEP-14', 71 from dual union all
select 101, '08-SEP-14', 45 from dual union all
select 101, '09-SEP-14', 45 from dual union all
select 101, '10-SEP-14', 24 from dual union all
select 101, '11-SEP-14', 60 from dual union all
select 102, '02-SEP-14', 45 from dual union all
select 102, '04-SEP-14', 45 from dual union all
select 102, '05-SEP-14', 60 from dual union all
select 102, '06-SEP-14', 50 from dual union all
select 102, '09-SEP-14', 60 from dual
;
commit;
create view product_vw ( prod_id, dt, price, seq ) as
select prod_id, dt, price,
count(flag) over (partition by prod_id order by dt)
from ( select prod_id, dt, price,
case when price = lag(price) over (partition by prod_id order by dt)
then null else 1 end as flag
from product
)
;
Now check what the view looks like:
select * from product_vw;
PROD_ID DT PRICE SEQ
------- ------------------- ---------- ----------
101 02/09/0014 00:00:00 50 1
101 03/09/0014 00:00:00 60 2
101 04/09/0014 00:00:00 60 2
101 05/09/0014 00:00:00 60 2
101 07/09/0014 00:00:00 71 3
101 08/09/0014 00:00:00 45 4
101 09/09/0014 00:00:00 45 4
101 10/09/0014 00:00:00 24 5
101 11/09/0014 00:00:00 60 6
102 02/09/0014 00:00:00 45 1
102 04/09/0014 00:00:00 45 1
102 05/09/0014 00:00:00 60 2
102 06/09/0014 00:00:00 50 3
102 09/09/0014 00:00:00 60 4
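
The view above counts "price changed" flags cumulatively. The same idea can be sketched on the single-product data from the question with LAG and a running SUM. This is an illustration only, assuming SQLite via Python's sqlite3 module rather than Oracle, ISO-formatted dates, and hypothetical names (price_history, dt, number_price_runs); it selects the ids rather than updating the table.

```python
import sqlite3

# Data from the question: (dt, price).
ROWS = [
    ("2014-09-02", 50), ("2014-09-03", 60), ("2014-09-04", 60),
    ("2014-09-05", 60), ("2014-09-07", 71), ("2014-09-08", 45),
    ("2014-09-09", 45), ("2014-09-10", 24), ("2014-09-11", 60),
]

# is_new is 1 when the price differs from the previous row's price
# (the first row has no previous row, so the comparison is NULL and it gets 1).
# The running sum of is_new is the id: it only grows when the price changes.
ID_SQL = """
SELECT dt, price,
       SUM(is_new) OVER (ORDER BY dt) AS id
FROM (SELECT dt, price,
             CASE WHEN price = LAG(price) OVER (ORDER BY dt)
                  THEN 0 ELSE 1 END AS is_new
      FROM price_history)
ORDER BY dt
"""


def number_price_runs(rows):
    """Return (dt, price, id) where id increases whenever the price changes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE price_history (dt TEXT, price INTEGER)")
    conn.executemany("INSERT INTO price_history VALUES (?, ?)", rows)
    result = conn.execute(ID_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in number_price_runs(ROWS):
        print(row)
```

Note that the last row (price 60 again) gets id 6, not 2: the id identifies a run of consecutive equal prices, not a distinct price value.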

NOTE: This answers the question that was originally asked. The OP changed the data.
If your data is not too large, you can use a correlated subquery:
update product p
set id = (select count(distinct p2.price)
from product p2
where p2.date <= p.date
);
If your data is larger, then merge is more appropriate.

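WITH flagged AS
(
SELECT date
,price
,CASE WHEN price = LAG(price) OVER (ORDER BY date) THEN 0 ELSE 1 END AS is_new
FROM Product
),
cts AS
(
SELECT date
,SUM(is_new) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING) AS id
FROM flagged
)
UPDATE p
set p.id = cts.id
from product p join cts on cts.date = p.date

This is one way to do what you are trying to do. It uses the UPDATE ... FROM syntax (SQL Server style) and assumes date is unique per row.
I am not aware of a simpler way to do this using basic statements.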
