Group list of dates based on values from joined table - sql

DB-Fiddle
CREATE TABLE sales (
id SERIAL PRIMARY KEY,
event_date DATE,
country VARCHAR,
channel VARCHAR,
sales DECIMAL
);
INSERT INTO sales
(event_date, country, channel, sales)
VALUES
('2020-01-04', 'DE', 'channel_01', '500'),
('2020-01-04', 'FR', 'channel_01', '900'),
('2020-01-04', 'NL', 'channel_01', '100'),
('2020-02-20', 'DE', 'channel_01', '0'),
('2020-02-20', 'FR', 'channel_01', '0'),
('2020-02-20', 'NL', 'channel_01', '0'),
('2020-03-15', 'DE', 'channel_01', '700'),
('2020-03-15', 'FR', 'channel_01', '500'),
('2020-03-15', 'NL', 'channel_03', '300');
/* Table Dates */
CREATE TABLE dates (
id SERIAL PRIMARY KEY,
date DATE
);
INSERT INTO dates
(date)
SELECT generate_series ('2020-01-01'::date, '2020-12-31'::date, interval '1 day');
Expected Result:
date_list | country
--------------|--------------------------
2020-01-01 | DE
2020-01-01 | FR
2020-01-01 | NL
--------------|---------------------------
2020-01-02 | DE
2020-01-02 | FR
2020-01-02 | NL
--------------|---------------------------
: | :
: | :
: | :
--------------|--------------------------
2020-12-29 | DE
2020-12-30 | NL
2020-12-31 | FR
I want to list all dates from table dates and group them by all countries that are available in table sales, no matter whether the date exists in both tables. So far I have developed this query:
SELECT
d.date AS date_list,
t2.country
FROM dates d
LEFT JOIN
(SELECT
s.event_date,
s.country,
s.sales
FROM sales s
GROUP BY 1,2,3
ORDER BY 1,2) t2 ON t2.event_date = d.date
GROUP BY 1,2
ORDER BY 1,2;
However, it only groups the results by country if the s.event_date matches the d.date.
How do I have to modify the query to get the expected result?

I am not sure if I understand your requirements correctly, but it seems this is a job for CROSS JOIN:
SELECT D.DATE,X.COUNTRY
FROM DATES AS D
CROSS JOIN
(
SELECT DISTINCT COUNTRY FROM SALES
)X
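To make the answer concrete, here is a minimal, self-contained sketch of the CROSS JOIN approach, run through SQLite via Python's sqlite3 with a trimmed-down sample (three dates instead of the full year, two countries):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE dates (date TEXT)")
cur.executemany("INSERT INTO dates VALUES (?)",
                [("2020-01-01",), ("2020-01-02",), ("2020-01-03",)])
cur.execute("CREATE TABLE sales (event_date TEXT, country TEXT)")
cur.executemany("INSERT INTO sales VALUES (?, ?)",
                [("2020-01-04", "DE"), ("2020-01-04", "FR"), ("2020-03-15", "DE")])

# Every date is paired with every distinct country from sales,
# regardless of whether that date appears in sales at all.
rows = cur.execute("""
    SELECT d.date AS date_list, x.country
    FROM dates AS d
    CROSS JOIN (SELECT DISTINCT country FROM sales) AS x
    ORDER BY d.date, x.country
""").fetchall()
print(rows)  # 3 dates x 2 distinct countries = 6 rows
```

Because the join has no condition, the row count is always dates × distinct countries, which is exactly the "no matter if the date exists in both tables" behavior the question asks for.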


Compare values between two tables with over partition criteria

DB-Fiddle
/* Table Campaigns */
CREATE TABLE campaigns (
id SERIAL PRIMARY KEY,
insert_time DATE,
campaign VARCHAR,
tranches VARCHAR,
quantity DECIMAL);
INSERT INTO campaigns
(insert_time, campaign, tranches, quantity)
VALUES
('2021-01-01', 'C001', 't', '500'),
('2021-01-01', 'C002', 't', '600'),
('2021-01-02', 'C001', 't', '500'),
('2021-01-02', 'C002', 't', '600');
/* Table Tranches */
CREATE TABLE tranches (
id SERIAL PRIMARY KEY,
insert_time DATE,
campaign VARCHAR,
tranches VARCHAR,
quantity DECIMAL);
INSERT INTO tranches
(insert_time, campaign, tranches, quantity)
VALUES
('2021-01-01', 'C001', 't1', '200'),
('2021-01-01', 'C001', 't2', '120'),
('2021-01-01', 'C001', 't3', '180'),
('2021-01-01','C002', 't1', '350'),
('2021-01-01','C002', 't2', '250'),
('2021-01-02', 'C001', 't1', '400'),
('2021-01-02', 'C001', 't2', '120'),
('2021-01-02', 'C001', 't3', '180'),
('2021-01-02','C002', 't1', '350'),
('2021-01-02','C002', 't2', '250');
Expected Result:
insert_time | campaign | tranches | quantity_campaigns | quantity_tranches | check
--------------|------------|------------|---------------------|---------------------|-----------
2021-01-01 | C001 | t | 500 | 500 | ok
2021-01-01 | C002 | t | 600 | 600 | ok
--------------|------------|------------|---------------------|---------------------|------------
2021-01-02 | C001 | t | 500 | 700 | error
2021-01-02 | C002 | t | 600 | 600 | ok
I want to compare the total quantity per campaign in table campaigns with the total quantity per campaign in table tranches.
So far I have been able to develop this query:
SELECT
c.insert_time AS insert_time,
c.campaign AS campaign,
c.tranches AS tranches,
c.quantity AS quantity_campaigns,
t.quantity AS quantity_tranches,
(CASE WHEN
MAX(c.quantity) OVER(PARTITION BY c.insert_time, c.campaign) = SUM(t.quantity) OVER(PARTITION BY t.insert_time, t.campaign)
THEN 'ok' ELSE 'error' END) AS check
FROM campaigns c
LEFT JOIN tranches t ON c.campaign = t.campaign
ORDER BY 1,2,3,4,5;
However, it does not give me the expected result.
What do I need to change to make it work?
I think the result you're looking for should be something like this. The problem is that you're trying to aggregate over two groupings after a join, which will either yield too many rows or incorrect calculations. By aggregating in CTEs, and then joining the CTEs after aggregation has occurred, you can achieve the results you are looking for. See my example below:
WITH campaign_agg AS(
SELECT c.insert_time, c.campaign, c.tranches, MAX(c.quantity) c_quantity
FROM campaigns c
GROUP BY c.insert_time, c.campaign, c.tranches
), tranch_agg AS(
SELECT t.insert_time, t.campaign, SUM(t.quantity) as t_sum
FROM tranches t
GROUP BY t.insert_time, t.campaign
)
SELECT c.insert_time, c.campaign, c.tranches, c.c_quantity, t.t_sum,
CASE WHEN c.c_quantity = t.t_sum THEN 'ok' ELSE 'error' END as check
FROM campaign_agg c
JOIN
tranch_agg t ON
t.insert_time = c.insert_time
AND t.campaign = c.campaign
ORDER BY c.insert_time, c.campaign
I have a db-fiddle for this as well: https://www.db-fiddle.com/f/33x4upVEcgTMNehiHCKzfN/1
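The CTE pattern can be checked end-to-end in SQLite through Python's sqlite3. Note this is a sketch of the approach, not the exact Postgres query: the check column is renamed check_result here, since CHECK is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE campaigns (insert_time TEXT, campaign TEXT, tranches TEXT, quantity REAL)")
cur.executemany("INSERT INTO campaigns VALUES (?, ?, ?, ?)", [
    ("2021-01-01", "C001", "t", 500), ("2021-01-01", "C002", "t", 600),
    ("2021-01-02", "C001", "t", 500), ("2021-01-02", "C002", "t", 600)])
cur.execute("CREATE TABLE tranches (insert_time TEXT, campaign TEXT, tranches TEXT, quantity REAL)")
cur.executemany("INSERT INTO tranches VALUES (?, ?, ?, ?)", [
    ("2021-01-01", "C001", "t1", 200), ("2021-01-01", "C001", "t2", 120),
    ("2021-01-01", "C001", "t3", 180), ("2021-01-01", "C002", "t1", 350),
    ("2021-01-01", "C002", "t2", 250), ("2021-01-02", "C001", "t1", 400),
    ("2021-01-02", "C001", "t2", 120), ("2021-01-02", "C001", "t3", 180),
    ("2021-01-02", "C002", "t1", 350), ("2021-01-02", "C002", "t2", 250)])

rows = cur.execute("""
    WITH campaign_agg AS (
        -- one expected total per (day, campaign, tranche label)
        SELECT insert_time, campaign, tranches, MAX(quantity) AS c_quantity
        FROM campaigns
        GROUP BY insert_time, campaign, tranches
    ), tranch_agg AS (
        -- actual total per (day, campaign) summed over tranches
        SELECT insert_time, campaign, SUM(quantity) AS t_sum
        FROM tranches
        GROUP BY insert_time, campaign
    )
    SELECT c.insert_time, c.campaign, c.c_quantity, t.t_sum,
           CASE WHEN c.c_quantity = t.t_sum THEN 'ok' ELSE 'error' END AS check_result
    FROM campaign_agg c
    JOIN tranch_agg t
      ON t.insert_time = c.insert_time AND t.campaign = c.campaign
    ORDER BY c.insert_time, c.campaign
""").fetchall()
```

Only the 2021-01-02 / C001 row mismatches (500 expected vs. 400+120+180 = 700 actual), which is the single 'error' row.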
DB-Fiddle
SELECT
c.insert_time AS insert_time,
c.campaign AS campaign,
c.tranches AS tranches,
SUM(c.quantity) AS quantity_campaigns,
SUM(t1.quantity) AS quantity_tranches,
(CASE WHEN SUM(c.quantity) <> SUM(t1.quantity) THEN 'error' ELSE 'ok' END) AS check
FROM campaigns c
LEFT JOIN
(SELECT
t.insert_time AS insert_time,
t.campaign AS campaign,
SUM(t.quantity) AS quantity
FROM tranches t
GROUP BY 1,2
ORDER BY 1,2) t1 on t1.insert_time = c.insert_time AND t1.campaign = c.campaign
GROUP BY 1,2,3
ORDER BY 1,2,3;

What SQL query can be used to limit continuous periods by parameter value, and then to calculate the datediff inside them?

I have a table of phone calls consisting of user_id, call_date, city,
where city can be either A or B.
It looks like this:
user_id | call_date  | city
--------|------------|-----
1       | 2021-01-01 | A
1       | 2021-01-02 | B
1       | 2021-01-03 | B
1       | 2021-01-05 | B
1       | 2021-01-10 | A
1       | 2021-01-12 | B
1       | 2021-01-16 | A
2       | 2021-01-17 | A
2       | 2021-01-20 | B
2       | 2021-01-22 | B
2       | 2021-01-23 | A
2       | 2021-01-24 | B
2       | 2021-01-26 | B
2       | 2021-01-30 | A
For this table, we need to select for each user all the periods when he was in city B.
These periods are counted in days and start when the first call is made from city B, and end as soon as the next call is made from city A.
So for user_id = 1 the first period starts on 2021-01-02 and ends on 2021-01-10. There can be several such periods for each user.
The result should be the following table:
user_id | period_1 | period_2
--------|----------|---------
1       | 8        | 4
2       | 3        | 6
Can you please tell me how I can limit the periods according to the condition of the problem, and then calculate the datediff within each period?
Thank you
This is a typical gaps-and-islands problem. You need to group consecutive rows first, then find the first call_date of the next group. Sample code for Postgres is below; the same approach can be adapted to another DBMS by applying the appropriate function to calculate the difference in days.
with a (user_id, call_date, city)
as (
select *
from ( values
('1', date '2021-01-01', 'A'),
('1', date '2021-01-02', 'B'),
('1', date '2021-01-03', 'B'),
('1', date '2021-01-05', 'B'),
('1', date '2021-01-10', 'A'),
('1', date '2021-01-12', 'B'),
('1', date '2021-01-16', 'A'),
('2', date '2021-01-17', 'A'),
('2', date '2021-01-20', 'B'),
('2', date '2021-01-22', 'B'),
('2', date '2021-01-23', 'A'),
('2', date '2021-01-24', 'B'),
('2', date '2021-01-26', 'B'),
('2', date '2021-01-30', 'A')
) as t
)
, grp as (
/*Identify groups*/
select a.*,
/*This is a grouping of consecutive rows:
they will have the same difference between
two row_numbers while the more detailed
row_number changes, which means the attribute had changed.
*/
dense_rank() over(
partition by user_id
order by call_date asc
) -
dense_rank() over(
partition by user_id, city
order by call_date asc
) as grp,
/*Get next call date*/
lead(call_date, 1, call_date)
over(
partition by user_id
order by call_date asc
) as next_dt
from a
)
select
user_id,
city,
min(call_date) as dt_from,
max(next_dt) as dt_to,
max(next_dt) - min(call_date) as diff
from grp
where city = 'B'
group by user_id, grp, city
order by 1, 3
user_id | city | dt_from | dt_to | diff
:------ | :--- | :--------- | :--------- | ---:
1 | B | 2021-01-02 | 2021-01-10 | 8
1 | B | 2021-01-12 | 2021-01-16 | 4
2 | B | 2021-01-20 | 2021-01-23 | 3
2 | B | 2021-01-24 | 2021-01-30 | 6
db<>fiddle here
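The gaps-and-islands query above can be reproduced in SQLite (3.25+ for window functions) from Python. The only Postgres-specific part, direct date subtraction, is replaced here with julianday():

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE calls (user_id TEXT, call_date TEXT, city TEXT)")
cur.executemany("INSERT INTO calls VALUES (?, ?, ?)", [
    ("1", "2021-01-01", "A"), ("1", "2021-01-02", "B"), ("1", "2021-01-03", "B"),
    ("1", "2021-01-05", "B"), ("1", "2021-01-10", "A"), ("1", "2021-01-12", "B"),
    ("1", "2021-01-16", "A"), ("2", "2021-01-17", "A"), ("2", "2021-01-20", "B"),
    ("2", "2021-01-22", "B"), ("2", "2021-01-23", "A"), ("2", "2021-01-24", "B"),
    ("2", "2021-01-26", "B"), ("2", "2021-01-30", "A")])

rows = cur.execute("""
    WITH grp AS (
        SELECT user_id, call_date, city,
               -- the difference of the two rankings is constant within an island
               DENSE_RANK() OVER (PARTITION BY user_id ORDER BY call_date)
             - DENSE_RANK() OVER (PARTITION BY user_id, city ORDER BY call_date) AS g,
               -- the next row's call date closes the period
               LEAD(call_date, 1, call_date) OVER (
                   PARTITION BY user_id ORDER BY call_date) AS next_dt
        FROM calls
    )
    SELECT user_id, MIN(call_date) AS dt_from, MAX(next_dt) AS dt_to,
           CAST(julianday(MAX(next_dt)) - julianday(MIN(call_date)) AS INTEGER) AS diff
    FROM grp
    WHERE city = 'B'
    GROUP BY user_id, g
    ORDER BY user_id, dt_from
""").fetchall()
```

The four result rows match the fiddle output: periods of 8 and 4 days for user 1, and 3 and 6 days for user 2.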

Split fixed value across countries based on pre-defined share per country

DB-Fiddle
CREATE TABLE costs (
id SERIAL PRIMARY KEY,
event_date DATE,
country VARCHAR,
channel VARCHAR,
costs DECIMAL
);
INSERT INTO costs
(event_date, country, channel, costs)
VALUES
('2020-02-08', 'DE', 'channel_01', '400'),
('2020-02-08', 'DE', 'channel_02', '400'),
('2020-02-08', 'DE', 'channel_03', '400'),
('2020-02-08', 'FR', 'channel_01', '400'),
('2020-02-08', 'FR', 'channel_02', '400'),
('2020-02-08', 'NL', 'channel_01', '400'),
('2020-04-15', 'DE', 'channel_01', '300'),
('2020-04-15', 'FR', 'channel_01', '300'),
('2020-04-15', 'NL', 'channel_01', '300'),
('2020-04-15', 'NL', 'channel_02', '300'),
('2020-04-15', 'NL', 'channel_03', '300');
Expected Result:
event_date  | country | costs
------------|---------|------------------
2020-02-08  | DE      | 240 (=400 x 0.6)
2020-02-08  | FR      | 120 (=400 x 0.3)
2020-02-08  | NL      | 40 (=400 x 0.1)
------------|---------|------------------
2020-04-15  | DE      | 180 (=300 x 0.6)
2020-04-15  | FR      | 90 (=300 x 0.3)
2020-04-15  | NL      | 30 (=300 x 0.1)
I want to split the costs across the countries per day based on pre-defined shares.
The shares are: DE = 0.6, FR = 0.3, NL = 0.1
SELECT
c.event_date,
c.country,
c.costs
FROM costs c
GROUP BY 1,2,3
ORDER BY 1,2;
Do you have an idea what query I need to achieve this?
Just use a CASE expression:
SELECT c.event_date,
c.country,
(CASE WHEN c.country = 'DE' THEN 0.6 * c.costs
WHEN c.country = 'FR' THEN 0.3 * c.costs
WHEN c.country = 'NL' THEN 0.1 * c.costs
END) as allocated_costs
FROM costs c
GROUP BY c.event_date, c.country, c.costs
ORDER BY 1, 2;
You can more conveniently store the values in a derived table, if you prefer:
SELECT c.event_date, c.country, (v.alloc * c.costs) as allocated_costs
FROM costs c JOIN
(VALUES ('DE', 0.6), ('FR', 0.3), ('NL', 0.1)
) v(country, alloc)
USING (country)
GROUP BY c.event_date, c.country, c.costs, v.alloc
ORDER BY 1, 2;
Here is a db<>fiddle.
The best way would be to add those rates into a table and use that table in your query.
To illustrate how you can do it with hard-coded values in your query:
SELECT
c.event_date,
c.country,
c.costs * case country when 'DE' then 0.6
when 'FR' then 0.3
when 'NL' then 0.1
end
FROM costs c
GROUP BY 1,2,3
ORDER BY 1,2;
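Both variants can be sanity-checked in SQLite from Python. SQLite does not support a column-list alias on a VALUES clause in FROM, so the share table is written as a CTE here (a sketch of the approach, not the exact Postgres syntax):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE costs (event_date TEXT, country TEXT, channel TEXT, costs REAL)")
cur.executemany("INSERT INTO costs VALUES (?, ?, ?, ?)", [
    ("2020-02-08", "DE", "channel_01", 400), ("2020-02-08", "DE", "channel_02", 400),
    ("2020-02-08", "DE", "channel_03", 400), ("2020-02-08", "FR", "channel_01", 400),
    ("2020-02-08", "FR", "channel_02", 400), ("2020-02-08", "NL", "channel_01", 400),
    ("2020-04-15", "DE", "channel_01", 300), ("2020-04-15", "FR", "channel_01", 300),
    ("2020-04-15", "NL", "channel_01", 300), ("2020-04-15", "NL", "channel_02", 300),
    ("2020-04-15", "NL", "channel_03", 300)])

rows = cur.execute("""
    WITH shares(country, share) AS (
        VALUES ('DE', 0.6), ('FR', 0.3), ('NL', 0.1)
    )
    SELECT c.event_date, c.country, c.costs * s.share AS allocated_costs
    FROM costs c
    JOIN shares s ON s.country = c.country
    -- the GROUP BY collapses the per-channel duplicates, since each
    -- channel carries the same cost value for a given day and country
    GROUP BY c.event_date, c.country, c.costs, s.share
    ORDER BY 1, 2
""").fetchall()
```

Note the GROUP BY only deduplicates because every channel row of a given day/country carries the same cost, which matches the sample data and the expected result.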

Calculate share of value per day per country

DB-Fiddle
CREATE TABLE sales (
id SERIAL PRIMARY KEY,
event_date DATE,
country VARCHAR,
channel VARCHAR,
sales DECIMAL
);
INSERT INTO sales
(event_date, country, channel, sales)
VALUES
('2020-02-08', 'DE', 'channel_01', '500'),
('2020-02-08', 'DE', 'channel_02', '400'),
('2020-02-08', 'DE', 'channel_03', '200'),
('2020-02-08', 'FR', 'channel_01', '900'),
('2020-02-08', 'FR', 'channel_02', '800'),
('2020-02-08', 'NL', 'channel_01', '100'),
('2020-04-15', 'DE', 'channel_01', '700'),
('2020-04-15', 'FR', 'channel_01', '500'),
('2020-04-15', 'NL', 'channel_01', '850'),
('2020-04-15', 'NL', 'channel_02', '250'),
('2020-04-15', 'NL', 'channel_03', '300');
Expected Result:
event_date | country | share_per_day_per_country
------------|-------------|----------------------------------------------
2020-02-08 | DE | 0.379 (=1100/2900)
2020-02-08 | FR | 0.586 (=1700/2900)
2020-02-08 | NL | 0.034 (=100/2900)
------------|-------------|----------------------------------------------
2020-04-15 | DE | 0.269 (=700/2600)
2020-04-15 | FR | 0.192 (=500/2600)
2020-04-15 | NL | 0.538 (=1400/2600)
I want to calculate the sales share per country per day as it is done in the question here.
However, since I added the column channel in the database I am not getting the right shares anymore using this query:
SELECT
s.event_date,
s.country,
s.sales,
s.sales/SUM(s.sales) OVER (PARTITION BY s.event_date) AS share_per_day_per_country
FROM sales s
GROUP BY 1,2,3
ORDER BY 1,2;
How do I need to modify this query to get the expected results?
Here is one way:
select distinct
s.event_date,
s.country,
SUM(s.sales) OVER (PARTITION BY s.event_date, s.country) / SUM(s.sales) OVER (PARTITION BY s.event_date) AS share_per_day
from
sales s
ORDER BY
s.event_date,
s.country;
or
SELECT
s.event_date,
s.country,
sum(sales) / max(share_per_day) share_per_day_per_country
from
(
select *,SUM(s.sales) OVER (PARTITION BY s.event_date) AS share_per_day
from sales s
) s
GROUP BY
s.event_date,s.country
ORDER BY
s.event_date,s.country;
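The first variant carries over to SQLite almost unchanged; sales is stored as REAL here so the division stays non-integer (a sketch run through Python's sqlite3):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (event_date TEXT, country TEXT, channel TEXT, sales REAL)")
cur.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("2020-02-08", "DE", "channel_01", 500), ("2020-02-08", "DE", "channel_02", 400),
    ("2020-02-08", "DE", "channel_03", 200), ("2020-02-08", "FR", "channel_01", 900),
    ("2020-02-08", "FR", "channel_02", 800), ("2020-02-08", "NL", "channel_01", 100),
    ("2020-04-15", "DE", "channel_01", 700), ("2020-04-15", "FR", "channel_01", 500),
    ("2020-04-15", "NL", "channel_01", 850), ("2020-04-15", "NL", "channel_02", 250),
    ("2020-04-15", "NL", "channel_03", 300)])

rows = cur.execute("""
    SELECT DISTINCT event_date, country,
           -- country total over day total; DISTINCT removes the
           -- per-channel duplicates the window functions leave behind
           SUM(sales) OVER (PARTITION BY event_date, country)
         / SUM(sales) OVER (PARTITION BY event_date) AS share_per_day
    FROM sales
    ORDER BY event_date, country
""").fetchall()
```

This yields one row per day and country, with shares matching the expected result (e.g. 1100/2900 for DE on 2020-02-08).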

Calculate share of value per date and country and handle zero values separately

DB-Fiddle
CREATE TABLE sales (
id SERIAL PRIMARY KEY,
event_date DATE,
country VARCHAR,
channel VARCHAR,
sales DECIMAL
);
INSERT INTO sales
(event_date, country, channel, sales)
VALUES
('2020-02-08', 'DE', 'channel_01', '500'),
('2020-02-08', 'DE', 'channel_02', '400'),
('2020-02-08', 'DE', 'channel_03', '200'),
('2020-02-08', 'FR', 'channel_01', '900'),
('2020-02-08', 'FR', 'channel_02', '800'),
('2020-02-08', 'NL', 'channel_01', '100'),
('2020-03-20', 'DE', 'channel_01', '0'),
('2020-03-20', 'FR', 'channel_01', '0'),
('2020-03-20', 'FR', 'channel_02', '0'),
('2020-03-20', 'FR', 'channel_03', '0'),
('2020-03-20', 'NL', 'channel_01', '0'),
('2020-04-15', 'DE', 'channel_01', '700'),
('2020-04-15', 'FR', 'channel_01', '500'),
('2020-04-15', 'NL', 'channel_01', '850'),
('2020-04-15', 'NL', 'channel_02', '250'),
('2020-04-15', 'NL', 'channel_03', '300');
Expected Result:
event_date | country | share_per_day_per_country | details of share calculation
------------|-----------|----------------------------|--------------------------------------------
2020-02-08 | DE | 0.379 | = (500+400+200) / (500+400+200+900+800+100)
2020-02-08 | FR | 0.586 | = (900+800) / (500+400+200+900+800+100)
2020-02-08 | NL | 0.034 | = (100) / (500+400+200+900+800+100)
------------|-----------|----------------------------|--------------------------------------------
2020-03-20 | DE | 0.333 | = equal split in case of 0 sales
2020-03-20 | FR | 0.333 | = equal split in case of 0 sales
2020-03-20 | NL | 0.333 | = equal split in case of 0 sales
------------|-----------|----------------------------|--------------------------------------------
2020-04-15 | DE | 0.269 | = (700) / (700+500+850+250+300)
2020-04-15 | FR | 0.192 | = (500) / (700+500+850+250+300)
2020-04-15 | NL | 0.538 | = (850+250+300) / (700+500+850+250+300)
In the expected result I want to calculate the share of the sales per country per day. In case there is a day with no sales, the share should be divided equally among the countries.
In order to achieve this so far I have developed this query:
SELECT
t1.event_date,
t1.country,
t1.sales,
t1.total_sales_per_country,
t1.total_sales_per_day,
(CASE WHEN SUM(t1.sales) OVER (PARTITION BY t1.event_date) = 0 THEN
100/(COUNT(t1.country) OVER (PARTITION BY t1.event_date))/100::decimal
ELSE t1.total_sales_per_country / t1.total_sales_per_day END) AS share_per_day_per_country
FROM
(SELECT
s.event_date,
s.country,
s.sales,
SUM(s.sales) OVER (PARTITION BY s.event_date) AS total_sales_per_day,
SUM(s.sales) OVER (PARTITION BY s.event_date, s.country) AS total_sales_per_country
FROM sales s
GROUP BY 1,2,3
ORDER BY 1,2) t1
GROUP BY 1,2,3,4,5
ORDER BY 1,2
This query almost gives me the correct result.
However, instead of listing each event_date only once, it lists them multiple times.
I have tried a few ways (e.g. DISTINCT pl.event_date) to fix this issue but none of them worked.
How do I have to modify the query to get the entire expected result?
Remove t1.sales and t1.total_sales_per_country from the select list. Now use distinct instead of group by.
CREATE TABLE sales (
id SERIAL PRIMARY KEY,
event_date DATE,
country VARCHAR,
channel VARCHAR,
sales DECIMAL
);
INSERT INTO sales
(event_date, country, channel, sales)
VALUES
('2020-02-08', 'DE', 'channel_01', '500'),
('2020-02-08', 'DE', 'channel_02', '400'),
('2020-02-08', 'DE', 'channel_03', '200'),
('2020-02-08', 'FR', 'channel_01', '900'),
('2020-02-08', 'FR', 'channel_02', '800'),
('2020-02-08', 'NL', 'channel_01', '100'),
('2020-03-20', 'DE', 'channel_01', '0'),
('2020-03-20', 'FR', 'channel_01', '0'),
('2020-03-20', 'FR', 'channel_02', '0'),
('2020-03-20', 'FR', 'channel_03', '0'),
('2020-03-20', 'NL', 'channel_01', '0'),
('2020-04-15', 'DE', 'channel_01', '700'),
('2020-04-15', 'FR', 'channel_01', '500'),
('2020-04-15', 'NL', 'channel_01', '850'),
('2020-04-15', 'NL', 'channel_02', '250'),
('2020-04-15', 'NL', 'channel_03', '300');
Query:
SELECT
distinct t1.event_date,
t1.country,
t1.total_sales_per_day,
(CASE WHEN SUM(t1.sales) OVER (PARTITION BY t1.event_date) = 0 THEN
100/(COUNT(t1.country) OVER (PARTITION BY t1.event_date))/100::decimal
ELSE t1.total_sales_per_country / t1.total_sales_per_day END) AS share_per_day_per_country
FROM
(SELECT
s.event_date,
s.country,
s.sales,
SUM(s.sales) OVER (PARTITION BY s.event_date) AS total_sales_per_day,
SUM(s.sales) OVER (PARTITION BY s.event_date, s.country) AS total_sales_per_country
FROM sales s
GROUP BY 1,2,3
ORDER BY 1,2) t1
ORDER BY 1,2
Output:
event_date | country | total_sales_per_day | share_per_day_per_country
-----------|---------|---------------------|---------------------------
2020-02-08 | DE      | 2900                | 0.37931034482758620690
2020-02-08 | FR      | 2900                | 0.58620689655172413793
2020-02-08 | NL      | 2900                | 0.03448275862068965517
2020-03-20 | DE      | 0                   | 0.33000000000000000000
2020-03-20 | FR      | 0                   | 0.33000000000000000000
2020-03-20 | NL      | 0                   | 0.33000000000000000000
2020-04-15 | DE      | 2600                | 0.26923076923076923077
2020-04-15 | FR      | 2600                | 0.19230769230769230769
2020-04-15 | NL      | 2600                | 0.53846153846153846154
db<>fiddle here
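One detail worth noting in that output: 100/count/100::decimal performs integer division first, which is why the equal split shows as 0.33 rather than 1/3. A sketch that avoids both the duplicate rows and the truncation, by pre-aggregating per country in a CTE (shown in SQLite via Python's sqlite3):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (event_date TEXT, country TEXT, channel TEXT, sales REAL)")
cur.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("2020-02-08", "DE", "channel_01", 500), ("2020-02-08", "DE", "channel_02", 400),
    ("2020-02-08", "DE", "channel_03", 200), ("2020-02-08", "FR", "channel_01", 900),
    ("2020-02-08", "FR", "channel_02", 800), ("2020-02-08", "NL", "channel_01", 100),
    ("2020-03-20", "DE", "channel_01", 0), ("2020-03-20", "FR", "channel_01", 0),
    ("2020-03-20", "FR", "channel_02", 0), ("2020-03-20", "FR", "channel_03", 0),
    ("2020-03-20", "NL", "channel_01", 0),
    ("2020-04-15", "DE", "channel_01", 700), ("2020-04-15", "FR", "channel_01", 500),
    ("2020-04-15", "NL", "channel_01", 850), ("2020-04-15", "NL", "channel_02", 250),
    ("2020-04-15", "NL", "channel_03", 300)])

rows = cur.execute("""
    WITH per_country AS (
        -- one row per date/country, so no DISTINCT is needed later
        SELECT event_date, country, SUM(sales) AS country_sales
        FROM sales
        GROUP BY event_date, country
    )
    SELECT event_date, country,
           CASE WHEN SUM(country_sales) OVER (PARTITION BY event_date) = 0
                THEN 1.0 / COUNT(*) OVER (PARTITION BY event_date)
                ELSE country_sales / SUM(country_sales) OVER (PARTITION BY event_date)
           END AS share
    FROM per_country
    ORDER BY event_date, country
""").fetchall()
```

The zero-sales day 2020-03-20 splits into exact thirds, while the other days reproduce the expected shares.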