Assign total value of month to each day of month - sql

DB-Fiddle
CREATE TABLE sales (
id SERIAL PRIMARY KEY,
country VARCHAR(255),
sales_date DATE,
sales_volume DECIMAL,
fix_costs DECIMAL
);
INSERT INTO sales
(country, sales_date, sales_volume, fix_costs
)
VALUES
('DE', '2020-01-03', '500', '0'),
('FR', '2020-01-03', '350', '0'),
('None', '2020-01-31', '0', '2000'),
('DE', '2020-02-15', '0', '0'),
('FR', '2020-02-15', '0', '0'),
('None', '2020-02-29', '0', '5000'),
('DE', '2020-03-27', '180', '0'),
('FR', '2020-03-27', '970', '0'),
('None', '2020-03-31', '0', '4000');
Expected Result:
sales_date | country | sales_volume | fix_costs
--------------|-------------|-------------------|-----------------
2020-01-03 | DE | 500 | 2000
2020-01-03 | FR | 350 | 2000
2020-02-15 | DE | 0 | 5000
2020-02-15 | FR | 0 | 5000
2020-03-27 | DE | 180 | 4000
2020-03-27 | FR | 970 | 4000
As you can see in my table I have a total of fix_costs assigned to the last day of each month.
In my results I want to assign this total of fix_costs to each day of the month.
Therefore, I tried to go with this query:
SELECT
s.sales_date,
s.country,
s.sales_volume,
f.fix_costs
FROM sales s
JOIN
(SELECT
((date_trunc('MONTH', sales_date) + INTERVAL '1 MONTH - 1 DAY')::date) AS month_ld,
SUM(fix_costs) AS fix_costs
FROM sales
WHERE country = 'None'
GROUP BY month_ld) f ON f.month_ld = LAST_DAY(s.sales_date)
WHERE country <> 'None'
GROUP BY 1,2,3;
For this query I get an error on LAST_DAY(s.sales_date), since this function does not exist in PostgreSQL.
However, I have no clue how I can replace it correctly in order to get the expected result.
Can you help me?
(MariaDB Fiddle as comparison)

SELECT
s1.sales_date,
s1.country,
s1.sales_volume,
s2.fix_costs
FROM sales s1
JOIN sales s2 ON s1.country <> 'None' AND s2.country = 'None'
AND date_trunc('month', s1.sales_date) = date_trunc('month', s2.sales_date)
You need a self-join. The join conditions are:
First table without None records (s1.country <> 'None')
Second table only None records (s2.country = 'None')
Date: Only consider the year and month part and ignore the day. This can be achieved by normalizing the dates of both tables to the first of the month with date_trunc(). For example, '2020-02-15' and '2020-02-29' both result in '2020-02-01', which works well as a comparison and join condition.
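As a minimal sketch of the same idea applied to the query from the question (PostgreSQL assumed): the failing LAST_DAY() call is not needed once both sides of the join are normalized with date_trunc(). The alias month_start is only illustrative and does not appear in the original.

```sql
-- Sketch: the question's subquery approach, joined on the truncated month
-- instead of the last day of the month.
SELECT
    s.sales_date,
    s.country,
    s.sales_volume,
    f.fix_costs
FROM sales s
JOIN (
    -- One row per month holding the total fixed costs of the 'None' records
    SELECT
        date_trunc('month', sales_date) AS month_start,
        SUM(fix_costs)                  AS fix_costs
    FROM sales
    WHERE country = 'None'
    GROUP BY date_trunc('month', sales_date)
) f ON f.month_start = date_trunc('month', s.sales_date)
WHERE s.country <> 'None'
ORDER BY s.sales_date, s.country;
```

The outer GROUP BY 1,2,3 from the original query is left out here because the outer query no longer aggregates anything.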
Alternatively:
SELECT
*
FROM (
SELECT
sales_date,
country,
sales_volume,
SUM(fix_costs) OVER (PARTITION BY date_trunc('month', sales_date)) as fix_costs
FROM sales
) s
WHERE country <> 'None'
You can use the SUM() window function partitioned by date_trunc() as described above. Then you need to filter out the None records afterwards.

If I understand correctly, use window functions:
select s.*,
sum(fix_costs) over (partition by date_trunc('month', sales_date)) as month_fixed_costs
from sales s;
Note that this assumes that fixed costs are NULL or 0 on other days -- which is true for the data in the question.
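If the fixed costs can also be repeated on the country rows, the assumption above no longer holds. A hedged variant for PostgreSQL sums only the None rows inside the window and filters those rows out afterwards. The alias t is illustrative.

```sql
-- Sketch: count fix_costs only from the 'None' rows, per calendar month.
SELECT t.sales_date, t.country, t.sales_volume, t.month_fixed_costs
FROM (
    SELECT
        s.sales_date,
        s.country,
        s.sales_volume,
        SUM(CASE WHEN s.country = 'None' THEN s.fix_costs ELSE 0 END)
            OVER (PARTITION BY date_trunc('month', s.sales_date)) AS month_fixed_costs
    FROM sales s
) t
WHERE t.country <> 'None'   -- drop the helper rows after the window has been evaluated
ORDER BY t.sales_date, t.country;
```

The filter has to sit in the outer query: a WHERE clause in the inner query would remove the None rows before the window function sees them.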

Related

How to use the NOT IN operator (<>) in a GROUP_CONCAT with dates?

I have the calendar table with all the dates of the month of December 2021 (I will only exemplify some dates within the table, but it is understood that it actually contains all the days of said month):
ID | date
---|------------
01 | 2021-12-01
02 | 2021-12-02
03 | 2021-12-03
04 | 2021-12-04
05 | 2021-12-05
I have the users table:
ID | name   | num_employee
---|--------|--------------
01 | Andrew | 101
02 | Mary   | 102
I have the attendances table:
ID | date       | num_employee
---|------------|--------------
01 | 2021-12-03 | 101
02 | 2021-12-04 | 101
03 | 2021-12-03 | 102
04 | 2021-12-04 | 102
05 | 2021-12-05 | 101
06 | 2021-12-06 | 102
I worked on a query to display the employee number, their name, the days they attended and the days they were absent:
SELECT u.num_employee,
u.name,
a.date AS attendances,
c.date as faults FROM users u
JOIN (SELECT num_employee,
GROUP_CONCAT(DISTINCT EXTRACT(DAY FROM date)) AS date FROM attendances
WHERE date BETWEEN '2021-12-01' AND '2021-12-31'
GROUP BY num_employee) a ON a.num_employee = u.num_employee
LEFT JOIN (SELECT GROUP_CONCAT(DISTINCT EXTRACT(DAY FROM date)) AS date FROM calendar
WHERE date BETWEEN '2021-12-01' AND '2021-12-31') c ON c.date <> a.date
With the above query, I get this:
num_employee | name   | attendances | faults
-------------|--------|-------------|--------------------------
101          | Andrew | 3,4,5       | 1,2,3,4,5,6,7,8,9,10...
102          | Mary   | 3,4,6       | 1,2,3,4,5,6,7,8,9,10...
In the attendances column I get the days of December on which each employee was present. The faults column should contain only the days on which the employee was absent, but it shows every day of December.
I am almost sure the problem is in how I check that the days shown in the attendances column do not appear in the faults column. Specifically, I think this part of my evaluation is wrong:
ON c.date <> a.date
I'm under the impression that, since I'm working with GROUP_CONCAT, I should compare the dates differently. How could I adapt my query to get the following?
num_employee | name   | attendances | faults
-------------|--------|-------------|----------------------
101          | Andrew | 3,4,5       | 1,2,6,7,8,9,10...
102          | Mary   | 3,4,6       | 1,2,5,7,8,9,10...
The query in question cannot be adapted to use CTE given the version of MariaDB I am using. I am working on phpMyAdmin.
One solution is a subselect.
This also works in MySQL 5; with MySQL 8 you could make a CTE from attendances.
CREATE TABLE calendar (
`ID` INTEGER,
`date` VARCHAR(10)
);
INSERT INTO calendar
(`ID`, `date`)
VALUES
('01', '2021-12-01'),
('02', '2021-12-02'),
('03', '2021-12-03'),
('04', '2021-12-04'),
('05', '2021-12-05'),
('06', '2021-12-06'),
('07', '2021-12-07'),
('08', '2021-12-08'),
('09', '2021-12-09'),
('10', '2021-12-10'),
('11', '2021-12-11');
CREATE TABLE users (
`ID` INTEGER,
`name` VARCHAR(6),
`num_employee` INTEGER
);
INSERT INTO users
(`ID`, `name`, `num_employee`)
VALUES
('01', 'Andrew', '101'),
('02', 'Mary', '102');
CREATE TABLE attendances (
`ID` INTEGER,
`date` VARCHAR(10),
`num_employee` INTEGER
);
INSERT INTO attendances
(`ID`, `date`, `num_employee`)
VALUES
('01', '2021-12-03', '101'),
('02', '2021-12-04', '101'),
('03', '2021-12-03', '102'),
('04', '2021-12-04', '102'),
('05', '2021-12-05', '101'),
('06', '2021-12-06', '102');
SELECT u.num_employee,
u.name,
a.date AS attendances,
(SELECT GROUP_CONCAT(DISTINCT EXTRACT(DAY FROM date)) AS date FROM calendar
WHERE date BETWEEN '2021-12-01' AND '2021-12-31'
AND NOT FIND_IN_SET(EXTRACT(DAY FROM date),a.date)) as faults FROM users u
JOIN (SELECT num_employee,
GROUP_CONCAT(DISTINCT EXTRACT(DAY FROM date)) AS date FROM attendances
WHERE date BETWEEN '2021-12-01' AND '2021-12-31'
GROUP BY num_employee) a ON a.num_employee = u.num_employee
num_employee | name | attendances | faults
-----------: | :----- | :---------- | :----------------
101 | Andrew | 3,4,5 | 1,2,6,7,8,9,10,11
102 | Mary | 3,4,6 | 1,2,5,7,8,9,10,11
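Why the original condition ON c.date <> a.date cannot work is easier to see in isolation. Both sides are single comma-separated strings produced by GROUP_CONCAT, so <> compares two whole strings, whereas FIND_IN_SET() looks up one value inside such a list. A small sketch for MySQL/MariaDB with literal values (the column aliases are illustrative):

```sql
SELECT
    '1,2,3,4,5,6' <> '3,4,5'   AS lists_differ,  -- 1: one comparison of two whole strings
    FIND_IN_SET('3', '3,4,5')  AS pos_of_3,      -- 1: found at position 1, so day 3 is an attendance
    FIND_IN_SET('6', '3,4,5')  AS pos_of_6;      -- 0: not in the list, so day 6 is a fault
```

Because the two lists are never identical, the <> join condition is always true and every calendar day ends up in faults.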
Consider a cross join of the users and calendar tables to get all possible pairs of employees and dates. Then left join to attendances and run the GROUP_CONCAT aggregate with a conditional expression for faults:
SELECT u.num_employee,
u.name,
GROUP_CONCAT(DISTINCT EXTRACT(DAY FROM a.date)) AS attendances,
GROUP_CONCAT(DISTINCT
IF(a.date IS NULL, EXTRACT(DAY FROM c.date), NULL)
) AS faults
FROM calendar c
INNER JOIN users u
ON c.date BETWEEN '2021-12-01' AND '2021-12-31'
LEFT JOIN attendances a
ON c.date = a.date
AND u.num_employee = a.num_employee
AND a.date BETWEEN '2021-12-01' AND '2021-12-31'
GROUP BY u.num_employee,
u.name;

Iterating through dates using a 12 months period starting from a certain month

DB-Fiddle
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
order_date DATE,
customer VARCHAR(255)
);
INSERT INTO customers
(order_date, customer)
VALUES
('2020-05-10', 'user_01'),
('2020-05-15', 'user_01'),
('2020-05-18', 'user_02'),
('2020-05-26', 'user_03'),
('2020-06-03', 'user_04'),
('2020-06-05', 'user_05'),
('2020-06-24', 'user_06'),
('2021-05-02', 'user_01'),
('2021-05-05', 'user_01'),
('2021-05-12', 'user_03'),
('2021-05-20', 'user_07'),
('2021-06-08', 'user_04'),
('2021-06-20', 'user_06'),
('2021-06-21', 'user_08'),
('2021-06-25', 'user_08'),
('2021-06-25', 'user_09');
Expected Result:
order_date | customer |
-------------|-------------|----
2021-05-02 | user_01 |
2021-05-05 | user_01 |
2021-05-12 | user_03 |
-------------|-------------|----
2021-06-08 | user_04 |
2021-06-20 | user_06 |
I want to list all customers in a certain month which
a) have not been existing in the past 12 months and
b) already existed before this period of 12 months.
For a single month I am able to achieve this by using this query:
SELECT
c1.order_date,
c1.customer
FROM customers c1
WHERE c1.order_date BETWEEN '2021-05-01 00:00:00' AND '2021-05-31 23:59:59'
/* Check if customer exists in the past 12 months */
AND NOT EXISTS
(SELECT
c2.customer
FROM customers c2
WHERE c2.order_date BETWEEN '2020-06-01 00:00:00' AND '2021-04-30 23:59:59'
AND c2.customer = c1.customer)
/* Check if customer exists before the past 12 months */
AND EXISTS
(SELECT
c2.customer
FROM customers c2
WHERE c2.order_date < '2020-06-01 00:00:00'
AND c2.customer = c1.customer)
ORDER BY 2;
However, I have to run this query for each month separately.
Therefore, I am wondering if there is an iterating solution that goes through multiple months at once.
In the example above it would run BETWEEN '2021-05-01 00:00:00' AND '2021-06-30 23:59:59' and calculate 12 months back from May and in the next step 12 months back from June to get the expected result.
Do you have any idea if this is possible?
SELECT
t1.order_date,
t1.customer
FROM
(SELECT
c.customer,
c.order_date,
LAG(c.order_date) OVER (PARTITION BY c.customer ORDER BY c.order_date) AS prev_order_date
FROM customers c) t1
WHERE t1.order_date BETWEEN '2021-05-01' AND '2021-06-30'
AND t1.prev_order_date >= date_trunc('month', t1.order_date) - interval '12 month'
AND t1.prev_order_date < date_trunc('month', t1.order_date)
ORDER BY 1,2;
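An alternative sketch keeps the NOT EXISTS / EXISTS structure of the single-month query from the question and derives both bounds from each row's own month, so that several months can be covered in one run. PostgreSQL is assumed, and the 11-month offset is taken from the question's own bounds (2020-06-01 for May 2021), not from a definition of "12 months" stated elsewhere.

```sql
SELECT
    c1.order_date,
    c1.customer
FROM customers c1
WHERE c1.order_date >= DATE '2021-05-01'
  AND c1.order_date <  DATE '2021-07-01'
  /* No order in the look-back window that ends at the start of the row's month */
  AND NOT EXISTS (
        SELECT 1
        FROM customers c2
        WHERE c2.customer = c1.customer
          AND c2.order_date >= date_trunc('month', c1.order_date) - INTERVAL '11 months'
          AND c2.order_date <  date_trunc('month', c1.order_date))
  /* At least one order before that look-back window */
  AND EXISTS (
        SELECT 1
        FROM customers c2
        WHERE c2.customer = c1.customer
          AND c2.order_date < date_trunc('month', c1.order_date) - INTERVAL '11 months')
ORDER BY 1, 2;
```

For a row in May 2021 the window is 2020-06-01 up to (but excluding) 2021-05-01; for a row in June 2021 it moves forward by one month.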

Iterating through users and check if they exist in the past 12 months

DB-Fiddle
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
order_date DATE,
customer VARCHAR(255)
);
INSERT INTO customers
(order_date, customer)
VALUES
('2020-04-10', 'user_01'),
('2020-05-15', 'user_01'),
('2020-09-08', 'user_02'),
('2020-11-23', 'user_03'),
('2020-01-03', 'user_04'),
('2020-06-03', 'user_04'),
('2020-06-03', 'user_04'),
('2020-07-01', 'user_05'),
('2020-09-24', 'user_06'),
('2021-05-02', 'user_01'),
('2021-05-05', 'user_02'),
('2021-05-12', 'user_03'),
('2021-05-19', 'user_03'),
('2021-05-20', 'user_07'),
('2021-06-08', 'user_04'),
('2021-06-20', 'user_05'),
('2021-06-21', 'user_05'),
('2021-06-25', 'user_08');
Expected Result:
order_date | customer |
-------------|-------------|----
2021-05-05 | user_02 |
2021-05-12 | user_03 |
2021-05-19 | user_03 |
-------------|-------------|----
2021-06-20 | user_05 |
2021-06-21 | user_05 |
I want to list all customers in a certain month which
a) exist in the past 12 months and
b) also exist in the current month.
For a single month I am able to achieve this by using this query:
SELECT
c1.order_date,
c1.customer
FROM customers c1
WHERE c1.order_date BETWEEN '2021-05-01 00:00:00' AND '2021-05-31 23:59:59'
AND EXISTS
(SELECT
c2.customer
FROM customers c2
WHERE c2.order_date BETWEEN '2020-06-01 00:00:00' AND '2021-04-30 23:59:59'
AND c2.customer = c1.customer)
ORDER BY 2;
However, I have to run this query for each month separately.
Therefore, I am wondering if there is an iterating solution that goes through multiple months at once.
In the example above it would run BETWEEN '2021-05-01 00:00:00' AND '2021-06-30 23:59:59' and calculate 12 months back from May and in the next step 12 months back from June to get the expected result.
Do you have any idea if this is possible?
Use LAG() to check if there is an order in the previous 12 months:
SELECT c.order_date, c.customer
FROM (SELECT c.*,
LAG(order_date) OVER (PARTITION BY customer ORDER BY order_date) as prev_order_date
FROM customers c
) c
WHERE c.order_date >= '2021-05-01' AND c.order_date < '2021-07-01' AND
c.prev_order_date >= c.order_date - INTERVAL '12 month';
Note that I fixed the date comparisons so you are not fiddling with seconds when defining a month timeframe.
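If the look-back window should be anchored to the start of each order's month, as in the single-month query from the question, a correlated EXISTS is one way to sketch it. PostgreSQL is assumed; the 11-month offset mirrors the question's bounds (2020-06-01 for May 2021) and is an assumption about the intended window.

```sql
SELECT
    c1.order_date,
    c1.customer
FROM customers c1
WHERE c1.order_date >= DATE '2021-05-01'
  AND c1.order_date <  DATE '2021-07-01'
  /* Customer has at least one order in the window before the row's own month */
  AND EXISTS (
        SELECT 1
        FROM customers c2
        WHERE c2.customer = c1.customer
          AND c2.order_date >= date_trunc('month', c1.order_date) - INTERVAL '11 months'
          AND c2.order_date <  date_trunc('month', c1.order_date))
ORDER BY 1, 2;
```

Unlike the LAG() version, an earlier order within the same month does not count as a previous order here, because the window ends at the first day of the row's month.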

Split monthly fix value to days and countries in Redshift

DB-Fiddle
CREATE TABLE sales (
id SERIAL PRIMARY KEY,
country VARCHAR(255),
sales_date DATE,
sales_volume DECIMAL,
fix_costs DECIMAL
);
INSERT INTO sales
(country, sales_date, sales_volume, fix_costs
)
VALUES
('DE', '2020-01-03', '500', '2000'),
('NL', '2020-01-03', '320', '2000'),
('FR', '2020-01-03', '350', '2000'),
('None', '2020-01-31', '0', '2000'),
('DE', '2020-02-15', '0', '5000'),
('NL', '2020-02-15', '0', '5000'),
('FR', '2020-02-15', '0', '5000'),
('None', '2020-02-29', '0', '5000'),
('DE', '2020-03-27', '180', '4000'),
('NL', '2020-03-27', '670', '4000'),
('FR', '2020-03-27', '970', '4000'),
('None', '2020-03-31', '0', '4000');
Expected Result:
sales_date | country | sales_volume | used_fix_costs
-------------|--------------|------------------|------------------------------------------
2020-01-03 | DE | 500 | 37.95 (= 2000/31 = 64.5 x 0.59)
2020-01-03 | FR | 350 | 26.57 (= 2000/31 = 64.5 x 0.41)
2020-01-03 | NL | 320 | 0.00
-------------|--------------|------------------|------------------------------------------
2020-02-15 | DE | 0 | 86.21 (= 5000/29 = 172.4 x 0.50)
2020-02-15 | FR | 0 | 86.21 (= 5000/29 = 172.4 x 0.50)
2020-02-15 | NL | 0 | 0.00
-------------|--------------|------------------|------------------------------------------
2020-03-27 | DE | 180 | 20.20 (= 4000/31 = 129.0 x 0.16)
2020-03-27 | FR | 970 | 108.84 (= 4000/31 = 129.0 x 0.84)
2020-03-27 | NL | 670 | 0.00
-------------|--------------|------------------|-------------------------------------------
The column used_fix_costs in the expected result is calculated as follows:
Step 1) Exclude country NL from the next steps, but it should still appear with value 0 in the results.
Step 2) Get the daily rate of the fix_costs per month (2000/31 = 64.5; 5000/29 = 172.4; 4000/31 = 129.0).
Step 3) Split the daily value between the countries DE and FR based on their share of the sales_volume (500/850 = 0.59; 350/850 = 0.41; 180/1150 = 0.16; 970/1150 = 0.84).
Step 4) If the sales_volume is 0, the daily rate is split 50/50 between DE and FR, as you can see for 2020-02-15.
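The steps above can be checked with plain arithmetic. The following sketch (PostgreSQL syntax, literal values only, illustrative aliases) reproduces the numbers in the expected result. The rounded shares such as 0.59 are shown only for readability; the results come from the unrounded ratios.

```sql
SELECT
    ROUND(2000.0 / 31, 1)               AS jan_daily_rate,  -- 64.5
    ROUND(2000.0 / 31 * 500 / 850, 2)   AS jan_de,          -- 37.95
    ROUND(2000.0 / 31 * 350 / 850, 2)   AS jan_fr,          -- 26.57
    ROUND(5000.0 / 29, 1)               AS feb_daily_rate,  -- 172.4 (February 2020 has 29 days)
    ROUND(5000.0 / 29 / 2, 2)           AS feb_de_fr,       -- 86.21 (50/50 split, no sales)
    ROUND(4000.0 / 31, 1)               AS mar_daily_rate,  -- 129.0
    ROUND(4000.0 / 31 * 180 / 1150, 2)  AS mar_de,          -- 20.20
    ROUND(4000.0 / 31 * 970 / 1150, 2)  AS mar_fr;          -- 108.84
```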
I am currently using this query to get the expected results:
SELECT
s.sales_date,
s.country,
s.sales_volume,
s.fix_costs,
(CASE WHEN country = 'NL' THEN 0
/* Exclude NL from fixed_costs calculation */
WHEN SUM(CASE WHEN country <> 'NL' THEN sales_volume ELSE 0 END) OVER (PARTITION BY sales_date) > 0
THEN ((s.fix_costs/ extract(day FROM (date_trunc('month', sales_date + INTERVAL '1 month') - INTERVAL '1 day'))) *
sales_volume /
NULLIF(SUM(s.sales_volume) FILTER (WHERE s.country != 'NL') OVER (PARTITION BY s.sales_date), 0)
)
/* Divide fixed_costs equally among countries in case of no sales */
ELSE (s.fix_costs / extract(day FROM (date_trunc('month', sales_date + INTERVAL '1 month') - INTERVAL '1 day')))
/ SUM(CASE WHEN country <> 'NL' THEN 1 ELSE 0 END) OVER (PARTITION by sales_date)
END) AS imputed_fix_costs
FROM sales s
WHERE country NOT IN ('None')
GROUP BY 1,2,3,4
ORDER BY 1;
This query works in the DB-Fiddle.
However, when I run it on Amazon Redshift I get this error message for the line
FILTER (WHERE pl.sales_Channel NOT IN ('Marketplace','B2B')).
Do you have any idea how I can replace/adjust this part of the query to also make it work in Amazon Redshift?
If I understand correctly, you want to define apportioned fixed costs per day for all countries other than NL:
select s.*,
(case when country = 'NL' then 0
when sum(sales_volume) over (partition by sales_date) = 0
then (fix_costs / datepart(day, last_day(sales_date))) * 1.0 / sum(case when country <> 'NL' then 1 else 0 end) over (partition by sales_date)
else (fix_costs / datepart(day, last_day(sales_date))) * (sales_volume / sum(case when country <> 'NL' then sales_volume end) over (partition by sales_date))
end) as apportioned_fix_costs
from sales s
where country <> 'None';
Note: You don't seem to want None in your results, so it is simply filtered out. The rest of the data all seems to be on one date in the month. If it can actually be on multiple dates, use date_trunc() in the partition by clause.
For reference, Postgres doesn't support last_day(). You can use the expression:
select extract(day from date_trunc('month', sales_date) + interval '1 month' - interval '1 day')
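The error reported in the question points at the FILTER clause, which the query above replaces with a CASE expression inside SUM(). A minimal sketch, runnable on PostgreSQL where both forms are accepted, showing that they produce the same per-date total (the aliases vol_filter and vol_case are illustrative):

```sql
SELECT
    s.sales_date,
    s.country,
    -- Form used in the question
    SUM(s.sales_volume) FILTER (WHERE s.country <> 'NL')
        OVER (PARTITION BY s.sales_date) AS vol_filter,
    -- Equivalent form without FILTER: rows for NL contribute NULL, which SUM() ignores
    SUM(CASE WHEN s.country <> 'NL' THEN s.sales_volume END)
        OVER (PARTITION BY s.sales_date) AS vol_case
FROM sales s
WHERE s.country <> 'None'
ORDER BY s.sales_date, s.country;
```

Only the CASE form is meant to be carried over to the Redshift query.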

Group By first day of month and join with a separate table

I have 2 tables in SQL
one with monthly sales targets:
Date Target
1/7/17 50000
1/8/17 65000
1/9/17 50000
1/10/17 48000
etc...
the other with sales orders:
TxnDate JobNum Value
3/7/17 100001 20000
3/7/17 100002 11000
8/7/17 100003 10000
10/8/17 100004 15000
15/9/17 100005 20000
etc...
what I want is a table with following:
Date Target Sales
1/7/17 50000 41000
1/8/17 65000 15000
1/9/17 50000 20000
please help me I'm a newbie to coding and this is doing my head in.. :)
Assuming your 1st table is targetSales and your 2nd table is Sales and your database is SQL Server:
select
t.date
, t.target
, isnull(sum(s.value), 0) as Sales
from targetSales t
left join Sales s
on (month(t.date) = month(s.TxnDate)
and year(t.date) = year(s.TxnDate))
group by t.date
, t.target
You can follow a similar approach if you use a different database, just find the equivalents of month() and year() functions for your RDBMS.
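As one example of such an equivalent, a sketch for PostgreSQL could compare the truncated months with date_trunc() instead of month() and year(). The table names targetSales and Sales are the same assumptions as in the query above, and the columns are assumed to be of type date or timestamp.

```sql
SELECT
    t.date,
    t.target,
    COALESCE(SUM(s.value), 0) AS sales   -- 0 for months without any order
FROM targetSales t
LEFT JOIN Sales s
    ON date_trunc('month', s.TxnDate) = date_trunc('month', t.date)
GROUP BY t.date, t.target
ORDER BY t.date;
```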
Try this:
select tb1.date, tb1.target, tb2.sales from table1 as tb1
INNER JOIN (select sum(value) as sales, date from table2 group by date) as tb2
on tb1.date = tb2.date
You can use this script for daily targets.
Another way: it looks like the date in the target table is always the first day of the month. So in the sales table, just round the TxnDate column value down to the first day of the month.
Query
select t1.[date],
max(t1.[target]) as [target],
coalesce(sum(t2.[value]), 0) as [value]
from [targets] t1
left join [sales] t2
on t1.[Date] = dateadd(day, - datepart(day, t2.[txnDate]) + 1, t2.[txnDate])
group by t1.[Date];
Take any datetime value in SQL Server and calculate the number of months from zero to that date with datediff(month,0,TxnDate). Adding that number of months back to zero with dateadd(month, ... , 0) gives the first day of the month for the original datetime value. This works in all versions of SQL Server. With this we can sum the values of the orders by the first day of the month, then join to targets using that date.
CREATE TABLE Orders
([TxnDate] datetime, [JobNum] int, [Value] int)
;
INSERT INTO Orders
([TxnDate], [JobNum], [Value])
VALUES
('2017-07-03 00:00:00', 100001, 20000),
('2017-07-03 00:00:00', 100002, 11000),
('2017-07-08 00:00:00', 100003, 10000),
('2017-08-10 00:00:00', 100004, 15000),
('2017-09-15 00:00:00', 100005, 20000)
;
CREATE TABLE Targets
([Date] datetime, [Target] int)
;
INSERT INTO Targets
([Date], [Target])
VALUES
('2017-07-01 00:00:00', 50000),
('2017-08-01 00:00:00', 65000),
('2017-09-01 00:00:00', 50000),
('2017-10-10 00:00:00', 48000)
;
GO
9 rows affected
select dateadd(month,datediff(month,0,TxnDate), 0) month_start, sum(Value) SumValue
from Orders
group by dateadd(month, datediff(month,0,TxnDate), 0)
GO
month_start | SumValue
:------------------ | -------:
01/07/2017 00:00:00 | 41000
01/08/2017 00:00:00 | 15000
01/09/2017 00:00:00 | 20000
select
t.[Date], t.Target, coalesce(o.SumValue,0)
from targets t
left join (
select dateadd(month,datediff(month,0,TxnDate), 0) month_start, sum(Value) SumValue
from Orders
group by dateadd(month, datediff(month,0,TxnDate), 0)
) o on t.[Date] = o.month_start
GO
Date | Target | (No column name)
:------------------ | -----: | ---------------:
01/07/2017 00:00:00 | 50000 | 41000
01/08/2017 00:00:00 | 65000 | 15000
01/09/2017 00:00:00 | 50000 | 20000
10/10/2017 00:00:00 | 48000 | 0
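On SQL Server 2012 and later, DATEFROMPARTS() is a possibly more readable way to get the same month start. A short sketch against the Orders table above that shows both expressions side by side (the column aliases are illustrative):

```sql
SELECT
    TxnDate,
    -- Month arithmetic relative to date zero; returns a datetime
    DATEADD(month, DATEDIFF(month, 0, TxnDate), 0)  AS month_start_datediff,
    -- Builds the first of the month from its parts; returns a date
    DATEFROMPARTS(YEAR(TxnDate), MONTH(TxnDate), 1) AS month_start_parts
FROM Orders;
```

The two results differ only in type (datetime versus date), which matters if the value is compared with a datetime column that carries a time part.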
This is not the best solution but this will give you a correct result.
select date,target,(
select sum(value)
from sales_orders s
where datepart(m,s.TxnDate) = datepart(m,targets.Date)
and datepart(year,s.TxnDate) = datepart(year,targets.Date)
) as sales
from targets