Avoid cartesian product using sum - sql

I want to sum up the stake from the tickets table, grouping by customer_id and by date_trunc('day', reg_date) from the bonus table.
The problem is that rows are being multiplied and I don't know how to solve it.
https://www.db-fiddle.com/f/yWCvFamMAY9uGtoZupiAQ/4
CREATE TABLE tickets (
ticket_id integer,
customer_id integer,
stake integer,
reg_date date
);
CREATE TABLE bonus (
bonus_id integer,
customer_id integer,
reg_date date
);
insert into tickets
values
(1,100, 12,'2019-01-10 11:00'),
(2,100, 10,'2019-01-10 12:00'),
(3,100, 30,'2019-01-10 13:00'),
(4,100, 10,'2019-01-11 14:00'),
(5,100, 15,'2019-01-11 15:00'),
(6,102, 25,'2019-01-10 10:00'),
(7,102, 25,'2019-01-10 11:10'),
(8,102, 13,'2019-01-11 12:40'),
(9,102, 9,'2019-01-12 15:00'),
(10,102, 7,'2019-01-13 18:00'),
(13,103, 15,'2019-01-12 19:00'),
(14,103, 11,'2019-01-12 22:00'),
(15,103, 11,'2019-01-14 02:00'),
(16,103, 11,'2019-01-14 10:00')
;
insert into bonus
values
(200,100,'2019-01-10 05:00'),
(201,100,'2019-01-10 06:00'),
(202,100,'2019-01-10 15:00'),
(203,100,'2019-01-10 15:50'),
(204,100,'2019-01-10 16:10'),
(205,100,'2019-01-10 16:15'),
(206,100,'2019-01-10 16:22'),
(207,100,'2019-01-11 10:10'),
(208,100,'2019-01-11 16:10'),
(209,102,'2019-01-10 10:00'),
(210,102,'2019-01-10 11:00'),
(211,102,'2019-01-10 12:00'),
(212,102,'2019-01-10 13:00'),
(213,103,'2019-01-11 11:00'),
(214,103,'2019-01-11 18:00'),
(215,103,'2019-01-12 15:00'),
(216,103,'2019-01-12 16:00'),
(217,103,'2019-01-14 02:00');
select
customer_id,
date_trunc('day', b.reg_date),
sum(t.stake)
from tickets t
join bonus b using (customer_id)
where date_trunc('day', b.reg_date) = date_trunc('day', t.reg_date)
group by 1,2
order by 1
Output for customer 102 should be:
102,2019-01-10, 50

OK, I think you want to sum the stake column of the tickets table for each customer_id, reg_date pair that also appears in the bonus table, and bonus_id plays no role, am I right? The customer_id, reg_date pairs in bonus are duplicated, which is what multiplies your rows, so you need a DISTINCT on them before joining them to the summed tickets data. The complete SQL and result are below:
with stake_sum as (
select
customer_id,
reg_date,
sum(stake)
from
tickets
group by
customer_id,
reg_date
)
,bonus_date_distinct as (
select
distinct customer_id,
reg_date
from
bonus
)
select
a.*
from
stake_sum a
join
bonus_date_distinct b on a.customer_id = b.customer_id and a.reg_date = b.reg_date order by customer_id, reg_date;
customer_id | reg_date | sum
-------------+------------+-----
100 | 2019-01-10 | 52
100 | 2019-01-11 | 25
102 | 2019-01-10 | 50
103 | 2019-01-12 | 26
103 | 2019-01-14 | 22
(5 rows)
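To see why the original query multiplies rows, and to try an alternative to the DISTINCT approach, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption: SQLite's date() replaces date_trunc('day', ...), and reg_date is stored as text). The names run, NAIVE and SEMI are hypothetical. The EXISTS semi-join only checks that at least one matching bonus row exists, so each ticket is counted once no matter how many bonus rows share its day.

```python
import sqlite3

TICKETS = [
    (1, 100, 12, "2019-01-10 11:00"), (2, 100, 10, "2019-01-10 12:00"),
    (3, 100, 30, "2019-01-10 13:00"), (4, 100, 10, "2019-01-11 14:00"),
    (5, 100, 15, "2019-01-11 15:00"), (6, 102, 25, "2019-01-10 10:00"),
    (7, 102, 25, "2019-01-10 11:10"), (8, 102, 13, "2019-01-11 12:40"),
    (9, 102, 9, "2019-01-12 15:00"), (10, 102, 7, "2019-01-13 18:00"),
    (13, 103, 15, "2019-01-12 19:00"), (14, 103, 11, "2019-01-12 22:00"),
    (15, 103, 11, "2019-01-14 02:00"), (16, 103, 11, "2019-01-14 10:00"),
]
BONUS = [
    (200, 100, "2019-01-10 05:00"), (201, 100, "2019-01-10 06:00"),
    (202, 100, "2019-01-10 15:00"), (203, 100, "2019-01-10 15:50"),
    (204, 100, "2019-01-10 16:10"), (205, 100, "2019-01-10 16:15"),
    (206, 100, "2019-01-10 16:22"), (207, 100, "2019-01-11 10:10"),
    (208, 100, "2019-01-11 16:10"), (209, 102, "2019-01-10 10:00"),
    (210, 102, "2019-01-10 11:00"), (211, 102, "2019-01-10 12:00"),
    (212, 102, "2019-01-10 13:00"), (213, 103, "2019-01-11 11:00"),
    (214, 103, "2019-01-11 18:00"), (215, 103, "2019-01-12 15:00"),
    (216, 103, "2019-01-12 16:00"), (217, 103, "2019-01-14 02:00"),
]

# The question's query: every ticket row is repeated once per bonus row on the same day.
NAIVE = """
SELECT t.customer_id, date(t.reg_date) AS day, SUM(t.stake)
FROM tickets t
JOIN bonus b
  ON b.customer_id = t.customer_id AND date(b.reg_date) = date(t.reg_date)
GROUP BY t.customer_id, date(t.reg_date)
ORDER BY t.customer_id, day
"""

# Semi-join: EXISTS only tests for a matching bonus row, so nothing is duplicated.
SEMI = """
SELECT t.customer_id, date(t.reg_date) AS day, SUM(t.stake)
FROM tickets t
WHERE EXISTS (SELECT 1 FROM bonus b
              WHERE b.customer_id = t.customer_id
                AND date(b.reg_date) = date(t.reg_date))
GROUP BY t.customer_id, date(t.reg_date)
ORDER BY t.customer_id, day
"""


def run(query):
    """Load the sample data into an in-memory database and run one query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tickets (ticket_id INTEGER, customer_id INTEGER, "
                "stake INTEGER, reg_date TEXT)")
    con.execute("CREATE TABLE bonus (bonus_id INTEGER, customer_id INTEGER, reg_date TEXT)")
    con.executemany("INSERT INTO tickets VALUES (?, ?, ?, ?)", TICKETS)
    con.executemany("INSERT INTO bonus VALUES (?, ?, ?)", BONUS)
    rows = con.execute(query).fetchall()
    con.close()
    return rows
```

With this data, run(NAIVE) reports 200 for customer 102 on 2019-01-10 (2 tickets × 4 bonus rows × 25), while run(SEMI) returns the same five rows as the DISTINCT-based query above.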


HANA SQL Filling missing gaps in a date table with balance history

In a HANA SQL environment, I have this table with the balance changes of customer accounts by date:
"BalanceTable"
CustomerID | BalDate    | Balance
-----------|------------|--------
1          | 2021-06-01 | 0
1          | 2021-06-04 | 100
1          | 2021-06-28 | 500
2          | 2021-06-01 | 200
2          | 2021-06-03 | 0
2          | 2021-07-02 | 300
...
The table has several rows.
I have now created a date table with all the dates in the interval, from the earliest day (first row) to the latest day (last row):
"DateTable"
Day
----------
2021-06-01
2021-06-02
2021-06-03
2021-06-04
2021-06-05
2021-06-06
...
2021-07-02
I need to join the two tables to get the daily balance of each customer:
Day        | CustomerID | Balance
-----------|------------|--------
2021-06-01 | 1          | 0
2021-06-02 | 1          | 0
2021-06-03 | 1          | 0
2021-06-04 | 1          | 100
2021-06-05 | 1          | 100
2021-06-06 | 1          | 100
...
2021-06-27 | 1          | 100
2021-06-28 | 1          | 500
2021-06-29 | 1          | 500
2021-06-30 | 1          | 500
2021-07-01 | 1          | 500
2021-07-02 | 1          | 500
2021-06-01 | 2          | 200
2021-06-02 | 2          | 200
2021-06-03 | 2          | 0
2021-06-04 | 2          | 0
2021-06-05 | 2          | 0
...
2021-06-30 | 2          | 0
2021-07-01 | 2          | 0
2021-07-02 | 2          | 300
As a first approach, I tried joining the two tables using a left join:
SELECT * FROM "DateTable" T0 LEFT JOIN "BalanceTable" T1 ON T0."Day"=T1."BalDate"
But I know the proper solution is far beyond my limited SQL knowledge. The key is to fill the gaps for the days in "DateTable" that have no balance value in "BalanceTable" with the balance of the most recent previous day that has data.
I've read similar cases that combine the IFNULL function (to fill the gaps) with a PARTITION BY clause (to get the last value), but after many attempts I wasn't able to apply that to my case.
Thank you for your ideas, and sorry if I missed something; this is my first post asking for help.
So you have this example data:
CREATE TABLE BALANCETAB (CUSTOMERID INTEGER, BALDATE DATE, BALANCE INTEGER);
INSERT INTO BALANCETAB VALUES (1, '2021-06-01', 0);
INSERT INTO BALANCETAB VALUES (1, '2021-06-04', 100);
INSERT INTO BALANCETAB VALUES (1, '2021-06-28', 500);
INSERT INTO BALANCETAB VALUES (2, '2021-06-01', 200);
INSERT INTO BALANCETAB VALUES (2, '2021-06-03', 0);
INSERT INTO BALANCETAB VALUES (2, '2021-07-02', 300);
You already headed in the right direction by creating the dates table:
CREATE TABLE DATETAB AS (
SELECT GENERATED_PERIOD_START DAY
FROM SERIES_GENERATE_DATE('INTERVAL 1 DAY', '2021-06-01' ,'2021-07-02')
);
In addition, you need to know all the customers, since you want one row per date and per customer (a cross join):
CREATE TABLE CUSTOMERTAB AS (
SELECT DISTINCT CUSTOMERID FROM BALANCETAB
);
From these you can derive the table with the NULL values that you would like to fill:
WITH DATECUSTOMERTAB AS (
SELECT * FROM DATETAB, CUSTOMERTAB
)
SELECT DCT.DAY, DCT.CUSTOMERID, BT.BALANCE
FROM DATECUSTOMERTAB DCT
LEFT JOIN BALANCETAB BT ON DCT.DAY = BT.BALDATE AND DCT.CUSTOMERID = BT.CUSTOMERID
ORDER BY DCT.CUSTOMERID, DCT.DAY;
On this table, you can apply a self-join (BTFILL) and use the window function RANK to determine the most recent previous balance value:
WITH DATECUSTOMERTAB AS (
SELECT * FROM DATETAB, CUSTOMERTAB
)
SELECT DAY, CUSTOMERID, IFNULL(BALANCE, BALANCEFILL) BALANCE_FILLED
FROM
(
SELECT DCT.DAY, DCT.CUSTOMERID, BT.BALANCE, BTFILL.BALANCE AS BALANCEFILL,
RANK() OVER (PARTITION BY DCT.DAY, DCT.CUSTOMERID, BT.BALANCE ORDER BY BTFILL.BALDATE DESC) RNK
FROM DATECUSTOMERTAB DCT
LEFT JOIN BALANCETAB BT ON DCT.DAY = BT.BALDATE AND DCT.CUSTOMERID = BT.CUSTOMERID
LEFT JOIN BALANCETAB BTFILL ON BTFILL.BALDATE <= DCT.DAY AND DCT.CUSTOMERID = BTFILL.CUSTOMERID AND BTFILL.BALANCE IS NOT NULL
)
WHERE RNK = 1
ORDER BY CUSTOMERID, DAY;
In practice, you would not create the tables DATETAB and CUSTOMERTAB explicitly: the list of expected customers probably already exists somewhere in your system, and the series generator function can be part of the final statement.
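For experimenting outside HANA, here is a sketch of the same gap-filling idea, assuming Python's sqlite3 module as a substitute database. SQLite has no SERIES_GENERATE_DATE, so a recursive CTE generates the days, and a correlated subquery (latest balance on or before each day, via ORDER BY ... DESC LIMIT 1) stands in for the RANK-based self-join. fill_daily_balances is a hypothetical helper name, and dates are stored as ISO-8601 text so that string comparison follows date order.

```python
import sqlite3


def fill_daily_balances(rows, start, end):
    """Return (day, customerid, balance_filled) for every day in [start, end] and every customer."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE balancetab (customerid INTEGER, baldate TEXT, balance INTEGER)")
    con.executemany("INSERT INTO balancetab VALUES (?, ?, ?)", rows)
    query = """
    WITH RECURSIVE datetab(day) AS (
        SELECT date(?)                       -- first day of the interval
        UNION ALL
        SELECT date(day, '+1 day') FROM datetab
        WHERE day < date(?)                  -- stop after the last day
    ),
    customertab AS (SELECT DISTINCT customerid FROM balancetab)
    SELECT d.day, c.customerid,
           -- carry forward the most recent balance on or before this day
           (SELECT b.balance FROM balancetab b
             WHERE b.customerid = c.customerid AND b.baldate <= d.day
             ORDER BY b.baldate DESC
             LIMIT 1) AS balance_filled
    FROM datetab d CROSS JOIN customertab c
    ORDER BY c.customerid, d.day
    """
    result = con.execute(query, (start, end)).fetchall()
    con.close()
    return result
```

Days before a customer's first balance row come back as NULL (None), which matches the behavior of the left-join approach above.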

Counting unique combinations of values across multiple columns regardless of order?

I have a table that looks a bit like this:
Customer_ID | Offer_1 | Offer_2 | Offer_3
------------|---------|---------|--------
111 | A01 | 001 | B01
222 | A01 | B01 | 001
333 | A02 | 001 | B01
I want to write a query to figure out how many unique combinations of offers there are in the table, regardless of what order the offers appear in.
So in the example above there are two unique combinations: customers 111 & 222 both have the same three offers, so they count as one unique combination, and customer 333 is the only customer with their particular three offers. So the desired output of the query would be 2.
For some additional context:
The customer_ID column is an integer, and all the offer columns are varchar.
There are 12 offer columns and over 3 million rows in the actual table, with over 100 different values in the offer columns. I simplified the example to better illustrate what I'm trying to do, but any solution needs to scale to this number of possible combinations.
I can concatenate all of the offer columns together and then run a count distinct statement on the result, but this doesn't account for customers who have the same unique combination of offers but ordered differently (like customers 111 & 222 in the example above).
Does anyone know how to solve this problem please?
Assuming the character / doesn't show up in any of the offer names, you can do:
select count(distinct offer_combo) as distinct_offers
from (
select listagg(offer, '/') within group (order by offer) as offer_combo
from (
select customer_id, offer_1 as offer from t
union all select customer_id, offer_2 from t
union all select customer_id, offer_3 from t
) x
group by customer_id
) y
Result:
DISTINCT_OFFERS
---------------
2
See running example at db<>fiddle.
One way to do it would be to union all the offers into one column, then use select distinct listagg... to get the combinations of offers. Try this:
with u as
(select Customer_ID, Offer_1 as Offer from table_name union all
select Customer_ID, Offer_2 as Offer from table_name union all
select Customer_ID, Offer_3 as Offer from table_name)
select distinct listagg(Offer, ',') within group(order by Offer) from u
group by Customer_ID
Fiddle
This solution avoids the UNION ALLs, so the table is read only once, which should give better performance.
/*
WITH MYTAB (Customer_ID, Offer_1, Offer_2, Offer_3) AS
(
VALUES
(111, 'A01', '001', 'B01')
, (222, 'A01', 'B01', '001')
, (333, 'A02', '001', 'B01')
)
*/
SELECT COUNT (DISTINCT LIST)
FROM
(
SELECT LISTAGG (V.Offer, '|') WITHIN GROUP (ORDER BY V.Offer) LIST
FROM MYTAB T
CROSS JOIN TABLE (VALUES T.Offer_1, T.Offer_2, T.Offer_3) V (Offer)
GROUP BY T.CUSTOMER_ID
)
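The idea shared by these answers is to sort each customer's offers before combining them, so the column order no longer matters. A plain-Python sketch of that idea (count_unique_offer_combos is a hypothetical name), assuming each row is a tuple of (customer_id, offer_1, ..., offer_n). NULLs (None) are skipped, as LISTAGG ignores NULL values, and repeated offers are kept, as in a LISTAGG without DISTINCT.

```python
def count_unique_offer_combos(rows):
    """Count distinct offer combinations, ignoring the order of the offer columns."""
    combos = set()
    for _customer_id, *offers in rows:
        # Sorting builds an order-independent key, like LISTAGG ... WITHIN GROUP (ORDER BY offer).
        key = tuple(sorted(offer for offer in offers if offer is not None))
        combos.add(key)
    return len(combos)


rows = [
    (111, "A01", "001", "B01"),
    (222, "A01", "B01", "001"),
    (333, "A02", "001", "B01"),
]
print(count_unique_offer_combos(rows))
```

Using a tuple rather than a delimited string also sidesteps the "assuming / doesn't appear in any offer name" caveat of the string-concatenation approach.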

Calculate exact month-difference between two dates

DB-Fiddle
CREATE TABLE inventory
(
id SERIAL PRIMARY KEY,
inventory_date DATE,
product_name VARCHAR(255),
product_value VARCHAR(255)
);
INSERT INTO inventory (inventory_date, product_name, product_value)
VALUES ('2020-10-19', 'Product_A', '400'),
('2020-10-22', 'Product_B', '400'),
('2020-11-20', 'Product_C', '900'),
('2020-11-25', 'Product_D', '300');
Expected result:
product_name | months_in_inventory
-------------+--------------------
Product_A | 2
Product_B | 1
Product_C | 1
Product_D | 0
I want to calculate the months_in_inventory by calculating the difference between a fixed_date and the inventory_date.
In the example, the fixed_date is '2020-12-20', and I am using it in my query.
So far I am able to calculate the difference in days:
SELECT
iv.product_name,
'2020-12-20'::date - MAX(iv.inventory_date::date) AS days_in_inventory
FROM
inventory iv
GROUP BY
1
ORDER BY
1;
However, I could not figure out how to change it to a difference in month. Do you have any idea?
NOTE
I know that one way to approach this would be extracting the month from the fixed_date and inventory_date and subtract both numbers. However, this would not give me the correct result because I need it exactly based on the dates.
For example, Product_B has been in inventory for only 1 month, because 2020-10-22 is less than two full months before 2020-12-20.
You can use age(). If the value is always less than 12 months, then one method is:
SELECT iv.product_name,
extract(month from age('2020-12-20'::date, MAX(iv.inventory_date::date))) AS months_in_inventory
FROM inventory iv
GROUP BY 1
ORDER BY 1;
A more accurate calculation takes the year into account:
SELECT iv.product_name,
(extract(year from age('2020-12-20'::date, MAX(iv.inventory_date::date))) * 12 +
extract(month from age('2020-12-20'::date, MAX(iv.inventory_date::date)))
) AS months_in_inventory
FROM inventory iv
GROUP BY 1
ORDER BY 1;
Here is a db<>fiddle.
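The arithmetic behind year * 12 + month of age() can be checked by hand: count the calendar months between the two dates, then subtract one if the end date's day of month has not yet reached the start date's day. A minimal Python sketch of that rule for plain dates (full_months_between is a hypothetical name; it assumes start <= end and no time component):

```python
from datetime import date


def full_months_between(start: date, end: date) -> int:
    """Number of complete months from start to end (assumes start <= end)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    # The last month is incomplete if end's day of month is before start's day of month.
    if end.day < start.day:
        months -= 1
    return months


fixed_date = date(2020, 12, 20)
for name, inventory_date in [("Product_A", date(2020, 10, 19)), ("Product_B", date(2020, 10, 22)),
                             ("Product_C", date(2020, 11, 20)), ("Product_D", date(2020, 11, 25))]:
    print(name, full_months_between(inventory_date, fixed_date))
```

For the sample data this reproduces the expected 2, 1, 1 and 0, and it also handles intervals that cross a year boundary.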

Subtract sales from purchases to find the balance

I have two tables, purchase and sales, and I need the balance for each product: the sales subtracted from the purchases. My code is given below:
create table purchase(
id number(10) primary key,
name varchar2(10),
p_qty number(10)
);
Inserting data:
insert into purchase values(01,'productB',235);
insert into purchase values(04,'productC',394);
insert into purchase values(05,'productD',381);
insert into purchase values(08,'productE',357);
insert into purchase values(09,'productF',389);
insert into purchase values(10,'productQ',336);
The other table, sales:
create table sales(
id number(10),
s_qty number(10),
constraint pid_pk foreign key (id)REFERENCES purchase(id)
);
Inserting data into the sales table:
insert into sales values(01,34);
insert into sales values(04,54);
insert into sales values(05,44);
insert into sales values(09,50);
insert into sales values(01,3);
insert into sales values(04,4);
insert into sales values(05,5);
insert into sales values(09,53);
insert into sales values(01,2);
insert into sales values(04,2);
insert into sales values(05,2);
insert into sales values(09,2);
insert into sales values(01,4);
insert into sales values(04,9);
insert into sales values(05,11);
insert into sales values(09,7);
I am using two queries:
Query 1:
select id,name,sum(p_qty) as p_total from purchase group by id,name;
ID NAME P_TOTAL
5 productD 381
10 productQ 336
4 productC 394
1 productB 235
8 productE 357
9 productF 389
QUERY2:
select id,sum(s_qty) as s_total from sales group by id;
ID S_TOTAL
1 43
4 69
5 62
9 112
Now I want the table below, with the balance for each item:
ID NAME P_TOTAL S_TOTAL BALANCE
5 productD 381 62 319
4 productC 394 69 325
1 productB 235 43 192
9 productF 389 112 277
Hope this helps.
SELECT p.id, p.name, p.p_total, s.s_total,
p.p_total - s.s_total AS balance
FROM (select id, name, sum(p_qty) as p_total FROM purchase
GROUP BY id, name) p
INNER JOIN (select id, sum(s_qty) as s_total FROM sales
GROUP BY id) s
ON s.ID = p.ID;
You're almost there. Take the two queries you already have and join them together:
SELECT p.ID,
p.NAME,
p.P_TOTAL,
s.S_TOTAL,
p.P_TOTAL - s.S_TOTAL AS BALANCE
FROM (select id, name, sum(p_qty) as p_total
from purchase
group by id, name) p
INNER JOIN (select id, sum(s_qty) as s_total
from sales
group by id) s
ON s.ID = p.ID
Best of luck.
I Want to below the table for balance each item
You say you want the balance for each item, but you only show the balance for the items with sales.
If you want each item that has been purchased, you can use left join with subqueries:
select p.id, p.name, p_total, coalesce(s_total, 0) as s_total,
(p_total - coalesce(s_total, 0)) as balance
from (select id, name, sum(p_qty) as p_total
from purchase
group by id,name
) p left join
(select id, sum(s_qty) as s_total
from sales
group by id
) s
on p.id = s.id;
If you want each item that has sales, then just use inner join.
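All three answers aggregate each table in a subquery before joining; joining the raw tables first would repeat every purchase row once per sales row and inflate p_total. A sketch of the LEFT JOIN variant that can be run locally, assuming Python's sqlite3 module in place of Oracle (NUMBER and VARCHAR2 become INTEGER and TEXT; balances is a hypothetical helper name):

```python
import sqlite3

PURCHASES = [(1, "productB", 235), (4, "productC", 394), (5, "productD", 381),
             (8, "productE", 357), (9, "productF", 389), (10, "productQ", 336)]
SALES = [(1, 34), (4, 54), (5, 44), (9, 50), (1, 3), (4, 4), (5, 5), (9, 53),
         (1, 2), (4, 2), (5, 2), (9, 2), (1, 4), (4, 9), (5, 11), (9, 7)]

# Aggregate each side first, then LEFT JOIN so items without sales are kept.
QUERY = """
SELECT p.id, p.name, p.p_total,
       COALESCE(s.s_total, 0) AS s_total,
       p.p_total - COALESCE(s.s_total, 0) AS balance
FROM (SELECT id, name, SUM(p_qty) AS p_total FROM purchase GROUP BY id, name) p
LEFT JOIN (SELECT id, SUM(s_qty) AS s_total FROM sales GROUP BY id) s
  ON s.id = p.id
ORDER BY p.id
"""


def balances():
    """Load the sample data into an in-memory database and return one row per item."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE purchase (id INTEGER PRIMARY KEY, name TEXT, p_qty INTEGER)")
    con.execute("CREATE TABLE sales (id INTEGER REFERENCES purchase(id), s_qty INTEGER)")
    con.executemany("INSERT INTO purchase VALUES (?, ?, ?)", PURCHASES)
    con.executemany("INSERT INTO sales VALUES (?, ?)", SALES)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in balances():
    print(row)
```

productE and productQ have no sales, so they appear with s_total 0 and a balance equal to their purchases; switching LEFT JOIN to INNER JOIN drops them.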

How to sum the hours using two Date Fields and group them by the user id in SQL

I feel like the task is straightforward, but I am having a hard time getting it to do what I want.
Here is a table in my database:
ID |Empl_Acc_ID |CheckIn |CheckOut |WeekDay
----------------------------------------------------------------------------
1 | 1 | 2017-09-24 08:03:02.143 | 2017-09-24 12:00:00.180 | Sun
2 | 1 | 2017-09-24 13:02:23.457 | 2017-09-24 17:01:02.640 | Sun
3 | 2 | 2017-09-24 08:05:23.457 | 2017-09-24 13:01:02.640 | Mon
4 | 2 | 2017-09-24 14:05:23.457 | 2017-09-24 17:00:02.640 | Mon
5 | 3 | 2017-09-24 07:05:23.457 | 2017-09-24 11:30:02.640 | Tue
6 | 3 | 2017-09-24 12:31:23.457 | 2017-09-24 16:01:02.640 | Tue
and so on....
I want to group by Empl_Acc_ID and date and sum up the total hours each employee worked that day. Each employee can have one or more records per day, depending on how many breaks he/she took that day.
For example, if Empl_Acc_ID 2 worked 3 different days with one break each day, the table will contain 6 records for that person, but in my query I want to see 3 records with the total hours worked each day.
Here is how I constructed the query:
select distinct w.Empl_Acc_ID, ws.fullWorkDayHours
from Work_Schedule as w
INNER JOIN (
SELECT Empl_Acc_ID, fullWorkDayHours = Sum(DATEDIFF(hour, w.CheckIn, w.CheckOut))
from Work_Schedule w
GROUP BY Empl_Acc_ID
) ws on w.Empl_Acc_ID = ws.Empl_Acc_ID
This query does not quite get me what I need. It only returns the sum of hours per employee across all the days they worked. Also, this query only has 2 columns, but I want to see more columns. When I tried adding more columns, the records were no longer distinct by Empl_Acc_ID.
What is wrong with the query?
Thank you
You do not need to self-join the table in this case; just group by the datetime field cast to a date.
create table Work_Schedule (
ID TINYINT,
Empl_Acc_ID TINYINT,
CheckIn DATETIME,
CheckOut DATETIME,
WeekDay CHAR(3)
);
INSERT INTO Work_Schedule VALUES (1, 1,'2017-09-24 08:03:02.143','2017-09-24 12:00:00.180','Sun');
INSERT INTO Work_Schedule VALUES (2, 1,'2017-09-24 13:02:23.457','2017-09-24 17:01:02.640','Sun');
INSERT INTO Work_Schedule VALUES (3, 2,'2017-09-24 08:05:23.457','2017-09-24 13:01:02.640','Mon');
INSERT INTO Work_Schedule VALUES (4, 2,'2017-09-24 14:05:23.457','2017-09-24 17:00:02.640','Mon');
INSERT INTO Work_Schedule VALUES (5, 3,'2017-09-24 07:05:23.457','2017-09-24 11:30:02.640','Tue');
INSERT INTO Work_Schedule VALUES (6, 3,'2017-09-24 12:31:23.457','2017-09-24 16:01:02.640','Tue');
SELECT w.Empl_Acc_ID,
CAST(CheckIn AS DATE) [date],
SUM(DATEDIFF(hour, w.CheckIn, w.CheckOut)) fullWorkDayHours
FROM Work_Schedule w
GROUP BY w.Empl_Acc_ID, CAST(CheckIn AS DATE)
DROP TABLE Work_Schedule;
Empl_Acc_ID date fullWorkDayHours
1 2017-09-24 8
2 2017-09-24 8
3 2017-09-24 8
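Note that SQL Server's DATEDIFF(hour, start, end) counts hour boundaries crossed rather than elapsed time, so a shift from 08:03 to 12:00 counts as 4 hours although only 3h57m elapsed. A Python sketch that compares both measures for the sample rows, grouping by employee and check-in date as the query above does. hour_boundaries is a hypothetical helper that mimics DATEDIFF(hour, ...) for start <= end, and daily_totals is likewise a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SHIFTS = [
    (1, "2017-09-24 08:03:02.143", "2017-09-24 12:00:00.180"),
    (1, "2017-09-24 13:02:23.457", "2017-09-24 17:01:02.640"),
    (2, "2017-09-24 08:05:23.457", "2017-09-24 13:01:02.640"),
    (2, "2017-09-24 14:05:23.457", "2017-09-24 17:00:02.640"),
    (3, "2017-09-24 07:05:23.457", "2017-09-24 11:30:02.640"),
    (3, "2017-09-24 12:31:23.457", "2017-09-24 16:01:02.640"),
]


def hour_boundaries(start, end):
    """Mimic DATEDIFF(hour, start, end): truncate both to the hour, then count hours."""
    def floor_hour(dt):
        return dt.replace(minute=0, second=0, microsecond=0)
    return int((floor_hour(end) - floor_hour(start)).total_seconds() // 3600)


def daily_totals(shifts):
    """Sum elapsed time and DATEDIFF-style hours per (employee, check-in date)."""
    elapsed = defaultdict(timedelta)
    boundaries = defaultdict(int)
    for emp, check_in, check_out in shifts:
        start = datetime.fromisoformat(check_in)
        end = datetime.fromisoformat(check_out)
        key = (emp, start.date())  # like GROUP BY Empl_Acc_ID, CAST(CheckIn AS DATE)
        elapsed[key] += end - start
        boundaries[key] += hour_boundaries(start, end)
    return dict(elapsed), dict(boundaries)


elapsed, hours = daily_totals(SHIFTS)
for key in sorted(hours):
    print(key[0], key[1], hours[key], elapsed[key])
```

Every employee shows 8 DATEDIFF hours, but the elapsed totals are between 7h50m and 7h56m. If that difference matters, summing DATEDIFF(minute, CheckIn, CheckOut) and dividing by 60.0 in the SQL should give a much closer figure.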
Try this. You just have to group by date and employee account.
select Employee.Empl_Acc_ID, FirstName, LastName, Username,
convert(varchar(10), checkin, 101) as checkin, convert(varchar(10),
checkout, 101) as checkout, sum(datediff(hour, checkin, checkout)) as hours
from Employee
inner join Employee_Account on Employee.Empl_Acc_ID =
Employee_Account.Empl_Acc_ID
inner join Work_Schedule on Employee_Account.Empl_Acc_ID =
Work_Schedule.Empl_Acc_ID
group by convert(varchar(10), checkin, 101), convert(varchar(10), checkout,
101), Employee.Empl_Acc_ID, FirstName, LastName, Username
order by Employee.Empl_Acc_ID
You are not grouping by date; that's the issue:
SELECT DISTINCT w.Empl_Acc_ID, ws.fullWorkDayHours, ws.CheckInDate
FROM Work_Schedule as w
INNER JOIN (
SELECT Empl_Acc_ID, CAST(w.CheckIn AS DATE) AS [CheckInDate], fullWorkDayHours = Sum(DATEDIFF(hour,
w.CheckIn, w.CheckOut))
from Work_Schedule w
GROUP BY Empl_Acc_ID, CAST(w.CheckIn AS DATE)
) ws on w.Empl_Acc_ID = ws.Empl_Acc_ID
There is no need for a self-join; it works fine without one:
SELECT Empl_Acc_ID, CAST(CheckIn AS DATE) AS WorkDate,
SUM(DATEDIFF(hour, CheckIn, CheckOut)) AS FullDayWorkHours
FROM Work_Schedule
WHERE DATEPART(day, CheckIn) = DATEPART(day, CheckOut)
GROUP BY Empl_Acc_ID, CAST(CheckIn AS DATE)