How to join two tables with sum of one column and with condition - sql

I have two tables:
table 1
+-------------+--------------+-----------------+
| id_product | id_customer |start_date |
+-------------+--------------+-----------------+
| 1 | 1 | 2021-08-28T10:37|
| 1 | 2 | 2021-08-28T11:17|
| 1 | 3 | 2021-08-28T12:27|
| 2 | 1 | 2021-08-28T17:00|
table 2
+-------------+------------------+----------+-------------------------------+
| id_customer | stop_date | duration | 20 other columns like duration|
+-------------+------------------+----------+-------------------------------+
| 1 | 2021-08-27T17:00| 20 | ...
| 1 | 2021-08-26T17:00| 40 | ...
| 2 | 2021-08-29T17:00| 120 | ...
| 1 | 2021-08-30T17:00| 40 | ...
| ..........................................|
start_date in table 1 is the date the customer started the product.
stop_datein table 2 is the date the customer stopped the product.
I want to join these two tables to have something like : one row with :
productid
customer_id
start_date
sum of all duration for all the stop_date BEFORE start_date.
same as duration for all the 20 reminding columns.
example for product_id = 1, custom_id = 1 :
+-------------+--------------+-----------------+---------------+-----------------------------------+
| id_product | id_customer |start_date | sum(duration) | sum(all other columns from table 2)
+-------------+--------------+-----------------+---------------+-----------------------------------+
| 1 | 1 | 2021-08-28T10:37| 60
I have a really big tables, I am using pyspark with SQL. Do you know an optimised way to this ?
Thank you
EDIT :
There is also an id_product in table2

SELECT
Table_1.id_product,
Table_1.id_customer,
Table_1.start_date,
SUM(duration) AS [sum(duration)]
---,SUM(duration2)
---,SUM(duration3)
FROM Table_1
LEFT JOIN Table_2 ON
Table_2.id_customer = Table_1.id_customer
AND Table_2.id_product = Table_1.id_product
AND Table_2.stop_date < Table_1.start_date
GROUP BY Table_1.id_product,Table_1.id_customer, Table_1.start_date

Related

Replace null values with most recent non-null values SQL

I have a table where each row consists of an ID, date, variable values (eg. var1).
When there is a null value for var1 in a row, I want like to replace the null value with the most recent non-null value before that date for that ID. How can I do this quickly for a very large table?
So presume I start with this table:
+----+------------|-------+
| id |date | var1 |
+----+------------+-------+
| 1 |'01-01-2022'|55 |
| 2 |'01-01-2022'|12 |
| 3 |'01-01-2022'|45 |
| 1 |'01-02-2022'|Null |
| 2 |'01-02-2022'|Null |
| 3 |'01-02-2022'|20 |
| 1 |'01-03-2022'|15 |
| 2 |'01-03-2022'|Null |
| 3 |'01-03-2022'|Null |
| 1 |'01-04-2022'|Null |
| 2 |'01-04-2022'|77 |
+----+------------+-------+
Then I want this
+----+------------|-------+
| id |date | var1 |
+----+------------+-------+
| 1 |'01-01-2022'|55 |
| 2 |'01-01-2022'|12 |
| 3 |'01-01-2022'|45 |
| 1 |'01-02-2022'|55 |
| 2 |'01-02-2022'|12 |
| 3 |'01-02-2022'|20 |
| 1 |'01-03-2022'|15 |
| 2 |'01-03-2022'|12 |
| 3 |'01-03-2022'|20 |
| 1 |'01-04-2022'|15 |
| 2 |'01-04-2022'|77 |
+----+------------+-------+
cte suits perfect here
this snippets returns the rows with values, just an update query and thats all (will update my response).
WITH selectcte AS
(
SELECT * FROM testnulls where var1 is NOT NULL
)
SELECT t1A.id, t1A.date, ISNULL(t1A.var1,t1B.var1) varvalue
FROM selectcte t1A
OUTER APPLY (SELECT TOP 1 *
FROM selectcte
WHERE id = t1A.id AND date < t1A.date
AND var1 IS NOT NULL
ORDER BY id, date DESC) t1B
Here you can dig further about CTEs :
https://learn.microsoft.com/en-us/sql/t-sql/queries/with-common-table-expression-transact-sql?view=sql-server-ver16

Want to JOIN fourth table in query

I have four tables:
mls_category
points_matrix
mls_entry
bonus_points
My first table (mls_category) is like below:
*--------------------------------*
| cat_no | store_id | cat_value |
*--------------------------------*
| 10 | 101 | 1 |
| 11 | 101 | 4 |
*--------------------------------*
My second table (points_matrix) is like below:
*----------------------------------------------------*
| pm_no | store_id | value_per_point | maxpoint |
*----------------------------------------------------*
| 1 | 101 | 1 | 10 |
| 2 | 101 | 2 | 50 |
| 3 | 101 | 3 | 80 |
*----------------------------------------------------*
My third table (mls_entry) is like below:
*-------------------------------------------*
| user_id | category | distance | status |
*-------------------------------------------*
| 1 | 10 | 20 | approved |
| 1 | 10 | 30 | approved |
| 1 | 11 | 40 | approved |
*-------------------------------------------*
My fourth table (bonus_points) is like below:
*--------------------------------------------*
| user_id | store_id | bonus_points | type |
*--------------------------------------------*
| 1 | 101 | 200 | fixed |
| 2 | 102 | 300 | fixed |
| 1 | 103 | 4 | per |
*--------------------------------------------*
Now, I want to add bonus points value into the sum of total distance according to the store_id, user_id and type.
I am using the following code to get total distance:
SELECT MIN(b.value_per_point) * d.total_distance FROM points_matrix b
JOIN
(
SELECT store_id, sum(t1.totald/c.cat_value) as total_distance FROM mls_category c
JOIN
(
SELECT SUM(distance) totald, user_id, category FROM mls_entry
WHERE user_id= 1 AND status = 'approved' GROUP BY user_id, category
) t1 ON c.cat_no = t1.category
) d ON b.store_id = d.store_id AND b.maxpoint >= d.total_distance
The above code is correct to calculate value, now I want to JOIN my fourth table.
This gives me sum (60*3 = 180) as total value. Now, I want (60+200)*3 = 780 for user 1 and store id 101 and value is fixed.
i think your query will be like below
SELECT Max(b.value_per_point)*( max(d.total_distance)+max(bonus_points)) FROM mls_point_matrix b
JOIN
(
SELECT store_id, sum(t1.totald/c.cat_value) as total_distance FROM mls_category c
JOIN
(
SELECT SUM(distance) totald, user_id, category FROM mls_entry
WHERE user_id= 1 AND status = 'approved' GROUP BY user_id, category
) t1 ON c.cat_no = t1.category group by store_id
) d ON b.store_id = d.store_id inner join bonus_points bp on bp.store_id=d.store_id
DEMO fiddle

calculating sum of rows with identical id

Let's imagine a table with two columns ex:
| Value | ID |
+-------+----+
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
| 1 | 2 |
| 2 | 2 |
| 2 | 2 |
What I am trying to do is to calculate the sum of those with similar id and display them in different table like:
| Sum | ID |
+-----+----+
| 9 | 1 |
| 5 | 2 |
and so on.
I could find a sum of a known id by
SELECT SUM(VALUE) FROM MYTABLE WHERE ID = 1;
However not sure on how to find sum of different id's separately, could you give an idea on how to proceed?
Select SUM(VALUE),ID FROM MYTABLE GROUP BY ID
Use GROUP BY clause:
SELECT SUM(VALUE) Sum, ID FROM MYTABLE GROUP BY ID;
SELECT SUM(VALUE),ID FROM MYTABLE Group By ID

Add Values to Grouping Column

I am having a lot of trouble with a scenario that I think some of you might have come across.
(the whole thing about Business Trips, two tables, one filled with payments done on Business trips, and the other is about the Business Trips, so the first one has more Rows than the other, (there are more Payments that happened than Trips))
I have two tables, Table A and Table B.
Table A looks as follows
| TableA_ID | TableB_ID | PaymentMethod | ValuePayed |
| 52 | 1 | Method1 | 23,2 |
| 21 | 1 | Method2 | 23,2 |
| 33 | 2 | Method3 | 23,2 |
| 42 | 1 | Method2 | 14 |
| 11 | 14 | Method1 | 267 |
| 42 | 1 | Method2 | 14,7 |
| 13 | 32 | Method1 | 100,2 |
Table B looks like this
| TableB_ID | TravelExpenses | OperatingExpense |
| 1 | 23 | 12 |
| 1 | 234 | 24 |
| 2 | 12 | 7 |
| 1 | 432 | 12 |
| 14 | 110 | 12 |
I am trying to create a measure Table (Table C) that looks like this:
| TableC_ID | TypeofCost | Amount |
| 1 | Method1 | 100,2 |
| 2 | Method2 | 52 |
| 3 | TravelExpenses | 7 |
| 4 | OperatingExpense| 12 |
| 5 | Method3 | 12 |
| 6 | OperatingExpense| 7 |
| 7 | Method3 | 12 |
(the Amount results are to be Summed and Columns - Employee, Month, TypeofCost Grouped)
So I pretty much have to group not only by the PaymentMethod which I get from table A,
but also insert new values in the group (TravelExpenses and OperatingExpense)
Can anybody give me any Idea about how this can be done in SQL ?
Here is what I have tried so far
SELECT PaymentMethod as TypeofCost
,Sum(ValuePayed) as Amount
FROM TableA Left Outer Join TableB on TableA.TableB_ID = TableB.TableB_ID
GROUP PaymentMethod
UNION
SELECT 'TravelExpenses' as TypeofCost
,Sum(TableB.TravelExpenses) as Amount
FROM TableA Left Outer Join TableB on TableA.TableB_ID = TableB.TableB_ID
GROUP PaymentMethod
UNION
SELECT 'OperatingExpense' as TypeofCost
,Sum(TableB.OperatingExpense) as Amount
FROM TableA Left Outer Join TableB on TableA.TableB_ID = TableB.TableB_ID
GROUP PaymentMethod
It should be something like this:
Select
row_number() OVER(ORDER BY TableB_ID) as 'TableC_ID',
u.TypeofCost,
u.Amount
from (
Select
a.TableB_ID,
a.PaymentMethod as 'TypeofCost',
SUM(a.ValuePayed) as 'Amount'
from
Table_A as a
group by a.TableB_ID, a.PaymentMethod
union
Select
b1.TableB_ID,
'TravelExpenses' as 'TypeofCost',
SUM(b1.TravelExpenses) as 'Amount'
from
Table_B as b1
group by b1.TableB_ID
union
Select
b2.TableB_ID,
'OperatingExpenses' as 'TypeofCost',
SUM(b2.OperatingExpenses) as 'Amount'
from
Table_B as b2
group by b2.TableB_ID
) as u
EDIT: Generate TableC_ID

Getting Sum of MasterTable's amount which joins to DetailTable

I have two tables:
1. Master
| ID | Name | Amount |
|-----|--------|--------|
| 1 | a | 5000 |
| 2 | b | 10000 |
| 3 | c | 5000 |
| 4 | d | 8000 |
2. Detail
| ID |MasterID| PID | Qty |
|-----|--------|-------|------|
| 1 | 1 | 1 | 10 |
| 2 | 1 | 2 | 20 |
| 3 | 2 | 2 | 60 |
| 4 | 2 | 3 | 10 |
| 5 | 3 | 4 | 100 |
| 6 | 4 | 1 | 20 |
| 7 | 4 | 3 | 40 |
I want to select sum(Amount) from Master which joins to Deatil where Detail.PID in (1,2,3)
So I execute the following query:
SELECT SUM(Amount) FROM Master M INNER JOIN Detail D ON M.ID = D.MasterID WHERE D.PID IN (1,2,3)
Result should be 20000. But I am getting 40000
See this fiddle. Any suggestion?
You are getting exactly double the amount because the detail table has two occurences for each of the PIDs in the WHERE clause.
See demo
Use
SELECT SUM(Amount)
FROM Master M
WHERE M.ID IN (
SELECT DISTINCT MasterID
FROM DETAIL
WHERE PID IN (1,2,3) )
What is the requirement of joining the master table with details when you have all your columns are in Master table.
Also, isnt there any FK relationhsip defined on these tables. Looking at your data it seems to me that there should be FK on detail table for MasterId. If that is the case then you do not need join the table at all.
Also, in case you want to make sure that you have records in details table for the records for which you need sum and there is no FK relationship. Then you could give a try for exists instead of join.