Want to JOIN fourth table in query - sql

I have four tables:
mls_category
points_matrix
mls_entry
bonus_points
My first table (mls_category) is like below:
*--------------------------------*
| cat_no | store_id | cat_value |
*--------------------------------*
| 10 | 101 | 1 |
| 11 | 101 | 4 |
*--------------------------------*
My second table (points_matrix) is like below:
*----------------------------------------------------*
| pm_no | store_id | value_per_point | maxpoint |
*----------------------------------------------------*
| 1 | 101 | 1 | 10 |
| 2 | 101 | 2 | 50 |
| 3 | 101 | 3 | 80 |
*----------------------------------------------------*
My third table (mls_entry) is like below:
*-------------------------------------------*
| user_id | category | distance | status |
*-------------------------------------------*
| 1 | 10 | 20 | approved |
| 1 | 10 | 30 | approved |
| 1 | 11 | 40 | approved |
*-------------------------------------------*
My fourth table (bonus_points) is like below:
*--------------------------------------------*
| user_id | store_id | bonus_points | type |
*--------------------------------------------*
| 1 | 101 | 200 | fixed |
| 2 | 102 | 300 | fixed |
| 1 | 103 | 4 | per |
*--------------------------------------------*
Now, I want to add bonus points value into the sum of total distance according to the store_id, user_id and type.
I am using the following code to get total distance:
SELECT MIN(b.value_per_point) * d.total_distance FROM points_matrix b
JOIN
(
SELECT store_id, sum(t1.totald/c.cat_value) as total_distance FROM mls_category c
JOIN
(
SELECT SUM(distance) totald, user_id, category FROM mls_entry
WHERE user_id= 1 AND status = 'approved' GROUP BY user_id, category
) t1 ON c.cat_no = t1.category
) d ON b.store_id = d.store_id AND b.maxpoint >= d.total_distance
The above code is correct to calculate value, now I want to JOIN my fourth table.
This gives me sum (60*3 = 180) as total value. Now, I want (60+200)*3 = 780 for user 1 and store id 101 and value is fixed.

i think your query will be like below
SELECT Max(b.value_per_point)*( max(d.total_distance)+max(bonus_points)) FROM mls_point_matrix b
JOIN
(
SELECT store_id, sum(t1.totald/c.cat_value) as total_distance FROM mls_category c
JOIN
(
SELECT SUM(distance) totald, user_id, category FROM mls_entry
WHERE user_id= 1 AND status = 'approved' GROUP BY user_id, category
) t1 ON c.cat_no = t1.category group by store_id
) d ON b.store_id = d.store_id inner join bonus_points bp on bp.store_id=d.store_id
DEMO fiddle

Related

How to join two tables with sum of one column and with condition

I have two tables:
table 1
+-------------+--------------+-----------------+
| id_product | id_customer |start_date |
+-------------+--------------+-----------------+
| 1 | 1 | 2021-08-28T10:37|
| 1 | 2 | 2021-08-28T11:17|
| 1 | 3 | 2021-08-28T12:27|
| 2 | 1 | 2021-08-28T17:00|
table 2
+-------------+------------------+----------+-------------------------------+
| id_customer | stop_date | duration | 20 other columns like duration|
+-------------+------------------+----------+-------------------------------+
| 1 | 2021-08-27T17:00| 20 | ...
| 1 | 2021-08-26T17:00| 40 | ...
| 2 | 2021-08-29T17:00| 120 | ...
| 1 | 2021-08-30T17:00| 40 | ...
| ..........................................|
start_date in table 1 is the date the customer started the product.
stop_datein table 2 is the date the customer stopped the product.
I want to join these two tables to have something like : one row with :
productid
customer_id
start_date
sum of all duration for all the stop_date BEFORE start_date.
same as duration for all the 20 reminding columns.
example for product_id = 1, custom_id = 1 :
+-------------+--------------+-----------------+---------------+-----------------------------------+
| id_product | id_customer |start_date | sum(duration) | sum(all other columns from table 2)
+-------------+--------------+-----------------+---------------+-----------------------------------+
| 1 | 1 | 2021-08-28T10:37| 60
I have a really big tables, I am using pyspark with SQL. Do you know an optimised way to this ?
Thank you
EDIT :
There is also an id_product in table2
SELECT
Table_1.id_product,
Table_1.id_customer,
Table_1.start_date,
SUM(duration) AS [sum(duration)]
---,SUM(duration2)
---,SUM(duration3)
FROM Table_1
LEFT JOIN Table_2 ON
Table_2.id_customer = Table_1.id_customer
AND Table_2.id_product = Table_1.id_product
AND Table_2.stop_date < Table_1.start_date
GROUP BY Table_1.id_product,Table_1.id_customer, Table_1.start_date

eSQL multiple join but with conditions

I've 3 tables as under
MERCHANDISE
+-----------+-----------+---------------+
| MERCH_NUM | MERCH_DIV | MERCH_SUB_DIV |
+-----------+-----------+---------------+
| 1 | car | awd |
| 1 | car | awd |
| 2 | bike | 1kcc |
| 3 | cycle | hybrid |
| 3 | cycle | city |
| 4 | moped | fixie |
+-----------+-----------+---------------+
PRIORITY
+----------+-----------+---------+---------+------------+------------+---------------+
| CUST_NUM | SALES_NUM | DOC_NUM | BALANCE | PRIORITY_1 | PRIORITY_2 | PRIORITY_CODE |
+----------+-----------+---------+---------+------------+------------+---------------+
| 90 | 1000 | 10 | 23 | 1 | 6 | NO |
| 91 | 1001 | 20 | 32 | 3 | 7 | PRI |
| 92 | 1002 | 30 | 11 | 2 | 8 | LATE |
| 93 | 1003 | 40 | 22 | 5 | 9 | 1MON |
+----------+-----------+---------+---------+------------+------------+---------------+
ORDER
+----------+-----------+---------+---------+-----------+-----------+
| CUST_NUM | SALES_NUM | DOC_NUM | COUNTRY | MERCH_NUM | MERCH_DIV |
+----------+-----------+---------+---------+-----------+-----------+
| 90 | 1000 | 10 | INDIA | 1 | car |
| 91 | 1001 | 20 | CHINA | 2 | bike |
| 92 | 1002 | 30 | USA | 3 | cycle |
| 93 | 1003 | 40 | UK | 4 | moped |
+----------+-----------+---------+---------+-----------+-----------+
I want to join the left joined table from the last two tables with the first one such that the MERCH_SUB_DIV 'awd' appears only once for each unique combination of merch_num and merch_div
the code I came up with is as under, but I'm not sure how do I eliminate the duplicate row just for the awd
select
ROW#, MERCH.MERCH_NUMBER, ORDPRI.MERCH_NUMBER, ORDPRI.CUST_NUM,
BALANCE, SALES_NUM, ITEM_NUM, RANK, PRIORITY_1
from (
select
ROW_NUMBER() OVER(
PARTITION BY ORD.DOC_NUM, ORD.ITEM_NUM
ORDER BY ORD.DOC_NUM, ORD.ITEM_NUM ASC
) AS Row#,
ORD.CUST_NUM, PRI.CUST_NUM, ORD.MERCH_NUM, ORD.MERCH_DIV, PRI.BALANCE,
pri.DOC_NUM, pri.SALES_NUM, pri.PRIORITY_1, pri.PRIORITY_2
from ORDER as ORD
left join PRIORITY as PRI on ORD.DOC_NUM = PRI.DOC_NUM
and ORD.SALES_NUMBER = PRI.SALES_NUM
where country_name in ('USA', ‘INDIA’)
) as ORDPRI
left join MERCHANDISE as MERCH on ORDPRI.DIV = MERCH.DIV
and ORDPRI.MERCH_NUM = MERCH.MERCH_NUM
You have to use 'DISTINCT' keyword to get unique values, but if your 'Priority table' & 'Order table' contains different values for Same MERCH_NUM then the final result contains the repetation of the 'MERCH_NUM'.
SELECT DISTINCT M.MERCH_NUMBER, O.MERCH_NUMBER, O.CUST_NUM, BALANCE, SALES_NUM,ITEM_NUM,RANK,PRIORITY_1
FROM priority_table P
LEFT JOIN order_table O ON P.CUST_NUM = O.CUST_NUM AND P.SALES_NUM=O.SALES_NUM AND P.DOC_NUM = O.DOC_NUM
LEFT JOIN merchandise_table M ON M.MERCH_NUM = O.MERCH_NUM
A way around can be to add one new Row_Number() in the outermost query having Partition by MERCH_SUB_DIV + all the columns in the final list and then filter final results based on the New Row_Number() . Follows a pseudo code that might help:
select
-- All expected columns in final result except the newRow#
ROW#, MERCH_NUM, CUST_NUM,
BALANCE, SALES_NUM, PRIORITY_1
from (
select
ROW#,
-- the new row number includes all column you want to show in final result
row_number() over ( PARTITION BY MERCH.MERCH_SUB_DIV ,
MERCH.MERCH_NUM, ORDPRI.MERCH_NUM, ORDPRI.CUST_NUM,
BALANCE, SALES_NUM, PRIORITY_1
order by (select 1 )) as newRow# ,
MERCH.MERCH_NUM, ORDPRI.CUST_NUM,
BALANCE, SALES_NUM, PRIORITY_1
from (
-- main query goes here
select
ROW_NUMBER() OVER(
PARTITION BY ORD.DOC_NUM --, ORD.ITEM_NUM
ORDER BY ORD.DOC_NUM ASC --, ORD.ITEM_NUM
) AS Row#,
ORD.CUST_NUM, ORD.MERCH_NUM, ORD.MERCH_DIV as DIV, PRI.BALANCE,
pri.DOC_NUM, pri.SALES_NUM, pri.PRIORITY_1, pri.PRIORITY_2
from #ORDER as ORD
left join #PRIORITY as PRI on ORD.DOC_NUM = PRI.DOC_NUM
and ORD.SALES_NUMBER = PRI.SALES_NUM
where country_name in ('USA', 'INDIA')
) as ORDPRI
left join #MERCHANDISE as MERCH on ORDPRI.DIV = MERCH.DIV
and ORDPRI.MERCH_NUM = MERCH.MERCH_NUM
) as T
-- final filter to get distinct values
where newRow# = 1
Sample code here .. Hope this helps!!

T-SQL - Total difference between two dates

I have two tables, Orders & Order History, the Orders table has the Order Number and the Order Date which is the date that the order was actually placed.
This is demonstrated in the below schema and demonstration data;
CREATE TABLE #Orders
(
OrderNumber INT,
OrderDate DATETIME
)
INSERT INTO #Orders (OrderNumber,OrderDate)
VALUES
(001,'2019-04-16 07:08:08.567'),
(002,'2019-03-22 07:08:08.567'),
(003,'2019-06-30 07:08:08.567'),
(004,'2019-01-05 07:08:08.567'),
(005,'2019-02-19 07:08:08.567')
The Order audit table also contains the Order Number and the Event Date which is the date that the order status changed.
This is demonstrated in the below schema and demonstration data.
CREATE TABLE #Order_Audit
(
OrderNumber INT,
EventDate DATETIME,
Status INT
)
INSERT INTO #Order_Audit (OrderNumber,EventDate,Status)
VALUES
(001,'2019-04-16 07:08:08.567',1),
(001,'2019-04-19 07:08:08.567',2),
(001,'2019-04-22 07:08:08.567',3),
(001,'2019-04-28 07:08:08.567',4),
(001,'2019-04-30 07:08:08.567',5),
(002,'2019-03-22 07:08:08.567',1),
(002,'2019-03-24 07:08:08.567',2),
(002,'2019-03-26 07:08:08.567',3),
(002,'2019-04-01 07:08:08.567',4),
(002,'2019-04-10 07:08:08.567',5),
(003,'2019-06-30 07:08:08.567',1),
(003,'2019-07-15 07:08:08.567',2),
(003,'2019-07-19 07:08:08.567',3),
(003,'2019-07-20 07:08:08.567',4),
(003,'2019-07-21 07:08:08.567',5),
(004,'2019-01-05 07:08:08.567',1),
(004,'2019-01-06 07:08:08.567',2),
(004,'2019-01-07 07:08:08.567',3),
(004,'2019-01-08 07:08:08.567',4),
(004,'2019-01-09 07:08:08.567',5),
(005,'2019-02-19 07:08:08.567',1),
(005,'2019-03-19 07:08:08.567',2),
(005,'2019-03-21 07:08:08.567',3),
(005,'2019-03-22 07:08:08.567',4),
(005,'2019-03-23 07:08:08.567',5)
Below is the query that I have at the moment, it will give me the difference between the Event Date and the Order Date that the order was placed.
The query has been simplified however the key columns are included. This is being executed on SQL Server 2012 SP4
SELECT
O.OrderNumber,
DATEDIFF(DAY,O.OrderDate,OA.EventDate) AS [Day-Diff]
FROM #Orders O
INNER JOIN #Order_Audit OA ON OA.OrderNumber = O.OrderNumber
The above query outputs like this
|---------------------|------------------|
| OrderNumber | DayDiff |
|---------------------|------------------|
| 001 | 0 |
|---------------------|------------------|
| 001 | 3 |
|---------------------|------------------|
| 001 | 6 |
|---------------------|------------------|
| 001 | 12 |
|---------------------|------------------|
| 001 | 14 |
|---------------------|------------------|
| 002 | 0 |
|---------------------|------------------|
| 002 | 2 |
|---------------------|------------------|
| 002 | 4 |
|---------------------|------------------|
| 002 | 10 |
|---------------------|------------------|
| 002 | 19 |
|---------------------|------------------|
What I actually need is a query that will output more like this
|---------------------|------------------|
| OrderNumber | DayDiff |
|---------------------|------------------|
| 001 | |
|---------------------|------------------|
| 001 | |
|---------------------|------------------|
| 001 | |
|---------------------|------------------|
| 001 | |
|---------------------|------------------|
| 001 | |
|---------------------|------------------|
| Total | 14 |
|---------------------|------------------|
| 002 | |
|---------------------|------------------|
| 002 | |
|---------------------|------------------|
| 002 | |
|---------------------|------------------|
| 002 | |
|---------------------|------------------|
| 002 | |
|---------------------|------------------|
| Total | 19 |
|---------------------|------------------|
However I can't figure out how to get the difference between the Order Date and the most recent Event Date for each order and add it below that group of order events (as shown above) - I am not even sure it is possible in T-SQL and should possibly be handled at the application level.
You can try this below. I have created the Total label as OrderNumber + Total for ordering.
SELECT
CAST(O.OrderNumber AS VARCHAR) + ' Total' OrderNumber,
MAX(DATEDIFF(DAY,O.OrderDate,OA.EventDate)) AS [Day-Diff]
FROM #Orders O
INNER JOIN #Order_Audit OA ON OA.OrderNumber = O.OrderNumber
GROUP BY CAST(O.OrderNumber AS VARCHAR) + ' Total'
UNION ALL
SELECT
CAST(O.OrderNumber AS VARCHAR) OrderNumber,
NULL AS [Day-Diff]
FROM #Orders O
INNER JOIN #Order_Audit OA ON OA.OrderNumber = O.OrderNumber
ORDER BY 1
For the totals you can group by ordernumber to get the last eventdate and then find the difference from the corresponding orderdate.
Then use UNION ALL:
select t.OrderNumber, t.DayDiff
from (
select ordernumber nr, cast(ordernumber as varchar(10)) OrderNumber, null DayDiff, 0 col
from order_audit
union all
select a.ordernumber nr, 'Total', datediff(day, o.orderdate, a.eventdate) DayDiff, 1 col
from orders o inner join (
select
ordernumber, max(eventdate) eventdate
from order_audit
group by ordernumber
) a on a.ordernumber = o.ordernumber
) t
order by t.nr, t.col
See the demo.
Results:
> OrderNumber | DayDiff
> :---------- | ------:
> 1 |
> 1 |
> 1 |
> 1 |
> 1 |
> Total | 14
> 2 |
> 2 |
> 2 |
> 2 |
> 2 |
> Total | 19
> 3 |
> 3 |
> 3 |
> 3 |
> 3 |
> Total | 21
> 4 |
> 4 |
> 4 |
> 4 |
> 4 |
> Total | 4
> 5 |
> 5 |
> 5 |
> 5 |
> 5 |
> Total | 32

Only include grouped observations where event order is valid

I have a table of dates for eye exams and eye wear purchases for individuals. I only want to keep instances where individuals bought their eye wear following an eye exam. In the example below, I would want to keep person 1, events 2 and 3 for person 2, person 3, but not person 4. How can I do this in SQL server?
| Person | Event | Order |
| 1 | Exam | 1 |
| 1 | Eyewear| 2 |
| 2 | Eyewear| 1 |
| 2 | Exam | 2 |
| 2 | Eyewear| 3 |
| 3 | Exam | 1 |
| 3 | Eyewear| 2 |
| 4 | Eyewear| 1 |
| 4 | Exam | 2 |
The final result would look like
| Person | Event | Order |
| 1 | Exam | 1 |
| 1 | Eyewear| 2 |
| 2 | Exam | 2 |
| 2 | Eyewear| 3 |
| 3 | Exam | 1 |
| 3 | Eyewear| 2 |
Self join should work...
select
t.Person
,t.Event
,t.[Order]
from
yourTable t
inner join
yourTable t2 on t2.Person = t.Person
and t2.[Order] = (t.[Order] +1)
where
t2.Event = 'Eyewear'
and t.Event = 'Exam'
I haven't tried to optimize it but this seems to work:
create table t(
person varchar(10),
event varchar(10),
[order] varchar(10)
);
insert into t values
('1','Exam','1'),
('1','Eyewear','2'),
('2','Eyewear','1'),
('2','Exam','2'),
('2','Eyewear','3'),
('3','Exam','1'),
('3','Eyewear','2'),
('4','Eyewear','1'),
('4','Exam','2');
with xxx(person,event_a,seq_a,event_b,seq_b) as (
select a.person,a.event,a.[order],b.event,b.[order]
from t a join t b
on a.person = b.person
and a.[order] < b.[order]
and a.event like 'exam'
and b.event like 'eyewear'
)
select person,event_a event,seq_a [order] from xxx
union
select person,event_b event,seq_b [order] from xxx
order by 1,3

Different results with agregate and analytical with distinct functions

I can't understand why there is the difference in results between two queries:
1) select distinct sum(lot.lot_size) over (partition by lot.detail_id) buying_size, lot.f_detail_id
from buying
join lot on lot.lot_id = buying.lot_id
where exists (select 1
from selling s
join buying b on b.buying_id = s.buying_buying_id
where s.deal_id = 123456
and buying.deal_id = b.deal_id
and s.selling_detailid = buying.buying_detailid)
and buying.buying_status <> 'Canceled'
result:
|buying_size|f_detail_id |
|-----------|------------|
| 105 | 1 |
| 200 | 2 |
| 75 | 3 |
| 225 | 4 |
| 300 | 5 |
2) select distinct *
from (select sum(lot.lot_size) over (partition by lot.detail_id) buying_size, lot.f_detail_id
from buying
join lot on lot.lot_id = buying.lot_id
where exists (select 1
from selling s
join buying b on b.buying_id = s.buying_buying_id
where s.deal_id = 123456
and buying.deal_id = b.deal_id
and s.selling_detailid = buying.buying_detailid)
and buying.buying_status <> 'Canceled')
result:
|buying_size|f_detail_id |
|-----------|------------|
| 105 | 1 |
| 200 | 2 |
| 75 | 3 |
| 225 | 4 |
| 150 | 5 |
I know that this query might be written by another way with using GROUP BY but the main thing I'm interested in is why the result of analytical function depends on DISTINCT.