SQL join 3 tables on ID and dates

I have the test data below. There are 3 tables: the sales table, the sales delivery table and the sales delivery months table.
I need to join all the tables together so that the blue marked rows are connected to each other and the red marked rows are connected to each other.
The join should presumably use the From and To columns that exist in every table.
Update:
I have tried the following:
SELECT *
FROM Sales co
LEFT JOIN SalesDelivery cd
ON co.SalesID = cd.SalesID
-- From and To are SQL keywords, so they must be quoted
AND cd."From" BETWEEN co."From" AND co."To"
AND cd."To" BETWEEN co."From" AND co."To"
LEFT JOIN SalesDeliveryMonths cdp
ON cd.SalesDeliveryID = cdp.SalesDeliveryID
AND cdp."From" BETWEEN cd."From" AND cd."To"
AND cdp."To" BETWEEN cd."From" AND cd."To"
Sales table:
SalesID Name Revenue From To Current row
100 New CRM 250000.00 1800-01-01 2018-10-03 0
100 New CRM 500000.00 2018-10-03 9999-12-31 1
SalesDelivery table:
SalesID SalesDeliveryID SalesDeliveryName Revenue SalesStart From To Current row
100 AB100 New CRM 250000.00 2018-07-01 1800-01-01 2018-10-03 0
100 AB100 New CRM 500000.00 2018-07-01 2018-10-03 9999-12-31 1
100 ABM100 New CRM - maintenance 0.00 2018-07-01 2018-10-03 9999-12-31 1
SalesDeliveryMonths table:
RevenueMonth Month SalesDeliveryID SalesID From To Current row
833333.3333 2018-07-01 AB100 100 1800-01-01 2018-10-04 0
166666.6667 2018-07-01 AB100 100 2018-10-04 9999-12-31 1
833333.3333 2018-08-01 AB100 100 1800-01-01 2018-10-04 0
166666.6667 2018-08-01 AB100 100 2018-10-04 9999-12-31 1
833333.3333 2018-09-01 AB100 100 1800-01-01 2018-10-04 0
166666.6667 2018-09-01 AB100 100 2018-10-04 9999-12-31 1
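One possible approach, sketched here (not from the original thread) with Python's built-in sqlite3 module against the sample data above. Requiring each child interval to lie entirely inside its parent's cannot work for the historical SalesDeliveryMonths rows: they end on 2018-10-04, one day after the historical SalesDelivery row ends (2018-10-03), so they never fit inside it. The sketch instead assumes From/To are half-open intervals [From, To) and attaches each child row to the parent version that is valid on the child's From date. It also assumes the blue and red groups are the current (Current row = 1) and historical (Current row = 0) versions; the colours themselves are not visible in this copy of the question.

```python
import sqlite3

# In-memory copy of the question's sample data. "From" and "To" are quoted because
# they are SQL keywords; "Current row" is quoted because it contains a space.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales (SalesID INT, Name TEXT, Revenue REAL,
                    "From" TEXT, "To" TEXT, "Current row" INT);
CREATE TABLE SalesDelivery (SalesID INT, SalesDeliveryID TEXT, SalesDeliveryName TEXT,
                            Revenue REAL, SalesStart TEXT,
                            "From" TEXT, "To" TEXT, "Current row" INT);
CREATE TABLE SalesDeliveryMonths (RevenueMonth REAL, Month TEXT, SalesDeliveryID TEXT,
                                  SalesID INT, "From" TEXT, "To" TEXT, "Current row" INT);
""")
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?, ?, ?)", [
    (100, "New CRM", 250000.00, "1800-01-01", "2018-10-03", 0),
    (100, "New CRM", 500000.00, "2018-10-03", "9999-12-31", 1),
])
conn.executemany("INSERT INTO SalesDelivery VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
    (100, "AB100", "New CRM", 250000.00, "2018-07-01", "1800-01-01", "2018-10-03", 0),
    (100, "AB100", "New CRM", 500000.00, "2018-07-01", "2018-10-03", "9999-12-31", 1),
    (100, "ABM100", "New CRM - maintenance", 0.00, "2018-07-01", "2018-10-03", "9999-12-31", 1),
])
months = []
for month in ("2018-07-01", "2018-08-01", "2018-09-01"):
    months.append((833333.3333, month, "AB100", 100, "1800-01-01", "2018-10-04", 0))
    months.append((166666.6667, month, "AB100", 100, "2018-10-04", "9999-12-31", 1))
conn.executemany("INSERT INTO SalesDeliveryMonths VALUES (?, ?, ?, ?, ?, ?, ?)", months)

QUERY = """
SELECT co.SalesID, cd.SalesDeliveryID, cdp.Month,
       co."Current row", cd."Current row", cdp."Current row"
FROM Sales co
LEFT JOIN SalesDelivery cd
       ON cd.SalesID = co.SalesID
      -- the delivery version starts inside the sales version [From, To)
      AND cd."From" >= co."From" AND cd."From" < co."To"
LEFT JOIN SalesDeliveryMonths cdp
       ON cdp.SalesDeliveryID = cd.SalesDeliveryID
      -- the month version starts inside the delivery version [From, To)
      AND cdp."From" >= cd."From" AND cdp."From" < cd."To"
ORDER BY co."From", cd.SalesDeliveryID, cdp.Month
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The ISO-formatted dates are stored as text, which compares correctly as strings. ABM100 has no month rows, so the second LEFT JOIN leaves its month columns NULL.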

Related

Transpose a table with multiple ID rows and different assessment dates

I would like to transpose my table to see trends in the data. The data is formatted as follows:
A UserId can occur multiple times because of different assessment periods. Say the user with ID 1 incurred charges in January, February and March; there are currently three rows, one for each of these periods.
I would like to see everything in one row per user ID, regardless of the number of periods (up to 12 months).
This would enable me to see and compare changes between assessment periods and attributes.
Current format:
UserId AssessmentDate Attribute1 Attribute2 Attribute3
1 2020-01-01 00:00:00.000 -01:00 20.13 123.11 405.00
1 2021-02-01 00:00:00.000 -01:00 1.03 78.93 11.34
1 2021-03-01 00:00:00.000 -01:00 15.03 310.10 23.15
2 2021-02-01 00:00:00.000 -01:00 14.31 41.30 63.20
2 2021-03-01 00:03:45.000 -01:00 0.05 3.50 1.30
Desired format:
UserId LastAssessmentDate Attribute1_M-2 Attribute2_M-1 ... Attribute3_M0
1 2021-03-01 00:00:00.000 -01:00 20.13 123.11 23.15
2 2021-03-01 00:03:45.000 -01:00 NULL 41.30 1.30
Either SQL or Pandas works for me. Thanks for the help!
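A minimal SQL sketch of the pivot using conditional aggregation, run through Python's sqlite3 module so it is self-contained (SQLite 3.25 or later is needed for ROW_NUMBER()). The table name assessments is hypothetical. It assumes M0 means the user's latest assessment, M-1 the one before it and so on, ranked by AssessmentDate rather than by calendar-month difference, since user 1's first row is dated 2020-01-01 yet is shown under M-2. Timestamps are kept as text, which sorts correctly here only because every value has the same -01:00 offset.

```python
import sqlite3

# Sample rows from the question; "assessments" is a hypothetical table name.
ROWS = [
    (1, "2020-01-01 00:00:00.000 -01:00", 20.13, 123.11, 405.00),
    (1, "2021-02-01 00:00:00.000 -01:00", 1.03, 78.93, 11.34),
    (1, "2021-03-01 00:00:00.000 -01:00", 15.03, 310.10, 23.15),
    (2, "2021-02-01 00:00:00.000 -01:00", 14.31, 41.30, 63.20),
    (2, "2021-03-01 00:03:45.000 -01:00", 0.05, 3.50, 1.30),
]
MAX_PERIODS = 12  # the question allows up to 12 assessment periods per user
ATTRS = ["Attribute1", "Attribute2", "Attribute3"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assessments (UserId INT, AssessmentDate TEXT, "
             "Attribute1 REAL, Attribute2 REAL, Attribute3 REAL)")
conn.executemany("INSERT INTO assessments VALUES (?, ?, ?, ?, ?)", ROWS)


def col_name(attr: str, k: int) -> str:
    """Label such as Attribute1_M-2 (k assessments before the latest) or Attribute1_M0."""
    return f"{attr}_M0" if k == 0 else f"{attr}_M-{k}"


# One MAX(CASE ...) column per attribute and period, oldest period first.
pivot_cols = ",\n       ".join(
    f'MAX(CASE WHEN k = {k} THEN {attr} END) AS "{col_name(attr, k)}"'
    for attr in ATTRS
    for k in reversed(range(MAX_PERIODS))
)

query = f"""
WITH ranked AS (
  -- k = 0 for the latest assessment, 1 for the one before it, and so on
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY AssessmentDate DESC) - 1 AS k
  FROM assessments
)
SELECT UserId,
       MAX(AssessmentDate) AS LastAssessmentDate,
       {pivot_cols}
FROM ranked
GROUP BY UserId
ORDER BY UserId
"""

cur = conn.execute(query)
names = [d[0] for d in cur.description]
result = [dict(zip(names, row)) for row in cur.fetchall()]
for r in result:
    print(r["UserId"], r["LastAssessmentDate"],
          r["Attribute1_M-2"], r["Attribute2_M-1"], r["Attribute3_M0"])
```

Because the column list is generated for 12 periods, users with fewer assessments get NULL in the older columns, as user 2 does in Attribute1_M-2.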

Need to join two tables on date range in hive for disc rate on transactions for prod catg at acc_no level monthly

I have a mapping table, mapping_table, that holds a discount rate (disc_rate) to apply to tot_bill per account number, based on prod_catg, for 3 different product categories and for the Consumer and Business card groups.
I have a transaction table, txn, with monthly transactions for multiple acc_no values, each dated the first day of the month (e.g. txn_date 2019-01-01).
Table txn has the columns txn_date, acc_no (shown as card_no in the sample below), prod_catg, card_grp and tot_bill (in $); the sample below contains dummy values.
I need a Hive query that calculates total_disc_amount for each acc_no per month, at card_no level, per card_catg and for every product category, over these year ranges: prod_serv_name x for 2020-01-01 to 2020-12-31, and prod_serv_name y for the Platinum category for 2018-01-01 to 2018-12-31.
The tables should be joined on prod_catg and on the date range to get the disc_rate value, and tot_disc_bill (tot_bill * disc_rate) should be calculated at acc_no level for each month.
txn table:
txn_date card_no prod_catg card_grp tot_bill
2019-01-01 201 Platinum Consumer 900
2019-02-01 201 platinum Consumer 500
2019-03-01 201 Platinum Consumer 300
2020-02-01 201 Platinum Consumer 400
2020-03-01 201 Platinum Consumer 800
2020-03-01 202 Gold Business 700
2020-01-01 203 Gold Business 900
2018-10-01 204 Gold Business 900
2018-09-01 205 Platinum Business 100
2018-03-01 206 Bronze Business 200
mapping_table:
prod_serv_name prod_catg card_grp disc_rate start_date end_date
x Platinum Consumer 2.5 2020-01-01 2020-12-31
x Gold Consumer 2.5 2020-01-01 2020-12-31
x Bronze Consumer 2.5 2020-01-01 2020-12-31
x Platinum Consumer 2.5 2019-01-01 2019-12-31
x Gold Consumer 3 2019-01-01 2019-12-31
x Bronze Consumer 3 2019-01-01 2019-12-31
x Gold Business 3 2020-01-01 2020-12-31
y Gold Business 3 2018-01-01 2018-12-31
y Platinum Business 3 2018-01-01 2018-12-31
y Bronze Business 3 2018-01-01 2018-12-31
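A minimal sketch of the range join, run in SQLite through Python's sqlite3 module with the sample rows above; the SQL uses only constructs Hive also has (lower(), BETWEEN, SUM, GROUP BY). It follows the question's formula tot_bill * disc_rate literally, treats card_no as the account key, and compares prod_catg case-insensitively because the sample mixes Platinum and platinum. The date-range test sits in WHERE rather than ON because older Hive releases (before 2.2.0) accept only equality conditions in a join's ON clause; for an inner join the result is the same. The prod_serv_name x/y restrictions from the question are not applied here; they would be extra WHERE predicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (txn_date TEXT, card_no INT, prod_catg TEXT, "
             "card_grp TEXT, tot_bill REAL)")
conn.execute("CREATE TABLE mapping_table (prod_serv_name TEXT, prod_catg TEXT, "
             "card_grp TEXT, disc_rate REAL, start_date TEXT, end_date TEXT)")
conn.executemany("INSERT INTO txn VALUES (?, ?, ?, ?, ?)", [
    ("2019-01-01", 201, "Platinum", "Consumer", 900),
    ("2019-02-01", 201, "platinum", "Consumer", 500),
    ("2019-03-01", 201, "Platinum", "Consumer", 300),
    ("2020-02-01", 201, "Platinum", "Consumer", 400),
    ("2020-03-01", 201, "Platinum", "Consumer", 800),
    ("2020-03-01", 202, "Gold", "Business", 700),
    ("2020-01-01", 203, "Gold", "Business", 900),
    ("2018-10-01", 204, "Gold", "Business", 900),
    ("2018-09-01", 205, "Platinum", "Business", 100),
    ("2018-03-01", 206, "Bronze", "Business", 200),
])
conn.executemany("INSERT INTO mapping_table VALUES (?, ?, ?, ?, ?, ?)", [
    ("x", "Platinum", "Consumer", 2.5, "2020-01-01", "2020-12-31"),
    ("x", "Gold", "Consumer", 2.5, "2020-01-01", "2020-12-31"),
    ("x", "Bronze", "Consumer", 2.5, "2020-01-01", "2020-12-31"),
    ("x", "Platinum", "Consumer", 2.5, "2019-01-01", "2019-12-31"),
    ("x", "Gold", "Consumer", 3, "2019-01-01", "2019-12-31"),
    ("x", "Bronze", "Consumer", 3, "2019-01-01", "2019-12-31"),
    ("x", "Gold", "Business", 3, "2020-01-01", "2020-12-31"),
    ("y", "Gold", "Business", 3, "2018-01-01", "2018-12-31"),
    ("y", "Platinum", "Business", 3, "2018-01-01", "2018-12-31"),
    ("y", "Bronze", "Business", 3, "2018-01-01", "2018-12-31"),
])

QUERY = """
SELECT t.card_no,
       t.txn_date,
       t.card_grp,
       m.prod_serv_name,
       SUM(t.tot_bill * m.disc_rate) AS tot_disc_bill
FROM txn t
JOIN mapping_table m
  ON lower(t.prod_catg) = lower(m.prod_catg)  -- sample mixes 'Platinum' and 'platinum'
 AND t.card_grp = m.card_grp
WHERE t.txn_date BETWEEN m.start_date AND m.end_date  -- range test kept out of ON
GROUP BY t.card_no, t.txn_date, t.card_grp, m.prod_serv_name
ORDER BY t.card_no, t.txn_date
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Since each sample transaction falls into exactly one mapping row, every transaction produces one output row; with several transactions per account and month, SUM adds them up per month.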

How to link two tables but only take the MAX value from one table in PostgreSQL?

I have two tables
exchange_rates
TIMESTAMP curr1 curr2 rate
2018-04-01 00:00:00 EUR GBP 0.89
2018-04-01 01:30:00 EUR GBP 0.92
2018-04-01 01:20:00 USD GBP 1.23
and
transactions
TIMESTAMP user curr amount
2018-04-01 18:00:00 1 EUR 23.12
2018-04-01 14:00:00 1 USD 15.00
2018-04-01 01:00:00 2 EUR 55.00
I want to link these two tables on 1. currency and 2. TIMESTAMP in the following way:
curr in transactions must be equal to curr1 in exchange_rates
TIMESTAMP in exchange_rates must be less than or equal to TIMESTAMP in transactions (so we only pick up the exchange rate that was relevant at the time of transaction)
I have this:
SELECT
trans.TIMESTAMP, trans.user,
-- Multiply the amount in transactions by the corresponding rate in exchange_rates
trans.amount * er.rate AS "Converted Amount"
FROM transactions trans, exchange_rates er
WHERE trans.curr = er.curr1
AND er.TIMESTAMP <= trans.TIMESTAMP
ORDER BY trans.user
but this matches too many rows: the output has more rows than the transactions table.
DESIRED OUTPUT:
TIMESTAMP user Converted Amount
2018-04-01 18:00:00 1 21.27
2018-04-01 14:00:00 1 18.45
2018-04-01 01:00:00 2 48.95
The logic behind the Converted Amount:
row 1: the user spent at 18:00, so take the latest EUR rate whose TIMESTAMP is less than or equal to 18:00, i.e. 0.92 (set at 01:30)
row 2: the user spent at 14:00, so take the latest USD rate whose TIMESTAMP is less than or equal to 14:00, i.e. 1.23 (set at 01:20)
row 3: the user spent at 01:00, so take the latest EUR rate whose TIMESTAMP is less than or equal to 01:00, i.e. 0.89 (set at 00:00)
How can I do this in PostgreSQL 9.6?
You can use a LATERAL join (PostgreSQL's counterpart to SQL Server's CROSS APPLY) and limit the subquery to the first row that matches your conditions.
select t.dt, t.usr, t.amount * e.rate as conv_amount
from transactions t
join lateral (select *
from exchange_rates er
where t.curr = er.curr1
and er.dt <= t.dt
order by er.dt desc
limit 1) e on true;
dt | usr | conv_amount
:------------------ | --: | ----------:
2018-04-01 18:00:00 | 1 | 21.2704
2018-04-01 14:00:00 | 1 | 18.4500
2018-04-01 01:00:00 | 2 | 48.9500
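SQLite has no LATERAL join, so to check the answer's numbers without a PostgreSQL server, the sketch below swaps in a correlated scalar subquery that picks the same row: the latest rate for the currency at or before the transaction time. It uses the column names dt and usr from the answer and runs through Python's sqlite3 module. Unlike JOIN LATERAL ... ON true, the scalar subquery returns NULL for a transaction with no earlier rate instead of dropping it; the same subquery form also runs in PostgreSQL 9.6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exchange_rates (dt TEXT, curr1 TEXT, curr2 TEXT, rate REAL)")
conn.execute("CREATE TABLE transactions (dt TEXT, usr INT, curr TEXT, amount REAL)")
conn.executemany("INSERT INTO exchange_rates VALUES (?, ?, ?, ?)", [
    ("2018-04-01 00:00:00", "EUR", "GBP", 0.89),
    ("2018-04-01 01:30:00", "EUR", "GBP", 0.92),
    ("2018-04-01 01:20:00", "USD", "GBP", 1.23),
])
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", [
    ("2018-04-01 18:00:00", 1, "EUR", 23.12),
    ("2018-04-01 14:00:00", 1, "USD", 15.00),
    ("2018-04-01 01:00:00", 2, "EUR", 55.00),
])

QUERY = """
SELECT t.dt, t.usr,
       t.amount * (SELECT er.rate              -- latest rate at or before t.dt
                   FROM exchange_rates er
                   WHERE er.curr1 = t.curr
                     AND er.dt <= t.dt
                   ORDER BY er.dt DESC
                   LIMIT 1) AS conv_amount
FROM transactions t
ORDER BY t.dt DESC
"""
rows = conn.execute(QUERY).fetchall()
for dt, usr, conv_amount in rows:
    print(dt, usr, round(conv_amount, 2))
```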

PostgreSQL - rank over rows listed in blocks of 0 and 1

I have a table that looks like:
id code date1 date2 block
--------------------------------------------------
20 1234 2017-07-01 2017-07-31 1
15 1234 2017-06-01 2017-06-30 1
13 1234 2017-05-01 2017-05-31 0
11 1234 2017-03-01 2017-03-31 0
9 1234 2017-02-01 2017-02-28 1
8 1234 2017-01-01 2017-01-31 0
7 1234 2016-11-01 2016-11-30 0
6 1234 2016-10-01 2016-10-31 1
2 1234 2016-09-01 2016-09-30 1
I need to rank the rows by consecutive blocks of 0's and 1's, like this:
id code date1 date2 block desired_rank
-------------------------------------------------------------------
20 1234 2017-07-01 2017-07-31 1 1
15 1234 2017-06-01 2017-06-30 1 1
13 1234 2017-05-01 2017-05-31 0 2
11 1234 2017-03-01 2017-03-31 0 2
9 1234 2017-02-01 2017-02-28 1 3
8 1234 2017-01-01 2017-01-31 0 4
7 1234 2016-11-01 2016-11-30 0 4
6 1234 2016-10-01 2016-10-31 1 5
2 1234 2016-09-01 2016-09-30 1 5
I've tried to use rank() and dense_rank(), but the result I end up with is:
id code date1 date2 block dense_rank()
-------------------------------------------------------------------
20 1234 2017-07-01 2017-07-31 1 1
15 1234 2017-06-01 2017-06-30 1 2
13 1234 2017-05-01 2017-05-31 0 1
11 1234 2017-03-01 2017-03-31 0 2
9 1234 2017-02-01 2017-02-28 1 3
8 1234 2017-01-01 2017-01-31 0 3
7 1234 2016-11-01 2016-11-30 0 4
6 1234 2016-10-01 2016-10-31 1 4
2 1234 2016-09-01 2016-09-30 1 5
In the last table, the ranking ignores the row order: it treats all the 1's as one group and all the 0's as another, and counts upward within each group, starting from the first 1 and the first 0.
My query goes like this:
CREATE TEMP TABLE data (id integer,code text, date1 date, date2 date, block integer);
INSERT INTO data VALUES
(20,'1234', '2017-07-01','2017-07-31',1),
(15,'1234', '2017-06-01','2017-06-30',1),
(13,'1234', '2017-05-01','2017-05-31',0),
(11,'1234', '2017-03-01','2017-03-31',0),
(9, '1234', '2017-02-01','2017-02-28',1),
(8, '1234', '2017-01-01','2017-01-31',0),
(7, '1234', '2016-11-01','2016-11-30',0),
(6, '1234', '2016-10-01','2016-10-31',1),
(2, '1234', '2016-09-01','2016-09-30',1);
SELECT *,dense_rank() OVER (PARTITION BY code,block ORDER BY date2 DESC)
FROM data
ORDER BY date2 DESC;
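By the way, the database is PostgreSQL.
I hope there's a workaround... Thanks :)
Edit: Note that the blocks of 0's and 1's are not all the same length.
There's no way to get this result with a single window function. It takes two steps: flag the first row of each block with LAG(), then take a running SUM() of those flags: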
SELECT *,
Sum(flag) -- now sum the 0/1 to create the "rank"
Over (PARTITION BY code
ORDER BY date2 DESC) AS desired_rank
FROM
(
SELECT *,
CASE
WHEN Lag(block) -- same block as the previous row? (NULL on the first row, so it gets 1)
Over (PARTITION BY code
ORDER BY date2 DESC) = block
THEN 0
ELSE 1
END AS flag
FROM data
) AS dt
ORDER BY date2 DESC;
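The answer's query is a gaps-and-islands pattern: LAG() flags the first row of each run of equal block values, and a running SUM() of those flags numbers the runs. The sketch below runs the same query in SQLite via Python's sqlite3 module (SQLite 3.25 or later is needed for window functions), using the data from the question's INSERT statement, to confirm that it produces desired_rank.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, code TEXT, date1 TEXT, date2 TEXT, block INTEGER)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?)", [
    (20, "1234", "2017-07-01", "2017-07-31", 1),
    (15, "1234", "2017-06-01", "2017-06-30", 1),
    (13, "1234", "2017-05-01", "2017-05-31", 0),
    (11, "1234", "2017-03-01", "2017-03-31", 0),
    (9, "1234", "2017-02-01", "2017-02-28", 1),
    (8, "1234", "2017-01-01", "2017-01-31", 0),
    (7, "1234", "2016-11-01", "2016-11-30", 0),
    (6, "1234", "2016-10-01", "2016-10-31", 1),
    (2, "1234", "2016-09-01", "2016-09-30", 1),
])

QUERY = """
SELECT id, block,
       -- running total of block starts = block number
       SUM(flag) OVER (PARTITION BY code ORDER BY date2 DESC) AS desired_rank
FROM (
  SELECT *,
         -- 1 when this row starts a new block (LAG is NULL on the first row)
         CASE WHEN LAG(block) OVER (PARTITION BY code ORDER BY date2 DESC) = block
              THEN 0 ELSE 1 END AS flag
  FROM data
) AS dt
ORDER BY date2 DESC
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The running SUM() uses the default RANGE frame, so rows with identical date2 values within one code would receive the same running total; date2 is unique in this data.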

Oracle SQL query to get sales by date range

I want to write a SQL query that returns sales broken down by date range, but it is a bit beyond my SQL knowledge.
I have a table of date ranges by customers as follows:
Cust Product startdate enddate
-----------------------------------
A 123 2011-01-01 2011-12-31
A 124 2011-01-01 2011-05-01
A 125 2011-01-01 2011-05-01
B 123 2011-01-01 2011-03-01
B 124 2011-01-01 2011-03-01
C 125 2011-02-02 2011-05-01
and sales stored as follows:
Cust Product date qty
-----------------------------------
A 123 2011-04-08 1
A 124 2011-01-01 12
A 125 2011-05-01 2
B 123 2011-01-04 3
B 124 2011-02-01 5
C 125 2011-03-01 80
The results should look something like:
Cust Product startdate enddate qty
-----------------------------------------
A 124 2011-01-01 2011-02-01 12
B 123 2011-01-01 2011-02-01 3
B 124 2011-02-02 2011-03-01 5
A 123 2011-03-02 2011-05-01 1
C 125 2011-03-02 2011-05-01 80
A 125 2011-05-02 2011-12-31 2
Any advice gratefully received.
I built the example in MySQL because the Oracle server was down. The query is the same, apart from the MySQL backtick quoting, which Oracle does not accept.
SELECT R.*, S.*
FROM dRanges R
JOIN Sales S
ON S.`date` >= R.`startdate`
AND S.`date` <= R.`enddate`
AND S.`Cust` = R.`Cust`
AND S.`Product` = R.`Product`
But you have to be careful that the ranges don't overlap; otherwise the same sale can appear in two ranges.
EDIT Please explain the logic here
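A minimal sketch of the answer's join, run in SQLite through Python's sqlite3 module. It writes the date column as "date" in double quotes (standard SQL quoting, which Oracle also uses) instead of MySQL backticks, assuming the column really is named date, and it adds a SUM(qty) per range, which the answer does not have. A second query makes the overlap caveat checkable by listing pairs of ranges for the same customer and product that share at least one day; it relies on SQLite's implicit rowid to report each pair once, so another engine would need a real key column there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dRanges (Cust TEXT, Product INT, startdate TEXT, enddate TEXT)")
conn.execute('CREATE TABLE Sales (Cust TEXT, Product INT, "date" TEXT, qty INT)')
conn.executemany("INSERT INTO dRanges VALUES (?, ?, ?, ?)", [
    ("A", 123, "2011-01-01", "2011-12-31"),
    ("A", 124, "2011-01-01", "2011-05-01"),
    ("A", 125, "2011-01-01", "2011-05-01"),
    ("B", 123, "2011-01-01", "2011-03-01"),
    ("B", 124, "2011-01-01", "2011-03-01"),
    ("C", 125, "2011-02-02", "2011-05-01"),
])
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", [
    ("A", 123, "2011-04-08", 1),
    ("A", 124, "2011-01-01", 12),
    ("A", 125, "2011-05-01", 2),
    ("B", 123, "2011-01-04", 3),
    ("B", 124, "2011-02-01", 5),
    ("C", 125, "2011-03-01", 80),
])

# The answer's join; ">= startdate AND <= enddate" written as BETWEEN, qty summed per range.
JOIN_QUERY = """
SELECT R.Cust, R.Product, R.startdate, R.enddate, SUM(S.qty) AS qty
FROM dRanges R
JOIN Sales S
  ON S."date" BETWEEN R.startdate AND R.enddate
 AND S.Cust = R.Cust
 AND S.Product = R.Product
GROUP BY R.Cust, R.Product, R.startdate, R.enddate
ORDER BY R.startdate, R.Cust, R.Product
"""

# Overlap check: any row returned is a pair of inclusive ranges for the same
# customer and product that share at least one day.
OVERLAP_QUERY = """
SELECT a.Cust, a.Product, a.startdate, a.enddate, b.startdate, b.enddate
FROM dRanges a
JOIN dRanges b
  ON a.Cust = b.Cust
 AND a.Product = b.Product
 AND a.rowid < b.rowid      -- SQLite's implicit row id: report each pair once
 AND a.startdate <= b.enddate
 AND b.startdate <= a.enddate
"""

joined = conn.execute(JOIN_QUERY).fetchall()
overlaps = conn.execute(OVERLAP_QUERY).fetchall()
print(joined)
print(overlaps)
```

With the sample data each customer/product pair has a single range, so the overlap query returns nothing and every sale lands in exactly one range.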