Assistance with join using a where clause and duplicates - sql

I am having some difficulty with writing an accurate view.
I have 2 tables, in different databases, that I am looking to join.
Table 1 (in database 1) contains 3 columns:
Purchase_date
Item_id
Quantity_purchased
Table 2 (in database 2) contains 3 columns:
Item_id
Price_effective_date
Price
I am trying to determine the price of the item at the purchase date, which is a challenge since the item prices change on price effective dates. Accordingly, table 2 will have multiple instances of the same item_id, but with different prices and price effective dates.
My current code is:
select tb1.*,
       tb2.price * tb1.quantity_purchased as total_price
from "Database1"."Schema"."Table1" tb1
left join (select item_id,
                  price,
                  price_effective_date
           from "Database2"."Schema"."Table2"
          ) tb2
  on tb1.item_id = tb2.item_id
where tb2.price_effective_date <= tb1.purchase_date
I want to limit my results to the price at the most recent price_effective_date that is just before the purchase_date.
Any recommendations?

It's not really Snowflake specific, and luckily it can be addressed with a pretty common pattern in SQL queries.
Let's prepare some data (by the way, for the future, it's best to provide an exact setup like this in your questions; it helps investigations tremendously):
create or replace table tb1(purchase_date date, item_id int, quantity int);
insert into tb1 values
('2020-01-01', 101, 1),
('2020-06-30', 101, 1),
('2020-07-01', 101, 1),
('2020-12-31', 101, 1),
('2021-01-01', 101, 1),
('2020-01-01', 102, 1),
('2020-06-30', 102, 1),
('2020-07-01', 102, 1),
('2020-12-31', 102, 1),
('2021-01-01', 102, 1);
create or replace table tb2(item_id int, effective_date date, price decimal);
insert into tb2 values
(101, '2020-01-01', 10),
(101, '2021-01-01', 11),
(102, '2020-01-01', 20),
(102, '2020-07-01', 18),
(102, '2021-01-01', 22);
Now, what you want is to join records from tb1 and tb2 on item_id, but only use the records from tb2 where effective_date is the largest of all the values of effective_date for that item that are at or before purchase_date. Correct? If you phrase it like this, the SQL almost writes itself:
select tb1.*, tb2.effective_date, tb2.price
from tb1 join tb2 on tb1.item_id = tb2.item_id
where tb2.effective_date = (
select max(effective_date)
from tb2 sub
where sub.effective_date <= tb1.purchase_date
and sub.item_id = tb1.item_id
)
order by tb1.item_id, purchase_date;
The result is hopefully what you want:
PURCHASE_DATE | ITEM_ID | QUANTITY | EFFECTIVE_DATE | PRICE
--------------+---------+----------+----------------+------
2020-01-01    | 101     | 1        | 2020-01-01     | 10
2020-12-31    | 101     | 1        | 2020-01-01     | 10
2021-01-01    | 101     | 1        | 2021-01-01     | 11
2020-01-01    | 102     | 1        | 2020-01-01     | 20
2020-06-30    | 102     | 1        | 2020-01-01     | 20
2020-07-01    | 102     | 1        | 2020-07-01     | 18
2020-12-31    | 102     | 1        | 2020-07-01     | 18
2021-01-01    | 102     | 1        | 2021-01-01     | 22
Note that this query will not handle bad data, e.g. purchases with no matching item or no effective_date on or before the purchase date.
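An equivalent formulation replaces the correlated subquery with a window function: rank the candidate prices per purchase, newest effective_date first, and keep only the top-ranked row (in Snowflake you could filter with QUALIFY instead of a wrapping subquery). Here is a minimal runnable sketch using Python's sqlite3 on a cut-down version of the sample data above; sqlite3 is just a stand-in for whatever engine you use:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table tb1(purchase_date text, item_id int, quantity int);
insert into tb1 values ('2020-06-30', 101, 1), ('2021-01-01', 102, 1);
create table tb2(item_id int, effective_date text, price int);
insert into tb2 values
  (101, '2020-01-01', 10), (101, '2021-01-01', 11),
  (102, '2020-07-01', 18), (102, '2021-01-01', 22);
""")

# Rank candidate prices per purchase, newest effective_date first,
# then keep only the top-ranked row -- same result as the MAX() subquery.
rows = con.execute("""
select purchase_date, item_id, effective_date, price
from (
  select tb1.purchase_date, tb1.item_id, tb2.effective_date, tb2.price,
         row_number() over (
           partition by tb1.item_id, tb1.purchase_date
           order by tb2.effective_date desc
         ) as rn
  from tb1
  join tb2 on tb2.item_id = tb1.item_id
          and tb2.effective_date <= tb1.purchase_date
)
where rn = 1
order by item_id, purchase_date
""").fetchall()
print(rows)
```

The window version scans tb2 once per purchase group instead of re-running a MAX() subquery per row, which some optimizers handle better on large tables.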
EDIT: Handling missing effective_dates
To handle cases where there are no effective dates matching the purchase date, you can identify the "missing" purchases and then fall back to the smallest existing effective_date for those items. To showcase this, we add a new item, 103, to the existing tables:
insert into tb1 values
('2020-06-01', 103, 11),
('2020-08-01', 103, 12);
insert into tb2 values
(103, '2020-07-01', 30);
with missing as (
select * from tb1 where not exists (
select * from tb2
where tb2.effective_date <= tb1.purchase_date
and tb2.item_id = tb1.item_id)
)
select m.item_id, m.purchase_date, m.quantity,
(select min(effective_date) from tb2 where tb2.item_id = m.item_id) best_date
from missing m;
You can take this query and UNION ALL it with the original query.
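The fallback step above can be run end-to-end with the same sqlite3 sketch (again, a stand-in for the real engine), confirming that only the purchase predating every effective_date shows up in the "missing" set:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table tb1(purchase_date text, item_id int, quantity int);
insert into tb1 values ('2020-06-01', 103, 11), ('2020-08-01', 103, 12);
create table tb2(item_id int, effective_date text, price int);
insert into tb2 values (103, '2020-07-01', 30);
""")

# A purchase with no effective_date on or before it falls back to the
# item's earliest effective_date.
missing = con.execute("""
with missing as (
  select * from tb1 where not exists (
    select 1 from tb2
    where tb2.effective_date <= tb1.purchase_date
      and tb2.item_id = tb1.item_id)
)
select m.item_id, m.purchase_date, m.quantity,
       (select min(effective_date) from tb2
        where tb2.item_id = m.item_id) as best_date
from missing m
""").fetchall()
print(missing)
```

Only the 2020-06-01 purchase is "missing" (the 2020-08-01 one matches the 2020-07-01 price row normally), so the UNION ALL with the original query covers every purchase exactly once.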

Related

How to make a query showing purchases of a client on the same day, but only if those were made in different stores (oracle)?

I want to show cases of clients with at least 2 purchases on the same day. But I only want to count those purchases that were made in different stores.
So far I have:
Select Purchase.PurClientId, Purchase.PurDate, Purchase.PurId
from Purchase
join
(
Select count(Purchase.PurId),
Purchase.PurClientId,
to_date(Purchase.PurDate)
from Purchase
group by Purchase.PurClientId,
to_date(Purchase.PurDate)
having count(Purchase.PurId) >= 2
) k
on k.PurClientId=Purchase.PurClientId
But I have no clue how to make it count purchases only if those were made in different stores. The column which would allow to identify shop is Purchase.PurShopId.
Thanks for help!
You can use:
SELECT PurId,
PurDate,
PurClientId,
PurShopId
FROM (
SELECT p.*,
COUNT(DISTINCT PurShopId) OVER (
PARTITION BY PurClientId, TRUNC(PurDate)
) AS num_stores
FROM Purchase p
)
WHERE num_stores >= 2;
Or
SELECT *
FROM Purchase p
WHERE EXISTS(
SELECT 1
FROM Purchase x
WHERE p.purclientid = x.purclientid
AND p.purshopid != x.purshopid
AND TRUNC(p.purdate) = TRUNC(x.purdate)
);
Which, for the sample data:
CREATE TABLE purchase (
purid PRIMARY KEY,
purdate,
purclientid,
PurShopId
) AS
SELECT 1, DATE '2021-01-01', 1, 1 FROM DUAL UNION ALL
SELECT 2, DATE '2021-01-02', 1, 1 FROM DUAL UNION ALL
SELECT 3, DATE '2021-01-02', 1, 2 FROM DUAL UNION ALL
SELECT 4, DATE '2021-01-03', 1, 1 FROM DUAL UNION ALL
SELECT 5, DATE '2021-01-03', 1, 1 FROM DUAL UNION ALL
SELECT 6, DATE '2021-01-04', 1, 2 FROM DUAL;
Both output:
PURID | PURDATE             | PURCLIENTID | PURSHOPID
------+---------------------+-------------+----------
    2 | 2021-01-02 00:00:00 |           1 |         1
    3 | 2021-01-02 00:00:00 |           1 |         2
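The EXISTS variant ports readily to other engines; here is a self-contained sketch using Python's sqlite3 (an assumption for demo purposes, not the asker's Oracle setup), with sqlite's date() playing the role of Oracle's TRUNC:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table purchase(purid int primary key, purdate text,
                      purclientid int, purshopid int);
insert into purchase values
  (1, '2021-01-01', 1, 1), (2, '2021-01-02', 1, 1), (3, '2021-01-02', 1, 2),
  (4, '2021-01-03', 1, 1), (5, '2021-01-03', 1, 1), (6, '2021-01-04', 1, 2);
""")

# Keep a purchase only if the same client bought in a *different* shop
# on the same calendar day.
rows = con.execute("""
select purid from purchase p
where exists (
  select 1 from purchase x
  where x.purclientid = p.purclientid
    and x.purshopid <> p.purshopid
    and date(x.purdate) = date(p.purdate))
order by purid
""").fetchall()
print(rows)
```

Purchases 4 and 5 are excluded even though the client bought twice on 2021-01-03, because both were in the same shop, which is exactly the asker's requirement.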

Select hours as columns from Oracle table

I am working with an Oracle database table that is structured like this:
TRANS_DATE TRANS_HOUR_ENDING TRANS_HOUR_SUFFIX READING
1/1/2021 1 1 100
1/1/2021 2 1 105
... ... ... ...
1/1/2021 24 1 115
The TRANS_HOUR_SUFFIX is only used to track hourly readings on days when daylight saving time ends (when there could be 2 hours with the same TRANS_HOUR value). This column is the bane of this database's design; however, I'm trying to do something to select this data in a certain way. We need a report that columnizes this data based on the hour. Therefore, it would be structured like this (the last day shows a day on which DST ends):
TRANS_DATE HOUR_1 HOUR_2_1 HOUR_2_2 ... HOUR_24
1/1/2021 100 105 0 ... 115
1/2/2021 112 108 0 ... 135
... ... ... ... ... ...
11/7/2021 117 108 107 ... 121
I have done something like this before with a PIVOT, however in this case I'm having trouble determining what I should do to account for the suffix. When DST ending happens, we have to account for this hour. I know that we can do this by selecting each hourly value individually with decode or case statements, but that is some messy code. Is there a cleaner way to do this?
You can include multiple source columns in the pivot for() and in() clauses, so you could do:
select *
from (
select trans_date,
trans_hour_ending,
trans_hour_suffix,
reading
from your_table
)
pivot (max(reading) for (trans_hour_ending, trans_hour_suffix)
in ((1, 1) as hour_1, (2, 1) as hour_2_1, (2, 2) as hour_2_2, (3, 1) as hour_3,
-- snip
(23, 1) as hour_23, (24, 1) as hour_24))
order by trans_date;
where every normal hour n has an (n, 1) tuple, and the DST-relevant hour has an extra (2, 2) tuple.
If you don't have rows for every hour - which you don't appear to have from the very brief sample data, at least for suffix 2 on non-DST days - then you will get null results for those, but can replace them with zeros:
select trans_date,
coalesce(hour_1, 0) as hour_1,
coalesce(hour_2_1, 0) as hour_2_1,
coalesce(hour_2_2, 0) as hour_2_2,
coalesce(hour_3, 0) as hour_3,
-- snip
coalesce(hour_23, 0) as hour_23,
coalesce(hour_24, 0) as hour_24
from (
select trans_date,
trans_hour_ending,
trans_hour_suffix,
reading
from your_table
)
pivot (max(reading) for (trans_hour_ending, trans_hour_suffix)
in ((1, 1) as hour_1, (2, 1) as hour_2_1, (2, 2) as hour_2_2, (3, 1) as hour_3,
-- snip
(23, 1) as hour_23, (24, 1) as hour_24))
order by trans_date;
which with slightly expanded sample data gets:
TRANS_DATE HOUR_1 HOUR_2_1 HOUR_2_2 HOUR_3 HOUR_23 HOUR_24
---------- ---------- ---------- ---------- ---------- ---------- ----------
2021-01-01 100 105 0 0 0 115
2021-01-02 112 108 0 0 0 135
2021-11-07 117 108 107 0 0 121
Which is a bit long-winded when you have to include all 25 columns everywhere; but to avoid that you'd have to do a dynamic pivot.
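A PIVOT like the one above is equivalent to conditional aggregation (one MAX(CASE ...) per output column), which works on engines without a PIVOT clause and folds the COALESCE step into the same expression. A sketch with Python's sqlite3 (an assumption for the demo; the thread is about Oracle) on a cut-down version of the sample data, showing only hours 1 and 2:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table readings(trans_date text, trans_hour int,
                      trans_suffix int, reading int);
insert into readings values
  ('2021-01-01', 1, 1, 100), ('2021-01-01', 2, 1, 105),
  ('2021-11-07', 1, 1, 117), ('2021-11-07', 2, 1, 108),
  ('2021-11-07', 2, 2, 107);
""")

# Each PIVOT tuple becomes one MAX(CASE ...) column; COALESCE supplies
# the zeros for missing (hour, suffix) combinations.
rows = con.execute("""
select trans_date,
  coalesce(max(case when trans_hour = 1 and trans_suffix = 1
               then reading end), 0) as hour_1,
  coalesce(max(case when trans_hour = 2 and trans_suffix = 1
               then reading end), 0) as hour_2_1,
  coalesce(max(case when trans_hour = 2 and trans_suffix = 2
               then reading end), 0) as hour_2_2
from readings
group by trans_date
order by trans_date
""").fetchall()
print(rows)
```

It is just as long-winded for 25 columns, but the pattern is mechanical enough to generate the column list from a loop if a dynamic pivot is off the table.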
Like I said in my comment, if you can format it with an additional row, I would recommend just having a row for the extra hour. Every other day would look normal. The query to do it would look like this:
CREATE TABLE READINGS
(
TRANS_DATE DATE,
TRANS_HOUR INTEGER,
TRANS_SUFFIX INTEGER,
READING INTEGER
);
INSERT INTO readings
SELECT TO_DATE('01/01/2021', 'MM/DD/YYYY'), 1, 1, 100 FROM DUAL UNION ALL
SELECT TO_DATE('01/01/2021', 'MM/DD/YYYY'), 2, 1, 100 FROM DUAL UNION ALL
SELECT TO_DATE('11/07/2021', 'MM/DD/YYYY'), 1, 1, 200 FROM DUAL UNION ALL
SELECT TO_DATE('11/07/2021', 'MM/DD/YYYY'), 1, 2, 300 FROM DUAL UNION ALL
SELECT TO_DATE('11/07/2021', 'MM/DD/YYYY'), 2, 1, 500 FROM DUAL UNION ALL
SELECT TO_DATE('11/07/2021', 'MM/DD/YYYY'), 2, 2, 350 FROM DUAL;
SELECT TRANS_DATE||DECODE(MAX(TRANS_SUFFIX) OVER (PARTITION BY TRANS_DATE), 1, NULL, 2, ' - '||TRANS_SUFFIX) AS TRANS_DATE,
HOUR_1, HOUR_2, /*...*/ HOUR_24
FROM readings
PIVOT (MAX(READING) FOR TRANS_HOUR IN (1 AS HOUR_1, 2 AS HOUR_2, /*...*/ 24 AS HOUR_24));
This would result in the following results (Sorry, I can't get dbfiddle to work):
TRANS_DATE
HOUR_1
HOUR_2
HOUR_24
01-JAN-21
100
100
-
07-NOV-21 - 1
200
500
-
07-NOV-21 - 2
300
350
-

Oracle SQL - How do I arrange my list based on customers' start and end locations?

I have the below table and I would like to find out which customers made a trip where his/her start_location is the end_location of another customer who made the trip <= 5 minutes before him/her.
For instance, this is what I have:
DT                   Customer_Name  Start_location  End_location  Trip_fare
2019-11-01 08:17:42  Jane           A               B             $10
2019-11-01 08:18:02  Mary           C               A             $7
2019-11-01 08:18:04  Tom            B               D             $12
2019-11-01 08:20:11  Harry          E               C             $20
2019-11-01 08:21:22  Alex           D               A             $5
2019-11-01 08:24:30  Sally          C               B             $8
This is what I want:
DT                   Customer_Name  Start_location  End_location
2019-11-01 08:17:42  Jane           A               B
2019-11-01 08:18:04  Tom            B               D
2019-11-01 08:21:22  Alex           D               A
2019-11-01 08:20:11  Harry          E               C
2019-11-01 08:24:30  Sally          C               B
(Tom is kept because his start_location = B = Jane's end_location and the time difference between the two trips is within 5 minutes.)
Here, Mary has been removed from the list as her start_location = 'C', which is not the end_location of Jane who made a trip <= 5 minutes before her.
My apologies for this 'messy' looking question. Do let me know if you need further clarifications!
Thank you so much for your help!
I have the below table and I would like to find out which customers made a trip where his/her start_location is the end_location of another customer who made the trip <= 5 minutes before him/her.
Your description of the problem suggests not exists:
select t.*
from t
where not exists (select 1
from t t2
where t2.end_loc = t.start_loc and
t2.dt < t.dt and
t2.dt >= t.dt - interval '5' minute
);
However, this removes Tom, Alex, and Sally. From how you describe the question, I think this is correct.
As your query relates to customers from the same table, you'll need a self join. That is, you join the table with itself.
SELECT ... FROM mytable JOIN mytable ...
To distinguish one "instance" of the table from the other instance, you'll need alias names:
SELECT ... FROM mytable t1 JOIN mytable t2 ...
And you need join conditions, that is how your two customers are related. In your example this is quite straightforward:
SELECT tcust.name AS name,
tother.name AS other_name
FROM mytable tcust
JOIN mytable tother
ON tcust.start_loc = tother.end_loc
AND tcust.dt >= tother.dt - INTERVAL '5' MINUTE;
However, this query gets a slightly different result. Can you find out why?
CREATE TABLE mytable (
dt DATE, name VARCHAR2(30 CHAR), start_loc VARCHAR2(5 CHAR),
end_loc VARCHAR2(5 CHAR), fare NUMBER);
INSERT INTO mytable VALUES (TIMESTAMP '2019-11-01 08:17:42', 'Jane', 'A', 'B', 10);
INSERT INTO mytable VALUES (TIMESTAMP '2019-11-01 08:18:02', 'Mary', 'C', 'A', 7);
INSERT INTO mytable VALUES (TIMESTAMP '2019-11-01 08:18:04', 'Tom', 'B', 'D', 12);
INSERT INTO mytable VALUES (TIMESTAMP '2019-11-01 08:20:11', 'Harry', 'E', 'C', 20);
INSERT INTO mytable VALUES (TIMESTAMP '2019-11-01 08:21:22', 'Alex', 'D', 'A', 5);
INSERT INTO mytable VALUES (TIMESTAMP '2019-11-01 08:24:30', 'Sally', 'C', 'B', 8);
Result:
NAME   OTHER_NAME
Tom    Jane
Jane   Mary
Alex   Tom
Mary   Harry
Sally  Harry
Jane   Alex
The subtraction of 5 minutes is explained in this question.
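The "within 5 minutes before" predicate can be checked end-to-end with an EXISTS flavor of the query, which keeps exactly the riders who have a qualifying predecessor (Tom, Alex, Sally in this data, matching the first answer's observation). A self-contained sketch with Python's sqlite3 (an assumption for the demo; Oracle's dt - INTERVAL '5' MINUTE becomes julianday() arithmetic, where 1 day = 1440 minutes):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table trips(dt text, name text, start_loc text, end_loc text);
insert into trips values
  ('2019-11-01 08:17:42', 'Jane',  'A', 'B'),
  ('2019-11-01 08:18:02', 'Mary',  'C', 'A'),
  ('2019-11-01 08:18:04', 'Tom',   'B', 'D'),
  ('2019-11-01 08:20:11', 'Harry', 'E', 'C'),
  ('2019-11-01 08:21:22', 'Alex',  'D', 'A'),
  ('2019-11-01 08:24:30', 'Sally', 'C', 'B');
""")

# A rider qualifies if some earlier trip ended at their start location
# no more than 5 minutes (5/1440 of a day) before they set off.
rows = con.execute("""
select t.name from trips t
where exists (
  select 1 from trips prev
  where prev.end_loc = t.start_loc
    and prev.dt < t.dt
    and julianday(t.dt) - julianday(prev.dt) <= 5.0 / 1440)
order by t.dt
""").fetchall()
print([r[0] for r in rows])
```

Flipping EXISTS to NOT EXISTS yields the complementary set (Jane, Mary, Harry), which is the discrepancy both answers point out between the asker's prose and their expected output.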

SQL - Setting Value From Hierarchical Children

I am writing an application which gets task data from a project planning MS SQL table (let's call the table tasks). For simplicity the table fields can be thought of as follows:
task_id, parent_id, name, start_date, end_date
All parent tasks have NULL as start and end dates. Only the children (with no children of their own) have a start and end date.
I want to get the tasks data and in the process set the start date of each parent based upon the earliest start date of all the parent's children and recursive grandchildren and set the end date to be the latest end date of all the children and recursive grandchildren. Is this possible please?
I assume from your question that you use SQL Server. I think this is what you want. It is done with a recursive common table expression: it begins with the leaf children and works up to the topmost parents:
DECLARE @t TABLE(id INT, pid INT, sd DATE, ed DATE)
INSERT INTO @t VALUES
(1, NULL, NULL, NULL),
(2, 1, NULL, NULL),
(3, 2, '20150201', '20150215'),
(4, 2, '20150101', '20150201'),
(5, 1, NULL, NULL),
(6, 5, '20150301', '20150401'),
(7, 1, NULL, NULL),
(8, 7, NULL, NULL),
(9, 8, '20140101', '20141230'),
(10, 8, '20140102', '20141231')
;WITH cte AS(
SELECT * FROM @t WHERE sd IS NOT NULL
UNION ALL
SELECT t.id, t.pid, c.sd, c.ed FROM @t t
JOIN cte c ON c.pid = t.id
)
SELECT id, pid, MIN(sd) AS sd, MAX(ed) AS ed
FROM cte
GROUP BY id, pid
ORDER BY id
Output:
id  pid   sd          ed
1   NULL  2014-01-01  2015-04-01
2   1     2015-01-01  2015-02-15
3   2     2015-02-01  2015-02-15
4   2     2015-01-01  2015-02-01
5   1     2015-03-01  2015-04-01
6   5     2015-03-01  2015-04-01
7   1     2014-01-01  2014-12-31
8   7     2014-01-01  2014-12-31
9   8     2014-01-01  2014-12-30
10  8     2014-01-02  2014-12-31

PostgreSQL/plpython: how compare two columns from different table in loop

I have a problem with a loop in which I must compare columns between different tables.
I have two tables, year2004 and year2005. Both contain month numbers and an amount for each month. I want to compare the amounts from both tables and produce a third table, year, with the month number and the greatest amount for that month.
For example, if I have 100 in 2004 and 200 in 2005, I must return values (2005, month_number, 200). Do you have any ideas on how to solve this problem?
PS. Sorry for my writing errors; I have only been learning English for a few years :)
I'm guessing that you're trying to find the greatest amount for each month across the two years.
This would be much, much easier if your data was all in one table monthly_statistics with a date column. Then it'd just be a simple aggregate function or a window.
So let's turn the two tables into one.
Given sample data:
CREATE TABLE year2004 ( month int primary key, amount int);
INSERT INTO year2004 (month, amount)
VALUES (1, 50), (2, 40), (3, 60), (4, 80), (5, 100), (6, 800), (7, 20), (8, 40), (9, 30), (10, 40), (11, 50), (12, 99);
CREATE TABLE year2005 ( month int primary key, amount int);
INSERT INTO year2005 (month, amount)
VALUES (1, 88), (2, 44), (3, 11), (4, 123), (5, 12), (6, 88), (7, 21), (8, 19), (9, 44), (10, 89), (11, 4), (12, 42);
we could either join the tables, or we could convert it to a single table by date then filter it. Here's how we might generate a single table with the contents:
SELECT DATE '2004-01-01' + (month - 1) * INTERVAL '1' MONTH AS sampledate, amount
FROM year2004
UNION ALL
SELECT DATE '2005-01-01' + (month - 1) * INTERVAL '1' MONTH, amount
FROM year2005;
That's what you'd use if you were going to create a new table, but if you don't care about the actual dates, only the months, you can simply union all the two tables:
WITH samples AS (
SELECT month, amount
FROM year2004
UNION ALL
SELECT month, amount
FROM year2005
)
SELECT month, max(amount) AS amount
FROM samples
GROUP BY 1
ORDER BY month;
 month | amount
-------+--------
     1 |     88
     2 |     44
     3 |     60
     4 |    123
     5 |    100
     6 |    800
     7 |     21
     8 |     40
     9 |     44
    10 |     89
    11 |     50
    12 |     99
(12 rows)
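The UNION ALL plus GROUP BY pattern is portable; here is a self-contained sketch with Python's sqlite3 (an assumption for the demo, trimmed to three months of the sample data) confirming the per-month maximum across both years:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table year2004(month int primary key, amount int);
create table year2005(month int primary key, amount int);
insert into year2004 values (1, 50), (2, 40), (3, 60);
insert into year2005 values (1, 88), (2, 44), (3, 11);
""")

# Stack both years with UNION ALL, then take the per-month maximum.
rows = con.execute("""
with samples as (
  select month, amount from year2004
  union all
  select month, amount from year2005
)
select month, max(amount) from samples group by month order by month
""").fetchall()
print(rows)
```

No plpython loop is needed: the set-based query does the comparison for all twelve months at once, which is the point of folding the per-year tables into one.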