Oracle SQL: display duplicates after DISTINCT

I am trying to run a query in Oracle SQL Developer that pulls specific data from multiple tables and joins it into one result set.
Please find sample code below:
select distinct tbl1.Product_ID, tbl2.Supplier_ID
from tbl3
inner join tbl1
on tbl1.Product_ID = tbl3.Product_ID
inner join tbl2
on tbl3.PO_ID = tbl2.PO_ID
where tbl1.Season_ID LIKE 'AA18'
order by tbl1.Product_ID
The result is as expected:
+------------------+---------------+
| Product ID | Supplier_ID |
+------------------+---------------+
| ID-1 | NHII88 |
| ID-2 | NHII88 |
| ID-3 | NHII88 |
| ID-4 | NHII88 |
| ID-5 | NHII88 |
+------------------+---------------+
Explanation: DISTINCT is required here because without it the same Product_ID (e.g. ID-1) is displayed multiple times due to multiple POs.
The result I am trying to achieve is as follows:
+------------------+---------------+
| Product ID | Supplier_ID |
+------------------+---------------+
| ID-1 | NHII88 |
| ID-1 | LLLLPP |
+------------------+---------------+
+------------------+---------------+
| Product ID | Supplier_ID |
+------------------+---------------+
| ID-4 | NHII88 |
| ID-4 | LLLLPP |
| ID-4 | KKKOOP |
+------------------+---------------+
It doesn't need to be grouped exactly this way, but the idea is to display only the duplicate records within this query, i.e. products that appear with more than one supplier.
I have been trying to use HAVING, but I must have done something wrong, as it returned the same result as the first example.
Many thanks in advance for any suggestions.
Data example:
Tbl1
+------------------+---------------+
| Product ID | Season_ID |
+------------------+---------------+
| ID-4 | AA18 |
| ID-4 | AA17 |
| ID-4 | AA16 |
+------------------+---------------+
Tbl2
+------------------+---------------+
| PO_number | Supplier_ID |
+------------------+---------------+
| PO1234 | NHII88 |
| PO1235 | LLLLPP |
| PO1236 | KKKOOP |
+------------------+---------------+
Tbl3
+------------------+---------------+
| PO_number | Product_ID |
+------------------+---------------+
| PO1234 | ID-1 |
| PO1235 | ID-2 |
| PO1236 | ID-3 |
+------------------+---------------+
My business rules are as follows.
tbl1 contains all details about products, such as Product_ID and Season_ID (ID-1 and AA18). tbl2 contains PO header details such as PO number and Supplier_ID. tbl3 contains purchase order line details such as PO number and Product_ID.
The idea is to pull all PO numbers from tbl3 where the Product_ID in that table equals a Product_ID in tbl1 with Season_ID = AA18. Other products should be ignored. For each match, the PO number from tbl3 should be looked up in tbl2, where the Supplier_ID can be found.
I am expecting the results mentioned above.

The posted sample data doesn't contain any duplicates so it's a bit difficult to see why your required output looks the way it does. However, this implements your posted business rules:
select tbl1.Product_ID
, tbl2.Supplier_ID
from tbl1
inner join tbl3 on tbl1.Product_ID = tbl3.Product_ID
inner join tbl2 on tbl3.PO_number = tbl2.PO_number
where tbl1.Season_ID = 'AA18'
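If the goal is to list only products that have more than one distinct supplier, a HAVING filter on a grouped subquery is one way to express it. The sketch below is only an illustration: it runs on SQLite through Python's sqlite3 module rather than on Oracle, and the PO numbers and rows are made up, because the posted sample data contains no duplicates. It also assumes that suppliers are counted across all POs of a product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: ID-1 and ID-4 have several suppliers, ID-2 has one.
conn.executescript("""
CREATE TABLE tbl1 (Product_ID TEXT, Season_ID TEXT);
CREATE TABLE tbl2 (PO_number TEXT, Supplier_ID TEXT);
CREATE TABLE tbl3 (PO_number TEXT, Product_ID TEXT);
INSERT INTO tbl1 VALUES ('ID-1','AA18'), ('ID-2','AA18'), ('ID-4','AA18'), ('ID-4','AA17');
INSERT INTO tbl2 VALUES ('PO1','NHII88'), ('PO2','LLLLPP'), ('PO3','KKKOOP'), ('PO4','NHII88');
INSERT INTO tbl3 VALUES ('PO1','ID-1'), ('PO2','ID-1'), ('PO4','ID-1'),
                        ('PO3','ID-2'),
                        ('PO1','ID-4'), ('PO2','ID-4'), ('PO3','ID-4');
""")

query = """
SELECT DISTINCT t1.Product_ID, t2.Supplier_ID
FROM tbl1 t1
JOIN tbl3 t3 ON t3.Product_ID = t1.Product_ID
JOIN tbl2 t2 ON t2.PO_number = t3.PO_number
WHERE t1.Season_ID = 'AA18'
  AND t1.Product_ID IN (
        -- products whose POs reference more than one distinct supplier
        SELECT t3b.Product_ID
        FROM tbl3 t3b
        JOIN tbl2 t2b ON t2b.PO_number = t3b.PO_number
        GROUP BY t3b.Product_ID
        HAVING COUNT(DISTINCT t2b.Supplier_ID) > 1)
ORDER BY t1.Product_ID, t2.Supplier_ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The same statement should carry over to Oracle largely unchanged, since it only uses joins, IN, GROUP BY and HAVING, but that has not been verified here.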


TSQL Select max value associated with an ID from a table joined on a different ID?

I have 3 tables structured like this:
Shipment
+-------------+--------------------+
| Shipment_ID | Shipment_ID_Master |
+-------------+--------------------+
| 4767 | 4767 |
| 88359 | 28431 |
+-------------+--------------------+
Factory
+------------+-------------+
| Factory_ID | Shipment_ID |
+------------+-------------+
| 338161 | 4767 |
| 1178567 | 88359 |
| 1178568 | 88359 |
+------------+-------------+
Coverage
+------------+-----------+----------+
| Factory_ID | Public_ID | Revision |
+------------+-----------+----------+
| 338161 | 2354 | 2 |
| 1178567 | 32436 | 4 |
| 1178568 | 2354 | 3 |
+------------+-----------+----------+
I am trying to build a view that displays a row for only the max Public_ID associated with a Shipment_ID. The view should look like this:
+-------------+--------------------+------------+-----------+----------+
| Shipment_ID | Shipment_ID_Master | Factory_ID | Public_ID | Revision |
+-------------+--------------------+------------+-----------+----------+
| 4767 | 4767 | 338161 | 2354 | 2 |
| 88359 | 28431 | 1178567 | 32436 | 4 |
+-------------+--------------------+------------+-----------+----------+
I have a query that works to build this view, but it is too slow. When my application joins on this view the query is taking several minutes to finish execution. This is the query:
SELECT f.Shipment_ID,
s.Shipment_ID_Master,
f.Factory_ID,
c.Public_ID,
c.Revision
FROM Coverage c
JOIN Factory f ON c.Factory_ID = f.Factory_ID
JOIN Shipment s ON s.Shipment_ID = f.Shipment_ID
WHERE Public_ID = (
SELECT MAX(Public_ID)
FROM Coverage c2
JOIN Factory f2 ON c2.Factory_ID = f2.Factory_ID
WHERE f2.Shipment_ID = f.Shipment_ID
)
I think referencing this view is so slow because of the logic in the where clause. There must be a better and faster way to do this.
How can I select the maximum Public_ID associated with a Shipment_ID when the Shipment_ID is not stored on the same table as the Public_ID? Is it possible to do this without a where clause?
You can use ROW_NUMBER() to get better performance than the given solution.
Try the following:
;WITH cte AS
(
SELECT s.Shipment_ID,
s.Shipment_ID_Master,
f.Factory_ID,
c.Public_ID,
c.Revision,
ROW_NUMBER() OVER (PARTITION BY s.Shipment_ID ORDER BY c.Public_ID DESC) AS rn
FROM #Shipment s
JOIN #Factory f ON f.Shipment_ID = s.Shipment_ID
JOIN #Coverage c ON c.Factory_ID = f.Factory_ID
)
SELECT c.Shipment_ID, c.Shipment_ID_Master, c.Factory_ID, c.Public_ID, c.Revision
FROM cte c
WHERE c.rn = 1
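To check the ROW_NUMBER() approach against the sample rows from the question, here is a small sketch. It is an assumption-laden stand-in: it uses SQLite (3.25 or later, for window functions) through Python's sqlite3 instead of SQL Server, and plain tables instead of the # temp tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows copied from the question.
conn.executescript("""
CREATE TABLE Shipment (Shipment_ID INTEGER, Shipment_ID_Master INTEGER);
CREATE TABLE Factory  (Factory_ID INTEGER, Shipment_ID INTEGER);
CREATE TABLE Coverage (Factory_ID INTEGER, Public_ID INTEGER, Revision INTEGER);
INSERT INTO Shipment VALUES (4767, 4767), (88359, 28431);
INSERT INTO Factory  VALUES (338161, 4767), (1178567, 88359), (1178568, 88359);
INSERT INTO Coverage VALUES (338161, 2354, 2), (1178567, 32436, 4), (1178568, 2354, 3);
""")

query = """
WITH cte AS (
    SELECT s.Shipment_ID, s.Shipment_ID_Master, f.Factory_ID,
           c.Public_ID, c.Revision,
           -- rank rows inside each shipment, highest Public_ID first
           ROW_NUMBER() OVER (PARTITION BY s.Shipment_ID
                              ORDER BY c.Public_ID DESC) AS rn
    FROM Shipment s
    JOIN Factory  f ON f.Shipment_ID = s.Shipment_ID
    JOIN Coverage c ON c.Factory_ID  = f.Factory_ID
)
SELECT Shipment_ID, Shipment_ID_Master, Factory_ID, Public_ID, Revision
FROM cte
WHERE rn = 1
ORDER BY Shipment_ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that ROW_NUMBER() keeps exactly one row per shipment even when two rows tie on Public_ID, whereas the original MAX() subquery would return both.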

SELECTing Related Rows Based on a Single Row Match

I have the following table running on PostgreSQL 9.5:
+---+------------+-------------+
|ID | trans_id | message |
+---+------------+-------------+
| 1 | 1234567 | abc123-ef |
| 2 | 1234567 | def234-gh |
| 3 | 1234567 | ghi567-ij |
| 4 | 8902345 | ced123-ef |
| 5 | 8902345 | def234-bz |
| 6 | 8902345 | ghi567-ij |
| 7 | 6789012 | abc123-ab |
| 8 | 6789012 | def234-cd |
| 9 | 6789012 | ghi567-ef |
|10 | 4567890 | abc123-ab |
|11 | 4567890 | gex890-aj |
|12 | 4567890 | ghi567-ef |
+---+------------+-------------+
I am looking for the rows for each trans_id based on a LIKE query, like this:
SELECT * FROM table
WHERE message LIKE '%def234%'
This, of course, returns just three rows, the three that match my pattern in the message column. What I am looking for, instead, is all the rows matching that trans_id in groups of messages that match. That is, if a single row matches the pattern, get all the rows with the trans_id of that matching row.
That is, the results would be:
+---+------------+-------------+
|ID | trans_id | message |
+---+------------+-------------+
| 1 | 1234567 | abc123-ef |
| 2 | 1234567 | def234-gh |
| 3 | 1234567 | ghi567-ij |
| 4 | 8902345 | ced123-ef |
| 5 | 8902345 | def234-bz |
| 6 | 8902345 | ghi567-ij |
| 7 | 6789012 | abc123-ab |
| 8 | 6789012 | def234-cd |
| 9 | 6789012 | ghi567-ef |
+---+------------+-------------+
Notice rows 10, 11, and 12 were not SELECTed because none of them matched the %def234% pattern.
I have tried (and failed) to write a sub-query to get all the related rows when a single message matches a pattern:
SELECT sub.*
FROM (
SELECT DISTINCT trans_id FROM table WHERE message LIKE '%def234%'
) sub
WHERE table.trans_id = sub.trans_id
I could easily do this with two queries, but the list of matching trans_ids from the first query, which would go into a WHERE trans_id IN (<huge list of trans_ids>) clause, would be very large. That would not be a very efficient way of doing this, and I believe there is a way to do it with a single query.
Thank you!
I think this will do the job:
WITH sub AS (
SELECT DISTINCT trans_id
FROM table
WHERE message LIKE '%def234%'
)
SELECT *
FROM table JOIN sub USING (trans_id);
Hope this helps.
Try this:
SELECT ID, trans_id, message
FROM (
SELECT ID, trans_id, message,
COUNT(*) FILTER (WHERE message LIKE '%def234%')
OVER (PARTITION BY trans_id) AS pattern_cnt
FROM mytable) AS t
WHERE pattern_cnt >= 1
Using a FILTER clause in the windowed version of COUNT function we can get the number of records matching the predefined pattern within each trans_id slice. The outer query uses this count to filter out irrelevant slices.
You can do this.
WITH trans
AS
(SELECT DISTINCT trans_id
FROM t1
WHERE message LIKE '%def234%')
SELECT t1.*
FROM t1,
trans
WHERE t1.trans_id = trans.trans_id;
I think this will perform better. If you have enough data, you can run EXPLAIN on both the subquery and the CTE versions and compare the plans.
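Another way to phrase "return every row whose trans_id has at least one matching message" is a correlated EXISTS. The sketch below is illustrative only: it uses SQLite through Python's sqlite3 rather than PostgreSQL, and the table name messages is an assumption, since the question only calls it "table".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER, trans_id INTEGER, message TEXT)")
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        (1, 1234567, "abc123-ef"), (2, 1234567, "def234-gh"), (3, 1234567, "ghi567-ij"),
        (4, 8902345, "ced123-ef"), (5, 8902345, "def234-bz"), (6, 8902345, "ghi567-ij"),
        (7, 6789012, "abc123-ab"), (8, 6789012, "def234-cd"), (9, 6789012, "ghi567-ef"),
        (10, 4567890, "abc123-ab"), (11, 4567890, "gex890-aj"), (12, 4567890, "ghi567-ef"),
    ],
)

query = """
SELECT m.id, m.trans_id, m.message
FROM messages m
WHERE EXISTS (
    -- keep the row if any message in the same transaction matches
    SELECT 1
    FROM messages x
    WHERE x.trans_id = m.trans_id
      AND x.message LIKE '%def234%')
ORDER BY m.id
"""
rows = conn.execute(query).fetchall()
ids = [r[0] for r in rows]
print(ids)
```

Because EXISTS only tests for a match, it cannot duplicate rows when several messages in one transaction match the pattern, so no DISTINCT is needed.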

SQL join based on Date

I have two tables:
Table A
+-------+----------+
| prop | str_date |
+-------+----------+
| AL408 | 3/1/2009 |
| AL408 | 4/1/2009 |
| AL408 | 5/1/2009 |
| AL408 | 6/1/2009 |
+-------+----------+
Table B
+---------+-----------+----------+
| prop_id | agrx_date | brand_id |
+---------+-----------+----------+
| AL408 | 5/5/1986 | CI |
| AL408 | 6/30/1994 | CI |
| AL408 | 5/3/1999 | CI |
| AL408 | 4/21/2006 | CI |
| AL408 | 3/20/2009 | QI |
+---------+-----------+----------+
I'd like to pull brand_id into my query result, but the brand_id changes over time and has to be determined by comparing str_date to agrx_date. Starting with the month after a brand_id has changed via the agrx_date, the result should reflect the new brand_id. All str_dates are monthly values.
The end result would look like this:
+-------+----------+--------+
| prop | str_date | Result |
+-------+----------+--------+
| AL408 | 3/1/2009 | CI |
| AL408 | 4/1/2009 | QI |
| AL408 | 5/1/2009 | QI |
| AL408 | 6/1/2009 | QI |
+-------+----------+--------+
Here's what I have so far (which is not correct) and I'm not sure how to get my end result.
select
a.prop
,a.str_date
,b.agrx_date
,b.brand_id
from tableA a
left join tableB b
on a.prop = b.prop_id
and a.str_date < b.agrx_date
where a.prop = 'AL408'
I'm passing this through Tableau so I cannot use CTE or other temp tables.
You could create a date range using a lead() analytical function. The date range could then be used as part of a theta join to pull in the correct brand. This is a pretty simple way to pull the date value from the next record, see the definition of next_agrx_date below.
The range would be inclusive for the start (>=), but noninclusive on the end (<). You also need to handle the null case for open-ended ranges. You can find this logic in the join below.
select
a.prop
,a.str_date
,b.agrx_date
,b.brand_id
from tableA a
left join
( select
prop_id
,agrx_date
,brand_id
,lead(agrx_date) over (partition by prop_id order by agrx_date) next_agrx_date
from tableB ) b
on (b.prop_id = a.prop and a.str_date >= b.agrx_date and (a.str_date < b.next_agrx_date or b.next_agrx_date is null))
order by a.prop, a.str_date
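The date-range join can be tried on the sample rows with the sketch below. It makes a few assumptions: it runs on SQLite (3.25 or later, for LEAD) through Python's sqlite3 rather than the asker's database, and the dates are stored as ISO yyyy-mm-dd strings so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows from the question, with dates rewritten in ISO format.
conn.executescript("""
CREATE TABLE tableA (prop TEXT, str_date TEXT);
CREATE TABLE tableB (prop_id TEXT, agrx_date TEXT, brand_id TEXT);
INSERT INTO tableA VALUES
  ('AL408','2009-03-01'), ('AL408','2009-04-01'),
  ('AL408','2009-05-01'), ('AL408','2009-06-01');
INSERT INTO tableB VALUES
  ('AL408','1986-05-05','CI'), ('AL408','1994-06-30','CI'),
  ('AL408','1999-05-03','CI'), ('AL408','2006-04-21','CI'),
  ('AL408','2009-03-20','QI');
""")

query = """
SELECT a.prop, a.str_date, b.brand_id
FROM tableA a
LEFT JOIN (
    SELECT prop_id, agrx_date, brand_id,
           -- start date of the next agreement, NULL for the latest one
           LEAD(agrx_date) OVER (PARTITION BY prop_id
                                 ORDER BY agrx_date) AS next_agrx_date
    FROM tableB) b
  ON b.prop_id = a.prop
 AND a.str_date >= b.agrx_date
 AND (a.str_date < b.next_agrx_date OR b.next_agrx_date IS NULL)
ORDER BY a.prop, a.str_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each str_date falls into exactly one half-open range [agrx_date, next_agrx_date), which is why the March row still gets CI while April onward gets QI.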
You can use DATE_FORMAT to change the dates to match formatting.
Example
DATE_FORMAT(str_date,'%m-%d-%Y')
or whatever field and format you want to use.

Four Table Join in BigQuery

Okay, so I'm trying to link together four different tables, and it's getting very difficult. I have provided snippets of each table in the hope that you can help out.
Table 1: data
+--------+--------+-----------+
| charge | amount | date |
+--------+--------+-----------+
| 123 | 10000 | 2/10/2016 |
| 456 | 10000 | 1/28/2016 |
| 789 | 10000 | 3/30/2016 |
+--------+--------+-----------+
Table 2: data_metadata
+--------+------------+------------+
| charge | key | value |
+--------+------------+------------+
| 123 | identifier | trrkfll212 |
| 456 | code | test |
| 789 | ID | 123xyz |
+--------+------------+------------+
Table 3: buyer
+-----+-----------+----------+----------+
| id | date | discount | plan |
+-----+-----------+----------+----------+
| ABC | 2/13/2016 | yes | option a |
| DEF | 2/1/2016 | yes | option a |
| GHI | 1/22/2016 | no | option a |
+-----+-----------+----------+----------+
Table 4: buyer_metadata
+--------------+-----------+--------+
| id | key | value |
+--------------+-----------+--------+
| ABC | migration | TRUE |
| DEF | emid | foo |
| GHI | ID | 123xyz |
+--------------+-----------+--------+
Okay, so the tables data and data_metadata are obviously connected by the charge column.
The tables buyer and buyer_metadata are connected by the id column.
But I want to link all of them together. I'm pretty sure the way to accomplish this is through linking the metadata tables together through the common field in the "value" column (in this example: 123xyz).
Could anyone help?
It might look something like this, if all "link" columns are unique:
SELECT *
FROM data d
JOIN data_metadata dm ON d.charge = dm.charge
JOIN buyer_metadata bm ON dm.value = bm.value
JOIN buyer b ON bm.id = b.id
If not, I think you'll have to use something like a GROUP BY clause.
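The chain of joins through the shared "value" column can be checked on the snippets from the question. This is a sketch under assumptions: it runs on SQLite through Python's sqlite3 rather than BigQuery, the column types are guessed, and "key" is quoted only to stay clear of keyword clashes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Rows copied from the four table snippets in the question.
conn.executescript("""
CREATE TABLE data (charge INTEGER, amount INTEGER, date TEXT);
CREATE TABLE data_metadata (charge INTEGER, "key" TEXT, value TEXT);
CREATE TABLE buyer (id TEXT, date TEXT, discount TEXT, plan TEXT);
CREATE TABLE buyer_metadata (id TEXT, "key" TEXT, value TEXT);
INSERT INTO data VALUES (123, 10000, '2/10/2016'), (456, 10000, '1/28/2016'), (789, 10000, '3/30/2016');
INSERT INTO data_metadata VALUES (123, 'identifier', 'trrkfll212'), (456, 'code', 'test'), (789, 'ID', '123xyz');
INSERT INTO buyer VALUES ('ABC', '2/13/2016', 'yes', 'option a'), ('DEF', '2/1/2016', 'yes', 'option a'), ('GHI', '1/22/2016', 'no', 'option a');
INSERT INTO buyer_metadata VALUES ('ABC', 'migration', 'TRUE'), ('DEF', 'emid', 'foo'), ('GHI', 'ID', '123xyz');
""")

query = """
SELECT d.charge, d.amount, b.id, b.plan
FROM data d
JOIN data_metadata  dm ON dm.charge = d.charge
JOIN buyer_metadata bm ON bm.value  = dm.value   -- link via the shared value
JOIN buyer          b  ON b.id      = bm.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Only charge 789 and buyer GHI share a metadata value (123xyz) in the sample, so a single row comes back. If the same value could appear under unrelated keys, adding dm."key" = bm."key" to the join condition may be worth considering.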
Let's take it in two steps, first create composite tables for data and buyer. Composite table for data:
SELECT data.charge, data.amount, data.date,
data_metadata.key, data_metadata.value
FROM [data] AS data
JOIN (SELECT charge, key, value FROM [data_metadata]) AS data_metadata
ON data.charge = data_metadata.charge
And composite table for buyer:
SELECT buyer.id, buyer.date, buyer.discount, buyer.plan,
buyer_metadata.key, buyer_metadata.value
FROM [buyer] AS buyer
JOIN (SELECT id, key, value FROM [buyer_metadata]) AS buyer_metadata
ON buyer.id = buyer_metadata.id
And then let's join the two composite tables
SELECT composite_data.*, composite_buyer.*
FROM (
SELECT data.charge, data.amount, data.date,
data_metadata.key, data_metadata.value
FROM [data] AS data
JOIN (SELECT charge, key, value FROM [data_metadata]) AS data_metadata
ON data.charge = data_metadata.charge) AS composite_data
JOIN (
SELECT buyer.id, buyer.date, buyer.discount, buyer.plan,
buyer_metadata.key, buyer_metadata.value
FROM [buyer] AS buyer
JOIN (SELECT id, key, value FROM [buyer_metadata]) AS buyer_metadata
ON buyer.id = buyer_metadata.id) AS composite_buyer
ON composite_data.value = composite_buyer.value
I haven't tested it but it's probably close.

Join table condition for between 2 rows

Is it possible to join these tables:
Log table:
+--------+---------------+------------+
| name | ip | created |
+--------+---------------+------------+
| 408901 | 178.22.51.168 | 1390887682 |
| 408901 | 178.22.51.168 | 1390927059 |
| 408901 | 178.22.51.168 | 1390957854 |
+--------+---------------+------------+
Orders table:
+---------+------------+
| id | created |
+---------+------------+
| 8563863 | 1390887692 |
| 8563865 | 1390897682 |
| 8563859 | 1390917059 |
| 8563860 | 1390937059 |
| 8563879 | 1390947854 |
+---------+------------+
Result table would be:
+---------+--------------+---------+---------------+------------+
|orders.id|orders.created|logs.name| logs.ip |logs.created|
+---------+--------------+---------+---------------+------------+
| 8563863 | 1390887692 | 408901 | 178.22.51.168 | 1390887682 |
| 8563865 | 1390897682 | 408901 | 178.22.51.168 | 1390887682 |
| 8563859 | 1390917059 | 408901 | 178.22.51.168 | 1390887682 |
| 8563860 | 1390937059 | 408901 | 178.22.51.168 | 1390927059 |
| 8563879 | 1390947854 | 408901 | 178.22.51.168 | 1390927059 |
+---------+--------------+---------+---------------+------------+
Is it possible?
Especially if the first table is the result of some query.
UPDATE
Sorry for the mistake. I want to find in the log who made each order. So the orders table relates to the logs table by the created field, i.e.
each order matches the latest log row satisfying the condition (orders.created >= log.created)
This will result in a non-equi join with horrible performance (t1 is the log table, t2 is the orders table):
SELECT *
FROM t2 JOIN t1
ON t1.created =
(
SELECT MAX(l.created)
FROM t1 AS l WHERE l.created <= t2.created
)
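The correlated MAX subquery can be tried on the sample rows from the question with the sketch below. It assumes SQLite through Python's sqlite3, and the table names logs and orders are taken from the column headers of the expected result rather than from the answer's t1 and t2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows copied from the question.
conn.executescript("""
CREATE TABLE logs (name INTEGER, ip TEXT, created INTEGER);
CREATE TABLE orders (id INTEGER, created INTEGER);
INSERT INTO logs VALUES
  (408901, '178.22.51.168', 1390887682),
  (408901, '178.22.51.168', 1390927059),
  (408901, '178.22.51.168', 1390957854);
INSERT INTO orders VALUES
  (8563863, 1390887692), (8563865, 1390897682), (8563859, 1390917059),
  (8563860, 1390937059), (8563879, 1390947854);
""")

query = """
SELECT o.id, o.created, l.name, l.ip, l.created
FROM orders o
JOIN logs l
  ON l.created = (
        -- latest log entry at or before the order timestamp
        SELECT MAX(l2.created)
        FROM logs l2
        WHERE l2.created <= o.created)
ORDER BY o.created
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On larger tables an index on logs(created) would probably help the subquery, although the join still evaluates it once per order.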
You might be better off with a cursor based on a UNION like this (you probably need to add some type casts to get a working UNION):
SELECT *
FROM
(
SELECT NULL AS name, NULL AS ip, NULL AS created2, t2.*
FROM t2
UNION ALL
SELECT t1.*, NULL AS id, NULL AS created
FROM t1
) AS dt
ORDER BY COALESCE(created, created2)
Now you can process the rows in the right order and remember the values from the most recent t1 row.
There is nothing to bind these 2 together.
No ID or other column exists in both tables.
If this were the case, you could join these 2 tables in a stored procedure.
At the moment you ask the first query, store the data in a newly created table, use it in the join to get your results and delete it afterwards.
Kind regards
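You can simply use UNION ALL. Both branches must return the same number of columns, so the orders side is padded with NULL here:
select id, null as ip, created from table_2
union all
select name, ip, created from table_1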