Eliminate Duplicate Rows on Outer Join - sql

I am running a query against our Oracle database.
The goal is to return the following columns -
Document Id
Document Creation Date
Organization Code
Document Status
Total Amount
The problem I am running into is with the Organization Code.
It is possible to have a document id with multiple organization codes.
I only want 1 instance - I don't care about the rest (if they exist)
Here is what I currently have -
SELECT * FROM (SELECT DISTINCT (K_HDR.DOC_HDR_ID),
K_HDR.CRTE_DT,
FS_EXT.VAL AS ORG_CODE,
REQ.REQS_STAT_CD,
FS_DOC.FDOC_TOTAL_AMT
FROM PUR_REQS_T REQ,
KREW_DOC_HDR_T K_HDR,
FS_DOC_HEADER_T FS_DOC,
KREW_DOC_HDR_EXT_T FS_EXT
WHERE REQ.FDOC_NBR = K_HDR.DOC_HDR_ID AND
FS_DOC.FDOC_NBR = REQ.FDOC_NBR AND
REQ.FDOC_NBR = FS_EXT.DOC_HDR_ID(+) AND
FS_EXT.KEY_CD(+)= 'organizationCode' AND
(K_HDR.CRTE_DT BETWEEN TO_DATE('2011-10-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND
TO_DATE('2012-09-30 23:59:59', 'YYYY-MM-DD HH24:MI:SS')))
FINAL_SEARCH ORDER BY FINAL_SEARCH.CRTE_DT;
The query above returns 14,933 rows.
The correct number of rows is 14,789.
The culprit is the Organization Code.
For instance, as I'm looking at the result sets I see the following -
DOC_ID CRTE_DT ORG_CD STAT TOTAL
.
.
.
496256 5-OCT-11 0 CLOS 2779.89
496258 5-OCT-11 8050 CLOS 1737.5
496258 5-OCT-11 8000 CLOS 1737.5
.
.
.
How do I get rid of the annoying 2nd instance of 496258 which lives in the FS_EXT Table?
(Obviously I need to get rid of the other instances of the same type of duplicate values)

You could wrap the whole thing in one more SELECT which uses a GROUP BY to get only the MIN organization code.
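A minimal sketch of that suggestion, run against SQLite from Python rather than Oracle. The table names `doc_hdr` and `doc_hdr_ext` and the sample rows are simplified stand-ins invented for illustration, not the real schema. The inner query is the outer join that produces the duplicates; the wrapping SELECT collapses them with GROUP BY and MIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified stand-ins for KREW_DOC_HDR_T and KREW_DOC_HDR_EXT_T.
CREATE TABLE doc_hdr (doc_hdr_id INTEGER PRIMARY KEY, crte_dt TEXT);
CREATE TABLE doc_hdr_ext (doc_hdr_ext_id INTEGER PRIMARY KEY,
                          doc_hdr_id INTEGER, key_cd TEXT, val TEXT);
INSERT INTO doc_hdr VALUES (496256, '2011-10-05'), (496258, '2011-10-05'),
                           (496260, '2011-10-06');
INSERT INTO doc_hdr_ext VALUES
  (1, 496256, 'organizationCode', '0'),
  (2, 496258, 'organizationCode', '8050'),
  (3, 496258, 'organizationCode', '8000'),
  (4, 496258, 'vendorName', 'APPLE COMPUTERS');
""")

# Inner query: the outer join, which yields two rows for document 496258.
# Outer query: one row per document, keeping only the smallest org code.
rows = conn.execute("""
SELECT doc_hdr_id, crte_dt, MIN(org_code) AS org_code
FROM (
  SELECT h.doc_hdr_id, h.crte_dt, e.val AS org_code
  FROM doc_hdr h
  LEFT JOIN doc_hdr_ext e
    ON e.doc_hdr_id = h.doc_hdr_id AND e.key_cd = 'organizationCode'
) inner_search
GROUP BY doc_hdr_id, crte_dt
ORDER BY crte_dt, doc_hdr_id
""").fetchall()

for row in rows:
    print(row)
```

Because VAL is stored as text in this sketch, MIN compares strings, so "smallest" means first in string order. A document with no organization code still appears, with a NULL code, because the join is an outer join.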

So - I ended up using another column in the FS_EXT Table to further filter down to the first instance of the Org Code.
Here is what the FS_EXT Table looks like if I am looking at columns that are filtered to only show entries for Document Id = 496258.
(Mind you that there could be different number of rows for any given doc id)
DOC_HDR_EXT_ID DOC_HDR_ID KEY_CD VAL
13318096 496258 documentDescription misc items
13318098 496258 organizationDocNumber (null)
13318099 496258 statusDescription Closed
13318101 496258 chartAndOrgCodeForResult KS-1234
13318102 496258 vendorName APPLE COMPUTERS
13318103
.
.
.
.
.
13318115 496258 organizationCode 8000
13318116
.
.
.
1338118 496258 organizationCode 8050
And here is my new query, which no longer relies on the join to fetch the organization code.
Notice that I use a subquery instead. To get the first instance of the organization code, I take the MIN of the DOC_HDR_EXT_ID column, retrieve the organizationCode VAL for that ID, and pass it back to the main query.
SELECT * FROM ( SELECT DISTINCT (K_HDR.DOC_HDR_ID),
K_HDR.CRTE_DT,
(SELECT KS_EXT.VAL AS ORG_CODE
FROM KREW_DOC_HDR_EXT_T KS_EXT
WHERE KS_EXT.DOC_HDR_EXT_ID =(
SELECT MIN(DOC_HDR_EXT_ID)
FROM KREW_DOC_HDR_EXT_T FS_EXT_INNER
WHERE FS_EXT_INNER.DOC_HDR_ID = K_HDR.DOC_HDR_ID
AND FS_EXT_INNER.KEY_CD = 'organizationCode')) AS ORG_CODE,
REQ.REQS_STAT_CD,
FS_DOC.FDOC_TOTAL_AMT
FROM PUR_REQS_T REQ,
KREW_DOC_HDR_T K_HDR,
FS_DOC_HEADER_T FS_DOC,
KREW_DOC_HDR_EXT_T FS_EXT
WHERE REQ.FDOC_NBR = K_HDR.DOC_HDR_ID AND
FS_DOC.FDOC_NBR = REQ.FDOC_NBR AND
REQ.FDOC_NBR = FS_EXT.DOC_HDR_ID(+) AND
FS_EXT.KEY_CD(+)= 'organizationCode' AND
(K_HDR.CRTE_DT BETWEEN TO_DATE('2011-10-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') AND TO_DATE('2012-09-30 23:59:59', 'YYYY-MM-DD HH24:MI:SS')))
FINAL_SEARCH ORDER BY FINAL_SEARCH.CRTE_DT;
Thanks for your recommendations, @Alex Poole and @StilesCrisis.
You got me thinking differently about my approach to this problem, and my solution integrates both of your suggestions: the MIN approach from Stiles and filtering on another column per Alex Poole.

Related

Why did the query below return no results?

The SQL query can be found at this link:
https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-usage-metering#expandable-1-label
It returns no results, even though there is no problem with the billing dataset and table or with the GKE usage metering dataset and table.
SELECT
resource_usage.cluster_name,
resource_usage.cluster_location,
resource_usage.namespace,
resource_usage.resource_name,
resource_usage.sku_id,
MIN(resource_usage.start_time) AS usage_start_time,
MAX(resource_usage.end_time) AS usage_end_time,
SUM(resource_usage.usage.amount * gcp_billing_export.rate) AS cost
FROM
`cluster-gcp-project.usage-metering-dataset.gke_cluster_resource_usage` AS resource_usage
LEFT JOIN (
SELECT
sku.id AS sku_id,
SUM(cost) / SUM(usage.amount) AS rate,
MIN(usage_start_time) AS min_usage_start_time,
MAX(usage_end_time) AS max_usage_end_time
FROM
`cluster-gcp-project.billing-dataset.billing-table`
WHERE
project.id = "cluster-gcp-project"
GROUP BY
sku_id) AS gcp_billing_export
ON
resource_usage.sku_id = gcp_billing_export.sku_id
WHERE
resource_usage.start_time >= gcp_billing_export.min_usage_start_time
AND resource_usage.end_time <= gcp_billing_export.max_usage_end_time
GROUP BY
resource_usage.cluster_name,
resource_usage.cluster_location,
resource_usage.namespace,
resource_usage.resource_name,
resource_usage.sku_id
I figured out the issue: it is with the WHERE condition in that query.
WHERE
resource_usage.start_time >= gcp_billing_export.min_usage_start_time
AND resource_usage.end_time <= gcp_billing_export.max_usage_end_time
The condition above failed for every row, so the query returned no results.
FYI, the logic is to check the start_time and end_time of the two tables against each other and return the matching values.
Thanks, everyone, for responding.
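As an illustration of how a WHERE condition like this can reject every row, here is a small sketch using SQLite from Python instead of BigQuery. The tables and values are invented for the example and are not taken from the thread. It shows two ways a row gets dropped: the billing window does not cover the usage window, or the LEFT JOIN found no billing row, so the compared columns are NULL and the comparison is never true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, flattened stand-ins for the usage table and the billing subquery.
CREATE TABLE resource_usage (sku_id TEXT, start_time TEXT, end_time TEXT);
CREATE TABLE billing (sku_id TEXT, min_usage_start_time TEXT,
                      max_usage_end_time TEXT);
INSERT INTO resource_usage VALUES
  ('sku-a', '2020-01-02', '2020-01-03'),
  ('sku-b', '2020-01-02', '2020-01-03');
-- Billing covers sku-a only, for a window that ends before the usage starts.
INSERT INTO billing VALUES ('sku-a', '2019-12-01', '2019-12-31');
""")

JOIN = """
SELECT u.sku_id, b.min_usage_start_time, b.max_usage_end_time
FROM resource_usage u
LEFT JOIN billing b ON u.sku_id = b.sku_id
"""
FILTER = """
WHERE u.start_time >= b.min_usage_start_time
  AND u.end_time <= b.max_usage_end_time
"""

# Without the WHERE clause, the LEFT JOIN keeps both usage rows.
joined = conn.execute(JOIN + " ORDER BY u.sku_id").fetchall()
# With it, sku-a fails the end_time test and sku-b compares against NULL.
filtered = conn.execute(JOIN + FILTER).fetchall()

print(joined)
print(filtered)
```

Running the join without the WHERE clause first, as above, is a quick way to see which of the two cases applies to a given dataset.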

Completely Unique Rows and Columns in SQL

I want to randomly pick 4 distinct rows such that no chosen row shares a column value with any of the other chosen rows.
Here is what I coded:
SELECT DISTINCT en,dialect,fr FROM words ORDER BY RANDOM() LIMIT 4
Here is some data:
**en** **dialect** **fr**
number SFA numero
number TRI numero
hotel CAI hotel
hotel SFA hotel
I want:
**en** **dialect** **fr**
number SFA numero
hotel CAI hotel
Some retrieved rows have something in common with each other, such as the same en or the same fr. I would like to retrieve rows that do not share any values with each other. How do I do that?
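I think I'd do this in the front-end code rather than the DB. Here's some pseudocode (I don't know what your code looks like):
// Lists of values already returned; the dummy '' entry keeps the SQL valid
// before any real value has been added.
var seenEn = "en NOT IN (''";
var seenFr = "fr NOT IN (''";
var rows = [];
while (rows.length < 4)
{
  // sqlquery() is a placeholder for whatever runs a query and returns one row (or null).
  var newrow = sqlquery("SELECT * FROM words WHERE " + seenEn + ") AND "
    + seenFr + ") ORDER BY random() LIMIT 1");
  if (!newrow)
    break;
  rows.push(newrow);
  seenEn += ",'" + newrow.en + "'";
  seenFr += ",'" + newrow.fr + "'";
}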
The loop runs as many times as needed to retrieve 4 rows (or you could make it a for loop that runs 4 times) unless the query returns null. Each time the query returns a row, its values are added to a list of values we don't want the query to return again. That list has to start out with some value (an empty string here) that never appears in the data, to prevent a syntax error when concatenating a comma-prefixed value onto the seenXX variable. Those syntax errors can be avoided in other ways, such as a Boolean for "if it's the first value, don't put the comma", but I chose to put dummy, ineffective values into the SQL to make the JS simpler.
As noted, it looks like JS to ease your understanding but this should be treated as pseudo code outlining a general algorithm - it’s never been compiled/run/tested and may have syntax errors or not at all work as JS if pasted into your file; take the idea and work it into your solution
Please note this was posted from an iphone and it may have done something stupid with all the apostrophes and quotes (turned them into the curly kind preferred by writers rather than the straight kind used by programmers)
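For comparison, here is the same loop as a runnable sketch in Python with sqlite3. This is an illustration only, not the answerer's code: the function name `pick_unrelated_rows` is made up, the table is the `words` sample from the question, and the seen values are bound as parameters instead of being concatenated into the SQL string. It relies on SQLite accepting an empty `NOT IN ()` list, which most other engines do not.

```python
import sqlite3

def pick_unrelated_rows(conn, wanted=4):
    """Pick up to `wanted` random rows whose en and fr values never repeat."""
    seen_en, seen_fr, rows = [], [], []
    while len(rows) < wanted:
        # One "?" placeholder per value seen so far (empty list on the first pass).
        en_marks = ",".join("?" * len(seen_en))
        fr_marks = ",".join("?" * len(seen_fr))
        sql = (
            "SELECT en, dialect, fr FROM words "
            f"WHERE en NOT IN ({en_marks}) AND fr NOT IN ({fr_marks}) "
            "ORDER BY RANDOM() LIMIT 1"
        )
        row = conn.execute(sql, seen_en + seen_fr).fetchone()
        if row is None:
            break  # no remaining row is free of already-seen values
        rows.append(row)
        seen_en.append(row[0])
        seen_fr.append(row[2])
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (en TEXT, dialect TEXT, fr TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?, ?, ?)",
    [("number", "SFA", "numero"), ("number", "TRI", "numero"),
     ("hotel", "CAI", "hotel"), ("hotel", "SFA", "hotel")],
)

picked = pick_unrelated_rows(conn)
for row in picked:
    print(row)
```

With the four sample rows, only two rows can be returned, one for "number" and one for "hotel". Like the pseudocode, this sketch checks only en and fr; the dialect column could be tracked the same way with a third list.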
You can use a ranking function, or pick the first row for each group, to achieve your result.
See below; I hope this code helps.
SELECT 'number' AS Col1, 'SFA' AS Col2, 'numero' AS Col3 INTO #tbl
UNION ALL
SELECT 'number','TRI','numero'
UNION ALL
SELECT 'hotel','CAI' ,'hotel'
UNION ALL
SELECT 'hotel','SFA','hotel'
UNION ALL
SELECT 'Location','LocationA' ,'Location data'
UNION ALL
SELECT 'Location','LocationB','Location data'
;
WITH summary AS (
SELECT Col1,Col2,Col3,
ROW_NUMBER() OVER(PARTITION BY p.Col1 ORDER BY p.Col2 DESC) AS rk
FROM #tbl p)
SELECT s.Col1,s.Col2,s.Col3
FROM summary s
WHERE s.rk = 1
DROP TABLE #tbl

MS Access SQL Date Range Query

I am working on a classroom reservation tool. A core component is the ability to compare the requested date range to the existing reservations, to ensure that there is no overlap. I've read through several date range related questions here, and studied Salman's explanation and implementation of Allen's interval algebra ( SQL Query to Find Overlapping (Conflicting) Date Ranges ) until I understood it. Here's a stripped-down version of what I came up with.
tblRooms
roomID room
5 110
30 178
tblReservations
reservedID fkRoom dateIn dateOut
1 5 3/10/2017 3/15/2017
2 5 3/1/2017 3/3/2017
4 5 4/1/2017 4/30/2017
SELECT DISTINCTROW tblRooms.roomID, tblRooms.room
FROM tblRooms LEFT JOIN tblReservations
ON tblRooms.roomID = tblReservations.fkRoom
WHERE NOT Exists (
SELECT DISTINCT tblRooms.roomID
FROM tblRooms
WHERE ((tblReservations.[dateOut] >= #3/3/2017#)
AND (#3/9/2017# >= tblReservations.[dateIn])));
I'm getting inconsistent returns. These dates will exclude room 110, as they should. Other test input (#3/4/2017# and #3/10/2017#, #4/1/2017# and #4/14/2017#) won't. I've tried combinations of "WHERE NOT (...", "WHERE Exists () = False", etc.
I work on a highly restrictive network, where I can't pull in templates at will - my only options when I create a database are "Blank" and "Web", so I've got to roll my own on this. I appreciate any assistance.
Can you try the following:
SELECT DISTINCTROW tblRooms.roomID, tblRooms.room
FROM tblRooms
WHERE NOT Exists (
SELECT 1
FROM tblReservations
WHERE
tblReservations.fkRoom = tblRooms.roomID
AND ((tblReservations.[dateOut] >= #3/3/2017#)
AND (#3/9/2017# >= tblReservations.[dateIn])));
For a reservation check query you would do this:
select ...
from tblRooms room
where not exists
( select *
from tblReservations r
where r.fkRoom = room.roomId and
end > r.[datein] and start < r.[dateout] );
But the important part is to pass end and start as parameters instead of hardcoding the values as you did. With hardcoded values you are always open to wrong results or errors. For example, what is #3/9/2017# really? Its interpretation would depend on regional settings (I am not an Access programmer, so I might be wrong).
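The correlated NOT EXISTS check with parameters can be tried outside Access. The sketch below uses SQLite from Python, with the sample rooms and reservations from the question. Storing dates as ISO `YYYY-MM-DD` text and the helper name `free_rooms` are choices made for this example. It uses the inclusive comparison (`>=`) from the first answer, so a request that starts on the day another reservation ends counts as a conflict.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblRooms (roomID INTEGER PRIMARY KEY, room TEXT);
CREATE TABLE tblReservations (reservedID INTEGER PRIMARY KEY, fkRoom INTEGER,
                              dateIn TEXT, dateOut TEXT);
INSERT INTO tblRooms VALUES (5, '110'), (30, '178');
INSERT INTO tblReservations VALUES
  (1, 5, '2017-03-10', '2017-03-15'),
  (2, 5, '2017-03-01', '2017-03-03'),
  (4, 5, '2017-04-01', '2017-04-30');
""")

def free_rooms(start, end):
    """Return rooms with no reservation overlapping [start, end] (inclusive)."""
    sql = """
    SELECT room FROM tblRooms
    WHERE NOT EXISTS (
      SELECT 1 FROM tblReservations r
      WHERE r.fkRoom = tblRooms.roomID   -- correlate to the outer room
        AND r.dateOut >= :start
        AND :end >= r.dateIn)
    ORDER BY room
    """
    return [row[0] for row in conn.execute(sql, {"start": start, "end": end})]

print(free_rooms("2017-03-03", "2017-03-09"))
print(free_rooms("2017-03-04", "2017-03-09"))
print(free_rooms("2017-04-01", "2017-04-14"))
```

The key difference from the original query is the correlation `r.fkRoom = tblRooms.roomID` inside the subquery: each room is tested against its own reservations only.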

Need to pull only last date in table that stores change dates SQL / ODBC

Hope somebody can help me with this. I'm trying to pull a list of forthcoming titles (I work in publishing) via ODBC/MS Query. I want (amongst other things) to show their internal status (approved, prepress etc.). The database stores the change dates for the statuses. I seem to be getting one line per status per title. So if the title has changed status 6 times, I will get 6 lines. But I only want to show the latest status...
The date is in BL_PROJECT_TO_STATUS.STATUS_DATE (I've inserted a date criteria beneath, just to make it more visible).
How can this be done? I'm very new to ODBC and would appreciate it a lot.
SELECT DISTINCT
BL_PROJECT.EXP_PUB_DATE, BL_PROJECT.EAN, BL_PROJECT.TITEL,
MEDIATYPE.DESCRIPTION, BL_PROJECT_STATUS.DESCRIPTION
FROM
FIRMA1.BL_PROJECT BL_PROJECT, FIRMA1.BL_PROJECT_STATUS BL_PROJECT_STATUS,
FIRMA1.BL_PROJECT_TO_STATUS BL_PROJECT_TO_STATUS, FIRMA1.MEDIATYPE MEDIATYPE
WHERE
BL_PROJECT.PROJECT_ID = BL_PROJECT_TO_STATUS.PROJECT_ID AND
BL_PROJECT_TO_STATUS.STATUS_ID = BL_PROJECT_STATUS.CODE AND
BL_PROJECT.MEDIATYPE = MEDIATYPE.ID AND
((BL_PROJECT.PROJECT_TYPE = 2) AND
(BL_PROJECT.EXP_PUB_DATE Between SYSDATE AND (SYSDATE+90)) AND
(BL_PROJECT_TO_STATUS.STATUS_DATE = {ts '2013-11-20 00:00:00'}))
ORDER BY
BL_PROJECT.EXP_PUB_DATE, BL_PROJECT.EAN, BL_PROJECT.TITEL
Here is the general idea. You can adapt it with your table and field names.
select somefields
from sometables
join
(select something, max(datetimefield) maxdt
from table1
where whatever
group by something ) temp on table1.something = temp.something
and table1.datetimefield = temp.maxdt
etc
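To make the general idea concrete, here is a small sketch using SQLite from Python. The three tables are simplified, hypothetical versions of BL_PROJECT, BL_PROJECT_TO_STATUS and BL_PROJECT_STATUS, and the rows are invented. The derived table finds the latest status date per project, and joining back on both the project and that date keeps one status row per title.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, cut-down versions of the tables in the question.
CREATE TABLE project (project_id INTEGER PRIMARY KEY, titel TEXT);
CREATE TABLE project_status (code INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE project_to_status (project_id INTEGER, status_id INTEGER,
                                status_date TEXT);
INSERT INTO project VALUES (1, 'First title'), (2, 'Second title');
INSERT INTO project_status VALUES (10, 'Approved'), (20, 'Prepress'),
                                  (30, 'Printed');
INSERT INTO project_to_status VALUES
  (1, 10, '2013-10-01'), (1, 20, '2013-11-20'),
  (2, 10, '2013-09-15'), (2, 20, '2013-10-10'), (2, 30, '2013-11-05');
""")

latest = conn.execute("""
SELECT p.titel, s.description, pts.status_date
FROM project p
JOIN project_to_status pts ON pts.project_id = p.project_id
JOIN project_status s ON s.code = pts.status_id
JOIN (SELECT project_id, MAX(status_date) AS maxdt
      FROM project_to_status
      GROUP BY project_id) latest_dt
  ON latest_dt.project_id = pts.project_id   -- same project
 AND latest_dt.maxdt = pts.status_date       -- and its most recent change
ORDER BY p.titel
""").fetchall()

for row in latest:
    print(row)
```

If two status changes for the same project share the exact same timestamp, both rows survive this join, so a tie-breaker would be needed in that case.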

SQL to query by date dependencies

I have a table of patients which has the following columns: patient_id, obs_id, obs_date. Obs_id is the ID of a clinical observation (such as weight reading, blood pressure reading....etc), and obs_date is when that observation was taken. Each patient could have several readings on different dates...etc. Currently I have a query to get all patients that had obs_id = 1 and insert them into a temporary table (has two columns, patient_id, and flag which I set to 0 here):
insert into temp_table (select patient_id, 0 from patients_table
where obs_id = 1 group by patient_id having count(*) >= 1)
I also execute an update statement to set the flag to 1 for all patients that also had obs_id = 5:
UPDATE temp_table SET flag = 1 WHERE EXISTS (
SELECT patient_id FROM patients_table WHERE obs_id = 5 group by patient_id having count(*) >=1
) v WHERE temp_table.patient_id = v.patient_id
Here's my question: How do I modify both queries (without combining them or removing the group by statement) such that I can answer the following question:
"get all patients who had obs_id = 5 after obs_id = 1". If I add a min(obs_date) or max(obs_date) to the select of each query and then add "AND v.obs_date > temp_table.obs_date" to the second one, is that correct??
The reason I can't remove the GROUP BY or combine the queries is that they are generated by a code generator (from a web app), and I'd like to make this modification without messing up the code generator or rewriting it.
Many thanks in advance,
The advantage of SQL is that it works with sets. You don't need to create temporary tables or get all procedural.
As you describe the problem (find all patients who have obs_id 5 after obs_id 1), I'd start with something like this
select distinct p1.patient_id
from patients_table p1, patients_table p2
where
p1.obs_id = 1 and
p2.obs_id = 5 and
p2.patient_id = p1.patient_id and
p2.obs_date > p1.obs_date
Of course, that doesn't help you deal with your code generator. Sometimes, tools that make things easier can also get in the way.
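The self-join above can be checked on a few made-up rows. The sketch below runs it in SQLite from Python; the sample patients are invented to cover the cases that matter (5 after 1, 5 only before 1, no 5 at all, and several 5s after 1).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients_table (patient_id INTEGER, obs_id INTEGER, obs_date TEXT);
INSERT INTO patients_table VALUES
  (1, 1, '2012-01-01'), (1, 5, '2012-02-01'),   -- 5 after 1: qualifies
  (2, 5, '2012-01-01'), (2, 1, '2012-02-01'),   -- 5 only before 1
  (3, 1, '2012-01-01'),                         -- never had 5
  (4, 1, '2012-01-01'), (4, 5, '2012-03-01'),
  (4, 5, '2012-04-01');                         -- two later 5s: still one row
""")

SELF_JOIN = """
FROM patients_table p1, patients_table p2
WHERE p1.obs_id = 1
  AND p2.obs_id = 5
  AND p2.patient_id = p1.patient_id
  AND p2.obs_date > p1.obs_date
"""

# DISTINCT collapses patients who match through more than one pair of readings.
with_distinct = [r[0] for r in conn.execute(
    "SELECT DISTINCT p1.patient_id " + SELF_JOIN + " ORDER BY p1.patient_id")]
without_distinct = [r[0] for r in conn.execute(
    "SELECT p1.patient_id " + SELF_JOIN + " ORDER BY p1.patient_id")]

print(with_distinct)
print(without_distinct)
```

Patient 4 shows why the DISTINCT is there: each qualifying pair of readings produces its own joined row.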