How to find duplicate combinations of columns across two tables in SQL?

I am attempting to find duplicates across two tables which are identified as a combination of five different columns. The tables have one similar column in common. My table structure and relevant columns look like the following:
order_item_tbl
EVENT_KEY ORDER_KEY ORDER_NBR
14 82 1
14 82 2
14 82 3
14 82 1
invoice_tbl
EVENT_KEY CUSTOMER_KEY BOOTH_KEY
14 41 12
14 41 12
I've tried this query so far and everything is duplicated more than expected:
SELECT OI.ORDER_NBR AS ORDER_NBR, COUNT(*) AS COUNT
FROM ORDER_ITEM_TBL OI
JOIN INVOICE_TBL I
ON I.EVENT_KEY = OI.EVENT_KEY
WHERE OI.EVENT_KEY = '14' AND OI.ORDER_KEY = '82'
AND I.BOOTH_KEY = '12' AND I.CUSTOMER_KEY = '41' AND OI.ORDER_NBR in (1,2,3)
GROUP BY OI.ORDER_NBR
HAVING COUNT(*) > 1
Based on this dataset I would expect to receive the following result:
ORDER_NBR COUNT
1 2
However this is the result I'm seeing:
ORDER_NBR COUNT
1 6
2 2
3 2
Any ideas on what I'm doing wrong here?

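Grouping by ORDER_NBR alone should give you the result, I reckon, since ORDER_NBR is the only non-aggregated column in the select list.
Note that a query selecting a non-aggregated column that is missing from the GROUP BY will not execute in PostgreSQL, which requires every selected column to appear in GROUP BY or inside an aggregate function.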
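The inflated counts can be reproduced with a small sketch. It loads only the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module (the engine and the integer column types are assumptions, not the asker's setup). Joining on EVENT_KEY alone pairs every order row with every matching invoice row, so each count is multiplied. With just these sample rows, order 1 comes out as 4 rather than the 6 reported, presumably because the real tables hold more rows. Replacing the join with an EXISTS check is one possible way to keep the order rows unmultiplied.

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_item_tbl (EVENT_KEY INT, ORDER_KEY INT, ORDER_NBR INT);
CREATE TABLE invoice_tbl (EVENT_KEY INT, CUSTOMER_KEY INT, BOOTH_KEY INT);
INSERT INTO order_item_tbl VALUES (14, 82, 1), (14, 82, 2), (14, 82, 3), (14, 82, 1);
INSERT INTO invoice_tbl VALUES (14, 41, 12), (14, 41, 12);
""")

# Joining on EVENT_KEY alone pairs every order row with every invoice row,
# so each per-order count is multiplied by the number of matching invoices.
joined = conn.execute("""
    SELECT OI.ORDER_NBR, COUNT(*)
    FROM order_item_tbl OI
    JOIN invoice_tbl I ON I.EVENT_KEY = OI.EVENT_KEY
    GROUP BY OI.ORDER_NBR
    ORDER BY OI.ORDER_NBR
""").fetchall()

# EXISTS only checks that a matching invoice is present; it does not
# multiply the order rows, so duplicates are counted once each.
deduped = conn.execute("""
    SELECT OI.ORDER_NBR, COUNT(*)
    FROM order_item_tbl OI
    WHERE OI.EVENT_KEY = 14 AND OI.ORDER_KEY = 82
      AND EXISTS (SELECT 1 FROM invoice_tbl I
                  WHERE I.EVENT_KEY = OI.EVENT_KEY
                    AND I.BOOTH_KEY = 12 AND I.CUSTOMER_KEY = 41)
    GROUP BY OI.ORDER_NBR
    HAVING COUNT(*) > 1
""").fetchall()

print(joined)
print(deduped)
```

Whether EXISTS is the right fix depends on what counts as a duplicate in the real data; if duplicate invoice rows matter too, they would need their own check.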

Related

Sql getting MAX and MIN values based on two columns for the ids from two others

I'm having difficulties figuring a query out, would someone be able to assist me with this?
Problem: 4 columns that represent results for the 2 separate tests. One of them taken in UK and another in US. Both of them are the same test and I need to find the highest and lowest score for the test taken in both countries. I also need to avoid using subqueries and temporary tables. Would appreciate theoretical ideas and actual solutions for the problem.
The table looks like this:
ResultID Test_UK Test_US Test_UK_Score Test_US_Score
1 1 2 48 11
2 4 1 21 24
3 3 1 55 71
4 5 6 18 78
5 7 4 19 49
6 1 3 23 69
7 5 2 98 35
8 6 7 41 47
The desired results I'm looking for:
TestID HighestScore LowestScore
1 71 23
2 35 11
3 69 55
4 49 21
5 98 18
6 78 41
7 47 19
I tried implementing a case comparison, but I still ended up with a subquery to pull out the final results. I also tried union, but it ends up in a subquery again. As far as I can tell it should be a case when then query, but I can't really come up with the logic for it, as it requires matching the IDs of the tests.
Thank you!
What I've tried and got the best results (still wrong)
select v.TestID,
max(case when Test_US_Score > Test_UK_Score then Test_UK_Score else null end) MaxS,
min(case when Test_UK_Score > Test_US_Score then Test_US_Score else null end) MinS
FROM ResultsDB rDB CROSS APPLY
(VALUES (Test_UK, 1), (Test_US, 0)
) V(testID, amount)
GROUP BY v.TestID
Extra
The answer provided by M. Kanarkowski is a perfect solution. I'm no expert on CTEs and a bit confused: how would it be possible to adapt this query to return the ResultID of the rows where the min and max were found?
something like this:
TestID Result_ID_Max Result_ID_Min
1 3 6
2 7 1
3 6 3
Extra 2
The desired results of the query would be something like this.
The two last columns represent the IDs of the rows from the original table where the max and min values were found.
TestID HighestScore LowestScore Result_ID_Of_Max Result_ID_Of_Min
1 71 23 3 6
2 35 11 7 1
3 69 55 6 3
For example, you can use union to put the results from both countries together and then just pick the maximum and the minimum from your data.
with cte as (
select Test_UK as TestID, Test_UK_Score as score from yourTable
union all
select Test_US as TestID, Test_US_Score as score from yourTable
)
select
TestID
,max(score) as HighestScore
,min(score) as LowestScore
from cte
group by TestID
order by TestID
Extra:
I assumed that you want to have the additional column with the previous result. If not just take the above select and replace Test_UK_Score and Test_US_Score with ResultID.
with cte as (
select Test_UK as TestID, Test_UK_Score as score, ResultID from yourTable
union all
select Test_US as TestID, Test_US_Score as score, ResultID from yourTable
)
select
TestID
,max(score) as HighestScore
,min(score) as LowestScore
,max(ResultID) as Result_ID_Max
,min(ResultID) as Result_ID_Min
from cte
group by TestID
order by TestID
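Note that max(ResultID) and min(ResultID) return the largest and smallest ResultID per test, not the rows where the extreme scores occur. A possible sketch for the latter, run against the question's sample data in an in-memory SQLite database via Python's sqlite3 (the engine is an assumption, and the second CTE named agg is a hypothetical helper), joins the unpivoted rows back to the per-test extremes. On ties it keeps the largest ResultID, which is an arbitrary choice.

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE yourTable (
    ResultID INT, Test_UK INT, Test_US INT, Test_UK_Score INT, Test_US_Score INT)""")
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 2, 48, 11), (2, 4, 1, 21, 24), (3, 3, 1, 55, 71), (4, 5, 6, 18, 78),
    (5, 7, 4, 19, 49), (6, 1, 3, 23, 69), (7, 5, 2, 98, 35), (8, 6, 7, 41, 47),
])

rows = conn.execute("""
    WITH cte AS (
        -- one row per (test, score), keeping the originating ResultID
        SELECT Test_UK AS TestID, Test_UK_Score AS score, ResultID FROM yourTable
        UNION ALL
        SELECT Test_US AS TestID, Test_US_Score AS score, ResultID FROM yourTable
    ),
    agg AS (
        -- per-test extremes (hypothetical helper CTE)
        SELECT TestID, MAX(score) AS HighestScore, MIN(score) AS LowestScore
        FROM cte
        GROUP BY TestID
    )
    SELECT a.TestID, a.HighestScore, a.LowestScore,
           MAX(CASE WHEN c.score = a.HighestScore THEN c.ResultID END) AS Result_ID_Of_Max,
           MAX(CASE WHEN c.score = a.LowestScore THEN c.ResultID END) AS Result_ID_Of_Min
    FROM agg a
    JOIN cte c ON c.TestID = a.TestID
    GROUP BY a.TestID, a.HighestScore, a.LowestScore
    ORDER BY a.TestID
""").fetchall()

for row in rows:
    print(row)
```

For the first three tests this prints (1, 71, 23, 3, 6), (2, 35, 11, 7, 1) and (3, 69, 55, 6, 3), matching the rows listed under "Extra 2".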

How to find a field with some specific value in another field in sql

I have 2 columns MessageID and FlowStatusID.
I want to find the MessageIDs whose FlowStatusID values contain a specific sequence of values.
For example I want to find MessageID's where the FlowStatusID contains the sequence of these numbers: 105,81,21
MessageID FlowStatusID
-------------------------
1 11
1 105
2 105
2 81
2 21
3 81
4 105
4 81
4 21
5 21
5 105
The result must be 2, 4
You don't mention the database type but I've found some other tickets which explain how to concatenate values from multiple records.
PostgreSQL: Concatenate multiple result rows of one column into one, group by another column
Oracle : SQL Query to concatenate column values from multiple rows in Oracle
You can use this concatenated field in a condition with an equality match:
where myconcatresult = '21,81,105'
I don't know how this will perform :)
Try like this
select MessageID from t group by MessageID having count(*) =3 ;
Or
select MessageID from (
select t.MessageID from t where FlowStatusID=21
union all
select t.MessageID from t where FlowStatusID=81
union all
select t.MessageID from t where FlowStatusID=105 )
as tt group by MessageID having count(*) =3
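A more compact variant of the same idea, sketched against the question's sample rows in an in-memory SQLite database via Python's sqlite3 (the engine and the table name t are assumptions): filter to the three wanted values and require that all three distinct values are present. Like the queries above, this checks only that the values are present, not the order in which they appear, because the table shown has no column that records order.

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (MessageID INT, FlowStatusID INT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    (1, 11), (1, 105), (2, 105), (2, 81), (2, 21), (3, 81),
    (4, 105), (4, 81), (4, 21), (5, 21), (5, 105),
])

# Keep only the wanted statuses, then require all three distinct values.
# COUNT(DISTINCT ...) guards against a status repeated within one message.
matches = [row[0] for row in conn.execute("""
    SELECT MessageID
    FROM t
    WHERE FlowStatusID IN (105, 81, 21)
    GROUP BY MessageID
    HAVING COUNT(DISTINCT FlowStatusID) = 3
    ORDER BY MessageID
""")]

print(matches)
```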

How to COUNT DISTINCT on more than one column

I have the following table.
group_id p_id version value
1 1 1 10
1 1 2 11
1 1 2 12
1 2 3 13
2 1 2 14
2 1 3 15
2 1 2 16
I would like to count how many records there are for each group_id and how many distinct p_id + version combinations there are for each group_id. I have the following query
SELECT "group_id",count(*) , count(distinct "p_id","version")
FROM tbl
group by "group_id"
Apparently, it's not going to work, as Oracle gives me an error on COUNT
ORA-00909: invalid number of arguments
I know this can be done with a subquery. However, is there any simple way to get the same result? Performance is important to me, as we have more than 500 million records in the table.
I don't know if it's the best way, but I normally concatenate the two values, using a delimiter to enforce "distinctness", so they become one expression, which Oracle can handle with COUNT DISTINCT:
SELECT "group_id",count(*) , count(distinct "p_id" || '-' || "version")
FROM tbl
group by "group_id"
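The concatenation trick can be checked on the question's sample rows with a short sketch. It uses an in-memory SQLite database via Python's sqlite3 rather than Oracle (an assumption; the || operator behaves the same way for this purpose). The second query illustrates why the delimiter matters: without it, the pairs (1, 12) and (11, 2) both collapse to '112'.

```python
import sqlite3

# In-memory SQLite stands in for Oracle here (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tbl ("group_id" INT, "p_id" INT, "version" INT, "value" INT)')
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 10), (1, 1, 2, 11), (1, 1, 2, 12), (1, 2, 3, 13),
    (2, 1, 2, 14), (2, 1, 3, 15), (2, 1, 2, 16),
])

# Row count and distinct (p_id, version) pairs per group, via one concatenated expression.
counts = conn.execute("""
    SELECT "group_id", COUNT(*), COUNT(DISTINCT "p_id" || '-' || "version")
    FROM tbl
    GROUP BY "group_id"
    ORDER BY "group_id"
""").fetchall()

# Without a delimiter, (1, 12) and (11, 2) both become '112' and are counted once.
collision = conn.execute("""
    SELECT COUNT(DISTINCT a || b), COUNT(DISTINCT a || '-' || b)
    FROM (SELECT 1 AS a, 12 AS b UNION ALL SELECT 11, 2)
""").fetchone()

print(counts)
print(collision)
```

The delimiter has to be a character that cannot occur in either value; for integer keys '-' is safe only if the values are never negative.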

Multiple AVG in single SQL query

I've searched for adding multiple AVG calculations and have found a few entries, however I'm having to join another table and the examples of that are scarce.
closest answer I can find is this
but it deals with dates and no joins
here are my tables:
indicators:
StandardScore IndicatorID NID DID
0.033333 7 1 1
0.907723 9 1 1
0.574739 26 1 1
0.917391 21 1 1
.....
indexindicators:
IndexID IndicatorID
1 7
1 26
2 21
3 7
4 9
4 21
4 7
5 9
.......
My goal is to get the average for each IndexID (indexindicators) related to NID/DID (indicators) combination
a query to retrieve a single value would be
SELECT AVG(StandardScore) FROM `indicators` INNER JOIN indexindicators ON indicators.IndicatorId=indexindicators.IndicatorId WHERE nid=1 AND did=1 AND indexindicators.IndexId=1
ultimately there will be 6 (indexID) averages which then have to be rounded then * by 100 (should I do that part with PHP?)
This seems like such a simple query, but I just can't seem to wrap my mind around it.
Thanks in advance for your help!
SELECT nid, did, indexid, 100.0 * AVG(StandardScore)
FROM `indicators`
INNER JOIN `indexindicators`
ON indicators.IndicatorId=indexindicators.IndicatorId
group by nid, did, indexid
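As a rough check of the grouped query, the sketch below runs it on the sample rows shown in the question, using an in-memory SQLite database via Python's sqlite3 (an assumption; the question appears to use MySQL). It also does the rounding in SQL with ROUND, which is one option for the "round then multiply by 100" step; here the multiplication happens before rounding, which may or may not be the order the asker intended.

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE indicators (StandardScore REAL, IndicatorID INT, NID INT, DID INT);
CREATE TABLE indexindicators (IndexID INT, IndicatorID INT);
INSERT INTO indicators VALUES
    (0.033333, 7, 1, 1), (0.907723, 9, 1, 1), (0.574739, 26, 1, 1), (0.917391, 21, 1, 1);
INSERT INTO indexindicators VALUES
    (1, 7), (1, 26), (2, 21), (3, 7), (4, 9), (4, 21), (4, 7), (5, 9);
""")

# One average per (NID, DID, IndexID), scaled to 0-100 and rounded in SQL.
averages = conn.execute("""
    SELECT i.NID, i.DID, x.IndexID, ROUND(100.0 * AVG(i.StandardScore)) AS score
    FROM indicators i
    INNER JOIN indexindicators x ON i.IndicatorID = x.IndicatorID
    GROUP BY i.NID, i.DID, x.IndexID
    ORDER BY i.NID, i.DID, x.IndexID
""").fetchall()

for row in averages:
    print(row)
```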

SQL query to find rows that aren't present in other tables

Here's what I'm trying to accomplish:
I've got two tables, call them first and second. They each have an ID column. They might have other columns but those aren't important. I have a third table, call it third. It contains two columns, ID and OTHERID. OTHERID references entries that may or may not exist in tables first and second.
I want to query third and look for rows whose OTHERID value is not found in either first or second. The goal is to delete those rows from table third.
Example:
first table:
ID
1
2
3
second table:
ID
6
7
8
third table
ID | OTHERID
21 1
22 2
23 3
24 4
25 5
26 6
27 7
28 8
In this case, I'd want to retrieve the IDs from third who don't have a matching ID in either table first or table second. I'd expect to get back the following IDs:
24
25
What I've tried:
I've done something like this to get back the entries in third that aren't in first:
select t.* from third t where not exists (select * from first f where t.otherid = f.id);
and this will get me back the following rows:
ID | OTHERID
24 4
25 5
26 6
27 7
28 8
Similarly, I can get the ones that aren't in second:
select t.* from third t where not exists (select * from second s where t.otherid = s.id);
and I'll get:
ID | OTHERID
21 1
22 2
23 3
24 4
25 5
What I can't wrap my brain around this morning is how to combine the two queries to get the intersection of the two result sets, so that just the rows with IDs 24 and 25 are returned. Those would be the two rows I could remove since they are orphans.
How would you solve this? I think I'm on the right track but I'm just spinning at this point making no progress.
Maybe this :
SELECT third.*
FROM third
LEFT JOIN first ON third.otherID = first.id
LEFT JOIN second ON third.otherID = second.id
WHERE first.id IS NULL AND second.id IS NULL
Just use
select t.*
from third t
where
not exists (select * from first f where t.otherid = f.id)
and not exists (select * from second s where t.otherid = s.id)
SELECT t.ID
FROM third t
WHERE t.OTHERID NOT IN (
SELECT ID
FROM first
UNION
SELECT ID
FROM second
)
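The NOT EXISTS and LEFT JOIN variants can be compared on the question's sample data with a small sketch, again using an in-memory SQLite database via Python's sqlite3 (an assumption). The tables are renamed t_first, t_second and t_third here (hypothetical names) to avoid any clash with SQL keywords. The final statement shows how the same NOT EXISTS condition might drive the delete the asker is after.

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption).
# t_first / t_second / t_third are hypothetical names for first / second / third.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_first (ID INT);
CREATE TABLE t_second (ID INT);
CREATE TABLE t_third (ID INT, OTHERID INT);
INSERT INTO t_first VALUES (1), (2), (3);
INSERT INTO t_second VALUES (6), (7), (8);
INSERT INTO t_third VALUES
    (21, 1), (22, 2), (23, 3), (24, 4), (25, 5), (26, 6), (27, 7), (28, 8);
""")

# Rows of t_third whose OTHERID appears in neither parent table.
not_exists_ids = [row[0] for row in conn.execute("""
    SELECT t.ID
    FROM t_third t
    WHERE NOT EXISTS (SELECT 1 FROM t_first f WHERE f.ID = t.OTHERID)
      AND NOT EXISTS (SELECT 1 FROM t_second s WHERE s.ID = t.OTHERID)
    ORDER BY t.ID
""")]

# Same result via two outer joins that both found no partner.
left_join_ids = [row[0] for row in conn.execute("""
    SELECT t.ID
    FROM t_third t
    LEFT JOIN t_first f ON t.OTHERID = f.ID
    LEFT JOIN t_second s ON t.OTHERID = s.ID
    WHERE f.ID IS NULL AND s.ID IS NULL
    ORDER BY t.ID
""")]

# Delete the orphans using the same condition, then count what is left.
conn.execute("""
    DELETE FROM t_third
    WHERE NOT EXISTS (SELECT 1 FROM t_first f WHERE f.ID = t_third.OTHERID)
      AND NOT EXISTS (SELECT 1 FROM t_second s WHERE s.ID = t_third.OTHERID)
""")
remaining = conn.execute("SELECT COUNT(*) FROM t_third").fetchone()[0]

print(not_exists_ids, left_join_ids, remaining)
```

One caveat with the NOT IN variant: if either ID column can contain NULL, NOT IN against that set returns no rows at all, so NOT EXISTS is usually the safer choice there.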