I'm trying to select all the records in my database that don't exist in a subquery.
For some reason it returns nothing, even though the subquery returns 2000 or so rows on its own and the main query returns over 5000. I need all the records that aren't contained in the subquery.
SELECT ID
FROM PART
WHERE NOT ID IN
(
SELECT DOCUMENT_ID AS ID
FROM USER_DEF_FIELDS
WHERE PROGRAM_ID = 'VMPRTMNT' AND ID = 'UDF-0000029'
)
This is better written as a correlated NOT EXISTS subquery.
SELECT ID
FROM PART
WHERE NOT EXISTS
(
SELECT 1
FROM USER_DEF_FIELDS
WHERE PROGRAM_ID = 'VMPRTMNT'
AND ID = 'UDF-0000029'
AND DOCUMENT_ID = PART.ID
)
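One likely reason the NOT IN version returns nothing (an assumption, since the data isn't shown) is a NULL DOCUMENT_ID in the subquery: `x NOT IN (..., NULL)` never evaluates to true. A minimal sketch in Python with an in-memory SQLite database and made-up rows, comparing the two forms:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PART (ID TEXT);
    CREATE TABLE USER_DEF_FIELDS (DOCUMENT_ID TEXT, PROGRAM_ID TEXT, ID TEXT);
    INSERT INTO PART VALUES ('P1'), ('P2'), ('P3');
    -- Hypothetical rows: one matching part and one NULL DOCUMENT_ID.
    INSERT INTO USER_DEF_FIELDS VALUES
        ('P1', 'VMPRTMNT', 'UDF-0000029'),
        (NULL, 'VMPRTMNT', 'UDF-0000029');
""")

# NOT IN: the NULL in the subquery result makes every comparison unknown.
not_in_rows = conn.execute("""
    SELECT ID FROM PART
    WHERE ID NOT IN (
        SELECT DOCUMENT_ID FROM USER_DEF_FIELDS
        WHERE PROGRAM_ID = 'VMPRTMNT' AND ID = 'UDF-0000029'
    )
    ORDER BY ID
""").fetchall()

# NOT EXISTS: only checks whether a matching row exists, so NULLs do no harm.
not_exists_rows = conn.execute("""
    SELECT ID FROM PART
    WHERE NOT EXISTS (
        SELECT 1 FROM USER_DEF_FIELDS AS u
        WHERE u.PROGRAM_ID = 'VMPRTMNT'
          AND u.ID = 'UDF-0000029'
          AND u.DOCUMENT_ID = PART.ID
    )
    ORDER BY ID
""").fetchall()

print(not_in_rows)      # no rows at all
print(not_exists_rows)  # the parts without a matching row
```

If the real table has no NULL DOCUMENT_ID values, the cause lies elsewhere, but NOT EXISTS is still the safer form.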
Related
I have a stored procedure in BigQuery whose resulting table contains 2 rows that are not exact duplicates, and I want to filter out one of them based on a condition.
WITH DupCodes AS (
SELECT AccCode
FROM table
GROUP BY AccCode
HAVING COUNT(*) > 1
)
SELECT *
FROM table
WHERE (AccCode IN (SELECT AccCode FROM DupCodes) AND AccountName IS NOT NULL)
OR (AccCode NOT IN (SELECT AccCode FROM DupCodes))
One method uses not exists logic:
select t.*
from t
where t.accountname is not null or
not exists (select 1
from t t2
where t2.accCode = t.accCode and t2.accountname is not null
);
That is, show all rows where accountname is not empty. Then show empty rows only when there is no non-empty accountname for the same accCode.
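A small sketch of that logic, run from Python against an in-memory SQLite database. The table name `t` follows the answer; the three sample rows are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (accCode TEXT, accountname TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("A1", "Alpha"),  # duplicated code, named row: keep
        ("A1", None),     # duplicated code, empty row: drop
        ("B2", None),     # only row for this code: keep even though empty
    ],
)

kept = conn.execute("""
    SELECT t.accCode, t.accountname
    FROM t
    WHERE t.accountname IS NOT NULL
       OR NOT EXISTS (SELECT 1
                      FROM t t2
                      WHERE t2.accCode = t.accCode
                        AND t2.accountname IS NOT NULL)
    ORDER BY t.accCode
""").fetchall()

print(kept)
```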
I have this table called item:
| PERSON_id | ITEM_id |
|-----------|---------|
| CP2       | A03     |
| CP2       | A02     |
| HB3       | A02     |
| BW4       | A01     |
I need an SQL statement that would output the person with the most Items. Not really sure where to start either.
I advise you to use an inner query for this. The inner query does the GROUP BY and ORDER BY, and the outer query selects the first row, which is the person with the most items.
SELECT * FROM
(
SELECT PERSON_ID, COUNT(*) FROM TABLE1
GROUP BY PERSON_ID
ORDER BY 2 DESC
)
WHERE ROWNUM = 1
Here is the SQL Fiddle link: http://sqlfiddle.com/#!4/4c4228/5
Locating the maximum of an aggregated column requires more than a single calculation, so here you can use a "common table expression" (cte) to hold the result and then re-use that result in a where clause:
with cte as (
select
person_id
, count(item_id) count_items
from mytable
group by
person_id
)
select
*
from cte
where count_items = (select max(count_items) from cte)
Note: if more than one person shares the same maximum count, this query returns more than one row.
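To see both behaviours, here is a sketch that runs the CTE query from Python against an in-memory SQLite database. The first four rows are the ones from the question; the extra `('HB3', 'A05')` row is hypothetical and only added to force a tie:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (person_id TEXT, item_id TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("CP2", "A03"), ("CP2", "A02"), ("HB3", "A02"), ("BW4", "A01")],
)

TOP_SQL = """
    WITH cte AS (
        SELECT person_id, COUNT(item_id) AS count_items
        FROM mytable
        GROUP BY person_id
    )
    SELECT person_id, count_items
    FROM cte
    WHERE count_items = (SELECT MAX(count_items) FROM cte)
    ORDER BY person_id
"""

# With the question's data there is a single person at the maximum.
single_winner = conn.execute(TOP_SQL).fetchall()

# Hypothetical extra row: HB3 now also has two items, so two rows come back.
conn.execute("INSERT INTO mytable VALUES ('HB3', 'A05')")
tied_winners = conn.execute(TOP_SQL).fetchall()

print(single_winner)
print(tied_winners)
```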
The following query returns the results that I need, but I have to add the ID of each row so that I can then update it. If I add the ID directly to the SELECT list, it returns more rows than I need: each ID is unique, so DISTINCT treats every line as unique.
SELECT DISTINCT ucpse.MemberID, ucpse.ProductID, ucpse.UserID
FROM UserCustomerProductSalaryExceptions as ucpse
WHERE EXISTS (SELECT NULL
FROM UserCustomerProductSalaryExceptions as upcse2
WHERE ucpse.userid = upcse2.userid AND ucpse.MemberID = upcse2.MemberID AND ucpse.ProductID = upcse2.ProductID
GROUP BY upcse2.UserID, upcse2.memberid, upcse2.productid
HAVING COUNT(UserID) >= 2
)
So basically I need to add ucpse.ID in the Select statement while keeping DISTINCT values for MemberID,ProductID and UserID.
Any Ideas ?
Thank you
According to your comment:
If the data has been duplicated 67 times for a given employee with a given product and a given client, I need to keep only one of those records. It's not important which one, so this is why I use DISTINCT to obtain unique combinations of a given employee with a given product and a given client.
You can use MIN() or MAX() and GROUP BY instead of DISTINCT:
SELECT MAX(ucpse.ID) AS ID, ucpse.MemberID, ucpse.ProductID, ucpse.UserID
FROM UserCustomerProductSalaryExceptions as ucpse
WHERE EXISTS (SELECT NULL
FROM UserCustomerProductSalaryExceptions as upcse2
WHERE ucpse.userid = upcse2.userid AND ucpse.MemberID = upcse2.MemberID AND ucpse.ProductID = upcse2.ProductID
GROUP BY upcse2.UserID, upcse2.memberid, upcse2.productid
HAVING COUNT(UserID) >= 2
)
GROUP BY ucpse.MemberID, ucpse.ProductID, ucpse.UserID
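As a quick check of what this returns, here is a sketch that runs the same query from Python against an in-memory SQLite database. The four rows are invented: IDs 1 to 3 share one MemberID/ProductID/UserID combination and ID 4 is unique. Column references are fully qualified here to avoid any ambiguity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE UserCustomerProductSalaryExceptions (
        ID INTEGER PRIMARY KEY, MemberID INTEGER, ProductID INTEGER, UserID INTEGER
    )
""")
conn.executemany(
    "INSERT INTO UserCustomerProductSalaryExceptions VALUES (?, ?, ?, ?)",
    [(1, 10, 20, 30), (2, 10, 20, 30), (3, 10, 20, 30), (4, 11, 20, 30)],
)

# One row per duplicated combination, carrying the highest ID of the group.
rows = conn.execute("""
    SELECT MAX(ucpse.ID) AS ID, ucpse.MemberID, ucpse.ProductID, ucpse.UserID
    FROM UserCustomerProductSalaryExceptions AS ucpse
    WHERE EXISTS (SELECT NULL
                  FROM UserCustomerProductSalaryExceptions AS ucpse2
                  WHERE ucpse.UserID = ucpse2.UserID
                    AND ucpse.MemberID = ucpse2.MemberID
                    AND ucpse.ProductID = ucpse2.ProductID
                  GROUP BY ucpse2.UserID, ucpse2.MemberID, ucpse2.ProductID
                  HAVING COUNT(ucpse2.UserID) >= 2)
    GROUP BY ucpse.MemberID, ucpse.ProductID, ucpse.UserID
""").fetchall()

print(rows)  # the unique combination (ID 4) is not listed
```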
UPDATE:
From your comments, I think the query below is what you need:
DELETE FROM UserCustomerProductSalaryExceptions
WHERE ID NOT IN ( SELECT MAX(ucpse.ID) AS ID
FROM UserCustomerProductSalaryExceptions AS ucpse
GROUP BY ucpse.MemberID, ucpse.ProductID, ucpse.UserID
-- no HAVING COUNT(...) >= 2 here: it would also delete rows that have no duplicates
)
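A word of caution on this delete pattern: the NOT IN list has to contain one ID for every group, including groups with a single row. The sketch below (Python with an in-memory SQLite database, invented rows, hypothetical helper `ids_after_delete`) compares the subquery with and without a `HAVING COUNT(...) >= 2` filter:

```python
import sqlite3

# IDs 1-3 are duplicates of each other; ID 4 has no duplicate.
ROWS = [(1, 10, 20, 30), (2, 10, 20, 30), (3, 10, 20, 30), (4, 11, 20, 30)]

def ids_after_delete(having_clause):
    """Run the NOT IN delete on a fresh table and return the surviving IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE UserCustomerProductSalaryExceptions (
            ID INTEGER PRIMARY KEY, MemberID INTEGER, ProductID INTEGER, UserID INTEGER
        )
    """)
    conn.executemany(
        "INSERT INTO UserCustomerProductSalaryExceptions VALUES (?, ?, ?, ?)", ROWS
    )
    conn.execute(f"""
        DELETE FROM UserCustomerProductSalaryExceptions
        WHERE ID NOT IN (SELECT MAX(u.ID)
                         FROM UserCustomerProductSalaryExceptions AS u
                         GROUP BY u.MemberID, u.ProductID, u.UserID
                         {having_clause})
    """)
    query = "SELECT ID FROM UserCustomerProductSalaryExceptions ORDER BY ID"
    return [row[0] for row in conn.execute(query)]

without_having = ids_after_delete("")
with_having = ids_after_delete("HAVING COUNT(u.ID) >= 2")

print(without_having)  # one row per combination survives
print(with_having)     # the non-duplicated row is deleted as well
```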
If all you want is to delete the duplicates, this will do it:
WITH X AS
(SELECT ID,
ROW_NUMBER() OVER (PARTITION BY MemberID, ProductID, UserID ORDER BY ID) AS DupRowNum
FROM UserCustomerProductSalaryExceptions
)
DELETE X WHERE DupRowNum > 1
ID's not necessary - try:
UPDATE uu SET
<your settings here>
FROM UserCustomerProductSalaryExceptions uu
JOIN ( <paste your entire query above here>
) uc ON uc.MemberID=uu.MemberId AND uc.ProductID=uu.ProductId AND uc.UserID=uu.UserId
From the sound of your data structure (which I would STRONGLY advise normalizing as soon as possible), it sounds like you should be updating all the records. It sounds as if each duplicate is important because it contains some information about an employee's relation to a customer or product.
I would probably update all the records. Try this:
UPDATE UCPSE
SET
--Do your updates here
FROM UserCustomerProductSalaryExceptions as ucpse
JOIN
(
SELECT UserID, MemberID, ProductID
FROM UserCustomerProductSalaryExceptions
GROUP BY UserID, MemberID, ProductID
HAVING COUNT(UserID) >= 2
) T
ON ucpse.UserID = T.UserID AND ucpse.MemberID = T.MemberID AND ucpse.ProductID = T.ProductID
Basically my select statement returns below:
ID Status
100 1
100 2
101 1
What I'm looking for is to return an ID that has status 1, but if the same ID also has a row with status 2, both rows should be excluded.
In short, the result should be:
ID Status
101 1
Thanks in advance !
The following query returns ID values that occur only once.
SELECT ID
FROM t
GROUP BY ID
HAVING COUNT(*) = 1
It should be sufficient for the sample data you provided. If there are other cases then let me know.
You'll need a subquery with NOT IN here.
The following works if the status column has an INT datatype:
SELECT *
FROM table
WHERE status = 1
AND ID NOT IN (
SELECT ID
FROM table
WHERE status = 2
);
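The NOT IN query and the earlier `HAVING COUNT(*) = 1` query agree on the sample data but not in general. A sketch in Python with an in-memory SQLite database shows the difference; the table name `t` and the extra row `(102, 2)` are assumptions added for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER, Status INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(100, 1), (100, 2), (101, 1), (102, 2)],  # (102, 2) is hypothetical
)

# Status-based filter: IDs with status 1 that never have status 2.
not_in_result = conn.execute("""
    SELECT ID, Status
    FROM t
    WHERE Status = 1
      AND ID NOT IN (SELECT ID FROM t WHERE Status = 2)
    ORDER BY ID
""").fetchall()

# Count-based filter: IDs that occur exactly once, whatever their status.
count_result = conn.execute("""
    SELECT ID
    FROM t
    GROUP BY ID
    HAVING COUNT(*) = 1
    ORDER BY ID
""").fetchall()

print(not_in_result)
print(count_result)  # also includes the ID whose only row has status 2
```

Which of the two is right depends on whether an ID with only a status 2 row should appear in the result.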
A more generic query, which excludes every ID that has multiple rows rather than checking particular status values:
select ID
from table where ID NOT IN
(select ID from table GROUP BY ID HAVING count(Status) > 1)
/* Subquery will fetch ID's having multiple entries*/
The CTE 'IDs' retrieves all IDs that have a single record in the table. It is then joined back to the original table to return the result as a pair (ID, Status):
;with IDs as
(
select ID
from yourtable
group by ID
having count(*) = 1
)
select i.ID, y.Status
from yourtable y
inner join IDs i on y.ID = i.ID
order by i.ID
My goal is to delete all records from my table that are NOT the MAX(recordDate) of a grouped CaseKey. So if I have 9 records made up of 3 CaseKeys with 3 dates each, I'd delete the 2 lower dates of each set and end up with 3 records in total: only the MAX(recordDate) row for each CaseKey.
I have the following SQL Query:
DELETE FROM table
WHERE tableID NOT IN (
SELECT tableID
FROM (
Select MAX(recordDate) As myDate, tableID From table
Group By CaseKey
) As foo
)
I receive the error:
Error on Line 3... Column 'table.tableID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Obviously I could add tableID to my Group By clause, but then the result of that statement is incorrect and returns all rows instead of just returning the MAX recordDate of the grouped CaseKeys.
Server is down right now, but the apparent answer is: (tiny tweak from WildPlasser's answer)
DELETE zt FROM ztable zt
WHERE EXISTS (
SELECT * FROM ztable ex
WHERE ex.CaseKey = zt.CaseKey
AND ex.recordDate > zt.recordDate
);
In other words, for each record in zt, run a query to see whether the same CaseKey also has a record with a higher recordDate. If so, the WHERE EXISTS condition passes and the record is deleted; otherwise the condition fails and the record is kept, because it holds the MAX recordDate for its CaseKey.
Thank you, WildPlasser, for that simplistic methodology that I was somehow blowing up.
There is one special property of MAX: there is no record with a higher value than max. So we can delete all the records for which a record with the same CaseKey, but with a higher recordDate exists:
DELETE FROM ztable zt
WHERE EXISTS (
SELECT *
FROM ztable ex
WHERE ex.CaseKey = zt.CaseKey
AND ex.recordDate > zt.recordDate
);
BTW: The above query (as well as the MAX() version) assumes that there is only one record with the maximum date. There could be ties.
In the case of ties, you'll need to add an extra field to the WHERE clause as a tie-breaker. Assuming that TableId can function as such, the query would become:
DELETE FROM ztable zt
WHERE EXISTS (
SELECT *
FROM ztable ex
WHERE ex.CaseKey = zt.CaseKey
AND ( ex.recordDate > zt.recordDate
OR (ex.recordDate = zt.recordDate AND ex.TableId > zt.TableId)
)
);
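The effect of the tie-breaker can be checked with a small sketch in Python using an in-memory SQLite database. The rows and the helper `surviving_ids` are invented for illustration, and the DELETE is written without a target alias so that it runs on any SQLite version:

```python
import sqlite3

# Rows 2 and 3 tie on the latest date for CaseKey 'A' (hypothetical data).
ROWS = [
    (1, "A", "2020-01-01"),
    (2, "A", "2020-02-01"),
    (3, "A", "2020-02-01"),
    (4, "B", "2020-03-01"),
]

def surviving_ids(delete_sql):
    """Load ROWS into a fresh ztable, run the delete, return remaining TableIds."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ztable (TableId INTEGER PRIMARY KEY, CaseKey TEXT, recordDate TEXT)"
    )
    conn.executemany("INSERT INTO ztable VALUES (?, ?, ?)", ROWS)
    conn.execute(delete_sql)
    return [r[0] for r in conn.execute("SELECT TableId FROM ztable ORDER BY TableId")]

# Plain version: a row goes only if a strictly later date exists for its CaseKey.
plain = surviving_ids("""
    DELETE FROM ztable
    WHERE EXISTS (SELECT 1 FROM ztable AS ex
                  WHERE ex.CaseKey = ztable.CaseKey
                    AND ex.recordDate > ztable.recordDate)
""")

# Tie-breaker version: among equal dates, only the highest TableId survives.
tie_broken = surviving_ids("""
    DELETE FROM ztable
    WHERE EXISTS (SELECT 1 FROM ztable AS ex
                  WHERE ex.CaseKey = ztable.CaseKey
                    AND (ex.recordDate > ztable.recordDate
                         OR (ex.recordDate = ztable.recordDate
                             AND ex.TableId > ztable.TableId)))
""")

print(plain)       # both tied rows remain
print(tie_broken)  # exactly one row per CaseKey
```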
Just express "delete all records from my table that are NOT the MAX(recordDate) of a grouped CaseKey" in SQL as:
DELETE FROM table t1
WHERE t1.recordDate <>
(SELECT MAX(recordDate)
FROM table t2
WHERE t2.CaseKey = t1.CaseKey)
You can rank the records within each caseKey by recordDate, newest first, and select those with rank > 1, which are exactly the lower dates. That way you can still use your tableID in the DELETE.
DELETE FROM [table]
WHERE [tableID] IN
(SELECT
[sub].[tableID]
FROM
(
SELECT
[tableID],
Rank() OVER (PARTITION BY [caseKey] ORDER BY [recordDate] DESC, [tableID] DESC) AS [rank]
FROM [table]
) AS [sub]
WHERE [sub].[rank] > 1)
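A sketch of the ranking approach in Python with an in-memory SQLite database. It assumes SQLite 3.25 or newer (needed for window functions), uses a hypothetical table name `cases` because `table` is a reserved word, and runs on made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cases (tableID INTEGER PRIMARY KEY, caseKey TEXT, recordDate TEXT)"
)
conn.executemany(
    "INSERT INTO cases VALUES (?, ?, ?)",
    [
        (1, "A", "2020-01-01"),
        (2, "A", "2020-02-01"),
        (3, "A", "2020-02-01"),  # ties with row 2; the higher tableID ranks first
        (4, "B", "2020-03-01"),
    ],
)

# Rank rows per caseKey, newest first, then delete everything ranked below 1.
conn.execute("""
    DELETE FROM cases
    WHERE tableID IN (
        SELECT sub.tableID
        FROM (
            SELECT tableID,
                   RANK() OVER (PARTITION BY caseKey
                                ORDER BY recordDate DESC, tableID DESC) AS rnk
            FROM cases
        ) AS sub
        WHERE sub.rnk > 1
    )
""")

remaining = [r[0] for r in conn.execute("SELECT tableID FROM cases ORDER BY tableID")]
print(remaining)
```

Because tableID is unique, the ORDER BY never produces equal ranks, so exactly one row per caseKey is kept.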