How can I query for the distinct field1 instances that have multiple distinct corresponding field2 values?
field1 field2
a apple
b grape
c banana
b orange
a apple
In this example I want to return "b", since there are at least 2 distinct values (grape and orange) for field2 that correspond to it. I don't want "a", since only 1 unique field2 value corresponds to it, "apple".
I have tried
with all_unique_combos as (
select distinct field1, field2
from table
)
select field1
from all_unique_combos
group by field1
having count(field2) > 1
I actually think this is right and would give me what I need. But at the moment it's returning 0 rows so I kinda need a sanity check. Thanks for any input either way.
You can use aggregation:
select field1
from t
group by field1
having min(field2) <> max(field2);
A straight-forward approach uses group by and having:
select field1
from mytable
group by field1
having min(field2) <> max(field2)
Using COUNT(DISTINCT ...):
select field1
from tab
group by field1
having count(distinct field2) > 1
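As a sanity check, here is a minimal sketch that runs the question's CTE and both suggested HAVING forms against the sample data. It assumes SQLite via Python's sqlite3 module as a stand-in for the real database, and the table name t is only illustrative.

```python
import sqlite3

# In-memory SQLite database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field1 TEXT, field2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", "apple"), ("b", "grape"), ("c", "banana"),
     ("b", "orange"), ("a", "apple")],
)

# The question's approach: de-duplicate first, then count per field1.
cte = [r[0] for r in conn.execute("""
    WITH all_unique_combos AS (SELECT DISTINCT field1, field2 FROM t)
    SELECT field1 FROM all_unique_combos
    GROUP BY field1
    HAVING COUNT(field2) > 1
""")]

# Answer 1: the smallest and largest field2 differ only if there are 2+ values.
min_max = [r[0] for r in conn.execute(
    "SELECT field1 FROM t GROUP BY field1 HAVING MIN(field2) <> MAX(field2)")]

# Answer 2: count the distinct field2 values directly.
count_distinct = [r[0] for r in conn.execute(
    "SELECT field1 FROM t GROUP BY field1 HAVING COUNT(DISTINCT field2) > 1")]

print(cte, min_max, count_distinct)
```

On this data all three queries return only "b", so if the original query returns 0 rows, the cause is more likely in the real data or table than in the query's logic.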
I have a table from which I need to extract some information. This table has an Oracle Spatial (MDSYS.SDO_GEOMETRY) column, from which I also need some data.
I started out with a simple query like this:
select id, field1, field2
FROM my_table;
After that, I was able to loop over the result to extract the data that was in the spatial column:
SELECT *
FROM TABLE (SELECT a.POSITIONMAP.sdo_ordinates
FROM my_table a
WHERE ID = 18742084);
The POSITIONMAP.sdo_ordinates seems to usually hold 4 values, like these:
100050,887
407294,948
0,577464740471056
-0,816415625470689
I need the last 2 values. I can achieve that by changing the query into this:
SELECT * FROM
(SELECT rownum AS num,
column_value AS orientatie
FROM TABLE (SELECT a.POSITIONMAP.sdo_ordinates
FROM my_table a
WHERE ID = 18742084))
WHERE num IN (3,4)
Looping over every row from my first query to extract the data from the POSITIONMAP column is of course not very performance friendly, so my query becomes slow very quickly.
I would like to retrieve all information in one query, but there are a few things that prevent me from doing so.
Not every row in the table has data in POSITIONMAP.
Some rows do have data in POSITIONMAP, but they only contain 2 values (so not the 3rd and 4th values that I am looking for).
I need the data in one row for every row in the table (using the previous query would result in duplicate rows).
The closest I got is:
select
id,
field1,
field2,
t.*
FROM my_table v,
table (v.POSITIONMAP.sdo_ordinates) t
This gives me 4 rows for every row in my_table.
As soon as I try to put the rownum condition into this query, I get an error: "invalid user.table.column, table.column, or column specification"
Is there any way to combine what I want to do into 1 query?
You can use sdo_util.getvertices as follows:
select t.x,t.y
from my_table mt
,table(sdo_util.getvertices(mt.positionmap)) t
where t.id = 2
I'm assuming that your geometries are lines (gtype=2002) and points (gtype=2001). If you want X,Y values for lines and empty values for points, you can filter on the sdo_gtype property of the geometry object.
select t.x,t.y
from my_table mt
,table(sdo_util.getvertices(mt.positionmap)) t
where t.id = 2
and mt.positionmap.sdo_gtype=2002
union all
select null as X,
null as Y
from my_table mt
where mt.positionmap.sdo_gtype=2001
One method is to use the ROW_NUMBER() analytic function:
SELECT *
FROM (
select id,
field1,
field2,
t.*,
ROW_NUMBER() OVER ( PARTITION BY v.id ORDER BY ROWNUM ) AS rn
FROM my_table v,
TABLE( v.POSITIONMAP.sdo_ordinates ) t
)
WHERE rn IN ( 3, 4 )
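The ROW_NUMBER() query still returns two rows per id and drops rows that have no 3rd or 4th ordinate. One way to get a single row per table row is a LEFT JOIN combined with conditional aggregation. The sketch below only illustrates that idea and is not Oracle syntax: it assumes SQLite through Python's sqlite3, and the ordinates table (id, pos, val) is a hypothetical stand-in for the unnested sdo_ordinates values with their position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (id INTEGER PRIMARY KEY, field1 TEXT);
-- Hypothetical stand-in for the unnested ordinates: one row per value,
-- with pos giving its position inside the array.
CREATE TABLE ordinates (id INTEGER, pos INTEGER, val REAL);
INSERT INTO my_table VALUES (1, 'four values'), (2, 'two values'), (3, 'no geometry');
INSERT INTO ordinates VALUES
    (1, 1, 100.0), (1, 2, 200.0), (1, 3, 0.5), (1, 4, -0.75),
    (2, 1, 300.0), (2, 2, 400.0);
""")

# LEFT JOIN keeps rows without ordinates; MAX(CASE ...) pivots positions
# 3 and 4 into two columns, yielding NULL when a position is missing.
rows = conn.execute("""
    SELECT m.id, m.field1,
           MAX(CASE WHEN o.pos = 3 THEN o.val END) AS third,
           MAX(CASE WHEN o.pos = 4 THEN o.val END) AS fourth
    FROM my_table m
    LEFT JOIN ordinates o ON o.id = m.id
    GROUP BY m.id, m.field1
    ORDER BY m.id
""").fetchall()

for row in rows:
    print(row)
```

Every row of my_table appears exactly once, with NULL in both columns when the geometry is missing or holds only 2 values.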
Can you help me with SQL statements to find duplicates on multiple fields?
For example, in pseudo code:
select count(field1,field2,field3)
from table
where the combination of field1, field2, field3 occurs multiple times
and from the above statement if there are multiple occurrences I would like to select every record except the first one.
To get the list of field combinations for which there are multiple records, you can use:
select field1,field2,field3, count(*)
from table_name
group by field1,field2,field3
having count(*) > 1
Check this link for more information on how to delete the rows.
http://support.microsoft.com/kb/139444
There should be a criterion for deciding how you define "first rows" before you use the approach in the link above. Based on that you'll need to use an order by clause and a sub query if needed. If you can post some sample data, it would really help.
You mention "the first one", so I assume that you have some kind of ordering on your data. Let's assume that your data is ordered by some field ID.
This SQL should get you the duplicate entries except for the first one. It basically selects all rows for which another row with (a) the same fields and (b) a lower ID exists. Performance won't be great, but it might solve your problem.
SELECT A.ID, A.field1, A.field2, A.field3
FROM myTable A
WHERE EXISTS (SELECT B.ID
FROM myTable B
WHERE B.field1 = A.field1
AND B.field2 = A.field2
AND B.field3 = A.field3
AND B.ID < A.ID)
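A small sketch of both steps, assuming SQLite via Python's sqlite3 and a few made-up rows: first list the duplicated combinations with GROUP BY and HAVING, then use the EXISTS query to pick every duplicate except the one with the lowest ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (ID INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT, field3 TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?)", [
    (1, "x", "y", "z"),
    (2, "x", "y", "z"),   # duplicate of ID 1
    (3, "p", "q", "r"),   # unique
    (4, "x", "y", "z"),   # duplicate of ID 1
])

# Step 1: which (field1, field2, field3) combinations occur more than once?
dup_groups = conn.execute("""
    SELECT field1, field2, field3, COUNT(*)
    FROM myTable
    GROUP BY field1, field2, field3
    HAVING COUNT(*) > 1
""").fetchall()

# Step 2: rows for which an identical row with a lower ID exists,
# i.e. every duplicate except the first one.
later_ids = [r[0] for r in conn.execute("""
    SELECT A.ID
    FROM myTable A
    WHERE EXISTS (SELECT 1
                  FROM myTable B
                  WHERE B.field1 = A.field1
                    AND B.field2 = A.field2
                    AND B.field3 = A.field3
                    AND B.ID < A.ID)
    ORDER BY A.ID
""")]

print(dup_groups, later_ids)
```

Note that the equality comparisons never match NULL values, so rows with NULL in any of the three fields would not be reported by the EXISTS form.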
This is a fun solution with SQL Server 2005 that I like. I'm going to assume that by "for every record except for the first one", you mean that there is another "id" column that we can use to identify which row is "first".
SELECT id
, field1
, field2
, field3
FROM
(
SELECT id
, field1
, field2
, field3
, RANK() OVER (PARTITION BY field1, field2, field3 ORDER BY id ASC) AS [rank]
FROM table_name
) a
WHERE [rank] > 1
To see duplicate values:
with MYCTE as (
select row_number() over ( partition by name order by name) rown, *
from tmptest
)
select * from MYCTE where rown > 1
If you're using SQL Server 2005 or later (and the tags for your question indicate SQL Server 2008), you can use ranking functions to return the duplicate records after the first one if using joins is less desirable or impractical for some reason. The following example shows this in action, where it also works with null values in the columns examined.
create table Table1 (
Field1 int,
Field2 int,
Field3 int,
Field4 int
)
insert Table1
values (1,1,1,1)
, (1,1,1,2)
, (1,1,1,3)
, (2,2,2,1)
, (3,3,3,1)
, (3,3,3,2)
, (null, null, 2, 1)
, (null, null, 2, 3)
select *
from (select Field1
, Field2
, Field3
, Field4
, row_number() over (partition by Field1
, Field2
, Field3
order by Field4) as occurrence
from Table1) x
where occurrence > 1
Notice after running this example that the first record out of every "group" is excluded, and that records with null values are handled properly.
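The same example can be tried outside SQL Server. This sketch assumes SQLite 3.25 or later (for window function support) through Python's sqlite3; the outer ORDER BY is added only to make the printed order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (Field1 INT, Field2 INT, Field3 INT, Field4 INT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 1, 3),
    (2, 2, 2, 1),
    (3, 3, 3, 1), (3, 3, 3, 2),
    (None, None, 2, 1), (None, None, 2, 3),
])

# Number the rows inside each (Field1, Field2, Field3) group by Field4
# and keep everything after the first row of each group.
repeats = conn.execute("""
    SELECT Field1, Field2, Field3, Field4
    FROM (SELECT Field1, Field2, Field3, Field4,
                 ROW_NUMBER() OVER (PARTITION BY Field1, Field2, Field3
                                    ORDER BY Field4) AS occurrence
          FROM Table1) x
    WHERE occurrence > 1
    ORDER BY Field3, Field4
""").fetchall()

for row in repeats:
    print(row)
```

PARTITION BY places the two rows with NULL in Field1 and Field2 in the same group, which is why the (NULL, NULL, 2, 3) row is reported as a repeat.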
If you don't have a column available to order the records within a group, you can use the partition-by columns as the order-by columns.
CREATE TABLE #tmp
(
sizeId Varchar(MAX)
)
INSERT #tmp
VALUES ('44'),
('44,45,46'),
('44,45,46'),
('44,45,46'),
('44,45,46'),
('44,45,46'),
('44,45,46')
SELECT * FROM #tmp
DECLARE @SqlStr VARCHAR(MAX)
SELECT @SqlStr = STUFF((SELECT ',' + sizeId
FROM #tmp
ORDER BY sizeId
FOR XML PATH('')), 1, 1, '')
SELECT TOP 1 * FROM (
select items, count(*) AS Occurrence
FROM dbo.Split(@SqlStr,',')
group by items
having count(*) > 1
)K
ORDER BY K.Occurrence DESC
Try this query to find duplicate records on multiple fields
SELECT a.column1, a.column2
FROM dbo.a a
JOIN (SELECT column1,
column2, count(*) as countC
FROM dbo.a
GROUP BY column1, column2
HAVING count(*) > 1 ) b
ON a.column1 = b.column1
AND a.column2 = b.column2
You can also try this query to count the distinct values of a column and order the result by the column you want:
select field1, field2, field3, count(distinct (field2))
from table_name
group by field1, field2, field3
having count(field2) > 1
order by field2;
Try this query to get a separate count for each selected column:
select field1, count(field1) as field1Count, field2, count(field2) as field2Counts, field3, count(field3) as field3Counts
from table_name
group by field1, field2, field3
having count(*) > 1
I would like to run a SQL statement that will delete ONLY the second value, for example:
delete from table1 where condition1
I want this statement to delete ONLY the second value.
How can I accomplish this?
To clarify: I have a field called field1, which is an autonumber primary key that increments. I would like to delete the record containing the greater number.
You could also employ the ROW_NUMBER() function of SQL Server to number each row, and use this number to isolate just the second item for deletion, according to your own custom ordering in the inner query ( over (ORDER BY <myKey> asc) ). This provides a great deal of flexibility.
DELETE a
FROM table1 a
JOIN (
select ROW_NUMBER() over (ORDER BY <myKey> asc) as AutoNumber, <myKey> from table1
) b on a.<myKey> = b.<myKey>
WHERE condition1
AND b.AutoNumber = 2
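A rough sketch of the same idea, assuming SQLite 3.25 or later via Python's sqlite3. SQLite has no DELETE ... JOIN, so a subquery is swapped in, and the name = 'dup' filter is a hypothetical stand-in for condition1. Unlike the T-SQL above, this version numbers only the rows that match the condition, so "second" means the second matching row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (field1 INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(1, "dup"), (2, "other"), (3, "dup"), (4, "dup")])

# Number the matching rows by key and delete only the one numbered 2.
conn.execute("""
    DELETE FROM table1
    WHERE field1 IN (
        SELECT field1
        FROM (SELECT field1,
                     ROW_NUMBER() OVER (ORDER BY field1) AS rn
              FROM table1
              WHERE name = 'dup')
        WHERE rn = 2
    )
""")

remaining_ids = [r[0] for r in
                 conn.execute("SELECT field1 FROM table1 ORDER BY field1")]
print(remaining_ids)
```

The matching rows are 1, 3 and 4, so only row 3 is removed.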
Do you want to delete only the last duplicate, or all but the first?
For all but the first: (Edited to use CTE per @Martin's suggestion.)
with target as (select * from table1 where condition1)
delete from target goner
where exists (select * from target keeper
where keeper.field1 < goner.field1)
In other words, if there is another matching record with a lower field1, delete this record.
EDIT:
To delete only the last:
with target as (select * from table1 where condition1)
delete from target goner
where exists (select * from target keeper
where keeper.field1 < goner.field1)
and not exists (select * from target missing
where missing.field1 > goner.field1)
In other words, if there is another matching record with a lower field1, AND there is no matching record with a higher field1, then we have the highest duplicate, so nuke it.
It's been a while (so my syntax may not be quite right), and this may not be the best solution, but the "academic" answer would be something like:
delete from table1 where condition1
and field1 = (select max(field1) from table1 where condition1)
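To check this form quickly, here is a minimal sketch assuming SQLite via Python's sqlite3, with name = 'dup' as a hypothetical stand-in for condition1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (field1 INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(1, "dup"), (2, "other"), (3, "dup")])

# Among the rows matching the condition, delete only the one
# with the greatest autonumber value.
conn.execute("""
    DELETE FROM table1
    WHERE name = 'dup'
      AND field1 = (SELECT MAX(field1) FROM table1 WHERE name = 'dup')
""")

remaining = conn.execute(
    "SELECT field1, name FROM table1 ORDER BY field1").fetchall()
print(remaining)
```

If only one row matches the condition, this still deletes it, so a guard such as a count check may be needed when "second" must really mean a duplicate.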
Try this:
DELETE MyTable
FROM MyTable
LEFT OUTER JOIN (
SELECT MIN(id) as id, Col1, Col2, Col3
FROM MyTable
GROUP BY Col1, Col2, Col3
) as KeepRows ON
MyTable.id= KeepRows.id
WHERE
KeepRows.id IS NULL
UPDATE
While this might not be as "pretty" as @Jeffrey's, it works. From what I can tell, @Jeffrey's does not. See the SQL below (DELETE replaced with SELECT * for demonstration):
WITH TEMP as
(
SELECT 1 as id,'A' as a,'Z' as b
UNION
SELECT 2,'A','Z'
UNION
SELECT 3,'B','Z'
UNION
SELECT 4,'B','Z'
)
SELECT *
FROM TEMP
LEFT OUTER JOIN (
SELECT MIN(id) as id, a, b
FROM TEMP
GROUP BY a, b
) as KeepRows ON
temp.id= KeepRows.id
WHERE
KeepRows.id IS NULL
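The demonstration above can also be run end to end. This sketch assumes SQLite via Python's sqlite3. Because SQLite does not support DELETE with a join, the delete step swaps in an equivalent NOT IN subquery rather than the LEFT OUTER JOIN form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, a TEXT, b TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)",
                 [(1, "A", "Z"), (2, "A", "Z"), (3, "B", "Z"), (4, "B", "Z")])

# Rows that do NOT carry the minimum id of their (a, b) group:
# these are the ones the LEFT JOIN ... IS NULL pattern selects.
to_delete = [r[0] for r in conn.execute("""
    SELECT t.id
    FROM MyTable t
    LEFT OUTER JOIN (SELECT MIN(id) AS id, a, b
                     FROM MyTable
                     GROUP BY a, b) AS KeepRows
        ON t.id = KeepRows.id
    WHERE KeepRows.id IS NULL
    ORDER BY t.id
""")]

# Swapped-in delete for SQLite: keep only the minimum id per group.
conn.execute("""
    DELETE FROM MyTable
    WHERE id NOT IN (SELECT MIN(id) FROM MyTable GROUP BY a, b)
""")
remaining_ids = [r[0] for r in
                 conn.execute("SELECT id FROM MyTable ORDER BY id")]
print(to_delete, remaining_ids)
```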
Consider this data
PK field1 field2
1 a b
2 a (null)
3 x y
4 x z
5 q w
I need to get this data
select all columns from all rows where field1 has count >1
Which means the desired output is
PK field1 field2
1 a b
2 a (null)
3 x y
4 x z
I tried and finally settled for
select * from mytable where field1 in
(select field1 from mytable group by field1 having count(field1)>1 ) order by field1
but there has to be a better way than this
That's the way I would do it.
You could rewrite it with a join to the subquery instead of using in, but I doubt it would be any faster.
Edit: Ok, so for reference, the "join" method would go something like this:
select m.* from mytable m
join (
select field1 from mytable
group by field1
having count(field1)>1
) j on m.field1=j.field1
order by m.field1
And it seems it's worth testing to see if it's faster (thanks @binaryLV).
Another way, if using T-SQL:
;WITH T AS
(
SELECT PK, FIELD1, FIELD2, COUNT(FIELD1) OVER(PARTITION BY FIELD1) AS R
FROM mytable
)
SELECT PK, FIELD1, FIELD2
FROM T
WHERE R > 1
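To compare the IN subquery with the windowed count on the sample data, here is a small sketch assuming SQLite 3.25 or later via Python's sqlite3 (it says nothing about relative performance on a real database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (PK INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    (1, "a", "b"), (2, "a", None), (3, "x", "y"), (4, "x", "z"), (5, "q", "w"),
])

# Form 1: filter on the field1 values that occur more than once.
in_ids = [r[0] for r in conn.execute("""
    SELECT PK FROM mytable
    WHERE field1 IN (SELECT field1 FROM mytable
                     GROUP BY field1
                     HAVING COUNT(field1) > 1)
    ORDER BY PK
""")]

# Form 2: attach the per-field1 count to every row, then filter on it.
window_ids = [r[0] for r in conn.execute("""
    WITH T AS (
        SELECT PK, field1, field2,
               COUNT(field1) OVER (PARTITION BY field1) AS R
        FROM mytable
    )
    SELECT PK FROM T WHERE R > 1 ORDER BY PK
""")]

print(in_ids, window_ids)
```

Both forms return rows 1 to 4 here. The windowed version reads the table once, which may or may not be faster depending on the engine and indexes.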