sql query distinct on multiple columns - sql

I have this data and I am trying to find cases where different ids have duplicate data in field1, field2, field3 and field4:
id field1 field2 field3 field4
==== ====== ====== ===== =======
1 A B C D
2 A B C D
3 A A C B
4 A A C B
So, in whatever way possible, I want it to show me:
1 & 2 are duplicates
3 & 4 are duplicates

Instead of SELECT DISTINCT, select the fields together with a count of rows, group by those fields, and use HAVING to keep only the groups with more than one row, e.g.:
select field1
,field2
,field3
,field4
,count (*)
from foo
group by field1
,field2
,field3
,field4
having count (*) > 1
You can then join your original table back against the results of the query.
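The join back against the original table could look roughly like the sketch below. It is wrapped in Python's built-in sqlite3 module only so that it runs end to end; the table name foo comes from the query above, while the alias dup and the variable names are made up for illustration. Note that an equality join like this will not match rows where any of the four fields is NULL.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table foo (id integer primary key, "
    "field1 text, field2 text, field3 text, field4 text)"
)
conn.executemany(
    "insert into foo values (?, ?, ?, ?, ?)",
    [(1, "A", "B", "C", "D"), (2, "A", "B", "C", "D"),
     (3, "A", "A", "C", "B"), (4, "A", "A", "C", "B")],
)

# Join the table back against the duplicated field combinations,
# so every duplicate row is listed together with its id.
query = """
select f.id, f.field1, f.field2, f.field3, f.field4
from foo f
join (
    select field1, field2, field3, field4
    from foo
    group by field1, field2, field3, field4
    having count(*) > 1
) dup
  on  f.field1 = dup.field1
  and f.field2 = dup.field2
  and f.field3 = dup.field3
  and f.field4 = dup.field4
order by f.field1, f.field2, f.field3, f.field4, f.id
"""
duplicate_rows = conn.execute(query).fetchall()
for row in duplicate_rows:
    print(row)
```

Ordering by the four fields first keeps the ids of each duplicate group next to each other (3 and 4, then 1 and 2 for the sample data).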

One way to do this is to use having and group by
esben=# select * from test;
id | a | b | c | d
----+---+---+---+---
1 | 1 | 2 | 3 | 4
2 | 1 | 2 | 3 | 4
3 | 1 | 1 | 3 | 2
4 | 1 | 1 | 3 | 2
(4 rows)
esben=# select count(id),a,b,c,d from test group by a,b,c,d having count(id) >1;
count | a | b | c | d
-------+---+---+---+---
2 | 1 | 2 | 3 | 4
2 | 1 | 1 | 3 | 2
(2 rows)
This doesn't list the actual ids, though; without knowing the exact output you want, it is hard to say how to go about that.

SELECT *
FROM [TableName]
WHERE ID IN(SELECT MIN(ID)
FROM [TableName]
GROUP BY field1, field2, field3, field4)
This returns the full row for ids 1 and 3, i.e. one row (the lowest id) per distinct combination of the four fields.

Related

Select objects from Table A that are associated to objects in Table B ordered by matching count

This is an addition to this question:
I want to get all objects from table 1 that have associations to item 2 or item 3, ordered by the count of matching associations.
So the result should be a list like: object 2 (2 matches), object 3 (2 matches), object 1 (1 match).
What must the SQL look like to get the results ordered?
Table 1
id | title
---------------
1 | object 1
2 | object 2
3 | object 3
Table 2
id | title
---------------
1 | item 1
2 | item 2
3 | item 3
Table 3 (n-m association)
id | object_id | item_id
------------------------------
1 | 1 | 1
2 | 1 | 2
3 | 2 | 1
4 | 2 | 2
5 | 2 | 3
6 | 3 | 2
7 | 3 | 3
Join Table1 with Table3 and aggregate as follows:
select T1.id, T1.title, count(*) matches
from table1 T1 join table3 T3
on T1.id = T3.object_id
where T3.item_id in (2, 3)
group by T1.id, T1.title
order by matches desc, T1.id
See demo
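To check the ordering against the sample data, here is a small sketch using Python's built-in sqlite3 module (chosen only so the example is runnable; the question does not say which database is in use). It assumes the highest match counts should come first, as in the expected list from the question, so it sorts by matches in descending order.

```python
import sqlite3

# Sample tables from the question; table2 is not needed for the count.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table table1 (id integer primary key, title text);
create table table3 (id integer primary key, object_id integer, item_id integer);
insert into table1 values (1, 'object 1'), (2, 'object 2'), (3, 'object 3');
insert into table3 values
    (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 2, 2), (5, 2, 3), (6, 3, 2), (7, 3, 3);
""")

# Count the associations to item 2 or 3 per object, most matches first.
query = """
select T1.id, T1.title, count(*) as matches
from table1 T1 join table3 T3
  on T1.id = T3.object_id
where T3.item_id in (2, 3)
group by T1.id, T1.title
order by matches desc, T1.id
"""
ranked = conn.execute(query).fetchall()
for row in ranked:
    print(row)
```

Objects with no association to item 2 or 3 do not appear at all, because the inner join and the where clause drop them.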

Bigquery: Joining 2 tables one having repeated records and one with count ()

I want to join two tables after unnesting the arrays in Table:1, but the records are duplicated after the join because of the unnest.
Table:1
| a | d.b | d.c |
-----------------
| 1 | 5 | 2 |
- -------------
| | 3 | 1 |
-----------------
| 2 | 2 | 1 |
Table:2
| a | c | f |
-----------------
| 1 | 12 | 13 |
-----------------
| 2 | 14 | 15 |
I want to join table 1 and 2 on a but I need also to have the output of:
| a | d.b | d.c | f | h | Sum(count(a))
---------------------------------------------
| 1 | 5 | 2 | 13 | 12 |
- ------------- - - 1
| | 3 | 1 | | |
---------------------------------------------
| 2 | 2 | 1 | 15 | 14 | 1
a can be repeated in table 2, so I need to count(a) and then select the sum after the join.
My problem is that after the join I need the nested and repeated record to look the same as in the first table. When I aggregate to get the sum I can't group by structs or arrays, so I UNNEST the records first and then use the ARRAY_AGG function, but then the sum is wrong.
SELECT
t1.a,
t2.f,
t2.h,
ARRAY_AGG(DISTINCT(t1.db)) as db,
ARRAY_AGG(DISTINCT(t1.dc)) as dc,
SUM(t2.total) AS total
FROM (
SELECT
a,
d.b as db,
d.c as dc
FROM
`table1`,
UNNEST(d) AS d,
) AS t1
LEFT JOIN (
SELECT
a,
f,
h,
COUNT(*) AS total,
FROM
`table2`
GROUP BY
a,f,h) AS t2
ON
t1.a = t2.a
GROUP BY
1,
2,
3
Note: the error is in the total after the sum, which is much higher than expected; all other data are correct.
I guess column a is not unique in your table 2.
Let's assume that table 2 looks like this:
| a | c | f |
-----------------
| 1 | 12 | 13 |
| 2 | 14 | 15 |
| 1 | 100 | 101 |
There are two rows where a is 1. Since their values in the other columns differ, the grouping ( GROUP BY a,f,h) AS t2 ) does not merge them, and COUNT(*) AS total is 1 for each row:
| a | c | f | total |
-------------------------
| 1 | 12 | 13 | 1 |
| 2 | 14 | 15 | 1 |
| 1 | 100 | 101 | 1 |
In the next step you join this table to your table 1. The rows of table 1 with value 1 in column a are duplicated, because table 2 has two matching entries. This is why the sum is too high.
Instead of unnesting the tables, I recommend the following approach:
-- Creating sample data as given:
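The inflation can be reproduced outside BigQuery. The sketch below uses Python's built-in sqlite3 module and models the already unnested table 1 as a flat table (t1_unnested is a made-up name, and SQLite has no arrays, so this only imitates the shape of the original query). It compares the sum after the join with the plain row count per a in table 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- table 1 after UNNEST(d): one row per array element
create table t1_unnested (a integer, db integer, dc integer);
-- table 2 with a repeated value in column a
create table table2 (a integer, c integer, f integer);
insert into t1_unnested values (1, 5, 2), (1, 3, 1), (2, 2, 1);
insert into table2 values (1, 12, 13), (2, 14, 15), (1, 100, 101);
""")

# Same shape as the question's query: count per (a, c, f), join, then sum.
inflated = dict(conn.execute("""
select t1.a, sum(t2.total)
from t1_unnested t1
left join (
    select a, c, f, count(*) as total
    from table2
    group by a, c, f
) t2 on t1.a = t2.a
group by t1.a
""").fetchall())

# What the total should be: the number of table 2 rows per a.
actual = dict(conn.execute(
    "select a, count(*) from table2 group by a"
).fetchall())

print(inflated)  # a -> summed total after the join: 4 for a=1, 1 for a=2
print(actual)    # a -> real row count in table 2:   2 for a=1, 1 for a=2
```

For a = 1 the two unnested rows each meet two rows from the subquery, so four rows with total = 1 are summed, although table 2 only holds two rows for that key.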
with tbl_A as (select 1 a, [struct(5 as b,2 as c),struct(3,1)] d union all select 2,[struct(2,1)] union all select null,[struct(50,51)]),
tbl_B as (select 1 as a,12 b, 13 f union all select 2,14,15 union all select 1,100,101 union all select null,500,501)
-- Query:
select *
from tbl_A A
left join
(Select a,array_agg(struct(b,f)) as B, count(1) as counts from tbl_B group by 1) B
on ifnull(A.a,-9)=ifnull(B.a,-9)

H2 SQL Sequence count with duplicate values

I have a table of IDs, with some duplicates and I need to create a sequence based on the IDs. I'm trying to achieve the following.
[ROW] [ID] [SEQID]
1 11 1
2 11 2
3 12 1
4 13 1
5 13 2
I'm using an old version of the H2 DB which doesn't support window functions, so I have to do this using straight SQL. I have tried joining the table to itself, but I'm not getting the result I want because the duplicate values cause issues. Any ideas? I have a unique identifier in the row number, but I'm not sure how to use it to achieve what I want.
SELECT A.ID, COUNT(*) FROM TABLE A
JOIN TABLE B
ON A.ID = B.ID
WHERE A.ID >= B.ID
GROUP BY A.ID;
Use a subquery that counts the seqid:
select
t.row, t.id,
(select count(*) from tablename where id = t.id and row <= t.row) seqid
from tablename t
It's not as efficient as window functions but it does what you expect.
See the demo (for MySQL, but it's standard SQL).
Results:
| row | id | seqid |
| --- | --- | ----- |
| 1 | 11 | 1 |
| 2 | 11 | 2 |
| 3 | 12 | 1 |
| 4 | 13 | 1 |
| 5 | 13 | 2 |

select column1 from table A based on unique value of another column2 in table B

I have table A and table B and need to select column1 from table A based on a unique value of another column in table B.
table A
id | product |
1 | A |
1 | B |
1 | A |
2 | A |
3 | B |
4 | A |
table B
id | product | date
1 | A | 1/01/2017
1 | B | 1/02/2017
1 | A | 1/01/2017
2 | A | 1/01/2017
3 | B | 1/02/2017
4 | A | 1/01/2017
I want the output to be : 2,3,4
i.e. all the ids that have a unique value in the date column of table B.
Depending upon the actual restrictions in your tables, there are a couple of options.
Option 1 - assuming that two rows such as ID=1, Product=A, date=1/01/2017 and ID=1, Product=B, date=1/01/2017 mean that ID=1 is NOT included in your final result, because it has 2 entries for date = 1/01/2017 even though they are for different products:
SELECT a.ID
FROM
(
SELECT ID, COUNT(*)
FROM TableB
GROUP BY ID
HAVING COUNT(*) = 1
) a
Option 2 - assuming that two rows such as ID=1, Product=A, date=1/01/2017 and ID=1, Product=B, date=1/01/2017 mean that ID=1 IS included in your final result, because it has only a single row for each ID/Product combination:
SELECT DISTINCT ID
FROM
(
SELECT ID, Product, COUNT(*)
FROM TableB
GROUP BY ID, Product
HAVING COUNT(*) = 1
) a
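To see how the two options behave on the sample data, here is a sketch using Python's built-in sqlite3 module (picked only to make it runnable; the question does not name a database). It also tries a third variant, HAVING COUNT(DISTINCT date) = 1, which is an assumption about what "unique value in the date column" means and is not part of either option above. The helper ids and the variable names are made up for illustration.

```python
import sqlite3

# Table B from the question.
conn = sqlite3.connect(":memory:")
conn.execute("create table TableB (id integer, product text, date text)")
conn.executemany("insert into TableB values (?, ?, ?)", [
    (1, "A", "1/01/2017"), (1, "B", "1/02/2017"), (1, "A", "1/01/2017"),
    (2, "A", "1/01/2017"), (3, "B", "1/02/2017"), (4, "A", "1/01/2017"),
])

def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# Option 1: ids that have exactly one row in total.
option1 = ids("select id from TableB group by id having count(*) = 1 order by id")

# Option 2: ids that have at least one product with exactly one row.
option2 = ids("""
select distinct id from (
    select id, product, count(*) as n
    from TableB
    group by id, product
    having count(*) = 1
) a
order by id
""")

# Assumed variant: ids whose rows all share a single date value.
single_date = ids(
    "select id from TableB group by id having count(distinct date) = 1 order by id"
)

print(option1, option2, single_date)
```

On this data Option 1 and the distinct-date variant both give 2, 3, 4, while Option 2 also returns id 1, because product B occurs only once for that id.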

Query for unique values

I have the following database table in Access:
Field1 | Field2
A | 1
B | 1
C | 2
D | 2
B | 3
O | 3
L | 3
I want to develop a query in Access (preferably without using SQL) to select all rows whose Field2 value also occurs in a row where Field1 is "B". This query should yield
Field1|Field2
A | 1
B | 1
B | 3
O | 3
L | 3
Use a subquery:
select t.*
from t
where t.field2 in (select t2.field2 from t as t2 where t2.field1 = 'B');