Count the occurrences of duplicate values among columns of a table - sql

I have a table where each column contains integers, and some values are duplicated.
Here is an example:
| ColumnA | ColumnB | ColumnC |
| 2 | 3 | 1 |
| 1 | 1 | 3 |
| 2 | 1 | 3 |
How can I write a SQL query that counts the occurrences of each integer across all columns?
I want to obtain something like this:
the count for 2 is 2, the count for 1 is 4, the count for 3 is 3

Try this:
SELECT Col, COUNT(*) AS TOT
FROM (
SELECT ColumnA AS Col FROM table
UNION ALL
SELECT ColumnB FROM table
UNION ALL
SELECT ColumnC FROM table
) AS A
GROUP BY Col
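
For what it's worth, here is a minimal sketch that runs the query above against the sample data with Python's built-in sqlite3 module. The table name t is my own placeholder (table is a reserved word in most dialects), and the ORDER BY is only there to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "t" is a placeholder table name; the columns follow the question.
conn.execute("CREATE TABLE t (ColumnA INTEGER, ColumnB INTEGER, ColumnC INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [(2, 3, 1), (1, 1, 3), (2, 1, 3)])

query = """
SELECT Col, COUNT(*) AS TOT
FROM (
    SELECT ColumnA AS Col FROM t
    UNION ALL               -- UNION ALL keeps duplicates, plain UNION would drop them
    SELECT ColumnB FROM t
    UNION ALL
    SELECT ColumnC FROM t
) AS A
GROUP BY Col
ORDER BY Col
"""
counts = conn.execute(query).fetchall()
print(counts)  # [(1, 4), (2, 2), (3, 3)]
```

The important detail is UNION ALL: with plain UNION the stacked column would be de-duplicated and every count would be 1.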

Related

Bigquery: Joining 2 tables one having repeated records and one with count ()

I want to join the tables after unnesting the arrays in Table 1, but the records are duplicated after the join because of the UNNEST.
Table:1
| a | d.b | d.c |
| 1 | 5 | 2 |
| | 3 | 1 |
| 2 | 2 | 1 |
Table:2
| a | c | f |
| 1 | 12 | 13 |
| 2 | 14 | 15 |
I want to join tables 1 and 2 on a, but I also need the output to look like this:
| a | d.b | d.c | f | h | Sum(count(a)) |
| 1 | 5 | 2 | 13 | 12 | 1 |
| | 3 | 1 | | | |
| 2 | 2 | 1 | 15 | 14 | 1 |
a can be repeated in table 2, so I need COUNT(a) per group and then the sum of those counts after the join.
My problem is that after the join I need the nested and repeated record to look the same as in the first table. When I aggregate to get the sum I can't group by structs or arrays, so I UNNEST the records first and then rebuild them with ARRAY_AGG, but then the sum comes out wrong.
SELECT
t1.a,
t2.f,
t2.h,
ARRAY_AGG(DISTINCT(t1.db)) as db,
ARRAY_AGG(DISTINCT(t1.dc)) as dc,
SUM(t2.total) AS total
FROM (
SELECT
a,
d.b as db,
d.c as dc
FROM
`table1`,
UNNEST(d) AS d,
) AS t1
LEFT JOIN (
SELECT
a,
f,
h,
COUNT(*) AS total,
FROM
`table2`
GROUP BY
a,f,h) AS t2
ON
t1.a = t2.a
GROUP BY
1,
2,
3
Note: the only error is in the total after the sum, which is much higher than expected. All other data are correct.
I guess column a in your table 2 is not unique.
Let's assume that table 2 looks like this:
| a | c | f |
| 1 | 12 | 13 |
| 2 | 14 | 15 |
| 1 | 100 | 101 |
There are two rows where a is 1. Since c and f differ between them, the grouping (GROUP BY a,f,h) AS t2 does not merge these rows, and COUNT(*) AS total is 1 for each row:
| a | c | f | total |
| 1 | 12 | 13 | 1 |
| 2 | 14 | 15 | 1 |
| 1 | 100 | 101 | 1 |
In the next step you join this table to your table 1. The rows of table 1 with value 1 in column a are duplicated, because table 2 has two entries for that value. This is why the sum is too high.
Instead of unnesting the tables, I recommend the following approach:
-- Creating of sample data as given:
with tbl_A as (select 1 a, [struct(5 as b,2 as c),struct(3,1)] d union all select 2,[struct(2,1)] union all select null,[struct(50,51)]),
tbl_B as (select 1 as a,12 b, 13 f union all select 2,14,15 union all select 1,100,101 union all select null,500,501)
-- Query:
select *
from tbl_A A
left join
(Select a,array_agg(struct(b,f)) as B, count(1) as counts from tbl_B group by 1) B
on ifnull(A.a,-9)=ifnull(B.a,-9)
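
The row multiplication described above can be reproduced outside BigQuery. Below is a rough sketch using Python's built-in sqlite3 module. SQLite has no arrays or structs, so table 1 is stored already flattened; that, and the names t1, t2, k, x and total, are assumptions of this sketch and not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Flattened stand-ins for the two tables (t1, t2 are placeholder names).
conn.execute("CREATE TABLE t1 (a INTEGER, db INTEGER, dc INTEGER)")
conn.execute("CREATE TABLE t2 (a INTEGER, c INTEGER, f INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", [(1, 5, 2), (1, 3, 1), (2, 2, 1)])
conn.executemany("INSERT INTO t2 VALUES (?, ?, ?)", [(1, 12, 13), (2, 14, 15), (1, 100, 101)])

# The subquery still has two rows for a = 1, so every unnested t1 row
# is paired with both of them and the sum is inflated.
inflated = conn.execute("""
    SELECT t1.a, SUM(x.total) AS total
    FROM t1
    LEFT JOIN (SELECT a, c, f, COUNT(*) AS total FROM t2 GROUP BY a, c, f) AS x
      ON t1.a = x.a
    GROUP BY t1.a
    ORDER BY t1.a
""").fetchall()

# Reducing both sides to one row per a before joining avoids the multiplication.
fixed = conn.execute("""
    SELECT k.a, x.total
    FROM (SELECT DISTINCT a FROM t1) AS k
    LEFT JOIN (SELECT a, COUNT(*) AS total FROM t2 GROUP BY a) AS x
      ON k.a = x.a
    ORDER BY k.a
""").fetchall()

print(inflated)  # [(1, 4), (2, 1)]
print(fixed)     # [(1, 2), (2, 1)]
```

In BigQuery the same idea is what the query above does: table 1 is left un-unnested (one row per a) and table 2 is grouped by a alone.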

select column1 from table A based on unique value of another column2 in table B

I have table A and table B and need to select column1 from table A based on a unique value of another column in table B.
table A
id | product |
1 | A |
1 | B |
1 | A |
2 | A |
3 | B |
4 | A |
table B
id | product | date
1 | A | 1/01/2017
1 | B | 1/02/2017
1 | A | 1/01/2017
2 | A | 1/01/2017
3 | B | 1/02/2017
4 | A | 1/01/2017
I want the output to be: 2, 3, 4
i.e. all the ids which have a unique value in the date column of table B.
Depending upon the actual restrictions in your tables, there are a couple of options.
Option 1 - assuming that, for example, ID=1, Product=A, date=1/01/2017 and ID=1, Product=B, date=1/01/2017 mean that ID=1 is NOT included in your final result, because it has 2 entries for date = 1/01/2017 even though they are for different Products:
SELECT a.ID
FROM
(
SELECT ID, COUNT(*) AS cnt
FROM TableB
GROUP BY ID
HAVING COUNT(*) = 1
) a
Option 2 - assuming that, for example, ID=1, Product=A, date=1/01/2017 and ID=1, Product=B, date=1/01/2017 mean that ID=1 IS included in your final result, because it has only a single row for each ID/Product combination:
SELECT DISTINCT ID
FROM
(
SELECT ID, Product, COUNT(*) AS cnt
FROM TableB
GROUP BY ID, Product
HAVING COUNT(*) = 1
) a
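
To see how the two options differ on the sample data from the question, here is a small sketch using Python's built-in sqlite3 module. The cnt alias and the ORDER BY clauses are my additions (some engines require every derived-table column to be named), so treat it as an illustration and not a drop-in query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableB (ID INTEGER, Product TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO TableB VALUES (?, ?, ?)",
    [
        (1, "A", "1/01/2017"), (1, "B", "1/02/2017"), (1, "A", "1/01/2017"),
        (2, "A", "1/01/2017"), (3, "B", "1/02/2017"), (4, "A", "1/01/2017"),
    ],
)

# Option 1: keep IDs that occur exactly once in the whole table.
option1 = [row[0] for row in conn.execute("""
    SELECT a.ID
    FROM (SELECT ID, COUNT(*) AS cnt FROM TableB GROUP BY ID HAVING COUNT(*) = 1) a
    ORDER BY a.ID
""")]

# Option 2: keep IDs that have at least one ID/Product pair occurring once.
option2 = [row[0] for row in conn.execute("""
    SELECT DISTINCT ID
    FROM (SELECT ID, Product, COUNT(*) AS cnt FROM TableB
          GROUP BY ID, Product HAVING COUNT(*) = 1) a
    ORDER BY ID
""")]

print(option1)  # [2, 3, 4]
print(option2)  # [1, 2, 3, 4]
```

On this data only Option 1 produces the 2, 3, 4 asked for; Option 2 also returns ID 1 because its (1, B) pair occurs once.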

How can I select each particular data up to a certain quantity?

How can I select each particular value up to a certain quantity? For example, in the table below there are 4 A, 4 B, 2 C and 1 D. I want to select all letters, but no more than two of each, which would yield 2 A, 2 B, 2 C and 1 D.
+====+========+
| ID | Letter |
+====+========+
| 1 | A |
+----+--------+
| 2 | B |
+----+--------+
| 3 | B |
+----+--------+
| 4 | C |
+----+--------+
| 5 | A |
+----+--------+
| 6 | A |
+----+--------+
| 7 | C |
+----+--------+
| 8 | B |
+----+--------+
| 9 | B |
+----+--------+
| 10 | D |
+----+--------+
| 11 | A |
+----+--------+
Can anyone please help me with the above scenario?
I can think of a simple way:
select
case
when count(*) > 1
then 2
else count(*)
end,
second_column
from your_table
group by second_column;
This gives the counts you want, but it doesn't actually select at most two records of each letter; it only reports the capped count per letter.
Using a ROW_NUMBER() function and a derived table:
CREATE TABLE myTable (id int, Letter varchar(1))
INSERT INTO myTable
VALUES (1,'A')
,(2,'B')
,(3,'B')
,(4,'C')
,(5,'A')
,(6,'A')
,(7,'C')
,(8,'B')
,(9,'B')
,(10,'D')
,(11,'A')
SELECT id, Letter
FROM
(SELECT *
,ROW_NUMBER() OVER(PARTITION BY Letter ORDER BY Letter) as rn
FROM myTable) myTable
WHERE rn = 1 or rn = 2
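In essence, "cut" (PARTITION) the rows by Letter, number the rows within each group, then pick the first two of each Letter. Because the ORDER BY inside OVER uses the partition column itself, which two rows of each letter you get is arbitrary.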
Try it here:
http://rextester.com/WTKYCE51114
Use the ROW_NUMBER() function to number the records, with PARTITION BY letter (the grouping) and ORDER BY id:
SELECT id,
letter
FROM (SELECT *,
ROW_NUMBER() OVER(PARTITION BY letter ORDER BY id) rnum
FROM myTable
) t
WHERE rnum <=2
Because the rows are ordered by id, you get the first two instances of each letter in ascending id order, which gives the result below (note that ids 1 and 5 are selected for A, 2 and 3 for B):
id letter
1 A
5 A
2 B
3 B
4 C
7 C
10 D
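
If you want to check that result yourself, the following sketch runs the same ROW_NUMBER() query through Python's built-in sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer (the first release with window functions); the outer ORDER BY is my addition so the rows come back in a predictable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER, letter TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    [(1, "A"), (2, "B"), (3, "B"), (4, "C"), (5, "A"), (6, "A"),
     (7, "C"), (8, "B"), (9, "B"), (10, "D"), (11, "A")],
)

# Number the rows inside each letter by ascending id, then keep rows 1 and 2.
rows = conn.execute("""
    SELECT id, letter
    FROM (SELECT *,
                 ROW_NUMBER() OVER (PARTITION BY letter ORDER BY id) AS rnum
          FROM myTable) t
    WHERE rnum <= 2
    ORDER BY letter, id
""").fetchall()

print(rows)
# [(1, 'A'), (5, 'A'), (2, 'B'), (3, 'B'), (4, 'C'), (7, 'C'), (10, 'D')]
```

Changing the 2 in WHERE rnum <= 2 is all it takes to allow a different cap per letter.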

Sqlite: Select last row group by 2 column

I'm trying to get the last row of my table, grouped by 2 columns.
+----+-----+---------+
| id1| id2 | info |
+----+-----+---------+
| 1 | 2 | info |
| 2 | 1 | NULL |
| 2 | 3 | info |
| 2 | 1 | NULL |
+----+-----+---------+
I tried:
SELECT * FROM table GROUP BY id1
but I got:
1 2
2 3
2 1
What I need:
2 3
2 1
In other words, I need the last row of each pair of ids.
Any ideas?
SELECT DISTINCT id1, id2 FROM table WHERE id1=2
This should do the trick. Unless you want to apply an aggregate function to other columns, SELECT DISTINCT is enough: it drops any duplicate rows.
If you want to get all rows with the highest id1 dynamically, you can use:
SELECT DISTINCT id1, id2 FROM table WHERE id1=(SELECT MAX(id1) FROM table)
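
A quick way to try the dynamic variant is the sketch below, using Python's built-in sqlite3 module. I have written the subquery with its own FROM clause, which I assume is what was intended, and used t as a placeholder name because table is a reserved word in SQLite. The ORDER BY is only there to make the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "t" stands in for the question's table.
conn.execute("CREATE TABLE t (id1 INTEGER, id2 INTEGER, info TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 2, "info"), (2, 1, None), (2, 3, "info"), (2, 1, None)],
)

# Keep only rows with the highest id1, then drop duplicate (id1, id2) pairs.
pairs = conn.execute("""
    SELECT DISTINCT id1, id2
    FROM t
    WHERE id1 = (SELECT MAX(id1) FROM t)
    ORDER BY id2
""").fetchall()

print(pairs)  # [(2, 1), (2, 3)]
```

Note that this returns the distinct pairs for the largest id1; it does not depend on insertion order, so "last" here means "highest id1".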

sql query distinct on multiple columns

I have this data and I am trying to find cases where there are different ids but duplicate data in fields 1, 2, 3 and 4:
id field1 field2 field3 field4
==== ====== ====== ===== =======
1 A B C D
2 A B C D
3 A A C B
4 A A C B
So, in whatever way possible, I want it to somehow show me:
1 & 2 are duplicates
3 & 4 are duplicates
Instead of SELECT DISTINCT, select the fields and a count of rows. Use HAVING to keep only the groups with more than one row, e.g.:
select field1
,field2
,field3
,field4
,count (*)
from foo
group by field1
,field2
,field3
,field4
having count (*) > 1
You can then join your original table back against the results of the query.
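
As a rough illustration of that join-back step, here is a sketch using Python's built-in sqlite3 module. The aliases f and d are placeholders, and the fifth row (id 5) is not in the question; I added it only to show that non-duplicated rows are filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foo (id INTEGER, field1 TEXT, field2 TEXT, field3 TEXT, field4 TEXT)"
)
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?, ?, ?)",
    [
        (1, "A", "B", "C", "D"), (2, "A", "B", "C", "D"),
        (3, "A", "A", "C", "B"), (4, "A", "A", "C", "B"),
        (5, "X", "Y", "Z", "W"),  # hypothetical non-duplicate row
    ],
)

# d holds one row per duplicated combination; joining foo back to it
# recovers the id of every row that belongs to such a combination.
dupes = conn.execute("""
    SELECT f.id, f.field1, f.field2, f.field3, f.field4
    FROM foo AS f
    JOIN (SELECT field1, field2, field3, field4
          FROM foo
          GROUP BY field1, field2, field3, field4
          HAVING COUNT(*) > 1) AS d
      ON f.field1 = d.field1 AND f.field2 = d.field2
     AND f.field3 = d.field3 AND f.field4 = d.field4
    ORDER BY f.field1, f.field2, f.field3, f.field4, f.id
""").fetchall()

dupe_ids = sorted(row[0] for row in dupes)
print(dupe_ids)  # [1, 2, 3, 4]
```

One caveat: the equality join does not match NULLs, so rows with NULL in any of the four fields would need extra handling.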
One way to do this is to use having and group by
esben=# select * from test;
id | a | b | c | d
----+---+---+---+---
1 | 1 | 2 | 3 | 4
2 | 1 | 2 | 3 | 4
3 | 1 | 1 | 3 | 2
4 | 1 | 1 | 3 | 2
(4 rows)
esben=# select count(id),a,b,c,d from test group by a,b,c,d having count(id) >1;
count | a | b | c | d
-------+---+---+---+---
2 | 1 | 2 | 3 | 4
2 | 1 | 1 | 3 | 2
(2 rows)
This doesn't list the actual ids, though; without knowing the exact output you want, it is hard to say how to go about that.
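SELECT *
FROM [TableName]
WHERE ID IN(SELECT MIN(ID)
FROM [TableName]
-- group by the columns themselves: CONCAT without a separator can merge distinct rows ('AB','C' vs 'A','BC')
GROUP BY field1, field2, field3, field4)
This will return the full row for ids 1 & 3, i.e. the first id of each group.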