Count unique rows - sql

I'm having a problem where I'm trying to count the number of entries by a unique value.
SELECT table1.PID, table2.CID
FROM table1
INNER JOIN table2 using (OID)
WHERE table1.PID IN (
SELECT table1.PID
FROM table1
JOIN table2 using (OID)
WHERE table2.CID = 'A'
) AND table2.CID != 'A'
What I would like to do is count the entries for each table2.CID, counting each unique ID only once.
NOTE: I need grouping because of duplicate values.
To help, here is a picture of the table I'm getting as output. What I would like is the count for each name over unique ID values, so ERNSH should return 7 and not 15.

You want one row per CID, so group by CID and use COUNT. Because you want each PID counted only once per CID, use COUNT(DISTINCT):
SELECT COUNT(DISTINCT table1.PID), table2.CID
FROM table1
INNER JOIN table2 using (OID)
WHERE table1.PID IN (
SELECT table1.PID
FROM table1
JOIN table2 using (OID)
WHERE table2.CID = 'A'
) AND table2.CID != 'A'
GROUP BY table2.CID;
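To see the difference between COUNT(*) and COUNT(DISTINCT ...) on this query, here is a minimal sketch using Python's built-in sqlite3 module. The column types and the sample rows are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (OID INTEGER, PID INTEGER);
    CREATE TABLE table2 (OID INTEGER, CID TEXT);
    INSERT INTO table1 VALUES
        (10, 1), (10, 2), (11, 1), (12, 1), (12, 2), (13, 3), (14, 2);
    INSERT INTO table2 VALUES
        (10, 'A'), (11, 'B'), (12, 'B'), (13, 'C'), (14, 'C');
""")

# Same query as in the answer, with COUNT(*) added for comparison.
rows = conn.execute("""
    SELECT table2.CID,
           COUNT(*)                   AS all_rows,
           COUNT(DISTINCT table1.PID) AS unique_pids
    FROM table1
    INNER JOIN table2 USING (OID)
    WHERE table1.PID IN (
        SELECT table1.PID
        FROM table1
        JOIN table2 USING (OID)
        WHERE table2.CID = 'A'
    ) AND table2.CID != 'A'
    GROUP BY table2.CID
    ORDER BY table2.CID
""").fetchall()

for cid, all_rows, unique_pids in rows:
    print(cid, all_rows, unique_pids)
```

With this data, customer B has three matching rows but only two distinct PIDs, which is the same kind of gap as the 15 versus 7 in the question.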

Related

Show max to min count in SQL group by

I have a table with id, Name columns.
When selecting, I want to group the rows by the Name column (but show all records, with no summary) and order the result from the name with the most rows down to the name with the fewest rows.
SELECT Table1.id, Table1.name
FROM Table1
GROUP BY Table1.id, Table1.name;
This is table:
My idea:
but I get this result:
One approach uses a join to a subquery which finds the counts for each name:
SELECT a.name, a.ID
FROM Table1 AS a
INNER JOIN
(
SELECT name, COUNT(*) AS cnt
FROM Table1
GROUP BY name
) AS b
ON a.name = b.name
ORDER BY
b.cnt DESC,
a.ID;
When you group by ID you create a separate group for each element, because each element has a unique ID. To get the count of each name group, you will want to create a grouping by name and then join it with your original table so the values are preserved. Something like:
WITH Counts (name, cnt) AS
(SELECT name, COUNT(*)
FROM Table1
GROUP BY name)
SELECT Table1.id, Table1.name
FROM Table1
INNER JOIN Counts
ON Table1.name = Counts.name
ORDER BY Counts.cnt DESC
Oh, I see. You can use a subquery in the ORDER BY:
select t.*
from t
order by (select count(*) from t t2 where t2.name = t.name) desc, name;
Note that name is the second order by key. If two names have the same counts, then this keeps all the rows for a given name together.
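As a quick check that the join approach and the correlated-subquery approach order the rows the same way, here is a small sketch with Python's sqlite3 module. The sample names are invented, and name and id are added as tie-breakers in both queries so the order is deterministic.

```python
import sqlite3

# Hypothetical sample data: Ann appears 3 times, Bob twice, Cy once.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO Table1 VALUES
        (1, 'Ann'), (2, 'Bob'), (3, 'Ann'), (4, 'Cy'), (5, 'Bob'), (6, 'Ann');
""")

# Approach 1: join every row to its per-name count.
join_rows = conn.execute("""
    SELECT a.id, a.name
    FROM Table1 AS a
    INNER JOIN (
        SELECT name, COUNT(*) AS cnt FROM Table1 GROUP BY name
    ) AS b ON a.name = b.name
    ORDER BY b.cnt DESC, a.name, a.id
""").fetchall()

# Approach 2: compute the count in a correlated subquery inside ORDER BY.
subquery_rows = conn.execute("""
    SELECT t.id, t.name
    FROM Table1 AS t
    ORDER BY (SELECT COUNT(*) FROM Table1 AS t2 WHERE t2.name = t.name) DESC,
             t.name, t.id
""").fetchall()

print(join_rows)
print(subquery_rows)
```

Both queries keep every row and only change the ordering, so no GROUP BY is needed in the outer query.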

oracle12c,sql,difference between count(*) and sum()

Tell me the difference between sql1 and sql2:
sql1:
select count(1)
from table_1 a
inner join table_2 b on a.key = b.key where a.id in (
select id from table_1 group by id having count(1) > 1
)
sql2:
select sum(a) from (
select count(1) as a
from table_1 a
inner join table_2 b on a.key = b.key group by a.id having count(1) > 1
)
Why is the output not the same?
The queries are not even similar. They are very different. Let's check the first one:
select count(1)
from table_1 a
inner join table_2 b
on a.key = b.key
where a.id in (
select id from table_1 group by id having count(1) > 1
) ;
You are first making an inner join:
select count(1)
from table_1 a
inner join table_2 b
on a.key = b.key
In this case count(1) and count(*) are equivalent, and count(id) gives the same result as long as id is never NULL. You are counting the rows the two tables have in common, that is, the pairs of rows that share the same key value.
After that, you are enforcing this:
where a.id in (
select id from table_1 group by id having count(1) > 1
)
In other words, every id taken from table_1 must appear at least twice in table_1.
And lastly, you are doing this:
select count(1)
In other words, counting those rows. So, translated into English, you have done this:
1. take every record of table_1, pair it with the records of table_2 on the key, and keep only the pairs that match
2. from that result, keep only the rows whose table_1 id appears more than once in table_1
3. count that result
Let's see what happens with the second query:
select sum(a) from (
select count(1) as a
from table_1 a
inner join table_2 b
on a.key = b.key
group by a.id
having count(1) > 1
);
You are making the same inner join:
select count(1) as a
from table_1 a
inner join table_2 b
on a.key = b.key
but, you are grouping it by the id of the table:
group by a.id
and then keeping only those groups that have more than one row:
having count(1) > 1
The result so far is the set of joined rows, grouped by table_1.id: only the ids that have at least two rows after the join survive the HAVING filter, and each surviving group is collapsed into a single count. I presume that very few records will match this strict criterion.
And lastly, you sum all of those counts. So the first query filters on how often an id appears in table_1 before the join, while the second filters on how many joined rows each id has, which is why the outputs differ.
When you use count(*) you count ALL the rows. The SUM() function is an aggregate function that returns the sum of all or distinct values in a set of values.
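To make the difference concrete, here is a small sketch that runs both queries against the same data using Python's sqlite3 module (SQLite rather than Oracle, but the logic is the same). The rows are hypothetical and chosen so that the two results differ.

```python
import sqlite3

# Hypothetical data: id 1 appears twice in table_1 but matches table_2 once;
# id 2 appears once in table_1 but matches two rows of table_2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER, key TEXT);
    CREATE TABLE table_2 (key TEXT);
    INSERT INTO table_1 VALUES (1, 'k1'), (1, 'k2'), (2, 'k3'), (3, 'k4');
    INSERT INTO table_2 VALUES ('k1'), ('k3'), ('k3'), ('k5');
""")

# sql1: count joined rows whose id is duplicated in table_1 itself.
sql1 = conn.execute("""
    select count(1)
    from table_1 a
    inner join table_2 b on a.key = b.key
    where a.id in (
        select id from table_1 group by id having count(1) > 1
    )
""").fetchone()[0]

# sql2: count joined rows per id, keep ids with more than one joined row, sum.
sql2 = conn.execute("""
    select sum(a) from (
        select count(1) as a
        from table_1 a
        inner join table_2 b on a.key = b.key
        group by a.id
        having count(1) > 1
    )
""").fetchone()[0]

print(sql1, sql2)
```

Here sql1 only keeps id 1 (duplicated in table_1, one joined row), while sql2 only keeps id 2 (two joined rows), so the totals are different even though both queries start from the same join.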

left join where the primary key has both integer and string

I need to join two tables whose primary keys contain both integer and string values. When I use a left join on the id column, I get only the records for the integer ids. I would like to get the output for both the integer and the string ids, as shown in the output. Can anyone assist, please?
select t1.*
,t2.Position
from t1
left join t2
on t1.id=t2.id;
I am getting output only for the integer ids. I would like to get output for both the integer and the string ids.
how to use UNION DISTINCT here – Nrad
SELECT id FROM table1
UNION DISTINCT
SELECT id FROM table2
or
SELECT CAST(id AS CHAR) AS id FROM table2
UNION DISTINCT
SELECT id FROM table1
The DISTINCT keyword is optional and may be omitted in both cases.
If you need the ordering shown then add
ORDER BY id + 0 = 0, id
to the end of a query.
I have some other columns in both tables that I need to take. – Nrad
SELECT id,
COALESCE (t1.name, t2.name) name,
t1.salary,
t2.position,
t1.department
FROM ( SELECT id FROM table1
UNION DISTINCT
SELECT id FROM table2 ) t0
LEFT JOIN table1 t1 USING (id)
LEFT JOIN table2 t2 USING (id)
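Here is a rough sketch of that last query run with Python's sqlite3 module. It assumes the ids are stored as text in both tables, uses plain UNION (which already removes duplicates), and the sample rows are invented for illustration.

```python
import sqlite3

# Hypothetical data: id '1' is in both tables, 'A2' only in table1,
# 'B7' only in table2. Ids are stored as TEXT (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id TEXT, name TEXT, salary INTEGER, department TEXT);
    CREATE TABLE table2 (id TEXT, name TEXT, position TEXT);
    INSERT INTO table1 VALUES ('1', 'Ann', 100, 'IT'), ('A2', 'Bob', 200, 'HR');
    INSERT INTO table2 VALUES ('1', 'Ann', 'Dev'), ('B7', 'Cy', 'QA');
""")

# Build the full list of ids first, then attach the columns of each table.
rows = conn.execute("""
    SELECT id,
           COALESCE(t1.name, t2.name) AS name,
           t1.salary,
           t2.position,
           t1.department
    FROM (SELECT id FROM table1
          UNION
          SELECT id FROM table2) t0
    LEFT JOIN table1 t1 USING (id)
    LEFT JOIN table2 t2 USING (id)
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

Because the id list comes from both tables, an id that exists on only one side still produces a row, with NULL (None in Python) in the columns of the other table.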

How to compare two tables in Hive based on counts

I have the Hive tables below:
Table_1
ID
1
1
2
Table_2
ID
1
2
2
I am comparing the two tables based on the count of each ID in both tables. I need output like the following:
ID
1 - 2 records in Table 1 and 1 record in Table 2
2 - 1 record in Table 1 and 2 records in Table 2
Table_1 is the parent table.
I am using the queries below:
select count(*),ID from Table_1 group by ID;
select count(*),ID from Table_2 group by ID;
Just do a full outer join of your two queries with the ON condition X.id = Y.id, and then select from the result, handling the NULLs on either side.
SELECT COALESCE(X.id, Y.id) AS id,
CONCAT(COALESCE(X.cnt1, 0), ' entries in table 1, ', COALESCE(Y.cnt2, 0), ' entries in table 2')
FROM (SELECT COUNT(*) AS cnt1, id FROM Table_1 GROUP BY id) X
FULL OUTER JOIN (SELECT COUNT(*) AS cnt2, id FROM Table_2 GROUP BY id) Y
ON X.id = Y.id;
Try this. You may use a CASE expression to choose between "record" and "records".
SELECT m.id,
CONCAT (COALESCE(a.ct, 0), ' record in table 1, ', COALESCE(b.ct, 0),
' record in table 2')
FROM (SELECT id
FROM table_1
UNION
SELECT id
FROM table_2) m
LEFT JOIN (SELECT Count(*) AS ct,
id
FROM table_1
GROUP BY id) a
ON m.id = a.id
LEFT JOIN (SELECT Count(*) AS ct,
id
FROM table_2
GROUP BY id) b
ON m.id = b.id;
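A minimal sketch of the same union-plus-left-joins idea, run with Python's sqlite3 module instead of Hive, using the sample data from the question. The singular/plural wording is done in Python here; the plural helper is a made-up name for illustration.

```python
import sqlite3

# Sample data taken from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER);
    CREATE TABLE table_2 (id INTEGER);
    INSERT INTO table_1 VALUES (1), (1), (2);
    INSERT INTO table_2 VALUES (1), (2), (2);
""")

# All ids from either table, with the per-table counts attached (0 if absent).
counts = conn.execute("""
    SELECT m.id, COALESCE(a.ct, 0), COALESCE(b.ct, 0)
    FROM (SELECT id FROM table_1
          UNION
          SELECT id FROM table_2) m
    LEFT JOIN (SELECT COUNT(*) AS ct, id FROM table_1 GROUP BY id) a
           ON m.id = a.id
    LEFT JOIN (SELECT COUNT(*) AS ct, id FROM table_2 GROUP BY id) b
           ON m.id = b.id
    ORDER BY m.id
""").fetchall()

def plural(n):
    # Hypothetical helper: "1 record" vs "2 records".
    return f"{n} record" if n == 1 else f"{n} records"

lines = [
    f"{id_} - {plural(c1)} in table 1 and {plural(c2)} in table 2"
    for id_, c1, c2 in counts
]
for line in lines:
    print(line)
```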
You could use this Python program to do a full comparison of 2 Hive tables:
https://github.com/bolcom/hive_compared_bq
If you want a quick comparison just based on counts, then pass the "--just-count" option (you can also specify the group by column with "--group-by-column").
The script also allows you to visually see all the differences on all rows and all columns if you want a complete validation.

How to get the result of select statement grouped by a column to perform join statement on it?

Put the SELECT statement that contains the GROUP BY in parentheses and use it in place of one of the joined tables, something like this:
SELECT t1.Id, ....
FROM Table1 t1
INNER JOIN
(
SELECT Table1Id, COUNT(*) AS cnt
FROM Table2
GROUP BY Table1Id
) t2 ON t1.Id = t2.Table1Id
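As an illustration of joining to a grouped subquery, here is a small sketch using Python's sqlite3 module. The Name column, the cnt alias, and the sample rows are assumptions made for the example; Table2 is assumed to reference Table1 through a Table1Id column.

```python
import sqlite3

# Hypothetical schema and data: Table2 rows point at Table1 via Table1Id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Table2 (Table1Id INTEGER);
    INSERT INTO Table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO Table2 VALUES (1), (1), (2);
""")

# The grouped SELECT acts as a derived table (aliased t2) in the join.
rows = conn.execute("""
    SELECT t1.Id, t1.Name, t2.cnt
    FROM Table1 t1
    INNER JOIN (
        SELECT Table1Id, COUNT(*) AS cnt
        FROM Table2
        GROUP BY Table1Id
    ) t2 ON t1.Id = t2.Table1Id
    ORDER BY t1.Id
""").fetchall()

print(rows)
```

Id 3 has no rows in Table2, so the inner join drops it; a LEFT JOIN with COALESCE(t2.cnt, 0) would keep it with a count of zero.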
This might help you.
Suppose there are two tables:
1. student
(stud_id pk)
(branch_id fk)
2. branch
(branch_id pk)
(branch name varchar)
(city varchar)
select b.city, count(*) as students from student s join branch b on s.branch_id = b.branch_id group by b.city;