Multiple count based on dynamic criteria - sql

I have two database tables, and I want to compare the number of times each ID appears in them.
TAB1:
ID Sequence
A2D 1
A2D 2
A2D 3
A3D 1
TAB2:
ID Sequence
A2D 1
A2D 2
A3D 1
A3D 2
Now, for this example, I am trying to get this result:
ID Table1 Table2
A2D 3 2
A3D 1 2
I have tried these queries without any success:
SELECT R1.ID as ID, COUNT(R1.ID) as Table1,
COUNT(R2.ID) as Table2
FROM TAB1 AS R1, TAB2 AS R2
WHERE R1.ID = R2.ID
GROUP BY R1.ID
This one gave me wrong count values...
Also, this one simply crashes:
select
(
select count(*) as Table1
from TAB1
where ID = R1.ID
),(
select count(*) as Table2
from TAB2
where ID= R1.ID
)
FROM TAB1 AS R1
As you can see, though, I am trying to keep my criteria dynamic. Most examples I found used basic hard-coded criteria. In my case, I want the query to take an ID from my first table, count the number of times it appears there, do the same count in the second table for the same ID, then move on to the next ID.
If my question lacks information or is confusing just ask me, I'll do my best to be more precise.
Thanks in advance !

Here I am using a UNION ALL as a subquery
SELECT ID, SUM(T1) AS Table1, SUM(T2) AS Table2
FROM
(SELECT ID, COUNT(ID) AS T1, 0 AS T2 FROM TAB1 GROUP BY ID
UNION ALL
SELECT ID, 0 AS T1, COUNT(ID) AS T2 FROM TAB2 GROUP BY ID)
GROUP BY ID
HAVING SUM(T1)>0 AND SUM(T2)>0
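To make the difference concrete, here is a minimal sketch that loads the sample data from the question into an in-memory SQLite database (an assumption: the question targets Access, and SQLite is only a stand-in) and runs both the joined query from the question and the UNION ALL query. The alias `counts` on the derived table is hypothetical and added only for portability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TAB1 (ID TEXT, Sequence INTEGER);
    CREATE TABLE TAB2 (ID TEXT, Sequence INTEGER);
    INSERT INTO TAB1 VALUES ('A2D', 1), ('A2D', 2), ('A2D', 3), ('A3D', 1);
    INSERT INTO TAB2 VALUES ('A2D', 1), ('A2D', 2), ('A3D', 1), ('A3D', 2);
""")

# Joining the raw tables pairs every TAB1 row with every TAB2 row of the
# same ID, so both counts become the product of the per-table counts.
joined = conn.execute("""
    SELECT R1.ID, COUNT(R1.ID), COUNT(R2.ID)
    FROM TAB1 AS R1, TAB2 AS R2
    WHERE R1.ID = R2.ID
    GROUP BY R1.ID
    ORDER BY R1.ID
""").fetchall()

# Counting each table separately first, then summing per ID, avoids the
# row multiplication.
unioned = conn.execute("""
    SELECT ID, SUM(T1) AS Table1, SUM(T2) AS Table2
    FROM (SELECT ID, COUNT(ID) AS T1, 0 AS T2 FROM TAB1 GROUP BY ID
          UNION ALL
          SELECT ID, 0 AS T1, COUNT(ID) AS T2 FROM TAB2 GROUP BY ID) AS counts
    GROUP BY ID
    HAVING SUM(T1) > 0 AND SUM(T2) > 0
    ORDER BY ID
""").fetchall()

print(joined)   # [('A2D', 6, 6), ('A3D', 2, 2)]
print(unioned)  # [('A2D', 3, 2), ('A3D', 1, 2)]
```

Note that the HAVING clause keeps only IDs that occur in both tables; dropping it would also list IDs that exist in just one of them, with a count of 0 for the other.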

I used a different approach, but unfortunately I have to use two queries; I still don't know whether they can be combined. The first one just computes the counts for both tables and combines the results:
SELECT "Tab1" AS [Table], Tab1.ID, Count(*) AS Total
FROM Tab1
GROUP BY "Tab1", Tab1.ID
UNION SELECT "Tab2" AS [Table], Tab2.ID, Count(*) AS Total
FROM Tab2
GROUP BY "Tab2", Tab2.ID
and, since Access supports Pivot queries, you can use this:
TRANSFORM Sum(qrySums.[Total]) AS Total
SELECT qrySums.[ID]
FROM qrySums
GROUP BY qrySums.[ID]
PIVOT qrySums.[Table];

Not sure if I understand your question, but you could try something like this:
SELECT DISTINCT t.ID,
(SELECT COUNT(ID) FROM TAB1 WHERE ID = t.ID) AS Table1,
(SELECT COUNT(ID) FROM TAB2 WHERE ID = t.ID) AS Table2
FROM TAB1 t
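As a rough check of the correlated-subquery idea, the sketch below runs it against the question's TAB1 and TAB2 in an in-memory SQLite database (an assumption; the thread is about Access). The extra row 'A4D' is hypothetical and not part of the question's data; it is there only to show that an ID missing from TAB2 gets a count of 0 instead of being dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TAB1 (ID TEXT, Sequence INTEGER);
    CREATE TABLE TAB2 (ID TEXT, Sequence INTEGER);
    INSERT INTO TAB1 VALUES ('A2D', 1), ('A2D', 2), ('A2D', 3), ('A3D', 1);
    INSERT INTO TAB2 VALUES ('A2D', 1), ('A2D', 2), ('A3D', 1), ('A3D', 2);
    -- Hypothetical extra row: an ID that exists only in TAB1.
    INSERT INTO TAB1 VALUES ('A4D', 1);
""")

# DISTINCT collapses the driving table to one row per ID; each scalar
# subquery then counts the rows with that ID in one of the tables.
rows = conn.execute("""
    SELECT DISTINCT t.ID,
           (SELECT COUNT(*) FROM TAB1 WHERE ID = t.ID) AS Table1,
           (SELECT COUNT(*) FROM TAB2 WHERE ID = t.ID) AS Table2
    FROM TAB1 AS t
    ORDER BY t.ID
""").fetchall()

print(rows)  # [('A2D', 3, 2), ('A3D', 1, 2), ('A4D', 1, 0)]
```

Because the query is driven by TAB1, an ID that exists only in TAB2 would not appear in the result.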

To get the desired results, I broke it down into two sub-queries (R1SQ and R2SQ) and a main UNION query - R1R2 that uses inner, left and right joins to include all row entries including those rows that do not appear in both tables:
R1SQ
SELECT R1.Builder, Count(R1.Builder) AS Table1
FROM R1
GROUP BY R1.Builder;
R2SQ
SELECT R2.Builder_E, Count(R2.Builder_E) AS Table2
FROM R2
GROUP BY R2.Builder_E;
R1R2
SELECT R1SQ.Builder, R1SQ.Table1, R2SQ.Table2
FROM R1SQ INNER JOIN R2SQ ON R1SQ.Builder = R2SQ.Builder_E
UNION
SELECT R1SQ.Builder, R1SQ.Table1, 0 AS Table2
FROM R1SQ LEFT JOIN R2SQ ON R1SQ.Builder = R2SQ.Builder_E
WHERE (((R2SQ.Builder_E) Is Null))
UNION
SELECT R2SQ.Builder_E, 0 AS Table1, R2SQ.Table2
FROM R1SQ RIGHT JOIN R2SQ ON R1SQ.Builder = R2SQ.Builder_E
WHERE (((R1SQ.Builder) Is Null))
ORDER BY R1SQ.Builder;

Related

How to combine multiple SELECT statements into a single query & get a single result output

I have multiple SELECT queries that are run against different tables.
The outputs of all the queries have the same number of rows (every query, when run individually, returns the same number of rows). Is there a way I can combine the output of all these queries into a single result? (Keep the output of the first query and add the output of the next query to it as a column.) I don't want to save these results as tables in the database, as I am just doing some validation testing.
Example:
SELECT AAA,BBB,CCC FROM Table1
SELECT Table2.DDD, Table1.AAA
FROM Table2
INNER JOIN Table1
ON Table1.AAA = Table2.AAA
I tried writing the combined query as
SELECT Table1.AAA,Table1.BBB,Table1.CCC,T1.DDD
FROM Table1,
(SELECT Table2.DDD, Table1.AAA
FROM Table2
INNER JOIN Table1
ON Table1.AAA = Table2.AAA)T1
I tried doing the above combined query, but instead of getting 11 rows as output (both queries above had result of 11 rows), I am getting 35 rows as output.
Hope the question made sense!
You'll need to specify a criterion for matching each row of the first query with a row of the second query.
If, for example, the column AAA is unique in both queries and you want to match rows with the same values you could do:
select a.*, b.*
from (
SELECT AAA,BBB,CCC FROM Table1
) a
full join (
SELECT Table2.DDD, Table1.AAA
FROM Table2
INNER JOIN Table1
ON Table1.AAA = Table2.AAA
) b on b.aaa = a.aaa
If there aren't any clear matching rules, you can produce an artificial row number on each result set and use it to match rows. For example:
select
a.aaa, a.bbb, a.ccc,
b.ddd, b.aaa
from (
SELECT AAA, BBB, CCC,
row_number() over(order by aaa) as rn
FROM Table1
) a
full join (
SELECT Table2.DDD, Table1.AAA,
row_number() over(order by table1.aaa, table2.ddd) as rn
FROM Table2
INNER JOIN Table1
ON Table1.AAA = Table2.AAA
) b on b.rn = a.rn
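The row-number matching can be sketched as follows, using an in-memory SQLite database with made-up sample rows (the data, and SQLite itself, are assumptions; window functions need SQLite 3.25 or later). Since the question states that both queries return the same number of rows, this sketch uses a plain JOIN on `rn` rather than a full join. It also counts the rows of the comma-style query from the question to show why that version returns too many rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (AAA INTEGER, BBB TEXT, CCC TEXT);
    CREATE TABLE Table2 (AAA INTEGER, DDD TEXT);
    INSERT INTO Table1 VALUES (1, 'b1', 'c1'), (2, 'b2', 'c2'), (3, 'b3', 'c3');
    INSERT INTO Table2 VALUES (1, 'd1'), (2, 'd2'), (3, 'd3');
""")

# A comma between two row sources with no matching condition is a cross
# join: every row of Table1 is paired with every row of the subquery.
cross_count = conn.execute("""
    SELECT COUNT(*)
    FROM Table1,
         (SELECT Table2.DDD, Table1.AAA
          FROM Table2
          INNER JOIN Table1 ON Table1.AAA = Table2.AAA) AS T1
""").fetchone()[0]

# Numbering the rows of each result set and joining on that number pairs
# the first row with the first row, the second with the second, and so on.
rows = conn.execute("""
    SELECT a.AAA, a.BBB, a.CCC, b.DDD
    FROM (SELECT AAA, BBB, CCC,
                 ROW_NUMBER() OVER (ORDER BY AAA) AS rn
          FROM Table1) AS a
    JOIN (SELECT Table2.DDD,
                 ROW_NUMBER() OVER (ORDER BY Table1.AAA, Table2.DDD) AS rn
          FROM Table2
          INNER JOIN Table1 ON Table1.AAA = Table2.AAA) AS b
      ON b.rn = a.rn
    ORDER BY a.rn
""").fetchall()

print(cross_count)  # 9
print(rows)         # [(1, 'b1', 'c1', 'd1'), (2, 'b2', 'c2', 'd2'), (3, 'b3', 'c3', 'd3')]
```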
If you have several results and want to have all of them as additional columns you can simply use ",":
create table temp1 as select '1' as c1 from DUAL;
create table temp2 as select '2' as c2 from DUAL;
create table temp3 as select '3' as c3 from DUAL;
select a.c1, b.c2, c.c3 from temp1 a, (select c2 from temp2) b, (select c3 from temp3) c;
An alternative could also be that you want to have all the results as additional rows then you would use UNION ALL between the individual results.

How to compare two tables in Hive based on counts

I have below hive tables
Table_1
ID
1
1
2
Table_2
ID
1
2
2
I am comparing the two tables based on the count of each ID in both tables. I need output like this:
ID
1 - 2 records in Table 1 and 1 record in Table 2
2 - 1 record in Table 1 and 2 records in Table 2
Table_1 is the parent table.
I am using the queries below:
select count(*),ID from Table_1 group by ID;
select count(*),ID from Table_2 group by ID;
Just do a full outer join on your queries with the on condition as X.id = Y.id, and then select * from the resultant table checking for nulls on either side.
select coalesce(X.id, Y.id) as id,
concat(coalesce(X.cnt1, 0), ' entries in table 1, ', coalesce(Y.cnt2, 0), ' entries in table 2')
from (select count(*) as cnt1, id from table_1 group by id) X
full outer join (select count(*) as cnt2, id from table_2 group by id) Y
on X.id = Y.id
Try This. You may use a case statement to check if it should be record / records etc.
SELECT m.id,
CONCAT (COALESCE(a.ct, 0), ' record in table 1, ', COALESCE(b.ct, 0),
' record in table 2')
FROM (SELECT id
FROM table_1
UNION
SELECT id
FROM table_2) m
LEFT JOIN (SELECT Count(*) AS ct,
id
FROM table_1
GROUP BY id) a
ON m.id = a.id
LEFT JOIN (SELECT Count(*) AS ct,
id
FROM table_2
GROUP BY id) b
ON m.id = b.id;
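Here is a small sketch of the same "union of IDs plus two left joins" shape, run on an in-memory SQLite database instead of Hive (an assumption, made only so the example is self-contained). The row with ID 3 is hypothetical and not in the question; it shows the COALESCE fallback to 0. The record/records wording is handled by a hypothetical Python helper `describe` rather than a CASE expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER);
    CREATE TABLE table_2 (id INTEGER);
    INSERT INTO table_1 VALUES (1), (1), (2);
    INSERT INTO table_2 VALUES (1), (2), (2);
    -- Hypothetical extra row: an ID present only in table_1.
    INSERT INTO table_1 VALUES (3);
""")

# m lists every ID once; a and b hold the per-table counts, and COALESCE
# turns a missing count (no match in the LEFT JOIN) into 0.
counts = conn.execute("""
    SELECT m.id, COALESCE(a.ct, 0), COALESCE(b.ct, 0)
    FROM (SELECT id FROM table_1 UNION SELECT id FROM table_2) AS m
    LEFT JOIN (SELECT id, COUNT(*) AS ct FROM table_1 GROUP BY id) AS a
           ON m.id = a.id
    LEFT JOIN (SELECT id, COUNT(*) AS ct FROM table_2 GROUP BY id) AS b
           ON m.id = b.id
    ORDER BY m.id
""").fetchall()


def describe(n):
    """Return '1 record' or 'N records' (hypothetical helper)."""
    return f"{n} record" if n == 1 else f"{n} records"


messages = [
    f"{id_} - {describe(c1)} in table 1 and {describe(c2)} in table 2"
    for id_, c1, c2 in counts
]
for line in messages:
    print(line)
```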
You could use this Python program to do a full comparison of 2 Hive tables:
https://github.com/bolcom/hive_compared_bq
If you want a quick comparison just based on counts, then pass the "--just-count" option (you can also specify the group by column with "--group-by-column").
The script also allows you to visually see all the differences on all rows and all columns if you want a complete validation.

Join table on Count

I have two tables in Access, one containing IDs (not unique) and some Name and one containing IDs (not unique) and Location. I would like to return a third table that contains only the IDs of the elements that appear more than 1 time in either Names or Location.
Table 1
ID Name
1 Max
1 Bob
2 Jack
Table 2
ID Location
1 A
2 B
Basically in this setup it should return only ID 1 because 1 appears twice in Table 1 :
ID
1
I have tried to do a JOIN on the tables and then apply a COUNT but nothing came out.
Thanks in advance!
Here is one method that I think will work in MS Access:
(select id
from table1
group by id
having count(*) > 1
) union -- note: NOT union all
(select id
from table2
group by id
having count(*) > 1
);
MS Access does not allow union/union all in the from clause. Nor does it support full outer join. Note that the union will remove duplicates.
Simple Group By and Having clause should help you
select ID
From Table1
Group by ID
having count(1)>1
union
select ID
From Table2
Group by ID
having count(1)>1
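The GROUP BY / HAVING / UNION pattern can be checked quickly against the sample data from the question. This is a minimal sketch using an in-memory SQLite database as a stand-in for Access (an assumption); the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, location TEXT);
    INSERT INTO table1 VALUES (1, 'Max'), (1, 'Bob'), (2, 'Jack');
    INSERT INTO table2 VALUES (1, 'A'), (2, 'B');
""")

# Each branch finds the IDs that occur more than once in one table;
# UNION (not UNION ALL) merges the two lists and removes duplicates.
rows = conn.execute("""
    SELECT id FROM table1 GROUP BY id HAVING COUNT(*) > 1
    UNION
    SELECT id FROM table2 GROUP BY id HAVING COUNT(*) > 1
    ORDER BY id
""").fetchall()

print(rows)  # [(1,)]
```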
Based on your description, you do not need to join the tables to find duplicate records. If your table is as shown above, simply use:
With A
as
(
select ID,count(*) as Times From table group by ID
)
select * From A where A.Times>1
Not sure I understand what query you already tried, but this should work:
select table1.ID
from table1 inner join table2 on table1.id = table2.id
group by table1.ID
having count(*) > 1
Or if you have ID's in one table but not the other
select table1.ID
from table1 full outer join table2 on table1.id = table2.id
group by table1.ID
having count(*) > 1

Join between two tables, number of row

I hope you're well.
I am trying to write a SQL query with a join between two tables like the ones below:
table1 (id_master, id)
1,1
1,2
1,3
1,4
1,5
And the second table
table2 (id_master, id)
1,1
1,2
1,3
1,4
As you can see, each table contains id_master and id.
table2 contains the acknowledgements ("ack") of table1. Each row in table1 must have an "ack" in table2.
In my example, I should get no result, because row (1,5) of table1 has no ack in table2. I only want a result once row (1,5) also has an ack in table2.
I have tried with a join, but I get a result as soon as the first "ack" exists. I want a result only when all the "acks" exist.
I hope this is clear.
thanks for your help.
kind regards
EDIT :
Thanks stripathi & jpw,
Example1:
table1 (id_master, id)
1,A
1,B
1,C
2,D
2,E
the second table
table2 (id_master, id)
1,A
1,B
2,D
2,E
My query's result must be :
2,D
2,E
Because we can find the rows (2,D) and (2,E) in both tables, but this is not the case for (1,*): (1,C) is missing from table2.
I think both of these two queries should do what you want, and they seem to work using Oracle 11g R2 (see this SQL Fiddle). Note that the result might be wrong if the second table contains items that are not present in the first table.
select *
from table1
where id_master in (
select a.id_master
from table1 a
group by a.id_master
having count(distinct a.id) = (
select count(distinct b.id)
from table2 b
where a.id_master = b.id_master
group by b.id_master
)
);
select *
from table1 a
where not exists (
select id from Table1 where id_master = a.id_master
minus
select id from Table2 where id_master = a.id_master
);
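For engines without MINUS, the same "every row must be acknowledged" check can be written with two nested NOT EXISTS clauses. This is a swapped-in technique, not the query from the answer above. The sketch below runs it on the Example1 data from the question's edit in an in-memory SQLite database (an assumption; the answer was tested on Oracle).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id_master INTEGER, id TEXT);
    CREATE TABLE table2 (id_master INTEGER, id TEXT);
    INSERT INTO table1 VALUES (1, 'A'), (1, 'B'), (1, 'C'), (2, 'D'), (2, 'E');
    INSERT INTO table2 VALUES (1, 'A'), (1, 'B'), (2, 'D'), (2, 'E');
""")

# Keep a table1 row only if its id_master has no table1 row that lacks a
# matching (id_master, id) acknowledgement in table2.
rows = conn.execute("""
    SELECT a.id_master, a.id
    FROM table1 AS a
    WHERE NOT EXISTS (
        SELECT 1
        FROM table1 AS t1
        WHERE t1.id_master = a.id_master
          AND NOT EXISTS (
              SELECT 1
              FROM table2 AS t2
              WHERE t2.id_master = t1.id_master
                AND t2.id = t1.id
          )
    )
    ORDER BY a.id_master, a.id
""").fetchall()

print(rows)  # [(2, 'D'), (2, 'E')]
```

Master 1 is excluded because (1,C) has no acknowledgement; master 2 is returned in full.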
If you are using Oracle, you can use ROWNUM to get the row number of the first ack. You can try this:
SELECT ID,ID_MASTER FROM(
SELECT ID,ID_MASTER,ROWNUM RR
FROM TABLE2
ORDER BY ID_MASTER,ID ASC) T2
WHERE RR >= (
SELECT R FROM(
SELECT ID_MASTER,ID, ROWNUM R
FROM TABLE1
ORDER BY ID_MASTER,ID ASC
) T1
WHERE T1.ID_MASTER||T1.ID NOT IN(SELECT ID_MASTER||ID FROM TABLE2)
)

select first N distinct rows without inner select in oracle

I have something like the following structure: Table1 -> Table2 relationship is 1:m
I need to perform queries similar to the next one:
select Table1.id from Table1 left outer join Table2 on (Table1.id1 = Table2.id2) where Table2.name like '%a%' and rownum < 11
i.e. I want first 10 ids from Table 1 which fulfils conditions in Table2. The problem is that I've to use distinct, but the distinct clause applies after 'rownum < 11', so the result could be e.g. 5 records even if their number is more than 10.
The apparent solution is to use the following:
select id from ( select Table1.id from Table1 left outer join Table2 on (Table1.id1 = Table2.id2) where Table2.name like '%a%' ) where rownum < 11
But I'm afraid of performance of such a query. If Table1 contains about 300k records, and Table2 contains about 700k records, wouldn't such a query be really slow?
Is there another query, but without inner select? Unluckily, I want to avoid using inner selects.
"Unluckily, I want to avoid using inner selects"
Because the WHERE clause filters on Table2, the LEFT OUTER JOIN effectively becomes an INNER JOIN: a row where Table2.name is NULL can never satisfy Table2.name LIKE '%a%', so you will only get rows that have a match in Table2. Also, without a function-based index, the '%a%' pattern will result in a full table scan on each iteration.
But @lweller is completely correct: to do the query correctly you will need to use a subquery. Keep in mind that without an ORDER BY you have no guarantee of the order of your top X records (it may always appear that the values follow the primary key or the like, but there is no guarantee).
WITH TABLE1 AS(SELECT 1 ID FROM DUAL
UNION ALL
SELECT 2 ID FROM DUAL
UNION ALL
SELECT 3 ID FROM DUAL
UNION ALL
SELECT 4 ID FROM DUAL
UNION ALL
SELECT 5 ID FROM DUAL) ,
TABLE2 AS(SELECT 1 ID, 'AAA' NAME FROM DUAL
UNION ALL
SELECT 2 ID, 'ABB' NAME FROM DUAL
UNION ALL
SELECT 3 ID, 'ACC' NAME FROM DUAL
UNION ALL
SELECT 4 ID, 'ADD' NAME FROM DUAL
UNION ALL
SELECT 1 ID, 'BBB' NAME FROM DUAL
) ,
sortable as( --here is the subquery
SELECT
Table1.ID ,
ROW_NUMBER( ) OVER (ORDER BY Table2.NAME NULLS LAST) ROWOverName , --this will handle the sort
table2.name
from
Table1
LEFT OUTER JOIN --this left join is moot; pull the WHERE table2.name condition into the join to have it LEFT join as expected
Table2
on
(
Table1.id = Table2.id
)
WHERE
Table2.NAME LIKE '%A%')
SELECT *
FROM sortable
WHERE ROWOverName <= 2;
-- you can drop the ROW_NUMBER( ) analytic function and replace the final query as such (as you initially indicated)
SELECT *
FROM sortable
WHERE
ROWNUM <= 2
ORDER BY sortable.NAME --make sure to put in an order by!
;
You don't need DISTINCT here at all, and there is nothing bad in subqueries as such.
SELECT id
FROM Table1
WHERE id IN
(
SELECT id
FROM Table2
WHERE name LIKE '%a%'
)
AND rownum < 11
Note that the order is not guaranteed. To guarantee order, you have to use a nested query:
SELECT id
FROM (
SELECT id
FROM Table1
WHERE id IN
(
SELECT id
FROM Table2
WHERE name LIKE '%a%'
)
ORDER BY
id -- or whatever else
)
WHERE rownum < 11
There is no way to do it without nested queries (or the CTE).
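To see why the IN form needs no DISTINCT, the sketch below compares it with a plain join on a few made-up rows. It uses an in-memory SQLite database, where LIMIT with ORDER BY stands in for the Oracle nested ROWNUM query (both the data and the SQLite substitution are assumptions). ID 1 has two matching names in Table2, so the join returns it twice and uses up the row limit, while the IN subquery returns each ID at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INTEGER);
    CREATE TABLE Table2 (id INTEGER, name TEXT);
    INSERT INTO Table1 VALUES (1), (2), (3), (4), (5);
    INSERT INTO Table2 VALUES (1, 'AAA'), (1, 'BAB'), (2, 'ABB'), (3, 'CCC');
""")

# Join: one output row per matching Table2 row, so id 1 appears twice
# and fills both slots of the limit.
joined = conn.execute("""
    SELECT Table1.id
    FROM Table1
    JOIN Table2 ON Table1.id = Table2.id
    WHERE Table2.name LIKE '%A%'
    ORDER BY Table1.id
    LIMIT 2
""").fetchall()

# IN subquery (semi-join): each Table1 row is returned at most once,
# however many Table2 rows match it.
semi = conn.execute("""
    SELECT id
    FROM Table1
    WHERE id IN (SELECT id FROM Table2 WHERE name LIKE '%A%')
    ORDER BY id
    LIMIT 2
""").fetchall()

print(joined)  # [(1,), (1,)]
print(semi)    # [(1,), (2,)]
```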
For me, there is no reason to be afraid of the performance. I think the subselect is the best way to solve your problem. And if you don't trust me, take a look at the explain plan of your query and you will see that it does not behave as badly as you might think.