Select count(*) from multiple tables in Hive

I have tables with the same name in 2 different schemas. What I want to do is get a count comparison in the 2 tables in the format
TableName : Count1 : Count2
How can I achieve this via Hive query?

Use UNION ALL:
select 'db1.table_name' table_name, count(col1) count1, count(col2) count2 from db1.table_name
UNION ALL
select 'db2.table_name' table_name, count(col1) count1, count(col2) count2 from db2.table_name

You can do a cross join of the count queries.
select t1.count1,t2.count2
from (select count(*) as count1 from tbl1) t1
cross join (select count(*) as count2 from tbl2) t2
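To get the exact TableName : Count1 : Count2 layout from the question, the cross join can carry the table name as a literal. Below is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for Hive, so only the SQL text carries over; the schema names db1 and db2 and the table name events are hypothetical.

```python
import sqlite3

# SQLite stands in for Hive here; db1, db2 and "events" are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS db1")
conn.execute("ATTACH DATABASE ':memory:' AS db2")

# Same table name in two different schemas, with different row counts.
conn.execute("CREATE TABLE db1.events (id INTEGER)")
conn.execute("CREATE TABLE db2.events (id INTEGER)")
conn.executemany("INSERT INTO db1.events VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO db2.events VALUES (?)", [(1,), (2,)])

# Each subquery returns exactly one row, so the cross join returns one row.
query = """
SELECT 'events' AS table_name, t1.count1, t2.count2
FROM (SELECT COUNT(*) AS count1 FROM db1.events) t1
CROSS JOIN (SELECT COUNT(*) AS count2 FROM db2.events) t2
"""
table_name, count1, count2 = conn.execute(query).fetchone()
line = f"{table_name} : {count1} : {count2}"
print(line)
```

Because both sides of the join are single-row aggregates, the cross join cannot multiply rows, which is why no join condition is needed.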

Try full outer join
select tt1.cn,tt2.cn from
(select count(1) as cn from db1.table) tt1
full outer join
(select count(1) as cn from db2.table ) tt2
on tt1.cn=tt2.cn;

Related

How do I add two counts in HIVE hql?

So I am aware I can get the count of one table by using select count(*) from table1;
I have tried
select (select count(*) from table1) table1,
(select count(*) from table2) table2
from dual;
However it does not work.

Two possible solutions: a cross join, or UNION ALL plus aggregation.
Cross join:
select t1.cnt as table1_count,
t2.cnt as table2_count
from
(select count(*) cnt from table1) t1
cross join
(select count(*) cnt from table2) t2
Union all + max aggregation:
select max(t1_cnt) table1_count, max(t2_cnt) table2_count
from
(
select count(*) t1_cnt, 0 t2_cnt from table1
union all
select 0 t1_cnt, count(*) t2_cnt from table2
) s
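Since the question title asks about adding the two counts, here is a small sketch that runs both approaches and also sums the counts. It uses Python's built-in sqlite3 rather than Hive, so treat it as an illustration of the SQL shape only; the sample rows and the total column are assumptions added for the example.

```python
import sqlite3

# SQLite stands in for Hive; the sample rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER)")
conn.execute("CREATE TABLE table2 (id INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO table2 VALUES (?)", [(1,), (2,), (3,), (4,)])

# Approach 1: cross join of two single-row count subqueries.
cross_join_sql = """
SELECT t1.cnt AS table1_count, t2.cnt AS table2_count, t1.cnt + t2.cnt AS total
FROM (SELECT COUNT(*) AS cnt FROM table1) t1
CROSS JOIN (SELECT COUNT(*) AS cnt FROM table2) t2
"""

# Approach 2: UNION ALL pads the other table's column with 0, then MAX collapses
# the two rows into one.
union_all_sql = """
SELECT MAX(t1_cnt) AS table1_count, MAX(t2_cnt) AS table2_count,
       MAX(t1_cnt) + MAX(t2_cnt) AS total
FROM (
    SELECT COUNT(*) AS t1_cnt, 0 AS t2_cnt FROM table1
    UNION ALL
    SELECT 0 AS t1_cnt, COUNT(*) AS t2_cnt FROM table2
) s
"""

cross_join_row = conn.execute(cross_join_sql).fetchone()
union_all_row = conn.execute(union_all_sql).fetchone()
print(cross_join_row, union_all_row)
```

Both queries return a single row with the same three values, so the choice between them is mostly a matter of readability and how the engine plans each one.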

Fastest Query to find records which have equal values and specific different value

I have a table transactions with the following columns:
transactionId, systemId, subId and type
I need to find all transactionIds that have
the same subId and type but a different systemId
I tried the following query but I am not sure that it is the fastest query to use:
SELECT DISTINCT transactionId, T1.systemId system1,
T2.systemId system2, T1.subId
FROM transactions T1
INNER JOIN transactions T2
WHERE T1.subId = T2.subId
AND T1.type = T2.type
AND T1.systemId != T2.systemId

Simply use a GROUP BY with a HAVING clause:
SELECT DISTINCT transactionId
FROM transactions
GROUP BY transactionId, subId, type
HAVING COUNT(DISTINCT systemId) > 1

Many factors affect execution efficiency, so it is worth writing several candidate queries and comparing them on your own data.
For example:
select *
from (
select t1.*,
count(distinct t1.systemId) over(partition by t1.type, t1.subId) cot
from transactions t1
) t1
where t1.cot > 1
;
select *
from transactions t1
where exists(
select *
from (
select v1.type, v1.subId
from transactions v1
group by v1.type, v1.subId
having count(distinct v1.systemId) > 1
) vv1
where vv1.type = t1.type
and vv1.subId = t1.subId
)
;
Depending on the DBMS you are using, there may be other ways to write this query.
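To see what the EXISTS variant returns, here is a rough sketch on made-up data using Python's built-in sqlite3 (SQLite is only a convenient stand-in; it does not accept DISTINCT inside window aggregates, so the window-function variant is not shown). The sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions "
    "(transactionId INTEGER, systemId TEXT, subId INTEGER, type TEXT)"
)
# Hypothetical sample rows: only (subId=10, type='x') spans two systems.
sample_rows = [
    (1, "A", 10, "x"),
    (2, "B", 10, "x"),
    (3, "A", 20, "y"),
    (4, "A", 20, "y"),
    (5, "C", 30, "z"),
]
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", sample_rows)

# Keep a row when its (type, subId) group contains more than one distinct systemId.
sql = """
SELECT t1.transactionId
FROM transactions t1
WHERE EXISTS (
    SELECT 1
    FROM (
        SELECT v1.type, v1.subId
        FROM transactions v1
        GROUP BY v1.type, v1.subId
        HAVING COUNT(DISTINCT v1.systemId) > 1
    ) vv1
    WHERE vv1.type = t1.type
      AND vv1.subId = t1.subId
)
ORDER BY t1.transactionId
"""
matching_ids = [row[0] for row in conn.execute(sql)]
print(matching_ids)
```

Transactions 3 and 4 share subId and type but also share the same systemId, so they are excluded; only the group that spans two systems is returned.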

How do I count three different distinct values and group on an ID in MS-Access?

So I know MS-Access does not allow SELECT COUNT(DISTINCT....) FROM ..., but I am trying to find a more viable alternative to the usual standard of
SELECT COUNT(*) FROM (SELECT DISTINCT Name FROM table1)
My problem is I am trying to do three separate Count functions and group them on ID. If I use the method above, it is giving me the total unique value count for the whole table instead of the total count for only the value of ID. I tried doing
(SELECT COUNT(*) FROM (SELECT DISTINCT Name FROM table1 as T2
WHERE T2.ColumnA = T1.ColumnA)) As MyVal
FROM table1 as T1
but it tells me I need to specify a value for T1.ColumnA.
The SQL query I am trying to accomplish is this:
SELECT ID,
COUNT(DISTINCT ColumnA) as CA,
COUNT(DISTINCT ColumnB) as CB,
COUNT(DISTINCT ColumnC) as CC
FROM table1
GROUP BY ID
Any ideas?

You can use subqueries. Assuming you have a table where each id occurs once:
select (select count(*)
from (select columnA
from table1 t1
where t1.id = t.id
group by columnA
) as a
) as num_a,
(select count(*)
from (select columnB
from table1 t1
where t1.id = t.id
group by columnB
) as b
) as num_b,
(select count(*)
from (select columnC
from table1 t1
where t1.id = t.id
group by columnC
) as c
) as num_c
from <table with ids> as t;
I'm not sure if you'll think this is "viable".
EDIT:
This makes it even more complicated . . . it suggests that MS Access doesn't support correlation clauses more than one level deep (might you consider switching to another database?).
In any case, the brute force way:
select a.id, a.numA, b.numB, c.numC
from ((select id, count(*) as numA
from (select id, columnA
from table1 t1
group by id, columnA
) as a
group by id
) as a inner join
(select id, count(*) as numB
from (select id, columnB
from table1 t1
group by id, columnB
) as b
group by id
) as b
on a.id = b.id
) inner join
(select id, count(*) as numC
from (select id, columnC
from table1 t1
group by id, columnC
) as c
group by id
) as c
on c.id = a.id;
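As a sanity check on the nested GROUP BY idea, the sketch below runs it in SQLite through Python's built-in sqlite3 and compares the result with COUNT(DISTINCT ...), which SQLite does support. This is not MS Access, so join parenthesization and other Access-specific syntax are not exercised; the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (ID INTEGER, ColumnA TEXT, ColumnB TEXT, ColumnC TEXT)"
)
# Hypothetical sample data with repeated values per ID.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (1, "a", "x", "p"),
        (1, "a", "y", "q"),
        (1, "b", "y", "r"),
        (2, "c", "z", "p"),
        (2, "c", "z", "p"),
    ],
)

# Inner GROUP BY removes duplicates per (ID, column); outer GROUP BY counts them.
nested_sql = """
SELECT a.ID, a.numA, b.numB, c.numC
FROM (SELECT ID, COUNT(*) AS numA
      FROM (SELECT ID, ColumnA FROM table1 GROUP BY ID, ColumnA) AS x
      GROUP BY ID) AS a
JOIN (SELECT ID, COUNT(*) AS numB
      FROM (SELECT ID, ColumnB FROM table1 GROUP BY ID, ColumnB) AS y
      GROUP BY ID) AS b ON a.ID = b.ID
JOIN (SELECT ID, COUNT(*) AS numC
      FROM (SELECT ID, ColumnC FROM table1 GROUP BY ID, ColumnC) AS z
      GROUP BY ID) AS c ON a.ID = c.ID
ORDER BY a.ID
"""

# Reference result using COUNT(DISTINCT ...), unavailable in MS Access.
distinct_sql = """
SELECT ID, COUNT(DISTINCT ColumnA), COUNT(DISTINCT ColumnB), COUNT(DISTINCT ColumnC)
FROM table1
GROUP BY ID
ORDER BY ID
"""

nested_rows = conn.execute(nested_sql).fetchall()
distinct_rows = conn.execute(distinct_sql).fetchall()
print(nested_rows)
```

One caveat: COUNT(DISTINCT col) ignores NULLs, while the nested GROUP BY counts NULL as its own group, so the two can differ by one when a column contains NULLs. The sample data here has none.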

SQL WHERE Subquery in Field List

I have query like:
SELECT field
FROM table
WHERE
(
SELECT COUNT(*)
FROM table2
WHERE table2.field = table.field
)
!=
(
SELECT COUNT(*)
FROM table3
WHERE table3.field = table.field
)
Now I want to have those WHERE subqueries in my field list like:
SELECT field, count1, count2
FROM table
WHERE
(
SELECT COUNT(*)
FROM table2
WHERE table2.field = table.field
) AS Count1
!=
(
SELECT COUNT(*)
FROM table3
WHERE table3.field = table.field
) AS Count2
Is this possible? Of course I could put those subqueries in the field list, but then I can't compare them.
Any ideas?

You can do this if you use SQL Server:
SELECT field, ca2.c2, ca3.c3
FROM table t
cross apply(SELECT COUNT(*) c2
FROM table2 t2
WHERE t2.field = t.field)ca2
cross apply(SELECT COUNT(*) c3
FROM table3 t3
WHERE t3.field = t.field)ca3
where ca2.c2 <> ca3.c3

Use correlated sub-selects to count. Wrap up in a derived table:
select dt.* from
(
SELECT field,
(SELECT COUNT(*)
FROM table2
WHERE table2.field = table.field) as cnt1,
(SELECT COUNT(*)
FROM table3
WHERE table3.field = table.field) as cnt2
FROM table
) dt
where dt.cnt1 <> dt.cnt2

You just need to use a Derived Table:
select *
from
(
SELECT field,
(
SELECT COUNT(*)
FROM table2
WHERE table2.field = table.field
) AS Count1,
(
SELECT COUNT(*)
FROM table3
WHERE table3.field = table.field
) AS Count2
FROM table
) dt
WHERE Count1 <> Count2
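A quick way to try the derived-table pattern is the sketch below, which runs it in SQLite via Python's built-in sqlite3. The outer table is called items here (a hypothetical name) because table is a reserved word; the sample rows are also assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "items" is a hypothetical stand-in for the question's outer table.
conn.execute("CREATE TABLE items (field TEXT)")
conn.execute("CREATE TABLE table2 (field TEXT)")
conn.execute("CREATE TABLE table3 (field TEXT)")
conn.executemany("INSERT INTO items VALUES (?)", [("a",), ("b",), ("c",)])
conn.executemany("INSERT INTO table2 VALUES (?)", [("a",), ("a",), ("b",)])
conn.executemany("INSERT INTO table3 VALUES (?)", [("a",), ("b",)])

# The inner query names the two counts; the outer query can then compare them.
sql = """
SELECT dt.field, dt.Count1, dt.Count2
FROM (
    SELECT field,
           (SELECT COUNT(*) FROM table2 WHERE table2.field = items.field) AS Count1,
           (SELECT COUNT(*) FROM table3 WHERE table3.field = items.field) AS Count2
    FROM items
) dt
WHERE dt.Count1 <> dt.Count2
"""
mismatches = conn.execute(sql).fetchall()
print(mismatches)
```

Only field 'a' is returned: it appears twice in table2 and once in table3, while 'b' (1 and 1) and 'c' (0 and 0) have matching counts.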

Sum on subqueries on SQL Server

I have a query with some subqueries inside and I want to add a sum query to sum them all.
How can I do that?
example:
Id,
(SELECT COUNT(*) FROM table1 LEFT JOIN table2 on ...) as col1,
(SELECT COUNT(*) FROM table3 LEFT JOIN table4 on ...) as col2,
** Sum of both col1 and col2 here **

Try this:
SELECT ID, col1, col2, [Total] = (col1 + col2)
FROM (
SELECT Id,
(SELECT COUNT(*) FROM table1 LEFT JOIN table2 on ...) as col1,
(SELECT COUNT(*) FROM table3 LEFT JOIN table4 on ...) as col2
FROM [TABLE]) T
Hope that helps.

The easiest way would be to treat your whole query as a subquery:
select Id, col1 + col2 as total
from
(<yourCode>) s
This is needed because a column alias cannot be referenced at the same query level in the SELECT clause that defines it.
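The wrapping idea can be sketched as follows, using SQLite through Python's built-in sqlite3 instead of SQL Server. The joins in the question are elided, so they are not reconstructed here; the tables parents, table1, table3 and the parent_id column are hypothetical placeholders, and the portable AS alias is used in place of the [Total] = ... form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the question's real joins are elided, so simple
# correlated counts stand in for them.
conn.execute("CREATE TABLE parents (Id INTEGER)")
conn.execute("CREATE TABLE table1 (parent_id INTEGER)")
conn.execute("CREATE TABLE table3 (parent_id INTEGER)")
conn.executemany("INSERT INTO parents VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO table1 VALUES (?)", [(1,), (1,), (2,)])
conn.executemany("INSERT INTO table3 VALUES (?)", [(1,)])

# col1 and col2 are defined in the inner query, so the outer query may add them.
sql = """
SELECT Id, col1, col2, col1 + col2 AS total
FROM (
    SELECT Id,
           (SELECT COUNT(*) FROM table1 WHERE table1.parent_id = parents.Id) AS col1,
           (SELECT COUNT(*) FROM table3 WHERE table3.parent_id = parents.Id) AS col2
    FROM parents
) s
ORDER BY Id
"""
totals = conn.execute(sql).fetchall()
print(totals)
```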
