Select count(*) from multiple tables in Hive

I have tables with the same name in 2 different schemas. What I want to do is get a count comparison in the 2 tables in the format
TableName : Count1 : Count2
How can I achieve this via Hive query?

Use UNION ALL:
select 'db1.table_name' table_name, count(col1) count1, count(col2) count2 from db1.table_name
UNION ALL
select 'db2.table_name' table_name, count(col1) count1, count(col2) count2 from db2.table_name

You can do a cross join of the count queries.
select t1.count1,t2.count2
from (select count(*) as count1 from tbl1) t1
cross join (select count(*) as count2 from tbl2) t2
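
As a rough illustration of the cross join approach, here is a minimal sketch that runs the same query shape through Python's built-in sqlite3 module. SQLite is only a stand-in for Hive here (an assumption so the example runs anywhere); db1 and db2 follow the question, while the table name events and the sample rows are hypothetical.

```python
import sqlite3

# SQLite stands in for Hive; two attached in-memory databases play the two schemas.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS db1")
conn.execute("ATTACH DATABASE ':memory:' AS db2")

# Hypothetical table "events" with the same name in both schemas.
conn.execute("CREATE TABLE db1.events (id INTEGER)")
conn.execute("CREATE TABLE db2.events (id INTEGER)")
conn.executemany("INSERT INTO db1.events VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO db2.events VALUES (?)", [(1,), (2,)])

# Each subquery returns exactly one row, so the cross join returns one row.
query = """
SELECT 'events' AS table_name, t1.count1, t2.count2
FROM (SELECT COUNT(*) AS count1 FROM db1.events) t1
CROSS JOIN (SELECT COUNT(*) AS count2 FROM db2.events) t2
"""
row = conn.execute(query).fetchone()

# Print in the TableName : Count1 : Count2 format from the question.
print(" : ".join(str(value) for value in row))
```

Because both subqueries always produce a single row, the result has one row even when one of the tables is empty.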

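Try a full outer join. Note that because the join condition compares the counts, both values land on the same row only when they are equal; otherwise each count comes back on its own row with NULL in the other column.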
select tt1.cn,tt2.cn from
(select count(1) as cn from db1.table) tt1
full outer join
(select count(1) as cn from db2.table ) tt2
on tt1.cn=tt2.cn;

Related

How do I add two counts in HIVE hql?

So I am aware I can get the count of one table by using select count(*) from table1;
I have tried
select(select count(*) from table1) table1,
(select count(*) from table2) table2
from dual;
However it does not work.
There are two possible solutions: a cross join, or UNION ALL plus aggregation.
Cross join:
select t1.cnt as table1_count,
t2.cnt as table2_count
from
(select count(*) cnt from table1) t1
cross join
(select count(*) cnt from table2) t2
Union all + max aggregation:
select max(t1_cnt) table1_count, max(t2_cnt) table2_count
from
(
select count(*) t1_cnt, 0 t2_cnt from table1
union all
select 0 t1_cnt, count(*) t2_cnt from table2
) s
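
To make the UNION ALL variant concrete, the following sketch runs it with Python's built-in sqlite3 module, which is used here only as a convenient stand-in for Hive. The sample rows and the extra total_count column (the two counts added together) are assumptions added for illustration.

```python
import sqlite3

# SQLite stands in for Hive; table contents are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER)")
conn.execute("CREATE TABLE table2 (id INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?)", [(1,), (2,), (3,), (4,)])
conn.executemany("INSERT INTO table2 VALUES (?)", [(10,)])

# Each branch fills its own count column and pads the other with 0;
# MAX then collapses the two rows into one.
query = """
SELECT MAX(t1_cnt) AS table1_count,
       MAX(t2_cnt) AS table2_count,
       MAX(t1_cnt) + MAX(t2_cnt) AS total_count
FROM (
    SELECT COUNT(*) AS t1_cnt, 0 AS t2_cnt FROM table1
    UNION ALL
    SELECT 0 AS t1_cnt, COUNT(*) AS t2_cnt FROM table2
) s
"""
counts = conn.execute(query).fetchone()
print(counts)
```

Since counts are never negative, padding with 0 and taking MAX recovers each count unchanged.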

Fastest Query to find records which have equal values and specific different value

I have a table transactions with the following columns:
transactionId, systemId, subId and type
I need to find all transactionIds that have
the same subId and type but different systemId values
I tried the following query but I am not sure that it is the fastest query to use:
SELECT DISTINCT T1.transactionId, T1.systemId system1,
T2.systemId system2, T1.subId
FROM transactions T1
INNER JOIN transactions T2
ON T1.subId = T2.subId
AND T1.type = T2.type
AND T1.systemId != T2.systemId
Simply use a GROUP BY with a HAVING clause:
SELECT DISTINCT transactionId
FROM transactions
GROUP BY transactionId, subId, type
HAVING COUNT(DISTINCT systemId) > 1
Many factors affect execution efficiency, so it is worth writing several candidate queries and comparing them on your own data.
For example:
select *
from (
select t1.*,
count(distinct t1.systemId) over(partition by t1.type, t1.subId) cot
from transactions t1
) t1
where t1.cot > 1
;
select *
from transactions t1
where exists(
select *
from (
select v1.type, v1.subId
from transactions v1
group by v1.type, v1.subId
having count(distinct v1.systemId) > 1
) vv1
where vv1.type = t1.type
and vv1.subId = t1.subId
)
;
Depending on the DBMS you are using, other formulations may also be available.
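
A small sketch of the grouping idea, using Python's built-in sqlite3 module as a stand-in for whatever DBMS is actually in use. It assumes transactionId is unique per row and groups by subId and type only, as in the second query above; the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (transactionId INTEGER, systemId TEXT, subId INTEGER, type TEXT)"
)
# Hypothetical data: only the (10, 'x') group spans more than one system.
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        (1, "A", 10, "x"),
        (2, "B", 10, "x"),
        (3, "A", 20, "y"),
        (4, "A", 20, "y"),
        (5, "C", 30, "z"),
    ],
)

# Find (subId, type) groups with more than one distinct systemId,
# then join back to list the transactions in those groups.
query = """
SELECT t.transactionId
FROM transactions t
JOIN (
    SELECT subId, type
    FROM transactions
    GROUP BY subId, type
    HAVING COUNT(DISTINCT systemId) > 1
) g ON g.subId = t.subId AND g.type = t.type
ORDER BY t.transactionId
"""
matching_ids = [row[0] for row in conn.execute(query)]
print(matching_ids)
```

Which formulation is fastest still depends on the engine, the indexes, and the data, so timing each candidate on real data remains the only reliable comparison.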

How do I count three different distinct values and group on an ID in MS-Access?

So I know MS-Access does not allow SELECT COUNT(DISTINCT....) FROM ..., but I am trying to find a more viable alternative to the usual standard of
SELECT COUNT(*) FROM (SELECT DISTINCT Name FROM table1)
My problem is I am trying to do three separate Count functions and group them on ID. If I use the method above, it is giving me the total unique value count for the whole table instead of the total count for only the value of ID. I tried doing
(SELECT COUNT(*) FROM (SELECT DISTINCT Name FROM table1 as T2
WHERE T2.ColumnA = T1.ColumnA)) As MyVal
FROM table1 as T1
but it tells me I need to specify a value for T1.ColumnA.
The SQL query I am trying to accomplish is this:
SELECT ID,
COUNT(DISTINCT ColumnA) as CA,
COUNT(DISTINCT ColumnB) as CB,
COUNT(DISTINCT ColumnC) as CC
FROM table1
GROUP BY ID
Any ideas?
You can use subqueries. Assuming you have a table where each id occurs once:
select (select count(*)
from (select columnA
from table1 t1
where t1.id = t.id
group by columnA
) as a
) as num_a,
(select count(*)
from (select columnB
from table1 t1
where t1.id = t.id
group by columnB
) as b
) as num_b,
(select count(*)
from (select columnC
from table1 t1
where t1.id = t.id
group by columnC
) as c
) as num_c
from <table with ids> as t;
I'm not sure if you'll think this is "viable".
EDIT:
This makes it even more complicated . . . it suggests that MS Access doesn't support correlation clauses more than one level deep (might you consider switching to another database?).
In any case, the brute force way:
select a.id, a.numA, b.numB, c.numC
from ((select id, count(*) as numA
from (select id, columnA
from table1 t1
group by id, columnA
) as a
group by id
) as a inner join
(select id, count(*) as numB
from (select id, columnB
from table1 t1
group by id, columnB
) as b
group by id
) as b
on a.id = b.id
) inner join
(select id, count(*) as numC
from (select id, columnC
from table1 t1
group by id, columnC
) as c
group by id
) c
on c.id = a.id;
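
The two-level grouping behind the brute force query can be checked outside Access. The sketch below uses Python's built-in sqlite3 module, which does support COUNT(DISTINCT ...), so the emulated result can be compared with the direct one. The sample rows and the inner aliases da, db, dc are hypothetical, and SQLite is only a stand-in, so Access-specific syntax such as the nested join parentheses is not reproduced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, columnA TEXT, columnB TEXT, columnC TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (1, "x", "p", "m"),
        (1, "x", "q", "m"),
        (1, "y", "r", "m"),
        (2, "z", "p", "m"),
        (2, "z", "p", "n"),
    ],
)

# Inner GROUP BY removes duplicates per id; outer GROUP BY counts what is left.
emulated_sql = """
SELECT a.id, a.numA, b.numB, c.numC
FROM (SELECT id, COUNT(*) AS numA
      FROM (SELECT id, columnA FROM table1 GROUP BY id, columnA) AS da
      GROUP BY id) AS a
INNER JOIN (SELECT id, COUNT(*) AS numB
      FROM (SELECT id, columnB FROM table1 GROUP BY id, columnB) AS db
      GROUP BY id) AS b ON a.id = b.id
INNER JOIN (SELECT id, COUNT(*) AS numC
      FROM (SELECT id, columnC FROM table1 GROUP BY id, columnC) AS dc
      GROUP BY id) AS c ON c.id = a.id
ORDER BY a.id
"""
emulated = conn.execute(emulated_sql).fetchall()

# Reference result using COUNT(DISTINCT ...), which SQLite supports directly.
direct_sql = """
SELECT id, COUNT(DISTINCT columnA), COUNT(DISTINCT columnB), COUNT(DISTINCT columnC)
FROM table1
GROUP BY id
ORDER BY id
"""
direct = conn.execute(direct_sql).fetchall()
print(emulated, direct)
```

One difference to keep in mind: COUNT(DISTINCT col) ignores NULLs, while the grouped emulation counts NULL as one more group, so the two agree only when the counted columns contain no NULLs.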

SQL WHERE Subquery in Field List

I have a query like:
SELECT field
FROM table
WHERE
(
SELECT COUNT(*)
FROM table2
WHERE table2.field = table.field
)
!=
(
SELECT COUNT(*)
FROM table3
WHERE table3.field = table.field
)
Now I want to have those WHERE subqueries in my field list like:
SELECT field, count1, count2
FROM table
WHERE
(
SELECT COUNT(*)
FROM table2
WHERE table2.field = table.field
) AS Count1
!=
(
SELECT COUNT(*)
FROM table3
WHERE table3.field = table.field
) AS Count2
Is this possible? Of course I could put those subqueries in the field list, but then I can't compare them.
Any ideas?
You can do this if you use SQL Server:
SELECT field, ca2.c2, ca3.c3
FROM table t
cross apply(SELECT COUNT(*) c2
FROM table2 t2
WHERE t2.field = t.field)ca2
cross apply(SELECT COUNT(*) c3
FROM table3 t3
WHERE t3.field = t.field)ca3
where ca2.c2 <> ca3.c3
Use correlated sub-selects to count. Wrap up in a derived table:
select dt.* from
(
SELECT field,
(SELECT COUNT(*)
FROM table2
WHERE table2.field = table.field) as cnt1,
(SELECT COUNT(*)
FROM table3
WHERE table3.field = table.field) as cnt2
FROM table
) dt
where dt.cnt1 <> dt.cnt2
You just need to use a Derived Table:
select *
from
(
SELECT field,
(
SELECT COUNT(*)
FROM table2
WHERE table2.field = table.field
) AS Count1,
(
SELECT COUNT(*)
FROM table3
WHERE table3.field = table.field
) AS Count2
FROM table
) dt
WHERE Count1 <> Count2
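
A runnable sketch of the derived table approach, using Python's built-in sqlite3 module as a stand-in database. The outer table is called items here because table is a reserved word; that name and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (field TEXT)")
conn.execute("CREATE TABLE table2 (field TEXT)")
conn.execute("CREATE TABLE table3 (field TEXT)")
conn.executemany("INSERT INTO items VALUES (?)", [("a",), ("b",), ("c",)])
conn.executemany("INSERT INTO table2 VALUES (?)", [("a",), ("a",), ("b",)])
conn.executemany("INSERT INTO table3 VALUES (?)", [("a",), ("b",), ("c",)])

# The inner query names the two counts; the outer query can then
# both select them and compare them in its WHERE clause.
query = """
SELECT dt.field, dt.count1, dt.count2
FROM (
    SELECT i.field,
           (SELECT COUNT(*) FROM table2 t2 WHERE t2.field = i.field) AS count1,
           (SELECT COUNT(*) FROM table3 t3 WHERE t3.field = i.field) AS count2
    FROM items i
) dt
WHERE dt.count1 <> dt.count2
ORDER BY dt.field
"""
mismatches = conn.execute(query).fetchall()
print(mismatches)
```

Row b is filtered out because both of its counts are 1, while a and c are kept with their counts visible in the field list.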

Sum on subqueries on SQL Server

I have a query with some subqueries inside and I want to add a sum query to sum them all.
How can I do that?
example:
Id,
(SELECT COUNT(*) FROM table1 LEFT JOIN table2 on ...) as col1,
(SELECT COUNT(*) FROM table3 LEFT JOIN table4 on ...) as col2,
** Sum of both col1 and col2 here **
Try this:
SELECT ID, col1, col2, [Total] = (col1 + col2)
FROM (
SELECT Id,
(SELECT COUNT(*) FROM table1 LEFT JOIN table2 on ...) as col1,
(SELECT COUNT(*) FROM table3 LEFT JOIN table4 on ...) as col2
FROM [TABLE]) T
Hope that helps.
The easiest way is to wrap your whole query in a subquery:
select Id, col1 + col2 as total
from
(<yourCode>) s
The wrapper is needed because a column alias cannot be referenced by other expressions in the same SELECT list that defines it.
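
The wrapping technique can be tried with Python's built-in sqlite3 module, used here as a stand-in for SQL Server. The join conditions in the question are elided, so this sketch replaces them with simple correlated counts; the owners table, the owner_id columns, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE owners (Id INTEGER)")
conn.execute("CREATE TABLE table1 (owner_id INTEGER)")
conn.execute("CREATE TABLE table3 (owner_id INTEGER)")
conn.executemany("INSERT INTO owners VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO table1 VALUES (?)", [(1,), (1,)])
conn.executemany("INSERT INTO table3 VALUES (?)", [(1,), (2,), (2,)])

# The inner query defines col1 and col2; the outer query is one level up,
# so it is allowed to reference both aliases and add them.
query = """
SELECT Id, col1, col2, col1 + col2 AS total
FROM (
    SELECT o.Id,
           (SELECT COUNT(*) FROM table1 t WHERE t.owner_id = o.Id) AS col1,
           (SELECT COUNT(*) FROM table3 t WHERE t.owner_id = o.Id) AS col2
    FROM owners o
) s
ORDER BY Id
"""
totals = conn.execute(query).fetchall()
print(totals)
```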