How do I merge data from two tables in a single database call into the same columns? - sql

If I run the two statements in batch will they return one table to two to my sqlcommand object with the data merged. What I am trying to do is optimize a search by searching twice, the first time on one set of data and then a second on another. They have the same fields and I’d like to have all the records from both tables show and be added to each other. I need this so that I can sort the data between both sets of data but short of writing a stored procedure I can’t think of a way of doing this.
Eg. Table 1 has columns A and B, Table 2 has these same columns but different data source. I then wan to merge them so that if a only exists in one column it is added to the result set and if both exist it eh tables the column B will be summed between the two.
Please note that this is not the same as a full outer join operation as that does not merge the data.
[EDIT]
Here's what the code looks like:
Select * From
(Select ID,COUNT(*) AS Count From [Table1]) as T1
full outer join
(Select ID,COUNT(*) AS Count From [Table2]) as T2
on t1.ID = T2.ID

Perhaps you're looking for UNION?
IE:
SELECT A, B FROM Table1
UNION
SELECT A, B FROM Table2

Possibly:
select table1.a, table1.b
from table1
where table1.a not in (select a from table2)
union all
select table1.a, table1.b+table2.b as b
from table1
inner join table2 on table1.a = table2.a
edit: perhaps you would benefit from unioning the tables before counting. e.g.
select id, count() as count from
(select id from table1
union all
select id from table2)

I'm not sure if I understand completely but you seem to be asking about a UNION
SELECT A,B
FROM tableX
UNION ALL
SELECT A,B
FROM tableY

To do it, you would go:
SELECT * INTO TABLE3 FROM TABLE1
UNION
SELECT * FROM TABLE2
Provided both tables have the same columns

I think what you are looking for is this, but I am not sure I am understanding your language correctly.
select id, sum(count) as count
from (
select id, count() as count
from table1
union all
select id, count() as count
from table2
) a
group by id

Related

How can I divide two columns and show results per row?

I would not be able to do what I want :(, I explain:
I have two tables:
Table1
Table2
I am looking for a query where I can get FALSE/(FALSE+TRUE).
So far I managed to do it over the total using this:
select
(select count(*) from (select Drink from Table1
where Drink=False)) as F,
(select count(*) from (select Drink from Table1
where Drink=True)) as T,
(F/(F+T)) as total
With that I can get the total of the column I want, I just have to change, if I want, Drink-->Food.
But now I would like the final result by Country, that is, something similar to this:
Expected result
In order not to break the forum rules, I'll leave what I've tried so far:
select * from (select distinct t2.Country_Name, count(t1.Drink)
from Table 1 t1
left join Table 2 t2 on t2.Country_ID=t1.Country_ID
where Drink=False
group by t2.Country_Name) as F
union
select * from (select distinct t2.Country_Name, count(t1.Drink)
from Table 1 t1
left join Table 2 t2 on t2.Country_ID=t1.Country_ID
where Drink=True
group by t2.Country_Name) as T
But it doesn't even bring me what I expect :(
I don't know if it helps much, but I'm using Snowflake
Any help is welcome. Thank you!
The first query could be simpliefied using COUNT_IF:
SELECT COUNT_IF(Drink=False)/COUNT(*)
FROM Table1;
Grouped by Country:
SELECT t2.Country_Name, COUNT_IF(t1.Drink=False)/COUNT(*)
FROM Table1 AS t1
JOIN Table2 AS t2
ON t2.Country_ID=t1.Country_ID
GROUP BY t2.Country_Name;

Comparing two tables for equality in HIVE

I have two tables, table1 and table2. Each with the same columns:
key, c1, c2, c3
I want to check to see if these tables are equal to eachother (they have the same rows). So far I have these two queries (<> = not equal in HIVE):
select count(*) from table1 t1
left outer join table2 t2
on t1.key=t2.key
where t2.key is null or t1.c1<>t2.c1 or t1.c2<>t2.c2 or t1.c3<>t2.c3
And
select count(*) from table1 t1
left outer join table2 t2
on t1.key=t2.key and t1.c1=t2.c1 and t1.c2=t2.c2 and t1.c3=t2.c3
where t2.key is null
So my idea is that, if a zero count is returned, the tables are the same. However, I'm getting a zero count for the first query, and a non-zero count for the second query. How exactly do they differ? If there is a better way to check this certainly let me know.
The first one excludes rows where t1.c1, t1.c2, t1.c3, t2.c1, t2.c2, or t2.c3 is null. That means that you effectively doing an inner join.
The second one will find rows that exist in t1 but not in t2.
To also find rows that exist in t2 but not in t1 you can do a full outer join. The following SQL assumes that all columns are NOT NULL:
select count(*) from table1 t1
full outer join table2 t2
on t1.key=t2.key and t1.c1=t2.c1 and t1.c2=t2.c2 and t1.c3=t2.c3
where t1.key is null /* this condition matches rows that only exist in t2 */
or t2.key is null /* this condition matches rows that only exist in t1 */
If you want to check for duplicates and the tables have exactly the same structure and the tables do not have duplicates within them, then you can do:
select t.key, t.c1, t.c2, t.c3, count(*) as cnt
from ((select t1.*, 1 as which from table1 t1) union all
(select t2.*, 2 as which from table2 t2)
) t
group by t.key, t.c1, t.c2, t.c3
having cnt <> 2;
There are various ways that you can relax the conditions in the first paragraph, if necessary.
Note that this version also works when the columns have NULL values. These might be causing the problem with your data.
Well, the best way is calculate the hash sum of each table, and compare the sum of hash.
So no matter how many column are they, no matter what data type are they, as long as the two table has the same schema, you can use following query to do the comparison:
select sum(hash(*)) from t1;
select sum(hash(*)) from t2;
And you just need to compare the return values.
I used EXCEPT statement and it worked.
select * from Original_table
EXCEPT
select * from Revised_table
Will show us all the rows of the Original table that are not in the Revised table.
If your table is partitioned you will have to provide a partition predicate.
Fyi, partition values don't need to be provided if you use Presto and querying via SQL lab.
I would recommend you not using any JOINs to try to compare tables:
it is quite an expensive operations when tables are big (which is often the case in Hive)
it can give problems when some rows/"join keys" are repeated
(and it can also be unpractical when data are in different clusters/datacenters/clouds).
Instead, I think using a checksum approach and comparing the checksums of both tables is best.
I have developed a Python script that allows you to do easily such comparison, and see the differences in a webbrowser:
https://github.com/bolcom/hive_compared_bq
I hope that can help you!
First get count for both the tables C1 and C2. C1 and C2 should be equal. C1 and C2 can be obtained from the following query
select count(*) from table1
if C1 and C2 are not equal, then the tables are not identical.
2: Find distinct count for both the tables DC1 and DC2. DC1 and DC2 should be equal. Number of distinct records can be found using the following query:
select count(*) from (select distinct * from table1)
if DC1 and DC2 are not equal, the tables are not identical.
3: Now get the number of records obtained by performing a union on the 2 tables. Let it be U. Use the following query to get the number of records in a union of 2 tables:
SELECT count (*)
FROM
(SELECT *
FROM table1
UNION
SELECT *
FROM table2)
You can say that the data in the 2 tables is identical if distinct count for the 2 tables is equal to the number of records obtained by performing union of the 2 tables. ie DC1 = U and DC2 = U
another variant
select c1-c2 "different row counts"
, c1-c3 "mismatched rows"
from
( select count(*) c1 from table1)
,( select count(*) c2 from table2 )
,(select count(*) c3 from table1 t1, table2 t2
where t1.key= t2.key
and T1.c1=T2.c1 )
Try with WITH Clause:
With cnt as(
select count(*) cn1 from table1
)
select 'X' from dual,cnt where cnt.cn1 = (select count(*) from table2);
One easy solution is to do inner join. Let's suppose we have two hive tables namely table1 and table2. Both the table has same column namely col1, col2 and col3. The number of rows should also be same. Then the command would be as follows
**
select count(*) from table1
inner join table2
on table1.col1 = table2.col1
and table1.col2 = table2.col2
and table1.col3 = table2.col3 ;
**
If the output value is same as number of rows in table1 and table2 , then all the columns has same value, If however the output count is lesser than there are some data which are different.
Use a MINUS operator:
SELECT count(*) FROM
(SELECT t1.c1, t1.c2, t1.c3 from table1 t1
MINUS
SELECT t2.c1, t2.c2, t2.c3 from table2 t2)

Fastest SQL & HQL Query for two tables

Table1: Columns A, B, C
Table2: Columns A, B, C
Table 2 is a copy of Table 1 with different data. Assume all columns to be varchar
Looking for a single efficient query which can fetch:
Columns A, B, C from Table1
Additional Rows from Table2 where values of Table2.A are not present in Table1.A
Any differences between the Oracle SQL & HQL for the same query will be appreciated.
I'm fiddling with Joins, Unions & Minus but not able to get the correct combination.
SQL:
SELECT *
FROM Table1
UNION ALL
SELECT *
FROM Table2 T2
WHERE NOT EXISTS(
SELECT 'X' FROM Table1 T1
WHERE T1.A = T2.A
)
HQL:
You must execute two different query an discard the element by Table2 result in a Java loop because in HQL doesn't exist UNION command.
Alternatatively you can write the first query for Table1 and the second query must have a not in clause to discard Table1 A field.
Solution 1:
Query 1:
SELECT * FROM Table1
Query 2:
SELECT * FROM Table2
and then you apply a discard loop in Java code
Solution 2:
Query 1:
SELECT * FROM Table1
Query 2:
SELECT * FROM Table2 WHERE Table2.A not in (SELECT Table1.A from Table1)
This query returns all rows in table1, plus all rows in table2 which does not exist in table1, given that column a is the common key.
select a,b,c
from table1
union
all
select a,b,c
from table2
where a not in(select a from table1);
There may be different options available depending on the relative sizes of table1 and table2 and the expected overlap.

Comparing two datasets SQL SSRS 2005

I have two datasets on two seperate servers. They both pull one column of information each.
I would like to build a report showing the values of the rows that only appear in one of the datasets.
From what I have read, it seems I would like to do this on the SQL side, not the reporting side; I am not sure how to do that.
If someone could shed some light on how that is possible, I would really appreciate it.
You can use the NOT EXISTS clause to get the differences between the two tables.
SELECT
Column
FROM
DatabaseName.SchemaName.Table1
WHERE
NOT EXISTS
(
SELECT
Column
FROM
LinkedServerName.DatabaseName.SchemaName.Table2
WHERE
Table1.Column = Table2.Column --looks at equalities, and doesn't
--include them because of the
--NOT EXISTS clause
)
This will show the rows in Table1 that don't appear in Table2. You can reverse the table names to find the rows in Table2 that don't appear in Table1.
Edit: Made an edit to show what the case would be in the event of linked servers. Also, if you wanted to see all of the rows that are not shared in both tables at the same time, you can try something as in the below.
SELECT
Column, 'Table1' TableName
FROM
DatabaseName.SchemaName.Table1
WHERE
NOT EXISTS
(
SELECT
Column
FROM
LinkedServerName.DatabaseName.SchemaName.Table2
WHERE
Table1.Column = Table2.Column --looks at equalities, and doesn't
--include them because of the
--NOT EXISTS clause
)
UNION
SELECT
Column, 'Table2' TableName
FROM
LinkedServerName.DatabaseName.SchemaName.Table2
WHERE
NOT EXISTS
(
SELECT
Column
FROM
DatabaseName.SchemaName.Table1
WHERE
Table1.Column = Table2.Column
)
You can also use a left join:
select a.* from tableA a
left join tableB b
on a.PrimaryKey = b.ForeignKey
where b.ForeignKey is null
This query will return all records from tableA that do not have corresponding records in tableB.
If you want rows that appear in exactly one data set and you have a matching key on each table, then you can use a full outer join:
select *
from table1 t1 full outer join
table2 t2
on t1.key = t2.key
where t1.key is null and t2.key is not null or
t1.key is not null and t2.key is null
The where condition chooses the rows where exactly one match.
The problem with this query, though, is that you get lots of columns with nulls. One way to fix this is by going through the columns one by one in the SELECT clause.
select coalesce(t1.key, t2.key) as key, . . .
Another way to solve this problem is to use a union with a window function. This version brings together all the rows and counts the number of times that key appears:
select t.*
from (select t.*, count(*) over (partition by key) as keycnt
from ((select 'Table1' as which, t.*
from table1 t
) union all
(select 'Table2' as which, t.*
from table2 t
)
) t
) t
where keycnt = 1
This has the additional column specifying which table the value comes from. It also has an extra column, keycnt, with the value 1. If you have a composite key, you would just replace with the list of columns specifying a match between the two tables.

Is it possible to restrict the results of an outer join?

I've got a scenario where I need to do a join across three tables.
table #1 is a list of users
table #2 contains users who have trait A
table #3 contains users who have trait B
If I want to find all the users who have trait A or trait B (in one simple sql) I think I'm stuck.
If I do a regular join, the people who don't have trait A won't show up in the result set to see if they have trait B (and vice versa).
But if I do an outer join from table 1 to tables 2 and 3, I get all the rows in table 1 regardless of the rest of my where clause specifying a requirement against tables 2 or 3.
Before you come up with multiple sqls and temp tables and whatnot, this program is far more complex, this is just the simple case. It dynamically creates the sql based on lots of external factors, so I'm trying to make it work in one sql.
I expect there are combinations of in or exists that will work, but I was hoping for some thing simple.
But basically the outer join will always yield all results from table 1, yes?
SELECT *
FROM table1
LEFT OUTER
JOIN table2
ON ...
LEFT OUTER
JOIN table3
ON ...
WHERE NOT (table2.pk IS NULL AND table3.pk IS NULL)
or if you want to be sneaky:
WHERE COALESCE(table2.pk, table3.pk) IS NOT NULL
but for you case, i simply suggest:
SELECT *
FROM table1
WHERE table1.pk IN (SELECT fk FROM table2)
OR table1.pk IN (SELECT fk FROM table3)
or the possibly more efficient:
SELECT *
FROM table1
WHERE table1.pk IN (SELECT fk FROM table2 UNION (SELECT fk FROM table3)
If you really just want the list of users that have one trait or the other, then:
SELECT userid FROM users
WHERE userid IN (SELECT userid FROM trait_a UNION SELECT userid FROM trait_b)
Regarding outerjoin specifically, longneck's answer looks like what I was in the midst of writing.
I think you could do a UNION here.
May I suggest:
SELECT columnList FROM Table1 WHERE UserID IN (SELECT UserID FROM Table2)
UNION
SELECT columnList FROM Table1 WHERE UserID IN (SELECT UserID FROM Table3)
Would something like this work? Keep in mind depending on the size of the tables left outer joins can be very expensive with regards to performance.
Select *
from table1
where userid in (Select t.userid
From table1 t
left outer join table2 t2 on t1.userid=t2.userid and t2.AttributeA is not null
left outer join table3 t3 on t1.userid=t3.userid and t3.AttributeB is not null
group by t.userid)
If all you want is the ids of the users then
SELECT UserId From Table2
UNION
SELECT UserId From Table3
is totally sufficient.
If you want some more infos from Table1 on these users, you can join the upper SQL to Table 1:
SELECT <list of columns from Table1>
FROM Table1 Join (
SELECT UserId From Table2
UNION
SELECT UserId From Table3) User on Table1.UserID = Users.UserID