Fastest SQL & HQL Query for two tables - sql

Table1: Columns A, B, C
Table2: Columns A, B, C
Table 2 is a copy of Table 1 with different data. Assume all columns to be varchar
Looking for a single efficient query which can fetch:
Columns A, B, C from Table1
Additional Rows from Table2 where values of Table2.A are not present in Table1.A
Any differences between the Oracle SQL & HQL for the same query will be appreciated.
I'm fiddling with Joins, Unions & Minus but not able to get the correct combination.

SQL:
SELECT *
FROM Table1
UNION ALL
SELECT *
FROM Table2 T2
WHERE NOT EXISTS(
SELECT 'X' FROM Table1 T1
WHERE T1.A = T2.A
)
HQL:
You must execute two different query an discard the element by Table2 result in a Java loop because in HQL doesn't exist UNION command.
Alternatatively you can write the first query for Table1 and the second query must have a not in clause to discard Table1 A field.
Solution 1:
Query 1:
SELECT * FROM Table1
Query 2:
SELECT * FROM Table2
and then you apply a discard loop in Java code
Solution 2:
Query 1:
SELECT * FROM Table1
Query 2:
SELECT * FROM Table2 WHERE Table2.A not in (SELECT Table1.A from Table1)

This query returns all rows in table1, plus all rows in table2 which does not exist in table1, given that column a is the common key.
select a,b,c
from table1
union
all
select a,b,c
from table2
where a not in(select a from table1);
There may be different options available depending on the relative sizes of table1 and table2 and the expected overlap.

Related

Use wildcards that are stored in table columns in SQL-queries with MS Access

I want to join two tables in Access based on different wildcards for different rows.
The first, table1, contains rows with different wildcards and table2 contains the column that should be matched with the wildcards in table1.
I imagine the SQL code to look like:
SELECT *
FROM table2
LEFT JOIN table1
ON table2.subject LIKE table1.wildcard
The tables look like this: https://imgur.com/a/O9OPAL6
The third pictures shows the result that I want.
How do I execute the join or is there an alternative?
I don't think MySQL support non-equality conditions for JOINs. So, you can do this as:
SELECT * -- first get the matches
FROM table2 as t2, -- ugg, why doesn't it support CROSS JOIN
table1 as t1
WHERE t2.subject LIKE t1.wildcard
UNION ALL
SELECT * -- then get the non-matches
FROM table2 as t2 LEFT JOIN
table1 as t1
ON 1 = 0 -- always false but gets the same columns
WHERE NOT EXISTS (SELECT 1
FROM table1 as t1
WHERE t2.subject LIKE t1.wildcard
);

Basic difference between two tables

I am attempting a very basic difference function in postgresql. Table 1 and Table 2 have identical columns. Only difference is Table 1 has some surplus rows. I would like to select for surplus rows only:
SELECT *
FROM table1
WHERE NOT EXISTS (SELECT * from table2);
The query above returns nothing when I know there are surplus rows.
I think you are looking for except:
select t1.*
from table1 t1
except
select t2.*
from table2 t2;
Note that the two tables must have the same number of columns, and the columns must all be of the same type. You can review the documentation here.
If you wish to use NOT EXISTS you're missing the joining of your table's keys in the inner where clause. Try:
SELECT *
FROM table1 t1
WHERE NOT EXISTS (SELECT * from table2 t2 WHERE t2.id = t1.id);

Comparing two tables for equality in HIVE

I have two tables, table1 and table2. Each with the same columns:
key, c1, c2, c3
I want to check to see if these tables are equal to eachother (they have the same rows). So far I have these two queries (<> = not equal in HIVE):
select count(*) from table1 t1
left outer join table2 t2
on t1.key=t2.key
where t2.key is null or t1.c1<>t2.c1 or t1.c2<>t2.c2 or t1.c3<>t2.c3
And
select count(*) from table1 t1
left outer join table2 t2
on t1.key=t2.key and t1.c1=t2.c1 and t1.c2=t2.c2 and t1.c3=t2.c3
where t2.key is null
So my idea is that, if a zero count is returned, the tables are the same. However, I'm getting a zero count for the first query, and a non-zero count for the second query. How exactly do they differ? If there is a better way to check this certainly let me know.
The first one excludes rows where t1.c1, t1.c2, t1.c3, t2.c1, t2.c2, or t2.c3 is null. That means that you effectively doing an inner join.
The second one will find rows that exist in t1 but not in t2.
To also find rows that exist in t2 but not in t1 you can do a full outer join. The following SQL assumes that all columns are NOT NULL:
select count(*) from table1 t1
full outer join table2 t2
on t1.key=t2.key and t1.c1=t2.c1 and t1.c2=t2.c2 and t1.c3=t2.c3
where t1.key is null /* this condition matches rows that only exist in t2 */
or t2.key is null /* this condition matches rows that only exist in t1 */
If you want to check for duplicates and the tables have exactly the same structure and the tables do not have duplicates within them, then you can do:
select t.key, t.c1, t.c2, t.c3, count(*) as cnt
from ((select t1.*, 1 as which from table1 t1) union all
(select t2.*, 2 as which from table2 t2)
) t
group by t.key, t.c1, t.c2, t.c3
having cnt <> 2;
There are various ways that you can relax the conditions in the first paragraph, if necessary.
Note that this version also works when the columns have NULL values. These might be causing the problem with your data.
Well, the best way is calculate the hash sum of each table, and compare the sum of hash.
So no matter how many column are they, no matter what data type are they, as long as the two table has the same schema, you can use following query to do the comparison:
select sum(hash(*)) from t1;
select sum(hash(*)) from t2;
And you just need to compare the return values.
I used EXCEPT statement and it worked.
select * from Original_table
EXCEPT
select * from Revised_table
Will show us all the rows of the Original table that are not in the Revised table.
If your table is partitioned you will have to provide a partition predicate.
Fyi, partition values don't need to be provided if you use Presto and querying via SQL lab.
I would recommend you not using any JOINs to try to compare tables:
it is quite an expensive operations when tables are big (which is often the case in Hive)
it can give problems when some rows/"join keys" are repeated
(and it can also be unpractical when data are in different clusters/datacenters/clouds).
Instead, I think using a checksum approach and comparing the checksums of both tables is best.
I have developed a Python script that allows you to do easily such comparison, and see the differences in a webbrowser:
https://github.com/bolcom/hive_compared_bq
I hope that can help you!
First get count for both the tables C1 and C2. C1 and C2 should be equal. C1 and C2 can be obtained from the following query
select count(*) from table1
if C1 and C2 are not equal, then the tables are not identical.
2: Find distinct count for both the tables DC1 and DC2. DC1 and DC2 should be equal. Number of distinct records can be found using the following query:
select count(*) from (select distinct * from table1)
if DC1 and DC2 are not equal, the tables are not identical.
3: Now get the number of records obtained by performing a union on the 2 tables. Let it be U. Use the following query to get the number of records in a union of 2 tables:
SELECT count (*)
FROM
(SELECT *
FROM table1
UNION
SELECT *
FROM table2)
You can say that the data in the 2 tables is identical if distinct count for the 2 tables is equal to the number of records obtained by performing union of the 2 tables. ie DC1 = U and DC2 = U
another variant
select c1-c2 "different row counts"
, c1-c3 "mismatched rows"
from
( select count(*) c1 from table1)
,( select count(*) c2 from table2 )
,(select count(*) c3 from table1 t1, table2 t2
where t1.key= t2.key
and T1.c1=T2.c1 )
Try with WITH Clause:
With cnt as(
select count(*) cn1 from table1
)
select 'X' from dual,cnt where cnt.cn1 = (select count(*) from table2);
One easy solution is to do inner join. Let's suppose we have two hive tables namely table1 and table2. Both the table has same column namely col1, col2 and col3. The number of rows should also be same. Then the command would be as follows
**
select count(*) from table1
inner join table2
on table1.col1 = table2.col1
and table1.col2 = table2.col2
and table1.col3 = table2.col3 ;
**
If the output value is same as number of rows in table1 and table2 , then all the columns has same value, If however the output count is lesser than there are some data which are different.
Use a MINUS operator:
SELECT count(*) FROM
(SELECT t1.c1, t1.c2, t1.c3 from table1 t1
MINUS
SELECT t2.c1, t2.c2, t2.c3 from table2 t2)

T-SQL "Where not in" using two columns

I want to select all records from a table T1 where the values in columns A and B has no matching tuple for the columns C and D in table T2.
In mysql “Where not in” using two columns I can read how to accomplish that using the form select A,B from T1 where (A,B) not in (SELECT C,D from T2), but that fails in T-SQL for me resulting in "Incorrect syntax near ','.".
So how do I do this?
Use a correlated sub-query:
...
WHERE
NOT EXISTS (
SELECT * FROM SecondaryTable WHERE c = FirstTable.a AND d = FirstTable.b
)
Make sure there's a composite index on SecondaryTable over (c, d), unless that table does not contain many rows.
You can't do this using a WHERE IN type statement.
Instead you could LEFT JOIN to the target table (T2) and select where T2.ID is NULL.
For example
SELECT
T1.*
FROM
T1 LEFT OUTER JOIN T2
ON T1.A = T2.C AND T1.B = T2.D
WHERE
T2.PrimaryKey IS NULL
will only return rows from T1 that don't have a corresponding row in T2.
I Used it in Mysql because in Mysql there isn't "EXCLUDE" statement.
This code:
Concates fields C and D of table T2 into one new field to make it easier to compare the columns.
Concates the fields A and B of table T1 into one new field to make it easier to compare the columns.
Selects all records where the value of the new field of T1 is not equal to the value of the new field of T2.
SQL-Statement:
SELECT T1.* FROM T1
WHERE CONCAT(T1.A,'Seperator', T1.B) NOT IN
(SELECT CONCAT(T2.C,'Seperator', T2.D) FROM T2)
Here is an example of the answer that worked for me:
SELECT Count(1)
FROM LCSource as s
JOIN FileTransaction as t
ON s.TrackingNumber = t.TrackingNumber
WHERE NOT EXISTS (
SELECT * FROM LCSourceFileTransaction
WHERE [LCSourceID] = s.[LCSourceID] AND [FileTransactionID] = t.[FileTransactionID]
)
You see both columns exist in LCSourceFileTransaction, but one occurs in LCSource and one occurs in FileTransaction and LCSourceFileTransaction is a mapping table. I want to find all records where the combination of the two columns is not in the mapping table. This works great. Hope this helps someone.

How do I merge data from two tables in a single database call into the same columns?

If I run the two statements in batch will they return one table to two to my sqlcommand object with the data merged. What I am trying to do is optimize a search by searching twice, the first time on one set of data and then a second on another. They have the same fields and I’d like to have all the records from both tables show and be added to each other. I need this so that I can sort the data between both sets of data but short of writing a stored procedure I can’t think of a way of doing this.
Eg. Table 1 has columns A and B, Table 2 has these same columns but different data source. I then wan to merge them so that if a only exists in one column it is added to the result set and if both exist it eh tables the column B will be summed between the two.
Please note that this is not the same as a full outer join operation as that does not merge the data.
[EDIT]
Here's what the code looks like:
Select * From
(Select ID,COUNT(*) AS Count From [Table1]) as T1
full outer join
(Select ID,COUNT(*) AS Count From [Table2]) as T2
on t1.ID = T2.ID
Perhaps you're looking for UNION?
IE:
SELECT A, B FROM Table1
UNION
SELECT A, B FROM Table2
Possibly:
select table1.a, table1.b
from table1
where table1.a not in (select a from table2)
union all
select table1.a, table1.b+table2.b as b
from table1
inner join table2 on table1.a = table2.a
edit: perhaps you would benefit from unioning the tables before counting. e.g.
select id, count() as count from
(select id from table1
union all
select id from table2)
I'm not sure if I understand completely but you seem to be asking about a UNION
SELECT A,B
FROM tableX
UNION ALL
SELECT A,B
FROM tableY
To do it, you would go:
SELECT * INTO TABLE3 FROM TABLE1
UNION
SELECT * FROM TABLE2
Provided both tables have the same columns
I think what you are looking for is this, but I am not sure I am understanding your language correctly.
select id, sum(count) as count
from (
select id, count() as count
from table1
union all
select id, count() as count
from table2
) a
group by id