sql - ignore duplicates while joining

sql - ignore duplicates while joining - sql

I have two tables.
Table1 is 1591 rows. Table2 is 270 rows.
I want to fetch specific column data from Table2 based on some condition between them and also exclude duplicates which are in Table2. Which I mean to join the tables but get only one value from Table2 even if the condition has occurred more than time. The result should be exactly 1591 rows.
I tried to make Left,Right, Inner joins but the data comes more than or less 1591.
Example
Table1
type,address,name
40,blabla,Adam
20,blablabla,Joe
Table2
type,currency
40,usd
40,gbp
40,omr
Joining on 'type'
Result
type,address,name,currency
40,blabla,name,usd
20,blblbla,Joe,null

try this it has to work
select *
from
Table1 h
inner join
(select type,currency,ROW_NUMBER()over (partition by type order by
currency) as rn
from
Table2
) sr on
sr.type=h.type
and rn=1

Try this. It's standard SQL, therefore, it should work on your rdbms system.
select * from Table1 AS t
LEFT OUTER JOIN Table2 AS y ON t.[type] = y.[type] and y.currency IN (SELECT MAX(currency) FROM Table2 GROUP BY [type])
If you want to control which currency is joined, consider altering Table2 by adding a new column active/non active and modifying accordingly the JOIN clause.

You can use outer apply if it's supported.
select a.type, a.address, a.name, b.currency
from Table1 a
outer apply (
select top 1 currency
from Table2
where Table2.type = a.type
) b

I typical way to do this uses a correlated subquery. This guarantees that all rows in the first table are kept. And it generates an error if more than one row is returned from the second.
So:
select t1.*,
(select t2.currency
from table2 t2
where t2.type = t1.type
fetch first 1 row only
) as currency
from table1 t1;
You don't specify what database you are using, so this uses standard syntax for returning one row. Some databases use limit or top instead.

Related

Comparing two tables for equality in HIVE

I have two tables, table1 and table2. Each with the same columns:
key, c1, c2, c3
I want to check to see if these tables are equal to eachother (they have the same rows). So far I have these two queries (<> = not equal in HIVE):
select count(*) from table1 t1
left outer join table2 t2
on t1.key=t2.key
where t2.key is null or t1.c1<>t2.c1 or t1.c2<>t2.c2 or t1.c3<>t2.c3
And
select count(*) from table1 t1
left outer join table2 t2
on t1.key=t2.key and t1.c1=t2.c1 and t1.c2=t2.c2 and t1.c3=t2.c3
where t2.key is null
So my idea is that, if a zero count is returned, the tables are the same. However, I'm getting a zero count for the first query, and a non-zero count for the second query. How exactly do they differ? If there is a better way to check this certainly let me know.

The first one excludes rows where t1.c1, t1.c2, t1.c3, t2.c1, t2.c2, or t2.c3 is null. That means that you effectively doing an inner join.
The second one will find rows that exist in t1 but not in t2.
To also find rows that exist in t2 but not in t1 you can do a full outer join. The following SQL assumes that all columns are NOT NULL:
select count(*) from table1 t1
full outer join table2 t2
on t1.key=t2.key and t1.c1=t2.c1 and t1.c2=t2.c2 and t1.c3=t2.c3
where t1.key is null /* this condition matches rows that only exist in t2 */
or t2.key is null /* this condition matches rows that only exist in t1 */

If you want to check for duplicates and the tables have exactly the same structure and the tables do not have duplicates within them, then you can do:
select t.key, t.c1, t.c2, t.c3, count(*) as cnt
from ((select t1.*, 1 as which from table1 t1) union all
(select t2.*, 2 as which from table2 t2)
) t
group by t.key, t.c1, t.c2, t.c3
having cnt <> 2;
There are various ways that you can relax the conditions in the first paragraph, if necessary.
Note that this version also works when the columns have NULL values. These might be causing the problem with your data.

Well, the best way is calculate the hash sum of each table, and compare the sum of hash.
So no matter how many column are they, no matter what data type are they, as long as the two table has the same schema, you can use following query to do the comparison:
select sum(hash(*)) from t1;
select sum(hash(*)) from t2;
And you just need to compare the return values.

I used EXCEPT statement and it worked.
select * from Original_table
EXCEPT
select * from Revised_table
Will show us all the rows of the Original table that are not in the Revised table.
If your table is partitioned you will have to provide a partition predicate.
Fyi, partition values don't need to be provided if you use Presto and querying via SQL lab.

I would recommend you not using any JOINs to try to compare tables:
it is quite an expensive operations when tables are big (which is often the case in Hive)
it can give problems when some rows/"join keys" are repeated
(and it can also be unpractical when data are in different clusters/datacenters/clouds).
Instead, I think using a checksum approach and comparing the checksums of both tables is best.
I have developed a Python script that allows you to do easily such comparison, and see the differences in a webbrowser:
https://github.com/bolcom/hive_compared_bq
I hope that can help you!

First get count for both the tables C1 and C2. C1 and C2 should be equal. C1 and C2 can be obtained from the following query
select count(*) from table1
if C1 and C2 are not equal, then the tables are not identical.
2: Find distinct count for both the tables DC1 and DC2. DC1 and DC2 should be equal. Number of distinct records can be found using the following query:
select count(*) from (select distinct * from table1)
if DC1 and DC2 are not equal, the tables are not identical.
3: Now get the number of records obtained by performing a union on the 2 tables. Let it be U. Use the following query to get the number of records in a union of 2 tables:
SELECT count (*)
FROM
(SELECT *
FROM table1
UNION
SELECT *
FROM table2)
You can say that the data in the 2 tables is identical if distinct count for the 2 tables is equal to the number of records obtained by performing union of the 2 tables. ie DC1 = U and DC2 = U

another variant
select c1-c2 "different row counts"
, c1-c3 "mismatched rows"
from
( select count(*) c1 from table1)
,( select count(*) c2 from table2 )
,(select count(*) c3 from table1 t1, table2 t2
where t1.key= t2.key
and T1.c1=T2.c1 )

Try with WITH Clause:
With cnt as(
select count(*) cn1 from table1
)
select 'X' from dual,cnt where cnt.cn1 = (select count(*) from table2);

One easy solution is to do inner join. Let's suppose we have two hive tables namely table1 and table2. Both the table has same column namely col1, col2 and col3. The number of rows should also be same. Then the command would be as follows
**
select count(*) from table1
inner join table2
on table1.col1 = table2.col1
and table1.col2 = table2.col2
and table1.col3 = table2.col3 ;
**
If the output value is same as number of rows in table1 and table2 , then all the columns has same value, If however the output count is lesser than there are some data which are different.

Use a MINUS operator:
SELECT count(*) FROM
(SELECT t1.c1, t1.c2, t1.c3 from table1 t1
MINUS
SELECT t2.c1, t2.c2, t2.c3 from table2 t2)

Is it possible to use subquery in join condition in Access?

In postgresql I can use subquery in join condition
SELECT *
FROM table1 LEFT JOIN table2
ON table1.id1 = (SELECT id2 FROM table2 LIMIT 1);
But when I try to use it in Access
SELECT *
FROM table1 LEFT JOIN table2
ON table1.id1 = (SELECT TOP 1 id2 FROM table2);
I get syntax error. Is it actually impossible in Access or just my mistake?
I know that I can get the same result with WHERE, but my question is about possibilities of JOIN in Access.

It's not possible, per the MSDN documentation:
Syntax
FROM table1 [ LEFT | RIGHT ] JOIN table2 ON table1.field1 compopr table2.field2
And (emphasis mine):
field1, field2: The names of the fields that are joined. The fields must be of the same data type and contain the same kind of data, but they do not need to have the same name.
It appears you can't even have hard-coded values in your join; you must specify the column name to join against.
In your case, you would want:
SELECT *
FROM Table1
LEFT JOIN (
SELECT DISTINCT TOP 1 ID
FROM Table2
ORDER BY ID
) Table2Derived ON Table1.ID = Table2Derived.ID

Comparing two datasets SQL SSRS 2005

I have two datasets on two seperate servers. They both pull one column of information each.
I would like to build a report showing the values of the rows that only appear in one of the datasets.
From what I have read, it seems I would like to do this on the SQL side, not the reporting side; I am not sure how to do that.
If someone could shed some light on how that is possible, I would really appreciate it.

You can use the NOT EXISTS clause to get the differences between the two tables.
SELECT
Column
FROM
DatabaseName.SchemaName.Table1
WHERE
NOT EXISTS
(
SELECT
Column
FROM
LinkedServerName.DatabaseName.SchemaName.Table2
WHERE
Table1.Column = Table2.Column --looks at equalities, and doesn't
--include them because of the
--NOT EXISTS clause
)
This will show the rows in Table1 that don't appear in Table2. You can reverse the table names to find the rows in Table2 that don't appear in Table1.
Edit: Made an edit to show what the case would be in the event of linked servers. Also, if you wanted to see all of the rows that are not shared in both tables at the same time, you can try something as in the below.
SELECT
Column, 'Table1' TableName
FROM
DatabaseName.SchemaName.Table1
WHERE
NOT EXISTS
(
SELECT
Column
FROM
LinkedServerName.DatabaseName.SchemaName.Table2
WHERE
Table1.Column = Table2.Column --looks at equalities, and doesn't
--include them because of the
--NOT EXISTS clause
)
UNION
SELECT
Column, 'Table2' TableName
FROM
LinkedServerName.DatabaseName.SchemaName.Table2
WHERE
NOT EXISTS
(
SELECT
Column
FROM
DatabaseName.SchemaName.Table1
WHERE
Table1.Column = Table2.Column
)

You can also use a left join:
select a.* from tableA a
left join tableB b
on a.PrimaryKey = b.ForeignKey
where b.ForeignKey is null
This query will return all records from tableA that do not have corresponding records in tableB.

If you want rows that appear in exactly one data set and you have a matching key on each table, then you can use a full outer join:
select *
from table1 t1 full outer join
table2 t2
on t1.key = t2.key
where t1.key is null and t2.key is not null or
t1.key is not null and t2.key is null
The where condition chooses the rows where exactly one match.
The problem with this query, though, is that you get lots of columns with nulls. One way to fix this is by going through the columns one by one in the SELECT clause.
select coalesce(t1.key, t2.key) as key, . . .
Another way to solve this problem is to use a union with a window function. This version brings together all the rows and counts the number of times that key appears:
select t.*
from (select t.*, count(*) over (partition by key) as keycnt
from ((select 'Table1' as which, t.*
from table1 t
) union all
(select 'Table2' as which, t.*
from table2 t
)
) t
) t
where keycnt = 1
This has the additional column specifying which table the value comes from. It also has an extra column, keycnt, with the value 1. If you have a composite key, you would just replace with the list of columns specifying a match between the two tables.

Outer Join with Where returning Nulls

Hi I have 2 tables. I want to list
all records in table1 which are present in
table2
all records in table2 which are not present in table1 with a where condition
Null rows will be returned by table1 in second condition but I am unable to get the query working correctly. It is only returning null rows
SELECT
A.CLMSRNO,A.CLMPLANO,A.GENCURRCODE,A.CLMNETLOSSAMT,
A.CLMLOSSAMT,A.CLMCLAIMPRCLLOSSSHARE
FROM
PAKRE.CLMCLMENTRY A
RIGHT OUTER JOIN (
SELECT
B.CLMSRNO,B.UWADVICETYPE,B.UWADVICENO,B.UWADVPREMCURRCODE,
B.GENSUBBUSICLASS,B.UWADVICENET,B.UWADVICEKIND,B.UWADVYEAR,
B.UWADVQTR,B.ISMANUAL,B.UWCLMNOREFNO
FROM
PAKRE.UWADVICE B
WHERE
B.ISMANUAL=1
) r
ON a.CLMSRNO=r.CLMSRNO
ORDER BY
A.CLMSRNO DESC;

Which OS are you using ?
Table aliases are case sensistive on some platforms, which is why your join condition ON a.CLMSRNO=r.CLMSRNO fails.
Try with A.CLMSRNO=r.CLMSRNO and see if that works

I'm not understanding your first attempt, but here's basically what you need, I think:
SELECT *
FROM TABLE1
INNER JOIN TABLE2
ON joincondition
UNION ALL
SELECT *
FROM TABLE2
LEFT JOIN TABLE1
ON joincondition
AND TABLE1.wherecondition
WHERE TABLE1.somejoincolumn IS NULL

I think you may want to remove the subquery and put its columns into the main query e.g.
SELECT A.CLMSRNO, A.CLMPLANO, A.GENCURRCODE, A.CLMNETLOSSAMT,
A.CLMLOSSAMT, A.CLMCLAIMPRCLLOSSSHARE,
B.CLMSRNO, B.UWADVICETYPE, B.UWADVICENO, B.UWADVPREMCURRCODE,
B.GENSUBBUSICLASS, B.UWADVICENET, B.UWADVICEKIND, B.UWADVYEAR,
B.UWADVQTR, B.ISMANUAL, B.UWCLMNOREFNO
FROM PAKRE.CLMCLMENTRY A
RIGHT OUTER JOIN PAKRE.UWADVICE B
ON A.CLMSRNO = B.CLMSRNO
WHERE B.ISMANUAL = 1
ORDER
BY A.CLMSRNO DESC;

SQL SELECT across two tables

I am a little confused as to how to approach this SQL query.
I have two tables (equal number of records), and I would like to return a column with which is the division between the two.
In other words, here is my not-working-correctly query:
SELECT( (SELECT v FROM Table1) / (SELECT DotProduct FROM Table2) );
How would I do this? All I want it a column where each row equals the same row in Table1 divided by the same row in Table2. The resulting table should have the same number of rows, but I am getting something with a lot more rows than the original two tables.
I am at a complete loss. Any advice?

It sounds like you have some kind of key between the two tables. You need an Inner Join:
select t1.v / t2.DotProduct
from Table1 as t1
inner join Table2 as t2
on t1.ForeignKey = t2.PrimaryKey
Should work. Just make sure you watch out for division by zero errors.

You didn't specify the full table structure so I will assume a common ID column to link rows in the tables.
SELECT table1.v/table2.DotProduct
FROM Table1 INNER JOIN Table2
ON (Table1.ID=Table2.ID)

You need to do a JOIN on the tables and divide the columns you want.
SELECT (Table1.v / Table2.DotProduct) FROM Table1 JOIN Table2 ON something
You need to substitue something to tell SQL how to match up the rows:
Something like: Table1.id = Table2.id

In case your fileds are both integers you need to do this to avoid integer math:
select t1.v / (t2.DotProduct*1.00)
from Table1 as t1
inner join Table2 as t2
on t1.ForeignKey = t2.PrimaryKey
If you have multiple values in table2 relating to values in table1 you need to specify which to use -here I chose the largest one.
select t1.v / (max(t2.DotProduct)*1.00)
from Table1 as t1
inner join Table2 as t2
on t1.ForeignKey = t2.PrimaryKey
Group By t1.v

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

sql - ignore duplicates while joining - sql

try this it has to work select * from Table1 h inner join (select type,currency,ROW_NUMBER()over (partition by type order by currency) as rn from Table2 ) sr on sr.type=h.type and rn=1

You can use outer apply if it's supported. select a.type, a.address, a.name, b.currency from Table1 a outer apply ( select top 1 currency from Table2 where Table2.type = a.type ) b

Related

Comparing two tables for equality in HIVE

Is it possible to use subquery in join condition in Access?

Comparing two datasets SQL SSRS 2005

Outer Join with Where returning Nulls

SQL SELECT across two tables

Categories

Resources