SQL: Reading description for a columns from a different table - sql

I have a table (T1) with two columns (C1, C2) that describes the relationship between a parent group and child group. But this relation ship is defined based on group id. The name of the group is defined in another table (say T2).
Eg:
T1 has below values. C1 is for Parent Group Id and C2 is for Child Group Id. I have three groups in following order 1 - 2 - 3.
C1,C2
1,2
2,3
T2 has below values
C1,C2
1,Parent_Group
2,Child_Group1
3,Child_Group2
Now, I need to combine above table via SQL query, so that i will get below output.
C1,C2
Parent_Group,Child_Group1
Child_Group1,Child_Group2
How can i achieve the same?

Try this:
SELECT
C1.C2 AS C1, C2.C2 AS C2
FROM
T1 INNER JOIN
T2 C1 ON T1.C1 = C1.C1 INNER JOIN
T2 C2 ON T1.C2 = C2.C1

Related

Is there a way of switching/changing the "ON" condition in a SQL query when the columns do not match?

I have two tables (t1 and t2). I want to print/keep all records from t1 (we'll treat t1 as the left table). I want to perform a JOIN with t1 and t2 on two columns, but there's a problem. Table t1 consists of columns c1 and c2. Table t2 consists of columns c3, c4, and c5. I want to perform a JOIN between t1 and t2 on c1 (from t1) and c3 (from t2), but I also want to do a JOIN between t1 and t2 on c2 (from t1) and c4 (from t2) if records from c1 and c3 do not match.
Here are the two tables. Completely fictitious, but applicable
to my real, work-related problem
The table below is what I want
All the records/rows from t1 are printed.
I greatly appreciate anyone who comes forward with query solutions. With that being said, is there a way to solve this problem without UNION? Also, I am using SQL Server.
Looking at the input and output data, I think you mean to compare c2 and c4 if c1 and c3 differ, not c1 and c2. I recreated the tables in sql and this code below gives the result you're looking for.
In that case you can just join and use an OR:
SELECT
*
FROM
t1
LEFT JOIN t2 ON
t1.c1 = t2.c3
OR t1.c2 = t2.c4;

check if a row record exist with column record A combined with DIFFERING records from column B

I have a table called purchase contracts
I want to select the contracts having the same supplier but different purchasers.
This is how the data looks like (C1= Contract, C2= supplier, C3= purhaser)
Dataset
I want to create a select which results in showing rows 4, 5 & 6
I've tried solution from this tread, but didn't work
Select rows with same id but different value in another column
You can use exists
select t.*
from table t
where exists (select 1 from table t1 where t1.c2 = t.c2 and t1.c3 <> t.c3);
You can use exists:
select t.*
from t
where exists (select 1
from t t2
where t2.c2 = t.c2 and t2.c3 <> t.c3
);

Deleting duplicates from composite Primary key in SQL server

I am using SQL server 2012.
I have two tables with identical structure. (Say there are four columns - ID, C2, C3, C4)
I want to copy data from Table1 to Table2.
The only difference between the two tables is that Table1 contains the local ID in the 'ID' field and table 2 should have the global ID.
There is another third table that contains this mapping between the localID and the global ID.
(multiple local ID's can be mapped to one Global ID)
Example:
ID 'abcd' in russia can be global ID 123
ID 'cdef' in china can be global ID 123
therefore abcd in russia is basically cdef in china (this mapping is stored in table3)
I want to take the GlobalID from table3 corresponding to the local ID that we have in Table1 and insert them into table2.
Table2 has a primary key constraint defined on the column ID+C2.
The problem is that there are high chance we'll have duplicate data - So we want to insert only distinct combination of ID+C2. (clustered PK)
Can someone please help me with this? Sorry if this is confusing.
I right now have this query, but it doesnt eliminates duplicates when copying data from T1 to T2 and thus I get an error on PK.
INSERT INTO TABLE2
(ID,
C2,
C3,
C4)
SELECT T3.ID,
T1.C2,
T1.C3,
T1.C4
FROM TABLE1 T1
JOIN TABLE3 T3 ON T1.ID = T3.ID
JOIN TABLE2 T2 ON T2.ID = T1.ID
I'm not sure, but I guess you want something like
INSERT INTO Table2 (globalID)
SELECT DISTINCT m.globalID from Mapping m inner join Table1 t
on t.country = m.country and t.localId = m.localID
You can use any select, so alter it for your needs.

Comparing two tables for equality in HIVE

I have two tables, table1 and table2. Each with the same columns:
key, c1, c2, c3
I want to check to see if these tables are equal to eachother (they have the same rows). So far I have these two queries (<> = not equal in HIVE):
select count(*) from table1 t1
left outer join table2 t2
on t1.key=t2.key
where t2.key is null or t1.c1<>t2.c1 or t1.c2<>t2.c2 or t1.c3<>t2.c3
And
select count(*) from table1 t1
left outer join table2 t2
on t1.key=t2.key and t1.c1=t2.c1 and t1.c2=t2.c2 and t1.c3=t2.c3
where t2.key is null
So my idea is that, if a zero count is returned, the tables are the same. However, I'm getting a zero count for the first query, and a non-zero count for the second query. How exactly do they differ? If there is a better way to check this certainly let me know.
The first one excludes rows where t1.c1, t1.c2, t1.c3, t2.c1, t2.c2, or t2.c3 is null. That means that you effectively doing an inner join.
The second one will find rows that exist in t1 but not in t2.
To also find rows that exist in t2 but not in t1 you can do a full outer join. The following SQL assumes that all columns are NOT NULL:
select count(*) from table1 t1
full outer join table2 t2
on t1.key=t2.key and t1.c1=t2.c1 and t1.c2=t2.c2 and t1.c3=t2.c3
where t1.key is null /* this condition matches rows that only exist in t2 */
or t2.key is null /* this condition matches rows that only exist in t1 */
If you want to check for duplicates and the tables have exactly the same structure and the tables do not have duplicates within them, then you can do:
select t.key, t.c1, t.c2, t.c3, count(*) as cnt
from ((select t1.*, 1 as which from table1 t1) union all
(select t2.*, 2 as which from table2 t2)
) t
group by t.key, t.c1, t.c2, t.c3
having cnt <> 2;
There are various ways that you can relax the conditions in the first paragraph, if necessary.
Note that this version also works when the columns have NULL values. These might be causing the problem with your data.
Well, the best way is calculate the hash sum of each table, and compare the sum of hash.
So no matter how many column are they, no matter what data type are they, as long as the two table has the same schema, you can use following query to do the comparison:
select sum(hash(*)) from t1;
select sum(hash(*)) from t2;
And you just need to compare the return values.
I used EXCEPT statement and it worked.
select * from Original_table
EXCEPT
select * from Revised_table
Will show us all the rows of the Original table that are not in the Revised table.
If your table is partitioned you will have to provide a partition predicate.
Fyi, partition values don't need to be provided if you use Presto and querying via SQL lab.
I would recommend you not using any JOINs to try to compare tables:
it is quite an expensive operations when tables are big (which is often the case in Hive)
it can give problems when some rows/"join keys" are repeated
(and it can also be unpractical when data are in different clusters/datacenters/clouds).
Instead, I think using a checksum approach and comparing the checksums of both tables is best.
I have developed a Python script that allows you to do easily such comparison, and see the differences in a webbrowser:
https://github.com/bolcom/hive_compared_bq
I hope that can help you!
First get count for both the tables C1 and C2. C1 and C2 should be equal. C1 and C2 can be obtained from the following query
select count(*) from table1
if C1 and C2 are not equal, then the tables are not identical.
2: Find distinct count for both the tables DC1 and DC2. DC1 and DC2 should be equal. Number of distinct records can be found using the following query:
select count(*) from (select distinct * from table1)
if DC1 and DC2 are not equal, the tables are not identical.
3: Now get the number of records obtained by performing a union on the 2 tables. Let it be U. Use the following query to get the number of records in a union of 2 tables:
SELECT count (*)
FROM
(SELECT *
FROM table1
UNION
SELECT *
FROM table2)
You can say that the data in the 2 tables is identical if distinct count for the 2 tables is equal to the number of records obtained by performing union of the 2 tables. ie DC1 = U and DC2 = U
another variant
select c1-c2 "different row counts"
, c1-c3 "mismatched rows"
from
( select count(*) c1 from table1)
,( select count(*) c2 from table2 )
,(select count(*) c3 from table1 t1, table2 t2
where t1.key= t2.key
and T1.c1=T2.c1 )
Try with WITH Clause:
With cnt as(
select count(*) cn1 from table1
)
select 'X' from dual,cnt where cnt.cn1 = (select count(*) from table2);
One easy solution is to do inner join. Let's suppose we have two hive tables namely table1 and table2. Both the table has same column namely col1, col2 and col3. The number of rows should also be same. Then the command would be as follows
**
select count(*) from table1
inner join table2
on table1.col1 = table2.col1
and table1.col2 = table2.col2
and table1.col3 = table2.col3 ;
**
If the output value is same as number of rows in table1 and table2 , then all the columns has same value, If however the output count is lesser than there are some data which are different.
Use a MINUS operator:
SELECT count(*) FROM
(SELECT t1.c1, t1.c2, t1.c3 from table1 t1
MINUS
SELECT t2.c1, t2.c2, t2.c3 from table2 t2)

Comma Separated list of rows of a column with group by on other columns

Below is the structure of table I have: -
Table T1
C1 C2 C3
----------
X P A
X P B
Y Q C
Y Q D
Desired output: -
C1 C2 C3
------------
X P A,B
Y Q C,D
Note: - I know i can do the same with For XML('') with group by on C1 and C2, but the main problem in my case is that the table T1 here must be a physical table object (either permanent or temp or table var or CTE) in DB. But in my case it's a derived table and when i am using the below query it's saying invalid object.
In my case it's not good to replace the derived table with temp# tables or fixed tables or even with CTE or table variable because it will take a great effort.
SELECT
b.C1, b.C2, Stuff((',' + a.C3 from t1 a where a.c1 = b.c1 for XML PATH('')),1,1,'') FROM
T1 b group by b.c1,b.c2
I did not have T1 as fixed table. Please consider it as derived table only.
I need the solution with existing derived table.
Please help.
Below is the query with derived table: -
Please consider this only as a demo query. It's not as simple as given below and a lot of calculations have done to get the derived tables and 4 levels of derived tables have been used.
SELECT C1, C2, Stuff((',' + a.C3 from A B where a.c1 = b.c1 for XML PATH('')),1,1,'')
FROM
(
SELECT C1, C2, C3 FROM T1 WHERE C1 IS NOT NULL--and a lot of calculation also
)A
Please mind that T1 is not just below one step, in my case T1 the actual physical table is 4 level downs by derived tables.
If you can post the query the produces derived table, we can help you work it out, but as of the moment try substituting table1 with the derived query.
;WITH Table1
AS
(
SELECT C1, C2, C3 FROM T1 WHERE C1 IS NOT NULL--and a lot of calculation also
)
SELECT
C1,C2,
STUFF(
(SELECT ',' + C3
FROM Table1
WHERE C1 = a.C1 AND C2 = a.C2
FOR XML PATH (''))
, 1, 1, '') AS NamesList
FROM Table1 AS a
GROUP BY C1,C2
SQLFiddle Demo