Heyy I'm new to sql and I'd just like to know if there's a way to retrieve select statements with conditions from other tables.
I want to select all name values that have a number that identifies that they have committed a crime. I only want to select a name once.
"SELECT distinct * FROM Table1 WHERE number LIKE table2.number "
Are you looking for IN?
SELECT t1.*
FROM Table1 t1
WHERE t1.number IN (SELECT t2.number FROM table2 t2 t2.number);
Under most circumstances, the rows in a table should be unique. So, you don't need SELECT DISTINCT. The DISTINCT can add a considerable amount of overhead to such a query.
You can able to use INNER JOIN like below,
select tbl1.Name from tableOne tbl1
inner join tableTwo tbl2 ON tbl1.commonKey = tbl2.commonKey
where tbl1.columnName = 'any value'
I have two tables, table1 and table2. Each with the same columns:
key, c1, c2, c3
I want to check to see if these tables are equal to eachother (they have the same rows). So far I have these two queries (<> = not equal in HIVE):
select count(*) from table1 t1
left outer join table2 t2
on t1.key=t2.key
where t2.key is null or t1.c1<>t2.c1 or t1.c2<>t2.c2 or t1.c3<>t2.c3
And
select count(*) from table1 t1
left outer join table2 t2
on t1.key=t2.key and t1.c1=t2.c1 and t1.c2=t2.c2 and t1.c3=t2.c3
where t2.key is null
So my idea is that, if a zero count is returned, the tables are the same. However, I'm getting a zero count for the first query, and a non-zero count for the second query. How exactly do they differ? If there is a better way to check this certainly let me know.
The first one excludes rows where t1.c1, t1.c2, t1.c3, t2.c1, t2.c2, or t2.c3 is null. That means that you effectively doing an inner join.
The second one will find rows that exist in t1 but not in t2.
To also find rows that exist in t2 but not in t1 you can do a full outer join. The following SQL assumes that all columns are NOT NULL:
select count(*) from table1 t1
full outer join table2 t2
on t1.key=t2.key and t1.c1=t2.c1 and t1.c2=t2.c2 and t1.c3=t2.c3
where t1.key is null /* this condition matches rows that only exist in t2 */
or t2.key is null /* this condition matches rows that only exist in t1 */
If you want to check for duplicates and the tables have exactly the same structure and the tables do not have duplicates within them, then you can do:
select t.key, t.c1, t.c2, t.c3, count(*) as cnt
from ((select t1.*, 1 as which from table1 t1) union all
(select t2.*, 2 as which from table2 t2)
) t
group by t.key, t.c1, t.c2, t.c3
having cnt <> 2;
There are various ways that you can relax the conditions in the first paragraph, if necessary.
Note that this version also works when the columns have NULL values. These might be causing the problem with your data.
Well, the best way is calculate the hash sum of each table, and compare the sum of hash.
So no matter how many column are they, no matter what data type are they, as long as the two table has the same schema, you can use following query to do the comparison:
select sum(hash(*)) from t1;
select sum(hash(*)) from t2;
And you just need to compare the return values.
I used EXCEPT statement and it worked.
select * from Original_table
EXCEPT
select * from Revised_table
Will show us all the rows of the Original table that are not in the Revised table.
If your table is partitioned you will have to provide a partition predicate.
Fyi, partition values don't need to be provided if you use Presto and querying via SQL lab.
I would recommend you not using any JOINs to try to compare tables:
it is quite an expensive operations when tables are big (which is often the case in Hive)
it can give problems when some rows/"join keys" are repeated
(and it can also be unpractical when data are in different clusters/datacenters/clouds).
Instead, I think using a checksum approach and comparing the checksums of both tables is best.
I have developed a Python script that allows you to do easily such comparison, and see the differences in a webbrowser:
https://github.com/bolcom/hive_compared_bq
I hope that can help you!
First get count for both the tables C1 and C2. C1 and C2 should be equal. C1 and C2 can be obtained from the following query
select count(*) from table1
if C1 and C2 are not equal, then the tables are not identical.
2: Find distinct count for both the tables DC1 and DC2. DC1 and DC2 should be equal. Number of distinct records can be found using the following query:
select count(*) from (select distinct * from table1)
if DC1 and DC2 are not equal, the tables are not identical.
3: Now get the number of records obtained by performing a union on the 2 tables. Let it be U. Use the following query to get the number of records in a union of 2 tables:
SELECT count (*)
FROM
(SELECT *
FROM table1
UNION
SELECT *
FROM table2)
You can say that the data in the 2 tables is identical if distinct count for the 2 tables is equal to the number of records obtained by performing union of the 2 tables. ie DC1 = U and DC2 = U
another variant
select c1-c2 "different row counts"
, c1-c3 "mismatched rows"
from
( select count(*) c1 from table1)
,( select count(*) c2 from table2 )
,(select count(*) c3 from table1 t1, table2 t2
where t1.key= t2.key
and T1.c1=T2.c1 )
Try with WITH Clause:
With cnt as(
select count(*) cn1 from table1
)
select 'X' from dual,cnt where cnt.cn1 = (select count(*) from table2);
One easy solution is to do inner join. Let's suppose we have two hive tables namely table1 and table2. Both the table has same column namely col1, col2 and col3. The number of rows should also be same. Then the command would be as follows
**
select count(*) from table1
inner join table2
on table1.col1 = table2.col1
and table1.col2 = table2.col2
and table1.col3 = table2.col3 ;
**
If the output value is same as number of rows in table1 and table2 , then all the columns has same value, If however the output count is lesser than there are some data which are different.
Use a MINUS operator:
SELECT count(*) FROM
(SELECT t1.c1, t1.c2, t1.c3 from table1 t1
MINUS
SELECT t2.c1, t2.c2, t2.c3 from table2 t2)
I want an SQL code which should perform the task of data scrubbing.
I have two tables both contain some names I want to compare them and list out only those name which are in table 2 but not in table 1.
Example:
Table 1 = A ,B,C
Table 2 = C,D,E
The result must have D and E?
SELECT t2.name
FROM 2 t2
LEFT JOIN
1 t1 ON t1.name=t2.name
WHERE t1.name IS NULL
select T2.Name
from Table2 as T2
where not exists (select * from Table1 as T1 where T1.Name = T2.Name)
See this article about performance of different implementations of anti-join (for SQL Server).
select t2.name
from t2,t1
where t2.name<>t1.name -- ( or t2.name!=t1.name)
If the DBMS supports it:
select name from table2
minus
select name from table1
A more portable solution could also be:
select name from table2
where name not in (select name from table1)
I am using SQL 2008 and am having a problem.
I have 3 different tables and here is a sample of my code.
SELECT DISTINCT Name Age
FROM Table1
LEFT JOIN Table2
ON Table1.ID = Table2.ID
This returns something like this:
Name Age
tom 12
ben 23
ian 12
I have another query
SELECT Name
FROM Table3
This returns this:
ian 12
ian 12
ian 12
I want to verify that if a name and age are in name from the first query and is in the second query Table3.name it will return something like this:
ian 12
ian 12
ian 12
I have been trying Join and Union on these two columns but so far have been only been able to get it to return. Any suggestions?
ian 12
This might be able to be simplified (need to see your table structures and sample data), but given your queries, a subquery should work for you:
SELECT T3.Name, T3.Age
FROM Table3 T3
JOIN (
SELECT DISTINCT Name, Age
FROM Table1
LEFT JOIN Table2
ON Table1.ID = Table2.ID
) T ON T3.Name = T.Name AND T3.Age = T.Age
SQL Fiddle Demo
First you pronounce your problem - you want to get all records for which combination of Name and Age exists in recordset of distinct Name and Age from join. Then you use power of declarative language and a little of CTE to get your solution:
;with CTE as (
select distinct Name, Age
from Table1 as T1
inner join Table2 as T2 on T2.ID = T1.ID
)
select *
from Table3 as T3
where exists (select * from CTE as C where C.Name = T3.Name and C.Age = T3.Age)
SQL FIDDLE EXAMPLE to fiddle with query
I have 2 tables with 2 columns (user_id and year).
Query1:
SELECT * FROM table_1 t1
FULL JOIN table_2 t2 ON t1.user_id=t2.user_id AND t1.year=t2.year
Produces following column names:
user_id, year, user_id_1, year_1
Query2:
CREATE TABLE table_copy AS SELECT * FROM
(SELECT * FROM table_1 t1
FULL JOIN table_2 t2 ON t1.user_id=t2.user_id AND t1.year=t2.year);
Produces following vague column names:
QCSJ_C000000000400000, QCSJ_C000000000400002, QCSJ_C000000000400001, QCSJ_C000000000400003
Is there a short way to force Oracle query2 to use the same names as query1 without writing them explicitly (it is important when there are many columns)? Maybe some Oracle settings?
List your columns and use AS to specify the column name.
e.g.
CREATE TABLE table_copy AS
SELECT t1.user_id AS t1_user_id,
t1.year AS t1_year,
t2.user_id AS t2_user_id,
t2.year AS t2_year
FROM table_1 t1
FULL JOIN table_2 t2 ON t1.user_id=t2.user_id
AND t1.year=t2.year;
I didn't fully understand your expected part but what I understood is you want all columns from table_1 or from table_2 only
if it is like this only you can use following query to create table ..
CREATE TABLE table_copy AS SELECT * FROM
(SELECT t1.* FROM table_1 t1 FULL JOIN table_2 t2
ON t1.user_id=t2.user_id
AND t1.year=t2.year);
or if you want both table's column but with different name in sql then you have to follow query suggested by cagcowboy only....
but you can create table with prefix like "t1_" in plsql without specify or write all column name..