Hi, I have a question about Hive.
Assume there are two tables, t1 and t2, which have columns with the same names.
t1:
emp_id, name, salary, adress
1, a, 100, f
t2:
emp_id, name, org, product
1, trk, as, dss
When I select from these tables like this:
select * from t1, t2 join .....
Hive returns all columns, but I cannot tell which name column comes from which table.
Please help. Thank you very much for your interest.
Also: how do I identify same-named columns from multiple tables in Hive?
List the columns explicitly and give each one an alias instead of using *. (Square brackets around identifiers are T-SQL syntax; Hive does not need them.)
Select t1.ColumnName1 as t1_ColumnName1,
t1.ColumnName2 as t1_ColumnName2,
... ,
t1.ColumnNameN as t1_ColumnNameN,
t2.ColumnName1 as t2_ColumnName1,
t2.ColumnName2 as t2_ColumnName2,
... ,
t2.ColumnNameN as t2_ColumnNameN
from t1 join t2 ...
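To make the aliasing idea concrete, here is a minimal sketch that runs the same kind of query against the sample rows from the question. It uses Python's built-in sqlite3 module rather than Hive, purely so it can be run anywhere; the join condition on emp_id is an assumption, since the original query elides it.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption: illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (emp_id INTEGER, name TEXT, salary INTEGER, adress TEXT);
    CREATE TABLE t2 (emp_id INTEGER, name TEXT, org TEXT, product TEXT);
    INSERT INTO t1 VALUES (1, 'a', 100, 'f');
    INSERT INTO t2 VALUES (1, 'trk', 'as', 'dss');
""")

# Alias every clashing column with its table name as a prefix.
# The ON clause is assumed; the question does not show it.
cur = conn.execute("""
    SELECT t1.emp_id AS t1_emp_id, t1.name AS t1_name,
           t2.emp_id AS t2_emp_id, t2.name AS t2_name
    FROM t1 JOIN t2 ON t1.emp_id = t2.emp_id
""")
columns = [d[0] for d in cur.description]  # result column names
rows = cur.fetchall()
print(columns)
print(rows)
```

With the aliases in place, each result column name says which table it came from.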
Related
SQL help needed.
Totalcount = Employees + Count
The column names are like that. These are two random tables we are trying to join.
Important: what exists in Table1 may not exist in Table2, and what exists in Table2 may not exist in Table1. If a value exists in both tables, the sum is needed; if not, the individual value.
One method uses a full join:
select coalesce(t1.company, t2.entity) as company,
coalesce(t1.employees, 0) + coalesce(t2.count, 0) as totalcount
from table1 t1 full join
table2 t2
on t1.company = t2.entity
You can use union all and aggregation:
select entity, sum(cnt) total_count
from (
select entity, cnt from table2
union all select company, employees from table1
) t
group by entity
order by entity
For this to work properly, the corresponding columns in both tables need to have the same datatype, i.e. table2.entity should have the same datatype as table1.company (and likewise table2.cnt and table1.employees). If the datatypes do not match, you must explicitly cast the columns to adjust.
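A small runnable sketch of the union all plus aggregation approach, again using Python's sqlite3 module in place of Hive. The sample rows are hypothetical and chosen so that one key exists only in table1, one only in table2, and one in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (company TEXT, employees INTEGER);
    CREATE TABLE table2 (entity TEXT, cnt INTEGER);
    -- Hypothetical data: 'A' only in table1, 'C' only in table2, 'B' in both.
    INSERT INTO table1 VALUES ('A', 10), ('B', 20);
    INSERT INTO table2 VALUES ('B', 5), ('C', 7);
""")

# Stack both tables into one (entity, cnt) shape, then sum per entity.
totals = conn.execute("""
    SELECT entity, SUM(cnt) AS total_count
    FROM (
        SELECT entity, cnt FROM table2
        UNION ALL
        SELECT company, employees FROM table1
    ) t
    GROUP BY entity
    ORDER BY entity
""").fetchall()
print(totals)
```

Keys present in only one table keep their individual value, while keys present in both are summed.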
I have two tables in Hive, Table1 and Table2. I want to get each distinct customerID in Table1 and map it to each distinct value in a column called category of Table2. However, I am a bit lost on how to do this in Hive. A better example of what I am trying to do is the following: let's say Table1 contains 5 distinct customerIDs and Table2 contains 3 distinct categories. I want my query result to look something like the following:
However, Table1 and Table2 do not have any columns in common, so I am a bit lost on how to perform a join on these two tables in Hive. Is this task possible in Hive? Any insights on this would be greatly appreciated!
You can do that with a cross join of distinct values from both tables.
select t1.customerid,t2.categories
from (select distinct customerid from tbl1) t1
cross join (select distinct categories from tbl2) t2
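As a quick check of the cross join approach, the sketch below runs the same query shape on hypothetical data with Python's sqlite3 module (not Hive). An ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl1 (customerid INTEGER);
    CREATE TABLE tbl2 (categories TEXT);
    -- Hypothetical data with duplicates in both tables.
    INSERT INTO tbl1 VALUES (1), (1), (2);
    INSERT INTO tbl2 VALUES ('x'), ('y'), ('y');
""")

# Every distinct customer paired with every distinct category.
pairs = conn.execute("""
    SELECT t1.customerid, t2.categories
    FROM (SELECT DISTINCT customerid FROM tbl1) t1
    CROSS JOIN (SELECT DISTINCT categories FROM tbl2) t2
    ORDER BY t1.customerid, t2.categories
""").fetchall()
print(pairs)
```

Two distinct customers and two distinct categories give 2 x 2 = 4 rows; with 5 customers and 3 categories the same query would give 15.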
I have two tables table1 and table2. Both tables have a common column named city.
How do I find all values of city that appear in both tables?
You can do an inner join on the city column, to find values that exist in both tables.
select
-- Output the city from either table (since it will be the same)
t1.city
from
-- Join table1 and table2 together, on a matching city column
table1 t1 join table2 t2 on (t1.city=t2.city)
group by
-- Only return a single row per city
t1.city
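The inner join with group by can be tried on hypothetical data as follows. This is a sketch using Python's sqlite3 module instead of Hive; the city values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (city TEXT);
    CREATE TABLE table2 (city TEXT);
    -- Hypothetical data: only 'Rome' appears in both tables, twice in each.
    INSERT INTO table1 VALUES ('Paris'), ('Rome'), ('Rome');
    INSERT INTO table2 VALUES ('Rome'), ('Oslo'), ('Rome');
""")

# The join yields one row per matching pair (4 rows for 'Rome' here);
# GROUP BY collapses them to a single row per city.
common = conn.execute("""
    SELECT t1.city
    FROM table1 t1 JOIN table2 t2 ON (t1.city = t2.city)
    GROUP BY t1.city
""").fetchall()
common_cities = [row[0] for row in common]
print(common_cities)
```

Without the GROUP BY, duplicated cities would be repeated once per matching pair of rows.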
SELECT tbone.desired_column1,
tbone.desired_column2,
--other columns from table one
tbtwo.desired_column1,
tbtwo.desired_column2
--other columns from table two
-- Below we state what each table is identified as (tbone and tbtwo), so that you don't have to keep typing the table name above and below. The alias can be anything, such as A or B or HORSECORRECTINGBATTERY
FROM table1 tbone,
table2 tbtwo
WHERE tbone.city = tbtwo.city
If you don't want to specify which columns to take, just go with
SELECT * FROM ...
So I have two tables:
table_1 and table_2
They both have various columns with the same name.
We only need to work with 2 columns:
ID and REGION
table_1 has ID fields that are distinct to table_1 only.
table_2 has ID fields that are distinct to table_2 only.
however, some ID fields are shared by both table_1 and table_2
I need to write a query that returns the number of different ID values across both tables where REGION = '1'.
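A FULL OUTER JOIN should do the trick, as long as each side is first filtered to REGION = '1' and reduced to distinct ids.
SELECT COUNT(*)
FROM (SELECT DISTINCT id FROM table_1 WHERE region = '1') a
FULL OUTER JOIN (SELECT DISTINCT id FROM table_2 WHERE region = '1') b ON (a.id = b.id)
It will create a single row for every id that is in either table_1 or table_2. If the id is in both tables, it still creates a single row, so COUNT(*) is the number of different ids.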
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
Using SQL, take advantage of a UNION to eliminate duplicate values between the two tables, so you're left with a distinct list of ID values to count.
SELECT COUNT(*)
FROM (SELECT ID
FROM table_1
WHERE REGION = '1'
UNION
SELECT ID
FROM table_2
WHERE REGION = '1') t
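Here is a runnable sketch of the UNION approach on hypothetical rows, using Python's sqlite3 module rather than Hive. One id is shared between the tables and some rows belong to another region, so both the deduplication and the filter are exercised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER, region TEXT);
    CREATE TABLE table_2 (id INTEGER, region TEXT);
    -- Hypothetical data: id 2 is in both tables; ids 3 and 5 are in region '2'.
    INSERT INTO table_1 VALUES (1, '1'), (2, '1'), (3, '2');
    INSERT INTO table_2 VALUES (2, '1'), (4, '1'), (5, '2');
""")

# UNION (without ALL) removes duplicates, so the shared id 2 is counted once.
distinct_ids = conn.execute("""
    SELECT COUNT(*)
    FROM (SELECT id FROM table_1 WHERE region = '1'
          UNION
          SELECT id FROM table_2 WHERE region = '1') t
""").fetchone()[0]
print(distinct_ids)
```

The region '1' ids are 1 and 2 from table_1 and 2 and 4 from table_2, which is three different values.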
I have two databases; for argument's sake, let's call them db1 and db2. They are structured exactly the same, and each has a table called table1 with fields id and value1.
My question is: how do I write a query that selects the field value1 from both tables, linked by the same id?
You can prefix the table names with the database name to identify the two similarly named tables. You can then use that fully qualified table name to refer to the similarly named fields.
So, without aliases:
select db1.table1.id, db1.table1.value1, db2.table1.value1
from db1.table1 inner join db2.table1 on db1.table1.id = db2.table1.id
and with aliases
select t1.id, t1.value1, t2.value1
from db1.table1 as t1 inner join db2.table1 as t2 on t1.id = t2.id
You may also want to alias the selected columns so your select line becomes:
select t1.id as id, t1.value1 as value_from_db1, t2.value1 as value_from_db2
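The database-qualified form can be tried out with SQLite, which also accepts database.table names once extra databases are attached. This is only a sketch under that assumption: the two in-memory databases and their rows are hypothetical, and a real MySQL or Hive setup would connect differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach two separate in-memory databases named db1 and db2 (hypothetical setup).
conn.execute("ATTACH DATABASE ':memory:' AS db1")
conn.execute("ATTACH DATABASE ':memory:' AS db2")
conn.executescript("""
    CREATE TABLE db1.table1 (id INTEGER, value1 TEXT);
    CREATE TABLE db2.table1 (id INTEGER, value1 TEXT);
    INSERT INTO db1.table1 VALUES (1, 'x'), (2, 'y');
    INSERT INTO db2.table1 VALUES (1, 'p'), (3, 'q');
""")

# Same table and column names on both sides; aliases keep them apart.
cur = conn.execute("""
    SELECT t1.id AS id, t1.value1 AS value_from_db1, t2.value1 AS value_from_db2
    FROM db1.table1 AS t1 INNER JOIN db2.table1 AS t2 ON t1.id = t2.id
""")
names = [d[0] for d in cur.description]
matched = cur.fetchall()
print(names)
print(matched)
```

Only id 1 exists in both databases, so the inner join returns a single row carrying one value from each side.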
This is T-SQL, but I can't imagine MySQL would be that much different (will delete answer if that's not the case)
SELECT
a.Value1 AS [aValue]
,b.Value1 AS [bValue]
FROM
db1.dbo.Table1 a
INNER JOIN db2.dbo.Table1 b
ON a.Id = b.Id
Try something such as this.
$dbhost="server_name";
$dbuser1="user1";
$dbpass1="password1";
$dbname1="database_I";
$dbname2="database_II";
$db1=mssql_connect($dbhost,$dbuser1,$dbpass1);
mssql_select_db($dbname1,$db1);
$query="SELECT ... FROM database_I.table1, database_II.table2 WHERE ....";
etc. Sorry if this does not help.
There is an easy way in SQL. Extend the FROM clause so that, instead of select ... from tablename, you use
select ... from database.namespace.tablename
The default namespace (schema) is dbo.
You could use a union select:
Simple example:
select "one" union select "two";
This will return 2 rows: one row contains one and the other contains two. It is as if you are concatenating 2 SQL queries; the only constraint is that they both must return the same number of columns.
Multiple databases:
select * from client_db.users where id=1 union select * from master_db.users where id=1;
In this case both users tables must have the same number of columns. You said they have the same structure, so you shouldn't have a problem.