How to move data between partitioned tables in Hive

I have two tables, emp1 and emp2, with these fields:
userid
name
occupation
country
emp1 is partitioned on country and emp2 is partitioned on occupation.
How can I move data from emp1 to emp2?

Overwrite the target table with the rows from emp1 plus (union all) the data already in emp2. Note the distribute by at the end of the query: it optimizes partition creation, because each final reducer receives only the rows of its own partition, which reduces memory consumption.
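-- Dynamic partition inserts usually need these settings enabled.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
-- The partition column (occupation) must come last in the select list.
insert overwrite table emp2 partition(occupation)
select userid, name, country, occupation from emp1
union all
select userid, name, country, occupation from emp2
distribute by occupation;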
Additionally, you can remove duplicates using row_number().
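The row_number() idea can be sketched outside Hive. The snippet below is only an illustration that runs the same window-function pattern in SQLite through Python's sqlite3 module (SQLite 3.25 or newer is assumed for window functions). The table emp_union, its sample rows, and the choice of userid as the duplicate key are assumptions, not part of the original tables.

```python
import sqlite3

# In-memory database standing in for the union of emp1 and emp2 (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp_union (userid INTEGER, name TEXT, country TEXT, occupation TEXT);
    INSERT INTO emp_union VALUES
        (1, 'Ann', 'US', 'engineer'),
        (1, 'Ann', 'US', 'engineer'),   -- duplicate of the first row
        (2, 'Bob', 'UK', 'analyst');
""")

# Number the rows inside each userid group and keep only the first one.
# Assumption: userid identifies a logical row; use a wider key if it does not.
dedup_sql = """
    SELECT userid, name, country, occupation
    FROM (
        SELECT u.*,
               ROW_NUMBER() OVER (PARTITION BY userid ORDER BY userid) AS rn
        FROM emp_union u
    ) AS numbered
    WHERE rn = 1
    ORDER BY userid
"""
deduped = conn.execute(dedup_sql).fetchall()
print(deduped)
```

In the Hive query, the same subquery would wrap the union all before the insert, with the distribute by kept on the outer select.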

Related

How to do a SQL query with SRFs and display only distinct values?

I'm trying to think of a way to write a query with a single row function and display only distinct values.
Let's suppose that I have a table employees with the columns employee, store and salary, and I want to use the SRF MAX(salary) to find the best-paid employee in each store. If more than one employee earns the MAX(salary) in a store, how can I avoid displaying more than one top earner per store? See the code below. The simpler the better. Thank you!
SELECT employee, emp1.store, emp1.salary
FROM employees emp1
INNER JOIN (SELECT store, MAX(salary) AS salary FROM employees GROUP BY store) emp2
ON emp1.store = emp2.store
AND emp1.salary = emp2.salary
There is probably a cleaner way to do this:
select employee, store, salary from
(
-- Rank employees within each store, highest salary first.
-- The join against the MAX(salary) subquery is not needed for this.
SELECT employee, store, salary,
ROW_NUMBER( )
OVER ( partition by store order by salary desc ) rn
FROM employees
)
where rn = 1
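The ROW_NUMBER() analytic function numbers the rows 1, 2, 3, ... within each store, ordered by salary descending, so picking rn = 1 returns exactly one top earner per store. When several employees tie for the top salary, which of them gets rn = 1 is arbitrary unless you add a tiebreaker column to the ORDER BY.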
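To see the one-row-per-store behaviour with a tie, here is a minimal sketch that runs the same pattern in SQLite via Python's sqlite3 module (SQLite 3.25 or newer is assumed). The sample rows and the extra tiebreaker on employee are assumptions added so the result is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee TEXT, store TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Ann', 'A', 100),
        ('Bob', 'A', 100),   -- ties with Ann for the top salary in store A
        ('Cid', 'B', 50);
""")

# Rank employees per store by salary; the employee name breaks ties (assumption).
top_sql = """
    SELECT employee, store, salary
    FROM (
        SELECT employee, store, salary,
               ROW_NUMBER() OVER (
                   PARTITION BY store
                   ORDER BY salary DESC, employee
               ) AS rn
        FROM employees
    ) AS ranked
    WHERE rn = 1
    ORDER BY store
"""
top_earners = conn.execute(top_sql).fetchall()
print(top_earners)
```

Swapping ROW_NUMBER() for RANK() would instead return every employee tied for the top salary, which is the behaviour the question wants to avoid.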

Tuning SQL query if where clause uses non index column in Oracle

I have a table emp with columns id, empname, sal, join_date and I have this query:
select *
from emp
where join_date between date1 and date2 ;
Only id has an index; no other column does.
How can I tune this query? It runs in production, and the customer won't let me create an index or a partition on the join_date column. It takes 5-7 minutes to return just a few rows (10-15). How can I improve performance?

select column from table with only matched data from another table

I need to select two columns from one table together with the matching data from another table, or NULL where there is no match.
Table 1, named "emp", contains emp_name and emp_id.
Table 2, named "salary", contains emp_sal and emp_id.
I need a select query that returns every emp_name and emp_id, with emp_sal for the employees who have a salary and NULL for the rest.
Update:
First, thanks for the help. I used
SELECT emp.emp_id,emp.emp_name,salary.emp_sal FROM emp LEFT JOIN salary ON emp.emp_id = salary.emp_id;
It works, but with a lot of duplication, and I need to run this query per day.
I created another table named "day" with a single column, day.user_day, which stores the day the user entered, and the query should show results for that day.
I need to link these three tables together.
To keep it simple, let's replace salary with attendance.
I need to list every name and id for a given date and show all employees whether or not they have a time recorded.
For example, when I search only for the day 4/8/2014:
name  id  time
john  1   04/08/2014 06:00
man   2   null
scsv  3   04/08/2014 07:00
You want a LEFT JOIN:
SELECT * FROM emp LEFT JOIN salary ON emp.emp_id = salary.emp_id;
This will return all employees along with their emp_sal, which will be NULL if it's not in the salary table.
If I read this right, you want the employee name and salary for all employees, regardless of whether they have an entry in salary. If that is correct, this should work:
SELECT
e.emp_name,
e.emp_id,
s.emp_sal
FROM
emp AS e
LEFT JOIN salary AS s ON e.emp_id = s.emp_id
If, however, you only wanted the employees with an entry in the salary table, change LEFT JOIN to be INNER JOIN.
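The difference between the two joins can be checked with a small sketch. It uses SQLite through Python's sqlite3 module purely for illustration; the employee names come from the question's sample output, while the salary figures are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (emp_id INTEGER, emp_name TEXT);
    CREATE TABLE salary (emp_id INTEGER, emp_sal INTEGER);
    INSERT INTO emp VALUES (1, 'john'), (2, 'man'), (3, 'scsv');
    INSERT INTO salary VALUES (1, 1000), (3, 1500);   -- no row for emp_id 2
""")

query = """
    SELECT e.emp_name, e.emp_id, s.emp_sal
    FROM emp AS e
    {join} salary AS s ON e.emp_id = s.emp_id
    ORDER BY e.emp_id
"""

# LEFT JOIN keeps every employee; missing salaries come back as NULL (None).
left_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()
# INNER JOIN keeps only employees that have a salary row.
inner_rows = conn.execute(query.format(join="INNER JOIN")).fetchall()

print(left_rows)
print(inner_rows)
```

If salary held several rows per employee, the LEFT JOIN would return one output row per match, which is one common source of duplicated rows in join results.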

SQL for calculating salary contribution by each department

I am writing a simple query in an Oracle database that finds the salary contribution of each department.
Here are my tables:
CREATE TABLE employee (
empid NUMBER,
fname VARCHAR2(20),
deptid NUMBER,
salary NUMBER
);
CREATE TABLE department (
deptid NUMBER,
deptname VARCHAR2(20)
);
Inserting data into these tables:
INSERT INTO department VALUES (1, 'Sales');
INSERT INTO department VALUES (2, 'Accounting');
INSERT INTO employee VALUES (1,' John', 1,100);
INSERT INTO employee VALUES (2,' Lisa', 2,200);
INSERT INTO employee VALUES (3,' Jerry', 1,300);
INSERT INTO employee VALUES (4,' Sara', 1,400);
Now, to find the salary contribution of each department as a percentage, I am using the query below:
select dept.deptname,
sum(emp.salary)/(select sum(emp.salary) from employee emp)*100 as percentage
from employee emp, department dept
where dept.deptid=emp.deptid
group by dept.deptname;
Is this an efficient way to calculate my output, or is there an alternative?
Please try:
select distinct a.*,
(sum(Salary) over(partition by a.DeptID))/(sum(Salary) over())*100 "Percent"
from department a join employee b on a.deptid=b.deptid
You don't need a subquery for this. You can use analytic functions:
select dept.deptname,
100*sum(emp.salary)/(sum(sum(emp.salary)) over ()) as percentage
from employee emp join
department dept
on dept.deptid = emp.deptid
group by dept.deptname;
I also changed the join syntax to use ANSI standard joins.
EDIT:
There is not a particular "issue" with using subqueries for this. A subquery does work. In general, though, subqueries are harder to optimize than the built-in features in Oracle (and in this case in ANSI SQL). In this simple case, I don't know if there is a performance difference.
As for analytic functions, they are a very powerful component of SQL and you should learn about them.
With a WITH clause you can calculate the total salary across all departments once and then reuse it as a parameter. In your example the total may be recalculated for each row, which can cost performance.
with t as
(select sum(salary) as sum_salary from employee)
select dept.deptname, sum(emp.salary)/ sum_salary * 100 as percentage
from employee emp, department dept, t
where dept.deptid=emp.deptid group by dept.deptname, sum_salary;
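As a quick check that the scalar-subquery form and the WITH form return the same percentages, here is a minimal sketch using SQLite through Python's sqlite3 module with the sample rows from the question. It only compares results; it says nothing about how Oracle optimizes either form. Multiplying by 100.0 is an assumption added to force floating-point division in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (empid INTEGER, fname TEXT, deptid INTEGER, salary INTEGER);
    CREATE TABLE department (deptid INTEGER, deptname TEXT);
    INSERT INTO department VALUES (1, 'Sales'), (2, 'Accounting');
    INSERT INTO employee VALUES
        (1, 'John', 1, 100), (2, 'Lisa', 2, 200),
        (3, 'Jerry', 1, 300), (4, 'Sara', 1, 400);
""")

# Scalar subquery for the grand total, as in the question.
subquery_sql = """
    SELECT dept.deptname,
           SUM(emp.salary) * 100.0 / (SELECT SUM(salary) FROM employee) AS percentage
    FROM employee emp
    JOIN department dept ON dept.deptid = emp.deptid
    GROUP BY dept.deptname
    ORDER BY dept.deptname
"""

# Grand total computed once in a WITH clause and cross joined in.
cte_sql = """
    WITH t AS (SELECT SUM(salary) AS sum_salary FROM employee)
    SELECT dept.deptname,
           SUM(emp.salary) * 100.0 / t.sum_salary AS percentage
    FROM employee emp
    JOIN department dept ON dept.deptid = emp.deptid
    CROSS JOIN t
    GROUP BY dept.deptname, t.sum_salary
    ORDER BY dept.deptname
"""

by_subquery = conn.execute(subquery_sql).fetchall()
by_cte = conn.execute(cte_sql).fetchall()
print(by_subquery)
print(by_cte)
```

With the sample data, Sales totals 800 of 1000 and Accounting 200 of 1000, so both queries should report 80 and 20 percent.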

I have an Oracle table EMP with columns NAME, AGE, DEPT

I have an Oracle table EMP with the columns NAME, AGE, DEPT.
Now I want to retrieve data from EMP using a select statement:
Select Name, age, dept from Emp;
select Dept, age, emp from Emp;
Which one will take less time to retrieve data?
Or will the retrieve time not be different?
You can use
Select Name, Age, Dept From EMP
or, if you have only three columns in your EMP table, you can also use
Select * from EMP
Both take the same time.
In your question, the second query is more time-consuming because it uses the dept column twice.
For better performance you can create an index on the EMP table.