How do I load the data from the first 2 hive tables into the 3rd one below? - sql

Below is a simplified version of the problem I am facing.
Let's say I have an employee and a department table in Hive. My goal is to load the data from these 2 tables into a 3rd one below. However, the 3rd table has a few dummy columns set to null and will not be filled by data from either of the employee or department tables. Is it possible to still load the employee and department data and just set the other fields to null?
Employee table(id,first_name,last_name,age,department_id,salary)
1,John,Smith,23,1,40000
2,Bob,Wilson,25,1,45000
3,Fred,Krug,37,2,75000
4,Jeremy,Fisher,41,3,110000
Department table(id,name)
1,Sales
2,IT
3,Marketing
End result(dummy_column0,employeeID,first_name,last_name,age,salary,department_name,dummy_column1)
null,1,John,Smith,23,40000,Sales,null
null,2,Bob,Wilson,25,45000,Sales,null
null,3,Fred,Krug,37,75000,IT,null
null,4,Jeremy,Fisher,41,110000,Marketing,null
The question is: given the schema of the end result, how do I load the non-null data into the 3rd table? The end result table already exists at this point, so I cannot just recreate it from scratch. Any help would be much appreciated!

Yes. Hive does not match on column names here; only the position of the columns matters. You just have to structure your query so that the dummy columns receive nulls.
-- tablename is the existing end-result table; columns are matched by position
insert overwrite table tablename
select null, e.id, e.first_name, e.last_name, e.age, e.salary, d.name, null
from employee e join department d on e.department_id = d.id;
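The positional matching described above can be tried locally. The following is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for Hive; the target table name end_result is made up for the example, and a plain INSERT replaces Hive's INSERT OVERWRITE, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER, first_name TEXT, last_name TEXT,
                       age INTEGER, department_id INTEGER, salary INTEGER);
CREATE TABLE department (id INTEGER, name TEXT);
-- The target already exists, dummy columns included (name is hypothetical).
CREATE TABLE end_result (dummy_column0 TEXT, employeeID INTEGER, first_name TEXT,
                         last_name TEXT, age INTEGER, salary INTEGER,
                         department_name TEXT, dummy_column1 TEXT);
INSERT INTO employee VALUES (1,'John','Smith',23,1,40000), (2,'Bob','Wilson',25,1,45000),
                            (3,'Fred','Krug',37,2,75000), (4,'Jeremy','Fisher',41,3,110000);
INSERT INTO department VALUES (1,'Sales'), (2,'IT'), (3,'Marketing');
""")

# Values are matched to target columns by position, so a literal NULL
# is placed in the first and last slots where the dummy columns sit.
conn.execute("""
INSERT INTO end_result
SELECT NULL, e.id, e.first_name, e.last_name, e.age, e.salary, d.name, NULL
FROM employee e JOIN department d ON e.department_id = d.id
""")

rows = conn.execute("SELECT * FROM end_result ORDER BY employeeID").fetchall()
for row in rows:
    print(row)
```

Each printed row starts and ends with None, which is how Python shows the SQL NULL in the two dummy columns.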

Related

What is the output of a query that fetches information from the same table multiple times

Consider the following relational data table, employee. What is the output of the following SQL statement?
SELECT COUNT(*)
FROM employee, employee, employee
Employee table
gid name Three
E101 John HRM
E102 Lucy Marketing
E103 Rick Management
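As written, this will produce an error because the same table appears three times in the FROM clause without unique aliases. You need to assign a unique name (alias) to each table in the FROM clause. With aliases, the query is a cross join of three copies of the three-row table, so it counts 3 x 3 x 3 = 27 rows. You should still run it yourself first; here is the schema and query I used.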
Schema (MySQL v5.7)
create table employee (gid varchar(20), name varchar(20),Three varchar(20));
insert into employee values('E101','John','HRM');
insert into employee values('E102','Lucy','Marketing');
insert into employee values('E103','Rick','Management');
Query #1
SELECT count(*) From employee a, employee b, employee c;
count(*)
27
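The count can be reproduced without MySQL. This is a small sketch, assuming SQLite through Python's sqlite3 module rather than the MySQL 5.7 used above; only the aliased form of the query is run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (gid TEXT, name TEXT, Three TEXT);
INSERT INTO employee VALUES ('E101','John','HRM');
INSERT INTO employee VALUES ('E102','Lucy','Marketing');
INSERT INTO employee VALUES ('E103','Rick','Management');
""")

# Each alias is an independent copy of the table. With no join condition,
# every combination of one row from a, b and c is produced.
(count,) = conn.execute(
    "SELECT COUNT(*) FROM employee a, employee b, employee c"
).fetchone()
(n,) = conn.execute("SELECT COUNT(*) FROM employee").fetchone()

print(count, n ** 3)
```

The two printed numbers agree because a comma-separated FROM list without a WHERE clause is a Cartesian product.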

Copying data from one table to another with different column names

I'm having an issue copying one table's data to another. I have around 100 or so individual tables that have generally the same field names, but not always. I need to be able to copy and map the fields. Example: the source table is BROWARD and has column names broward_ID, name, dob, address (the list goes on). The temp table I want to copy it to has ID, name, dob, address etc.
I'd like to map the fields like broward_ID = ID, name = name, etc. But many of the other tables differ in column names, so I will have to write a query for each one. Once I figure out the first one, I can do the rest. Also, the columns in the two tables are not in the same order. Thanks in advance for the TSQL.
With tables:
BROWARD (broward_ID, name, dob, address) /*source*/
TEMP (ID, name, address,dob) /*target*/
If you want to copy information from BROWARD to TEMP then:
INSERT INTO TEMP SELECT broward_ID, name, address, dob FROM BROWARD --check that the order of columns in the select matches the order in the target table
If you want to copy only the values of broward_ID and name then:
INSERT INTO TEMP(ID, name) SELECT broward_ID,NAME FROM BROWARD
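Since a separate statement is needed for each of the roughly 100 source tables, the statements could be generated from a per-table column mapping. The sketch below is only an illustration: the helper build_copy_sql and the column_map dictionary are hypothetical, SQLite stands in for SQL Server, and the target is called TEMP_COPY because TEMP is a keyword in SQLite.

```python
import sqlite3

# Hypothetical mapping for one source table: source column -> target column.
column_map = {"broward_ID": "ID", "name": "name", "dob": "dob", "address": "address"}


def build_copy_sql(source, target, mapping):
    """Build an INSERT ... SELECT with explicit, paired column lists.

    Naming the target columns means the column order of either table
    no longer matters; only the pairing in the mapping does.
    """
    target_cols = ", ".join(mapping.values())
    source_cols = ", ".join(mapping.keys())
    return f"INSERT INTO {target} ({target_cols}) SELECT {source_cols} FROM {source}"


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BROWARD (broward_ID INTEGER, name TEXT, dob TEXT, address TEXT);
-- The target lists address before dob, as in the question.
CREATE TABLE TEMP_COPY (ID INTEGER, name TEXT, address TEXT, dob TEXT);
INSERT INTO BROWARD VALUES (1, 'Ann', '1980-01-02', '1 Main St');
""")

sql = build_copy_sql("BROWARD", "TEMP_COPY", column_map)
conn.execute(sql)
rows = conn.execute("SELECT ID, name, address, dob FROM TEMP_COPY").fetchall()

print(sql)
print(rows)
```

The table and column names are interpolated directly into the statement, so a helper like this should only be fed names you control, never user input.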
If the target rows already exist, your problem can also be solved with an update.
Let's consider two different tables:
Table A
Id Name
1 abc
2 cde
Table B
Id Name
1
2
In the above case we want to copy the Name column data of Table A into the Name column of Table B. In T-SQL:
update B set B.Name = A.Name from B inner join A on B.Id = A.Id where ...

Replace values in column with Oracle

How can I replace all the values of a single column with other ones in one single statement?
For example, I want to change old values of the last column salary (2250,1,3500,1) for new ones (2352,7512,4253,1142). I have this database:
I know how to do it but changing step by step, and it is not efficient if I have a lot of rows. This way:
UPDATE tablename
SET salary = REPLACE(tablename.salary, 2250, 2352);
and then perform that operation multiple times.
UPDATE tablename
SET salary = 2352
WHERE salary = 2250
I'm not sure what you're aiming for with the REPLACE() function but if you want to change the values then you need to do it like the above code. Set the salary to what you want WHERE it has a salary of 2250.
You can write it a few times with the different criteria and then run it.
EDIT: Since you're worried about doing this numerous times, you can create a table called t_salary:
CREATE TABLE t_salary AS
SELECT salary from tablename;
ALTER TABLE t_salary ADD newsalary integer AFTER salary;
In the 'newsalary' column you can add what the new salary should be, then do an inner join. I just created a table for this purpose called stackoverflow (which would be your 'tablename'):
update stackoverflow s
inner join t_salary ns on s.salary = ns.salary
set s.salary = ns.newsalary;
What this does is join tablename to t_salary where the current salary equals the salary in t_salary, and then set tablename.salary to the new salary. This worked for me; I hope it works for you.
Note, the syntax may be slightly different since I don't have Oracle installed on my home machine, I used MySQL.
Since you already have a list of old salary values and their corresponding new salary values, you can place them in a flat file and create an external table in Oracle that points to this file.
Once that is done then you can just fire a simple update statement similar to the one given below:
update test1 set salary = ( select newsalary from test2 where test1.empid = test2.empid);
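The update from a mapping table can be tried as follows. This is a sketch under assumptions: SQLite replaces Oracle, an ordinary table replaces the external table, and the empid assignments and the extra employee 5 are invented for the example. The WHERE EXISTS guard is an addition to the statement above; without it, rows with no match in test2 would have their salary set to NULL, because the subquery returns no value for them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test1 (empid INTEGER PRIMARY KEY, salary INTEGER);
CREATE TABLE test2 (empid INTEGER PRIMARY KEY, newsalary INTEGER);
INSERT INTO test1 VALUES (1, 2250), (2, 1), (3, 3500), (4, 1), (5, 9000);
-- Employee 5 deliberately has no entry in the mapping table.
INSERT INTO test2 VALUES (1, 2352), (2, 7512), (3, 4253), (4, 1142);
""")

# One statement updates every mapped row. Keying on empid rather than on the
# old salary lets the two rows with salary 1 receive different new values.
conn.execute("""
UPDATE test1
SET salary = (SELECT newsalary FROM test2 WHERE test2.empid = test1.empid)
WHERE EXISTS (SELECT 1 FROM test2 WHERE test2.empid = test1.empid)
""")

rows = conn.execute("SELECT empid, salary FROM test1 ORDER BY empid").fetchall()
print(rows)
```

Employee 5 keeps the original salary because the guard excludes rows that have no mapping.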

How to perform a mass SQL insert to one table with rows from two separate tables

I need some T-SQL help. We have an application which tracks Training Requirements assigned to each employee (such as CPR, First Aid, etc.). There are certain minimum Training Requirements which all employees must be assigned and my HR department wants me to give them the ability to assign those minimum Training Requirements to all personnel with the click of a button. So I have created a table called TrainingRequirementsForAllEmployees which has the TrainingRequirementID's of those identified minimum TrainingRequirements.
I want to insert rows into table Employee_X_TrainingRequirements for every employee in the Employees table joined with every row from TrainingRequirementsForAllEmployees.
I will add abbreviated table schema for clarity.
First table is Employees:
EmployeeNumber PK char(6)
EmployeeName varchar(50)
Second Table is TrainingRequirementsForAllEmployees:
TrainingRequirementID PK int
Third table (the one I need to Insert Into) is Employee_X_TrainingRequirements:
TrainingRequirementID PK int
EmployeeNumber PK char(6)
I don't know what the Stored Procedure should look like to achieve the results I need. Thanks for any help.
The cross join operator is suitable when the Cartesian product of two sets of data is needed. So in the body of your stored procedure you should have something like:
insert into Employee_X_TrainingRequirements (TrainingRequirementID, EmployeeNumber)
select r.TrainingRequirementID, e.EmployeeNumber
from Employees e
cross join TrainingRequirementsForAllEmployees r
where not exists (
select 1 from Employee_X_TrainingRequirements
where TrainingRequirementID = r.TrainingRequirementID
and EmployeeNumber = e.EmployeeNumber
)
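The effect of the NOT EXISTS filter is that the statement can be run repeatedly without creating duplicates. A minimal sketch, assuming SQLite in place of SQL Server; the function assign_minimum_requirements is a hypothetical stand-in for the stored procedure and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeNumber TEXT PRIMARY KEY, EmployeeName TEXT);
CREATE TABLE TrainingRequirementsForAllEmployees (TrainingRequirementID INTEGER PRIMARY KEY);
CREATE TABLE Employee_X_TrainingRequirements (
    TrainingRequirementID INTEGER, EmployeeNumber TEXT,
    PRIMARY KEY (TrainingRequirementID, EmployeeNumber));
INSERT INTO Employees VALUES ('000001', 'Ann'), ('000002', 'Ben');
INSERT INTO TrainingRequirementsForAllEmployees VALUES (10), (20);
-- One assignment already exists before the button is clicked.
INSERT INTO Employee_X_TrainingRequirements VALUES (10, '000001');
""")

ASSIGN_SQL = """
INSERT INTO Employee_X_TrainingRequirements (TrainingRequirementID, EmployeeNumber)
SELECT r.TrainingRequirementID, e.EmployeeNumber
FROM Employees e
CROSS JOIN TrainingRequirementsForAllEmployees r
WHERE NOT EXISTS (
    SELECT 1 FROM Employee_X_TrainingRequirements x
    WHERE x.TrainingRequirementID = r.TrainingRequirementID
      AND x.EmployeeNumber = e.EmployeeNumber
)
"""


def assign_minimum_requirements(connection):
    """Insert every missing (requirement, employee) pair; return the row count."""
    connection.execute(ASSIGN_SQL)
    return connection.execute(
        "SELECT COUNT(*) FROM Employee_X_TrainingRequirements"
    ).fetchone()[0]


first = assign_minimum_requirements(conn)   # adds the three missing pairs
second = assign_minimum_requirements(conn)  # nothing left to add
print(first, second)
```

Two employees and two requirements give four pairs in total, and the second call leaves the table unchanged instead of failing on the primary key.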

Data to be inserted by SSIS

I have a table known as Customer(DATABASE AAA) containing 3 fields
Cust_id Cust_name Cust_salary
1 A 2000
2 B 3000
3 C NULL
I want to put data of these 3 columns in Employee(DATABASE BBB) which has the same structure as of Customer.
I want to transfer records of only those customer in which Cust_salary part is not null.
This must be done in SSIS only. My values for Cust_id are auto-generated. Before the values are loaded, the Employee table should be emptied, and the auto-generated identity values should be preserved.
You could create an Execute SQL Task in SSIS and run the following:
INSERT INTO Employee
(EmployeeId, EmployeeName, EmployeeSalary)
SELECT Cust_id, Cust_name, Cust_salary
FROM Customer
WHERE Cust_salary IS NOT NULL
Darren Davies' answer seems correct, but if for some reason EmployeeID is also an identity column and needs to match Cust_ID, and assuming any entries already in the Employee table correspond to the correct customer, you can use an Execute SQL Task in SSIS with a connection open to database BBB to run the following:
SET IDENTITY_INSERT Employee ON
INSERT INTO Employee (EmployeeID, EmployeeName, EmployeeSalary)
SELECT Cust_ID, Cust_Name, Cust_Salary
FROM AAA..Customer
WHERE Cust_Salary IS NOT NULL
AND NOT EXISTS
( SELECT 1
FROM Employee
WHERE EmployeeID = Cust_ID
)
SET IDENTITY_INSERT Employee OFF
This will maintain the integrity of the Identity fields in each table, and only insert new Customers to the Employee table.
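The filtering logic of that statement (skip NULL salaries, keep the source IDs, insert only customers not yet present) can be checked outside SSIS. The sketch below rests on several assumptions: SQLite replaces SQL Server, two in-memory databases joined with ATTACH play the roles of AAA and BBB, and no SET IDENTITY_INSERT is needed because SQLite accepts explicit values for an AUTOINCREMENT key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                 # plays the role of database BBB
conn.execute("ATTACH DATABASE ':memory:' AS aaa")  # plays the role of database AAA
conn.executescript("""
CREATE TABLE aaa.Customer (
    Cust_id INTEGER PRIMARY KEY AUTOINCREMENT, Cust_name TEXT, Cust_salary INTEGER);
CREATE TABLE Employee (
    EmployeeID INTEGER PRIMARY KEY AUTOINCREMENT, EmployeeName TEXT, EmployeeSalary INTEGER);
INSERT INTO aaa.Customer (Cust_name, Cust_salary)
VALUES ('A', 2000), ('B', 3000), ('C', NULL);
""")

COPY_SQL = """
INSERT INTO Employee (EmployeeID, EmployeeName, EmployeeSalary)
SELECT c.Cust_id, c.Cust_name, c.Cust_salary
FROM aaa.Customer AS c
WHERE c.Cust_salary IS NOT NULL
  AND NOT EXISTS (SELECT 1 FROM Employee AS e WHERE e.EmployeeID = c.Cust_id)
"""

# Running the copy twice shows that already-copied customers are skipped.
for _ in range(2):
    conn.execute(COPY_SQL)

rows = conn.execute(
    "SELECT EmployeeID, EmployeeName, EmployeeSalary FROM Employee ORDER BY EmployeeID"
).fetchall()
print(rows)
```

Customer C never appears in Employee because of the IS NOT NULL filter, and customers A and B keep the IDs they had in Customer.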
What have you tried?
You will need two connections, one for each DB, and one data flow component which will have an OleDBSource and an OleDBDestination component inside.
On the OleDBSource you can select your connection and write your query, and then you drag the green arrow to the OleDBDestination. Double-click the OleDBDestination, select the destination connection and table, and click on mapping.
That should be it.