How to do update/delete operations on non-transactional table - hive

I am a beginner with Hive and learned that update/delete operations are not supported on non-transactional tables. I don't have a clear picture of why those operations are not supported. I'd also like to know whether there is a way to update a non-transactional table.

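Why are those operations not supported?
Transactional tables are special managed tables in Hive: they must be stored as ORC, bucketed (required in Hive 2.x), marked with the table property 'transactional'='true', and used with the DbTxnManager transaction manager and concurrency support enabled. These properties are what make ACID operations possible. Every DML statement (DELETE/UPDATE/MERGE/INSERT) writes a delta file that tracks the change, so Hive does a lot of internal work to maintain the table and its data. A non-transactional table has none of this bookkeeping, so Hive has no mechanism for changing individual rows in it.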
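The delta-file idea can be pictured with a toy model. The Python sketch below is only an illustration under simplifying assumptions, not Hive's real ORC/ACID file layout: the class name `AcidTable` and its methods are hypothetical, the "table" is an unchanging base snapshot plus an ordered list of delta records, and every read has to merge the two. Hive also compacts deltas in the background; `compact` stands in for that only loosely.

```python
from dataclasses import dataclass, field

@dataclass
class AcidTable:
    """Toy model (hypothetical): an unchanging base snapshot plus append-only deltas."""
    base: dict                                   # row key -> row values
    deltas: list = field(default_factory=list)   # (op, key, row) in commit order

    def insert(self, key, row):
        self.deltas.append(("insert", key, row))

    def update(self, key, row):
        # Nothing in the base is rewritten; the change is only recorded.
        self.deltas.append(("update", key, row))

    def delete(self, key):
        self.deltas.append(("delete", key, None))

    def read(self):
        """Every read merges the base with all deltas, in commit order."""
        rows = dict(self.base)
        for op, key, row in self.deltas:
            if op == "delete":
                rows.pop(key, None)
            else:
                rows[key] = row
        return rows

    def compact(self):
        """Fold the deltas into a new base (a loose analogy to compaction)."""
        self.base = self.read()
        self.deltas.clear()
```

The base never changes on UPDATE or DELETE in this model; a plain, non-transactional table has only the base, which is why changing rows in it means rewriting it.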
Is there a way to update a non-transactional table?
Of course there is.
The idea is to rewrite the whole table: read the data, apply the change in the SELECT, and overwrite the table with the result (INSERT OVERWRITE behaves like a truncate followed by an insert).
You can update a column with the code below.
Assume you have an emp table and you want to set the salary to 5000 for empid = 10. For simplicity's sake, emp has only 3 columns: empid, empname, empsal.
INSERT OVERWRITE TABLE emp -- replaces the contents of emp
SELECT empid, empname, empsal FROM emp WHERE empid <> 10 -- unchanged rows
UNION ALL
SELECT empid, empname, 5000 AS empsal FROM emp WHERE empid = 10; -- the changed row
You can use similar SQL to delete employee 10:
INSERT OVERWRITE TABLE emp -- replaces the contents of emp
SELECT empid, empname, empsal FROM emp WHERE empid <> 10; -- every row except empid 10 is written back, so employee 10 is deleted
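SQLite has no INSERT OVERWRITE, but the rewrite-the-whole-table idea can be checked with Python's built-in sqlite3 module. In this sketch the helper `overwrite` is hypothetical: it computes the new contents with the same SELECTs used above, then empties emp and reloads it in one transaction, which is roughly what INSERT OVERWRITE does for you in Hive.

```python
import sqlite3

def overwrite(conn, select_sql):
    """Emulate `INSERT OVERWRITE TABLE emp <select_sql>` (hypothetical helper)."""
    # Compute the new contents first, while emp still holds the old rows.
    new_rows = conn.execute(select_sql).fetchall()
    with conn:  # DELETE and INSERT commit together, or roll back together on error
        conn.execute("DELETE FROM emp")
        conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", new_rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empid INTEGER, empname TEXT, empsal INTEGER)")
with conn:
    conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                     [(10, "Ann", 4000), (11, "Bob", 4500), (12, "Cy", 3000)])

# "Update": unchanged rows UNION ALL the changed row.
overwrite(conn, """
    SELECT empid, empname, empsal FROM emp WHERE empid <> 10
    UNION ALL
    SELECT empid, empname, 5000 AS empsal FROM emp WHERE empid = 10
""")
after_update = sorted(conn.execute("SELECT * FROM emp"))

# "Delete": write back every row except empid 10.
overwrite(conn, "SELECT empid, empname, empsal FROM emp WHERE empid <> 10")
after_delete = sorted(conn.execute("SELECT * FROM emp"))
```

One caveat that applies to the Hive queries as well: a row whose empid is NULL matches neither `empid <> 10` nor `empid = 10`, so the rewrite silently drops it; add `OR empid IS NULL` to the "unchanged rows" branch if such rows can exist.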
You can do a merge the same way: build the new contents from emp plus data from other sources (inserting, updating, and deleting rows), then overwrite the table.
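A merge by rewrite can be sketched the same way. This assumes a hypothetical change table `emp_changes(empid, empname, empsal, op)` where op is 'upsert' or 'delete'; the new contents are the emp rows with no pending change, plus every upserted row from the change set. It is emulated here with Python's sqlite3 module; in Hive the same SELECT would follow `INSERT OVERWRITE TABLE emp`.

```python
import sqlite3

MERGE_SELECT = """
    -- rows with no pending change are kept as they are
    SELECT e.empid, e.empname, e.empsal FROM emp e
    WHERE NOT EXISTS (SELECT 1 FROM emp_changes c WHERE c.empid = e.empid)
    UNION ALL
    -- updated and brand-new rows come from the change set; 'delete' rows are left out
    SELECT c.empid, c.empname, c.empsal FROM emp_changes c WHERE c.op = 'upsert'
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empid INTEGER, empname TEXT, empsal INTEGER)")
conn.execute("CREATE TABLE emp_changes "
             "(empid INTEGER, empname TEXT, empsal INTEGER, op TEXT)")
with conn:
    conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                     [(10, "Ann", 4000), (11, "Bob", 4500), (12, "Cy", 3000)])
    conn.executemany("INSERT INTO emp_changes VALUES (?, ?, ?, ?)",
                     [(10, "Ann", 5000, "upsert"),   # update
                      (12, None, None, "delete"),    # delete
                      (13, "Dee", 3500, "upsert")])  # insert

new_rows = conn.execute(MERGE_SELECT).fetchall()
with conn:  # the "overwrite": replace emp's contents in one transaction
    conn.execute("DELETE FROM emp")
    conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", new_rows)
merged = sorted(conn.execute("SELECT * FROM emp"))
```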

Related

My question is how to create a VPD in Oracle with SQL that will also mask data

I am trying to create a VPD in Oracle using SQL statements. The goal is that an employee can ONLY view records for employees in the same department, while their coworkers' salaries are masked as NULL.
The code for the table being used is as follows:
create table Employee
(
ID number primary key,
DEPT varchar2(25),
SALARY number(8,2),
NAME varchar2(25)
);
I am unsure what the best approach would be. Would it be to create a package and use an application context? I understand how to make the table display only rows with the same DEPT, but I am unsure how to mask the data of rows with the same DEPT but a different ID.
Native RLS will get you close, but not all the way. Using "sec_relevant_cols" gives you a choice between:
only seeing the rows that match your predicate, with all values present;
seeing all the rows, with the relevant column values masked (NULL) in rows that do not match your predicate (sec_relevant_cols_opt => DBMS_RLS.ALL_ROWS);
whereas (if I'm reading correctly) you want to see only the rows matching the predicate AND mask some values as well.
You could achieve this with a two-step method:
Your context (say EMP_CTX) contains two keys, DEPT and YOUR_ID.
The RLS policy predicate is "dept = sys_context('EMP_CTX','DEPT')".
You have a view EMP, to which that policy is applied, defined as:
create or replace view EMP as
select
id,
dept,
name,
case when id = sys_context('EMP_CTX','YOUR_ID') then salary else null end salary
from Employee;
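SQLite has no application contexts or RLS policies, but the combined effect of the two steps can be checked with Python's sqlite3 module. In this sketch a one-row table `ctx` (hypothetical) stands in for sys_context: the view's WHERE clause plays the role of the RLS predicate and the CASE expression does the masking. In Oracle, the predicate would come from the policy function and the values from the application context, not from a table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (ID INTEGER PRIMARY KEY, DEPT TEXT, SALARY REAL, NAME TEXT);

    -- Stand-in for the application context: DEPT and YOUR_ID of the current user.
    CREATE TABLE ctx (DEPT TEXT, YOUR_ID INTEGER);

    -- WHERE = the RLS predicate (same department only);
    -- CASE  = the masking (only your own salary is shown).
    CREATE VIEW EMP AS
    SELECT ID, DEPT, NAME,
           CASE WHEN ID = (SELECT YOUR_ID FROM ctx) THEN SALARY ELSE NULL END AS SALARY
    FROM Employee
    WHERE DEPT = (SELECT DEPT FROM ctx);

    INSERT INTO Employee VALUES (1, 'HR', 5000, 'Ann'), (2, 'HR', 6000, 'Bob'),
                                (3, 'IT', 7000, 'Cy');
""")

def rows_seen_by(emp_id, dept):
    """Set the emulated context for one user, then query the view."""
    with conn:
        conn.execute("DELETE FROM ctx")
        conn.execute("INSERT INTO ctx VALUES (?, ?)", (dept, emp_id))
    return conn.execute("SELECT ID, NAME, SALARY FROM EMP ORDER BY ID").fetchall()
```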

BigQuery table partition

I have a table, e.g. emp, which is not partitioned and contains 200 TB of data.
I want to create a partitioned table from emp, but it must still be named emp.
To do that, I have to first create a partitioned table emp_1 from emp, then drop emp, then create emp from emp_1.
This way I have to load 200 TB twice. Is there any alternative?
You can copy emp to emp_1. A copy job is a metadata-only operation, which is fast and free. Then drop emp, re-create it as a partitioned table, and load the data from emp_1 into emp, so the data is only loaded once.
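The copy, drop, re-create sequence could be scripted with the google-cloud-bigquery Python client, whose `Client` has `copy_table`, `delete_table`, and `query` methods. This is a minimal sketch, assuming a hypothetical dataset name and a hypothetical TIMESTAMP column `hire_ts` to partition on. The client is passed in, so the sequencing can be checked without a GCP project. The final CREATE TABLE ... PARTITION BY ... AS SELECT still reads emp_1 in full; the free copy is what saves the second full load.

```python
def partitioned_ctas(dataset: str, table: str, source: str, partition_expr: str) -> str:
    """DDL that re-creates `table` as a partitioned copy of `source`."""
    return (f"CREATE TABLE `{dataset}.{table}` "
            f"PARTITION BY {partition_expr} "
            f"AS SELECT * FROM `{dataset}.{source}`")

def repartition_in_place(client, dataset: str, table: str = "emp",
                         backup: str = "emp_1",
                         partition_expr: str = "DATE(hire_ts)") -> str:
    """Copy -> drop -> re-create partitioned -> load, keeping the original name.

    `client` is expected to be a google.cloud.bigquery.Client; the dataset
    name and the partition column in `partition_expr` are hypothetical.
    """
    src, dst = f"{dataset}.{table}", f"{dataset}.{backup}"
    client.copy_table(src, dst).result()  # copy job: metadata only, no data scan
    client.delete_table(src)              # drop the unpartitioned table
    client.query(partitioned_ctas(dataset, table, backup, partition_expr)).result()
    return dst  # emp_1 can be dropped once the new emp has been verified
```

Dropping first and then creating matters here: BigQuery does not let CREATE OR REPLACE change a table's partitioning spec.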

What is the easiest (and fastest) way to update 20K employee data records from a CSV data dump in SQL Server?

I have an "employees" table with 50k+ records. We only have 24k employees but some of the employees that are no longer here are tied to historical projects so I don't want to delete them. And, of course, we've hired more employees that are working on NEW projects so they need to be added to the employees table.
I managed to convince HR to give me a CSV file with the employee data we keep in our table, and now I need a way to update the existing records (new phone numbers, departments, etc.) and add the new ones.
There are 3 criteria:
if the record exists in the CSV and the "employees" table, UPDATE the data;
if the record exists in the CSV and NOT the "employees" table, INSERT the data;
if the record exists in the "employees" table and NOT the CSV, set the record to "inactive."
This will be a regular (monthly) process so a Stored Procedure or a Function would be doable.
Suggestions please...
UPDATE: The MERGE idea works but only solves 2/3 of the problem (it does not meet criterion #3 because I do not want to delete the record if the employee is no longer with the company). When I add the 2nd UPDATE statement after WHEN NOT MATCHED BY SOURCE, it returns an error saying I cannot update the same record twice.
Any suggestions to this final piece of the puzzle?
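What about using MERGE? In SQL Server, the WHEN NOT MATCHED BY SOURCE branch can run an UPDATE instead of a DELETE, which covers criterion #3:
MERGE target_table AS t
USING source_table AS s
ON merge_condition
WHEN MATCHED
THEN update_statement
WHEN NOT MATCHED BY TARGET
THEN insert_statement
WHEN NOT MATCHED BY SOURCE
THEN UPDATE SET t.Status = 'inactive'; -- hypothetical status column; marks the row instead of deleting it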
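If MERGE keeps getting in the way, the three criteria can also be met with three ordinary statements run in one transaction: update the matches, insert the new rows, then flag employees missing from the CSV as inactive. The sketch uses Python's sqlite3 module (SQLite has no MERGE) with hypothetical tables `employees(emp_id, phone, dept, active)` and `csv_import` (the loaded CSV). Reactivating matched rows in step 1 is a design choice, for employees who come back. In SQL Server the same three statements could live in the monthly stored procedure.

```python
import sqlite3

SYNC_STATEMENTS = [
    # 1. In the CSV and the table: refresh the data (and reactivate the row).
    """UPDATE employees
       SET phone  = (SELECT s.phone FROM csv_import s WHERE s.emp_id = employees.emp_id),
           dept   = (SELECT s.dept  FROM csv_import s WHERE s.emp_id = employees.emp_id),
           active = 1
       WHERE EXISTS (SELECT 1 FROM csv_import s WHERE s.emp_id = employees.emp_id)""",
    # 2. In the CSV only: insert as a new, active employee.
    """INSERT INTO employees (emp_id, phone, dept, active)
       SELECT s.emp_id, s.phone, s.dept, 1 FROM csv_import s
       WHERE NOT EXISTS (SELECT 1 FROM employees e WHERE e.emp_id = s.emp_id)""",
    # 3. In the table only: keep the row for history, but mark it inactive.
    """UPDATE employees SET active = 0
       WHERE NOT EXISTS (SELECT 1 FROM csv_import s WHERE s.emp_id = employees.emp_id)""",
]

def sync(conn):
    """Apply all three criteria in one transaction (commit all or roll back all)."""
    with conn:
        for sql in SYNC_STATEMENTS:
            conn.execute(sql)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, phone TEXT, dept TEXT, active INTEGER);
    CREATE TABLE csv_import (emp_id INTEGER, phone TEXT, dept TEXT);
    INSERT INTO employees VALUES (1, '555-0001', 'HR', 1), (2, '555-0002', 'IT', 1);
    INSERT INTO csv_import VALUES (1, '555-9999', 'Finance'), (3, '555-0003', 'IT');
""")
sync(conn)
```

Step 3 runs after step 2, so rows inserted from the CSV are never flagged inactive.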

How to retrieve a dropped table?

I accidentally dropped a table called DEPARTMENT from my Oracle database and I want to restore it. So I googled and found the solution.
Here is what I did:
SHOW RECYCLEBIN;
CRIMINALS BIN$hqnw1JViXO/gUwPAcgqn3A==$0 TABLE 2019-04-16:13:17:16
DEPARTMENT BIN$hqnw1JVjXO/gUwPAcgqn3A==$0 TABLE 2019-04-16:13:19:04
DEPARTMENT BIN$hqnw1JVkXO/gUwPAcgqn3A==$0 TABLE 2019-04-16:13:21:23
DEPARTMENT BIN$hqnw1JVnXO/gUwPAcgqn3A==$0 TABLE 2019-04-16:13:36:34
FLASHBACK table department TO BEFORE DROP;
Flashback succeeded.
As the SHOW RECYCLEBIN output shows, there is more than one DEPARTMENT table, and each has different content. My question is: how can I get the contents of all 3 tables into one?
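After each flashback, rename the restored DEPARTMENT table to a new name, e.g.
rename department to dept_1;
Repeat this (flashback, then rename) for all of them except the last one, whose name will stay DEPARTMENT. Each FLASHBACK TABLE department TO BEFORE DROP restores the most recently dropped copy still in the recycle bin. Then insert the rest of the data into it: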
insert into department
select * from dept_1
union all
select * from dept_2;
Note that unique constraints might be violated. Also, if the table's structure changed between the drops, select * might not work, so you'll have to list all the columns one by one.
But, generally speaking, that's the idea of how to do it.
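To see the uniqueness problem and one way around it, here is a sketch with Python's sqlite3 module standing in for Oracle. department has a primary key, and the restored copies dept_1 and dept_2 (names from the answer, contents hypothetical) share some key values. The plain UNION ALL insert fails on the duplicate key. Copying each table in turn with NOT EXISTS keeps whichever version of a department is already present; which copy should win is a business decision this sketch does not make for you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (deptno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE dept_1     (deptno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE dept_2     (deptno INTEGER PRIMARY KEY, dname TEXT);
    INSERT INTO department VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH');
    INSERT INTO dept_1     VALUES (20, 'RESEARCH'), (30, 'SALES');
    INSERT INTO dept_2     VALUES (30, 'SALES'), (40, 'OPERATIONS');
""")

# The plain UNION ALL version trips over the shared key 20.
try:
    with conn:
        conn.execute("INSERT INTO department SELECT * FROM dept_1 "
                     "UNION ALL SELECT * FROM dept_2")
    plain_insert_ok = True
except sqlite3.IntegrityError:
    plain_insert_ok = False  # the whole statement was rolled back

# Copy each restored table in turn, skipping keys that are already present.
with conn:
    for copy in ("dept_1", "dept_2"):
        conn.execute(f"INSERT INTO department SELECT * FROM {copy} c "
                     "WHERE NOT EXISTS (SELECT 1 FROM department d "
                     "WHERE d.deptno = c.deptno)")

combined = sorted(conn.execute("SELECT * FROM department"))
```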

How do I delete all related records?

I'm trying to delete a user and all the related records tied to him, and I have no clue how to use the SQL INNER JOIN statement. Is there any way to do something in the style of:
DELETE * FROM tblUsers, tblEnrollment, tblLinkActivities, tblFullSchedule, tblSchedule, tblLinkMedical
WHERE [IDUser] = ?
(I know that's completely incorrect)
My relationships chart looks like so: [relationships diagram not included]
Would it be easier to use 6 delete commands? Or is there another command that does that? Thanks a bunch.
Since you already have defined relationships with referential integrity, simply set the Cascade Delete Related Records option for each relationship.
See https://support.office.com/en-us/article/create-edit-or-delete-a-relationship-dfa453a7-0b6d-4c34-a128-fdebc7e686af#__bmcascade
This way you only need to delete from tblUsers; all related records are deleted automatically.
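The effect of Cascade Delete can be tried outside Access as well. A minimal sketch with Python's sqlite3 module, where `ON DELETE CASCADE` on the foreign key plays the role of the Access option; only two of the question's tables are modeled, and their columns besides IDUser are hypothetical. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE tblUsers (IDUser INTEGER PRIMARY KEY, UserName TEXT);
    CREATE TABLE tblEnrollment (
        IDEnrollment INTEGER PRIMARY KEY,
        IDUser INTEGER NOT NULL REFERENCES tblUsers(IDUser) ON DELETE CASCADE
    );
    INSERT INTO tblUsers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO tblEnrollment VALUES (100, 1), (101, 1), (102, 2);
""")

with conn:
    # One DELETE on the parent; the database removes the child rows itself.
    conn.execute("DELETE FROM tblUsers WHERE IDUser = ?", (1,))

remaining = conn.execute("SELECT IDEnrollment FROM tblEnrollment").fetchall()
```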
If you can't or don't want to do this, you need to run separate delete queries on the related tables before deleting the main record.
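Without cascading, the order of those separate deletes matters: child tables first, the parent last, all in one transaction so a failure leaves nothing half-deleted. A minimal sketch with Python's sqlite3 module, using a hypothetical dept/emp pair (columns besides DEPTNO are made up) and a hypothetical helper `delete_dept`; it also shows the parent-first order being rejected while child rows still point at it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE dept (DEPTNO INTEGER PRIMARY KEY, DNAME TEXT);
    CREATE TABLE emp  (EMPNO INTEGER PRIMARY KEY,
                       DEPTNO INTEGER REFERENCES dept(DEPTNO));
    INSERT INTO dept VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH');
    INSERT INTO emp  VALUES (1, 10), (2, 10), (3, 20);
""")

def delete_dept(conn, deptno):
    """Delete a department and its employees (hypothetical helper)."""
    with conn:  # both deletes commit together, or neither does
        conn.execute("DELETE FROM emp WHERE DEPTNO = ?", (deptno,))   # children first
        conn.execute("DELETE FROM dept WHERE DEPTNO = ?", (deptno,))  # then the parent

# Deleting the parent first is rejected while child rows still reference it.
try:
    with conn:
        conn.execute("DELETE FROM dept WHERE DEPTNO = 10")
    parent_first_ok = True
except sqlite3.IntegrityError:
    parent_first_ok = False

delete_dept(conn, 10)
```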
In most SQL dialects, Access included, a single DELETE statement removes rows from one table only, so you need multiple delete statements. Delete from the child table first and the parent last: otherwise the foreign key blocks the parent delete, or, once the parent rows are gone, a subquery that joins to them finds nothing left to delete. You can use an inner query to pick the rows to delete in each table.
For ex:
delete from emp where DEPTNO IN (Select b.DEPTNO from dept b where b.DEPTNO=10);
delete from dept where DEPTNO=10;