I have an HBase table with 6 columns, created using Hive and loaded with data. I need to delete the rows that have the same data in a particular column. I need an HQL command for that.
Here are my table columns:
firstname lastname location id address description
I want to delete the rows with the same description.
DELETE is available starting in Hive 0.14.
Deletes can only be performed on tables that support ACID. See Hive Transactions for details.
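In Hive 0.14 through 2.x, an ACID table must be stored as ORC, be bucketed, and have transactional=true, so an HBase-backed table will not qualify. A minimal sketch of an ACID-capable table definition (the table name and bucket count are illustrative):
-- illustrative only: DELETE needs a native, ORC-backed, bucketed transactional table
CREATE TABLE people_acid (
  firstname STRING, lastname STRING, location STRING,
  id STRING, address STRING, description STRING
)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');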
As for your query, try something like:
DELETE
FROM T
WHERE T.a IN (SELECT foo FROM B);
OR
DELETE
FROM T1
WHERE EXISTS (SELECT B FROM T2 WHERE T1.X = T2.Y);
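Note that neither template above removes duplicates by itself. If the goal is to keep one row per description and drop the rest, it is often simpler to build a de-duplicated copy of the table instead of deleting. A rough sketch, assuming the table is called mytable and mytable_dedup is a new table (both names are illustrative):
-- keep one arbitrary row per description, discard the rest
CREATE TABLE mytable_dedup AS
SELECT firstname, lastname, location, id, address, description
FROM (
  SELECT firstname, lastname, location, id, address, description,
         row_number() OVER (PARTITION BY description ORDER BY id) AS rn
  FROM mytable
) t
WHERE rn = 1;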
I have a base_table and a final_table with the same columns, where plan and date are the primary keys. The data flows from the base table to the final table.
Initially, the final table will look like this:
After that, the base table will have:
Now the data needs to flow from the base table to the final table based on the primary key columns (plan, date) and distinct rows, so the final_table should have:
The first two rows get updated with the new percentage values from the base table.
How do we write a SQL query for this?
I am looking to write this query in Redshift SQL.
Pseudo code tried:
insert into final_table
(plan, date, percentage)
select
b.plan, b.date, b.percentage from base_table b
inner join final_table f on b.plan = f.plan and b.date = f.date;
First you need to understand that clustered (distributed) columnar databases like Redshift and Snowflake don't enforce uniqueness constraints (it would be a performance killer). So your pseudo code is incorrect, as it will create duplicate rows in final_table.
You could use UPDATE to change the values in the rows with matching PKs. However, this won't work in the case where there are new values to be added to final_table. I expect you need a more general solution that works in the case of updated values AND new values.
The general way to address this is to create an "upsert" transaction that deletes the matching rows and then inserts rows into the target table. A transaction is needed so no other session can see the table where the rows are deleted but not yet inserted. It looks like:
begin;
delete from final_table
using base_table
where final_table.plan = base_table.plan
and final_table.date = base_table.date;
insert into final_table
select * from base_table;
commit;
Things to remember: 1) autocommit mode can break the transaction, and 2) you should VACUUM and ANALYZE the table if the number of rows changed is large.
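For reference, assuming the target table is final_table, that maintenance step is simply:
vacuum final_table;
analyze final_table;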
Based on your description it is not clear that I have captured the full intent of your situation ("distinct rows from two tables"). If I have missed the intent, please update the question.
You don't need an INSERT statement but an UPDATE statement -
UPDATE final_table
SET percentage = b.percentage
FROM base_table b
WHERE b.plan = final_table.plan AND b.date = final_table.date;
(In Redshift's UPDATE ... FROM form, the target table is referenced directly in the WHERE clause; it should not be joined to itself again in the FROM clause, otherwise the update is not correlated with the rows being updated.)
I'm trying to figure out how to insert data from Table1 into Table2, then use the newly-created ID from Table2 to update the corresponding row in Table1.
I'm using Postgres 12.4 for what it's worth
Example:
I've got two tables, e.g. users and metadata
The users table has the following columns:
| id | info | metadata_id |
The metadata table has the following columns
| id | data |
I want to migrate all of my info values from the users table into the data column of the metadata table, and update my users.metadata_id (currently blank) with the corresponding metadata.id values, essentially backfilling foreign keys.
Is there any way to accomplish this gracefully? I've got a working query which locks both tables and creates a temporary sequence to insert into the metadata.id and users.metadata_id but this seems brittle and I would need to start the sequence after the highest-existing ID in the metadata table, which isn't ideal.
I've also tried to use a data-modifying CTE with a RETURNING clause to update the users table, but couldn't get that to work.
You can't use returning here, since you need to keep track of the association of users and metadata while inserting.
I think it is simpler to first pre-generate the metadata serial of each user in a CTE, using nextval(). You can then use that information to insert into metadata and update the users table:
with
candidates as (
    select u.*, nextval(pg_get_serial_sequence('metadata', 'id')) as new_metadata_id
    from users u
),
inserted as (
    insert into metadata (id, data) overriding system value
    select new_metadata_id, info from candidates
)
update users u
set metadata_id = c.new_metadata_id
from candidates c
where c.id = u.id;
We need the overriding system value clause in the insert statement so Postgres lets us write explicit values into the auto-generated id column (it is required when id is declared generated always as identity; for a plain serial column explicit values are accepted anyway).
Demo on DB Fiddle
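For completeness, here is a minimal schema this assumes (column types and the identity declaration are illustrative; adjust to your actual tables):
-- assumed, illustrative table definitions for the query above
create table metadata (
    id   bigint generated always as identity primary key,
    data text
);
create table users (
    id          bigint primary key,
    info        text,
    metadata_id bigint references metadata (id)
);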
I have 2 tables, I_786 (100k records) and B_786 (700k records). All records that are in the I_786 table must be deleted from the B_786 table. Here the I table was derived from the B table by taking its first 100k records using rownum. After that I need to delete those same records from the B table.
I created the query as below,
DELETE FROM B_786
WHERE EXISTS (SELECT *
FROM I_786);
But it is deleting all the data from the B_786 table; the WHERE condition is not working.
How should the WHERE clause be written here?
You need to pick a column on which to match the records you want to delete from B_786. For example, if B_786 has two columns (id, name) and 700k records, and I_786 has 100k records with the same columns (id, name), then to delete the rows of B_786 that match a record in I_786:
Delete from B_786 where id in (select id from I_786);
Executing the above command deletes every row of B_786 whose id matches an id in I_786.
DELETE FROM B_786
WHERE EXISTS (SELECT 1 FROM I_786 WHERE I_786.id = B_786.id);
In your case, you need to use a reference key: a field in one table that refers to the PRIMARY KEY of another table.
Assuming pid is the key column in I_786 and fid is the corresponding column in B_786, your query will look like this:
DELETE FROM B_786 WHERE B_786.fid IN (SELECT I_786.pid FROM I_786);
I have 2 tables in my SQL database:
And I want to merge them so that the result will be:
This is just an example of two tables that need to be merged into one new table (the tables contain example data; the statement should work for any amount of data in the tables).
An ID that has a different value in the CSV table should take that value in the new table. For example, if ID 3's value is 'KKK' in CSV and 'CCC' in table T, then the value from the CSV table is the one that should be used.
You seem to want a left join, taking the value from the second table when it is available:
select t.id, coalesce(csv.value, t.value) as value
from t
left join csv on t.id = csv.id;
If you want this in a new table, use the appropriate construct for your database, or use insert to insert into an existing table.
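For instance (exact syntax varies by database; the table name new_t is illustrative), either of these materializes the result:
-- create a new table from the merged result
create table new_t as
select t.id, coalesce(csv.value, t.value) as value
from t
left join csv on t.id = csv.id;
-- or insert into an existing table
insert into new_t (id, value)
select t.id, coalesce(csv.value, t.value)
from t
left join csv on t.id = csv.id;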
How would I delete records from a table where they match a delete table? That is, I have a table of record keys that says what needs to be deleted from my main table. How would I write a delete that says "delete anything from my main table where this field matches a field in my delete table"?
If it's a small delete table:
delete from TableA A
where a.key in ( select key from deleteTable );
If it's a bigger table, you can try an EXISTS:
delete from TableA A
where exists ( select * from deleteTable d
where d.key = A.key );
All this depends, of course, on your indexes, the sizes of the tables, etc.
(And if it gets really big, you might want to consider another option, e.g. partitioning or rebuilding the table; a rough rebuild sketch follows.)
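A rough sketch of that rebuild approach (TableA_keep is a hypothetical name; the drop/rename step is database-specific):
-- keep only the rows whose key is NOT in the delete table, then swap the tables
create table TableA_keep as
select a.*
from TableA a
left join deleteTable d on d.key = a.key
where d.key is null;
-- afterwards, drop TableA and rename TableA_keep to TableA (syntax varies by database)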