Let's say I have two tables A and B with the same schema and 1 million rows each.
I can combine the rows of tables A and B into a table C using either a UNION ALL or two INSERT INTO statements.
I actually tried both and found that the INSERT INTO approach performed better, but I would like to know why. And is UNION ALL better in any specific scenario?
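For concreteness, the two variants I am comparing look like this (a sketch; table C is assumed to already exist with the same schema):

-- Variant 1: a single statement with UNION ALL
INSERT INTO C SELECT * FROM A UNION ALL SELECT * FROM B;

-- Variant 2: two separate INSERT INTO ... SELECT statements
INSERT INTO C SELECT * FROM A;
INSERT INTO C SELECT * FROM B;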
Dear users of BigQuery,
I have a table with millions of records (table 2), and in this table some data is missing. So I generated another table with all the data (table 1).
I need to either merge the missing data from table 1 into table 2, or merge all of table 1's data into table 2 and remove all duplicate records, and there are several ways to do that.
In your opinion, what is the best way to do this?
Thanks for your help.
The best way to do this would be using UNION DISTINCT.
Your query will look something like:
SELECT name, timestamp, value FROM `Project.Dataset.Table2`
UNION DISTINCT
SELECT name, timestamp, value FROM `Project.Dataset.Table1`
This should work fine and give you the results.
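If you want to materialize the merged result rather than just query it, the same UNION can be wrapped in a DDL statement (a sketch; the destination table name Table2_merged is illustrative):

CREATE OR REPLACE TABLE `Project.Dataset.Table2_merged` AS
SELECT name, timestamp, value FROM `Project.Dataset.Table2`
UNION DISTINCT
SELECT name, timestamp, value FROM `Project.Dataset.Table1`;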
I have TableA that has millions of records and 40 columns.
I would like to move:
- columns 1-30 into Table B
- columns 31-40 into Table C
This multiple Insert question shows how I assume I should do it:
INSERT INTO TableB (col1, col2, ...)
SELECT c1, c2,...
FROM TableA...
I wanted to know if there is a different/quicker way to move the data. Essentially, I don't want to wait for one INSERT to finish processing before the other INSERT statement starts to execute.
I'm afraid there is no way in the SQL standard to have what is often called a T junction at the end of an INSERT .. SELECT. That is the privilege of ETL tools. But ETL tools connect twice to the database, once for each leg of the T junction, and the resulting two INSERT INTO tab_x VALUES (?,?,?,?) statements run in parallel.
Which brings me to a possible solution that could make sense:
Create two scripts. One goes INSERT INTO table_b1 SELECT col1,col2 FROM table_a;. The other goes INSERT INTO table_b2 SELECT col3,col4 FROM table_a;. Then, as it's SQL Server, launch two isql sessions in parallel, each running its own script, as sketched below.
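The two scripts would look something like this (a sketch; the file names are illustrative, the table and column names are from above):

-- script_b1.sql: first leg of the T junction
INSERT INTO table_b1 SELECT col1, col2 FROM table_a;

-- script_b2.sql: second leg, run in a second isql session at the same time
INSERT INTO table_b2 SELECT col3, col4 FROM table_a;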
I would like to insert columns from 2 different tables into one table.
The following is not working on the second INSERT. Is it possible to do it this way? The receiving table is currently empty. The idea is to have one row in the third table per the two inserts.
INSERT INTO CA1665AFTT.EMODESADV3
(e3ecsn, e3mena, e3hand)
SELECT e1ecsn, e1mena, e1hand
FROM CA1665AFTT.EMODESADV1;

INSERT INTO CA1665AFTT.EMODESADV3
(e3cprd, e3cqty)
SELECT prdc, oqty
FROM Monica.emod;
Yes, it's possible using a join between the two tables; a sketch follows.
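Assuming the two tables share a matching key (here I use a hypothetical column mecsn in Monica.emod that matches e1ecsn; substitute whatever actually relates the rows), a single INSERT can populate all five columns in one row:

INSERT INTO CA1665AFTT.EMODESADV3
(e3ecsn, e3mena, e3hand, e3cprd, e3cqty)
SELECT a.e1ecsn, a.e1mena, a.e1hand, m.prdc, m.oqty
FROM CA1665AFTT.EMODESADV1 a
JOIN Monica.emod m ON m.mecsn = a.e1ecsn;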
I have a table 'A' with 40 columns. I need to copy the data from 20 specific columns of 'A' to another table 'B' that has those 20 columns. There will be around 3-10 million records.
What will be the most efficient way to do this in PL/SQL?
"daily table B will be truncated and new data will be inserted into it
from A."
Okay, so the most efficient way to do this is not to do it. Use a materialized view instead; a materialized view log on table A will allow you to capture incremental changes and apply them daily, or at any other window you like. Find out more.
Compared to that approach, using hand-rolled PL/SQL - or even pure SQL - is laughably inefficient.
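A minimal sketch of the setup, assuming table A has a primary key and using illustrative column names:

-- Record incremental changes on A (the log defaults to tracking primary keys)
CREATE MATERIALIZED VIEW LOG ON a;

-- B becomes a fast-refreshable copy of the 20 wanted columns
CREATE MATERIALIZED VIEW b
REFRESH FAST ON DEMAND
AS SELECT id, c1, c2, c3 FROM a;  -- list all 20 columns here

-- Apply the captured changes once a day (or at any window you like)
EXEC DBMS_MVIEW.REFRESH('B');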
Do you need to do any sort of conversion on the data or is it just copying data straight from one table to another?
The easiest way to do this is a straight CREATE TABLE ... AS SELECT, although you would have to create the indexes separately:
create table B as (select A.c1, A.c2, A.c3, ... from A);
If table B already exists, you could just do a
insert into B select A.c1, A.c2, ... from A
To speed this up, you would want to drop all the indexes on table B until the insert is done, as sketched below.
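Something like this (a sketch; the index name and columns are illustrative):

-- Drop the index so the bulk insert doesn't maintain it row by row
drop index b_c1_idx;

insert into B select A.c1, A.c2, ... from A;

-- Recreate the index once the data is in place
create index b_c1_idx on B (c1);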
I have a table which holds ~1M rows. My application has a list of ~100K IDs which belong to that table (the list is generated by the application layer).
Is there a common method for querying all of these IDs? ~100K individual SELECT queries? A temporary table into which I insert the ~100K IDs, then a SELECT query that joins against the required table?
Thanks,
Doori Bar
You could do it in one query, something like
SELECT * FROM large_table WHERE id IN (...)
Insert a comma-separated list of IDs where I put the ...
Unfortunately, there is no easy way that I know of to parametrize this, so you need to be extra-super careful to avoid SQL injection vulnerabilities.
A temporary table which holds the 100k IDs seems like a good solution. Don't insert them one by one, though; the INSERT ... VALUES syntax in MySQL accepts the insertion of multiple rows.
By the way, where do you get your 100k IDs, if not from the database? If they come from a preceding request, I'd suggest having it fill the temporary table directly.
Edit: for a more portable way to insert multiple rows:
INSERT INTO mytable (col1, col2) SELECT 'foo', 0 UNION SELECT 'bar', 1
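Put together, the MySQL-specific multi-row form mentioned above plus the join against the big table would look something like this (a sketch; table and column names are illustrative):

-- Temporary table to hold the application's IDs
CREATE TEMPORARY TABLE temp_ids (id INT PRIMARY KEY);

-- Multi-row VALUES insert (MySQL); batch the 100k IDs into a few such statements
INSERT INTO temp_ids (id) VALUES (101), (102), (103);

-- Then fetch all matching rows in a single join
SELECT t.*
FROM large_table t
JOIN temp_ids i ON i.id = t.id;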
Do those IDs actually reference the table with 1M rows?
If so, you could use SELECT ids FROM <1M table>,
where ids is the ID column and "1M table" is the name of the table which holds the 1M rows.
But I don't think I really understand your question...