SQL: Insert certain records from one table into another and also add a few other fields using a query

I have two tables, say TABLE1 and TABLE2, and say the field id is common to both. The rest of the fields are different.
I now select all distinct id values from TABLE1 and want to insert them into TABLE2, while also writing its other attributes, like in the pseudocode below.
for each distinct id (i) in TABLE1:
INSERT in TABLE2 (i, false, unix_timestamp())
end
For some reason I cannot use a programming language to do this. Is it possible to do this in SQL using Apache Drill?

What you could do is write a query that produces the output you're looking for and then save that as a table. Drill is really a query engine and doesn't support INSERT operations the way a database does.
So a pseudo query might look like this:
CREATE TABLE <your file> AS
SELECT ...
Then you could query that file. I don't know if that helps or not. You can also create views and temporary tables, but Drill itself doesn't really implement INSERT commands.
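For example, the asker's pseudocode might translate into a Drill CTAS like the sketch below. The dfs.tmp workspace and the flag/created_at column names are assumptions; use any writable workspace and whatever attribute names TABLE2 actually needs, and point the FROM clause at wherever TABLE1 is queryable.
-- Write one row per distinct id, plus the constant attributes.
CREATE TABLE dfs.tmp.`TABLE2` AS
SELECT DISTINCT t1.id,
       false AS flag,                  -- placeholder for the boolean attribute
       unix_timestamp() AS created_at  -- placeholder for the timestamp attribute
FROM dfs.tmp.`TABLE1` t1;
You can then query dfs.tmp.`TABLE2` like any other table.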

Related

Reject / Bad Records Table in BigQuery

I am looking for a reject-link type of solution in a dedup scenario. For example, in the following code:
MERGE temp.many_random t
USING (
  SELECT DISTINCT * FROM temp.many_random WHERE d = CURRENT_DATE()
)
ON FALSE
WHEN NOT MATCHED BY SOURCE AND d = CURRENT_DATE() THEN DELETE
WHEN NOT MATCHED BY TARGET THEN INSERT ROW
Can I replace THEN DELETE with something like an INSERT INTO a separate table (different from the tables being compared), so that we can capture these rejects and troubleshoot them for pipeline analysis?
If I understand the problem, you want the result of a MERGE query to be written to 2 different tables.
Since MERGE can't do that, I suggest writing 2 queries:
One that does whatever the primary query is doing.
A second, almost identical one that writes the rejected records to a different table, as sketched below.
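A sketch of that second query (temp.many_random_rejects is a hypothetical table with the same schema as temp.many_random; TO_JSON_STRING is used here to treat the whole row as the dedup key):
-- Run this before the dedup MERGE, while the duplicates still exist.
-- It captures every duplicate row beyond its first occurrence, i.e. exactly
-- the rows the MERGE's DELETE branch would otherwise silently discard.
INSERT INTO `temp.many_random_rejects`
SELECT * EXCEPT(rn)
FROM (
  SELECT t.*,
         ROW_NUMBER() OVER (PARTITION BY TO_JSON_STRING(t)) AS rn
  FROM `temp.many_random` t
  WHERE d = CURRENT_DATE()
)
WHERE rn > 1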

BigQuery loop to select values from dynamic table_names registered in another table

I'm looking for a way to extract data from multiple tables and insert it into another table automatically by running a single script. I need to query many tables, so I want a loop that selects from those table names dynamically.
I wonder if I could have a table with table names, and execute a loop like:
foreach(i in table_names)
insert into aggregated_table select * from table_names[i]
end
Below is for BigQuery Standard SQL
#standardSQL
SELECT * FROM `project.dataset1.*`
WHERE _TABLE_SUFFIX IN (SELECT table_name FROM `project.dataset2.list`)
This approach will work if the conditions below are met:
- all the tables to be processed from the list have exactly the same schema
- one of the listed tables is the most recently created one - that table defines the schema used for all the rest of the tables in the list
- to help meet the above bullet, the list table should ideally be hosted in another dataset (so it never matches the wildcard itself)
Obviously, you can add INSERT INTO ... to insert the result into whatever destination you need.
Please note: filters on _TABLE_SUFFIX that include subqueries cannot be used to limit the number of tables scanned for a wildcard table, so make sure you are using the longest possible prefix - for example:
#standardSQL
SELECT * FROM `project.dataset1.source_table_*`
WHERE _TABLE_SUFFIX IN (SELECT table_name FROM `project.dataset2.list`)
So, again - even though you will select data only from the specific tables set in project.dataset2.list, the cost will be for scanning all tables that match the project.dataset1.source_table_* wildcard.
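Putting it together, a sketch of the full statement (aggregated_table is a placeholder destination whose schema must match the SELECT):
#standardSQL
INSERT INTO `project.dataset1.aggregated_table`
SELECT *
FROM `project.dataset1.source_table_*`
WHERE _TABLE_SUFFIX IN (SELECT table_name FROM `project.dataset2.list`)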
While the above is purely BigQuery SQL, you can use any client of your choice to script exactly the logic you need - read the table names from the list table, then select and insert in a loop. I think this option is the simplest and most optimal.
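If you'd rather stay inside BigQuery, scripting can express that loop as well; a sketch, reusing the same placeholder names as above:
-- Loop over the registered table names and run one INSERT per table.
FOR rec IN (SELECT table_name FROM `project.dataset2.list`)
DO
  EXECUTE IMMEDIATE FORMAT("""
    INSERT INTO `project.dataset1.aggregated_table`
    SELECT * FROM `project.dataset1.%s`
  """, rec.table_name);
END FOR;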

Insert with select, dependent on the values in the table inserting into EDITED

So I need to figure out how to insert into a table, from another table, with a WHERE clause that requires me to access the table I am inserting into. I tried aliasing the table I am inserting into, but I quickly found out that you cannot do that. Basically, I want to check that the values I am inserting into the table match a particular field within that same table. Here is what I've tried:
INSERT INTO "USER"."TABLE1" AS A1
SELECT *
FROM "USER"."TABLE2" AS A2
WHERE A2."HIERARCHYLEVEL" = 2
AND A2."PARENT" = A1."INSTANCE"
Obviously, this was to no avail. I've tried a couple of other queries, but they didn't get me anywhere, either. Any help would be much appreciated.
EDIT:
I would like to add rows to this table, not columns. The two tables have the exact same structure -- in fact, I extracted the data already in table1 from table2. What table1 currently holds is a bunch of records that have no PARENT but do have an INSTANCE. What I want to add is all the records in table2 whose PARENT is equal to an INSTANCE in table1.
Currently there is no way to join on a table when inserting. The solution with the subselect, where you select from the table, is the correct one.
Aliasing the table you want to change is only possible with UPDATE, UPSERT and MERGE. For those operations it makes sense, as you need to match a row and then decide whether to update it or insert something instead. In your example, the row from table1 that you match is not relevant, as you don't want to change it; so from the statement's point of view it does not really matter that the table in your subselect is the same as the one you insert into.
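For contrast, a minimal sketch of where a target alias is allowed (the SET value is arbitrary here, and exact alias syntax varies by database):
UPDATE "user"."table1" A1
SET "HIERARCHYLEVEL" = 2
WHERE A1."PARENT" IS NOT NULL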
As an alternative, I can suggest the following solution, which is equivalent to yours:
INSERT INTO "user"."table1"
SELECT
  A1."ROOT",
  A1."INSTANCE",
  A1."PARENT",
  A1."HIERARCHYLEVEL"
FROM "user"."table2" AS A1
WHERE A1."INSTANCE" IN (SELECT "PARENT" FROM "user"."table1")
  AND A1."HIERARCHYLEVEL" = 2
This gave me the answer I was looking for, although I am sure there is an easier -- or more efficient -- way to do it.
INSERT INTO "user"."table1"
SELECT
  A1."ROOT",
  A1."INSTANCE",
  A1."PARENT",
  A1."HIERARCHYLEVEL"
FROM "user"."table2" AS A1,
     "user"."table1" AS A2
WHERE A1."INSTANCE" = A2."PARENT"
  AND A2."HIERARCHYLEVEL" = 2

Oracle SQL merge tables without specifying columns

I have a table people with fewer than 100,000 records, and I have taken a backup of this table using the following:
create table people_backup as select * from people
I add some new records to my people table over time, but eventually I want to merge the records from my backup table back into people. Unfortunately I cannot simply DROP my table, as my new records would be lost!
So I want to update the records in my people table using the records from people_backup, based on their primary key id, and I have found 2 ways to do this:
MERGE the tables together
use some sort of fancy correlated update
Great! However, both of these methods use SET and make me specify which columns I want to update. Unfortunately I am lazy: the structure of people may change over time, and while my CTAS statement doesn't need to be updated, my update/merge script will need changes, which feels like unnecessary work.
Is there a way to merge entire rows without having to specify columns? I see here that not specifying columns during an INSERT directs SQL to insert values by order. Can the same methodology be applied here, and is it safe?
NB: The structure of the table will not change between backups
Given that your table is small, you could simply:
DELETE FROM people t
WHERE EXISTS ( SELECT 1
               FROM people_backup b
               WHERE t.id = b.id );

INSERT INTO people
SELECT *
FROM people_backup;
That is slow and not particularly elegant (particularly if most of the data in the backup hasn't changed), but assuming the columns in the two tables match, it does allow you to avoid listing the columns. Personally, I'd much prefer writing out the column names (presumably they don't change all that often) so that I could do an update.
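For comparison, a sketch of that explicit-column MERGE (name stands in for the table's non-key columns, which are not given in the question):
MERGE INTO people t
USING people_backup b
ON (t.id = b.id)
WHEN MATCHED THEN
  UPDATE SET t.name = b.name              -- list every non-key column here
WHEN NOT MATCHED THEN
  INSERT (id, name) VALUES (b.id, b.name);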

Query select a bulk of IDs from a table - SQL

I have a table which holds ~1M rows. My application has a list of ~100K IDs which belong to that table (the list being generated by the application layer).
Is there a common method for querying all of these IDs? ~100K SELECT queries? A temporary table into which I insert the ~100K IDs, then a SELECT query that joins it to the required table?
Thanks,
Doori Bar
You could do it in one query, something like
SELECT * FROM large_table WHERE id IN (...)
Insert a comma-separated list of IDs where I put the ...
Unfortunately, there is no easy way that I know of to parametrize this, so you need to be extra-super careful to avoid SQL injection vulnerabilities.
A temporary table which holds the 100k IDs seems like a good solution. Don't insert them one by one, though; the INSERT ... VALUES syntax in MySQL accepts the insertion of multiple rows.
By the way, where do you get your 100k IDs, if not from the database? If they come from a preceding request, I'd suggest having that request fill the temporary table.
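For example, a sketch in MySQL (wanted_ids is a hypothetical name; batch the VALUES list rather than running 100k single-row inserts):
CREATE TEMPORARY TABLE wanted_ids (id INT PRIMARY KEY);

-- Insert the application's IDs in multi-row batches (e.g. 1000 per statement).
INSERT INTO wanted_ids (id) VALUES (1), (2), (3);

-- Then fetch all matching rows with a single join.
SELECT t.*
FROM large_table t
JOIN wanted_ids w ON w.id = t.id;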
Edit: for a more portable way of doing a multi-row insert:
INSERT INTO mytable (col1, col2) SELECT 'foo', 0 UNION SELECT 'bar', 1
Do those IDs actually reference the table with 1M rows?
If so, you could use SELECT ids FROM <1M table>
where ids is the ID column and "1M table" is the name of the table which holds the 1M rows.
But I don't think I really understand your question...