BigQuery loop to select values from dynamic table_names registered in another table - sql

I'm looking for a solution to extract data from multiple tables and insert it into another table automatically by running a single script. I need to query many tables, so I want to make a loop that selects from those table names dynamically.
I wonder if I could have a table with table names, and execute a loop like:
foreach(i in table_names)
insert into aggregated_table select * from table_names[i]
end

Below is for BigQuery Standard SQL
#standardSQL
SELECT * FROM `project.dataset1.*`
WHERE _TABLE_SUFFIX IN (SELECT table_name FROM `project.dataset2.list`)
This approach will work if the conditions below are met:
- all tables to be processed from the list have exactly the same schema
- the most recently created of those tables defines the schema that is used for all the rest of the tables in the list
- to meet the above bullet, the list table should ideally be hosted in another dataset, so it can never be the most recent table matching the wildcard
Obviously, you can add INSERT INTO ... to insert the result into whatever destination you need.
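For example, a minimal sketch (the destination table project.dataset3.aggregated_table is an assumption; it must already exist with a matching schema):
#standardSQL
INSERT INTO `project.dataset3.aggregated_table`
SELECT * FROM `project.dataset1.*`
WHERE _TABLE_SUFFIX IN (SELECT table_name FROM `project.dataset2.list`)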
Please note: filters on _TABLE_SUFFIX that include subqueries cannot be used to limit the number of tables scanned for a wildcard table, so make sure you are using the longest possible prefix - for example:
#standardSQL
SELECT * FROM `project.dataset1.source_table_*`
WHERE _TABLE_SUFFIX IN (SELECT table_name FROM `project.dataset2.list`)
So, again - even though you will select data only from specific tables (those set in project.dataset2.list), the cost will be for scanning all tables that match the project.dataset1.source_table_* wildcard.
While the above is purely in BigQuery SQL, you can use any client of your choice to script exactly the logic you need - read the table names from the list table, then select and insert in a loop. This option is the simplest and most optimal, I think.
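If you'd rather stay entirely inside BigQuery, below is a minimal sketch of that loop using BigQuery scripting (FOR ... IN with EXECUTE IMMEDIATE); the destination table name is an assumption and must already exist with a matching schema:
-- loop over the names registered in the list table and append
-- each table's rows to the (assumed) destination table
FOR tbl IN (SELECT table_name FROM `project.dataset2.list`)
DO
  EXECUTE IMMEDIATE FORMAT("""
    INSERT INTO `project.dataset1.aggregated_table`
    SELECT * FROM `project.dataset1.%s`
  """, tbl.table_name);
END FOR;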

Related

SQL: Insert certain records from one table into another and also add few other fields using query

I have two tables, say TABLE1 and TABLE2, and the field id is common to both. The rest of the fields are different.
I now select all distinct ids from TABLE1 and want to insert them into TABLE2 while also writing their other attributes, like the pseudocode below.
for each distinct id (i) in TABLE1:
INSERT in TABLE2 (i, false, unix_timestamp())
end
Now, for some reason, I cannot use a programming language to do this. Is it possible to do this in SQL using Apache Drill?
What you could do is write a query that produces the output you're looking for and then save that as a table. Drill is really a query engine and doesn't support INSERT operations the way a database does.
So a pseudo query might look like this:
CREATE TABLE <your file> AS
SELECT ...
Then you could query that file. I don't know if that helps or not. You can also create views and temporary tables, but Drill itself doesn't really implement INSERT commands.
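For the pseudocode in the question, a hedged sketch using Drill's CTAS (the dfs.tmp workspace, file paths, and output column names are assumptions):
-- writes what would have been the INSERT result out as a new table
CREATE TABLE dfs.tmp.table2_new AS
SELECT DISTINCT t1.id,
       FALSE            AS some_flag,   -- the question's literal false
       UNIX_TIMESTAMP() AS created_at   -- the question's unix_timestamp()
FROM dfs.`/path/to/table1` t1;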

Merge multiple SQL Server tables into one

I have many SQL Server tables in a database that hold information about the same domain (same columns), and their names are identical apart from a date suffix (yyyyMMdd):
TABLE_ABOUT_THIS_THING_20200131
TABLE_ABOUT_THIS_THING_20191231
TABLE_ABOUT_THIS_THING_20191130
TABLE_ABOUT_THIS_THING_20191031
TABLE_ABOUT_THIS_THING_20190930
TABLE_ABOUT_THIS_THING_20190831
...
This seems like it would make more sense if it were all in the same table. Is there a way, using a query/SSIS or something similar, to merge these tables into one (TABLE_ABOUT_THIS_THING) with a new column (extraction_date) derived from each table's suffix?
Using SSIS: use a Union All transformation to collect the data from the multiple tables, and use a Derived Column transformation to add the extraction_date before the destination. For more information, see the following link:
https://www.tutorialgateway.org/union-all-transformation-in-ssis/
You can use UNION ALL:
create view v_about_this_thing as
select convert(date, '20200131') as extraction_date, t.*
from TABLE_ABOUT_THIS_THING_20200131 t
union all
select convert(date, '20191231') as extraction_date, t.*
from TABLE_ABOUT_THIS_THING_20191231 t
union all
. . .
This happens to be a partitioned view, which has some other benefits.
The challenge is how to keep this up-to-date. My recommendation is to fix your data processing so all the data goes into a single table. You can also set up a job that runs once a month and inserts the most recent values into an existing table.
An alternative is to reconstruct the view every month or periodically. You can do this using a DDL trigger that recreates the view when the new table appears.
Another alternative is to create a year's worth of tables all at once -- but empty -- and to recreate the view manually once a year. Put a note on your calendar to remind you!
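For the "reconstruct the view periodically" option, here is a hedged T-SQL sketch that rebuilds the view from whatever matching tables currently exist (assumes SQL Server 2016 SP1+ for CREATE OR ALTER VIEW; the LIKE pattern and view name are assumptions):
DECLARE @sql NVARCHAR(MAX) = N'';

-- build one UNION ALL branch per matching table, deriving
-- extraction_date from the yyyyMMdd suffix of the table name
SELECT @sql = @sql
    + CASE WHEN @sql = N'' THEN N'' ELSE N' UNION ALL ' END
    + N'SELECT CONVERT(date, ''' + RIGHT(name, 8) + N''') AS extraction_date, t.*'
    + N' FROM ' + QUOTENAME(name) + N' t'
FROM sys.tables
WHERE name LIKE 'TABLE[_]ABOUT[_]THIS[_]THING[_][0-9]%'
ORDER BY name;

SET @sql = N'CREATE OR ALTER VIEW v_about_this_thing AS ' + @sql;
EXEC sys.sp_executesql @sql;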
You can use SSIS with the new table "TABLE_ABOUT_THIS_THING" as the destination and a query that looks like this as the source (UNION ALL, so duplicate rows are not silently removed):
Select * FROM table1
UNION ALL
Select * FROM table2
UNION ALL
. . .

Is there a way to query multiple tables at once based on common table name in Oracle through SQL or PL/SQL

I have a requirement to fetch data every week from all the tables matching a common table name pattern. Basically, my requirement is to merge all the data to form a single lookup table.
Example:
Table Names:
Department_20190101
Department_20190109
Department_20190122
Department_20190129
I have to fetch data from all the tables and create a single lookup table. Is there a simple way to do this other than iterating in PL/SQL over the table names obtained with the help of ALL_TABLES?
Note: the date part is not consistent. If I can achieve this requirement once, then I can easily start inserting the data from each new table into the existing lookup (n+1).
Please share your suggestions.
Regards,
Scott
If you have a very long list of tables, or your requirement is to aggregate the results from all tables (e.g. starting with Department_ followed by a certain date or range of dates), then you might need dynamic SQL for this. For the exact example you showed in your question, a CTE with a union query might work:
WITH cte AS (
SELECT * FROM Department_20190101 UNION ALL
SELECT * FROM Department_20190109 UNION ALL
SELECT * FROM Department_20190122 UNION ALL
SELECT * FROM Department_20190129
)
And then use the CTE as:
SELECT *
FROM cte;
This assumes that all tables have identical structures. Also, as a side note, if that is the case, you might want to consider just having a single table with a date column to differentiate between the otherwise common data.
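If you do need the dynamic-SQL route, below is a hedged PL/SQL sketch that builds the union from ALL_TABLES (the lookup table name department_lookup is an assumption; assumes Oracle 11g+, where EXECUTE IMMEDIATE accepts a CLOB):
DECLARE
  l_sql CLOB;
BEGIN
  -- build one UNION ALL branch per matching table
  FOR t IN (SELECT table_name
              FROM all_tables
             WHERE table_name LIKE 'DEPARTMENT\_%' ESCAPE '\'
             ORDER BY table_name)
  LOOP
    IF l_sql IS NOT NULL THEN
      l_sql := l_sql || ' UNION ALL ';
    END IF;
    l_sql := l_sql || 'SELECT * FROM ' || t.table_name;
  END LOOP;

  EXECUTE IMMEDIATE 'CREATE TABLE department_lookup AS ' || l_sql;
END;
/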
Check here: Execute For Each Table in PLSQL. There is a nice example there that solves your problem using PL/SQL (PL/SQL helps you dynamically build up the SQL query).

Copying contents of table A to table B (one more column than table A)

In our application, we have two sets of tables: one set of working tables (with the data that is currently being analyzed) and another set of archive tables (with all data that has ever been analyzed; same table name but with an a_ prefix). The structure of the tables is the same, except that the archive tables have an extra column run_id to distinguish between different sets of data.
Currently, we have a SQL script that copies the contents over with statements similar to this:
insert into a_deals (run_id, deal_id, <more columns>)
select maxrun, deal_id, <more columns>
from deals,
(select max(run_id) maxrun from batch_runs);
This works fine, but whenever we add a new column to the table, we also have to modify the script. Is there a better way to do this that stays stable when we add new columns? (Of course the structures have to match, but we'd like not to have to change the script as well.)
FWIW, we're using Oracle as our RDBMS.
Following up on the first answer, you could build a PL/SQL procedure which reads all_tab_columns to build the insert statement and then runs it with execute immediate. Not too hard, but be careful about what input parameters you allow (table_name and the like) and who can run it, since it could provide a great opportunity for SQL injection.
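A hedged sketch of that idea for the tables in this question (assumes Oracle 11gR2+ for LISTAGG and that the shared column list fits in 4000 bytes):
DECLARE
  l_cols VARCHAR2(4000);
BEGIN
  -- read the source table's columns in order, so the statement keeps
  -- working when a column is added to both tables
  SELECT LISTAGG(column_name, ', ') WITHIN GROUP (ORDER BY column_id)
    INTO l_cols
    FROM all_tab_columns
   WHERE owner = USER
     AND table_name = 'DEALS';

  EXECUTE IMMEDIATE
    'INSERT INTO a_deals (run_id, ' || l_cols || ') ' ||
    'SELECT (SELECT MAX(run_id) FROM batch_runs), ' || l_cols ||
    ' FROM deals';
END;
/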
If the two tables have the SAME columns in the same order (column_id from all_tab_columns) except for this run_id in front, then you can do something like:
insert into a_deals
select (select max(run_id) from batch_runs), d.*
from deals d
where ...;
This is a lazy approach, IMO, and you'll want to ensure that the columns are in the same position in both tables as part of this script (inspect all_tab_columns). Two varchar2 fields that are switched will lead to data being inserted into the wrong fields.

Query select a bulk of IDs from a table - SQL

I have a table which holds ~1M rows. My application has a list of ~100K IDs which belong to that table (the list being generated by the application layer).
Is there a common method of how to query all of these IDs? ~100K SELECT queries? A temporary table into which I insert the ~100K IDs, and then a SELECT query that joins it with the required table?
Thanks,
Doori Bar
You could do it in one query, something like
SELECT * FROM large_table WHERE id IN (...)
Insert a comma-separated list of IDs where I put the ...
Unfortunately, there is no easy way that I know of to parametrize this, so you need to be extra-super careful to avoid SQL injection vulnerabilities.
A temporary table which holds the 100k IDs seems like a good solution. Don't insert them one by one, though; the INSERT ... VALUES syntax in MySQL accepts the insertion of multiple rows.
By the way, where do you get your 100k IDs, if not from the database? If they come from a preceding request, I'd suggest having that request fill the temporary table.
Edit: for a more portable way of doing a multi-row insert:
INSERT INTO mytable (col1, col2) SELECT 'foo', 0 UNION SELECT 'bar', 1
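Putting the two suggestions together, a hedged sketch of the temporary-table approach (MySQL syntax; table and column names are illustrative):
CREATE TEMPORARY TABLE wanted_ids (id INT PRIMARY KEY);

-- batch many IDs per statement instead of inserting one by one
INSERT INTO wanted_ids (id) VALUES (1), (2), (3) /* , ... */;

SELECT t.*
FROM large_table t
JOIN wanted_ids w ON w.id = t.id;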
Do those IDs actually reference the table with 1M rows?
If so, you could use SELECT ids FROM <1M table>
where ids is the ID column and where "1M table" is the name of the table which holds the 1M rows.
but I don't think I really understand your question...