I have many SQL Server tables in a database that have information about the same domain (same columns) and their names are the same plus a date suffix (yyyyMMdd):
TABLE_ABOUT_THIS_THING_20200131
TABLE_ABOUT_THIS_THING_20191231
TABLE_ABOUT_THIS_THING_20191130
TABLE_ABOUT_THIS_THING_20191031
TABLE_ABOUT_THIS_THING_20190930
TABLE_ABOUT_THIS_THING_20190831
...
This seems like it would make more sense if it were all in one table. Is there a way, using a query/SSIS or something similar, to merge these tables into one (TABLE_ABOUT_THIS_THING) with a new column (extraction_date) derived from each table's suffix?
Using SSIS: use a Union All transformation to collect the data from the multiple tables, and use a Derived Column transformation to add the extraction_date before the destination. For more information, see the following link:
https://www.tutorialgateway.org/union-all-transformation-in-ssis/
You can use UNION ALL:
create view v_about_this_thing as
select convert(date, '20200131') as extraction_date, t.*
from TABLE_ABOUT_THIS_THING_20200131 t
union all
select convert(date, '20191231') as extraction_date, t.*
from TABLE_ABOUT_THIS_THING_20191231 t
union all
. . .
This happens to be a partitioned view, which has some other benefits.
The challenge is how to keep this up-to-date. My recommendation is to fix your data processing so all the data goes into a single table. You can also set up a job that runs once a month and inserts the most recent values into an existing table.
An alternative is to reconstruct the view every month or periodically. You can do this using a DDL trigger that recreates the view when a new table appears.
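For example, a minimal T-SQL sketch of the rebuild step that such a trigger (or a monthly Agent job) could execute - assuming SQL Server 2017+ for STRING_AGG and CREATE OR ALTER; the view and table names are the ones from above:
DECLARE @sql nvarchar(max);
-- Build one UNION ALL branch per suffixed table that currently exists.
SELECT @sql = N'CREATE OR ALTER VIEW v_about_this_thing AS '
    + STRING_AGG(
        CAST(N'SELECT CONVERT(date, ''' + RIGHT(name, 8)
             + N''') AS extraction_date, t.* FROM ' + QUOTENAME(name) + N' t'
          AS nvarchar(max)),
        N' UNION ALL ') WITHIN GROUP (ORDER BY name)
FROM sys.tables
WHERE name LIKE N'TABLE\_ABOUT\_THIS\_THING\_2%' ESCAPE N'\';
EXEC sys.sp_executesql @sql;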
Another alternative is to create a year's worth of tables all at once -- but empty -- and to recreate the view manually once a year. Put a note on your calendar to remind you!
You can use SSIS with the new table TABLE_ABOUT_THIS_THING as the destination and a query that looks like this as the source (use UNION ALL rather than UNION, so duplicate rows are not silently removed):
SELECT * FROM table1
UNION ALL
SELECT * FROM table2
UNION ALL
...
Suppose I have several customers today, so I am storing their information: customer_id, customer_name, customer_emailid, etc. If a customer leaves, he may want his personal information removed from my HDFS.
So I have the two approaches below to achieve this.
Approach 1:
1. Create an internal table on top of HDFS
2. Create an external table from the first table using filter logic
3. While creating the second table, apply UDFs on specific columns for more column filtering
Approach 2:
Spark => read, filter, write
Is there any other solution?
Approach 2 is possible in Hive as well - select, filter, write.
Create a table on top of the directory in HDFS (external or managed does not matter in this context; better external if you are going to drop the table later and keep the data as is). Then insert overwrite the table or partition from a select with a filter:
insert overwrite table mytable
select *
from mytable --the same table
where customer_id not in (...) --filter out the customer's rows
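If the table is partitioned, you can rewrite only the affected partition instead of the whole table. A sketch, assuming (hypothetically) that mytable is partitioned by a load_date column, its non-partition columns are exactly the ones from the question, and 'C123' is the leaving customer's id:
-- load_date and 'C123' are hypothetical placeholders.
INSERT OVERWRITE TABLE mytable PARTITION (load_date = '2020-01-31')
SELECT customer_id, customer_name, customer_emailid
FROM mytable
WHERE load_date = '2020-01-31'
  AND customer_id NOT IN ('C123');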
I have a requirement to fetch data from all the tables based on a common table-name pattern every week. Basically, my requirement is to merge all the data to form a single lookup table.
Example:
Table Names:
Department_20190101
Department_20190109
Department_20190122
Department_20190129
I have to fetch data from all the tables and create a single lookup table. Is there a simple way to do this, other than iterating in PL/SQL over the table names obtained from ALL_TABLES?
Note: the date part is not consistent. If I can achieve this requirement once, then I can easily start inserting the data from each new table into the existing lookup table (n+1).
Please share your suggestions.
Regards,
Scott
If you have a very long list of tables, or your requirement is to aggregate the results from all tables starting with Department_ followed by a certain date or range of dates, then you might need dynamic SQL for this. For the exact example you showed in your question, a CTE with a union query might work:
WITH cte AS (
SELECT * FROM Department_20190101 UNION ALL
SELECT * FROM Department_20190109 UNION ALL
SELECT * FROM Department_20190122 UNION ALL
SELECT * FROM Department_20190129
)
And then use the CTE as:
SELECT *
FROM cte;
This assumes that all tables have identical structures. Also, as a side note, if that is the case, you might want to consider just having a single table with a date column to differentiate between the otherwise common data.
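For the dynamic SQL route mentioned above, a minimal PL/SQL sketch - assuming a target table DEPARTMENT_LOOKUP (a placeholder name) already exists with the same structure as the source tables:
BEGIN
  -- Loop over every table whose name matches the pattern and append its rows.
  FOR t IN (SELECT table_name
              FROM all_tables
             WHERE table_name LIKE 'DEPARTMENT\_2%' ESCAPE '\'
             ORDER BY table_name)
  LOOP
    EXECUTE IMMEDIATE
      'INSERT INTO department_lookup SELECT * FROM ' || t.table_name;
  END LOOP;
  COMMIT;
END;
/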
Check here: Execute For Each Table in PLSQL. There is a nice example there that solves your problem using PL/SQL (PL/SQL lets you grow the SQL query dynamically).
I'm looking for a solution to extract data from multiple tables and insert it into another table automatically by running a single script. I need to query many tables, so I want to make a loop that selects from each of those tables by name dynamically.
I wonder if I could have a table with table names, and execute a loop like:
foreach(i in table_names)
insert into aggregated_table select * from table_names[i]
end
Below is for BigQuery Standard SQL
#standardSQL
SELECT * FROM `project.dataset1.*`
WHERE _TABLE_SUFFIX IN (SELECT table_name FROM `project.dataset2.list`)
This approach will work if the conditions below are met:
- all the tables to be processed from the list have exactly the same schema
- one of those tables is the most recent table - that table will define the schema used for all the rest of the tables in the list
- to meet the above bullet, the list should ideally be hosted in another dataset
Obviously, you can add INSERT INTO ... to insert the result into whatever destination you need.
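For example - a sketch assuming a hypothetical destination table project.dataset3.aggregated_table that already exists with the matching schema:
#standardSQL
INSERT INTO `project.dataset3.aggregated_table`
SELECT * FROM `project.dataset1.*`
WHERE _TABLE_SUFFIX IN (SELECT table_name FROM `project.dataset2.list`)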
Please note: filters on _TABLE_SUFFIX that include subqueries cannot be used to limit the number of tables scanned for a wildcard table, so make sure you are using the longest possible prefix - for example:
#standardSQL
SELECT * FROM `project.dataset1.source_table_*`
WHERE _TABLE_SUFFIX IN (SELECT table_name FROM `project.dataset2.list`)
So, again - even though you will select data only from the specific tables (set in project.dataset2.list), the cost will be for scanning all tables that match the project.dataset1.source_table_* wildcard.
While the above is purely BigQuery SQL, you can use any client of your choice to script exactly the logic you need - read the table names from the list table, then select and insert in a loop. This option is the simplest and most optimal, I think.
I have three linked tables in my MS Access Database, and I am trying to run just one query to append them all into a master table, rather than creating a separate query for each one.
Can someone give me an example, using generic table names, of how I would union or join these tables with query code?
SELECT [TableA].*
FROM TableA
UNION ALL
SELECT [TableB].*
FROM TableB;
etc.
(Use UNION ALL rather than UNION if you want to keep duplicate rows; UNION removes them.)
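To actually append the combined rows into the master table in one append query, a sketch - assuming the master table already exists with the same structure (MasterTable and TableC are placeholder names):
INSERT INTO MasterTable
SELECT * FROM TableA
UNION ALL
SELECT * FROM TableB
UNION ALL
SELECT * FROM TableC;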
You can do a SELECT * INTO NEWTABLE . . .
You can do all kinds of things. Google is your best friend.
We have a large database with monthly partitioned tables. I need to aggregate a selection of these tables every month but I don't want to update the union all every month to add the new monthly table.
CREATE VIEW dynamic_view AS
SELECT timestamp,
traffic
FROM traffic_table_m_2017_01
UNION ALL
SELECT timestamp,
traffic
FROM traffic_table_m_2017_02
Is this where I would use a stored procedure? I am not really familiar with them.
I think it would also work as:
SELECT timestamp,
traffic
FROM REPLACE(REPLACE('traffic_table_m_yyyy_mm',
     'yyyy', FORMAT(GETDATE(), 'yyyy', 'en-us')),
     'mm', FORMAT(GETDATE(), 'MM', 'en-us'));
This might work for the current month but I would need to save the data from the past months which would also be an issue.
You should append each table as it arrives to one larger table, then run your queries against that. There are many ways to do this, but probably the fastest and most elegant is to use:
ALTER TABLE APPEND
Instructions here: https://docs.aws.amazon.com/redshift/latest/dg/r_ALTER_TABLE_APPEND.html
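A minimal sketch - assuming a master table traffic_master (a placeholder name) with the same columns as the monthly tables:
-- Moves (does not copy) all rows from the new monthly table into the master.
ALTER TABLE traffic_master APPEND FROM traffic_table_m_2017_02;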