How to transfer data between databases with SELECT statement in Teradata - sql

So I am stuck on this Teradata problem and I am looking to the community for advice as I am new to the TD platform. I am currently working with a Teradata Data Warehouse and have an interesting task to solve. Currently we store our information in a live production database but want to stage tables in another database before using FastExport to export the files. Basically we want to move our tables into a database to take a quick snapshot.
I have been exploring different solutions and am unsure how to proceed. I need to be able to automate a create table process from one DB in Teradata to another. The tricky part is I would like to create many tables off of the source table using a WHERE clause. For example, I have a transaction table and want to take a snapshot of the transaction table for a certain date range month by month. Meaning that the original table Transaction would be split into many tables such as Transaction_May2001, Transaction_June2001, Transaction_July2001 and so on and so forth.
Thanks

This is assuming that by "two databases" you mean two databases within the same physical Teradata system.
You can use the CREATE TABLE AS construct to accomplish this:
CREATE TABLE {MyDB}.Transaction_May2001
AS (
SELECT *
FROM Transaction
WHERE Transaction_Date BETWEEN DATE '2001-05-01' AND DATE '2001-05-31'
)
{UNIQUE} PRIMARY INDEX ({Same PI definition as Transaction Table})
WITH DATA AND STATS;
If you neglect to specify an explicit PI in the CREATE TABLE AS statement, Teradata will use the first column of the SELECT list as the PI of the new table.
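To produce the month-by-month snapshots, you can simply repeat (or script out) one CREATE TABLE AS per month. A minimal sketch, assuming the same hypothetical {MyDB} target database and a Transaction_Date column:

-- One CTAS per monthly snapshot; add the explicit PRIMARY INDEX clause from
-- the example above if you want to preserve the source table's PI.
CREATE TABLE {MyDB}.Transaction_May2001 AS (
    SELECT *
    FROM Transaction
    WHERE Transaction_Date BETWEEN DATE '2001-05-01' AND DATE '2001-05-31'
) WITH DATA AND STATS;

CREATE TABLE {MyDB}.Transaction_June2001 AS (
    SELECT *
    FROM Transaction
    WHERE Transaction_Date BETWEEN DATE '2001-06-01' AND DATE '2001-06-30'
) WITH DATA AND STATS;

These statements can be generated for each month in the range and run as one batch (for example from BTEQ) before the FastExport step.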
Otherwise, you would be looking to use a Teradata utility as suggested by ryanbwork in the comment to your question.

Related

Postgresql dump with data restriction

I'm working on developing a fast way to make a clone of a database to test an application. My database has some specific tables that are quite big (+50GB), but the large majority of the tables only have a few MB. On my current server, the dump + restore takes some hours. These big tables have date fields.
With that context in mind, my question is: is it possible to apply some kind of restriction on table rows to select the data that is being dumped? E.g., on table X, only dump the rows whose date is Y.
If this is possible, how can I do it? If it's not possible, what would be a good alternative?
You can use COPY (SELECT whatever FROM yourtable WHERE ...) TO '/some/file' to limit what you export.
COPY command
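A minimal sketch of that approach, assuming a hypothetical big_table with a created_at date column (the path and filter are placeholders):

-- Export only the rows you care about; use \copy from psql instead of COPY
-- if you want the file written on the client rather than the server.
COPY (
    SELECT *
    FROM big_table
    WHERE created_at >= DATE '2020-01-01'
) TO '/some/file.csv' WITH (FORMAT csv, HEADER);

The resulting file can then be loaded into the clone with COPY big_table FROM '/some/file.csv' WITH (FORMAT csv, HEADER);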
You could use row level security and create a policy that lets the dumping database user see only those rows that you want to dump (make sure that that user is neither a superuser nor owns the tables, because these users are exempt from row level security).
Then dump the database with that user, using the --enable-row-security option of pg_dump.
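A minimal sketch of the row-level-security route, assuming a hypothetical big_table with a created_at column and a dedicated dump_user role (all names are placeholders):

-- The dumping role must not be a superuser and must not own the table,
-- otherwise row level security is bypassed.
CREATE ROLE dump_user LOGIN PASSWORD 'secret';
GRANT SELECT ON big_table TO dump_user;

ALTER TABLE big_table ENABLE ROW LEVEL SECURITY;

-- Only recent rows are visible to dump_user and therefore end up in the dump.
CREATE POLICY dump_recent ON big_table
    FOR SELECT
    TO dump_user
    USING (created_at >= DATE '2020-01-01');

Running pg_dump with --enable-row-security as dump_user will then export only the rows the policy allows.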

Deleting records in a table with billion records using spark or scala

We have a table in Azure Data Warehouse with 17 billion records. Now we have a scenario where we have to delete records from this table based on some WHERE condition. We are writing Spark in Scala in Azure Databricks notebooks.
We searched for different options to do this in Spark, but all of them suggested first reading the entire table, deleting records from it, and then overwriting the entire table in the Data Warehouse. However, this approach will not work in our case due to the huge number of records in our table.
Can you please suggest how we can achieve this functionality using spark/scala?
1) We checked whether we can call a stored procedure through Spark/Scala code in Azure Databricks, but Spark does not support stored procedures.
2) We tried reading the entire table first to delete the records, but it runs into a never-ending loop.
It is possible to create a view with a SELECT clause that matches your requirement, and then use that view instead of the base table.
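For example, you could keep the heavy lifting on the warehouse side instead of pulling 17 billion rows through Spark. A rough sketch in plain T-SQL, with a hypothetical big_table and load_date column (run against the Data Warehouse itself, not inside the Spark job):

-- Option A: delete directly in the warehouse; this is set-based and avoids
-- reading and rewriting the whole table from Spark.
DELETE FROM dbo.big_table
WHERE load_date < '2018-01-01';

-- Option B: expose only the rows you want to keep through a view and have
-- Spark read the view instead of the base table.
CREATE VIEW dbo.big_table_current AS
SELECT *
FROM dbo.big_table
WHERE load_date >= '2018-01-01';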

Create Partition table in Big Query

Can anyone please suggest how to create a partitioned table in BigQuery?
Example: Suppose I have log data in Google Storage for the year 2016. I stored all the data in one bucket, partitioned by year, month, and date. Here I want to create a table partitioned by date.
Thanks in Advance
Documentation for partitioned tables is here:
https://cloud.google.com/bigquery/docs/creating-partitioned-tables
In this case, you'd create a partitioned table and populate the partitions with the data. You can run a query job that reads from GCS (and filters data for the specific date) and writes to the corresponding partition of a table. For example, to load data for May 1st, 2016 -- you'd specify the destination_table as table$20160501.
Currently, you'll have to run several query jobs to achieve this process. Please note that you'll be charged for each query job based on bytes processed.
Please see this post for some more details:
Migrating from non-partitioned to Partitioned tables
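If you prefer to set the table up with SQL, BigQuery's Standard SQL DDL can also create a partitioned table directly. A minimal sketch, with hypothetical dataset, table, and column names:

-- Creates a table partitioned by the date portion of log_time; load and query
-- jobs can then target individual daily partitions.
CREATE TABLE mydataset.YourLogs (
  log_time TIMESTAMP,
  message  STRING
)
PARTITION BY DATE(log_time);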
There are two options:
Option 1
You can load each daily file into a separate table named YourLogs_YYYYMMDD
See details on how to Load Data from Cloud Storage
After the tables are created, you can access them either using Table wildcard functions (Legacy SQL) or using a Wildcard Table (Standard SQL). See also Querying Multiple Tables Using a Wildcard Table for more examples
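For example, with per-day tables named YourLogs_YYYYMMDD, a Standard SQL wildcard query for May 2016 might look like this (project and dataset names are hypothetical):

-- _TABLE_SUFFIX holds the part of the table name matched by the * wildcard.
SELECT *
FROM `myproject.mydataset.YourLogs_*`
WHERE _TABLE_SUFFIX BETWEEN '20160501' AND '20160531';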
Option 2
You can create a Date-Partitioned Table (just one table - YourLogs) - but you will still need to load each daily file into its respective partition - see Creating and Updating Date-Partitioned Tables
After table is loaded you can easily Query Date-Partitioned Tables
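For an ingestion-time date-partitioned table, partition pruning is done through the _PARTITIONTIME pseudo column; a sketch, again with hypothetical names:

-- Only the May 2016 partitions are scanned (and billed).
SELECT *
FROM `myproject.mydataset.YourLogs`
WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2016-05-01') AND TIMESTAMP('2016-05-31');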
Having partitions for an External Table is not allowed as of now. There is a Feature Request for it:
https://issuetracker.google.com/issues/62993684
(please vote for it if you're interested in it!)
Google says that they are considering it.

Merge data using Integration Service

Please Consider this scenario:
I have a table in my database. I want to move this data into my OLAP database using SSIS. I can move all records from my table to the OLAP database. The problem is that I don't know how I can apply changes in the OLAP environment. For example, if just 100 records of my table were changed, how can I apply those changes and NOT copy all records from scratch?
How can I merge these two tables?
thanks
There are two main approaches to this:
Lookup Transformation --> OLE DB Command / OLE DB Destination
Load all data to a staging table and perform the MERGE using SQL.
My preference is for the latter because the update is set-based, but I do use the former where I know the load will be predominantly inserts.
With the former you will end up with a data flow task along these lines: an OLE DB Source from the OLTP database, which then looks up against your OLAP database to retrieve the surrogate key. Where there is no match it simply inserts a new record via the OLE DB Destination; when there is a match it does a conditional split, and if any fields have changed it uses the OLE DB Command to update the OLAP table.
It can obviously get much more complicated than this, but this covers the simplest example.
You can also use the Slowly Changing Dimension Transformation, which opens a wizard to create the data flow for you, though that again gets a bit more complex.
As mentioned, though, my preference is for a staging table and a set-based update, because the OLE DB Command executes on a row-by-row basis, so if you are updating millions of records this will take a long time. You can simply create a staging table in your OLAP database, move the data in with a simple OLE DB Source and Destination, then use MERGE to update the OLAP table:
MERGE OLAP o
USING Staging s
    ON o.BusinessKey = s.BusinessKey
   AND o.Type2SCD = s.Type2SCD
   AND o.Active = 1
WHEN MATCHED AND o.Type1SCD != s.Type1SCD THEN
    UPDATE SET Type1SCD = s.Type1SCD
WHEN NOT MATCHED BY TARGET THEN
    INSERT (BusinessKey, Type1SCD, Type2SCD, Active, EffectiveDate)
    VALUES (s.BusinessKey, s.Type1SCD, s.Type2SCD, 1, GETDATE())
WHEN NOT MATCHED BY SOURCE AND o.Active = 1 THEN
    UPDATE SET Active = 0;
The above assumes you have one active record per BusinessKey, and both type 1 and type 2 slowly changing dimensions. It will insert a new record where there is no match on BusinessKey and Type2SCD; in addition, it will set any target records with no match in the source table to inactive. When there is a match but the type 1 SCD is different, that record will be updated.
It is worth noting that MERGE has its downsides, and you may want to write your set-based upserts as separate INSERT and UPDATE statements. One major issue I have come across is that on all my dimension tables I have a unique filtered index on my BusinessKey field WHERE Active = 1 to ensure there is only one active record, which the MERGE I have written should work fine with, but doesn't, as detailed in this connect item. Having to add OPTION (QUERYTRACEON 8790); to the end of all the MERGE statements in my ETL was not the end of the world, but it was not ideal.
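If MERGE does become a problem, the same upsert can be expressed as separate set-based statements; a rough sketch following the same logic as the MERGE above:

-- Deactivate target rows that no longer have a matching staging row.
UPDATE o
SET Active = 0
FROM OLAP o
LEFT JOIN Staging s
    ON o.BusinessKey = s.BusinessKey
   AND o.Type2SCD = s.Type2SCD
WHERE o.Active = 1
  AND s.BusinessKey IS NULL;

-- Apply type 1 changes in place.
UPDATE o
SET Type1SCD = s.Type1SCD
FROM OLAP o
INNER JOIN Staging s
    ON o.BusinessKey = s.BusinessKey
   AND o.Type2SCD = s.Type2SCD
WHERE o.Active = 1
  AND o.Type1SCD != s.Type1SCD;

-- Insert staging rows that have no active match in the target.
INSERT INTO OLAP (BusinessKey, Type1SCD, Type2SCD, Active, EffectiveDate)
SELECT s.BusinessKey, s.Type1SCD, s.Type2SCD, 1, GETDATE()
FROM Staging s
LEFT JOIN OLAP o
    ON o.BusinessKey = s.BusinessKey
   AND o.Type2SCD = s.Type2SCD
   AND o.Active = 1
WHERE o.BusinessKey IS NULL;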
Sounds like you're wanting to use incremental loads.
The first five tutorials on this page should point you in the right direction - I found them really useful in the past.

Partitioning a database table in MySQL

I am writing a data warehouse, using MySQL as the back-end. I need to partition a table based on two integer IDs and a name string.
A more concrete example would be to assume that I am storing data about a school. I want to partition the school_data table based on a composite 'key' made up of the following:
school id (integer)
course_id (integer)
student_surname (string)
For the student surname, it is just the first character of the surname that determines which 'partitioned table' the data should go into.
How may I implement this requirement using MySQL (5.1) with InnoDB tables?
Also, I am doing my development on a Windows box, but I will deploy onto a *nix box for production. I have two further questions:
I am assuming that I will have to dump and restore the data when moving from Windows to Linux. I don't know if this is OK if the database contains partitioned tables (a pointer to where it states this in the documentation will put my mind to rest; I have not been able to find any specific mention of dump/restore regarding partitioned tables).
I may also need to change databases (if Oracle pulls a surprise move on MySQL users) in which case I will need to SOMEHOW export the data into another database. In this (hopefully unlikely scenario) - what will be the best way to dump data out of MySQL (maybe to text or something) bearing in mind the partitioned table?
RANGE Partitioning
A table that is partitioned by range is partitioned in such a way that each partition contains rows for which the partitioning expression value lies within a given range.
CREATE TABLE school_data (
    school_id       INT NOT NULL,
    course_id       INT NOT NULL,
    student_surname VARCHAR(64) NOT NULL
) ENGINE=InnoDB
-- Plain RANGE only accepts integer expressions; RANGE COLUMNS (MySQL 5.5+)
-- is needed to partition on a string column such as the surname.
PARTITION BY RANGE COLUMNS (student_surname) (
    PARTITION p0 VALUES LESS THAN ('f'),
    PARTITION p1 VALUES LESS THAN ('p'),
    PARTITION p2 VALUES LESS THAN ('u'),
    PARTITION p3 VALUES LESS THAN (MAXVALUE)
);
Range partitioning
Data Migration to Another DB
mysqldump will output the table and data to a file. However, Oracle supports connecting to other databases via ODBC, just as SQL Server has its linked server capability.
Addendum
It looks like you are partitioning by only one of the 3 fields I mentioned (i.e. name). I saw partitioning by a single field in the MySQL docs, but not 3 fields (int, int, string) like I want to do.
Partitioning by three columns is possible, but my example is per your requirements in the OP:
For the student surname, it is just the first character of the surname that determines which 'partitioned table' the data should go into.
How may I implement this requirement using MySQL (5.1) with InnoDB tables?
Have a look at the Chapter 18. Partitioning of MySQL documentation and especially the Partition Types (I'd look at the HASH partitioning). But keep in mind that the partitioning implementation in MySQL 5.1 is still undergoing development and there are some limitations and restrictions.
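In 5.1 specifically, the KEY variant of hash partitioning is the one type that accepts non-integer columns directly, so a composite over all three columns could look roughly like this (column sizes are guesses, not from the question):

-- KEY partitioning hashes the listed columns internally, so the string column is fine.
-- Note: any PRIMARY or UNIQUE key on the table must include all partitioning columns.
CREATE TABLE school_data (
    school_id       INT NOT NULL,
    course_id       INT NOT NULL,
    student_surname VARCHAR(64) NOT NULL
) ENGINE=InnoDB
PARTITION BY KEY (school_id, course_id, student_surname)
PARTITIONS 16;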
I am assuming that I will have to dump and restore the data when moving from Windows to Linux. I don't know if this is OK if the db contains partitioned tables (a pointer to where it states this in the docs will put my mind to rest; I have not been able to find any specific mention of dump/restore regarding partitioned tables).
I didn't find anything in 18.3 Partition Management but, according to this post, backing up and restoring a partitioned table is nothing special. To backup:
mysqldump --opt db_name table_name > file.dump
And to restore:
mysql db_name < file.dump
I would do some testing though.
I may also need to change databases (if Oracle pulls a surprise move on MySQL users) in which case I will need to SOMEHOW export the data into another database. In this (hopefully unlikely) scenario, what will be the best way to dump data out of MySQL (maybe to text or something) bearing in mind the partitioned table?
Oracle SQL Developer incorporates migration support by including redeveloped features and greatly extending the functionality and usability offered by the original Oracle Migration Workbench to migrate Microsoft Access, Microsoft SQL Server, MySQL and Sybase databases to Oracle.