I have a table which has approximately 140 columns. The data for this table comes from a transactional system and many of the columns are Unicode.
I dump the daily load into my Staging database, whose data types match the source system exactly. From Staging, I do some cleaning and load the data into a Reports database. During that load I convert all the Unicode character data to String before writing it to the Reports database. This process takes an awful lot of time, and I am trying to optimize it (make the load times faster).
I use the Derived Column transformation to convert all the Unicode data to String data. Any suggestions for me?
How about casting your columns as varchar(size) in your source query?
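For example, a minimal sketch of a source query that pushes the conversion down to the database instead of doing it in a Derived Column (the table and column names here are placeholders, not from the original post):

```sql
-- Cast the Unicode (nvarchar) columns to varchar in the source query
-- so the conversion happens in the database engine rather than in SSIS.
SELECT
    CAST(CustomerName AS varchar(100)) AS CustomerName,  -- placeholder column
    CAST(City         AS varchar(50))  AS City,          -- placeholder column
    OrderDate                                            -- non-Unicode columns pass through unchanged
FROM dbo.StagingTable;                                    -- placeholder table
```

With roughly 140 columns, this keeps the per-row conversion work out of the SSIS data flow entirely.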
I've been wanting to learn more about Data Analysis using IBM DB2 and SQL.
I want to upload a Covid-19 open source dataset to practice. However, I'm getting this error: 'Rows truncated because the data is longer than the target database column'. The database isn't loading ANY rows. Do I need to change the data types? Right now they are specified as VARCHAR and DECIMAL.
Does anyone know the best way to handle this error?
Thanks in advance!
I'm dealing with a SQL Server database which contains a column "defined data" with JSON data in it (and some other simple columns). The data builds up over time, right now we have about 8 million rows.
The data from this database is periodically read by an ETL system, which then reads the JSON data in the "defined data" column and maps it to a new SQL Server table based on the column names contained in the JSON data.
This SQL Server table is prone to changes, meaning that about every 4 months additional columns are needed or column names change. Whenever this SQL Server table changes its data structure, a new version is introduced, which also forces the JSON data structure to change.
However, the ETL system should still be able to load all historical (JSON) data from the SQL Server database, regardless of the changing version throughout time. How can I make this work, taking into consideration version changes of the SQL Server tables and the JSON data?
![example][1]
So in this example my question is:
How can I ensure that I can load both client 20 and 21 into one SQL Server table without getting errors because the JSON data structure is not reflecting version 2 in the case of historical data?
Given the size of the SQL Server database, it doesn't seem like an option to update all historical JSON data according to the latest version (in this example that would mean adding "AssetType" for the 01-01-2021 data and filling it in with NULL).
Many, many thanks in advance!
First, I would check whether the JSON fields already exist as column names in the table by looking them up in the information schema. If a column does not exist, alter the table to add it.
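A minimal T-SQL sketch of that check (the schema, table, and column names are placeholders; "AssetType" is just the field mentioned in the question):

```sql
-- Add the column only if it is not already present, based on INFORMATION_SCHEMA.
IF NOT EXISTS (
    SELECT 1
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = 'dbo'
      AND TABLE_NAME   = 'ClientData'   -- placeholder target table
      AND COLUMN_NAME  = 'AssetType'
)
BEGIN
    ALTER TABLE dbo.ClientData ADD AssetType varchar(100) NULL;
END;
```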
> How can I ensure that I can load both client 20 and 21 into one SQL Server table without getting errors because the JSON data structure is not reflecting version 2 in the case of historical data?
You maintain 2 separate tables. A Raw/Staging/Bronze table that has the same schema as the source, and a Cleansed/Warehouse/Silver table that has the desired schema for reporting. If you have multiple separate sources, you may have separate Raw tables.
Periodically you enhance the schema of the Cleansed table to add new data that has appeared in the Raw table.
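As a rough sketch of that pattern (all table and column names below are illustrative; only "AssetType" comes from the question), the Raw table keeps the JSON verbatim, and the Cleansed load uses OPENJSON so that keys missing from older JSON versions simply come through as NULL:

```sql
-- Raw/Staging/Bronze: same shape as the source, JSON kept verbatim.
CREATE TABLE raw.ClientData (
    ClientId    int           NOT NULL,
    LoadDate    date          NOT NULL,
    DefinedData nvarchar(max) NOT NULL   -- the JSON payload
);

-- Cleansed/Warehouse/Silver: the reporting schema, extended over time.
CREATE TABLE clean.ClientData (
    ClientId  int,
    LoadDate  date,
    Asset     varchar(100),
    AssetType varchar(100)   -- added for version 2; stays NULL for version 1 rows
);

-- Load step: OPENJSON returns NULL for keys an older JSON version lacks.
INSERT INTO clean.ClientData (ClientId, LoadDate, Asset, AssetType)
SELECT r.ClientId, r.LoadDate, j.Asset, j.AssetType
FROM raw.ClientData AS r
CROSS APPLY OPENJSON(r.DefinedData)
     WITH (Asset     varchar(100) '$.Asset',
           AssetType varchar(100) '$.AssetType') AS j;
```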
I'm trying to explore different methods for inserting very large CSV files (10 million rows) into the SAP HANA database (version 1.12).
We've tried many things so far (some more successful than others), and right now I'm stuck at the following:
Managing the data in JavaScript has its problems (as in, the JS engine freezes), and right now I've managed to upload and insert it with the BLOB data type. I know of the 'IMPORT FROM CSV FILE' SQL function in SAP HANA and I was wondering whether it is possible to use it with the BLOB saved in our database.
Thanks in advance for your time & help
Yes, importing BLOB data types is possible with CSV import.
The relevant column needs to be represented as hex-coded binary data enclosed in quotation marks ("<hex data goes here>").
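A hedged sketch of what that could look like (the file path, table, and hex payload below are placeholders, not taken from the original setup):

```sql
-- Hypothetical target table with a BLOB column.
CREATE TABLE "MYSCHEMA"."DOCS" (ID INTEGER, PAYLOAD BLOB);

-- A row in the CSV file carries the BLOB as quoted hex, e.g.:
--   1,"48656C6C6F"     (the hex encoding of the bytes 'Hello')

-- Server-side import of that file:
IMPORT FROM CSV FILE '/tmp/docs.csv' INTO "MYSCHEMA"."DOCS"
   WITH RECORD DELIMITED BY '\n'
        FIELD DELIMITED BY ',';
```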
I am developing a migration tool and using Talend ETL tool (Free edition).
Challenges faced:
Is it possible to create a Talend job that uses a dynamic schema every time it runs, i.e. no hard-coded mappings in the tMap component?
I want the user to provide an input CSV/Excel file, and the job should create the mappings on the basis of that input file. Is this possible in Talend?
Any other free/open-source ETL tool would also be helpful, or any sample job.
Yes, this can be done in Talend, but if you do not wish to use a tMap then your table and file must match exactly. The way we have implemented it is for stage tables, which are all of data type varchar. This works when you are loading raw data into a stage table and your validation is done after the load, prior to loading the stage data into a data warehouse.
Here is a summary of our method:
- The file names contain the table name, so the process starts with a tFileList and parsing the table name out of the file name.
- Using tMSSQLColumnList, obtain each column name, type, and length for the table (one way is to store it as an inline table in tFixedFlowInput); see the metadata sketch after this list.
- Run this through a tSetDynamicSchema to produce your dynamic schema for that table.
- Use a file input that references the dynamic schema.
- Load that into an MSSQLOutput, again referencing the dynamic schema.
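If you would rather pull that column metadata with a query than keep it inline in tFixedFlowInput, a sketch against SQL Server's INFORMATION_SCHEMA could look like this (the stage table name is a placeholder):

```sql
-- Column name, type, and length for one stage table, in ordinal order;
-- this is the kind of metadata the dynamic schema is built from.
SELECT COLUMN_NAME,
       DATA_TYPE,
       CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo'
  AND TABLE_NAME   = 'stg_customer'   -- placeholder stage table
ORDER BY ORDINAL_POSITION;
```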
One more note on data types: it may work with data types other than varchar, but our stage tables only use varchar and datetime. We had issues with datetime, so we filtered out those column types with a tMap.
Keep in mind, this is a summary to point you in the right direction, not a precise tutorial. But with this info in hand, it can save you many hours of work while building your solution.
I have a table to be fetched. One of the columns in this particular table contains HTML data stored as a CLOB; the length is 1048576.
In my ETL job, I have replaced the CLOB with a LongVarChar of the same size (1048576), since CLOB is not defined in DataStage, but the job is not working (there is no error, but it stays in the running state for a long time without moving a single row).
Can anyone recommend a solution for a similar issue they have faced? Thanks!