How to ETL a CLOB data type using DataStage

I have a table to be fetched. One of the columns in that table contains HTML data stored as a CLOB; its length is 1048576.
In my ETL job I have replaced the CLOB with a LongVarChar of the same size (1048576), since CLOB is not a defined type in DataStage, but the job is not working (there is no error, but it stays in the running state for a long time without moving a single row).
Can anyone recommend a solution for this issue? Thanks!

Related

Get a clob output of a stored procedure over a dblink

I have a procedure that runs queries on a few tables and manipulates the output into a CLOB that it returns. I need to call this procedure on a remote database over a dblink and get the CLOB value that the procedure returns. I know that we cannot access non-scalar data like a CLOB over a dblink. I also know that if the CLOB were in a table on the remote side, I could just create a global temp table on the local side and do an insert into my local temp table with a select over the remote table. But in my case, the CLOB is the manipulated output of a procedure.
Any suggestions on how I can do this?
On the remote database, create a function to wrap around the procedure and return the CLOB as its return value. Then create a view that selects from this function and exposes the CLOB as a column. You should be able to query that CLOB column through the view remotely over a database link. I know this can work as I pull CLOB data over dblinks thousands of times a day in utilities I wrote, though I do remember it taking a bit of trial-and-error to make it happy.
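A minimal sketch of that setup, assuming the existing procedure is called get_report_clob with a single OUT CLOB parameter and the link is named remote_db (all of these names are hypothetical):

-- On the remote database: wrap the procedure in a function, then expose the CLOB through a view.
CREATE OR REPLACE FUNCTION get_report_clob_fn RETURN CLOB IS
  l_result CLOB;
BEGIN
  get_report_clob(l_result);   -- the existing procedure, returning its CLOB via an OUT parameter
  RETURN l_result;
END;
/
CREATE OR REPLACE VIEW report_clob_vw AS
SELECT get_report_clob_fn() AS report_clob FROM dual;

-- On the local database: pull the CLOB across the link, e.g. into a local temp table
-- (on many versions a plain SELECT of a remote LOB raises ORA-22992; materializing it locally works).
CREATE GLOBAL TEMPORARY TABLE local_clob_tmp (report_clob CLOB) ON COMMIT PRESERVE ROWS;
INSERT INTO local_clob_tmp
SELECT report_clob FROM report_clob_vw@remote_db;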
If you cannot get that to work, there are a number of other workarounds available. One involves a remote package that declares collection types; a function in that package disassembles the CLOB into a collection of varchar2(32767) records and returns that collection to the calling database, which can then reference the remote package's types via @dblink and reassemble a local CLOB from the collection contents. But this kind of heavy-handed workaround really shouldn't be necessary.
Lastly, I should at least mention that using CLOBs for structured data is not a good design choice. CLOBs should hold only unstructured data, the kind that is meaningful only to humans (like log files, free-form notes, user-entered descriptions, etc.). They should never be used to combine multiple pieces of meaningful structured data that a program is meant to interpret and work with. There are many other constructs that would handle that better than a CLOB.
I think the CLOB should be split into chunks of varchar2(4000) and stored in a global temporary table with ON COMMIT PRESERVE ROWS, so that over the DB link you only select from the table that contains the chunks of the CLOB plus a column that indicates their order. That means creating a procedure in the remote DB that calls the procedure generating the CLOB, splits the CLOB into chunks, and inserts them into the global temporary table.
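A hedged sketch of that idea, again assuming the CLOB-building procedure is get_report_clob and using made-up object names:

-- On the remote database: a global temporary table for the chunks and a procedure to fill it.
CREATE GLOBAL TEMPORARY TABLE clob_chunks (
  chunk_no  NUMBER,
  chunk_txt VARCHAR2(4000)
) ON COMMIT PRESERVE ROWS;

CREATE OR REPLACE PROCEDURE fill_clob_chunks IS
  l_clob CLOB;
  l_len  PLS_INTEGER;
  l_pos  PLS_INTEGER := 1;
  l_no   PLS_INTEGER := 0;
BEGIN
  get_report_clob(l_clob);                                 -- the procedure that builds the CLOB
  DELETE FROM clob_chunks;
  l_len := DBMS_LOB.GETLENGTH(l_clob);
  WHILE l_pos <= l_len LOOP
    l_no := l_no + 1;
    INSERT INTO clob_chunks (chunk_no, chunk_txt)
    VALUES (l_no, DBMS_LOB.SUBSTR(l_clob, 4000, l_pos));   -- 4000 characters per chunk
                                                           -- (multibyte data may need a smaller chunk size)
    l_pos := l_pos + 4000;
  END LOOP;
END;
/

-- From the local database, in the same session over the dblink:
-- EXEC fill_clob_chunks@remote_db;
-- SELECT chunk_no, chunk_txt FROM clob_chunks@remote_db ORDER BY chunk_no;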

How can I read a very long BLOB column in Oracle?

I want to connect a Node Express API to an Oracle 11g database which has a table with a BLOB column. I want to read it using a SQL query, but the problem is that the BLOB column can hold very long text, more than 100k characters. How can I do this?
I tried using: select utl_raw.cast_to_varchar2(dbms_lob.substr(COLUMN_NAME)) from TABLE_NAME.
But it returns 'raw variable length too long'.
I could make multiple queries in a loop and then join the results if necessary, but I haven't found how to bring back just a part of the BLOB.
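To flesh out that chunked-query idea: dbms_lob.substr accepts an amount and an offset, so each piece can be fetched with something like the following sketch (2000 bytes per call, since that is the RAW size limit cast_to_varchar2 hits in plain SQL; advance :offset by 2000 on each query):

select utl_raw.cast_to_varchar2(dbms_lob.substr(COLUMN_NAME, 2000, :offset)) as chunk
from TABLE_NAME;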
Use the node-oracledb module to access Oracle Database (which you are probably already doing, but you don't mention it).
By default, node-oracledb will return LOBs as Lob instances that you can stream from. Alternatively you can fetch the data directly as a String or Buffer, which is useful for 'small' LOBs. For 100K, I would just get the data as a Buffer, which you can do by setting:
oracledb.fetchAsBuffer = [ oracledb.BLOB ];
Review the Working with CLOB, NCLOB and BLOB Data documentation, and examples like blobhttp.js and the other lob*.js files in the examples directory.
You may also want to look at https://jsao.io/2018/03/creating-a-rest-api-with-node-js-and-oracle-database/ which shows Express and node-oracledb.

Talend ETL tool

I am developing a migration tool and using the Talend ETL tool (free edition).
Challenges faced:
Is it possible to create a Talend job that uses a dynamic schema every time it runs, i.e. no hard-coded mappings in the tMap component?
I want the user to provide an input CSV/Excel file, and the job should create the mappings on the basis of that input file. Is this possible in Talend?
Any other free or open-source ETL tool suggestion would also be helpful, or any sample job.
Yes, this can be done in Talend, but if you do not wish to use a tMap then your table and file must match exactly. The way we have implemented it is for stage tables whose columns are all of datatype varchar. This works when you are loading raw data into a stage table and your validation is done after the load, prior to loading the stage data into a data warehouse.
Here is a summary of our method:
The filenames contain the table name, so the process starts with a tFileList and parses the table name out of the file name.
Using tMSSQLColumnList, obtain each column name, type, and length for the table (one way is to store them as an inline table in a tFixedFlowInput).
Run this through a tSetDynamicSchema to produce your dynamic schema for that table.
Use a file input component that references the dynamic schema.
Load that into an MSSQL output component, again referencing the dynamic schema.
One more note on data types: it may work with data types other than varchar, but our stage tables only have varchar and datetime. We had issues with datetime, so we filtered out those column types with a tMap.
Keep in mind, this is a summary to point you in the right direction, not a precise tutorial. But with this info in your hands, it can save you many hours of work while building your solution.

Pentaho ETL tool data migration

I am migrating data through Pentaho. A problem occurs when the number of rows is more than 4 lakhs (400,000): the transaction fails partway through. How can we migrate large data volumes with the Pentaho ETL tool?
As basic debugging, do the following:
If your output is a text file or Excel file, make sure that you check the size of the string/text columns. By default the 'Text file output' step will take the maximum string length, and when you start writing it can throw heap errors. So reduce the sizes and re-run the .ktr files.
If the output is a 'Table output' step, then again check the column datatypes and the maximum column sizes defined in your output table.
Kindly share the error logs if you think there is something else going on. :)

Slow data load with conversion from nvarchar to varchar

I have a table which has approximately 140 columns. The data for this table comes from a transactional system and many of the columns are Unicode.
I dump the daily load into my Staging database, whose data types match exactly what the source system has. From Staging, I do some cleaning and then load the data into a Reports database. When loading from Staging to Reports, I convert all the Unicode character data to string (non-Unicode) data. This process takes an awful lot of time, and I am trying to optimize it (make the load times faster).
I use the Derived Column transformation to convert all the Unicode data to string data. Any suggestions for me?
How about casting your columns as varchar(size) in your source query?
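For illustration, a hedged sketch with made-up table and column names; casting in the source query moves the nvarchar-to-varchar conversion into the database engine instead of the Derived Column transformation:

-- Hypothetical staging source query: convert Unicode columns at the source
SELECT CAST(CustomerName AS VARCHAR(100)) AS CustomerName,
       CAST(City AS VARCHAR(50)) AS City,
       OrderAmount                      -- non-Unicode columns pass through unchanged
FROM dbo.StagingDailyLoad;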