Pentaho table output step not showing proper error in log - pentaho

In Pentaho, I have a Table output step where I load a huge number of records into a Netezza target table.
One of the rows fails and the log shows me which values are causing the problem. But the log is probably not right, because when I create an insert statement with those values and run it separately on the database, it works fine.
My question is:
In Pentaho, when a DB insert fails, is there a way to identify exactly which values caused the problem and why?
EDIT: The error is 'Column width exceeded' and it shows me the values that are supposedly causing the problem. But I made an insert statement with those values and it works fine. So I think Pentaho is not showing me the correct error message, and it is a different set of values that is causing the problem.

Another way I've used to deal with this kind of problem is to create another table in the DB with widened column types. Then, in your transform, add a Table output step connected to the new table. Then connect your original Table output to the new step and, when asked, choose 'Error handling' as the hop type.
When you run your transform, the offending rows will end up in the new table. Then you can investigate exactly what the problem is with that particular row.
For example you can do something like:
insert into [original table] select * from [error table];
You'll probably get a better error message from your native DB interface than from the JDBC driver.

I don't know exactly what your problem is, but I think I had the same problem before.
Everything seemed right, but the problem was that in some transformations, for example when I converted a numeric value to a string, the transformation added a whitespace character at the end of the field, so the length of the field was n+1 instead of n. That is very difficult to see.
A practical example: if you use a Calculator step with the YEAR() function to extract the year from a date field, a whitespace may be added to the new year field. If the year had a length of 4, after that step it will have a length of 5. When you then load a row with that year field, now a string(5), into a data warehouse that expects a string(4), you will get the same error that you are getting now.
What you think is happening --> year = "2013" --> length 4
What is really happening --> year = "2013 " --> length 5
I recommend paying close attention to the string fields and their lengths, because if some transformation adds a whitespace that you don't expect, you can lose a lot of time finding the error (I speak from experience).
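A minimal sketch of that kind of check in Python, outside of PDI: it compares each string field against the width the target column expects and prints the value with repr() so that a trailing space becomes visible. The function name find_too_long, the field names, and the widths are hypothetical and only for illustration; the real widths would come from the target table's definition.

```python
def find_too_long(rows, widths):
    """Return (row_index, field, value, length) for every value wider than its column.

    `widths` maps field name -> character width expected by the target column
    (hypothetical values here).
    """
    problems = []
    for index, row in enumerate(rows):
        for field, limit in widths.items():
            value = row.get(field)
            if value is not None and len(str(value)) > limit:
                problems.append((index, field, value, len(str(value))))
    return problems


# Hypothetical rows; the second one carries an unexpected trailing space.
rows = [
    {"id": 1, "year": "2013"},
    {"id": 2, "year": "2013 "},
]
target_widths = {"year": 4}

for index, field, value, length in find_too_long(rows, target_widths):
    # repr() shows the quotes around the value, so the extra space is visible.
    print(f"row {index}: {field}={value!r} has length {length}")
```

If the extra character turns out to be whitespace, trimming the field before the load (for example with a String operations step) is one possible fix.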
I hope this can be useful for you!
EDIT: I'm guessing you are working with PDI (Spoon, formerly Kettle) and that the error occurs when you are loading a data warehouse, so correct me if I'm wrong.

Can you load the file with the nzload command? With this command you can find the exact error, and the bad records are written to the bad file you specify, for detailed analysis.
For example:
nzload -u <username> -pw <password> -host <netezzahost> -db <database> -t <tablename> -df <datafile> -lf <logfile> -bf <badrecords file name> -delim <delimiter>

Related

Problems using cubeSQL trying to add data to table

I just got cubeSQL admin and SQLite manager, and I am new at this. I am trying to create a database for a mobile app to get video info and URLs from, for streaming. I set up a database and connected it to the manager, but I cannot get it to accept the script that I am using. This is what I am entering to add data to a table.
INSERT INTO Sabbath School
VALUES
(number 1, hello, great, google.com, google.com),
This is the error I get:
Here are screenshots of what I am working with. The first one is the database:
The next one is the table configuration
The final one is what the table looks like.
Any help as to what I am doing wrong here would be most appreciated. I really don't know what I am doing and am trying to learn how to use SQL.
This looks like a "quoted identifier" issue. Since the table name has a space in it, you will need to surround the table name in double quotes. The query parser believes your table name is "Sabbath" and expects either the VALUES keyword or an opening parenthesis ( to start the column list next. Since it sees "School" instead, you get the syntax error. My preference is to avoid spaces in table names so you don't need to quote them all the time.
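A minimal sketch of the quoting, using Python's built-in sqlite3 module as a stand-in and assuming cubeSQL follows the same quoting rules as SQLite. The column names are hypothetical, since the real table layout is only visible in the screenshots. Besides the double-quoted table name, the sketch also puts the string values in single quotes and drops the trailing comma, both of which the original statement would likely need as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Hypothetical columns, for illustration only.
conn.execute(
    'CREATE TABLE "Sabbath School" '
    "(number INTEGER, title TEXT, description TEXT, video_url TEXT, image_url TEXT)"
)

# Double quotes around the identifier, single quotes around string literals,
# and no trailing comma after the last row.
conn.execute(
    'INSERT INTO "Sabbath School" '
    "VALUES (1, 'hello', 'great', 'google.com', 'google.com')"
)

rows = conn.execute('SELECT * FROM "Sabbath School"').fetchall()
print(rows)
```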

Pentaho Execute SQL Statements variable conversion to null

I am using PDI to delete and insert some data from a DB, and I have the following issue. I create two variables called START_DATE and END_DATE that are used to select the data that will be deleted from my DB. I am able to get them and run my transformation with no errors in the log file, but when I checked whether the data was deleted, I found it wasn't. I then checked my "DeleteProcedure" step, and it says "Conversion error: null". I have tried different approaches to take the variables and pass them as Strings, but I haven't been able to solve this issue. It cannot be a SQL mistake, as I tested it with a constant and it works.
Any ideas? I attach some pics. Thanks!
As the documentation for the Execute SQL script step says:
Note: When you have an issue, that the SQL is started at the initialization phase of the transformation and not for each row, make sure to check the option "Execute for each row" (see description below).
In your case the SQL executes during the initialization phase of the transformation; that's why it gets null values instead of the ones from the previous step.

capture executed sql from input table in pentaho pdi

I am using Pentaho for data migration testing. I have set up a "Table input" step where many parts of the query are variables. I have been looking for a way to capture that query after it gets executed at runtime.
I was wondering whether there are any specific system log variables for SQL, or whether it has to do with metadata. Need help! Thanks
Maybe the following approach will help:
We assume a transformation reading a CSV file to get the dynamic portion of the SELECT statement (e.g. the columns) and setting the variable columns with it.
The second transformation uses this variable to generate the SELECT statement and store it into the variable sql_statement.
In the main transformation we use ${sql_statement} as the SELECT statement of the table input and write the data to an output file (that's the business process, so to speak). From the same input we copy the output to another path. There we add the current time as a field (using the "Get system data" step) and we add the generated SQL statement, join them as a cartesian product and group the result by the sql_statement. That way we can compute the first time and the last time that the statement was used. These results are written to a text file.
The last thing we need is a job calling the three transformations sequentially.
This is a sample output:
sql_statement;min_time;max_time
SELECT my_column FROM test_table;2014/05/08 00:41:21.143;2014/05/08 00:41:21.144
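The grouping logic behind that output can be sketched in plain Python, purely to illustrate what the group-by computes; it is not how PDI implements it. The function name summarize and the log entries are hypothetical.

```python
from datetime import datetime


def summarize(executions):
    """Map each SQL statement to its (first_seen, last_seen) timestamps."""
    summary = {}
    for sql, ts in executions:
        first, last = summary.get(sql, (ts, ts))
        summary[sql] = (min(first, ts), max(last, ts))
    return summary


# Hypothetical entries: (statement, time at which a row passed through).
log = [
    ("SELECT my_column FROM test_table", datetime(2014, 5, 8, 0, 41, 21, 143000)),
    ("SELECT my_column FROM test_table", datetime(2014, 5, 8, 0, 41, 21, 144000)),
]

fmt = "%Y/%m/%d %H:%M:%S.%f"
print("sql_statement;min_time;max_time")
for sql, (first, last) in summarize(log).items():
    # %f prints microseconds; drop the last three digits to show milliseconds.
    print(f"{sql};{first.strftime(fmt)[:-3]};{last.strftime(fmt)[:-3]}")
```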
Thank you Marcus! I did something similar.
It works. Awesome.
I gathered the parts of the queries from the table fields where they were kept and formed a full query in JavaScript. After that, the full query is sent as a parameter to a transformation that runs and logs the query.

VS 2005 SSIS Error value origin

I have an SSIS package created in VS 2005 that has started to give me the following error:
[Lawson Staging Table [4046]] Error: There was an error with input column "JOB_CODE" (4200) on input "OLE DB Destination Input" (4059). The column status returned was: "The value violated the integrity constraints for the column.".
My first question is: what are the 4046, 4200 & 4059 values following my table, column and destination?
My second question is about the integrity constraint message. The destination table is a heap (no keys or indexes) with no constraints. The destination column is defined as a varchar(10). The input column is from Oracle, is defined as char(9) and is called job_code. So where is an integrity constraint defined?
The final question is about the select statement, which looks like the following:
Select ...
,lpad(trim(e.job_code),10,'0') as job_code ...
If I take the lpad and trim functions out, it works, but I need these functions in place because my spec calls for a fixed-length column padded with leading zeros. This column returns data as expected in TOAD but fails in the SSIS package. Does anyone see an issue with how the functions are being used?
Since this package worked in the past but suddenly started to throw this error, I'm assuming that new invalid data has come into play. However, recently added rows don't seem to be any different than historical records.
Those numbers are most likely the IDs assigned to each task/table/column etc.
You could go to the advanced editor of the data flow task and look at the input and output properties. You can see that an ID is assigned to each input and each column.
Next: the error that you are getting usually occurs when the "Allow Nulls" option is unchecked.
Try this:
Look at the name of the column for this error/warning.
Go to SSMS and find the table
Allow Nulls for that Column
Save the table
Rerun the SSIS
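One way the query in the question could produce a NULL, offered as a possibility rather than a diagnosis: Oracle treats an empty string as NULL, so trimming a job_code that contains only blanks yields NULL, and lpad of NULL is NULL as well. The Python sketch below only emulates that behaviour; oracle_trim and oracle_lpad are hypothetical helper names, not real APIs, and the sample values are made up.

```python
def oracle_trim(value):
    """Emulate Oracle TRIM: strip spaces; an empty result becomes NULL (None)."""
    if value is None:
        return None
    stripped = value.strip(" ")
    return stripped if stripped else None


def oracle_lpad(value, length, pad):
    """Emulate Oracle LPAD: NULL in, NULL out; otherwise left-pad or cut to length."""
    if value is None:
        return None
    if len(value) >= length:
        return value[:length]
    needed = length - len(value)
    return (pad * needed)[:needed] + value


# A char(9) value with real content is padded with leading zeros as intended.
print(oracle_lpad(oracle_trim("A123     "), 10, "0"))

# A char(9) value that is all blanks ends up as NULL (None here).
print(oracle_lpad(oracle_trim("         "), 10, "0"))
```

If that is what happens, a query on the Oracle side for rows where trim(job_code) is null would show whether any such rows have appeared recently.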

SSIS package fails with error: “Text was truncated or one or more characters had no match in the target code page.”

I recently updated an SSIS package that had been working fine and now I receive the following error:
Text was truncated or one or more characters had no match in the target code page.
The package effectively transferred data from tables in one database to a table in another database on another server. The update I made was to add another column to the transfer. The column is Char(10) and is the same length on both the source and destination server. Before the data is transferred it is Char(10) there as well. I've seen people reporting this error in blog posts as well as on Stack, but none of what I have read has helped. One solution I read about involved using a data conversion to explicitly change the offending column; this did not help (or I misapplied the fix).
Which version of SQL Server and SSIS are you using?
I would take a look at the output and input fields of your components. CHAR always occupies its full length (I mean, char(10) will always use 10 bytes), and since you are getting a truncation error, that may be a place to start. Try increasing the size of the field or casting to varchar in the query that loads the data (not as a permanent solution, just to try to isolate the problem).
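To help isolate which values trigger the error, the two conditions named in the message (truncation, and characters with no match in the target code page) can be checked outside SSIS. This is a minimal sketch in Python that assumes the target code page is Windows-1252, which is only an assumption; check_value and the sample values are hypothetical.

```python
def check_value(value, width, codepage="cp1252"):
    """Return a list of reasons why `value` might not fit a CHAR(width) column."""
    reasons = []
    try:
        encoded = value.encode(codepage)
    except UnicodeEncodeError as exc:
        # exc.start is the index of the first character that cannot be encoded.
        reasons.append(f"character {value[exc.start]!r} has no match in {codepage}")
        encoded = value.encode(codepage, errors="replace")
    if len(encoded) > width:
        reasons.append(f"{len(encoded)} bytes exceeds width {width}")
    return reasons


# Hypothetical sample values for a CHAR(10) column.
for sample in ["ABCDEFGHIJ", "ABCDEFGHIJK", "caf\u03a9"]:
    print(repr(sample), check_value(sample, 10))
```

Running the source column's values through a check like this shows whether any of them is too long or contains a character that the assumed code page cannot represent.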
Which connection are you using: ADO.NET or OLE DB?
Try deleting and recreating the source and destination, if there are not many changes you have to make. Sometimes stale metadata causes these problems. If this doesn't solve your problem, post a screenshot of the error.