Dispatching started for transformation - Pentaho

When I preview rows in the Text file input step of Pentaho, no rows appear and the 'Show log' option displays this message:
"Dispatching started for transformation".
What does it mean? How can I overcome this issue?

It seems that either your transformation is invalid (you're missing one essential checkbox or another) or your PDI installation isn't working properly.
Which Java version are you using? And which PDI version? Try it on a fresh install, and if it still doesn't work, go over your Text file input step and validate that it's correctly configured.
Also, try removing all other steps. It could be that one of the subsequent steps is causing the problem and stopping PDI from starting the transformation execution.

Well... maybe it's quite late, but I'm currently struggling with this issue in Pentaho Community Edition 8.
What I found, and what solved some of my issues, is that this message can be a warning sign of a deadlock. You have to make sure that none of these situations is present in your transformation:
An external component like a table lock by the database blocks the transformation.
The "Block this step until steps finish" step might run into a deadlock when there are more rows to process than the number of Rows in Rowset.
Within transformations there are situations when streams get split and joined again, so that the transformation blocks by design.
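The second situation can be illustrated with a small sketch. This is not PDI code: it models a rowset as a bounded Python queue, and the function name and the row counts are hypothetical. A downstream step that reads nothing until the upstream step has finished never drains the buffer, so the upstream step stalls as soon as the buffer is full.

```python
import queue


def upstream_stalls(row_count, rowset_size):
    """Return True if a producer would block forever on a full buffer.

    Models a downstream step that reads nothing until the upstream step
    has finished (a hypothetical stand-in for a blocking step).
    """
    rowset = queue.Queue(maxsize=rowset_size)  # bounded buffer between steps
    for row in range(row_count):
        try:
            # A real step would block here; put_nowait raises instead,
            # which lets us detect the stall without hanging.
            rowset.put_nowait(row)
        except queue.Full:
            return True  # buffer is full and nobody is reading: deadlock
    return False  # every row fit, so the upstream step can finish


# 10_000 is only an example value for the rowset size.
print(upstream_stalls(row_count=10_001, rowset_size=10_000))  # True
print(upstream_stalls(row_count=10_000, rowset_size=10_000))  # False
```

In this model, either raising the rowset size above the number of rows or removing the blocking dependency makes the stall go away.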
You can see full examples on the Pentaho community wiki page:
https://pentaho-community.atlassian.net/wiki/spaces/EAI/pages/386807182/Transformation+Deadlocks
I hope that it will help you!

ADF (Azure Data Factory) debug not running saved changes

Has anyone seen this behavior? For example, here is my code in an activity:
#{concat(
substring(activity('GetMaxDate').output.firstRow.MAX_DATE,0,4)
This IS saved. Multiple times. But when I run in debug, this is what is run:
#{concat(\n substring(activity('GetMaxDate').output.firstRow.MAX_DATE,1,4)\n ,'
It's running the prior version (0,4) instead of the new version (1,4). I first noticed this because I changed the name of the activity and debug still ran the old name. This seems like a new problem that I've not had before. If I publish and run it with a trigger, it picks up the change. It's just debug that's not picking it up. This seems an inexcusable bug. This is 101 functionality, folks.
Any suggestions? Should this be logged with Microsoft as a bug?
An additional option to Gary's comment:
C) Rename your pipeline, save, run debug. Rename back after.
This worked for me.
I've seen this caching behavior in the past. The preview query showed cached data from the source table even though the source table data had been completely changed.
Deleting the pipeline, dataset, etc. and creating a new pipeline solved the issue for me.
It seems this happens when debug is used too many times. I recommend logging this behavior as a bug.

Pentaho job stops in the middle of a transformation without any indication in log file

I'm new to Pentaho and I need your help to investigate a problem.
I have scheduled a job in crontab that runs via the kitchen command. I'm using Pentaho release 6.0.1.0.386.
Sometimes (the problem is not deterministic) one of the transformations stops after "Loading transformation from repository" and before "Dispatching started for transformation". The log just stops. No errors. Nothing. And the job doesn't go on.
Any ideas? Are there any checks I can do? Thanks
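One check that can be scripted is scanning the log for loads that were never followed by a dispatch. The sketch below is only an illustration: it matches the two messages quoted above as plain substrings, assumes the transformations of the job are logged one after another, and uses made-up sample lines and a hypothetical function name.

```python
LOADING = "Loading transformation from repository"
DISPATCHING = "Dispatching started for transformation"


def find_stalled_loads(log_lines):
    """Return (line_number, text) for each load not followed by a dispatch."""
    stalled = []
    pending = None  # last "Loading" line still waiting for its dispatch
    for number, line in enumerate(log_lines, start=1):
        if LOADING in line:
            if pending is not None:
                stalled.append(pending)  # previous load never dispatched
            pending = (number, line.rstrip())
        elif DISPATCHING in line and pending is not None:
            pending = None  # the pending load started running
    if pending is not None:
        stalled.append(pending)  # log ended right after a load
    return stalled


# Hypothetical sample lines; a real log carries timestamps and more detail.
sample = [
    "Loading transformation from repository [load_a]",
    "Dispatching started for transformation [load_a]",
    "Loading transformation from repository [load_b]",
]
print(find_stalled_loads(sample))
# [(3, 'Loading transformation from repository [load_b]')]
```

Running this over the log files of several nights would show whether it is always the same transformation that fails to start.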
Is the quantity of data in this transformation very large?
Some files can cause errors. You can find them in this path:
My Computer / Users / your user / .kettle
If you delete the ones I marked in the image, they will be recreated automatically when you open Pentaho again.

BI Publisher - Fail to load and save data model

I started using BI Publisher about a week ago.
When working on a new data model, about one or two queries in, I get this error when I try to save:
Failed to load servlet/res?s=%252F~developer1%252Ftest%252FJustin%2520Tests%252FOSRP%2520Information.xdm&desc=&_sTkn=9ba70c01152efbcb413.
I can no longer save my data model.
I tried deleting my queries, logging in and out, turning machine off and on, but no luck.
I've currently resorted to saving all of my queries locally in Notepad.
I can create a whole new data model and it will save fine, but then after two or three queries the same thing happens.
What's going on and why would anyone design such a confusing error message?
Any help would be greatly appreciated.
After restarting your server once, you won't get this issue. It sometimes happens due to a connection problem, so a restart should fix it. It resolved my problem.
None of the proposed solutions worked for me. I found out, on my own, that any unnecessary brackets around CASE in a select statement will cause this error. Remove the unnecessary brackets and the error goes away.
See Oracle MetaLink Doc ID 2173333.1. In BI Publisher releases 11.1.1.8.x and up, there is an option to Manage Cache in the Administration section of BIP. This option was also added to 11.1.1.7 in patch 140715 (11.1.1.7.140715).
Clearing the object cache will resolve the saving errors:
Click on the Administration link
Manage BI Publisher
Manage Cache
Click on 'Clear Object Cache'

GoodData - debugging a graph (grf file)

I've got a graph that isn't behaving as it should in CloudConnect.
I'm running it locally, and it's completing, but not doing its work.
In an effort to figure out why this is, I've added printLog calls in many places, like the following:
printLog(warn, 'transform from file ' + $in.0.fileName);
printLog(debug, 'joining etc');
The Phase consists of a FileList into a SimpleCopy, into a LookupJoin, a Reformat (produce SQL) and a DBInsert.
However, while I see logs for the phases above, I'm not seeing anything produced in the log for any part of my phase. All parts of the phase do report running successfully in the log. I've also done Enable Debugging on all connections in this phase.
Am I missing something to enable logging? Is there a better way to debug processing in CloudConnect?
Discovered the problem: the FileList will succeed even if the source file cannot be found, but none of the subsequent steps will then fire. It's somewhat unintuitive, since the log file says 'succeeded'.
For debugging, after a run you can access the data by right-clicking on the connection and selecting "View Data".
Sorry for the elementary question, but documentation didn't seem to cover this clearly, at least for a GoodData noob. I'll leave it up for anyone with the same problem!

Audit and error handling in SSIS

We are starting a project to handle big, big flat files. These files are kind of 'normalized' and we want to process them first to an intermediate file.
I would like to see a custom table for audit rows and a custom table for errors that are thrown during processing. Also errors must be stored in the Event Log.
What are the best practices regarding audit & error handling in general for SSIS (VS2008)?
(edit)
We have made (I think) a very elegant solution by designing one master package. This package runs a child package (the one originally intended). The master package subscribes to three events: OnInformation, OnWarning and OnError. These events are routed to a generic audit & logging service that makes calls to the Enterprise Library Logging & Exception Handling blocks.
What I would recommend is adopting the following philosophy for stable ETL processes that load from files:
Never cast anything in the connector; just import the fields as nvarchars of the maximum length they will reach.
Procedurally add a row count (a row ID) so that casting errors can be traced back to a row.
Cast and control each column to your specification.
If a row cannot be read at some stage, you will not know the index, but you will know that the file is malformed (extremely rare in my experience, and usually due to half-transferred files), and it should be rejected anyway.
A quick screenshot of a part of a file loading process shows how the rejection (after assigning row_id) can work (link to dataflow image). To this you can add countless further checks (duplicates...) and even keep a repository of the loaded files to check on the rejects and whatever else you might want to control (link to control flow image).
In some of my processes, I even use a flat file connector and just import each row as bulk text and then split it into columns with an intermediate script component, allowing for different versions of the columns in the files.
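The idea can be sketched outside SSIS in a few lines of Python. This is only an illustration of the approach described above, not the actual package: the three-column layout, the `;` delimiter and all function names are assumptions. Every line is read as raw text, gets a row_id, is split into columns, and is cast afterwards; rows that fail the cast are routed to a reject list together with the reason.

```python
from datetime import date


def parse_row(fields):
    """Cast raw text fields to typed values; raises ValueError on bad data."""
    customer_id, amount, booked_on = fields  # wrong column count also raises
    return int(customer_id), float(amount), date.fromisoformat(booked_on)


def load_lines(lines, delimiter=";"):
    """Split each raw line, tag it with a row_id, and route rejects."""
    accepted, rejected = [], []
    for row_id, raw in enumerate(lines, start=1):
        text = raw.rstrip("\n")
        fields = text.split(delimiter)
        try:
            accepted.append((row_id, *parse_row(fields)))
        except ValueError as error:
            # Keep the raw text and the reason so the reject can be audited.
            rejected.append((row_id, text, str(error)))
    return accepted, rejected


# Hypothetical sample: row 2 has a bad amount, row 3 is missing a column.
sample = ["1;10.50;2020-01-31", "2;abc;2020-02-01", "3;7.25"]
accepted, rejected = load_lines(sample)
print(accepted)
print([row_id for row_id, _, _ in rejected])  # [2, 3]
```

The rejected list plays the role of the custom error table, and because the row_id is assigned before any cast, each reject can be traced back to its line in the file.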
Anyway, sorry not to be more detailed (due to my status I can't add more links or any images), but I hope that you understand the concept.
Regards,
Francisco.