Not able to get system data at the end of transformation step - pentaho

I want to log the transformation start time and end time into a table, but I am getting the error: Field [start_date] is required and couldn't be found!
These are the steps I followed:
Step 1 : Get Transformation name and system date from Get System Data as
Transformation Start_Date.
Step 2 : Use Table Input to get count of records in table A.
Step 3 : Use Filter to check if table A is empty (Count = 0); if it is empty, then copy data from table B to table A.
Step 4 : If empty, then control goes to a Table Input to select all data from table B.
Step 5 : Use Table Output to insert the data from the Table Input.
Step 6 : Get system date from Get System Data as transformation End_date.
Step 7 : Use a Table Output step to insert data into the log table. In this step I am inserting Transformation name, Start Date and End Date.
Can someone let me know where I am going wrong? I am not able to get Start Date at the end of the transformation. The diagram is below.
Transformation Diagram

The Table Input step ignores records that were generated before it. In your diagram, "Get_Transformation_name_and_start_time" generates a single row that is passed to the next step (the Table Input) and is then not propagated any further.
You can use a single "Get System Info" step at the end of your transformation to obtain both the start and end dates (in your diagram that would be Get_Transformation_end_time 2). To get the transformation start date, use the "system date (fixed)" value. It returns the system time, determined at the start of the transformation, which is common to all rows. You can use "system date (variable)" as the end timestamp (in the case of more than one record you'll have to take the max of these values).
It's probably worth looking at the standard Pentaho logging options: http://wiki.pentaho.com/display/EAI/.08+Transformation+Settings#.08TransformationSettings-Logging . You can set up a DB connection and a table that will store transformation execution data "out of the box".
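If you keep your own Table Output based log table (step 7 in the question), a minimal sketch of the target table could look like this; the table name is an assumption, the column names follow the question:
CREATE TABLE transformation_log (
    transformation_name VARCHAR(255),
    start_date          TIMESTAMP,
    end_date            TIMESTAMP
);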

Related

Create Hourly Average query from Logger table

I use a logger table with timestamp, point id and value fields.
The control system adds a record each time a point id value is changed to the logger table.
Here's an example of the SELECT query of the logger table:
I want to run a query on that table and get back 1-hourly average values of some tags (point ids of different values; for example, a 1-hourly average of point_id=LT_174, a 1-hourly average of point_id=AT_INK, and so on).
I would like to present the results as a pivot table.
Can I do that? And if it's impossible to get all requested tags together, how can I run the same query for 1 tag? (I use VB.Net as the platform for running this query, so I can build it by calculating all requested tags in a loop, 1 tag each time.)
I'll be happy to get ideas and suggestions for this problem.
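A minimal sketch of the kind of hourly-average query being described, assuming a SQL Server backend and a table named logger whose columns match the fields mentioned in the question (timestamp, point_id, value):
SELECT point_id,
       DATEADD(hour, DATEDIFF(hour, 0, [timestamp]), 0) AS hour_start,  -- truncate the timestamp to the hour
       AVG(value) AS hourly_avg
FROM logger
WHERE point_id IN ('LT_174', 'AT_INK')
GROUP BY point_id, DATEADD(hour, DATEDIFF(hour, 0, [timestamp]), 0)
ORDER BY hour_start, point_id
Pivoting the result (one column per point_id) can then be done with SQL Server's PIVOT operator or in the VB.Net layer.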

Value from last row of output file and specify into variable

I have an ETL (Pentaho) which produces an Excel file output via the following steps.
Transformation 1:
Table input (has a SQL statement with created > DATEVALUE ORDER BY created ASC)
Sort rows
Excel output
Now how can I read the last row's created column value from the Excel output and store it in a text file, so I can make sure that when the job re-runs, the SQL statement's created date is greater than the value stored in the text file?
Transformation 1:
Table input (SQL statement like created > (get the value from text file) ORDER BY created ASC)
Sort rows
Excel output
What would be the simplest way of achieving this?
You can save the last row of the data stream, which matches the last row written to Excel, using a combination of Group by and Text file output steps appended right after your Excel output step:
Group by step: set Last value in the Type column of the Aggregates tab. Take your date field as the Subject and give it a Name, e.g. last_date.
Text file output step: Write last_date into a file.
Your transformation would then start with a step that reads last_date from the file (Text file input) and passes it to the Table input step, where it is used as a parameter of your SQL query.
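As a sketch, the Table Input SQL on the next run could then look like the following, with "Insert data from step" pointing at the Text file input that supplies last_date (table and column names other than created are assumptions):
SELECT *
FROM source_table
WHERE created > ?          -- the ? is replaced by the last_date value read from the text file
ORDER BY created ASC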
You can also use the Identify last row in a stream step. Just keep the rows flowing out of the Excel output step, identify the last row, discard all but that row, then write it to a text file. It would look something like this:

Pentaho kettle converting month numeric value to month name

I am facing an issue related to OLAP and Kettle. I am using Pentaho DI (Kettle) to export data from my database to an OLAP star schema (facts & dimensions). When it comes to timestamps, I am able to retrieve months, years and days from a raw timestamp using the Calculator step in Kettle. What I need now is to convert the numeric month value to its corresponding month name (e.g. replace 1 with JAN and so on). How can this be achieved with Kettle? Please suggest.
In PDI/Kettle, I've found it tricky to set the value of one field based on the value of another field that is a different data type. A JavaScript or Formula step will do it, but in this case I think I'd use a Stream Lookup because your lookup values are few and fixed.
Use a Data Grid step with an Integer column for the month number (1, 2, 3 ...) and a string column with the month name (JAN, FEB, MAR ...). Use this step as the 'Lookup step' in your Stream Lookup. Then just retrieve the month name into your data flow. This should be pretty fast, which is good if you're working with typical data warehouse volumes.
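As a sketch, the Data Grid rows would look something like this (the column names are up to you):
month_no  month_name
1         JAN
2         FEB
3         MAR
...
12        DEC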
As Brian said you can also use the Modified JavaScript step to perform the conversion. Here is how you can do it with that:
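// month_num is the incoming numeric month field (1-12); build a lookup array of short names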
var month_names=new Array();
month_names[1]="Jan";
month_names[2]="Feb";
month_names[3]="Mar";
month_names[4]="Apr";
month_names[5]="May";
month_names[6]="Jun";
month_names[7]="Jul";
month_names[8]="Aug";
month_names[9]="Sep";
month_names[10]="Oct";
month_names[11]="Nov";
month_names[12]="Dec";
var month_short = month_names[month_num];
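For the new field to show up downstream, also add month_short to the Fields grid at the bottom of the JavaScript step so it is written to the output stream.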

QlikView Set Analysis: use either column number or other unique info from row

I am trying to use Set Analysis in the table below for the column labelled test, to get sum([Best Dollar]) for the date range specified by the Start and End columns.
This expression returns results, but it's naturally static for each row of the table:
=sum({$<AsAtDate={">=40959 and <=40960"}>} [Best Dollar])/1000
This is what I want to have but it returns 0:
=Sum({$<AsAtDate={">=(num(floor(BroadcastWeekStart2))) and <=(num(floor(BroadcastWeekStart2)))+6"}>} [Best Dollar])/1000
To obtain unique start date serial numbers for each line of the Start column (BroadcastWeekStart2), I use the following expression:
=(num(floor(BroadcastWeekStart2)))
How can I specify that the values or calculations used for the start and end columns are used in Set Analysis for the field above?
There is at least one piece of information missing from your question: do you want to select on fixed values, or should the sum depend on the current time?
For the static version something like
=sum( {$<BroadcastWeekStart2={">=40959"}, BroadcastWeekStart2={"<=40960"}>} [Best Dollar])/1000
should work. (Assuming that BroadcastWeekStart2 contains these kind of values.)
Let me show you an example of how I calculate values for the current week:
='Count: ' & count({$<start_week={"$(=WeekStart(Today()))"}>} Games)
where the start_week is set in the load script by:
WeekStart(date#(#4, 'YYYYMMDD')) as start_week,

SSIS 2008 Row Count Transformation - Row count return 0

This should be rather simple, but I don't know why I get the row count as zero when I use the Row Count transformation in a Data Flow Task. I have created a variable (NoOfRecords) with package scope.
The variable name is set to NoOfRecords in the Row Count transformation.
I used a Derived Column to assign the row count.
The package runs successfully and shows a record count of 265,
but the Derived Column shows the record count as 0 instead of 265 rows.
After the Row Count, add an Aggregate transformation and select the Count option under Operation in the Aggregate's properties.
Then you can use the row count variable for further operations, where it holds the total row count of the input file.
Row Count is processed after the rows have passed.
You're adding the variable to each row as it passes through the Derived Column step, but at this point the variable has not been updated (that happens after all rows have passed), so the value 0 is correct.
You might be able to achieve this by using an asynchronous transformation before your Derived Column (I'm not sure this will work, it just popped into my mind). Add a Sort or Aggregate step before your Derived Column and try again.
I used this in the query as an efficient way of getting the row count:
count(all SnapshotDate) over () as nRowCount
Here's the technique for recording row counts that worked in my situation.
The scenario is that I want to log the rows migrated between tables. The RowCount variable doesn't get populated until you exit the Data Flow.
[Control Flow]
1. Data Flow Task
a. read origin data - Source control
b. Add RowCount transformation. Link a to b.
Right-click RowCount and map to UserVariable (int64)
c. Add Destination control for loading table.
d. Link b to c.
2. Add an Execute SQL Task to the Control Flow. Right-click, Edit.
INSERT SQL statement: Insert Into LogTable(rowcount) Values(?)
Parameter Mapping:
Variable         Direction  DataType  ParameterName  ParameterSize
User::RowCount   INPUT      LONG      0              -1
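For completeness, a minimal sketch of a LogTable that the INSERT above could target; the column names besides rowcount are assumptions:
CREATE TABLE LogTable (
    [rowcount] INT NOT NULL,
    logged_at  DATETIME NOT NULL DEFAULT GETDATE()
);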