Value from last row of output file and specify into variable - pentaho

I have an ETL transformation (Pentaho) that produces an Excel file through the following steps.
Transformation 1:
Table input (has got SQL statement with created > DATEVALUE ORDER BY created ASC)
Sort rows
Excel output
Now, how can I read the value of the created column in the last row of the Excel output and store it in a text file? That way, when the job is re-run, I can make sure the created date in the SQL statement is greater than the value stored in the text file.
Transformation 1:
Table input (SQL statement like created > (get the value from text file) ORDER BY created ASC)
Sort rows
Excel output
What would be the simplest way of achieving this?

You can save the last row of the data stream, which matches the last row written to Excel, using a combination of Group by and Text file output steps, which you can append right after your Excel output step:
Group by step: Set Last value in the Type column of the Aggregates tab. Use your date field as the Subject and give it a Name, e.g. last_date.
Text file output step: Write last_date into a file.
Your transformation would then start with a step that reads last_date from the file (Text file input) and passes it to the Table input step, where it is used as a parameter of your SQL query.
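Outside Pentaho, the same read-filter-store cycle can be sketched in a few lines of Python. This is only an illustration of the pattern, not the transformation itself: the events table, the default start date and the last_date.txt file name are assumptions made up for the example.

```python
import sqlite3
import tempfile
from pathlib import Path


def run_once(conn, watermark_file, default="1970-01-01 00:00:00"):
    # "Text file input": read the stored date, or fall back to a default on the first run.
    last_date = watermark_file.read_text().strip() if watermark_file.exists() else default
    # "Table input": the stored date is used as a query parameter.
    rows = conn.execute(
        "SELECT id, created FROM events WHERE created > ? ORDER BY created ASC",
        (last_date,),
    ).fetchall()
    # "Group by (Last value)" + "Text file output": keep only the last created value.
    if rows:
        watermark_file.write_text(rows[-1][1])
    return rows


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, created TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "2021-01-01 10:00:00"), (2, "2021-01-02 11:00:00")],
)
watermark = Path(tempfile.mkdtemp()) / "last_date.txt"

first = run_once(conn, watermark)   # picks up both rows
second = run_once(conn, watermark)  # nothing newer than the stored date
```

Only writing the file when rows were returned keeps the previous date in place on runs that find nothing new.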

You can also use the Identify last row in a stream step. Just keep the rows flowing out of the Excel output step, identify the last row, discard all but that row, then write it to a text file.
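As a rough illustration of the idea behind flagging the last row of a stream (not Pentaho's own implementation), a one-row lookahead is enough. The function name flag_last and the sample dates are hypothetical.

```python
def flag_last(rows):
    # Hold back one row so we know whether another row follows it.
    iterator = iter(rows)
    try:
        previous = next(iterator)
    except StopIteration:
        return  # empty stream: nothing to flag
    for current in iterator:
        yield previous, False
        previous = current
    yield previous, True


# Hypothetical created values, already sorted ascending.
created_values = ["2021-01-01", "2021-01-02", "2021-01-03"]

# Discard everything except the flagged row, as a filter step would.
last_rows = [row for row, is_last in flag_last(created_values) if is_last]
```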

how to get the data from a column based on name not the index number

I have a dataframe with column abc having values like below
[{note=Part 3 of 4; Total = $11,000, cost=2750, startDate=2021-11-01T05:00:00Z+0000}]
Now I want to extract data based on name. For example, I want to extract cost and startDate and create a new column for each.
It needs to work by name because the order of these values might change.
I have tried the code below, but because the order of the data changes I am getting the wrong data.
df_mod = df_mod.withColumn('cost', split(df_mod['costs'], ',').getItem(1)) \
.withColumn('costStartdate', split(df_mod['costs'], ',').getItem(2))
That's because your data is not comma-separated, it just looks like that. You'll want to use regexp_extract to find the correct content.
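A minimal sketch of the name-based lookup, written with Python's re module so it can be tried without a Spark session. The helper name extract_field is hypothetical; the pattern it builds (for example cost=([^,}\]]+)) is the kind of expression you would pass to regexp_extract with group index 1, assuming the values themselves never contain a comma.

```python
import re


def extract_field(raw, name):
    # Capture everything after "name=" up to the next comma, closing brace or bracket.
    match = re.search(rf"{re.escape(name)}=([^,}}\]]+)", raw)
    # Return an empty string when the name is absent.
    return match.group(1) if match else ""


sample = "[{note=Part 3 of 4; Total = $11,000, cost=2750, startDate=2021-11-01T05:00:00Z+0000}]"
# Same content with the keys in a different order.
reordered = "[{startDate=2021-11-01T05:00:00Z+0000, cost=2750, note=Part 3 of 4; Total = $11,000}]"

cost = extract_field(sample, "cost")
start_date = extract_field(reordered, "startDate")
```

Because the pattern is anchored on the key name rather than a position, the comma inside $11,000 and any reordering of the keys no longer matter.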

Combine Rows but concatenate on a certain field in Excel Power Query or Microsoft SQL

I have brought a table from an Authority database into Excel via a Power Query ODBC connection. It includes fields like:
Date - various
Comments - mem_txt
Sequence - seq_num
The Comments field has a length restriction, and if a longer string is entered, it returns multiple rows with the Comments field being chopped into suitable lengths and the order returned in the Sequence field as per extract below. All other parts of the records are the same.
I want to collapse the rows and concatenate the various Comments into a single entry. There is a date/time column just outside of the screen shot above that can be used to group the rows by (it is the same for the set of rows, but unique across the data set).
For example:
I did try bringing the data in through a query connection, using GROUP_CONCAT(Comments SEPARATOR ', ') and GROUP BY date, but that command isn't available in Microsoft Query.
Assuming the date/time column you refer to is named date_time, the M code would be:
let
    Source = Excel.CurrentWorkbook(){[Name = "Table1"]}[Content],
    // Group on date_time and join each group's comment fragments into one string
    #"Grouped Rows" = Table.Group(
        Source,
        {"date_time"},
        {{"NewCol", each Text.Combine([mem_txt])}}
    )
in
    #"Grouped Rows"
Amend the Source line as required.
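For readers who want to check the grouping logic outside Excel, here is a small Python sketch of the same collapse. It additionally sorts the fragments by seq_num before joining, which is an assumption about the desired order; the sample rows and the function name combine_comments are hypothetical.

```python
from collections import defaultdict


def combine_comments(rows):
    # Collect (sequence, text) pairs per date_time key.
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["date_time"]].append((row["seq_num"], row["mem_txt"]))
    # Join each group's fragments in sequence order.
    return {
        key: "".join(text for _, text in sorted(parts))
        for key, parts in grouped.items()
    }


# Hypothetical rows: one long comment split in two, one short comment.
rows = [
    {"date_time": "2021-03-01 09:15", "seq_num": 2, "mem_txt": "second part."},
    {"date_time": "2021-03-01 09:15", "seq_num": 1, "mem_txt": "First part, "},
    {"date_time": "2021-03-02 14:00", "seq_num": 1, "mem_txt": "Short note."},
]
combined = combine_comments(rows)
```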

Not able to get system data at the end of transformation step

I want to log the transformation start time and end time into a table, but I am getting the error: Field [start_date] is required and couldn't be found!
These are the steps I followed.
Step 1: Get the transformation name and system date from Get System Data as the transformation Start_Date.
Step 2: Use Table Input to get the count of records in table A.
Step 3: Use Filter to check whether table A is empty (Count = 0); if it is empty, copy the data from table B to table A.
Step 4: If it is empty, control goes to a Table Input that selects all data from table B.
Step 5: Use Table Output to insert the data from that Table Input.
Step 6: Get the system date from Get System Data as the transformation End_date.
Step 7: Use a Table Output step to insert data into the log table. In this step I am inserting the transformation name, Start Date and End Date.
Can someone let me know where I am going wrong? I am not able to get Start Date at the end of the transformation. The diagram follows.
Transformation Diagram
The Table Input step ignores rows that were generated by earlier steps. In your diagram "Get_Transformation_name_and_start_time" generates a single row that is passed to the next step (the Table Input one), and it is not propagated any further.
You can use a single "Get System Info" step at the end of your transformation to obtain both the start and end date (in your diagram that would be Get_Transformation_end_time 2). To get the transformation start date, use the "system date (fixed)" value. It returns the system time determined at the start of the transformation, so it is the same for all rows. You can use "system date (variable)" as the end timestamp (if there is more than one record, you'll have to take the max of these values).
It's probably worth looking at the standard Pentaho logging options: http://wiki.pentaho.com/display/EAI/.08+Transformation+Settings#.08TransformationSettings-Logging , you can set up a DB connection and a table that will store transformation execution data "out of the box".
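The difference between the fixed and the variable system date described above can be mimicked in plain Python. This is only an analogy for the behaviour, not Pentaho code; run_transformation and the returned field names are hypothetical.

```python
from datetime import datetime


def run_transformation(rows):
    # Like "system date (fixed)": captured once, identical for every row.
    start_time = datetime.now()
    stamped = []
    for row in rows:
        # Like "system date (variable)": evaluated again for each row.
        stamped.append((row, datetime.now()))
    # With more than one row, the end timestamp is the max of the per-row values.
    end_time = max((ts for _, ts in stamped), default=start_time)
    return {"start_date": start_time, "end_date": end_time, "row_count": len(stamped)}


log_record = run_transformation(["a", "b", "c"])
```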

Pentaho Adding summary rows

Any idea how to summarize data in a Pentaho transformation and then insert the summary row directly under the group being summarized?
I can use a Group By step and get a summarised result stream having one row per key field, but what I want is each sorted group written to the output and the summary row inserted underneath, thus preserving the input.
In the Group By, you can do 'Include all Rows', but this just appends the summary fields to the end of each existing row. It does not create new summary rows.
Thanks in advance
To get the summary rows to appear under the group-by blocks you have to use some tricks, such as introducing a numeric "order" field, setting its value to 1 for the original data rows and to 2 for the subtotal rows.
Also in the group-by/ sub-totals stream, I am generating a sum field, say "subtotal". You have to make sure to also include this as a blank in your regular stream or else the metadata will be divergent and the final merge will not work.
Here is the best explanation I have found for this pattern:
https://www.packtpub.com/books/content/pentaho-data-integration-4-working-complex-data-flows
You will need to copy the rows to a different stream, and then merge or join them again, to make the summary a separate row.
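The order-field trick from the answers above can be sketched in Python to show why the sort puts each subtotal under its group. The field names group and amount, the sample rows and the function name with_subtotals are assumptions for the example.

```python
from collections import defaultdict


def with_subtotals(rows):
    # Detail stream: order = 1, plus a blank subtotal so both streams share the same fields.
    detail = [{**row, "order": 1, "subtotal": None} for row in rows]
    # Summary stream (the Group by copy): one row per group, order = 2.
    totals = defaultdict(int)
    for row in rows:
        totals[row["group"]] += row["amount"]
    summary = [
        {"group": group, "amount": None, "order": 2, "subtotal": total}
        for group, total in totals.items()
    ]
    # Merge the two streams; sorting on (group, order) puts each subtotal last in its group.
    return sorted(detail + summary, key=lambda row: (row["group"], row["order"]))


rows = [
    {"group": "A", "amount": 10},
    {"group": "B", "amount": 5},
    {"group": "A", "amount": 20},
]
result = with_subtotals(rows)
```

Python's sort is stable, so detail rows keep their input order within each group.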

SSRS how to use multiple criteria in expression - based on a row value and a field name

Please look at the image below. My dataset has two processes, 'logs processed' and 'stacked at kilns'.
I need to take the total 'stacked at kilns' and divide it by the total 'logs processed' for each length.
So, for example, for field name 5.4 (dataset field Length), I would like to divide 2784/2283 to return the recovery percentage.
My expression currently is:
=Sum(IIf(
(Fields!process.Value = "Logs Processed") AND (Fields!Length.Value=Fields!Length.Value)
, Fields!cubes.Value
, Nothing)
, "Wetmill_to_Kiln")
But this returns the value of all lengths where process is 'Logs Processed' not for just length 5.4 as per example.
So each length field is dynamically created (3.3,3.6,3.9 .... 6,6.3,6.6)
I would like to get the total for 'stacked at kiln'/'logs processed' for each length field.
any help appreciated as always
example of my desired output in bottom image.
current output:
Desired output:
*****UPDATE AS PER TPHE*********
I have created a text box inside the column group. This returns the value for that group, but how can I reference the value of that text box?
If I use something like ReportItems!tbxSource.Value, how can I reference the value of the text box when it is dynamically created across the column group? There are then multiple instances of that text box name.
With reference to the picture, how do I get the value of the white <> from the text box with the green <>?
Thanks,
Since you are using a column group, you can put your expression into a text box within the group and it will execute on only the data that is captured within each column. So if your code for the Logs processed row is something like Sum(Logs) and your code for the Stacked at Kiln row is something like Sum(Stacked), your expression code for the recovery row would be Sum(Stacked)/Sum(Logs). The key is to make sure that it is within the column group.
So what I got to work was to create two variables on the column group, one called kilntotal and one called logtotal. Each variable's value was equal to the result of one of these expressions:
=sum(iif(Fields!process.Value="logs",cdbl(Fields!cubes.Value),cdbl(0)))
and
=sum(iif(Fields!process.Value="kiln",cdbl(Fields!cubes.Value),cdbl(0)))
I then use these variables in my logic in my recovery % row:
=Variables!kilntotal.Value/Variables!logtotal.Value
Thanks for the input and your time.
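To make the per-column arithmetic concrete, here is a small Python sketch of the same calculation: one kiln total and one log total per length, then their ratio. The 5.4 figures come from the question; the 3.3 figures, the "logs"/"kiln" labels and the function name recovery_by_length are assumptions for illustration.

```python
from collections import defaultdict


def recovery_by_length(rows):
    # One running total per length, mirroring the two column-group variables.
    kiln_total = defaultdict(float)
    log_total = defaultdict(float)
    for row in rows:
        if row["process"] == "kiln":
            kiln_total[row["length"]] += row["cubes"]
        elif row["process"] == "logs":
            log_total[row["length"]] += row["cubes"]
    # Recovery = kiln total / log total, skipping lengths with no logs processed.
    return {
        length: kiln_total[length] / total
        for length, total in log_total.items()
        if total
    }


rows = [
    {"length": "5.4", "process": "logs", "cubes": 2283},
    {"length": "5.4", "process": "kiln", "cubes": 2784},
    {"length": "3.3", "process": "logs", "cubes": 1000},  # hypothetical
    {"length": "3.3", "process": "kiln", "cubes": 900},   # hypothetical
]
recovery = recovery_by_length(rows)
```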