Get segments from an HL7 message - Pentaho

I want to retrieve data from each DG1 or OBX segment of an HL7 message using Pentaho Data Integration. In other words, how can I use the HL7 Input step in Kettle to extract data from all repeated segments such as DG1, PV1, and OBX?

In order to extract a specific HL7 segment from your input, take a look at the following output fields of the HL7 Input step:
StructureName: Yields the ID of the segment.
FieldName: Description of the field according to HL7.
Coordinates: Level within each segment.
To tell repeated segments apart, you have to concatenate StructureName and Coordinates. This can be accomplished with the Calculator step. Afterwards the fields in question need to be extracted: use the Row denormaliser step, which looks up key-value pairs and assigns them to new fields in the output rows.
Let HL7ID be the new field created by the Calculator step as A + B, where A is StructureName and B is Coordinates. Within the Row denormaliser step, HL7ID will be the key field. Enter your desired segments as Key values, following the concatenated value scheme, and set the Value fieldname column to the field containing the data, i.e. the Value output field. Also, the hops leaving the HL7 Input step must copy the data, not distribute it round robin.
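The Calculator + Row denormaliser combination can be sketched in a few lines of Python; the field names and values below are illustrative, not taken from a real message:

```python
# Sketch: build the concatenated HL7ID key (Calculator: A + B), then
# pivot the key/value pairs into a single output record (Row denormaliser).
rows = [
    {"StructureName": "DG1", "Coordinates": "1.3.1", "Value": "I10"},
    {"StructureName": "OBX", "Coordinates": "5.1.1", "Value": "120"},
]

record = {}
for row in rows:
    hl7_id = row["StructureName"] + row["Coordinates"]  # Calculator step
    record[hl7_id] = row["Value"]                       # Row denormaliser pivot

print(record)  # {'DG11.3.1': 'I10', 'OBX5.1.1': '120'}
```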

Related

Single Column name splitting to multiple columns with data

I am analyzing inverter data from a power plant. There are more than 10 inverters, and each inverter has 3 parameters that need to be analyzed: energy generated per interval, AC power P_AC, and DC power P_DC. The inverters are numbered 17.02, 22.03, etc., and the data is recorded at a time step of 5 minutes. After downloading the data as a CSV file, there is only one column in the file. The column name contains the numbers of all the inverters and their parameter names separated by ';', and likewise the data for each time step is in one single cell separated by ';'. I want to analyze all the parameters of all the inverters, and I want to make sure that each parameter of every inverter comes in a separate column. Can somebody help me to split this up? Also, I want to ensure that the columns are sorted in increasing order of inverter number. I am attaching the link to the actual CSV file - https://drive.google.com/file/d/1Rp54DEarzFUGm2oU5Bfkl3karbUYYwcd/view?usp=sharing
https://drive.google.com/file/d/12InL3N-ZMMODGWVUYn_8nTwPgAQtSBzq/view?usp=sharing
In the data frame above, you can see that every column has a project code - 'SM10046 Akadyr Ext' - then the inverter number 'INV 17.02', then the name of the parameter 'Energy generated per interval [kWh]', and lastly the code of the parameter, 'E_INT'. I want the project code removed so that only the inverter number and parameter code remain as the column name. Also, all the inverters should come in serial order.
Essentially you have a multitude of columns, and from your description, you need to sort/analyze data from each plant?
If you need permanent storage of data, I would use SQLite or similar, and convert each plant into a row with a key holding plant ID.
Like this:
2020-07-28 13:33:09;A1;A2;A3;B1;B2;B3
turned into something like this (now in a database, 5 fields per record)
2020-07-28 13:33:09;A;A1;A2;A3
2020-07-28 13:33:09;B;B1;B2;B3
My go-to tool for this would be a scripting language like AutoIt3, Perl or Python, which makes splitting lines and connecting to SQLite trivial.
If you just need real-time sorting/reporting etc, AWK is a perfect tool for this, since you can create sorted arrays very easily. (Perl/Python again of course alternatives as well).
It would be useful if you provided an actual (trivial) example of what you expect the output to be.
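As a starting point, the splitting itself is short in Python. This is a minimal sketch: the header and data row below are illustrative, not taken from the real file, and the regex assumes column names always contain "INV nn.nn" followed eventually by a trailing parameter code:

```python
import re

# Illustrative single-column header and data row, ';'-separated as described.
header = ("SM10046 Akadyr Ext INV 22.03 Energy generated per interval [kWh] E_INT;"
          "SM10046 Akadyr Ext INV 17.02 AC Power [kW] P_AC")
row = "34.1;1200.5"

def short_name(col):
    # Drop the project code: keep only "INV nn.nn" plus the trailing code.
    m = re.search(r"(INV \d+\.\d+).*\b(\w+)$", col)
    return f"{m.group(1)} {m.group(2)}" if m else col

# Pair each value with its cleaned column name and sort by inverter number.
pairs = sorted(zip((short_name(n) for n in header.split(";")), row.split(";")))
print(pairs)
```

From there, writing each pair into SQLite (or a wide CSV) is straightforward.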

Filtering and display unique column pairs in Excel

Follow on from Excel Count unique value multiple columns
I am trying to filter and setup a table containing all the unique combinations of message types.
So with three message types as an example below, I want to create a table with all the possible flows from this.
So every time MessageA exists, it is either followed by a MessageA, MessageB, MessageC or is the last of the sequence.
And every time we see MessageC it is only followed by MessageA.
On the left, is the data and on the right is the desired result.
I want this to be able to scale to multiple columns/rows
You could do it by comparing two offset ranges, A1:D5 and B1:E5
=SUMPRODUCT(($A$1:$D$5=$G2)*($B$1:$E$5=K$1))
As you can see, I have cheated slightly by setting K1 blank so it compares correctly with column E, but this could be made part of a longer formula if it was necessary to have END as the column header for K.
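If the data ever outgrows a worksheet, the same transition count is easy to reproduce outside Excel. A rough Python equivalent of the SUMPRODUCT, assuming each sheet row is a left-to-right sequence of message types (the data below is illustrative):

```python
from collections import Counter

rows = [
    ["MessageA", "MessageB", "MessageC", "MessageA", ""],
    ["MessageC", "MessageA", "MessageA", "", ""],
]

transitions = Counter()
for row in rows:
    for cur, nxt in zip(row, row[1:]):  # compare each cell with its right neighbour
        if cur:                          # skip empty padding cells
            transitions[(cur, nxt or "END")] += 1

print(transitions[("MessageC", "MessageA")])  # 2
```

The empty string plays the same role as the blank K1 in the formula: it marks the end of a sequence.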

How can I map each specific row value to an ID in Pentaho?

I’m new to Pentaho and I’m currently having an issue with mapping specific row values to an ID.
I have a data file with around 30 columns, one of which is for currencies (USD, GBP, AUD, etc).
The main objective is to have the user select up to 8 (minimum of 1) currencies and map them to a corresponding ID 1-8. All other currencies not in the specified 8 will be mapped with an ID of 9.
The final step is to then output the original data set, along with the IDs.
I’m pretty sure I’m making this way harder than it should be, but here is what I have at the moment.
I have created a job where the first step is to set the variables for my 8 currencies, selectionOne -> AUD, selectionTwo -> GBP, …, selectionEight -> JPY.
I then have a transformation to read the data from the file and use the copy rows to result step.
Following that I have a second job called for-each which is my loop for checking the current currency in the row.
Within this job I have two transformations, one called set-current, one called map-currencies.
set-current simply uses the get rows from result step (to grab the data from the first transformation). I then use the set variable step to set the current currency to the value in currency field. This works fine, as each pass through in the loop changes the current variable to the correct value.
Map-currencies is where I’m having the most issues.
The goal is to use the filter row step to compare the current currency against the original 8 selected currencies, and then using the value mapper step to map it to an ID, before outputting the csv file.
The main issue here, is that I can’t use my original variables in the filter or value mapper.
So, what I’ve done here is use the get variables step to retrieve the variables and named them: one, two, three, …, eight. This allows me to bypass the filtering issue, but they don’t seem to work for the value mapper, which is the all-important step.
The second issue is that when the file is output it only outputs one value (because of the loop), selecting the append option works, but this could be a problem if the job is run more than once.
However, the priority here is the mapping issue.
I understand that this is rather long, and perhaps a tad confusing, but I will greatly appreciate any help on this, even if it’s an entirely new approach 😊.
Like I said, I’m probably making it harder than it should be.
Thanks for your time.
Edit for AlainD
Input example
Output example
This should be doable in a single transformation using the Stream Lookup step.
Text File Input is your main file, Property input reads your property file into Key and Value columns. You could use a normal text file with two columns instead of the property input.
Below are the settings of the Stream lookup. Note the default value "9" for records that are not found in the lookup stream.
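The Stream lookup behaviour can be sketched as a dictionary lookup with a default; the currency-to-ID pairs below stand in for your property file and are purely illustrative:

```python
# Key/value pairs as read by the Property input step (illustrative).
currency_ids = {"AUD": 1, "GBP": 2, "USD": 3, "JPY": 4}

rows = [{"currency": "GBP"}, {"currency": "CHF"}]
for row in rows:
    # Stream lookup: matched currencies get their ID, everything else gets 9.
    row["id"] = currency_ids.get(row["currency"], 9)

print(rows)  # [{'currency': 'GBP', 'id': 2}, {'currency': 'CHF', 'id': 9}]
```

Because the lookup table is data rather than variables, changing the 8 selected currencies only means editing the property file, not the transformation.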

Pentaho compare values from table to a number from REST api

I need to make a dimension for a datawarehouse using pentaho.
I need to compare a number in a table with the number I get from a REST call.
If the number is not in the table, I need to set it to a default (999). I was thinking of using a Table input step with a select statement, and a JavaScript step that sets the result to 999 if it is null. The problem is that if there is no result, nothing is passed through at all. How can this be done? Another idea was to get all values from that table and somehow convert them into something I can read as an array in JavaScript. I'm very new to Pentaho DI; I've done some research but couldn't find what I was looking for. Does anyone know how to solve this? If you need more information, or want to see my transformation, let me know!
Steps something like this:
Load number from api
Get Numbers from table
A) If number not in table -> set number to value 999
B) If number is in table -> do nothing
Continue with transformation with that number
I have this atm:
But the problem is if the number is not in the table, it returns nothing. I was trying to check in javascript if number = null or 0 then set it to 999.
Thanks in advance!
Replace the Input rain-type table with a lookup stream.
You read the main input with a REST step and the dimension table with a Table input step, then use a Stream lookup in which you specify the dimension Table input as the lookup step. In this step you can also specify a default value of 999.
The Stream lookup works like this: for each row coming in from the main stream, the step checks whether it exists in the reference step and adds the reference fields to the row. So for every incoming row, exactly one row passes through.
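In miniature, the lookup with its 999 default behaves like this (the numbers below are illustrative, not from your dimension table):

```python
# Keys as loaded from the dimension table by the Table input step.
dimension = {10, 20, 30}

def lookup(number):
    # Stream lookup semantics: every row passes through exactly once;
    # a number missing from the table comes out as the default, 999.
    return number if number in dimension else 999

print([lookup(n) for n in (20, 42)])  # [20, 999]
```

This is why no JavaScript null-check is needed: the row is never swallowed, only its value is defaulted.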

Generate consecutive rows in Pentaho

How do I generate consecutive rows in Pentaho Spoon?
I have a text file and I am using "Sample Rows" step to select every third line from the text file. But the problem with the "Sample Rows" is that I have to manually type "3,6,9,12....".
Is there a better way to do this. I tried adding the field name from "Add Sequence" step, but it doesn't read.
Attached Image
You can add a counter using the Add Sequence step and setting the Maximum value to 3.
This will create a new field, integer, with values 1,2,3,1,2,3,...
Then, a Filter Rows step can be used on the condition that the field must equal 3, and only every 3rd row will pass to the output of the filter rows step.
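The Add Sequence + Filter Rows combination boils down to a cycling counter; a minimal sketch with made-up row data:

```python
lines = ["row1", "row2", "row3", "row4", "row5", "row6", "row7"]

# enumerate plays the role of Add Sequence (counter restarting is not even
# needed here: i % 3 == 0 is equivalent to the 1,2,3 cycle hitting 3).
kept = [line for i, line in enumerate(lines, start=1) if i % 3 == 0]
print(kept)  # ['row3', 'row6']
```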
If I understood the issue correctly:
You can use a separate table or file that holds the input configuration for your transformations and jobs.
That way you don't need to enter 3, 6, 9, etc. manually; the values will be read from the input table or file.