Populate the content of two files into a final file using Pentaho (different fields) - pentaho

I have two files.
File A has 5 columns:
Sno,name,age,key,checkvalue
File B has 3 columns:
Sno,title,age
I want to merge these two into a final file C, which has:
Sno,name,age,key,checkvalue
I tried renaming "title" to "name" and then used "Add constants" to add the other two fields.
But when I try to merge these, I get the error below:
"
The name of field number 3 is not the same as in the first row received: you're mixing rows with different layout. Field [age String] does not have the same name as field [age String].
"
How do I solve this issue?

After getting the input from file B, use a Select values step to remove the title column. Then use an Add constants step to add the new columns name, key, and checkvalue, with "Set empty string?" set to Y. Finally, do the join accordingly; it won't fail, since both files now have the same set of columns. Hope this helps.
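For readers who find it easier to follow in code, here is a rough pandas sketch of the same reshaping (hypothetical file names; this only mirrors the Pentaho steps described above, it is not a Pentaho artifact):

import pandas as pd

# File B starts with Sno, title, age.
file_b = pd.read_csv('fileB.csv')

# Select values step: drop the title column.
file_b = file_b.drop(columns=['title'])

# Add constants step with "Set empty string?" = Y: add the missing
# columns as empty strings.
for col in ['name', 'key', 'checkvalue']:
    file_b[col] = ''

# With identical column sets (and the same order), the rows of both
# files can be combined without a layout mismatch.
file_a = pd.read_csv('fileA.csv')
file_c = pd.concat([file_a, file_b[file_a.columns]], ignore_index=True)
file_c.to_csv('fileC.csv', index=False)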

Actually, there was an issue with the fields ... a field mismatch. I used the "Rename" option and it got fixed.

Open Refine: Exporting nested XML with templating

I have a question regarding the templating option for XML in Open Refine. Is it possible to export data from two columns in a nested XML structure, if both columns contain multiple values that need to be split first?
Here's an example to better illustrate what I mean. My columns look like this:
Column1: https://d-nb.info/gnd/119119110;https://d-nb.info/gnd/118529889
Column2: Grützner, Eduard von;Elisabeth II., Großbritannien, Königin

Column1: https://d-nb.info/gnd/1037554086;https://d-nb.info/gnd/1245873660
Column2: Müller, Jakob;Meier, Anina
Each value separated by semicolon in Column1 has a corresponding value in Column2 in the right order and my desired output would look like this:
<rootElement>
<recordRootElement>
...
<edm:Agent rdf:about="https://d-nb.info/gnd/119119110">
<skos:prefLabel xml:lang="zxx">Grützner, Eduard von</skos:prefLabel>
</edm:Agent>
<edm:Agent rdf:about="https://d-nb.info/gnd/118529889">
<skos:prefLabel xml:lang="zxx">Elisabeth II., Großbritannien, Königin</skos:prefLabel>
</edm:Agent>
...
</recordRootElement>
<recordRootElement>
...
<edm:Agent rdf:about="https://d-nb.info/gnd/1037554086">
<skos:prefLabel xml:lang="zxx">Müller, Jakob</skos:prefLabel>
</edm:Agent>
<edm:Agent rdf:about="https://d-nb.info/gnd/1245873660">
<skos:prefLabel xml:lang="zxx">Meier, Anina</skos:prefLabel>
</edm:Agent>
...
</recordRootElement>
</rootElement>
(note: in my initial posting, the position of the root element was not indicated and it looked like this:
<edm:Agent rdf:about="https://d-nb.info/gnd/119119110">
<skos:prefLabel xml:lang="zxx">Grützner, Eduard von</skos:prefLabel>
</edm:Agent>
<edm:Agent rdf:about="https://d-nb.info/gnd/118529889">
<skos:prefLabel xml:lang="zxx">Elisabeth II., Großbritannien, Königin</skos:prefLabel>
</edm:Agent>
)
I managed to split the values separated by ";" for both columns like this
{{forEach(cells["Column1"].value.split(";"),v,"<edm:Agent rdf:about=\""+v+"\">"+"\n"+"</edm:Agent>")}}
{{forEach(cells["Column2"].value.split(";"),v,"<skos:prefLabel xml:lang=\"zxx\">"+v+"</skos:prefLabel>")}}
but I can't find out how to nest the split skos:prefLabel elements into the edm:Agent element. Is that even possible? If not, I would work with separate columns or another workaround, but I wanted to make sure there isn't a more direct way first.
Thank you!
Kristina
I am going to expand on the answer from RolfBly, using the Templating Exporter from OpenRefine.
I make the following assumptions:
There is some other column left of Column1 acting as a record-identifying column (see first screenshot).
The columns actually have proper names.
The columns URI and Name are the only columns with multiple values; otherwise we might produce empty XML elements with the following recipe.
We will use the information about records available via GREL to determine whether to write a <recordRootElement> or not.
Recipe:
Split first Name and then URI on the separator ";" via "Edit cells" => "Split multi-valued cells".
Go to "Export" => "Templating..."
In the prefix field use the value
<?xml version="1.0" encoding="utf-8"?>
<rootElement>
Please note that I skipped the namespace imports for edm, skos, rdf and xml.
In the row template field use the value:
{{if(row.index - row.record.fromRowIndex == 0, '<recordRootElement>', '')}}
<edm:Agent rdf:about="{{escape(cells['URI'].value, 'xml')}}">
<skos:prefLabel xml:lang="zxx">{{escape(cells['Name'].value, 'xml')}}</skos:prefLabel>
</edm:Agent>
{{if(row.index - row.record.fromRowIndex == row.record.rowCount - 1, '</recordRootElement>', '')}}
The row separator field should just contain a linebreak.
In the suffix field use the value:
</rootElement>
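If it helps to see the pairing logic outside OpenRefine, here is a rough Python sketch of what the split-then-template recipe computes (illustrative only; OpenRefine does this per record, not through a script, and the sample cells are taken from the question):

# Two parallel, semicolon-separated cells belonging to one record.
column1 = 'https://d-nb.info/gnd/119119110;https://d-nb.info/gnd/118529889'
column2 = 'Grützner, Eduard von;Elisabeth II., Großbritannien, Königin'

# Splitting both cells and zipping keeps each URI next to its label,
# which is what "Split multi-valued cells" achieves at the record level.
for uri, name in zip(column1.split(';'), column2.split(';')):
    print('<edm:Agent rdf:about="{}">'.format(uri))
    print('  <skos:prefLabel xml:lang="zxx">{}</skos:prefLabel>'.format(name))
    print('</edm:Agent>')

The two if() checks in the row template then play the role of the record boundary: they emit <recordRootElement> only on the first row of a record and </recordRootElement> only on the last.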
Disclaimer: If you're keen on using only OpenRefine, this won't be the answer you were hoping for. There may be ways in OR that I don't know of. That said, here's how I would do it.
Edit: The trick is to keep URL and literal side by side on one line. b2m's answer above does just that: split going from right to left, not from left to right. You can then skip steps 2 and 3 to get the result in the image.
Split each column into 2 columns by the separator ";". You'll get 4 columns; 1 and 3 belong together, and 2 and 4 belong together. I'm assuming this is the case consistently in your data.
Export columns 1 and 3 to one file, and columns 2 and 4 to another file, in any convenient format, using the custom tabular exporter.
Concatenate those two files into one single file using an editor (I use Notepad++), or any other method you prefer. Several roads lead to Rome here. The result in OR would be something like this.
You then have all sorts of options to put text strings in front of, between, and after your two columns.
In OR, you could use Transform on the column URL to build your XML using the code below
(note the \n for newline; that's just a line feed, so you may want to use \r\n for carriage return + line feed if you're on Windows).
'<edm:Agent rdf:about="' + value + '">\n<skos:prefLabel xml:lang="zxx">' + cells.Name.value + '</skos:prefLabel>\n</edm:Agent>'
to get your XML in one column, like so
which you can then export using the custom tabular exporter again. Or instead you could use Add column based on this column in a similar manner, if you want to retain your URL column.
You could even do this in the editor without re-importing the file back into OR, but that's beyond the scope of this answer.

How do I use values within a list to specify changing selection conditions and export paths?

I'm trying to split a large csv file using a condition. To automate this process, I'm pulling a list of unique values from a column in the data set and want to use this list within a loop to specify the condition and also to name the export file.
I've converted the array of values into a list and have tried fitting my function into a loop; however, I believe the main problem is syntax.
# df1718 is my df
# znlist is my list of values (e.g. 0 1 2 3 4)
# serial is specified at the top e.g. '4'
for x in znlist:
    dftemps = df1718[(df1718.varname == 'RoomTemperature') & (df1718.zone == x)]
    dftemps.to_csv('E:\\path\\test%d_zone(x).csv', serial)
So in theory, I would like each iteration to export the data relevant to the next zone in the list and the export file to be named test33_zone0.csv (for example). Thanks for any help!
EDIT:
The error I am getting is: "delimiter" must be string, not int
If the error is in saving the file, try this:
dftemps.to_csv('E:\\path\\test{}_zone{}.csv'.format(str(serial),str(x)))
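Putting it together, a minimal runnable sketch of the whole loop (the DataFrame here is made-up stand-in data; in the question, df1718, znlist and serial already exist, and the output path would be the E:\\path\\ one):

import pandas as pd

# Stand-ins for the question's variables.
df1718 = pd.DataFrame({
    'varname': ['RoomTemperature', 'RoomTemperature', 'Humidity'],
    'zone': [0, 1, 0],
    'value': [21.5, 19.8, 40.2],
})
znlist = [0, 1]
serial = 4

for x in znlist:
    dftemps = df1718[(df1718.varname == 'RoomTemperature') & (df1718.zone == x)]
    # Build the path first: the second positional argument of to_csv is the
    # separator, which is why passing serial there raised the
    # '"delimiter" must be string, not int' error.
    out_path = 'test{}_zone{}.csv'.format(serial, x)
    dftemps.to_csv(out_path, index=False)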

Splunk: formatting a csv file during indexing, values are being treated as new columns?

I am trying to create a new field during indexing; however, the fields become columns instead of values when I try to concat. What am I doing wrong? I have looked in the docs and it seems according ..
Would appreciate some help on this.
e.g.
.csv file
Header1, Header2
Value1 ,121244
transforms.conf
[test_transformstanza]
SOURCE_KEY = fields:Header1,Header2
REGEX = ^(\w+\s+)(\d+)
FORMAT = testresult::$1.$2
WRITE_META = true
fields.conf
[testresult]
INDEXED = True
The regex is good and creates two groups from the data, but why is it creating a new field instead of assigning the value to testresult? If I use testresult::$1 or testresult::$2 alone it works fine, but when concatenating it creates multiple headers with the value as the header name. Is there an easier way to concat fields? E.g. if you have a csv file with header names, can you not just refer to the header names? (I know how to do this using calculated fields, but I want to do it during indexing.)
Thanks
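As a sanity check, the capture groups themselves can be exercised outside Splunk. A short Python sketch, with the caveat that the sample string is only an assumption about what SOURCE_KEY hands the transform:

import re

# Assumption: Splunk presents the two field values joined roughly like this.
# Check the SOURCE_KEY = fields:... semantics for your Splunk version.
sample = 'Value1 121244'

m = re.match(r'^(\w+\s+)(\d+)', sample)
if m:
    print(m.group(1))  # -> 'Value1 '
    print(m.group(2))  # -> '121244'
    print(m.group(1) + m.group(2))  # the two groups joined by hand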

Zoho Creator making a custom function for a report

Trying to wrap my head around Zoho Creator; it's not as simple as they make it out to be for building apps. I have an inventory database, and I have four fields that I call to fill a field called Inventory Number (Inv_Num1):
First Name (First_Name)
Last Name (Last_Name)
Year (Year)
Number (Number)
I have a Custom Function script that I call through a Custom Action in the form report. What I am trying to do is upload a CSV file with 900 entries. Of course, not all of those have those values (first/last/number), so I need to bulk edit all of them. However, when I do the bulk edit, the Inv_Num1 field is not updated with the new values. I use the custom action to populate the Inv_Num1 field with the values of the other four fields.
Here is my script:
void onetime.UpdateInv()
{
    for each Inventory_Record in Management
    {
        FN = Inventory_Record.First_Name.subString(0,1);
        LN = Inventory_Record.Last_Name.subString(0,1);
        YR = Inventory_Record.Year.subString(2,4);
        NO = Inventory_Record.Number;
        outputstr = FN + LN + YR + NO;
        Inventory_Record.Inv_Num1 = outputstr;
    }
}
I get this error back when I try to run this function
Error.
Error in executing UpdateInv workflow.
Error in executing For Each Record task.
Error in executing Set Variable task. Unable to update template variable FN.
Error evaluating STRING expression :
Even though there is a First Name, for example, it still thinks there is none. This only happens on the fields I changed with Bulk Edit. If I do each one by hand, then the custom action works, but of course then Inv_Num1 is already updated through my on-success functions, which makes the whole thing moot.
This may be one year late and you might have found the solution already, but just to highlight: the error you were facing was due to a null value in First Name.
You just have to put a null check on each field and you are good to go.
You can also generate the inv_number at the time of bulk uploading by adding the same null checks to the code (just the part inside the loop) and placing it on Add > On Submit.
The better option would be a formula field: put the formula below in the formula field and your inventory number will be autogenerated. You can rename the formula field to Inv Number or whatever you want.
Since you are using subString directly on the Year field, I am assuming the Year field is a string; otherwise you would have to use Year.toString().subString(2,4), and instead of if(Year == "", "", ...) you would have to use if(Year == null, null, ...).
So here's the formula:
if(First_Name == "","",First_Name.subString(0,1)) + if(Last_Name == "","",Last_Name.subString(0,1)) + if(Year == "","",Year.subString(2,4)) + Number
Let me know your response if you implement this.
Without knowing the datatype it's difficult to fix, but making the assumption that your Inventory_Record.Number is a numeric data item, you are adding a string to a number.
The "+" is used for string concatenation (a joiner), but it also adds two numbers together; think "a" + "b" = "ab" for strings, but 1 + 2 = 3 for numbers.
All good, but when you do "a" + 2 the system doesn't know whether to add or concatenate, so it gives an error.
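The same pitfall can be reproduced in most languages that distinguish the two uses of "+"; a quick Python illustration of the idea (not Deluge code):

# Mixing a string and a number without an explicit conversion fails:
# 'a' + 2 raises TypeError: can only concatenate str (not "int") to str

# Converting the number to a string first makes the intent unambiguous:
number = 2
print('a' + str(number))  # -> 'a2'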

Splunk: Extracting multiple fields with the same name

I am using Splunk to index logs with multiple fields with the same name. All fields have the same meaning:
2012-02-22 13:10:00,ip=127.0.0.1,to=email1#example.com,to=email2#example.com
In the automatic extraction for this event, I only get "email1#example.com" extracted for the "to" field. How can I make sure all the values are extracted?
Thanks!
I think adding this to the end of the search may do it:
| extract pairdelim="," kvdelim="=" mv_add=t | table to
(the 'table' is just for demonstration).
So, I think, in 'transforms.conf' (from http://docs.splunk.com/Documentation/Splunk/latest/admin/transformsconf) put:
[my-to-extraction]
DELIMS = ",", "="
MV_ADD = true
and reference it in 'props.conf':
[eventtype::my_custom_eventtype]
REPORT-to = my-to-extraction
where 'eventtype::my_custom_eventtype' could be anything that works as a 'props.conf' specification (<spec> in http://docs.splunk.com/Documentation/Splunk/latest/admin/propsconf).
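For intuition about what MV_ADD = true achieves, here is the same multivalue extraction expressed in plain Python (illustrative only; this is not how Splunk runs the transform):

import re

event = '2012-02-22 13:10:00,ip=127.0.0.1,to=email1#example.com,to=email2#example.com'

# Collect every value of the repeated 'to' key instead of keeping only the
# first match, which is what MV_ADD = true tells the extraction to do.
to_values = re.findall(r'to=([^,]+)', event)
print(to_values)  # -> ['email1#example.com', 'email2#example.com']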