I'm working on a personal project and am very new (learning as I go) to JSON, NiFi, SQL, etc., so forgive any confusing language here or a potentially really obvious solution. I can clarify as needed.
I need to take the JSON output from a website's API call and insert it into a table on the MariaDB server I've set up locally. The issue is that the JSON data is nested, and two of the key pieces of data that I need to insert are used as keys (object names) rather than values, so I don't know how to extract them and put them in the database table. Essentially, I think I need to identify the different pieces of the JSON expression and insert them as values, but I'm clueless about how to do so.
I've played around with the EvaluateJsonPath, SplitJson, and FlattenJson processors in particular, but I can't make it work. All I can ever do is get the result of the whole expression, rather than each piece of it.
{"5381":{"wind_speed":4.0,"tm_st_snp":26.0,"tm_off_snp":74.0,"tm_def_snp":63.0,"temperature":58.0,"st_snp":8.0,"punts":4.0,"punt_yds":178.0,"punt_lng":55.0,"punt_in_20":1.0,"punt_avg":44.5,"humidity":47.0,"gp":1.0,"gms_active":1.0},
"1023":{"wind_speed":4.0,"tm_st_snp":26.0,"tm_off_snp":82.0,"tm_def_snp":56.0,"temperature":74.0,"off_snp":82.0,"humidity":66.0,"gs":1.0,"gp":1.0,"gms_active":1.0},
"5300":{"wind_speed":17.0,"tm_st_snp":27.0,"tm_off_snp":80.0,"tm_def_snp":64.0,"temperature":64.0,"st_snp":21.0,"pts_std":9.0,"pts_ppr":9.0,"pts_half_ppr":9.0,"idp_tkl_solo":4.0,"idp_tkl_loss":1.0,"idp_tkl":4.0,"idp_sack":1.0,"idp_qb_hit":2.0,"humidity":100.0,"gp":1.0,"gms_active":1.0,"def_snp":23.0},
"608":{"wind_speed":6.0,"tm_st_snp":20.0,"tm_off_snp":53.0,"tm_def_snp":79.0,"temperature":88.0,"st_snp":4.0,"pts_std":5.5,"pts_ppr":5.5,"pts_half_ppr":5.5,"idp_tkl_solo":4.0,"idp_tkl_loss":1.0,"idp_tkl_ast":1.0,"idp_tkl":5.0,"humidity":78.0,"gs":1.0,"gp":1.0,"gms_active":1.0,"def_snp":56.0},
"3396":{"wind_speed":6.0,"tm_st_snp":20.0,"tm_off_snp":60.0,"tm_def_snp":70.0,"temperature":63.0,"st_snp":19.0,"off_snp":13.0,"humidity":100.0,"gp":1.0,"gms_active":1.0}}
This is a snapshot of an output with a couple thousand lines. Each of the numeric keys you see above (5381, 1023, 5300, etc.) is a player ID for the stats that follow it. I have a table set up with three columns: Player ID, Stat ID, and Stat Value. For example, I need that first snippet to be inserted into my table as such:
Player ID Stat ID Stat Value
5381 wind_speed 4.0
5381 tm_st_snp 26.0
5381 tm_off_snp 74.0
And so on, for each piece of data. But I don't know how to have NiFi select the right pieces of data to insert in the right columns.
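In MariaDB terms, the table I'm inserting into is essentially this shape (the names and types here are only illustrative):

CREATE TABLE player_stats (
    player_id  VARCHAR(16),
    stat_id    VARCHAR(64),
    stat_value DOUBLE
);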
I believe it's possible to use Jolt to transform your JSON into this format:
[
{"playerId":"5381", "statId":"wind_speed", "statValue": 0.123},
{"playerId":"5381", "statId":"tm_st_snp", "statValue": 0.456},
...
]
then use PutDatabaseRecord with a JSON record reader.
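For what it's worth, here is a rough, untested sketch of such a Jolt spec (two chained shift operations; the playerId/statId/statValue names are just the ones used above). The first shift tags every stat with its player id and stat name, and the second flattens the nested result into a single array:

[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "*": {
          "$1": "&2.&1.playerId",
          "$": "&2.&1.statId",
          "@": "&2.&1.statValue"
        }
      }
    }
  },
  {
    "operation": "shift",
    "spec": {
      "*": {
        "*": "[]"
      }
    }
  }
]

Note that playerId will come through as a string, since it was a key in the source JSON; the JSON reader's schema (or MariaDB itself) should be able to coerce it on insert.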
Another approach is to use the ExecuteGroovyScript processor.
Add a new property to it named SQL.mydb and link it to your DBCP controller service.
Then use the following script as the Script Body property:
import groovy.json.JsonSlurper
import groovy.json.JsonBuilder

def ff = session.get()
if (!ff) return

// read the flow file content and parse it as JSON
def body = ff.read().withReader("UTF-8") { reader ->
    new JsonSlurper().parse(reader)
}

def results = []
// use the defined SQL connection to create a batch
SQL.mydb.withTransaction {
    def cmd = 'insert into mytable(playerId, statId, statValue) values(?,?,?)'
    results = SQL.mydb.withBatch(100, cmd) { statement ->
        // run through all keys/subkeys in the flow file body
        body.each { pid, keys ->
            keys.each { k, v ->
                statement.addBatch(pid, k, v)
            }
        }
    }
}

// write the results as the new flow file content
ff.write("UTF-8") { writer ->
    new JsonBuilder(results).writeTo(writer)
}
// transfer to success
REL_SUCCESS << ff
I am trying to create a new field during indexing; however, the field becomes a header (column) instead of a value when I try to concatenate. What am I doing wrong? I have looked in the docs but can't see what I'm missing.
Would appreciate some help on this.
e.g.
.csv file
Header1,Header2
Value1,121244
transforms.conf
[test_transformstanza]
SOURCE_KEY = fields:Header1,Header2
REGEX = ^(\w+\s+)(\d+)
FORMAT = testresult::$1.$2
WRITE_META = true
fields.conf
[testresult]
INDEXED = True
The regex is good and creates two groups from the data, but why is it creating a new field instead of assigning the value to testresult? If I use testresult::$1 or testresult::$2 alone it works fine, but when concatenating it creates multiple headers with the value as the header name. Is there an easier way to concat fields? For example, if you have a CSV file with header names, can you not just refer to the header names? (I know how to do this using calculated fields, but I want to do it during indexing.)
Thanks
Can anyone help me solve this case?
I have many files to process; two of them are shown in the screenshot below, along with my expected output.
I am using this transformation in Talend: tFileList---tInputExcel---tUnpivotRow---tMap---tPostgresqlOutput
The output is different from my expected output. This is a screenshot of the output.
Can anyone help me get to my expected output, which is shown in the first picture above?
This will be pretty hard. You'd have to handle it as a text file, and whenever you found the "store" value in the first column you'd update your types with that row's values.
Here's how I'd start:
Basically the tJavaFlex begin part would contain:
String col1Type;
...
String colNType;
The main part:
if (input_row.col0.equalsIgnoreCase("store")) {
    col1Type = input_row.col1;
    col2Type = input_row.col2;
    colNType = input_row.colN;
    continue; /* so this record will be ignored by the rest of the components! */
}
output_row.col1Type = col1Type;
output_row.col1Value = Integer.valueOf(input_row.col1);
/* because we have text and need numbers :( */
I think using propagate results will save you from writing down all the other fields.
And from here it would be very simple as you have key-type-value-type-value-type-value results.
Scenario:
I have created a transformation to load data into a table from a CSV file, and I have the following columns in the CSV file:
Customer_Id
Company_Id
Employee_Name
But the user may provide an input file with the columns in a different (random) order, such as:
Employee_Name
Company_Id
Customer_Id
So if I try to load a file that has random column ordering, will Kettle load the correct column values according to the column names?
Using ETL Metadata Injection you can use a transformation like this, to either normalize the data, or to store it to your database:
Then you just need to send the correct data to that transformation. You can read the header line from the CSV, and use Row Normaliser to convert to the format used by ETL Metadata Injection.
I have included a quick example here: csv_inject on Dropbox. If you make something like this and run it from a job that executes it once per CSV file, it should work.
Ooh, that's some nasty JavaScript!
The way to do this is with metadata injection. Look at the samples, but basically you need a template which reads the file and writes it back out. You then use another parent transformation to figure out the headings, configure that template, and then execute it.
There are samples in the PDI samples folder; also take a look at the "figuring out file format" example in Matt Casters' blueprints project on GitHub.
You could try something like this as your JavaScript:
//Script here
var seen;
trans_Status = CONTINUE_TRANSFORMATION;
var col_names = ['Customer_Id','Company_Id','Employee_Name'];
var col_pos;
if (!seen) {
    // First line
    trans_Status = SKIP_TRANSFORMATION;
    seen = 1;
    col_pos = [-1,-1,-1];
    for (var i = 0; i < col_names.length; i++) {
        for (var j = 0; j < row.length; j++) {
            if (row[j] == col_names[i]) {
                col_pos[i] = j;
                break;
            }
        }
        if (col_pos[i] === -1) {
            writeToLog("e", "Cannot find " + col_names[i]);
            trans_Status = ERROR_TRANSFORMATION;
            break;
        }
    }
}
var Customer_Id = row[col_pos[0]];
var Company_Id = row[col_pos[1]];
var Employee_Name = row[col_pos[2]];
Here is the .ktr I tried: csv_reorder.ktr
(Edit: here are the test CSV files)
1.csv:
Customer_Id,Company_Id,Employee_Name
cust1,comp1,emp1
2.csv:
Employee_Name,Company_Id,Customer_Id
emp2,comp2,cust2
Assuming rejecting the input file is not an option, you basically have four solutions:
Reorder the fields in an external editor (don't use Excel if the file contains dates).
Use code within your transformation to detect the column headers and reorder the file.
Use metadata injection as proposed by bolav.
Create a job. This needs to:
a. load the file into a temporary database;
b. use an SQL statement to retrieve the fields, listing the columns in the desired order in the SELECT (see the sketch below);
c. output the file in the correct order.
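A minimal sketch of step b, assuming the file has been bulk-loaded into a staging table named staging_employees (a made-up name for this example):

-- Listing the columns by name in the SELECT fixes their order,
-- regardless of how they were ordered in the incoming file.
SELECT Customer_Id, Company_Id, Employee_Name
FROM staging_employees;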
I need to create an XML file based on a SQL query that I run using PowerShell. I already know the schema for the XML that I need to create. The query results need to be looped through, and I want to add each data value to a specific XML node as per the schema.
I am able to run the query and get the results I need, but I am having issues placing the data in the prescribed format.
Here's an example of how I am trying to accomplish this:
# Parsing the XML template ($xml is the schema I have from the client)
$XmlTemplate = [xml](Get-Content $xml)

# Parsing through the XML template and jumping to the tag I need to enter data into
$PlanIDXML = $XmlTemplate.NpiLink.PlanProvider.PlanID

# Parsing through the XML template and jumping to the tag I need to enter data into
$PlanNameXML = $XmlTemplate.NpiLink.PlanProvider.PlanName

# Sample query:
#   select PlanID, PlanName from plan

# Assuming I ran my query and saved the results as $qryresults
foreach ($result in $qryresults)
{
    $PlanID = $result.PlanID
    $PlanName = $result.PlanName

    # Make a clone
    $NewPlanIDXML = $PlanIDXML.Clone()
    # Make changes to the data
    $NewPlanIDXML = $PlanID
    # Append
    $PlanIDXML.AppendChild($NewPlanIDXML)
    # Do the same thing for PlanName
    $PlanNameXML = $result.PlanName
}
$XmlTemplate.Save('filepath')
My concern is that I need to do this for each plan and plan ID that I get in my query results, so I need to keep generating new PlanID and PlanName tags, appending them to the original nodes, and saving the file.
So, if my query results have 10 plan IDs, it should continue to generate new PlanID tags and PlanName tags.
It's not letting me append (because System.String cannot be converted to System.Xml.XmlNode). I am really stuck, and if you have a better approach to handling this, I am all ears.
Thanks much in advance!!!
You might be overengineering this a bit. If you have a template for the XML node, just treat it as a string, popping your values in at the appropriate place. Generate some array of these nodes as strings, then join them together and save to disk.
Let's say your template looks like this (type in some tokens yourself where generated values should go):
--- Template.xml ---
<Node attr="##VALUE1##">
    <Param>##VALUE2##</Param>
</Node>
And you want to run some query to generate a bunch of these nodes, filling in VALUE1 and VALUE2. Then something like this could work:
$template = (Get-Content .\Template.xml) -join "`r`n"
$val1Token = '##VALUE1##'
$val2Token = '##VALUE2##'

# Run-Query stands in for whatever actually returns your query results
$nodes = foreach ($item in Run-Query)
{
    # simple string replace of the tokens with this row's values
    $result = $template
    $result = $result.Replace($val1Token, $item.Value1)
    $result = $result.Replace($val2Token, $item.Value2)
    $result
}

# you have all nodes in a string array; just join them together along with a parent node
$newXml = (@("<Nodes>") + $nodes + "</Nodes>") -join "`r`n"
$newXml | Out-File .\Results.xml