Extract key value pair from JSON string dynamically (Redshift) - sql

I am working with a column which stores the data for camera effects usage in a JSON string.
The values inside it look something like this:
{"camera": {"GIFs": ["floss_dance", "kermit"], "filters": ["blur"], "GIF_count": 2, "Filter_count": 1}}
If I want to extract data for GIFs, I use this code:
json_extract_path_text(camera_effects, 'camera', 'GIFs') which will yield the result as ["floss_dance", "kermit"]
If I want to extract particular GIF names, I use json_extract_array_element_text(camera_effects, 'camera', 'GIFs') which will give me floss_dance and kermit in separate rows.
My issue is that we keep adding new effects to our camera, which means the number of elements in the 'camera' array will keep increasing and using the json_extract_path_text(camera_effects, 'camera', 'name_of_effect') is not a dynamic code.
Is there a way to extract a list of all key:value pairs that exist for 'camera' and then use the keys as a column, and values as rows?
Note: I am using Redshift SQL

Related

Open Refine: Exporting nested XML with templating

I have a question regarding the templating option for XML in Open Refine. Is it possible to export data from two columns in a nested XML-structure, if both columns contain multiple values, that need to be split first?
Here's an example to illustrate better what I mean. My columns look like this:
Column1
Column2
https://d-nb.info/gnd/119119110;https://d-nb.info/gnd/118529889
Grützner, Eduard von;Elisabeth II., Großbritannien, Königin
https://d-nb.info/gnd/1037554086;https://d-nb.info/gnd/1245873660
Müller, Jakob;Meier, Anina
Each value separated by semicolon in Column1 has a corresponding value in Column2 in the right order and my desired output would look like this:
<rootElement>
<recordRootElement>
...
<edm:Agent rdf:about="https://d-nb.info/gnd/119119110">
<skos:prefLabel xml:lang="zxx">Grützner, Eduard von</skos:prefLabel>
</edm:Agent>
<edm:Agent rdf:about="https://d-nb.info/gnd/118529889">
<skos:prefLabel xml:lang="zxx">Elisabeth II., Großbritannien, Königin</skos:prefLabel>
</edm:Agent>
...
</recordRootElement>
<recordRootElement>
...
<edm:Agent rdf:about="https://d-nb.info/gnd/1037554086">
<skos:prefLabel xml:lang="zxx">Müller, Jakob</skos:prefLabel>
</edm:Agent>
<edm:Agent rdf:about="https://d-nb.info/gnd/1245873660">
<skos:prefLabel xml:lang="zxx">Meier, Anina</skos:prefLabel>
</edm:Agent>
...
</recordRootElement>
<rootElement>
(note: in my initial posting, the position of the root element was not indicated and it looked like this:
<edm:Agent rdf:about="https://d-nb.info/gnd/119119110">
<skos:prefLabel xml:lang="zxx">Grützner, Eduard von</skos:prefLabel>
</edm:Agent>
<edm:Agent rdf:about="https://d-nb.info/gnd/118529889">
<skos:prefLabel xml:lang="zxx">Elisabeth II., Großbritannien, Königin</skos:prefLabel>
</edm:Agent>
)
I managed to split the values separated by ";" for both columns like this
{{forEach(cells["Column1"].value.split(";"),v,"<edm:Agent rdf:about=\""+v+"\">"+"\n"+"</edm:Agent>")}}
{{forEach(cells["Column2"].value.split(";"),v,"<skos:prefLabel xml:lang=\"zxx\">"+v+"</skos:prefLabel>")}}
but I can't find out how to nest the splitted skos:prefLabel into the edm:Agent element. Is that even possible? If not, I would work with seperate columns or another workaround, but I wanted to make sure, if there's a more direct way before.
Thank you!
Kristina
I am going to expand the answer from RolfBly using the Templating Exporter from OpenRefine.
I do have the following assumptions:
There is some other column left of Column1 acting as record identifying column (see first screenshot).
The columns actually have some proper names
The columns URI and Name are the only columns with multiple values. Otherwise we might produce empty XML elements with the following recipe.
We will use the information about records available via GREL to determine whether to write a <recordRootElement> or not.
Recipe:
Split first Name and then URI on the separator ";" via "Edit cells" => "Split multi-valued cells".
Go to "Export" => "Templating..."
In the prefix field use the value
<?xml version="1.0" encoding="utf-8"?>
<rootElement>
Please note that I skipped the namespace imports for edm, skos, rdf and xml.
In the row template field use the value:
{{if(row.index - row.record.fromRowIndex == 0, '<recordRootElement>', '')}}
<edm:Agent rdf:about="{{escape(cells['URI'].value, 'xml')}}">
<skos:prefLabel xml:lang="zxx">{{escape(cells['Name'].value, 'xml')}}</skos:prefLabel>
</edm:Agent>
{{if(row.index - row.record.fromRowIndex == row.record.rowCount - 1, '</recordRootElement>', '')}}
The row separator field should just contain a linebreak.
In the suffix field use the value:
</rootElement>
Disclaimer: If you're keen on using only OpenRefine, this won't be the answer you were hoping for. There may be ways in OR that I don't know of. That said, here's how I would do it.
Edit The trick is to keep URL and literal side by side on one line. b2m's answer below does just that: go from right to left splitting, not from left to right. You can then skip steps 2 and 3, to get the result in the image.
split each column into 2 columns by separator ;. You'll get 4 columns, 1 and 3 belong together, and 2 and 4 belong together. I'm assuming this will be the case consistently in your data.
export 1 and 3 to a file, and export 2 and 4 to another file, of any convenient format, using the custom tabular exporter.
concatenate those two files into one single file using an editor (I use Notepad++), or any other method you may prefer. Several ways to Rome here. Result in OR would be something like this.
You then have all sorts of options to put text strings in front, between and after your two columns.
In OR, you could use transform on column URL to build your XML using the below code
(note the \n for newline, that's probably just a line feed, you may want to use \r\n for carriage return + line feed if you're using Windows).
'<edm:Agent rdf:about="' + value + '">\n<skos:prefLabel xml:lang="zxx">' + cells.Name.value + '</skos:prefLabel>\n</edm:Agent>'
to get your XML in one column, like so
which you can then export using the custom tabular exporter again. Or instead you could use Add column based on this column in a similar manner, if you want to retain your URL column.
You could even do this in the editor without re-importing the file back into OR, but that's beyond the scope of this answer.

Can't figure out how to insert keys and values of nested JSON data into SQL rows with NiFi

I'm working on a personal project and very new (learning as I go) to JSON, NiFi, SQL, etc., so forgive any confusing language used here or a potentially really obvious solution. I can clarify as needed.
I need to take the JSON output from a website's API call and insert it into a table in my MariaDB local server that I've set up. The issue is that the JSON data is nested, and two of the key pieces of data that I need to insert are used as variable key objects rather than values, so I don't know how to extract it and put it in the database table. Essentially, I think I need to identify different pieces of the JSON expression and insert them as values, but I'm clueless how to do so.
I've played around with the EvaluateJSON, SplitJSON, and FlattenJSON processors in particular, but I can't make it work. All I can ever do is get the result of the whole expression, rather than each piece of it.
{"5381":{"wind_speed":4.0,"tm_st_snp":26.0,"tm_off_snp":74.0,"tm_def_snp":63.0,"temperature":58.0,"st_snp":8.0,"punts":4.0,"punt_yds":178.0,"punt_lng":55.0,"punt_in_20":1.0,"punt_avg":44.5,"humidity":47.0,"gp":1.0,"gms_active":1.0},
"1023":{"wind_speed":4.0,"tm_st_snp":26.0,"tm_off_snp":82.0,"tm_def_snp":56.0,"temperature":74.0,"off_snp":82.0,"humidity":66.0,"gs":1.0,"gp":1.0,"gms_active":1.0},
"5300":{"wind_speed":17.0,"tm_st_snp":27.0,"tm_off_snp":80.0,"tm_def_snp":64.0,"temperature":64.0,"st_snp":21.0,"pts_std":9.0,"pts_ppr":9.0,"pts_half_ppr":9.0,"idp_tkl_solo":4.0,"idp_tkl_loss":1.0,"idp_tkl":4.0,"idp_sack":1.0,"idp_qb_hit":2.0,"humidity":100.0,"gp":1.0,"gms_active":1.0,"def_snp":23.0},
"608":{"wind_speed":6.0,"tm_st_snp":20.0,"tm_off_snp":53.0,"tm_def_snp":79.0,"temperature":88.0,"st_snp":4.0,"pts_std":5.5,"pts_ppr":5.5,"pts_half_ppr":5.5,"idp_tkl_solo":4.0,"idp_tkl_loss":1.0,"idp_tkl_ast":1.0,"idp_tkl":5.0,"humidity":78.0,"gs":1.0,"gp":1.0,"gms_active":1.0,"def_snp":56.0},
"3396":{"wind_speed":6.0,"tm_st_snp":20.0,"tm_off_snp":60.0,"tm_def_snp":70.0,"temperature":63.0,"st_snp":19.0,"off_snp":13.0,"humidity":100.0,"gp":1.0,"gms_active":1.0}}
This is a snapshot of an output with a couple thousand lines. Each of the numeric keys that you see above (5381, 1023, 5300, etc) are player IDs for the following stats. I have a table set up with three columns: Player ID, Stat ID, and Stat Value. For example, I need that first snippet to be inserted into my table as such:
Player ID Stat ID Stat Value
5381 wind_speed 4.0
5381 tm_st_snp 26.0
5381 tm_off_snp 74.0
And so on, for each piece of data. But I don't know how to have NiFi select the right pieces of data to insert in the right columns.
I believe that it's possible to use jolt to transform your json into a format:
[
{"playerId":"5381", "statId":"wind_speed", "statValue": 0.123},
{"playerId":"5381", "statId":"tm_st_snp", "statValue": 0.456},
...
]
then use PutDatabaseRecord with json reader.
Another approach is to use ExecuteGroovyScript processor.
Add new parameter to it with name SQL.mydb and link it to your DBCP controller service
And use the following script as Script Body parameter:
import groovy.json.JsonSlurper
import groovy.json.JsonBuilder
def ff=session.get()
if(!ff)return
//read flow file content and parse it
def body = ff.read().withReader("UTF-8"){reader->
new JsonSlurper().parse(reader)
}
def results = []
//use defined sql connection to create a batch
SQL.mydb.withTransaction{
def cmd = 'insert into mytable(playerId, statId, statValue) values(?,?,?)'
results = SQL.mydb.withBatch(100, cmd){statement->
//run through all keys/subkeys in flow file body
body.each{pid,keys->
keys.each{k,v->
statement.addBatch(pid,k,v)
}
}
}
}
//write results as a new flow file content
ff.write("UTF-8"){writer->
new JsonBuilder(results).writeTo(writer)
}
//transfer to success
REL_SUCCESS << ff

How to show Feature Names in Graphviz?

I'm building a tree in Graphviz and I can't seem to be able to get the feature names to show up, I have defined a list with the feature names like so:
names = list(df.columns.values)
Which prints:
['Gender',
'SuperStrength',
'Mask',
'Cape',
'Tie',
'Bald',
'Pointy Ears',
'Smokes']
So the list is being created, later I build the tree like so:
export_graphviz(tree, out_file=ddata, filled=True, rounded=True, special_characters=False, impurity=False, feature_names=names)
But the final image still has the feature names listed like X[]:
How can I get the actual feature names to show up? (Cape instead of X[3], etc.)
I can only imagine this has to do with passing the names as an array of the values. It works fine if you pass the columns directly:
export_graphviz(tree, out_file=ddata, filled=True, rounded=True, special_characters=False, impurity=False, feature_names=df.columns)
If needed, you can also slice the columns:
export_graphviz(tree, out_file=ddata, filled=True, rounded=True, special_characters=False, impurity=False, feature_names=df.columns[5:])

How to Extract the value of resultSet returned from JDBC response (Via MEL) Mule ESB

I have JDBC where I'm calling the stored Procedure, It is returning the response as below, But I'm pretty not sure how to extract the value of result set
Please find the response from DB
{updateCount1=4,resultSet1=[{XML_F5RYI-11YTR=<Customers><Customer1>John<Customer1><Customer2>Ganesh<Customer2><Customers>}],resultSet2[{SequenceNumber=94}],updateCount2=1, updateCount3=4}
I have used the this expression #[message.payload.get(0)], It has return the ResultSet as below, But not exactly value required. I need to take the xml value of XML_F5RYI-11YTR.
{XML_F5RYI-11YTR=<Customers><Customer1>John<Customer1><Customer2>Ganesh<Customer2><Customers>}
Also tried like below
#[message.payload.get(0).XML_F5RYI-11YTR] but getting error , not able to extract the xml.
Could you please suggest how can I extract the xml from the ResultSet1
In most cases, the way you did it should work. I think what is happening here is that the hyphen in the column name is interpreted by the MEL parser as a subtraction. So you could change yours to this syntax, and it should work:
#[message.payload.get(0)['XML_F5RYI-11YTR']]
Also you can omit "message", as payload is resolvable directly:
#[payload.get(0)['XML_F5RYI-11YTR']]
You could use array bracket syntax to access the first row in the result set, instead of the get method:
#[payload[0]['XML_F5RYI-11YTR']]
Finally, you might want to do something for each row returned from the database. If you use a collection-splitter or a for-each, your payload will be the map that represents the row, instead of a list of maps representing the whole result set:
<collection-splitter />
<logger message="#[payload['XML_F5RYI-11YTR']]" />
EDIT
To access the result set in the payload shown in the question, you would need to access it like so:
#[payload.resultSet1[0]['XML_F5RYI-11YTR']]
The database connector gives you a list of maps. The map keys will be the name of the columns. Therefore if you want to get updateCount1, you can use something like this:
#[payload.get('updateCount1')]"
Thump rule - you database connector gives you list of map, not sure what format does it is carry, if you want XML_F5RYI.. value then do the below
[message.payload.get(0)] convert it to json or map from which #[message.payload.get("XML_F5RYI-11YTR")]

Generate dynamic hstore key calls in Prawn

I have an hstore column that I'm using to build a table in Prawn (pdf builder). The data will consist of records for a given month. Since it is hstore, the keys used will likely change from day to day so this needs to be dynamic.
I need to determine:
1 What unique keys are used that month
I created a helper to find the unique keys that were used in the month. These will be used as column headers.
keys(#users_logs)
# this returns an array like - ["XC", "PIC", "Mountain"]
The table will display a users dutylog data for the month. For testing...If I explicitly call known hstore keys...the data displays correctly. But, since its hstore...I wont know what the table column will be in production.
For testing, I call known hstore keys...this creates the prawn table row data per duty log.
#users_logs.map do |dutylog|
[ dutylog.properties["XC"],
dutylog.properties["PIC"],
dutylog.properties["Mountain"]
]
end
But, since this is hstore...I wont know what keys to call in production. So, I need to make the above iteration dynamic.
I tried, without success, to iterate over each dutylog entry, then iterate over each unique key and output one "dutylog.properties[x]" call for each key value...but, this just outputs the array of key values. I tried using send() in the block, but that didnt help.
#users_logs.map do |dutylog|
[ keys(#users_logs).each { |k| dutylog.properties[k] }.join(",") ]
end
Any ideas on how I could make the "dutylog.properties[k]" dynamic?
Took some head scratching...but turning out to be quit easy
This will build the rows for the Prawn table
def hstore_duty_log_rows
[keys(#users_logs)] +
#users_logs.map do |dutylog|
keys(#users_logs).map { |key| dutylog.properties.keys.include?(key) ? "#{dutylog.properties[key]}" : "0" }
end
end