Elasticsearch bulk API using pandas

I have a dataframe that I can import into Elasticsearch without any problem, but each row is created as a new document. I want to use the destination number as the document ID and update that document with additional records.
from StringIO import StringIO
import pandas as pd
u_cols = ["destination", "status", "batchid", "dateint", "message", "senddate"]
audit_trail = StringIO('''
918968400000 | DELIVRD | abcd_xyz-e89a4ebd3729675c | 20150226103700 | "some company is advertising here" | 2015-04-02 13:12:18
918968400000 | DELIVRD | efgh_xyz-e89a4ebd3729675c | 20160226103700 | "some company is advertising here" | 2016-04-02 13:12:18
8918968400000 | FAILED | abcd_xyz-e89a4ebd3729675c | 20150826103700 | "some company is advertising here" | 2015-08-02 13:12:18
8918968400000 | DELIVRD | xyz_abc-e89a4ebd3729675c | 20140226103700 | "some company is advertising here" | 2014-04-02 13:12:18
918968400000 | FAILED | abcd_pqr-e89a4ebd3729675c | 20150221103700 | "some company is advertising here" | 2015-04-21 13:12:18
''')
df11 = pd.read_csv(audit_trail, sep="|", names = u_cols )
import json
tmp = df11.to_json(orient = "records")
df_json= json.loads(tmp)
mylist=[]
for doc in df_json:
    action = { "_index": "myindex3", "_type": "myindex1type", "_source": doc }
    mylist.append(action)
import elasticsearch
from elasticsearch import helpers
es = elasticsearch.Elasticsearch('http://23.23.186.196:9200')
helpers.bulk(es, mylist)
In the above case I expect only 2 documents: one document with ID 918968400000 holding 3 records, and the other with ID 8918968400000 holding only 2 records. These records will be nested something like this...
doc={"campaigns" : [{"status": "FAILED", "batchid": "abcd_xyz-e89a4ebd3729675c", "dateint": 20150826103700, "message" : "some company is advertising here", "senddate": "2015-08-02 13:12:18"},
{"status": "DELIVRD", "batchid": "xyz_abc-e89a4ebd3729675c", "dateint": 20140226103700, "message" : "some company is advertising here", "senddate": "2014-04-02 13:12:18" }]}
res = es.index(index="test-index", doc_type='tweet', id=8918968400000, body=doc)
I need the pandas dataframe to use the bulk API to insert the data as shown above. Is it possible?
Update
I changed the op type from index to update. This stores all the fields that I need, but it does not store them as nested objects.
mylist = []
for doc in df_json:
    # one partial document per row, applied as an upsert
    mydoc = {}
    mydoc['doc'] = doc
    mydoc['doc_as_upsert'] = True
    action = { "_index": "myindex9", "_type": "myindex1type", "_id": doc['destination'], "_op_type": "update", "_source": mydoc }
    mylist.append(action)
Is there any way to store rows as nested objects?
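For reference, here is a minimal sketch (not from the original post) of one way to get the nested shape: group the dataframe by destination first, build one document per destination with a campaigns array, and send those as bulk upserts. The index and type names are the ones used above; the grouping and upsert wiring are my own assumption, not a confirmed solution.
# Sketch: one nested document per destination, sent as a bulk upsert.
# Assumes df11, es and the helpers import from the snippets above;
# "campaigns" is the field name from the expected document shown earlier.
nested_actions = []
for destination, group in df11.groupby("destination"):
    campaigns = group.drop("destination", axis=1).to_dict(orient="records")
    nested_actions.append({
        "_op_type": "update",
        "_index": "myindex9",
        "_type": "myindex1type",
        "_id": destination,
        "doc": {"campaigns": campaigns},
        "doc_as_upsert": True,
    })
helpers.bulk(es, nested_actions)
Note that doc_as_upsert replaces the campaigns array on each run, so this yields the two nested documents expected above when the whole dataframe is sent in one pass; appending rows to an already indexed document would need a scripted update instead.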

Related

Splunk: Extracting the elements from JSON structure as separate fields

In Splunk, I'm trying to extract the key value pairs inside the "tags" element of the JSON structure so that each one of them becomes a separate column I can search through.
For example:
| spath data | rename data.tags.EmailAddress AS Email
This does not help though, and the Email field comes back empty. I'm trying to do this for all the tags. Any thoughts/pointers?
{
"timestamp": "2021-10-26T18:23:05.180707Z",
"data": {
"tags": [
{
"key": "Email",
"value": "john.doe#example.com"
},
{
"key": "ProjectCode",
"value": "ABCD"
},
{
"key": "Owner",
"value": "John Doe"
}
]
},
"field1": "random1",
"field2": "random2"
}
I think this does what you want:
| spath data.tags{}
| mvexpand data.tags{}
| spath input=data.tags{}
| table key value
| transpose header_field=key
| fields - column
How it works:
| spath data.tags{} takes the json and creates a multi value field that contains each item in the tags array
| mvexpand data.tags{} splits the multi value field into individual events - each one contains one of the items in the tags array
| spath input=data.tags{} takes the json in each event and makes a field for each KVP in that item (key and value in this case)
| table key value limits further commands to these two fields
| transpose header_field=key makes a field for each value of the key field (including one for the field named column)
| fields - column removes the column field from the output
Here is a fully runnable example:
| makeresults
| eval _raw="
{
\"timestamp\": \"2021-10-26T18:23:05.180707Z\",
\"data\": {
\"tags\": [
{\"key\": \"Email\", \"value\": \"john.doe#example.com\"},
{\"key\": \"ProjectCode\", \"value\": \"ABCD\"},
{\"key\": \"Owner\", \"value\": \"John Doe\"}
]
},
\"field1\": \"random1\",
\"field2\": \"random2\"
}
"
| spath data.tags{}
| mvexpand data.tags{}
| spath input=data.tags{}
| table key value
| transpose header_field=key
It creates this output:
+----------------------+-------------+----------+
| Email | ProjectCode | Owner |
+----------------------+-------------+----------+
| john.doe#example.com | ABCD | John Doe |
+----------------------+-------------+----------+

BigQuery JSON Field Extraction

I have the following JSON payload stored in a single string column in a BQ table.
{
"customer" : "ABC Ltd",
"custom_fields" : [
{
"name" : "DOB",
"value" : "2000-01-01"
},
{
"name" : "Account_Open_Date",
"value" : "2019-01-01"
}
]
}
I am trying to figure out how I can extract the custom_fields name/value pairs as columns, something like the following:
| Customer.name | Customer.DOB | Customer.Account_Open_Date |
| ABC Ltd | 2000-01-01 | 2019-01-01 |
You can use JSON functions, such as
JSON_EXTRACT(json_string_expr, json_path_string_literal)
In your case it will be
SELECT
JSON_EXTRACT(json_text, '$.customer') as Customer_Name,
JSON_EXTRACT(json_text, '$.custom_fields[0].value') as Customer_DOB,
JSON_EXTRACT(json_text, '$.custom_fields[1].value') as Customer_Account_Open_Date
FROM your_table -- replace with your table containing the json_text column

Match each using Examples from Scenario Outline

I find using * match each response.xyz super powerful for merging object structure testing with content testing. Is there a way to use it with Examples tables and <placeholder>?
I've got something like this, that I want to use an Examples table on:
* match each response.Services ==
"""
{
"ServiceId" : #present,
"Name" : <Name>,
"Description" : #present,
"InActive" : <Inactive>,
}
"""
Examples:
| ClientId | Name | Status | ErrorCode | Inactive |
| 400152 | "Foxtrot" | 200 | 0 | false |
| 400152 | "Waltz" | 200 | 0 | false |
I get
"Services": [
{
"ServiceId": 3,
"Name": "Waltz",
"Description": "Waltzing like Matilda",
"InActive": false,
},
{
"ServiceId": 4,
"Name": "Foxtrot",
"Description": "",
"InActive": false,
},
back as a response.
Obviously, when I use multiple rows in Examples:, it results in several tests.
What I'm looking for is to test each object in the array against predefined values, without knowing what order they'll show up in, while still using the ordered approach that tables provide.
Instead of each, try this:
* match response.Services contains
"""
{
"ServiceId" : #present,
"Name" : <Name>,
"Description" : #present,
"InActive" : <Inactive>,
}
"""
EDIT: okay, an alternate option. By the way there are at least 5 different ways I can think of :P
Scenario:
* table data
| ClientId | Name | Status | ErrorCode | Inactive |
| 400152 | "Foxtrot" | 200 | 0 | false |
| 400152 | "Waltz" | 200 | 0 | false |
* def expected = karate.map(data, function(x){ return { ServiceId: '#present', Name: x.Name, Description: '#present', InActive: x.Inactive} })
* match response.Services contains expected
EDIT2: if you can control the whole table:
Scenario:
* table expected
| Name | InActive | ServiceId | Description |
| "Foxtrot" | false | '#present' | '#present' |
| "Waltz" | false | '#present' | '#present' |
* match response.Services contains expected

How to parse JSON metrics array in Splunk

I receive JSON from API in the following format:
[
{
"scId": "000DD2",
"sensorId": 2,
"metrics": [
{
"s": 5414,
"dateTime": "2018-02-02T13:03:30+01:00"
},
{
"s": 5526,
"dateTime": "2018-02-02T13:04:56+01:00"
},
{
"s": 5631,
"dateTime": "2018-02-02T13:06:22+01:00"
}
]
}, .... ]
I am currently trying to display these metrics on a linear chart, with dateTime for the X-axis and "s" for the Y-axis.
I use the following search query:
index="main" source="rest://test3" | spath input=metrics{}.s| mvexpand metrics{}.s
| mvexpand metrics{}.dateTime | rename metrics{}.s as s
| rename metrics{}.dateTime as dateTime| table s,dateTime
And I receive the data in the following format, which is not suitable for a linear chart. The point is: how to correctly parse the JSON so that the date-time from the dateTime field in the JSON is applied to _time in Splunk.
Query results
@Max Zhylochkin,
Can you please try following search?
index="main" source="rest://test3"
| spath input=metrics{}.s
| mvexpand metrics{}.s
| mvexpand metrics{}.dateTime
| rename metrics{}.s as s
| rename metrics{}.dateTime as dateTime
| table s,dateTime
| eval _time = strptime(dateTime,"%Y-%m-%dT%H:%M:%S.%3N")
Thanks

How to extract a repeated nested field from json string and join with existing repeated nested field in bigquery

I have a table with one nested repeated field called article_id and a string field that contains a json string.
Here is the schema of my table:
Here is an example row of the table:
[
{
"article_id": "2732930586",
"author_names": [
{
"AuN": "h kanahashi",
"AuId": "2591665239",
"AfN": null,
"AfId": null,
"S": "1"
},
{
"AuN": "t mukai",
"AuId": "2607493793",
"AfN": null,
"AfId": null,
"S": "2"
},
{
"AuN": "y yamada",
"AuId": "2606624579",
"AfN": null,
"AfId": null,
"S": "3"
},
{
"AuN": "k shimojima",
"AuId": "2606600298",
"AfN": null,
"AfId": null,
"S": "4"
},
{
"AuN": "m mabuchi",
"AuId": "2606138976",
"AfN": null,
"AfId": null,
"S": "5"
},
{
"AuN": "t aizawa",
"AuId": "2723380540",
"AfN": null,
"AfId": null,
"S": "6"
},
{
"AuN": "k higashi",
"AuId": "2725066679",
"AfN": null,
"AfId": null,
"S": "7"
}
],
"extra_informations": "{
\"DN\": \"Experimental study for improvement of crashworthiness in AZ91 magnesium foam controlling its microstructure.\",
\"S\":[{\"Ty\":1,\"U\":\"https://shibaura.pure.elsevier.com/en/publications/experimental-study-for-improvement-of-crashworthiness-in-az91-mag\"}],
\"VFN\":\"Materials Science and Engineering\",
\"FP\":283,
\"LP\":287,
\"RP\":[{\"Id\":2024275625,\"CoC\":5},{\"Id\":2035451257,\"CoC\":5}, {\"Id\":2141952446,\"CoC\":5},{\"Id\":2126566553,\"CoC\":6}, {\"Id\":2089573897,\"CoC\":5},{\"Id\":2069241702,\"CoC\":7}, {\"Id\":2000323790,\"CoC\":6},{\"Id\":1988924750,\"CoC\":16}],
\"ANF\":[
{\"FN\":\"H.\",\"LN\":\"Kanahashi\",\"S\":1},
{\"FN\":\"T.\",\"LN\":\"Mukai\",\"S\":2},
{\"FN\":\"Y.\",\"LN\":\"Yamada\",\"S\":3},
{\"FN\":\"K.\",\"LN\":\"Shimojima\",\"S\":4},
{\"FN\":\"M.\",\"LN\":\"Mabuchi\",\"S\":5},
{\"FN\":\"T.\",\"LN\":\"Aizawa\",\"S\":6},
{\"FN\":\"K.\",\"LN\":\"Higashi\",\"S\":7}
],
\"BV\":\"Materials Science and Engineering\",\"BT\":\"a\"}"
}
]
In extra_informations.ANF I have a nested array that contains some more author name information.
The nested repeated author_names field has a sub-field author_names.S which can be mapped to extra_informations.ANF.S for a join. Using this mapping I am trying to achieve the following table:
| article_id | author_names.AuN | S | extra_informations.ANF.FN | extra_informations.ANF.LN |
| 2732930586 | h kanahashi      | 1 | H.                        | Kanahashi                 |
| 2732930586 | t mukai          | 2 | T.                        | Mukai                     |
| 2732930586 | y yamada         | 3 | Y.                        | Yamada                    |
| 2732930586 | k shimojima      | 4 | K.                        | Shimojima                 |
| 2732930586 | m mabuchi        | 5 | M.                        | Mabuchi                   |
| 2732930586 | t aizawa         | 6 | T.                        | Aizawa                    |
| 2732930586 | k higashi        | 7 | K.                        | Higashi                   |
The primary problem I faced is that when I convert the JSON string using JSON_EXTRACT(extra_informations,"$.ANF"), it does not give me an array; instead it gives me the string representation of the nested repeated array, which I could not convert into an array.
Is it possible to produce such a table using standard SQL in BigQuery?
Option 1
This is based on the REGEXP_REPLACE function plus a few more functions (REPLACE, SPLIT, etc.) to manipulate the result. Note: we need the extra manipulation because wildcards and filters are not supported in JSONPath expressions in BigQuery.
#standard SQL
SELECT
article_id, author.AuN, author.S,
REPLACE(SPLIT(extra, '","')[OFFSET(0)], '"FN":"', '') FirstName,
REPLACE(SPLIT(extra, '","')[OFFSET(1)], 'LN":"', '') LastName
FROM `table` , UNNEST(author_names) author
LEFT JOIN UNNEST(SPLIT(REGEXP_REPLACE(JSON_EXTRACT(extra_informations, '$.ANF'), r'\[{|}\]', ''), '},{')) extra
ON author.S = CAST(REPLACE(SPLIT(extra, '","')[OFFSET(2)], 'S":', '') AS INT64)
Option 2
To overcome BigQuery "limitation" for JsonPath, you can use custom function as the example below shows:
Note: it uses jsonpath-0.8.0.js, which can be downloaded from https://code.google.com/archive/p/jsonpath/downloads and is assumed to be uploaded to Google Cloud Storage as gs://your_bucket/jsonpath-0.8.0.js
#standard SQL
CREATE TEMPORARY FUNCTION CUSTOM_JSON_EXTRACT(json STRING, json_path STRING)
RETURNS STRING
LANGUAGE js AS """
try { var parsed = JSON.parse(json);
return jsonPath(parsed, json_path);
} catch (e) { return null }
"""
OPTIONS (
library="gs://your_bucket/jsonpath-0.8.0.js"
);
SELECT
article_id, author.AuN, author.S,
CUSTOM_JSON_EXTRACT(extra_informations, CONCAT('$.ANF[?(@.S==', CAST(author.S AS STRING), ')].FN')) FirstName,
CUSTOM_JSON_EXTRACT(extra_informations, CONCAT('$.ANF[?(@.S==', CAST(author.S AS STRING), ')].LN')) LastName
FROM `table`, UNNEST(author_names) author
As you can see, now you can do all the magic in one simple JSONPath expression.