I am using "spath" to read json structure from a log file.
{"failure_reason":null,"gen_flag":"GENERATED","gen_date":"2020-02-15","siteid":"ABC","_action":"Change","order":"123"}
I am able to parse the above JSON.
However, the "spath" function is not able to read the nested array inside this JSON:
{"failure_reason":"[{"module":"Status Report","reason":"Status Report is not available","statusCode":"503"}]","gen_flag":"GENERATED_PARTIAL","gen_date":"2020-02-15","siteid":"ABC","_action":"Change","wonum":"321"}.
Please help!
Your event is not valid JSON. A JSON array should not be wrapped in double quotes.
Copy your event into any of the following JSON validators and confirm that it is invalid.
https://jsonformatter.curiousconcept.com/
https://jsonlint.com/
https://jsonformatter.org/
Now, try with the corrected event.
{"failure_reason":[{"module":"Status Report","reason":"Status Report is not available","statusCode":"503"}],"gen_flag":"GENERATED_PARTIAL","gen_date":"2020-02-15","siteid":"ABC","_action":"Change","wonum":"321"}
You can see that spath works correctly with the corrected JSON by running the following search.
| makeresults
| eval raw="{\"failure_reason\":[{\"module\":\"Status Report\",\"reason\":\"Status Report is not available\",\"statusCode\":\"503\"}],\"gen_flag\":\"GENERATED_PARTIAL\",\"gen_date\":\"2020-02-15\",\"siteid\":\"ABC\",\"_action\":\"Change\",\"wonum\":\"321\"}"
| spath input=raw
If you need a way to pre-process your event to strip the quotes around the array, you can try the following, which removes the extra quotes with sed-style rex commands. This is really dependent on the structure of the event and may not work 100% of the time, but it should be enough to get you started. Ideally, fix the format of the event at the source.
| makeresults | eval raw="{\"failure_reason\":\"[{\"module\":\"Status Report\",\"reason\":\"Status Report is not available\",\"statusCode\":\"503\"}]\",\"gen_flag\":\"GENERATED_PARTIAL\",\"gen_date\":\"2020-02-15\",\"siteid\":\"ABC\",\"_action\":\"Change\",\"wonum\":\"321\"}"
| rex mode=sed field=raw "s/\"\[/[/" | rex mode=sed field=raw "s/\]\"/]/"
| spath input=raw
Related
I get the request body data from an Excel file.
I have already converted the Excel file to CSV format.
I have found a partial solution, but it is not working 100%: the jsonBody format does not fetch the data correctly, and the CSV data imported through the Collection Runner shows forward slashes.
Request Body
{{jsonBody}}
I set the global variable jsonBody.
When I run the collection and select the CSV data file, the request body is rendered with forward slashes.
After running the collection I get an incorrect version of the body containing forward slashes.
The correct version of the CSV data should have no forward slashes; I need to remove the forward slashes from the imported CSV data.
I had a similar issue with Postman and realized my problem was more of a syntax issue.
Let's say your CSV file has the following columns:
userId | mid | platform | type | ...etc
row1 94J4J | 209444894 | NORTH | PT | ...
row2 324JE | 934421903 | SOUTH | MB | ...
row3 966RT | 158739394 | EAST | PT | ...
This is how you want your JSON request body to look:
{
    "userId": "{{userId}}",
    "mids": [{
        "mid": "{{mid}}",
        "platform": "{{platform}}"
    }],
    "type": ["{{type}}"],
    .. etc
}
Make sure your column names match the variables {{variableName}}.
The data coming from the CSV is already in a stringified format, so you don't need to do anything in a pre-request script.
example:
Let the CSV be:
| jsonBody |
| {"name":"user"}|
Now in the Postman request body just use:
{{jsonBody}}
since {{column_name}} is treated as a data variable, so in your case that is {{jsonBody}}.
Make sure you save the data file as a .csv, then use {{jsonBody}} in the request and check the output. If you want to add the JSON body as the value of another key, just reference {{jsonBody}} in that position.
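For illustration, a minimal sketch of how the CSV cell and the request body can fit together (the payload key and the values are hypothetical). A CSV cell that contains JSON has to be wrapped in quotes, with the inner double quotes doubled:
jsonBody
"{""name"":""user"",""type"":""admin""}"
and in the Postman request body:
{
    "payload": {{jsonBody}}
}
Because {{jsonBody}} is not wrapped in quotes here, the runner substitutes the raw JSON text, so it is sent as a nested object rather than as an escaped string.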
My homework is giving me a hard time with PySpark. I have this view of my "df2" after a groupBy:
df2.groupBy('years').count().show()
+-----+-----+
|years|count|
+-----+-----+
| 2003|11904|
| 2006| 3476|
| 1997| 3979|
| 2004|13362|
| 1996| 3180|
| 1998| 4969|
| 1995| 1995|
| 2001|11532|
| 2005|11389|
| 2000| 7462|
| 1999| 6593|
| 2002|11799|
+-----+-----+
Every attempt to save this to a file (and then load it with pandas) gives me back the original source text file I read with PySpark, with its original columns and attributes; the only difference is that it is now a .csv, but that's not the point.
What can I do to overcome this?
For what it's worth, I do not use any SparkContext functions at the beginning of the code, just a plain "read" and "groupBy".
df2.groupBy('years').count().write.csv("sample.csv")
or
df3=df2.groupBy('years').count()
df3.write.csv("sample.csv")
Both of these will create sample.csv in your working directory (Spark writes it as a folder containing part files).
You can assign the results to a new dataframe, results, and then write that to a CSV file. Note that there are two ways to output the CSV. If you use Spark to write, you need .coalesce(1) to make sure only one part file is written. The other way is to convert with .toPandas() and use the to_csv() function of the pandas DataFrame.
results = df2.groupBy('years').count()
# writes a csv file "part-xxx.csv" inside a folder "results"
results.coalesce(1).write.csv("results", header=True)
# or if you want a csv file, not a csv file inside a folder (default behaviour of spark)
results.toPandas().to_csv("results.csv")
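Either way, you can then load the result back with pandas. A minimal sketch, assuming the file and folder names from the snippet above:
import glob
import pandas as pd

# single CSV file written by toPandas().to_csv()
df = pd.read_csv("results.csv")

# or pick up the part file Spark wrote inside the "results" folder
part_file = glob.glob("results/part-*.csv")[0]
df = pd.read_csv(part_file)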
I have a few .txt files with JSON data that need to be loaded into a Google BigQuery table. Along with the columns in the text files, I need to insert the filename and the current timestamp for each row. This runs on GCP Dataflow with Python 3.7.
I accessed the FileMetadata containing the file path and size using GCSFileSystem.match and metadata_list.
I believe I need to get the pipeline code to run in a loop, pass the file path to ReadFromText, and call a FileNameReadFunction ParDo.
(p
| "read from file" >> ReadFromText(known_args.input)
| "parse" >> beam.Map(json.loads)
| "Add FileName" >> beam.ParDo(AddFilenamesFn(), GCSFilePath)
| "WriteToBigQuery" >> beam.io.WriteToBigQuery(known_args.output,
write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
)
I followed the steps in Dataflow/apache beam - how to access current filename when passing in pattern? but I can't make it quite work.
Any help is appreciated.
You can use textio.ReadFromTextWithFilename instead of ReadFromText. That will produce a PCollection of (filename, line) tuples.
To include the file and timestamp in your output JSON record, you could change your "parse" line to the following (Python 3 does not allow tuple unpacking in a lambda, so the (filename, line) tuple is indexed instead):
| "parse" >> beam.map(lambda (file, line): {
**json.loads(line),
"filename": file,
"timestamp": datetime.now()})
Trying to graph bandwidth consumed using Azure Log Analytics:
Perf
| where TimeGenerated > ago(1d)
| where CounterName contains "Network Send"
| summarize sum(CounterValue) by bin(TimeGenerated, 1m), _ResourceId
| render timechart
This generates a reasonable chart, except that the y-axis runs from 0 to 15,000,000,000. I tried:
Perf
| where TimeGenerated > ago(1d)
| where CounterName contains "Network Send"
| extend MeB_bandwidth_out = todouble(CounterValue)/1,048,576
| summarize sum(MeB_bandwidth_out) by bin(TimeGenerated, 1m), _ResourceId
| render timechart
but I get exactly the same chart. I've tried without the todouble(), and doing the conversion after the division instead, but nothing changes. Any hint as to why this is not working?
A bit hard to say without seeing a sample of the data, but here are a couple of ideas:
Try removing the commas from 1,048,576. In an extend, the commas likely act as column separators, so your expression effectively divides by 1 and adds extra columns, which would explain why the chart doesn't change.
If that doesn't work, remove the last line (| render timechart) from both queries, compare the results, and run them to see why the data doesn't make sense.
P.S. Regardless, there's a good chance you can replace contains with has to significantly improve performance (note that has looks for full words, while contains doesn't, so they are not the same; be careful).
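For reference, a sketch of the corrected query (it is just your second query with the commas removed; swapping contains for has is the separate, optional tweak from the P.S.):
Perf
| where TimeGenerated > ago(1d)
| where CounterName contains "Network Send"
| extend MeB_bandwidth_out = todouble(CounterValue) / 1048576
| summarize sum(MeB_bandwidth_out) by bin(TimeGenerated, 1m), _ResourceId
| render timechart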
I am trying to run a search against all hosts but I am having difficulty figuring out the right approach. A simplified version of what I am looking for is:
index=os sourcetype=df host=system323 mount=/var | streamstats range(storage_used) as storage_growth window=2
But ultimately I want it to search all mount points on all hosts and then send that to a chart or a report.
I tried a few different approaches, but none of them gave me the expected results. I felt like I was on the right path with sub-searches, since they seemed like the equivalent of a for loop, but they did not yield what I expected:
index=os sourcetype=df [search index=os sourcetype=df [search index=os sourcetype=df earliest=-1d#d latest=now() | stats values(host) AS host] earliest=-1d#d latest=now() | stats values(mount) AS mount] | streamstats range(storage_used) as storage_growth window=2
How can I take my first search and build a report that includes all hosts and mount points?
Much simpler than sub-searches. Just use a by clause in your streamstats:
index=os sourcetype=df
| eval mountpoint=host+":"+mount
| streamstats range(storage_used) as storage_growth by mountpoint window=2
| table _time,mountpoint,storage_growth
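If you then want a chart rather than a table, one possible follow-on is to swap the table for a timechart (the 1h span and the max() aggregation are assumptions; adjust them to your data):
index=os sourcetype=df
| eval mountpoint=host+":"+mount
| streamstats range(storage_used) as storage_growth by mountpoint window=2
| timechart span=1h max(storage_growth) by mountpoint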