This is what I need to do:
I have a text file that I parse using awk. The output should be in JSON format, and it should look like this:
{
"Record X" : { "Key1":"Value1", "Key2":"Value2"},
"Record Y" : { "Key1":"Value1", "Key2":"Value2"},
"Record Z" : { "Key1":"Value1", "Key2":"Value2"},
"Record A" : { "Key1":"Value1", "Key2":"Value2"}
}
Now, this is what the content of the text file looks like:
Record X
    Key1 is Value1, Key2 is Value2
Record Y
    Key1 is Value1, Key2 is Value2
Record Z
    Key1 is Value1, Key2 is Value2
Record A
    Key1 is Value1, Key2 is Value2
I tried creating a script to produce the output I want. I'm still on the first part, but I'm already stuck printing the line. This is my script:
awk 'BEGIN { print "{" }
{ if ($0 ~ /^Record /) { print "\"" $0 "\":" } }
END { print "}" }' myRecord.txt
And the output is this:
{
":ecord X
":ecord Y
":ecord Z
":ecord A
}
I do not understand why that script produces output like that.
Kindly tell me what's wrong. Thank you!
Here is another awk solution, without using getline:
awk -F"[ ,]*" 'BEGIN {print "{"} /^Record/ {a=$0;next} {print "\""a"\" : { \""$2"\":\""$4"\", \""$5"\":\""$7"\"},"} END {print "}"}'
{
"Record X" : { "Key1":"Value1", "Key2":"Value2"},
"Record Y" : { "Key1":"Value1", "Key2":"Value2"},
"Record Z" : { "Key1":"Value1", "Key2":"Value2"},
"Record A" : { "Key1":"Value1", "Key2":"Value2"},
}
If the trailing , is a problem, you can do it like this:
awk -F"[ ,]*" -v f=$(cat file | wc -l) 'BEGIN {print "{"} /^Record/ {a=$0;next} {print "\""a"\" : { \""$2"\":\""$4"\", \""$5"\":\""$7"\"}"(NR==f?"":",")} END {print "}"}' file
{
"Record X" : { "Key1":"Value1", "Key2":"Value2"},
"Record Y" : { "Key1":"Value1", "Key2":"Value2"},
"Record Z" : { "Key1":"Value1", "Key2":"Value2"},
"Record A" : { "Key1":"Value1", "Key2":"Value2"}
}
Or do it all in awk alone, reading the file twice (see the note after the output):
awk -F"[ ,]*" 'BEGIN {print "{"} FNR==NR {f=NR;next} /^Record/ {a=$0;next} {print "\""a"\" : { \""$2"\":\""$4"\", \""$5"\":\""$7"\"}"(FNR==f?"":",")} END {print "}"}' file{,}
{
"Record X" : { "Key1":"Value1", "Key2":"Value2"},
"Record Y" : { "Key1":"Value1", "Key2":"Value2"},
"Record Z" : { "Key1":"Value1", "Key2":"Value2"},
"Record A" : { "Key1":"Value1", "Key2":"Value2"}
}
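The file{,} at the end relies on shell brace expansion: the shell expands it to file file, so awk reads the same file twice, and the first pass (FNR==NR) does nothing but record the line count in f:
$ echo file{,}
file file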
Your main problem is that your input file was created on Windows and so has control-Ms (carriage returns) at the end of each line, corrupting the output when the lines are printed. Remove them with dos2unix or similar before running your script. Do NOT use any getline solution suggested below, as that would be the wrong approach and introduce a lot of caveats and complexity (see http://awk.info/?tip/getline).
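For example, you can confirm the carriage returns are there (cat -v shows them as ^M at the end of each line) and then strip them; a quick sketch, assuming dos2unix or GNU sed is available:
$ cat -v myRecord.txt
Record X^M
    Key1 is Value1, Key2 is Value2^M
...
$ dos2unix myRecord.txt        # or: sed 's/\r$//' myRecord.txt > clean.txt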
Try this:
$ cat tst.awk
BEGIN { print "{" }
NR%2  { id = $0; next }
{
    sub(/^ +/,"")
    gsub(/ is /,"\":\"")
    gsub(/, /,"\", \"")
    printf "%s\"%s\" : { \"%s\"}", (c++?",\n":""), id, $0
}
END { print "\n}" }
$ awk -f tst.awk file
{
"Record X" : { "Key1":"Value1", "Key2":"Value2"},
"Record Y" : { "Key1":"Value1", "Key2":"Value2"},
"Record Z" : { "Key1":"Value1", "Key2":"Value2"},
"Record A" : { "Key1":"Value1", "Key2":"Value2"}
}
Using your flow logic:
awk 'BEGIN { print "{" }
/^Record /{
    if (c){printf ",\n"}
    printf("\"%s\" :",$0);next}
{
    sub(/^ +/,"")
    gsub(/ is /,"\":\"")
    gsub(/, /,"\", \"")
    printf(" { \"%s\"}",$0)
    c++
}
END { print "\n}" }' infile
You could do this through awk's getline function:
$ awk 'BEGIN{printf "{\n"} /^Record/{var=$0; getline; w=$1; x=$3; sub(/,$/,"",x); y=$4; z=$6} {printf "\""var"\" : { \""w"\":\""x"\", \""y"\":\""z"\"},\n"} END{printf "}\n"}' file
{
"Record X" : { "Key1":"Value1,", "Key2":"Value2"},
"Record Y" : { "Key1":"Value1,", "Key2":"Value2"},
"Record Z" : { "Key1":"Value1,", "Key2":"Value2"},
"Record A" : { "Key1":"Value1,", "Key2":"Value2"},
}
Through GNU awk, using a multi-character RS (a gawk extension) and gsub:
$ awk -v RS="Record" 'BEGIN{print "{"} gsub(/\n/,"",$0){gsub(/.$/,"",$4); print "\""RS" "$1"\" : { \""$2"\":\""$4"\", \""$5"\":\""$7"\"},"} END{print "}"}' file
{
"Record X" : { "Key1":"Value1", "Key2":"Value2"},
"Record Y" : { "Key1":"Value1", "Key2":"Value2"},
"Record Z" : { "Key1":"Value1", "Key2":"Value2"},
"Record A" : { "Key1":"Value1", "Key2":"Value2"},
}
I want to update 3 fields in the array in my payload:
totalSpendAmount
price
lineAmount
My script is as follows:
%dw 2.0
output application/json
---
payload update {
    case .IntegrationEntities.integrationEntity -> $ map {
        ($ update {
            case .integrationEntityDetails.contractUtilization.items.item -> $ map {
                ($ update {
                    case .price -> if ($ as Number < 1) "0" ++ $ else $
                    case .lineAmount -> if ($ as Number < 1) "0" ++ $ else $
                })
            }
            case totalSpendAmount at .integrationEntityDetails.contractUtilization -> totalSpendAmount update {
                case totalSpendAmount at .totalSpendAmount -> if (totalSpendAmount as Number < 1) "0" ++ totalSpendAmount else totalSpendAmount
            }
        })
    }
}
If I run the above script, only 'totalSpendAmount' gets updated. If I remove the 'totalSpendAmount' case block, my 'price' and 'lineAmount' fields update correctly.
What is wrong in my script?
My payload is:
{
  "IntegrationEntities": {
    "integrationEntity": [
      {
        "integrationEntityHeader": {
          "integrationTrackingNumber": "XXXX",
          "referenceCodeForEntity": "132804",
          "additionalInfo": "ADDITIONALINFO"
        },
        "integrationEntityDetails": {
          "contractUtilization": {
            "externalId": "417145",
            "utilizationType": "INVOICE",
            "isDelete": "No",
            "documentNumber": "132804",
            "documentDescription": "",
            "documentDate": "2021-03-26",
            "totalSpendAmount": ".92",
            "documentCurrency": "AUD",
            "createdBy": "Oracle Integration",
            "status": "FULLY PAID",
            "items": {
              "item": [
                {
                  "lineItemId": "132804_1",
                  "contractNumber": "YYYYYYY",
                  "contractLineId": "",
                  "lineNumber": "1",
                  "name": "132804",
                  "description": "132804",
                  "quantity": "1",
                  "price": ".92",
                  "lineAmount": ".92",
                  "purchaseOrderNumber": "YYYYYY",
                  "purchaseOrderDescription": ""
                },
                {
                  "lineItemId": "132804_2",
                  "contractNumber": "YYYYYYY",
                  "contractLineId": "",
                  "lineNumber": "1",
                  "name": "132804",
                  "description": "132804_2",
                  "quantity": "1",
                  "price": ".95",
                  "lineAmount": ".95",
                  "purchaseOrderNumber": "YYYYYY",
                  "purchaseOrderDescription": ""
                }
              ]
            }
          }
        }
      }
    ]
  }
}
The output I am looking for is:
{
  "IntegrationEntities": {
    "integrationEntity": [
      {
        "integrationEntityHeader": {
          "integrationTrackingNumber": "XXXX",
          "referenceCodeForEntity": "132804",
          "additionalInfo": "ADDITIONALINFO"
        },
        "integrationEntityDetails": {
          "contractUtilization": {
            "externalId": "417145",
            "utilizationType": "INVOICE",
            "isDelete": "No",
            "documentNumber": "132804",
            "documentDescription": "",
            "documentDate": "2021-03-26",
            "totalSpendAmount": "0.92",
            "documentCurrency": "AUD",
            "createdBy": "Oracle Integration",
            "status": "FULLY PAID",
            "items": {
              "item": [
                {
                  "lineItemId": "132804_1",
                  "contractNumber": "YYYYYYY",
                  "contractLineId": "",
                  "lineNumber": "1",
                  "name": "132804",
                  "description": "132804",
                  "quantity": "1",
                  "price": "0.92",
                  "lineAmount": "0.92",
                  "purchaseOrderNumber": "YYYYYY",
                  "purchaseOrderDescription": ""
                },
                {
                  "lineItemId": "132804_2",
                  "contractNumber": "YYYYYYY",
                  "contractLineId": "",
                  "lineNumber": "1",
                  "name": "132804",
                  "description": "132804_2",
                  "quantity": "1",
                  "price": "0.95",
                  "lineAmount": "0.95",
                  "purchaseOrderNumber": "YYYYYY",
                  "purchaseOrderDescription": ""
                }
              ]
            }
          }
        }
      }
    ]
  }
}
Try this script:
%dw 2.0
output application/json
---
payload.IntegrationEntities.integrationEntity.integrationEntityDetails.contractUtilization map ((cu, index) -> cu update {
    case .totalSpendAmount if ($ as Number < 1) -> "0" ++ $
    case .items.item -> $ map {
        ($ update {
            case .price -> if ($ as Number < 1) "0" ++ $ else $
            case .lineAmount -> if ($ as Number < 1) "0" ++ $ else $
        })
    }
})
Updated scripts (these keep the full payload structure, where the first script returns only the mapped contractUtilization objects):
Approach 1
%dw 2.0
output application/json
---
payload update {
    case .IntegrationEntities.integrationEntity -> $ map {
        ($ update {
            case .integrationEntityDetails.contractUtilization -> $ update {
                case .totalSpendAmount -> if ($ as Number < 1) "0" ++ $ else $
                case .items.item -> $ map ((cuItem, index) -> cuItem update {
                    case .price -> if ($ as Number < 1) "0" ++ $ else $
                    case .lineAmount -> if ($ as Number < 1) "0" ++ $ else $
                })
            }
        })
    }
}
Approach 2
%dw 2.0
output application/json
---
payload update {
    case .IntegrationEntities.integrationEntity[0].integrationEntityDetails.contractUtilization -> $ update {
        case .totalSpendAmount -> if ($ as Number < 1) "0" ++ $ else $
        case .items.item -> $ map ((cuItem, index) -> cuItem update {
            case .price -> if ($ as Number < 1) "0" ++ $ else $
            case .lineAmount -> if ($ as Number < 1) "0" ++ $ else $
        })
    }
}
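All of these rely on the same detail: the amounts are strings, so "0" ++ $ prepends a literal zero by string concatenation rather than doing arithmetic. A minimal illustration, assuming the same string inputs as the payload:
%dw 2.0
output application/json
---
{ padded: if (".92" as Number < 1) "0" ++ ".92" else ".92" }  // yields { "padded": "0.92" }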
I have a file with a single line. I want to replace the text from position 188 to 197 (inclusive) with the system date (YYYY-MM-DD).
I tried this but it doesn't work:
sed 's/\(.\{188\}\)\([0-9-]\{10\}\)\(.*\)/\1$(date '+%Y-%m-%d')\188/g'
I want to use sed or anything else that works in a shell script.
The input file is:
{ "agent": { "run_as_user": "root" }, "logs": { "logs_collected": { "files": { "collect_list": [ { "file_path": "/home/ec2-user/logs/**", "log_group_name": "Staging", "log_stream_name": "2020-10-24", "timestamp_format": "[%Y-%m-%d %H:%M:%S]" } ] } } } }
. . . and in the output, I want to change only the date, as shown below.
{ "agent": { "run_as_user": "root" }, "logs": { "logs_collected": { "files": { "collect_list": [ { "file_path": "/home/ec2-user/logs/**", "log_group_name": "Staging", "log_stream_name": "2020-10-25", "timestamp_format": "[%Y-%m-%d %H:%M:%S]" } ] } } } }
Could you please try the following, written with GNU awk as per the attempts shown by the OP. It keeps characters 1 through 187, inserts the date, and appends everything from position 198 onward:
awk -v date="$(date +%Y-%m-%d)" '{print substr($0,1,187) date substr($0,198)}' Input_file
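Since the question asked for sed: the original attempt fails because the whole expression is in single quotes, so the shell never expands $(date ...). A sketch of the same replacement with double quotes, keeping the first 187 characters and replacing the next 10:
sed "s/^\(.\{187\}\)[0-9-]\{10\}/\1$(date +%Y-%m-%d)/" Input_file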
I want to be able to extract content from a PDF file and to search within that content using Elasticsearch.
I installed elasticsearch/elasticsearch-mapper-attachments/2.6.0.
I created a new index named "docs".
I created a file named "tmp.json" with this content:
{"title": "file.pdf", "file": "IkdvZCBTYXZlIHRoZSBRdWVlbiIgKGFsdGVybmF0aXZlbHkgIkdvZCBTYXZlIHRoZSBLaW5nIg=="}
I then executed the following:
curl -X PUT "http://localhost:9200/docs/attachment/_mapping" -d '{
  "attachment": {
    "properties" : {
      "file" : {
        "type" : "attachment",
        "fields" : {
          "title" : {"store":"yes"},
          "file":{
            "type":"string",
            "term_vector":"with_positions_offsets",
            "store":"yes"
          }
        }
      }
    }
  }
}'
and then the following (the @ tells curl to read the request body from tmp.json):
curl -X POST "http://localhost:9200/docs/attachment" -d @tmp.json
The problem is that the content is stored as-is in the file.
I was expecting the content to be decoded, like so:
base64.b64decode("IkdvZCBTYXZlIHRoZSBRdWVlbiIgKGFsdGVybmF0aXZlbHkgIkdvZCBTYXZlIHRoZSBLaW5nIg==")
which gives:
b'"God Save the Queen" (alternatively "God Save the King"'
To encode in base64, here is what I do:
import json, base64

fname = 'file.pdf'
file64 = base64.b64encode(open(fname, "rb").read()).decode('ascii')
with open('tmp.json', 'w') as f:
    json.dump({"file": file64, "title": fname}, f)
I would like to be able to see the content using Kibana (but for now I see only the base64 data ...).
This didn't work:
curl -X PUT "http://localhost:9200/docs/attachment/_mapping" -d '{
  "attachment": {
    "properties" : {
      "content" : {
        "type" : "attachment",
        "fields" : {
          "title" : {"store":"yes"},
          "content":{
            "type":"string",
            "term_vector":"with_positions_offsets",
            "store":"yes"
          }
        }
      }
    }
  }
}'
This worked, and I can see the content of the PDF through Kibana:
curl -X PUT "http://localhost:9200/docs" -d '{
  "mappings" : {
    "attachment" : {
      "properties" : {
        "content" : {
          "type" : "attachment",
          "fields" : {
            "content" : { "store" : "yes" },
            "author" : { "store" : "yes" },
            "title" : { "store" : "yes" },
            "date" : { "store" : "yes" },
            "keywords" : { "store" : "yes", "analyzer" : "keyword" },
            "name" : { "store" : "yes" },
            "content_length" : { "store" : "yes" },
            "content_type" : { "store" : "yes" }
          }
        }
      }
    }
  }
}'
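To check that the extracted text is actually searchable, something like this should work; a sketch for ES 1.x with the mapper-attachments plugin, assuming the "content" mapping above (the extracted text is queried through the attachment field itself):
curl "http://localhost:9200/docs/_search?pretty" -d '{
  "fields": ["content"],
  "query": { "match": { "content": "Queen" } }
}'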
I'm new to Elasticsearch, and I'm having a hard time with the analyzers.
I am creating an index like this (to reproduce my problem, you can copy and paste the following code directly into your console).
Please read the comments in the script for my problem and questions.
#!/bin/bash
# fails if the index doesn't exist but that's OK
curl -XDELETE 'http://localhost:9200/movies/'

# creating the index that will allow type wrapper, and generate _id automatically from the path
curl -XPOST http://localhost:9200/movies -d '{
  "settings" : {
    "number_of_shards" : 1,
    "mapping.allow_type_wrapper" : true,
    "analysis": {
      "analyzer": {
        "en_std": {
          "type":"standard",
          "stopwords": "_english_"
        }
      }
    }
  },
  "mappings" : {
    "movie" : {
      "_id" : {
        "path" : "movie.id"
      }
    }
  }
}'
# inserting some data
curl -XPOST http://localhost:9200/movies/movie -d '{
  "movie" : {
    "id" : 101,
    "title" : "Bat Man",
    "starring" : {
      "firstname" : "Christian",
      "lastname" : "Bale"
    }
  }
}'
# trying to get by ID ... \m/ works!!!
curl -XGET http://localhost:9200/movies/movie/101

# trying to search using query_string ... \m/ works
curl -XPOST http://localhost:9200/movies/movie/_search -d '{
  "query" : {
    "query_string" : {
      "query" : "bat"
    }
  }
}'
# when I try to search in a particular field, it fails and returns 0 hits
curl -XPOST http://localhost:9200/movies/_search -d '{
  "query" : {
    "query_string" : {
      "query" : "bat",
      "fields" : ["movie.title"]
    }
  }
}'
# I thought the analyzer was the problem, so I checked.
curl 'http://localhost:9200/movies/movie/_search?pretty=true' -d '{
  "query" : {
    "query_string" : {
      "query" : "bat"
    }
  },
  "script_fields": {
    "terms" : {
      "script": "doc[field].values",
      "params": {
        "field": "movie.title"
      }
    }
  }
}'
# The field wasn't analyzed.
# The following is the result:
# {
#   "took" : 1,
#   "timed_out" : false,
#   "_shards" : {
#     "total" : 1,
#     "successful" : 1,
#     "failed" : 0
#   },
#   "hits" : {
#     "total" : 1,
#     "max_score" : 0.13424811,
#     "hits" : [ {
#       "_index" : "movies",
#       "_type" : "movie",
#       "_id" : "101",
#       "_score" : 0.13424811,
#       "fields" : {
#         "terms" : [ "Bat Man" ]
#       }
#     } ]
#   }
# }
# So I even tried the exact term ... nope, didn't work :( 0 hits.
curl -XPOST http://localhost:9200/movies/_search -d '{
  "query" : {
    "query_string" : {
      "query" : "Bat Man",
      "fields" : ["movie.title"]
    }
  }
}'
Can anyone point out what I'm doing wrong?
You should insert a sleep 1 command right after inserting the doc, and everything will work.
Elasticsearch provides search in near real-time (read this). Every time you index a document, the Lucene index is not updated (refreshed, in Elasticsearch terms) immediately; how frequently your index is refreshed is configurable at the index level. You can also force a refresh by passing the query parameter refresh=true with every HTTP request, which makes ES update the index, but that may hurt performance, depending on your requirements.
There is a Refresh API as well.
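For example, in the script above, a forced refresh between the indexing call and the search calls makes the document visible immediately; a sketch using the Refresh API:
# refresh the movies index so the just-indexed doc is searchable
curl -XPOST 'http://localhost:9200/movies/_refresh'
Alternatively, pass refresh=true on the indexing request itself, with the performance caveat noted above.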
BEGIN {
    q = "\""
    FS = OFS = q ", " q
}
{
    split($1, arr, ": " q)
    for (i in arr) {
        if (arr[i] == "name") {
            gsub(q, "'", arr[i+1])
            # print arr[1] ": " q arr[2], $2, $3
        }
    }
}
I have a JSON file with some data like this:
{"last_modified": {"type": "/type/datetime", "value": "2008-04-01T03:28:50.625462"}, "type": {"key": "/type/author"}, "name": "National Research Council. Committee on the Scientific and Technologic Base of Puerto Rico"s Economy.", "key": "/authors/OL2108538A", "revision": 1}
The name's value contains a double quote. I want to replace only that double quote with a single quote, not all the double quotes. Please tell me how to fix it.
awk '{for (i=1; i<=NF; i++) if ($i ~ /name/) { gsub("\042","\047",$(i+1)) }}1' file
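If the stray quote can sit anywhere inside the value, a sketch that limits the substitution to the name value only (match with a capture array is a gawk extension; the pattern assumes, as in the sample, that "key" immediately follows "name"):
gawk '{
    if (match($0, /("name": ")(.*)(", "key")/, m)) {
        gsub(/"/, "\047", m[2])    # replace quotes inside the value only
        $0 = substr($0, 1, RSTART-1) m[1] m[2] m[3] substr($0, RSTART+RLENGTH)
    }
    print
}' file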