Parsing and incrementing dates in a JSON file with jq / awk

I would like to update multiple dates in a JSON file.
My input JSON contains many properties, but the extract below is the part that matters. I want to parse the date in metadata (here 2022-07-27), replace it with today's date (e.g. 2022-08-05), work out the delta (here 9 days), and add that delta to every date found in "data_1h" / "time".
Edit (forgotten at first): the metadata dates also need to end up replaced by today's date.
I could write a small tool in any language, but I would like a Linux script that can run from a GitLab pipeline. This is about preparing mock data for some tests.
So I started fighting with jq, awk and sed, but am a bit confused. Maybe an experienced jq person sees the solution immediately?
{
"metadata":
{
"modelrun_utc": "2022-07-27 00:00",
"modelrun_updatetime_utc": "2022-07-27 07:27"
},
"data_1h":
{
"time": ["2022-07-27 00:00", "2022-07-27 01:00", "2022-08-03 11:00", "2022-08-03 12:00", "2022-08-03 13:00", "2022-08-03 14:00"]
}
}
Any idea?
Pseudo code would be:
base_date_str=$(jq -r .metadata.modelrun_utc "$1")
echo "$base_date_str"
base_date=$(date -d "$base_date_str")
today=$(date)
delta=$today-$base_date                      # pseudo: difference in days
input_data=$(jq -r '.data_1h.time[]' "$1")
for s in $input_data; do
  # parse s into a date d, add delta to d, replace s by d in the output string
done
# replace the date part of modelrun_utc and modelrun_updatetime_utc with today's date only, keeping the time
# write the output json
What does this look like in real shell commands?
Expected output:
{
"metadata": {
"modelrun_utc": "2022-08-05 00:00",
"modelrun_updatetime_utc": "2022-08-05 07:27"
},
"data_1h": {
"time": [
"2022-08-05 00:00",
"2022-08-05 01:00",
"2022-08-12 11:00",
"2022-08-12 12:00",
"2022-08-12 13:00",
"2022-08-12 14:00"
]
}
}

Here's one way using jq logic, not shell commands:
jq '(
.metadata.modelrun_utc | strptime("%Y-%m-%d %H:%M")
| (now - mktime) / (24 * 60 * 60)
) as $diffdays | .data_1h.time[] |= (
strptime("%Y-%m-%d %H:%M") | .[2] += $diffdays
| mktime | strftime("%Y-%m-%d %H:%M")
)'
The output when run on 2022-08-05 (note that this filter leaves the metadata block unchanged):
{
"metadata": {
"modelrun_utc": "2022-07-27 00:00",
"modelrun_updatetime_utc": "2022-07-27 07:27"
},
"data_1h": {
"time": [
"2022-08-05 00:00",
"2022-08-05 01:00",
"2022-08-12 11:00",
"2022-08-12 12:00",
"2022-08-12 13:00",
"2022-08-12 14:00"
]
}
}
Demo
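That filter leaves the two metadata timestamps untouched, while the edit to the question asks for them to be moved to today's date as well, keeping their time of day. Here is a sketch that builds on the same idea and shifts the metadata fields by the same delta; it assumes the timestamps are meant as UTC (which is what jq's strptime/mktime use) and floors the delta to whole days so the HH:MM part survives:
jq '
  # assumption: dates are UTC; the delta is floored to whole days
  ( .metadata.modelrun_utc | strptime("%Y-%m-%d %H:%M")
    | ((now - mktime) / (24 * 60 * 60) | floor)
  ) as $diffdays
  | ( .metadata.modelrun_utc,
      .metadata.modelrun_updatetime_utc,
      .data_1h.time[] ) |= (
    strptime("%Y-%m-%d %H:%M") | .[2] += $diffdays
    | mktime | strftime("%Y-%m-%d %H:%M")
  )' "$1"
In a GitLab job you can wrap this in a small shell step, e.g. save the filter to shift_dates.jq and run jq -f shift_dates.jq mockdata.json > mockdata.shifted.json (the file names are placeholders).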

Related

Querying Line Items of Order with JSON Functions in BigQuery

I have been banging my head here for the past 2 hours with all the available JSON_... functions in BigQuery. I've read quite a few questions here, but no matter what I try, I never succeed in extracting the "amounts" from my JSON below.
This is my JSON stored in a BQ column:
{
"lines": [
{
"id": "70223039-83d6-463d-a482-7ce4d50bf0fc",
"charges": [
{
"type": "price",
"amount": 50.0
},
{
"type": "discount",
"amount": -40.00
}
]
},
{
"id": "70223039-83d6-463d-a482-7ce4d50bf0fc",
"charges": [
{
"type": "price",
"amount": 20.00
},
{
"type": "discount",
"amount": 0.00
}
]
}
]
}
Imagine the above being an order containing multiple items.
I am trying to get a sum of all amounts => 50-40+20+0. The result needs to be 30 = the total order price.
Is it possible to pull all the amount values and then have them summed up just via SQL without any custom JS functions? I guess the summing is the easy part - getting the amounts into an array is the challenge here.
Use the query below:
select (
select sum(cast(json_value(charge, '$.amount') as float64))
from unnest(json_extract_array(order_as_json, '$.lines')) line,
unnest(json_extract_array(line, '$.charges')) charge
) total
from your_table
If applied to the sample data in your question, the output is 30.
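If you want to sanity-check that query without creating a table, here is a hedged, self-contained variant that inlines the sample order as a string in a WITH clause; your_table and order_as_json are just the placeholder names from the answer:
with your_table as (
  -- the sample order from the question, inlined as a triple-quoted string
  select '''{"lines":[
    {"id":"70223039-83d6-463d-a482-7ce4d50bf0fc",
     "charges":[{"type":"price","amount":50.0},{"type":"discount","amount":-40.00}]},
    {"id":"70223039-83d6-463d-a482-7ce4d50bf0fc",
     "charges":[{"type":"price","amount":20.00},{"type":"discount","amount":0.00}]}
  ]}''' as order_as_json
)
select (
  -- sum every charges[].amount across all lines of the order
  select sum(cast(json_value(charge, '$.amount') as float64))
  from unnest(json_extract_array(order_as_json, '$.lines')) line,
       unnest(json_extract_array(line, '$.charges')) charge
) as total
from your_table   -- returns 30.0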

AWS Cloudwatch Logs Insights: Query into array

I have a Log Group with messages like this:
{
"m": [
{
"id": "5b6973c7c86e8689368b4569",
"ts": 1634112000.062
},
{
"id": "6116d21e02e38f5045079c42",
"ts": 1634120807.402
},
{
"id": "60c368ff1085fc0d546fad52",
"ts": 1634120807.512
},
{
"id": "6053536817a46610797ed472",
"ts": 1634120809.249
}
]
}
I want to run a query over the field m.*.ts (it's an array), something like this:
fields @message
| filter (m.*.ts > 1634112000.062 and m.*.ts < 1634120807.000)
Is that possible?
fields @message
| parse @message "[*] *" as id, ts
| filter (ts > 1634112000.062 and ts < 1634120807.000)
Hi, I don't know exactly what format you want, so try this and adapt it; there are many more samples in the AWS documentation.
Option 1 (breaks it down in steps, which helps with debugging):
fields @message
| parse @message "[*] *" as id, ts
| filter ts > 1634112000.062
| filter ts < 1634120807.000
Option 2:
fields @message
| parse @message '[] * {"*"}' as id, ts
| filter (ts > 1634112000.062 and ts < 1634120807.000)
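A side note not taken from the answers above: when the log events are ingested as JSON, Logs Insights flattens arrays into numbered fields, so individual elements can be addressed directly without parse; there is no wildcard over array indexes, though, which is why parse is used here. A hedged sketch, assuming the auto-discovered names come out as m.0.ts, m.1.ts, and so on:
# assumes the JSON array was auto-flattened into m.0.*, m.1.*, ... fields
fields @timestamp, m.0.ts, m.1.ts
| filter m.0.ts > 1634112000.062 and m.0.ts < 1634120807.000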

How to load a jsonl file into BigQuery when the file has mix data fields as columns

During my workflow, after extracting the data from the API, the JSON has the following structure:
[
{
"fields":
[
{
"meta": {
"app_type": "ios"
},
"name": "app_id",
"value": 100
},
{
"meta": {},
"name": "country",
"value": "AE"
},
{
"meta": {
"name": "Top"
},
"name": "position",
"value": 1
}
],
"metrics": {
"click": 1,
"price": 1,
"count": 1
}
}
]
Then it is stored as .jsonl and put on GCS. However, when I load it into BigQuery for further extraction, the automatic schema inference returns the following error:
Error while reading data, error message: JSON parsing error in row starting at position 0: Could not convert value to string. Field: value; Value: 100
I want to convert it into the following flat structure:
app_type | app_id | country | position | click | price | count
ios      | 100    | AE      | Top      | 1     | 1     | 1
Is there a way to define a manual schema in BigQuery to achieve this result? Or do I have to preprocess the jsonl file before putting it into BigQuery?
One of the limitations when loading JSON data from GCS to BigQuery is that it does not support maps or dictionaries in JSON.
An invalid example would be:
"metrics": {
"click": 1,
"price": 1,
"count": 1
}
Your jsonl file should be something like this:
{"app_type":"ios","app_id":"100","country":"AE","position":"Top","click":"1","price":"1","count":"1"}
I already tested it and it works fine.
So wherever you convert the JSON files to jsonl files and upload them to GCS, you will have to do some preprocessing.
You probably have two options:
pre-create the target table with an app_id field of type INTEGER
preprocess the json file and enclose 100 in quotes, i.e. "100"
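For the second option, a minimal preprocessing sketch with jq, assuming the column mapping shown in the question's target row (app_type comes from the app_id field's meta, position takes its meta.name, and every value ends up as a string); the file names are placeholders:
jq -c '.[] | {
  # key mapping taken from the flat row in the question; numbers stringified
  app_type: (.fields[] | select(.name == "app_id")   | .meta.app_type),
  app_id:   (.fields[] | select(.name == "app_id")   | .value | tostring),
  country:  (.fields[] | select(.name == "country")  | .value),
  position: (.fields[] | select(.name == "position") | .meta.name),
  click:    (.metrics.click | tostring),
  price:    (.metrics.price | tostring),
  count:    (.metrics.count | tostring)
}' input.json > output.jsonl
The -c flag prints one compact object per line, which is the JSONL layout BigQuery expects.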

JSON Parsing in Snowflake - Square Brackets At Start

I'm trying to parse some JSON files in Snowflake. In this case, I'd like to extract the "gift_card" value from the line item that has "fulfillment_service": "gift_card". I've had success querying one-dimensional JSON data, but this, with the square brackets, is confounding me.
Here's my simple query - I've created a small table called "TEST_WEEK"
select line_items:fulfillment_service
from TEST_WEEK
, lateral flatten(FULFILLMENTS:line_items) line_items;
Hopefully this isn't too basic a question. I'm very new with parsing JSON.
Thanks in advance!
Here's the start of the FULFILLMENTS field with the info I want to get at.
[
{
"admin_graphql_api_id": "gid://shopify/Fulfillment/2191015870515",
"created_at": "2020-08-10T14:54:38Z",
"id": 2191015870515,
"line_items": [
{
"admin_graphql_api_id": "gid://shopify/LineItem/5050604355635",
"discount_allocations": [],
"fulfillable_quantity": 0,
"fulfillment_service": "gift_card",
"fulfillment_status": "fulfilled",
"gift_card": true,
"grams": 0,
"id": 5050604355635,
"name": "Gift Card - $100.00",
"origin_location": {
"address1": "100 Indian Road",
"address2": "",
"city": "Toronto",
"country_code": "CA",
Maybe you can use two lateral flattens to process the values in the line_items array.
Sample table:
create table TEST_WEEK( FULFILLMENTS variant ) as
select parse_json(
'[
{
"admin_graphql_api_id": "gid://shopify/Fulfillment/2191015870515",
"created_at": "2020-08-10T14:54:38Z",
"id": 2191015870515,
"line_items": [
{
"admin_graphql_api_id": "gid://shopify/LineItem/5050604355635",
"discount_allocations": [],
"fulfillable_quantity": 0,
"fulfillment_service": "gift_card",
"fulfillment_status": "fulfilled",
"gift_card": true,
"grams": 0,
"id": 5050604355635,
"name": "Gift Card - $100.00",
"origin_location": {
"address1": "100 Indian Road",
"address2": "",
"city": "Toronto",
"country_code": "CA"
}
}
]
}
]');
Sample query:
select s.VALUE:fulfillment_service
from TEST_WEEK,
lateral flatten( FULFILLMENTS ) f,
lateral flatten( f.VALUE:line_items ) s;
The output:
+-----------------------------+
| S.VALUE:FULFILLMENT_SERVICE |
+-----------------------------+
| "gift_card" |
+-----------------------------+
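If you only need the rows where the line item actually is a gift card, as the question describes, here is a hedged variant of the same query with a filter (the column aliases are mine):
-- only keep line items whose fulfillment_service is 'gift_card'
select s.VALUE:gift_card::boolean as gift_card,
       s.VALUE:name::string       as item_name
from TEST_WEEK,
     lateral flatten( FULFILLMENTS ) f,
     lateral flatten( f.VALUE:line_items ) s
where s.VALUE:fulfillment_service::string = 'gift_card';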
Those square brackets indicate that you have an array of JSON objects in your FULFILLMENTS field. Unless there is a real need to keep an array of objects in one field, you should have a look at the STRIP_OUTER_ARRAY option of the COPY command. An example can be found in the Snowflake documentation:
copy into <table>
from @~/<file>.json
file_format = (type = 'JSON' strip_outer_array = true);
In case others are stuck with the same data issue (all JSON data in one array), here is another solution:
select f.VALUE:fulfillment_service::string
from TEST_WEEK,
lateral flatten( FULFILLMENTS[0].line_items ) f;
With this, you just grab the first element of the array (which is the only element).
If you have nested array elements, just add this to the lateral flatten:
, RECURSIVE => TRUE, mode => 'array'
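Put together with the query above, that could look something like the following sketch (using the named input => argument form):
-- sketch: recursively flatten the nested arrays of the first fulfillment's line_items
select f.VALUE:fulfillment_service::string
from TEST_WEEK,
     lateral flatten( input => FULFILLMENTS[0].line_items,
                      RECURSIVE => TRUE, mode => 'array' ) f;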

I am trying to access the data stored in a Snowflake table using Python SQL. The columns I want to access are given below.

Below is the data sample; I want to access the columns value and start. This data is dumped into one column (DN) of a table (stg):
{
"ok": true,
"metrics": [
{
"name": "t_in",
"data": [{"value": 0, "group": {"start": "00:00"}}]
},
{
"name": "t_out",
"data": [{"value": 0,"group": {"start": "00:00"}}]
}
]
}
## consider many such lines stored in the same column across different rows
The query below only fetches the data for name; I want to access the other columns' values as well. This query is part of a Python script.
select
replace(DN : metrics[0].name , '"' , '')as metrics_name, #able to get
replace(DN : metrics[2].data , '"' , '')as metrics_data_value,##suggestion needed
replace(DN : metrics.data.start, '"','') as metrics_start, ##suggestion needed
replace(DN : metrics.data.group.finish, '"','') as metrics_finish, ##suggestion needed
from stg
Do I need to iterate over data and group? If yes, please suggest the code.
Here is an example of how to query that data.
Set up sample data:
create or replace transient table test_db.public.stg (DN variant);
insert overwrite into test_db.public.stg (DN)
select parse_json('{
"ok": true,
"metrics": [
{
"name": "t_in",
"data": [
{"value": 0, "group": {"start": "00:00"}}
]
},
{
"name": "t_out",
"data": [
{"value": 0,"group": {"start": "00:00"}}
]
}
]
}');
Select statement example:
select
DN:metrics[0].name::STRING,
DN:metrics[1].data,
DN:metrics[1].data[0].group.start::TIME,
DN:metrics[1].data[0].group.finish::TIME
from test_db.public.stg;
Instead of querying individual indexes of the JSON arrays, I think you'll want to use the FLATTEN function, which is documented in the Snowflake documentation.
Here is how you do it with FLATTEN, which is what I am guessing you want:
select
mtr.value:name::string,
dta.value,
dta.value:group.start::string,
dta.value:group.finish::string
from test_db.public.stg stg,
lateral flatten(input => stg.DN:metrics) mtr,
lateral flatten(input => mtr.value:data) dta
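Since this query runs from a Python script, here is a hedged sketch of executing it with the snowflake-connector-python package; the connection parameters and column aliases are placeholders:
import snowflake.connector

# placeholder credentials -- replace with your real account/user/password
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="my_password",
    database="test_db", schema="public",
)

query = """
select
    mtr.value:name::string        as metric_name,
    dta.value:value               as metric_value,
    dta.value:group.start::string as metric_start
from test_db.public.stg stg,
     lateral flatten(input => stg.DN:metrics) mtr,
     lateral flatten(input => mtr.value:data) dta
"""

try:
    cur = conn.cursor()
    # the cursor is iterable; each row comes back as a tuple
    for metric_name, metric_value, metric_start in cur.execute(query):
        print(metric_name, metric_value, metric_start)
finally:
    conn.close()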