AWS MQTT SQL query

I have the following MQTT message:
{
  "sensors": [
    {
      "lsid": 412618,
      "data": [
        {
          "temp_in": 72.3,
          "heat_index_in": 72,
          "dew_point_in": 55.9,
          "ts": 1652785241,
          "hum_in": 56.3
        }
      ],
      "sensor_type": 243,
      "data_structure_type": 12
    },
    {
      "lsid": 421195,
    }
I can get the "sensors[0].lsid" value and the entire "data" array using this query:
select get(sensors, 0).lsid as ls, get(sensors, 0).data as data1 from "topic"
but what I really need is to get "temp_in": 72.3, i.e. the values from the second-level array.
I've tried using this: AWS Doc., but unless I'm not following it correctly, it doesn't seem to work.
Any help would be greatly appreciated.
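One thing worth trying, following the same get() pattern the query above already uses, is to nest get() calls to step into the inner "data" array and then address the field directly. This is an untested sketch against the same topic:
select get(get(sensors, 0).data, 0).temp_in as temp from "topic"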

Related

Querying Line Items of Order with JSON Functions in BigQuery

I have been banging my head here for the past 2 hours with all the available JSON_... functions in BigQuery. I've read quite a few questions here, but no matter what I try, I never succeed in extracting the "amounts" from my JSON below.
This is my JSON stored in a BQ column:
{
  "lines": [
    {
      "id": "70223039-83d6-463d-a482-7ce4d50bf0fc",
      "charges": [
        {
          "type": "price",
          "amount": 50.0
        },
        {
          "type": "discount",
          "amount": -40.00
        }
      ]
    },
    {
      "id": "70223039-83d6-463d-a482-7ce4d50bf0fc",
      "charges": [
        {
          "type": "price",
          "amount": 20.00
        },
        {
          "type": "discount",
          "amount": 0.00
        }
      ]
    }
  ]
}
Imagine the above being an order containing multiple items.
I am trying to get a sum of all amounts => 50-40+20+0. The result needs to be 30 = the total order price.
Is it possible to pull all the amount values and then have them summed up just via SQL without any custom JS functions? I guess the summing is the easy part - getting the amounts into an array is the challenge here.
Use below
select (
select sum(cast(json_value(charge, '$.amount') as float64))
from unnest(json_extract_array(order_as_json, '$.lines')) line,
unnest(json_extract_array(line, '$.charges')) charge
) total
from your_table
If applied to the sample data in your question, the output is 30.
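For a quick, self-contained way to try this, the sample JSON from the question can be inlined in a CTE standing in for your_table (the table and column names here are just placeholders, and the line ids are shortened):
with your_table as (
  select '{"lines":[{"id":"a","charges":[{"type":"price","amount":50.0},{"type":"discount","amount":-40.0}]},{"id":"b","charges":[{"type":"price","amount":20.0},{"type":"discount","amount":0.0}]}]}' as order_as_json
)
select (
  select sum(cast(json_value(charge, '$.amount') as float64))
  from unnest(json_extract_array(order_as_json, '$.lines')) line,
       unnest(json_extract_array(line, '$.charges')) charge
) total
from your_table
-- returns 30.0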

Apache Superset with MongoDB (NoSQL database)

I am using MongoDB. My task is to build dashboard charts for the data, so I am using Apache Superset. I connected MongoDB to Apache Drill, as it won't connect directly with Superset, and then connected Apache Drill to Apache Superset. My collection is nested. How can I process this nested data to use it for dashboard charts? My data looks as below:
{
  "_id": {
    "$oid": "6229d3cfdbfc81a8777e4821"
  },
  "jobs": [
    {
      "job_ID": {
        "$oid": "62289ded8079821eb24760e0"
      },
      "New": false,
      "Expired": false
    },
    {
      "job_ID": {
        "$oid": "6228a252fb4554dd5c48202a"
      },
      "New": true,
      "Expired": true
    },
    {
      "job_ID": {
        "$oid": "622af1c391b290d34701af9f"
      },
      "New": true,
      "Expired": false
    }
  ],
  "email": "mani2090996@ail.com"
}
I am querying in Apache Drill as follows:
SELECT flat.fill FROM (SELECT FLATTEN(t.jobs) AS fill FROM mongo.recruitingdb.flatten.`Vendorjobs` t) flat WHERE flat.fill.New = flase;
and I am getting a parsing error:
org.apache.drill.common.exceptions.UserRemoteException: PARSE ERROR: Encountered "." at line 1, column 123.
Superset doesn't really handle nested data very well. Drill does, however, so you'll have to craft queries that produce columns that can be visualized.
Take a look here: https://drill.apache.org/docs/json-data-model/
and here: https://drill.apache.org/docs/querying-complex-data-introduction/.
UPDATE:
Try the query below. The FROM clause may not be exactly right, but you should get the idea from this.
Note that you can access maps in Drill in two ways:
tablename.mapname.field OR
mapname['field']
You can do this for any level of nesting.
SELECT mongoTable.jobs.job_ID.`$oid` AS job_ID,
       mongoTable.jobs.`New` AS new,
       mongoTable.jobs.`Expired` AS expired
FROM (
  SELECT flatten(jobs) AS jobs
  FROM mongo.recruitingdb.flatten.`Vendorjobs` AS t1
  WHERE t1.jobs.New = false
) AS mongoTable
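As a small, untested illustration of the two access styles mentioned above, the same flattened field can presumably be addressed with either dot or bracket notation:
-- dot access and the bracket form noted above, reading the same flattened field
SELECT mongoTable.jobs.`New`  AS new_dot,
       mongoTable.jobs['New'] AS new_bracket
FROM (
  SELECT flatten(jobs) AS jobs
  FROM mongo.recruitingdb.flatten.`Vendorjobs`
) AS mongoTable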

How to load a jsonl file into BigQuery when the file has mixed data fields as columns

During my workflow, after extracting the data from the API, the JSON has the following structure:
[
  {
    "fields": [
      {
        "meta": {
          "app_type": "ios"
        },
        "name": "app_id",
        "value": 100
      },
      {
        "meta": {},
        "name": "country",
        "value": "AE"
      },
      {
        "meta": {
          "name": "Top"
        },
        "name": "position",
        "value": 1
      }
    ],
    "metrics": {
      "click": 1,
      "price": 1,
      "count": 1
    }
  }
]
Then it is stored as .jsonl and put on GCS. However, when I load it into BigQuery for further extraction, the automatic schema inference returns the following error:
Error while reading data, error message: JSON parsing error in row starting at position 0: Could not convert value to string. Field: value; Value: 100
I want to convert it into the following structure:
app_type | app_id | country | position | click | price | count
ios      | 100    | AE      | Top      | 1     | 1     | 1
Is there a way to define a manual schema on BigQuery to achieve this result? Or do I have to preprocess the jsonl file before putting it into BigQuery?
One of the limitations in loading JSON data from GCS to BigQuery is that it does not support maps or dictionaries in JSON.
An invalid example would be:
"metrics": {
"click": 1,
"price": 1,
"count": 1
}
Your jsonl file should be something like this:
{"app_type":"ios","app_id":"100","country":"AE","position":"Top","click":"1","price":"1","count":"1"}
I already tested it and it works fine.
So wherever you process the conversion of the JSON files to jsonl files and store them to GCS, you will have to do some preprocessing.
Probably you have two options:
precreate the target table with the app_id field as an INTEGER (see the sketch below)
preprocess the json file and enclose 100 in quotes, like "100"
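For the first option, a minimal sketch of the pre-created table, with a hypothetical dataset and table name and column types inferred from the flattened row above:
-- hypothetical names; app_id kept as an integer per option 1
create table mydataset.app_metrics (
  app_type string,
  app_id int64,
  country string,
  position string,
  click int64,
  price int64,
  count int64
);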

Cannot COPY JSON - DynamoDB Streams to Redshift

Following is the use case I am working on:
I have enabled Streams (with new and old images) when creating the DynamoDB table, and I have created a Kinesis Firehose delivery stream with Redshift as the destination (with an intermediate S3 bucket).
From DynamoDB my stream reaches Firehose, and from there it lands in the bucket as gzipped JSON (shown below). My problem is that I cannot COPY this JSON into Redshift.
Things I am not able to get:
Not sure what the CREATE TABLE statement in Redshift should be
What should the COPY syntax in Kinesis Firehose be?
How should I use JSONPaths here? Kinesis Data Firehose is set to return only JSON to my S3 bucket.
How to mention the manifest in the COPY command
The JSON loaded to S3 is shown below:
{
  "Keys": {
    "vehicle_id": {
      "S": "x011"
    }
  },
  "NewImage": {
    "heart_beat": {
      "N": "0"
    },
    "cdc_id": {
      "N": "456"
    },
    "latitude": {
      "N": "1.30951"
    },
    "not_deployed_counter": {
      "N": "1"
    },
    "reg_ind": {
      "N": "0"
    },
    "operator": {
      "S": "x"
    },
    "d_dttm": {
      "S": "11/08/2018 2:43:46 PM"
    },
    "z_id": {
      "N": "1267"
    },
    "last_end_trip_dttm": {
      "S": "11/08/2018 1:43:46 PM"
    },
    "land_ind": {
      "N": "1"
    },
    "s_ind": {
      "N": "1"
    },
    "status_change_dttm": {
      "S": "11/08/2018 2:43:46 PM"
    },
    "case_ind": {
      "N": "1"
    },
    "last_po_change_dttm": {
      "S": "11/08/2018 2:43:46 PM"
    },
    "violated_duration": {
      "N": "20"
    },
    "vehicle_id": {
      "S": "x011"
    },
    "longitude": {
      "N": "103.7818"
    },
    "file_status": {
      "S": "Trip_Start"
    },
    "unhired_duration": {
      "N": "10"
    },
    "eo_lat": {
      "N": "1.2345"
    },
    "reply_eo_ind": {
      "N": "1"
    },
    "license_ind": {
      "N": "0"
    },
    "indiscriminately_parked_ind": {
      "N": "0"
    },
    "eo_lng": {
      "N": "102.8978"
    },
    "officer_id": {
      "S": "xxxx@gmail.com"
    },
    "case_status": {
      "N": "0"
    },
    "color_status_cd": {
      "N": "0"
    },
    "parking_id": {
      "N": "2345"
    },
    "ttr_dttm": {
      "S": "11/08/2018 2:43:46 PM"
    },
    "deployed_ind": {
      "N": "1"
    },
    "status": {
      "S": "PI"
    }
  },
  "SequenceNumber": "1200000000000956615967",
  "SizeBytes": 570,
  "ApproximateCreationDateTime": 1535513040,
  "eventName": "INSERT"
}
My CREATE TABLE statement:
create table vehicle_status(
heart_beat integer,
cdc_id integer,
latitude integer,
not_deployed_counter integer,
reg_ind integer,
operator varchar(10),
d_dttm varchar(30),
z_id integer,
last_end_trip_dttm varchar(30),
land_ind integer,
s_ind integer,
status_change_dttm varchar(30),
case_ind integer,
last_po_change_dttm varchar(30),
violated_duration integer,
vehicle_id varchar(8),
longitude integer,
file_status varchar(30),
unhired_duration integer,
eo_lat integer,
reply_eo_ind integer,
license_ind integer,
indiscriminately_parked_ind integer,
eo_lng integer,
officer_id varchar(50),
case_status integer,
color_status_cd integer,
parking_id integer,
ttr_dttm varchar(30),
deployed_ind varchar(3),
status varchar(8));
And my COPY statement (manually trying to resolve this from Redshift):
COPY vehicle_status (heart_beat, cdc_id, latitude, not_deployed_counter, reg_ind, operator, d_dttm, z_id, last_end_trip_dttm, land_ind, s_ind, status_change_dttm, case_ind, last_po_change_dttm, violated_duration, vehicle_id, longitude, file_status, unhired_duration, eo_lat, reply_eo_ind, license_ind, indiscriminately_parked_ind, eo_lng, officer_id, case_status, color_status_cd, parking_id, ttr_dttm, deployed_ind, status)
FROM 's3://<my-bucket>/2018/08/29/05/vehicle_status_change-2-2018-08-29-05-24-42-092c330b-e14a-4133-bf4a-5982f2e1f49e.gz' CREDENTIALS 'aws_iam_role=arn:aws:iam::<accountnum>:role/<RedshiftRole>' GZIP json 'auto';
When I try the above procedure, the records do get inserted, but all the columns and rows are null.
How can I copy this JSON format to Redshift? I have been stuck here for the last 3 days. Any help on this would do.
S3 Bucket:
Amazon S3/<My-bucket>/2018/08/29/05
Amazon S3/<My-bucket>/manifests/2018/08/29/05
I'm not very familiar with Amazon, but let me try to answer most of your questions so that you can move on. Other people are most welcome to edit this answer or add additional details. Thank you!
Not sure what the CREATE TABLE statement in Redshift should be
Your create statement create table vehicle_status(...) has no problem, though you could add a distribution key, sort key, and encodings based on your requirements; refer to more here and here.
As per the AWS Kinesis documentation, your table must already be present in Redshift, hence you can connect to Redshift using the psql command and run the create statement manually.
What should the COPY syntax in Kinesis Firehose be?
The COPY syntax remains the same whether you run it via psql or Firehose. Luckily, the copy script you have come up with works without any error; I tried it on my instance with a small modification (supplying the AWS access/secret keys directly rather than the IAM role) and it works fine. Here is the SQL I ran that worked and copied 1 data record to the table vehicle_status.
Actually, your JSON structure is complex, hence json 'auto' will not work. Here is the working command. I have created a sample jsonpaths file for you with 4 example fields, and you can follow the same structure to create a jsonpaths file with all the data points.
COPY vehicle_status (heart_beat, cdc_id, operator, status) FROM 's3://XXX/development/test_file.json' CREDENTIALS 'aws_access_key_id=XXXXXXXXXXXXXXXXX;aws_secret_access_key=MYXXXXXXXXXXXXXXXXXXXXXX' json 's3://XXX/development/yourjsonpathfile';
And your jsonpaths file should have content similar to the below.
{
  "jsonpaths": [
    "$['NewImage']['heart_beat']['N']",
    "$['NewImage']['cdc_id']['N']",
    "$['NewImage']['operator']['S']",
    "$['NewImage']['status']['S']"
  ]
}
I have tested it and it works.
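For completeness, a full jsonpaths file covering every column in the COPY column list from the question would presumably look like the following (field names and their N/S type tags are taken from the sample record above; the order of the expressions must match the order of the columns in the COPY statement):
{
  "jsonpaths": [
    "$['NewImage']['heart_beat']['N']",
    "$['NewImage']['cdc_id']['N']",
    "$['NewImage']['latitude']['N']",
    "$['NewImage']['not_deployed_counter']['N']",
    "$['NewImage']['reg_ind']['N']",
    "$['NewImage']['operator']['S']",
    "$['NewImage']['d_dttm']['S']",
    "$['NewImage']['z_id']['N']",
    "$['NewImage']['last_end_trip_dttm']['S']",
    "$['NewImage']['land_ind']['N']",
    "$['NewImage']['s_ind']['N']",
    "$['NewImage']['status_change_dttm']['S']",
    "$['NewImage']['case_ind']['N']",
    "$['NewImage']['last_po_change_dttm']['S']",
    "$['NewImage']['violated_duration']['N']",
    "$['NewImage']['vehicle_id']['S']",
    "$['NewImage']['longitude']['N']",
    "$['NewImage']['file_status']['S']",
    "$['NewImage']['unhired_duration']['N']",
    "$['NewImage']['eo_lat']['N']",
    "$['NewImage']['reply_eo_ind']['N']",
    "$['NewImage']['license_ind']['N']",
    "$['NewImage']['indiscriminately_parked_ind']['N']",
    "$['NewImage']['eo_lng']['N']",
    "$['NewImage']['officer_id']['S']",
    "$['NewImage']['case_status']['N']",
    "$['NewImage']['color_status_cd']['N']",
    "$['NewImage']['parking_id']['N']",
    "$['NewImage']['ttr_dttm']['S']",
    "$['NewImage']['deployed_ind']['N']",
    "$['NewImage']['status']['S']"
  ]
}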
How should I use JSONPaths here? Kinesis Data Firehose is set to return only JSON to my S3 bucket.
I have used only your example JSON data and it works, so I see no issue here.
How to mention the manifest in the COPY command
This is a good question; let me try explaining it. I assume you are referring to a manifest here.
If you look at the above copy command, it works fine for one file or a couple of files, but if you have lots of files, that is where the concept of a manifest comes in.
Straight from the Amazon docs: "Instead of supplying an object path for the COPY command, you supply the name of a JSON-formatted text file that explicitly lists the files to be loaded."
In short, if you want to load multiple files in a single shot (which is also the preferred way for Redshift), you can create a simple JSON manifest and supply it in the copy command.
{
  "entries": [
    {"url":"s3://mybucket-alpha/2013-10-04-custdata", "mandatory":true},
    {"url":"s3://mybucket-alpha/2013-10-05-custdata", "mandatory":true},
    ....
  ]
}
Upload the manifest to S3 and use it in your copy command like below:
COPY vehicle_status (heart_beat, cdc_id, latitude, not_deployed_counter, reg_ind, operator, d_dttm, z_id, last_end_trip_dttm, land_ind, s_ind, status_change_dttm, case_ind, last_po_change_dttm, violated_duration, vehicle_id, longitude, file_status, unhired_duration, eo_lat, reply_eo_ind, license_ind, indiscriminately_parked_ind, eo_lng, officer_id, case_status, color_status_cd, parking_id, ttr_dttm, deployed_ind, status) FROM 's3://XXX/development/test.manifest' CREDENTIALS 'aws_access_key_id=XXXXXXXXXXXXXXXXX;aws_secret_access_key=MYXXXXXXXXXXXXXXXXXXXXXX' json 's3://yourbucket/jsonpath' manifest;
Here is a detailed reference for the manifest.
I hope this gives you some ideas on how to move on, and if there is a specific error you see, I would be happy to refocus the answer.

Is it possible to turn an array returned by the Mongo GeoNear command (using Ruby/Rails) into a Plucky object?

As a total newbie I have been trying to get the geoNear command working in my Rails application, and it appears to be working fine. The major annoyance for me is that it is returning an array with strings rather than keys which I can call on to pull out data.
Having dug around, I understand that MongoMapper uses Plucky to turn the query result into a friendly object which can be handled easily, but I haven't been able to find out how to transform the result of my geoNear query into a Plucky object.
My questions are:
(a) Is it possible to turn this into a Plucky object, and how do I do that?
(b) If it is not possible, how can I most simply and systematically extract each record and each field?
Here is the query in my controller:
@mult = 3963 * (3.14159265 / 180 ) # Scale to miles on earth
@results = @db.command( {'geoNear' => "places", 'near' => @search.coordinates, 'distanceMultiplier' => @mult, 'spherical' => true})
Here is the object I'm getting back (with document content removed for simplicity):
{"ns"=>"myapp-development.places", "near"=>"1001110101110101100100110001100010100010000010111010", "results"=>[{"dis"=>0.04356444023196527, "obj"=>{"_id"=>BSON::ObjectId('4ee6a7d210a81f05fe000001'),...}}], "stats"=>{"time"=>0, "btreelocs"=>0, "nscanned"=>1, "objectsLoaded"=>1, "avgDistance"=>0.04356444023196527, "maxDistance"=>0.0006301239824196907}, "ok"=>1.0}
Help is much appreciated!!
OK, so let's say you store the results in a variable called places_near:
places_near = t.command( {'geoNear' => "places", 'near'=> [50,50] , 'distanceMultiplier' => 1, 'spherical' => true})
This command returns a hash that has a key (results) which maps to a list of results for the query. The returned document looks like this:
{
  "ns": "test.places",
  "near": "1100110000001111110000001111110000001111110000001111",
  "results": [
    {
      "dis": 69.29646421910687,
      "obj": {
        "_id": ObjectId("4b8bd6b93b83c574d8760280"),
        "y": [
          1,
          1
        ],
        "category": "Coffee"
      }
    },
    {
      "dis": 69.29646421910687,
      "obj": {
        "_id": ObjectId("4b8bd6b03b83c574d876027f"),
        "y": [
          1,
          1
        ]
      }
    }
  ],
  "stats": {
    "time": 0,
    "btreelocs": 1,
    "nscanned": 2,
    "objectsLoaded": 2,
    "avgDistance": 69.29646421910687
  },
  "ok": 1
}
To iterate over the results, just iterate as you would over any list in Ruby:
places_near['results'].each do |result|
# do stuff with result object
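  # e.g. result['dis'] is the computed distance and result['obj'] is the matched document as a plain hash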
end