AWS boto3 page_iterator.search can't compare datetime.datetime to str - amazon-s3

I'm trying to capture delta files (files created after the last processing run) sitting on S3. To do that, I want to use boto3's filtered iterator to query by the LastModified value rather than returning the whole list of files and filtering on the client side.
According to http://jmespath.org/, the query below is valid and should filter the sample JSON response shown further down:
filtered_iterator = page_iterator.search(
    "Contents[?LastModified>='datetime.datetime(2016, 12, 27, 8, 5, 37, tzinfo=tzutc())'].Key")
for key_data in filtered_iterator:
    print(key_data)
However, it fails with:
RuntimeError: xxxxxxx has failed: can't compare datetime.datetime to str
Sample paginator response:
{
    "Contents": [{
        "LastModified": "datetime.datetime(2016, 12, 28, 8, 5, 31, tzinfo=tzutc())",
        "ETag": "1022dad2540da33c35aba123476a4622",
        "StorageClass": "STANDARD",
        "Key": "blah1/blah11/abc.json",
        "Owner": {
            "DisplayName": "App-AWS",
            "ID": "bfc77ae78cf43fd1b19f24f99998cb86d6fd8220dbfce0ce6a98776253646656"
        },
        "Size": 623
    }, {
        "LastModified": "datetime.datetime(2016, 12, 28, 8, 5, 37, tzinfo=tzutc())",
        "ETag": "1022dad2540da33c35abacd376a44444",
        "StorageClass": "STANDARD",
        "Key": "blah2/blah22/xyz.json",
        "Owner": {
            "DisplayName": "App-AWS",
            "ID": "bfc77ae78cf43fd1b19f24f99998cb86d6fd8220dbfce0ce6a81234e632c5a8c"
        },
        "Size": 702
    }]
}

Boto3's JMESPath implementation does not support filtering on dates (it treats them as the incompatible types "unicode" and "datetime" in your example). But because of the way Amazon formats the dates, you can perform a lexicographical comparison of them using JMESPath's to_string() function.
Something like this:
"Contents[?to_string(LastModified)>='\"2015-01-01 01:01:01+00:00\"']"
But keep in mind that it is a lexicographical comparison and not a date comparison. It works most of the time, though.
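For completeness, here is a minimal sketch (not from the original question) of how that to_string() filter plugs into a paginator; the bucket name and cut-off timestamp are placeholders, and the escaped inner double quotes are needed because to_string() wraps the value in JSON quotes:

import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects")
page_iterator = paginator.paginate(Bucket="mytestbucket")  # placeholder bucket

# Stringify LastModified and compare it lexicographically against a literal in the
# same "YYYY-MM-DD HH:MM:SS+00:00" form that the parsed datetimes render to.
filtered_iterator = page_iterator.search(
    "Contents[?to_string(LastModified)>='\"2016-12-27 08:05:37+00:00\"'].Key"
)

for key in filtered_iterator:
    print(key)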

After spending a few minutes with the boto3 paginator documentation, I realized it is actually a syntax problem, which I had overlooked by treating the value as a plain string.
The quotes that wrap the comparison value on the right are backquotes/backticks, symbol [ ` ]. You cannot use single quotes [ ' ] around the comparison values/objects.
After inspecting the JMESPath examples, I noticed they use backquotes for literal comparison values, so the boto3 paginator implementation does indeed comply with the JMESPath standard.
Here is the code I ran without error using the backquotes:
import boto3

s3 = boto3.client("s3")
s3_paginator = s3.get_paginator('list_objects')
s3_iterator = s3_paginator.paginate(Bucket='mytestbucket')
# Note the backticks (JMESPath literal syntax), not single quotes, around the comparison value.
filtered_iterator = s3_iterator.search(
    "Contents[?LastModified >= `datetime.datetime(2016, 12, 27, 8, 5, 37, tzinfo=tzutc())`].Key"
)
for key_data in filtered_iterator:
    print(key_data)

Related

AWS IoT SQL: Parsing string to JSON

I am writing an AWS IoT Core rule where the incoming message object has a property that contains JSON in an escaped string. Is there a way to convert this to JSON in the result?
Example
Input message
{
    "Value": "{\"x\": 1, \"y\": 2}",
    "Timestamp": "2022-09-09T13:44:37.000Z"
}
Desired output
{
    "x": 1,
    "y": 2,
    "Timestamp": "2022-09-09T13:44:37.000Z"
}
I am aware that it is possible to write a Lambda to do this, but I was hoping it would be possible with just SQL.

Extracting particular nested properties with a $ prefix in Amazon Redshift or Quicksight

I am using PostHog for product analytics and have exported some event data to Amazon Redshift as well as S3 to be used in Quicksight.
Under the personal properties part of the JSON, each individual property is nested but begins with a $.
I am quite new to SQL queries, as well as to pulling specific details out of JSON, whether in Redshift or in Quicksight using parseJson.
Here is an example of the JSON from PostHog:
"properties": {
"$active_feature_flags": [],
"$browser": "Chrome",
"$browser_version": 98,
"$ce_version": 1,
"$device_type": "Desktop",
"$environment": "test",
"$event_type": "click",
"$lib": "web",
"$lib_version": "1.17.8",
"$os": "Mac OS X",
"$pathname": "/events",
"$plugins_deferred": [],
"$plugins_failed": [],
"$plugins_succeeded": [
"First Event Today (4914)",
"GeoIP (5539)"
],
I have sought help from a few sources who have mentioned it isn't as simple because of the $ symbol at the beginning.
So my questions would be:
1. How would I query this in Redshift to successfully extract $device_type and $os, for example?
2. How would I pull the same properties using parseJson in Amazon Quicksight?
I can answer #1.
The JSON provided looks to be a snippet and is invalid as-is, so I removed the trailing ',' and used SQL to provide the surrounding '{}'. Once it is valid JSON, this runs fine:
create table test as select '"properties": {
    "$active_feature_flags": [],
    "$browser": "Chrome",
    "$browser_version": 98,
    "$ce_version": 1,
    "$device_type": "Desktop",
    "$environment": "test",
    "$event_type": "click",
    "$lib": "web",
    "$lib_version": "1.17.8",
    "$os": "Mac OS X",
    "$pathname": "/events",
    "$plugins_deferred": [],
    "$plugins_failed": [],
    "$plugins_succeeded": [
        "First Event Today (4914)",
        "GeoIP (5539)"
    ]
}' as json_text;

select json_extract_path_text('{' || json_text || '}', 'properties', '$device_type') as device_type,
       json_extract_path_text('{' || json_text || '}', 'properties', '$os') as os
from test;
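As a side note on the worry about the $ prefix: "$" is not a special character inside a JSON key, so no escaping is needed anywhere. A rough sanity check in plain Python (standard json module only, trimmed-down data):

import json

# Trimmed version of the snippet above, surrounded by '{}' and without the
# trailing ',' so that it parses as valid JSON.
doc = json.loads('{"properties": {"$device_type": "Desktop", "$os": "Mac OS X"}}')

# The $-prefixed keys are looked up as ordinary strings.
print(doc["properties"]["$device_type"])  # Desktop
print(doc["properties"]["$os"])           # Mac OS X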

SQL Server trim before and after specific values

I have a database that has a column with a long string and I'm looking for a way to extract just a certain portion of it.
Here is a sample:
{
    "vendorId": 53,
    "externalRef": "38828059 $567.82",
    "lines": [{
        "amount": 0,
        "lineType": "PURCHASE",
        "lineItemType": "INVENTORY",
        "inventory": {
            "cost": 0,
            "quantity": 1,
            "row": "6",
            "seatType": "CONSECUTIVE",
            "section": "102",
            "notes": "http://testurl/0F005B52CE7F5892 38828059 $567.82 ,special",
            "splitType": "ANY",
            "stockType": "ELECTRONIC",
            "listPrice": 0,
            "publicNotes": " https://brokers.123.com/wholesale/event/146489908 https://www.123.com/buy-event/4897564 ",
            "eventId": 3757669,
            "eventMapping": {
                "eventDate": "",
                "eventName": "Brandi Carlile: Beyond These Silent Days Tour",
                "venueName": "Gorge Amphitheatre"
            },
            "tickets": [{
                "seatNumber": 1527
            }]
        }
    }]
}
What I'm looking to extract is just http://testurl/0F005B52CE7F5892
Would someone be able to assist me with the syntax for a query that adds a new computed column containing just this extracted value for each row?
I use SQL Server 2008, so some newer functions won't work for me.
Upgrade your SQL Server to a supported version.
But till then, we pity those who dare to face the horror of handling JSON with only the old string functions.
select
    [notes_url] =
        CASE
            WHEN [json_column] LIKE '%"notes": "http%'
            THEN substring([json_column],
                           patindex('%"notes": "http%', [json_column]) + 10,
                           charindex(' ', [json_column],
                                     patindex('%"notes": "http%', [json_column]) + 15)
                               - patindex('%"notes": "http%', [json_column]) - 10)
        END
from [YourTable];
db<>fiddle here
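If the magic numbers are hard to follow, here is a rough Python rendering of the same arithmetic (illustrative only, using the notes value from the sample): the +10 skips the ten characters of the quoted key, colon, space and opening quote, and starting the space search at +15 skips past the literal http so the first space found is the one that terminates the URL.

# Illustrative Python equivalent of the T-SQL offset arithmetic above.
json_column = '{"notes": "http://testurl/0F005B52CE7F5892 38828059 $567.82 ,special"}'

marker = '"notes": "http'
start = json_column.find(marker)                  # like patindex, but 0-based
if start != -1:
    url_start = start + 10                        # skip the 10 chars before the URL
    space_pos = json_column.find(' ', start + 15) # first space after 'http'
    print(json_column[url_start:space_pos])       # http://testurl/0F005B52CE7F5892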

Get all values from property with LINQ from JSON?

I'm having trouble sorting out the exact syntax to properly query my JSON response.
My API endpoint returns some JSON as follows:
{
    "status": "Succeeded",
    "recognitionResults": [
        {
            "page": 1,
            "clockwiseOrientation": 0.14,
            "width": 2835,
            "height": 2241,
            "unit": "pixel",
            "lines": [
                {
                    "boundingBox": [25, 11, 324, 15, 323, 51, 24, 46],
                    "text": "Custom Report",
                    "words": [
                        {
                            "boundingBox": [37, 11, 171, 14, 172, 49, 38, 48],
                            "text": "Custom"
                        },
                        {
                            "boundingBox": [193, 15, 322, 17, 323, 49, 194, 49],
                            "text": "Report"
                        }
                    ]
                }...
What is important to note is that the root element will only contain one recognitionResults array. Inside this array there are many lines arrays. Each line has a text property, and also a words property whose elements each have a text property of their own. I'm only concerned with the text property that is a direct child of lines.
I'm attempting to select all of the text properties into a list of strings.
VB.NET code:
File.WriteAllText(Path.GetFileName(strFilePath) & ".json", JToken.Parse(strResult).ToString())
Dim c1 As JArray = CType(tmpObj("recognitionResults"), JArray)
Dim c2 = (From s In c1.Children() Select s("text")).ToList()
This throws an exception that the JArray has an invalid key; an int is expected.
I also thought I could just query it with LINQ directly:
Dim c3 = (From s In tmpObj Select s("text")).ToList()
This throws an exception that it cannot access a child value on Newtonsoft.Json.Linq.JProperty.
Lastly, I've also tried this:
Dim c2 = (From p In tmpObj("recognitionResults")("lines").Children() Select p("text"))
I'm really stuck at this point. I think I just have a syntax problem in how I am trying to select. Can someone point me in the right direction?
You can use SelectTokens with a wildcard to get the information you want from the JSON more easily:
Dim token As JToken = JToken.Parse(json)
Dim lines As List(Of String) = _
(From t In token.SelectTokens("..lines[*].text") Select CStr(t)).ToList()
Working demo: https://dotnetfiddle.net/Zlenga
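For anyone who wants to double-check what "..lines[*].text" selects (line-level text only, not the nested words), here is a rough stand-in in plain Python against a minimal version of the response; it is only an illustration of the selection, not part of the original answer.

import json

# Minimal stand-in for the response above, keeping only the fields that matter.
sample = '''{
  "recognitionResults": [{
    "lines": [{
      "text": "Custom Report",
      "words": [{"text": "Custom"}, {"text": "Report"}]
    }]
  }]
}'''

doc = json.loads(sample)

# Take "text" only from each line, not from the nested "words" entries,
# mirroring what SelectTokens("..lines[*].text") returns.
texts = [line["text"]
         for result in doc["recognitionResults"]
         for line in result["lines"]]
print(texts)  # ['Custom Report']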

Azure Stream Analytics alternative to Sparks mapWithState

Is there a way in Azure Stream Analytics to create some aggregation with a custom state, like Spark's mapWithState does?
Here is my scenario:
I have data from IoT devices containing the following fields:
DeviceId
Position
Value
The data may arrive out of order.
Whenever a new packet arrives for a given DeviceId, I want to output the last n positions and values for that device. For example:
Input:
{ "DeviceId": "A", "Position": 10, "Value": 100}
Output:
{ "DeviceId": "A", "Positions": [10], "Value": [100]}
Next Input:
{ "DeviceId": "A", "Position": 11, "Value": 101}
Output:
{ "DeviceId": "A", "Positions": [10, 11], "Value": [100, 101]}
Next Input:
{ "DeviceId": "A", "Position": 9, "Value": 99}
Output:
{ "DeviceId": "A", "Positions": [9, 10, 11], "Value": [9, 100, 101]}
In Spark Structured Streaming I would implement this using groupBy and mapWithState. Is there a way to implement this in ASA?
In ASA, you can use one of the following methods to do this:
1. If you have an additional column that can be used as a timestamp, you can use TIMESTAMP BY and ASA will reorder the events. Then you can use LAG to fetch the latest events for this particular device.
2. Without any timestamp column, you can use the COLLECTTOP operator and order the events according to your "Position" column.
3. Alternatively, you can implement your own stateful logic using User Defined Aggregates (UDA) as described here.
Let me know if you need help to implement one of these 3 methods. I'll be happy to provide further details.
Thanks,
JS