I have a message coming from a topic through MQTT.
I need to change the names of the columns of the message.
The original message:
{
  "timestamp": 1645722065088,
  "Heart Rate Measurement": 24550,
  "Energy Expended": 1900,
  "RR-Interval": 1
}
I need to take just timestamp and Heart Rate inside of a rule:
SELECT "Heart Rate Measurement"as heartrate, timestamp as date FROM
'pulsewave/heart_rate'
The timestamp is easy to get, but the "Heart Rate Measurement" is not.
I ended up getting the following:
{
  "heartrate": "Heart Rate Measurement",
  "date": 1645722065088
}
Any tips on how to get the value inside of Heart Rate Measurement? When I select it without the quotes, it isn't accepted.
The rule works for the timestamp attribute but not for Heart Rate Measurement, as the AWS IoT SQL syntax doesn't support spaces in attribute names.
From https://docs.aws.amazon.com/iot/latest/developerguide/iot-sql-reference.html
Attribute names with spaces in them can't be used as field names in the SQL statement. While the incoming payload can have attribute names with spaces in them, such names can't be used in the SQL statement. They will, however, be passed through to the outgoing payload if you use a wildcard (*) field name specification.
An alternate approach is to implement a Lambda function that projects your JSON payload to an equivalent without the spaces.
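As a rough sketch only (the handler name, the key mapping, and the assumption that the rule invokes it with a '*' select are mine, not from the question), such a Lambda could look like this:

# Hypothetical mapping from the space-containing attribute names in the
# incoming payload to names that are usable in an IoT SQL statement.
KEY_MAP = {
    "timestamp": "date",
    "Heart Rate Measurement": "heartrate",
}

def lambda_handler(event, context):
    # 'event' is assumed to be the JSON payload published on
    # 'pulsewave/heart_rate', e.g. passed through by a rule that selects '*'.
    projected = {new_key: event[old_key]
                 for old_key, new_key in KEY_MAP.items()
                 if old_key in event}
    # Returns e.g. {"date": 1645722065088, "heartrate": 24550}; the result
    # could also be republished to another topic for a second, simpler rule.
    return projected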
I am trying to perform a simple copy activity in Azure Data Factory from CSV to a SQL table, but I'm getting the following error:
{
  "errorCode": "2200",
  "message": "ErrorCode=DelimitedTextMoreColumnsThanDefined,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error found when processing 'Csv/Tsv Format Text' source 'organizations.csv' with row number 6696: found more columns than expected column count 41.,Source=Microsoft.DataTransfer.Common,'",
  "failureType": "UserError",
  "target": "Copy data1",
  "details": []
}
The copy activity is as follows.
Source:
My sink is as follows:
A preview of the data in the source is as follows:
This seems like a very straightforward copy activity. Any thoughts on what might be causing the error?
My row 6696 looks like the following:
3b1a2e5f-d08b-166b-4b91-eb53009b2377 Compassites Software Solutions organization compassites-software https://www.crunchbase.com/organization/compassites-software 318375 17/07/2008 10:46 05/12/2022 12:17 company compassitesinc.com http://www.compassitesinc.com IND Karnataka Bangalore "Pradeep Court", #163/B, 6th Main 3rd Cross, JP Nagar 3rd phase 560078 operating Custom software solution experts Big Data,Cloud Computing,Information Technology,Mobile,Software Data and Analytics,Information Technology,Internet Services,Mobile,Software 01/11/2005 51-100 info#compassitesinc.com 080-42032572 http://www.facebook.com/compassites http://www.linkedin.com/company/compassites-software-solutions http://twitter.com/compassites https://res.cloudinary.com/crunchbase-production/image/upload/v1397190270/c3e5acbde40f36eaf4f8c6f6eda3f803.png company
No commas
As the error message indicates, there is a record at row number 6696 with a value that contains the delimiter character (,) in it.
Look at the following demonstration where I have taken a similar case. I have 3 columns in my source. The data looks as shown below:
When I use similar dataset settings and read these values, the same error is thrown.
So, the value T1,OG is treated as if it belonged to 2 different columns, since it contains the dataset's column delimiter.
Such values throw an error because they are ambiguous to read. One way to avoid this is to enclose such values with a quote character (a double quote in this case).
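As a minimal illustration of why the quoting matters (Python's csv module is only standing in for the delimited-text parsing here; the column names are made up):

import csv
import io

# Unquoted: the comma inside "T1,OG" is read as a column separator,
# so the row appears to have 4 columns instead of the defined 3.
unquoted = "id,code,name\n1,T1,OG,widget\n"
print(list(csv.reader(io.StringIO(unquoted)))[1])
# ['1', 'T1', 'OG', 'widget']  -> "more columns than expected"

# Quoted: enclosing the value in the quote character keeps it in one column.
quoted = 'id,code,name\n1,"T1,OG",widget\n'
print(list(csv.reader(io.StringIO(quoted)))[1])
# ['1', 'T1,OG', 'widget']     -> 3 columns, as defined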
Now when I run the copy activity, it would give the desired output.
The table data would look like this:
I want to extract 4 values out of one field, called msg, from a Splunk query, and msg is in the form of:
msg: "Service call successful k1=v1 k2=v2 k3=v3 k4=v4 k5=v5 something else can be ignored"
The keys are always static but the values are not; for instance, v2 could be XXX or XXYYZZ, and similarly the possible values for v3 have unpredictable length.
I queried for some sample results and hoped to use the Field Extractor to generate a regex, but the generated regex can't get all the values out, and I guess it's probably because the values don't all have the same length?
Do I need to change my logging format by separating each key=value pair with a comma? Or am I not using the field extractor correctly?
[Update1]: A few sample data:
msg:Service call successful k1=XXX k2=BBBB k3=Something I made up k4=YYYNNN k5=do not need to retrieve this value
msg:Service call successful k1=SSSSSS k2=AAA k3=This could contain space and comma, like this one k4=YYYNNM k5=can be ignored
I could change the logging format if it makes it easier to query and extract fields. Will adding a separator like a dot or a pipe help?
Normally Splunk will pull key-value pairs out automatically.
However, when it doesn't, go try your regular expression(s) on regex101 - the field extractor is often a good[ish] start, but rarely creates efficient (or complete) regular expressions.
An inline version of this would be as follows (presuming the "value" half of the key-value pair is contiguous characters):
| rex field=_raw "k1=(?<k1>\S+)\s+k2=(?<k2>\S+)\s+k3=(?<k3>\S+)\s+k4=(?<k4>\S+)\s+k5=(?<k5>\S+)"
Normally I prefer to do sequential rex calls, in case something's out of order or missing, but if your data's consistent, this will work.
Once you have it the way you want it, update your props.conf and transforms.conf as appropriate for the sourcetype.
EDIT for updated sample data / comment response:
...
| rex field=_raw "k3=(?<k3>.+)\s+k4="
| rex field=_raw "k4=(?<k4>.+)\s+k5="
...
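To see why the \S+ version fails on values that contain spaces while the anchored version above works, here is a quick check with Python's re module against the second sample event (note Python spells named groups (?P<name>...), whereas Splunk accepts (?<name>...)):

import re

# Second sample event from the question (k3 contains spaces and a comma).
event = ("msg:Service call successful k1=SSSSSS k2=AAA "
         "k3=This could contain space and comma, like this one "
         "k4=YYYNNM k5=can be ignored")

# Assuming each value is contiguous non-whitespace (\S+) breaks at k3.
strict = (r"k1=(?P<k1>\S+)\s+k2=(?P<k2>\S+)\s+k3=(?P<k3>\S+)"
          r"\s+k4=(?P<k4>\S+)\s+k5=(?P<k5>\S+)")
print(re.search(strict, event))                 # None -> nothing extracted

# Anchoring each capture on the next key tolerates spaces in the value.
print(re.search(r"k3=(?P<k3>.+)\s+k4=", event).group("k3"))
# This could contain space and comma, like this one
print(re.search(r"k4=(?P<k4>.+)\s+k5=", event).group("k4"))
# YYYNNM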
Are Heart Points available in the REST API for reading? If so, how do we get to them? I'm not seeing it in the documentation. Thanks.
Eric
You should use the Users.dataSources.datasets API endpoint. You can grab the heart points merged from all data points by querying the dataSourceId "derived:com.google.heart_minutes:com.google.android.gms:merge_heart_minutes". It returns a JSON object with an array called "points". You'll find each heart point in that list, and if you drill down further for each heart point you'll get the derived source.
The endpoint takes the form:
https://www.googleapis.com/fitness/v1/users/me/dataSources/dataSourceId/datasets/datasetId
Replace the following in the URL above:
dataSourceId: derived:com.google.heart_minutes:com.google.android.gms:merge_heart_minutes
datasetId: The ID is formatted like "startTime-endTime", where startTime and endTime are 64-bit integers.
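A minimal sketch of calling this endpoint with Python's requests library (the access token and the nanosecond timestamps are placeholders; the response's data points are read from its "point" array):

import requests

# Placeholder OAuth 2.0 access token with a scope that covers heart points
# (e.g. https://www.googleapis.com/auth/fitness.activity.read).
ACCESS_TOKEN = "ya29.your-access-token"

DATA_SOURCE_ID = ("derived:com.google.heart_minutes:"
                  "com.google.android.gms:merge_heart_minutes")
# startTime-endTime as 64-bit epoch timestamps in nanoseconds (placeholders).
DATASET_ID = "1607904000000000000-1608057778000000000"

url = ("https://www.googleapis.com/fitness/v1/users/me/dataSources/"
       f"{DATA_SOURCE_ID}/datasets/{DATASET_ID}")
resp = requests.get(url, headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
resp.raise_for_status()

# Each entry in the dataset's "point" array holds its heart points in
# value[0]["fpVal"], as in the example point shown further below.
for point in resp.json().get("point", []):
    print(point["startTimeNanos"], point["value"][0]["fpVal"])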
Expanding on WiteCastle's answer, this datasource will provide you with the heart points.
"derived:com.google.heart_minutes:com.google.android.gms:merge_heart_minutes"
You will need to specify a timeframe denoted by the datasetId parameter, which is a start time and an end time in epoch time with nanoseconds format, e.g.:
1607904000000000000-1608057778000000000
The JSON response includes an array of points, essentially one for each time the sensor detected the user's activity. The 'heart points' are accessible within each point's "fpVal". An example of a point is below:
{
  "startTimeNanos": "1607970900000000000",
  "endTimeNanos": "1607970960000000000",
  "dataTypeName": "com.google.heart_minutes",
  "originDataSourceId": "derived:com.google.heart_rate.bpm:com.google.android.gms:merge_heart_rate_bpm",
  "value": [
    {
      "fpVal": 2, <--- 2 heart points recorded during this activity
      "mapVal": []
    }
  ],
  "modifiedTimeMillis": "1607976569329"
},
To get the heart points for today, specify the timeframe (00:00-23:59 in epoch format), then loop through each point, adding up all the "fpVal" values.
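A small sketch of that aggregation (the response dict here is a stand-in for the JSON returned by the datasets endpoint above):

from datetime import datetime, time, timezone

# Today's timeframe (00:00-23:59:59 UTC) as epoch nanoseconds, formatted
# as the "startTime-endTime" datasetId used in the request URL.
today = datetime.now(timezone.utc).date()
start_ns = int(datetime.combine(today, time.min, tzinfo=timezone.utc).timestamp() * 1_000_000_000)
end_ns = int(datetime.combine(today, time.max, tzinfo=timezone.utc).timestamp() * 1_000_000_000)
dataset_id = f"{start_ns}-{end_ns}"

# Stand-in for the parsed JSON response; the real one contains one entry
# per data point, each shaped like the example point above.
data = {
    "point": [
        {"value": [{"fpVal": 2, "mapVal": []}]},
        {"value": [{"fpVal": 3, "mapVal": []}]},
    ]
}

heart_points_today = sum(p["value"][0]["fpVal"] for p in data["point"])
print(dataset_id, heart_points_today)   # -> <datasetId> 5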
While reading in the knowledge center, the following is mentioned:
The TTL properties are not applied to data that already exists in the
Analytics Platform. You must set the TTL properties before you add
data.
So how can I remove existing logs before setting those properties?
You must use the Elastic Search delete APIs to remove existing documents from Worklight Analytics.
Before using any of the Elastic Search delete APIs it is advised to back up your data first, as misuse of the APIs or an undesired query will result in permanent data loss.
Below is an example of how to delete client logs in a specified date range, assuming your instance of Elastic Search is running on http://localhost:9500. This specific example deletes all client logs between October 1st and October 15th 2014.
curl -XDELETE 'http://localhost:9500/worklight/client_logs/_query' -d '
{
  "query": {
    "range": {
      "timestamp": {
        "gt": 1412121600000,
        "lt": 1413331200000
      }
    }
  }
}'
You can delete any type of document using the path http://localhost:9500/worklight/{document_type}. The types of documents are app_activities, network_activities, notification_activities, client_logs and server_logs.
When deleting documents, you can filter on two properties: "timestamp" or "daystamp", which are both represented in epoch time in milliseconds. Please note, "daystamp" is simply the first timestamp for the given day (i.e. 12:00AM). The range query also accepts the following parameters:
gte - greater than or equal to
gt - greater than
lte - less than or equal to
lt - less than
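For example, here is a sketch of the same kind of delete using "daystamp" to remove one day of app_activities documents (Python's requests here, with the day value as a placeholder):

import requests

# Placeholder: 2014-10-01 12:00AM as epoch milliseconds; "daystamp" holds
# this value for every document recorded on that day.
DAY_START_MS = 1412121600000

resp = requests.delete(
    "http://localhost:9500/worklight/app_activities/_query",
    json={
        "query": {
            "range": {
                "daystamp": {
                    "gte": DAY_START_MS,
                    "lte": DAY_START_MS,
                }
            }
        }
    },
)
resp.raise_for_status()
print(resp.json())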
For more information refer to the Elastic Search delete and query APIs:
Delete by Query API
Queries
Range Query
I am using AWS Data Pipeline to save a text file to my S3 bucket from RDS. I would like the file name to have the date and the hour in it, like:
myfile-YYYYMMDD-HH.txt
myfile-20140813-12.txt
I have specified my S3DataNode FilePath as:
s3://mybucketname/out/myfile-#{format(myDateTime,'YYYY-MM-dd-HH')}.txt
When I try to save my pipeline I get the following error:
ERROR: Unable to resolve myDateTime for object:DataNodeId_xOQxz
According to the AWS Data Pipeline documentation for date and time functions, this is the proper syntax for using the format function.
When I save the pipeline using a "hard-coded" date and time, I don't get this error and my file lands in my S3 bucket and folder as expected.
My thinking is that I need to define "myDateTime" somewhere or use a NOW() equivalent.
Can somebody tell me how to set "myDateTime" to the current time (e.g. NOW) or give a workaround so I can format the current time to be used in my FilePath?
I am not aware of an exact equivalent of NOW() in Data Pipeline. I tried using makeDate with no arguments (just for fun) to see if that worked; it did not.
The closest are the runtime variables scheduledStartTime, actualStartTime, and reportProgressTime.
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-s3datanode.html
The following, for example, should work.
s3://mybucketname/out/myfile-#{format(#scheduledStartTime,'YYYY-MM-dd-HH')}.txt
Just for fun, here is some more info on Parameters.
At the end of your pipeline JSON (click List Pipelines, select into one, click Edit Pipeline, then click Export), you need to add a Parameters and/or Values object.
I use a myStartDate for backfill processes, which you can manipulate once it is passed in for ad hoc runs. You can give it a static default, but you can't set it to a dynamic value, so it is limited for regularly scheduled tasks. For real-time/scheduled dates, you need to use #scheduledStartTime, etc., as suggested. Here is a sample of setting up some Parameters and/or Values. Both show up under Parameters in the UI. These values can be used throughout your pipeline activities (shell, hive, etc.) with the #{myVariableToUse} notation.
"parameters": [
{
"helpText": "Put help text here",
"watermark": "This shows if no default or value set",
"description": "Label/Desc",
"id": "myVariableToUse",
"type": "string"
}
]
And for Values:
"values": {
"myS3OutLocation": "s3://some-bucket/path",
"myThreshold": "30000",
}
You cannot add these directly in the UI (yet) but once they are there you can change and save the values.