Can Azure Cosmos DB do this kind of query? - sql

I have a JSON object stored in Azure Cosmos DB, and I'm seeing if there's a way to write workable queries doing basic things like Order By.
The structure looks something like this:
[
{
"id":"id1",
"title":"test title",
"dataRecord":{
"version":1,
"dataRecordItems":[
{
"itemTitle":"item title 1",
"type":"string",
"value":"My First Title"
},
{
"itemTitle":"item number",
"type":"number",
"value":1
},
{
"itemTitle":"date",
"type":"date",
"value":"21/11/2019 00:00:00"
}
]
}
},
{
"id":"id2",
"title":"test title again",
"dataRecord":{
"version":1,
"dataRecordItems":[
{
"itemTitle":"item title 2",
"type":"string",
"value":"My Second Title"
},
{
"itemTitle":"item number",
"type":"number",
"value":2
},
{
"itemTitle":"date",
"type":"date",
"value":"20/11/2019 00:00:00"
}
]
}
}
]
I can use ARRAY_CONTAINS to find objects with a particular value, but I run into all kinds of issues if I try to sort by the value of the item whose itemTitle is "date".
So, as an example, I'd like to be able to say something like (pseudoish code here):
SELECT * FROM c WHERE
ARRAY_CONTAINS(c.dataRecord.dataRecordItems,
{"itemTitle":"item title 2", "value" : "My Second Title"}, true)
AND
ARRAY_CONTAINS(c.dataRecord.dataRecordItems,{"itemTitle":"item number", "value" : 2}, true)
ORDER BY < *** SOMEHOW GET THE DATE HERE from itemTitle = date ***
Then, in this simple case, I would want everything returned, but ordered by date.
Obviously in the future I would be pulling out individual fields, but it's all kind of moot if I can't do the first part.
Just wondering if anyone has any great ideas.
Cheers!

You need to store the date in ISO 8601 format:
Year:
YYYY (eg 1997)
Year and month:
YYYY-MM (eg 1997-07)
Complete date:
YYYY-MM-DD (eg 1997-07-16)
Complete date plus hours and minutes:
YYYY-MM-DDThh:mmTZD (eg 1997-07-16T19:20+01:00)
Complete date plus hours, minutes and seconds:
YYYY-MM-DDThh:mm:ssTZD (eg 1997-07-16T19:20:30+01:00)
Complete date plus hours, minutes, seconds and a decimal fraction of a
second
YYYY-MM-DDThh:mm:ss.sTZD (eg 1997-07-16T19:20:30.45+01:00)
where:
YYYY = four-digit year
MM = two-digit month (01=January, etc.)
DD = two-digit day of month (01 through 31)
hh = two digits of hour (00 through 23) (am/pm NOT allowed)
mm = two digits of minute (00 through 59)
ss = two digits of second (00 through 59)
s = one or more digits representing a decimal fraction of a second
TZD = time zone designator (Z or +hh:mm or -hh:mm)
https://www.w3.org/TR/NOTE-datetime
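To illustrate why the format matters, here is a minimal Python sketch, assuming the stored values all look like "21/11/2019 00:00:00" (day/month/year, as in the question). The helper name to_iso8601 and the extra sample date in December are my own, not from the question. ISO 8601 strings sort chronologically when compared as plain strings, while the dd/MM/yyyy strings do not.

```python
from datetime import datetime

# Format of the "value" strings shown in the question (assumed day/month/year).
SOURCE_FORMAT = "%d/%m/%Y %H:%M:%S"


def to_iso8601(value: str) -> str:
    """Convert 'dd/MM/yyyy HH:mm:ss' into an ISO 8601 string (no time zone)."""
    return datetime.strptime(value, SOURCE_FORMAT).isoformat()


# The first and third values come from the question; the December one is
# added here to show where plain string ordering goes wrong.
raw_values = [
    "21/11/2019 00:00:00",
    "01/12/2019 00:00:00",
    "20/11/2019 00:00:00",
]
iso_values = [to_iso8601(v) for v in raw_values]

# String order of the raw values puts 1 December before 20 November.
print(sorted(raw_values))
# String order of the ISO values matches chronological order.
print(sorted(iso_values))
```

Once the values are rewritten this way before being stored, any ordering that compares them as strings agrees with the calendar.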

Related

How to count e.g. ice days per month/year

I have a weather station with 14 years of data, recorded every 10 minutes, like this:
_id:60fbcf880000000000000000
datum:2021-07-24T08:30:00.000+00:00
temperature:19.5
Now I want to count certain kinds of days per year and per month:
Ice days (temperature stays below 0°C for the whole 24 hours),
winter days (temperature drops below 0°C at some point during the day),
cold days (max temp <10°C),
hot days (max temp 25-30°C),
very hot days (max temp over 30°C).
I have no real clue about the best query code for Mongo.
I can group by day, but then I have the problem of counting those particular days ($buckets?).
I got as far as the grouping:
{$group: {
  _id: [{$year: '$datum'},
        {$month: '$datum'},
        {$dayOfMonth: '$datum'}],
  temp_avg: {$avg: '$tempAussen'},
  temp_min: {$min: '$tempAussen'},
  temp_max: {$max: '$tempAussen'},
}}
This results in a list with elements like:
_id:Array
0:2020
1:3
2:2
temp_avg:6.12
temp_min:0.7
temp_max:9.6
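But this is where my problem starts: how do I count the days (e.g. for 2020, month 3) with temperature <0° and the other kinds of days?
Your grouping is working fine. You just need one more $group by month to sum conditionally for each type of day. Note that the example below refers to "$_id.year" and "$_id.month", so it assumes the _id of the first $group is a document with year and month keys rather than an array. For example, you can do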
{
$group: {
_id: {
"year": "$_id.year",
"month": "$_id.month",
},
winter_days_count: {
$sum: {
"$cond": {
"if": {
$lt: [
"$temp_min",
0
]
},
"then": 1,
"else": 0
}
}
}
...
to count the winter days in the month.
Here is the Mongo playground for your reference.
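As a fuller sketch of the same idea, the two stages could be written as a pipeline in Python. This is not the playground's code: the helper count_if, the output field names (ice_days, cold_days and so on), the final $sort, and the exact boundaries (hot days taken as 25 to 30 inclusive) are my assumptions based on the definitions in the question. The first stage uses a document _id so that "$_id.year" and "$_id.month" resolve in the second stage.

```python
def count_if(condition):
    """Accumulator that adds 1 for each day matching the condition (hypothetical helper)."""
    return {"$sum": {"$cond": [condition, 1, 0]}}


# Stage 1: one document per calendar day with its min and max temperature.
# The field names datum and tempAussen are taken from the question's pipeline.
daily_stage = {
    "$group": {
        "_id": {
            "year": {"$year": "$datum"},
            "month": {"$month": "$datum"},
            "day": {"$dayOfMonth": "$datum"},
        },
        "temp_min": {"$min": "$tempAussen"},
        "temp_max": {"$max": "$tempAussen"},
    }
}

# Stage 2: one document per month, counting each kind of day.
monthly_stage = {
    "$group": {
        "_id": {"year": "$_id.year", "month": "$_id.month"},
        "ice_days": count_if({"$lt": ["$temp_max", 0]}),      # never reached 0
        "winter_days": count_if({"$lt": ["$temp_min", 0]}),   # below 0 at some point
        "cold_days": count_if({"$lt": ["$temp_max", 10]}),
        "hot_days": count_if(
            {"$and": [{"$gte": ["$temp_max", 25]}, {"$lte": ["$temp_max", 30]}]}
        ),
        "very_hot_days": count_if({"$gt": ["$temp_max", 30]}),
    }
}

pipeline = [daily_stage, monthly_stage, {"$sort": {"_id.year": 1, "_id.month": 1}}]
```

With pymongo, a pipeline like this would be passed to collection.aggregate(pipeline). An ice day also counts as a winter day and a cold day under these definitions, so the categories overlap.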

How to display UTC time in vega-tooltip (version 0.5.1)

I use the "utc" Vega scale in order to display data in UTC time, but when I hover over an item in the chart, the tooltip shows the date in local time. How can I display UTC dates in the Vega tooltip?
Here is the vega-tooltip config:
let options = {
showAllFields: false,
fields: [
{
field: "x",
title: "Time",
formatType: "time",
format: "%x %X "+ this.props.data.Timezone
},
{
field: "y",
title: "Value",
formatType: "number"
},
{
field: "value",
title: "Time",
formatType: "time",
format: "%x %X "+this.props.data.Timezone
},
{
field: "label",
title: "Data",
formatType: "string"
},
{
field: "info",
title: "Info",
formatType: "string"
},
{
field: "startTime",
title: "Start",
formatType: "time",
format: "%x %X "+this.props.data.Timezone
},
{
field: "endTime",
title: "End",
formatType: "time",
format: "%x %X "+this.props.data.Timezone
}
]
}
vegaTooltip.vega(vegaView, options);
I can't judge the date data you ingest, but this was discussed recently in https://github.com/altair-viz/altair/pull/1053.
The core of the matter is that you have to supply your datetime data in the ISO 8601 standard for your browser to parse it as UTC. If you supply it in a different format, the browser will assume the local time zone:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Date/parse
Differences in assumed time zone
Given a date string of "March 7, 2014", parse() assumes a local time
zone, but given an ISO format such as "2014-03-07" it will assume a
time zone of UTC (ES5 and ECMAScript 2015). Therefore Date objects
produced using those strings may represent different moments in time
depending on the version of ECMAScript supported unless the system is
set with a local time zone of UTC. This means that two date strings
that appear equivalent may result in two different values depending on
the format of the string that is being converted.
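If the chart data is prepared on the server side, one way to avoid the ambiguity is to emit every timestamp as an ISO 8601 string with an explicit UTC designator. The sketch below is in Python and is only an assumption about how the data might be produced; the helper to_utc_iso and the sample value are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_utc_iso(dt: datetime) -> str:
    """Render an aware datetime as ISO 8601 in UTC, with a trailing 'Z'."""
    if dt.tzinfo is None:
        # A naive datetime has no defined offset, so refuse to guess one.
        raise ValueError("naive datetime: attach a time zone first")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Hypothetical sample: 09:30 in a UTC+01:00 zone.
sample = datetime(2014, 3, 7, 9, 30, tzinfo=timezone(timedelta(hours=1)))
print(to_utc_iso(sample))
```

A string carrying its own time zone designator no longer depends on which default the parser assumes.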

Setting date format in Google Sheets using API and Python

I'm trying to set the date format on a column so that dates are displayed like this: 14-Aug-2017. This is the way I'm doing it:
requests = [
{
'repeatCell':
{
'range':
{
'startRowIndex': 1,
'startColumnIndex': 4,
'endColumnIndex': 4
},
'cell':
{
"userEnteredFormat":
{
"numberFormat":
{
"type": "DATE",
"pattern": "dd-mmm-yyyy"
}
}
},
'fields': 'userEnteredFormat.numberFormat'
}
}
]
body = {"requests": requests}
response = service.spreadsheets().batchUpdate(spreadsheetId=SHEET, body=body).execute()
I want all the cells in column E except the header cell to be updated, hence the range definition. I used http://wescpy.blogspot.co.uk/2016/09/formatting-cells-in-google-sheets-with.html and https://developers.google.com/sheets/api/samples/formatting as the basis for this approach.
However, the cells don't show their contents using that format. They continue to be in "Automatic" format, either showing the numeric value that I'm storing (the number of days from 1st Jan 1900) or (sometimes) the date.
Adding sheetId to the range definition doesn't alter the outcome.
I'm not getting an error back from the service and the response only contains the spreadsheetId and an empty replies structure [{}].
What am I getting wrong?
I've found the error - the endColumnIndex needs to be 5, not 4.
I didn't read that first linked article carefully enough!
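For reference, the range indexes in these requests are half-open: the start index is included and the end index is excluded, so column E (index 4) needs startColumnIndex 4 and endColumnIndex 5. A small sketch of the corrected request, with a hypothetical helper date_format_request and an assumed sheet_id of 0:

```python
def date_format_request(column_index, pattern, sheet_id=0, first_row=1):
    """Build a repeatCell request that formats one whole column as a date.

    column_index is zero-based (E is 4). first_row=1 skips the header row.
    sheet_id=0 is an assumption; use the ID of the sheet being edited.
    """
    return {
        "repeatCell": {
            "range": {
                "sheetId": sheet_id,
                "startRowIndex": first_row,
                "startColumnIndex": column_index,
                # The end index is exclusive, hence the + 1.
                "endColumnIndex": column_index + 1,
            },
            "cell": {
                "userEnteredFormat": {
                    "numberFormat": {"type": "DATE", "pattern": pattern}
                }
            },
            "fields": "userEnteredFormat.numberFormat",
        }
    }


body = {"requests": [date_format_request(4, "dd-mmm-yyyy")]}
```

The body can then be sent with the same batchUpdate call as in the question.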

Elasticsearch how to quickly find the result when search array fields

My JSON data:
{
"date": 1484219926,
"uid": "1234567",
"interest": [
"2000001",
"2000002",
"....",
"2000xxx"
],
"other": "xxxx"
}
The search should return the same result as the following SQL:
select count(*)
from xxxxx
where date > time1 and
date < time2 and
interest="20000xxx"
The field interest may have 500 items. I want to get the search result quickly. What should I do? The total data may be 2 billion documents.

Logstash: modify apache date format

The grok filter %{COMBINEDAPACHELOG} formats the timestamp as dd/MMM/YYYY:HH:mm:ss Z, but I need the timestamp in the format yyyy-MM-dd HH:mm:ss.
I tried the configuration below:
grok {
match => [
"message", "%{COMBINEDAPACHELOG}",
]
break_on_match => false
}
date {
match => [ "timestamp", "yyyy-MM-dd HH:mm:ss" ]
target => ["datetime"]
}
but got the below parsing error:
Failed parsing date from field {:field=>"timestamp", :value=>"19/May/2012:12:40:18 -0700", :exception=>java.lang.IllegalArgumentException: Invalid format: "19/May/2012:12:40:18 -0700" is malformed at "/May/2012:12:40:18 -0700", :level=>:warn}
I would highly appreciate it if anyone could shed more light on this.
The COMBINEDAPACHELOG pattern is expecting the date in the log entry to match the format so it can shove it into the "timestamp" field. It doesn't format your timestamp at all.
Once the date has been grok'ed out into "timestamp", you can use the date{} filter to move it into @timestamp. The pattern you supply there should match whatever's in the field.
So, pass "dd/MMM/yyyy:HH:mm:ss Z" as the format to date{} and you should be all set.
EDIT:
Based on your additional details, I was hoping that you could match each component of the input date and then combine them into a new field. That would work if you were trying to swap, say, firstName and lastName in a string, but dates are more complicated. A simple string swap wouldn't handle converting "Jan" to "01" or deal with timezones at all.
So, we're back to creating a date object and then outputting that as a string in the format you desire.
# convert "timestamp" to a date field "datetime"
date {
match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
target => ["datetime"]
}
# convert "datetime" to a string "datestring"
ruby {
code => "
event['datestring'] = event['datetime'].strftime('%Y-%m-%d %H:%M:%S')
"
}
For the latest version of Logstash the code would be:
# convert "datetime" to a string "datestring"
ruby {
code => "event.set('datestring', event.get('datetime').strftime('%Y-%m-%d %H:%M:%S'))"
}
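The same two-step idea (parse with the format the field actually has, then print in the format you want) can be checked outside Logstash. This is a Python sketch, not Logstash code; the function name and the to_utc switch are my own. The switch is there because parsing gives an absolute point in time, so the printed wall-clock value depends on which time zone it is rendered in.

```python
from datetime import datetime, timezone

# Python equivalents of the Apache log format and the desired output format.
APACHE_FORMAT = "%d/%b/%Y:%H:%M:%S %z"
OUTPUT_FORMAT = "%Y-%m-%d %H:%M:%S"


def reformat_apache_timestamp(raw: str, to_utc: bool = True) -> str:
    """Parse an Apache access-log timestamp and print it as 'yyyy-MM-dd HH:mm:ss'."""
    parsed = datetime.strptime(raw, APACHE_FORMAT)  # %b expects English month names
    if to_utc:
        parsed = parsed.astimezone(timezone.utc)
    return parsed.strftime(OUTPUT_FORMAT)


raw = "19/May/2012:12:40:18 -0700"
print(reformat_apache_timestamp(raw))                # rendered in UTC
print(reformat_apache_timestamp(raw, to_utc=False))  # rendered in the original offset
```

If the resulting string has to show local time rather than UTC, that conversion needs to be done explicitly before formatting.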