How to use KQL to format a datetime stamp as 'yyyy-MM-ddTHH:mm:ss.fffZ'?

I receive the error `format_datetime(): failed to parse format string in argument #2` when trying to use format_datetime() with the ISO 8601 format `yyyy-MM-ddTHH:mm:ss.fffZ`.
If I leave the T and the Z out, it works.
Surely KQL can format datetime stamps in a timezone-aware format and I'm just missing it. I read the docs, and it appears that T and Z are not supported format specifiers or delimiters, yet every example in the docs shows the T and Z present(?).
Example:
StorageBlobLogs
| where AccountName == 'stgtest'
| project
    TimeGenerated = format_datetime(TimeGenerated, 'yyyy-MM-ddTHH:mm:ss.fffZ'), // herein lies the issue
    AccountName,
    RequestBodySize = format_bytes(RequestBodySize)
| sort by TimeGenerated asc
If the code is changed to...
- `TimeGenerated = format_datetime(TimeGenerated, 'yyyy-MM-dd HH:mm:ss.fff')`
...it works, but the result is not a timezone-aware timestamp (and I prefer working with timezone-aware values to reduce confusion).

See the docs for `datetime_utc_to_local()` and Timezones.
I would highly recommend not doing that.
If possible, let the time zone and the formatting be handled on the client side.
All datetime values in KQL are UTC. Always.
Even the result of datetime_utc_to_local() is another UTC datetime.
That may lead to (what seems like) unexpected behavior in datetime manipulations (see the example below).
StorageBlobLogs
| sample 10
| project TimeGenerated
| extend Asia_Jerusalem = datetime_utc_to_local(TimeGenerated, "Asia/Jerusalem")
         ,Europe_London = datetime_utc_to_local(TimeGenerated, "Europe/London")
         ,Japan = datetime_utc_to_local(TimeGenerated, "Japan")
| extend Asia_Jerusalem_str = format_datetime(Asia_Jerusalem, "yyyy-MM-dd HH:mm:ss.fff")
         ,Europe_London_str = format_datetime(Europe_London, "yyyy-MM-dd HH:mm:ss.fff")
         ,Japan_str = format_datetime(Japan, "yyyy-MM-dd HH:mm:ss.fff")
| project-reorder TimeGenerated, Asia_Jerusalem, Asia_Jerusalem_str, Europe_London, Europe_London_str, Japan, Japan_str
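That said, if you do need a literal ISO 8601 string in the output, one workaround (just a sketch, since T and Z are not accepted as format_datetime() delimiters) is to format the date and time parts separately and concatenate the literal characters with strcat():
StorageBlobLogs
| where AccountName == 'stgtest'
| project
    // build 'yyyy-MM-ddTHH:mm:ss.fffZ' by hand; TimeGenerated is already UTC, so the Z suffix is accurate
    TimeGeneratedIso = strcat(format_datetime(TimeGenerated, 'yyyy-MM-dd'), 'T', format_datetime(TimeGenerated, 'HH:mm:ss.fff'), 'Z'),
    AccountName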

Related

Converting "4:00AM" to ISO8601

The frontend is sending time values like "4:00AM" or "7:15PM" and I need to convert these to ISO8601 (i.e. "04:00:00" and "19:15:00" respectively) so that I can store them in a "time without time zone" column in PostgreSQL.
What is the easiest way to do this? I have Timex available, and am using Ecto.
Postgres is flexible about date and time formats. You can just cast these strings:
select '4:00AM'::time, '7:15PM'::time
time | time
:------- | :-------
04:00:00 | 19:15:00
In Postgres, `timestamp(tz)`, `date`, `time(tz)`, etc. are not stored in any particular format. Formatting is something you do on retrieval. So:
create table time_test(time_fld time);
insert into time_test values ('4:00AM');
insert into time_test values ('7:15PM');
select time_fld from time_test ;
time_fld
----------
04:00:00
19:15:00
Just pass the string in and let Postgres do the work.
One might manually convert the value to the appropriate Time struct.
iex|1 ▶ [h, m, pm?] =
...|1 ▶ "4:00PM"
...|1 ▶ |> String.downcase()
...|1 ▶ |> String.split(~r/:|(?<=\d)(?=\D)/)
#⇒ ["4", "00", "pm"]
iex|2 ▶ m = String.to_integer(m)
iex|3 ▶ h =
...|3 ▶ if(match?("pm", pm?), do: 12, else: 0) + String.to_integer(h)
#⇒ 16
iex|4 ▶ Time.from_erl!({h, m, 0})
#⇒ ~T[16:00:00]
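For reuse, the same steps could be wrapped in a small helper (a sketch; the module and function names are invented here, and rem/2 handles the 12 AM / 12 PM edge cases the snippet above ignores):
defmodule ClockTime do
  # "4:00AM" -> ~T[04:00:00], "7:15PM" -> ~T[19:15:00], "12:05AM" -> ~T[00:05:00]
  def to_time(str) do
    [h, m, meridiem] =
      str |> String.downcase() |> String.split(~r/:|(?<=\d)(?=\D)/)
    hour = rem(String.to_integer(h), 12) + if(meridiem == "pm", do: 12, else: 0)
    Time.from_erl!({hour, String.to_integer(m), 0})
  end
end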

Issue formatting into human time

SELECT
prefix_grade_items.itemname AS Course,
prefix_grade_items.grademax,
ROUND(prefix_grade_grades_history.finalgrade, 0) AS finalgrade,
prefix_user.firstname,
prefix_user.lastname,
prefix_user.username,
prefix_grade_grades_history.timemodified
FROM
prefix_grade_grades_history
INNER JOIN prefix_user ON prefix_grade_grades_history.userid = prefix_user.id
INNER JOIN prefix_grade_items ON prefix_grade_grades_history.itemid =
prefix_grade_items.id
WHERE (prefix_grade_items.itemname IS NOT NULL)
AND (prefix_grade_items.itemtype = 'mod' OR prefix_grade_items.itemtype = 'manual')
AND (prefix_grade_items.itemmodule = 'quiz' OR prefix_grade_items.itemmodule IS NULL)
AND (prefix_grade_grades_history.timemodified IS NOT NULL)
AND (prefix_grade_grades_history.finalgrade > 0)
AND (prefix_user.deleted = 0)
ORDER BY course
Currently I am trying to polish this query. The problem I am having is converting the time queried from timemodified into human-readable time; it comes out as a Unix epoch timestamp. I have been attempting to use expressions such as FROM_UNIXTIME(timestamp,'%a - %D %M %y %H:%i:%s') as timestamp. For reference, this is an ad hoc query against a Moodle server backed by MariaDB. My desired result is that nothing changes about the rows we are getting, except that the time appears in a month/day/year format instead of the raw epoch value.
I have converted the timestamp into a custom date format using the below command in my select query.
DATE_FORMAT(FROM_UNIXTIME(`timestamp`), "%b-%d-%y")
As you mention in your question with FROM_UNIXTIME(timestamp,'%a - %D %M %y %H:%i:%s'), it is indeed possible to pass a second argument to specify the date/time format you want the UNIX timestamp converted into.
That's the bit that looks like: '%a - %D %M %y %H:%i:%s' - this particular format string will give you an output that looks something like this: Fri - 24th January 20 14:17:09, which as you stated isn't quite what you were looking for, but we can fix that!
For example, the statement below will return the human-readable date (according to the value returned in the timestamp) in the form of month/day/year as you specified as the goal in your question, and would look similar to this: Jan/01/20
FROM_UNIXTIME(timestamp, '%b/%d/%y')
If you instead wish to use a 4 digit year you can substitute the lowercase %y for a capital %Y.
Additionally if a numeric month is instead preferred you can use %m in place of %b.
For a more comprehensive reference on the available specifiers that can be used to build up the format string, this page has a handy table
So, putting it all together in the context of your original SQL query, using FROM_UNIXTIME (along with a suitable format string to specify the output format) may look something like this:
SELECT
prefix_grade_items.itemname AS Course,
prefix_grade_items.grademax,
ROUND(prefix_grade_grades_history.finalgrade, 0) AS finalgrade,
prefix_user.firstname,
prefix_user.lastname,
prefix_user.username,
FROM_UNIXTIME(prefix_grade_grades_history.timemodified, '%b/%d/%Y') AS grademodified
FROM
prefix_grade_grades_history
INNER JOIN prefix_user ON prefix_grade_grades_history.userid = prefix_user.id
INNER JOIN prefix_grade_items ON prefix_grade_grades_history.itemid = prefix_grade_items.id
WHERE (prefix_grade_items.itemname IS NOT NULL)
AND (prefix_grade_items.itemtype = 'mod' OR prefix_grade_items.itemtype = 'manual')
AND (prefix_grade_items.itemmodule = 'quiz' OR prefix_grade_items.itemmodule IS NULL)
AND (prefix_grade_grades_history.timemodified IS NOT NULL)
AND (prefix_grade_grades_history.finalgrade > 0)
AND (prefix_user.deleted = 0)
ORDER BY course
NOTE: I ended up specifying an alias for the timemodified column, calling it instead grademodified. This was done as without an alias the column name ends up getting a little busy :)
Hope that is helpful to you! :)

Issue while converting string data to decimal in proper format in sparksql

I am facing an issue in Spark SQL while converting a string to decimal(15,7).
Input data is:
'0.00'
'28.12'
'-39.02'
'28.00'
I have tried converting it into float and then converting into decimal but got unexpected results.
sqlContext.sql("select cast(cast('0.00' as float) as decimal(15,7)) from table").show()
The result I received is as follows
0
But I need to have data in the below format:
0.0000000
28.1200000
-39.0200000
28.0000000
You can try using the format_number function, something like this (assuming PySpark here; col and format_number come from pyspark.sql.functions):
from pyspark.sql.functions import col, format_number
df.withColumn("num", format_number(col("value").cast("decimal(15,7)"), 7)).show()
The results should be like this.
+------+-----------+
| value| num|
+------+-----------+
| 0.00| 0.0000000|
| 28.12| 28.1200000|
|-39.02|-39.0200000|
| 28.00| 28.0000000|
+------+-----------+
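If you would rather stay in SQL, as in the original query, format_number is also available as a Spark SQL function; a sketch, assuming the same table name and a string column named value as in the snippets above:
sqlContext.sql("select value, format_number(cast(value as decimal(15,7)), 7) as num from table").show()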

Bad performance when filtering Azure logs - WCF Data Services filters

Azure Diagnostics is pushing Windows Events into a storage table "WADWindowsEventLogsTable".
I would like to query this storage table using Visual Studio (2015) and Cloud Explorer.
As this table has a huge amount of content, I wait indefinitely for the results...
Here is a query sample:
EventId eq 4096 and Timestamp gt datetime'2016-06-24T08:20:00' and Timestamp lt datetime'2016-06-24T10:00:00'
I suppose that this query is correct?
Is there a way to improve performance?
- Filter the result columns?
- Return only the TOP X results?
- Any other useful tips?
I know that a better way would be to script this, for example using Python, but I would like to use the UI as much as possible.
(Edit) Following Gaurav Mantri's answer, I used this little C# program to build my query. The response now comes back quickly, which solved my initial performance issue:
static void Main(string[] args)
{
    string startDate = "24 June 2016 8:20:00 AM";
    string endDate = "24 June 2016 10:00:00 AM";
    string startPKey = convertDateToPKey(startDate);
    string endPKey = convertDateToPKey(endDate);
    Debug.WriteLine("(PartitionKey gt '" + startPKey + "'"
        + " and PartitionKey le '" + endPKey + "')"
        + " and (EventId eq 4096)");
}

private static string convertDateToPKey(string myDate)
{
    System.DateTime dt = System.Convert.ToDateTime(myDate);
    long dt2ticks = dt.Ticks;
    string ticks = System.Convert.ToString(dt2ticks);
    return "0" + ticks;
}
NB: for those who, like me, have searched far and wide for how to export the results to a CSV file, you should know that this icon is your answer (and it's not an 'undo' ;) ).
In your query, you're filtering on the Timestamp attribute, which is not indexed (only the PartitionKey and RowKey attributes are indexed). Thus your query is doing a full table scan (i.e., reading from the 1st record until it finds a matching record) and hence is not optimized.
In order to avoid full table scan, please use PartitionKey in your query. In case of WADWindowsEventLogsTable, the PartitionKey essentially represents the date/time value in ticks. What you would need to do is convert the date/time range for which you want to get the data into ticks, prepend a 0 in front of it and then use it in the query.
So your query would be something like:
(PartitionKey gt 'from date/time value in ticks prepended with 0' and PartitionKey le 'to date/time value in ticks prepended with 0') and (EventId eq 4096)
I wrote a blog post about it some time ago that you may find useful: http://gauravmantri.com/2012/02/17/effective-way-of-fetching-diagnostics-data-from-windows-azure-diagnostics-table-hint-use-partitionkey/
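For reference, a rough Python equivalent of the C# convertDateToPKey helper shown in the question (a sketch, assuming the same date-string layout; .NET ticks are 100-nanosecond intervals counted from 0001-01-01 00:00:00):
from datetime import datetime

def convert_date_to_pkey(my_date):
    # .NET DateTime.Ticks: 100-nanosecond intervals since 0001-01-01 00:00:00
    delta = datetime.strptime(my_date, "%d %B %Y %I:%M:%S %p") - datetime(1, 1, 1)
    ticks = (delta.days * 86400 + delta.seconds) * 10**7 + delta.microseconds * 10
    return "0" + str(ticks)

print(convert_date_to_pkey("24 June 2016 8:20:00 AM"))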

RODBC loses time values of datetime when result set is large

So this is VERY strange. RODBC seems to drop the time portion of DateTime SQL columns if the result set is large enough. (The queries are running against an SQL Server 2012 machine, and, yes, when I run them on the SQL Server side they produce identical and proper results, regardless of how many rows are returned.)
For example, the following works perfectly:
myconn <- odbcConnect(dsnName, uid, pwd)
results <- sqlQuery(myconn, "SELECT TOP 100 MyID, MyDateTimeColumn from MyTable ORDER BY MyDateTimeColumn DESC")
close(myconn)
In R, the following works as expected:
> results$MyDateTimeColumn[3]
[1] "2013-07-01 00:01:22 PDT"
which is a valid POSIXct date time. However, when somewhere between 10,000 and 100,000 rows are returned, suddenly the time portion disappears:
myconn <- odbcConnect(dsnName, uid, pwd)
bigResults <- sqlQuery(myconn, "SELECT TOP 100000 MyID, MyDateTimeColumn from MyTable ORDER BY MyDateTimeColumn DESC")
close(myconn)
With the same code but a larger number of rows returned (note: the exact same row has now lost its time component), R responds:
> bigResults$MyDateTimeColumn[3]
[1] "2013-07-01 PDT"
Note that the time is now missing (this is not a different row; it's the exact same row as previous), as the following shows:
> strptime(results$MyDateTimeColumn[3], "%Y-%m-%d %H:%M:%S")
[1] "2013-07-01 00:01:22"
> strptime(bigResults$MyDateTimeColumn[3], "%Y-%m-%d %H:%M:%S")
[1] NA
Obviously the work-around is either incremental query-with-append or export-to-CSV-and-import-to-R, but this seems very odd. Anyone ever seen anything like this?
Config: I'm using the latest version of RODBC (1.3-10) and can duplicate the behavior on both an R installation running on Windows x64 and an R installation running on Mac OS X 10.9 (Mavericks).
EDIT #2 Adding output of dput() to compare the objects, per request:
> dput(results[1:10,]$MyDateTimeColumn)
structure(c(1396909903.347, 1396909894.587, 1396909430.903, 1396907996.9, 1396907590.02, 1396906077.887, 1396906071.99, 1396905537.36, 1396905531.413, 1396905231.787), class = c("POSIXct", "POSIXt"), tzone = "")
> dput(bigResults[1:10,]$MyDateTimeColumn)
structure(c(1396854000, 1396854000, 1396854000, 1396854000, 1396854000, 1396854000, 1396854000, 1396854000, 1396854000, 1396854000), class = c("POSIXct", "POSIXt"), tzone = "")
It would appear that the underlying data are actually changing as a result of the number of rows returned by the query, which is downright strange.
sqlQuery() has an option called as.is. Setting this to TRUE will pull everything as seen in for example Microsoft SQL Management Studio.
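For example (a sketch reusing the connection and query from the question; with as.is = TRUE the columns come back as character, so the datetime is converted explicitly afterwards):
myconn <- odbcConnect(dsnName, uid, pwd)
bigResults <- sqlQuery(myconn,
                       "SELECT TOP 100000 MyID, MyDateTimeColumn from MyTable ORDER BY MyDateTimeColumn DESC",
                       as.is = TRUE)
close(myconn)
# convert the character column yourself, pinning the time zone
bigResults$MyDateTimeColumn <- as.POSIXct(bigResults$MyDateTimeColumn, tz = "UTC")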
I ran into the same problem as well. Even stranger, on a large dataset one column would import both the date and the time, while another column imported only the date.
My advice would be to split the date/time in SQL:
myconn <- odbcConnect(dsnName, uid, pwd)
results <- sqlQuery(myconn, "SELECT TOP 100 MyID, format(MyDateTimeColumn, 'HH:mm:ss') as MyTimeColumn, format(MyDateTimeColumn, 'yyyy-MM-dd') as MyDateColumn from MyTable ORDER BY MyDateTimeColumn DESC")
close(myconn)
Then combine them in R afterwards. Hope it helps.
I had the same issue and concluded that it is due to DST:
This fails (in a time zone where those clock times fall into the DST gap on 2015-03-29):
as.POSIXct(c("2015-03-29 01:59:22", "2015-03-29 02:00:04"))
This works:
as.POSIXct(c("2015-03-29 01:59:22", "2015-03-29 02:00:04"), tz="UTC")
I could not find how to force tz="UTC" in default RODBC behavior, however using as.is = TRUE and converting columns myself does the job.
Note: at first I had the impression that it was due to the huge result, but in fact it was because a huge result is more likely to span a DST transition.
This is an older question, but I had similar issues when trying to programmatically read in data from 15 different .accdb. All POSIXct fields were read in correctly for every database except those from the months of March, from which I inferred that it is some sort of daylight-savings time issue.
The solution for me (because I didn't want to have to make multiple queries to a DB and then rbind() everything together) was to alter my function to include the lines
# Get the initial tz
current_tz <- Sys.timezone()
# Arizona observes no daylight-saving shifts
Sys.setenv(TZ = "America/Phoenix")
[BODY OF FUNCTION]
Sys.setenv(TZ = current_tz)
After including these few lines, the day/time fields from the March databases were being read in correctly.
sqlQuery(ch, getSQL(sqlquerypath))
stripped the times off my datetime column.
sqlQuery(ch, getSQL(sqlquerypath), as.is = TRUE)
fixed the issue.
I think this is a case of times being stripped from the dates where the date range includes shifts to/from daylight saving time. If you are selecting periods that don't include daylight saving shifts, the times will be retained (e.g., from 1/1/2007 to 3/1/2007). This could possibly be avoided by changing the system time zone on your computer to one (e.g., Arizona) where there are no daylight saving shifts (sounds bizarre, but it has worked for me).
To overcome this issue, import the DateTimes as characters (using "as.is") and then convert them to POSIXct. You could alternatively use "strptime", which converts to POSIXlt and allows you to specify the format. Here's an example of a SQL query where DateTimes are imported as.is (TRUE) but the associated DataValues are not (FALSE), and then the DateTime is converted to an R date format:
data <- sqlQuery(channel, paste("SELECT LocalDateTime, DataValue FROM DataValues WHERE LocalDateTime >= '1/1/2007 0:00' AND LocalDateTime < '1/1/2008 0:00' ORDER BY LocalDateTime ASC"), as.is=c(TRUE,FALSE))
data$LocalDateTime <- as.POSIXct(data$LocalDateTime, tz="MST")
It may be a daylight saving issue. If there is a time that doesn't exist in your time zone (because of daylight saving), it may cause something like this.
Why does this happen on large datasets returned from sqlQuery()? I don't know. But I was able to work around it by applying a SQL conversion in the SQL call:
data <- sqlQuery(channel, "SELECT CONVERT(nvarchar(24), DtTimeField, 21) AS HourDt, * FROM ...
This is your workaround.