Convert epoch time in Data Factory to datetime - sql

I have the following query, which copies data from Cosmos DB to Azure Data Lake.
select c.Tag
from c
where
c.data.timestamp >= '@{formatDateTime(addminutes(pipeline().TriggerTime, -15), 'yyyy-MM-ddTHH:mm:ssZ' )}'
However, I need to use _ts, the epoch time at which the document was created in the Cosmos DB collection, instead of c.data.timestamp. How do I convert the epoch time to a datetime and compare it with '@{formatDateTime(addminutes(pipeline().TriggerTime, -15), 'yyyy-MM-ddTHH:mm:ssZ' )}'?
I have also tried using
dateadd( SECOND, c._ts, '1970-1-1' ) which clearly isn't supported.

As @Chris said, you could use a UDF in the Cosmos DB query.
udf:
function convertTime(unix_timestamp){
    // _ts is in epoch seconds; the JavaScript Date constructor expects milliseconds
    var date = new Date(unix_timestamp * 1000);
    return date;
}
sql:
You can then merge it into your transfer SQL:
select c.Tag
from c
where
udf.convertTime(c._ts) >= '@{formatDateTime(addminutes(pipeline().TriggerTime, -15), 'yyyy-MM-ddTHH:mm:ssZ' )}'
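If you would rather avoid the UDF, a possible alternative (just a sketch) is to keep the comparison numeric and compare _ts against an epoch value computed in the query itself; Cosmos DB's built-in GetCurrentTimestamp() returns milliseconds since the Unix epoch:
select c.Tag
from c
where
c._ts >= (GetCurrentTimestamp() / 1000) - 900
Here 900 seconds covers the last 15 minutes. Note this is relative to the query's execution time rather than pipeline().TriggerTime, and a non-deterministic function like GetCurrentTimestamp() may prevent the filter from being served efficiently from the index.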

Related

Best way for storing and querying DateTime in Azure CosmosDB

I am currently working on indexing documents in the Azure Cosmos DB SQL API. I read here:
https://learn.microsoft.com/en-us/azure/cosmos-db/working-with-dates
that dates can be stored either as strings or as numeric values (Unix timestamps).
The string format yyyy-MM-ddTHH:mm:ss.fffffffZ has several advantages, e.g. it is human readable, but the article does not say directly whether it is also more performant when querying.
So we can use either a string range index or a numeric range index for the epoch time.
Does anyone know which is faster for querying with range-filter operations?
e.g. startTime > x and startTime < y
or, in a similar way inside a UDF, when only the hour matters, e.g.
e.g.
function somefunc(ts, firstHour, secondHour) {
    // ts is passed straight to Date, so it is expected in milliseconds since epoch
    var ts_date = new Date(ts);
    var hour = ts_date.getHours();
    return hour >= firstHour && hour < secondHour;
}
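For reference, the two storage choices would be filtered like this (property names and values are only illustrative; the first form compares ISO 8601 strings, the second compares epoch seconds for 2020-12-01 to 2020-12-02 UTC):
SELECT * FROM c WHERE c.startTime >= '2020-12-01T00:00:00.0000000Z' AND c.startTime < '2020-12-02T00:00:00.0000000Z'
SELECT * FROM c WHERE c.startTimeEpoch >= 1606780800 AND c.startTimeEpoch < 1606867200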

Defining an EXTRACT range from a SELECT statement

I intend to process a dataset from EventHub stored in ADLA, in batches. It seems logical to me to process intervals, where my date is between my last execution datetime and the current execution datetime.
I thought about saving the execution timestamps in a table so I can keep track of it, and do the following:
DECLARE @my_file string = @"/data/raw/my-ns/my-eh/{date:yyyy}/{date:MM}/{date:dd}/{date:HH}/{date:mm}/{date:ss}/{*}.avro";
DECLARE @max_datetime DateTime = DateTime.Now;
@min_datetime =
    SELECT (DateTime) MAX(execution_datetime) AS min_datetime
    FROM my_adldb.dbo.watermark;
@my_json_bytes =
    EXTRACT Body byte[],
            date DateTime
    FROM @my_file
    USING new Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor(@"{""type"":""record"",""name"":""EventData"",""namespace"":""Microsoft.ServiceBus.Messaging"",""fields"":[{""name"":""SequenceNumber"",""type"":""long""},{""name"":""Offset"",""type"":""string""},{""name"":""EnqueuedTimeUtc"",""type"":""string""},{""name"":""SystemProperties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes""]}},{""name"":""Properties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes"",""null""]}},{""name"":""Body"",""type"":[""null"",""bytes""]}]}");
How do I properly add this interval to my EXTRACT query? I tested it using a plain WHERE clause with an interval defined by hand and it worked, but when I attempt to use @min_datetime it doesn't work, since its result is a rowset.
I thought about applying some filtering in a subsequent query, but I am afraid this means @my_json_bytes will extract my whole dataset and filter it afterwards, resulting in a suboptimal query.
Thanks in advance.
You should be able to apply the filter as part of a later SELECT. U-SQL can push predicates down in certain conditions, but I haven't been able to test this yet. Try something like this:
@min_datetime =
    SELECT (DateTime) MAX(execution_datetime) AS min_datetime
    FROM my_adldb.dbo.watermark;
@my_json_bytes =
    EXTRACT Body byte[],
            date DateTime
    FROM @my_file
    USING new Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor(@"{""type"":""record"",""name"":""EventData"",""namespace"":""Microsoft.ServiceBus.Messaging"",""fields"":[{""name"":""SequenceNumber"",""type"":""long""},{""name"":""Offset"",""type"":""string""},{""name"":""EnqueuedTimeUtc"",""type"":""string""},{""name"":""SystemProperties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes""]}},{""name"":""Properties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes"",""null""]}},{""name"":""Body"",""type"":[""null"",""bytes""]}]}");
@working =
    SELECT *
    FROM @my_json_bytes AS j
    CROSS JOIN
    @min_datetime AS t
    WHERE j.date > t.min_datetime;
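For completeness, a rough sketch of how the tail of that script could look: write the filtered rowset out and record the new watermark for the next run (the output path is hypothetical, and the INSERT assumes the watermark table has a single execution_datetime column):
OUTPUT @working
TO "/data/processed/batch.csv"
USING Outputters.Csv();

// Record the new high-water mark so the next execution picks up where this one ended.
INSERT INTO my_adldb.dbo.watermark
VALUES (@max_datetime);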

Date comparison in Hive

I'm working with Hive and I have a table structured as follows:
CREATE TABLE t1 (
id INT,
created TIMESTAMP,
some_value BIGINT
);
I need to find every row in t1 that is less than 180 days old. The following query yields no rows even though there is data present in the table that matches the search predicate.
select *
from t1
where created > date_sub(from_unixtime(unix_timestamp()), 180);
What is the appropriate way to perform a date comparison in Hive?
How about:
where unix_timestamp() - created < 180 * 24 * 60 * 60
Date math is usually simplest if you can just do it with the actual timestamp values.
Or do you want it to only cut off on whole days? Then I think the problem is with how you are converting back and forth between ints and strings. Try:
where created > unix_timestamp(date_sub(from_unixtime(unix_timestamp(),'yyyy-MM-dd'),180),'yyyy-MM-dd')
Walking through each UDF:
unix_timestamp() returns an int: current time in seconds since epoch
from_unixtime(,'yyyy-MM-dd') converts to a string of the given format, e.g. '2012-12-28'
date_sub(,180) subtracts 180 days from that string, and returns a new string in the same format.
unix_timestamp(,'yyyy-MM-dd') converts that string back to an int
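To make that chain concrete, here is a sketch with purely illustrative values (UTC assumed, current time somewhere on 2012-12-28):
select unix_timestamp()                                                                             -- e.g. 1356654463
     , from_unixtime(unix_timestamp(), 'yyyy-MM-dd')                                                -- '2012-12-28'
     , date_sub(from_unixtime(unix_timestamp(), 'yyyy-MM-dd'), 180)                                 -- '2012-07-01'
     , unix_timestamp(date_sub(from_unixtime(unix_timestamp(), 'yyyy-MM-dd'), 180), 'yyyy-MM-dd')   -- e.g. 1341100800
from t1
limit 1;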
If that's all getting too hairy, you can always write a UDF to do it yourself.
Alternatively, you may also use datediff. The WHERE clause would then be:
In the case of a string timestamp (JDBC format):
datediff(from_unixtime(unix_timestamp()), created) < 180;
In the case of Unix epoch time:
datediff(from_unixtime(unix_timestamp()), from_unixtime(created)) < 180;
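Plugged into a full query, the first form would look like this (a sketch that assumes created is stored as a string in the JDBC yyyy-MM-dd HH:mm:ss format, as suggested further down):
select *
from t1
where datediff(from_unixtime(unix_timestamp()), created) < 180;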
I think maybe it's a Hive bug dealing with the timestamp type. I've been trying to use it recently and getting incorrect results.
If I change your schema to use a string instead of timestamp, and supply values in the
yyyy-MM-dd HH:mm:ss
format, then the select query worked for me.
According to the documentation, Hive should be able to convert a BIGINT representing epoch seconds to a timestamp, and all existing datetime UDFs should work with the timestamp data type.
with this simple query:
select from_unixtime(unix_timestamp()), cast(unix_timestamp() as timestamp)
from test_tt limit 1;
I would expect both fields to be the same, but I get:
2012-12-29 00:47:43 1970-01-16 16:52:22.063
I'm seeing other weirdness as well.
TIMESTAMP is in milliseconds, while unix_timestamp() is in seconds, so you need to multiply the RHS by 1000.
where created > 1000 * date_sub(from_unixtime(unix_timestamp()), 180);
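On the Hive versions showing the behaviour described in the previous answer, a quick way to see the milliseconds interpretation is to repeat the earlier probe with the value scaled up (a sketch, not verified on every version):
select from_unixtime(unix_timestamp())             -- e.g. '2012-12-29 00:47:43'
     , cast(unix_timestamp() * 1000 as timestamp)  -- should now show the same instant instead of a date in 1970
from test_tt limit 1;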
After reviewing this and referring to Date Difference less than 15 minutes in Hive I came up with a solution. While I'm not sure why Hive doesn't perform the comparison effectively on dates as strings (they should sort and compare lexicographically), the following solution works:
FROM (
    SELECT id, value,
        unix_timestamp(created) c_ts,
        unix_timestamp(date_sub(from_unixtime(unix_timestamp()), 180), 'yyyy-MM-dd') c180_ts
    FROM t1
) x
JOIN t1 t ON x.id = t.id
SELECT to_date(t.Created),
    x.id, AVG(COALESCE(x.HighestPrice, 0)), AVG(COALESCE(x.LowestPrice, 0))
WHERE unix_timestamp(t.Created) > x.c180_ts
GROUP BY to_date(t.Created), x.id;

converting Epoch timestamp to sql server (human readable format)

I have a problem converting Unix timestamps to SQL Server datetimes.
I have data in an Excel sheet that I will import through a tool, so I am looking for code or syntax that can convert an epoch timestamp into a SQL Server timestamp.
I have 3 different columns with the same format. How can I change the values in those columns?
For example:
Epoch timestamp: 1291388960
SQL Server timestamp: 2010-12-03 15:09:20.000
I have 3 different columns with the same format. How can I change the values in those columns?
To update 3 columns in a table, you can pair DATEADD (adding the epoch seconds to 1 Jan 1970) with each column name, i.e.
update tbl set
datetimecol1 = dateadd(s, epochcol1, '19700101'),
datetimecol2 = dateadd(s, epochcol2, '19700101'),
datetimecol3 = dateadd(s, epochcol3, '19700101')
You can't update in place since a bigint column cannot also be a datetime column. You have to update them into 3 other columns.
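If those target columns don't exist yet, something like this would add them first (the column names simply mirror the example above):
alter table tbl add
datetimecol1 datetime,
datetimecol2 datetime,
datetimecol3 datetime;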
Use the DATEADD function:
SELECT DATEADD(ss, 1291388960, '19700101')
...specifying a date of January 1st, 1970. In this example, it was provided in the YYYYMMDD format.
DATEADD will return a DATETIME data type, so if you have a table and column established, you can use the function to INSERT/UPDATE depending on your needs. Provide details, and I'll clarify. Once you have a DATETIME to work with, you can use CAST or CONVERT to format the date in T-SQL.
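For instance, a quick sketch of formatting the converted value back into the yyyy-mm-dd hh:mi:ss.mmm string shown in the question (CONVERT style 121):
SELECT CONVERT(varchar(23), DATEADD(ss, 1291388960, '19700101'), 121)
-- 2010-12-03 15:09:20.000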

Sql Shorthand For Dates

Is there a way to write a query equivalent to
select * from log_table where dt >= 'nov-27-2009' and dt < 'nov-28-2009';
but where you could specify only 1 date and say you want the results for that entire day until the next one.
I'm just making this up, but something of the form:
select * from log_table where dt = 'nov-27-2009':+1;
I do not believe there is one method that is portable to all RDBMSes.
A check in one of my references (SQL Cookbook) shows that no one RDBMS solves the problem quite the same way. I would recommend checking out Chapter 8 of that book, which covers all of the different methods for DB2, Oracle, PostgreSQL, MySQL.
I've had to deal with this issue in SQLite, though, and SQL Cookbook doesn't address that RDBMS, so I'll mention a bit about it here. SQLite doesn't have a date/time data type; you have to create your own by storing all date/time data as TEXT and ensure that your application enforces its formatting. SQLite does have a set of date/time conversion functions that allow you to add nominal date/times while maintaining the data as strings. If you need to add two time durations (HH:MM:SS) to each other, though, based upon data that you've stored in text columns that you are treating as date/time data, you'll have to write your own functions (search for "Defining SQLite User Functions") and attach them to the database at runtime via a call to sqlite3_create_function(). If you want an example of some user functions that add time values, let me know.
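For what it's worth, with SQLite's built-in date/time functions the original query can be written against a single supplied date like this (assuming dt is stored as ISO 8601 text):
select * from log_table
where dt >= date('2009-11-27')
  and dt < date('2009-11-27', '+1 day');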
For MS SQL Server, check out DATEPART.
/* dy = Day of Year */
select * from log_table where datepart(dy, dt) = datepart(dy, '2009-nov-27');
With SQL Server, you could
Select * From table
Where dt >= DateAdd(day, DateDiff(day, 0, @ParamDate), 0)
And dt < DateAdd(day, DateDiff(day, 0, @ParamDate), 1)
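The DateDiff/DateAdd pair simply floors the parameter to midnight; a small sketch showing the two boundary values it produces:
declare @ParamDate datetime = '2009-11-27 13:45:00';
select DateAdd(day, DateDiff(day, 0, @ParamDate), 0) as day_start,  -- 2009-11-27 00:00:00.000
       DateAdd(day, DateDiff(day, 0, @ParamDate), 1) as next_day;   -- 2009-11-28 00:00:00.000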
As long as you are dealing with the date data type of the respective database, the following will work:
t.date_column + 1
...will add one day to the given date. But I have yet to find a db that allows for implicit data type conversion into a date.
SELECT '12-10-2009' + 1
...will fail on SQL Server because SQL Server only performs the implicit conversion when comparing to a datetime data type column. So you need to use:
SELECT CONVERT(DATETIME, '12-10-2009') + 1
For Oracle, you'd have to use the TO_DATE function; MySQL would use something like STR_TO_DATE, etc.
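For example (sketches only; the format masks assume the month-day-year string from above):
SELECT TO_DATE('12-10-2009', 'MM-DD-YYYY') + 1 FROM dual;       -- Oracle
SELECT STR_TO_DATE('12-10-2009', '%m-%d-%Y') + INTERVAL 1 DAY;  -- MySQL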
Have a column that just has the date part (time is 00:00:00.000) and then you can add a where clause: WHERE dt = '2009-11-27'