I have a column of bytearray epoch timestamps in milliseconds in Pig. How do I get the hour and minute corresponding to such a timestamp?
For example:
Hour(1441016271778) = 10
Minute(1441016271778) = 17
GetHour and GetMinute from the Pig docs aren't working; they produce null.
GetHour and GetMinute take a DateTime object as input, so convert the epoch value with ToDate first.
Ref :
http://pig.apache.org/docs/r0.12.0/func.html#get-hour
http://pig.apache.org/docs/r0.12.0/func.html#get-minute
Input :
1441016271778
Pig Script :
A = LOAD 'input.csv' USING PigStorage(',') AS (epoch_time:long);
B = FOREACH A GENERATE GetHour(ToDate(epoch_time)) AS hour, GetMinute(ToDate(epoch_time)) AS min;
Output :
(3,17)
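Note that the hour differs from the question's expected 10 because GetHour uses the cluster's default timezone. As a quick cross-check outside Pig, here is a minimal Python sketch of the same millisecond-epoch conversion done explicitly in UTC:

```python
from datetime import datetime, timezone

# Epoch timestamp in milliseconds, as in the question
epoch_ms = 1441016271778

# Convert to a timezone-aware datetime in UTC
dt = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)

print(dt.hour, dt.minute)  # 10 17 in UTC; a local timezone shifts the hour
```

The same shift explains the (3,17) output above: the minutes match, and only the hour moves with the timezone.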
Related
I am using the BigQuery query API to retrieve data from BigQuery. For a TIMESTAMP column, I am getting values in a different format.
query="select * from table"
QueryJobConfiguration queryConfig = QueryJobConfiguration
.newBuilder(query)
.setUseLegacySql(false)
.build();
Value in Table : "2022-02-25 08:47:48.801665"
Value in Output : 1.645778868801665E9
If I cast the column to a string, I get the proper value. Why is this happening? Can someone explain?
The raw numeric value is the number of seconds since 1970-01-01T00:00:00 UTC, with the microseconds as the fractional part; that is how BigQuery represents a TIMESTAMP, and that's why you are getting this result.
Going the other way, a TIMESTAMP query parameter is built from microseconds, for example timestamp.toInstant().toEpochMilli() * 1000.
You can see this example:
QueryParameterValue.timestamp(
    // Timestamp takes microseconds since 1970-01-01T00:00:00 UTC
    timestamp.toInstant().toEpochMilli() * 1000);
Here is more documentation about it.
If you want to do the conversion in BigQuery itself, you have some options.
CAST the value to TIMESTAMP:
SELECT CAST("2022-02-25 08:47:48.801665" AS TIMESTAMP)
CAST TIMESTAMP to STRING.
SELECT STRING(TIMESTAMP "2022-02-25 08:47:48.801665", "UTC") AS string;
Or apply a specific format:
SELECT FORMAT_TIMESTAMP("%c", TIMESTAMP "2022-02-25 08:47:48.801665", "UTC") AS formatted;
You can see more documentation about CAST.
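To make the unit concrete, here is a small Python sketch (illustrative only, not the Java client API) showing that 1.645778868801665E9 is seconds since the Unix epoch with fractional microseconds:

```python
from datetime import datetime, timezone

# Raw value returned for the TIMESTAMP column: seconds since the Unix epoch
raw = 1.645778868801665e9

dt = datetime.fromtimestamp(raw, tz=timezone.utc)
print(dt)  # 2022-02-25 08:47:48.801665+00:00

# Microseconds since the epoch, the unit a TIMESTAMP query parameter expects
micros = round(raw * 1_000_000)
```

The printed value matches the string stored in the table, confirming the raw number is just a different representation of the same instant.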
I have a requirement to convert the input string format shown below and produce the desired output as a timestamp.
Input: 16AUG2001:23:46:32.876086
Desired Output: 2001-08-16 23:46:32.876086
Actual output from the code below: 2001-08-17 00:01:08
Query:
select '16AUG2001:23:46:32.876086' as row_ins_timestamp,
from_unixtime(unix_timestamp('16AUG2001:23:46:32.876086',
'ddMMMyyyy:HH:mm:ss.SSSSSS')) as row_ins_timestamp
from temp;
The fractional-seconds part is not being converted as required. Please suggest.
The unix_timestamp function does not preserve fractional seconds. Worse, with SSSSSS it reads 876086 as milliseconds, i.e. about 876 extra seconds, which is exactly why the output is shifted to 00:01:08 the next day.
Convert without milliseconds, then concatenate with millisecond part:
with your_data as (
select stack(3,
'16AUG2001:23:46:32.876086',
'16AUG2001:23:46:32',
'16AUG2001:23:46:32.123'
) as ts
)
select concat_ws('.',
                 from_unixtime(unix_timestamp(split(ts,'\\.')[0],'ddMMMyyyy:HH:mm:ss')),
                 split(ts,'\\.')[1])
  from your_data;
Result:
2001-08-16 23:46:32.876086
2001-08-16 23:46:32
2001-08-16 23:46:32.123
Time taken: 0.089 seconds, Fetched: 3 row(s)
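As a side illustration (Python, not Hive), a parser whose fraction specifier really does mean fractional seconds keeps the microseconds intact without any split-and-concatenate workaround:

```python
from datetime import datetime

ts = '16AUG2001:23:46:32.876086'

# %b matches the month abbreviation case-insensitively;
# %f accepts up to 6 fractional-second digits
dt = datetime.strptime(ts, '%d%b%Y:%H:%M:%S.%f')

print(dt)  # 2001-08-16 23:46:32.876086
```

This is the behavior the question expected from SSSSSS; in Hive's pattern language the workaround above is needed instead.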
I have the following query. This query copies the data from Cosmos DB to Azure Data Lake.
select c.Tag
from c
where
c.data.timestamp >= '#{formatDateTime(addminutes(pipeline().TriggerTime, -15), 'yyyy-MM-ddTHH:mm:ssZ' )}'
However, I have to use _ts, the epoch time at which the document was created in the Cosmos DB collection, instead of c.data.timestamp. How do I convert the epoch time to a datetime and compare it with '#{formatDateTime(addminutes(pipeline().TriggerTime, -15), 'yyyy-MM-ddTHH:mm:ssZ' )}'?
I have also tried dateadd( SECOND, c._ts, '1970-1-1' ), which clearly isn't supported.
As #Chris said, you could use a UDF in the Cosmos DB query.
udf:
function convertTime(unix_timestamp){
    // _ts is in seconds; the Date constructor expects milliseconds
    var date = new Date(unix_timestamp * 1000);
    // Return an ISO-8601 string so it can be compared with the formatted trigger time
    return date.toISOString();
}
sql:
You could merge it into your transfer sql:
select c.Tag
from c
where
udf.convertTime(c._ts) >= '#{formatDateTime(addminutes(pipeline().TriggerTime, -15), 'yyyy-MM-ddTHH:mm:ssZ' )}'
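The conversion the UDF performs can be sketched in Python to show the comparison format being produced (the function name here is illustrative, not part of the pipeline):

```python
from datetime import datetime, timezone

def convert_ts(unix_seconds):
    """Convert a Cosmos DB _ts value (seconds since the Unix epoch)
    to the yyyy-MM-ddTHH:mm:ssZ shape emitted by formatDateTime."""
    dt = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return dt.strftime('%Y-%m-%dT%H:%M:%SZ')

print(convert_ts(0))           # 1970-01-01T00:00:00Z
print(convert_ts(1645778868))  # 2022-02-25T08:47:48Z
```

Because both sides of the >= are ISO-8601 strings in UTC, plain string comparison orders them chronologically.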
Is there a function to cast INT to DATE, or DATE to INT, for timestamp filtering?
I see no such function on the Google BigQuery data type pages.
Below is the query i had made for a public dataset for testing:
SELECT title, comment, contributor_username FROM [bigquery-public-data:samples.wikipedia] WHERE wp_namespace = 3 AND timestamp = ??
As shown in the attached image, the timestamp column has INT type.
I suspect you want UNIX_SECONDS to convert a timestamp to the number of seconds since the Unix epoch, and TIMESTAMP_SECONDS to convert a number of seconds since the Unix epoch into a TIMESTAMP.
(I'm guessing that the integer value you've got is a number of seconds since the Unix epoch. That would work for the value you've shown, if it's meant to represent a timestamp in March 2007.)
Here's a sample to show 10 contributions from March 2007:
SELECT title, comment, contributor_username, TIMESTAMP_SECONDS(timestamp) AS timestamp
FROM `bigquery-public-data.samples.wikipedia`
WHERE
wp_namespace = 3
AND timestamp >= UNIX_SECONDS('2007-03-01T00:00:00Z')
AND timestamp < UNIX_SECONDS('2007-04-01T00:00:00Z')
LIMIT 10
For Unix time (in your example), use Daisy's answer.
However if in other cases you want DATE(2018,01,01) to be 20180101 (integer) you can use CAST(FORMAT_DATE("%Y%m%d", date) AS INT64).
For a TIMESTAMP, use FORMAT_TIMESTAMP in the same way.
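As a sanity check on these two integer conventions (seconds since the epoch vs. yyyymmdd), here is a small Python sketch of the same conversions:

```python
from datetime import datetime, timezone

d = datetime(2018, 1, 1, tzinfo=timezone.utc)

# DATE -> INT in yyyymmdd form, the analogue of
# CAST(FORMAT_DATE("%Y%m%d", date) AS INT64)
as_int = int(d.strftime('%Y%m%d'))
print(as_int)  # 20180101

# TIMESTAMP -> seconds since the Unix epoch, the analogue of UNIX_SECONDS
secs = int(d.timestamp())
print(secs)  # 1514764800
```

The two integers are unrelated encodings of the same date, which is why you must know which one a column holds before filtering on it.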
I'm unable to parse a date in Pig.
The date format is Mon, 10/11/10 01:02 PM
I load data using the following command:
data = load 'CampaignData.csv' using PigStorage(';');
Next I generate the date column as a chararray using the following command:
date_data = foreach data generate (chararray) $272 as dates;
When I dump date_data I get the following output:
Mon
How to get the complete date?
You don't need $272 to convert the date into a DateTime object: after the row is split on the comma, the day name ("Mon") lands in $272 and the rest of the date in $273. You can simply do this:
date_data = foreach data generate ToDate($273, ' MM/dd/yy hh:mm aaa');
Just make sure $273 is a chararray, and note the leading space in the format string passed to ToDate. The space is required so the format matches the field exactly, since after the row is split on the comma, $273 begins with a space (' 10/11/10 01:02 PM').
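Outside Pig, the same pattern can be checked quickly with Python's strptime (illustrative only; ToDate uses Joda-Time patterns, where the equivalent format would be 'EEE, MM/dd/yy hh:mm aaa' on the full field):

```python
from datetime import datetime

raw = 'Mon, 10/11/10 01:02 PM'

# %a = day-name abbreviation, %m/%d/%y = date,
# %I:%M %p = 12-hour clock with AM/PM marker
dt = datetime.strptime(raw, '%a, %m/%d/%y %I:%M %p')

print(dt)  # 2010-10-11 13:02:00
```

The 12-hour "01:02 PM" parses to 13:02, confirming the AM/PM specifier is doing the work that 'aaa' does in the Pig format string.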