I have the following problem to solve. I have a Hive table that stores events, and each event's timestamp is stored as a Unix timestamp (e.g. 1484336244).
Every day I want to run a query that fetches yesterday's events.
How could I form this query in Hive?
So, for example, if today is the 9th of February, I want to get only the events that occurred on the 8th of February.
Subtract one day from current_date and compare it with the column converted to yyyy-MM-dd format.
date_add(current_date,-1) = from_unixtime(colName,'yyyy-MM-dd')
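For example, a full query might look like the following sketch (the table name events and the timestamp column event_ts are placeholders, adjust them to your schema):

-- Sketch only: events and event_ts are assumed names
SELECT *
FROM events
WHERE from_unixtime(event_ts, 'yyyy-MM-dd') = date_add(current_date, -1);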
Related
I have a set of data (start dates, end dates) with agents' login/logoff times. I need to convert the dates into weeks so that I can find weekly averages of other columns further ahead in the question. Can we run a Hive query for this? The date is in the format dd-mm-yyyy.
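One hedged sketch of how this could be done in Hive, assuming the table is called agent_sessions, the date column login_date is a dd-MM-yyyy string, and handle_time is the column to average (all placeholder names):

-- Sketch: parse the dd-MM-yyyy string, then group by year and week number
SELECT
  year(from_unixtime(unix_timestamp(login_date, 'dd-MM-yyyy')))       AS yr,
  weekofyear(from_unixtime(unix_timestamp(login_date, 'dd-MM-yyyy'))) AS wk,
  avg(handle_time)                                                    AS weekly_avg
FROM agent_sessions
GROUP BY
  year(from_unixtime(unix_timestamp(login_date, 'dd-MM-yyyy'))),
  weekofyear(from_unixtime(unix_timestamp(login_date, 'dd-MM-yyyy')));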
I have a table where each entry has a timestamp. I am using this timestamp to calculate, for example, the number of entries per day in the last week. However, I do not really care whether the results I get represent a particular calendar day. The actual goal is to divide the entries into 24h bins that I can compare, to see whether there has been any significant change over time. Furthermore, since I am working with almost real-time data, I would like to be able to perform this analysis at any time and also take the most recent entries into account. If I simply grouped the entries per day and ran the query in the middle of the day, I would get a not particularly insightful result for the current day.
My idea now was to subtract the current time from the timestamps of the entries and then group by days. This way I would get 24h bins, each of which represents a full 24-hour period, and the most recent one would also contain the most recent entries.
Something like this:
created_on - current_time
Of course I cannot subtract a time from a timestamp. Is there a way to convert current_time into an interval? Or is there an entirely different approach that is easier?
Is there a way to convert current_time into an interval?
Yes, just cast it.
Note that the use of current_time is discouraged, so it's better to use localtime instead.
You can do:
created_on - localtime::interval
But it seems you might just want to set the time part of the timestamp to 00:00:00, which you can do with date_trunc():
date_trunc('day', created_on)
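Putting the two ideas together for the 24h-bin grouping described in the question, a sketch (assuming a table named entries with a created_on timestamp column) could be:

-- Sketch: shift each timestamp back by the current time of day, then
-- truncate to whole days, giving 24h bins that end at the current time of day
SELECT
  date_trunc('day', created_on - localtime::interval) AS bin_start,
  count(*) AS entries
FROM entries
GROUP BY bin_start
ORDER BY bin_start;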
In SQL Server I am writing a query to calculate the time between certain user events, and for this I need to run an aggregate query for specific days of an event. An event can run on multiple days, and in the data I am using to validate my query there is data for Feb 27th and Feb 28th, but the event runs from Feb 25th to March 1st.
I am only using a subset of the data to validate the query; there will be a lot more data which matches more days, or fewer days.
So I am trying to add an IF check to my query to only run the aggregate if data exists for that specific day, like so:
IF EXISTS (SELECT 1
           FROM sourcedata SD
           JOIN #dwellTime DT ON SD.badgeid = DT.badgeid
           WHERE SD.eventid = 1234
             AND CONVERT(date, SD.DateAdded, 110) = '2018-02-29')
But as I say, my #dwellTime data does not have data for the 29th, and when it tries to do the convert, there is no data to convert and I receive the
Conversion failed when converting date and/or time from character string.
error.
How can I check for a specific day being in the data?
DateAdded is a datetime column.
I believe the problem is that "2018-02-29" is not a valid date. This year, February had only 28 days. The next leap year is 2020.
So, if you use a valid date for the comparison, you shouldn't have a problem.
If you want to include invalid dates, then you can make the comparison as a string rather than a date. Or, you can use try_convert(date, '2018-02-29'). This will return NULL instead of raising an error.
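A minimal illustration of the difference, independent of the tables above:

SELECT CONVERT(date, '2018-02-28');      -- 2018-02-28
SELECT TRY_CONVERT(date, '2018-02-29');  -- NULL, no error raised
SELECT CONVERT(date, '2018-02-29');      -- fails with a conversion error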
I have a table in BigQuery where one of the columns is "timestamp". This column is of datatype INT64. I want to add a new column, based on that column, with the exact dates.
The data in the timestamp column is as follows:
- 600 represents 19:00 EDT on Sunday, May 1, 2011
- It is in microseconds; e.g., one record has 2506199602819 as the timestamp, which should be around 29 days later.
What would be the right way to proceed with this? I have this table in BigQuery, but any SQL would be helpful.
You can do:
select timestamp_add(timestamp('2011-05-01T19:00:00', 'America/New_York'), interval 2506199602819 - 600 microsecond)
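Applied to the column itself, a sketch could look like this (the table name mydataset.mytable is a placeholder, and the INT64 column is backtick-quoted since it is named timestamp):

-- Sketch: anchor at 2011-05-01 19:00 America/New_York (the value 600),
-- then add the stored microsecond offset minus that anchor value
select
  `timestamp`,
  timestamp_add(timestamp('2011-05-01T19:00:00', 'America/New_York'),
                interval `timestamp` - 600 microsecond) as event_ts
from `mydataset.mytable`;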
I have hourly price data for 10 years. Meaning, 24 prices for each day.
The problem is that each price is from the previous hour of trading. So the source of my data lists a 24th hour for each day, and there is no hour 0.
Example (for further clarity):
The records for a day start at: 07/20/2010 01:00:00
The records for a day end at: 07/20/2010 24:00:00
This conflicts with the way my Rails app's PostgreSQL DB wants to save DateTime values. When I imported this data from CSV into my DB and saved the dates into a DateTime column, it changed all of the 24:00:00 values into 00:00:00 of the following day. This throws off the accuracy of my various end uses.
Is there any way I can modify my Postgres DB's behavior to not do this? Any other suggestions?
You could always subtract an hour after you perform the import.
I don't know your database schema, so to do this in a general fashion you'd have to execute this SQL on each column that has a date:
UPDATE table SET date_field = date_field - INTERVAL '1 hour'
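For instance, if the prices ended up in a table named prices with a timestamp column named traded_at (both placeholder names), that would be:

-- Placeholder names; run one such UPDATE per date/timestamp column
UPDATE prices SET traded_at = traded_at - INTERVAL '1 hour';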