I am using Hive 0.14 to analyze the following input data in timestamp format (the # column is just a row number and is irrelevant to the question):
#  Datetime
1  2022-03-01 00:13:08
2  2022-03-31 23:52:24
3  2022-02-28 23:32:40
I want to get the day of the week on which each record occurred (either as a number from 0 to 6 or as the day name), in a format similar to this:
#  Day of the week
1  Tuesday or 2
2  Thursday or 4
3  Monday or 1
I have tried to use the Unix time functions to turn the timestamp into an integer, like this:
select cast(from_unixtime(unix_timestamp(datetime,'yyyy-MM-dd'),'yyyyMMdd') as int) as dayint from yellowtaxi;
My plan was to then call from_unixtime(dayint, 'u') to get the day of the week. However, every row ends up with dayint equal to 20220301, and from_unixtime(dayint, 'u') returns 7 for every row.
What am I doing wrong, or is there an easier way to do it?
I have already tried the date_format() and dayofweek() functions, but neither seems to be available in my Hive version.
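The symptom described above (every row reporting day 7) is what one would expect if a yyyyMMdd integer such as 20220301 is read as a count of seconds since 1970-01-01. The following is a minimal Python sketch of that arithmetic, not Hive code; the helper name iso_weekday is hypothetical and UTC is assumed. In Hive the analogous fix would presumably be to pass the result of unix_timestamp on the full 'yyyy-MM-dd HH:mm:ss' value straight to from_unixtime(..., 'u'), but that is an untested assumption for 0.14.

```python
from datetime import datetime, timezone

# The intermediate value from the question: the date rendered as yyyyMMdd
# and cast to an integer. It is a label, not a count of seconds.
dayint = 20220301

# Reading it as seconds since 1970-01-01 (UTC assumed) lands in August 1970.
misread = datetime.fromtimestamp(dayint, tz=timezone.utc)


def iso_weekday(text):
    """Parse 'yyyy-MM-dd HH:mm:ss' and return 1 (Monday) .. 7 (Sunday)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S").isoweekday()


samples = [
    "2022-03-01 00:13:08",
    "2022-03-31 23:52:24",
    "2022-02-28 23:32:40",
]
weekdays = [iso_weekday(s) for s in samples]

print(misread.date(), misread.isoweekday())
print(weekdays)
```

With the weekday taken from the real timestamp rather than from the yyyyMMdd integer, the three sample rows come out as Tuesday, Thursday and Monday, matching the desired table.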
I have a dataset where certain operations occur during the overnight hours which I'd like to attribute to the day before.
For example, anything happening between 2/23 8pm and 2/24 6am should be included in 2/23's metrics rather than 2/24. Anything from 6:01 am to 7:59pm should be counted in 2/24's metrics.
I've seen a few posts about decrementing time by 6 hours but that doesn't work in this case.
Is there a way to use an If function to specify that midnight-6am should be counted as date-1 rather than date without affecting the metrics for the 6am - 7:59pm hours?
Thanks in advance! Also, a SQL newbie here so apologies if I have lots of followup questions.
You can use date_add with -6 hours and then optionally cast the timestamp as a date.
create table t (dcol datetime);
insert into t values
('2022-02-25 06:01:00'),
('2022-02-25 06:00:00'),
('2022-02-25 05:59:00');
SELECT CAST(DATE_ADD(dcol, INTERVAL -6 HOUR) AS DATE) FROM t;
| CAST(DATE_ADD(dcol, INTERVAL -6 HOUR) AS DATE) |
| :--------------------------------------------- |
| 2022-02-25 |
| 2022-02-25 |
| 2022-02-24 |
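The same shift-then-truncate idea can be checked outside the database. This is a small Python sketch using the three sample rows from the answer; the function name business_date and the offset_hours parameter are hypothetical.

```python
from datetime import datetime, timedelta


def business_date(ts, offset_hours=6):
    """Shift the timestamp back by offset_hours, then keep only the date."""
    return (ts - timedelta(hours=offset_hours)).date()


rows = [
    datetime(2022, 2, 25, 6, 1),
    datetime(2022, 2, 25, 6, 0),
    datetime(2022, 2, 25, 5, 59),
]
shifted = [business_date(r).isoformat() for r in rows]
print(shifted)
```

Note that with this approach exactly 06:00:00 stays on the current day, because 06:00 minus six hours is midnight of the same date.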
As said in the comments, your requirement is to count occurrences in a 6 AM to 6 AM day instead of a midnight-to-midnight day. You can achieve this by decreasing the time by 6 hours, as shown in @Kendle's answer. Another way is to use an IF condition, as shown below. Here, the date is decremented if the time is at or before 6 AM on each day, and the new date is put in a new column.
Query:
SELECT
  IF(
    TIME(eventTime) <= "06:00:00",
    DATE_ADD(DATE(eventTime), INTERVAL -1 DAY),
    DATE(eventTime)
  ) AS newEventTime
FROM
  `project.dataset.table`
ORDER BY
  eventTime;
Output from sample data:
As seen in the output, timestamps at or before 6 AM are attributed to the previous day, while later ones stay on the current day.
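For readers who want to test the boundary behaviour of the IF condition, here is a minimal Python sketch of the same rule. The names CUTOFF and new_event_date are hypothetical, and the sample timestamps are made up for illustration.

```python
from datetime import datetime, time, timedelta

# Times at or before this cutoff count toward the previous day,
# mirroring the <= comparison in the query.
CUTOFF = time(6, 0, 0)


def new_event_date(event_time):
    """Return the date the event should be attributed to."""
    if event_time.time() <= CUTOFF:
        return event_time.date() - timedelta(days=1)
    return event_time.date()


examples = [
    datetime(2022, 2, 24, 5, 59, 0),
    datetime(2022, 2, 24, 6, 0, 0),
    datetime(2022, 2, 24, 6, 1, 0),
    datetime(2022, 2, 24, 20, 0, 0),
]
for e in examples:
    print(e, "->", new_event_date(e))
```

Because the comparison is inclusive, exactly 06:00:00 is attributed to the previous day here, whereas subtracting six hours leaves it on the current day. Pick whichever boundary matches the reporting rule.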
I'm struggling to write a query for the following situation. I'm using an Oracle database and have a job that always runs at midnight to fetch the beginning of a user's recess. The example below shows what the dates look like.
DATE_RECESS | USER_ID
---------------------------
22/09/21 | 1
21/09/21 | 1
20/09/21 | 1
19/09/21 | 1
18/09/21 | 1
I need to notify the user 10 days before the recess begins, and this notification must be sent only once.
Looking at the example dates, I should send a single notification on 08/09/21 and nothing on the other days.
I can't send a notification every day; the query should return something only when exactly 10 days remain before the start. To summarize my question:
How can I write a query that a job runs every day, but that returns a result only for the first date of the recess and not on every day?
To find out which dates are 10 days away, I take the current date as the base: today + 10 days. If that date falls inside a recess, there is a recess coming, but the query should return a row only if that date is 18/09/21, the first day of the recess. If today + 10 falls on the 19th, it should return nothing, because that is the second day.
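One way to read the requirement is: return a row only when today + 10 is a recess date and the day before it is not, since that makes it the first day of a recess. The Python sketch below illustrates that logic on the sample dates. It is not Oracle SQL; the function name recess_start_to_notify and its parameters are hypothetical, and it assumes the recess days are consecutive calendar dates.

```python
from datetime import date, timedelta


def recess_start_to_notify(recess_dates, today, lead_days=10):
    """Return the recess start date if it is exactly lead_days away, else None."""
    target = today + timedelta(days=lead_days)
    days = set(recess_dates)
    # target starts a recess only if the previous day is not a recess day
    if target in days and (target - timedelta(days=1)) not in days:
        return target
    return None


# Sample data from the question: 18/09/21 through 22/09/21 for user 1
recess = [date(2021, 9, d) for d in range(18, 23)]

print(recess_start_to_notify(recess, date(2021, 9, 8)))
print(recess_start_to_notify(recess, date(2021, 9, 9)))
```

In SQL the same check would typically be expressed with a NOT EXISTS on the previous day for the same user, though the exact form depends on the table definition, which the question does not show.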
In the script I need a counter that gives the same day (in different years) a number, starting again from 1 when a new year begins. After I reload my script, I want a table that looks like this:
Date Number
01/01/2015 1
...... ...
10/30/2015 303
10/31/2015 304
11/01/2015 305
.... ...
12/31/2015 365
01/01/2016 1
01/02/2016 2
.... ....
How can I do this?
DayNumberOfYear(date[,firstmonth])
Returns the day number of the year according to a timestamp with the first millisecond of the first day of the year containing date. The function always uses years based on 366 days.
By specifying a firstmonth between 1 and 12 (1 if omitted), the beginning of the year may be moved forward to the first day of any month. If you e.g. want to work with a fiscal year starting March 1, specify firstmonth = 3.
You could try subtracting the start of the Date's year from the Date:
Date-makedate(year(Date),1,1)+1 as Number
You'll need a bit more maths if leap years are going to matter in your analysis, because from March 1 onward the same calendar day gets a number one higher in a leap year.
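The subtraction approach and the leap-year caveat can be illustrated with a short Python sketch. Both helper names are hypothetical, and day_number_366 is only one possible way to get a leap-year-independent number (it maps every date onto a fixed leap year); it is not claimed to reproduce the script function exactly.

```python
from datetime import date


def day_number(d):
    """Days since January 1 of the same year, counting January 1 as 1."""
    return (d - date(d.year, 1, 1)).days + 1


def day_number_366(d):
    """Same calendar day always gets the same number (fixed 366-day year)."""
    # 2000 is a leap year, so February 29 has its own slot.
    return (date(2000, d.month, d.day) - date(2000, 1, 1)).days + 1


print(day_number(date(2015, 10, 30)))
print(day_number(date(2015, 3, 1)), day_number(date(2016, 3, 1)))
print(day_number_366(date(2015, 3, 1)), day_number_366(date(2016, 3, 1)))
```

The plain subtraction reproduces the numbers in the question's table, but March 1 gets 60 in 2015 and 61 in 2016. The 366-based variant gives 61 in both years.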
USERS
ID TIMEMODIFIED
1 1400481271
2 1400481489
3 1400486453
4 1400486525
5 1401777484
I have a timemodified field. Using timemodified, I need to get the rows from the last 4 weeks, counting back from today's date.
SELECT id FROM USERS
WHERE FROM_UNIXTIME(timemodified,'%d-%m-%Y') >= curdate()
AND FROM_UNIXTIME(timemodified,'%d-%m-%Y') < curdate()-1
Your times are already in Unix timestamp format. Bear in mind that it'll be far more efficient to compare [TIMEMODIFIED] against the current date converted to a Unix timestamp. In addition, you don't need to check any upper bound unless [TIMEMODIFIED] can be in the future.
Try:
-- 60x60x24x7x4 = 2419200 seconds in four weeks
SET @unix_four_weeks_ago = UNIX_TIMESTAMP(curdate()) - 2419200;
SELECT id FROM USERS
WHERE timemodified >= @unix_four_weeks_ago;
NB. Four weeks ago (i.e. today minus 28 days) was 1,437,696,000 (24th July 2015) at the time of this answer. The latest record in the sample you provided has a timestamp of 3rd June 2014, so none of these records will be returned by the query.
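The cutoff arithmetic can be verified with a few lines of Python. This sketch assumes the session time zone is UTC (UNIX_TIMESTAMP(curdate()) depends on the server's time zone setting), and the function names are hypothetical.

```python
import calendar
from datetime import date

FOUR_WEEKS_SECONDS = 60 * 60 * 24 * 7 * 4  # 2419200


def four_weeks_ago_unix(today):
    """Unix time of midnight (UTC assumed) on `today`, minus four weeks."""
    midnight = calendar.timegm(today.timetuple())
    return midnight - FOUR_WEEKS_SECONDS


# Sample rows from the question: (id, timemodified)
users = [
    (1, 1400481271),
    (2, 1400481489),
    (3, 1400486453),
    (4, 1400486525),
    (5, 1401777484),
]


def recent_ids(rows, today):
    """Compare the raw integer column against a single precomputed cutoff."""
    cutoff = four_weeks_ago_unix(today)
    return [uid for uid, ts in rows if ts >= cutoff]


print(four_weeks_ago_unix(date(2015, 8, 21)))
print(recent_ids(users, date(2015, 8, 21)))
```

Computing the cutoff once and comparing integers keeps the filter on the raw column, which is the efficiency point made in the answer.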
I have a table with two columns: item (int) and timestamp (datetime).
I need to know if there are any records with a timestamp after the most recent 6 AM.
Example:
At 5 AM it should check for records from after 6 AM yesterday. At 7 AM it should check for records from after 6 AM today.
This could of course be done by building a variable from the date part:
if the time now is < 6 AM, the date part should be yesterday; if the time now is >= 6 AM, the date part should be today
but there must be a simpler way?
This expression should always return the most recent 06:00 in the past:
select DATEADD(hour,
(DATEDIFF(hour,'2014-01-01T06:00:00',CURRENT_TIMESTAMP)/24)*24,
'2014-01-01T06:00:00')
It works by asking how many hours have passed since some arbitrary, known 6 AM. We then round this number down to a multiple of 24 (by dividing and multiplying with integer maths) and add it back onto the same arbitrary 6 AM.
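The rounding trick can be reproduced in Python to check the two cases from the question. This is a sketch under the assumption that the current time is later than the anchor; the names ANCHOR and most_recent_6am are hypothetical.

```python
from datetime import datetime, timedelta

# Any known 6 AM in the past works as the anchor.
ANCHOR = datetime(2014, 1, 1, 6, 0, 0)


def most_recent_6am(now):
    """Return the latest 06:00 that is not after `now` (now >= ANCHOR assumed)."""
    whole_hours = int((now - ANCHOR).total_seconds() // 3600)
    # Integer divide then multiply to round down to a multiple of 24 hours
    whole_days_in_hours = (whole_hours // 24) * 24
    return ANCHOR + timedelta(hours=whole_days_in_hours)


print(most_recent_6am(datetime(2022, 2, 25, 5, 0)))
print(most_recent_6am(datetime(2022, 2, 25, 7, 0)))
```

At 5 AM the result is 6 AM on the previous day, and at 7 AM it is 6 AM on the same day, which is the behaviour the question asks for.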