Hive date function to achieve day of week - apache

I'm looking for a workaround or a Hive date function that gives the day of the week as a number:
Sunday - 1
Monday - 2
Tuesday - 3
Wednesday - 4
Thursday - 5
Friday - 6
Saturday - 7
Requirement in detail: I'm looking for a function that takes a date string (YYYYMMDD) as input and outputs the day of the week as per the table above.

Consider using from_unixtime(your_date,'u'). This returns the day number of the week starting from Monday=1 (so Sunday is 7, not 1 as in the question's table).
If your date is not in unixtime format, you can use the following instead:
from_unixtime(unix_timestamp('20140112','yyyyMMdd'),'u')
see: http://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html for simple date format documentation.
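Since the 'u' pattern numbers the week from Monday=1 while the question asks for Sunday=1, the result needs one more step. Here is a minimal Python sketch of that renumbering (Python is not part of the original thread; the function name is made up for illustration, and isoweekday() stands in for the 'u' pattern because it uses the same Monday=1 to Sunday=7 numbering):

```python
from datetime import datetime


def sunday_first_day_number(date_string: str) -> int:
    """Return 1 for Sunday through 7 for Saturday for a 'YYYYMMDD' string."""
    parsed = datetime.strptime(date_string, "%Y%m%d")
    monday_first = parsed.isoweekday()  # Monday=1 ... Sunday=7, like the 'u' pattern
    # Shift the numbering: Sunday 7 -> 1, Monday 1 -> 2, ..., Saturday 6 -> 7
    return monday_first % 7 + 1
```

For the sample date '20140112', which is a Sunday, this returns 1. The same "mod 7, plus 1" step can presumably be applied in Hive by casting the from_unixtime result to an int first; that variant is untested here.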

You can now use date_format (Hive 1.2):
hive> select date_format('2016-12-01' ,'u');
OK
4

select pmod(datediff(your_date,'1900-01-07'),7) + 1 as WeekDay from your_table
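1900-01-07 is an arbitrary start date that falls on a Sunday.
datediff counts the days since that Sunday, pmod(..., 7) reduces the count to 0 (Sunday) through 6 (Saturday), and the plus 1 makes the result start at 1 instead of zero, giving Sunday=1 through Saturday=7. Note that datediff expects dates in 'yyyy-MM-dd' format, so a YYYYMMDD string has to be converted first.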
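To check the arithmetic outside Hive, here is a small Python sketch of the same idea (not from the original answer; the names are illustrative). Date subtraction plays the role of datediff, and Python's % operator with a positive modulus never returns a negative number, so it behaves like pmod here:

```python
from datetime import date

# The arbitrary start date from the query above; it falls on a Sunday.
REFERENCE_SUNDAY = date(1900, 1, 7)


def day_number(d: date) -> int:
    """Return 1 for Sunday through 7 for Saturday."""
    days_since_reference = (d - REFERENCE_SUNDAY).days  # like datediff(d, '1900-01-07')
    # % 7 gives 0 (Sunday) .. 6 (Saturday), also for dates before the reference
    return days_since_reference % 7 + 1
```

Because the remainder is never negative, dates earlier than 1900-01-07 are numbered correctly as well, which is the reason the Hive query uses pmod rather than the plain % operator.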

Expanding on iggy's answer, here is the query to get the days of the week. Adjust the query to set the first day of the week as necessary.
SELECT current_date AS `Date`,
CASE date_format(current_date,'u')
WHEN 1 THEN 'Mon'
WHEN 2 THEN 'Tues'
WHEN 3 THEN 'Wed'
WHEN 4 THEN 'Thu'
WHEN 5 THEN 'Fri'
WHEN 6 THEN 'Sat'
WHEN 7 THEN 'Sun'
END AS day_of_week

From Hive 2.2 there is another possibility:
hive> select extract(dayofweek FROM your_date) FROM your_table;

As I said you need to write a UDF which will accept a string as parameter and return a string.
Inside the UDF you need to do these steps:
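1.) Parse the input string using SimpleDateFormat("yyyyMMdd"). The pattern letters are case-sensitive: uppercase YYYY means week year and DD means day of year, so the lowercase y and d are required.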
2.) Use the Below code to get the day of week:
Calendar c = Calendar.getInstance();
c.setTime(yourDate);
int dayOfWeek = c.get(Calendar.DAY_OF_WEEK);
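3.) Use this dayOfWeek value in a case statement to get your weekday String and return that string. Calendar.DAY_OF_WEEK returns 1 for Sunday through 7 for Saturday, which matches the table in the question.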
Hope this helps...!!!
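The three steps above can be tried out quickly before writing the Java UDF. This is only a Python sketch of the same logic, not the UDF itself; the names are hypothetical, and the lookup list replaces the case statement:

```python
from datetime import datetime

# Index 0 is unused so the list lines up with Sunday=1 ... Saturday=7,
# the numbering that Java's Calendar.DAY_OF_WEEK uses.
DAY_NAMES = [None, "Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]


def weekday_name(date_string: str) -> str:
    parsed = datetime.strptime(date_string, "%Y%m%d")  # step 1: parse YYYYMMDD
    day_of_week = parsed.isoweekday() % 7 + 1          # step 2: Sunday=1 ... Saturday=7
    return DAY_NAMES[day_of_week]                      # step 3: number -> name
```

A string that does not match the format raises ValueError in this sketch; a real UDF would need to decide whether to return NULL or fail in that case.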

Related

Hive: Calculate exactly 1 year from date in format 'yyyy-MM-dd' string

I need to check whether exactly 1 year or more has passed since the date '2021-01-29', in Hive.
So the result date must be in 'yyyy-MM-dd' format and equal to '2022-01-29' or later; '2022-01-28' is not a correct answer.
Is it possible to use date_add('2021-01-29', interval 1 year)? If so, could someone explain how?
Thank you in advance.
In newer versions of Hive (since 1.2.0) you can add an interval to the date:
select date('2021-01-29') + interval 1 year
Result:
2022-01-29
For older versions of Hive, use this recipe:
1 year = 12 months. Add 12 months using the add_months function:
select add_months('2021-01-29',12)
Result:
2022-01-29
If you want to add more than one year, multiply 12 by the number of years.
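The question also asks whether a full year has passed, which is a comparison against the shifted date. A minimal Python sketch of that check follows (not from the original answer; the function names are made up, and the handling of 29 February is an assumption of this sketch, so verify how your Hive version treats that case):

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # 29 February with a non-leap target year: fall back to 28 February.
        # This clamping is an assumption of the sketch.
        return d.replace(year=d.year + years, day=28)


def at_least_one_year_passed(start: date, today: date) -> bool:
    # True once today reaches the same calendar day one year later
    return today >= add_years(start, 1)
```

With a start date of 2021-01-29 the check is false on 2022-01-28 and true from 2022-01-29 onward, which is the boundary the question describes.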

Why SparkSQL is not giving the right day of the week?

I have this query in SparkSQL.
WITH a AS (
SELECT OrderDts,
CASE EXTRACT(DAYOFWEEK FROM OrderDts)
WHEN 1 THEN 'Mon'
WHEN 2 THEN 'Tues'
WHEN 3 THEN 'Wed'
WHEN 4 THEN 'Thu'
WHEN 5 THEN 'Fri'
WHEN 6 THEN 'Sat'
WHEN 7 THEN 'Sun'
END as dayofweek
FROM Orders
)
SELECT * FROM a
ORDER BY OrderDts DESC
However, I get the wrong day name. For example, it shows me the following.
2021-05-10 05:58 Tues
While 10 May is actually Monday. Any idea why this problem occurs and how to solve it?
The docs say:
"DAYOFWEEK", ("DOW") - the day of the week for datetime as Sunday(1) to Saturday(7)
"DAYOFWEEK_ISO", ("DOW_ISO") - ISO 8601 based day of the week for datetime as Monday(1) to Sunday(7)
So you can use DAYOFWEEK_ISO instead of DAYOFWEEK, or reorder the WHEN branches so that 1 maps to 'Sun'.
By the way, in Hive you could simply use this to get the name of the day:
select date_format(current_date,'EEEE');
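The off-by-one can be reproduced with the timestamp from the question. The Python sketch below is only an illustration (it is not Spark code): isoweekday() uses the same Monday=1 to Sunday=7 numbering as DAYOFWEEK_ISO, and the Sunday=1 numbering of DAYOFWEEK is derived from it.

```python
from datetime import datetime

# The label table from the question, which assumes Monday=1
MONDAY_FIRST_NAMES = {1: "Mon", 2: "Tues", 3: "Wed", 4: "Thu",
                      5: "Fri", 6: "Sat", 7: "Sun"}

order_ts = datetime(2021, 5, 10, 5, 58)      # the row from the question, a Monday

dayofweek_iso = order_ts.isoweekday()        # Monday=1 ... Sunday=7
dayofweek = order_ts.isoweekday() % 7 + 1    # Sunday=1 ... Saturday=7

wrong_label = MONDAY_FIRST_NAMES[dayofweek]      # Sunday-first number, Monday-first table
right_label = MONDAY_FIRST_NAMES[dayofweek_iso]  # both sides use Monday=1
```

Here wrong_label is 'Tues' and right_label is 'Mon', matching the behaviour described in the question.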

Pass column value as Date Part argument

I am trying to generate a string array of weekdays and use it to find how many times each day appears in a month.
I am using standard SQL on BigQuery.
My query looks like this:
with weeks as (select array['SUNDAY','MONDAY','TUESDAY','WEDNESDAY','THURSDAY','FRIDAY','SATURDAY'] as wk)
select DATE_DIFF('2019-01-31','2019-01-01',WEEK(wk)) AS week_weekday_diff
from weeks, unnest(wk) as wk
The query, however, fails with the error "A valid date part argument for WEEK is required, but found wk". wk is a column value holding the days of the week, while WEEK is a function that expects a literal weekday. Is there a way I can pass the column value as the argument?
Below is for BigQuery Standard SQL.
On the error "A valid date part argument for WEEK is required, but found wk", the documentation says:
WEEK(<WEEKDAY>): Valid values for WEEKDAY are literal SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, and SATURDAY.
On "Is there a way I can pass the column value as arguments?": if you wish, you can submit a feature request at https://issuetracker.google.com/issues/new?component=187149&template=0
On "find how many times each day appears in a month": to get your expected result and work around the limitation above, you can approach the task from the opposite angle. Extract the weekday position of every date in the range and then compute the counts, as in the example below.
#standardSQL
WITH weekdays AS (SELECT ['SUNDAY','MONDAY','TUESDAY','WEDNESDAY','THURSDAY','FRIDAY','SATURDAY'] AS wk)
SELECT wk[ORDINAL(pos)] weekday, COUNT(1) cnt
FROM weekdays,
UNNEST(GENERATE_DATE_ARRAY('2019-01-01','2019-01-31')) day,
UNNEST([EXTRACT(DAYOFWEEK FROM day)]) pos
GROUP BY pos, weekday
ORDER BY pos
with result
Row weekday cnt
1 SUNDAY 4
2 MONDAY 4
3 TUESDAY 5
4 WEDNESDAY 5
5 THURSDAY 5
6 FRIDAY 4
7 SATURDAY 4
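The counts above can be cross-checked outside BigQuery. This Python sketch (not part of the original answer; the function name is illustrative) follows the same approach: walk over every date in the range, take its weekday position, and count.

```python
from collections import Counter
from datetime import date, timedelta

WEEKDAYS = ["SUNDAY", "MONDAY", "TUESDAY", "WEDNESDAY",
            "THURSDAY", "FRIDAY", "SATURDAY"]


def weekday_counts(first_day: date, last_day: date):
    """Count how often each weekday occurs between two dates, inclusive."""
    counts = Counter()
    current = first_day
    while current <= last_day:
        position = current.isoweekday() % 7  # 0 = Sunday ... 6 = Saturday
        counts[WEEKDAYS[position]] += 1
        current += timedelta(days=1)
    # Report in Sunday-first order, like the query result
    return [(name, counts[name]) for name in WEEKDAYS]
```

Calling weekday_counts(date(2019, 1, 1), date(2019, 1, 31)) gives 5 for Tuesday, Wednesday and Thursday and 4 for the other days, the same as the query result.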
Trying your query, what I have noticed to be returning an error is:
select DATE_DIFF('2019-01-31','2019-01-01',WEEK('WEDNESDAY')) AS week_weekday_diff;
as the function WEEK(< WEEKDAY >) is expecting something like:
select DATE_DIFF('2019-01-31','2019-01-01',WEEK(`WEDNESDAY`)) AS week_weekday_diff;
OR
select DATE_DIFF('2019-01-31','2019-01-01',WEEK(WEDNESDAY)) AS week_weekday_diff;
I think that WEEK(<WEEKDAY>) only accepts the weekday as a literal keyword, in the form shown in the documentation, so strings are not valid.

SQL get data between the last four weeks

I would like to get data of the last four weeks. Now, I usually run the query on a Tuesday, therefore I have this code:
AND datatime between dateadd(day,-30,getdate()) and dateadd(day,-2,getdate())
I would like to run the query whenever I want (not just on Tuesday), but obtaining also the data from the same period (last four weeks from Monday to Sunday).
I have tried to do the following
AND datatime between dateadd(week,-4,getdate()) and dateadd(week,-1,getdate())
but it does not work, as the data obtained is not from the previous weeks (Monday to Sunday) but from the 7 days before the date in which I run it.
Any ideas on how to get the data, using that structure, so that I get data from the 4 previous weeks (Monday to Sunday) no matter when I run the query?
P.S I am using DBeaver.
Thank you.
If you want all data from the Monday four weeks before the current week up to today, you can use this (MySQL syntax):
AND datatime >= DATE_ADD(
DATE_ADD(CURDATE(), INTERVAL - WEEKDAY(CURDATE()) DAY),
INTERVAL - 4 WEEK)
But if you want all the data from that Monday up to the most recent completed Sunday, you can use this:
AND datatime >= DATE_ADD(
DATE_ADD(CURDATE(), INTERVAL - WEEKDAY(CURDATE()) DAY),
INTERVAL - 4 WEEK)
AND datatime < DATE_SUB(
CURDATE(), INTERVAL WEEKDAY(CURDATE()) DAY)
What's happening :
These are MySQL date and time functions; see the MySQL documentation for details. In this query, we are using:
DATE_ADD(start, Interval) : Add time values (Interval) to a date value (start),
DATE_SUB(end, Interval) : Subtract a time value (Interval) from a date (end),
CURDATE() : Returns the current date,
WEEKDAY(date) : Returns the weekday index for date (0 = Monday, 1 = Tuesday, … 6 = Sunday),
DAYOFWEEK(date) : Returns the weekday index for date (1 = Sunday, 2 = Monday, …, 7 = Saturday)
So :
DATE_ADD(
DATE_ADD(CURDATE(), INTERVAL - WEEKDAY(CURDATE()) DAY), -- this week's Monday: WEEKDAY(25-01-2018) = 3, so 25-01-2018 - 3 DAY = 22-01-2018
INTERVAL - 4 WEEK -- go 4 weeks back from that Monday
)
A working case Scenario.
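The window arithmetic is easy to get wrong at the edges of the week, so here is a small Python sketch of the same calculation (not from the original answer; the function name is made up). Python's date.weekday() uses the same Monday=0 to Sunday=6 numbering as MySQL's WEEKDAY().

```python
from datetime import date, timedelta


def last_four_full_weeks(today: date):
    """Return (start, end) where start is inclusive and end is exclusive."""
    this_monday = today - timedelta(days=today.weekday())  # Monday of the current week
    start = this_monday - timedelta(weeks=4)  # Monday four weeks earlier
    end = this_monday                         # the day after the last completed Sunday
    return start, end
```

For 25-01-2018 (a Thursday) this gives the range from 25-12-2017 up to, but not including, 22-01-2018. Any other day of that same week, including the Sunday, produces the same range, which is the "no matter when I run it" behaviour the question asks for.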

Group SQL results by week and specify "week-ending" day

I'm trying to select data grouped by week, which I have working, but I need to be able to specify a different day as the last day of the week. I think something needs to go near INTERVAL (6-weekday('datetime')) but not sure. This kind of SQL is above my pay-grade ($0) :P
SELECT
sum(`value`) AS `sum`,
DATE(adddate(`datetime`, INTERVAL (6-weekday(`datetime`)) DAY)) AS `dt`
FROM `values`
WHERE id = '123' AND DATETIME BETWEEN '2010-04-22' AND '2010-10-22'
GROUP BY `dt`
ORDER BY `datetime`
Thanks!
select
sum(`value`) as `sum`,
CASE WHEN (weekday(`datetime`)<=3) THEN date(`datetime` + INTERVAL (3-weekday(`datetime`)) DAY)
ELSE date(`datetime` + INTERVAL (3+7-weekday(`datetime`)) DAY)
END as `dt`
FROM `values`
WHERE id = '123' and `datetime` between '2010-04-22' AND '2010-10-22'
GROUP BY `dt`
ORDER BY `dt`
This does look pretty evil, but this query will give you a sum of value grouped by week, with each week ending on a Thursday (weekday() returns 3 for Thursday).
If you wish to change which day ends the week, you just need to replace the 3's in the CASE statement. For example, if you wanted Tuesday you would have:
CASE WHEN (weekday(datetime)<=1) THEN date(datetime + INTERVAL (1-weekday(datetime)) DAY)
ELSE date(datetime + INTERVAL (1+7-weekday(datetime)) DAY)
END as dt
I hope this helps.
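The two branches of the CASE expression are the same calculation taken modulo 7: move forward to the next occurrence of the chosen weekday, or stay put if the date already falls on it. A Python sketch of that reading (not from the original answer; the function name is illustrative, and weekdays are numbered Monday=0 to Sunday=6 as in MySQL's WEEKDAY()):

```python
from datetime import date, timedelta


def week_ending(d: date, end_weekday: int) -> date:
    """Return the first date on or after d that falls on end_weekday."""
    # 0 when d already is the end day, otherwise 1..6 days forward
    days_ahead = (end_weekday - d.weekday()) % 7
    return d + timedelta(days=days_ahead)
```

With end_weekday=3, Thursday 2010-04-22 maps to itself and Friday 2010-04-23 maps to 2010-04-29, which is what the CASE expression produces for those dates.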
Simple solution that I like. This will return the date for the start of the week, assuming the week starts on Monday and ends on Sunday.
DATE(`datetime`) - INTERVAL WEEKDAY(`datetime`) DAY AS `dt`
This can easily be adjusted to have the week end on Thursday, because Thursday is 3 days earlier than Sunday:
DATE(`datetime`) - INTERVAL WEEKDAY(`datetime` + INTERVAL 3 DAY) DAY AS `dt`
This returns the start of a week that starts on Friday and ends on Thursday.
You can group on this no problem. If you want to get the end of the week instead of the start, do this:
DATE(`datetime`) + INTERVAL (6 - WEEKDAY(`datetime` + INTERVAL 3 DAY)) DAY AS `dt`
I think you must choose between Sunday and Monday? Then you can use DATE_FORMAT to group by a string format of the date, using %v for weeks that start on Monday and %V for weeks that start on Sunday.
SELECT
sum(`value`) AS `sum`,
DATE_FORMAT(`datetime`,'%v.%m.%Y') AS `dt`
FROM `values`
WHERE id = '123' AND DATETIME BETWEEN '2010-04-22' AND '2010-10-22'
GROUP BY DATE_FORMAT(`datetime`,'%v.%m.%Y')
ORDER BY `datetime`
How to use DATE_FORMAT
I don't remember the exact math, but you can get WEEKDAY to wrap around on different days of the week by adding or subtracting days to its argument. You'll need to tinker with different values of x and y in the expression:
x-weekday(adddate(`datetime`, INTERVAL y DAY))
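One way to pin down x and y without tinkering is sketched below in Python (this is not from the original answer, and the function name is made up). Under the Monday=0 to Sunday=6 numbering of WEEKDAY(), shifting the date by y = (6 - end_weekday) mod 7 days makes the first day of the custom week look like a Monday, and x = 6 then turns the week start into the week end.

```python
from datetime import date, timedelta


def week_bounds(d: date, end_weekday: int):
    """Return (start, end) of the week containing d that ends on end_weekday."""
    y = (6 - end_weekday) % 7                  # shift applied inside WEEKDAY()
    shifted_weekday = (d + timedelta(days=y)).weekday()
    start = d - timedelta(days=shifted_weekday)
    end = d + timedelta(days=6 - shifted_weekday)  # the x = 6 form
    return start, end
```

For a week ending on Thursday (end_weekday=3) the shift is 3 days, so Thursday 2010-04-22 belongs to the week from 2010-04-16 to 2010-04-22, and Friday 2010-04-23 starts the next week.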