How can I get Quarter to Date in Hive SQL?

I'm having a hard time getting quarter-to-date values in Hive SQL.
How can I get the first day of the current quarter in Hive SQL?
Table name: Orders
Fields: date, order_num, sales_num
Please advise.

This seems to be the cleanest way
with t as (select date '2016-08-27' as dt)
select add_months(trunc(dt,'MM'),-(month(dt)-1)%3) from t
;
2016-07-01
Here are 2 more options
with t as (select date '2016-08-27' as dt)
select trunc(add_months(dt,-(month(dt)-1)%3),'MM')
from t
;
2016-07-01
with t as (select date '2016-08-27' as dt)
select add_months(trunc(dt,'YY'),cast((month(dt)-1) div 3 * 3 as INT))
from t
;
2016-07-01
For earlier versions
with t as (select '2016-08-27' as dt)
select printf('%04d-%02d-%02d',year(dt),(((month(dt)-1) div 3) * 3) + 1,1)
from t
2016-07-01
Same, but for today
with t as (select from_unixtime(unix_timestamp(),'yyyy-MM-dd') as today)
select printf('%04d-%02d-%02d',year(today),(((month(today)-1) div 3) * 3) + 1,1)
from t
2017-04-01
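The month arithmetic behind these queries can be checked outside Hive. Below is a minimal Python sketch, not Hive code: the function names quarter_start_mod, quarter_start_div and quarter_to_date are made up for illustration, and the (order_date, value) row shape is an assumption loosely based on the Orders table in the question.

```python
from datetime import date


def quarter_start_mod(d: date) -> date:
    """Step back (month - 1) % 3 months from the first day of d's month."""
    months_back = (d.month - 1) % 3
    return date(d.year, d.month - months_back, 1)


def quarter_start_div(d: date) -> date:
    """Step forward ((month - 1) div 3) * 3 months from January 1st."""
    months_forward = (d.month - 1) // 3 * 3
    return date(d.year, 1 + months_forward, 1)


def quarter_to_date(rows, today: date):
    """Keep (order_date, value) rows from the quarter start through today."""
    start = quarter_start_mod(today)
    return [row for row in rows if start <= row[0] <= today]


if __name__ == "__main__":
    print(quarter_start_mod(date(2016, 8, 27)))
    print(quarter_start_div(date(2016, 8, 27)))
```

Both helpers give the same date for every month, which is why the modulo and integer-division variants above are interchangeable. A quarter-to-date filter is then just a range check from that date through today.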

If you can't use the various date functions that Dudu suggested, you can always cast the date to a string, parse out the month, and use a case expression. Depending on your version of Hive, you may have to use a simple case instead of a searched one.
(assuming YYYY-MM-DD; the expression returns the quarter number)
case
when cast(substring(cast(<date field> as varchar(10)),6,2) as integer) between 1 and 3 then 1
when cast(substring(cast(<date field> as varchar(10)),6,2) as integer) between 4 and 6 then 2
when cast(substring(cast(<date field> as varchar(10)),6,2) as integer) between 7 and 9 then 3
else 4
end
Ugly, but it should work.
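To make the string-parsing idea concrete, here is a rough Python sketch of the same mapping from a 'YYYY-MM-DD' string to a quarter number. The names quarter_from_iso and first_month_of_quarter are hypothetical, and the input format is assumed to be exactly as stated above.

```python
def quarter_from_iso(date_text: str) -> int:
    """Return the quarter (1-4) for a 'YYYY-MM-DD' string."""
    # Characters 6-7 (1-based) hold the month, like substring(..., 6, 2).
    month = int(date_text[5:7])
    if not 1 <= month <= 12:
        raise ValueError(f"unexpected month in {date_text!r}")
    return (month - 1) // 3 + 1


def first_month_of_quarter(quarter: int) -> int:
    """Map quarter 1-4 to its first month: 1, 4, 7 or 10."""
    return (quarter - 1) * 3 + 1


if __name__ == "__main__":
    q = quarter_from_iso("2016-08-27")
    print(q, first_month_of_quarter(q))
```

The integer division does the work of the four case branches. The second helper shows how to get from the quarter number back to the first month of the quarter, which is what the original question asked for.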

Related

postgresql: Select a series of date with conditions and remove the in between dates

I need to get a series of data that will give me consecutive dates that are at least 14 days apart.
For example:
userid | date
1 | 1/1/2022
1 | 1/5/2022
1 | 1/31/2022
1 | 2/22/2022
Expected Output:
userid | date
1 | 1/1/2022
1 | 1/31/2022
1 | 2/22/2022
I am stuck on how to remove 1/5/2022 from the data. I'm not even sure which function I can try to use in PostgreSQL.
TIA
You can use the LAG window function to get the previous date for each user. After that, you can check whether your date is later than the previous date + 14 days.
WITH cte AS (
SELECT *, LAG(date_) OVER(PARTITION BY userid ORDER BY date_) AS prevdate
FROM tab
)
SELECT userid, date_
FROM cte
WHERE date_ > prevdate + INTERVAL '14 day' OR prevdate IS NULL
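As a rough illustration of what the LAG comparison does row by row, here is a small Python sketch. The function name keep_spaced_rows and the (userid, date) tuple layout are assumptions made for the example, not part of the answer.

```python
from datetime import date, timedelta
from itertools import groupby
from operator import itemgetter


def keep_spaced_rows(rows, min_gap_days=14):
    """Keep a (userid, date) row when it is the user's first date or lies
    more than min_gap_days after the immediately preceding date."""
    kept = []
    ordered = sorted(rows)  # by userid, then date
    for userid, group in groupby(ordered, key=itemgetter(0)):
        prev = None
        for _, d in group:
            if prev is None or d > prev + timedelta(days=min_gap_days):
                kept.append((userid, d))
            prev = d  # like LAG: the previous row, whether it was kept or not
    return kept


if __name__ == "__main__":
    sample = [
        (1, date(2022, 1, 1)),
        (1, date(2022, 1, 5)),
        (1, date(2022, 1, 31)),
        (1, date(2022, 2, 22)),
    ]
    for row in keep_spaced_rows(sample):
        print(row)
```

Note that each date is compared with the previous row, not with the last row that was kept. For a run of dates spaced, say, 10 days apart, only the first would survive, which may or may not be what is wanted.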

How do I create a new column showing difference between maximum date in table and date in row?

I need two columns: 1 showing 'date' and the other showing 'maximum date in table - date in row'.
I kept getting a zero in the 'datediff' column, and thought a nested select would work.
SELECT date, DATEDIFF(max_date, date) AS datediff
(SELECT MAX(date) AS max_date
FROM mytable)
FROM mytable
GROUP BY date
Currently getting this error from the above code: mismatched input '(' expecting {<EOF>, ';'}(line 2, pos 2)
Correct format in the end would be:
date | datediff
--------------------------
2021-08-28 | 0
2021-07-26 | 33
2021-07-23 | 36
2021-08-11 | 17
If you want the date difference, you can use:
SELECT date, DATEDIFF(MAX(date) OVER (), date) AS datediff
FROM mytable
GROUP BY date
You can do this using the analytic function MAX() OVER():
SELECT date, MAX(date) OVER() - date FROM mytable;
Tried this here on sqlfiddle
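The idea behind MAX(date) OVER () is that every row sees the same table-wide maximum. A minimal Python sketch of that calculation follows; the function name days_before_latest is made up for illustration.

```python
from datetime import date


def days_before_latest(dates):
    """Pair each date with the number of days between it and the latest date."""
    if not dates:
        return []
    latest = max(dates)  # one value shared by every row, like MAX(date) OVER ()
    return [(d, (latest - d).days) for d in dates]


if __name__ == "__main__":
    sample = [
        date(2021, 8, 28),
        date(2021, 7, 26),
        date(2021, 7, 23),
        date(2021, 8, 11),
    ]
    for d, diff in days_before_latest(sample):
        print(d, diff)
```

The row holding the maximum date always gets 0, so a column of all zeros usually means the maximum was computed per group instead of over the whole table.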

Selecting the difference between dates in a stored procedure using a subquery

I can't get my head around whether this is even possible, but I feel like I might have done it before and lost that bit of code. I am trying to craft a select statement that contains an inner join on a subquery to show the number of days between two dates from the same table.
A simple example of the data structure would look like:
Name | ID | Date | Day | Hours
Bill | 1 | 3/3/20 | Thursday | 8
Fred | 2 | 4/3/20 | Monday | 6
Bill | 1 | 8/3/20 | Tuesday | 2
Based on this data, I want to select each row plus an extra column which is the number of days between the date from each row for each ID. Something like:
Select * from tblData
Inner join (datediff(Select Top(1) Date from tblData where Date < Date), Date) And ID = ID)
or for simplicity:
Select * from tblData
Inner join (datediff(Select Top(1) Date from tblData where Date < 8/3/20), 8/3/20) And ID = 1)
The resulting dataset would look like:
Name | ID | Date | Day | Hours | DaysBtwn
Bill | 1 | 3/3/20 | Thursday | 8 | 4 (assuming there was an earlier row in the table)
Fred | 2 | 4/3/20 | Monday | 6 | 5 (assuming there was an earlier row in the table)
Bill | 1 | 8/3/20 | Tuesday | 2 | 5 (based on the previous row date being 3/3/20 for Bill)
Does this make sense, and am I trying to do this the wrong way? I want to do this for about 600,000 rows in the table, so efficiency is key. If there is a better way to do this, I'm open to suggestions.
You can use lag():
select t.*, datediff(day, lag(date) over(partition by id order by date), date) diff
from mytable t
I think you just want lag():
select t.*,
datediff(day,
lag(date) over (partition by name order by date),
date
) as diff
from tblData t;
Note: If you want to filter the data so that some rows are used for the lag() calculation but do not appear in the result set, then use a subquery:
select t.*
from (select t.*,
datediff(day,
lag(date) over (partition by name order by date),
date
) as diff
from tblData t
) t
where date < '2020-08-03';
Also note the use of the date constant as a string in YYYY-MM-DD format.
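For readers who want to see what lag() with a partition computes, here is a small Python sketch of the same per-ID "days since previous row" logic. The function name days_since_previous and the (row_id, day) layout are assumptions, and ISO dates stand in for the sample data.

```python
from datetime import date


def days_since_previous(rows):
    """For (row_id, day) rows, return (row_id, day, diff) sorted by id and day.

    diff is the number of days since the previous row with the same id,
    or None for the first row of each id (where lag() yields NULL).
    """
    result = []
    previous_day = {}  # last day seen so far for each id
    for row_id, day in sorted(rows):
        prev = previous_day.get(row_id)
        diff = None if prev is None else (day - prev).days
        result.append((row_id, day, diff))
        previous_day[row_id] = day
    return result


if __name__ == "__main__":
    sample = [
        (1, date(2020, 3, 3)),
        (2, date(2020, 3, 4)),
        (1, date(2020, 3, 8)),
    ]
    for row in days_since_previous(sample):
        print(row)
```

One sort followed by a single pass is roughly the shape of work the window function does, which is why lag() tends to scale better than a correlated subquery per row on a table of this size.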

Need equivalent function for trunc(date,'q') in sql to be converted to hive

I have a table which has month start dates as dates. I convert these into quarters using the Oracle function trunc(date_column,'q'). However, this does not work in HiveQL. I need the equivalent of this in Hive.
Thanks
This seems to be the cleanest way
with t as (select date '2016-08-27' as dt)
select add_months(trunc(dt,'MM'),-(month(dt)-1)%3) from t
;
2016-07-01
Here are 2 more options
with t as (select date '2016-08-27' as dt)
select trunc(add_months(dt,-(month(dt)-1)%3),'MM')
from t
;
2016-07-01
with t as (select date '2016-08-27' as dt)
select add_months(trunc(dt,'YY'),cast((month(dt)-1) div 3 * 3 as INT))
from t
;
2016-07-01

Counting an already counted column in SQL (db2)

I'm pretty new to SQL and have this problem:
I have a filled table with a date column and other not interesting columns.
date | name | name2
2015-03-20 | peter | pan
2015-03-20 | john | wick
2015-03-18 | harry | potter
What I'm doing right now is counting everything for a date:
select date, count(*)
from testtable
where date >= current date - 10 days
group by date
What I want to do now is count the resulting lines and only return them if there are fewer than 10 resulting lines.
What I tried so far is wrapping the whole query in a temp table and then counting everything, which gives me the number of resulting lines:
with temp_count (date, counter) as
(
select date, count(*)
from testtable
where date >= current date - 10 days
group by date
)
select count(*)
from temp_count
What is still missing is the check whether the number is smaller than 10.
I searched this forum and came across some "having" constructs to use, but that forced me to use a "group by", which I can't.
I was thinking about something like this:
with temp_count (date, counter) as
(
select date, count(*)
from testtable
where date >= current date - 10 days
group by date
)
select *
from temp_count
having count(*) < 10
Maybe I'm too tired to think of an easy solution, but I haven't been able to solve this so far.
Edit: A picture for clarification, since my English is horrible:
http://imgur.com/1O6zwoh
I want to see the two-column results ONLY IF there are fewer than 10 rows overall.
I think you just need to move your having clause to the inner query so that it is paired with the GROUP BY:
with temp_count (date, counter) as
(
select date, count(*)
from testtable
where date >= current date - 10 days
group by date
having count(*) < 10
)
select *
from temp_count
If what you want is to return the grouped rows depending on how many of them there are in total (after grouping), then you could do this:
with temp_count (date, counter) as
(
select date, count(*)
from testtable
where date >= current date - 10 days
group by date
)
select date, counter
from (
select date, counter,
row_number() over (order by date) as rseq,
count(*) over () as total_rows
from temp_count
) x
where total_rows >= 10
This will return 0 rows if there are fewer than 10 total, and will deliver ALL the results if there are 10 or more (you can also filter on rseq to get just the first 10 rows if needed). To show the rows only when there are fewer than 10 overall, as in the question, flip the comparison to total_rows < 10.
In your temp_count table, you can filter results with the WHERE clause:
with temp_count (date, counter) as
(
select date, count(*)
from testtable
where date >= current date - 10 days
group by date
)
select *
from temp_count
where counter < 10
Something like:
with t(dt, rn, cnt) as (
select dt, row_number() over (order by dt) as rn
, count(1) as cnt
from testtable
where dt >= current date - 10 days
group by dt
)
select dt, cnt
from t where 10 >= (select max(rn) from t);
will do what you want (I think)
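The requirement from the question (show the per-date counts only if there are fewer than 10 grouped rows overall) can be sketched in a few lines of Python. The function name counts_if_few and the use of plain date strings are assumptions for the example.

```python
from collections import Counter


def counts_if_few(dates, limit=10):
    """Return sorted (date, count) pairs, but only when the number of
    distinct dates is below limit; otherwise return an empty list."""
    per_date = sorted(Counter(dates).items())  # the GROUP BY date step
    return per_date if len(per_date) < limit else []


if __name__ == "__main__":
    sample = ["2015-03-20", "2015-03-20", "2015-03-18"]
    print(counts_if_few(sample))
```

The key point is that the check is on the number of groups, not on the count inside each group, which is why a plain HAVING on the grouped query does not express it.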