Hive - How do I get last months of data from Hive? - sql

How do I get last 2 months of data from Hive?
Here is my attempt:
select (date_add(FROM_UNIXTIME(UNIX_TIMESTAMP(), 'yyyy-MM-dd'),
2 - month(FROM_UNIXTIME(UNIX_TIMESTAMP(), 'yyyy-MM-dd'))
));
This returns 2015-05-30. What I expect instead: if today is '2015-06-03', the result for the last two months should be '2015-04-01', that is, the first day of the month two months back. What am I doing wrong here? Thanks!
Extra Notes:
In SQL it is pretty easy to do (my_table is a placeholder table name):
select * from my_table where date_field >= DATEADD(MONTH, -2, GETDATE());

date_add adds days, not months. The expression below evaluates to -4 when the current month is June (2 - 6 = -4):
2 - month(FROM_UNIXTIME(UNIX_TIMESTAMP(), 'yyyy-MM-dd'))
So you are basically subtracting 4 days from '2015-06-03', which is why you get the result '2015-05-30'.
As far as I know, there is no direct way to subtract months in Hive. Solutions you could consider:
Subtract 60 days, but that won't give you accurate results.
Write a custom UDF to return the date 2 months ago.
Calculate the date in a script, and pass it to hive.
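For what it's worth, more recent Hive releases do seem to cover this without a UDF: the Hive UDF documentation lists add_months (from 1.1.0) and trunc and current_date (from 1.2.0). A minimal sketch under that version assumption, where my_table is a placeholder table name and date_field is assumed to hold 'yyyy-MM-dd' values:

```sql
-- Assumes Hive 1.2.0+; my_table is a hypothetical table name.
-- add_months steps back two calendar months, trunc(..., 'MM') snaps to the 1st.
SELECT trunc(add_months('2015-06-03', -2), 'MM');   -- 2015-04-01

-- The same expression as a filter, relative to today:
SELECT *
FROM my_table
WHERE date_field >= trunc(add_months(current_date, -2), 'MM');
```

On older Hive versions these functions are not available, so the UDF or script options above still apply.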

Related

Databricks SQL syntax for previous six months in where statement

I'm trying to figure out how to look for data in the last six months in the where statement of a SQL query in Databricks, but I'm having a lot of issues with the syntax.
Right now I have:
Select * from table
where datediff(add_months(date_column, -6), date_column) = 1
The query doesn't throw an error, but returns no results.
I think you're expecting the wrong thing from datediff. datediff returns the number of days between two dates. In your query you're comparing date_column minus 6 months against date_column itself, so the result is always about six months (roughly -180 days, because the first argument is the earlier date) and never 1. That is why no rows come back.
Try this.
WHERE date_column > DATEADD(MONTH, -6, CURRENT_DATE())
AKA, where your date column is greater than the current date minus 6 months.
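To see what datediff actually returns here, a small sketch with a fixed date may help. It uses add_months, which Spark SQL has long supported, as an alternative way to express "six months ago"; my_table is a placeholder table name.

```sql
-- datediff(endDate, startDate) counts days from startDate to endDate,
-- so an earlier first argument gives a negative number, never 1.
SELECT datediff(add_months(DATE'2024-07-15', -6), DATE'2024-07-15');  -- -182

-- Rows from the last six months; my_table is a hypothetical table name.
SELECT *
FROM my_table
WHERE date_column > add_months(current_date(), -6);
```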

Snowflake SQL: Is there a way to select records between 2 years back and the current date?

I have seen a lot of questions similar to this but I have yet to see one that goes into detail of how to get records from two years back to today but include the start of the year two years back. Meaning I would like to create a function that will always give me results from January 1st two years back. For this year the results would come from 01-01-2020 to today’s date.
This is what I have so far, but in reality I am using it for a temporary table in my query.
SELECT *
FROM final
WHERE order_date BETWEEN DATEADD('year', -2, current_date) AND current_date
You may use a combination of DATE_TRUNC and DATEADD to find January 1 of two years ago.
SELECT *
FROM final
WHERE order_date BETWEEN DATE_TRUNC('year', DATEADD('year', -2, current_date)) AND current_date;
What you had is close. Just truncate to the start of the year.
You can see what happens in isolation:
select trunc('2021-03-14 08:24:12'::timestamp, 'YEAR');
-- Output: 2021-01-01 00:00:00.000
Adding to your SQL:
SELECT *
FROM final
WHERE order_date
BETWEEN trunc(DATEADD('year', -2, current_date), 'YEAR') AND current_date
It is possible to construct an arbitrary date using DATE_FROM_PARTS:
Creates a date from individual numeric components that represent the year, month, and day of the month.
DATE_FROM_PARTS( <year>, <month>, <day> )
For current_date:
SELECT DATE_FROM_PARTS(YEAR(current_date)-2,1,1);
-- 2020-01-01
Full query:
SELECT *
FROM final
WHERE order_date BETWEEN DATE_FROM_PARTS(YEAR(current_date)-2,1,1) AND current_date
This should be enough
where year(current_date)-year(order_date)<=2
But in case you have an order_date from the future
where year(current_date)-year(order_date)<=2 and order_date<=current_date

Return the number of months between now and datetime value SQL

I apologize, I am new at SQL. I am using BigQuery. I have a field called "last_engaged_date", this field is a datetime value (2021-12-12 00:00:00 UTC). I am trying to perform a count on the number of records that were "engaged" 12 months ago, 18 months ago, and 24 months ago based on this field. At first, to make it simple for myself, I was just trying to get a count of the number of records per year, something like:
Select count(id), year(last_engaged_date) as last_engaged_year
from xyz
group by last_engaged_year
order by last_engaged_year asc
I know that there are a lot of things wrong with this query but primarily, BQ says that "Year" is not a valid function? Either way, What I really need is something like:
Date() - last_engaged_date = int(# of months)
count if <= 12 months as "12_months_count" (# of records where now - last engaged date is less than or equal to 12 months)
count if <= 18 months as "18_months_count"
count if <= 24 months as "24_months_count"
So that I have a count of how many records for each last_engaged_date period there are.
I hope this makes sense. Thank you so much for any ideas
[How to] Return the number of months between now and datetime value [in BigQuery] SQL
The simplest way is to use the DATE_DIFF function, as in the example below:
date_diff(current_date(), date(last_engaged_date), month)
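Building on that expression, the bucketed counts from the question could be sketched with COUNTIF. The table xyz and the columns id and last_engaged_date come from the question; the output column names are made up here (an alias starting with a digit, such as 12_months_count, would need backticks). Note that the buckets are cumulative, and that DATE_DIFF with MONTH counts calendar-month boundaries crossed rather than full 30-day periods.

```sql
-- Cumulative counts of records engaged within the last 12, 18 and 24 months.
SELECT
  COUNTIF(DATE_DIFF(CURRENT_DATE(), DATE(last_engaged_date), MONTH) <= 12) AS count_12_months,
  COUNTIF(DATE_DIFF(CURRENT_DATE(), DATE(last_engaged_date), MONTH) <= 18) AS count_18_months,
  COUNTIF(DATE_DIFF(CURRENT_DATE(), DATE(last_engaged_date), MONTH) <= 24) AS count_24_months
FROM xyz;

-- Per-year counts: BigQuery uses EXTRACT rather than a YEAR() function.
SELECT
  EXTRACT(YEAR FROM last_engaged_date) AS last_engaged_year,
  COUNT(id) AS record_count
FROM xyz
GROUP BY last_engaged_year
ORDER BY last_engaged_year;
```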

Get All records of previous date from current date in oracle

I want all the rows in my table that are more than 6 months old. I wrote the query below for that, but it doesn't return exactly the right records.
Select * from changerequests where lastmodifiedon < sysdate - 180;
The issue is I was getting the records for 2nd april, 2020 which is not more than 6 months. Please suggest the query
If you want records that were last modified within the last 6 months, then you want the inequality condition the other way around:
where lastmodifiedon > sysdate - 180
Note that 180 days is not exactly 6 months. You might want to use add_months() for something more accurate:
where lastmodifiedon > add_months(sysdate, -6)
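Since the question can be read either way, here is a sketch of both directions using a six-calendar-month boundary, i.e. add_months(sysdate, -6). The table and column names are taken from the question. SYSDATE carries a time of day, so wrapping it in TRUNC(SYSDATE) is an option if the boundary should fall at midnight.

```sql
-- Modified within the last six calendar months:
SELECT *
FROM changerequests
WHERE lastmodifiedon > ADD_MONTHS(SYSDATE, -6);

-- Older than six calendar months (the reading the question's "<" suggests):
SELECT *
FROM changerequests
WHERE lastmodifiedon < ADD_MONTHS(SYSDATE, -6);
```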

Return last week data in hive

I am new to hive and sql.
Is there a way to run a count query today so that it fetches the last 7 days of data? For example, if I run the query on Monday, I should get the total count from last week's Monday through Sunday. The date in my table is in the format yyyyMMdd (e.g. 20150910).
Kindly please help me on this.
You can use date_sub() in this case. Something like this should work...
select * from table
where date_field >= date_sub(current_date, 7)
This assumes that the current day's data is not loaded yet. If it is loaded and you want to exclude it, add an upper bound to the filter condition:
and date_field <= date_sub(current_date, 1)
current_date is available from Hive 1.2.0 onward.
On older versions, you can derive the date from the Unix timestamp with to_date(from_unixtime(unix_timestamp()))
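One caveat: date_sub returns a value in yyyy-MM-dd form, while the question's column is stored as yyyyMMdd. A possible way to line the two up, assuming date_field is a string column and Hive 1.2.0+ (for date_format and current_date), with my_table as a placeholder table name:

```sql
-- Assumes date_field is a STRING like '20150910'; my_table is hypothetical.
-- date_format renders the boundary dates in the column's own yyyyMMdd layout,
-- so the comparison is between like-formatted strings.
SELECT count(*)
FROM my_table
WHERE date_field >= date_format(date_sub(current_date, 7), 'yyyyMMdd')
  AND date_field <= date_format(date_sub(current_date, 1), 'yyyyMMdd');
```

Run on a Monday, this covers the previous Monday through Sunday.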