I'm putting together a query that will get refreshed daily that needs to pull records from the last ten dates.
The tables I'm accessing have a 'xxdatetime' column with the unix time stamp and an 'eventdate' column with the date in a yyyy-mm-dd.
In Impala, the answer was easy:
where eventdate > to_date(days_sub(now(), 10))
I used a variation of it in Hive that failed because I guess it was scanning the whole table and the tables are MASSIVE:
where datediff(cast(current_timestamp() as string), eventdate)=10
Is there a light weight way in Hive SQL to filter the xxdatetime or eventdate columns by 'today - 10 days'?
Eventdate is my indexed column, date_sub is (string, days to subtract) and current_date is the current Hive yyyy mm dd.
eventdate > date_sub(current_date, 5)
I'm not sure if this is super light weight but it works!
Related
I have a below query that I run to extract material movements from the last 7 days.
Purpose is to get the data for the last calender week for certain reports.
select
*
From
redshift
where
posting_date between CURRENT_DATE - 7 and CURRENT_DATE - 1
That means I need to run the query on every Monday to get the data for the former week.
Sometimes I am too busy on Monday or its vacation/bank holiday. In that case I would need to change the query or pull the data via SAP.
Question:
Is there a function for redshift that pulls out the data for the last calender week regardless when I run the query?
I already found following solution
SELECT id FROM table1
WHERE YEARWEEK(date) = YEARWEEK(NOW() - INTERVAL 1 WEEK)
But this doesnt seem to be working for redshift sql
Thanks a lot for your help.
Redshift offers a DATE_TRUNC('week', datestamp) function. Given any datestamp value, either a date or datetime, it gives back the date of the preceding Sunday.
So this might work for you. It filters rows from the Sunday before last, up until but not including, the last Sunday, and so gets a full week.
SELECT id
FROM table1
WHERE date >= DATE_TRUNC('week', NOW()) - INTERVAL 1 WEEK
AND date < DATE_TRUNC('week', NOW())
Pro tip: Every minute you spend learning your DBMS's date/time functions will save you an hour in programming.
I am running the below query to get data recorded in the past 24 hours. I need the same data recorded starting midnight (DATE > 12:00 AM) and also data recorded starting beginning of the month. Not sure if using between will work or if there is better option. Any suggestions.
SELECT COUNT(NUM)
FROM TABLE
WHERE
STATUS = 'CNLD'
AND
TRUNC(TO_DATE('1970-01-01','YYYY-MM-DD') + OPEN_DATE/86400) = trunc(sysdate)
Output (Just need Count). OPEN_DATE Data Type is NUMBER. the output below displays count in last 24 hours. I need the count beginning midnight and another count starting beginning of the month.
The query you've shown will get the count of rows where OPEN_DATE is an 'epoch date' number representing time after midnight this morning*. The condition:
TRUNC(TO_DATE('1970-01-01','YYYY-MM-DD') + OPEN_DATE/86400) = trunc(sysdate)
requires every OPEN_DATE value in your table (or at least all those for CNLD rows) to be converted from a number to an actual date, which is going to be doing a lot more work than necessary, and would stop a standard index against that column being used. It could be rewritten as:
OPEN_DATE >= (trunc(sysdate) - date '1970-01-01') * 86400
which converts midnight this morning to its epoch equivalent, once, and compares all the numbers against that value; using an index if there is one and the optimiser thinks it's appropriate.
To get everything since the start of the month you could just change the default behaviour of trunc(), which is to truncate to the 'DD' element, to truncate to the start of the month instead:
OPEN_DATE >= (trunc(sysdate, 'MM') - date '1970-01-01') * 86400
And the the last 24 hours, subtract a day from the current time instead of truncating it:
OPEN_DATE >= ((sysdate - 1) - date '1970-01-01') * 86400
db<>fiddle with some made-up data to get 72 back for today, more for the last 24 hours, and more still for the whole month.
Based on your current query I'm assuming there won't be any future-dated values, so you don't need to worry about an upper bound for any of these.
*Ignoring leap seconds...
It sounds like you have a column that is of data type TIMESTAMP and you only want to select rows where that TIMESTAMP indicates that it is today's date? And as a related problem, you want to find those that are the current month, based on some system values like CURRENT TIMESTAMP and CURRENT DATE? If so, let's call your column TRANSACTION_TIMESTAMP instead of (reserved word) DATE. Your first query could be:
SELECT COUNT(NUM)
FROM TABLE
WHERE
STATUS = 'CLND'
AND
DATE(TRANSACTION_TIMESTAMP)=CURRENT DATE
The second example of finding all for the current month up to today's date could be:
SELECT COUNT(NUM)
FROM TABLE
WHERE
STATUS = 'CLND'
AND
YEAR(DATE(TRANSACTION_TIMESTAMP)=YEAR(CURRENT DATE) AND
MONTH(DATE(TRANSACTION_TIMESTAMP)=MONTH(CURRENT DATE) AND
DAY(DATE(TRANSACTION_TIMESTAMP)<=DAY(CURRENT DATE)
I'm working on a project with a db that contains a date column for patient visits in the format of %m-%d-yyyy and need to sort so that it only pulls the rows where that date is within the last two weeks. I've tried a few different functions of convert, to_date, and can't seem to get anything to work.
I'm still very new to SQL and I don't know if this is a special case because I'm working with an oracle db
Not the full code, because it has dozens of queries and multiple joins (would that affect the date syntax?) but this is the format I'm trying for...
create table "Visits"
insert into "Visits" (
'John Doe',
'5/24/2021',
'Story about the visit',
'More room for story if needed')
select
"User_Name",
"Visit_Date",
"Visit_Narrative",
"Visit_Narrative_Overflow"
from "Visits"
where "Visits"."Visit_Date" >= TRUNC(SYSDATE) - 14
I'm working on a project with a db that contains a date column for patient visits in the format of %m-%d-yyyy
No, you don't have it in the format mm.dd.yyyy. A DATE data type value is stored within the database as a binary value in 7-bytes (representing century, year-of-century, month, day, hour, minute and second) and this has no format.
need to sort so that it only pulls the rows where that date is within the last two weeks.
You want a WHERE filter:
If you want to have the values that happened in the last 14 days then TRUNCate the current date back to midnight and subtract 14 days:
SELECT visit_date
FROM patient
WHERE visit_date >= TRUNC(SYSDATE) - INTERVAL '14' DAY
or
SELECT visit_date
FROM patient
WHERE visit_date >= TRUNC(SYSDATE) - 14
(or subtract 13 if you want today and 13 days before today.)
If you want it after Monday of last week then TRUNCate to the start of the ISO Week (which is always a Monday) and subtract 7 days:
SELECT visit_date
FROM patient
WHERE visit_date >= TRUNC(SYSDATE, 'IW') - INTERVAL '7' DAY
I ended up figuring it out based on an answer from another forum (linked below);
it appears that my original to_date() was incomplete without a second and operator in my where clause. This code is working perfectly
select
"User_Name",
"Visit_Date",
"Visit_Narrative",
"Visit_Narrative_Overflow"
from "SQLUser"."Visits"
where "SQLUser"."Visits"."Visit_Date" >= to_date('5/10/2021', 'MM/DD/YYYY')
and "SQLUser"."Visits"."Visit_Date" < to_date('5/24/2021', 'MM/DD/YYYY')
I am new to hive and sql.
Is there any way if we run a query today with count fields then it should fetch last 7 days data ( example- if i run a query with count fields on monday then I should get the total count from last week monday to sunday) And date in my table is in the format 20150910. (yyyyMMdd).
Kindly please help me on this.
You can use date_sub() in this case. Something like this should work...
select * from table
where date_field >= date_sub(current_date, 7)
assuming that the current day's data is not loaded yet. If you want to exclude the current day's data too, you will have to include that too in the filter condition
and date_field <= date_sub(current_date, 1)
current_date would work if your hive version > 0.12
else, you can explicitly pull the date from unix using to_date(from_unixtime(unix_timestamp()))
I have a PostgreSQL database in which one table rapidly grows very large (several million rows every month or so) so I'd like to periodically archive the contents of that table into a separate table.
I'm intending to use a cron job to execute a .sql file nightly to archive all rows that are older than one month into the other table.
I have the query working fine, but I need to know how to dynamically create a timestamp of one month prior.
The time column is stored in the format 2013-10-27 06:53:12 and I need to know what to use in an SQL query to build a timestamp of exactly one month prior. For example, if today is October 27, 2013, I want the query to match all rows where time < 2013-09-27 00:00:00
Question was answered by a friend in IRC:
'now'::timestamp - '1 month'::interval
Having the timestamp return 00:00:00 wasn't terrible important, so this works for my intentions.
select date_trunc('day', NOW() - interval '1 month')
This query will return date one month ago from now and round time to 00:00:00.
When you need to query for the data of previous month, then you need to query for the respective date column having month values as (current_month-1).
SELECT *
FROM {table_name}
WHERE {column_name} >= date_trunc('month', current_date-interval '1' month)
AND {column_name} < date_trunc('month', current_date)
The first condition of where clause will search the date greater than the first day (00:00:00 Day 1 of Previous Month)of previous month and second clause will search for the date less than the first day of current month(00:00:00 Day 1 of Current Month).
This will includes all the results where date lying in previous month.