Using the date_trunc function on a column - SQL

I just want to know if it's possible to use the date_trunc function in PostgreSQL on a column, because I have a table that contains a time without time zone column.
Example: SELECT date_trunc('hour', time 'columnName') FROM tableName
Thank you so much! Keep safe everyone
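A minimal sketch of what seems intended, using the question's table and column names. Note that time 'columnName' builds a time literal from the string 'columnName' (which fails to parse) rather than referencing the column; also, date_trunc has no variant for time without time zone, so one workaround is to cast the column to interval, truncate, and cast back:
-- assumes a table tableName with a column columnName of type
-- time without time zone; date_trunc accepts timestamp and
-- interval arguments, but not time, hence the casts
SELECT date_trunc('hour', columnName::interval)::time AS hour_start
FROM tableName;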

Related

Performing Date Math on Hive Partition Columns

My data is partitioned by day in the standard Hive format:
/year=2020/month=10/day=01
/year=2020/month=10/day=02
/year=2020/month=10/day=03
/year=2020/month=10/day=04
...
I want to query all data from the last 60 days, using Amazon Athena (i.e. Presto). I want this query to use the partition columns (year, month, day) so that only the necessary partition files are scanned. Assuming I can't change the file partition format, what is the best approach to this problem?
You don't have to use year, month, day as the partition keys for the table. You can have a single partition key called date and add the partitions like this:
ALTER TABLE the_table ADD
PARTITION (`date` = '2020-10-01') LOCATION 's3://the-bucket/data/year=2020/month=10/day=01'
PARTITION (`date` = '2020-10-02') LOCATION 's3://the-bucket/data/year=2020/month=10/day=02'
...
With this setup you can even set the type of the partition key to date:
PARTITIONED BY (`date` date)
Now you have a table with a date column typed as a DATE, and you can use any of the date and time functions to do calculations on it.
What you won't be able to do with this setup is use MSCK REPAIR TABLE to load partitions, but you really shouldn't do that anyway – it's extremely slow and inefficient and really something you only do when you have a couple of partitions to load into a new table.
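With the date-typed partition key in place, the last-60-days filter becomes a plain date comparison that prunes partitions. A sketch, assuming the table definition above (date is a reserved word in Athena, hence the double quotes):
-- scans only the partitions from the last 60 days
SELECT *
FROM the_table
WHERE "date" > date_add('day', -60, current_date);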
An alternative to the approach proposed by Theo is to use the following syntax, e.g.:
select ... from my_table where year||month||day between '20200630' and '20201010'
This works when the columns year, month and day are zero-padded strings, as in the layout above. It's particularly useful for querying across months.
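A dynamic variant of the same idea, so the bounds don't need to be edited by hand each day (a sketch, assuming Athena/Presto, where date_format takes MySQL-style patterns):
-- build today-minus-60-days as a 'yyyyMMdd' string and compare lexically
select ... from my_table
where year||month||day >= date_format(cast(current_date - interval '60' day as timestamp), '%Y%m%d')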

Generating dates in Hive SQL

I'm looking to create a table that contains all of the dates (inclusive) between the min and max date from another table. See below for the simple query to get these dates:
-- Get the min and max dates from the table
select min(date(sale_date)) as min_date,
max(date(sale_date)) as max_date
from TABLE;
I've spent the last hour googling this problem and have found solutions for MySQL and Oracle SQL, but not for Hive SQL, and I've been unable to convert them to Hive SQL. If anyone has any idea how to do this, please let me know. Thanking you in advance.
OK, this isn't my answer; a colleague was able to answer it. Still, I think it's important that I show my colleague's solution for your future benefit. It assumes that you've created a table that contains the min date and max date.
CREATE TABLE TABLE_2
STORED AS AVRO
LOCATION 'xxxxxx'
AS
SELECT date_add(t.min_date, pe.i) AS date_key
FROM TABLE_1 t
-- space(n) produces n spaces; splitting on ' ' yields n+1 elements, and
-- posexplode emits one row per element with its position i = 0 .. n
LATERAL VIEW
posexplode(split(space(datediff(t.max_date, t.min_date)), ' ')) pe AS i, x;
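To see why this yields one row per day, here is how the pieces expand for a hypothetical three-day range (values for illustration only):
-- datediff('2020-01-03', '2020-01-01')  -> 2
-- space(2)                              -> '  ' (two spaces)
-- split('  ', ' ')                      -> ['', '', ''] (three elements)
-- posexplode(...)                       -> one row per element, i = 0, 1, 2
-- date_add('2020-01-01', i)             -> 2020-01-01, 2020-01-02, 2020-01-03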

Explain the two conversions used between Hive date functions?

I am trying to count the number of records on a particular date.
Eventually I got the query to work, but I am confused between these two queries, which seemed the same to me. Why should I pass date_time without quotes in the conversion?
When I run this query:
select count(*) from TABLENAME
where FROM_UNIXTIME(UNIX_TIMESTAMP(date_time), 'yyyyMMdd')='20170312';
the result is the count for that particular date.
But when I run:
select count(*) from TABLENAME
where FROM_UNIXTIME(UNIX_TIMESTAMP('date_time', 'yyyyMMdd'))='20170312';
the result is 0.
Please explain the difference between these queries.
date_time is a column, while 'date_time' is a string literal; UNIX_TIMESTAMP('date_time', 'yyyyMMdd') attempts to parse the literal text 'date_time' with that pattern, fails, and returns NULL, so the comparison never matches and the count is 0.
If you want to quote the column name you should use backticks: `date_time`
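Putting that together, the working query with the column name backtick-quoted (equivalent to the unquoted version above):
select count(*) from TABLENAME
-- `date_time` references the column; 'date_time' would be a string literal
where FROM_UNIXTIME(UNIX_TIMESTAMP(`date_time`), 'yyyyMMdd') = '20170312';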

My simple Oracle query takes too long to complete

I'm trying to run the following SQL statement that is taking too long to finish in Oracle.
Here is my query:
SELECT timestamp FROM data
WHERE timestamp IN
(SELECT MIN(timestamp) FROM data
WHERE timestamp BETWEEN :t1 AND :t2);
If anyone can help with optimising this query I would be very grateful.
All you need to speed your query is an index on timestamp:
create index data_timestamp on data(timestamp);
If you are expecting only one result, you can also do:
SELECT MIN(timestamp)
FROM data
WHERE TIMESTAMP BETWEEN :t1 AND :t2
I'm not sure why you would want to output the timestamp multiple times.
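If you are on Oracle 12c or later, a sketch of another option is to let the index drive an ordered scan and stop at the first row, instead of computing MIN in a subquery:
-- FETCH FIRST requires Oracle 12c+
SELECT timestamp
FROM data
WHERE timestamp BETWEEN :t1 AND :t2
ORDER BY timestamp
FETCH FIRST 1 ROW ONLY;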

How do I avoid a full table scan when converting date to timestamp in a Postgres query?

I am querying a Postgres database table that I only have read access to and cannot modify in any way. Long story short, I need to run a query daily to pull records over several days, and I have been going in each day and manually modifying the timestamp parameters. The table is about 40 million records and I am running pass-through queries from a SQL Server to the linked server.
There is an index on the c_stime field, which is a timestamp, that I am trying to take advantage of, but when I apply a function over the field I lose that advantage. Here are some of my results:
select * from p_table where c_stime >= '2013-09-24 00:00:00.0000000'
select * from p_table where c_stime >= current_date::timestamp
select * from p_table where c_stime >= current_date+ interval '0 hour'
The first one runs in 10 seconds, the second runs in 74 seconds, and the third runs in 88 seconds. I want something dynamic like the second or third with performance close to the first. Any help appreciated.
First, check your query plans. There may be surprises there.
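For example, running one of the slow queries under EXPLAIN ANALYZE (using the question's own table and column) will show whether the index on c_stime is being used:
-- executes the query and reports the chosen plan and actual timings
explain analyze
select * from p_table where c_stime >= current_date::timestamp;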
Secondly, if the problem is as you say, I am surprised that this wouldn't have worked. However, you can use the pg_catalog.* functions to convert types, and these are immutable (allowing the function to be run before the query is planned), so you could try:
select * from p_table where c_stime >= pg_catalog.timestamp(current_date);