Nondeterministic functions in SQL partitioning functions

How are non-deterministic functions used in SQL partitioning functions and are they useful?

MsSql allows non-deterministic functions in partitioning functions:
CREATE PARTITION FUNCTION MyArchive(datetime)
AS RANGE LEFT FOR VALUES (GETDATE() - 10)
GO
Does that mean that records older than 10 days are automatically moved to the archive (first) partition? Of course not.
The database stores the date computed when the partition function was set up and uses it in the most logical way.
Let's say one sets up the above scheme on 2000-01-11, which makes the delimiting date 2000-01-01.
When you query for data with a date lower than the initial delimiting date (boundary_value, 2000-01-01), only the archive partition is used.
When you query for data with a date higher than the current day minus 10 days (GETDATE() - 10), only the current partition is used.
All other queries use both partitions, i.e. queries for data with a date lower than the current date minus 10 days but higher than the delimiting date (2000-01-01).
This means that with each passing day, the range of dates for which both partitions are used grows. You would have been better off setting the partition boundary to the delimiting date deterministically.
I don't foresee any scenario where this is useful.
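As a rough sketch of the deterministic alternative: the function can be declared with a constant boundary, and the catalog views show which boundary value SQL Server stored for each function. MyArchiveFixed is a made-up name, and the literal assumes the 2000-01-01 boundary from the example above.

```sql
-- Assumption: same two-partition layout as MyArchive, boundary fixed at 2000-01-01.
CREATE PARTITION FUNCTION MyArchiveFixed (datetime)
AS RANGE LEFT FOR VALUES ('20000101');
GO

-- List the boundary values stored for both partition functions.
SELECT pf.name, prv.boundary_id, prv.value
FROM sys.partition_functions AS pf
JOIN sys.partition_range_values AS prv
    ON prv.function_id = pf.function_id
WHERE pf.name IN (N'MyArchive', N'MyArchiveFixed');
```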

Related

Timeseries data query - optimizing query performance

Quick question on optimizing a query type we do a lot in working with time-series data provided by a data logging system.
Database is SQL Server 2019 (v15) and for simplification assume the table is made up of just:
ID (bigint) - unique ID for the row
Timestamp (bigint) - Unix timestamp value.
Sample (float) - Value of sample taken (e.g. temperature measurement).
There is no regular interval or spacing between timestamps, because the data logger only logs a sample when the monitored data point changes (i.e. there is no reliable way to determine when a previous sample would have been taken).
Our queries often select a range of data between two timestamps, but as expected the timestamps chosen as the bounds of the range rarely line up exactly with a timestamp in the data set. Because of this, what we really need to select is all the data in the range plus the one record immediately before the range (so we know what the data value is leading into the selected range).
Historically we have done this one of two ways:
Select the rows between the timestamps (inclusive) and union this with a top(1) select of the latest row with a timestamp <= the range start.
OR
Select the top(1) timestamp <= the range start into a variable and then run a select statement with this timestamp as the lower bound of the range.
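For reference, the two approaches could be written roughly as follows in T-SQL. The table name dbo.Samples and the @start/@end values are placeholders, not the real schema.

```sql
DECLARE @start bigint = 1700000000;  -- placeholder range start (Unix time)
DECLARE @end   bigint = 1700003600;  -- placeholder range end (Unix time)

-- Option 1: rows in the range, plus the latest row at or before the range start.
-- UNION (not UNION ALL) removes the duplicate when a row sits exactly on @start.
SELECT ID, [Timestamp], Sample
FROM dbo.Samples
WHERE [Timestamp] BETWEEN @start AND @end
UNION
SELECT ID, [Timestamp], Sample
FROM (
    SELECT TOP (1) ID, [Timestamp], Sample
    FROM dbo.Samples
    WHERE [Timestamp] <= @start
    ORDER BY [Timestamp] DESC
) AS prev
ORDER BY [Timestamp];

-- Option 2: find the preceding timestamp first, then use it as the lower bound.
DECLARE @lower bigint;

SELECT TOP (1) @lower = [Timestamp]
FROM dbo.Samples
WHERE [Timestamp] <= @start
ORDER BY [Timestamp] DESC;

-- COALESCE covers the case where no row exists at or before @start.
SELECT ID, [Timestamp], Sample
FROM dbo.Samples
WHERE [Timestamp] BETWEEN COALESCE(@lower, @start) AND @end
ORDER BY [Timestamp];
```

Both forms look up the preceding row with a descending TOP (1) search, so either one presumably depends on an index on Timestamp to do that lookup cheaply.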
Since I am not an expert, I'm wondering if either one of these methods has better performance over the other or if there is maybe some better, third option we haven't encountered.
Thanks!

Incorrect parameter count in the call to native function 'DATEDIFF' in sql

What command should I use to produce output for this problem:
Display all values of customers who have been members for more than 700 days as of today
This is the table that I have created:
table Customers
I've tried other references using DATEDIFF(), but it's always invalid:
SELECT * FROM Customers where DATEDIFF(DAY,customer_join,GETDATE())>700;
In MySQL/MariaDB, as opposed to SQL Server, DATEDIFF() takes just two arguments and returns an integer number of days between them. The three-argument function is TIMESTAMPDIFF().
Also, GETDATE() does not exist in MySQL (it is a SQL Server-specific function).
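For illustration, here is how the original condition might look with each of those MySQL functions, using the Customers table and customer_join column from the question:

```sql
-- DATEDIFF(later, earlier): two arguments, result in whole days.
SELECT *
FROM Customers
WHERE DATEDIFF(CURRENT_DATE, customer_join) > 700;

-- TIMESTAMPDIFF(unit, earlier, later): three arguments, the unit comes first.
SELECT *
FROM Customers
WHERE TIMESTAMPDIFF(DAY, customer_join, NOW()) > 700;
```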
You don't really need date functions here. I would phrase this logic using simple date arithmetic:
select *
from customers
where customer_join < current_date - interval 700 day
This expression can take advantage of an index on customer_join.
Depending on whether you want to take into account the time portion of customer_join (if it has one), you might want to use now() instead of current_date.
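A sketch of the now() variant, together with an index that the comparison could use. The index name is hypothetical.

```sql
-- Hypothetical index name; supports the range predicate on customer_join.
CREATE INDEX idx_customers_join ON Customers (customer_join);

-- Variant that also takes the time of day into account.
SELECT *
FROM Customers
WHERE customer_join < NOW() - INTERVAL 700 DAY;
```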

Using Hive, how to query data that is split across multiple partitions?

From a table partitioned on a date field (a new partition is generated every day), I need to extract records from the last three months. This means that I need to query every partition from the last three months, using something like "where date < today's date and date >= today - 90 days".
I think that this query would not be very efficient.
Is there a better way of accessing data that is spread across multiple partitions?
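The filter described above might be written like this in HiveQL. The table name events and the partition column name dt are assumptions, as is the yyyy-MM-dd format of the partition values.

```sql
-- Assumptions: table `events` is partitioned by `dt` (hypothetical names),
-- and dt holds dates in yyyy-MM-dd form.
SELECT *
FROM events
WHERE dt >= date_sub(current_date, 90)
  AND dt <  current_date;
```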

Store date range in a single column in Oracle SQL

Here trip 1 involves two activity codes in a single day and also concludes in a single day. Most other activities also take just a single day, but I have one trip that spans more than one day.
What would be the best way to store a date range in that column for trips that span more than one day?
Splitting the column into separate begin date and end date columns doesn't seem to make sense, as there would be many blank columns.
trip_id(pk,fk)  Activity_code(pk,fk)  date
1               a1                    1st October 2015
1               a2                    1st October 2015
2               a3                    2nd-5th October 2015
Keep in mind that I need to search for activity codes by month, e.g. list all the activity codes that occur in October.
Is it possible to insert a range of dates in a single column, or is there another design solution?
Is there any datatype that can represent a date range as a single value?
PS: oracle 11g e
Store the date ranges as FirstDate/LastDate or FirstDate/Duration.
This allows you to store the values in the native format for dates. Storing dates as strings is a bad, bad idea, because strings don't have all the built-in functionality provided for native date types.
Don't worry about the additional storage for a second date or duration. In fact, the two columns together are probably smaller than storing the value as a string.
Splitting the date into start date and end date would be ideal. Storing dates as strings is not recommended. If you store your dates as strings then there is a possibility of malformed data being stored in the column since a VARCHAR2 column will allow any value. You will have to build strong validations in your script while inserting the data which is unnecessary.
Secondly, you will not be able to perform simple operations like calculating the duration/length of the trip easily if both the start_date and end_date are stored in the same column. If they are stored in different columns it would be as simple as
SELECT trip_id, activity_code, end_date - start_date FROM trips;
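As a minimal sketch of the two-column design in Oracle: the column sizes and constraint names below are assumptions, and the rows mirror the sample data from the question. The final query finds activity codes whose date range overlaps October 2015.

```sql
-- Assumed column types and constraint names; trips is the table name used above.
CREATE TABLE trips (
    trip_id       NUMBER       NOT NULL,
    activity_code VARCHAR2(10) NOT NULL,
    start_date    DATE         NOT NULL,
    end_date      DATE         NOT NULL,
    CONSTRAINT trips_pk PRIMARY KEY (trip_id, activity_code),
    CONSTRAINT trips_dates_ck CHECK (end_date >= start_date)
);

-- Single-day activities simply store the same date twice.
INSERT INTO trips VALUES (1, 'a1', DATE '2015-10-01', DATE '2015-10-01');
INSERT INTO trips VALUES (1, 'a2', DATE '2015-10-01', DATE '2015-10-01');
INSERT INTO trips VALUES (2, 'a3', DATE '2015-10-02', DATE '2015-10-05');

-- Activity codes that occur in October 2015: the range starts before
-- 1 November and ends on or after 1 October.
SELECT DISTINCT activity_code
FROM trips
WHERE start_date <  DATE '2015-11-01'
  AND end_date   >= DATE '2015-10-01';
```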

Comparing values of type DATE - Oracle

Is there any way of comparing two date values to check if one is before the other?
For example, how do I know which came first in the following rows?
SEQ CREATION_DTM
--------------------
234 2011-03-26 22:59:03
235 2011-03-26 22:59:03
The column for the above data is declared as datatype DATE. Having read around, it appears that the DATE datatype does not store milliseconds. Does this mean I can't compare the above two dates to find out which one is before the other?
EDIT
I am using Oracle 10G on Solaris.
DATE precision only goes to the nearest second, so if you have two dates that are the same to that precision then you can't distinguish between or order them. To get any more precision you'd need to store them as TIMESTAMP.
In the more general case where the dates do differ, you can compare and order them much like numbers. When two are the same the results are uncertain; in your case, if you ordered by CREATION_DTM then you couldn't reliably predict whether the results would be ordered as 234,235 or 235,234. You would need to determine a way to break the tie, as Justin has suggested.
A DATE only stores up to the second. So if two rows are inserted in the same second, you can't determine which came first based on the CREATION_DTM column. If you want that level of resolution, you'd be better served with a TIMESTAMP [WITH [LOCAL] TIME ZONE] column which will store the time component up to 9 decimal digits if the host operating system provides that level of granularity (most Unix systems will provide microsecond resolution).
In your case, assuming that you're not using RAC and that you are using an Oracle sequence to populate the SEQ column, you could use that column to break the tie. If the two rows were inserted in different transactions, haven't been updated, and the table was built with ROWDEPENDENCIES, you could also potentially use the ORA_ROWSCN to break the tie.
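A rough sketch of both suggestions in Oracle SQL. The table name my_table and the new column name creation_ts are made up for illustration.

```sql
-- Hypothetical table name; SEQ and CREATION_DTM are the columns from the question.
SELECT seq, creation_dtm
FROM my_table
ORDER BY creation_dtm, seq;  -- seq breaks ties within the same second

-- Sub-second precision needs a TIMESTAMP column (illustrative column name).
ALTER TABLE my_table ADD (creation_ts TIMESTAMP(6) DEFAULT SYSTIMESTAMP);
```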
It seems the TIMESTAMP data type would be appropriate for your query.
Thanks