When I want to query a single partition I usually use something like this:
SELECT * FROM t PARTITION (p1)
But when you have to query it from your PL/SQL code, that means EXECUTE IMMEDIATE and a hard parse of the statement.
Okay, for a RANGE-partitioned table (say the key is SOME_DATE of DATE type) I can work around it like this:
SELECT * FROM t WHERE some_date >= :1 AND some_date < :2
assuming :1 and :2 stand for the partition's lower and upper bounds.
Well, as for a LIST-partitioned table, I can easily specify the exact value of my partition key field:
SELECT * FROM t WHERE part_key = 'X'
And what about HASH partitioning? For example, I have a table partitioned by HASH(id) into 16 partitions, and 16 jobs, each handling its own partition, so I have to query it like this:
SELECT * FROM t PARTITION (p<n>)
The question is: can I do something like this, for example
SELECT * FROM t WHERE hash(id) = :1
to force partition pruning down to the whole n-th partition?
It's fine when you have just 16 partitions, but in my case the table uses composite partitioning (date + hash(id)), so every time a job handles a partition it produces a new sql_id, and that ends up in rapid shared pool growth.
It appears Oracle internally uses the ORA_HASH function (at least since 10g) to assign a row to a hash partition, so you could use it to read all the data from a single partition. Unfortunately, though, since you'd be running a query like
select *
from t
where ora_hash( id, 7 ) = 6
to get all the data in hash bucket 6 of 8 (ORA_HASH's second argument is the maximum bucket number, so 8 buckets means a second argument of 7), I'd expect Oracle to have to read every partition in the table (and compute the hash on every id), because the optimizer isn't going to be smart enough to recognize that your expression happens to map exactly to its internal partitioning strategy. So I don't think you'd want to do this to split data up to be processed by different threads.
Depending on what those threads are doing, would it be possible to use Oracle's built-in parallelism instead (potentially incorporating things like parallelizable pipelined table functions if you're doing ETL processing)? If you tell Oracle to use 16 parallel threads and your table has 16 partitions, Oracle will almost certainly do the right thing internally.
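A minimal sketch of that built-in parallelism approach (the degree of parallelism here is illustrative, not a recommendation):
select /*+ parallel(t, 16) */ *
from t
-- Oracle distributes the scan across 16 parallel execution servers; with a
-- 16-way hash-partitioned table, each server typically works on one partition,
-- with no per-partition SQL and no shared pool growth.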
I have this oracle query that takes around 1 minute to get the results:
SELECT TRUNC(sysdate - data_ricezione) AS delay
FROM notifiche#fe_engine2fe_gateway n
WHERE NVL(n.data_ricezione, TO_DATE('01011900', 'ddmmyyyy')) =
(SELECT NVL(MAX(n2.data_ricezione), TO_DATE('01011900', 'ddmmyyyy'))
FROM notifiche#fe_engine2fe_gateway n2
WHERE n.id_sdi = n2.id_sdi)
--AND sysdate-data_ricezione > 15
Basically I have this table named "notifiche", where each record represents a kind of update to another type of object (invoices). I want to know which invoices have not received any update in the last 15 days. I can do that by self-joining the notifiche table as n2, getting the most recent record for each invoice, and evaluating the difference between the update date (data_ricezione) and the current date (sysdate).
When I add the commented condition, the query takes practically infinite time to complete (I mean hours; I never saw the end of it).
How is it possible that this simple condition makes the query so slow?
How can I improve the performance?
Try to keep data_ricezione alone on one side of the comparison; if there's an index on it, that might help.
So: switch from
and sysdate - data_ricezione > 15
to
and -data_ricezione > 15 - sysdate
(then multiply both sides by -1, flipping the inequality)
to
and data_ricezione < sysdate - 15
As everything is done over the database link, see whether the driving_site hint does any good, i.e.
select /*+ driving_site(n) */ -- "n" is the table's alias
trunc(sysdate-data_ricezione) as delay
from
notifiche#fe_engine2fe_gateway n
...
Use an analytic function to avoid a self-join over a database link. The query below only reads from the table once, divides the rows into windows, finds the MAX value for each window, and lets you select rows based on that maximum. Analytic functions are tricky to understand at first, but they often lead to code that is smaller and more efficient.
select id_sdi, data_ricezione
from
(
    select id_sdi, data_ricezione, max(data_ricezione) over (partition by id_sdi) max_date
    from notifiche#fe_engine2fe_gateway
)
where sysdate - max_date > 15;
As for why adding a simple condition can make the query slow - it's all about cardinality estimates. Cardinality, the number of rows, drives most of the database optimizer's decisions. The best way to join a small amount of data may be very different from the best way to join a large amount of data. Oracle must always guess how many rows an operation returns in order to know which algorithm to use.
Optimizer statistics (metadata about the tables, columns, and indexes) are what Oracle uses to make cardinality estimates. For example, to guess the number of rows filtered out by sysdate - data_ricezione > 15, the optimizer would want to know how many rows are in the table (DBA_TABLES.NUM_ROWS), what the maximum value for the column is (DBA_TAB_COLUMNS.HIGH_VALUE), and maybe a breakdown of how many rows fall into different age ranges (DBA_TAB_HISTOGRAMS).
All of that information depends on optimizer statistics being correctly gathered. If a DBA foolishly disabled automatic optimizer statistics gathering, these problems will happen all the time. But even if your system is using good settings, the predicate you're using may be an especially difficult case. Optimizer statistics aren't free to gather, so by default the system only collects them after about 10% of the data changes. But since your predicate involves SYSDATE, the percentage of matching rows will change every day even if the table doesn't. It may make sense to manually gather stats on this table more often than the default schedule, or use a /*+ dynamic_sampling */ hint, or create a SQL Profile/Plan Baseline, or one of the many other ways to manage optimizer statistics and plan stability. But hopefully none of that will be necessary if you use an analytic function instead of a self-join.
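A minimal sketch of the two manual options just mentioned (schema resolution via USER is illustrative; since this table sits behind a database link, the stats would actually have to be gathered on the remote database):
begin
    -- refresh optimizer statistics on the table by hand
    dbms_stats.gather_table_stats(ownname => user, tabname => 'NOTIFICHE');
end;
/
-- or let the optimizer sample the table at parse time, for this query only
select /*+ dynamic_sampling(n 4) */ trunc(sysdate - data_ricezione) as delay
from notifiche n
where data_ricezione < sysdate - 15;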
I am using partitions in Athena. I have a partition called snapshot, and when I call a query as such:
select * from mytable where snapshot = '2020-06-25'
Then, as expected, only the specified partition is scanned and my query is fast. However, if I use a subquery which returns a single date, it is slow:
select * from mytable where snapshot = (select '2020-06-25')
The above actually scans all partitions, not only the specified date, and results in very poor performance.
My question is: can I use a subquery to specify partitions and increase performance? I need to use a subquery to add some custom logic which returns a date based on some criteria.
Edit:
Trino 356 is able to inline such queries, see https://github.com/trinodb/trino/issues/4231#issuecomment-845733371
Older answer:
Presto still does not inline a trivial subquery like (select '2020-06-25').
This is tracked by https://github.com/trinodb/trino/issues/4231.
Thus, you should not expect Athena to inline it either, as it's based on Presto 0.172.
I need to use a subquery to add some custom logic which returns a date based on some criteria.
If your query is going to be more sophisticated than a constant expression, it will not be inlined anyway. If snapshot is a partition key, then you could leverage a recently added feature -- dynamic partition pruning. Read more at https://trino.io/blog/2020/06/14/dynamic-partition-pruning.html.
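A hypothetical shape of query that dynamic partition pruning can speed up (dates_to_process is an illustrative name): the snapshot values come from a small driving table, and the engine prunes mytable's partitions at runtime instead of scanning them all.
SELECT m.*
FROM mytable m
JOIN dates_to_process d ON m.snapshot = d.snapshot;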
This of course assumes you can choose Presto version.
If you are constrained to Athena, your only option is to evaluate the subquery separately, outside of the main query, and pass its result back into the main query as a constant (e.g. a literal).
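A minimal sketch of that two-step workaround, reusing the table from the example above:
-- step 1: resolve the custom logic to a single value
SELECT max(snapshot) FROM mytable;  -- suppose this returns '2020-06-25'
-- step 2: inline the result as a literal so Athena can prune partitions
SELECT * FROM mytable WHERE snapshot = '2020-06-25';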
Athena engine version 2, released in late 2020, seems to have improved predicate pushdown handling to support subqueries.
Here is their related statement from https://docs.aws.amazon.com/athena/latest/ug/engine-versions-reference.html#engine-versions-reference-0002
Predicate inference and pushdown – Predicate inference and pushdown extended for queries that use a <symbol> IN <subquery> predicate.
My test with our own Athena table indicates this is indeed the case. My test query is roughly as below
SELECT *
FROM table_partitioned_by_scrape_date
WHERE scrape_date = (
SELECT max(scrape_date)
FROM table_partitioned_by_scrape_date
)
From the bytes scanned by the query, I can tell Athena indeed only scanned the partition with the latest scrape_date.
Moreover, I also tested predicate pushdown in a JOIN clause, where the joined-to value is the result of another query. Even though this is not mentioned in the release notes, Athena 2.0 apparently is now smart enough to support this scenario too, and only scans the latest scrape_date partition. I had tested a similar query in Athena 1.0 before, and it scanned all the partitions instead. My test query is as below:
WITH l as (
SELECT max(scrape_date) as latest_scrape_date
FROM table_partitioned_by_scrape_date
)
SELECT deckard_id
FROM table_partitioned_by_scrape_date as t
JOIN l ON t.scrape_date = l.latest_scrape_date
I have a table Partitioned by:
HASH(timestamp DIV 43200 )
When I perform this query
SELECT max(id)
FROM messages
WHERE timestamp BETWEEN 1581708508 AND 1581708807
it scans all partitions, even though 1581708508, 1581708807, and every number between them fall in the same partition. How can I make it scan only that partition?
You have discovered one of the reasons why PARTITION BY HASH is useless.
In your situation, the Optimizer sees the range (BETWEEN) and says "punt, I'll just scan all the partitions".
That is, "partition pruning" does not work when the WHERE clause involves a range and you are using PARTITION BY HASH. PARTITION BY RANGE, on the other hand, may be able to prune. But... What's the advantage? It does not make the query any faster.
I have found only four uses for partitioning: http://mysql.rjweb.org/doc.php/partitionmaint . It sounds like your application does not fit any of those cases.
That particular query would best be done without partitioning. Instead have a non-partitioned table with this 'composite' index:
INDEX(timestamp, id)
It must scan all the rows in the timestamp range to discover the MAX(id), but with this index, it is
Scanning only the 2-column index
Not touching any rows outside the timestamp range.
Hence it will be as fast as possible. Even if PARTITION BY HASH were smart enough to do the desired pruning, it would not run any faster.
In particular, when you ask for a range on the partition key, such as with WHERE timestamp BETWEEN 1581708508 AND 1581708807, execution looks in all partitions for the desired rows. This is one of the major failings of hash partitioning. Even if it could realize that only one partition is needed, it would be no faster than simply using the index I suggest.
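A minimal sketch of the suggested setup (the index name is illustrative), assuming a non-partitioned copy of the table:
ALTER TABLE messages ADD INDEX idx_ts_id (timestamp, id);
-- the range scan now stays entirely inside the 2-column index
SELECT MAX(id)
FROM messages
WHERE timestamp BETWEEN 1581708508 AND 1581708807;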
You can determine that individual partition by using modular arithmetic:
MOD(<expression used as the argument of the HASH() function>, <number of partitions>)
assuming you have 2 partitions
CREATE TABLE messages(ID int, timestamp int)
PARTITION BY HASH( timestamp DIV 43200 )
PARTITIONS 2;
look up partition names by
SELECT CONCAT( 'p',MOD(timestamp DIV 43200,2)) AS partition_name, timestamp
FROM messages;
and determine the related partition name for the value 1581708508 of the timestamp column (assume p1). Then use
SELECT MAX(id)
FROM messages PARTITION(p1)
to get all the records in partition p1 only, without needing a WHERE condition such as
WHERE timestamp BETWEEN 1581708508 AND 1581708807
Btw, all partitions can be listed through
SELECT *
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE table_name='messages'
Demo
The below query scans 100 MB of data.
select * from table where column1 = 'val' and partition_id = '20190309';
However the below query scans 15 GB of data (there are over 90 partitions)
select * from table where column1 = 'val' and partition_id in (select max(partition_id) from table);
How can I optimize the second query to scan the same amount of data as the first?
There are two problems here: the efficiency of the scalar subquery above, select max(partition_id) from table, and the one #PiotrFindeisen pointed out around dynamic filtering.
The first problem is that queries over the partition keys of a Hive table are a lot more complex than they appear. Most folks would think that if you want the max value of a partition key, you can simply execute a query over the partition keys, but that doesn't work because Hive allows partitions to be empty (and it also allows non-empty files that contain no rows). Specifically, the scalar subquery select max(partition_id) from table requires Trino (formerly PrestoSQL) to find the max partition containing at least one row. The ideal solution would be to have perfect stats in Hive; short of that, the engine would need custom logic for Hive that opens the files of the partitions until it finds a non-empty one.
If you are sure that your warehouse does not contain empty partitions (or if you are OK with the implications of that), you can replace the scalar subquery with one over the hidden "table$partitions" table:
select *
from table
where column1 = 'val' and
partition_id = (select max(partition_id) from "table$partitions");
The second problem is the one #PiotrFindeisen pointed out, and it has to do with the way queries are planned and executed. Most people would look at the above query and expect the engine to figure out the value of select max(partition_id) from "table$partitions" during planning, inline it into the plan, and then continue with optimization. Unfortunately, that is a pretty complex decision to make generically, so the engine instead simply models this as a broadcast join, where one part of the execution figures out the value and broadcasts it to the rest of the workers. The problem is that the rest of the execution has no way to feed this new information into the existing processing, so it simply scans all of the data and then filters out the values you are trying to skip. There is a project in progress to add this dynamic filtering, but it is not complete yet.
This means the best you can do today is to run two separate queries: one to get the max partition_id, and a second one with the inlined value.
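A minimal sketch of that two-query approach, reusing the names from the example above (the inlined value is whatever the first query returns):
-- query 1: find the max partition
select max(partition_id) from "table$partitions";
-- query 2: inline the result (say '20190309') as a literal
select * from table where column1 = 'val' and partition_id = '20190309';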
BTW, the hidden "$partitions" table was added in Presto 0.199, and we fixed some minor bugs in 0.201. I'm not sure which version Athena is based on, but I believe it is pretty far out of date (the current release at the time I'm writing this answer is 309).
EDIT: Presto removed the __internal_partitions__ table in their 0.193 release, so I'd suggest not using the solution described in the "Slow aggregation queries for partition keys" section below in any production systems, since Athena 'transparently' updates Presto versions. I ended up just going with the naive SELECT max(partition_date) ... query, but also using the same lookback trick outlined in the "Lack of Dynamic Filtering" section. It's about 3x slower than using the __internal_partitions__ table, but at least it won't break when Athena decides to update their Presto version.
----- Original Post -----
So I've come up with a fairly hacky way to accomplish this for date-based partitions on large datasets, for cases where you only need to look back over a few partitions' worth of data for a match on the max. Please note, however, that I'm not 100% sure how brittle the use of the information_schema.__internal_partitions__ table is.
As #Dain noted above, there are really two issues: the first is how slow an aggregation such as the max(partition_date) query is, and the second is Presto's lack of support for dynamic filtering.
Slow aggregation queries for partition keys
To solve the first issue, I'm using the information_schema.__internal_partitions__ table which allows me to get quick aggregations on the partitions of a table without scanning the data inside the files. (Note that partition_value, partition_key, and partition_number in the below queries are all column names of the __internal_partitions__ table and not related to your table's columns)
If you only have a single partition key for your table, you can do something like:
SELECT max(partition_value) FROM information_schema.__internal_partitions__
WHERE table_schema = 'DATABASE_NAME' AND table_name = 'TABLE_NAME'
But if you have multiple partition keys, you'll need something more like this:
SELECT max(partition_date) as latest_partition_date from (
    SELECT
        max(case when partition_key = 'partition_date' then partition_value end) as partition_date,
        max(case when partition_key = 'another_partition_key' then partition_value end) as another_partition_key
    FROM information_schema.__internal_partitions__
    WHERE table_schema = 'DATABASE_NAME' AND table_name = 'TABLE_NAME'
    GROUP BY partition_number
)
WHERE
    -- ... filter down by values for e.g. another_partition_key
These queries should run fairly quickly (mine run in about 1-2 seconds) without scanning through the actual data in the files, but again, I'm not sure if there are any gotchas with using this approach.
Lack of Dynamic Filtering
I'm able to mitigate the worst effects of the second problem for my specific use case because I expect there to always be a partition within a finite amount of time back from the current date (e.g. I can guarantee any data-production or partition-loading issues will be remedied within 3 days). It turns out that Athena does do some pre-processing when using Presto's datetime functions, so this does not have the same dynamic-filtering issues as using a subquery.
So you can change your query to limit how far it will look back for the actual max using the datetime functions so that the amount of data scanned will be limited.
SELECT * FROM "DATABASE_NAME"."TABLE_NAME"
WHERE partition_date >= cast(date '2019-06-25' - interval '3' day as varchar) -- Will only scan partitions from 3 days before '2019-06-25'
AND partition_date = (
-- Insert the partition aggregation query from above here
)
I don't know if it is still relevant, but I just found out:
Instead of:
select * from table where column1 = 'val' and partition_id in (select max(partition_id) from table);
Use:
select a.* from table a
inner join (select max(partition_id) max_id from table) b on a.partition_id=b.max_id
where column1 = 'val';
I think it has something to do with how joins are optimized to make use of partitions.
I need to run a process on all the records in a table. The table could be very big, so I'd rather process the records page by page. I need to remember the records that have already been processed, so they are not included in my second SELECT's results.
Like this:
For first run,
[SELECT 100 records FROM MyTable]
For second run,
[SELECT another 100 records FROM MyTable]
and so on..
I hope you get the picture. My question is: how do I write such a SELECT statement?
I'm using Oracle, btw, but it would be nice if it could run on other databases too.
I also don't want to use a stored procedure.
Thank you very much!
Any solution you come up with to break the table into smaller chunks will end up taking more time than just processing everything in one go - unless the table is partitioned and you can process exactly one partition at a time.
If a full table scan takes 1 minute, it will take you 10 minutes to break up the table into 10 pieces. If the table rows are physically ordered by the values of an indexed column that you can use, this changes a bit due to the clustering factor, but it will still take longer than just processing it in one go.
This all depends on how long it takes to process one row from the table, of course. You could choose to reduce the load on the server by processing chunks of data, but from a performance perspective, you cannot beat a full table scan.
You are most likely going to want to take advantage of Oracle's stopkey optimization, so you don't end up with a full table scan when you don't want one. There are a couple of ways to do this. The first way is a little longer to write, but lets Oracle automatically figure out the number of rows involved:
select *
from
(
    select rownum rn, v1.*
    from (
        select *
        from table t
        where filter_columns = 'where clause'
        order by columns_to_order_by
    ) v1
    where rownum <= 200
)
where rn >= 101;
You could also achieve the same thing with the FIRST_ROWS hint:
select /*+ FIRST_ROWS(200) */ *
from (
    select rownum rn, v1.*
    from (
        select *
        from table t
        where filter_columns = 'where clause'
        order by columns_to_order_by
    ) v1  -- rownum must be assigned after the sort, hence the nested view
)
where rn between 101 and 200;
I much prefer the rownum method, so you don't have to keep changing the value in the hint (which, to be accurate, would need to represent the end row and not the number of rows actually returned to the page). With the rownum method you can set up the start and end values as bind variables, so you avoid hard parsing.
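A minimal sketch of the rownum method with bind variables (:start_row, :end_row, and the table and sort column are illustrative names): each page reuses the same cursor, so there is only one hard parse.
select *
from
(
    select rownum rn, v1.*
    from (
        select *
        from my_table          -- hypothetical table
        order by id            -- a deterministic order is needed for stable paging
    ) v1
    where rownum <= :end_row   -- stopkey: Oracle stops fetching past this row
)
where rn >= :start_row;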
For more details, you can check out this post