I am about to test the deterministic flag for SUDFs (scalar user-defined functions) that return multiple values (a follow-up question to this). The DETERMINISTIC flag should cache the results for the same inputs to improve performance. However, I can't figure out how to make this work for multiple return values. My SUDF looks as follows:
CREATE FUNCTION DET_TEST(IN col BIGINT)
RETURNS a int, b int, c int, d int DETERMINISTIC
AS BEGIN
    a = 1;
    b = 2;
    c = 3;
    d = 4;
END;
Now when I execute the following select statements:
1) select DET_TEST(XL_ID).a from XL;
2) select DET_TEST(XL_ID).a, DET_TEST(XL_ID).b from XL;
3) select DET_TEST(XL_ID).a, DET_TEST(XL_ID).b,
DET_TEST(XL_ID).c, DET_TEST(XL_ID).d from XL;
I get the corresponding server processing times:
1) Statement 'select DET_TEST(XL_ID).a from XL'
successfully executed in 1.791 seconds (server processing time: 1.671 seconds)
2) Statement 'select DET_TEST(XL_ID).a, DET_TEST(XL_ID).b from XL'
successfully executed in 2.415 seconds (server processing time: 2.298 seconds)
3) Statement 'select DET_TEST(XL_ID).a, DET_TEST(XL_ID).b, DET_TEST(XL_ID).c, ...'
successfully executed in 4.884 seconds (server processing time: 4.674 seconds)
As you can see, the processing time increases even though I call the function with the same input. So is this a bug, or is it possible that only a single value is stored in the cache rather than the whole list of return parameters?
I will try out MAP_MERGE next.
I did some tests with your scenario and can confirm that the response time goes up considerably with every additional result parameter retrieved from the function.
The DETERMINISTIC flag helps here, but not as much as one would hope, since only the result values for distinct input parameters are saved.
So, if the same value(s) are entered into the function and it has been executed with these value(s) before, then the result is taken from a cache.
This cache, however, is only valid for the duration of a statement. That means: for repeated function evaluations with the same value during a single statement, the DETERMINISTIC function can skip the evaluation and reuse the result.
This doesn't mean that all output parameters get evaluated once and are then available for reuse. Indeed, with different output parameters, HANA practically has to execute different evaluation graphs. In that sense, asking for different parameters is closer to executing different functions than, say, calling a single matrix operation.
So, sorry about raising hopes for a massive improvement with DETERMINISTIC functions in the other thread. At least for your use case, it doesn't really help a lot.
Concerning the MAP_MERGE function, it's important to see that it really helps with horizontal partitioning of data, as one would have in, e.g., classic map-reduce situations.
The use case you presented doesn't actually do that; instead, it tries to create multiple results for a single input.
During my tests, I actually found it quicker to just define four independent functions and call those in my SELECT statement against my source table.
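A minimal sketch of that workaround, reusing the DET_TEST example from above (the constant bodies are placeholders, just as in the original):

CREATE FUNCTION DET_TEST_A(IN col BIGINT)
RETURNS a int DETERMINISTIC
AS BEGIN
    a = 1;
END;

-- create DET_TEST_B, DET_TEST_C and DET_TEST_D the same way, then:
select DET_TEST_A(XL_ID), DET_TEST_B(XL_ID),
       DET_TEST_C(XL_ID), DET_TEST_D(XL_ID) from XL;

Each function now produces exactly one output, so every projected column corresponds to one small evaluation graph instead of repeated evaluations of the four-output function.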
Depending on the complexity of the calculations you'd like to do and the amount of data, I would probably look into using the Application Function Library (AFL) SDK for SAP HANA. For details on this, one has to check the relevant SAP notes.
This is code I'm dealing with:
declare
    --some cursors here
begin
    if some_condition = 'N' then
        raise form_trigger_failure;
    end if;
    --some fetches here
end;
It's from a POST-QUERY trigger. Basically, my problem is that my block in Oracle Forms returns 20k rows, and the POST-QUERY trigger fires for each of them. Execution takes a couple of minutes, and I want to speed it up to a couple of seconds. The data is validated in some_condition (it is a function, but it returns a value really fast). If the condition isn't met, form_trigger_failure is raised. Is there any way to speed up this validation without changing the logic? (The same number of rows should be returned; this validation is important.)
I've tried to change block properties, but it didn't help me.
Also, when I deleted the whole IF statement, the data was returned really fast, but it wasn't validated, and rows were returned that shouldn't be visible.
Data is validated in some_condition ...
That's OK; but why would you perform validation in a POST-QUERY trigger? It fetches data that already exists in the database, so it must be valid. Otherwise, why did you store it in the first place?
POST-QUERY should be used to populate non-database items.
Validation should be handled in WHEN-VALIDATE-ITEM or WHEN-VALIDATE-RECORD triggers, not in POST-QUERY.
I suggest you split those two actions. If certain parts of code can/should be shared between those two types of triggers, put it into a procedure (within a form) and call it when appropriate.
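For example, a minimal sketch (the procedure name and message text are made up; some_condition is taken from your snippet):

PROCEDURE check_record IS
BEGIN
    if some_condition = 'N' then
        message('Record failed validation.');
        raise form_trigger_failure;
    end if;
END;

The WHEN-VALIDATE-RECORD trigger body then simply calls check_record;, and any other trigger that needs the same check can reuse it.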
By the way, POST-QUERY won't fire for all 20K rows (unless you're buffering that many rows, and if you are, you shouldn't be).
Moreover, you say that the function returns the result really fast. Probably, if it runs for a single row. Let it run on 200, 2000, or 20000 rows:
select your_function(some_parameters)
from that_table
where rownum < 2; --> change 2 to 200 to 2000 to 20000 and see what happens
On the other hand, what's the purpose of fetching 20,000 rows? Who's going to review them all? Are you sure this is the way you should be doing it? If so, consider switching to a stored procedure: let it perform those validations within the database, and let the form fetch "clean" data.
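As an illustration of that last point (the table, column, and function names are made up), the form's block could be based on a view that filters in the database:

create or replace view v_valid_rows as
select t.*
  from your_table t
 where your_validation_fn(t.id) = 'Y';

That way the validation runs as part of a single SQL statement in the database, instead of once per row fetched into the form.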
I'm refactoring part of a humongous database package in Oracle PL/SQL, and there are many selects with very similar where-statements.
An example (made up; in reality there are 20 base comparisons, plus another 5 or so depending on the query):
-- Query 1
select * into r_data from data d
where
    d.tm = time and
    d.per = period and
    d.mm = mm and
    d.br = br and
    d.ty = ty;
-- Query 2
select * into r_data from data d
where
    d.tm = time and
    d.per = period and
    d.mm = mm and
    d.br = br and
    d.mat = mat;
As you can see, tm, per, mm and br are compared in both cases, so I thought this would be a smart solution:
-- A function for comparing rows
function are_similar(row1 in data%rowtype, row2 in data%rowtype)
return number is
begin
    if row1.tm = row2.tm and
       row1.per = row2.per and
       row1.mm = row2.mm and
       row1.br = row2.br then
        return 1;
    else
        return 0;
    end if;
end are_similar;
-- Query 1 (v_row is data%rowtype)
select * into r_data from data d
where
    are_similar(d, v_row) = 1 and
    d.ty = v_row.ty;
-- Query 2 (v_row is data%rowtype)
select * into r_data from data d
where
    are_similar(d, v_row) = 1 and
    d.mat = v_row.mat;
But I get:
Error(xxx,xxx): PL/SQL: ORA-00904: "D": invalid identifier
I've tried googling for how to get the "row" out of the row identifier (i.e. d), but I cannot find anything. I've also found "How to pass an entire row (in SQL, not PL/SQL) to a stored function?", which states that what I'm trying to do might be impossible. Is it? Or is there another way of achieving the same thing, i.e. getting rid of the where-clause duplication? The code is really ugly and a hassle to maintain.
I know creating a new view with arguments would solve part of the issue, but if possible I would really like to keep the solution internal to the package I'm working with.
Anyone familiar with OO techniques can see what you're trying to do: you've identified common code and are trying to refactor it into a separate module.
You're working in a different environment when you're working in SQL. What is considered clever in other languages is, well, not so very clever in SQL. And vice versa if it makes you feel any better. In languages such as Java, C#, C++ or any other language specifically designed for the OO environment, we can lean more heavily toward maintainability rather than performance because the cost is so low.
Not so in SQL. Everything takes at least ten times longer to perform in SQL than in any other language. Reworking a query to have it call a function where it did not before will noticeably decrease the responsiveness of the query. It can turn a 5 msec query into a 45 sec query, or even worse. Even a 5 sec query is simply not acceptable.
One thing that you have to be aware of in SQL, but not in other languages, is context switching. This is where you go from SQL to the wrapper language that vendors place around their system's SQL: PL/SQL in Oracle, Transact-SQL in SQL Server. Every system has one. A query is SQL; that is one context. The body of a stored procedure is the wrapper language; that is another context. Calling a stored procedure involves more than just going from executing code over here to executing code over there. Switching contexts back and forth can be very time consuming. The details differ between systems, so you should become familiar with your system's specifics.
Another difference is that other languages are procedural in nature: you identify what they have to do, then define step by step how to do it. In SQL, you identify what data you want; while there are ways to exert some influence, by and large the underlying system determines how to go about producing it.
There are many techniques for writing good, responsive SQL code. Rewriting a query to call a stored procedure for every row is not one of them.
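If the goal is just to remove the where-clause duplication, one approach that stays inside SQL and inside the package (a sketch only, using the made-up columns from the question) is a single parameterized cursor holding the shared predicates:

cursor c_data(p_tm   data.tm%type,
              p_per  data.per%type,
              p_mm   data.mm%type,
              p_br   data.br%type,
              p_ty   data.ty%type  default null,
              p_mat  data.mat%type default null) is
    select *
      from data d
     where d.tm = p_tm
       and d.per = p_per
       and d.mm = p_mm
       and d.br = p_br
       and (p_ty  is null or d.ty  = p_ty)
       and (p_mat is null or d.mat = p_mat);

The common comparisons live in one place, each caller passes only the extra value it cares about, and the optimizer sees plain predicates instead of a PL/SQL function call per row.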
I need to get a large amount of data from a remote database. The idea is to do a sort of pagination, like this:
1) Select the first block of data:
SELECT * FROM TABLE LIMIT 1,10000
2) Process that block:
while (mysql_fetch_array()...) {
    //do something
}
3) Get the next block,
and so on.
Assuming 10000 is an acceptable block size for my system, let's suppose I have 30000 records to get: I perform 3 calls to the remote system.
But my question is: when executing a SELECT, is the result set transmitted and stored somewhere locally, so that each fetch is local? Or is the result set kept on the remote system, with records coming one by one on each fetch? Because if the real scenario is the second one, I don't perform 3 calls but 30000 calls, and that is not what I want.
I hope I explained myself; thanks for the help.
bye
First, it's highly recommended to use MySQLi or PDO instead of the deprecated mysql_* functions:
http://php.net/manual/en/mysqlinfo.api.choosing.php
By default with the mysql and mysqli extensions, the entire result set is loaded into PHP's memory when executing the query, but this can be changed to load results on demand as rows are retrieved if needed or desired.
mysql
mysql_query() buffers the entire result set in PHP's memory
mysql_unbuffered_query() only retrieves data from the database as rows are requested
mysqli
mysqli::query()
The $resultmode parameter determines behaviour.
The default value of MYSQLI_STORE_RESULT causes the entire result set to be transferred to PHP's memory, while MYSQLI_USE_RESULT causes the rows to be retrieved as requested.
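For example, a minimal unbuffered sketch (host, credentials, and table name are made up):

$mysqli = new mysqli('remote-host', 'user', 'password', 'mydb');
$result = $mysqli->query('SELECT * FROM big_table', MYSQLI_USE_RESULT);
while ($row = $result->fetch_assoc()) {
    // each row is pulled from the server as it is requested
}
$result->free(); // the result must be freed before issuing another query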
With PDO's MySQL driver, queries are also buffered by default; rows are loaded on demand via PDO::fetch() only if the PDO::MYSQL_ATTR_USE_BUFFERED_QUERY attribute is set to false, which applies to both PDO::query() and prepared statements.
To retrieve all data from the result set into a PHP array in one go, you can use PDO::fetchAll().
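A corresponding PDO sketch (again, connection details and table name are made up):

$pdo = new PDO('mysql:host=remote-host;dbname=mydb', 'user', 'password');
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
$stmt = $pdo->query('SELECT * FROM big_table');
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // rows stream from the server instead of being buffered up front
}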
It's probably best to stick with the default behaviour and benchmark any changes to determine whether they actually have a positive effect; the overhead of transferring results individually may be minor, and other factors may matter more in determining the optimal method.
You would be performing 3 calls, not 30,000. That's for sure.
Each batch of 10,000 results is produced on the server (one batch per query). Your while loop iterates through a set of data that has already been returned by MySQL (that's why you don't end up with 30,000 queries).
That is assuming you would have something like this:
$res = mysql_query(...);
while ($row = mysql_fetch_array($res)) {
    //do something with $row
}
Anything you do inside the while loop by making use of $row has to do with already-fetched data from your initial query.
Hope this answers your question.
According to the documentation here, all the data is fetched from the server first, and then you go through it.
from the page:
Returns an array of strings that corresponds to the fetched row, or FALSE if there are no more rows.
In addition, it seems this function is deprecated, so you might want to use one of the alternatives suggested there.
I am running into a very strange bit of behavior with a query in Oracle. The query itself is enormous and quite complex... but it is basically the same every time I run it. However, it seems to execute more slowly when returning a smaller result set. The best example I can give is that if I set this filter on it,
and mgz_query.IsQueryStatus(10001,lqo.query_id)>0
which returns 960 of 12,429 records, I see an execution time of about 1.9 seconds. However, if I change the filter to
and mgz_query.IsQueryStatus(10005,lqo.query_id)>0
which returns 65 of 12,429 records, I see an execution time of about 6.8 seconds. Digging a bit deeper, I found that the smaller result set seems to perform considerably more buffer gets than the larger result set. This seems completely counter-intuitive to me.
The query being run is roughly 8000 characters long (unless someone wants it, I'm not going to clutter this post with the entire query) and includes 4 UNION ALL statements, but otherwise it filters primarily on indexes and is pretty efficient, apart from its massive size.
The filter in use is implemented by the function below.
Function IsQueryStatus(Pni_QueryStateId in number,
                       Pni_Query_Id in number) return pls_integer as
    vn_count pls_integer;
Begin
    select count(1)
      into vn_count
      from m_query mq
     where mq.id = Pni_Query_Id
       and mq.state_id = Pni_QueryStateId;
    return vn_count;
End;
Any ideas as to what may be causing the smaller result set to perform so much worse than the large result set?
I think you are facing a situation where determining that something is not in the set takes a lot longer than determining if it is in the set. This can occur quite often. For instance, if there is an index on m_query(id), then consider how the where clause might be executed:
(1) The value Pni_Query_Id is looked up in the index. There is no match, so the query is done and returns a value of 0.
(2) There are a bunch of matches. Now the pages where state_id is located have to be fetched and compared to Pni_QueryStateId. That's a lot more work.
If that is the case, then having an index on m_query(id, state_id) should help the query.
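Something along these lines (the index name is made up):

create index m_query_id_state_ix on m_query (id, state_id);

With both columns in the index, both the "no match" and the "match" case can be answered from the index alone, without fetching table pages.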
By the way, this assumes that the only change is the function call in the where clause. If there are other changes that produce fewer rows, you might simply be calling this function fewer times.
As part of a data analysis project, I will be issuing some long running queries on a mysql database. My future course of action is contingent on the results I obtain along the way. It would be useful for me to be able to view partial results generated by a SELECT statement that is still running.
Is there a way to do this? Or am I stuck with waiting until the query completes to view results which were generated in the very first seconds it ran?
Thank you for any help : )
In the general case, a partial result cannot be produced. For example, if you have an aggregate function with a GROUP BY clause, then all the data has to be analysed before the first row is returned. A LIMIT clause will not help you, because it is applied after the output is computed. Maybe you can give concrete data and the SQL query?
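To illustrate with made-up tables: in a query like the one below, every row of orders has to be read and grouped before the first result row exists, so the LIMIT trims the output but not the work:

select customer_id, sum(amount)
from orders
group by customer_id
limit 10;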
One thing you may consider is sampling your tables down. This is good practice in data analysis in general, to speed up your iterations while you're writing code.
For example, suppose you have table-creation privileges and some mega-huge table X with key unique_id and some data data_value.
If unique_id is numeric, then in nearly any database
create table sample_table as
select unique_id, data_value
from X
where mod(unique_id, <some_large_prime_number_like_1013>) = 1
will give you a random sample of the data to work your queries out on, and you can inner join your sample_table against the other tables to improve the speed of testing. Thanks to the sampling, your query results should be roughly representative of what you will get on the full data. Note that the number you're modding with has to be prime, otherwise it won't give a correct sample. The example above will shrink your table down to about 0.1% of the original size (0.0987%, to be exact).
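For example (other_table and its columns are made up), a test query would join against the sample instead of X:

select s.unique_id, o.some_column
  from sample_table s
 inner join other_table o on o.unique_id = s.unique_id;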
Most databases also have better sampling and random-number methods than just using mod. Check the documentation to see what's available for your version.
Hope that helps,
McPeterson
It depends on what your query is doing. If it needs the whole result set before producing output - as might happen for queries with GROUP BY, ORDER BY, or HAVING clauses - then there is nothing to be done.
If, however, the reason for the delay is client-side buffering (which is the default mode), then it can be adjusted by setting mysql_use_result as an attribute of the database handle, rather than the default mysql_store_result. This is true for the Perl and Java interfaces; I think in the C interface you have to use the unbuffered version of the function that retrieves the result.
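In PHP, for instance, the analogous switch is mysql_unbuffered_query() (a sketch; the query and table are made up):

$res = mysql_unbuffered_query('SELECT id, total FROM big_table');
while ($row = mysql_fetch_array($res)) {
    // the first rows arrive while the server is still producing the rest
}

Note that with an unbuffered result you cannot issue another query on the same connection until all rows have been fetched or the result has been freed.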