How to use entire row in select in function call? - sql

I'm refactoring part of a humongous database package in Oracle PL/SQL, and there are many selects with very similar where-statements.
An example (made up, in reality 20 base comparisons, and another 5 or so depending on query):
-- Query 1
select * into r_data from data d
where
d.tm = time and
d.per = period and
d.mm = mm and
d.br = br and
d.ty = ty;
-- Query 2
select * into r_data from data d
where
d.tm = time and
d.per = period and
d.mm = mm and
d.br = br and
d.mat = mat;
As you can see, tm, per, mm and br are being compared in both cases, so I thought this would be a smart solution:
-- A function for comparing rows
function are_similar(row1 in data%rowtype, row2 in data%rowtype)
return number is
begin
  if row1.tm = row2.tm and
     row1.per = row2.per and
     row1.mm = row2.mm and
     row1.br = row2.br then
    return 1;
  else
    return 0;
  end if;
end are_similar;
-- Query 1 (v_row is data%rowtype)
select * into r_data from data d
where
are_similar(d, v_row) = 1 and
d.ty = v_row.ty;
-- Query 2 (v_row is data%rowtype)
select * into r_data from data d
where
are_similar(d, v_row) = 1 and
d.mat = v_row.mat;
But I get:
Error(xxx,xxx): PL/SQL: ORA-00904: "D": invalid identifier
I've tried googling for how to get the "row" out of the "row identifier" (i.e. D) but I cannot find anything, and I've also found How to pass an entire row (in SQL, not PL/SQL) to a stored function?, which states that what I'm trying to do might be impossible. Is it? Or are there other ways of doing the same thing, i.e. getting rid of the where-clause duplication? The code is really ugly and a hassle to maintain.
I know creating a new view with arguments would solve part of the issue, but if possible I would really like to keep the solution internal to the package I'm working with.

Anyone familiar with OO techniques can see what you're trying to do. You've identified common code and are trying to refactor that into a separate module.
You're working in a different environment when you're working in SQL. What is considered clever in other languages is, well, not so very clever in SQL. And vice versa if it makes you feel any better. In languages such as Java, C#, C++ or any other language specifically designed for the OO environment, we can lean more heavily toward maintainability rather than performance because the cost is so low.
Not so in SQL. Everything takes at least 10 times longer to perform in SQL than in any other language. Reworking a query to have it call a function where it did not before will decrease the responsiveness of the query noticeably. It can turn a 5 msec query into a 45 sec query or even worse. Even a 5 sec query is simply not acceptable.
One thing that you have to be aware of in SQL but not in other languages is context switching. This is where you go from SQL to the wrapper language vendors place around their system's SQL. This is PL/SQL in Oracle or Transact-SQL in SQL Server. Every system has one. A query is SQL. That is one context. The body of a stored procedure is the wrapper language. That is another context. Calling a stored procedure from a query involves more than just jumping from executing code over here to executing code over there. Switching contexts back and forth can be very time consuming. The details differ between systems, so you should become familiar with your system's specifics.
Another difference is that other languages are procedural in nature. You identify what they have to do, then define step by step how to do it. In SQL, you identify what data you want. While there are ways to have some influence, by and large the underlying system determines how to go about doing it.
There are many techniques for writing good, responsive SQL code. Rewriting a query to call a stored procedure for every row is not one of them.
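That said, if the goal is simply to avoid repeating the shared predicates inside the package without a per-row function call, one option is a parameterized cursor that carries the common comparisons once. A minimal sketch, assuming the column types line up and using made-up cursor and parameter names:
-- Hypothetical sketch: the four shared predicates live in one place;
-- p_ty / p_mat are optional and only applied when supplied.
cursor c_data (
    p_tm   data.tm%type,
    p_per  data.per%type,
    p_mm   data.mm%type,
    p_br   data.br%type,
    p_ty   data.ty%type  default null,
    p_mat  data.mat%type default null
) is
    select *
    from   data d
    where  d.tm  = p_tm
    and    d.per = p_per
    and    d.mm  = p_mm
    and    d.br  = p_br
    and    (p_ty  is null or d.ty  = p_ty)
    and    (p_mat is null or d.mat = p_mat);

-- Query 1 becomes: open c_data(v_row.tm, v_row.per, v_row.mm, v_row.br, p_ty  => v_row.ty);
-- Query 2 becomes: open c_data(v_row.tm, v_row.per, v_row.mm, v_row.br, p_mat => v_row.mat);
Note that the (p is null or col = p) pattern keeps everything in one cursor but can influence how Oracle uses indexes, so treat it as a maintainability trade-off rather than a free win.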

Related

Postgres vs oracle doing 1 million sqrts am I doing it wrong?

We are trying to get an idea of the raw performance of Oracle vs PostgreSQL. We have extensive Oracle experience but are new to PostgreSQL. We are going to run lots of queries with our data, etc. But first we wanted to see just how they perform on basic kernel tasks, i.e. math and branching, since SQL is built on that.
In AWS RDS we created two db.m3.2xlarge instances, one with Oracle 11.2.0.4.v1 (license included), the other with PostgreSQL (9.3.3).
In both we ran code that did 1 million square roots (from 1 to 1 mill). Then we did the same but within an If..Then statement.
The results were a bit troubling:
Oracle 4.8 seconds
PostgreSQL 21.803 seconds
adding an if statement:
Oracle 4.78 seconds
PostgreSQL 24.4 seconds
code
Oracle square root
SET SERVEROUTPUT ON
SET TIMING ON
DECLARE
n NUMBER := 0;
BEGIN
FOR f IN 1..10000000
LOOP
n := SQRT (f);
END LOOP;
END;
PostgreSQL
DO LANGUAGE plpgsql $$ DECLARE n real;
BEGIN
FOR f IN 1..10000000 LOOP
n = SQRT (f);
END LOOP;
RAISE NOTICE 'Result => %',n;
END $$;
oracle adding if
SET SERVEROUTPUT ON
SET TIMING ON
DECLARE
n NUMBER := 0;
BEGIN
FOR f IN 1..10000000
LOOP
if 0 =0 then
n := SQRT (f);
end if;
END LOOP;
END;
postgres adding if
DO LANGUAGE plpgsql $$ DECLARE n real;
BEGIN
FOR f IN 1..10000000 LOOP
if 0=0 then
n = SQRT (f);
end if;
END LOOP;
RAISE NOTICE 'Result => %',n;
END $$;
I used an anonymous block for PostgreSQL. I also did it as a function and got identical results
CREATE OR REPLACE FUNCTION testpostgrescpu()
RETURNS real AS
$BODY$
declare
n real;
BEGIN
FOR f IN 1..10000000 LOOP
n = SQRT (f);
END LOOP;
RETURN n;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION testpostgrescpu()
OWNER TO xxx;
Based on what we had heard about PostgreSQL and how it is comparable to Oracle in many ways, we were taken aback by the results. Did we code PostgreSQL incorrectly? What are we missing, or is this just the way it is?
Note: once we started running queries on the exact same data in Oracle and PostgreSQL we saw a similar pattern. On basic queries there was little difference, but as they got more and more complex Oracle was around 3-5 times faster.
Again, this was run on identical AWS RDS instances; we ran the tests many times during the day on different days and the results were always the same.
This is a bit of speculation. I would expect Oracle to be slower than Postgres on such calculations. However, I think you may have a performance problem that is described in the documentation:
The type numeric can store numbers with a very large number of digits and perform calculations exactly. It is especially recommended for storing monetary amounts and other quantities where exactness is required. However, arithmetic on numeric values is very slow compared to the integer types, or to the floating-point types described in the next section.
Your code doesn't declare a data type for f. By context, it would be assigned to be an integer. However, the sqrt() function takes either a floating-point or a numeric argument. These are not equivalent (and I'm guessing that the numeric version of the function is slower). My guess is that the integer f is converted to numeric rather than to real for the operation.
Try running the test by explicitly declaring f to be real or by casting it before the function call. That might improve performance.
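For example, a minimal variant of the original loop with an explicit cast (an untested sketch; the only changes are the f::double precision cast and the declared type of n):
DO LANGUAGE plpgsql $$
DECLARE n double precision;
BEGIN
  FOR f IN 1..10000000 LOOP
    n := sqrt(f::double precision);  -- cast first so the float8 version of sqrt is used
  END LOOP;
  RAISE NOTICE 'Result => %', n;
END $$;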
I can't see how this will be a useful metric unless you happen to do a lot of computation in pl/sql or pg pl/sql. This isn't really recommended anyway and can be done natively in C or by calling Java classes. Oracle can compile pl/sql natively to c on some platforms / versions and so this may be one of the reasons why you see a big difference in speed.
The speed of a database would be better determined by its ability to perform queries probably including joins with the correct statistics in place or to write and update data. For databases such as Oracle and Postgres sql doing this in a multi-user and transactional environment would be an even better test assuming you have an OLTP application. From what I hear Postgres is doing pretty well with competing with Oracle but it depends on your application.
For better description and analysis of Oracle I'd suggest looking at the asktom https://asktom.oracle.com/ forums. I'm not sure if there is anything close to this for postgres.
To be honest, your benchmark is completely meaningless.
You're computing 1 million square roots and immediately throwing away the results; depending on your optimization settings, I'd expect the compiler to completely get rid of your loop.
You should at least store the results somewhere or use them for another computation (e.g. by computing the sum).
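A sketch of that idea (plpgsql; the Oracle block would accumulate a total the same way), so the work cannot be optimized away:
DO LANGUAGE plpgsql $$
DECLARE total double precision := 0;
BEGIN
  FOR f IN 1..10000000 LOOP
    total := total + sqrt(f::double precision);
  END LOOP;
  RAISE NOTICE 'Sum => %', total;  -- using the result forces the loop to actually run
END $$;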
Also, I have to disagree with your statement "i.e. math and branching since SQL is built on that". An RDBMS does a lot of things, but computing square roots efficiently is certainly not one of its strong sides. If you really, really need this kind of computation, it would make much more sense to move it out of the database and use some kind of specialized software for that, e.g. R.
As others have stated, your example test is pretty meaningless.
I think the basic problem you are having is that you don't really know anything about PostgreSQL and are trying the same basic tricks you would with Oracle.
We are trying to get an idea of the raw performance of Oracle vs PostgreSQL
Well that doesn't really mean anything does it? Unless you're trying to measure raw disk reads or some such.
we have tuned them as much as possible (checked all parameters changed random page cost, set seq scan to off etc)
Well, setting enable_seqscan to off is unlikely to be something you'd want to do other than to force the planner while exploring a test case. What made you do that? Where in the manuals did it suggest it? You don't say how you changed random_page_cost nor how you determined you had the correct value.
We discovered that with PostgreSQL if the table is >25% of the shared mem setting its table data is not cached.
Well, that's just clearly not possible. Caching takes place at the PostgreSQL and OS level and disk blocks will be cached. How are you measuring this?
(in our case AWS 30 gig instance has a shared mem of 7 gigs, once we get the table size under 2gigs it starts getting cached again)
Well how are you sizing shared_mem then? I'm trying to imagine a scenario where 2G and 7G are both reasonable values and I'm having trouble. You don't supply any memory usage information, so no-one can tell what's going on though.
I think what you need to do is the following:
Get a good hot cup of tea/coffee.
Read through the manuals.
Have a look through the wiki, e.g. Tuning Your PostgreSQL Server.
Once you have a reasonable grip on how work_mem and shared_buffers operate, put some measurement in place on the server so you can see memory usage, disk I/O etc.
Make sure you have a basic understanding of how to EXPLAIN ANALYZE your queries (see the example at the end of this answer).
Subscribe to one of the postgresql.org mailing lists (the performance list seems plausible) so you have somewhere you can hold discussions.
Then start looking at measuring performance.
There are cases where Oracle will be smarter than PostgreSQL, but a general across-the-board major slow-down isn't what you'd expect to see.
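For the EXPLAIN ANALYZE point above, this is the kind of thing to run against your own tables (the table and column names here are made up); it shows the actual plan, timings and buffer hits rather than just an estimate:
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM   orders
WHERE  customer_id = 42;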
I'm a little surprised by these benchmarks, but I'm inclined to notionally agree with Frank Schmitt. While I wouldn't go so far as to say it's "completely meaningless," if you are going to compare two DBMS systems, I would think you would want to look at much more than just how each one does math.
For what it's worth, I used Oracle almost exclusively at my former employer. In my new role, our primary DBMS is Sybase ASE, which is lacking many of the tools I've been accustomed to using, and we are using PostgreSQL as a stop-gap.
There are undoubtedly better write-ups than what I am about to provide, but from a novice's point of view:
Things I miss about Oracle:
OS-authentication (ability to let users log in based on their Windows/Unix Credentials), with no messy password issues
the "merge" statement
bulk inserts and updates through OCI (ODP.net, DBD::Oracle)
the ability to commit partially through a procedure
availability of awesome IDEs (like Allround Automations' PL/SQL Developer)
bitmap indexes
more seamless DBlinks
Things I like about PostgreSQL:
the price tag
"copy" is so much easier to use than SQL*Loader
the availability of drivers like ODBC and Npgsql.dll for .NET that somehow just work
custom functions inside of SQL don't drag down query performance
ability to create custom functions in languages other than PL/pgSQL (e.g. Perl)
easier-to-use data types, like date, time, timestamp and interval
an update syntax that is borderline intuitive (and doesn't require the additional exists wrapper)
Again, I'm far from an expert. Both database platforms are a joy to work with and take care of so much of the heavy lifting.
-- edit --
And I should add, to this day I never figured out how to do this in Oracle:
select * from pg_views
where definition like '%inventory.turns%'
The issue here is with type casting. PostgreSQL's sqrt function is defined as accepting either double precision (float) or numeric.
So what happens in your code is that the integer is cast to float (which is fast), the float version of sqrt is executed, and the result is then cast from float to real (which is slow).
To see what I am talking about, try to just compare runtime of these two code samples:
DO LANGUAGE plpgsql $$
DECLARE n real;
BEGIN
FOR f IN 1..10000000 LOOP
n = f::float;
END LOOP;
RAISE NOTICE 'Result => %',n;
END $$;
DO LANGUAGE plpgsql $$
DECLARE n float;
BEGIN
FOR f IN 1..10000000 LOOP
n = f::float;
END LOOP;
RAISE NOTICE 'Result => %',n;
END $$;
On my machine the first one takes 16s and the second one only 3s.
The moral of the story is that you need to be careful about the data types you are using.
You are not performing any PostgreSQL benchmark.
What you are really doing is a pl/pgsql benchmark.
You could use any of these PostgreSQL language extensions for this test, and you would probably get rather different results.
There is a pl/pgsql interpreter that will execute your code. It is installed along with PostgreSQL by default.
For further information, see the pl/pgsql implementation and the pl/pgsql overview.
Using pl/java you would have a JVM running, pl/sh a shell running.
plpgsql is not an extensively optimized-for-performance language.
I don't know why I would ever want to compute 10 million square roots in an explicit loop inside the database, but if I did I would use plperl to do it.

How to speed up table-retrieval with MATLAB and JDBC?

I am accessing a PostgreSQL 8.4 database via JDBC, called from MATLAB.
The tables I am interested in basically consist of various columns of different datatypes. They are selected through their time-stamps.
Since I want to retrieve big amounts of data I am looking for a way of making the request faster than it is right now.
What I am doing at the moment is the following:
First I establish a connection to the database and call it DBConn. The next step is to prepare a SELECT statement and execute it:
QUERYSTRING = ['SELECT * FROM ' TABLENAME ...
    ' WHERE ts BETWEEN ''' TIMESTART ''' AND ''' TIMEEND ''''];
QUERY = DBConn.prepareStatement(QUERYSTRING);
RESULTSET = QUERY.executeQuery();
Then I store the columntypes in variable COLTYPE (1 for FLOAT, -1 for BOOLEAN and 0 for the rest - nearly all columns contain FLOAT). Next step is to process every row, column by column, and retrieve the data by the corresponding methods. FNAMES contains the fieldnames of the table.
m=0; % Variable containing rownumber
while RESULTSET.next()
m = m+1;
for n = 1:length(FNAMES)
if COLTYPE(n)==1 % Columntype is a FLOAT
DATA{1}.(FNAMES{n})(m,1) = RESULTSET.getDouble(n);
elseif COLTYPE(n)==-1 % Columntype is a BOOLEAN
DATA{1}.(FNAMES{n})(m,1) = RESULTSET.getBoolean(n);
else
DATA{1}.(FNAMES{n}){m,1} = char(RESULTSET.getString(n));
end
end
end
When I am done with my request I close the statement and the connection.
I don't have the MATLAB Database Toolbox, so I am looking for solutions without it.
I understand that it is very ineffective to request the data of every single field. Still, I failed on finding a way to get more data at once - for example multiple rows of the same column. Is there any way to do so? Do you have other suggestions of speeding the request up?
Summary
To speed this up, push the loops, and then your column datatype conversion, down in to the Java layer, using the Database Toolbox or custom Java code. The Matlab-to-Java method call overhead is probably what's killing you, and there's no way of doing block fetches (multiple rows in one call) with plain JDBC. Make sure the knobs on the JDBC driver you're using are set appropriately. And then optimize the transfer of expensive column data types like strings and dates.
(NB: I haven't done this with Postgres, but have with other DBMSes, and this will apply to Postgres too because most of it is about the JDBC and Matlab layers above it.)
Details
Push loops down to Java to get block fetching
The most straightforward way to get this faster is to push the loops over the rows and columns down into the Java layer, and have it return blocks of data (e.g. 100 or 1000 rows at a time) to the Matlab layer. There is substantial per-call overhead in invoking a Java method from Matlab, and looping over JDBC calls in M-code is going to incur it heavily (see Is MATLAB OOP slow or am I doing something wrong? - full disclosure: that's my answer). If you're calling JDBC from M-code like that, you're incurring that overhead on every single column of every row, and that's probably the majority of your execution time right now.
The JDBC API itself does not support "block cursors" like ODBC does, so you need to get that loop down in to the Java layer. Using the Database Toolbox like Oleg suggests is one way to do it, since they implement their lower-level cursor stuff in Java. (Probably for precisely this reason.) But if you can't have a database toolbox dependency, you can just write your own thin Java layer to do so, and call that from your M-code. (Probably through a Matlab class that is coupled to your custom Java code and knows how to interact with it.) Make the Java code and Matlab code share a block size, buffer up the whole block on the Java side, using primitive arrays instead of object arrays for column buffers wherever possible, and have your M-code fetch the result set in batches, buffering those blocks in cell arrays of primitive column arrays, and then concatenate them together.
Pseudocode for the Matlab layer:
colBufs = repmat( {{}}, [1 nCols] );
while (cursor.hasMore())
cursor.fetchBlock();
for iCol = 1:nCols
colBufs{iCol}{end+1} = cursor.getBlock(iCol); % should come back as primitive
end
end
for iCol = 1:nCols
colResults{iCol} = cat(2, colBufs{iCol}{:});
end
Twiddle JDBC DBMS driver knobs
Make sure your code exposes the DBMS-specific JDBC connection parameters to your M-code layer, and use them. Read the doco for your specific DBMS and fiddle with them appropriately. For example, Oracle's JDBC driver defaults to setting the default fetch buffer size (the one inside their JDBC driver, not the one you're building) to about 10 rows, which is way too small for typical data analysis set sizes. (It incurs a network round trip to the db every time the buffer fills.) Simply setting it to 1,000 or 10,000 rows is like turning on the "Go Fast" switch that had shipped set to "off". Benchmark your speed with sample data sets and graph the results to pick appropriate settings.
Optimize column datatype transfer
In addition to giving you block fetch functionality, writing custom Java code opens up the possibility of doing optimized type conversion for particular column types. After you've got the per-row and per-cell Java call overhead handled, your bottlenecks are probably going to be in date parsing and passing strings back from Java to Matlab.
Push the date parsing down into Java by having it convert SQL date types to Matlab datenums (as Java doubles, with a column type indicator) as they're being buffered, maybe using a cache to avoid recalculation of repeated dates in the same set. (Watch out for TimeZone issues. Consider Joda-Time.) Convert any BigDecimals to double on the Java side.
And cellstrs are a big bottleneck - a single char column could swamp the cost of several float columns. Return narrow CHAR columns as 2-d chars instead of cellstrs if you can (by returning a big Java char[] and then using reshape()), converting to cellstr on the Matlab side if necessary. (Returning a Java String[] converts to cellstr less efficiently.)
And you can optimize the retrieval of low-cardinality character columns by passing them back as "symbols" - on the Java side, build up a list of the unique string values and map them to numeric codes, and return the strings as a primitive array of numeric codes along with that map of number -> string; convert the distinct strings to cellstr on the Matlab side and then use indexing to expand it to the full array. This will be faster and save you a lot of memory, too, since the copy-on-write optimization will reuse the same primitive char data for repeated string values. Or convert them to categorical or ordinal objects instead of cellstrs, if appropriate. This symbol optimization could be a big win if you use a lot of character data and have large result sets, because then your string columns transfer at about primitive numeric speed, which is substantially faster, and it reduces cellstr's typical memory fragmentation. (Database Toolbox may support some of this stuff now, too. I haven't actually used it in a couple years.)
After that, depending on your DBMS, you could squeeze out a bit more speed by including mappings for all the numeric column type variants your DBMS supports to appropriate numeric types in Matlab, and experimenting with using them in your schema or doing conversions inside your SQL query. For example, Oracle's BINARY_DOUBLE can be a bit faster than their normal NUMERIC on a full trip through a db/Matlab stack like this. YMMV.
You could consider optimizing your schema for this use case by replacing string and date columns with cheaper numeric identifiers, possibly as foreign keys to separate lookup tables to resolve them to the original strings and dates. Lookups could be cached client-side with enough schema knowledge.
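A rough sketch of that idea, with made-up table and column names (the wide table keeps only a small integer key; the strings live in the lookup table and get resolved, and cached, client-side):
-- Hypothetical schema sketch
CREATE TABLE instrument_lookup (
    instrument_id integer     PRIMARY KEY,
    instrument    varchar(64) NOT NULL UNIQUE
);

CREATE TABLE measurements (
    ts            timestamp NOT NULL,
    instrument_id integer   NOT NULL REFERENCES instrument_lookup (instrument_id),
    value         double precision
);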
If you want to go crazy, you can use multithreading at the Java level to have it asynchronously prefetch and parse the next block of results on separate Java worker thread(s), possibly parallelizing per-column date and string processing if you have a large cursor block size, while you're doing the M-code level processing for the previous block. This really bumps up the difficulty though, and ideally is a small performance win because you've already pushed the expensive data processing down in to the Java layer. Save this for last. And check the JDBC driver doco; it may already effectively be doing this for you.
Miscellaneous
If you're not willing to write custom Java code, you can still get some speedup by changing the syntax of the Java method calls from obj.method(...) to method(obj, ...). E.g. getDouble(RESULTSET, n). It's just a weird Matlab OOP quirk. But this won't be much of a win because you're still paying for the Java/Matlab data conversion on each call.
Also, consider changing your code so you can use ? placeholders and bound parameters in your SQL queries, instead of interpolating strings as SQL literals. If you're doing a custom Java layer, defining your own @connection and @preparedstatement M-code classes is a decent way to do this. So it looks like this:
QUERYSTRING = ['SELECT * FROM ' TABLENAME ' WHERE ts BETWEEN ? AND ?'];
query = conn.prepare(QUERYSTRING);
rslt = query.exec(startTime, endTime);
This will give you better type safety and more readable code, and may also cut down on the server-side overhead of query parsing. This won't give you much speed-up in a scenario with just a few clients, but it'll make coding easier.
Profile and test your code regularly (at both the M-code and Java level) to make sure your bottlenecks are where you think they are, and to see if there are parameters that need to be adjusted based on your data set size, both in terms of row counts and column counts and types. I also like to build in some instrumentation and logging at both the Matlab and Java layer so you can easily get performance measurements (e.g. have it summarize how much time it spent parsing different column types, how much in the Java layer and how much in the Matlab layer, and how much waiting on the server's responses (probably not much due to pipelining, but you never know)). If your DBMS exposes its internal instrumentation, maybe pull that in too, so you can see where you're spending your server-side time.
It occurs to me that, to speed up the retrieval, you could remove the if/else if chain: JDBC's ResultSetMetaData gives you the data type and the name of each column, so you save the time spent on the per-column branching.
ResultSetMetaData rsMd = rs.getMetaData();
int cols = rsMd.getColumnCount();

while (rs.next()) {
    for (int i = 1; i <= cols; i++) {
        row[i] = rs.getString(i);
    }
}
My example is pseudocode because I'm not a MATLAB programmer.
I hope you find JDBC useful; let me know if you need anything!

mysterious oracle query

If a query in Oracle takes 11 minutes the first time it is executed, and only 25 seconds the next time, even though the buffers are being flushed in between, what is the possible cause? Could it be that the query is written in a bad way?
set timing on;
set echo on
set lines 999;
insert into elegrouptmp select idcll,idgrpl,0 from elegroup where idgrpl = 109999990;
insert into SLIMONTMP (idpartes, indi, grecptseqs, devs, idcll, idclrelpayl)
select rel.idpartes, rel.indi, rel.idgres,rel.iddevs,vpers.idcll,nvl(cdsptc.idcll,vpers.idcll)
from
relbqe rel,
elegrouptmp ele,
vrdlpers vpers
left join cdsptc cdsptc on
(cdsptc.idclptcl = vpers.idcll and
cdsptc.cdptcs = 'NOS')
where
rel.idtits = '10BCPGE ' and
vpers.idbqes = rel.idpartes and
vpers.cdqltptfc = 'N' and
vpers.idcll = ele.idelegrpl and
ele.idgrpl = 109999990;
alter system flush shared_pool;
alter system flush buffer_cache;
alter system flush global context;
select /* original */ mvtcta_part_SLIMONtmp.idpartes,mvtcta_part_SLIMONtmp.indi,mvtcta_part_SLIMONtmp.grecptseqs,mvtcta_part_SLIMONtmp.devs,
mvtcta_part_SLIMONtmp.idcll,mvtcta_part_SLIMONtmp.idclrelpayl,mvtcta_part_vrdlpers1.idcll,mvtcta_part_vrdlpers1.shnas,mvtcta_part_vrdlpers1.cdqltptfc,
mvtcta_part_vrdlpers1.idbqes,mvtcta_part_compte1.idcll,mvtcta_part_compte1.grecpts,mvtcta_part_compte1.seqc,mvtcta_part_compte1.devs,mvtcta_part_compte1.sldminud,
mvtcta.idcll,mvtcta.grecptseqs,mvtcta.devs,mvtcta.termel,mvtcta.dtcptl,mvtcta.nusesi,mvtcta.fiches,mvtcta.indl,mvtcta.nuecrs,mvtcta.dtexel,mvtcta.dtvall,
mvtcta.dtpayl,mvtcta.ioi,mvtcta.mtd,mvtcta.cdlibs,mvtcta.libcps,mvtcta.sldinitd,mvtcta.flagtypei,mvtcta.flagetati,mvtcta.flagwarnl,mvtcta.flagdonei,mvtcta.oriindl,
mvtcta.idportfl,mvtcta.extnuecrs
from SLIMONtmp mvtcta_part_SLIMONtmp
left join vrdlpers mvtcta_part_vrdlpers1 on
(
mvtcta_part_vrdlpers1.idbqes = mvtcta_part_SLIMONtmp.idpartes
and mvtcta_part_vrdlpers1.cdqltptfc = 'N'
and mvtcta_part_vrdlpers1.idcll = mvtcta_part_SLIMONtmp.idcll
)
left join compte mvtcta_part_compte1 on
(
mvtcta_part_compte1.idcll = mvtcta_part_vrdlpers1.idcll
and mvtcta_part_compte1.grecpts = substr (mvtcta_part_SLIMONtmp.grecptseqs, 1, 2 )
and mvtcta_part_compte1.seqc = substr (mvtcta_part_SLIMONtmp.grecptseqs, -1 )
and mvtcta_part_compte1.devs = mvtcta_part_SLIMONtmp.devs
and (mvtcta_part_compte1.devs = ' ' or ' ' = ' ')
and mvtcta_part_compte1.cdpartc not in ( 'L' , 'R' )
)
left join mvtcta mvtcta on
(
mvtcta.idcll = mvtcta_part_SLIMONtmp.idclrelpayl
and mvtcta.devs = mvtcta_part_SLIMONtmp.devs
and mvtcta.grecptseqs = mvtcta_part_SLIMONtmp.grecptseqs
and mvtcta.flagdonei <> 0
and mvtcta.devs = mvtcta_part_compte1.devs
and mvtcta.dtvall > 20101206
)
where 1=1
order by mvtcta_part_compte1.devs,
mvtcta_part_SLIMONtmp.idpartes,
mvtcta_part_SLIMONtmp.idclrelpayl,
mvtcta_part_SLIMONtmp.grecptseqs,
mvtcta.dtvall;
"if a query in oracle takes the first
time it is executed 11 minutes, and
the next time, the same query 25
seconds, with the buffer being
flushed, what is the possible cause?"
The thing is, flushing the DB Buffers, like this ...
alter system flush shared_pool
/
... wipes the Oracle data store but there are other places where data gets cached. For instance the chances are your OS caches its file reads.
EXPLAIN PLAN is good as a general guide to how the database thinks it will execute a query, but it is only a prediction. It can be thrown out by poor statistics or ambient conditions. It is not good at explaining why a specific instance of a query took as much time as it did.
So, if you really want to understand what occurs when the database executes a specific query you need to get down and dirty, and learn how to use the Wait Interface. This is a very powerful tracing mechanism, which allows us to see the individual events that happen over the course of a single query execution. Each version of Oracle has extended the utility and richness of the Wait Interface, but it has been essential to proper tuning since Oracle 9i (if not earlier).
Find out more by reading Roger Schrag's very good overview.
In your case you'll want to run the trace multiple times. In order to make it easier to compare results you should use a separate session for each execution, setting the 10046 event each time.
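For reference, setting that event for a session usually looks something like the following (level 12 captures both wait events and bind values); treat it as a sketch and check the exact syntax for your Oracle version:
ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';
-- run the query under investigation here
ALTER SESSION SET EVENTS '10046 trace name context off';
The resulting trace file can then be summarized with tkprof, as the next answer mentions.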
What else was happening on the box when you ran these? You can get different timings based on other processes chewing resources. Also, with a lot of joins, performance will depend on memory usage (hash_area_size, sort_area_size, etc.) and availability, so perhaps you are paging (check temp space size/usage also). In short, try sql_trace and tkprof to analyze this more deeply.
Sometimes a block is written to the file system before it is committed (a dirty block). When that block is read later, Oracle sees that it was uncommitted. It checks the open transaction and, if the transaction isn't still there, it knows the change was committed. Therefore it writes the block back as a clean block. It is called delayed block cleanout.
That is one possible reason why reading blocks for the first time can be slower than a subsequent re-read.
It could be that the second time the execution plan is already known. Maybe the optimizer has a very hard time finding an execution plan for some reason.
Try setting
alter session set optimizer_max_permutations=100;
and rerun the query. See if that makes any difference.
could it be that the query is written in a bad way?
"bad" is a rather emotional expression - but broadly speaking, yes, if a query performs significantly faster the second time it's run, it usually means there are ways to optimize the query.
Actually optimizing the query is - as APC says - rather a question of "down and dirty". An obvious candidate in your example might be the substring - if the table is huge, and the substring misses the index, I'd imagine that would take a bit of time, and I'd imagine the results of all those substring operations are cached somewhere.
Here's Tom Kyte's take on flushing Oracle buffers as a testing practice. Suffice it to say he's not a fan. He favors the approach of attempting to emulate your production load with your test data ("real life"), and tossing out the first and last runs. @APC's point about OS caching is Tom's point - to get rid of that (non-trivial!) effect you'd need to bounce the server, not just the database.

Efficient way to compute accumulating value in sqlite3

I have an sqlite3 table that tells when I gain/lose points in a game. Sample/query result:
SELECT time,p2 FROM events WHERE p1='barrycarter' AND action='points'
ORDER BY time;
1280622305|-22
1280625580|-9
1280627919|20
1280688964|21
1280694395|-11
1280698006|28
1280705461|-14
1280706788|-13
[etc]
I now want my running point total. Given that I start w/ 1000 points,
here's one way to do it.
SELECT DISTINCT(time),
       (SELECT 1000 + SUM(p2)
        FROM events e
        WHERE p1 = 'barrycarter' AND action = 'points'
          AND e.time <= e2.time) AS points
FROM events e2
WHERE p1 = 'barrycarter' AND action = 'points'
ORDER BY time
but this is highly inefficient. What's a better way to write this?
MySQL has @variables so you can do things like:
SELECT time, @tot := @tot+points ...
but I'm using sqlite3 and the above isn't ANSI standard SQL anyway.
More info on the db if anyone needs it: http://ccgames.db.94y.info/
EDIT: Thanks for the answers! My dilemma: I let anyone run any
single SELECT query on "http://ccgames.db.94y.info/". I want to give
them useful access to my data, but not to the point of allowing
scripting or allowing multiple queries with state. So I need a single
SQL query that can do accumulation. See also:
Existing solution to share database data usefully but safely?
SQLite is meant to be a small embedded database. Given that definition, it is not unreasonable to find many limitations with it. The task at hand is not solvable using SQLite alone, or it will be terribly slow as you have found. The query you have written is a triangular cross join that will not scale, or rather, will scale badly.
The most efficient way to tackle the problem is through the program that is making use of SQLite, e.g. if you were using Web SQL in HTML5, you can easily accumulate in JavaScript.
There is a discussion about this problem in the sqlite mailing list.
Your 2 options are:
Iterate through all the rows with a cursor and calculate the running sum on the client.
Store sums instead of, or as well as, storing points. (If you only store sums you can get the points back by doing sum(n) - sum(n-1), which is fast.)
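(A later note: SQLite 3.25 and newer - released long after this question - added window functions, which turn the running total into a single efficient query. A sketch against the table above:)
SELECT time,
       1000 + SUM(p2) OVER (ORDER BY time) AS points
FROM   events
WHERE  p1 = 'barrycarter' AND action = 'points'
ORDER BY time;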

What was your coolest SQL optimization, on a slow performing query?

Just speaking to a colleague of mine. He was walking with a hop in his step, on the way to the coffee machine.
I asked him "what's with the 'swarmy' walk?", he said, "I just reduced a two hour long query down to 40 seconds! It feels so good".
He altered a stored procedure that was using cursors and introduced a temp table that was refactored from the original dataset - I will email him soon to get more info on the actual implementation.
But ultimately, he was buzzing.
Question is, what SQL sticks in your mind and has made you buzz whilst optimising slow-performing queries?
I have to say it was when I learned how to create and use covered indexes. Now, THAT was a performance booster.
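For anyone who hasn't met them: a covering index carries every column a query needs, so the table itself is never touched. A hypothetical SQL Server example (table and column names made up):
-- The INCLUDE columns ride along in the index leaf pages, so the query below
-- is answered entirely from IX_Orders_Customer without touching dbo.Orders.
CREATE INDEX IX_Orders_Customer
    ON dbo.Orders (CustomerID)
    INCLUDE (OrderDate, TotalDue);

SELECT OrderDate, TotalDue
FROM   dbo.Orders
WHERE  CustomerID = 42;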
Using SQL's BULKIMPORT to reduce several hours of inherited INSERT code to less than a minute.
It's always nice to take a poorly written, cursor-laden query and eliminate cursors, cut the code by half, and improve performance many-fold.
Some of the best improvements are in clarity (and often result in nice performance boosts, too).
Sorry, I don't tend to get a buzz from that sort of thing but most situations have been pretty basic, monitoring performance of queries and adding indexes to speed them up.
Now, increasing the speed of "real" code that I've written by changing data structures and algorithms within the class, that's where I get my buzz (and reputation as the go-to man for performance fixes at work).
Hey, on the iPhone, which uses SQLite, I straight away reduced my database processing time from 40 seconds to 2 seconds with the use of exclusive write transactions... I was super happy doing this,
as it was my first experience of SQL on an embedded device - quite different from the usual server-related stuff (indexes, normalization, etc.).
--- As far as servers go, indexes are a real blessing. Also, if you take a bit of pain and get rid of as many NULLs as you can in your tables, you would be surprised by the performance gains - not many developers focus on NULLs; they usually go with indexes and other documented stuff.
A few other lesser-exploited ways: using XML to process multiple batch inserts / updates / deletes in one go instead of doing one insert at a time - in SQL 2005 this can be super cool (a sketch follows below).
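A rough sketch of that XML batching idea in SQL Server 2005 syntax (table, column and element names made up); the whole batch travels to the server in one round trip:
-- Hypothetical example: shred an XML document into rows and insert them in one statement
DECLARE @batch xml
SET @batch = N'<rows><r id="1" name="alpha"/><r id="2" name="beta"/></rows>'

INSERT INTO dbo.SomeTable (id, name)
SELECT r.value('@id',   'int'),
       r.value('@name', 'nvarchar(50)')
FROM   @batch.nodes('/rows/r') AS t(r)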
It's all about indexes. And avoiding stupid things that make them useless.
Changing order of conditions inside WHERE clause so it filters the most discriminating condition first (while at the same time indexes from non-discriminating columns like gender are removed).
Back in the day, I worked on a CICS/DB2 system, written in COBOL. A lot of our queries were doing full table scans (and slow) even though we had all the proper indexes and WHERE clauses.
It turned out (and I may have this backwards, it's been 15 years) that the problem was that we were using PIC S9(n) COMP in WORKING STORAGE for the query parameters, but DB2 wanted PIC S9(n) COMP-3. By using the wrong data type, DB2 had to do a full table scan in order to convert the values in the database to the value we were passing in. We changed our variable definitions and the queries were able to use the indexes now, which dramatically improved our performance.
I had a query that was originally written for SQL Server 6.5, which did not support the SQL 92 join syntax, i.e.
select foo.baz
from foo
left outer join bar
on foo.a = bar.a
was instead written as
select foo.baz
from foo, bar
where foo.a *= bar.a
The query had been around for a while, and the relevant data had accumulated to make the query run too slow, about 90 seconds to complete. By the time this problem arose, we had upgraded to SQL Server 7.
After mucking about with indexes and other Easter-egging, I changed the join syntax to be SQL 92 compliant. The query time dropped to 3 seconds.
I don't think I'll ever have that feeling again. I was a f%$^ing hero.
I answered this on a previous question ("Biggest performance improvement you’ve had with the smallest change?"), however, it's such a simple improvement, yet one that is and can be so often overlooked, that it bears repeating:
Indexes!
Well, we had a similar thing where we had a slow query on an Open Freeway site. The answer wasn't so much optimising the query, but optimising the server it was on. We increased the cache limit and cache size so that the server would not run the query so often.
This has massively increased the speed of the system and ultimately made the client happy! :)
Not quite the calibre of the original poster's optimisation skills, but it definitely made us buzz!
Splitting one ridiculously long stored procedure, which did a great deal of "if it's after 5 pm, return this bit of sql" and which took in excess of 20 seconds to run, into a set of stored procedures that were called by one controlling sp, and got the times down to subsecond responses.
One Word, Dynamic Queries
If you are searching with large numbers of parameters, you can discount them from the SQL string. This has sped up my queries dramatically and with relative ease.
CREATE PROCEDURE dbo.qryDynamic
(
    @txtParameter1 nvarchar(255),
    @txtParameter2 nvarchar(255)
)
AS
BEGIN
    DECLARE @SQL nvarchar(2500)
    DECLARE @txtJoin nvarchar(50)
    SET @txtJoin = ' WHERE '
    SET @SQL = 'SELECT qry_DataFromAView.*
    FROM qry_DataFromAView'
    IF @txtParameter1 IS NOT NULL
    BEGIN
        SET @SQL = @SQL + @txtJoin + ' (Field1 LIKE N''%'' + @dynParameter1 + N''%'') '
        SET @txtJoin = ' AND '
    END
    IF @txtParameter2 IS NOT NULL
    BEGIN
        SET @SQL = @SQL + @txtJoin + ' (Field2 LIKE N''%'' + @dynParameter2 + N''%'') '
        SET @txtJoin = ' AND '
    END
    SET @SQL = @SQL + ' ORDER BY Field2'
    EXEC sp_executesql @SQL, N'@dynParameter1 nvarchar(255), @dynParameter2 nvarchar(255)', @dynParameter1 = @txtParameter1, @dynParameter2 = @txtParameter2
END
GO
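A hypothetical call (parameter values made up) would then look like this; only the non-NULL parameters end up in the generated WHERE clause:
EXEC dbo.qryDynamic @txtParameter1 = N'smith', @txtParameter2 = NULL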
I had a warm glow after being able to use a Cross Tab query to scrap oodles (technical term) of processing and lookups...
Usually it's simple things like adding indexes or only getting the data you need, but when you find a problem that fits an answer you've seen before... good times!
(Halfway off topic)
I rewrote a 3000 line stored procedure into LINQ2SQL/C#.
The stored procedure juggled lots of data between a bunch of unindexed temp tables.
The LINQ2SQL version read the data into a couple of Dictionaries and ILookups and then I joined the data manually with plain old C# code.
The stored procedure took about 20 seconds and the LINQ2SQL/C# version took 0.2 seconds.