How does MySQL handle a dynamic value within ORDER BY?

I stumbled upon this while reading a query in another post.
Take the following query as an example (ignore the impractical ordering):
SELECT *
FROM Members
ORDER BY TIMESTAMPDIFF(FRAC_SECOND, DateCreated, SYSDATE())
Say "Members" table has a huge row count (or the query is complex enough for it to be executed over at least dozen of milliseconds). How does mySQL or other mainstream DB engines evaluate the "SYSDATE()" in the "ORDER BY"?
Say the query takes half a second, the microsecond (FRAC_SECOND) of "SYSDATE" changes 1000 X 1000 X 0.5 = 500 000 times.
My questions are:
1. Does SYSDATE() get fixed at the start of query execution, or does it get re-evaluated and change as the execution progresses?
2. If it's the latter, can I assume the ordering might be jumbled?
UPDATE:
My original post used NOW() as the example of a dynamic value; it's SYSDATE() now.

NOW() returns a constant time that indicates the time at which the statement began to execute. (Within a stored function or trigger, NOW() returns the time at which the function or triggering statement began to execute.) This differs from the behavior for SYSDATE(), which returns the exact time at which it executes as of MySQL 5.0.12.
http://dev.mysql.com/doc/refman/5.0/en/date-and-time-functions.html#function_now
In other words, NOW() is evaluated only once, when the statement starts executing.
However, if you want to obtain the time at each evaluation, you should use SYSDATE():
As of MySQL 5.0.12, SYSDATE() returns the time at which it executes. This differs from the behavior for NOW(), which returns a constant time that indicates the time at which the statement began to execute. (Within a stored function or trigger, NOW() returns the time at which the function or triggering statement began to execute.)
http://dev.mysql.com/doc/refman/5.0/en/date-and-time-functions.html#function_sysdate
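You can watch the difference yourself with SLEEP() (a minimal demo; the exact values will of course differ on your machine):

-- NOW() is frozen at statement start, so both calls return the same value:
SELECT NOW(), SLEEP(2), NOW();

-- SYSDATE() keeps ticking, so the second call is roughly 2 seconds later:
SELECT SYSDATE(), SLEEP(2), SYSDATE();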
Update:
Well, from what I know, ORDER BY is evaluated, or better said "used", only once per statement. Since the value of TIMESTAMPDIFF(FRAC_SECOND, DateCreated, SYSDATE()) will be different every time you execute the SELECT statement, I think (once again, I think) ORDER BY will consider either the first evaluated value of the TIMESTAMPDIFF or the last one. Either way, I think executing this will give you a random order every time. Maybe there are better experts than me here who can answer more precisely.

Related

How to get a count of the number of times a SQL statement has been executed in X hours?

I'm using an Oracle DB. I want to be able to count the number of times that a SQL statement was executed in X hours. For instance, how many times has the statement Select * From ExampleTable been executed in the past 5 hours?
I tried looking in V$SQL, V$SQLSTATS, and V$SQLAREA, but they only keep a record of a statement's total number of executions. They don't store when the individual executions occurred. Is there any view I missed, or something else that keeps track of each individual statement execution plus a timestamp, so that I can query which ones occurred in the past X hours? Thanks for the help.
The views in the Automatic Workload Repository (AWR) store historical SQL execution information, specifically the view DBA_HIST_SQLSTAT.
The view is not perfect; it contains a summary of the top SQL statements. This is almost perfect information for performance tuning - in practice, sampling will catch any performance problem. But if you're looking for a perfect record of every SQL execution, as far as I know the only way to get that information is through tracing, which is buggy and slow.
Hopefully this query is good enough:
select begin_interval_time, end_interval_time, executions_delta, dba_hist_sqlstat.*
from dba_hist_sqlstat
join dba_hist_snapshot
    on  dba_hist_sqlstat.snap_id = dba_hist_snapshot.snap_id
    and dba_hist_sqlstat.instance_number = dba_hist_snapshot.instance_number
order by begin_interval_time desc, sql_id;
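To get one step closer to the original question, here is a hypothetical refinement (the DBA_HIST_SQLTEXT lookup and the 'ExampleTable' filter are illustrative assumptions; AWR snapshots are hourly by default, so the count is approximate at the edges of the window):

-- Count executions of statements mentioning ExampleTable in the past 5 hours
select sum(st.executions_delta) as executions_last_5h
from dba_hist_sqlstat st
join dba_hist_snapshot sn
    on  st.snap_id = sn.snap_id
    and st.instance_number = sn.instance_number
join dba_hist_sqltext t
    on st.sql_id = t.sql_id
where sn.begin_interval_time >= systimestamp - interval '5' hour
  and dbms_lob.instr(t.sql_text, 'ExampleTable') > 0;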
Apologies for putting this in an answer instead of a comment (I don't have the required reputation), but I think you may be out of luck. Here is an AskTOM thread asking basically the same question: AskTOM. Tom says that unless you are using ASH, this just isn't something the database is designed to do.
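If you do have ASH (and the Diagnostics Pack license it requires), a sketch like this can at least show roughly when a given statement was active; note that ASH samples active sessions about once per second, so very short executions can be missed entirely:

-- Samples for a given sql_id over the past 5 hours ('...' is a placeholder)
select sample_time, sql_id
from   v$active_session_history
where  sample_time > systimestamp - interval '5' hour
and    sql_id = '...';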

Performance for selecting multiple out-params from deterministic SUDF

I am about to test the DETERMINISTIC flag for SUDFs that return multiple values (a follow-up question to this). The DETERMINISTIC flag should cache the results for the same inputs to improve performance. However, I can't figure out how to make this work for multiple return values. My SUDF looks as follows:
CREATE FUNCTION DET_TEST(IN col BIGINT)
RETURNS a int, b int, c int, d int DETERMINISTIC
AS BEGIN
    a = 1;
    b = 2;
    c = 3;
    d = 4;
END;
Now when I execute the following select statements:
1) select DET_TEST(XL_ID).a from XL;
2) select DET_TEST(XL_ID).a, DET_TEST(XL_ID).b from XL;
3) select DET_TEST(XL_ID).a, DET_TEST(XL_ID).b,
DET_TEST(XL_ID).c, DET_TEST(XL_ID).d from XL;
I get the corresponding server processing times:
1) Statement 'select DET_TEST(XL_ID).a from XL'
successfully executed in 1.791 seconds (server processing time: 1.671 seconds)
2) Statement 'select DET_TEST(XL_ID).a, DET_TEST(XL_ID).b from XL'
successfully executed in 2.415 seconds (server processing time: 2.298 seconds)
3) Statement 'select DET_TEST(XL_ID).a, DET_TEST(XL_ID).b, DET_TEST(XL_ID).c, ...'
successfully executed in 4.884 seconds (server processing time: 4.674 seconds)
As you can see the processing time increases even though I call the function with the same input. So is this a bug or is it possible that only a single value is stored in cache but not the whole list of return parameters?
I will try out MAP_MERGE next.
I did some tests with your scenario and can confirm that the response time goes up considerably with every additional result parameter retrieved from the function.
The DETERMINISTIC flag helps here, but not as much as one would hope, since only the result values for distinct input parameters are saved.
So, if the same value(s) are passed into the function and it has been executed with these value(s) before, then the result is taken from a cache.
This cache, however, is only valid for the duration of a statement. That means: for repeated function evaluations with the same value during a single statement, the DETERMINISTIC function can skip the evaluation and reuse the result.
This doesn't mean that all output parameters get evaluated once and are then available for reuse. Indeed, with different output parameters, HANA practically has to execute different evaluation graphs. In that sense, asking for different parameters is closer to executing different functions than, say, calling a matrix operation.
So, sorry about raising the hope for a massive improvement with DETERMINISTIC functions in the other thread. At least for your use case, that doesn't really help a lot.
Concerning the MAP_MERGE function, it's important to see that this really helps with horizontal partitioning of data, like one would have it in e.g. classic map-reduce situations.
The use case you presented is actually not doing that but tries to create multiple results for a single input.
During my tests, I actually found it quicker to just define four independent functions and call those in my SELECT statement against my source table, along the lines of the sketch below.
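A minimal sketch of that approach (DET_TEST_A through DET_TEST_D are hypothetical names; each function returns a single value):

CREATE FUNCTION DET_TEST_A(IN col BIGINT)
RETURNS a int DETERMINISTIC
AS BEGIN
    a = 1;
END;

-- DET_TEST_B, DET_TEST_C and DET_TEST_D defined analogously, then:
select DET_TEST_A(XL_ID), DET_TEST_B(XL_ID), DET_TEST_C(XL_ID), DET_TEST_D(XL_ID) from XL;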
Depending on the complexity of the calculations you like to do and the amount of data, I probably would look into using the Application Function Library (AFL) SDK for SAP HANA. For details on this, one has to check the relevant SAP notes.

Hibernate query cache based on current time

I have a DAO method which executes the following query to fetch results:
SELECT new com.Person() FROM Person AS person
WHERE (person.start <= now()) AND (person.expires > now()) ORDER BY person.start ASC
The above is a PostgreSQL query. What can I do to enable query caching on the above? If I simply do query.setCacheable(true), that won't work, because now() will be different each time the query is executed. Is there a best practice for implementing such functionality?
Basically you should use discrete values instead of using the value of now() directly, since now() is always new and therefore incompatible with any caching strategy I've heard of :).
So say you actually want to cache data in 15-minute buckets.
You'd basically floor the value of now() to the closest quarter of an hour and use the floored value in the SQL query instead.
You can check out this question for implementing such a thing in Java: How to round time to the nearest quarter hour in java?
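For reference, the same flooring logic expressed directly in PostgreSQL (a sketch; in practice you would compute this value on the application side and bind it as a query parameter, so that consecutive executions within the same quarter hour produce an identical query and thus a cache hit):

-- Floor now() to the preceding quarter hour
SELECT date_trunc('hour', now())
     + floor(date_part('minute', now()) / 15) * interval '15 minutes';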

Looking up the control value at run time in a SQL query

I have a query in MS Access that in where clause I have:
WHERE (((tb_KonzeptFunktionen.Konzept)=[Formulare]![frm_Fahrzeug]![ID]));
It takes a long time to run, but when I delete this WHERE clause the query runs in less than a second.
Is passing [Formulare]![frm_Fahrzeug]![ID] as a parameter inefficient? Or is looking up the control value slowing it down? If so, how can I solve this problem?
The db engine should retrieve the control's value almost instantaneously. If that WHERE condition slows down your query significantly, it is more likely due to extra work the db engine must perform to retrieve the matching rows. You can check this assumption by temporarily substituting a static known value in place of the control's value.
WHERE tb_KonzeptFunktionen.Konzept=1;
If the version with the static value is equally slow, create an index on tb_KonzeptFunktionen.Konzept and try again.
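In Access SQL that index can be created with a plain DDL statement (the index name here is just an example):

CREATE INDEX idx_Konzept ON tb_KonzeptFunktionen (Konzept);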

Get execution time of PostgreSQL query

DECLARE #StartTime datetime,#EndTime datetime
SELECT #StartTime=GETDATE()
select distinct born_on.name
from born_on,died_on
where (FLOOR(('2012-01-30'-born_on.DOB)/365.25) <= (
select max(FLOOR((died_on.DOD - born_on.DOB)/365.25))
from died_on, born_on
where (died_on.name=born_on.name))
)
and (born_on.name <> All(select name from died_on))
SELECT #EndTime=GETDATE()
SELECT DATEDIFF(ms,#StartTime,#EndTime) AS [Duration in millisecs]
I am unable to get the query time. Instead I get the following error:
sql:/home/an/Desktop/dbms/query.sql:9: ERROR: syntax error at or near "#"
LINE 1: DECLARE #StartTime datetime,#EndTime datetime
If you mean in psql, rather than some program you are writing, use \? for the help, and see:
\timing [on|off] toggle timing of commands (currently off)
And then you get output like:
# \timing on
Timing is on.
# select 1234;
 ?column?
----------
     1234
(1 row)
Time: 0.203 ms
There are various ways to measure execution time, each has pros and cons. But whatever you do, some degree of the observer effect applies. I.e., measuring itself may distort the result.
1. EXPLAIN ANALYZE
You can prepend EXPLAIN ANALYZE, which reports the whole query plan with estimated costs plus actually measured times. The query is actually executed (with all side effects, if any!). Works for all DML commands and some others. See:
EXPLAIN ANALYZE not working with ALTER TABLE
To check whether my adapted version of your query is, in fact, faster:
EXPLAIN ANALYZE
SELECT DISTINCT b.name
FROM   born_on b
WHERE  date '2012-01-30' - b.dob <= (
   SELECT max(d1.dod - b1.dob)
   FROM   born_on b1
   JOIN   died_on d1 USING (name)  -- name must be unique!
   )
AND    NOT EXISTS (
   SELECT FROM died_on d2
   WHERE  d2.name = b.name
   );
Execute it a couple of times to get more comparable times with a warm cache. Several options are available to adjust the level of detail.
If you are mainly interested in total execution time, make it:
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
Mostly, TIMING matters - the manual:
TIMING
Include actual startup time and time spent in each node in the output. The overhead of repeatedly reading the system clock can slow down the query significantly on some systems, so it may be useful to set this parameter to FALSE when only actual row counts, and not exact times, are needed. Run time of the entire statement is always measured, even when node-level timing is turned off with this option. [...]
EXPLAIN ANALYZE measures on the server, using server time from the server OS, excluding network latency. But EXPLAIN adds some overhead to also output the query plan.
2. psql with \timing
Or use \timing in psql. Like Peter demonstrates.
The manual:
\timing [ on | off ]
With a parameter, turns displaying of how long each SQL statement takes on or off. Without a parameter, toggles the display between on and off. The display is in milliseconds; intervals longer than 1 second are also shown in minutes:seconds format, with hours and days fields added if needed.
Important difference: psql measures on the client using local time from the local OS, so the time includes network latency. This can be a negligible difference or huge depending on connection and volume of returned data.
3. Enable log_duration
This has probably the least overhead per measurement and produces the least distorted timings. But it's a little heavy-handed as you have to be superuser, have to adjust the server configuration, cannot just target the execution of a single query, and you have to read the server logs (unless you redirect to stdout).
The manual:
log_duration (boolean)
Causes the duration of every completed statement to be logged. The default is off. Only superusers can change this setting.
For clients using extended query protocol, durations of the Parse, Bind, and Execute steps are logged independently.
There are related settings like log_min_duration_statement.
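For instance (a sketch; the 100 ms threshold is an arbitrary example value):

-- Log the duration of every statement in the current session (requires superuser):
SET log_duration = on;

-- Or, server-wide, log only statements taking at least 100 ms:
ALTER SYSTEM SET log_min_duration_statement = '100ms';
SELECT pg_reload_conf();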
4. Precise manual measurement with clock_timestamp()
The manual:
clock_timestamp() returns the actual current time, and therefore its value changes even within a single SQL command.
filiprem provided a great way to get execution times for ad-hoc queries as exact as possible. On modern hardware, timing overhead should be insignificant but depending on the host OS it can vary wildly. Find out with the server application pg_test_timing.
Otherwise, you can mostly filter out the overhead like this:
DO
$do$
DECLARE
   _timing1  timestamptz;
   _start_ts timestamptz;
   _end_ts   timestamptz;
   _overhead numeric;  -- in ms
   _timing   numeric;  -- in ms
BEGIN
   _timing1  := clock_timestamp();
   _start_ts := clock_timestamp();
   _end_ts   := clock_timestamp();
   -- take minimum duration as conservative estimate
   _overhead := 1000 * extract(epoch FROM LEAST(_start_ts - _timing1
                                              , _end_ts   - _start_ts));

   _start_ts := clock_timestamp();
   PERFORM 1;  -- your query here, replacing the outer SELECT with PERFORM
   _end_ts   := clock_timestamp();

   -- RAISE NOTICE 'Timing overhead in ms = %', _overhead;
   RAISE NOTICE 'Execution time in ms = %', 1000 * (extract(epoch FROM _end_ts - _start_ts)) - _overhead;
END
$do$;
Take the time repeatedly (doing the bare minimum with 3 timestamps here) and pick the minimum interval as a conservative estimate for timing overhead. Also, executing the function clock_timestamp() a couple of times should warm it up (in case that matters for your OS).
After measuring the execution time of the payload query, subtract that estimated overhead to get closer to the actual time.
Of course, it's more meaningful for cheap queries to loop 100000 times or execute it on a table with 100000 rows if you can, to make distracting noise insignificant.
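For example, a loop like this (a sketch; the 100000 iterations are arbitrary) runs the payload many times in one measurement, after which you divide the reported time by the iteration count:

DO
$do$
BEGIN
   FOR i IN 1..100000 LOOP
      PERFORM 1;  -- your cheap query here, replacing SELECT with PERFORM
   END LOOP;
END
$do$;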
PostgreSQL is not Transact-SQL. These are two different things.
In PostgreSQL, this would be something along the lines of
DO $proc$
DECLARE
   StartTime timestamptz;
   EndTime   timestamptz;
   Delta     double precision;
BEGIN
   StartTime := clock_timestamp();
   PERFORM foo FROM bar;  /* Put your query here, replacing SELECT with PERFORM */
   EndTime := clock_timestamp();
   Delta := 1000 * (extract(epoch FROM EndTime) - extract(epoch FROM StartTime));
   RAISE NOTICE 'Duration in millisecs=%', Delta;
END;
$proc$;
On the other hand, measuring query time does not have to be this complicated. There are better ways:
In the psql command line client there is a \timing feature which measures query time on the client side (similar to the duration shown in the bottom-right corner of SQL Server Management Studio).
It's possible to record query duration in the server log (for every query, or only when it lasted longer than X milliseconds).
It's possible to collect server-side timing for any single statement using the EXPLAIN command:
EXPLAIN (ANALYZE, BUFFERS) YOUR QUERY HERE;