Alternating true/false flag for current time given step - sql

I have a query that is executed every X milliseconds. As part of the result I would like an alternating true/false flag that changes each time the query is executed.
Example
Sample query: select 1, <<boolean alternator>>;
1st execution returns: 1 | true
2nd execution returns: 1 | false
3rd execution returns: 1 | true
and so on. It does not matter if it returns true or false for the first time.
For cases when X is an odd number of seconds I have the following solution:
select
mod(right(extract(epoch from current_timestamp)::int::varchar,1)::int, 2) = 0
as alternator
This extracts the last digit of the epoch and then tests whether it is an even number. Because X is defined as an odd number of seconds, the test alternates from one execution to the next.
What would work in the same way when X is different - even, or not a whole number of seconds? I would like it to work for X like 500ms, 1200ms, 2000ms, ...
Note: I plan to use this with PostgreSQL.
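For illustration, the same idea can be generalized to millisecond steps by counting whole X-sized intervals since the epoch instead of extracting a digit - a sketch, assuming X = 500 ms:
select mod((extract(epoch from clock_timestamp()) * 1000)::bigint / 500, 2) = 0
    as alternator;
Each execution lands in the next 500 ms bucket, so the parity flips; an execution delayed past a bucket boundary can repeat the flag, which is what the stateful approaches below avoid.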

I suggest a dedicated SEQUENCE.
CREATE SEQUENCE tf_seq MINVALUE 0 MAXVALUE 1 START 0 CYCLE;
Each call to nextval() alternates between 0 and 1. You can cast to boolean:
0::bool = FALSE
1::bool = TRUE
So:
SELECT nextval('tf_seq'::regclass)::int::bool;
To keep other roles from messing with the state of the sequence, only
GRANT USAGE ON SEQUENCE tf_seq TO $dedicated_role;. Run your query as that role or create a function with SECURITY DEFINER and ALTER FUNCTION foo() OWNER TO $dedicated_role;
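A minimal sketch of that SECURITY DEFINER variant (the function name tf_flip and the role dedicated_role are placeholders):
CREATE FUNCTION tf_flip() RETURNS boolean AS
$$ SELECT nextval('tf_seq'::regclass)::int::bool $$
LANGUAGE sql SECURITY DEFINER;

ALTER FUNCTION tf_flip() OWNER TO dedicated_role;
REVOKE ALL ON SEQUENCE tf_seq FROM PUBLIC;
GRANT USAGE ON SEQUENCE tf_seq TO dedicated_role;
Callers then only need EXECUTE on tf_flip(); the sequence itself stays locked down.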
Or, simpler yet, just make it the column default and completely ignore it in your inserts:
ALTER TABLE foo ALTER COLUMN bool_col
SET DEFAULT nextval('tf_seq'::regclass)::int::bool;
You need to grant USAGE on the sequence to roles that can insert.
Every next row gets the flipped value automatically.
The usual notes for sequences apply. Like, if you roll back an INSERT, the sequence stays flipped. Sequence states are never rolled back.

A temporary table can save the boolean state (the two setup statements run once per session):
create temporary table t (b boolean);
insert into t (b) values (true);
Then each execution flips the flag:
with u as (
    update t
    set b = not b
    returning b
)
select 1, b
from t;
The unreferenced CTE u is still executed (data-modifying CTEs always run), and the outer SELECT sees the table as it was before the CTE's UPDATE, so it returns the pre-flip value; the result still alternates on every execution. Select from u instead to get the post-flip value.

I would do the same (boolean flipping with not b) with a PL/Tcl function.
Advantage: the variable is cached in your session's Tcl interpreter. Disadvantage: any PL/* function has call overhead.
Test table, insert some values:
strobel=# create table test (i integer);
CREATE TABLE
strobel=# insert into test (select * from generate_series(1, 10));
INSERT 0 10
The function:
create or replace function flipper () returns boolean as $$
    # create the session-global flag on first call
    if {![info exists ::flag]} {set ::flag false}
    # flip the flag and return its new value
    return [set ::flag [expr {! $::flag}]]
$$ language pltcl volatile;
The test:
strobel=# select *, flipper() from test;
i | flipper
----+---------
1 | t
2 | f
3 | t
4 | f
5 | t
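For completeness, on PostgreSQL 9.6 and later the same per-session state can live in a custom setting, with no extra language installed - a sketch using an arbitrary setting name myapp.flag (the two-argument current_setting() returns NULL when the setting is unset):
select set_config('myapp.flag',
       (not coalesce(nullif(current_setting('myapp.flag', true), ''), 'false')::bool)::text,
       false)::bool as alternator;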

Related

Accessing specific row value by id in a scalar SQL function

How can I use a specific value of the field 'ranking' from table 'course', looked up by 'course_id', in a function? I need to return 1 if the course's ranking is higher than the value given as a parameter and 0 if it is lower. So, I need to get the data from the table somehow based on course_id as a parameter.
CREATE OR ALTER FUNCTION dbo.f_rank (@course_id INT, @user_ranking INT)
RETURNS INT
AS
BEGIN
    RETURN
        CASE
            WHEN @course_id.ranking > @user_ranking THEN 1
            ELSE 0
        END
END
After the function returns 0 or 1 I need to display:
If the function call returns 1, display 'Ranking of <course_name> is above score'; otherwise 'Ranking of <course_name> is below score'.
Sample data:
course_id | course_name | ranking
----------+-------------+--------
        1 | GIT         | 10
        2 | CSS         | 2
        3 | C++         | 6
I need to compare a ranking of course_id = 1 for example which is 10 with the random number given as a parameter. course_id is also given as a parameter.
For example:
If the user chooses as input params (course_id = 1 and user_ranking = 5)
Expected result:
'Ranking of GIT is above score' - if function returns 1
'Ranking of GIT is below score' - if function returns 0
I assume that you probably want something like this (no expected results were given when this was written):
--Note you will have to DROP your old function if already exists as a scalar function first.
--You cannot ALTER a scalar function to be a table function.
CREATE OR ALTER FUNCTION dbo.f_rank (@course_id INT, @user_ranking INT)
RETURNS table
AS RETURN
    SELECT CONVERT(bit, CASE WHEN C.ranking > @user_ranking THEN 1 ELSE 0 END) AS SomeColumnAlias --Obviously give this a proper name
    FROM dbo.Course C
    WHERE C.Course_id = @course_id;
GO
As mentioned, I use an inline table-valued function, as this will likely perform better; you don't mention your version of SQL Server, so I don't know if you are on 2019 and could rely on an inlineable scalar function instead.
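A possible usage sketch for the function above, producing the message from the question (the alias SomeColumnAlias comes from the answer; the table and column names are assumed from the sample data):
SELECT 'Ranking of ' + C.course_name +
       CASE WHEN F.SomeColumnAlias = 1 THEN ' is above score'
            ELSE ' is below score' END AS message
FROM dbo.Course AS C
CROSS APPLY dbo.f_rank(C.course_id, 5) AS F
WHERE C.course_id = 1;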

Oracle performance: query executing multiple identical function calls

Is it possible for Oracle to reuse the result of a function when it is called in the same query (transaction?) without the use of the function result cache?
The application I am working with is heavily reliant on Oracle functions. Many queries end up executing the exact same functions multiple times.
A typical example would be:
SELECT my_package.my_function(my_id),
my_package.my_function(my_id) / 24,
my_package.function_also_calling_my_function(my_id)
FROM my_table
WHERE my_table.id = my_id;
I have noticed that Oracle always executes each of these functions, not realizing that the same function was called just a second ago in the same query. It is possible that some elements in the function get cached, resulting in a slightly faster return. This is not relevant to my question as I want to avoid the entire second or third execution.
Assume that the functions are fairly resource-consuming and that these functions may call more functions, basing their result on tables that are reasonably large and with frequent updates (a million records, updates with say 1000 updates per hour). For this reason it is not possible to use Oracle's Function Result Cache.
Even though the data is changing frequently, I expect the result of these functions to be the same when they are called from the same query.
Is it possible for Oracle to reuse the result of these functions and how? I am using Oracle11g and Oracle12c.
Below is an example (just a random nonsense function to illustrate the problem):
-- Takes 200 ms
SELECT test_package.testSpeed('STANDARD', 'REGEXP_COUNT')
FROM dual;
-- Takes 400ms
SELECT test_package.testSpeed('STANDARD', 'REGEXP_COUNT')
, test_package.testSpeed('STANDARD', 'REGEXP_COUNT')
FROM dual;
Used functions:
CREATE OR REPLACE PACKAGE test_package IS
FUNCTION testSpeed (p_package_name VARCHAR2, p_object_name VARCHAR2)
RETURN NUMBER;
END;
/
CREATE OR REPLACE PACKAGE BODY test_package IS
FUNCTION testSpeed (p_package_name VARCHAR2, p_object_name VARCHAR2)
RETURN NUMBER
IS
ln_total NUMBER;
BEGIN
SELECT SUM(position) INTO ln_total
FROM all_arguments
WHERE package_name = 'STANDARD'
AND object_name = 'REGEXP_COUNT';
RETURN ln_total;
END testSpeed;
END;
/
Add an inline view and a ROWNUM to prevent Oracle from rewriting the query into a single query block and executing the functions multiple times.
Sample function and demonstration of the problem
create or replace function wait_1_second return number is
begin
execute immediate 'begin dbms_lock.sleep(1); end;';
-- ...
-- Do something here to make caching impossible.
-- ...
return 1;
end;
/
--1 second
select wait_1_second() from dual;
--2 seconds
select wait_1_second(), wait_1_second() from dual;
--3 seconds
select wait_1_second(), wait_1_second() , wait_1_second() from dual;
Simple query changes that do NOT work
Both of these methods still take 2 seconds, not 1.
select x, x
from
(
select wait_1_second() x from dual
);
with execute_function as (select wait_1_second() x from dual)
select x, x from execute_function;
Forcing Oracle to execute in a specific order
It's difficult to tell Oracle "execute this code by itself, don't do any predicate pushing, merging, or other transformations on it". There are hints for each of those optimizations, but they are difficult to use. There are a few ways to disable those transformations; adding an extra ROWNUM is usually the easiest.
--Only takes 1 second
select x, x
from
(
select wait_1_second() x, rownum
from dual
);
It's hard to see exactly where the functions get evaluated. But these explain plans show how the ROWNUM causes the inline view to run separately.
explain plan for select x, x from (select wait_1_second() x from dual);
select * from table(dbms_xplan.display(format=>'basic'));
Plan hash value: 1388734953
---------------------------------
| Id | Operation | Name |
---------------------------------
| 0 | SELECT STATEMENT | |
| 1 | FAST DUAL | |
---------------------------------
explain plan for select x, x from (select wait_1_second() x, rownum from dual);
select * from table(dbms_xplan.display(format=>'basic'));
Plan hash value: 1143117158
---------------------------------
| Id | Operation | Name |
---------------------------------
| 0 | SELECT STATEMENT | |
| 1 | VIEW | |
| 2 | COUNT | |
| 3 | FAST DUAL | |
---------------------------------
You can try the DETERMINISTIC keyword to mark functions as pure. Whether this actually improves performance is another question, though.
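For reference, the keyword goes in the function spec; add_tax below is just a hypothetical illustration (whether Oracle then skips repeated calls still depends on version and plan):
CREATE OR REPLACE FUNCTION add_tax (p_amount NUMBER)
RETURN NUMBER DETERMINISTIC
IS
BEGIN
    -- DETERMINISTIC is a promise: the same input must always yield the same output
    RETURN p_amount * 1.2;
END;
/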
Update:
I don't know how realistic your example above is, but in theory you can always try to restructure your SQL so it knows about repeated function calls (actually repeated values). Kind of like
select x,x from (
SELECT test_package.testSpeed('STANDARD', 'REGEXP_COUNT') x
FROM dual
)
Use an in-line view (here via a WITH clause) so each function is written only once:
with get_functions as(
SELECT my_package.my_function(my_id) as func_val,
my_package.function_also_calling_my_function(my_id) func_val_2
FROM my_table
WHERE my_table.id = my_id
)
select func_val,
func_val / 24 as func_val_adj,
func_val_2
from get_functions;
If you want to eliminate the call for item 3, instead pass the result of func_val to the third function.
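A further technique worth naming, separate from the answers above: Oracle's scalar subquery caching. Wrapping each call in its own (SELECT ... FROM dual) lets Oracle cache results for repeated identical inputs within one query - a sketch against the question's example:
SELECT (SELECT my_package.my_function(my_id) FROM dual) AS func_val,
       (SELECT my_package.my_function(my_id) FROM dual) / 24 AS func_val_adj
FROM my_table
WHERE my_table.id = my_id;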

Why does a missing primary key/unique key cause deadlock issues on upsert?

I came across a schema and an upsert stored procedure that was causing deadlock issues. I have a general idea about why this is causing deadlock and how to fix it. I can reproduce it but I don't have a clear understanding of the sequence of steps that is causing it. It would be great if someone can explain clearly why this is causing deadlock.
Here is the schema and the stored procedures. This code is being executed on PostgreSQL 9.2.2.
CREATE TABLE counters (
count_type INTEGER NOT NULL,
count_id INTEGER NOT NULL,
count INTEGER NOT NULL
);
CREATE TABLE primary_relation (
id INTEGER PRIMARY KEY,
a_counter INTEGER NOT NULL DEFAULT 0
);
INSERT INTO primary_relation
SELECT i FROM generate_series(1,5) AS i;
CREATE OR REPLACE FUNCTION increment_count(ctype integer, cid integer, i integer) RETURNS VOID
AS $$
BEGIN
    LOOP
        UPDATE counters
        SET count = count + i
        WHERE count_type = ctype AND count_id = cid;
        IF FOUND THEN
            RETURN;
        END IF;
        BEGIN
            INSERT INTO counters (count_type, count_id, count)
            VALUES (ctype, cid, i);
            RETURN;
        EXCEPTION WHEN OTHERS THEN
            -- swallow the error and loop to retry the UPDATE
        END;
    END LOOP;
END;
$$
LANGUAGE PLPGSQL;
CREATE OR REPLACE FUNCTION update_primary_a_count(ctype integer) RETURNS VOID
AS $$
WITH deleted_counts_cte AS (
    DELETE
    FROM counters
    WHERE count_type = ctype
    RETURNING *
), rollup_cte AS (
    SELECT count_id, SUM(count) AS count
    FROM deleted_counts_cte
    GROUP BY count_id
    HAVING SUM(count) <> 0
)
UPDATE primary_relation
SET a_counter = a_counter + rollup_cte.count
FROM rollup_cte
WHERE primary_relation.id = rollup_cte.count_id
$$ LANGUAGE SQL;
And here is a python script to reproduce the deadlock.
import os
import random
import time
import psycopg2
COUNTERS = 5
THREADS = 10
ITERATIONS = 500
def increment():
    outf = open('synctest.out.%d' % os.getpid(), 'w')
    conn = psycopg2.connect(database="test")
    cur = conn.cursor()
    for i in range(0, ITERATIONS):
        time.sleep(random.random())
        start = time.time()
        cur.execute("SELECT increment_count(0, %s, 1)", [random.randint(1, COUNTERS)])
        conn.commit()
        outf.write("%f\n" % (time.time() - start))
    conn.close()
    outf.close()

def update(n):
    outf = open('synctest.update', 'w')
    conn = psycopg2.connect(database="test")
    cur = conn.cursor()
    for i in range(0, n):
        time.sleep(random.random())
        start = time.time()
        cur.execute("SELECT update_primary_a_count(0)")
        conn.commit()
        outf.write("%f\n" % (time.time() - start))
    conn.close()

pids = []
for i in range(THREADS):
    pid = os.fork()
    if pid != 0:
        print 'Process %d spawned' % pid
        pids.append(pid)
    else:
        print 'Starting child %d' % os.getpid()
        increment()
        print 'Exiting child %d' % os.getpid()
        os._exit(0)

update(ITERATIONS)

for pid in pids:
    print "waiting on %d" % pid
    os.waitpid(pid, 0)

# cleanup
update(1)
I recognize that one issue with this is that the upsert can produce duplicate rows (with multiple writers), which will likely result in some double counting. But why does this result in deadlock?
The error I get from PostgreSQL is something like the following:
process 91924 detected deadlock while waiting for ShareLock on transaction 4683083 after 100.559 ms",,,,,"SQL statement ""UPDATE counters
And the client spews something like this:
psycopg2.extensions.TransactionRollbackError: deadlock detected
DETAIL: Process 91924 waits for ShareLock on transaction 4683083; blocked by process 91933.
Process 91933 waits for ShareLock on transaction 4683079; blocked by process 91924.
HINT: See server log for query details.CONTEXT: SQL statement "UPDATE counters
SET count = count + i
WHERE count_type = ctype AND count_id = cid"
PL/pgSQL function increment_count(integer,integer,integer) line 4 at SQL statement
To fix the issue, you need to add a primary key like so:
ALTER TABLE counters ADD PRIMARY KEY (count_type, count_id);
Any insight would be greatly appreciated. Thanks!
Because of the primary key, the table can never hold more than one row per (count_type, count_id) pair, so no row is repeated.
When you remove the primary key, some threads lag behind and the number of rows grows, and at the same time rows get repeated. Once rows are repeated, each UPDATE touches several rows and takes longer, and two or more transactions end up trying to update the same row(s).
Open a new terminal and type:
watch --interval 1 "psql -tc \"select count(*) from counters\" test"
Try this with and without the primary key. When you get the first deadlock, look at the results of the query above. In my case this is what I am left with in the table counters:
test=# select * from counters order by 2;
count_type | count_id | count
------------+----------+-------
0 | 1 | 735
0 | 1 | 733
0 | 1 | 735
0 | 1 | 735
0 | 2 | 916
0 | 2 | 914
0 | 2 | 914
0 | 3 | 882
0 | 4 | 999
0 | 5 | 691
0 | 5 | 692
(11 rows)
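To make the lock-ordering hazard concrete, here is one hypothetical interleaving once a pair such as (0, 1) exists in two physical rows r1 and r2 (an illustration, not a captured trace):
-- A: UPDATE counters SET count = count + 1
--      WHERE count_type = 0 AND count_id = 1;   -- locks r1, then waits for r2
-- B: DELETE FROM counters WHERE count_type = 0
--      RETURNING *;                             -- locks r2, then waits for r1
-- Each transaction waits for the other's row lock;
-- PostgreSQL detects the cycle and aborts one: "deadlock detected".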
Your code is the perfect recipe for a race condition (multiple threads, random sleeps).
The problem is most probably due to locking issues. Since you don't mention the locking mode, I'm going to assume page-based locks, so you get the following scenario:
Thread 1 starts and begins to insert records; let's say it locks page 1 and next needs page 2.
Thread 2 starts at the same time as thread 1, but it locks page 2 first and needs page 1 next.
Both threads are now waiting on each other to complete, so you have a deadlock.
Now, why does a PK fix it?
Because locking is done via the index first: the PK is unique on inserts, so all threads serialize on the index, and updates also access the row via the index, so each record is locked based on its PK. This mitigates the race condition.
At some point one user is waiting on a lock another user has, while the first user owns a lock that the second user wants. This is what causes a deadlock.
At a guess, it's because without a primary key (or in fact any key) the UPDATE on counters in your increment procedure has to read the whole table. This is going to leave locks strewn about and open the way for a deadlock. I'm not a Postgres user, so I don't know the details of exactly when it places locks, but I'm pretty sure that this is what is happening.
Putting a PK on counters makes it possible for the DB to target the rows it reads accurately and take the minimum number of locks. (primary_relation already has its PK on id, so updates against it can be targeted via that index.)
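One closing note beyond this thread's PostgreSQL 9.2 context: on 9.5 and later the whole retry loop can be dropped in favor of INSERT ... ON CONFLICT, assuming the primary key above is in place - a sketch:
INSERT INTO counters (count_type, count_id, count)
VALUES (ctype, cid, i)
ON CONFLICT (count_type, count_id)
DO UPDATE SET count = counters.count + EXCLUDED.count;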

How to avoid multiple function evals with the (func()).* syntax in a query?

Context
When a function returns a TABLE or a SETOF composite-type, like this one:
CREATE FUNCTION func(n int) returns table(i int, j bigint) as $$
BEGIN
RETURN QUERY select 1,n::bigint
union all select 2,n*n::bigint
union all select 3,n*n*n::bigint;
END
$$ language plpgsql;
the results can be accessed by various methods:
select * from func(3) will produce these output columns :
i | j
---+---
1 | 3
2 | 9
3 | 27
select func(3) will produce only one output column of ROW type.
func
-------
(1,3)
(2,9)
(3,27)
select (func(3)).* will produce the same output as #1:
i | j
---+---
1 | 3
2 | 9
3 | 27
When the function argument comes from a table or a subquery, the syntax #3 is the only possible one, as in:
select N, (func(N)).*
from (select 2 as N union select 3 as N) s;
or as in this related answer. If we had LATERAL JOIN we could use that, but until PostgreSQL 9.3 is out, it's not supported, and the previous versions will still be used for years anyway.
Problem
The problem with syntax #3 is that the function is called as many times as there are columns in the result. There's no apparent reason for that, but it happens.
We can see it in version 9.2 by adding a RAISE NOTICE 'called for %', n in the function. With the query above, it outputs:
NOTICE: called for 2
NOTICE: called for 2
NOTICE: called for 3
NOTICE: called for 3
Now, if the function is changed to return 4 columns, like this:
CREATE FUNCTION func(n int) returns table(i int, j bigint,k int, l int) as $$
BEGIN
raise notice 'called for %', n;
RETURN QUERY select 1,n::bigint,1,1
union all select 2,n*n::bigint,1,1
union all select 3,n*n*n::bigint,1,1;
END
$$ language plpgsql stable;
then the same query outputs:
NOTICE: called for 2
NOTICE: called for 2
NOTICE: called for 2
NOTICE: called for 2
NOTICE: called for 3
NOTICE: called for 3
NOTICE: called for 3
NOTICE: called for 3
2 function calls were needed, 8 were actually made. The ratio is the number of output columns.
With syntax #2, which produces the same result except for the output column layout, these multiple calls don't happen:
select N, func(N)
from (select 2 as N union select 3 as N) s;
gives:
NOTICE: called for 2
NOTICE: called for 3
followed by the 6 resulting rows:
n | func
---+------------
2 | (1,2,1,1)
2 | (2,4,1,1)
2 | (3,8,1,1)
3 | (1,3,1,1)
3 | (2,9,1,1)
3 | (3,27,1,1)
Questions
Is there a syntax or a construct with 9.2 that would achieve the expected result by doing only the minimum required function calls?
Bonus question: why do the multiple evaluations happen at all?
You can wrap it up in a subquery but that's not guaranteed safe without the OFFSET 0 hack. In 9.3, use LATERAL. The problem is caused by the parser effectively macro-expanding * into a column list.
Workaround
Where:
SELECT (my_func(x)).* FROM some_table;
will evaluate my_func n times for n result columns from the function, whereas this formulation:
SELECT (mf).* FROM (
SELECT my_func(x) AS mf FROM some_table
) sub;
generally will not, and tends not to add an additional scan at runtime. To guarantee that multiple evaluation won't be performed you can use the OFFSET 0 hack or abuse PostgreSQL's failure to optimise across CTE boundaries:
SELECT (mf).* FROM (
SELECT my_func(x) AS mf FROM some_table OFFSET 0
) sub;
or:
WITH tmp(mf) AS (
SELECT my_func(x) FROM some_table
)
SELECT (mf).* FROM tmp;
In PostgreSQL 9.3 you can use LATERAL to get a saner behaviour:
SELECT mf.*
FROM some_table
LEFT JOIN LATERAL my_func(some_table.x) AS mf ON true;
LEFT JOIN LATERAL ... ON true retains all rows like the original query, even if the function call returns no row.
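If it's acceptable to drop source rows for which the function returns no row, the plain lateral form works as well:
SELECT mf.*
FROM some_table
CROSS JOIN LATERAL my_func(some_table.x) AS mf;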
Demo
Create a function that isn't inlineable as a demonstration:
CREATE OR REPLACE FUNCTION my_func(integer)
RETURNS TABLE(a integer, b integer, c integer) AS $$
BEGIN
RAISE NOTICE 'my_func(%)',$1;
RETURN QUERY SELECT $1, $1, $1;
END;
$$ LANGUAGE plpgsql;
and a table of dummy data:
CREATE TABLE some_table AS SELECT x FROM generate_series(1,10) x;
then try the above versions. You'll see that the first form raises three notices per input row; the others raise only one.
Why?
Good question. It's horrible.
It looks like:
(func(x)).*
is expanded as:
(func(x)).i, (func(x)).j, (func(x)).k, (func(x)).l
in parsing, according to a look at debug_print_parse, debug_print_rewritten and debug_print_plan. The (trimmed) parse tree looks like this:
:targetList (
   {TARGETENTRY
   :expr
      {FIELDSELECT
      :arg
         {FUNCEXPR
         :funcid 57168
         ...
         }
      :fieldnum 1
      :resulttype 23
      :resulttypmod -1
      :resultcollid 0
      }
   :resno 1
   :resname i
   ...
   }
   {TARGETENTRY
   :expr
      {FIELDSELECT
      :arg
         {FUNCEXPR
         :funcid 57168
         ...
         }
      :fieldnum 2
      :resulttype 20
      :resulttypmod -1
      :resultcollid 0
      }
   :resno 2
   :resname j
   ...
   }
   {TARGETENTRY
   :expr
      {FIELDSELECT
      :arg
         {FUNCEXPR
         :funcid 57168
         ...
         }
      :fieldnum 3
      :...
      }
   :resno 3
   :resname k
   ...
   }
   {TARGETENTRY
   :expr
      {FIELDSELECT
      :arg
         {FUNCEXPR
         :funcid 57168
         ...
         }
      :fieldnum 4
      ...
      }
   :resno 4
   :resname l
   ...
   }
)
So basically, we're using a dumb parser hack to expand wildcards by cloning nodes.

Selecting positive aggregate value and ignoring negative in Postgres SQL

I must apply a certain transformation fn(argument). Here argument is equal to value, except when value goes negative: after the first negative value you "wait" until it has summed up with consecutive values and the running sum becomes positive again, then you apply fn(argument). See the table I want to get:
value argument
---------------------
2 2
3 3
-10 0
4 0
3 0
10 7
1 1
I could have summed all values and applied fn to the sum, but fn can differ from row to row, and it is essential to know the row number to choose the concrete fn.
As I want a PostgreSQL solution, it looks like window functions fit, but I am not experienced enough to write the expression yet. In fact, I am new to "thinking in SQL", unfortunately. I guess this could easily be done imperatively, but I do not want to write a stored procedure yet.
I suppose I'm late, but this may help someone:
select
value,
greatest(0, value) as argument
from your_table;
This doesn't really fit any of the predefined aggregate functions. You probably need to write your own. Note that in PostgreSQL, aggregate functions can be used as window functions, and in fact that is the only way to write window functions in anything other than C, as of 9.0.
You can write a function that tracks the state of "summing" the values, except that it always returns the input value when the current "sum" is positive, and keeps adding while the "sum" is negative. Then you simply take the greater of this sum or zero. To wit:
-- accumulator function: first arg is state, second arg is input
create or replace function ouraggfunc(int, int)
returns int immutable language plpgsql as $$
begin
raise info 'ouraggfunc: %, %', $1, $2; -- to help you see what's going on
-- get started by returning the first value ($1 is null - no state - first row)
if $1 is null then
return $2;
end if;
-- if our state is negative, we're summing until it becomes positive
-- otherwise, we're just returning the input
if $1 < 0 then
return $1 + $2;
else
return $2;
end if;
end;
$$;
You need to create an aggregate function to invoke this accumulator:
create aggregate ouragg(basetype = int, sfunc = ouraggfunc, stype = int);
This defines that the aggregate takes integers as input and stores its state as an integer.
I copied your example into a table:
steve@steve[local] =# create table t(id serial primary key, value int not null, argument int not null);
NOTICE: CREATE TABLE will create implicit sequence "t_id_seq" for serial column "t.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "t_pkey" for table "t"
CREATE TABLE
steve@steve[local] =# copy t(value, argument) from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 2 2
>> 3 3
>> -10 0
>> 4 0
>> 3 0
>> 10 7
>> 1 1
>> \.
And you can now have those values produced by using the aggregate function with a window clause:
steve@steve[local] =# select value, argument, ouragg(value) over(order by id) from t;
INFO: ouraggfunc: <NULL>, 2
INFO: ouraggfunc: 2, 3
INFO: ouraggfunc: 3, -10
INFO: ouraggfunc: -10, 4
INFO: ouraggfunc: -6, 3
INFO: ouraggfunc: -3, 10
INFO: ouraggfunc: 7, 1
value | argument | ouragg
-------+----------+--------
2 | 2 | 2
3 | 3 | 3
-10 | 0 | -10
4 | 0 | -6
3 | 0 | -3
10 | 7 | 7
1 | 1 | 1
(7 rows)
So as you can see, the final step is to take the output of the aggregate if it is positive, and zero otherwise. This can be done by wrapping the query, or by writing a function to do it:
create function positive(int) returns int immutable strict language sql as
$$ select case when $1 > 0 then $1 else 0 end $$;
and now:
select value, argument, positive(ouragg(value) over(order by id)) as raw_agg from t
This produces the arguments for the function that you specified in the question.
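Equivalently, greatest() (as in the short answer earlier in this thread) can replace the positive() helper when applied to the aggregate's output:
select value, argument, greatest(0, ouragg(value) over(order by id)) as raw_agg
from t;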