From the Postgres documentation (https://www.postgresql.org/docs/9.6/sql-createaggregate.html) I find it hard to deduce what the parameter SORTOP does.
Is this option only applicable to an ordered-set aggregate?
Concretely, I'm trying to create an aggregate function that finds the most frequent number in a column of numbers. I thought specifying the SORTOP option would sort the data before executing my self-defined aggregate function, but this doesn't seem to be the case.
Here is my current implementation, which only works when the input data is sorted.
It loops over the rows and keeps track of the longest run of identical numbers seen so far (the largfreq variables in the state) and the number of repetitions of the value it is currently on (the currfreq variables in the state).
CREATE TYPE largfreq_state AS (
    largfreq_val INT,
    largfreq INT,
    currfreq_val INT,
    currfreq INT
);

CREATE FUNCTION slargfreq(state largfreq_state, x INT) RETURNS largfreq_state AS $$
BEGIN
    if state.currfreq_val <> x then
        if state.currfreq >= state.largfreq then
            state.largfreq = state.currfreq;
            state.largfreq_val = state.currfreq_val;
        end if;
        state.currfreq = 1;
        state.currfreq_val = x;
    else
        state.currfreq = state.currfreq + 1;
    end if;
    return state;
END;
$$ language plpgsql;

CREATE FUNCTION flargfreq(state largfreq_state) RETURNS INT AS $$
BEGIN
    if state.currfreq >= state.largfreq then
        return state.currfreq_val;
    else
        return state.largfreq_val;
    end if;
END;
$$ language plpgsql;

CREATE AGGREGATE largfreq(INT) (
    SFUNC = slargfreq,
    STYPE = largfreq_state,
    FINALFUNC = flargfreq,
    INITCOND = '(0, 0, 0, 0)',
    SORTOP = <
);
This is well explained in the documentation:
Aggregates that behave like MIN or MAX can sometimes be optimized by looking into an index instead of scanning every input row. If this aggregate can be so optimized, indicate it by specifying a sort operator. The basic requirement is that the aggregate must yield the first element in the sort ordering induced by the operator; in other words:
SELECT agg(col) FROM tab;
must be equivalent to:
SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
So SORTOP is only an optimization hint for MIN/MAX-style aggregates that can be answered from an index; it does not cause the input rows to be sorted before they are fed to your transition function. If your aggregate needs sorted input, sort it at the call site with an ORDER BY inside the aggregate call.
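For example (a minimal sketch, reusing the tab and col names from the documentation excerpt above), this is the form that actually delivers the rows to your transition function in sorted order, which is what the largfreq implementation needs:
SELECT largfreq(col ORDER BY col) FROM tab;
Written this way, the aggregate no longer depends on the physical order of the rows in the table.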
I am trying to write to an array in PL/SQL, and I always get a "subscript outside of limit" error. I've seen similar posts and implemented everything based on those answers, but I can't seem to find what I'm doing wrong. The line giving the error is "arr_quartosLivres(counter) := q.id;". I've tried to extend the array and it still doesn't work; either way, the loop only runs 21 times (because there are only 21 rows in the table quarto), so it shouldn't even need to be extended. Any help would be highly appreciated! Thank you
SET SERVEROUTPUT ON;
DECLARE
    p_idReserva reserva.id%type := 408;
    v_dataEntradaReserva reserva.data_entrada%type;
    counter integer := 0;
    type arr_aux IS varray(21) of quarto.id%type;
    arr_quartosLivres arr_aux := arr_aux();
BEGIN
    SELECT data_entrada INTO v_dataEntradaReserva FROM reserva WHERE id = p_idreserva;
    FOR q IN (SELECT * FROM quarto)
    LOOP
        BEGIN
            IF isQuartoIndisponivel(q.id, v_dataEntradaReserva)
            THEN DBMS_OUTPUT.PUT_LINE('nao disponivel' || counter);
                arr_quartosLivres(counter) := q.id;
            ELSE DBMS_OUTPUT.PUT_LINE('disponivel' || counter);
            END IF;
            counter := counter + 1;
        END;
    END LOOP;
END;
The index values for a varray begin at 1; your logic is trying to use index value 0, hence the subscript error. Note that EXTEND does apply to varrays: the declared size (21 here) is only the maximum, and a varray initialized with the empty constructor starts with zero elements, so you still have to EXTEND it (or initialize it with elements) before assigning by index. You have a few options: initialize counter to 1 instead of 0, or increment it before using it as an index. But as it stands you increment counter on every pass through the loop, even when the IF condition is false and counter is not used as an index, which would leave gaps in the array. You are really using counter for two different purposes: counting rows processed and indexing into the array. Since a row's value may not be put into the array, the third (and cleanest) option is to introduce a separate variable for the index. Finally, there is no need for the inner BEGIN ... END block in the loop.
declare
    p_idreserva reserva.id%type := 408;
    v_dataentradareserva reserva.data_entrada%type;
    counter integer := 0;
    type arr_aux is varray(21) of quarto.id%type;
    arr_quartoslivres arr_aux := arr_aux();
    varray_index integer := 1; -- separate index variable for the varray
begin
    select data_entrada into v_dataentradareserva from reserva where id = p_idreserva;
    for q in (select * from quarto)
    loop
        if isquartoindisponivel(q.id, v_dataentradareserva)
        then
            dbms_output.put_line('nao disponivel' || counter || ' at index ' || varray_index);
            arr_quartoslivres.extend;  -- make room in the varray before assigning by index
            arr_quartoslivres(varray_index) := q.id;
            varray_index := varray_index + 1;
        else
            dbms_output.put_line('disponivel' || counter);
        end if;
        counter := counter + 1;
    end loop;
end;
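If you prefer, you can also drop the separate index variable and let the collection track its own size; a small variation on the IF branch above, under the same assumptions as that block:
if isquartoindisponivel(q.id, v_dataentradareserva)
then
    dbms_output.put_line('nao disponivel' || counter);
    arr_quartoslivres.extend;
    arr_quartoslivres(arr_quartoslivres.count) := q.id;  -- COUNT is the index of the slot just added by EXTEND
else
    dbms_output.put_line('disponivel' || counter);
end if;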
I have this code:
declare
    sName varchar(25);
    iRank number := 0;
    sDesc varchar(510);
    cursor q is
        SELECT *
        FROM trec_topics ORDER BY num;
BEGIN
    for ql in q
    loop
        sDesc := replace(replace(replace(ql.title, '?', '{?}'), ')', '{)}'), '(', '{(}');
        --dbms_output.put_line(ql.num||'-'||sDesc);
        declare
            cursor c is
                SELECT /*+ FIRST_ROWS(100) */ docno,
                       CASE
                           WHEN SCORE(10) >= SCORE(20) THEN SCORE(10)
                           ELSE SCORE(20)
                       END AS SCORE
                FROM txt_search_docs
                WHERE CONTAINS(txt, 'DEFINESCORE(ql.title, OCCURRENCE)', 10) > 0 OR
                      CONTAINS(txt, 'DEFINESCORE(sDesc, OCCURRENCE)', 20) > 0
                order by SCORE desc;
        begin
            iRank := 1;
            for c1 in c
            loop
                dbms_output.put_line(ql.num||' Q0 '||c1.docno||' '||lpad(iRank,3, '0')||' '||lpad(c1.score, 2, '0')||' myUser');
                iRank := iRank + 1;
                exit when c%rowcount = 100;
            end loop;
        end;
    end loop;
end;
As you can see, I'm selecting from two different tables; however, I need to change the standard score, as it did not perform well. I'm trying to use the DEFINESCORE operator, which has the format 'DEFINESCORE(query_term, scoring_expression)'.
How can I reference my table columns within this operator? That is, I need to pass my column values in place of "query_term", since there are several documents to search. The way I'm calling it now, it searches for the literal text ql.title.
Does anyone have a suggestion to help me with this problem?
I finally managed to solve it.
The solution was to:
create a variable: topics varchar(525);
store the escaped column value in it: topics := replace(replace(replace(ql.title, '?', '{?}'), ')', '{)}'), '(', '{(}');
and then concatenate it into the CONTAINS clause: FROM txt_search_docs WHERE CONTAINS(txt, 'DEFINESCORE((''' || topics || '''), OCCURRENCE)', 1) > 0
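For context, here is roughly how that line sits inside the inner cursor from the original block. This is only a sketch based on the snippets above; it keeps the single score label 1 from the line above and assumes topics has already been assigned from ql.title before the cursor is opened:
declare
    cursor c is
        SELECT /*+ FIRST_ROWS(100) */ docno, SCORE(1) AS score
        FROM txt_search_docs
        WHERE CONTAINS(txt, 'DEFINESCORE((''' || topics || '''), OCCURRENCE)', 1) > 0
        order by score desc;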
I've seen a bunch of different solutions on StackOverflow that span many years and many Postgres versions, but with some of the newer features like gen_random_bytes I want to ask again to see if there is a simpler solution in newer versions.
Given IDs which contain a-zA-Z0-9, and vary in size depending on where they're used, like...
bTFTxFDPPq
tcgHAdW3BD
IIo11r9J0D
FUW5I8iCiS
uXolWvg49Co5EfCo
LOscuAZu37yV84Sa
YyrbwLTRDb01TmyE
HoQk3a6atGWRMCSA
HwHSZgGRStDMwnNXHk3FmLDEbWAHE1Q9
qgpDcrNSMg87ngwcXTaZ9iImoUmXhSAv
RVZjqdKvtoafLi1O5HlvlpJoKzGeKJYS
3Rls4DjWxJaLfIJyXIEpcjWuh51aHHtK
(Like the IDs that Stripe uses.)
How can you generate them randomly and safely (as far as reducing collisions and reducing predictability goes) with an easy way to specify different lengths for different use cases, in Postgres 9.6+?
I'm thinking that ideally the solution has a signature similar to:
generate_uid(size integer) returns text
Where size is customizable depending on your own tradeoffs for lowering the chance of collisions vs. reducing the string size for usability.
From what I can tell, it must use gen_random_bytes() instead of random() for true randomness, to reduce the chance that they can be guessed.
Thanks!
I know there's gen_random_uuid() for UUIDs, but I don't want to use them in this case. I'm looking for something that gives me IDs similar to what Stripe (or others) use, that look like: "id": "ch_19iRv22eZvKYlo2CAxkjuHxZ" that are as short as possible while still containing only alphanumeric characters.
This requirement is also why encode(gen_random_bytes(), 'hex') isn't quite right for this case, since it reduces the character set and thus forces me to increase the length of the strings to avoid collisions.
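For example (a quick illustration, assuming the pgcrypto extension is installed; the value shown is just a made-up sample of the shape you get):
SELECT encode(gen_random_bytes(8), 'hex');
-- e.g. 'd1f0a4c29b37e85f' -- only 0-9 and a-f, so 16 possible characters per position instead of 62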
I'm currently doing this in the application layer, but I'm looking to move it into the database layer to reduce interdependencies. Here's what the Node.js code for doing it in the application layer might look like:
var crypto = require('crypto');
var set = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
function generate(length) {
  var bytes = crypto.randomBytes(length);
  var chars = [];
  for (var i = 0; i < bytes.length; i++) {
    chars.push(set[bytes[i] % set.length]);
  }
  return chars.join('');
}
Figured this out, here's a function that does it:
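-- Note: gen_random_bytes() comes from the pgcrypto extension, so run CREATE EXTENSION IF NOT EXISTS pgcrypto; first.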
CREATE OR REPLACE FUNCTION generate_uid(size INT) RETURNS TEXT AS $$
DECLARE
    characters TEXT := 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
    bytes BYTEA := gen_random_bytes(size);
    l INT := length(characters);
    i INT := 0;
    output TEXT := '';
BEGIN
    WHILE i < size LOOP
        output := output || substr(characters, get_byte(bytes, i) % l + 1, 1);
        i := i + 1;
    END LOOP;
    RETURN output;
END;
$$ LANGUAGE plpgsql VOLATILE;
And then to run it simply do:
SELECT generate_uid(10);
-- '3Rls4DjWxJ'
Warning
When doing this you need to be sure that the length of the IDs you are creating is sufficient to avoid collisions over time as the number of objects you've created grows, which can be counter-intuitive because of the Birthday Paradox. So you will likely want a length greater (or much greater) than 10 for any object that is created reasonably often; I just used 10 as a simple example.
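As a rough sanity check you can use the standard birthday approximation: with a 62-character alphabet and IDs of length size, the probability of at least one collision after generating n IDs is roughly n^2 / (2 * 62^size). For size = 10 the denominator is about 2 * 8.4 * 10^17, so after a million IDs the collision probability is still only around 6 * 10^-7, but it grows quadratically with n. These are back-of-the-envelope estimates, not guarantees.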
Usage
With the function defined, you can use it in a table definition, like so:
CREATE TABLE users (
id TEXT PRIMARY KEY DEFAULT generate_uid(10),
name TEXT NOT NULL,
...
);
And then when inserting data, like so:
INSERT INTO users (name) VALUES ('ian');
INSERT INTO users (name) VALUES ('victor');
SELECT * FROM users;
It will automatically generate the id values:
     id     |  name  | ...
------------+--------+-----
 owmCAx552Q | ian    |
 ZIofD6l3X9 | victor |
Usage with a Prefix
Or maybe you want to add a prefix for convenience when looking at a single ID in the logs or in your debugger (similar to how Stripe does it), like so:
CREATE TABLE users (
id TEXT PRIMARY KEY DEFAULT ('user_' || generate_uid(10)),
name TEXT NOT NULL,
...
);
INSERT INTO users (name) VALUES ('ian');
INSERT INTO users (name) VALUES ('victor');
SELECT * FROM users;
       id        |  name  | ...
-----------------+--------+-----
 user_wABNZRD5Zk | ian    |
 user_ISzGcTVj8f | victor |
I'm looking for something that gives me "shortcodes" (similar to what Youtube uses for video IDs) that are as short as possible while still containing only alphanumeric characters.
This is a fundamentally different question from what you first asked. What you want here then is to put a serial type on the table, and to use hashids.org code for PostgreSQL.
It maps 1:1 to the unique number (serial).
It never repeats and has no chance of collision.
It is also base62 [a-zA-Z0-9].
The code looks like this:
SELECT id, hash_encode(foo.id)
FROM foo; -- Result: jNl for 1001
SELECT hash_decode('jNl') -- returns 1001
This module also supports salts.
Review,
26 characters in [a-z]
26 characters in [A-Z]
10 characters in [0-9]
62 characters in [a-zA-Z0-9] (base62)
The function substring(string [from int] [for int]) looks useful.
So it looks something like this. First we demonstrate that we can index into the 62-character set and pull a single character out of it:
SELECT substring(
  'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
  1, -- 1 is 'a', 62 is '9'
  1
);
Now we need a random integer between 1 and 62:
SELECT trunc(random()*62)::int+1
FROM generate_series(1,1e2) AS gs(x);
This gets us there. Now we just have to join the two:
SELECT substring(
  'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
  trunc(random()*62)::int+1,
  1
)
FROM generate_series(1,1e2) AS gs(x);
Then we wrap it in an ARRAY constructor (because this is fast)
SELECT ARRAY(
  SELECT substring(
    'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
    trunc(random()*62)::int+1,
    1
  )
  FROM generate_series(1,1e2) AS gs(x)
);
And, we call array_to_string() to get a text.
SELECT array_to_string(
  ARRAY(
    SELECT substring(
      'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
      trunc(random()*62)::int+1,
      1
    )
    FROM generate_series(1,1e2) AS gs(x)
  ),
  ''
);
From here we can even turn it into a function..
CREATE FUNCTION random_string(randomLength int)
RETURNS text AS $$
  SELECT array_to_string(
    ARRAY(
      SELECT substring(
        'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
        trunc(random()*62)::int+1,
        1
      )
      FROM generate_series(1,randomLength) AS gs(x)
    ),
    ''
  )
$$ LANGUAGE SQL
RETURNS NULL ON NULL INPUT
VOLATILE LEAKPROOF;
and then
SELECT * FROM random_string(10);
Thanks to Evan Carroll's answer, I took a look at hashids.org.
For Postgres you have to compile the extension or adapt some T-SQL functions.
But for my needs, I created something simpler based on the hashids ideas (short, unguessable, unique, custom alphabet, avoiding curse words).
Shuffle alphabet:
CREATE OR REPLACE FUNCTION consistent_shuffle(alphabet TEXT, salt TEXT) RETURNS TEXT AS $$
DECLARE
    SALT_LENGTH INT := length(salt);
    integer INT = 0;
    temp TEXT = '';
    j INT = 0;
    v INT := 0;
    p INT := 0;
    i INT := length(alphabet) - 1;
    output TEXT := alphabet;
BEGIN
    IF salt IS NULL OR length(LTRIM(RTRIM(salt))) = 0 THEN
        RETURN alphabet;
    END IF;
    WHILE i > 0 LOOP
        v := v % SALT_LENGTH;
        integer := ASCII(substr(salt, v + 1, 1));
        p := p + integer;
        j := (integer + v + p) % i;
        temp := substr(output, j + 1, 1);
        output := substr(output, 1, j) || substr(output, i + 1, 1) || substr(output, j + 2);
        output := substr(output, 1, i) || temp || substr(output, i + 2);
        i := i - 1;
        v := v + 1;
    END LOOP;
    RETURN output;
END;
$$ LANGUAGE plpgsql VOLATILE;
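A quick sanity check of the shuffle: it is deterministic for a given salt, so the same alphabet and salt always produce the same permutation, while a different salt will almost certainly produce a different one:
SELECT consistent_shuffle('abcdefghij', 'salt-1') AS shuffled_1,
       consistent_shuffle('abcdefghij', 'salt-1') AS shuffled_2,
       consistent_shuffle('abcdefghij', 'salt-2') AS shuffled_3;
-- shuffled_1 and shuffled_2 are identical; shuffled_3 is (almost certainly) a different permutation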
The main function:
CREATE OR REPLACE FUNCTION generate_uid(id INT, min_length INT, salt TEXT) RETURNS TEXT AS $$
DECLARE
    clean_alphabet TEXT := 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890';
    curse_chars TEXT := 'csfhuit';
    curse TEXT := curse_chars || UPPER(curse_chars);
    alphabet TEXT := regexp_replace(clean_alphabet, '[' || curse || ']', '', 'gi');
    shuffle_alphabet TEXT := consistent_shuffle(alphabet, salt);
    char_length INT := length(alphabet);
    output TEXT := '';
BEGIN
    WHILE id != 0 LOOP
        output := output || substr(shuffle_alphabet, (id % char_length) + 1, 1);
        id := trunc(id / char_length);
    END LOOP;
    curse := consistent_shuffle(curse, output || salt);
    output := RPAD(output, min_length, curse);
    RETURN output;
END;
$$ LANGUAGE plpgsql VOLATILE;
How-to use examples:
-- 3: min-length
select generate_uid(123, 3, 'salt'); -- output: "0mH"
-- or as default value in a table
CREATE SEQUENCE IF NOT EXISTS my_id_serial START 1;
CREATE TABLE collections (
    id TEXT PRIMARY KEY DEFAULT generate_uid(CAST (nextval('my_id_serial') AS INTEGER), 3, 'salt')
);
insert into collections DEFAULT VALUES;
This query generates the required string. Just change the second parameter of generate_series to choose the length of the random string.
SELECT
    string_agg(c, '')
FROM (
    SELECT
        chr(r + CASE WHEN r > 9 + 26 THEN 97 - 10 - 26 WHEN r > 9 THEN 65 - 10 ELSE 48 END) AS c
    FROM (
        SELECT
            i,
            trunc(random() * 62)::int AS r
        FROM
            generate_series(1, 10) AS i  -- 10 = length of the generated string
    ) AS a
    ORDER BY i
) AS t;
So I had my own use case for something like this. I am not proposing a solution to the top question, but if you are looking for something similar, like I was, then try this out.
My use case was that I needed to create a random external ID (used as a primary key) with as few characters as possible. Thankfully, the scenario did not require that a large number of these would ever be needed (probably only in the thousands). Therefore a simple solution was a combination of generating a random string and checking that the generated value was not already in use.
Here is how I put it together:
CREATE OR REPLACE FUNCTION generate_id (
    in length INT
    , in for_table text
    , in for_column text
    , OUT next_id TEXT
) AS
$$
DECLARE
    id_is_used BOOLEAN;
    loop_count INT := 0;
    characters TEXT := 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
    loop_length INT;
BEGIN
    LOOP
        next_id := '';
        loop_length := 0;
        WHILE loop_length < length LOOP
            next_id := next_id || substr(characters, get_byte(gen_random_bytes(length), loop_length) % length(characters) + 1, 1);
            loop_length := loop_length + 1;
        END LOOP;
        EXECUTE format('SELECT TRUE FROM %s WHERE %s = %s LIMIT 1', for_table, for_column, quote_literal(next_id)) into id_is_used;
        EXIT WHEN id_is_used IS NULL;
        loop_count := loop_count + 1;
        IF loop_count > 100 THEN
            RAISE EXCEPTION 'Too many loops. Might be reaching the practical limit for the given length.';
        END IF;
    END LOOP;
END
$$
LANGUAGE plpgsql
VOLATILE -- the function returns a different value on every call, so it must be VOLATILE rather than STABLE
;
Here is an example of using it in a table definition:
create table some_table (
id
TEXT
DEFAULT generate_id(6, 'some_table', 'id')
PRIMARY KEY
)
;
and a test to see how it breaks:
DO
$$
DECLARE
    loop_count INT := 0;
BEGIN
    -- WHILE LOOP
    WHILE loop_count < 1000000
    LOOP
        INSERT INTO some_table VALUES (DEFAULT);
        loop_count := loop_count + 1;
    END LOOP;
END
$$ LANGUAGE plpgsql
;
I am a beginner in SQL and I don't know how to solve this problem: how can I combine the following SQL query and the stored procedure/function into a single SQL query? This is a PostgreSQL query and function. Any help is appreciated.
The following query calls a stored procedure/function for each row of the query:
SELECT T1.trj_id, T2.trj_id, Similar_trj_woffset(T1.trj_id, T2.trj_id)
FROM transitions T1,
transitions T2
WHERE T1.w_s = T2.w_s
AND T1.w_e = T2.w_e
AND T1.trans_id <> T2.trans_id
The stored procedure/function is:
create or replace function similar_trj_woffset(
    IN t1 integer,
    IN t2 integer,
    OUT score integer,
    OUT t_offset integer  -- "offset" is a reserved word in Postgres, so this OUT parameter is named t_offset
) AS $$
declare
    off_set integer := 0;
    cou integer;
    time1 integer;
    time2 integer;
    dist integer;
    cutoff integer := 10;  -- distance threshold (placeholder value; the original snippet uses cutoff without declaring it)
begin
    score := 0;  -- initialize the OUT parameter
    for off_set in 0..10 loop
        time1 := 0;
        time2 := 0 + off_set;
        cou := 0;
        while time2 < 100 loop
            select vec_length(P.x - P2.x, P.y - P2.y, P.z - P2.z) into dist
            from AtomPositions P, AtomPositions P2
            where P.trj_id = t1
              and P2.trj_id = t2
              and P.t = time1
              and P2.t = time2;
            if dist < cutoff then
                cou := cou + 1;
            end if;
            time1 := time1 + 1;
            time2 := time2 + 1;
        end loop;
        if cou > score then
            score := cou;
            t_offset := off_set;
        end if;
    end loop;
end $$ language plpgsql;
Can someone tell me how to merge the query and the stored procedure into one single SQL query?
To call this function and split the returned record into individual columns:
SELECT t1.trj_id, t2.trj_id, (similar_trj_woffset(t1.trj_id, t2.trj_id)).*
FROM transitions t1
JOIN transitions t2 USING (w_s, w_e)
WHERE t1.trans_id <> t2.trans_id;
Note the parentheses around the function call.
I also rewrote the query to use proper ANSI JOIN syntax with a simplified equijoin condition (USING) and removed the spurious upper-casing.
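As a side note, since the function returns a record you can also call it with a LATERAL join (Postgres 9.3+), which avoids expanding (similar_trj_woffset(...)).* into one call of the function per output column on older Postgres versions. A sketch, reusing the OUT parameter names from the function above:
SELECT t1.trj_id, t2.trj_id, s.score, s.t_offset
FROM transitions t1
JOIN transitions t2 USING (w_s, w_e)
CROSS JOIN LATERAL similar_trj_woffset(t1.trj_id, t2.trj_id) AS s
WHERE t1.trans_id <> t2.trans_id;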
The basic answer is that you shouldn't merge them. One purpose of putting the code in a function is that it can be called from multiple queries and maintained in one place. Merging the two doesn't make sense when the query is already calling the function for each row.
What are you expecting to gain from doing this? Simply moving the code isn't going to achieve anything positive.
And if any other queries use the function, they will break if you remove it.