Renumber reference variables in text columns - sql

Background
For a data entry project, a user can enter variables using a short-hand notation:
"Pour i1 into a flask."
"Warm the flask to 25 degrees C."
"Add 1 drop of i2 to the flask."
"Immediately seek cover."
In this case i1 and i2 are reference variables, where the number refers to an ingredient. The text strings are in the INSTRUCTION table; the ingredients are in the INGREDIENT table.
Each ingredient has a sequence number for sorting purposes.
Problem
Users may rearrange the ingredient order, which adversely changes the instructions. For example, the ingredient order might look as follows, initially:
seq | label
1 | water
2 | sodium
The user adds another ingredient:
seq | label
1 | water
2 | sodium
3 | francium
The user reorders the list:
seq | label
1 | water
2 | francium
3 | sodium
At this point, the following line is now incorrect:
"Add 1 drop of i2 to the flask."
The i2 must be renumbered (because ingredient #2 was moved to position #3) so that it still points to the original ingredient, sodium:
"Add 1 drop of i3 to the flask."
Updated Details
This is a simplified version of the problem. The full problem can have lines such as:
"Add 1 drop of i2 to the o3 of i1."
Where o3 is an object (flask), and i1 and i2 are water and sodium, respectively.
Table Structure
The ingredient table is structured as follows:
id | seq | label
The instruction table is structured as follows:
step
Algorithm
The algorithm I have in mind:
Repeat for all steps that match the expression '\mi([0-9]+)':
Break the step into word tokens.
For each token:
If the numeric portion of the token matches the old sequence number, replace it with the new sequence number.
Recombine the tokens and update the instruction.
Update the ingredient number.
Update
The algorithm may be incorrect as written. There could be two reference variables that must change. Consider before:
seq | label
1 | water
2 | sodium
3 | caesium
4 | francium
And after (swapping sodium and caesium):
seq | label
1 | water
2 | caesium
3 | sodium
4 | francium
Every i2 in every step must become i3; similarly i3 must become i2. So
"Add 1 drop of i2 to the flask, but absolutely do not add i3."
Becomes:
"Add 1 drop of i3 to the flask, but absolutely do not add i2."
Code
The code to perform the first two parts of the algorithm resembles:
CREATE OR REPLACE FUNCTION
renumber_steps(
p_ingredient_id integer,
p_old_sequence integer,
p_new_sequence integer )
RETURNS void AS
$BODY$
DECLARE
v_tokens text[];
BEGIN
FOR v_tokens IN
SELECT
t.tokens
FROM (
SELECT
regexp_split_to_array( step, '\W' ) tokens,
regexp_matches( step, '\mi([0-9]+)' ) matches
FROM
instruction
) t
LOOP
RAISE NOTICE '%', v_tokens;
END LOOP;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
Question
What is a more efficient way to solve this problem (i.e., how would you eliminate the looping constructs), possibly leveraging PostgreSQL-specific features, without a major revision to the data model?
Thank you!
System Details
PostgreSQL 9.1.2.

You have to take care that you don't change ingredients and seq numbers back and forth. I introduce a temporary prefix for ingredients and negative numbers for seq for that purpose and exchange them for permanent values when all is done.
Could work like this:
CREATE OR REPLACE FUNCTION renumber_steps(_old int[], _new int[])
RETURNS void AS
$BODY$
DECLARE
_prefix CONSTANT text := ' i'; -- prefix, incl. leading space
_new_prefix CONSTANT text := ' ###'; -- temp prefix, incl. leading space
i int;
o text;
n text;
BEGIN
IF array_upper(_old,1) <> array_upper(_new,1) THEN
RAISE EXCEPTION 'Array length mismatch!';
END IF;
FOR i IN 1 .. array_upper(_old,1) LOOP
IF _old[i] <> _new[i] THEN
o := _prefix || _old[i] || ' '; -- leading and trailing blank!
-- new references are temporarily prefixed with _new_prefix
n := _new_prefix || _new[i] || ' ';
UPDATE instruction
SET step = replace(step, o, n) -- replace all instances
WHERE step ~~ ('%' || o || '%');
UPDATE ingredient
SET seq = _new[i] * -1 -- temporarily negative
WHERE seq = _old[i];
END IF;
END LOOP;
-- finally replace temp. prefix
UPDATE instruction
SET step = replace(step, _new_prefix, _prefix)
WHERE step ~~ ('%' || _new_prefix || '%');
-- .. and temp. negative seq numbers
UPDATE ingredient
SET seq = seq * -1
WHERE seq < 0;
END;
$BODY$
LANGUAGE plpgsql VOLATILE STRICT;
Call:
SELECT renumber_steps('{2,3,4}'::int[], '{4,3,2}'::int[]);
The algorithm requires ...
... that ingredients in the steps are delimited by spaces.
... that there are no permanent negative seq numbers.
_old and _new are arrays of the old and new ingredient.seq values of the ingredients that change position. The length of both arrays has to match, or an exception will be raised. The arrays can contain seq values that don't change; nothing will happen to those.
Requires PostgreSQL 9.1 or later.
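The temporary prefix is only needed because the replacements run one after another. As a rough illustration of the alternative, here is a minimal Python sketch (outside the database, so not a drop-in replacement for the function above) that rewrites every reference in a single pass by looking it up in an old-to-new mapping. The names renumber and mapping are made up for this example.

```python
import re

def renumber(step, mapping, prefix="i"):
    """Rewrite references such as i2 in one pass, using an old -> new seq mapping."""
    # \b keeps the pattern from matching inside longer words.
    pattern = re.compile(r"\b" + re.escape(prefix) + r"(\d+)\b")

    def swap(match):
        old = int(match.group(1))
        # References that are not in the mapping stay unchanged.
        return prefix + str(mapping.get(old, old))

    return pattern.sub(swap, step)

step = "Add 1 drop of i2 to the flask, but absolutely do not add i3."
print(renumber(step, {2: 3, 3: 2}))
```

Because each reference is matched once and replaced once, swapping 2 and 3 cannot change a value back and forth, and object references such as o3 are left alone.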

I think your model is problematic... you should have the "real name (id)" (i1, o3 etc.) FIXED after creation and have a second field in the ingredient table providing the "sorting". The user enters the "sorting name" and you immediately replace it with the "real name" (id) when saving the entered data into the step table.
When you read it from the step table, you just map the "real name" (id) back to the current "sorting name" for display purposes if need be...
This way you don't have to change the data already in the step table every time someone changes the sorting, which is a complex and expensive operation IMHO, and prone to concurrency problems too...
The above option reduces the whole problem to a mapping operation (table ingredient) on INSERT/UPDATE/SELECT (table step) for the one entry currently being worked on; it doesn't touch any other entries already there.
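To make the mapping idea concrete, here is a small Python sketch under a few assumptions of mine: ingredients have fixed ids 10, 11 and 12, and the stored form of a reference is written as #<id>. Neither the ids nor the # syntax come from the question; they are only there to show the two translation steps.

```python
import re

# Hypothetical ingredient rows: fixed id -> current display position (seq).
seq_by_id = {10: 1, 11: 2, 12: 3}

def to_stored(step, seq_by_id):
    """On save: replace display references (i<seq>) with fixed ones (#<id>)."""
    id_by_seq = {seq: ing_id for ing_id, seq in seq_by_id.items()}
    return re.sub(r"\bi(\d+)\b",
                  lambda m: "#%d" % id_by_seq[int(m.group(1))], step)

def to_display(stored, seq_by_id):
    """On read: replace fixed references (#<id>) with the current i<seq>."""
    return re.sub(r"#(\d+)\b",
                  lambda m: "i%d" % seq_by_id[int(m.group(1))], stored)

stored = to_stored("Add 1 drop of i2 to the flask.", seq_by_id)

# The user reorders the list: ingredient 12 moves up, ingredient 11 moves down.
seq_by_id = {10: 1, 12: 2, 11: 3}
print(to_display(stored, seq_by_id))
```

The stored text never changes when the order changes; only the lookup table does. A reference to an unknown seq or id raises KeyError in this sketch, which a real implementation would need to handle.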

Related

Split Text into Table Rows with Read-Only Permissions

I am a read-only user for a database with the following problem:
Scenario:
Call center employees for a company submit tickets to me through our database on behalf of our clients. The call center includes alphanumeric lot numbers of an exact length in their message for me to troubleshoot. Depending on how many times a ticket is updated, there could be several messages for one ticket, each of them having zero or more of these alphanumeric lot numbers embedded in the message. I can access all of these messages with Oracle SQL and SQL Tools.
How can I extract just the lot numbers to make a single-column table of all the given lot numbers?
Example Data:
-- Accessing Ticket 1234 --
SELECT *
FROM communications_detail
WHERE ticket_num = 1234;
-- Results --
TICKET_NUM | MESSAGE_NUM | MESSAGE
------------------------------------------------------------------------------
1234 | 1 | A customer recently purchased some products with
| | a lot number of vwxyz12345 and wants to know if
| | they have been recalled.
------------------------------------------------------------------------------
1234 | 2 | Same customer found lots vwxyz23456 and zyxwv12345
| | in their storage as well and would like those checked.
------------------------------------------------------------------------------
1234 | 3 | These lots have not been recalled. Please inform
| | the client.
So-Far:
I am able to isolate the lot numbers of a constant string with the following code, but it gets put into standard output and not a table format.
DECLARE
msg VARCHAR2(200) := 'Same customer found lots vwxyz23456 and zyxwv12345 in their storage as well and would like those checked.';
cnt NUMBER := regexp_count(msg, '[[:alnum:]]{10}');
BEGIN
IF cnt > 0 THEN
FOR i IN 1..cnt LOOP
Dbms_Output.put_line(regexp_substr(msg, '[[:alnum:]]{10}', 1, i));
END LOOP;
END IF;
END;
/
Goals:
Output results into a table that can itself be used as a table in a larger query statement.
Somehow be able to apply this to all of the messages associated with the original ticket.
Update: Changed the example lot numbers from 8 to 10 characters long to avoid confusion with real words in the messages. The real-world scenario has much longer codes and very specific formatting, so a more complex regular expression will be used.
Update 2: Tried using a table variable instead of standard output. It didn't error, but it didn't populate my query tab... This may just be user error...!
DECLARE
TYPE lot_type IS TABLE OF VARCHAR2(10);
lots lot_type := lot_type();
msg VARCHAR2(200) := 'Same customer found lots vwxyz23456 and zyxwv12345 in their storage as well and would like those checked.';
cnt NUMBER := regexp_count(msg, '[[:alnum:]]{10}');
BEGIN
IF cnt > 0 THEN
FOR i IN 1..cnt LOOP
lots.extend();
lots(i) := regexp_substr(msg, '[[:alnum:]]{10}', 1, i);
END LOOP;
END IF;
END;
/
This is a regex format which matches the LOT mask you provided: '[a-z]{3}[0-9]{5}' . Using something like this will help you avoid the false positives you mention in your question.
Now here is a read-only, pure SQL solution for you.
with cte as (
select 'Same customer found lots xyz23456 and zyx12345 in their storage as well and would like those checked.' msg
from dual)
select regexp_substr(msg, '[a-z]{3}[0-9]{5}', 1, level) as lotno
from cte
connect by level <= regexp_count(msg, '[a-z]{3}[0-9]{5}')
;
I'm using the WITH clause just to generate the data. The important thing is the use of the CONNECT BY operator, which is part of Oracle's hierarchical data syntax but here generates a table from one row. The pseudo-column LEVEL allows us to traverse the string and pick out the different occurrences of the regex pattern.
Here's the output:
SQL> r
1 with cte as ( select 'Same customer found lots xyz23456 and zyx12345 in their storage as well and would like those checked.' msg from dual)
2 select regexp_substr(msg, '[a-z]{3}[0-9]{5}', 1, level) as lotno
3 from cte
4 connect by level <= regexp_count(msg, '[a-z]{3}[0-9]{5}')
5*
LOTNO
----------
xyz23456
zyx12345
SQL>
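For readers who want to see the "one row per occurrence" idea without the Oracle syntax, here is a rough Python sketch that applies it to all three messages of the example ticket. The pattern (five letters followed by five digits) is my assumption for the 10-character example lots, and the messages dictionary is a hand-made copy of the sample data.

```python
import re

# Hypothetical copy of the three messages for ticket 1234, keyed by MESSAGE_NUM.
messages = {
    1: "A customer recently purchased some products with a lot number of "
       "vwxyz12345 and wants to know if they have been recalled.",
    2: "Same customer found lots vwxyz23456 and zyxwv12345 in their storage "
       "as well and would like those checked.",
    3: "These lots have not been recalled. Please inform the client.",
}

# Assumed lot format for the example: five lowercase letters, five digits.
LOT = re.compile(r"\b[a-z]{5}[0-9]{5}\b")

def extract_lots(messages):
    rows = []
    for message_num in sorted(messages):
        # The n-th match plays the role of regexp_substr(msg, pattern, 1, LEVEL).
        for lot in LOT.findall(messages[message_num]):
            rows.append((message_num, lot))
    return rows

print(extract_lots(messages))
```

Messages without a lot number simply contribute no rows, which is also what the CONNECT BY query does when regexp_count returns 0.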

PostgreSQL - Start A Transaction block IN Function

I'm trying to create a transaction block inside a function. My goal is for this function to be used by one caller at a time: if someone is running the function and someone else wants to use it, the second caller can't until the first one has finished. I created this function:
CREATE OR REPLACE FUNCTION my_job(time_to_wait integer) RETURNS INTEGER AS $$
DECLARE
max INT;
BEGIN
BEGIN;
SELECT MAX(max_value) INTO max FROM sch_lock.table_concurente;
INSERT INTO sch_lock.table_concurente(max_value, date_insertion) VALUES(max + 1, now());
-- Sleep a while
PERFORM pg_sleep(time_to_wait);
RETURN max;
COMMIT;
END;
$$
LANGUAGE plpgsql;
But it doesn't seem to work: I get a syntax error at BEGIN;
Without BEGIN; and COMMIT; the function runs without error. I use these queries to check:
-- First user has to wait 10 seconds
SELECT my_job(10) as max_value;
-- Second user has to wait 3 seconds
SELECT my_job(3) as max_value;
So the result is :
+-----+----------------------------+------------+
| id | date | max_value |
+-----+----------------------------+------------+
| 1 | 2017-02-13 13:03:58.12+00 | 1 |
+-----+----------------------------+------------+
| 2 | 2017-02-13 13:10:00.291+00 | 2 |
+-----+----------------------------+------------+
| 3 | 2017-02-13 13:10:00.291+00 | 2 |
+-----+----------------------------+------------+
But the result should be :
+-----+----------------------------+------------+
| id | date | max_value |
+-----+----------------------------+------------+
| 1 | 2017-02-13 13:03:58.12+00 | 1 |
+-----+----------------------------+------------+
| 2 | 2017-02-13 13:10:00.291+00 | 2 |
+-----+----------------------------+------------+
| 3 | 2017-02-13 13:10:00.291+00 | 3 |
+-----+----------------------------+------------+
So the third row (id = 3) should have max_value = 3 and not 2. This happens because the first user selects max = 1 and waits 10 seconds, and the second user also selects max = 1 and waits 3 seconds before inserting. What I want is that nobody can use this function until the first caller has finished, so I need something safe and protected.
My questions are:
How can I make a transaction block inside a function?
Do you have any suggestions for doing this in a safe way?
Thank you.
OK, so you cannot COMMIT in a function. You can, however, have a savepoint and roll back to the savepoint.
Your smallest possible transaction is a single statement parsed and executed by the server from the client, so every function call runs inside a transaction. Within a transaction, however, you can have savepoints. In this case you would look at the exception handling portions of PostgreSQL to handle this.
However, that is not what you want here. You want (I think?) data to be visible during a long-running server-side operation. For that you are kind of out of luck. You cannot really increment your transaction ids while running a function.
You have a few options, in order of what I would consider to be good practices (best to worst):
Break down your logic into smaller slices that each move the db from one consistent state to another, and run those in separate transactions.
Use a message queue (like pg_message_queue) in the db, plus an external worker, and something which runs a step and yields a message for the next step. The disadvantage is that this adds more maintenance.
Use a function or framework like dblink, pl/python, or pl/perlu to connect back to the db and run transactions there. ick....
You can use dblink for this. Something like :
CREATE OR REPLACE FUNCTION my_job(time_to_wait integer) RETURNS INTEGER AS $$
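DECLARE
max INT;
res TEXT; -- receives the status text returned by each dblink call
BEGIN
SELECT INTO res dblink_connect('con','dbname=local');
SELECT INTO res dblink_exec('con', 'BEGIN');
...
SELECT INTO res dblink_exec('con', 'COMMIT');
SELECT INTO res dblink_disconnect('con');
RETURN max; -- a function declared RETURNS INTEGER must return a value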
END;
$$
LANGUAGE plpgsql;
I don't know if this is a good way or not but what if we use LOCK TABLE for example like this :
CREATE OR REPLACE FUNCTION my_job(time_to_wait integer) RETURNS INTEGER AS $$
DECLARE
max INT;
BEGIN
-- Lock the table so no one else can use it until the first caller has finished
LOCK TABLE sch_lock.table_concurente IN ACCESS EXCLUSIVE MODE;
SELECT MAX(max_value) INTO max FROM sch_lock.table_concurente;
INSERT INTO sch_lock.table_concurente(max_value, date_insertion) VALUES(max + 1, now());
PERFORM pg_sleep(time_to_wait);
RETURN max;
END;
$$
LANGUAGE plpgsql;
It gives me the right result.
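Why this works can be shown with a small Python analogy (not PostgreSQL code): the read-max-then-insert sequence is a critical section, and the lock lets only one caller at a time run it. The list rows and the lock table_lock are stand-ins I made up for the table and for LOCK TABLE. One difference to keep in mind is that in PostgreSQL the table lock is held until the surrounding transaction ends, not just until the function returns.

```python
import threading

rows = []                      # stands in for sch_lock.table_concurente
table_lock = threading.Lock()  # stands in for LOCK TABLE ... ACCESS EXCLUSIVE

def my_job():
    # Only one thread at a time may read the maximum and insert the next value.
    with table_lock:
        current_max = max(rows, default=0)
        rows.append(current_max + 1)

threads = [threading.Thread(target=my_job) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(rows)
```

Every caller sees the value inserted by the previous one, so no max_value is handed out twice.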

How do you return a specific column value of a certain row in an existing table within a database?

The Problem:
I'm working in PostgreSQL 9.0 and I'm having a difficult time figuring out how to tackle the situation where you want to return a specific column value of a certain row for use in a CASE WHEN THEN statement.
I want to basically go in and set the value of table A: someRow's someColumn value, equal to the value of table B: row X's column A value, given the value of row X's column B. (More detail in "Background Info" if needed to understand the question)
This is what I want to do (but don't know how):
Update tableA
Set someColumn
CASE WHEN given_info_column = 'tableB: row X's column B value'
THEN (here I want to return row X's column A value, finding row X using the given column B value)
ELSE someColumn END
Background Info: (Optional, for clarification)
Imagine that there is a user activity table and a device table in an already existing database, with already existing activity performed strings that exist throughout the codebase you are working in: (for example)
User_Activity:
id (int) | user_name (string) | activity_performed (string)            | category (string)
---------|--------------------|----------------------------------------|------------------
1        | Joe Martinez       | checked out iphone: iphone2            | dvc_activity
2        | Jon Shmoe          | uploads video from device: (id: 12345) | dvc_activity
3        | Larry David        | goes to the bathroom                   | other_activity
Device:
seq (int)| device_name (string) | device_srl_num (int) | device_status (string)|
---------+-----------------------+----------------------+-----------------------+
1 | iphone1 | 12344 | available
2 | iphone2 | 12345 | checked out
3 | android1 | 23456 | available
Your assignment from your boss is to create a report that shows one table with all device activity, like so:
Device Activity Report
(int) (int) (string) (string) (string) (int) (string)
act_seq |usr_id | usr_name | act_performed | dvc_name | dvc_srl_num | dvc_status
---------+-------+--------------+---------------------------------------+-----------+-------------+------------
1 |1 | Joe Martinez | Checked out iphone: iphone2 | iphone2 | 12345 | checked out
2 |2 | John Shmoe | uploads video from device: (id: 12345)| android1 | 23456 | available
For the purposes of this question, this has to be done by adding a new column to the user activity table called dvc_seq which will be a foreign key to the device table. You will create a temporary table by querying from the user activity table and joining the two where User_Activity (dvc_seq) = Device (seq)
This is fine and will work great for new entries into the User_Activity table, which will record a dvc_seq linking to the associated device if the activity involves a device.
The problem is that you need to go in and fill in values for the new dvc_seq column in the User_Activity table for all previous entries relating to devices. Since the previous programmers decided to specify which device in the activity_performed column using the serial number certain times and the device names other times, this presents an interesting problem, where you will need to derive the associated Device seq number from a device, given its name or serial number.
So once again, what I want to do: (using this example)
UPDATE User_Activity
SET dvc_seq
CASE WHEN activity_performed LIKE 'checked out iphone:%'
THEN (seq column of Device table)
WHERE (SELECT 1 FROM Device WHERE device_name = (substring in place of the %))
ELSE dvc_seq (I think this would be null since there would be nothing here yet)
END
Can any of you help me accomplish this?? Thanks in advance for all responses and advice!
The query below uses an update-join to set the sequence number when either the serial number or the device name is contained in activity_performed:
UPDATE User_Activity AS a
SET dvc_seq = b.seq
FROM Device AS b
WHERE a.activity_performed LIKE '%from device: (id: ' || b.device_srl_num::text || '%'
OR a.activity_performed LIKE '%: ' || b.device_name || '%';
Just an additional note on how to speed up this code, based on the correct answer given by @FuzzyTree (this only works for the serial number, which has a standard length, and not for the device name, which can vary in length).
Because of the LIKE used in the join, the query runs very slowly on large tables. A better solution uses the PostgreSQL substring() and position() functions and joins the tables on the serial number, like so:
UPDATE User_Activity AS a
SET dvc_seq = b.seq
FROM Device AS b
-- position() searches for the literal marker; 18 is the length of 'from device: (id: '
-- and 5 is the serial-number length in the example data (adjust to your own data)
WHERE b.device_srl_num::text =
substring(a.activity_performed from position('from device: (id: ' in a.activity_performed) + 18 for 5)
AND a.activity_performed LIKE '%from device: (id: %';
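PostgreSQL does accept a searched CASE expression such as
CASE WHEN activity_performed LIKE 'checked out iphone:%'
and not only the simple form
CASE x WHEN 1 THEN ... WHEN 2 THEN ...
but packing both the parsing and the device lookup into CASE branches quickly gets unwieldy. A cleaner option is to create a function:
UPDATE User_Activity
SET dvc_seq = getDeviceID(User_Activity.activity_performed);
Inside the function you can use IF and CASE much more easily: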
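CREATE OR REPLACE FUNCTION getDeviceID(activity text)
RETURNS integer AS
$BODY$
DECLARE
v_serial text; -- serial number parsed from the activity string, if any
v_name text; -- device name parsed from the activity string, if any
device_id integer;
BEGIN
-- Assumes the two formats shown in the example data:
-- '... device: (id: 12345)' and '...: iphone2'
v_serial := substring(activity from '[(]id: ([0-9]+)[)]');
IF v_serial IS NOT NULL THEN
SELECT seq INTO device_id FROM Device WHERE device_srl_num = v_serial::integer;
END IF;
IF device_id IS NULL THEN
-- fall back to the device name after the last colon
v_name := substring(activity from ': ([^:()]+)$');
SELECT seq INTO device_id FROM Device WHERE device_name = v_name;
END IF;
RETURN device_id; -- NULL when the activity mentions no known device
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION getDeviceID(text)
OWNER TO postgres;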

What is the maximum row count in Oracle's nested table

CREATE TYPE nums_list AS TABLE OF NUMBER;
What is the maximum possible row count in Oracle's nested table?
UPDATE
CREATE TYPE nums_list AS TABLE OF NUMBER;
CREATE OR REPLACE FUNCTION generate_series(from_n NUMBER, to_n NUMBER)
RETURN nums_list AS
ret_table nums_list := nums_list();
BEGIN
FOR i IN from_n..to_n LOOP
ret_table.EXTEND;
ret_table(i) := i;
END LOOP;
RETURN ret_table;
END;
SELECT count(*) FROM TABLE ( generate_series(1,4555555) );
This gives error: ORA-22813 operand value exceeds system limits, Object or Collection value was too large
The range of subscripts for a nested table is 1..2**31 so you can have 2**31 elements in the collection. That limit hasn't changed since at least 8.1.6 though, of course, it might change in the future.
Just as an additional observation, it isn't the nested table itself that is too large or using too much memory. With an exception handler you can see that the error is not being thrown by your function. You can populate the same thing in an anonymous block:
DECLARE
ret_table nums_list := nums_list();
BEGIN
FOR i IN 1..4555555 LOOP
ret_table.EXTEND;
ret_table(i) := i;
END LOOP;
dbms_output.put_line(ret_table.count);
END;
/
anonymous block completed
4555555
And you can call your function from a block too:
DECLARE
ret_table nums_list;
BEGIN
ret_table := generate_series(1,4555555);
dbms_output.put_line(ret_table.count);
END;
/
anonymous block completed
4555555
It's only when you use it as table collection expression that you get an error:
SQL Error: ORA-22813: operand value exceeds system limits
22813. 00000 - "operand value exceeds system limits"
*Cause: Object or Collection value was too large. The size of the value
might have exceeded 30k in a SORT context, or the size might be
too big for available memory.
*Action: Choose another value and retry the operation.
The cause text refers to the SORT context, and a sort is being done by your query:
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2 | 29 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 2 | | |
| 2 | COLLECTION ITERATOR PICKLER FETCH| GENERATE_SERIES | 8168 | 16336 | 29 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------
As @a_horse_with_no_name suggested, you can avoid the problem by making your function pipelined:
CREATE OR REPLACE FUNCTION generate_series(from_n NUMBER, to_n NUMBER)
RETURN nums_list PIPELINED AS
BEGIN
FOR i IN from_n..to_n LOOP
PIPE ROW (i);
END LOOP;
RETURN;
END;
/
SELECT count(*) FROM TABLE ( generate_series(1,4555555) );
COUNT(*)
----------
4555555
That still does a SORT AGGREGATE but it doesn't seem to mind any more. Not really sure why it does that in either case; perhaps someone else will be able to explain what it's doing. (I'm doing this in an 11gR2 instance by the way; I don't have a 12c instance to verify the behaviour is the same, but your symptoms suggest it will be). Or maybe it isn't the SORT context that's the issue, perhaps it is available memory. In my environment your version seems to consistently work up to 4177918 elements - which doesn't seem to be a significant number, so is likely to be environment related?
But it depends how you intend to use the collection; from a PL/SQL context your original version might be more suitable.
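As a loose analogy only (it says nothing about why Oracle raises ORA-22813), the difference between the two versions resembles the difference between returning a fully built list and a generator in Python. A pipelined function hands each row to the consumer as it is produced, so the consumer never needs the whole collection at once. The function names below are made up for the comparison.

```python
def generate_series_collected(from_n, to_n):
    """Like the original function: build the whole collection, then return it."""
    ret_table = []
    for i in range(from_n, to_n + 1):
        ret_table.append(i)
    return ret_table

def generate_series_pipelined(from_n, to_n):
    """Like PIPE ROW: hand each value to the consumer as soon as it is produced."""
    for i in range(from_n, to_n + 1):
        yield i

# Counting the streamed rows never holds more than one value at a time.
count = sum(1 for _ in generate_series_pipelined(1, 4555555))
print(count)
```

Both versions produce the same values; they differ only in when the values exist in memory.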

Custom sorting (order by) in PostgreSQL, independent of locale

Let's say I have a simple table with two columns: id (int) and name (varchar). In this table I store some names which are in Polish, e.g.:
1 | sępoleński
2 | świecki
3 | toruński
4 | Włocławek
Now, let's say I want to sort the results by name:
SELECT * FROM table ORDER BY name;
If I have C locale, I get:
4 | Włocławek
1 | sępoleński
3 | toruński
2 | świecki
which is wrong, because "ś" should be after "s" and before "t". If I use Polish locale (pl_PL.UTF-8), I get:
1 | sępoleński
2 | świecki
3 | toruński
4 | Włocławek
which is also not what I want, because I would like names starting with capital letters to be first just like in C locale, like this:
4 | Włocławek
1 | sępoleński
2 | świecki
3 | toruński
How can I do this?
If you want a custom sort, you must define some function that modifies your values in some way so that the natural ordering of the modified values fits your requirement.
For example, you can prepend some character or string if the value starts with an uppercase letter:
CREATE OR REPLACE FUNCTION mysort(text) returns text IMMUTABLE as $$
SELECT CASE WHEN substring($1 from 1 for 1) =
upper( substring($1 from 1 for 1)) then 'AAAA' || $1 else $1 END
;
$$ LANGUAGE SQL;
And then
SELECT * FROM table ORDER BY mysort(name);
This is not foolproof (you might want to replace 'AAAA' with something more apt) and it hurts performance, of course.
If you want it efficient, you'll need to create another column that "naturally" sorts correctly (e.g. even in the C locale), and use that as a sorting criterion. For that, you should use the approach of the strxfrm C library function. As a straight-forward strxfrm table for your approach, replace each letter with two ASCII letters: 's' would become 's0' and 'ś' would become 's1'. Then 'świecki' becomes 's1w0i0e0c0k0i0', and the regular ASCII sorting will sort it correctly.
If you don't want to create a separate column, you can try to use a function in the ORDER BY clause:
SELECT * FROM table ORDER BY strxfrm(name);
Here, strxfrm needs to be replaced with a proper function. Either you write one yourself, or you use the standard translate function (although this doesn't support replacing a character with two of them, so you'll need some more involved transformation).
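To show what such a transformation could look like, here is a minimal Python sketch of the two-character encoding described above, combined with the "uppercase first" requirement from the question. The POLISH table and the function name sort_key are my own; the digits only need to put each accented letter right after its base letter.

```python
# Hypothetical encoding table: base letter plus a rank digit for Polish letters.
POLISH = {
    "ą": "a1", "ć": "c1", "ę": "e1", "ł": "l1", "ń": "n1",
    "ó": "o1", "ś": "s1", "ź": "z1", "ż": "z2",
}

def sort_key(name):
    # Every plain letter becomes letter + '0', so 's0' sorts before 's1' (ś).
    encoded = "".join(POLISH.get(ch, ch + "0") for ch in name.lower())
    # Names starting with an uppercase letter get rank 0 and therefore come first.
    starts_upper = name[:1].isupper()
    return (0 if starts_upper else 1, encoded)

names = ["sępoleński", "świecki", "toruński", "Włocławek"]
print(sorted(names, key=sort_key))
```

In the database, the encoded string (with a leading marker for uppercase names) would be what goes into the extra column or the ORDER BY function, since it sorts correctly with plain byte comparison.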