Retrieving entries of linked list in relational database - sql

For my project, I implemented a linked list on top of an RDBMS. Each row uses ROWID columns as pointers: a prior pointer, a next pointer, and an owner pointer (to a row in a different table).
A simple example looks like this.
CREATE TABLE EMPLOYEE
(
EMP_ID NUMBER(4) NOT NULL,
OFFICE_CODE CHAR(2),
OFF_EMP_prior ROWID,
OFF_EMP_next ROWID,
OFF_EMP_owner ROWID
);
{EMP1,(NULL,EMP2,OFF1)} - {EMP2,(EMP1,EMP3,OFF1)} - {EMP3,(EMP2,NULL,OFF1)}
Now I have to implement a retrieval function like "Find the nth (integer) entry of the list that has 'OFF1' as its owner".
This can be done simply by traversing the linked list in a loop, but that requires too many SQL operations for one retrieval. (I know that using a sequence number is another option, but this is the design decided on so far.)
Instead, I found SELECT ... CONNECT BY in Oracle SQL, and tried
select * from EMPLOYEE
where OFF_EMP_owner = [OFF_ROWID]
connect by nocycle OFF_EMP_prior = rowid;
This query works for retrieving entries of the list, but the order of the result is not as I expected (something like EMP3-EMP1-EMP2).
Is it possible to retrieve the entries of the linked list, sorted in list order, with SELECT ... CONNECT BY? Or is there a more suitable SQL construct?

select * from EMPLOYEE
where OFF_EMP_owner = [OFF_ROWID]
start with OFF_EMP_prior is NULL
connect by OFF_EMP_prior = prior rowid;
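Solved the problem with the query above. The PRIOR operator should be used in the CONNECT BY condition instead of NOCYCLE, with START WITH anchoring the traversal at the head of the list.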
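For readers without an Oracle instance, the same traversal can be tried out with a recursive CTE. This is only a sketch under stated assumptions: it uses Python's sqlite3 module rather than Oracle, integer keys (emp_id, prior_id, next_id, owner_id, all hypothetical names) stand in for the ROWID pointers, and the pos counter plays roughly the role that the LEVEL pseudocolumn plays in an Oracle hierarchical query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,
        prior_id INTEGER,   -- stands in for OFF_EMP_prior
        next_id  INTEGER,   -- stands in for OFF_EMP_next
        owner_id TEXT       -- stands in for OFF_EMP_owner
    );
    -- Inserted out of list order on purpose; the list itself is 1 -> 2 -> 3.
    INSERT INTO employee VALUES (3, 2, NULL, 'OFF1');
    INSERT INTO employee VALUES (1, NULL, 2, 'OFF1');
    INSERT INTO employee VALUES (2, 1, 3, 'OFF1');
""")

# Start at the head (no prior pointer) and follow the prior links forward,
# counting the position of each entry along the way.
WALK = """
    WITH RECURSIVE walk(emp_id, pos) AS (
        SELECT emp_id, 1 FROM employee
        WHERE owner_id = :owner AND prior_id IS NULL
        UNION ALL
        SELECT e.emp_id, w.pos + 1
        FROM employee e JOIN walk w ON e.prior_id = w.emp_id
    )
    SELECT emp_id, pos FROM walk
"""

def list_in_order(owner):
    """Return the emp_ids of one owner's list, in list order."""
    rows = conn.execute(WALK + " ORDER BY pos", {"owner": owner}).fetchall()
    return [emp_id for emp_id, _ in rows]

def nth_entry(owner, n):
    """Return the emp_id at 1-based position n, or None if the list is shorter."""
    row = conn.execute(WALK + " WHERE pos = :n", {"owner": owner, "n": n}).fetchone()
    return row[0] if row else None
```

Because the position is computed during the walk, the "nth entry" lookup is a single statement instead of one query per hop.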

Related

Explicitly set ROWNUM in column

I'm trying to split what was a large table update into multiple inserts into working tables. One of the queries needs to use the row number. On an INSERT in Oracle, can I add ROWNUM as an explicit column? This is a working table ultimately used in a reporting operation with a nasty OVER (PARTITION BY ...) clause, and having a true row number is helpful.
create table MY_TABLE(KEY number,SOMEVAL varchar2(30),EXPLICIT_ROW_NUMBER NUMBER);
INSERT /*+PARALLEL(AUTO) */ INTO MY_TABLE(KEY,SOMEVAL,EXPLICIT_ROW_NUMBER) (
SELECT /*+PARALLEL(AUTO) */ KEY,SOMEVAL,ROWNUM
FROM PREVIOUS_VERSION_OF_MY_TABLE
);
where PREVIOUS_VERSION_OF_MY_TABLE has both KEY and SOMEVAL columns.
I'd like it to number the rows in the order that the inner select statement does it. So, the first row in the select, had it been explicitly run, would have a ROWNUM of 1, etc. I don't want it reversed, etc.
The table above has over 80MM records. Originally I used an UPDATE, and when I ran it, I got some ORA error saying that I ran out of UNDO space. I do not have the exact error message at this point anymore.
I'm trying to accomplish with multiple working tables the same thing I would have done with one or more updates. Apparently it is hard or impossible to add UNDO space for this query (our company DB team says) without making me a DBA or spending about $100 on a hard drive and attaching it to the instance. So I need to write a harder query to get around this limitation. The goal is to have a session id and the timestamps within that session and, for each timestamp within a session (except the last), show the next timestamp. The original query is included below:
update sc_hub_session_activity schat
set session_time_stamp_rank = (
select /*+parallel(AUTO) */ order_number
from (
select /*+parallel(AUTO) */ schat_all.explicit_row_number as explicit_row_number,row_number() over (partition by schat_all.session_key order by schat_all.session_key,schat_all.time_stamp) as order_number
from sc_hub_session_activity schat_all
where schat_all.session_key=schat.session_key
) schat_all_group
where schat.explicit_row_number = schat_all_group.explicit_row_number
);
commit;
update sc_hub_session_activity schat
set session_next_time_stamp = (
select /*+parallel(AUTO) */ time_stamp
from sc_hub_session_activity schat2
where (schat2.session_time_stamp_rank = schat.session_time_stamp_rank+1) and (schat2.session_key = schat.session_key)
);
commit;
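One possible way to get both derived columns without the two UPDATE passes is to compute them with window functions while filling the working table. The sketch below is an illustration, not a tested Oracle solution: it runs on SQLite (3.25 or later, for window function support) through Python's sqlite3 module, and the table names activity and activity_ranked and the sample rows are made up. Oracle offers ROW_NUMBER() and LEAD() as analytic functions with the same shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (session_key INTEGER, time_stamp TEXT);
    INSERT INTO activity VALUES
        (1, '10:00'), (1, '10:05'), (2, '09:00'), (1, '10:09');

    -- Build the working table in one pass: rank each timestamp within its
    -- session and pick up the following timestamp (NULL for the last one).
    CREATE TABLE activity_ranked AS
    SELECT session_key,
           time_stamp,
           ROW_NUMBER() OVER (PARTITION BY session_key ORDER BY time_stamp)
               AS session_time_stamp_rank,
           LEAD(time_stamp) OVER (PARTITION BY session_key ORDER BY time_stamp)
               AS session_next_time_stamp
    FROM activity;
""")

rows = conn.execute("""
    SELECT session_key, time_stamp, session_time_stamp_rank, session_next_time_stamp
    FROM activity_ranked
    ORDER BY session_key, session_time_stamp_rank
""").fetchall()
```

Here the rank comes from an explicit ORDER BY inside the window, so it does not depend on the order in which the inner SELECT happens to return rows.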

Create virtual table with rowid only of another table

Suppose I have a table in sqlite as follows:
`name` `age`
"bob" 20 (rowid=1)
"tom" 30 (rowid=2)
"alice" 19 (rowid=3)
And I want to store the result of the following table using minimal storage space:
SELECT * FROM mytable WHERE name < 'm' ORDER BY age
How can I store a virtual table from this resultset that will just give me the ordered resultset. In other words, storing the rowid in an ordered way (in the above it would be 3,1) without saving all the data into a separate table.
For example, if I stored this information with just the rowid in a sorted order:
CREATE TABLE vtable AS
SELECT rowid from mytable WHERE name < 'm' ORDER BY age;
Then I believe every time I need to query the vtable I would have to join it back to the original table using the rowid. Is there a way to do this so that the vtable "knows" its content based on the external table? (I believe this is referred to as external-content when creating an FTS index: https://sqlite.org/fts5.html#external_content_tables)
"I believe this is referred to as external-content when creating an fts."
No. A virtual table is created using CREATE VIRTUAL TABLE ... USING module_name (module_parameters).
Virtual tables are tables that call a module, so the USING module_name(module_parameters) part is mandatory.
For FTS (Full Text Search) you would have to read the documentation, but it could be something like
CREATE VIRTUAL TABLE IF NOT EXISTS bible_fts USING FTS3(book, chapter INTEGER, verse INTEGER, content TEXT)
You very likely don't need/want a VIRTUAL table.
CREATE TABLE vtable AS SELECT rowid from mytable WHERE name < 'm' ORDER BY age;
That would create a normal, persistent table if it didn't already exist. To use it you would most likely have to join it with mytable. Effectively it gives you a snapshot, but at a cost of at least 4 KB per snapshot, since every table occupies at least one database page (4096 bytes by default).
I'd suggest a single table for all snapshots with two columns: a snapshot identifier and the rowid of the mytable row included in that snapshot. This would probably consume far less space.
Basic Example
Consider :-
CREATE TABLE IF NOT EXISTS mytable (
id INTEGER PRIMARY KEY, /* NOTE not using an alias of the rowid may present issues as the id's can change */
name TEXT,
age INTEGER
);
CREATE TABLE IF NOT EXISTS snapshot (id TEXT DEFAULT CURRENT_TIMESTAMP, mytable_map);
INSERT INTO mytable (name,age) VALUES('Mary',21),('George',22);
INSERT INTO snapshot (mytable_map) SELECT id FROM mytable;
SELECT snapshot.id,name,age FROM snapshot JOIN mytable ON mytable.id = snapshot.mytable_map;
Suppose the above is run 3 times, a few seconds apart so that the snapshot ids (the timestamps) differ.
Then you would get 3 snapshots (each with a number of rows that share the same value in the id column): the first with 2 rows, the second with 4 and the last with 6, because each run adds 2 rows to mytable.
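The row counts described above can be checked with a small script. This is a sketch using Python's sqlite3 module with one deliberate change from the SQL above: instead of relying on CURRENT_TIMESTAMP and waiting between runs, each run passes an explicit, made-up snapshot label ("snap-1" and so on), which keeps the result deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE snapshot (id TEXT, mytable_map INTEGER);
""")

def run_once(snapshot_id):
    """Add two rows to mytable, then record every current row id under snapshot_id."""
    conn.execute("INSERT INTO mytable (name, age) VALUES ('Mary', 21), ('George', 22)")
    conn.execute(
        "INSERT INTO snapshot (id, mytable_map) SELECT ?, id FROM mytable",
        (snapshot_id,),
    )

for label in ("snap-1", "snap-2", "snap-3"):
    run_once(label)

# Count how many mytable rows each snapshot maps to.
counts = conn.execute("""
    SELECT snapshot.id, COUNT(*)
    FROM snapshot JOIN mytable ON mytable.id = snapshot.mytable_map
    GROUP BY snapshot.id
    ORDER BY snapshot.id
""").fetchall()
```

All snapshots live in the one snapshot table, so adding a snapshot costs only its rows rather than a new table.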

SQL : get PK IDs that are in a CSV list (or field) that aren't in another CSV list

(Edited to add info on the context)
I have 2 fields in Table A containing CSV lists of IDs of records in 2 other tables. The "USERS" field contains a CSV list of records in USERS_TABLE; the "CONTACTS" field contains a CSV list of records in CONTACTS_TABLE:
USERS_FIELD: "1,2,3,4,5,6"
CONTACTS_FIELD: "2,4,6,8"
I want to find all records that are in USERS_FIELD list but not in CONTACTS_FIELD list. In this instance I want records 1,3,5. The lists can be anywhere from 1 ID to hundreds.
The solution has to run in the WHERE clause of a query. My environment is a VBScript-based scripting language inside a COTS product: MicroFocus/Serena SBM running on MS Windows Server and SQL Server 2012. The scripting language lets me specify the WHERE and ORDER BY clauses; it then runs the query and returns the results. The storage of multiple record IDs as CSV is built into the product. I can't do anything about it, nor can I create SQL temp tables or define SQL functions. The implementation of the host scripting language removed arrays and the "Split" function. While I can parse the CSV into a Dictionary object, iterating over a pair of those, each with several hundred elements, is not fast. This is all happening while the end user is waiting for a web page to complete. Again, that's how the product was designed.
Can I use a UNION type operator and do something like:
Select ID from USERS_TABLE Where ID in USERS_FIELD
MINUS
Select ID from CONTACTS_TABLE Where ID in CONTACTS_FIELD
Not sure I follow the requirement for the solution needing to run in the WHERE clause. If you're using SQL Server 2017 or later, you can take advantage of the STRING_SPLIT (available since SQL Server 2016) and STRING_AGG functions.
DROP TABLE IF EXISTS #A;
CREATE TABLE #A (id INT PRIMARY KEY IDENTITY, users VARCHAR(MAX), contacts VARCHAR(MAX));
INSERT INTO #A (users, contacts)
VALUES
('1,2,3,4,5,6', '2,4,6,8'),
('3,5,6', '4,6,9'),
('2,4,7,9', '2,4,9');
SELECT
A.id,
A.users,
A.contacts,
STRING_AGG(B.value, ',') AS users_not_in_contacts
FROM #A A
CROSS APPLY STRING_SPLIT(users, ',') B
WHERE NOT EXISTS (SELECT * FROM STRING_SPLIT(A.contacts, ',') X1 WHERE B.value = X1.value) -- where user is not in contacts
GROUP BY
A.id,
A.users,
A.contacts;
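To make clear what the query above is meant to return for the three sample rows, here is the same set difference written as a plain Python helper. It is only a reference sketch for checking results; csv_difference is a made-up name, and it says nothing about how to run this inside the SBM scripting environment.

```python
def csv_difference(users_csv, contacts_csv):
    """Return the ids in users_csv that do not appear in contacts_csv, as a CSV string."""
    contacts = {item.strip() for item in contacts_csv.split(",") if item.strip()}
    kept = [
        item.strip()
        for item in users_csv.split(",")
        if item.strip() and item.strip() not in contacts
    ]
    return ",".join(kept)
```

This helper keeps the ids in their original order. The SQL version, as far as I know, makes no such promise unless STRING_AGG is given a WITHIN GROUP (ORDER BY ...) clause.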

Get Id from a conditional INSERT

For a table like this one:
CREATE TABLE Users(
id SERIAL PRIMARY KEY,
name TEXT UNIQUE
);
What would be the correct one-query insert for the following operation:
Given a user name, insert a new record and return the new id. But if the name already exists, just return the id.
I am aware of the new syntax within PostgreSQL 9.5 for ON CONFLICT(column) DO UPDATE/NOTHING, but I can't figure out how, if at all, it can help, given that I need the id to be returned.
It seems that RETURNING id and ON CONFLICT do not belong together.
The UPSERT implementation is hugely complex to be safe against concurrent write access. Take a look at this Postgres Wiki page that served as a log during initial development. The Postgres hackers decided not to include "excluded" rows in the RETURNING clause for the first release in Postgres 9.5. They might build something in for the next release.
This is the crucial statement in the manual to explain your situation:
The syntax of the RETURNING list is identical to that of the output
list of SELECT. Only rows that were successfully inserted or updated
will be returned. For example, if a row was locked but not updated
because an ON CONFLICT DO UPDATE ... WHERE clause condition was not
satisfied, the row will not be returned.
Bold emphasis mine.
For a single row to insert:
Without concurrent write load on the same table
WITH ins AS (
INSERT INTO users(name)
VALUES ('new_usr_name') -- input value
ON CONFLICT(name) DO NOTHING
RETURNING users.id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM users -- 2nd SELECT never executed if INSERT successful
WHERE name = 'new_usr_name' -- input value a 2nd time
LIMIT 1;
With possible concurrent write load on the table
Consider this instead (for single row INSERT):
Is SELECT or INSERT in a function prone to race conditions?
To insert a set of rows:
How to use RETURNING with ON CONFLICT in PostgreSQL?
How to include excluded rows in RETURNING from INSERT ... ON CONFLICT
All three with very detailed explanation.
For a single row insert and no update:
with i as (
insert into users (name)
select 'the name'
where not exists (
select 1
from users
where name = 'the name'
)
returning id
)
select id
from users
where name = 'the name'
union all
select id from i
The manual about the primary and the with subqueries parts:
The primary query and the WITH queries are all (notionally) executed at the same time
Although that sounds to me like "same snapshot", I'm not sure, since I don't know what "notionally" means in that context.
But there is also:
The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot
If I understand correctly, the "same snapshot" part prevents a race condition. But again, I'm not sure whether "all the statements" refers only to the statements in the WITH subqueries, excluding the main query. To avoid any doubt, move the SELECT from the previous query into a WITH subquery:
with s as (
select id
from users
where name = 'the name'
), i as (
insert into users (name)
select 'the name'
where not exists (select 1 from s)
returning id
)
select id from s
union all
select id from i
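The insert-if-missing, then read-the-id idea can be tried out locally with a small sketch. Note the swapped-in technique: this uses SQLite (3.24 or later, for ON CONFLICT) through Python's sqlite3 module, and because SQLite has no data-modifying CTEs it issues two statements inside one transaction instead of the single-query form asked about. The helper name get_or_create_user is made up, and the sketch makes no claim about behaviour under concurrent writers in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")

def get_or_create_user(name):
    """Insert name if it is not present yet, then return its id either way."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO users (name) VALUES (?) ON CONFLICT(name) DO NOTHING",
            (name,),
        )
        row = conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchone()
        return row[0]

first = get_or_create_user("alice")
again = get_or_create_user("alice")   # existing row: same id, no new insert
other = get_or_create_user("bob")
```

Calling the helper twice with the same name yields the same id, which is the behaviour the question asks for.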

SQL: I need to take two fields I get as a result of a SELECT COUNT statement and populate a temp table with them

So I have a table which has a bunch of information and a bunch of records. But there will be one field in particular I care about, in this case #BegAttField# where only a subset of records have it populated. Many of them have the same value as one another as well.
What I need to do is get a count (minus 1) of all duplicates, then populate the first record in the bunch with that count value in a new field. I have another field I call BegProd that will match #BegAttField# for each "first" record.
I'm just stuck on how to make this happen. I may have been on the right path, but who knows. The SELECT statement gets me two fields and as many records as there are unique #BegAttField# values. But once I have them, I haven't been able to work with them.
Here's my whole set of code, trying to use a temporary table and SELECT INTO to try and populate it. (Note: the fields with # around the names are variables for this 3rd party app)
CREATE TABLE #temp (AttCount int, BegProd varchar(255))
SELECT COUNT(d.[#BegAttField#])-1 AS AttCount, d.[#BegAttField#] AS BegProd
INTO [#temp] FROM [Document] d
WHERE d.[#BegAttField#] IS NOT NULL GROUP BY [#BegAttField#]
UPDATE [Document] d SET d.[#NumAttach#] =
SELECT t.[AttCount] FROM [#temp] t INNER JOIN [Document] d1
WHERE t.[BegProd] = d1.[#BegAttField#]
DROP TABLE #temp
Unfortunately I'm running this script through a 3rd party database application that uses SQL as its back-end. So the errors I get are simply: "There is already an object named '#temp' in the database. Incorrect syntax near the keyword 'WHERE'. "
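Comment out the CREATE TABLE statement. SELECT ... INTO creates the #temp table itself, so creating it beforehand causes the "There is already an object named '#temp'" error. The syntax error most likely comes from the UPDATE, where the INNER JOIN has no ON clause.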
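The end goal described in the question (store the duplicate count minus one on the "first" record of each group) can also be expressed without a temp table, using a correlated subquery. The sketch below is an assumption-laden illustration: it runs on SQLite through Python's sqlite3 module rather than SQL Server, and the names document, beg_prod, beg_att and num_attach are hypothetical stand-ins for [Document], BegProd, #BegAttField# and #NumAttach#.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE document (
        doc_id     INTEGER PRIMARY KEY,
        beg_prod   TEXT,     -- stand-in for BegProd
        beg_att    TEXT,     -- stand-in for #BegAttField#
        num_attach INTEGER   -- stand-in for #NumAttach#
    );
    INSERT INTO document (beg_prod, beg_att) VALUES
        ('A001', 'A001'), ('A002', 'A001'), ('A003', 'A001'),
        ('B001', 'B001'), ('B002', 'B001'),
        ('C001', NULL);

    -- Only the "first" record of each group (beg_prod = beg_att) is updated;
    -- it receives the size of its group minus one.
    UPDATE document
    SET num_attach = (
        SELECT COUNT(*) - 1
        FROM document d1
        WHERE d1.beg_att = document.beg_att
    )
    WHERE beg_prod = beg_att;
""")

rows = conn.execute(
    "SELECT beg_prod, num_attach FROM document ORDER BY doc_id"
).fetchall()
```

Rows whose attachment field is empty, and rows that are not the first of their group, are left untouched.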