setting a sort order column quickly - sql

Suppose I have a SQLite table People with columns recordId TEXT, name TEXT, job TEXT, sortOrder NUM.
I want to set the sortOrder column for all rows in the table based on sorting the table by name and then job.
I'm currently doing this by:
(1) SELECT recordId from People order by name, job
(1b) In the sqlite3_step() loop for that SELECT command, save the recordId values into a vector<string> orderedRecordIds. When we are done with this loop, orderedRecordIds has the recordId values in the desired order.
(2) In a loop, do a sqlite3_exec() for each recordId of the form
UPDATE People SET sortOrder = <i> WHERE recordId = '<orderedRecordIds[i]>'
That all works, but it's too slow.
For a database with 200k records, it takes about 1 second to do step (1), and 12 seconds to do step (2).
I'm not worried about the 1 second to do step (1).
But I'm trying to figure out how to make step (2) faster.
I do have an index on the recordId column, which I would think would help with finding each row to set the sortOrder value in step (2).
If I use rowid instead of recordId, it brings step (2) down to 10 seconds.
I was thinking it was all the separate calls to sqlite3_exec() that were slowing things down. So I tried doing it all in a single exec statement by building a huge CASE statement:
UPDATE People SET sortOrder = CASE
WHEN recordId='abc' THEN 0
WHEN recordId='def' THEN 1
/* <and so on for 200k rows> */
END
but that was extremely slow.
I feel like there should be a very fast way to do step (2). By comparison, when I use CREATE INDEX to create an index on a column of this large table, it does it in like 20ms. I have this list of ordered record IDs, and just want to say "set the values of a numeric column according to that order."
Perhaps it's faster to create a temporary table with recordId and sortOrder columns, and then do the update based on a join with that table?
Or perhaps there is a way to do this all in a single step/loop instead of 2?
(By the way, I realize that as the problem is presented I could avoid the need for a sortOrder column by just creating an index on the name and job fields. But in my actual application some of the fields I'm sorting by are computed values, in some cases based on values from related tables. This is why I want to have a sortOrder column in the first place. Perhaps there is a way I can index based on related values in other tables. But for now please consider my question as stated.)

Why not use window functions, which are available in SQLite starting with version 3.25?
If I followed you correctly, the following query gives you the result that you want:
select p.*, row_number() over(order by name, job) rn from people p;
You could just use this query to create a view rather than storing the value... When it comes to updating the table, it is a bit more complicated, because SQLite does not support joins in UPDATE statements. You could first materialize a temporary table that contains the sort orders, then use it to update your table, like:
create temp table people_tmp as
select recordId, row_number() over(order by name, job) rn from people;
update people
set sortOrder = (
select rn from people_tmp pt where pt.recordId = people.recordId
);
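Since the question is about speed: on 200k rows the correlated subquery above does one lookup into people_tmp per row of people, so it may be worth indexing the temporary table on recordId (after creating it, before the UPDATE) so each lookup is a seek rather than a scan. A minimal sketch, assuming the schema above (the index name is made up):
-- index the temp table so each pt.recordId = people.recordId lookup is a seek
create index temp.people_tmp_recordId_idx on people_tmp(recordId);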

Related

Updating Table Records in a Batch and Auditing it

Consider this Table:
Table: ORDER
Columns: id, order_num, order_date, order_status
This table has 1 million records. I want to update the order_status to a value of '5' for a bunch (about 10,000) of order_nums that I will be reading from an input text file.
My SQL could be:
(A) update ORDER set order_status=5 where order_num in ('34343', '34454', '454545',...)
OR
(B) update ORDER set order_status=5 where order_num='34343'
I can loop over this update several times until I have covered my 10,000 order updates.
(Also note that I have a few child tables of ORDER, like ORDER_ITEMS, where a similar status must be updated and the information audited.)
My problem here is:
How can I audit this update in a separate ORDER_AUDIT table:
Order_Num: 34343 - Updated Successfully
Order_Num: 34454 - Order Not Found
Order_Num: 454545 - Updated Successfully
Order_Num: 45457 - Order Not Found
If I go for a batch update as in (A), I cannot audit at the order level.
If I go for a single-order-at-a-time update as in (B), I will have to loop 10,000 times - that may be quite slow - but I can audit at the order level in this case.
Is there any other way?
First of all, build an external table over your "input text file". That way you can run a simple single UPDATE statement:
update ORDER
set order_status=5
where order_num in ( select col1 from ext_table order by col1)
Neat and efficient. (Sorting the sub-query is optional: it may improve the performance of the update, but the key point is that we can treat external tables like regular tables and use the full panoply of the SELECT syntax on them.)
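For reference, the external table itself might be declared along these lines. This is only a sketch: the directory object, file name, and delimiter are assumptions you would adapt to your environment.
-- data_dir is a directory object pointing at the folder holding the input file;
-- order_nums.txt is the input text file (both names are assumptions)
CREATE TABLE ext_table (
    col1 NUMBER
)
ORGANIZATION EXTERNAL (
    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY data_dir
    ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY ','
    )
    LOCATION ('order_nums.txt')
)
REJECT LIMIT UNLIMITED;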
Secondly, use the RETURNING clause to capture the hits.
update ORDER
set order_status=5
where order_num in ( select col1 from ext_table order by col1)
returning order_num bulk collect into l_nums;
l_nums in this context is a PL/SQL collection of type NUMBER. The RETURNING clause will give you the ORDER_NUM values for updated rows only.
If you declare the type for l_nums as a SQL nested table object you can use it in further SQL statements for your auditing:
insert into order_audit
select 'Order_Num: '||to_char(t.column_value)||' - Updated Successfully'
from table ( l_nums ) t
/
insert into order_audit
select 'Order_Num: '||to_char(col1)||' - Order Not Found'
from ext_table
minus
select * from table ( l_nums )
/
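For completeness, here is a rough sketch of how the declaration and the update might hang together (the type name num_list is made up; adjust names to your schema):
create type num_list as table of number;
/
declare
    l_nums num_list;
begin
    update ORDER
    set    order_status = 5
    where  order_num in ( select col1 from ext_table )
    returning order_num bulk collect into l_nums;
    -- l_nums can now be queried via table(l_nums) in the audit inserts shown above
end;
/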
Notes on performance:
You don't say how many of the rows in the input text file will match. Perhaps you don't know (actually, on re-reading, it's not clear whether 10,000 is the number of rows in the file or the number of matching rows). PL/SQL collections use private session memory, so very large collections can blow the PGA. However, you should be able to cope with ten thousand NUMBER instances without flinching.
My solution does require you to read the external table twice. This shouldn't be a problem. And it will certainly be way faster than dynamically assembling one hundred IN clauses of a thousand numbers and looping over each.
Note that update is often the slowest bulk operation known to man. There are ways of speeding them up, but those methods can get quite involved. However, if this is something you'll want to do often and performance becomes a sticking point you should read this OraFAQ article.
Use MERGE. First, load the data into a temporary table called ORDER_UPD_TMP with only one column, id. You can do it using the SQL Developer import feature. Then use MERGE to update your base table:
MERGE INTO ORDER b
USING (
SELECT id
FROM ORDER_UPD_TMP
) e
ON (b.id = e.id)
WHEN MATCHED THEN
UPDATE SET b.order_status = 5
You can also handle records that don't match, for example with a WHEN NOT MATCHED clause. Check the documentation for more details:
http://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_9016.htm
I think the best way will be:
to import your file to the database first
then do a few SQL UPDATE/INSERT queries in one transaction to update the status for all orders and create the audit records.
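As a rough sketch of what those statements could look like, assuming the file was loaded into a staging table ORDER_UPD_TMP(order_num) as in the previous answer (all names are illustrative):
-- assumes ORDER_AUDIT has a single text column; adjust the column list to your table
insert into ORDER_AUDIT
select 'Order_Num: ' || t.order_num ||
       case when o.order_num is null then ' - Order Not Found'
            else ' - Updated Successfully' end
from   ORDER_UPD_TMP t
left join ORDER o on o.order_num = t.order_num;

update ORDER
set    order_status = 5
where  order_num in (select order_num from ORDER_UPD_TMP);

commit;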

Getting a specific number of rows from Database using RowNumber; Inconsistent results

Here is my SQL query:
select * from TABLE T where ROWNUM<=100
If I execute this and then re-execute it, I don't get the same result. Why?
Also, on a Sybase system, if I execute
set rowcount 100
select * from TABLE
I get the same result even on re-execution?
Can someone explain why, and provide a possible solution for ROWNUM?
Thanks
If you don't use ORDER BY in your query you get the results in natural order.
Natural order is whatever is fastest for the database at the moment.
A possible solution is to ORDER BY your primary key, if it's an INT
SELECT TOP 100 START AT 0 * FROM TABLE
ORDER BY TABLE.ID;
If your primary key is not a sequentially incrementing integer and you don't have another column to order by (such as a timestamp), you may need to create an extra column SORT_ORDER INT and increment it automatically on insert, using either an autoincrement column or a sequence and an insert trigger, depending on the database.
Make sure to create an index on that column to speed up the query.
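For example, a hedged sketch of the sequence-and-trigger variant in Oracle-style syntax (my_table and the other object names are placeholders; the exact mechanism differs by database):
create sequence sort_order_seq;

alter table my_table add (sort_order int);

create or replace trigger my_table_sort_trg
before insert on my_table
for each row
begin
    -- SELECT ... INTO works on older versions too; recent releases can assign nextval directly
    select sort_order_seq.nextval into :new.sort_order from dual;
end;
/

create index my_table_sort_idx on my_table (sort_order);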
You need to specify an ORDER BY. Queries without an explicit ORDER BY clause make no guarantee about the order in which the rows are returned, and from this result set you take the first 100 rows. As the order of the rows can be different every time, so can your first 100 rows.
You need to use ORDER BY first, followed by ROWNUM. You will get inconsistent results if you don't follow this order.
select * from
(
select * from TABLE T ORDER BY rowid
) where ROWNUM<=100

processing large table - how do i select the records page by page?

I need to do a process on all the records in a table. The table could be very big, so I'd rather process the records page by page. I need to remember the records that have already been processed so they are not included in my second SELECT result.
Like this:
For first run,
[SELECT 100 records FROM MyTable]
For second run,
[SELECT another 100 records FROM MyTable]
and so on..
I hope you get the picture. My question is how do I write such a SELECT statement?
I'm using Oracle, btw, but it would be nice if it could run on any other DB too.
I also don't want to use a stored procedure.
Thank you very much!
Any solution you come up with to break the table into smaller chunks will end up taking more time than just processing everything in one go, unless the table is partitioned and you can process exactly one partition at a time.
If a full table scan takes 1 minute, it will take you 10 minutes to break up the table into 10 pieces. If the table rows are physically ordered by the values of an indexed column that you can use, this will change a bit due to the clustering factor. But it will still take longer than just processing it in one go.
This all depends on how long it takes to process one row from the table, of course. You could choose to reduce the load on the server by processing chunks of data, but from a performance perspective, you cannot beat a full table scan.
You are most likely going to want to take advantage of Oracle's stopkey optimization, so you don't end up with a full table scan when you don't want one. There are a couple of ways to do this. The first way is a little longer to write, but lets Oracle automatically figure out the number of rows involved:
select *
from
(
select rownum rn, v1.*
from (
select *
from table t
where filter_columns = 'where clause'
order by columns_to_order_by
) v1
where rownum <= 200
)
where rn >= 101;
You could also achieve the same thing with the FIRST_ROWS hint:
select /*+ FIRST_ROWS(200) */ *
from (
select rownum rn, v1.*
from (
select *
from table t
where filter_columns = 'where clause'
order by columns_to_order_by
) v1
)
where rn between 101 and 200;
I much prefer the rownum method, so you don't have to keep changing the value in the hint (which would need to represent the end value and not the number of rows actually returned to the page to be accurate). You can set up the start and end values as bind variables that way, so you avoid hard parsing.
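For illustration, the rownum method with the page boundaries as bind variables might look like this (a sketch only; :min_row and :max_row are names I've assumed):
select *
from (
select rownum rn, v1.*
from (
select *
from table t
where filter_columns = 'where clause'
order by columns_to_order_by
) v1
where rownum <= :max_row
)
where rn >= :min_row;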

Selecting data effectively sql

I have a very large table with over 1000 records and 200 columns. When I try to retrieve records matching some criteria in the WHERE clause using a SELECT statement, it takes a lot of time. But most of the time I just want to select a single record that matches the criteria in the WHERE clause rather than all the records.
I guess there should be a way to select just a single record and exit, which would minimize the retrieval time. I tried ROWNUM=1 in the WHERE clause but it didn't really work, because I guess the engine still checks all the records even after finding the first record matching the WHERE criteria. Is there a way to optimize in case I want to select just a few records?
Thanks in advance.
Edit:
I am using Oracle 10g.
The Query looks like,
Select *
from Really_Big_table
where column1 is NOT NULL
and column2 is NOT NULL
and rownum=1;
This seems to run slower than the version without rownum=1.
rownum is what you want, but you need to perform your main query as a subquery.
For example, if your original query is:
SELECT col1, col2
FROM table
WHERE condition
then you should try
SELECT *
FROM (
SELECT col1, col2
FROM table
WHERE condition
) WHERE rownum <= 1
See http://www.oracle.com/technology/oramag/oracle/06-sep/o56asktom.html for details on how rownum works in Oracle.
1,000 records isn't a lot of data in a table. 200 columns is a reasonably wide table. For this reason, I'd suggest you aren't dealing with a really big table - I've performed queries against millions of rows with no problems.
Here is a little experiment... how long does it take to run this compared to the "SELECT *" query?
SELECT
Really_Big_table.Id
FROM
Really_Big_table
WHERE
column1 IS NOT NULL
AND
column2 IS NOT NULL
AND
rownum=1;
Here is an example using an analytic function:
SELECT ename, sal
FROM ( SELECT ename, sal, RANK() OVER (ORDER BY sal DESC) sal_rank
FROM emp )
WHERE sal_rank <= 1;
You also have to index the columns used in the WHERE clause.
In SQL, most of the optimization would come in the form of an index on the table (as a rough guide, index the columns that appear in the WHERE and ORDER BY clauses).
You did not specify what SQL database you are using, so I can't point to a good resource.
Here is an introduction to indexes on Oracle.
Here is another tutorial.
As for queries - you should always specify the columns you are returning and not use a blanket *.
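As a purely illustrative example of such an index for the query shown in the edit (assuming column1 and column2 are the columns you usually filter on; the index name is made up):
CREATE INDEX really_big_table_idx ON Really_Big_table (column1, column2);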
It shouldn't take a lot of time to query a 1,000-row table. There are exceptions, however; check if you are in one of the following cases:
1. Lots of rows were deleted
The table had a massive number of rows in the past. Since the High Water Mark (HWM) is still high (DELETE won't lower it) and a FULL TABLE SCAN reads all the data up to the high water mark, it may take a lot of time to return results even if the table is now nearly empty.
Analyse your table (dbms_stats.gather_table_stats('<owner>','<table>')) and compare the space actually used by the rows (space on disk) with the effective space (data), for example:
SELECT t.avg_row_len * t.num_rows data_bytes,
(t.blocks - t.empty_blocks) * ts.block_size bytes_used
FROM user_tables t
JOIN user_tablespaces ts ON t.tablespace_name = ts.tablespace_name
WHERE t.table_name = '<your_table>';
You will have to take into account the overhead of the rows and blocks as well as the space reserved for update (PCT_FREE). If you see that you use a lot more space than required (typical overhead is below 30%, YMMV) you may want to reset the HWM, either:
ALTER TABLE <your_table> MOVE; and then rebuild the indexes (ALTER INDEX <index> REBUILD); don't forget to collect stats afterwards.
use DBMS_REDEFINITION
2. The table has very large columns
Check if you have columns of datatype LOB, CLOB, LONG (irk), etc. Data over 4000 bytes in any of these columns is stored out of line (in a separate segment), which means that if you don't select these columns you will only query the other smaller columns.
If you are in this case, don't use SELECT *. Either you don't need the data in the large columns, or use SELECT rowid and then do a second query: SELECT * FROM <your_table> WHERE rowid = <rowid>.
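A sketch of that two-step approach against the table from the question (the bind variable name is made up; bind the rowid returned by the first query into the second):
SELECT rowid
FROM   Really_Big_table
WHERE  column1 IS NOT NULL
AND    column2 IS NOT NULL
AND    rownum = 1;

SELECT *
FROM   Really_Big_table
WHERE  rowid = :picked_rowid;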

Insert into temp values (select.... order by id)

I'm using an Informix (version 7.32) DB. In one operation I create a temp table with the ID of a regular table and a serial column (so I would have all the IDs from the regular table numbered continuously). But I want to insert the info from the regular table ordered by ID, something like:
CREATE TEMP TABLE tempTable (id serial, folio int );
INSERT INTO tempTable(id,folio)
SELECT 0,folio FROM regularTable ORDER BY folio;
But this creates a syntax error (because of the ORDER BY).
Is there any way I can order the info then insert it to the tempTable?
UPDATE: The reason I want to do this is that the regular table has about 10,000 items and a JSP page has to show every record, which would take too long, so the real reason is to paginate the output. This version of Informix doesn't have LIMIT or SKIP. I can't renumber the serial because it is in a relationship, and this is the only way we could get a fixed number of results on one page (for example, 500 results per page). The regular table has skipped IDs (called folio) because rows have been deleted. If I were to put
SELECT * FROM regularTable WHERE folio BETWEEN X AND Y
I would get maybe 300 on one page, then 500 on the next page.
You can do this by breaking the operation up across two temp tables:
CREATE TEMP TABLE tempTable1 (
id serial,
folio int);
SELECT folio FROM regularTable ORDER BY folio
INTO TEMP tempTable2;
INSERT INTO tempTable1(id,folio) SELECT 0,folio FROM tempTable2;
In Informix, when using a SELECT as a sub-clause in an INSERT statement, you are limited to a subset of the SELECT syntax.
The following SELECT clauses are not supported in this case:
INTO TEMP
ORDER BY
UNION.
Additionally, the FROM clause of the SELECT can not reference the same table as referenced by the INSERT (not that this matters in your case).
It's been years since I worked on Informix, but perhaps something like this will work:
INSERT INTO tempTable(id,folio)
SELECT 0, folio
FROM (
SELECT folio FROM regularTable ORDER BY folio
);
You might try iterating a cursor over the SELECT ... ORDER BY and doing the INSERTs within the loop.
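For example, a minimal Informix SPL sketch of that idea (the procedure name is made up; whether SPL is available depends on your engine and version):
CREATE PROCEDURE fill_temp_table()
    DEFINE v_folio INT;

    -- walk the rows in folio order; the serial column id is assigned row by row
    FOREACH SELECT folio INTO v_folio
            FROM regularTable
            ORDER BY folio
        INSERT INTO tempTable(id, folio) VALUES (0, v_folio);
    END FOREACH;
END PROCEDURE;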
It makes no sense to order the rows as you insert into a table. Relational databases do not allow you to specify the order of rows in a table.
Even if you could, SQL does not guarantee a query will return rows in any order, such as the order you inserted them. You must specify an ORDER BY clause to guarantee an order for a query result.
So it would do you no good to change the order in which you insert the rows.
As stated by Bill, there's not a lot of point ordering the input; you really need to order the output. In the simplistic example you've provided, it just makes no sense, so I can only assume that the real problem you're trying to solve is more complex - deduplication perhaps?
The functionality you're after is CREATE SEQUENCE, but I'm pretty sure it's not available in such an old version of Informix.
If you really need to do what you're asking, you could look into UNLOADing the data in the required order, and then LOADing it again. That would ensure the SERIAL values get allocated sequentially.
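Roughly, that could look like the following in dbaccess (the unload file name is a placeholder):
UNLOAD TO 'folios_sorted.unl'
    SELECT folio FROM regularTable ORDER BY folio;

LOAD FROM 'folios_sorted.unl'
    INSERT INTO tempTable(folio);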
Would something like this work?
SELECT
folio
FROM
(
SELECT
ROWNUM n,
folio
FROM
regularTable
ORDER BY
folio
)
WHERE
n BETWEEN 501 AND 1000
It may not be terribly efficient if the table grows larger or you're fetching later "pages", but 10K rows is pretty small.
I don't recall if Informix has a ROWNUM concept; I use Oracle.