I am working on a plsql procedure where i am using an insert-select statement.
I need to insert into the table in ordered manner. but the order by i used in the select sql is not working.
is there any specific way in oracle to insert rows in orderly fashion?
The use of an ORDER BY within an INSERT SELECT is not pointless as long as it can change the content of the inserted data, i.e. with a sequence NEXTVAL included in the SELECT clause. And this even if the inserted rows won't be sorted when fetched - that's the role of your ORDER BY clause in your SELECT clause when accessing the rows.
For such a goal, you can use a work-around placing your ORDER BY clause in a sub-query, and it works:
INSERT INTO myTargetTable
(
SELECT mySequence.nextval, sq.* FROM
( SELECT f1, f2, f3, ...fx
FROM mySourceTable
WHERE myCondition
ORDER BY mySortClause
) sq
)
The typical use case for an ordered insert is in order to co-locate particular value in the same blocks (effectively reducing the clustering factor on indexes on columns by which you have ordered the data).
This generally requires a direct path insert ...
insert /*+ append */ into ...
select ...
from ...
order by ...
There's nothing invalid about this as long as you accept that it's only worthwhile for bulk data, that the data will load above the high water mark only, and that there are locking issues involved.
Another approach which achieves mostly the same effect, but which is more arguably more suitable for OLTP systems, is to create the table in a cluster.
The standard Oracle table is a heap-organized table. A heap-organized table is a table with rows stored in no particular order.
Sorting has nothing to do while inserting rows. and is completely pointless. You need an ORDER BY only while projecting/selecting the rows.
That is how the Oracle RDBMS is designed.
I'm pretty sure that Oracle does not guarantee to insert rows to a table in any specific order (even if the rows were inserted in that order).
Performance and storage considerations far outweigh ordering considerations (as every user might have a different preference for order)
Why not just use an "ORDER BY" clause in your SELECT statement?
Or better yet, create a VIEW that already has the ORDER BY clause in it?
CREATE VIEW your_table_ordered
SELECT *
FROM your_table
ORDER BY your_column
Related
Posting two questions:
1.
Let's say there is a query:
SELECT C1, C2, C3 from TABLE;
When this query is fired for the first time,it retrieves all the values in a certain order.
Next time, when the same query is fired, will the previous order be retained?
There are 2 tables, TABLE1 and TABLE2, both of them have identical data.
Will (SELECT * from TABLE1) and (SELECT * from TABLE1) retrieve the same order of rows?
SQL tables represent unordered sets. Period. There is no ordering in a result set unless you explicitly include an ORDER BY.
It is that simple. If you want data in a particular order, then you need to use ORDER BY. That is how relational databases work.
The same query can return results in different orders each time the query is executed. There are no guarantees about the order -- unless the query has an ORDER BY for the outermost SELECT.
No, unless you are fetching data from result cache!
No, unless they are very small tables and your query runs with low parallelism.
Sorry for extra answer, but I see Tim claims that the query will return same result as long as the underlying table(s) is not modified, and the query has same execution plan.
Snowflake executes the queries in parallel, therefore the order of data is not predictable unless ORDER BY is used.
Let's create a table (big enough to be processed in parallel), and run a simple test case:
-- running on medium warehouse
create or replace table my_test_table ( id number, name varchar ) as
select seq4(), 'gokhan' || seq4() from table(generator(rowcount=>1000000000));
alter session set USE_CACHED_RESULT = false;
select * from my_test_table limit 10;
You will see that it will return different rows every time you run.
To answer both questions short: No.
If your query has no ORDER BY-clause, the SELECT statement always returns an unordered set. This means: Even if you query the same table twice and the data didnt change, SELECT without ORDER BY can retrieve different row-orders.
https://docs.snowflake.com/en/sql-reference/sql/select.html
I want to insert multiple rows efficiently into VERTICA. In PostgreSQL (and probably other SQL implementations) it is possible to INSERT multiple rows in one statement, which is a lot faster, than doing single inserts (especially when in Autocommit mode).
A minimal self-contained example to load two rows in a newly created table could look like this (a):
CREATE TABLE my_schema.my_table (
row_count int,
some_float float,
some_string varchar(8));
INSERT INTO my_schema.my_table (row_count, some_float, some_string)
VALUES (1,1.0,'foo'),(2,2.0,'bar');
But the beauty of this is, that the order in which the values are bunched can be changed to be something like (b):
INSERT INTO my_schema.my_table (some_float, some_string, row_count)
VALUES (1.0,'foo',1),(2.0,'bar',2);
Furthermore, this syntax allows for leaving out columns which are then filled by default values (such as auto incrementing integers etc.).
However, VERTICA does not seem to have the possibility to do a multi-row insert with the same fine-tuning. In fact, the only way to emulate a similar behaviour seems to be to UNION several selects together for something like (c):
INSERT INTO my_schema.my_table SELECT 1,1.0,'foo' UNION SELECT 2,2.0,'bar';
as in this answer: Vertica SQL insert multiple rows in one statement .
However, this seems to be working only, when the order of the inserted columns matches the order of their initial definition. My question is, it is possible to craft a single insert like (c) but with the possibility of changing column order as in (b)? Or am I tackling the problem completely wrong? If so, what alternative is there to a multi-row insert? Should I try COPY LOCAL?
Just list the columns in the insert:
INSERT INTO my_schema.my_table (row_count, some_float, some_string)
SELECT 1,1.0,'foo'
UNION ALL
SELECT 2,2.0,'bar';
Note the use of UNION ALL instead of UNION. UNION incurs overhead for removing duplicates, which is not needed.
I have a very large table with over 1000 records and 200 columns. When I try to retreive records matching some criteria in the WHERE clause using SELECT statement it takes a lot of time. But most of the time I just want to select a single record that matches the criteria in the WHERE clause rather than all the records.
I guess there should be a way to select just a single record and exit which would minimize the retrieval time. I tried ROWNUM=1 in the WHERE clause but it didn't really work because I guess the engine still checks all the records even after finding the first record matching the WHERE criteria. Is there a way to optimize in case if I want to select just a few records?
Thanks in advance.
Edit:
I am using oracle 10g.
The Query looks like,
Select *
from Really_Big_table
where column1 is NOT NULL
and column2 is NOT NULL
and rownum=1;
This seems to work slower than the version without rownum=1;
rownum is what you want, but you need to perform your main query as a subquery.
For example, if your original query is:
SELECT co1, col2
FROM table
WHERE condition
then you should try
SELECT *
FROM (
SELECT col1, col2
FROM table
WHERE condition
) WHERE rownum <= 1
See http://www.oracle.com/technology/oramag/oracle/06-sep/o56asktom.html for details on how rownum works in Oracle.
1,000 records isn't a lot of data in a table. 200 columns is a reasonably wide table. For this reason, I'd suggest you aren't dealing with a really big table - I've performed queries against millions of rows with no problems.
Here is a little experiment... how long does it take to run this compared to the "SELECT *" query?
SELECT
Really_Big_table.Id
FROM
Really_Big_table
WHERE
column1 IS NOT NULL
AND
column2 IS NOT NULL
AND
rownum=1;
An example is here: You can view more here
SELECT ename, sal
FROM ( SELECT ename, sal, RANK() OVER (ORDER BY sal DESC) sal_rank
FROM emp )
WHERE sal_rank <= 1;
You also have to do some column indexing for column in the WHERE clause
In SQL most of the optimization would come in the form on index on the table (where you would index the columns that appear in the WHERE and ORDER BY columns as a rough guide).
You did not specify what SQL database you are using, so I can't point to a good resource.
Here is an introduction to indexes on Oracle.
Here another tutorial.
As for queries - you should always specify the columns you are returning and not use a blanket *.
it shouldn't take a lot of time to query a 1000 rows table. There are exceptions however, check if you are in one of the following cases:
1. Lots of rows were deleted
The table had a massive amount of rows in the past. Since the High Water Mark (HWM) is still high (delete won't lower it) and FULL TABLE SCAN read all the data up to the high water mark, it may take a lot of time to return results even if the table is now nearly empty.
Analyse your table (dbms_stats.gather_table_stats('<owner>','<table>')) and compare the space actually used by the rows (space on disk) with the effective space (data), for example:
SELECT t.avg_row_len * t.num_rows data_bytes,
(t.blocks - t.empty_blocks) * ts.block_size bytes_used
FROM user_tables t
JOIN user_tablespaces ts ON t.tablespace_name = ts.tablespace_name
WHERE t.table_name = '<your_table>';
You will have to take into account the overhead of the rows and blocks as well as the space reserved for update (PCT_FREE). If you see that you use a lot more space than required (typical overhead is below 30%, YMMV) you may want to reset the HWM, either:
ALTER TABLE <your_table> MOVE; and then rebuild INDEX (ALTER INDEX <index> REBUILD), don't forget to collect stats afterwards.
use DBMS_REDEFINITION
2. The table has very large columns
Check if you have columns of datatype LOB, CLOB, LONG (irk), etc. Data over 4000 bytes in any of these columns is stored out of line (in a separate segment), which means that if you don't select these columns you will only query the other smaller columns.
If you are in this case, don't use SELECT *. Either you don't need the data in the large columns or use SELECT rowid and then do a second query : SELECT * WHERE rowid = <rowid>.
Insert 200 data througn for-loop into sqlserver 2000 database, the order change, why ?
When I use mysql, it doesn't have the matter.
i mean:
when you insert 2, then insert 3, then insert 1, in mysql you will see 2,3,1 like the order you insert. But in sqlsever2000 that may not.
The order in which rows are stored doesn't really matter. Actually SQL tables don't have any ordering. If order matters to you, you need to query using an ORDER BY clause.
There are no guarantees on the order of the results of a select statement, unless you add an ORDER BY clause. You might get the results back in the order of insertion, but it's not guaranteed.
If you have an index on the table, it might appear that your data is being ordered in a way that you're not expecting. Normally, you'll just have an identity column as your surrogate primary key, which means your inserted data will show up in "order" if you just do a select *, but if you have indexes on other columns the data might be ordered differently when you do select *.
You can control physical ordering by creating a clustered index on that columns. Only one clustered index is allowed per table. Cluster index make sure when you perform SELECT you will always get data in order which was defined when creating cluster index.
I'm using an Informix (Version 7.32) DB. On one operation I create a temp table with the ID of a regular table and a serial column (so I would have all the IDs from the regular table numbered continuously). But I want to insert the info from the regular table ordered by ID something like:
CREATE TEMP TABLE tempTable (id serial, folio int );
INSERT INTO tempTable(id,folio)
SELECT 0,folio FROM regularTable ORDER BY folio;
But this creates a syntax error (because of the ORDER BY)
Is there any way I can order the info then insert it to the tempTable?
UPDATE: The reason I want to do this is because the regular table has about 10,000 items and in a jsp file, it has to show every record, but it would take to long, so the real reason I want to do this is to paginate the output. This version of Informix doesn't have Limit nor Skip. I can't renumber the serial because is in a relationship, and this is the only solution we could get a fixed number of results on one page (for example 500 results per page). In the Regular table has skipped id's (called folio) because they have been deleted. if i were to put
SELECT * FROM regularTable WHERE folio BETWEEN X AND Y
I would get maybe 300 in one page, then 500 in the next page
You can do this by breaking up the SQL into two temp tables:
CREATE TEMP TABLE tempTable1 (
id serial,
folio int);
SELECT folio FROM regularTable ORDER BY folio
INTO TEMP tempTable2;
INSERT INTO tempTable1(id,folio) SELECT 0,folio FROM tempTable2;
In Informix when using a SELECT as a sub-clause in an INSERT statement, you are limited
to a subset of the SELECT syntax.
The following SELECT clauses are not supported in this case:
INTO TEMP
ORDER BY
UNION.
Additionally, the FROM clause of the SELECT can not reference the same table as referenced by the INSERT (not that this matters in your case).
It's been years since I worked on Informix, but perhaps something like this will work:
INSERT INTO tempTable(id,folio)
SELECT 0, folio
FROM (
SELECT folio FROM regularTable ORDER BY folio
);
You might try it iterating a cursor over the SELECT ... ORDER BY and doing the INSERTs within the loop.
It makes no sense to order the rows as you insert into a table. Relational databases do not allow you to specify the order of rows in a table.
Even if you could, SQL does not guarantee a query will return rows in any order, such as the order you inserted them. You must specify an ORDER BY clause to guarantee an order for a query result.
So it would do you no good to change the order in which you insert the rows.
As stated by Bill, there's not a lot of point ordering the input, you really need to order the output. In the simplistic example you've provided, it just makes no sense, so I can only assume that the real problem you're trying to solve is more complex - deduplication perhaps?
The functionality you're after is CREATE SEQUENCE, but I'm pretty sure it's not available in such an old version of Informix.
If you really need to do what you're asking, you could look into UNLOADing the data in the required order, and then LOADing it again. That would ensure the SERIAL values get allocated sequentially.
Would something like this work?
SELECT
folio
FROM
(
SELECT
ROWNUM n,
folio
FROM
regularTable
ORDER BY
folio
)
WHERE
n BETWEEN 501 AND 1000
It may not be terribly efficient if the table grows larger or you're fetching later "pages", but 10K rows is pretty small.
I don't recall if Informix has a ROWNUM concept, I use Oracle.