Define maximum table size or rows per table - SQL Server 2012 - sql

Is there a way to define the maximum size per table or the maximum number of rows in a table? (for a SELECT INTO new_table operation)
I was working with SELECT INTO and JOINs on tables with approximately 70 million rows, and I made a mistake in the ON condition. As a consequence, the result of the join created a table larger than the database size limit. The DB crashed and went into recovery mode (which lasted 2 days).
I would like to know how to avoid this kind of problem in the future. Is there any "good manners manual" when working with huge tables? Any kind of pre-defined configuration to prevent this problem?
I don't have the code, but as I said, it was basically a left join whose result was inserted into a new table through SELECT INTO.
PS: I don't have much experience with SQL Server or any other relational database.
Thank you.

SET ROWCOUNT 10000 would have ensured that no more than 10,000 rows were inserted. However, while that can limit the damage, it would also mask the mistake you made in your SQL query.
Before running any SELECT INTO, I would do a SELECT COUNT(*) with the same JOIN and WHERE clauses to see how many rows they produce. If the count is huge, or if it takes hours even to come up with a count, that's your warning sign.
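As a rough illustration, that pre-flight check might look like the sketch below; the table and column names (big_a, big_b, a_id) and the 10,000-row cap are made up for the example.

-- 1. Check how many rows the join would produce before materializing anything.
SELECT COUNT(*)
FROM big_a AS a
LEFT JOIN big_b AS b
    ON a.id = b.a_id;      -- same ON/WHERE as the intended SELECT INTO

-- 2. Optionally cap the row count as a safety net while testing the statement.
SET ROWCOUNT 10000;        -- caps the rows the SELECT returns (and therefore inserts)
SELECT a.*
INTO dbo.new_table
FROM big_a AS a
LEFT JOIN big_b AS b
    ON a.id = b.a_id;
SET ROWCOUNT 0;            -- reset so later statements are not limited
-- SELECT TOP (10000) ... INTO is the non-deprecated way to achieve the same cap.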

Related

INSERT INTO SELECT and SELECT INTO take much longer than the SELECT

I've got a SELECT statement which takes around 500-600 ms to execute. If I use the same SELECT in an INSERT INTO ... SELECT ... or a SELECT ... INTO, it takes up to 30 seconds.
The table is essentially a copy of a view, kept for performance reasons, which gets truncated and refilled with data from time to time. So my SQL looks like:
TRUNCATE myTable
INSERT INTO myTable (col, col, col) SELECT col, col, col FROM otherTable INNER JOIN ...
I tried multiple things, like inserting the data into a temp table so that no indexes etc. are on the table (I also tried dropping the indexes from the original table), but nothing seems to help. If I insert the data into the temp table first (which also takes 30 seconds) and then copy it to the real table, the copy itself is pretty fast (< 1 second).
The query results in ~3800 rows and around 30-40 columns.
The second time I execute the TRUNCATE + INSERT INTO/SELECT INTO SQL, it takes less than a second (until I clear all caches). The execution plans look the same, except for the Table Insert, which has a cost of 90%.
I also tried to get rid of any implicit conversions, but that didn't help either.
Does anyone know how this is possible, or how I could track down the problem? The problem exists on multiple systems running SQL Server 2014/2016.
Edit: I just saw that the execution plan of my SELECT shows an "Excessive Grant" warning, as it estimated ~11,000 rows but the result is only ~3800 rows. Could that be a reason for the slow insert?
I've just had the same problem. All the data types, sizes and nullability were the same in my SELECT and the target table. I tried changing the table to a heap, then back to a clustered table, but it made no difference. The SELECT took around 15 seconds, but with the INSERT it took around 4 minutes.
In my case, I ended up using SELECT INTO a temp table and then SELECTing from that into my real table, and it went back to around 15 seconds.
The OP said they tried this and it didn't work, but it may do for some people.
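For what it's worth, a minimal sketch of that workaround might look like this; the table and column names (sourceTable, otherTable, col1-col3) are placeholders, not the OP's actual schema.

-- Stage the slow SELECT into a temp table first.
SELECT s.col1, s.col2, s.col3
INTO #staging
FROM dbo.sourceTable AS s
INNER JOIN dbo.otherTable AS o
    ON o.id = s.id;

-- Copying from the temp table into the real target is then fast.
TRUNCATE TABLE dbo.myTable;
INSERT INTO dbo.myTable (col1, col2, col3)
SELECT col1, col2, col3
FROM #staging;

DROP TABLE #staging;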
I had an identical problem.
The SELECT took around 900 ms to execute; the INSERT ... SELECT / SELECT INTO took more than 2 minutes.
I rewrote the SELECT to improve performance - it only saved a few ms on the SELECT itself, but it greatly improved the insert.
Try to simplify the query plan as much as possible.
For example, if you have multiple joins, try splitting the work into a multi-step solution.
For what it's worth now, I had a similar problem just today. It turned out that the table I was inserting into had INT columns, while the table I was selecting from had SMALLINT columns. Thus, a type conversion was going on (several times) for each row.
Once I changed the target table to have the same types as the source table, the insert and the select took the same order of magnitude.
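As a hypothetical illustration of that fix (the table and column names are invented), aligning the target column's type with the source removes the per-row implicit conversion:

-- Target column was INT while the source column is SMALLINT.
ALTER TABLE dbo.targetTable ALTER COLUMN quantity SMALLINT NOT NULL;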

Simple select from table takes 24 seconds in SQL Server 2014

I have a table named [cwbOrder] that currently has 1,277,469 rows. I am using SQL Server 2014 and I am doing these tests in a UAT environment; in production this query takes a little longer.
If I try selecting all of the rows using:
SELECT * FROM cwbOrder
It takes 24 seconds to retrieve all of the data from the table. I have read about how important it is to index the columns used in predicates (WHERE), but I still cannot understand how a simple select can take 24 seconds.
Using this table in other, more complex queries generates a lot of extra workload, even though I have created the JOINs on indexed columns. Additionally, I have selected only 2 columns from this table and then JOINED it to another table, and the operation still takes a significantly long time. As an example, please consider the query below:
Below I have attached the index structure of both tables, to illustrate the matter:
PK_cwbOrder is the index on the id_cwbOrder column in the cwbOrder table.
Edit 1: I have added the execution plan for the query in which I join the cwbOrder table with the cwbAction table.
Is there any way, considering the information above, that I can make this query faster?
There are many reasons why such a select could be slow:
The row size or the number of rows could be very large, requiring a lot of time to read and transmit.
Other operations could be holding locks on the table.
The database server or the network could be very busy.
The "table" could really be a view that runs a complicated query.
You can test different aspects. For instance:
SELECT TOP 10 <one column here>
FROM cwbOrder o
This returns a very small result set and reads just a small part of the table. The next query reads the entire table but returns a small result set:
SELECT COUNT(*)
FROM cwbOrder o

Different result size between SELECT * and SELECT COUNT(*) on Oracle

I see strange behavior on an Oracle database. We make a huge insert of around 3.1 million records; everything is fine so far.
Shortly after the insert finishes (around 1 to 10 minutes later), I execute two statements.
SELECT COUNT(*) FROM TABLE
SELECT * FROM TABLE
The result of the first statement is fine: it gives me the exact number of rows that were inserted.
The result of the second statement is the problem. Depending on the timing, the number of rows returned is, for example, around 500K lower than the result of the first statement. The difference between the two results decreases with time.
So I have to wait 15 to 30 minutes before both statements return the same number of rows.
I already talked with the Oracle DBA about this issue, but he has no idea how this could happen.
Any ideas, questions or suggestions?
Update
When I select only an indexed column, I get the correct row count.
When I instead select a non-indexed column, I again get the wrong row count.
That doesn't sound like a bug to me. If I understood you correctly, it just takes time for Oracle to fetch the entire table; after all, 3 million rows is not a small amount.
That is in contrast to the count, which returns one record with the total number of rows.
If, after some waiting, the number of records output equals the number the count query returns, then everything is fine.
Have you already verified these things:
1- Count a single column instead of * to compare both results.
2- Verify both queries' results by adding a WHERE clause and then gradually selecting more rows by removing conditions, so you can narrow down where the two start returning different values. (A small sketch of both checks follows below.)
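For illustration, those checks might look roughly like this; the table name my_table, the column id and the cutoff values are hypothetical.

-- 1- Count a single column instead of *.
SELECT COUNT(id) FROM my_table;

-- 2- Add a WHERE clause and widen it step by step to see where the counts diverge.
SELECT COUNT(*) FROM my_table WHERE id <= 1000000;
SELECT COUNT(*) FROM my_table WHERE id <= 2000000;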
I think you should check the execution plan to identify missing indexes and improve performance.
Add the missing indexes and check the result.
Why missing indexes are important:
To count rows, the Oracle engine does not need to go through the full paging operation. But when fetching all the details from a table, it does have to page through the data.
And the paging process depends on the indexes created on the table to fetch the data effectively and quickly.
So to decrease the time of your second statement, you should find the missing indexes and create them.
How to find missing indexes:
You can start with DBA_HIST_ACTIVE_SESS_HISTORY and look at the statements that contain index hints.
From there, you can pull the index name from the hint and then do a lookup on DBA_INDEXES to see whether the index exists, is valid, etc.
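A rough sketch of that lookup, with a hypothetical owner and index name:

-- Check whether an index referenced in a hint actually exists and is usable.
SELECT owner, index_name, table_name, status
FROM dba_indexes
WHERE owner = 'APP_OWNER'
  AND index_name = 'IDX_MY_TABLE_ID';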

how to speed up a clustered index scan while selecting all fields on range of rows or all the rows

I have a table
Books(BookId, Name, ...... , PublishedYear)
I have about 30 fields in my Books table, where BookId is the primary key (an identity column). I have about 2 million records in this table.
I know SELECT * is an evil performance killer...
I have a situation where I need to select a range of rows, or all the rows, with all the columns.
Select * from Books;
This query takes more than 2 seconds to scan through the data pages and get all the records. Checking the execution plan, it still uses a clustered index scan.
Obviously 2 seconds may not be that bad; however, when this table has to be joined with other tables in a batch, execution takes over 15 minutes (there are no duplicate records in the final result, as the counts match). The join criteria are pretty simple and yield no duplication.
Excluding this table alone, the batch execution completes in under a second.
Is there a way to optimize this, given that I will have to select all the columns? :(
Thanks in advance.
I've just run a batch against my developer instance, with one SELECT specifying all columns and one using *. There is no evidence (nor should there be) of any difference aside from the raw parsing of my input. If I remember correctly, that old saying really means: do not SELECT columns you are not using; they use up resources without benefit.
When you try to improve performance in your code, always check your assumptions; they might only apply to an older version (of SQL Server, etc.) or to a different approach.
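One way to check that assumption is to measure both variants with statistics enabled, along the lines of the sketch below; the column list is abbreviated here (a real test would spell out all 30 columns of Books).

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Variant 1: explicit column list (only the columns named in the question are shown).
SELECT BookId, Name, PublishedYear FROM Books;

-- Variant 2: SELECT * for comparison.
SELECT * FROM Books;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;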

Speeding up aggregations for a large table in Oracle

I am trying to see how to improve performance for aggregation queries in an Oracle database. The system is used to run financial series simulations.
Here is the simplified set-up:
The first table table1 has the following columns
date | id | value
It is read-only, has about 100 million rows and is indexed on id, date
The second table table2 is generated by the application according to user input, is relatively small (300K rows) and has this layout:
id | start_date | end_date | factor
After the second table is generated, I need to compute totals as follows:
select date, sum(value * nvl(factor, 1)) as total
from table1
left join table2
  on table1.id = table2.id
  and table1.date between table2.start_date and table2.end_date
group by date
My issue is that this is slow, taking up to 20-30 minutes if the second table is particularly large. Is there a generic way to speed this up, perhaps trading storage space for execution time, ideally getting it to run in under a minute?
I am not a database expert and have been reading the Oracle performance tuning docs, but I was not able to find anything appropriate for this. The most promising idea I found was OLAP cubes, but I understand they would help only if my second table were fixed and I simply needed to apply different filters to the data.
First, to provide any real insight, you'd need to determine the execution plan that Oracle is producing for the slow query.
You say the second table is ~300K rows - yes, that's small compared to 100M, but since you have a range condition in the join between the two tables, it's hard to say how many rows from table1 are likely to be accessed in any given execution of the query. If a large proportion of the table is accessed but the query optimizer doesn't recognize that, the index may actually be hurting instead of helping.
You might benefit from re-organizing table1 as an index-organized table, since you already have an index that covers most of the columns. But all I can say from the information so far is that it might help, but it might not.
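If that route were explored, a minimal sketch might look like the following. The column names and types are guesses based on the simplified layout above, and the date column is written as dt because date is a reserved word in Oracle.

-- Re-create table1 as an index-organized table keyed on (id, dt);
-- the row data is then stored inside the primary-key index structure.
CREATE TABLE table1_iot (
    id     NUMBER,
    dt     DATE,
    value  NUMBER,
    CONSTRAINT table1_iot_pk PRIMARY KEY (id, dt)
) ORGANIZATION INDEX;

-- The existing rows would then be copied over (e.g. with INSERT /*+ APPEND */ ... SELECT)
-- and the tables swapped once the copy is verified.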
Apart from indexes, also try the points below. My two cents!
Try running the query with the PARALLEL hint so it can employ multiple processors: /*+ PARALLEL(table1, 4) */.
NVL is evaluated for millions of rows, and that will have some impact; is there any way the data can be organised to avoid it?
If you know the dates in advance, you could split the query into two steps: first fetch the matching ids from TABLE2 using the start date and end date, then JOIN that result to TABLE1 through a view or temp table. That way the index (with id as the leading edge) is used optimally. (A sketch follows below.)
Thanks!
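Under those assumptions, a hypothetical sketch combining the parallel hint with a pre-filtered TABLE2 could look like this. The date window, the parallel degree of 4, and the column name dt (used instead of the reserved word date) are all placeholders.

SELECT /*+ PARALLEL(t1, 4) */
       t1.dt,
       SUM(t1.value * NVL(f.factor, 1)) AS total
FROM table1 t1
LEFT JOIN (
    -- Step 1: reduce table2 to the rows inside the known date window.
    SELECT id, start_date, end_date, factor
    FROM table2
    WHERE start_date >= DATE '2020-01-01'
      AND end_date   <= DATE '2020-12-31'
) f
    ON  t1.id = f.id
    AND t1.dt BETWEEN f.start_date AND f.end_date
GROUP BY t1.dt;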