WHERE greater than a given datetime to reduce queried rows - sql

I've got a stored procedure that searches for rows in a table based on a given text. TableWithText has a SomeText column of type nvarchar and also a CreateDate DateTime column populated with the date and time that the row was created.
The body of the stored procedure is this:
SELECT TableWithTextID, SomeOtherColumn
FROM TableWithText
WHERE SomeText = @inputText
The value of SomeText for each row is guaranteed to be unique, although no such constraint is enforced. Therefore this statement is expected to return only one row.
However the table has some 500,000 rows. Given that I know when the row I'm looking for was entered (down to the minute), if I add
AND CreateDate >= @CreateDate
to the stored procedure, will the MS SQL query optimizer reduce the set of scanned rows to those created after @CreateDate before it searches for the input text?

The best thing to do is to review the execution plan and see what the optimizer is telling you. You might think there is a problem just by looking at the query and the number of rows, but the actual cost may be quite low.
If you already have an index on CreateDate, then adding this to the WHERE clause should let the optimizer take advantage of it.
Otherwise, you would be better off indexing the SomeText field if this query is run a lot and you are noticing full table scans when executing it. I'm guessing it's used in other queries too, given that it's a unique value?
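For illustration, a minimal sketch of that index (the index name is made up; TableWithText and SomeText come from the question):
CREATE INDEX IX_TableWithText_SomeText
ON TableWithText (SomeText);
With this in place, the lookup on SomeText can become an index seek instead of a full scan.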

Yes, potentially, but only if you add an index on the CreateDate column (or an index on SomeText, CreateDate)
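Hedged sketches of those two options (index names are illustrative only):
CREATE INDEX IX_TableWithText_CreateDate
ON TableWithText (CreateDate);
-- or, covering both predicates:
CREATE INDEX IX_TableWithText_SomeText_CreateDate
ON TableWithText (SomeText, CreateDate);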

Related

Oracle efficient way of updating non-indexed and non-partitioned table?

Is there an efficient way to update rows of a table that has no indexes and no partitions (and ~50 million rows)?
I have a date field LOAD_DTTM, and the values of this field for the rows that require updating (around 2,000 distinct dates).
Will the update be faster if I specify a date in a WHERE clause along with the UNIQUE_ID of a row?
If you want to update all, or a large number, of the rows then the quickest way is:
create table my_table_copy as
select ... -- all the columns, updating values as required
from my_table;
drop table my_table;
rename my_table_copy to my_table;
If your table had any indexes, constraints or triggers you would now need to re-add them - but it seems you don't have that issue!
You could create a PL/SQL procedure that loops, updating and committing every n rows, say every 20,000 rows. I do not advise updating the full table in a single statement, as it will hold a lock for a long time and expose you to data loss from external factors.
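A rough sketch of that batched approach, assuming a hypothetical driving condition that identifies rows still to be updated (table name, columns and batch size are illustrative only):
DECLARE
  v_rows PLS_INTEGER;
BEGIN
  LOOP
    UPDATE my_table
       SET load_dttm = SYSDATE      -- example change; replace with your real update
     WHERE load_dttm IS NULL        -- example filter for rows not yet updated
       AND ROWNUM <= 20000;         -- batch size: commit every 20,000 rows
    v_rows := SQL%ROWCOUNT;
    COMMIT;
    EXIT WHEN v_rows = 0;
  END LOOP;
END;
/
The filter must stop matching rows once they have been updated, otherwise the loop never terminates.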
The answer is NO.
Even if you specify both conditions in your WHERE clause as you stated, it won't help you to avoid a full scan of your table.
Even if one of your criteria uniquely identifies the row, it still won't help.
Here is a real example, tested on Oracle 12c Release 2, similar to your case: no indexes, no partitions, nothing. Just a plain table with 4 columns.
I have a table with 18 million records.
I also have a CUSTOMER_ID column, which is a unique identifier for a row.
I also have an ORDER_DATE column there.
Even if I do the query that you mentioned
update hit set status = 1 where customer_id = 408518625844 and order_date = '09-DEC-19';
it won't help me avoid a full table scan (the execution plan confirms it). Therefore, under the conditions you've specified, you will always get the slowest execution time possible. A full table scan on 50 million rows is pretty much the worst-case scenario.
And pay attention to the cost: it is 26539 on 18 million rows, so with 50 million rows you can easily expect a much higher cost for your query.

How to improve performance of select queries from table with lots of columns

I have a table with almost 100 columns. The type for most of the columns is Nvarchar(50).
When I select rows by query, it takes a long time, depending on the number of rows. Currently, the table contains 250,000 rows. It takes a full minute to get all the rows when I execute:
SELECT * FROM Table1
I tried to add indexes, but it didn't shorten the time.
How can I build this table in a better way, so that the query's execute time will be shorter?
I'm using SQL Server 2014.
Try to reduce the size of the table by adjusting the data types to the minimum requirements. Don't use a bigint where an int will suffice, or Unicode types if you are not holding Unicode text, and keep data lengths to the minimum requirement. If there are nvarchar(50) columns with a great many NULLs, set Sparse to True in the column properties to reduce the amount of data that has to be loaded.
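As a hedged illustration of those adjustments in T-SQL (Table1 comes from the question, the column names are made up, and each change assumes the existing data fits the narrower type):
ALTER TABLE Table1 ALTER COLUMN SomeCountColumn int;           -- was bigint
ALTER TABLE Table1 ALTER COLUMN SomeTextColumn varchar(50);    -- was nvarchar(50), no Unicode needed
ALTER TABLE Table1 ALTER COLUMN RarelyFilledColumn ADD SPARSE; -- column that is mostly NULL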
You can use an index with includes. If you make sure that the include contains only the columns you select, SQL server can run the query exclusively on the index, which improves performance.
For example a table...
CREATE TABLE Example (
Column1 INT,
Column2 INT,
--....
Column100 INT
)
...with a query on a subset of the columns:
SELECT Column1, Column2, Column3
FROM Example
WHERE Column1 > 1000
You can create the index
CREATE INDEX IX_Example
ON Example(Column1)
INCLUDE (Column2, Column3)
Since SQL Server can get all the information it needs from the index, it won't touch the table. However, you will need more space for the index than without includes, because the data from the table is now stored redundantly in the index.
Your execution plan will give you the answer; check it.
If you're using SELECT * FROM Table1, then indexes won't make any difference: SQL Server will need to read the entire table from disk into the buffer pool. If you can't get away from selecting all columns, then look at which record sets are most often returned, e.g. transactions from 2016. If that is the case, you can introduce partitioning (Enterprise edition), or if you are on SQL Server Standard, create a filtered covering index (including all needed columns) that only contains records from 2016. If you can afford the extra update overhead, then SQL Server will use the smaller index to fulfill the bulk of user demand.
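A hedged sketch of such a filtered covering index, assuming a hypothetical TransactionDate column and listing only a few of the columns to include:
CREATE NONCLUSTERED INDEX IX_Table1_Transactions2016
ON Table1 (TransactionDate)
INCLUDE (Column1, Column2, Column3)  -- extend to the columns your queries actually need
WHERE TransactionDate >= '20160101' AND TransactionDate < '20170101';
The filter predicate must be a simple deterministic comparison, and the index only helps queries whose own predicate falls within that filter.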

dictionary database, one table vs table for each char

I have a very simple database containing one table T:
wOrig nvarchar(50), not null
wTran nvarchar(50), not null
The table has over 50 million rows. I execute a simple query:
select wTran from T where wOrig = 'myword'
The query takes about 40 seconds to complete. I divided the table into separate tables based on the first character of wOrig, and the execution time became much smaller (roughly in proportion to each new table's size).
Am I missing something here? Shouldn't the database use a more efficient way to do the search, like a binary search?
My question: what changes to the database options - given this situation - could make the search more efficient while keeping all the data in one table?
You should be using an index. For your query, you want an index on T(wOrig). Your query will be much faster:
create index idx_T_wOrig on T(wOrig);
Depending on considerations such as space and insert/update characteristics, a clustered index on (wOrig) or (wOrig, wTran) might be the best solution.
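For example, a hedged sketch of the clustered-index variant, assuming T does not already have a clustered index (the index name is made up; T, wOrig and wTran come from the question):
CREATE CLUSTERED INDEX cix_T_wOrig_wTran ON T (wOrig, wTran);
With the data physically ordered by wOrig, the equality lookup becomes a seek rather than a scan of all 50+ million rows.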

Select data with a dynamic where clause on non-indexed column

I have a table with 30 columns and millions of entries.
I want to execute a stored procedure on this table to search data.
The search criteria are passed in a parameter to this SP.
If I search data with a dynamic WHERE clause on a non-indexed column, it takes a lot of time.
Below is an example :
Select counterparty_name from counterparty where counterparty_name = 'test'
In this example, this counterparty is in row number 5,000,000.
As explained, I can't create an index on this table.
I would like to know if the processing time is normal.
I would like to know if there is any recommendation that could improve the execution time?
If you do not have an index on the column, then the engine will have to scan the clustered index to look for the data (or possibly a smaller index that happens to include that column). As such it is going to take a long time.

Oracle sql statement on very large table

I'm relatively new to SQL and I have a statement which takes forever to run.
SELECT
sum(a.amountcur)
FROM
custtrans a
WHERE
a.transdate <= '2013-12-31';
It's a large table, but the statement takes about 6 minutes!
Any ideas why?
Your select, as you posted it, will read 99% of the whole table (2013-12-31 is just a week ago, and I assume most entries are before that date and only very few after). If your table has many large columns (like varchar2(4000)), all that data will be read as well when Oracle scans the table. So you might read several KB per row just to get the roughly 30 bytes you need for amountcur and transdate.
If you are in this scenario, create a combined index on transdate and amountcur:
CREATE INDEX myindex ON custtrans(transdate, amountcur)
With the combined index, Oracle can read the index to fulfill your query and doesn't have to touch the main table at all, which might result in considerably less data being read from disk.
Make sure the table has an index on transdate.
create index custtrans_idx on custtrans (transdate);
Also, if this field is defined as a date in the table, then do:
SELECT sum(a.amountcur)
FROM custtrans a
WHERE a.transdate <= to_date('2013-12-31', 'yyyy-mm-dd');
If the table is really large, the query has to scan every row with transdate below the given date.
Even if you have an index on transdate and it helps stop the scan early (which it may not), when the number of matching rows is very high it will still take considerable time to scan them all and sum the values.
To speed things up, you could calculate partial sums, e.g. for each past month, assuming that your data is historical and the past does not change. Then you'd only need to scan custtrans for 1-2 months, quickly scan the table of monthly sums, and add the results.
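A rough sketch of that idea, using a hypothetical summary table and cut-off date (the names and the refresh strategy are placeholders):
CREATE TABLE custtrans_monthly_sums AS
SELECT TRUNC(transdate, 'MM') AS month_start,
       SUM(amountcur)         AS month_amount
FROM   custtrans
WHERE  transdate < DATE '2013-12-01'
GROUP  BY TRUNC(transdate, 'MM');

SELECT NVL((SELECT SUM(month_amount) FROM custtrans_monthly_sums), 0)
     + NVL((SELECT SUM(amountcur)
            FROM   custtrans
            WHERE  transdate >= DATE '2013-12-01'
            AND    transdate <= DATE '2013-12-31'), 0) AS total_amount
FROM dual;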
Try creating an index only on column amountcur:
CREATE INDEX myindex ON custtrans(amountcur)
In this case Oracle will most probably read only the index (index full scan), nothing else.
Correction, as mentioned in the comments: it must be a composite index:
CREATE INDEX myindex ON custtrans(transdate, amountcur)
But maybe it is a bit pointless to create an index just for a single select statement.
One option is to create an index on the column used in the WHERE clause (this is useful if you want to retrieve only 10-15% of the rows via the indexed column).
Another option is to partition your table if it has millions of rows. Here too, if you try to retrieve 70-80% of the data, it won't help.
The best option is first to analyze your requirements and then make a choice.
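As a hedged illustration of the partitioning option (Oracle syntax; the partition names and boundaries are made up, and migrating existing data is a separate exercise):
CREATE TABLE custtrans_part (
  transdate  DATE,
  amountcur  NUMBER
)
PARTITION BY RANGE (transdate) (
  PARTITION p2012 VALUES LESS THAN (DATE '2013-01-01'),
  PARTITION p2013 VALUES LESS THAN (DATE '2014-01-01'),
  PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);
A query with WHERE transdate <= DATE '2013-12-31' could then prune to the relevant partitions.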
Whenever you deal with date literals, it's better to use the to_date() function. Do not rely on implicit data type conversion.