Is it faster to use a LIMIT clause when the max row count is known? - sql

Consider a query against a large table like:
select something from sometable limit somecount;
I know the LIMIT clause is useful to avoid fetching too many rows from a query.
But what about using it when only a few rows match, in a large table?
For example, there is a table created like this:
CREATE TABLE if not exists users (
id integer primary key autoincrement,
name varchar(80) unique not null,
password varchar(20) not null,
role integer default 1, -- 0 -> super admin; 1 -> user
banned integer default 0
);
Case 1: I want to get the user where id=100. Here id is the primary key,
so it can return 1 row at most. Which is faster between the 2 statements below?
select * from users where id=100;
select * from users where id=100 limit 1;
Case 2: I want to get the user where name='jhon'. Here name is unique,
so it can also return 1 row at most. Which is faster between the 2 statements below?
select * from users where name='jhon';
select * from users where name='jhon' limit 1;
Case 3: I want to get users where role=0. Here role is neither the primary key
nor unique, but I know there are 10 matching rows at most. Which is faster between the 2 statements below?
select * from users where role=0;
select * from users where role=0 limit 10;

If you care about performance, then add indexes to handle all three queries. This requires one additional index, on users(role). The id column already has an index as the primary key; name has an index because it is declared unique.
For the first two cases, the limit shouldn't make a difference. Without limit, the engine finds the matching row in the index and returns it. If the engine doesn't use the "unique" information, then it might need to peek at the next value in the index, just to see if it is the same.
The third case, with no index, is a bit different. Without an index, the engine will want to scan all the rows to find all matches. With an index, it can find all the matching rows there. Add a limit, and it will stop as soon as the limit is reached.
The appropriate indexes will be a bigger boost to performance than using limit, on average.
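A quick way to check what the engine actually does is EXPLAIN QUERY PLAN. Here is a hedged sketch using Python's sqlite3 module (the schema above looks like SQLite); the index name idx_users_role and the sample data are made up for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id integer primary key autoincrement,
    name varchar(80) unique not null,
    password varchar(20) not null,
    role integer default 1,
    banned integer default 0
);
CREATE INDEX idx_users_role ON users(role);
""")
conn.executemany(
    "INSERT INTO users (name, password, role) VALUES (?, ?, ?)",
    [("user%d" % i, "pw", 0 if i < 10 else 1) for i in range(1000)],
)

# EXPLAIN QUERY PLAN reports SEARCH (index lookup) vs SCAN (full table scan).
details = []
for sql in (
    "SELECT * FROM users WHERE id = 100",
    "SELECT * FROM users WHERE name = 'user5'",
    "SELECT * FROM users WHERE role = 0 LIMIT 10",
):
    detail = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]
    details.append(detail)
    print(sql, "->", detail)
```

With the role index in place, all three queries report a SEARCH rather than a SCAN; the LIMIT changes how early the engine may stop, not whether it uses the index.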

Related

SQL - Get specific row without a full table scan

I'm using PostgreSQL (CockroachDB) and I want to select a specific row. For example, there are thousands of records and I want to select row number 999.
In this case we would use LIMIT and OFFSET: SELECT * FROM table LIMIT 1 OFFSET 998;
However, using LIMIT and OFFSET can cause performance issues according to this post. So I'm wondering if there is a way to get a specific row without a full table scan.
I feel like it should be possible, because the database seems to sort data by primary key: when I do SELECT * FROM table; it always shows a sorted result. Since it is sorted by primary key, the database can use the index to access a specific row, right?
If you select rows based on the primary key (e.g. SELECT * FROM table WHERE <primary key> = <value>), no scans will be needed under the hood. The same is also true if you define a secondary index on the table and apply a WHERE clause that filters based on the column(s) in the secondary index.
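To illustrate the difference, here is a hedged sketch in SQLite (the table name t and the data are made up; CockroachDB behaves analogously for primary-key lookups):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id integer primary key, val text)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, "row%d" % i) for i in range(1, 10001)])

# OFFSET has to step over 998 rows before returning one:
slow = conn.execute("SELECT * FROM t ORDER BY id LIMIT 1 OFFSET 998").fetchone()

# A primary-key lookup jumps straight to the row via the index:
fast = conn.execute("SELECT * FROM t WHERE id = 999").fetchone()

print(slow, fast)  # both (999, 'row999'), since the ids here have no gaps
```

Note that "row number 999" and "id = 999" only coincide when the keys are contiguous; if rows have been deleted, the OFFSET query and the key lookup return different rows.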

Is it advised to index the field if I envision retrieving all records corresponding to positive values in that field?

I have a table with definition somewhat like the following:
create table offset_table (
id serial primary key,
offset numeric NOT NULL,
... other fields...
);
The table has about 70 million rows in it.
I envision doing the following query many times
select * from offset_table where offset > 0;
For speed issues, I am wondering whether it would be advised to create an index like:
create index on offset_table(offset);
I am trying to avoid creation of unnecessary indices on this table as it is pretty big already.
As you mentioned in the comments, about 70% of the rows match the offset > 0 predicate.
In that case the index would not be beneficial: PostgreSQL (and basically every other DBMS) would prefer a full table scan instead, because that is faster than constantly jumping between reading the index sequentially and reading the table randomly.
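A rule-of-thumb check you can run yourself is to measure the predicate's selectivity: when a large majority of rows match, the planner will usually prefer the full scan. A minimal sketch (SQLite in-memory for illustration, with made-up data tuned to ~70% positive values; the same COUNT queries work in PostgreSQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE offset_table (id integer primary key, "offset" numeric NOT NULL)'
)
# i % 10 - 2 yields values -2..7, so 7 out of every 10 rows are positive.
conn.executemany("INSERT INTO offset_table VALUES (?, ?)",
                 [(i, i % 10 - 2) for i in range(1000)])

total = conn.execute("SELECT count(*) FROM offset_table").fetchone()[0]
matching = conn.execute(
    'SELECT count(*) FROM offset_table WHERE "offset" > 0'
).fetchone()[0]
print("%d/%d rows match (%.0f%%)" % (matching, total, 100.0 * matching / total))
```

(The column name is quoted because OFFSET is a reserved word in SQL.)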

SQL index column whose values always run from 1 to N

I think my question is very simple, but every web search shows me results about SQL indexing.
I use the following SQL query to create a simple table:
CREATE TABLE SpeechOutputList
(
ID int NOT NULL IDENTITY(1,1),
SpeechConfigCode nvarchar(36) NOT NULL,
OutputSentence nvarchar(500),
IsPrimaryOutput bit DEFAULT 0,
PRIMARY KEY(ID),
FOREIGN KEY(SpeechConfigCode)
REFERENCES SpeechConfig
ON UPDATE CASCADE ON DELETE CASCADE
);
I would like to add an index column that increases automatically (not identity(1,1)) and always has values from 1 to N (according to the number of rows).
identity(1,1) will not do, since in many cases the numbers are not contiguous from 1 to N, because it's intended for primary keys.
Thanks
Trying to keep such an index field sequential, and without gaps, will not be efficient. If, for instance, a record is removed, you would need a trigger that renumbers the records that follow. This will not only take extra time, it will also reduce concurrency.
Furthermore, that index will not be a stable key for a record. If a client gets the index value of a record and later tries to locate it again by that index, it may well get a different record as a result.
If you still believe such an index is useful, I would suggest creating a view that adds this index on the fly:
CREATE VIEW SpeechOutputListEx AS
SELECT ID, SpeechConfigCode, OutputSentence, IsPrimaryOutput,
ROW_NUMBER() OVER (ORDER BY ID ASC) AS idx
FROM SpeechOutputList
This will make it possible to do selections, like:
SELECT * FROM SpeechOutputListEx WHERE idx = 5
To make an update, with a condition on the index, you would take the join with the view:
UPDATE s
SET OutputSentence = 'sentence'
FROM SpeechOutputList s
INNER JOIN SpeechOutputListEx se
ON s.ID = se.ID
WHERE idx = 5
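Put together, the view approach can be sketched end-to-end in SQLite (a hedged demo assuming SQLite 3.25+ for window functions, with the table trimmed to two columns and made-up sentences):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SpeechOutputList (
    ID integer primary key,
    OutputSentence text
);
INSERT INTO SpeechOutputList (ID, OutputSentence) VALUES
    (10, 'first'), (25, 'second'), (99, 'third');

-- idx renumbers the rows 1..N on the fly, so gaps in ID don't matter.
CREATE VIEW SpeechOutputListEx AS
SELECT ID, OutputSentence,
       ROW_NUMBER() OVER (ORDER BY ID ASC) AS idx
FROM SpeechOutputList;
""")

row = conn.execute(
    "SELECT ID, OutputSentence FROM SpeechOutputListEx WHERE idx = 2"
).fetchone()
print(row)  # (25, 'second') -- the second row in ID order, despite the gaps
```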
The issue of primary:
You explained in comments that the order should indicate whether a sentence is primary.
For that purpose you don't need the view. You could add a column idx that allows gaps, and just let the user determine its value. Even a negative value would not be an issue: you would select in order of idx and so get the primary sentence first.
If a sentence would have to be made primary, you could issue this update:
update SpeechOutputList
set idx = (select min(idx) - 1 from SpeechOutputList)
where id = 123

SQL get last rows in table WITHOUT primary ID

I have a table with 800,000 entries without a primary key. I am not allowed to add a primary key, and I can't use TOP 1 ... ORDER BY ... DESC because it takes hours to complete. So I tried this workaround:
DECLARE @ROWCOUNT int, @OFFSET int
SELECT @ROWCOUNT = (SELECT COUNT(field) FROM TABLE)
SET @OFFSET = @ROWCOUNT-1
select TOP 1 FROM TABLE WHERE=?????NO PRIMARY KEY??? BETWEEN @Offset AND @ROWCOUNT
Of course this doesn't work.
Anyway to do use this code/or better code to retrieve the last row in table?
If your table has no primary key, or your primary key is not orderly, you can try the code below. If you want to see more than one last record, you can change the number in the code.
Select top (select COUNT(*) from table) * From table
EXCEPT
Select top ((select COUNT(*) from table)-(1)) * From table
I assume that when you say 'last rows', you mean 'last created rows'.
Even if you had a primary key, it would still not be the best option for determining row creation order.
There is no guarantee that the row with the bigger primary key value was created after the row with a smaller primary key value.
Even if the primary key is on an identity column, you can still override identity values on insert by using
set identity_insert on.
It is a better idea to have a timestamp column, for example CreatedDateTime with a default constraint.
You would have an index on this field. Then your query would be simple, efficient and correct:
select top 1 *
from MyTable
order by CreatedDateTime desc
If you don't have timestamp column, you can't determine 'last rows'.
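As a sketch of the timestamp approach in SQLite syntax (ORDER BY ... DESC LIMIT 1 is the equivalent of TOP 1 ... ORDER BY ... DESC; the table and column names are made up, and the short sleeps only exist to guarantee distinct timestamps in this demo):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE MyTable (
        payload text,
        CreatedDateTime text NOT NULL
            DEFAULT (strftime('%Y-%m-%d %H:%M:%f', 'now'))
    )
""")
conn.execute("CREATE INDEX ix_created ON MyTable(CreatedDateTime)")

for p in ("first", "second", "last"):
    conn.execute("INSERT INTO MyTable (payload) VALUES (?)", (p,))
    time.sleep(0.01)  # ensure distinct millisecond timestamps

# The index makes this an index lookup instead of a full scan.
last = conn.execute(
    "SELECT payload FROM MyTable ORDER BY CreatedDateTime DESC LIMIT 1"
).fetchone()[0]
print(last)  # 'last' -- the most recently created row
```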
If you need to select 1 column from a table of 800,000 rows where that column is the min or max possible value, and that column is not indexed, then the unassailable fact is that SQL will have to read every row in the table in order to identify that min or max value.
(An aside, on the face of it reading all the rows of an 800,000 row table shouldn't take all that long. How wide is the column? How often is the query run? Are there concurrency, locking, blocking, or deadlocking issues? These may be pain points that could be addressed. End of aside.)
There are any number of workarounds (indexes, views, indexed views, periodically indexed copies of the table, run once and store the result for a period of time before refreshing, etc.), but virtually all of them require making permanent modifications to the database. It sounds like you are not permitted to do this, and I don't think there's much you can do here without some such permanent change--call it an improvement when you discuss it with your project manager--to the database.
You need to add an index--can you?
Even if you don't have a primary key, an index will speed up the query considerably.
You say you don't have a primary key, but from your question I assume you have some type of timestamp or something similar on the table. If you create an index on that column you will be able to execute a query like:
SELECT *
FROM table_name
WHERE timestamp_column_name=(
SELECT max(timestamp_column_name)
FROM table_name
)
If you're not allowed to edit this table, have you considered creating a view, or replicating the data in the table and moving it into one that has a primary key?
Sounds hacky, but then, your 800k row table doesn't have a primary key, so hacky seems to be the order of the day. :)
I believe you could write it simply as (note that rowid is SQLite-specific):
SELECT * FROM table ORDER BY rowid DESC LIMIT 1;
Hope it helps.
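A hedged sketch of the rowid trick (the table and data are made up; be aware that rowids can be reused after deletes, so "largest rowid = last inserted" only holds if rows are never deleted):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val text)")  # no primary key declared
conn.executemany("INSERT INTO t VALUES (?)", [("a",), ("b",), ("c",)])

# Ordinary SQLite tables still carry an implicit rowid column.
row = conn.execute("SELECT val FROM t ORDER BY rowid DESC LIMIT 1").fetchone()
print(row)  # ('c',) -- the most recently inserted row
```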

SELECT query slow when flag column is a constraint

I have a fairly simple table called widgets. Each row holds an id, a description, and an is_visible flag:
CREATE TABLE `widgets` (
`id` int auto_increment primary key,
`description` varchar(255),
`is_visible` tinyint(1) default 1
);
I'd like to issue a query that selects the descriptions of a subset of visible widgets. The following simple query does the trick (where n and m are integers):
SELECT `description`
FROM `widgets`
WHERE (`is_visible`)
ORDER BY `id` DESC
LIMIT n, m;
Unfortunately this query, as written, has to scan at least n+m rows. Is there a way to make this query scan fewer rows, either by reworking the query or modifying the schema?
Use indexes for faster query result:
ALTER TABLE `widgets` ADD INDEX ( `is_visible` )
Is there a way to make this query scan fewer rows?
No, not really. Given that it's a binary flag, you wouldn't get much benefit from creating an index on that field.
I will elaborate, given the downvote.
You have to take into consideration the cardinality (# of unique values) of an index. From the MySQL Manual:
The higher the cardinality, the greater the chance that MySQL uses the index when doing joins.
An index on that field would have a cardinality of 2. It doesn't get much lower than that.
See also: Why does MySQL not use an index on a int field that's being used as a boolean?
Indexing boolean fields