Optimising a query - SQL

I have a table in my database that has 7k+ records. I have a query that searches for a particular id in that table (id is auto-incremented).
The query is:
select id,date_time from table order by date_time DESC;
This query scans all 7k+ rows. Isn't there any way I can optimize it so that the search only touches 500 or 1000 records? These records will increase day by day, and my query will get heavier and heavier. Any suggestions?

I don't know if I'm missing something here, but what's wrong with:
select id,date_time from table where id=?id order by date_time DESC;
where ?id is the id value you are searching for.
And of course, id should be indexed, ideally as the primary key.

If id is unique (possibly your primary key), then you don't need to search by date_time and you're guaranteed to only get back at most one row.
SELECT id, date_time FROM table WHERE id=<value>;
If id is not unique, then you can still use the same query, but you will need to look at indexes, other constraints, and/or caching outside the database if the query becomes too slow.
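For completeness, a minimal sketch of making sure the id lookup is indexed; the table and index names below are placeholders, not from the question:
-- If id is not already the primary key, promote it (placeholder table name):
ALTER TABLE your_table ADD PRIMARY KEY (id);
-- Or, if id is not unique, a plain secondary index still turns the lookup into an index seek:
CREATE INDEX idx_your_table_id ON your_table (id);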

Related

SQL query slow performance with LIKE 'CNTR%04052021%'

We have a database that is growing every day, roughly 40M records as of today.
This table/database is located in Azure.
The table has a primary key 'ClassifierID', and the query is running on this primary key.
The primary key is in the format of ID + timestamp (mmddyyyy HHMMSS), for example 'CNTR00220200 04052021 073000'.
Here is the query to get all the IDs by date:
Select distinct ScanID
From ClassifierResults
Where ClassifierID LIKE 'CNTR%04052020%'
Very simple and straightforward, but it sometimes takes over a minute to complete. Do you have any suggestions on how we can optimize the query? Thanks much.
The best thing here would be to fix your design so that a) you are not storing the ID and timestamp in the same text field, and b) you are storing the timestamp in a proper date/timestamp column. Using your single point of data, I would suggest the following table design:
ID | timestamp
CNTR00220200 | timestamp '2021-04-05 07:30:00'
Then, create an index on (ID, timestamp), and use this query:
SELECT *
FROM yourTable
WHERE ID LIKE 'CNTR%' AND
timestamp >= '2021-04-05' AND timestamp < '2021-04-06';
The above query searches for records having an ID starting with CNTR and falling exactly on the date 2021-04-05. Your SQL database should be able to use the composite index I suggested above on this query.
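A minimal sketch of that redesign, assuming an Azure SQL / SQL Server database; the table name, column names, and types are illustrative, and the timestamp column is called ScanTime here to avoid a reserved word:
-- Store the ID and the timestamp as separate, properly typed columns (types assume SQL Server / Azure SQL):
CREATE TABLE ClassifierResultsV2 (
    ID       varchar(20) NOT NULL,   -- e.g. 'CNTR00220200'
    ScanTime datetime2   NOT NULL,   -- e.g. '2021-04-05 07:30:00'
    ScanID   varchar(50) NOT NULL
);
-- Composite index so the prefix LIKE plus the date range can be answered with an index seek:
CREATE INDEX IX_ClassifierResultsV2_ID_ScanTime
    ON ClassifierResultsV2 (ID, ScanTime);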

SQL last recorded row takes too long to query

I am using the following query to get the last row of the table. This seems to be the standard approach for this type of query, so I am wondering why it takes so long to return the result.
I am looking for the potential reasons that could lead to such a long query time. For example, maybe the row is not near the top (I use some conditions) and it takes time to reach it if the query reads the table from top to bottom? Or does the query need to read the entire table before it can decide which row is correct (I don't think that is the case)?
Any insight into how the sorting works here is appreciated.
The code:
SELECT created_at , currency, balance
from YYY
where id in (ZZZ) and currency = 'XXX'
order by created_at desc
limit 1
This is your query:
select created_at, currency, balance
from YYY
where id in (ZZZ) and currency = 'XXX'
order by created_at desc
limit 1;
If your query has no indexes, then the engine needs to scan the entire table, choose the rows that match the where clause, sort those rows and choose the one with the maximum created_at.
If you have an index on YYY(currency, id, created_at desc), then the engine can simply look up the appropriate value right away using the index.
So, if you have the right optimizations in the database, then the query is likely to be much, much faster.
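A sketch of that index; note that the DESC qualifier on an index column is only honored by some engines (e.g. MySQL 8.0+), and where it is parsed but ignored the index can still be scanned backwards for this query:
-- Composite index covering the WHERE clause and the ORDER BY (index name is illustrative):
CREATE INDEX idx_yyy_currency_id_created_at
    ON YYY (currency, id, created_at DESC);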

Select random rows according to given criteria in PostgreSQL

I have a table user with ten million rows. It has the fields: id int4 primary key, rating int4, country varchar(32), last_active timestamp. There are gaps in the identifiers.
The task is to select five random users for a given country who were active within the last two days and have a rating in a given range.
Is there a tricky way to select them faster than the query below?
SELECT id
FROM user
WHERE last_active > '2020-04-07'
AND rating between 200 AND 280
AND country = 'US'
ORDER BY random()
LIMIT 5
I thought about this query:
SELECT id
FROM user
WHERE last_active > '2020-04-07'
AND rating between 200 AND 280
AND country = 'US'
AND id > (SELECT random()*max(id) FROM user)
ORDER BY id ASC
LIMIT 5
but the problem is that there are lots of inactive users with small identifier values, and the majority of new users are at the end of the id range. So this query would select some users too often.
Based on the EXPLAIN plan, your table is large. About 2 rows per page. Either it is very bloated, or the rows themselves are very wide.
The key to getting good performance is probably to get it to use an index-only scan, by creating an index which contains all 4 columns referenced in your query. The column tested for equality should come first. After that, you have to choose between your two range-or-inequality queried columns ("last_active" or "rating"), based on whichever you think will be more selective. Then you add the other range-or-inequality column and the id column to the end, so that an index-only scan can be used. So maybe create an index on app_user (country, last_active, rating, id). That will probably be good enough.
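A sketch of that index, assuming the table is app_user as the answer calls it:
-- Covering index so the query can be answered with an index-only scan (index name is illustrative):
CREATE INDEX app_user_country_active_rating_id_idx
    ON app_user (country, last_active, rating, id);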
You could also try a GiST index on those same columns. This has the theoretical advantage that the two range-or-inequality restrictions can be used together in defining which index pages to look at. But in practice, GiST indexes have very high overhead, and this overhead would likely exceed the theoretical benefit.
If the above aren't good enough, you could try partitioning. But how exactly you do that should be based on a holistic view of your application, not just one query.

Very slow SQL query

I'm having a problem with a slow query. Consider the table tblVotes, which has two columns: VoterGuid and CandidateGuid. It holds votes cast by voters for any number of candidates.
There are over 3 million rows in this table, with about 13,000 distinct voters casting votes for about 2.7 million distinct candidates. The total number of rows in the table is currently 6.5 million.
What my query is trying to achieve is getting - in the quickest and most cache-efficient way possible (we are using SQL Express) - the top 1000 candidates based on the number of votes they have received.
The code is:
SELECT CandidateGuid, COUNT(*) CountOfVotes
FROM dbo.tblVotes
GROUP BY CandidateGuid
HAVING COUNT(*) > 1
ORDER BY CountOfVotes DESC
... but this takes a scarily long time to run on SQL Express when the table is very full.
Can anybody suggest a good way to speed this up and get it running in quick time? CandidateGuid is indexed individually - and there is a composite primary key on CandidateGuid+VoterGuid.
If you have only two columns in a table, a "normal" index on those two fields won't help you much, because it is in fact a copy of your entire table, only ordered. First, check the execution plan to see whether your index is being used at all.
Then consider changing your index to a clustered index.
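A hedged sketch of that suggestion in SQL Server terms: a table can have only one clustered index, so this assumes the existing composite primary key was created as NONCLUSTERED.
-- Cluster the table on CandidateGuid so all votes for a candidate are stored together (index name is illustrative):
CREATE CLUSTERED INDEX CX_tblVotes_CandidateGuid
    ON dbo.tblVotes (CandidateGuid);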
Try using a top n, instead of a having clause - like so:
SELECT TOP 1000 CandidateGuid, COUNT(*) CountOfVotes
FROM dbo.tblVotes
GROUP BY CandidateGuid
ORDER BY CountOfVotes DESC
I don't know if SQL Server is able to use the composite index to speed this query, but if it is able to do so you would need to express the query as SELECT CandidateGUID, COUNT(VoterGUID) FROM . . . in order to get the optimization. This is "safe" because you know VoterGUID is never NULL, since it's part of a PRIMARY KEY.
If your composite primary key is specified as (CandidateGUID, VoterGUID) you will not get any added benefit of a separate index on just CandidateGUID -- the existing index can be used to optimize any query that the singleton index would assist in.
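Putting that together with the TOP 1000 suggestion from the previous answer, a hedged sketch of the query in that form:
-- COUNT(VoterGuid) instead of COUNT(*) so the (CandidateGuid, VoterGuid) primary key index can serve the query:
SELECT TOP 1000 CandidateGuid, COUNT(VoterGuid) AS CountOfVotes
FROM dbo.tblVotes
GROUP BY CandidateGuid
ORDER BY CountOfVotes DESC;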

Which one is a faster/better SQL practice?

Suppose I have a 2-column table (id, flag) where id is sequential.
I expect this table to contain a lot of records.
I want to periodically select the first row not flagged and update it. Some of the records on the way may have already been flagged, so I want to skip them.
Does it make more sense if I store the last id I flagged and use it in my select statement, like
select * from mytable where id > my_last_id order by id asc limit 1
or simply get the first unflagged row, like:
select * from mytable where flagged = 'F' order by id asc limit 1
Thank you!
If you create an index on flagged, retrieving an unflagged row should be pretty much an instant operation. If you always update them sequentially, then the first method is fine though.
Option two is the only one that makes sense unless you know that you're always going to process records in sequence!
Assuming MySQL, this one:
SELECT *
FROM mytable
WHERE flagged = 'F'
ORDER BY
flagged ASC, id ASC
LIMIT 1
will be slightly less efficient in InnoDB and of the same efficiency in MyISAM, if you have an index on (flagged, id).
InnoDB tables are clustered on the PRIMARY KEY, so fetching the first record in id does not require looking up the table.
In MyISAM, tables are heap-organized, so the index used to police the PRIMARY KEY is stored separately from the table.
Note that flagged in the ORDER BY clause may seem redundant, but it is required for MySQL to pick the correct index.
Also, the composite index should be on (flagged, id) even in InnoDB (which implicitly includes the PRIMARY KEY into each index).
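A sketch of the composite index this answer relies on:
-- Covers the WHERE on flagged and the ORDER BY on flagged, id (index name is illustrative):
CREATE INDEX idx_mytable_flagged_id ON mytable (flagged, id);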
You could use
Select Min(Id) as 'Id'
From dbo.myTable
Where Flagged='F'
Assuming Flagged = 'F' means that the row is not flagged.
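If the full row is needed rather than just the id, a hedged variation on the same idea (table and column names taken from the answer above):
-- Fetch the first unflagged row itself, not just its id:
SELECT *
FROM dbo.myTable
WHERE Id = (SELECT MIN(Id) FROM dbo.myTable WHERE Flagged = 'F');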