How to Optimize SQL query with DISTINCT and fetch first? - sql

I am trying to fetch first 5 distinct rows using the query SELECT DISTINCT col_A from table_name fetch first 5 rows only;
But the table has millions of rows so using distinct is expensive as it scans the whole table for only 5 rows taking a lot of time, around 200 seconds for me.
Is there a workaround or subquery for this?

The problem here is you don't know if a row is unique unless you check all the other rows. I believe your only solution would be to index the rows such that non-distinct rows are indexed together. That might buy you some efficiency when searching, however it will cost you when inserting data.

Related

Sql tuning - Just need 5 records from table

select *
from customers
where column1 = 'test'
limit 5;
I just need 5 records. will execution engine stop running after finding 5 records which matches condition.
I am working on table with millions of records simple select statement with limit is taking ~20 minutes.
Can I improve the performance of this query?
Make sure that you have an index on column1. If not, then the engine has to scan ALL records starting with the first one until it finds 5 matching records. If you know that more than this single column will match your desired rows and also exclude other rows, you could create a compound index consisting of more than one row. You could also consider partitioning your table.

SQL takes more time to fetch records

I am using sql server 2012. I have one table, contains 1.7 million records. I am just selecting all record using (select * from table_name). But it takes 1 Hour 2 minutes to fetch.
What should i do to fetch records quickly?
All you can do is limit your Result by Using
Top(100) Result or Where Clause Your Desired Data instead of
SELECT * FROM table
This will help you get only concerning and limited data in a less amount of time.
Another thing you can do is to Get Concerning columns only which gives you desired results.
This will significantly enhance the fetching time.

SQL Query Execution time , SQL Server, Nested Query

I have a query as following:
SELECT Brand,Id,Model FROM product
Which takes time in order of seconds as Product table has more than 1 million records.
But the query executes within no time. (less than even one second))
select count(*) as numberOfRows from (SELECT Brand,Id,Model FROM product) result
Why is that?
When you execute a Query, the Time taken will vary depending on the Number of Columns and Rows and their datatypes.
In a table where you have 10 Columns, The Performance will be different if you select all columns (*) for all records and Just 1 or Two Columns for All records.
Because The Amount of data loaded is less in the Second case, it will execute faster.
Just like that, When you say Count(*) the result is Just a single Cell whereas in your first Select, you are selecting Millions of rows for those 3 columns, So the amount of Data is high.
That's why you are getting the Count(*) result faster. You don't need to give * inside the count, Instead Just use Count(1) for even more better performance.

I am using select top 5 from(select * from tbl2) tbl, will it get all records from tbl2 or will it get only specific records i.e 5

I am using select top 5 from(select * from tbl2) tbl, will it get all records from tbl2 and then it will it get only specific records i.e 5? or do it only get 5 records from internal memory. suppose we have 1000 records in tbl2.
For future reference, SQL actually has a huge amount of amazing documentation/resources online. For instance: This google search.
As to your question, it will pull the top five results matching your criteria, so it depends. It'll go through, and find the first five results matching your criteria. If those are the last results it goes through, it'll still have to do the comparisons and filtering on all rows, the only difference will be that it will have to send less rows to your computer.
For example, let's say we have a table my_table with two columns: Customer_ID (which is unique) and First_Purchase_Date (which is a date value between 2015-01-01 and 2017-07-26). If we simply do SELECT TOP 5 * FROM my_table then it will go through and pull the first five rows it finds, without looking at the rest of the rows. On the other hand, if we do SELECT TOP 5 * FROM my_table WHERE First_Purchase_Date = '2017-05-17' then it will have to go through all the rows until it can find five rows with a First_Purchase_Date of 2017-05-17. If First_Purchase_Date is indexed, this should not be very expensive, as it'll more or less know where to look. If it's not, then it depends on your how SQL has decided to structure your table, and if it has created any useful statistics. Worst case, it could not have any statistics and the desired rows could be the last five in the database, in which case it will have to complete the comparison on all the rows in the database.
By the way, this is a somewhat poor idea, as the columns returned will not necessarily stay consistent over time. It may be a good idea to throw in an ORDER BY clause, to ensure you get the same records every time.
The SELECT TOP clause is used to specify the number of records to return.
It limits the rows returned in a query result set to a specified number of rows or percentage of rows. When TOP is used in conjunction with the ORDER BY clause, the result set is limited to the first N number of ordered rows; otherwise, it returns the first N number of rows in an undefined order.
SELECT TOP number|percent column_name(s)
FROM table_name
WHERE condition;

Maximum number of rows a select statement can fetch

Is there any practical limit of number of rows a select statement can fetch from a database- any database?
Assume, I am running a query SELECT * FROM TableName and that table has more than 12,00,000 rows.
How many rows it can fetch? Is there a limit for this?
12000000 is not at all a big number I have worked with way bigger result sets. As long as your memory can fit the result you should have no problems.