Multiple Non-Clustered index and performance? - sql

I have a table in SQL Server with 700,000 records. A simple SELECT query with 3 to 4 conditions in the WHERE clause takes up to 45 seconds. The table already has 2 non-clustered indexes and 1 clustered index, so I was thinking of adding 2 more non-clustered indexes. That way, the table would have an index on every column I use in the WHERE clause. I have tried it, and the results come back considerably faster than before.
Can having 5 to 6 non-clustered indexes harm database performance, or would it not have much effect?
My query structure is:
SELECT ( SOME COLUMNS) FROM MyTable
WHERE COL1 = @Id AND COL2 >= @SomeDate AND (NOT (COL3 = 1)) AND
(COL4 <= @SomeOtherDate)
Table has 35 columns.

This is your query:
SELECT ( SOME COLUMNS)
FROM MyTable
WHERE COL1 = @Id AND COL2 >= @SomeDate AND (NOT (COL3 = 1)) AND
(COL4 <= @SomeOtherDate)
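Unfortunately, your query can only make direct use of two columns in this clause: an index seek can follow the equality on COL1 and then one range condition (the one on COL2), while the remaining conditions are applied as filters on the rows that the seek returns. I would suggest the following composite index: (col1, col2, col3, col4). This index covers the WHERE clause, but can only be used directly for the first two conditions.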
A clustered index would probably be a marginal improvement over a non-clustered b-tree index.
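Note that if col3 only takes on the values 0 and 1, then you should write the WHERE clause as:
WHERE COL1 = @Id AND COL2 >= @SomeDate AND COL3 = 0 AND
(COL4 <= @SomeOtherDate)
And use an index on either (col1, col3, col2, col4) or (col1, col3, col4, col2). Putting both equality columns first lets the index be used directly for three conditions instead of two.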
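The column-ordering advice can be checked on a small scale. The following is a minimal sketch that uses SQLite (through Python's sqlite3 module) as a stand-in for SQL Server, so the plan output format differs from what SQL Server shows. The sample data, the `payload` column and the index name `ix_eq_first` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (id INTEGER PRIMARY KEY, col1 INT, col2 TEXT, "
    "col3 INT, col4 TEXT, payload TEXT)"
)

# Hypothetical sample data: col2/col4 are ISO date strings, col3 is a 0/1 flag.
rows = [
    (i % 50, f"2020-01-{(i % 28) + 1:02d}", (i // 50) % 2,
     f"2020-02-{(i % 28) + 1:02d}", "x")
    for i in range(5000)
]
conn.executemany(
    "INSERT INTO MyTable (col1, col2, col3, col4, payload) VALUES (?, ?, ?, ?, ?)",
    rows,
)

# Equality columns (col1, col3) first, then the range columns.
conn.execute("CREATE INDEX ix_eq_first ON MyTable (col1, col3, col2, col4)")

query = """
SELECT id, payload FROM MyTable
WHERE col1 = ? AND col2 >= ? AND col3 = 0 AND col4 <= ?
"""
params = (8, "2020-01-10", "2020-02-20")

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)

result = conn.execute(query, params).fetchall()
print(len(result), "matching rows")
```

In the printed plan, the constraints listed in parentheses after the index name are the ones used for the seek. The col4 condition is still evaluated, but only as a filter on the rows the seek returns.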

Related

PostgreSQL Update Statement Performance

I have a table with these columns:
id (int)
col1 (int)
col2 (varchar)
date1 (date)
col3 (int)
cumulative_col3 (int)
and about 750k rows.
I want to update cumulative_col3 with the sum of col3 over all rows that have the same col1 and col2 and a date1 on or before the row's own date1.
I have indexes on (date1), (date1, col1, col2) and (col1, col2).
I have tried the following query but it takes a long time to complete.
update table_name
set cumulative_col3 = (select sum(s.col3)
from table_name s
where s.date1 <= table_name.date1
and s.col1 = table_name.col1
and s.col2 = table_name.col2);
What can I do to improve the performance of this query?
You can try to calculate the running sum in a derived table instead:
update table_name
set cumulative_col3 = t.cum_sum
from (
select id,
sum(col3) over (partition by col1, col2 order by date1) as cum_sum
from table_name
) s
where s.id = table_name.id;
This assumes that id is the primary key of the table.
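As a rough check that the window function yields the same running totals as the correlated subquery, here is a small sketch using SQLite from Python (window functions need SQLite 3.25 or later). It is not PostgreSQL: instead of UPDATE ... FROM, it reads the running sums and writes them back with a parameterised UPDATE. The sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name (id INTEGER PRIMARY KEY, col1 INT, col2 TEXT, "
    "date1 TEXT, col3 INT, cumulative_col3 INT)"
)

# Hypothetical sample rows: two (col1, col2) groups with increasing dates.
sample = [
    (1, 1, "a", "2020-01-01", 10),
    (2, 1, "a", "2020-01-02", 5),
    (3, 1, "a", "2020-01-03", 7),
    (4, 2, "b", "2020-01-01", 3),
    (5, 2, "b", "2020-01-05", 4),
]
conn.executemany(
    "INSERT INTO table_name (id, col1, col2, date1, col3) VALUES (?, ?, ?, ?, ?)",
    sample,
)

# One pass over the table computes every running sum.
running = conn.execute(
    "SELECT sum(col3) OVER (PARTITION BY col1, col2 ORDER BY date1) AS cum_sum, id "
    "FROM table_name"
).fetchall()

# Write the sums back by primary key.
conn.executemany("UPDATE table_name SET cumulative_col3 = ? WHERE id = ?", running)

result = conn.execute(
    "SELECT id, cumulative_col3 FROM table_name ORDER BY id"
).fetchall()
print(result)  # [(1, 10), (2, 15), (3, 22), (4, 3), (5, 7)]
```

With the default window frame, rows that share the same date1 within a group all receive the same total, which matches the `<=` comparison in the original subquery.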
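You might try adding an index to your table that lists the equality columns first, then the range column, then the summed column:
CREATE INDEX idx ON table_name (col1, col2, date1, col3);
This index, if used, should allow the correlated sum subquery to be evaluated faster, because all rows for one (col1, col2) pair up to a given date1 sit next to each other in the index and col3 can be read from the index itself.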

Why does the optimizer choose a keylookup instead of 2 separate queries?

I have a table that has a primary key/clustered index on an ID column and a nonclustered index on a system date column. If I query all the columns from the table using the system date column (a covering index wouldn't make sense here), the execution plan shows a key lookup, because for each record it finds it has to go to the ID to get all of the column data.
The weird thing is, if I write 2 queries with a temp table, it performs much faster. I can query the system date to get a table of IDs and then use that table to search the ID column. This makes sense because you're no longer doing the slow key lookup for each record.
Why doesn't the optimizer do this for us?
--slow version with key lookup
--id primary key/clustered index
--systemdate nonclustered index
select ID, col1, col2, col3, col4, col5, SystemDate
from MyTable
where SystemDate > '2019-01-01'
--faster version
--id primary key/clustered index
--systemdate nonclustered index
select ID, SystemDate
into #myTempTable
from MyTable
where SystemDate > '2019-01-01'
select t1.ID, t1.col1, t1.col2, t1.col3, t1.col4, t1.col5, t1.SystemDate
from MyTable t1
inner join #myTempTable t2
on t1.ID = t2.ID
Well, in the second case you're actually doing a key lookup yourself, aren't you? ; )
The optimizer can pick a slower plan because of outdated (or missing) statistics or a fragmented index.
To tell you why it's actually slower, it would be best if you pasted your execution plans here. That would make it much easier to explain what happens.
The query optimizer chooses a key lookup because the query is not supported by a covering index. It has to fetch the missing columns from the table itself:
/*
--slow version with key lookup
--id primary key/clustered index
--systemdate nonclustered index
*/
select ID, col1, col2, col3, col4, col5, SystemDate
from MyTable
where SystemDate > '2019-01-01';
Adding a covering index should boost the performance:
CREATE INDEX my_idx ON MyTable(SystemDate) INCLUDE(col1, col2, col3, col4, col5);
For query without JOIN:
select ID, col1, col2, col3, col4, col5, SystemDate
from MyTable -- single table
where SystemDate > '2019-01-01';
The execution plan still contains a join: the index seek on SystemDate is joined to the key lookup on the clustered index.
After introducing the covering index, the additional key lookup is no longer needed.
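The difference between a narrow index and a covering one can be reproduced in miniature. The sketch below uses SQLite from Python as a stand-in for SQL Server. SQLite has no INCLUDE clause, so the extra columns are simply appended as trailing key columns. The table is cut down to two data columns, and the index names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (ID INTEGER PRIMARY KEY, col1 INT, col2 INT, SystemDate TEXT)"
)
conn.executemany(
    "INSERT INTO MyTable (col1, col2, SystemDate) VALUES (?, ?, ?)",
    [(i, i * 2, f"2019-{(i % 12) + 1:02d}-15") for i in range(1000)],
)

query = "SELECT ID, col1, col2, SystemDate FROM MyTable WHERE SystemDate > '2019-06-01'"


def plan(sql):
    """Return the detail text of EXPLAIN QUERY PLAN as one string."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Narrow index: col1/col2 must be fetched from the table for every match.
conn.execute("CREATE INDEX ix_date ON MyTable (SystemDate)")
narrow_plan = plan(query)

# Wider index: every selected column is in the index, so no table lookup.
conn.execute("DROP INDEX ix_date")
conn.execute("CREATE INDEX ix_date_cover ON MyTable (SystemDate, col1, col2)")
covering_plan = plan(query)

print("narrow:  ", narrow_plan)
print("covering:", covering_plan)
```

SQLite marks the second plan with the words COVERING INDEX, which plays the same role as the disappearance of the Key Lookup operator in a SQL Server plan.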

Performance for Avg & Max in SQL

I want to decrease the query execution time for the following query.
This query is taking around 1 min 20 secs for about 2k records.
Number of records in table: 1348474
Number of records matched by the WHERE clause: 25000
Number of records returned: 2152
SELECT Col1, Col2,
ISNULL(AVG(Col3),0) AS AvgCol,
ISNULL(MAX(Col3),0) AS MaxCol,
COUNT(*) AS Col5
FROM TableName WITH(NOLOCK)
GROUP BY Col1, Col2
ORDER BY Col1, MaxCol DESC
I tried removing the AVG and MAX columns, and the execution time dropped to 1 sec.
Is there a more efficient way to write this query?
I have no indexes other than the primary key.
Update
Indexes added:
nonclustered located on PRIMARY - Col1
nonclustered located on PRIMARY - Col2
clustered, unique, primary key located on PRIMARY - Id
Thanks in advance..Happy coding !!!
For this query:
SELECT Col1, Col2,
COALESCE(AVG(Col3), 0) AS AvgCol,
COALESCE(MAX(Col3), 0) AS MaxCol,
COUNT(*) AS Col5
FROM TableName
GROUP BY Col1, Col2
ORDER BY Col1, MaxCol DESC;
I would start with an index on (Col1, Col2, Col3).
I'm not sure if this will help. It is possible that the issue is the time for ordering the results.
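To see what such an index does for the aggregation, here is a small sketch using SQLite from Python rather than SQL Server, so the NOLOCK hint is dropped and the plan text looks different. The sample rows, the `Other` column and the index name `ix_group` are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableName (Id INTEGER PRIMARY KEY, Col1 INT, Col2 INT, "
    "Col3 REAL, Other TEXT)"
)

# Hypothetical sample rows; one Col3 value is NULL to show that AVG/MAX skip it.
data = [(1, 1, 10.0), (1, 1, 20.0), (1, 2, 5.0), (2, 1, None), (2, 1, 8.0)]
conn.executemany("INSERT INTO TableName (Col1, Col2, Col3) VALUES (?, ?, ?)", data)

# Grouping columns first, aggregated column last.
conn.execute("CREATE INDEX ix_group ON TableName (Col1, Col2, Col3)")

query = """
SELECT Col1, Col2,
       COALESCE(AVG(Col3), 0) AS AvgCol,
       COALESCE(MAX(Col3), 0) AS MaxCol,
       COUNT(*) AS Col5
FROM TableName
GROUP BY Col1, Col2
ORDER BY Col1, MaxCol DESC
"""

rows = conn.execute(query).fetchall()
plan_text = " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + query))

print(rows)  # [(1, 1, 15.0, 20.0, 2), (1, 2, 5.0, 5.0, 1), (2, 1, 8.0, 8.0, 2)]
print(plan_text)
```

The index delivers rows already grouped by (Col1, Col2) and carries Col3, so the aggregates can be computed without touching the wide table. The final ORDER BY on MaxCol still needs a sort, because MaxCol is only known after aggregation.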

SQL Covering Columns Order

Does the order of covering columns matter in an index?
CREATE INDEX idx1 ON MyTable (Col1, Col2) INCLUDE (Col3, Col4)
That is, does the order of Col3 and Col4 in the above example matter?
No, included columns are not ordered, so the order in which they appear does not matter.

Get row where column2 is X and column1 is max of column1

I have a SQLite table like this:
Col1 Col2 Col3
1 ABC Bill
2 CDE Fred
3 FGH Jack
4 CDE June
I would like to find the row containing a Col2 value of CDE which has the max Col1 value i.e. in this case June. Or, put another way, the most recently added row with a col2 value of CDE, as Col1 is an auto increment column. What is an SQL query string to achieve this? I need this to be efficient as the query will run many iterations in a loop.
Thanks.
SELECT * FROM table WHERE col2='CDE' ORDER BY col1 DESC LIMIT 1
If col1 were not an auto-increment column, it would look something like this:
SELECT *,MAX(col1) AS max_col1 FROM table WHERE col2='CDE' GROUP BY col2 LIMIT 1
Try this:
SELECT t1.*
FROM table1 t1
INNER JOIN
(
SELECT MAX(col1) MAXID, col2
FROM table1
GROUP BY col2
) t2 ON t1.col1 = t2.maxID AND t1.col2 = t2.col2
WHERE t1.col2 = 'CDE';
Note: the SQL Fiddle demo for this query uses MySQL, but the same syntax should work fine in SQLite.
Use a subquery such as:
SELECT Col1, Col2, Col3
FROM table
WHERE Col1 = (SELECT MAX(Col1) FROM table WHERE Col2='CDE')
Add indexes as appropriate, e.g. clustered index on Col1 and another nonclustered index on Col2 to speed up the subquery.
In SQLite 3.7.11 and later, the simplest query would be:
SELECT *, max(Col1) FROM MyTable WHERE Col2 = 'CDE'
As shown by EXPLAIN QUERY PLAN, this query and passingby's query are the most efficient ones, provided there is an index on Col2.
If you want to see the corresponding values for all Col2 values, use a query like this instead:
SELECT *, max(Col1) FROM MyTable GROUP BY Col2
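The queries above can be tried against the sample data from the question. This is a minimal sketch using Python's sqlite3 module. It assumes the table is called MyTable and that the bundled SQLite is 3.7.11 or later, which the bare-column max() behaviour requires. The index name `ix_col2` is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (Col1 INTEGER PRIMARY KEY AUTOINCREMENT, Col2 TEXT, Col3 TEXT)"
)
conn.executemany(
    "INSERT INTO MyTable (Col2, Col3) VALUES (?, ?)",
    [("ABC", "Bill"), ("CDE", "Fred"), ("FGH", "Jack"), ("CDE", "June")],
)
conn.execute("CREATE INDEX ix_col2 ON MyTable (Col2)")

# Variant 1: order by the auto-increment column and keep the first row.
by_order = conn.execute(
    "SELECT Col1, Col2, Col3 FROM MyTable WHERE Col2 = ? ORDER BY Col1 DESC LIMIT 1",
    ("CDE",),
).fetchone()

# Variant 2: SQLite takes the bare columns from the row that holds max(Col1).
by_max = conn.execute(
    "SELECT Col1, Col2, Col3, max(Col1) FROM MyTable WHERE Col2 = ?",
    ("CDE",),
).fetchone()

# Latest row for every Col2 value.
per_group = conn.execute(
    "SELECT Col2, Col3, max(Col1) FROM MyTable GROUP BY Col2 ORDER BY Col2"
).fetchall()

print(by_order)   # (4, 'CDE', 'June')
print(by_max)     # (4, 'CDE', 'June', 4)
print(per_group)  # [('ABC', 'Bill', 1), ('CDE', 'June', 4), ('FGH', 'Jack', 3)]
```

The second and third variants rely on SQLite-specific behaviour. Other databases either reject a bare column next to an aggregate or return an arbitrary row's value for it, so the ORDER BY ... LIMIT 1 form is the more portable choice.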