I want to decrease the execution time of the following query.
It currently takes around 1 min 20 secs to return about 2k records.
Number of records in the table: 1,348,474
Number of records processed by the query: 25,000
Number of records returned: 2,152
SELECT Col1, Col2,
       ISNULL(AVG(Col3), 0) AS AvgCol,
       ISNULL(MAX(Col3), 0) AS MaxCol,
       COUNT(*) AS Col5
FROM TableName WITH (NOLOCK)
GROUP BY Col1, Col2
ORDER BY Col1, MaxCol DESC
I tried removing the AVG and MAX columns, and the time dropped to about 1 second.
Is there a way to optimize this query?
I have no indexes other than the primary key.
Update
Indexes added:
nonclustered located on PRIMARY - Col1
nonclustered located on PRIMARY - Col2
clustered, unique, primary key located on PRIMARY - Id
======
Thanks in advance. Happy coding!
For this query:
SELECT Col1, Col2,
COALESCE(AVG(Col3), 0) AS AvgCol,
COALESCE(MAX(Col3), 0) AS MaxCol,
COUNT(*) AS Col5
FROM TableName
GROUP BY Col1, Col2
ORDER BY Col1, MaxCol DESC;
I would start with an index on (Col1, Col2, Col3).
I'm not sure whether this will help; it is possible that the real cost is sorting the results.
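If you want to try it, a sketch of that index (the index name is a placeholder):
CREATE NONCLUSTERED INDEX IX_TableName_Col1_Col2_Col3
    ON TableName (Col1, Col2, Col3);
With Col1 and Col2 leading, the engine can stream the groups straight off the index and compute AVG and MAX from the trailing Col3 without touching the base table.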
Related
I have a table with these columns:
id (int)
col1 (int)
col2 (varchar)
date1 (date)
col3 (int)
cumulative_col3 (int)
and about 750k rows.
I want to update cumulative_col3 with the sum of col3 over all rows that have the same col1 and col2 and a date1 on or before the current row's date1.
I have indexes on (date1), (date1, col1, col2) and (col1, col2).
I have tried the following query but it takes a long time to complete.
update table_name
set cumulative_col3 = (select sum(s.col3)
                       from table_name s
                       where s.date1 <= table_name.date1
                         and s.col1 = table_name.col1
                         and s.col2 = table_name.col2);
What can I do to improve the performance of this query?
You can try to calculate the running sum in a derived table instead:
update table_name
set cumulative_col3 = t.cum_sum
from (
    select id,
           sum(col3) over (partition by col1, col2 order by date1) as cum_sum
    from table_name
) t
where t.id = table_name.id;
This assumes that id is the primary key of the table.
You might try adding the following index to your table:
CREATE INDEX idx ON table_name (date1, col1, col2, col3);
This index, if used, should allow the correlated sum subquery to be evaluated faster.
I have a simple query that filters a 100M-row table on fk_id and returns rows sorted by col1, col2.
The query is:
SELECT * FROM table
WHERE fk_id=$fk_id
ORDER BY col1 DESC, col2 DESC
LIMIT 10
col1 and col2 are TIMESTAMP columns, each with its own DESC index. fk_id is also indexed.
EXPLAIN ANALYZE shows that Postgres first orders the table by col1 and col2 and then filters out rows by fk_id.
A plain lookup query SELECT * FROM table WHERE fk_id=$fk_id is 1000x faster than the query above, so I want the WHERE filter to be applied first.
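One common way to get that plan, if it fits the schema, is a single composite index that matches both the filter and the sort (names are taken from the question; "table" is quoted because it is a reserved word):
CREATE INDEX idx_table_fk_sort ON "table" (fk_id, col1 DESC, col2 DESC);
Postgres can then seek to the fk_id value and read the first 10 rows already in the requested order, with no separate sort step.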
I have a table with a primary key/clustered index on an ID column and a nonclustered index on a system date column. If I query all of the columns from the table filtering on the system date column (a covering index wouldn't make sense here), the execution plan shows a key lookup, because for each record it finds it has to go to the clustered index to get the rest of the column data.
The weird thing is, if I write 2 queries with a temp table, it performs much faster: I can query the system date column to get a table of IDs and then use that table to search on the ID column. This makes sense, because you're no longer doing the slow key lookup for each record.
Why doesn't the optimizer do this for us?
--slow version with key lookup
--id primary key/clustered index
--systemdate nonclustered index
select ID, col1, col2, col3, col4, col5, SystemDate
from MyTable
where SystemDate > '2019-01-01'
--faster version
--id primary key/clustered index
--systemdate nonclustered index
select ID, SystemDate
into #myTempTable
from MyTable
where SystemDate > '2019-01-01'
select t1.ID, t1.col1, t1.col2, t1.col3, t1.col4, t1.col5, t1.SystemDate
from MyTable t1
inner join #myTempTable t2
on t1.ID = t2.ID
Well, in the second case you're actually doing the key lookup yourself, aren't you? ;)
The optimizer could pick a slower plan due to outdated (or missing) statistics or a fragmented index.
To tell you why it's actually slower, it would be best if you pasted your execution plans here; that would make it much easier to explain what happens.
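If you want to rule those two causes out first, a quick sketch (the table name is from the question):
UPDATE STATISTICS MyTable WITH FULLSCAN; -- refresh statistics on all indexes of the table
ALTER INDEX ALL ON MyTable REBUILD;      -- remove index fragmentation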
The query optimizer chooses a key lookup because the query is not supported by a covering index; it has to fetch the missing columns from the table itself:
/*
--slow version with key lookup
--id primary key/clustered index
--systemdate nonclustered index
*/
select ID, col1, col2, col3, col4, col5, SystemDate
from MyTable
where SystemDate > '2019-01-01';
Adding a covering index should boost the performance:
CREATE INDEX my_idx ON MyTable(SystemDate) INCLUDE(col1, col2, col3, col4, col5);
db<>fiddle demo
For a query without a JOIN:
select ID, col1, col2, col3, col4, col5, SystemDate
from MyTable -- single table
where SystemDate > '2019-01-01';
There is still a JOIN in the execution plan: a nested loops join between the nonclustered index seek and the key lookup on the clustered index. After introducing the covering index, there is no need for the additional key lookup.
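One way to verify the effect yourself, as a sketch: compare logical reads before and after creating the index.
SET STATISTICS IO ON;
-- run the query once before and once after CREATE INDEX; the covering-index
-- plan should report far fewer logical reads and no key lookup
select ID, col1, col2, col3, col4, col5, SystemDate
from MyTable
where SystemDate > '2019-01-01';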
I have a table in SQL Server that has 700,000 records, but when I run a simple SELECT query with 3 to 4 conditions in the WHERE clause, it takes up to 45 seconds. I already have 2 non-clustered indexes and 1 clustered index on that table, so I was thinking of adding 2 more non-clustered indexes. By doing so, my table would have an index on every column used in the WHERE clause of my query. I have tried this and found that the results come back much faster than before.
Can having 5 to 6 non-clustered indexes harm database performance, or would it not matter much?
My query structure is:
SELECT (SOME COLUMNS) FROM MyTable
WHERE COL1 = @Id AND COL2 >= @SomeDate AND (NOT (COL3 = 1)) AND
      (COL4 <= @SomeOtherDate)
The table has 35 columns.
This is your query:
SELECT (SOME COLUMNS)
FROM MyTable
WHERE COL1 = @Id AND COL2 >= @SomeDate AND (NOT (COL3 = 1)) AND
      (COL4 <= @SomeOtherDate)
Unfortunately, an index can only be used directly for two of the columns in this clause. I would suggest the following composite index: (col1, col2, col3, col4). This index covers the WHERE clause, but only the first two conditions (the equality on COL1 and the range on COL2) can drive the index seek.
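As a sketch (the index name is a placeholder):
CREATE INDEX IX_MyTable_col1_col2_col3_col4 ON MyTable (COL1, COL2, COL3, COL4);
The COL3 and COL4 conditions are then evaluated as residual predicates against the index rows, which still avoids reading the base table for non-matching rows.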
A clustered index would probably be a marginal improvement over a non-clustered b-tree index.
Note: if col3 only takes on the values 0 and 1, then you should write the WHERE clause as:
WHERE COL1 = @Id AND COL2 >= @SomeDate AND COL3 = 0 AND
      (COL4 <= @SomeOtherDate)
And use either (col1, col3, col2, col4) or (col1, col3, col4, col2), as sketched below.
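With COL3 rewritten as an equality, it can participate in the seek alongside COL1. A sketch of the first variant (again, the index name is a placeholder):
CREATE INDEX IX_MyTable_col1_col3_col2_col4 ON MyTable (COL1, COL3, COL2, COL4);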
I have a SQL Server database that I pre-loaded with a ton of rows of data.
Unfortunately, there is no primary key on the table, and there is now duplicate information in it. I'm not concerned about the missing primary key, but I am concerned about the duplicates in the database...
Any thoughts? (Forgive me for being a SQL Server newb.)
Well, this is one reason why you should have a primary key on the table. What version of SQL Server? For SQL Server 2005 and above:
;WITH r AS
(
SELECT col1, col2, col3, -- whatever columns make a "unique" row
rn = ROW_NUMBER() OVER (PARTITION BY col1, col2, col3 ORDER BY col1)
FROM dbo.SomeTable
)
DELETE r WHERE rn > 1;
Then, so you don't have to do this again tomorrow, and the next day, and the day after that, declare a primary key on the table.
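A minimal sketch, assuming you add a surrogate key because no natural key exists (table and column names are placeholders):
ALTER TABLE dbo.SomeTable ADD Id INT IDENTITY(1,1) NOT NULL;            -- surrogate key column
ALTER TABLE dbo.SomeTable ADD CONSTRAINT PK_SomeTable PRIMARY KEY (Id); -- declare the primary key
If col1, col2, col3 really do define a unique row, a unique constraint on those columns would also stop new duplicates from creeping in.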
Let's say your table is unique by COL1 and COL2.
Here is a way to do it:
SELECT *
FROM (SELECT COL1, COL2,
             ROW_NUMBER() OVER (PARTITION BY COL1, COL2 ORDER BY COL1, COL2 ASC) AS ROWID
      FROM TABLE_NAME) T
WHERE T.ROWID > 1
The ROWID > 1 filter will return only the duplicated rows.