I have a SQL Server database view that contains a lot of inner joins. The view works perfectly when performing plain SELECT operations. It is not very fast, but within reason.
SELECT * FROM ViewName WHERE ItemId=1234
However, when sorting the results of this view, the performance drops to an unacceptable level.
SELECT * FROM ViewName WHERE ItemId=1234 ORDER BY CompanyName
This seems a bit strange, because when I first select the results into a temporary table and then sort that table:
SELECT * INTO #temp FROM ViewName WHERE ItemId=1234
SELECT * FROM #temp ORDER BY CompanyName
it is very fast.
Is there a way to make sorting the view data faster without using the temporary-table solution? That is, can I force the query to do the selection first and the sorting afterwards?
There are a few variants you can try that sometimes offer better performance. The key is really to look at the execution plan when you run the query with and without the ORDER BY, and see what is different.
One option is to use a subquery as a derived table:
SELECT *
FROM (
SELECT *
FROM ViewName
WHERE ItemId=1234
) AS dt
ORDER BY CompanyName
Another option is to use a common table expression (CTE), which I always prefer over derived tables when possible, because they are more readable:
WITH cte AS (
SELECT *
FROM ViewName
WHERE ItemId=1234
)
SELECT *
FROM cte
ORDER BY CompanyName
A third option is to use an index hint to force the query to use the correct index. I always try to avoid this option, though, because it can cause issues in the future if the data or the structure changes. You can read some more opinions about index hints here:
https://www.brentozar.com/archive/2013/10/index-hints-helpful-or-harmful/
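For completeness, a hint on a plain table query looks something like this; the table and index names below are made up purely for illustration, and you would hint an index that actually exists on your own table:
-- Illustrative only: Orders and IX_Orders_CompanyName are hypothetical names.
SELECT *
FROM Orders WITH (INDEX(IX_Orders_CompanyName))
WHERE ItemId = 1234
ORDER BY CompanyName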
The following post gives compelling reasons for generally avoiding the use of select * in SQL.
Why is SELECT * considered harmful?
The discussion included examples of when it is or isn't acceptable to use select *, but I did not see any discussion of common table expressions (CTEs). Are there any drawbacks to using select * in CTEs?
Example:
WITH CTE1 AS
(
SELECT Doc, TotalDue
FROM ARInvoices
WHERE CustomerName = 'ABC'
UNION
SELECT Doc, - TotalDue
FROM ARInvoiceMemos
WHERE CustomerName = 'ABC'
)
select * from CTE1
UNION
Select 'Total' as Doc, sum(TotalDue)
FROM CTE1
Since you already properly listed the column names in the CTE, I don't see any harm in using select * from the CTE.
In fact, it might be just the right place to use select *, since there is no point in listing the columns twice.
The exception is when you don't need all of the columns returned by the CTE (i.e. a column in the CTE is used in the query, but not in the select clause). In that case, I would suggest listing only the columns you need, even if the FROM is pointing to a CTE.
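For instance, reusing the CTE from your example: if only Doc is needed in the output while TotalDue is only used for filtering, I would spell out the column list (the WHERE condition here is just an illustration):
WITH CTE1 AS
(
    SELECT Doc, TotalDue
    FROM ARInvoices
    WHERE CustomerName = 'ABC'
)
SELECT Doc           -- only the column actually needed in the output
FROM CTE1
WHERE TotalDue > 0   -- TotalDue is used for filtering but not returned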
Note that if the CTE itself uses select *, then all of the drawbacks listed in the post you linked to apply to it.
My main objection to select * is that it's usually used by lazy developers who don't consider the consequences of the *.
Note: Everything I've written here applies to derived tables as well.
In theory, the rule of thumb that select * is ill-advised always applies.
In practice, though, if you are a developer who considers design and general good programming practice to be as important as functionality, your CTE will most likely be coded to return only the columns that are actually needed, so select * from CTE1 might not be so bad.
I'm doing paging with SQL Server, and I'd like to avoid duplication by returning the total number of results as part of my partial result set rather than running a separate query for the count afterwards. The trouble is that this seems to increase execution time. For example, if I check with SET STATISTICS TIME ON, this:
WITH PagedResults AS (
SELECT
ROW_NUMBER() OVER (ORDER BY AggregateId ASC) AS RowNumber,
COUNT(PK_MatrixItemId) OVER() AS TotalRowCount,
*
FROM [MyTable] myTbl WITH(NOLOCK)
)
SELECT * FROM PagedResults
WHERE RowNumber BETWEEN 3 AND 4810
... or this (whose execution plan is identical):
SELECT * FROM (
SELECT TOP (4813)
ROW_NUMBER() OVER (ORDER BY AggregateId ASC) AS RowNumber,
COUNT(PK_MatrixItemId) OVER() AS TotalRowCount,
*
FROM [MyTable] myTbl WITH(NOLOCK)
) PagedResults
WHERE PagedResults.RowNumber BETWEEN 3 AND 4810
... seems to average 1.5 to 2 times as much CPU time (all queries added up) as this:
SELECT * FROM (
SELECT TOP (4813)
ROW_NUMBER() OVER (ORDER BY AggregateId ASC) AS RowNumber,
*
FROM [MyTable] myTbl WITH(NOLOCK)
) PagedResults
WHERE PagedResults.RowNumber BETWEEN 3 AND 4810
SELECT COUNT(*) FROM [MyTable] myTbl WITH(NOLOCK)
Obviously I'd rather use the former than the latter, because the latter redundantly repeats the FROM clause (and would repeat any WHERE clauses if I had any), but the latter's execution time is so much better that I really feel I have to use it. Is there a way to bring the former's execution time down at all?
CTEs are inlined into the query plan. They perform exactly the same as derived tables do.
Derived tables do not correspond to physical operations. They do not "materialize" the result set into a temp table. (I believe MySQL does this, but MySQL is about the most primitive mainstream RDBMS there is.)
Using OVER() does indeed manifest itself in the query plan as buffering to a temp table. It is not at all clear why that buffering would be faster here than just re-reading the underlying table; buffering is rather slow, because writes are more CPU-intensive than reads in SQL Server, whereas the two-query version simply reads the original table twice. That's probably why the latter option is faster.
If you want to avoid repeating parts of a query, use a view or table-valued function. Granted, these are not great options for one-off queries. You can also generate SQL in the application layer and reuse strings. ORMs also make this a lot easier.
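As a rough sketch of that idea (the function name is made up, and the WHERE placeholder stands in for whatever filters you actually have), an inline table-valued function can hold the shared FROM/WHERE logic so both the page query and the count query reuse it:
-- Hypothetical inline table-valued function wrapping the shared FROM/WHERE logic.
CREATE FUNCTION dbo.MyTableFiltered()
RETURNS TABLE
AS
RETURN
(
    SELECT *
    FROM [MyTable]
    -- shared WHERE clauses would go here
);
GO

-- Both the page query and the count query reuse the function.
SELECT *
FROM (
    SELECT TOP (4813)
        ROW_NUMBER() OVER (ORDER BY AggregateId ASC) AS RowNumber,
        *
    FROM dbo.MyTableFiltered()
) AS PagedResults
WHERE PagedResults.RowNumber BETWEEN 3 AND 4810;

SELECT COUNT(*) FROM dbo.MyTableFiltered();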
I have a simple question (?) about SQL. I have come across this problem a few times before and have always solved it, but I'm looking for a more elegant and perhaps faster solution.
The problem is that I would like to select all rows in a table except the one with the max value in a timestamp column (in this case that row is a summary row, but it isn't marked as such in any way, and it isn't relevant to my result).
I could do something like this:
select * from [table] t
where loggedat < (select max(loggedat) from [table] where somecolumn='somevalue')
and somecolumn='somevalue'
But when working with large tables this seems kind of slow. Any suggestions?
If you don't want to change your DB structure, then your query (or one with a slight variation using <> instead of <) is the way to go.
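Written out, that <> variation would look something like this (just a sketch based on the query in the question):
SELECT *
FROM [table] t
WHERE somecolumn = 'somevalue'
  AND loggedat <> (SELECT MAX(loggedat)
                   FROM [table]
                   WHERE somecolumn = 'somevalue')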
You could add a bit column IsSummary to the table and always mark the most recent row as 1 (and all others 0). Then your query would change to:
Select * from [table] where IsSummary = 0 and somecolumn = 'somevalue'
This would sacrifice some speed on inserts (since an insert would also trigger an update of the IsSummary value) in exchange for a faster select query.
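A minimal sketch of that maintenance step, assuming the newly inserted row always carries the latest loggedat value (the column list in the INSERT is purely illustrative):
BEGIN TRANSACTION;

-- Clear the flag on the previous most recent row.
UPDATE [table]
SET IsSummary = 0
WHERE somecolumn = 'somevalue'
  AND IsSummary = 1;

-- Insert the new row as the current summary row.
INSERT INTO [table] (loggedat, somecolumn, IsSummary)
VALUES (GETDATE(), 'somevalue', 1);

COMMIT TRANSACTION;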
If you don't mind one tiny (4-byte) extra column, then you might go like this:
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY loggedat DESC) AS rownum
FROM [table] t
WHERE somecolumn = 'somevalue'
/* and all the other filters you want */
) s
WHERE rownum > 1
If you do mind the extra column, you'll just have to list the necessary columns explicitly in the outer SELECT.
It may not be the elegant SQL query you're looking for, but it would be trivial to do in Java, PHP, etc., after fetching the results. To make it as simple as possible, use ORDER BY timestamp DESC and discard the first row.
I have a SQL query, that returns a set of rows:
SELECT id, name FROM users where group = 2
I need to also include a column that has an incrementing integer value, so the first row needs to have a 1 in the counter column, the second a 2, the third a 3, and so on.
The query shown here is just a simplified example, in reality the query could be arbitrarily complex, with several joins and nested queries.
I know this could be achieved using a temporary table with an autonumber field, but is there a way of doing it within the query itself?
For starters, something along the lines of:
SELECT my_first_column, my_second_column,
ROW_NUMBER() OVER (ORDER BY my_order_column) AS Row_Counter
FROM my_table
However, it's important to note that the ROW_NUMBER() OVER (ORDER BY ...) construct only determines the values of Row_Counter; it doesn't guarantee the ordering of the results.
Unless the SELECT itself has an explicit ORDER BY clause, the results could be returned in any order, dependent on how SQL Server decides to optimise the query. (See this article for more info.)
The only way to guarantee that the results will always be returned in Row_Counter order is to apply exactly the same ordering to both the SELECT and the ROW_NUMBER():
SELECT my_first_column, my_second_column,
ROW_NUMBER() OVER (ORDER BY my_order_column) AS Row_Counter
FROM my_table
ORDER BY my_order_column -- exact copy of the ordering used for Row_Counter
The above pattern will always return results in the correct order and works well for simple queries, but what about an "arbitrarily complex" query with perhaps dozens of expressions in the ORDER BY clause? In those situations I prefer something like this instead:
SELECT t.*
FROM
(
SELECT my_first_column, my_second_column,
ROW_NUMBER() OVER (ORDER BY ...) AS Row_Counter -- complex ordering
FROM my_table
) AS t
ORDER BY t.Row_Counter
Using a nested query means that there's no need to duplicate the complicated ORDER BY clause, which means less clutter and easier maintenance. The outer ORDER BY t.Row_Counter also makes the intent of the query much clearer to your fellow developers.
In SQL Server 2005 and up, you can use the ROW_NUMBER() function, which has options for the sort order and the groups over which the counts are done (and reset).
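For example, adding a PARTITION BY clause makes the counter restart within each group; the column names below just follow the simplified query from the question:
SELECT id, name,
       ROW_NUMBER() OVER (PARTITION BY [group] ORDER BY name) AS Row_Counter
FROM users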
The simplest way is to use a variable as a row counter. However, it takes two actual SQL statements: one to set the variable, and then the query itself, as follows:
SET @n=0;
SELECT @n:=@n+1, a.* FROM tablename a
Your query can be as complex as you like, with joins and so on. I usually make this a stored procedure. You can have all kinds of fun with the variable, and even use it to calculate against field values. The key is the := operator.
Here's a different approach.
If you have several tables of data that are not joinable, or for some reason you don't want to count all the rows at the same time but still want them to be part of the same row count, you can create a table that does the job for you.
Example:
create table #test (
rowcounter int identity,
invoicenumber varchar(30)
)
insert into #test(invoicenumber) select [column] from [Table1]
insert into #test(invoicenumber) select [column] from [Table2]
insert into #test(invoicenumber) select [column] from [Table3]
select * from #test
drop table #test