Computed columns store aggr count - sql

I want the computed column to store count totals from another table, how would I do it? (would the following work)
create table sample
(
column1 AS (SELECT COUNT(*) FROM table2) PERSISTED
)

For SQL Server you could potentially do this with an Indexed View.
Those present a host of other restrictions, though, so be sure the value is enough to justify the increased effort in maintenance.
One of the handier aspects of indexed views is that you don't need to query them directly to get the benefits - if the optimizer detects you querying an aggregate that is indexed it'll make use of it "behind the scenes".

Per MSDN:
A computed column is computed from an expression that can use other columns in the same table. The expression can be a noncomputed column name, constant, function, and any combination of these connected by one or more operators. The expression cannot be a subquery.

Related

requirements for an Indexed View

I am currently working towards my certification in SQL Server 70-461. I'm working through some practice tests at the moment and have come across a question on requirements for an indexed view. I understand that indexed views must have SCHEMABINDING and COUNT_BIG(*) if a GROUP BY clause is used and that the index must be clustered and that this will then materialise the data.
CREATE VIEW VW_Test
AS
SELECT ColumnA, ColumnB FROM Table
WHERE ColumnC > 200
In the sample question, the index is to be created on ColumnA. ColumnB and ColumnC are both computed columns.
The question is, what are the requirements for ColumnB and ColumnC?
Determinstic
Precise
Marked PERSISTED
Unfortunately, in my training material I have not come across these terms in this context so if you can give me some guidance on what they mean then I will be able to figure it out from there.
Deterministic: Refers to functions referenced by computed columns. Deterministic functions always return the same value when given the same input. For example, SUM is deterministic, but GETDATE is not.
Precise: A deterministic expression that does not contain float expressions.
Marked PERSISTED: When building a computed column, there is an option to mark it as 'PERSISTED' so that the computed column is physically stored to the database, rather than re-calculated on-the-fly when referenced.
As to the question itself about requirements for columns B and C, it would seem that that the following applies:
"Only precise deterministic expressions can participate in key columns and in WHERE or GROUP BY clauses of indexed views."
Create Indexed Views

Creating indexed view with using AVG in SQL [duplicate]

I have been trying out a few index views and am impressed but I nearly always need a max or a min as well and can not understand why it doesn't work with these, can anyone explain why?
I KNOW they are not allowed, I just can't understand why!!! Count etc. is allowed why not MIN/MAX, I'm looking for explanation...
These aggregates are not allowed because they cannot be recomputed solely based on the changed values.
Some aggregates, like COUNT_BIG() or SUM(), can be recomputed just by looking at the data that changed. These are allowed within an indexed view because, if an underlying value changes, the impact of that change can be directly calculated.
Other aggregates, like MIN() and MAX(), cannot be recomputed just by looking at the data that is being changed. If you delete the value that is currently the max or min, then the new max or min has to be searched for and found in the entire table.
The same principle applies to other aggregates, like AVG() or the standard variation aggregates. SQL cannot recompute them just from the values changed, but needs to re-scan the entire table to get the new value.
Aggregate functions like MIN/MAX aren't supported in indexed views. You have to do the MIN/MAX in the query surrounding the view.
There's a full definition on what is and isn't allowed within an indexed view here (SQL 2005).
Quote:
The AVG, MAX, MIN, STDEV, STDEVP, VAR,
or VARP aggregate functions. If
AVG(expression) is specified in
queries referencing the indexed view,
the optimizer can frequently calculate
the needed result if the view select
list contains SUM(expression) and
COUNT_BIG(expression). For example, an
indexed view SELECT list cannot
contain the expression AVG(column1).
If the view SELECT list contains the
expressions SUM(column1) and
COUNT_BIG(column1), SQL Server can
calculate the average for a query that
references the view and specifies
AVG(column1).
if you just want to see things ordered without adding a sort by when you use a view I just add a column with and order by in it.
id = row_number() over (order by col1, col2)
Besides the reasons specified by Remus, there is less practical need to support MIN and MAX.
Unlike COUNT() or SUM(), MAX and MIN are fast to calculate - you are all set after just one lookup - you don't need to read a lot of data.

PERSISTED Computed Column vs. Regular Column

I have a high performance DB (on SQL Server 2012). One of my views had a virtual column represented by an inline scalar valued UDF computing custom hash from the previous column value. I moved this calculation from the view into one of the underlying tables to improve the performance. I added a new computed column to the table and persisted the data. Then, created index on this column and referenced it back in the corresponding view (basically migrating the logic from the view into the table).
Now I am wondering why wouldn't I just add a regular VARCHAR(32) column to the table instead of the computed PERSISTED column? I could create a DEFAULT on this column with the above mentioned UDF for all new inserts and recalculate the historical records.
Is there any advantage of the indexed computed column with PERSISTED data vs. regular NC indexed column?
Thx.
The computed column will keep your field up to date if the field data it is based on is changed. Adding just a default will update the field on insert if no value is provided for the field.
If you know that your data is not going to change (which i think you are implying but did not specify in your question), then they are functionally the same for you. The computed column would probably be preferred though to prevent accidental update of the field with an incorrect value (bypassing the default). Also it is clear to any other developers what the field is to be used for.
You could switch to a "normal" column with a default value or insert trigger. The one potential issue is that unlike a computed column anyone with insert/update access could (potentially accidentally) change the value of the column.
Performance is going to be the same either way. In essence that is what the db is doing behind the scenes with a persisted computed column. As a developer a column that is persisted computed is clearer in the intent than a default value. Default value implies it is one of many possible values not the only possible value.
Be sure to declare the UDF With SchemaBinding. This will allow SQL Server to determine if the function is deterministic and flag it as so. That can improve query plan optimization in some cases.
There is no performance difference. However, in terms of database design it is more elegant when you have your pre-calculated column in the persisted view.

Computed columns or store

I need a couple of computed columns that contain count totals (indexed columns). Do you think it is better to use a computed column in a view or add extra columns to the table that will store the totals? Adding extra columns would probably mean using triggers to keep the count totals correct.
DB is MS SQL 2008 R2.
You can use a indexed view to get the performance of stored columns at no maintenance effort.
It depends.
If the tables change a lot but you rarely need the counts, a view is better. The question "view vs. computed columns" is one of DB design. If you can't change the original table or the DBMS doesn't support computed columns, use a view. If you can change the table definition, computed columns can be better but they also clutter the definition and make select * slower if you don't always need this data.
If the table rarely changes but you need those numbers a lot, use extra columns with triggers to avoid performance problems.

When are computed columns appropriate?

I'm considering designing a table with a computed column in Microsoft SQL Server 2008. It would be a simple calculation like (ISNULL(colA,(0)) + ISNULL(colB,(0))) - like a total. Our application uses Entity Framework 4.
I'm not completely familiar with computed columns so I'm curious what others have to say about when they are appropriate to be used as opposed to other mechanisms which achieve the same result, such as views, or a computed Entity column.
Are there any reasons why I wouldn't want to use a computed column in a table?
If I do use a computed column, should it be persisted or not? I've read about different performance results using persisted, not persisted, with indexed and non indexed computed columns here. Given that my computation seems simple, I'm inclined to say that it shouldn't be persisted.
In my experience, they're most useful/appropriate when they can be used in other places like an index or a check constraint, which sometimes requires that the column be persisted (physically stored in the table). For further details, see Computed Columns and Creating Indexes on Computed Columns.
If your computed column is not persisted, it will be calculated every time you access it in e.g. a SELECT. If the data it's based on changes frequently, that might be okay.
If the data doesn't change frequently, e.g. if you have a computed column to turn your numeric OrderID INT into a human-readable ORD-0001234 or something like that, then definitely make your computed column persisted - in that case, the value will be computed and physically stored on disk, and any subsequent access to it is like reading any other column on your table - no re-computation over and over again.
We've also come to use (and highly appreciate!) computed columns to extract certain pieces of information from XML columns and surfacing them on the table as separate (persisted) columns. That makes querying against those items just much more efficient than constantly having to poke into the XML with XQuery to retrieve the information. For this use case, I think persisted computed columns are a great way to speed up your queries!
Let's say you have a computed column called ProspectRanking that is the result of the evaluation of the values in several columns: ReadingLevel, AnnualIncome, Gender, OwnsBoat, HasPurchasedPremiumGasolineRecently.
Let's also say that many decentralized departments in your large mega-corporation use this data, and they all have their own programmers on staff, but you want the ProspectRanking algorithms to be managed centrally by IT at corporate headquarters, who maintain close communication with the VP of Marketing. Let's also say that the algorithm is frequently tweaked to reflect some changing conditions, like the interest rate or the rate of inflation.
You'd want the computation to be part of the back-end database engine and not in the client consumers of the data, if managing the front-end clients would be like herding cats.
If you can avoid herding cats, do so.
Make Sure You Are Querying Only Columns You Need
I have found using computed columns to be very useful, even if not persisted, especially in an MVVM model where you are only getting the columns you need for that specific view. So long as you are not putting logic that is less performant in the computed-column-code you should be fine. The bottom line is for those computed (not persisted columns) are going to have to be looked for anyways if you are using that data.
When it Comes to Performance
For performance you narrow your query to the rows and the computed columns. If you were putting an index on the computed column (if that is allowed Checked and it is not allowed) I would be cautious because the execution engine might decide to use that index and hurt performance by computing those columns. Most of the time you are just getting a name or description from a join table so I think this is fine.
Don't Brute Force It
The only time it wouldn't make sense to use a lot of computed columns is if you are using a single view-model class that captures all the data in all columns including those computed. In this case, your performance is going to degrade based on the number of computed columns and number of rows in your database that you are selecting from.
Computed Columns for ORM Works Great.
An object relational mapper such as EntityFramework allow you to query a subset of the columns in your query. This works especially well using LINQ to EntityFramework. By using the computed columns you don't have to clutter your ORM class with mapped views for each of the model types.
var data = from e in db.Employees
select new NarrowEmployeeView { Id, Name };
Only the Id and Name are queried.
var data = from e in db.Employees
select new WiderEmployeeView { Id, Name, DepartmentName };
Assuming the DepartmentName is a computed column you then get your computed executed for the latter query.
Peformance Profiler
If you use a peformance profiler and filter against sql queries you can see that in fact the computed columns are ignored when not in the select statement.
Computed columns can be appropriate if you plan to query by that information.
For instance, if you have a dataset that you are going to present in the UI. Having a computed column will allow you to page the view while still allowing sorting and filtering on the computed column. if that computed column is in code only, then it will be much more difficult to reasonably sort or filter the dataset for display based on that value.
Computed column is a business rule and it's more appropriate to implement it on the client and not in the storage. Database is for storing/retrieving data, not for business rule processing. The fact that it can do something doesn't mean you should do it that way. You too you are free to jump from tour Eiffel but it will be a bad decision :)