Suppose I have three tables: order, sub_order and product.
The cost of a sub_order is built upon a complex formula which depends on the individual costs of the products. The cost of the order is simply the sum of the sub_order costs, although this formula might change in the future.
We store the derived field order.order_cost for convenience.
The questions I have are:
Should business rules be applied at the database layer? If so, is there a way to enforce the constraint for order_cost using SQL? That is, can I guarantee that order_cost is always the sum of the sub_order costs?
If you want the order cost to always be the sum of the costs from sub_order, then calculate the value on the fly. There is no need to store the value, unless you have a performance reason. Your question doesn't mention performance as a reason to duplicate the data.
If you want the order cost to be initialized to the sum, then you can use a trigger. That would be strange, though, because I would expect the order to be created before the sub_orders.
If you want to maintain the order amount as sub_orders are created, deleted, and updated, then you need triggers. Or, just calculate the values on the fly.
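For illustration, here is a minimal sketch of that trigger approach, assuming PostgreSQL; the columns id and order_id are assumptions about your schema, while order_cost and sub_order_cost come from the question:

create or replace function refresh_order_cost() returns trigger as $$
declare
    target_id int;  -- id of the parent order to recompute (assumed column names)
begin
    if tg_op = 'DELETE' then
        target_id := old.order_id;
    else
        target_id := new.order_id;
    end if;
    -- recompute the parent order's total from its sub_orders
    update "order" o
    set order_cost = (select coalesce(sum(s.sub_order_cost), 0)
                      from sub_order s
                      where s.order_id = o.id)
    where o.id = target_id;
    return null;  -- AFTER row trigger: return value is ignored
end;
$$ language plpgsql;

create trigger trg_sub_order_cost
after insert or update or delete on sub_order
for each row execute procedure refresh_order_cost();

Note that an UPDATE that moves a sub_order to a different order would need both the old and the new parent refreshed.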
I am creating a Data Warehouse and have hit an interesting problem...
I have DimQualification and DimUnit tables. A unit is a part of a qualification.
However, some units are optional. Since the DimUnit table lists all available units, I am puzzled about how best to record the customer's choice.
FactAttendance - The attendance on the qualification
Would it be best to put multiple rows in the fact table (qualification and units taken) or is there another option?
The other option, besides putting multiple rows in the fact table, is to have a single row for each fact in the fact table, and a separate column for each unit. The column would be a count of the number of that unit associated with that fact. Something like this:
FactID Unit1Count Unit2Count Unit3Count ...
I have looked at a few things now and have decided that there is a way to achieve this without the loss of speed that multiple rows in the fact table would cause.
Instead of having multiple rows for each unit, I am going to create another fact table which holds all the units chosen; from the FactAttendance table we can then immediately and efficiently identify the units chosen.
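A minimal sketch of that second fact table, assuming surrogate keys named AttendanceID and UnitID (the names are illustrative):

create table FactAttendanceUnit (
    AttendanceID int not null references FactAttendance (AttendanceID),
    UnitID       int not null references DimUnit (UnitID),
    primary key (AttendanceID, UnitID)
);

-- units chosen for a given attendance
select u.*
from FactAttendanceUnit fau
join DimUnit u on u.UnitID = fau.UnitID
where fau.AttendanceID = 42;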
In my database, I have a table that has to get info from two adjacent rows from another table.
Allow me to demonstrate. There's a bill that calculates the difference between two adjacent meter values and calculates the cost accordingly (i.e., I have a water meter and if I want to calculate the amount I should pay in December, I take the value I measured in November and subtract it from the December one).
My question is, how to implement the references the best way? I was thinking about:
Making each meter value an entity on its own. The bill will then have two foreign keys, one for each meter value. That way I can include other useful data, like measurement date and so on. However, implementing and validating adjacency becomes icky.
Making a pair of meter values an entity (or a meter value and a diff). The bill will reference that pair. However, that leads to data duplication.
Is there a better way? Thank you very much.
First, there is no such thing as "adjacent" rows in a relational database. Tables represent unordered sets. If you have a concept of ordering, it needs to be implemented using data in the rows. Let me assume that you have some sort of "id" or "creation date" that specifies the ordering.
Because you don't specify the database, I'll assume you are using one that supports the ANSI-standard window functions. In that case, you can get what you want using the LAG() function. The syntax to get the previous meter reading is something like:
select lag(value) over (partition by meterid order by readdatetime)
There is no need for data duplication or some arcane data structure. LAG() should also be able to take advantage of appropriate indexes.
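A fuller sketch of the bill calculation, assuming a reading table with the columns meterid, readdatetime, and value used in the snippet above:

select meterid,
       readdatetime,
       value,
       value - lag(value) over (partition by meterid
                                order by readdatetime) as consumption
from reading;

The first reading of each meter has no predecessor, so its consumption comes back as NULL; the billing query would skip or special-case that row.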
How can I identify the indexes that are worth creating on a SQL table?
Take the following as an example:
select *
from products
where name = 'car'
and type = 'vehicle'
and availability > 3
and insertion_date > '2015-10-10'
order by price asc
limit 1
Imagine a database with a few million entries.
Would there be benefits if I set an index on the combination of all attributes that occur in the WHERE and ORDER BY clause?
For the example:
create index i_my_idx on products
(name, type, availability, insertion_date, price)
There are a few rules of thumb that can be useful when deciding which columns to index:
Make sure there's a unique index on the primary key - this is done automatically when you specify a PK in most RDBMSs, including PostgreSQL.
Add indexes for each foreign key. These are created automatically in some RDBMSs when you specify a FK, but not in PostgreSQL (see the sketch below).
If a PK is a compound key, consider adding indexes on each FK making up the PK (except for the first, which is covered by the PK index). As in 2, some RDBMSs (e.g. MySQL with InnoDB) add these indexes automatically when the FKs are specified.
Usually, but not always, table joins in queries will be PK to FK, and by having indexes on both keys the query optimizer of the RDBMS has flexibility in determining the optimum plan for maximum performance. This won't always be the best though, and experienced programmers will often structure the SQL for a database query to influence the execution plan for best performance, or decide to omit indexes they know are not needed. It's worth noting that an SQL query that is optimal on one RDBMS is not necessarily optimal on another, or on future versions of the DB server, or as the database grows. The latter is important because in some RDBMSs, such as PostgreSQL and Oracle, the query execution plans depend on the data in the tables (this is known as cost-based optimisation).
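As a concrete example of rule 2, a PostgreSQL sketch; order_line and order_id are hypothetical names standing in for one of your FK columns:

-- declaring the FK constraint alone does not create this index in PostgreSQL
create index i_order_line_order_id on order_line (order_id);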
Once you've got these out of the way a lot comes down to experience and a knowledge of your data, and importantly, how the data is going to be accessed.
Generally you will be looking to index those columns which are best at filtering the data. In your query above, the obvious one is name. This might be enough to make that query run fast enough (unless all your products are cars).
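A sketch of that minimal starting point, with EXPLAIN ANALYZE to verify the planner actually uses the index (PostgreSQL syntax, matching the assumption above):

create index i_products_name on products (name);

explain analyze
select *
from products
where name = 'car'
and type = 'vehicle'
and availability > 3
and insertion_date > '2015-10-10'
order by price asc
limit 1;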
Other than that it's worth making a list of the common ways the data is likely to be accessed e.g.
Get a list of products that are in a category - an index on category will probably help
However, for a list of products that are currently available, an index on availability will probably not help, because a large proportion of products are likely to satisfy this condition.
Unless you are dealing with large amounts of data this can often be all you need to do, and it's not generally a good idea to add indexes "just in case" as there are overheads in maintaining them. But if your system does have performance issues, then it's worth considering how combinations of columns are being used in queries, reading up about the PostgreSQL query optimizer, etc.
And to answer your last question - possibly, but it's far from the first thing to consider.
Well, the way you are setting indexes is absolutely correct. Indexes have nothing to do with the ORDER BY clause.
Some important points while designing an SQL query:
Always put the condition that filters the most rows first in the WHERE clause; e.g. in the query above, name = 'car' will filter the most records in products.
Do not use ">="; use ">" only, because greater-or-equal always ends up checking "greater" first and, if that fails, "equal" as well, which reduces the performance of the query.
Create a single index whose columns are in the same order as they appear in your WHERE clause.
Try to minimize use of the IN clause; use ANY instead.
I wonder, when I want to display the result of a calculation over some fields in the same table, should I do it as a computed field or by using a "before insert or update" trigger?
Note: I found a similar question, but it was for SQL Server, and I need to know whether the computed field will affect performance when I display the result in a grid with many records visible.
Example of the calculation I use now in a computed field:
-- multiplies field_1 by the number of non-null fields among field_2..field_5
field_1 * (
    iif(field_2 is null, 0, 1)
  + iif(field_3 is null, 0, 1)
  + iif(field_4 is null, 0, 1)
  + iif(field_5 is null, 0, 1))
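For context, the field is declared along these lines; this is a Firebird-flavoured sketch (the iif() syntax suggests Firebird), with my_table standing in for the real table name:

-- hypothetical table name; a computed-by column is evaluated on every read
alter table my_table
add total computed by (
    field_1 * (
        iif(field_2 is null, 0, 1)
      + iif(field_3 is null, 0, 1)
      + iif(field_4 is null, 0, 1)
      + iif(field_5 is null, 0, 1)));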
A trigger only works if you're storing the information in the table, because they only get fired when an actual INSERT, UPDATE, or DELETE happens. They have no effect on SELECT statements. Therefore, the actual question becomes "Should I calculate column values in my SELECT statement, or add a column to store them?".
There's no need to store a value that can be easily calculated in the SELECT, and there's seldom a performance impact when doing a simple calculation like the one you've included here.
Whether you should store it depends on many factors, such as how frequently the data changes, and how large the result set will be for your typical query. The more rows you return, the greater the impact of the calculations, and at some point the process of calculating becomes more costly than the increased storage requirements adding a column incurs. However, if you can limit the number of rows returned by your query, the cost of calculations can be so negligible that the overhead of maintaining an extra column of data for every row when it's not needed can be higher, as every row that is inserted or updated will have the trigger execute even when that data isn't being accessed.
However, if your typical query returns a very large number of rows or the calculation is extremely complex, the calculation may become so expensive that it's better to store the data in an actual column where it can be quickly and easily retrieved. If data is frequently inserted or updated, though, the execution of the trigger slows those operations, and if they happen much more frequently than the large SELECT queries then it may not be worth the tradeoff.
There's at least one disadvantage (which I failed to mention, but you asked in a comment below) to actually storing the calculation results in a column. If your calculation (formula) logic changes, you have to:
Disable the trigger
Update all of the rows with a new value based on the new calculation
Edit the trigger to use the new calculation
Re-enable the trigger
With the calculation being done in your query itself, you simply change the query's SQL and you're done.
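For contrast, a sketch of the on-the-fly version, reusing the iif() expression from the question (my_table is a hypothetical table name):

select field_1 * (
           iif(field_2 is null, 0, 1)
         + iif(field_3 is null, 0, 1)
         + iif(field_4 is null, 0, 1)
         + iif(field_5 is null, 0, 1)) as total
from my_table;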
So my answer here is:
It's generally better to calculate the column values on the fly unless you have a clear reason not to, and
a "clear reason" means that you have an actual performance impact you can prove is related to the calculation, or you have a need to SELECT very large numbers of rows with a fairly intense calculation.
Performance should be fine, except with larger tables when your computed field becomes part of a WHERE clause. The other consideration is whether, even though the value is computed from other fields, your requirements allow the calculated value to be overwritten for some reason. Then you need a real physical field as well.
I have a database program for a store. As you know, there are two types of invoice in it: one for the things I bought, and the other for when I sold them. The two tables are almost identical, like:
invoice table
Id
customerName
date
invoiceType
and invoiceDetails, which has
id
invoiceId
item
price
amount
My question is simple: is it best to keep the design like that, or to split every table into two separate tables?
A couple of my friends suggest splitting the tables, one for saleInvoice and the other for buyInvoice, to speed up querying.
So what are the pros and cons of each approach? I feel that if I split them, I don't follow the DRY rule.
I am using NHibernate, BTW, so it's kind of weird to have two identical classes with different names.
Both approaches would work. If you use the single-table approach, then the invoiceType column would be your discriminator field. In your NHibernate mapping, this discriminator field would be used by NHibernate to decide which type (i.e. a purchase or a sale) to instantiate for a given row in the table (see section 5.1.6 of the NHibernate mapping guide). For ad hoc SQL queries or reporting queries, you could create two views, one to return only rows with invoiceType = purchase and one to return only rows where invoiceType = sale.
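Those two views might look like this sketch, assuming 'purchase' and 'sale' are the values stored in invoiceType:

create view purchaseInvoice as
select * from invoice where invoiceType = 'purchase';

create view saleInvoice as
select * from invoice where invoiceType = 'sale';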
Alternatively, you could create two separate tables, one for purchases and one for sales. As you point out, these two tables would have nearly identical schemas and NHibernate mapping files.
If you are anticipating very high transaction volumes, you would want to put purchases and sales on two different physical discs. With two different tables, this can be accomplished by putting them into different file groups. With a single table, you could still accomplish this by creating a SQL Server partitioned table. Before you go to this trouble, you might want to evaluate whether this really is necessary and whether disc access to the table is really going to be the performance bottleneck. You don't want to spend a lot of time on premature optimization if it is not necessary.
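A rough sketch of the partitioned-table route in SQL Server; the filegroup names fgPurchases and fgSales are illustrative and must already exist in the database:

-- route rows to a filegroup based on the discriminator value
create partition function pfInvoiceType (varchar(10))
as range left for values ('purchase');

create partition scheme psInvoiceType
as partition pfInvoiceType to (fgPurchases, fgSales);

create table invoice (
    id           int          not null,
    customerName varchar(100) not null,
    [date]       datetime     not null,
    invoiceType  varchar(10)  not null
) on psInvoiceType (invoiceType);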
My preference would be to have a single table with a discriminator column, to better follow the DRY principle. Unless I had solid numbers indicating it was necessary, I would hold off implementing a partitioned table until if and when it became necessary.
I'd ask myself: how do I intend to use this information? Will I need sales and buy invoices in the same queries? Am I likely to eventually need specialized information for each type (highly likely, in my experience)? And if I do, will I need child tables for only one type? How would that affect referential integrity? Would a change to one automatically mean I needed a change to the other? How large is the table likely to be? (It would have to be in the multi-millions before I would consider that it might need to be split out due to size alone.) How likely is it that I would mix the information up by accident if both types are in the same table and include both when I didn't want to? The answers would determine whether I needed to split it out. I would tend to see these as two separate functions, and it would take a lot to convince me to put them in one table.