Are multiple nullable foreign keys a bad design? - sql

We have a table to store user financial transactions.
A transaction can be incremental or decremental, we have 3 types of transactions, increase with a payment, increase with receiving of a gift, and decrease with purchase of product, So our transaction table contains 3 foreign keys:
[PaymentId] [GiftId] [RequestId]
Is this a bad design? What better alternative is there?
I think it is complicated to join [Transaction] table with 3 other tables to get details of each transaction to display list of user transactions

It seems like you'd be better off with a column called something along the lines of TransactionType and then just storing the TransactionTypeId in a single column as well. Using these two columns you can represent any number of types of Transactions. You actually find this common in the financial data world with dimension tables that represent things like Cost Type, etc.

Related

Store 3-dimensional table in database where 1 dimension increases over time

I have a data set with three dimensions that I would like to store for use with a website:
A list of companies (about 1000)
Information about the company (about 15 things)
Time (monthly)
Essentially, I want to track this information over time and keep it up to date.
When I start, the data will be 1000x15x1, after a year it will be 1000x15x12, and after 10 years if will be 1000x15x120.
The main queries I would make are:
Get all information for one company over all times
Get all information for one particular time
What would be a good database configuration for doing this? I'm open to either SQL or noSQL solutions.
In case it matters, the website is on Google App Engine.
From the relational database schema design perspective:
If the goal is analytics / ad-hoc querying / OLAP in general only, then you can use star-schema which is well suited for these type of analytics. But beware, OLAP databases are de-normalized and not suitable for operational transaction storage / OLTP in general, if you are planning to do both on this database.
The beauty of the Star schema:
The fact tables are usually all numeric, making the tables very small even though there are too many records. Small table means it is very fast to read (I/O).
All joins from the fact table to dimension tables are based on foreign keys (single column, numeric, indexable foreign keys)
All dimension tables have surrogate key, which is a single column primary key. Single column primary key is easier to JOIN than a multi-column primary key and also easier to index.
There is no NULL in foreign keys in fact tables. This makes JOIN operations straightforward, i.e. always JOIN fact table to all of its dimension tables. If you need NULL case, you need to add that as a special case in your dimension table. For example: if a company is not listed on stock market, and one of the thing you track is stock price, then you enter 0 or NULL into the fact for the stock price table depending on (how you want to do SUM(), AVG() etc later) and then add a special case into your StockSymbols dimension table called 'Private company' and add the foreign key of this special case into the fact table as your foreign key.
Almost all filtering is done through the dimension tables that are much much smaller than the fact tables. This requires having a Date dimension to be able to do date-based queries.
If you can stay in pure Star schema, then all yours JOINs are single hop (i.e. no join between two tables through another table).
All these makes JOIN operations very fast, simple and straightforward. That's why the Star schema is at the heart of data-warehousing designs.
https://en.wikipedia.org/wiki/Star_schema
https://en.wikipedia.org/wiki/Data_warehouse
One level up from this is OLAP (SSAS SQL Server Analyses Services for example) which does pre-processing of the data to make it fast to query but it involves more learning than pure start-schema and it's an overkill in your case
For your example
In Star schema,
Companies will be a dimension table
You will need Month dimension table. It's simplified version of Date dimension, just for month info. An example of Date dimension is here.
https://www.codeproject.com/Articles/647950/Create-and-Populate-Date-Dimension-for-Data-Wareho
The information about the company (15 things you say) will be fact tables. The facts must be numeric (b/c ideally all non-numeric values is saved in dimension tables). This means taking the non-numeric part of a fact to a dimension table. For example: if you are keeping revenue and would like to keep the currency type too, then the you will need a Currency dimension and save only the amount in the fact table and a foreign key to the Currency dimension table.
If you have any non-numeric facts, you need to store the distinct list in a dimension table and add foreign key to that dimension table inside your Fact table (this is called factless fact table). The only exception to that is if the cardinality of the dimension and the fact table is very similar, then you can just store the non-numeric fact value inside the fact table directly as there is no benefit in having a dimension table (in fact a disadvantage).
Also the facts can be grouped by their granularity. For example you could have company_monthly_summary fact table and keep more than one fact in that table (which are all joining to Company dimension and Month dimension). This is all up-to-you how you would like to group facts table. But if their granularity are not the same, they should not be grouped as that will cause sparse fact tables and harder to query.
You will use foreign keys in Fact tables to join to your Dimension tables
Add index for your Dimension tables' most used columns
Add a numeric surrogate key to your dimension. It is usually an auto-increment number but that's up-to you. One exception people prefers for the surrogate key of Date dimension is using the format YYYYMMDD (as integer). This makes is easier on WHERE clause: i.e instead of filtering for the Date column (a DATETIME value), which will do search to find the surrogate keys, you just provide the surrogate keys directly b/c you know the format. Depending on your business domain, you may have other similar useful surrogate key patterns that you may want to consider and use. But just know, in case of a business domain change, you will have have to update all fact records. Simple auto-increment surrogate key does not have that problem. In your case, the surrogate key for the month can be actual month number (1 for Jan)
That being said, 1 million rows in 5 years is easy to query even without a Star-schema design (with proper indexing, database maintenance). But if this is part of a larger analytics system, then go with Star schema
The simplest way.
Create a table, companyname + info you needto store + column for year-month.
Ex:
CREATE TABLE tablename (
id int(11) NOT NULL AUTO_INCREMENT,
companyname varchar(255) ,
info1 int(11) NOT NULL,
info2 datetime ,
info3 varchar(255) ,
info4 bool ,
yearmonth datetime,
PRIMARY KEY (id) );
#queries
select * from tablename where companyname="nameofthecompany";
select * from tablename where yearmonth="year-month"; #can use between here

Data warehouse FactTable

I am trying to use this to learn how about data warehousing and having trouble understanding the concept of the fact table.
http://www.codeproject.com/Articles/652108/Create-First-Data-WareHouse
What would be some queries that I could run to find information from the faceable, and what questions do they answer.
A fact table is used in the dimensional model in data warehouse design. A fact table is found at the center of a star schema or snowflake schema surrounded by dimension tables.
A fact table consists of facts of a particular business process e.g., sales revenue by month by product. Facts are also known as measurements or metrics. A fact table record captures a measurement or a metric.
Example of fact table -
In the schema below, we have a fact table FACT_SALES that has a grain which gives us a number of units sold by date, by store and by product.
All other tables such as DIM_DATE, DIM_STORE and DIM_PRODUCT are dimensions tables. This schema is known as the star schema.
Let's translate this a bit.
Firstly, in a fact table we usually enter numeric values ( rarely Strings , char's ,or other data types).
The purpose of a fact table is to connect with the KEYS of dimensional tables ,other fact tables (fact tables more rarely, and also not a good practice) AND measurements ( and by measurements I mean numbers that change frequently like Prices , Quantities , etc.).
Let's take an example:
Think about a row from a fact table as an product from a supermarket when you pass it by the check out and it get's scanned. What will be displayed in the check out row in your database fact table? Possibly:
Product_ID | ProductName | CustomerID | CustomerName | InventoryID | StoreID | StaffID | Price | Quantity ... etc.
So all those Keys and measurements are bringed together in one fact table, having some big performance and understandability advantage.
Fact Table Contain all Primary key of Dimension table and Measure like "Sales Amount"
A Fact table is a table that stores your measurements of a business process. Here you would record numeric values that apply to an event like a sale in a store. It is surrounded by dimension tables which give the measurement context (which store? which product? which date?).
Using the dimensions you can ask lots of questions of your facts, like how many of a particular product have been sold each month in a region.
Some further info
All dimension keys in a fact should be a FK to the dimension
if a key is unknown it should point to a zero key in the dimension detaialing this.
All joins from a fact to a dim are 1 to 1. bridge tables are is a technique to cater for many to many's, but this is more advanced
All measurements in a fact are numeric, but can contain NULLS if unknown (never put 0 to represent unknown)
When joining facts to dimensions there is no need to do an outer join, due to FKS applied above.
if you have 999 rows in a fact, no matter what dimensions you join to, you should always have 999 rows returned.

Improvement on database schema

I'm creating a small pet shop database for a project
The database needs to have a list of products by supplier that can be grouped by pet type or product category.
Each in store sale and customer order can have multiple products per order and an employee attached to them the customer order must be have a customer and employee must have a position,
http://imgur.com/2Mi7EIU
Here are some random thoughts
I often separate addresses from the thing that has an address. You could make 1-many relationships between Employee, Customer and Supplier to an address table. That would allow you to have different types of addresses per entity, and to change addresses without touching the original table.
If it is possible for prices to change for an item, you would need to account for that somehow. Ideas there are create a pricing table, or to capture the price on the sales item table.
I don't like the way you handle the sales item table. the different foreign keys based on the type of the transaction is not quite correct. An alternative would be to replace SalesItem SaleID and OrderId with the SalesRecordId... another better option would be to just merge the fields from InStoreSale, SalesRecord, and CustomerOrders into a single table and slap an indicator on the table to indicate which type of transaction it was.
You would probably try to be consistent with plurality on your tables. For example, CustomerOrders vs. CustomerOrder.
Putting PositionPay on the EmployeePosition table seems off to... Employees in the same position typically can have different pay.
Is the PetType structured with enough complexity? Can't you have items that apply to more than one pet type? For example, a fishtank can be used for fish or lizards? If so, you will need a many-to-many join table there.
Hope this helps!

How Should I Design TAX Table?

I have a db with many tables. Some tables have unit price column, no tax, gst and such columns. What should i do now? Should i create GST table, HST table and PST table separately. In other words, what is the standard schema of designing the tax table?
You only need one table for tax percentages:
CDN_TAXES
tax_id (primary key)
tax_name (GST, HST, PST)
tax_percentage
Because HST covers both PST and GST, I'd model this as three columns in the PRODUCT/etc table and use a CHECK constraint to enforce that either the HST column be associated, or at least one of PST/GST is associated to the product.
The amount of tax per sale, should be stored in a SALES table. Tax tables in Canada are updated twice a year, and the percentage can change. Which means if you want to reprint a receipt that is accurate, you have to capture the tax amounts at the time of sale.
I think it depends. For storing the type of tax and a rate, sure, you can normalize it in a table - and even capture it changing over time with a start and end date (although the actual tax charged on a sale would always need to be stored).
However, typically you will need more tax configuration. In the US, for instance, we have some things which are never taxed (milk), tax free days when nothing (with some exceptions like cars) is taxed temporarily, and special days when certain things are not taxed (hurricane preparation items like batteries and generators).
So your product table would certainly not hold a tax amount, but may hold some tax flags. Other tables may link the product to a jurisdiction which has taxes. If you have a multi-store company, some jurisdictions may tax things differently, yet you may have a common product master file and pricing might be the same across stores, but taxes would vary.
I'd say depends on the amount of columns. If you're only talking about 3 columns then i'd add them into the table and leave them empty if not used.
If however you want to expand this further and further, create a single table to hold the value, description etc.
Then maybe have a lookup table called unitGST and unitHST etc. Each of these tables contains the unit id and the id to your single table holding the value.

MS Visio Category relationships to SQL Server

I'm using MS Visio to model a database and part of the model contains Transaction categories - the parent table has a transactionId, timestamp, amount, and transactionType. There are three child tables - cheque, bank transfer, and credit card, all related to the parent by transactionId.
Is there a specific way this kind of relationship is implemented in SQL Server, or is it just a conceptual model leaving the implementation up to me? If the latter, why have a transactionType column in the parent table if the tables are all related with transactionId - is it just to narrow my queries? That is, if a row in the parent table specifies "cheque" as the transactionType I know that I only have to query/join the cheque child table?
It just occurred to me - is this just an ISA hierarchy, in which case I'd create three distinct tables each containing the columns identified in the ISA parent entity?
This is essentially multiple-table inheritance, although you can model it in the domain as a simple reference relationship if you want.
There are many good reasons to have the selector field/property. The obvious one is so an application or service gets a hint as to how to load the details, so it doesn't have to load every conceivable row from every conceivable table (try this when you have 20 different types of transactions).
Another reason is that much of the time the end user doesn't necessarily need to know the details of a transaction, but does need to know the type. If you're looking at an A/R report from some financial or billing system, most of the time all you need to know for a basic report is the previous balance, amount, subsequent balance, and the transaction type. Without that information, it's very hard to read. The ledger doesn't necessarily show the details for every transaction, and some systems may not even track the details at all.
The most common alternative to this type of model is a single table with a whole bunch of nullable columns for each different transaction type. Although I personally despise this model, it's a requirement for many Object-Relational Mappers that only support single-table inheritance. That's the only other way you'd want (or not want) to model this in a database.
The transactionType in the parent table is useful if you'd like to query over all transactions, for example to sum the amounts per transaction type:
select transactionType, sum(amount)
from transactions
group by transactionType
Without the column, you could still do that by querying on the child tables:
select
case when c.transactionId is not null then 'CHEQUE'
when cc.transactionId is not null then 'CREDIT CARD'
...
end
, sum(amount)
from transactions t
left join cheque c on t.transactionId = c.transactionId
left join creditcard cc on t.transactionId = cc.transactionId
...
group by
case when c.transactionId is not null then 'CHEQUE'
when cc.transactionId is not null then 'CREDIT CARD'
...
end
As you can see, that's much harder, and requires extending the query for each type of transaction you add.