SQL Database design question - sql-server-2005

I have a task to design a SQL database that will record values for the company's commodity products (SKU numbers). The company has 7,600 unique products that need to be tracked, and each product will have approximately 200 values over the course of a year (one value per product per day).
My first idea is to have the SKU numbers run down the rows (one row per SKU) and one column per date.
The data will be viewed in chart/graph format, and additional calculations (such as percentage profit margin) will be displayed alongside it.
My questions are:
- Is this layout advisable?
- Do I need to be cautious of anything if this data goes back about 15 years (each table would represent one year)?
Any suggestions?

It is better to have only three columns, instead of the many you are suggesting:
sku  date        value
---  ----------  -----
1    2011-01-01  10
1    2011-01-02  12
2    2011-01-01  5
This way you can easily add another column if you want to record something else about a given product per date.
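A minimal sketch of the three-column layout, assuming Python's built-in `sqlite3` module as a stand-in for SQL Server (so the SQL is SQLite's dialect, not T-SQL). The table name `sku_values` and the column name `value_date` are hypothetical. It stores one row per SKU per date, reads one SKU's series in date order for charting, and then adds a column without touching any per-date structure.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (hypothetical table name).
conn = sqlite3.connect(":memory:")

# One row per (sku, date): the narrow layout with only three columns.
conn.execute(
    "CREATE TABLE sku_values ("
    " sku INTEGER NOT NULL,"
    " value_date TEXT NOT NULL,"  # ISO-8601 text sorts chronologically
    " value REAL NOT NULL,"
    " PRIMARY KEY (sku, value_date))"
)
conn.executemany(
    "INSERT INTO sku_values (sku, value_date, value) VALUES (?, ?, ?)",
    [(1, "2011-01-01", 10), (1, "2011-01-02", 12), (2, "2011-01-01", 5)],
)


def series_for_sku(conn, sku):
    """Return one product's (date, value) pairs in date order, e.g. for a chart."""
    return conn.execute(
        "SELECT value_date, value FROM sku_values WHERE sku = ? ORDER BY value_date",
        (sku,),
    ).fetchall()


# Recording something new per product per date is a single extra column,
# not one extra column per date.
conn.execute("ALTER TABLE sku_values ADD COLUMN cost REAL")

print(series_for_sku(conn, 1))
print([col[1] for col in conn.execute("PRAGMA table_info(sku_values)")])
```

Because the primary key is (sku, value_date), inserting a second value for the same product and day fails, which matches the one-value-per-day requirement.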

I would suggest a table for your products, and a table for the historical values. Maybe create an index for the historical values based on date if you plan to select for specific time periods.
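create table products (
id int primary key,
sku int not null,
name nvarchar(100),
description nvarchar(max)); -- "desc" is a reserved word
create table product_values ( -- "values" is a reserved word
id int primary key,
product_id int not null,
value_date datetime not null, -- SQL Server 2005 has no DATE type
value decimal(18, 4),
constraint fk_prod_price foreign key (product_id) references products (id));
create index idx_price on product_values (value_date);
NOTE: written for SQL Server 2005; adjust names, types and sizes to your needs.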
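The same products/values design as a runnable sketch, again using Python's `sqlite3` as a stand-in for SQL Server. The name `product_values` (instead of `values`, a reserved word), the sample SKUs, and the helper functions are hypothetical. It shows the foreign key rejecting a value for an unknown product and a date-range query joined back to the SKU.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled

conn.executescript(
    """
    CREATE TABLE products (
        id          INTEGER PRIMARY KEY,
        sku         INTEGER NOT NULL UNIQUE,
        name        TEXT,
        description TEXT
    );
    -- "values" is a reserved word, hence the different (hypothetical) name.
    CREATE TABLE product_values (
        id         INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES products (id),
        value_date TEXT NOT NULL,
        value      REAL NOT NULL
    );
    -- Index for selecting specific time periods.
    CREATE INDEX idx_product_values_date ON product_values (value_date);
    """
)
conn.executemany("INSERT INTO products (id, sku, name) VALUES (?, ?, ?)",
                 [(1, 1001, "Widget"), (2, 1002, "Gadget")])
conn.executemany(
    "INSERT INTO product_values (product_id, value_date, value) VALUES (?, ?, ?)",
    [(1, "2011-01-01", 10), (1, "2011-01-02", 12), (2, "2011-01-01", 5), (2, "2011-02-01", 7)],
)


def values_between(conn, start, end):
    """All product values in [start, end], labelled with the product's SKU."""
    return conn.execute(
        """
        SELECT p.sku, v.value_date, v.value
        FROM product_values AS v
        JOIN products AS p ON p.id = v.product_id
        WHERE v.value_date BETWEEN ? AND ?
        ORDER BY p.sku, v.value_date
        """,
        (start, end),
    ).fetchall()


def orphan_rejected(conn):
    """True if the foreign key blocks a value for a product that does not exist."""
    try:
        conn.execute("INSERT INTO product_values (product_id, value_date, value) "
                     "VALUES (99, '2011-01-01', 1)")
    except sqlite3.IntegrityError:
        return True
    return False


print(values_between(conn, "2011-01-01", "2011-01-31"))
print(orphan_rejected(conn))
```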

If you do as #fiver wrote, you don't need a table for each year either: keep everything in one table, and add indexes on sku/date for faster searching.
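With one table for all years, the row count is roughly 7,600 SKUs × 200 values × 15 years ≈ 22.8 million rows, which a single, properly indexed table can usually handle. The sketch below (Python `sqlite3` standing in for SQL Server; table, index and helper names are hypothetical) keeps several years of history in one table with a composite index on (sku, value_date) and uses SQLite's `EXPLAIN QUERY PLAN` to check that a per-SKU date-range query is served by that index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table holds every year of history; no per-year tables.
conn.execute(
    "CREATE TABLE sku_values (sku INTEGER NOT NULL, value_date TEXT NOT NULL, value REAL NOT NULL)"
)
# Composite index: equality on sku first, then a range on the date.
conn.execute("CREATE INDEX idx_sku_date ON sku_values (sku, value_date)")
conn.executemany(
    "INSERT INTO sku_values VALUES (?, ?, ?)",
    [(1, "1997-06-30", 3.5), (1, "2011-01-01", 10), (1, "2011-01-02", 12), (2, "2011-01-01", 5)],
)

QUERY = (
    "SELECT value_date, value FROM sku_values "
    "WHERE sku = ? AND value_date BETWEEN ? AND ? ORDER BY value_date"
)
PARAMS = (1, "2011-01-01", "2011-12-31")


def plan_details(conn, sql, params):
    """Human-readable steps from SQLite's EXPLAIN QUERY PLAN (last column of each row)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


print(plan_details(conn, QUERY, PARAMS))
print(conn.execute(QUERY, PARAMS).fetchall())
```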

Related

Different detail levels in one table

I'm having a problem with a table, where I need to store measures on different detail levels. My default table is:
Id  TotalQuantity  Amount
1   75             1000
Where TotalQuantity is a sum of quantities from every month.
Now I need to add to my default table information about the quantity I have in each month. These monthly quantities should be in one column, so I used UNION.
The problem is that when I sum up the values from these columns in reports, TotalQuantity and other values that are the same for both rows are displayed wrong, because they are counted once per row. How can I store all of this information?
You need a fact table at the (id, month) grain, like FactMonthlyTotals(id, month, amount). If you have other data that is not for a particular month it would go on a separate fact table, or perhaps a dimension table.
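A small sketch of the idea, using Python's `sqlite3` with made-up numbers (monthly quantities 30 and 45 summing to the 75 above). The table names `FactMonthlyQuantity`, `FactTotals` and `UnionShape` and the helper functions are hypothetical. It contrasts storing the id-level Amount once with repeating it on every monthly row, which is what makes report totals come out wrong.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Monthly facts at the (id, month) grain.
    CREATE TABLE FactMonthlyQuantity (
        id INTEGER NOT NULL, month TEXT NOT NULL, quantity INTEGER NOT NULL,
        PRIMARY KEY (id, month));
    -- Facts that are not per month stay at their own (id) grain.
    CREATE TABLE FactTotals (id INTEGER PRIMARY KEY, amount INTEGER NOT NULL);
    -- For comparison: monthly rows that repeat the id-level amount.
    CREATE TABLE UnionShape (id INTEGER, month TEXT, quantity INTEGER, amount INTEGER);

    INSERT INTO FactMonthlyQuantity VALUES (1, '2011-01', 30), (1, '2011-02', 45);
    INSERT INTO FactTotals VALUES (1, 1000);
    INSERT INTO UnionShape VALUES (1, '2011-01', 30, 1000), (1, '2011-02', 45, 1000);
    """
)


def separate_grain_totals(conn, fact_id):
    """Sum the monthly grain; read the id-level amount from its own table."""
    quantity = conn.execute(
        "SELECT SUM(quantity) FROM FactMonthlyQuantity WHERE id = ?", (fact_id,)
    ).fetchone()[0]
    amount = conn.execute(
        "SELECT amount FROM FactTotals WHERE id = ?", (fact_id,)
    ).fetchone()[0]
    return quantity, amount


def repeated_value_totals(conn, fact_id):
    """Summing the mixed-grain rows counts the amount once per month."""
    return conn.execute(
        "SELECT SUM(quantity), SUM(amount) FROM UnionShape WHERE id = ?", (fact_id,)
    ).fetchone()


print(separate_grain_totals(conn, 1))
print(repeated_value_totals(conn, 1))
```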

How to create a relationship where all columns have many details

I am working on a small personal project about capital expenses. There is one part I can't figure out.
The tables I have are the following:
capex_form
capex_cashflow
When I create a capex_form, I can request money and divide it however I want over 13 months, including the current month (to show how I will pay it over the next year). This is reflected in capex_cashflow, which has 13 columns, each holding either an amount or 0.
The problem comes here:
I need to be able to add multiple descriptions for each payment. For example:
In July 2019 I will spend 200 (this part is done), and I need to enter a breakdown of those 200 dollars with descriptions: 50 dollars on one thing and 150 on another.
I added 3 columns per month, which works, but it only lets me add one description per month.
I was thinking I could create another table for descriptions, but how would it relate to a specific column (month)? As far as I understand, you relate one table to another table, not to a column.
I also thought about creating 13 tables for the 13 months, but I think I am missing something that would avoid creating 13 unnecessary tables.
I appreciate any kind of help or guidance.
This is pretty straightforward and a common thing.
Put an ID (key) column in the "header" table. The header table holds the summary information, so in your case you might create a table that just holds the capex_income:
CAPEX_FORM
Capex_id
Capex_Amount
Then create a payment table. The payment table can have a month column (only one) and a Capex_id column, along with a description or whatever else you need:
CAPEX_PAYMENT
Capex_payment_id
Capex_id
Payment_Amount
Month (1-13)
Description
Now, because you have Capex_id in this table, it is related to the CAPEX_FORM table, and you can query all the associated payments like so:
select payment_amount, month, description from capex_payment p join capex_form f on p.capex_id = f.capex_id
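A runnable sketch of the header/payment design with Python's `sqlite3` standing in for the real database. The sample amounts follow the July 2019 example (assuming month 1 is July 2019), and the item descriptions and helper names are hypothetical. It lists the breakdown for one month and rebuilds the old 13-column cash-flow view from the payment rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE capex_form (
        capex_id     INTEGER PRIMARY KEY,
        capex_amount REAL NOT NULL
    );
    -- One row per payment line: many rows may share the same month.
    CREATE TABLE capex_payment (
        capex_payment_id INTEGER PRIMARY KEY,
        capex_id         INTEGER NOT NULL REFERENCES capex_form (capex_id),
        payment_amount   REAL NOT NULL,
        month            INTEGER NOT NULL CHECK (month BETWEEN 1 AND 13),
        description      TEXT
    );
    """
)
conn.execute("INSERT INTO capex_form (capex_id, capex_amount) VALUES (1, 500)")
conn.executemany(
    "INSERT INTO capex_payment (capex_id, payment_amount, month, description) VALUES (?, ?, ?, ?)",
    [
        (1, 50, 1, "Item A"),  # month 1 = July 2019 in this example
        (1, 150, 1, "Item B"),
        (1, 300, 2, "Item C"),
    ],
)


def breakdown(conn, capex_id, month):
    """All payment lines (amount, description) for one capex form and month."""
    return conn.execute(
        """
        SELECT p.payment_amount, p.description
        FROM capex_payment AS p
        JOIN capex_form AS f ON p.capex_id = f.capex_id
        WHERE f.capex_id = ? AND p.month = ?
        ORDER BY p.capex_payment_id
        """,
        (capex_id, month),
    ).fetchall()


def cashflow(conn, capex_id):
    """Rebuild the old 13-column view: the total per month, 0 where nothing is planned."""
    totals = dict(
        conn.execute(
            "SELECT month, SUM(payment_amount) FROM capex_payment WHERE capex_id = ? GROUP BY month",
            (capex_id,),
        )
    )
    return [totals.get(month, 0) for month in range(1, 14)]


print(breakdown(conn, 1, 1))
print(cashflow(conn, 1))
```

Since the 13-column view can be derived with a GROUP BY, capex_cashflow no longer needs to be stored separately, although keeping it as a view is an option.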

SSAS get active record count between two dates

Can you please let me know the best approach for designing the data warehouse and dimensional model (SSAS cube) for the requirement below?
The requirement is to get the count of students who are active as of each month when the user selects a year (e.g. 2015) from the drop-down shown in the image. The catch is that there is no option to select enrollstartdate and enrollenddate as two different dates (no role-playing dimension); there is only one filter, i.e. Year.
Requirement to get the active student count as of that month
There are a couple of possible approaches that come to mind. The first is a periodic snapshot fact table and another is a timespan accumulating snapshot fact table.
In my opinion, the first is easier to implement, so I've provided some detail below that I hope you will find useful.
CREATE TABLE FactEnrollmentSnapshot
(
DateKey INT NOT NULL -- Reference to Date dimension table
, StudentKey INT NOT NULL -- Reference to Student dimension table
);
CREATE TABLE DimStudent
(
StudentKey INT NOT NULL -- Surrogate key
, StudentId ? -- Business key from the source system; data type depends on your source
-- , ...Other Student Attributes...
);
CREATE TABLE DimDate
(
DateKey INT NOT NULL
, FullDate DATETIME NOT NULL
, Year SMALLINT
);
Assuming your date dimension is at the day grain, you could either store daily snapshots, or just store snapshots on the 15th of each month.
Depending on whether you need a count of unique students during 2015 or the most recent count of students in 2015, you could use the DISTINCT COUNT aggregation or the LastChild aggregation in SSAS, respectively. If you use LastChild, make sure your Date dimension is marked as a Time type.
Note that a snapshot style fact table results in semi-additive facts.
You could get the raw data to populate the fact table from your example source data by joining it to the Date dimension on the enrollment date range (effectively a CROSS JOIN filtered by the BETWEEN condition):
SELECT
StudentTable.StudentID
, DimDate.FullDate
FROM
StudentTable
INNER JOIN DimDate ON (DimDate.FullDate BETWEEN StudentTable.EnrollDate AND ISNULL(StudentTable.DisenrollDate,'9999-12-31'));
I didn't include the lookups for surrogate keys, for simplicity.
You can then get the answer for your business users by filtering on the Year attribute in the Date dimension.
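A sketch of the periodic snapshot approach, using Python's `sqlite3` in place of SQL Server/SSAS and a handful of made-up students. Snapshots are taken on the 15th of each month, `COALESCE` stands in for `ISNULL`, and the two summary functions only mimic what the DISTINCT COUNT and LastChild aggregations would return in the cube; all table contents and helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DimDate (DateKey INTEGER PRIMARY KEY, FullDate TEXT NOT NULL, Year INTEGER NOT NULL);
    CREATE TABLE StudentTable (StudentID INTEGER PRIMARY KEY, EnrollDate TEXT NOT NULL, DisenrollDate TEXT);
    CREATE TABLE FactEnrollmentSnapshot (DateKey INTEGER NOT NULL, StudentKey INTEGER NOT NULL);
    """
)

# Snapshot dates: the 15th of every month of 2015, with yyyymmdd integer keys.
conn.executemany(
    "INSERT INTO DimDate VALUES (?, ?, ?)",
    [(20150000 + m * 100 + 15, f"2015-{m:02d}-15", 2015) for m in range(1, 13)],
)
# Made-up source rows; a NULL DisenrollDate means "still enrolled".
conn.executemany(
    "INSERT INTO StudentTable VALUES (?, ?, ?)",
    [
        (1, "2014-11-03", None),
        (2, "2015-03-10", "2015-06-20"),
        (3, "2015-08-20", None),
        (4, "2013-01-01", "2014-12-31"),
    ],
)

# One row per student per snapshot date on which they were enrolled.
# StudentID is used directly as StudentKey (no surrogate-key lookup).
conn.execute(
    """
    INSERT INTO FactEnrollmentSnapshot (DateKey, StudentKey)
    SELECT d.DateKey, s.StudentID
    FROM StudentTable AS s
    JOIN DimDate AS d
      ON d.FullDate BETWEEN s.EnrollDate AND COALESCE(s.DisenrollDate, '9999-12-31')
    """
)


def monthly_active_counts(conn, year):
    """Active students on each snapshot date of the year (0 if none)."""
    rows = conn.execute(
        """
        SELECT d.FullDate, COUNT(f.StudentKey)
        FROM DimDate AS d
        LEFT JOIN FactEnrollmentSnapshot AS f ON f.DateKey = d.DateKey
        WHERE d.Year = ?
        GROUP BY d.FullDate
        ORDER BY d.FullDate
        """,
        (year,),
    ).fetchall()
    return [count for _, count in rows]


def distinct_students(conn, year):
    """Unique students active at any snapshot in the year (DISTINCT COUNT analogue)."""
    return conn.execute(
        """
        SELECT COUNT(DISTINCT f.StudentKey)
        FROM FactEnrollmentSnapshot AS f
        JOIN DimDate AS d ON d.DateKey = f.DateKey
        WHERE d.Year = ?
        """,
        (year,),
    ).fetchone()[0]


def last_snapshot_count(conn, year):
    """Active students at the year's last snapshot (LastChild analogue)."""
    return conn.execute(
        """
        SELECT COUNT(*) FROM FactEnrollmentSnapshot
        WHERE DateKey = (SELECT MAX(DateKey) FROM DimDate WHERE Year = ?)
        """,
        (year,),
    ).fetchone()[0]


print(monthly_active_counts(conn, 2015))
print(distinct_students(conn, 2015), last_snapshot_count(conn, 2015))
```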
I hope this is useful in getting you started with a possible approach.
Regards,
Jesse Dyson

Granularity of data rows

We're developing an application, one of whose functions is managing payments to people. Each payment will be written to a row in a table with the following fields:
PersonId (INT)
TransactionDate (DATETIME)
Amount (MONEY)
PaymentTypeId (INT)
...
...
...
It looks like we deal with around 8,000 people we send payments to, and a new transaction per person is added daily (around 8,000 inserts per day). This means that after 7 years (the period we need to store the data for), we will have over 20,000,000 rows (8,000 × 365 × 7 ≈ 20.4 million).
We get around 10% more people per year, so this number rises a bit.
The most common query would be to get SUM(Amount) per person, for a given payment type, where TransactionDate is between a start date and an end date.
SELECT PersonId, SUM(Amount)
FROM dbo.[Table]
WHERE PaymentTypeId = @PaymentTypeId
AND TransactionDate BETWEEN @StartDate AND @EndDate
GROUP BY PersonId
My question is, is this going to be a performance problem for SQL Server 2012? Or is 20,000,000 rows not too bad?
I assumed a clustered index on PersonId (to group the rows), but wouldn't this make inserts/updates very slow?
Or should there be an index on TransactionDate?
If your query selects based on TransactionDate and PaymentTypeId and also needs PersonId and Amount at the same time, I would recommend putting a nonclustered index on TransactionDate and PaymentTypeId and including those other two columns in the index:
CREATE NONCLUSTERED INDEX IX_Table_TransactionDate
ON dbo.[Table] (TransactionDate, PaymentTypeId)
INCLUDE (PersonId, Amount);
That way, your query can be satisfied from just this index - no need to go back to the actual complete data pages.
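The covering-index idea can be checked with a small sketch in Python's `sqlite3`, with the caveat that SQLite has no `INCLUDE` clause, so `PersonId` and `Amount` are added as trailing key columns instead. The table name `payments` and the sample rows are hypothetical. `EXPLAIN QUERY PLAN` reports a covering index when the query is answered from the index alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE payments (
        PaymentId INTEGER PRIMARY KEY,
        PersonId INTEGER NOT NULL,
        TransactionDate TEXT NOT NULL,
        Amount REAL NOT NULL,
        PaymentTypeId INTEGER NOT NULL)"""
)
# SQLite has no INCLUDE clause, so PersonId and Amount become trailing key columns.
conn.execute(
    """CREATE INDEX IX_payments_TransactionDate
       ON payments (TransactionDate, PaymentTypeId, PersonId, Amount)"""
)
conn.executemany(
    "INSERT INTO payments (PersonId, TransactionDate, Amount, PaymentTypeId) VALUES (?, ?, ?, ?)",
    [
        (1, "2012-01-01", 10.0, 1),
        (1, "2012-01-02", 15.0, 1),
        (2, "2012-01-01", 7.5, 1),
        (2, "2012-01-02", 100.0, 2),  # different payment type: filtered out
        (1, "2012-02-01", 20.0, 1),   # outside the date range: filtered out
    ],
)

QUERY = """SELECT PersonId, SUM(Amount) FROM payments
           WHERE PaymentTypeId = ? AND TransactionDate BETWEEN ? AND ?
           GROUP BY PersonId ORDER BY PersonId"""
PARAMS = (1, "2012-01-01", "2012-01-31")

# The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, PARAMS)]
print(plan)
print(conn.execute(QUERY, PARAMS).fetchall())
```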
Also, if you have years that can be "finalized" (no more changes), you could pre-compute and store some of those sums, e.g. per day, per month, etc. With this approach, certain queries can simply pull pre-computed sums from a table rather than computing the sum over thousands of rows again.
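A minimal sketch of pre-computing sums for a finalized year, assuming Python's `sqlite3` and a hypothetical `MonthlyTotals` table keyed by person, month and payment type. Queries that cover whole months can then read the summary table instead of the raw transactions, and both give the same result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE payments (
        PersonId INTEGER NOT NULL, TransactionDate TEXT NOT NULL,
        Amount REAL NOT NULL, PaymentTypeId INTEGER NOT NULL);
    -- Pre-computed sums for finalized periods, one row per person/month/payment type.
    CREATE TABLE MonthlyTotals (
        PersonId INTEGER NOT NULL, Month TEXT NOT NULL,
        PaymentTypeId INTEGER NOT NULL, Amount REAL NOT NULL,
        PRIMARY KEY (PersonId, Month, PaymentTypeId));
    """
)
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?, ?)",
    [
        (1, "2011-01-05", 10.0, 1),
        (1, "2011-01-20", 5.0, 1),
        (1, "2011-02-03", 8.0, 1),
        (2, "2011-01-07", 4.0, 1),
        (2, "2011-01-09", 6.0, 2),
    ],
)


def finalize_year(conn, year):
    """Store per-month sums for a year that will no longer change."""
    conn.execute(
        """
        INSERT INTO MonthlyTotals (PersonId, Month, PaymentTypeId, Amount)
        SELECT PersonId, substr(TransactionDate, 1, 7), PaymentTypeId, SUM(Amount)
        FROM payments
        WHERE substr(TransactionDate, 1, 4) = ?
        GROUP BY PersonId, substr(TransactionDate, 1, 7), PaymentTypeId
        """,
        (str(year),),
    )


def sums_from_raw(conn, type_id, first_month, last_month):
    """Per-person sums computed from every transaction row."""
    return conn.execute(
        """
        SELECT PersonId, SUM(Amount) FROM payments
        WHERE PaymentTypeId = ? AND substr(TransactionDate, 1, 7) BETWEEN ? AND ?
        GROUP BY PersonId ORDER BY PersonId
        """,
        (type_id, first_month, last_month),
    ).fetchall()


def sums_from_summary(conn, type_id, first_month, last_month):
    """Per-person sums read from the much smaller pre-computed table."""
    return conn.execute(
        """
        SELECT PersonId, SUM(Amount) FROM MonthlyTotals
        WHERE PaymentTypeId = ? AND Month BETWEEN ? AND ?
        GROUP BY PersonId ORDER BY PersonId
        """,
        (type_id, first_month, last_month),
    ).fetchall()


finalize_year(conn, 2011)
print(sums_from_summary(conn, 1, "2011-01", "2011-02"))
```

Periods that are still open would keep being summed from the raw table; only finalized years go into the summary.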

How can I separate a one-column table into a table with multiple columns?

I came across a text file that I need to import into SQL Server 2005. The data looks like this:
A1A00001
A2Name
A3Address
A4Credit
A5ModeOfPayment
D1Invoice 1 Amount
D1Invoice 2 Amount
D1Invoice N Amount (the number of invoices varies per entry)
D2Total Amount Amount
S1Total Outstanding Amount
S1
A1A00002
A2Name
A3Address
A4Credit
A5ModeOfPayment
D1Invoice 1 Amount
D1Invoice 2 Amount
D1Invoice N Amount (the number of invoices varies per entry)
D2Total Amount Amount
S1Total Outstanding Amount
S1
A1A00003
A2Name
A3Address
A4Credit
A5ModeOfPayment
D1Invoice 1 Amount
D1Invoice 2 Amount
D1Invoice N Amount (the number of invoices varies per entry)
D2Total Amount Amount
S1Total Outstanding Amount
S1
As you can see, there are no delimiters in the data, which is why I only managed to get a table with one column containing the information above.
I need your help on how to capture one entry from A1 to S1, put it into separate columns, then proceed to the next occurrence of A1 and S1, and so on.
For example: A1A00001 A2Name A3Address A4Credit A5ModeOfPayment D1Invoice 1 Amount D1Invoice ... etc.
Thanks in advance!
This is a classic example of a file I would send back to the provider and tell them to send it in an appropriate format.
Unfortunately, that isn't always an option. In the past, the way I have handled this is to add a recordidentifier column to the one-column table and then populate it so that all the records that belong together are kept together. I would also add an identity column at the time the records are inserted, so you have something to order on.
Then populate the record identifier, probably using a cursor or loop, so that each group of related records gets the same identifier.
Now create the normalized staging tables you actually need. Populate them with SQL code (which is possible now that you have a recordidentifier). Then populate your real tables from these normalized staging tables.
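The answer populates the record identifier with a cursor or loop. As an alternative (a different, set-based technique, not the answer's method), the sketch below uses Python's `sqlite3` and a correlated subquery: each line's record ID is the number of `A1` lines at or before it in file order. The table and column names are hypothetical, and the subquery rescans earlier rows for every line, so a cursor or loop may be faster on very large files.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Staging table: the auto-numbered line_id preserves file order; record_id starts empty.
conn.execute(
    "CREATE TABLE raw_lines (line_id INTEGER PRIMARY KEY, line TEXT NOT NULL, record_id INTEGER)"
)
sample = ["A1A00001", "A2Name", "D1Invoice 1 Amount", "S1Total Outstanding Amount", "S1",
          "A1A00002", "A2Name", "S1"]
conn.executemany("INSERT INTO raw_lines (line) VALUES (?)", [(line,) for line in sample])

# Set-based alternative to a cursor: a line's record_id is the number of A1 lines
# at or before it in file order.
conn.execute(
    """
    UPDATE raw_lines
    SET record_id = (
        SELECT COUNT(*) FROM raw_lines AS r2
        WHERE r2.line_id <= raw_lines.line_id AND substr(r2.line, 1, 2) = 'A1'
    )
    """
)

# With record_id in place, a plain GROUP BY can pivot the header fields into columns.
headers = conn.execute(
    """
    SELECT record_id,
           MAX(CASE WHEN substr(line, 1, 2) = 'A1' THEN substr(line, 3) END) AS account,
           MAX(CASE WHEN substr(line, 1, 2) = 'A2' THEN substr(line, 3) END) AS name
    FROM raw_lines
    GROUP BY record_id
    ORDER BY record_id
    """
).fetchall()
print(headers)
```

In T-SQL the `substr` calls would become `LEFT`/`SUBSTRING`; the correlated COUNT itself needs no window functions.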
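If you have to do this in SQL, then use a cursor. But the best option would be to parse the file and insert the data into the database with an application.
With a cursor it would look like this (pseudocode):
For each line, in file order:
  If the line starts with A1: insert the previous row (if one exists), then start preparing the next row
  Otherwise: add the line's value to the row being prepared
After the last line: insert the final row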
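The pseudocode above translates fairly directly into a small parser, sketched here in Python as one example of doing the parsing in an application. The function name and record shape are hypothetical: each record maps a two-character code to the list of values seen for it, since D1 (and, in the sample, S1) can appear more than once per record.

```python
from collections import defaultdict


def parse_records(lines):
    """Group fixed-prefix lines into records; each A1 line starts a new record.

    Each record maps a two-character code (A1, A2, ..., D1, D2, S1) to the list
    of values seen for it, because some codes occur more than once per record.
    """
    records = []
    current = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if len(line) < 2:
            continue  # skip blank or truncated lines
        code, value = line[:2], line[2:]
        if code == "A1":
            if current is not None:
                records.append(dict(current))  # insert the previous row if it exists
            current = defaultdict(list)        # start preparing the next row
        elif current is None:
            raise ValueError(f"data before the first A1 line: {line!r}")
        current[code].append(value)
    if current is not None:
        records.append(dict(current))  # the last record has no A1 after it
    return records


sample = """A1A00001
A2Name
A3Address
A4Credit
A5ModeOfPayment
D1Invoice 1 Amount
D1Invoice 2 Amount
D2Total Amount Amount
S1Total Outstanding Amount
S1
A1A00002
A2Name
D1Invoice 1 Amount
S1""".splitlines()

for record in parse_records(sample):
    print(record["A1"], record.get("D1", []))
```

Each parsed record can then be written out as one baseInfo row plus one invoice row per D1 value.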
If this is an initial load and you expect to keep the data and perhaps add to or modify it, I would split it into two tables. Without seeing the actual data, something like this might work.
Obviously I just made up data types and column lengths, adjust as necessary.
Group A1 through A5, D2 and S1 into one table. Actually, D2 (Total Amount) probably doesn't need to be stored in a table, since it can be derived from the invoice amounts; I included it in the table just for the example.
CREATE TABLE baseInfo (
A1 VARCHAR(10) NOT NULL, -- account ID such as A00001 (alphanumeric, so not INT)
Name VARCHAR(25) NOT NULL,
Address VARCHAR(55),
Credit VARCHAR(12),
ModeOfPayment VARCHAR(12),
TotalAmount MONEY,
OutstandingAmount MONEY,
CONSTRAINT [PK_basinfoA1] PRIMARY KEY CLUSTERED (A1)
)
For D1 (the invoice amounts), just use a two-column table: A1 to relate back to the account ID in baseInfo, and the amount of the invoice.
CREATE TABLE invoice (
A1 VARCHAR(10) NOT NULL,
invoiceAmount MONEY
)
ALTER TABLE invoice WITH CHECK ADD CONSTRAINT FK_invoice_base FOREIGN KEY (A1) REFERENCES baseInfo (A1)
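A sketch of the two-table split in Python's `sqlite3`, with the total amount (D2) derived from the invoices instead of stored, as suggested above. Types are simplified to SQLite's, `A1` is text because the sample IDs (A00001, ...) are alphanumeric, and the sample rows and function name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE baseInfo (
        A1                TEXT PRIMARY KEY,  -- account ID such as A00001
        Name              TEXT NOT NULL,
        Address           TEXT,
        Credit            TEXT,
        ModeOfPayment     TEXT,
        OutstandingAmount REAL
    );
    CREATE TABLE invoice (
        A1            TEXT NOT NULL REFERENCES baseInfo (A1),
        invoiceAmount REAL
    );
    """
)
conn.executemany("INSERT INTO baseInfo (A1, Name) VALUES (?, ?)",
                 [("A00001", "First"), ("A00002", "Second"), ("A00003", "Third")])
conn.executemany("INSERT INTO invoice VALUES (?, ?)",
                 [("A00001", 100.0), ("A00001", 250.5), ("A00002", 40.0)])


def total_amounts(conn):
    """Derive D2 (total amount) per account from its invoices; 0 if it has none."""
    return conn.execute(
        """
        SELECT b.A1, COALESCE(SUM(i.invoiceAmount), 0)
        FROM baseInfo AS b
        LEFT JOIN invoice AS i ON i.A1 = b.A1
        GROUP BY b.A1
        ORDER BY b.A1
        """
    ).fetchall()


print(total_amounts(conn))
```

The LEFT JOIN keeps accounts that have no invoices, so they report a total of 0 instead of disappearing from the result.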
To get the data into the two tables you could use T-SQL, but personally I'd go back to the original text file and use PowerShell to parse the text and build SQL inserts.