I have a requirement to store five years of amounts, broken down by month and quarter, in a database. Not every amount has to be filled in: for example, a user can enter data for only 3 months of the first year and provide amounts for every month of another year.
I came up with the following design:
Fields: this table stores the month names and their associated quarter. The data would look like this:
FieldId FieldName Quarter
------- --------- -------
1       Jan       q1
2       Feb       q1
3       march     q1
4       q1total   q1
Data
DataId FieldId Amount year
------ ------- ------ ----
1      1       100    2015
2      2       200    2015
3      3       300    2015
4      4       600    2016
With this approach, for every budget I have to save almost 80 records (5 years of data for each month and quarter) in the database in the worst case.
I would like to know a more efficient way to design the tables for this requirement.
There's no need to store month name or what quarter it's in -- that can be calculated on the fly by date functions of your database or programming language. I'd get rid of the Fields table completely, drop the year and FieldId fields from the Data table, and then add a basic date field to the Data table. All you need is this:
ID Date Amount
-- ---------- ------
1 2015-01-01 100
2 2015-02-01 200
Then you just add a date span for your where clause. If you want Jan:
SELECT * FROM data WHERE date >= '2015-01-01' AND date < '2015-02-01';
If you want Q1:
SELECT * FROM data WHERE date >= '2015-01-01' AND date < '2015-04-01';
Or (in MySQL, for example):
SELECT * FROM data WHERE YEAR(date) = 2015 AND QUARTER(date) = 1; -- Q1 2015
SELECT * FROM data WHERE YEAR(date) = 2015 AND MONTH(date) = 1; -- Jan 2015
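As a rough illustration of the date-span idea, here is a minimal Python sketch using the standard-library sqlite3 module as a stand-in database. The table name budget and the columns period_start and amount are made up for the example, and the dates are stored as ISO text so that plain string comparison orders them correctly. The quarter total is computed at query time rather than stored as its own row.

```python
import sqlite3

# Hypothetical schema: one row per month, keyed by the first day of that month.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE budget ("
    " id INTEGER PRIMARY KEY,"
    " period_start TEXT NOT NULL,"  # ISO date, e.g. '2015-01-01'
    " amount INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO budget (period_start, amount) VALUES (?, ?)",
    [("2015-01-01", 100), ("2015-02-01", 200), ("2015-03-01", 300), ("2015-04-01", 400)],
)


def total_between(conn, start, end):
    """Sum the amounts whose period_start lies in the half-open range [start, end)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM budget"
        " WHERE period_start >= ? AND period_start < ?",
        (start, end),
    ).fetchone()
    return row[0]


january_total = total_between(conn, "2015-01-01", "2015-02-01")
q1_total = total_between(conn, "2015-01-01", "2015-04-01")
```

Months the user never filled in simply have no row, so they contribute nothing to the sum.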
Note, I'm guessing you're probably tracking more than one budget. Perhaps one per user or one per department or something. In this case, you'll want an additional field to indicate who or what the record belongs to:
ID UserId Date Amount
-- ------ ---------- ------
1 1 2015-01-01 100
2 1 2015-02-01 200
Or:
ID DepartmentId Date Amount
-- ------------ ---------- ------
1 1 2015-01-01 100
2 1 2015-02-01 200
"With this approach for every budget information I have to save almost 80 records (5 years data for each month and quarter) in database in worse case"
To be honest - 80, 800 or 8000 records, it doesn't matter much. With that amount of data you don't need to worry about "efficiency", but rather about maintenance and future growth.
You'll want to design it so that it is easy to maintain and easy to change (because it will change). Storing quarters, years and months now might make sense if you want to shave a nanosecond off the query time, or want an easier query to retrieve the data. But in a year, when someone asks you to also provide weekly statistics, this design will fail you.
I agree with Alex's answer about the design of that particular table. If you store a date, you have the freedom to use it as you please. But my answer is more of a general note for any table you will create:
Don't get stuck in how to optimize it now, instead try to think ahead and store data with as much detail as possible (unless building a huge database).
I am using Oracle SQL Developer.
I got the following result from several joined input-tables:
Working station number  Produced tools  Date
----------------------  --------------  ----------
1                       150             01.01.2020
2                       100             01.01.2020
1                       50              01.02.2020
3                       70              15.01.2020
1                       120             08.02.2020
4                       130             08.02.2020
The date in the last column is in TO_DATE format YYYY/MM/DD.
My goal is to visualize the amount of produced tools per working station for each month and year.
Expected output-table:
Year/Month  Working Station Number  Sum of all WS
----------  ----------------------  -------------
2022/01     1                       150
2022/02     2                       80
2022/03     3                       100
2022/04     1                       120
I want this output format for all WS per Month and year. Moreover I would like to add the sum of the WS per month and year
The data also includes the amount of tools per WS for 2021. The table should therefore also show the amount of produced tools for more than one year.
To achieve this format I need to (1) sum up the tools per working station, (2) sum up the tools per month per working station number, and (3) convert the rows to columns.
I would begin by summing up the produced tools per month and then break that down by working station.
Afterwards I would use the PIVOT function to turn the rows (working stations) into columns.
My approach would be the following:
(SELECT Working station number, Amount of produced tools, Date from SourceTable
from
(SELECT Working station number, Amount of produced tools, Date from SourceTable from SourceTable) as SourceTable
Pivot
(Max(Amount of produced tools)
For Working Station in ([1] [2] [3] [4])
) as PIVOT_Table
Unfortunately I don't know how to get the 3 steps together.
I am happy about any comments!
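One way the three steps could fit together is sketched below in plain Python rather than Oracle SQL, just to make the intended shape concrete. It assumes the dates in the sample are DD.MM.YYYY strings, and the function name pivot_by_month is made up for the example. In Oracle the same result would typically come from grouping on the year/month first and then pivoting on the working station number.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (working station, produced tools, date as DD.MM.YYYY).
rows = [
    (1, 150, "01.01.2020"),
    (2, 100, "01.01.2020"),
    (1, 50, "01.02.2020"),
    (3, 70, "15.01.2020"),
    (1, 120, "08.02.2020"),
    (4, 130, "08.02.2020"),
]


def pivot_by_month(rows):
    """Return (stations, table) with one table row per year/month.

    Each table row is (year_month, tools for each station..., sum over all stations).
    """
    # Steps 1 and 2: sum the tools per year/month and per working station.
    totals = defaultdict(lambda: defaultdict(int))
    for station, tools, date_text in rows:
        year_month = datetime.strptime(date_text, "%d.%m.%Y").strftime("%Y/%m")
        totals[year_month][station] += tools

    # Step 3: turn the stations into columns, filling missing combinations with 0.
    stations = sorted({station for station, _, _ in rows})
    table = []
    for year_month in sorted(totals):
        per_station = [totals[year_month].get(s, 0) for s in stations]
        table.append((year_month, *per_station, sum(per_station)))
    return stations, table


stations, table = pivot_by_month(rows)
```

Because the key is the full year/month, data from several years ends up in separate rows without any extra handling.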
My dataset provides a monthly snapshot of customer accounts. Below is a very simplified version:
Date_ID | Acc_ID
------- | -------
20160430| 1
20160430| 2
20160430| 3
20160531| 1
20160531| 2
20160531| 3
20160531| 4
20160531| 5
20160531| 6
20160531| 7
20160630| 4
20160630| 5
20160630| 6
20160630| 7
20160630| 8
Customers can open or close their accounts, and I want to calculate the number of 'new' customers every month. The number of 'exited' customers will also be helpful if this is possible.
So in the above example, I should get the following result:
Month | New Customers
------- | -------
20160430| 3
20160531| 4
20160630| 1
Basically I want to compare distinct account numbers in the selected and previous month, any that exist in the selected month and not previous are new members, any that were there last month and not in the selected are exited.
I've searched but I can't seem to find any similar problems, and I hardly know where to start myself - I've tried using CALCULATE and FILTER along with DATEADD to filter the data to get two months, and then count the unique values. My PowerPivot skills aren't up to scratch to solve this on my own however!
Getting the new users is relatively straightforward - I'd add a calculated column which counts rows for that user in earlier months and if they don't exist then they are a new user:
=IF(CALCULATE(COUNTROWS(data),
FILTER(data, [Acc_ID] = EARLIER([Acc_ID])
&& [Date_ID] < EARLIER([Date_ID]))) = BLANK(),
"new",
"existing")
Once this is in place (with the calculated column named customer_type), you can simply write a measure for new_users:
=CALCULATE(COUNTROWS(data), data[customer_type] = "new")
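The logic of that calculated column can be checked outside PowerPivot. The sketch below is plain Python, not DAX, and the function names are made up for the example. It flags a row as "new" when no earlier row exists for the same account, then counts the flags per month using the sample data from the question.

```python
# Sample snapshot rows from the question: (Date_ID, Acc_ID).
snapshots = [
    (20160430, 1), (20160430, 2), (20160430, 3),
    (20160531, 1), (20160531, 2), (20160531, 3), (20160531, 4),
    (20160531, 5), (20160531, 6), (20160531, 7),
    (20160630, 4), (20160630, 5), (20160630, 6), (20160630, 7), (20160630, 8),
]


def flag_new_accounts(snapshots):
    """Label each row "new" if it is the earliest snapshot of its account, else "existing"."""
    first_seen = {}
    for date_id, acc_id in snapshots:
        if acc_id not in first_seen or date_id < first_seen[acc_id]:
            first_seen[acc_id] = date_id
    return [
        (date_id, acc_id, "new" if first_seen[acc_id] == date_id else "existing")
        for date_id, acc_id in snapshots
    ]


def count_new_by_month(snapshots):
    """Count the rows flagged "new" for every Date_ID, including months with none."""
    counts = {}
    for date_id, _, customer_type in flag_new_accounts(snapshots):
        counts.setdefault(date_id, 0)
        if customer_type == "new":
            counts[date_id] += 1
    return counts


new_by_month = count_new_by_month(snapshots)
```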
Getting the cancelled users is a little harder because it means you have to be able to look backwards to the prior month - none of the time intelligence stuff in PowerPivot will work out of the box here as you don't have a true date column.
It's nearly always good practice to have a separate date table in your PowerPivot models and it is a good way to solve this problem - essentially the table should be 1 record per date with a unique key that can be used to create a relationship. Perhaps post back with a few more details.
This is an alternative method to Jacob's which also works. It avoids creating a calculated column, but I actually find the calculated column useful as a flag against other measures.
=CALCULATE(
DISTINCTCOUNT('Accounts'[Acc_ID]),
DATESBETWEEN(
'Dates'[Date], 0, LASTDATE('Dates'[Date])
)
) - CALCULATE(
DISTINCTCOUNT('Accounts'[Acc_ID]),
DATESBETWEEN(
'Dates'[Date], 0, FIRSTDATE('Dates'[Date]) - 1
)
)
It basically uses the dates table to make a distinct count of all Acc_IDs from the beginning of time until the day before the selected period starts, and subtracts that from the distinct count of all Acc_IDs from the beginning of time until the last day of the selected period. The difference is the number of new distinct Acc_IDs, although you can't work out which Acc_IDs these are using this method.
I could then calculate 'exited accounts' by taking the previous month's total as 'existing accounts':
=CALCULATE(
DISTINCTCOUNT('Accounts'[Acc_ID]),
DATEADD('Dates'[Date], -1, MONTH)
)
Then adding the 'new accounts', and subtracting the 'total accounts':
=DISTINCTCOUNT('Accounts'[Acc_ID])
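The arithmetic behind these measures can be sanity-checked with a short Python sketch (plain Python, not DAX; the function name monthly_churn is made up). It assumes that the previous month is simply the previous snapshot in the data and that a closed account never reopens. If an account could come back, the exited figure from this formula would be too low.

```python
from collections import defaultdict

# Sample snapshot rows from the question: (Date_ID, Acc_ID).
snapshots = [
    (20160430, 1), (20160430, 2), (20160430, 3),
    (20160531, 1), (20160531, 2), (20160531, 3), (20160531, 4),
    (20160531, 5), (20160531, 6), (20160531, 7),
    (20160630, 4), (20160630, 5), (20160630, 6), (20160630, 7), (20160630, 8),
]


def monthly_churn(snapshots):
    """Return {Date_ID: {"new": n, "exited": m}} using the distinct-count arithmetic."""
    accounts_by_month = defaultdict(set)
    for date_id, acc_id in snapshots:
        accounts_by_month[date_id].add(acc_id)

    seen_so_far = set()  # every account seen before the current month
    previous = set()     # accounts present in the previous snapshot
    result = {}
    for date_id in sorted(accounts_by_month):
        current = accounts_by_month[date_id]
        # new = distinct accounts up to this month minus distinct accounts before it
        new = len(seen_so_far | current) - len(seen_so_far)
        # exited = existing (previous month) + new - total (current month)
        exited = len(previous) + new - len(current)
        result[date_id] = {"new": new, "exited": exited}
        seen_so_far |= current
        previous = current
    return result


churn = monthly_churn(snapshots)
```

For June in the sample, that is 7 existing plus 1 new minus 5 total, giving 3 exited accounts (1, 2 and 3).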
The data I am working with is oil and gas production data. The production table uniquely identifies each well and contains a time series of production values. I want to be able to calculate a column that contains the month number occurrence of production for every well in the production table. This needs to be a calculation, so I can graph the production for various wells based on the production month, not the calendar month. (I want to compare well performance across wells over the life of wells.) Also note that there could be gaps in the production data so you can't depend on having twelve months of sequential production for each well.
I tried using the answer in this post (RankValues), but the calculation would never finish. I have over 4 million rows of production data.
In the table shown below, the values shown in ProdMonth are what I need to calculate, based on their order in time as shown in ProdDate. This needs to be performed as a row calculation for each unique WellID.
Thanks.
WellID ProdDate ProdMonth
1 12/1/2011 1
1 1/1/2012 2
1 2/1/2012 3
1 3/1/2012 4
… … …
1 11/1/2012 12
2 3/1/2014 1
2 4/1/2014 2
2 5/1/2014 3
2 6/1/2014 4
2 7/1/2014 5
… … …
2 2/1/2014 12
I would create a new date table that has a row for each day (the granularity of your data). I would then add to that table the ProdMonth column. This will ensure you have dates for all days (even if there are gaps in the well reporting data). Then you can use a relationship between the well production data and the Date table on the ProdDate field. Then if you pull in the ProdMonth from the date table, you'll have a list of all of the ProdMonths (hint: you may need to select 'show values with no data' on the field right click menu in the fields well). Then if you add WellID to the same visualization, you should be able to see which wells were active in which ProdMonth. If WellID is a number, you might need to use the 'do not summarize' feature on the WellID to get the result you desire.
I posted this question on PowerPivotPro and Tom Allan provided the DAX formula I needed. The first step was to calculate a field that concatenates year and month (YearMonth). Then I used the RANKX function as follows:
= RANKX ( FILTER ( Data, [WellID] = EARLIER ( [WellID] ) ), [YearMonth], , 1, DENSE )
That did the trick and performed fairly quickly on 12mm rows.
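To make the ranking concrete, here is a small Python sketch of the same idea (not DAX; the function name is made up, and YearMonth is assumed to be an integer such as 201112). Each row gets the dense, ascending rank of its YearMonth among the distinct months of the same well, so gaps in the calendar do not leave gaps in the numbering.

```python
from collections import defaultdict


def production_month_numbers(rows):
    """Given (well_id, year_month) rows, return (well_id, year_month, prod_month) rows.

    prod_month is the dense, ascending rank of year_month within its well.
    """
    months_by_well = defaultdict(set)
    for well_id, year_month in rows:
        months_by_well[well_id].add(year_month)

    # Rank the distinct months of each well once, then look the rank up per row.
    rank = {
        well_id: {ym: n for n, ym in enumerate(sorted(months), start=1)}
        for well_id, months in months_by_well.items()
    }
    return [(well_id, ym, rank[well_id][ym]) for well_id, ym in rows]


# Hypothetical rows: well 2 has no production in April 2014.
sample = [(1, 201112), (1, 201201), (1, 201202), (2, 201403), (2, 201405)]
ranked = production_month_numbers(sample)
```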
I have created two tables. The first has these fields: a unique name, a date and 3 figures. The second contains the same fields but records the output of a MERGE, so it also has the date on which the update or insert happened. What I want is to retrieve either the difference between two days, or the totals for the two days, so I can work out how much the value has changed over the day. The merge only updates a row if a value has changed, or inserts a new row when needed.
So far I have this:
select sum(Change_Table_1.Disk_Space) as total,
Change_Table_1.Date_Updated
from VM_Info
left join Change_Table_1
on VM_Info.VM_Unique = Change_Table_1.VM_Unique
where VM_Info.Agency = 'test'
group by Change_Table_1.Date_Updated
But this just returns the sum of that day's updated rows rather than the difference between the two days. One option would be to add every record to the table each day, but that would store a lot of duplicates. In my head, what I want is to loop over the current figures for one day, then loop over the next day while also including all values that haven't been updated. Sorry if I haven't explained this well. What I want to achieve is some sort of change of the total over time. If it's poor design, I'm in a position to accept that too.
Any help is much appreciated.
Maybe this explains it better: show me the total for day 1; for day 2, if a value hasn't changed, use the same value, and if it has changed, use the new value. And so on.
OK, to elaborate further:
The Change_Table looks like this:
vm   date created  action  value_Field1  value_field_2  Disk_Space
abc  14/10/2013    insert  5             5              30
def  14/10/2013    insert  5             5              75
abc  15/10/2013    update  5             5              75
So the output I want for the 14th is the total of the last column, 105. On the 15th, abc has changed from 30 to 75; def hasn't changed but still needs to be included, giving 150.
So the output would look like this:
date disk_Space
14/10/2013 105
15/10/2013 150
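The carry-forward logic described above can be sketched in plain Python, only to pin down the expected numbers; it is not a SQL answer. The function name daily_totals is made up, and the dates are assumed to have been converted to ISO format so that they sort correctly as text.

```python
from itertools import groupby
from operator import itemgetter

# Change log rows from the example: (vm, change date as ISO text, Disk_Space).
changes = [
    ("abc", "2013-10-14", 30),
    ("def", "2013-10-14", 75),
    ("abc", "2013-10-15", 75),
]


def daily_totals(changes):
    """Return (date, total Disk_Space) per change date, carrying unchanged VMs forward."""
    latest = {}  # most recent Disk_Space known for each vm
    totals = []
    ordered = sorted(changes, key=itemgetter(1))
    for day, day_rows in groupby(ordered, key=itemgetter(1)):
        for vm, _, disk_space in day_rows:
            latest[vm] = disk_space
        totals.append((day, sum(latest.values())))
    return totals


totals = daily_totals(changes)
```

Days on which nothing changed do not appear in the result, because only dates present in the change log are visited.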
Does this help? If not, can you provide a few rows of sample data, and an example of the desired result?
select
(VM_Info.Disk_Space - Change_Table_1.Disk_Space) as DiskSpaceChange,
Change_Table_1.Date_Updated
from
VM_Info
left join Change_Table_1 on VM_Info.VM_Unique = Change_Table_1.VM_Unique and VM_Info.Date = Change_Table_1.Date_Updated
where
VM_Info.Agency = 'test'
Wondering if anyone can help with the code for this.
I want to query the data and get 2 entries: one for the previous year's YTD and one for this year's YTD.
The only way I know how to do this is as 2 separate queries with WHERE clauses. I would prefer not to have to run the query twice.
Ideally there would be one column called DatePeriod, populated with 2011 YTD and 2012 YTD. It would be even better if I could get 2011 YTD, 2012 YTD, 2011 Total and 2012 Total, though I'm guessing that is 4 queries.
Thanks
EDIT:
In response to help clear a few things up:
This is being coded in MS SQL.
The data looks like so: (very basic example)
Date | Call_Volume
1/1/2012 | 4
What I would like is to have the Call_Volume summed up. I have queries that group it by week, and others that do it by month. I could pull all the dailies in and do this in Excel, but the table has millions of rows, so it is always best to reduce the size of my output.
I currently group by week, month and year and UNION ALL the results so there is one output. That means I have 3 queries accessing the same table, which is a pain and slow, but acceptable. Now I also need a YTD figure, so it's either one more query or, ideally, a way to add it to the yearly query:
So
DatePeriod | Sum_Calls
2011 Total | 40
2011 YTD | 12
2012 Total | 45
2012 YTD | 15
Hope this makes any sense.
SQL is built to do operations on rows, not columns (you select columns, of course, but aggregate operations are all on rows).
The most standard approach to this is something like:
SELECT SUM(your_table.sales), YEAR(your_table.sale_date)
FROM your_table
GROUP BY YEAR(your_table.sale_date)
Now you'll get one row for each year on record, with no limit to how many years you can process. If you're already grouping by another field, that's fine; you'll then get one row for each year in each of those groups.
Your program can then iterate over the rows and organize/render them however you like.
If you absolutely, positively must have columns instead, you'll be stuck with something like this:
SELECT SUM(IF(YEAR(date) = 2011, sales, 0)) AS total_2011,
SUM(IF(YEAR(date) = 2012, sales, 0)) AS total_2012
FROM your_table
If you're building the query programmatically you can add as many of those column criteria as you need, but I wouldn't count on this running very efficiently.
(These examples are written with some MySQL-specific functions. Corresponding functions exist for other engines but the syntax would be a little different.)
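The same conditional-sum idea can also produce a full-year total and a YTD figure per year in a single pass over the table. The sketch below uses Python's sqlite3 module as a stand-in engine, so the date functions differ from both MySQL and MS SQL. The table name calls, its columns, the sample rows and the cutoff date are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (call_date TEXT NOT NULL, call_volume INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [
        ("2011-01-10", 5), ("2011-03-01", 7), ("2011-09-01", 28),
        ("2012-02-01", 15), ("2012-11-01", 30),
    ],
)


def yearly_totals_with_ytd(conn, cutoff_month_day):
    """Return (DatePeriod, Sum_Calls) rows with a Total and a YTD entry for each year.

    cutoff_month_day is 'MM-DD'; rows on or before that day of their own year count as YTD.
    """
    rows = conn.execute(
        "SELECT strftime('%Y', call_date) AS yr,"
        "       SUM(call_volume) AS total,"
        "       SUM(CASE WHEN strftime('%m-%d', call_date) <= ?"
        "                THEN call_volume ELSE 0 END) AS ytd"
        " FROM calls"
        " GROUP BY strftime('%Y', call_date)"
        " ORDER BY yr",
        (cutoff_month_day,),
    ).fetchall()
    report = []
    for year, total, ytd in rows:
        report.append((f"{year} Total", total))
        report.append((f"{year} YTD", ytd))
    return report


report = yearly_totals_with_ytd(conn, "03-15")
```

The table is read once, and the split into Total and YTD rows happens in the client, which matches the earlier advice to let the program organize the rows.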