Problem with calculated member in Schema Workbench - pentaho

I have the Formula:
(([Measures].[QuantityPathology]) / ([Measures].[QuantityPeople],
[DimPathology.Pathologies].[All])) * 100
The numerator is the measure QuantityPathology and the denominator is QuantityPeople evaluated at the All member of the Pathologies hierarchy, that is, the total number of people served in that period regardless of the selected pathology.
But it does not work: Saiku Analytics does not display any results. What might I be doing wrong?
Edit:
Here is a worked example of the formula:
(([Measures].[QuantityPathology]) / ([Measures].[QuantityPeople],
[DimPathology.Pathologies].[All])) * 100
QuantityPathology with Bronchitis year 2019 = 61
Number of people served in the year 2019 = 4569
61 / 4569 = 0.013350843
Result = 0.013350843 * 100 = 1.3350843
Also, avg(NumberPeople) does not work; it returns 1 in all rows.
I don't have any error in catalina.out
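For reference, the intended calculation written as a standalone MDX query would look roughly like the sketch below. The cube name [MyCube] is a placeholder, and the [All] member reference is an assumption: in Mondrian the All member's actual name depends on the hierarchy's allMemberName (for example [All Pathologies]), so it is worth checking it against the schema.
WITH MEMBER [Measures].[PathologyPercentage] AS
    ([Measures].[QuantityPathology])
    / ([Measures].[QuantityPeople], [DimPathology.Pathologies].[All])
    * 100
SELECT {[Measures].[QuantityPathology], [Measures].[PathologyPercentage]} ON COLUMNS
FROM [MyCube]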
Thank you very much.

Related

Summing up variable by preset N observations

I am running into problems doing a fairly basic summation.
My dataset is composed of company IDs (cusip8) and their daily (date) abnormal returns (AR). I need to sum the abnormal returns of each company from days t+3 until t+60 forward.
cusip8 year date ret vwretd AR
"00030710" 2014 19998 . .0007672172 . .
"00030710" 2014 19999 .008108087815344334 .009108214 -.0010001262 .
"00030710" 2014 20002 .03163539618253708 -.00158689 .033222288 .
"00030710" 2014 20003 0 -.014999760000000001 .01499976 .
"00030710" 2014 20004 -.005717287305742502 .0158898 -.02160709 .
"00030710" 2014 20005 .006272913888096809 -.02121511 .027488023 .
"00030710" 2014 20006 -.012987012974917889 -.01333873 .000351717 .
I have tried the following:
sort cusip8 date
by cusip8: gen CAR = AR if _n==1
(24,741,003 missing values generated)
by cusip8: replace CAR = AR + CAR[_n-1] if _n>3 & _n<60
And I have so far been left with just .'s in the newly generated variable. Does anyone know how to solve this?
I am using Stata 16.0.
You have more problems than one. First, let's address your problem report.
In each panel, your code creates CAR only in the first observation, so CAR[2] and CAR[3] are left missing. That messes up all subsequent calculations: CAR[4] is AR[4] + CAR[3], and so missing; CAR[5] is AR[5] + CAR[4], and so missing; and so on.
Contrary to your claim, in each panel CAR[1] should be non-missing whenever AR is.
Second, evidently you have gaps for days 20000 and 20001, which fell on a weekend. dow() returns 6 for Saturday and 0 for Sunday from daily dates (for which 0 is 1 January 1960).
. di dow(20000)
6
. di dow(20001)
0
. di %td 20000
04oct2014
So, either set up a business calendar to exclude weekends and holidays, or decide that you want just to use whatever is available within particular windows based on daily dates.
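A minimal sketch of the business-calendar route, assuming the only gaps are weekends and holidays (bcal create builds a calendar from the dates actually present in the data):
bcal create trading, from(date) replace
gen bdate = bofd("trading", date)
format bdate %tbtrading
Offsets of 3 and 60 on bdate then count trading days rather than calendar days.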
Third, your wording is not precise enough to make your problem unambiguous to anyone who doesn't routinely deal with your kind of data. It seems that you seek a cumulative (running) sum, but the window could be just one window (as your question literally implies) or a moving window (which is my guess). The function sum() gives cumulative or running sums: see help sum(). Just possibly,
bysort cusip8 (date): gen wanted = sum(AR)
is a start on your solution. Otherwise, ssc describe rangestat shows you a command good for moving window calculations.
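For the moving-window reading, a sketch using rangestat from SSC (install it with ssc install rangestat); note that 3 and 60 here are offsets on the daily date variable, so weekend gaps still count toward the window unless you move to a business calendar first:
rangestat (sum) CAR = AR, interval(date 3 60) by(cusip8)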
There are hundreds of posts in this territory on Statalist.

MSAccess Slow Updates on Self-Joined table

I am trying to improve the performance of updating only about 60K rows with data coming from different rows in the same table. At about 2 minutes, it's not terrible, but it's not great either, and my application really doesn't work if you have to wait so long between recalculations.
The app generates a set of financial statements for a business, where it calculates basic formulas on 1300 line items, like Rent, or Direct Labor, or Inventory costs, all of which roll up to totals that mimic the Balance Sheet, P&L, Cash Flow, etc. Many of the line items need to be calculated on a month-by-month basis, where for instance it has to figure out April's On Hand Inventory before knowing what April's Inventory Value is. So the total program ends up looping through 48 months over 30 calculation passes, requiring about 8000 SQL statements. (Fortunately it figures it all out by itself!) Each SQL statement takes only a few milliseconds, but it adds up.
I'm pretty sure I can't reduce the number of loops, so I keep trying to figure out how to make each SQL quicker. The basic structure is as follows:
LI: Line item table that holds the basic info of each item, primary key LID
LID Name
123 Sales_1
124 Sales_2
200 Total Sales
Formula: Master/Detail tables that create any formula from the line items
Total sales=Sales_1 + Sales_2
or
{200}={123}+{124}
(I use curly braces to be able to find and replace the LIDs within the formula, as shown in the SQL below)
FC: Formula Calculation table: all line items by month, about 1300 items x 48 months=62K records. Primary key FID
FID   SQL_ID  LID  LID_brace  LIN          OutputMonth  Formula      Amount
3232  25      123  {123}      Sales_1      1                         1200
3255  26      124  {124}      Sales_2      1                         1500
5454  177     200  {200}      Total Sales  1            {123}+{124}
DMO: Operand join table, which links a formula to its detail lines within the same table. Once Sales_1 is calculated, the app can find the Total Sales record and update it, which then evaluates and sends its amount up the chain to the other LIDs that depend on it, such as Total Income. It locates the record to update based on the SQL_ID, which is set based on the calc pass and month. It's complex to set up, but pretty straightforward once it actually runs.
Master_FID Detail_FID
5454 3232 (links total sales to sales_1)
5454 3255 (links total sales to sales_2)
SQL1:
UPDATE (FC INNER JOIN DMO ON FC.FID = DMO.Master_FID)
INNER JOIN FC AS FC2 ON DMO.Detail_FID = FC2.FID
SET FC.Formula = Replace(FC.Formula, FC2.LID_brace, FC2.Amount)
WHERE FC.SQL_ID = 177
The above will change {123} + {124} to 1200+1500 which will then evaluate to 2700 when I run the following
SQL2:
UPDATE FC SET FC.Amount = Eval([FC].[Formula]) WHERE FC.SQL_ID = 177
So those two SQL statements are run over and over again, with the only thing changing being the SQL_ID.
There are indexes on SQL_ID, LID, FID, etc.
When measuring, the time per record can range from 0.04 ms when many records are included (~10K for some passes) up to 10 or 15 ms when just one record is updated. Perhaps the setup of the query is causing a lot of overhead, because the time doesn't seem to be a function of the actual number of records updated? It's also not very consistent: some runs take 20+ ms compared to less than 3 ms when the same thing runs again.
I know this is a complex question I'm asking that probably doesn't have a simple answer, but I'm just looking for directions on what might help. For instance, a parameter query if there isn't a whole lot of change between runs? Does Access have an easier time running a query if it knows about it in advance, i.e. a named query with parameters vs. dynamic SQL? Am I just doomed because it still needs to run those 8000 queries?
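On the named-query-with-parameters idea, a sketch of what that could look like (the query name qryEvalPass and parameter pSqlID are illustrative, not part of the app): save the evaluation step as a parameterized query,
PARAMETERS pSqlID Long;
UPDATE FC SET FC.Amount = Eval([FC].[Formula]) WHERE FC.SQL_ID = [pSqlID];
and then run it once per pass from VBA, so Access can reuse the saved query's compiled plan instead of re-parsing dynamic SQL each time:
Dim qdf As DAO.QueryDef
Set qdf = CurrentDb.QueryDefs("qryEvalPass")
qdf.Parameters("pSqlID") = 177
qdf.Execute dbFailOnError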
Also, is there inherently a problem with trying to update the same table through a secondary join table, and/or is there a better way to do it?
Is it also because string replacing isn't efficient this way? If I tried RegEx, would that be quicker? I would have to write a function to do that within a query, but it seems like that would be slower.
Thanks in advance, this has been a most vexing problem!!!

Table Percentage of Total in Matrix Table (By both column and row)

I have a table built off of a dataset. It contains both a column group and a row group. The intention is that each cell will show its percentage of the total of its column, without regard to the row. For example:
             Jan  Feb  Mar
New Accounts 50%  35%  86%
Old Accounts 50%  65%  14%
Currently, I have it set up with three groups: one parent, all-encompassing row group; one row group that limits to new or old accounts; and one month-specific column group. The code I'm using in each individual cell is as follows:
=SUM(Fields!amountOfAccounts.Value, "newOrOldAccountGrouping") / SUM(Fields!amountOfAccounts.Value, "allEncompassingGroup")
This works somewhat, but the issue is that it does not separate by column group; it just averages across all the months. As a result, I end up with something like this:
             Jan  Feb  Mar
New Accounts 57%  57%  57%
Old Accounts 43%  43%  43%
I've already tried filtering by both row and column group, but Report Builder throws a scope error if I do so. For instance, the following code will not work:
=SUM(SUM(Fields!accts.Value, "columnGrouping"), "rowGrouping")
I'm not certain how to fix this, does anyone have any ideas?
Your expression should look like the one below. The numerator, Sum(Fields!accts.Value) with no scope argument, is evaluated in the current cell's scope (the intersection of the row and column groups), and dividing by the sum scoped to "columnGrouping" gives each cell's share of its column total:
=Sum(Fields!accts.Value) / Sum(Fields!accts.Value, "columnGrouping")
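If the cells should render as percentages like the sample output, one option (a sketch) is to leave the expression as a ratio and set the textbox's Format property to P0, or format inline:
=FormatPercent(Sum(Fields!accts.Value) / Sum(Fields!accts.Value, "columnGrouping"), 0)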

powerpivot average of an aggregate

I may be missing an obvious solution, but I am looking for a way to take the average of a summed value. For example, I have profit at an Item# level, where each item is also on a bill, and I want the average of the bills' profit.
Item# | Bill# | Profit
1     | 1     | 100
2     | 1     | 200
1     | 2     | 100
2     | 2     | 200
If I just take the avg of Profit I get 150, but I want the avg of the bill totals, which would be 300. Is it possible to do this? I was thinking something like Calculate(Average(Profit), Bill# = Bill#), but that is always true?
Thanks in advance!
It's not totally clear how you intend to use your measure, but there are some powerful iterator functions in PowerPivot that do this kind of thing. This formula iterates over each bill# and averages the sum of the profit (assuming your table is called tbl):
= AVERAGEX(VALUES(tbl[bill#]), CALCULATE(SUM(tbl[profit])))
The first argument creates a 'column' of the unique bill#s, and the second sums the profit for the current bill#; the CALCULATE is what turns the current bill# of the iteration into a filter on the table. With your sample data each bill sums to 300, so the measure returns (300 + 300) / 2 = 300.
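Spelled out as a measure definition with comments (same sketch, same assumed table name tbl):
Avg Bill Profit :=
AVERAGEX (
    VALUES ( tbl[bill#] ),              -- one row per distinct bill#
    CALCULATE ( SUM ( tbl[profit] ) )   -- total profit of that bill (300 per bill in the sample)
)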

Cumulative average number of records created for specific day of week or date range

Yeah, so I'm filling out a requirements document for a new client project and they're asking for growth trends and performance expectations calculated from existing data within our database.
The best source of data for something like this would be our logs table as we pretty much log every single transaction that occurs within our application.
Now, here's the issue: I don't have a whole lot of experience with MySQL when it comes to computing cumulative sums and running averages. I've thrown together the following query, which kind of makes sense to me, but it just keeps locking up the command console. It takes forever to execute, and there are only 80k records in the test sample.
So, given the following basic table structure:
id | action | date_created
1 | 'merp' | 2007-06-20 17:17:00
2 | 'foo' | 2007-06-21 09:54:48
3 | 'bar' | 2007-06-21 12:47:30
... thousands of records ...
3545 | 'stab' | 2007-07-05 11:28:36
How would I go about calculating the average number of records created for each given day of the week?
day_of_week | average_records_created
1 | 234
2 | 23
3 | 5
4 | 67
5 | 234
6 | 12
7 | 36
I have the following query which makes me want to murderdeathkill myself by casting my body down an elevator shaft... and onto some bullets:
SELECT
DISTINCT(DAYOFWEEK(DATE(t1.datetime_entry))) AS t1.day_of_week,
AVG((SELECT COUNT(*) FROM VMS_LOGS t2 WHERE DAYOFWEEK(DATE(t2.date_time_entry)) = t1.day_of_week)) AS average_records_created
FROM VMS_LOGS t1
GROUP BY t1.day_of_week;
Halps? Please, don't make me cut myself again. :'(
How far back do you need to go when sampling this information? This solution works as long as it's less than a year.
Because day of week and week number are constant for a record, create a companion table that has the ID, WeekNumber, and DayOfWeek. Whenever you want to run this statistic, just generate the "missing" records from your master table.
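A sketch of that companion table and the top-up step (the log table and column names follow the query in the question, and the id column is assumed; note WEEK() numbering restarts each year, which is why this approach is limited to about a year of data):
CREATE TABLE MyCompanionTable (
    ID         INT PRIMARY KEY,
    WeekNumber INT,
    DayOfWeek  INT
);
-- add any log rows not yet summarized
INSERT INTO MyCompanionTable (ID, WeekNumber, DayOfWeek)
SELECT l.id, WEEK(l.datetime_entry), DAYOFWEEK(l.datetime_entry)
FROM VMS_LOGS l
LEFT JOIN MyCompanionTable c ON c.ID = l.id
WHERE c.ID IS NULL;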
Then, your report can be something along the lines of:
select
DayOfWeek
, count(*)/count(distinct(WeekNumber)) as Average
from
MyCompanionTable
group by
DayOfWeek
Of course if the table is too large, then you can instead pre-summarize the data on a daily basis and just use that, and add in "today's" data from your master table when running the report.
I rewrote your query as:
SELECT x.day_of_week,
AVG(x.count) 'average_records_created'
FROM (SELECT DAYOFWEEK(t.datetime_entry) 'day_of_week',
COUNT(*) 'count'
FROM VMS_LOGS t
GROUP BY DAYOFWEEK(t.datetime_entry)) x
GROUP BY x.day_of_week
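One caveat on the rewrite above: because the inner query groups only by weekday, the outer AVG sees a single row per weekday, so the result is the total rather than the average number of records per weekday. If the goal is the average per calendar day, a sketch that groups by date first (still a single pass over the table):
SELECT x.day_of_week,
       AVG(x.daily_count) AS average_records_created
FROM (SELECT DAYOFWEEK(t.datetime_entry) AS day_of_week,
             DATE(t.datetime_entry) AS entry_date,
             COUNT(*) AS daily_count
      FROM VMS_LOGS t
      GROUP BY day_of_week, entry_date) x
GROUP BY x.day_of_week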
The reason your query takes so long is your inner select: you are essentially running 6,400,000,000 queries. With a query like this, your best solution may be to develop a timed reporting system, where the user receives an email when the query is done and the report is constructed, or the user logs in and checks the report later.
Even with the optimization written by OMG Ponies (above) you are still looking at around the same number of queries.