Get the most common item in a calculated column - powerpivot

I've figured out how to accomplish the equivalent of the following in a measure, but I need to use it as the legend in a Power View chart so it needs to be done in a calculated column. The change of context from calculated field to calculated column has completely screwed me up.
In my data model, I have a table of job applications. Each record has a single Specialty and an address for the company being applied to. Each specialty can show up multiple times in the table.
ApplicationTable:
ApplicationID | Name | Specialty | City | State
32911 |Joe Bob | Engineering | Miami | Florida
89547 |Ralph Kramden | Shouting | New York | New York
etc.
I also have a table of states. It just has columns for state name and postal abbreviation. I need to create a column with the most commonly occurring Specialty per state.
If I could do this as a calculated field, I would have been finished hours ago. I just used a pretty straightforward application of topn:
Top Specialty := FIRSTNONBLANK (TOPN (3, VALUES (ApplicationTable[Specialty]),[Count of ApplicationID], ApplicationTable[Specialty])
I used FIRSTNONBLANK and TOPN(3...) because some states only have a few applications, so each specialty only shows up once. In my application it's fine to just pick the first specialty in the list in those cases.
Anyway that formula is cool but it doesn't help here. So how do I do the equivalent in a calculated column, so I can use it as a key or a filter? Specifically, I think I need to do this in the StateTable, giving me the name of the specialty that occurs most per state in the ApplicationTable. Ideas?

First, create a basic measure to count specialties:
SpecialtyCount :=
COUNTA ( ApplicationTable[Specialty] )
Next, create a measure to figure out the highest single specialty (within a context):
MostSpecial :=
MAXX ( VALUES ( ApplicationTable[Specialty] ), [SpecialtyCount] )
Finally, add a calculated column to your States table:
=
FIRSTNONBLANK (
ApplicationTable[Specialty],
IF (
[SpecialtyCount]
= CALCULATE ( [MostSpecial], VALUES ( ApplicationTable[Speciality] ) ),
1,
BLANK ()
)
)
By placing this as a calculated column, our filter context is each state. So first PowerPivot will filter the ApplicationTable to just applications within the state, and then it will use FIRSTNONBLANK() to iterate through each ApplicationTable[Specialty], calculate its SpecialtyCount and see if that equals the MostSpecial count within that state. If so, it's not blank, and that's the specialty it returns.

Related

Keeping an updated tally of changing records

I have a list of students and their subjects:
id | student | subject
---|---------|--------
1 | adam | math
2 | bob | english
3 | charlie | math
4 | dan | english
5 | erik | math
And I create a tally from the above list aggregating how many students are there in each subject:
id | subject | students
---|---------|--------
1 | math | 3
2 | english | 2
The student list will keep on expanding and this aggregation will be done at regular intervals.
The reason I'm keeping the Tally in a separate table in the first place is because the original table is supposed to be massive (this is just a simplification of my original problem) and so querying the original table for a current tally on-the-fly is unfeasible to do quickly enough.
Anyways, so the aggregating is pretty straight forward as long as the students don't change their subject.
But now I want to add a feature to allow students to change their subject.
My previous approach was this: while updating the Tally, I keep a counter variable up to which row of students I've already accounted for. Next time I only consider records added after that row.
Also the reason why I keep a counter is because the Students table is massive, and I don't want to scan the whole table every time as it won't scale well.
It works fine if all students are unique and no one changes their subject.
But it breaks apart now because I can no longer account for rows that come before the counter and were updated.
My second approach was using a updated_at field (instead of counter) and keep track of newly modified rows that way.
But still I don't know how to actually update the Tally accurately.
Say, Erik changes his subject from "math" to "english" in the above scenario. When I run the script to update the Tally, while it does finds the newly updated row but it simply says {"erik": "english"}. How would I know what it changed from? I need to know this to correctly decrement "math" in the Tally table while incrementing "english".
Is there a way this can be solved?
To summarize my question again, I want to find a way to be able to update the Tally table accurately (a process that runs at regular interval) with the updated/modified rows in the Student table.
I'm using NodeJS and PostgreSQL if it matters.
Why don't you do it when student add subject, remove subject, or change subject.
When student add new subject: Just increase UPDATE tbl_tally SET student = student + 1 WHERE subject = :subject;
When student remove subject: Just decrease UPDATE tbl_tally SET student = student - 1 WHERE subject = :subject;
When student change subject: Just increase new subject by one and decrease old subject by one
UPDATE tbl_tally SET student = student - 1 WHERE subject = :old_subject;
UPDATE tbl_tally SET student = student + 1 WHERE subject = :new_subject;
I am not familiar with PostgreSQL, but in MySQL, you can even do it with trigger. I think PostgreSQL also has trigger.

With MDX is there a generic way to calculate the ratio of cells with regards to the selected members of a specific hierarchy?

I want to define a cube measure in a SSAS Analysis Services Cube (multidimensional model) that calculates ratios for the selection a user makes for a predefined hierarchy. The following example illustrates the desired behavior:
|-City----|---|
| Hamburg | 2 |
| Berlin | 1 |
| Munich | 3 |
This is my base table. What I want to achieve is a cube measure that calculates ratios based on a users' selection. E.g. when the user queries Hamburg (2) and Berlin (1) the measure should return the values 67% (for Hamburg) and 33% (for Berlin). However if Munich (3) is added to the same query, the return values would be 33% (Hamburg), 17% (Berlin) and 50% (Munich). The sum of the values should always equal to 100% no matter how many hierarchy members have been included into the MDX query.
So far I came up with different measures, but they all seem to suffer from the same problem that is it seems impossible to access the context of the whole MDX query from within a cell.
My first approach to this was the following measure:
[Measures].[Ratio] AS SUM([City].MEMBERS,[Measures].[Amount])/[Measures].[Amount]
This however sums up the amount of all cities regardless of the users selection and though always returns the ratio of a city with regards to the whole city hierarchy.
I also tried to restrict the members to the query context by adding the EXISTING keyword.
[Measures].[Ratio] AS SUM(EXISTING [City].MEMBERS,[Measures].[Amount])/[Measures].[Amount]
But this seems to restrict the context to the cell which means that I get 100% as a result for each cell (because EXISTING [City].MEMBERS is now restricted to a cell it only returns the city of the current cell).
I also googled to find out whether it is possible to add a column or row with totals but that also seems not possible within MDX.
The closest I got was with the following measure:
[Measures].[Ratio] AS SUM(Axis(1),[Measures].[Amount])/[Measures].[Amount]
Along with this MDX query
SELECT {[Measures].[Ratio]} ON 0, {[City].[Hamburg],[City].[Berlin]} ON 1 FROM [Cube]
it would yield the correct result. However, this requires the user to put the correct hierarchy for this specific measure onto a specific axis - very error prone, very unintuitive, I don't want to go this way.
Are there any other ideas or approaches that could help me to define this measure?
I would first define a set with the selected cities
[GeoSet] AS {[City].[Hamburg],[City].[Berlin]}
Then the Ratio
[Measures].[Ratio] AS [Measures].[Amount]/SUM([GeoSet],[Measures],[Amount])
To get the ratio of that city to the set of cities. Lastly
SELECT [Measures].[Ratio] ON COLUMNS,
[GeoSet] ON ROWS
FROM [Cube]
Whenever you select a list of cities, change the [GeoSet] to the list of cities, or other levels in the hierarchy, as long as you don't select 2 overlapping values ([City].[Hamburg] and [Region].[DE6], for example).

Multi-Row Per Record SQL Statement

I'm not sure this is possible but my manager wants me to do it...
Using the below picture as a reference, is it possible to retrieve a group of records, where each record has 2 rows of columns?
So columns: Number, Incident Number, Vendor Number, Customer Name, Customer Location, Status, Opened and Updated would be part of the first row and column: Work Notes would be a new row that spans the width of the report. Each record would have two rows. Is this possible with a GROUP BY statement?
Record 1
Row 1 = Number, Incident Number, Vendor Number, Customer Name, Customer Location, Status, Opened and Updated
Row 2 = Work Notes
Record 2
Row 1 = Number, Incident Number, Vendor Number, Customer Name, Customer Location, Status, Opened and Updated
Row 2 = Work Notes
Record n
...
I don't think that possible with the built in report engine. You'll need to export the data and format it using something else.
You could have something similar to what you want on short description (list report, group by short description), but you can't group by work notes so that's out.
One thing to note is that the work_notes field is not actually a field on the table, the work_notes field is of type journal_input, which means it's really just a gateway to the actual underlying data model. "Modifying" work_notes actually just inserts into sys_journal_field.
sys_journal_field is the table which stores the work notes you're looking for. Given a sys_id of an incident record, this URL will give you all journal field entries for that particular record:
/sys_journal_field_list.do?sysparm_query=name=task^element_id=<YOUR_SYS_ID>
You will notice this includes ALL journal fields (comments + work_notes + anything else), so if you just wanted work notes, you could simply add a query against element thusly:
/sys_journal_field_list.do?sysparm_query=name=task^element=work_notes^element_id=<YOUR_SYS_ID>
What this means for you!
While you can't separate a physical row into multiple logical rows in the UI, in the case of journal fields you can join your target table against the sys_journal_field table using a Database View. This deviates from your goal in that you wouldn't get a single row for all work notes, but rather an additional row for each matched work note.
Given an incident INC123 with 3 work notes, your report against the Database View would look kind of like this:
Row 1: INT123 | markmilly | This is a test incident |
Row 2: INT123 | | | Work note #1
Row 3: INT123 | | | Work note #2
Row 4: INT123 | | | Work note #3

SQL statement that retrieves formulas for two different dates in db

All,
I have three total tables. The first table 'rollup1' contains the number of views and number of clicks for a campaign, as well as a one-up number for the day field (largest number in column represents the current date) A second table 'rollup2' contains the earnings for the campaign. It also contains the same one-up number for the dayfield. The third table 'campaigns' contains the ID/names for the campaigns. campaigns.id = rollup1.id = rollup2.id and rollup1.day=rollup2.day
I want to generate an SQL query that lists the campaign id, name, specific calculated value from yesterday, and specific calculated value from today. The specific calculated value I'm looking for is (earnings/clicks)*1000.
The results will look like:
id | name | yesterday | today
a | Campaign1 | $0.05 | $0.010
I think I can use case statements, but I can't seem to get it correct. Here's what I have so far. It calculates the formula for yesterday, but not the one for today. I need these to be side by side.
select campaigns.id, campaigns.name, rollup1.views,rollup1.clicks,rollup2.costs,round((rollup2.costs/rollup1.views)*1000,2) as yesterday
from campaigns,rollup1,rollup2
where campaigns.id = rollup1.campaign_id and campaigns.id = rollup2.campaign_id
and rollup1.dayperiod = rollup2.dayperiod
and rollup1.dayperiod = (SELECT (max(rollup1.dayperiod) -1) FROM rollup1)
Thanks for any help you can provide.

Pentaho Report Designer (PRD): Use parameter in select clause

In the report I'm working with, I need to display information of four columns of a database table. The first three columns of the table are SEX, AGE and NAME. The other N columns (N being like 100!) are questions, with every line of the table meaning that person's answer to that question:
SEX | AGE | NAME | Q1 | Q2 | Q3 | ... | Q100
In my report, I need to show four of these columns, where the first three are always the same and the fourth column varies according to the option selected by the user:
SEX | AGE | NAME | <QUESTION_COLUMN>
So far I've created a dropdown parameter (filled with "Q1", "Q2", "Q3", etc) where the user can select the question he wants to compare. I've tried to use the value of the selected option (for instance, "Q1") in the SELECT clause of my report query, without success:
SELECT sex, age, name, ${QUESTION} FROM user_answers
Pentaho Report Designer doesn't show any errors with that, it simply doesn't show any values for the question column (the other columns - sex, age and name - always return their values)
So, I would like your know:
Can I do this? I mean, use parameters in the SELECT clause?
Is there any other way have this "wildcard" column according to a parameter?
Thanks in advance!
Bruno Gama
you can use the pentaho report design to design.
First,you must bulid the param "QUESTION"on the paramers
eg: SELECT question FROM user_ansewers order by XXXX
you can use the sql
SELECT sex, age, name,question FROM user_answers
where question= ${QUESTION}
last ,you can see the "drop down" to realized the choose
I am using SQL server as database. This problem solves like this :
execute('SELECT sex, age, name, '+${QUESTION}+' as Q1 FROM user_answers')
Please note that ${QUESTION} must be a column name of user_answers. In this example I used a text box parameter name QUESTION where column name is given as input. You may need other coding if input parameter is not text box.