Usage of Power BI Interactive visualisations with Parent-Child dataset relations - azure-sql-database

I have been meddling with Power BI for almost a week now.
It seems like a powerful tool, at least once you get to know your way around it.
I would like to be able to see the sum of Therapists, Admins and Citizens, based on all subgroups for the currently selected group.
Here is my example:
When I select a group (representing a customer group) in the Drill Down Donut Chart, I want to see the admin, therapist, and citizen counts for all subgroups of the selected group in the Clustered Column Chart. However, I only get the users who are in the selected group itself, not the users in its subgroups.
I have created measures for Admins, Therapists and Citizens that count users based on TemplateLevel (which represents the role of the user):
All measures are written in the same fashion, using different TemplateLevel(s).
Here are the three measures used in the Column Chart:
In my dataset, I have the table UserGroup:
IdPath and NumLevels come from an attempt to use a parent-child reference, which I did not get to work properly, so don't mind them.
I expected that Power BI's interactive system would be able to handle parent/child references, as is the case with UserGroup[Id] and UserGroup[UserGroupParentId]. My initial thought was to just add GroupName as a Category for each level of subgroup available (Owner -> Customer -> Therapist -> Citizen).
The owner group id is 27 and always will be, which is why the Drill Down Donut Chart filters out groups that do not have that parent id, so that only the customer groups are shown.
The DataSet for the report is from a test database migrated to an Azure SQL Server.
Any suggestions are warmly welcome!
Kind regards
Kalrin

Power BI (or more precisely, the Tabular Model underlying Power BI) does not support parent/child relationships. You have to transform/flatten the hierarchy to construct a table that holds the columns of all the levels of the hierarchy:
| Id | Owner | Customer | Therapist | Citizen | Group |
| ----- | --------- | --------- | --------- | -------- | -------- |
| 1 | ownerX | | | | 1 |
| 2 | ownerX | cust1 | | | 1 |
| 3 | ownerX | cust1 | tpA | | 1 |
| 4 | ownerX | cust1 | tpA | cit100 | 1 |
| 5 | ownerX | cust1 | tpA | cit101 | 1 |
| 6 | ownerX | cust1 | tpB | | 1 |
The above is a flattened hierarchy which is also ragged (you can have parent items with no child items).
This pattern describes how we can use DAX to construct a flattened hierarchy, but typically it is a best practice to flatten your data on the database side, before loading the table into Power BI (this can be done using recursive CTEs in SQL).
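As a rough illustration of the database-side approach, here is a minimal sketch of flattening a parent/child table with a recursive CTE. It assumes a UserGroup table with only the Id, UserGroupParentId and GroupName columns mentioned in the question; the sample rows, the Level1..Level4 column names and the four-level depth are made up for the example. It runs against SQLite from Python so it can be tried locally; on Azure SQL the same idea applies, but T-SQL writes the CTE as plain WITH (no RECURSIVE keyword).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserGroup (
    Id INTEGER PRIMARY KEY,
    UserGroupParentId INTEGER,
    GroupName TEXT
);
-- Hypothetical sample data: owner (27) -> customer -> therapist -> citizen
INSERT INTO UserGroup VALUES
    (27, NULL, 'Owner'),
    (1, 27, 'CustomerA'),
    (2, 1, 'TherapistA1'),
    (3, 2, 'CitizenA1x'),
    (4, 27, 'CustomerB');
""")

# Walk the hierarchy from the root down, carrying the names of all
# ancestors along so that every group ends up with one column per level.
FLATTEN_SQL = """
WITH RECURSIVE tree(Id, Level1, Level2, Level3, Level4, Depth) AS (
    SELECT Id, GroupName, NULL, NULL, NULL, 1
    FROM UserGroup
    WHERE UserGroupParentId IS NULL
    UNION ALL
    SELECT g.Id,
           t.Level1,
           CASE WHEN t.Depth = 1 THEN g.GroupName ELSE t.Level2 END,
           CASE WHEN t.Depth = 2 THEN g.GroupName ELSE t.Level3 END,
           CASE WHEN t.Depth = 3 THEN g.GroupName ELSE t.Level4 END,
           t.Depth + 1
    FROM UserGroup AS g
    JOIN tree AS t ON g.UserGroupParentId = t.Id
)
SELECT Id, Level1, Level2, Level3, Level4
FROM tree
ORDER BY Id;
"""

flattened = conn.execute(FLATTEN_SQL).fetchall()
for row in flattened:
    print(row)
```

Each output row carries the names of all its ancestors, so a visual can filter on the customer-level column and still reach every group below it. Levels that do not exist for a row stay empty, which gives the ragged shape described above.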

Related

Only show matching values from junction table

I have tables service and material and service_material.
The table service contains warranty_dates, which stores multiple values for all materials used on the service. The warranty length is in the table material, and the warranty_dates in the table service are calculated as service_date plus the warranty lengths from the table material.
I'm trying to create a query in Access that will show me only the warranty dates for materials used on a service, so only the values that match the serviceid in service_material table.
Servis = Service, Datum = date, Garancija = warranty, Ime = name
SELECT DISTINCT Servis.Datum+Material.Garancija AS garancijski_rok, Servis.Datum, Material.Ime, Servis_Material.ServisID
FROM Servis
INNER JOIN (Material
INNER JOIN Servis_Material ON Material.MaterialID = Servis_Material.MaterialID) ON Servis.ServisID = Servis_Material.ServisID
WHERE (((Servis_Material.ServisID)=[Servis].[ServisID]) AND ((Servis_Material.materialid)=[material].[materialid]))
ORDER BY Servis_Material.ServisID;
This is what I have so far, but this query shows me all warranty dates for all materials used in services. I just want the query to show the matching materials that were used in service.
I'm a total beginner and I'm using Excel because it's a school project. Sorry cause it's in Slovenian. Hopefully it is still understandable.
This is the result atm. If it's possible I want it to only show materials for that particular service I am adding the warranty dates in.
Assuming you are working with Access and not Excel, and assuming the problem is mostly setting up the many-to-many relationship, here is a minimal reproducible example starting with the normalized tables:
Services
------------------------------
| ServiceID | ServiceDate    |
------------------------------
| 1         | 12/12/2022     |
| 2         | 12/5/2022      |
------------------------------
Materials
-----------------------------------------------
| MaterialID | MaterialName | WarantyLength   |
-----------------------------------------------
| 1          | PartA        | 10              |
| 2          | PartB        | 200             |
| 3          | PartC        | 300             |
-----------------------------------------------
MaterialsServices
-----------------------------------------------
| MaterialServiceID | MaterialID | ServiceID  |
-----------------------------------------------
| 1                 | 1          | 1          |
| 2                 | 2          | 1          |
| 3                 | 3          | 1          |
| 4                 | 1          | 2          |
| 5                 | 2          | 2          |
-----------------------------------------------
Regarding table normalization, note how ServiceDate is the only attribute hanging on Services, and what would be more clearly named MaterialWarantyLength is hanging on Materials. What would best be called LastDayofMaterialWaranty would belong under MaterialsServices, but since we have a formula for it, we calculate it as needed.
One way of thinking about the next step is this: database users interact with the database through forms and reports. Among other things, this protects the database from users entering bad data and protects the users from having to understand those normalized tables.
Aside: the default forms are not good at protecting the database from bad data, but they are a jump start.
So the next step is a query that puts the data for this relationship back together into one big table that the database designer can understand. I find myself adjusting for the user when I design the user interface of forms and reports. Here I put everything into the designer's query, which is a good default: what you don't need for a particular form, you simply don't show.
'Warranty is a calculated Field.
'Note the use of DateAdd to handle all those calendar problems
'normally I extract the code for calculated fields into public functions for an example of that see here: https://stackoverflow.com/questions/74621501/ms-access-group-by-and-sum-if-all-values-are-available-query/74673498#74673498
Warranty: DateAdd("d",[Materials].[WarantyLength],[Services].[ServiceDate])
'here is the resulting sql for the designer query
'if you are following along, try pasting the sql into the sql pane and then going to the different tabs of the query designer
SELECT Services.ServiceID, Services.ServiceDate, MaterialsServices.MaterialID, Materials.WarantyLength, Materials.MaterialName
FROM Services INNER JOIN (Materials INNER JOIN MaterialsServices ON Materials.MaterialID = MaterialsServices.MaterialID) ON Services.ServiceID = MaterialsServices.ServiceID;
Running the query on the sample data gives:
-------------------------------------------------------------------------------------------------------------------
| ServiceID | ServiceDate | MaterialID | WarantyLength | MaterialName |
-------------------------------------------------------------------------------------------------------------------
| 1 | 12/12/2022 | 1 | 10 | PartA |
-------------------------------------------------------------------------------------------------------------------
| 1 | 12/12/2022 | 2 | 200 | PartB |
-------------------------------------------------------------------------------------------------------------------
| 1 | 12/12/2022 | 3 | 300 | PartC |
-------------------------------------------------------------------------------------------------------------------
| 2 | 12/5/2022 | 1 | 10 | PartA |
-------------------------------------------------------------------------------------------------------------------
| 2 | 12/5/2022 | 2 | 200 | PartB |
-------------------------------------------------------------------------------------------------------------------
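To see the calculated warranty date next to the joined rows, and restricted to a single service as the question asks, here is a small sketch outside Access. It rebuilds the sample tables in SQLite from Python and uses SQLite's date() function as a stand-in for DateAdd; that substitution, the ISO date strings, the reading of 12/5/2022 as 5 December, and the helper name warranties_for are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Services (ServiceID INTEGER PRIMARY KEY, ServiceDate TEXT);
CREATE TABLE Materials (
    MaterialID INTEGER PRIMARY KEY,
    MaterialName TEXT,
    WarantyLength INTEGER  -- days, spelled as in the sample tables
);
CREATE TABLE MaterialsServices (
    MaterialServiceID INTEGER PRIMARY KEY,
    MaterialID INTEGER,
    ServiceID INTEGER
);
INSERT INTO Services VALUES (1, '2022-12-12'), (2, '2022-12-05');
INSERT INTO Materials VALUES (1, 'PartA', 10), (2, 'PartB', 200), (3, 'PartC', 300);
INSERT INTO MaterialsServices VALUES (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 1, 2), (5, 2, 2);
""")


def warranties_for(service_id):
    """Return (MaterialName, Warranty) for the materials used on one service."""
    return conn.execute(
        """
        SELECT m.MaterialName,
               date(s.ServiceDate, '+' || m.WarantyLength || ' days') AS Warranty
        FROM Services AS s
        JOIN MaterialsServices AS ms ON ms.ServiceID = s.ServiceID
        JOIN Materials AS m ON m.MaterialID = ms.MaterialID
        WHERE s.ServiceID = ?
        ORDER BY m.MaterialID
        """,
        (service_id,),
    ).fetchall()


service_2 = warranties_for(2)
print(service_2)
```

Only the materials linked to the chosen service through MaterialsServices come back, which is the same effect the form filter achieves in Access.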
At this point it is down to how you want the user to interact with the data. For instance, for a jump start I selected the query and used the create-form wizard to get a tabular style form and then added an unbound combobox to the header to allow the end user to filter the resulting form.
See here for an example: https://www.youtube.com/watch?v=uq3cgaHF6fc
The final form allows the end user to switch between viewing different services:
Private Sub cmbService_AfterUpdate()
    'filter the form to the service chosen in the combobox cmbService
    Me.Filter = "ServiceID = " & Me.cmbService
    Me.FilterOn = True
End Sub

Create and display table column hierarchy in Tableau

My table currently has a number of similar numerical columns I'd like to nest under a common label.
My current table is something like:
| Week | Seller count, total | Seller count, churned | Seller count, resurrected |
| ---- | ------------------- | --------------------- | ------------------------- |
| 1 | 100 | 10 | 4 |
| 2 | 105 | 12 | 5 |
And I'd like it to be:
| | Seller count |
| Week | Total | Churned | Resurrected |
| ---- | ----- | ------- | ----------- |
| 1 | 100 | 10 | 4 |
| 2 | 105 | 12 | 5 |
I've seen examples of this, including a related instructional video, but this video hides the actual creation of the nested object (called "Segment").
I also tried creating a hierarchy by dragging items in the "Data" tab on top of one another. This function appears to only be possible for dimensions (categorical data), not measures (numerical data) like mine.
Even so, I can drag my column names from the measures side onto the dimensions side to get them to be considered dimensions. Then I can drag to nest and create the hierarchy. But then when I drag the top item of the hierarchy ("Seller count" in the example below) into the "Columns" field, I get the warning "the field being added contains 92,000 members, and maximum recommended is 1,000". It thinks this is categorical data, and is maybe planning to create a subheading for each value (100, 105, etc.), instead of the desired hierarchy sub-items as subheadings.
Any idea how to accomplish this simple hierarchical restructuring of my column labels?
Actually, this is a data restructuring task, and Tableau isn't best suited for it. Still, it is a simple one, and you can do it like this:
I recreated a table like yours in Excel and imported it into Tableau.
Rename the three columns (remove "Seller count" from their names).
Select these three columns at once and choose Pivot to turn them into rows.
Rename the resulting pivot columns.
Create a text table in Tableau, as you have shown in the question.
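To make the effect of the pivot step concrete, here is a small sketch of the same reshaping done outside Tableau, in plain Python. The function name unpivot and the output column names "Seller count" and "Value" are chosen for illustration only; Tableau produces its own default names for the pivot columns, which is why the steps above rename them.

```python
# Wide layout after renaming: one column per seller-count type.
wide_rows = [
    {"Week": 1, "Total": 100, "Churned": 10, "Resurrected": 4},
    {"Week": 2, "Total": 105, "Churned": 12, "Resurrected": 5},
]


def unpivot(rows, id_column, name_column, value_column):
    """Turn every non-id column into its own (name, value) row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            long_rows.append(
                {id_column: row[id_column], name_column: column, value_column: value}
            )
    return long_rows


long_rows = unpivot(wide_rows, "Week", "Seller count", "Value")
for row in long_rows:
    print(row)
```

After this reshaping the former column names are ordinary dimension members, so they can be placed under a common header without Tableau treating the numbers themselves as categories.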

SQL Query to look up one table against another

I have two Excel tables- the first one is the data table and the second one is a look up table. Here is how they are structured-
Data Table
+----------+-------------+----------+----------+
| Category | Subcategory | Division | Business |
+----------+-------------+----------+----------+
| A | Red | Home | Q |
| B | Blue | Office | R |
| C | Green | City | S |
| D | Yellow | State | T |
| D | Red | State | T |
| D | Green | Office | Q |
+----------+-------------+----------+----------+
Lookup Table
+----------+-------------+----------+----------+--------------+
| Category | Subcategory | Division | Business | LookUp Value |
+----------+-------------+----------+----------+--------------+
| 0 | 0 | 0 | Q | ABC |
| B | 0 | Office | 0 | DEF |
| C | Green | 0 | 0 | MNO |
| D | 0 | State | T | RST |
+----------+-------------+----------+----------+--------------+
So I want to add the lookup value column to the data table based on the criteria given in the lookup table. E.g., for the first row in the lookup table, I don't want to look up on Category, Subcategory, or Division, but if the Business is Q, then I want to populate the lookup value as ABC. Similarly, for the second row I don't want to consider the Subcategory and Business, but if the Category is "B" and the Division is "Office", I want it to populate DEF. So the result should look like this-
[Final Resulting Data Table]
+----------+-------------+----------+----------+--------------+
| Category | Subcategory | Division | Business | LookUp Value |
+----------+-------------+----------+----------+--------------+
| A | Red | Home | Q | ABC |
| B | Blue | Office | R | DEF |
| C | Green | City | S | MNO |
| D | Yellow | State | T | RST |
| D | Red | State | T | RST |
| D | Green | Office | Q | ABC |
+----------+-------------+----------+----------+--------------+
I am very new to SQL, and the actual data set is very complex, with multiple lookup values based on different criteria. If you think any other scripting language would work better, I am open to that too. My data is currently in Excel.
If the data is so complex, you should first consider if you want to put it in a (relational) database (like MS Access, MySQL, etc.) instead of in a spreadsheet (like MS Excel).
Both kinds of programs are used for structured data handling, but databases focus primarily on efficient data storage and data integrity (including guarding type safety, required fields, unique fields, required references between various datasets/tables, etc.), while spreadsheets focus primarily on data analysis and calculations.
Relational databases support Structured Query Language (SQL) to let clients query their data. Spreadsheets normally do not use or support SQL (as far as I know).
It is possible to let MS Excel import or reference data in an external data source (like a relational database) to perform analysis and calculations on it.
The other way around is (sometimes) possible too: to link to spreadsheet worksheets as external tables inside a relational database system to - within certain limits - allow that data to be queried using SQL. But using a database to store the data and a spreadsheet (as a database client) to perform analysis on the data in the database would be a more logical design in my opinion.
However, creating such an integrated solution using multiple MS Office applications and/or external databases can be a complex challenge, especially when you are just starting to learn about them.
To be honest, I am not experienced with designing MS Office based solutions, so I cannot guide you around any pitfalls. I do hope, that this answer helps you a little with finding the right way to go here...
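For what it is worth, once the two tables are in a database, the lookup rule from the question (a 0 in the lookup table means "ignore this column") can be written as a join. The following is only a sketch under assumptions: it loads the sample rows into SQLite from Python, stores every column as text, and uses the made-up names DataTable, LookupTable and LookUpValue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DataTable (Category TEXT, Subcategory TEXT, Division TEXT, Business TEXT);
CREATE TABLE LookupTable (Category TEXT, Subcategory TEXT, Division TEXT,
                          Business TEXT, LookUpValue TEXT);
""")
conn.executemany(
    "INSERT INTO DataTable VALUES (?, ?, ?, ?)",
    [
        ("A", "Red", "Home", "Q"),
        ("B", "Blue", "Office", "R"),
        ("C", "Green", "City", "S"),
        ("D", "Yellow", "State", "T"),
        ("D", "Red", "State", "T"),
        ("D", "Green", "Office", "Q"),
    ],
)
conn.executemany(
    "INSERT INTO LookupTable VALUES (?, ?, ?, ?, ?)",
    [
        ("0", "0", "0", "Q", "ABC"),
        ("B", "0", "Office", "0", "DEF"),
        ("C", "Green", "0", "0", "MNO"),
        ("D", "0", "State", "T", "RST"),
    ],
)

# A lookup column matches when it is the wildcard '0' or equals the data value.
rows = conn.execute("""
SELECT d.Category, d.Subcategory, d.Division, d.Business, l.LookUpValue
FROM DataTable AS d
LEFT JOIN LookupTable AS l
  ON (l.Category = '0' OR l.Category = d.Category)
 AND (l.Subcategory = '0' OR l.Subcategory = d.Subcategory)
 AND (l.Division = '0' OR l.Division = d.Division)
 AND (l.Business = '0' OR l.Business = d.Business)
ORDER BY d.rowid
""").fetchall()

lookup_values = [row[4] for row in rows]
print(lookup_values)
```

With the sample data every row matches exactly one rule. If several rules could match the same row, this join would return that row once per matching rule, so real data may need a priority column or similar to pick a single value.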

Hide Hierarchy duplication in Powerpivot (Row Labels)

I am reporting on the performance of legal cases, from a SQL database of activities. I have a main table of cases, which has a parent/child hierarchy. I am looking for a way to appropriately report on case performance, reporting only once for a parent/child group ('Family').
An example of relevant tables is:
Cases
ID | Client | MatterName | ClaimAmount | ParentID | NumberOfChildren |
1 | Mr. Smith | ABC Ltd | $40,000 | 0 | 2 |
2 | Mr. Smith | Jakob R | $40,000 | 1 | 0 |
3 | Mr. Smith | Jenny R | $40,000 | 1 | 0 |
4 | Mrs Bow | JQ Public | $7,000 | 0 | 0 |
Payments
ID | MatterID | DateReceived | Amount |
1 | 1 | 14/7/15 | $50 |
2 | 3 | 21/7/15 | $100 |
I'd like to be able to report back on a consolidated view that only shows the parent matter, with total received (and a lot of other similar related fact tables) - e.g.
Client | MatterName | ClaimAmount | TotalReceived |
Mr Smith | ABC Ltd | $40,000 | $150 |
Mrs Bow | JQ Public | $7,000 | $0 |
A key problem I'm having is hiding row labels for irrelevant rows (child matters). I believe I need to
Determine whether the current row is a parent group
Consolidate all measures for that parent group
Filter on that being True? Place all measures inside IF checks?
Any help appreciated
How many levels does your hierarchy have? If it's just 2 levels (parents have children, children cannot be parents), then denormalize your model. You can add a single column for ParentMatterName and use that as the rowfilter in pivots. If there is a reasonable maximum number of levels in your data (we typically look at <=6 as reasonable) then denormalization is probably preferable, and certainly simpler/more performant, than trying to dynamically roll up the child measure values.
Edits to address comment below:
Denormalizing your data structure in this case just means going to the following table structure:
Cases
ID | Client | MatterName | ParentMatterName | ClaimAmount
1 | Mr. Smith | ABC Ltd | ABC Ltd | $40,000
2 | Mr. Smith | Jakob R | ABC Ltd | $0
3 | Mr. Smith | Jenny R | ABC Ltd | $0
4 | Mrs Bow | JQ Public | JQ Public | $7,000
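As a rough check of the idea, the sketch below derives the parent for every case and rolls the payments up to it, using the sample data from the question. It runs in SQLite from Python rather than in Power Pivot, treats ParentID = 0 as "this case is its own parent" (an assumption read off the sample rows), and stores amounts as plain integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cases (ID INTEGER PRIMARY KEY, Client TEXT, MatterName TEXT,
                    ClaimAmount INTEGER, ParentID INTEGER);
CREATE TABLE Payments (ID INTEGER PRIMARY KEY, MatterID INTEGER, Amount INTEGER);
INSERT INTO Cases VALUES
    (1, 'Mr. Smith', 'ABC Ltd', 40000, 0),
    (2, 'Mr. Smith', 'Jakob R', 40000, 1),
    (3, 'Mr. Smith', 'Jenny R', 40000, 1),
    (4, 'Mrs Bow', 'JQ Public', 7000, 0);
INSERT INTO Payments VALUES (1, 1, 50), (2, 3, 100);
""")

# Map every case to its parent (or to itself when it has none), then
# group by the parent so child payments are counted on the parent row.
consolidated = conn.execute("""
SELECT p.Client, p.MatterName, p.ClaimAmount,
       COALESCE(SUM(pay.Amount), 0) AS TotalReceived
FROM Cases AS c
JOIN Cases AS p
  ON p.ID = CASE WHEN c.ParentID = 0 THEN c.ID ELSE c.ParentID END
LEFT JOIN Payments AS pay ON pay.MatterID = c.ID
GROUP BY p.ID, p.Client, p.MatterName, p.ClaimAmount
ORDER BY p.ID
""").fetchall()

for row in consolidated:
    print(row)
```

In Power Pivot the equivalent is to carry the parent's name on every case row and put that column on the pivot rows, so child matters never appear as separate row labels.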
Regarding nomenclature - Excel is stupid, and so is DAX. Here is the way to think about these things to help minimize confusion - these are important concepts as you move forward in more complex DAX measures and queries.
Here are some absolutely truthful and accurate statements to show how stupid the nomenclature can get:
FILTER() is a table
Pivot table rows are filter context
FILTER() applies additional filter context when used as an argument to CALCULATE()
FILTER() creates row context internally in which to evaluate expressions
FILTER()'s arguments are affected by filter context from pivot table rows
FILTER()'s second argument evaluates an expression evaluated in the pivot table's rowfilter context in the row context of each row in the table in the first argument
And so on. Don't think of a pivot table as anything but filters. You have filters, slicers, rowfilters, columnfilters. Everything in a pivot table is filter context.
Links:
Denormalization in Power Pivot
Denormalizing Dimensions

Retrieve comma delimited data from a field

I've created a form in PHP that collects basic information. I have a list box that allows multiple items selected (i.e. Housing, rent, food, water). If multiple items are selected they are stored in a field called Needs separated by a comma.
I have created a report ordered by each person's needs. The people who have only one need are sorted correctly, but the people who have multiple needs are sorted exactly as the string passed to the database (i.e. housing, rent, food, water), which is not what I want.
Is there a way to separate the multiple values in this field using SQL to count each need instance/occurrence as 1 so that there are no comma delimitations shown in the results?
Your database is not in the first normal form. A non-normalized database will be very problematic to use and to query, as you are actually experiencing.
In general, you should be using at least the following structure. It can still be normalized further, but I hope this gets you going in the right direction:
CREATE TABLE users (
user_id int,
name varchar(100)
);
CREATE TABLE users_needs (
need varchar(100),
user_id int
);
Then you should store the data as follows:
-- TABLE: users
+---------+-------+
| user_id | name |
+---------+-------+
| 1 | joe |
| 2 | peter |
| 3 | steve |
| 4 | clint |
+---------+-------+
-- TABLE: users_needs
+---------+----------+
| need | user_id |
+---------+----------+
| housing | 1 |
| water | 1 |
| food | 1 |
| housing | 2 |
| rent | 2 |
| water | 2 |
| housing | 3 |
+---------+----------+
Note how the users_needs table is defining the relationship between one user and one or many needs (or none at all, as for user number 4.)
To normalise your database further, you should also use another table called needs, as follows:
-- TABLE: needs
+---------+---------+
| need_id | name |
+---------+---------+
| 1 | housing |
| 2 | water |
| 3 | food |
| 4 | rent |
+---------+---------+
Then the users_needs table should just refer to a candidate key of the needs table instead of repeating the text.
-- TABLE: users_needs (instead of the previous one)
+---------+----------+
| need_id | user_id |
+---------+----------+
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 1 | 2 |
| 4 | 2 |
| 2 | 2 |
| 1 | 3 |
+---------+----------+
You may also be interested in checking out the following Wikipedia article for further reading about repeating values inside columns:
Wikipedia: First normal form - Repeating groups within columns
UPDATE:
To fully answer your question, if you follow the above guidelines, sorting, counting and aggregating the data should then become straight-forward.
To sort the result-set by needs, you would be able to do the following:
SELECT users.name, users_needs.need
FROM users
INNER JOIN users_needs ON (users_needs.user_id = users.user_id)
ORDER BY users_needs.need;
You would also be able to count how many needs each user has selected, for example:
SELECT users.name, COUNT(users_needs.need) AS number_of_needs
FROM users
LEFT JOIN users_needs ON (users_needs.user_id = users.user_id)
GROUP BY users.user_id, users.name
ORDER BY number_of_needs;
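To try this out, here is a minimal sketch that loads the sample users and the first users_needs layout (the one with the need text column) into SQLite from Python. It counts needs per user, and also counts how often each need occurs, which seems to be what the report is after. The variable names and the extra tie-break on users.name are additions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER, name TEXT);
CREATE TABLE users_needs (need TEXT, user_id INTEGER);
INSERT INTO users VALUES (1, 'joe'), (2, 'peter'), (3, 'steve'), (4, 'clint');
INSERT INTO users_needs VALUES
    ('housing', 1), ('water', 1), ('food', 1),
    ('housing', 2), ('rent', 2), ('water', 2),
    ('housing', 3);
""")

# Needs per user; the LEFT JOIN keeps users without any need (count 0).
needs_per_user = conn.execute("""
SELECT users.name, COUNT(users_needs.need) AS number_of_needs
FROM users
LEFT JOIN users_needs ON users_needs.user_id = users.user_id
GROUP BY users.user_id, users.name
ORDER BY number_of_needs, users.name
""").fetchall()

# Occurrences per need; each selected need counts as one.
users_per_need = conn.execute("""
SELECT need, COUNT(*) AS occurrences
FROM users_needs
GROUP BY need
ORDER BY need
""").fetchall()

print(needs_per_user)
print(users_per_need)
```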
I'm a little confused by the goal. Is this a UI problem or are you just having trouble determining who has multiple needs?
The number of needs is the difference:
Len([Needs]) - Len(Replace([Needs],',','')) + 1
Can you provide more information about the Sort you're trying to accomplish?
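If the comma-delimited field has to stay as it is for now, the same counting can be done after the rows have been read, for example in application code. The sketch below is written in Python rather than PHP or SQL; the function names and the sample strings are made up. It shows the length-difference formula from above next to a split-based tally per need.

```python
from collections import Counter


def count_needs(needs_field):
    """Number of comma-separated pieces: commas in the string plus one."""
    return len(needs_field) - len(needs_field.replace(",", "")) + 1


def tally_needs(fields):
    """Count each individual need once per row in which it appears."""
    tally = Counter()
    for field in fields:
        for need in field.split(","):
            need = need.strip().lower()
            if need:  # skip empty pieces such as a trailing comma
                tally[need] += 1
    return tally


sample_fields = ["housing, rent, food, water", "housing", "rent, water"]
piece_counts = [count_needs(field) for field in sample_fields]
need_tally = tally_needs(sample_fields)
print(piece_counts)
print(need_tally)
```

Note that the length-difference formula reports 1 for an empty field, so empty values need separate handling if they can occur.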
UPDATE:
I think these Oracle-based posts may have what you're looking for: post and post. The only difference is that you would probably be better off using the method I list above to find the number of comma-delimited pieces rather than doing the translate(...) that the author suggests. Hope this helps - it's Oracle-based, but I don't see .