I am reporting on performance of legal cases, from a SQL database of activities. I have a main table of cases, which has a parent/child hierarchy. I am looking for a way to appropriately report on case performance, reporting only once for a parent/child group (a 'Family').
An example of relevant tables is:
Cases
ID | Client | MatterName | ClaimAmount | ParentID | NumberOfChildren |
1 | Mr. Smith | ABC Ltd | $40,000 | 0 | 2 |
2 | Mr. Smith | Jakob R | $40,000 | 1 | 0 |
3 | Mr. Smith | Jenny R | $40,000 | 1 | 0 |
4 | Mrs Bow | JQ Public | $7,000 | 0 | 0 |
Payments
ID | MatterID | DateReceived | Amount |
1 | 1 | 14/7/15 | $50 |
2 | 3 | 21/7/15 | $100 |
I'd like to be able to report back on a consolidated view that only shows the parent matter, with total received (and a lot of other similar related fact tables) - e.g.
Client | MatterName | ClaimAmount | TotalReceived |
Mr Smith | ABC Ltd | $40,000 | $150 |
Mrs Bow | JQ Public | $7,000 | $0 |
A key problem I'm having is hiding row labels for irrelevant rows (child matters). I believe I need to:
1. Determine whether the current row is a parent group
2. Consolidate all measures for that parent group
3. Filter on that being True? Place all measures inside IF checks?
Any help appreciated
How many levels does your hierarchy have? If it's just 2 levels (parents have children, children cannot be parents), then denormalize your model. You can add a single column for ParentMatterName and use that as the rowfilter in pivots. If there is a reasonable maximum number of levels in your data (we typically look at <=6 as reasonable) then denormalization is probably preferable, and certainly simpler/more performant, than trying to dynamically roll up the child measure values.
Edits to address comment below:
Denormalizing your data structure in this case just means going to the following table structure:
Cases
ID | Client | MatterName | ParentMatterName | ClaimAmount
1 | Mr. Smith | ABC Ltd | ABC Ltd | $40,000
2 | Mr. Smith | Jakob R | ABC Ltd | $0
3 | Mr. Smith | Jenny R | ABC Ltd | $0
4 | Mrs Bow | JQ Public | JQ Public | $7,000
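For a two-level hierarchy, the same rollup can also be sketched in SQL before the data ever reaches the pivot. This is only an illustration, not part of the model above: it loads the question's sample rows into an in-memory SQLite database (an assumption, since the real database is not named), treats ParentID = 0 as "this row is a parent", stores amounts as plain integers, and assumes the dates are day/month/year. The function name family_report is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cases (ID INTEGER PRIMARY KEY, Client TEXT, MatterName TEXT,
                    ClaimAmount INTEGER, ParentID INTEGER);
CREATE TABLE Payments (ID INTEGER PRIMARY KEY, MatterID INTEGER,
                       DateReceived TEXT, Amount INTEGER);
INSERT INTO Cases VALUES
    (1, 'Mr. Smith', 'ABC Ltd',   40000, 0),
    (2, 'Mr. Smith', 'Jakob R',   40000, 1),
    (3, 'Mr. Smith', 'Jenny R',   40000, 1),
    (4, 'Mrs Bow',   'JQ Public',  7000, 0);
INSERT INTO Payments VALUES
    (1, 1, '2015-07-14',  50),
    (2, 3, '2015-07-21', 100);
""")


def family_report(conn):
    """Return one row per parent matter, with payments summed over the family."""
    sql = """
        SELECT p.Client, p.MatterName, p.ClaimAmount,
               COALESCE(SUM(pay.Amount), 0) AS TotalReceived
        FROM Cases AS p
        -- c is every case (the parent itself or a child) in p's family
        JOIN Cases AS c
          ON (CASE WHEN c.ParentID = 0 THEN c.ID ELSE c.ParentID END) = p.ID
        LEFT JOIN Payments AS pay ON pay.MatterID = c.ID
        WHERE p.ParentID = 0
        GROUP BY p.ID, p.Client, p.MatterName, p.ClaimAmount
        ORDER BY p.ID
    """
    return conn.execute(sql).fetchall()


report = family_report(conn)
for row in report:
    print(row)
```

The claim amount is taken from the parent row only, which matches the desired output in the question; the child rows contribute nothing but their payments.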
Regarding nomenclature - Excel is stupid, and so is DAX. Here is the way to think about these things to help minimize confusion - these are important concepts as you move forward in more complex DAX measures and queries.
Here are some absolutely truthful and accurate statements to show how stupid the nomenclature can get:
FILTER() is a table
Pivot table rows are filter context
FILTER() applies additional filter context when used as an argument to CALCULATE()
FILTER() creates row context internally in which to evaluate expressions
FILTER()'s arguments are affected by filter context from pivot table rows
FILTER()'s second argument evaluates an expression evaluated in the pivot table's rowfilter context in the row context of each row in the table in the first argument
And so on. Don't think of a pivot table as anything but filters. You have filters, slicers, rowfilters, columnfilters. Everything in a pivot table is filter context.
Links:
Denormalization in Power Pivot
Denormalizing Dimensions
I have tables service and material and service_material.
The table service contains warranty_dates, which stores multiple values for all materials used on the service. The warranty length is in table material, and the warranty_dates in table service is calculated as service_date plus the warranty lengths from table material.
I'm trying to create a query in Access that will show me only the warranty dates for materials used on a service, so only the values that match the serviceid in service_material table.
Servis = Service, Datum = date, Garancija = warranty, Ime = name
SELECT DISTINCT Servis.Datum+Material.Garancija AS garancijski_rok, Servis.Datum, Material.Ime, Servis_Material.ServisID
FROM Servis
INNER JOIN (Material
INNER JOIN Servis_Material ON Material.MaterialID = Servis_Material.MaterialID) ON Servis.ServisID = Servis_Material.ServisID
WHERE (((Servis_Material.ServisID)=[Servis].[ServisID]) AND ((Servis_Material.materialid)=[material].[materialid]))
ORDER BY Servis_Material.ServisID;
This is what I have so far, but this query shows me all warranty dates for all materials used in services. I just want the query to show the matching materials that were used in service.
I'm a total beginner and I'm using Excel because it's a school project. Sorry cause it's in Slovenian. Hopefully it is still understandable.
This is the result atm. If it's possible I want it to only show materials for that particular service I am adding the warranty dates in.
Assuming you are working with Access and not Excel, and assuming the problem is mostly setting up the many-to-many relationship, here is a minimal reproducible example starting with the normalized tables:
-------------------------------------------
| ServiceID | ServiceDate |
-------------------------------------------
| 1 | 12/12/2022 |
-------------------------------------------
| 2 | 12/5/2022 |
-------------------------------------------
----------------------------------------------------------------
| MaterialID | MaterialName | WarantyLength |
----------------------------------------------------------------
| 1 | PartA | 10 |
----------------------------------------------------------------
| 2 | PartB | 200 |
----------------------------------------------------------------
| 3 | PartC | 300 |
----------------------------------------------------------------
----------------------------------------------------------------
| MaterialServiceID | MaterialID | ServiceID |
----------------------------------------------------------------
| 1 | 1 | 1 |
----------------------------------------------------------------
| 2 | 2 | 1 |
----------------------------------------------------------------
| 3 | 3 | 1 |
----------------------------------------------------------------
| 4 | 1 | 2 |
----------------------------------------------------------------
| 5 | 2 | 2 |
----------------------------------------------------------------
Regarding table normalization, note how ServiceDate is the only thing hanging on Services, and what would be more clearly named MaterialWarantyLength is hanging on Materials. What would best be called LastDayofMaterialWaranty would belong under MaterialsServices, but we have a formula, so we calculate that as needed.
One way of thinking about the next step is: database users interact with the database through forms and reports. Among other things, this protects the database from users entering bad data and protects the user from having to understand those normalized tables.
aside: the default forms are not good at protecting the database from bad data but they are a jump start.
So, the next step will be a query that puts the data for this relationship back together into one big table that the Database designer can understand. I find myself adjusting for the user when I design the User interface of forms and reports. Here I put everything into the designers query which is a good default as what you don't need for a particular form you just don't show.
'Warranty is a calculated Field.
'Note the use of DateAdd to handle all those calendar problems
'normally I extract the code for calculated fields into public functions for an example of that see here: https://stackoverflow.com/questions/74621501/ms-access-group-by-and-sum-if-all-values-are-available-query/74673498#74673498
Warranty: DateAdd("d",[Materials].[WarantyLength],[Services].[ServiceDate])
'here is the resulting sql for the designer query
'if you are following along try pasting the sql into the sql pane and then going to different tabs of the query designer
SELECT Services.ServiceID, Services.ServiceDate, MaterialsServices.MaterialID, Materials.WarantyLength, Materials.MaterialName
FROM Services INNER JOIN (Materials INNER JOIN MaterialsServices ON Materials.MaterialID = MaterialsServices.MaterialID) ON Services.ServiceID = MaterialsServices.ServiceID;
Running the query on the sample data gives:
-------------------------------------------------------------------------------------------------------------------
| ServiceID | ServiceDate | MaterialID | WarantyLength | MaterialName |
-------------------------------------------------------------------------------------------------------------------
| 1 | 12/12/2022 | 1 | 10 | PartA |
-------------------------------------------------------------------------------------------------------------------
| 1 | 12/12/2022 | 2 | 200 | PartB |
-------------------------------------------------------------------------------------------------------------------
| 1 | 12/12/2022 | 3 | 300 | PartC |
-------------------------------------------------------------------------------------------------------------------
| 2 | 12/5/2022 | 1 | 10 | PartA |
-------------------------------------------------------------------------------------------------------------------
| 2 | 12/5/2022 | 2 | 200 | PartB |
-------------------------------------------------------------------------------------------------------------------
At this point it is down to how you want the user to interact with the data. For instance, for a jump start I selected the query and used the create-form wizard to get a tabular style form and then added an unbound combobox to the header to allow the end user to filter the resulting form.
See here for an example: https://www.youtube.com/watch?v=uq3cgaHF6fc
The Final Form allows the end user to switch between viewing different services :
Private Sub cmbService_AfterUpdate()
    'Show only the rows for the service picked in the combobox
    Me.Filter = "ServiceID = " & Me.cmbService
    Me.FilterOn = True
End Sub
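For readers who want to try the idea outside Access, here is a rough sketch of the same join, the Warranty calculation, and the per-service filter. It is an illustration only: it uses an in-memory SQLite database instead of Access, adds the days in Python rather than with DateAdd, and assumes the sample dates are month/day/year. The function name warranties_for_service is hypothetical; the table and column names are the ones from the sample tables above.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Services (ServiceID INTEGER PRIMARY KEY, ServiceDate TEXT);
CREATE TABLE Materials (MaterialID INTEGER PRIMARY KEY, MaterialName TEXT,
                        WarantyLength INTEGER);
CREATE TABLE MaterialsServices (MaterialServiceID INTEGER PRIMARY KEY,
                                MaterialID INTEGER, ServiceID INTEGER);
INSERT INTO Services VALUES (1, '2022-12-12'), (2, '2022-12-05');
INSERT INTO Materials VALUES (1, 'PartA', 10), (2, 'PartB', 200), (3, 'PartC', 300);
INSERT INTO MaterialsServices VALUES
    (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 1, 2), (5, 2, 2);
""")


def warranties_for_service(conn, service_id):
    """Return (ServiceID, MaterialName, warranty end date) for one service only."""
    rows = conn.execute(
        """
        SELECT s.ServiceID, s.ServiceDate, m.MaterialName, m.WarantyLength
        FROM Services AS s
        JOIN MaterialsServices AS ms ON ms.ServiceID = s.ServiceID
        JOIN Materials AS m ON m.MaterialID = ms.MaterialID
        WHERE s.ServiceID = ?
        ORDER BY m.MaterialID
        """,
        (service_id,),
    ).fetchall()
    result = []
    for sid, service_date, name, days in rows:
        # Same role as DateAdd("d", WarantyLength, ServiceDate) in the Access query
        warranty_end = date.fromisoformat(service_date) + timedelta(days=days)
        result.append((sid, name, warranty_end.isoformat()))
    return result


service_1 = warranties_for_service(conn, 1)
service_2 = warranties_for_service(conn, 2)
for row in service_1:
    print(row)
```

The WHERE clause on a single ServiceID plays the same role as the form filter: the join itself returns every service, and the restriction to one service is applied on top of it.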
I have two Excel tables- the first one is the data table and the second one is a look up table. Here is how they are structured-
Data Table
+----------+-------------+----------+----------+
| Category | Subcategory | Division | Business |
+----------+-------------+----------+----------+
| A | Red | Home | Q |
| B | Blue | Office | R |
| C | Green | City | S |
| D | Yellow | State | T |
| D | Red | State | T |
| D | Green | Office | Q |
+----------+-------------+----------+----------+
Lookup Table
+----------+-------------+----------+----------+--------------+
| Category | Subcategory | Division | Business | LookUp Value |
+----------+-------------+----------+----------+--------------+
| 0 | 0 | 0 | Q | ABC |
| B | 0 | Office | 0 | DEF |
| C | Green | 0 | 0 | MNO |
| D | 0 | State | T | RST |
+----------+-------------+----------+----------+--------------+
So I want to add the lookup value column to the data table based on the criteria given in the lookup table. E.g., for the first row in the lookup table, I don't want to look up on Category, Subcategory, or Division, but if the Business is Q, then I want to populate the lookup value as ABC. Similarly, for the second row I don't want to consider the Subcategory and Business, but if the Category is "B" and Division is "Office", I want it to populate DEF. So the result should look like this:
[Final Resulting Data Table]
+----------+-------------+----------+----------+--------------+
| Category | Subcategory | Division | Business | LookUp Value |
+----------+-------------+----------+----------+--------------+
| A | Red | Home | Q | ABC |
| B | Blue | Office | R | DEF |
| C | Green | City | S | MNO |
| D | Yellow | State | T | RST |
| D | Red | State | T | RST |
| D | Green | Office | Q | ABC |
+----------+-------------+----------+----------+--------------+
I am very new to SQL and the actual data set is very complex, with multiple lookup values based on different criteria. If you think any other scripting language would work better, I am open to that too. My data is currently in Excel.
If the data is so complex, you should first consider if you want to put it in a (relational) database (like MS Access, MySQL, etc.) instead of in a spreadsheet (like MS Excel).
Both kinds of programs are used for structured data handling, but databases focus primarily on efficient data storage and data integrity (including guarding type safety, required fields, unique fields, required references between various datasets/tables, etc.) and spreadsheets focus primarily on data analysis and calculations.
Relational databases support Structured Query Language (SQL) to let clients query their data. Spreadsheets normally do not use or support SQL (as far as I know).
It is possible to let MS Excel import or reference data in an external data source (like a relational database) to perform analysis and calculations on it.
The other way around is (sometimes) possible too: to link to spreadsheet worksheets as external tables inside a relational database system to - within certain limits - allow that data to be queried using SQL. But using a database to store the data and a spreadsheet (as a database client) to perform analysis on the data in the database would be a more logical design in my opinion.
However, creating such an integrated solution using multiple MS Office applications and/or external databases can be a complex challenge, especially when you are just starting to learn about them.
To be honest, I am not experienced with designing MS Office based solutions, so I cannot guide you around any pitfalls. I do hope, that this answer helps you a little with finding the right way to go here...
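Since the question is open to other scripting languages, the wildcard lookup it describes can be sketched in a few lines of Python. This is only a sketch under two assumptions that the question does not state: a 0 in the lookup table means "ignore this column", and when several lookup rows match, the first one in table order wins. The names WILDCARD, KEYS and find_lookup_value are made up for the example.

```python
WILDCARD = "0"  # assumed meaning: this column is not part of the rule
KEYS = ("Category", "Subcategory", "Division", "Business")

# Lookup table from the question: (criteria per column, lookup value)
lookup_rules = [
    (("0", "0", "0", "Q"), "ABC"),
    (("B", "0", "Office", "0"), "DEF"),
    (("C", "Green", "0", "0"), "MNO"),
    (("D", "0", "State", "T"), "RST"),
]

# Data table from the question
data_rows = [
    ("A", "Red", "Home", "Q"),
    ("B", "Blue", "Office", "R"),
    ("C", "Green", "City", "S"),
    ("D", "Yellow", "State", "T"),
    ("D", "Red", "State", "T"),
    ("D", "Green", "Office", "Q"),
]


def find_lookup_value(row, rules):
    """Return the value of the first rule whose non-wildcard columns all match."""
    for criteria, value in rules:
        if all(c == WILDCARD or c == cell for c, cell in zip(criteria, row)):
            return value
    return None  # no rule applies to this row


results = [find_lookup_value(row, lookup_rules) for row in data_rows]
for row, value in zip(data_rows, results):
    print(dict(zip(KEYS, row)), "->", value)
```

On the sample data this reproduces the LookUp Value column from the question; with the real data the precedence between overlapping rules would need to be decided explicitly.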
I have been meddling with Power BI for almost a week now.
It seems like a powerful tool, when you get to know your way around it at least..
I would like to be able to see the sum of Therapists, Admins and Citizens, based on all subgroups for the currently selected group.
Here is my example:
When I select a Group (resembling a customer group) in the Drill Down Donut Chart, I want to see the admin, therapist, and citizen counts for all subgroups in the selected group, shown in the Clustered Column Chart. However, I only get the users which are in the selected group, and not the users in subgroups.
I have created measures for Admins, Therapists & Citizens to get the count based on TemplateLevel (which resembles the role of the user):
All measures are written in the same fashion, using different TemplateLevel(s).
Here are the three measures used in the Column Chart:
In my DataSet, I have the table UserGroup:
IdPath and NumLevels are an attempt to use a parent-child reference, which I did not get to work properly, so don't mind that.
I expected that Power BI's interactive system would be able to handle Parent/Children references, as is the case with UserGroup[Id] and UserGroup[UserGroupParentId]. My initial thought was to just add GroupName as Category for each level of SubGroup available (Owner -> Customer -> Therapist -> Citizen).
The Owner group id is 27 and always will be, so that's why the drill down donut chart is filtering groups with no such parent id, to show the customer groups.
The DataSet for the report is from a test database migrated to an Azure SQL Server.
Any suggestions are given the warmest welcome!
Kindly Regards
Kalrin
Power BI (or more precisely, the Tabular Model underlying Power BI) does not support parent/child relationships. You have to transform/flatten the hierarchy to construct a table that holds the columns of all the levels of the hierarchy:
| Id | Owner | Customer | Therapist | Citizen | Group |
| ----- | --------- | --------- | --------- | -------- | -------- |
| 1 | ownerX | | | | 1 |
| 2 | ownerX | cust1 | | | 1 |
| 3 | ownerX | cust1 | tpA | | 1 |
| 4 | ownerX | cust1 | tpA | cit100 | 1 |
| 5 | ownerX | cust1 | tpA | cit101 | 1 |
| 6 | ownerX | cust1 | tpB | | 1 |
The above is a flattened hierarchy which is also ragged (you can have parent items with no child items).
This pattern describes how we can use DAX to construct a flattened hierarchy, but typically it is a best practice to flatten your data on the database side, before loading the table into Power BI (this can be done using recursive CTEs in SQL).
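As a rough illustration of the database-side option, the recursive CTE below flattens a small parent/child table into one column per level. It is a sketch under assumptions: it runs against an in-memory SQLite database rather than Azure SQL, the sample rows are the ones from the table above, the root group is assumed to have a NULL parent, and the hierarchy is assumed to be at most four levels deep. The Depth column and the flat CTE name are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserGroup (Id INTEGER PRIMARY KEY, UserGroupParentId INTEGER,
                        GroupName TEXT);
INSERT INTO UserGroup VALUES
    (1, NULL, 'ownerX'),
    (2, 1,    'cust1'),
    (3, 2,    'tpA'),
    (4, 3,    'cit100'),
    (5, 3,    'cit101'),
    (6, 2,    'tpB');
""")

FLATTEN_SQL = """
WITH RECURSIVE flat(Id, Depth, Owner, Customer, Therapist, Citizen) AS (
    -- anchor: the root group fills the first level only
    SELECT Id, 1, GroupName, NULL, NULL, NULL
    FROM UserGroup
    WHERE UserGroupParentId IS NULL
    UNION ALL
    -- recursive step: copy the parent's levels and fill the next one
    SELECT g.Id, f.Depth + 1, f.Owner,
           CASE WHEN f.Depth = 1 THEN g.GroupName ELSE f.Customer END,
           CASE WHEN f.Depth = 2 THEN g.GroupName ELSE f.Therapist END,
           CASE WHEN f.Depth = 3 THEN g.GroupName ELSE f.Citizen END
    FROM UserGroup AS g
    JOIN flat AS f ON g.UserGroupParentId = f.Id
)
SELECT Id, Owner, Customer, Therapist, Citizen
FROM flat
ORDER BY Id
"""

flattened = conn.execute(FLATTEN_SQL).fetchall()
for row in flattened:
    print(row)
```

Each row carries the names of all its ancestors, so selecting a customer in a visual also selects every therapist and citizen group below it. The T-SQL version has the same shape, except that SQL Server writes WITH instead of WITH RECURSIVE.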
I’ve been working on a Windows Form App using vb.net that retrieves information from a SQL database. One of the forms, frmContract, queries several tables, such as Addresses, and displays them in various controls, such as Labels and DataGridViews. Every year, the customer’s file is either renewed or expired, and I’m just now realizing that a change committed to any record today will affect the information displayed for the customer in the past. For example, if we update a customer’s mailing address today, this new address will show up in all previous customer profiles. What is the smartest way to avoid this problem without creating separate rows in each table with the same information? Or to put it another way, how can versions of a customer’s profile be preserved?
Another example would be a table that stores customer’s vehicles.
VehicleID | Year | Make | Model | VIN | Body
---------------------------------------------------------------
1 | 2005 | Ford | F150 | 11111111111111111 | Pickup
2 | 2001 | Niss | Sentra | 22222222222222222 | Sedan
3 | 2004 | Intl | 4700 | 33333333333333333 | Car Carrier
If today vehicle 1 is changed from a standard pickup to a flatbed, then if I load the customer contract from 2016 it will also show as flatbed even though back then it was a pickup truck.
I have a table for storing individual clients.
ClientID | First | Last | DOB
---------|----------|-----------|------------
1 | John | Doe | 01/01/1980
2 | Mickey | Mouse | 11/18/1928
3 | Eric | Forman | 03/05/1960
I have another table to store yearly contracts.
ContractID | ContractNo | EffectiveDate | ExpirationDate | ClientID (foreign key)
-----------|------------|---------------|-------------------|-----------
1 | 13579 | 06/15/2013 | 06/15/2014 | 1
2 | 13579 | 06/15/2014 | 06/15/2015 | 1
3 | 24680 | 10/05/2016 | 10/05/2017 | 3
Notice that the contract number can remain the same across different periods. In addition, because the same vehicle can be related to multiple contracts, I use a bridge table to relate individual vehicles to different contracts.
Id | VehicleID | ContractID <-- both foreign keys
---|-----------|------------
1 | 1 | 1
2 | 3 | 1
3 | 1 | 2
4 | 3 | 2
5 | 2 | 3
6 | 2 | 2
When frmContract is loaded, it queries the database and displays information about that particular contract year. However, if Vehicle 1 is changed from pickup to flatbed right now, then all the previous contract years will also show it as a flatbed.
I hope this illustrates my predicament. Any guidance will be appreciated.
Some DB systems have built-in temporal features so you can keep audit history of rows. Check to see if your DB has built-in support for this.
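To show what row history buys you, here is a hand-rolled sketch of the idea using validity dates; built-in temporal support does this bookkeeping for you. Everything in it is an assumption for illustration: the VehicleHistory table, its ValidFrom/ValidTo columns, the dates, and the body_as_of function are hypothetical, and it runs on an in-memory SQLite database rather than your actual server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per version of a vehicle; ValidTo is exclusive.
CREATE TABLE VehicleHistory (VehicleID INTEGER, Body TEXT,
                             ValidFrom TEXT, ValidTo TEXT);
INSERT INTO VehicleHistory VALUES
    (1, 'Pickup',  '2005-01-01', '2024-03-01'),
    (1, 'Flatbed', '2024-03-01', '9999-12-31');
""")


def body_as_of(conn, vehicle_id, as_of):
    """Return the vehicle body that was valid on the given ISO date, if any."""
    row = conn.execute(
        """
        SELECT Body
        FROM VehicleHistory
        WHERE VehicleID = ? AND ValidFrom <= ? AND ? < ValidTo
        """,
        (vehicle_id, as_of, as_of),
    ).fetchone()
    return row[0] if row else None


# A contract from 2014 still sees the pickup; a lookup after the change sees the flatbed.
print(body_as_of(conn, 1, "2014-06-15"))
print(body_as_of(conn, 1, "2024-06-01"))
```

When loading frmContract, the contract's EffectiveDate would be the natural as-of date, so each contract year shows the vehicle as it was at that time.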
My task is to combine two tables in a specific way. I have a table Demands that contains demands of some goods (tovar). Each record has its own ID, Tovar, Date of demand and Amount. And I have another table Unloads that contains unloads of tovar. Each record has its own ID, Tovar, Order of unload and Amount. Demands and Unloads are not corresponding to each other and amounts in demands and unloads are not exactly equal. One demand may be with 10 units and there can be two unloads with 4 and 6 units. And two demands may be with 3 and 5 units and there can be one unload with 11 units.
The task is to get a table which will show how demands are covered by unloads. I have a solution (SQL Fiddle) but I think that there is a better one. Can anybody tell me how such tasks are solved?
What I have:
------------------------------------------
| DemandNumber | Tovar | Amount | Order |
------------------------------------------
| Demand#1 | Meat | 2 | 1 |
| Demand#2 | Meat | 3 | 2 |
| Demand#3 | Milk | 6 | 1 |
| Demand#4 | Eggs | 1 | 1 |
| Demand#5 | Eggs | 5 | 2 |
| Demand#6 | Eggs | 3 | 3 |
------------------------------------------
------------------------------------------
| SaleNumber | Tovar | Amount | Order |
------------------------------------------
| Sale#1 | Meat | 6 | 1 |
| Sale#2 | Milk | 2 | 1 |
| Sale#3 | Milk | 1 | 2 |
| Sale#4 | Eggs | 2 | 1 |
| Sale#5 | Eggs | 1 | 2 |
| Sale#6 | Eggs | 4 | 3 |
------------------------------------------
What I want to receive
-------------------------------------------------
| DemandNumber | SaleNumber | Tovar | Amount |
-------------------------------------------------
| Demand#1 | Sale#1 | Meat | 2 |
| Demand#2 | Sale#1 | Meat | 3 |
| Demand#3 | Sale#2 | Milk | 2 |
| Demand#3 | Sale#3 | Milk | 1 |
| Demand#4 | Sale#4 | Eggs | 1 |
| Demand#5 | Sale#4 | Eggs | 1 |
| Demand#5 | Sale#5 | Eggs | 1 |
| Demand#5 | Sale#6 | Eggs | 3 |
| Demand#6 | Sale#6 | Eggs | 1 |
-------------------------------------------------
Here is additional explanation from author's comment:
Demand#1 needs 2 Meat and it can take them from Sale#1.
Demand#2 needs 3 Meat and can take them from Sale#1.
Demand#3 needs 6 Milk but there is only 2 Milk in Sale#2 and 1 Milk in Sale#3, so we show only available amounts.
And so on.
The field Order in the example determine the order of calculations. We have to process Demands according to their Order. Demand#1 must be processed before Demand#2. And Sales also must be allocated according to their Order number. We cannot assign eggs from sale if there are sales with eggs with lower order and non-allocated eggs.
The only way I can get this is using loops. Is it possible to avoid loops and solve this task only with T-SQL?
If the Amount values are int and not too large (not millions), then I'd use a table of numbers to generate as many rows as the value of each Amount.
Here is a good article describing how to generate it.
Then it is easy to join Demand with Sale and group and sum as needed.
Otherwise, a plain straight-forward cursor (in fact, two cursors) would be simple to implement, easy to understand and with O(n) complexity. If Amounts are small, set-based variant is likely to be faster than cursor. If Amounts are large, cursor may be faster. You need to measure performance with actual data.
Here is a query that uses a table of numbers. To understand how it works run each query in the CTE separately and examine its output.
SQLFiddle
WITH
CTE_Demands
AS
(
    SELECT
        D.DemandNumber
        ,D.Tovar
        ,ROW_NUMBER() OVER (PARTITION BY D.Tovar ORDER BY D.SortOrder, CA_D.Number) AS rn
    FROM
        Demands AS D
        CROSS APPLY
        (
            SELECT TOP(D.Amount) Numbers.Number
            FROM Numbers
            ORDER BY Numbers.Number
        ) AS CA_D
)
,CTE_Sales
AS
(
    SELECT
        S.SaleNumber
        ,S.Tovar
        ,ROW_NUMBER() OVER (PARTITION BY S.Tovar ORDER BY S.SortOrder, CA_S.Number) AS rn
    FROM
        Sales AS S
        CROSS APPLY
        (
            SELECT TOP(S.Amount) Numbers.Number
            FROM Numbers
            ORDER BY Numbers.Number
        ) AS CA_S
)
SELECT
    CTE_Demands.DemandNumber
    ,CTE_Sales.SaleNumber
    ,CTE_Demands.Tovar
    ,COUNT(*) AS Amount
FROM
    CTE_Demands
    INNER JOIN CTE_Sales ON
        CTE_Sales.Tovar = CTE_Demands.Tovar
        AND CTE_Sales.rn = CTE_Demands.rn
GROUP BY
    CTE_Demands.Tovar
    ,CTE_Demands.DemandNumber
    ,CTE_Sales.SaleNumber
ORDER BY
    CTE_Demands.DemandNumber
    ,CTE_Sales.SaleNumber
;
Having said all this, usually it is better to perform this kind of processing on the client using a procedural programming language. You still have to transmit all rows from Demands and Sales to the client. So, by joining the tables on the server you don't reduce the amount of bytes that must go over the network. In fact, you increase it, because an original row may be split into several rows.
This kind of processing is sequential in nature, not set-based, so it is easy to do with arrays, but tricky in SQL.
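To illustrate the client-side variant, here is a minimal Python sketch of the sequential allocation: one pass over the demands, with a moving position into the sales of the same tovar, which mirrors the two-cursor idea. It is a sketch, not tuned code. The tuple layout (number, tovar, amount, order) and the function name allocate are assumptions; the rows are the sample data from the question.

```python
from collections import defaultdict


def allocate(demands, sales):
    """Cover demands with sales, both processed in their Order within each tovar.

    demands and sales are lists of (number, tovar, amount, order) tuples.
    Returns a list of (demand_number, sale_number, tovar, amount) tuples.
    """
    # Per tovar: sales in allocation order, with a mutable remaining amount
    sales_by_tovar = defaultdict(list)
    for number, tovar, amount, _ in sorted(sales, key=lambda s: s[3]):
        sales_by_tovar[tovar].append([number, amount])

    position = defaultdict(int)  # per tovar: index of the first sale with stock left
    result = []
    for d_number, tovar, needed, _ in sorted(demands, key=lambda d: d[3]):
        queue = sales_by_tovar[tovar]
        i = position[tovar]
        while needed > 0 and i < len(queue):
            s_number, available = queue[i]
            taken = min(needed, available)
            if taken > 0:
                result.append((d_number, s_number, tovar, taken))
            needed -= taken
            queue[i][1] -= taken
            if queue[i][1] == 0:
                i += 1  # this sale is used up, move to the next one
        position[tovar] = i
    return result


demands = [
    ("Demand#1", "Meat", 2, 1), ("Demand#2", "Meat", 3, 2),
    ("Demand#3", "Milk", 6, 1),
    ("Demand#4", "Eggs", 1, 1), ("Demand#5", "Eggs", 5, 2), ("Demand#6", "Eggs", 3, 3),
]
sales = [
    ("Sale#1", "Meat", 6, 1),
    ("Sale#2", "Milk", 2, 1), ("Sale#3", "Milk", 1, 2),
    ("Sale#4", "Eggs", 2, 1), ("Sale#5", "Eggs", 1, 2), ("Sale#6", "Eggs", 4, 3),
]

allocation = allocate(demands, sales)
for row in sorted(allocation):
    print(row)
```

Each sale and each demand is visited a bounded number of times, so apart from the initial sorting the work grows linearly with the number of rows, regardless of how large the amounts are.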
I have no idea what your requirements are or what the business rules are or what the goals are but I can say this -- you are doing it wrong.
This is SQL. In SQL you do not do loops. In SQL you work with sets. Sets are defined by select statements.
If this problem is not resolved with a select statement (maybe with sub-selects) then you probably want to implement this in another way. (C# program? Some other ETL system?).
However, I can also say there is probably a way to do this with a single select statement. However you have not given enough information for me to know what that statement is. To say you have a working example and that should be enough fails on this site because this site is about answering questions about problems and you don't have a problem you have some code.
Re-phrase the question with inputs, expected outputs, what you have tried and what your question is. This is covered well in the FAQ.
Or if you have working code you want reviewed, it may be appropriate for the code review site.
I see 2 additional possible ways:
1. for 'advanced' data processing and calculations you can use cursors.
2. you can use SELECT with CASE construction