SQL Query to look up one table against another - sql

I have two Excel tables- the first one is the data table and the second one is a look up table. Here is how they are structured-
Data Table
+----------+-------------+----------+----------+
| Category | Subcategory | Division | Business |
+----------+-------------+----------+----------+
| A | Red | Home | Q |
| B | Blue | Office | R |
| C | Green | City | S |
| D | Yellow | State | T |
| D | Red | State | T |
| D | Green | Office | Q |
+----------+-------------+----------+----------+
Lookup Table Lookup Table
+----------+-------------+----------+----------+--------------+
| Category | Subcategory | Division | Business | LookUp Value |
+----------+-------------+----------+----------+--------------+
| 0 | 0 | 0 | Q | ABC |
| B | 0 | Office | 0 | DEF |
| C | Green | 0 | 0 | MNO |
| D | 0 | State | T | RST |
+----------+-------------+----------+----------+--------------+
So I want to add the lookup value column to the data table based on the criteria given in the lookup table. Eg, for the first row in the lookup table, I dont want to lookup on Category, Subcategory, or Division. but if the Business is Q, then I want to populate the lookup value as ABC. Similarly, for the second row I dont want to consider the Subcategory. and Business. but if the Category. is "B" and Division is "Office", I want it to populate DEF. So the result should look like this-
[Final Resulting Data Table]
+----------+-------------+----------+----------+--------------+
| Category | Subcategory | Division | Business | LookUp Value |
+----------+-------------+----------+----------+--------------+
| A | Red | Home | Q | ABC |
| B | Blue | Office | R | DEF |
| C | Green | City | S | MNO |
| D | Yellow | State | T | RST |
| D | Red | State | T | RST |
| D | Green | Office | Q | ABC |
+----------+-------------+----------+----------+--------------+
I am very new to SQL and the actual data set is very complex wih multiple lookup values based on different criteria. IF you think any other scripting language would work better, I am open to that too. My data is in Excel currently

If the data is so complex, you should first consider if you want to put it in a (relational) database (like MS Access, MySQL, etc.) instead of in a spreadsheet (like MS Excel).
Both kind of programs are used for structured data handling, but databases focus primarily on efficient data storage and data integrity (including guarding type safety, required fields, unique fields, required references between various datasets/tables, etc.) and spreadsheets focus primarily on data analysis and calculations.
Relational databases support Structured Query Language (SQL) to let clients query their data. Spreadsheets normally do not use or support SQL (as far as I know).
It is possible to let MS Excel import or reference data in an external data source (like a relational database) to perform analysis and calculations on it.
The other way around is (sometimes) possible too: to link to spreadsheet worksheets as external tables inside a relational database system to - within certain limits - allow that data to be queried using SQL. But using a database to store the data and a spreadsheet (as a database client) to perform analysis on the data in the database would be a more logical design in my opinion.
However, creating such an integrated solution using multiple MS Office applications and/or external databases can be a complex challenge, especially when you are just starting to learn about them.
To be honest, I am not experienced with designing MS Office based solutions, so I cannot guide you around any pitfalls. I do hope, that this answer helps you a little with finding the right way to go here...

Related

Only show matching values from junction table

I have tables service and material and service_material.
The table service cointains warranty_dates that stores multiple values for all materials used on the service. The warranty length is in table material, and the warranty_dates in table service is calculated with service_date plus warranty lengths from table materials.
I'm trying to create a query in Access that will show me only the warranty dates for materials used on a service, so only the values that match the serviceid in service_material table.
Servis = Service, Datum = date, Garancija = warranty, Ime = name
SELECT DISTINCT Servis.Datum+Material.Garancija AS garancijski_rok, Servis.Datum, Material.Ime, Servis_Material.ServisID
FROM Servis
INNER JOIN (Material
INNER JOIN Servis_Material ON Material.MaterialID = Servis_Material.MaterialID) ON Servis.ServisID = Servis_Material.ServisID
WHERE (((Servis_Material.ServisID)=[Servis].[ServisID]) AND ((Servis_Material.materialid)=[material].[materialid]))
ORDER BY Servis_Material.ServisID;
This is what I have so far, but this query shows me all warranty dates for all materials used in services. I just want the query to show the matching materials that were used in service.
I'm a total beginner and I'm using Excel because it's a school project. Sorry cause it's in Slovenian. Hopefully it is still understandable.
This is the result atm. If it's possible I want it to only show materials for that particular service I am adding the warranty dates in.
Assuming you are working with access and not Excel and assuming the problem is mostly setting up the ManytoMany Relationship. Here is a Minimal Reproducible example starting with the Normalized Tables:
-------------------------------------------
| ServiceID | ServiceDate |
-------------------------------------------
| 1 | 12/12/2022 |
-------------------------------------------
| 2 | 12/5/2022 |
-------------------------------------------
----------------------------------------------------------------
| MaterialID | MaterialName | WarantyLength |
----------------------------------------------------------------
| 1 | PartA | 10 |
----------------------------------------------------------------
| 2 | PartB | 200 |
----------------------------------------------------------------
| 3 | PartC | 300 |
----------------------------------------------------------------
----------------------------------------------------------------
| MaterialServiceID | MaterialID | ServiceID |
----------------------------------------------------------------
| 1 | 1 | 1 |
----------------------------------------------------------------
| 2 | 2 | 1 |
----------------------------------------------------------------
| 3 | 3 | 1 |
----------------------------------------------------------------
| 4 | 1 | 2 |
----------------------------------------------------------------
| 5 | 2 | 2 |
----------------------------------------------------------------
Regarding Table Normalization, note how ServiceDate is the only thing hanging on Services and What would be more clearly named as MaterialWarantyLength is hanging on Materials. What would best be called LastDayofMaterialWaranty would be under MaterialsServices but we have a formula so we calculate that as needed.
One way of thinking about the next step is: Database users interact with the database through forms and reports. Among other things this protects the database from Users entering bad data and Protects the User from having to understand those normalized tables.
aside: the default forms are not good at protecting the database from bad data but they are a jump start.
So, the next step will be a query that puts the data for this relationship back together into one big table that the Database designer can understand. I find myself adjusting for the user when I design the User interface of forms and reports. Here I put everything into the designers query which is a good default as what you don't need for a particular form you just don't show.
'Warranty is a calculated Field.
'Note the use of DateAdd to handle all those calendar problems
'normally I extract the code for calculated fields into public functions for an example of that see here: https://stackoverflow.com/questions/74621501/ms-access-group-by-and-sum-if-all-values-are-available-query/74673498#74673498
Warranty: DateAdd("d",[Materials].[WarantyLength],[Services].[ServiceDate])
'here is the resulting sql for the designer query
'if you are following along try pasting the sql into the sql pane and then going to differnt tabs of the query designer
SELECT Services.ServiceID, Services.ServiceDate, MaterialsServices.MaterialID, Materials.WarantyLength, Materials.MaterialName
FROM Services INNER JOIN (Materials INNER JOIN MaterialsServices ON Materials.MaterialID = MaterialsServices.MaterialID) ON Services.ServiceID = MaterialsServices.ServiceID;
Running the query on the sample data gives:
-------------------------------------------------------------------------------------------------------------------
| ServiceID | ServiceDate | MaterialID | WarantyLength | MaterialName |
-------------------------------------------------------------------------------------------------------------------
| 1 | 12/12/2022 | 1 | 10 | PartA |
-------------------------------------------------------------------------------------------------------------------
| 1 | 12/12/2022 | 2 | 200 | PartB |
-------------------------------------------------------------------------------------------------------------------
| 1 | 12/12/2022 | 3 | 300 | PartC |
-------------------------------------------------------------------------------------------------------------------
| 2 | 12/5/2022 | 1 | 10 | PartA |
-------------------------------------------------------------------------------------------------------------------
| 2 | 12/5/2022 | 2 | 200 | PartB |
-------------------------------------------------------------------------------------------------------------------
At this point it is down to how you want the user to interact with the data. For instance, for a jump start I selected the query and used the create-form wizard to get a tabular style form and then added an unbound combobox to the header to allow the end user to filter the resulting form.
See here for an example: https://www.youtube.com/watch?v=uq3cgaHF6fc
The Final Form allows the end user to switch between viewing different services :
Private Sub cmbService_AfterUpdate()
Me.Filter = "ServiceID = " & Me.cmdService
Me.FilterOn = True
End Sub

Access text count in query design

I am new to Access and am trying to develop a query that will allow me to count the number of occurrences of one word in each field from a table with 15 fields.
The table simply stores test results for employees. There is one table that stores the employee identification - id, name, etc.
The second table has 15 fields - A1 through A15 with the words correct or incorrect in each field. I need the total number of incorrect occurrences for each field, not for the entire table.
Is there an answer through Query Design, or is code required?
The solution, whether Query Design, or code, would be greatly appreciated!
Firstly, one of the reasons that you are struggling to obtain the desired result for what should be a relatively straightforward request is because your data does not follow database normalisation rules, and consequently, you are working against the natural operation of a RDBMS when querying your data.
From your description, I assume that the fields A1 through A15 are answers to questions on a test.
By representing these as separate fields within your database, aside from the inherent difficulty in querying the resulting data (as you have discovered), if ever you wanted to add or remove a question to/from the test, you would be forced to restructure your entire database!
Instead, I would suggest structuring your table in the following way:
Results
+------------+------------+-----------+
| EmployeeID | QuestionID | Result |
+------------+------------+-----------+
| 1 | 1 | correct |
| 1 | 2 | incorrect |
| ... | ... | ... |
| 1 | 15 | correct |
| 2 | 1 | correct |
| 2 | 2 | correct |
| ... | ... | ... |
+------------+------------+-----------+
This table would be a junction table (a.k.a. linking / cross-reference table) in your database, supporting a many-to-many relationship between the tables Employees & Questions, which might look like the following:
Employees
+--------+-----------+-----------+------------+------------+-----+
| Emp_ID | Emp_FName | Emp_LName | Emp_DOB | Emp_Gender | ... |
+--------+-----------+-----------+------------+------------+-----+
| 1 | Joe | Bloggs | 01/01/1969 | M | ... |
| ... | ... | ... | ... | ... | ... |
+--------+-----------+-----------+------------+------------+-----+
Questions
+-------+------------------------------------------------------------+--------+
| Qu_ID | Qu_Desc | Qu_Ans |
+-------+------------------------------------------------------------+--------+
| 1 | What is the meaning of life, the universe, and everything? | 42 |
| ... | ... | ... |
+-------+------------------------------------------------------------+--------+
With this structure, if ever you wish to add or remove a question from the test, you can simply add or remove a record from the table without needing to restructure your database or rewrite any of the queries, forms, or reports which depends upon the existing structure.
Furthermore, since the result of an answer is likely to be a binary correct or incorrect, then this would be better (and far more efficiently) represented using a Boolean True/False data type, e.g.:
Results
+------------+------------+--------+
| EmployeeID | QuestionID | Result |
+------------+------------+--------+
| 1 | 1 | True |
| 1 | 2 | False |
| ... | ... | ... |
| 1 | 15 | True |
| 2 | 1 | True |
| 2 | 2 | True |
| ... | ... | ... |
+------------+------------+--------+
Not only does this consume less memory in your database, but this may be indexed far more efficiently (yielding faster queries), and removes all ambiguity and potential for error surrounding typos & case sensitivity.
With this new structure, if you wanted to see the number of correct answers for each employee, the query can be something as simple as:
select results.employeeid, count(*)
from results
where results.result = true
group by results.employeeid
Alternatively, if you wanted to view the number of employees answering each question correctly (for example, to understand which questions most employees got wrong), you might use something like:
select results.questionid, count(*)
from results
where results.result = true
group by results.questionid
The above are obviously very basic example queries, and you would likely want to join the Results table to an Employees table and a Questions table to obtain richer information about the results.
Contrast the above with your current database structure -
Per your original question:
The second table has 15 fields - A1 through A15 with the words correct or incorrect in each field. I need the total number of incorrect occurrences for each field, not for the entire table.
Assuming that you want to view the number of incorrect answers by employee, you are forced to use an incredibly messy query such as the following:
select
employeeid,
iif(A1='incorrect',1,0)+
iif(A2='incorrect',1,0)+
iif(A3='incorrect',1,0)+
iif(A4='incorrect',1,0)+
iif(A5='incorrect',1,0)+
iif(A6='incorrect',1,0)+
iif(A7='incorrect',1,0)+
iif(A8='incorrect',1,0)+
iif(A9='incorrect',1,0)+
iif(A10='incorrect',1,0)+
iif(A11='incorrect',1,0)+
iif(A12='incorrect',1,0)+
iif(A13='incorrect',1,0)+
iif(A14='incorrect',1,0)+
iif(A15='incorrect',1,0) as IncorrectAnswers
from
YourTable
Here, notice that the answer numbers are also hard-coded into the query, meaning that if you decide to add a new question or remove an existing question, not only would you need to restructure your entire database, but queries such as the above would also need to be rewritten.

Usage of Power BI Interactive visualisations with Parent-Child dataset relations

I have been meddling with Power BI for almost a week now.
It seems like a powerful tool, when you get to know your way around it at least..
I would like to be able to see the sum of Therapists, Admins and Citizens, based on all subgroups for the currently selected group.
Here is my example:
When i select a Group (resembling a customer group) in the Drill Down Donut Chart, i want so see admins, therapists, and citizen count for all subgroups in the selected group, shown in the Clustered Column Chart. However I only get the users which are in the selected group, and not the users in sub groups.
I have created measures for Admins, Therapists & Citizens to get the count based on TemplateLevel (which is resembling the role of the user:
All measures are written in the same fashion, using different TemplateLevel(s).
Here is the three measures used in the Column Chart:
In my DataSet, i have the table UserGroup:
IdPath and NumLevels is an attempt to use parent-child reference, which i did not get to work properly, so dont mind that.
I expected that Power BI's interactive system would be able to handle Parent/Children references, as is the case with UserGroup[Id] and UserGroup[UserGroupParentId]. My initial thoughts was to just add GroupName as Category for each level of SubGroup available (Owner -> Customer -> Therapist -> Citizen).
Owner group id is 27 and always will be so that's why the drill down donut chart is filtering groups with no such parent it, to show the customer groups.
The DataSet for the report is from a test database migrated to an Azure SQL Server.
Any suggestions are given the warmest welcome!
Kindly Regards
Kalrin
Power BI (or more precisely, the Tabular Model underlying Power BI) does not support parent/child relationships. You have to transform/flatten the hierarchy to construct a table that holds the columns of all the levels of the hierarchy:
| Id | Owner | Customer | Therapist | Citizen | Group |
| ----- | --------- | --------- | --------- | -------- | -------- |
| 1 | ownerX | | | | 1 |
| 2 | ownerX | cust1 | | | 1 |
| 3 | ownerX | cust1 | tpA | | 1 |
| 4 | ownerX | cust1 | tpA | cit100 | 1 |
| 5 | ownerX | cust1 | tpA | cit101 | 1 |
| 6 | ownerX | cust1 | tpB | | 1 |
The above is a flattened hierarchy which is also ragged (you can have parent items with no child items).
This pattern describes how we can use DAX to construct a flattened hierarchy, but typically it is a best practice to flatten your data on the database side, before loading the table into Power BI (this can be done using recursive CTEs in SQL).

Hide Hierachy duplication in Powerpivot (Row Labels)

I am reporting on performance of legal cases, from a SQL database of activities. I have a main table of cases, which has a parent/child hierarchy. I am looking for a way to appropriately report on case performance, reporting only once for a parent/child group (`Family').
An example of relevant tables is:
Cases
ID | Client | MatterName | ClaimAmount | ParentID | NumberOfChildren |
1 | Mr. Smith | ABC Ltd | $40,000 | 0 | 2 |
2 | Mr. Smith | Jakob R | $40,000 | 1 | 0 |
3 | Mr. Smith | Jenny R | $40,000 | 1 | 0 |
4 | Mrs Bow | JQ Public | $7,000 | 0 | 0 |
Payments
ID | MatterID | DateReceived | Amount |
1 | 1 | 14/7/15 | $50 |
2 | 3 | 21/7/15 | $100 |
I'd like to be able to report back on a consolidated view that only shows the parent matter, with total received (and a lot of other similar related fact tables) - e.g.
Client | MatterName | ClaimAmount | TotalReceived |
Mr Smith | ABC Ltd | $40,000 | $150 |
Mrs Bow | JQ Public | $7,000 | $0 |
A key problem I'm having is hiding row labels for irrelevant rows (child matters). I believe I need to
Determine whether the current row is a parent group
Consolidate all measures for that parent group
Filter on that being True? Place all measures inside IF checks?
Any help appreciated
How many levels does your hierarchy have? If it's just 2 levels (parents have children, children cannot be parents), then denormalize your model. You can add a single column for ParentMatterName and use that as the rowfilter in pivots. If there is a reasonable maximum number of levels in your data (we typically look at <=6 as reasonable) then denormalization is probably preferable, and certainly simpler/more performant, than trying to dynamically roll up the child measure values.
Edits to address comment below:
Denormalizing your data structure in this case just means going to the following table structure:
Cases
ID | Client | ParentMatterName | MatterName | ClaimAmount
1 | Mr. Smith | ABC Ltd | ABC Ltd | $40,000
2 | Mr. Smith | Jakob R | ABC Ltd | $0
3 | Mr. Smith | Jenny R | ABC Ltd | $0
4 | Mrs Bow | JQ Public | JQ Public | $7,000
Regarding nomenclature - Excel is stupid, and so is DAX. Here is the way to think about these things to help minimize confusion - these are important concepts as you move forward in more complex DAX measures and queries.
Here are some absolutely truthful and accurate statements to show how stupid the nomenclature can get:
FILTER() is a table
Pivot table rows are filter context
FILTER() applies additional filter context when used as an argument to CALCULATE()
FILTER() creates row context internally which to evaluate expressions
FILTER()'s arguments are affected by filter context from pivot table rows
FILTER()'s second argument evaluates an expression evaluated in the pivot table's rowfilter context in the row context of each row in the table in the first argument
And so on. Don't think of a pivot table as anything but filters. You have filters, slicers, rowfilters, columnfilters. Everything in a pivot table is filter context.
Links:
Denormalization in Power Pivot
Denormalizing Dimensions

RDBMS schema for unknown columns

I have a project with a MySQL database, and I would like to be able to upload various datasets. Say I am building a restaurant reviews aggregator. So we would like to keep adding all sources of restaurant reviews we could get our hands on, and keeping all the information.
I have a table review_sources
=========================
| id | name |
=========================
| 1 | Zagat |
| 2 | GoodEats Magazine|
| ... |
| 50 | Allergy News |
=========================
Now say I have a table reviews
=====================================================================
| id | Restaurant Name | source_id | Star Rating | Description |
=====================================================================
| 0 | Joey's Burgers | 1 | 3.5 | Wow! |
| 1 | Jamal's Steaks | 1 | 3.5 | Yummy! |
| 2 | Jenny's Crepes | 1 | 4.5 | Sweet! |
| .... |
| 253| Jeeva's Curries | 3 | 4 | Spicy! |
=====================================================================
Now suppose someone wants to add reviews from "Allergy News", they have a field "nut-free". Or a source of reviews could describe the degree of kashrut compliance, or halal compliance or vegan-friendliness. I as a designer don't know the possible optional fields future data sources may have. I want to be able to answer queries:
What are all the fields in the Zagat reviews?
For review id=x, what is value of the optional field "vegan-friendly"?
So how do I design a schema that can handle these disparate data sources and answer these queries? My reasons for not going for NoSQL are that I do want certain types of normalization, and that this is part of an existing MySQL based project.
I'd use a many-to-many relationship with a table containing a review_id, a field (e.g. "vegan-friendly") and the value of the field. Then of course a reviews_fields table to map one to the other.
Cheers