SQL PIVOT, JOIN, and aggregate function to generate report

I am working on creating a report which will incorporate data across 4 different tables. For this question, I have consolidated the data into 2 tables and am stuck trying to figure out exactly how to create this report using PIVOT.
The report will hold the top 5 strengths of an employee based on the Clifton StrengthsFinder assessment.
This is the table with the Names of the Clifton Strengths (34 rows total):
As mentioned, each employee has 5 strengths:
I would like to use PIVOT to generate a table which will ultimately look like this:
With a twist, I don't need the Team Name as a Row, it should be a column. The Count at the bottom and Themes at the top (Executing, Influencing, etc) can be ignored.
The columns of the table I'm trying to output are PersonFk, PersonName, TeamName, Achiever, Arranger, etc... (34 Strengths) and each row of the table with Values (personfk, name, team, 1 if person has the strength, 0 otherwise). This table should be SQL, not excel (sorry, just the best example I have on hand without spending an hour learning how to use Paint or something).
I'm not very familiar with aggregate functions, and am only now getting into more complex SQL queries.

Interesting. PIVOT requires an aggregate function to build the 1-5 values, so you'll probably have to rewrite your inner query as a union and use MAX() as a throwaway aggregate function (throwaway because every record should be unique, so MAX, MIN, SUM, etc. should all return the same value):
SELECT * INTO #newblah FROM (
SELECT PersonFK, 1 AS StrengthIndex, Strength1 AS Strength FROM blah UNION ALL
SELECT PersonFK, 2 AS StrengthIndex, Strength2 AS Strength FROM blah UNION ALL
SELECT PersonFK, 3 AS StrengthIndex, Strength3 AS Strength FROM blah UNION ALL
SELECT PersonFK, 4 AS StrengthIndex, Strength4 AS Strength FROM blah UNION ALL
SELECT PersonFK, 5 AS StrengthIndex, Strength5 AS Strength FROM blah
) AS unpivoted; -- SQL Server requires an alias on a derived table
Then
select PersonFK, [Achiever], [Activator], [Adaptability], [Analytical], [Belief] .....
from
(
select PersonFK, StrengthIndex, Strength
from #newblah
) pivotsource
pivot
(
max(StrengthIndex)
for Strength in ([Achiever], [Activator], [Adaptability], [Analytical], [Belief] ..... )
) myPivot;
The result of that query can be joined back to your other tables to get the person name, strength category, and team name, so I'll leave that to you. You don't HAVE to materialize the first step (the union) as a temporary table; you could write it inline as a subselect, so this could all be done in one SQL query, but that seems painful if you can avoid it.
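The question asked for 1 if the person has a strength and 0 otherwise, rather than the 1-5 index. A minimal sketch of that variant follows, run through Python's sqlite3 so it can be tried without SQL Server. SQLite has no PIVOT, so conditional aggregation (MAX over a CASE expression) is swapped in for it. The table blah and its Strength1..Strength5 columns follow the answer above; the two sample rows and the shortened list of five strengths are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blah (PersonFK INTEGER, Strength1 TEXT, Strength2 TEXT,
                       Strength3 TEXT, Strength4 TEXT, Strength5 TEXT);
    -- Sample rows are invented for this sketch.
    INSERT INTO blah VALUES
        (1, 'Achiever', 'Activator', 'Belief', 'Analytical', 'Adaptability'),
        (2, 'Belief', 'Achiever', 'Arranger', 'Analytical', 'Focus');
""")

# Only a few of the 34 strengths, to keep the sketch short.
strengths = ["Achiever", "Activator", "Adaptability", "Analytical", "Belief"]

# Step 1: unpivot the five StrengthN columns into one row per (person, strength).
unpivot = " UNION ALL ".join(
    f"SELECT PersonFK, {i} AS StrengthIndex, Strength{i} AS Strength FROM blah"
    for i in range(1, 6)
)

# Step 2: one MAX(CASE ...) per strength gives 1 if the person has it, else 0.
flags = ", ".join(
    f"MAX(CASE WHEN Strength = '{s}' THEN 1 ELSE 0 END) AS \"{s}\""
    for s in strengths
)

query = (
    f"SELECT PersonFK, {flags} FROM ({unpivot}) AS u "
    "GROUP BY PersonFK ORDER BY PersonFK"
)
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The same MAX(CASE ...) columns should also work in SQL Server if you prefer them to PIVOT, and the result can be joined to the person and team tables on PersonFK.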

Use one of the techniques from this post. For your purposes, you may want to use a delimiter in your column name to the tune of 'StrengthTheme-Strength', which your web report will then parse for the headers.

Related

How to aggregate data stored column-wise in a matrix table

I have the table below. Ellipses (...) represent multiple columns of a similar type.
TABLE: diagnosis_info
COLUMNS: visit_id,
patient_diagnosis_code_1 ...
patient_diagnosis_code_100 -- char(100) with a value of ‘0’ or ‘1’
How do I find the most common diagnosis_code? There are 101 columns including the visit_id. The table is like a matrix table of 0s and 1s. How do I write something that can dynamically account for all the columns and count all the rows where the value is 1?
What I would normally do is not feasible as there are too many columns:
SELECT COUNT(patient_diagnosis_code_1), COUNT(patient_diagnosis_code_2),... FROM diagnosis_info WHERE patient_diagnosis_code_1 = '1' and patient_diagnosis_code_2 = '1' and ....
Then, even if I typed all that out, how would I select which column had the highest count of values = 1? The table is column oriented instead of row oriented.
Unfortunately your data design is bad from the start. Instead it could be as simple as:
patient_id, visit_id, diagnosis_code
where a patient with 1 diagnostic code would have 1 row, and a patient with 100 diagnostic codes would have 100 rows. At any given time you could transpose this into the format you presented (what is called a pivot or cross tab). Also, in some databases, for example PostgreSQL, you could put all those diagnostic codes into an array field, and then it would look like:
patient_id, visit_id, diagnosis_code (data type -bool or int- array)
Now you need the reverse of it, which is called unpivot. Some databases have an operator for this; SQL Server, for example, has UNPIVOT.
Without knowing what your backend is, you could do that with an ugly SQL like:
select code, pdc
from
(
select 1 as code, count(*) as pdc
from myTable where patient_diagnosis_code_1=1
union
select 2 as code, count(*) as pdc
from myTable where patient_diagnosis_code_2=1
union
...
select 100 as code, count(*) as pdc
from myTable where patient_diagnosis_code_100=1
) tmp
order by pdc desc, code;
PS: This would return all the codes with their frequency, ordered from most to least frequent. You could limit the result to 1 row to get the max (or allow ties, in case more than one code matches the max).
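Since the question asked for something that accounts for all the columns dynamically, here is a minimal sketch that generates the union from a list of column names instead of typing it out. It runs against an in-memory SQLite database through Python's sqlite3; the table is cut down to 3 code columns with made-up rows, and it compares against the string '1' because the question describes the columns as char.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

n_codes = 3  # the real table has 100 code columns
cols = [f"patient_diagnosis_code_{i}" for i in range(1, n_codes + 1)]

col_defs = ", ".join(f"{c} TEXT" for c in cols)
conn.execute(f"CREATE TABLE diagnosis_info (visit_id INTEGER, {col_defs})")

# Made-up sample visits for illustration.
conn.executemany(
    "INSERT INTO diagnosis_info VALUES (?, ?, ?, ?)",
    [(1, "1", "0", "1"), (2, "0", "0", "1"), (3, "1", "0", "1")],
)

# Build one "count where column = '1'" query per column and glue them together.
parts = [
    f"SELECT {i} AS code, COUNT(*) AS pdc FROM diagnosis_info WHERE {col} = '1'"
    for i, col in enumerate(cols, start=1)
]
query = " UNION ALL ".join(parts) + " ORDER BY pdc DESC, code"

counts = conn.execute(query).fetchall()
print(counts)         # every code with its frequency, most common first
print(counts[0][0])   # the most common diagnosis code
```

On a real database the column list could come from the catalog (for example information_schema.columns) rather than a range, which is an assumption about your backend.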

Count number of rows returned in a SQL statement

Are there any DB engines that allow you to run an EXPLAIN (or other function) where it will give you an approximate count of values that may be returned before an aggregation is run (not rows scanned but that actually would be returned)? For example, in the following query:
SELECT gender, COUNT(1) FROM sales JOIN (
SELECT id, person FROM sales2 WHERE country='US'
GROUP BY person_id
) USING (id)
WHERE sales.age > 20
GROUP BY gender
Let's say this query returns 3 rows after being aggregated, but would return 170M rows if unaggregated.
Are there any tools where you can run the query to get this '170M' number or does this have to do with complexity theory (or something similar) where it's almost just as expensive to run the query (without the final aggregation/having/sort/limit/etc) to get the count? In other words, doing a rewrite to:
SELECT COUNT(1) FROM sales JOIN (
SELECT id, person FROM sales2 WHERE country='US'
GROUP BY person_id
) USING (id)
WHERE sales.age > 20
But having to execute the query nonetheless.
As an example, here is how far 'off' the current (MySQL) EXPLAIN is from what I'm looking for:
explain select * from movies where title>'a';
# rows=147900
select count(1) from movies where title>'a';
# 144647 --> OK, pretty close
explain select * from movies where title>'u';
# rows=147900
select count(1) from movies where title>'u';
# 11816 --> Not close at all
Assuming you can use MS SQL Server, you could tap into the same data the Optimiser is using for cardinality estimation: DBCC SHOW_STATISTICS (table, index) WITH HISTOGRAM
Part of the data you get back is a per-column histogram, which is essentially the number of rows for each value range found in the table.
You probably want to query the data programmatically. One way to achieve this is to insert it into a temp table:
CREATE TABLE #histogram (
RANGE_HI_KEY datetime PRIMARY KEY,
RANGE_ROWS INT,
EQ_ROWS INT,
DISTINCT_RANGE_ROWS INT,
AVG_RANGE_ROWS FLOAT
)
INSERT INTO #histogram
EXEC ('DBCC SHOW_STATISTICS (Users, CreationDate) WITH HISTOGRAM')
SELECT 'Estimate', SUM(RANGE_ROWS+EQ_ROWS) FROM #histogram WHERE RANGE_HI_KEY BETWEEN '2010-08-30 08:28:45.070' AND '2010-09-20 22:15:33.603'
UNION ALL
select 'Actual', COUNT(1) from Users u WHERE u.CreationDate BETWEEN '2010-08-30 08:28:45.070' AND '2010-09-20 22:15:33.603'
For example, here is what this same query returns when run against the Stack Overflow database:
| -------- | ----- |
| Estimate | 98092 |
| Actual | 11715 |
The gap seems like a lot, but keep in mind that the whole table has almost 15 million records.
A note on precision and other gotchas
The maximum number of histogram steps is capped at 200, which is not a lot, so you are not guaranteed a 10% margin of error, but then neither is SQL Server.
As you insert data into the table, histograms may go stale, so your results would be skewed even more.
There are different ways to update this data; some are reasonably quick, while others effectively require a full table scan.
Not all columns will have statistics. You can either create them manually or (I believe) they get created automatically if you run a search with the column as a predicate.
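To see why summing whole histogram steps tends to overshoot, here is a toy model in Python. It is not how SQL Server builds or reads its statistics; it only mimics the RANGE_HI_KEY / RANGE_ROWS / EQ_ROWS columns and the SUM(RANGE_ROWS+EQ_ROWS) filter from the query above, on made-up data (the integers 1 to 100 with four steps).

```python
def build_histogram(values, hi_keys):
    """Toy histogram: one step per upper-bound key, like RANGE_HI_KEY."""
    steps = []
    prev = None
    for key in hi_keys:
        # Rows strictly between the previous key and this key.
        range_rows = sum(1 for v in values if (prev is None or v > prev) and v < key)
        # Rows exactly equal to this step's upper bound.
        eq_rows = sum(1 for v in values if v == key)
        steps.append({"RANGE_HI_KEY": key, "RANGE_ROWS": range_rows, "EQ_ROWS": eq_rows})
        prev = key
    return steps


def estimate(steps, lo, hi):
    """Sum every step whose upper bound falls inside [lo, hi]."""
    return sum(
        s["RANGE_ROWS"] + s["EQ_ROWS"] for s in steps if lo <= s["RANGE_HI_KEY"] <= hi
    )


values = list(range(1, 101))              # made-up column values
steps = build_histogram(values, [25, 50, 75, 100])

lo, hi = 45, 60
estimated = estimate(steps, lo, hi)
actual = sum(1 for v in values if lo <= v <= hi)
print("Estimate", estimated)  # 25: the whole 26..50 step is counted
print("Actual", actual)       # 16
```

The step ending at 50 is counted in full even though only part of it overlaps the requested range, and the rows from 51 to 60 are missed because their step ends at 75. Fewer, wider steps make both effects worse.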
MS SQL Server offers "execution plans". In the picture below I have two queries, and I press Ctrl-L to see the plans.
In my queries I return all records in the first and just the count in the other, using the same table.
Look at the metric marked by the red arrows: the estimated number of rows that will be scanned when the queries are run. In this case, that number is the same regardless of whether you use count(*) or *, which is a case in point!

SQL: Joining two table based on certain description

I have two tables:
I want to add the GTIN from table 2 to table 1 based on brand name. However, I can't use = or LIKE because, as you can see in the highlighted rows, the names do not fully match.
For example:
The second row in table 1 is supposed to get the first GTIN from table 2, because both are Ziagen 300mg tablet. However, everything I tried failed to match all rows correctly.
Postgres has a pg_trgm module described here. Start with a cross join of both tables and calculate similarity(t1.brand, t2.brand), which returns a real number.
Next, filter the results using some heuristic threshold. Then narrow them down to a single best match using the row_number() window function.
The results might not be accurate; you could improve them by taking the similarity of the generic names into account as well.
with cross_similarity(generic1,brand1,gtin,brand2,generic2,sim) as (
select *, similarity(t1.brand, t2.brand) as sim
from t1,
t2
where similarity(t1.brand, t2.brand) > 0
)
, max_similarity as (
select *,
row_number() over (partition by gtin order by sim desc) as best_match_rank
from cross_similarity
)
select * from max_similarity where best_match_rank =1;
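To get a feel for what similarity() measures before trying it on your data, here is a rough Python approximation: split each string into lowercase words, pad each word, collect the three-character substrings, and divide the number of shared trigrams by the number of distinct trigrams in both strings. This is a sketch of the idea, not pg_trgm's exact implementation, so the scores may differ from what Postgres returns. The brand strings and GTINs below are made up.

```python
import re


def trigrams(text):
    """Set of 3-character substrings of each padded, lowercased word."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical rows: brands from table 1, (gtin, brand) pairs from table 2.
t1_brands = ["Ziagen 300mg tablet", "Zovirax 200mg tablet"]
t2 = [("0001", "ZIAGEN TABLETS 300MG"), ("0002", "ZOVIRAX TAB 200MG")]

# Equivalent of "partition by gtin order by sim desc", keeping rank 1.
best = {
    gtin: max(t1_brands, key=lambda brand1: similarity(brand1, brand2))
    for gtin, brand2 in t2
}
print(best)
```

Word order and case do not matter here, which is why "ZIAGEN TABLETS 300MG" still lands on "Ziagen 300mg tablet".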

Count distinct query MS Access

It seems that we cannot use the Count(Distinct column) function in MS Access. I have the following data and expected result, as shown below.
I am looking for an MS Access query that gives the required result.
Data
ID   Name   Category  Person  Office
1    FIL    Global    Ben     london
1    FIL    Global    Ben     london
1    FIL    Overall   Ben     Americas
106  Asset  Global    Ben     london
156  ICICI  Overall   Rimmer  london
156  ICICI  Overall   Rimmer  london
188  UBS    Overall   Rimmer  london
9    Fund   Global    Rimmer  london
Expected Result
Person  Global_Cnt  Overall_Cnt
Ben     2           1
Rimmer  1           2
Use a subquery to select the distinct values from your table.
In the parent query, GROUP BY Person, and use separate Count() expressions for each category. Count() only counts non-Null values, so use IIf() to return 1 for the category of interest and Null otherwise.
SELECT
sub.Person,
Count(IIf(Category = 'Global', 1, Null)) AS Global_Cnt,
Count(IIf(Category = 'Overall', 1, Null)) AS Overall_Cnt
FROM
(
SELECT DISTINCT ID, Category, Person
FROM YourTable
) AS sub
GROUP BY sub.Person;
I was unsure which fields identify your unique values, so chose ID, Category, and Person. The result set from the query matches what you asked for; change the SELECT DISTINCT field list if it doesn't fit with your actual data.
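If you want to try the same shape of query outside Access, here is a sketch against an in-memory SQLite database using Python's sqlite3, loaded with the sample data from the question. It uses CASE WHEN in place of Access's IIf(), which is a substitution for portability; a CASE with no ELSE returns NULL, so Count() skips those rows just as it does with IIf(..., 1, Null).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable "
    "(ID INTEGER, Name TEXT, Category TEXT, Person TEXT, Office TEXT)"
)
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?, ?)",
    [
        (1, "FIL", "Global", "Ben", "london"),
        (1, "FIL", "Global", "Ben", "london"),
        (1, "FIL", "Overall", "Ben", "Americas"),
        (106, "Asset", "Global", "Ben", "london"),
        (156, "ICICI", "Overall", "Rimmer", "london"),
        (156, "ICICI", "Overall", "Rimmer", "london"),
        (188, "UBS", "Overall", "Rimmer", "london"),
        (9, "Fund", "Global", "Rimmer", "london"),
    ],
)

query = """
    SELECT sub.Person,
           COUNT(CASE WHEN sub.Category = 'Global' THEN 1 END)  AS Global_Cnt,
           COUNT(CASE WHEN sub.Category = 'Overall' THEN 1 END) AS Overall_Cnt
    FROM (SELECT DISTINCT ID, Category, Person FROM YourTable) AS sub
    GROUP BY sub.Person
    ORDER BY sub.Person
"""
result = conn.execute(query).fetchall()
print(result)  # [('Ben', 2, 1), ('Rimmer', 1, 2)]
```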
When creating a query in Microsoft Access, you might want to return only distinct or unique values. There are two options in the query's property sheet, "Unique Values" and "Unique Records":
DISTINCT and DISTINCTROW sometimes provide the same results, but there are significant differences:
DISTINCT
DISTINCT checks only the fields listed in the SQL string and then eliminates the duplicate rows. Results of DISTINCT queries are not updateable. They are a snapshot of the data.
DISTINCT queries are similar to Summary or Totals queries (queries using a GROUP BY clause).
DISTINCTROW
DISTINCTROW, on the other hand, checks all fields in the table that is being queried, and eliminates duplicates based on the entire record (not just the selected fields). Results of DISTINCTROW queries are updateable.
The MS Access engine does not support
SELECT count(DISTINCT....) FROM ...
You have to do it like this:
SELECT count(*)
FROM
(SELECT DISTINCT Name FROM table1)
It's a small workaround: you're counting a DISTINCT selection.
select count(column) as guessTable
from
(
select distinct column from Table
)

Query of queries with same field headings - MS Access

I've got a few queries (20+) which all return the following three columns:
Building | Room | Other
all of which are text fields. I'd like to take all of those queries and combine them; so I'd like to see what the queries return as a whole.
For example, if I had a query SELECT Building, Room, Other FROM tblOne WHERE Room=10 along with SELECT Building, Room, Other FROM tblOne WHERE Building=20, how might I combine those two into one? Obviously this is a very simple example and my real queries are much more complicated, so writing them as 1 query is not feasible.
I'd like the above example to output:
Building | Room | Other
-----------------------
20 | 1 | Some Stuff
20 | 10 | Some More
5 | 10 | Some Other
15 | 10 | Some Extra
20 | 5 | Some Text
All the ways I've tried have come up with the error that "Building, Room and Other could refer to more than one table" (aka it doesn't want to combine them under one heading). What is the SQL syntax to fix this?
SELECT Building, Room, Other FROM tblOne WHERE Room=10
UNION ALL
SELECT Building, Room, Other FROM tblOne WHERE Building=20
Combine these two queries with UNION ALL or UNION, like this:
Query 1
SELECT Building, Room, Other FROM tblOne WHERE Room=10
UNION ALL
SELECT Building, Room, Other FROM tblOne WHERE Building=20
Query 2
SELECT Building, Room, Other FROM tblOne WHERE Room=10
UNION
SELECT Building, Room, Other FROM tblOne WHERE Building=20
Notice
The UNION operator is used to combine the result-set of two or more SELECT statements.
Each SELECT statement within the UNION must have the same number of columns. The columns must also have similar data types. Also, the columns in each SELECT statement must be in the same order.
The UNION operator selects only distinct values by default. To allow duplicate values, use UNION ALL.
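The difference between the two is easy to see on the sample rows from the question. The sketch below runs both queries against an in-memory SQLite database through Python's sqlite3 rather than Access, so treat it as an illustration of UNION semantics in general. The values are quoted because the question says all three columns are text fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblOne (Building TEXT, Room TEXT, Other TEXT)")
conn.executemany(
    "INSERT INTO tblOne VALUES (?, ?, ?)",
    [
        ("20", "1", "Some Stuff"),
        ("20", "10", "Some More"),
        ("5", "10", "Some Other"),
        ("15", "10", "Some Extra"),
        ("20", "5", "Some Text"),
    ],
)

by_room = "SELECT Building, Room, Other FROM tblOne WHERE Room = '10'"
by_building = "SELECT Building, Room, Other FROM tblOne WHERE Building = '20'"

# UNION ALL keeps the row that satisfies both conditions twice.
union_all_rows = conn.execute(f"{by_room} UNION ALL {by_building}").fetchall()

# UNION removes the duplicate.
union_rows = conn.execute(f"{by_room} UNION {by_building}").fetchall()

print(len(union_all_rows))  # 6
print(len(union_rows))      # 5
```

The row for building 20, room 10 matches both queries, so UNION ALL returns it twice. The five-row output shown in the question corresponds to UNION.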