How to aggregate data stored column-wise in a matrix table - sql

I have a table, Ellipses (...), represent multiple columns of a similar type
TABLE: diagnosis_info
COLUMNS: visit_id,
patient_diagnosis_code_1 ...
patient_diagnosis_code_100 -- char(100) with a value of ‘0’ or ‘1’
How do I find the most common diagnosis_code? There are 101 columns including the visit_id. The table is like a matrix table of 0s and 1s. How do I write something that can dynamically account for all the columns and count all the rows where the value is 1?
What I would normally do is not feasable as there are too many columns:
SELECT COUNT(patient_diagnostic_code_1), COUNT(patient_diagnostic_code_2),... FROM diagnostic_info WHERE patient_diagnostic_code_1 = ‘1’ and patient_diagnostic_code_2 = ‘1’ and ….
Then even if I typed all that out how would I select which column had the highest count of values = 1. The table is more column oriented instead of row oriented.

Unfortunately your data design is bad from the start. Instead it could be as simple as:
patient_id, visit_id, diagnosis_code
where a patient with 1 dignostic code would have 1 row, a patient with 100 diagnostic codes 100 rows and vice versa. At any given time you could transpose this into the format you presented (what is called a pivot or cross tab). Also in some databases, for example postgreSQL, you could put all those diagnostic codes into an array field, then it would look like:
patient_id, visit_id, diagnosis_code (data type -bool or int- array)
Now you need the reverse of it which is called unpivot. On some databases like SQL server there is UNPIVOT as an example.
Without knowing what your backend this, you could do that with an ugly SQL like:
select code, pdc
from
(
select 1 as code, count(*) as pdc
from myTable where patient_diagnosis_code_1=1
union
select 2 as code, count(*) as pdc
from myTable where patient_diagnosis_code_2=1
union
...
select 100 as code, count(*) as pdc
from myTable where patient_diagnosis_code_100=1
) tmp
order by pdc desc, code;
PS: This would return all the codes with their frequency ordered from most to least. You could limit to get 1 to get the max (with ties in case there are more than one code to match the max).

Related

SQL: Joining two table based on certain description

I have two tables:
And I want to add GTIN from table 2 to table 1 based on brand name. Though I cant use = or like because as you see in highlighted row they are not fully matched.
For example
Second row in table 1, suppose to have first GTIN from table 2 because both are Ziagen 300mg tablet. However all of what I tried failed to match all row correctly.
Postgres has a pg_trgm module described here. Start with a cross join joining both tables and calculate the similarity(t1.brand,t2.brand) function, which returns the real number.
Next filter the results based on some heuristic number. Then narrow down with choosing single best match using row_number() window function.
The results might be not accurate, you could improve it by taking generic similarity into account as well.
with cross_similarity(generic1,brand1,gtin,brand2,generic2,sim) as (
select *, similarity(t1.brand, t2.brand) as sim
from t1,
t2
where similarity(t1.brand, t2.brand) > 0
)
, max_similarity as (
select *,
row_number() over (partition by gtin order by sim desc) as best_match_rank
from cross_similarity
)
select * from max_similarity where best_match_rank =1;

Design select SQL query

I have three values expected in a table case, Serious, Non-Serious, Unknown for each case_id
select case_id, case_seriousness
from case;
I have to build a SQL query which should show one row per case_id.
If there are rows for a case_id with multiple values, then only one row should appear based on priority - Serious, Non-Serious then Unknown.
e.g. Serious is in one row rest of four rows have Non-Serious or Unknown then Serious will be he value to show in one record.
If there are records with Non-serious and Unknown then Non-Serious should appear.
So Priorities will be like from S, NS and UK
You can use the analytical function as follows:
select case_id, case_seriousness
from
(select case_id, case_seriousness,
row_number() over (partition by case_id
order by case case_seriousness
when 'Serious' then 1
when 'Non-Serious' then 2
else 3
end ) as rn
from case)
where rn = 1;
Alternatively, You can also use DECODE instead of CASE..WHEN

Table whose columns are random samples from an original column

I have numerical data ​​related to clinical information from people with a particular disease recorded in an specific column ('Lab') from Table A.
I need to get a Table B with 30 rows and 50 columns.
The columns of Table B should be random samples from the values ​​contained in column 'Lab' (nearly 3300 registers).
I am able to get a table with one column using:
SELECT Lab FROM Table_A sample (1) WHERE Lab IS NOT NULL;
Is it possible make a query using the SELECT command that results in Table B with all its 50 columns without the need of getting its columns one by one?
You can use the RAND operator with ORDER BY, like this:
SELECT * FROM Table_A ORDER BY RAND( ) LIMIT 0 , 30

SQL PIVOT, JOIN, and aggregate function to generate report

I am working on creating a report which will incorporate data across 4 different tables. For this question, I have consolidated the data into 2 tables and am stuck trying to figure out exactly how to create this report using PIVOT.
The report will hold the top 5 strengths of an employee based on the Clifton StrengthsFinder assessment.
This is the table with the Names of the Clifton Strengths (34 rows total):
As mentioned, each employee has 5 strengths:
I would like to use PIVOT to generate a table which will ultimately look like this:
With a twist, I don't need the Team Name as a Row, it should be a column. The Count at the bottom and Themes at the top (Executing, Influencing, etc) can be ignored.
The columns of the table I'm trying to output are PersonFk, PersonName, TeamName, Achiever, Arranger, etc... (34 Strengths) and each row of the table with Values (personfk, name, team, 1 if person has the strength, 0 otherwise). This table should be SQL, not excel (sorry, just the best example I have on hand without spending an hour learning how to use Paint or something).
I'm not very familiar with aggregate functions, and am just now getting into the more complex SQL queries..
Interesting. Pivot requires an aggregate function to build the 1-5 values, so you'll have to rewrite your inner query probably as a union, and use MAX() as a throwaway aggregate function (throwaway because every record should be unique, so MAX, MIN, SUM, etc. should all return the same value:
SELECT * INTO #newblah from (
SELECT PersonFK, 1 as StrengthIndex, Strength1 as Strength from blah UNION ALL
SELECT PersonFK, 2 as StrengthIndex, Strength2 as Strength from blah UNION ALL
SELECT PersonFK, 3 as StrengthIndex, Strength3 as Strength from blah UNION ALL
SELECT PersonFK, 4 as StrengthIndex, Strength4 as Strength from blah UNION ALL
SELECT PersonFK, 5 as StrengthIndex, Strength5 as Strength from blah
)
Then
select PersonFK, [Achiever], [Activator], [Adaptability], [Analytical], [Belief] .....
from
(
select PersonFK, StrengthIndex, Strength
from #newblah
) pivotsource
pivot
(
max(StrengthIndex)
for Strength in ([Achiever], [Activator], [Adaptability], [Analytical], [Belief] ..... )
) myPivot;
The result of that query should be able to be joined back to your other tables to get the Person name, Strength Category, and Team name, so I'll leave that to you. You don't HAVE to do the first join as a temporary table -- you could do it as a subselect inline, so this could all be done in one SQL query, but that seems painful if you can avoid it.
Use one of the techniques from this post. For your purposes, you may want to use a delimiter in your column name to the tune of 'StrngthTheme-Strength', which your web report will then parse for the headers.

SQL Server Sum multiple rows into one - no temp table

I would like to see a most concise way to do what is outlined in this SO question: Sum values from multiple rows into one row
that is, combine multiple rows while summing a column.
But how to then delete the duplicates. In other words I have data like this:
Person Value
--------------
1 10
1 20
2 15
And I want to sum the values for any duplicates (on the Person col) into a single row and get rid of the other duplicates on the Person value. So my output would be:
Person Value
-------------
1 30
2 15
And I would like to do this without using a temp table. I think that I'll need to use OVER PARTITION BY but just not sure. Just trying to challenge myself in not doing it the temp table way. Working with SQL Server 2008 R2
Simply put, give me a concise stmt getting from my input to my output in the same table. So if my table name is People if I do a select * from People on it before the operation that I am asking in this question I get the first set above and then when I do a select * from People after the operation, I get the second set of data above.
Not sure why not using Temp table but here's one way to avoid it (tho imho this is an overkill):
UPDATE MyTable SET VALUE = (SELECT SUM(Value) FROM MyTable MT WHERE MT.Person = MyTable.Person);
WITH DUP_TABLE AS
(SELECT ROW_NUMBER()
OVER (PARTITION BY Person ORDER BY Person) As ROW_NO
FROM MyTable)
DELETE FROM DUP_TABLE WHERE ROW_NO > 1;
First query updates every duplicate person to the summary value. Second query removes duplicate persons.
Demo: http://sqlfiddle.com/#!3/db7aa/11
All you're asking for is a simple SUM() aggregate function and a GROUP BY
SELECT Person, SUM(Value)
FROM myTable
GROUP BY Person
The SUM() by itself would sum up the values in a column, but when you add a secondary column and GROUP BY it, SQL will show distinct values from the secondary column and perform the aggregate function by those distinct categories.