Oracle SQL: Pivot on multiple columns/fields - sql

I want to create a table where the entries of various columns are 'pivoted' to column headers. The table is for reporting purposes - my users want to query data via Excel (using Microsoft Query), and the problem is that doing the pivoting in Excel makes the files impractically large and slow even for moderately sized data sets (~100k data points).
Consider the following example:
CREATE TABLE tt
(
"COMMODITY" VARCHAR2(4000 BYTE),
"MARKET" VARCHAR2(4000 BYTE),
"BID_ASK" VARCHAR2(4000 BYTE),
"PRICE" NUMBER
);
INSERT INTO tt VALUES ('Gold','US','Ask',1.1);
INSERT INTO tt VALUES ('Gold','US','Bid',1);
INSERT INTO tt VALUES ('Gold','EU','Ask',1.2);
INSERT INTO tt VALUES ('Gold','EU','Bid',1.1);
INSERT INTO tt VALUES ('Oil','US','Ask',11);
INSERT INTO tt VALUES ('Oil','US','Bid',10);
INSERT INTO tt VALUES ('Oil','EU','Ask',12);
INSERT INTO tt VALUES ('Oil','EU','Bid',11);
The output that I want to achieve would be something like (the exact column headers don't matter much):
COMMODITY  'US_Bid'  'US_Ask'  'EU_Bid'  'EU_Ask'
Gold       1         1.1       1.1       1.2
Oil        10        11        11        12
Now it is straightforward to pivot a single column:
SELECT * FROM
(
SELECT * FROM tt
)
PIVOT
(
SUM(PRICE)
FOR MARKET IN ('US','EU')
)
Which gives:
COMMODITY  BID_ASK  'US'  'EU'
Gold       Bid      1     1.1
Oil        Bid      10    11
Oil        Ask      11    12
Gold       Ask      1.1   1.2
According to my research there is no syntax for directly pivoting multiple columns. There are some related questions (here, here or here), but I could not find a direct answer to my problem there. So I came up with the following solution:
SELECT * FROM
(
SELECT COMMODITY, CONCAT(CONCAT(MARKET,'_'),BID_ASK) AS MARKET_BID_ASK, PRICE FROM tt
)
PIVOT
(
SUM(PRICE)
FOR MARKET_BID_ASK IN ('US_Bid','US_Ask','EU_Bid','EU_Ask')
)
This produces exactly the desired output. However, I do not consider it a practical solution, as the number of values I have to enter grows far too fast (in my real data set I want to pivot several fields at once, each of which has many distinct values). I know that dynamic pivots exist, but I am not sure whether they will work with Excel, and I would also like to keep the syntax as simple as possible, because the users will define queries on their own (I just want to provide a template query that they can adapt). So I tried to query the field names in the IN clause:
SELECT * FROM
(
SELECT COMMODITY, CONCAT(CONCAT(MARKET,'_'),BID_ASK) AS MARKET_BID_ASK, PRICE FROM tt
)
PIVOT
(
SUM(PRICE)
FOR MARKET_BID_ASK IN
(
SELECT DISTINCT CONCAT(CONCAT(MARKET,'_'),BID_ASK) AS MARKET_BID_ASK FROM tt
)
)
I think such a solution could be practical, because one could still constrain the queried values with LIKE conditions in the subquery instead of listing every concatenated option. However, this query fails with "ORA-00936: missing expression", even though subqueries should be legal here according to the documentation I found.

You can pivot on multiple columns by enclosing the columns, and the sets of values, in parentheses:
SELECT * FROM
(
SELECT * FROM tt
)
PIVOT
(
SUM(PRICE)
FOR (MARKET, BID_ASK)
IN (('US', 'Bid') us_bid, ('US', 'Ask') us_ask, ('EU', 'Bid') eu_bid, ('EU', 'Ask') eu_ask)
);
COMMODITY  US_BID  US_ASK  EU_BID  EU_ASK
---------- ------- ------- ------- -------
Gold       1       1.1     1.1     1.2
Oil        10      11      11      12
but the value pairs still have to be known when the query is parsed, and this doesn't scale well if you have a lot of combinations of values.
Your only alternative is dynamic SQL, as you suspected, unless you can get Excel to process the result of an XML pivot - which I don't think is possible. With dynamic SQL you could perhaps have a function that does the query and pivot and returns a ref cursor, if Excel finds that easier to handle than the pivot query.
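To illustrate the ref-cursor route, here is a minimal, untested sketch (the function name is made up, and it assumes Oracle 11g or later for PIVOT and LISTAGG): build the IN list from the distinct values, then open the pivot query dynamically.
CREATE OR REPLACE FUNCTION pivoted_prices RETURN SYS_REFCURSOR
IS
  l_in_list VARCHAR2(32767);
  l_rc      SYS_REFCURSOR;
BEGIN
  -- build the IN list from the data, e.g. 'EU_Ask' AS "EU_Ask", 'EU_Bid' AS "EU_Bid", ...
  SELECT LISTAGG('''' || market || '_' || bid_ask || ''' AS "' || market || '_' || bid_ask || '"', ', ')
         WITHIN GROUP (ORDER BY market, bid_ask)
    INTO l_in_list
    FROM (SELECT DISTINCT market, bid_ask FROM tt);

  -- open the pivot query with the generated IN list
  OPEN l_rc FOR
    'SELECT * FROM (
       SELECT commodity, market || ''_'' || bid_ask AS market_bid_ask, price FROM tt
     )
     PIVOT (SUM(price) FOR market_bid_ask IN (' || l_in_list || '))';

  RETURN l_rc;
END;
/
Whether that helps depends on whether your users can call a function returning a cursor from Microsoft Query more easily than they can maintain the IN list by hand.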

Related

How to aggregate data stored column-wise in a matrix table

I have a table; the ellipses (...) represent multiple columns of a similar type:
TABLE: diagnosis_info
COLUMNS: visit_id,
patient_diagnosis_code_1 ...
patient_diagnosis_code_100 -- char(100) with a value of '0' or '1'
How do I find the most common diagnosis_code? There are 101 columns including the visit_id. The table is like a matrix table of 0s and 1s. How do I write something that can dynamically account for all the columns and count all the rows where the value is 1?
What I would normally do is not feasible, as there are too many columns:
SELECT COUNT(patient_diagnosis_code_1), COUNT(patient_diagnosis_code_2), ... FROM diagnosis_info WHERE patient_diagnosis_code_1 = '1' AND patient_diagnosis_code_2 = '1' AND ...
Then, even if I typed all that out, how would I select which column had the highest count of values = 1? The table is column-oriented rather than row-oriented.
Unfortunately your data design is bad from the start. Instead it could be as simple as:
patient_id, visit_id, diagnosis_code
where a patient with 1 diagnostic code would have 1 row and a patient with 100 diagnostic codes would have 100 rows, and so on. At any given time you could transpose this into the format you presented (which is called a pivot or cross tab). Also, in some databases, for example PostgreSQL, you could put all those diagnostic codes into an array field, in which case it would look like:
patient_id, visit_id, diagnosis_code (data type: bool or int array)
What you need now is the reverse of that, which is called an unpivot. Some databases, such as SQL Server, provide an UNPIVOT operator for this (see the sketch below).
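For instance, if the backend happens to be SQL Server, a minimal UNPIVOT sketch (shown with only three of the hundred code columns for brevity, and assuming the columns really hold '0'/'1') could look like this:
SELECT u.code_column, COUNT(*) AS pdc
FROM diagnosis_info
UNPIVOT
(
    code_value FOR code_column IN
        (patient_diagnosis_code_1, patient_diagnosis_code_2, patient_diagnosis_code_3 /* ... up to _100 */)
) AS u
WHERE u.code_value = '1'
GROUP BY u.code_column
ORDER BY pdc DESC;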
Without knowing what your backend is, you could do it with an ugly SQL query like:
select code, pdc
from
(
  select 1 as code, count(*) as pdc
  from myTable where patient_diagnosis_code_1 = '1'   -- the columns are char, so compare against '1'
  union all
  select 2 as code, count(*) as pdc
  from myTable where patient_diagnosis_code_2 = '1'
  union all
  ...
  select 100 as code, count(*) as pdc
  from myTable where patient_diagnosis_code_100 = '1'
) tmp
order by pdc desc, code;
PS: This returns all the codes with their frequencies, ordered from most to least frequent. You could limit the result to one row to get the maximum (keeping ties in case more than one code matches the maximum), as sketched below.
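A sketch of that last limiting step, assuming SQL Server syntax (other backends would use LIMIT or FETCH FIRST 1 ROW WITH TIES):
select top (1) with ties code, pdc
from
(
  -- same derived table as in the query above
  select 1 as code, count(*) as pdc from myTable where patient_diagnosis_code_1 = '1'
  union all
  select 2 as code, count(*) as pdc from myTable where patient_diagnosis_code_2 = '1'
  -- ... and so on up to _100
) tmp
order by pdc desc;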

SQL: Joining two table based on certain description

I have two tables:
I want to add the GTIN from table 2 to table 1 based on the brand name. However, I can't use = or LIKE because, as you can see in the highlighted row, the names do not fully match.
For example, the second row in table 1 is supposed to get the first GTIN from table 2, because both are Ziagen 300mg tablet. However, everything I have tried fails to match all rows correctly.
Postgres has a pg_trgm module, described here. Start with a cross join of both tables and compute similarity(t1.brand, t2.brand), which returns a real number.
Next, filter the results by some heuristic threshold, then narrow down to a single best match per GTIN with the row_number() window function.
The results might not be accurate; you could improve them by taking the similarity of the generic names into account as well.
with cross_similarity (generic1, brand1, gtin, brand2, generic2, sim) as (
    select *, similarity(t1.brand, t2.brand) as sim
    from t1, t2
    where similarity(t1.brand, t2.brand) > 0
)
, max_similarity as (
    select *,
           row_number() over (partition by gtin order by sim desc) as best_match_rank
    from cross_similarity
)
select * from max_similarity where best_match_rank = 1;
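One hedged performance note: a bare similarity(...) > 0 predicate cannot use an index, but the % operator (which applies pg_trgm's similarity threshold, see set_limit()/show_limit()) can use a trigram index, so for larger tables a sketch like this may help (the index name is made up):
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX t2_brand_trgm_idx ON t2 USING gin (brand gin_trgm_ops);

SELECT t1.brand, t2.brand, similarity(t1.brand, t2.brand) AS sim
FROM   t1
JOIN   t2 ON t1.brand % t2.brand;  -- index-assisted candidate filter instead of similarity(...) > 0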

SQL Server where condition on column with separated values

I have a table with a column that can have values separated by ",".
Example of the group column:
id   group
1    10,20,30
2    280
3    20
I want to create a SELECT with a WHERE condition on the group column, so that searching for 20, for example, returns rows 1 and 3, and searching for 20,280 returns rows 1 and 2.
Can you help me please?
As pointed out in the comments, storing multiple values in a single column is not a good idea.
Coming to your question, you can use one of the split-string functions from here to split the comma-separated values into a table and then query them:
create table #temp
(
id int,
columnss varchar(100)
)
insert into #temp
values
(1,'10,20,30'),
(2, '280'),
(3, '20')
select *
from #temp
cross apply
(
select * from dbo.SplitStrings_Numbers(columnss,',')
)b
where item in (20)
id  columnss  Item
1   10,20,30  20
3   20        20
The short answer is: don't do it.
Instead normalize your tables to at least 3NF. If you don't know what database normalization is, you need to do some reading.
If you absolutely have to do it (e.g. this is a legacy system and you cannot change the table structure), there are several articles on string splitting with T-SQL, and at least a couple have done extensive benchmarks of the available methods (e.g. see: http://sqlperformance.com/2012/07/t-sql-queries/split-strings).
Since you only want to search, you don't really need to split the strings, so you can write something like:
SELECT id, list
FROM t
WHERE ','+list+',' LIKE '%,'+@searchValue+',%'
Where t(id int, list varchar(max)) is the table to search and @searchValue is the value you are looking for. If you need to search for more than one value, you have to put those values in a table and use a join or subquery.
E.g. if s(searchValue varchar(max)) is the table of values to search then:
SELECT distinct t.id, t.list
FROM t INNER JOIN s
ON ','+t.list+',' LIKE '%,'+s.searchValue+',%'
If you need to pass those search values from ADO.NET, consider table-valued parameters.
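A minimal sketch of that table-valued-parameter route, assuming SQL Server 2008 or later (the type, procedure, and parameter names here are made up):
CREATE TYPE dbo.SearchValueList AS TABLE (searchValue varchar(max));
GO
CREATE PROCEDURE dbo.FindLists
    @values dbo.SearchValueList READONLY
AS
BEGIN
    SELECT DISTINCT t.id, t.list
    FROM t
    INNER JOIN @values s
        ON ',' + t.list + ',' LIKE '%,' + s.searchValue + ',%';
END
On the ADO.NET side, the parameter is passed as a DataTable (or DbDataReader) with SqlDbType.Structured.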

SQL PIVOT, JOIN, and aggregate function to generate report

I am working on creating a report which will incorporate data across 4 different tables. For this question, I have consolidated the data into 2 tables and am stuck trying to figure out exactly how to create this report using PIVOT.
The report will hold the top 5 strengths of an employee based on the Clifton StrengthsFinder assessment.
This is the table with the Names of the Clifton Strengths (34 rows total):
As mentioned, each employee has 5 strengths:
I would like to use PIVOT to generate a table which will ultimately look like this:
One twist: I don't need the Team Name as a row; it should be a column. The Count at the bottom and the Themes at the top (Executing, Influencing, etc.) can be ignored.
The columns of the table I'm trying to output are PersonFk, PersonName, TeamName, Achiever, Arranger, ... (the 34 strengths), and each row holds the values (personfk, name, team, then 1 if the person has the strength, 0 otherwise). The result should be a SQL table, not Excel (sorry, it's just the best example I have on hand without spending an hour learning how to use Paint or something).
I'm not very familiar with aggregate functions, and am just now getting into the more complex SQL queries.
Interesting. PIVOT requires an aggregate function to build the 1-5 values, so you'll have to rewrite your inner query, probably as a union, and use MAX() as a throwaway aggregate function (throwaway because every record should be unique, so MAX, MIN, SUM, etc. should all return the same value):
SELECT * INTO #newblah FROM (
    SELECT PersonFK, 1 as StrengthIndex, Strength1 as Strength FROM blah UNION ALL
    SELECT PersonFK, 2 as StrengthIndex, Strength2 as Strength FROM blah UNION ALL
    SELECT PersonFK, 3 as StrengthIndex, Strength3 as Strength FROM blah UNION ALL
    SELECT PersonFK, 4 as StrengthIndex, Strength4 as Strength FROM blah UNION ALL
    SELECT PersonFK, 5 as StrengthIndex, Strength5 as Strength FROM blah
) AS src;  -- the derived table needs an alias in SQL Server
Then
select PersonFK, [Achiever], [Activator], [Adaptability], [Analytical], [Belief] .....
from
(
select PersonFK, StrengthIndex, Strength
from #newblah
) pivotsource
pivot
(
max(StrengthIndex)
for Strength in ([Achiever], [Activator], [Adaptability], [Analytical], [Belief] ..... )
) myPivot;
The result of that query can then be joined back to your other tables to get the person name, strength category, and team name, so I'll leave that to you (a sketch follows below). You don't HAVE to materialize the first step as a temporary table -- you could do it as an inline subselect, so this could all be done in one SQL query, but that seems painful if you can avoid it.
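For completeness, a sketch of that join-back, with made-up table and column names for the person data (Person, PersonPK, PersonName, TeamName), just to show the shape:
;WITH pivoted AS
(
    select PersonFK, [Achiever], [Activator], [Adaptability] -- ..., remaining strengths
    from (select PersonFK, StrengthIndex, Strength from #newblah) pivotsource
    pivot (max(StrengthIndex) for Strength in ([Achiever], [Activator], [Adaptability] /* , ... */)) myPivot
)
SELECT per.PersonName, per.TeamName, p.*
FROM pivoted p
JOIN Person per ON per.PersonPK = p.PersonFK;
If you need 1/0 flags rather than the 1-5 index, wrap each strength column in CASE WHEN [Achiever] IS NULL THEN 0 ELSE 1 END.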
Use one of the techniques from this post. For your purposes, you may want to use a delimiter in your column names, something like 'StrengthTheme-Strength', which your web report can then parse for the headers.

Pivot table where column names are based on row values [duplicate]

Does any one know how to create crosstab queries in PostgreSQL?
For example I have the following table:
Section   Status     Count
A         Active     1
A         Inactive   2
B         Active     4
B         Inactive   5
I would like the query to return the following crosstab:
Section   Active   Inactive
A         1        2
B         4        5
Is this possible?
Install the additional module tablefunc once per database, which provides the function crosstab(). Since Postgres 9.1 you can use CREATE EXTENSION for that:
CREATE EXTENSION IF NOT EXISTS tablefunc;
Improved test case
CREATE TABLE tbl (
section text
, status text
, ct integer -- "count" is a reserved word in standard SQL
);
INSERT INTO tbl VALUES
('A', 'Active', 1), ('A', 'Inactive', 2)
, ('B', 'Active', 4), ('B', 'Inactive', 5)
, ('C', 'Inactive', 7); -- ('C', 'Active') is missing
Simple form - not fit for missing attributes
crosstab(text) with 1 input parameter:
SELECT *
FROM crosstab(
'SELECT section, status, ct
FROM tbl
ORDER BY 1,2' -- needs to be "ORDER BY 1,2" here
) AS ct ("Section" text, "Active" int, "Inactive" int);
Returns:
 Section | Active | Inactive
---------+--------+----------
 A       |      1 |        2
 B       |      4 |        5
 C       |      7 |            -- !!
No need for casting and renaming.
Note the incorrect result for C: the value 7 is filled in for the first column. Sometimes, this behavior is desirable, but not for this use case.
The simple form is also limited to exactly three columns in the provided input query: row_name, category, value. There is no room for extra columns like in the 2-parameter alternative below.
Safe form
crosstab(text, text) with 2 input parameters:
SELECT *
FROM crosstab(
'SELECT section, status, ct
FROM tbl
ORDER BY 1,2' -- could also just be "ORDER BY 1" here
, $$VALUES ('Active'::text), ('Inactive')$$
) AS ct ("Section" text, "Active" int, "Inactive" int);
Returns:
 Section | Active | Inactive
---------+--------+----------
 A       |      1 |        2
 B       |      4 |        5
 C       |        |        7   -- !!
Note the correct result for C.
The second parameter can be any query that returns one row per attribute matching the order of the column definition at the end. Often you will want to query distinct attributes from the underlying table like this:
'SELECT DISTINCT attribute FROM tbl ORDER BY 1'
That's in the manual.
Since you have to spell out all columns in a column definition list anyway (except for pre-defined crosstabN() variants), it is typically more efficient to provide a short list in a VALUES expression like demonstrated:
$$VALUES ('Active'::text), ('Inactive')$$
Or (not in the manual):
$$SELECT unnest('{Active,Inactive}'::text[])$$ -- short syntax for long lists
I used dollar quoting to make quoting easier.
You can even output columns with different data types with crosstab(text, text) - as long as the text representation of the value column is valid input for the target type. This way you might have attributes of different kind and output text, date, numeric etc. for respective attributes. There is a code example at the end of the chapter crosstab(text, text) in the manual.
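As a hedged illustration of that mixed-type case (a hypothetical table tbl2(row_id int, attr text, val text), where val holds the text representation of values of different underlying types):
SELECT *
FROM crosstab(
     $$SELECT row_id, attr, val FROM tbl2 ORDER BY 1$$
   , $$VALUES ('name'::text), ('born'), ('weight')$$
   ) AS ct (row_id int, name text, born date, weight numeric);
Each val is converted to the type declared for its column in the column definition list.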
db<>fiddle here
Effect of excess input rows
Excess input rows are handled differently - duplicate rows for the same ("row_name", "category") combination - (section, status) in the above example.
The 1-parameter form fills in available value columns from left to right. Excess values are discarded.
Earlier input rows win.
The 2-parameter form assigns each input value to its dedicated column, overwriting any previous assignment.
Later input rows win.
Typically, you don't have duplicates to begin with. But if you do, carefully adjust the sort order to your requirements - and document what's happening.
Or get fast arbitrary results if you don't care. Just be aware of the effect.
Advanced examples
Pivot on Multiple Columns using Tablefunc - also demonstrating mentioned "extra columns"
Dynamic alternative to pivot with CASE and GROUP BY
\crosstabview in psql
Postgres 9.6 added this meta-command to its default interactive terminal psql. You can run the query you would use as first crosstab() parameter and feed it to \crosstabview (immediately or in the next step). Like:
db=> SELECT section, status, ct FROM tbl \crosstabview
Similar result as above, but it's a representation feature on the client side exclusively. Input rows are treated slightly differently, hence ORDER BY is not required. Details for \crosstabview in the manual. There are more code examples at the bottom of that page.
Related answer on dba.SE by Daniel Vérité (the author of the psql feature):
How do I generate a pivoted CROSS JOIN where the resulting table definition is unknown?
SELECT section,
SUM(CASE status WHEN 'Active' THEN count ELSE 0 END) AS active, --here you pivot each status value as a separate column explicitly
SUM(CASE status WHEN 'Inactive' THEN count ELSE 0 END) AS inactive --here you pivot each status value as a separate column explicitly
FROM t
GROUP BY section
You can use the crosstab() function of the additional module tablefunc - which you have to install once per database. Since PostgreSQL 9.1 you can use CREATE EXTENSION for that:
CREATE EXTENSION tablefunc;
In your case, I believe it would look something like this:
CREATE TABLE t (Section CHAR(1), Status VARCHAR(10), Count integer);
INSERT INTO t VALUES ('A', 'Active', 1);
INSERT INTO t VALUES ('A', 'Inactive', 2);
INSERT INTO t VALUES ('B', 'Active', 4);
INSERT INTO t VALUES ('B', 'Inactive', 5);
SELECT row_name AS Section,
category_1::integer AS Active,
category_2::integer AS Inactive
FROM crosstab('select section::text, status, count::text from t',2)
AS ct (row_name text, category_1 text, category_2 text);
DB Fiddle here:
Everything works: https://dbfiddle.uk/iKCW9Uhh
Without CREATE EXTENSION tablefunc; you get this error: https://dbfiddle.uk/j8W1CMvI
ERROR: function crosstab(unknown, integer) does not exist
LINE 4: FROM crosstab('select section::text, status, count::text fro...
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
Solution with JSON aggregation:
CREATE TEMP TABLE t (
section text
, status text
, ct integer -- don't use "count" as column name.
);
INSERT INTO t VALUES
('A', 'Active', 1), ('A', 'Inactive', 2)
, ('B', 'Active', 4), ('B', 'Inactive', 5)
, ('C', 'Inactive', 7);
SELECT section,
       (obj ->> 'Active')::int   AS active,
       (obj ->> 'Inactive')::int AS inactive
FROM  (
    SELECT section, json_object_agg(status, ct) AS obj
    FROM   t
    GROUP  BY section
) x;
Sorry this isn't complete, because I can't test it here, but it should point you in the right direction. I'm translating from something I use that builds a similar query:
select mt.section, mt1.count as Active, mt2.count as Inactive
from mytable mt
left join (select section, count from mytable where status='Active')mt1
on mt.section = mt1.section
left join (select section, count from mytable where status='Inactive')mt2
on mt.section = mt2.section
group by mt.section,
mt1.count,
mt2.count
order by mt.section asc;
The code I'm working from is:
select m.typeID, m1.highBid, m2.lowAsk, m1.highBid - m2.lowAsk as diff, 100*(m1.highBid - m2.lowAsk)/m2.lowAsk as diffPercent
from mktTrades m
left join (select typeID,MAX(price) as highBid from mktTrades where bid=1 group by typeID)m1
on m.typeID = m1.typeID
left join (select typeID,MIN(price) as lowAsk from mktTrades where bid=0 group by typeID)m2
on m1.typeID = m2.typeID
group by m.typeID,
m1.highBid,
m2.lowAsk
order by diffPercent desc;
which will return a typeID, the highest price bid and the lowest price asked and the difference between the two (a positive difference would mean something could be bought for less than it can be sold).
There's a different dynamic method that I've devised, one that employs a dynamic record type (a temp table, built via an anonymous procedure) and JSON. This may be useful for an end user who can't install the tablefunc/crosstab extension, but can still create temp tables or run anonymous procedures.
The example assumes all the crosstab columns are the same type (INTEGER), but the number of columns is data-driven and variadic. That said, JSON aggregate functions do allow for mixed data types, so there's potential for innovation via the use of embedded composite (mixed) types.
The real meat of it can be reduced to one step if you want to statically define the record type inside the JSON recordset function (via nested SELECTs that emit a composite type); a sketch follows the fiddle link below.
dbfiddle.uk
https://dbfiddle.uk/N1EzugHk
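I have not run the fiddle, but a minimal sketch of that 'one step' variant, assuming it boils down to json_object_agg() plus json_to_record() (PostgreSQL 9.4+) with a static column list, could look like this against the tbl test case defined earlier in this thread:
SELECT s.section, r."Active", r."Inactive"
FROM (
    SELECT section, json_object_agg(status, ct) AS obj
    FROM   tbl
    GROUP  BY section
) s
CROSS JOIN LATERAL json_to_record(s.obj) AS r ("Active" int, "Inactive" int);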
The crosstab function is available in the tablefunc extension. You'll have to create this extension once per database:
CREATE EXTENSION tablefunc;
You can use the code below to create a pivot table using crosstab:
create table test_crosstab (
    section text,
    status  text,
    count   numeric
);

insert into test_crosstab values
  ('A', 'Active',   1)
, ('A', 'Inactive', 2)
, ('B', 'Active',   4)
, ('B', 'Inactive', 5);

select *
from crosstab(
    'select section
           ,status
           ,count
     from test_crosstab
     order by 1, 2'   -- the 1-parameter form needs the input ordered by row name, then category
) as ctab ("Section" text, "Active" numeric, "Inactive" numeric);