How to pivot a simple table of 2 rows in PostgreSQL?

The expected table is this:

good_days  bad_days
6          25

But I have this table:

day_type   x
bad_days   25
good_days  6
My code is not working:

select *
from (select * from main_table) t
pivot(count(x) for day_type in ('bad_days', 'good_days')) as pivot_table

There are multiple ways to do this.
Use the PostgreSQL extension tablefunc, which provides a crosstab function that accepts a query and pivots its result.
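For example, a minimal crosstab sketch, assuming the main_table from the question and an integer x (the one-argument form fills the output columns in the order the inner query returns the categories, hence the ORDER BY):

CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT good_days, bad_days
FROM crosstab(
    -- crosstab expects three columns: row key, category, value
    $$ SELECT 1, day_type, x FROM main_table ORDER BY 1, 2 $$
) AS ct(row_key int, bad_days int, good_days int);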
You can also write the pivot by hand, which only works if the day_type values are few and known in advance:
WITH cte (day_type, x) AS (
  VALUES ('bad_days', 25), ('good_days', 6)
)
SELECT sum(good_days) AS good_days,
       sum(bad_days)  AS bad_days
FROM (
  (SELECT x AS good_days, 0 AS bad_days
   FROM cte
   WHERE day_type = 'good_days')
  UNION ALL
  (SELECT 0 AS good_days, x AS bad_days
   FROM cte
   WHERE day_type = 'bad_days')
) AS foo
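With the sample values above, this returns the expected single row: good_days = 6 and bad_days = 25.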

A simple method is conditional aggregation:

select sum(x) filter (where day_type = 'bad_days') as bad_days,
       sum(x) filter (where day_type = 'good_days') as good_days
from main_table;
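FILTER is standard SQL but not supported everywhere; the same idea can be written portably with CASE (an equivalent sketch):

select sum(case when day_type = 'bad_days'  then x end) as bad_days,
       sum(case when day_type = 'good_days' then x end) as good_days
from main_table;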

Related

Stacking my conditions in a CASE statement isn't returning all cases for each member

SELECT DISTINCT
    Member_ID,
    CASE
        WHEN a.ASTHMA_MBR = 1 THEN 'ASTHMA'
        WHEN a.COPD_MBR = 1 THEN 'COPD'
        WHEN a.HYPERTENSION_MBR = 1 THEN 'HYPERTENSION'
    END AS DX_FLAG
So a member may have more than one, but my statement is only returning one of them.
I'm using Teradata and trying to convert multiple columns of boolean data into one column. The statement only returns one condition even when members have two or more. I tried using SELECT instead of SELECT DISTINCT and it made no difference.
A CASE expression returns the first matching WHEN and stops, so it can only ever produce one value per row. What you want is a kind of UNPIVOT:
with base_data as
(   -- select the columns you want to unpivot
    select
        member_id
        ,date_col
        -- the aliases will become the final column values
        ,ASTHMA_MBR AS ASTHMA
        ,COPD_MBR AS COPD
        ,HYPERTENSION_MBR AS HYPERTENSION
    from your_table
)
,unpvt as
(
    select member_id, date_col, x, DX_FLAG
    from base_data
    -- now unpivot those columns into rows
    UNPIVOT (x FOR DX_FLAG IN (ASTHMA, COPD, HYPERTENSION)) dt
)
select member_id, DX_FLAG, date_col
from unpvt
-- only show rows where the condition is true
where x = 1
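With this, a member whose ASTHMA_MBR and COPD_MBR are both 1 comes back as two rows, one with DX_FLAG = 'ASTHMA' and one with DX_FLAG = 'COPD', instead of a single collapsed value.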

SQL: Showing column name of calculated result

So I am calculating the max of a series of columns, and I was wondering about the best way to show the name of the result column.
Example Table:
hour1  hour2  hour3
16     10     5
My query looks like this:
(SELECT Max(v)
FROM (VALUES (hour1) , (hour2) , (hour3))
AS VALUE (v)) AS PEAK_VALUE
Note this is in another query.
Desired output:
PEAK_VALUE  PEAK_HOUR
16          hour1
I would also like to do further calculations on the PEAK_VALUE column as well. For example dividing it by 2 for this output:
PEAK_VALUE  HALF_VALUE  PEAK_HOUR
16          8           hour1
You almost got it, though there are some issues with your query syntax: you need to add the column name to the unpivot, and after that use row_number() to find the max value.
SELECT PEAK_VALUE = v,
       HALF_VALUE = v / 2,
       PEAK_HOUR  = c
FROM
(
    SELECT *, rn = row_number() over (order by v desc)
    FROM example
    CROSS APPLY
    (
        VALUES ('hour1', hour1), ('hour2', hour2), ('hour3', hour3)
    ) AS VALUE (c, v)
) v
WHERE rn = 1
Just another option sans row_number()
Example
Select top 1
       Peak_Value = Hour
      ,Half_Value = Hour / 2
      ,Peak_Hour  = Item
From YourTable A
Cross Apply ( values ('hour1', hour1)
                    ,('hour2', hour2)
                    ,('hour3', hour3)
            ) B (Item, Hour)
Order by Hour Desc
Results:

Peak_Value  Half_Value  Peak_Hour
16          8           hour1
Another approach using UNPIVOT.
DECLARE @table table (hour1 int, hour2 int, hour3 int)
insert into @table
values (16, 10, 5)

SELECT top 1 max(val) as peak_value, max(val) / 2 as Half_Value, [hour]
FROM @table
unpivot (val for [hour] in ([hour1], [hour2], [hour3])) as t
group by [hour]
order by max(val) desc

peak_value  Half_Value  hour
16          8           hour1
You can shove a whole unpivoting query inside a CROSS APPLY, and then do further calculations:
SELECT t.*, v.*
FROM yourTable t
CROSS APPLY (
    SELECT TOP (1)
           v.PEAK_VALUE,
           HALF_VALUE = v.PEAK_VALUE / 2,
           v.PEAK_HOUR
    FROM (VALUES
             ('hour1', hour1),
             ('hour2', hour2),
             ('hour3', hour3)
         ) AS v(PEAK_HOUR, PEAK_VALUE)
    ORDER BY PEAK_VALUE DESC
) AS v
This is slightly different from the other answers, in that it calculates the subquery per row and keeps the original columns of yourTable alongside the computed ones.

can't use JOIN with generate_series on Redshift

The generate_series function works as expected on Redshift when used in a simple select statement.
WITH series AS (
SELECT n as id from generate_series (-10, 0, 1) n
) SELECT * FROM series;
-- Works fine
As soon as I add a JOIN condition, Redshift throws:

com.amazon.support.exceptions.ErrorException: Function "generate_series(integer,integer,integer)" not supported
DROP TABLE testing;
CREATE TABLE testing (
id INT
);
WITH series AS (
SELECT n as id from generate_series (-10, 0, 1) n
) SELECT * FROM series S JOIN testing T ON S.id = T.id;
-- Function "generate_series(integer,integer,integer)" not supported.
Redshift Version
SELECT version();
-- PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.1485
Are there any workarounds to make this work?
generate_series is not supported by Redshift; it works only standalone on the leader node.
A workaround is to use row_number() against any table that has a sufficient number of rows:
with
series as (
    select (row_number() over ()) - 11 as id
    from some_table
    limit 11
) ...
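Spelled out against the question's testing table, the workaround looks like this (a sketch; some_table is a placeholder for any table with at least 11 rows):

with series as (
    select (row_number() over ()) - 11 as id
    from some_table
    limit 11
)
select *
from series s
join testing t on s.id = t.id;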
also, this question was asked multiple times already
You are correct that this does not work on Redshift.
The easiest workaround is to create a permanent numbers table "manually" beforehand, e.g. with rows from -1000 to +1000, and then select the range you need from that table (a setup sketch is shown below).
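A one-time setup might look like this (a sketch; some_large_table is a placeholder for any table with at least 2001 rows, used here only because generate_series cannot populate a Redshift table):

create table newtable (num int);

insert into newtable
select (row_number() over ()) - 1001  -- rows 1..2001 become -1000 .. +1000
from some_large_table
limit 2001;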
So for your example you would have something like
WITH series AS (
    SELECT num AS id FROM newtable WHERE num BETWEEN -10 AND 0
) SELECT * FROM series S JOIN testing T ON S.id = T.id;
Does that work for you?
Alternatively, if you cannot create the table beforehand or prefer not to, you could use something like this:

with ten_numbers as (
    select 1 as num union select 2 union select 3 union select 4 union select 5
    union select 6 union select 7 union select 8 union select 9 union select 0
)
,generated_numbers AS
(
    SELECT (1000 * t1.num) + (100 * t2.num) + (10 * t3.num) + t4.num - 5000 as gen_num
    FROM ten_numbers AS t1
    JOIN ten_numbers AS t2 ON 1 = 1
    JOIN ten_numbers AS t3 ON 1 = 1
    JOIN ten_numbers AS t4 ON 1 = 1
)
select gen_num from generated_numbers
where gen_num between -10 and 0
order by 1;

SELECT FROM a subquery table consisting of a TRANSFORM ... PIVOT table

I have the following functioning query to create a crosstab/pivot table in Access:

TRANSFORM Sum(y.TNAV) AS TNAV
SELECT y.RecDate
FROM BNYDaily AS y
WHERE (((y.AccName) In ("A","B")) AND y.RecDate >= DateValue("1/1/2013"))
GROUP BY y.RecDate
PIVOT y.AccName;
The problem is that the query returns results with NULL fields, which messes up my calculation. I want to omit rows of this crosstab that have a NULL value in either column:
RecDate    A                 B
....
1/25/2013  1,469,004,032.00  968.63
1/26/2013  1,466,082,304.00
1/28/2013                    973.91
1/29/2013  1,471,277,440.00  971.66
...
I tried the following query, which uses the above query as a subquery, without any luck:
SELECT * FROM
(
    TRANSFORM Sum(y.TNAV) AS TNAV
    SELECT y.RecDate
    FROM BNYDaily AS y
    WHERE (((y.AccName) In ("A","B")) AND y.RecDate >= DateValue("1/1/2013"))
    GROUP BY y.RecDate
    PIVOT y.AccName;
) AS t
WHERE t.A IS NOT NULL AND t.B IS NOT NULL
which oddly doesn't run in Access and returns an error. If I instead query from the crosstab saved as a named query, it works. Any ideas?
Instead of "squeezing out" the rows containing Nulls from the results of the crosstab, how about eliminating the rows that produce the Nulls from the source of the crosstab? I just tried the following and it seems to work:
TRANSFORM Sum(y.TNAV) AS TNAV
SELECT y.RecDate
FROM
(
    SELECT RecDate, AccName, TNAV
    FROM BNYDaily
    WHERE RecDate IN (SELECT RecDate FROM BNYDaily WHERE AccName = "A")
      AND RecDate IN (SELECT RecDate FROM BNYDaily WHERE AccName = "B")
) AS y
WHERE (((y.AccName) In ("A","B")) AND y.RecDate >= DateValue("1/1/2013"))
GROUP BY y.RecDate
PIVOT y.AccName;

Finding Covariance using SQL

dt          indx_nm1  indx_val1  indx_nm2  indx_val2
2009-06-08  ABQI      1001.2     ACNACTR   300.05
2009-06-09  ABQI      1002.12    ACNACTR   341.19
2009-06-10  ABQI      1011.4     ACNACTR   382.93
2009-06-11  ABQI      1015.43    ACNACTR   362.63
I have a table that looks like the above (but with hundreds of rows, with dates from 2009 to 2013). Is there a way to calculate the covariance, i.e. the sum of (indx_val1 - avg(indx_val1)) * (indx_val2 - avg(indx_val2)) divided by the total number of rows, looping over the entire table, and return a single value for cov(ABQI, ACNACTR)?
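In standard notation, that is the population covariance (an equivalent restatement of the formula above):

$$\operatorname{cov}(X,Y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$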
Since you have aggregates operating over two different groups, you will need two different queries. The main one groups by dt to get your row values per date. The other query has to perform AVG() and COUNT() aggregates across the whole rowset.
To use them both at the same time, you need to JOIN them together. But since there's no actual relation between the two queries, it is a cartesian product and we'll use a CROSS JOIN. Effectively, that joins every row of the main query with the single row retrieved by the aggregate query. You can then perform the arithmetic in the SELECT list, using values from both:
So, building on the query from your earlier question:
SELECT
indxs.*,
((indx_val2 - indx_val2_avg) * (indx_val1 - indx_val1_avg)) / total_rows AS cv
FROM (
SELECT
dt,
MAX(CASE WHEN indx_nm = 'ABQI' THEN indx_nm ELSE NULL END) AS indx_nm1,
MAX(CASE WHEN indx_nm = 'ABQI' THEN indx_val ELSE NULL END) AS indx_val1,
MAX(CASE WHEN indx_nm = 'ACNACTR' THEN indx_nm ELSE NULL END) AS indx_nm2,
MAX(CASE WHEN indx_nm = 'ACNACTR' THEN indx_val ELSE NULL END) AS indx_val2
FROM table1 a
GROUP BY dt
) indxs
CROSS JOIN (
/* Join against a query returning the AVG() and COUNT() across all rows */
SELECT
'ABQI' AS indx_nm1_aname,
AVG(CASE WHEN indx_nm = 'ABQI' THEN indx_val ELSE NULL END) AS indx_val1_avg,
'ACNACTR' AS indx_nm2_aname,
AVG(CASE WHEN indx_nm = 'ACNACTR' THEN indx_val ELSE NULL END) AS indx_val2_avg,
COUNT(*) AS total_rows
FROM table1 b
WHERE indx_nm IN ('ABQI','ACNACTR')
/* And it is a cartesian product */
) aggs
WHERE
indx_nm1 IS NOT NULL
AND indx_nm2 IS NOT NULL
ORDER BY dt
Here's a demo, building on your earlier one: http://sqlfiddle.com/#!6/2ec65/14
Here is a scalar-valued function that performs a covariance calculation on any two-column table formatted as XML.
To test: compile the function, then execute the Alpha Test block from the comment.
CREATE Function [dbo].[Covariance](@XmlTwoValueSeries xml)
returns float
as
Begin
/*
-- -----------
-- ALPHA TEST
-- -----------
IF object_id('tempdb..#_201610101706') is not null DROP TABLE #_201610101706

select *
into #_201610101706
from
(
    SELECT '2016-01' Period, 1.24 col0, 2.20 col1
    union
    SELECT '2016-02' Period, 1.6 col0, 3.20 col1
    union
    SELECT '2016-03' Period, 1.0 col0, 2.77 col1
    union
    SELECT '2016-04' Period, 1.9 col0, 2.98 col1
) A

DECLARE @XmlTwoValueSeries xml
SET @XmlTwoValueSeries = (
    SELECT col0, col1 FROM #_201610101706
    FOR XML PATH('Output')
)

SELECT dbo.Covariance(@XmlTwoValueSeries) Covariance
*/
    declare @returnvalue numeric(20,10)

    set @returnvalue =
    (
        -- population covariance: SUM((x - avg(x)) * (y - avg(y))) / n
        SELECT SUM((x - xAvg) * (y - yAvg)) / MAX(n) AS [COVAR(x,y)]
        from
        (
            SELECT 1E * x x,
                   AVG(1E * x) OVER (PARTITION BY (SELECT NULL)) xAvg,
                   1E * y y,
                   AVG(1E * y) OVER (PARTITION BY (SELECT NULL)) yAvg,
                   COUNT(*) OVER (PARTITION BY (SELECT NULL)) n
            FROM
            (
                -- shred the XML back into two float columns
                SELECT e.c.value('(col0/text())[1]', 'float') x,
                       e.c.value('(col1/text())[1]', 'float') y
                FROM @XmlTwoValueSeries.nodes('Output') e(c)
            ) A
        ) A
    )

    return @returnvalue
end
GO
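To apply the function to a table shaped like the one in the question, shape the two value columns as XML with FOR XML PATH('Output') and pass the result in (a sketch; your_index_table is a hypothetical placeholder):

DECLARE @xml xml

SET @xml = (
    SELECT indx_val1 AS col0, indx_val2 AS col1  -- element names must be col0/col1, as the function expects
    FROM your_index_table
    FOR XML PATH('Output')
)

SELECT dbo.Covariance(@xml) AS Covariance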