Reshape query results in pure PostgreSQL


I have a SQL query result table like this:
Date,metric,value
1,x,2
2,x,3
2,y,5
3,y,8
3,z,9
I would like to get the sum by day for each metric (filling with 0 when a metric is not present):
Date,x,y,z
1,2,0,0
2,3,5,0
3,0,8,9
I do not know the names of the metrics beforehand. At the moment I'm loading the results into Python and reshaping them with pandas, but surely there is a PostgreSQL way to do it.
How can I achieve the above in PostgreSQL?

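You can use conditional aggregation with a CASE WHEN expression. Using sum with else 0 gives the per-day sum for each metric and fills missing combinations with 0:
select date,
  sum(case when metric = 'x' then value else 0 end) as x,
  sum(case when metric = 'y' then value else 0 end) as y,
  sum(case when metric = 'z' then value else 0 end) as z
from tablename
group by date
order by date
The metric names have to be written into the query, so when they are not known beforehand the statement must be generated first.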
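Since the metric names are not known in advance, the conditional aggregation above can be generated from the data. The following is only a minimal sketch, run against an in-memory SQLite database so that it is self-contained. The table name metric_table and the helper pivot_sum_sql are assumptions made for illustration. The generated statement uses only standard SQL and should carry over to PostgreSQL, but that is not tested here.

```python
import sqlite3


def pivot_sum_sql(metrics, table="metric_table"):
    """Build one sum(case ...) column per metric name (hypothetical helper)."""
    cols = []
    for m in metrics:
        literal = m.replace("'", "''")   # escape for use as a string literal
        ident = m.replace('"', '""')     # escape for use as a quoted identifier
        cols.append(
            f"sum(case when metric = '{literal}' then value else 0 end) as \"{ident}\""
        )
    return "select date, " + ", ".join(cols) + f" from {table} group by date order by date"


conn = sqlite3.connect(":memory:")
conn.execute("create table metric_table (date integer, metric text, value integer)")
conn.executemany(
    "insert into metric_table values (?, ?, ?)",
    [(1, "x", 2), (2, "x", 3), (2, "y", 5), (3, "y", 8), (3, "z", 9)],
)

# Step 1: discover the metric names. Step 2: run the generated pivot query.
metrics = [r[0] for r in conn.execute("select distinct metric from metric_table order by metric")]
rows = conn.execute(pivot_sum_sql(metrics)).fetchall()
print(rows)  # [(1, 2, 0, 0), (2, 3, 5, 0), (3, 0, 8, 9)]
```

The two-step shape (one query to list the metrics, one generated query to pivot) is the design choice here, because a plain SQL statement needs a fixed set of output columns.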

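You can use crosstab, which is provided by the tablefunc extension (create extension if not exists tablefunc):
select * from crosstab('select date, metric, value from metric_table order by date, metric',
'select metric from metric_table group by metric order by metric')
as ct(date integer, x integer, y integer, z integer);
But beware: the "as ct(date integer, x integer, y integer, z integer)" part must be built dynamically before running the query, based on the result set of "select metric from metric_table group by metric order by metric", and its columns must be listed in that same order. Combinations with no matching row come back as NULL, so wrap the output columns in coalesce(..., 0) if you need zeros.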
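The column definition list that crosstab needs could be assembled on the client before the query is sent. This is a hedged sketch of the string building only. The helper names quote_ident and crosstab_sql are hypothetical, and the metric names are assumed to be passed in exactly the order the category query returns them. Nothing here connects to PostgreSQL.

```python
def quote_ident(name):
    """Quote a name as an SQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def crosstab_sql(metrics, table="metric_table", value_type="integer"):
    """Build a crosstab statement for the given metric names (hypothetical helper).

    `metrics` must be in the order returned by the category query below.
    """
    col_defs = ", ".join(f"{quote_ident(m)} {value_type}" for m in metrics)
    source = f"select date, metric, value from {table} order by date, metric"
    categories = f"select distinct metric from {table} order by metric"
    # Dollar quoting avoids having to escape quotes inside the two inner queries.
    return (
        f"select * from crosstab($$ {source} $$, $$ {categories} $$) "
        f"as ct(date integer, {col_defs})"
    )


statement = crosstab_sql(["x", "y", "z"])
print(statement)
```

In practice the list passed to crosstab_sql would come from first running the category query, and the resulting statement would then be executed as a second step.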

Related

Stacking my conditions in a CASE statement it's not returning all cases for each member

SELECT DISTINCT
Member_ID,
CASE
WHEN a.ASTHMA_MBR = 1 THEN 'ASTHMA'
WHEN a.COPD_MBR = 1 THEN 'COPD'
WHEN a.HYPERTENSION_MBR = 1 THEN 'HYPERTENSION'
END AS DX_FLAG
A member may have more than one condition, but my statement only returns one of them.
I'm using Teradata and trying to convert multiple columns of boolean data into one column. The statement returns only one condition even when a member has two or more. I tried using SELECT instead of SELECT DISTINCT and it made no difference.

This is a kind of UNPIVOT:
with base_data as
( -- select the columns you want to unpivot
select
member_id
,date_col
-- the aliases will be the final column value
,ASTHMA_MBR AS ASTHMA
,COPD_MBR AS COPD
,HYPERTENSION_MBR AS HYPERTENSION
from your_table
)
,unpvt as
(
select member_id, date_col, x, DX_FLAG
from base_data
-- now unpivot those columns into rows
UNPIVOT(x FOR DX_FLAG IN (ASTHMA, COPD, HYPERTENSION)
) dt
)
select member_id, DX_FLAG, date_col
from unpvt
-- only show rows where the condition is true
where x = 1
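The reason the original statement returns a single row per member is that a CASE expression stops at the first WHEN branch that matches. The sketch below illustrates this with an in-memory SQLite database, which has no UNPIVOT operator, so a UNION ALL stands in for the unpivot step. The table name members and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table members (member_id integer, asthma_mbr integer, "
    "copd_mbr integer, hypertension_mbr integer)"
)
# Member 1 has two conditions, member 2 has one.
conn.executemany("insert into members values (?, ?, ?, ?)", [(1, 1, 1, 0), (2, 0, 0, 1)])

# CASE yields only the first matching branch, so member 1 loses COPD.
case_rows = conn.execute("""
    select member_id,
           case when asthma_mbr = 1 then 'ASTHMA'
                when copd_mbr = 1 then 'COPD'
                when hypertension_mbr = 1 then 'HYPERTENSION'
           end as dx_flag
    from members
    order by member_id
""").fetchall()

# One SELECT per flag column, stacked with UNION ALL, yields one row per condition.
union_rows = conn.execute("""
    select member_id, 'ASTHMA' as dx_flag from members where asthma_mbr = 1
    union all
    select member_id, 'COPD' from members where copd_mbr = 1
    union all
    select member_id, 'HYPERTENSION' from members where hypertension_mbr = 1
    order by member_id, dx_flag
""").fetchall()

print(case_rows)   # [(1, 'ASTHMA'), (2, 'HYPERTENSION')]
print(union_rows)  # [(1, 'ASTHMA'), (1, 'COPD'), (2, 'HYPERTENSION')]
```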

PANDAS divide for a given value with groupby

I want to divide each 'Value' in this dataset by the Value at TIME=='1970-Q1' grouped by LOCATION.
This is how I'd implement the logic in SQL
WITH first_year AS (
SELECT LOCATION, Value
FROM `table`
WHERE TIME = '1970-Q1'
)
SELECT t.LOCATION, t.TIME, ((t.Value / f.Value) * 100) normValue
FROM `table` t,
first_year f
WHERE t.LOCATION = f.LOCATION
ORDER BY LOCATION, TIME ASC
However, you can also assume that we can sort (ascending) the column TIME within the group and take the first value. It's always a string like 'YYYY-QX'

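Try it with transform. where keeps only the Value of the 1970-Q1 rows (everything else becomes NaN), and transform('first') broadcasts the first non-null value of each LOCATION group back to every row of that group:
df['normValue'] = df['Value'] / df['Value'].where(df['TIME'] == '1970-Q1').groupby(df['LOCATION']).transform('first') * 100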

Convert and sum variable a, grouping by variable b

I would like to convert the variable ar66 from nvarchar to numeric and sum it for each value of the variable ar5.
I wrote the following code, but it does not work:
select top(10) ar5, (
select
case
when isnumeric(q1.ar66) = 1 then
cast(q1.ar66 AS numeric)
else
NULL
end
AS 'ar66_numeric'
from rmb_loan q1)
from rmb_loan q2
group by q2.ar5
Do you have any suggestion to solve the problem?

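Does this do what you want?
select top (10) q2.ar5, sum(try_convert(numeric(38, 6), q2.ar66)) as ar66_sum
from rmb_loan q2
group by q2.ar5;
try_convert (SQL Server 2012 and later) returns NULL for values that cannot be converted, and sum ignores NULLs. When using select top you should normally have an order by clause.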

HPE Vertica live aggregate projection example for user retention

create table events(
id char(36) PRIMARY KEY,
game_id varchar(24) not null,
user_device_id char(36) not null,
event_name varchar(100) not null,
generated_at timestamp with time zone not null
);
SELECT
events.generated_at::DATE AS time_stamp,
COUNT(DISTINCT (
CASE WHEN
events.event_name = 'new_user' THEN events.user_device_id
END
)
) as new_users,
COUNT(DISTINCT (
CASE WHEN
future_events.event_name <> 'new_user' THEN future_events.user_device_id
END
)
) as returned_users,
COUNT(DISTINCT (
CASE WHEN
future_events.event_name <> 'new_user' THEN future_events.user_device_id
END
)) / COUNT(DISTINCT (
CASE WHEN
events.event_name = 'new_user' THEN events.user_device_id
END
))::float as retention
FROM events
LEFT JOIN events AS future_events ON
events.user_device_id = future_events.user_device_id AND
events.generated_at = future_events.generated_at - interval '1 day' AND
events.game_id = future_events.game_id
GROUP BY
time_stamp
ORDER BY
time_stamp;
I am trying to get the Day N user retention (N being any number from 1 to 7) with the SQL query above. I am new to HPE Vertica and have not been able to come up with the optimal live aggregate projection statement, and I understand that a projection can significantly improve the performance of the query.

A live aggregate projection won't help with a join query.
Instead, you can create a regular projection, segmented and sorted by the join columns, to improve performance:
CREATE PROJECTION events_p1 (
id,
game_id ENCODING RLE,
user_device_id ENCODING RLE,
event_name,
generated_at ENCODING RLE
) AS
SELECT id,
game_id,
user_device_id,
event_name,
generated_at
FROM events
ORDER BY generated_at,
game_id,
user_device_id
SEGMENTED BY hash(generated_at,game_id,user_device_id) ALL NODES KSAFE 1;

Summary Stats and Corresponding Dates - SQL Server

I was hoping someone perhaps could help. This problem was presented to me recently and I thought it would be easy, but (personally) found it a bit of a struggle. I can do it in Excel and SSRS - but I was curious if I was able to do it in SQL Server...
I would like to create a set of summary statistics (Max, Min) for a dataset. Easy enough... But I wanted to associate the corresponding date with those values.
Here is what my data looks like:
I have yearly data (not exactly - but beside the point) and I produce a pivoted summary like this using a series of CASE WHEN statements. This is fine - the output is seen on the right (above).
Each time I output this data, I like to provide a summary of all the historic data (I only show the most recent data for the sake of brevity). So the question is: how do I take an output like the one shown below (on different dates) and provide a summary data set like the one I have on the right?
So - a little background. I have already managed to join the Min and Max values using a UNION and that bit is fine. The tricky bit (I think) is how to form an INNER JOIN, using a sub query, with the Max or Min result values to return the corresponding Max or Min date, for each Type? Now it is highly likely that I am being a bit of an idiot and missing something obvious....but... Would really appreciate any help from anyone...
Many thanks in advance

This query will do the job for every TYPE:
SELECT
Description, [CAR], [CAT], [MAT], [EAT], [PAR], [MAR], [FAR], [MOT], [LOT], [COT], [ROT]
FROM
(SELECT
unpvt.TYPE
,unpvt.Description
,unpvt.value
FROM (
SELECT
t.TYPE
,CONVERT(sql_variant,MAX(maxResult.MAX_RESULT)) as MAX_RESULT
,CONVERT(sql_variant,MIN(minResult.MIN_RESULT)) as MIN_RESULT
,CONVERT(sql_variant,MAX(CASE WHEN maxResult.MAX_RESULT IS NOT NULL THEN t.DATE ELSE NULL END)) as MAX_DATE
,CONVERT(sql_variant,MIN(CASE WHEN minResult.MIN_RESULT IS NOT NULL THEN t.DATE ELSE NULL END)) as MIN_DATE
FROM
table_name t -- You need to set your table name
LEFT JOIN (SELECT
TYPE
,MIN(RESULT) as MIN_RESULT
FROM
table_name -- You need to set your table name
GROUP BY
TYPE) minResult
on minResult.TYPE = t.TYPE
and minResult.MIN_RESULT = t.RESULT
LEFT JOIN (SELECT
TYPE
,MAX(RESULT) as MAX_RESULT
FROM
table_name -- You need to set your table name
GROUP BY
TYPE) maxResult
on maxResult.TYPE = t.TYPE
and maxResult.MAX_RESULT = t.RESULT
GROUP BY
t.TYPE) U
unpivot (
value
for Description in (MAX_RESULT, MIN_RESULT, MAX_DATE, MIN_DATE)
) unpvt) P
PIVOT
(
MAX(value)
FOR TYPE IN ([CAR], [CAT], [MAT], [EAT], [PAR], [MAR], [FAR], [MOT], [LOT], [COT], [ROT])
)AS PVT
CONVERT(sql_variant, ...) casts the columns to a common data type. The UNPIVOT operator requires this because all the columns being unpivoted must have the same type.

It is possible to use the PIVOT operator if you are on SQL Server 2005 or later, but the raw data for the pivot needs to be in a specific format, and the query I came up with is ugly:
WITH minmax AS (
SELECT TYPE, RESULT, [date]
, row_number() OVER (partition BY TYPE ORDER BY TYPE, RESULT) a
, row_number() OVER (partition BY TYPE ORDER BY TYPE, RESULT DESC) d
FROM t)
SELECT info
, cam = CASE charindex('date', info)
WHEN 0 THEN cast(cast(cam AS int) AS varchar(50))
ELSE cast(cam AS varchar(50))
END
, car = CASE charindex('date', info)
WHEN 0 THEN cast(cast(car AS int) AS varchar(50))
ELSE cast(car AS varchar(50))
END
, cat = CASE charindex('date', info)
WHEN 0 THEN cast(cast(cat AS int) AS varchar(50))
ELSE cast(cat AS varchar(50))
END
FROM (SELECT TYPE, 'maxres' info, RESULT value FROM minmax WHERE 1 = d
UNION ALL
SELECT TYPE, 'minres' info, RESULT value FROM minmax WHERE 1 = a
UNION ALL
SELECT TYPE, 'maxdate' info , [date] value FROM minmax WHERE 1 = d
UNION ALL
SELECT TYPE, 'mindate' info , [date] value FROM minmax WHERE 1 = a) DATA
PIVOT
(max(value) FOR TYPE IN ([CAM], [CAR], [CAT])) pvt
It's only a proof of concept, so in SQLFiddle I used a reduced set of fake data (3 rows for each of 3 types).
After the data preparation
SELECT TYPE, 'maxres' info, RESULT value FROM minmax WHERE 1 = d
UNION ALL
SELECT TYPE, 'minres' info, RESULT value FROM minmax WHERE 1 = a
UNION ALL
SELECT TYPE, 'maxdate' info , [date] value FROM minmax WHERE 1 = d
UNION ALL
SELECT TYPE, 'mindate' info , [date] value FROM minmax WHERE 1 = a
the value column is implicitly cast to the data type with the higher precedence, in this case datetime (a column cannot hold different data types). To see the data in the intended way an explicit cast is needed, and it is done with the CASE and CAST in
, cam = CASE charindex('date', info)
WHEN 0 THEN cast(cast(cam AS int) AS varchar(50))
ELSE cast(cam AS varchar(50))
END
The CASE checks which kind of row it is by looking for the substring 'date' in the info column. It casts the value back to int for the minres and maxres rows, and in every case casts the result to varchar(50) so that the column has a single data type again.
UPDATE
With sql_variant the CASE/CAST block is not needed (thanks, Ryx5):
WITH minmax AS (
SELECT TYPE, RESULT, [date]
, row_number() OVER (partition BY TYPE ORDER BY TYPE, RESULT) a
, row_number() OVER (partition BY TYPE ORDER BY TYPE, RESULT DESC) d
FROM table_name)
SELECT info
, [CAM], [CAR], [CAT]
FROM (SELECT TYPE, 'maxres' info, cast(RESULT as sql_variant) value
FROM minmax WHERE 1 = d
UNION ALL
SELECT TYPE, 'minres' info, cast(RESULT as sql_variant) value
FROM minmax WHERE 1 = a
UNION ALL
SELECT TYPE, 'maxdate' info , cast([date] as sql_variant) value
FROM minmax WHERE 1 = d
UNION ALL
SELECT TYPE, 'mindate' info , cast([date] as sql_variant) value
FROM minmax WHERE 1 = a) DATA
PIVOT
(max(value) FOR TYPE IN ([CAM], [CAR], [CAT])) pvt
