Pivot Table with Redshift (PostgreSQL) with Count - sql

I'm facing a challenge with Redshift:
I'm trying to dynamically move rows into columns and aggregate by count, however I noticed the pivot table feature is only available from PostgreSQL 9.
Any idea about how to do the following?
index  fruit   color
1      apple   red
2      apple   yellow
2      banana  blue
2      banana  blue
3      banana  blue
3      banana  green
3      pear    green
3      pear    red
to:
index  red  yellow  blue  green
1      1    0       0     0
2      0    1       2     0
3      1    0       1     2
Essentially, I want to group by index and count the occurrences of each color (fruit is not so important, although I'll use it as a filter later).
Note: I might also want to do a binary transformation later on (i.e., 0 if the count is 0 and 1 if it is > 0).
Edit: If the above is not possible, is there any way to do this instead?
index  color   count
1      red     1
1      yellow  0
1      blue    0
1      green   0
2      red     0
2      yellow  1
2      blue    2
2      green   0
3      red     1
3      yellow  0
3      blue    1
3      green   2
(again, red, yellow, blue and green should be dynamic)

For the edit, you could cross join the distinct indexes with the distinct colors and then left join back to the table, so that combinations with no rows get a count of 0:
select x.index, x.color,
       sum(case when y.index is not null then 1 else 0 end) as count  -- 0 when no row matched
from
  (select a.index, b.color
   from (select distinct index from yourTable) a
   cross join (select distinct color from yourTable) b) x  -- every index/color pair
left outer join yourTable y
  on x.index = y.index
 and x.color = y.color
group by x.index, x.color
order by x.index, x.color;
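To check the cross join approach, here is a small sketch that runs the same query through Python's built-in sqlite3 module instead of Redshift (an assumption: the query uses only portable SQL). The column is renamed to idx because INDEX is a keyword in SQLite, and yourTable is a placeholder name.

```python
import sqlite3

# Hypothetical setup: an in-memory SQLite table standing in for the Redshift one.
# The column is called idx here because INDEX is a keyword in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (idx INTEGER, fruit TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        (1, "apple", "red"),
        (2, "apple", "yellow"),
        (2, "banana", "blue"),
        (2, "banana", "blue"),
        (3, "banana", "blue"),
        (3, "banana", "green"),
        (3, "pear", "green"),
        (3, "pear", "red"),
    ],
)

# Every (idx, color) pair from the cross join, left joined back to the data;
# pairs with no matching row keep a NULL y.idx and therefore sum to 0.
QUERY = """
SELECT x.idx, x.color,
       SUM(CASE WHEN y.idx IS NOT NULL THEN 1 ELSE 0 END) AS cnt
FROM (SELECT a.idx, b.color
      FROM (SELECT DISTINCT idx FROM yourTable) a
      CROSS JOIN (SELECT DISTINCT color FROM yourTable) b) x
LEFT OUTER JOIN yourTable y
  ON x.idx = y.idx AND x.color = y.color
GROUP BY x.idx, x.color
ORDER BY x.idx, x.color
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The output has 12 rows (3 indexes times 4 colors), including zero rows such as (1, 'yellow', 0).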

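If PIVOT is not available in Redshift, you can always use a standard pivot query based on conditional aggregation. Note that the colors are hard-coded, so each new color means editing the query: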
SELECT
index,
SUM(CASE WHEN color = 'red' THEN 1 ELSE 0 END) AS red,
SUM(CASE WHEN color = 'yellow' THEN 1 ELSE 0 END) AS yellow,
SUM(CASE WHEN color = 'blue' THEN 1 ELSE 0 END) AS blue,
SUM(CASE WHEN color = 'green' THEN 1 ELSE 0 END) AS green
FROM yourTable
GROUP BY index
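Because the question asks for dynamic colors and for a 0/1 variant, one option is to generate the conditional-aggregation SQL in client code from the distinct colors. The sketch below does that in Python against SQLite rather than Redshift; build_pivot_sql is a hypothetical helper (not part of any library), idx replaces index as above, and swapping SUM for MAX turns each count into a binary flag.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (idx INTEGER, fruit TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        (1, "apple", "red"),
        (2, "apple", "yellow"),
        (2, "banana", "blue"),
        (2, "banana", "blue"),
        (3, "banana", "blue"),
        (3, "banana", "green"),
        (3, "pear", "green"),
        (3, "pear", "red"),
    ],
)


def build_pivot_sql(colors, binary=False):
    """Return a conditional-aggregation pivot with one output column per color.

    With binary=True, MAX(...) yields 1 if any row matched and 0 otherwise.
    """
    agg = "MAX" if binary else "SUM"
    columns = []
    for color in colors:
        literal = color.replace("'", "''")            # escape the string literal
        ident = '"' + color.replace('"', '""') + '"'  # quote the column name
        columns.append(
            f"{agg}(CASE WHEN color = '{literal}' THEN 1 ELSE 0 END) AS {ident}"
        )
    return (
        "SELECT idx, " + ", ".join(columns)
        + " FROM yourTable GROUP BY idx ORDER BY idx"
    )


# The column list comes from the data itself, so new colors need no query edits.
colors = [r[0] for r in conn.execute("SELECT DISTINCT color FROM yourTable ORDER BY color")]
counts = conn.execute(build_pivot_sql(colors)).fetchall()
flags = conn.execute(build_pivot_sql(colors, binary=True)).fetchall()
print(colors)
print(counts)
print(flags)
```

The columns come out in alphabetical order (blue, green, red, yellow) because of the ORDER BY on the distinct colors. Pass an explicit list instead to control the order.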

Related

T-SQL LAG function for returning previous rows with different WHERE condition

I have data like:
table name: "Data"
ID Name Color Value
1 A Blue 1
2 B Red 2
3 A Blue 3
4 B Red 4
5 B Blue 3
6 A Red 4
Can I use the SQL LAG function to get, for each Name row that is Red, the previous value for that name that was Blue (ordering by ID)?
Result set:
ID Name Color Value PreviousValue
2 B Red 2 NULL
4 B Red 4 NULL
6 A Red 4 3
select *
from
(
  select *,
         -- previous row for the same name, but only when this row is Red and that one is not
         case when color = 'Red' and color != lag(color) over (partition by name order by id)
              then lag(value) over (partition by name order by id)
         end as PreviousValue
  from Data
) t
where color = 'Red'
order by id;
ID  Name  Color  Value  PreviousValue
2   B     Red    2      null
4   B     Red    4      null
6   A     Red    4      3
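The LAG query can be reproduced with Python's sqlite3 module, assuming the bundled SQLite is 3.25 or newer (the first release with window functions). SQLite compares strings case-sensitively, so the literal is written as 'Red' to match the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Data (ID INTEGER, Name TEXT, Color TEXT, Value INTEGER)")
conn.executemany(
    "INSERT INTO Data VALUES (?, ?, ?, ?)",
    [
        (1, "A", "Blue", 1),
        (2, "B", "Red", 2),
        (3, "A", "Blue", 3),
        (4, "B", "Red", 4),
        (5, "B", "Blue", 3),
        (6, "A", "Red", 4),
    ],
)

# LAG looks one row back within each Name (ordered by ID); the CASE only keeps
# that previous Value when the current row is Red and the previous one is not.
QUERY = """
SELECT ID, Name, Color, Value, PreviousValue
FROM (
    SELECT *,
           CASE WHEN Color = 'Red'
                 AND Color <> LAG(Color) OVER (PARTITION BY Name ORDER BY ID)
                THEN LAG(Value) OVER (PARTITION BY Name ORDER BY ID)
           END AS PreviousValue
    FROM Data
) t
WHERE Color = 'Red'
ORDER BY ID
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

For ID 2, LAG returns NULL (no earlier B row), the comparison is NULL and the CASE falls through to NULL. For ID 4 the previous B row is also Red, so the result is NULL as well.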

Total column in a pivot example

Check here for background if needed:
Pivoting a table with parametrization
We have 3 tables.
tid_color - parametrization table
--------------------------
ID ColorDescription
--------------------------
1 Green
2 Yellow
3 Red
-------------------------
tid_car - parametrization table
--------------------------
ID CarDescription
-------------------------
1 Car X
2 Car Y
3 Car Z
--------------------------
table_owners_cars
------------------------------------------------
ID CarID ColorID Owner
------------------------------------------------
1 1 1 John
2 1 2 Mary
3 1 3 Mary
4 1 3 Giovanni
5 2 2 Mary
6 3 1 Carl
7 1 1 Hawking
8 1 1 Fanny
------------------------------------------------
CarID is a foreign key to tid_car.
ColorID is a foreign key to tid_color.
If we run:
SELECT tcar.CarDescription, tco.ColorDescription, Count(*) as Total
FROM table_owners_cars tocar
LEFT JOIN tid_color tco ON tco.Id = tocar.ColorId
LEFT JOIN tid_Car tcar ON tcar.Id = tocar.CarId
GROUP BY CarDescription, ColorDescription
the result is:
Id CarDescription ColorDescription Total
1 CarX Green 3
2 CarX Yellow 1
3 CarX Red 1
4 CarY Yellow 1
5 CarZ Green 1
I want to pivot exactly as follows:
---------------------------------------------
Id Car Green Yellow Red Total
---------------------------------------------
1 CarX 3 1 1 5
2 CarY 0 1 0 1
3 CarZ 1 0 0 1
---------------------------------------------
Now we also want, for each car, the total number of its rows in table_owners_cars, shown in parentheses next to the Total in the last column. Some CarX rows have a NULL ColorID (the same can happen with the other cars), and we want the total number of CarX, CarY and CarZ rows, both with and without an assigned ColorID (NULL or 0):
---------------------------------------------------
Id Car Green Yellow Red Violet Total
---------------------------------------------------
1 CarX 3 1 1 0 5 (40)
2 CarY 0 1 0 0 1 (35)
3 CarZ 1 0 0 0 1 (4)
---------------------------------------------------
DESIRED TABLE
One attempt, very similar to the code in the question linked above:
SELECT pvt.CarID, tc.Description AS Car, CONCAT (' [1] as 'Green', [2] as 'Yellow', [3] as 'Red', [1]+[2]+[3] as 'total'', '(', count(*), ')' )
FROM
(SELECT CarID, colorId
FROM table_owners_cars tocar
) p
PIVOT
(
COUNT (ColorId)
FOR ColorId IN ( [1], [2], [3])
) AS pvt
INNER JOIN tid_car tc ON pvt.CarId=tc.Id
group by p.Car
This does not work, and the single quotes are also a nightmare with CONCAT. Thanks in advance.
I just find these queries easier to do with conditional aggregation:
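SELECT tocar.CarId, tcar.CarDescription AS Car,
       SUM(CASE WHEN tco.ColorDescription = 'Green' THEN 1 ELSE 0 END) as Green,
       SUM(CASE WHEN tco.ColorDescription = 'Yellow' THEN 1 ELSE 0 END) as Yellow,
       SUM(CASE WHEN tco.ColorDescription = 'Red' THEN 1 ELSE 0 END) as Red,
       SUM(CASE WHEN tco.ColorDescription IN ('Green', 'Yellow', 'Red') THEN 1 ELSE 0 END) as total_gyr,
       COUNT(*) as total  -- every row for the car, including a NULL ColorId
FROM table_owners_cars tocar
LEFT JOIN tid_color tco ON tco.Id = tocar.ColorId  -- LEFT JOIN keeps rows with no color
JOIN tid_car tcar ON tcar.Id = tocar.CarId
GROUP BY tocar.CarId, tcar.CarDescription;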
I see no reason to combine the two totals into a single string column rather than keeping them in separate integer columns, but you can combine them if you want.
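Here is a sketch of the conditional aggregation run through Python's sqlite3 module on the question's sample tables. Row 9 is a hypothetical extra Car X row with a NULL ColorID, added to show that COUNT(*) still counts it. The total_label column builds the "6 (7)" style string with SQLite's || operator, which avoids nesting quotes inside CONCAT. With the rows as listed, Car X has two Red owners (Mary and Giovanni both have ColorID 3), so its Red count comes out as 2 rather than the 1 shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tid_color (ID INTEGER PRIMARY KEY, ColorDescription TEXT);
CREATE TABLE tid_car (ID INTEGER PRIMARY KEY, CarDescription TEXT);
CREATE TABLE table_owners_cars (ID INTEGER PRIMARY KEY, CarID INTEGER, ColorID INTEGER, Owner TEXT);
INSERT INTO tid_color VALUES (1, 'Green'), (2, 'Yellow'), (3, 'Red');
INSERT INTO tid_car VALUES (1, 'Car X'), (2, 'Car Y'), (3, 'Car Z');
INSERT INTO table_owners_cars VALUES
  (1, 1, 1, 'John'), (2, 1, 2, 'Mary'), (3, 1, 3, 'Mary'), (4, 1, 3, 'Giovanni'),
  (5, 2, 2, 'Mary'), (6, 3, 1, 'Carl'), (7, 1, 1, 'Hawking'), (8, 1, 1, 'Fanny'),
  -- Hypothetical extra row: a Car X with no color assigned.
  (9, 1, NULL, 'Hypothetical');
""")

QUERY = """
SELECT tocar.CarID, tcar.CarDescription AS Car,
       SUM(CASE WHEN tco.ColorDescription = 'Green'  THEN 1 ELSE 0 END) AS Green,
       SUM(CASE WHEN tco.ColorDescription = 'Yellow' THEN 1 ELSE 0 END) AS Yellow,
       SUM(CASE WHEN tco.ColorDescription = 'Red'    THEN 1 ELSE 0 END) AS Red,
       SUM(CASE WHEN tco.ColorDescription IN ('Green', 'Yellow', 'Red') THEN 1 ELSE 0 END) AS total_gyr,
       COUNT(*) AS total,
       -- Combined label, e.g. '6 (7)', built with SQLite's || concatenation.
       SUM(CASE WHEN tco.ColorDescription IN ('Green', 'Yellow', 'Red') THEN 1 ELSE 0 END)
         || ' (' || COUNT(*) || ')' AS total_label
FROM table_owners_cars tocar
LEFT JOIN tid_color tco ON tco.ID = tocar.ColorID
JOIN tid_car tcar ON tcar.ID = tocar.CarID
GROUP BY tocar.CarID, tcar.CarDescription
ORDER BY tocar.CarID
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```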

Conditional formatting on MAX value row

Below is a table:
Partition by ID and capture the row with the MAX value where Role = Red
ID Role HistID Date Style
1 Yellow 101 1/1/17 M
1 Red 101 1/2/17 F
1 Red (Null) 1/5/17 C
2 Blue 101 5/1/17 a
2 Yellow 201 4/1/17 b
2 Red 301 5/5/17 C
3 Yellow (Null)
Reference the rows below:
ID Role HistID Date Style
1 Red (Null) 1/5/17 c
2 Red 301 5/5/17 c
Now, based on those rows, apply a condition:
CASE WHEN HistID IS NOT NULL AND Style = 'C' THEN 'Assigned'
ELSE 'Unassigned'
END AS Status
Output:
ID Role HistID Date Style Status
1 Yellow 101 1/1/17 M Unassigned
1 Red 101 1/2/17 F Unassigned
1 Red (Null) 1/5/17 C Unassigned
2 Blue 101 5/1/17 a Assigned
2 Yellow 201 4/1/17 b Assigned
2 Red 301 5/5/17 C Assigned
3 Yellow (Null) Unassigned
I'm not after the answer so much as wanting to understand and learn the syntax for applying MAX, a CASE expression and the KEEP clause.
Use window functions:
select t.*,
(case when matches_flag > 0 then 'Assigned' else 'Unassigned' end) as status
from (select t.*,
sum(case when role = 'Red' and histid is not null and style = 'C' then 1 else 0 end) over
(partition by id) as matches_flag
from t
) t;
EDIT:
The subquery is not actually needed. I just think it makes the logic easier to follow. You can do:
select t.*,
(case when sum(case when role = 'Red' and histid is not null and style = 'C' then 1 else 0 end) over (partition by id) > 0
then 'Assigned'
else 'Unassigned'
end) as status
from t;
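A quick way to try the subquery-free version is Python's sqlite3 module (assuming SQLite 3.25+ for window functions). Note that this logic flags an ID if any of its Red rows has a HistID and Style 'C'. It does not pick out only the latest Red row, but on this sample data both readings give the same statuses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER, Role TEXT, HistID INTEGER, Date TEXT, Style TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Yellow", 101, "1/1/17", "M"),
        (1, "Red", 101, "1/2/17", "F"),
        (1, "Red", None, "1/5/17", "C"),
        (2, "Blue", 101, "5/1/17", "a"),
        (2, "Yellow", 201, "4/1/17", "b"),
        (2, "Red", 301, "5/5/17", "C"),
        (3, "Yellow", None, None, None),
    ],
)

# The windowed SUM counts qualifying rows per ID and repeats that count on
# every row of the ID; the outer CASE turns the count into a label.
QUERY = """
SELECT ID, Role, HistID, Style,
       CASE WHEN SUM(CASE WHEN Role = 'Red' AND HistID IS NOT NULL AND Style = 'C'
                          THEN 1 ELSE 0 END) OVER (PARTITION BY ID) > 0
            THEN 'Assigned' ELSE 'Unassigned'
       END AS status
FROM t
ORDER BY ID
"""

rows = conn.execute(QUERY).fetchall()
status_by_id = {row[0]: row[-1] for row in rows}
print(status_by_id)
```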

Show results with at least two matching values where one match is set as primary

How do you display the persons who have two matching values, with colors blue and red, where blue has primary set to 1 and red has primary set to 0?
PersonList primary color
person1 1 blue
person1 0 red
person2 1 blue
person3 1 red
person3 0 blue
person4 1 blue
person4 0 red
person4 1 blue
The result should display person1 and person4.
NOTE: This holds as long as blue has primary set to 1 and red has primary set to 0.
So far, this is my query for the result above:
SELECT * FROM person p INNER JOIN COLOR c ON p.person_colorid = c.person_colorid
I have tried the query below, but I know something is wrong with it: it displays red with primary = 1 and also blue with primary = 1.
The person table contains [personList], [person_colorid] and [is_primary], while the color table contains [color] and [person_colorid].
SELECT * FROM person p INNER JOIN COLOR c
ON p.person_colorid = c.person_colorid WHERE c.color IN ('blue', 'red') AND p.is_primary = 1
Use CASE expressions inside SUM to count, for each PersonList, the rows matching each condition, and keep only the PersonList values that match both:
Query
select t.[PersonList] from(
select [PersonList],
sum(case when [color] = 'red' and [primary] = 0 then 1 else 0 end) as [red],
sum(case when [color] = 'blue' and [primary] = 1 then 1 else 0 end) as [blue]
from [your_table_name]
group by [PersonList]
)t
where t.[red] > 0 and t.[blue] > 0;
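The query can be checked with Python's sqlite3 module, which accepts the same [bracket] identifiers as T-SQL. The bracket quoting matters here because PRIMARY is a keyword. The table name your_table_name is kept from the answer as a placeholder, and an ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# [primary] is bracket-quoted because PRIMARY is a keyword.
conn.execute("CREATE TABLE your_table_name (PersonList TEXT, [primary] INTEGER, color TEXT)")
conn.executemany(
    "INSERT INTO your_table_name VALUES (?, ?, ?)",
    [
        ("person1", 1, "blue"),
        ("person1", 0, "red"),
        ("person2", 1, "blue"),
        ("person3", 1, "red"),
        ("person3", 0, "blue"),
        ("person4", 1, "blue"),
        ("person4", 0, "red"),
        ("person4", 1, "blue"),
    ],
)

# Each SUM counts the rows meeting one condition per person; the outer WHERE
# keeps people with at least one row of each kind.
QUERY = """
SELECT t.[PersonList] FROM (
    SELECT [PersonList],
           SUM(CASE WHEN [color] = 'red'  AND [primary] = 0 THEN 1 ELSE 0 END) AS [red],
           SUM(CASE WHEN [color] = 'blue' AND [primary] = 1 THEN 1 ELSE 0 END) AS [blue]
    FROM your_table_name
    GROUP BY [PersonList]
) t
WHERE t.[red] > 0 AND t.[blue] > 0
ORDER BY t.[PersonList]
"""

people = [row[0] for row in conn.execute(QUERY)]
print(people)
```

person3 is excluded because its red row has primary = 1, and person2 has no red row at all.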

Select records with the same name and another field having two values

Given a table of products that are available in different colors,
NAME COLOR
---- -----
pen red
pen blue
pen yellow
box red
mic red
tape blue
How can I find the names of the products that are available in both red and blue (pen), and the names of products available in red, but not in blue (box, mic)?
Fiddle: http://sqlfiddle.com/#!9/021a6/3
I like group by with having for these types of queries.
For both colors:
select name
from t
group by name
having sum(case when color = 'red' then 1 else 0 end) > 0 and
sum(case when color = 'blue' then 1 else 0 end) > 0;
For red but not blue:
select name
from t
group by name
having sum(case when color = 'red' then 1 else 0 end) > 0 and
sum(case when color = 'blue' then 1 else 0 end) = 0;
The conditions in the having clause count the number of rows that match the condition (for each name). So, > 0 means that at least one row matched and = 0 means that no row matched.
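Both HAVING queries can be run against the sample products with Python's sqlite3 module. The table name t is taken from the answer, and ORDER BY name is added only so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("pen", "red"), ("pen", "blue"), ("pen", "yellow"),
        ("box", "red"), ("mic", "red"), ("tape", "blue"),
    ],
)

# At least one red row and at least one blue row per name.
BOTH = """
SELECT name FROM t GROUP BY name
HAVING SUM(CASE WHEN color = 'red'  THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN color = 'blue' THEN 1 ELSE 0 END) > 0
ORDER BY name
"""

# At least one red row and no blue rows per name.
RED_NOT_BLUE = """
SELECT name FROM t GROUP BY name
HAVING SUM(CASE WHEN color = 'red'  THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN color = 'blue' THEN 1 ELSE 0 END) = 0
ORDER BY name
"""

both = [row[0] for row in conn.execute(BOTH)]
red_not_blue = [row[0] for row in conn.execute(RED_NOT_BLUE)]
print(both, red_not_blue)
```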