Pivoting a Redshift table

I think I need to pivot my table, or maybe there is some other function I can use to get the result I am looking for. Below is what my current dataset looks like (I actually have about 15 metrics):
+----+----------+----------+----------------+
| ID | Metric 1 | Metric 2 | Overall Column |
+----+----------+----------+----------------+
| 1  | Red      | Yellow   | Red            |
| 2  | Yellow   | Yellow   | Yellow         |
| 3  | Yellow   |          | Yellow         |
+----+----------+----------+----------------+
The overall column already has logic in SQL to say 'Red' if any of the metrics are Red (even if others are Yellow), and 'Yellow' if any are Yellow. There are also cases where two or more metrics share the same color. What I am looking to do is add a new column that shows specifically which metric (or metrics) caused the overall value of Red or Yellow. What I am thinking of is some sort of pivot that, for each ID, has the metric as one row value and the corresponding color as another (if that makes sense). Then I can apply listagg and join that table back to my original dataset on the ID.
Pivot example (ignore col2 and col3):
+----+--------+------+------+
| ID | col1   | col2 | col3 |
+----+--------+------+------+
| 1  | Red    |      |      |
| 1  | Yellow |      |      |
| 3  | Yellow |      |      |
+----+--------+------+------+
After this I can listagg that table to capture multiple colors and then join it to the original table. The one case I have not handled: if an ID has both a Red and a Yellow metric, listagg would return both, even though the overall value is driven by the Red metric. Hoping the SQL experts can help me out here.

Redshift is currently based on PostgreSQL 8.0.2, so it is missing a lot of functionality we've come to expect from Postgres over the last few years. So any solution involving unnest, arrays or lateral joins is out of the question (I've learned this the hard way).
So barring the availability of all those new-fangled features, you can unpivot the source table and create a set of each id and its metrics by using union all and creating a union for each metric column.
select a.id, metrics.metric
from tbl a
inner join (
    -- one branch per metric column; the columns between metric2 and metric15 follow the same pattern
    select id, metric1 as metric from tbl where metric1 is not null
    union all select id, metric2 from tbl where metric2 is not null
    union all select id, metric15 from tbl where metric15 is not null
) metrics on metrics.id = a.id
order by a.id, metrics.metric
Results
id | metric
---+--------
1  | red
1  | yellow
2  | blue
2  | green
2  | pink
3  | orange
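The unpivot above returns only the colors. To answer the original question (which metric caused the overall value), each UNION ALL branch can also carry the metric's name, and the join can keep only the rows whose color equals the overall value. The following is a minimal sketch of that idea, run against SQLite from Python so it is self-contained. The column names `metric1`, `metric2`, `overall` and the label strings are assumptions for illustration, and SQLite's `group_concat` stands in for Redshift's `listagg`.

```python
import sqlite3

# In-memory stand-in for the question's table (only two metric columns shown).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (id INTEGER, metric1 TEXT, metric2 TEXT, overall TEXT);
    INSERT INTO tbl VALUES
        (1, 'Red',    'Yellow', 'Red'),
        (2, 'Yellow', 'Yellow', 'Yellow'),
        (3, 'Yellow', NULL,     'Yellow');
""")

rows = conn.execute("""
    SELECT a.id, a.overall, group_concat(m.metric_name) AS caused_by
    FROM tbl a
    JOIN (
        -- unpivot: one branch per metric column, tagged with a (hypothetical) label
        SELECT id, 'metric1' AS metric_name, metric1 AS color
        FROM tbl WHERE metric1 IS NOT NULL
        UNION ALL
        SELECT id, 'metric2', metric2
        FROM tbl WHERE metric2 IS NOT NULL
    ) m
      ON m.id = a.id
     AND m.color = a.overall   -- keep only the metrics that match the overall color
    GROUP BY a.id, a.overall
    ORDER BY a.id
""").fetchall()

# group_concat does not guarantee element order, so sort the names in Python.
caused_by = {row_id: sorted(names.split(",")) for row_id, _overall, names in rows}
print(caused_by)
```

Filtering on `m.color = a.overall` also covers the mixed case raised in the question: for ID 1 only the Red metric is listed, even though a Yellow metric exists for the same ID.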

Related

How to write a function to split a string and replace the elements by values from another table?

MariaDB version 10.3.34.
SQL to create the example tables is on this gist.
I have to work with a foreign database on which I have no control. So suggestions to modify the structure of the DB are, sadly, unacceptable. I can add functions, though.
Now, in this database, things can have from 0 to n colors, and the color references are coded as a string of all possible values joined by a | char. I know this is a bad practice, but this is not my db, I can't change it.
things
+-------------+----------+
| name (pkey) | colorsid |
+-------------+----------+
| 'door'      | '20|5'   |
| 'car'       | '10'     |
| 'hammer'    | null     |
| 'box'       | '5'      |
+-------------+----------+
colors
+----+---------+
| id | color   |
+----+---------+
| 5  | 'red'   |
| 10 | 'blue'  |
| 20 | 'black' |
+----+---------+
So the door is black and red, the car is blue, the hammer has no color, and the box is red.
Is there a way to build a thing_has_color function so I could do something like this:
SELECT name from things WHERE thing_has_color( name, 'red' );
The result would be
+--------+
| name |
+--------+
| 'door' |
| 'box' |
+--------+
Performance is not an issue (to a reasonable extent, of course). The DB is expected to contain at most a few tens of colors, and no more than 10 000 things.

MariaDB has a FIND_IN_SET function, which looks up a value in a string of comma-separated values. Just replace the pipes with commas:
SELECT name FROM things
WHERE FIND_IN_SET(
    (SELECT id FROM colors WHERE color = 'red'),
    REPLACE(colorsid, '|', ',')
);
Another option would be to use a regular expression:
SELECT name FROM things
WHERE colorsid REGEXP
    CONCAT('[[:<:]]', (SELECT id FROM colors WHERE color = 'red'), '[[:>:]]');
However, both solutions will be slow, since they can't use an index.

You can join the tables as follows:
SELECT T.name
FROM things T
JOIN colors D
    ON CONCAT('|', T.colorsid, '|') LIKE CONCAT('%|', D.id, '|%')
WHERE D.color = 'red'
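The join above wraps both the stored list and the searched id in `|` delimiters, so that id 5 matches `'20|5'` but would not match a list such as `'15'`. Here is a small sketch of the same idea, run against SQLite from Python with the question's sample data. SQLite uses `||` for concatenation instead of `CONCAT`, and the helper name `things_with_color` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE things (name TEXT PRIMARY KEY, colorsid TEXT);
    CREATE TABLE colors (id INTEGER PRIMARY KEY, color TEXT);
    INSERT INTO things VALUES
        ('door', '20|5'), ('car', '10'), ('hammer', NULL), ('box', '5');
    INSERT INTO colors VALUES (5, 'red'), (10, 'blue'), (20, 'black');
""")

def things_with_color(color):
    """Return the names of things whose pipe-separated colorsid contains the color's id."""
    query = """
        SELECT t.name
        FROM things t
        JOIN colors c
          -- '|20|5|' LIKE '%|5|%': the delimiters on both sides prevent partial-id matches
          ON '|' || t.colorsid || '|' LIKE '%|' || c.id || '|%'
        WHERE c.color = ?
        ORDER BY t.name
    """
    return [name for (name,) in conn.execute(query, (color,))]

print(things_with_color("red"))
```

A thing with a NULL `colorsid` (the hammer) drops out on its own, because concatenating NULL yields NULL and the LIKE comparison is then not true.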

PostgreSQL: Melt table and calculate percentages for different groups

I am trying to create a funnel chart, but my data is in a wide format right now. It has a couple groups that I want to compare (e.g., A and B in the example below) and they are on different scales, so I want to use proportions as well as the raw values.
I have a starting table that looks like this:
| group | One | Two | Three |
|-------|-----|-----|-------|
| A | 100 | 75 | 50 |
| B | 10 | 7 | 6 |
|-------|-----|-----|-------|
I need to get the table to look like this:
| group | stage | count | proportion of stage One |
|-------|-------|-------|-------------------------|
| A | One | 100 | 1 |
| A | Two | 75 | 0.75 |
| A | Three | 50 | 0.5 |
| B | One | 10 | 1 |
| B | Two | 7 | 0.7 |
| B | Three | 6 | 0.6 |
|-------|-------|-------|-------------------------|
The proportion is calculated as each row's value divided by the maximum value for that group. Stage One is always going to be 100%, then Stage Two is the count for that row divided by the max of count for that group.
The best I could do is connect to the database in python and use Pandas to melt the table, but I would really like to keep everything in a SQL script.
I've been fumbling around and making zero progress for too long. Any help is much appreciated.

You can do this with a UNION query, selecting first the values of One, then Two and Three with the appropriate division to get the proportion:
SELECT "group", 'One' AS stage, One, 1 AS proportion
FROM data
UNION ALL
SELECT "group", 'Two', Two, ROUND(1.0*Two/One, 2)
FROM data
UNION ALL
SELECT "group", 'Three', Three, ROUND(1.0*Three/One, 2)
FROM data
ORDER BY "group"
Output:
group | stage | one | proportion
------+-------+-----+-----------
A     | One   | 100 | 1
A     | Two   | 75  | 0.75
A     | Three | 50  | 0.50
B     | One   | 10  | 1
B     | Two   | 7   | 0.70
B     | Three | 6   | 0.60
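For readers who want to try the UNION ALL melt without a PostgreSQL instance, here is a rough sketch against SQLite from Python. It follows the query above, with one assumed addition: a helper column `stage_order` (not in the original answer) so that the stages sort One, Two, Three within each group instead of relying on the order in which the branches happen to be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data ("group" TEXT, one INTEGER, two INTEGER, three INTEGER);
    INSERT INTO data VALUES ('A', 100, 75, 50), ('B', 10, 7, 6);
""")

rows = conn.execute("""
    SELECT "group", 'One' AS stage, one AS count, 1.0 AS proportion, 1 AS stage_order
    FROM data
    UNION ALL
    SELECT "group", 'Two', two, ROUND(1.0 * two / one, 2), 2
    FROM data
    UNION ALL
    SELECT "group", 'Three', three, ROUND(1.0 * three / one, 2), 3
    FROM data
    ORDER BY "group", stage_order   -- stage_order is an assumed helper column
""").fetchall()

for group, stage, count, proportion, _order in rows:
    print(group, stage, count, proportion)
```

Multiplying by `1.0` matters in both PostgreSQL and SQLite: without it, `two / one` is integer division and every proportion below 1 would come out as 0.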

I would recommend a lateral join:
SELECT t."group", v.stage, v.count, v.count * 1.0 / t.one AS proportion
FROM t CROSS JOIN LATERAL
     (VALUES ('One', one),
             ('Two', two),
             ('Three', three)
     ) v(stage, count);
A lateral join should be a little faster than a union all on a small amount of data. As the data gets bigger, only scanning the table once is a bigger win. However, the biggest win is when the "table" is really a more complex query. Then the lateral join can be significantly better in performance.

Flattening edit diffs onto a master record — can I do this more simply or efficiently?

I'm working on a system that needs to track user edits over time. Due to various constraints, we need to keep the original records pristine and merge the edits down into a single row representing the current state.
I'm essentially aiming for the result of "replaying" the edits, so that edited columns show their most recent edited value and unedited columns show the original.
As a simplified example, given a table of original records:
books:
book_id | title | color | year
-------------------------------
1 | First | blue | null
2 | Second | green | null
3 | Third | red | 1992
And a table of edits that have been made to the records where all unchanged values are null:
edits:
edit_id | book_id | title | color | year
----------------------------------------
101 | 1 | Uno | null | 2003
102 | 1 | Ett | teal | null
103 | 2 | null | null | 1999
I'm producing output like:
book_id | title | color | year
-------------------------------
1 | Ett | teal | 2003
2 | Second | green | 1999
3 | Third | red | 1992
My current implementation works as expected (on PostgreSQL 9.6), but I have the sneaking feeling that I may be missing a simpler or more efficient way to go about it:
SELECT
    books.book_id,
    COALESCE(
        (
            array_agg(edits.title ORDER BY edits.edit_id DESC)
                FILTER (WHERE edits.title IS NOT NULL)
        )[1],
        books.title
    ) AS title
    -- [... repeat for other fields ...]
FROM books
LEFT JOIN edits
    ON books.book_id = edits.book_id
GROUP BY books.book_id;
Any thoughts?

If you create an aggregate that returns the last non-null value, you could do it like this:
select b.book_id,
coalesce(last(e.title order by e.edit_id), b.title) as title,
coalesce(last(e.color order by e.edit_id), b.color) as color,
coalesce(last(e.year order by e.edit_id), b.year) as year
from books b
left join edits e on b.book_id = e.book_id
group by b.book_id
order by b.book_id;
See the Postgres Wiki for an implementation of the last() (and first()) function.
This might be faster as it does not have to keep all values in memory just to pick the last one. It only keeps one value in memory during aggregation.
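To check the expected output of the "replay" without installing a custom aggregate, the same result can be reproduced with a different technique: one correlated subquery per column that picks the most recent non-null edit. The sketch below runs it against SQLite from Python with the question's sample data. It is not the `last()` aggregate from the Postgres Wiki, and the helper name `latest` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (book_id INTEGER PRIMARY KEY, title TEXT, color TEXT, year INTEGER);
    CREATE TABLE edits (edit_id INTEGER PRIMARY KEY, book_id INTEGER,
                        title TEXT, color TEXT, year INTEGER);
    INSERT INTO books VALUES
        (1, 'First', 'blue', NULL), (2, 'Second', 'green', NULL), (3, 'Third', 'red', 1992);
    INSERT INTO edits VALUES
        (101, 1, 'Uno', NULL, 2003), (102, 1, 'Ett', 'teal', NULL), (103, 2, NULL, NULL, 1999);
""")

def latest(column):
    """SQL fragment: newest non-null edit of `column`, else the original value.

    `column` comes from the fixed list below, never from user input.
    """
    return f"""COALESCE(
        (SELECT e.{column} FROM edits e
         WHERE e.book_id = b.book_id AND e.{column} IS NOT NULL
         ORDER BY e.edit_id DESC LIMIT 1),
        b.{column}) AS {column}"""

query = f"""
    SELECT b.book_id, {latest('title')}, {latest('color')}, {latest('year')}
    FROM books b
    ORDER BY b.book_id
"""
current = conn.execute(query).fetchall()
print(current)
```

This form needs no GROUP BY, at the cost of one subquery per edited column, so whether it beats the aggregate versions on real data would have to be measured.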

Return all the distinct values of column B in one row for each distinct value in column A

Take the following table:
CREATE TABLE boxes (
box integer,
color character varying,
size integer,
...
);
where both box and color can take non-unique values from a small set.
Querying this table with:
SELECT color, box FROM boxes;
the result will be something like:
+-------+-----+
| color | box |
+-------+-----+
| blue | 2 |
| blue | 3 |
| blue | 4 |
| green | 1 |
| green | 3 |
| red | 1 |
| red | 2 |
| red | 2 |
+-------+-----+
Is it possible to query this table in a manner such that the result has two columns, one with an array (or string, or list) with all the different box values for each distinct color?
The result should be something like this:
+-------+-----------+
| color | box_types |
+-------+-----------+
| blue | {2,3,4} |
| green | {1,3} |
| red | {1,2} |
+-------+-----------+
where the color column must contain unique values, and each row must contain only distinct box numbers in the aggregate column.
Given the non-agnostic character of this question, I would like to collect all the best solutions for the major DBMS. When answering, please specify for which DBMS each query works.

Try the following (SQL Server):
SELECT
    color,
    STUFF(
        (SELECT DISTINCT ',' + CONVERT(varchar(10), box)
         FROM boxes
         WHERE color = a.color
         FOR XML PATH ('')),
        1, 1, '') AS box_types
FROM boxes AS a
GROUP BY color;

Well, in MySQL you can do the following (DISTINCT keeps each box number only once per color, as the question requires):
select color, group_concat(distinct box) from boxes group by color
In Oracle:
select color, wm_concat(box) from boxes group by color
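SQLite offers the same `group_concat` aggregate, including the `DISTINCT` form, so the question's sample data can be checked locally. The following is a minimal sketch from Python. Because the order of the concatenated elements is not guaranteed, the box numbers are sorted in Python after the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE boxes (box INTEGER, color TEXT, size INTEGER);
    INSERT INTO boxes (color, box) VALUES
        ('blue', 2), ('blue', 3), ('blue', 4),
        ('green', 1), ('green', 3),
        ('red', 1), ('red', 2), ('red', 2);
""")

rows = conn.execute("""
    SELECT color, group_concat(DISTINCT box) AS box_types
    FROM boxes
    GROUP BY color
    ORDER BY color
""").fetchall()

# Split the comma-separated string and sort, since element order is unspecified.
box_types = {color: sorted(int(b) for b in boxes.split(",")) for color, boxes in rows}
print(box_types)
```

The duplicate `('red', 2)` row appears only once in the result, which is the "distinct box numbers" requirement from the question.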

First of all, this goes against the principle of normalization; in other words, it's "bad".
However, some DBMSs, such as Microsoft SQL Server, support this with the PIVOT clause (and its opposite, UNPIVOT).
Using your example, this clause lets you create a table like this:
+-------+------+------+------+
| color | box1 | box2 | box3 |
+-------+------+------+------+
| blue | 2 | 3 | 4 |
| green | 1 | 3 | null |
| red | 1 | 2 | null |
+-------+------+------+------+

SQL WHERE Help: Pulling Data Multiple Rows

I want to pull, say, all rows where a user has Color=blue and Color=red. I am interested in these rows to determine which users CHANGED their color from blue to red, or from red to blue.
The general query I have now is below. What is wrong with it, and how can I improve it? Thank you!
Does it return zero results because I am asking for a row whose value is BOTH blue and red at the same time (which is impossible)?
My other worry is that if I use OR instead of AND, I will include rows for users that are blue or red but did NOT change between the two colors.
I want the results to show ONLY rows 1 and 4:
SELECT *
FROM Table a
WHERE a.color='blue'
AND a.color='red'
Table Structure is below
Row | Date | Userid | Session | Color
1 | 11/1 | 001 | 24 | Blue
2 | 11/2 | 002 | 25 | Green
3 | 11/2 | 003 | 26 | Yellow
4 | 11/6 | 001 | 32 | Red

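The glaring problem is the AND: a single row can never have color 'Blue' and color 'Red' at the same time, so the query returns nothing. Use OR instead: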
SELECT *
FROM Table a
WHERE a.color='Blue'
OR a.color='Red'
Beyond that, you will need a field that stores the previous color (a kind of history), because otherwise there is not enough information in the database to properly assess which color a user changed from.
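On the question's worry that OR also returns users who only ever had one of the two colors: one possible way to narrow the result, sketched here as an assumption and not as part of the answer above, is to keep only users for whom both colors occur. The sketch runs against SQLite from Python. The table name `sessions` and column name `row_id` are hypothetical, and the color literals are capitalized to match the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sessions (row_id INTEGER, date TEXT, userid TEXT, session INTEGER, color TEXT);
    INSERT INTO sessions VALUES
        (1, '11/1', '001', 24, 'Blue'),
        (2, '11/2', '002', 25, 'Green'),
        (3, '11/2', '003', 26, 'Yellow'),
        (4, '11/6', '001', 32, 'Red');
""")

rows = conn.execute("""
    SELECT s.row_id, s.userid, s.color
    FROM sessions s
    WHERE s.color IN ('Blue', 'Red')
      AND s.userid IN (
          -- users that have at least one Blue row and at least one Red row
          SELECT userid
          FROM sessions
          WHERE color IN ('Blue', 'Red')
          GROUP BY userid
          HAVING COUNT(DISTINCT color) = 2
      )
    ORDER BY s.row_id
""").fetchall()

print(rows)
```

This only shows that a user had both colors at some point. The direction of the change (blue to red, or red to blue) would still have to be read from the Date column or from a stored history.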
