Pivot for redshift database - sql

I know this question has been asked before, but none of the answers met my requirements, so I am asking it in a new thread.
In Redshift, how can I pivot the data into one row per unique dimension set? For example:
id    Name            Category    count
8660  Iced Chocolate  Coffees     105
8660  Iced Chocolate  Milkshakes  10
8662  Old Monk        Beer        29
8663  Burger          Snacks      18
to
id    Name            Coffees  Milkshakes  Beer  Snacks
8660  Iced Chocolate  105      10          0     0
8662  Old Monk        0        0           29    0
8663  Burger          0        0           0     18
The categories listed above keep changing.
Redshift does not support the PIVOT operator, and a CASE expression would not be of much help (if it would, please suggest how).
How can I achieve this result in Redshift?
(The above is just an example; we would have 1000+ categories, and these categories keep changing.)

I don't think there is an easy way to do that in Redshift.
Also, you say you have more than 1000 categories and the number is growing,
so you need to take into account that there is a limit of 1600 columns per table.
See this link:
http://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_usage.html
You can use CASE, but then you need to write a CASE for each category:
select id,
name,
sum(case when Category='Coffees' then count else 0 end) as Coffees,
sum(case when Category='Milkshakes' then count else 0 end) as Milkshakes,
sum(case when Category='Beer' then count else 0 end) as Beer,
sum(case when Category='Snacks' then count else 0 end) as Snacks
from my_table
group by 1,2;
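Because the categories keep changing, one way to avoid hand-writing every CASE is to read the distinct categories first and generate the query text from them. The sketch below assumes an in-memory SQLite database as a stand-in for Redshift and reuses the question's my_table columns and sample rows; `build_pivot_sql`, `quote_ident` and `quote_literal` are hypothetical helpers written for this sketch. ELSE 0 makes absent categories show 0 rather than NULL, as in the desired output. With 1000+ categories the generated query would still run into the column limit mentioned above.

```python
import sqlite3


def quote_ident(name: str) -> str:
    # Hypothetical helper: double-quote an identifier, doubling embedded double quotes.
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    # Hypothetical helper: single-quote a string literal, doubling embedded single quotes.
    return "'" + value.replace("'", "''") + "'"


def build_pivot_sql(categories):
    # Hypothetical helper: one SUM(CASE ...) column per category; ELSE 0 reports 0 instead of NULL.
    cols = ",\n       ".join(
        f"SUM(CASE WHEN Category = {quote_literal(c)} THEN count ELSE 0 END) AS {quote_ident(c)}"
        for c in categories
    )
    return f"SELECT id, Name,\n       {cols}\nFROM my_table\nGROUP BY id, Name\nORDER BY id"


# SQLite stands in for Redshift here; the rows are the question's example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, Name TEXT, Category TEXT, count INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        (8660, "Iced Chocolate", "Coffees", 105),
        (8660, "Iced Chocolate", "Milkshakes", 10),
        (8662, "Old Monk", "Beer", 29),
        (8663, "Burger", "Snacks", 18),
    ],
)

# Step 1: discover the categories that exist right now.
categories = [r[0] for r in conn.execute("SELECT DISTINCT Category FROM my_table ORDER BY Category")]

# Step 2: generate the pivot query from that list and run it.
sql = build_pivot_sql(categories)
print(sql)
rows = conn.execute(sql).fetchall()
```

The generated text is ordinary SQL, so the same string could be sent to Redshift from any client once the category list has been fetched.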
Another option is to load the table into, for example, R and reshape it there with the cast function:
cast(data, name ~ category)
and then upload the data back to S3 or Redshift.
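The same client-side reshape works in other languages too. Below is a minimal standard-library Python sketch, offered as a swapped-in alternative to R's cast rather than a translation of it, that turns long (id, Name, Category, count) rows into one wide row per (id, Name), filling absent categories with 0. `pivot_rows` is a hypothetical helper; the input rows are the question's example.

```python
from collections import defaultdict


def pivot_rows(rows):
    """Hypothetical helper: turn (id, name, category, count) tuples into a header and wide rows."""
    categories = sorted({cat for _, _, cat, _ in rows})
    # (id, name) -> {category: summed count}
    cells = defaultdict(lambda: defaultdict(int))
    for id_, name, cat, cnt in rows:
        cells[(id_, name)][cat] += cnt
    header = ["id", "Name"] + categories
    # .get(cat, 0) fills categories that never appeared for this (id, name) with 0.
    wide = [
        [id_, name] + [cells[(id_, name)].get(cat, 0) for cat in categories]
        for id_, name in sorted(cells)
    ]
    return header, wide


long_rows = [
    (8660, "Iced Chocolate", "Coffees", 105),
    (8660, "Iced Chocolate", "Milkshakes", 10),
    (8662, "Old Monk", "Beer", 29),
    (8663, "Burger", "Snacks", 18),
]
header, wide = pivot_rows(long_rows)
print(header)
for row in wide:
    print(row)
```

Because the column list is computed from the data, new categories appear automatically; the wide rows could then be written to a file and loaded back into S3 or Redshift.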

We do a lot of pivoting at Ro, so we built a Python-based tool for autogenerating pivot queries. This tool allows for the same basic options as what you'd find in Excel, including specifying aggregation functions as well as whether you want overall aggregates.

Redshift released PIVOT/UNPIVOT functionality at re:Invent 2021 (December 2021): https://docs.aws.amazon.com/redshift/latest/dg/r_FROM_clause-pivot-unpivot-examples.html
SELECT *
FROM (SELECT id, Name, Category, count FROM my_table) PIVOT (
SUM(count) FOR Category IN ('Coffees', 'Milkshakes', 'Beer', 'Snacks')
);

If you typically want to query specific subsets of the categories from the pivot table, a workaround based on the approach linked in the comments might work.
You can populate your "pivot_table" from the original like so:
insert into pivot_table (id, Name, json_cats) (
select id, Name,
'{' || listagg(quote_ident(Category) || ':' || count, ',')
within group (order by Category) || '}' as json_cats
from to_pivot
group by id, Name
)
And access specific categories this way:
select id, Name,
nvl(json_extract_path_text(json_cats, 'Snacks')::int, 0) Snacks,
nvl(json_extract_path_text(json_cats, 'Beer')::int, 0) Beer
from pivot_table
Using varchar(max) for the JSON column type gives 65535 bytes, which should be room for a couple of thousand categories.
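To see what the LISTAGG step produces and how a key is read back, here is a sketch that builds the same {"Category":count,...} string in SQLite and parses it with Python's json module. It is an emulation under assumptions, not Redshift itself: group_concat stands in for LISTAGG, a manual '"' || Category || '"' stands in for quote_ident, and the hypothetical `category_value` helper mimics nvl(json_extract_path_text(...)::int, 0). The table name to_pivot comes from the answer; the rows are the question's example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE to_pivot (id INTEGER, Name TEXT, Category TEXT, count INTEGER)")
conn.executemany(
    "INSERT INTO to_pivot VALUES (?, ?, ?, ?)",
    [
        (8660, "Iced Chocolate", "Coffees", 105),
        (8660, "Iced Chocolate", "Milkshakes", 10),
        (8662, "Old Monk", "Beer", 29),
        (8663, "Burger", "Snacks", 18),
    ],
)

# group_concat(expr, ',') stands in for LISTAGG(expr, ','); unlike
# LISTAGG ... WITHIN GROUP (ORDER BY ...), SQLite does not promise an order.
# '"' || Category || '"' stands in for quote_ident(Category).
rows = conn.execute(
    """
    SELECT id, Name,
           '{' || group_concat('"' || Category || '":' || count, ',') || '}' AS json_cats
    FROM to_pivot
    GROUP BY id, Name
    ORDER BY id
    """
).fetchall()


def category_value(json_cats: str, category: str) -> int:
    # Hypothetical helper mirroring nvl(json_extract_path_text(json_cats, category)::int, 0).
    return int(json.loads(json_cats).get(category, 0))


for id_, name, json_cats in rows:
    print(id_, name, json_cats, category_value(json_cats, "Snacks"), category_value(json_cats, "Beer"))
```

Each row carries all of its categories in one string, so the table never needs one column per category; the cost is that every lookup has to parse the JSON.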

@user3600910 is right about the approach; however, END is required, otherwise a 500310 invalid operation error occurs:
select id,
name,
sum(case when Category='Coffees' then count END) as Coffees,
sum(case when Category='Milkshakes' then count END) as Milkshakes,
sum(case when Category='Beer' then count END) as Beer,
sum(case when Category='Snacks' then count END) as Snacks
from my_table
group by 1,2

The answer given above worked for me after switching count to 1:
select id,
name,
sum(case when Category='Coffees' then 1 end) as Coffees,
sum(case when Category='Milkshakes' then 1 end) as Milkshakes,
sum(case when Category='Beer' then 1 end) as Beer,
sum(case when Category='Snacks' then 1 end) as Snacks
from my_table
group by 1,2
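Switching count to 1 changes what is measured: SUM(CASE ... THEN count END) adds up the values stored in the count column, while SUM(CASE ... THEN 1 END) counts matching rows. The SQLite sketch below, using the question's sample rows as an assumed stand-in for the real table, runs both side by side. Note also that without an ELSE branch, groups with no matching rows come back as NULL rather than 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, Name TEXT, Category TEXT, count INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        (8660, "Iced Chocolate", "Coffees", 105),
        (8660, "Iced Chocolate", "Milkshakes", 10),
        (8662, "Old Monk", "Beer", 29),
        (8663, "Burger", "Snacks", 18),
    ],
)

rows = conn.execute(
    """
    SELECT id, Name,
           SUM(CASE WHEN Category = 'Coffees' THEN count END) AS coffees_total, -- adds up the count column
           SUM(CASE WHEN Category = 'Coffees' THEN 1 END)     AS coffees_rows   -- counts matching rows
    FROM my_table
    GROUP BY id, Name
    ORDER BY id
    """
).fetchall()
for row in rows:
    print(row)
```

Which form is right depends on the data: if each row is already an aggregated count (as in the question), summing count is what reproduces 105; if each row is one event, counting rows is appropriate.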

Related

How can I sum a row's value in my table based on a specific type?

My table looks like this. It's a table with an inventory of clothes.
Basically, a user can enter a type of clothing and a quantity.
When they do, a new row is added to the table with the date of the input.
Type 2 is for shoes and type 3 is for shirts.
What I'm trying to do is sum the quantity based on the type, like this:
So I tried this:
SELECT name, type, sum(quantity)
from Clothes
where type="2"
group by name
But it didn't work: it sums all types of clothes. How can I do this?
Use case expressions to do conditional aggregation:
SELECT name,
SUM(case when type = 3 then quantity else 0 end) shirts,
SUM(case when type = 2 then quantity else 0 end) shoes
from Clothes
group by name
You should group by the type too.
Doing this, you'll get a table with 3 columns:
the 1st with the name, the 2nd with the type, and the 3rd with the quantity.
SELECT name, type, sum(quantity)
from Clothes
group by name,type
Then you can format the data as you wish.
If instead you want to get the exact result with a query, you need to dig deeper and put a CASE inside the SUM function that yields zero when the row is not of the selected type:
select name,
sum(case when type = 3 then quantity else 0 end) as Shirts,
sum(case when type = 2 then quantity else 0 end) as Shoes
from Clothes
group by name;
A solution using a PIVOT table will achieve the same result with multi-column aggregation of quantities corresponding to the type column:
SELECT [ProductName], [2] As Shoes, [3] As Shirts
FROM
(SELECT [ProductName], [ProductType], [Quantity] FROM [Inventory_Table])
AS DataSource
PIVOT
(SUM([Quantity]) FOR [ProductType] IN ([2], [3])) AS pvt_table
Note: For the above to work in SQL Server T-SQL, I had to replace the [Name] and [Type] columns with other column names.

Creating row with different where

I have this code to get the number of users of all items in the list and the average level.
select itemId,count(c.characterid) as numberOfUse, avg(maxUpgrade) as averageLevel
from items i inner join characters c on i.characterId=c.characterId
where itemid in (22001,22002,22003,22004,22005,22006,22007,22008,22009,22010,22011,22012,22013,22014,22015,22016,22030,22031,22032,22033,22034,22035,22036,22037,22038,22039,22040,22041,22042,22050,22051,22052,22053,22054,22055,22056,22057,22058,22059,22060,22070,22071,22072,22073,22074,22075,22076,22077,22085,22086,22087,22091,22092)
and attached>0
group by itemId
It creates a row for each rune id, with one column for the number of users and one for the average level to which people upgrade it, across all players on the server.
I would like to create new columns for every 10 levels, so I can see which item is used more depending on player level. The item level depends on the player level, so to select only a certain level range I use WHERE itemid>0 and itemid<10; I do that for every 10 levels, copy the data, and paste it into a Google Sheet.
So I would like a result with these columns:
itemid use_1-10 avg_level_1-10 use_11-20 avg_level_11-20 etc...
That way I could copy all the results at once instead of repeating the same process 15 times.
If I am following this correctly, you can do conditional aggregation. Assuming that a "level" is stored in column level in table characters, you would do:
select i.itemId,
sum(case when c.level between 1 and 10 then 1 else 0 end) as use_1_10,
avg(case when c.level between 1 and 10 then maxUpgrade end) as avg_level_1_10,
sum(case when c.level between 11 and 20 then 1 else 0 end) as use_11_20,
avg(case when c.level between 11 and 20 then maxUpgrade end) as avg_level_11_20,
...
from items i
inner join characters c on i.characterId = c.characterId
where i.itemid in (...) and attached > 0
group by i.itemId
Note: consider prefixing column attached in the where clause with the table it belongs to, in order to avoid ambiguity.
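Two details make this work. The AVG columns deliberately have no ELSE, so rows outside the bucket become NULL and AVG skips them (an ELSE 0 would drag the average down), and the repeated column pairs can be generated instead of typed out 15 times. The sketch below runs on SQLite and assumes hypothetical schemas items(itemId, characterId, maxUpgrade, attached) and characters(characterId, level) with made-up rows; `bucket_columns` is a hypothetical helper, and the itemid IN (...) filter is left out.

```python
import sqlite3


def bucket_columns(max_level: int, width: int = 10) -> str:
    # Hypothetical helper: one use_/avg_level_ column pair per bucket 1-10, 11-20, ...
    parts = []
    for lo in range(1, max_level + 1, width):
        hi = lo + width - 1
        parts.append(f"SUM(CASE WHEN c.level BETWEEN {lo} AND {hi} THEN 1 ELSE 0 END) AS use_{lo}_{hi}")
        # No ELSE: rows outside the bucket yield NULL, which AVG ignores.
        parts.append(f"AVG(CASE WHEN c.level BETWEEN {lo} AND {hi} THEN i.maxUpgrade END) AS avg_level_{lo}_{hi}")
    return ",\n       ".join(parts)


# Hypothetical schema and made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE characters (characterId INTEGER, level INTEGER);
    CREATE TABLE items (itemId INTEGER, characterId INTEGER, maxUpgrade INTEGER, attached INTEGER);
    INSERT INTO characters VALUES (1, 5), (2, 8), (3, 15);
    INSERT INTO items VALUES (22001, 1, 2, 1), (22001, 2, 4, 1), (22001, 3, 9, 1), (22002, 3, 6, 0);
    """
)

sql = f"""
SELECT i.itemId,
       {bucket_columns(20)}
FROM items i
INNER JOIN characters c ON i.characterId = c.characterId
WHERE i.attached > 0
GROUP BY i.itemId
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

Raising the argument to bucket_columns (for example 150 for 15 buckets) produces all the column pairs in one result, ready to copy at once.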

Case Statement for multiple criteria

I would like to ignore some of the results of my query because, for all intents and purposes, some of them are duplicates. Because of the way the request was made, we need to use this hierarchy, and although we see different Company_Name values, we need to ignore one of the results.
Query:
SELECT
COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM
some_table AS A12
GROUP BY
2
ORDER BY
3 ASC, 2 ASC
This code omits half a dozen joins and WHERE clauses that are not germane to this question.
Results:
Customer_Name_Count Company_Name Total_Sales
-------------------------------------------------------------
1 3 Blockbuster 1,000
2 6 Jimmy's Bar 1,500
3 6 Jimmy's Restaurant 1,500
4 9 Impala Hotel 2,000
5 12 Sports Drink 2,500
In the above set, we can see that numbers 2 and 3 have the same count, the same Total_Sales, and similar company names. Is there a way to create a CASE statement that takes these 3 factors into consideration and then drops one or the other of Jimmy's enterprises? The other issue is that this has to be generic, as there are other instances where this happens. And I would only want this to happen if the count and sales numbers match and the company names are similar.
Desired result:
Customer_Name_Count Company_Name Total_Sales
--------------------------------------------------------------
1 3 Blockbuster 1,000
2 6 Jimmy's Bar 1,500
3 9 Impala Hotel 2,000
4 12 Sports Drink 2,500
Other answers look accurate, assuming the Company_IDs are the same for both.
If the Company_IDs are different for Jimmy's Bar and Jimmy's Restaurant, then you can use something like this. I suggest you get functional users involved and do some data clean-up; otherwise you'll be maintaining this every time the issue arises:
SELECT
COUNT(DISTINCT CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END) AS Customer_Name_Count
,CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END AS Company_Name
,SUM(A12.Total_Sales) AS Total_Sales
FROM some_table AS A12
GROUP BY CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END
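Here is a runnable version of that idea on SQLite, with made-up rows and with 'Jimmy''s Restaurant' and 'Jimmy''s Bar' filling in for the answer's placeholders Name2 and Name1. It writes the CASE once in a derived table and groups by the mapped column, which avoids repeating the expression in GROUP BY (SQL Server does not let you group by a SELECT-list alias). If the two names are in fact the same sales counted twice by a join, summing the merged rows would double-count, so this only fits genuinely separate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (Company_Name TEXT, Total_Sales INTEGER)")
# Made-up rows for illustration.
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?)",
    [
        ("Blockbuster", 1000),
        ("Jimmy's Bar", 900),
        ("Jimmy's Restaurant", 600),
        ("Impala Hotel", 2000),
    ],
)

# The CASE maps the name to drop onto the name to keep; doing it once in a
# derived table lets the outer query GROUP BY the mapped column directly.
rows = conn.execute(
    """
    SELECT Company_Name, SUM(Total_Sales) AS Total_Sales
    FROM (
        SELECT CASE WHEN Company_Name = 'Jimmy''s Restaurant' THEN 'Jimmy''s Bar'
                    ELSE Company_Name
               END AS Company_Name,
               Total_Sales
        FROM some_table
    ) AS mapped
    GROUP BY Company_Name
    ORDER BY 2, 1
    """
).fetchall()
print(rows)
```

Adding more pairs means adding WHEN branches in one place, which is exactly the ongoing maintenance the answer warns about.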
Your problem is that the joins you are using are multiplying the number of rows. Somewhere along the way, multiple names are associated with exactly the same entity (which is why the numbers are the same). You can fix this by aggregating by the right id:
SELECT COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
MAX(Company_Name) as Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM some_table AS A12
GROUP BY Company_id -- I'm guessing the column is something like this
ORDER BY 3 ASC, 2 ASC;
This might actually overstate the sales (I don't know). Better would be fixing the join so it only returned one name. One possibility is that it is a type-2 dimension, meaning that there is a time component for values that change over time. You may need to restrict the join to a single time period.
You need a function that returns a common name for the companies, and then use DISTINCT:
SELECT DISTINCT
Customer_Name_Count,
dbo.GetCommonName(Company_Name) as Company_Name,
Total_Sales
FROM dbo.theTable
You can use the ROW_NUMBER window function to number the rows within each Customer_Name_Count and Total_Sales group, then keep rn = 1:
SELECT * FROM (
SELECT *,ROW_NUMBER() OVER(PARTITION BY Customer_Name_Count,Total_Sales ORDER BY Company_Name) rn
FROM (
SELECT
COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM
some_table AS A12
GROUP BY
Company_Name
)t1
)t1
WHERE rn = 1
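A runnable version on SQLite (window functions need SQLite 3.25 or later), using the aggregated result from the question as a hypothetical table agg. The caveat built into this approach is visible in the PARTITION BY: any two companies that happen to share both the count and the total sales are collapsed, not only similarly named ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the already-aggregated result from the question.
conn.execute("CREATE TABLE agg (Customer_Name_Count INTEGER, Company_Name TEXT, Total_Sales INTEGER)")
conn.executemany(
    "INSERT INTO agg VALUES (?, ?, ?)",
    [
        (3, "Blockbuster", 1000),
        (6, "Jimmy's Bar", 1500),
        (6, "Jimmy's Restaurant", 1500),
        (9, "Impala Hotel", 2000),
        (12, "Sports Drink", 2500),
    ],
)

rows = conn.execute(
    """
    SELECT Customer_Name_Count, Company_Name, Total_Sales
    FROM (
        SELECT *,
               -- rn = 1 marks the alphabetically first name in each (count, sales) group
               ROW_NUMBER() OVER (PARTITION BY Customer_Name_Count, Total_Sales
                                  ORDER BY Company_Name) AS rn
        FROM agg
    ) AS numbered
    WHERE rn = 1
    ORDER BY Total_Sales
    """
).fetchall()
for row in rows:
    print(row)
```

Changing the ORDER BY inside OVER (...) decides which of the tied names survives.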

How to compare between 2 COUNT() SQL Server

I have a table like this:
Name Profit
==============
A 50
A -10
A 60
I want to count the rows per Name and then compare that with the number of rows that have a profit. So, from the data above, I want a result like this:
Name Total Profit Percentage
===============================
A 3 2 66.7
Please help me to solve this problem. Thanks in advance.
I think a simple GROUP BY query should work:
SELECT
Name,
COUNT(*) AS Total,
SUM(CASE WHEN Profit > 0 THEN 1 ELSE 0 END) AS Profit,
100.0 * SUM(CASE WHEN Profit > 0 THEN 1 ELSE 0 END) / COUNT(*) AS Percentage
FROM yourTable
GROUP BY
Name;
The only part of the query which might not be self-explanatory is the summation of the CASE expression. For each group of records with the same name, this sum tallies the number of times Profit has a positive value. This technique is called conditional aggregation, and we also reuse this sum when calculating the percentage.
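Running the GROUP BY query above against the question's three sample rows, with SQLite standing in for SQL Server, confirms the figures. ROUND(..., 1) is an addition for this sketch, only to match the 66.7 in the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (Name TEXT, Profit INTEGER)")
conn.executemany("INSERT INTO yourTable VALUES (?, ?)", [("A", 50), ("A", -10), ("A", 60)])

rows = conn.execute(
    """
    SELECT Name,
           COUNT(*) AS Total,
           SUM(CASE WHEN Profit > 0 THEN 1 ELSE 0 END) AS Profit,
           -- 100.0 forces decimal division; ROUND only tidies the display
           ROUND(100.0 * SUM(CASE WHEN Profit > 0 THEN 1 ELSE 0 END) / COUNT(*), 1) AS Percentage
    FROM yourTable
    GROUP BY Name
    """
).fetchall()
print(rows)
```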
A somewhat enhanced version of Tim's answer (i.e., eliminating the repeated calculation):
SELECT Name, Total, Profit, 100.0 * Profit / Total AS Percentage
FROM (SELECT Name,
COUNT(*) AS Total,
SUM(CASE WHEN Profit > 0 THEN 1 ELSE 0 END) AS Profit
FROM yourTable
GROUP BY Name) q;
The engine may optimize away the repetition, but the point is mainly readability and maintainability (in case the calculation needs to change). In this case there is not much gain, because there is only one repetition, but when there are several, this approach becomes more useful.
In case the maintainability point is not clear, let's say the definition of profitability has changed in the company and now we consider > 10 to be profitable. In Tim's query, you'll have to change every calculation from > 0 to > 10. In the query above, you'll only have to change it in one place.
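One thing to watch in the subquery version: Total and Profit are integers there, and in SQL Server (and SQLite) dividing one integer by another truncates, so 100 * Profit / Total yields 66 rather than 66.7. Multiplying by 100.0 instead forces decimal arithmetic. The SQLite sketch below, using the question's sample rows, computes both forms side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (Name TEXT, Profit INTEGER)")
conn.executemany("INSERT INTO yourTable VALUES (?, ?)", [("A", 50), ("A", -10), ("A", 60)])

name, total, profit, int_pct, decimal_pct = conn.execute(
    """
    SELECT Name, Total, Profit,
           100 * Profit / Total   AS int_pct,     -- integer / integer: truncates
           100.0 * Profit / Total AS decimal_pct  -- 100.0 promotes to floating point
    FROM (SELECT Name,
                 COUNT(*) AS Total,
                 SUM(CASE WHEN Profit > 0 THEN 1 ELSE 0 END) AS Profit
          FROM yourTable
          GROUP BY Name) q
    """
).fetchone()
print(name, total, profit, int_pct, decimal_pct)
```

The threshold (Profit > 0) still lives in one place inside the subquery, so the maintainability argument above is unaffected by the choice of 100 or 100.0.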
Try this
declare @t table
(
Name varchar(10),
Profit int
)
insert into @t
values('A',50),('A',60),('A',-10)
SELECT
Name,
Profit = SUM(CASE WHEN Profit>0 THEN 1 ELSE 0 END),
Total = COUNT(1),
Average = (SUM(CASE WHEN Profit>0 THEN 1.0 ELSE 0.0 END)/COUNT(1))*100
FROM @t
GROUP BY Name

Counting values in columns

What I am looking for is to group by name and count the occurrences of each type in the same table, showing them in two different columns, like below:
Data in table A
Fields:
Name Type
Bob 1
John 2
Bob 1
Steve 1
John 1
Bob 2
Desired result from query:
Name Type 1 Type 2
Bob 2 1
John 1 1
Steve 1 0
This will do the trick in SQL Server:
SELECT
name,
SUM( CASE type WHEN 1 THEN 1 ELSE 0 END) AS type1,
SUM( CASE type WHEN 2 THEN 1 ELSE 0 END) AS type2
FROM
myTable
GROUP BY
name
No time to write the code, but a CASE statement is what you want here. Simply have a value of 1 if it meets the case and zero if it doesn't. Then you can sum the columns.
Use two separate GROUP BY subqueries.
SELECT n.Name, COALESCE(a.Count1, 0) AS Count1, COALESCE(b.Count2, 0) AS Count2
FROM (SELECT DISTINCT Name FROM myTable) AS n
LEFT JOIN
(SELECT Name, COUNT(*) AS Count1 FROM myTable WHERE Type=1 GROUP BY Name) AS a ON a.Name = n.Name
LEFT JOIN
(SELECT Name, COUNT(*) AS Count2 FROM myTable WHERE Type=2 GROUP BY Name) AS b ON b.Name = n.Name
You're looking for a CrossTab solution. The above solutions will work, but you'll come unstuck if you want a general solution and have N types.
A CrossTab solution will solve this for you. If this is for quickly crunching some numbers then dump your data into Excel and use the native Pivot Table feature.
If it's for an RDBMS in an app, then it depends on the RDBMS. MS SQL 2005 and above has a crosstab syntax (the PIVOT operator). See:
http://www.databasejournal.com/features/mssql/article.php/3521101/Cross-Tab-reports-in-SQL-Server-2005.htm
@Seb has a good solution, but it's server-dependent. Here's an alternative using subselects that should be portable:
select
name,
(select count(type) from myTable where type=1 and name=a.name) as type1,
(select count(type) from myTable where type=2 and name=a.name) as type2
from
myTable as a
group by
name
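Both the CASE-based query and this correlated-subquery version are plain SQL. The sketch below runs them on SQLite against the question's table A data (named myTable, as in the answers) and checks that they agree. The CASE version reads the table once per query, while the correlated subqueries may be evaluated once per name, which can matter on large tables depending on the optimizer and indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (name TEXT, type INTEGER)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    [("Bob", 1), ("John", 2), ("Bob", 1), ("Steve", 1), ("John", 1), ("Bob", 2)],
)

# Conditional aggregation: one pass, one SUM(CASE ...) per type.
conditional = """
SELECT name,
       SUM(CASE type WHEN 1 THEN 1 ELSE 0 END) AS type1,
       SUM(CASE type WHEN 2 THEN 1 ELSE 0 END) AS type2
FROM myTable
GROUP BY name
ORDER BY name
"""

# Correlated subqueries: each column counts matching rows for the outer name.
correlated = """
SELECT name,
       (SELECT COUNT(type) FROM myTable WHERE type = 1 AND name = a.name) AS type1,
       (SELECT COUNT(type) FROM myTable WHERE type = 2 AND name = a.name) AS type2
FROM myTable AS a
GROUP BY name
ORDER BY name
"""

rows_conditional = conn.execute(conditional).fetchall()
rows_correlated = conn.execute(correlated).fetchall()
print(rows_conditional)
print(rows_correlated)
```

Both return Steve with 0 for type 2, because COUNT over no matching rows is 0 and the CASE version uses ELSE 0.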