Transpose Values of Dynamically Created Pivoted Table to Rows - sql

In PostgreSQL I have a table like this:
CREATE TABLE cross_table (brand varchar(10), gender varchar(10), sales int);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Nike', 'Male', 10);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Nike', 'Male', 20);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Adidas', 'Woman', 20);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Nike', 'Male', 10);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Adidas', 'Woman', 30);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Puma', 'Woman', 40);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Puma', 'Male', 10);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Nike', 'Male', 20);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Puma', 'Woman', 10);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Adidas', 'Woman', 20);
Then I can run this query to get brand as rows, gender as columns and "Sum of Sales" and "Count of Sales" as values:
SELECT brand,
jsonb_object_agg(
gender,
json_build_object(
'Sum of Sales', "Sum of Sales",
'Count of Sales', "Count of Sales"
)
) AS gender,
SUM("Sum of Sales") AS "Sum of Sales",
SUM("Count of Sales") AS "Count of Sales"
FROM (
SELECT brand,
gender,
sum(sales) AS "Sum of Sales",
count(sales) AS "Count of Sales"
FROM cross_table
GROUP BY ROLLUP(brand), gender
) AS a
GROUP BY brand
This would produce the following result:
brand  | gender | Sum of Sales | Count of Sales
NULL   | {"Male": {"Sum of Sales": 70, "Count of Sales": 5}, "Woman": {"Sum of Sales": 120, "Count of Sales": 5}} | 190 | 10
adidas | {"Woman": {"Sum of Sales": 70, "Count of Sales": 3}} | 70 | 3
nike   | {"Male": {"Sum of Sales": 60, "Count of Sales": 4}} | 60 | 4
puma   | {"Male": {"Sum of Sales": 10, "Count of Sales": 1}, "Woman": {"Sum of Sales": 50, "Count of Sales": 2}} | 60 | 3
But I would like to have the values as rows like this:
brand  | values         | Male | Woman | Total
null   | Sum of Sales   | 70   | 120   | 190
       | Count of Sales | 5    | 5     | 10
adidas | Sum of Sales   |      | 70    | 70
       | Count of Sales |      | 3     | 3
nike   | Sum of Sales   | 60   |       | 60
       | Count of Sales | 4    |       | 4
puma   | Sum of Sales   | 10   | 50    | 60
       | Count of Sales | 1    | 2     | 3
I've looked at two approaches:
1. Putting the values into an object like this:
json_build_object(
'Sum of Sales', SUM(sales),
'Count of Sales', COUNT(sales)
) as "Values"
And then in some way "expanding" them as rows under "Values" column.
2. Using unnest in some way but unsure about that approach.
Important requirement:
The values in brand and gender are unknown so the values from brand (nike, etc) and gender (male, etc) are generated dynamically.
Does anyone know how to do this?
For context; this is a continuation of this question:
Get Row Totals for Dynamically Created Pivoted Table

1. Splitting the resulting rows into 2 subrows is not a big deal:
SELECT brand,
label,
jsonb_object_agg(
gender,
json_build_object(label, total)
) AS gender,
SUM(total) AS "Total"
FROM (
SELECT brand,
gender,
sum(sales) :: integer AS total,
'Sum of Sales' AS label
FROM cross_table
GROUP BY ROLLUP(brand), gender
UNION ALL
SELECT brand,
gender,
count(sales) :: integer,
'Count of Sales'
FROM cross_table
GROUP BY ROLLUP(brand), gender
) AS a
GROUP BY brand, label
ORDER BY brand, label ;
2. Splitting the resulting columns into 2 or more subcolumns is trickier:
2.A Static list of gender values
If gender has only two values "male" and "woman" then :
CREATE TYPE gender AS (male integer, woman integer) ;
SELECT brand,
label,
(jsonb_populate_record(null :: gender, jsonb_object_agg(lower(gender),total))).*,
SUM(total) AS "Total"
FROM (
SELECT brand,
gender,
sum(sales) :: integer AS total,
'Sum of Sales' AS label
FROM cross_table
GROUP BY ROLLUP(brand), gender
UNION ALL
SELECT brand,
gender,
count(sales) :: integer,
'Count of Sales'
FROM cross_table
GROUP BY ROLLUP(brand), gender
) AS a
GROUP BY brand, label
ORDER BY brand, label ;
2.B Dynamic list of gender values
If the list of gender values may vary over time, then we need to dynamically create and update the composite type gender from a trigger on table cross_table:
CREATE OR REPLACE FUNCTION trigger_cross_table ()
RETURNS trigger LANGUAGE plpgsql AS $$
DECLARE
_columns text ;
BEGIN
SELECT string_agg(DISTINCT gender || ' integer', ',')
INTO _columns
FROM cross_table ;
DROP TYPE IF EXISTS gender ;
EXECUTE 'CREATE TYPE gender AS (' || _columns || ')' ;
RETURN NULL ;
END ; $$ ;
CREATE OR REPLACE TRIGGER trigger_cross_table AFTER INSERT OR UPDATE OF gender OR DELETE
ON cross_table FOR EACH STATEMENT EXECUTE FUNCTION trigger_cross_table() ;
The final query is the same :
SELECT brand,
label,
(jsonb_populate_record(null :: gender, jsonb_object_agg(lower(gender),total))).*,
SUM(total) AS "Total"
FROM (
SELECT brand,
gender,
sum(sales) :: integer AS total,
'Sum of Sales' AS label
FROM cross_table
GROUP BY ROLLUP(brand), gender
UNION ALL
SELECT brand,
gender,
count(sales) :: integer,
'Count of Sales'
FROM cross_table
GROUP BY ROLLUP(brand), gender
) AS a
GROUP BY brand, label
ORDER BY brand, label ;
see dbfiddle
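As an alternative to the SQL-side approach above, the same "values as rows" layout can be assembled client-side. The following is only a minimal Python sketch under that assumption (it is not part of the original answer): the helper name pivot_values_as_rows is hypothetical, the sample rows are copied from the question's INSERT statements, and a brand of None stands in for the ROLLUP total row. The gender columns are discovered from the data rather than hard-coded.

```python
from collections import defaultdict

# Sample rows copied from the question's INSERT statements: (brand, gender, sales).
rows = [
    ("Nike", "Male", 10), ("Nike", "Male", 20), ("Adidas", "Woman", 20),
    ("Nike", "Male", 10), ("Adidas", "Woman", 30), ("Puma", "Woman", 40),
    ("Puma", "Male", 10), ("Nike", "Male", 20), ("Puma", "Woman", 10),
    ("Adidas", "Woman", 20),
]


def pivot_values_as_rows(rows):
    """Hypothetical helper: one output row per (brand, measure), one column per gender."""
    genders = sorted({gender for _, gender, _ in rows})  # discovered, not hard-coded
    # cells[brand][gender] = [sum, count]; brand None plays the role of the ROLLUP row.
    cells = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for brand, gender, sales in rows:
        for key in (brand, None):
            cells[key][gender][0] += sales
            cells[key][gender][1] += 1

    header = ["brand", "values", *genders, "Total"]
    table = []
    brands = [None] + sorted(b for b in cells if b is not None)
    for brand in brands:
        for idx, label in enumerate(("Sum of Sales", "Count of Sales")):
            # None marks a brand/gender combination that has no rows at all.
            per_gender = [
                cells[brand][g][idx] if g in cells[brand] else None for g in genders
            ]
            total = sum(v for v in per_gender if v is not None)
            table.append([brand, label, *per_gender, total])
    return header, table


header, table = pivot_values_as_rows(rows)
for line in [header, *table]:
    print(line)
```

Because the column list is computed from the data, adding a new gender value only changes the header; no type or trigger has to be recreated.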

Related

How to show only top 2 and top 1 values in a grouping sets query?

I have this PostgreSQL table and data:
CREATE TABLE info (
brand VARCHAR(255),
segment VARCHAR(255),
name VARCHAR(255)
);
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'SUV', 'Highlander');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'SUV', 'Highlander');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'SUV', 'Highlander');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'SUV', '4Runner');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'SUV', 'RAV4');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'SUV', 'RAV4');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'Sedan', 'Camry');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'Sedan', 'Camry');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'Sedan', 'Corolla');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'Sedan', 'Corolla');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'Sedan', 'Corolla');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'Truck', 'Tacoma');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'Truck', 'Tundra');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'Truck', 'Tacoma');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'Van', 'Sienna');
I have made this query to show the count for each grouping set and order it by the total count for each brand, segment, and name:
SELECT
brand,
segment,
name,
count (1) as total
FROM
info
GROUP BY
GROUPING SETS (
(brand),
(brand, segment),
(brand, segment,name)
)
ORDER BY
max(count (1)) over (partition by brand) desc,
max(count (1)) over (partition by brand,segment) desc,
count (1) desc;
This fiddle shows what it looks like.
Now I want to select only the top 2 segments per brand, and top 1 name per brand/segment.
So the result should look like this:
brand  | segment | name       | total
Toyota |         |            | 15
Toyota | SUV     |            | 6
Toyota | SUV     | Highlander | 3
Toyota | Sedan   |            | 5
Toyota | Sedan   | Corolla    | 3
I have tried using window functions, but the result is not what I expected.
Try using the ROW_NUMBER function as the following:
WITH get_grouping_set AS
(
SELECT brand, segment, name, count(1) AS total
FROM info
GROUP BY GROUPING SETS
(
(brand),
(brand, segment),
(brand, segment, name)
)
),
brand_segment_order AS
(
SELECT brand, segment,
ROW_NUMBER() OVER (PARTITION BY brand ORDER BY total DESC) rn_seg
FROM get_grouping_set
WHERE segment IS NOT NULL AND name IS NULL
),
joined_data AS
(
SELECT T.*,
ROW_NUMBER() OVER (PARTITION BY T.brand, T.segment ORDER BY T.total DESC) rn
FROM get_grouping_set T JOIN brand_segment_order T2
ON T.brand = T2.brand AND T.segment = T2.segment OR T.segment IS NULL
WHERE T2.rn_seg <= 2
)
SELECT brand, segment, name, total
FROM joined_data
WHERE (rn = 1 AND segment IS NULL ) OR (rn <= 2 AND segment IS NOT NULL)
ORDER BY brand, MAX(Total) OVER (PARTITION BY brand, segment) DESC,
Total DESC, segment NULLS FIRST, name NULLS FIRST
See demo
Another solution.
You can use the dense_rank function ordered by the max(count) for each brand-segment group as the following:
WITH get_grouping_set AS
(
SELECT brand, segment, name, count(1) AS total,
MAX(count(*)) over (PARTITION BY brand, segment) max_brand_segment
FROM info
GROUP BY GROUPING SETS
(
(brand),
(brand, segment),
(brand, segment, name)
)
),
brand_segment_order AS
(
SELECT *,
DENSE_RANK() OVER (PARTITION BY brand ORDER BY max_brand_segment DESC) segment_rank,
DENSE_RANK() OVER (PARTITION BY brand, segment ORDER BY Total DESC) name_rank
FROM get_grouping_set
)
SELECT brand, segment, name, total
FROM brand_segment_order
WHERE segment_rank <= 3 AND name_rank <= 2
ORDER BY brand, max_brand_segment DESC,
Total DESC, segment NULLS FIRST, name NULLS FIRST
WHERE segment_rank <= 3 retrieves the top two segments per brand; the limit is 3 rather than 2 because the brand-level row (where segment is null) takes rank 1.
AND name_rank <= 2 retrieves the top name per segment; the limit is 2 rather than 1 because the segment-level row (where name is null) takes rank 1.
dense_rank is used so that all tied segments or names are returned, i.e. when several segments or names share the same max(count).
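The tie-keeping behaviour described above can be illustrated outside the database. This is a rough Python sketch, not a translation of the query: the helper names top_by_dense_rank and top_segments_and_names are hypothetical, the sample rows are copied from the question, and the ranking is done directly on per-segment and per-name counts, so the "plus one" offsets needed in SQL do not appear here.

```python
from collections import Counter

# Sample rows copied from the question: (brand, segment, name).
rows = (
    [("Toyota", "SUV", "Highlander")] * 3
    + [("Toyota", "SUV", "4Runner")]
    + [("Toyota", "SUV", "RAV4")] * 2
    + [("Toyota", "Sedan", "Camry")] * 2
    + [("Toyota", "Sedan", "Corolla")] * 3
    + [("Toyota", "Truck", "Tacoma")] * 2
    + [("Toyota", "Truck", "Tundra")]
    + [("Toyota", "Van", "Sienna")]
)


def top_by_dense_rank(counts, n):
    """Keep items whose count is among the n highest distinct counts (ties kept)."""
    cutoff = sorted(set(counts.values()), reverse=True)[:n]
    kept = [(key, total) for key, total in counts.items() if total in cutoff]
    return sorted(kept, key=lambda kv: (-kv[1], kv[0]))


def top_segments_and_names(rows, n_segments=2, n_names=1):
    """Return (brand, segment, name, total) rows shaped like the grouping-sets output."""
    result = []
    for brand in sorted({r[0] for r in rows}):
        brand_rows = [r for r in rows if r[0] == brand]
        result.append((brand, None, None, len(brand_rows)))
        segment_counts = Counter(segment for _, segment, _ in brand_rows)
        for segment, seg_total in top_by_dense_rank(segment_counts, n_segments):
            result.append((brand, segment, None, seg_total))
            name_counts = Counter(n for _, s, n in brand_rows if s == segment)
            for name, name_total in top_by_dense_rank(name_counts, n_names):
                result.append((brand, segment, name, name_total))
    return result


result = top_segments_and_names(rows)
for row in result:
    print(row)
```

If two segments had the same count, both would be kept, which mirrors the reason given for preferring dense_rank over row_number.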

Get Row Totals for Dynamically Created Pivoted Table

In PostgreSQL I have a table like this:
CREATE TABLE cross_table (brand varchar(10), gender varchar(10), sales int);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Nike', 'Male', 10);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Nike', 'Male', 20);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Adidas', 'Woman', 20);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Nike', 'Male', 10);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Adidas', 'Woman', 30);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Puma', 'Woman', 40);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Puma', 'Male', 10);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Nike', 'Male', 20);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Puma', 'Woman', 10);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Adidas', 'Woman', 20);
And then I run this query to get brand as rows, gender as columns and sales as value:
with main_query as (
SELECT brand,
GROUPING(brand) AS "brand_grouping",
gender,
GROUPING(gender) AS "gender_grouping",
sum(sales) AS "sales"
FROM cross_table
GROUP BY ROLLUP (brand, gender)
),
second_query AS (
SELECT brand,
brand_grouping,
cast(
json_object_agg(
gender,
sales
ORDER BY gender DESC
) FILTER (WHERE gender_grouping = 0) AS jsonb) "gender",
SUM(sales) AS "sales"
FROM main_query
GROUP BY (brand, brand_grouping)
)
SELECT brand,
gender,
sales
FROM second_query
ORDER BY brand_grouping, brand
This would produce the following result:
brand  | gender                  | sales
adidas | { "Woman": 70 }         | 140
nike   | "Male": 60              | 120
puma   | "Male": 10, "Woman": 50 | 120
NULL   | NULL                    | 190
Please note: the gender column is now a JSON object, but the braces won't show in the table view here on Stack Overflow.
This is fine; the only problem is that the row totals for the pivoted "gender" column are missing. I can solve this by hardcoding the keys in the last query:
SELECT brand,
CASE WHEN gender IS null THEN
jsonb_build_object(
'Woman', SUM(("gender"->>'Woman')::float8) OVER (),
'Male', SUM(("gender"->>'Male')::float8) OVER ()
) ELSE "gender" END AS "gender",
sales
FROM second_query
ORDER BY brand_grouping, brand
Getting this result:
brand  | gender                   | sales
adidas | {"Woman": 70}            | 140
nike   | "Male": 60               | 120
puma   | "Male": 10, "Woman": 50  | 120
NULL   | "Male": 70, "Woman": 120 | 190
Which is correct but I need to do this dynamically without knowing the keys (Male/Woman) of "gender".
Does anyone know how to do this?
Try this :
SELECT brand
, jsonb_object_agg(gender, sales) AS gender
, sum(sales) AS sales
FROM (
SELECT brand
, gender
, sum(sales) AS sales
FROM cross_table
GROUP BY ROLLUP(brand), gender
) AS a
GROUP BY brand
Result :
brand  | gender                     | sales
null   | {"Male": 70, "Woman": 120} | 190
Adidas | {"Woman": 70}              | 70
Nike   | {"Male": 60}               | 60
Puma   | {"Male": 10, "Woman": 50}  | 60
see dbfiddle

Pivoting a table with SQL

I have a table with position (junior, senior), salary, and an ID. I have done the following to find the highest salary for each position.
SELECT position, MAX(salary) FROM candidates GROUP BY position;
What I am getting:
How I want it:
I want to transpose the outcome so that 'junior' and 'senior' are the columns without using crosstab. I have looked at many pivot examples but they are done on examples much more complex than mine.
I am not proficient in PostgreSQL, but I believe there is a practical workaround solution since this is a simple table:
SELECT
max(case when position = 'senior' then salary else null end) senior,
max(case when position = 'junior' then salary else null end) junior
FROM payments
It worked with this example:
create table payments (id integer, position varchar(100), salary int);
insert into payments (id, position, salary) values (1, 'junior', 1000);
insert into payments (id, position, salary) values (1, 'junior', 2000);
insert into payments (id, position, salary) values (1, 'junior', 5000);
insert into payments (id, position, salary) values (1, 'junior', 3000);
insert into payments (id, position, salary) values (2, 'senior', 3000);
insert into payments (id, position, salary) values (2, 'senior', 8000);
insert into payments (id, position, salary) values (2, 'senior', 9000);
insert into payments (id, position, salary) values (2, 'senior', 7000);
insert into payments (id, position, salary) values (2, 'senior', 4000);
select
max(case when position = 'junior' then salary else 0 end) junior,
max(case when position = 'senior' then salary else 0 end) senior
from payments;
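The two variants above differ only in the ELSE branch, which matters when a position has no rows at all. As a hedged illustration, the sketch below runs the same conditional aggregation in an in-memory SQLite database through Python's sqlite3 module (an assumption made for the sake of a runnable example; the question targets PostgreSQL). It uses a shortened version of the sample data, and the 'intern' column is hypothetical, added only to show the empty case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER, position TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO payments (id, position, salary) VALUES (?, ?, ?)",
    [(1, "junior", 1000), (1, "junior", 5000), (2, "senior", 9000)],
)

# One MAX(CASE ...) per output column; 'intern' has no rows (hypothetical value).
PIVOT_SQL = """
SELECT
    MAX(CASE WHEN position = 'junior' THEN salary {else_branch} END) AS junior,
    MAX(CASE WHEN position = 'senior' THEN salary {else_branch} END) AS senior,
    MAX(CASE WHEN position = 'intern' THEN salary {else_branch} END) AS intern
FROM payments
"""

with_null = conn.execute(PIVOT_SQL.format(else_branch="ELSE NULL")).fetchone()
with_zero = conn.execute(PIVOT_SQL.format(else_branch="ELSE 0")).fetchone()

print(with_null)  # (5000, 9000, None)
print(with_zero)  # (5000, 9000, 0)
```

With ELSE NULL a missing position stays NULL, while ELSE 0 reports it as a salary of 0, which may or may not be what is wanted.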
Here is my attempt at teaching myself crosstab:
CREATE EXTENSION IF NOT EXISTS tablefunc;
select Junior
, Senior
from
(
select *
from crosstab
(
'select 1, position, max(salary)
from candidates
group by position
'
, $$VALUES('Junior'), ('Senior')$$
)
as ct(row_number integer, Junior integer, Senior integer) --I don't know your actual data types, so you will need to update this as needed
) q
Edit: Below is no longer relevant as this appears to be PostgreSQL
Based on your description, it sounds like you probably want a pivot like this:
select q.*
from
(
select position
, salary
from candidates
) q
pivot (
max(salary) for position in ([Junior], [Senior])
) p
This example was made in SQL Server since we don't know the DBMS.
It depends on which SQL dialect you are running. It also depends on the complexity of your table. In SQL Server, I believe you can use the solutions provided in this question for relatively simple tables: Efficiently convert rows to columns in sql server

How to replace the NULL value of the last row of a column with 'Grand total' while retaining 'Total' replacing NULL value in the same column?

Below is the table created and inserted values in it:
CREATE TABLE Employees
(
Id INTEGER IDENTITY(1,1),
Name VARCHAR(50),
Gender VARCHAR(50),
Salary INTEGER,
Country VARCHAR(50)
)
INSERT INTO Employees (Name, Gender, Salary, Country)
VALUES ('Mark', 'Male', 5000, 'USA')
INSERT INTO Employees (Name, Gender, Salary, Country)
VALUES ('John', 'Male', 4500, 'India')
INSERT INTO Employees (Name, Gender, Salary, Country)
VALUES ('Pam', 'Female', 5500, 'USA')
INSERT INTO Employees (Name, Gender, Salary, Country)
VALUES ('Sara', 'Female', 4000, 'India')
INSERT INTO Employees (Name, Gender, Salary, Country)
VALUES ('Todd', 'Male', 3500, 'India')
INSERT INTO Employees (Name, Gender, Salary, Country)
VALUES ('Mary', 'Female', 5000, 'UK')
INSERT INTO Employees (Name, Gender, Salary, Country)
VALUES ('Ben', 'Male', 6500, 'UK')
INSERT INTO Employees (Name, Gender, Salary, Country)
VALUES ('Elizabeth', 'Female', 7000, 'USA')
INSERT INTO Employees (Name, Gender, Salary, Country)
VALUES ('Tom', 'Male', 5500, 'UK')
INSERT INTO Employees (Name, Gender, Salary, Country)
VALUES ('Ron', 'Male', 5000, 'USA')
SELECT * FROM Employees
Now I ran the following query:
SELECT
COALESCE(Country, '') AS [Country],
COALESCE(Gender, 'Total') AS [Gender],
SUM(Salary) AS [Total Salary]
FROM
Employees
GROUP BY
ROLLUP(Country, Gender)
When you look at the query result, the last row of the Gender column has the value 'Total' in it.
I want to replace 'Total' with 'Grand Total' only in the last row of Gender column while keeping 'Total' text in the other rows of Gender column.
Is it possible to achieve that?
If so, what is the simplest way to do it?
You can use GROUPING_ID() for it:
SELECT
COALESCE(Country,'') AS [Country],
CASE WHEN GROUPING_ID(Country)=1 THEN 'Grand Total' ELSE COALESCE(Gender,'Total') END as [Gender],
SUM(Salary) AS [Total Salary]
FROM Employees
GROUP BY ROLLUP(Country,Gender)
DBFIDDLE
EDIT: A comment on the question points out that the order of the result should be specified explicitly to make sure it is correct.
The query can be ordered like this, so that the totals appear below the details:
SELECT
COALESCE(Country,'') AS [Country],
CASE WHEN GROUPING_ID(Country)=1 THEN 'Grand Total' ELSE COALESCE(Gender,'Total') END as [Gender],
SUM(Salary) AS [Total Salary],
GROUPING_ID(Country),
GROUPING_ID(Gender)
FROM Employees
GROUP BY ROLLUP(Country,Gender)
ORDER BY COALESCE(Country,'ZZZ'),GROUPING_ID(Country),
Gender,GROUPING_ID(Gender)
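Another easy way is to build the label by concatenating the country name using isnull (which is preferable in SQL Server with just two values). Note that each subtotal row is then labelled with its country, e.g. 'USA Total', and only the last row reads 'Grand Total':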
select
isnull(Country,'') Country,
isnull(Gender, Concat(IsNull(Country, 'Grand'), ' Total')) Gender,
Sum(Salary) [Total Salary]
from Employees
group by rollup(Country,Gender);
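The labelling rule used in the queries above (a per-country subtotal gets 'Total', the single all-countries row gets 'Grand Total') can be sketched in plain Python. This is only an illustration under stated assumptions: rollup_with_labels is a hypothetical helper that emulates ROLLUP(Country, Gender) by hand, and the sample tuples are (gender, salary, country) values copied from the question's INSERT statements.

```python
from collections import defaultdict

# (gender, salary, country) copied from the question's sample data.
employees = [
    ("Male", 5000, "USA"), ("Male", 4500, "India"), ("Female", 5500, "USA"),
    ("Female", 4000, "India"), ("Male", 3500, "India"), ("Female", 5000, "UK"),
    ("Male", 6500, "UK"), ("Female", 7000, "USA"), ("Male", 5500, "UK"),
    ("Male", 5000, "USA"),
]


def rollup_with_labels(rows):
    """Hypothetical helper: detail rows, a 'Total' row per country, one 'Grand Total'."""
    detail = defaultdict(int)
    per_country = defaultdict(int)
    for gender, salary, country in rows:
        detail[(country, gender)] += salary
        per_country[country] += salary

    out = []
    for country in sorted(per_country):
        for (c, gender), total in sorted(detail.items()):
            if c == country:
                out.append((country, gender, total))
        # Subtotal level: Gender was rolled up, Country was not.
        out.append((country, "Total", per_country[country]))
    # Grand total level: both Country and Gender were rolled up.
    out.append(("", "Grand Total", sum(per_country.values())))
    return out


report = rollup_with_labels(employees)
for row in report:
    print(row)
```

The label is decided by the aggregation level rather than by testing the value for NULL, which is the same distinction GROUPING_ID makes in the SQL version.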

Counting the instances of customers

Say that I have a table with one column named CustomerId.
The example of the instance of this table is :
CustomerId
14
12
11
204
14
204
I want to write a query that counts the number of occurrences of each customer ID.
In the end, I would like to have a result like this:
CustomerId NumberOfOccurences
14 2
12 1
11 1
204 2
14 1
I cannot think of a way to do this.
This is the most basic example of GROUP BY:
SELECT CustomerId, count(*) as NumberOfOccurences
FROM tablex GROUP BY CustomerId;
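To see what this query returns for the sample data, here is a small sketch that runs it in an in-memory SQLite database via Python's sqlite3 module (an assumption for the sake of a runnable example; the question does not name a DBMS). The table name tablex comes from the answer, and the ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablex (CustomerId INTEGER)")
conn.executemany(
    "INSERT INTO tablex (CustomerId) VALUES (?)",
    [(14,), (12,), (11,), (204,), (14,), (204,)],
)

# One output row per distinct CustomerId, with the number of rows in its group.
counts = conn.execute(
    "SELECT CustomerId, COUNT(*) AS NumberOfOccurences "
    "FROM tablex GROUP BY CustomerId ORDER BY CustomerId"
).fetchall()

print(counts)  # [(11, 1), (12, 1), (14, 2), (204, 2)]
```

Each customer ID appears exactly once in the result, so a second row for 14 as shown in the desired output would not be produced by GROUP BY.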
Practice exercise #3 on this page explains how to do this.
CREATE TABLE customers
( customer_id number(10) not null,
customer_name varchar2(50) not null,
city varchar2(50),
CONSTRAINT customers_pk PRIMARY KEY (customer_id)
);
INSERT INTO customers (customer_id, customer_name, city)
VALUES (7001, 'Microsoft', 'New York');
INSERT INTO customers (customer_id, customer_name, city)
VALUES (7002, 'IBM', 'Chicago');
INSERT INTO customers (customer_id, customer_name, city)
VALUES (7003, 'Red Hat', 'Detroit');
INSERT INTO customers (customer_id, customer_name, city)
VALUES (7004, 'Red Hat', 'New York');
INSERT INTO customers (customer_id, customer_name, city)
VALUES (7005, 'Red Hat', 'San Francisco');
INSERT INTO customers (customer_id, customer_name, city)
VALUES (7006, 'NVIDIA', 'New York');
INSERT INTO customers (customer_id, customer_name, city)
VALUES (7007, 'NVIDIA', 'LA');
INSERT INTO customers (customer_id, customer_name, city)
VALUES (7008, 'NVIDIA', 'LA');
Solution:
The following SQL statement would return the number of distinct cities for each customer_name in the customers table:
SELECT customer_name, COUNT(DISTINCT city) as "Distinct Cities"
FROM customers
GROUP BY customer_name;
It would return the following result set:
CUSTOMER_NAME Distinct Cities
IBM 1
Microsoft 1
NVIDIA 2
Red Hat 3