Get Row Totals for Dynamically Created Pivoted Table

In PostgreSQL I have a table like this:
CREATE TABLE cross_table (brand varchar(10), gender varchar(10), sales int);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Nike', 'Male', 10);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Nike', 'Male', 20);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Adidas', 'Woman', 20);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Nike', 'Male', 10);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Adidas', 'Woman', 30);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Puma', 'Woman', 40);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Puma', 'Male', 10);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Nike', 'Male', 20);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Puma', 'Woman', 10);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Adidas', 'Woman', 20);
And then I run this query to get brand as rows, gender as columns and sales as values:
with main_query as (
SELECT brand,
GROUPING(brand) AS "brand_grouping",
gender,
GROUPING(gender) AS "gender_grouping",
sum(sales) AS "sales"
FROM cross_table
GROUP BY ROLLUP (brand, gender)
),
second_query AS (
SELECT brand,
brand_grouping,
cast(
json_object_agg(
gender,
sales
ORDER BY gender DESC
) FILTER (WHERE gender_grouping = 0) AS jsonb) "gender",
SUM(sales) AS "sales"
FROM main_query
GROUP BY (brand, brand_grouping)
)
SELECT brand,
gender,
sales
FROM second_query
ORDER BY brand_grouping, brand
This would produce the following result:
brand  | gender                  | sales
adidas | { "Woman": 70 }         | 140
nike   | "Male": 60              | 120
puma   | "Male": 10, "Woman": 50 | 120
NULL   | NULL                    | 190
Please note: the gender column is now a JSON object, but its braces don't show in the table view here on Stack Overflow.
This is fine; the only problem is that the total row (where brand is NULL) is missing the totals for the pivoted "gender" column. I can solve this with hardcoded keys by changing the last query to this:
SELECT brand,
CASE WHEN gender IS null THEN
jsonb_build_object(
'Woman', SUM(("gender"->>'Woman')::float8) OVER (),
'Male', SUM(("gender"->>'Male')::float8) OVER ()
) ELSE "gender" END AS "gender",
sales
FROM second_query
ORDER BY brand_grouping, brand
Getting this result:
brand  | gender                   | sales
adidas | {"Woman": 70}            | 140
nike   | "Male": 60               | 120
puma   | "Male": 10, "Woman": 50  | 120
NULL   | "Male": 70, "Woman": 120 | 190
Which is correct, but I need to do this dynamically, without knowing the keys (Male/Woman) of "gender".
Does anyone know how to do this?

Try this:
SELECT brand
, jsonb_object_agg(gender, sales) AS gender
, sum(sales) AS sales
FROM (
SELECT brand
, gender
, sum(sales) AS sales
FROM cross_table
GROUP BY ROLLUP(brand), gender
) AS a
GROUP BY brand
Result:
brand  | gender                     | sales
null   | {"Male": 70, "Woman": 120} | 190
Adidas | {"Woman": 70}              | 70
Nike   | {"Male": 60}               | 60
Puma   | {"Male": 10, "Woman": 50}  | 60
see dbfiddle
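To check the idea outside PostgreSQL, here is a small sketch that runs on SQLite through Python's standard sqlite3 module (an assumption of this example, not part of the answer). SQLite has no ROLLUP, so the two grouping sets that ROLLUP(brand), gender produces, (brand, gender) and (gender), are written out with UNION ALL, and the outer jsonb_object_agg/sum step is done in Python instead. The function name is illustrative. The NULL-brand row ends up with the per-gender totals without any gender key being hardcoded.

```python
import sqlite3

# Sample data from the question: (brand, gender, sales)
ROWS = [
    ("Nike", "Male", 10), ("Nike", "Male", 20), ("Adidas", "Woman", 20),
    ("Nike", "Male", 10), ("Adidas", "Woman", 30), ("Puma", "Woman", 40),
    ("Puma", "Male", 10), ("Nike", "Male", 20), ("Puma", "Woman", 10),
    ("Adidas", "Woman", 20),
]

# Emulates GROUP BY ROLLUP(brand), gender: grouping sets (brand, gender) and (gender).
INNER_QUERY = """
SELECT brand, gender, SUM(sales) AS sales
FROM cross_table GROUP BY brand, gender
UNION ALL
SELECT NULL, gender, SUM(sales)
FROM cross_table GROUP BY gender
"""


def pivot_with_totals(rows):
    """Return {brand: ({gender: sales}, total)}; the key None is the grand-total row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE cross_table (brand TEXT, gender TEXT, sales INTEGER)")
    con.executemany("INSERT INTO cross_table VALUES (?, ?, ?)", rows)
    result = {}
    for brand, gender, sales in con.execute(INNER_QUERY):
        # Outer step: collect the gender keys per brand and add up the row total.
        per_gender, total = result.get(brand, ({}, 0))
        per_gender[gender] = sales
        result[brand] = (per_gender, total + sales)
    con.close()
    return result


for brand, (per_gender, total) in pivot_with_totals(ROWS).items():
    print(brand, per_gender, total)
```

The trick is that the total row is just another group (gender without brand), so new gender values show up in it automatically.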

Transpose Values of Dynamically Created Pivoted Table to Rows

In PostgreSQL I have a table like this:
CREATE TABLE cross_table (brand varchar(10), gender varchar(10), sales int);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Nike', 'Male', 10);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Nike', 'Male', 20);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Adidas', 'Woman', 20);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Nike', 'Male', 10);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Adidas', 'Woman', 30);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Puma', 'Woman', 40);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Puma', 'Male', 10);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Nike', 'Male', 20);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Puma', 'Woman', 10);
INSERT INTO cross_table (brand, gender, sales) VALUES ('Adidas', 'Woman', 20);
Then I can run this query to get brand as rows, gender as columns and "Sum of Sales" and "Count of Sales" as values:
SELECT brand,
jsonb_object_agg(
gender,
json_build_object(
'Sum of Sales', "Sum of Sales",
'Count of Sales', "Count of Sales"
)
) AS gender,
SUM("Sum of Sales") AS "Sum of Sales",
SUM("Count of Sales") AS "Count of Sales"
FROM (
SELECT brand,
gender,
sum(sales) AS "Sum of Sales",
count(sales) AS "Count of Sales"
FROM cross_table
GROUP BY ROLLUP(brand), gender
) AS a
GROUP BY brand
This would produce the following result:
brand  | gender | Sum of Sales | Count of Sales
NULL   | {"Male": {"Sum of Sales": 70,"Count of Sales": 5},"Woman": {"Sum of Sales": 120, "Count of Sales": 5}} | 190 | 10
adidas | {"Woman": {"Sum of Sales": 70,"Count of Sales": 3}} | 70 | 3
nike   | {"Male": {"Sum of Sales": 60, "Count of Sales": 4}} | 60 | 4
puma   | {"Male": {"Sum of Sales": 10,"Count of Sales": 1 }, "Woman": {"Sum of Sales": 50,"Count of Sales": 2}} | 60 | 3
But I would like to have the values as rows like this:
brand  | values         | Male | Woman | Total
null   | Sum of Sales   | 70   | 120   | 190
       | Count of Sales | 5    | 5     | 10
adidas | Sum of Sales   |      | 70    | 70
       | Count of Sales |      | 3     | 3
nike   | Sum of Sales   | 60   |       | 60
       | Count of Sales | 4    |       | 4
puma   | Sum of Sales   | 10   | 50    | 60
       | Count of Sales | 1    | 2     | 3
I've looked at two approaches:
1. Putting the values into an object like this:
json_build_object(
'Sum of Sales', SUM(sales),
'Count of Sales', COUNT(sales)
) as "Values"
And then somehow "expanding" them into rows under the "Values" column.
2. Using unnest in some way, though I'm unsure about that approach.
Important requirement:
The values of brand and gender are not known in advance, so the brand values (nike, etc.) and gender values (male, etc.) must be generated dynamically.
Does anyone know how to do this?
For context, this is a continuation of this question:
Get Row Totals for Dynamically Created Pivoted Table
1. Splitting each resulting row into 2 subrows is not a big deal:
SELECT brand,
label,
jsonb_object_agg(
gender,
json_build_object(label, total)
) AS gender,
SUM(total) AS "Total"
FROM (
SELECT brand,
gender,
sum(sales) :: integer AS total,
'Sum of Sales' AS label
FROM cross_table
GROUP BY ROLLUP(brand), gender
UNION ALL
SELECT brand,
gender,
count(sales) :: integer,
'Count of Sales'
FROM cross_table
GROUP BY ROLLUP(brand), gender
) AS a
GROUP BY brand, label
ORDER BY brand, label ;
2. Splitting the resulting columns into 2 or more subcolumns is trickier:
2.A Static list of gender values
If gender has only the two values "male" and "woman", then:
CREATE TYPE gender AS (male integer, woman integer) ;
SELECT brand,
label,
(jsonb_populate_record(null :: gender, jsonb_object_agg(lower(gender),total))).*,
SUM(total) AS "Total"
FROM (
SELECT brand,
gender,
sum(sales) :: integer AS total,
'Sum of Sales' AS label
FROM cross_table
GROUP BY ROLLUP(brand), gender
UNION ALL
SELECT brand,
gender,
count(sales) :: integer,
'Count of Sales'
FROM cross_table
GROUP BY ROLLUP(brand), gender
) AS a
GROUP BY brand, label
ORDER BY brand, label ;
2.B Dynamic list of gender values
If the list of gender values may vary over time, then:
The composite type gender must be created and updated dynamically by a trigger on table cross_table:
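CREATE OR REPLACE FUNCTION trigger_cross_table ()
RETURNS trigger LANGUAGE plpgsql AS $$
DECLARE
_columns text ;
BEGIN
-- one integer attribute per distinct gender, lower-cased to match
-- the keys produced by jsonb_object_agg(lower(gender), total)
SELECT string_agg(DISTINCT quote_ident(lower(gender)) || ' integer', ',')
INTO _columns
FROM cross_table ;
DROP TYPE IF EXISTS gender ;
IF _columns IS NOT NULL THEN -- the table may be empty
EXECUTE 'CREATE TYPE gender AS (' || _columns || ')' ;
END IF ;
RETURN NULL ;
END ; $$ ;
-- CREATE OR REPLACE TRIGGER requires PostgreSQL 14 or later
CREATE OR REPLACE TRIGGER trigger_cross_table AFTER INSERT OR UPDATE OF gender OR DELETE
ON cross_table FOR EACH STATEMENT EXECUTE FUNCTION trigger_cross_table() ;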
The final query is the same:
SELECT brand,
label,
(jsonb_populate_record(null :: gender, jsonb_object_agg(lower(gender),total))).*,
SUM(total) AS "Total"
FROM (
SELECT brand,
gender,
sum(sales) :: integer AS total,
'Sum of Sales' AS label
FROM cross_table
GROUP BY ROLLUP(brand), gender
UNION ALL
SELECT brand,
gender,
count(sales) :: integer,
'Count of Sales'
FROM cross_table
GROUP BY ROLLUP(brand), gender
) AS a
GROUP BY brand, label
ORDER BY brand, label ;
see dbfiddle
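As an alternative to maintaining a composite type with a trigger, the column list can be generated from the distinct gender values each time the query runs (the usual dynamic-SQL approach to pivots). This is a swapped-in technique, not the answer's method, sketched under assumptions: it runs on SQLite through Python's sqlite3 module, emulates ROLLUP(brand), gender with UNION ALL, and builds one conditional SUM(CASE ...) column per gender. The helper names are hypothetical.

```python
import sqlite3

# Sample data from the question: (brand, gender, sales)
ROWS = [
    ("Nike", "Male", 10), ("Nike", "Male", 20), ("Adidas", "Woman", 20),
    ("Nike", "Male", 10), ("Adidas", "Woman", 30), ("Puma", "Woman", 40),
    ("Puma", "Male", 10), ("Nike", "Male", 20), ("Puma", "Woman", 10),
    ("Adidas", "Woman", 20),
]

# Both measures as labelled rows; ROLLUP(brand), gender is emulated with
# UNION ALL because SQLite has no ROLLUP.
SUBQUERY = """
SELECT brand, gender, SUM(sales) AS total, 'Sum of Sales' AS label
FROM cross_table GROUP BY brand, gender
UNION ALL
SELECT NULL, gender, SUM(sales), 'Sum of Sales' FROM cross_table GROUP BY gender
UNION ALL
SELECT brand, gender, COUNT(sales), 'Count of Sales' FROM cross_table GROUP BY brand, gender
UNION ALL
SELECT NULL, gender, COUNT(sales), 'Count of Sales' FROM cross_table GROUP BY gender
"""


def build_db(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE cross_table (brand TEXT, gender TEXT, sales INTEGER)")
    con.executemany("INSERT INTO cross_table VALUES (?, ?, ?)", rows)
    return con


def measures_as_rows(con):
    """Hypothetical helper: one row per (brand, measure), one column per gender."""
    # The gender values are read when the query runs, not hardcoded.
    genders = [g for (g,) in con.execute(
        "SELECT DISTINCT gender FROM cross_table ORDER BY gender")]
    select_items = ["brand", "label"]
    for g in genders:
        # Quote the value as a column alias ("" escapes an embedded quote);
        # the comparison value itself is bound as a parameter.
        alias = '"' + g.replace('"', '""') + '"'
        select_items.append("SUM(CASE WHEN gender = ? THEN total END) AS " + alias)
    select_items.append('SUM(total) AS "Total"')
    sql = ("SELECT " + ", ".join(select_items)
           + " FROM (" + SUBQUERY + ") AS a"
           + " GROUP BY brand, label ORDER BY brand, label")
    cur = con.execute(sql, genders)
    header = [d[0] for d in cur.description]
    return header, cur.fetchall()


header, out = measures_as_rows(build_db(ROWS))
print(header)
for row in out:
    print(row)
```

Because the SQL text is rebuilt on every call, a new gender value simply becomes a new column; the trade-off is that the query has to be assembled in application code (or a PL/pgSQL function) rather than written once.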

How to show only top 2 and top 1 values in a grouping sets query?

I have this PostgreSQL table and data:
CREATE TABLE info (
brand VARCHAR(255),
segment VARCHAR(255),
name VARCHAR(255)
);
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'SUV', 'Highlander');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'SUV', 'Highlander');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'SUV', 'Highlander');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'SUV', '4Runner');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'SUV', 'RAV4');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'SUV', 'RAV4');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'Sedan', 'Camry');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'Sedan', 'Camry');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'Sedan', 'Corolla');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'Sedan', 'Corolla');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'Sedan', 'Corolla');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'Truck', 'Tacoma');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'Truck', 'Tundra');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'Truck', 'Tacoma');
INSERT INTO info (brand, segment, name) VALUES ('Toyota', 'Van', 'Sienna');
I have made this query to show the count for each grouping set and order it by the total count for each brand, segment, and name:
SELECT
brand,
segment,
name,
count (1) as total
FROM
info
GROUP BY
GROUPING SETS (
(brand),
(brand, segment),
(brand, segment,name)
)
ORDER BY
max(count (1)) over (partition by brand) desc,
max(count (1)) over (partition by brand,segment) desc,
count (1) desc;
This fiddle shows what it looks like.
Now I want to select only the top 2 segments per brand, and top 1 name per brand/segment.
So the result should look like this:
brand  | segment | name       | total
Toyota |         |            | 15
Toyota | SUV     |            | 6
Toyota | SUV     | Highlander | 3
Toyota | Sedan   |            | 5
Toyota | Sedan   | Corolla    | 3
I have tried using window functions, but the result is not what I expected.
Try using the ROW_NUMBER function as follows:
WITH get_grouping_set AS
(
SELECT brand, segment, name, count(1) AS total
FROM info
GROUP BY GROUPING SETS
(
(brand),
(brand, segment),
(brand, segment, name)
)
),
brand_segment_order AS
(
SELECT brand, segment,
ROW_NUMBER() OVER (PARTITION BY brand ORDER BY total DESC) rn_seg
FROM get_grouping_set
WHERE segment IS NOT NULL AND name IS NULL
),
joined_data AS
(
SELECT T.*,
ROW_NUMBER() OVER (PARTITION BY T.brand, T.segment ORDER BY T.total DESC) rn
FROM get_grouping_set T JOIN brand_segment_order T2
ON T.brand = T2.brand AND (T.segment = T2.segment OR T.segment IS NULL) -- keeps the brand-level row joined to its own brand only
WHERE T2.rn_seg <= 2
)
SELECT brand, segment, name, total
FROM joined_data
WHERE (rn = 1 AND segment IS NULL ) OR (rn <= 2 AND segment IS NOT NULL)
ORDER BY brand, MAX(Total) OVER (PARTITION BY brand, segment) DESC,
Total DESC, segment NULLS FIRST, name NULLS FIRST
See demo
Another solution.
You can use the dense_rank function, ordered by the max(count) of each brand-segment group, as follows:
WITH get_grouping_set AS
(
SELECT brand, segment, name, count(1) AS total,
MAX(count(*)) over (PARTITION BY brand, segment) max_brand_segment
FROM info
GROUP BY GROUPING SETS
(
(brand),
(brand, segment),
(brand, segment, name)
)
),
brand_segment_order AS
(
SELECT *,
DENSE_RANK() OVER (PARTITION BY brand ORDER BY max_brand_segment DESC) segment_rank,
DENSE_RANK() OVER (PARTITION BY brand, segment ORDER BY Total DESC) name_rank
FROM get_grouping_set
)
SELECT brand, segment, name, total
FROM brand_segment_order
WHERE segment_rank <= 3 AND name_rank <= 2
ORDER BY brand, max_brand_segment DESC,
Total DESC, segment NULLS FIRST, name NULLS FIRST
WHERE segment_rank <= 3 retrieves two segments per brand; the limit is one higher because the brand-level row (where segment is null) is also included, and it always ranks first.
AND name_rank <= 2 retrieves one name per segment; again the limit is one higher because the segment-level row (where name is null) is included.
DENSE_RANK is used so that all tied segments or names are returned, i.e. when several segments or names share the same max(count).
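The DENSE_RANK filter can be tried on SQLite through Python's sqlite3 module (an assumption for a self-contained run; SQLite 3.25 or later is needed for window functions). SQLite has no GROUPING SETS, so the three sets are spelled out with UNION ALL, and the MAX(count) window is moved into its own CTE. SQLite's ascending sort already puts NULLs first, so the NULLS FIRST clauses are dropped. The function name is illustrative.

```python
import sqlite3

# (brand, segment, name) rows from the question
CARS = [
    ("Toyota", "SUV", "Highlander"), ("Toyota", "SUV", "Highlander"),
    ("Toyota", "SUV", "Highlander"), ("Toyota", "SUV", "4Runner"),
    ("Toyota", "SUV", "RAV4"), ("Toyota", "SUV", "RAV4"),
    ("Toyota", "Sedan", "Camry"), ("Toyota", "Sedan", "Camry"),
    ("Toyota", "Sedan", "Corolla"), ("Toyota", "Sedan", "Corolla"),
    ("Toyota", "Sedan", "Corolla"), ("Toyota", "Truck", "Tacoma"),
    ("Toyota", "Truck", "Tundra"), ("Toyota", "Truck", "Tacoma"),
    ("Toyota", "Van", "Sienna"),
]

TOP_N_QUERY = """
WITH get_grouping_set AS (
    -- GROUPING SETS ((brand), (brand, segment), (brand, segment, name))
    SELECT brand, NULL AS segment, NULL AS name, COUNT(*) AS total
    FROM info GROUP BY brand
    UNION ALL
    SELECT brand, segment, NULL, COUNT(*) FROM info GROUP BY brand, segment
    UNION ALL
    SELECT brand, segment, name, COUNT(*) FROM info GROUP BY brand, segment, name
),
with_max AS (
    SELECT *, MAX(total) OVER (PARTITION BY brand, segment) AS max_brand_segment
    FROM get_grouping_set
),
ranked AS (
    SELECT *,
        DENSE_RANK() OVER (PARTITION BY brand ORDER BY max_brand_segment DESC) AS segment_rank,
        DENSE_RANK() OVER (PARTITION BY brand, segment ORDER BY total DESC) AS name_rank
    FROM with_max
)
SELECT brand, segment, name, total
FROM ranked
WHERE segment_rank <= 3 AND name_rank <= 2   -- +1 for the brand / segment subtotal rows
ORDER BY brand, max_brand_segment DESC, total DESC, segment, name
"""


def top_segments_and_names(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE info (brand TEXT, segment TEXT, name TEXT)")
    con.executemany("INSERT INTO info VALUES (?, ?, ?)", rows)
    result = con.execute(TOP_N_QUERY).fetchall()
    con.close()
    return result


for row in top_segments_and_names(CARS):
    print(row)
```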

How to replace the NULL value of the last row of a column with 'Grand total' while retaining 'Total' replacing NULL value in the same column?

Below is the table I created and the values I inserted into it:
CREATE TABLE Employees
(
Id INTEGER IDENTITY(1,1),
Name VARCHAR(50),
Gender VARCHAR(50),
Salary INTEGER,
Country VARCHAR(50)
)
INSERT INTO Employees (Name, Gender, Salary, Country)
VALUES ('Mark', 'Male', 5000, 'USA')
INSERT INTO Employees (Name, Gender, Salary, Country)
VALUES ('John', 'Male', 4500, 'India')
INSERT INTO Employees (Name, Gender, Salary, Country)
VALUES ('Pam', 'Female', 5500, 'USA')
INSERT INTO Employees (Name, Gender, Salary, Country)
VALUES ('Sara', 'Female', 4000, 'India')
INSERT INTO Employees (Name, Gender, Salary, Country)
VALUES ('Todd', 'Male', 3500, 'India')
INSERT INTO Employees (Name, Gender, Salary, Country)
VALUES ('Mary', 'Female', 5000, 'UK')
INSERT INTO Employees (Name, Gender, Salary, Country)
VALUES ('Ben', 'Male', 6500, 'UK')
INSERT INTO Employees (Name, Gender, Salary, Country)
VALUES ('Elizabeth', 'Female', 7000, 'USA')
INSERT INTO Employees (Name, Gender, Salary, Country)
VALUES ('Tom', 'Male', 5500, 'UK')
INSERT INTO Employees (Name, Gender, Salary, Country)
VALUES ('Ron', 'Male', 5000, 'USA')
SELECT * FROM Employees
Now I ran the following query:
SELECT
COALESCE(Country, '') AS [Country],
COALESCE(Gender, 'Total') AS [Gender],
SUM(Salary) AS [Total Salary]
FROM
Employees
GROUP BY
ROLLUP(Country, Gender)
When you look at the query result, the last row of the Gender column has the value 'Total' in it.
I want to replace 'Total' with 'Grand Total' only in the last row of Gender column while keeping 'Total' text in the other rows of Gender column.
Is there any way to achieve that?
If so, what is the simplest way to do it?
You can use GROUPING_ID() for it:
SELECT
COALESCE(Country,'') AS [Country],
CASE WHEN GROUPING_ID(Country)=1 THEN 'Grand Total' ELSE COALESCE(Gender,'Total') END as [Gender],
SUM(Salary) AS [Total Salary]
FROM Employees
GROUP BY ROLLUP(Country,Gender)
DBFIDDLE
EDIT: As noted in the comments on the question, the order of the result should be specified explicitly to guarantee it is correct.
This query can be ordered like this to make sure the totals appear below the details:
SELECT
COALESCE(Country,'') AS [Country],
CASE WHEN GROUPING_ID(Country)=1 THEN 'Grand Total' ELSE COALESCE(Gender,'Total') END as [Gender],
SUM(Salary) AS [Total Salary],
GROUPING_ID(Country),
GROUPING_ID(Gender)
FROM Employees
GROUP BY ROLLUP(Country,Gender)
ORDER BY COALESCE(Country,'ZZZ'),GROUPING_ID(Country),
Gender,GROUPING_ID(Gender)
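Another easy way is to build the label by concatenating the country name using ISNULL, which is preferable in SQL Server when there are just two arguments. Note that the subtotal rows then read 'India Total', 'UK Total', etc. rather than just 'Total':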
select
isnull(Country,'') Country,
isnull(Gender, Concat(IsNull(Country, 'Grand'), ' Total')) Gender,
Sum(Salary) [Total Salary]
from Employees
group by rollup(Country,Gender);
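What both answers rely on is telling a NULL produced by ROLLUP apart from ordinary data. As a hedged sketch, the same labeling can be reproduced on SQLite through Python's sqlite3 module (not SQL Server): SQLite has neither ROLLUP nor GROUPING_ID, so the grouping sets are spelled out with UNION ALL, and the hypothetical flag columns g_country and g_gender play the role of GROUPING_ID(Country) and GROUPING_ID(Gender). The function name is illustrative.

```python
import sqlite3

# (Name, Gender, Salary, Country) rows from the question
EMPLOYEES = [
    ("Mark", "Male", 5000, "USA"), ("John", "Male", 4500, "India"),
    ("Pam", "Female", 5500, "USA"), ("Sara", "Female", 4000, "India"),
    ("Todd", "Male", 3500, "India"), ("Mary", "Female", 5000, "UK"),
    ("Ben", "Male", 6500, "UK"), ("Elizabeth", "Female", 7000, "USA"),
    ("Tom", "Male", 5500, "UK"), ("Ron", "Male", 5000, "USA"),
]

ROLLUP_QUERY = """
SELECT COALESCE(Country, '') AS Country,
       CASE WHEN g_country = 1 THEN 'Grand Total'
            ELSE COALESCE(Gender, 'Total') END AS Gender,
       total_salary
FROM (
    -- ROLLUP(Country, Gender) written out; the g_* flags mimic GROUPING_ID()
    SELECT Country, Gender, SUM(Salary) AS total_salary, 0 AS g_country, 0 AS g_gender
    FROM Employees GROUP BY Country, Gender
    UNION ALL
    SELECT Country, NULL, SUM(Salary), 0, 1 FROM Employees GROUP BY Country
    UNION ALL
    SELECT NULL, NULL, SUM(Salary), 1, 1 FROM Employees
)
ORDER BY g_country, Country, g_gender, Gender
"""


def salary_report(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Employees (Name TEXT, Gender TEXT, Salary INTEGER, Country TEXT)")
    con.executemany("INSERT INTO Employees VALUES (?, ?, ?, ?)", rows)
    result = con.execute(ROLLUP_QUERY).fetchall()
    con.close()
    return result


for row in salary_report(EMPLOYEES):
    print(row)
```

Ordering by the flags first is what keeps each subtotal under its details and the grand total last, independent of how NULLs happen to sort.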

Concatenate distinct strings and numbers

I am trying to get a distinct, concatenated list of employee_ids and the sum of their employee_allowance. However, I do not want to add the allowance of a duplicated employee_id more than once.
My expected result:
name | employee_ids         | allowance | explanation (not part of the output)
Bob  | 11Bob532, 11Bob923   | 26        | 13+13=26 because the IDs are different, so we sum both
Sara | 12Sara833            | 93        |
John | 18John243, 18John823 | 64        | 21+43=64 because we dropped the allowance of the duplicate 18John243
Table creation/dummy data
CREATE TABLE emp (
name varchar2(100) NOT NULL,
employee_id varchar2(100) NOT NULL,
employee_allowance number not null
);
INSERT INTO emp (name, employee_id, employee_allowance) VALUES ('Bob', '11Bob923', 13);
INSERT INTO emp (name, employee_id, employee_allowance) VALUES ('Bob', '11Bob532', 13);
INSERT INTO emp (name, employee_id, employee_allowance) VALUES ('Sara', '12Sara833', 93);
INSERT INTO emp (name, employee_id, employee_allowance) VALUES ('John', '18John243', 21);
INSERT INTO emp (name, employee_id, employee_allowance) VALUES ('John', '18John243', 21);
INSERT INTO emp (name, employee_id, employee_allowance) VALUES ('John', '18John823', 43);
My attempt
My output gives me the distinct, concatenated employee_ids but still sums up the duplicate employee_allowance row.
SELECT
name,
LISTAGG(DISTINCT employee_id, ', ') WITHIN GROUP (ORDER BY employee_id) "ids",
SUM(employee_allowance)
FROM emp
GROUP BY
name
Find the DISTINCT rows first and then aggregate:
SELECT name,
LISTAGG(employee_id, ', ') WITHIN GROUP (ORDER BY employee_id) AS employee_ids,
SUM(employee_allowance) AS allowance
FROM (
SELECT DISTINCT *
FROM emp
)
GROUP BY name
Which, for the sample data, outputs:
NAME | EMPLOYEE_IDS         | ALLOWANCE
Bob  | 11Bob532, 11Bob923   | 26
John | 18John243, 18John823 | 64
Sara | 12Sara833            | 93
db<>fiddle here
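The difference between aggregating before and after removing duplicates is easy to see side by side. The sketch below assumes SQLite through Python's sqlite3 module rather than Oracle, with group_concat standing in for LISTAGG; group_concat does not guarantee an element order, so the test compares the IDs as sets. The helper name run is illustrative.

```python
import sqlite3

# (name, employee_id, employee_allowance) rows from the question
EMP = [
    ("Bob", "11Bob923", 13), ("Bob", "11Bob532", 13),
    ("Sara", "12Sara833", 93), ("John", "18John243", 21),
    ("John", "18John243", 21), ("John", "18John823", 43),
]

# Aggregating the raw rows counts the duplicated 18John243 allowance twice.
NAIVE = "SELECT name, SUM(employee_allowance) FROM emp GROUP BY name"

# Removing duplicate rows first, then aggregating, counts it once.
DEDUPED = """
SELECT name,
       group_concat(employee_id, ', ') AS employee_ids,
       SUM(employee_allowance) AS allowance
FROM (SELECT DISTINCT * FROM emp)
GROUP BY name
ORDER BY name
"""


def run(sql, rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE emp (name TEXT, employee_id TEXT, employee_allowance INTEGER)")
    con.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)
    result = con.execute(sql).fetchall()
    con.close()
    return result


print(dict(run(NAIVE, EMP)))  # John is 21 + 21 + 43 here
for name, ids, allowance in run(DEDUPED, EMP):
    print(name, ids, allowance)
```

This relies on duplicates being identical in every column, as in the sample data: SELECT DISTINCT * only collapses rows that match completely, so a repeated employee_id with a different allowance would still be summed twice.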

Counting the instances of customers

Say that I have a table with one column named CustomerId.
An example instance of this table is:
CustomerId
14
12
11
204
14
204
I want to write a query that counts the number of occurrences of each customer ID.
In the end, I would like a result like this:
CustomerId NumberOfOccurences
14 2
12 1
11 1
204 2
14 1
I cannot think of a way to do this.
This is the most basic example of GROUP BY:
SELECT CustomerId, count(*) as NumberOfOccurences
FROM tablex GROUP BY CustomerId;
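A quick way to try this is an in-memory SQLite database through Python's sqlite3 module (an assumption for a self-contained run; tablex is the placeholder table name used in the answer). COUNT(*) counts the rows within each CustomerId group, so each ID appears exactly once in the result.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tablex (CustomerId INTEGER)")
con.executemany("INSERT INTO tablex VALUES (?)",
                [(14,), (12,), (11,), (204,), (14,), (204,)])

# One output row per distinct CustomerId, with its row count.
occurrences = dict(con.execute(
    "SELECT CustomerId, COUNT(*) AS NumberOfOccurences "
    "FROM tablex GROUP BY CustomerId"))
con.close()
print(occurrences)
```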
Practice exercise #3 on this page explains how to do this.
CREATE TABLE customers
( customer_id number(10) not null,
customer_name varchar2(50) not null,
city varchar2(50),
CONSTRAINT customers_pk PRIMARY KEY (customer_id)
);
INSERT INTO customers (customer_id, customer_name, city)
VALUES (7001, 'Microsoft', 'New York');
INSERT INTO customers (customer_id, customer_name, city)
VALUES (7002, 'IBM', 'Chicago');
INSERT INTO customers (customer_id, customer_name, city)
VALUES (7003, 'Red Hat', 'Detroit');
INSERT INTO customers (customer_id, customer_name, city)
VALUES (7004, 'Red Hat', 'New York');
INSERT INTO customers (customer_id, customer_name, city)
VALUES (7005, 'Red Hat', 'San Francisco');
INSERT INTO customers (customer_id, customer_name, city)
VALUES (7006, 'NVIDIA', 'New York');
INSERT INTO customers (customer_id, customer_name, city)
VALUES (7007, 'NVIDIA', 'LA');
INSERT INTO customers (customer_id, customer_name, city)
VALUES (7008, 'NVIDIA', 'LA');
Solution:
The following SQL statement would return the number of distinct cities for each customer_name in the customers table:
SELECT customer_name, COUNT(DISTINCT city) as "Distinct Cities"
FROM customers
GROUP BY customer_name;
It would return the following result set:
CUSTOMER_NAME Distinct Cities
IBM 1
Microsoft 1
NVIDIA 2
Red Hat 3
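To see why DISTINCT matters here, the sketch below (SQLite through Python's sqlite3 module, an assumption for a self-contained run) puts COUNT(DISTINCT city) next to a plain COUNT(city). NVIDIA has two rows for LA, which COUNT(DISTINCT city) counts only once.

```python
import sqlite3

# (customer_id, customer_name, city) rows from the exercise
CUSTOMERS = [
    (7001, "Microsoft", "New York"), (7002, "IBM", "Chicago"),
    (7003, "Red Hat", "Detroit"), (7004, "Red Hat", "New York"),
    (7005, "Red Hat", "San Francisco"), (7006, "NVIDIA", "New York"),
    (7007, "NVIDIA", "LA"), (7008, "NVIDIA", "LA"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, "
            "customer_name TEXT NOT NULL, city TEXT)")
con.executemany("INSERT INTO customers VALUES (?, ?, ?)", CUSTOMERS)
result = con.execute("""
    SELECT customer_name,
           COUNT(DISTINCT city) AS distinct_cities,  -- each city once per name
           COUNT(city) AS city_rows                  -- every non-NULL city row
    FROM customers
    GROUP BY customer_name
    ORDER BY customer_name
""").fetchall()
con.close()
for row in result:
    print(row)
```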