Aggregate the long tail of a group by query into "others"

I have a table with one dimension and one metric:
name metric
A 4
A 9
B 27
C 9
D 6
I want to group by the dimension and then group the long tail of the results into an 'others' or 'the rest of the data' label.
For example, my query should return every name whose summed metric is greater than 10 and group the rest into 'others':
name metric
A 13
others 15
B 27
I can get this result by aggregating twice:
with T as (
    select
        name
        , (case when sum(metric) > 10 then name else 'others' end) as group_name
        , sum(metric) as metric
    from MyData
    group by name
)
select
    group_name as name
    , sum(metric) as metric
from T
group by group_name
order by metric
Can I do this in a single operation without using sub queries?

I'm pretty certain this requires two levels of aggregation, because the original data doesn't have the information for grouping the names. You need one aggregation to classify the names and one to calculate the final results.
That said, I would write this as:
select (case when sum_metric > 10 then name else 'others' end) as group_name,
       sum(sum_metric) as metric
from (select name, sum(metric) as sum_metric
      from mydata
      group by name
     ) t
group by group_name;
Alternatively, you could combine select distinct with a window function for something inscrutable such as:
select distinct (case when sum(metric) > 10 then name else 'others' end),
       sum(sum(metric)) over (partition by (case when sum(metric) > 10 then name else 'others' end)) as metric
from mydata
group by name;
However, select distinct is really doing another aggregation. So this eliminates the subquery but not the work.
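As a quick check of the two-level aggregation, here is a minimal sketch that runs the subquery version against the sample rows from the question. It assumes Python's built-in sqlite3 module as a stand-in engine (the question does not name a database), and it relies on SQLite accepting the group_name alias in GROUP BY. The long_tail helper and its threshold parameter are made up for illustration.

```python
import sqlite3

# Sample rows copied from the question; the table name mydata follows the answer.
ROWS = [("A", 4), ("A", 9), ("B", 27), ("C", 9), ("D", 6)]

QUERY = """
select (case when sum_metric > ? then name else 'others' end) as group_name,
       sum(sum_metric) as metric
from (select name, sum(metric) as sum_metric
      from mydata
      group by name
     ) t
group by group_name
order by metric
"""


def long_tail(rows, threshold=10):
    """Load rows into an in-memory table and fold small groups into 'others'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table mydata (name text, metric integer)")
    conn.executemany("insert into mydata values (?, ?)", rows)
    result = conn.execute(QUERY, (threshold,)).fetchall()
    conn.close()
    return result


print(long_tail(ROWS))
```

Some engines, SQL Server among them, do not accept a select-list alias in GROUP BY; there the CASE expression has to be repeated in the GROUP BY clause.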

Related

T-SQL query to find the required output

I am new to SQL queries. I have some data and am trying to produce the result shown below.
In my sample data, a customer ID repeats multiple times because of multiple locations. I want a query that returns the output shown in the output image, following these rules:
If a customer exists only once, I take that row.
If a customer exists more than once, I check the country; if Country = 'US', I take that row and discard the others.
If a customer exists more than once and the country is not US, I pick the first row.
PLEASE NOTE: I have 35 columns and I don't want to change the row order, because I have to select the first row when a customer exists more than once and the country is not 'US'.
What I have tried: I tried to do this with the rank function but was unsuccessful. I am not sure my approach is right. Could anyone share a T-SQL query for this problem?
Regards,
Rahul
Sample data:
Output required :

I have created a (short) dbfiddle
Short explanation (to just repeat the code here on SO):
Step1:
-- select everything, and 'US' as first row
SELECT
    cust_id,
    country,
    sales,
    CASE WHEN country='US' THEN 0 ELSE 1 END X,
    ROW_NUMBER() OVER (PARTITION BY cust_id
                       ORDER BY (CASE WHEN country='US' THEN 0 ELSE 1 END)) R
FROM table1
ORDER BY cust_id, CASE WHEN country='US' THEN 0 ELSE 1 END;
Step2:
-- filter only rows which are first row...
SELECT *
FROM (
    SELECT
        cust_id,
        country,
        sales,
        CASE WHEN country='US' THEN 0 ELSE 1 END X,
        ROW_NUMBER() OVER (PARTITION BY cust_id
                           ORDER BY (CASE WHEN country='US' THEN 0 ELSE 1 END)) R
    FROM table1
    -- ORDER BY cust_id, CASE WHEN country='US' THEN 0 ELSE 1 END
) x
WHERE x.R=1
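One caveat with the two steps above: rows that tie on the US flag have no defined order, so "the first row" is not guaranteed. Here is a minimal sketch of the same ROW_NUMBER() filter with a tie-breaker added. The seq column (recording the original row order), the sample rows, and the pick_rows helper are all assumptions for illustration, and the query runs on Python's built-in sqlite3 (SQLite 3.25 or later for window functions) rather than SQL Server.

```python
import sqlite3

# Hypothetical rows: (seq, cust_id, country, sales).
# seq is an assumed column that records the original row order.
ROWS = [
    (1, 101, "US", 100),
    (2, 102, "UK", 200),
    (3, 102, "US", 300),
    (4, 103, "IN", 400),
    (5, 103, "UK", 500),
]

QUERY = """
SELECT cust_id, country, sales
FROM (
    SELECT cust_id, country, sales,
           ROW_NUMBER() OVER (
               PARTITION BY cust_id
               ORDER BY CASE WHEN country = 'US' THEN 0 ELSE 1 END, seq
           ) AS r
    FROM table1
) x
WHERE x.r = 1
ORDER BY cust_id
"""


def pick_rows(rows):
    """Return one row per customer: the US row if present, else the earliest row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (seq INTEGER, cust_id INTEGER, country TEXT, sales INTEGER)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(pick_rows(ROWS))
```

Customer 102 keeps its US row even though it is not first, and customer 103 keeps the row with the lowest seq.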

I can't vouch for performance but it should work on SQL Server 2005. Assuming your table is named CustomerData try this:
select cust_id, country, Name, Sales, [Group]
from CustomerData
where country = 'US'
union
select c.* from CustomerData c
join (
    select cust_id, min(country) country
    from CustomerData
    where cust_id not in (
        select cust_id
        from CustomerData
        where country = 'US'
    )
    group by cust_id
) a on a.cust_id = c.cust_id and a.country = c.country
It works by selecting every row whose country is US, then unioning that with the row holding the alphabetically first country for each customer that has no US row. If min() isn't getting the country you want, then you'll need to find an alternative aggregation function that will select the country you want.

How to transform rows to columns using Postgres SQL?

I have the data in the following structure
Desired Output

Postgres (starting in 9.4) supports the filter syntax for conditional aggregation. So I would recommend:
SELECT customer_id,
       MAX(value) FILTER (WHERE name = 'age') as age,
       MAX(value) FILTER (WHERE name = 'marketing_consent') as marketing_consent,
       MAX(value) FILTER (WHERE name = 'gender') as gender
FROM t
GROUP BY customer_id

SELECT customer_id,
       MAX(CASE WHEN name='age' THEN value ELSE NULL END) AS age,
       MAX(CASE WHEN name='marketing_consent' THEN value ELSE NULL END) AS marketing_consent,
       MAX(CASE WHEN name='gender' THEN value ELSE NULL END) AS gender
FROM t
GROUP BY customer_id;
You just group by customer_id and pick out each value in its own column. You need an aggregate function on the various value columns for syntactical reasons, which is why each column has a MAX function.
Note that it would of course be better to store the data in a normalized fashion. Also, with newer versions of Postgres, you can use filter instead of case, which I find a bit more readable.
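To see the conditional aggregation at work, here is a minimal sketch using the CASE form, which also runs outside Postgres. It uses Python's built-in sqlite3 as a stand-in engine; the sample rows and the pivot helper are invented for illustration, since the question's data was only posted as an image.

```python
import sqlite3

# Hypothetical attribute rows: (customer_id, name, value).
ROWS = [
    (1, "age", "34"),
    (1, "gender", "F"),
    (1, "marketing_consent", "true"),
    (2, "age", "51"),
    (2, "gender", "M"),
]

QUERY = """
SELECT customer_id,
       MAX(CASE WHEN name = 'age' THEN value END) AS age,
       MAX(CASE WHEN name = 'marketing_consent' THEN value END) AS marketing_consent,
       MAX(CASE WHEN name = 'gender' THEN value END) AS gender
FROM t
GROUP BY customer_id
ORDER BY customer_id
"""


def pivot(rows):
    """Turn one row per attribute into one row per customer."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (customer_id INTEGER, name TEXT, value TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(pivot(ROWS))
```

A customer with no row for an attribute (customer 2 has no marketing_consent) gets NULL in that column, which Python shows as None.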

Pivot for the same gender

How do I pivot on multiple columns? I would like to achieve the result highlighted below.
For each Grade and each Gender, I would like the TotalA and TotalB values aligned in 4 columns in a single row. My final result needs to contain all 10 columns shown below.
My desired output [needs to contain 2 rows, with the GENDER column retained]:
I tried the query below, but it removed my Gender column and could not pivot the 2 columns (TotalA, TotalB) into 4 additional columns at the same time.
SELECT *,
[TotalA_Male] = [M],
[TotalB_Female] = [F]
FROM
(
SELECT * FROM table) AS s
PIVOT
(
MAX(TotalA) FOR [Gender] IN ([M],[F])
) AS p

I don't think you want a pivot at all. You are looking to find the partial sum of your total column, grouped by some key columns (it looks like Country and Grade in this case) . Window functions let you perform this partial sum. However, they won't filter by gender. You'll need to use a CASE expression inside the SUM() to only include male or female in your partial sums:
SELECT *,
       SUM(CASE WHEN Gender = 'M' THEN TotalA ELSE 0 END) OVER(PARTITION BY Country, Grade) AS TotalA_Male,
       SUM(CASE WHEN Gender = 'F' THEN TotalA ELSE 0 END) OVER(PARTITION BY Country, Grade) AS TotalA_Female,
       SUM(CASE WHEN Gender = 'M' THEN TotalB ELSE 0 END) OVER(PARTITION BY Country, Grade) AS TotalB_Male,
       SUM(CASE WHEN Gender = 'F' THEN TotalB ELSE 0 END) OVER(PARTITION BY Country, Grade) AS TotalB_Female
FROM totals
See also: https://msdn.microsoft.com/en-us/library/ms189461.aspx
Basically, the window functions let you do a GROUP BY as part of a single column expression in the SELECT list. The result of the aggregate and group by is included in every row, just as if it were any other expression. Note how there is no GROUP BY or PIVOT in the rest of the query. The PARTITION BY in the OVER() clause works like a GROUP BY, specifying how to group the rows in the resultset for the purposes of performing the specified aggregation (in this case, SUM()).
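Here is a minimal sketch of the windowed SUM(CASE ...) approach, showing that every input row survives and the Gender column is kept. It runs on Python's built-in sqlite3 (SQLite 3.25 or later for window functions) instead of SQL Server; the sample rows, the 'SG' country value, and the gender_totals helper are made up for illustration.

```python
import sqlite3

# Hypothetical rows: (Country, Grade, Gender, TotalA, TotalB).
ROWS = [
    ("SG", 1, "F", 7, 15),
    ("SG", 1, "M", 6, 16),
    ("SG", 2, "F", 5, 7),
    ("SG", 2, "M", 6, 9),
]

QUERY = """
SELECT Grade, Gender,
       SUM(CASE WHEN Gender = 'M' THEN TotalA ELSE 0 END) OVER (PARTITION BY Country, Grade) AS TotalA_Male,
       SUM(CASE WHEN Gender = 'F' THEN TotalA ELSE 0 END) OVER (PARTITION BY Country, Grade) AS TotalA_Female,
       SUM(CASE WHEN Gender = 'M' THEN TotalB ELSE 0 END) OVER (PARTITION BY Country, Grade) AS TotalB_Male,
       SUM(CASE WHEN Gender = 'F' THEN TotalB ELSE 0 END) OVER (PARTITION BY Country, Grade) AS TotalB_Female
FROM totals
ORDER BY Grade, Gender
"""


def gender_totals(rows):
    """Attach per-(Country, Grade) male and female totals to every row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE totals (Country TEXT, Grade INTEGER, Gender TEXT,"
        " TotalA INTEGER, TotalB INTEGER)"
    )
    conn.executemany("INSERT INTO totals VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in gender_totals(ROWS):
    print(row)
```

Both rows of a grade carry the same four totals, because the window aggregates over the whole partition rather than collapsing it.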

You can only pivot on a single column, so what you need to do is unpivot the TotalA and TotalB columns into rows, then generate a single column based on gender and the total, and use that in a pivot:
select * from (
    select
        grade,
        /* combine the columns for a pivot */
        total_gender_details = details + '_' + gender,
        totals
    from
        (values
            (1, 'F', cast(7.11321 as float), cast(15.55607 as float)),
            (1, 'M', 6.31913, 15.50801),
            (2, 'F', 5.26457, 6.94687),
            (2, 'M', 6.34666, 9.29783)
        ) t(grade,gender,totalA,totalB)
    /* unpivot the totals into rows */
    unpivot (
        totals
        for details in ([totalA], [totalB])
    ) up
) t
pivot (
    sum(totals)
    for total_gender_details in ([totalA_M],[totalA_F],[totalB_M],[totalB_F])
) p
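For engines without PIVOT and UNPIVOT, the same unpivot-then-pivot idea can be emulated. The sketch below is a substitute technique, not the T-SQL above: it unpivots with UNION ALL and pivots with conditional aggregation, running on Python's built-in sqlite3. The table name scores and the helper pivot_by_gender are assumptions; the values are the ones from the answer.

```python
import sqlite3

# Rows taken from the answer's VALUES list: (grade, gender, totalA, totalB).
ROWS = [
    (1, "F", 7.11321, 15.55607),
    (1, "M", 6.31913, 15.50801),
    (2, "F", 5.26457, 6.94687),
    (2, "M", 6.34666, 9.29783),
]

QUERY = """
SELECT grade,
       SUM(CASE WHEN total_gender_details = 'totalA_M' THEN totals END) AS totalA_M,
       SUM(CASE WHEN total_gender_details = 'totalA_F' THEN totals END) AS totalA_F,
       SUM(CASE WHEN total_gender_details = 'totalB_M' THEN totals END) AS totalB_M,
       SUM(CASE WHEN total_gender_details = 'totalB_F' THEN totals END) AS totalB_F
FROM (
    -- unpivot: one row per (grade, gender, total column)
    SELECT grade, 'totalA_' || gender AS total_gender_details, totalA AS totals FROM scores
    UNION ALL
    SELECT grade, 'totalB_' || gender, totalB FROM scores
) u
GROUP BY grade
ORDER BY grade
"""


def pivot_by_gender(rows):
    """Return one row per grade with the four gender/total combinations as columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (grade INTEGER, gender TEXT, totalA REAL, totalB REAL)")
    conn.executemany("INSERT INTO scores VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in pivot_by_gender(ROWS):
    print(row)
```

Like the PIVOT version, this yields one row per grade, so the Gender column itself is folded into the column names.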

Sum of 1 column to be displayed for all rows in a dataset

I have a case where I insert multiple datasets into a temp table. At the end, I would like to display the total number of rows for these multiple datasets across all the rows of the temp table. For example:
cnt1 name age
300 peter 21
200 piper 22
Desired result set:
cnt1 name age
500 peter 21
500 piper 22
This is the outcome I am looking for at the end of a very long stored procedure. I am not able to figure out how to add up on a single column and display the sum across all the rows.

With window function:
select sum(cnt1) over() as cnt1, name, age
from TableName
EDIT:
select (select sum(distinct cnt1) from TableName) as cnt1, name, age
from TableName

Try this (you can use union all to append it to your earlier results if you want).
Select sum(Cnt1) over () as Cnt1, name, age
from MyTable
My answer originally used over (partition by 1), but I see that that is unnecessary.

Cross join with a subquery that returns the grand total:
select gt cnt1, name, age
from mytable
cross join (select sum(cnt1) gt from mytable) x
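The window-function and cross-join answers can be compared side by side on the question's two rows. This is a minimal sketch on Python's built-in sqlite3 (SQLite 3.25 or later for the window function), with the table name mytable taken from the last answer and the run helper invented for illustration.

```python
import sqlite3

# Rows copied from the question: (cnt1, name, age).
ROWS = [(300, "peter", 21), (200, "piper", 22)]

WINDOW_QUERY = """
select sum(cnt1) over () as cnt1, name, age
from mytable
order by name
"""

CROSS_JOIN_QUERY = """
select gt as cnt1, name, age
from mytable
cross join (select sum(cnt1) as gt from mytable) x
order by name
"""


def run(query, rows=ROWS):
    """Load rows into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table mytable (cnt1 integer, name text, age integer)")
    conn.executemany("insert into mytable values (?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(run(WINDOW_QUERY))
print(run(CROSS_JOIN_QUERY))
```

Both queries repeat the grand total of 500 on every row; the empty over () makes the whole result set a single window.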

unique count of the columns?

I want to get a unique count across multiple columns containing similar or different data. I am using SQL Server 2005. For one column I am able to get the unique count, but what is the query to count multiple columns at a time?

You can run the following select, getting the data from a derived table:
select count(*) from (select distinct c1, c2 from t1) dt

To get the number of rows for each unique combination of column values, use
SELECT COUNT(*) FROM TableName GROUP BY UniqueColumn1, UniqueColumn2
To get the unique counts of multiple individual columns, use
SELECT COUNT(DISTINCT Column1), COUNT(DISTINCT Column2)
FROM TableName
It is not clear from your question exactly what you want to achieve.
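Since the three readings of "unique count" give different numbers, here is a minimal sketch that runs all of them on the same rows. It uses Python's built-in sqlite3 as a stand-in for SQL Server 2005; the sample rows and the unique_counts helper are made up, and the table and column names t1, c1, c2 follow the first answer.

```python
import sqlite3

# Hypothetical rows: (c1, c2). The pair ('a', 1) appears twice on purpose.
ROWS = [("a", 1), ("a", 1), ("a", 2), ("b", 2)]


def unique_counts(rows):
    """Return (distinct combinations, per-column distinct counts, rows per combination)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t1 (c1 text, c2 integer)")
    conn.executemany("insert into t1 values (?, ?)", rows)

    # Number of distinct (c1, c2) combinations, via a derived table.
    combos = conn.execute(
        "select count(*) from (select distinct c1, c2 from t1) dt"
    ).fetchone()[0]

    # Distinct values counted separately for each column.
    per_column = conn.execute(
        "select count(distinct c1), count(distinct c2) from t1"
    ).fetchone()

    # Number of rows inside each combination (one output row per combination).
    per_combo = conn.execute(
        "select c1, c2, count(*) from t1 group by c1, c2 order by c1, c2"
    ).fetchall()

    conn.close()
    return combos, per_column, per_combo


print(unique_counts(ROWS))
```

The derived-table query returns a single number, while the GROUP BY query returns one row per combination, so they answer different questions.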

I think what you're getting at is individual sums from two unique columns in one query. I was able to accomplish this by using
SELECT FiscalYear, SUM(Col1) AS Col1Total, SUM(Col2) AS Col2Total
FROM TableName
GROUP BY FiscalYear
If your data is not numerical in nature, you can use CASE statements
SELECT FiscalYear, SUM(CASE WHEN ColA = 'abc' THEN 1 ELSE 0 END) AS ColATotal,
SUM(CASE WHEN ColB = 'xyz' THEN 1 ELSE 0 END) AS ColBTotal
FROM TableName
GROUP BY FiscalYear
Hope this helps!
