Unique values with string_agg on more than 1 column - sql

I am trying to group by and get list of values for multiple columns. Here is an example:
City | State | Income
-------+-------+--------
Salem | OH | 40000
Salem | OH | 45000
Mason | OH | 50000
Dayton | OH | 60000
Salem | MA | 40000
Mason | MA | 45000
Mason | MA | 50000
Dayton | MA | 70000
Salem | PA | 45000
Mason | PA | 50000
Dayton | PA | 60000
The result I am looking for is:
City | States | Income
-------+------------+--------------
Salem | OH,MA,PA | 40000,45000
Mason | OH,MA,PA | 50000,45000
Dayton | OH,MA,PA | 60000,70000
I managed to get this far:
City | States | Income
-------+------------+-------------------------
Salem | OH,MA,PA | 40000,40000,45000,45000
Mason | OH,MA,PA | 50000,50000,50000,45000
Dayton | OH,MA,PA | 60000,70000,60000
How do I go from here to the result set?
City | States | Income
-------+------------+-------------------------
Salem | OH,MA,PA | 40000,45000
Mason | OH,MA,PA | 50000,45000
Dayton | OH,MA,PA | 60000,70000

Alas, SQL Server's string_agg() does not support distinct. But you can use conditional aggregation, keeping only the first row of each (city, state) and each (city, income) combination:
select city,
       string_agg(case when seqnum_state = 1 then state end, ',') as states,
       string_agg(case when seqnum_income = 1 then income end, ',') as incomes
from (select t.*,
             row_number() over (partition by city, state order by state) as seqnum_state,
             row_number() over (partition by city, income order by income) as seqnum_income
      from t
     ) t
group by city;
Here is a db<>fiddle.
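As a rough way to try the conditional-aggregation idea without a SQL Server instance, the sketch below runs the same query shape in SQLite through Python's sqlite3 module. It assumes SQLite 3.25 or later (for window functions) and swaps in SQLite's group_concat for string_agg; the two functions are not identical, so treat this as an illustration rather than a drop-in equivalent. The name distinct_lists is made up for the example.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; group_concat plays the role of string_agg.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (city TEXT, state TEXT, income INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("Salem", "OH", 40000), ("Salem", "OH", 45000), ("Mason", "OH", 50000),
        ("Dayton", "OH", 60000), ("Salem", "MA", 40000), ("Mason", "MA", 45000),
        ("Mason", "MA", 50000), ("Dayton", "MA", 70000), ("Salem", "PA", 45000),
        ("Mason", "PA", 50000), ("Dayton", "PA", 60000),
    ],
)

# Only the first row of each (city, state) / (city, income) pair contributes a value;
# every other row contributes NULL, which the aggregate skips.
query = """
SELECT city,
       group_concat(CASE WHEN seqnum_state = 1 THEN state END, ',') AS states,
       group_concat(CASE WHEN seqnum_income = 1 THEN income END, ',') AS incomes
FROM (SELECT t.*,
             row_number() OVER (PARTITION BY city, state ORDER BY state) AS seqnum_state,
             row_number() OVER (PARTITION BY city, income ORDER BY income) AS seqnum_income
      FROM t) AS x
GROUP BY city
"""

# The order of items inside each list is not guaranteed, so compare them as sets.
distinct_lists = {
    city: (set(states.split(",")), set(incomes.split(",")))
    for city, states, incomes in conn.execute(query)
}
```

Each city ends up with every state and every income listed once, regardless of how many source rows repeat them.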

You can perform separate group by on (City, state) and (City, Income) to remove duplicates, then you can separately build the (States) and (Incomes) aggregated strings and finally you can join the results in a single table:
CREATE TABLE #tmp (City VARCHAR(100), State VARCHAR(100), Income int);
INSERT INTO #tmp
VALUES ('Salem' ,'OH', 40000) ,('Salem' ,'OH', 45000) ,('Mason' ,'OH', 50000)
,('Dayton','OH', 60000) ,('Salem' ,'MA', 40000) ,('Mason' ,'MA', 45000)
,('Mason' ,'MA', 50000) ,('Dayton','MA', 70000) ,('Salem' ,'PA', 45000)
,('Mason' ,'PA', 50000) ,('Dayton','PA', 60000)
;with States as (
    select City, state
    from #tmp
    group by City, state
),
incomes as (
    select City, Income
    from #tmp
    group by City, Income
),
states_g as (
    select city, STRING_AGG(state, ',') as States
    from states
    group by city
),
incomes_g as (
    select city, STRING_AGG(Income, ',') as Incomes
    from incomes
    group by city
)
select s.City, s.States, i.Incomes
from states_g as s
inner join incomes_g as i
    on i.City = s.City

Here is one more way of doing it:
select city,
(select string_agg(value,', ') from (select distinct value from string_split(string_agg(state, ','),',')) t) as states,
(select string_agg(value,', ') from (select distinct value from string_split(string_agg(income, ','),',')) t) as incomes
from t
group by city;
You may easily convert the splitting and merging part into a reusable scalar valued function.
Experts are welcome to comment on performance.

Related

Row Number by Certain Column in SQL

I have a table that contains customer transactions. It looks like this:
The data is sorted by Total Transaction. I want to create a column that numbers the rows by City. For example, the first row's City is London, so its value is 1; the second row is also from London, so its value is also 1. When the next row is not London, the value is 2. So it looks like this:
Is there a way to create that row number in SQL Server?
You can try using dense_rank()
select *,dense_rank() over(order by city) as cityNumber
from tablename
order by total_transaction desc
I believe the question is valid. As I understand the requirement, you need two levels of query to get the final result.
The inner query uses max() as a window function to find the largest Total Transaction per city; the outer query then uses dense_rank() over that maximum (and the city) to number the cities.
select t.city as "City"
,dense_rank() over (order by max_total_per_city desc,city) as "City Number"
,t.customer as "Customer"
,t.total_transaction as "Total Transaction"
from
(
select *
,max(total_transaction) over (partition by city) as max_total_per_city
from tableName t
) t
order by total_transaction desc
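To check how this two-level query behaves, here is a small sketch that runs it in SQLite through Python's sqlite3 module. It assumes SQLite 3.25 or later, and it borrows its sample rows from the results table shown further down in this thread; the column names are assumptions, since the original table was only posted as an image.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names are assumed for the sketch.
conn.execute("CREATE TABLE tablename (City TEXT, Customer TEXT, TotalTransaction INTEGER)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [
        ("London", "Michael", 250), ("London", "Edward", 180), ("Paris", "Michael", 160),
        ("Madrid", "Luis", 153), ("London", "Serena", 146), ("Madrid", "Lionel", 133),
        ("Manchester", "Frank", 96),
    ],
)

# Inner query: attach each city's largest transaction to every row of that city.
# Outer query: number the cities by that maximum, so all rows of a city share a number.
query = """
SELECT City,
       dense_rank() OVER (ORDER BY max_total_per_city DESC, City) AS CityNumber,
       Customer,
       TotalTransaction
FROM (SELECT *,
             max(TotalTransaction) OVER (PARTITION BY City) AS max_total_per_city
      FROM tablename) AS t
ORDER BY TotalTransaction DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Adding the city as a second ordering key only matters when two cities share the same maximum; it keeps their numbers distinct.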
You can get the CityNumbers with ROW_NUMBER() window function:
select City, row_number() over (order by max(TotalTransaction) desc) CityNumber
from tablename
group by City
so you can join the above query to the table:
select t.City, c.CityNumber, t.Customer, t.Totaltransaction
from tablename t inner join (
select City, row_number() over (order by max(TotalTransaction) desc) CityNumber
from tablename
group by City
) c on c.City = t.City
order by t.TotalTransaction desc
Or with DENSE_RANK() window function:
select t.City,
dense_rank() over (order by (select max(TotalTransaction) from tablename where City = t.City) desc) as cityNumber,
t.Customer,
t.TotalTransaction
from tablename t
order by t.TotalTransaction desc
Results:
City       | CityNumber | Customer | Totaltransaction
-----------+------------+----------+-----------------
London     | 1          | Michael  | 250
London     | 1          | Edward   | 180
Paris      | 2          | Michael  | 160
Madrid     | 3          | Luis     | 153
London     | 1          | Serena   | 146
Madrid     | 3          | Lionel   | 133
Manchester | 4          | Frank    | 96

I have a table with country and city columns, as shown in the input table below. I need the output shown below it.

I need a SQL query to get the desired output from the input table
You can do this with a UNION query, first selecting the distinct country names, and then each of the cities for that country. The output is then ordered by the country; whether the value is a country or a city; and then by the value:
SELECT DISTINCT country AS data, country, 1 AS ctry
FROM cities
UNION ALL
SELECT city, country, 0
FROM cities
ORDER BY country, ctry DESC, data
Output:
data country ctry
India India 1
BNG India 0
CHN India 0
HYD India 0
Sweden Sweden 1
GOTH Sweden 0
STOCK Sweden 0
VAXO Sweden 0
It looks like you want to interleave the records, with each country followed by its related cities.
The actual solution depends heavily on your database, so let me assume that yours supports window functions, the row constructor values() and lateral joins (SQL Server and Postgres are two candidates).
In SQL Server, you could do:
select distinct rn, idx, val
from (
select t.*, dense_rank() over(order by country) rn
from mytable t
) t
cross apply (values (t.country, 1), (t.city, 2)) as v(val, idx)
order by rn, idx, val
Output:
rn | idx | val
-: | --: | :-----
1 | 1 | INDIA
1 | 2 | BNG
1 | 2 | CHN
1 | 2 | HYD
2 | 1 | SWEDEN
2 | 2 | STOCK
2 | 2 | VAXO
In Postgres, you would just replace cross apply with cross join lateral.

Get records based on column max value

I have cars table with data
country | car | price
---------------------
Germany | Mercedes | 30000
Germany | BMW | 20000
Germany | Opel | 15000
Japan | Honda | 20000
Japan | Toyota | 15000
I need to get the country, car and price of the row with the highest price
for each country:
Germany Mercedes 30000
Japan Honda 20000
Try joining the table to a subquery that finds the maximum price per country:
select cars.* FROM cars
INNER JOIN (
select country, max(price) AS maxprice from cars
GROUP BY country
) m
ON cars.country = m.country AND cars.price = m.maxprice
Use ROW_NUMBER()
SELECT *
FROM ( SELECT *,
ROW_NUMBER() OVER (PARTITION BY country
ORDER BY price DESC) as rn
FROM cars ) as T
WHERE T.rn = 1
If you allow ties, use RANK instead
SELECT *
FROM ( SELECT *,
RANK() OVER (PARTITION BY country
ORDER BY price DESC) as rn
FROM cars ) as T
WHERE T.rn = 1
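The two ranking variants can be compared side by side with a short sketch. This one uses SQLite through Python's sqlite3 module (assuming SQLite 3.25 or later for window functions) and the cars data from the question; top_per_country is a hypothetical helper written for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (country TEXT, car TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?)",
    [
        ("Germany", "Mercedes", 30000), ("Germany", "BMW", 20000),
        ("Germany", "Opel", 15000), ("Japan", "Honda", 20000),
        ("Japan", "Toyota", 15000),
    ],
)


def top_per_country(ranking_function):
    """Return the top-priced row(s) per country.

    ranking_function must be 'row_number' or 'rank'; it is spliced into the
    SQL text, so it is checked against that fixed list first.
    """
    if ranking_function not in ("row_number", "rank"):
        raise ValueError("unsupported ranking function")
    query = f"""
        SELECT country, car, price
        FROM (SELECT *,
                     {ranking_function}() OVER (PARTITION BY country
                                                ORDER BY price DESC) AS rn
              FROM cars) AS t
        WHERE rn = 1
        ORDER BY country, car
    """
    return conn.execute(query).fetchall()


top_by_row_number = top_per_country("row_number")
top_by_rank = top_per_country("rank")
```

On this data the two calls agree because no country has a tie for the highest price. With a tie, row_number would keep one arbitrary row per country while rank would keep all tied rows.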

Find Min Value and value of a corresponding column for that result

I have a table of user data in my SQL Server database and I am attempting to summarize the data. Basically, I need some min, max, and sum values and to group by some columns
Here is a sample table:
Member ID | Name | DateJoined | DateQuit | PointsEarned | Address
00001 | Leyth | 1/1/2013 | 9/30/2013 | 57 | 123 FirstAddress Way
00002 | James | 2/1/2013 | 7/21/2013 | 34 | 4 street road
00001 | Leyth | 2/1/2013 | 10/15/2013| 32 | 456 LastAddress Way
00003 | Eric | 2/23/2013 | 4/14/2013 | 15 | 5 street road
I'd like the summarized table to show the results like this:
Member ID | Name | DateJoined | DateQuit | PointsEarned | Address
00001 | Leyth | 1/1/2013 | 10/15/2013 | 89 | 123 FirstAddress Way
00002 | James | 2/1/2013 | 7/21/2013 | 34 | 4 street road
00003 | Eric | 2/23/2013 | 4/14/2013 | 15 | 5 street road
Here is my query so far:
Select MemberID, Name, Min(DateJoined), Max(DateQuit), SUM(PointsEarned), Min(Address)
From Table
Group By MemberID
The Min(Address) works this time, it retrieves the address that corresponds to the earliest DateJoined. However, if we swapped the two addresses in the original table, we would retrieve "123 FirstAddress Way" which would not correspond to the 1/1/2013 date joined.
For almost everything you can use a simple GROUP BY, but getting "the address from the row that has the minimum DateJoined" is a little trickier. You can solve it in several ways; one is a correlated subquery that looks up the address for each member:
SELECT
X.*,
(select Address
from #tmp t2
where t2.MemberID = X.memberID and
t2.DateJoined = (select MIN(DateJoined)
from #tmp t3
where t3.memberID = X.MemberID))
FROM
(select MemberID,
Name,
MIN(DateJoined) as DateJoined,
MAX(DateQuit) as DateQuit,
SUM(PointsEarned) as PointEarned
from #tmp t1
group by MemberID,Name
) AS X
Another option is to join the grouped subquery back to the table:
SELECT
X.*,
J.Address
FROM
(select
MemberID,
Name,
MIN(DateJoined) as DateJoined,
MAX(DateQuit) as DateQuit,
SUM(PointsEarned) as PointEarned
from #tmp t1
group by MemberID,Name
) AS X
JOIN #tmp J ON J.MemberID = X.MemberID AND J.DateJoined = X.DateJoined
You could rank your rows according to the date, and select the minimal one:
SELECT t.member_id,
       name,
       date_joined,
       date_quit,
       points_earned,
       address
FROM (SELECT member_id,
             name,
             MIN(date_joined) AS date_joined,
             MAX(date_quit) AS date_quit,
             SUM(points_earned) AS points_earned
      FROM my_table
      GROUP BY member_id, name) t
JOIN (SELECT member_id,
             address,
             RANK() OVER (PARTITION BY member_id ORDER BY date_joined) AS rk
      FROM my_table) addr ON addr.member_id = t.member_id AND rk = 1
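Another option is to join the per-member minimum and maximum dates back to the table, then use the minimum date to pick the address:
SELECT st.memberid, st.name, m1.datejoined, m2.datequit,
       SUM(st.pointsearned) AS pointsearned, a.address
FROM sampletable st
JOIN (SELECT memberid, MIN(datejoined) AS datejoined
      FROM sampletable
      GROUP BY memberid) m1 ON st.memberid = m1.memberid
JOIN (SELECT memberid, MAX(datequit) AS datequit
      FROM sampletable
      GROUP BY memberid) m2 ON st.memberid = m2.memberid
JOIN sampletable a ON a.memberid = m1.memberid AND a.datejoined = m1.datejoined
GROUP BY st.memberid, st.name, m1.datejoined, m2.datequit, a.address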
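To see the "join the grouped rows back to the table" approach on the sample data, here is a minimal sketch using SQLite through Python's sqlite3 module. Two things are assumptions made for the sketch: the table is called members, and the dates are stored as ISO strings (YYYY-MM-DD) so that MIN and MAX on text sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed table name; MemberID is text so the leading zeros survive.
conn.execute(
    "CREATE TABLE members (MemberID TEXT, Name TEXT, DateJoined TEXT, "
    "DateQuit TEXT, PointsEarned INTEGER, Address TEXT)"
)
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("00001", "Leyth", "2013-01-01", "2013-09-30", 57, "123 FirstAddress Way"),
        ("00002", "James", "2013-02-01", "2013-07-21", 34, "4 street road"),
        ("00001", "Leyth", "2013-02-01", "2013-10-15", 32, "456 LastAddress Way"),
        ("00003", "Eric", "2013-02-23", "2013-04-14", 15, "5 street road"),
    ],
)

# Step 1 (subquery x): one summary row per member.
# Step 2 (join j): find the original row whose DateJoined equals the member's
# earliest date and take its address, instead of relying on MIN(Address).
query = """
SELECT x.MemberID, x.Name, x.DateJoined, x.DateQuit, x.PointsEarned, j.Address
FROM (SELECT MemberID,
             Name,
             MIN(DateJoined) AS DateJoined,
             MAX(DateQuit) AS DateQuit,
             SUM(PointsEarned) AS PointsEarned
      FROM members
      GROUP BY MemberID, Name) AS x
JOIN members AS j
  ON j.MemberID = x.MemberID AND j.DateJoined = x.DateJoined
ORDER BY x.MemberID
"""
summary = conn.execute(query).fetchall()
```

If a member had two rows with the same earliest DateJoined, this join would return both addresses, so a tie-breaker would be needed in that case.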

Calculating percentage within a group

Given a table for which the following queries:
select sex, count(*) from my_table group by sex;
select sex, employed, count(*) from my_table group by sex, employed;
gives:
sex | count
-------+------
male | 1960
female | 1801
and:
sex | employed | count
---------+----------+-------
male | f | 1523
male | t | 437
female | f | 1491
female | t | 310
I'm having a difficulty writing a query that will calculate percentage of employed within each sex group. So the output should look like this:
sex | employed | count | percent
---------+----------+--------+-----------
male | f | 1523 | 77.7% (1523/1960)
male | t | 437 | 22.3% (437/1960)
female | f | 1491 | 82.8% (1491/1801)
female | t | 310 | 17.2% (310/1801)
It may be too late, but for future searchers, a possible solution is:
select sex, employed, COUNT(*) / CAST( SUM(count(*)) over (partition by sex) as float)
from my_table
group by sex, employed
According to the IO statistics this seems to be the most efficient solution, though that may depend on the number of rows queried (tested on the numbers above).
The same approach can be used to get the male/female percentage:
select sex, COUNT(*) / CAST( SUM(count(*)) over () as float)
from my_table
group by sex
You can do it with a sub-select and a join:
SELECT t1.sex, employed, count(*) AS `count`, count(*) / t2.total AS percent
FROM my_table AS t1
JOIN (
SELECT sex, count(*) AS total
FROM my_table
GROUP BY sex
) AS t2
ON t1.sex = t2.sex
GROUP BY t1.sex, employed;
I can't think of other approaches off the top of my head.
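The percentages in the question can be reproduced with a short sketch. It uses SQLite through Python's sqlite3 module (assuming SQLite 3.25 or later) and rebuilds a table with the row counts given in the question. The query is a slight rearrangement of the window-function answer: the counts are computed in a subquery first, then a windowed sum over each sex supplies the denominator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (sex TEXT, employed TEXT)")

# Recreate the row counts from the question: (sex, employed) -> number of rows.
counts = {
    ("male", "f"): 1523, ("male", "t"): 437,
    ("female", "f"): 1491, ("female", "t"): 310,
}
for key, n in counts.items():
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", [key] * n)

# Inner query: count per (sex, employed).
# Outer query: divide each count by the total for its sex.
# Multiplying by 100.0 forces floating-point division.
query = """
SELECT sex,
       employed,
       cnt,
       round(100.0 * cnt / sum(cnt) OVER (PARTITION BY sex), 1) AS percent
FROM (SELECT sex, employed, count(*) AS cnt
      FROM my_table
      GROUP BY sex, employed) AS g
ORDER BY sex DESC, employed
"""
percentages = conn.execute(query).fetchall()
for row in percentages:
    print(row)
```

Dropping the PARTITION BY clause (an empty OVER ()) would give each group's share of the whole table instead of its share within a sex.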