Unable to calculate median - SQL Server 2017 - sql

I am trying to compute the median number of transactions in each category.
A few notes (as the dataset below is a small snippet of a much larger dataset):
An employee can belong to multiple categories
Each transaction's median should be > 0
Not every person appears in every category
The data is set up like this:
| Person | Category | Transaction |
|:-------:|:--------:|:-----------:|
| PersonA | Sales | 27 |
| PersonB | Sales | 75 |
| PersonC | Sales | 87 |
| PersonD | Sales | 36 |
| PersonE | Sales | 70 |
| PersonB | Buys | 60 |
| PersonC | Buys | 92 |
| PersonD | Buys | 39 |
| PersonA | HR | 59 |
| PersonB | HR | 53 |
| PersonC | HR | 98 |
| PersonD | HR | 54 |
| PersonE | HR | 70 |
| PersonA | Other | 46 |
| PersonC | Other | 66 |
| PersonD | Other | 76 |
| PersonB | Other | 2 |
An ideal output would look like:
| Category | Median | Average |
|:--------:|:------:|:-------:|
| Sales | 70 | 59 |
| Buys | 60 | 64 |
| HR | 59 | 67 |
| Other | 56 | 48 |
I can get the average by:
SELECT
Category,
AVG(Transaction) AS Average_Transactions
FROM
table
GROUP BY
Category
And that works great!
This post tried to help me find the median. What I wrote was:
SELECT
Category,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Transaction) OVER (PARTITION BY Category) AS Median_Transactions
FROM
table
GROUP BY
Category
But I get an error:
Msg 8120: Column 'Transactions' is invalid in the select list because it is not contained in either an aggregate function or the **GROUP BY** clause
How can I fix this?

You can do what you want using SELECT DISTINCT:
SELECT DISTINCT Category,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Transaction) OVER (PARTITION BY Category) AS Median_Transactions
FROM table;
Unfortunately, SQL Server offers the PERCENTILE_ functions only as window functions, not as aggregate functions, and it doesn't have a MEDIAN() aggregate function. You can also do this using subqueries and counts.
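One thing worth checking against the expected output in the question: PERCENTILE_DISC(0.5) always returns a value that actually occurs in the data, while PERCENTILE_CONT(0.5) interpolates between the two middle values when the row count is even. The sketch below is not SQL Server code; it uses Python's statistics module as a stand-in for the two definitions (median_low for the discrete one, median for the continuous one) on the question's "Other" and "Sales" rows.

```python
import statistics

# Transaction values copied from the question's sample data.
other = [46, 66, 76, 2]        # even number of rows
sales = [27, 75, 87, 36, 70]   # odd number of rows

# Discrete median: the lower of the two middle values, always a real data point.
# This mirrors what PERCENTILE_DISC(0.5) returns.
disc_other = statistics.median_low(other)

# Continuous median: average of the two middle values for an even count.
# This mirrors what PERCENTILE_CONT(0.5) returns.
cont_other = statistics.median(other)

# With an odd number of rows both definitions agree.
disc_sales = statistics.median_low(sales)
cont_sales = statistics.median(sales)

print(disc_other, cont_other)  # 46 56.0
print(disc_sales, cont_sales)  # 70 70
```

So if the desired median for "Other" really is 56, PERCENTILE_CONT is probably the function to use in the query; PERCENTILE_DISC would give 46 for that category.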

It's not optimal but this is your solution
SELECT DISTINCT
category,
PERCENTILE_DISC(0.5)WITHIN GROUP(ORDER BY val) OVER (PARTITION BY category) AS Median_Transactions,
AVG(val) OVER (PARTITION BY d.category) [AVG]
FROM #data d;

I don't think this is pretty but it works. I didn't spend time on polishing it
with
avg_t as
( select category, avg(sales) as avg_sales
from sample
group by category),
mn as
( select category, avg(sales) as median_sales
from (
select category, sales ,
row_number() over (partition by category order by sales asc) as r ,
count(person) over (partition by category) as total_count
from sample
) mn_sub
where (total_count % 2 = 0 and r in ( (total_count/2), ((total_count/2)+1)) ) or
(total_count % 2 <> 0 and r = ((total_count+1)/2))
group by category
)
select avg_t.category, avg_t.avg_sales, mn.median_sales
from avg_t
inner join mn
on avg_t.category=mn.category
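To sanity-check the row_number/count idea against the question's sample data, here is a minimal sketch that runs it on an in-memory SQLite database from Python. This is an assumption-laden stand-in, not SQL Server: the table name sample and column sales follow the answer above, and the WHERE clause is a compacted variant that relies on integer division, so (n + 1) / 2 and (n + 2) / 2 give the two middle rows for an even count and the same middle row for an odd count. It needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3

# Sample rows from the question: (person, category, sales).
data = [
    ("PersonA", "Sales", 27), ("PersonB", "Sales", 75), ("PersonC", "Sales", 87),
    ("PersonD", "Sales", 36), ("PersonE", "Sales", 70),
    ("PersonB", "Buys", 60), ("PersonC", "Buys", 92), ("PersonD", "Buys", 39),
    ("PersonA", "HR", 59), ("PersonB", "HR", 53), ("PersonC", "HR", 98),
    ("PersonD", "HR", 54), ("PersonE", "HR", 70),
    ("PersonA", "Other", 46), ("PersonC", "Other", 66),
    ("PersonD", "Other", 76), ("PersonB", "Other", 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (person TEXT, category TEXT, sales INTEGER)")
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", data)

query = """
WITH ranked AS (
    SELECT category, sales,
           ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales) AS r,
           COUNT(*)     OVER (PARTITION BY category)                AS total_count
    FROM sample
)
SELECT category, AVG(sales) AS median_sales
FROM ranked
-- integer division: both expressions pick the middle row when the count is odd,
-- and the two middle rows when it is even
WHERE r IN ((total_count + 1) / 2, (total_count + 2) / 2)
GROUP BY category
ORDER BY category
"""

medians = dict(conn.execute(query).fetchall())
print(medians)
```

On this data the medians come out as 60 for Buys, 59 for HR, 56 for Other and 70 for Sales, which matches the expected output in the question.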

Related

Joining 2 unrelated tables together

I have just delved into PostgreSQL and am currently trying to practice an unorthodox query whereby I want to join 2 unrelated tables, each with the same number of rows, together such that every row carries the combined columns of both tables.
These are what I have:
technical table
position | height | technical_id
----------+--------+-------------
Striker | 172 | 3
CAM | 165 | 4
(2 rows)
footballers table
name | age | country | game_id
----------+-----+-----------+--------
Pele | 77 | Brazil | 1
Maradona | 65 | Argentina | 2
(2 rows)
What I have tried:
SELECT name, '' AS position, null AS height, age, country, game_id, null as technical_id
from footballers
UNION
SELECT '' as name, position, height, null AS age,'' AS country, null as game_id, technical_id
from technical;
Output:
name | position | height | age | country | game_id | technical_id
----------+----------+--------+-----+-----------+---------+-------------
| Striker | 172 | | | | 3
| CAM | 165 | | | | 4
Maradona | | | 65 | Argentina | 2 |
Pele | | | 77 | Brazil | 1 |
(4 rows)
What I'm looking for (ideally):
name | position | height | age | country | game_id | technical_id
----------+----------+--------+-----+-----------+---------+-------------
Pele | Striker | 172 | 77 | Brazil | 1 | 3
Maradona | CAM | 165 | 65 | Argentina | 2 | 4
(2 rows)
Please use the query below. Note that this is not the right way to design the schema; you should have a foreign key.
select t1.position,t1.height,t1.technical_id,t2.name,t2.age,t2.country,t2.game_id
from
(select position,height,technical_id,
row_number() over(order by technical_id) as rnk
from technical) t1
inner join
(select name,age,country,game_id,
row_number() over(order by game_id) as rnk
from footballers) t2
on t1.rnk = t2.rnk;
You don't have a column to join on, so you can generate one. What works is a sequential number generated by row_number(). So:
select *
from (select t.*, row_number() over () as seqnum
from technical t
) t join
(select f.*, row_number() over () as seqnum
from footballers f
) f
using (seqnum);
Note: Postgres has extended the syntax of row_number() so it does not require an order by clause. The ordering of the rows is arbitrary and might change on different runs of the query.
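If the pairing needs to be repeatable, one option is to give row_number() an explicit ORDER BY on each side. The sketch below tries this on an in-memory SQLite database from Python rather than on Postgres, purely so it is self-contained; ordering by technical_id and game_id is my assumption about which rows should line up, chosen because it reproduces the desired output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE technical (position TEXT, height INTEGER, technical_id INTEGER);
CREATE TABLE footballers (name TEXT, age INTEGER, country TEXT, game_id INTEGER);
INSERT INTO technical VALUES ('Striker', 172, 3), ('CAM', 165, 4);
INSERT INTO footballers VALUES ('Pele', 77, 'Brazil', 1), ('Maradona', 65, 'Argentina', 2);
""")

query = """
SELECT f.name, t.position, t.height, f.age, f.country, f.game_id, t.technical_id
FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY technical_id) AS seqnum
      FROM technical) AS t
JOIN (SELECT *, ROW_NUMBER() OVER (ORDER BY game_id) AS seqnum
      FROM footballers) AS f
USING (seqnum)   -- the generated sequence number is the only join key
ORDER BY seqnum
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The first row pairs Pele with Striker and the second pairs Maradona with CAM. The pairing is still only positional, so a real key between the two tables remains the better design.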

Combine PARTITION BY and GROUP BY

I have a (mssql) table like this:
+----+----------+---------+--------+--------+
| id | username | date | scoreA | scoreB |
+----+----------+---------+--------+--------+
| 1 | jim | 01/2020 | 100 | 0 |
| 2 | max | 01/2020 | 0 | 200 |
| 3 | jim | 01/2020 | 0 | 150 |
| 4 | max | 02/2020 | 150 | 0 |
| 5 | jim | 02/2020 | 0 | 300 |
| 6 | lee | 02/2020 | 100 | 0 |
| 7 | max | 02/2020 | 0 | 200 |
+----+----------+---------+--------+--------+
What I need is to get the best "combined" score per date. (With "combined" score I mean the best scores per user and per date summarized)
The result should look like this:
+----------+---------+--------------------------------------------+
| username | date | combined_score (max(scoreA) + max(scoreB)) |
+----------+---------+--------------------------------------------+
| jim | 01/2020 | 250 |
| max | 02/2020 | 350 |
+----------+---------+--------------------------------------------+
I came this far:
I can group the scores by user like this:
SELECT
username, (max(scoreA) + max(scoreB)) AS combined_score
FROM score_table
GROUP BY username
ORDER BY combined_score DESC
And I can get the best score per date with PARTITION BY like this:
SELECT *
FROM
(SELECT t.*, row_number() OVER (PARTITION BY date ORDER BY scoreA DESC) rn
FROM score_table t) as tmp
WHERE tmp.rn = 1
ORDER BY date
Is there a proper way to combine these statements and get the result I need? Thank you!
Btw. Don't care about possible ties!
You can combine window functions and aggregation functions like this:
SELECT s.*
FROM (SELECT username, date, (max(scoreA) + max(scoreB)) AS combined_score,
ROW_NUMBER() OVER (PARTITION BY date ORDER BY max(scoreA) + max(scoreB) DESC) as seqnum
FROM score_table
GROUP BY username, date
) s
WHERE seqnum = 1
ORDER BY combined_score DESC;
Note that date needs to be part of the aggregation.
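For anyone who wants to verify the result on the sample rows, here is a small sketch using an in-memory SQLite database from Python (not mssql). It splits the same logic into two steps, first aggregating per user and date and then ranking within each date; the CTE names per_user and ranked are hypothetical names introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE score_table "
    "(id INTEGER, username TEXT, date TEXT, scoreA INTEGER, scoreB INTEGER)"
)
conn.executemany(
    "INSERT INTO score_table VALUES (?, ?, ?, ?, ?)",
    [
        (1, "jim", "01/2020", 100, 0),
        (2, "max", "01/2020", 0, 200),
        (3, "jim", "01/2020", 0, 150),
        (4, "max", "02/2020", 150, 0),
        (5, "jim", "02/2020", 0, 300),
        (6, "lee", "02/2020", 100, 0),
        (7, "max", "02/2020", 0, 200),
    ],
)

query = """
WITH per_user AS (
    -- step 1: best scoreA plus best scoreB for each user on each date
    SELECT username, date, MAX(scoreA) + MAX(scoreB) AS combined_score
    FROM score_table
    GROUP BY username, date
),
ranked AS (
    -- step 2: rank users within each date by their combined score
    SELECT per_user.*,
           ROW_NUMBER() OVER (PARTITION BY date ORDER BY combined_score DESC) AS seqnum
    FROM per_user
)
SELECT username, date, combined_score
FROM ranked
WHERE seqnum = 1
ORDER BY date
"""

best = conn.execute(query).fetchall()
print(best)  # [('jim', '01/2020', 250), ('max', '02/2020', 350)]
```

Ties are broken arbitrarily by ROW_NUMBER(), which is fine here since the question says ties don't matter.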

SQL Query to Find Min and Max Values between Values, dates and companies in the same Query

I want to find the historic max and min closing price of each stock over the past 10 days from the current date, in a single query. The data is below. The query I tried returns the same high and low for every row. The high and low need to be calculated per stock for a period of 10 days.
RDBMS: SQL Server 2014
Note: the period might also need to be longer if required, e.g. 30 or 60 days.
For example, the output for the last 10 days should look like ABB,16-12-2019,1480 (MaxClose),1222 (MinClose) (test data).
+------+------------+-------------+
| Name | Date | Close |
+------+------------+-------------+
| ABB | 26-12-2019 | 1272.15 |
| ABB | 24-12-2019 | 1260.15 |
| ABB | 23-12-2019 | 1261.3 |
| ABB | 20-12-2019 | 1262 |
| ABB | 19-12-2019 | 1476 |
| ABB | 18-12-2019 | 1451.45 |
| ABB | 17-12-2019 | 1474.4 |
| ABB | 16-12-2019 | 1480.4 |
| ABB | 13-12-2019 | 1487.25 |
| ABB | 12-12-2019 | 1484.5 |
| INFY | 26-12-2019 | 73041.66667 |
| INFY | 24-12-2019 | 73038.33333 |
| INFY | 23-12-2019 | 73036.66667 |
| INFY | 20-12-2019 | 73031.66667 |
| INFY | 19-12-2019 | 73030 |
| INFY | 18-12-2019 | 73028.33333 |
| INFY | 17-12-2019 | 73026.66667 |
| INFY | 16-12-2019 | 73025 |
| INFY | 13-12-2019 | 73020 |
| INFY | 12-12-2019 | 73018.33333 |
+------+------------+-------------+
The query I tried, with no luck:
select max([close]) over (PARTITION BY name) AS MaxClose,
min([close]) over (PARTITION BY name) AS MinClose,
[Date],
name
from historic
where [DATE] between [DATE] -30 and [DATE]
and name='ABB'
group by [Date],
[NAME],
[close]
order by [DATE] desc
If you just want the highest and lowest close per name, then simple aggregation is enough:
select name, max([close]) max_close, min([close]) min_close
from historic
where [date] >= dateadd(day, -10, getdate())
group by name
order by name
If you want the entire corresponding records, then rank() is a solution:
select name, [date], [close]
from (
select
h.*,
rank() over(partition by name order by [close]) rn1,
rank() over(partition by name order by [close] desc) rn2
from historic h
where [date] >= dateadd(day, -10, getdate())
) t
where rn1 = 1 or rn2 = 1
order by name, date
Top and bottom ties will show up if any.
You can add a where condition to filter on a given name.
If you are looking for a running min/max
Example
Select *
,MinClose = min([Close]) over (partition by name order by date rows between 10 preceding and current row)
,MaxClose = max([Close]) over (partition by name order by date rows between 10 preceding and current row)
From YourTable
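A point to keep in mind with the running min/max: a ROWS frame counts rows, not calendar days, and "rows between 10 preceding and current row" covers 11 rows in total. The sketch below shows the frame behaviour on five of the ABB rows using an in-memory SQLite database from Python (not SQL Server). The 3-row window and the ISO-formatted dates are assumptions made to keep the example small and sortable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE historic (name TEXT, date TEXT, close REAL)")
conn.executemany(
    "INSERT INTO historic VALUES (?, ?, ?)",
    [
        ("ABB", "2019-12-12", 1484.5),
        ("ABB", "2019-12-13", 1487.25),
        ("ABB", "2019-12-16", 1480.4),
        ("ABB", "2019-12-17", 1474.4),
        ("ABB", "2019-12-18", 1451.45),
    ],
)

query = """
SELECT date,
       MIN(close) OVER (PARTITION BY name ORDER BY date
                        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS min_close,
       MAX(close) OVER (PARTITION BY name ORDER BY date
                        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS max_close
FROM historic
ORDER BY date
"""

# Each row sees itself plus at most the two rows before it (3 rows in total).
rolling = conn.execute(query).fetchall()
for row in rolling:
    print(row)
```

For 2019-12-18 the window holds the rows for the 16th, 17th and 18th, so the result is a min of 1451.45 and a max of 1480.4. Because trading days skip weekends, a window defined in rows spans more calendar days than its row count suggests.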

eSQL multiple join but with conditions

I've 3 tables as under
MERCHANDISE
+-----------+-----------+---------------+
| MERCH_NUM | MERCH_DIV | MERCH_SUB_DIV |
+-----------+-----------+---------------+
| 1 | car | awd |
| 1 | car | awd |
| 2 | bike | 1kcc |
| 3 | cycle | hybrid |
| 3 | cycle | city |
| 4 | moped | fixie |
+-----------+-----------+---------------+
PRIORITY
+----------+-----------+---------+---------+------------+------------+---------------+
| CUST_NUM | SALES_NUM | DOC_NUM | BALANCE | PRIORITY_1 | PRIORITY_2 | PRIORITY_CODE |
+----------+-----------+---------+---------+------------+------------+---------------+
| 90 | 1000 | 10 | 23 | 1 | 6 | NO |
| 91 | 1001 | 20 | 32 | 3 | 7 | PRI |
| 92 | 1002 | 30 | 11 | 2 | 8 | LATE |
| 93 | 1003 | 40 | 22 | 5 | 9 | 1MON |
+----------+-----------+---------+---------+------------+------------+---------------+
ORDER
+----------+-----------+---------+---------+-----------+-----------+
| CUST_NUM | SALES_NUM | DOC_NUM | COUNTRY | MERCH_NUM | MERCH_DIV |
+----------+-----------+---------+---------+-----------+-----------+
| 90 | 1000 | 10 | INDIA | 1 | car |
| 91 | 1001 | 20 | CHINA | 2 | bike |
| 92 | 1002 | 30 | USA | 3 | cycle |
| 93 | 1003 | 40 | UK | 4 | moped |
+----------+-----------+---------+---------+-----------+-----------+
I want to take the left join of the last two tables and join it with the first one, such that the MERCH_SUB_DIV 'awd' appears only once for each unique combination of MERCH_NUM and MERCH_DIV.
The code I came up with is below, but I'm not sure how to eliminate the duplicate row just for 'awd':
select
ROW#, MERCH.MERCH_NUMBER, ORDPRI.MERCH_NUMBER, ORDPRI.CUST_NUM,
BALANCE, SALES_NUM, ITEM_NUM, RANK, PRIORITY_1
from (
select
ROW_NUMBER() OVER(
PARTITION BY ORD.DOC_NUM, ORD.ITEM_NUM
ORDER BY ORD.DOC_NUM, ORD.ITEM_NUM ASC
) AS Row#,
ORD.CUST_NUM, PRI.CUST_NUM, ORD.MERCH_NUM, ORD.MERCH_DIV, PRI.BALANCE,
pri.DOC_NUM, pri.SALES_NUM, pri.PRIORITY_1, pri.PRIORITY_2
from ORDER as ORD
left join PRIORITY as PRI on ORD.DOC_NUM = PRI.DOC_NUM
and ORD.SALES_NUMBER = PRI.SALES_NUM
where country_name in ('USA', 'INDIA')
) as ORDPRI
left join MERCHANDISE as MERCH on ORDPRI.DIV = MERCH.DIV
and ORDPRI.MERCH_NUM = MERCH.MERCH_NUM
You have to use the DISTINCT keyword to get unique values, but if your PRIORITY and ORDER tables contain different values for the same MERCH_NUM, the final result will still contain repeated MERCH_NUM values.
SELECT DISTINCT M.MERCH_NUMBER, O.MERCH_NUMBER, O.CUST_NUM, BALANCE, SALES_NUM,ITEM_NUM,RANK,PRIORITY_1
FROM priority_table P
LEFT JOIN order_table O ON P.CUST_NUM = O.CUST_NUM AND P.SALES_NUM=O.SALES_NUM AND P.DOC_NUM = O.DOC_NUM
LEFT JOIN merchandise_table M ON M.MERCH_NUM = O.MERCH_NUM
A workaround is to add a new ROW_NUMBER() in the outermost query, partitioned by MERCH_SUB_DIV plus all the columns in the final list, and then filter the final results on that new row number. Here is some pseudocode that might help:
select
-- All expected columns in final result except the newRow#
ROW#, MERCH_NUM, CUST_NUM,
BALANCE, SALES_NUM, PRIORITY_1
from (
select
ROW#,
-- the new row number includes all column you want to show in final result
row_number() over ( PARTITION BY MERCH.MERCH_SUB_DIV ,
MERCH.MERCH_NUM, ORDPRI.MERCH_NUM, ORDPRI.CUST_NUM,
BALANCE, SALES_NUM, PRIORITY_1
order by (select 1 )) as newRow# ,
MERCH.MERCH_NUM, ORDPRI.CUST_NUM,
BALANCE, SALES_NUM, PRIORITY_1
from (
-- main query goes here
select
ROW_NUMBER() OVER(
PARTITION BY ORD.DOC_NUM --, ORD.ITEM_NUM
ORDER BY ORD.DOC_NUM ASC --, ORD.ITEM_NUM
) AS Row#,
ORD.CUST_NUM, ORD.MERCH_NUM, ORD.MERCH_DIV as DIV, PRI.BALANCE,
pri.DOC_NUM, pri.SALES_NUM, pri.PRIORITY_1, pri.PRIORITY_2
from #ORDER as ORD
left join #PRIORITY as PRI on ORD.DOC_NUM = PRI.DOC_NUM
and ORD.SALES_NUMBER = PRI.SALES_NUM
where country_name in ('USA', 'INDIA')
) as ORDPRI
left join #MERCHANDISE as MERCH on ORDPRI.DIV = MERCH.DIV
and ORDPRI.MERCH_NUM = MERCH.MERCH_NUM
) as T
-- final filter to get distinct values
where newRow# = 1

Cursor? Loop? Aggregate up rows data along with row results

I have the following table & data
Table name = MyTable
Description | Partition | Total
------------|---------------|--------------
CASH | Reconciled | 25
CASH | Adjustm | 50
CASH | Balanc | 120
LOANS | Adjustm | 44
LOANS | Balanc | 32
CARDS | Adjustm | 81
CARDS | Balanc | 67
MTG | Adjustm | 14
MTG | Balanc | 92
The requirement is simple enough - it's a straight select from the table, but for each unique description, I need to sum up the totals of all the partitions, such that the user will see
Description | Partition | Total
------------|---------------|--------------
CASH | TOTAL | 195 <
CASH | Reconciled | 25
CASH | Adjustm | 50
CASH | Balanc | 120
LOANS | TOTAL | 76 <
LOANS | Adjustm | 44
LOANS | Balanc | 32
CARDS | TOTAL | 148 <
CARDS | Adjustm | 81
CARDS | Balanc | 67
MTG | TOTAL | 106 <
MTG | Adjustm | 14
MTG | Balanc | 92
It's a stored proc I'm writing - I don't have the option of pulling this into a MT to perform this so I need to perform it in the body of the stored proc. Am I looking at some while Loop or Cursor to provide the roll up I need, or is there another glaringly obvious and easy solution that I'm just not seeing? Aside from the roll up, it's a straight
select * from MyTable
DB is Sybase.
Thanks
You can do this by using the GROUPING SETS extension of the GROUP BY clause:
SELECT Description,
COALESCE(Partition, 'Total') AS Partition,
SUM(Total) AS Total
FROM MyTable
GROUP BY GROUPING SETS ((Description, Partition), (Description));
or you could use:
SELECT Description,
COALESCE(Partition, 'Total') AS Partition,
SUM(Total) AS Total
FROM MyTable
GROUP BY ROLLUP (Description, Partition);
Without ROLLUP, you can do this using UNION ALL:
SELECT Description,
Partition,
Total
FROM MyTable
UNION ALL
SELECT Description,
'Total' AS Partition,
SUM(Total) AS Total
FROM MyTable
GROUP BY Description;
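The UNION ALL version is the most portable of the three, so here is a minimal sketch of it on the question's data, run against an in-memory SQLite database from Python rather than Sybase. The part alias and the sort_key column are hypothetical additions of mine, used only to put each TOTAL row ahead of its detail rows; "Partition" is double-quoted because PARTITION is a keyword in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE MyTable (Description TEXT, "Partition" TEXT, Total INTEGER)')
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [
        ("CASH", "Reconciled", 25), ("CASH", "Adjustm", 50), ("CASH", "Balanc", 120),
        ("LOANS", "Adjustm", 44), ("LOANS", "Balanc", 32),
        ("CARDS", "Adjustm", 81), ("CARDS", "Balanc", 67),
        ("MTG", "Adjustm", 14), ("MTG", "Balanc", 92),
    ],
)

query = """
SELECT Description, part, Total
FROM (
    -- detail rows, tagged so they sort after the total row
    SELECT Description, "Partition" AS part, Total, 1 AS sort_key
    FROM MyTable
    UNION ALL
    -- one roll-up row per description
    SELECT Description, 'TOTAL', SUM(Total), 0
    FROM MyTable
    GROUP BY Description
)
ORDER BY Description, sort_key
"""

rows = conn.execute(query).fetchall()
totals = {desc: total for desc, part, total in rows if part == "TOTAL"}
print(totals)
```

The totals come out as 195 for CASH, 76 for LOANS, 148 for CARDS and 106 for MTG, matching the rows marked with "<" in the question. Descriptions are sorted alphabetically here, which differs from the order shown in the question.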