Get total count of duplicates in column - sql

I need a query to count the total number of duplicates in a table. Is there any way to do this?
If I have a table like this:
+------------+----------+
| item_name  | quantity |
+------------+----------+
| Calculator | 89       |
| Notebooks  | 40       |
| Pencil     | 40       |
| Pens       | 32       |
| Shirts     | 29       |
| Shoes      | 29       |
| Trousers   | 29       |
+------------+----------+
I can't use a plain SELECT COUNT(quantity) with grouping, because counting the duplicated values only returns 2. (40 | 29)
How can I return 5, i.e. count every row whose quantity appears more than once? (40 | 40 | 29 | 29 | 29)

Using analytic functions:
WITH cte AS (
    SELECT *, COUNT(*) OVER (PARTITION BY quantity) AS cnt
    FROM yourTable
)
SELECT COUNT(*)
FROM cte
WHERE cnt > 1;
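For the sample table, the CTE tags each row with the frequency of its quantity, so the outer count picks up the 5 rows whose cnt exceeds 1:
+------------+----------+-----+
| item_name  | quantity | cnt |
+------------+----------+-----+
| Calculator | 89       | 1   |
| Notebooks  | 40       | 2   |
| Pencil     | 40       | 2   |
| Pens       | 32       | 1   |
| Shirts     | 29       | 3   |
| Shoes      | 29       | 3   |
| Trousers   | 29       | 3   |
+------------+----------+-----+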

One method uses two levels of aggregation:
select sum(cnt)
from (select quantity, count(*) as cnt
      from t
      group by quantity
     ) t
where cnt > 1;
Interestingly, if you wanted "3" -- the number of surplus duplicate rows, i.e. each duplicated value counted once less than it occurs -- you could express this as:
select count(*) - count(distinct quantity)
from t;
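For the sample table that is 7 - 4 = 3: seven rows minus the four distinct quantities (89, 40, 32, 29).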
But that is not what you are asking for.

Count distinct customers over rolling window partition

My question is similar to "redshift: count distinct customers over window partition", but I have a rolling window partition.
My query looks like this, but DISTINCT inside a windowed COUNT is not supported in Redshift:
select p_date, seconds_read,
       count(distinct customer_id) over (order by p_date rows between unbounded preceding and current row) as total_cumulative_customer
from table_x
My goal is to calculate the total number of unique customers up to every date (hence the rolling window).
I tried the dense_rank() approach, but it simply fails since I cannot use a window function like this:
select p_date, max(total_cumulative_customer) over ()
from (select p_date, seconds_read,
             dense_rank() over (order by customer_id rows between unbounded preceding and current row) as total_cumulative_customer -- WILL FAIL HERE
      from table_x) t
Any workaround or different approach would be helpful!
EDIT:
INPUT DATA sample
+------+----------+--------------+
| Cust | p_date   | seconds_read |
+------+----------+--------------+
| 1    | 1-Jan-20 | 10           |
| 2    | 1-Jan-20 | 20           |
| 4    | 1-Jan-20 | 30           |
| 5    | 1-Jan-20 | 40           |
| 6    | 5-Jan-20 | 50           |
| 3    | 5-Jan-20 | 60           |
| 2    | 5-Jan-20 | 70           |
| 1    | 5-Jan-20 | 80           |
| 1    | 5-Jan-20 | 90           |
| 1    | 7-Jan-20 | 100          |
| 3    | 7-Jan-20 | 110          |
| 4    | 7-Jan-20 | 120          |
| 7    | 7-Jan-20 | 130          |
+------+----------+--------------+
Expected Output
+----------+-------------------------+------------------+--------------------------------------------+
| p_date   | total_distinct_cum_cust | sum_seconds_read | Comment                                    |
+----------+-------------------------+------------------+--------------------------------------------+
| 1-Jan-20 | 4                       | 100              | total distinct cust = 4 i.e. 1,2,4,5       |
| 5-Jan-20 | 6                       | 450              | total distinct cust = 6 i.e. 1,2,3,4,5,6   |
| 7-Jan-20 | 7                       | 910              | total distinct cust = 7 i.e. 1,2,3,4,5,6,7 |
+----------+-------------------------+------------------+--------------------------------------------+
For this operation:
select p_date, seconds_read,
       count(distinct customer_id) over (order by p_date rows between unbounded preceding and current row) as total_cumulative_customer
from table_x;
You can do pretty much what you want with two levels of aggregation:
select min_p_date,
       sum(count(*)) over (order by min_p_date rows between unbounded preceding and current row) as running_distinct_customers
from (select customer_id, min(p_date) as min_p_date
      from table_x
      group by customer_id
     ) c
group by min_p_date;
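Against the sample data, the inner query pins each customer to their first date (customers 1, 2, 4, 5 to 1-Jan-20; 3 and 6 to 5-Jan-20; 7 to 7-Jan-20), so the per-date counts 4, 2, 1 accumulate to the expected 4, 6, 7.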
Summing the seconds read as well is a bit tricky, but you can use the same idea:
select p_date,
       sum(sum(seconds_read)) over (order by p_date rows between unbounded preceding and current row) as seconds_read,
       sum(sum(case when seqnum = 1 then 1 else 0 end)) over (order by p_date rows between unbounded preceding and current row) as running_distinct_customers
from (select customer_id, p_date, seconds_read,
             row_number() over (partition by customer_id order by p_date) as seqnum
      from table_x
     ) c
group by p_date;
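Here seqnum = 1 marks each customer's first-ever row, so the inner sum counts the customers who are new on each date; the outer running sum then reproduces the cumulative distinct count while sum(seconds_read) accumulates independently.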
One workaround uses a subquery:
select p_date, seconds_read,
       (select count(distinct t1.customer_id)
        from table_x t1
        where t1.p_date <= t.p_date
       ) as total_cumulative_customer
from table_x t
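For each output row, the correlated subquery re-counts the distinct customers seen on or before that row's p_date, so all rows of the same date carry the same cumulative figure.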
I'd like to add that you can also accomplish this with an explicit self join, which is, in my opinion, more straightforward and readable than the subquery approaches described in the other answers.
select
    t1.p_date,
    sum(t2.seconds_read) as sum_seconds_read,
    count(distinct t2.customer_id) as distinct_cum_cust_totals
from
    (select distinct p_date from table_x) t1  -- one row per date, so the join does not multiply the sums
join
    table_x t2
on
    t2.p_date <= t1.p_date
group by
    t1.p_date
Most query planners will reduce a correlated subquery like the ones in the solutions above to an efficient join like this, so either solution is usually fine. For the general case, though, I believe this is the better solution, since some engines (like BigQuery) won't allow correlated subqueries and will force you to define the join explicitly in your query.

How to get Top 10 for a grouped column?

My data is a list of customers and products, and the cost for each product
Member Product Cost
Bob A123 $25
Bob A123 $25
Bob A123 $75
Joe A789 $50
Joe A789 $50
Bob C321 $50
Joe A123 $50
etc, etc, etc
My current query grabs each customer, product and cost, and also the total cost for that customer. It gives results like this:
Member Product Cost Total Cost
Bob A123 $125 $275
Bob A1433 $100 $275
Bob C321 $50 $275
Joe A123 $150 $250
Joe A789 $100 $250
How can I get the top 10 by Total Cost, not just the top 10 records overall? My query is:
SELECT a.Member
,a.Product
,SUM(a.Cost)
,(SELECT SUM(b.Cost) from MyTable b WHERE b.Member = a.Member) as 'Total Cost'
FROM MyTable a
GROUP BY a.Member
,a.Product
ORDER BY [Total Cost] DESC
If I do a SELECT TOP 10 it only gives me the first 10 rows. The actual Top 10 would end up being more like 40 or 50 rows.
Thanks!
Try this one.
SELECT tbl.member,
tbl.product,
Sum(tbl.cost) AS cost,
Max(stbl.totalcost) AS totalcost
FROM mytable tbl
INNER JOIN (SELECT member,
Sum(cost) AS totalcost,
Row_number() OVER (ORDER BY Sum(cost) DESC) AS rn
FROM mytable
GROUP BY member) stbl
ON stbl.member = tbl.member
WHERE stbl.rn <= 10
GROUP BY tbl.member, tbl.product
ORDER BY Max(stbl.rn)
Online Demo: http://sqlfiddle.com/#!18/87857/1/0
Table structure & Sample Data
CREATE TABLE mytable
(
member NVARCHAR(50),
product NVARCHAR(10),
cost INT
)
INSERT INTO mytable
VALUES ('Bob','A123','25'),
('Bob','A123','25'),
('Bob','A123','75'),
('Joe','A789','50'),
('Joe','A789','50'),
('Bob','C321','50'),
('Joe','A123','50'),
('Rock','A123','50'),
('Anord','A100','50'),
('Jack','A123','50'),
('Anord','A123','50'),
('Joe','A123','50'),
('Karma','A123','50'),
('Seetha','A123','50'),
('Aruna','A123','50'),
('Jake','A123','50'),
('Paul','A123','50'),
('Logan','A123','50'),
('Joe','A123','50');
Subquery - Total cost per customer
SELECT member,
Sum(cost) AS totalcost,
Row_number() OVER (ORDER BY Sum(cost) DESC) AS rn
FROM mytable
GROUP BY member
Subquery: Output
+---------+------------+----+
| member | totalcost | rn |
+---------+------------+----+
| Joe | 250 | 1 |
| Bob | 175 | 2 |
| Anord | 100 | 3 |
| Aruna | 50 | 4 |
| Jack | 50 | 5 |
| Jake | 50 | 6 |
| Karma | 50 | 7 |
| Logan | 50 | 8 |
| Paul | 50 | 9 |
| Rock | 50 | 10 |
| Seetha | 50 | 11 |
+---------+------------+----+
Record Count: 11
Main Query
SELECT tbl.member,
tbl.product,
Sum(tbl.cost) AS cost,
Max(stbl.totalcost) AS totalcost,
Max(stbl.rn) AS rn
FROM mytable tbl
INNER JOIN (SELECT member,
Sum(cost) AS totalcost,
Row_number() OVER (ORDER BY Sum(cost) DESC) AS rn
FROM mytable
GROUP BY member) stbl
ON stbl.member = tbl.member
GROUP BY tbl.member, tbl.product
ORDER BY Max(stbl.rn)
Main Query: Output
+---------+----------+-------+------------+----+
| member | product | cost | totalcost | rn |
+---------+----------+-------+------------+----+
| Joe | A123 | 150 | 250 | 1 |
| Joe | A789 | 100 | 250 | 1 |
| Bob | C321 | 50 | 175 | 2 |
| Bob | A123 | 125 | 175 | 2 |
| Anord | A100 | 50 | 100 | 3 |
| Anord | A123 | 50 | 100 | 3 |
| Aruna | A123 | 50 | 50 | 4 |
| Jack | A123 | 50 | 50 | 5 |
| Jake | A123 | 50 | 50 | 6 |
| Karma | A123 | 50 | 50 | 7 |
| Logan | A123 | 50 | 50 | 8 |
| Paul | A123 | 50 | 50 | 9 |
| Rock | A123 | 50 | 50 | 10 |
| Seetha | A123 | 50 | 50 | 11 |
+---------+----------+-------+------------+----+
Record Count: 14
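Note that the Main Query has no rn filter; adding WHERE stbl.rn <= 10, as in the answer above, drops Seetha (rn = 11) and leaves exactly the top ten members.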
You can rank the member totals with a window function in a second CTE and then filter on that rank:
with temp as (
    SELECT a.Member
          ,a.Product
          ,SUM(a.Cost) AS Cost
          ,(SELECT SUM(b.Cost) from MyTable b WHERE b.Member = a.Member) as [Total Cost]
    FROM MyTable a
    GROUP BY a.Member, a.Product
),
ranked as (
    select t.*, dense_rank() over (order by [Total Cost] desc) as rnk
    from temp t
)
select *
from ranked
where rnk <= 10
order by rnk
You can use dense_rank() with apply:
select mt.*
from (select mt.*, sum(mt.Cost) over (partition by mt.Product, mt.Member) as product_cost,
             dense_rank() over (order by mt1.TotalCost desc) as seq
      from MyTable mt cross apply
           (select sum(mt1.Cost) as TotalCost
            from MyTable mt1
            where mt1.Member = mt.Member
           ) mt1
     ) mt
where mt.seq <= 10;
Use a subquery to get the TOP 10 total costs and join to your query:
SELECT
t.Member, t.Product, t.Cost, g.[Total Cost]
FROM (
SELECT Member, Product, SUM(Cost) as Cost
FROM MyTable
GROUP BY Member, Product
) t INNER JOIN (
SELECT TOP (10) Member, SUM(Cost) as [Total Cost]
FROM MyTable
GROUP BY Member
ORDER BY [Total Cost] DESC
) g on g.Member = t.Member
ORDER BY g.[Total Cost] DESC, t.Member, t.Cost DESC
Depending on your requirement you may use:
SELECT TOP (10) WITH TIES...
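For illustration (this block is mine, not part of the original answer), WITH TIES applied to the member-totals subquery keeps any member whose total ties with the 10th one:
SELECT TOP (10) WITH TIES Member, SUM(Cost) AS [Total Cost]
FROM MyTable
GROUP BY Member
ORDER BY [Total Cost] DESC;
With the sample data above, members ranked 4 through 11 all total 50, so WITH TIES would return all 11 members.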
You don't have to select from the same table twice. Use SUM OVER to get the total per member.
Use DENSE_RANK to get the totals ranked (highest total = 1, second highest total = 2, ...).
Use TOP(10) WITH TIES to get all rows having the top ten totals.
The query:
select top(10) with ties *
from
(
    select
        member,
        product,
        sum(cost) as cost,
        sum(sum(cost)) over (partition by member) as total_cost
    from mytable
    group by member, product
) results
order by dense_rank() over (order by total_cost) desc;
If you want exactly 10 customers even when there are ties, then a slight variation on Thorsten's method will work:
select top(10) with ties t.*
from (select member, product, sum(cost) as cost,
             sum(sum(cost)) over (partition by member) as total_cost
      from mytable
      group by member, product
     ) t
order by dense_rank() over (order by total_cost) desc, member;
The addition of member as a second key may seem like a minor addition. However, it ensures that the dense_rank() is unique for each member (of course ordered by total_cost). This, in turn, guarantees that you get exactly 10 customers.
You can use dense_rank() like below. This worked in SQL Server 2016. Change the value of the @limit variable to control the number of ranks returned.
declare @limit int = 10;

SELECT *
FROM
(
    select x.*, rn = dense_rank() over (order by x.TotalCost desc)
    from (
        SELECT a.Member
              ,a.Product
              ,SUM(a.Cost) AS Cost
              ,(SELECT SUM(b.Cost) from MyTable b WHERE b.Member = a.Member) as TotalCost
        FROM MyTable a
        GROUP BY a.Member
                ,a.Product
    ) x
) y
where rn <= @limit
order by rn

sql query to find unique records

I am new to SQL and need your help to achieve the below. I have tried using GROUP BY and COUNT functions, but I am getting rows in the unique group that are actually duplicated.
Below is my source data.
CDR_ID,TelephoneNo,Call_ID,call_Duration,Call_Plan
543,xxx-23,12,12,500
543,xxx-23,12,12,501
543,xxx-23,12,12,510
643,xxx-33,11,17,700
343,xxx-33,11,17,700
766,xxx-74,32,1,300
766,xxx-74,32,1,300
877,xxx-32,12,2,300
877,xxx-32,12,2,300
877,xxx-32,12,2,301
Please note: the source has multiple copies of each combination, so when I do the count, the unique set does not appear with count = 1.
For example, the data below appears in the source with 60 records per combination:
877,xxx-32,12,2,300 -- 60 records
877,xxx-32,12,2,301 -- 60 records
I am trying to get only the unique records, but the duplicated ones keep coming through.
Below are the rows which should come up in the unique group. There can be multiple Call_Plans for the same combination of CDR_ID,TelephoneNo,Call_ID,call_Duration; I want to read only the records for which there is exactly one call plan for each unique combination of CDR_ID,TelephoneNo,Call_ID,call_Duration:
CDR_ID,TelephoneNo,Call_ID,call_Duration,Call_Plan
643,xxx-33,11,17,700
343,xxx-33,11,17,700
766,xxx-74,32,1,300
Please advise on this.
Thanks and regards
To do more complex groupings you could also use a Common Table Expression/Derived Table along with windowed functions:
declare @t table(CDR_ID int, TelephoneNo nvarchar(20), Call_ID int, call_Duration int, Call_Plan int);

insert into @t values
    (543,'xxx-23',12,12,500),(543,'xxx-23',12,12,501),(543,'xxx-23',12,12,510),
    (643,'xxx-33',11,17,700),(343,'xxx-33',11,17,700),(766,'xxx-74',32,1,300),
    (766,'xxx-74',32,1,300),(877,'xxx-32',12,2,300),(877,'xxx-32',12,2,300),
    (877,'xxx-32',12,2,301);

with cte as
(
    select CDR_ID
          ,TelephoneNo
          ,Call_ID
          ,call_Duration
          ,Call_Plan
          ,count(*) over (partition by CDR_ID, TelephoneNo, Call_ID, call_Duration) as c
    from (select distinct * from @t) a
)
select *
from cte
where c = 1;
Output:
+--------+-------------+---------+---------------+-----------+---+
| CDR_ID | TelephoneNo | Call_ID | call_Duration | Call_Plan | c |
+--------+-------------+---------+---------------+-----------+---+
| 343 | xxx-33 | 11 | 17 | 700 | 1 |
| 643 | xxx-33 | 11 | 17 | 700 | 1 |
| 766 | xxx-74 | 32 | 1 | 300 | 1 |
+--------+-------------+---------+---------------+-----------+---+
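Note the select distinct * in the innermost query: it collapses the exact-duplicate copies first (such as the 60 identical records per combination), so a combination like 766,xxx-74,32,1,300 still counts as a single call plan.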
Using not exists():
select distinct *
from t
where not exists (
select 1
from t as i
where i.cdr_id = t.cdr_id
and i.telephoneno = t.telephoneno
and i.call_id = t.call_id
and i.call_duration = t.call_duration
and i.call_plan <> t.call_plan
)
rextester demo: http://rextester.com/RRNNE20636
returns:
+--------+-------------+---------+---------------+-----------+
| cdr_id | TelephoneNo | Call_id | call_Duration | Call_Plan |
+--------+-------------+---------+---------------+-----------+
| 343    | xxx-33      | 11      | 17            | 700       |
| 643    | xxx-33      | 11      | 17            | 700       |
| 766    | xxx-74      | 32      | 1             | 300       |
+--------+-------------+---------+---------------+-----------+
Basically you should try this:
SELECT DISTINCT A.CDR_ID, A.TelephoneNo, A.Call_ID, A.call_Duration, A.Call_Plan
FROM YOUR_TABLE A
INNER JOIN (SELECT CDR_ID, TelephoneNo, Call_ID, call_Duration
            FROM YOUR_TABLE
            GROUP BY CDR_ID, TelephoneNo, Call_ID, call_Duration
            HAVING COUNT(DISTINCT Call_Plan) = 1
           ) B ON A.CDR_ID = B.CDR_ID AND A.TelephoneNo = B.TelephoneNo AND A.Call_ID = B.Call_ID AND A.call_Duration = B.call_Duration
You can also do a shorter query using the window function COUNT(*) OVER ..., as sketched below.
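A sketch of how that could look (assuming the same YOUR_TABLE; the inner DISTINCT collapses the exact-duplicate copies before counting plans per combination):
SELECT CDR_ID, TelephoneNo, Call_ID, call_Duration, Call_Plan
FROM (SELECT t.*,
             COUNT(*) OVER (PARTITION BY CDR_ID, TelephoneNo, Call_ID, call_Duration) AS c
      FROM (SELECT DISTINCT * FROM YOUR_TABLE) t
     ) x
WHERE c = 1;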
The below query will provide the result by grouping on the four key columns and keeping only the combinations with a single call plan:
SELECT CDR_ID, TelephoneNo, Call_ID, call_Duration, MAX(Call_Plan) AS Call_Plan, COUNT(DISTINCT Call_Plan) AS plan_count
FROM TABLE_NAME
GROUP BY CDR_ID, TelephoneNo, Call_ID, call_Duration
HAVING COUNT(DISTINCT Call_Plan) = 1;
It gives you the plan count as well. If not required, you can remove it.
Select CDR_ID, TelephoneNo, Call_ID, call_Duration, max(Call_Plan) as Call_Plan
from table
group by CDR_ID, TelephoneNo, Call_ID, call_Duration
having count(distinct Call_Plan) = 1

Add cumulative total sum over many columns in Postgres

My table is like this:
+----+------+----+----+----+
| id | type | c1 | c2 | c3 |
+----+------+----+----+----+
| a  | 0    | 10 | 10 | 10 |
| a  | 0    | 0  | 10 |    |
| a  | 0    | 50 | 10 |    |
| c  | 0    |    | 10 | 20 |
| c  | 0    |    | 10 |    |
+----+------+----+----+----+
I need the output like this:
+------------+------+----+-----+-----+
| id         | type | c1 | c2  | c3  |
+------------+------+----+-----+-----+
| a          | 0    | 10 | 10  | 10  |
| a          | 0    | 0  | 10  |     |
| a          | 0    | 50 | 10  |     |
| c          | 0    |    | 10  | 20  |
| c          | 0    |    | 10  |     |
+------------+------+----+-----+-----+
| total      | 0    | 60 | 50  | 30  |
+------------+------+----+-----+-----+
| cumulative | 0    | 60 | 110 | 140 |
+------------+------+----+-----+-----+
My query so far:
WITH res_1 AS (
    SELECT id, c1, c2, c3 FROM cloud10k.dash_reportcard
), res_2 AS (
    SELECT 'TOTAL'::VARCHAR, SUM(c1), SUM(c2), SUM(c3) FROM cloud10k.dash_reportcard
)
SELECT * FROM res_1
UNION ALL
SELECT * FROM res_2;
It produces a sum total per column.
How can I add the cumulative total sum?
Note: the demo has 3 data columns, my actual table has more than 250.
It would be very tedious and increasingly inefficient to list 250 columns over and over for the column sums - an O(n²) problem in disguise. Effectively, you want the equivalent of a window function that calculates the running total over columns instead of rows.
You can:
Transform the row to a set ("unpivot").
Run the window aggregate function sum() OVER (...).
Transform the set back to a row ("pivot").
WITH total AS (
   SELECT 'total'::text AS id, 0 AS type
        , sum(c1) AS s1, sum(c2) AS s2, sum(c3) AS s3  -- more ...
   FROM   cloud10k.dash_reportcard
   )
TABLE cloud10k.dash_reportcard
UNION ALL
TABLE total
UNION ALL
SELECT 'cumulative', 0, a[1], a[2], a[3]  -- more ...
FROM  (
   SELECT ARRAY(
      SELECT sum(v.s) OVER (ORDER BY rn)
      FROM   total
           , LATERAL (VALUES (1, s1), (2, s2), (3, s3)) v(rn, s)  -- more ...
      )::int[] AS a
   ) sub;
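For the sample table, total holds (60, 50, 30); the LATERAL (VALUES ...) turns that single row into three numbered rows, the window sum yields 60, 110, 140, and ARRAY(...) packs those back into one array so the outer SELECT can emit them as columns.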
See:
What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
SELECT DISTINCT on multiple columns
The last step could also be done with crosstab() from the tablefunc module, but for this simple case it's simpler to aggregate into an array and break the elements out into separate columns in the outer SELECT.
Alternative for Postgres 9.1
Same as above, but:
...
UNION ALL
SELECT 'cumulative'::text, 0, a[1], a[2], a[3]  -- more ...
FROM  (
   SELECT ARRAY(
      SELECT sum(v.s) OVER (ORDER BY rn)
      FROM  (
         SELECT row_number() OVER (), s
         FROM   unnest((SELECT ARRAY[s1, s2, s3] FROM total)) s  -- more ...
         ) v(rn, s)
      )::int[] AS a
   ) sub;
Consider:
PostgreSQL unnest() with element number
Just add another CTE to get the cumulative row:
WITH res_1 AS (
    SELECT id, c1, c2, c3
    FROM dash_reportcard
), res_2 AS (
    SELECT 'TOTAL'::VARCHAR, SUM(c1) AS sumC1, SUM(c2) AS sumC2, SUM(c3) AS sumC3
    FROM dash_reportcard
), res_3 AS (
    SELECT 'CUMULATIVE'::VARCHAR, sumC1, sumC2 + sumC1, sumC1 + sumC2 + sumC3
    FROM res_2
)
SELECT * FROM res_1
UNION ALL
SELECT * FROM res_2
UNION ALL
SELECT * FROM res_3;
WITH total AS (
    SELECT 'TOTAL'::VARCHAR, SUM(c1) AS sumc1, SUM(c2) AS sumc2, SUM(c3) AS sumc3
    FROM cloud10k.dash_reportcard
), cum_total AS (
    SELECT 'CUMULATIVE'::varchar, sumc1, sumc1 + sumc2, sumc1 + sumc2 + sumc3
    FROM total
)
SELECT id, c1, c2, c3 FROM cloud10k.dash_reportcard
UNION ALL
SELECT * FROM total
UNION ALL
SELECT * FROM cum_total;

how to get median for every record?

There's no median function in SQL Server, so I'm using this wonderful suggestion:
https://stackoverflow.com/a/2026609/117700
this computes the median over an entire dataset, but I need the median per record.
My dataset is:
+-----------+-------------+
| client_id | TimesTested |
+-----------+-------------+
| 214220    | 1           |
| 215425    | 1           |
| 212839    | 4           |
| 215249    | 1           |
| 210498    | 3           |
| 110655    | 1           |
| 110655    | 1           |
| 110655    | 12          |
| 215425    | 4           |
| 100196    | 1           |
| 110032    | 1           |
| 110032    | 1           |
| 101944    | 3           |
| 101232    | 2           |
| 101232    | 1           |
+-----------+-------------+
here's the query I am using:
select client_id,
(
SELECT
(
(SELECT MAX(TimesTested ) FROM
(SELECT TOP 50 PERCENT t.TimesTested
FROM counted3 t
where t.timestested>1
and CLIENT_ID=t.CLIENT_ID
ORDER BY t.TimesTested ) AS BottomHalf)
+
(SELECT MIN(TimesTested ) FROM
(SELECT TOP 50 PERCENT t.TimesTested
FROM counted3 t
where t.timestested>1
and CLIENT_ID=t.CLIENT_ID
ORDER BY t.TimesTested DESC) AS TopHalf)
) / 2 AS Median
) TotalAvgTestFreq
from counted3
group by client_id
but it is giving me funny data:
+-----------+--------+
| client_id | median |
+-----------+--------+
| 100007    | 84     |
| 100008    | 84     |
| 100011    | 84     |
| 100014    | 84     |
| 100026    | 84     |
| 100027    | 84     |
| 100028    | 84     |
| 100029    | 84     |
| 100042    | 84     |
| 100043    | 84     |
| 100071    | 84     |
| 100072    | 84     |
| 100074    | 84     |
+-----------+--------+
How can I get the median for every client_id?
I am currently trying to use this awesome query from Aaron's site:
select c3.client_id,(
SELECT AVG(1.0 * TimesTested ) median
FROM
(
SELECT o.TimesTested ,
rn = ROW_NUMBER() OVER (ORDER BY o.TimesTested ), c.c
FROM counted3 AS o
CROSS JOIN (SELECT c = COUNT(*) FROM counted3) AS c
where o.TimesTested > 1
) AS x
WHERE rn IN ((c + 1)/2, (c + 2)/2)
) a
from counted3 c3
group by c3.client_id
unfortunately, as RichardTheKiwi points out, it's for a single median whereas this question is about a median per partition.
How can I join it on counted3 to get the median per partition?
Note: If TimesTested is an int or bigint type, you need to CAST it before taking an average, otherwise you'll get integer division, e.g. (2+5)/2 => 3 if 2 and 5 are the median records - e.g. AVG(CAST(TimesTested AS float)).
select client_id, avg(1.0 * TimesTested) as median_timestested
from
(
    select client_id,
           TimesTested,
           rn = row_number() over (partition by CLIENT_ID order by TimesTested),
           c = count(TimesTested) over (partition by CLIENT_ID)
    from counted3
    where TimesTested > 1
) g
where rn in ((c + 1)/2, c/2 + 1)
group by client_id;
The median is found either as the central record in an ODD number of rows, or the average of the two central records in an EVEN number of rows. This is handled by the condition rn in ((c + 1)/2, c/2 + 1), which picks either the one record (both expressions are equal for odd c, thanks to integer division) or the two records required.
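For example, client 215425 has TimesTested values 1 and 4; the TimesTested > 1 filter leaves only 4, so c = 1, the condition picks rn = 1, and the median is 4.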
try this:
select client_id,
       (
        SELECT
        (
         (SELECT MAX(TimesTested) FROM
           (SELECT TOP 50 PERCENT t.TimesTested
            FROM counted3 t
            where t.TimesTested > 1
              and c3.CLIENT_ID = t.CLIENT_ID
            ORDER BY t.TimesTested) AS BottomHalf)
         +
         (SELECT MIN(TimesTested) FROM
           (SELECT TOP 50 PERCENT t.TimesTested
            FROM counted3 t
            where t.TimesTested > 1
              and c3.CLIENT_ID = t.CLIENT_ID
            ORDER BY t.TimesTested DESC) AS TopHalf)
        ) / 2 AS Median
       ) TotalAvgTestFreq
from counted3 c3
group by client_id
I added the c3 alias to the outer CLIENT_ID references and the outer table.
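As a further option not covered in the answers above: on SQL Server 2012 or later, PERCENTILE_CONT computes a median per partition directly. A minimal sketch against the same counted3 table, keeping the TimesTested > 1 filter:
select distinct client_id,
       percentile_cont(0.5) within group (order by TimesTested)
           over (partition by client_id) as median_timestested
from counted3
where TimesTested > 1;
The DISTINCT is needed because PERCENTILE_CONT, as a window function, repeats the same value on every row of the partition.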