Using tuples in SQL "IN" clause - sql

I have a table containing the fields group_id and group_type and I want to query the table for all the records having any tuple (group id, group type) from a list of tuples. For example, I want to be able to do something like:
SELECT *
FROM mytable
WHERE (group_id, group_type) IN (("1234-567", 2), ("4321-765", 3), ("1111-222", 5))
A very similar question has already been asked (using tuples in sql in clause), but the solution proposed there presumes the tuple list is fetched from another table. That doesn't work in my case, as the tuple values are hard coded.
One solution is to use string concatenation:
SELECT *
FROM mytable
WHERE group_id + STR(group_type, 1) IN ("1234-5672", "4321-7653", "1111-2225")
But the problem is that the table is quite big and doing a string concatenation and conversion for each record would be very expensive.
Any suggestion?

Given a very minor tweak (replace double quotes with single and add the VALUES keyword), your proposed syntax is valid Standard SQL-92 syntax i.e.
SELECT *
FROM mytable
WHERE (group_id, group_type) IN (
VALUES ('1234-567', 2),
('4321-765', 3),
('1111-222', 5)
);
Sadly, MSFT have not added it to SQL Server and consider it an 'unplanned' feature.
FWIW PostgreSQL and Sqlite are examples of SQL products that support this syntax.
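As a quick check of the SQLite claim, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration, and it assumes the bundled SQLite is 3.15 or later, where row values are available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (group_id TEXT, group_type INTEGER)")
# Hypothetical sample rows; only two of them match a listed tuple.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("1234-567", 2), ("1234-567", 3), ("4321-765", 3), ("9999-000", 5)],
)

# Row-value IN against an inline VALUES list.
rows = conn.execute(
    """
    SELECT group_id, group_type
    FROM mytable
    WHERE (group_id, group_type) IN (
        VALUES ('1234-567', 2),
               ('4321-765', 3),
               ('1111-222', 5)
    )
    ORDER BY group_id
    """
).fetchall()
print(rows)
```

Note that ('1234-567', 3) is not returned: both columns have to match the same tuple.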

In SQL Server 2008 you can do it like this:
select *
from mytable as T
where exists (select *
from (values ('1234-567', 2),
('4321-765', 3),
('1111-222', 5)) as V(group_id, group_type)
where T.group_id = V.group_id and
T.group_type = V.group_type
)

EDIT: this is a dated answer. Although it was the accepted answer in 2011, other answers with more upvotes reflect more recent approaches.
Why not construct the OR statements?
SELECT *
FROM mytable
WHERE (group_id = '1234-567' and group_type = 2)
OR (group_id = '4321-765' and group_type = 3)
OR (group_id = '1111-222' and group_type = 5)
Granted, it doesn't look as nice and neat as your concept example, but it will do the job (and if IN with tuples did exist, it would most likely implement it exactly the same way under the covers).
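If the tuple list comes from application code, the OR chain can be generated rather than typed by hand. The sketch below is one possible way to do that in Python with bound parameters. The helper name tuple_filter and the sample rows are assumptions for illustration, and sqlite3 stands in for the real database driver.

```python
import sqlite3

def tuple_filter(pairs):
    """Return (where_sql, params) matching any (group_id, group_type) pair."""
    if not pairs:
        return "0 = 1", []  # an empty list should match nothing
    clause = " OR ".join("(group_id = ? AND group_type = ?)" for _ in pairs)
    params = [value for pair in pairs for value in pair]
    return clause, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (group_id TEXT, group_type INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("1234-567", 2), ("1234-567", 3), ("4321-765", 3), ("9999-000", 5)],
)

where_sql, params = tuple_filter(
    [("1234-567", 2), ("4321-765", 3), ("1111-222", 5)]
)
rows = conn.execute(
    f"SELECT group_id, group_type FROM mytable WHERE {where_sql} ORDER BY group_id",
    params,
).fetchall()
print(where_sql)
print(rows)
```

Only the placeholders are interpolated into the SQL text; the tuple values themselves travel as parameters.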

You can use a common table expression to pretend that these tuples are in another table:
;WITH Tuples as (
select '1234-567' as group_id, 2 as group_type union all
select '4321-765', 3 union all
select '1111-222', 5
)
SELECT * /* TODO - Pick appropriate columns */
from mytable m where exists (
select * from Tuples t
where m.group_id = t.group_id and m.group_type = t.group_type)

Using that solution, this should work:
SELECT *
FROM mytable m
WHERE EXISTS (
SELECT * FROM (
SELECT '1234-567' group_id, 2 group_type UNION ALL
SELECT '4321-765', 3 UNION ALL
SELECT '1111-222', 5) [t]
WHERE m.group_id = t.group_id AND m.group_type = t.group_type)
BTW, you should probably use a CTE to create that inner table.

I haven't seen this yet, but something like this should work
SELECT * FROM AgeGroup ag JOIN
(VALUES
('18-24', 18, 24),
('25-34 ', 25, 39),
('35-44 ', 35, 49),
('45-54 ', 45, 59),
('55-64 ', 55, 69),
('65+ ', 65, 299)
) AS x (agegroup, minage, maxage)
ON ag.age_group = x.agegroup
AND ag.min_age=x.minage
AND ag.max_age=x.maxage

Here is another tuple solution using a join:
SELECT
*
FROM mytable m
JOIN
(
SELECT '1234-567' group_id, 2 group_type
UNION ALL SELECT '4321-765', 3
UNION ALL SELECT '1111-222', 5
) [t]
ON m.group_id = t.group_id
AND m.group_type = t.group_type

I had a similar problem but my tuple collection was dynamic - it was sent over to the SQL Server in a query parameter. I came up with the following solution:
Pass a tuple as an XML:
DECLARE @tuplesXml xml = '<tuples><tuple group-id="1234-567" group-type="2"/><tuple group-id="4321-765" group-type="3"/></tuples>';
Inner join the table that you want to filter with the XML nodes:
SELECT t.* FROM mytable t
INNER JOIN @tuplesXml.nodes('/tuples/tuple') AS tuple(col)
ON tuple.col.value('./@group-id', 'varchar(255)') = t.group_id
AND tuple.col.value('./@group-type', 'integer') = t.group_type
It seems to work fine in my situation which is a bit more complex than the one described in the question.
Keep in mind that it is necessary to use t.* instead of * and the table returned from nodes method needs to be aliased (it's tuple(col) in this case).

select * from table_name where 1=1 and (column_a, column_b) in ((28,1),(25,1))

Related

ORACLE SQL Pivot Issue

I am trying to pivot a sql result. I need to do this all in the one query. The below is telling me invalid identifier for header_id. I am using an Oracle database.
Code
Select * From (
select ppd.group_id,g.group_name, ct.type_desc,ht.hos_cat_descr
from item_history ih, item ci, contract ppd,
header ch, group g, cd_std_type ct, cd_hos h,
cd_std_hospital_cat ht
where ih.item_id = ci.item_id
and ih.header_id = ch.header_id
and ci.hos_id = h.hos_id
and ih.item_id = ci.item_id
and ch.user_no = ppd.user_no
and ppd.group_id = g.group_id
and ch.header_type = ct.header_type_id
and ci.hos_id = h.hos_id
and h.cat_id = ht.cat_id
)
Pivot
(
count(distinct header_id) as Volume
For hos_cat_descr IN ('A')
)
Your inner query doesn't have header_id in its projection, so the pivot clause doesn't have that column available to use. You need to add it, either as:
Select * From (
select ppd.group_id,g.group_name, ct.type_desc,ht.hos_cat_descr,ih.header_id
---------------------------------------------------------------^^^^^^^^^^^^^
from ...
)
Pivot
(
count(distinct header_id) as Volume
For hos_cat_descr IN ('A')
)
or:
Select * From (
select ppd.group_id,g.group_name, ct.type_desc,ht.hos_cat_descr,ch.header_id
---------------------------------------------------------------^^^^^^^^^^^^^
from ...
)
Pivot
(
count(distinct header_id) as Volume
For hos_cat_descr IN ('A')
)
It doesn't really matter which, since those two values must be equal as they are part of a join condition.
You could achieve the same thing with simpler aggregation instead of a pivot, but presumably you are doing more work in the pivot really.
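The "simpler aggregation" alternative is usually written as conditional aggregation. The sketch below shows the idea in Python with sqlite3. The flattened table joined and its rows are hypothetical stand-ins for the result of the inner query, and the SQL is the generic CASE form rather than anything Oracle-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the inner query's result set.
conn.execute(
    "CREATE TABLE joined (group_id INTEGER, type_desc TEXT, "
    "hos_cat_descr TEXT, header_id INTEGER)"
)
conn.executemany(
    "INSERT INTO joined VALUES (?, ?, ?, ?)",
    [
        (1, "T1", "A", 10),
        (1, "T1", "A", 10),  # duplicate header, counted once
        (1, "T1", "A", 11),
        (1, "T1", "B", 12),  # other category, ignored
        (2, "T2", "B", 13),
    ],
)

# Count distinct headers in category 'A' per remaining column combination.
rows = conn.execute(
    """
    SELECT group_id, type_desc,
           COUNT(DISTINCT CASE WHEN hos_cat_descr = 'A' THEN header_id END)
               AS a_volume
    FROM joined
    GROUP BY group_id, type_desc
    ORDER BY group_id
    """
).fetchall()
print(rows)
```

The CASE expression yields NULL for other categories, and COUNT ignores NULLs, so a group with no 'A' rows gets 0.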

can you set environmental variable to round numbers in SQL Server query?

I have a bunch of calculations in a SQL Server 2012 query, kind of like:
select T1_month
,a.some_value, b.value_to_compare,(select (some_value - value_to_compare)/value_to_compare*100 where value_to_compare != 0) percent_diff
from
(select T1_month
,sum(some_value) some_value
from T1
group by T1_month) a
join
(select T2_month
,sum(value_to_compare) value_to_compare
from T2
group by T2_month) b
on a.T1_month = b.T2_month
order by T1_month;
I used a round function here, but I need to add a lot more similar lines. Is there any way to just set a global variable to round all columns in one shot? Otherwise it's just a lot of leg work.
round((some_value - value_to_compare)/value_to_compare*100, 2)
I'll be pasting to Excel but it would be nice to round it in the source without having to use the round function so many times.
Here is a workaround, no need to type Round() function for every line:
SELECT 9.0 / 7 * 100 AS Direct_Query
CREATE TABLE #Table_1 (
[Column_1] [NUMERIC](18, 2) NULL
)
INSERT INTO #Table_1
SELECT 9.0/7 * 100
SELECT Column_1 AS Temp_Table_Formatted FROM #Table_1
Edit:
If you cannot use a temp table, you can wrap your query in a CTE and then just Round() the result of the CTE, which is pretty easy to do with the help of a multiline editor like Sublime Text or Visual Studio Code:
WITH CTE_Result_To_Format
AS (
SELECT
T1_month
,a.some_value
,b.value_to_compare
,(
SELECT
(some_value - value_to_compare) / value_to_compare * 100
WHERE value_to_compare != 0
)
percent_diff
FROM (
SELECT
T1_month
,SUM(some_value) some_value
FROM T1
GROUP BY
T1_month
) a
JOIN (
SELECT
T2_month
,SUM(value_to_compare) value_to_compare
FROM T2
GROUP BY
T2_month
) b
ON a.T1_month = b.T2_month
)
SELECT
r.T1_month
,ROUND(r.some_value, 2) AS some_value
,ROUND(r.value_to_compare, 2) AS value_to_compare
,ROUND(r.percent_diff, 2) AS percent_diff
FROM CTE_Result_To_Format r
ORDER BY
r.T1_month
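If the outer SELECT is assembled in application code, the repeated ROUND() wrappers can also be generated instead of typed. This is only a sketch: the helper rounded_select is a made-up name, the inner query is a one-row stand-in for the real one, and sqlite3 is used purely to show that the generated statement runs.

```python
import sqlite3

def rounded_select(source, key_columns, numeric_columns, digits=2):
    """Build a SELECT that rounds every numeric column of `source`."""
    parts = list(key_columns)
    parts += [f"ROUND({col}, {digits}) AS {col}" for col in numeric_columns]
    return f"SELECT {', '.join(parts)} FROM {source}"

# Stand-in for the real query; it only supplies one unrounded value.
inner_query = "SELECT 1 AS T1_month, 9.0 / 7 * 100 AS percent_diff"
sql = (
    f"WITH CTE_Result_To_Format AS ({inner_query}) "
    + rounded_select("CTE_Result_To_Format", ["T1_month"], ["percent_diff"])
)

conn = sqlite3.connect(":memory:")
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)
```

Column names are interpolated directly into the SQL text, so they should come from a fixed list in the code, never from user input.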

Decode function for selecting Tables

Is there any (same or alternate) way of using table names with the DECODE function?
Something like this:
SELECT * FROM
DECODE(FLAG,1,TABLE1,2,TABLE2,TABLE3)
In SQL the table names cannot be determined at the run time. They need to be provided at the compile time.
If you need to provide the table names dynamically, you will need to use Dynamic SQL.
Hope it Helps
Vishad
select * from table1 where decode(flag, 1,1, 0) = 1 union
select * from table2 where decode(flag, 2,1, 0) = 1 union
select * from table3 where decode(flag, 1,0, 2,0, 1) = 1;
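For the dynamic SQL route mentioned above, the table name is typically chosen in code and spliced into the statement text. A minimal sketch in Python follows, assuming the flag is known to the application before the query is built. The mapping, the helper select_for_flag, and the sample tables are illustrative, with sqlite3 standing in for the real driver.

```python
import sqlite3

# Mirrors DECODE(FLAG, 1, TABLE1, 2, TABLE2, TABLE3) with a fixed whitelist.
TABLE_BY_FLAG = {1: "table1", 2: "table2"}
DEFAULT_TABLE = "table3"

def select_for_flag(conn, flag):
    """Run SELECT * against the table that corresponds to `flag`."""
    table = TABLE_BY_FLAG.get(flag, DEFAULT_TABLE)
    # Safe to interpolate: `table` can only be one of the names above.
    return conn.execute(f"SELECT * FROM {table}").fetchall()

conn = sqlite3.connect(":memory:")
for name in ("table1", "table2", "table3"):
    conn.execute(f"CREATE TABLE {name} (source TEXT)")
    conn.execute(f"INSERT INTO {name} VALUES (?)", (name,))

print(select_for_flag(conn, 1))
print(select_for_flag(conn, 2))
print(select_for_flag(conn, 99))
```

Because table names cannot be bound as parameters, restricting them to a fixed mapping keeps the string interpolation safe.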

Reuse subquery result in WHERE-Clause for INSERT

I am using Microsoft SQL Server 2008.
I would like to save the result of a subquery to reuse it in a following subquery.
Is this possible?
What is the best practice for doing this? (I am very new to SQL.)
My query looks like:
INSERT INTO [dbo].[TestTable]
(
[a]
,[b]
)
SELECT
(
SELECT TOP 1 MAT_WS_ID
FROM #TempTableX AS X_ALIAS
WHERE OUTERBASETABLE.LT_ALL_MATERIAL = X_ALIAS.MAT_RM_NAME
)
,(
SELECT TOP 1 MAT_WS_NAME
FROM #TempTableY AS Y_ALIAS
WHERE Y_ALIAS.MAT_WS_ID = MAT_WS_ID
--(
--SELECT TOP 1 MAT_WS_ID
--FROM #TempTableX AS X_ALIAS
--WHERE OUTERBASETABLE.LT_ALL_MATERIAL = X_ALIAS.MAT_RM_NAME
--)
)
FROM [dbo].[LASERTECHNO] AS OUTERBASETABLE
My question is:
Is what I did correct?
I replaced the second SELECT statement in the WHERE clause for [b] (which is commented out and exactly the same as for [a]) with the result of the first SELECT statement of [a] (= MAT_WS_ID).
It seems to give the right results.
But I don't understand why!
I mean, MAT_WS_ID is part of both temporary tables X_ALIAS and Y_ALIAS.
So in the SELECT statement for [b], in the scope of the [b] select query, MAT_WS_ID could only be known from the Y_ALIAS table. (Or am I wrong? I am more of a C++ person; maybe scoping in SQL and C++ is totally different.)
I just want to know the best way in SQL Server to reuse a scalar select result.
Or should I just not care, copy the select for every column, and let SQL Server optimize it on its own?
One approach would be outer apply:
SELECT mat.MAT_WS_ID
, (
SELECT TOP 1 MAT_WS_NAME
FROM #TempTableY AS Y_ALIAS
WHERE Y_ALIAS.MAT_WS_ID = mat.MAT_WS_ID
)
FROM [dbo].[LASERTECHNO] AS OUTERBASETABLE
OUTER APPLY
(
SELECT TOP 1 MAT_WS_ID
FROM #TempTableX AS X_ALIAS
WHERE OUTERBASETABLE.LT_ALL_MATERIAL = X_ALIAS.MAT_RM_NAME
) as mat
You could rank rows in #TempTableX and #TempTableY partitioning them by MAT_RM_NAME in the former and by MAT_WS_ID in the latter, then use normal joins with filtering by rownum = 1 in both tables (rownum being the column containing the ranking numbers in each of the two tables):
WITH x_ranked AS (
SELECT
*,
rownum = ROW_NUMBER() OVER (PARTITION BY MAT_RM_NAME ORDER BY (SELECT 1))
FROM #TempTableX
),
y_ranked AS (
SELECT
*,
rownum = ROW_NUMBER() OVER (PARTITION BY MAT_WS_ID ORDER BY (SELECT 1))
FROM #TempTableY
)
INSERT INTO dbo.TestTable (a, b)
SELECT
x.MAT_WS_ID,
y.MAT_WS_NAME
FROM dbo.LASERTECHNO t
LEFT JOIN x_ranked x ON t.LT_ALL_MATERIAL = x.MAT_RM_NAME AND x.rownum = 1
LEFT JOIN y_ranked y ON x.MAT_WS_ID = y.MAT_WS_ID AND y.rownum = 1
;
The ORDER BY (SELECT 1) bit is a trick to specify an indeterminate ordering, which, accordingly, would result in indeterminate rownum = 1 rows picked by the query. That is to more or less duplicate your TOP 1 without an explicit order, but I would recommend you to specify a more sensible ORDER BY clause to make the results more predictable.

Simple way to calculate median with MySQL

What's the simplest (and hopefully not too slow) way to calculate the median with MySQL? I've used AVG(x) for finding the mean, but I'm having a hard time finding a simple way of calculating the median. For now, I'm returning all the rows to PHP, doing a sort, and then picking the middle row, but surely there must be some simple way of doing it in a single MySQL query.
Example data:
id | val
--------
1 4
2 7
3 2
4 2
5 9
6 8
7 3
Sorting on val gives 2 2 3 4 7 8 9, so the median should be 4, versus SELECT AVG(val) which == 5.
In MariaDB / MySQL:
SELECT AVG(dd.val) as median_val
FROM (
SELECT d.val, @rownum:=@rownum+1 as `row_number`, @total_rows:=@rownum
FROM data d, (SELECT @rownum:=0) r
WHERE d.val is NOT NULL
-- put some where clause here
ORDER BY d.val
) as dd
WHERE dd.row_number IN ( FLOOR((@total_rows+1)/2), FLOOR((@total_rows+2)/2) );
Steve Cohen points out that after the first pass, @rownum will contain the total number of rows. This can be used to determine the median, so no second pass or join is needed.
Also, AVG(dd.val) and dd.row_number IN(...) are used to correctly produce a median when there is an even number of records. Reasoning:
SELECT FLOOR((3+1)/2),FLOOR((3+2)/2); -- when total_rows is 3, avg rows 2 and 2
SELECT FLOOR((4+1)/2),FLOOR((4+2)/2); -- when total_rows is 4, avg rows 2 and 3
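The same row-number arithmetic can be checked outside the database. The following Python sketch mirrors the FLOOR((n+1)/2) and FLOOR((n+2)/2) picks on a sorted list; the function name is made up for illustration.

```python
from math import floor

def median_by_row_numbers(values):
    """Average the rows numbered FLOOR((n+1)/2) and FLOOR((n+2)/2), 1-based."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        return None
    lo = floor((n + 1) / 2)
    hi = floor((n + 2) / 2)
    # For odd n both indexes are the same row, so the average is that row.
    return (ordered[lo - 1] + ordered[hi - 1]) / 2

print(median_by_row_numbers([4, 7, 2, 2, 9, 8, 3]))  # 7 rows: row 4 twice
print(median_by_row_numbers([1, 2, 3, 4]))           # 4 rows: rows 2 and 3
```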
Finally, MariaDB 10.3.3+ contains a MEDIAN function
I just found another answer online in the comments:
For medians in almost any SQL:
SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*)+1)/2
Make sure your columns are well indexed and the index is used for filtering and sorting. Verify with the explain plans.
select count(*) from table --find the number of rows
Calculate the "median" row number. Maybe use: median_row = floor(count / 2).
Then pick it out of the list:
select val from table order by val asc limit median_row,1
This should return you one row with just the value you want.
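The two steps above can be sketched from application code as follows. sqlite3 stands in for the MySQL connection here, so the offset is passed as a bound parameter, and the data are the example values from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, val INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    list(enumerate([4, 7, 2, 2, 9, 8, 3], start=1)),
)

# Step 1: find the number of rows.
(count,) = conn.execute("SELECT COUNT(*) FROM data").fetchone()

# Step 2: skip floor(count / 2) rows of the sorted list and take the next one.
median_row = count // 2
(median_val,) = conn.execute(
    "SELECT val FROM data ORDER BY val ASC LIMIT 1 OFFSET ?",
    (median_row,),
).fetchone()
print(count, median_row, median_val)
```

With an even number of rows this picks the upper of the two middle values rather than their average.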
I found the accepted solution didn't work on my MySQL install, returning an empty set, but this query worked for me in all situations that I tested it on:
SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val)))/COUNT(*) > .5
LIMIT 1
Unfortunately, neither TheJacobTaylor's nor velcrow's answers return accurate results for current versions of MySQL.
Velcro's answer from above is close, but it does not calculate correctly for result sets with an even number of rows. Medians are defined as either 1) the middle number on odd numbered sets, or 2) the average of the two middle numbers on even number sets.
So, here's velcro's solution patched to handle both odd and even number sets:
SELECT AVG(middle_values) AS 'median' FROM (
SELECT t1.median_column AS 'middle_values' FROM
(
SELECT @row:=@row+1 as `row`, x.median_column
FROM median_table AS x, (SELECT @row:=0) AS r
WHERE 1
-- put some where clause here
ORDER BY x.median_column
) AS t1,
(
SELECT COUNT(*) as 'count'
FROM median_table x
WHERE 1
-- put same where clause here
) AS t2
-- the following condition will return 1 record for odd number sets, or 2 records for even number sets.
WHERE t1.row >= t2.count/2 and t1.row <= ((t2.count/2) +1)) AS t3;
To use this, follow these 3 easy steps:
Replace "median_table" (2 occurrences) in the above code with the name of your table
Replace "median_column" (3 occurrences) with the column name you'd like to find a median for
If you have a WHERE condition, replace "WHERE 1" (2 occurrences) with your where condition
I propose a faster way.
Get the row count:
SELECT CEIL(COUNT(*)/2) FROM data;
Then take the middle value in a sorted subquery:
SELECT max(val) FROM (SELECT val FROM data ORDER BY val limit @middlevalue) x;
I tested this with a 5x10e6 dataset of random numbers and it will find the median in under 10 seconds.
Install and use these MySQL statistical functions: http://www.xarg.org/2012/07/statistical-functions-in-mysql/
After that, calculating the median is easy:
SELECT median(val) FROM data;
A comment on this page in the MySQL documentation has the following suggestion:
-- (mostly) High Performance scaling MEDIAN function per group
-- Median defined in http://en.wikipedia.org/wiki/Median
--
-- by Peter Hlavac
-- 06.11.2008
--
-- Example Table:
DROP table if exists table_median;
CREATE TABLE table_median (id INTEGER(11),val INTEGER(11));
COMMIT;
INSERT INTO table_median (id, val) VALUES
(1, 7), (1, 4), (1, 5), (1, 1), (1, 8), (1, 3), (1, 6),
(2, 4),
(3, 5), (3, 2),
(4, 5), (4, 12), (4, 1), (4, 7);
-- Calculating the MEDIAN
SELECT @a := 0;
SELECT
id,
AVG(val) AS MEDIAN
FROM (
SELECT
id,
val
FROM (
SELECT
-- Create an index n for every id
@a := (@a + 1) mod o.c AS shifted_n,
IF(@a mod o.c=0, o.c, @a) AS n,
o.id,
o.val,
-- the number of elements for every id
o.c
FROM (
SELECT
t_o.id,
val,
c
FROM
table_median t_o INNER JOIN
(SELECT
id,
COUNT(1) AS c
FROM
table_median
GROUP BY
id
) t2
ON (t2.id = t_o.id)
ORDER BY
t_o.id,val
) o
) a
WHERE
IF(
-- if there is an even number of elements
-- take the lower and the upper median
-- and use AVG(lower,upper)
c MOD 2 = 0,
n = c DIV 2 OR n = (c DIV 2)+1,
-- if its an odd number of elements
-- take the first if its only one element
-- or take the one in the middle
IF(
c = 1,
n = 1,
n = c DIV 2 + 1
)
)
) a
GROUP BY
id;
-- Explanation:
-- The Statement creates a helper table like
--
-- n id val count
-- ----------------
-- 1, 1, 1, 7
-- 2, 1, 3, 7
-- 3, 1, 4, 7
-- 4, 1, 5, 7
-- 5, 1, 6, 7
-- 6, 1, 7, 7
-- 7, 1, 8, 7
--
-- 1, 2, 4, 1
-- 1, 3, 2, 2
-- 2, 3, 5, 2
--
-- 1, 4, 1, 4
-- 2, 4, 5, 4
-- 3, 4, 7, 4
-- 4, 4, 12, 4
-- from there we can select the n-th element on the position: count div 2 + 1
If MySQL has ROW_NUMBER, then the MEDIAN is (be inspired by this SQL Server query):
WITH Numbered AS
(
SELECT *, COUNT(*) OVER () AS Cnt,
ROW_NUMBER() OVER (ORDER BY val) AS RowNum
FROM yourtable
)
SELECT id, val
FROM Numbered
WHERE RowNum IN ((Cnt+1)/2, (Cnt+2)/2)
;
The IN is used in case you have an even number of entries.
If you want to find the median per group, then just PARTITION BY group in your OVER clauses.
Rob
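A runnable sketch of the per-group variant is shown below, using Python's sqlite3 with made-up sample data and a hypothetical grp column. It assumes the bundled SQLite is 3.25 or later (window function support). It also relies on integer division for (Cnt+1)/2 and (Cnt+2)/2; in MySQL, where / returns a decimal, the two expressions would need FLOOR() as in other answers here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (grp TEXT, val INTEGER)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?)",
    [("a", v) for v in [4, 7, 2, 2, 9, 8, 3]]   # odd count
    + [("b", v) for v in [1, 2, 3, 4]],          # even count
)

rows = conn.execute(
    """
    WITH Numbered AS (
        SELECT grp, val,
               COUNT(*) OVER (PARTITION BY grp) AS Cnt,
               ROW_NUMBER() OVER (PARTITION BY grp ORDER BY val) AS RowNum
        FROM yourtable
    )
    SELECT grp, AVG(val) AS median_val
    FROM Numbered
    WHERE RowNum IN ((Cnt + 1) / 2, (Cnt + 2) / 2)
    GROUP BY grp
    ORDER BY grp
    """
).fetchall()
print(rows)
```

Averaging the one or two selected rows per group gives a single median value even when the count is even.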
Most of the solutions above work only for one field of the table, you might need to get the median (50th percentile) for many fields on the query.
I use this:
SELECT CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(
GROUP_CONCAT(field_name ORDER BY field_name SEPARATOR ','),
',', 50/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) AS `Median`
FROM table_name;
You can replace the "50" in the example above with any percentile; it is very efficient.
Just make sure you have enough memory for the GROUP_CONCAT; you can change the limit with:
SET group_concat_max_len = 10485760; #10MB max length
More details: http://web.performancerasta.com/metrics-tips-calculating-95th-99th-or-any-percentile-with-single-mysql-query/
I have this below code which I found on HackerRank and it is pretty simple and works in each and every case.
SELECT M.MEDIAN_COL FROM MEDIAN_TABLE M WHERE
(SELECT COUNT(MEDIAN_COL) FROM MEDIAN_TABLE WHERE MEDIAN_COL < M.MEDIAN_COL ) =
(SELECT COUNT(MEDIAN_COL) FROM MEDIAN_TABLE WHERE MEDIAN_COL > M.MEDIAN_COL );
You could use the user-defined function that's found here.
Building off of velcro's answer, for those of you having to do a median off of something that is grouped by another parameter:
SELECT grp_field, t1.val FROM (
SELECT grp_field, @rownum:=IF(@s = grp_field, @rownum + 1, 0) AS row_number,
@s:=IF(@s = grp_field, @s, grp_field) AS sec, d.val
FROM data d, (SELECT @rownum:=0, @s:=0) r
ORDER BY grp_field, d.val
) as t1 JOIN (
SELECT grp_field, count(*) as total_rows
FROM data d
GROUP BY grp_field
) as t2
ON t1.grp_field = t2.grp_field
WHERE t1.row_number=floor(total_rows/2)+1;
Takes care of an even value count: gives the avg of the two values in the middle in that case.
SELECT AVG(val) FROM
( SELECT x.id, x.val from data x, data y
GROUP BY x.id, x.val
HAVING SUM(SIGN(1-SIGN(IF(y.val-x.val=0 AND x.id != y.id, SIGN(x.id-y.id), y.val-x.val)))) IN (ROUND((COUNT(*))/2), ROUND((COUNT(*)+1)/2))
) sq
My code, efficient without tables or additional variables:
SELECT
((SUBSTRING_INDEX(SUBSTRING_INDEX(group_concat(val order by val), ',', floor(1+((count(val)-1) / 2))), ',', -1))
+
(SUBSTRING_INDEX(SUBSTRING_INDEX(group_concat(val order by val), ',', ceiling(1+((count(val)-1) / 2))), ',', -1)))/2
as median
FROM table;
Single query to achieve the perfect median:
SELECT
COUNT(*) as total_rows,
IF(count(*)%2 = 1, CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*)), ',', -1) AS DECIMAL), ROUND((CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) + CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*)), ',', -1) AS DECIMAL)) / 2)) as median,
AVG(val) as average
FROM
data
Optionally, you could also do this in a stored procedure:
DROP PROCEDURE IF EXISTS median;
DELIMITER //
CREATE PROCEDURE median (table_name VARCHAR(255), column_name VARCHAR(255), where_clause VARCHAR(255))
BEGIN
-- Set default parameters
IF where_clause IS NULL OR where_clause = '' THEN
SET where_clause = 1;
END IF;
-- Prepare statement
SET @sql = CONCAT(
"SELECT AVG(middle_values) AS 'median' FROM (
SELECT t1.", column_name, " AS 'middle_values' FROM
(
SELECT @row:=@row+1 as `row`, x.", column_name, "
FROM ", table_name," AS x, (SELECT @row:=0) AS r
WHERE ", where_clause, " ORDER BY x.", column_name, "
) AS t1,
(
SELECT COUNT(*) as 'count'
FROM ", table_name, " x
WHERE ", where_clause, "
) AS t2
-- the following condition will return 1 record for odd number sets, or 2 records for even number sets.
WHERE t1.row >= t2.count/2
AND t1.row <= ((t2.count/2)+1)) AS t3
");
-- Execute statement
PREPARE stmt FROM @sql;
EXECUTE stmt;
END//
DELIMITER ;
-- Sample usage:
-- median(table_name, column_name, where_condition);
CALL median('products', 'price', NULL);
My solution presented below works in just one query without creation of table, variable or even sub-query.
Plus, it allows you to get median for each group in group-by queries (this is what i needed !):
SELECT `columnA`,
SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(`columnB` ORDER BY `columnB`), ',', CEILING((COUNT(`columnB`)/2))), ',', -1) medianOfColumnB
FROM `tableC`
-- some where clause if you want
GROUP BY `columnA`;
It works because of a smart use of group_concat and substring_index.
But, to allow big group_concat, you have to set group_concat_max_len to a higher value (1024 char by default).
You can set it like that (for current sql session) :
SET SESSION group_concat_max_len = 10000;
-- up to 4294967295 in 32-bits platform.
More info on group_concat_max_len: https://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html#sysvar_group_concat_max_len
Another riff on Velcrow's answer, but uses a single intermediate table and takes advantage of the variable used for row numbering to get the count, rather than performing an extra query to calculate it. Also starts the count so that the first row is row 0 to allow simply using Floor and Ceil to select the median row(s).
SELECT Avg(tmp.val) as median_val
FROM (SELECT inTab.val, @rows := @rows + 1 as rowNum
FROM data as inTab, (SELECT @rows := -1) as init
-- Replace with better where clause or delete
WHERE 2 > 1
ORDER BY inTab.val) as tmp
WHERE tmp.rowNum in (Floor(@rows / 2), Ceil(@rows / 2));
Knowing exact row count you can use this query:
SELECT <value> AS VAL FROM <table> ORDER BY VAL LIMIT 1 OFFSET <half>
Where <half> = ceiling(<size> / 2.0) - 1
SELECT
SUBSTRING_INDEX(
SUBSTRING_INDEX(
GROUP_CONCAT(field ORDER BY field),
',',
((
ROUND(
LENGTH(GROUP_CONCAT(field)) -
LENGTH(
REPLACE(
GROUP_CONCAT(field),
',',
''
)
)
) / 2) + 1
)),
',',
-1
)
FROM
table
The above seems to work for me.
I used a two query approach:
first one to get count, min, max and avg
second one (prepared statement) with a "LIMIT @count/2, 1" and "ORDER BY .." clauses to get the median value
These are wrapped in a function defn, so all values can be returned from one call.
If your ranges are static and your data does not change often, it might be more efficient to precompute/store these values and use the stored values instead of querying from scratch every time.
As I just needed a median AND percentile solution, I made a simple and quite flexible function based on the findings in this thread. I know that I am happy myself when I find "readymade" functions that are easy to include in my projects, so I decided to quickly share:
function mysql_percentile($table, $column, $where, $percentile = 0.5) {
$sql = "
SELECT `t1`.`".$column."` as `percentile` FROM (
SELECT @rownum:=@rownum+1 as `row_number`, `d`.`".$column."`
FROM `".$table."` `d`, (SELECT @rownum:=0) `r`
".$where."
ORDER BY `d`.`".$column."`
) as `t1`,
(
SELECT count(*) as `total_rows`
FROM `".$table."` `d`
".$where."
) as `t2`
WHERE 1
AND `t1`.`row_number`=floor(`total_rows` * ".$percentile.")+1;
";
$result = sql($sql, 1);
if (!empty($result)) {
return $result['percentile'];
} else {
return 0;
}
}
Usage is very easy, example from my current project:
...
$table = DBPRE."zip_".$slug;
$column = 'seconds';
$where = "WHERE `reached` = '1' AND `time` >= '".$start_time."'";
$reaching['median'] = mysql_percentile($table, $column, $where, 0.5);
$reaching['percentile25'] = mysql_percentile($table, $column, $where, 0.25);
$reaching['percentile75'] = mysql_percentile($table, $column, $where, 0.75);
...
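To see which row the floor(total_rows * percentile) + 1 rule selects, here is a small Python sketch of the same arithmetic on an in-memory list. The function name percentile_row is hypothetical; it returns 0 when no row qualifies, matching the fallback in the PHP function.

```python
from math import floor

def percentile_row(values, percentile=0.5):
    """Pick the value at 1-based row floor(n * percentile) + 1 of the sorted list."""
    ordered = sorted(values)
    row_number = floor(len(ordered) * percentile) + 1
    if row_number > len(ordered):
        return 0  # no such row, same as the empty-result fallback
    return ordered[row_number - 1]

vals = [4, 7, 2, 2, 9, 8, 3]  # sorted: 2 2 3 4 7 8 9
print(percentile_row(vals, 0.5))   # row 4
print(percentile_row(vals, 0.25))  # row 2
print(percentile_row(vals, 0.75))  # row 6
```

Note that the rule never averages two rows, so for an even row count at 0.5 it returns the upper of the two middle values.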
Here is my way. Of course, you could put it into a procedure :-)
SET @median_counter = (SELECT FLOOR(COUNT(*)/2) - 1 AS `median_counter` FROM `data`);
SET @median = CONCAT('SELECT `val` FROM `data` ORDER BY `val` LIMIT ', @median_counter, ', 1');
PREPARE median FROM @median;
EXECUTE median;
You could avoid the variable @median_counter if you substitute it:
SET @median = CONCAT( 'SELECT `val` FROM `data` ORDER BY `val` LIMIT ',
(SELECT FLOOR(COUNT(*)/2) - 1 AS `median_counter` FROM `data`),
', 1'
);
PREPARE median FROM @median;
EXECUTE median;
After reading all the previous answers, I found that none matched my actual requirement, so I implemented my own, which doesn't need any procedure or complicated statements. I just GROUP_CONCAT all values from the column whose median I want and, using COUNT divided by 2, extract the value from the middle of the list, as the following query does:
(POS is the name of the column I want to get its median)
(query) SELECT
SUBSTRING_INDEX (
SUBSTRING_INDEX (
GROUP_CONCAT(pos ORDER BY CAST(pos AS SIGNED INTEGER) desc SEPARATOR ';')
, ';', COUNT(*)/2 )
, ';', -1 ) AS `pos_med`
FROM table_name
GROUP BY any_criterial
I hope this could be useful for someone in the way many of other comments were for me from this website.
Based on @bob's answer, this generalizes the query to have the ability to return multiple medians, grouped by some criteria.
Think, e.g., median sale price for used cars in a car lot, grouped by year-month.
SELECT
period,
AVG(middle_values) AS 'median'
FROM (
SELECT t1.sale_price AS 'middle_values', t1.row_num, t1.period, t2.count
FROM (
SELECT
@last_period:=@period AS 'last_period',
@period:=DATE_FORMAT(sale_date, '%Y-%m') AS 'period',
IF (@period<>@last_period, @row:=1, @row:=@row+1) as `row_num`,
x.sale_price
FROM listings AS x, (SELECT @row:=0) AS r
WHERE 1
-- where criteria goes here
ORDER BY DATE_FORMAT(sale_date, '%Y%m'), x.sale_price
) AS t1
LEFT JOIN (
SELECT COUNT(*) as 'count', DATE_FORMAT(sale_date, '%Y-%m') AS 'period'
FROM listings x
WHERE 1
-- same where criteria goes here
GROUP BY DATE_FORMAT(sale_date, '%Y%m')
) AS t2
ON t1.period = t2.period
) AS t3
WHERE
row_num >= (count/2)
AND row_num <= ((count/2) + 1)
GROUP BY t3.period
ORDER BY t3.period;
create table med(id integer);
insert into med(id) values(1);
insert into med(id) values(2);
insert into med(id) values(3);
insert into med(id) values(4);
insert into med(id) values(5);
insert into med(id) values(6);
select (MIN(count)+MAX(count))/2 from
(select case when (select count(*) from
med A where A.id<B.id)=(select count(*)/2 from med) OR
(select count(*) from med A where A.id>B.id)=(select count(*)/2
from med) then cast(B.id as float)end as count from med B) C;
?column?
----------
3.5
(1 row)
OR
select cast(avg(id) as float) from
(select t1.id from med t1 JOIN med t2 on t1.id!= t2.id
group by t1.id having ABS(SUM(SIGN(t1.id-t2.id)))=1) A;
Often, we may need to calculate Median not just for the whole table, but for aggregates with respect to our ID. In other words, calculate median for each ID in our table, where each ID has many records. (good performance and works in many SQL + fixes problem of even and odds, more about performance of different Median-methods https://sqlperformance.com/2012/08/t-sql-queries/median )
SELECT our_id, AVG(1.0 * our_val) as Median
FROM
( SELECT our_id, our_val,
COUNT(*) OVER (PARTITION BY our_id) AS cnt,
ROW_NUMBER() OVER (PARTITION BY our_id ORDER BY our_val) AS rn
FROM our_table
) AS x
WHERE rn IN ((cnt + 1)/2, (cnt + 2)/2) GROUP BY our_id;
Hope it helps
MySQL has supported window functions since version 8.0, so you can use ROW_NUMBER or DENSE_RANK (DO NOT use RANK as it assigns the same rank to same values, like in sports ranking):
SELECT AVG(t1.val) AS median_val
FROM (SELECT val,
ROW_NUMBER() OVER(ORDER BY val) AS row_num
FROM data) t1,
(SELECT COUNT(*) AS num_records FROM data) t2
WHERE t1.row_num IN
(FLOOR((t2.num_records + 1) / 2),
FLOOR((t2.num_records + 2) / 2));
A simple way to calculate Median in MySQL
set @ct := (select count(1) from station);
set @row := 0;
select avg(a.val) as median from
(select * from station order by val) a
where (select @row := @row + 1)
between @ct/2.0 and @ct/2.0 +1;
The simplest and fastest way to calculate the median in MySQL:
select x.col
from (select col,
count(1) over (partition by 'A') as total_rows,
row_number() over (order by col asc) as rank_Order
from station ft) x
where x.rank_Order = round(x.total_rows / 2.0, 0)