Regular Expression - sql

I found my mistake. This line is wrong: CASE WHEN regexp_substr(artikel.abez1,'[^/]*$') AS download,
I deleted the 'CASE WHEN' and it works. Thank you, everyone! :)
I have this data:
DE-Internet-LTE
AE-Internet-Ethernet-10M/30M
How can I get only the value "10M"? If there is none, I want it to return null.
I used this query:
regexp_substr(artikel.abez1,'[^-/]*$') AS upload
On the second row, it gives the result "10M", but on the first row it returns "LTE" instead of null.
I'm using SQL Tool 1.8 b38.
UPDATE:
My full query:
`
SELECT DISTINCT artikel.artnr1, L1.lfnr1 AS lfnr, L1.name1 AS lf_name,
artikel.abez1, regexp_substr(artikel.abez1,'(.?){1}(.?)-', 1, 1, '',
2) AS land, regexp_substr(artikel.abez1,'(.?-){1}(.?)-', 1, 1, '',
2) AS Technologie, CASE WHEN
regexp_substr(artikel.abez1,'(.?-){2}(.?)-', 1, 1, '', 2) IS NULL
THEN regexp_substr(artikel.abez1,'[^-]$') ELSE
regexp_substr(artikel.abez1,'(.?-){2}(.?)-', 1, 1, '', 2) END AS
Topologie, regexp_substr(regexp_substr(artikel.abez1,
'([^-/]+)/[^-/]+$'), '^[^-/]+') AS upload, CASE WHEN
regexp_substr(artikel.abez1,'[^/]*$') AS download,
bestanfragepos.preis / bestanfrage.bwkurs, 'Anfrage' AS Art,
To_Char(bestanfrage.lfdanfrage), bestanfrage.anfragedatum, CASE WHEN
InStr(angaufgut.reserve1, '.') > 1 THEN Months_Between(
bestanfrage.anfragedatum, To_Date(angaufgut.reserve1)) ELSE
To_Number(angaufgut.reserve1) end AS Laufzeit FROM artikel inner join
modell ON modell.lfdnr = artikel.lfdmodnr left join bestanfragepos ON
artikel.lfdnr = bestanfragepos.lfdartnr left join bestanfrage ON
bestanfragepos.lfdanfrage = bestanfrage.lfdanfrage left join lieferant
L1 ON L1.liefnr = bestanfrage.lfdliefnr left JOIN angaufgut ON
bestanfragepos.lfdangaufgutnr = angaufgut.lfdnr WHERE Lower(modcode)
LIKE 'ac%' AND Lower(abez1) NOT LIKE 'cust%' and artikel.mandant = 1
AND bestanfragepos.preis != 0 ORDER BY abez1 /
`
It gives the error ORA-00920: invalid relational operator.
It only worked on a single row when I used this:
SELECT regexp_substr(regexp_substr(artikelbez, '([^-/]+)/[^-/]+$'), '^[^-/]+') FROM nag_reporting_leitungspreise WHERE art = 'Vertrag' AND REF = 3791

Your code returns 30M. If you really want 10M, then this should do what you want:
select regexp_substr(regexp_substr(t.abez1, '([^-/]+)/[^-/]+$'), '^[^-/]+') AS upload
from t;
Here is a db<>fiddle.
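To check the pattern outside the database, here is a minimal Python sketch of the same idea: find the trailing "x/y" token, then keep the parts on either side of the slash. The function name split_speeds is hypothetical, and Python's re module is not identical to Oracle's regex engine, so treat this as an illustration rather than a drop-in replacement.

```python
import re

def split_speeds(label):
    """Return (upload, download) from a trailing 'x/y' token, or (None, None)."""
    # Same shape as '([^-/]+)/[^-/]+$': a token, a slash, a token, end of string.
    match = re.search(r'([^-/]+)/([^-/]+)$', label)
    if match is None:
        return (None, None)  # no slash-separated pair, e.g. 'DE-Internet-LTE'
    return (match.group(1), match.group(2))

print(split_speeds('AE-Internet-Ethernet-10M/30M'))
print(split_speeds('DE-Internet-LTE'))
```

Rows without a slash give None for both values, which corresponds to the NULL the question asks for.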

with t(str) as (
select * from table(ku$_vcnt(
'DE-Internet-LTE',
'AE-Internet-Ethernet-10M/30M',
'AE-Internet-Ethernet-20m/40m',
'AE-Internet-Ethernet-300M/500M',
'AE-Internet-Ethernet-4000k/600k',
'AE-Internet-Ethernet-200M'
))
)
select
str,
regexp_substr(str, '(\d+(k|m))(/\d+(k|m))?',1,1,'i',1) upload,
regexp_substr(str, '/(\d+(k|m))',1,1,'i',1) download
from t;
Results:
STR UPLOAD DOWNLOAD
----------------------------------- ------- --------
DE-Internet-LTE
AE-Internet-Ethernet-10M/30M 10M 30M
AE-Internet-Ethernet-20m/40m 20m 40m
AE-Internet-Ethernet-300M/500M 300M 500M
AE-Internet-Ethernet-4000k/600k 4000k 600k
AE-Internet-Ethernet-200M 200M

Just have a look at the below solution:
with t(str) as
(
select column_value
from table(sys.odcivarchar2list(
'DE-Internet-LTE',
'AE-Internet-Ethernet-10M/30M',
'AE-Internet-Ethernet-20m/40m',
'AE-Internet-Ethernet-300M/500M',
'AE-Internet-Ethernet-4000k/600k',
'AE-Internet-Ethernet-200M'
))
)
select str,
regexp_substr(str, '-(\d+.*?)(/|$)',1,1,null,1) as upload,
regexp_substr(str, '/(\d+.*?)$',1,1,null,1) as download,
regexp_substr(regexp_substr(str, '([^-/]+)/[^-/]+$'), '^[^-/]+') as Gordan_Linoff_upload
from t;

Related

SQL - Point differences between two lists

Given two comma separated (un-ordered) lists of numbers, I want to extract only the differences between them (using regexp probably).
e.g.:
select
'1010484,1025781,1051394,1069679' as list_1,
'1005923,1010484,1025781,1034010,1044261,1048311,1051394' as list_2
What I wish for is a result such as:
l1_additional_data: 1069679
l2_additional_data: 1005923,1034010,1044261,1048311
How can this be done?
I'm using Vertica, BTW - That means that no hierarchic ("connect by") queries could be used here.
Thanks in advance!
There's a relevant post that will be helpful - Splitting string into multiple rows in Oracle
I don't know Vertica, but based on Oracle you could go with:
with list1 as
(
select
regexp_substr(list_1 ,'[^,]+', 1, level) as list_1_rows
from (
select
'1010484,1025781,1051394,1069679' as list_1
from dual)
connect by
regexp_substr(list_1 ,'[^,]+', 1, level) is not null),
list2 as (select
regexp_substr(list_2 ,'[^,]+', 1, level) as list_2_rows
from (
select
'1005923,1010484,1025781,1034010,1044261,1048311,1051394' as list_2
from dual)
connect by regexp_substr(list_2 ,'[^,]+', 1, level) is not null)
select * from list1
full outer join list2
on list1.list_1_rows = list2.list_2_rows
where list_1_rows is null or list_2_rows is null
OK, here's my solution. It's not very efficient, and it probably won't scale in terms of performance:
WITH lists AS
(SELECT '1010484,1025781,1051394,1069679' AS list_1, '1005923,1010484,1025781,1034010,1044261,1048311,1051394' AS list_2 )
, numbers AS
(SELECT row_number() over() i
FROM system_columns limit 100)
SELECT group_concat(parsed_code_1) list_1_additions,
group_concat(parsed_code_2) list_2_additions
FROM(SELECT parsed_code_1
FROM(SELECT split_part(list_1, ',', i) parsed_code_1
FROM lists
CROSS JOIN numbers
WHERE i <= regexp_count(list_1, ',')+1) l
WHERE parsed_code_1 IS NOT NULL) a
FULL OUTER JOIN
(SELECT parsed_code_2
FROM(SELECT split_part(list_2, ',', i) parsed_code_2
FROM lists
CROSS JOIN numbers
WHERE i <= regexp_count(list_2, ',')+1) l
WHERE parsed_code_2 IS NOT NULL) b ON(parsed_code_1 = parsed_code_2)
WHERE parsed_code_1 IS NULL OR parsed_code_2 IS NULL
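The full outer join with the "IS NULL" filter used in the answers above boils down to keeping the items that appear in only one of the two lists. As a rough sanity check outside the database, the same logic can be sketched in Python. The function name list_differences is made up for this example, and it assumes the items are plain comma-separated tokens.

```python
def list_differences(list_1, list_2):
    """Return (only_in_1, only_in_2) for two comma-separated strings."""
    items_1 = [item.strip() for item in list_1.split(',') if item.strip()]
    items_2 = [item.strip() for item in list_2.split(',') if item.strip()]
    set_1, set_2 = set(items_1), set(items_2)
    # Keep the original order of each list while filtering.
    only_in_1 = [item for item in items_1 if item not in set_2]
    only_in_2 = [item for item in items_2 if item not in set_1]
    return ','.join(only_in_1), ','.join(only_in_2)

l1_extra, l2_extra = list_differences(
    '1010484,1025781,1051394,1069679',
    '1005923,1010484,1025781,1034010,1044261,1048311,1051394',
)
print('l1_additional_data:', l1_extra)
print('l2_additional_data:', l2_extra)
```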

Finding the maximum value from part of a string

I have the data below coming from a DB. I would like to get the maximum of the last digits after the final ".". For example, in the data below the digits after the last "." are 160419, 6, 16, etc.
I would like to get "11.2.0.4.160419" as the output.
11.2.0.4.160419
11.2.0.4.6
11.2.0.4.16
11.2.0.4.10
11.2.0.4.18
11.2.0.4.2
11.2.0.4.14
11.2.0.4.4
11.2.0.4.160119
11.2.0.4.3
11.2.0.4.15
11.2.0.4.9
11.2.0.4.17
11.2.0.4.8
11.2.0.4.5
11.2.0.4.7
11.2.0.4.1
11.2.0.4.151117
11.2.0.4.13
11.2.0.4.12
11.2.0.4.20
11.2.0.4.11
11.2.0.4.19
The data before the last "." is not always the same; it has various values. In fact, the actual data looks like this:
DATABASE PATCH FOR EXADATA (JAN 2016 - 11.2.0.4.160119) : (22309110)
DATABASE PATCH FOR EXADATA (JAN 2016 - 11.2.0.4.16) : (22309111)
.
.
In this example I am interested in getting the maximum, 160119.
-- Added
Sorry, I am back again. We now need the result to look like this:
11.2.0.4.160419
That is, take the maximum of the number after the last ".", but display everything between the parentheses.
Actual data
DATABASE PATCH FOR EXADATA (NOV 2015 - 11.2.0.4.151117)
DATABASE PATCH FOR EXADATA (APR2014 - 11.2.0.4.6) : (18293775)
DATABASE PATCH FOR EXADATA (APR2015 - 11.2.0.4.16) : (20449729)
desired output
(NOV 2015 - 11.2.0.4.151117)
I have this query working
with
inputs ( target_guid, description) as (
select t.target_guid, a.description from MGMT$OH_PATCH a, mgmt$oh_installed_targets oh,MGMT$TARGET_COMPONENTS c,MGMT$TARGET_FLAT_MEMBERS d, mgmt_targets t where t.target_type = 'oracle_dbmachine' and d.member_target_type = 'host' and d.aggregate_target_guid = t.target_guid and c.target_type = 'oracle_database' and c.host_name = d.member_target_name and a.host_name = c.host_name and a.target_guid = oh.oh_target_guid and oh.inst_target_type like '%database%' and a.description is not null and a.description like '%PATCH FOR EXADATA%' group by t.target_guid, a.description order by t.target_guid
)
select target_guid, max(to_number(regexp_substr(description, '.(\d*)\)', 1, 1, null, 1))) as version
from inputs group by target_guid;
with the output of
5DA0496CCCD42CA1099F1AD06216F3C0 160419
ED10DD7D4C62CEAA117E7B7E97883EC2 9
I need the output as
5DA0496CCCD42CA1099F1AD06216F3C0 11.2.0.4.160419
ED10DD7D4C62CEAA117E7B7E97883EC2 11.2.0.4.9
Can you please help?
You can extract the last digits using:
select regexp_substr(col, '[0-9]+$', 1, 1)
If you don't like depending on the greediness of Oracle regular expressions (which I can appreciate), you can use:
select trim(leading '.' from regexp_substr(col, '[.][0-9]+$', 1, 1))
You can get the maximum value by converting to a numeric and taking the max:
select max(cast(regexp_substr(col, '[0-9]+$', 1, 1) as number))
To get the full column:
select t.*
from (select t.*
from t
order by cast(regexp_substr(col, '[0-9]+$', 1, 1) as number) desc
) t
where rownum = 1;
Finally, for your particular data, there is a simpler solution:
select t.*
from (select t.*
from t
order by length(col) desc, col desc
) t
where rownum = 1;
However, this assumes that all the stuff before the final '.' is the same.
If the assumptions I detailed in my Comment to your original question are correct, then something like this should work:
with
inputs ( inp_str ) as (
select 'DATABASE PATCH FOR EXADATA (JAN 2016 - 11.2.0.4.160119) : (22309110)'
from dual union all
select 'DATABASE PATCH FOR EXADATA (JAN 2016 - 11.2.0.4.16) : (22309111)' from dual
)
select max(to_number(regexp_substr(inp_str, '.(\d*)\)', 1, 1, null, 1))) as max_something
from inputs;
The select statement is really just the last two lines; the rest is for testing purposes. Replace inp_str with your actual column name, inputs with your table name, and max_something with your desired output column name.
EDIT:
Here is a solution for the OP's restated problem (see request in comments).
with
inputs ( inp_str ) as (
select 'DATABASE PATCH FOR EXADATA (JAN 2016 - 11.2.0.4.160119) : (22309110)'
from dual union all
select 'DATABASE PATCH FOR EXADATA (JAN 2016 - 11.2.0.4.16) : (22309111)' from dual
)
select regexp_substr(inp_str, '\(([^)]+)\)', 1, 1, null, 1) as token
from inputs
where to_number(regexp_substr(inp_str, '.(\d*)\)', 1, 1, null, 1)) =
( select max(to_number(regexp_substr(inp_str, '.(\d*)\)', 1, 1, null, 1)))
from inputs
)
;
Output:
TOKEN
-------------------------
JAN 2016 - 11.2.0.4.160119
1 row selected.
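The "pick the row whose trailing number is largest, then show the parenthesised token" logic can also be tried outside Oracle. Below is a small Python sketch of that idea. The helper name latest_patch_token is hypothetical, and the pattern is a Python re approximation of the Oracle expressions above, not an exact equivalent.

```python
import re

def latest_patch_token(descriptions):
    """Return the first parenthesised token whose trailing number is largest."""
    best_number, best_token = None, None
    for text in descriptions:
        # First '(...)' group that ends in '.<digits>' right before ')'.
        match = re.search(r'\(([^)]*\.(\d+))\)', text)
        if match is None:
            continue  # no version-like token in this row
        number = int(match.group(2))
        if best_number is None or number > best_number:
            best_number, best_token = number, match.group(1)
    return best_token

rows = [
    'DATABASE PATCH FOR EXADATA (JAN 2016 - 11.2.0.4.160119) : (22309110)',
    'DATABASE PATCH FOR EXADATA (JAN 2016 - 11.2.0.4.16) : (22309111)',
]
print(latest_patch_token(rows))
```

Comparing the suffix as an integer matters here: compared as text, "16" and "160119" would not sort by numeric value.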
Maybe check out Oracle regexp_like e.g. WHERE REGEXP_LIKE(first_name, EXPRESSION)

Select Value Match - SQL

Can you advise whether it is possible to select a count for several substrings in one query?
For example, if I have a message field that contains text messages, I could do
SELECT COUNT(1)
FROM MESSAGES
WHERE MESSAGE_BODY LIKE '%hello%'
but what I want to do is more:
SELECT STRING, COUNT(1)
FROM MESSAGES
WHERE MESSAGE_BODY IN (list of strings with wild card)
Is this possible?
To break the example down:
ID | Message_Body
1 | Hello, How Are You?
2 | Hi, Great Thanks
3 | Hello, How is things?
4 | Ciao
Output wanted:
hello , 2
ciao, 1
SELECT (input strings), COUNT(1)
FROM TABLE
WHERE (input strings) IN ('%hello%','%ciao%')
If I understood you correctly, you can try something like this:
SELECT t.string,
CASE WHEN t.MESSAGE_BODY LIKE '%laptop%' then 1 else 0 END +
CASE WHEN t.MESSAGE_BODY LIKE '%one%' then 1 else 0 END +
CASE WHEN t.MESSAGE_BODY LIKE '%two%' then 1 else 0 END as count_col
FROM YourTable t
If you just want multiple LIKE comparisons, use REGEXP_LIKE():
SELECT STRING, COUNT(1)
FROM MESSAGES
where regexp_like(MESSAGE_BODY, 'one|two|laptop')
EDIT: You can use a derived table containing all the strings you are interested in and left join it to the original table for the count:
SELECT t.wrd,COUNT(s.id) as cnt
FROM (
SELECT 'hello' as wrd FROM DUAL
UNION ALL
SELECT 'ciao' as wrd FROM DUAL) t
LEFT OUTER JOIN messages s
ON(s.message_body LIKE '%' || t.wrd || '%')
GROUP BY t.wrd
Here is a version that looks for whole words:
SELECT a.word, COUNT (message.message_body)
FROM ( SELECT REGEXP_SUBSTR ('hello,ciao', '[^,]+', 1, LEVEL) word
FROM DUAL
CONNECT BY REGEXP_SUBSTR ('hello,ciao', '[^,]+', 1, LEVEL) IS NOT NULL) a
LEFT OUTER JOIN MESSAGES ON REGEXP_INSTR (MESSAGE_BODY, '(^|\s)' || a.word || '(\s|$)', 1, 1, 0, 'i') > 0
GROUP BY a.word
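For comparison, here is a hedged Python sketch of the "count messages per keyword" idea, using the sample rows from the question. The function name count_keyword_hits is made up. Note one deliberate difference: it uses word boundaries (\b) instead of the whitespace-only delimiters in the query above, so "Hello," followed by a comma still counts as a hit.

```python
import re

def count_keyword_hits(messages, keywords):
    """Count messages containing each keyword as a whole word, ignoring case."""
    counts = {}
    for word in keywords:
        # \b marks a word boundary; re.escape guards regex metacharacters.
        pattern = re.compile(r'\b' + re.escape(word) + r'\b', re.IGNORECASE)
        counts[word] = sum(1 for body in messages if pattern.search(body))
    return counts

messages = [
    'Hello, How Are You?',
    'Hi, Great Thanks',
    'Hello, How is things?',
    'Ciao',
]
print(count_keyword_hits(messages, ['hello', 'ciao']))
```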

How to convert only first letter uppercase without using Initcap in Oracle?

Is there a way to convert the first letter to uppercase in Oracle SQL without using the Initcap function?
My problem is that I have to use the DISTINCT keyword in the SQL statement, and the Initcap function doesn't work with it.
Here is my SQL example:
select distinct p.nr, initcap(p.firstname), initcap(p.lastname), ill.describtion
from patient p left join illness ill
on p.id = ill.id
where p.deleted = 0
order by p.lastname, p.firstname;
I get this error message: ORA-01791: not a SELECTed expression
When SELECT DISTINCT, you can't ORDER BY columns that aren't selected. Use column aliases instead, as:
select distinct p.nr, initcap(p.firstname) fname, initcap(p.lastname) lname, ill.describtion
from patient p left join illness ill
on p.id = ill.id
where p.deleted = 0
order by lname, fname
This would do it, but I think you need to post your query, as there may be a better solution:
select upper(substr(<column>,1,1)) || substr(<column>,2,9999) from dual
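The upper(substr(...)) || substr(...) approach is easy to reason about outside SQL too. A minimal Python sketch of the same idea follows; capitalize_first is a hypothetical name. Like the SQL version, and unlike Initcap, it leaves every character after the first one unchanged.

```python
def capitalize_first(text):
    """Upper-case only the first character; leave the rest untouched."""
    # Slicing instead of indexing keeps this safe for an empty string.
    return text[:1].upper() + text[1:]

print(capitalize_first('string'))
print(capitalize_first('mcDonald'))
```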
To change string to String, you can use this:
SELECT
regexp_replace ('string', '[a-z]', upper (substr ('string', 1, 1)), 1, 1, 'i')
FROM dual;
This assumes that the first letter is the one you want to convert. If your input text starts with a number, such as 2 strings, then it won't change it to 2 Strings.
You can also use the column number instead of the name or alias:
select distinct p.nr, initcap(p.firstname), initcap(p.lastname), ill.describtion
from patient p left join illness ill
on p.id = ill.id
where p.deleted = 0
order by 3, 2;
WITH inData AS
(
SELECT 'word1, wORD2, word3, woRD4, worD5, word6' str FROM dual
),
inRows as
(
SELECT 1 as tId, LEVEL as rId, trim(regexp_substr(str, '([A-Za-z0-9])+', 1, LEVEL)) as str
FROM inData
CONNECT BY instr(str, ',', 1, LEVEL - 1) > 0
)
SELECT tId, LISTAGG( upper(substr(str, 1, 1)) || substr(str, 2) , '') WITHIN GROUP (ORDER BY rId) AS camelCase
FROM inRows
GROUP BY tId;

Simple way to calculate median with MySQL

What's the simplest (and hopefully not too slow) way to calculate the median with MySQL? I've used AVG(x) for finding the mean, but I'm having a hard time finding a simple way of calculating the median. For now, I'm returning all the rows to PHP, doing a sort, and then picking the middle row, but surely there must be some simple way of doing it in a single MySQL query.
Example data:
id | val
--------
1 4
2 7
3 2
4 2
5 9
6 8
7 3
Sorting on val gives 2 2 3 4 7 8 9, so the median should be 4, versus SELECT AVG(val) which == 5.
In MariaDB / MySQL:
SELECT AVG(dd.val) as median_val
FROM (
SELECT d.val, @rownum:=@rownum+1 as `row_number`, @total_rows:=@rownum
FROM data d, (SELECT @rownum:=0) r
WHERE d.val is NOT NULL
-- put some where clause here
ORDER BY d.val
) as dd
WHERE dd.row_number IN ( FLOOR((@total_rows+1)/2), FLOOR((@total_rows+2)/2) );
Steve Cohen points out that after the first pass, @rownum will contain the total number of rows. This can be used to determine the median, so no second pass or join is needed.
Also AVG(dd.val) and dd.row_number IN(...) is used to correctly produce a median when there are an even number of records. Reasoning:
SELECT FLOOR((3+1)/2),FLOOR((3+2)/2); -- when total_rows is 3, avg rows 2 and 2
SELECT FLOOR((4+1)/2),FLOOR((4+2)/2); -- when total_rows is 4, avg rows 2 and 3
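To convince yourself that the two FLOOR expressions pick the right rows, here is a small Python sketch of the same arithmetic on a sorted list. The function name median_by_row_positions is made up, and the sketch assumes a non-empty list of numbers.

```python
import math
import statistics

def median_by_row_positions(values):
    """Average the rows at FLOOR((n+1)/2) and FLOOR((n+2)/2), 1-based."""
    ordered = sorted(values)
    n = len(ordered)
    low = math.floor((n + 1) / 2)
    high = math.floor((n + 2) / 2)
    # For odd n both positions coincide; for even n they are the two middle rows.
    return (ordered[low - 1] + ordered[high - 1]) / 2

sample = [4, 7, 2, 2, 9, 8, 3]
print(median_by_row_positions(sample), statistics.median(sample))
```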
Finally, MariaDB 10.3.3+ contains a MEDIAN function
I just found another answer online in the comments:
For medians in almost any SQL:
SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*)+1)/2
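The SIGN trick is a bit cryptic, so here is a rough Python emulation of what the self-join computes, under the assumption that val holds numbers. SIGN(1-SIGN(y.val-x.val)) is 1 when y.val <= x.val and 0 otherwise, so the SUM counts the rows at or below each candidate value. The names sign and median_by_sign_join are made up for this sketch.

```python
def sign(x):
    """Return -1, 0 or 1, like SQL SIGN()."""
    return (x > 0) - (x < 0)

def median_by_sign_join(values):
    """Emulate GROUP BY x.val HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*)+1)/2."""
    result = []
    for x_val in sorted(set(values)):      # one group per distinct x.val
        x_rows = values.count(x_val)       # rows of x that fall in this group
        # SIGN(1 - SIGN(y - x)) is 1 when y <= x and 0 when y > x.
        at_or_below = sum(sign(1 - sign(y - x_val)) for y in values)
        total = x_rows * at_or_below       # SUM(...) over the cross join
        pairs = x_rows * len(values)       # COUNT(*) over the cross join
        if total == (pairs + 1) / 2:
            result.append(x_val)
    return result

print(median_by_sign_join([4, 7, 2, 2, 9, 8, 3]))
```

In this emulation the equality can never hold for an even number of rows, because (COUNT(*)+1)/2 is then not a whole number. That is consistent with the empty results reported further down.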
Make sure your columns are well indexed and the index is used for filtering and sorting. Verify with the explain plans.
select count(*) from table --find the number of rows
Calculate the "median" row number. Maybe use: median_row = floor(count / 2).
Then pick it out of the list:
select val from table order by val asc limit median_row,1
This should return you one row with just the value you want.
I found the accepted solution didn't work on my MySQL install, returning an empty set, but this query worked for me in all situations that I tested it on:
SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val)))/COUNT(*) > .5
LIMIT 1
Unfortunately, neither TheJacobTaylor's nor velcrow's answer returns accurate results for current versions of MySQL.
velcrow's answer above is close, but it does not calculate correctly for result sets with an even number of rows. The median is defined as either 1) the middle number for sets with an odd number of rows, or 2) the average of the two middle numbers for sets with an even number of rows.
So, here's velcrow's solution patched to handle both odd and even row counts:
SELECT AVG(middle_values) AS 'median' FROM (
SELECT t1.median_column AS 'middle_values' FROM
(
SELECT @row:=@row+1 as `row`, x.median_column
FROM median_table AS x, (SELECT @row:=0) AS r
WHERE 1
-- put some where clause here
ORDER BY x.median_column
) AS t1,
(
SELECT COUNT(*) as 'count'
FROM median_table x
WHERE 1
-- put same where clause here
) AS t2
-- the following condition will return 1 record for odd number sets, or 2 records for even number sets.
WHERE t1.row >= t2.count/2 and t1.row <= ((t2.count/2) +1)) AS t3;
To use this, follow these 3 easy steps:
Replace "median_table" (2 occurrences) in the above code with the name of your table
Replace "median_column" (3 occurrences) with the column name you'd like to find a median for
If you have a WHERE condition, replace "WHERE 1" (2 occurrences) with your where condition
I propose a faster way.
Get the row count:
SELECT CEIL(COUNT(*)/2) FROM data;
Then take the middle value in a sorted subquery:
SELECT max(val) FROM (SELECT val FROM data ORDER BY val limit @middlevalue) x;
I tested this with a 5x10e6 dataset of random numbers and it will find the median in under 10 seconds.
Install and use this mysql statistical functions: http://www.xarg.org/2012/07/statistical-functions-in-mysql/
After that, calculate median is easy:
SELECT median(val) FROM data;
A comment on this page in the MySQL documentation has the following suggestion:
-- (mostly) High Performance scaling MEDIAN function per group
-- Median defined in http://en.wikipedia.org/wiki/Median
--
-- by Peter Hlavac
-- 06.11.2008
--
-- Example Table:
DROP table if exists table_median;
CREATE TABLE table_median (id INTEGER(11),val INTEGER(11));
COMMIT;
INSERT INTO table_median (id, val) VALUES
(1, 7), (1, 4), (1, 5), (1, 1), (1, 8), (1, 3), (1, 6),
(2, 4),
(3, 5), (3, 2),
(4, 5), (4, 12), (4, 1), (4, 7);
-- Calculating the MEDIAN
SELECT @a := 0;
SELECT
id,
AVG(val) AS MEDIAN
FROM (
SELECT
id,
val
FROM (
SELECT
-- Create an index n for every id
@a := (@a + 1) mod o.c AS shifted_n,
IF(@a mod o.c=0, o.c, @a) AS n,
o.id,
o.val,
-- the number of elements for every id
o.c
FROM (
SELECT
t_o.id,
val,
c
FROM
table_median t_o INNER JOIN
(SELECT
id,
COUNT(1) AS c
FROM
table_median
GROUP BY
id
) t2
ON (t2.id = t_o.id)
ORDER BY
t_o.id,val
) o
) a
WHERE
IF(
-- if there is an even number of elements
-- take the lower and the upper median
-- and use AVG(lower,upper)
c MOD 2 = 0,
n = c DIV 2 OR n = (c DIV 2)+1,
-- if its an odd number of elements
-- take the first if its only one element
-- or take the one in the middle
IF(
c = 1,
n = 1,
n = c DIV 2 + 1
)
)
) a
GROUP BY
id;
-- Explanation:
-- The Statement creates a helper table like
--
-- n id val count
-- ----------------
-- 1, 1, 1, 7
-- 2, 1, 3, 7
-- 3, 1, 4, 7
-- 4, 1, 5, 7
-- 5, 1, 6, 7
-- 6, 1, 7, 7
-- 7, 1, 8, 7
--
-- 1, 2, 4, 1
-- 1, 3, 2, 2
-- 2, 3, 5, 2
--
-- 1, 4, 1, 4
-- 2, 4, 5, 4
-- 3, 4, 7, 4
-- 4, 4, 12, 4
-- from there we can select the n-th element on the position: count div 2 + 1
If your MySQL version has ROW_NUMBER, the median can be computed like this (inspired by a SQL Server query):
WITH Numbered AS
(
SELECT *, COUNT(*) OVER () AS Cnt,
ROW_NUMBER() OVER (ORDER BY val) AS RowNum
FROM yourtable
)
SELECT id, val
FROM Numbered
WHERE RowNum IN ((Cnt+1)/2, (Cnt+2)/2)
;
The IN is used in case you have an even number of entries.
If you want to find the median per group, then just PARTITION BY group in your OVER clauses.
Rob
Most of the solutions above work only for one field of the table, you might need to get the median (50th percentile) for many fields on the query.
I use this:
SELECT CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(
GROUP_CONCAT(field_name ORDER BY field_name SEPARATOR ','),
',', 50/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) AS `Median`
FROM table_name;
You can replace the "50" in the example above with any percentile; it is very efficient.
Just make sure you have enough memory for the GROUP_CONCAT. You can change the limit with:
SET group_concat_max_len = 10485760; #10MB max length
More details: http://web.performancerasta.com/metrics-tips-calculating-95th-99th-or-any-percentile-with-single-mysql-query/
I found the code below on HackerRank. It is pretty simple, although it only returns a row when some value has equally many smaller and larger values (for example, an odd number of distinct values):
SELECT M.MEDIAN_COL FROM MEDIAN_TABLE M WHERE
(SELECT COUNT(MEDIAN_COL) FROM MEDIAN_TABLE WHERE MEDIAN_COL < M.MEDIAN_COL ) =
(SELECT COUNT(MEDIAN_COL) FROM MEDIAN_TABLE WHERE MEDIAN_COL > M.MEDIAN_COL );
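To see when this query returns a row, the condition can be emulated in a few lines of Python. This is only a sketch of the comparison the two subqueries make; balanced_values is a hypothetical name.

```python
def balanced_values(values):
    """Return values with as many strictly smaller as strictly larger rows."""
    return [
        v for v in values
        if sum(1 for other in values if other < v)
        == sum(1 for other in values if other > v)
    ]

print(balanced_values([4, 7, 2, 2, 9, 8, 3]))  # sample data from the question
print(balanced_values([1, 2, 3, 4]))           # even number of distinct values
```

For the second list no value qualifies, so the query-style condition yields an empty result rather than an average of the two middle values.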
You could use the user-defined function that's found here.
Building on velcrow's answer, for those of you who need a median of something that is grouped by another parameter:
SELECT grp_field, t1.val FROM (
SELECT grp_field, @rownum:=IF(@s = grp_field, @rownum + 1, 0) AS row_number,
@s:=IF(@s = grp_field, @s, grp_field) AS sec, d.val
FROM data d, (SELECT @rownum:=0, @s:=0) r
ORDER BY grp_field, d.val
) as t1 JOIN (
SELECT grp_field, count(*) as total_rows
FROM data d
GROUP BY grp_field
) as t2
ON t1.grp_field = t2.grp_field
WHERE t1.row_number=floor(total_rows/2)+1;
This takes care of an even value count: it gives the average of the two values in the middle in that case.
SELECT AVG(val) FROM
( SELECT x.id, x.val from data x, data y
GROUP BY x.id, x.val
HAVING SUM(SIGN(1-SIGN(IF(y.val-x.val=0 AND x.id != y.id, SIGN(x.id-y.id), y.val-x.val)))) IN (ROUND((COUNT(*))/2), ROUND((COUNT(*)+1)/2))
) sq
My code, efficient without tables or additional variables:
SELECT
((SUBSTRING_INDEX(SUBSTRING_INDEX(group_concat(val order by val), ',', floor(1+((count(val)-1) / 2))), ',', -1))
+
(SUBSTRING_INDEX(SUBSTRING_INDEX(group_concat(val order by val), ',', ceiling(1+((count(val)-1) / 2))), ',', -1)))/2
as median
FROM table;
A single query to achieve the perfect median:
SELECT
COUNT(*) as total_rows,
IF(count(*)%2 = 1, CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*)), ',', -1) AS DECIMAL), ROUND((CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) + CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*)), ',', -1) AS DECIMAL)) / 2)) as median,
AVG(val) as average
FROM
data
Optionally, you could also do this in a stored procedure:
DROP PROCEDURE IF EXISTS median;
DELIMITER //
CREATE PROCEDURE median (table_name VARCHAR(255), column_name VARCHAR(255), where_clause VARCHAR(255))
BEGIN
-- Set default parameters
IF where_clause IS NULL OR where_clause = '' THEN
SET where_clause = 1;
END IF;
-- Prepare statement
SET @sql = CONCAT(
"SELECT AVG(middle_values) AS 'median' FROM (
SELECT t1.", column_name, " AS 'middle_values' FROM
(
SELECT @row:=@row+1 as `row`, x.", column_name, "
FROM ", table_name," AS x, (SELECT @row:=0) AS r
WHERE ", where_clause, " ORDER BY x.", column_name, "
) AS t1,
(
SELECT COUNT(*) as 'count'
FROM ", table_name, " x
WHERE ", where_clause, "
) AS t2
-- the following condition will return 1 record for odd number sets, or 2 records for even number sets.
WHERE t1.row >= t2.count/2
AND t1.row <= ((t2.count/2)+1)) AS t3
");
-- Execute statement
PREPARE stmt FROM @sql;
EXECUTE stmt;
END//
DELIMITER ;
-- Sample usage:
-- median(table_name, column_name, where_condition);
CALL median('products', 'price', NULL);
My solution presented below works in just one query without creation of table, variable or even sub-query.
Plus, it allows you to get median for each group in group-by queries (this is what i needed !):
SELECT `columnA`,
SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(`columnB` ORDER BY `columnB`), ',', CEILING((COUNT(`columnB`)/2))), ',', -1) medianOfColumnB
FROM `tableC`
-- some where clause if you want
GROUP BY `columnA`;
It works because of a smart use of group_concat and substring_index.
However, to allow a big group_concat result, you have to set group_concat_max_len to a higher value (1024 characters by default).
You can set it like this (for the current SQL session):
SET SESSION group_concat_max_len = 10000;
-- up to 4294967295 in 32-bits platform.
More information on group_concat_max_len: https://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html#sysvar_group_concat_max_len
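If the nested SUBSTRING_INDEX calls are hard to follow, this Python sketch walks through the same steps: join the sorted values with commas, keep the first CEILING(n/2) items, then take the last of those. The helper substring_index is only a rough emulation of the MySQL function for a non-empty delimiter, and both function names are made up for this example.

```python
import math

def substring_index(text, delim, count):
    """Rough emulation of MySQL SUBSTRING_INDEX for a non-empty delimiter."""
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # everything before the count-th delimiter
    if count < 0:
        return delim.join(parts[count:])   # everything after the count-th delimiter from the right
    return ''

def median_via_concat(values):
    """Pick element CEILING(n/2) of the sorted, comma-joined list."""
    joined = ','.join(str(v) for v in sorted(values))  # GROUP_CONCAT ... ORDER BY
    middle = math.ceil(len(values) / 2)
    return substring_index(substring_index(joined, ',', middle), ',', -1)

print(median_via_concat([4, 7, 2, 2, 9, 8, 3]))
print(median_via_concat([1, 2, 3, 4]))
```

As in MySQL, the result is a string. For an even number of values this picks the lower of the two middle values instead of averaging them.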
Another riff on velcrow's answer, but it uses a single intermediate table and takes advantage of the variable used for row numbering to get the count, rather than performing an extra query to calculate it. It also starts the count so that the first row is row 0, which allows simply using Floor and Ceil to select the median row(s).
SELECT Avg(tmp.val) as median_val
FROM (SELECT inTab.val, @rows := @rows + 1 as rowNum
FROM data as inTab, (SELECT @rows := -1) as init
-- Replace with better where clause or delete
WHERE 2 > 1
ORDER BY inTab.val) as tmp
WHERE tmp.rowNum in (Floor(@rows / 2), Ceil(@rows / 2));
Knowing exact row count you can use this query:
SELECT <value> AS VAL FROM <table> ORDER BY VAL LIMIT 1 OFFSET <half>
Where <half> = ceiling(<size> / 2.0) - 1
SELECT
SUBSTRING_INDEX(
SUBSTRING_INDEX(
GROUP_CONCAT(field ORDER BY field),
',',
((
ROUND(
LENGTH(GROUP_CONCAT(field)) -
LENGTH(
REPLACE(
GROUP_CONCAT(field),
',',
''
)
)
) / 2) + 1
)),
',',
-1
)
FROM
table
The above seems to work for me.
I used a two query approach:
first one to get count, min, max and avg
second one (prepared statement) with a "LIMIT @count/2, 1" and "ORDER BY .." clauses to get the median value
These are wrapped in a function defn, so all values can be returned from one call.
If your ranges are static and your data does not change often, it might be more efficient to precompute/store these values and use the stored values instead of querying from scratch every time.
As I needed both a median and a percentile solution, I made a simple and quite flexible function based on the findings in this thread. I am always happy to find ready-made functions that are easy to include in my projects, so I decided to share it:
function mysql_percentile($table, $column, $where, $percentile = 0.5) {
$sql = "
SELECT `t1`.`".$column."` as `percentile` FROM (
SELECT @rownum:=@rownum+1 as `row_number`, `d`.`".$column."`
FROM `".$table."` `d`, (SELECT @rownum:=0) `r`
".$where."
ORDER BY `d`.`".$column."`
) as `t1`,
(
SELECT count(*) as `total_rows`
FROM `".$table."` `d`
".$where."
) as `t2`
WHERE 1
AND `t1`.`row_number`=floor(`total_rows` * ".$percentile.")+1;
";
$result = sql($sql, 1);
if (!empty($result)) {
return $result['percentile'];
} else {
return 0;
}
}
Usage is very easy, example from my current project:
...
$table = DBPRE."zip_".$slug;
$column = 'seconds';
$where = "WHERE `reached` = '1' AND `time` >= '".$start_time."'";
$reaching['median'] = mysql_percentile($table, $column, $where, 0.5);
$reaching['percentile25'] = mysql_percentile($table, $column, $where, 0.25);
$reaching['percentile75'] = mysql_percentile($table, $column, $where, 0.75);
...
Here is my way. Of course, you could put it into a procedure :-)
SET @median_counter = (SELECT FLOOR(COUNT(*)/2) - 1 AS `median_counter` FROM `data`);
SET @median = CONCAT('SELECT `val` FROM `data` ORDER BY `val` LIMIT ', @median_counter, ', 1');
PREPARE median FROM @median;
EXECUTE median;
You could avoid the variable @median_counter if you substitute it:
SET @median = CONCAT( 'SELECT `val` FROM `data` ORDER BY `val` LIMIT ',
(SELECT FLOOR(COUNT(*)/2) - 1 AS `median_counter` FROM `data`),
', 1'
);
PREPARE median FROM @median;
EXECUTE median;
None of the previous answers matched my actual requirement, so I implemented my own, which doesn't need any procedure or complicated statements. I GROUP_CONCAT all values from the column whose median I want, then use COUNT divided by 2 to extract the value from the middle of the list, as the following query does:
(POS is the name of the column whose median I want)
SELECT
SUBSTRING_INDEX (
SUBSTRING_INDEX (
GROUP_CONCAT(pos ORDER BY CAST(pos AS SIGNED INTEGER) desc SEPARATOR ';')
, ';', COUNT(*)/2 )
, ';', -1 ) AS `pos_med`
FROM table_name
GROUP BY any_criterial
I hope this could be useful for someone in the way many of other comments were for me from this website.
Based on @bob's answer, this generalizes the query to have the ability to return multiple medians, grouped by some criteria.
Think, e.g., median sale price for used cars in a car lot, grouped by year-month.
SELECT
period,
AVG(middle_values) AS 'median'
FROM (
SELECT t1.sale_price AS 'middle_values', t1.row_num, t1.period, t2.count
FROM (
SELECT
@last_period:=@period AS 'last_period',
@period:=DATE_FORMAT(sale_date, '%Y-%m') AS 'period',
IF (@period<>@last_period, @row:=1, @row:=@row+1) as `row_num`,
x.sale_price
FROM listings AS x, (SELECT @row:=0) AS r
WHERE 1
-- where criteria goes here
ORDER BY DATE_FORMAT(sale_date, '%Y%m'), x.sale_price
) AS t1
LEFT JOIN (
SELECT COUNT(*) as 'count', DATE_FORMAT(sale_date, '%Y-%m') AS 'period'
FROM listings x
WHERE 1
-- same where criteria goes here
GROUP BY DATE_FORMAT(sale_date, '%Y%m')
) AS t2
ON t1.period = t2.period
) AS t3
WHERE
row_num >= (count/2)
AND row_num <= ((count/2) + 1)
GROUP BY t3.period
ORDER BY t3.period;
create table med(id integer);
insert into med(id) values(1);
insert into med(id) values(2);
insert into med(id) values(3);
insert into med(id) values(4);
insert into med(id) values(5);
insert into med(id) values(6);
select (MIN(count)+MAX(count))/2 from
(select case when (select count(*) from
med A where A.id<B.id)=(select count(*)/2 from med) OR
(select count(*) from med A where A.id>B.id)=(select count(*)/2
from med) then cast(B.id as float)end as count from med B) C;
?column?
----------
3.5
(1 row)
OR
select cast(avg(id) as float) from
(select t1.id from med t1 JOIN med t2 on t1.id!= t2.id
group by t1.id having ABS(SUM(SIGN(t1.id-t2.id)))=1) A;
Often, we may need to calculate Median not just for the whole table, but for aggregates with respect to our ID. In other words, calculate median for each ID in our table, where each ID has many records. (good performance and works in many SQL + fixes problem of even and odds, more about performance of different Median-methods https://sqlperformance.com/2012/08/t-sql-queries/median )
SELECT our_id, AVG(1.0 * our_val) as Median
FROM
( SELECT our_id, our_val,
COUNT(*) OVER (PARTITION BY our_id) AS cnt,
ROW_NUMBER() OVER (PARTITION BY our_id ORDER BY our_val) AS rn
FROM our_table
) AS x
WHERE rn IN ((cnt + 1)/2, (cnt + 2)/2) GROUP BY our_id;
Hope it helps
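The per-ID logic can be checked against a plain Python sketch of the same row-number arithmetic. It assumes that (cnt + 1)/2 and (cnt + 2)/2 are integer divisions, as they are in SQL Server, and it reuses the sample rows from the table_median example earlier in this thread. The function name median_per_group is made up.

```python
from collections import defaultdict

def median_per_group(rows):
    """rows: iterable of (group_id, value); returns {group_id: median}."""
    groups = defaultdict(list)
    for group_id, value in rows:
        groups[group_id].append(value)
    medians = {}
    for group_id, values in groups.items():
        values.sort()
        cnt = len(values)
        # 1-based row numbers (cnt+1)/2 and (cnt+2)/2 with integer division.
        picks = {(cnt + 1) // 2, (cnt + 2) // 2}
        medians[group_id] = sum(values[rn - 1] for rn in picks) / len(picks)
    return medians

rows = [
    (1, 7), (1, 4), (1, 5), (1, 1), (1, 8), (1, 3), (1, 6),
    (2, 4),
    (3, 5), (3, 2),
    (4, 5), (4, 12), (4, 1), (4, 7),
]
print(median_per_group(rows))
```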
MySQL has supported window functions since version 8.0, so you can use ROW_NUMBER (do not use RANK or DENSE_RANK, as they assign the same rank to equal values, like in sports ranking):
SELECT AVG(t1.val) AS median_val
FROM (SELECT val,
ROW_NUMBER() OVER(ORDER BY val) AS row_num
FROM data) t1,
(SELECT COUNT(*) AS num_records FROM data) t2
WHERE t1.row_num IN
(FLOOR((t2.num_records + 1) / 2),
FLOOR((t2.num_records + 2) / 2));
A simple way to calculate the median in MySQL:
set @ct := (select count(1) from station);
set @row := 0;
select avg(a.val) as median from
(select * from table order by val) a
where (select @row := @row + 1)
between @ct/2.0 and @ct/2.0 +1;
The simplest and fastest way to calculate the median in MySQL:
select x.lat_n
from (select lat_n,
count(1) over (partition by 'A') as total_rows,
row_number() over (order by lat_n asc) as rank_Order
from station ft) x
where x.rank_Order = round(x.total_rows / 2.0, 0)