Display range of numbers - prefixed with an alphabet character - sql

Oracle Database 19c Enterprise Edition
I have a table with various codes, as listed below. Each code is a letter prefix followed by a number. Some codes are in sequence, and some are isolated, non-consecutive numbers.
A1,A2,A3,A4,A5,A8,A9,A10,A11,A12,B3,B5,B7,B8,B9,B110,B111,B112,C1,C2,C3,C4,C5,C6,C7,C8
I want to display them in ranges as shown below. Here is the link to the schema and data: SQL Fiddle
Expected Output:
A1-A5
A8-A12
B3
B5
B7-B9
B110-B112
C1-C8
I tried solutions like http://lalitkumarb.wordpress.com/2015/07/22/find-range-of-consecutive-values-in-a-sequence-of-numbers-or-dates/ but they don't work for me because the letter is prefixed to the number.

From Oracle 12c, you can split each code into a letter prefix and a numeric suffix and then use MATCH_RECOGNIZE to efficiently perform row-by-row pattern matching:
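SELECT prefix || first_suffix || '-' || prefix || last_suffix AS range
FROM (
  -- Split e.g. 'B110' into prefix 'B' and numeric suffix 110
  SELECT TRANSLATE(ref_code, 'A0123456789', 'A') AS prefix,  -- removes the digits
         TO_NUMBER(TRANSLATE(ref_code, '0ABCDEFGHIJKLMONPQRSTUVWXYZ', '0')) AS suffix  -- removes the letters
  FROM   xx_ref_codes
)
MATCH_RECOGNIZE(
  PARTITION BY prefix
  ORDER BY suffix
  MEASURES
    FIRST(suffix) AS first_suffix,
    LAST(suffix)  AS last_suffix
  -- Zero or more rows whose next suffix is exactly one higher, then the row ending the run
  -- (final_row is not in DEFINE, so it matches any row)
  PATTERN (consecutive* final_row)
  DEFINE consecutive AS suffix + 1 = NEXT(suffix)
)
ORDER BY prefix, first_suffix;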
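To show what the pattern `consecutive* final_row` does row by row, here is a small Python sketch of the same logic (an illustration, not Oracle code): split each code into prefix and number, sort within each prefix, and extend the current run while the next number is exactly one higher. The names `split_code` and `code_ranges` are hypothetical, and the sketch assumes every code is letters followed by digits.

```python
import re
from itertools import groupby

def split_code(code):
    """Split e.g. 'B110' into ('B', 110); assumes letters followed by digits."""
    m = re.fullmatch(r"([A-Za-z]+)(\d+)", code)
    if m is None:
        raise ValueError(f"unexpected code: {code!r}")
    return m.group(1), int(m.group(2))

def code_ranges(codes):
    """Return 'first-last' for each run of consecutive numbers within a prefix."""
    pairs = sorted(split_code(c) for c in codes)  # PARTITION BY prefix ORDER BY suffix
    result = []
    for prefix, group in groupby(pairs, key=lambda p: p[0]):
        suffixes = [s for _, s in group]
        start = prev = suffixes[0]
        for s in suffixes[1:]:
            if s == prev + 1:  # like DEFINE consecutive AS suffix + 1 = NEXT(suffix)
                prev = s
                continue
            result.append(f"{prefix}{start}-{prefix}{prev}")  # run ended at prev
            start = prev = s
        result.append(f"{prefix}{start}-{prefix}{prev}")  # close the last run
    return result

codes = ("A1,A2,A3,A4,A5,A8,A9,A10,A11,A12,B3,B5,B7,B8,B9,"
         "B110,B111,B112,C1,C2,C3,C4,C5,C6,C7,C8").split(",")
print(code_ranges(codes))
```

Like the SQL, an isolated code comes out as a one-element range such as `B3-B3`.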
Or, if you want to use analytic functions and then aggregate:
SELECT prefix || MIN(suffix) || '-' || prefix || MAX(suffix) AS range
FROM (
  SELECT prefix,
         suffix,
         -- Constant within a run of consecutive suffixes; changes at every gap
         suffix - ROW_NUMBER() OVER (PARTITION BY prefix ORDER BY suffix) AS grp
  FROM (
    SELECT TRANSLATE(ref_code, 'A0123456789', 'A') AS prefix,
           TO_NUMBER(TRANSLATE(ref_code, '0ABCDEFGHIJKLMONPQRSTUVWXYZ', '0')) AS suffix
    FROM   xx_ref_codes
  )
)
GROUP BY prefix, grp
ORDER BY prefix, MIN(suffix);
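The analytic version relies on one observation: within a prefix, `suffix - ROW_NUMBER()` stays constant along a run of consecutive suffixes and jumps at every gap, so it can serve as a group key. A minimal Python sketch of just that computation, assuming the suffixes within a prefix are distinct (`island_keys` and `ranges_by_grp` are hypothetical names):

```python
from collections import defaultdict

def island_keys(suffixes):
    """Pair each suffix with suffix - row_number (row_number starts at 1), like the grp column."""
    return [(s, s - rn) for rn, s in enumerate(sorted(suffixes), start=1)]

def ranges_by_grp(prefix, suffixes):
    """GROUP BY grp, then report MIN-MAX of each group."""
    groups = defaultdict(list)
    for s, grp in island_keys(suffixes):
        groups[grp].append(s)
    # grp never decreases as suffix increases, so insertion order is ascending
    return [f"{prefix}{min(g)}-{prefix}{max(g)}" for g in groups.values()]

b_suffixes = [3, 5, 7, 8, 9, 110, 111, 112]
print(island_keys(b_suffixes))
print(ranges_by_grp("B", b_suffixes))
```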
For your sample data, both queries output:
RANGE
A1-A5
A8-A12
B3-B3
B5-B5
B7-B9
B110-B112
C1-C8

This could also be solved with standard SQL; the CASE expression prints an isolated code such as B3 on its own instead of as a range:
select header || min(num)
       || case when min(num) <> max(num) then '-' || header || max(num) end as result
from (
  select substr(ref_code, 1, 1) as header,  -- assumes a single-letter prefix
         to_number(substr(ref_code, 2)) as num,
         to_number(substr(ref_code, 2))
           - row_number() over (partition by substr(ref_code, 1, 1)
                                order by to_number(substr(ref_code, 2))) as grp,
         ref_code
  from xx_ref_codes
)
group by header, grp
order by header, min(num)
;
A1-A5
A8-A12
B3
B5
B7-B9
B110-B112
C1-C8

Related

How to sort when you are using UNION operator in Oracle SQL

I am using two SELECT statements and the UNION operator, and I want to sort the results of both queries.

I am trying to solve the HackerRank SQL question "The PADS".
The question is:
Generate the following two result sets:
Query an alphabetically ordered list of all names in OCCUPATIONS, immediately followed by the first letter of each profession as a parenthetical (i.e.: enclosed in parentheses). For example: AnActorName(A), ADoctorName(D), AProfessorName(P), and ASingerName(S).
Query the number of occurrences of each occupation in OCCUPATIONS. Sort the occurrences in ascending order, and output them in the following format:
There are a total of [occupation_count] [occupation]s.
where [occupation_count] is the number of occurrences of an occupation in OCCUPATIONS and [occupation] is the lowercase occupation name. If more than one Occupation has the same [occupation_count], they should be ordered alphabetically.
My solution is:
SELECT NAME || '(' || SUBSTR(OCCUPATION,1,1) || ')'
FROM OCCUPATIONS
ORDER BY NAME
UNION
SELECT 'There are a total of ' || COUNT(OCCUPATION) || ' ' || LOWER(OCCUPATION) || 's.'
FROM OCCUPATIONS
GROUP BY OCCUPATION
ORDER BY OCCUPATION;
Output:
ERROR at line 4:
ORA-00933: SQL command not properly ended
(It seems we cannot use ORDER BY before UNION.)
I revised my code to:
SELECT NAME || '(' || SUBSTR(OCCUPATION,1,1) || ')'
FROM OCCUPATIONS
UNION
SELECT 'There are a total of ' || COUNT(OCCUPATION) || ' ' || LOWER(OCCUPATION) || 's.'
FROM OCCUPATIONS
GROUP BY OCCUPATION
ORDER BY NAME, OCCUPATION;
Output:
ERROR at line 7:
ORA-00904: "NAME": invalid identifier
Please help me out here.
Generate the following two result sets
You are NOT generating two result sets. You are performing two SELECTs and trying to merge them into a single result set using UNION and that is not what the question asks for. Stop using UNION and use two queries.
The first result set would be:
SELECT NAME || '(' || SUBSTR(OCCUPATION,1,1) || ')'
FROM OCCUPATIONS
ORDER BY NAME;
The second result set would be:
SELECT 'There are a total of ' || COUNT(OCCUPATION) || ' ' || LOWER(OCCUPATION) || 's.'
FROM OCCUPATIONS
GROUP BY OCCUPATION
and then you need to ORDER BY the number of occurrences AND then by the occupation name (which I leave to you to solve).
Since you want to output two ordered sets of data from one query, the easiest way is to tag each query with an identifier and then order by that identifier and the column you want to order by, e.g.:
SELECT info
FROM (SELECT 1 qry, 0 cnt,
             NAME || '(' || SUBSTR(OCCUPATION,1,1) || ')' info
      FROM OCCUPATIONS
      UNION ALL
      SELECT 2 qry, COUNT(OCCUPATION) cnt,
             'There are a total of ' || COUNT(OCCUPATION) || ' ' || LOWER(OCCUPATION) || 's.' info
      FROM OCCUPATIONS
      GROUP BY OCCUPATION)
-- cnt sorts the counts numerically; sorting on info alone would put "10" before "2"
ORDER BY qry, cnt, info;
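The tag-and-sort idea can be sketched in Python with made-up rows (the names and occupations below are hypothetical test data, not from the question). Each output line carries a sort key of (query number, count, text): name lines come first, then the summary lines in ascending count order, with ties broken alphabetically. Keeping the count as a number in the key, rather than relying on the formatted sentence, is a design choice that avoids "10" sorting before "2".

```python
from collections import Counter

def pads_lines(rows):
    """rows: (name, occupation) pairs. Returns both result sets as one ordered list."""
    tagged = []
    # Query 1: tag 1, count placeholder 0, text "Name(X)"
    for name, occ in rows:
        tagged.append((1, 0, f"{name}({occ[0]})"))
    # Query 2: tag 2, numeric count, summary sentence
    for occ, cnt in Counter(occ for _, occ in rows).items():
        tagged.append((2, cnt, f"There are a total of {cnt} {occ.lower()}s."))
    # Equivalent of ORDER BY qry, cnt, info
    return [text for _, _, text in sorted(tagged)]

# Hypothetical sample rows
sample = [("Bob", "Actor"), ("Ann", "Doctor"), ("Dee", "Singer"), ("Cid", "Actor")]
for line in pads_lines(sample):
    print(line)
```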
Note that because the two queries won't return the same rows, I've used UNION ALL: UNION performs a DISTINCT on the combined result set, whereas UNION ALL doesn't. Also, I'm assuming that if you had two different people with the same name and occupation (e.g. different birth dates), you should output both rows rather than one row.
Note also that in a UNION/UNION ALL query, the output columns take their names from the first query, which is why your revised query gave you the invalid identifier error: NAME is not a column of the combined result (you hadn't given your column an alias!).
Please try this; I hope it helps:
select name||'('||SUBSTR(OCCUPATION,1,1)||')' as col
from OCCUPATIONS
UNION ALL
select
'There are a total of '||count(occupation)||' '||LOWER(occupation)||'s.' as col
from OCCUPATIONS
group by occupation
order by col
;
If performance matters and we know the two SELECTs cannot return duplicate rows, just use UNION ALL, which skips the duplicate-elimination step that UNION performs.

What is the rule of order by with special character?

I sort my data with select pk_customer_no from customer order by pk_customer_no
The codes containing '-' don't group together and sort by letter; it seems SQL just ignores the '-' and sorts by the third character.
How can I sort by the '-'?
The '-' character is ignored in sorting (under the collation in use).
If you want the values containing '-' to come first, you can order by the value with '-' replaced by '0' (zero):
select t.pk_customer_no as rep from (
values ('YH'), ('YHC'), ('Z-CH'), ('Z-CHE'), ('ZCM'), ('Z-CP'), ('Z1'), ('ZHT'), ('ZLA'), ('Z-JP'), ('ZLENO')
) as t (pk_customer_no)
order by replace(t.pk_customer_no, '-', '0')
If you want the values containing '-' at the end, replace '-' with 'Z' instead:
select t.pk_customer_no as rep from (
values ('YH'), ('YHC'), ('Z-CH'), ('Z-CHE'), ('ZCM'), ('Z-CP'), ('Z1'), ('ZHT'), ('ZLA'), ('Z-JP'), ('ZLENO')
) as t (pk_customer_no)
order by replace(t.pk_customer_no, '-', 'Z')
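The effect of each replacement key can be checked with a short Python sketch on the same sample values. One caveat, stated as an assumption: Python compares strings by code point and does not ignore '-' the way the collation above does, so the first key strips '-' explicitly to reproduce the reported behaviour.

```python
codes = ["YH", "YHC", "Z-CH", "Z-CHE", "ZCM", "Z-CP", "Z1", "ZHT", "ZLA", "Z-JP", "ZLENO"]

# Simulate a collation that ignores '-': hyphenated codes interleave with the rest
ignored = sorted(codes, key=lambda s: s.replace("-", ""))

# '-' -> '0': hyphenated codes sort ahead of the other 'Z' codes in this sample
hyphen_first = sorted(codes, key=lambda s: s.replace("-", "0"))

# '-' -> 'Z': hyphenated codes sort after the other 'Z' codes in this sample
hyphen_last = sorted(codes, key=lambda s: s.replace("-", "Z"))

print(ignored)
print(hyphen_first)
print(hyphen_last)
```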

Count and order comma separated values

I have the one-column "table" below (apologies for the data model, it's not my fault):
COL_IN
------
2K, E
E, 2K
O
I would like to obtain the below output, ordered by count descending:
COL_OUT COUNT
----------
K 4
E 2
O 1
COUNT is a keyword (the name of a built-in aggregate function), so it's not a good column name, even in the final output. I use COUNT_ instead (with an underscore).
Other than that, you can modify the input strings so they become valid JSON arrays, so that you can then use JSON functions to split them. After you split the strings into tokens, it's a simple matter to separate the leading number (if present) from the rest of the string, and to aggregate. NVL in the sum adds 1 for each token without a leading integer.
Including the sample data for testing only (if you have an actual table, remove the WITH clause at the top):
with
tbl (col_in) as (
select '2K, E' from dual union all
select 'E, 2K' from dual union all
select 'O' from dual
)
select ltrim(col, '0123456789') as col_out
, sum(nvl(to_number(regexp_substr(col, '^\d*')), 1)) as count_
from tbl,
json_table('["' || regexp_replace(col_in, ', *', '","') || '"]', '$[*]'
columns col path '$')
group by ltrim(col, '0123456789')
order by count_ desc, col_out
;
COL_OUT COUNT_
------- ------
K 4
E 2
O 1
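The split, parse and aggregate steps can be sketched in Python for illustration, assuming tokens are separated by a comma plus optional spaces, as in the sample (`count_tokens` is a hypothetical name). As with NVL in the SQL, a token without a leading integer counts as 1.

```python
import re
from collections import Counter

def count_tokens(rows):
    """Split each comma-separated string, treat a leading integer as a multiplier, sum per token."""
    totals = Counter()
    for row in rows:
        for token in re.split(r",\s*", row):
            m = re.fullmatch(r"(\d*)(.*)", token)          # optional leading digits, then the rest
            qty = int(m.group(1)) if m.group(1) else 1      # no leading integer -> 1
            totals[m.group(2)] += qty
    # Equivalent of ORDER BY count_ DESC, col_out
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))

print(count_tokens(["2K, E", "E, 2K", "O"]))
```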
You can also use a hierarchical query, like this:
WITH t2 AS
(
  SELECT TRIM(REGEXP_SUBSTR(col_in, '[^,]+', 1, level)) AS s
  FROM t
  CONNECT BY level <= REGEXP_COUNT(col_in, ',') + 1
         AND PRIOR SYS_GUID() IS NOT NULL
         AND PRIOR col_in = col_in
)
SELECT REGEXP_SUBSTR(s, '[^0-9]+') AS col_out,                      -- non-digit part of the token
       SUM(NVL(TO_NUMBER(REGEXP_SUBSTR(s, '^[0-9]+')), 1)) AS count  -- leading number, or 1 if none
FROM t2
GROUP BY REGEXP_SUBSTR(s, '[^0-9]+')
ORDER BY count DESC
This presumes all of the data is alphanumeric only (e.g. it contains no special characters such as $, # or !).

Listagg Overflow function implementation (Oracle SQL)

I am using the LISTAGG function in my query; however, it returned an ORA-01489: result of string concatenation is too long error. I searched for that error and found out I can use ON OVERFLOW TRUNCATE, so I added that to my SQL, but now it generates a missing right parenthesis error and I can't figure out why.
My query
SELECT DISTINCT cust_id, acct_no, state, language_indicator, billing_system, market_code,
EMAIL_ADDR, DATE_OF_CHANGE, TO_CHAR(DATE_LOADED, 'DD-MM-YYYY') DATE_LOADED,
(SELECT LISTAGG( SUBSTR(mtn, 7, 4),'<br>' ON OVERFLOW TRUNCATE '***' )
WITHIN GROUP (ORDER BY cust_id || acct_no) mtnlist
FROM process.feature WHERE date_loaded BETWEEN TO_DATE('02-08-2018','MM-dd-yyyy')
AND TO_DATE('02-09-2018', 'MM-dd-yyyy') AND cust_id = ffsr.cust_id
AND acct_no = ffsr.acct_no AND filename = 'FEATURE.VB2B.201802090040'
GROUP BY cust_id||acct_no) mtnlist
FROM process.feature ffsr WHERE date_loaded BETWEEN TO_DATE('02-08-2018','MM-dd-yyyy')
AND TO_DATE('02-09-2018','MM-dd-yyyy') AND cust_id BETWEEN 0542185146 AND 0942025571
AND src_ind = 'B' AND filename = 'FEATURE.VB2B.201802090040'
AND letter_type = 'FA' ORDER BY cust_id;
With a little help from XML, you might get it to work. The example is based on the HR schema.
SQL> select
2 listagg(s.department_name, ',') within group (order by null) result
3 from departments s, departments d;
from departments s, departments d
*
ERROR at line 3:
ORA-01489: result of string concatenation is too long
SQL>
SQL> select
2 rtrim(xmlagg(xmlelement (e, s.department_name || ',')).extract
3 ('//text()').getclobval(), ',') result
4 from departments s, departments d;
RESULT
--------------------------------------------------------------------------------
Administration,Administration,Administration,Administration,Administration,Admin
SQL>
This demo is sourced from livesql.oracle.com:
-- Create table with 93 strings of different lengths, plus one NULL string. Notice the only ASCII character not used is '!', so I will use it as a delimiter in LISTAGG.
create table strings as
with letters as (
select level num,
chr(ascii('!')+level) let
from dual
connect by level <= 126 - ascii('!')
union all
select 1, null from dual
)
select rpad(let,num,let) str from letters;
-- Note the use of LENGTHB to get the length in bytes, not characters.
select str,
sum(lengthb(str)+1) over(order by str rows unbounded preceding) - 1 cumul_lengthb,
sum(lengthb(str)+1) over() - 1 total_lengthb,
count(*) over() num_values
from strings
where str is not null;
-- This statement implements the ON OVERFLOW TRUNCATE WITH COUNT option of LISTAGG in 12.2. If there is no overflow, the result is the same as a normal LISTAGG.
select listagg(str, '!') within group(order by str) ||
case when max(total_lengthb) > 4000 then
'! ... (' || (max(num_values) - count(*)) || ')'
end str_list
from (
select str,
sum(lengthb(str)+1) over(order by str) - 1 cumul_lengthb,
sum(lengthb(str)+1) over() - 1 total_lengthb,
count(*) over() num_values
from strings
where str is not null
)
where total_lengthb <= 4000
or cumul_lengthb <= 4000 - length('! ... (' || num_values || ')');
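The demo's truncation logic can be mirrored in Python as a sketch: keep values in sorted order while the running byte length stays within the limit minus room for the marker, then append a marker with the number of dropped values. `listagg_truncate` and its parameters are hypothetical, and byte length is taken as the UTF-8 length, which assumes a UTF-8 database character set.

```python
def listagg_truncate(values, sep="!", limit=4000):
    """Join values with sep; if the full string would exceed `limit` bytes,
    keep a prefix and append '<sep> ... (n)' where n is the number of dropped values."""
    values = sorted(v for v in values if v is not None)   # WHERE str IS NOT NULL, ORDER BY str
    total = sum(len(v.encode()) + 1 for v in values) - 1  # total_lengthb
    if total <= limit:
        return sep.join(values)
    # Reserve room for the marker, sized with the total count as the SQL does
    budget = limit - len(f"{sep} ... ({len(values)})")
    kept, cumul = [], -1
    for v in values:
        cumul += len(v.encode()) + 1                       # cumul_lengthb for this value
        if cumul > budget:
            break
        kept.append(v)
    return sep.join(kept) + f"{sep} ... ({len(values) - len(kept)})"

# Same shape as the demo table: 93 strings chr(34)..chr(126) of lengths 1..93, plus a NULL
demo = [chr(ord("!") + n) * n for n in range(1, 94)] + [None]
result = listagg_truncate(demo)
print(len(result.encode()), result[-12:])
```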

remove duplicate values from an oracle sql query's output

I want to remove duplicated values from the result of a SQL query in Oracle 10g. I am using a regular expression to remove the letters from the result.
Original value = 1A,1B,2C,2F,4A,4z,11A,11B
Current SQL query:
select REGEXP_REPLACE( tablex.column, '[A-Za-z]' , '' )
from db1
gives me the following output
1,1,2,3,4,4,11,11
How can I remove duplicates from the output to show only unique values,
i.e.
1,2,3,4,11
Assuming that your table contains strings of comma-separated values, you can try something like this:
Here is a sqlfiddle demo
select rtrim(xmltype('<r><n>' ||
replace(REGEXP_REPLACE( col, '[A-Za-z]' , '' ), ',', ',</n><n>')||',</n></r>'
).extract('//n[not(preceding::n = .)]/text()').getstringval(), ',')
from tablex;
After applying your REGEXP_REPLACE, it builds an XMLType from the result, wrapping each value in an <n> element, and then uses an XPath expression that keeps only the first occurrence of each value.
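The XPath filter `n[not(preceding::n = .)]` keeps a value only if no earlier value equals it, i.e. an order-preserving de-duplication. A Python sketch of the same idea, for illustration only (`unique_in_order` is a hypothetical name):

```python
import re

def unique_in_order(csv_value):
    """Strip letters, split on commas, keep only the first occurrence of each number."""
    digits_only = re.sub(r"[A-Za-z]", "", csv_value)  # like REGEXP_REPLACE(col, '[A-Za-z]', '')
    seen, result = set(), []
    for token in digits_only.split(","):
        if token not in seen:                         # like n[not(preceding::n = .)]
            seen.add(token)
            result.append(token)
    return ",".join(result)

print(unique_in_order("1A,1B,2C,2F,4A,4z,11A,11B"))  # 1,2,4,11
```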
If you also want to sort the values (and still use the XML approach), then you need XSL:
select rtrim(xmltype('<r><n>' ||
replace(REGEXP_REPLACE( col, '[A-Za-z]' , '' ), ',', '</n><n>')||'</n></r>'
).extract('//n[not(preceding::n = .)]')
.transform(xmltype('<?xml version="1.0" ?><xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"><xsl:template match="/"><xsl:for-each select="//n[not(preceding::n = .)]"><xsl:sort select="." data-type="number"/><xsl:value-of select="."/>,</xsl:for-each></xsl:template></xsl:stylesheet>'))
.getstringval(), ',')
from tablex;
You can also try different approaches, such as splitting the tokens into rows and then aggregating them back together:
select rtrim(xmlagg(xmlelement(e, n || ',') order by to_number(n))
.extract('//text()'), ',')
from(
SELECT distinct rn, trim(regexp_substr(col, '[^,]+', 1, level)) n
FROM (select row_number() over (order by col) rn ,
REGEXP_REPLACE( col, '[A-Za-z]' , '' ) col
from tablex) t
CONNECT BY instr(col, ',', 1, level - 1) > 0
)
group by rn;
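The split, DISTINCT and ordered re-aggregation steps can be sketched in Python as well (illustration only; `unique_sorted` is a hypothetical name). Sorting with a numeric key mirrors `order by to_number(n)`, so "11" lands after "2" instead of before it.

```python
import re

def unique_sorted(csv_value):
    """Strip letters, split into tokens, drop duplicates, rejoin in numeric order."""
    digits_only = re.sub(r"[A-Za-z]", "", csv_value)
    tokens = {t.strip() for t in digits_only.split(",")}  # like SELECT DISTINCT ... n
    return ",".join(sorted(tokens, key=int))              # like ORDER BY to_number(n)

print(unique_sorted("11A,4z,1A,11B,2C,1B"))
```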