How to find the highest numbered version of text? - sql

How do I find the highest-numbered version of a text value? For example, I have data of the form text+digits:
Supra1, Supra2, ..., SupraN in column1 (translated_description).
select
*
from
oe.product_descriptions
where
translated_description like '%Supra%';
I need the value from another column (column2) for the row with the highest number, e.g. N=30 for Supra30 in column1.

If all of the values in column1 have numbers with the same number of digits, you can order by it and use the fetch first syntax:
SELECT column2
FROM mytable
WHERE column1 LIKE 'Supra%'
ORDER BY column1 DESC
FETCH FIRST ROW ONLY
If the number of digits in column1 varies, you'll have to extract the digits, convert them to a number, and sort numerically:
SELECT column2
FROM mytable
WHERE column1 LIKE 'Supra%'
ORDER BY TO_NUMBER(REPLACE(column1, 'Supra', '')) DESC
FETCH FIRST ROW ONLY
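To see why the first query is only safe when every number has the same width, here is a small Python sketch (not SQL) that compares a plain string sort with a sort on the extracted number. The sample rows and the helper name `numeric_suffix` are made up for illustration.

```python
# Hypothetical sample rows: (column1, column2).
rows = [("Supra2", "b"), ("Supra10", "c"), ("Supra9", "a")]


def numeric_suffix(label, prefix="Supra"):
    """Return the integer that follows the prefix, e.g. 'Supra10' -> 10."""
    return int(label[len(prefix):])


# Plain string comparison, comparable to ORDER BY column1 DESC.
by_text = max(rows, key=lambda row: row[0])

# Numeric comparison, comparable to ORDER BY TO_NUMBER(REPLACE(...)) DESC.
by_number = max(rows, key=lambda row: numeric_suffix(row[0]))

print(by_text)    # ('Supra9', 'a')
print(by_number)  # ('Supra10', 'c')
```

The string sort picks Supra9 because the character '9' compares higher than '1', which is exactly the case the second query guards against.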

Try using regexp_substr to extract the number and then apply max on it:
SELECT max(to_number(regexp_substr(t.translated_description, 'Supra([0-9]+)', 1, 1, NULL, 1)))
FROM oe.product_descriptions t
This will extract the number, assuming that the format of the content of the column is SOMETEXTnumber

If there could be more than one Supra in the column, and the numbers after the word Supra could have 1 or 2 digits, then (just for testing) you could have data like this:
WITH
tbl AS
(
Select 1 "ID", '10GB Removable HDD ... Supra7 disk drives ... transfer rate up to 160MB/s' "COL_1" From Dual Union All
Select 2 "ID", 'Some words ... Supra9 some more words ... and numbers 16 - 32GB' "COL_1" From Dual Union All
Select 3 "ID", 'Words, words, ... Supra12, and Supra13 are considered better than Supra15... words, words' "COL_1" From Dual
),
In this case you should check where the Supra words are located within the string and get rid of everything in front of them. So, here is a CTE that checks for up to three occurrences of the word Supra within the text:
supras AS
(
Select
ID,
CASE WHEN InStr(Upper(COL_1), 'SUPRA', 1, 1) > 0 THEN SubStr(COL_1, InStr(Upper(COL_1), 'SUPRA', 1, 1)) END "SUPRA_1",
CASE WHEN InStr(Upper(COL_1), 'SUPRA', 1, 2) > 0 THEN SubStr(COL_1, InStr(Upper(COL_1), 'SUPRA', 1, 2)) END "SUPRA_2",
CASE WHEN InStr(Upper(COL_1), 'SUPRA', 1, 3) > 0 THEN SubStr(COL_1, InStr(Upper(COL_1), 'SUPRA', 1, 3)) END "SUPRA_3"
From
tbl
)
ID  SUPRA_1                                                                  SUPRA_2                                                     SUPRA_3
--  -----------------------------------------------------------------------  ----------------------------------------------------------  -----------------------
1   Supra7 disk drives ... transfer rate up to 160MB/s
2   Supra9 some more words ... and numbers 16 - 32GB
3   Supra12, and Supra13 are considered better than Supra15... words, words  Supra13 are considered better than Supra15... words, words  Supra15... words, words
And, finally, this resulting dataset can be transformed to just numbers, among which you select the MAX one:
-- Main SQL
SELECT MAX(GREATEST(SUPRA_1, SUPRA_2, SUPRA_3)) "MAX_SUPRA"
FROM
(
Select
ID,
CASE WHEN INSTR('0123456789', SubStr(SUPRA_1, 6, 1)) > 0 And INSTR('0123456789', SubStr(SUPRA_1, 7, 1)) > 0 THEN To_Number(SubStr(SUPRA_1, 6, 2))
WHEN INSTR('0123456789', SubStr(SUPRA_1, 6, 1)) > 0 THEN To_Number(SubStr(SUPRA_1, 6, 1))
ELSE 0
END "SUPRA_1",
--
CASE WHEN INSTR('0123456789', SubStr(SUPRA_2, 6, 1)) > 0 And INSTR('0123456789', SubStr(SUPRA_2, 7, 1)) > 0 THEN To_Number(SubStr(SUPRA_2, 6, 2))
WHEN INSTR('0123456789', SubStr(SUPRA_2, 6, 1)) > 0 THEN To_Number(SubStr(SUPRA_2, 6, 1))
ELSE 0
END "SUPRA_2",
--
CASE WHEN INSTR('0123456789', SubStr(SUPRA_3, 6, 1)) > 0 And INSTR('0123456789', SubStr(SUPRA_3, 7, 1)) > 0 THEN To_Number(SubStr(SUPRA_3, 6, 2))
WHEN INSTR('0123456789', SubStr(SUPRA_3, 6, 1)) > 0 THEN To_Number(SubStr(SUPRA_3, 6, 1))
ELSE 0
END "SUPRA_3"
From
supras
)
MAX_SUPRA
----------
15
-- Below is result of the inner query in main sql
ID SUPRA_1 SUPRA_2 SUPRA_3
---------- ---------- ---------- ----------
1 7 0 0
2 9 0 0
3 12 13 15
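For comparison, the same idea (collect every number that follows Supra, then take the largest) can be sketched in Python without limiting the number of occurrences or digits. This is only an illustration of the logic, not Oracle SQL; the function name `supra_numbers` is made up, and the strings are the three test rows from the tbl CTE above.

```python
import re

# The three test strings from the tbl CTE.
texts = [
    "10GB Removable HDD ... Supra7 disk drives ... transfer rate up to 160MB/s",
    "Some words ... Supra9 some more words ... and numbers 16 - 32GB",
    "Words, words, ... Supra12, and Supra13 are considered better than Supra15... words, words",
]


def supra_numbers(text):
    """Return every number that directly follows 'Supra' (case-insensitive)."""
    return [int(n) for n in re.findall(r"supra(\d+)", text, flags=re.IGNORECASE)]


per_row = [supra_numbers(t) for t in texts]
print(per_row)                                   # [[7], [9], [12, 13, 15]]
print(max(n for row in per_row for n in row))    # 15
```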

If you use SQL Server, you can try this query:
select ID as Column1,CONVERT(INT,SUBSTRING(ID,6, (LEN(ID)-5))) as Column2
from T
order by CONVERT(INT,SUBSTRING(ID,6, (LEN(ID)-5))) desc

Related

BigQuery SQL query to Indicate a sequence of 3 rows sharing the same value

I need a query that assigns a unique group number every time the indicator column turns to zero and there are 3 zeros in a row.
Here is some sample data:
select 0 as offset, 1 as indicator, -1 as grp union all
select 1, 1, -1 union all
select 2, 1, -1 union all
select 3, 1, -1 union all
select 4, 1, -1 union all
select 5, 1, -1 union all
select 6, 1, -1 union all
select 7, 0, 1 union all
select 8, 0, 1 union all
select 9, 0, 1 union all
select 10, 1, -1 union all
select 11, 0, 2 union all
select 12, 0, 2 union all
select 13, 0, 2 union all
select 14, 1, -1 union all
select 15, 1, -1 union all
select 16, 1, -1
In this example there are two sequences of 3 zeros, indicated as grp=1 and grp=2.
Consider the approach below:
select offset, indicator, if(grp = 0, -1, grp) as grp
from (
select offset, indicator, dense_rank() over(order by pregroup) - 1 as grp
from (
select offset, indicator,
if(countif(indicator = 0) over(partition by pregroup) = 3 and indicator = 0, pregroup, -1) as pregroup
from (
select offset, indicator, count(*) over win - countif(indicator = 0) over win as pregroup
from your_table
window win as (order by offset)
)
)
)
This was tested against slightly modified sample data from your question (with an added sequence of 4 zeros, just for test purposes).
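As a cross-check of the intended grouping, here is a minimal Python sketch (not BigQuery SQL) of the rule. It assumes that only runs of exactly three zeros qualify, which is what the `= 3` test in the query above suggests; the function name `label_zero_runs` is made up for illustration.

```python
from itertools import groupby


def label_zero_runs(indicators, run_length=3):
    """Number each run of exactly `run_length` zeros; label everything else -1."""
    labels = []
    next_group = 1
    for value, run in groupby(indicators):
        size = len(list(run))
        if value == 0 and size == run_length:
            labels.extend([next_group] * size)
            next_group += 1
        else:
            labels.extend([-1] * size)
    return labels


# Indicator column from the sample data, ordered by offset.
indicator = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1]
print(label_zero_runs(indicator))
# [-1, -1, -1, -1, -1, -1, -1, 1, 1, 1, -1, 2, 2, 2, -1, -1, -1]
```

The printed labels match the grp column given in the question's sample data.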
The below query solves this.
Firstly it assigns all of the desired groups a tag.
Secondly, we get the row number for them and use integer casting on row_number to assign them a unique group number.
with data as (select 0 as offset, 1 as indicator, -1 as grp union all
select 1, 1, -1 union all
select 2, 1, -1 union all
select 3, 1, -1 union all
select 4, 1, -1 union all
select 5, 1, -1 union all
select 6, 1, -1 union all
select 7, 0, 1 union all
select 8, 0, 1 union all
select 9, 0, 1 union all
select 10, 1, -1 union all
select 11, 0, 2 union all
select 12, 0, 2 union all
select 13, 0, 2 union all
select 14, 1, -1 union all
select 15, 1, -1 union all
select 16, 1, -1 ),
tagged as (select
*,
-- mark as part of the group if both indicators in front, both indicators behind, or one indicator in front and behind are 0.
case
when indicator = 0 and lead(indicator) over(order by offset) = 0 and lead(indicator, 2) over(order by offset) = 0 then true
when indicator = 0 and lead(indicator) over(order by offset) = 0 and lag(indicator) over(order by offset) = 0 then true
when indicator = 0 and lag(indicator) over(order by offset) = 0 and lag(indicator, 2) over(order by offset) = 0 then true
else false
end as part_of_group
from data),
group_tags as (
select
*,
-- use cast as int to acquire the group number from the row number
CAST((row_number() over(order by offset) + 1)/3 AS INT) as group_tag
from
tagged
where
part_of_group = true)
-- rejoin this data back together
select
d.*,
gt.group_tag
from data as d
left join
group_tags as gt
on
d.offset = gt.offset
You may consider the approach below as well:
WITH partitions AS (
SELECT *, indicator = 0 AND COUNT(div) OVER (PARTITION BY div, indicator) = 3 AS flag
FROM (
SELECT *, SUM(indicator) OVER (ORDER BY offset) AS div FROM sample_data
)
)
SELECT offset, indicator, IF(flag, DENSE_RANK() OVER w, -1) AS grp
FROM partitions
WINDOW w AS (PARTITION BY CASE WHEN flag THEN 0 ELSE 1 END ORDER BY div)
ORDER BY offset;

TSQL counting '1' in a string position, by positions

There are fields for categories like this:
"101011111000000101010011000101...". Every position in these strings represents a certain category, which is set if the character is "1".
So "1" means set and "0" means not set.
I would like to count how often each category is set to "1" and order the categories by that count, descending.
My current solution looks like this:
SELECT COUNT(SUBSTRING([Interests], 1, 1)) AS xcount, 1 AS ID
FROM [db1].[dbo].[Contacts]
WHERE SUBSTRING([Interests], 1, 1) = '1'
UNION
SELECT COUNT(SUBSTRING([Interests], 2, 1)) AS xcount, 2 AS ID
FROM [db1].[dbo].[Contacts]
WHERE SUBSTRING([Interests], 2, 1) = '1'
UNION
SELECT COUNT(SUBSTRING([Interests], 3, 1)) AS xcount, 3 AS ID
FROM [db1].[dbo].[Contacts]
WHERE SUBSTRING([Interests], 3, 1) = '1'
UNION
SELECT COUNT(SUBSTRING([Interests], 4, 1)) AS xcount, 4 AS ID
FROM [db1].[dbo].[Contacts]
WHERE SUBSTRING([Interests], 4, 1) = '1'
UNION
SELECT COUNT(SUBSTRING([Interests], 5, 1)) AS xcount, 5 AS ID
FROM [db1].[dbo].[Contacts]
WHERE SUBSTRING([Interests], 5, 1) = '1'
ORDER BY xcount DESC
Is there a better or faster way to count those categories?
SELECT SUM(CASE WHEN SUBSTRING([Interests], _ID.ID, 1) = '1' THEN 1 ELSE 0 END) AS xcount, _ID.ID
FROM [db1].[dbo].[Contacts], (VALUES (1),(2),(3),(4),(5)) AS _ID(ID)
GROUP BY _ID.ID
ORDER BY xcount DESC
For more categories, just extend the _ID sequence.
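The per-position counting that this query performs can be sketched in Python as follows. The Interests values are hypothetical and this is only an illustration of the logic, not T-SQL.

```python
# Hypothetical Interests values; each character position is one category.
interests = ["10101", "11100", "00101", "10001"]

# Start every position at 0 so categories that are never set still appear.
width = max(len(flags) for flags in interests)
counts = {position: 0 for position in range(1, width + 1)}

for flags in interests:
    # Positions are 1-based, like SUBSTRING in T-SQL.
    for position, flag in enumerate(flags, start=1):
        if flag == "1":
            counts[position] += 1

# Highest count first; ties broken by position.
ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
print(ranked)  # [(1, 3), (3, 3), (5, 3), (2, 1), (4, 0)]
```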
This will count the number of '1' characters in a string consisting only of 0 and 1:
declare @s varchar(100) = '101011111000000101010011000101';
select cnt = len(@s) - len(replace(@s, '0', ''))

Regexp_substr expression

I have a problem with my REGEXP expression, which I want to run in a loop so that every iteration deletes the text after the last slash. My expression currently looks like this:
REGEXP_SUBSTR('L1161148/1/10', '.*(/)')
I'm getting L1161148/1/ instead of L1161148/1
You said you wanted to loop.
CAVEAT: Both of these solutions assume there are no NULL list elements (all slashes have a value in between them).
SQL> with tbl(data) as (
select 'L1161148/1/10' from dual
)
select level, nvl(substr(data, 1, instr(data, '/', 1, level)-1), data) formatted
from tbl
connect by level <= regexp_count(data, '/') + 1 -- Loop # of delimiters +1 times
order by level desc;
LEVEL FORMATTED
---------- -------------
3 L1161148/1/10
2 L1161148/1
1 L1161148
SQL>
EDIT: To handle multiple rows:
SQL> with tbl(rownbr, col1) as (
select 1, 'L1161148/1/10/2/34/5/6' from dual
union
select 2, 'ALKDFJV1161148/123/456/789/1/2/3' from dual
)
SELECT rownbr, column_value substring_nbr,
nvl(substr(col1, 1, instr(col1, '/', 1, column_value)-1), col1) formatted
FROM tbl,
TABLE(
CAST(
MULTISET(SELECT LEVEL
FROM dual
CONNECT BY LEVEL <= REGEXP_COUNT(col1, '/')+1
) AS sys.OdciNumberList
)
)
order by rownbr, substring_nbr desc
;
ROWNBR SUBSTRING_NBR FORMATTED
---------- ------------- --------------------------------
1 7 L1161148/1/10/2/34/5/6
1 6 L1161148/1/10/2/34/5
1 5 L1161148/1/10/2/34
1 4 L1161148/1/10/2
1 3 L1161148/1/10
1 2 L1161148/1
1 1 L1161148
2 7 ALKDFJV1161148/123/456/789/1/2/3
2 6 ALKDFJV1161148/123/456/789/1/2
2 5 ALKDFJV1161148/123/456/789/1
2 4 ALKDFJV1161148/123/456/789
2 3 ALKDFJV1161148/123/456
2 2 ALKDFJV1161148/123
2 1 ALKDFJV1161148
14 rows selected.
SQL>
You can try removing the string after the last slash:
select regexp_replace('L1161148/1/10', '/([^/]*)$', '') from dual
You are trying to go as far as the last / and then "look back" and retain what was before it. With regular expressions you can do that with a subexpression, like this:
select regexp_substr('L1161148/1/10', '(.*)/.*', 1, 1, null, 1) from dual;
Here, as usual, the first argument "1" means where to start the search, the second "1" means which matching substring to choose, "null" means no special matching modifiers (like case-insensitive matching and such - not needed here), and the last "1" means return the first subexpression - the first thing in parentheses in the "match pattern."
However, regular expressions should only be used when you can't do it with the standard substr and instr (and translate) functions. Here the job is quite easy:
instr(text_string, '/', -1)
will give you the position of the LAST / in text_string (the -1 means find the last occurrence, instead of the first: count from the end of the string). So the whole thing can be written as:
select substr('L1161148/1/10', 1, instr('L1161148/1/10', '/', -1) - 1) from dual;
Edit: In the spirit of Gary_W's solution, here is a generalization to several strings that strips successive layers from each input string. It still does not use regular expressions (resulting in slightly faster performance) and it uses a recursive CTE, available since Oracle 11g Release 2; I believe Gary's solution works only from Oracle 12c on.
Query: (I changed Gary's second input string a bit, to make sure the query works properly)
with tbl(item_id, input_str) as (
select 1, 'L1161148/1/10/2/34/5/6' from dual union all
select 2, 'ALKD/FJV11/61148/123/456/789/1/2/3' from dual
),
r (item_id, proc_string, stage) as (
select item_id, input_str, 0 from tbl
union all
select item_id, substr(proc_string, 1, instr(proc_string, '/', -1) - 1), stage + 1
from r
where instr(proc_string, '/') > 0
)
select * from r
order by item_id, stage;
Output:
ITEM_ID PROC_STRING STAGE
---------- ---------------------------------------- ----------
1 L1161148/1/10/2/34/5/6 0
1 L1161148/1/10/2/34/5 1
1 L1161148/1/10/2/34 2
1 L1161148/1/10/2 3
1 L1161148/1/10 4
1 L1161148/1 5
1 L1161148 6
2 ALKD/FJV11/61148/123/456/789/1/2/3 0
2 ALKD/FJV11/61148/123/456/789/1/2 1
2 ALKD/FJV11/61148/123/456/789/1 2
2 ALKD/FJV11/61148/123/456/789 3
2 ALKD/FJV11/61148/123/456 4
2 ALKD/FJV11/61148/123 5
2 ALKD/FJV11/61148 6
2 ALKD/FJV11 7
2 ALKD 8
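The "cut at the last slash, then repeat" logic used by the substr/instr answers can be sketched in Python like this. The function names are made up, and unlike the Oracle expression (which yields NULL when there is no slash) this sketch returns the input unchanged in that case.

```python
def strip_last_segment(text, sep="/"):
    """Drop the last separator and everything after it.

    Returns the text unchanged when it contains no separator.
    """
    head, found, _tail = text.rpartition(sep)
    return head if found else text


def peel(text, sep="/"):
    """Return the text followed by each successively shorter prefix."""
    stages = [text]
    while sep in stages[-1]:
        stages.append(strip_last_segment(stages[-1], sep))
    return stages


print(strip_last_segment("L1161148/1/10"))  # L1161148/1
print(peel("L1161148/1/10"))  # ['L1161148/1/10', 'L1161148/1', 'L1161148']
```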

Get all missing numbers in the sequence

The numbers are originally alpha numeric so I have a query to parse out the numbers:
My query here gives me a list of numbers:
select distinct cast(SUBSTRING(docket,7,999) as INT) from
[DHI_IL_Stage].[dbo].[Violation] where InsertDataSourceID='40' and
ViolationCounty='Carroll' and SUBSTRING(docket,5,2)='TR' and
LEFT(docket,4)='2011' order by 1
Returns the list of numbers parsed out.
For example, the number will be 2012TR557. After using the query it will be 557.
I need to write a query that will give back the missing numbers in a sequence.
Here is one approach
The following should return one row for each sequence of missing numbers. So, if your series is 3, 5, 6, 9, then it should return:
4 4
7 8
The query is:
with nums as (
select distinct cast(SUBSTRING(docket, 7, 999) as INT) as n,
-- dense_rank (not row_number) so duplicate numbers share a rank and DISTINCT can collapse them
dense_rank() over (order by cast(SUBSTRING(docket, 7, 999) as INT)) as seqnum
from [DHI_IL_Stage].[dbo].[Violation]
where InsertDataSourceID = '40' and
ViolationCounty = 'Carroll' and
SUBSTRING(docket,5,2) = 'TR' and
LEFT(docket, 4) = '2011'
)
select (nums_prev.n + 1) as first_missing, nums.n - 1 as last_missing
from nums left outer join
nums nums_prev
on nums.seqnum = nums_prev.seqnum + 1
where nums.n <> nums_prev.n + 1 ;
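The gap-finding logic (compare each number with its predecessor and report the range in between) can be sketched in Python as follows. This only illustrates the idea; the function name `missing_ranges` is made up.

```python
def missing_ranges(numbers):
    """Return (first_missing, last_missing) for every gap between consecutive distinct values."""
    ordered = sorted(set(numbers))  # de-duplicate, then sort
    return [
        (prev + 1, cur - 1)
        for prev, cur in zip(ordered, ordered[1:])
        if cur != prev + 1
    ]


print(missing_ranges([3, 5, 6, 9]))  # [(4, 4), (7, 8)]
```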

Oracle custom sort

The query...
select distinct name from myTable
returns a bunch of values that start with the following character sequences...
ADL*
FG*
FH*
LAS*
TWUP*
Where '*' is the remainder of the string.
I want to do an order by that sorts in the following manner...
ADL*
LAS*
TWUP*
FG*
FH*
But then I also want to sort within each prefix in the standard ORDER BY fashion. So, as an example, if I have the following values
LAS-21A
TWUP-1
FG999
FH3
ADL99999
ADL88888
ADL77777
LAS2
I want it to be sorted like this...
ADL77777
ADL88888
ADL99999
LAS2
TWUP-1
FG999
FH3
I initially thought I could accomplish this via an order by decode(blah) with some LIKE trickery inside the decode, but I've been unable to make it work. Any insights?
Goofy and verbose, but should work:
select name, case when substr (name, 1, 3) = 'ADL' then 1
when substr (name, 1, 3) = 'LAS' then 2
when substr (name, 1, 4) = 'TWUP' then 3
when substr (name, 1, 2) = 'FG' then 4
when substr (name, 1, 2) = 'FH' then 5
else 6
end SortOrder
from myTable
order by 2, 1;
Not sure if 6 is the correct place to sort the other items, but it is obvious how to fix that. At least it is clear what is going on, even if I have no idea why you are doing it this way.
EDIT: If these are the only values, you could change lines 4 and 5:
select name, case when substr (name, 1, 3) = 'ADL' then 1
when substr (name, 1, 3) = 'LAS' then 2
when substr (name, 1, 4) = 'TWUP' then 3
when substr (name, 1, 1) = 'F' then 4
else 6
end SortOrder
from myTable
order by 2, 1;
ANOTHER EDIT: And again, if these are the only values, you can simplify even more. Since the only one out of order is the F* series, you can force them to the end, and use the actual name for all the others. This is simpler, but relies too much on the exact values for my preference. On the other hand, it does remove many of the seemingly unnecessary calls to substr:
select name, case when substr (name, 1, 1) = 'F' then 'Z'
else name
end SortOrder
from myTable
order by 2, 1;
The problem is that your prefix contains a variable number of characters. This is a good time to deploy regular expressions (if you have 10g or higher).
SQL> select cola
2 from t34
3 order by decode( regexp_substr(cola, '[[:alpha:]]+')
4 , 'ADL' , 10
5 , 'LAS', 20
6 , 'TWUP', 30
7 , 'FG' , 40
8 , 'FH' , 50
9 , 60 )
10 , cola
11 /
COLA
----------
ADL77777
ADL88888
ADL99999
LAS-21A
LAS2
TWUP-1
FG999
FH3
8 rows selected.
SQL>
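The same two-level ordering (a rank looked up from the leading letters, then the full value) can be sketched in Python as a sort key. This is an illustration only: the names `PREFIX_RANK` and `sort_key` are made up, and `re.match` anchors the pattern at the start of the string, which is an assumption that fits these sample values.

```python
import re

# Rank per prefix, mirroring the decode() list; unknown prefixes get 60.
PREFIX_RANK = {"ADL": 10, "LAS": 20, "TWUP": 30, "FG": 40, "FH": 50}


def sort_key(name):
    """Sort by the rank of the leading letters, then by the full name."""
    match = re.match(r"[A-Za-z]+", name)
    prefix = match.group(0) if match else ""
    return (PREFIX_RANK.get(prefix, 60), name)


names = ["LAS-21A", "TWUP-1", "FG999", "FH3", "ADL99999", "ADL88888", "ADL77777", "LAS2"]
print(sorted(names, key=sort_key))
# ['ADL77777', 'ADL88888', 'ADL99999', 'LAS-21A', 'LAS2', 'TWUP-1', 'FG999', 'FH3']
```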
In earlier versions of Oracle we can use the OWA_PATTERN.AMATCH() function to the same effect:
SQL> select cola
2 from t34
3 order by decode( owa_pattern.amatch(cola, 1, '^[A-Z]+')
4 , 'ADL' , 10
5 , 'LAS', 20
6 , 'TWUP', 30
7 , 'FG' , 40
8 , 'FH' , 50
9 , 60 )
10 , cola
11 /
COLA
----------
ADL77777
ADL88888
ADL99999
FG999
FH3
LAS-21A
LAS2
TWUP-1
8 rows selected.
SQL>