Reading a part of a alpha numeric string in SQL - sql

I have a table with one column " otname "
table1.otname contains multiple rows of alpha-numeric string resembling the following data sample:
11.10.32.12.U.A.F.3.2.21.249.1
2001.1.1003.8281.A.LE.P.P
2010.1.1003.8261.A.LE.B.B
I want to read the fourth number in every string ( part of the string in bold ) and write a query in Oracle 10g
to read its description stored in another table. My dilemma is writing the first part of the query.i.e. choosing the fourth number of every string in a table
My second query will be something like this:
select description_text from table2 where sncode = 8281 -- fourth part of the data sample in every string
Many thanks.
novice

Works with 9i+:
WITH portion AS (
SELECT SUBSTR(t.otname, INSTR(t.otname, ".", 1, 3)+1, INSTR(t.otname, ".", 1, 4)) 'sncode'
FROM TABLE t)
SELECT t.description_text
FROM TABLE2 t
JOIN portion p ON p.sncode = t.sncode
The use of SUBSTR should be obvious; INSTR is being used to find location the period (.), starting at the first character in the string (parameter value 1), on the 3rd and 4th appearance in the string. You might have to subtract one from the position returned for the 4th instance of the period - test this first to be sure you're getting the right values:
SELECT SUBSTR(t.otname, INSTR(t.otname, ".", 1, 3)+1, INSTR(t.otname, ".", 1, 4)) 'sncode'
FROM TABLE t
I used subquery factoring so the substring happens before you join to the second table. It can be done as a subquery, but subquery factoring is faster.

Newer versions of oracle (including 10g) have various regular expression functions. So you can do something like this:
where sncode = to_number(regexp_replace(otname, '^(\d+\.\d+\.\d+\.(\d+))?.+$', '\2'))
This matches 3 sets of digits-followed-by-a-dot, and a fourth grouped set of digits, followed by the rest of the string, and returns a string consisting of all that entirely replaced by the first group (the fourth set of digits).
Here's a complete query (if I understood your description of the two tables correctly):
select t2.description_text
from table1 t1, table2 t2
where t2.sncode = to_number(regexp_replace(t1.otname, '^(\d+\.\d+\.\d+\.(\d+))?.+$', '\2'))
Another slightly shorter alternative regex:
where t2.sncode = to_number(regexp_replace(t1.otname, '^((\d+\.){3}(\d+))?.+$', '\3'))

Related

How run Select Query with LIKE on thousands of rows

Newbie here. Been searching for hours now but I can seem to find the correct answer or properly phrase my search.
I have thousands of rows (orderids) that I want to put on an IN function, I have to run a LIKE at the same time on these values since the columns contains json and there's no dedicated table that only has the order_id value. I am running the query in BigQuery.
Sample Input:
ORD12345
ORD54376
Table I'm trying to Query: transactions_table
Query:
SELECT order_id, transaction_uuid,client_name
FROM transactions_table
WHERE JSON_VALUE(transactions_table,'$.ordernum') LIKE IN ('%ORD12345%','%ORD54376%')
Just doesn't work especially if I have thousands of rows.
Also, how do I add the order id that I am querying so that it appears under an order_id column in the query result?
Desired Output:
Option one
WITH transf as (Select order_id, transaction_uuid,client_name , JSON_VALUE(transactions_table,'$.ordernum') as o_num from transactions_table)
Select * from transf where o_num like '%ORD12345%' or o_num like '%ORD54376%'
Option two
split o_num by "-" as separator , create table of orders like (select 'ORD12345' as num
Union
Select 'ORD54376' aa num) and inner join it with transf.o_num
One method uses OR:
WHERE JSON_VALUE(transactions_table, '$.ordernum') LIKE IN '%ORD12345%' OR
JSON_VALUE(transactions_table, '$.ordernum') LIKE '%ORD54376%'
An alternative method uses regular expressions:
WHERE REGEXP_CONTAINS(JSON_VALUE(transactions_table, '$.ordernum'), 'ORD12345|ORD54376')
According to the documentation, here, the LIKE operator works as described:
Checks if the STRING in the first operand X matches a pattern
specified by the second operand Y. Expressions can contain these
characters:
A percent sign "%" matches any number of characters or
bytes.
An underscore "_" matches a single character or byte.
You can escape "\", "_", or "%" using two backslashes. For example, "\%". If
you are using raw strings, only a single backslash is required. For
example, r"\%".
Thus , the syntax would be like the following:
SELECT
order_id,
transaction_uuid,
client_name
FROM
transactions_table
WHERE
JSON_VALUE(transactions_table,
'$.ordernum') LIKE '%ORD12345%'
OR JSON_VALUE(transactions_table,
'$.ordernum') LIKE '%ORD54376%
Notice that we specify two conditions connected with the OR logical operator.
As a bonus information, when querying large datasets it is a good pratice to select only the columns you desire in your out output ( either in a Temp Table or final view) instead of using *, because BigQuery is columnar, one of the reasons it is faster.
As an alternative for using LIKE, you can use REGEXP_CONTAINS, according to the documentation:
Returns TRUE if value is a partial match for the regular expression, regex.
Using the following syntax:
REGEXP_CONTAINS(value, regex)
However, it will also work if instead of a regex expression you use a STRING between single/double quotes. In addition, you can use the pipe operator (|) to allow the searched components to be logically ordered, when you have more than expression to search, as follows:
where regexp_contains(email,"gary|test")
I hope if helps.

Oracle SQL - How to Select a substring index inside another Query?

I'm writing a query that returns a bunch of things from multiple tables. The main query is against Table_1. I need to return a substring from a field in table 7. But I'm getting an error that Substring_Index is an invalid identifier. How can I achieve the intended result?
I have a field COLUMN_1 of TABLE_1 that has 3+ pieces of data, separated by " : " (space colon space) and I need to strip out the text before the first delimiter, and return the rest of it (regardless of length).
A simplified example:
SELECT t1.name
,t1.address
,t1.phone
,t2. fave_brand
,SUBSTRING_INDEX(t3.fave_product, ' : ', -1) AS Fave Product
FROM table_1 t1
INNER JOIN table_2 t2
ON t2.brand_SK = t1.fave_brand_FK
INNER JOIN table_3 t3
ON t3.product_list_SK = t1.fave_products
WHERE <a series of constraints>;
Please note, I am NOT normally an SQL developer, but the back-end dev is on vacation and I've been tasked with cobbling this fix together. I'm a beginner at best.
In oracle you could use regexp_replace():
regexp_replace(t3.fave_product, '^[^:]*:', '') "Fave Product"
regexp_replace() replaces the part of the string that matches the regexp given as second argument with the value given as third argument. Here, we use the empty string as third argument, meaning that the matching part of the string is suppressed.
Regexp breakdown:
^ beginning of the string
[^:]* as many characters as possible other than ":" (possibly, 0 characters)
: character ":"
NB: identifiers that contain special characters (such as space) need to be double quoted.
Oracle does not support substring_index(). That is a MySQL function.
You can use regexp_substr(). Without sample data it is a little hard to be 100% sure, but I think the logic you want is:
regexp_substr(t3.fave_product, '[^:]+$') as fave_product

Oracle: regexp for a complicated case

I have a table, and one of the columns contains a string with items separated by semicolons(;)
I want to selectively transfer the data to a new table based on the pattern of the String.
For example, it may look like
16;;14;30;24;11;13;14;14;10;13;18;15;18;24;13/18;11;;23;12;;19;10;;11;26;;;42;26;38/39;12;;;;;;;11;;;;;;;;;;;;;;;
or
11;;11;11;11;11;11;11;11;11;11;11;11;11;11;11;11;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
I don't care about what's between the semicolons, but I care about which positions contain items. For example, if I only want the 1st, 3rd, 4th position to contain items, I would allow the following...
32;;14;18/12;;;;;;;;; or 32;;14;18/12;;;;55;;;;11;;;;;;;
This one down below is not okay because the 3rd position does not hold any value.
32;;;18/12;;;;;;;;;
If regexp works for this, then I can use merge into to move the desired records to the target table. If this cannot be done, I'll have to process each record in Java, and selectively insert the records to the new table.
source table:
id | StringValue | count
target table:
id | StringValue | count
The sql that I have in mind:
merge into you_target_table tt
using ( select StringValue, count
from source_table where REGEXP_LIKE ( StringValue, 'some pattern')
) st
on ( st.StringValue = tt.StringValue and st.count=tt.count )
when not matched then
insert (id, StringValue , count)
values (someseq.nextval, st.value1, st.count)
when matched then
update
set tt.count = tt.count + st.count;
Also I'm certain that all StringValue in source table is unique, so what's after when matched then is not important, but due to the syntax, I think I must have something.
For each position you want a value put [^;]+;, that matches any character, that is not ; and occurs at least one time followed by a ;. If you don't care for a position put [^;]*;. That's almost similar to the first one but the characters, that are before the ; may also be none. Anchor the whole thing to the beginning with ^.
So for your 1st, 3rd and 4th position example you'd get:
^[^;]+;[^;]*;[^;]+;[^;]+;
In a query that'd look like:
SELECT *
FROM elbat
WHERE regexp_like(nmuloc, '^[^;]+;[^;]*;[^;]+;[^;]+;');
db<>fiddle
It may be further improved by putting the sub expressions in a group, that is, put parenthesis around them, and use quantors -- a number in curly braces after the group. For example ([^;]+;){2} would match two positions that are not empty. Your example would get shorten to:
^[^;]+;[^;]*;([^;]+;){2}
While #stiky bit answer is totally correct there is another similar but perhaps more readable solution:
SELECT *
FROM elbat
WHERE regexp_substr(nmuloc, '(.*?)(;|$)', 1, 1, '', 1) is not null
AND regexp_substr(nmuloc, '(.*?)(;|$)', 1, 3, '', 1) is not null
AND regexp_substr(nmuloc, '(.*?)(;|$)', 1, 4, '', 1) is not null;
db<>fiddle
Pros:
clearly states position number that should not be null
has universal pattern for any condition, so no need in changing regex
can use any regex as delimiter, not only single character
actually extracts item, so you can further test it with any function
Cons:
rather verbose
n times slower, where n is condition count
even more slower (up to 2 times) cause of backtracking on each non-delimiter symbol
However in my experience this efficiency difference is minor if query is not run against billions of rows. And even then disk reading would consume most of the time.
How it's made:
(.*?)(;|$) - lazily searches for any character sequence (possibly zero-length) ended with delimiter or end of string
1 - position to start search. 1 is default. Needed only to get to the next parameter
1, 3 or 4 - occurrence or pattern
'' - match_parameter. Can be used for setting up matching mode, but here also only to get to the last parameter
1 - sub-expression number makes regexp_substr return only first capturing group. That is (.*?) i.e. item itself without delimiter.

Comparing fields when a field has data in between 2 characters that match the field being compared

I have code that looks like this:
left outer join
gme_batch_header bh
on
substr(ln.lot_number,instr(ln.lot_number,'(') + 1,
instr(ln.lot_number,')') - instr(ln.lot_number,'(') - 1)
=
bh.batch_no
It works fine, but I have come across a few lot numbers that have two sections of strings that are between parenthesis. How would I compare what is between the second set of parenthesis? Here is an example of the data in the lot number field:
E142059-307-SCRAP-(74055)
This one works with the code,
58LF-3-B-2-2-2 (SCRAP)-(61448)
This one tries comparing SCRAP with the batch no, which isn't correct. It needs to be the 61448.
The result is always the last item in parenthesis.
After more research, I actually got it to work with this code:
substr(ln.lot_number,instr(ln.lot_number,'(',-1) + 1, instr(ln.lot_number,')',-1) - instr(ln.lot_number,'(',-1) - 1)
Assuming SQL2005+, and it is always the last occurrence you want, then I would suggest finding the last instance of a ( in your query and substring to there. To get the last instance you could use something like:
REVERSE(SUBSTRING(REVERSE(lot_number),0,CHARINDEX('(',REVERSE(lot_number))))
If your version of Oracle supports regular expressions try this:
substr(regexp_substr(ln.lot_number,'[0-9]+\)$'),1,length(regexp_substr(ln.lot_number,'[0-9]+\)$'))-1)
Explanation:
regexp_substr(scrap_row,'[0-9]+\)$' ==> find me just numbers in the string that ends in ). This returns the numbers but it includes the closing parenthesis.
To remove the closing parenthsis, just send it through substring and extract first number through the length of the number stopping at 1 character from the end of the string.
Query for analysis:
with scrap
as (select '58LF-3-B-2-2-2 (SCRAP)-(61448)' as scrap_row from dual)
select scrap_row,
regexp_substr(scrap_row,'[0-9]+\)$') as regex_substring,
length(regexp_substr(scrap_row,'[0-9]+\)$')) as length_regex_substring,
substr(regexp_substr(scrap_row,'[0-9]+\)$'),1,length(regexp_substr(scrap_row,'[0-9]+\)$'))-1) as regex_sans_parenthesis
from scrap
If you have 11g, this will do it pretty simply by using the subgroup argument of regexp_substr() and constructing the regex appropriately:
SQL> with tbl(data) as
(
select 'E142059-307-SCRAP-(74055)' from dual
union
select '58LF-3-B-2-2-2 (SCRAP)-(61448)' from dual
)
select data from tbl
where regexp_substr(data, '\((\d+)\)$', 1, 1, NULL, 1)
= '61448';
DATA
------------------------------
58LF-3-B-2-2-2 (SCRAP)-(61448)
The regular expression can be read as:
\( - Search for a literal left paren
( - Start a remembered subgroup
\d+ - followed by 1 more more digits
) - End remembered subgroup
\) - followed by a literal right paren
$ - at the end of the line.
The regexp_substr function arguments are:
Source - the source string
Pattern - The regex pattern to look for
position - Position in the string to start looking for the pattern
occurrence - If the pattern occurs multiple times, which occurrence you want
match_params - See the docs, not used here
subexpression - which subexpression to use (the remembered group)
So in English, look for a series of 1 or more digits surrounded by parens, where it occurs at the end of the line and save the digit part only to use to compare. IMHO a lot easier to follow/maintain than nested instr(), substr().
For re-useability, make a function called get_last_number_in_parens() that contains this code and uses an argument of the string to search. This way that logic is encapsulated and can be re-used by folks that may not be so comfortable with regular expressions, but can benefit from the power! One place to maintain code too. Then call like this:
select data from tbl
where get_last_number_in_parens(data) = '61448';
How easy is that?!
Hello you can check with this code. It works whaever the condition may be
SELECT SUBSTR('58LF-3-B-2-2-2-(61448)',instr('58LF-3-B-2-2-2-(61448)','(',-1)+1,LENGTH('58LF-3-B-2-2-2-(61448)')-instr('58LF-3-B-2-2-2-(61448)','(',-1)-1)
FROM dual;
SELECT SUBSTR('58LF-3-B-2-2-2 (SCRAP)-(61448)',instr('58LF-3-B-2-2-2 (SCRAP)-(61448)','(',-1)+1,LENGTH('58LF-3-B-2-2-2 (SCRAP)-(61448)')-instr('58LF-3-B-2-2-2 (SCRAP)-(61448)','(',-1)-1)
FROM dual;
Output
==================================
61448
==================================

SQL query search within a string

I have a column of data in my database that has a project number the numbers are formatted like this, YYYYnnnn.ee (for example 20140124.00). Since there are three distinct parts of the number, the user could search by either the YYYY or the nnnn. I have written a query that searches by the YYYY. I need to write a query for the nnnn.
How would you write a query to search for a specific string in a specific location of another string?
You can use SUBSTRING for that.
The first paramter to SUBSTRING should be the columname, the next parameter is the index of the start character and the last parameter is the number of characters.
The example below will return the nnnn part from your column. In the example below we are searching for records where the nnnn equals 1234.
SELECT *
FROM tablename
WHERE
(SUBSTRING(columnName, 5, 4) = '1234')
Please note, if your column is a number (decimal). You will need to first cast it as a varchar (string). For example:
SELECT *
FROM tablename
WHERE
(SUBSTRING(CAST(columnName AS VARCHAR(20)), 5, 4) = '1234')