How do I extract a substring from a random position in a string using built-in functions? - sql

I have a series of data stored in the following fashion:
Word of various kinds (ANT\username1) and even more words
This is another row, the words are random (ANT\username2)
Thankfully the username only ever shows once (ANT\username1)
Above represents three separate rows.
The general flow of this data is:
Parenthesis can appear anywhere in the text
The username portion of each string (ANT\usernamex) will only ever appear once
The text preceding and following the username portion is always of different lengths.
The username text may not always be present
As you probably already guessed what I need to do is take the username from each row and where it isn't present return null. Unfortunately I have no idea how to approach this - I've played around with left() and right() functions but don't really know how else to tackle this. Would appreciate if any answers that use a number of functions to accomplish the task have a quick blurb explaining the flow of logic (so I can then read the documentation for the functions to learn).

Note the specific results when the data is not as expected. This works for exactly the format '(ANT\....)'.
-- sample table
create table t(s varchar(max));
insert t select
'Word of various kinds (ANT\) blank' union all select
'Word of various kinds (ANT) blank' union all select
'Word of various kinds (ANT\ no closing' union all select
'Word of various kinds (ANT\(ANT\me) double up' union all select
'' union all select
'(ANT\' union all select
null union all select
'Word of various kinds (ANT\username1) and even more words' union all select
'This is another row, the words are random (ANT\username2)' union all select
'Thankfully the username only ever shows once (ANT\username1)';
-- Query
select Original = s,
Extracted = nullif(STUFF(LEFT(s, CharIndex(')',s+')',
PatIndex('%(ANT\%', s)) -1), 1,
PatIndex('%(ANT\%', s + '(ANT\')+4,''),'')
from t;
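Since the question asks for the flow of logic, here is a rough Python sketch of the same steps: locate the marker, locate the closing parenthesis after it, keep what lies between, and turn an empty result into a null. The function name extract_username and the constant MARKER are made up for illustration; this mirrors the idea of the query, not SQL Server's exact function semantics.

```python
from typing import Optional

MARKER = "(ANT\\"  # hypothetical constant: the literal text that opens the username portion


def extract_username(s: Optional[str]) -> Optional[str]:
    """Return the text between '(ANT\\' and the next ')', or None if there is none."""
    if s is None:
        return None                      # null in, null out
    start = s.find(MARKER)               # plays the role of PatIndex('%(ANT\%', s)
    if start == -1:
        return None                      # marker not present at all
    begin = start + len(MARKER)          # first character after the marker
    end = s.find(")", begin)             # plays the role of CharIndex(')', s + ')', start)
    if end == -1:
        end = len(s)                     # no closing parenthesis: take the rest of the string
    name = s[begin:end]
    return name or None                  # plays the role of nullif(..., '')


print(extract_username("Word of various kinds (ANT\\username1) and even more words"))
print(extract_username("Word of various kinds (ANT) blank"))
```

As in the query, unexpected input gives specific results: a missing closing parenthesis returns everything after the marker, and a doubled marker returns the second marker as part of the name.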

Related

SQL Like condition fails to run

I've been tasked to develop a query that behaves essentially like the following one:
SELECT * FROM tblTestData WHERE *.TestConditions LIKE '*textToSearch*'
The textToSearch is a string which contains information about the condition in which a given device is tested (Voltage, Current, Frequency, etc) in the following format as an example:
[V:127][PF:1][F:50][I:65]
The objective is to recover a list of any and all tests performed at a voltage of 127 Volts, so the SQL developed would look like the following:
SELECT * FROM tblTestData WHERE *.TestConditions LIKE '*V:127*'
This works as intended, but there is a problem: because of improper data entry, there are cases in which the textToSearch string looks like the following examples:
[V.127][PF:1][F:50][I:65]
[V.230][PF:1][F:50][I:65]
As you can see, my previous SQL transaction does not work as it does not meet the conditions.
If I try to do the following transaction with the objective of ignoring improper data format:
SELECT * FROM tblTestData WHERE *.TestConditions LIKE '*V*127*'
The transaction is not successful and returns an error.
What am I doing wrong for this transaction not to work? Am I approaching this problem wrong?
I also see a potential problem with this transaction. If there were a group of test conditions like the following:
[V.127][PF:1][F:50][I:127]
[V.230][PF:1][F:50][I:127]
Would it return the values of both points given that both meet the condition of the transaction stated above?
In conclusion, my questions are:
What is wrong with the LIKE '*V*127*' condition for it not to work?
What implications has working with this condition? Can it return more information than desired if I am not careful?
I hope it is clear what I am asking for, if it isn't, please point out what is not clear and I will try to clarify it
One choice is to look for any character between the "V" and the "127":
WHERE TestConditions LIKE '%V_127%'
Note that % is the wildcard for a string of any length and _ is the wildcard for a single character.
You can also use regular expressions:
WHERE regexp_like(TestConditions, 'V[.:]127')
Note that regular expressions match anywhere in the string, so wildcards at the beginning and end are not needed.
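To see the difference between the two wildcards, and why a loose pattern can return more rows than intended, here is a small sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here (an assumption, since the asker's database is not stated), but its LIKE uses the same % and _ wildcards. The helper name matching is made up for illustration.

```python
import sqlite3

rows = [
    "[V:127][PF:1][F:50][I:65]",
    "[V.127][PF:1][F:50][I:65]",
    "[V.230][PF:1][F:50][I:65]",
    "[V.230][PF:1][F:50][I:127]",   # 230 V, but the current happens to be 127
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblTestData (TestConditions TEXT)")
conn.executemany("INSERT INTO tblTestData VALUES (?)", [(r,) for r in rows])


def matching(pattern):
    """Return the TestConditions values that match a LIKE pattern, in insertion order."""
    cur = conn.execute(
        "SELECT TestConditions FROM tblTestData "
        "WHERE TestConditions LIKE ? ORDER BY rowid",
        (pattern,),
    )
    return [r[0] for r in cur]


loose = matching("%V%127%")    # any run of characters between V and 127
strict = matching("%V_127%")   # exactly one character between V and 127

print(loose)
print(strict)
```

The loose pattern also picks up the 230 V row whose current is 127, which is the over-matching the question worries about; the single-character wildcard does not.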
You could check for both cases (although this will decrease performance)
SELECT *
FROM tblTestData
WHERE (TestConditions LIKE '%V:127%' OR TestConditions LIKE '%V.127%')
It is better to clean the data in your database if only old records have this problem.
Using regular expressions is recommended by Oracle for this kind of condition. You could build a regular expression for your case:
WITH your_table AS (
SELECT '[V.127][PF:1][F:50][I:65]' text_to_search FROM dual
UNION
SELECT '[V.230][PF:1][F:50][I:65]' text_to_search FROM dual
UNION
SELECT '[V:127][PF:1][F:50][I:65]' text_to_search FROM dual
)
SELECT *
FROM your_table
WHERE REGEXP_LIKE(text_to_search,'\[V[.:]127\]','i')
Or you could use the good old LIKE operator. In this case, you need to know that:
% matches zero or more characters
_ matches only one character
So you should use an underscore to match the : or the .
WITH your_table AS (
SELECT '[V.127][PF:1][F:50][I:65]' text_to_search FROM dual
UNION
SELECT '[V.230][PF:1][F:50][I:65]' text_to_search FROM dual
UNION
SELECT '[V:127][PF:1][F:50][I:65]' text_to_search FROM dual
)
SELECT *
FROM your_table
WHERE text_to_search LIKE '%V_127%';

Oracle SQL SELECT unpopulated records with the "NOT NULL" constraint

In order to retrieve the data I need, I must SELECT records where there is no data. Some of the rows are unpopulated(?).
Many of these "unpopulated" rows are set with the Not Null constraint. This makes things difficult! I cannot simply search for NULL rows because they are NOT NULL.
I have been able to select or exclude unpopulated rows with a few methods. These methods seem to randomly work or not work.
Example: select or exclude records where st.sart_code or st.sart_hold or st.sart_status or st.sart_date is unpopulated.
SELECT
sp.sprite_id, sp.sprite_last, sp.sprite_first,
st.sart_code
/* 4 data retrieval methods are listed below.
For st.sart_code, I have substituted:
st.sart_hold, st.sart_status, and st.sart_date in the methods 2-4*/
FROM
sprite sp
JOIN sart st
on sp.sprite_pidm = st.sart_pidm
METHOD 1 - select records with rows that do not have the value EVEA -- st.sart_code could contain multiple values for one sp.sprite_id. This is a checklist of items. I am looking for records that do not have EVEA in the checklist
Varchar2 type with a Not Null constraint
WHERE
Sp.sprite_change_ind is null
and
st.sart_pidm NOT IN
(SELECT st.sart_pidm
FROM sart st
WHERE st.sart_code = 'EVEA')
METHOD 2 - select records with rows that do not have the value A2 -- st.sart_hold could contain multiple values for one sp.sprite_id. st.sart_hold may be blank/unpopulated (record has no holds) or contain several different holds. The values are account hold types. I am looking for records that do not have that particular "A2" hold.
Varchar2 type with a Not Null constraint
EDIT I just realized that this works ONLY if there is at least one hold already. If the person has no holds, this script will not select the records (even though the person also has no A2 hold).
WHERE
Sp.sprite_change_ind is null
group by sp.sprite_id, sp.sprite_last, sp.sprite_first, st.sart_hold
having sum(case when st.sart_hold = 'A2' then 1 else 0 end) = 0;
METHOD 3 - select records with rows that have no value for st.sart_status -- st.sart_status could contain only 1 of 3 possible values or NO value for one sp.sprite_id. The values are file statuses. I am looking for records that have no status
Varchar2 type with a Not Null constraint
WHERE
Sp.sprite_change_ind is null
and
trim(st.sart_status) is null
METHOD 4 - select records with rows that are NOT missing ANY values in st.sart_date (all date fields in list are populated) -- st.sart_date could either contain a date or be blank/unpopulated for one sp.sprite_id. The value is a received date for a checklist item. I am excluding ANY record that has no date for any of the checklist items (there may be many items with corresponding dates).
Date type with a Not Null constraint
This is a little different, so I am including the first part again.
with MYVIEW AS
(
SELECT
sp.sprite_id AS Per_ID,
sp.sprite_last,
sp.sprite_first,
st.sart_date as RECEIVED_DATE
FROM
sprite sp
JOIN sart st
on sp.sprite_pidm = st.sart_pidm
WHERE
Sp.sprite_change_ind is null
)
Select
Per_ID as "ID",
max(sprite_last) as "Last",
max(sprite_first) as "First"
FROM MYVIEW
GROUP BY Per_ID
HAVING SUM(NVL2(RECEIVED_DATE,0,1)) = 0
My questions: I have had a difficult time finding methods of working with Not Null constraint fields.
EDIT: How do I see what is in the "not null" constrained field when it is not populated?
Why do the methods above not always work when looking for unpopulated fields? Do certain methods only work with certain data types (varchar2, number, date)? Or does it have to do with the type of JOIN I use? Something else?
Are there other methods out there someone could please direct me to? Any guidance would be greatly appreciated!
What is the correct terminology for "selecting records where there are unpopulated fields of [ColumnName DataType() NOT NULL]?" If I knew the terminology for what I am trying to ask, I could search for it.
NOTE My scripts are usually MUCH more involved than the examples above. I usually have at least 3 joins and many WHERE clauses.
Please let me know if this question is too involved! I am new here. :-)
Probably more a long comment than an answer, but since there isn't much activity here...
Oracle SQL SELECT blank records with the “NOT NULL” constraint Auntie Anita
- How do you have blanks if they are not null - is that partly what you're asking? Alex Poole
- That is one of the problems. Auntie Anita
Few things to know:
In Oracle the empty string '' is the same thing as NULL for VARCHAR/CHAR. That's a departure from "standard" SQL that makes a distinction between empty strings and NULL strings.
TRIM will return NULL for NULL/empty strings/space only strings.
but strings composed of spaces/invisible characters are not null. Even if they only contain the character CHR(0) (aka NUL -- with only one L )
and TRIM does not remove invisible characters. Only spaces.
To convince yourself, try these:
select NVL2(CAST('' AS VARCHAR2(20)), 'NOT NULL','NULL') FROM DUAL
select NVL2(CAST('' AS CHAR(20)), 'NOT NULL','NULL') FROM DUAL
select NVL2(TRIM(' '), 'NOT NULL','NULL') FROM DUAL
select NVL2(' ', 'NOT NULL','NULL') FROM DUAL
select NVL2(CHR(10), 'NOT NULL','NULL') FROM DUAL
select NVL2(CHR(0), 'NOT NULL','NULL') FROM DUAL
select NVL2(TRIM(' '||CHR(10)), 'NOT NULL','NULL') FROM DUAL
select NVL2(TRIM(' '||CHR(0)), 'NOT NULL','NULL') FROM DUAL
So, my guess is your "not null empty fields" in fact contain either some invisible characters -- or maybe even a single CHR(0). This is quite possible as in some languages, the NUL character is used as string terminator -- and might have sneaked into your DB at data import time for empty/missing values. Intentionally or not.
To check for that, you might want to try RAWTOHEX to examine your suspect data fields. In the following example, notice how the middle NUL character is lurking unnoticed when displayed as a string. But not in the raw hex dump:
SQL> select 'abc' || chr(0) || 'def' AS str,
RAWTOHEX('abc' || CHR(0) || 'def') AS hex FROM DUAL
STR     HEX
abcdef  61626300646566
^^^^^^        ^^
Is there something    Yes !
special here?
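The same effect can be reproduced outside the database. The following is a minimal Python sketch, not Oracle code: the hex dump exposes the NUL byte, and stripping only spaces (used here as a rough analogue of TRIM) leaves an invisible character behind. The variable names are illustrative.

```python
# A string with a NUL character hidden in the middle.
value = "abc" + chr(0) + "def"

# Hex dump of the encoded bytes: the NUL shows up as "00".
hex_dump = value.encode("utf-8").hex()
print(hex_dump)

# A field holding a space followed by a NUL looks empty when displayed.
only_nul = " " + chr(0)

# Stripping spaces only, roughly what TRIM does, leaves the NUL in place.
trimmed = only_nul.strip(" ")
print(trimmed == "")            # the field is not empty after trimming
print(trimmed.encode().hex())   # the leftover byte
```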
Please let me know if this question is too involved! I am new here. :-)
:D "StackOverflow" is usually much more efficient if you are able to narrow down your issue, ideally by providing a reproducible case (formerly known as SSCCE or MCVE).
Take time to examine your data closely and, if needed, don't hesitate to post another, more focused question.

SQL statement with local (inline) array

In many languages, one can use inline lists of values, with some form of code similar to this:
for x in [1,7,8,12,14,56,123]:
print(x) # Or whatever else you fancy doing
Working with SQL for the last year or so, I've found out that even though using such an array in WHERE is not a problem...
select *
from foo
where someColumn in (1,7,8,12,14,56,123) and someThingElse...
...I have not found an equivalent form to GET data from an inline array:
-- This is not working
select *
from (1,7,8,12,14,56,123)
where somethingElse ...
Searching for solutions, I have only found people suggesting a union soup:
select *
from (SELECT 1 UNION SELECT 7 UNION SELECT 8 UNION ...)
where somethingElse ...
...which is, arguably, ugly and verbose.
I can quickly generate the UNION soup from the list with a couple of keystrokes in my editor (VIM) and then paste it back to my DB prompt - but I am wondering whether I am missing some other method to accomplish this.
Also, if there's no standard way to do it, I would still be interested in DB-engine-specific solutions (Oracle, PostgreSQL, etc)
Thanks in advance for any pointers.
Row/Table value constructors can sometimes be used as a shortish hand, for example in MSSQL:
select * from (values (1),(7),(8),(12)) as T (f)
The syntax is more complex by necessity than for a simple array-like list passed to in () because it must be able to describe a multi-dimensional set of data:
select * from (values (1, 'a'),(7, 'b'),(8, 'c'),(12, 'd')) as T (f, n)
Of course, when you find the requirement to list literal values it's often a good idea to stick them in a table and query for them.
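For a quick experiment with an inline list, here is a sketch that uses Python's built-in sqlite3 module. SQLite is an assumption (it is not one of the engines named in the question), and it takes the list as a VALUES clause inside a common table expression rather than the `as T (f)` alias form shown above. The names t and f are arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The inline list becomes a one-column table named t with column f.
query = """
WITH t(f) AS (VALUES (1), (7), (8), (12), (14), (56), (123))
SELECT f FROM t WHERE f > 10 ORDER BY f
"""

result = [row[0] for row in conn.execute(query)]
print(result)
```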

SQL pattern matching

I have a question related to SQL.
I want to match two fields for similarities and return a percentage on how similar it is.
For example if I have a field called doc, which contains the following
This is my first assignment in SQL
and in another field I have something like
My first assignment in SQL
I want to know how I can check the similarities between the two and return by how much percent.
I did some research and wanted a second opinion plus I never asked for source code. I've looked at Soundex(), Difference(), Fuzzy string matching using Levenshtein distance algorithm.
You didn't say what version of Oracle you are using. This example is based on 11g version.
You can use edit_distance function of utl_match package to determine how many characters you need to change in order to turn one string to another. greatest function returns the greatest value in the list of passed in parameters. Here is an example:
-- sample of data
with t1(col1, col2) as(
select 'This is my first assignment in SQL', 'My first assignment in SQL ' from dual
)
-- the query
select trunc(((greatest(length(col1), length(col2)) -
(utl_match.edit_distance(col2, col1))) * 100) /
greatest(length(col1), length(col2)), 2) as "%"
from t1
result:
%
----------
70.58
Addendum
As @jonearles correctly pointed out, it is much simpler to use edit_distance_similarity function of utl_match package.
with t1(col1, col2) as(
select 'This is my first assignment in SQL', 'My first assignment in SQL ' from dual
)
select utl_match.edit_distance_similarity(col1, col2) as "%"
from t1
;
Result:
%
----------
71
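To make the arithmetic behind the first query concrete, here is a minimal Python sketch of the same formula with a textbook Levenshtein distance. It is an illustration of the calculation, not the implementation inside utl_match; the function name levenshtein and the variable names are made up.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))          # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,            # delete ca
                            curr[j - 1] + 1,        # insert cb
                            prev[j - 1] + cost))    # substitute or keep
        prev = curr
    return prev[-1]


col1 = "This is my first assignment in SQL"
col2 = "My first assignment in SQL "        # trailing space, as in the sample data

distance = levenshtein(col2, col1)
longest = max(len(col1), len(col2))

# Same formula as the query: (longest - distance) * 100 / longest, truncated to 2 decimals.
similarity = math.floor((longest - distance) * 100 / longest * 100) / 100
print(distance, similarity)
```

Rounding the same ratio to a whole number gives 71, which is the figure reported for edit_distance_similarity above.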

Sorting '£' (pound symbol) in sql

I am trying to sort £ along with other special characters, but it's not sorting properly.
I want that string to be sorted along with other strings starting with special characters. For example I have four strings:
&!##
££$$
abcd
&#$%.
Now it's sorting in the order: &!##, &#$%, abcd, ££$$.
I want it in the order: &!##, &#$%, ££$$, abcd.
I have used the function order by replace(column,'£','*') so that it sorts along with strings starting with *. Although this seems to work while querying the DB, when used in code and deployed the £ gets replaced by �, i.e. (replace(column,'�','*') in the query, and doesn't sort as expected.
How to resolve this issue? Is there any other solution to sort the pound symbol/£? Any help would be greatly appreciated.
You seem to have two problems: performing the actual sort, and (possibly) how the £ symbol appears in the results in your code. Without knowing anything about your code or client or environment it's rather hard to guess what you might need to change, but I'd start by looking at your NLS_LANG and other NLS settings at the client end. @amccausl's link might be useful, but it depends what you're doing. I suspect you'll find different values in nls_session_parameters when queried from SQL*Plus and from your code, which may give you some pointers.
The sorting itself is slightly clearer now. Have a look at the docs for Linguistic Sorting and String Searching and NLSSORT.
You can do something like this (with a CTE to generate your data):
with tmp_tab as (
select '&!##' as value from dual
union all select '££$$' from dual
union all select 'abcd' from dual
union all select '&#$%' from dual
)
select * from tmp_tab
order by nlssort(value, 'NLS_SORT = WEST_EUROPEAN')
VALUE
------
&!##
&#$%
££$$
abcd
4 rows selected.
You can get sort values supported by your configuration with select value from v$nls_valid_values where parameter = 'SORT', but WEST_EUROPEAN seems to do what you want, for this sample data anyway.
You can see the default sorting in your current session with select value from nls_session_parameters where parameter = 'NLS_SORT'. (You can change that with an ALTER SESSION, but it's only letting me do that with some values, so that may not be helpful here).
You need to make sure your application code is all proper UTF-8 (see http://htmlpurifier.org/docs/enduser-utf8.html for more details)
Seems like your issue is with db characterset, or difference in charactersets between the app and db. For Oracle side, you can check by doing:
select value from sys.nls_database_parameters where parameter='NLS_CHARACTERSET';
If this comes up ascii (like US7ASCII), then you may have issues storing the data properly. Even if this is the charset, you should be able to insert and retrieve sorted (binary sort) by using nvarchar2 and unistr (assuming they conform to your NLS_NCHAR_CHARACTERSET, see above query but change parameter), like:
create table test1(val nvarchar2(100));
insert into test1(val) values (unistr('\00a3')); -- pound currency
insert into test1(val) values (unistr('\00a5')); -- yen currency
insert into test1(val) values ('$'); -- dollar currency
commit;
select * from test1
order by val asc;
-- will give symbols in order: dollar('\0024'), pound ('\00a3'), yen ('\00a5')
I will say that I would not resort to using the national characterset; I would probably change the db characterset to fit the needs of my data, as supporting two different character sets isn't ideal, but it's available anyway.
If you have no issues storing/retrieving on the data side, then your app/client characterset is probably different than your db.
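Both symptoms can be reproduced outside the database. The Python sketch below is only an illustration, not Oracle behaviour: a plain code-point sort puts £ (code point 163) after the letters, which is the order the question reports, and a byte written in a single-byte character set but read as UTF-8 turns into the replacement character. The choice of Latin-1 as the mismatched character set is an assumption.

```python
values = ["&!##", "££$$", "abcd", "&#$%"]

# Binary (code point) order: '&' is 38, 'a' is 97, '£' is 163.
binary_order = sorted(values)
print(binary_order)

# The pound sign written as a single Latin-1 byte...
pound_bytes = "£".encode("latin-1")

# ...and then read back as if it were UTF-8 becomes U+FFFD, shown as the replacement character.
misread = pound_bytes.decode("utf-8", errors="replace")
print(misread == "\ufffd")
```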
Use nchar(163). It will work.
select nchar(163)