Query to return all records that have a reoccurring string/number - sql

I am looking for assistance in finding records that have a reoccurring string/number in an attribute due to input mismanagement. For example, the table will look similar to the following:
ID|stuff
1 | 23 jackson jackson st
2 | 89 jackson st
3 | 1 1 jackson st
4 | 66 jackson st
I'd like the return to look like the following:
ID|stuff
1 | 23 jackson jackson st
3 | 1 1 jackson st
Please note that in the above example, the 's' doesn't cause ID 2 to be returned, even though it appears in both jackSon and St.
Any help would be greatly appreciated, thank you.

You can use back-references in Oracle regular expressions. I think this does what you want:
select *
from t
where regexp_like(' ' || stuff, ' ([^ ]+) .*\1');
Here is a db<>fiddle.
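To see how the back-reference behaves, here is a minimal sketch that runs the same pattern over the sample rows from the question. It uses Python's re module as a stand-in for Oracle's regexp_like, which is an assumption: the two engines differ in details, although both support the \1 back-reference used here. The names rows and has_repeat are hypothetical and only exist for this illustration.

```python
import re

# Same pattern as in the regexp_like call above.
PATTERN = re.compile(r" ([^ ]+) .*\1")

# Sample rows from the question.
rows = [
    (1, "23 jackson jackson st"),
    (2, "89 jackson st"),
    (3, "1 1 jackson st"),
    (4, "66 jackson st"),
]


def has_repeat(stuff):
    # Prepend a space, as the SQL does with ' ' || stuff, so that the
    # first token is also preceded by a space.
    return PATTERN.search(" " + stuff) is not None


matching_ids = [row_id for row_id, stuff in rows if has_repeat(stuff)]
print(matching_ids)
```

Because the captured token has to sit between two spaces, a single letter such as the 's' shared by jackson and st cannot be captured on its own.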

Use this WHERE predicate:
where regexp_like(stuff, '(^|\W)(\w+)($|\W).*\2')
Note that the leading and trailing groups (^|\W) and ($|\W) mean that the start or end of the string, or a non-word character, delimits the second group, which is the first instance of the duplicated word.
The second group is defined as (\w+), one or more word characters.
Alternatively, you may want to use \s (white space) instead of \W - see here for further details.
Here is the sample data returned by this regexp, which also handles non-word delimiters.
You should also not underestimate tabs and other white space, which the simple solution ignores.
23 jackson jackson st
1 1 jackson st
68 jackson.st.jackson
See also this answer with a similar topic.
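As a rough check of the delimiter handling described in this answer, the sketch below compares the \W-delimited pattern with a space-only pattern on rows that use dots and tabs as separators. Python's re module is used in place of Oracle's regexp_like, which is an assumption about equivalent behavior for these simple patterns. The tab-separated row and the helper names are hypothetical.

```python
import re

# Pattern from this answer: the first copy of the word is delimited by the
# start/end of the string or by any non-word character.
WORD_PATTERN = re.compile(r"(^|\W)(\w+)($|\W).*\2")

# Space-only pattern, applied to ' ' + stuff, for comparison.
SPACE_PATTERN = re.compile(r" ([^ ]+) .*\1")

rows = [
    "23 jackson jackson st",
    "89 jackson st",
    "1 1 jackson st",
    "66 jackson st",
    "68 jackson.st.jackson",
    "7\tmain\tmain rd",  # hypothetical tab-separated row
]


def word_match(stuff):
    return WORD_PATTERN.search(stuff) is not None


def space_match(stuff):
    return SPACE_PATTERN.search(" " + stuff) is not None


word_hits = [s for s in rows if word_match(s)]
space_hits = [s for s in rows if space_match(s)]
print(word_hits)
print(space_hits)
```

In this sketch the dot-separated and tab-separated rows are only found by the \W-delimited pattern, because the space-only pattern sees each of them as one long token.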

Related

Big Query -- Reorder elements within a delimited string by another delimiter

Summary
I'd like to reorder elements in a string; the elements are delimited by new lines.
The elements I'd like to sort should be ordered by a string that can have numbers or letters within it. This sorting string is not at the beginning of the data; rather, it is part of another delimited string (messy data set, I know). To make this even messier, there is an extra new line at the end, though this doesn't seem like the crux of the issue.
Example
Below is a simplified version of what I'd like to do. I have a table, and I'd like to sort students' favorite shows and characters by the show's name, which is the second element of a pipe-delimited string.
student  favorite characters and shows
alice    10th doctor | dr who
         troy | community
bob      11 | stranger things
         Liz | 30 Rock
         mr peanut butter | bojack horseman
would become this:
student  favorite characters and shows
alice    troy | community
         10th doctor | dr who
bob      Liz | 30 Rock
         mr peanut butter | bojack horseman
         11 | stranger things
What I've tried
Big Query doesn't allow arrays of arrays. If it did, I would have an easier time here. I've tried working with COLLATE but today is my first time seeing that function; I'm not sure that is the right way to go, anyways.
Currently, I'm working to split by new line, and rejoin later. I have never done this with tables, so I'm a bit out of my element. Here is the query I'm working from:
WITH
  -- example data from above
  example_data AS (
    SELECT
      'alice' AS student,
      -- note: the new line is at the end of every pipe-delimited line, so there is always some floating empty row when using functions like split()
      '10th doctor | dr who\ntroy | community\n' AS favorite_characters_and_shows
    UNION ALL
    SELECT
      'bob' AS student,
      "11 | stranger things\nLiz | 30 Rock\nmr peanut butter | bojack horseman\n" AS favorite_characters_and_shows ),
  -- I have no need for this to be another table, but it is where I am. Tell me if this is misguided, please.
  soln_table AS (
    SELECT
      example_data.student,
      example_data.favorite_characters_and_shows,
      SPLIT(example_data.favorite_characters_and_shows, '\n'),
      array( select x from unnest(SPLIT(example_data.favorite_characters_and_shows, '\n') ) as x order by x) as foo,
    FROM
      example_data )
-- where I am trying to display a sorted solution
SELECT
  *
FROM
  soln_table;
Consider the approach below:
select student, (
    select string_agg(line, '\n' order by split(line, '|')[safe_offset(1)])
    from unnest(split(favorite_characters_and_shows, '\n')) line
    where trim(line) != ''
  ) as favorite_characters_and_shows
from example_data
Applied to the sample data in your question, this returns each student's lines sorted by show name, in the order you asked for.
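The same split, filter, sort and rejoin steps can be sketched outside BigQuery to check the ordering. The Python below is only an illustration of the logic, not BigQuery code; reorder_by_show and show_name are hypothetical names, and plain Python string comparison is assumed to order these ASCII values the same way BigQuery's default string ordering does.

```python
def reorder_by_show(cell):
    # Split on new lines and drop blank rows; the trailing "\n" in the data
    # always leaves one empty element behind.
    lines = [line for line in cell.split("\n") if line.strip() != ""]

    def show_name(line):
        parts = line.split("|")
        # Comparable to safe_offset(1): use an empty key instead of failing
        # when a line has no second pipe-delimited element.
        return parts[1] if len(parts) > 1 else ""

    # Sort by the show name and glue the lines back together.
    return "\n".join(sorted(lines, key=show_name))


alice = reorder_by_show("10th doctor | dr who\ntroy | community\n")
bob = reorder_by_show(
    "11 | stranger things\nLiz | 30 Rock\nmr peanut butter | bojack horseman\n"
)
print(alice)
print(bob)
```

Note that the sort key keeps the space after the pipe; that is harmless here because every show name carries the same leading space.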

Inverse of this regex expression

I have a list:
50 - David Herd (1961-1968)
49 - Teddy Sheringham (1997-2001)
48 - George Wall (1906-1915)
47 - Stan Pearson (1935-1954)
46 - Harry Gregg (1957-1966)
45 - Paddy Crerand (1963-1971)
44 - Jaap Stam (1998-2001)
43 - Paul Ince (1989-1995)
42 - Dwight Yorke (1998-2002)
I want to select all characters EXCEPT the first and last name with the space in between in order to delete them and leave just the first name, space and last name.
So far I can select the first name, space and last name with:
([a-zA-Z]+\s[a-zA-Z]+)
But I am unsure of how to 'invert' this expression. Any pointers would be much appreciated.
If regex replacement is an option for you, you could try the following in regex mode:
Find: \d+ - (\w+(?: \w+)+) \(\d{4}-\d{4}\)
Replace: $1
Demo
One option is to match the surrounded data, and capture the firstname space lastname.
In the replacement use the capture group.
^.*?\b([a-zA-Z]+\s[a-zA-Z]+)\b.*
Regex demo
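Both suggestions rely on the same idea: match the whole line, capture the name, and replace the match with the capture group. A minimal sketch in Python, assuming the list is processed one line at a time; Python writes the group reference as \1 where many editors use $1, and the variable names are hypothetical.

```python
import re

lines = [
    "50 - David Herd (1961-1968)",
    "49 - Teddy Sheringham (1997-2001)",
    "48 - George Wall (1906-1915)",
]

# First suggestion: describe the whole line and capture the name.
full_line = re.compile(r"\d+ - (\w+(?: \w+)+) \(\d{4}-\d{4}\)")

# Second suggestion: match whatever surrounds the first "word space word".
surround = re.compile(r"^.*?\b([a-zA-Z]+\s[a-zA-Z]+)\b.*")

names_a = [full_line.sub(r"\1", line) for line in lines]
names_b = [surround.sub(r"\1", line) for line in lines]
print(names_a)
print(names_b)
```

The first pattern is stricter about the line layout and allows names with more than two parts, while the second only needs two alphabetic words in a row.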

Oracle Regular Expression using instead of INSTR function

I keep data in table rows like this:
t_course
+------+------------------------------------------+
| sid | courses |
+------+------------------------------------------+
| 1 | cs101.math102.ns202-2.phy104 |
+------+------------------------------------------+
| 2 | cs101.math201.ens202-1.phy104-10.chm105 |
+------+------------------------------------------+
| 3 | cs101.ns202-2.math201.ens202-1.phy104 |
+------+------------------------------------------+
Now, I want to count the records whose courses mention both ns202 and ens202 at the same time. It should only bring back the record whose sid is 3, but it brings back other records as well (because of instr). I have tried many methods for this, but none of them work. For example:
select count(*) from
t_course
where
instr(courses, 'ns202') > 0
and instr(courses, 'ens202') > 0;
The above code doesn't work properly because it looks for ns202, but ens202 contains ns202 within itself.
I tried using regular expressions, and I converted all the courses to rows (split), but this both broke the working logic and slowed things down.
How can I do this with regular expressions instead of instr, using begins-with logic (for example ns202%)? (Beginning with ns202 either at the start or after a dot.)
You can use regexp_like with word boundaries to get rows which have both ns202 and ens202. Normally you would use \b for word boundaries. As Oracle doesn't support it, the alternative is to use (\s|\W) together with the start ^ and end $ anchors.
\s - space character, \W - non-word character. Add more characters as needed to act as word boundaries, based on your requirements.
select *
from t_course
where regexp_like(courses,'(^|\s|\W)ns202(\s|\W|$)')
and regexp_like(courses,'(^|\s|\W)ens202(\s|\W|$)')
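To check the boundary logic against the three sample rows, here is a small sketch that uses Python's re module in place of Oracle's regexp_like. Treating the two engines as equivalent for this pattern is an assumption, and the mentions helper is a hypothetical name.

```python
import re

# Sample rows from the question, keyed by sid.
courses = {
    1: "cs101.math102.ns202-2.phy104",
    2: "cs101.math201.ens202-1.phy104-10.chm105",
    3: "cs101.ns202-2.math201.ens202-1.phy104",
}


def mentions(course, text):
    # The course code must be preceded by the start of the string, white
    # space or a non-word character, and followed by one of those or the end.
    pattern = r"(^|\s|\W)" + re.escape(course) + r"(\s|\W|$)"
    return re.search(pattern, text) is not None


both = [
    sid
    for sid, text in courses.items()
    if mentions("ns202", text) and mentions("ens202", text)
]
print(both)
```

Row 2 drops out because its only ns202 is preceded by the letter e, which is a word character.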
You will have the same problem with ens202, by the way - what if there is also cens202 or tens202?
You can solve your problem with regular expressions. You can also solve it with the LIKE operator:
select <whatever>
from <table or tables>
where (courses like 'ns202%' or courses like '%.ns202%')
and (courses like 'ens202%' or courses like '%.ens202%')
You can test both approaches to see which works best for your data.
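For a quick test of the LIKE version, the sketch below loads the sample rows into an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for Oracle here (an assumption; LIKE case sensitivity and other details differ between the two), and the placeholders in the query are filled in with the table and column from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t_course (sid integer, courses text)")
conn.executemany(
    "insert into t_course values (?, ?)",
    [
        (1, "cs101.math102.ns202-2.phy104"),
        (2, "cs101.math201.ens202-1.phy104-10.chm105"),
        (3, "cs101.ns202-2.math201.ens202-1.phy104"),
    ],
)

# A course matches either at the start of the string or right after a dot.
query = """
    select sid
    from t_course
    where (courses like 'ns202%' or courses like '%.ns202%')
      and (courses like 'ens202%' or courses like '%.ens202%')
    order by sid
"""
sids = [row[0] for row in conn.execute(query)]
conn.close()
print(sids)
```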

Postgres matching against an array of regular expressions

My client wants the possibility to match a set of data against an array of regular expressions, meaning:
table:
name | officeId (foreignkey)
--------
bob | 1
alice | 1
alicia | 2
walter | 2
and he wants to do something along those lines:
get me all records of offices (officeId) where there is a member with
ANY name ~ ANY[.*ob, ali.*]
meaning
ANY of[alicia, walter] ~ ANY of [.*ob, ali.*] results in true
I could not figure it out by myself sadly :/.
Edit
The real problem was missing from the original description:
I cannot use select distinct officeId .. where name ~ ANY[.*ob, ali.*], because:
This application stores data in Postgres xml columns, which means I do in fact have (after evaluating (xpath('/data/clients/name/text()'))::text[]):
table:
name | officeId (foreignkey)
-----------------------------------------
[bob, alice] | 1
[anthony, walter] | 2
[alicia, walter] | 3
There is the problem. And "you don't do that, that is horrible, why would you do it like this, store it the way it is meant to be stored in a relational database, use a NoSQL database for document-based storage, use json" are not options.
I am stuck with this datamodel.
This looks pretty horrific, but the only way I can think of doing such a thing would be a hybrid of a cross-join and a semi join. On small data sets this would probably work pretty well. On large datasets, I imagine the cross-join component could hit you pretty hard.
Check it out and let me know if it works against your real data:
with patterns as (
select unnest(array['.*ob', 'ali.*']) as pattern
)
select
o.name, o.officeid
from
office o
where exists (
select null
from patterns p
where o.name ~ p.pattern
)
The semi-join helps protect you from cases where a name like "alicia nob", which meets multiple search patterns, would otherwise come back once for every match.
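The difference between the plain cross join and the semi-join can be sketched in a few lines. This is Python rather than Postgres, with re.search standing in for the unanchored ~ operator (an assumption about equivalent matching for these patterns), and the office rows, including "alicia nob", are hypothetical sample data.

```python
import re

patterns = [".*ob", "ali.*"]

# Hypothetical rows: (name, officeid).
office = [("bob", 1), ("alice", 1), ("alicia nob", 2), ("walter", 2)]

# Cross join: one output row per (name, pattern) pair that matches.
cross_join = [
    (name, officeid)
    for name, officeid in office
    for pattern in patterns
    if re.search(pattern, name)
]

# Semi-join (the EXISTS form): each row appears at most once.
semi_join = [
    (name, officeid)
    for name, officeid in office
    if any(re.search(pattern, name) for pattern in patterns)
]

print(cross_join)
print(semi_join)
```

Here "alicia nob" shows up twice in the cross join and once in the semi-join.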
You could cast the array to text.
SELECT * FROM workers WHERE (xpath('/data/clients/name/text()', xml_field))::text ~ ANY(ARRAY['wal','ant']);
When casting a string array into text, strings containing special characters or consisting of keywords are enclosed in double quotes kind of like {jimmy,"walter, james"} being two entries. Also when matching with ~ it is matched against any part of the string, not the same as LIKE where it's matched against the whole string.
Here is what I did in my test database:
test=# select id, (xpath('/data/clients/name/text()', name))::text[] as xss, officeid from workers WHERE (xpath('/data/clients/name/text()', name))::text ~ ANY(ARRAY['wal','ant']);
id | xss | officeid
----+-------------------------+----------
2 | {anthony,walter} | 2
3 | {alicia,walter} | 3
4 | {"walter, james"} | 5
5 | {jimmy,"walter, james"} | 4
(4 rows)

sql combine two columns that might have null values

This should be an easy thing to do but I seem to keep getting an extra space. Basically what I am trying to do is combine multiple columns into one column. BUT every single one of these columns might be null as well. When I combine them, I also want them to be separated by a space (' ').
What I created is the following query:
select 'All'= ISNULL(Name+' ','')+ISNULL(City+' ','')+ISNULL(CAST(Age as varchar(50))+' ','') from zPerson
and the result is:
All
John Rock Hill 23
Munchen 29
Julie London 35
Fort Mill 27
Bob 29
As you can see: there is an extra space when the name is null. I don't want that.
The initial table is :
id  Name   City       Age  InStates  AllCombined
1   John   Rock Hill  23   1         NULL
2          Munchen    29   0         NULL
3   Julie  London     35   0         NULL
4          Fort Mill  27   1         NULL
5   Bob               29   1         NULL
Any ideas?
select 'All'= LTRIM(ISNULL(Name+' ','')+ISNULL(City+' ','')+ISNULL(CAST(Age as varchar(50))+' ','')) from zPerson
See LTRIM().
In the data you have posted, the Name column contains no NULLs. Instead, it contains empty strings, so ISNULL(Name+' ','') will evaluate to a single space.
The simplest resolution is to change the data so that empty-strings are null. This is appropriate in your case since this is clearly your intention.
UPDATE zPerson SET Name=NULL WHERE Name=''
Repeat this for your City and Age fields if necessary.
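The difference between an empty string and NULL in that expression can be illustrated with a small model of the T-SQL behavior. This is a Python sketch, not SQL Server code: None plays the role of NULL, and the helpers plus, isnull and combine are hypothetical names that imitate the + operator (NULL if either side is NULL) and ISNULL.

```python
def plus(a, b):
    # Models T-SQL string concatenation: NULL + anything is NULL.
    return None if a is None or b is None else a + b


def isnull(value, fallback):
    # Models ISNULL(value, fallback).
    return fallback if value is None else value


def combine(name, city, age):
    # Models ISNULL(Name+' ','')+ISNULL(City+' ','')+ISNULL(CAST(Age ...)+' ','').
    age_text = None if age is None else str(age)
    return (
        isnull(plus(name, " "), "")
        + isnull(plus(city, " "), "")
        + isnull(plus(age_text, " "), "")
    )


with_empty_name = combine("", "Munchen", 29)   # Name stored as ''
with_null_name = combine(None, "Munchen", 29)  # Name stored as NULL
print(repr(with_empty_name))
print(repr(with_null_name))
```

With an empty string the separator space survives at the front, while with NULL the whole Name part collapses to nothing, which is why the UPDATE removes the leading space.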
Use TRIM() around the ISNULL() function, or LTRIM() around the entire selected term.