SQLite3: Find match of a certain part of column value by regular expression and replace it with another value.

I'm an absolute beginner at SQL. I do know how to use SQL commands that belong to Data Manipulation Language and Data Definition Language.
The name of the table is ways_tags:
id,key,value,type
164931009,street,6th Main Road Ram Nagar (N) Extn,addr
The header line holds the column names; I've shown just one row.
Here's what I want to accomplish:
1. Select observations that have type as addr and key as street.
2. In such observations, scan the characters of value column and check if the last 4 characters match Extn and replace it with Extension.
Also, is there a way to keep the code short when it checks for TWO patterns instead of ONE?
In other words, you can replace step 2 with something like:
2. In such observations, scan the characters of the value column:
i. If the last 4 characters of value are Extn, replace them with Extension.
ii. If the last 3 characters of value are St., replace them with Street.
I'm really sorry, but I just know the pseudocode and don't know how to incorporate regular expressions with normal SQL, and so I can't post anything to show that I've tried.
So, after performing the update,
164931009,street,6th Main Road Ram Nagar (N) Extension,addr
should be the updated observation instead of
164931009,street,6th Main Road Ram Nagar (N) Extn,addr

Select observations that have type as addr and key as street.
… scan the characters of value column and check if the last 4 characters match Extn
WHERE type = 'addr'
AND key = 'street'
AND value LIKE '%Extn'
… and replace it with Extension
If "Extn" does not occur anywhere else in the value, this could be done with replace(); in the general case, however, you have to extract all but the last four characters and append the new value:
substr(value, 1, length(value) - 4) || 'Extension'
Then plug this into an UPDATE statement:
UPDATE ways_tags
SET value = substr(value, 1, length(value) - 4) || 'Extension'
WHERE type = 'addr'
AND key = 'street'
AND value LIKE '%Extn';
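A minimal sketch for trying the UPDATE above without touching a real database, assuming Python's built-in sqlite3 module and an in-memory copy of ways_tags holding the sample row from the question plus one hypothetical row that must not change:

```python
import sqlite3

# In-memory database with the ways_tags layout from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ways_tags (id INTEGER, key TEXT, value TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO ways_tags VALUES (?, ?, ?, ?)",
    [
        (164931009, "street", "6th Main Road Ram Nagar (N) Extn", "addr"),
        # Hypothetical row: "Extn" is not at the end, so it must stay as-is.
        (1, "street", "Extn Road", "addr"),
    ],
)

# The UPDATE from the answer: drop the last 4 characters and append 'Extension'.
conn.execute(
    """
    UPDATE ways_tags
    SET value = substr(value, 1, length(value) - 4) || 'Extension'
    WHERE type = 'addr'
      AND key = 'street'
      AND value LIKE '%Extn'
    """
)

rows = dict(conn.execute("SELECT id, value FROM ways_tags"))
print(rows[164931009])  # 6th Main Road Ram Nagar (N) Extension
```

Note that in SQLite, LIKE is case-insensitive for ASCII letters by default, so values ending in "EXTN" or "extn" would be rewritten as well.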
Doing two replacements in a single statement would be possible with a CASE expression, but would not reduce code size.
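For the two-pattern variant, here is a sketch of that CASE expression, again run through Python's sqlite3 on an in-memory table; the rows with ids 2 and 3 are hypothetical. "St." is three characters long, so that branch strips 3 characters instead of 4:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ways_tags (id INTEGER, key TEXT, value TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO ways_tags VALUES (?, ?, ?, ?)",
    [
        (164931009, "street", "6th Main Road Ram Nagar (N) Extn", "addr"),
        (2, "street", "Baker St.", "addr"),    # hypothetical row
        (3, "street", "Park Avenue", "addr"),  # hypothetical row, matches neither pattern
    ],
)

# One UPDATE, one CASE: each WHEN strips its suffix and appends the long form.
# The WHERE clause repeats both patterns; without it, rows matching neither WHEN
# would be set to NULL, because this CASE has no ELSE branch.
conn.execute(
    """
    UPDATE ways_tags
    SET value = CASE
        WHEN value LIKE '%Extn' THEN substr(value, 1, length(value) - 4) || 'Extension'
        WHEN value LIKE '%St.'  THEN substr(value, 1, length(value) - 3) || 'Street'
    END
    WHERE type = 'addr'
      AND key = 'street'
      AND (value LIKE '%Extn' OR value LIKE '%St.')
    """
)

rows = dict(conn.execute("SELECT id, value FROM ways_tags"))
print(rows[2])  # Baker Street
```

As the answer says, this is one statement but not less code; the gain is a single pass over the table.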

Related

Determine if a column has two equal vowels

How to determine if a column has two equal vowels in SQL Server?
For example 'maria' has two 'a' characters.
select
*
from
hr.locations
where
state_province is null
and
city like '...' <-- ?
You want to look for strings with a vowel appearing multiple times. You already have city like '...'.
Now, you may have in mind something like city like '%[aeiou]%<the same vowel>%', and wonder how to make <the same vowel> work. That is simply not possible; LIKE has no way to refer back to a character it matched earlier. Instead, write the expression for a single vowel, city like '%a%a%', and then combine the different vowels with OR:
select *
from hr.locations
where state_province is null
and
(
city like '%a%a%' or
city like '%e%e%' or
city like '%i%i%' or
city like '%o%o%' or
city like '%u%u%'
);
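A quick way to check the OR-of-LIKEs idea, assuming SQLite via Python's sqlite3 instead of SQL Server, and a hypothetical locations table standing in for hr.locations with a few made-up cities:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for hr.locations, with only the columns the query uses.
conn.execute("CREATE TABLE locations (city TEXT, state_province TEXT)")
conn.executemany(
    "INSERT INTO locations VALUES (?, ?)",
    [("maria", None), ("paris", None), ("toronto", None), ("lima", None), ("london", "England")],
)

query = """
SELECT city
FROM locations
WHERE state_province IS NULL
  AND (
       city LIKE '%a%a%' OR
       city LIKE '%e%e%' OR
       city LIKE '%i%i%' OR
       city LIKE '%o%o%' OR
       city LIKE '%u%u%'
  )
ORDER BY city
"""
cities = [row[0] for row in conn.execute(query)]
print(cities)  # ['maria', 'toronto']
```

"london" also has two o's but is excluded by the state_province IS NULL filter. SQLite's LIKE is case-insensitive for ASCII, so there the lower() wrapper mentioned below is not needed.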
If your city column is case sensitive, and you want to find 'Anna' in spite of one A being in upper case and the other in lower case, make this lower(city) like '%a%a%'.
If your intention is to find entries that contain exactly two equal vowels:
One way to find out how often a certain character (in your case a vowel) appears in a string is to first take the length of the entire string.
As a second step, replace your character with an empty string and take the length of the new string.
This is the length without any occurrences of this character.
If the entire length minus the new length is 2, the character occurs exactly two times in your string.
So you can build a query that repeats this idea for every vowel, something like this:
SELECT yourcolumn
FROM yourtable
WHERE
LEN(yourcolumn) - LEN(REPLACE(yourcolumn,'a','')) = 2
OR LEN(yourcolumn) - LEN(REPLACE(yourcolumn,'e','')) = 2
OR LEN(yourcolumn) - LEN(REPLACE(yourcolumn,'i','')) = 2
OR LEN(yourcolumn) - LEN(REPLACE(yourcolumn,'o','')) = 2
OR LEN(yourcolumn) - LEN(REPLACE(yourcolumn,'u','')) = 2;
If your intention is to find those entries that contain at least two equal vowels: Just replace the "=" by ">=" or use LIKE instead.
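The same length-difference trick can be tried in SQLite through Python's sqlite3. This sketch assumes SQLite's length() in place of SQL Server's LEN, keeps the answer's placeholder names yourtable/yourcolumn, and generates the five vowel tests in a loop rather than typing them out. SQLite's replace() is case-sensitive, so wrap the column in lower() if capitals should count:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (yourcolumn TEXT)")  # placeholder names from the answer
conn.executemany("INSERT INTO yourtable VALUES (?)", [("maria",), ("toronto",), ("anna",), ("lima",)])

# Build one "occurs exactly twice" test per vowel instead of typing five lines.
vowel_tests = " OR ".join(
    f"length(yourcolumn) - length(replace(yourcolumn, '{v}', '')) = 2" for v in "aeiou"
)
query = f"SELECT yourcolumn FROM yourtable WHERE {vowel_tests} ORDER BY yourcolumn"
exactly_two = [row[0] for row in conn.execute(query)]
print(exactly_two)  # ['anna', 'maria']
```

"toronto" is left out because its 'o' occurs three times; changing "= 2" to ">= 2" would include it.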

Query to retrieve only columns which have last name starting with 'K'

The name column has both first and last name in one column.
Look at the 'rep' column. How do I filter only those rep names where the last name is starting with 'K'?
The way that table is defined won't allow you to do that query reliably--particularly if you have international names in the table. Not all names come with the given name first and the family name second. In Asia, for example, it is quite common to write names in the opposite order.
That said, assuming you have all western names, it is possible to get the information you need--but your indexes won't be able to help you. It will be slower than if your data had been broken out properly.
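SELECT rep, family_name
FROM (SELECT rep,
RTRIM(LEFT(LTRIM(RIGHT(rep, LEN(rep) - CHARINDEX(' ', rep))), CHARINDEX(' ', LTRIM(RIGHT(rep, LEN(rep) - CHARINDEX(' ', rep))) + ' ') - 1)) AS family_name
FROM your_table) AS t -- hypothetical table name; the derived table lets WHERE use the family_name alias, and + ' ' avoids LEFT(..., -1) when there is no second space
WHERE family_name LIKE 'K%'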
What's going on in that query is some string manipulation. The dialect above is SQL Server, so for another DBMS you'll need its equivalent string functions. The query picks the second word, assuming the family name is the second word.
LEFT(str, num) takes the number of characters calculated from the left of the string
RIGHT(str, num) takes the number of characters calculated from the right of the string
CHARINDEX(char, str) returns the (1-based) position of the first occurrence of a character, or 0 if it isn't found
So you take the RIGHT side of the string, where the count is the length of the string minus the position of the first space. Then you take the LEFT side of the remaining string the same way, up to its first space. Essentially, if a name has 3 parts, this always picks the second one.
You could probably do this with SUBSTRING(str, start, length), but you still need to calculate exactly where the word starts and ends, using only the string itself.
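To see the "take the second word" logic on a few rows, here is a rough SQLite translation run from Python, assuming instr() and substr() as stand-ins for CHARINDEX and RIGHT/LEFT, and a hypothetical reps table with made-up names. It is split into two nested derived tables so each step is visible:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reps (rep TEXT)")  # hypothetical table name
conn.executemany(
    "INSERT INTO reps VALUES (?)",
    [("John Kim",), ("Mary Smith",), ("Peter Kowalski",), ("Anna Maria Karlsson",)],
)

query = """
SELECT rep
FROM (
    -- Step 2: keep the text up to the next space (or to the end if there is none).
    SELECT rep, substr(rest, 1, instr(rest || ' ', ' ') - 1) AS family_name
    FROM (
        -- Step 1: drop everything up to and including the first space.
        SELECT rep, ltrim(substr(rep, instr(rep, ' ') + 1)) AS rest
        FROM reps
    )
)
WHERE family_name LIKE 'K%'
ORDER BY rep
"""
matches = [row[0] for row in conn.execute(query)]
print(matches)  # ['John Kim', 'Peter Kowalski']
```

"Anna Maria Karlsson" is missed because the second word is the middle name, which is exactly the first edge case listed below.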
Hopefully you can see that there are all kinds of edge cases where this could fail:
Some records have a middle name
The family name is recorded first
Some records have a title (Mr., Lord, Dr.)
It would be better if you could separate the name into different columns and then the query would be trivial--and you have the benefit of your indexes as well.
Your other option is to create a stored procedure, and do the calculations a bit more precisely and in a way that is easier to read.
Assuming that the name is <firstname> <lastname> you can use:
where rep like '% K%'
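A small check of the simpler pattern, assuming SQLite via Python's sqlite3 and a hypothetical reps table. '% K%' means "a space followed by K somewhere", so it is only reliable when every name has exactly two parts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reps (rep TEXT)")  # hypothetical table name
conn.executemany(
    "INSERT INTO reps VALUES (?)",
    [("John Kim",), ("Mary Smith",), ("Anna Karin Berg",)],
)

# Matches any name where some word after the first one starts with K.
matches = [row[0] for row in conn.execute("SELECT rep FROM reps WHERE rep LIKE '% K%' ORDER BY rep")]
print(matches)  # ['Anna Karin Berg', 'John Kim']
```

"Anna Karin Berg" is returned because of the middle name, even though the last name starts with B. In SQLite the match is also case-insensitive, so " k" would count too.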

Oracle: regexp for a complicated case

I have a table, and one of the columns contains a string with items separated by semicolons (;).
I want to selectively transfer the data to a new table based on the pattern of the String.
For example, it may look like
16;;14;30;24;11;13;14;14;10;13;18;15;18;24;13/18;11;;23;12;;19;10;;11;26;;;42;26;38/39;12;;;;;;;11;;;;;;;;;;;;;;;
or
11;;11;11;11;11;11;11;11;11;11;11;11;11;11;11;11;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
I don't care what's between the semicolons, but I care about which positions contain items. For example, if I only want the 1st, 3rd and 4th positions to contain items, I would allow the following:
32;;14;18/12;;;;;;;;; or 32;;14;18/12;;;;55;;;;11;;;;;;;
This one down below is not okay because the 3rd position does not hold any value.
32;;;18/12;;;;;;;;;
If regexp works for this, then I can use merge into to move the desired records to the target table. If this cannot be done, I'll have to process each record in Java, and selectively insert the records to the new table.
source table:
id | StringValue | count
target table:
id | StringValue | count
The sql that I have in mind:
merge into your_target_table tt
using ( select StringValue, count
from source_table where REGEXP_LIKE ( StringValue, 'some pattern')
) st
on ( st.StringValue = tt.StringValue )
when not matched then
insert (id, StringValue , count)
values (someseq.nextval, st.StringValue, st.count)
when matched then
update
set tt.count = tt.count + st.count;
Also, I'm certain that every StringValue in the source table is unique, so what comes after when matched then is not important, but I think the syntax requires something there.
For each position where you want a value, put [^;]+;. That matches one or more characters that are not ;, followed by a ;. For a position you don't care about, put [^;]*;. That's similar to the first one, except that the characters before the ; may also be absent. Anchor the whole thing to the beginning with ^.
So for your 1st, 3rd and 4th position example you'd get:
^[^;]+;[^;]*;[^;]+;[^;]+;
In a query that'd look like:
SELECT *
FROM elbat
WHERE regexp_like(nmuloc, '^[^;]+;[^;]*;[^;]+;[^;]+;');
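Without an Oracle instance at hand, the pattern can be sanity-checked in SQLite from Python. SQLite rewrites "X REGEXP Y" as a call to regexp(Y, X) but defines no regexp() function by default, so this sketch registers one backed by Python's re module; for a pattern this simple (anchor, bracket expressions, + and *) it should behave like Oracle's REGEXP_LIKE. The elbat/nmuloc names are the answer's own placeholders:

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# "nmuloc REGEXP ?" becomes regexp(?, nmuloc): pattern first, value second.
conn.create_function("regexp", 2, lambda pattern, value: re.search(pattern, value) is not None)

conn.execute("CREATE TABLE elbat (nmuloc TEXT)")
samples = [
    "32;;14;18/12;;;;;;;;;",
    "32;;14;18/12;;;;55;;;;11;;;;;;;",
    "32;;;18/12;;;;;;;;;",  # 3rd position empty, must be rejected
]
conn.executemany("INSERT INTO elbat VALUES (?)", [(s,) for s in samples])

# 1st, 3rd and 4th positions must be non-empty; the 2nd may be empty.
pattern = r"^[^;]+;[^;]*;[^;]+;[^;]+;"
kept = [
    row[0]
    for row in conn.execute("SELECT nmuloc FROM elbat WHERE nmuloc REGEXP ? ORDER BY rowid", (pattern,))
]
print(kept)
```

One thing worth knowing: the pattern requires a ; after the 4th item, so a string that ends right after that item (for example 1;;3;4) would not match.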
It can be improved further by putting the subexpressions in a group, that is, putting parentheses around them, and using quantifiers: a number in curly braces after the group. For example, ([^;]+;){2} matches two positions that are not empty. Your example would be shortened to:
^[^;]+;[^;]*;([^;]+;){2}
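A quick check, using Python's re module as a stand-in for Oracle's regex engine, that the grouped {2} form accepts exactly the same strings as the spelled-out pattern on the question's examples:

```python
import re

long_form = re.compile(r"^[^;]+;[^;]*;[^;]+;[^;]+;")
# ([^;]+;){2} repeats the "non-empty item plus ;" group exactly twice.
short_form = re.compile(r"^[^;]+;[^;]*;([^;]+;){2}")

samples = [
    "32;;14;18/12;;;;;;;;;",
    "32;;14;18/12;;;;55;;;;11;;;;;;;",
    "32;;;18/12;;;;;;;;;",
    "16;;14;30;24;11;13",
]

results = [bool(short_form.search(s)) for s in samples]
# Both forms should accept exactly the same strings.
assert results == [bool(long_form.search(s)) for s in samples]
print(results)  # [True, True, False, True]
```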
While sticky bit's answer is totally correct, there is another, similar but perhaps more readable solution:
SELECT *
FROM elbat
WHERE regexp_substr(nmuloc, '(.*?)(;|$)', 1, 1, '', 1) is not null
AND regexp_substr(nmuloc, '(.*?)(;|$)', 1, 3, '', 1) is not null
AND regexp_substr(nmuloc, '(.*?)(;|$)', 1, 4, '', 1) is not null;
Pros:
clearly states the position numbers that must not be null
has a universal pattern for any condition, so there is no need to change the regex
can use any regex as a delimiter, not only a single character
actually extracts the item, so you can test it further with any function
Cons:
rather verbose
n times slower, where n is the number of conditions
even slower (up to 2 times) because of backtracking on each non-delimiter character
However, in my experience this difference in efficiency is minor unless the query runs against billions of rows, and even then reading from disk would take most of the time.
How it's made:
(.*?)(;|$) - lazily matches any character sequence (possibly zero-length) terminated by the delimiter or the end of the string
1 - position to start the search. 1 is the default; it is given only to get to the next parameter
1, 3 or 4 - occurrence of the pattern
'' - match_parameter. It can be used to set the matching mode, but here it is also only there to get to the last parameter
1 - subexpression number, which makes regexp_substr return only the first capturing group, that is (.*?), the item itself without the delimiter. Because Oracle treats an empty string as NULL, an empty item makes the is not null test fail.
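The idea behind this answer (extract the item at each required position and test that it is non-empty) can be sketched outside the database too. This is a plain-Python analogue using str.split, not Oracle's regexp_substr; the helper name positions_filled is hypothetical, and positions are 1-based like the occurrence argument:

```python
from typing import Sequence


def positions_filled(value: str, required: Sequence[int], delimiter: str = ";") -> bool:
    """Return True if every 1-based position in `required` holds a non-empty item."""
    items = value.split(delimiter)
    # A position past the end of the string counts as empty, like a NULL result.
    return all(pos <= len(items) and items[pos - 1] != "" for pos in required)


print(positions_filled("32;;14;18/12;;;;;;;;;", [1, 3, 4]))  # True
print(positions_filled("32;;;18/12;;;;;;;;;", [1, 3, 4]))    # False
print(positions_filled("1;;3;4", [1, 3, 4]))                 # True
```

Like the answer's pros list says, the positions are stated directly, so checking a different set only means changing the list, not the pattern. Unlike the anchored regex, this check also accepts a last item with no trailing delimiter.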

Using SQL to make specific changes in a database.

I am trying to figure out some commands/code in SQL.
I have a database with names, addresses, IDs etc., but I have to convert firstname values ending in “jnr” to “(Jnr)” and those ending in “snr” to “(Snr)”.
How do I do this?
UPDATE TABLE_NAME SET NAMES = SUBSTRING(NAMES FROM 1 FOR CHAR_LENGTH(NAMES) - 3) || '(Jnr)' WHERE NAMES LIKE '%jnr'
Update or select:
OVERLAY(column PLACING UPPER(SUBSTRING(column FROM CHAR_LENGTH(column) - 2 FOR 1)) FROM CHAR_LENGTH(column) - 2 FOR 1)
WHERE column LIKE '%jnr' OR column LIKE '%snr'
OVERLAY is used to put in one character at position 3 from the end (CHAR_LENGTH(column) - 2),
CHAR_LENGTH to get the length of the column value,
UPPER converts the character to upper case,
SUBSTRING is used to pick one character here (j or s),
LIKE is used to find values ending with jnr or snr.
All ANSI SQL (no dbms specified!)
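To produce the full “(Jnr)”/“(Snr)” form the question asks for in one statement, a sketch in SQLite via Python's sqlite3, assuming a hypothetical people table with a firstname column: keep everything but the last 3 characters and append the parenthesised suffix chosen by a CASE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (firstname TEXT)")  # hypothetical table and column
conn.executemany("INSERT INTO people VALUES (?)", [("Bob jnr",), ("Tom snr",), ("Ann",)])

# Keep everything except the last 3 characters, then append the parenthesised suffix.
conn.execute(
    """
    UPDATE people
    SET firstname = substr(firstname, 1, length(firstname) - 3) ||
        CASE WHEN firstname LIKE '%jnr' THEN '(Jnr)' ELSE '(Snr)' END
    WHERE firstname LIKE '%jnr' OR firstname LIKE '%snr'
    """
)
names = [row[0] for row in conn.execute("SELECT firstname FROM people ORDER BY rowid")]
print(names)  # ['Bob (Jnr)', 'Tom (Snr)', 'Ann']
```

SQLite uses substr()/length() rather than the standard SUBSTRING ... FROM ... FOR / CHAR_LENGTH forms, and its LIKE ignores ASCII case, so "JNR" endings would be converted as well.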

Postgresql query to update fields using a regular expression

I have the following data in my "Street_Address_1" column:
123 Main Street
Using Postgresql, how would I write a query to update the "Street_Name" column in my Address table? In other words, "Street_Name" is blank and I'd like to populate it with the street name value contained in the "Street_Address_1" column.
From what I can tell, I would want to use the "regexp_matches" string method. Unfortunately, I haven't had much luck.
NOTE: You can assume that all addresses are in a "StreetNumber StreetName StreetType" format.
If you just want to take Street_Address_1 and strip out any leading numbers, you can do this:
UPDATE table
SET street_name = regexp_replace(street_address_1, '^[0-9]* ','','');
This takes the value in street_address_1 and replaces any leading string of numbers (plus a single space) with an empty string (the fourth parameter is for optional regex flags like "g" (global) and "i" (case-insensitive)).
This version allows things like "1212 15th Street" to work properly.
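The effect of that regexp_replace call can be previewed with Python's re.sub, which should behave the same for this anchored pattern; only the first match is replaced, as with PostgreSQL's regexp_replace when the "g" flag is absent. The helper name strip_house_number is hypothetical:

```python
import re


def strip_house_number(street_address_1: str) -> str:
    # Same pattern as the answer: leading digits (possibly none) plus one space, first match only.
    return re.sub(r"^[0-9]* ", "", street_address_1, count=1)


print(strip_house_number("123 Main Street"))   # Main Street
print(strip_house_number("1212 15th Street"))  # 15th Street
```

Note that the result keeps the street type ("Main Street", not just "Main").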
Something like:
UPDATE table
SET Street_Name = substring(Street_Address_1 FROM '^[0-9]+ ([a-zA-Z]+) ');
See the relevant section of the PostgreSQL 8.3.7 docs; the substring form is described shortly after the start of the section.
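The substring(... FROM pattern) form returns the part matched by the first parenthesised group, or NULL when there is no match. A Python sketch of the same behaviour with re, where the helper name street_name is hypothetical and None plays the role of NULL:

```python
import re
from typing import Optional

# Leading digits, one space, then capture a run of letters followed by a space.
STREET_NAME = re.compile(r"^[0-9]+ ([a-zA-Z]+) ")


def street_name(street_address_1: str) -> Optional[str]:
    m = STREET_NAME.search(street_address_1)
    return m.group(1) if m else None


print(street_name("123 Main Street"))   # Main
print(street_name("1212 15th Street"))  # None
```

The second address returns nothing because "15th" contains digits, which is the case the regexp_replace answer above handles.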