SELECT middle part of a String if it exists. Postgresql - sql

i've got a problem with transferring "real-World" data into my schema.
It's actually a "project" for my Database course and they gave us ab table with dog race results. This Table has a column which contains the name of the Dog (which itself consists of the actuall name and the name of the breeder) and informations about the Birthcountry, actual living Country and the birth year.
Example filed are "Lillycette [AU 2012]" or "Black Bear Lee [AU/AU 2013]" or "Lemon Ralph [IE/UK 1998]".
I've managed it to get out the first word and save it in the right column with split_part like this:
INSERT INTO tblHund (rufname)
SELECT
split_part(name, ' ', 1) AS rufname,
FROM tblimport;
tblimport is a table where I dumped the data from the csv file.
That works just as it should.
Accessing the second part of the Name with this fails because sometimes there isn't a second part and sometimes times there second part consists of two words.
And this is the where iam stuck right now.
I tried it with substring and regular expressions:
INSERT INTO tblZwinger (Name)
SELECT
substring(vatertier from E'[^ ]*\\ ( +)$')AS Name
FROM tblimport
WHERE substring(vatertier from E'[^ ]*\\ ( +)$') != '';
The above code is executed without errors but actually does nothing because the SELECT statement just give empty strings back.
It took me more then 3h to understand a bit of this regular Expressions but I still feel pretty stupid when I look at them.
Is there any other way of doing this. If so just give me a hint.
If not what is wrong with my expression above?
Thanks for your help.

You need to use atom ., which matches any single character inside capturing group:
E'[^ ]*\\ (.+)$'

SELECT
tblimport.*,
ti.parts[1] as f1,
ti.parts[2] as f2, -- It should be the "middle part"
ti.parts[3] as f3
FROM
tblimport,
regexp_matches(tblimport.vatertier, '([^\s]+)\s*(.*)\s+\[(.*)\]') as ti(parts)
WHERE
nullif(ti.parts[2], '') is not null
Something like above.

Related

Query to retrieve only columns which have last name starting with 'K'

]
The name column has both first and last name in one column.
Look at the 'rep' column. How do I filter only those rep names where the last name is starting with 'K'?
The way that table is defined won't allow you to do that query reliably--particularly if you have international names in the table. Not all names come with the given name first and the family name second. In Asia, for example, it is quite common to write names in the opposite order.
That said, assuming you have all western names, it is possible to get the information you need--but your indexes won't be able to help you. It will be slower than if your data had been broken out properly.
SELECT rep,
RTRIM(LEFT(LTRIM(RIGHT(rep, LEN(rep) - CHARINDEX(' ', rep))), CHARINDEX(' ', LTRIM(RIGHT(rep, LEN(rep) - CHARINDEX(' ', rep)))) - 1)) as family_name
WHERE family_name LIKE 'K%'
So what's going on in that query is some string manipulation. The dialect up there is SQL Server, so you'll have to refer to your vendor's string manipulation function. This picks the second word, and assumes the family name is the second word.
LEFT(str, num) takes the number of characters calculated from the left of the string
RIGHT(str, num) takes the number of characters calculated from the right of the string
CHARINDEX(char, str) finds the first index of a character
So you are getting the RIGHT side of the string where the count is the length of the string minus the first instance of a space character. Then we are getting the LEFT side of the remaining string the same way. Essentially if you had a name with 3 parts, this will always pick the second one.
You could probably do this with SUBSTRING(str, start, end), but you do need to calculate where that is precisely, using only the string itself.
Hopefully you can see where there are all kinds of edge cases where this could fail:
There are a couple records with a middle name
The family name is recorded first
Some records have a title (Mr., Lord, Dr.)
It would be better if you could separate the name into different columns and then the query would be trivial--and you have the benefit of your indexes as well.
Your other option is to create a stored procedure, and do the calculations a bit more precisely and in a way that is easier to read.
Assuming that the name is <firstname> <lastname> you can use:
where rep like '% K%'

Similar to with regex in Postgresql

In Postgresql database I have a column called names where I have some names which need to be parsed using regex to clean up punctuation parts. I am able to get a clean name using regexp_replace as follows:
select regexp_replace(name,'\.COM|''[A-Z]|[^a-zA-Z0-9 -]+|\s(?=&)|(?<!\w\w)(?:\s+|-)(?!\w\w)','','g')
from tableA
However, I would like to compare with some strings that are also cleaned of punctuation. How can I use similar to with the formed regular expression?
select name
from tableA
where (lower(name) ~ '\.COM|''[A-Za-z]|[^a-zA-Z0-9 -]+|\s(?=&)|(?<!\w\w)(?:\s+|-)(?!\w\w)') as nameParsed similar to '(fg )%' and
(lower(name) ~ '\.COM|''[A-Za-z]|[^a-zA-Z0-9 -]+|\s(?=&)|(?<!\w\w)(?:\s+|-)(?!\w\w)') as nameParsed similar to '%( cargo| carrier| cartage )%'
With the previous query I am getting this error:
LINE 3: ...-zA-Z0-9 -]+|\s(?=&)|(?<!\w\w)(?:\s+|-)(?!\w\w)') as namePar...
I have tried in where clause like this and it seems to be working:
select name
from tableA
where (select lower(regexp_replace(name,'\.COM|''[A-Z]|[^a-zA-Z0-9 -]+|\s(?=&)|(?<!\w\w)(?:\s+|-)(?!\w\w)','','g'))) similar to '(fg )%'
Is this the best approach? The execution time went to 46 seconds :(
Thanks in advance
You're trying to get a column name in a WHERE clause (is a comparison, not a column). So, you can use as follows:
SELECT name
FROM "tableA"
WHERE (regexp_replace(name,'\.COM|''[A-Z]|[^a-zA-Z0-9 -]+|\s(?=&)|(?<!\w\w)(?:\s+|-)(?!\w\w)','','g') similar to '(fg )%'
OR regexp_replace(name,'\.COM|''[A-Z]|[^a-zA-Z0-9 -]+|\s(?=&)|(?<!\w\w)(?:\s+|-)(?!\w\w)','','g') similar to '%( cargo| carrier| cartage )%');
Alternatively, you can use ilike instead of similar to if you want to find a specific word.

Remove unnecessary Characters by using SQL query

Do you know how to remove below kind of Characters at once on a query ?
Note : .I'm retrieving this data from the Access app and put only the valid data into the SQL.
select DISTINCT ltrim(rtrim(a.Company)) from [Legacy].[dbo].[Attorney] as a
This column is company name column.I need to keep string characters only.But I need to remove numbers only rows,numbers and characters rows,NULL,Empty and all other +,-.
Based on your extremely vague "rules" I am going to make a guess.
Maybe something like this will be somewhere close.
select DISTINCT ltrim(rtrim(a.Company))
from [Legacy].[dbo].[Attorney] as a
where LEN(ltrim(rtrim(a.Company))) > 1
and IsNumeric(a.Company) = 0
This will exclude entries that are not at least 2 characters and can't be converted to a number.
This should select the rows you want to delete:
where company not like '%[a-zA-Z]%' and -- has at least one vowel
company like '%[^ a-zA-Z0-9.&]%' -- has a not-allowed character
The list of allowed characters in the second expression may not be complete.
If this works, then you can easily adapt it for a delete statement.

PLSQL search string using LIKE backwards

I have a view which has VARCHAR2 column. The column contains a string
which is concatenation of 2 columns from a different table.
For example I search "James Smith".
I search the column using LIKE:
LOWER(USER_LIST.SEARCH_STRING) LIKE LOWER 'James Smith'
I get the results just fine.
I would like to know if there's an option to perform a reverse search (still using LIKE) and getting the same results, like so:
LOWER(USER_LIST.SEARCH_STRING) LIKE LOWER 'Smith James'
Please note that I'm aware that using regex or adding an additional column to the view can resolve this, but I wish to make as minimal changes as possible.
Thanks in advance.
I hope the below answer illustrates your requirement.
SELECT A.NM
FROM
(SELECT 'Avrajit Roy' nm FROM dual
)A
WHERE lower(A.NM) LIKE lower('avrajit roy')
OR TRIM(lower(SUBSTR(a.nm,instr(a.nm,' ',1)+1,LENGTH(a.nm))
||' '
||SUBSTR(a.nm,1,instr(a.nm,' ',1)))) LIKE lower('roy avrajit');

Non-parametrized partial match query

I'm an access beginner.
I'm trying to write a query that finds all partial matches between a field in one table and a field in another.
I'll give a simplified example of what I'm looking for as a final result and maybe you can help me write the SQL for it.
Table 1: Strings
pre
ing
up
Table 2: Words
present
door
cup
making
kingdom
Desired Result Table:
pre present
up cup
ing kingdom
ing making
Here's what I have - it can give me a prompt where I type in what I'm looking for. (e.g. if I type in 'up', then I get cup from the word list). What I want is a way to automate the whole process for all of the strings in the first list.
SELECT *
FROM word
WHERE (((word) LIKE "*" & string & "*"));
Thanks,
I don't have Access in front of me, but I think you can do something similar to this (this is for SQL Server):
select stringname, wordname
from strings
join words on wordname like '%' + stringname + '%'
SQL Fiddle Demo
If I had to guess at the Access syntax, replace % with * and possibly change the + to &, but hopefully this will get you going in the right direction.