How to get name before first uppercase word excluding T-shirt - sql

Products have names like
Product one white Adidas
Other product black Hill sheet
Nice T-shirt blue Brower company
How can I get the starting part of a product name, up to the first uppercase word, searching from the second word onward and ignoring the word T-shirt?
Result from strings above should be
Product one white
Other product black
Nice T-shirt blue
Using Bohemian's answer to the question
Substring before first uppercase word excluding first word
regexp_replace('Nice T-shirt blue Brower company', '(?<!^)\m[A-ZÕÄÖÜŠŽ].*', '')
returns wrong result
Nice
How to modify regex so that it returns
Nice T-shirt blue
Using Postgres 12

Use a negative look ahead:
select regexp_replace('Nice T-shirt blue Brower company', '(?<!^)\m(?!T-shirt)[A-ZÕÄÖÜŠŽ].*', '')
See live demo.
(?!T-shirt) means the following characters must not be 'T-shirt'
You can add other capitalised terms to ignore:
(?!T-shirt|BB gun|BBQ tongs)
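The same negative look-ahead idea can be tried outside the database. The following is a minimal Python sketch, not the Postgres query itself: Python's `re` has no `\m`, so `\b` is used as an approximation, and the function name `name_prefix` and its `exceptions` parameter are hypothetical. The trailing space left by the replacement is stripped.

```python
import re

def name_prefix(name, exceptions=("T-shirt",)):
    """Cut the name at the first capitalised word after the first word,
    skipping any capitalised terms listed in `exceptions` (hypothetical helper)."""
    # Build (?!T\-shirt|BB\ gun|...) from the exception list; omit it if the list is empty.
    skip = "|".join(re.escape(term) for term in exceptions)
    lookahead = f"(?!{skip})" if exceptions else ""
    # (?<!^)  : not at the very start, so the first word is never cut
    # \b      : word boundary (stand-in for Postgres \m)
    # [A-Z..] : an uppercase letter starts the part to remove
    pattern = rf"(?<!^)\b{lookahead}[A-ZÕÄÖÜŠŽ].*"
    return re.sub(pattern, "", name).rstrip()

for product in ("Product one white Adidas",
                "Other product black Hill sheet",
                "Nice T-shirt blue Brower company"):
    print(name_prefix(product))
```

Passing more terms, for example `exceptions=("T-shirt", "BB gun")`, mirrors the alternation shown above.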

This regex works for your test cases:
^[A-Z][a-z ]*(T-shirt)?[a-z ]*
Explanation:
^: Start of line
[A-Z]: Any capital letter
[a-z ]*: zero or more characters that are either a lowercase letter or space
(T-shirt)?: The phrase T-shirt 0 or 1 times
[a-z ]*: same again

Instead of matching the part you want, you can simply remove the part of the string that you do not need (in this case everything from the first uppercase word that is followed only by letters and spaces, through to the end):
select regexp_replace(name, '(?<=\s)[A-Z]+[a-zA-Z\s]+$', '') from tbl
See fiddle.

Related

How to extract 4 words before each word of a given list in sql

I have got a table with a column containing text (the column name is 'Text'). There are some acronyms in brackets, so I would like to extract them along with the five words appearing before them.
I have already extracted the rows that contain all the acronyms of my list using the like operator:
select Text from table
where Text like '(NASA)'
or Text like '(NBA)'
Instead of getting an output of the whole text in each row:
Text
He works for the National Aeronautics and Space Administration (NASA).
He played basketball for the National Basketball Association (NBA) from 2000 to 2002.
I would like to get the output of two columns one for the acronym and another for the meaning of the acronym (showing the five words prior to the acronym):
Acronym Meaning
(NASA) National Aeronautics and Space Administration
(NBA) for the National Basketball Association
Without actually seeing your data, I will assume that all the acronyms follow the same pattern but you should be able to adapt the code with the correct logic if your strings are structured differently. In this case '(Acronym) meaning' is the structure which I'm going to work with.
select '(NASA) National Aeronautics and Space Administration' as text
into #temp1
union all
select '(FBI) Federal Bureau of Investigation' as text
select SUBSTRING(text,CHARINDEX('(',text)+1 ,CHARINDEX(')',text)-CHARINDEX('(',text)-1) as Acronym,
SUBSTRING(text,CHARINDEX(')',text)+2 ,len(text)-CHARINDEX(')',text)+1) as meaning
from #temp1
This code subsets the original string by using character positions in the string between the brackets for the acronym and then character positions starting after closed brackets for the meanings.
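For comparison, the same position arithmetic can be sketched in Python. This assumes, like the answer, that every string has the shape '(Acronym) meaning'; the function name `split_acronym` is hypothetical.

```python
def split_acronym(text):
    """Split '(ACRONYM) meaning' into (acronym, meaning) using character positions."""
    open_pos = text.index("(")               # like CHARINDEX('(', text)
    close_pos = text.index(")", open_pos)    # like CHARINDEX(')', text)
    acronym = text[open_pos + 1:close_pos]   # characters between the brackets
    meaning = text[close_pos + 1:].strip()   # everything after the closing bracket
    return acronym, meaning

print(split_acronym("(NASA) National Aeronautics and Space Administration"))
print(split_acronym("(FBI) Federal Bureau of Investigation"))
```

`str.index` raises `ValueError` when a bracket is missing, which makes rows that do not follow the assumed pattern easy to spot.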

Name Correction

As the wedding season is on, John has been given the work of printing guest names on wedding cards. John has written code to print only those names that start with upper-case alphabets and reject those that start with lower-case alphabets or special characters.
Your job is to do the following:
1. Correct the rejected names (names which start with a lower-case letter or with a special character). Change the first letter of the rejected name to upper case; a name that starts with a special character is left unchanged.
2. Output the newly corrected names in ascending order.
Table format
Table: person
Field Type
name varchar(20)
Sample
Sample person table
name
mohit
Kunal
manoj
Raj
tanya
#man
Sample output table
name
#man
Manoj
Mohit
Tanya
Solution Attempted: IN SQL SERVER 2014
select name
from person as per
where (left(per.name,1) like '%[^A-Z]%' or left(per.name,1) like '% %')
union
select Upper(left(per.name,1))+right(per.name,len(per.name)-1)
from person as per
where left(per.name,1)<>left(Upper(per.name),1)
collate Latin1_General_CS_AI
order by per.name
The sample test cases pass,
but I am still getting a wrong answer in a competitive exam.
Please suggest which test case I have not handled.
Since you are only interested in correcting lower case and reporting special characters in the first character position, I would use ASCII comparison rather than regex.
-- 97..122 are the ASCII codes of a..z; subtracting 32 gives the upper-case letter
select name, ascii(left(name,1)) as first_char_code,
case
when ascii(left(name,1)) between 97 and 122 then
concat(char(ascii(left(name,1)) - 32),substring(name,2,len(name) -1))
else name
end as corrected_name
from t
where ascii(left(name,1)) <= 64 or
ascii(left(name,1)) >= 91
order by corrected_name
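The ASCII-based rule can be checked quickly outside SQL Server. Below is a minimal Python sketch under the assumption that names are plain ASCII strings; `correct_rejected_names` is a hypothetical name, and the sort is by character code, which may differ from the collation the exam's database uses.

```python
def correct_rejected_names(names):
    """Return the corrected versions of rejected names, sorted ascending."""
    corrected = []
    for name in names:
        if not name:
            continue                      # nothing to inspect in an empty name
        first = name[0]
        if "A" <= first <= "Z":
            continue                      # already accepted, not part of the output
        if "a" <= first <= "z":
            corrected.append(first.upper() + name[1:])  # fix the first letter only
        else:
            corrected.append(name)        # special character: keep unchanged
    return sorted(corrected)

print(correct_rejected_names(["mohit", "Kunal", "manoj", "Raj", "tanya", "#man"]))
```

Only the first character is ever changed, matching the task statement; the rest of each name is copied as-is.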

Need help solving SQL Oracle Counting Characters

We have a large set of data and the professor is asking us to do the following:
Amy Gray has seven characters in her name. (The space between her first and last name does not count.) J. J. Brown has ten in his name. (The space and periods in J. J. count as characters.) Allison Black-White has eighteen in hers. (The hyphen counts as a character.)
Create a view named A9T4 that will display the size and the total number of students whose combined first and last name has that size. The two column headings should be Name_Size and Students. The rows should be sorted by descending size.
Note: As a simple check of your work, the longest name in A9 has 22 characters and the three shortest names have seven characters.
I used the Oracle DUMP, SUBSTR, and REGEXP_SUBSTR functions to get the count.
http://www.techonthenet.com/oracle/functions/dump.php
http://www.techonthenet.com/oracle/functions/substr.php
http://www.techonthenet.com/oracle/functions/regexp_substr.php
CREATE TABLE SCHEMA1.NAMES (
eval_name VARCHAR2(100 CHAR)
);
insert into SCHEMA1.NAMES values('Amy Gray');
insert into SCHEMA1.NAMES values('J. J. Brown');
insert into SCHEMA1.NAMES values('Allison Black-White');
commit;
select eval_name, REGEXP_SUBSTR(SUBSTR(DUMP(eval_name),11), '^[0-9]*')-1 from SCHEMA1.NAMES;
--returns
Amy Gray 7
J. J. Brown 10
Allison Black-White 18
DUMP('Amy Gray') -- gives us 'Typ=1 Len=8: 65,109,121,32,71,114,97,121'
SUBSTR(DUMP('Amy Gray'),11) -- starts at position eleven, giving us
'8: 65,109,121,32,71,114,97,121'
REGEXP_SUBSTR(SUBSTR(dump('Amy Gray'),11), '^[0-9]*') -- gives us '8', all digits from the beginning of the string '^' to the first non-digit, ':'
--and the -1 removes the expected space between the first and last names.
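The counting rule, and the grouping the view is asked to produce, can be sketched in Python as a sanity check. This assumes each full name contains exactly one separating space between first and last name that must not be counted, as in the examples; `name_size` and `size_histogram` are hypothetical names. Note that `len` counts characters, whereas DUMP reports a byte length, so the two could differ for multi-byte characters.

```python
from collections import Counter

def name_size(full_name):
    """Number of characters, not counting the one space between first and last name."""
    return len(full_name) - 1

def size_histogram(names):
    """Pairs of (Name_Size, Students), sorted by descending size."""
    counts = Counter(name_size(n) for n in names)
    return sorted(counts.items(), reverse=True)

names = ["Amy Gray", "J. J. Brown", "Allison Black-White"]
for n in names:
    print(n, name_size(n))
print(size_histogram(names))
```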

How to use Like '%' in a phrase with multiple words

Let's say I have the following table
user|text
1 |red 123 orange blue green
2 |red orange blue
3 |blue orange 123 red
If I wanted to pull all users whose text includes both '123' and 'blue', how would I do it? I would want to pull user 1 and 3.
SELECT *
FROM Table
WHERE text LIKE '%123%'&&'%blue%'
OR text LIKE '%blue%'&&'%123%'
Is this better solved thru using a regexp function?
Try this code:
SELECT *
FROM Table
WHERE text LIKE '%123%'
and text like '%blue%'
You need to repeat LIKE for each pattern.
SELECT *
FROM Table
WHERE text LIKE '%123%' AND text LIKE '%blue%'
You could also write it as:
WHERE text LIKE '%123%blue%' OR text LIKE '%blue%123%'
or:
WHERE text RLIKE '123.*blue|blue.*123'
However, these two solutions grow very quickly if you have to match several strings in any order, because every possible ordering of the n strings (n! of them) has to be listed. The first version is linear in the number of strings to match.
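To illustrate why repeating LIKE scales linearly, here is a small Python sketch that builds one `LIKE` clause per search term and runs it against an in-memory SQLite database. SQLite stands in for the original database only for demonstration; the table name `phrases` and the helper `users_matching_all` are hypothetical.

```python
import sqlite3

def users_matching_all(conn, terms):
    """Return users whose text contains every term, in any order."""
    if not terms:
        raise ValueError("at least one search term is required")
    # One 'text LIKE ?' per term, joined with AND: linear in the number of terms.
    where = " AND ".join("text LIKE ?" for _ in terms)
    params = [f"%{term}%" for term in terms]
    rows = conn.execute(
        f"SELECT user FROM phrases WHERE {where} ORDER BY user", params
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phrases (user INTEGER, text TEXT)")
conn.executemany(
    "INSERT INTO phrases VALUES (?, ?)",
    [(1, "red 123 orange blue green"),
     (2, "red orange blue"),
     (3, "blue orange 123 red")],
)
print(users_matching_all(conn, ["123", "blue"]))
```

The search terms are passed as bound parameters, so only the number of placeholders depends on the input.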

Sort Postcode for menu/list

I need to sort a list of UK postcodes in to order.
Is there a simple way to do it?
UK postcodes are made up of letters and numbers:
see for full info of the format:
http://en.wikipedia.org/wiki/UK_postcodes
But my problem is that a simple alpha sort doesn't work, because each code starts with one or two letters, immediately followed by a number of up to two digits, then a space, another number, and then two letters, e.g. LS1 1AA or ls28 1AA. There is also another case where, once the numbers in the first section exceed 99, it continues with 9A etc.
An alpha sort causes the 10s to immediately follow the 1s:
...
LS1 9ZZ
LS10 1AA
...
LS2
I'm looking at creating a SQL function to convert the printable Postcode into a sortable postcode e.g. 'LS1 9ZZ' would become 'LS01 9ZZ', then use this function in the order by clause.
Has anybody done this or anything similar already?
You need to think of this as a tokenization issue so SW1A 1AA should tokenize to:
SW
1
A
1AA
(although you could break the inward part down into 1 and AA if you wanted to)
and G12 8QT should tokenize to:
G
12
(empty string)
8QT
Once you have broken the postcode down into those component parts then sorting should be easy enough. There is an exception with the GIR 0AA postcode but you can just hardcode a test for that one
edit: some more thoughts on tokenization
For the sample postcode SW1A 1AA, SW is the postcode area, 1A is the postcode district (which we'll break into two parts for sorting purposes), 1 is the postcode sector and AA is the unit postcode.
These are the valid postcode formats (source: Royal Mail PAF user guide page 8 - link at bottom of this page):
AN NAA
AAN NAA
ANN NAA
ANA NAA
AAA NAA (only for GIR 0AA code)
AANN NAA
AANA NAA
So a rough algorithm would be (assuming we want to separate the sector and unit postcode):
code = GIR 0AA? Tokenize to GI/R/ /0/AA (treating R as the district simplifies things)
code 5 characters long (not counting the space) e.g. G1 3AF? Tokenize to G/1/ /3/AF
code 6 characters long with 3rd character being a letter e.g. W1P 1HQ? Tokenize to W/1/P/1/HQ
code 6 characters long with 2nd character being a letter e.g. CR2 6XH? Tokenize to CR/2/ /6/XH
code 7 characters long with 4th character being a letter e.g. EC1A 1BB? Tokenize to EC/1/A/1/BB
otherwise e.g. TW14 2ZZ, tokenize to TW/14/ /2/ZZ
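The tokenization can be sketched in Python. Note that this sketch swaps the length-based rules above for a single regular expression, which should give the same tokens for the listed formats; the function name `tokenize_postcode` is hypothetical, and a missing district letter is returned as an empty string rather than a space.

```python
import re

# area letters, district digits, optional district letter, space, sector digit, unit letters
_POSTCODE = re.compile(r"^([A-Z]{1,2})(\d{1,2})([A-Z]?) (\d)([A-Z]{2})$")

def tokenize_postcode(code):
    """Split a UK postcode into (area, district number, district letter, sector, unit)."""
    code = code.strip().upper()
    if code == "GIR 0AA":
        # Hard-coded exception: treat R as the district.
        return ("GI", "R", "", "0", "AA")
    match = _POSTCODE.match(code)
    if match is None:
        raise ValueError(f"unrecognised postcode format: {code!r}")
    area, number, letter, sector, unit = match.groups()
    if len(number) == 2 and letter:
        # ANNA / AANNA are not among the listed valid formats.
        raise ValueError(f"unrecognised postcode format: {code!r}")
    return (area, number, letter, sector, unit)

for pc in ("G1 3AF", "W1P 1HQ", "CR2 6XH", "EC1A 1BB", "TW14 2ZZ", "GIR 0AA"):
    print(pc, tokenize_postcode(pc))
```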
If the purpose is to display a list of postcodes for the user to choose from then I would adopt Neil Butterworth's suggestion of storing a 'sortable' version of the postcode in the database. The easiest way to create a sortable version is to pad all entries to nine characters:
two characters for the area (right-pad if shorter)
two for the district number (left-pad if shorter)
one for the district letter (pad if missing)
space
one for the sector
two for the unit
and GIR 0AA is a slight exception again. If you pad with spaces then the sort order should be correct. Examples using # to represent a space:
W1#1AA => W##1##1AA
WC1#1AA => WC#1##1AA
W10#1AA => W#10##1AA
W1W#1AA => W##1W#1AA
GIR#0AA => GI#R##0AA
WC10#1AA => WC10##1AA
WC1W#1AA => WC#1W#1AA
You need to right-pad the area if it's too short: left-padding produces the wrong sort order. All of the single letter areas - B, E, G, L, M, N, S, W - would sort before all of the two-letter areas - AB, AL, ..., ZE - if you left-padded
The district number needs to be left padded to ensure that the natural W1, W2, ..., W9, W10 order remains intact
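The padding scheme can be sketched as a Python sort key. This is only an illustration of the nine-character layout described above, using real spaces instead of #; the name `sortable_postcode` is hypothetical, and the regular expression is an assumed way of splitting the outward part.

```python
import re

# area letters, district digits, optional district letter, space, inward part
_POSTCODE = re.compile(r"^([A-Z]{1,2})(\d{1,2})([A-Z]?) (\d[A-Z]{2})$")

def sortable_postcode(code):
    """Pad a UK postcode to nine characters so that plain string sorting works."""
    code = code.strip().upper()
    if code == "GIR 0AA":
        area, number, letter, inward = "GI", "R", "", "0AA"   # hard-coded exception
    else:
        match = _POSTCODE.match(code)
        if match is None:
            raise ValueError(f"unrecognised postcode format: {code!r}")
        area, number, letter, inward = match.groups()
    # area right-padded to 2, district number left-padded to 2,
    # district letter padded to 1, then a space and the 3-character inward part
    return f"{area:<2}{number:>2}{letter:<1} {inward}"

codes = ["LS10 1AA", "LS2 1AA", "LS1 9ZZ"]
print(sorted(codes, key=sortable_postcode))
```

Storing the result of such a function in its own column would allow the database to sort, and index, on the padded value directly.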
I know this is a couple of years late, but I too have just experienced this problem.
I have managed to overcome it with the following code, so I thought I would share it, as I searched the internet and could not find anything!
mysql_query("SELECT SUBSTRING_INDEX(postcode,' ',1) as p1, SUBSTRING_INDEX(postcode,' ',-1) as p2 from `table` ORDER BY LENGTH(p1), p1, p2 ASC");
This code will take a Full UK postcode and split it into 2.
It will then order by the first part of the postcode followed by the second.
I'd be tempted to store the normalised postcode in the database along with the real postcode - that way you only do the string manipulation once, and you can use an index to help you with the sort.