Oracle SubStr for Description field - sql

Hello, I need help with the following scenario.
There is a table with Company_Cd and Company_Name. I need the first 2 words of the company name if it has 3 or more words, and only the first word if it has 2 words.
Example:
Company_Cd Company_Name
123 ABC SOLUTIONS INC
345 XYZ GLOBAL TECH SOLUTIONS
899 NOWHERE COMPANY INC LTD
654 QSW SOLUTIONS
Desired Output:
Company_Cd Company_Name
123 ABC SOLUTIONS
345 XYZ GLOBAL
899 NOWHERE COMPANY
654 QSW

You can use the instr function to find the 1st and 2nd occurrence of a space and then use substr accordingly:
SELECT c.company_name,
(
CASE
WHEN instr(c.company_name,' ',1,2) >0 THEN SUBSTR(c.company_name, 1, instr(c.company_name,' ',1,2) - 1)
WHEN instr(c.company_name,' ',1,2) =0 AND instr(c.company_name,' ',1,1) >0 THEN SUBSTR(c.company_name, 1, instr(c.company_name,' ',1,1) - 1)
ELSE c.company_name
END)
FROM customer c

Please find a query below for your use. Note that it relies on MySQL functions (IF, SUBSTRING_INDEX) and LIMIT, so it will not run on Oracle as written:
SELECT Company_Cd,
IF((length(Company_Name) - length(replace(Company_Name, ' ', '')) + 1) >= 3,
SUBSTRING_INDEX(Company_Name, ' ', 2),
IF((length(Company_Name) - length(replace(Company_Name, ' ', '')) + 1) >= 2,
SUBSTRING_INDEX(Company_Name, ' ', 1),
Company_Name)) as result
FROM company
LIMIT 20;

You can also use Regular Expression:
SELECT Company_Cd,
regexp_replace(company_name,'(((\w+)\s){'||CASE WHEN regexp_count(trim(company_name),' ') IN (0,1) THEN 1
ELSE 2
END||'}).*','\1' )
FROM customer;

select company_cd,
trim(substr(company_name, 1, instr(company_name || '  ', ' ', 1, 2) - 1))
from company_tbl;
This solution begins by adding two spaces at the end of company_name; then it finds the position of the second space in this extended string, it removes the second space and everything after it - and then it trims the remaining space at the end (only needed if the company name was a single word; if all company names were guaranteed to be at least two words, the solution would be even simpler).
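To make the pad, find the second space, cut, trim steps easier to follow, here is a minimal Python sketch of the same logic. It is only a model of the SQL expression, not Oracle code, and the function names first_two_words and shorten are made up for illustration. Note that the padded version always keeps up to two words, so a second helper shows the rule exactly as the question states it (two-word names reduced to one word).

```python
def first_two_words(name: str) -> str:
    """Model of trim(substr(name, 1, instr(name || '  ', ' ', 1, 2) - 1))."""
    padded = name + "  "                    # two extra spaces guarantee a 2nd space exists
    first = padded.index(" ")               # 0-based position of the 1st space
    second = padded.index(" ", first + 1)   # 0-based position of the 2nd space
    return padded[:second].strip()          # cut at the 2nd space, then trim


def shorten(name: str) -> str:
    """Rule from the question: 3+ words -> first two words, 2 words -> first word."""
    words = name.split()
    if len(words) >= 3:
        return " ".join(words[:2])
    return words[0] if words else name


for company in ["ABC SOLUTIONS INC", "XYZ GLOBAL TECH SOLUTIONS", "QSW SOLUTIONS", "QSW"]:
    print(company, "->", first_two_words(company), "|", shorten(company))
```

The two helpers only differ for two-word names such as QSW SOLUTIONS, which is the case to check against the desired output.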

Count frequencies of words separated with multiple spaces

I would like to count the occurrences of all words in a column. The tricky part is that the words in a row can be separated by long stretches of spaces rather than a single space.
This is a dummy example:
column_name
aaa bbb ccc ddd
[aaa]
bbb
bbb
So far I managed to use the following code
SELECT column_name,
SUM(LEN(column_name) - LEN(REPLACE(column_name, ' ', ''))+1) as counts
FROM
dbo.my_own
GROUP BY
column_name
The code gives me something like this
column_name counts
aaa bbb ccc ddd 1
[aaa] 1
bbb 2
However, my desired output is:
column_name counts
aaa 1
[aaa] 1
bbb 3
ccc 1
ddd 1
In SQL Server (2016 or later), you would use string_split():
select s.value as word, count(*)
from dbo.my_own o cross apply
string_split(o.column_name, ' ') s
where s.value <> ''
group by s.value;
String manipulation is highly database-dependent. Most databases have some method for doing this, but they can be quite different.
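For comparison, here is a small Python sketch of the same split, drop empty pieces, count idea. The sample rows are an assumption based on the dummy example in the question (with extra spaces put back in), and word_counts is a hypothetical helper name.

```python
from collections import Counter

# Assumed sample data: the dummy column from the question, with runs of spaces
rows = ["aaa   bbb    ccc ddd", "[aaa]", "bbb", "bbb"]


def word_counts(values):
    """Count every word across all rows, ignoring repeated spaces."""
    counts = Counter()
    for value in values:
        # str.split() without arguments collapses runs of whitespace,
        # which plays the role of the "where s.value <> ''" filter
        counts.update(value.split())
    return counts


counts = word_counts(rows)
for word, n in sorted(counts.items()):
    print(word, n)
```

The point is the same as in the SQL: splitting on a single space yields empty strings between consecutive spaces, and those have to be discarded before grouping.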
First, take a look at this question to see how to split the words in your column into multiple rows. In that question the words are separated by comma, but, of course, it works the same with spaces.
For your case, assuming a table tablename with an id and your words in columnname, where you have at most 4 words in the column, it would look like this:
SELECT
tablename.id,
SUBSTRING_INDEX(SUBSTRING_INDEX(tablename.columnname, ' ', numbers.n), ' ', -1) columnname
FROM
(SELECT 1 AS n UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4) numbers
INNER JOIN tablename
ON LENGTH(tablename.columnname) - LENGTH(REPLACE(tablename.columnname, ' ', '')) >= numbers.n - 1
ORDER BY
id, n
Then, you can simply count the words:
SELECT columnname, count(*) FROM (
SELECT
tablename.id,
SUBSTRING_INDEX(SUBSTRING_INDEX(tablename.columnname, ' ', numbers.n), ' ', -1) columnname
FROM
(SELECT 1 AS n UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4) numbers
INNER JOIN tablename
ON LENGTH(tablename.columnname) - LENGTH(REPLACE(tablename.columnname, ' ', '')) >= numbers.n - 1
ORDER BY
id, n
) normalized
GROUP BY columnname
If you have more than 4 words in your column, you need to expand the select from numbers accordingly.
Edit: Oh, I am late, and I assumed MySQL.

Using regexp_substr in Oracle to select string up to the last occurrence of a space within \n characters length

We have an issue where a column in our Oracle database has a longer character length than a field in another system.
Therefore I am trying to use case statements along with substr in order to split strings that are more than 40 characters in length. My case statements so far do what I want them to do in the fact that it leaves the first 40 characters of a string in column_a and then puts the remainder of the string in column_b.
However, the problem that I have is that by just using substr, the strings are being split midway through words.
So I was wondering if anybody knew of a couple of regular expressions that I could use with regexp_substr that will:
select a string UP TO the last space within 40 characters, for column_a
select a string AFTER the last space within 40 characters, for column_b
These are the case statements that I have so far with substr:
CASE WHEN Length(column_a) > 10 THEN SubStr(column_a, 0, 40) END AS column_a,
CASE WHEN Length(column_a) > 40 THEN SubStr(column_a, 41) END AS column_b
I am not familiar with regular expressions at all and so any help would be very much appreciated!
I've solved with instr/substr:
select substr(column_a,1,instr(substr(column_a,1,40), ' ', -1 )) column1,
substr(column_a,instr(substr(column_a,1,40), ' ', -1 )+1, 40) column2
from table1
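As a rough illustration of the "cut at the last space that still fits" idea, here is a Python sketch. It is not a line-by-line port of the query above: it also leaves short strings untouched and falls back to a hard cut when there is no space at all, which are assumptions on my part. The name split_at_last_space is hypothetical.

```python
def split_at_last_space(text: str, limit: int = 40):
    """Return (head, tail) where head is at most `limit` characters
    and the cut falls on a space whenever one is available."""
    if len(text) <= limit:
        return text, ""                      # nothing to split
    # Last space at 0-based index <= limit, so head never exceeds `limit` chars
    cut = text.rfind(" ", 0, limit + 1)
    if cut <= 0:
        return text[:limit], text[limit:]    # no usable space: hard cut
    return text[:cut], text[cut + 1:]        # drop the space itself


sample = ("One Hundred Sixty-Nine Thousand Eight Hundred "
          "Seventy-Four Dollars And Nine Cents")
head, tail = split_at_last_space(sample)
print(repr(head), len(head))
print(repr(tail), len(tail))
```

Whether the tail also has to respect a maximum length is a separate question; this sketch only constrains the first part.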
A very similar problem was posted today on OTN. https://community.oracle.com/message/13928697#13928697
I posted a general solution, which will cover the problem proposed here as well. It may come in handy if there are similar needs in the future.
For the problem posted here on SO, the row_lengths table will have only one row, with r_id = 1 and r_len = 40. For demo purposes I am showing below an input_strings different from what I used on OTN.
Setup:
create table input_strings (str_id number, txt varchar2(500));
insert into input_strings values (1,
'One Hundred Sixty-Nine Thousand Eight Hundred Seventy-Four Dollars And Nine Cents');
insert into input_strings values (2, null);
insert into input_strings values (3, 'Mathguy rules');
create table row_lengths (r_id number, r_len number);
insert into row_lengths values (1, 40);
commit;
select * from input_strings;
STR_ID TXT
------- ---------------------------------------------------------------------------------
1 One Hundred Sixty-Nine Thousand Eight Hundred Seventy-Four Dollars And Nine Cents
2
3 Mathguy rules
3 rows selected
select * from row_lengths;
R_ID R_LEN
------- ----------
1 40
1 row selected.
Query and output: (NOTE: I include token length to verify that the first token is no more than 40 characters. OP did not answer if the SECOND token can be more than 40 characters; if it can't, one can add rows to the row_lengths table, perhaps with r_len = 40 for every row.)
with
r ( r_id, r_len ) as (
select r_id , r_len from row_lengths union all
select max(r_id) + 1, 4000 from row_lengths union all
select max(r_id) + 2, null from row_lengths
),
b (str_id, str, r_id, token, prev_pos, new_pos) as (
select str_id, txt || ' ', -1, null, null, 0
from input_strings
union all
select b.str_id, b.str, b.r_id + 1,
substr(str, prev_pos + 1, new_pos - prev_pos - 1),
b.new_pos,
new_pos + instr(substr(b.str, b.new_pos + 1, r.r_len + 1) , ' ', -1)
from b join r
on b.r_id + 2 = r.r_id
)
select str_id, r_id, token, nvl(length(token), 0) as len
from b
where r_id > 0
order by str_id, r_id;
 STR_ID    R_ID TOKEN                                       LEN
------- ------- ------------------------------------------- ---
      1       1 One Hundred Sixty-Nine Thousand Eight        37
      1       2 Hundred Seventy-Four Dollars And Nine Cents  43
      2       1                                               0
      2       2                                               0
      3       1 Mathguy rules                                13
      3       2                                               0
6 rows selected.

Get max number from table add one and check with specific convention

I have to generate an article number based on a convention, which is as follows.
The number of digits in each part:
{1 or 2 or 3}.{4 or 5}.{n}
Example product numbers:
7.1001.1
1.1453.1
3.5436.1
12.7839.1
12.3232.1
13.7676.1
3.34565.1
12.56433.1
247.23413.1
The first part identifies the producer (producent), and every producer has its own number. Let's say Reebok is 12, Nike is 256 and Umbro is 3.
I have to pass this number and check whether the table has any rows for it, e.g. if I pass 12 then I should get everything that starts with 12.
Then there are three cases to handle:
1st case: there are no rows in the table:
then return 1001
2nd case: there are rows,
so for sure there is already at least one:
12.1001.1
and more if they are let's say:
12.1002.1
12.1003.1
...
12.4345.1
so the next one should be returned: 4346
and if this producer is about to reach 5 digits, let's say:
12.1002.1
12.1003.1
...
12.9999.1
so the next one should be returned: 10001
3rd case: in fact the same as the 2nd, but the second part has reached 9999:
12.1001.1
...
12.9999.1
then the returned value should be: 10001
or
12.1002.1
12.1003.1
...
12.9999.1
12.10001.1
12.10002.1
so the next one should be returned: 10003
I hope you see what I mean.
I have already started something. This code takes the producer number, looks for all rows starting with it and then simply adds 1 to the second part. Unfortunately, I am not sure how I should change it to cover those 3 cases.
select
parsename(max(nummer), 3) + '.' -- 3
+ ltrim(max(cast(parsename(nummer, 2) as int) +1)) -- 5436 -> 5437
+ '.1'
from tbArtikel
where Nummer LIKE '3.%'
I'm counting on your help. If something is unclear, let me know.
Additional question:
Using cmd As New SqlCommand("SELECT CASE WHEN r.number Is NULL THEN 1001
WHEN r.number = 9999 THEN 10001
Else r.number + 1 End number
FROM (VALUES(@producentNumber)) AS a(art) -- this will search this number within inner query And make case..
LEFT JOIN(
-- Get producent (in Like) number And max number Of it (without Like it Get all producent numbers And their max number out Of all
SELECT PARSENAME(Nummer, 3) art,
MAX(CAST(PARSENAME(Nummer, 2) AS INT)) number
FROM tbArtikel WHERE Nummer Like '@producentNumber' + '[.]%'
GROUP BY PARSENAME(Nummer, 3)
) r
On r.art = a.art", con)
cmd.CommandType = CommandType.Text
cmd.Parameters.AddWithValue("@producentNumber", producentNumber)
A fairly straightforward way is to (ab)use PARSENAME to split the string to be able to extract the current maximum. An outer query can then just implement the rules for the value being missing/9999/other.
The value (12 here) is inserted in a table value constructor to be able to detect a missing value using a LEFT JOIN.
SELECT CASE WHEN r.number IS NULL THEN 1001
WHEN r.number = 9999 THEN 10001
ELSE r.number + 1 END number
FROM ( VALUES(12) ) AS a(category)
LEFT JOIN (
SELECT PARSENAME(prodno, 3) category,
MAX(CAST(PARSENAME(prodno, 2) AS INT)) number
FROM products
GROUP BY PARSENAME(prodno, 3)
) r
ON r.category = a.category;
An SQLfiddle to test with.
As a further optimization, you could add a WHERE prodno LIKE '12[.]%' in the inner query to not parse through un-necessary rows.
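To spell out the three rules independently of any SQL dialect, here is a minimal Python sketch. The function name next_number and the in-memory list of existing numbers are assumptions for illustration; in the real case the maximum would come from the query.

```python
def next_number(existing, producer):
    """Return the next value for the middle part of '<producer>.<middle>.<n>'.

    Rules from the question:
      - no rows for this producer  -> 1001
      - current maximum is 9999    -> 10001
      - otherwise                  -> maximum + 1
    """
    prefix = f"{producer}."   # the trailing dot stops producer 1 from matching 12.x.y
    middles = [int(number.split(".")[1])
               for number in existing
               if number.startswith(prefix)]
    if not middles:
        return 1001
    highest = max(middles)
    return 10001 if highest == 9999 else highest + 1


numbers = ["12.1001.1", "12.4345.1", "3.9999.1", "247.9999.1", "247.10002.1"]
for producer in (12, 3, 247, 8):
    print(producer, next_number(numbers, producer))
```

The prefix check with the trailing dot corresponds to the LIKE '12[.]%' filter: without the dot, a search for producer 1 would also pick up rows for 12 or 13.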
I don't fully understand what you're asking for, and I am unsure about the examples, but if I were doing it I'd try to break the field into 3 fields first and then do something with them.
sqlfiddle
SELECT nummer,LEFT(nummer,first-1) as field1,
RIGHT(LEFT(nummer,second-1),second-first-1) as field2,
RIGHT(nummer,LEN(nummer)-second) as field3
FROM
(SELECT nummer,
CHARINDEX('.',nummer) as first,
CHARINDEX('.',nummer,CHARINDEX('.',nummer)+1)as second
from tbArtikel)T
Hopefully, with the 3 fields broken up, it's much easier to apply logic to them now.
Update:
Okay, I reread your question and I think I know what you're getting at.
If the user searches for a value that doesn't exist, for example 8,
then you want 1001 returned.
If they search for anything else that has results, then return the max+1,
unless it's 9999, in which case return 10001.
If this is correct, then check this sqlfiddle2
DECLARE @search varchar(20)
SET @search = '8'
SELECT field1,max(nextvalue) as nextvalue FROM
(SELECT field1,
MAX(CASE (field2)
WHEN 9999 THEN 10001
ELSE field2+1
END) as nextvalue
FROM
(SELECT nummer,
CAST(LEFT(nummer,first-1) as INTEGER) as field1,
CAST(RIGHT(LEFT(nummer,second-1),second-first-1) as INTEGER) as field2,
CAST(RIGHT(nummer,LEN(nummer)-second) as INTEGER) as field3
FROM
(SELECT nummer,
CHARINDEX('.',nummer) as first,
CHARINDEX('.',nummer,CHARINDEX('.',nummer)+1)as second
FROM tbArtikel
)T
)T2
GROUP BY field1
UNION
SELECT CAST (@search as INTEGER)as field1 ,1001
)T3
WHERE field1 = @search
GROUP BY field1
Just change the @search variable to see the results
I think there might be a cleaner way to do this but it's not coming to me right now :(
If you really can't add 2 new fields (probably the simplest and fastest solution), and probably can't add a function-based index, you must extract the 2nd part of the number, take its maximum, increment it, then concatenate it with the 1st part you searched for and '.1' at the end:
SELECT :par1 || '.' || (Max(To_Number(SubStr(nummer, dot1 + 1, dot2 - dot1 -1 ))) + 1) || '.1' NEW_number
--SELECT SubStr(nummer, 1, dot1 - 1) N1st, SubStr(nummer, dot1 + 1, dot2 - dot1 -1 ) N2nd, SubStr(nummer, dot2 + 1) N1th
FROM (
SELECT nummer, InStr(nummer, '.') dot1, InStr(nummer, '.', 1, 2) dot2
FROM tbArtikel
WHERE nummer LIKE :par1 || '.%')
;
--GROUP BY SubStr(nummer, 1, dot1 - 1)
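That was for Oracle SQL. I don't have SQL Server to test with, but this is probably the simplest answer there (assuming @par1 is a character type; the incremented number is cast back to varchar so that + concatenates instead of adding):
select @par1 + '.' + cast((select max(cast(SUBSTRING(nummer, CHARINDEX( '.', nummer, 1 ) +1, CHARINDEX( '.', nummer, CHARINDEX( '.', nummer, 1 ) +1 ) - CHARINDEX( '.', nummer, 1 ) -1) as int)) + 1 from tbArtikel where nummer LIKE @par1 + '.%') as varchar(10)) + '.1'
PARSENAME is a built-in SQL Server function that returns one part of a dot-separated name, so parsename(nummer, 2) gives the 2nd number and the query becomes:
select @parm + '.' + cast(max(cast(parsename(nummer, 2) as int)) + 1 as varchar(10)) + '.1'
from tbArtikel
where Nummer LIKE @parm + '.%'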

sort by second string in database field

I have the SQL statement below, which is meant to sort an address field (AddressLine1) by the street name rather than the number. It runs, but I want the street names to appear alphabetically, and the ASC at the end of the ORDER BY doesn't help.
For example, the AddressLine1 field might contain
"5 Elm Close". A normal ORDER BY sorts by the number; the statement below should sort by the 2nd string, "Elm".
(Using SQL Server)
SELECT tblcontact.ContactID, tblcontact.Forename, tblcontact.Surname,
tbladdress.AddressLine1, tbladdress.AddressLine2
FROM tblcontact
INNER JOIN tbladdress
ON tblcontact.AddressID = tbladdress.AddressID
LEFT JOIN tblDonate
ON tblcontact.ContactID = tblDonate.ContactID
WHERE (tbladdress.CollectionArea = 'Queens Park')
GROUP BY tblcontact.ContactID, tblcontact.Forename, tblcontact.Surname,
tbladdress.AddressLine1, tbladdress.AddressLine2
ORDER BY REVERSE(LEFT(REVERSE(tbladdress.AddressLine1),
charindex(' ', REVERSE(tbladdress.AddressLine1)+' ')-1)) asc
Gordon's statement sorts as below
1 Kings Road
10 Olivier Way
11 Albert Street
11 Kings Road
11 Princes Road
120 High Street
Try this: I based it off of Gordon's code, but altered it to remove the LEFT(AddressLine1, 1) portion, since a single-character string could never match the pattern "n + space + %".
This works on my SQL-Server 2012 environment:
WITH tbladdress AS
(
SELECT AddressLine1 FROM (VALUES ('1 Kings Road'),('10 Olivier Way'), ('11 Albert Street')) AS V(AddressLine1)
)
SELECT
AddressLine1
FROM tbladdress
order by (case when tbladdress.AddressLine1 like '[0-9]% %'
then substrING(tbladdress.AddressLine1, charindex(' ', tbladdress.AddressLine1) + 1, len(tbladdress.AddressLine1))
else tbladdress.AddressLine1
end)
This is edited to be more similar to Gordon's code (position of closing parentheses, substr instead of substring):
order by (case when tbladdress.AddressLine1 like '[0-9]% %'
then substr(tbladdress.AddressLine1, charindex(' ', tbladdress.AddressLine1) + 1), len(tbladdress.AddressLine1)
else tbladdress.AddressLine1
end)
If you assume that the street name is the first or second value in a space separated string, you could try:
order by (case when left(tbladdress.AddressLine1, 1) like '[0-9]% %'
then substr(tbladdress.AddressLine1, charindex(' ', tbladdress.AddressLine1) + 1), len(tbladdress.AddressLine1) )
else tbladdress.AddressLine1
end)
I don't think you need to use REVERSE() at all. That seems like a trap.
ORDER BY
CASE
WHEN ISNUMERIC(LEFT(tbladdress.AddressLine1,CHARINDEX(' ',tbladdress.AddressLine1) - 1)) = 1
THEN RIGHT(tbladdress.AddressLine1,LEN(tbladdress.AddressLine1) - CHARINDEX(' ',tbladdress.AddressLine1))
ELSE tbladdress.AddressLine1
END,
CASE
WHEN ISNUMERIC(LEFT(tbladdress.AddressLine1,CHARINDEX(' ',tbladdress.AddressLine1) - 1)) = 1
THEN CAST(LEFT(tbladdress.AddressLine1,CHARINDEX(' ',tbladdress.AddressLine1) - 1) AS INT)
ELSE NULL
END
Also, you have a GROUP BY with no aggregate function. While that's not wrong, per se, it is weird. Just use DISTINCT if you're getting duplicate records.
This is the bit of code that works in sql server
order by (case when tbladdress.AddressLine1 like '[0-9]% %'
then substrING(tbladdress.AddressLine1, charindex(' ', tbladdress.AddressLine1) + 1, len(tbladdress.AddressLine1))
else tbladdress.AddressLine1
end)
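If it helps to see the ordering rule outside of SQL, here is a small Python sketch of the same idea: when the address starts with a digit and contains a space, sort by everything after the first space, and use the house number as a tie-breaker. The helper name street_sort_key is made up, and the sample addresses are the ones listed in the question.

```python
def street_sort_key(address: str):
    """Sort key: (street name, house number) for 'number street' addresses,
    otherwise the whole address."""
    head, sep, rest = address.partition(" ")   # split at the first space only
    if sep and head[:1].isdigit():             # like the pattern '[0-9]% %'
        number = int(head) if head.isdecimal() else 0
        return (rest.lower(), number)
    return (address.lower(), 0)


addresses = [
    "1 Kings Road", "10 Olivier Way", "11 Albert Street",
    "11 Kings Road", "11 Princes Road", "120 High Street",
]
ordered = sorted(addresses, key=street_sort_key)
for line in ordered:
    print(line)
```

The numeric tie-breaker is what keeps 1 Kings Road ahead of 11 Kings Road; sorting the leading number as text would not guarantee that for all house numbers.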

SQL - field concatenation, based on variable

I have a need to build a string from Last Name, First Name, Middle Initial according to the following rules:
If the Last Name is unique, just return the Last Name
If the Last Name isn't unique, but the first letter of the First Name is unique, return Last Name + first letter of First Name
If the Last Name and first letter of the First Name are not unique, return the Last Name + first letter of First Name + Middle Initial.
For example, the table might be:
MDC MDLast MDFirst MDInit
3 Jones Fred A
21 Smith Sam D
32 Brown Tom E
42 Brown Ted A
55 Smith Al D
The query should return:
MDC MDFormattedName
3 Jones
21 Smith S
32 Brown TE
42 Brown TA
55 Smith A
I've written up a query that almost works, but it is using several nested queries, and will still need several more to (possibly) make a workable solution, and is so inefficient. I'm sure there is a 'proper' way to implement this (for SQL Server 2005, BTW).
This is what I've got so far. It doesn't work: because of the aggregations I lose the IDs and can't do the final join to get ID/Name pairs.
select
CASE
WHEN CountLastFirst > 1 THEN
CASE WHEN MDInit IS NOT NULL THEN MDLastFirst + LEFT(MDInit,1) ELSE MDLastFirst END
WHEN CountLastFirst = 1 AND CountLast > 1 THEN MDLastFirst
ELSE MDLast
END as MDName
FROM
(
select x.MDLast, CountLast, MDLastFirst, CountLastFirst FROM
(
select MDLast,Count(MDLast) as CountLast FROM
MDList
GROUP BY MDLast) as x
INNER JOIN
(select MDLast, MDLastFirst,Count(MDLastFirst) as CountLastFirst FROM
(
select MDLast,
MDLast + ' ' + LEFT(MDFirst,1) as MDLastFirst
From MDList
) as a
GROUP BY MDLastFirst, MDLast) as y ON x.MDLast = y.MDLast
) as z
Assuming a table name of MDCTable, this should work:
SELECT MDCTable.MDC,
CASE MDCCount.NameCount
WHEN 1
THEN MDCTable.MDLast
ELSE
CASE MDFormat1Count
WHEN 1
THEN MDFormat1.MDFormat1Name
ELSE MDCTable.MDLast + ' ' + upper(left(MDCTable.MDFirst, 1)) +
MDCTable.MDInit
END
END AS MDFormattedName
FROM MDCTable
INNER JOIN
(
SELECT COUNT(MDLast) as NameCount, MDLast
FROM MDCTable
GROUP BY MDLast
) MDCCount ON MDCCount.MDLast = MDCTable.MDLast
INNER JOIN (
SELECT COUNT(MDLast + left(MDFirst, 1)) as MDFormat1Count, MDLast + ' ' +
left(MDFirst, 1) AS MDFormat1Name
FROM MDCTable
GROUP BY MDLast + ' ' + left(MDFirst, 1)
) MDFormat1 ON MDCTable.MDLast + ' ' + left(MDCTable.MDFirst, 1) =
MDFormat1.MDFormat1Name
ORDER BY MDCTable.MDC
Have you considered performing this operation in your application instead of directly in an SQL statement? Unless you have a good reason to do this directly in SQL, this is almost always the preferable approach for situations like this.
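As a rough idea of what the application-side version could look like, here is a Python sketch that applies the three rules with two counting passes. The row layout (MDC, MDLast, MDFirst, MDInit) follows the example table in the question; formatted_names is a hypothetical helper and the data is hard-coded only for the demo.

```python
from collections import Counter

# (MDC, MDLast, MDFirst, MDInit), copied from the example table
doctors = [
    (3, "Jones", "Fred", "A"),
    (21, "Smith", "Sam", "D"),
    (32, "Brown", "Tom", "E"),
    (42, "Brown", "Ted", "A"),
    (55, "Smith", "Al", "D"),
]


def formatted_names(rows):
    """Map each MDC to the shortest name form that is unique per the rules."""
    last_counts = Counter(last for _, last, _, _ in rows)
    last_first_counts = Counter(
        (last, first[:1].upper()) for _, last, first, _ in rows
    )
    result = {}
    for mdc, last, first, init in rows:
        letter = first[:1].upper()
        if last_counts[last] == 1:
            result[mdc] = last                               # last name is unique
        elif last_first_counts[(last, letter)] == 1:
            result[mdc] = f"{last} {letter}"                 # last + first letter
        else:
            result[mdc] = f"{last} {letter}{init or ''}"     # add the middle initial
    return result


names = formatted_names(doctors)
for mdc, name in names.items():
    print(mdc, name)
```

Keeping the ID in every row while counting separately is what avoids the problem described in the question, where the aggregation threw the IDs away.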