String operations in BigQuery - sql

Adams Allen "prop1-prop2-pro3"
Burns Bonnie "prop1Burns-prop2bon-prop3-ch"
Cannon Charles "prop1a-prop2b-prop3c"
I have the above table stored in BigQuery, and the 3rd column is guaranteed to have 3 properties separated by '-'.
I want to do string operations on the 3rd column and return something like 'custom_string-prop1-custom_string2-prop2' for each row. How do I do this in BigQuery?

You can use split():
select split('prop1-prop2-pro3', '-')[ordinal(1)] as custom1,
split('prop1-prop2-pro3', '-')[ordinal(2)] as custom2,
split('prop1-prop2-pro3', '-')[ordinal(3)] as custom3
You can also just put them into an array, which may be just as convenient, or more so:
select split(col, '-') as prop_array
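To get from the split pieces to the string the question asks for, the parts can be glued back together with CONCAT(). The following is only a sketch: the CTE name people and the column names last_name, first_name and props are made up for illustration, and the inline rows simply mirror the sample data from the question.

```sql
-- Hypothetical names: people, last_name, first_name, props.
WITH people AS (
  SELECT 'Adams' AS last_name, 'Allen' AS first_name, 'prop1-prop2-pro3' AS props UNION ALL
  SELECT 'Burns', 'Bonnie', 'prop1Burns-prop2bon-prop3-ch' UNION ALL
  SELECT 'Cannon', 'Charles', 'prop1a-prop2b-prop3c'
)
SELECT
  last_name,
  first_name,
  -- ORDINAL() is 1-based and raises an error if the element does not exist.
  CONCAT('custom_string-', parts[ORDINAL(1)],
         '-custom_string2-', parts[ORDINAL(2)]) AS result
FROM (
  SELECT *, SPLIT(props, '-') AS parts
  FROM people
);
```

If some rows might have fewer parts than expected, SAFE_ORDINAL() returns NULL instead of raising an error.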

Related

Extract all text except the last word from a column using SQL

Let's say I have a column in my items table called name:
name
----
Wrench
Hammer (label1)
Screwdriver (label1) (label2)
Tape Measure (label1) (label2) (label3)
I want to write a PostgreSQL query to extract all text except the last label (if it exists). So given the data above, I want to end up with:
substring
---------
Wrench
Hammer
Screwdriver (label1)
Tape Measure (label1) (label2)
How can I do this?
Use substring with a regular expression.
The syntax is:
substring(string, regularExpression)
The regular expression should use () to delimit what part of the string to extract. For example:
substring('abcef', 'b(..)')
will return 'ce', the two characters that follow b. If the regular expression does not match the string it returns NULL.
Specifically in this case:
dmg#[local] # select substring('Hammer (label1)' from '^(.+)\([^\)]+\)$') ;
substring
-----------
Hammer
(1 row)
dmg#[local] # select substring('Tape Measure (label1) (label2) (label3)' from '^(.+)\([^\)]+\)$') ;
substring
---------------------------------
Tape Measure (label1) (label2)
(1 row)
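Two details are worth keeping in mind with this pattern: a name without any label (such as 'Wrench') does not match, so substring returns NULL, and the captured group keeps the space that preceded the last label. A possible way to handle both, shown here only as a sketch against the sample rows from the question, is to wrap the call in rtrim() and coalesce():

```sql
select name,
       -- fall back to the original name when there is no trailing label
       coalesce(rtrim(substring(name from '^(.+)\([^\)]+\)$')), name) as without_last_label
from (values
    ('Wrench'),
    ('Hammer (label1)'),
    ('Screwdriver (label1) (label2)'),
    ('Tape Measure (label1) (label2) (label3)')
) v(name);
```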
regexp_replace() is a simple way to handle this:
select v.name,
regexp_replace(name, ' [(][^)]+[)]$', '')
from (values
('Wrench'),
('Hammer (label1)'),
('Screwdriver (label1) (label2)'),
('Tape Measure (label1) (label2) (label3)')
) v(name);

How to extract alphanumeric phrase from a string with an alphanumeric phrase AND alphabetical characters

I have a column that can have the following possible values -
ITO26218361281- JANE
SBC28791827135 VATS
SOT21092832917 JOHN DOE
TIM INQ12109283291
JANE DOE 12/15
I only want to extract the 14 digit alphanumeric phrase from the strings that can look like above. If the record is like (5), I still want that record to exist to be able to call it out as an error. I don't need the exact text to be the same, I just need it to be flagged for error.
Result expected -
ITO26218361281
SBC28791827135
SOT21092832917
INQ12109283291
JANE DOE 12/15 (or flagged as error)
I will assume that you are working on a fairly recent SQL Server version and have access to the STRING_SPLIT() function (SQL Server 2016 and later).
Split the values with the string_split() function. This gives you a value column to work with.
Remove the dash (-) from your first example row with a replace() function where required. If there are many random trailing characters you might want to prep your data a bit more.
The first 3 characters should not be numeric.
All remaining characters have to be numeric.
Sample data
create table data
(
input nvarchar(50)
);
insert into data (input) values
('ITO26218361281- JANE'),
('SBC28791827135 VATS'),
('SOT21092832917 JOHN DOE'),
('TIM INQ121092832917');
Solution
select replace(s.value, '-', '') as result
from data d
cross apply string_split(d.input, ' ') s
where isnumeric(left(s.value, 3)) = 0
and isnumeric(substring(replace(s.value, '-', ''), 4, 100)) = 1;
Result
result
---------------
ITO26218361281
SBC28791827135
SOT21092832917
INQ121092832917
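The query above drops rows that contain no matching token, whereas the question wants such rows kept and flagged. One way to do that, sketched here as a variation rather than as the answerer's method, is to move the split into an OUTER APPLY and fall back to a marker value. This sketch also swaps the isnumeric() checks for a stricter LIKE pattern (3 letters followed by 11 digits, which is the shape of the 14 character codes in the question), because isnumeric() would accept a token such as '12/15'. The 'ERROR' marker and the inline rows are illustrative.

```sql
select d.input,
       coalesce(m.result, 'ERROR') as result   -- 'ERROR' is an arbitrary marker
from (values
    ('ITO26218361281- JANE'),
    ('SBC28791827135 VATS'),
    ('SOT21092832917 JOHN DOE'),
    ('TIM INQ12109283291'),
    ('JANE DOE 12/15')
) d(input)
outer apply (
    -- keep at most one token per row that looks like 3 letters + 11 digits
    select top (1) replace(s.value, '-', '') as result
    from string_split(d.input, ' ') s
    where replace(s.value, '-', '') like '[A-Z][A-Z][A-Z]' + replicate('[0-9]', 11)
) m;
```

With these rows the first four should return their code and the last one the marker. Note that the pattern is exact about length, so a 15 character value like the fourth row of the sample data above would be flagged as well.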

SQL Query to Split a string, format each result, then build back into a single row

I've been working on a rather convoluted process to format some data for a work project. We received a data extract and need to export it for import during a migration, but some of the data won't import due to case sensitivity (user logons with sentence case for example).
In an ideal world, I could demand the data be sanitised and formatted before it's provided for me to build the import, but I can't, so I'm stuck where I have to format it myself.
Plan:
Take string result
Split string result by pipe delimitation
Format each split results ( ) into lower case (where applicable)
Put all split results back into one string using FOR XML PATH
Example of problem:
Field 'Assigned To' can contain a pipe delimited string of users and/or user groups, e.g.
John Smith (jsmith)|College Group of Arts|Bob Jones (BJones)
Now as you can see above, John Smith (jsmith) looks fine, as does College Group of Arts, however Bob Jones has had his logon sentence cased, so I need to use a LOWER command, chained with SUBSTRING and CHARINDEX to convert the logon to lower. Standalone, this approach works fine, but the problem I'm having is where I'm using a function found here on Stack Overflow (slightly manipulated to account for pipe delimitation) T-SQL split string.
When I retrieve the table results of the split string, I can't apply CHARINDEX against any characters in the result string, and I can't work out why.
Scenario:
The raw data extract, untouched, returns the below when queried;
|College of Science Administrators|Bob Jones (BJones)|
I then apply the below query, which calls the function queried above;
declare @assignedto nvarchar(max) = (select assigned_to from project where project_id = 1234)
SELECT SUBSTRING(Name,CHARINDEX(Name,'('),255)
FROM dbo.splitstring(@assignedto)
I then get the below results;
College of Science Administrators
Bob Jones (BJones)
What I'd expect to see is;
College of Science Administrators
(BJones)
I could then apply my LOWER logic to change it to lower case.
If that worked, my thought process was then to take those results and pass them back into a single string using FOR XML PATH.
So I guess technically, there are 2 questions here;
Why won't my function let me manipulate the results with CHARINDEX?
And is this the best way to do what I'm trying to achieve overall?
I would strongly suggest you take that splitstring function you found and throw it away. It is horribly inefficient and doesn't even take the delimiter as a parameter. There are so many better splitter options available. One such example is the DelimitedSplit8K_LEAD which can be found here.
I noticed you also have your delimiters at the beginning and the end so you have to eliminate those but not a big deal. Here is how I would go about parsing this string. I am using a variable for your string here with the value you said is in your table.
declare @Something varchar(100) = '|College of Science Administrators|Bob Jones (BJones)|'
select MyOutput = case when charindex('(', x.Item) > 1
                       then substring(x.Item, charindex('(', x.Item), len(x.Item))
                       else x.Item
                  end
from dbo.DelimitedSplit8K_LEAD(@Something, '|') x
where x.Item > ''
For question #1, you simply need to swap the parameters to CHARINDEX. The expression to find comes first, and the string to search comes second:
CHARINDEX('(', Name)
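For the second half of the question (lower-casing only the logon and rebuilding one string), the pieces can be combined roughly as follows. This is a sketch under assumptions: it presumes dbo.DelimitedSplit8K_LEAD is installed and returns ItemNumber and Item columns, and it uses the FOR XML PATH approach the question mentions.

```sql
declare @Something varchar(100) = '|College of Science Administrators|Bob Jones (BJones)|';

select stuff((
    select '|' + case
                     when charindex('(', x.Item) > 0
                     -- keep the display name, lower-case from the bracket onwards
                     then left(x.Item, charindex('(', x.Item) - 1)
                          + lower(substring(x.Item, charindex('(', x.Item), len(x.Item)))
                     else x.Item
                 end
    from dbo.DelimitedSplit8K_LEAD(@Something, '|') x
    where x.Item > ''                 -- skip the empty items from the outer pipes
    order by x.ItemNumber             -- preserve the original order
    for xml path(''), type
).value('.', 'varchar(max)'), 1, 1, '') as Rebuilt;
```

The stuff() call strips the leading pipe that the concatenation adds, so the rebuilt string has no pipes at either end; add them back if the import expects them.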

SQL - Split string with multiple delimiters into multiple rows and columns

I am trying to split a string in SQL with the following format:
'John, Mark, Peter|23, 32, 45'.
The idea is to have all the names in the first columns and the ages in the second column.
The query should be "dynamic", the string can have several records depending on user entries.
Does anyone know how to do this, and if possible without SQL functions? I have tried the cross apply approach but I wasn't able to make it work.
Any ideas?
This solution uses Jeff Moden's DelimitedSplit8K. Why? Because it returns the ordinal position of each item, something that many other splitters, including Microsoft's own STRING_SPLIT, do not provide. That position is vitally important for getting this to work correctly.
Once you have that, the solution becomes fairly simple:
DECLARE @NameAges varchar(8000) = 'John, Mark, Peter|23, 32, 45';
WITH Splits AS (
    SELECT S1.ItemNumber AS PipeNumber,
           S2.ItemNumber AS CommaNumber,
           S2.Item
    FROM dbo.DelimitedSplit8K (REPLACE(@NameAges,' ',''), '|') S1 --As you have spaces between the delimiters I've removed these. Be CAREFUL with that
    CROSS APPLY dbo.DelimitedSplit8K (S1.Item, ',') S2)
SELECT S1.Item AS [Name],
       S2.Item AS Age
FROM Splits S1
JOIN Splits S2 ON S1.CommaNumber = S2.CommaNumber
              AND S2.PipeNumber = 2
WHERE S1.PipeNumber = 1;
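For readers on SQL Server 2022 or later (or Azure SQL Database), STRING_SPLIT accepts an optional third argument, enable_ordinal, which adds an ordinal column to its output. Assuming that version is available, the same pairing idea can be sketched without a custom splitter. TRIM() is used here instead of removing all spaces, so names containing spaces would survive.

```sql
DECLARE @NameAges varchar(8000) = 'John, Mark, Peter|23, 32, 45';

WITH Splits AS (
    SELECT P.ordinal AS PipeNumber,      -- 1 = names, 2 = ages
           C.ordinal AS CommaNumber,     -- position within each list
           TRIM(C.value) AS Item
    FROM STRING_SPLIT(@NameAges, '|', 1) P
    CROSS APPLY STRING_SPLIT(P.value, ',', 1) C
)
SELECT N.Item AS [Name],
       A.Item AS Age
FROM Splits N
JOIN Splits A ON A.CommaNumber = N.CommaNumber
             AND A.PipeNumber = 2
WHERE N.PipeNumber = 1;
```

On older versions the third argument is not accepted, so the DelimitedSplit8K approach above remains the option there.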

Trying to remove spaces between initials in sql

I have a column that contains a person's name and I need to extract it to pass to another system, but I need to remove the spaces only from between the initials.
For example I might have
Mr A B Bloggs and I want Mr AB Bloggs or
Mrs A B C Bloggs and I want Mrs ABC Bloggs
As there are millions of records in the table, I won't know how many initials there are, or indeed if there are any initials. All I know is that the prefix (Mr, Mrs etc) will be more than 1 character, and so will the surname. I've tried using trim, replace and charindex, but obviously not in the right combination. Any help would be appreciated.
Unfortunately, SQL Server does not support regex. You have two options:
Use .NET in CLR to perform the transformation. This link explains how to implement regex in SQL Server using CLR: https://www.simple-talk.com/sql/t-sql-programming/clr-assembly-regex-functions-for-sql-server-by-example/.
The other option is to use a cursor to iterate through all the records and transform each entry. This may be slow for a large table. For example, you could write a function that returns the locations of spaces surrounded by single letters and then removes them. The trick is not to remove them until you have recorded all of them, and then to remove them from right to left so that the recorded locations do not shift.
Try this:
declare @test varchar(100)='Mrs A B C Bloggs'
select (substring (@test,0,charindex(' ',@test)))+' '+
replace(replace(replace(substring(@test,len((substring (@test,0,charindex(' ',@test))))+1,len(@test)),
(substring (@test,0,charindex(' ',@test))),''),reverse((substring (reverse(@test),0,charindex(' ',reverse(@test))))),''),' ','')
+' '+reverse((substring (reverse(@test),0,charindex(' ',reverse(@test)))))
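The same idea (keep the prefix, squeeze the spaces out of the middle, keep the surname) can be written a little more readably by computing the positions of the first and last space once. This is a sketch, not a drop-in replacement: the aliases first_space and last_space are made up, the input is assumed to have no leading or trailing spaces, and like the expression above it removes the spaces between all middle tokens, not just single-letter ones.

```sql
declare @test varchar(100) = 'Mrs A B C Bloggs';

select case
           when p.last_space > p.first_space
           then left(@test, p.first_space)                        -- prefix plus its space
                + replace(substring(@test, p.first_space + 1,
                                    p.last_space - p.first_space - 1), ' ', '')
                + substring(@test, p.last_space, len(@test))      -- space plus surname
           else @test                                             -- no initials present
       end as cleaned
from (select charindex(' ', @test) as first_space,
             len(@test) - charindex(' ', reverse(@test)) + 1 as last_space) p;
```

Because the case expression leaves names without initials untouched, a value such as 'Mr Bloggs' should come back without a doubled space. To run it over a table, replace @test with the name column and select from that table instead.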