Sql Server Splitting column value into email and name - sql

I am new to SQL. I need help separating two values from a single column value.
Example column value:
Sam Taylor <Sam.Taylor@gmail.com>
I need two columns from that column:
Name | Email
Sam Taylor | Sam.Taylor@gmail.com
TIA
https://www.db-fiddle.com/f/beu4tdDo4WFAwKXtt9KL8A/0

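DECLARE @yourField nvarchar(100) = 'Sam Taylor <Sam.Taylor@gmail.com>'
-- Name: everything before '<', with the trailing space removed
-- Email: everything between '<' and '>'
SELECT RTRIM(SUBSTRING(@yourField, 1, CHARINDEX('<', @yourField) - 1)) AS Name,
       SUBSTRING(@yourField, CHARINDEX('<', @yourField) + 1, CHARINDEX('>', @yourField) - CHARINDEX('<', @yourField) - 1) AS Email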

One way to do this is shown below, assuming the name itself never contains < or >.
select datastring,
name=max(case when row=1 then value else null end),
email=max(case when row=2 then value else null end)
from
(
select
datastring,
value=REPLACE(value,'>',''),
row=row_number() over (partition by datastring order by datastring)
from yourtable
cross apply STRING_SPLIT(datastring,'<')
)t
group by datastring

You can use string_split(), but like this:
select t.*, v.*
from t cross apply
(select max(case when s.value not like '%@%>' then trim(s.value) end) as name,
max(case when s.value like '%@%>' then replace(s.value, '>', '') end) as email
from string_split(t.full_email, '<') s
) v;
In older versions, you can use:
select ltrim(rtrim(left(full_email, charindex('<', full_email) - 1))) as name,
replace(stuff(full_email, 1, charindex('<', full_email), ''), '>', '') as email
from t;
Here is a db<>fiddle.
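One caveat with the LEFT/CHARINDEX variant for older versions: for a row that contains no '<', CHARINDEX returns 0, LEFT receives a length of -1, and SQL Server raises an "Invalid length parameter" error. A minimal sketch of one way to guard against that, assuming it is acceptable to return NULL for such rows. The inline VALUES rows and the aliases v, p and lt are made up for illustration:

```sql
-- Sample rows are hypothetical; replace the VALUES list with your own table.
SELECT v.full_email,
       LTRIM(RTRIM(LEFT(v.full_email, p.lt - 1))) AS name,
       REPLACE(SUBSTRING(v.full_email, p.lt + 1, LEN(v.full_email)), '>', '') AS email
FROM (VALUES ('Sam Taylor <Sam.Taylor@gmail.com>'),
             ('no brackets here')) AS v(full_email)
-- NULLIF turns "not found" (0) into NULL, so LEFT and SUBSTRING yield NULL instead of failing.
CROSS APPLY (SELECT NULLIF(CHARINDEX('<', v.full_email), 0) AS lt) AS p;
```

The first row should split into Sam Taylor and Sam.Taylor@gmail.com, and the second should come back with NULL in both columns rather than aborting the query.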

Related

Split one column with weird string into multiple columns by specific delimiter in single select sql

I am hoping for a simpler solution, although I already have a working one.
It is hard for me to accept that this is the only way, so I hope that an experienced SQL power user has a better idea.
Background:
A simple table that looks like this:
weirdstring | ID
A;GHL+BH;BC,NA-NB,[AB] | 1
B;GHL+BH;BC,NA-NB,[AB] | 2
C;GHL+BH;BC,NA-NB,[AB] | 3
CREATE TABLE TESTTABLE (weirdstring varchar(MAX),
ID int);
INSERT INTO TESTTABLE
VALUES ('A;GHL+BH;BC,NA-NB,[AB]', 1);
INSERT INTO TESTTABLE
VALUES ('B;GHL+BH;BC,NA-NB,[AB]', 2);
INSERT INTO TESTTABLE
VALUES ('C;GHL+BH;BC,NA-NB,[AB]', 3);
All I need in the end is the first 3 "letter-groups" (1-3 letters each) from weirdstring (e.g. for ID 1: A, GHL and BH; the rest of the string is not important for now) in separate columns:
ID | weirdstring | group1 | group2 | group3
1 | A;GHL+BH;BC,NA-NB,[AB] | A | GHL | BH
2 | B;GHL+BH;BC,NA-NB,[AB] | B | GHL | BH
3 | C;GHL+BH;BC,NA-NB,[AB] | C | GHL | BH
What has been done so far:
1. Change all the odd delimiters (; + - and potentially more) in the string to commas and remove the brackets around "letter-groups", using daisy-chained REPLACE calls. So A;GHL+BH;BC,NA-NB,[AB] first becomes A,GHL,BH,BC,NA,NB,AB.
2. Split the new string into columns, using the comma as the delimiter.
The query used is:
SELECT t1.ID,
t1.weirdstring,
t2.group1,
t2.group2,
t2.group3
FROM TESTTABLE t1
LEFT JOIN (SELECT grp1.ID,
grp1.weirdstring AS group1,
grp2.weirdstring AS group2,
grp3.weirdstring AS group3
FROM (SELECT ID,
weirdstring
FROM (SELECT ID,
weirdstring,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT NULL)) AS ROWNUM
FROM (SELECT ID,
value AS weirdstring
FROM TESTTABLE
CROSS APPLY STRING_SPLIT(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(weirdstring, '[', ''), ']', ''), ';', ','), '+', ','), '-', ','), '.', ','), ',')
WHERE weirdstring IS NOT NULL) splitted ) s1
WHERE ROWNUM = 1) grp1
LEFT JOIN (SELECT ID,
weirdstring
FROM (SELECT ID,
weirdstring,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT NULL)) AS ROWNUM
FROM (SELECT ID,
value AS weirdstring
FROM TESTTABLE
CROSS APPLY STRING_SPLIT(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(weirdstring, '[', ''), ']', ''), ';', ','), '+', ','), '-', ','), '.', ','), ',')
WHERE weirdstring IS NOT NULL) splitted ) s2
WHERE ROWNUM = 2) grp2 ON grp1.ID = grp2.ID
LEFT JOIN (SELECT ID,
weirdstring
FROM (SELECT ID,
weirdstring,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT NULL)) AS ROWNUM
FROM (SELECT ID,
value AS weirdstring
FROM TESTTABLE
CROSS APPLY STRING_SPLIT(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(weirdstring, '[', ''), ']', ''), ';', ','), '+', ','), '-', ','), '.', ','), ',')
WHERE weirdstring IS NOT NULL) splitted ) s3
WHERE ROWNUM = 3) grp3 ON grp3.ID = grp2.ID) t2 ON t1.ID = t2.ID;
But I could not believe how large the query became for such a small task (at least I believe it is small). I am on an older version (14) of SQL Server and therefore cannot use STRING_SPLIT with its third parameter (enable_ordinal). Syntax:
STRING_SPLIT ( string , separator [ , enable_ordinal ] )
Note: https://learn.microsoft.com/en-us/sql/t-sql/functions/string-split-transact-sql?view=sql-server-ver16 : The enable_ordinal argument and ordinal output column are currently supported in Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics (serverless SQL pool only). Beginning with SQL Server 2022 (16.x) Preview, the argument and output column are available in SQL Server.
Is there some other, shorter way to achieve the same result? I know this topic has been discussed many times, but I could not find a solution to my specific problem here. Thanks in advance for any kind of help!
It seems that you are using SQL Server 2017 (v.14), so a possible option is the following JSON-based approach. The idea is to transform the stored text into a valid JSON array (A;GHL+BH;BC,NA-NB,[AB] into ["A","GHL","BH","BC","NA","NB","AB"]) using TRANSLATE() for character replacement and get the expected parts of the string using JSON_VALUE():
SELECT
weirdstring,
JSON_VALUE(jsonweirdstring, '$[0]') AS group1,
JSON_VALUE(jsonweirdstring, '$[1]') AS group2,
JSON_VALUE(jsonweirdstring, '$[2]') AS group3
FROM (
SELECT
weirdstring,
CONCAT('["', REPLACE(TRANSLATE(weirdstring, ';+-,[]', '######'), '#', '","'), '"]') AS jsonweirdstring
FROM TESTTABLE
) t
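If more than the first few groups are needed, the same JSON array can also be expanded into rows with OPENJSON, whose key column holds the zero-based array index and so supplies the ordinal that STRING_SPLIT lacks on this version. A sketch reusing TESTTABLE from the question. It assumes database compatibility level 130 or higher (required for OPENJSON) and that the data contains neither '#' nor characters that need JSON escaping, such as double quotes or backslashes. The aliases j, ordinal and grp are made up for illustration:

```sql
SELECT t.ID,
       CAST(j.[key] AS int) + 1 AS ordinal,  -- [key] is the zero-based array index
       j.[value] AS grp
FROM TESTTABLE AS t
CROSS APPLY OPENJSON(
    CONCAT('["', REPLACE(TRANSLATE(t.weirdstring, ';+-,[]', '######'), '#', '","'), '"]')
) AS j
WHERE j.[value] <> ''                        -- drop the empty elements left by the brackets
ORDER BY t.ID, ordinal;
```

Because the empty elements are filtered out after numbering, the ordinals can contain gaps (for the sample data, the bracketed AB group follows an empty element). For the first three groups this makes no difference.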

How to write a BigQuery query that produces the count of the unique transactions and the combination of column names populated

I’m trying to write a query in BigQuery that produces the count of the unique transactions and the combination of column names populated.
I have a table with these columns:
TRAN CODE | Full Name | Given Name | Surname | DOB | Phone
The result set I’m after is:
TRAN CODE | UNIQUE TRANSACTIONS | NAME OF POPULATED COLUMNS
A | 3 | Full Name
A | 4 | Full Name,Phone
B | 5 | Given Name,Surname
B | 10 | Given Name,Surname,DOB,Phone
The result set shows that for TRAN CODE A
3 distinct customers provided Full Name
4 distinct customers provided Full Name and Phone #
For TRAN CODE B
5 distinct customers provided Given Name and Surname
10 distinct customers provided Given Name, Surname, DOB, Phone #
Currently to produce my results I’m doing it manually.
I tried using ARRAY_AGG but couldn’t get it working.
Any advice would be appreciated.
Thank you.
I think you want something like this:
select tran_code,
array_to_string(array[case when full_name is not null then 'full_name' end,
case when given_name is not null then 'given_name' end,
case when surname is not null then 'surname' end,
case when dob is not null then 'dob' end,
case when phone is not null then 'phone' end
], ','),
count(*)
from t
group by 1, 2
Consider the approach below. It does not depend on any column names other than TRAN_CODE, so it is quite generic.
select TRAN_CODE,
count(distinct POPULATED_VALUES) as UNIQUE_TRANSACTIONS,
POPULATED_COLUMNS
from (
select TRAN_CODE,
( select as struct
string_agg(col, ', ' order by offset) POPULATED_COLUMNS,
string_agg(val order by offset) POPULATED_VALUES,
string_agg(cast(offset as string) order by offset) pos
from unnest(regexp_extract_all(to_json_string(t), r'"([^"]+?)":')) col with offset
join unnest(regexp_extract_all(to_json_string(t), r'"[^"]+?":("[^"]+?"|null)')) val with offset
using(offset)
where val != 'null'
and col != 'TRAN_CODE'
).*
from `project.dataset.table` t
)
group by TRAN_CODE, POPULATED_COLUMNS
order by TRAN_CODE, any_value(pos)
below is output example
@Gordon_Linoff's solution is the best, but an alternative would be to do the following:
SELECT
TRAN_CODE,
COUNT(TRAN_ROW) AS unique_transactions,
populated_columns
FROM (
SELECT
TRAN_CODE,
TRAN_ROW,
# COUNT(value) AS unique_transactions,
STRING_AGG(field, ",") AS populated_columns
FROM (
SELECT
* EXCEPT(DOB),
CAST(DOB AS STRING ) AS DOB,
ROW_NUMBER() OVER () AS TRAN_ROW
FROM
sample) UNPIVOT(value FOR field IN (Full_name,
Given_name,
Surname,
DOB,
Phone))
GROUP BY
TRAN_CODE,
TRAN_ROW )
GROUP BY
TRAN_CODE,
populated_columns
But this should be more expensive...

Sort varchar with alpha numeric and special character values

I have an invoice_number field defined as varchar(20).
My select query is:
SELECT Row_Number() OVER(ORDER BY case isnumeric(invoice_number)
when 1 then convert(bigint,invoice_number)
else 99999999999999999999
end) As id,
name,
submit_date,
invoice_number,
invoice_total,
currency_code
FROM vw_invoice_report
which works fine for a few scenarios, but I couldn't make it work for all of the invoice_number values below:
f8ad2a28ddad4f6aa4df
0B849D69741145379079
20190313176617593442
ATOctober2000Promise
00100001010000000061
E285567EF0D0885E9160
SC1805000123000293
1999bernstyin2010
20600006307FFGMG
REVISED INVOICE F...
1111-2222(changzhou)
667339, 667340, 6...
18.12733562GAGA L...
IN-US01235055 ...
SSR-USD/426/2019 - 2
Nanny; Park Doug
184034
376840
376847-1
72692
72691
72690
72689
I am getting "Error converting data type varchar to bigint." for some of the above data. Can someone please help me make it work for the above test data?
Your problem is that some of your invoice numbers (for example 20190313176617593442) are too large for the BIGINT data type. You can work around this by keeping the values as strings, and left padding with 0 the numeric ones out to 20 digits for sorting. For example:
SELECT Row_Number() OVER(ORDER BY case isnumeric(invoice_number)
when 1 then REPLACE(STR(invoice_number, 20), ' ', '0')
else '99999999999999999999'
end) As id,
Demo (also showing converted invoice numbers) on SQLFiddle
Update
Based on OP comments and additional values to be sorted, this query should satisfy that requirement:
SELECT Row_Number() OVER(ORDER BY case
when isnumeric(invoice_number) = 1 then RIGHT('00000000000000000000' + REPLACE(invoice_number, '.', ''), 20)
when invoice_number like '%[0-9]-[0-9]%' and invoice_number not like '%[^0-9]' then REPLACE(STR(REPLACE(invoice_number, '-', '.'), 20), ' ', '0')
else '99999999999999999999'
end) As id,
invoice_number
FROM vw_invoice_report
Demo on SQLFiddle
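To see why a zero-padded string key sorts digit-only values in numeric order without any BIGINT conversion, here is a small self-contained sketch. The sample values are taken from the question; the alias v and the column name sort_key are made up for illustration. As a deliberate variation, it tests for digits only with NOT LIKE '%[^0-9]%' rather than ISNUMERIC, because ISNUMERIC also returns 1 for strings such as '$' or '1e5':

```sql
SELECT v.invoice_number,
       CASE WHEN v.invoice_number NOT LIKE '%[^0-9]%'
            THEN RIGHT(REPLICATE('0', 20) + v.invoice_number, 20)  -- left-pad to 20 digits
            ELSE '99999999999999999999'                            -- everything else sorts last
       END AS sort_key
FROM (VALUES ('184034'), ('72692'), ('20190313176617593442'), ('376847-1')) AS v(invoice_number)
ORDER BY sort_key, v.invoice_number;
```

With these rows the expected order is 72692, 184034, 20190313176617593442 and then 376847-1, since the padded keys compare character by character from the left.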
Hmmm. I am thinking this might do what you want:
row_number() over (order by (case when isnumeric(invoicenumber) = 1
then len(invoicenumber)
else 99999
end
),
invoicenumber
)

How to combine return results of query in one row

I have a table that stores personnel codes.
When I select from this table I get 3 rows, such as:
2129,3394,3508,3534
2129,3508
4056
I want the select to combine the results into one row, such as:
2129,3394,3508,3534,2129,3508,4056
or with distinct values only, such as:
2129,3394,3508,3534,4056
You should ideally avoid storing CSV data at all in your tables. That being said, for your first result set we can try using STRING_AGG:
SELECT STRING_AGG(col, ',') AS output
FROM yourTable;
Your second requirement is more tricky, and we can try going through a table to remove duplicates:
WITH cte AS (
SELECT DISTINCT VALUE AS col
FROM yourTable t
CROSS APPLY STRING_SPLIT(t.col, ',')
)
SELECT STRING_AGG(col, ',') WITHIN GROUP (ORDER BY CAST(col AS INT)) AS output
FROM cte;
Demo
I solved this by using STUFF and FOR XML PATH:
SELECT
STUFF((SELECT ',' + US.remain_uncompleted
FROM Table_request US
WHERE exclusive = 0 AND reqact = 1 AND reqend = 0
FOR XML PATH('')), 1, 1, '')
Thank you Tim

Count the number of not null columns using a case statement

I need some help with my query...I am trying to get a count of names in each house, all the col#'s are names.
Query:
SELECT House#,
COUNT(CASE WHEN col#1 IS NOT NULL THEN 1 ELSE 0 END) +
COUNT(CASE WHEN col#2 IS NOT NULL THEN 1 ELSE 0 END) +
COUNT(CASE WHEN col#3 IS NOT NULL THEN 1 ELSE 0 END) as count
FROM myDB
WHERE House# in (house#1,house#2,house#3)
GROUP BY House#
Desired results:
house 1 - the count is 3 /
house 2 - the count is 2 /
house 3 - the count is 1
...with my current query the results for count would be just 3's
In this case, it seems that counting names is the same as counting the commas (,) plus one:
SELECT House_Name,
LEN(Names) - LEN(REPLACE(Names,',','')) + 1 as Names
FROM dbo.YourTable;
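A quick way to check the arithmetic is to run the expression against a few inline rows. This is only a sketch: the sample rows match the ones used in another answer in this thread, and the aliases v and name_count are made up for illustration:

```sql
SELECT v.house,
       -- number of commas = length with commas minus length without them
       LEN(v.names) - LEN(REPLACE(v.names, ',', '')) + 1 AS name_count
FROM (VALUES ('house 1', 'peter, paul, mary'),
             ('house 2', 'sarah, sally'),
             ('house 3', 'joe')) AS v(house, names);
```

This should give 3, 2 and 1. Note that an empty string would still count as 1 and a NULL would give NULL, so those cases need separate handling if they can occur.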
Another option since Lamak stole my thunder, would be to split it and normalize your data, and then aggregate. This uses a common split function but you could use anything, including STRING_SPLIT for SQL Server 2016+ or your own...
declare @table table (house varchar(16), names varchar(256))
insert into @table
values
('house 1','peter, paul, mary'),
('house 2','sarah, sally'),
('house 3','joe')
select
t.house
,NumberOfNames = count(s.Item)
from
@table t
cross apply dbo.DelimitedSplit8K(names,',') s
group by
t.house
Notice how the answers you are getting are quite complex for what they're doing? That's because relational databases are not designed to store data that way.
On the other hand, if you change your data structure to something like this:
house name
1 peter
1 paul
1 mary
2 sarah
2 sally
3 joe
The query now is:
select house, count(name)
from housenames
group by house
So my recommendation is to do that: use a design that's more suitable for SQL Server to work with, and your queries become simpler and more efficient.
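As a rough sketch of that normalized design, the table could be created and filled as follows. The column types and the primary key are assumptions (the key presumes a name appears at most once per house); only the table name, the column names and the sample rows come from the answer above:

```sql
-- One row per (house, name) pair instead of a comma-separated list.
CREATE TABLE housenames (
    house int         NOT NULL,
    name  varchar(50) NOT NULL,
    PRIMARY KEY (house, name)   -- assumption: no duplicate names within a house
);

INSERT INTO housenames (house, name)
VALUES (1, 'peter'), (1, 'paul'), (1, 'mary'),
       (2, 'sarah'), (2, 'sally'),
       (3, 'joe');

SELECT house, COUNT(name) AS name_count
FROM housenames
GROUP BY house
ORDER BY house;
```

With the sample rows, the counts should be 3, 2 and 1 for houses 1, 2 and 3.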
One dirty trick is to replace commas with empty strings and compare the lengths:
SELECT house +
' has ' +
CAST((LEN(names) - LEN(REPLACE(names, ',', '')) + 1) AS VARCHAR) +
' names'
FROM mytable
You can parse using xml and find count as below:
Select *, a.xm.value('count(/x)','int') from (
Select *, xm = CAST('<x>' + REPLACE((SELECT REPLACE(names,', ','$$$SSText$$$') AS [*] FOR XML PATH('')),'$$$SSText$$$','</x><x>')+ '</x>' AS XML) from #housedata
) a
select House, 'has '+cast((LEN(Names)-LEN(REPLACE(Names, ',', ''))+1) as varchar)+' names'
from TempTable