SQL mass string manipulation - sql

I'm working with an Oracle DB and need to manipulate a string column within it. The column contains multiple email addresses in this format:
jgooooll#gmail.com;dhookep#gmail.com;amoore#outlook.com
What I want to do is remove anything that does not end in '#gmail.com' (in this example, amoore#outlook.com would be removed). However, amoore#outlook.com may be the first email in the next row of the column, so there is no real fixed format; the only rule is that each address is separated by a semicolon.
Is there any way of implementing this in one command that runs through every row in the column and removes anything that's not #gmail.com? I'm not really sure if this kind of processing is possible in SQL. Just looking for your thoughts!
Thanks a lot you guys. Look forward to hearing from you!

Applicable to Oracle 11g Release 2 (11.2) onward only, because the listagg function is supported only from 11.2 onward. If you are using 10.1 through 11.1, you can write your own string aggregate function or take this one.
with T1 as (
select 1 id, 'jhd#jk.com;jgooooll#gmail.com;dhookep#gmail.com;amoore#outlook.com' emails from dual union all
select 2 id, 'jhd#jk.com;jgooooll#gmail.com;dhookep#gmail.com;amoore#outlook.com' emails from dual
)
select id
, listagg(email, ';') within group(order by id) emails
from (select id
, regexp_substr(emails,'[^;]+', 1, rn) email
from t1
cross join (select rownum rn
from(select max (regexp_count(emails, '[^;]+')) ml
from t1
)
connect by level <= ml
)
)
where email like '%#gmail.com'
group by id
Id Emails
--------------------------------------
1 dhookep#gmail.com;jgooooll#gmail.com
2 dhookep#gmail.com;jgooooll#gmail.com
Here is a Demo
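For anyone who finds the nested query hard to follow, the same split, filter, and re-join logic can be sketched procedurally. This is only an illustration in Python, not something to run inside Oracle; the function name filter_gmail is made up here, and the '#gmail.com' suffix is copied from the sample data as posted.

```python
def filter_gmail(emails: str, suffix: str = "#gmail.com") -> str:
    """Keep only the addresses ending with `suffix` (hypothetical helper)."""
    # Split on the semicolon delimiter, the job regexp_substr does in the query.
    parts = [p for p in emails.split(";") if p]
    # Keep only addresses that end with the suffix, the job of the LIKE filter.
    kept = [p for p in parts if p.endswith(suffix)]
    # Glue the survivors back together, the job listagg does in the query.
    return ";".join(kept)


if __name__ == "__main__":
    print(filter_gmail("jgooooll#gmail.com;dhookep#gmail.com;amoore#outlook.com"))
```

Unlike the listagg version, this keeps the addresses in their original order within each row.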

This answer is actually for SQL Server, as that is what I know. That being said, perhaps having an example of how to do it in one system will give you an idea of how to do it in yours. Or maybe there is a way to convert the code into the same type of thing in Oracle.
First, the thought process: In SQL Server combining the FOR XML PATH and STUFF functionality allows you to make a comma separated list. I'm adding a WHERE Split.SplitValue LIKE ... clause into this to filter it to only gmail addresses. I'm cross applying this whole thing to the main table, and that turns it into a filtered email list. You could then further filter the main table to run this on a more targeted set of rows.
Second, the SQL Server implementation:
SELECT
*
FROM #Table Base
CROSS APPLY
(
SELECT
STUFF(
(SELECT
';' + Split.SplitValue AS [text()]
FROM dbo.fUtility_Split(Base.Emails, ';') Split
WHERE Split.SplitValue LIKE '%#gmail.com'
FOR XML PATH (''))
, 1, 1, '') Emails
) FilteredEmails
EDIT: I forgot to mention that this answer requires you have some sort of function to split a string column based on a separator value. If you don't have that already, then google for it. There are tons of examples.

Related

Loop through table and update a specific column

I have the following table:
Id
Category
1
some thing
2
value
This table contains a lot of rows and what I'm trying to do is to update all the Category values to change every first letter to caps. For example, some thing should be Some Thing.
At the moment this is what I have:
UPDATE MyTable
SET Category = (SELECT UPPER(LEFT(Category,1))+LOWER(SUBSTRING(Category,2,LEN(Category))) FROM MyTable WHERE Id = 1)
WHERE Id = 1;
But there are two problems. The first is the capitalization itself: it only works for single-word values (hello => Hello, but hello world => Hello world). The second is that I would need to run this query X times, following the WHERE Id = X logic. So my question is: how can I update X rows? I was thinking of a cursor, but I don't have much experience with them.
Here is a fiddle to play with.
You can split the words apart, apply the capitalization, then munge the words back together. No, you shouldn't be worrying about subqueries and Id because you should always approach updating a set of rows as a set-based operation and not one row at a time.
;WITH cte AS
(
SELECT Id, NewCat = STRING_AGG(CONCAT(
UPPER(LEFT(value,1)),
SUBSTRING(value,2,57)), ' ')
WITHIN GROUP (ORDER BY CHARINDEX(value, Category))
FROM
(
SELECT t.Id, t.Category, s.value
FROM dbo.MyTable AS t
CROSS APPLY STRING_SPLIT(Category, ' ') AS s
) AS x GROUP BY Id
)
UPDATE t
SET t.Category = cte.NewCat
FROM dbo.MyTable AS t
INNER JOIN cte ON t.Id = cte.Id;
This assumes your category doesn't have non-consecutive duplicates within it; for example, bora frickin bora would get messed up (meanwhile bora bora frickin would be fine). It also assumes a case-insensitive collation (which could be catered to if necessary).
In Azure SQL Database you can use the new enable_ordinal argument to STRING_SPLIT() but, for now, you'll have to rely on hacks like CHARINDEX().
Updated db<>fiddle (thank you for the head start!)
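To see why the duplicate caveat matters, here is a rough Python model of ordering the split words by CHARINDEX(value, Category). It is a sketch under assumptions: the function name is invented, str.find stands in for CHARINDEX (0-based instead of 1-based, which does not change the ordering), and lower-casing stands in for a case-insensitive collation.

```python
def title_case_by_first_position(category: str) -> str:
    """Hypothetical model of STRING_SPLIT + ORDER BY CHARINDEX(value, Category)."""
    words = category.split(" ")
    haystack = category.lower()
    # Each word is ordered by where it FIRST occurs in the full string,
    # so a repeated word always gets the position of its first occurrence.
    ordered = sorted(words, key=lambda w: haystack.find(w.lower()))
    # Upper-case the first letter and leave the rest of the word untouched.
    return " ".join(w[:1].upper() + w[1:] for w in ordered)


if __name__ == "__main__":
    for sample in ("some thing", "bora bora frickin", "bora frickin bora"):
        print(repr(sample), "->", repr(title_case_by_first_position(sample)))
```

The last sample comes out as 'Bora Bora Frickin': both copies of bora share the same first position, so they sort ahead of frickin.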

Oracle - CSV splitting only one column in a query

Oracle 12cR1 - I have a complex business process I am putting into a query.
In general, the process will be
with t1 as (select CATEGORY, PRODUCT from ... )
select <some manipulation> from t1;
t1 -aka the output of the first line- will look like this:
CATEGORY PRODUCT
Database Oracle, MS SQL Server, DB2
Language C, Java, Python
I need the 2nd line of the SQL query (aka the manipulation) to keep the CATEGORY column, and to split the PRODUCT column on the comma. The output needs to look like this
CATEGORY PRODUCT
Database Oracle
Database MS SQL Server
Database DB2
Language C
Language Java
Language Python
I have looked at a couple of different CSV splitting options. I cannot use the DBMS_Utility.comma_to_Table function as this has restrictions with special characters or starting with numbers. I found a nice TABLE function which will convert a string to separate rows, called f_convert. This function is on StackOverflow about 1/3 the way down the page here.
Since this is a table function, it is called like so, and it gives me 3 rows, as expected.
SELECT * FROM TABLE(f_convert('Oracle, MS SQL Server, DB2'));
How do I treat this TABLE function as if it were a "column function"? Although this is totally improper SQL, I am looking for something like
with t1 as (select CATEGORY, PRODUCT from ... )
select CATEGORY from T1, TABLE(f_convert(PRODUCT) as PRODUCT from t1;
Any help appreciated...
Use connect by to "loop" through the elements of the list where a comma-space is the delimiter. regexp_substr gets the list elements (the regex allows for NULL list elements) and the prior clauses keep the categories straight.
with t1(category, product) as (
select 'Database', 'Oracle, MS SQL Server, DB2' from dual union all
select 'Language', 'C, Java, Python' from dual
)
select category,
regexp_substr(product, '(.*?)(, |$)', 1, level, NULL, 1) product
from t1
connect by level <= regexp_count(product, ', ')+1
and prior category = category
and prior sys_guid() is not null;
CATEGORY PRODUCT
-------- --------------------------
Database Oracle
Database MS SQL Server
Database DB2
Language C
Language Java
Language Python
6 rows selected.
SQL>

SQL Server : GROUP CONCAT with DISTINCT is sorting natural data input

I have a similar situation. I start out with a table that has data input into a column from another source. This data is comma delimited coming in. I need to manipulate the data to remove a section at the end of each. So I split the data and remove the end with the code below. (I added the ID column later to be able to sort. I also added WITH SCHEMABINDING later to add an XML index but nothing works. I can remove this ... and the ID column, but I do not see any difference one way or the other):
ALTER VIEW [dbo].[vw_Routing]
WITH SCHEMABINDING
AS
SELECT TOP 99.9999 PERCENT
ROW_NUMBER() OVER (ORDER BY CableID) - 1 AS ID,
CableID AS [CableID],
SUBSTRING(m.n.value('.[1]', 'varchar(8000)'), 1, 13) AS Routing
FROM
(SELECT
CableID,
CAST('<XMLRoot><RowData>' + REPLACE([RouteNodeList], ',', '</RowData><RowData>') + '</RowData></XMLRoot>' AS xml) AS x
FROM
[dbo].[Cables]) t
CROSS APPLY
x.nodes('/XMLRoot/RowData') m (n)
ORDER BY
ID
Now I need to concatenate data from the Routing column's rows into one row grouped by another column into a column again. I have the code working except that it is reordering my data; I must have the data in the order it is input into the table as it is Cable Routing information. I must also remove duplicates. I use the following code. The SELECT DISTINCT removes the duplicates, but reorders the data. The SELECT (without DISTINCT) keeps the correct data order, but does NOT remove the duplicates:
Substring(
(
SELECT DISTINCT ','+ x3.Routing AS [text()] --This DISTINCT reorders the routes once concatenated.
--SELECT ','+ x3.Routing AS [text()] --This without the DISTINCT does not remove duplicates.
From vw_Routing x3
Where x3.CableID = c.CableId
For XML PATH ('')
), 2, 1000) [Routing],
I tried the code you gave above and it provided the same results with the DISTINCT reordering the data but without DISTINCT not removing the duplicates.
Perhaps GROUP BY with ORDER BY will work:
stuff((select ','+ x3.Routing AS [text()] --No DISTINCT here; the GROUP BY below removes the duplicates.
--The ORDER BY min(x3.id) below is meant to keep the original input order.
from vw_Routing x3
where x3.CableID = c.CableId
group by x3.Routing
order by min(x3.id)
for XML PATH ('')
), 1, 1, '') as [Routing],
I also replaced the SUBSTRING() with STUFF(). The latter is more standard for this operation.
To https://stackoverflow.com/users/1144035/gordon-linoff
Unfortunately, that did not work. It gave me the same result as my select statement; that is, no dups but reordering data.
HOWEVER, I found the correct answer earlier today:
I figured it out finally!! I still have to implement it within the other code and add the new Cable Area code, but the hard part is over!!!!!
I am going to post the following to the forums so that they know not to work on it .... I was writing this to send to my friend for his help, but I figured it out myself before I sent it.
I started with raw, comma separated data in the records of a table … the data is from another source. I had to remove some of the information from each value, so I used the following code to split it up and manipulate it:
Code1
Once that was done, I had to put the manipulated data back in the same form in the same order and with no duplicates. So I needed a SELECT DISTINCT. When I used the commented out SELECT DISTINCT below, it removed duplicates but it changed the order of the data which I could not have as it is Cable Tray Routing Data. When I took out the SELECT DISTINCT, it kept correct order, but left duplicates.
Because I was using XML PATH, I had to change this code to this code so that I could use SELECT DISTINCT to remove the duplicates: Code2 and Code3


Check if a list of items already exists in a SQL database

I want to create a group of users only if the same group does not exist already in the database.
I have a GroupUser table with three columns: a primary key, a GroupId, and a UserId. A group of users is described as several lines in this table sharing a same GroupId.
Given a list of UserId, I would like to find a matching GroupId, if it exists.
What is the most efficient way to do that in SQL?
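Let's say your UserId list is stored in a table called 'MyUserIDList'. The following query efficiently returns every GroupId whose members all appear in your user list; if the group must also contain every listed user, additionally compare GroupMemberCount with the number of rows in MyUserIDList. (SQL Server syntax)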
Select GroupId
From (
Select GroupId
, count(*) as GroupMemberCount
, Sum(case when MyUserIDList.UserID is null then 0 else 1 End) as GroupMemberCountInMyList
from GroupUser
left outer join MyUserIDList on GroupUser.UserID=MyUserIDList.UserID
group by GroupId
) As MySubQuery
Where GroupMemberCount=GroupMemberCountInMyList
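As written, the count comparison is satisfied by any group whose members all appear in the list, including smaller groups. A minimal sketch of an exact-match variant follows, run against an in-memory SQLite database from Python so it can be tried without a server. The table and column names follow the answer; the function name, the sample data, and the extra list-size condition are assumptions added here, and it presumes no duplicate (GroupId, UserId) rows.

```python
import sqlite3


def exact_matching_groups(group_rows, user_ids):
    """Return GroupIds whose member set equals user_ids (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE GroupUser (GroupId INTEGER, UserId INTEGER)")
        conn.execute("CREATE TABLE MyUserIDList (UserId INTEGER PRIMARY KEY)")
        conn.executemany("INSERT INTO GroupUser VALUES (?, ?)", group_rows)
        conn.executemany(
            "INSERT INTO MyUserIDList VALUES (?)", [(u,) for u in set(user_ids)]
        )
        rows = conn.execute(
            """
            SELECT g.GroupId
            FROM GroupUser AS g
            LEFT JOIN MyUserIDList AS l ON g.UserId = l.UserId
            GROUP BY g.GroupId
            -- every member of the group is in the list ...
            HAVING COUNT(*) = COUNT(l.UserId)
            -- ... and the group is as large as the list
               AND COUNT(*) = (SELECT COUNT(*) FROM MyUserIDList)
            ORDER BY g.GroupId
            """
        ).fetchall()
        return [r[0] for r in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [(1, 10), (1, 20), (2, 10), (2, 20), (2, 30), (3, 10)]
    print(exact_matching_groups(sample, [10, 20]))
```

Without the second HAVING condition, group 3 (a single user who is on the list) would also be returned for the list 10, 20.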
There are a couple of ways of doing this. This answer is for SQL Server only (as you have not mentioned a DBMS in your tags).
Pass the list of user IDs as a comma-separated string to a stored procedure; in the SP, build a dynamic query from it and run it with the EXEC command. This link will guide you in this regard
Use a table-valued parameter in an SP. This is applicable to SQL Server 2008 and higher only.
The following link will help you get started.
http://www.codeproject.com/Articles/113458/TSQL-Passing-array-list-set-to-stored-procedure-MS
Hope this helps.
One other solution is to convert the input list into a table. This can be done with various approaches: unions, temporary tables, and others. A neat solution builds on the answer of user1461607 for another question here on SO, using a comma-separated string.
WITH split(word, csv) AS (
-- 'initial query' (see SQLite docs linked above)
SELECT
'', -- place holder for each word we are looking for
'Auto,A,1234444,' -- items you are looking for
-- make sure the list ends with a comma !!
UNION ALL SELECT
substr(csv, 0, instr(csv, ',')), -- each word contains text up to next ','
substr(csv, instr(csv, ',') + 1) -- next recursion parses csv after this ','
FROM split -- recurse
WHERE csv != '' -- break recursion once no more csv words exist
) SELECT word, exisiting_data
FROM split s
-- now join the key you want to check for existence!
-- for demonstration purpose, I use an outer join
LEFT OUTER JOIN (select 'A' as exisiting_data) as t on t.exisiting_data = s.word
WHERE s.word != '' -- make sure we clamp the empty strings from the split function
;
Results in:
Auto,null
A,A
1234444,null
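If you want to try the recursive split without opening a SQLite shell, it can be run from Python's built-in sqlite3 module. This is a sketch, not the original answer's code: the function name is invented, the list is passed as a bound parameter with the trailing comma appended in SQL, and substr is written 1-based here.

```python
import sqlite3

# Recursive CTE that peels one comma-terminated word off the front per step.
SPLIT_SQL = """
WITH RECURSIVE split(word, csv) AS (
    SELECT '', ? || ','                         -- seed row; ensure a trailing comma
    UNION ALL
    SELECT substr(csv, 1, instr(csv, ',') - 1), -- text up to the next comma
           substr(csv, instr(csv, ',') + 1)     -- remainder after that comma
    FROM split
    WHERE csv != ''                             -- stop once nothing is left
)
SELECT word FROM split WHERE word != ''
"""


def split_csv(csv: str) -> list:
    """Split a comma-separated string inside SQLite (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        return [row[0] for row in conn.execute(SPLIT_SQL, (csv,))]
    finally:
        conn.close()


if __name__ == "__main__":
    print(split_csv("Auto,A,1234444"))
```

The resulting rows can then be joined against the table you want to check, as in the query above.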

SQL Server query brings unmatched data with BETWEEN filter

I'm querying my products table for all products with a code within a range of codes, and the result brings back a row that shouldn't be there.
This is my SQL query:
select prdcod
from products
where prdcod between 'F-DH1' and 'F-FMS'
order by prdcod
and the results of this query are:
F-DH1
F-DH2
F-DH3
FET-RAZ <-- What is this value doing here!?
F-FMC
F-FML
F-FMS
How can this odd value make its way into the query results?
PS: I get the same results if I use <= and >= instead of between.
At the OP's request, promoting my comment to an answer:
Seems like your collation excludes '-' sign - this way results make sense, FE is between FD and FM.
:)
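A rough way to convince yourself, sketched in Python. The assumption here is that the collation behaves as if hyphens were dropped before comparing; real collations are more subtle than that, so treat this as a model only. The helper names are made up.

```python
def hyphen_blind_key(code: str) -> str:
    """Comparison key that ignores '-' (a simplified stand-in for the collation)."""
    return code.replace("-", "").upper()


def between(value, low, high, key=lambda s: s):
    """Inclusive range test, like SQL BETWEEN, under the given comparison key."""
    return key(low) <= key(value) <= key(high)


codes = ["F-FMS", "FET-RAZ", "F-DH2", "F-FML", "F-DH1", "F-FMC", "F-DH3"]

if __name__ == "__main__":
    # Plain code-point comparison: '-' sorts before 'E', so FET-RAZ is outside.
    print(between("FET-RAZ", "F-DH1", "F-FMS"))
    # Hyphen-blind comparison: FDH1 <= FETRAZ <= FFMS, so FET-RAZ is inside.
    print(between("FET-RAZ", "F-DH1", "F-FMS", key=hyphen_blind_key))
    print(sorted(codes, key=hyphen_blind_key))
```

Sorting with the hyphen-blind key gives the same order as the query output in the question, with FET-RAZ sitting between F-DH3 and F-FMC.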
between, >=, and <= are primarily used for numeric comparisons (including dates). You're trying to use them on strings, where it is difficult at best to determine how those operators will interpret each string.
Now, while I think I understand your goal here, I'm not entirely sure it's possible using SQL Server queries. This may be business logic (thanks to the product codes) that needs to be implemented in code. Something like the Entity Framework or Linq-to-SQL may be better suited to get you the data you're looking for.
How about adding AND LEFT(prdcod, 2) = 'F-'?
Try replacing the "-" with a space so the order is what you would expect:
DECLARE @list table(word varchar(50))
--create list
INSERT INTO @list
SELECT 'F-DH1'
UNION ALL
SELECT 'F-DH2'
UNION ALL
SELECT 'F-DH3'
UNION ALL
SELECT 'FET-RAZ'
UNION ALL
SELECT 'F-FMC'
UNION ALL
SELECT 'F-FML'
UNION ALL
SELECT 'F-FMS'
--original order
SELECT * FROM @list order by word
--show how order changes
SELECT *,replace(word,'-',' ') FROM @list order by replace(word,'-',' ')
--show between condition
SELECT * FROM @list where replace(word,'-',' ') between 'F DH1' and 'F FMS'