SQL Server: exploding CSV for SELECT statement

I have a table structure as below:
id   txtName    intReferences
------------------------------
1    Fred       1,4,6,444,56,43,
2    Sam        5,33,5904,43
3    Tom        1200
4    Samantha   43,44,888,99
I'd like to write a T-SQL query to return all the records based on a series of numbers provided.
For example, querying for 43 would return Fred, Sam and Samantha. The catch is that querying for 3 shouldn't return Sam or Samantha, since 3 only appears there as part of larger numbers (33, 43). I'm looking for a direct, whole-number match.
The CSV value may end in a comma.
I've tried using the "IN" statement, but it returns results if any portion of the number matches. Ideally I'd like to achieve this without creating a function, given some database restrictions.

Use string_split():
select t.*
from t cross apply
string_split(t.intReferences, ',') s
where s.value = '3';
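Regarding the trailing comma mentioned in the question: it is harmless here, since string_split() just returns one extra empty-string row for it, and an empty string never equals '3' or '43'. A quick check with the sample value:
select value
from string_split('1,4,6,444,56,43,', ',');
-- returns 1, 4, 6, 444, 56, 43 and one empty string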
Then, fix your data model so you are not storing integer values in strings. This is bad, bad, bad. Here are some reasons why (a sketch of a normalized layout follows the list):
Numbers should be stored as numbers, not strings (using the correct type).
SQL Server has lousy string manipulation functions.
Only one value should be stored in a column.
Foreign key relationships should be properly declared.
Resulting queries cannot be optimized using indexes or partitions.
SQL has a great way to store lists. It is called a table, not a string.
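For illustration only (table, column, and constraint names here are hypothetical), a normalized layout for the question's data might look like this:
CREATE TABLE dbo.Person (
    id int PRIMARY KEY,
    txtName varchar(200) NOT NULL
);
CREATE TABLE dbo.PersonReference (
    PersonId int NOT NULL REFERENCES dbo.Person (id),
    intReference int NOT NULL,
    PRIMARY KEY (PersonId, intReference)
);
-- "which people reference 43?" becomes a plain, index-friendly join:
SELECT p.*
FROM dbo.Person AS p
JOIN dbo.PersonReference AS r ON r.PersonId = p.id
WHERE r.intReference = 43;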

Clearly, the best way to accommodate this situation is to have properly normalized data.
Another method for querying the data with the current structure would be to check for comma + (your number) + comma. Something like this...
Declare @Temp Table(id int, txtName varchar(200), intReferences varchar(200))
Insert Into @Temp Values(1, 'Fred', '1,4,6,444,56,43,')
Insert Into @Temp Values(2, 'Sam', '5,33,5904,43')
Insert Into @Temp Values(3, 'Tom', '1200')
Insert Into @Temp Values(4, 'Samantha', '43,44,888,99')
Select *
From @Temp
Where ',' + intReferences + ',' like '%,' + '43' + ',%'
Select *
From @Temp
Where ',' + intReferences + ',' like '%,' + '3' + ',%'


How to apply trim function inside this query [duplicate]

Below is a simple SQL query to select records using an IN condition.
--like this I have 6000 usernames
select * from tblUsers where Username in ('abc ','xyz ',' pqr ',' mnop ' );
I know there are LTrim and RTrim in SQL to remove leading and trailing spaces from the left and right respectively.
I want to remove the spaces from the left and right of all the usernames that I am supplying to the select query.
Note:
I want to trim the values that I am passing in the IN clause (I don't want to apply LTrim and RTrim to each value passed).
There are no trailing spaces in the records, but the values I am passing in the clause were copied from Excel and pasted into Visual Studio. Then, using the ALT key, I put a ' (single quote) at the left and right sides of each string. Because of this, some strings have trailing spaces on the right.
How do I use a trim function in the select query?
I am using MS SQL Server 2012
If I understand your question correctly, you are pasting from Excel into an IN clause in an ad hoc query.
The trailing spaces don't matter: a value such as 'foo ' will still match the stored string foo.
But you need to ensure that there are no leading spaces.
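You can check this behaviour directly; under the ANSI padding rules SQL Server uses for comparisons, the shorter string is padded with trailing spaces before comparing, so only leading spaces break the match:
-- trailing spaces are ignored by the = comparison, leading spaces are not
select case when 'foo' = 'foo   ' then 'match' else 'no match' end as trailing_spaces, -- match
       case when 'foo' = '   foo' then 'match' else 'no match' end as leading_spaces   -- no match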
As the source of the data is Excel, why not just do it all there?
You can use the formula
= CONCATENATE("'",TRIM(SUBSTITUTE(A1,"'","''")),"',")
Then copy the resulting column and just trim off the extra comma from the final entry.
You can do like this:
select * from tblUsers where LTRIM(RTRIM(Username)) in (ltrim(rtrim('abc')),ltrim(rtrim('xyz')),ltrim(rtrim('pqr')),ltrim(rtrim('mnop')));
However, if you have permission to update the database, please remove all the spaces in your Username field; it is really not good to query like this.
One way to tackle your problem and still be able to benefit from an index on username is to use a persisted computed column:
Setup
-- drop table dbo.tblUsers
create table dbo.tblUsers
(
UserId INT NOT NULL IDENTITY(1, 1) CONSTRAINT PK_UserTest PRIMARY KEY,
Username NVARCHAR(64) NOT NULL,
UsernameTrimmed AS LTRIM(RTRIM(Username)) PERSISTED
)
GO
-- other columns may be included here with INCLUDE (col1, col2)
CREATE INDEX IDX_UserTest ON dbo.tblUsers (UsernameTrimmed)
GO
insert into dbo.tblUsers (Username) VALUES ('abc '),('xyz '),(' pqr '), (' mnop '), ('abc'), (' useradmin '), ('etc'), (' other user ')
GO
-- some mock data to obtain a large number of records
insert into dbo.tblUsers (Username)
select top 20000 SUBSTRING(text, 1, 64) from sys.messages
GO
Test
-- this will use the index (index seek)
select * from tblUsers where UsernameTrimmed in (LTRIM(RTRIM('abc')), LTRIM(RTRIM(' useradmin ')));
This allows for faster retrievals at the expense of extra space.
In order to get rid of query construction (and the ugliness of many LTRIMs and RTRIMs), you can push the searched users into a table that looks like tblUsers.
create table dbo.searchedUsers
(
Username NVARCHAR(64) NOT NULL,
UsernameTrimmed AS LTRIM(RTRIM(Username)) PERSISTED
)
GO
Push the raw values into the dbo.searchedUsers.Username column and the query should look like this:
select U.*
from tblUsers AS U
join dbo.searchedUsers AS S ON S.UsernameTrimmed = U.UsernameTrimmed
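For completeness, the "push" step can be plain inserts of the raw, untrimmed values (shown here with the question's sample data); the persisted computed column takes care of the trimming:
INSERT INTO dbo.searchedUsers (Username)
VALUES ('abc '), ('xyz '), (' pqr '), (' mnop ');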
The big picture
It is way better to properly trim your data in the service layer of your application (C#) so that future clients of your table may rely on decent information. So, trimming should be performed both when inserting information into tblUsers and when searching for users (the IN values).
select *
from tblUsers
where RTRIM(LTRIM(Username)) in ('abc','xyz','pqr','mnop');
Answer: SELECT * FROM tblUsers WHERE LTRIM(RTRIM(Username)) in ('abc','xyz','pqr','mnop');
However, please note that applying functions to a column in your WHERE clause defeats the purpose of any index on that column and will result in a scan rather than a seek.
I would propose that you clean your data before inserting it into tblUsers.
I think you can try this:
Just replace table2 with the name of the table from which you are getting the usernames:
select * from tblUsers where Username in ((select distinct
STUFF((SELECT distinct ', ' + RTRIM(LTRIM(t1.Username))
from table2 t1
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,2,'') UserName
from table2 t) );
I'd do it in two steps:
1) populate a temp table with all your strings with blanks
2) do a select with a subselect
create table a (a char(1))
insert into a values('a')
insert into a values('b')
insert into a values('c')
insert into a values('d')
create table #b (atmp char(5))
insert into #b values ('a ')
insert into #b values (' b')
insert into #b values (' c ')
select * from a where a in (select ltrim(rtrim(atmp)) from #b)

How to minimize sql select?

I have an array of words like this one:
$word1 = array('test1','test2','test3','test4','test5',...,'test20');
I need to search in my table every row that has at least one of these words in the text column. So far, I have this sql query:
SELECT * FROM TABLE WHERE text LIKE '$word1[0]' OR text LIKE '$word1[1]'
OR ... OR text LIKE '$word1[20]'
But I see that this design isn't very efficient. Is there any way I can shorten this query, in such a way that I don't need to write out every word in the where clause?
Example SELECT * FROM TABLE WHERE text IN ($word1)
P.S.: this is an example of what I'm looking for, not an actual query I can run.
If you use a table variable instead of a list to store your words then you can use something like:
DECLARE @T TABLE (Word VARCHAR(255) NOT NULL);
INSERT @T (Word)
VALUES ('test1'), ('test2'), ('test3'), ('test4'), ('test5'), ('test20');
SELECT *
FROM TABLE t
WHERE EXISTS
( SELECT 1
  FROM @T
  WHERE t.Text LIKE '%' + Word + '%'
);
You can also create a table type to store this; then you can pass it as a parameter to a stored procedure if required:
CREATE TYPE dbo.StringList AS TABLE (Value VARCHAR(MAX) NOT NULL);
GO
CREATE PROCEDURE dbo.YourProcedure @Words dbo.StringList READONLY
AS
SELECT *
FROM TABLE t
WHERE EXISTS
( SELECT 1
  FROM @Words w
  WHERE t.Text LIKE '%' + w.Value + '%'
);
GO
DECLARE @T dbo.StringList;
INSERT @T (Value)
VALUES ('test1'), ('test2'), ('test3'), ('test4'), ('test5'), ('test20');
EXECUTE dbo.YourProcedure @T;
For more on this see table-valued Parameters on MSDN.
EDIT
I may have misunderstood your requirements, as you used LIKE but with no wildcard operator; in that case you can just use IN, though I would still recommend using a table to store your values:
DECLARE @T TABLE (Word VARCHAR(255) NOT NULL);
INSERT @T (Word)
VALUES ('test1'), ('test2'), ('test3'), ('test4'), ('test5'), ('test20');
SELECT *
FROM TABLE t
WHERE t.Text IN (SELECT Word FROM @T);
You can use a SELECT like this without declaring an array:
SELECT * FROM TABLE WHERE text IN ('test1', 'test2', 'test3', 'test4', 'test5')
One solution could be:
Create a table in the database with the searched words in a column called word (for example), using wildcards if you need them
then use this kind of query:
SELECT *
FROM TABLE, FILTER_TABLE
WHERE TABLE.text LIKE FILTER_TABLE.word
Although I don't have access to SQL Server 2008 at the moment and SQLfiddle seems sick, it would seem you can use a table value constructor to simplify the expression somewhat;
SELECT * FROM test
JOIN (SELECT w FROM (VALUES('word1'), ('word2'), ('word3'), ('word4')) AS a(w)) a
ON test.text LIKE '%'+a.w+'%';
...which will search the text column in the test table for the words listed as values. If you don't want duplicates of rows where multiple words match, you can just add a DISTINCT to the select.
Note though that you may want to look into full-text indexing if you're doing extensive searches; a LIKE query that finds words in a string this way will not use any indexes, and will most likely be quite slow unless the data is already in memory.
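As a rough sketch of that full-text alternative (the catalog name is illustrative, and it assumes dbo.test already has a unique index, called PK_test below, to key the full-text index on):
CREATE FULLTEXT CATALOG ftTestCatalog;
CREATE FULLTEXT INDEX ON dbo.test ([text])
    KEY INDEX PK_test ON ftTestCatalog;
GO
-- CONTAINS can use the full-text index, unlike LIKE '%...%'
SELECT * FROM dbo.test
WHERE CONTAINS([text], '"word1" OR "word2" OR "word3" OR "word4"');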

Splitting delimited values in a SQL column into multiple rows

I would really like some advice here. To give some background, I am inserting Message Tracking logs from Exchange 2007 into SQL. As we have millions upon millions of rows per day, I am using a Bulk Insert statement to insert the data into a SQL table.
In fact I actually Bulk Insert into a temp table and then MERGE the data from there into the live table; this is to deal with parsing issues, as certain fields otherwise have quotes and such around the values.
This works well, with the exception that the recipient-address column is a delimited field separated by a ; character, and it can be incredibly long sometimes, as there can be many email recipients.
I would like to take this column and split the values into multiple rows, which would then be inserted into another table. The problem is that everything I am trying either takes too long or doesn't work the way I want.
Take this example data:
message-id                                             recipient-address
2D5E558D4B5A3D4F962DA5051EE364BE06CF37A3A5@Server.com  user1@domain1.com
E52F650C53A275488552FFD49F98E9A6BEA1262E@Server.com    user2@domain2.com
4fd70c47.4d600e0a.0a7b.ffff87e1@Server.com             user3@domain3.com;user4@domain4.com;user5@domain5.com
I would like this to be formatted as followed in my Recipients table:
message-id                                             recipient-address
2D5E558D4B5A3D4F962DA5051EE364BE06CF37A3A5@Server.com  user1@domain1.com
E52F650C53A275488552FFD49F98E9A6BEA1262E@Server.com    user2@domain2.com
4fd70c47.4d600e0a.0a7b.ffff87e1@Server.com             user3@domain3.com
4fd70c47.4d600e0a.0a7b.ffff87e1@Server.com             user4@domain4.com
4fd70c47.4d600e0a.0a7b.ffff87e1@Server.com             user5@domain5.com
Does anyone have any ideas about how I can go about doing this?
I know PowerShell pretty well, so I tried it in that, but a foreach loop even on 28K records took forever to process. I need something that will run as quickly and efficiently as possible.
Thanks!
If you are on SQL Server 2016+
You can use the new STRING_SPLIT function, which I've blogged about here, and Brent Ozar has blogged about here.
SELECT s.[message-id], f.value
FROM dbo.SourceData AS s
CROSS APPLY STRING_SPLIT(s.[recipient-address], ';') as f;
If you are still on a version prior to SQL Server 2016
Create a split function. This is just one of many examples out there:
CREATE FUNCTION dbo.SplitStrings
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE
AS
RETURN (SELECT Number = ROW_NUMBER() OVER (ORDER BY Number),
Item FROM (SELECT Number, Item = LTRIM(RTRIM(SUBSTRING(@List, Number,
CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)))
FROM (SELECT ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_objects AS s1 CROSS APPLY sys.all_objects) AS n(Number)
WHERE Number <= CONVERT(INT, LEN(@List))
AND SUBSTRING(@Delimiter + @List, Number, 1) = @Delimiter
) AS y);
GO
I've discussed a few others here, here, and a better approach than splitting in the first place here.
Now you can extrapolate simply by:
SELECT s.[message-id], f.Item
FROM dbo.SourceData AS s
CROSS APPLY dbo.SplitStrings(s.[recipient-address], ';') as f;
Also I suggest not putting dashes in column names. It means you always have to put them in [square brackets].
SQL Server 2016 includes a new table function, string_split(), similar to the previous solution.
The only requirement is to set the database compatibility level to 130 (SQL Server 2016).
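For example (assuming your database is named YourDb):
ALTER DATABASE YourDb SET COMPATIBILITY_LEVEL = 130;
-- confirm the change:
SELECT name, compatibility_level FROM sys.databases WHERE name = 'YourDb';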
You may use CROSS APPLY (available in SQL Server 2005 and above) and STRING_SPLIT function (available in SQL Server 2016 and above):
DECLARE @delimiter nvarchar(255) = ';';
-- create tables
CREATE TABLE MessageRecipients (MessageId int, Recipients nvarchar(max));
CREATE TABLE MessageRecipient (MessageId int, Recipient nvarchar(max));
-- insert data
INSERT INTO MessageRecipients VALUES (1, 'user1@domain.com; user2@domain.com; user3@domain.com');
INSERT INTO MessageRecipients VALUES (2, 'user@domain1.com; user@domain2.com');
-- insert into MessageRecipient
INSERT INTO MessageRecipient
SELECT MessageId, ltrim(rtrim(value))
FROM MessageRecipients
CROSS APPLY STRING_SPLIT(Recipients, @delimiter)
-- output results
SELECT * FROM MessageRecipients;
SELECT * FROM MessageRecipient;
-- delete tables
DROP TABLE MessageRecipients;
DROP TABLE MessageRecipient;
Results:
MessageId Recipients
----------- ----------------------------------------------------
1 user1@domain.com; user2@domain.com; user3@domain.com
2 user@domain1.com; user@domain2.com
and
MessageId Recipient
----------- ----------------
1 user1@domain.com
1 user2@domain.com
1 user3@domain.com
2 user@domain1.com
2 user@domain2.com
For a table yelp_business, split the semicolon-separated values of the categories column into rows and display them as a category column (PostgreSQL syntax):
SELECT unnest(string_to_array(categories, ';')) AS category
FROM yelp_business;

String manipulation SQL

I have rows of strings in the following format:
'Order was assigned to lastname,firsname'
I need to cut this string down into just the last and first name but it is always a different name for each record.
The 'Order was assigned to' part is always the same.
Thanks
I am using SQL Server. It is multiple records with different names in each record.
In your specific case you can use something like:
SELECT SUBSTRING(str, 23) FROM table
However, this is not very robust, should the format of your strings ever change.
If you are using an Oracle database, you would want to use SUBSTR instead.
Edit:
For databases where the third parameter is not optional, you could use SUBSTRING(str, 23, LEN(str))
Somebody would have to test whether this is better or worse than the subtraction in Martin Smith's solution, but it gives you the same result in the end.
In addition to the SUBSTRING methods, you could also use a REPLACE function. I don't know which would have better performance over millions of rows, although I suspect that it would be the SUBSTRING - especially if you were working with CHAR instead of VARCHAR.
SELECT REPLACE(my_column, 'Order was assigned to ', '')
For SQL Server
WITH testData AS
(
SELECT 'Order was assigned to lastname,firsname' as Col1 UNION ALL
SELECT 'Order was assigned to Bloggs, Jo' as Col1
)
SELECT SUBSTRING(Col1,23,LEN(Col1)-22) AS Name
from testData
Returns
Name
---------------------------------------
lastname,firsname
Bloggs, Jo
on MS SQL Server:
declare @str varchar(100) = 'Order was assigned to lastname,firsname'
declare @strLen1 int = DATALENGTH('Order was assigned to ')
declare @strLen2 int = len(@str)
select @strLen1, @strLen2,
       substring(@str, @strLen1 + 1, @strLen2), -- start one past the prefix to skip its trailing space
       RIGHT(@str, @strLen2 - @strLen1)
I would require that a colon or some other delimiter be between the message and the name.
Then you could just search for the index of that character and know that anything after it was the data you need...
Example with format changing over time:
CREATE TABLE #Temp (OrderInfo NVARCHAR(MAX))
INSERT INTO #Temp VALUES ('Order was assigned to :Smith,Mary')
INSERT INTO #Temp VALUES ('Order was assigned to :Holmes,Larry')
INSERT INTO #Temp VALUES ('New Format over time :LootAt,Me')
SELECT SUBSTRING(OrderInfo, CHARINDEX(':',OrderInfo)+1, LEN(OrderInfo))
FROM #Temp
DROP TABLE #Temp

Concatenating records in a single column without looping?

I have a table with 1 column of varchar values. I am looking for a way to concatenate those values into a single value without a loop, if possible. If a loop is the most efficient way of going about this, then I'll go that way but figured I'd ask for other options before defaulting to that method. I'd also like to keep this inside of a SQL query.
Ultimately, I want to do the opposite of a split function.
Is it possible to do without a loop (or cursor) or should I just use a loop to make this happen?
Edit:
Since there was a very good answer associated with how to do it in MySql (as opposed to MS Sql like I initially intended), I decided to retag so others may be able to find the answer as well.
declare @concat varchar(max)
set @concat = ''
select @concat = @concat + col1 + ','
from tablename1
select @concat -- the accumulated comma-delimited result
try this:
DECLARE @YourTable table (Col1 int)
INSERT INTO @YourTable VALUES (1)
INSERT INTO @YourTable VALUES (2)
INSERT INTO @YourTable VALUES (30)
INSERT INTO @YourTable VALUES (400)
INSERT INTO @YourTable VALUES (12)
INSERT INTO @YourTable VALUES (46454)
SELECT
STUFF(
(
SELECT
', ' + cast(Col1 as varchar(30))
FROM @YourTable
WHERE Col1<=400
ORDER BY Col1
FOR XML PATH('')
), 1, 2, ''
)
OUTPUT:
-------------------
1, 2, 12, 30, 400
(1 row(s) affected)
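As an aside that isn't part of the original answer: on SQL Server 2017 and later, STRING_AGG produces the same result without the FOR XML trick:
SELECT STRING_AGG(CAST(Col1 AS varchar(30)), ', ') WITHIN GROUP (ORDER BY Col1)
FROM @YourTable
WHERE Col1 <= 400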
I just tackled a problem like this and looping took forever. So, I concatenated the values in the presentation medium (in this case Crystal Reports) and it was very fast.
Just an idea.
If it is MySQL, you can use GROUP_CONCAT
SELECT a, GROUP_CONCAT(b SEPARATOR ',') FROM table GROUP BY a;
Probably dated now but check out Adam Machanic's post on the topic.
And this one is certainly dated; I wrote it in 2004.
Why do I prefer a function over "keeping it inside a SQL query"? Because you'll probably have to do this more than once. Why not encapsulate that code into a single module instead of repeating it all over the place?