Removing extraneous characters in column using T-SQL - sql

I am attempting to remove extraneous characters from data in a primary key column. The data in this column serves as a control number, and the extra characters are preventing a Web application from effectively interacting with the data.
As an example, one row may look like this:
ocm03204415 820302
I want to remove everything after the space, i.e. the characters '820302'. I could do it manually, but there are around 2,000 records that have these extra values in the column, so it would be great if I could remove them programmatically. I can't do a simple REPLACE because the characters have no pattern, so I couldn't define a rule to discover them. The only uniform thing is the space, although, now that I look at the data set, they do all start with 8.
Is there a way I could remove these characters programmatically? I am familiar with PL/SQL in the Oracle environment, and was wondering if Transact-SQL (T-SQL) would offer some possibilities in the MS SQL Server environment.
Thanks so much.

You may want to look into the CHARINDEX function to find the space. Then you can use SUBSTRING to grab everything up to the space in a single UPDATE statement.

Try this:
UPDATE YourTable
SET YourColumn = LEFT(YourColumn,CHARINDEX(' ',YourColumn)-1)
WHERE CHARINDEX(' ',YourColumn) > 1
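To make the logic of that UPDATE easier to check before running it, here is a minimal Python sketch that mirrors it. It is not T-SQL; the function name trim_after_space and the sample rows are hypothetical, and only the first value comes from the question. Because the column is a primary key, the sketch also checks that the trimmed values would still be unique, which is an extra step not present in the answer above.

```python
def trim_after_space(value: str) -> str:
    """Mirror LEFT(col, CHARINDEX(' ', col) - 1) guarded by CHARINDEX(' ', col) > 1."""
    # CHARINDEX is 1-based and returns 0 when the substring is absent,
    # so str.find (0-based, -1 when absent) is shifted by one.
    pos = value.find(" ") + 1
    if pos > 1:
        return value[:pos - 1]
    return value  # no space, or a leading space: left untouched, like the WHERE clause


# Hypothetical sample rows; only the first matches the example in the question.
rows = ["ocm03204415 820302", "ocm03204416", " leading"]
cleaned = [trim_after_space(r) for r in rows]

# Trimming a key column can create duplicates, so check uniqueness up front.
still_unique = len(set(cleaned)) == len(cleaned)
print(cleaned, still_unique)
```

If the uniqueness check fails on real data, the UPDATE would presumably be rejected by the primary key constraint, so it seems worth running the equivalent SELECT first.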

Related

SQL Server 2005 Update/Delete Substring of a Lengthy Column

I'm not sure if it is possible to do what I'm trying to do, but I thought I would give it a shot anyway. Also, I am fairly new to the SQL Server world and this is my first post, so I apologize if my wording is poor or if I leave information out. I am working with SQL Server 2005.
Let's say I have a table named "table" and a column named "column". The contents of column are a jumbled mess of characters (ntext data type). These characters were all drawn in from multiple entry fields in a front-end application. One of those entry fields was for sensitive information that we no longer need and would like to get rid of, but I can't just get rid of the whole column because it also contains other valuable information. Most of the solutions I have found so far only deal with columns that have short entries, so they are able to update the whole string. For mine, I think I need to identify the beginning and the end of the specific substring and replace or delete it somehow. This is the closest I have gotten to at least selecting the data that I need. AAA and /AAA mark the beginning and the end of the substring that I need.
select
substring (column, charindex ('AAA', column), charindex ('/AAA',column))
from table
where column like '%/AAA%'
The problems I am having with this one are that the substring doesn't stop at /AAA, it just keeps going, and some of the results are just blank so it looks something like:
AAA 12345 /AAA abcdefghijklmnop
AAA 12346 /AAA qrstuvwxyzabcdef
AAA 12347 /AAA abcdefghijklmnop
With the characters in bold being the information I need to get rid of. Also even though row 3 is blank, it still does contain the info that I need but I'm guessing that it isn't returning it because it has a different amount of characters before it (for example, rows 1, 2, and 4 might have 50 characters before them but row 3 might have 100 characters before it), at least that's the only reason that I could think of.
So I suppose the first step would probably be to actually select the right substring, then to either delete it or replace it with a different, meaningless substring like "111111" or something.
If there is more information that you need to be provided with or if I was unclear about anything please let me know and thank you for taking the time to read through (and hopefully answer) my question!
Edit: Another one that gets close to the right results goes something like this
select substring(column,charindex('AAA',column),20) from table
where column like '%/AAA%'
I'm not sure if this approach would work better, since the substring I'm looking for is always going to have the same number of characters. The problem with this one, though, is that instead of having blank rows, they are replaced with irrelevant substrings from that column, but all of the other rows do return exactly what I want.
First of all, check your usage of SUBSTRING(). The third argument is a length, not an end position, so you would need to alter your query to something like:
select substring (column, charindex ('AAA',column)
, charindex ('/AAA',column)-charindex ('AAA',column))
from table where column like '%/AAA%'
Yes, your approach of finding it and then either deleting or replacing it is sound.
If some of the results are blank, it's possible that you are finding and replacing the entire string. If the LIKE pattern had not matched, the row would not have been returned at all, which is different from returning a blank value in that column.
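The difference between passing an end position and passing a length can be seen in a small Python model of the two T-SQL functions. This is only a sketch: charindex and substring are hypothetical helpers that imitate CHARINDEX and SUBSTRING for 1-based positions, and the sample row is made up in the style of the question.

```python
def charindex(needle: str, haystack: str) -> int:
    """1-based position of needle in haystack, or 0 when absent (like CHARINDEX)."""
    return haystack.find(needle) + 1


def substring(text: str, start: int, length: int) -> str:
    """Take `length` characters from 1-based `start` (like SUBSTRING, for start >= 1)."""
    return text[start - 1:start - 1 + length]


# Hypothetical row: some prefix, the marked segment, then other data.
row = "xxxxx AAA 12345 /AAA abcdefghijklmnop"
start = charindex("AAA", row)
end = charindex("/AAA", row)

wrong = substring(row, start, end)          # end position misused as a length
right = substring(row, start, end - start)  # actual distance between the markers

print(repr(wrong))
print(repr(right))
```

With the end position used as a length, the result runs past /AAA, and the overshoot grows with the number of characters before AAA, which may be why rows with longer prefixes looked different. To also remove the closing marker, the length would need to be extended by the length of '/AAA'.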

SQL Remove Substring From Query Results

I have a query that is returning data from a database. In a single field there is a rather long text comment with a segment, which is clearly defined with marking tags like !markerstart! and !markerend!. I would like to have a query return with the string segment between the two markers removed (and the markers removed too).
I would normally do this client-side after I get the data back. However, the problem is that the query is an INSERT query that gets its data from a SELECT statement. I don't want the text segment to be stored in the archival/reporting table (working with an OLTP application here), so I need the SELECT statement to return exactly what is to be inserted. In this case, that means getting the SELECT statement to strip out the unwanted phrase instead of doing it in post-processing client-side.
My only thought is to use some convoluted combination of SUBSTRING, CHARINDEX, and CONCAT, but I'm hoping there is a better way, but, based on this, I don't see how. Anyone have ideas?
Sample:
This is a long string of text in some field in a database that has a segment that needs to be removed. !markerstart! This is the segment that is to be removed. It's length is unknown and variable. !markerend! The part of this field that appears after the marker should remain.
Result:
This is a long string of text in some field in a database that has a segment that needs to be removed. The part of this field that appears after the marker should remain.
SOLUTION USING STUFF:
I really don't like how verbose this is, but I can put it in a function if I really need to. It isn't ideal, but it is easier and faster than a CLR routine.
SELECT STUFF(CAST(Description AS varchar(MAX)), CHARINDEX('!markerstart!', Description), CHARINDEX('!markerend!', Description) + 11 - CHARINDEX('!markerstart!', Description), '') AS Description
FROM MyTable
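The position arithmetic in that STUFF call can be checked with a small Python model. This is a sketch, not T-SQL: stuff and remove_segment are hypothetical helpers, and the sample text is shortened from the one in the question. The guard for rows without both markers is an addition that is not in the query above; in T-SQL, STUFF returns NULL when the start position is 0, so rows without markers would likely need similar handling.

```python
START, END = "!markerstart!", "!markerend!"


def stuff(text: str, start: int, length: int, replacement: str) -> str:
    """Delete `length` characters at 1-based `start` and insert `replacement` there."""
    return text[:start - 1] + replacement + text[start - 1 + length:]


def remove_segment(text: str) -> str:
    """Remove everything from the start marker through the end marker."""
    s = text.find(START) + 1  # 1-based, 0 when absent (like CHARINDEX)
    e = text.find(END) + 1
    if s == 0 or e == 0 or e < s:
        return text  # added guard: leave rows without a well-formed segment alone
    # Same arithmetic as the query: end position + marker length - start position.
    return stuff(text, s, e + len(END) - s, "")


sample = "Keep this. !markerstart! drop this !markerend! Keep that."
print(repr(remove_segment(sample)))
```

Note that the spaces on both sides of the removed segment survive, so the result contains a double space; trimming one of them would be a separate step.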
You may want to consider implementing a CLR user-defined function that returns the parsed data.
The following link demonstrates how to use a CLR UDF RegEx function for pattern matching and data extraction.
http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
Regards,
You can use the STUFF function or the REPLACE function to replace the unwanted text with ''. The start position and the number of characters are integers:
STUFF(expression, start_pos, number_of_chars, replace_expression)

How to write this kind of SQL in MySQL?

I want to update a column col in table tab, whose data looks like the following (comma separated, with a leading comma):
,test,oh,whatever,....,
It can be too long to display. How can I update the column so that only the first 10 words are left?
You are looking for substring_index
UPDATE tab
SET col = SUBSTRING_INDEX(col, ',', 11)
(do check your UPDATES with SELECT before running them)
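Why the count is 11 rather than 10 follows from the leading comma: the text before the first comma is an empty field. A minimal Python model of SUBSTRING_INDEX for positive counts illustrates this. The helper substring_index and the sample words are hypothetical; it imitates MySQL's behaviour only for count > 0.

```python
def substring_index(text: str, delim: str, count: int) -> str:
    """Return everything before the count-th occurrence of delim (count > 0 only).

    If delim occurs fewer than `count` times, the whole string is returned.
    """
    parts = text.split(delim)
    return delim.join(parts[:count])


# Hypothetical data in the question's format: leading and trailing comma.
words = ["w%d" % i for i in range(1, 13)]  # w1 .. w12
value = "," + ",".join(words) + ","

shortened = substring_index(value, ",", 11)
print(shortened)
```

The result keeps the leading comma and the first 10 words but no trailing comma; if the trailing comma matters to whatever reads the column, it would have to be appended again.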
Not an answer to your question, but I would recommend doing this kind of thing at the application level, not in the database.
You are not saying what language you are using. In PHP, this would be a job for the wordwrap function. It is able to intelligently chop off strings at the right position.
Alternatively, is storing the full string in the database, and doing the cutting at output time, not an option as well?

Isolate SQL field using regex

I'm trying to isolate a specific field in a SQL dump file so I can edit it but I'm not having any luck.
The regex I'm using is:
^(?:(?:'[^\r\n']*'|[^,\r\n]*),){6}('[^\r\n']*'|[^,\r\n]*)
Which is supposed to grab the seventh field and place it inside reference 1.
The trouble is that this stumbles whenever it finds a comma inside a text field and counts the partial match toward the six allowed fields.
Eg. (1, 'Title', 1, 3, '2006-09-29', 'Commas, the bane of my regex', 'This is the target', 2, 4) matches " the bane of my regex'" instead of "'This is the target'".
It might be easier to load the SQL into a temp database and then do a SELECT to get the data in that field.
Do you have control over the dump file, or are they historic or outside of your control?
If you can, choose a better delimiter; comma really is a terrible choice.
[^,\r\n]*, matches
'Commas,
I suggest [^,\r\n']*, instead.
I think you will have more luck if you make the regex more specific. I haven't tested this, but I believe it should work.
Also as Paul suggests you might try a different delimiter to make this easier.
Enjoy!
\d{1,4}(,){1}('){1}[a-zA-Z0-9,]+('){1}\d{1,4}(,){1}\d{1,4}(,){1}('){1}[0-9-]+('){1}(,){1}('){1}[a-zA-Z0-9,]+('){1}(,){1}('){1}[a-zA-Z0-9,]+('){1}(,){1}\d{1,4}(,){1}\d{1,4}(\r\n){1}
Doh!
My fields weren't just split with a comma. They were split with a comma followed by a space.
Correct RegEx is
^(?:(?:'[^\r\n']*'|[^,\r\n]*), ){6}('[^\r\n']*'|[^,\r\n]*)
Now it works.
Sorry to waste your time with this one. It was Beta's response that got me thinking, as it was the second alternation that was in play for all fields. Without the space in the separator, each field began with a space, so the quoted alternative could never match; adding the space lets quoted fields match the quoted alternative.
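The two patterns can be compared directly on the example row from the question. The sketch below uses Python's re module, which is an assumption since the question does not say which regex engine was used; the behaviour shown relies only on features common to most engines.

```python
import re

# The example row from the question, split across two literals for readability.
line = ("(1, 'Title', 1, 3, '2006-09-29', "
        "'Commas, the bane of my regex', 'This is the target', 2, 4)")

# Original pattern: fields separated by a bare comma.
original = re.compile(r"^(?:(?:'[^\r\n']*'|[^,\r\n]*),){6}('[^\r\n']*'|[^,\r\n]*)")
# Corrected pattern: fields separated by a comma followed by a space.
corrected = re.compile(r"^(?:(?:'[^\r\n']*'|[^,\r\n]*), ){6}('[^\r\n']*'|[^,\r\n]*)")

seventh_original = original.match(line).group(1)
seventh_corrected = corrected.match(line).group(1)

print(repr(seventh_original))
print(repr(seventh_corrected))
```

With the bare-comma separator every field after the first starts with a space, so the quoted alternative never applies and the comma inside 'Commas, the bane of my regex' is treated as a field boundary.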

Full Text Searching for single characters

I have a table with a TEXT column where the contents are just strings of CSV numbers. Example: ",1,76,77,115,". Each string can have an arbitrary number of numbers.
I am trying to set up Full Text Indexing so that I can search this column rapidly. This works great. Instead of running queries with
where MY_COL LIKE '%,77,%' and MY_COL LIKE '%,115,%'
I can do
where CONTAINS(MY_COL,'77 and 115')
However, when I try to search for a single character it doesn't work.
where CONTAINS(MY_COL,'1')
But I know that there should be records returned! I quickly found that I need to edit the Noise file and rebuild the index. But even after doing that it still doesn't work.
Working with relational databases that way is going to hurt.
Use a proper schema. Either store the values in different rows or use an array datatype for the column.
That will make solving the problem trivial.
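As a rough illustration of the "different rows" suggestion, the Python sketch below splits each CSV string into one (row id, value) pair per number, after which finding records that contain 1 is a plain equality test. The names split_values, rows and normalized are hypothetical, and the second sample string is made up; in the database this would correspond to a child table with one number per row.

```python
def split_values(csv_text: str) -> list:
    """Turn ',1,76,77,115,' into [1, 76, 77, 115], skipping the empty edge fields."""
    return [int(part) for part in csv_text.split(",") if part]


# Hypothetical source rows keyed by id, in the format from the question.
rows = {1: ",1,76,77,115,", 2: ",12,15,33,"}

# One (row_id, value) pair per number, as a normalized child table would hold.
normalized = [(row_id, value)
              for row_id, text in rows.items()
              for value in split_values(text)]

# Searching for the value 1 is now an exact match, not a text search.
matches = sorted({row_id for row_id, value in normalized if value == 1})
print(matches)
```

An exact comparison like this does not confuse 1 with 12 or 115, and an ordinary index on the value column could serve the lookup.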
I fixed my own problem, although I'm not exactly sure what fixed it.
I dropped my table and populated a new one (my program does batch processing) and created a new Full Text Index. Maybe I wasn't being patient enough to allow the indexing to fully rebuild.
Agreed. How does 12,15,33 not return that record for a search for 1 with fulltext? Use an actual table schema to accomplish this.