Blank Space in every row of table SQL - sql

Hello, I have a table with rows,
and I was doing a simple
select * from table where column = 'string'
and it gives me back no result, but when I use:
select * from table where column like '%string%'
it gives me the row that exists in my table.
Then I did a select * from table and noticed that there is a blank space before my rows:
Image of my SQL result
If you look closely, there's a space at the beginning of the second row; only the first row has no blank space.
So I thought it was a simple white space at the beginning, but when I tried using this:
SELECT LTRIM(RTRIM(MATERIAL)) FROM table
nothing happened.
Then I tried to copy the result of my
select * from table
to Excel and noticed this:
Excel paste from SQL
My 2nd row got split into 2 rows right at the start of the column 'material', so the thing I thought was a blank space is something like a line break.
I have never had or seen this problem before.

Larnu has commented how to remove all the linebreaks from the data. Here are some other things that could also work, and slightly differently depending on the effect you want:
--trim everything that is not a number or letter off the left hand side only
UPDATE table SET material = SUBSTRING(material, PATINDEX('%[0-9a-z]%', material), 99999)
--convert all linebreaks to spaces and trim off the left and right spaces
UPDATE table SET material = RTRIM(LTRIM(REPLACE(material, CHAR(10), ' ')))
Larnu's SQL isn't wrong; it just removes every line break anywhere, which may cause more formatting disruption than is wanted. I'd be tempted to replace all the linebreaks with spaces, so that two words separated by a line break remain separated by a space rather than becoming one word:
some
word
-> some word (if you replace linebreak with space)
-> someword (if you replace linebreak with nothing)
If all you want is to remove linebreaks from the left side of the field, the patindex method will search the field for the first occurrence of a number or a letter and return its index, then substring will cut everything from that index for a length of 99999 (use a bigger number if your field is longer). This has the effect of removing only linebreaks at the start of the field.
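To make the difference between these clean-ups concrete, here is a small sketch in Python rather than T-SQL (chosen only so it can be run anywhere). The function names are made up for illustration, and the regex is an assumed stand-in for the PATINDEX pattern, not the SQL itself.

```python
import re

def linebreaks_to_spaces(value: str) -> str:
    """Turn line feeds into spaces, then trim spaces from both ends
    (roughly RTRIM(LTRIM(REPLACE(material, CHAR(10), ' '))))."""
    return value.replace("\n", " ").strip(" ")

def remove_linebreaks(value: str) -> str:
    """Remove every line feed (the 'replace with nothing' option)."""
    return value.replace("\n", "")

def strip_leading_non_alnum(value: str) -> str:
    """Cut everything before the first letter or digit
    (roughly SUBSTRING(material, PATINDEX('%[0-9a-z]%', material), 99999))."""
    match = re.search(r"[0-9A-Za-z]", value)
    # If there is no letter or digit at all, leave the value as it is.
    return value[match.start():] if match else value

sample = "\nsome\nword"
print(repr(linebreaks_to_spaces(sample)))
print(repr(remove_linebreaks(sample)))
print(repr(strip_leading_non_alnum(sample)))
```

Only the last function keeps the line break between the two words, because it touches nothing after the first letter or digit.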
As to how it happened, whoever inserted the data, or the data import program, made some mistakes when it was cutting up the data. Perhaps it was a Windows style text file, whose line endings are CR LF (ASCII 13 followed by 10), and the program that did the import decided to cut the file up based on the 13 only, leaving behind the 10 to become "part of" the material field:
this,is,my,data1<13><10>this,is,my,data2<13><10>
//now lets cut it up into 2 records, based on using <13> only to denote the end of line:
record 1= this,is,my,data1
record 2= <10>this,is,my,data2
The program just sees a stream of bytes, it is we humans that interpret "lines". If the program treats 13 as the separator, then all the 10s get left behind as part of the data that gets inserted. The very first record in the file won't have 13/10 (crlf) before it because it's the first line, so one of your rows (the one with ascii (49)) won't suffer this problem
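The guess above can be reproduced in a few lines. This is a Python sketch of a hypothetical importer (the function name is invented, and nothing here is the real import tool) that treats CR alone as the record separator:

```python
from typing import List

def split_on_cr_only(raw: str) -> List[str]:
    """Cut a stream into records using CR (13) alone as the separator."""
    pieces = raw.split("\r")
    # Drop pieces that are empty or contain nothing but an orphaned LF.
    return [p for p in pieces if p.strip("\n") != ""]

# Two lines of a Windows-style file: each ends in CR LF (13, 10).
raw = "this,is,my,data1\r\nthis,is,my,data2\r\n"
records = split_on_cr_only(raw)

for record in records:
    # ord() of the first character shows whether an LF (10) was left behind.
    print(repr(record), ord(record[0]))
```

The first record starts with an ordinary character, while the second starts with character 10, which matches the symptom in the question.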
You could "cure" the bad data with a trigger upon insert:
CREATE TRIGGER prevent_bad_data
ON yourtable
INSTEAD OF INSERT
AS
BEGIN
INSERT INTO yourtable(somecolumn,othercolumn,material)
SELECT somecolumn,
othercolumn,
LTRIM(REPLACE(material, CHAR(10), ' '))
FROM Inserted;
END
Or you could program the db to reject bad rows and fix the tool that is inserting the bad data:
ALTER TABLE yourtable
ADD CONSTRAINT prevent_bad_material
CHECK (material LIKE '[0-9a-z]%'); --check it starts with a number or letter
Edit: though having seen your updated question with screenshots, the material column really should be a number, not a varchar type, then this wouldn't happen

Related

SQL code to remove extra spaces and line breaks in free text?

I'm currently working with a table that deals with patients who have visited a clinic. One of the fields in this table shows the reason for the visit, and it's free text so whoever's booking the appointment can leave a custom note for the doctor depending on what the issue is. Yes, I'm well aware free text is the actual worst, but I did not design this database or the front-end medical record system (which is also the worst) and I'm simply stuck dealing with it. Bear with me.
Because of the special characters, extra spaces, and carriage returns that often find their way into that free text field on the front end, all its contents would show up on a single line in SSMS but would cause all sorts of formatting issues with extra line breaks when the SQL results were pasted into Excel. I did a little research and found a snippet of code that would replace carriage returns, etc. in a given field, thus forcing all the contents of that field to remain in a single cell:
REPLACE(REPLACE(FieldName,char(10),''),char(13),'') as FieldName
This has worked splendidly for this VisitReason field and any other free text fields I've been forced to work with. However, does it account for every possible issue one might find in free text? Yesterday I was working with this table and pasted the results from SSMS into Excel, and there were two people whose VisitReason fields were cut off prematurely and then had all the results (as in multiple fields) from a bunch of other people's visits crammed into that same field (thus making for one really long cell in Excel).
For example, the VisitReason for one of these people showed up in SSMS as complaining of rash, see note. But then when it was pasted into Excel, the results looked like...
PatientID PatientName VisitDate ... VisitReason
----------------------------------------------------------------------------------------------
1001 Smith, John 01/08/2023 ... complaining of rash, see
PatientID1002PatientNameJaneDoeVisitDate01/08/2023VisitRe
asondiabetesfollowupPatientID1003PatientNameBobBrownVisitDa
(and so on)
I can't tell if this has something to do with the free text field, and there's some hidden character in there that's causing the weird line breaks and field merging that my REPLACE function isn't catching, or whether it's an error with Excel (in which case this obviously isn't the right place to be asking). But I wanted to check and see if there was anything that potentially needed to be added to the REPLACE line that would fix the problem.
My full query is really simple:
SELECT
d.PatientID,
d.PatientName,
v.VisitDate,
[some other visit-related fields, none of which are free text],
REPLACE(REPLACE(v.VisitReason,char(10),''),char(13),'') as VisitReason,
[some other demographic fields, none of which are free text]
FROM Demographics d
JOIN Visit v ON d.PatientID = v.PatientID
The REPLACE function works perfectly fine for literally every other patient in the list except for the two with results like what's shown above, which then go on to affect a number of other rows following them. Anyone have any thoughts?
Please try the following solution.
The xs:token data type is stripping out the white space characters.
USE tempdb;
GO
DROP FUNCTION IF EXISTS dbo.udf_tokenize;
GO
/*
1. All invisible TAB, Carriage Return, and Line Feed characters will be replaced with spaces.
2. Then leading and trailing spaces are removed from the value.
3. Further, contiguous occurrences of more than one space will be replaced with a single space.
*/
CREATE FUNCTION dbo.udf_tokenize(@input VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
RETURN (SELECT CAST('<r><![CDATA[' + @input + ' ' + ']]></r>' AS XML).value('(/r/text())[1] cast as xs:token?','VARCHAR(MAX)'));
END
GO
-- DDL and sample data population, start
DECLARE @mockTbl TABLE (ID INT IDENTITY(1,1), col_1 VARCHAR(100), col_2 VARCHAR(100));
INSERT INTO @mockTbl (col_1, col_2) VALUES
(CHAR(13) + ' FL ' + CHAR(9), CHAR(10) + ' Miami'),
(' FL ', ' Fort Lauderdale '),
(' NY ', ' New York '),
(' NY ', ''),
(' NY ', NULL);
-- DDL and sample data population, end
-- before
SELECT *, LEN(col_2) AS [col_2_len]
FROM @mockTbl;
-- remove invisible white space chars
UPDATE @mockTbl
SET col_1 = dbo.udf_tokenize(col_1)
, col_2 = dbo.udf_tokenize(col_2);
-- after
SELECT *, LEN(col_2) AS [col_2_len]
FROM @mockTbl;

How to detect a postcode (5-7 char) always at the right of a string, and insert a SPACE before it, in a SQL update statement

However, before the space is inserted, the expression needs to check that it has not already inserted the space, so we do not end up with 2 spaces, then 3 and so on, so this...
HAUGHTON HOUSEALFORDAB33 8NA
BURIAL GROUNDTOUGHALFORDAB33 8ER
TORRYCRIEN CROFTGLENKINDIEALFORDAB33 8SQ
KIRKTON FORBESALFORDG1 1DN
Would become...
HAUGHTON HOUSEALFORD AB33 8NA
BURIAL GROUNDTOUGHALFORD AB33 8ER
TORRYCRIEN CROFTGLENKINDIEALFORD AB33 8SQ
KIRKTON FORBESALFORD G1 1DN
This could be tricky, maybe combining with REGEX in some way; however, because letters come directly before the first letter of the POSTCODE, it may need to detect a valid POSTCODE.

How to break a string apart by a character that occurs multiple times in SQL

I am looking to break a column into two columns by a character. Some of the rows have the character occurring multiple times and I need to key on the last occurrence.
Example:
400000007_MOD-HUD_1-1.jpg
I want to break into two columns
Column 1: 400000007_MOD-HUD_1
Column 2: -1.jpg
The data looks like this,
200000297_R-1_1-1.jpg
400000007_MOD-HUD_1-1.jpg
500000334_R-1_1-1.jpg
500000334_R-2_MOD_HUD_1-1.jpg
500000342_MOD-HUD_1-1.jpg
1200000177_MOD-HUD_1-1.jpg
1300000433_C-1-EQSHED_1-1.jpg
1300000433_C-3-UB_1-1.jpg
2100000375_C_1-5_Barn_1-1.jpg
The character I want to split them by is "-". This character occurs multiple times in some of these file names and I want to key on the last occurrence.
Here's a possible solution you can try. Possible as I'm guessing you want to break on the last - character. Assuming ssms tag implies SQL Server, try the following:
select Left(col,Len(col)-p) Col1, Right(col,p) Col2
from t
cross apply (values(CharIndex('-',Reverse(col))))x(p)
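For checking the expected output outside the database, the same "split at the last hyphen" rule can be sketched in Python. The function name is hypothetical, and the handling of a value with no hyphen (everything goes to the first column) is an assumption chosen to match what the Left/Right query above does when CharIndex returns 0.

```python
from typing import Tuple

def split_on_last_hyphen(value: str) -> Tuple[str, str]:
    """Return (text before the last '-', the last '-' and everything after it)."""
    index = value.rfind("-")
    if index == -1:
        # No hyphen: keep the whole value in the first column.
        return value, ""
    return value[:index], value[index:]

for name in ["400000007_MOD-HUD_1-1.jpg", "2100000375_C_1-5_Barn_1-1.jpg"]:
    print(split_on_last_hyphen(name))
```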

Excel Macro to Convert Cell Data to Multiple Columns

I have a big bunch of cells in Excel that look like the following:
FName LName, Loc JB
Abbreviations are bad, so: First Name, Last Name, Location, Job.
I need to move that so it looks like this:
FName LName || LOC || JB
Caveats:
Must remove the , after the name.
Must capitalize the location (it's 2 or 3 letters, inconsistently capitalized. I want to make them all caps).
JB is anywhere from 1 to 4 characters on the end. I just need to take that last bit and dump it in.
They're all separated by at least a space (the first has a comma and a space).
I'd like a macro to do this, because I have to do it with relative frequency, and doing 200 rows of this by hand is a pain. Any help?
Sounds like all you need is a formula in the next column. If your values are in column A starting with cell A1, try:
=LEFT(A1,FIND(",",A1)-1)&" || "&UPPER(MID(A1,FIND(",",A1)+2,FIND(" ",A1,FIND(",",A1)+2)-FIND(",",A1)-2))&" || "&RIGHT(A1,LEN(A1)-FIND(" ",A1,FIND(",",A1)+2))
This formula takes everything to the left of the comma and adds " || ". Then it finds the next space starting its search two characters after the comma. Using that index it then can extract the location and make it upper case. Then again we add " || ". Then knowing the index of that space we can grab everything to the right to grab the job. This same logic can be applied in VBA but this is probably a quicker solution and easier to pass between computers.
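If the nested formula is hard to follow, the same steps can be written out one at a time. This is a Python sketch, not the Excel or VBA solution itself; the function name and the sample value are invented, and it assumes every cell contains a comma followed by both a location and a job.

```python
def reformat_cell(cell: str) -> str:
    """Turn 'FName LName, Loc JB' into 'FName LName || LOC || JB'."""
    # Everything left of the first comma is the name.
    name, rest = cell.split(",", 1)
    # The remainder is split on whitespace: first token is the location,
    # last token is the job.
    parts = rest.split()
    location = parts[0].upper()
    job = parts[-1]
    return f"{name} || {location} || {job}"

print(reformat_cell("John Smith, ny Mgr"))
```

Splitting on any run of whitespace also copes with cells where the parts are separated by more than one space.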
You don't necessarily need a macro. You can do it with a series of right()'s left()'s mid()'s and find()'s. Also need to use Upper() for the loc.
For instance, if your data is in column A, to get a column with first and last name in column B, in B1 you could use:
=LEFT(A1,FIND(",",A1)-1)
That'll return everything up to but not including the comma. For Loc, assuming there's a comma between Loc and JB in C1 you'd use:
=UPPER(MID(A1,FIND(",",A1)+2,FIND(",",A1,FIND(" ",A1)+1)-FIND(" ",A1)-3))
That'll return the uppercase version of the middle of the string, starting 2 chars after the first comma (so you don't get the space), and ending 2 less than the difference between the first and second commas. If there's no comma, you could do a similar set of searches to find where that last space is.
The last, in D1, is:
=RIGHT(A1,LEN(A1)-FIND(" ",A1,FIND(",",A1)+2))
edited after the clarification of commas and spaces.

how do I retrieve data from a sql table with huge number of inputs for a single column

I have a Company table in SQL Server and I would like to retrieve data for a particular list of companies. The list is large, around 200 company names. I am trying to use the IN clause of T-SQL, but a few of the companies have special characters in their names, like O'Brien, so it throws an error.
SELECT *
FROM COMPANY
WHERE COMPANYNAME IN
('Archer Daniels Midland'
'Shell Trading (US) Company - Financial'
'Redwood Fund, LLC'
'Bunge Global Agribusiness - Matt Thibodeaux'
'PTG, LLC'
'Morgan Stanley Capital Group'
'Vitol Inc.'..
.....
....
.....)
Above is the script that is not working for obvious reasons, is there any way I can input those company names from an excel file and retrieve the data?
The easiest way would be to make a table and join it:
CREATE TABLE dbo.IncludedCompanies (CompanyName varchar(1000))
INSERT INTO dbo.IncludedCompanies
VALUES
('Archer Daniels Midland'),
('PTG, LLC')
...
SELECT *
FROM Company C
JOIN IncludedCompanies IC
ON C.CompanyName = IC.CompanyName
I do not think that mysql knows how to handle excel format, but you can fix your query.
Check how complicated names are stored in the database (check if they have escape characters in them or anything else).
Replace all ' with \' in your query and it will take care of the ' characters
mysql> select now() as 'O\'Brian'; returns
O'Brian
2014-03-17 15:06:39
So I'm guessing you have an Excel sheet with a column containing these names, and you want to use this in your where clause. In addition, some of the values have special characters in them, which need to be escaped.
First, escape the ' characters. You do this in Excel with a search and replace of all occurrences of ' with '' (the escaped version in SQL Server; it is \' in MySQL). Then create a new column on each side of your companies column, and in the first row input a ' on the left-hand side and ', on the right. Then use the copy cell functionality (the little square in the bottom right of the cell when you select it) to copy the cells to the left and right down to all the rows, as far as the company list goes (just grab the square and pull it downwards).
Then, take your list, now containing three columns and x rows and paste it into your favorite text editor. It should look something like this:
' Company#1 ',
' Company with special '' char ',
[...]
' Last company ',
Now you will have some whitespace to get rid of. Use search and replace to replace two space characters with nothing, and repeat (or take the space from the first ' to the start of the text and replace this with nothing).
Now, you should have a list of:
'Company#1',
'Company with special '' char',
[...]
'Last company',
Remove the last comma, and you'll have a valid list of parameters to your in-clause (or a (temporary) table if you want to keep your query a bit cleaner.)
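The manual Excel and text editor steps can also be done by a short script. The sketch below is in Python, the function name is hypothetical, and it assumes the SQL Server style of escaping (doubling each single quote); it only builds the text of the list and does not talk to the database.

```python
from typing import Iterable

def build_in_list(names: Iterable[str]) -> str:
    """Quote each name for an IN (...) clause, doubling embedded single quotes."""
    quoted = ["'" + name.replace("'", "''") + "'" for name in names]
    # Joining with ", " means there is no trailing comma to remove.
    return ", ".join(quoted)

companies = ["Archer Daniels Midland", "O'Brien", "PTG, LLC"]
print("WHERE COMPANYNAME IN (" + build_in_list(companies) + ")")
```

For names that come from an untrusted source, loading them into a table and joining, as in the first answer, avoids building SQL text by hand.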