I get a TXT file from one of our source systems each night. It's basically a terminal report with headers, footers, titles, column headers, subtotals, etc. I'm trying to scrape discrete data elements from the file using SQL Server. The file is FTP'd to a Windows file share. The source system is AIX, the file's encoding is UTF-8, and its EOL marker is LF. I'm using SSIS to import the raw text report into a single-column table, with each report row becoming one record in the table. The column I'm storing the rows in is a VARCHAR(240), and I'm using SQL Server 2016.
For the report rows that I want to use, the one thing they have in common is that they all start with two spaces. Here's an example record from a text report I've loaded to SQL:
COLUMN_NAME
  AD DEPT      0     0     0   0     0     0    0     0    0   0.0   0      0  0.00  0.00      0      0      0   0.0
When I try to select the record using:
SELECT *
FROM TABLE_NAME
WHERE COLUMN_NAME LIKE ' %';
No rows are returned in my result set. However, REPLACE seems to recognize that the row starts with two spaces.
So this:
SELECT REPLACE([COLUMN_NAME],' ','$')
FROM TABLE_NAME
Returns this:
COLUMN_NAME
$$AD$DEPT$$$$$$0$$$$$0$$$$$0$$$0$$$$$0$$$$$0$$$$0$$$$$0$$$$0$$$0.0$$$0$$$$$$0$$0.00$$0.00$$$$$$0$$$$$$0$$$$$$0$$$0.0
Can someone help me understand why REPLACE sees that there are two leading spaces in the row but the plain SELECT does not?
If you know that the value has two spaces at the start, then you can use the single-character [ ] wildcard in your LIKE expression.
For example:
CREATE TABLE testString (
sampleImport varchar(20)
)
GO
INSERT INTO testString (sampleImport)
VALUES ('  AD DEPT 0 0 0'), ('BD DEPT 0 0 0') -- the first row starts with two spaces
GO
SELECT *
FROM testString
WHERE sampleImport LIKE '[ ][ ]%'
GO
SELECT *
FROM testString
WHERE sampleImport NOT LIKE '[ ][ ]%'
GO
The [] wildcard matches any single character from the set enclosed in the brackets, so placing a single space within the brackets matches exactly one space.
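To see how a bracketed set compares with literal spaces, the sketch below tests one sample value against a few patterns. The value is illustrative, not taken from the real report, and is built with SPACE(2) so the two leading spaces are unambiguous.

```sql
-- Illustrative row that starts with two regular spaces (CHAR(32)).
DECLARE @row varchar(240) = SPACE(2) + 'AD DEPT 0 0';

-- Test the same value against several patterns.
SELECT p.pattern,
       CASE WHEN @row LIKE p.pattern THEN 'match' ELSE 'no match' END AS result
FROM (VALUES ('[ ][ ]%'),        -- bracketed set holding one space, used twice
             (SPACE(2) + '%'),   -- two literal spaces, then anything
             (SPACE(1) + '%'),   -- one literal space: also matches, % absorbs the rest
             ('[ ]AD%')          -- exactly one space before AD: no match
     ) AS p(pattern);
```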
I also noticed that your pattern had only a single space before the % character. Although I can't find this documented, I suspect your version is failing because a leading space is not being treated as a valid character for the pattern (although by definition it should be). When using LIKE ' %' it works with my test data.
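When it is unclear what the leading characters really are, one way to check is to look at their code points directly. This is a sketch under assumptions: @Report is a table variable standing in for the question's TABLE_NAME, and the second row uses non-breaking spaces (CHAR(160), assuming a code page 1252 collation) purely as a hypothetical look-alike.

```sql
-- Stand-in for the question's TABLE_NAME / COLUMN_NAME.
DECLARE @Report TABLE (COLUMN_NAME varchar(240));

INSERT INTO @Report (COLUMN_NAME)
VALUES (SPACE(2) + 'AD DEPT 0 0'),                 -- two regular spaces
       (CHAR(160) + CHAR(160) + 'BD DEPT 0 0');    -- hypothetical look-alike: non-breaking spaces

-- 32 = space, 9 = tab, 160 = non-breaking space (code page 1252).
SELECT UNICODE(SUBSTRING(COLUMN_NAME, 1, 1)) AS FirstCharCode,
       UNICODE(SUBSTRING(COLUMN_NAME, 2, 1)) AS SecondCharCode,
       COLUMN_NAME
FROM @Report;
```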
Related
I have an nvarchar(50) column myCol with 16-character alphanumeric values starting with '0', like these:
0b00d60b8d6cfb19, 0b00d60b8d6cfb05, 0b00d60b8d57a2b9
I am trying to delete rows with myCol values that don't match those 3 criteria.
By following this article, I was able to select the records starting with '0'. However, despite the [a-z0-9] part of the regex, it also keeps selecting myCol values containing special characters like 00-d#!b8-d6/f&#b. Below is my select query:
SELECT * from Table
WHERE myCol LIKE '[0][a-z0-9]%' AND LEN(myCol) = 16
How should the expression be changed to select only rows with myCol values that don't contain special characters?
If the value must contain only a-z and digits, and must start with a 0, you could use the following:
SELECT *
FROM (VALUES(N'0b00d60b8d6cfb19'),
(N'0b00d60b8d6cfb05'),
(N'0b00d60b8d57a2b9'),
(N'00-d#!b8-d6/f&#b'))V(myCol)
WHERE V.myCol LIKE '0%' --Checks starts with a 0
AND V.myCol NOT LIKE '%[^0-9a-zA-Z]%' --Checks it only contains alphanumeric characters
AND LEN(V.myCol) = 16;
The second clause works because the LIKE matches any value containing at least one character that isn't alphanumeric. The NOT then (obviously) reverses that, so the expression only resolves to TRUE when the value contains only alphanumeric characters.
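The question's actual goal is to delete the rows that fail these checks. A minimal sketch of that follows, with a table variable @T standing in for the real table, holding the question's sample values plus a few made-up failing ones. It assumes NULLs should also count as not matching (drop the IS NULL line if they shouldn't); on real data, running the same WHERE clause in a SELECT first is a sensible precaution.

```sql
DECLARE @T TABLE (myCol nvarchar(50));   -- stand-in for the real table

INSERT INTO @T (myCol)
VALUES (N'0b00d60b8d6cfb19'), (N'0b00d60b8d6cfb05'), (N'0b00d60b8d57a2b9'),
       (N'00-d#!b8-d6/f&#b'),   -- special characters
       (N'1b00d60b8d6cfb19'),   -- does not start with 0
       (N'0b00d60b'),           -- too short
       (NULL);                  -- assumption: NULL is treated as not matching

DELETE FROM @T
WHERE myCol IS NULL
   OR NOT (myCol LIKE N'0%'                       -- starts with 0
           AND myCol NOT LIKE N'%[^0-9a-zA-Z]%'   -- only letters and digits
           AND LEN(myCol) = 16);                  -- exactly 16 characters

SELECT myCol FROM @T;   -- the three valid sample values remain
```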
Pattern matching in SQL Server is not awesome, and there is currently no real regex support.
The % in your pattern is what lets through the special characters you show in your example: [a-z0-9] only matches a single character, and % then matches anything after it. If your values are always 16 characters long and you're only interested in letters and numbers, you can include a character set for each position:
SELECT *
FROM Table
WHERE myCol LIKE '[0][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9]';
Note: you don't need the AND LEN(myCol) = 16 with this.
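As a variation not taken from the answer, the 16-character pattern can be built with REPLICATE instead of being typed out by hand; @pattern is a hypothetical variable name. Under a case-sensitive collation, [a-z0-9] will not match uppercase letters.

```sql
-- '0' followed by fifteen single-character sets.
DECLARE @pattern nvarchar(200) = N'0' + REPLICATE(N'[a-z0-9]', 15);

SELECT V.myCol
FROM (VALUES (N'0b00d60b8d6cfb19'),
             (N'0b00d60b8d6cfb05'),
             (N'0b00d60b8d57a2b9'),
             (N'00-d#!b8-d6/f&#b')) AS V(myCol)
WHERE V.myCol LIKE @pattern;   -- returns the first three values only
```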
I have encountered the scenario below:
DECLARE @var int = ' 123'
SELECT @var
DECLARE @var1 int = CHAR(9) + '123' -- a tab, then 123
SELECT @var1
In the first case I used spaces in front of the value, and when executed it returns 123.
In the second case I used a tab instead of spaces in front of the value, and when executed it throws a conversion error.
Can anyone explain the difference between these two scenarios?
Even though the two values look the same (spaces in one, a tab in the other), the characters have different codes (a space is 32, a tab is 9), and that is why SQL Server treats a space and a TAB differently.
More information about character codes and character encoding can be found at the two links below:
https://www.computerhope.com/jargon/c/charcode.htm
https://www.pcmag.com/encyclopedia/term/51983/standards-character-codes
Also, if you think about it mathematically and logically, having spaces before an integer does not make sense. It's like having zeros before a number.
For example, '     123' (5 spaces and then 123) is like 00000123.
That is one more reason that spaces before integers are trimmed.
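Both points can be seen directly in SQL Server: the character codes differ, a string with leading spaces converts to int while one with a leading tab does not, and replacing the tab first is one possible workaround. TRY_CAST (SQL Server 2012 and later) is used so the failing case returns NULL instead of stopping the batch; the REPLACE workaround is an illustration, not something from the answer above.

```sql
-- Space and tab are different characters.
SELECT ASCII(SPACE(1)) AS SpaceCode,   -- 32
       ASCII(CHAR(9))  AS TabCode;     -- 9

-- Leading spaces are ignored when a string is converted to int...
SELECT CAST(SPACE(5) + '123' AS int) AS FromSpaces;     -- 123

-- ...but a leading tab makes the conversion fail (TRY_CAST returns NULL instead of an error).
SELECT TRY_CAST(CHAR(9) + '123' AS int) AS FromTab;     -- NULL

-- One possible workaround: turn tabs into spaces before converting.
SELECT CAST(REPLACE(CHAR(9) + '123', CHAR(9), ' ') AS int) AS TabReplaced;   -- 123
```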
How would I go about constructing a query that returns all material numbers that have a blank space either BEFORE or AFTER the number string? We export straight from SSMS to Excel, and we see the problem in the spreadsheet. If I could return all of the material numbers with spaces, I could go in and edit them, or do a REPLACE to fix the issue before exporting. (The material numbers are imported via a Windows application to which users upload an Excel template. The template has all of this data, and sometimes users put spaces before or after the material number.)
Our query used to work, but now it does not return anything. Yet on export we find these problems (highlighted in the left screenshot), and when we query the table for that material number (right screenshot), it does indeed have a space before the 1.
Currently the query we use looks like:
SELECT Mtrl
FROM dbo.Source
WHERE Mtrl LIKE '% %'
Since you are getting the data from a query, you should just have that query remove any potential spaces using LTRIM and RTRIM:
LTRIM(RTRIM([MTRL]))
Keep in mind that these two functions remove only spaces, not tabs, returns, or other white-space characters.
Doing the above ensures the entire result set is clean, whether or not you find and fix the individual rows.
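To illustrate the caveat about tabs, the sketch below trims a value that starts with a tab and ends with a space, using LTRIM/RTRIM with their default behaviour. DATALENGTH is used instead of LEN because LEN ignores trailing spaces; @v is just an illustrative variable.

```sql
DECLARE @v varchar(20) = CHAR(9) + '12345' + SPACE(1);   -- tab, digits, trailing space

SELECT DATALENGTH(@v)               AS OriginalBytes,   -- 7
       DATALENGTH(LTRIM(RTRIM(@v))) AS TrimmedBytes,    -- 6: only the trailing space is gone
       ASCII(LTRIM(RTRIM(@v)))      AS FirstCharCode;   -- 9: the leading tab is still there
```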
Or, since you are copying and pasting from the Results Grid into Excel, you can just CONVERT the value to a number, which naturally removes any spaces:
SELECT CONVERT(INT, ' 12 ');
Returns:
12
So you would just use:
CONVERT(INT, [Mtrl])
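Two things are worth checking before relying on CONVERT(INT, ...): it drops leading zeros, and it raises an error for any value that isn't a valid int (for example, one containing a tab). A sketch that uses TRY_CONVERT (SQL Server 2012 and later) to list the values a plain CONVERT would reject, with a table variable @Source and made-up rows standing in for dbo.Source:

```sql
DECLARE @Source TABLE (Mtrl varchar(50));   -- stand-in for dbo.Source

INSERT INTO @Source (Mtrl)
VALUES ('12345'),
       (SPACE(1) + '12345'),   -- leading space: converts fine
       ('00123'),              -- converts, but leading zeros are lost
       (CHAR(9) + '12345');    -- leading tab: conversion fails

-- TRY_CONVERT returns NULL where CONVERT would raise an error.
SELECT Mtrl,
       TRY_CONVERT(int, Mtrl) AS AsInt
FROM @Source;

-- Only the rows that would break CONVERT(INT, Mtrl):
SELECT Mtrl
FROM @Source
WHERE Mtrl IS NOT NULL
  AND TRY_CONVERT(int, Mtrl) IS NULL;
```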
Now, if you want to find the data that has anything that is not a digit in it, you would use this:
SELECT Mtrl
FROM dbo.Source
WHERE [Mtrl] LIKE '%[^0-9]%'; -- any single non-digit character
If the issue is with non-space white-space characters, you can find out which ones they are via the following (to find them at the beginning instead of at the end, change the RIGHT to be LEFT):
;WITH cte AS
(
SELECT UNICODE(RIGHT([MTRL], 1)) AS [CharVal]
FROM dbo.Source
)
SELECT *
FROM cte
WHERE cte.[CharVal] NOT BETWEEN 48 AND 57 -- digits 0 - 9
AND cte.[CharVal] <> 32; -- space
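If you would rather check both ends in one pass instead of switching RIGHT to LEFT, one option is CROSS APPLY with a VALUES list. This is a sketch, again with a table variable @Source and made-up sample rows standing in for dbo.Source.

```sql
DECLARE @Source TABLE (Mtrl varchar(50));   -- stand-in for dbo.Source

INSERT INTO @Source (Mtrl)
VALUES ('12345'),
       (SPACE(1) + '12345'),   -- leading space: ignored by the filter below
       (CHAR(9) + '12345'),    -- leading tab
       ('12345' + CHAR(13));   -- trailing carriage return

-- Check the first and the last character of every row in one pass.
SELECT s.Mtrl, e.Position, e.CharVal
FROM @Source AS s
CROSS APPLY (VALUES ('first', UNICODE(LEFT(s.Mtrl, 1))),
                    ('last',  UNICODE(RIGHT(s.Mtrl, 1)))) AS e(Position, CharVal)
WHERE e.CharVal NOT BETWEEN 48 AND 57   -- not a digit 0 - 9
  AND e.CharVal <> 32;                  -- not a regular space
```

With these sample rows, it should flag the tab (9) at the start of one value and the carriage return (13) at the end of another.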
And you can fix in one shot using the following, which removes regular spaces (char 32 via LTRIM/RTRIM), tabs (char 9), and non-breaking spaces (char 160):
UPDATE src
SET src.[Mtrl] = REPLACE(
REPLACE(
LTRIM(RTRIM(src.[Mtrl])),
CHAR(160),
''),
CHAR(9),
'')
FROM dbo.Source src
WHERE src.[Mtrl] LIKE '%[' -- find rows with any of the following characters
+ CHAR(9) -- tab
+ CHAR(32) -- space
+ CHAR(160) -- non-breaking space
+ ']%';
Here I used the same kind of WHERE condition that you have (a single LIKE that looks anywhere in the value), since if there can't be any spaces at all, it doesn't matter whether you check both ends or the whole value (and a single LIKE may be faster than two).
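Before running the UPDATE on real data, it may be worth previewing what it would do. A sketch that applies the same expression and WHERE condition in a read-only SELECT, against a table variable @Source with made-up rows standing in for dbo.Source (the CHAR(160) row assumes a code page 1252 collation):

```sql
DECLARE @Source TABLE (Mtrl varchar(50));   -- stand-in for dbo.Source

INSERT INTO @Source (Mtrl)
VALUES ('12345'),
       (SPACE(1) + '12345'),
       (CHAR(9) + '12345'),
       ('12345' + CHAR(160));   -- non-breaking space

-- Same expression and filter as the UPDATE, but nothing is modified.
SELECT src.[Mtrl] AS [Before],
       REPLACE(
           REPLACE(
               LTRIM(RTRIM(src.[Mtrl])),
               CHAR(160),
               ''),
           CHAR(9),
           '') AS [After]
FROM @Source src
WHERE src.[Mtrl] LIKE '%['
                    + CHAR(9)     -- tab
                    + CHAR(32)    -- space
                    + CHAR(160)   -- non-breaking space
                    + ']%';
```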
I have MyTable with a Column Message NVARCHAR(MAX).
Record with ID 1 contains the Message '0123456789333444 Test'
When I run the following query
DECLARE @Keyword NVARCHAR(100)
SET @Keyword = '0123456789000001*'
SELECT *
FROM MyTable
WHERE CONTAINS(Message, @Keyword)
Record ID 1 shows up in the results, and in my opinion it should not, because 0123456789333444 does not contain 0123456789000001.
Can someone explain why the record is showing up anyway?
EDIT
select * from sys.dm_fts_parser('"0123456789333444 Test"',1033,0,0)
returns the following:
group_id  phrase_id  occurrence  special_term  display_term        expansion_type  source_term
1         0          1           Exact Match   0123456789333444    0               0123456789333444 Test
1         0          1           Exact Match   nn0123456789333444  0               0123456789333444 Test
1         0          2           Exact Match   test                0               0123456789333444 Test
This is because @Keyword is not wrapped in double quotes, so the asterisk is not treated as a prefix wildcard that matches zero, one, or more characters. From the documentation:
<prefix_term>: Specifies a match of words or phrases beginning with the specified text. Enclose a prefix term in double quotation marks ("") and add an asterisk (*) before the ending quotation mark, so that all text starting with the simple term specified before the asterisk is matched. The clause should be specified this way: CONTAINS (column, '"text*"'). The asterisk matches zero, one, or more characters (of the root word or words in the word or phrase). If the text and asterisk are not delimited by double quotation marks, so the predicate reads CONTAINS (column, 'text*'), full-text search considers the asterisk as a character and searches for exact matches to text*. The full-text engine will not find words with the asterisk (*) character because word breakers typically ignore such characters.
When <prefix_term> is a phrase, each word contained in the phrase is considered to be a separate prefix. Therefore, a query specifying a prefix term of "local wine*" matches any rows with the text of "local winery", "locally wined and dined", and so on.
Have a look at the MSDN documentation on the topic.
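Following that advice, here is a sketch of the quoted form: the double quotes go inside the search-condition string itself. Because CONTAINS expects a literal or a variable for the search condition rather than an arbitrary expression, the quoted term is built in a separate variable first (@SearchCondition is a hypothetical name). MyTable and Message are the question's objects and need an existing full-text index, so this is not runnable on its own; the sys.dm_fts_parser call reuses the arguments from the question's own check so the tokenization can be compared.

```sql
DECLARE @Keyword nvarchar(100) = N'0123456789000001';

-- Wrap the term in double quotes with the asterisk inside them: "0123456789000001*"
DECLARE @SearchCondition nvarchar(110) = N'"' + @Keyword + N'*"';

-- Compare how the quoted prefix term is tokenized (same arguments as the question's check).
SELECT * FROM sys.dm_fts_parser(@SearchCondition, 1033, 0, 0);

-- Requires the question's MyTable with a full-text index on Message.
SELECT *
FROM MyTable
WHERE CONTAINS(Message, @SearchCondition);
```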
Have you tried to query the following view to see what's on the system stoplist?
select * from sys.fulltext_system_stopwords where language_id = 1033;
Found a solution that works. I've added language 1033 as an additional parameter.
SELECT * FROM MyTable WHERE CONTAINS(Message, @Keyword, LANGUAGE 1033)
Let's say I have a SQL Server table that looks like the following:
ID  NAME    DESCRIPTION
1   ANDREW  COOL
2   MATT    NOT COOL
All I need to do is output the data to a space-delimited text file. However, I want to ensure that the NAME column always takes up exactly 10 characters (at most 10, padded with spaces). For example, in the first row 'ANDREW' is 6 characters, so I'd want 4 spaces after it.
Same thing for the second row: 'MATT' is 4 characters, so I would want 6 spaces after it. This way the data lines up as you move across the columns; worst case, it gets truncated, but I'm not concerned about that.
Use this SELECT query, then export the result to your text file:
select ID,cast(NAME as char(10)) as NAME,DESCRIPTION from yourtable
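To see the padding and truncation that casting to char(10) produces, the sketch below wraps each result in brackets so the trailing spaces are visible. The longer name is made up to show truncation: CAST and CONVERT to a shorter character type truncate silently rather than raising an error.

```sql
SELECT '[' + CAST('ANDREW' AS char(10)) + ']'        AS Padded,        -- [ANDREW    ]
       '[' + CAST('MATT' AS char(10)) + ']'          AS PaddedShort,   -- [MATT      ]
       '[' + CAST('BARTHOLOMEW-X' AS char(10)) + ']' AS Truncated;     -- [BARTHOLOME]
```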
You can use the CONVERT function:
select CONVERT(char(10),'ANDREW')
select ID,
CONVERT(char(10), NAME) as NAME,
DESCRIPTION
from <table>
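Putting it together for the whole row, a sketch that pads each column and concatenates them into one fixed-width string per row. The width of 4 for ID and the VALUES rows standing in for the table are illustrative assumptions.

```sql
-- Each row becomes one fixed-width line: ID (4) + NAME (10) + DESCRIPTION.
SELECT CAST(t.ID AS char(4))
     + CONVERT(char(10), t.NAME)
     + t.DESCRIPTION AS Line
FROM (VALUES (1, 'ANDREW', 'COOL'),
             (2, 'MATT',   'NOT COOL')) AS t(ID, NAME, DESCRIPTION);
```

Casting the int to char(4) left-aligns it and pads it with spaces, so every NAME starts in the same column.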