Splitting and Analyzing data


I have a field that sometimes contains a string like the following: 2/23/2013 12:25:55~45
I need to split the string at the ~ and check whether the part to the left of the ~ is a valid date/time value and the part to the right of the ~ is a valid integer. Basically, what I want to return is True or False depending on whether both conditions hold.
Keep in mind that the field could contain nulls, could contain any other type of data, and it could contain multiple tildes. In all cases I need to return false. The only time I need to return true is when the field contains a date/time value, a single tilde, and a whole number.

In SQL Server, you could do:
select (case when col like '%~%'
             then (case when isdate(left(col, charindex('~', col) - 1)) = 1 and
                             isnumeric(substring(col, charindex('~', col) + 1, 1000)) = 1 and
                             col not like '%~%.%' and col not like '%~%e%'
                        then 1
                        else 0
                   end)
             else 0
        end) as IsFunkyFormat,
       substring(col, charindex('~', col) + 1, 1000),
       left(col, nullif(charindex('~', col), 0) - 1)
The nested case is to prevent errors when the separator is not found; the nullif in the last column does the same for the left call that sits outside the case, returning NULL instead of passing a length of -1. The not like expressions are intended to rule out number formats that are not integers.

This question is trickier than it looks, because it is easy to get wrong, or to write in a way that works for your current set of data but fails for other sets of data.
If this is the data you are storing in your database, I strongly encourage you to learn about database normalization. One tenet of normalization is that you only store one value in a column. In this case, you are storing a datetime and an integer value in the same column. It would be far better to store the data in multiple columns.
That being said, I know that there are times when you are given some raw data that you need to import into your database. Often we cannot control the raw data that we are given, so we must make do with SQL gymnastics. In this particular case, there are several different types of back-flips that will be useful:
Determining the number of ~ characters in your string.
Splitting the data on the tilde.
Making sure one of the values is a datetime.
Making sure the other value is an integer.
Only 1 of the 4 items is built in to SQL Server. There is a function named IsDate that takes a string parameter and returns 1 or 0 to indicate whether the string can be converted to a date.
To determine the number of ~ in your string, the trick is to compare the length of the string with the tildes to the length of the string without the tildes. We can determine which rows contain a single tilde by doing this:
When Len(Data) = Len(Replace(Data, '~', '')) + 1
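As a minimal sketch of that length comparison, the query below counts the tildes in a few made-up values. The derived table v, its column Data, and the sample strings are placeholders for illustration, not names from the question.

```sql
-- Sample values are invented; v and Data are placeholder names.
SELECT v.Data,
       -- Removing every tilde shortens the string by one character per tilde.
       LEN(v.Data) - LEN(REPLACE(v.Data, '~', '')) AS TildeCount,
       CASE WHEN LEN(v.Data) = LEN(REPLACE(v.Data, '~', '')) + 1
            THEN 1 ELSE 0
       END AS HasSingleTilde
FROM (VALUES ('2/23/2013 12:25:55~45'),
             ('no separator here'),
             ('1~2~3'),
             (NULL)) AS v(Data);
```

For the NULL row the comparison is unknown, so the CASE falls through to the ELSE branch and reports 0, which matches the requirement that nulls return false.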
The other tricky problem to solve is determining whether a string represents a whole number. There are multiple ways of doing this, but my favorite method is to concatenate hard-coded values to your data and then test for numeric. For example, the IsNumeric function will return true for the string 1e4 because the e represents scientific notation and 1e4 could be interpreted as 10000. So if you do this:
IsNumeric(Data + 'e0')
This will return false for scientific notation, because the data would be something like 1e4, which is concatenated with 'e0' to get '1e4e0', which is not numeric. Similarly, we can concatenate .0 to the string to check for fractional numbers. If your data is 45.2 (which is numeric) and you concatenate .0 to it, you get '45.2.0', which is not numeric. You can also prepend '-' to the test to reject values that already carry a sign: '-20' is numeric, but '-' + '-20' (which is '--20') is not numeric.
Select YourColumnHere,
       Len(Replace(YourColumnHere, '~', '')) + 1,
       Case When Len(YourColumnHere) = Len(Replace(YourColumnHere, '~', '')) + 1
            Then
                Case When IsDate(Left(YourColumnHere, CharIndex('~', YourColumnHere)-1)) = 1
                     Then
                         Case When Right(YourColumnHere, Len(YourColumnHere)-CharIndex('~', YourColumnHere)) > ''
                              Then IsNumeric('-' + Right(YourColumnHere, Len(YourColumnHere)-CharIndex('~', YourColumnHere)) + '.0e0')
                              Else 0
                         End
                     Else 0
                End
            Else 0
       End
From YourTableNameHere
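To see the concatenation trick in isolation, here is a small sketch that applies each test to a handful of invented strings. The derived table v, the column Data, and the output column names are assumptions made for the example.

```sql
-- Sample values are invented; v, Data and the column aliases are placeholder names.
SELECT v.Data,
       ISNUMERIC(v.Data)                AS PlainIsNumeric,
       ISNUMERIC(v.Data + 'e0')         AS ExponentTest,    -- meant to fail for values such as 1e4
       ISNUMERIC(v.Data + '.0')         AS FractionTest,    -- meant to fail for values such as 45.2
       ISNUMERIC('-' + v.Data)          AS SignTest,        -- meant to fail for values such as -20
       ISNUMERIC('-' + v.Data + '.0e0') AS WholeNumberTest  -- all three tests combined
FROM (VALUES ('45'), ('1e4'), ('45.2'), ('-20'), ('abc')) AS v(Data);
```

Following the reasoning above, only '45' would be expected to pass the combined WholeNumberTest, even though several of the other values pass a plain IsNumeric check.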

Related

Problem with using SUBSTRING and CHARINDEX

I have a column (RCV1.ECCValue) in a table which 99% of the time has a constant string format- example being:
T0-11.86-273
The part between the two hyphens is a percentage. I'm using the SQL below to extract it, which works fine and returns 11.86 for the example above when the data is in that format:
'Percentage' = round(SUBSTRING(RCV1.ECCValue,CHARINDEX('-',RCV1.ECCValue)+1, CHARINDEX('-',RCV1.ECCValue,CHARINDEX('-',RCV1.ECCValue)+1) -CHARINDEX('-',RCV1.ECCValue)-1),2) ,
However...this table is updated from an external source and very occasionally the separators differ, for example:
T0-11.86_273
when this occurs I get the error:
Invalid length parameter passed to the LEFT or SUBSTRING function.
I'm very new to SQL and have got myself out of many challenges, but this one has me stuck. Any help would be much appreciated. Is there a better way to extract this percentage value?

Replace '_' with '-' in the string passed to the CHARINDEX call that computes the length argument of SUBSTRING:
'Percentage' = round(SUBSTRING(RCV1.ECCValue,CHARINDEX('-',RCV1.ECCValue)+1, CHARINDEX('-',replace(RCV1.ECCValue,'_','-'),CHARINDEX('-',RCV1.ECCValue)+1) -CHARINDEX('-',RCV1.ECCValue)-1),2) ,

If you can guarantee the structure of these strings, you can try parsename
select round(parsename(translate(replace('T0-11.86_273','.',''),'-_','..'),2), 2)/100
Breakdown of steps
Replace . character in the percentage value with empty string using replace.
Replace - or _, whichever is present, with . using translate.
Parse the second element using parsename.
Round it to 2 digits, which will also automatically cast it to the desired numeric type.
Divide by 100 to restore the number as a percentage.
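The steps above can be sketched one column at a time, which makes it easier to see what each function contributes. The derived table s, the column Raw, and the column aliases are placeholders. Note that TRANSLATE is only available from SQL Server 2017 onwards.

```sql
-- Sample values come from the question; s, Raw and the aliases are placeholder names.
SELECT s.Raw,
       REPLACE(s.Raw, '.', '')                                       AS NoDecimalPoint,
       TRANSLATE(REPLACE(s.Raw, '.', ''), '-_', '..')                AS DotSeparated,
       PARSENAME(TRANSLATE(REPLACE(s.Raw, '.', ''), '-_', '..'), 2)  AS MiddlePart,
       ROUND(PARSENAME(TRANSLATE(REPLACE(s.Raw, '.', ''), '-_', '..'), 2), 2) / 100 AS Percentage
FROM (VALUES ('T0-11.86-273'), ('T0-11.86_273')) AS s(Raw);
```

Because the decimal point is removed before the split, this only restores the right value when the percentage always has exactly two decimal places, which is the structural guarantee the answer asks for.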

Use NULLIF to null out such values
round(
    SUBSTRING(
        RCV1.ECCValue,
        NULLIF(CHARINDEX('-', RCV1.ECCValue), 0) + 1,
        NULLIF(CHARINDEX('-',
                         RCV1.ECCValue,
                         NULLIF(CHARINDEX('-', RCV1.ECCValue), 0) + 1
                        ), 0)
          - NULLIF(CHARINDEX('-', RCV1.ECCValue), 0) - 1
    ),
2)
I strongly recommend that you place the repeated values in CROSS APPLY (VALUES ...) to avoid having to repeat yourself. And do use whitespace, it's free.
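One way the CROSS APPLY (VALUES ...) suggestion might look is sketched below. The derived table t stands in for the real table, and the aliases p1, p2 and Pos are names chosen for this example only.

```sql
-- t stands in for the real table; p1, p2 and Pos are placeholder names.
SELECT t.ECCValue,
       ROUND(SUBSTRING(t.ECCValue, p1.Pos + 1, p2.Pos - p1.Pos - 1), 2) AS Percentage
FROM (VALUES ('T0-11.86-273'), ('T0-11.86_273')) AS t(ECCValue)
-- Position of the first hyphen, or NULL when there is none.
CROSS APPLY (VALUES (NULLIF(CHARINDEX('-', t.ECCValue), 0))) AS p1(Pos)
-- Position of the second hyphen, or NULL when there is none.
CROSS APPLY (VALUES (NULLIF(CHARINDEX('-', t.ECCValue, p1.Pos + 1), 0))) AS p2(Pos);
```

Each position is computed once and then reused by name. When a hyphen is missing, the NULL propagates through the arithmetic, so SUBSTRING receives a NULL length and the row yields NULL rather than raising the invalid length error.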

SQL Error in Non Case condition of CASE WHEN clause

I tried executing the following SQL statement.
SELECT CASE WHEN CHARINDEX('~','test.pdf') > 0
THEN SUBSTRING('test.pdf',CHARINDEX('~', 'test.pdf'), -10)
ELSE NULL
END
This resulted in the error 'Invalid length parameter passed to the substring function.'. I did not expect this, because that branch should never be executed.
This query is a simplified version of my requirement. Actually we are computing the value length for the substring. The real scenario is also given below :
SELECT CASE
WHEN CHARINDEX('~', 'test.pdf') > 0 THEN SUBSTRING('test.pdf', CHARINDEX('~', 'test.pdf') + 1, CHARINDEX('~', 'test.pdf', (CHARINDEX('~', 'test.pdf', 1)) + 1) - CHARINDEX('~', 'test.pdf') - 1)
ELSE NULL
END;
In the example it's hardcoded as 'test.pdf', but in the real scenario the values come from a table column and look like '111111~22222~33333~4444.pdf'. Also, I'm not sure the file name will always follow this format, hence the validation.
Actually, the computation for the length is quite expensive, and I don't want to use it twice in this query.

You have passed -10 as a constant to substring(). This function does not allow negative values for the third argument:
length
Is a positive integer or bigint expression that specifies how many characters of the expression will be returned. If length is negative, an error is generated and the statement is terminated. If the sum of start and length is greater than the number of characters in expression, the whole value expression beginning at start is returned.
SQL Server catches this problem during the compile phase. This has nothing to do with CASE expression evaluation, but with parsing the expressions.

Matching on Values, but Erroring on New Value in SQL Server

I am comparing data from two different databases (one MariaDB and one SQL Server) within my Node project, and am then doing inserts and updates as necessary depending on the comparison results.
I have a question about this code that I use to iterate through results in Node, going one at a time and passing in values to check against (note - I am more familiar with Node and JS than with SQL, hence this question):
SELECT TOP 1
CASE
WHEN RM00101.CUSTCLAS LIKE ('%CUSR%')
THEN CAST(REPLACE(LEFT(LR301.DOCNUMBR, CHARINDEX('-', LR301.DOCNUMBR)), '-', '') AS INT)
ELSE 0
END AS Id,
CASE
WHEN LR301.RMDTYPAL = 7 THEN LR301.ORTRXAMT * -1
WHEN LR301.RMDTYPAL = 9 THEN LR301.ORTRXAMT * -1
ELSE LR301.ORTRXAMT
END DocumentAmount,
GETDATE() VerifyDate
FROM
CRDB..RM20101
INNER JOIN
CRDB..RM00101 ON LR301.CUSTNMBR = RM00101.CUSTNMBR
WHERE
CONVERT(BIGINT, (REPLACE(LEFT(LR301.DOCNUMBR, CHARINDEX('-', LR301.DOCNUMBR)), '-', ''))) = 589091
Currently, the above works for me for finding records that match. However, if I enter a value that doesn't yet exist - in this line below, like so:
WHERE CONVERT(BIGINT, (REPLACE(LEFT( LR301.DOCNUMBR, CHARINDEX('-', LR301.DOCNUMBR)), '-', ''))) = 789091
I get this error:
Error converting data type varchar to bigint.
I assume the issue is that, if the value isn't found, it can't cast it to an INTEGER, and so it errors out. Sound right?
What I ideally want is for the query to execute successfully, but just return 0 results when a match is not found. In JavaScript I might do something like an OR clause to handle this:
const array = returnResults || [];
But I'm not sure how to handle this with SQL.
By the way, the value in SQL Server that's being matched is of type char(21), and the values look like this: 00000516542-000. The value in MariaDB is of type INT.
So two questions:
Will this error out when I enter a value that doesn't currently match?
If so, how can I handle this so as to just return 0 rows when a match isn't found?
As an added note, someone suggested using TRY_CONVERT, but while this works in SQL Server, it doesn't work when I use it with the Node mssql package.

I think the issue is happening because the varchar value is not always made of numbers. You can make the comparison in varchar format itself to avoid this issue:
WHERE (REPLACE(LEFT( LR301.DOCNUMBR, CHARINDEX('-', LR301.DOCNUMBR)), '-', '')) = '789091'
Hope this helps.
Edit: based on the format in the comment, this should do the trick:
WHERE REPLACE(LTRIM(REPLACE(REPLACE(LEFT( LR301.DOCNUMBR, CHARINDEX('-', LR301.DOCNUMBR)),'0',' '),'-','')),' ','0') = '789091'
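To make the nested calls in that WHERE clause easier to follow, here is a sketch that shows the intermediate result next to the final one. The derived table d and the third sample value are invented; only the 00000516542-000 format comes from the question.

```sql
-- d and DocNumbr are placeholder names; the last sample value is invented.
SELECT d.DocNumbr,
       -- Everything up to and including the first hyphen.
       LEFT(d.DocNumbr, CHARINDEX('-', d.DocNumbr)) AS Prefix,
       -- Zeros become spaces, the hyphen is dropped, LTRIM removes what used to be
       -- leading zeros, and the remaining spaces are turned back into zeros.
       REPLACE(LTRIM(REPLACE(REPLACE(LEFT(d.DocNumbr, CHARINDEX('-', d.DocNumbr)), '0', ' '), '-', '')), ' ', '0') AS Stripped
FROM (VALUES ('00000516542-000'), ('00000789091-000'), ('00ABC123-000')) AS d(DocNumbr);
```

Since the comparison stays in varchar, a non-numeric prefix such as the last row simply fails to match instead of raising a conversion error. One thing to keep in mind is that any genuine spaces inside the prefix would also come back as zeros.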

SQL Server - Combine string to integer where integer can have a variable number of leading zeros

I have a report in SQL Server Report Builder which brings back the profession acronym (string) and registration number (integer) for each professional in a separate SQL database.
The registration number can be 5 or more digits long, and may start with one or more zeros. For example:
Profession Registration #
AB 00162
PH 02272
SA 13925
SA 026025
DA 1025927
I'm trying to put the profession acronym and registration number together into a registration ID, because I need to compare this with the registration ID from another (non SQL) database.
I'm trying to get something like this:
Registration ID
AB00162
PH02272
SA13925
SA026025
DA1025927
I've tried converting the integers to strings using the following in my query:
REGISTRY.PROFESSION + right('00000' + cast(REGISTRY.REGISTRATION_NO as varchar(8)), 5) as Full_Reg_Number
However, with the above the integers that are more than 5 digits long get cut off, and if I increase '00000' to, say, '0000000' and the number '5' to '7' in the above, the integers that only have 5 digits are padded with extra leading zeros.
I do not have permission to change the formatting of the integers in either database.

Integers aren't stored with leading zeroes. If the values are stored like that, then the field is NOT of integer type in the first place. Simply do:
Registry.profession + registry.registration_no
You can confirm that the stored type is not an integer as follows:
select data_type
from information_schema.columns
where table_name = 'registry'
and column_name = 'registration_no'
If you're getting a type conversion error as you mention in your comments, then most likely the error is not coming due to this concatenation. It's probably down the line, such as if you're using 'Full_Reg_Number' in a 'where' statement or other comparison that expects a comparison to an integer, and instead is getting a varchar. After all, you called the column 'Full_Reg_Number' even though it's not a number.

Based on your problems, I suspect those really are integers. You've just shown them with leading zeros in the question.
A simple solution is to use case:
(REGISTRY.PROFESSION +
 CASE WHEN REGISTRY.REGISTRATION_NO < 10000
      THEN right('00000' + cast(REGISTRY.REGISTRATION_NO as varchar(8)), 5)
      ELSE cast(REGISTRY.REGISTRATION_NO as varchar(8))
 END
) as Full_Reg_Number
An even simpler method uses FORMAT():
(REGISTRY.PROFESSION + FORMAT(REGISTRY.REGISTRATION_NO, '00000')
) as Full_Reg_Number
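A minimal sketch of the FORMAT() approach, assuming the registration numbers really are integers. The derived table r and its column names are placeholders, and the sample values are taken from the question with their leading zeros dropped, as an integer column would store them.

```sql
-- r, Profession and RegistrationNo are placeholder names.
SELECT r.Profession,
       r.RegistrationNo,
       -- '00000' pads to at least five digits and leaves longer numbers untouched.
       r.Profession + FORMAT(r.RegistrationNo, '00000') AS Full_Reg_Number
FROM (VALUES ('AB', 162), ('PH', 2272), ('SA', 13925), ('DA', 1025927))
     AS r(Profession, RegistrationNo);
-- Full_Reg_Number: AB00162, PH02272, SA13925, DA1025927
```

Note that an integer cannot distinguish 26025 from 026025, so a value such as SA026025 from the question can only be reproduced if the column is actually a string type, which is the point the first answer makes.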

How to sort varchar column (SQL) that contains number, chars, characters?

'Order by' returns this result below
05
05/1-1
05/1-2
05/1-3
05/1-4
05/1-5
05/1-6
05/1-7
05/1
05/2-1
05/2-2
05/2-3
05/2-4
05/2
05/3
05/4
and this order below is OK
05
05/1
05/1-1
05/1-2
05/1-3
05/1-4
05/1-5
05/1-6
05/1-7
05/2
05/2-1
05/2-2
05/2-3
05/2-4
05/3
05/4
Is there a way to do this?

If possible, try to split up the data, so that any numeric information is in its own field.
String data and numeric data together in one field are always compared as strings, so that 'A2' > 'A11'.

You need to cast/convert the varchar data to a numeric data type and then perform an order by sort on the data.
You will likely also need to split your data string, so an example order by clause might be:
order by
    convert(int, left(columnName, 2)) asc,
    convert(int, substring(columnName, 4, 2))
This will depend on which string elements represent which date components.
Make sense?

Alter the table and add a compare column. Write a small program which reads the strings and converts them into a format which the database can convert. In your case, a DATE is a good candidate, I guess.
In the general case, use a VARCHAR column and format all numbers to five (or more) digits (with leading zeroes/spaces, i.e. right aligned).
After that, you can use the compare column to order the data.

If I were you I would order by a tricky expression. Let's assume that before a slash you have at most 2 or 3 digits. If you write:
order by case charindex('/', val)
             when 0 then convert(int, val)
             else convert(int, substring(val, 1, charindex('/', val) - 1))
         end * 1000
       + case charindex('/', val)
             when 0 then 0
             else convert(float, replace(substring(val, 1 + charindex('/', val),
                                                   len(val)), '-', '.'))
         end
If I've not mistyped anything, this should convert 05 to 5000, 05/1 to 5001, 05/1-1 to 5001.1, and things should sort the way you want, assuming you always have a single digit at most after the hyphen. Otherwise you can probably work around it by splitting and left-padding with a suitable number of zeroes, but the expression would get much uglier ...
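Here is a small runnable sketch of that ordering expression in SQL Server syntax (substring and len rather than substr and length). The table variable @t and the shuffled sample rows are made up for the example; val is the column name used above.

```sql
-- @t is a throwaway table variable; the rows are a shuffled subset of the question's values.
DECLARE @t TABLE (val varchar(20));
INSERT INTO @t (val)
VALUES ('05/2'), ('05/1-2'), ('05'), ('05/1'), ('05/2-1'), ('05/1-1');

SELECT val
FROM @t
ORDER BY case charindex('/', val)
             when 0 then convert(int, val)                                      -- no slash: whole value
             else convert(int, substring(val, 1, charindex('/', val) - 1))      -- digits before the slash
         end * 1000
       + case charindex('/', val)
             when 0 then 0
             else convert(float, replace(substring(val, 1 + charindex('/', val), len(val)), '-', '.'))
         end;
-- Returns: 05, 05/1, 05/1-1, 05/1-2, 05/2, 05/2-1
```

The sort keys work out to 5000, 5001, 5001.1, 5001.2, 5002 and 5002.1, which is why a second digit after the hyphen (for example 05/1-10) would collide with 05/1-1 and needs the padding approach instead.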
