Removing characters after a specified character format - sql

I have a field that should contain six digits, a period, and six digits (######.######). The application that I use allows this to be free-form entry. Because users are users and will do what they want, I have several values that have a dash and some letters afterwards (######.######-XYZ).
Using T-SQL, how do I identify and subsequently remove the -XYZ so that I can restore the integrity of the data? The column is an NVARCHAR(36), PK, and does not allow null values. The table in question also has a unique columnID field.

If the part you want is the first 13 characters, then use left():
select left(field, 13)
You can check if the first 13 characters are what you expect:
select (case when field like '[0-9][0-9][0-9][0-9][0-9][0-9].[0-9][0-9][0-9][0-9][0-9][0-9]%'
then left(field, 13)
else -- whatever you want when the field is bad
end)

Since it's free-form and "users are users", use CHARINDEX to 1) find out whether there is a - and 2) remove it and everything after it.
Example:
DECLARE @test NVARCHAR(36) = N'######.######-XYZ'
SELECT SUBSTRING(@test,1,COALESCE(NULLIF(CHARINDEX('-',@test,1),0),LEN(@test)+1)-1)
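The two answers above can be combined into an identify-then-fix pass. The following is a minimal T-SQL sketch, not a definitive script: the table variable @t and its sample rows are made up for illustration, and only the names field and columnID come from the question.

```sql
-- Hypothetical stand-in for the real table; only "field" and "columnID"
-- are names taken from the question.
DECLARE @t TABLE (columnID int, field nvarchar(36) NOT NULL PRIMARY KEY);
INSERT INTO @t VALUES (1, N'123456.654321'), (2, N'111111.222222-XYZ');

-- 1) Identify rows that have the expected 13 characters plus at least
--    one extra trailing character ("_%" means "one or more characters").
SELECT columnID, field
FROM @t
WHERE field LIKE '[0-9][0-9][0-9][0-9][0-9][0-9].[0-9][0-9][0-9][0-9][0-9][0-9]_%';

-- 2) Trim those rows back to the 13-character value.
UPDATE @t
SET field = LEFT(field, 13)
WHERE field LIKE '[0-9][0-9][0-9][0-9][0-9][0-9].[0-9][0-9][0-9][0-9][0-9][0-9]_%';

SELECT columnID, field FROM @t;
```

Because the column is a primary key, the UPDATE will fail if a trimmed value collides with a key that already exists, so it is probably worth checking for such collisions before running it against real data.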

Related

How to find all entries that match part of a string criteria but not another in Oracle SQL

I have a column like:
Values
111.111.111-Dummy
111.111.111-Dummy2
111.111.111-X
222.222.222-Dummy
222.222.222-Dummy2
333.333.333-Dummy
333.333.333-Dummy2
333.333.333-X
I need to find the numbers that do not have an entry with "-X" in the end.
So in this scenario the query should show: 222.222.222.
My idea so far was to first trim the results to only have the numbers part (or everything before the '-')
But I don't know what to do next. How can I find entries that don't match in the same column and same table?
select substr(values_, 1, instr(values_, '-') - 1) as numbers
from {your-table}
group by substr(values_, 1, instr(values_, '-') - 1)
having count(case when values_ like '%-X' then 1 end) = 0;
values is a reserved keyword in Oracle, and therefore it can't be an identifier (such as a column name); I changed it by adding a trailing underscore.
Note that this assumes all "values" are followed by a dash and a (possibly empty) string. If you may also have values like 111.11.1111 (with no dash at the end) then the query must be modified slightly, but I assumed there aren't any - otherwise you should have included one or two in your sample.
Use a like condition in a having clause (note that this version uses MySQL's substring_index, not Oracle syntax):
select substring_index(values, '-', 1)
from t
group by substring_index(values, '-', 1)
having sum(values like '%-x') = 0;

SQL: Using <= and >= to compare string with wildcard

Assuming I have table that looks like this:
Id | Name | Age
=====================
1 | Jose | 19
2 | Yolly | 26
20 | Abby | 3
29 | Tara | 4
And my query statement is:
1) Select * from thisTable where Name <= '*Abby';
it returns 0 row
2) Select * from thisTable where Name <= 'Abby';
returns row with Abby
3) Select * from thisTable where Name >= 'Abby';
returns all rows // row 1-4
4) Select * from thisTable where Name >= '*Abby';
returns all rows; // row 1-4
5) Select * from thisTable where Name >= '*Abby' and Name <= "*Abby";
returns 0 row.
6) Select * from thisTable where Name >= 'Abby' and Name <= 'Abby';
returns row with Abby;
My question: why do I get these results? How does the wildcard affect the result of the query? Why don't I get any result if the condition is Name <= '*Abby'?
Wildcards are only interpreted when you use the LIKE operator.
So when you are trying to compare against the string, it will be treated literally. So in your comparisons lexicographical order is used.
1) There are no letters before *, so you don't have any rows returned.
2) A is the first letter in the alphabet, so the rest of the names are bigger than Abby; only Abby is equal to itself.
3) Opposite of 2)
4) See 1)
5) See 1)
6) This condition is equivalent to Name = 'Abby'.
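To make the distinction concrete, here is a small T-SQL sketch that rebuilds the question's table as a table variable (the variable name @thisTable is only for illustration) and contrasts a literal comparison with LIKE. Which rows the range comparisons return depends on the collation, so no output is claimed for them here.

```sql
DECLARE @thisTable TABLE (Id int, Name varchar(20), Age int);
INSERT INTO @thisTable VALUES
    (1, 'Jose', 19), (2, 'Yolly', 26), (20, 'Abby', 3), (29, 'Tara', 4);

-- '*' is an ordinary character here: Name is compared with the
-- five-character string '*Abby' using the collation's sort order.
SELECT * FROM @thisTable WHERE Name <= '*Abby';

-- Pattern matching needs LIKE, and the multi-character wildcard is %.
-- This matches names that end in 'Abby'.
SELECT * FROM @thisTable WHERE Name LIKE '%Abby';

-- A range comparison without wildcards: names that sort from 'A'
-- up to (but not including) 'B'.
SELECT * FROM @thisTable WHERE Name >= 'A' AND Name < 'B';
```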
When working with strings in SQL Server, ordering is done letter by letter, and the order those letters are sorted in depends on the collation. For some characters the sorting method is easy to understand: it's alphabetical or numerical order. For example, 'a' < 'b' and '4' > '2'. Depending on the collation, this might be done by letter and then case ('AaBbCc....') or by case and then letter ('ABC...Zabc').
Let's take a string like 'Abby'. This would be sorted in the order of the letters A, b, b, y (the order they would appear in depends on your collation; I don't know what it is, but I'm going to assume an 'AaBbCc....' collation, as they are more common). Any string starting with something like 'Aba' would have a value less than 'Abby', as the third character (the first that differs) has a "lower value". So would a value like 'Abbie' ('i' has a lower value than 'y'). Similarly, a string like 'Abc' would have a greater value, as 'c' has a higher value than 'b' (which is the first character that differs).
If we throw numbers into the mix, then you might be surprised. For example, the string (important: I didn't say number) '123456789' has a lower value than the string '9'. This is because the first character that differs is the first character. '9' is greater than '1', and so '9' has the "higher" value. This is one reason why it's so important to store numbers in numerical datatypes, as the behaviour is unlikely to be what you expect or want otherwise.
To what you are asking, however: the wildcards for SQL Server are '%' and '_' (there is also '^', which is used inside brackets, but I won't cover that here). A '%' represents any number of characters, while '_' represents a single character. If you want to look specifically for one of those characters, you have to quote it in brackets ([]).
Using the equals (=) operator won't parse wildcards; you need to use an operator that does, such as LIKE. Thus, if you want a word that starts with 'A', you would use the expression WHERE ColumnName LIKE 'A%'. If you wanted to search for one that consists of 6 characters and ends with 'ed', you would use WHERE ColumnName LIKE '____ed'.
Like I said before, if you want to search for one of those specific characters, you quote it. So, if you wanted to search for a string that contained an underscore, the syntax would be WHERE ColumnName LIKE '%[_]%'
Edit: it's also worth noting that operators like LIKE are affected by the collation's sensitivity, for example to case and accent. If you're using a case-sensitive collation, then the statement WHERE 'Abby' LIKE 'abb%' is not true, as 'A' and 'a' are not the same case. Likewise, the statement WHERE 'Covea' = 'Covéa' would be false in an accent-sensitive collation ('e' and 'é' are not treated as the same character).
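The sensitivity point can be tried out without changing any database settings by forcing a collation inside the expression. This is a sketch only: the three Latin1_General collations are picked for illustration, and the column aliases are made up.

```sql
SELECT
    -- Case-insensitive collation: 'A' and 'a' compare as equal.
    CASE WHEN N'Abby' COLLATE Latin1_General_CI_AS LIKE N'abb%'
         THEN 'match' ELSE 'no match' END AS ci_like,
    -- Case-sensitive collation: the same pattern no longer matches.
    CASE WHEN N'Abby' COLLATE Latin1_General_CS_AS LIKE N'abb%'
         THEN 'match' ELSE 'no match' END AS cs_like,
    -- Accent-sensitive vs accent-insensitive equality.
    CASE WHEN N'Covea' COLLATE Latin1_General_CI_AS = N'Covéa'
         THEN 'equal' ELSE 'not equal' END AS accent_sensitive,
    CASE WHEN N'Covea' COLLATE Latin1_General_CI_AI = N'Covéa'
         THEN 'equal' ELSE 'not equal' END AS accent_insensitive,
    -- Brackets turn the wildcard _ into a literal underscore.
    CASE WHEN N'first_name' LIKE N'%[_]%'
         THEN 'contains underscore' ELSE 'no underscore' END AS literal_underscore;
```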
A wildcard character is used to substitute any other characters in a string. They are used in conjunction with the SQL LIKE operator in the WHERE clause. For example.
Select * from thisTable WHERE name LIKE '%Abby%'
This will return any values with Abby anywhere within the string.
Have a look at this link for an explanation of all wildcards https://www.w3schools.com/sql/sql_wildcards.asp
It is because >= and <= are comparison operators. They compare strings on the basis of their ASCII values.
The ASCII value of * is 42, and the ASCII values of capital letters start at 65. So when you tried name <= '*Abby', SQL Server compared on the first character of your string (value 42); since no value in your data has a first character with an ASCII value of 42 or less, no data got selected.
You can refer to an ASCII table for more understanding:
http://www.asciitable.com/
There are a few answers, and a few comments - I'll try to summarize.
Firstly, the wildcard in SQL is %, not * (for multiple matches). So your queries including an * ask for a comparison with that literal string.
Secondly, comparing strings with greater/less than operators probably does not do what you want - it uses the collation order to see which other strings are "earlier" or "later" in the ordering sequence. Collation order is a moderately complex concept, and varies between machine installations.
The SQL operator for string pattern matching is LIKE.
I'm not sure I understand your intent with the >= or <= statements. Do you mean that you want to return rows where the name's first letter is after 'A' in the alphabet?

Pull 3 digits from middle of id and sort by even odd

I have file ids in my database that start with:
a single character prefix
a period
a three digit client id
a hyphen
a three digit file number.
Example F.129-123
We have several ids for each client.
I need to be able to strip out the three digit file number and then pull them based on even or odd so that I can assign specific data to each result population.
One added issue. Some of the ids have characters added at the end.
Example: F.129-123A or F.129-123.NF
So I need to be able to just use the three digit file number without any other characters, because the added characters create errors while conversion.
If you are using SQL SERVER,
you can use CHARINDEX() to find the index of - and then
get 3 digits after - using SUBSTRING()
SELECT substring('F.123-234',charindex('-','F.123-234')+1, 3)
If you are using MySQL,
you can use POSITION() to find the index of - and then get 3 digits after - using SUBSTRING()
SELECT SUBSTRING('F.123-234',POSITION( '-' IN 'F.123-234' )+1,3);
If you are using Oracle,
you can use INSTR() to find the index of - and then get 3 digits after - using SUBSTR()
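The answer gives no query for the Oracle case. A one-line sketch of what it describes, using the same sample literal as the SQL Server and MySQL examples (the alias file_no is made up):

```sql
-- INSTR finds the position of the hyphen; SUBSTR takes the 3 characters after it.
SELECT SUBSTR('F.123-234', INSTR('F.123-234', '-') + 1, 3) AS file_no
FROM dual;
```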
UPDATES:
Based on the requirements in the comments, you can use a query like the one below to achieve what you need.
SELECT
SUBSTRING(MatterID,CHARINDEX('-',MatterID)+1, 3) as FileNo
FROM
Matters
WHERE
MatterID LIKE 'f.129%'
AND MatterID NOT LIKE '%col%'
AND substring( MatterID, CHARINDEX('-',MatterID)+1, 3) % 2 = 0
If you are working with Microsoft SQL Server, then you could use the patindex() function together with substring() to get only the three-digit file number:
select left(substring(string, PATINDEX('%[0-9][-]%', string)+2, LEN(string)), 3)
Note that if you have a different separator (e.g. / instead of -), you will need to change the character in the pattern, e.g. PATINDEX('%[0-9][/]%', string)
In Postgres you can use split_part() to get the part after the hyphen, then cast it to an integer:
select *
from the_table
order by split_part(file_id, '-', 2)::int;
This assumes that there is always exactly one - in the string. I understand your question that this is the case as the format is fixed.
Is this helpful?
Create table #tmpFileNames(id int, FileName VARCHAR(50))
insert into #tmpFileNames values(1,'F.129-123')
insert into #tmpFileNames values(2,'F.129-125')
insert into #tmpFileNames values(3,'F.129-124')
insert into #tmpFileNames values(4,'F.129-123A')
insert into #tmpFileNames values(5,'F.129-124B')
insert into #tmpFileNames values(6,'F.129-125.PQ')
insert into #tmpFileNames values(7,'F.129-123.NF')
select SUBSTRING(STUFF(FileName, 1, CHARINDEX('-',FileName), ''),0,4), * from #tmpFileNames
Order by SUBSTRING(STUFF(FileName, 1, CHARINDEX('-',FileName), ''),0,4),id
Drop table #tmpFileNames
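None of the queries above both extracts the number and guards against the conversion errors the question mentions. Here is one possible T-SQL sketch that does so, assuming SQL Server 2012 or later for TRY_CAST. The table variable @files, its sample rows, and the FileNo and Parity aliases are illustrative.

```sql
DECLARE @files TABLE (id int, FileName varchar(50));
INSERT INTO @files VALUES
    (1, 'F.129-123'), (2, 'F.129-124'), (3, 'F.129-123A'), (4, 'F.129-124.NF');

SELECT t.id,
       t.FileName,
       f.FileNo,
       CASE WHEN f.FileNo % 2 = 0 THEN 'even' ELSE 'odd' END AS Parity
FROM @files AS t
-- Take the 3 characters after the hyphen; TRY_CAST yields NULL instead of
-- raising an error when those characters are not a valid integer.
CROSS APPLY (
    SELECT TRY_CAST(SUBSTRING(t.FileName, CHARINDEX('-', t.FileName) + 1, 3) AS int) AS FileNo
) AS f
WHERE f.FileNo IS NOT NULL
ORDER BY f.FileNo % 2, f.FileNo;
```

Computing the number once in CROSS APPLY avoids repeating the SUBSTRING expression in the select list, the filter, and the ordering.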

Using SQL to make specific changes in a database.

I am trying to figure out some commands/code in SQL.
I have database with names, addresses IDs etc, but I have to convert firstname values ending in “jnr” to “(Jnr)” and those ending in “snr” to “(Snr)”.
How do I do this?
update TABLE_NAME set NAMES = '*xyz*Jnr' where NAMES like '%jnr'
Update or select:
PASTE(column, CHAR_LENGTH(column)-3, 1, UPPER(SUBSTRING(column FROM CHAR_LENGTH(column)-3 FOR 1)))
WHERE column LIKE '%jnr' OR column LIKE '%snr'
PASTE is used to put in one character at position 3 from end,
CHAR_LENGTH to get length of column value,
UPPER converts character to upper case,
SUBSTRING is used to pick one character here (j or s),
LIKE is used to find values ending with jnr, or snr.
All ANSI SQL (no dbms specified!)
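Since no DBMS was specified, the following is only a sketch in PostgreSQL-flavoured SQL of one way to produce the "(Jnr)" and "(Snr)" forms the question asks for. The table people and its sample rows are hypothetical. In SQL Server the same idea would use LEFT, LEN and + in place of SUBSTRING ... FROM ... FOR, CHAR_LENGTH and ||.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE people (id int PRIMARY KEY, firstname varchar(50));
INSERT INTO people VALUES (1, 'John jnr'), (2, 'Peter snr'), (3, 'Mary');

-- Drop the last three characters and append the bracketed suffix.
UPDATE people
SET firstname = SUBSTRING(firstname FROM 1 FOR CHAR_LENGTH(firstname) - 3) || '(Jnr)'
WHERE firstname LIKE '%jnr';

UPDATE people
SET firstname = SUBSTRING(firstname FROM 1 FOR CHAR_LENGTH(firstname) - 3) || '(Snr)'
WHERE firstname LIKE '%snr';

SELECT id, firstname FROM people ORDER BY id;
```

Whether LIKE '%jnr' also matches 'JNR' depends on the database's case sensitivity rules, so mixed-case data may need an extra condition.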

Get rows that contain only certain characters

I want to get only those rows that contain ONLY certain characters in a column.
Let's say the column name is DATA.
I want to get all rows where in DATA are ONLY (must have all three conditions!):
Numeric characters (1 2 3 4 5 6 7 8 9 0)
Dash (-)
Comma (,)
For instance:
Value "10,20,20-30,30" IS OK
Value "10,20A,20-30,30Z" IS NOT OK
Value "30" IS NOT OK
Value "AAAA" IS NOT OK
Value "30-" IS NOT OK
Value "30," IS NOT OK
Value "-," IS NOT OK
Try patindex:
select * from(
select '10,20,20-30,30' txt union
select '10,20,20-30,40' txt union
select '10,20A,20-30,30Z' txt
)x
where patindex('%[^0-9,-]%', txt)=0
For your table, try something like:
select
DATA
from
YourTable
where
patindex('%[^0-9,-]%', DATA)=0
As per your new edited question, the query should be like:
select
DATA
from
YourTable
where
PATINDEX('%[^0-9,-]%', DATA)=0 and
PATINDEX('%[0-9]%', LEFT(DATA, 1))=1 and
PATINDEX('%[0-9]%', RIGHT(DATA, 1))=1 and
PATINDEX('%[,-][-,]%', DATA)=0
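The query above still accepts a value such as "30", which the question lists as not OK because it contains no dash or comma. Here is a T-SQL sketch that adds those two checks and runs all conditions against the question's sample values. The derived table v and its column txt are illustrative names.

```sql
SELECT v.txt
FROM (VALUES ('10,20,20-30,30'),
             ('10,20A,20-30,30Z'),
             ('30'),
             ('AAAA'),
             ('30-'),
             ('30,'),
             ('-,')) AS v(txt)
WHERE PATINDEX('%[^0-9,-]%', v.txt) = 0   -- only digits, commas and dashes
  AND v.txt LIKE '[0-9]%[0-9]'            -- starts and ends with a digit
  AND PATINDEX('%[,-][-,]%', v.txt) = 0   -- no two separators in a row
  AND v.txt LIKE '%-%'                    -- at least one dash
  AND v.txt LIKE '%,%';                   -- at least one comma
```

Of the seven sample values, only '10,20,20-30,30' should satisfy every condition.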
Edit: Your question was edited, so this answer is no longer correct. I won't bother updating it since someone else already has updated theirs. This answer does not fulfil the condition that all three character types must be found.
You can use a LIKE expression for this, although it's slightly convoluted:
where data not like '%[^0123456789,!-]%' escape '!'
Explanation:
[^...] matches any character that is not in the ... part. % matches any number (including zero) of any character. So [^0123456789-,] is the set of characters that you want to disallow.
However: - is a special character inside of [], so we must escape it, which we do by using an escape character, and I've chosen !.
So, you match rows that do not contain (not like) any character that is not in your allowed set.
Use option with PATINDEX and LIKE logic operator
SELECT *
FROM dbo.test70
WHERE PATINDEX('%[A-Z]%', DATA) = 0
AND PATINDEX('%[0-9]%', DATA) > 0
AND DATA LIKE '%-%'
AND DATA LIKE '%,%'
As already mentioned, you can use a LIKE expression, but it will only work with some minor modifications; otherwise too many rows will be filtered out.
SELECT * FROM X WHERE T NOT LIKE '%[^0-9!-,]%' ESCAPE '!'
see working example here:
http://sqlfiddle.com/#!3/474f5/6
edit:
to meet all 3 conditions:
SELECT *
FROM X
WHERE T LIKE '%[0-9]%'
AND T LIKE '%-%'
AND T LIKE '%,%'
see: http://sqlfiddle.com/#!3/86328/1
Maybe not the most beautiful but a working solution.