ways to check for invalid characters? oracle sql - sql

Looking for ways to filter out special signs, letters etc. from studentID in oracle SQL and show those records as invalid.
What is the best way to filter out letters a-Z and other characters? (only leaving numbers)
SELECT replace(Translate(studentid,'a-Z<>!-\+=/&', '??'),'?','') as StudentID, 'Invalid Characters in Student ID'
FROM students

The simplest approach is to use regular expressions. For example
select studentid
from student
where regexp_like(studentid, '\D')
;
\D means a non-digit character; if the studentid contains at least one such character, in any position, the row appears in the output. Note that a null studentid will not be flagged, because regexp_like does not evaluate to true for null input. If the column is the primary key it can't be null anyway, but the same check may be applied to other tables where studentid can be null.
If you have a very large table, or if you must perform this check often, you may want a less simple, but better performing query. Then you would want to use standard string functions, like you were trying to. Something like this will work:
select studentid
from student
where translate(studentid, 'x0123456789', 'x') is not null
;
translate will translate x to itself, and all digits to null (that is, all digits will be removed). The x trick is needed because the last argument must not be null. If the translation doesn't remove all characters from the string, then the studentid will appear in the output, as required.
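To make the translate logic concrete, here is a small Python model of it. This is not Oracle code, only a sketch of the same idea: remove every digit and flag the value if anything is left. The function name has_non_digit and the sample IDs are made up for illustration, and the null handling is written to mimic Oracle treating the empty string as null.

```python
DIGITS = "0123456789"

def has_non_digit(student_id):
    """Model of: translate(studentid, 'x0123456789', 'x') is not null."""
    if student_id is None or student_id == "":
        # Oracle treats the empty string as null, so neither is flagged.
        return False
    # Delete every digit; any other character (including 'x') is kept.
    leftover = student_id.translate(str.maketrans("", "", DIGITS))
    return leftover != ""

# Hypothetical sample values; the fourth one has a trailing space.
ids = ["12345", "ABC12", "23x#2", "0042 ", None]
invalid = [s for s in ids if has_non_digit(s)]
print(invalid)
```

The trailing-space value is flagged even though it looks fine when printed, which is the kind of case where dump(studentid) helps on the Oracle side.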
If you need to show exactly which characters are non-digits (although that should be obvious), you can add the result of translate to the select clause. Note though that if a student id has, for example, trailing spaces, that will not be evident either from looking at the student id or at the result of translate. You may want to add something like dump(studentid) to select; if you are not familiar with dump, you may want to read a bit about it - it is extremely useful in diagnosing such problems, and easy to learn.
Once you find and handle all the exceptions, you may want to add a constraint to the column, requiring all student IDs to consist entirely of digits. Then you won't have to put up with this kind of error anymore.

If you want to allow numbers only, column datatype should have been NUMBER, not VARCHAR2.
[EDIT] That's wrong, though - see @mathguy's comment about it, saying that there are situations where values do consist of digits only, but - due to leading zeros - you can't use the NUMBER datatype.
A simple option is to use regexp_like and return rows that contain anything but digits:
SQL> with students (studentid) as
       (select '12345' from dual union all
        select 'ABC12' from dual union all
        select '23x#2' from dual
       )
     select studentid
     from students
     where not regexp_like(studentid, '^\d+$');

STUDE
-----
ABC12
23x#2

SQL>

You could also use the solution below, which takes advantage of the translate function.
select studentid
from students
WHERE translate(studentid, '`0123456789', '`') IS NOT NULL
;

Related

Oracle GroupBy of words from text field

I have a table of
Full Name - varchar2(200)
Age - Number
Can I get the count of the names, for example those that appear inside the text field?
I don't mean Select Count(*) from table;
Not sure about the requirements, but I guess this is what you want:
select count(*), name from table group by name;
What is the "textfield"? A bind variable? A value in a column in a different table? The standard solution is
select count(*) from <your_table_name> where <textfield> like '%' || "First Name" || '%' ;
I used double-quotes around First Name - that is the only way column names in Oracle may contain spaces (but I hope you didn't actually do that, it is a very poor practice). Also, if there is the risk that "Jackson" appears in the textfield and you don't want that to count as "Jack" (first name), then replace the last '%' with ' %' (single-quote, SPACE, %, single-quote).
This will give the total number of names from your table that are present in the text field. It will not count duplicates (if John appears five times, it will still be only counted once). If this is not your requirement, please state your requirement more clearly. For example, you may instead want to show how many times each of the names in your table appears in the textfield... or any number of other possible interpretations. What do you really need?

How to delete a common word from large number of datas in a Postgres table

I have a table in Postgres with more than 1000 names. Most of the names start with SHRI or SMT. I want to delete SHRI and SMT from the names and keep only the original name. How can I do that without any database function?
I'll step you through the logic:
Select left(name,3) from table
This select statement will bring back the first 3 chars of a column (the 'left' three). If we are looking for SMT in the first three chars, we can move it to the where statement
select * from table where left(name,3) = 'SMT'
Now from here you have a few choices that can be used. I'm going to keep to the left/right style, though replace could likely be used. We want the chars to the right of the SMT, but we don't know how long each string is to pick out those chars. So we use length() to determine that.
select right(name,length(name)-3) from table where left(name,3) = 'SMT'
I hope my syntax is right there; I don't have a Postgres environment to test it. The logic is: take all the characters of the string except the first 3 (the minus 3 excludes the 3 characters on the left; change this to 4 if you want to drop the first 4 characters, as for SHRI).
You can then change this to an update statement (set name = right(name,length(name)-3) ) to update the table, or you can just use the select statement when you need the name without the SMT, but leave the SMT in the actual data.
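Since I can't run Postgres here either, this is a rough sketch of the update using Python's built-in sqlite3 module instead. SQLite has no left()/right(), so substr() stands in for them; the table name people and the sample names are made up. Matching on the prefix plus a trailing space is my own assumption, so that a name like SHRIDHAR is left alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany(
    "INSERT INTO people (name) VALUES (?)",
    [("SHRI RAM KUMAR",), ("SMT SITA DEVI",), ("SHRIDHAR RAO",), ("ANIL SHRI",)],
)

# Assumption: the title is followed by a space, so the space is part of the prefix.
for prefix in ("SHRI ", "SMT "):
    n = len(prefix)
    # substr(name, n + 1) keeps everything after the first n characters,
    # the same idea as right(name, length(name) - n) in the answer above.
    conn.execute(
        "UPDATE people SET name = substr(name, ?) WHERE substr(name, 1, ?) = ?",
        (n + 1, n, prefix),
    )

names = [row[0] for row in conn.execute("SELECT name FROM people ORDER BY rowid")]
print(names)
```

Running the SELECT form first, before switching to UPDATE, is a cheap way to check which rows would be touched.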

Combining concatenation with ORDER BY

I have trouble combining concatenation with ORDER BY in PostgreSQL (9.1.9).
Let's say, I have a table borders with 3 fields:
Table "borders"
Column | Type | Modifiers
---------------+----------------------+-----------
country1 | character varying(4) | not null
country2 | character varying(4) | not null
length | numeric |
The first two fields are codes of the countries and the third one is the length of the border among those countries.
The primary key is defined on the first two fields.
I need to compose a select of a column that would have unique values for the whole table, in addition this column should be selected in decreasing order.
For this I concatenate the key fields with a separator character, otherwise two different rows might give same result, like (AB, C and A, BC).
So I run the following query:
select country1||'_'||country2 from borders order by 1;
However, in the result I see that the '_' character is omitted from the sorting.
The result looks like this:
?column?
----------
A_CH
A_CZ
A_D
AFG_IR
AFG_PK
AFG_TAD
AFG_TJ
AFG_TM
AFG_UZB
A_FL
A_H
A_I
.
.
You can see that the result is sorted as if '_' doesn't exist in the strings.
If I use a letter (say 'x') as a separator, the order is correct. But I must use some special character that doesn't appear in the country1 and country2 fields, to avoid collisions.
What should I do to make the '_' character count during sorting?
EDIT
It turned out that the concatenation has nothing to do with the problem. The problem is that the order by simply ignores '_' character.
select country1 || '_' || country2 collate "C" as a
from borders
order by 1
sql fiddle demo
Notes according to discussion in comments:
1.) COLLATE "C" applies in the ORDER BY clause as long as it references the expression in the SELECT clause by positional parameter or alias. If you repeat the expression in ORDER BY you also need to repeat the COLLATE clause if you want to affect the sort order accordingly.
sql fiddle demo
2.) In collations where _ does not influence the sort order, it is more efficient to use fog's query, even more so because that one makes use of the existing index (primary key is defined on the first two fields).
However, if _ has an influence, one needs to sort on the combined expression:
sql fiddle demo
Query performance (tested in Postgres 9.2):
sql fiddle demo
PostgreSQL Collation Support in the manual.
Just order by the two columns:
SELECT country1||'_'||country2 FROM borders ORDER BY country1, country2;
Unless you use aggregates or windows, PostgreSQL allows to order by columns even if you don't include them in the SELECT list.
As suggested in another answer you can also change the collation of the combined column but, if you can, sorting on plain columns is faster, especially if you have an index on them.
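One thing worth seeing before picking between the two approaches: sorting the combined string byte-wise and sorting by the two columns do not give the same order. The Python sketch below is only a model, not Postgres: plain string comparison by code point stands in for COLLATE "C", tuple sorting stands in for ORDER BY country1, country2, and the sample rows are taken from the question's output.

```python
# Hypothetical subset of the borders table as (country1, country2) pairs.
borders = [("A", "CH"), ("A", "CZ"), ("A", "D"),
           ("AFG", "IR"), ("AFG", "PK"), ("A", "FL")]

# Sort the concatenated string by code point, roughly what COLLATE "C" does.
by_combined = sorted(c1 + "_" + c2 for c1, c2 in borders)

# Sort by the two columns first, then concatenate for display.
by_columns = [c1 + "_" + c2 for c1, c2 in sorted(borders)]

print(by_combined)
print(by_columns)
```

In the byte-wise ordering 'F' sorts before '_', so the AFG rows come ahead of the A rows; ordering by the two columns puts A first. Which one is "correct" depends on what the sorted column is used for.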
What happens when you do the following?
select country1||'_'||country2 from borders order by country1||'_'||country2
As far as I know, order by 1 only sorts by ordinal position; it won't do anything special for concatenated columns. Granted, I'm speaking from SQL Server knowledge, so let me know if I'm way off base.
Edited: Ok; just saw Parado's post as I posted mine. Maybe you could create a view from this query (give it a column name) and then requery the view, order by that column? Or do the following:
select country_group from (
select country1||'_'||country2 as country_group from borders
) a
order by country_group

Sqlite : Sql to finding the most complete prefix

I have a sqlite table containing records of variable length number prefixes. I want to be able to find the most complete prefix against another variable length number in the most efficient way:
eg. The table contains a column called prefix with the following numbers:
1. 1234
2. 12345
3. 123456
What would be an efficient SQLite query to find the second record as the most complete match against 12345999?
Thanks.
A neat trick here is to reverse a LIKE clause -- rather than saying
WHERE prefix LIKE '...something...'
as you would often do, turn the prefix into the pattern by appending a % to the end and comparing it to your input as the fixed string. Order by length of prefix descending, and pick the top 1 result.
I've never used Sqlite before, but just downloaded it and this works fine:
sqlite> CREATE TABLE whatever(prefix VARCHAR(100));
sqlite> INSERT INTO WHATEVER(prefix) VALUES ('1234');
sqlite> INSERT INTO WHATEVER(prefix) VALUES ('12345');
sqlite> INSERT INTO WHATEVER(prefix) VALUES ('123456');
sqlite> SELECT * FROM whatever WHERE '12345999' LIKE (prefix || '%')
ORDER BY length(prefix) DESC LIMIT 1;
output:
12345
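If you'd rather drive this from code, the same reversed-LIKE query can be run through Python's built-in sqlite3 module. A minimal sketch, reusing the table from the session above; the helper name longest_prefix is made up, and the input number is passed as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE whatever (prefix VARCHAR(100))")
conn.executemany(
    "INSERT INTO whatever (prefix) VALUES (?)",
    [("1234",), ("12345",), ("123456",)],
)

def longest_prefix(number):
    """Return the longest stored prefix of `number`, or None if none matches."""
    row = conn.execute(
        # The stored prefix becomes the pattern; the input is the fixed string.
        "SELECT prefix FROM whatever WHERE ? LIKE (prefix || '%') "
        "ORDER BY length(prefix) DESC LIMIT 1",
        (number,),
    ).fetchone()
    return row[0] if row else None

print(longest_prefix("12345999"))
print(longest_prefix("999"))
```

Because the pattern is built per row, expect a scan of the prefix table with this form; it is simple rather than index-friendly.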
Personally I use the following method, which can use an index on prefix. The client generates the list of every leading substring of the input:
SELECT * FROM whatever WHERE prefix IN
('1','12','123','1234','12345','123459','1234599','12345999')
ORDER BY length(prefix) DESC LIMIT 1;
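Generating that list on the client is only a few lines. Here is a sketch in Python with the built-in sqlite3 module; the helper all_prefixes and the index name idx_prefix are made up, and the table is the one from the earlier answer.

```python
import sqlite3

def all_prefixes(number):
    """Every leading substring of `number`, shortest first."""
    return [number[:i] for i in range(1, len(number) + 1)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE whatever (prefix VARCHAR(100))")
conn.execute("CREATE INDEX idx_prefix ON whatever (prefix)")
conn.executemany(
    "INSERT INTO whatever (prefix) VALUES (?)",
    [("1234",), ("12345",), ("123456",)],
)

candidates = all_prefixes("12345999")
# One '?' placeholder per candidate, so the values are bound, not pasted in.
placeholders = ", ".join("?" * len(candidates))
row = conn.execute(
    f"SELECT prefix FROM whatever WHERE prefix IN ({placeholders}) "
    "ORDER BY length(prefix) DESC LIMIT 1",
    candidates,
).fetchone()
best = row[0] if row else None
print(candidates)
print(best)
```

Binding the candidates as parameters avoids quoting problems if the input ever contains something other than digits.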
select foo, 1 quality from bar where foo like '123%'
union
select foo, 2 quality from bar where foo like '1234%'
order by quality desc limit 1
I haven't tested it, but the idea should work in other dialects of SQL.
a couple of assumptions.
you are joining with some other table so you want to know the largest variable length prefix for each record in the table you are joining with.
your table of prefixes is actually more than just the three you provide in your example...otherwise you can hardcode the logic and move on.
prefix_table.prefix
1234
12345
123456
etc.
foo.field
12345999
123999
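select
a.field,
b.prefix,
max(length(b.prefix)) as length
from
foo a inner join prefix_table b on b.prefix = substr(a.field, 1, length(b.prefix))
group by
a.field
Note that this is untested but logically should make sense. SQLite has no len() or left(), hence length() and substr(). Grouping by a.field alone relies on SQLite filling the bare b.prefix column from the row that holds the max(); grouping by b.prefix as well would return every matching prefix rather than only the longest.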
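For anyone who wants to try the join idea in SQLite itself, here is a sketch through Python's built-in sqlite3 module. It assumes SQLite's length() and substr() in place of len() and left(), groups by field only, and relies on SQLite's documented behaviour of taking bare columns from the row that holds the max(). The extra row 12345678 and the alias prefix_len are my additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prefix_table (prefix TEXT);
    CREATE TABLE foo (field TEXT);
    INSERT INTO prefix_table VALUES ('1234'), ('12345'), ('123456');
    INSERT INTO foo VALUES ('12345999'), ('123999'), ('12345678');
""")

rows = conn.execute("""
    SELECT a.field, b.prefix, max(length(b.prefix)) AS prefix_len
    FROM foo a
    INNER JOIN prefix_table b
        ON b.prefix = substr(a.field, 1, length(b.prefix))
    GROUP BY a.field
    ORDER BY a.field
""").fetchall()

for row in rows:
    print(row)
```

The value 123999 has no matching prefix, so the inner join drops it; a LEFT JOIN would keep it with a null prefix.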
Without resorting to a specialized index, the best performing strategy may be to hunt for the answer.
Issue a LIKE query for each possible prefix, starting with the longest. Stop once you get rows returned.
It's certainly not the prettiest way to achieve what you want but, as opposed to the other suggestions, indexes will be considered by the query planner. As always, it depends on your actual data: in particular, on how many rows are in your table and how long the average hunt will be.
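A sketch of the hunt in Python with the built-in sqlite3 module follows. One assumption to flag: each step uses an equality test on the candidate rather than LIKE, since a candidate without wildcards only needs an exact match, and that is what lets the index be used. The function name hunt_longest_prefix and the index name are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE whatever (prefix VARCHAR(100))")
conn.execute("CREATE INDEX idx_prefix ON whatever (prefix)")
conn.executemany(
    "INSERT INTO whatever (prefix) VALUES (?)",
    [("1234",), ("12345",), ("123456",)],
)

def hunt_longest_prefix(conn, number):
    """Try the longest candidate first and stop at the first hit."""
    for end in range(len(number), 0, -1):
        candidate = number[:end]
        row = conn.execute(
            "SELECT prefix FROM whatever WHERE prefix = ? LIMIT 1",
            (candidate,),
        ).fetchone()
        if row is not None:
            return row[0]
    return None

found = hunt_longest_prefix(conn, "12345999")
print(found)
```

The worst case is one query per character of the input, so this pays off mainly when the longest match is usually close to the full length.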

Oracle - Select where field has lowercase characters

I have a table, users, in an Oracle 9.2.0.6 database. Two of the fields are varchar - last_name and first_name.
When rows are inserted into this table, the first name and last name fields are supposed to be in all upper case, but somehow some values in these two fields are mixed case.
I want to run a query that will show me all of the rows in the table that have first or last names with lowercase characters in it.
I searched the net and found REGEXP_LIKE, but that must be for newer versions of oracle - it doesn't seem to work for me.
Another thing I tried was to translate "abcde...z" to "$$$$$...$" and then search for a '$' in my field, but there has to be a better way?
Thanks in advance!
How about this:
select id, first, last from mytable
where first != upper(first) or last != upper(last);
I think BQ's SQL and Justin's second SQL will work, because in this scenario:
first_name last_name
---------- ---------
bob johnson
Bob Johnson
BOB JOHNSON
I want my query to return the first 2 rows.
I just want to make sure that this will be an efficient query though - my table has 500 million rows in it.
When you say upper(first_name) != first_name, is "first_name" always pertaining to the current row that Oracle is looking at? I was afraid to use this method at first because I thought I would end up joining this table to itself, but the way you both wrote the SQL it appears that the comparison operates on a row-by-row basis, which would work for me.
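The row-by-row behaviour is easy to confirm on a tiny copy of the data. The sketch below uses Python's built-in sqlite3 module rather than Oracle, so it only illustrates the logic of the comparison; the three rows are the ones from the scenario above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO users (first_name, last_name) VALUES (?, ?)",
    [("bob", "johnson"), ("Bob", "Johnson"), ("BOB", "JOHNSON")],
)

# Each row is compared only with its own upper-cased value; there is no self-join.
mixed_case = conn.execute(
    "SELECT first_name, last_name FROM users "
    "WHERE first_name != upper(first_name) OR last_name != upper(last_name) "
    "ORDER BY rowid"
).fetchall()
print(mixed_case)
```

On 500 million rows this is still a full scan with a function call per row, so it is efficient only in the sense that it is a single pass.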
If you are on Oracle 10g or higher, you can use the example below. Suppose you need to find the rows where any letter in a column is lowercase.
Column1
.......
MISS
miss
MiSS
In the above example, if you need to find the values miss and MiSS, then you could use the below query
SELECT * FROM YOU_TABLE WHERE REGEXP_LIKE(COLUMN1,'[a-z]');
Try this:
SELECT * FROM YOU_TABLE WHERE REGEXP_LIKE(COLUMN1,'[a-z]','c'); -- returns Miss, miss (values containing a lowercase letter)
SELECT * FROM YOU_TABLE WHERE REGEXP_LIKE(COLUMN1,'[A-Z]','c'); -- returns Miss, MISS (values containing an uppercase letter)
SELECT *
FROM MY_TABLE
WHERE FIRST_NAME IN (SELECT FIRST_NAME
                     FROM MY_TABLE
                     MINUS
                     SELECT UPPER(FIRST_NAME)
                     FROM MY_TABLE)
For SQL Server, where the database collation is case-insensitive, use the following:
SELECT * FROM tbl_user WHERE LEFT(username,1) COLLATE Latin1_General_CS_AI <> UPPER(LEFT(username,1))