Teradata 15: parsing a string - sql

The task is to scan a VARCHAR value, which can be NULL or between 3 and N characters long, and find out whether it contains a specific combination.
Example:
Find whether the string A001G002F001H003Z701 contains F001, B004 or J005.
Which solution for this task is the most efficient? Thanks.

You can use either LIKE ANY:
WHERE x LIKE ANY ( '%F001%', '%B004%', '%J005%')
or a RegEx:
WHERE RegExp_Instr(x, 'F001|B004|J005') > 0
Run both against a huge table and compare their CPU usage in the Query Log (DBQL).
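To make the two predicates concrete, here is a small Python model of what each one tests. It is a sketch, not Teradata code, and the names like_any and regexp_instr_positive are hypothetical. Both return true when any of the codes occurs anywhere in the value, and a NULL value (None here) never qualifies, just as a NULL column makes the SQL condition unknown so the row is not returned. The model says nothing about relative CPU cost, which still has to be measured on real data.

```python
import re

# Codes from the example in the question.
CODES = ("F001", "B004", "J005")

def like_any(value, codes=CODES):
    """Model of: value LIKE ANY ('%F001%', '%B004%', '%J005%')."""
    if value is None:  # NULL never satisfies LIKE, so the row is not returned
        return False
    # '%code%' means "code occurs anywhere in the string"
    return any(code in value for code in codes)

# Model of the alternation 'F001|B004|J005'; re.escape guards against
# regex metacharacters in case the codes ever contain any.
PATTERN = re.compile("|".join(re.escape(c) for c in CODES))

def regexp_instr_positive(value):
    """Model of: RegExp_Instr(value, 'F001|B004|J005') > 0."""
    if value is None:
        return False
    return PATTERN.search(value) is not None

for s in ("A001G002F001H003Z701", "A001G002H003", None):
    print(s, like_any(s), regexp_instr_positive(s))
```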


SQL Oracle: Is it possible to convert a number mixed with characters (VARCHAR2 datatype) to the NUMBER datatype?

I have a big problem right now and I really need your help, because I can't find the right answer.
I am currently writing a script that triggers a migration from a table of raw data (data we received from an Excel file) to a new normalized schema.
My problem is that there is a PRICE column (VARCHAR2 datatype) with a bunch of traps, for example: 540S, 25oo, I200, S000.
I need to convert it to NUMBER(9,2) so that the examples above become 5405, 2500, 1200 and 5000, which I can then INSERT INTO my_new_table.
Is there any way to parse each character of these strings and check it against certain conditions?
Or is there a better way?
Thank you :)!
One of the wonderful things about Oracle that some other databases lack is the TRANSLATE function:
SELECT TRANSLATE(price, 'SsIilOoxyz', '5511100') FROM t
This will convert:
S, s to 5
I, i and l to 1
O, o to 0
x, y and z to nothing (they are removed)
The second and third arguments to TRANSLATE define which characters are mapped. If the second argument is longer than the third, every character beyond the length of the third is deleted from the result. Mapping is direct, based on position:
'SsIilOoxyz',
'5511100'
Look at the columns of the characters; the character above is mapped to the character below:
S->5,
s->5,
I->1,
i->1,
l->1,
O->0,
o->0,
x->removed,
y->removed,
z->removed
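The positional rule can be checked with a short Python model of TRANSLATE built on str.translate. This is a sketch: oracle_translate is a hypothetical helper that mimics only the behaviour described above plus Oracle's rule that a NULL argument (and '' counts as NULL in Oracle) gives a NULL result, not every edge case. Characters of the second argument that have a partner in the third are mapped; the surplus ones are deleted.

```python
def oracle_translate(expr, from_string, to_string):
    """Sketch of Oracle TRANSLATE(expr, from_string, to_string)."""
    # Oracle treats '' as NULL, and a NULL argument yields a NULL result.
    if not expr or not from_string or not to_string:
        return None
    n = min(len(from_string), len(to_string))
    # Position i of from_string maps to position i of to_string ...
    mapped_from, mapped_to = from_string[:n], to_string[:n]
    # ... and from_string characters with no partner in to_string are deleted.
    deleted = from_string[n:]
    return expr.translate(str.maketrans(mapped_from, mapped_to, deleted))

for price in ("540S", "25oo", "I200", "S000", "12x3"):
    print(price, "->", oracle_translate(price, "SsIilOoxyz", "5511100"))
```

The result is still a string; on the Oracle side it would be passed through TO_NUMBER before being inserted into a NUMBER(9,2) column.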
You can use translate() along with to_number(). Your rules are not exactly clear, but something like this:
select to_number(translate(price, '0123456789IoS', '0123456789105'))
from t;
This replaces I with 1, o with 0 and S with 5, and leaves the digits unchanged.

Extract the last element of a list for a split string

I'm trying to take a string, split it on a predetermined character, and then extract the final value of the resulting list.
For example, my string may take the form:
name
WAYNE.ROONEY.226
ROSS.BARKLEY.HELLO.113
ADAM.A122
Pythonically, what I'm trying to do is:
names = ["WAYNE.ROONEY.226", "ROSS.BARKLEY.HELLO.113", "ADAM.A122"]
for x in names:
    my_val = x.split('.')[-1]  # last element of the list when split on '.'
Desired output:
name                    value
WAYNE.ROONEY.226        226
ROSS.BARKLEY.HELLO.113  113
ADAM.A122               A122
Can anyone give me pointers for either Hive or Impala, please?
Ideally I would create this as a view, but I am also happy to generate the actual output and then re-upload it to a table.
Thank you!
For Hive:
select regexp_extract(NAME, '\\.([^\\.]+)$', 1) as VALUE
from WHATEVER
And pleeeease learn the power of regular expressions...
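A quick way to see what that pattern does is to run the same regular expression in Python. This is a sketch: in a HiveQL string literal the backslashes are doubled, in a Python raw string they are not, and last_element_regex and last_element_split are hypothetical names. The pattern captures the run of non-dot characters after the last dot, which equals Python's split('.')[-1] whenever the name contains a dot. The two differ when there is no dot: split returns the whole string, while this sketch of the regex returns None because nothing matches.

```python
import re

# Same pattern as the Hive query; a Python raw string needs no doubled backslashes.
# \.        a literal dot
# ([^.]+)   one or more non-dot characters, captured as group 1
# $         end of string, so only the text after the LAST dot qualifies
LAST_ELEMENT = re.compile(r"\.([^.]+)$")

def last_element_regex(name):
    """Model of regexp_extract(name, '\\.([^\\.]+)$', 1); None when there is no match."""
    m = LAST_ELEMENT.search(name)
    return m.group(1) if m else None

def last_element_split(name):
    """The Python version from the question: x.split('.')[-1]."""
    return name.split(".")[-1]

rows = ["WAYNE.ROONEY.226", "ROSS.BARKLEY.HELLO.113", "ADAM.A122"]
for name in rows:
    print(name, last_element_regex(name), last_element_split(name))
```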

Implementing a Greater Than operation using SQL wildcards

I have some serialized data inside a relational database table, like:
ID | VALUE
60 | "A=18, D=78"
70 | "D=4, A=18"
80 | "A=21, C=44"
The system can search for a particular value using wildcards:
LIKE '%A=18%' (returns the rows with ID 60 and ID 70)
But now I need to implement a Greater Than operator in a similar way.
Is that possible using wildcards?
Thanks!
No, that is not possible. The pattern is treated as a string literal.
When you write LIKE '%A=10%', A=10 is treated as text to match, not as an expression to evaluate.
So if you write LIKE '%A>10%', A>10 is also taken as literal text and no comparison is performed: only rows containing exactly that text would match, so with your data it returns nothing.
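To illustrate why, here is a minimal Python model of LIKE. It is a sketch that handles only % and _ and ignores ESCAPE clauses and collation or case-sensitivity settings; like, ids_like and ids_key_greater_than are hypothetical names. Everything outside the wildcards is matched literally, so '%A>10%' simply looks for the four characters A>10. Answering "A greater than 18" requires parsing the serialized pairs and comparing numbers, which is work LIKE cannot do.

```python
import re

# Sample rows from the question.
ROWS = {60: "A=18, D=78", 70: "D=4, A=18", 80: "A=21, C=44"}

def like(value, pattern):
    """Minimal model of SQL LIKE: % = any run of characters, _ = exactly one
    character, everything else is matched literally (no ESCAPE handling)."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None

def ids_like(pattern):
    return sorted(i for i, v in ROWS.items() if like(v, pattern))

def ids_key_greater_than(key, threshold):
    """What 'greater than' actually requires: parse the pairs, then compare numbers."""
    result = []
    for i, v in ROWS.items():
        pairs = dict(item.strip().split("=", 1) for item in v.split(","))
        if key in pairs and int(pairs[key]) > threshold:
            result.append(i)
    return sorted(result)

print(ids_like("%A=18%"))             # [60, 70]
print(ids_like("%A>10%"))             # [] ('A>10' is just literal text)
print(ids_key_greater_than("A", 18))  # [80]
```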

How do I remove the first character of a string and treat the remaining values as an integer in BigQuery

I am currently working with a large dataset that was pre-populated in BigQuery. I have a column of order IDs with the following format: o377412876, o380940924, etc. It is stored as a string. I need to do the following and am running into problems:
1) Strip off the first character using the BigQuery query language
2) Convert the remaining characters to (or treat them as) an integer.
I will then run a join against the values. I would be much happier doing this operation in Python, R or another language, but the challenge I have been given, based on client needs, is to write all the scripts in BigQuery's query language.
SELECT 10 * INTEGER(REGEXP_REPLACE(x, '^.', ''))
FROM
(SELECT 'o1234' AS x)
Output: 12340
You can use the SUBSTR function with SAFE_CAST, which returns NULL instead of raising an error when a value cannot be converted. INTEGER() is a legacy SQL function and does not work in BigQuery standard SQL.
SELECT SAFE_CAST(SUBSTR(x, 2) AS INT64)
FROM (SELECT 'o1234' AS x)
Output: 1234
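The behaviour of SUBSTR plus SAFE_CAST can be modelled in Python as follows. This is a sketch: safe_int_after_first_char is a hypothetical name, and Python's int() only approximates BigQuery's string-to-INT64 rules (for example, it also accepts underscores such as '1_000'). SUBSTR(x, 2) is 1-based, so it keeps everything from the second character onward, which corresponds to the slice x[1:]; a value that cannot be converted yields NULL (None) instead of an error.

```python
def safe_int_after_first_char(order_id):
    """Rough model of SAFE_CAST(SUBSTR(order_id, 2) AS INT64)."""
    if order_id is None:        # NULL in, NULL out
        return None
    rest = order_id[1:]         # SUBSTR is 1-based: position 2 onward == Python slice [1:]
    try:
        return int(rest)
    except ValueError:          # SAFE_CAST returns NULL instead of raising an error
        return None

for order_id in ("o377412876", "o380940924", "oABC", None):
    print(order_id, safe_int_after_first_char(order_id))
```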

Integer comparison as string

I have an integer column and I want to find numbers that start with specific digits.
For example, if I look for '123', these match:
1234567
123456
1234
These do not match:
23456
112345
0123445
Is converting the integers to strings before doing a string comparison the only way to handle this?
Currently I am using PostgreSQL's regexp_replace(text, pattern, replacement) on the numbers, which is a very slow and inefficient way of doing it.
I have a large amount of data to handle this way, so I am looking for the most economical approach.
PS: I am not looking for a way to cast an integer to a string.
Are you looking for a match at the start of the value?
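You might create an expression (functional) index like this:
CREATE INDEX my_index ON mytable (CAST(stuff AS TEXT) text_pattern_ops);
A query that uses the same expression, such as WHERE CAST(stuff AS TEXT) LIKE '123%', should be able to use it (text_pattern_ops is what lets a btree index serve LIKE prefix searches when the database does not use the C locale), but I didn't test it.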
As a standard principle (IMHO), a database design should use a number type if and only if the field is:
A number you could sensibly perform maths on
A reference code within the database (keys, etc.)
If it's a number in some other context (phone numbers, IP addresses, etc.), store it as text.
This sounds to me like your '123' is conceptually a string that just happens to only contain numbers, so if possible I'd suggest altering the design so it's stored as such.
Otherwise, I can't see a sensible way to do the comparison while treating it as a number, so you'll need to convert it to a string on the fly with something like
SELECT * FROM mytable WHERE CAST(CheckVar AS TEXT) LIKE CAST(<num> AS TEXT) || '%'
The best way for performance is to store them as strings with an index on the column and use LIKE '123%'. Most other methods of solving this will likely involve a full table scan.
If you aren't allowed to change the table, you could try the following, but it's not pretty:
WHERE col = 123
OR col BETWEEN 1230 AND 1239
OR col BETWEEN 12300 AND 12399
etc...
This might still result in a table scan, though. You can work around that by splitting the ORs into separate SELECTs and combining them with UNION ALL.
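The ranges in that WHERE clause follow a fixed rule: with k digits after the prefix, the matching values run from prefix * 10**k to prefix * 10**k + 10**k - 1. Below is a Python sketch that generates them and renders the predicate text. prefix_ranges, to_sql_predicate and matches_prefix are hypothetical helpers, and the sketch assumes non-negative integers with a known maximum number of digits, which is what makes the "etc..." finite. Because there is one range per possible length, the predicate stays short even for a BIGINT column.

```python
def prefix_ranges(prefix, max_digits):
    """Closed ranges [lo, hi] covering every non-negative integer with at most
    max_digits digits whose decimal representation starts with prefix."""
    if prefix <= 0:
        raise ValueError("prefix must be a positive integer")
    ranges = []
    # k = number of digits that follow the prefix (0, 1, 2, ...)
    for k in range(max_digits - len(str(prefix)) + 1):
        lo = prefix * 10 ** k
        ranges.append((lo, lo + 10 ** k - 1))
    return ranges

def to_sql_predicate(prefix, max_digits, column="col"):
    """Render the ranges as a WHERE clause in the style of the answer."""
    parts = []
    for lo, hi in prefix_ranges(prefix, max_digits):
        parts.append(f"{column} = {lo}" if lo == hi else f"{column} BETWEEN {lo} AND {hi}")
    return "\nOR ".join(parts)

def matches_prefix(n, ranges):
    """True if n falls inside any of the precomputed ranges."""
    return any(lo <= n <= hi for lo, hi in ranges)

print(to_sql_predicate(123, 5))
```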