Find a character index in string in spark sql - apache-spark-sql

I am a SQL person and new to Spark SQL.
I need to check the position of the character '-' in a string: if it is at the expected position, I need to return a fixed length; otherwise, a length of zero.
string name = 'john-smith'
If '-' is at character position 4, then 10; otherwise, 0.
I have done this in SQL Server but now need to do it in Spark SQL.
select
case
when charindex('-', name) = 4 then 10
else 0
end
I tried in Spark SQL but failed to get results.
select find_in_set('-',name)
Please help. Thanks

You can use the instr function, as shown next. instr(str, substr) checks whether the second argument occurs in the first one; if so, it returns the index of the first occurrence, counting from 1, and 0 otherwise.
//first create a temporary view if you don't have one already
df.createOrReplaceTempView("temp_table")
//then use instr to check if the name contains the - char
spark.sql("select if(instr(name, '-') = 4, 10, 0) from temp_table")
The arguments of the if function are:
instr(name, '-') = 4: the condition to check
10: the result when the condition is true
0: the result when the condition is false
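To make the 1-based indexing concrete, here is a minimal Python sketch that mimics the behaviour described above. The helper names instr and dash_length are only illustrative and are not part of Spark. Note that in 'john-smith' the '-' sits at position 5, so a check against position 4 yields 0 for that value.

```python
def instr(s: str, substr: str) -> int:
    """Return the 1-based position of the first occurrence of substr in s, or 0 if absent."""
    # str.find is 0-based and returns -1 when nothing is found, so +1 maps it to the SQL convention.
    return s.find(substr) + 1


def dash_length(name: str, position: int = 4, length: int = 10) -> int:
    """Mirror of if(instr(name, '-') = position, length, 0)."""
    return length if instr(name, "-") == position else 0


print(instr("john-smith", "-"))   # 5
print(dash_length("john-smith"))  # 0
print(dash_length("joh-smith"))   # 10
```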

Related

Take % out of an SQL column

I'm trying to convert values such as 60% to 60 for every row of a column in a table. I have tried this, but it does not work; I assume that is because % is an SQL operator.
UPDATE host_info set host_response_rate = replace(host_response_rate,'%', '');
But I get all the values to be NULL...
I'm using PostgreSQL.
You can use the LEFT function with a negative length, which returns all but the last n characters of a string. With -1, it drops the last character.
SELECT LEFT(host_response_rate, -1)
FROM ...

Split with and get Results from T-SQL

I have some T-SQL code that pulls information from a SQL Server table. I need to parse a column and display it according to the following result set. I'm new to SQL, so it would be great if you could show me how to do it rather than marking this as a duplicate.
Can you please help?
SELECT Account, CTSFirm, AccountName, BOCodeGMI
FROM StagingEDFACRRBO
BOCodeGMI column contains:
e=01:c=KW:m=10000
c=C-:e=01:m=10000
c=S-:e=01:m=10000
c=06:e=01:m=10
c=07:e=01:m=100
c=W-:e=01:M=10000
Logic to split BOCodeGMI and display two separate columns BOCodeGMI_1 & BOCodeGMI_2:
If string contains e= then display BOCodeGMI_1 as its corresponding value (ex: 01), if string doesn't contain e=, then display BOCodeGMI_1 as NULL
If string contains c= then display BOCodeGMI_2 as its corresponding value (ex: C-), if string doesn't contain c= then display BOCodeGMI_2 as NULL
Finally, this is how it is supposed to look:
BOCodeGMI           BOCodeGMI_1   BOCodeGMI_2
-----------------------------------------------------
e=01:c=KW:m=10000   01            KW
c=C-:e=01:m=10000   01            C-
c=S-:e=01:m=10000   01            S-
Try the following query, which uses CASE, CHARINDEX and SUBSTRING:
SELECT
    BOCodeGMI,
    CASE
        WHEN CHARINDEX('e=', BOCodeGMI) > 0
        THEN SUBSTRING(BOCodeGMI, CHARINDEX('e=', BOCodeGMI) + 2, 2)
    END AS BOCodeGMI_1,
    CASE
        WHEN CHARINDEX('c=', BOCodeGMI) > 0
        THEN SUBSTRING(BOCodeGMI, CHARINDEX('c=', BOCodeGMI) + 2, 2)
    END AS BOCodeGMI_2
FROM
    tableName
CASE goes through the conditions and returns the value of the first one that matches (NULL here when none does, since there is no ELSE).
CHARINDEX searches for a substring in a string and returns its position (0 if not found).
SUBSTRING extracts a given number of characters from a string, starting at a given position.
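The same find-then-slice logic can be sketched in Python to see what each column would contain. The helper value_after is a made-up name for illustration. Like the query, it assumes the value is always two characters wide, and it matches the key anywhere in the string (so a key such as 'e=' would also match inside a longer key ending in 'e=').

```python
from typing import Optional


def value_after(key: str, text: str, width: int = 2) -> Optional[str]:
    """Return the `width` characters that follow `key` in `text`, or None if `key` is absent."""
    pos = text.find(key)          # plays the role of CHARINDEX (0-based here, -1 when absent)
    if pos < 0:
        return None               # same as the CASE without ELSE: NULL
    start = pos + len(key)        # skip over the key itself, like "+ 2" in the query
    return text[start:start + width]


rows = ["e=01:c=KW:m=10000", "c=C-:e=01:m=10000", "m=10000"]
for row in rows:
    print(row, value_after("e=", row), value_after("c=", row))
```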

Prevent ORA-01722: invalid number in Oracle

I have this query
SELECT text
FROM book
WHERE lyrics IS NULL
AND MOD(TO_NUMBER(SUBSTR(text,18,16)),5) = 1
Sometimes the string is something like $OK$OK$OK$OK$OK$OK$OK, and sometimes something like #P,351811040302663;E,101;D,07112018134733,07012018144712;G,4908611,50930248,207,990;M,79379;S,0;IO,3,0,0
I would like to know if it is possible to prevent ORA-01722: invalid number, because in some cases the characters at that position are not a number.
I run this query inside a procedure that processes all the rows in a cursor; if one row is not a number, I can't process any row.
You could use VALIDATE_CONVERSION if you are on Oracle 12c Release 2 (12.2) or later:
WITH book(text) AS
  (SELECT '#P,351811040302663;E,101;D,07112018134733,07012018144712;G,4908611,50930248,207,990;M,79379;S,0;IO,3,0,0'
   FROM DUAL
   UNION ALL SELECT '$OK$OK$OK$OK$OK$OK$OK'
   FROM DUAL
   UNION ALL SELECT '12I45678912B456781234567812345671'
   FROM DUAL)
SELECT *
FROM book
WHERE CASE
        WHEN VALIDATE_CONVERSION(SUBSTR(text,18,16) AS NUMBER) = 1
        THEN MOD(TO_NUMBER(SUBSTR(text,18,16)),5)
        ELSE 0
      END = 1;
Output
TEXT
12I45678912B456781234567812345671
Assuming the condition should be true if and only if the 16-character substring starting at position 18 is made up of 16 digits, and the number is equal to 1 modulo 5, then you could write it like this:
...
where .....
and case when translate(substr(text, 18, 16), 'z0123456789', 'z') is null
and substr(text, 33, 1) in ('1', '6')
then 1 end
= 1
This checks that the substring consists only of digits: the translate() function replaces every occurrence of z in the string with itself, and removes every occurrence of 0, 1, ..., 9. The odd-looking z is needed due to Oracle's odd treatment of NULL and empty strings (you can use any other non-digit character instead of z, but you need some character so that no argument to translate() is NULL). The substring is then all digits if and only if the result of this translation is null (an empty string). The second condition checks that the last character is 1 or 6, which is equivalent to the number being equal to 1 modulo 5.
Note that I didn't use any regular expressions; this is important if you have a large amount of data, since standard string functions like translate() are much faster than regular expression functions. Also, everything is based on character data type - no math functions like mod(). (Same as in Thorsten's answer, which was only missing the first part of what I suggested here - checking to see that the entire substring is made up of digits.)
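As a rough illustration of the two checks described above (all digits, and last digit 1 or 6), here is a small Python sketch. The function name matches and the DIGITS constant are invented for this example; the three sample strings are the ones used earlier in this thread.

```python
DIGITS = "0123456789"


def matches(text: str, start: int = 18, length: int = 16) -> bool:
    """True if the `length` characters from 1-based `start` are all digits and the number is 1 mod 5."""
    # SQL positions are 1-based; Python slices are 0-based.
    chunk = text[start - 1:start - 1 + length]
    # Same idea as translate(...) IS NULL: nothing is left once the digits are removed.
    only_digits = len(chunk) == length and all(ch in DIGITS for ch in chunk)
    # A decimal number is equal to 1 modulo 5 exactly when its last digit is 1 or 6.
    return only_digits and chunk[-1] in ("1", "6")


samples = [
    "#P,351811040302663;E,101;D,07112018134733,07012018144712;G,4908611,50930248,207,990;M,79379;S,0;IO,3,0,0",
    "$OK$OK$OK$OK$OK$OK$OK",
    "12I45678912B456781234567812345671",
]
for s in samples:
    print(matches(s), s)
```

Only the third sample passes, since its characters 18 to 33 are 1234567812345671.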
SELECT text
FROM book
WHERE lyrics IS NULL
AND CASE
        -- digits only, so TO_NUMBER cannot raise ORA-01722
        WHEN REGEXP_LIKE(SUBSTR(text,18,16), '^[0-9]+$')
        THEN MOD(TO_NUMBER(SUBSTR(text,18,16)),5)
        ELSE NULL
    END = 1;

DB2 SQL Anything left of a /

I've been working on this for days and can't seem to work it out. Basically, I need to return the digits from a field that come before a forward slash, e.g. if the field was 1234/TEXT I want to return 1234. I can't just take the leftmost 4 characters of the field, as the number of digits varies, e.g. 12345/TEXT, so it needs to be anything left of the forward slash. In the world of MS Access, it is something like this, and it works:
Left(TABLE!FIELD,InStr(1,TABLE!FIELD,"/")-1)
However, how do I convert this to be used in an IBM\DB2 system? The DB2 SQL seems somewhat different to 'normal' SQL.
Thanks!
Rather than INSTR, maybe LOCATE
LOCATE(char, string)
char is the search term
string is the string being searched
You can achieve this by combining LOCATE with SUBSTR;
Cheat sheet (for this example);
SUBSTRING('FIELD','START POSITION', 'LENGTH')
LOCATE('SEARCH STRING', 'SOURCE STRING')
SUBSTRING lets you retrieve specific characters from a string, i.e.;
AFIELD = 'Hello'
SUBSTRING(AFIELD,4,2)
Result = 'lo' (position 4 and 5 of Hello)
LOCATE returns the position of the first character of the search string it finds as a number, i.e.;
AFIELD = 'Hello'
LOCATE('ello', AFIELD)
Result = 2 (it starts at position 2)
So you can combine these to do what you want, example;
XTABLE has 1 column called ACOL with the following values in it;
123467/ABCD
1321/ABDD
1123467/ABCD
To just retrieve the numbers;
SELECT SUBSTRING(ACOL,1, LOCATE('/',ACOL)-1)
FROM XRDK/XTABLE
Result;
123467
1321
1123467
What are we doing?
SUBSTRING(
ACOL,
1,
LOCATE('/',ACOL)-1
)
SUBSTRING(
Field ACOL,
Starting at position 1,
Length: use LOCATE to find the position of the '/' and subtract 1 from the resulting position (without the -1 you'd have the / on the end)
)
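If it helps to see the arithmetic outside the database, the breakdown above can be mimicked in Python. The functions locate, substring and left_of_slash are stand-ins written for this sketch, not DB2 APIs. One assumption to flag: when there is no '/', LOCATE gives 0 and the length becomes -1, which SUBSTRING is not expected to accept, so the sketch simply returns the whole value in that case.

```python
def locate(search: str, source: str) -> int:
    """1-based position of the first occurrence of `search` in `source`, or 0 if absent."""
    return source.find(search) + 1


def substring(source: str, start: int, length: int) -> str:
    """`length` characters of `source`, starting at 1-based position `start`."""
    return source[start - 1:start - 1 + length]


def left_of_slash(value: str) -> str:
    pos = locate("/", value)
    if pos == 0:
        # Assumption for this sketch: no slash means the whole value is returned.
        return value
    # Length is the slash position minus 1, so the slash itself is excluded.
    return substring(value, 1, pos - 1)


for value in ["123467/ABCD", "1321/ABDD", "1123467/ABCD"]:
    print(left_of_slash(value))
```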
Try this
SELECT SUBSTRING(CAST (ROUND(COLUMN,2) AS DECIMAL(6,2)), 0, locate('/',CAST (ROUND(COLUMN,2) AS DECIMAL(6,2))))
FROM TABLE

Search for substring, return another substring

I need to search for and display a part of a string field. The string value from record to record may be different. For example:
Record #1
String Value:
IA_UnsafesclchOffense0IA_ReceivedEdServDuringExp0IA_SeriousBodilyInjuryN
Record #2
String Value:
IA_ReasonForRemovalTIA_Beh_Inc_Num1392137419IA_RemovalTypeNIA_UnsafesclchOffense0IA_ReceivedEdServDuringExp0IA_SeriousBodilyInjuryN
Record #3
String Value:
IA_UnsafesclchOffense0IA_RemovalTypeSIA_ReasonForRemovalPIA_ReceivedEdServDuringExp0IA_Beh_Inc_Num1396032888IA_SeriousBodilyInjuryN
In each case, I need to search for IA_Beh_Inc_Num. Assuming it's found, and IF it's followed by numeric data, I want to RETURN the numeric portion of that data. The numeric data, when present, will always be 10 characters.
In other words, record #1 should return no value, record #2 should return 1392137419 and record #3 should return 1396032888
Is there a way to do this within a select statement without having to write a full function with PL/SQL?
This should be easy with a Regular Expression: find a search string and check if it's followed by 10 digits:
REGEXP_SUBSTR(col, '(?<=IA_Beh_Inc_Num)([0-9]{10})')
but Oracle doesn't seem to support regex lookbehind, so it's a bit more complicated:
REGEXP_SUBSTR(value, '(IA_Beh_Inc_Num)([0-9]{10})',1,1,'i',2)
Remarks: the search is case-insensitive (the 'i' flag), the final argument 2 returns only the second group (the digits), and NULL is returned if there are fewer than 10 digits.
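The capture-group approach can be tried out quickly in Python's re module, which behaves the same way for this pattern. This is only a sketch to show what the second group returns; the function name incident_number is made up, and the three strings are the records from the question.

```python
import re
from typing import Optional

# Group 1 captures the 10 digits that directly follow the marker; IGNORECASE mirrors the 'i' flag.
PATTERN = re.compile(r"IA_Beh_Inc_Num([0-9]{10})", re.IGNORECASE)


def incident_number(value: str) -> Optional[str]:
    """Return the 10 digits after IA_Beh_Inc_Num, or None when the marker or the digits are missing."""
    match = PATTERN.search(value)
    return match.group(1) if match else None


records = [
    "IA_UnsafesclchOffense0IA_ReceivedEdServDuringExp0IA_SeriousBodilyInjuryN",
    "IA_ReasonForRemovalTIA_Beh_Inc_Num1392137419IA_RemovalTypeNIA_UnsafesclchOffense0IA_ReceivedEdServDuringExp0IA_SeriousBodilyInjuryN",
    "IA_UnsafesclchOffense0IA_RemovalTypeSIA_ReasonForRemovalPIA_ReceivedEdServDuringExp0IA_Beh_Inc_Num1396032888IA_SeriousBodilyInjuryN",
]
for record in records:
    print(incident_number(record))
```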
This would work:
SELECT
CASE WHEN instr(value, 'IA_Beh_Inc_Num') > 0
THEN substr(substr(value, instr(value, 'IA_Beh_Inc_Num'), 25),15,10)
ELSE 'not found'
END AS result
FROM example
Angelo's answer is correct for Oracle, as the question asked.
For those from SQL Server coming across this, the below would work:
SELECT CASE
WHEN CHARINDEX('IA_Beh_Inc_Num', StringColumn) = 0
THEN NULL
ELSE SUBSTRING(StringColumn, CHARINDEX('IA_Beh_Inc_Num', StringColumn) + LEN('IA_Beh_Inc_Num'), 10)
END AS unix_time
,*
FROM MyTable
EDIT:
Modified query to select all rows. The query prints "NOT A TIMESTAMP" if IA_Beh_Inc_Num does not exist within the string or if it is not followed by 10 numbers.
SELECT
DECODE
(
REGEXP_INSTR (value, 'IA_Beh_Inc_Num[0-9]{10}'),
0,
'NOT A TIMESTAMP',
SUBSTR(value, INSTR(value, 'IA_Beh_Inc_Num')+14, 10)
) timestamp
FROM example;