Athena SPLIT_PART last element - sql

I have non-normalized fields in a NoSQL JSON extract queried from Athena.
I would like to get the last value of my field.
Field examples:
Raleigh, NC, USA
Frankfurt, Germany
Ideally, I would like something like this to select the last element:
SPLIT_PART(city, ',' , last_element ) AS country
I don't know if I am using the right function for this.
Bonus: how can I select a field like value.from without raising an SQL error? :)

You can use regexp_extract():
select regexp_extract(city, '[^ ,]+$')
This returns the trailing run of characters in city that contains no spaces or commas, which is the last element you are looking for.
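To check what this pattern returns without running Athena, here is a minimal Python sketch that applies the same regex with re.search. It assumes Python's re module treats [^ ,]+$ the same way Athena's regexp_extract() does for simple inputs like these; last_element is a hypothetical helper name.

```python
import re
from typing import Optional

# Same pattern as the Athena query: the trailing run of characters
# that contains neither a space nor a comma.
LAST_TOKEN = re.compile(r"[^ ,]+$")


def last_element(city: str) -> Optional[str]:
    """Hypothetical helper mirroring regexp_extract(city, '[^ ,]+$')."""
    match = LAST_TOKEN.search(city)
    # No match (e.g. the value ends with a comma): returned as None in this sketch.
    return match.group(0) if match else None


for city in ["Raleigh, NC, USA", "Frankfurt, Germany", "London, United Kingdom"]:
    print(f"{city!r} -> {last_element(city)!r}")
```

The third city shows the caveat of this pattern: because spaces are excluded, a multi-word last element such as "United Kingdom" comes back as just "Kingdom".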

If instead of split_part() you combine Athena's split() and element_at() functions, you get the desired result (element_at() with index -1 takes the last array element, and trim() removes the space that follows the comma):
TRIM(ELEMENT_AT(SPLIT(city, ','), -1)) AS country
For further reference, see the documentation for the element_at() and split() functions.
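A minimal Python sketch of the same split-then-take-last logic follows, useful for checking expected output; split_on_comma and country are hypothetical names. It also shows why trimming matters: split(city, ',') keeps the space that follows each comma, so the last element of "Raleigh, NC, USA" is " USA" rather than "USA".

```python
from typing import List


def split_on_comma(city: str) -> List[str]:
    # Mirrors split(city, ','): no trimming, so pieces keep their leading space.
    return city.split(",")


def country(city: str, trim: bool = True) -> str:
    # element_at(array, -1) counts from the end, like Python's index -1.
    last = split_on_comma(city)[-1]
    # trim() in the query strips the space that followed the last comma.
    return last.strip() if trim else last


print(repr(country("Raleigh, NC, USA", trim=False)))
print(repr(country("London, United Kingdom")))
```

Unlike the regex approach, this keeps multi-word values such as "United Kingdom" intact, because only commas act as separators.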

Related

How can I automatically extract content from a field in a SQL query?

The environment I am currently working in is Snowflake.
For data-sensitivity reasons, I will be using pseudonyms in the following question.
I have a specific field in one of my tables called FIELD_1. The data in this field is structured as follows:
I am trying to figure out how to automatically extract from my FIELD_1 the output I have in FIELD_2.
Does anyone have any idea what kind of query I would need to achieve this? Any help would be GREATLY appreciated! I am really quite stuck on this problem.
Thank you!
You seem to want everything up to and including the first four-digit number, and then to replace the underscores with spaces. If so:
select replace(regexp_substr(field_1, '^[^0-9]*[0-9]{4}'), '_', ' ')
Alternatively, if you want the first three underscore-separated components (the third being four digits):
select replace(regexp_substr(field_1, '^[^_]+_[^_]+_[0-9]{4}'), '_', ' ')
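The question's real FIELD_1 sample is not shown, so the Python sketch below runs both patterns against hypothetical values to show where they agree and where they differ. It assumes Python's re behaves like Snowflake's REGEXP_SUBSTR for these simple character-class patterns; None stands in for the NULL a non-matching value would produce.

```python
import re
from typing import Optional

# Hypothetical sample value: the question's real FIELD_1 data is not shown.
SAMPLE = "ACME_WIDGETS_2021_Q4_FINAL"


def up_to_first_year(field_1: str) -> Optional[str]:
    # replace(regexp_substr(field_1, '^[^0-9]*[0-9]{4}'), '_', ' ')
    m = re.search(r"^[^0-9]*[0-9]{4}", field_1)
    return m.group(0).replace("_", " ") if m else None


def first_three_components(field_1: str) -> Optional[str]:
    # replace(regexp_substr(field_1, '^[^_]+_[^_]+_[0-9]{4}'), '_', ' ')
    m = re.search(r"^[^_]+_[^_]+_[0-9]{4}", field_1)
    return m.group(0).replace("_", " ") if m else None


print(up_to_first_year(SAMPLE))        # ACME WIDGETS 2021
print(first_three_components(SAMPLE))  # ACME WIDGETS 2021
```

The two patterns diverge when the leading text has more than two underscore-separated parts: for a hypothetical "ACME_BIG_WIDGETS_2021_Q4", the first returns "ACME BIG WIDGETS 2021" while the second finds no match.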
If the real data is as simple as you've described here, you can combine LEFT() with a length computed by LEN(), and then REPLACE(), to get the desired output:
SELECT FIELD_1, REPLACE(LEFT(FIELD_1, LEN(FIELD_1)-10),'_',' ') AS FIELD_2
FROM table_name
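The LEN(FIELD_1)-10 trick only works if every value ends with exactly 10 characters to discard. The Python sketch below mirrors it on a hypothetical value built to fit that assumption; drop_suffix is a hypothetical name, and clamping short strings to an empty result is this sketch's own choice, not a claim about Snowflake's LEFT().

```python
# Hypothetical sample: a 15-character prefix plus a 10-character suffix.
SAMPLE = "ALPHA_BETA_2021_123456789"


def drop_suffix(field_1: str, suffix_len: int = 10) -> str:
    # LEFT(FIELD_1, LEN(FIELD_1) - 10): keep all but the last 10 characters.
    # Clamping at 0 for short strings is this sketch's own choice.
    keep = max(len(field_1) - suffix_len, 0)
    # REPLACE(..., '_', ' ')
    return field_1[:keep].replace("_", " ")


print(drop_suffix(SAMPLE))  # ALPHA BETA 2021
```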
See also:
SELECT - Snowflake Documentation
LEFT - Snowflake Documentation
REPLACE - Snowflake Documentation
LENGTH, LEN - Snowflake Documentation

Extract characters in string following keyword and ending right before the other keyword

I have a table that looks like:
id
re|cid|13324242|
wa|cid|13435464|
fs|cid|2343532|
I want to extract the value that appears right after "|cid|" and before the next "|". That is:
13324242
13435464
2343532
I thought of substr(), but I don't know how to specify the start and end positions.
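You could use REGEXP_REPLACE here (BigQuery Standard SQL; the r'...' raw-string prefix keeps the regex backslashes from being read as string escapes):
SELECT
id,
CASE WHEN id LIKE '%|cid|%'
THEN REGEXP_REPLACE(id, r'^.*\|cid\|(\d+)\|.*$', r'\1') END AS cid
FROM yourTable;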
The idea is to use a regex replacement to extract the cid value from the id column when it is present; if it is not, the CASE expression returns NULL.
Here is a demo showing that the regex logic is correct.
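As an offline check of the same pattern (separate from the linked demo), here is a Python sketch; it assumes Python's re handles \| and \d the same way as the SQL engine for this simple pattern. extract_cid is a hypothetical name.

```python
import re
from typing import Optional

CID_PATTERN = re.compile(r"^.*\|cid\|(\d+)\|.*$")


def extract_cid(id_value: str) -> Optional[str]:
    # CASE WHEN id LIKE '%|cid|%' ... END: without '|cid|' the result is NULL (None here).
    if "|cid|" not in id_value:
        return None
    # REGEXP_REPLACE(id, pattern, '\1'): replace the whole string with capture group 1.
    return CID_PATTERN.sub(r"\1", id_value)


for row in ["re|cid|13324242|", "wa|cid|13435464|", "fs|cid|2343532|", "no-cid-here"]:
    print(row, "->", extract_cid(row))
```

One caveat of the replace approach: if "|cid|" is present but not followed by digits and a "|", nothing is replaced and the whole original value comes back.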
If you want the third element (which appears to be the intention given the sample data), I would recommend split():
select split(id, '|')[ordinal(3)] as cid from yourTable
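In Python terms, the split() approach looks like the sketch below (third_element is a hypothetical name). ORDINAL is 1-based, so [ordinal(3)] corresponds to Python index 2. The trailing "|" produces an empty last piece, which does not affect the third element.

```python
from typing import List


def pieces(id_value: str) -> List[str]:
    # split(id, '|')
    return id_value.split("|")


def third_element(id_value: str) -> str:
    # [ordinal(3)] is 1-based -> Python index 2. A too-short array raises an
    # error in BigQuery; here it raises IndexError.
    return pieces(id_value)[2]


print(pieces("re|cid|13324242|"))
print(third_element("re|cid|13324242|"))
```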

Select query that displays Joined words separately, not using a function

I need a SELECT query that inserts a space into the data before each capital letter; i.e., using this query, 'HelpMe' would be displayed as 'Help Me'. Note that I cannot use a stored function to do this; it must be done in the query itself. The data is of variable length and the query must be in SQL. Any help will be appreciated.
Thanks
You need to use a user-defined function for this until Microsoft adds regular-expression support to SQL Server. The solution would be something like:
SELECT col1, dbo.RegExReplace(col1, '([A-Z])',' \1') FROM Table
This produces a leading space, which you can remove with LTRIM (or TRIM on SQL Server 2017 and later).
Regular-expression replace function:
http://connect.microsoft.com/SQLServer/feedback/details/378520
You can read about dbo.RegexReplace at:
TSQL Replace all non a-z/A-Z characters with an empty string
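The UDF itself is not shown here. As a stand-in, Python's re.sub applies the same pattern and replacement, which is enough to see the leading-space effect and why it is trimmed afterwards. This assumes the CLR function behaves like a standard case-sensitive regex replace; split_camel_case is a hypothetical name.

```python
import re


def split_camel_case(text: str) -> str:
    # Same as RegExReplace(col1, '([A-Z])', ' \1'): a space before every capital.
    spaced = re.sub(r"([A-Z])", r" \1", text)
    # The first capital also gets a space, so strip it (the LTRIM step).
    return spaced.lstrip()


print(split_camel_case("HelpMe"))  # Help Me
```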
Assuming you are using Oracle, you can use REGEXP_REPLACE:
SELECT REGEXP_REPLACE('ILikeToWatchCSIMiami',
'([A-Z.])', ' \1')
AS RX_REPLACE
FROM dual
;
This produces " I Like To Watch C S I Miami" (see the SQLFiddle demo), including a leading space.
As you can see, though, it does not handle acronyms such as CSI well.
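One way around the acronym problem, sketched here in Python only, is a different technique: lookaround assertions that insert a space between a lowercase and an uppercase letter, and inside a run of capitals only before the capital that starts a new word. This is not what the Oracle answer does, and the pattern is an illustration rather than a drop-in replacement for the SQL query.

```python
import re

# Naive rule from the answer: a space before every capital letter.
NAIVE = re.compile(r"([A-Z])")
# Lookaround variant: split lower->Upper, and split an uppercase run
# only before the capital that is followed by a lowercase letter.
WORD_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def naive_split(text: str) -> str:
    return NAIVE.sub(r" \1", text).strip()


def acronym_aware_split(text: str) -> str:
    return WORD_BOUNDARY.sub(" ", text)


print(naive_split("ILikeToWatchCSIMiami"))          # I Like To Watch C S I Miami
print(acronym_aware_split("ILikeToWatchCSIMiami"))  # I Like To Watch CSI Miami
```

The trade-off is that a lone capital followed by a capitalized word (the "I" in "ILike") is still split correctly, but an acronym directly followed by a single-letter word cannot be disambiguated by this rule.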

How to extract group from regular expression in Oracle?

I have this query and want to extract the value between the brackets.
select de_desc, regexp_substr(de_desc, '\[(.+)\]', 1)
from DATABASE
where col_name like '[%]';
However, it gives me the value with the brackets, such as "[TEST]". I just want "TEST". How do I modify the query to get it?
The third parameter of the REGEXP_SUBSTR function indicates the position in the target string (de_desc in your example) where you want to start searching. Assuming a match is found in the given portion of the string, it doesn't affect what is returned.
In Oracle 11g, the function gained a sixth parameter, which I think is what you are trying to use: it indicates the capture group you want returned. An example of proper use would be:
SELECT regexp_substr('abc[def]ghi', '\[(.+)\]', 1,1,NULL,1) from dual;
where the last parameter, 1, indicates the number of the capture group you want returned. See the Oracle REGEXP_SUBSTR documentation for a description of this parameter.
10g does not appear to have this option, but in your case you can achieve the same result with:
select substr( match, 2, length(match)-2 ) from (
SELECT regexp_substr('abc[def]ghi', '\[(.+)\]') match FROM dual
);
since you know that a match will have exactly one excess character at the beginning and end. (Alternatively, you could use RTRIM and LTRIM to remove brackets from both ends of the result.)
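The difference between the whole match and capture group 1, and the 10g substr() workaround, can be sketched in Python. This assumes Python's re treats \[(.+)\] the same way as Oracle for this input; Oracle's substr(match, 2, length(match)-2) is 1-based, which becomes the slice [1:len-1].

```python
import re

PATTERN = re.compile(r"\[(.+)\]")
text = "abc[def]ghi"

match = PATTERN.search(text)
whole = match.group(0)   # like regexp_substr(..., '\[(.+)\]'): brackets included
inner = match.group(1)   # like the 11g sixth argument = 1: capture group only
# 10g workaround: substr(match, 2, length(match) - 2) -> drop first and last char
by_substr = whole[1:len(whole) - 1]
# LTRIM/RTRIM alternative: strip bracket characters from each end
by_trim = whole.lstrip("[").rstrip("]")

print(whole, inner, by_substr, by_trim)
```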
You need to do a replace and use a regex pattern that matches the whole string.
select regexp_replace(de_desc, '.*\[(.+)\].*', '\1') from DATABASE;
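A Python sketch of the replace approach follows (bracket_contents is a hypothetical name). Because the pattern spans the whole string, replacing it with group 1 leaves only the bracket contents. When there are no brackets nothing matches and the original text is returned unchanged, which Oracle's REGEXP_REPLACE also does, so keeping the LIKE filter from the question is still useful.

```python
import re

WHOLE = re.compile(r".*\[(.+)\].*")


def bracket_contents(de_desc: str) -> str:
    # regexp_replace(de_desc, '.*\[(.+)\].*', '\1')
    return WHOLE.sub(r"\1", de_desc)


print(bracket_contents("abc[def]ghi"))  # def
print(bracket_contents("no brackets"))  # no brackets
```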

Can anyone help me write a sql query

jkdfhdjfhjh&name=ijkjkjkjkjkjk&id=kdjkjkjkjkjkjjjd&class=kdfjjfjdhfjhf
The string above contains several substrings that start with & and end with =.
For example, it contains &name=, and that is the only part of it I need.
Similarly, I need &id= and &class=.
I need the output in a single column.
Final Extract
----------------------
&id=, &class=, &name=
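Can anyone help me write a query for this?
You could try this (the E'...' escape-string syntax makes each doubled backslash mean a single backslash regardless of the standard_conforming_strings setting):
select regexp_replace('jkdfhdjfhjh&name=ijkjkjkjkjkjk&id=kdjkjkjkjkjkjjjd&class=kdfjjfjdhfjhf', E'\\w*?(&.*?=)\\w+((?=&)|$)', E'\\1, ', 'g');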
result:
regexp_replace
-------------------------
&name=, &id=, &class=,
Then it's up to you to remove the trailing comma.
The regexp_replace function is available in PostgreSQL 8.1 and later.
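To sanity-check the pattern outside PostgreSQL, here is a Python sketch using the same regex with a global re.sub, followed by removal of the trailing ", ". Python's re and PostgreSQL's regex engine can differ on mixed greedy/lazy quantifiers in general; for this input both yield the same keys.

```python
import re

SOURCE = "jkdfhdjfhjh&name=ijkjkjkjkjkjk&id=kdjkjkjkjkjkjjjd&class=kdfjjfjdhfjhf"
# Same pattern as the PostgreSQL answer, written as a Python raw string.
PATTERN = re.compile(r"\w*?(&.*?=)\w+((?=&)|$)")

# re.sub replaces every match, like the 'g' flag.
replaced = PATTERN.sub(r"\1, ", SOURCE)
# Drop the trailing ", " that the answer leaves for you to remove.
keys = replaced.rstrip(", ")
print(keys)  # &name=, &id=, &class=
```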
If you want the values along with each variable, I would implement this by splitting on "&" into an array and then taking a slice of the desired elements:
SELECT (string_to_array('jkdfhdjfhjh&name=ijkjkjkjkjkjk&id=kdjkjkjkjkjkjjjd&class=kdfjjfjdhfjhf','&'))[2:4];
Output in PostgreSQL 8.4 (array type):
{name=ijkjkjkjkjkjk,id=kdjkjkjkjkjkjjjd,class=kdfjjfjdhfjhf}
The example string is very long, so here is the general form, which shows the array slicing more clearly:
SELECT (string_to_array(input_field, '&'))[2:4];
NOTE: You must wrap the string_to_array() call in parentheses for the array slicing to work; otherwise you'll get a syntax error.
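The split-and-slice idea translates to Python as shown below. PostgreSQL array slices are 1-based and inclusive, so [2:4] corresponds to Python's [1:4]. The dictionary step at the end is an optional extra, not part of the answer.

```python
SOURCE = "jkdfhdjfhjh&name=ijkjkjkjkjkjk&id=kdjkjkjkjkjkjjjd&class=kdfjjfjdhfjhf"

parts = SOURCE.split("&")  # string_to_array(SOURCE, '&')
# PostgreSQL [2:4] (1-based, inclusive) -> Python [1:4]
pairs = parts[1:4]
# Optional extra step (not in the answer): split each pair into key and value.
as_dict = dict(p.split("=", 1) for p in pairs)

print(pairs)
print(as_dict)
```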