How to reverse words in a string in Hive?

Is it possible to reverse the order of words in a string, which are separated by a single space, in Hive SQL?
I saw that there are functions like reverse, but it only works on a whole string. (Of course, we could use it to reverse the words by 1) reversing the whole string and 2) reversing each word. That is easy to write in Java or another language, but I am not sure how to do it in SQL.)
Any ideas?
For example,
// assuming I have one column field in a table like
WITH example_data AS (
SELECT "I am learning Hive SQL" as my_string
)
I want to return "SQL Hive learning am I".
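The two-step idea from the question (reverse the whole string, then reverse each word) can be sketched outside SQL to confirm that it produces the desired result. This is only a Python model of the logic, not a Hive query; the function name `reverse_words` is made up for illustration.

```python
def reverse_words(text):
    """Reverse the order of space-separated words using two reversals."""
    # Step 1: reverse the whole string, which reverses both the word
    # order and the characters inside each word.
    reversed_all = text[::-1]
    # Step 2: reverse each word again so its characters read normally.
    return " ".join(word[::-1] for word in reversed_all.split(" "))


print(reverse_words("I am learning Hive SQL"))  # SQL Hive learning am I
```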

Related

Does array like function exist in SQL

Using SQL, I would like to know if it's possible to do the following:
If I have a variable into which the user inputs multiple strings separated by commas, for example ('aa','bbb','c','dfd'), is it possible to use LIKE with a wildcard at the end of each string, instead of having the user enter each variation in multiple macros?
So, say the user was looking for employee numbers that start with ('F','E','C'): is it possible without using OR statements? That is the question I guess I am asking.
It would be similar to an array, I guess.
No, LIKE is its own operator, so each pattern needs its own LIKE condition, joined with OR.
You might prefer ILIKE to LIKE, as it is a case-insensitive comparison.
You can also try REGEXP_LIKE, which is similar to what you want, except that you'll have to use a regular expression instead of something like 'FEC%'.
That depends on your SQL dialect; I don't know Impala at all, but other SQL engines have support for regular expressions in string matches, so that you can build a query string like
SELECT fld FROM tbl WHERE fld REGEXP '^[FEC].*$';
No matter what you do, you will need to build a query from your user's input. Passing user input unprocessed into your SQL processor is a big "nope" anyway, from a "don't accidentally delete a table" point of view.
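As a rough sketch of building such a pattern from user input, the Python below escapes each prefix and joins them into one anchored alternation. The helper name `build_prefix_regex` is hypothetical, and Python's regex flavor is assumed to be close enough to the SQL engine's for simple prefixes like these; the resulting pattern should still be passed to the database as a bound parameter rather than pasted into the query text.

```python
import re


def build_prefix_regex(prefixes):
    """Build one anchored regex that matches any of the given prefixes."""
    cleaned = [p for p in prefixes if p]
    if not cleaned:
        raise ValueError("at least one non-empty prefix is required")
    # Escape every prefix so characters such as '.' or '%' are literal.
    alternatives = "|".join(re.escape(p) for p in cleaned)
    return "^(?:" + alternatives + ")"


pattern = build_prefix_regex(["F", "E", "C"])
print(pattern)  # ^(?:F|E|C)
```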

Extracting file extensions from file excluding query parameters SQL

Is there a way to obtain only the file extension excluding query parameters using split_part and reverse from an SQL query?
ie.
www.example.com?hffhqowhf
or
test.jpg?34rfqeyfhhf
Returns:
com
jpg
Not tied down to com or jpg but in general?
Many Thanks
There are a number of ways of achieving this (for example using a combination of INSTR and SUBSTR functions) but the cleanest way is probably to use a regular expression, something like this:
(Assumption: the string always has a query parameter starting with a '?', and that is the only occurrence of this character in the string.
Caveat: I don't currently have access to Impala, so you may need to adjust the regular expression to get it to work precisely as you require.)
Reverse the string (REVERSE function) so that the substring you want sits between the '?' and the next '.'. If you don't reverse the string, it is harder to identify which '.' in the string you are dealing with.
Extract the substring between the '?' and the '.', excluding these two characters, e.g.
select regexp_extract(reverse('www.example.com?hffhqowhf'),'[?]([^.]+)',1);
(The '?' is written as [?] because a bare '?' is a regex quantifier rather than a literal character.)
Reverse the output again to get the required result.
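The reverse, extract, reverse sequence can be checked with a small Python model. This is not Impala code: `re.search` stands in for regexp_extract, slicing stands in for REVERSE, and the function name `file_extension` is invented for the sketch. It relies on the same assumption as above, namely that the string contains exactly one '?'.

```python
import re


def file_extension(value):
    """Return the text between the last '.' before '?' and the '?' itself."""
    reversed_value = value[::-1]                    # step 1: reverse
    match = re.search(r"[?]([^.]+)", reversed_value)  # step 2: extract
    if match is None:
        return None
    return match.group(1)[::-1]                     # step 3: reverse back


print(file_extension("www.example.com?hffhqowhf"))  # com
print(file_extension("test.jpg?34rfqeyfhhf"))       # jpg
```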

How to use Regular Expressions to replace part of a string in SQLite?

I would like some advice on how to find and replace part of a string using regular expressions in SQLite. I am using RStudio/R as the SQLite connector.
I have the following strings:
my_strings
--------------
1244599arts
3490872testing
4478933great
2342340obvious
gremlin2342678
I would like to replace the numbers with the word "final". I want to use regular expressions for this so that I capture only the numbers and replace them with the word "final" without affecting any other part of the string.
The output I would like to achieve is the following:
my_strings
--------------
finalarts
finaltesting
finalgreat
finalobvious
gremlinfinal
As you can see the numbers have now been replaced by the word "final" - please note that I have around 8 million rows so I cannot just repeat a REPLACE function as there are simply too many numbers!
I have written some regex to capture those numbers and the following statement will match those numbers:
[0-9]{7}
Now I would like to use this regex statement to amend these strings - the reason is that I would like to learn how to use regex in sqlite to find and replace matching parts of a string.
Has anyone got any advice?
For reference, I can use the REGEXP function, as I have already set up an SQLite instance in R.
You can use the sqlean-regexp extension, which provides regular expressions search and replace functions:
-- replace 7 digits with the word 'final'
update t set my_strings = regexp_replace(my_strings, '[0-9]{7}', 'final');
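To try the statement without installing the extension, one option is to register a stand-in function from the host language. The sketch below uses Python's sqlite3 module and `re.sub` as a substitute for the extension's regexp_replace, so it only approximates the extension's behavior; the table `t` and the function `demo_regexp_replace` are set up purely for illustration.

```python
import re
import sqlite3


def demo_regexp_replace():
    """Run the UPDATE from above against an in-memory table."""
    conn = sqlite3.connect(":memory:")
    # Stand-in for the extension function: (source, pattern, replacement).
    conn.create_function(
        "regexp_replace", 3, lambda s, p, r: re.sub(p, r, s)
    )
    conn.execute("CREATE TABLE t (my_strings TEXT)")
    samples = ["1244599arts", "3490872testing", "4478933great",
               "2342340obvious", "gremlin2342678"]
    conn.executemany("INSERT INTO t VALUES (?)", [(s,) for s in samples])
    conn.execute(
        "UPDATE t SET my_strings = "
        "regexp_replace(my_strings, '[0-9]{7}', 'final')"
    )
    rows = [row[0] for row in
            conn.execute("SELECT my_strings FROM t ORDER BY rowid")]
    conn.close()
    return rows


print(demo_regexp_replace())
```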

Regex not working in LIKE condition

I'm currently using Oracle SQL developer and am trying to write a query that will allow me to search for all fields that resemble a certain value but always differ from it.
SELECT last_name FROM employees WHERE last_name LIKE 'Do[^e]%';
So the result that I'm after would be: Give me all last names that start with 'Do' but are not 'Doe'.
I got the square brackets method from a general SQL basics book so I assume any SQL database should be able to run it.
This is my first post and I'd be happy to clarify if my question wasn't clear enough.
In Oracle's LIKE no regular expressions can be used. But you can use REGEXP_LIKE.
SELECT * FROM employees WHERE REGEXP_LIKE (last_name, '^Do[^e]');
The ^ at the beginning of the pattern anchors it to the beginning of the compared string. In other words the string must start with the pattern to match. And there is no wildcard needed at the end, as there is no anchor for the end of the string (which would be $). And you seem to already know the meaning of [^e].
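To see which names this pattern accepts, the same expression can be run through Python's `re` module, which treats the anchor and the negated character class the same way for this simple case. The sample names and the helper `matches_pattern` are made up for the sketch.

```python
import re

PATTERN = r"^Do[^e]"


def matches_pattern(last_name):
    """True when the name starts with 'Do' followed by any character but 'e'."""
    return re.search(PATTERN, last_name) is not None


for name in ["Doe", "Dole", "Donovan", "Doering", "Do", "McDonald"]:
    print(name, matches_pattern(name))
```

Note that the pattern rejects every name whose third letter is 'e' (so 'Doering' is excluded along with 'Doe'), and it also rejects the two-letter name 'Do' because a third character is required.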

How to select values around .(dot) using sql

I am running below query in Teradata :
sel requesttext from dbc.tables
where tablename='old_employee_table'
Result:
alter table DB_NAME.employee_table,no fallback ;
I want to get below result using SQL:
DB_NAME.employee_table
Requesttext can be:
create set table DB_NAME.employee_table;
The DB name and table name can occur anywhere in the result. Since a .(dot) joins them, I want to split on the .(dot).
Basically, I need SQL that returns the values on either side of the .(dot).
I want DBName and Tablename in result.
I'm not a Teradata person, but this should work for both strings given so far, as long as Teradata's regexp_substr() supports positive look-behind and positive look-ahead assertions (I might have the Teradata syntax wrong, so a little tweaking may be needed):
SELECT REGEXP_SUBSTR(requesttext, '(?<= )(\w+\.\w+)(?=[,$]?)', 1, 1)
FROM dbc.tables
WHERE tablename='old_employee_table'
See the regex101 example. Hopefully it translates to Teradata easily.
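The regex returns the words on either side of the period, including the period itself, when the match is preceded by a space. Because the trailing look-ahead is optional, it places no real constraint on what follows, so the match simply ends at the first non-word character (a comma or a semicolon in the examples).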
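The pattern can be checked against both sample strings in Python, whose `re` module supports the same look-behind and look-ahead syntax. This says nothing about whether Teradata accepts the expression unchanged; the helper name `qualified_name` is hypothetical.

```python
import re

PATTERN = r"(?<= )(\w+\.\w+)(?=[,$]?)"


def qualified_name(request_text):
    """Return the first space-preceded 'word.word' token, or None."""
    match = re.search(PATTERN, request_text)
    return match.group(1) if match else None


print(qualified_name("alter table DB_NAME.employee_table,no fallback ;"))
print(qualified_name("create set table DB_NAME.employee_table;"))
```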
You could do this with either regexp_substr() or strtok().
As Jamie Zawinski said:
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.
So I would go with the strtok() method. Also I'm lazy and regular expressions are hard.
Function strtok() takes three arguments:
The string being split
The delimiter to split the string
The number of the token to grab.
To get at the <database>.<table> from that string that is returned in your query, we can split by a space, grab the third token, then split that by a comma and grab the first token.
That would look like:
SELECT strtok(strtok(requestText,' ',3),',',1)
FROM dbc.tables
WHERE tablename='old_employee_table'
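The nested call can be modelled in Python to see what each step returns. The `strtok` function below is a simplified stand-in written for this sketch, assuming that every character in the delimiter string is a separator, that empty tokens are skipped, and that token numbers start at 1.

```python
def strtok(text, delimiters, n):
    """Return the n-th (1-based) non-empty token of text, or None."""
    tokens = []
    current = ""
    for ch in text:
        if ch in delimiters:
            if current:          # close the token in progress
                tokens.append(current)
                current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens[n - 1] if 1 <= n <= len(tokens) else None


request_text = "alter table DB_NAME.employee_table,no fallback ;"
third = strtok(request_text, " ", 3)
print(third)                   # DB_NAME.employee_table,no
print(strtok(third, ",", 1))   # DB_NAME.employee_table
```

The token position is tied to the shape of the statement: for "create set table DB_NAME.employee_table;" the third space-separated token is "table", so the token number would need adjusting for that form.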