Get the filename from filepath column in Hive

I have a table containing a FILE_PATH column, like below:
FILE_PATH
\root\2010\2010-01\1234.zip
\root\2010\2010-02\2345.zip
\root\2010\2010-03\3456.zip
How can I extract the filename from the FILE_PATH column using a SELECT query?

I used the query below to get the output:
select file_path,substr(file_path,-1*(locate('\\',reverse(file_path),1)-1)) from TABLE_NAME limit 2

You can use regexp_extract to get everything after the last backslash:
select regexp_extract(file_path, '\\\\([^\\\\]*)$',1) from TABLE_NAME
Four backslashes are used to represent a single literal backslash because the backslash is an escape character in both the Hive string literal and the regular expression.
Using split and size:
select split(file_path, '\\\\')[size(split(file_path, '\\\\'))-1] from TABLE_NAME
Using split and reverse:
select reverse(split(reverse(file_path), '\\\\')[0]) from TABLE_NAME
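The three approaches above can be sketched outside Hive to check that they agree. This is a minimal Python illustration, not Hive code: the function names are hypothetical, and Python's re module stands in for Hive's regex engine. Raw strings remove the string-literal escaping layer, so the pattern shown is the one the regex engine sees after Hive unescapes '\\\\'.

```python
import re

# Sample paths copied from the question.
paths = [
    r"\root\2010\2010-01\1234.zip",
    r"\root\2010\2010-02\2345.zip",
    r"\root\2010\2010-03\3456.zip",
]

def by_regex(path: str) -> str:
    # Capture everything after the last backslash.
    match = re.search(r"\\([^\\]*)$", path)
    # Returning "" on no match is an assumption of this sketch.
    return match.group(1) if match else ""

def by_split(path: str) -> str:
    # Mirrors split(...)[size(...) - 1]: take the last piece.
    return path.split("\\")[-1]

def by_reverse(path: str) -> str:
    # Mirrors reverse(split(reverse(path), ...)[0]).
    return path[::-1].split("\\")[0][::-1]

filenames = [by_regex(p) for p in paths]
print(filenames)
```

All three helpers should return the same filename for each sample path.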

Related

Select a string within a string within ATHENA

I have a table in AWS Athena that I need to clean up for production, but I am having difficulty extracting only a specific portion of a string.
EXAMPLE:
Column A
{"display_value":"TECH_FinOps_SERVICE","link":" https://sdfs.saff-now.com/api/now/v2/table/sys_user_group/8fc10b99dbeedf12321317e15b9619b2"}
Basically, I would like to extract just TECH_FinOps_SERVICE from the string in Column A.

Your string looks like JSON, so you can try using the JSON functions:
-- sample data
WITH dataset(column_a) AS (
values ('{"display_value":"TECH_FinOps_SERVICE","link":" https://sdfs.saff-now.com/api/now/v2/table/sys_user_group/8fc10b99dbeedf12321317e15b9619b2"}')
)
-- query
select json_extract_scalar(column_a, '$.display_value') display_value
from dataset;
Output:
display_value
---------------------
TECH_FinOps_SERVICE
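The same idea, parsing the value as JSON instead of slicing the string by position, can be sketched in Python. This is only an illustration of the approach: extract_scalar is a hypothetical helper, and returning None for invalid JSON or non-scalar values is a choice made for this sketch, not a statement about how Athena behaves.

```python
import json

# Sample value copied from the question.
column_a = (
    '{"display_value":"TECH_FinOps_SERVICE","link":" https://sdfs.saff-now.com'
    '/api/now/v2/table/sys_user_group/8fc10b99dbeedf12321317e15b9619b2"}'
)

def extract_scalar(raw: str, key: str):
    """Return the scalar stored under a top-level key, or None."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError:
        return None  # sketch assumption: invalid JSON yields None
    value = doc.get(key) if isinstance(doc, dict) else None
    # Only hand back scalars, not nested objects or arrays.
    return value if isinstance(value, (str, int, float, bool)) else None

display_value = extract_scalar(column_a, "display_value")
print(display_value)
```

Parsing keeps the extraction independent of key order and of the length of the link value.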

I need a query to exclude a particular alphanumeric pattern in Oracle

I have a column with the value
String = 'Select Id,name,model_1,model_2,model_30 from employee'
I need the output to exclude ',model%',
i.e. the output should be
'Select Id,name from employee'
I used regexp_replace(string,[',model'+\d]), but this returns the numbers as well.

Assuming that your string is exactly in the form you showed, you have only some syntax issues; you need:
regexp_replace(yourString, ',model_\d+', '')

You need to replace ,model_[numbers]. You can use regexp_replace as follows:
regexp_replace(your_string, '(,model_[0-9]+)','')
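To see what these patterns remove, here is a small Python check that uses re.sub as a stand-in for Oracle's regexp_replace. The variable names are illustrative only, and the two regex engines are not identical, so treat it as a sketch of the pattern logic rather than of Oracle itself.

```python
import re

# Sample value copied from the question.
query_text = "Select Id,name,model_1,model_2,model_30 from employee"

# Remove a comma, the literal text "model_", and one or more digits.
cleaned_backslash_d = re.sub(r",model_\d+", "", query_text)
cleaned_bracket = re.sub(r"(,model_[0-9]+)", "", query_text)

print(cleaned_backslash_d)
```

Both patterns consume the whole ",model_N" token, digits included, which is why no stray numbers are left behind.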

Single hive query to remove certain text in data

I have column data like this, in two formats:
1) "/abc/testapp/v1?FirstName=username&Lastname=test123"
2) "/abc/testapp/v1?FirstName=username"
I want to retrieve the output as "/abc/testapp/v1?FirstName=username" and strip out the data starting with "&Lastname" and ending with "". The idea is to remove Lastname together with its value.
If the data doesn't contain "&Lastname", the query should still work, as in the second scenario.
The value for Lastname shown in the example is "test123", but in general it will be dynamic.
I started with regexp_replace, but I am only able to replace "&Lastname", not its value.
select regexp_replace("/abc/testapp/v1?FirstName=username&Lastname=test123&type=en_US","&Lastname","");
Can someone please help me achieve both of these with a single Hive query?

Use the split function:
with your_data as (--Use your table instead of this example
select stack (2,
"/abc/testapp/v1?FirstName=username&Lastname=test123",
"/abc/testapp/v1?FirstName=username"
) as str
)
select split(str,'&')[0] from your_data;
Result:
_c0
/abc/testapp/v1?FirstName=username
/abc/testapp/v1?FirstName=username
Or use '&Lastname' pattern for split:
select split(str,'&Lastname')[0] from your_data;
This version keeps any other & parameters that come before &Lastname and cuts the string only where &Lastname starts.
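Note that both split variants drop everything after the split point, so a trailing parameter such as the &type=en_US in the question's own attempt would be lost. A different technique, shown here only as a hedged Python sketch, removes just the Lastname pair with the pattern &Lastname=[^&]*. The helper name drop_lastname is hypothetical. The same pattern could presumably be passed to Hive's regexp_replace, but that is untested here.

```python
import re

def drop_lastname(url: str) -> str:
    # Remove "&Lastname=" plus its value, up to the next "&" or end of string.
    return re.sub(r"&Lastname=[^&]*", "", url)

with_lastname = "/abc/testapp/v1?FirstName=username&Lastname=test123"
without_lastname = "/abc/testapp/v1?FirstName=username"
with_trailing = "/abc/testapp/v1?FirstName=username&Lastname=test123&type=en_US"

for url in (with_lastname, without_lastname, with_trailing):
    print(drop_lastname(url))

# For comparison: splitting at "&Lastname" also discards "&type=en_US".
split_result = with_trailing.split("&Lastname")[0]
```

Strings without &Lastname pass through unchanged, so one expression covers both formats.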

Using split works for both strings, with or without the last name. In Hive you don't need a table to select from; you can execute the function directly, like select functionname(...):
select
split("/abc/testapp/v1?FirstName=username&Lastname=test123",'&')[0];
select
split("/abc/testapp/v1?FirstName=username",'&')[0];
Result:
_c0
/abc/testapp/v1?FirstName=username
You can also combine them into a single query:
select
split("/abc/testapp/v1?FirstName=username&Lastname=test123",'&')[0],
split("/abc/testapp/v1?FirstName=username",'&')[0];
Result:
_c0 _c1
/abc/testapp/v1?FirstName=username /abc/testapp/v1?FirstName=username

Trying to create a regular expression [ORACLE]

Hello,
I need help creating a regular expression that extracts just the file name and extension from the following paths.
/home/user/work/file1.dbf
/opt/user/file2.dfb
I am trying to create an expression in Oracle 12c that outputs only "file1.dbf" and "file2.dbf".
I am currently testing the regular expression on the following page and reading the following documentation.
Thanks in advance; I hope I have explained this clearly.

You don't need a regular expression to do this. A combination of substr and instr would be sufficient.
instr(colname,'/',-1) returns the position of the last occurrence of / in the string, and the substring after that position is the filename, as per the data shown.
The filter instr(colname,'/') > 0 excludes the rows that don't have a / in them.
select substr(colname,instr(colname,'/',-1)+1) as filename
from tablename
where instr(colname,'/') > 0
A regular expression for the same would be
select regexp_substr(colname,'(.*/|^)(.+)$',1,1,null,2) as filename
from tablename
(.*/|^) - All the characters up to the last / occurrence in the string, or the start of the string if there are no / characters.
(.+)$ - All the characters after the last /, if it exists in the string, or the full string if / doesn't exist.
They are extracted as two groups, and we are interested in the second group, hence the argument 2 at the end of regexp_substr.
Read about the arguments to REGEXP_SUBSTR here.
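Both approaches in this answer can be mirrored in Python to check the logic on the sample paths. This is an illustrative sketch with hypothetical function names: str.rfind plays the role of instr(colname,'/',-1), and Python's re module stands in for Oracle's regex engine. Unlike the SQL with its where filter, the positional helper returns the whole string when there is no slash.

```python
import re

def filename_by_position(path: str) -> str:
    # rfind returns -1 when there is no "/", so the slice starts at 0.
    last_slash = path.rfind("/")
    return path[last_slash + 1:]

def filename_by_regex(path: str):
    # Group 1: everything up to the last "/", or the start of the string.
    # Group 2: the remaining characters, which is the filename.
    match = re.search(r"(.*/|^)(.+)$", path)
    return match.group(2) if match else None

samples = ["/home/user/work/file1.dbf", "/opt/user/file2.dfb"]
results = [(filename_by_position(p), filename_by_regex(p)) for p in samples]
print(results)
```

The two helpers should agree on every sample, including an input with no slash at all.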

An alternative regex approach would be to strip everything up to the last /:
with demo as
( select '/home/user/work/file1.dbf' as path from dual union all
select '/opt/user/file2.dfb' from dual )
select path
, regexp_replace(path,'^..*/') as filename
from demo;
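One detail of the '^..*/' pattern is that it needs at least one character before the final slash it consumes. The Python sketch below (re.sub standing in for Oracle's regexp_replace, which may differ in edge cases) shows this. The path /file3.dbf is a made-up example, not data from the question.

```python
import re

def strip_dirs(path: str) -> str:
    # "^..*/" = one character, then anything, then the last "/".
    return re.sub(r"^..*/", "", path)

def strip_dirs_relaxed(path: str) -> str:
    # Variation for comparison: "^.*/" also matches a lone leading "/".
    return re.sub(r"^.*/", "", path)

print(strip_dirs("/home/user/work/file1.dbf"))
print(strip_dirs("/opt/user/file2.dfb"))
# Hypothetical file directly under the root directory:
print(strip_dirs("/file3.dbf"), strip_dirs_relaxed("/file3.dbf"))
```

For the two paths in the question both patterns behave the same; they differ only for a file that sits directly under the root.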

Oracle db query - data format issue

I'm wondering if there is a way to query the Oracle DB against a formatted field value.
Example:
I have a table of postcodes stored in the format of "part1 part2". I want to be able to find a postcode either by searching it using the above format or "part1part2" format.
What I was thinking is to format the entered postcode by removing the spaces and then query the database like:
SELECT *
FROM POSTCODES_TBL t
WHERE t.postcode.**Format(remove spaces)** = 'part1part2'
Format(remove spaces) would convert the postcode from "part1 part2" to "part1part2".
My question is, is it possible?

You can use regexp_replace:
SELECT *
FROM POSTCODES_TBL t
WHERE regexp_replace(t.postcode,'\s', '') = 'part1part2'
This will remove any whitespace (spaces, tabs, newlines, etc.).
Or, if you only want to get rid of spaces, replace will work just as well:
SELECT *
FROM POSTCODES_TBL t
WHERE replace(t.postcode,' ', '') = 'part1part2'
More details in the manual:
replace
regexp_replace

You could use LIKE:
SELECT *
FROM POSTCODES_TBL t
WHERE t.postcode like 'part1%part2'
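The difference between normalizing the stored value and using LIKE can be sketched in Python. The postcodes and helper names below are made up for illustration. The LIKE emulation only covers a single % wildcard between two fixed parts, which is all this example needs.

```python
import re

def strip_whitespace(value: str) -> str:
    # Same idea as regexp_replace(t.postcode, '\s', '').
    return re.sub(r"\s", "", value)

def matches_normalized(stored: str, search: str) -> bool:
    # Compare both sides with whitespace removed.
    return strip_whitespace(stored) == strip_whitespace(search)

def matches_like(stored: str, part1: str, part2: str) -> bool:
    # Emulates: stored LIKE 'part1%part2' (% matches any run of characters).
    pattern = re.escape(part1) + ".*" + re.escape(part2)
    return re.fullmatch(pattern, stored, flags=re.DOTALL) is not None

# Hypothetical stored postcodes in "part1 part2" format.
print(matches_normalized("AB1 2CD", "AB12CD"))   # search without the space
print(matches_normalized("AB1 2CD", "AB1 2CD"))  # search with the space
print(matches_like("AB1X 2CD", "AB1", "2CD"))    # LIKE also accepts extra text
```

Because % matches any run of characters, the LIKE form can return postcodes that merely start with part1 and end with part2, whereas the normalized comparison requires the full value to match.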
