Hive query to get expected output - hive

I am new to Hive and have a requirement to build a query. I have a table with 3 columns (ENO, ENAME, LOCATION):
ENO, ENAME, LOCATION
001, XYZ, HYD
002, ABC, MU
I need output like below.
001, XYZ, H
001, XYZ, Y
001, XYZ, D
002, ABC, M
002, ABC, U
This is the output I am looking for. Does anyone have an idea how to get it?
Thanks,
Ranjith

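Split the string in the 3rd column and use explode with a lateral view to convert the array values into rows. The lateral view belongs in the FROM clause, not in the select list.
select
ENO,
ENAME,
LOC
from
table_name
lateral view explode(split(LOCATION, '')) loc_table as LOC
-- drop any empty string that split may leave behind
where LOC != '';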

Try using explode together with split to achieve the above result.
Query:
SELECT ENO,ENAME,SPLIT_LOC FROM <TABLE NAME> LATERAL VIEW explode(split(LOCATION,'')) EXPLOCATION AS SPLIT_LOC
Splitting LOCATION on '' splits your data into individual characters.
Hope this helps :)
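To make the split-and-explode idea concrete outside of Hive, here is a minimal Python sketch of the same logic. The rows list and the function name explode_location are hypothetical, used only for illustration; it is not Hive code.

```python
# Hypothetical rows mirroring the question's table (ENO, ENAME, LOCATION).
rows = [("001", "XYZ", "HYD"), ("002", "ABC", "MU")]


def explode_location(rows):
    """Emit one output row per character of LOCATION, like split + explode."""
    out = []
    for eno, ename, location in rows:
        # Iterating over a string yields its characters one by one,
        # so no empty strings are produced and no filter is needed here.
        for ch in location:
            out.append((eno, ename, ch))
    return out


exploded = explode_location(rows)
for row in exploded:
    print(", ".join(row))
```

Each input row produces as many output rows as LOCATION has characters, which matches the output requested in the question.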

Get all records where each words in string exists on any of the columns in a table

I am building a search feature and need help with a Postgres query. My use case: given an input string, what is the best (most optimized) way in Postgres to get all records where any word of the string exists in any of the columns of a table?
Sample table (the table I am working with has 40 columns):
| FName  | Occupation |
|--------|------------|
| John   | Engineer   |
| Carlos | Doctor     |
Case 1: Given the string 'John Doctor', it would return both records.
Output:
| FName  | Occupation |
|--------|------------|
| John   | Engineer   |
| Carlos | Doctor     |
Case 2: Given the string 'John Engineer', it would return only 1 row.
Output:
| FName  | Occupation |
|--------|------------|
| John   | Engineer   |
Case 3: Given the string 'Carlos', it would return 1 row.
Output:
| FName  | Occupation |
|--------|------------|
| Carlos | Doctor     |
Basically, you want to do following:
SELECT FName, Occupation
FROM yourtable
WHERE
'John' IN (FName, Occupation) OR
'Doctor' IN (FName, Occupation);
I don't know if this is already a sufficient answer for you because it's unclear if the logic to fetch the different names from your "search string" must be written as SQL query, too. I think that's a much better task for your application.
If this must also be done in pure SQL, you could use UNNEST to split your string.
Something like this:
WITH sub AS
(SELECT UNNEST(STRING_TO_ARRAY('John Doctor', ' ')) AS searchNames)
SELECT
DISTINCT y.FName, y.Occupation
FROM yourtable y, sub
WHERE
sub.searchNames IN (y.FName, y.Occupation);
This will split your string by spaces into the different names, i.e. you need to provide a search string in the form you have mentioned, with a space between the names.
This will produce the correct results according to your description.
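The matching rule behind the IN and UNNEST queries can be sketched in Python as follows. This is only an illustration of the logic, assuming exact, case-sensitive matches as with IN; the rows list and the function name search_any_word are hypothetical.

```python
# Sample rows from the question (FName, Occupation).
rows = [("John", "Engineer"), ("Carlos", "Doctor")]


def search_any_word(rows, search_string):
    """Return rows where at least one search word equals a column value."""
    # Splitting on whitespace plays the role of STRING_TO_ARRAY + UNNEST.
    words = set(search_string.split())
    # A row matches when any of its column values is one of the words.
    return [row for row in rows if words & set(row)]


print(search_any_word(rows, "John Doctor"))
print(search_any_word(rows, "John Engineer"))
print(search_any_word(rows, "Carlos"))
```

Because each row is tested once, no DISTINCT step is needed in this sketch; in the SQL version the join against the word list can return the same row several times, which is why DISTINCT is used there.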
This can be extended to as many columns as needed. For example, let's add a further column col and also search for Test3; the query then looks like this:
SELECT FName, Occupation,col
FROM yourtable
WHERE 'John' IN (FName, Occupation, col)
OR 'Doctor' IN (FName, Occupation, col)
OR 'Test3' IN (FName, Occupation, col);
Or again with UNNEST like this:
WITH sub AS
(SELECT UNNEST(STRING_TO_ARRAY('John Doctor Test3', ' ')) AS searchNames)
SELECT
DISTINCT y.FName, y.Occupation, y.col
FROM yourtable y, sub
WHERE
sub.searchNames IN (y.FName, y.Occupation, y.col);
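Use the case-insensitive regexp match operator (~*) together with ANY to find the records that contain at least one of the words in the list. Note that this matches substrings anywhere in the text representation of the row, so 'John' also matches 'Johnson'.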
select *
from the_table t
where t::text ~* any(string_to_array(the_words_list, ' '));
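The behaviour of the regexp approach, including the substring matching, can be illustrated with a small Python sketch. The third row ("Johnson", "Pilot") and the function name search_rows_regex are hypothetical. One deliberate difference from the SQL: the words are escaped here and treated as literal text, whereas Postgres would interpret them as regular expressions.

```python
import re

# Sample rows from the question plus one hypothetical row ("Johnson", "Pilot").
rows = [("John", "Engineer"), ("Carlos", "Doctor"), ("Johnson", "Pilot")]


def search_rows_regex(rows, search_string):
    """Return rows whose text form contains at least one word, ignoring case."""
    # Assumption: words are literal text, so they are escaped before compiling.
    patterns = [re.compile(re.escape(word), re.IGNORECASE)
                for word in search_string.split()]
    matches = []
    for row in rows:
        # Rough stand-in for casting the whole row to text (t::text).
        row_text = "(" + ",".join(row) + ")"
        if any(p.search(row_text) for p in patterns):
            matches.append(row)
    return matches


print(search_rows_regex(rows, "john"))
```

Searching for 'john' returns both the John and the Johnson rows, which may or may not be what a search feature wants.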

I am having Issues counting values in a row with separators using SQL

I am new to Snowflake and am trying to count the number of values in a row with separators using SQL. I am not sure how to go about it. I've googled for solutions but have not been able to find one.
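table name: Lee_tab
| user | names         |
|------|---------------|
| id01 | Jon;karl;lee; |
| id02 | Abi;jackson;  |
| id03 | don;          |
| id04 |               |
What I want to achieve:
| user | names         | name_count |
|------|---------------|------------|
| id01 | Jon;karl;lee; | 3          |
| id02 | Abi;jackson;  | 2          |
| id03 | don;          | 1          |
| id04 |               | 0          |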
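Here are three solutions using REGEXP_COUNT, SPLIT with ARRAY_SIZE, and STRTOK_TO_ARRAY with ARRAY_SIZE (I would use the REGEXP_COUNT one). Note that the sample values below omit the trailing semicolon shown in the question; if your data keeps it, regexp_count(column2, ';') without the +1 already gives the count: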
SELECT
column1,
column2,
regexp_count(column2, ';')+1 as solution_1,
ARRAY_SIZE(split(column2, ';')) as solution_2,
ARRAY_SIZE(strtok_to_array(column2, ';')) as solution_3
FROM VALUES
('id01','Jon;karl;lee'),
('id02','Abi;jackson'),
('id03','don');
which gives
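| COLUMN1 | COLUMN2      | SOLUTION_1 | SOLUTION_2 | SOLUTION_3 |
|---------|--------------|------------|------------|------------|
| id01    | Jon;karl;lee | 3          | 3          | 3          |
| id02    | Abi;jackson  | 2          | 2          | 2          |
| id03    | don          | 1          | 1          | 1          |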
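The difference between counting separators and counting non-empty values matters for the question's data, which ends each list with a semicolon and includes an empty row. A minimal Python sketch of both formulas, with hypothetical function names, shows where they diverge:

```python
def count_by_separators(names, sep=";"):
    """Number of separators plus one, as in the regexp_count(...) + 1 formula."""
    return names.count(sep) + 1


def count_non_empty(names, sep=";"):
    """Number of non-empty pieces after splitting on the separator."""
    return len([piece for piece in names.split(sep) if piece])


# Values without a trailing separator, then the question's original values.
for names in ["Jon;karl;lee", "Jon;karl;lee;", "don;", ""]:
    print(repr(names), count_by_separators(names), count_non_empty(names))
```

Both formulas agree on 'Jon;karl;lee', but for 'Jon;karl;lee;' the separator formula gives 4 while counting non-empty pieces gives the expected 3, and for an empty string it gives 1 instead of 0.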
The syntax depends on which database you are using. I built your example in SQLite Browser and got the result with a query like this:
SELECT SUM(length(names) - length(replace(names, ';', '')) +1)
AS TotalCount
FROM Lee_tab where id = <USER ID>
Function names differ between databases: SQLite and PostgreSQL use length, while SQL Server uses len, so pay attention to that.
My query is just a formula for counting values separated by ;
To get the exact result you want, you should learn how to join.
Here is a different answer, using the Snowflake SPLIT_TO_TABLE function. This function splits the string on the delimiter and creates a row for each value, which we lateral join back to the CTE table. Finally we COUNT and GROUP BY using standard SQL syntax:
with cte as (
select 'id01' as user, 'Jon;karl;lee' as names union all
select 'id02' as user, 'Abi;jackson' as names union all
select 'id03' as user, 'don' as names
)
select user, names, count(value) as count_names
from cte, lateral split_to_table(cte.names, ';')
group by user, names;
Rewriting json_stattham's answer using Snowflake syntax. Basically, we are just counting the number of separators (semicolons) in the string and adding 1. There is no need to use the SUM() function as in json_stattham's answer.
with cte as (
select 'id01' as user, 'Jon;karl;lee' as names union all
select 'id02' as user, 'Abi;jackson' as names union all
select 'id03' as user, 'don' as names
)
SELECT user, names, (length(names) - length(replace(names, ';'))) + 1 AS name_count
FROM cte;
This is the answer to your question:
select user,names,(len(names) - len(replace(names, ';',''))+1) names_count from Lee_tab;
For more detail, check this fiddle, where I have worked through the whole example:
https://www.db-fiddle.com/f/BQuEjw2pthMDb1z8NTdHv/0

How to store before and after decimal value in 2 different column

| Name    | Gender | Amount |
|---------|--------|--------|
| Ram     | male   | 20.56  |
| Bhavna  | female | 78.2   |
| darshan | male   | 12.02  |
| Avni    | female | 50.366 |
I want to split the Amount column into 2 parts, where one column holds the value before the decimal point (e.g. 20.56 gives 20) and the second column holds the value after the decimal point (e.g. 20.56 gives 56).
-- check this query
select amount, decode (pos,0,amount,substr(amount,1,pos-1)) as before_decimal ,
decode(pos,0,0,substr(amount,pos+1,length(amount))) as after_decimal
from (
select instr((substr(amount,1,length(amount))),'.') as pos,amount
from table_name
)
In MySQL you can control the number of decimal places using FORMAT, which rounds the number and returns it as a string:
FORMAT(your_number, D) -- D is the number of decimal places you want
usage: FORMAT (N, D)
You can see how to use it here: https://www.w3resource.com/mysql/string-functions/mysql-format-function.php
You can use queries like these to get your expected output. Suppose the amount is 20.56.
To get 20 as output, use:
SELECT FLOOR(20.56) FROM TABLE_NAME
To get exactly 56 as output, use:
SELECT FLOOR((20.56 - FLOOR(20.56))*100) FROM TABLE_NAME
If you want them in separate columns, you can use arithmetic functions:
select t.*, floor(val) as col_left, floor(val * 100) % 100 as col_right
from (select 20.56 as val) t
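The arithmetic in the last query, floor(val) and floor(val * 100) % 100, can be checked against all four sample amounts with a short Python sketch. The function name split_amount is hypothetical, and using Decimal on string input is an assumption made here to avoid binary floating-point rounding; the sketch also assumes non-negative amounts.

```python
from decimal import Decimal


def split_amount(amount):
    """Return (part before the decimal point, first two digits after it)."""
    value = Decimal(amount)
    # int() truncates, which equals floor for non-negative amounts.
    before = int(value)
    after = int(value * 100) % 100
    return before, after


for amount in ["20.56", "78.2", "12.02", "50.366"]:
    print(amount, split_amount(amount))
```

Note what the formula does with the less regular values: 78.2 yields 20 rather than 2, 12.02 yields 2 (the leading zero is lost), and 50.366 yields 36 because only two decimal digits are kept. If the digits must be preserved exactly as written, the string-based substr approach shown earlier is the safer choice.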

Oracle: Fuzzy lookup

I'm loading a table by looking up an employee table. However, sometimes the names in the source files and the employee table do not match exactly.
**Employee table:**
Employee Name
Paul Jaymes
**Source File**
Paul James
I want these to match. What could be the solution?
Use the UTL_MATCH package or the SOUNDEX function:
Oracle 11g R2 Schema Setup:
CREATE TABLE Employees ( Name ) AS
SELECT 'Paul Jaymes' FROM DUAL;
Query 1:
UTL_MATCH.EDIT_DISTANCE:
Calculates the number of changes required to transform string-1 into string-2
SELECT *
FROM Employees
WHERE UTL_MATCH.EDIT_DISTANCE( Name, 'Paul James' ) < 2
Query 2:
UTL_MATCH.EDIT_DISTANCE_SIMILARITY:
Calculates the number of changes required to transform string-1 into string-2, returning a value between 0 (no match) and 100 (perfect match)
SELECT *
FROM Employees
WHERE UTL_MATCH.EDIT_DISTANCE_SIMILARITY( Name, 'Paul James' ) > 90
Query 3:
UTL_MATCH.JARO_WINKLER:
Calculates the measure of agreement between string-1 and string-2
SELECT *
FROM Employees
WHERE UTL_MATCH.JARO_WINKLER( Name, 'Paul James' ) > 0.9
Query 4:
UTL_MATCH.JARO_WINKLER_SIMILARITY:
Calculates the measure of agreement between string-1 and string-2, returning a value between 0 (no match) and 100 (perfect match)
SELECT *
FROM Employees
WHERE UTL_MATCH.JARO_WINKLER_SIMILARITY( Name, 'Paul James' ) > 95
Query 5:
SOUNDEX:
returns a character string containing the phonetic representation of char. This function lets you compare words that are spelled differently, but sound alike in English.
SELECT *
FROM Employees
WHERE SOUNDEX( Name ) = SOUNDEX( 'Paul James' )
Results:
All give the output:
| NAME |
|-------------|
| Paul Jaymes |
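To see why 'Paul Jaymes' and 'Paul James' pass the edit-distance thresholds above, here is a minimal Python sketch of the Levenshtein distance and a similarity score derived from it. The similarity formula (100 times one minus the distance divided by the longer length, rounded) is an assumption based on a common normalization; it is not taken from Oracle's implementation, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalization: 100 means identical, 0 means nothing in common."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100
    return round(100 * (1 - edit_distance(a, b) / longest))


print(edit_distance("Paul Jaymes", "Paul James"))
print(similarity("Paul Jaymes", "Paul James"))
```

Under this assumed formula a single missing letter in an 11-character name scores 91, so the score for a one-character typo depends on the length of the name, and shorter names score lower.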
Use the UTL_MATCH.EDIT_DISTANCE_SIMILARITY function in Oracle.
I would recommend running a query like the one below (or loading its result into a temporary table) and checking whether the matches are as expected. Usually a score above 90-93 means the same name with a typo between systems. The score depends on the length of the name: a one-character difference in an 11-character name such as 'Paul Jaymes' scores about 91, and longer names score higher.
select s.employee_name,
utl_match.edit_distance_similarity(initcap(s.employee_name),e.employee_name) as score
from source_table s cross join employee_table e
where utl_match.edit_distance_similarity(initcap(s.employee_name),e.employee_name) >=90 ;

SQL select multiple columns to return one distinct field

I am looking to return a (single) column that has the distinct values of 4 columns that I will be looking up within the same table.
I've tried
"select distinct e1l,e2l,e1s,e2s from jobmovement"
but this just returns each distinct combination of the four columns, so for example if there were 4 rows of 178,178,178,178 it would return just 1 of them.
So for example, I have 4 column headers (E1L,E2L,E1S,E2S):
E1L,E2L,E1S,E2S
178,178,178,178
, ,216,216
,178, ,
217,217,178,216
I would like this to return the distinct values in a single column:
178
216
217
Any help would be appreciated.
Thanks, Paul.
Use apply to unpivot the data and then use select distinct:
select distinct v.e
from jobmovement jm cross apply
(values (jm.e1l), (jm.e2l), (jm.e1s), (jm.e2s)) v(e);
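The unpivot-then-distinct idea can be sketched in Python as follows. The rows list mirrors the sample data, with blanks represented as None; skipping None is an assumption made here for illustration, since the SQL above would also return a blank or NULL entry as one of the distinct values unless it is filtered out. The function name distinct_values is hypothetical.

```python
# Sample data from the question (E1L, E2L, E1S, E2S); blanks shown as None.
rows = [
    (178, 178, 178, 178),
    (None, None, 216, 216),
    (None, 178, None, None),
    (217, 217, 178, 216),
]


def distinct_values(rows):
    """Flatten the four columns into one and keep each value once."""
    # Assumption: blank cells (None) are not wanted in the result.
    return sorted({value for row in rows for value in row if value is not None})


print(distinct_values(rows))
```

Flattening every row into a single stream of values is what the cross apply with values does; the set plays the role of select distinct.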