Hive: Fetch a row and split the values

I am running the following Hive query to fetch a row:
select * from hive_table where row_id='x'
It returns output like this:
10 15 hello world (1 row with four column values).
I am trying to split these values in Java so that I can get the individual column values in an array. I tried splitting on the ^A delimiter character (the default field delimiter when creating a Hive table):
hive_result.split("\u0001")
But it returns the same result: no split, just an array of length 1. How can I split the column values of a single row fetched from a Hive query?
Note: I am running this Hive query through a command-line utility. With JDBC I could use resultSet.next() and read each column separately.

Looks like I need to split the row results by tab instead of Control-A. This works fine:
hive_result.split("\t")
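A minimal sketch of why the first attempt came back as a single element, written in Python for brevity rather than the Java used in the question. It assumes the command-line output is one tab-separated line; the sample string is made up from the values shown in the question.

```python
# Hypothetical CLI output: the four column values from the question,
# assumed to be tab-separated (an assumption based on the answer above).
hive_result = "10\t15\thello\tworld"

# Splitting on Control-A (\u0001) finds no delimiter in the line,
# so the whole line comes back as a single element.
by_ctrl_a = hive_result.split("\u0001")

# Splitting on tab yields one element per column.
by_tab = hive_result.split("\t")

print(len(by_ctrl_a), by_tab)
```

Control-A separates fields in the table's stored files, while the text printed by the command-line tool in this case was tab-separated, which is why only the second split produces four values.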

Related

Insert data into an impala table with struct type

I have a table with detailed rows (many rows for one id). For this reason I have created a table with struct types, in order to reduce the rows to one row per id.
How can I insert data into an Impala table with struct types? Also, how can I aggregate values from a struct type afterwards?

Instead of using a struct to store multiple values, you can use group_concat(col, separator).
For example, if a customer has 3 account numbers and you want to store them in 1 row separated by commas, you can use the code below:
select cust_id, name, group_concat(cust_acc,',') as concat_account
from cust_details
group by 1,2
You can use a pipe as the separator if your data contains commas.
Another benefit of the above solution is that you can use split_part(concat_account,',',1) to get the first account number.
If your data is very complex and you can't use group_concat, you can use a struct. It's a little tricky, because you have to prepare the data in the struct format first and then load it. Please refer to this question: How to insert Array<Struct> values in Impala?
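To make the shape of the result concrete, here is a rough Python sketch of what the GROUP BY with group_concat produces and of what a split_part-style lookup does. The sample rows and the helper names group_concat_rows and split_part are invented for illustration; this only emulates the idea and is not Impala code.

```python
from collections import defaultdict

# Hypothetical detail rows: (cust_id, name, cust_acc)
rows = [
    (1, "Ann", "A100"),
    (1, "Ann", "A200"),
    (1, "Ann", "A300"),
    (2, "Bob", "B100"),
]

def group_concat_rows(rows, sep=","):
    """Collapse detail rows to one concatenated string per (cust_id, name)."""
    grouped = defaultdict(list)
    for cust_id, name, acc in rows:
        grouped[(cust_id, name)].append(acc)
    return {key: sep.join(accs) for key, accs in grouped.items()}

def split_part(text, sep, index):
    """Return the index-th field, counting from 1 like the SQL call above."""
    return text.split(sep)[index - 1]

concat = group_concat_rows(rows)
first_acc = split_part(concat[(1, "Ann")], ",", 1)
print(concat, first_acc)
```

This sketch keeps the accounts in input order; a SQL engine may not return them in any particular order.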

How to spread the values from a column in Hive?

One field of a table is made up of many values separated by commas.
For example, one record of this field is:
598423,4803510,599121,98181856,1666529,106317962,4061964,7828860,598752,728067,599809,8799578,1666528,3253720,601990,601235
I want to spread out the values in every record of this field in Hive.
Which function or method can I use to do this?
Thanks.

I'm not entirely sure what you mean by "spread".
If you want an output table that has a value in every row like:
598423
4803510
599121
Then you could use explode(split(data,','))
Otherwise, if each input row has exactly 16 numbers and you want each of the numbers to reside in a different column, you have two options:
Define the comma as the field delimiter for the input table:
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
Split a single column into 16 columns using the split UDF:
SELECT split(data,',')[0] as col1, split(data,',')[1] as col2, ...
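The two readings of "spread" can be illustrated outside Hive. The Python sketch below uses a shortened, made-up version of the sample record and shows the one-value-per-row shape and the one-value-per-column shape; the names rows and columns are only for illustration.

```python
# Hypothetical record: a shortened version of the comma-separated field.
data = "598423,4803510,599121"

# Comparable to exploding the split array: one output row per value.
rows = data.split(",")

# Comparable to indexing the split array: one value per named column.
columns = {f"col{i + 1}": part for i, part in enumerate(data.split(","))}

for value in rows:
    print(value)
print(columns)
```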

SQL - just view the description for explanation

I would like to ask whether this is possible.
For example, the search string is '009' (treat the digits as a string).
Is it possible to write a query that returns every occurrence of these characters in the database, regardless of their order?
For this example it would return
'009'
'090'
'900'
given that these exist in the database. Thanks!

Use the LIKE operator.
For example:
SELECT Marks FROM Report WHERE Marks LIKE '%009%' OR Marks LIKE '%090%' OR Marks LIKE '%900%'

Split the string into individual characters, select all rows containing the first character and put them in a temporary table, then select all rows from the temporary table that contain the second character and put these in a temporary table, then select all rows from that temporary table that contain the third character.
Of course, there are probably many ways to optimize this, but I see no reason why it would not be possible to make a query like that work.

This cannot be achieved in a straightforward way, because there is no sort() function that works on a single value the way lower() and upper() do.
But there are some workarounds, for example:
Suppose you are querying COL A. Maintain another column, SORTED_A, in which the application stores the sorted value of COL A.
Then, when you execute the query, sort the searchToken and select the rows whose SORTED_A column matches the sorted searchToken.
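The sorted-column workaround can be sketched in a few lines of Python. The in-memory list stands in for the table, and the names sorted_key, table and search are made up for this illustration; in a real setup the sorted value would be written by the application when the row is stored.

```python
def sorted_key(value: str) -> str:
    """Return the characters of value in sorted order."""
    return "".join(sorted(value))

# Hypothetical stored values; each row pairs COL A with its SORTED_A value.
col_a = ["009", "090", "900", "019", "99"]
table = [(a, sorted_key(a)) for a in col_a]

def search(token: str):
    """Return every COL A value whose sorted form equals the sorted token."""
    key = sorted_key(token)
    return [a for a, sorted_a in table if sorted_a == key]

matches = search("090")
print(matches)
```

Because both sides are sorted before comparison, any ordering of the same characters maps to the same key, so a plain equality match on SORTED_A is enough.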

Store SQL query result (1 column) as Array

After running my query I get 1 column result as
5
6
98
101
Is there a way to store this result as an array so that I can use it later in queries like
WHERE NOT IN ('5','6','98','101')
I am aware of storing single-variable results, but is this possible?
I cannot use a #Table variable because I will be rerunning the query in the future and it will have gone out of scope.

There are multiple ways of storing that column data, such as temporary tables, views, or table-valued functions, but IMO there is no need to store it anywhere. You can use the column directly in any query, as shown below, or perform a JOIN, which would be a much better option than NOT IN:
select * from
table2
where some_column not in (select column1 from this_table);
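Both variants mentioned above can be tried end to end with a small Python script. This is only a sketch: it uses an in-memory SQLite database as a stand-in for whatever engine the question is about, the table and column names are taken from the answer, and the sample values in table2 are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE this_table (column1 TEXT);
    CREATE TABLE table2 (some_column TEXT);
    INSERT INTO this_table VALUES ('5'), ('6'), ('98'), ('101');
    INSERT INTO table2 VALUES ('5'), ('7'), ('98'), ('200');
""")

# NOT IN with a subquery: no stored array is needed.
not_in = conn.execute(
    "SELECT some_column FROM table2 "
    "WHERE some_column NOT IN (SELECT column1 FROM this_table) "
    "ORDER BY some_column"
).fetchall()

# The JOIN alternative: keep rows of table2 with no match in this_table.
anti_join = conn.execute(
    "SELECT t2.some_column FROM table2 t2 "
    "LEFT JOIN this_table t ON t.column1 = t2.some_column "
    "WHERE t.column1 IS NULL "
    "ORDER BY t2.some_column"
).fetchall()

print(not_in, anti_join)
```

One design note: if the subquery column can contain NULL, NOT IN returns no rows at all, which is one reason the join form is often preferred.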

While this method is not recommended, an array can be stored in a single column as CSV (comma-separated values). Simply create a VARCHAR column and store a string that contains the values in a specific order, with each value separated by a comma. You can later fetch the string and parse it with a string parser, e.g. the .split() method in Python.
Again, I do not recommend doing this. I would instead use multiple columns, one for each value, and access them that way. Using separate columns would also make the values easy to use in a stored procedure.

How to retrieve column values in the following format using SQL query?

I have a database with the following kind of values in two columns:
OPERATIONCONTEXT MANAGEDOBJECT
.oc.IN_HSI_service NNMi_NODE .nodei_v1_tns.OMi-DP NODEPrepaid_HSI_Service_MUMBAI
I need to write a SQL query that retrieves these column values separated by a comma (,) such that the OPERATIONCONTEXT value is retrieved as it is, but only the first two space-separated words of the MANAGEDOBJECT value are retrieved.
Example: I need a SQL query that retrieves the following result from the sample data above:
.oc.IN_HSI_service,NNMi_NODE .nodei_v1_tns.OMi-DP
I am able to get the two full column values separated by a comma (,) with the following query:
SELECT distinct OPERATIONCONTEXT ||','|| MANAGEDOBJECT from $ALB_BASE_TABLE where OPERATIONCONTEXT is not NULL;
But of course, I would also like to restrict the result to non-NULL, distinct values (no repeated results) and write it to a CSV file through a shell script. Any idea how to write the query?
PS: This is Oracle Database.

You can use
SUBSTR(MANAGEDOBJECT, 1, INSTR(MANAGEDOBJECT, ' ', 1, 2) - 1)
Of course, it would be good to check that the string contains at least two spaces, since INSTR returns 0 when the second space is not found.
See the SqlFiddle and the Oracle documentation for INSTR.
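The intended extraction, including the check for fewer than two spaces, can be sketched in Python to make the expected output explicit. The function name first_two_words is made up, and the sample values are the ones from the question; this illustrates the logic only and is not Oracle code.

```python
def first_two_words(managed_object: str) -> str:
    """Return the text before the second space, or the whole string
    when it contains fewer than two spaces."""
    first = managed_object.find(" ")
    second = managed_object.find(" ", first + 1) if first != -1 else -1
    return managed_object if second == -1 else managed_object[:second]

# Sample values from the question.
operation_context = ".oc.IN_HSI_service"
managed_object = "NNMi_NODE .nodei_v1_tns.OMi-DP NODEPrepaid_HSI_Service_MUMBAI"

line = operation_context + "," + first_two_words(managed_object)
print(line)
```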
