SPLITTING COLUMN BY DELIMITER INTO UNIQUE ROWS IN HIVE - hive

I have a dataset. Please see a sample row below:
94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460695483:440507;1460777656:440515;1460778054:440488;1460778157:440481,440600;
Each column is separated by a space (3 columns in total). The column names are id (int), unid (string), time_stamp (string).
I would like to split the dataset so that each unique element becomes its own row, as below:
94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460695483:440507
94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460777656:440515
94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460778054:440488
94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460778157:440481
94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460778157:440600
Each sub-point above is a separate row. I have used the following query, but it is not giving me that output:
select id, unid,time_date
from table
LATERAL VIEW explode (SPLIT (time_date,'\;')) time_date as time_date;
Output:
94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460695483:440507;1460777656:440515;1460778054:440488;1460778157:440481,440600; (this row is repeated 5 times)
Help would be appreciated! Thanks in advance :)

Firstly, I had to replace the semi-colons with a pipe. So:
CREATE temporary TABLE tbl
(id int,
unid string,
time_stamp string);
INSERT INTO tbl
VALUES (
94654, '6802D326-9F9B-4FC8-B2DD-F878EADE31F2' , '1460695483:440507|1460777656:440515|1460778054:440488|1460778157:440481,440600');
SELECT
id,
unid,
time_stamp
FROM
(
SELECT
id,
unid,
split(time_stamp,'\\|') ts
FROM
tbl
) t
lateral VIEW explode(t.ts) bar AS time_stamp;
Which gives us:
94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460695483:440507
94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460777656:440515
94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460778054:440488
94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460778157:440481,440600
You have to do the split and explode in separate steps. So we do the split in a derived table, and the explode/lateral view in the outer query.
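The last row still carries the comma-separated pair 440481,440600, whereas the desired output has 440481 and 440600 on separate rows. One possible extension is a second split/explode on the comma; this is only a sketch (untested, aliases are illustrative):
SELECT
id,
unid,
concat(split(ts_pair,':')[0], ':', val) AS time_stamp -- rebuild "epoch:value" for each comma-separated value
FROM
(
SELECT
id,
unid,
split(time_stamp,'\\|') ts
FROM
tbl
) t
lateral VIEW explode(t.ts) a AS ts_pair
lateral VIEW explode(split(split(ts_pair,':')[1],',')) b AS val;
This should produce the five rows listed in the question.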

Related

I have an employee_roles table; it stores many organization ids in one row as an array.

Here are the column names along with sample values from the employee_roles table:
user_id : "1"
org_id : ["1", "2"]
I want to get each org_id (and its org_name) as its own row together with the user_id.
Please help me.
You can use the JSON_TABLE() function, provided the database version is at least 12.1.0.2, in order to return the values row-wise, such as:
SELECT user_id, j.org_id
FROM employee_roles,
JSON_TABLE(org_id, '$'
COLUMNS (NESTED PATH '$[*]'
COLUMNS (
org_id VARCHAR2 PATH '$'
)
)
) j
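With the sample row from the question, this should return one row per array element. A minimal, hypothetical setup and its expected result (not from the original post):
CREATE TABLE employee_roles (user_id VARCHAR2(10), org_id VARCHAR2(4000));
INSERT INTO employee_roles VALUES ('1', '["1", "2"]');
-- expected result of the query above:
-- USER_ID   ORG_ID
-- 1         1
-- 1         2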

How to combine multiple columns into one row

I'm new to SQL and trying to solve the following problem:
I have rows with the following columns: ID, Sequence, Name
ID can be the same if there are multiple sequences
How can I add the sequences and name and have just one row for each ID with separate columns?
Example: ID 1, Seq 1 Name Blue, Seq 2 Name Green, Seq 3 Name Red
Hope that makes sense.
You can use the concatenation operator:
select id || sequence as id_seq, name from table
The AS clause isn't required but can help readability.
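For the sample rows in the question, and assuming the database implicitly converts the numeric ID and Sequence to text for ||, this would return something like:
id_seq  name
11      Blue
12      Green
13      Red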

How to combine multiple rows as one in Bigquery?

I have a BigQuery table which has data as shown in the image below.
I wish to create a table out of this data, as shown in the second image.
So here I wish to:
remove the email column data
combine the emp_type column values as comma-separated values
have just 1 row per id
I tried using the STRING_AGG function of BigQuery but was unable to achieve what I specified above.
The table actually has more than 30 columns, but for the sake of explaining the issue I reduced it to 7 columns.
How do I combine multiple rows as one in a query?
Consider below approach
select
any_value((select as struct * except(email, emp_type) from unnest([t]))).*,
string_agg(emp_type, ', ') emp_type
from data t
group by to_json_string((select as struct * except(email, emp_type) from unnest([t])))
If applied to the sample data in your question, it produces the expected output.
As you can see, this will work no matter how many columns you have, 30+ or 100+. You don't even need to type them at all!
I see two possible options. If you want to have a unique row per combination of all parameters except email and emp_type:
SELECT id, name, status, `count`, is_hybrid, STRING_AGG(emp_type, ', ')
FROM data
GROUP BY id, name, status, `count`, is_hybrid
If you want to have just one row per id, you can group by id and select arbitrary value(from rows with this id) for other columns:
SELECT id, ANY_VALUE(name), ANY_VALUE(status), ANY_VALUE(`count`), ANY_VALUE(is_hybrid), STRING_AGG(emp_type, ', ')
FROM data
GROUP BY id
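As a minimal sketch with made-up sample values (the original images aren't reproduced here, so the data below is hypothetical), the ANY_VALUE version of the query behaves like this:
with data as (
select 1 as id, 'Alice' as name, 'active' as status, 10 as `count`, true as is_hybrid, 'a@x.com' as email, 'full_time' as emp_type union all
select 1, 'Alice', 'active', 10, true, 'a@y.com', 'contractor' union all
select 2, 'Bob', 'inactive', 5, false, 'b@x.com', 'part_time'
)
select id, any_value(name) name, any_value(status) status, any_value(`count`) `count`, any_value(is_hybrid) is_hybrid, string_agg(emp_type, ', ') emp_type
from data
group by id
-- id 1 -> emp_type 'full_time, contractor'; id 2 -> 'part_time'
The first answer's any_value/to_json_string approach should give the same result without having to list the columns.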

How to parse integer values from regex and sum in BigQuery

I have a column that contains a complex string, and I am trying to extract values from it. Here is the temp table with sample values:
with temp as (
select 1 as event_id, ';t-Tew00;1;1.00;252=100.00,;SM-R190;1;1.00;252=200.00,;SM-G998B/DS;1;6347.00;252=300.00,;EF-PG99P;1;249.00;252=400.00' as event_list union all
select 2 as event_id, ';asdI-Tww5300;1;1.00;252=99.00,,;EP-TA845;.252=49.00' as event_list union all
select 3 as event_id, ';asdI-Tww5300;1;1.00;252=10.00,,;EP-TA845;,.252=20.00,:etw:1002:2020,'
)
select *
from temp
I want to extract all the double/int values that appear after 252= in the event_list column. For instance, in the first record, I would like to extract the values 100.00, 200.00, 300.00 and 400.00.
I would also like a separate column in the output that adds all such values together, so the output column for the first record would be 1000.00. Likewise, 99+49 for the 2nd record and 10+20 for the 3rd record.
If 252= does not appear at all, the output must be 0.
How can I achieve this in BigQuery?
Try below
select event_id,
(
select ifnull(sum(cast(value as float64)), 0)
from unnest(regexp_extract_all(event_list, r'252=(\d*\.?\d*)')) value
) as total_252
from temp
If applied to the sample data in your question, the output is:
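event_id  total_252
1         1000.0
2         148.0
3         30.0
(these should be the sums 100+200+300+400, 99+49 and 10+20 respectively)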

SQL MS-Access Select Distinct for multiple columns

Sorry for asking on this topic again, but I haven't been able to derive a solution to my problem from existing answers.
I have one table ("Data") from which I need to pull three columns ("PID", "Manager", "Customer"),
and only the "PID" has to be distinct. I don't care which records are pulled for the other columns ("Manager" / "Customer"); it could be the first entry or whatever.
SELECT Distinct PID, Manager, Customer
FROM Data;
will give me all the rows where the combination of PID, Manager and Customer is distinct, so if there are two entries with the same PID but a different Manager, I will get two records instead of one.
Thank you very much.
You can do this
Hope you will find this helpful
SELECT PID, max(Manager), max(Customer)
FROM Data
group by PID
Or
SELECT PID, min(Manager), min(Customer)
FROM Data
group by PID
EDIT
I will give you an example to explain the MAX & MIN functions.
Here is the sample table:
CREATE TABLE data(
PID int ,
Manager varchar(20) ,
Customer varchar(20)
) ;
insert into data
values
(1,'a','b'),
(1,'c','d'),
(3,'1','e'),
(3,'5','e'),
(3,'3','e')
Now, these are the three queries that will return the respective outputs:
select * from data;
SELECT PID, max(Manager), max(Customer)
FROM Data
group by PID;
SELECT PID, min(Manager), min(Customer)
FROM Data
group by PID
Output for the above queries is as follows.
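select * from data returns the five rows exactly as inserted. The MAX and MIN versions should return:
PID  max(Manager)  max(Customer)
1    c             d
3    5             e

PID  min(Manager)  min(Customer)
1    a             b
3    1             e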
Explanation:
MAX:
MAX returns 'c' and '5' for Manager because 'c' is greater than 'a', and likewise '5' is greater than '1' and '3' (string comparison).
MIN:
The MIN function is the exact opposite of MAX and is self-explanatory.
SELECT "PID", max("Manager"), max("Customer")
FROM "Data"
GROUP BY "PID";
This query returns unique "PID"s and max values of "Manager" and "Customer" for each "PID".
DISTINCT is applied to all the columns in the select list, so you need to use GROUP BY plus an aggregate function (which returns one value for several rows).