Get specific value from string of text in Hive

I need to get the value 5 from the string of text below in a Hive table. The floor and split functions I used return the values "0" and "3", but I don't know how to get the first value, the one in front of the first "/":
Column name: logsummary
**Record:5/0/3/0/4/4/143504**
Select
floor(split(logsummary, '[/]')[1]) as draws,
floor(split(logsummary, '[/]')[2]) as losses
from table A

Use index 0 instead of 1. Array indexes in Hive start at 0, so index 0 returns the value 5:
hive> Select floor(split('5/0/3/0/4/4/143504', '[/]')[0]) as draws;
+--------+--+
| draws |
+--------+--+
| 5 |
+--------+--+
The statement below shows the values at indexes 0, 1 and 2 of your record:
hive> Select floor(split('5/0/3/0/4/4/143504', '[/]')[0]) as draws,floor(split('5/0/3/0/4/4/143504', '[/]')[1]),floor(split('5/0/3/0/4/4/143504', '[/]')[2]) losses;
+--------+------+---------+--+
| draws | _c1 | losses |
+--------+------+---------+--+
| 5 | 0 | 3 |
+--------+------+---------+--+

The Hive split() function takes two parameters (string, regex pattern), and splits the string as per the regex. The splits will be returned in an array.
Each split can be accessed via an array index. You need 5, which is available at index 0.
Hence, the query should be:
Select
floor(split('5/0/3/0/4/4/143504', '[/]')[0]) as draws,
floor(split('5/0/3/0/4/4/143504', '[/]')[1]) as losses;
--Output:
draws losses
5 0
Just to expand on this example, these are all the splits:
Select
floor(split('5/0/3/0/4/4/143504', '[/]')[0]) as e0,
floor(split('5/0/3/0/4/4/143504', '[/]')[1]) as e1,
floor(split('5/0/3/0/4/4/143504', '[/]')[2]) as e2,
floor(split('5/0/3/0/4/4/143504', '[/]')[3]) as e3,
floor(split('5/0/3/0/4/4/143504', '[/]')[4]) as e4,
floor(split('5/0/3/0/4/4/143504', '[/]')[5]) as e5,
floor(split('5/0/3/0/4/4/143504', '[/]')[6]) as e6;
--Output
e0 e1 e2 e3 e4 e5 e6
5 0 3 0 4 4 143504
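The split-then-index idea can also be sketched outside Hive. The snippet below is a Python analogue, not Hive code: it assumes that re.split plays the role of Hive's split() and that math.floor(float(...)) stands in for floor() applied to a string. The variable names are made up for the illustration.

```python
import math
import re

record = "5/0/3/0/4/4/143504"

# Split on the same regex character class used in the Hive query.
parts = re.split(r"[/]", record)

# Indexes are zero-based, so the value before the first "/" is parts[0].
first_value = math.floor(float(parts[0]))

print(parts)
print(first_value)
```

Index 1 and index 2 hold the second and third values, which is why the original query returned "0" and "3" but never "5".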

Related

Assigning Score based on Order Sequence in pandas

Following are the dataframes I have
score_df
col1_id col2_id score
1 2 10
5 6 20
records_df
date col_id
D1 6
D2 4
D3 1
D4 2
D5 5
D6 7
I would like to compute a score based on the following criteria:
When 2 occurs after 1, the score should be 10; when 1 occurs after 2, the score should also be 10.
That is, (1,2) gives a score of 10, and (2,1) gets the same score of 10.
Considering (1,2): when 1 occurs for the first time, we don't assign a score. We flag the row and wait for 2 to occur. When 2 occurs in the column, we give the score 10.
Considering (2,1): when 2 comes first, we assign the value 0 and wait for 1 to occur. When 1 occurs, we give the score 10.
So, the first time, don't assign the score; wait for the corresponding event to occur and then assign the score.
So, my result dataframe should look something like this
result
date col_id score
D1 6 0 -- Even though 6 is in the score list, it occurred for the first time. So 0
D2 4 0 -- 4 is not even there in list
D3 1 0 -- 1 occurred for first time . So 0
D4 2 10 -- 1 occurred previously. 2 occurred now.. we can assign 10.
D5 5 20 -- 6 occurred previously. we can assign 20
D6 7 0 -- 7 is not in the list
I have around 100k rows in both score_df and records_df. Looping and assigning the score takes too long. Can someone help with logic that avoids looping over the entire dataframe?

From what I understand, you can use melt to unpivot score_df and then merge. Keeping the index from the melted frame, we check where that index is duplicated, and return the score from the merge there, else 0. (This assumes both frames also carry a uid column.)
m = score_df.reset_index().melt(['index','uid','score'],
var_name='col_name',value_name='col_id')
# drop(columns=...) replaces the positional axis argument, which pandas 2.0 removed
final = records_df.merge(m.drop(columns='col_name'),on=['uid','col_id'],how='left')
# True where the same score_df row has already been matched by an earlier record
c = final.duplicated(['index']) & final['index'].notna()
final = final.drop(columns='index').assign(score=lambda x: x['score'].where(c,0))
print(final)
uid date col_id score
0 123 D1 6 0.0
1 123 D2 4 0.0
2 123 D3 1 0.0
3 123 D4 2 10.0
4 123 D5 5 20.0
5 123 D6 7 0.0
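For reference, here is a self-contained sketch of the same melt-and-merge idea applied to the frames exactly as posted in the question, which have no uid column. Merging on col_id alone is an assumption made for this sketch, and the names pairs and partner_seen are made up for the illustration.

```python
import pandas as pd

# Frames as given in the question (no uid column).
score_df = pd.DataFrame({"col1_id": [1, 5], "col2_id": [2, 6], "score": [10, 20]})
records_df = pd.DataFrame({
    "date": ["D1", "D2", "D3", "D4", "D5", "D6"],
    "col_id": [6, 4, 1, 2, 5, 7],
})

# One row per (pair index, member id): the pair index identifies the score_df row.
pairs = score_df.reset_index().melt(
    id_vars=["index", "score"],
    value_vars=["col1_id", "col2_id"],
    var_name="col_name",
    value_name="col_id",
)

# A left merge keeps the record order; unmatched ids get NaN for index and score.
final = records_df.merge(pairs.drop(columns="col_name"), on="col_id", how="left")

# True where the same pair index has already appeared in an earlier row.
partner_seen = final.duplicated("index") & final["index"].notna()

final = final.drop(columns="index").assign(
    score=lambda x: x["score"].where(partner_seen, 0)
)
print(final)
```

Note that duplicated() marks every occurrence after the first, so a third record from the same pair, or the same id appearing twice in a row, would also receive the score. If that matters for your data, the mask would need an extra condition.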

Find the largest value from column using SQL?

I have a SQL table whose columns hold values like
A B
X1 2 4 6 8 10
X2 2 33 44 56 78 98 675 891 11111
X3 2 4 672 234 2343 56331
X4 51 123 232 12 12333
I want a query that returns the row (col A and col B) whose col B holds the largest number of values, i.e. the output should be
x2 2 33 44 56 78 98 675 891 11111
Query I use:
select max(B) from table
Results in
51 123 232 12 12333

Assuming that both columns are strings, and that column B uses single space for separators and no leading/trailing spaces, you can use this approach:
SELECT A, B
FROM MyTable
ORDER BY LENGTH(B)-LENGTH(REPLACE(B, ' ', '')) DESC
FETCH FIRST 1 ROW ONLY
The heart of this solution is LENGTH(B)-LENGTH(REPLACE(B, ' ', '')) expression, which counts the number of spaces in the string B.
Note: FETCH FIRST N ROWS ONLY is Oracle 12c syntax. For earlier versions, use the ROWNUM approach.
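The counting trick is easy to check outside the database. The sketch below is a Python analogue of the LENGTH(B)-LENGTH(REPLACE(B, ' ', '')) expression, using the sample rows from the question; the helper name space_count is made up for the illustration.

```python
rows = {
    "X1": "2 4 6 8 10",
    "X2": "2 33 44 56 78 98 675 891 11111",
    "X3": "2 4 672 234 2343 56331",
    "X4": "51 123 232 12 12333",
}

def space_count(b: str) -> int:
    # Length with spaces minus length without spaces = number of spaces.
    return len(b) - len(b.replace(" ", ""))

# The row with the most separators holds the most values.
best = max(rows, key=lambda a: space_count(rows[a]))
print(best, rows[best])
```

Like the SQL version, this relies on single-space separators and no leading or trailing spaces.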

If there can be more than one separating space, or more than one row meets the criteria, do this: count the number of spaces (or groups of spaces) in each row using regexp_count(), use rank() to find the rows with the most groups of spaces, and keep only the rows ranked 1:
select *
from (select t.*, rank() over (order by regexp_count(b, ' +') desc) rnk from t)
where rnk = 1
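The same ranking logic can be sketched in Python to show how ties and repeated spaces are handled. The data below is hypothetical (it is not from the question) and was chosen so that one row contains a double space and two rows tie for first place.

```python
import re

# Hypothetical rows: X1 contains a double space, and X1 and X2 tie.
rows = [("X1", "2  4 6"), ("X2", "1 2 3"), ("X3", "7 8")]

# Count groups of one or more spaces, like regexp_count(b, ' +').
counts = {a: len(re.findall(r" +", b)) for a, b in rows}

# Keep every row that reaches the top count, like rank() ... where rnk = 1.
top = max(counts.values())
winners = [a for a, _ in rows if counts[a] == top]
print(winners)
```

Because every row with the top count is kept, ties are returned together instead of being cut down to a single arbitrary row.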

How do I compare rows of a table against all other rows of the table?

I would like to create a script that takes the rows of a table which have a specific mathematical difference in their ASCII sum and to add the rows to a separate table, or even to flag a different field when they have that difference.
For instance, I am looking to find when the ASCII sum of word A and the ASCII sum of word B, both stored in rows of a table, have a difference of 63 or 31.
I could probably use a loop to select these rows, but SQL is not my greatest virtue.
ItemID | asciiSum |ProperDiff
-------|----------|----------
1 | 100 |
2 | 37 |
3 | 69 |
4 | 23 |
5 | 6 |
6 | 38 |
After running the code, the field ProperDiff will be updated to contain 'yes' for ItemID 1,2,3,5,6, since the AsciiSum for 1 and 2 (100-37) = 63 etc.

This will not be fast, but I think it does what you want:
update t
set ProperDiff = 'yes'
where exists (select 1
from t t2
where abs(t2.AsciiSum - t.AsciiSum) in (63, 31)
);
It should work okay on small tables.
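To sanity-check the expected result, here is a small Python sketch of the same condition using the sample data from the question. It swaps the pairwise comparison for a set lookup, which is an assumption of this sketch and not part of the SQL answer; the variable names are made up for the illustration.

```python
# ItemID -> asciiSum, as in the question's table.
ascii_sums = {1: 100, 2: 37, 3: 69, 4: 23, 5: 6, 6: 38}
target_diffs = (63, 31)

# A row qualifies if some other value lies exactly a target difference away.
values = set(ascii_sums.values())
flagged = sorted(
    item
    for item, s in ascii_sums.items()
    if any(s + d in values or s - d in values for d in target_diffs)
)
print(flagged)
```

The set lookup avoids comparing every row with every other row, which may help on larger inputs.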

Given a column of numbers, return the row number in which the maximum value is present using LibreOffice SpreadSheet

Let's say I have a column A:
A
1 | 10
2 | 20
3 | 33
4 | 42
In row 5 I can calculate the maximum of the column with MAX(A1:A4), which returns 42. In row 6 I would like to get the row number of the maximum, i.e. row number 4.
Thanks

A5 contains =MAX(A1:A4)
A6 then should have formula =MATCH(A5;A1:A4;0)
A6 returns the position of the value within the search range A1:A4. Because the range starts at row 1, that position is also the row number.
Does this help to solve your problem?
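For comparison, the MAX plus MATCH combination corresponds to the following Python sketch, which is an analogue and not spreadsheet code. The + 1 converts Python's zero-based position to the one-based position that MATCH returns.

```python
values = [10, 20, 33, 42]  # cells A1:A4

maximum = max(values)                   # like =MAX(A1:A4)
row_number = values.index(maximum) + 1  # like =MATCH(A5;A1:A4;0)

print(maximum, row_number)
```

Both MATCH with type 0 and list.index return the first exact match, so a repeated maximum yields the earliest row.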

How to get the last non empty value of a hierarchy?

I've got a hierarchy with the appropriate value linked to each level, let's say :
A 100
A1 NULL
A2 NULL
B
B1 NULL
B2 1000
B21 500
B22 500
B3 NULL
This hierarchy is materialized in my database as a parent-child hierarchy
Hierarchy Table
------------------------
Id Code Parent_Id
1 A NULL
2 A1 1
3 A2 1
4 B NULL
5 B1 4
6 B2 4
7 B21 6
8 B22 6
9 B3 4
And here is my fact table :
Fact Table
------------------------
Hierarchy_Id Value
1 100
6 1000
7 500
8 500
My question is: do you have any idea how to get only the last non-empty value of my hierarchy?
I know that there is an MDX function which could do this job, but I'd like to do this in another way.
To be clear, the desired output would be :
Fact Table
------------------------
Hierarchy_Id Value
1 100
7 500
8 500
(If necessary, the work of flatten the hierarchy is already done...)
Thank you in advance!

If the codes for your hierarchy are correct, then you can use the information in the codes to determine the depth of the hierarchy. I think you want to filter out any "code" where there is a longer code that starts with it.
In that case:
select f.*
from fact f join
hierarchy h
on f.Hierarchy_Id = h.Id
where not exists (select 1
from fact f2 join
hierarchy h2
on f2.Hierarchy_Id = h2.Id
where h2.Code like concat(h.Code, '%') and
h2.Code <> h.Code
)
Here I've used the function concat() to create the pattern. In some databases, you might use + or || instead.
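The prefix filter can be checked against the sample data with a short Python sketch. It assumes, as the answer does, that a descendant's code always starts with its ancestor's code; the dictionaries and the helper has_deeper_fact are made up for the illustration.

```python
# Hierarchy Id -> Code, and fact rows keyed by Hierarchy_Id, as in the question.
codes = {1: "A", 2: "A1", 3: "A2", 4: "B", 5: "B1", 6: "B2", 7: "B21", 8: "B22", 9: "B3"}
facts = {1: 100, 6: 1000, 7: 500, 8: 500}

def has_deeper_fact(hid: int) -> bool:
    # True if another fact's code starts with this code (a longer code below it).
    code = codes[hid]
    return any(
        other != hid and codes[other].startswith(code) for other in facts
    )

# Keep only facts that have no fact further down the same branch.
result = [(hid, value) for hid, value in facts.items() if not has_deeper_fact(hid)]
print(result)
```

B2 is dropped because B21 and B22 carry values below it, while A is kept because neither A1 nor A2 has a fact row.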
