How to do this in Hive?

I have two questions about Hive.
1. I have data like 234336899. If the last three digits (here 899) are greater than 500, it should print 999; if they are less than 500, it should print 000.
Can you please tell me how to do this in Hive?
2. I have another scenario, with input as follows:
0 1 2
3 1 2
0 1 4
3 1 4
I want to print the following output:
0 1
3 1
1 2
1 4
How can I do this in Hive?
Thanks in advance.

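Your first question can be solved as follows:
create table sample (col1 bigint);
insert into table sample values (234336899), (234336399);
select
  col1,
  -- substr with a negative start position returns the last 3 characters
  case when cast(substr(col1, -3) as int) > 500 then '999'
       when cast(substr(col1, -3) as int) < 500 then '000'
  end as case_col1  -- exactly 500 matches neither branch, so the result is NULL
from sample;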
Here Hive's substr function, called with a negative start position, extracts the last three digits of col1.
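To check the rule outside Hive, here is a minimal Python sketch of the same logic. It is an illustration, not Hive code: it assumes non-negative values and takes the last three digits with n % 1000 instead of substr, and the function name flag_last_three is hypothetical. Like the CASE expression above, it returns nothing (None) when the last three digits are exactly 500.

```python
def flag_last_three(n: int):
    """Return '999' if the last three digits of n exceed 500, '000' if below 500.

    Assumes n is non-negative. A value ending in exactly 500 matches
    neither branch and returns None, mirroring the NULL from the Hive CASE.
    """
    last_three = n % 1000  # e.g. 234336899 -> 899
    if last_three > 500:
        return "999"
    if last_three < 500:
        return "000"
    return None


for value in (234336899, 234336399):
    print(value, flag_last_three(value))
```

If values ending in 500 should map to one side, adding an else branch (in Hive, `else '000'`) removes the gap.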

Related

How to merge two rows with the same values in SQL Server

I have the following output:
Sno | Value Stream | Duration | Inspection
1 | Test1 | 3 | 1
2 | ON | 14 | 0
3 | Start | 5 | 0
4 | Test1 | 5 | 1
5 | OFF | 0 | 1
6 | Start | 0 | 1
7 | Test2 | 0 | 1
8 | ON | 3 | 1
9 | START | 0 | 1
10 | Test2 | 2 | 2
I want to merge rows that have the same value when they are separated by an ON row followed by a START row. For example, S.no 4 should merge into S.no 1, giving:
1 | Test1 | 8 | 2
Only the ON/START combination allows a merge; any other combination does not. For example, S.no 5 and 6 are OFF/Start, so S.no 4 and 7 must not be merged.
I think you are talking about summarization, not merging:
select [Value Stream],
min(Sno) as First_Sno,
sum(Duration) as total_Duration,
sum(Inspection) as Inspection
from yourtable
group by [Value Stream]
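This returns one row per Value Stream (for Test1: First_Sno 1, total_Duration 8, Inspection 2). Note that it groups every row with the same value and does not check the ON/START condition.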
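A minimal Python sketch of what the GROUP BY query computes, using the rows from the question. The function name summarize is hypothetical. It groups on the exact Value Stream string, so "Start" and "START" stay separate here; in SQL Server they may fall into one group under a case-insensitive collation. Like the query, it ignores the ON/START condition.

```python
# (Sno, Value Stream, Duration, Inspection) rows from the question
rows = [
    (1, "Test1", 3, 1),
    (2, "ON", 14, 0),
    (3, "Start", 5, 0),
    (4, "Test1", 5, 1),
    (5, "OFF", 0, 1),
    (6, "Start", 0, 1),
    (7, "Test2", 0, 1),
    (8, "ON", 3, 1),
    (9, "START", 0, 1),
    (10, "Test2", 2, 2),
]


def summarize(rows):
    """Per Value Stream: (min Sno, sum of Duration, sum of Inspection)."""
    groups = {}
    for sno, value, duration, inspection in rows:
        if value not in groups:
            groups[value] = [sno, duration, inspection]
        else:
            acc = groups[value]
            acc[0] = min(acc[0], sno)  # min(Sno) as First_Sno
            acc[1] += duration         # sum(Duration) as total_Duration
            acc[2] += inspection       # sum(Inspection) as Inspection
    return {value: tuple(acc) for value, acc in groups.items()}


for value, (first_sno, total_duration, inspection) in summarize(rows).items():
    print(value, first_sno, total_duration, inspection)
```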

SQL update by groups?

I'd like to update approximately the first X rows in a table, but I always want to update all rows with a matching column value at the same time. So if my table has:
MyID Transaction Amount Date Status
1 1 2 02/08/2016 0
1 1 4 02/08/2016 0
2 4 1 02/08/2016 0
2 3 2 02/08/2016 0
3 10 1 02/08/2016 0
3 6 4 02/08/2016 0
I want to update Status to 1 on approximately the first 5 rows, but I don't want to split up matching MyID values. How can I do that? In this example, updating either the first 4 or the first 6 rows would be fine.
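Here is one method:
update t
set status = 1
-- the subquery collects the MyID values of the first 5 rows; IN then updates
-- every row with one of those MyID values, so no MyID group is split
where MyID in (select top 5 MyID from t order by MyID);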
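A small Python sketch of the same selection logic on the sample rows, to show why whole groups are always updated: the first n rows (ordered by MyID) decide which MyID values qualify, and then every row with a qualifying MyID is updated. The function name update_by_groups is hypothetical, and it returns updated copies rather than modifying a table.

```python
rows = [
    {"MyID": 1, "Transaction": 1, "Amount": 2, "Date": "02/08/2016", "Status": 0},
    {"MyID": 1, "Transaction": 1, "Amount": 4, "Date": "02/08/2016", "Status": 0},
    {"MyID": 2, "Transaction": 4, "Amount": 1, "Date": "02/08/2016", "Status": 0},
    {"MyID": 2, "Transaction": 3, "Amount": 2, "Date": "02/08/2016", "Status": 0},
    {"MyID": 3, "Transaction": 10, "Amount": 1, "Date": "02/08/2016", "Status": 0},
    {"MyID": 3, "Transaction": 6, "Amount": 4, "Date": "02/08/2016", "Status": 0},
]


def update_by_groups(rows, n):
    """Set Status = 1 on every row whose MyID appears among the first n rows."""
    # like: select top n MyID from t order by MyID
    first_ids = {r["MyID"] for r in sorted(rows, key=lambda r: r["MyID"])[:n]}
    # like: where MyID in (...), applied to copies of the rows
    return [dict(r, Status=1) if r["MyID"] in first_ids else dict(r) for r in rows]


updated = update_by_groups(rows, 5)
print(sum(r["Status"] for r in updated))
```

With n = 5 the first five rows contain MyID 1, 2 and 3, so all six rows are updated; with n = 4 only the MyID 1 and 2 groups (four rows) are.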

How to do last observation carried forward (LOCF) using SAS PROC SQL

I have the data below. I want to write SAS PROC SQL code that carries forward the last non-missing weight for each patient (ptno).
data sda;
input ptno visit weight;
format ptno z3. ;
cards;
1 1 122
1 2 123
1 3 .
1 4 .
2 1 156
2 2 .
2 3 70
2 4 .
3 1 60
3 2 .
3 3 112
3 4 .
;
run;
proc sql noprint;
create table new as
select ptno,visit,weight,
case
when weight = . then weight
else .
end as _weight_1
from sda
group by ptno,visit
order by ptno,visit;
quit;
The SQL code above does not produce the desired result.
The desired output data looks like this:
ptno visit weight
1 1 122
1 2 123
1 3 123
1 4 123
2 1 156
2 2 .
2 3 70
2 4 70
3 1 60
3 2 .
3 3 112
3 4 112
Since you effectively have a row number (visit), you can do this in SQL, though it is much slower than a data step.
Here it is, broken out into a separate column for demonstration purposes; in your case you will want to coalesce it into a single column.
You need a subquery that finds the highest visit number below the current one that has a non-missing weight, and then join that back to the table to get the weight.
proc sql;
select ptno, visit, weight,
(
select weight
from sda A,
(select ptno, max(visit) as visit
from sda D
where D.ptno=S.ptno
and D.visit<S.visit
and D.weight is not null
group by ptno
) V
where A.visit=V.visit and A.ptno=V.ptno
)
from sda S
;
quit;
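The correlated subquery can be mirrored in Python to see what it returns once coalesced with the observed weight. This is a sketch, not SAS: missing values are represented as None, and the names prior_weight and locf_sql_style are hypothetical. Because the subquery also looks back to visit 1, patient 2's visit 2 becomes 156 here, unlike the desired output above.

```python
# (ptno, visit, weight) rows from the question; None stands for a SAS missing value
sda = [
    (1, 1, 122), (1, 2, 123), (1, 3, None), (1, 4, None),
    (2, 1, 156), (2, 2, None), (2, 3, 70), (2, 4, None),
    (3, 1, 60), (3, 2, None), (3, 3, 112), (3, 4, None),
]


def prior_weight(rows, ptno, visit):
    """Weight at the highest earlier visit of the same patient with a non-missing weight."""
    earlier = [v for p, v, w in rows if p == ptno and v < visit and w is not None]
    if not earlier:
        return None
    last_visit = max(earlier)  # max(visit) ... where D.visit < S.visit
    # join back on (ptno, visit) to fetch that visit's weight
    return next(w for p, v, w in rows if p == ptno and v == last_visit)


def locf_sql_style(rows):
    # coalesce(weight, prior weight): keep observed values, otherwise carry forward
    return [(p, v, w if w is not None else prior_weight(rows, p, v)) for p, v, w in rows]


for row in locf_sql_style(sda):
    print(row)
```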
Although you don't describe it that way, you do not carry forward VISIT 1, right?
I don't know why you would want to do this in SQL. In SAS, a data step is much better suited to the task. I like using the "update trick"; if you're interested in how it works, I will leave it to you to study the UPDATE statement.
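data locf;
   /* obs=0 gives an empty master, so every row of sda is a transaction; */
   /* missing weights in a transaction do not overwrite the retained value */
   update sda(obs=0 keep=ptno) sda;
   by ptno;
   output;
   /* reset after visit 1 so its weight is never carried forward */
   if visit eq 1 then call missing(weight);
run;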
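The effect of the update trick can be sketched in Python as a running value per patient: a non-missing weight replaces it, each row is output, and the value is cleared after visit 1. This is an illustration under the assumptions that rows are sorted by ptno and visit (as BY ptno requires) and that None stands for a missing value; the name locf_update_trick is hypothetical. Its output matches the desired table in the question.

```python
sda = [
    (1, 1, 122), (1, 2, 123), (1, 3, None), (1, 4, None),
    (2, 1, 156), (2, 2, None), (2, 3, 70), (2, 4, None),
    (3, 1, 60), (3, 2, None), (3, 3, 112), (3, 4, None),
]


def locf_update_trick(rows):
    """Carry the last non-missing weight forward within each ptno, except from visit 1."""
    out = []
    current_ptno = None
    weight = None
    for ptno, visit, w in rows:
        if ptno != current_ptno:
            current_ptno = ptno
            weight = None      # new BY group: the empty master contributes nothing
        if w is not None:
            weight = w         # a non-missing transaction value overwrites
        out.append((ptno, visit, weight))  # OUTPUT statement
        if visit == 1:
            weight = None      # CALL MISSING(weight) after visit 1
    return out


for row in locf_update_trick(sda):
    print(row)
```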

Collating data in SQL Server

I have the following data in SQL Server:
St 1 2 3 4 5 6 7 8
===========================================
603 2 5 1.5 3 0 0 0 0
603 0 0 0 0 2 1 3 5
I insert the data in batches, and each batch fills only 4 of the columns. I want to collate the data into the following:
St 1 2 3 4 5 6 7 8
===========================================
603 2 5 1.5 3 2 1 3 5
However, most of the threads I find here are about concatenating strings from a single column.
Does anyone have an idea how to collate, or merge, different rows into a single row?
You can use GROUP BY together with the SUM aggregate in T-SQL:
SELECT St, SUM(COL1), SUM(COL2), ... FROM tbl GROUP BY St
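You can use the GROUP BY clause and aggregate fields 1-8 with SUM. Because these columns are named with digits, they must be quoted in brackets; SUM(1) would sum the constant 1 for each row instead of reading column 1:
SELECT St, SUM([1]), SUM([2]), ... FROM tbl GROUP BY St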
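A minimal Python sketch of what GROUP BY St with SUM does to the two batch rows: values are added column by column per St. It relies on the same assumption as the SQL answers, namely that columns a batch did not fill hold 0 (SQL's SUM would also skip NULLs, which this sketch does not handle). The function name collate is hypothetical.

```python
def collate(rows):
    """Sum columns 1..8 per St, like SELECT St, SUM([1]), ... GROUP BY St."""
    totals = {}
    for st, *values in rows:
        acc = totals.setdefault(st, [0] * len(values))
        for i, v in enumerate(values):
            acc[i] += v
    return totals


rows = [
    (603, 2, 5, 1.5, 3, 0, 0, 0, 0),
    (603, 0, 0, 0, 0, 2, 1, 3, 5),
]
print(collate(rows))
```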

How to calculate the number of pairs in an Excel spreadsheet?

I have two columns of integers between 1 and 16 in an Excel file. I'd like to count how often each pair of integers occurs across these two columns. There are 256 possible pairs, and I'd like a column that tells me how many times each pair occurs. For instance, I have a table like this:
1 2
1 1
1 3
1 4
1 1
1 8
1 1
16 16
1 2
...
And I'd like to calculate a column like this:
3 (number of 1 1s)
2 (number of 1 2s)
1 (number of 1 3s)
1 (number of 1 4s)
0 (number of 1 5s)
0 (number of 1 6s)
0 (number of 1 7s)
1 (number of 1 8s)
...
1 (number of 16 16s)
I'd appreciate it if someone could help me with the calculation.
First you need to create two columns with all possible combinations:
1 1
1 2
1 3
...
2 1
2 2
...
16 16
Let's assume these combinations are in columns C and D, and your data are in columns A and B, rows 1 to 1000. Then you can use an array formula:
=SUM(IF(($A$1:$A$1000=C1)*($B$1:$B$1000=D1);1;0))
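You must press Ctrl+Shift+Enter when entering an array formula. The formula uses ; as the argument separator, as in locales where the comma is the decimal separator; in other locales, use , instead.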
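The same two steps (list every combination, then count matching rows for each) can be sketched in Python for checking the spreadsheet results. This is an illustration, not Excel: itertools.product plays the role of the combination columns C and D, and collections.Counter replaces the array formula. The function name count_pairs is hypothetical; the sample data is the visible part of the question's table.

```python
from collections import Counter
from itertools import product


def count_pairs(pairs, low=1, high=16):
    """Return (a, b, count) for every combination of a and b in low..high."""
    counts = Counter(pairs)
    # one entry per combination, including pairs that never occur (count 0)
    return [(a, b, counts[(a, b)]) for a, b in product(range(low, high + 1), repeat=2)]


data = [(1, 2), (1, 1), (1, 3), (1, 4), (1, 1), (1, 8), (1, 1), (16, 16), (1, 2)]
table = count_pairs(data)
print(len(table))  # 256
print(table[:8])
```

The first eight entries reproduce the expected column from the question: 3, 2, 1, 1, 0, 0, 0, 1.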