How do I use query results in the current layer in Hive?


For example:
select
sum(a) over(partition by aa) as x,
sum(b) over(partition by bb) as y,
(x- y) / y as c -- look here
from
table
I want to use x and y as variables here. How can I avoid writing it as
(sum(a) over(partition by aa) - sum(b) over(partition by bb)) / sum(b) over(partition by bb) as c

Hive does not let you reference a column alias in the same SELECT list that defines it. Instead, use your query as an inner query (a derived table), like below:
select x, y, (x - y) / y as c from
(select
sum(a) over(partition by aa) as x,
sum(b) over(partition by bb) as y
from
table1) t
Alternatively, if you have an identifier column, say id, you can use a WITH clause (common table expression) in Hive:
with sub_table as (
select id, sum(a) over(partition by aa) as x,
sum(b) over(partition by bb) as y
from table1)
select a.id,
b.x,
b.y,
(b.x- b.y) / b.y as c
from table1 a
join sub_table b
on(a.id = b.id);
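A minimal, runnable sketch of the derived-table approach, using Python's built-in sqlite3 module as a stand-in for Hive. Assumptions: SQLite 3.25 or later (needed for window functions), and a made-up table1 with the question's columns a, aa, b and bb. The point it shows is that the outer query can reference x and y because the inner query already defined them.

```python
import sqlite3

# Hypothetical table1 with the columns used in the question (sample values are made up).
rows = [
    (1.0, "p", 2.0, "q"),
    (3.0, "p", 4.0, "q"),
    (5.0, "r", 6.0, "s"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a REAL, aa TEXT, b REAL, bb TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", rows)

# The inner query defines the aliases x and y; the outer query is free to use them.
query = """
SELECT x, y, (x - y) / y AS c
FROM (
    SELECT sum(a) OVER (PARTITION BY aa) AS x,
           sum(b) OVER (PARTITION BY bb) AS y
    FROM table1
) t
"""
result = conn.execute(query).fetchall()
for x, y, c in result:
    print(x, y, c)
```

REAL columns are used so that (x - y) / y is a floating-point division, which matches Hive's `/` operator returning a double.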

Related

Which window function is faster?

count(*) OVER (PARTITION BY a, b ORDER BY a, b, c) * 10
This produces the same result as:
dense_rank() OVER (PARTITION BY a, b ORDER BY a, b, c) * 10
Used in a query like this:
SELECT
dense_rank() OVER (ORDER BY a, b) ,
a || b,
count(*) OVER (
PARTITION BY a, b
ORDER BY a, b, c
) * 10 ,
a2,
b1,
c1,
cc1,
c2
FROM
join ....
ORDER BY 1, 6;
I'm happy with my query result, but should I prefer one approach over the other, and why?

After PARTITION BY a, b there is no point in adding a or b to ORDER BY, as David commented.
So we simplify to:
count(*) OVER (PARTITION BY a, b ORDER BY c) * 10
dense_rank() OVER (PARTITION BY a, b ORDER BY c) * 10
These two are only equivalent as long as c is unique within each (a, b) partition; otherwise they produce different numbers.
You'd need to define exactly what the number is supposed to signify, and show your table definition and the exact query, because joins can introduce duplicates and NULL values.
row_number() and rank() are similar window functions ...
Performance is practically the same for all of them.
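To see why uniqueness of c matters, here is a small sketch run against SQLite via Python's sqlite3 module (an assumption: the question does not name its database, and SQLite 3.25+ is needed for window functions). With ORDER BY, the default window frame includes all peer rows with the same c, so count(*) jumps past ties while dense_rank() does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
# Partition (1, 1) has unique c values; partition (2, 2) has a tie on c = 5.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 10), (1, 1, 20), (1, 1, 30), (2, 2, 5), (2, 2, 5), (2, 2, 7)],
)

query = """
SELECT a, b, c,
       count(*)     OVER (PARTITION BY a, b ORDER BY c) * 10 AS by_count,
       dense_rank() OVER (PARTITION BY a, b ORDER BY c) * 10 AS by_dense_rank
FROM t
ORDER BY a, b, c
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

In partition (1, 1) both columns agree; in partition (2, 2) the two tied rows get 20 from count(*) but 10 from dense_rank().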

SQL multiply by AVG

I have a column ([A]) with some numbers, say A1, A2, A3, ..., and I need a second column ([B]) that is the result of multiplying the first one by the average of all its numbers, e.g.
B1 = A1 * (A1 + A2 + ... + An) / n
B2 = A2 * (A1 + A2 + ... + An) / n
and so on. MS SQL Server 2016

This should do it:
SELECT a * AVG(a) OVER () AS b
FROM t
AVG(a) is identical to SUM(a) / COUNT(a).
AVG(a) OVER () returns, on every row, the same value as SELECT AVG(a) FROM t.

Try this:
select A as A, A * (select sum(A) from tbl) / (select count(A) from tbl) as B
from tbl

You can use window functions:
select A, A * (sum(A) over () / count(A) over ()) as B
from tbl

Use a derived table (the subquery) to calculate sum(A) and n, then CROSS JOIN it:
select t1.A * t2.sumA / t2.n as B
from tablename t1
cross join (select sum(A) as sumA, count(*) as n from tablename) t2
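The following sketch checks that the window-function answer and the cross-join answer give the same B values. It runs on SQLite through Python's sqlite3 module rather than SQL Server 2016 (an assumption made so the example is self-contained), with made-up sample values. The column is declared REAL because, if A were an integer column, SQL Server's AVG and integer division would both truncate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# REAL values avoid integer truncation in AVG and in sumA / n.
conn.execute("CREATE TABLE tbl (A REAL)")
conn.executemany("INSERT INTO tbl VALUES (?)", [(1.0,), (2.0,), (3.0,), (6.0,)])

# Window-function version: AVG(A) OVER () repeats the overall average on every row.
window_query = "SELECT A, A * AVG(A) OVER () AS B FROM tbl ORDER BY A"

# Cross-join version: a one-row derived table holding sum(A) and n.
cross_join_query = """
SELECT t1.A, t1.A * t2.sumA / t2.n AS B
FROM tbl t1
CROSS JOIN (SELECT sum(A) AS sumA, count(*) AS n FROM tbl) t2
ORDER BY t1.A
"""

by_window = conn.execute(window_query).fetchall()
by_cross_join = conn.execute(cross_join_query).fetchall()
print(by_window)
print(by_cross_join)
```

The average here is 12 / 4 = 3, so each B is three times its A.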

Select MAX(DateTime) returning multiple lines

I'm trying to select the last MAX(DateTime) status from the table "Zee", but if two rows have the same DateTime it returns two lines, and I would like to get only the last one (maybe the last inserted?).
Here is the query:
SELECT Z."ID" AS ID,Z."A" AS A,Z."B" AS B,Z."C" AS C,Z."D" AS D
FROM ZEE Z
INNER JOIN
(SELECT A, B, MAX(C) AS C
FROM ZEE
GROUP BY A, B) groupedtt
ON Z.A = groupedtt.A
AND Z.B = groupedtt.B
AND Z.C = groupedtt.C
WHERE (
Z.B = 103
OR Z.B = 104
);

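I usually use rank() for this kind of thing; adding rownum to the ORDER BY breaks ties between rows with the same C, so only one row per group gets rank 1:
select Z."ID" AS ID, Z."A" AS A, Z."B" AS B, Z."C" AS C, Z."D" AS D
from (select Z.*, rank() over (partition by A, B order by C desc, rownum) r from ZEE Z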
)Z where Z.r=1

Use the ROW_NUMBER() analytic function (you will also eliminate the self-join):
SELECT ID, A, B, C, D
FROM (
SELECT ID,
A,
B,
C,
D,
ROW_NUMBER() OVER ( PARTITION BY A, B ORDER BY C DESC ) As rn
FROM ZEE
)
WHERE rn = 1;
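Here is a small sketch contrasting the original self-join with the ROW_NUMBER() answer, run on SQLite through Python's sqlite3 module instead of Oracle (assumption), with made-up rows where two rows tie on the maximum C. Dates are stored as ISO-format TEXT, which sorts chronologically. The extra ID DESC tiebreaker is a hypothetical addition that treats the highest ID as "last inserted"; without some tiebreaker, which tied row wins is arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ZEE (ID INTEGER, A INTEGER, B INTEGER, C TEXT, D TEXT)")
# Rows 2 and 3 share the same maximum C within group (A=1, B=103).
conn.executemany(
    "INSERT INTO ZEE VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 103, "2020-01-01 10:00", "old"),
        (2, 1, 103, "2020-01-02 10:00", "x"),
        (3, 1, 103, "2020-01-02 10:00", "y"),
        (4, 2, 104, "2020-01-03 09:00", "only"),
    ],
)

# The self-join keeps every row that matches the group maximum, so ties survive.
self_join = conn.execute("""
    SELECT Z.ID FROM ZEE Z
    JOIN (SELECT A, B, MAX(C) AS C FROM ZEE GROUP BY A, B) g
      ON Z.A = g.A AND Z.B = g.B AND Z.C = g.C
    ORDER BY Z.ID
""").fetchall()

# ROW_NUMBER() numbers rows 1, 2, ... inside each (A, B) group, so rn = 1 is exactly one row.
latest = conn.execute("""
    SELECT ID, A, B, C, D
    FROM (
        SELECT ID, A, B, C, D,
               ROW_NUMBER() OVER (PARTITION BY A, B ORDER BY C DESC, ID DESC) AS rn
        FROM ZEE
    ) z
    WHERE rn = 1
    ORDER BY A, B
""").fetchall()

print(self_join)
print(latest)
```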

Group aggregation and descriptive columns

A group is defined by columns a, b and c. Columns x, y and z are the same within each group. Sample:
a|b|c|x|y|z| ....
1|1|1|p|r|s
1|1|1|p|r|s
1|1|1|p|r|s
2|1|2|t|u|v
2|1|2|t|u|v
I want to achieve the following, but without using aggregate functions (max(t.x), ...):
select t.a, t.b, t.c,count(*), t.x, t.y, t.z, ....
from t
group by t.a, t.b, t.c;
Is there any other function that can be used in the select statement to include columns x,y and z?
Would you rather use another join to add the descriptive columns?

If the columns are the same within a group, just include them in the group by clause:
select t.a, t.b, t.c,count(*), t.x, t.y, t.z, ....
from t
group by t.a, t.b, t.c, t.x, t.y, t.z
If you want an arbitrary row with the count, use window functions:
select t.*
from (select t.*,
count(*) over (partition by a, b, c) as cnt,
row_number() over (partition by a, b, c order by (select NULL)) as seqnum
from t
) t
where seqnum = 1
The order by (select NULL) works in SQL Server; I'm not sure whether it works in Netezza. Any expression will work for the order by.
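A sketch of both approaches on the question's sample data, run on SQLite through Python's sqlite3 module rather than Netezza or SQL Server (assumption). Following the note that any expression works for the order by, it uses ORDER BY x in place of (select NULL); since x is constant within a group, the chosen row is still arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER, x TEXT, y TEXT, z TEXT)")
sample = [
    (1, 1, 1, "p", "r", "s"),
    (1, 1, 1, "p", "r", "s"),
    (1, 1, 1, "p", "r", "s"),
    (2, 1, 2, "t", "u", "v"),
    (2, 1, 2, "t", "u", "v"),
]
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)", sample)

# Approach 1: x, y, z are constant per group, so grouping by them creates no extra groups.
grouped = conn.execute("""
    SELECT a, b, c, count(*), x, y, z
    FROM t
    GROUP BY a, b, c, x, y, z
    ORDER BY a, b, c
""").fetchall()

# Approach 2: keep one row per group and attach the group size as cnt.
windowed = conn.execute("""
    SELECT a, b, c, cnt, x, y, z
    FROM (SELECT t.*,
                 count(*) OVER (PARTITION BY a, b, c) AS cnt,
                 row_number() OVER (PARTITION BY a, b, c ORDER BY x) AS seqnum
          FROM t) s
    WHERE seqnum = 1
    ORDER BY a, b, c
""").fetchall()

print(grouped)
print(windowed)
```

Both queries return the same two rows here only because x, y and z really are constant within each group; if they varied, the GROUP BY version would split the group.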

Select distinct rows from two fields

I have a table that has millions of records.
So I might have these columns
a, b, c, d
I need to select all the distinct records based on columns a and b.
But I need to select columns a, b, c and d not just a and b.
Can I do this?
Edit:
Data might be:
1,1,frog,green
1,1,frog,brown
2,1,cat,black
2,4,dog,white
So I need:
1,1,frog,green
2,1,cat,black
2,4,dog,white

SQL Server supports common table expressions and window functions. The query below uses ROW_NUMBER(), which numbers the records within each (a, b) group. The ORDER BY c ASC, d ASC decides which row in each group gets rn = 1, so change it to pick a different row.
WITH records
AS
(
SELECT a, b, c, d,
ROW_NUMBER() OVER(PARTITION BY a, b ORDER BY c, d) rn
FROM TableName
)
SELECT a, b, c, d
FROM records
WHERE rn = 1

partition by is your man
SELECT a, b, c, d FROM (
SELECT a, b, c, d, ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY a, b) rn
FROM TableName
) sq
where rn = 1

Please try:
select *
From(
select
row_number() over (partition by a, b order by a, b) RNum,
*
from
YourTable
)x
where RNum=1
Sample using the data from the question:
select * From(
select row_number() over (partition by a, b order by a, b) RNum, *
from(
select 1 a, 1 b, 'frog' c, 'green' d union all
select 1 a, 1 b, 'frog' c, 'brown' d union all
select 2 a, 1 b, 'cat' c, 'black' d union all
select 2 a, 4 b, 'dog' c, 'white' d)x
)y
where RNum=1
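A sketch of the CTE answer on the question's data, run on SQLite through Python's sqlite3 module instead of SQL Server (assumption). It shows that the ORDER BY inside OVER() is what decides which (1, 1) row survives: ordering by c, d keeps 'brown', while c, d DESC keeps 'green', which is the row the question expects. The helper first_per_group is hypothetical and only interpolates trusted constant strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (a INTEGER, b INTEGER, c TEXT, d TEXT)")
conn.executemany(
    "INSERT INTO TableName VALUES (?, ?, ?, ?)",
    [(1, 1, "frog", "green"), (1, 1, "frog", "brown"),
     (2, 1, "cat", "black"), (2, 4, "dog", "white")],
)

def first_per_group(order_by):
    # order_by comes from trusted constants only; it decides which row gets rn = 1.
    query = f"""
    WITH records AS (
        SELECT a, b, c, d,
               ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY {order_by}) AS rn
        FROM TableName
    )
    SELECT a, b, c, d FROM records WHERE rn = 1 ORDER BY a, b
    """
    return conn.execute(query).fetchall()

ascending = first_per_group("c, d")
descending = first_per_group("c, d DESC")
print(ascending)
print(descending)
```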
