How do I get the latest record? - hive

This is my table:
create table test (
id string,
name string,
age string,
modified string)
and this is my data:
id name age modifed
1 a 10 2011-11-11 11:11:11
1 a 11 2012-11-11 12:00:00
2 b 20 2012-12-10 10:11:12
2 b 20 2012-12-10 10:11:12
2 b 20 2012-12-12 10:11:12
2 b 20 2012-12-15 10:11:12
I want to get the latest record (include every columns id,name,age,modified) group by id,as the data above,the correct result is:
1 a 11 2012-11-11 12:00:00
2 b 20 2012-12-15 10:11:12
I am using below query in hive, it is working fine in sql http://sqlfiddle.com/#!2/bfbd5/42 but it is not working fine in hive
select * from test where (id, modified) in(select id, max(modified) from test group by id)
I am using 0.13 version of hive.

Hive only allows one column in an IN subquery. Try a left semijoin:
SELECT *
FROM test a
LEFT SEMI JOIN
(select id, max(modified) as modified from test) b
ON (a.modified = b.modified and a.id=b.id);
It sure seems like you could easily get he right answer using a straight forward query though. Select the max of the two columns and be sure to group by the columns that don't have aggregate functions.
select id
, name
, max(age) as age
, max(modified) as modified
from test
group by id, name;

Related

How to grouping with distinct or group max sql

i have date like this Data
id name period difference
6172 A 6 10
6172 A 3 10
10099 AB 12 24
10099 AB 6 24
10099 AB 3 24
10052 ABC 12 26
10052 ABC 6 26
10052 ABC 3 26
9014 ABCD 12 21
9014 ABCD 6 21
9014 ABCD 3 21
how to get result like this
id name period difference
6172 A 6 10
10099 AB 12 24
10052 ABC 12 26
9014 ABCD 12 4
i try with distinct on (id), but the result like this
id name period difference
6172 A 6 10
10099 AB 6 24
10052 ABC 6 26
9014 ABCD 6 4
The query you want looks something like:
SELECT DISTINCT ON (id) *
FROM Data
ORDER BY id, period DESC;
Demo
This is probably the most efficient way to write your query on Postgres. Note that DISTINCT ON syntax does not support more than one column in the ON clause. The above logic happens to work here assuming that id would uniquely identify each group (that is, that id would always be unique). If not, then we might have to resort to using ROW_NUMBER with a partition over id and name.
using max()
select id, name, t2.period, difference from tableA t1
inner join
(select id, max(period) as period from tableA
group by id) t2 on t2.id = t1.id
using distinct()
select distinct id, name, t2.period, difference from tableA
it seems you need just max()
select id,name,max(period),max(difference)
from table group by id,name
Though i have not found difference=4 in your sample data but you used that on output,so i guessed its your typo
Use max()
select id, name, max(period), difference from tablename
group by id, name,difference
You can try my code:
SELECT
id, name, max(period), difference
FROM
data_table
group by id, name,difference
order by name
This is a demo link http://sqlfiddle.com/#!17/9ab8d/2

SQL Server get distinct counts of name by each ID

I have a dataset like :
ID NAME
1 Aaron
2 Theon
3 Jon Snow
4 Jon Snow
4 Dany
5 Arya
5 Robert
5 Tyrion
I need to add a new column to this that shows the output based on the number of distinct names per ID. So expected output would be:
ID NAME Mapping
1 Aaron 1
2 Theon 1
3 Jon Snow 1
4 Jon Snow 2
4 Dany 2
5 Arya 3
5 Robert 3
5 Tyrion 3
I am confused about how to achieve this since I have tried a case statement where count(distinct(name)) does not return the right values.
You may try using COUNT as an analytic function:
SELECT
ID,
Name,
COUNT(*) OVER (PARTITION BY ID) Mapping
FROM yourTable
ORDER BY
ID;
Another approach to get COUNT of DISTINCT Name for each ID
SELECT *,
(SELECT Count(DISTINCT NAME)
FROM #table T
WHERE T1.id = T.id) Mapping
FROM #table T1
Online Demo
You can simply use below query
SELECT COUNT(DISTINCT NAME)
FROM YOUR_TABLE
GROUP BY ID
Thanks
Other method (specif SQL Server, otherwise use INNER JOIN LATERAL):
SELECT *
FROM #table f1
CROSS APPLY
(
select Count(*) Nb from #table f2
where f2.ID=f1.ID
) f3

Assigning a value of data for each record having the same condition in SQL Server 2008

I have a table in SQL Server 2008 like:
Period Name Value
1 A 10
2 A 20
3 A 30
4 A 40
1 B 50
2 B 80
3 B 70
4 B 60
What I need to write a select query includes a new column MainValue which contains the value where period=4 for a name for each data.
Example:
Period Name Value MainValue
1 A 10 40
2 A 20 40
3 A 30 40
4 A 40 40
1 B 50 60
2 B 80 60
3 B 70 60
4 B 60 60
How can I provide this? I tried the one below, but it is not working as I want.
Select
*,
(select Value where Period = 4) as MainValue
from myTable;
Any help would be appreciated.
Try this:
SELECT Period, Name, Value,
MAX(CASE WHEN Period=4 THEN Value END) OVER (PARTITION BY Name) AS MainValue
FROM mytable
The query uses a window function with a condition applied over Name partitions: the function returns the Value corresponding to Period=4 inside each partition.
You can do this a number of ways. A correlated sub-query as the column, a cross apply to a correlated query, or a cte. I personally like the cte approach. It would look something like this.
with MainValues as
(
select Name
, Value
from SomeTable
where Period = 4
)
select st.*
, mv.Value as MainValue
from SomeTable st
join MainValues mv on st.Name = mv.Name

How to use a select command to find all the records that has the maximum date value for a specific item?

Say I have a table like this, we call it tbl_test
ID thedate actionid songid
1 2014-10-01 100 10
2 2014-09-30 100 10
3 2014-10-01 80 10
4 2014-09-30 80 10
5 2014-10-01 80 21
6 2014-09-30 100 21
Now I want to find all the record thats in the tbl_test where actionid=100 and with the latest [thedate] value. In this case, I want the final select result to be
(this is the result I want, not an existing table)
ID thedate actionid songid
1 2014-10-01 100 10
6 2014-09-30 100 21
Question, how am I going to do that use nothing but a single select command in MS SQL Server?
Use a join to a query that returns the latest date for each song:
select tbl_test.*
from tbl_test
join (select songid, max(theDate) maxDate
from tbl_test
where actionId = 100
group by songid) t on t.songId = tbl_test.songId and theDate = maxDate
where actionid = 100
This should perform pretty well as it makes only 2 passes over the table - one for the inner query that determines the latest date, and another to output the matching rows
A general SQL way to get this is using not exists:
select t.*
from tbl_test t
where actionid = 100 and
not exists (select 1
from tbl_test t2
where t2.songid = t.songid and t2.actionid = 100 and t2.thedate > t.thedate
);
For performance, you want an index on songid, actionid, thedate.

PostgreSQL query issues

I have a DB with multiple tables which also have multiple rows/columns with a layout similar to that shown below.
the site is in a table named sites and ID and Type are in a table labelled Site_EQ
Site ID Type
A0004 2 abc
A0004 3 abcd
A0004 4 abcde
A0005 2 abc
A0005 3 abcd
A0005 4 abcde
A0005 5 abc
A0005 6 abcd
A0005 7 abcde
Essentially what I am trying to do is filter the results by site finding the highest ID value per site and removing the others, so if for example A0010 had ID's 1-20 I would like the result to show.
A0010 20 Bla
and ignore the
A0010 1 Bla
A0010 2 Bla
and so on, but am not sure how to go about doing so as there is no set number of ID it could be 1-3 or 1-30, essentially giving me 30 results for a single site with only 1 column different (which I would like to filter to the highest value only).
Try:
select Site, ID, Type from
(select s.*, row_number() over (partition by Site order by ID desc) rn
from Site_EQ) q
where rn=1
This should do it:
SELECT T1.Site, T1.ID, T1.Type
FROM SomeTable T1, (SELECT Site, MAX(ID) AS ID
FROM SomeTable
GROUP BY Site) T2
WHERE T2.Site = T1.Site
AND T2.ID = T1.ID
The trick is have a sub query that gives you the Site + maximum ID, you can't retrieve the Type at the same time though due to the grouping so you need to join it again with the actual table.
Here you can find the explanation on Table Alias (T1 and T2):
http://www.postgresql.org/docs/9.1/static/queries-table-expressions.html#QUERIES-TABLE-ALIASES