How to Drop Output Field - hive

Problem:
My SELECT statement results with two outputs, but the requirement is for ONLY ID:
My current Hive QL output:
ID player.weight
My current Hive Query:<br></br>
SELECT DISTINCT b.id ,m.weight as weight
FROM batting b
JOIN
Master m on (m.id = b.id)
WHERE b.year = 2005 AND b.triples >4
ORDER BY weight DESC
LIMIT 1;
Is there a way to save my results, and then query the first ID column? Thanks for any help.

I think I figured it out. I needed to wrap the whole thing as a subquery, then just select the id.

Related

syntax error in Hive Query

I am trying to answer this question
Of the right-handed batters who were born in October and died in
2011, which one had the most hits in his career?
My attempt to get the query, Please ignore the total, it supposed to for sums for b.hits, dont know how to alias it.
SELECT n.id, n.bmonth, n.dyear,n.bats, SUM(b.hits) FROM master n
JOIN (SELECT b.id , b.hits FROM batting GROUP BY id) o
WHERE n.bmonth == 10 AND n.dyear == 2011) x
ON x.id=n.id
ORDER BY total DESC;
Incase anyone needs the schema of the two tables used, look below.
INSERT OVERWRITE DIRECTORY '/home/hduser/hivetest/answer4'
SELECT n.id, n.bmonth, n.dyear,n.bats, SUM(b.hits) FROM master n
JOIN (SELECT b.id , b.hits FROM batting GROUP BY id) o
WHERE n.bmonth == 10 AND n.dyear == 2011) x
ON x.id=n.id
ORDER BY total DESC;
First, although Hive accepts ==, that doesn't mean you should use it. The standard SQL equality operator is simply =. There is no reason to use a synonym.
I suspect the problem is several things:
The lack of group by.
Mis-use of aggregation functions.
Missing aliases
SQL query clauses in the correct order
Unbalanced parentheses
In other words, the query is just a mess. You need to review the basics of query syntax. Does this work?
SELECT m.id, m.bmonth, m.dyear, m.bats, b.hits as total
FROM master m JOIN
(SELECT b.id, SUM(b.hits) as hits
FROM batting b
GROUP BY id
) b
ON b.id = m.id
WHERE m.bmonth = 10 AND m.dyear = 2011
ORDER BY total DESC;

How can I join 3 tables and calculate the correct sum of fields from 2 tables, without duplicate rows?

I have tables A, B, C. Table A is linked to B, and table A is linked to C. I want to join the 3 tables and find the sum of B.cost and the sum of C.clicks. However, it is not giving me the expected value, and when I select everything without the group by, it is showing duplicate rows. I am expecting the row values from B to roll up into a single sum, and the row values from C to roll up into a single sum.
My query looks like
select A.*, sum(B.cost), sum(C.clicks) from A
join B
left join C
group by A.id
having sum(cost) > 10
I tried to group by B.a_id and C.another_field_in_a also, but that didn't work.
Here is a DB fiddle with all of the data and the full query:
http://sqlfiddle.com/#!9/768745/13
Notice how the sum fields are greater than the sum of the individual tables? I'm expecting the sums to be equal, containing only the rows of the table B and C once. I also tried adding distinct but that didn't help.
I'm using Postgres. (The fiddle is set to MySQL though.) Ultimately I will want to use a having clause to select the rows according to their sums. This query will be for millions of rows.
If I understand the logic correctly, the problem is the Cartesian product caused by the two joins. Your query is a bit hard to follow, but I think the intent is better handled with correlated subqueries:
select k.*,
(select sum(cost)
from ad_group_keyword_network n
where n.event_date >= '2015-12-27' and
n.ad_group_keyword_id = 1210802 and
k.id = n.ad_group_keyword_id
) as cost,
(select sum(clicks)
from keyword_click c
where (c.date is null or c.date >= '2015-12-27') and
k.keyword_id = c.keyword_id
) as clicks
from ad_group_keyword k
where k.status = 2 ;
Here is the corresponding SQL Fiddle.
EDIT:
The subselect should be faster than the group by on the unaggregated data. However, you need the right indexes: ad_group_keyword_network(ad_group_keyword_id, ad_group_keyword_id, event_date, cost) and keyword_click(keyword_id, date, clicks).
I found this (MySQL joining tables group by sum issue) and created a query like this
select *
from A
join (select B.a_id, sum(B.cost) as cost
from B
group by B.a_id) B on A.id = B.a_id
left join (select C.keyword_id, sum(C.clicks) as clicks
from C
group by C.keyword_id) C on A.keyword_id = C.keyword_id
group by A.id
having sum(cost) > 10
I don't know if it's efficient though. I don't know if it's more or less efficient than Gordon's. I ran both queries and this one seemed faster, 27s vs. 2m35s. Here is a fiddle: http://sqlfiddle.com/#!15/c61c74/10
Simply split the aggregate of the second table into a subquery as follows:
http://sqlfiddle.com/#!9/768745/27
select ad_group_keyword.*, SumCost, sum(keyword_click.clicks)
from ad_group_keyword
left join keyword_click on ad_group_keyword.keyword_id = keyword_click.keyword_id
left join (select ad_group_keyword.id, sum(cost) SumCost
from ad_group_keyword join ad_group_keyword_network on ad_group_keyword.id = ad_group_keyword_network.ad_group_keyword_id
where event_date >= '2015-12-27'
group by ad_group_keyword.id
having sum(cost) > 20
) Cost on Cost.id=ad_group_keyword.id
where
(keyword_click.date is null or keyword_click.date >= '2015-12-27')
and status = 2
group by ad_group_keyword.id

SQL - extract column from subquery

I'm working on a database meant for auctions and I would like get the id of all the winning bids. The hard part is extracting it from a subquery that returns 2 columns: the id and the amount.
It looks something like this:
SELECT id FROM Bid WHERE id IN (Select ID,max(amount) FROM Bid group by bid.idAuction)
Can I somehow extract just one column from the subquery? Any other sugestions to do this task are helpfull too.
Thank you!
Your query is close, but you need a correlated subquery to make this work:
SELECT b.id
FROM Bid b
WHERE b.amount = (SELECT max(amount)
FROM Bid b2
WHERE b2.idAuction = b.idAuction
);
SELECT id, maxBid.MAmount, Bid.Amount
FROM Bid
INNER JOIN (Select ID,max(amount) mamount FROM Bid group by bid.idAuction) MaxBid
on MaxBid.ID = Bid.ID
RDBMS and SQL operate most effectively in SET based operations. So in this case we generate a set based on ID and max bid. We then join it back to the base set so that only the max bids are treturned.

Rewrite SQL and use of group by

I have written below sql for one of the requirement and is fetching my results. But, I am wondering if there is any better way of writing this query rather than using alias table as A.
SELECT A.*,B.OPRDEFNDESC FROM
( select OPRID_ENTERED_BY ,COUNT(*)
from ps_req_hdr
where entered_dt > '01-JUL-2012'
GROUP BY OPRID_ENTERED_BY
ORDER BY COUNT(*) DESC) A, PSOPRDEFN B
WHERE A.OPRID_ENTERED_BY=B.OPRID
You may be able to use a simple INNER JOIN to do the same thing...
SELECT A.OPRID_ENTERED_BY, COUNT(*), B.OPRDEFNDESC
FROM ps_req_hdr A
JOIN PSOPRDEFN B ON A.OPRID_ENTERED_BY = B.OPRID
WHERE A.entered_dt > '01-JUL-2012'
GROUP BY A.OPRID_ENTERED_BY, B.OPRDEFNDESC
ORDER BY COUNT(*) DESC
NOTE
As per the comments below, the COUNT(*) result for this query will NOT include records that don't have corresponding matches in table B, and it will inflate for non-unique matches in table B. What this means is: if B.OPRID is not a unique field or if A.OPRID_ENTERED_BY is not a foreign key for B.OPRID then this answer will not yield the same results as the original query.

How can you add 2 joins in a subquery?

I am trying to get information from 3 tables in my database. I am trying to get 4 fields. 'kioskid', 'kioskhours', 'videotime', 'sessiontime'. In order to do this, i am trying a join in a subquery. This is what I have so far:
SELECT k.kioskid, k.hours, v.time, s.time
FROM `nsixty_kiosks` as k
LEFT JOIN (SELECT time
FROM `nsixty_videos`
ORDER BY videoid) as v
ON kioskid = k.kioskid LEFT JOIN
(SELECT kioskid, time
FROM `sessions`
ORDER BY pingid desc LIMIT 1) as s ON s.kioskid = k.kioskid
WHERE hours is NOT NULL
When I run this query, it works but it shows every row instead of just showing the last row of each kiosk id. Which is meant to show based on the line 'ORDER BY pingid desc LIMIT 1'.
Any body have some ideas?
Instead of joining to s, you can use a correlated subquery:
SELECT k.kioskid,
k.hours,
v.time,
( SELECT time
FROM sessions
WHERE sessions.kioskid = k.kioskid
ORDER
BY pingid DESC
LIMIT 1
)
FROM nsixty_kiosks AS k
LEFT
JOIN ( SELECT time
FROM `nsixty_videos`
ORDER BY videoid
) AS v
ON kioskid = k.kioskid
WHERE hours IS NOT NULL
;
N.B. I didn't fix your LEFT JOIN (...) AS v, because I don't understand what it's trying to do, but it too is broken; the ON clause doesn't refer to any of its columns, and there's no point in having an ORDER BY in a subquery unless you also have a LIMIT or whatnot in there.
Well, your join on the 'v' subquery doesn't actually reference the 'v' subquery, nor does the 'v' subquery even contain a kioskid field to JOIN on, so that's undoubtedly part of the problem.
To go much further we'd need to see schema and sample data.