syntax error in Hive Query - sql

I am trying to answer this question
Of the right-handed batters who were born in October and died in
2011, which one had the most hits in his career?
My attempt at the query is below. Please ignore total; it is supposed to be the sum of b.hits, but I don't know how to alias it.
SELECT n.id, n.bmonth, n.dyear,n.bats, SUM(b.hits) FROM master n
JOIN (SELECT b.id , b.hits FROM batting GROUP BY id) o
WHERE n.bmonth == 10 AND n.dyear == 2011) x
ON x.id=n.id
ORDER BY total DESC;
In case anyone needs the schema of the two tables used, look below.
INSERT OVERWRITE DIRECTORY '/home/hduser/hivetest/answer4'
SELECT n.id, n.bmonth, n.dyear,n.bats, SUM(b.hits) FROM master n
JOIN (SELECT b.id , b.hits FROM batting GROUP BY id) o
WHERE n.bmonth == 10 AND n.dyear == 2011) x
ON x.id=n.id
ORDER BY total DESC;

First, although Hive accepts ==, that doesn't mean you should use it. The standard SQL equality operator is simply =. There is no reason to use a synonym.
I suspect the problem is several things:
The lack of a GROUP BY.
Misuse of aggregation functions.
Missing aliases.
Query clauses not in the correct order.
Unbalanced parentheses.
In other words, the query is just a mess. You need to review the basics of query syntax. Does this work?
SELECT m.id, m.bmonth, m.dyear, m.bats, b.hits as total
FROM master m JOIN
(SELECT b.id, SUM(b.hits) as hits
FROM batting b
GROUP BY id
) b
ON b.id = m.id
WHERE m.bmonth = 10 AND m.dyear = 2011
AND m.bats = 'R' -- assumes 'R' marks right-handed batters
ORDER BY total DESC
LIMIT 1;
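The shape of that answer (aggregate batting per player in a subquery, then join, filter and sort) can be tried locally. This is a minimal sketch that uses Python's sqlite3 as a stand-in for Hive, with invented rows. It also adds m.bats = 'R' and LIMIT 1 to match "right-handed" and "which one" in the question, assuming 'R' is how right-handed batters are coded in master.

```python
import sqlite3


def build_sample():
    """Tiny, invented stand-in for the master and batting tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE master (id TEXT, bmonth INT, dyear INT, bats TEXT);
        CREATE TABLE batting (id TEXT, yearid INT, hits INT);
        INSERT INTO master VALUES
            ('a', 10, 2011, 'R'), ('b', 10, 2011, 'R'),
            ('c', 10, 2011, 'L'), ('d', 5, 2011, 'R');
        INSERT INTO batting VALUES
            ('a', 1990, 100), ('a', 1991, 120), ('b', 1990, 150),
            ('c', 1990, 500), ('d', 1990, 900);
    """)
    return conn


def most_hits_right_handed(conn):
    # Sum hits per player first, then join to master and filter.
    return conn.execute("""
        SELECT m.id, m.bmonth, m.dyear, m.bats, b.hits AS total
        FROM master m
        JOIN (SELECT id, SUM(hits) AS hits
              FROM batting
              GROUP BY id) b
          ON b.id = m.id
        WHERE m.bmonth = 10 AND m.dyear = 2011
          AND m.bats = 'R'  -- assumption: 'R' marks right-handed batters
        ORDER BY total DESC
        LIMIT 1
    """).fetchone()


if __name__ == "__main__":
    print(most_hits_right_handed(build_sample()))
```

Player 'c' has more hits but bats left-handed, and 'd' was born in May, so both are filtered out before the sort.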

Related

Sql - "missing right parenthesis" error explanation

My code is syntactically correct and I don't think I need any more parentheses, but it keeps raising the error "00907. 00000 - "missing right parenthesis"" without any explanation of the cause.
SELECT DISTINCT BG.name
FROM Brand_Group BG
WHERE BG.pid=(SELECT P.pid
FROM Indicia_Publisher IP
LEFT JOIN Publisher P ON IP.pid=P.pid
WHERE (IP.cid=(SELECT Country.cid
FROM Country
WHERE Country.name='Belgium') AND ROWNUM<=1)
GROUP BY P.pid
ORDER BY COUNT(P.pid));
The issue is to do with the ORDER BY in the comparison subquery - it's not valid syntax, which you can see by running the following query:
SELECT * FROM dual WHERE dummy IN (SELECT dummy FROM dual ORDER BY dummy);
Remove the ORDER BY clause and your query should run without issue.
Also, if you want distinct rows returned and you're not using an aggregate function (e.g. MAX, SUM), use DISTINCT rather than GROUP BY; it makes your intention much clearer. However, since you've restricted the results to a single row with ROWNUM <= 1, there's not much point in using either!
Your query should probably be something along the lines of:
SELECT DISTINCT bg.name
FROM brand_group bg
WHERE bg.pid = (SELECT p.pid
FROM indicia_publisher ip
LEFT JOIN publisher p
ON ip.pid = p.pid
WHERE ip.cid = (SELECT country.cid
FROM country
WHERE country.name = 'Belgium') AND rownum <= 1);
ETA: I see that I misread your original SQL slightly and misinterpreted what you wanted. It looks like you want the p.pid with the highest count, so the following should work:
SELECT DISTINCT bg.name
FROM brand_group bg
WHERE bg.pid = (SELECT pid
FROM (SELECT p.pid,
COUNT(*) cnt,
MAX(COUNT(*)) OVER () max_cnt
FROM indicia_publisher ip
LEFT JOIN publisher p
ON ip.pid = p.pid
WHERE ip.cid = (SELECT country.cid
FROM country
WHERE country.name = 'Belgium')
GROUP BY p.pid)
WHERE cnt = max_cnt
AND ROWNUM = 1);
If two or more different p.pid values share the highest count, the AND ROWNUM = 1 predicate ensures only one is picked (but which one is arbitrary). You may want to use IN rather than = in the outer query's comparison, which would remove the need for the ROWNUM = 1 predicate.
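A sketch of the IN variant, run against SQLite with invented rows. Two assumptions: the LEFT JOIN to publisher is dropped because only pid is used, and a nested MAX over the grouped counts stands in for the Oracle window function MAX(COUNT(*)) OVER ().

```python
import sqlite3


def build_sample():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE country (cid INT, name TEXT);
        CREATE TABLE indicia_publisher (pid INT, cid INT);
        CREATE TABLE brand_group (name TEXT, pid INT);
        INSERT INTO country VALUES (1, 'Belgium'), (2, 'France');
        -- Belgian counts: pid 10 -> 2, pid 20 -> 2, pid 30 -> 1
        INSERT INTO indicia_publisher VALUES
            (10, 1), (10, 1), (20, 1), (20, 1), (30, 1),
            (40, 2), (40, 2), (40, 2);
        INSERT INTO brand_group VALUES
            ('Alpha', 10), ('Beta', 20), ('Gamma', 30), ('Delta', 40);
    """)
    return conn


def brands_for_top_publishers(conn, country_name):
    # IN keeps every publisher tied for the highest count,
    # so no ROWNUM/LIMIT predicate is needed.
    sql = """
        SELECT DISTINCT bg.name
        FROM brand_group bg
        WHERE bg.pid IN (
            SELECT ip.pid
            FROM indicia_publisher ip
            WHERE ip.cid = (SELECT cid FROM country WHERE name = :c)
            GROUP BY ip.pid
            HAVING COUNT(*) = (
                SELECT MAX(cnt)
                FROM (SELECT COUNT(*) AS cnt
                      FROM indicia_publisher
                      WHERE cid = (SELECT cid FROM country WHERE name = :c)
                      GROUP BY pid) AS counts))
        ORDER BY bg.name
    """
    return [row[0] for row in conn.execute(sql, {"c": country_name})]


if __name__ == "__main__":
    print(brands_for_top_publishers(build_sample(), "Belgium"))
```

With = and ROWNUM = 1 only one of 'Alpha' or 'Beta' would come back; IN returns both tied publishers.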
Is this Oracle or SQL Server? SQL Server does not have ROWNUM, and Oracle versions before 9i do not support the LEFT JOIN syntax. Do you expect one or more values to be returned by the subquery? It looks like more than one, so change BG.pid = (...) to BG.pid IN (...). Do you need the GROUP BY in that subquery? You don't select any count in it.
I suggest you run each subquery on its own to see what it returns.
Your original query mixes syntax (Oracle/SQL Server). I am going to assume this is an Oracle database:
SELECT DISTINCT BG.name
FROM Brand_Group BG
WHERE BG.pid in
(SELECT P.pid
FROM Indicia_Publisher IP, Publisher P
WHERE
IP.pid=P.pid (+) and
(IP.cid=(SELECT Country.cid
FROM Country
WHERE Country.name='Belgium') AND ROWNUM<=1)
);

Microsoft Access SQL query count distinct

I am trying to create a query in MS Access SQL that performs two separate count functions, and have drafted the code below:
SELECT DISTINCT A.Name, Count(A.Name) AS X, Count(b.Address) AS Y
FROM PEOPLE AS A INNER JOIN PEOPLE Sub AS b ON A.PID = b.PID
GROUP BY A.Nam
The problem with this query is that both count functions return the total number of addresses for each name. I want the first count function to count names instead, so I would be grateful if someone could advise how to amend this code to turn the first count into a count distinct.
Thanks
Nick
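Your query is wrong: GROUP BY must include A.Name (I think that's a copy error).
If it isn't, correct that first. What happens if you run it without DISTINCT? Try it with SUM rather than COUNT.
DISTINCT in your query is redundant because of the GROUP BY clause.
Furthermore, it is not clear whether your 'People sub' refers to another table or is a self-join. The following code should work in most SQL dialects; note that Access SQL itself does not support COUNT(DISTINCT ...), so in Access the distinct count has to come from a subquery that selects DISTINCT rows first: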
SELECT P.Name
, COUNT(*) AS X
, COUNT(DISTINCT A.Address) AS Y
FROM PEOPLE AS P
INNER JOIN ADDRESS AS A ON A.PID = P.PID
GROUP BY P.Name
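A minimal sketch of both forms in SQLite with invented rows: COUNT(DISTINCT ...) where the engine supports it, and the de-duplicate-then-count subquery pattern that avoids it. The table and column names follow the answer; whether the second form runs unchanged in Access is an assumption, not something tested here.

```python
import sqlite3


def build_sample():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE people (pid INT, name TEXT);
        CREATE TABLE address (pid INT, address TEXT);
        INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob');
        -- Ann has three address rows but only two distinct addresses
        INSERT INTO address VALUES (1, 'x'), (1, 'x'), (1, 'y'), (2, 'z');
    """)
    return conn


def counts_with_count_distinct(conn):
    # Works where COUNT(DISTINCT ...) is supported (SQLite, most DBMSs).
    return conn.execute("""
        SELECT p.name, COUNT(*) AS x, COUNT(DISTINCT a.address) AS y
        FROM people AS p
        INNER JOIN address AS a ON a.pid = p.pid
        GROUP BY p.name
        ORDER BY p.name
    """).fetchall()


def counts_with_distinct_subquery(conn):
    # Same distinct count without COUNT(DISTINCT ...): de-duplicate in a
    # derived table, then count the surviving rows per name.
    return conn.execute("""
        SELECT d.name, COUNT(*) AS y
        FROM (SELECT DISTINCT p.name, a.address
              FROM people AS p
              INNER JOIN address AS a ON a.pid = p.pid) AS d
        GROUP BY d.name
        ORDER BY d.name
    """).fetchall()


if __name__ == "__main__":
    conn = build_sample()
    print(counts_with_count_distinct(conn))
    print(counts_with_distinct_subquery(conn))
```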

How can I join 3 tables and calculate the correct sum of fields from 2 tables, without duplicate rows?

I have tables A, B, C. Table A is linked to B, and table A is linked to C. I want to join the 3 tables and find the sum of B.cost and the sum of C.clicks. However, it is not giving me the expected value, and when I select everything without the group by, it is showing duplicate rows. I am expecting the row values from B to roll up into a single sum, and the row values from C to roll up into a single sum.
My query looks like this:
select A.*, sum(B.cost), sum(C.clicks) from A
join B on B.a_id = A.id
left join C on C.keyword_id = A.keyword_id
group by A.id
having sum(cost) > 10
I tried to group by B.a_id and C.another_field_in_a also, but that didn't work.
Here is a DB fiddle with all of the data and the full query:
http://sqlfiddle.com/#!9/768745/13
Notice how the sum fields are greater than the sums of the individual tables? I'm expecting the sums to be equal, counting each row of tables B and C only once. I also tried adding distinct but that didn't help.
I'm using Postgres. (The fiddle is set to MySQL though.) Ultimately I will want to use a having clause to select the rows according to their sums. This query will be for millions of rows.
If I understand the logic correctly, the problem is the Cartesian product caused by the two joins. Your query is a bit hard to follow, but I think the intent is better handled with correlated subqueries:
select k.*,
(select sum(cost)
from ad_group_keyword_network n
where n.event_date >= '2015-12-27' and
n.ad_group_keyword_id = 1210802 and
k.id = n.ad_group_keyword_id
) as cost,
(select sum(clicks)
from keyword_click c
where (c.date is null or c.date >= '2015-12-27') and
k.keyword_id = c.keyword_id
) as clicks
from ad_group_keyword k
where k.status = 2 ;
Here is the corresponding SQL Fiddle.
EDIT:
The correlated subqueries should be faster than a GROUP BY on the unaggregated data. However, you need the right indexes: ad_group_keyword_network(ad_group_keyword_id, event_date, cost) and keyword_click(keyword_id, date, clicks).
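To see the Cartesian product concretely, here is a minimal SQLite sketch with invented rows, using the generic A/B/C tables and the a_id/keyword_id join columns from the question. One A row with 2 B rows and 3 C rows yields 6 joined rows, so each sum is multiplied; correlated subqueries sum each child table on its own.

```python
import sqlite3


def build_sample():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (id INTEGER PRIMARY KEY, keyword_id INT);
        CREATE TABLE b (a_id INT, cost INT);
        CREATE TABLE c (keyword_id INT, clicks INT);
        INSERT INTO a VALUES (1, 100);
        INSERT INTO b VALUES (1, 5), (1, 7);               -- true cost: 12
        INSERT INTO c VALUES (100, 3), (100, 4), (100, 1); -- true clicks: 8
    """)
    return conn


def naive_sums(conn):
    # Every B row pairs with every C row (2 x 3 = 6 rows), inflating both sums.
    return conn.execute("""
        SELECT a.id, SUM(b.cost), SUM(c.clicks)
        FROM a
        JOIN b ON b.a_id = a.id
        LEFT JOIN c ON c.keyword_id = a.keyword_id
        GROUP BY a.id
    """).fetchall()


def correlated_sums(conn):
    # One correlated subquery per child table, so no cross-multiplication.
    return conn.execute("""
        SELECT a.id,
               (SELECT SUM(cost) FROM b WHERE b.a_id = a.id) AS cost,
               (SELECT SUM(clicks) FROM c
                 WHERE c.keyword_id = a.keyword_id) AS clicks
        FROM a
    """).fetchall()


if __name__ == "__main__":
    conn = build_sample()
    print(naive_sums(conn), correlated_sums(conn))
```

The naive cost is 12 x 3 and the naive clicks are 8 x 2, exactly the multiplication the question observed in the fiddle.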
I found this (MySQL joining tables group by sum issue) and created a query like this
select *
from A
join (select B.a_id, sum(B.cost) as cost
from B
group by B.a_id) B on A.id = B.a_id
left join (select C.keyword_id, sum(C.clicks) as clicks
from C
group by C.keyword_id) C on A.keyword_id = C.keyword_id
where B.cost > 10 -- B and C are already aggregated per key, so no outer GROUP BY is needed
I don't know whether it's more or less efficient than Gordon's, but I ran both queries and this one seemed faster: 27s vs. 2m35s. Here is a fiddle: http://sqlfiddle.com/#!15/c61c74/10
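The pre-aggregation idea can be checked the same way. A minimal SQLite sketch with invented rows: each child table is summed in its own derived table before the join, so every A row meets at most one B row and one C row, and the cost threshold becomes a plain WHERE. The min_cost parameter is hypothetical.

```python
import sqlite3


def build_sample():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (id INTEGER PRIMARY KEY, keyword_id INT);
        CREATE TABLE b (a_id INT, cost INT);
        CREATE TABLE c (keyword_id INT, clicks INT);
        INSERT INTO a VALUES (1, 100), (2, 200);
        INSERT INTO b VALUES (1, 5), (1, 7), (2, 3);
        INSERT INTO c VALUES (100, 3), (100, 4), (100, 1);
    """)
    return conn


def preaggregated_sums(conn, min_cost):
    # Sum B per a_id and C per keyword_id first, then join one row each.
    return conn.execute("""
        SELECT a.id, b.cost, c.clicks
        FROM a
        JOIN (SELECT a_id, SUM(cost) AS cost FROM b GROUP BY a_id) b
          ON b.a_id = a.id
        LEFT JOIN (SELECT keyword_id, SUM(clicks) AS clicks
                   FROM c GROUP BY keyword_id) c
          ON c.keyword_id = a.keyword_id
        WHERE b.cost > ?
        ORDER BY a.id
    """, (min_cost,)).fetchall()


if __name__ == "__main__":
    print(preaggregated_sums(build_sample(), 10))
```

The LEFT JOIN leaves clicks as NULL (None in Python) for an A row with no matching C rows.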
Simply split the aggregate of the second table into a subquery as follows:
http://sqlfiddle.com/#!9/768745/27
select ad_group_keyword.*, SumCost, sum(keyword_click.clicks)
from ad_group_keyword
left join keyword_click on ad_group_keyword.keyword_id = keyword_click.keyword_id
left join (select ad_group_keyword.id, sum(cost) SumCost
from ad_group_keyword join ad_group_keyword_network on ad_group_keyword.id = ad_group_keyword_network.ad_group_keyword_id
where event_date >= '2015-12-27'
group by ad_group_keyword.id
having sum(cost) > 20
) Cost on Cost.id=ad_group_keyword.id
where
(keyword_click.date is null or keyword_click.date >= '2015-12-27')
and status = 2
group by ad_group_keyword.id

GROUP BY not working in left join query

I'm trying to use a GROUP BY clause in a LEFT JOIN query and it is not working.
Please help me out, thanks in advance.
SELECT Cust_Mst_Det.Cust_Hd_Code,
Cust_Mst_Det.First_Name,
SL_HEAD20152016.vouch_date AS invoice_2,
SL_HEAD20142015.vouch_date AS invoice_1,
Cust_Mst_Hd.EMail
FROM Cust_Mst_Det
LEFT JOIN SL_HEAD20142015 ON Cust_Mst_Det.Cust_Hd_Code=SL_HEAD20142015.Member_Code
LEFT JOIN SL_HEAD20152016 ON Cust_Mst_Det.Cust_Hd_Code=SL_HEAD20152016.Member_Code
LEFT JOIN Cust_Mst_Hd ON Cust_Mst_Det.Cust_Hd_Code=Cust_Mst_Hd.Cust_Hd_Code
WHERE cust_mst_det.first_name!='NIL'
GROUP BY Cust_Mst_Det.Cust_Hd_Code
ORDER BY SL_HEAD20152016.vouch_date DESC,
SL_HEAD20142015.vouch_date
I'm not sure which DBMS you are using, but on Oracle your query will not work at all.
First issue: GROUP BY is used in conjunction with aggregate functions to group the result set by one or more columns. You do not have any aggregate function (COUNT, MAX, etc.) in your SELECT list.
Second issue: every column in your SELECT list that is not the result of an aggregate must also appear in your GROUP BY clause.
As I said, I don't know which DBMS you are using, but those two points apply to most SQL dialects.
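Applying both points, one row per customer falls out once the invoice dates are aggregated. This is a minimal SQLite sketch with invented rows, assuming the goal is each customer's latest invoice date per financial year (hence MAX). Table and column names follow the question in lower case.

```python
import sqlite3


def build_sample():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cust_mst_det (cust_hd_code TEXT, first_name TEXT);
        CREATE TABLE cust_mst_hd (cust_hd_code TEXT, email TEXT);
        CREATE TABLE sl_head20142015 (member_code TEXT, vouch_date TEXT);
        CREATE TABLE sl_head20152016 (member_code TEXT, vouch_date TEXT);
        INSERT INTO cust_mst_det VALUES ('C1', 'Ann'), ('C2', 'Bob'), ('C3', 'NIL');
        INSERT INTO cust_mst_hd VALUES ('C1', 'ann@example.com'), ('C2', 'bob@example.com');
        INSERT INTO sl_head20142015 VALUES
            ('C1', '2014-06-01'), ('C1', '2015-01-10'), ('C2', '2014-09-09');
        INSERT INTO sl_head20152016 VALUES ('C1', '2015-07-01'), ('C1', '2016-02-02');
    """)
    return conn


def latest_invoices(conn):
    # Non-aggregated columns are all in GROUP BY; the dates are aggregated,
    # so each customer collapses to exactly one row.
    return conn.execute("""
        SELECT d.cust_hd_code, d.first_name,
               MAX(s2.vouch_date) AS invoice_2,
               MAX(s1.vouch_date) AS invoice_1,
               h.email
        FROM cust_mst_det d
        LEFT JOIN sl_head20142015 s1 ON s1.member_code = d.cust_hd_code
        LEFT JOIN sl_head20152016 s2 ON s2.member_code = d.cust_hd_code
        LEFT JOIN cust_mst_hd h ON h.cust_hd_code = d.cust_hd_code
        WHERE d.first_name <> 'NIL'
        GROUP BY d.cust_hd_code, d.first_name, h.email
        ORDER BY invoice_2 DESC, invoice_1
    """).fetchall()


if __name__ == "__main__":
    for row in latest_invoices(build_sample()):
        print(row)
```

Joining both invoice tables still multiplies rows (C1 produces 2 x 2 before grouping), but MAX is unaffected by duplicates, unlike SUM or COUNT.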
It appears that it is impossible to use an ORDER BY on a GROUP BY summarisation. My fundamental logic is flawed. I will need to run the following subquery.
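For example:
SELECT p.*, pp.price
FROM products p
LEFT JOIN ( SELECT product_id, price FROM product_price ORDER BY date_updated DESC ) pp
ON p.product_id = pp.product_id GROUP BY p.product_id;
This will take a performance hit, but since it is the same subquery for each row it shouldn't be too bad. Note that MySQL does not guarantee which row GROUP BY keeps for non-aggregated columns, and the optimizer may ignore an ORDER BY inside a derived table, so this is not a reliable way to get the latest price.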
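A deterministic alternative is to find each product's MAX(date_updated) first and join back on it. This is a minimal SQLite sketch with invented rows, reusing the products/product_price names from the example; it assumes no product has two price rows with the same date_updated (ties would return both).

```python
import sqlite3


def build_sample():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (product_id INT, name TEXT);
        CREATE TABLE product_price (product_id INT, price REAL, date_updated TEXT);
        INSERT INTO products VALUES (1, 'pen'), (2, 'ink'), (3, 'pad');
        INSERT INTO product_price VALUES
            (1, 1.0, '2020-01-01'), (1, 1.5, '2021-01-01'), (2, 9.0, '2020-05-05');
    """)
    return conn


def latest_prices(conn):
    # Step 1: latest update date per product. Step 2: join back to the
    # price row carrying that date. Products with no price keep a NULL.
    return conn.execute("""
        SELECT p.product_id, p.name, pp.price
        FROM products p
        LEFT JOIN (SELECT product_id, MAX(date_updated) AS last_update
                   FROM product_price
                   GROUP BY product_id) latest
          ON latest.product_id = p.product_id
        LEFT JOIN product_price pp
          ON pp.product_id = latest.product_id
         AND pp.date_updated = latest.last_update
        ORDER BY p.product_id
    """).fetchall()


if __name__ == "__main__":
    print(latest_prices(build_sample()))
```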

Self Join bringing too many records

I have this query to express a set of business rules.
To get the information I need, I tried joining the table on itself but that brings back many more records than are actually in the table. Below is the query I've tried. What am I doing wrong?
SELECT DISTINCT a.rep_id, a.rep_name, count(*) AS 'Single Practitioner'
FROM [SE_Violation_Detection] a inner join [SE_Violation_Detection] b
ON a.rep_id = b.rep_id and a.hcp_cid = b.hcp_cid
group by a.rep_id, a.rep_name
having count(*) >= 2
You can accomplish this with the having clause:
select a, b, count(*) c
from etc
group by a, b
having count(*) >= some number
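Why the self-join brings back too many records: a (rep, HCP) group with k rows matches itself k x k times. This minimal SQLite sketch uses invented rows and compares the self-join count with a plain GROUP BY ... HAVING, assuming (hypothetically) that the business rule is "the same rep and HCP appear at least twice".

```python
import sqlite3


def build_sample():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE se_violation_detection (rep_id INT, rep_name TEXT, hcp_cid TEXT);
        INSERT INTO se_violation_detection VALUES
            (1, 'R1', 'h1'), (1, 'R1', 'h1'), (1, 'R1', 'h1'),
            (2, 'R2', 'h1'), (2, 'R2', 'h2'), (2, 'R2', 'h2');
    """)
    return conn


def self_join_counts(conn):
    # Each (rep, hcp) group of k rows contributes k * k joined rows.
    return conn.execute("""
        SELECT a.rep_id, COUNT(*)
        FROM se_violation_detection a
        INNER JOIN se_violation_detection b
          ON a.rep_id = b.rep_id AND a.hcp_cid = b.hcp_cid
        GROUP BY a.rep_id
        ORDER BY a.rep_id
    """).fetchall()


def repeated_pairs(conn, threshold=2):
    # No self-join needed: group by rep and HCP, keep frequent groups.
    return conn.execute("""
        SELECT rep_id, rep_name, hcp_cid, COUNT(*) AS n
        FROM se_violation_detection
        GROUP BY rep_id, rep_name, hcp_cid
        HAVING COUNT(*) >= ?
        ORDER BY rep_id, hcp_cid
    """, (threshold,)).fetchall()


if __name__ == "__main__":
    conn = build_sample()
    print(self_join_counts(conn))
    print(repeated_pairs(conn))
```

Rep 1 has 3 rows but the self-join counts 9 (3 x 3); rep 2 has 3 rows but counts 5 (1 x 1 + 2 x 2).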
I figured out a simpler way to get the information I need for one of the queries. The one above is still wrong.
-- Reps with violations for 5 or more different HCPs
select distinct rep_id,rep_name,count(distinct hcp_cid)
AS 'Multiple Practitioners'
from dbo.SE_Violation_Detection
group by rep_id,rep_name
having count(distinct hcp_cid)>4
order by count(distinct hcp_cid)
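A quick check of the threshold: HAVING COUNT(DISTINCT hcp_cid) > 4 keeps reps with five or more different HCPs, and repeated visits to the same HCP do not count twice. A minimal SQLite sketch with invented rows; the min_distinct parameter is hypothetical.

```python
import sqlite3


def build_sample():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE se_violation_detection (rep_id INT, rep_name TEXT, hcp_cid TEXT);
        -- R1: 6 rows but 5 distinct HCPs; R2: 4 distinct HCPs
        INSERT INTO se_violation_detection VALUES
            (1, 'R1', 'h1'), (1, 'R1', 'h2'), (1, 'R1', 'h3'),
            (1, 'R1', 'h4'), (1, 'R1', 'h5'), (1, 'R1', 'h1'),
            (2, 'R2', 'h1'), (2, 'R2', 'h2'), (2, 'R2', 'h3'), (2, 'R2', 'h4');
    """)
    return conn


def reps_with_many_hcps(conn, min_distinct=5):
    # '> 4' in the original query is the same as '>= 5' here.
    return conn.execute("""
        SELECT rep_id, rep_name, COUNT(DISTINCT hcp_cid) AS multiple_practitioners
        FROM se_violation_detection
        GROUP BY rep_id, rep_name
        HAVING COUNT(DISTINCT hcp_cid) >= ?
        ORDER BY multiple_practitioners
    """, (min_distinct,)).fetchall()


if __name__ == "__main__":
    print(reps_with_many_hcps(build_sample()))
```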