row_number error when trying to rank items - sql

I'm trying to get back into SQL querying and am having a frustrating problem. I have two questions:
I'm trying to take all items in my dataset and rank them by partitions. I researched this and think it should look like this:
select g.ticker, g.sector, g.industry, g.countryname, g.exchange, c.carbon, c.year,
ROW_NUMBER() OVER (
PARTITION BY g.sector, g.industry, g.countryname, g.exchange
ORDER BY c.carbon DESC
) AS 'Rank'
from "General" g
INNER JOIN carbon c ON upper(c.ticker) =g.ticker ;
The output should be a rank within each partition (in this case sector, industry, country name and exchange), with the rows in each partition ranked by their carbon emissions.
I'm getting this error:
Error occurred during SQL script execution
Reason:
SQL Error [42601]: ERROR: syntax error at or near "'Rank'"
Position: 1305
If I remove the rank section, the data joins and returns results (obviously not ranked like I want, but I know the base query works). What am I doing wrong?
Second (related) question: I forgot how much I hated SQL error messages. The error above tells me there's a syntax error, so I went to the docs and couldn't see anything different between my code and their example. Assuming lack of experience, is there a better way to get actionable error messages (e.g. in Python I get a stack trace that I can read to see what part of my code went wrong)?
Thank you!

Don't use single quotes for column aliases. Also, I would suggest avoiding an alias that is the name of anything in standard SQL (which has a rank() function). I often use seqnum:
select g.ticker, g.sector, g.industry, g.countryname, g.exchange, c.carbon, c.year,
row_number() over (
partition by g.sector, g.industry, g.countryname, g.exchange
order by c.carbon desc
) as seqnum
from "General" g join
carbon c
on upper(c.ticker) = g.ticker ;
Note: You should only use single quotes for string and date constants. If you want to escape a column name, use double quotes (just as your query does for the table name General).
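To try the corrected query without a Postgres instance, here is a minimal sketch using Python's built-in sqlite3 module. The table contents are made up for illustration, and it assumes the bundled SQLite is 3.25 or newer (needed for window functions). SQLite is more lenient than Postgres about quoting, so this only demonstrates the working form, not the original error.

```python
import sqlite3

# Hypothetical miniature versions of the "General" and carbon tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "General" (ticker TEXT, sector TEXT, industry TEXT,
                            countryname TEXT, exchange TEXT);
    CREATE TABLE carbon (ticker TEXT, carbon REAL, year INTEGER);
    INSERT INTO "General" VALUES
        ('AAA', 'Energy', 'Oil', 'US', 'NYSE'),
        ('BBB', 'Energy', 'Oil', 'US', 'NYSE'),
        ('CCC', 'Tech', 'Software', 'US', 'NASDAQ');
    INSERT INTO carbon VALUES
        ('aaa', 10.0, 2020), ('bbb', 25.0, 2020), ('ccc', 3.0, 2020);
""")

query = """
    SELECT g.ticker, c.carbon,
           row_number() OVER (
               PARTITION BY g.sector, g.industry, g.countryname, g.exchange
               ORDER BY c.carbon DESC
           ) AS seqnum          -- plain identifier, no single quotes
    FROM "General" g
    JOIN carbon c ON upper(c.ticker) = g.ticker
    ORDER BY g.ticker
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Within the Energy/Oil/US/NYSE partition, the ticker with the higher carbon value gets seqnum 1; the single-row partition starts again at 1.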

Related

Teradata error 3504 (non-aggregate values must be part of group) when using windowing function

So I wrote a query that uses a window function and I keep getting an error 3504 in Teradata, even though I'm sure I have the correct columns in the group by clause (all non-aggregate columns). It has something to do with the windowing function I'm using, because when I comment it out I don't get the error, but I have no idea how to resolve it.
This is the query:
select
n.acct_id as bd_acct_id
,n.tran_nr as tran_order
,t.trade_dt - n.tran_dt as days_until_trade
,n.n_total
,sum(t.trade_ct) as trades_ct
,sum(t.trade_gross_am) as tot_trades
,sum(t.trade_gross_am) over (partition by bd_acct_id, tran_order order by tran_order) as running_total
from nnae n
left join trades t
on n.acct_id = t.acct_id
having days_until_trade > 0
group by 1,2,3,4
order by 1,2,3
Would appreciate any help. Thanks!
Presumably, you intend something like this:
sum(sum(t.trade_gross_am)) over (partition by n.acct_id, n.tran_nr
order by min(n.tran_dt)
rows between unbounded preceding and current row
) as running_total
It seems odd to have a running total, without the date column explicitly in the result set.
Also, I replaced the aliases with the original column names. Not all databases support aliases in window functions, so this is just a habit I'm used to.
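The nested sum(sum(...)) over (...) form means "aggregate per group first, then run a windowed sum over the grouped rows". A rough sketch of that idea in Python's sqlite3 follows, written as two explicit steps (a CTE, then the window) rather than the nested form. The trades data and the choice to partition by account and order by trade date are assumptions made for illustration, not the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trades (acct_id INTEGER, trade_dt TEXT, trade_gross_am REAL);
    INSERT INTO trades VALUES
        (1, '2020-01-01', 100.0), (1, '2020-01-01', 50.0),
        (1, '2020-01-02', 25.0),
        (2, '2020-01-01', 10.0);
""")

query = """
    WITH daily AS (               -- step 1: ordinary GROUP BY aggregation
        SELECT acct_id, trade_dt, sum(trade_gross_am) AS tot_trades
        FROM trades
        GROUP BY acct_id, trade_dt
    )
    SELECT acct_id, trade_dt, tot_trades,
           sum(tot_trades) OVER ( -- step 2: running total over grouped rows
               PARTITION BY acct_id
               ORDER BY trade_dt
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM daily
    ORDER BY acct_id, trade_dt
"""
running = conn.execute(query).fetchall()
for row in running:
    print(row)
```

The window function never sees the raw rows, only the already-aggregated ones, which is why the original query's sum(t.trade_gross_am) over (...) was rejected as a non-aggregate expression.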

Nth(n,split()) in bigquery

I am running the following query and keep getting the error message:
SELECT NTH(2,split(Web_Address_,'.')) +'.'+NTH(3,split(Web_Address_,'.')) as D , Web_Address_
FROM [Domains.domain
limit 10
Error message: Error: (L1:110): (L1:119): SELECT clause has mix of
aggregations 'D' and fields 'Web_Address_' without GROUP BY
clause Job ID:
symmetric-aura-572:job_axsxEyfYpXbe2gpmlYzH6bKGdtI
I tried using a GROUP BY clause on field D and/or Web_Address_, but I still get errors about GROUP BY.
Does anyone know why this is the case? I have had success with a similar query before.
You probably want to use WITHIN RECORD aggregation here, not GROUP BY
select concat(p1, '.', p2), Web_Address_ FROM
(SELECT
NTH(2,split(Web_Address_,'.')) WITHIN RECORD p1,
NTH(3,split(Web_Address_,'.')) WITHIN RECORD p2, Web_Address_
FROM (SELECT 'a.b.c' as Web_Address_))
P.S. If you are just trying to cut off the first part of the web address, it is easier to do with the RIGHT and INSTR functions.
You can also consider using URL functions: HOST, DOMAIN and TLD
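For reference, this is the result the query is aiming for, sketched in plain Python rather than BigQuery SQL. The function name and the choice to raise on short inputs are illustrative assumptions; NTH is 1-based, so NTH(2, ...) and NTH(3, ...) correspond to indexes 1 and 2 here.

```python
def second_and_third_label(web_address: str) -> str:
    """Join the 2nd and 3rd dot-separated parts of an address (1-based)."""
    parts = web_address.split(".")
    if len(parts) < 3:
        # Assumption for this sketch: treat short addresses as an error.
        raise ValueError(f"expected at least 3 parts in {web_address!r}")
    return parts[1] + "." + parts[2]


domain = second_and_third_label("www.example.com")
print(domain)
```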

Oracle SQL - Comparing AVG functions in WHERE

I'm trying to write a few Oracle SQL scripts for an assignment. I've managed to get all of it to work, except for one part. To summarize, I have to display data from 2 tables if the average of 1 column in table A is greater than the average of another column in table B. I realize you cannot include AVG functions in a WHERE clause or HAVING clause since it seems unable to properly access the data (from what I've read). When I exclude this clause, the script executes properly, so I'm confident there are no other errors.
I've tried writing it as follows but the error I get is ORA-00936: missing expression and it is just before the > sign. I thought this may be due to improper bracket placing but none of my attempts resolved this. Here is my attempt:
SELECT l.l_category, SUM(r.r_sold), AVG(l.l_cost)
FROM promos l
INNER JOIN sales r
ON r.promo_id = l.promo_id
GROUP BY l.l_category
HAVING (SELECT AVG(l.l_cost) OVER (PARTITION BY l.l_cost)) >
(SELECT AVG(r.r_sold) OVER (PARTITION BY r.r_sold));
I tried doing this without the OVER (PARTITION BY ...) as well as putting it into a WHERE clause but it didn't resolve the error. I'm pretty sure I need to put it into a SELECT statement somehow but I'm at a loss.
You do not need to use the OVER clause when applying the aggregate functions in the HAVING clause. Just use the aggregate functions on their own.
SELECT l.l_category, SUM(r.r_sold), AVG(l.l_cost)
FROM promos l
INNER JOIN sales r
ON r.promo_id = l.promo_id
GROUP BY l.l_category
HAVING AVG(l.l_cost) > AVG(r.r_sold);
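A quick way to check the behaviour is to run the same shape of query against toy data. The sketch below uses Python's sqlite3 rather than Oracle, and the rows are invented for illustration; comparing two plain aggregates in HAVING works the same way in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE promos (promo_id INTEGER, l_category TEXT, l_cost REAL);
    CREATE TABLE sales (promo_id INTEGER, r_sold INTEGER);
    INSERT INTO promos VALUES (1, 'print', 100.0), (2, 'web', 5.0);
    INSERT INTO sales VALUES (1, 10), (1, 20), (2, 30), (2, 50);
""")

query = """
    SELECT l.l_category, SUM(r.r_sold), AVG(l.l_cost)
    FROM promos l
    INNER JOIN sales r ON r.promo_id = l.promo_id
    GROUP BY l.l_category
    HAVING AVG(l.l_cost) > AVG(r.r_sold)   -- plain aggregates, no OVER
"""
kept = conn.execute(query).fetchall()
print(kept)
```

Only the 'print' category survives: its average cost (100) exceeds its average units sold (15), while 'web' has an average cost of 5 against an average of 40 sold.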

Finding most popular and most unique records using SQL

My mom wanted a baby name game for my brother's baby shower. Wanting to learn python, I volunteered to do it. I pretty much have the python bit, it's the SQL that is throwing me.
The way the game is supposed to work is everyone at the shower writes down names on paper, I manually enter them into Excel (normalizing spellings as much as possible) and export to MS Access. Then I run my python program to find the player with the most popular names and the player with the most unique names. The database, called "babynames", is just four columns.
ID | BabyFirstName | BabyMiddleName | PlayerName
---|---------------|----------------|-----------
My mom has changed things every so often, but as they stand right now, I have to figure out :
a) The most popular name (or names if there is a tie) out of all first and middle names
b) The most unique name (or names if there is a tie) out of all the first and middle names
c) The player that has the most number of popular names (wins a prize)
d) The player that has the most number of unique names (wins a prize)
I've been working on this for about a week now and can't even get a SQL query for a) and b) to work, much less c) and d). I'm more than just a bit frustrated.
BTW, I'm just looking at spellings of the names, not phonetics. As I manually enter names, I will change names like "Kris" to "Chris" and "Xtina" to "Christina" etc.
Editing to add a couple of the most recent queries I tried for a)
SELECT [BabyFirstName],
COUNT ([BabyFirstName]) AS 'FirstNameOccurrence'
FROM [babynames]
GROUP BY [BabyFirstName]
ORDER BY 'FirstNameOccurrence' DESC
LIMIT 1
and
SELECT [BabyFirstName]
FROM [babynames]
GROUP BY [BabyFirstName]
HAVING COUNT(*) =
(SELECT COUNT(*)
FROM [babynames]
GROUP BY [BabyFirstName]
ORDER BY COUNT(*) DESC
LIMIT 1)
These both lead to syntax errors.
pyodbc.ProgrammingError: ('42000', '[42000] [Microsoft][ODBC Microsoft Access Driver] Syntax error in ORDER BY clause. (-3508) (SQLExecDirectW)')
I've tried using [FirstNameOccurrence] and just FirstNameOccurrence as well with the same error. Not sure why it's not recognizing it by that column name to order by.
pyodbc.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC Microsoft Access Driver] Syntax error. in query expression 'COUNT(*) = (SELECT COUNT(*) FROM [babynames] GROUP BY [BabyFirstName] ORDER BY COUNT(*) DESC LIMIT 1)'. (-3100) (SQLExecDirectW)")
I'll admit that I'm not really grokking all of the COUNT(*) commands here, but this was a solution for a similar issue here in stackoverflow that I figured I'd try when my other idea didn't pan out.
For A and B, use a group by clause in your SQL, count the rows in each group, and order by that count. Use descending order for A and ascending order for B, and take the first result for each.
For C and D, use essentially the same strategy, but add the PlayerName to the grouping (e.g. group by babyname, playername) and apply the same descending or ascending ordering.
Here's Microsoft's write-up for a group by clause in MS Access: https://office.microsoft.com/en-us/access-help/group-by-clause-HA001231482.aspx
Here's an even better write-up demonstrating how to do both group by and order by at the same time: http://rogersaccessblog.blogspot.com/2009/06/select-queries-part-3-sorting-and.html
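As a rough sketch of the group, count and order strategy for A and B, the example below uses Python's sqlite3 instead of Access, so it uses LIMIT where Access would need TOP. The sample rows are invented, and stacking first and middle names with UNION ALL is one possible reading of "all first and middle names", not something the answer spells out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE babynames (ID INTEGER PRIMARY KEY, BabyFirstName TEXT,
                            BabyMiddleName TEXT, PlayerName TEXT);
    INSERT INTO babynames (BabyFirstName, BabyMiddleName, PlayerName) VALUES
        ('Chris', 'Lee', 'Ann'),
        ('Chris', 'Ray', 'Bob'),
        ('Dana', 'Chris', 'Cat');
""")

# Stack first and middle names into one column, then group and count.
counts_sql = """
    SELECT name, COUNT(*) AS occurrences
    FROM (SELECT BabyFirstName AS name FROM babynames
          UNION ALL
          SELECT BabyMiddleName FROM babynames)
    GROUP BY name
    ORDER BY occurrences {direction}, name
    LIMIT 1
"""
most_popular = conn.execute(counts_sql.format(direction="DESC")).fetchone()
least_common = conn.execute(counts_sql.format(direction="ASC")).fetchone()
print(most_popular, least_common)
```

Note that LIMIT 1 silently picks one row when several names are tied (here the tie is broken alphabetically), so a tie-aware query is needed if all tied names must be reported.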
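For the first query you tried, use TOP instead of LIMIT. Also drop the single quotes around the alias and order by the aggregate expression itself, because ORDER BY 'FirstNameOccurrence' sorts by a string constant rather than by the count:
SELECT TOP 1 [BabyFirstName],
COUNT([BabyFirstName]) AS FirstNameOccurrence
FROM [babynames]
GROUP BY [BabyFirstName]
ORDER BY COUNT([BabyFirstName]) DESC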
For the second, change it to:
SELECT [BabyFirstName]
FROM [babynames]
GROUP BY [BabyFirstName]
HAVING COUNT(*) =
(SELECT TOP 1 COUNT(*)
FROM [babynames]
GROUP BY [BabyFirstName]
ORDER BY COUNT(*) DESC)
Limiting the number of records returned by a SQL statement in Access is done by adding TOP directly after SELECT, not with ORDER BY ... LIMIT.
Also, Access's TOP returns every row that ties for the top n (or n percent), so if two or more rows are tied in the query output (before TOP) and TOP 1 is specified, you'll see them all.
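The second query's logic (keep every name whose count equals the highest count) can be checked outside Access. This sketch runs it in Python's sqlite3, where LIMIT 1 plays the role of TOP 1 inside the subquery; the sample rows are made up so that two names tie for first place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE babynames (ID INTEGER PRIMARY KEY, BabyFirstName TEXT);
    INSERT INTO babynames (BabyFirstName) VALUES
        ('Chris'), ('Chris'), ('Dana'), ('Dana'), ('Lee');
""")

query = """
    SELECT BabyFirstName
    FROM babynames
    GROUP BY BabyFirstName
    HAVING COUNT(*) = (SELECT COUNT(*)          -- highest count of any name
                       FROM babynames
                       GROUP BY BabyFirstName
                       ORDER BY COUNT(*) DESC
                       LIMIT 1)
    ORDER BY BabyFirstName
"""
tied = [row[0] for row in conn.execute(query)]
print(tied)
```

Because the outer query compares against the top count rather than taking the first row, both tied names come back.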

Error in group by using hive

I am using the following code and getting the error below
select d.searchpack,d.context, d.day,d,txnid,d.config, c.sgtype from ds3resultstats d join
context_header c on (d.context=c.contextid) where (d.day>='2012-11-15' and d.day<='2012-11-25' and c.sgtype='Tickler' and d.config like
'%people%') GROUP BY d.context limit 10;
FAILED: Error in semantic analysis: line 1:7 Expression Not In Group By Key d
I am guessing I am using the group by incorrectly
When you use GROUP BY, you cannot select additional non-grouped fields. You can only select the group keys and aggregate functions.
See hive group by for more information.
Code example:
select d.context, count(*)
from ds3resultstats d
...
group by d.context
or group by multiple fields:
select d.context, d.field2, count(*)
from ds3resultstats d
...
group by d.context, d.field2
Hive expects every non-aggregated column in the select list to appear in the group by clause.
I ran into the same issue, but found a workaround for this kind of problem.
You can wrap the extra column in collect_set to get the output. For example:
select d.searchpack,collect_set(d.context) from sampletable group by d.searchpack;
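Both approaches can be tried on toy data. The sketch below uses Python's sqlite3 as a stand-in for Hive, with invented rows. SQLite has no collect_set, so group_concat(DISTINCT ...) is used as a rough analogue; SQLite also does not enforce the "expression not in group by key" rule, so only the valid forms are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ds3resultstats (searchpack TEXT, context TEXT);
    INSERT INTO ds3resultstats VALUES
        ('sp1', 'c1'), ('sp1', 'c1'), ('sp1', 'c2'), ('sp2', 'c3');
""")

# Option 1: every selected non-aggregate column is also a group key.
counts = conn.execute("""
    SELECT d.searchpack, d.context, COUNT(*)
    FROM ds3resultstats d
    GROUP BY d.searchpack, d.context
    ORDER BY d.searchpack, d.context
""").fetchall()

# Option 2: group by one key and fold the other column into an aggregate
# (group_concat(DISTINCT ...) standing in for Hive's collect_set).
collected = {
    searchpack: sorted(contexts.split(","))
    for searchpack, contexts in conn.execute("""
        SELECT d.searchpack, group_concat(DISTINCT d.context)
        FROM ds3resultstats d
        GROUP BY d.searchpack
    """)
}
print(counts)
print(collected)
```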