How to get the row number when using alias in orderby - sql

I have a query that uses a column alias in the ORDER BY of a ROW_NUMBER() window function, and I get this error:
[42703] ERROR: column "total_comments" does not exist Position: 335
How can I fix this?
select
cr_seller_history_id,
c.created_at,
company_name,
business_name,
brand,
kep_mail,
address,
phone,
mail,
slug,
name,
point,
contact_positive,
contact_negative,
product_number,
(product_positive + product_negative) as total_comments,
ROW_NUMBER() OVER(ORDER BY total_comments) as rank
from cr_companies a
INNER JOIN cr_sellers b ON a.cr_company_id = b.cr_company_id
INNER JOIN cr_seller_histories c ON b.cr_seller_id = c.cr_seller_id
WHERE DATE(c.created_at) = DATE 'yesterday'
ORDER BY total_comments DESC NULLS LAST

The other solutions are a subquery, CTE, or a lateral join. So, you can write:
select . . .
v.total_comments,
row_number() over (order by v.total_comments) as rank
from cr_companies c join
cr_sellers s
on c.cr_company_id = s.cr_company_id join
cr_seller_histories sh
on s.cr_seller_id = sh.cr_seller_id, lateral
(values (product_positive + product_negative)) v(total_comments)
where DATE(sh.created_at) = date 'yesterday'
order by v.total_comments desc nulls last;
Notice that I also changed the table aliases to be abbreviations for the table names. This is a best practice and makes it much easier to write, read, and modify queries.

The problem is in:
ROW_NUMBER() OVER(ORDER BY total_comments) as rank
You can't use the alias like this. The query-level ORDER BY accepts an output-column alias, but the ORDER BY inside a window function does not:
https://www.postgresql.org/docs/current/static/sql-select.html#SQL-SELECT-LIST
An output column's name can be used to refer to the column's value in
ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING clauses;
there you must write out the expression instead.
Instead, try:
ROW_NUMBER() OVER(ORDER BY (product_positive + product_negative)) as rank
or use a subquery; the alias can then be used in the window function.
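As a minimal sketch of the subquery option, the script below uses Python's standard-library sqlite3 module with a made-up table seller_stats (a hypothetical stand-in for the joined tables in the question) and the alias rn instead of rank. SQLite is used only to keep the example self-contained, and it needs SQLite 3.25 or later for window functions. The same query shape should carry over to PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the joined tables in the question.
conn.execute(
    "CREATE TABLE seller_stats "
    "(name TEXT, product_positive INTEGER, product_negative INTEGER)"
)
conn.executemany(
    "INSERT INTO seller_stats VALUES (?, ?, ?)",
    [("a", 5, 1), ("b", 20, 4), ("c", 2, 0)],
)

# The alias is defined in the inner query, so the outer query sees
# total_comments as an ordinary column and can use it in the window's ORDER BY.
query = """
SELECT name,
       total_comments,
       ROW_NUMBER() OVER (ORDER BY total_comments DESC) AS rn
FROM (
    SELECT name,
           product_positive + product_negative AS total_comments
    FROM seller_stats
) AS t
ORDER BY total_comments DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Here the window is ordered DESC so that row number 1 goes to the largest total; the question's query ordered the window ascending, so pick whichever direction you actually need.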

Related

Select second highest value with Rank() function in PostgreSQL

I tried this :
SELECT
code_nuance, nb_voix,
RANK () OVER ( ORDER BY nb_voix DESC) AS rank
FROM election_2015.resultat_nuance_departement
WHERE rank = 2;
And same with alias :
SELECT
rnd.code_nuance, rnd.nb_voix,
RANK () OVER ( ORDER BY rnd.nb_voix DESC) AS rank
FROM election_2015.resultat_nuance_departement rnd
WHERE rank = 2;
Rank is not recognized in the WHERE clause.
It says "Rank doesn't exist".
Any suggestions are welcome, thanks!
As with all other column aliases, you cannot use the alias in the where clause. For this purpose a subquery is handy:
SELECT x.*
FROM (SELECT rnd.code_nuance, rnd.nb_voix,
RANK() OVER (ORDER BY rnd.nb_voix DESC) AS rank
FROM election_2015.resultat_nuance_departement rnd
) x
WHERE rank = 2;
If the values are all unique, you can also use FETCH:
select rnd.*
from election_2015.resultat_nuance_departement rnd
order by rnd.nb_voix desc
offset 1 fetch first 1 row only;
From the PostgreSQL docs:
An output column's name can be used to refer to the column's value in ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING clauses; there you must write out the expression instead.
This is because the WHERE clause is evaluated before the column aliases are assigned.
Another solution is to use a subquery.
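One thing to watch with the subquery approach is ties: RANK() leaves gaps after tied rows, so "rank = 2" can match nothing, while DENSE_RANK() does not leave gaps. The sketch below illustrates this with Python's sqlite3 module and a hypothetical table results with invented data (SQLite 3.25+ is assumed for window functions); it is not the asker's real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: A and B tie for first place.
conn.execute("CREATE TABLE results (code_nuance TEXT, nb_voix INTEGER)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [("A", 100), ("B", 100), ("C", 80), ("D", 50)],
)


def second_place(rank_function):
    # rank_function only ever comes from the two fixed names used below,
    # never from user input.
    query = f"""
    SELECT code_nuance, nb_voix
    FROM (
        SELECT code_nuance, nb_voix,
               {rank_function}() OVER (ORDER BY nb_voix DESC) AS pos
        FROM results
    ) AS x
    WHERE pos = 2
    """
    return conn.execute(query).fetchall()


with_rank = second_place("RANK")              # ranks are 1, 1, 3, 4
with_dense_rank = second_place("DENSE_RANK")  # ranks are 1, 1, 2, 3
print(with_rank)
print(with_dense_rank)
```

With this data RANK() yields no row at position 2, whereas DENSE_RANK() returns the row with the second-highest value. Which one is right depends on what "second" should mean when there are ties.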

Use Rank() over a pseudo Column Name

I have a table with columns:
StudentName
Marks1
Marks2
from which I need to perform a query that will calculate the average of two marks and rank the rows from highest average to least.
I executed the following query:
SELECT
*,
(SELECT AVG(c) FROM (VALUES(Marks1),(Marks2)) T (c)) AS Average,
RANK() OVER (ORDER BY Average DESC) AS Position
from Marks;
But that gives an error:
Average is an Invalid Column Name.
How do I fix this? How do I write a query that applies Rank() over Average?
You can't reference a column by its alias in the SELECT; the only place you can reference its alias is in the ORDER BY clause.
What you can do, however, is move the subquery to the FROM, and then you can reference the column returned in your (outer) SELECT:
SELECT M.*,--List your columns here, don't use *
A.Average,
RANK() OVER (ORDER BY A.Average DESC) AS Position
FROM Marks M
CROSS APPLY(SELECT AVG(Mark) AS Average FROM (VALUES(Marks1),(Marks2)) V(Mark) ) A;
You should just use the average of the two marks inlined in the outer query:
SELECT *, RANK() OVER (ORDER BY (Marks1 + Marks2) / 2 DESC) AS Position
FROM Marks
ORDER BY (Marks1 + Marks2) / 2 DESC;
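One caveat with the inlined expression: the question does not say what type the marks columns are. If they are integers, (Marks1 + Marks2) / 2 is an integer division in SQL Server, so half points are dropped and two students can end up tied who should not be. Dividing by 2.0 avoids that. The sketch below shows the effect with Python's sqlite3 module, which truncates integer division the same way; the table contents are invented, and SQLite 3.25+ is assumed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: integer marks columns; the data is made up for illustration.
conn.execute(
    "CREATE TABLE Marks (StudentName TEXT, Marks1 INTEGER, Marks2 INTEGER)"
)
conn.executemany(
    "INSERT INTO Marks VALUES (?, ?, ?)",
    [("Ann", 90, 85), ("Bob", 88, 86), ("Cy", 70, 60)],
)


def ranked(divisor):
    # divisor is one of the two fixed literals used below, not user input.
    query = f"""
    SELECT StudentName,
           (Marks1 + Marks2) / {divisor} AS Average,
           RANK() OVER (ORDER BY (Marks1 + Marks2) / {divisor} DESC) AS Position
    FROM Marks
    ORDER BY Position, StudentName
    """
    return conn.execute(query).fetchall()


integer_result = ranked("2")    # integer division: 175 / 2 gives 87
decimal_result = ranked("2.0")  # decimal division: 175 / 2.0 gives 87.5
print(integer_result)
print(decimal_result)
```

With integer division Ann and Bob share position 1; with 2.0 Ann is ranked first on her own.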

Get minimum without using row number/window function in Bigquery

I have a table like as shown below
What I would like to do is get the minimum of each subject. Though I am able to do this with row_number function, I would like to do this with groupby and min() approach. But it doesn't work.
row_number approach - works fine
SELECT * FROM (select subject_id,value,id,min_time,max_time,time_1,
row_number() OVER (PARTITION BY subject_id ORDER BY value) AS rank
from table A) WHERE RANK = 1
min() approach - doesn't work
select subject_id,id,min_time,max_time,time_1,min(value) from table A
GROUP BY SUBJECT_ID,id
As you can see, just the two columns (subject_id and id) are enough to group the items together; they differentiate the groups. But why am I not able to use the other columns in the SELECT clause? If I use the other columns, I may not get the expected output, because time_1 has different values.
I expect my output to be like as shown below
In BigQuery you can use aggregation for this:
SELECT ARRAY_AGG(a ORDER BY value LIMIT 1)[SAFE_OFFSET(0)].*
FROM table A
GROUP BY SUBJECT_ID;
This uses ARRAY_AGG() to aggregate each record (the a in the argument list). ARRAY_AGG() allows you to order the result (by value) and to limit the size of the array. The latter is important for performance.
After aggregating, you want the first element of the array. The .* expands the record referred to by a into its component columns.
I'm not sure why you don't want to use ROW_NUMBER(). If the problem is the lingering rank column, you can easily remove it:
SELECT a.* EXCEPT (rank)
FROM (SELECT a.*,
ROW_NUMBER() OVER (PARTITION BY subject_id ORDER BY value) AS rank
FROM A
) a
WHERE RANK = 1;
Are you looking for something like below-
SELECT
A.subject_id,
A.id,
A.min_time,
A.max_time,
A.time_1,
A.value
FROM table A
INNER JOIN(
SELECT subject_id, MIN(value) Value
FROM table
GROUP BY subject_id
) B ON A.subject_id = B.subject_id
AND A.Value = B.Value
If you do not need to select the time_1 column, the following query will work (as far as I can see, the values in min_time and max_time are the same within each group):
SELECT
A.subject_id,A.id,A.min_time,A.max_time,
--A.time_1,
MIN(A.value)
FROM table A
GROUP BY
A.subject_id,A.id,A.min_time,A.max_time
Finally, the best approach is if you can apply something like CAST(Time_1 AS DATE) on your time column. This will consider only the date part regardless of the time part. The query will be
SELECT
A.subject_id,A.id,A.min_time,A.max_time,
CAST(A.time_1 AS DATE) Time_1,
MIN(A.value)
FROM table A
GROUP BY
A.subject_id,A.id,A.min_time,A.max_time,
CAST(A.time_1 AS DATE)
-- Make sure the syntax of CAST AS DATE
-- in BigQuery is as I written here or bit different.
The query below is for BigQuery Standard SQL and is the most efficient way to handle cases like the one in your question:
#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY value LIMIT 1)[OFFSET(0)]
FROM `project.dataset.table` t
GROUP BY subject_id
Using ROW_NUMBER is not efficient and in many cases leads to a Resources exceeded error.
Note: a self join is also a very inefficient way of achieving your objective.
A bit late to the party, but here is a cte-based approach which made sense to me:
with mins as (
select subject_id, id, min(value) as min_value
from table
group by subject_id, id
)
select distinct t.subject_id, t.id, t.time_1, t.min_time, t.max_time, m.min_value
from table t
join mins m on m.subject_id = t.subject_id and m.id = t.id
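A practical difference between the join-on-minimum approaches and ROW_NUMBER() is how they treat ties: the join returns every row that shares the minimum value, while ROW_NUMBER() keeps exactly one row per group. The sketch below shows this with Python's sqlite3 module and a hypothetical table readings with invented rows (SQLite 3.25+ is assumed). It does not cover the BigQuery ARRAY_AGG variant, which SQLite cannot run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, not the asker's real schema.
conn.execute(
    "CREATE TABLE readings (subject_id INTEGER, time_1 TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        (1, "2020-01-01 10:00", 7),
        (1, "2020-01-01 11:00", 3),
        (1, "2020-01-01 12:00", 3),  # ties with the previous row on value
        (2, "2020-01-02 09:00", 5),
    ],
)

# Join each row to its group's minimum: tied rows all survive.
join_query = """
SELECT r.subject_id, r.time_1, r.value
FROM readings AS r
JOIN (
    SELECT subject_id, MIN(value) AS min_value
    FROM readings
    GROUP BY subject_id
) AS m ON m.subject_id = r.subject_id AND m.min_value = r.value
ORDER BY r.subject_id, r.time_1
"""

# ROW_NUMBER keeps one row per group; time_1 is added as a tie-breaker
# so the chosen row is deterministic.
row_number_query = """
SELECT subject_id, time_1, value
FROM (
    SELECT subject_id, time_1, value,
           ROW_NUMBER() OVER (
               PARTITION BY subject_id ORDER BY value, time_1
           ) AS rn
    FROM readings
) AS x
WHERE rn = 1
ORDER BY subject_id
"""

join_rows = conn.execute(join_query).fetchall()
row_number_rows = conn.execute(row_number_query).fetchall()
print(join_rows)
print(row_number_rows)
```

If duplicates on the minimum are possible in your data, decide up front whether you want all of them or just one, and pick the approach accordingly.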

Ambiguous column name using row_number() without alias

I'm trying to implement pagination in a query that is built using information from a view, and I need to use the row_number() function over a column when I don't know which table it is from.
SELECT * FROM (
SELECT class.ID as ID, user.ID as USERID, row_number() over (ORDER BY
ID desc) as row_number FROM class, user
) out_q WHERE row_number > #startrow ORDER BY row_number
The problem is that I only have the result column name (ID or USERID) that came from a previous query. If I execute this query, it will raise the error 'Ambiguous column name "ID"'. Is there a way to specify that I'm referencing the column ID that is being selected and not from a different table?
Is it possible to specify an alias to the query result itself?
I have already tried the following,
SELECT TOP 30 * FROM (
SELECT *, row_number() over (ORDER BY ID desc) as row_number FROM(
SELECT class.ID as ID, user.ID as USERID FROM class, user
) in_q
) out_q WHERE row_number > #startrow ORDER BY row_number
It works, but the DBMS gets confused about which query plan to use, because of the small row goal in the outer query and the large result set returned by the inner query. When #startrow is a small number, the query executes in less than one second; when it is a big number, the query takes minutes to execute.
Your problem is the id in the row_number itself. If you want a stable sort, then include both ids:
SELECT *
FROM (SELECT class.ID as ID, user.ID as USERID,
row_number() over (ORDER BY class.ID desc, user.id) as row_number
FROM class CROSS JOIN user
) out_q
WHERE row_number > #startrow
ORDER BY row_number;
I assume the cartesian product is intentional. Sometimes, this indicates an error in the query. In general, I would advise you to avoid using commas in the from clause. If you do want a cartesian product, then be explicit by using CROSS JOIN.
You could try using the option you already tried, then use the OPTIMIZE FOR hint.
OPTION ( OPTIMIZE FOR (#startrow = 100000) );
See a description of the hint in MSDN docs here: https://msdn.microsoft.com/en-us/library/ms181714.aspx.

SQL - aggregate function to get value from same row as MAX()

I have one table with columns channel, value and timestamp, and another table with 7 other columns with various data.
I'm joining these two together, and I want to select the maximum value of the value column within an hour, and the timestamp of the corresponding row. This is what I've tried, but it (obviously) doesn't work.
SELECT
v.channel,
MAX(v.value),
v.timestamp,
i.stuff,
...
FROM
Values v
INNER JOIN
#Information i
ON i.type = v.type
GROUP BY channel, DATEPART(HOUR, timestamp), i.stuff, ...
I'm (not very surprisingly) getting the following error:
"dbo.Values.timestamp" is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause
How should I do this correctly?
You could use the RANK() or DENSE_RANK() functions to get the results you need. Something like:
;WITH RankedResults AS
(
SELECT
channel,
value,
timestamp,
type,
RANK() OVER (PARTITION BY DATEPART(hour,timestamp) ORDER BY value desc) as Position
FROM
Values
)
SELECT
v.channel,
v.value,
v.timestamp,
i.stuff
/* other columns */
FROM
RankedResults v
inner join
#Information i
on
v.type = i.type
WHERE
v.Position = 1
(whether to use RANK or DENSE_RANK depends on what you want to do in the case of ties, really)
(Edited the SQL to include the join, in response to Tomas' comment)
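Here is a small runnable sketch of the same ranking idea, using Python's sqlite3 module. Several things are assumptions on my part: the table name channel_values, the column ts, and the sample rows are invented; strftime stands in for DATEPART because the sketch runs on SQLite (3.25+); and the partition includes the channel and the full date plus hour, since the question groups by channel and presumably does not want 10:00 on different days lumped together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows for illustration only.
conn.execute(
    "CREATE TABLE channel_values (channel TEXT, value INTEGER, ts TEXT)"
)
conn.executemany(
    "INSERT INTO channel_values VALUES (?, ?, ?)",
    [
        ("ch1", 10, "2020-01-01 10:05:00"),
        ("ch1", 30, "2020-01-01 10:40:00"),
        ("ch1", 20, "2020-01-01 11:15:00"),
        ("ch2", 5, "2020-01-01 10:10:00"),
    ],
)

# Rank rows within each (channel, date + hour) bucket by value, highest first,
# then keep the top-ranked row so its timestamp comes along with the maximum.
query = """
WITH ranked AS (
    SELECT channel, value, ts,
           RANK() OVER (
               PARTITION BY channel, strftime('%Y-%m-%d %H', ts)
               ORDER BY value DESC
           ) AS pos
    FROM channel_values
)
SELECT channel, value, ts
FROM ranked
WHERE pos = 1
ORDER BY channel, ts
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because RANK() is used, two rows that tie for the maximum in the same hour would both be returned; ROW_NUMBER() with a tie-breaker would return just one.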
You must include v.timestamp in the GROUP BY clause.
Hope this helps.