Redshift percentile_disc query and group by - sql

I have a table that looks like this (containing the number of times a particular user has visited a particular page)
n | context_page_path | user_id
--------------------------------
10 | /some/path/ | 1
23 | /some/path/ | 2
30 | /some/other/p/ | 1
...
I'm trying to get the 75th percentile of visits to each page like so:
select
context_page_path,
percentile_disc(0.75) within group (order by n) over (partition by context_page_path) as percentile_75
from my_table
group by context_page_path
However, when I run this query, Redshift wants me to include n in the group by clause.
I'm not sure why it's asking for this?
If I want the average, I can do that easily like so with no complaints.
select
context_page_path,
avg(n)
from my_table
group by context_page_path

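With an OVER clause, percentile_disc() is evaluated as a window function rather than an aggregate function, so it produces one value per input row and does not combine with GROUP BY the way avg() does. Drop the GROUP BY and use DISTINCT instead: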
select distinct context_page_path,
percentile_disc(0.75) within group (order by n) over (partition by context_page_path) as percentile_75
from my_table;

You are trying to use the Window Function syntax instead of the Aggregate Function syntax. Try this:
select
context_page_path,
approximate percentile_disc(0.75) within group (order by n) as percentile_75
from my_table
group by context_page_path
https://docs.aws.amazon.com/redshift/latest/dg/r_APPROXIMATE_PERCENTILE_DISC.html
It's a bit confusing but the two functions are in two different sections of the docs.
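To make the semantics concrete, here is a minimal sketch that emulates percentile_disc(0.75) per page in SQLite through Python's sqlite3 module. SQLite has no percentile_disc, so the query picks the smallest n whose cume_dist() is at least 0.75, which is how a discrete percentile is defined. The sample rows and the "/a" and "/b" paths are made up for illustration, and the sketch assumes an SQLite build with window function support (3.25 or later). It is not Redshift syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table my_table (n integer, context_page_path text, user_id integer)"
)
# Hypothetical sample data, not taken from the question.
conn.executemany(
    "insert into my_table values (?, ?, ?)",
    [(10, "/a", 1), (20, "/a", 2), (30, "/a", 3), (40, "/a", 4),
     (5, "/b", 1), (15, "/b", 2)],
)

# Emulation: the discrete 75th percentile is the smallest n whose
# cumulative distribution within its page reaches 0.75.
query = """
select context_page_path, min(n) as percentile_75
from (
    select context_page_path, n,
           cume_dist() over (partition by context_page_path order by n) as cd
    from my_table
) ranked
where cd >= 0.75
group by context_page_path
order by context_page_path
"""
percentiles = conn.execute(query).fetchall()
print(percentiles)
```

The window function runs in the inner query, once per row, and the outer query is an ordinary GROUP BY over its result. That separation is the same reason the original query could not mix OVER with GROUP BY directly.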

Related

Selecting pair(including reverse order) with highest date value

I have a messages table like this
Messages Table
I want to select each unique pair (including reversed order) with the highest date. The result of the SELECT statement should look like this:
from_id | to_id | date  | message
1       | 2     | 13:06 | I'm Alp
2       | 3     | 13:06 | I'm Oliver
3       | 1     | 11:38 | From third to one
I tried to use distinct with max function but it didn't help.
You can use window functions:
select *
from (
select m.*,
row_number() over(partition by min(from_id, to_id), max(from_id, to_id) order by date desc) rn
from messages m
) m
where rn = 1
Note: counter-intuitively enough, SQLite's min() and max() functions, when given several arguments, are the equivalent of least() and greatest() in other databases.
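A small runnable sketch of the query above, using Python's sqlite3 module. The rows are invented for illustration (the original table was only shown as an image), and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table messages (from_id integer, to_id integer, date text, message text)"
)
# Hypothetical rows: pairs (1,2) and (1,3) each appear in both directions.
conn.executemany(
    "insert into messages values (?, ?, ?, ?)",
    [(1, 2, "12:00", "Hello"),
     (2, 1, "13:06", "Reply"),
     (2, 3, "13:06", "Hi three"),
     (1, 3, "10:00", "Hi from one"),
     (3, 1, "11:38", "From third to one")],
)

# min(a, b) / max(a, b) with two arguments are scalar functions in SQLite,
# so (1, 2) and (2, 1) land in the same partition.
query = """
select from_id, to_id, date, message
from (
    select m.*,
           row_number() over (
               partition by min(from_id, to_id), max(from_id, to_id)
               order by date desc
           ) as rn
    from messages m
) m
where rn = 1
order by from_id, to_id
"""
latest = conn.execute(query).fetchall()
print(latest)
```

Each unordered pair keeps only its most recent message, whichever direction it was sent in.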

Basic SQL question for Group By (in Netezza)

This may sound stupid, but I'm having trouble with a GROUP BY in Netezza that I cannot figure out. I'm doing a simple sum and trying to group the results, which is where I run into issues. The current table looks like:
id date daily_count
---------------------------
1 4/1/20 2
1 4/2/20 1
2 4/1/20 3
2 4/1/20 2
2 4/3/20 1
I want it to look like:
id date daily_count
---------------------------
1 4/1/20 2
1 4/2/20 1
2 4/1/20 5
2 4/3/20 1
my select statement is:
select id, date, sum(count) over (partition by date, id) as daily_count
If I add a GROUP BY clause that includes the sum field (group by id, date, daily_count), I get a warning saying:
Windowed aggregates not allowed in a GROUP BY clause
But if I leave the sum field out of the GROUP BY clause (group by id, date), I get a warning saying:
Attribute count must be GROUPed or used in an aggregate function
count is the variable that I'm summing, so if I group that, it won't produce the right sum amount.
Does this mean that grouping has to happen outside of this query, meaning cte or subquery? I'm hoping to get some advice to know what exactly is happening and what is the best course of action.
You want simple aggregation rather than window functions:
select id, date, sum(daily_count) daily_count
from mytable
group by id, date
You seem to just want aggregation:
select id, date, sum(count) as daily_count
from t
group by id, date;
I'm not sure why you are trying to use a window function here.
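The difference is easy to see by running the plain aggregation against the sample data. This is a sketch in SQLite via Python's sqlite3 module, not Netezza. The table name t follows the second answer, and the summed column is called cnt here (a hypothetical name) to avoid any clash with the count() function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, date text, cnt integer)")
# Rows from the question's "current table".
conn.executemany(
    "insert into t values (?, ?, ?)",
    [(1, "4/1/20", 2), (1, "4/2/20", 1),
     (2, "4/1/20", 3), (2, "4/1/20", 2), (2, "4/3/20", 1)],
)

# GROUP BY collapses each (id, date) group to one row; a windowed sum
# would instead repeat the total on every input row.
grouped = conn.execute(
    "select id, date, sum(cnt) as daily_count "
    "from t group by id, date order by id, date"
).fetchall()
print(grouped)
```

The two rows for id 2 on 4/1/20 collapse into a single row with a total of 5, matching the desired output in the question.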

PostgreSQL using sum in where clause

I have a table with a numeric column named 'capacity'. I want to select the first rows whose total capacity is no greater than X, something like this query:
select * from table where sum(capacity )<X
But I know I cannot use aggregate functions in the WHERE clause. So what other ways exist to solve this problem?
Here is some sample data
id| capacity
1 | 12
2 | 13.5
3 | 15
I want to list the rows, in order of id, whose running sum is less than 26, so a query like this
select * from table where sum(capacity )<26 order by id
and it must give me
id| capacity
1 | 12
2 | 13.5
because 12+13.5<26
A bit late to the party, but for future reference, the following should work for a similar problem as the OP's:
SELECT id, sum(capacity)
FROM table
GROUP BY id
HAVING sum(capacity) < 26
ORDER by id ASC;
Use the PostgreSQL docs for reference to aggregate functions: https://www.postgresql.org/docs/9.1/tutorial-agg.html
Use a HAVING clause (it has to follow GROUP BY and precede ORDER BY):
select id, sum(capacity) from table group by id having sum(capacity)<X order by id
You can use the window variant of sum to produce a cumulative sum, and then use it in the where clause. Note that window functions can't be placed directly in the where clause, so you'd need a subquery:
SELECT id, capacity
FROM (SELECT id, capacity, SUM(capacity) OVER (ORDER BY id ASC) AS cum_sum
FROM mytable) t
WHERE cum_sum < 26
ORDER BY id ASC;
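As a quick check of the cumulative-sum approach, here is a sketch that runs it against the question's sample data in SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or later). The table name mytable follows the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (id integer, capacity real)")
conn.executemany(
    "insert into mytable values (?, ?)",
    [(1, 12), (2, 13.5), (3, 15)],
)

# The running totals are 12, 25.5 and 40.5, so only the first two rows
# pass the cum_sum < 26 filter in the outer query.
query = """
select id, capacity
from (select id, capacity,
             sum(capacity) over (order by id asc) as cum_sum
      from mytable) t
where cum_sum < 26
order by id asc
"""
selected = conn.execute(query).fetchall()
print(selected)
```

Note that a HAVING sum(capacity) < 26 grouped by id would test each row's own capacity rather than the running total, so it would also return the row with id 3.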

SQL Query to group text based on numeric column

I have a table 'TEST' as shown below
Number | Seq | Name
-------+-------+------
123 | 1 | Hello
123 | 2 | Hi
123 | 3 | Greetings
234 | 1 | Goodbye
234 | 2 | Bye
I want to write a query, to group the table by 'Number', and select the rows with the maximum sequence number (MAX(Seq)). The output of the query would be
Number | Seq | Name
-------+-------+------
123 | 3 | Greetings
234 | 2 | Bye
How do I go about this?
EDIT: TEST is actually a table that is the result from a long query (joining multiple tables) that I have already written. I already have a (SELECT ...) statement to get the values I need. Is there a way to remove duplicate rows (with the same 'Number' as shown above) and select only the one with maximum 'Seq' value.
I am on Microsoft SQL Server 2008 (SP2)
I was hoping there would be a way to achieve this by
SELECT * FROM (SELECT ...) TEST <condition to group>
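You can use a subquery in an IN clause (note that SQL Server does not accept a multi-column IN; this form works in databases such as Oracle or PostgreSQL):
select * from test
where (number, seq) in (select number, max(seq) from test group by number)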
Another option is to use a windowed ROW_NUMBER() function with a partition on the number:
With Cte As
(
Select *,
Row_Number() Over (Partition By Number Order By Seq Desc) RN
From TEST
)
Select Number, Seq, Name
From Cte
Where RN = 1
SELECT *
FROM (SELECT test.*, MAX (seq) OVER (PARTITION BY num) max_seq
FROM test)
WHERE seq = max_seq
I changed the column name from number because you can't use a reserved word for a column name. This is pretty much the same as the other answers, except that it explicitly gets the maximum sequence number for each NUM.
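You want to use an analytic function together with a filter to keep only the rows of TEST that you need. The alias of a window function cannot be referenced in the WHERE clause of the same SELECT, so the rank is computed in a second CTE and filtered in the outer query.
WITH TEST AS (
...your really complex query that generates TEST...
),
Ranked AS (
SELECT
Number, Seq, Name,
RANK() OVER (PARTITION BY Number ORDER BY Seq DESC) AS aRank
FROM TEST
)
SELECT Number, Seq, Name, aRank
FROM Ranked
WHERE aRank = 1
;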
This returns the Number, Seq, Name for each Number grouping where the Seq is maximum. Yes, it also returns a column named aRank with all '1' in it...hopefully it can be ignored.
Another solution is to do a self join on only the MAX(Seq) values.
This answer can be found at SQL Select only rows with Max Value on a Column
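For reference, a runnable sketch of the ROW_NUMBER() approach using the question's sample rows. It runs in SQLite through Python's sqlite3 module rather than SQL Server, and names the first column num (as one of the answers does) to stay clear of reserved words. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table test (num integer, seq integer, name text)")
# Sample rows from the question.
conn.executemany(
    "insert into test values (?, ?, ?)",
    [(123, 1, "Hello"), (123, 2, "Hi"), (123, 3, "Greetings"),
     (234, 1, "Goodbye"), (234, 2, "Bye")],
)

# Number the rows of each num from highest seq down, then keep row 1.
query = """
with cte as (
    select num, seq, name,
           row_number() over (partition by num order by seq desc) as rn
    from test
)
select num, seq, name
from cte
where rn = 1
order by num
"""
top_rows = conn.execute(query).fetchall()
print(top_rows)
```

ROW_NUMBER() always keeps exactly one row per group, while RANK() would keep every row tied for the maximum seq; which one fits depends on whether ties are possible in the real data.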

Summing and ordering at once

I have a table of orders. There I need to find out which 3 partner_id's have made the largest sum of amount_totals, and sort those 3 from biggest to smallest.
testdb=# SELECT amount_total, partner_id FROM sale_order;
amount_total | partner_id
--------------+------------
1244.00 | 9
3065.90 | 12
3600.00 | 3
2263.00 | 25
3000.00 | 10
3263.00 | 3
123.00 | 25
5400.00 | 12
(8 rows)
Just starting SQL, I find it confusing ...
Aggregated amounts
If you want to list aggregated amounts, it can be as simple as:
SELECT partner_id, sum(amount_total) AS amount_supertotal
FROM sale_order
GROUP BY 1
ORDER BY 2 DESC
LIMIT 3;
The 1 in GROUP BY 1 is a positional reference to the first item in the SELECT list, a notational shortcut for GROUP BY partner_id in this case. Likewise, ORDER BY 2 sorts by the second item, the sum.
This ignores the special case where more than three partners would qualify and picks 3 arbitrarily (for lack of definition).
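The aggregated query can be tried against the question's eight rows. This sketch uses SQLite through Python's sqlite3 module instead of PostgreSQL; positional GROUP BY and ORDER BY behave the same way there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sale_order (amount_total real, partner_id integer)")
# Rows from the question.
conn.executemany(
    "insert into sale_order values (?, ?)",
    [(1244.00, 9), (3065.90, 12), (3600.00, 3), (2263.00, 25),
     (3000.00, 10), (3263.00, 3), (123.00, 25), (5400.00, 12)],
)

# GROUP BY 1 groups by partner_id; ORDER BY 2 DESC sorts by the summed amount.
query = """
select partner_id, sum(amount_total) as amount_supertotal
from sale_order
group by 1
order by 2 desc
limit 3
"""
top3 = [(pid, round(total, 2)) for pid, total in conn.execute(query)]
print(top3)
```

Partner 12 leads with 8465.90, followed by partner 3 with 6863.00 and partner 10 with 3000.00.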
Individual amounts
SELECT partner_id, amount_total
FROM sale_order
JOIN (
SELECT partner_id, rank() OVER (ORDER BY sum(amount_total) DESC) AS rnk
FROM sale_order
GROUP BY 1
ORDER BY 2
LIMIT 3
) top3 USING (partner_id)
ORDER BY top3.rnk;
This one, on the other hand, includes all peers if more than 3 partners qualify for the top 3. The window function rank() gives you that.
The technique here is to group by partner_id in the subquery top3 and have the window function rank() attach ranks after the aggregation (window functions execute after aggregate functions). ORDER BY is applied after window functions and LIMIT is applied last. All in one subquery.
Then I join the base table to this subquery, so that only the top dogs remain in the result and order by rnk.
Window functions require PostgreSQL 8.4 or later.
This is rather advanced stuff. You should start learning SQL with something simpler probably.
select amount_total, partner_id
from (
select
sum(amount_total) amount_total,
partner_id
from sale_order
group by partner_id
) s
order by amount_total desc
limit 3