Window functions limited by value in separate column


I have a "responses" table in my postgres database that looks like
| id | question_id |
| 1 | 1 |
| 2 | 2 |
| 3 | 1 |
| 4 | 2 |
| 5 | 2 |
I want to produce a table with the response and question id, as well as the id of the previous response with that same question id, as such
| id | question_id | lag_resp_id |
| 1 | 1 | |
| 2 | 2 | |
| 3 | 1 | 1 |
| 4 | 2 | 2 |
| 5 | 2 | 4 |
Obviously, "lag(responses.id) over (order by responses.id)" returns the previous response id regardless of question_id. I attempted the subquery below, but I know it is wrong: for each row, the subquery produces the whole set of lag ids for that question_id rather than a single value.
select
    responses.question_id,
    responses.id as response_id,
    (select
        lag(r2.id, 1) over (order by r2.id)
    from
        responses as r2
    where
        r2.question_id = responses.question_id
    )
from
    responses
I don't know if I'm on the right track with the subquery, or if I need to do something more advanced (which may involve "partition by", which I do not know how to use).
Any help would be hugely appreciated.

Use partition by. There is no need for a correlated subquery here.
select id,question_id,
lag(id) over (partition by question_id order by id) lag_resp_id
from responses
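
To check the query against the sample data from the question, here is a minimal sketch that runs it through Python's built-in sqlite3 module instead of Postgres. It assumes the bundled SQLite is version 3.25 or later (the first release with window functions); the explicit column alias and the final ORDER BY are additions made only to keep the output stable.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (id INTEGER PRIMARY KEY, question_id INTEGER)")
conn.executemany(
    "INSERT INTO responses (id, question_id) VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 1), (4, 2), (5, 2)],
)

# PARTITION BY restarts the window for each question_id, so lag() only
# ever looks at earlier responses to the same question.
rows = conn.execute(
    """
    SELECT id,
           question_id,
           lag(id) OVER (PARTITION BY question_id ORDER BY id) AS lag_resp_id
    FROM responses
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

The first response to each question has no predecessor in its partition, so its lag_resp_id is NULL (None in Python).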

Related

SQL - Group two rows by columns that value and null on different columns

Question
Say I have a table with such rows:
id | country | place | last_action | second_to_last_action
----------------------------------------------------------
1 | US | 2 | reply |
1 | US | 2 | | comment
4 | DE | 5 | reply |
4 | | | | comment
What I want to do is to combine these by id, country and place so that the last_action and second_to_last_action would be on the same row
id | country | place | last_action | second_to_last_action
----------------------------------------------------------
1 | US | 2 | reply | comment
4 | DE | 5 | reply | comment
How would I approach this? I guess I need an aggregate here, but I'm drawing a complete blank on which one to use.
It can be expected that there will always be a matching pair.
Background:
Note: this table has been derived from something like this:
id | country | place | action | time
----------------------------------------------------------
1 | US | 2 | reply | 16:15
1 | US | 2 | comment | 15:16
1 | US | 2 | view | 13:16
4 | DE | 5 | reply | 17:15
4 | DE | 5 | comment | 16:16
4 | DE | 5 | view | 14:12
Code used to partition was:
row_number() over (partition by id order by time desc) as event_no
I then got the last and second_to_last action by taking event_no 1 and 2. If there's a more efficient way to get the last two actions in two distinct columns, I would be happy to hear it.

You can collapse the rows of your derived table with aggregation, because max() ignores the NULLs:
select id, country, place,
       max(last_action) as last_action,
       max(second_to_last_action) as second_to_last_action
from derived
group by id, country, place;
You can do this from the original table using conditional aggregation:
select id, country, place,
       max(case when seqnum = 1 then action end) as last_action,
       max(case when seqnum = 2 then action end) as second_to_last_action
from (select t.*,
             row_number() over (partition by id order by time desc) as seqnum
      from t
     ) t
group by id, country, place;
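
As a rough check of the conditional aggregation, the sketch below loads the original event rows from the question into an in-memory SQLite database via Python's sqlite3 module (SQLite 3.25+ assumed for window functions). Storing time as zero-padded HH:MM text is an assumption of this sketch; it sorts correctly as a string only because of the padding.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, country TEXT, place INTEGER, action TEXT, time TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, "US", 2, "reply", "16:15"),
        (1, "US", 2, "comment", "15:16"),
        (1, "US", 2, "view", "13:16"),
        (4, "DE", 5, "reply", "17:15"),
        (4, "DE", 5, "comment", "16:16"),
        (4, "DE", 5, "view", "14:12"),
    ],
)

# Number each id's events from newest to oldest, then pivot events 1 and 2
# into two columns. MAX() skips the NULLs produced by the CASE expressions.
last_two = conn.execute(
    """
    SELECT id, country, place,
           MAX(CASE WHEN seqnum = 1 THEN action END) AS last_action,
           MAX(CASE WHEN seqnum = 2 THEN action END) AS second_to_last_action
    FROM (SELECT t.*,
                 row_number() OVER (PARTITION BY id ORDER BY time DESC) AS seqnum
          FROM t) AS ranked
    GROUP BY id, country, place
    ORDER BY id
    """
).fetchall()

for row in last_two:
    print(row)
```

Each id comes back as a single row with both actions filled in, without first materialising the half-empty intermediate rows.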

More efficient way to SELECT rows from PARTITION BY

Suppose I have the following table:
+----+-------------+-------------+
| id | step_number | employee_id |
+----+-------------+-------------+
| 1 | 1 | 3 |
| 1 | 2 | 3 |
| 1 | 3 | 4 |
| 2 | 2 | 3 |
| 2 | 3 | 4 |
| 2 | 4 | 5 |
+----+-------------+-------------+
My desired results are:
+----+-------------+-------------+
| id | step_number | employee_id |
+----+-------------+-------------+
| 1 | 1 | 3 |
| 2 | 2 | 3 |
+----+-------------+-------------+
My current solution is:
SELECT
    *
FROM
    (SELECT
        id,
        step_number,
        MIN(step_number) OVER (PARTITION BY id) AS min_step_number,
        employee_id
    FROM
        table_name) AS t
WHERE
    t.step_number = t.min_step_number
Is there a more efficient way I could be doing this?
I'm currently using postgresql, version 12.

In Postgres, I would recommend using distinct on to address this greatest-n-per-group problem:
select distinct on (id) t.*
from table_name t
order by id, step_number
This Postgres extension to the SQL standard usually performs better than the standard approach using window functions (and, as a bonus, the syntax is neater).
Note that this assumes (id, step_number) tuples are unique: otherwise, the results might differ from those of your query, which returns every tied row, while distinct on returns exactly one row per id.
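
The difference in tie handling can be seen in a small sketch. SQLite has no DISTINCT ON, so this example swaps in row_number() as a stand-in for "exactly one row per id"; it is not the Postgres query itself. The duplicated (1, 1) step and the employee_id tie-breaker are hypothetical additions for illustration (SQLite 3.25+ assumed).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name (id INTEGER, step_number INTEGER, employee_id INTEGER)"
)
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [
        (1, 1, 3),
        (1, 1, 9),  # hypothetical tie on (id, step_number)
        (1, 2, 3),
        (2, 2, 3),
        (2, 3, 4),
    ],
)

# The asker's approach: compare against the per-id minimum. Ties survive.
with_ties = conn.execute(
    """
    SELECT id, step_number, employee_id
    FROM (SELECT *, MIN(step_number) OVER (PARTITION BY id) AS min_step_number
          FROM table_name) AS t
    WHERE step_number = min_step_number
    ORDER BY id, employee_id
    """
).fetchall()

# One row per id, like DISTINCT ON. The extra employee_id sort key makes
# the choice among tied rows deterministic.
one_per_id = conn.execute(
    """
    SELECT id, step_number, employee_id
    FROM (SELECT *, row_number() OVER (
                     PARTITION BY id ORDER BY step_number, employee_id) AS rn
          FROM table_name) AS t
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()

print(with_ties)
print(one_per_id)
```

In Postgres the analogous tie-breaker would go at the end of the ORDER BY that accompanies distinct on; without one, which of the tied rows is returned is not specified.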

What Clause would most optimally create this query?

I don't have much experience with SQL and am trying to learn. I came across this interview question, and maybe I'm missing a piece of information needed to solve it, or maybe I'm approaching the problem wrong.
This is the question:
We have the following two tables; their definitions are below:
POLICY (id as int, policy_content as varchar2)
POLICY_VOTES (vote as boolean, policy_id as int)
Write a single query that returns the policy_id, number of yes(true) votes and number of no(false) votes with a row for each policy up for a vote stored.
My first thought was to use a WITH clause to get the policy_ids and an inner join to get the yes and no votes, but I can't find a way to make it work. That leads me to believe there is another SQL clause I'm not aware of that would make this easier, or that I'm thinking about the problem the wrong way.

Good question.
I cannot answer too specifically, since you did not specify a DBMS, but what you will want to do is count or situationally sum based on criteria. When you use an aggregate function like that, you also need GROUP BY.
Here are two example tables I made with test data:
policy
| id | policy_content |
|----|----------------|
| 1 | foo |
| 2 | foo |
| 3 | foo |
| 4 | foo |
| 5 | foo |
policy_votes
| vote | policy_id |
|------|-----------|
| yes | 1 |
| no | 1 |
| yes | 2 |
| yes | 2 |
| no | 3 |
| no | 3 |
| no | 4 |
| yes | 4 |
| yes | 5 |
| yes | 5 |
Using the below query:
SELECT
policy_votes.policy_id,
SUM(CASE WHEN vote = 'yes' THEN 1 ELSE 0 END) AS yes_votes,
SUM(CASE WHEN vote = 'no' THEN 1 ELSE 0 END) AS no_votes
FROM
policy_votes
GROUP BY
policy_votes.policy_id
You get:
| POLICY_ID | YES_VOTES | NO_VOTES |
|-----------|-----------|----------|
| 1 | 1 | 1 |
| 2 | 2 | 0 |
| 4 | 1 | 1 |
| 5 | 2 | 0 |
| 3 | 0 | 2 |
Here is an SQL Fiddle for you to try it out.

Try this:
select p.id, p.policy_content,
       count(case when pv.vote = 'true' then 1 end) as number_of_yes,
       count(case when pv.vote = 'false' then 1 end) as number_of_no
from policy p join policy_votes pv
  on (p.id = pv.policy_id)
group by p.id, p.policy_content
Cheers!!
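
One caveat with both queries above: a policy that has no rows in policy_votes is left out of the result. If "a row for each policy" is read as including such policies, a LEFT JOIN from policy keeps them with zero counts. The sketch below is one way to do that, run through Python's sqlite3 module; storing the boolean vote as 1/0 and the three sample policies are assumptions of this example, not part of the original question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policy (id INTEGER PRIMARY KEY, policy_content TEXT)")
conn.execute("CREATE TABLE policy_votes (vote INTEGER, policy_id INTEGER)")
conn.executemany(
    "INSERT INTO policy VALUES (?, ?)",
    [(1, "foo"), (2, "foo"), (3, "foo")],  # policy 3 has no votes yet
)
conn.executemany(
    "INSERT INTO policy_votes VALUES (?, ?)",
    [(1, 1), (0, 1), (1, 2), (1, 2)],  # vote: 1 = yes, 0 = no (assumed encoding)
)

# LEFT JOIN keeps every policy. For a policy without votes, pv.vote is NULL,
# both CASE expressions yield NULL, and COUNT() returns 0.
tally = conn.execute(
    """
    SELECT p.id AS policy_id,
           COUNT(CASE WHEN pv.vote = 1 THEN 1 END) AS yes_votes,
           COUNT(CASE WHEN pv.vote = 0 THEN 1 END) AS no_votes
    FROM policy AS p
    LEFT JOIN policy_votes AS pv ON pv.policy_id = p.id
    GROUP BY p.id
    ORDER BY p.id
    """
).fetchall()

for row in tally:
    print(row)
```

COUNT is used rather than SUM here because SUM over a group with no matching votes would need an ELSE 0 branch or a COALESCE to avoid returning NULL.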

SQL Group by one column and decide which column to choose

Let's say I have data like this :
| id | code | name | number |
-----------------------------------------------
| 1 | 20 | A | 10 |
| 2 | 20 | B | 20 |
| 3 | 10 | C | 30 |
| 4 | 10 | D | 80 |
I would like to group rows by code value, but get real rows back (not some aggregate function).
I know that just
select *
from table
group by code
won't work because the database doesn't know which row to return when several rows share the same code.
So my question is how to tell the database to select (for example) the row with the lowest number for each code, so in my case
| id | code | name | number |
-----------------------------------------------
| 1 | 20 | A | 10 |
| 3 | 10 | C | 30 |
P.S.
I know how to do this with PARTITION, but as far as I know that is only allowed in Oracle databases, and it can't be expressed with the JPA criteria builder (which is my ultimate goal).

Why don't you use code like this?
SELECT
    id,
    code,
    name,
    number
FROM
    (
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY code ORDER BY number ASC) AS RowNo
    FROM table
    ) s
WHERE s.RowNo = 1
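
Since the question asks for something that avoids PARTITION, a possible alternative is a correlated subquery that keeps only rows whose number equals the minimum for their code. This is a sketch of a different technique from the ROW_NUMBER() query above, run here through Python's sqlite3 module; the table name items is hypothetical (the question's table is literally called "table", which is a reserved word).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, code INTEGER, name TEXT, number INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [(1, 20, "A", 10), (2, 20, "B", 20), (3, 10, "C", 30), (4, 10, "D", 80)],
)

# For each row, look up the smallest number among rows with the same code
# and keep the row only if it holds that minimum. No window functions needed.
lowest_per_code = conn.execute(
    """
    SELECT t.id, t.code, t.name, t.number
    FROM items AS t
    WHERE t.number = (SELECT MIN(t2.number)
                      FROM items AS t2
                      WHERE t2.code = t.code)
    ORDER BY t.id
    """
).fetchall()

for row in lowest_per_code:
    print(row)
```

Unlike the ROW_NUMBER() version, this returns every row that ties for the lowest number within a code, so an extra condition would be needed if exactly one row per code is required.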

You can look at this site;
Data Partitioning

selecting data with highest field value in a field

I have a table, and I'd like to select rows with the highest value. For example:
----------------
| user | index |
----------------
| 1 | 1 |
| 2 | 1 |
| 2 | 2 |
| 3 | 4 |
| 3 | 7 |
| 4 | 1 |
| 5 | 1 |
----------------
Expected result:
----------------
| user | index |
----------------
| 1 | 1 |
| 2 | 2 |
| 3 | 7 |
| 4 | 1 |
| 5 | 1 |
----------------
How can I do this? I assume it can be done with some Oracle function I am not aware of.
Thanks in advance :-)

You can use the MAX() function and group by the user column, like this:
SELECT "user"
,MAX("index") AS "index"
FROM Table1
GROUP BY "user"
ORDER BY "user";
Result:
| USER | INDEX |
----------------
| 1 | 1 |
| 2 | 2 |
| 3 | 7 |
| 4 | 1 |
| 5 | 1 |
See this SQLFiddle

If you need to return other columns from the same row as well:
select "user", "index"
from (
    select u.*, row_number() over (partition by "user" order by "index" desc) as rnk
    from some_table u)
where rnk = 1
user and index are reserved words in Oracle, so they have to be quoted when used as identifiers. Better still, use different names for the columns.

select "user", max("index") as "index" from tbl
group by "user";

Alternatively, you can use analytic functions:
select "user", "index", max("index") over (partition by "user") as highest from YOURTABLE
Note: Try not to use words like user, index or date as column names, as they are reserved words in Oracle. If you do use them, put them in double quotation marks, e.g. "index", "date".
