PostgreSQL previous and next group value - sql

The problem is the following.
Suppose I have a table like this (it is a sub-sample of the table I'm working with):
| col1 | col2 |
|------|------|
| 1 | a2 |
| 1 | b2 |
| 2 | c2 |
| 2 | d2 |
| 2 | e2 |
| 1 | f2 |
| 1 | g2 |
| 3 | h2 |
| 1 | j2 |
I need to add two new columns:
prev, containing the previous value in col1 that is not equal to the current one
next, containing the next value in col1 that is not equal to the current one
If there is no previous value, prev should contain the current col1 value; likewise, next should contain the current value if no next value exists.
Result should have the following form:
| col1 | col2 | prev | next |
|------|------|------|------|
| 1 | a2 | 1 | 2 |
| 1 | b2 | 1 | 2 |
| 2 | c2 | 1 | 1 |
| 2 | d2 | 1 | 1 |
| 2 | e2 | 1 | 1 |
| 1 | f2 | 2 | 3 |
| 1 | g2 | 2 | 3 |
| 3 | h2 | 1 | 1 |
| 1 | j2 | 3 | 1 |
I would be grateful for any help.

You can try this using a combination of the window functions lead, lag, first_value, last_value and sum.
select
t.col1, t.col2, n,
coalesce(first_value(y) over (partition by x order by col2), col1) prev_val,
coalesce(last_value(y2) over (partition by x order by col2
rows between current row and unbounded following), col1) next_val
from (
select
t.*,
case when col1 <> lag(col1) over (order by col2) then lag(col1) over (order by col2) end y,
case when col1 <> lead(col1) over (order by col2) then lead(col1) over (order by col2) end y2,
sum(n) over (order by col2) x
from (
select
t.*,
case when col1 <> lag(col1) over (order by col2) then 1 else 0 end n
from t
) t
) t;
It numbers each run of consecutive equal col1 values (x) and then finds the lag/lead value once per run, at the run's first and last row.
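To make the "per group of rows" idea concrete, here is a minimal Python sketch of the same logic outside the database. It is an illustration rather than a translation of the query: the helper name add_prev_next is hypothetical, and the rows are assumed to be already in the intended order.

```python
from itertools import groupby

# Sample data from the question, assumed to be in the intended row order.
rows = [(1, "a2"), (1, "b2"), (2, "c2"), (2, "d2"), (2, "e2"),
        (1, "f2"), (1, "g2"), (3, "h2"), (1, "j2")]


def add_prev_next(rows):
    """Attach the col1 value of the neighbouring runs to every row."""
    # Collapse consecutive rows with equal col1 into runs.
    runs = [(value, list(group))
            for value, group in groupby(rows, key=lambda r: r[0])]
    result = []
    for i, (value, members) in enumerate(runs):
        # Fall back to the current value at either end of the data.
        prev_value = runs[i - 1][0] if i > 0 else value
        next_value = runs[i + 1][0] if i + 1 < len(runs) else value
        result.extend((col1, col2, prev_value, next_value)
                      for col1, col2 in members)
    return result


for row in add_prev_next(rows):
    print(row)
```

Adjacent runs always differ in col1, so the neighbouring run's value is exactly the previous or next value not equal to the current one.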

If I assume that you have an id column that specifies the ordering, then this is possible. I'm just not sure this is easily expressed using window functions.
You can use correlated subqueries:
select t.*,
(select t2.col1
from t t2
where t2.id < t.id and t2.col1 <> t.col1
order by t2.id desc
fetch first 1 row only
) as prev_col1,
(select t2.col1
from t t2
where t2.id > t.id and t2.col1 <> t.col1
order by t2.id asc
fetch first 1 row only
) as next_col1
from t;
You can add coalesce() for the missing previous and next values. That is not the interesting part of the problem.
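As a rough check of the correlated-subquery approach with coalesce() added, the sketch below runs it against the question's sample data using Python's built-in sqlite3 module. The id column is an assumption (the question does not show one), and LIMIT 1 stands in for fetch first 1 row only because SQLite does not accept that syntax.

```python
import sqlite3

data = [(1, "a2"), (1, "b2"), (2, "c2"), (2, "d2"), (2, "e2"),
        (1, "f2"), (1, "g2"), (3, "h2"), (1, "j2")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col1 INTEGER, col2 TEXT)")
# The id column is assumed; it only records the intended row order.
conn.executemany(
    "INSERT INTO t (id, col1, col2) VALUES (?, ?, ?)",
    [(i, c1, c2) for i, (c1, c2) in enumerate(data, start=1)],
)

query = """
SELECT t.col1, t.col2,
       COALESCE((SELECT t2.col1 FROM t AS t2
                 WHERE t2.id < t.id AND t2.col1 <> t.col1
                 ORDER BY t2.id DESC LIMIT 1), t.col1) AS prev_col1,
       COALESCE((SELECT t2.col1 FROM t AS t2
                 WHERE t2.id > t.id AND t2.col1 <> t.col1
                 ORDER BY t2.id ASC LIMIT 1), t.col1) AS next_col1
FROM t
ORDER BY t.id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Each subquery scans for the nearest row with a different col1, so on large tables an index on id is likely to matter.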

WITH cte AS (
SELECT row_number() over() rowid, *
FROM unnest(array[1,1,2,2,2,1,1,3,1], array['a2','b2','c2','d2','e2','f2','g2','h2','j2']) t(col1,col2)
)
SELECT t.col1,
t.col2,
COALESCE(prev.col1,t.col1) prev,
COALESCE("next".col1,t.col1) "next"
FROM cte t
LEFT JOIN LATERAL (SELECT prev.col1
FROM cte prev
WHERE prev.rowid < t.rowid
AND prev.col1 != t.col1
ORDER BY prev.rowid DESC
LIMIT 1
) prev ON True
LEFT JOIN LATERAL (SELECT "next".col1
FROM cte "next"
WHERE "next".rowid > t.rowid
AND "next".col1 != t.col1
ORDER BY "next".rowid ASC
LIMIT 1
) "next" ON True

Related

Assign unique id based on combination in sql

The data looks like this:
I need to assign an id based on the combination of two columns and get the id of each value in the two columns.
The final output should look like:
I tried with:
WITH RNS AS (
SELECT *, ROW_NUMBER() OVER () AS rn
FROM test
),
IDS AS (
SELECT t1.coLA, t1.colB, t1.rn, MIN(COALESCE(t2.rn, t1.rn)) AS id
FROM RNS t1
LEFT JOIN RNS t2 ON t1.rn > t2.rn
AND (t1.colA = t2.colA OR t1.colA = t2.colB OR
t1.colB = t2.colA OR t1.colB = t2.colB)
GROUP BY t1.coLA, t1.colB, t1.rn
ORDER BY t1.rn
)
SELECT colA, colB, DENSE_RANK() OVER (ORDER BY id) AS id
FROM IDS
ORDER BY rn
but it is not working as expected.
Using a RECURSIVE CTE in BigQuery, you may try the query below:
WITH RECURSIVE test AS (
SELECT * EXCEPT(offset)
FROM UNNEST(SPLIT('abaaeghjc', '')) colA WITH OFFSET
JOIN UNNEST(SPLIT('bccdfhikl', '')) colB WITH OFFSET USING (offset)
),
IDS AS (
SELECT *, DENSE_RANK() OVER (ORDER BY colA) id
FROM test t1
WHERE NOT EXISTS (SELECT 1 FROM test t2 WHERE t1.colA = t2.colB)
UNION ALL
SELECT t.*, id FROM IDS i JOIN test t ON i.colB = t.colA
)
SELECT DISTINCT col, id FROM IDS, UNNEST([colA, colB]) col
ORDER BY 1;
Query results:
+-----+----+
| col | id |
+-----+----+
| a | 1 |
| b | 1 |
| c | 1 |
| d | 1 |
| e | 2 |
| f | 2 |
| g | 3 |
| h | 3 |
| i | 3 |
| j | 4 |
| k | 4 |
| l | 1 |
+-----+----+
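The traversal that the recursive CTE performs can be sketched in plain Python, which may help when checking the result by hand. This is only an illustration under one assumption: a value reachable from two different roots keeps the first id it receives, whereas the SQL above would return one row per id.

```python
# Same sample pairs as in the query: (colA, colB).
pairs = list(zip("abaaeghjc", "bccdfhikl"))

# Roots are colA values that never appear in colB (the anchor member).
col_b_values = {b for _, b in pairs}
roots = sorted({a for a, _ in pairs if a not in col_b_values})

ids = {}
for group_id, root in enumerate(roots, start=1):
    # Follow colA -> colB links from the root (the recursive member).
    stack = [root]
    while stack:
        node = stack.pop()
        if node in ids:
            continue  # assumption: the first id assigned wins
        ids[node] = group_id
        stack.extend(b for a, b in pairs if a == node)

for col in sorted(ids):
    print(col, ids[col])
```

Numbering the sorted roots from 1 plays the role of DENSE_RANK() OVER (ORDER BY colA) in the anchor member.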

SQL: Select Most Recent Sequentially Distinct Value w/ Grouping

I am having trouble writing a query that would select the last "new" sequentially distinct value (let's call this column Col A) grouped based on another column (Col B). Since this is a bit ambiguous/confusing, here is an example to explain (assume row number is indicative of sequence inside groups; in my issue the rows are ordered by date):
|--------|-------|-------|
| RowNum | Col A | Col B |
|--------|-------|-------|
| 1 | A | A |
| 2 | B | A |
| 3 | C | A |
| 4 | B | B |
| 5 | A | B |
| 6 | B | B |
Would select:
| 3 | C | A |
| 6 | B | B |
Note that although B also appears in row 4, the fact that row 5 contains A means that the B in row 6 is sequentially distinct. But if table looked like this:
|--------|-------|-------|
| RowNum | Col A | Col B |
|--------|-------|-------|
| 1 | A | A |
| 2 | B | A |
| 3 | C | A |
| 4 | B | B |
| 5 | A | B |
| 6 | A | B | <--
Then we would want to select:
| 3 | C | A |
| 5 | A | B |
I think that this would be an easier problem if I wasn't concerned with values being distinct but not sequential. I'm not really sure how to even consider sequence when making a query.
I have attempted to solve this by calculating the min/max row numbers where each value of Col A appears. That calculation (using the second sample table) would produce a result like this:
|--------|--------|--------|--------|
| ColA | ColB | MinRow | MaxRow |
|--------|--------|--------|--------|
| A | A | 1 | 1 |
| B | A | 2 | 2 |
| C | A | 3 | 3 |
| A | B | 5 | 6 |
| B | B | 4 | 4 |
A solution raised in a related post (SQL: Select Row with Last New Sequentially Distinct Value) went on a similar path, essentially taking the most recent RowNum which differs from the last ColA and then picks the next row. However, in that question I failed to address the need for the query to work for multiple groups, hence the new post.
Any help with this problem, if it is at all possible to do in SQL, would be greatly appreciated. I am running SQL Server 2008 SP4.
Hmmm . . . One method is to get the last value. Then choose all the last rows with that value and aggregate:
select min(rownum), colA, colB
from (select t.*,
first_value(colA) over (partition by colB order by rownum desc) as last_colA
from t
) t
where rownum > all (select t2.rownum
from t t2
where t2.colB = t.colB and t2.colA <> t.last_colA
)
group by colA, colB;
Or, without the aggregation:
select t.*
from (select t.*,
first_value(colA) over (partition by colB order by rownum desc) as last_colA,
lag(colA) over (partition by colB order by rownum) as prev_colA
from t
) t
where rownum > all (select t2.rownum
from t t2
where t2.colB = t.colB and t2.colA <> t.last_colA
) and
(prev_colA is null or prev_colA <> colA);
But in SQL Server 2008, let's treat this as a gaps-and-islands problem:
select t.*
from (select t.*,
min(rownum) over (partition by colB, colA, (seqnum_b - seqnum_ab) ) as min_rownum_group,
max(rownum) over (partition by colB, colA, (seqnum_b - seqnum_ab) ) as max_rownum_group
from (select t.*,
row_number() over (partition by colB order by rownum) as seqnum_b,
row_number() over (partition by colB, colA order by rownum) as seqnum_ab,
max(rownum) over (partition by colB) as max_rownum
from t
) t
) t
where rownum = min_rownum_group and -- first row in the group defined by adjacent colA, colB
max_rownum_group = max_rownum; -- last group for each colB
This identifies each of the groups using a difference of row numbers. It calculates the maximum rownum for each group and the maximum rownum overall for each colB. These are the same for the last group.
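The difference-of-row-numbers trick is easier to see with the counters written out. The following Python sketch mirrors the idea on the two sample tables from the question; the function name last_island_starts is hypothetical, and rows are assumed to be (rownum, colA, colB) tuples.

```python
from collections import defaultdict


def last_island_starts(rows):
    """Return the first row of the last run of equal colA within each colB."""
    seq_b = defaultdict(int)    # row_number() over (partition by colB)
    seq_ab = defaultdict(int)   # row_number() over (partition by colB, colA)
    islands = defaultdict(list)
    max_by_b = {}
    for rownum, col_a, col_b in sorted(rows):
        seq_b[col_b] += 1
        seq_ab[(col_b, col_a)] += 1
        # Consecutive rows with the same colA share the same difference.
        key = (col_b, col_a, seq_b[col_b] - seq_ab[(col_b, col_a)])
        islands[key].append(rownum)
        max_by_b[col_b] = rownum  # rows arrive in rownum order
    result = []
    for (col_b, col_a, _), rownums in islands.items():
        if max(rownums) == max_by_b[col_b]:   # last island of this colB
            result.append((min(rownums), col_a, col_b))
    return sorted(result)


first = [(1, "A", "A"), (2, "B", "A"), (3, "C", "A"),
         (4, "B", "B"), (5, "A", "B"), (6, "B", "B")]
second = [(1, "A", "A"), (2, "B", "A"), (3, "C", "A"),
          (4, "B", "B"), (5, "A", "B"), (6, "A", "B")]
print(last_island_starts(first))
print(last_island_starts(second))
```

In the second table, rows 5 and 6 both get a difference of 1, so they form one island and row 5 is reported as its start.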

Fetch 2 records for each group name of element of column 1 that in column 2 have the most value

I have a table like the one below. I want to fetch two records for each group in column 1: the ones with the highest values in column 2. For example, fetch 85 and 75 for A, 65 and 45 for B, and so on.
I use an Oracle database.
Thanks
| Column 1 | Column 2 |
|----------|----------|
| A | 85 |
| A | 75 |
| A | 60 |
| A | 50 |
| B | 65 |
| B | 45 |
| B | 35 |
| B | 25 |
You can use row_number() :
select t.*
from (select t.*,
row_number() over (partition by col1 order by col2 desc) as seq
from table t
) t
where seq <= 2;
However, the fetch first clause can also help:
select t.*
from table t
where t.col2 in (select t1.col2
from table t1
where t1.col1 = t.col1
order by t1.col2 desc
fetch first 2 rows only
);
Use the row_number window function:
with t1 as
(
select col1,col2
,row_number() over(partition by col1 order by col2 desc) rn
from table_name
) select * from t1 where rn<=2
Try using row_number()
select * from
(
select *, row_number() over(partition by col1 order by col2 desc) as rn
from tablename
) A where rn in (1,2)
Try this, using rank():
select col1, col2
from (select col1, col2,
rank() over (partition by col1 order by col2 desc) as rn
from t
) q
where q.rn <= 2;
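The answers above use both row_number() and rank(), which behave differently when values tie. The Python sketch below imitates the two rules to show the difference; the function top_n and its with_ties flag are hypothetical names used only for illustration.

```python
from collections import defaultdict


def top_n(rows, n, with_ties=False):
    """Keep the n highest col2 values per col1.

    with_ties=False imitates row_number() <= n (exactly n rows at most),
    with_ties=True imitates rank() <= n (ties can add extra rows).
    """
    groups = defaultdict(list)
    for col1, col2 in rows:
        groups[col1].append(col2)
    result = []
    for col1 in sorted(groups):
        values = sorted(groups[col1], reverse=True)
        if with_ties:
            # rank = 1 + number of strictly greater values in the group
            kept = [v for v in values
                    if 1 + sum(w > v for w in values) <= n]
        else:
            kept = values[:n]
        result.extend((col1, v) for v in kept)
    return result


sample = [("A", 85), ("A", 75), ("A", 60), ("A", 50),
          ("B", 65), ("B", 45), ("B", 35), ("B", 25)]
tied = [("A", 85), ("A", 75), ("A", 75), ("A", 50)]
print(top_n(sample, 2))
print(top_n(tied, 2))
print(top_n(tied, 2, with_ties=True))
```

On the question's data both rules give the same four rows; they only diverge when the second-highest value is repeated.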

If 2 rows have the same ID select one with the greater other column value

I'm having difficulty getting my head round this one, which should be simple.
When selecting from the table, if multiple rows have the same ID then select the row which has a greater value in Col2.
Here is my sample table:
ID | Col2 |
----------------
123 | 1 |
123 | 2 |
1234 | 2 |
12345 | 3 |
Expected output:
ID | Col2 |
----------------
123 | 2 |
1234 | 2 |
12345 | 3 |
For this example, group by is sufficient:
select id, max(col2) as col2
from t
group by id;
If you want the whole row with the maximum col2, then I would often recommend row_number():
select t.*
from (select t.*, row_number() over (partition by id order by col2 desc) as seqnum
from t
) t
where seqnum = 1;
However, the "old-fashioned" method might have better performance:
select t.*
from t
where t.col2 = (select max(t2.col2) from t t2 where t2.id = t.id);
The NOT EXISTS operator can also be used:
SELECT * FROM Table1 t1
WHERE NOT EXISTS(
SELECT 'Anything' FROM Table1 t2
WHERE t1.id = t2.id
AND t1.Col2 < t2.col2
)
Demo: http://sqlfiddle.com/#!18/5e1d6/3
| ID | Col2 |
|-------|------|
| 123 | 2 |
| 1234 | 2 |
| 12345 | 3 |
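For a quick local comparison, the sketch below runs three of the approaches above (group by, correlated max, and NOT EXISTS) against the sample table using Python's built-in sqlite3 module. The table name t and the in-memory database are assumptions for the demonstration; the original demo targets SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, col2 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(123, 1), (123, 2), (1234, 2), (12345, 3)])

queries = {
    "group by": """
        SELECT id, MAX(col2) AS col2 FROM t GROUP BY id ORDER BY id
    """,
    "correlated max": """
        SELECT t.id, t.col2 FROM t
        WHERE t.col2 = (SELECT MAX(t2.col2) FROM t AS t2 WHERE t2.id = t.id)
        ORDER BY t.id
    """,
    "not exists": """
        SELECT t1.id, t1.col2 FROM t AS t1
        WHERE NOT EXISTS (SELECT 1 FROM t AS t2
                          WHERE t1.id = t2.id AND t1.col2 < t2.col2)
        ORDER BY t1.id
    """,
}

# Run every variant and collect its rows under the variant's name.
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

Note that the last two variants return every row that ties for the maximum within an id, while group by always returns one row per id.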

Trying to write a query that will display duplicates results as null

I have a table that looks like the first example.
I'm trying to write a MSSQL 2012 statement that will display results like the second example.
Basically I want null values instead of duplicate values in columns 1 and 2. This is for readability purposes during reporting.
This seems like it should be possible, but I'm drawing a blank. No amount of joins or unions I've written has rendered the results I need.
| Col1 | Col2 | Col3 |
+------+------+------+
| 1 | 2 | 4 |
| 1 | 2 | 5 |
| 1 | 3 | 6 |
| 1 | 3 | 7 |
+------+------+------+
| Col1 | Col2 | Col3 |
+------+------+------+
| 1 | 2 | 4 |
| null | null | 5 |
| null | 3 | 6 |
| null | null | 7 |
+------+------+------+
I would do this with no subqueries at all:
select (case when row_number() over (partition by col1 order by col2, col3) = 1
then col1
end) as col1,
(case when row_number() over (partition by col1, col2 order by col3) = 1
then col2
end) as col2,
col3
from t
order by t.col1, t.col2, t.col3;
Note that the order by at the end of the query is very important. The result set that you want depends critically on the ordering of the rows. Without the order by, the result set could be in any order. So, the query might look like it works, and then suddenly fail one day or on a slightly different set of data.
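The dependence on row order is easy to see if the same blanking is done procedurally. In this Python sketch (the name blank_repeats is hypothetical) the rows are sorted first, and a value is shown only on the first row where it appears: col1 when col1 changes, col2 when the (col1, col2) pair changes.

```python
rows = [(1, 2, 4), (1, 2, 5), (1, 3, 6), (1, 3, 7)]


def blank_repeats(rows):
    """Replace repeated leading values with None, in sorted row order."""
    result = []
    previous = None  # (col1, col2) of the row printed just before
    for col1, col2, col3 in sorted(rows):
        same_col1 = previous is not None and previous[0] == col1
        same_col2 = same_col1 and previous[1] == col2
        result.append((None if same_col1 else col1,
                       None if same_col2 else col2,
                       col3))
        previous = (col1, col2)
    return result


for row in blank_repeats(rows):
    print(row)
```

If the input were processed without the sort, the same values could be blanked on different rows, which is the failure mode the note above warns about.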
Using a common table expression with row_number():
;with cte as (
select *
, rn_1 = row_number() over (partition by col1 order by col2, col3)
, rn_2 = row_number() over (partition by col1, col2 order by col3)
from t
)
select
col1 = case when rn_1 > 1 then null else col1 end
, col2 = case when rn_2 > 1 then null else col2 end
, col3
from cte
Without the CTE:
select
col1 = case when rn_1 > 1 then null else col1 end
, col2 = case when rn_2 > 1 then null else col2 end
, col3
from (
select *
, rn_1 = row_number() over (partition by col1 order by col2, col3)
, rn_2 = row_number() over (partition by col1, col2 order by col3)
from t
) sub
rextester demo: http://rextester.com/UYA17142
returns:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 1 | 2 | 4 |
| NULL | NULL | 5 |
| NULL | 3 | 6 |
| NULL | NULL | 7 |
+------+------+------+