Consolidate duplicate rows based on subgrouping


I have a table of temporal values in which the same value repeats in consecutive runs. I want to keep only one row per run while preserving the order, so a plain DISTINCT won't do.
Suppose the rows, in order, are:
+------+-----+
| time | col |
+------+-----+
| 1 | A |
| 2 | A |
| 3 | A |
| 4 | B |
| 5 | B |
| 6 | B |
| 7 | C |
| 8 | D |
| 9 | E |
| 10 | A |
| 11 | A |
| 12 | B |
+------+-----+
Then the result should be:
+-----+
| col |
+-----+
| A |
| B |
| C |
| D |
| E |
| A |
| B |
+-----+
Is there a way to do this without a cursor? Outside SQL, I would iterate over the list and drop the current element whenever it equals the previous one.

SQL tables represent unordered sets. The rows you want to remove depend on the ordering: specifically, adjacent identical values are being removed.
In order to have an ordering, the data needs a column that specifies it. Let me assume you have one.
If so, this is easily handled with lag():
select col
from (select t.*, lag(col) over (order by orderingcol) as prev_col
from t
) t
where prev_col <> col or prev_col is null;
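To make the lag() approach concrete, here is a minimal sketch that runs the query above against the sample data from the question. It uses Python's built-in sqlite3 module purely as a convenient test bed (an assumption, since the question does not name a database; window functions need SQLite 3.25 or later), and the `time` column is renamed to `orderingcol` to match the answer.

```python
import sqlite3

# Sample rows from the question: (orderingcol, col).
data = [(1, "A"), (2, "A"), (3, "A"), (4, "B"), (5, "B"), (6, "B"),
        (7, "C"), (8, "D"), (9, "E"), (10, "A"), (11, "A"), (12, "B")]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (orderingcol integer, col text)")
conn.executemany("insert into t values (?, ?)", data)

# Keep a row only when its value differs from the previous row's value
# (or when there is no previous row at all).
query = """
select col
from (select t.*, lag(col) over (order by orderingcol) as prev_col
      from t
     ) t
where prev_col <> col or prev_col is null
order by orderingcol
"""
deduped = [row[0] for row in conn.execute(query)]
print(deduped)
```

The outer `order by` is added here only so the output order is deterministic; the filtering logic is unchanged.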

Related

SQL SERVER - Select next row value up to 5 characters and then replace first character with new one

I want to read the next value of the same column, append it to a running string of at most five characters, and store the result in a different column, but I have not been able to do so.
See below for a better visualization.
Here's the input:
------------------------------
| ID | word |
------------------------------
| 1 | M |
| 2 | V |
| 3 | V |
| 4 | M |
| 5 | V |
| 6 | M |
| 7 | V |
| 8 | M |
| 9 | V |
| 10 | V |
------------------------------
Desired output:
--------------------------------------------
| ID | word | expected |
--------------------------------------------
| 1 | M | M |
| 2 | V | MV |
| 3 | V | MVV |
| 4 | M | MVVM |
| 5 | V | MVVMV |
| 6 | M | VVMVM |
| 7 | V | VMVMV |
| 8 | M | MVMVM |
| 9 | V | VMVMV |
| 10 | V | MVMVV |
--------------------------------------------
In the expected column, once the string holds five characters, appending the next one first drops the oldest character. For example, at row 6 the leading 'M' is removed from 'MVVMV' (row 5) and the 'M' from row 6 is appended, giving 'VVMVM'.
I hope the logic is clear. I have tried many ways to achieve this, but with no luck.
Thank you

You can use lag() and concat():
select t.*,
concat(lag(word, 4) over (order by id),
lag(word, 3) over (order by id),
lag(word, 2) over (order by id),
lag(word, 1) over (order by id),
word
) as concat_5
from t;
Unfortunately, SQL Server does not (yet) support STRING_AGG() as a window function. If it did, you could use:
select t.*,
string_agg(word, '') within group (order by id) over
(order by id rows between 4 preceding and current row) as concat_5
from t;
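The lag()-based sliding window can be checked against the sample data with a small sketch. It runs on SQLite through Python's sqlite3 module rather than on SQL Server (an assumption made only to keep the example self-contained). Because SQLite's `||` operator returns NULL when any operand is NULL, the sketch passes `''` as the default argument of lag() instead of relying on concat() to swallow NULLs.

```python
import sqlite3

# Sample rows from the question: (id, word).
words = ["M", "V", "V", "M", "V", "M", "V", "M", "V", "V"]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, word text)")
conn.executemany("insert into t values (?, ?)", list(enumerate(words, start=1)))

# Concatenate the four preceding words and the current one.
# lag(word, n, '') yields '' when there is no row n positions back.
query = """
select id, word,
       lag(word, 4, '') over (order by id) ||
       lag(word, 3, '') over (order by id) ||
       lag(word, 2, '') over (order by id) ||
       lag(word, 1, '') over (order by id) ||
       word as concat_5
from t
order by id
"""
windows = [row[2] for row in conn.execute(query)]
print(windows)
```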

Select one row inside a group according to a criteria in PostgreSQL

I have a table as such (tbl):
+----+-----+------+-----+
| pk | grp | attr | val |
+----+-----+------+-----+
| 0 | 0 | ohif | 4 |
| 1 | 0 | foha | 56 |
| 2 | 0 | slns | 2 |
| 3 | 1 | faso | 11 |
| 4 | 1 | tepj | 4 |
| 5 | 2 | bnda | 12 |
| 6 | 2 | ojdf | 9 |
| 7 | 2 | anaw | 1 |
+----+-----+------+-----+
I would like to select one row from each group, in particular that with the maximum val for each group.
I can easily select grp and val:
SELECT grp, MAX(val)
FROM tbl
GROUP BY grp
Yielding this table (tbl2):
+-----+-----+
| grp | val |
+-----+-----+
| 0 | 56 |
| 1 | 11 |
| 2 | 12 |
+-----+-----+
However, I want this table:
+----+-----+------+-----+
| pk | grp | attr | val |
+----+-----+------+-----+
| 1 | 0 | foha | 56 |
| 3 | 1 | faso | 11 |
| 5 | 2 | bnda | 12 |
+----+-----+------+-----+
Since (grp, val) constitutes a key, I could left-join tbl2 with tbl on same grp and val.
However, I was wondering if there is any other solution: in my real-world situation tbl is a pretty complex and heavy derived table, and I have the design constraint of not being able to use temp tables. Is there any way to order the rows inside each group according to val and then take the first record of each group?
I'm on PostgreSQL 10, but a standard SQL solution would be the best.

In Postgres, the best approach is distinct on:
SELECT DISTINCT ON (t.grp) t.*
FROM tbl t
ORDER BY grp, val DESC;
In particular, this can take advantage of an index on (grp, val desc).
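Since the question also asks for a standard SQL solution, a portable alternative to DISTINCT ON is to number the rows inside each group with row_number() and keep the first one. The sketch below is not the answer's method; it is a swapped-in technique, run on SQLite through Python's sqlite3 module only so that it can be executed as-is. The names `ranked` and `rn` are illustrative.

```python
import sqlite3

# Sample rows from the question: (pk, grp, attr, val).
data = [(0, 0, "ohif", 4), (1, 0, "foha", 56), (2, 0, "slns", 2),
        (3, 1, "faso", 11), (4, 1, "tepj", 4),
        (5, 2, "bnda", 12), (6, 2, "ojdf", 9), (7, 2, "anaw", 1)]

conn = sqlite3.connect(":memory:")
conn.execute("create table tbl (pk integer, grp integer, attr text, val integer)")
conn.executemany("insert into tbl values (?, ?, ?, ?)", data)

# Rank rows within each grp by val, highest first, and keep rank 1.
query = """
select pk, grp, attr, val
from (select tbl.*,
             row_number() over (partition by grp order by val desc) as rn
      from tbl
     ) ranked
where rn = 1
order by grp
"""
top_rows = list(conn.execute(query))
print(top_rows)
```

If two rows in a group tie on val, row_number() picks one of them arbitrarily; adding a tie-breaker such as pk to the window's `order by` would make the choice deterministic.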

Sorting column A based on a column B which contains previous values from column A

I'd like to sort column A based on a column B which contains previous values from column A.
This is what I have:
+----+----------+----------+
| ID | A | B |
+----+----------+----------+
| 1 | 17209061 | |
| 2 | 53199491 | 51249612 |
| 3 | 61249612 | 17209061 |
| 4 | 51249612 | 61249612 |
+----+----------+----------+
And this is what I'd like to have:
+----+----------+----------+----------+
| ID | A | B | Sort_seq |
+----+----------+----------+----------+
| 1 | 17209061 | | 1 |
| 3 | 61249612 | 17209061 | 2 |
| 4 | 51249612 | 61249612 | 3 |
| 2 | 53199491 | 51249612 | 4 |
+----+----------+----------+----------+
I'm sure there's an easy way to do this. Do you have any ideas?
Thank you!

Just use lag() in order by:
order by lag(a) over (order by id) nulls first
If you want a column, then:
select t.id, t.a, t.prev_a,
row_number() over (order by prev_a nulls first) as sort_seq
from (select t.*, lag(a) over (order by id) as prev_a
from t
) t;
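In the sample data, B looks like a pointer to the A value of the row that should come immediately before. If that reading is right (it is an assumption, not something the question states outright), the desired order is a linked chain starting at the row whose B is empty, and a recursive CTE can walk it. The sketch below is an alternative to the lag() query, run on SQLite through Python's sqlite3 module; the CTE name `chain` is illustrative.

```python
import sqlite3

# Sample rows from the question: (id, a, b); the empty B is stored as NULL.
data = [(1, 17209061, None), (2, 53199491, 51249612),
        (3, 61249612, 17209061), (4, 51249612, 61249612)]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, a integer, b integer)")
conn.executemany("insert into t values (?, ?, ?)", data)

# Start at the row with no predecessor, then repeatedly pick the row
# whose b equals the a of the row just visited.
query = """
with recursive chain(id, a, b, sort_seq) as (
    select id, a, b, 1
    from t
    where b is null
    union all
    select t.id, t.a, t.b, chain.sort_seq + 1
    from t
    join chain on t.b = chain.a
)
select id, a, b, sort_seq
from chain
order by sort_seq
"""
ordered = list(conn.execute(query))
print(ordered)
```

This sketch assumes a single chain with no cycles; data containing a loop would make the recursion run without end.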

SQL - LIMITing AND filtering a join at same time

I need a solution for the following problem.
I have two tables:
IDs of new users (obtained from a subquery)
+------------+
| user_id |
+------------+
| 1 |
| 4 |
| 5 |
+------------+
users (table with all users)
+------------+
| user_id |
+------------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| ... |
+------------+
I need to join these two tables. Every new user needs exactly 3 connections to other users.
for example:
+----------+------+
| new_user | user |
+----------+------+
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 4 | 1 |
| 4 | 2 |
| 4 | 3 |
| 5 | 1 |
| 5 | 2 |
| 5 | 3 |
+----------+------+
The problem is to limit the entries to exactly 3 per new user and to exclude redundant entries (like 1|1, 3|3, ...).

In PostgreSQL you can use a lateral query to retrieve a limited number of rows in a subquery.
I don't know the exact structure of your main query or subquery but it should look like:
select t.*, ls.*
from main_table t,
lateral ( -- lateral subquery
select * from secondary_table s
where s.col1 = t.col2 -- filtering condition, if needed
fetch first 3 rows only -- limit to a max of 3 rows
) ls;
The lateral subquery is executed once for every row of main_table.
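To check the "at most 3 other users per new user" logic against the sample data, here is a small runnable sketch. It runs on SQLite through Python's sqlite3 module, and because SQLite has no LATERAL join, it swaps in row_number() partitioned by the new user to apply the per-user limit. The table names `new_users` and `users` are assumptions standing in for the question's subquery and user table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table new_users (user_id integer)")
conn.execute("create table users (user_id integer)")
conn.executemany("insert into new_users values (?)", [(1,), (4,), (5,)])
conn.executemany("insert into users values (?)", [(i,) for i in range(1, 6)])

# Pair each new user with every other user (never with itself),
# number the pairs per new user, and keep the first three.
query = """
select new_user, user_id
from (select n.user_id as new_user,
             u.user_id,
             row_number() over (partition by n.user_id
                                order by u.user_id) as rn
      from new_users n
      join users u on u.user_id <> n.user_id
     ) numbered
where rn <= 3
order by new_user, user_id
"""
pairs = list(conn.execute(query))
print(pairs)
```

In PostgreSQL the same filter (`u.user_id <> n.user_id`) would go inside the lateral subquery, with the row limit taking the place of `rn <= 3`.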

SQL get intersection of values across multiple rows grouped by primary key

I have a table with data as follows:
+----+------+
| id | code |
+----+------+
| 1 | M |
| 1 | Y |
| 2 | M |
| 2 | S |
| 3 | M |
| 3 | Q |
+----+------+
I would like to know if it's possible to write a query that returns the list of codes that are common to every ID. If there is no intersection, the query should return no rows.
In the example above, the only value common to all is M.
+----+------+
| id | code |
+----+------+
| 1 | M |
| 1 | S |
| 2 | M |
| 2 | S |
| 2 | H |
| 3 | M |
| 3 | S |
| 3 | Q |
+----+------+
The above would return M and S, which are common to all three IDs.
Thanks

Try this:
SELECT code
FROM mytable
GROUP BY code
HAVING COUNT(*) = (SELECT COUNT(DISTINCT id) FROM mytable)
The above query assumes that code can appear only once per id.
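If a code might appear more than once for the same id, one way to drop that assumption is to count distinct ids per code instead of rows. The sketch below tries this variant on the second sample table, using SQLite through Python's sqlite3 module as a stand-in database (an assumption; the question does not name one).

```python
import sqlite3

# Second sample table from the question: (id, code).
data = [(1, "M"), (1, "S"),
        (2, "M"), (2, "S"), (2, "H"),
        (3, "M"), (3, "S"), (3, "Q")]

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (id integer, code text)")
conn.executemany("insert into mytable values (?, ?)", data)

# A code is common to all ids when the number of distinct ids
# it appears with equals the total number of distinct ids.
query = """
select code
from mytable
group by code
having count(distinct id) = (select count(distinct id) from mytable)
order by code
"""
common = [row[0] for row in conn.execute(query)]
print(common)
```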
