How do I select sequential numbers for non-unique values? - sql

I've got a column with non-unique values like so:
ID COL_A
0 A
1 B
2 B
3 C
4 D
5 D
6 D
7 E
I would like to select an offset in addition to those two columns which produces the following output:
ID COL_A OFFSET
0 A 0
1 B 0
2 B 1
3 C 0
4 D 0
5 D 1
6 D 2
7 E 0
The offset should be applied so that the value with the lower primary key receives the lower offset.
I could probably come up with a PL/SQL approach to get this, but is this possible in pure SQL?

The row_number() window function is just what the doctor ordered:
SELECT id, col_a, ROW_NUMBER() OVER (PARTITION BY col_a ORDER BY id) - 1 AS offset
FROM mytable
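As a runnable sanity check, here is the accepted query reproduced with Python's built-in sqlite3 module (window functions need SQLite 3.25+); the table name and data mirror the question:

```python
import sqlite3

# Rebuild the question's sample table in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, col_a TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(0, "A"), (1, "B"), (2, "B"), (3, "C"),
     (4, "D"), (5, "D"), (6, "D"), (7, "E")],
)

# ROW_NUMBER() restarts at 1 for every col_a partition; subtracting 1
# turns it into the zero-based offset the question asks for.
rows = conn.execute("""
    SELECT id, col_a,
           ROW_NUMBER() OVER (PARTITION BY col_a ORDER BY id) - 1 AS "offset"
    FROM mytable
    ORDER BY id
""").fetchall()
print(rows)
# -> [(0, 'A', 0), (1, 'B', 0), (2, 'B', 1), (3, 'C', 0),
#     (4, 'D', 0), (5, 'D', 1), (6, 'D', 2), (7, 'E', 0)]
```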

Use ROW_NUMBER:
SELECT col1, col2, ROW_NUMBER() OVER (PARTITION BY col2 ORDER BY col1) AS seq
FROM TableName

Related

Repeated values should not show together in SQL

I want to display some data, and my requirement is that repeated values should not be shown adjacent to each other.
Right now the data in the table is in this order
ID Name
1 A
2 A
3 B
4 C
5 B
6 B
7 C
8 C
9 C
Expected result - It should be in below order
ID Name
1 A
3 B
4 C
2 A
5 B
7 C
6 B
8 C
9 C
This can be done using the ROW_NUMBER window function.
SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ID) AS rn
FROM mytable
ORDER BY rn, Name
You can put row_number() directly in the order by. I would recommend:
select t.*
from t
order by row_number() over (partition by name order by id),
name;
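Both answers rest on the same ROW_NUMBER trick; a quick sqlite3 sketch (standing in for a full RDBMS, with the question's data) confirms the interleaved ordering:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "A"), (2, "A"), (3, "B"), (4, "C"), (5, "B"),
     (6, "B"), (7, "C"), (8, "C"), (9, "C")],
)

# Rows with the same per-name rank sort together, so equal names end
# up spread apart rather than adjacent (until one value runs out).
rows = conn.execute("""
    SELECT ID, Name,
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ID) AS rn
    FROM mytable
    ORDER BY rn, Name
""").fetchall()
print([r[0] for r in rows])  # -> [1, 3, 4, 2, 5, 7, 6, 8, 9]
```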

Count the number of unique values with at least k occurrences per group in postgres

I have a table with 3 columns that looks like this :
ID obs_type Value
1 A 0.1
1 A 0.2
1 B 0.4
2 B 0.5
2 C 0.2
2 C 0.3
3 B 0.1
I want the count of IDs that have at least k observations of each obs_type.
In the example above, if k = 2 (at least 2 observations of the same ID to be counted), I would like to have :
obs_type | count
A 1
B 0
C 1
There is a single ID with two observations of type A and a single ID with two observations of type C, but no ID with two observations of type B.
For k = 1, I just do :
SELECT obs_type, COUNT(DISTINCT ID ) FROM table_x GROUP BY obs_type;
But I'm looking for a solution that would work for arbitrary k.
Thanks!
Do the aggregation in two steps:
k = 2 here:
select count(case when cnt >= 2 then cnt end), obs_type
from
(
select count(*) cnt, obs_type
from table_x
group by id, obs_type
) dt
group by obs_type
The derived table (subquery) returns:
cnt obs_type
=== ========
2 A
1 B
1 B
2 C
1 B
Then use a case expression to do conditional aggregation, and you'll get:
count obs_type
===== ========
1 A
0 B
1 C
3 rows found
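Because the threshold only appears once in the outer CASE, the query parameterizes cleanly for arbitrary k. A minimal sketch using Python's sqlite3 as a stand-in for Postgres (the helper name `ids_with_at_least_k` is my own):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_x (id INTEGER, obs_type TEXT, value REAL)")
conn.executemany(
    "INSERT INTO table_x VALUES (?, ?, ?)",
    [(1, "A", 0.1), (1, "A", 0.2), (1, "B", 0.4), (2, "B", 0.5),
     (2, "C", 0.2), (2, "C", 0.3), (3, "B", 0.1)],
)

def ids_with_at_least_k(conn, k):
    """Hypothetical helper: per obs_type, count ids having >= k rows."""
    # Inner query counts observations per (id, obs_type); the outer
    # query counts how many ids reach the threshold k per obs_type.
    return conn.execute("""
        SELECT obs_type,
               COUNT(CASE WHEN cnt >= ? THEN cnt END) AS n
        FROM (SELECT COUNT(*) AS cnt, obs_type
              FROM table_x
              GROUP BY id, obs_type) dt
        GROUP BY obs_type
        ORDER BY obs_type
    """, (k,)).fetchall()

print(ids_with_at_least_k(conn, 2))  # -> [('A', 1), ('B', 0), ('C', 1)]
```

With k = 1 the same helper reproduces the questioner's COUNT(DISTINCT id) result.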

PLPGSQL - stored procedure to get a set of rows with count

I am using PostgreSQL.
I need a stored procedure in PL/pgSQL that returns a table (SETOF records) containing the counts of the top 2 and bottom 2 most frequent values in my_table.
For example:
my_table
id value
1 a
2 a
3 a
4 b
5 b
6 c
7 c
8 e
9 f
10 g
11 g
12 g
13 g
14 h
15 h
Returns:
count value
4 g
3 a
1 e
1 f
Thank you
You can use window functions together with aggregation:
select v.value, v.cnt
from (select value, count(*) as cnt,
row_number() over (order by count(*) desc) as seqnum_desc,
row_number() over (order by count(*) asc) as seqnum_asc
from t
group by value
) v
where seqnum_desc <= 2 or seqnum_asc <= 2;
Note: In the case of ties -- particularly likely at the bottom end -- this returns arbitrary values with the same count. You can adjust for this using rank() or dense_rank(), depending on what you want in this case.
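The query itself needs no PL/pgSQL; here is a sketch against the question's data using Python's sqlite3 as a stand-in for Postgres. Since the bottom-end tie between e and f is broken arbitrarily, the result is compared as a sorted set:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "a"), (4, "b"), (5, "b"),
     (6, "c"), (7, "c"), (8, "e"), (9, "f"), (10, "g"),
     (11, "g"), (12, "g"), (13, "g"), (14, "h"), (15, "h")],
)

# Rank each value's count from both ends; keep the two highest- and
# two lowest-ranked values.  ROW_NUMBER breaks ties arbitrarily.
rows = conn.execute("""
    SELECT v.value, v.cnt
    FROM (SELECT value, COUNT(*) AS cnt,
                 ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS seqnum_desc,
                 ROW_NUMBER() OVER (ORDER BY COUNT(*) ASC)  AS seqnum_asc
          FROM my_table
          GROUP BY value) v
    WHERE seqnum_desc <= 2 OR seqnum_asc <= 2
""").fetchall()
print(sorted(rows))  # -> [('a', 3), ('e', 1), ('f', 1), ('g', 4)]
```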

Shuffle column in Google's BigQuery based on groupby

I want to randomly shuffle the values for one single column of a table based on a groupby. E.g., I have two columns A and B. Now, I want to randomly shuffle column B based on a groupby on A.
For an example, suppose that there are three distinct values in A. Now for each distinct value of A, I want to shuffle the values in B, but just with values having the same A.
Example input:
A B C
-------------------
1 1 x
1 3 a
2 4 c
3 6 d
1 2 a
3 5 v
Example output:
A B C
------------------
1 3 x
1 2 a
2 4 c
3 6 d
1 1 a
3 5 v
In this case, for A=1 the values for B got shuffled. The same happened for A=2, but as there is only one row it stayed like it was. For A=3 by chance the values for B also stayed like they were. The values for column C stay as they are.
Maybe this can be solved by using window functions, but I am unsure how exactly.
As a side note: This should be achieved in Google's BigQuery.
Is this what you're after? (You tagged both MySQL and Oracle, so I'll answer using Oracle.)
[edit] corrected based on confirmed logic [/edit]
with w_data as (
select 1 a, 1 b from dual union all
select 1 a, 3 b from dual union all
select 2 a, 4 b from dual union all
select 3 a, 6 b from dual union all
select 1 a, 2 b from dual union all
select 3 a, 5 b from dual
),
w_suba as (
select a, row_number() over (partition by a order by dbms_random.value) aid
from w_data
),
w_subb as (
select a, b, row_number() over (partition by a order by dbms_random.value) bid
from w_data
)
select sa.a, sb.b
from w_suba sa,
w_subb sb
where sa.aid = sb.bid
and sa.a = sb.a
/
A B
---------- ----------
1 3
1 1
1 2
2 4
3 6
3 5
6 rows selected.
SQL> /
A B
---------- ----------
1 3
1 1
1 2
2 4
3 5
3 6
6 rows selected.
Logic breakdown:
1) w_data is just your sample data set.
2) w_suba randomizes the rows within each group of a (not strictly needed; you could use rownum here and let b alone be randomized, but I do so love (over)using dbms_random).
3) w_subb randomizes column b: PARTITION BY creates the groups, and ORDER BY dbms_random.value randomizes the items within each group.
4) Join them on both the group (a) and the randomized row number to pick a random item within each group.
Randomizing this way guarantees the multiset of b values is preserved: if you start with one "3", you end with one "3", and so on.
The query below should work in BigQuery:
SELECT
x.A as A, x.B as Old_B, x.c as C, y.B as New_B
FROM (
SELECT A, B, C,
ROW_NUMBER() OVER(PARTITION BY A ORDER BY B, C) as pos
FROM [your_table]
) as x
JOIN (
SELECT
A, B, ROW_NUMBER() OVER(PARTITION BY A ORDER BY rnd) as pos
FROM (
SELECT A, B, RAND() as rnd
FROM [your_table]
)
) as y
ON x.A = y.A AND x.pos = y.pos
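The same self-join pattern can be checked with Python's sqlite3 (RANDOM() standing in for BigQuery's RAND()); the assertions below only verify what must hold regardless of the shuffle, namely that each group's multiset of B values and each row's (A, C) pairing survive:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, "x"), (1, 3, "a"), (2, 4, "c"),
     (3, 6, "d"), (1, 2, "a"), (3, 5, "v")],
)

# x keeps each row's (a, c) with a deterministic position inside its
# a-group; y assigns a random position to each (a, b).  Joining on
# (a, pos) pairs every row with a random b from the same group,
# preserving the multiset of b values per group.
rows = conn.execute("""
    SELECT x.a, y.b, x.c
    FROM (SELECT a, c,
                 ROW_NUMBER() OVER (PARTITION BY a ORDER BY b) AS pos
          FROM t) x
    JOIN (SELECT a, b,
                 ROW_NUMBER() OVER (PARTITION BY a ORDER BY RANDOM()) AS pos
          FROM t) y
      ON x.a = y.a AND x.pos = y.pos
""").fetchall()
print(rows)  # b is shuffled within each a-group; a and c are untouched
```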

SQL Server group by (aggregate column based on other column)

I have a table which has 3 columns A, B, C
I want to do a query like this:
select A, Max(B), ( C in the row having max B ) from Table group by A.
is there a way to do such a query?
Test Data:
A B C
2 5 3
2 6 1
4 5 1
4 7 9
6 5 0
the expected result would be:
2 6 1
4 7 9
6 5 0
;WITH CTE AS
(
SELECT A,
B,
C,
RN = ROW_NUMBER() OVER(PARTITION BY A ORDER BY B DESC)
FROM YourTable
)
SELECT A, B, C
FROM CTE
WHERE RN = 1
Try this:
select t.*
from YourTable t
join (select A, max(B) as B from YourTable group by A) c
  on c.A = t.A
 and c.B = t.B
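Both approaches can be verified side by side in Python's sqlite3 (standing in for SQL Server, since both support these constructs); note that on ties in B the join version would return every tying row, while ROW_NUMBER keeps exactly one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (A INTEGER, B INTEGER, C INTEGER)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(2, 5, 3), (2, 6, 1), (4, 5, 1), (4, 7, 9), (6, 5, 0)],
)

# ROW_NUMBER approach: rank rows per A by B descending, keep rank 1.
cte_rows = conn.execute("""
    WITH CTE AS (
        SELECT A, B, C,
               ROW_NUMBER() OVER (PARTITION BY A ORDER BY B DESC) AS RN
        FROM YourTable
    )
    SELECT A, B, C FROM CTE WHERE RN = 1 ORDER BY A
""").fetchall()

# Join approach: join each row back to its group's maximum B.
join_rows = conn.execute("""
    SELECT t.A, t.B, t.C
    FROM YourTable t
    JOIN (SELECT A, MAX(B) AS B FROM YourTable GROUP BY A) c
      ON c.A = t.A AND c.B = t.B
    ORDER BY t.A
""").fetchall()
print(cte_rows)  # -> [(2, 6, 1), (4, 7, 9), (6, 5, 0)]
```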