ROW_NUMBER() for rows that consist of multiple rows - sql

I have this table
ObjectId| Value
---------------------
1 | A
1 | A
1 | A
5 | B
5 | B
5 | B
ordered by Value, and I am trying to get a "row number" like this (each logical row consists of multiple physical rows):
RowNumber | ObjectId | Value
------------------------------------
1 | 1 | A
1 | 1 | A
1 | 1 | A
2 | 5 | B
2 | 5 | B
2 | 5 | B
Any idea?
Thank you

You are looking for dense_rank:
select dense_rank() over (order by Value), ObjectId, Value
from thistable;
You can include two columns like this:
select dense_rank() over (order by ObjectId, Value), ObjectId, Value
from thistable;
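For a self-contained check, here is a minimal sketch that rebuilds the question's data. The table name thistable comes from the query above; the column types and the RowNumber alias are assumptions, and the multi-row VALUES list assumes your engine supports it.

```sql
-- Sample data from the question (column types are assumed).
CREATE TABLE thistable (ObjectId INT, Value VARCHAR(1));

INSERT INTO thistable (ObjectId, Value) VALUES
  (1, 'A'), (1, 'A'), (1, 'A'),
  (5, 'B'), (5, 'B'), (5, 'B');

-- Rows that tie on Value share one number; the next distinct
-- Value gets the next number, with no gaps.
SELECT DENSE_RANK() OVER (ORDER BY Value) AS RowNumber,
       ObjectId,
       Value
FROM thistable
ORDER BY RowNumber;
```

On this data the three A rows should get 1 and the three B rows should get 2, which is the layout asked for in the question.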

Look at dense_rank(): tied rows share the same number, and the next distinct value continues with the next number in sequence. There's an example here.
SQL Fiddle
Returns the rank of rows within the partition of a result set, without
any gaps in the ranking. The rank of a row is one plus the number of
distinct ranks that come before the row in question.

Related

SQL : picking distinct values based on rank

I'm trying to find out the rank/row_number of IDs in a dataset and assign one ID to one cluster based on rank. The catch is, the same ID can be rank 1 for two different clusters. In this case, if one ID has already been assigned to one cluster, then the next rank should be assigned to the other cluster.
CLUSTER | ID | RNK
---------------------
CLST1 | ID1 | 1
CLST1 | ID2 | 2
CLST2 | ID1 | 1
CLST2 | ID2 | 2
In this dataset, if ID1 is assigned to CLST1, then ID2 must be picked for CLST2 based on rank. How can I achieve this in Redshift?
If you want neither duplicate rank numbers nor gaps, use row_number().
The following script shows the difference between rank(), dense_rank() and row_number() when there is a duplicate value.
select
id,
rank() over (order by id) "rank",
dense_rank() over (order by id) "dense_rank",
row_number() over (order by id) "row_number"
from t;
id | rank | dense_rank | row_number
-: | ---: | ---------: | ---------:
1 | 1 | 1 | 1
2 | 2 | 2 | 2
3 | 3 | 3 | 3
3 | 3 | 3 | 4
4 | 5 | 4 | 5
5 | 6 | 5 | 6
MySQL db<>fiddle here
PostgreSQL db<>fiddle here

How to count/increment the current number of occurrences of a table column in a MS SQL select

I have a table which looks like this:
id | name| fk_something
----------------
0 | 25 | 3
1 | 25 | 2
2 | 23 | 1
and I want to add another column with a number that increments every time the same name value occurs again, e.g.:
id | name| fk_something| n
--------------------------
0 | 25 | 3 | 1
1 | 25 | 2 | 2
2 | 23 | 1 | 1
I'm not really sure how to achieve this. Using count() I only get the total number of occurrences of name, but I want n to increment so that each row within a name has a distinct value.
You want row_number() :
select t.*, row_number() over (partition by name order by id) as n
from table t;
You may try using COUNT as an analytic function:
SELECT
id,
name,
fk_something,
COUNT(*) OVER (PARTITION BY name ORDER BY id) n
FROM yourTable
ORDER BY
id;
Demo
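To compare the two suggestions side by side, here is a small sketch on the question's data. The table name yourTable is taken from the query above; the column types and the aliases n_row_number and n_running_count are assumptions.

```sql
-- Sample data from the question (column types are assumed).
CREATE TABLE yourTable (id INT, name INT, fk_something INT);

INSERT INTO yourTable (id, name, fk_something) VALUES
  (0, 25, 3),
  (1, 25, 2),
  (2, 23, 1);

SELECT id,
       name,
       fk_something,
       -- Numbers the rows 1, 2, 3, ... within each name.
       ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) AS n_row_number,
       -- Running count within each name, ordered by id.
       COUNT(*) OVER (PARTITION BY name ORDER BY id) AS n_running_count
FROM yourTable
ORDER BY id;
```

Both columns should come out as 1, 2, 1 here. They agree as long as id is unique within each name; if two rows of the same name tied on id, the running count would give both the same number, while ROW_NUMBER() would still number them separately.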

How can I select each particular data up to a certain quantity?

How can I select each distinct value only up to a certain quantity? For example, in the table below there are 4 A, 4 B, 2 C and 1 D. I want to select all letters, but no more than two of each, which would yield 2 A, 2 B, 2 C and 1 D.
+====+========+
| ID | Letter |
+====+========+
| 1 | A |
+----+--------+
| 2 | B |
+----+--------+
| 3 | B |
+----+--------+
| 4 | C |
+----+--------+
| 5 | A |
+----+--------+
| 6 | A |
+----+--------+
| 7 | C |
+----+--------+
| 8 | B |
+----+--------+
| 9 | B |
+----+--------+
| 10 | D |
+----+--------+
| 11 | A |
+----+--------+
Can anyone please help me with the above scenario?
I can think of a simple way:
select
case
when count(*) > 1
then 2
else count(*)
end,
second_column
from your_table
group by second_column;
This will give the result you want, but it won't really 'select ONLY two or less records' of each.
Using a ROW_NUMBER() function and a derived table:
CREATE TABLE myTable (id int, Letter varchar(1));
INSERT INTO myTable
VALUES (1,'A')
,(2,'B')
,(3,'B')
,(4,'C')
,(5,'A')
,(6,'A')
,(7,'C')
,(8,'B')
,(9,'B')
,(10,'D')
,(11,'A');
SELECT id, Letter
FROM
(SELECT *
,ROW_NUMBER() OVER(PARTITION BY Letter ORDER BY Letter) as rn
FROM myTable) myTable
WHERE rn = 1 or rn = 2
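In essence, "cut" (PARTITION) the rows by Letter, number the rows within each group, then pick the first two of each Letter. Because the window is ordered by Letter, which is constant within each partition, which two rows you get per letter is arbitrary.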
Try it here:
http://rextester.com/WTKYCE51114
Use the ROW_NUMBER() function to tag each record with a row number, PARTITION BY letter (grouping the rows by letter) and ORDER BY id within each group:
SELECT id,
letter
FROM (SELECT *,
ROW_NUMBER() OVER(PARTITION BY letter ORDER BY id) rnum
FROM myTable
) t
WHERE rnum <=2
Because the rows are ordered by id, you get the first two instances of each letter in ascending id order, which gives the result below (note that ids 1 and 5 are selected for A, 2 and 3 for B):
id letter
1 A
5 A
2 B
3 B
4 C
7 C
10 D

Efficient progressive sum

I have the following table:
+----+-------+
| id | value |
+----+-------+
| 1 | 10 |
| 2 | 11 |
| 3 | 12 |
+----+-------+
I want to calculate a column on the fly to sum value of all the previous rows, to come up with something like this:
+----+-------+--------+
| id | value | offset |
+----+-------+--------+
| 1 | 10 | 0 |
| 2 | 11 | 10 |
| 3 | 12 | 21 |
+----+-------+--------+
What is an efficient way to do this?
Credit goes to Egor Skriptunoff.
select
id,
value,
nvl(
sum(value) over (
order by id rows between unbounded preceding and 1 preceding
), 0) as offset
from table
The great thing about the analytic function sum is that it is progressive: on each row the engine remembers the total calculated for the previous row and only adds the previous row's value to it. In other words, each offset is the previous row's offset plus the previous row's value. This is very efficient and scales up nicely.
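As a worked example of that window frame, here is a minimal sketch on the question's three rows. The table name readings and the alias prev_total are hypothetical (chosen to avoid the keywords table and offset), and COALESCE is swapped in for Oracle's nvl so the same statement should run on other engines too.

```sql
-- Sample data from the question (table name and column types are assumed).
CREATE TABLE readings (id INT, value INT);

INSERT INTO readings (id, value) VALUES
  (1, 10),
  (2, 11),
  (3, 12);

SELECT id,
       value,
       -- The frame covers every row before the current one, so:
       --   id 1: no preceding rows, SUM is NULL, COALESCE gives 0
       --   id 2: 10
       --   id 3: 10 + 11 = 21
       COALESCE(
         SUM(value) OVER (
           ORDER BY id
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
         ), 0) AS prev_total
FROM readings
ORDER BY id;
```

The prev_total column should match the offset column the question asks for: 0, 10, 21.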
If your id values are sequential (1, 2, 3, and so on), then:
select a.*,(select sum(decode(a.id,1,0,b.value)) off_set from table b where b.id<=a.id-1)
from table a;
If your ids are not sequential, try the code below:
select a.*,(select sum(decode(a.rn,1,0,b.value)) off_set from (select table.*,rownum rn from table) b
where b.rn<=a.rn-1)
from (select table.*,rownum rn from table) a;

SQL distinct/groupby on combination of columns

I am trying to do a SQL select on a table based on two columns, but not in the usual way where the combination of values in both columns must be unique; I want to select where the value can only appear once in either column.
Given the dataset:
|pkid | fkself | otherData |
|-----+--------+-----------|
| 1 | 4 | there |
| 4 | 1 | will |
| 3 | 6 | be |
| 2 | 5 | other |
| 5 | 2 | data |
| 6 | 3 | columns |
I need to return either
|pkid | fkself | otherData |
|-----+--------+-----------|
| 1 | 4 | there |
| 3 | 6 | be |
| 2 | 5 | other |
or
|pkid | fkself | otherData |
|-----+--------+-----------|
| 4 | 1 | will |
| 5 | 2 | data |
| 6 | 3 | columns |
The only way I can think of to do this is to concatenate pkid and fkself in order so that both row 1 and row 2 would concatenate to 1,4, but I'm not sure how to do that, or if it is even possible.
The rows will have other data columns, but it does not matter which row I get, only that I get each ID only once, whether the value is in pkid or fkself.
You can use least and greatest to get the smallest or biggest value of the two. That allows you to put them in the right order to generate those keys for you. You could concatenate the values as you suggested, but it's not needed in this solution. With dense_rank you can generate a sequence for each of those fictional keys. Then, you can get the first OtherData from that sequence.
select
pkid,
fkself,
otherData
from
(select
pkid,
fkself,
otherData,
dense_rank() over (partition by least(pkid, fkself), greatest(pkid, fkself) order by pkid) as rank
from
YourTable t)
where
rank = 1
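To try this on the question's data, here is a self-contained sketch of the same idea. The column types, the derived-table alias ranked and the column alias rnk (used instead of rank, which is a function name) are assumptions, and it assumes your engine provides LEAST and GREATEST.

```sql
-- Sample data from the question (column types are assumed).
CREATE TABLE YourTable (pkid INT, fkself INT, otherData VARCHAR(20));

INSERT INTO YourTable (pkid, fkself, otherData) VALUES
  (1, 4, 'there'),
  (4, 1, 'will'),
  (3, 6, 'be'),
  (2, 5, 'other'),
  (5, 2, 'data'),
  (6, 3, 'columns');

SELECT pkid, fkself, otherData
FROM (SELECT pkid,
             fkself,
             otherData,
             -- (1,4) and (4,1) both map to the key (1,4), so they land
             -- in the same partition; the row with the lower pkid gets 1.
             DENSE_RANK() OVER (
               PARTITION BY LEAST(pkid, fkself), GREATEST(pkid, fkself)
               ORDER BY pkid
             ) AS rnk
      FROM YourTable) ranked
WHERE rnk = 1
ORDER BY pkid;
```

On this data the query should keep the rows with pkid 1, 2 and 3, which is the first of the two acceptable result sets in the question. Ordering by pkid DESC inside the window should give the other one.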
Your idea is possible, and it should produce the results you want.
SELECT DISTINCT joinedID
FROM (
SELECT min(id) & "," & max(id) as joinedID
FROM (
SELECT pkid as id, someUniqueValue
FROM table
UNION ALL
SELECT fkself as id, someUniqueValue
FROM table)
GROUP BY someUniqueValue )
This will give you a unique list of IDs, concatenated as you like. You can easily include other fields by adding them to each SELECT statement. Also, someUniqueValue can be either an existing unique field, a new unique field, or the concatenated pkid and fkself, if that combination is unique.
The only way I can think of to do this is to concatenate pkid and
fkself in order so that both row 1 and row 2 would concatenate to 1,4,
but I'm not sure how to do that, or if it is even possible.
You could do it using a CASE expression in Oracle:
SQL> SELECT * FROM sample;
PKID FKSELF
---------- ----------
1 4
4 1
3 6
2 5
5 2
7 7
6 rows selected.
SQL> SELECT DISTINCT *
FROM (
SELECT CASE WHEN pkid <= fkself THEN pkid||','||fkself
ELSE fkself||','||pkid
END "JOINED"
FROM sample
);
JOINED
-------------------------------------------------------------------------------
1,4
2,5
3,6
7,7