How can I select each particular data up to a certain quantity? - sql

How can I select each particular value up to a certain quantity? For example, in the table below there are 4 A, 4 B, 2 C and 1 D. I want to select all letters, but no more than two of each, which will yield 2 A, 2 B, 2 C and 1 D.
+====+========+
| ID | Letter |
+====+========+
| 1 | A |
+----+--------+
| 2 | B |
+----+--------+
| 3 | B |
+----+--------+
| 4 | C |
+----+--------+
| 5 | A |
+----+--------+
| 6 | A |
+----+--------+
| 7 | C |
+----+--------+
| 8 | B |
+----+--------+
| 9 | B |
+----+--------+
| 10 | D |
+----+--------+
| 11 | A |
+----+--------+
Can anyone please help me with the above scenario?

I can think of a simple way:
select
case
when count(*) > 1
then 2
else count(*)
end,
second_column
from your_table
group by second_column;
This gives the capped count per letter (2 A, 2 B, 2 C, 1 D), but it returns one summary row per letter rather than selecting up to two actual records of each.

Using a ROW_NUMBER() function and a derived table:
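CREATE TABLE myTable (id int, Letter varchar(1));
INSERT INTO myTable
VALUES (1,'A')
,(2,'B')
,(3,'B')
,(4,'C')
,(5,'A')
,(6,'A')
,(7,'C')
,(8,'B')
,(9,'B')
,(10,'D')
,(11,'A');
-- rn numbers the rows within each Letter. Ordering by Letter itself leaves the
-- order inside a partition arbitrary, so any two rows per Letter may be kept.
SELECT id, Letter
FROM
(SELECT *
,ROW_NUMBER() OVER(PARTITION BY Letter ORDER BY Letter) as rn
FROM myTable) myTable
WHERE rn = 1 or rn = 2;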
In essence, "cut" (PARTITION) the rows by Letter, number the rows within each Letter's group, then pick the first two of each Letter.
Try it here:
http://rextester.com/WTKYCE51114

Use the ROW_NUMBER() function to number each record, PARTITION BY letter to restart the numbering for each letter, and ORDER BY id to number the rows in id order:
SELECT id,
letter
FROM (SELECT *,
ROW_NUMBER() OVER(PARTITION BY letter ORDER BY id) rnum
FROM myTable
) t
WHERE rnum <=2
Because the rows are ordered by id, you get the first two instances of each letter in ascending id order, which gives the result below (note that ids 1 and 5 are selected for A, and 2 and 3 for B):
id letter
1 A
5 A
2 B
3 B
4 C
7 C
10 D
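To check the ROW_NUMBER() approach end to end, here is a minimal sketch that runs the same query against the question's data. It assumes an in-memory SQLite database driven from Python (SQLite 3.25 or newer for window functions); the helper name top_n_per_letter is made up for illustration and the original answers targeted a different database.

```python
import sqlite3

# In-memory SQLite database (assumption: SQLite >= 3.25 for window functions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER, letter TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    [(1, "A"), (2, "B"), (3, "B"), (4, "C"), (5, "A"), (6, "A"),
     (7, "C"), (8, "B"), (9, "B"), (10, "D"), (11, "A")],
)


def top_n_per_letter(conn, n):
    """Return up to n rows per letter, lowest ids first (hypothetical helper)."""
    query = """
        SELECT id, letter
        FROM (SELECT id, letter,
                     ROW_NUMBER() OVER (PARTITION BY letter ORDER BY id) AS rnum
              FROM myTable) AS t
        WHERE rnum <= ?
        ORDER BY letter, id
    """
    return conn.execute(query, (n,)).fetchall()


rows = top_n_per_letter(conn, 2)
print(rows)
```

Passing the limit as a parameter makes it easy to change "two of each" to any other quantity.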

Related

delete all duplicated rows in sql

I have a table like this:
|-------------------------|
| A | B | V |
|-------------------------|
| 1 | 2 | 5 |
|-------------------------|
| 1 | 2 | 10 |
|-------------------------|
| 1 | 2 | 2 |
|-------------------------|
I need to delete all duplicated rows that have equal A and B values and a lower V value.
After running the SQL script, only the row with the highest V value should remain for each pair of equal A and B values:
|-------------------------|
| A | B | V |
|-------------------------|
| 1 | 2 | 10 |
|-------------------------|
One method is window functions:
select t.*
from (select t.*,
row_number() over (partition by a, b order by v desc) as seqnum
from t
) t
where seqnum = 1;
This returns the entire row, which can be handy if you want additional columns. If you really need just the three columns, then aggregation does what you want:
select a, b, max(v)
from t
group by a, b;
In standard SQL, you can keep only the maximum value using:
delete from t
where t.v < (select max(t2.v) from t t2 where t2.a = t.a and t2.b = t.b);
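As a quick check of the correlated DELETE, here is a small sketch that runs it in an in-memory SQLite database from Python. The SQLite setup is an assumption for the demo, the column is named v as in the answer's queries, and the (3, 4, 7) row is a made-up extra to show that other groups are left alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, v INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    # The first three rows come from the question; (3, 4, 7) is hypothetical.
    [(1, 2, 5), (1, 2, 10), (1, 2, 2), (3, 4, 7)],
)

# Delete every row whose v is below the maximum v of its (a, b) group.
conn.execute("""
    DELETE FROM t
    WHERE v < (SELECT MAX(t2.v) FROM t AS t2
               WHERE t2.a = t.a AND t2.b = t.b)
""")

remaining = conn.execute("SELECT a, b, v FROM t ORDER BY a, b").fetchall()
print(remaining)
```

Note that rows tied for the maximum v all survive this DELETE, because none of them is strictly lower than the maximum.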

Selecting the first row of group with additional group by columns

Say I have a table with the following results:
How can I select one row per distinct parent_id, namely the row with the minimum object0_behaviour?
Expected output:
parent_id | id | object0_behaviour | type
------------------------------------------
1 | 1 | 5 | IP
2 | 3 | 5 | IP
3 | 5 | 7 | ID
4 | 6 | 7 | ID
5 | 8 | 5 | IP
6 | 18 | 7 | ID
7 | 10 | 7 | ID
8 | 9 | 5 | IP
I have tried:
SELECT parent_id, min(object0_behaviour) FROM table GROUP BY parent_id
It works; however, if I want the other two columns as well, I have to add them to the GROUP BY clause, and I am back to square one.
I saw examples in R (Select the first row by group).
The output there is similar to what I need, but I can't seem to convert it into SQL.
You can try using the row_number() window function:
select * from
(
select *, row_number() over(partition by parent_id order by object0_behaviour) as rn
from tablename
)A where rn=1
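Alternatively, join the table back to the per-group minimum (if several rows tie for the minimum, this returns all of them):
select t.* from tablename t
join (
SELECT parent_id, min(object0_behaviour) object0_behaviour
FROM tablename GROUP BY parent_id
) grouped
on grouped.parent_id = t.parent_id
and grouped.object0_behaviour = t.object0_behaviour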
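The two approaches differ when several rows share the minimum. The sketch below illustrates that, assuming an in-memory SQLite database and invented sample rows (the question's source table is not shown, so the data here is hypothetical). The window version adds id as a tie-breaker, which is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename "
    "(parent_id INTEGER, id INTEGER, object0_behaviour INTEGER, type TEXT)"
)
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?)",
    # Hypothetical rows; parent_id 2 has two rows tied for the minimum.
    [(1, 1, 5, "IP"), (1, 2, 7, "ID"),
     (2, 3, 5, "IP"), (2, 4, 5, "IP"),
     (3, 5, 7, "ID")],
)

# Window function: exactly one row per parent_id, ties broken by id.
window_ids = [r[0] for r in conn.execute("""
    SELECT id
    FROM (SELECT id,
                 ROW_NUMBER() OVER (PARTITION BY parent_id
                                    ORDER BY object0_behaviour, id) AS rn
          FROM tablename) AS a
    WHERE rn = 1
    ORDER BY id
""")]

# Join to the per-group minimum: every row that matches the minimum.
join_ids = [r[0] for r in conn.execute("""
    SELECT t.id
    FROM tablename AS t
    JOIN (SELECT parent_id, MIN(object0_behaviour) AS object0_behaviour
          FROM tablename
          GROUP BY parent_id) AS grouped
      ON grouped.parent_id = t.parent_id
     AND grouped.object0_behaviour = t.object0_behaviour
    ORDER BY t.id
""")]

print(window_ids, join_ids)
```

If exactly one row per parent_id is required, the row_number() form is the safer choice.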

SQL: Select Most Recent Sequentially Distinct Value w/ Grouping

I am having trouble writing a query that would select the last "new" sequentially distinct value (let's call this column Col A) grouped based on another column (Col B). Since this is a bit ambiguous/confusing, here is an example to explain (assume row number is indicative of sequence inside groups; in my issue the rows are ordered by date):
|--------|-------|-------|
| RowNum | Col A | Col B |
|--------|-------|-------|
| 1 | A | A |
| 2 | B | A |
| 3 | C | A |
| 4 | B | B |
| 5 | A | B |
| 6 | B | B |
Would select:
| 3 | C | A |
| 6 | B | B |
Note that although B also appears in row 4, the fact that row 5 contains A means that the B in row 6 is sequentially distinct. But if table looked like this:
|--------|-------|-------|
| RowNum | Col A | Col B |
|--------|-------|-------|
| 1 | A | A |
| 2 | B | A |
| 3 | C | A |
| 4 | B | B |
| 5 | A | B |
| 6 | A | B | <--
Then we would want to select:
| 3 | C | A |
| 5 | A | B |
I think that this would be an easier problem if I wasn't concerned with values being distinct but not sequential. I'm not really sure how to even consider sequence when making a query.
I have attempted to solve this by calculating the min/max row numbers where each value of Col A appears. That calculation (using the second sample table) would produce a result like this:
|--------|--------|--------|--------|
| ColA | ColB | MinRow | MaxRow |
|--------|--------|--------|--------|
| A | A | 1 | 1 |
| B | A | 2 | 2 |
| C | A | 3 | 3 |
| A | B | 5 | 6 |
| B | B | 4 | 4 |
A solution raised in a related post (SQL: Select Row with Last New Sequentially Distinct Value) took a similar path: it finds the most recent RowNum whose ColA differs from the last ColA and then picks the next row. However, in that question I failed to address the need for the query to work for multiple groups, hence this new post.
Any help with this problem, if it is possible to do in SQL at all, would be greatly appreciated. I am running SQL Server 2008 SP4.
Hmmm . . . One method is to get the last value. Then choose all the last rows with that value and aggregate:
select min(rownum), colA, colB
from (select t.*,
first_value(colA) over (partition by colB order by rownum desc) as last_colA
from t
) t
where rownum > all (select t2.rownum
from t t2
where t2.colB = t.colB and t2.colA <> t.last_colA
)
group by colA, colB;
Or, without the aggregation:
select t.*
from (select t.*,
first_value(colA) over (partition by colB order by rownum desc) as last_colA,
lag(colA) over (partition by colB order by rownum) as prev_colA
from t
) t
where rownum > all (select t2.rownum
from t t2
where t2.colB = t.colB and t2.colA <> t.last_colA
) and
(prev_colA is null or prev_colA <> colA);
But first_value() and lag() are not available in SQL Server 2008 (they were added in SQL Server 2012), so let's treat this as a gaps-and-islands problem:
select t.*
from (select t.*,
min(rownum) over (partition by colB, colA, (seqnum_b - seqnum_ab) ) as min_rownum_group,
max(rownum) over (partition by colB, colA, (seqnum_b - seqnum_ab) ) as max_rownum_group
from (select t.*,
row_number() over (partition by colB order by rownum) as seqnum_b,
row_number() over (partition by colB, colA order by rownum) as seqnum_ab,
max(rownum) over (partition by colB) as max_rownum
from t
) t
) t
where rownum = min_rownum_group and -- first row in the group defined by adjacent colA, colB
max_rownum_group = max_rownum; -- last group for each colB
This identifies each group of adjacent rows with the same colA using a difference of row numbers. It then calculates the maximum rownum for each group and the maximum rownum overall for each colB. These two values are the same only for the last group.
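To see the gaps-and-islands query at work on both sample tables from the question, here is a sketch that runs it in SQLite from Python. The SQLite harness and the helper name last_new_values are assumptions for the demo, and the overall maximum is computed per colB without an ORDER BY in the window.

```python
import sqlite3

QUERY = """
SELECT rownum, colA, colB
FROM (SELECT t2.*,
             MIN(rownum) OVER (PARTITION BY colB, colA, seqnum_b - seqnum_ab)
                 AS min_rownum_group,
             MAX(rownum) OVER (PARTITION BY colB, colA, seqnum_b - seqnum_ab)
                 AS max_rownum_group
      FROM (SELECT t.*,
                   ROW_NUMBER() OVER (PARTITION BY colB ORDER BY rownum)
                       AS seqnum_b,
                   ROW_NUMBER() OVER (PARTITION BY colB, colA ORDER BY rownum)
                       AS seqnum_ab,
                   MAX(rownum) OVER (PARTITION BY colB) AS max_rownum
            FROM t) AS t2) AS t3
WHERE rownum = min_rownum_group        -- first row of an island
  AND max_rownum_group = max_rownum    -- island that ends the colB group
ORDER BY rownum
"""


def last_new_values(rows):
    """Return the first row of the final run of equal colA values per colB."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (rownum INTEGER, colA TEXT, colB TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    return conn.execute(QUERY).fetchall()


first_sample = [(1, "A", "A"), (2, "B", "A"), (3, "C", "A"),
                (4, "B", "B"), (5, "A", "B"), (6, "B", "B")]
second_sample = first_sample[:5] + [(6, "A", "B")]

print(last_new_values(first_sample))   # [(3, 'C', 'A'), (6, 'B', 'B')]
print(last_new_values(second_sample))  # [(3, 'C', 'A'), (5, 'A', 'B')]
```

In the second sample, rows 5 and 6 form one island (seqnum_b - seqnum_ab is 1 for both), so the query returns row 5, the first row of that island.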

ROW_NUMBER() for rows which consists of more rows

I have this table
ObjectId| Value
---------------------
1 | A
1 | A
1 | A
5 | B
5 | B
5 | B
ordered by Value, and I am trying to get a "row number" like this (one number shared by each group of rows):
RowNumber | ObjectId | Value
------------------------------------
1 | 1 | A
1 | 1 | A
1 | 1 | A
2 | 5 | B
2 | 5 | B
2 | 5 | B
Any idea?
Thank you
You are looking for dense_rank:
select dense_rank() over (order by Value), ObjectId, Value
from thistable;
You can include two columns like this:
select dense_rank() over (order by ObjectId, Value), ObjectId, Value
from thistable;
Look at dense_rank(): it continues with the next number in sequence, without gaps. There's an example on SQL Fiddle. The documentation describes it as follows:
Returns the rank of rows within the partition of a result set, without
any gaps in the ranking. The rank of a row is one plus the number of
distinct ranks that come before the row in question.
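The difference between dense_rank() and rank() is easiest to see side by side. A minimal sketch, assuming SQLite (3.25 or newer) driven from Python and the table from the question; the rank_with_gaps column is added only for comparison:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thistable (ObjectId INTEGER, Value TEXT)")
conn.executemany(
    "INSERT INTO thistable VALUES (?, ?)",
    [(1, "A")] * 3 + [(5, "B")] * 3,
)

ranked = conn.execute("""
    SELECT DENSE_RANK() OVER (ORDER BY Value) AS RowNumber,
           RANK()       OVER (ORDER BY Value) AS rank_with_gaps,
           ObjectId, Value
    FROM thistable
    ORDER BY Value
""").fetchall()

for row in ranked:
    print(row)
```

The three A rows get 1 from both functions; the B rows get 2 from dense_rank() but 4 from rank(), which skips the positions taken by the tied rows.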

SQL distinct/groupby on combination of columns

I am trying to do a SQL select on a table based on two columns, but not in the usual way where the combination of values in both columns must be unique; I want to select where the value can only appear once in either column.
Given the dataset:
|pkid | fkself | otherData |
|-----+--------+-----------|
| 1 | 4 | there |
| 4 | 1 | will |
| 3 | 6 | be |
| 2 | 5 | other |
| 5 | 2 | data |
| 6 | 3 | columns |
I need to return either
|pkid | fkself | otherData |
|-----+--------+-----------|
| 1 | 4 | there |
| 3 | 6 | be |
| 2 | 5 | other |
or
|pkid | fkself | otherData |
|-----+--------+-----------|
| 4 | 1 | will |
| 5 | 2 | data |
| 6 | 3 | columns |
The only way I can think of to do this is to concatenate pkid and fkself in order, so that both row 1 and row 2 would concatenate to 1,4, but I'm not sure how to do that, or if it is even possible.
The rows will have other data columns, but it does not matter which row I get, only that I get each ID only once, whether the value is in pkid or fkself.
You can use least and greatest to get the smallest or biggest value of the two. That allows you to put them in the right order to generate those keys for you. You could concatenate the values as you suggested, but it's not needed in this solution. With dense_rank you can generate a sequence for each of those fictional keys. Then, you can get the first OtherData from that sequence.
select
pkid,
fkself,
otherData
from
(select
pkid,
fkself,
otherData,
-- aliased as rnk because rank is a reserved word in some databases
dense_rank() over (partition by least(pkid, fkself), greatest(pkid, fkself) order by pkid) as rnk
from
YourTable t) x
where
rnk = 1
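Here is a sketch of the same idea that can be run against the question's data. It assumes SQLite driven from Python, where the two-argument scalar forms MIN(x, y) and MAX(x, y) stand in for least and greatest; the alias names rnk and x are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (pkid INTEGER, fkself INTEGER, otherData TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 4, "there"), (4, 1, "will"), (3, 6, "be"),
     (2, 5, "other"), (5, 2, "data"), (6, 3, "columns")],
)

# MIN(a, b) / MAX(a, b) with two arguments are scalar functions in SQLite,
# used here in place of least() / greatest() to build an order-independent key.
one_per_pair = conn.execute("""
    SELECT pkid, fkself, otherData
    FROM (SELECT pkid, fkself, otherData,
                 DENSE_RANK() OVER (
                     PARTITION BY MIN(pkid, fkself), MAX(pkid, fkself)
                     ORDER BY pkid) AS rnk
          FROM YourTable) AS x
    WHERE rnk = 1
    ORDER BY pkid
""").fetchall()

print(one_per_pair)
```

Ordering by pkid keeps the row with the smaller pkid from each pair, which matches the first of the two acceptable results in the question; ORDER BY pkid DESC would give the second.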
Your idea is possible, and it should produce the results you want.
SELECT DISTINCT joinedID
FROM (
SELECT min(id) & "," & max(id) as joinedID
FROM (
SELECT pkid as id, someUniqueValue
FROM table
UNION ALL
SELECT fkself as id, someUniqueValue
FROM table)
GROUP BY someUniqueValue )
This will give you a unique list of IDs, concatenated as you like. You can easily include other fields by adding them to each SELECT statement. Also, someUniqueValue can be either an existing unique field, a new unique field, or the concatenated pkid and fkself, if that combination is unique.
The only way I can think of to do this is to concatenate pkid and
fkself in order so that both row 1 and row 2 would concatenate to 1,4,
but I'm not sure how to do that, or if it is even possible.
You could do it using a CASE expression in Oracle:
SQL> SELECT * FROM sample
2 /
PKID FKSELF
---------- ----------
1 4
4 1
3 6
2 5
5 2
7 7
6 rows selected.
SQL> l
1 SELECT DISTINCT *
2 FROM (
3 SELECT CASE WHEN pkid <= fkself THEN pkid||','||fkself
4 ELSE fkself||','||pkid
5 END "JOINED"
6 FROM sample
7* )
SQL> /
JOINED
-------------------------------------------------------------------------------
1,4
2,5
3,6
7,7