Selecting records that appear several times in a row - sql

My problem is that I would like to select some records which appears in a row.
For example we have table like this:
x
x
x
y
y
x
x
y
Query should give answer like this:
x 3
y 2
x 2
y 1

SQL tables represent unordered sets. Your question only makes sense if there is a column that specifies the ordering. If so, you can use the difference-of-row-numbers to determine the groups and then aggregate:
select col1, count(*)
from (select t.*,
row_number() over (order by <ordering col>) as seqnum,
row_number() over (partition by col1 order by <ordering col>) as seqnum_2
from t
) t
group by col1, (seqnum - seqnum_2)

I made a SQL Fiddle
http://sqlfiddle.com/#!18/f8900/5
CREATE TABLE [dbo].[SomeTable](
[data] [nchar](1) NULL,
[id] [int] IDENTITY(1,1) NOT NULL
);
INSERT INTO SomeTable
([data])
VALUES
('x'),
('x'),
('x'),
('y'),
('y'),
('x'),
('x'),
('y')
;
select * from SomeTable;
WITH SomeTable_CTE (Data, total, BaseId, NextId)
AS
(
SELECT
Data,
1 as total,
Id as BaseId,
Id+1 as NextId
FROM SomeTable
where not exists(
Select * from SomeTable Previous
where Previous.Id+1 = SomeTable.Id
and Previous.Data = SomeTable.Data)
UNION ALL
select SomeTable_CTE.Data, SomeTable_CTE.total+1, SomeTable_CTE.BaseId as BaseId, SomeTable.Id+1 as NextId
from SomeTable_CTE inner join SomeTable on
SomeTable.Data = SomeTable_CTE.Data
and
SomeTable.Id = SomeTable_CTE.NextId
)
SELECT Data, max(total) as total
FROM SomeTable_CTE
group by Data, BaseId
order by BaseId

The elephant in the room is the missing column(s) to establish the order of rows.
SELECT col1, count(*)
FROM (
SELECT col1, order_column
, row_number() OVER (ORDER BY order_column)
- row_number() OVER (PARTITION BY col1 ORDER BY order_column) AS grp
FROM tbl
) t
GROUP BY col1, grp
ORDER BY min(order_column);
To exclude partitions with only a single row, add a HAVING clause:
SELECT col1, count(*)
FROM (
SELECT col1, order_column
, row_number() OVER (ORDER BY order_column)
- row_number() OVER (PARTITION BY col1 ORDER BY order_column) AS grp
FROM tbl
) t
GROUP BY col1, grp
HAVING count(*) > 1
ORDER BY min(order_column);
db<>fiddle here
Add a final ORDER BY to maintain original order (and a meaningful result). You may want to add a column like min(order_column) as well.
Related:
Find the longest streak of perfect scores per player
Select longest continuous sequence
Group by repeating attribute

Related

How to get MAX rownumber with rowNumber in same query

I have ROW_NUMBER() OVER (ORDER BY NULL) rnum in a sql statement to give me row numbers. is there a way to attach the max rnum to the same dataset going out?
what I want is the row_number() which I get, but I also want the MAXIMUM rownumber of the total return on each record.
SELECT
ROW_NUMBER() OVER (ORDER BY NULL) rnum,
C.ID, C.FIELD1, C."NAME", C.FIELD2, C.FIELD3
FROM SCHEMA.TABLE
WHERE (C.IS_CRNT = 1)
), MAX_NUM as (
SELECT DATA.ID, max(rnum) as maxrnum from DATA GROUP BY DATA.COMPONENT_ID
) select maxrnum, DATA.* from DATA JOIN MAX_NUM on DATA.COMPONENT_ID = MAX_NUM.COMPONENT_ID
DESIRED RESULT (ASSUMING 15 records):
1 15 DATA
2 15 DATA
3 15 DATA
etc...
I think you want count(*) as a window function:
SELECT ROW_NUMBER() OVER (ORDER BY NULL) as rnum,
COUNT(*) OVER () as cnt,
C.ID, C.FIELD1, C."NAME", C.FIELD2, C.FIELD3
FROM SCHEMA.TABLE
WHERE C.IS_CRNT = 1
Based on my assumptions in your dataset, this is the approach I would take:
WITH CTE AS (
select C.ID, C.FIELD1, C."NAME", C.FIELD2, C.FIELD3, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID)
FROM SCHEMA.TABLE WHERE (C.IS_CRNT = 1))
SELECT *, (select count(*) from cte) "count" from cte;

Random records in Oracle table based on conditions

I have a Oracle table with the following columns
Table Structure
In a query I need to return all the records with CPER>=40 which is trivial. However, apart from CPER>=40 I need to list 5 random records for each CPID.
I have attached a sample list of records. However, in my table I have around 50,000 records.
Appreciate if you can help.
Oracle solution:
with CTE as
(
select t1.*,
row_number() over(order by DBMS_RANDOM.VALUE) as rn -- random order assigned
from MyTable t1
where CPID <40
)
select *
from CTE
where rn <=5 -- pick 5 at random
union all
select t2.*, null
from my_table t2
where CPID >= 40
SQL Server:
with CTE as
(
select t1.*,
row_number() over(order by newid()) as rn -- random order assigned
from MyTable t1
where CPID <40
)
select *
from CTE
where rn <=5 -- pick 5 at random
union all
select t2.*, null
from my_table t2
where CPID >= 40
How about something like this...
SELECT *
FROM (SELECT CID,
CVAL,
CPID,
CPER,
Row_number() OVER (partition BY CPID ORDER BY CPID ASC ) AS RN
FROM Table) tmp
WHERE CPER>=40 OR pids <= 5
However, this is not random.
Assuming that you want five additional random records, you can do:
select t.*
from (select t.*,
row_number() over (partition by cpid,
(case when cper >= 40 then 1 else 2 end)
order by dbms_random.value
) as seqnum
from t
) t
where seqnum <= 5 or cper >= 40;
The row_number() is enumerating the rows for each cpid in two groups -- based on the cper value. The outer where is taking all cper values in the range you want as well as five from the other group.

How to ignore column in SQL Server

I have this query:
Select *
from
(Select
*
ROW_NUMBER() OVER (PARTITION BY TID ORDER BY TID) AS RowNumber
from
MyTable
where
Eid = 'C1') as a
where
a.RowNumber = 1
and it displays these results:
Column1 Column2 RowNumber
------------------------------
Value1 value2 1
I want to ignore the RowNumber column in the select statement and I don't want to list all columns in select query (100+ columns and given is just an example).
How to do this in SQL Server?
Well, you would have to list all the columns in the outer select, if you use a subquery and row_number() to get a unique row.
An alternative method uses a correlated subquery, but requires having some unique column in the table. If you have one:
select t.*
from mytable t
where t.col = (select max(t2.col) from mytable t2 where t2.tid = t.tid and t2.eid = 'C1');
With the right indexes, this can have better performance than the row_number() version.
If you don't have a unique column, you can do:
select t.*
from (select distinct tid from mytable where eid = 'C1') tc cross apply
(select top 1 t.*
from mytable t
where t.tid = tc.tid and t.eid = 'C1'
) t;
Wrap your query as a subquery and select specific columns from it like so:
SELECT x.Column1, x.Column2
FROM
(
Select * from (Select * ROW_NUMBER() OVER (PARTITION BY TID ORDER BY TID)
AS RowNumber from MyTable where Eid="C1") as a where a.RowNumber=1
) AS x
OR Change your original Select to:
Select a.[Column1], a.[Column2]
from
(
Select * ROW_NUMBER() OVER (PARTITION BY TID ORDER BY TID)
AS RowNumber from MyTable where Eid="C1"
) as a
Where a.RowNumber=1
Replace * from your query in clarify exactly columnd which you whant
select x.Column1, x.Column2 FROM (
Select * from (Select * ROW_NUMBER() OVER (PARTITION BY TID ORDER BY TID)
AS RowNumber from MyTable where Eid="C1") as a where a.RowNumber=1) AS x

Grouping array elements in the correct order on PostgreSQL

Is it possible to group array elements in PostgreSQL?
Example, I have 2 related arrays like this (I say related because the first array indicates actions and the second array represents those action's times:
col0 = 'any_value'
col1 = array1['a','b','b','c','c','a','a','a','c']
col2 = array2[1,2,3,4,5,6,7,8,9]
and I would like to output the following result:
col0 = 'any_value'
array_result1['a','b','c','a','c']
array_result2[1,2,4,6,9]
A way the array can be unnested is by using ordinality, this is an example query, but it returns a distinct selection of the array elements which removes the repeated ones:
select col0,
array_agg(x order by rn) as unique_array1
from (
select
distinct on (col0, a.x) col0,
a.x,
a.rn
from table_a,
unnest(array1) with ordinality as a (x,rn)
order by 1,2,3
) unnested_ordered
group by col0;
So the result of this would be:
col0 = 'any_value'
array_result1['a','b','c']
But as you can see it is missing many elements.
EDIT:
To describe more my issue, In the end I would like to know when each of the array_result1 actions are initially done.
So for the example result
array_result1['a','b','c','a','c']
*array_result2[1,2,4,6,9]
*I supose the position of the array starts at 1 and not 0, i also fixed the last element, it should be 9 not 7
would help me to know, when did the first action 'a' happen and when did the second action 'a' happen so I can calculate the time for action 'a' to return into the path I am building.
So first time action 'a' happened was = 1
Second time it happened was = 6
So action 'a' appears twice in the path(array) and it takes 5 time units to re appear. That is why I need the second array with the times on which the actions happened (the first time each action happened)
You could use LATERAL and calculate group using ROW_NUMBER:
DROP TABLE IF EXISTS table_a;
CREATE TABLE table_a(col0 VARCHAR(10), col1 text[],col2 int[]);
INSERT INTO table_a(col0, col1, col2)
VALUES ('any_value',array['a','b','b','c','c','a','a','a','c'],
array[1,2,3,4,5,6,7,8,9]);
Main query:
SELECT col0,
col1,
unique_col1
FROM table_a,
LATERAL (SELECT ARRAY_AGG(x ORDER BY grp) AS unique_col1
FROM ( SELECT DISTINCT x,
rn - ROW_NUMBER() OVER(PARTITION BY x ORDER BY rn) AS grp
FROM unnest(col1) WITH ORDINALITY AS a(x,rn)
) AS sub
) AS lat1
Output:
EDIT:
Calculating second array:
SELECT col0,
col1,
unique_col1,
col2,
unique_col2
FROM table_a,
LATERAL (SELECT ARRAY_AGG(x ORDER BY grp) AS unique_col1
FROM ( SELECT DISTINCT x,
rn - ROW_NUMBER() OVER(PARTITION BY x ORDER BY rn) AS grp
FROM unnest(col1) WITH ORDINALITY AS a(x,rn)
) AS sub
) AS lat1,
LATERAL (
SELECT array_agg(x ORDER BY rn) AS unique_col2
FROM unnest(col2) WITH ORDINALITY AS b(x,rn)
WHERE rn IN (
SELECT SUM(c) OVER(ORDER BY grp) - (c-1) AS result
FROM (SELECT grp, COUNT(*) AS c
FROM ( SELECT x,
rn - ROW_NUMBER() OVER(PARTITION BY x ORDER BY rn) AS grp
FROM unnest(col1) WITH ORDINALITY AS a(x,rn)
) AS sub
GROUP BY grp) AS s
)
) AS lat2
Remark:
It generates second array from values, not its position, so when you have:
col2 = array[9,8,7,6,5,4,3,2,1]
you will get:
[9,8,6,4,1]
If you want only positions you could use:
...
LATERAL (
SELECT array_agg(result ORDER BY result) AS unique_col2
FROM (
SELECT SUM(c) OVER(ORDER BY grp) - (c-1) AS result
FROM (SELECT grp, COUNT(*) AS c
FROM ( SELECT x,
rn - ROW_NUMBER() OVER(PARTITION BY x ORDER BY rn) AS grp
FROM unnest(col1) WITH ORDINALITY AS a(x,rn)
) AS sub
GROUP BY grp) AS s
) AS s1
) AS lat2
And the result will be:
[1,2,4,6,9]
EDIT 2
In above version there is small mistake. The ARRAY_AGG should be ordered by rn not grp:
DROP TABLE IF EXISTS table_a;
CREATE TABLE table_a(col0 VARCHAR(10), col1 text[],col2 int[]);
INSERT INTO table_a(col0, col1, col2)
VALUES ('any_value',array['a','b','b','c','c','a','a','a','c'],
array[1,2,3,4,5,6,7,8,9]);
INSERT INTO table_a(col0, col1, col2)
VALUES ('any_value2',array['a','b','a','a','c','a'],array[1,2,3,4,5,6]);
SELECT *
FROM table_a,
LATERAL (SELECT ARRAY_AGG(x ORDER BY rn) AS unique_col1
FROM
(SELECT x, grp, MIN(rn) AS rn
FROM (SELECT x,
rn - ROW_NUMBER() OVER(PARTITION BY x ORDER BY rn) AS grp,
rn
FROM unnest(col1) WITH ORDINALITY AS a(x,rn)
) AS sub
GROUP BY x, grp) AS s
) AS lat1;

SQL Server ROW_NUMBER() Issue

I'm using row_number() expression but I don't get result as I expected. I have a sql table and some rows are duplicate. They have same 'BATCHID' and I want to get second row number for these, for others I use first row number. How can I do it?
SELECT * FROM (SELECT * , ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY) Rn FROM SAYIMDCPC ) t
WHERE Rn=1
This code returns to me only first rows, but I want to get second rows for duplicated items.
ROW_NUMBER() gives every row a unique counter. You'd want to use RANK(), which is similar, but gives rows with identical values the same score:
SELECT *
FROM (SELECT * , RANK() OVER (PARTITION BY batchid ORDER BY scaqry) rk
FROM sayimdcpc) t
WHERE rk = 1
If some values are only shown once, but some twice (and perhaps more than twice), you don't want the "first" row, you want the "max" row. Try reversing your order condition:
SELECT *
FROM (SELECT * ,
ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY DESC) Rn
FROM SAYIMDCPC ) t
WHERE Rn=1
As a side note, it's still better to explicitly list out all columns; for instance, you probably don't need Rn outside of this query...
Simply try this,
SELECT * FROM (SELECT * , ROW_NUMBER() OVER (ORDER BY SCAQTY) Rn FROM SAYIMDCPC ) t
WHERE Rn=1
To rephrase it another way, it sounds like you're saying, "When there's a single row for the Batch ID, return the single row. When there are 2 or more rows, return the second row." That's going to require inspecting your Rn value to see what its max is. I don't think you can do it in a single query.
So I'd try something like this:
WITH NumberedRows AS (
SELECT * ,
ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY) Rn
FROM SAYIMDCPC
)
, MaxNumber AS (
SELECT Max(RN) as MaxRn,
BATCHID
FROM NumberedRows
GROUP BY BATCHID
)
, NonDupes AS (
SELECT *
FROM NumberedRows
WHERE BATCHID NOT IN (SELECT BATCHID FROM MaxNumber WHERE MaxNumber = 1)
)
, SecondRows AS (
SELECT (
FROM NumberedRows
WHERE BATCHID NOT IN (SELECT BATCHID FROM MaxNumber WHERE MaxNumber > 1)
AND Rn = 2
)
SELECT
FROM NonDupes
UNION ALL
SELECT *
FROM SecondRows
Please try this. It will select max row num, if no duplicate then it should be first one otherwise second
select * from (SELECT * , ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY) Rn FROM SAYIMDCPC ) d,
(SELECT batchid,max(Rn) maxRn FROM (SELECT * , ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY) Rn FROM SAYIMDCPC ) t
group by batchid) q
where d.batchid = q.batchid and d.rn = q.maxrn
correct me if i am wrong
e.g sample data
BatchID, SCAQTY
1 , 10
2 , 10
2 , 20
2 , 30
is your expectation result like below?
**Expectation Result 1**
BatchID , SCAQty
1 , 10
2 , 30
or
**Expectation Result 2**
BatchID , SCAQty
1 , 10
2 , 20
2 , 30
based on my understanding what you want to perform is Expectation Result 1, so i guess Query below should able to help u, you just need to add desc for SCAQTY in your query
SELECT * FROM (SELECT * ,
ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY DESC) Rn FROM SAYIMDCPC ) t
WHERE Rn=1
Total result set with duplicates and non duplicates. The first column "IsDuplicate" indicates if the column is a duplicate or not.
;WITH d1 AS (
SELECT Seq = ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY)
,*
FROM SAYIMDCPC
)
SELECT IsDuplicate = CONVERT(BIT, Seq)
,*
FROM d1
This will give you only the duplicates:
;WITH d1 AS (
SELECT Seq = ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY)
,*
FROM SAYIMDCPC
)
SELECT IsDuplicate = CONVERT(BIT, Seq)
,*
FROM d1
WHERE Seq > 1
This will give you only the non duplicates (as in your first query)
;WITH d1 AS (
SELECT Seq = ROW_NUMBER() OVER (PARTITION BY BATCHID ORDER BY SCAQTY)
,*
FROM SAYIMDCPC
)
SELECT IsDuplicate = CONVERT(BIT, Seq)
,*
FROM d1
WHERE Seq = 1