Select Contiguous Rows for Run Length Encoding - sql

I have some enormous tables of values, and dates, that I want to compress using run length encoding. The most obvious way (to me) to do this is to select all the distinct value combinations, and the minimum and maximum dates. The problem with this is that it would miss any instances where a mapping stops, and then starts again.
Id | Value1 | Value2 | Value3 | DataDate
------------------------------------------
01 | 1 | 2 | 3 | 2000-01-01
01 | 1 | 2 | 3 | 2000-01-02
01 | 1 | 2 | 3 | 2000-01-03
01 | 1 | 2 | 3 | 2000-01-04
01 | A | B | C | 2000-01-05
01 | A | B | C | 2000-01-06
01 | 1 | 2 | 3 | 2000-01-07
Would be encoded this way as
Id | Value1 | Value2 | Value3 | FromDate | ToDate
-----------------------------------------------------
01 | 1 | 2 | 3 | 2000-01-01| 2000-01-07
01 | A | B | C | 2000-01-05| 2000-01-06
Which is clearly wrong.
What I'd like is a query that would return each set of continuous dates that exist for each set of values.
Alternatively, if I'm looking at this arse-backwards, any other advice would be appreciated.

Try this:
CREATE TABLE #MyTable (
Id INT,
Value1 VARCHAR(10),
Value2 VARCHAR(10),
Value3 VARCHAR(10),
DataDate DATE
);
INSERT #MyTable
SELECT 01, '1', ' 2', '3', '2000-01-01' UNION ALL
SELECT 01, '1', ' 2', '3', '2000-01-02' UNION ALL
SELECT 01, '1', ' 2', '3', '2000-01-03' UNION ALL
SELECT 01, '1', ' 2', '3', '2000-01-04' UNION ALL
SELECT 01, 'A', ' B', 'C', '2000-01-05' UNION ALL
SELECT 01, 'A', ' B', 'C', '2000-01-06' UNION ALL
SELECT 01, '1', ' 2', '3', '2000-01-07'
SELECT Id, Value1, Value2, Value3,
MIN(DataDate) AS FromDate, MAX(DataDate) AS ToDate
FROM (
SELECT x.Id, x.Value1, x.Value2, x.Value3,
x.DataDate,
GroupNum =
DATEDIFF(DAY, 0, x.DataDate) -
ROW_NUMBER() OVER(PARTITION BY x.Id, x.Value1, x.Value2, x.Value3 ORDER BY x.DataDate)
FROM #MyTable x
) y
GROUP BY Id, Value1, Value2, Value3, GroupNum
Results:
Id Value1 Value2 Value3 FromDate ToDate
-- ------ ------ ------ ---------- ----------
1 1 2 3 2000-01-01 2000-01-04
1 1 2 3 2000-01-07 2000-01-07
1 A B C 2000-01-05 2000-01-06
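The GroupNum expression works because, within one set of values, consecutive dates and consecutive row numbers both increase by one, so their difference stays constant across a run and jumps after a gap. Here is a minimal Python sketch of the same idea, for illustration only. The function name `encode_runs` and the tuple layout are assumptions, not part of the answer, and like the query it assumes at most one row per value set and date.

```python
from datetime import date
from itertools import groupby

def encode_runs(rows):
    """rows: iterable of (key, day) pairs; returns (key, from_day, to_day) runs."""
    runs = []
    ordered = sorted(rows)  # sort by key, then by date
    for key, same_key in groupby(ordered, key=lambda r: r[0]):
        days = [day for _, day in same_key]
        # day ordinal minus position is constant within a run of consecutive days
        for _, run in groupby(enumerate(days), key=lambda p: p[1].toordinal() - p[0]):
            run_days = [day for _, day in run]
            runs.append((key, run_days[0], run_days[-1]))
    return runs

numeric = (1, "1", "2", "3")
letters = (1, "A", "B", "C")
rows = [
    (numeric, date(2000, 1, 1)),
    (numeric, date(2000, 1, 2)),
    (numeric, date(2000, 1, 3)),
    (numeric, date(2000, 1, 4)),
    (letters, date(2000, 1, 5)),
    (letters, date(2000, 1, 6)),
    (numeric, date(2000, 1, 7)),
]
runs = encode_runs(rows)
for key, start, end in runs:
    print(key, start, end)
```

The value set (1, 2, 3) yields two runs here, because the difference changes once the dates skip from 2000-01-04 to 2000-01-07.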

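Try this (note that this is the plain MIN/MAX grouping described in the question, so a mapping that stops and starts again is merged into one range):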
SELECT Id, Value1, Value2, Value3, MIN(DataDate) AS FromDate, MAX(DataDate) AS ToDate
FROM YourTable
GROUP BY Id, Value1, Value2, Value3

You'll probably want to use windowing functions. Try something like this:
select
id, value1, value2, value3,
from_date=update_date,
to_date=lead(update_date) over (partition by id order by update_date)
from (
select
t.*
,is_changed=
case when
value1 <> lag(value1) over (partition by id order by update_date) or
(lag(value1) over (partition by id order by update_date) is null and value1 is not null) or
value2 <> lag(value2) over (partition by id order by update_date) or
(lag(value2) over (partition by id order by update_date) is null and value2 is not null) or
value3 <> lag(value3) over (partition by id order by update_date) or
(lag(value3) over (partition by id order by update_date) is null and value3 is not null)
then 1 else 0 end
from test t
) t2
where is_changed = 1
order by id, update_date
Please note that this query relies on the LAG() and LEAD() functions (available in SQL Server 2012 and later), and on two other things:
1.) Separate tests for each "value" column; if you have a lot of columns to test, you might consider creating a single hash value to simplify the equality checks
2.) The "to_date" is identical to the next record's "from_date", which means you might need to test for values using >= from_date and < to_date to make the run-lengths mutually exclusive
Note that I used the following sample data in my testing:
create table test(id int, value1 varchar(3), value2 varchar(3), value3 varchar(3), update_date datetime)
insert into test values
(1, 'A', 'B', 'C', '1/1/2014'),
(1, 'A', 'B', 'C', '2/1/2014'),
(1, 'X', 'Y', 'Z', '3/1/2014'),
(1, 'A', 'B', 'C', '4/1/2014'),
(2, 'D', 'E', 'F', '1/1/2014'),
(2, 'D', 'E', 'F', '6/1/2014')
Good luck!
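The change-detection idea can also be sketched outside SQL. The Python snippet below is only an illustration under assumed names (`change_points` is hypothetical). It keeps a row when its values differ from the previous row of the same id, then takes the next kept row's date as the exclusive end. It reads the sample dates as month/day/year and, as a simplification, always keeps the first row of an id, even if all of its values are NULL.

```python
from datetime import date

def change_points(rows):
    """rows: (id, values, update_date); returns (id, values, from_date, to_date)."""
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    kept = []
    prev_id, prev_values = object(), None
    for row_id, values, day in ordered:
        # keep the first row of each id and every row whose values changed
        if row_id != prev_id or values != prev_values:
            kept.append((row_id, values, day))
        prev_id, prev_values = row_id, values
    result = []
    for i, (row_id, values, day) in enumerate(kept):
        nxt = kept[i + 1] if i + 1 < len(kept) else None
        # to_date is the next change for the same id, or None for the last run
        to_day = nxt[2] if nxt and nxt[0] == row_id else None
        result.append((row_id, values, day, to_day))
    return result

rows = [
    (1, ("A", "B", "C"), date(2014, 1, 1)),
    (1, ("A", "B", "C"), date(2014, 2, 1)),
    (1, ("X", "Y", "Z"), date(2014, 3, 1)),
    (1, ("A", "B", "C"), date(2014, 4, 1)),
    (2, ("D", "E", "F"), date(2014, 1, 1)),
    (2, ("D", "E", "F"), date(2014, 6, 1)),
]
result = change_points(rows)
for line in result:
    print(line)
```

Each range is half-open: a date belongs to a run when it is >= from_date and < to_date, and a to_date of None marks the run that is still current.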

Related

Eliminating duplicate rows except one column with condition

I am having trouble finding an appropriate query (SQL Server) for selecting records by a condition. The table I will be using has more than 100,000 rows and more than 20 columns.
I need a query that satisfies the following conditions:
1.) If the [policy] and [plan] combination is unique across rows, select that record.
2.) If [policy] and [plan] match 2 or more rows, select the records whose [code] column isn't 999.
3.) In some cases the unwanted rows may not have 999 in the [code] column but some other specific value.
In other words, I would like to get rows 1, 2, 4, 5 and 7.
Here is an example of what the table looks like
row #|policy|plan|code
-----------------------
1 | a | aa |111
-----------------------
2 | b | bb |112
-----------------------
3 | b | bb |999
-----------------------
4 | c | cc |111
-----------------------
5 | c | cc |112
-----------------------
6 | c | cc |999
-----------------------
7 | d | dd |999
-----------------------
I'm expecting to see something like
row #|policy|plan|code
-----------------------
1 | a | aa |111
-----------------------
2 | b | bb |112
-----------------------
4 | c | cc |111
-----------------------
5 | c | cc |112
-----------------------
7 | d | dd |999
-----------------------
Thank you in advance
This sounds like a prioritization query. You can use row_number():
select t.*
from (select t.*,
row_number() over (partition by policy, plan
order by code
) as seqnum
from t
) t
where seqnum = 1;
The expected output makes this a bit clearer:
select t.*
from (select t.*,
rank() over (partition by policy, plan
order by (case when code = 999 then 1 else 2 end) desc
) as seqnum
from t
) t
where seqnum = 1;
The OP wants all codes that are not 999 unless the only codes are 999. So, another approach is:
select t.*
from t
where t.code <> 999
union all
select t.*
from t
where t.code = 999 and
not exists (select 1
from t t2
where t2.policy = t.policy and t2.plan = t.plan and
t2.code <> 999
);
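The rule "keep every non-999 row, and keep 999 rows only when the group has nothing else" can be checked against the sample data with a short Python sketch. This is an illustration only; `drop_placeholder_rows` and its `placeholder` parameter are hypothetical names.

```python
from collections import defaultdict

def drop_placeholder_rows(rows, placeholder=999):
    """rows: (row_no, policy, plan, code) tuples; returns the rows to keep."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row[1], row[2])].append(row)  # group by (policy, plan)
    kept = []
    for members in groups.values():
        preferred = [r for r in members if r[3] != placeholder]
        # fall back to the placeholder rows only when nothing else exists
        kept.extend(preferred if preferred else members)
    return sorted(kept)

rows = [
    (1, "a", "aa", 111),
    (2, "b", "bb", 112),
    (3, "b", "bb", 999),
    (4, "c", "cc", 111),
    (5, "c", "cc", 112),
    (6, "c", "cc", 999),
    (7, "d", "dd", 999),
]
kept = drop_placeholder_rows(rows)
print([r[0] for r in kept])
```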
Maybe you want this (eliminate the row with the highest code if the group has more than one row)?
select t.*
from (select t.*
, row_number() over (partition by policy, plan
order by code desc
) AS RN
, COUNT(*) over (partition by policy, plan) AS RC
from t
) t
where RN > 1 OR RN=RC;
Output:
row policy plan code RN RC
1 1 a aa 111 1 1
2 2 b bb 112 2 2
3 5 c cc 112 2 3
4 4 c cc 111 3 3
5 7 d dd 999 1 1
CREATE TABLE #Table2
([row] int, [policy] varchar(1), [plan] varchar(2), [code] int)
;
INSERT INTO #Table2
([row], [policy], [plan], [code])
VALUES
(1, 'a', 'aa', 111),
(2, 'b', 'bb', 112),
(3, 'b', 'bb', 999),
(4, 'c', 'cc', 111),
(5, 'c', 'cc', 112),
(6, 'c', 'cc', 999),
(7, 'd', 'dd', 999)
;
with cte
as
(
select *,
row_number() over (partition by policy, [plan]
order by code
) as seqnum
from #Table2
)
select [row], [policy], [plan], [code] from cte where seqnum=1

Count Top 5 Elements spread over rows and columns

Using T-SQL for this table:
+-----+------+------+------+-----+
| No. | Col1 | Col2 | Col3 | Age |
+-----+------+------+------+-----+
| 1 | e | a | o | 5 |
| 2 | f | b | a | 34 |
| 3 | a | NULL | b | 22 |
| 4 | b | c | a | 55 |
| 5 | b | a | b | 19 |
+-----+------+------+------+-----+
I need to count the TOP 3 names (Ordered by TotalCount DESC) across all rows and columns, for 3 Age groups: 0-17, 18-49, 50-100. Also, how do I ignore the NULLS from my results?
If it's possible, how I can also UNION the results for all 3 age groups into one output table to get 9 results (TOP 3 x 3 Age groups)?
Output for only 1 Age Group: 18-49 would look like this:
+------+------------+
| Name | TotalCount |
+------+------------+
| b | 4 |
| a | 3 |
| f | 1 |
+------+------------+
You need to unpivot your table first and then exclude the NULLs. Then do a simple COUNT(*):
WITH CteUnpivot(Name, Age) AS(
SELECT x.*
FROM tbl t
CROSS APPLY ( VALUES
(col1, Age),
(col2, Age),
(col3, Age)
) x(Name, Age)
WHERE x.Name IS NOT NULL
)
SELECT TOP 3
Name, COUNT(*) AS TotalCount
FROM CteUnpivot
WHERE Age BETWEEN 18 AND 49
GROUP BY Name
ORDER BY COUNT(*) DESC
If you want to get the TOP 3 for each age group:
WITH CteUnpivot(Name, Age) AS(
SELECT x.*
FROM tbl t
CROSS APPLY ( VALUES
(col1, Age),
(col2, Age),
(col3, Age)
) x(Name, Age)
WHERE x.Name IS NOT NULL
),
CteRn AS (
SELECT
AgeGroup =
CASE
WHEN Age BETWEEN 0 AND 17 THEN '0-17'
WHEN Age BETWEEN 18 AND 49 THEN '18-49'
WHEN Age BETWEEN 50 AND 100 THEN '50-100'
END,
Name,
COUNT(*) AS TotalCount
FROM CteUnpivot
GROUP BY
CASE
WHEN Age BETWEEN 0 AND 17 THEN '0-17'
WHEN Age BETWEEN 18 AND 49 THEN '18-49'
WHEN Age BETWEEN 50 AND 100 THEN '50-100'
END,
Name
)
SELECT
AgeGroup, Name, TotalCount
FROM(
SELECT *,
rn = ROW_NUMBER() OVER(PARTITION BY AgeGroup ORDER BY TotalCount DESC)
FROM CteRn
) t
WHERE rn <= 3;
The unpivot technique using CROSS APPLY and VALUES:
An Alternative (Better?) Method to UNPIVOT (SQL Spackle) by Dwain Camps
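To see what the unpivot-then-count approach produces on the sample table, here is a small Python sketch. It is an illustration, not a translation of the query; the names `AGE_GROUPS` and `top_names` are assumptions. Each non-NULL column value is counted once under the age group of its row, and the top three names per group are returned.

```python
from collections import Counter

AGE_GROUPS = [("0-17", 0, 17), ("18-49", 18, 49), ("50-100", 50, 100)]

def top_names(rows, n=3):
    """rows: (col1, col2, col3, age); returns {age_group: [(name, count), ...]}."""
    counts = {label: Counter() for label, _, _ in AGE_GROUPS}
    for col1, col2, col3, age in rows:
        for label, low, high in AGE_GROUPS:
            if low <= age <= high:
                # unpivot: every non-NULL column value counts once
                counts[label].update(v for v in (col1, col2, col3) if v is not None)
    return {label: counter.most_common(n) for label, counter in counts.items()}

rows = [
    ("e", "a", "o", 5),
    ("f", "b", "a", 34),
    ("a", None, "b", 22),
    ("b", "c", "a", 55),
    ("b", "a", "b", 19),
]
top = top_names(rows)
print(top["18-49"])
```

For the 18-49 group this gives b with 4, a with 3 and f with 1, matching the output requested in the question.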
You can also try the multiple-CTE SELECT statement below.
ROW_NUMBER() with a PARTITION BY clause is used to order the records within each age group.
/*
CREATE TABLE tblAges(
[No] Int,
Col1 VarChar(10),
Col2 VarChar(10),
Col3 VarChar(10),
Age SmallInt
)
INSERT INTO tblAges VALUES
(1, 'e', 'a', 'o', 5),
(2, 'f', 'b', 'a', 34),
(3, 'a', NULL, 'b', 22),
(4, 'b', 'c', 'a', 55),
(5, 'b', 'a', 'b', 19);
*/
;with cte as (
select
col1 as col, Age
from tblAges
union all
select
col2, Age
from tblAges
union all
select
col3, Age
from tblAges
), cte2 as (
select
col,
case
when age < 18 then '0-17'
when age < 50 then '18-49'
else '50-100'
end as grup
from cte
where col is not null
), cte3 as (
select
grup,
col,
count(grup) cnt
from cte2
group by
grup,
col
)
select * from (
select
grup, col, cnt, ROW_NUMBER() over (partition by grup order by cnt desc) cnt_grp
from cte3
) t
where cnt_grp <= 3
order by grup, cnt desc

select rows where ID has variation of values

I have two columns, an ID, and the other a value which is either 0 or 1. I am trying to select all Rows for the ID where it has a 0 and a 1, for example,
RowNumber | ID  | value
-----------------------
1         | 001 | 1
2         | 001 | 1
3         | 001 | 1
4         | 002 | 1
5         | 002 | 0
6         | 003 | 1
7         | 003 | 1
8         | 004 | 1
9         | 004 | 0
10        | 004 | 1
The result should select rows 4, 5, 8, 9, 10
You can use window version of COUNT:
SELECT RowNumber, ID, value
FROM (
SELECT RowNumber, ID, value,
COUNT(CASE WHEN value = 1 THEN 1 END) OVER (PARTITION BY ID) AS cntOnes,
COUNT(CASE WHEN value = 0 THEN 1 END) OVER (PARTITION BY ID) AS cntZeroes
FROM test
WHERE value IN (0,1) ) AS t
WHERE cntOnes >= 1 AND cntZeroes >= 1
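Both counts are at least 1 only when an ID slice contains both a 0 and a 1. COUNT(DISTINCT value) OVER (PARTITION BY ID) = 2 would express the same test, but SQL Server does not accept DISTINCT in a windowed COUNT.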
DISTINCT is indeed not allowed in a windowed version of the COUNT, so you can use MIN and MAX instead.
CREATE TABLE #T (RN int, ID int, value int);
INSERT INTO #T (RN, ID, value) VALUES
(1, 001, 1),
(2, 001, 1),
(3, 001, 1),
(4, 002, 1),
(5, 002, 0),
(6, 003, 1),
(7, 003, 1),
(8, 004, 1),
(9, 004, 0),
(10, 004, 1);
WITH
CTE
AS
(
SELECT
RN, ID, value
,MIN(value) OVER (PARTITION BY ID) AS MinV
,MAX(value) OVER (PARTITION BY ID) AS MaxV
FROM #T AS T
)
SELECT RN, ID, value
FROM CTE
WHERE MinV <> MaxV
;
Result
+----+----+-------+
| RN | ID | value |
+----+----+-------+
| 4 | 2 | 1 |
| 5 | 2 | 0 |
| 8 | 4 | 1 |
| 9 | 4 | 0 |
| 10 | 4 | 1 |
+----+----+-------+
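The MIN/MAX test can be sanity-checked against the question's data with a few lines of Python. This is only a sketch of the logic; `rows_with_mixed_values` is a hypothetical name. An ID qualifies when its smallest and largest values differ, which for 0/1 data means both values occur.

```python
def rows_with_mixed_values(rows):
    """rows: (rn, id, value); keep rows whose id has more than one distinct value."""
    lowest, highest = {}, {}
    for _, key, value in rows:
        lowest[key] = min(value, lowest.get(key, value))
        highest[key] = max(value, highest.get(key, value))
    # same filter as WHERE MinV <> MaxV
    return [r for r in rows if lowest[r[1]] != highest[r[1]]]

rows = [
    (1, 1, 1), (2, 1, 1), (3, 1, 1),
    (4, 2, 1), (5, 2, 0),
    (6, 3, 1), (7, 3, 1),
    (8, 4, 1), (9, 4, 0), (10, 4, 1),
]
selected = rows_with_mixed_values(rows)
print([r[0] for r in selected])
```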
create table #shadowTemp (
RowNumber int not null,
Id char(3) not null,
value bit not null
)
insert into #shadowTemp values ( 1,'001', 0 )
insert into #shadowTemp values ( 2,'001', 1 )
insert into #shadowTemp values ( 3,'001', 1 )
insert into #shadowTemp values ( 4,'002', 0 )
insert into #shadowTemp values ( 5,'003', 0 )
insert into #shadowTemp values ( 6,'003', 1 )
select * from #shadowTemp;
;with cte ( Id ) As (
select Id
from #shadowTemp
group by Id
having min( cast(value as int) ) <> max( cast(value as int) )
)
select a.*
from
#shadowTemp a
inner join cte b on ( a.Id = b.Id )
drop table #shadowTemp

Select record with Highest value inside another select

I need some assistance with a Select.
The following is an attempt to give you an example of the data.
Number: 1 | Date: 2014-05-01 | ClientCode: 001 | Status: P | Sequence: 0 |
Number: 1 | Date: 2014-05-01 | ClientCode: 001 | Status: X | Sequence: 1 |
Number: 2 | Date: 2014-06-30 | ClientCode: 005 | Status: X | Sequence: 0 |
Number: 2 | Date: 2014-06-30 | ClientCode: 005 | Status: Z | Sequence: 1 |
Number: 2 | Date: 2014-06-30 | ClientCode: 005 | Status: A | Sequence: 2 |
I need a Select that gives me all the records with the highest "Sequence" among those with the same "Number".
The desired output would return the lines with Number 1/Sequence 1 and Number 2/Sequence 2.
I managed to do this with a temp table, but it is very slow.
Can you help me?
Thanks in Advance
SELECT *
FROM table t
WHERE NOT EXISTS ( SELECT 'a'
FROM table t2
WHERE t2.number = t.number
AND t2.sequence > t.sequence
)
Use ROW_NUMBER() for this:
;WITH TestTable(Number, Date, ClientCode, Status, Sequence) AS(
SELECT 1, '2014-05-01', '001', 'P', 0 UNION ALL
SELECT 1, '2014-05-01', '001', 'X', 1 UNION ALL
SELECT 2, '2014-06-30', '005', 'X', 0 UNION ALL
SELECT 2, '2014-06-30', '005', 'P', 1 UNION ALL
SELECT 2, '2014-06-30', '005', 'X', 2 UNION ALL
SELECT 10, '2015-01-01', '555', 'P', 0 UNION ALL
SELECT 15, '2015-02-08', '666', 'P', 0 UNION ALL
SELECT 15, '2015-02-08', '666', 'C', 1 UNION ALL
SELECT 15, '2015-02-08', '666', 'T', 2 UNION ALL
SELECT 15, '2015-02-08', '666', 'X', 3 UNION ALL
SELECT 15, '2015-02-08', '666', 'X', 4
)
SELECT
Number,
Date,
ClientCode,
Status,
Sequence
FROM(
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY Number ORDER BY Sequence DESC)
FROM TestTable
)t
WHERE RN = 1
RESULT
Number Date ClientCode Status Sequence
----------- ---------- ---------- ------ -----------
1 2014-05-01 001 X 1
2 2014-06-30 005 X 2
10 2015-01-01 555 P 0
15 2015-02-08 666 X 4
Use a correlated sub-query to find each number's max sequence value:
SELECT *
FROM TableName t1
where sequence = (select max(sequence) from tablename t2
where t1.number = t2.number)
This returns both rows if a number has two rows that share the same maximum sequence.
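The tie behaviour of the "sequence = max(sequence)" filter is easy to see in a small Python sketch. It is illustrative only: `latest_rows` is a hypothetical name, and the extra row with status 'B' is invented here to create a tie, it is not in the question's data.

```python
def latest_rows(rows):
    """rows: (number, status, sequence); keep rows at their number's max sequence."""
    best = {}
    for number, _, seq in rows:
        best[number] = max(seq, best.get(number, seq))
    # equality filter, so every row tied at the maximum is returned
    return [r for r in rows if r[2] == best[r[0]]]

rows = [
    (1, "P", 0),
    (1, "X", 1),
    (2, "X", 0),
    (2, "Z", 1),
    (2, "A", 2),
]
latest = latest_rows(rows)
tied = latest_rows(rows + [(2, "B", 2)])  # hypothetical extra row causing a tie
print(latest)
print(tied)
```

ROW_NUMBER() with RN = 1 differs on exactly this point: it returns a single row per number even when the maximum is shared.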
Try this..
With x as (select number, max(sequence) as sequence from yourfile group by number)
select y.number,y.sequence,y.clientcode
from x, yourfile y
where x.number=y.number
and y.sequence=x.sequence
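Try this:
SELECT
Number,
Max( Sequence) AS MaxSequence
FROM
TableName
Group by
Number;
Group by Number and use the MAX aggregate function on Sequence to get each number's highest sequence. SQL Server rejects columns that are neither grouped nor aggregated (Date, ClientCode, Status), and grouping by ClientCode alone would not give one row per Number in general, so join this result back to the table on Number and Sequence to retrieve the remaining columns.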

Ordinal numbers in select

First, sorry for my bad English; it's not my native language :(
I'm fairly new to Oracle and I need help with the following. I have several records with the same ID, several values (which can be the same) and different creation dates.
I would like to select an ordinal number for the IDs that have the same value but a different date.
For example
ID | Value | Date | Number
A | Value1 | 01.11. | 1
A | Value1 | 02.11. | 2
A | Value2 | 03.11. | null
A | Value2 | 01.11. | null
B | Value1 | 01.11. | 1
B | Value1 | 03.11. | 2
B | Value2 | 01.11. | null
C | Value1 | 01.11. | 1
C | Value2 | 01.11. | null
So for every ID in the first column where I have Value1, I want an incrementing number, and for the rest of the values I don't need anything.
I hope I'm not posting a duplicate question; I tried to look it up, but I couldn't find an answer.
Thank you in advance!
Edit: Will accept one instead of null for other values.
The basic idea is row_number() to get the sequential value and rank() to rank the values. You only want the first set to be enumerated. "First" corresponds to rank() having a value of 1. The rest get NULL:
select id, value, date,
(case when rank() over (partition by id order by value) = 1
then row_number() over (partition by id order by value)
end) as number
from table t;
EDIT:
I realize that you might actually want the first value by time and not some other ordering. For that, use keep instead of rank():
select id, value, date,
(case when value = max(value) keep (dense_rank first order by date) over (partition by id)
then row_number() over (partition by id, value order by date)
end) as number
from table t;
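The rank()-based idea (number only the rows that carry the lowest-ranked value of each ID, leave the rest NULL) can be sketched in Python against the question's sample. This is an illustration under assumptions: `number_first_value` is a hypothetical name, and the "dd.mm." strings are compared as text, which only orders correctly within a single month.

```python
def number_first_value(rows):
    """rows: (id, value, day); number rows of each id's smallest value, None elsewhere."""
    first_value = {}
    for row_id, value, _ in rows:
        first_value[row_id] = min(value, first_value.get(row_id, value))
    counters = {}
    out = []
    for row_id, value, day in sorted(rows):  # by id, then value, then day
        if value == first_value[row_id]:
            counters[row_id] = counters.get(row_id, 0) + 1
            out.append((row_id, value, day, counters[row_id]))
        else:
            out.append((row_id, value, day, None))
    return out

rows = [
    ("A", "Value1", "01.11."), ("A", "Value1", "02.11."),
    ("A", "Value2", "03.11."), ("A", "Value2", "01.11."),
    ("B", "Value1", "01.11."), ("B", "Value1", "03.11."),
    ("B", "Value2", "01.11."),
    ("C", "Value1", "01.11."), ("C", "Value2", "01.11."),
]
numbered = number_first_value(rows)
for line in numbered:
    print(line)
```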
Hmm... Hope I understood correctly:
with my_table as (
select 'A' ID, 'Value1' value, '01.11.' dt, 1 num from dual union all
select 'A', 'Value1', '02.11.', 2 from dual union all
select 'A', 'Value2', '03.11.', null from dual union all
select 'A', 'Value2', '01.11.', null from dual union all
select 'B', 'Value1', '01.11.', 1 from dual union all
select 'B', 'Value1', '03.11.', 2 from dual union all
select 'B', 'Value2', '01.11.', null from dual union all
select 'C', 'Value1', '01.11.', 1 from dual union all
select 'C', 'Value2', '01.11.', null from dual)
select *
from (select t.*, count(distinct dt) over (partition by value, id) diff_cnt
from my_table t) tt
where tt.diff_cnt > 1;
result:
ID VALUE DT NUM DIFF_CNT
-- ------ ------ ---------- ----------
A Value1 01.11. 1 2
A Value1 02.11. 2 2
B Value1 01.11. 1 2
B Value1 03.11. 2 2
A Value2 01.11. 2
A Value2 03.11. 2