How to calculate overall times when consecutive rows appear in PostgreSQL - sql

I have a table that looks like this:
id value ts
123 T ts1
123 T ts2
123 F ts3
123 T ts4
456 F ts5
456 T ts6
456 T ts7
456 F ts8
......
I want to count the runs of consecutive 'T' values within each id partition (each id partition should be ordered by column ts). Beyond that, I want to know how many times a run of two consecutive 'T's appears, how many times a run of three 'T's appears, and so on.
So finally, I want a table that has two columns:
num_of_consecutives
times_of_occurrences_for_this_number_of_consecutives
In this case, 2 consecutive 'T's appear one time and 1 consecutive T appears one time for id 123; 2 consecutive 'T's appear one time for id 456. Therefore, summing them up, the final table should look like this:
num_of_consecutives times_of_occurrences_for_this_number_of_consecutives
1 1
2 2

Please check this solution (fiddle):
with cte(id, value, ts) as (
select 123, 'T' , 'ts1'
union all
select 123, 'T' , 'ts2'
union all
select 123, 'F' , 'ts3'
union all
select 123, 'T' , 'ts4'
union all
select 456, 'F' , 'ts5'
union all
select 456, 'T' , 'ts6'
union all
select 456, 'T' , 'ts7'
union all
select 456, 'F' , 'ts8'
)
select cnt as num_of_consecutives, count(cnt) as times_of_occurrences_for_this_number_of_consecutives from(
  -- one row per run of equal values within an id
  select id, value, count(*) cnt from(
    -- number rows per id so that a run never spans two ids
    select *,row_number() over (partition by id order by ts) - row_number() over (partition by id, value order by ts) grp
    from cte)q
  group by id, grp, value
)q1
where value = 'T'
group by value, cnt
order by cnt;
This discussion could also be useful.
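To see why subtracting two row numbers isolates each run, it can help to look at the intermediate columns. The sketch below is an illustration only, written for PostgreSQL: it reuses the sample rows from the question, numbers them per id, and exposes the two row numbers under the made-up names rn_all and rn_val. Rows that belong to the same run of equal values within an id end up with the same grp.

```sql
-- Illustration only: rn_all and rn_val are hypothetical column names.
with cte(id, value, ts) as (
  values (123, 'T', 'ts1'), (123, 'T', 'ts2'), (123, 'F', 'ts3'), (123, 'T', 'ts4'),
         (456, 'F', 'ts5'), (456, 'T', 'ts6'), (456, 'T', 'ts7'), (456, 'F', 'ts8')
)
select id, value, ts,
       row_number() over (partition by id order by ts)        as rn_all,
       row_number() over (partition by id, value order by ts) as rn_val,
       row_number() over (partition by id order by ts)
         - row_number() over (partition by id, value order by ts) as grp
from cte
order by id, ts;
-- id   value  ts   rn_all  rn_val  grp
-- 123  T      ts1  1       1       0
-- 123  T      ts2  2       2       0
-- 123  F      ts3  3       1       2
-- 123  T      ts4  4       3       1
-- 456  F      ts5  1       1       0
-- 456  T      ts6  2       1       1
-- 456  T      ts7  3       2       1
-- 456  F      ts8  4       2       2
```

Grouping by (id, value, grp) then gives one row per run, and counting those rows per run length gives the requested table.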

Related

Fetching Specific Group of Rows

I have a table with 'Name', 'Flag' and some other columns. I want to select a specific group of rows from the table. The data is already sorted by another timestamp column.
Name Flag
------ ------
A D
B D
C D
D I
E I
D D
E D
B I
D I
F I
I want to fetch the first set of 'D' flag rows and the last set of 'I' flag rows. Is this possible in SQL (a plain select statement, not PL/SQL)?
Desired Output:
Name Flag
------ ------
A D
B D
C D
B I
D I
F I
SQL tables represent unordered sets. So, there is no "first" or "last", unless you have a column that specifies the ordering. Note that this applies to both SQL queries and to PL/SQL code. Of course, you specify that you have two columns, so no such column exists in your data.
But let me assume that you do have one. If so, you can do:
select t.*
from t
where (t.flag = 'D' and
       t.orderingcol < (select min(t2.orderingcol) from t t2 where t2.flag <> 'D')
      ) or
      (t.flag = 'I' and
       t.orderingcol > (select max(t2.orderingcol) from t t2 where t2.flag <> 'I')
      )
order by t.orderingcol;
Assuming you have some sort of column that determines the ordering of the result set (e.g. the id column in my query below), this is easy enough to do with a technique known as Tabibitosan:
WITH sample_data AS (SELECT 1 ID, 'A' NAME, 'D' flag FROM dual UNION ALL
SELECT 2 ID, 'B' NAME, 'D' flag FROM dual UNION ALL
SELECT 3 ID, 'C' NAME, 'D' flag FROM dual UNION ALL
SELECT 4 ID, 'D' NAME, 'I' flag FROM dual UNION ALL
SELECT 5 ID, 'E' NAME, 'I' flag FROM dual UNION ALL
SELECT 6 ID, 'D' NAME, 'D' flag FROM dual UNION ALL
SELECT 7 ID, 'E' NAME, 'D' flag FROM dual UNION ALL
SELECT 8 ID, 'B' NAME, 'I' flag FROM dual UNION ALL
SELECT 9 ID, 'D' NAME, 'I' flag FROM dual UNION ALL
SELECT 10 ID, 'F' NAME, 'I' flag FROM dual)
SELECT ID,
NAME,
flag
FROM (SELECT ID,
NAME,
flag,
grp,
MIN(CASE WHEN flag = 'D' THEN grp END) OVER (PARTITION BY flag) min_d_grp,
MAX(CASE WHEN flag = 'I' THEN grp END) OVER (PARTITION BY flag) max_i_grp
FROM (SELECT ID,
NAME,
flag,
row_number() OVER (ORDER BY ID) - row_number() OVER (PARTITION BY flag ORDER BY ID) grp
FROM sample_data
WHERE flag IN ('D', 'I')))
WHERE (flag = 'D' AND grp = min_d_grp)
OR (flag = 'I' AND grp = max_i_grp)
ORDER BY id;
ID NAME FLAG
---------- ---- ----
1 A D
2 B D
3 C D
8 B I
9 D I
10 F I
This query uses the tabibitosan method to generate an additional "grp" column, which you can then use to find the lowest number for the D flag rows and the highest for the I flag rows.
ETA: This may or may not perform better than Gordon's answer, but I would recommend you test both answers to see which works better for your tables/indexes/data etc.
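As an illustration of the grp column (a sketch for Oracle, not part of the original answer), the query below computes only the Tabibitosan difference for the same sample rows. Each unbroken run of one flag gets its own grp value, and within a flag the values grow with the ordering column, which is why the minimum grp identifies the first 'D' set and the maximum grp the last 'I' set.

```sql
-- Illustration only: shows the grp value assigned to each sample row.
WITH sample_data AS (SELECT 1 ID, 'A' NAME, 'D' flag FROM dual UNION ALL
                     SELECT 2 ID, 'B' NAME, 'D' flag FROM dual UNION ALL
                     SELECT 3 ID, 'C' NAME, 'D' flag FROM dual UNION ALL
                     SELECT 4 ID, 'D' NAME, 'I' flag FROM dual UNION ALL
                     SELECT 5 ID, 'E' NAME, 'I' flag FROM dual UNION ALL
                     SELECT 6 ID, 'D' NAME, 'D' flag FROM dual UNION ALL
                     SELECT 7 ID, 'E' NAME, 'D' flag FROM dual UNION ALL
                     SELECT 8 ID, 'B' NAME, 'I' flag FROM dual UNION ALL
                     SELECT 9 ID, 'D' NAME, 'I' flag FROM dual UNION ALL
                     SELECT 10 ID, 'F' NAME, 'I' flag FROM dual)
SELECT ID,
       NAME,
       flag,
       row_number() OVER (ORDER BY ID)
         - row_number() OVER (PARTITION BY flag ORDER BY ID) AS grp
FROM   sample_data
ORDER BY ID;
-- IDs 1-3  (flag D) -> grp 0
-- IDs 4-5  (flag I) -> grp 3
-- IDs 6-7  (flag D) -> grp 2
-- IDs 8-10 (flag I) -> grp 5
```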

How to write a sql to dynamically add some calculated rows in Oracle?

I have a table like this:
id name value
1 elec 10
1 water 20
2 elec 15
2 water 45
Now I need to dynamically add some rows to the result of select query:
id name value
1 elec 10
1 water 20
1 ratio 0.5
2 elec 15
2 water 45
2 ratio 0.33
I need to add these two rows dynamically. How can I do that?
It would make a lot more sense to "present" the results with ELEC, WATER and RATIO columns - one row per ID. The solution below shows how you can do that efficiently (reading the base table only one time).
with
inputs ( id, name, value ) as (
select 1, 'elec' , 10 from dual union all
select 1, 'water', 20 from dual union all
select 2, 'elec' , 15 from dual union all
select 2, 'water', 45 from dual
)
-- End of simulated inputs (not part of the solution).
-- SQL query begins BELOW THIS LINE. Use your actual table and column names.
select id, elec, water, round(elec/water, 2) as ratio
from inputs
pivot ( min(value) for name in ('elec' as elec, 'water' as water ) )
;
ID ELEC WATER RATIO
---------- ---------- ---------- ----------
1 10 20 .5
2 15 45 .33
If instead you need the results in the format you showed in your original post, you can unpivot like so (still reading the base table only once):
with
inputs ( id, name, value ) as (
select 1, 'elec' , 10 from dual union all
select 1, 'water', 20 from dual union all
select 2, 'elec' , 15 from dual union all
select 2, 'water', 45 from dual
)
-- End of simulated inputs (not part of the solution).
-- SQL query begins BELOW THIS LINE. Use your actual table and column names.
select id, name, value
from (
select id, elec, water, round(elec/water, 2) as ratio
from inputs
pivot ( min(value) for name in ('elec' as elec, 'water' as water ) )
)
unpivot ( value for name in (elec as 'elec', water as 'water', ratio as 'ratio') )
;
ID NAME VALUE
---------- ----- ----------
1 elec 10
1 water 20
1 ratio .5
2 elec 15
2 water 45
2 ratio .33
Here is one method:
with t as (
<your query here>
)
select id, name, value
from ((select t.*, 1 as ord
from t
) union all
(select id, 'ratio',
max(case when name = 'elec' then value end) / max(case when name = 'water' then value end),
2 as ord
from t
group by id
)
) tt
order by id, ord;
If you are fine with a slight change in ordering, try this:
SELECT id,name,value FROM yourtable
UNION ALL
SELECT
a.id ,
'ratio' name,
a.value/b.value value
FROM
yourtable a
JOIN yourtable b on a.id = b.id
WHERE a.name = 'elec'
and b.name = 'water'
ORDER BY
id ,
VALUE DESC;
If you need to add the rows to the table itself, use:
INSERT INTO yourtable
SELECT
a.id ,
'ratio' name,
a.value/b.value value
FROM
yourtable a
JOIN yourtable b on a.id = b.id
WHERE a.name ='elec'
and b.name ='water';

Select Max Versions value

I need to extract data based on a certain version of a given record. I want to extract the max version based on the final save of the first user for an ID. Is this possible?
--In my mock up I have version numbers as 1,2,3 but the numbers are actually randomly assigned in my database.
I am trying to use:
select id, max(version) over partition by id
from t1
here is my data:
T1
ID User Version
1 123 1
1 123 2
1 123 3
1 456 4
1 456 5
1 789 6
2 452 1
2 452 2
2 587 3
2 123 4
3 901 1
3 767 2
3 456 3
here is what I am trying to extract:
T1
ID User MaxVersion
1 123 3
2 452 2
3 901 1
I think you want:
select t1.*
from (select t1.*,
             row_number() over (partition by id, user order by version desc) as seqnum,
             max(user) keep (dense_rank first order by version) over (partition by id) as first_user
      from t1
     ) t1
where seqnum = 1 and user = first_user;
You need to look for the user and the last record separately.
EDIT:
If you need the "first" final version, I would go with:
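select t1.*
from (select t1.*,
             -- rank the remaining rows so the highest version comes first
             row_number() over (partition by id order by version desc) as seqnum
      from (select t1.*,
                   -- first version that belongs to a different user (NULL if there is none)
                   min(case when user <> first_user then version end) over (partition by id) as last_version_plus_1
            from (select t1.*,
                         max(user) keep (dense_rank first order by version) over (partition by id) as first_user
                  from t1
                 ) t1
           ) t1
      where user = first_user and
            (last_version_plus_1 is null or version < last_version_plus_1)
     ) t1
where seqnum = 1;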
Or, you can do this with correlated subqueries:
select t1.*
from t1
where t1.user = (select min(tt1.user) keep (dense_rank first order by tt1.version)
from t1 tt1
where tt1.id = t1.id
) and
t1.version < (select min(tt1.version)
from t1 tt1
where tt1.id = t1.id and tt1.user <> t1.user
);
This is the "old-fashioned" approach (pre-analytic functions). But it captures exactly the idea. The first makes sure the user is the first user. The second makes sure the version is from the first records for that user.
In Oracle 12.1 and above, match_recognize can make quick work of such requirements. (One benefit, compared to analytic function solutions, is that max(version) is calculated for just one user per ID, without requiring a subquery to achieve this efficiency.)
The match_recognize clause partitions by id and within each id it orders by version (ascending). Then a "match" is from the start of the partition only (^ in the pattern clause), and consists only of rows that have the same user as the first row (in that partition by id). All other rows for that id are ignored. Then the last version value is collected for the output.
NOTE: This assumes that, if for a given ID, the first user changes to a second, a third etc. but then reverts to the first user, the highest version number from the FIRST set of rows for that user is required. If instead the highest version number from ALL rows for that user is required, the query can be changed accordingly (specifically the PATTERN clause needs a change).
with
inputs ( id, usr, ver ) as (
select 1, 123, 1 from dual union all
select 1, 123, 2 from dual union all
select 1, 123, 3 from dual union all
select 1, 456, 4 from dual union all
select 1, 456, 5 from dual union all
select 1, 789, 6 from dual union all
select 2, 452, 1 from dual union all
select 2, 452, 2 from dual union all
select 2, 587, 3 from dual union all
select 2, 123, 4 from dual union all
select 3, 901, 1 from dual union all
select 3, 767, 2 from dual union all
select 3, 456, 3 from dual
)
-- End of simulated inputs (not part of the solution).
-- SQL query begins BELOW THIS LINE. Use your actual table and column names.
select id, usr, ver
from inputs
match_recognize (
partition by id
order by ver
measures last(usr) as usr,
last(ver) as ver
pattern ( ^ a+ )
define a as usr = first(usr)
);
ID USR VER
-- --- ---
1 123 3
2 452 2
3 901 1
EDIT:
For completeness, here is what the PATTERN should look like if a user may appear over non-consecutive rows, and the very last occurrence of that user (even if non-consecutive) for a given id must be considered:
...
pattern ( ^ a (x* a)? )
...
Here the first row in the partition is an a, and if the same user appears again for the same id there is at least one more a row; the last such row is caught by the optional part of the pattern, with the greedy match on x*.
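For reference, here is a sketch of the full query with that pattern dropped in (Oracle 12.1 or later). The sample rows are made up for this illustration: user 123 appears first, is interrupted by user 456, and then reappears. Following the explanation above, the intent is that the match runs from the first row through the last row of user 123, so the reported version is the one from that later row.

```sql
-- Sketch only: the input rows are hypothetical, chosen so that the
-- first user (123) shows up again after another user (456).
with
  inputs ( id, usr, ver ) as (
    select 1, 123, 1 from dual union all
    select 1, 456, 2 from dual union all
    select 1, 123, 3 from dual union all
    select 1, 789, 4 from dual
  )
select id, usr, ver
from   inputs
match_recognize (
  partition by id
  order by ver
  measures last(usr) as usr,
           last(ver) as ver
  pattern ( ^ a (x* a)? )
  define a as usr = first(usr)   -- x has no definition, so it matches any row
);
```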
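select id, user, mv
from (select id, user, max(version) as mv,
             -- rank the users of each id by their earliest version
             row_number() over (partition by id order by min(version)) as rnk
      from T1
      group by id, user)
where rnk = 1;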

ORA-00905: missing keyword when using Case in order by

I have the query below where, if the edit date is not null, the most recent record needs to be returned and should also be randomized; otherwise the records should be randomized. I tried the ORDER BY below, but I am getting the missing keyword error.
SELECT * FROM ( SELECT c.id,c.edit_date, c.name,l.title
FROM tableA c, tableb l
WHERE c.id = l.id
AND c.published_ind = 'Y'
AND lc.type_id != 4
AND TRIM(c.img_file) IS NOT NULL
ORDER BY DBMS_RANDOM.VALUE
)
WHERE ROWNUM = 1
order by case when c.edit_date = 'null'
then DBMS_RANDOM.VALUE
else DBMS_RANDOM.VALUE, c.edit_date desc
end
If I understand you correctly, you are trying to get one record per ID, either with the highest date (a random one if several records share that date) or with a NULL date (again a random one if several NULL records exist for the same ID).
So assuming this data
ID EDIT_DATE TEXT
---------- ------------------- ----
1 01.01.2015 00:00:00 A
1 01.01.2016 00:00:00 B
1 01.01.2016 00:00:00 C
2 01.01.2015 00:00:00 D
2 01.01.2016 00:00:00 E
2 F
2 G
You expect either B or C for ID = 1 and either F or G for ID = 2.
This query does it.
The features used are ordering with NULLS FIRST and adding a random value as the last ordering column, to get a random result if all preceding columns are equal.
with dta as (
select 1 id, to_date('01012015','ddmmyyyy') edit_date, 'A' text from dual union all
select 1 id, to_date('01012016','ddmmyyyy') edit_date, 'B' text from dual union all
select 1 id, to_date('01012016','ddmmyyyy') edit_date, 'C' text from dual union all
select 2 id, to_date('01012015','ddmmyyyy') edit_date, 'D' text from dual union all
select 2 id, to_date('01012016','ddmmyyyy') edit_date, 'E' text from dual union all
select 2 id, NULL edit_date, 'F' text from dual union all
select 2 id, NULL edit_date, 'G' text from dual),
dta2 as (
select ID, EDIT_DATE, TEXT,
row_number() over (partition by ID order by edit_date DESC NULLS first, DBMS_RANDOM.VALUE) as rn
from dta)
select *
from dta2 where rn = 1
order by id
;
ID EDIT_DATE TEXT RN
---------- ------------------- ---- ----------
1 01.01.2016 00:00:00 B 1
2 F 1
Hopefully you can reuse the idea if you need a slightly different result.
The WHERE clause is always applied before ORDER BY. So in your query, WHERE ROWNUM = 1 is applied first, and only after that is the order by case ... applied, to a single record.
You probably need another subquery that runs the ORDER BY first to get the rows in the proper order, and then applies WHERE ROWNUM = 1 to select a single row.
The clause ORDER BY ... DBMS_RANDOM.VALUE, c.edit_date also looks strange. The rows will be sorted by DBMS_RANDOM.VALUE, and only rows that happen to have an equal DBMS_RANDOM.VALUE will additionally be sorted by c.edit_date.
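A minimal sketch of that "order first, then take one row" shape in Oracle is shown below. The sample rows and the choice of NULLS LAST are assumptions made for this illustration (the question does not say how NULL dates should rank); the point is only that the ORDER BY lives in the inner query and ROWNUM = 1 is applied to the already sorted rows.

```sql
-- Sketch only: dta is hypothetical sample data, and NULLS LAST is an assumption.
with dta as (
  select 1 id, to_date('01012015','ddmmyyyy') edit_date, 'A' text from dual union all
  select 1 id, to_date('01012016','ddmmyyyy') edit_date, 'B' text from dual union all
  select 1 id, to_date('01012016','ddmmyyyy') edit_date, 'C' text from dual union all
  select 2 id, cast(null as date) edit_date, 'F' text from dual
)
select id, edit_date, text
from (
  select id, edit_date, text
  from   dta
  order by edit_date desc nulls last,  -- most recent date first, NULL dates at the end
           dbms_random.value           -- random tie-breaker among equal dates
)
where rownum = 1;
-- returns either B or C, chosen at random, because both share the latest date
```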

SQL Grouping by Ranges

I have a data set that has timestamped entries over various sets of groups.
Timestamp -- Group -- Value
---------------------------
1 -- A -- 10
2 -- A -- 20
3 -- B -- 15
4 -- B -- 25
5 -- C -- 5
6 -- A -- 5
7 -- A -- 10
I want to sum these values by the Group field, but parsed as it appears in the data. For example, the above data would result in the following output:
Group -- Sum
A -- 30
B -- 40
C -- 5
A -- 15
I do not want this, which is all I've been able to come up with on my own so far:
Group -- Sum
A -- 45
B -- 40
C -- 5
Using Oracle 11g, this is what I've cobbled together so far. I know that this is wrong, but I'm hoping I'm at least on the right track with RANK(). In the real data, entries with the same group could be 2 timestamps apart, or 100; there could be one entry in a group, or 100 consecutive. It does not matter, I need them separated.
WITH SUB_Q AS
(SELECT K_ID
, GRP
, VAL
-- GET THE RANK FROM TIMESTAMP TO SEPARATE GROUPS WITH SAME NAME
, RANK() OVER(PARTITION BY K_ID ORDER BY TMSTAMP) AS RNK
FROM MY_TABLE
WHERE K_ID = 123)
SELECT T1.K_ID
, T1.GRP
, SUM(CASE
WHEN T1.GRP = T2.GRP THEN
T1.VAL
ELSE
0
END) AS TOTAL_VALUE
FROM SUB_Q T1 -- MAIN VALUE
INNER JOIN SUB_Q T2 -- TIMSTAMP AFTER
ON T1.K_ID = T2.K_ID
AND T1.RNK = T2.RNK - 1
GROUP BY T1.K_ID
, T1.GRP
Is it possible to group in this way? How would I go about doing this?
I approach this problem by defining a group which is the difference of two row_number() values:
-- "Group" and "Timestamp" are quoted because they collide with Oracle keywords
select "Group", sum(value)
from (select t.*,
             (row_number() over (order by "Timestamp") -
              row_number() over (partition by "Group" order by "Timestamp")
             ) as grp
      from my_table t
     ) t
group by "Group", grp
order by min("Timestamp");
The difference of two row numbers is constant for adjacent values.
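To make that concrete, here is a small sketch for Oracle that lists both row numbers and their difference for the sample data. The column names ts, grp_name, val, rn_all and rn_in_group are made up for this illustration (they sidestep the reserved words in the original column names).

```sql
-- Illustration only: column names are hypothetical stand-ins for Timestamp, Group, Value.
with t ( ts, grp_name, val ) as (
  select 1, 'A', 10 from dual union all
  select 2, 'A', 20 from dual union all
  select 3, 'B', 15 from dual union all
  select 4, 'B', 25 from dual union all
  select 5, 'C', 5  from dual union all
  select 6, 'A', 5  from dual union all
  select 7, 'A', 10 from dual
)
select ts, grp_name, val,
       row_number() over (order by ts)                       as rn_all,
       row_number() over (partition by grp_name order by ts) as rn_in_group,
       row_number() over (order by ts)
         - row_number() over (partition by grp_name order by ts) as grp
from   t
order by ts;
-- ts  grp_name  rn_all  rn_in_group  grp
-- 1   A         1       1            0
-- 2   A         2       2            0
-- 3   B         3       1            2
-- 4   B         4       2            2
-- 5   C         5       1            4
-- 6   A         6       3            3
-- 7   A         7       4            3
```

The two separate runs of A get different grp values (0 and 3), so grouping by the pair (grp_name, grp) keeps them apart.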
A solution using LAG and windowed analytic functions:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE TEST ( "Timestamp", "Group", Value ) AS
SELECT 1, 'A', 10 FROM DUAL
UNION ALL SELECT 2, 'A', 20 FROM DUAL
UNION ALL SELECT 3, 'B', 15 FROM DUAL
UNION ALL SELECT 4, 'B', 25 FROM DUAL
UNION ALL SELECT 5, 'C', 5 FROM DUAL
UNION ALL SELECT 6, 'A', 5 FROM DUAL
UNION ALL SELECT 7, 'A', 10 FROM DUAL;
Query 1:
WITH changes AS (
SELECT t.*,
CASE WHEN LAG( "Group" ) OVER ( ORDER BY "Timestamp" ) = "Group" THEN 0 ELSE 1 END AS hasChangedGroup
FROM TEST t
),
groups AS (
SELECT "Group",
VALUE,
SUM( hasChangedGroup ) OVER ( ORDER BY "Timestamp" ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS grp
FROM changes
)
SELECT "Group",
SUM( VALUE )
FROM Groups
GROUP BY "Group", grp
ORDER BY grp
Results:
| Group | SUM(VALUE) |
|-------|------------|
| A | 30 |
| B | 40 |
| C | 5 |
| A | 15 |
This is a typical "start_of_group" problem (see here: https://timurakhmadeev.wordpress.com/2013/07/21/start_of_group/)
In your case, it would be as follows:
with t as (
select 1 timestamp, 'A' grp, 10 value from dual union all
select 2, 'A', 20 from dual union all
select 3, 'B', 15 from dual union all
select 4, 'B', 25 from dual union all
select 5, 'C', 5 from dual union all
select 6, 'A', 5 from dual union all
select 7, 'A', 10 from dual
)
select min(timestamp), grp, sum(value) sum_value
from (
select t.*
, sum(start_of_group) over (order by timestamp) grp_id
from (
select t.*
, case when grp = lag(grp) over (order by timestamp) then 0 else 1 end
start_of_group
from t
) t
)
group by grp_id, grp
order by min(timestamp)
;
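As a sketch of what the two inner steps produce (an illustration for Oracle, with the shortened column names ts and val being assumptions of this example), the query below stops before the final aggregation and shows the start_of_group flag and the running sum that becomes grp_id.

```sql
-- Illustration only: exposes the intermediate columns of the start_of_group technique.
with t ( ts, grp, val ) as (
  select 1, 'A', 10 from dual union all
  select 2, 'A', 20 from dual union all
  select 3, 'B', 15 from dual union all
  select 4, 'B', 25 from dual union all
  select 5, 'C', 5  from dual union all
  select 6, 'A', 5  from dual union all
  select 7, 'A', 10 from dual
)
select ts, grp, val, start_of_group,
       sum(start_of_group) over (order by ts) as grp_id   -- running count of group starts
from (
  select t.*,
         -- 1 when the group name differs from the previous row (or there is no previous row)
         case when grp = lag(grp) over (order by ts) then 0 else 1 end as start_of_group
  from t
)
order by ts;
-- ts              1  2  3  4  5  6  7
-- start_of_group  1  0  1  0  1  1  0
-- grp_id          1  1  2  2  3  4  4
```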