I need to extract data based on a certain version of a given record. I want to extract the max version based on the final save of the first user for an ID. Is this possible?
--In my mock up I have version numbers as 1,2,3 but the numbers are actually randomly assigned in my database.
I am trying to use:
select id, max(version) over (partition by id)
from t1
here is my data:
T1
ID User Version
1 123 1
1 123 2
1 123 3
1 456 4
1 456 5
1 789 6
2 452 1
2 452 2
2 587 3
2 123 4
3 901 1
3 767 2
3 456 3
here is what I am trying to extract:
T1
ID User MaxVersion
1 123 3
2 452 2
3 901 1
I think you want:
select t1.*
from (select t1.*,
row_number() over (partition by id, user order by version desc) as seqnum,
max(user) keep (dense_rank first order by version) over (partition by id) as first_user
from t1
) t1
where seqnum = 1 and user = first_user;
You need to look for the user and the last record separately.
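To try the two-step idea (find the first user per id, then that user's last version) outside Oracle, here is a minimal sketch using Python's built-in sqlite3 module. It makes a few assumptions:

- SQLite 3.25 or later, for window functions.
- FIRST_VALUE in place of Oracle's KEEP (DENSE_RANK FIRST ...), because SQLite has no KEEP clause.
- The user column renamed to usr, because USER is a reserved word in Oracle.

Like the query above, it takes the maximum over all of the first user's rows, not only the first consecutive run.

```python
import sqlite3

# Sample data from the question as (id, usr, version); "user" is renamed to "usr".
ROWS = [
    (1, 123, 1), (1, 123, 2), (1, 123, 3), (1, 456, 4), (1, 456, 5), (1, 789, 6),
    (2, 452, 1), (2, 452, 2), (2, 587, 3), (2, 123, 4),
    (3, 901, 1), (3, 767, 2), (3, 456, 3),
]

QUERY = """
select id, usr, max(version) as max_version
from (select id, usr, version,
             -- the user on the lowest version of each id is the "first user"
             first_value(usr) over (partition by id order by version) as first_user
      from t1) t
where usr = first_user   -- keep only the first user's rows ...
group by id, usr         -- ... and take that user's highest version
order by id
"""


def first_user_max_version(rows):
    """Return (id, first user, highest version of that user) for each id."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t1 (id integer, usr integer, version integer)")
    con.executemany("insert into t1 values (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(first_user_max_version(ROWS))  # [(1, 123, 3), (2, 452, 2), (3, 901, 1)]
```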
EDIT:
If you need the "first" final version, I would go with:
select id, user, max(version) as max_version
from (select t1.*,
min(case when user <> first_user then version end) over (partition by id) as other_user_first_version
from (select t1.*,
max(user) keep (dense_rank first order by version) over (partition by id) as first_user
from t1
) t1
) t1
where user = first_user
and (version < other_user_first_version or other_user_first_version is null)
group by id, user;
Or, you can do this with correlated subqueries:
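select t1.id, t1.user, max(t1.version) as max_version
from t1
where t1.user = (select min(tt1.user) keep (dense_rank first order by tt1.version)
from t1 tt1
where tt1.id = t1.id
) and
t1.version < coalesce((select min(tt1.version)
from t1 tt1
where tt1.id = t1.id and tt1.user <> t1.user
), t1.version + 1)
group by t1.id, t1.user;
This is the "old-fashioned" approach (pre-analytic functions), but it captures the idea exactly. The first subquery makes sure the user is the first user for the id. The second makes sure the version comes before the first version saved by any other user, i.e. it belongs to the first run of records for that user (the coalesce keeps every row when no other user exists). MAX then returns the last of those versions.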
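The correlated-subquery version can be checked the same way. This is a minimal sketch with Python's sqlite3 module, under these assumptions:

- SQLite has no KEEP clause, so the first user is found with ORDER BY ... LIMIT 1 inside the subquery. This is a swapped-in technique.
- The table and column names (t1, usr, version) are assumptions.
- The extra id 4 is hypothetical test data in which the first user saves again after another user. Only that user's first run should count.

```python
import sqlite3

QUERY = """
select t1.id, t1.usr, max(t1.version) as max_version
from t1
where t1.usr = (select tt1.usr              -- the first user for this id
                from t1 tt1
                where tt1.id = t1.id
                order by tt1.version
                limit 1)
  and t1.version < coalesce(
        (select min(tt1.version)             -- first save by any other user
         from t1 tt1
         where tt1.id = t1.id and tt1.usr <> t1.usr),
        t1.version + 1)                      -- no other user: keep every row
group by t1.id, t1.usr
order by t1.id
"""


def first_run_max_version(rows):
    """rows: (id, usr, version) tuples; returns the last version of each id's first run."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t1 (id integer, usr integer, version integer)")
    con.executemany("insert into t1 values (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


rows = [
    (1, 123, 1), (1, 123, 2), (1, 123, 3), (1, 456, 4), (1, 456, 5), (1, 789, 6),
    (2, 452, 1), (2, 452, 2), (2, 587, 3), (2, 123, 4),
    (3, 901, 1), (3, 767, 2), (3, 456, 3),
    (4, 111, 10), (4, 111, 20), (4, 222, 30), (4, 111, 40),  # hypothetical: user 111 returns
]
print(first_run_max_version(rows))  # [(1, 123, 3), (2, 452, 2), (3, 901, 1), (4, 111, 20)]
```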
In Oracle 12.1 and above, match_recognize can make quick work of such requirements. (One benefit, compared to analytic-function solutions, is that max(version) is calculated for just one user per ID, without requiring a subquery to achieve this efficiency.)
The match_recognize clause partitions by id and, within each id, orders by version (ascending). A "match" starts at the beginning of the partition only (^ in the pattern clause) and consists only of consecutive rows that have the same user as the first row in that partition. All other rows for that id are ignored. Then the last version value is collected for the output.
NOTE: This assumes that, if for a given ID, the first user changes to a second, a third etc. but then reverts to the first user, the highest version number from the FIRST set of rows for that user is required. If instead the highest version number from ALL rows for that user is required, the query can be changed accordingly (specifically the PATTERN clause needs a change).
with
inputs ( id, usr, ver ) as (
select 1, 123, 1 from dual union all
select 1, 123, 2 from dual union all
select 1, 123, 3 from dual union all
select 1, 456, 4 from dual union all
select 1, 456, 5 from dual union all
select 1, 789, 6 from dual union all
select 2, 452, 1 from dual union all
select 2, 452, 2 from dual union all
select 2, 587, 3 from dual union all
select 2, 123, 4 from dual union all
select 3, 901, 1 from dual union all
select 3, 767, 2 from dual union all
select 3, 456, 3 from dual
)
-- End of simulated inputs (not part of the solution).
-- SQL query begins BELOW THIS LINE. Use your actual table and column names.
select id, usr, ver
from inputs
match_recognize (
partition by id
order by ver
measures last(usr) as usr,
last(ver) as ver
pattern ( ^ a+ )
define a as usr = first(usr)
);
ID USR VER
-- --- ---
1 123 3
2 452 2
3 901 1
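For databases without match_recognize, the same "first run of the first user" result can be obtained with the gaps-and-islands technique:

1. Flag each row whose user differs from the previous row's user.
2. Take a running sum of the flags.
3. Keep run number 1.

This is a minimal sketch with Python's sqlite3 module, assuming SQLite 3.25+. It reuses the names id, usr and ver from the query above. The table name inputs and the extra id 4 (where the first user returns after someone else) are hypothetical.

```python
import sqlite3

QUERY = """
select id, usr, max(ver) as ver
from (select id, usr, ver,
             -- running count of user changes: 1 for the first run of each id
             sum(changed) over (partition by id order by ver
                                rows between unbounded preceding and current row) as run_no
      from (select id, usr, ver,
                   -- the first row has no previous row, so lag() is null and it counts as a change
                   case when usr = lag(usr) over (partition by id order by ver)
                        then 0 else 1 end as changed
            from inputs) flagged
     ) numbered
where run_no = 1
group by id, usr
order by id
"""


def first_run(rows):
    """rows: (id, usr, ver) tuples; returns (id, usr, last ver of the first run)."""
    con = sqlite3.connect(":memory:")
    con.execute("create table inputs (id integer, usr integer, ver integer)")
    con.executemany("insert into inputs values (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


rows = [
    (1, 123, 1), (1, 123, 2), (1, 123, 3), (1, 456, 4), (1, 456, 5), (1, 789, 6),
    (2, 452, 1), (2, 452, 2), (2, 587, 3), (2, 123, 4),
    (3, 901, 1), (3, 767, 2), (3, 456, 3),
    (4, 111, 10), (4, 111, 20), (4, 222, 30), (4, 111, 40),  # hypothetical revert
]
print(first_run(rows))  # [(1, 123, 3), (2, 452, 2), (3, 901, 1), (4, 111, 20)]
```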
EDIT:
For completeness, here is what the PATTERN should look like if a user may appear over non-consecutive rows, and the very last occurrence of that user (even if non-consecutive) for a given id must be considered:
...
pattern ( ^ a (x* a)? )
...
Here the first row in the partition is an a row. If the same user appears again later for that id, the optional part (x* a)? captures the last such row. Because x is not defined in the DEFINE clause, it matches any row. The greedy x* consumes as many rows as possible, then backtracks until it reaches the final a row.
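An ordinary regular expression can illustrate the difference between the two PATTERN clauses, because match_recognize patterns follow similar greedy, backtracking rules. This is only an analogy written in Python, not how Oracle evaluates the clause:

- Each row of one id (sorted by version) becomes the letter a if its user equals the first row's user, and x otherwise.
- Because x is not defined in DEFINE, it matches any row. The sketch models it with the regex wildcard '.'.

```python
import re


def last_version_matched(rows, pattern):
    """rows: (usr, ver) pairs for one id, sorted by ver.

    Returns the ver of the last row covered by a match of `pattern`,
    anchored at the first row like ^ in the PATTERN clause.
    """
    first_usr = rows[0][0]
    # One letter per row, mirroring DEFINE a AS usr = FIRST(usr).
    labels = "".join("a" if usr == first_usr else "x" for usr, _ in rows)
    m = re.match(pattern, labels)  # re.match only matches at the start of the string
    return rows[m.end() - 1][1]


# Hypothetical id: user 111 saves, 222 saves, 111 saves again, then 333.
rows = [(111, 1), (111, 2), (222, 3), (111, 4), (333, 5)]

print(last_version_matched(rows, r"a+"))         # like ^ a+           -> 2
print(last_version_matched(rows, r"a(?:.*a)?"))  # like ^ a (x* a)?    -> 4
```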
select id, user, mv
from (select id, user, max(version) as mv, row_number() over (partition by id order by min(version)) as rnk
from t1 group by id, user)
where rnk = 1;
I have this table:
id   RANK
111  1
111  2
111  3
222  1
222  2
I want to add two columns that will show if this is the first/last row for each id:
id   first  last
111  YES    NO
111  NO     NO
111  NO     YES
222  YES    NO
222  NO     YES
Let's first point out that sorting without a column to sort by is not a good idea.
Usually, an id is unique and incremented, so ordering by id alone is already sufficient.
If this is not the case, there should at least be another column with a meaningful value (for example an incrementing number or a datetime) that can be used to sort the result.
So you should fix your table design if possible: add such a column, or make your existing id column unique.
If this is not possible and you really have to rely on an arbitrary row number, you could do the following:
SELECT id,
CASE WHEN rn = 1 THEN 'YES' ELSE 'NO' END AS first,
CASE WHEN rn = COUNT(*) OVER (PARTITION BY id)
THEN 'YES' ELSE 'NO' END AS last
FROM
(
SELECT
id,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) rn
FROM yourtable
);
If you have a column to sort by (let's name it "rank"), this will be much safer:
SELECT id,
CASE WHEN rn1 = 1 THEN 'YES' ELSE 'NO' END AS first,
CASE WHEN rn2 = 1 THEN 'YES' ELSE 'NO' END AS last
FROM
(
SELECT
id,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY rank) rn1,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY rank DESC) rn2
FROM yourtable
);
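Here is a minimal sketch of the two-ROW_NUMBER approach, runnable with Python's sqlite3 module (SQLite 3.25+ assumed). The output columns are named is_first and is_last because FIRST and LAST are keywords in SQLite. The table name yourtable is the placeholder from the answer.

```python
import sqlite3

QUERY = """
select id, rank,
       case when rn1 = 1 then 'YES' else 'NO' end as is_first,
       case when rn2 = 1 then 'YES' else 'NO' end as is_last
from (select id, rank,
             row_number() over (partition by id order by rank) as rn1,      -- 1 = lowest rank
             row_number() over (partition by id order by rank desc) as rn2  -- 1 = highest rank
      from yourtable) numbered
order by id, rank
"""


def first_last_flags(rows):
    """rows: (id, rank) tuples; returns (id, rank, is_first, is_last) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute("create table yourtable (id integer, rank integer)")
    con.executemany("insert into yourtable values (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in first_last_flags([(111, 1), (111, 2), (111, 3), (222, 1), (222, 2)]):
    print(row)
```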
Here's one option:
Sample data:
with
test (id, rank) as
(select 111, 1 from dual union all
select 111, 2 from dual union all
select 111, 3 from dual union all
select 222, 1 from dual union all
select 222, 2 from dual
),
-- Query begins here:
temp as
(select id,
rank,
first_value(rank) over (partition by id order by rank) rnk_min,
last_value(rank) over (partition by id order by rank
rows between unbounded preceding and unbounded following) rnk_max
from test
)
select id,
case when rank = rnk_min then 'Yes' else 'No' end first,
case when rank = rnk_max then 'Yes' else 'No' end last
from temp
order by id, rank;
ID FIRST LAST
---------- ------- -------
111 Yes No
111 No No
111 No Yes
222 Yes No
222 No Yes
If you don't have rows with the same rank per id, you may use the lag/lead functions to flag the first and last rows through their default argument, which is returned when the offset row falls outside the partition.
with sample_tab (id, rank) as (
select 111, 1 from dual union all
select 111, 2 from dual union all
select 111, 3 from dual union all
select 222, 1 from dual union all
select 222, 2 from dual
)
select
id
, lag('No', 1, 'Yes') over(partition by id order by rank asc) as first
, lead('No', 1, 'Yes') over(partition by id order by rank asc) as last
from sample_tab
ID   FIRST  LAST
111  Yes    No
111  No     No
111  No     Yes
222  Yes    No
222  No     Yes
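The default-argument trick can be tried with Python's sqlite3 module, because SQLite 3.25+ also implements LAG and LEAD as (expr, offset, default). This is a minimal sketch. The is_first/is_last names are illustrative. The second call uses hypothetical tied ranks to show why this technique needs unique ranks per id.

```python
import sqlite3

QUERY = """
select id, rank,
       lag('No', 1, 'Yes') over (partition by id order by rank) as is_first,  -- no previous row: 'Yes'
       lead('No', 1, 'Yes') over (partition by id order by rank) as is_last   -- no next row: 'Yes'
from sample_tab
order by id, rank
"""


def lag_lead_flags(rows):
    """rows: (id, rank) tuples; returns (id, rank, is_first, is_last) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute("create table sample_tab (id integer, rank integer)")
    con.executemany("insert into sample_tab values (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(lag_lead_flags([(111, 1), (111, 2), (111, 3), (222, 1), (222, 2)]))

# With two rows of rank 2 for id 111, only one of them is flagged as last:
tied = lag_lead_flags([(111, 1), (111, 2), (111, 2)])
print([flags for flags in tied if flags[1] == 2])
```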
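If the data may have the same rank for multiple rows per id, you can apply the same idea (handling the case where the window goes beyond the partition boundary) with coalesce: when the window contains no rows, max('No') returns NULL and coalesce turns it into 'Yes'. This assumes the rank values are consecutive integers within each id. RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING only looks at rows whose rank is exactly one less than the current row's rank, and 1 FOLLOWING only at rows whose rank is exactly one more.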
with sample_tab (id, rank) as (
select 111, 1 from dual union all
select 111, 2 from dual union all
select 111, 2 from dual union all
select 222, 1 from dual union all
select 222, 2 from dual
)
select
id
, coalesce(max('No') over(
partition by id order by rank asc
/*RANGE for logical offset,
setting the same flag for a group of first/last rows*/
range between 1 preceding and 1 preceding
), 'Yes') as first
, coalesce(max('No') over(
partition by id order by rank asc
range between 1 following and 1 following
), 'Yes') as last
from sample_tab
ID   FIRST  LAST
111  Yes    No
111  No     Yes
111  No     Yes
222  Yes    No
222  No     Yes
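If the ranks may have gaps as well as ties, one alternative is to compare each row's rank with the minimum and maximum rank of its id. This is a swapped-in technique, not part of the answer above. Here is a minimal sketch with Python's sqlite3 module. Id 333, with ranks 1 and 5, is hypothetical data added to show the gap case: the RANGE 1 PRECEDING/FOLLOWING version would flag both of its rows as first and as last.

```python
import sqlite3

QUERY = """
select id, rank,
       case when rank = min(rank) over (partition by id) then 'Yes' else 'No' end as is_first,
       case when rank = max(rank) over (partition by id) then 'Yes' else 'No' end as is_last
from sample_tab
order by id, rank
"""


def min_max_flags(rows):
    """rows: (id, rank) tuples; ties share the same flags, gaps in rank are harmless."""
    con = sqlite3.connect(":memory:")
    con.execute("create table sample_tab (id integer, rank integer)")
    con.executemany("insert into sample_tab values (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


rows = [(111, 1), (111, 2), (111, 2), (222, 1), (222, 2), (333, 1), (333, 5)]
for row in min_max_flags(rows):
    print(row)
```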
Here is the table, in which each ORGID/USERID combination is unique:
ORGID USERID
1 1
1 2
1 3
1 4
2 1
2 5
2 6
2 7
3 9
3 10
3 11
I need to select all records (organizations and users) for every organization in which USERID 1 is present. USERID 1 is present in ORGID 1 and 2, so the query should select all users of those organizations, including user 1 itself, i.e.:
ORGID USERID
1 1
1 2
1 3
1 4
2 1
2 5
2 6
2 7
Is it possible to do it with one SQL query rather than SELECT *.. WHERE USERID IN (SELECT...
You could use exists:
select *
from mytable t
where exists (select 1 from mytable t1 where t1.orgid = t.orgid and t1.userid = 1)
Another option is window functions. In Postgres:
select *
from (
select t.*,
bool_or(userid = 1) over(partition by orgid) has_user_1
from mytable t
) t
where has_user_1
Or a more generic approach that uses portable expressions:
select *
from (
select t.*,
max(case when userid = 1 then 1 else 0 end) over(partition by orgid) has_user_1
from mytable t
) t
where has_user_1 = 1
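The EXISTS query and the portable MAX(CASE ...) OVER version can be checked side by side with Python's sqlite3 module (SQLite 3.25+ assumed for the window function). This is a minimal sketch using the question's sample data. The name mytable is the placeholder from the answer.

```python
import sqlite3

EXISTS_QUERY = """
select orgid, userid
from mytable t
where exists (select 1 from mytable t1 where t1.orgid = t.orgid and t1.userid = 1)
order by orgid, userid
"""

WINDOW_QUERY = """
select orgid, userid
from (select t.*,
             -- 1 on every row of an org that contains user 1, else 0
             max(case when userid = 1 then 1 else 0 end) over (partition by orgid) as has_user_1
      from mytable t) flagged
where has_user_1 = 1
order by orgid, userid
"""

ROWS = [(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 5), (2, 6), (2, 7), (3, 9), (3, 10), (3, 11)]


def run(query, rows=ROWS):
    """Load rows into an in-memory mytable and return the query's result."""
    con = sqlite3.connect(":memory:")
    con.execute("create table mytable (orgid integer, userid integer)")
    con.executemany("insert into mytable values (?, ?)", rows)
    result = con.execute(query).fetchall()
    con.close()
    return result


print(run(EXISTS_QUERY) == run(WINDOW_QUERY))  # True
print(run(EXISTS_QUERY))
```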
Yes, you can do it with a single select statement - no in or exists conditions, no analytic or aggregate functions in a subquery, etc. Why you want to do it that way is not clear; in any case, it is possible that the solution below is also more efficient than the alternatives. You will have to test on your real-life data to see if that is true.
The solution below has two potential disadvantages: it only works in Oracle (it uses a proprietary extension of SQL, the match_recognize clause); and it only works in Oracle 12.1 or higher.
with
my_table(orgid, userid) as (
select 1, 1 from dual union all
select 1, 2 from dual union all
select 1, 3 from dual union all
select 1, 4 from dual union all
select 2, 1 from dual union all
select 2, 5 from dual union all
select 2, 6 from dual union all
select 2, 7 from dual union all
select 3, 9 from dual union all
select 3, 10 from dual union all
select 3, 11 from dual
)
-- End of SIMULATED data (for testing), not part of the solution.
-- In real life you don't need the WITH clause; reference your actual table.
select *
from my_table
match_recognize(
partition by orgid
all rows per match
pattern (x* a x*)
define a as userid = 1
);
Output:
ORGID USERID
---------- ----------
1 1
1 2
1 3
1 4
2 1
2 5
2 7
2 6
You can use exists:
select ou.*
from orguser ou
where exists (select 1
from orguser ou2
where ou2.orgid = ou.orgid and ou2.userid = 1
);
Apart from EXISTS and window functions, you can use IN as follows:
select *
from your_table t
where t.orgid in (select t1.orgid from your_table t1 where t1.userid = 1)
Suppose I have an SQL (Oracle Toad) table named "test", which has the following fields and entries (dates are in dd/mm/yyyy format):
id ref_date value
---------------------
1 01/01/2014 20
1 01/02/2014 25
1 01/06/2014 3
1 01/09/2014 6
2 01/04/2015 7
2 01/08/2015 43
2 01/09/2015 85
2 01/12/2015 4
I know from how the table has been created that, since there are value entries for id = 1 for February 2014 and June 2014, the values for March through May 2014 must be 0. The same applies to July and August 2014 for id = 1, and for May through July 2015 and October through November 2015 for id = 2.
Now, if I want to calculate, say, the median of the value column for a given id, I will not arrive at the correct result using the table as it stands - as I'm missing 5 zero entries for each id.
I would therefore like to create/use the following (potentially just temporary table)...
id ref_date value
---------------------
1 01/01/2014 20
1 01/02/2014 25
1 01/03/2014 0
1 01/04/2014 0
1 01/05/2014 0
1 01/06/2014 3
1 01/07/2014 0
1 01/08/2014 0
1 01/09/2014 6
2 01/04/2015 7
2 01/05/2015 0
2 01/06/2015 0
2 01/07/2015 0
2 01/08/2015 43
2 01/09/2015 85
2 01/10/2015 0
2 01/11/2015 0
2 01/12/2015 4
...on which I could then compute the median by id:
select id, median(value) as med_value from test group by id
How do I do this? Or would there be an alternative way?
Many thanks,
Mr Clueless
In this solution, I build a table with all the "needed dates" and value of 0 for all of them. Then, instead of a join, I do a union all, group by id and ref_date and ADD the values in each group. If the date had a row with a value in the original table, then that's the resulting value; and if it didn't, the value will be 0. This avoids a join. In almost all cases a union all + aggregate will be faster (sometimes much faster) than a join.
I added more input data for more thorough testing. In your original question, you have two ids, and for both of them you have four positive values. You are missing five values in each case, so there will be five zeros (0), which means the median is 0 in both cases. For id=3 (which I added) I have three positive values and three zeros; the median is half of the smallest positive number. For id=4 I have just one value, which then should be the median as well.
The solution includes, in particular, an answer to your specific question: how to create the temporary table (which most likely doesn't need to be a temporary table at all, but an inline view). With factored subqueries (in the WITH clause), the optimizer decides whether to treat them as temporary tables or inline views; you can see what the optimizer decided by looking at the Explain Plan.
with
inputs ( id, ref_date, value ) as (
select 1, to_date('01/01/2014', 'dd/mm/yyyy'), 20 from dual union all
select 1, to_date('01/02/2014', 'dd/mm/yyyy'), 25 from dual union all
select 1, to_date('01/06/2014', 'dd/mm/yyyy'), 3 from dual union all
select 1, to_date('01/09/2014', 'dd/mm/yyyy'), 6 from dual union all
select 2, to_date('01/04/2015', 'dd/mm/yyyy'), 7 from dual union all
select 2, to_date('01/08/2015', 'dd/mm/yyyy'), 43 from dual union all
select 2, to_date('01/09/2015', 'dd/mm/yyyy'), 85 from dual union all
select 2, to_date('01/12/2015', 'dd/mm/yyyy'), 4 from dual union all
select 3, to_date('01/01/2016', 'dd/mm/yyyy'), 12 from dual union all
select 3, to_date('01/03/2016', 'dd/mm/yyyy'), 23 from dual union all
select 3, to_date('01/06/2016', 'dd/mm/yyyy'), 2 from dual union all
select 4, to_date('01/11/2014', 'dd/mm/yyyy'), 9 from dual
),
-- the "inputs" table constructed above is for testing only,
-- it is not part of the solution.
ranges ( id, min_date, max_date ) as (
select id, min(ref_date), max(ref_date)
from inputs
group by id
),
prep ( id, ref_date, value ) as (
select id, add_months(min_date, level - 1), 0
from ranges
connect by level <= 1 + months_between( max_date, min_date )
and prior id = id
and prior sys_guid() is not null
),
v ( id, ref_date, value ) as (
select id, ref_date, sum(value)
from ( select id, ref_date, value from prep union all
select id, ref_date, value from inputs
)
group by id, ref_date
)
select id, median(value) as median_value
from v
group by id
order by id -- ORDER BY is optional
;
ID MEDIAN_VALUE
-- ------------
1 0
2 0
3 1
4 9
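The same idea can be sketched outside Oracle: generate every month, add zeros with UNION ALL, then aggregate. This minimal version uses Python's sqlite3 module and makes these assumptions:

- Dates are stored as ISO text ('YYYY-MM-01').
- A recursive CTE replaces CONNECT BY.
- SQLite has no MEDIAN aggregate, so the median is computed in Python with statistics.median.

```python
import sqlite3
import statistics
from collections import defaultdict

INPUTS = [
    (1, "2014-01-01", 20), (1, "2014-02-01", 25), (1, "2014-06-01", 3), (1, "2014-09-01", 6),
    (2, "2015-04-01", 7), (2, "2015-08-01", 43), (2, "2015-09-01", 85), (2, "2015-12-01", 4),
    (3, "2016-01-01", 12), (3, "2016-03-01", 23), (3, "2016-06-01", 2),
    (4, "2014-11-01", 9),
]

QUERY = """
with recursive
prep (id, ref_date, max_date) as (
    -- one row per id for its first month ...
    select id, min(ref_date), max(ref_date) from inputs group by id
    union all
    -- ... then one more row per month until the last month of that id
    select id, date(ref_date, '+1 month'), max_date from prep where ref_date < max_date
)
select id, ref_date, sum(value) as value
from (select id, ref_date, 0 as value from prep   -- a zero for every month
      union all
      select id, ref_date, value from inputs)     -- plus the real values
group by id, ref_date
order by id, ref_date
"""


def median_by_id(inputs):
    """Return {id: median of the zero-filled monthly values}."""
    con = sqlite3.connect(":memory:")
    con.execute("create table inputs (id integer, ref_date text, value integer)")
    con.executemany("insert into inputs values (?, ?, ?)", inputs)
    values = defaultdict(list)
    for id_, _, value in con.execute(QUERY):
        values[id_].append(value)
    con.close()
    return {id_: statistics.median(vals) for id_, vals in values.items()}


print(median_by_id(INPUTS))  # {1: 0, 2: 0, 3: 1.0, 4: 9}
```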
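If ref_date is a DATE (with no time component, and with every ref_date falling on the same day of the month, as in the sample data), you can generate the missing months and use a left join: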
with int1 as (select id
, max(ref_date) as max_date
, min(ref_date) as min_date from test group by id )
, s(n) as (select level -1 from dual connect by level <= 1 + (select max(months_between(max_date, min_date)) from int1 ) )
select i.id
, add_months(i.min_date,s.n) as ref_date
, nvl(value,0) as value
from int1 i
join s on add_months(i.min_date,s.n) <= i.max_date
LEFT join test t on t.id = i.id and add_months(i.min_date,s.n) = t.ref_date
And with median
with int1 as (select id
, max(ref_date) as max_date
, min(ref_date) as min_date from test group by id )
, s(n) as (select level -1 from dual connect by level <= 1 + (select max(months_between(max_date, min_date)) from int1 ) )
select i.id
, MEDIAN(nvl(value,0)) as value
from int1 i
join s on add_months(i.min_date,s.n) <= i.max_date
LEFT join test t on t.id = i.id and add_months(i.min_date,s.n) = t.ref_date
group by i.id
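The key detail in the row generator is the range of month offsets. They must run from 0 up to and including the number of months between the first and last date; stopping one short drops each id's last month. Here is a minimal sketch of the same generate-and-left-join idea with Python's sqlite3 module, under these assumptions:

- Dates are ISO text on the first of the month.
- A recursive CTE replaces CONNECT BY.
- SQLite has no MONTHS_BETWEEN, so the month difference is computed from the year and month parts.

```python
import sqlite3

QUERY = """
with recursive
int1 as (
    select id,
           min(ref_date) as min_date,
           -- whole months between the first and last date of the id
           (cast(strftime('%Y', max(ref_date)) as integer) - cast(strftime('%Y', min(ref_date)) as integer)) * 12
           + cast(strftime('%m', max(ref_date)) as integer) - cast(strftime('%m', min(ref_date)) as integer)
             as months
    from test
    group by id
),
s (n) as (  -- offsets 0 .. max(months), inclusive
    select 0
    union all
    select n + 1 from s where n < (select max(months) from int1)
)
select i.id,
       date(i.min_date, '+' || s.n || ' months') as ref_date,
       coalesce(t.value, 0) as value
from int1 i
join s on s.n <= i.months
left join test t on t.id = i.id and t.ref_date = date(i.min_date, '+' || s.n || ' months')
order by i.id, s.n
"""


def dense_months(rows):
    """rows: (id, 'YYYY-MM-01', value); returns one row per month with zeros filled in."""
    con = sqlite3.connect(":memory:")
    con.execute("create table test (id integer, ref_date text, value integer)")
    con.executemany("insert into test values (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


rows = [
    (1, "2014-01-01", 20), (1, "2014-02-01", 25), (1, "2014-06-01", 3), (1, "2014-09-01", 6),
    (2, "2015-04-01", 7), (2, "2015-08-01", 43), (2, "2015-09-01", 85), (2, "2015-12-01", 4),
]
for row in dense_months(rows):
    print(row)
```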
select distinct account_num from account order by account_num;
The above query gave the below result
account_num
1
2
4
7
12
18
24
37
45
59
I want to split the account_num column into tuple of three account_num's like (1,2,4);(7,12,18);(24,37,45),(59); The last tuple has only one entry as there are no more account_num's left. Now I want a query to output the min and max of each tuple. (please observe that the max of one tuple is less than the min of the next tuple). Output desired is shown below
1 4
7 18
24 45
59 59
Edit: I have explained my requirement in the best way I could
You can use the example below as a starting point; it is based only on the information you have provided so far. For further details, consult Oracle's documentation on analytic functions:
with src as( --create a source data
select 1 col from dual union
select 2 from dual union
select 4 from dual union
select 7 from dual union
select 12 from dual union
select 18 from dual union
select 24 from dual union
select 37 from dual union
select 45 from dual union
select 59 from dual
)
select
col,
decode(col_2, 0, max_col, col_2) col_2 -- for the last tuple, when there is no row two ahead, use the overall maximum
from (
select
col,
lead(col, 2, 0) over (order by col) col_2, -- we get the value from two rows ahead (0 if there is none)
max(col) over () max_col, -- we get the max value to be used for the last row in the result
row_number() over (order by col) rn from src -- position in col order (ROWNUM is not guaranteed to follow col order)
) where mod(rn - 1, 3) = 0 -- only keep every third row (rn = 1, 4, 7, ...)
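The LEAD-based idea can be checked with Python's sqlite3 module (SQLite 3.25+ assumed). This is a minimal sketch with a few changes from the Oracle version:

- Instead of a 0 default plus DECODE, it uses LEAD's NULL result with COALESCE.
- The table name account comes from the question.
- The extra value 70 in the second call is hypothetical. It shows a final tuple of two values.

```python
import sqlite3

QUERY = """
select col, coalesce(col_2, max_col) as col_2
from (select col,
             lead(col, 2) over (order by col) as col_2,   -- value two rows ahead, NULL near the end
             max(col) over () as max_col,                 -- overall maximum, used for the last tuple
             row_number() over (order by col) as rn       -- position in sorted order
      from (select distinct account_num as col from account)) numbered
where (rn - 1) % 3 = 0                                    -- first row of each tuple: rn = 1, 4, 7, ...
order by col
"""


def tuple_ranges(values):
    """Return (min, max) of each consecutive tuple of three distinct values."""
    con = sqlite3.connect(":memory:")
    con.execute("create table account (account_num integer)")
    con.executemany("insert into account values (?)", [(v,) for v in values])
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


nums = [1, 2, 4, 7, 12, 18, 24, 37, 45, 59]
print(tuple_ranges(nums))         # [(1, 4), (7, 18), (24, 45), (59, 59)]
print(tuple_ranges(nums + [70]))  # [(1, 4), (7, 18), (24, 45), (59, 70)]
```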
This is another solution.
SELECT *
FROM (SELECT DISTINCT MIN(val) over(PARTITION BY gr) min_,
MAX(val) over(PARTITION BY gr) max_
FROM (SELECT val,
ceil(rn / 3) gr
FROM (SELECT val,
row_number() over(ORDER BY val) rn
FROM (select distinct account_num val from account)))) ORDER BY min_
UPDATED
Solution without analytic function.
SELECT MIN(val) min_,
MAX(val) max_
FROM (SELECT val,
ceil(rn / 3) gr
FROM (SELECT val,
rownum rn
FROM (SELECT DISTINCT account_num val
FROM account
ORDER BY val))) GROUP BY gr
ORDER BY min_
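The grouping idea generalizes to any tuple size:

1. Number the distinct values in order.
2. Integer-divide (row number - 1) by the tuple size.
3. Take MIN and MAX per group.

This is a minimal sketch with Python's sqlite3 module (SQLite 3.25+ assumed for ROW_NUMBER). The tuple_size parameter is an illustrative addition.

```python
import sqlite3

QUERY = """
select min(account_num), max(account_num)
from (select account_num,
             -- integer division: with size 3, values 1..3 -> 0, values 4..6 -> 1, ...
             (row_number() over (order by account_num) - 1) / :size as gr
      from (select distinct account_num from account)) grouped
group by gr
order by gr
"""


def tuple_ranges(values, tuple_size=3):
    """Return (min, max) for consecutive groups of tuple_size distinct values."""
    con = sqlite3.connect(":memory:")
    con.execute("create table account (account_num integer)")
    con.executemany("insert into account values (?)", [(v,) for v in values])
    result = con.execute(QUERY, {"size": tuple_size}).fetchall()
    con.close()
    return result


print(tuple_ranges([1, 2, 4, 7, 12, 18, 24, 37, 45, 59]))  # [(1, 4), (7, 18), (24, 45), (59, 59)]
```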
Please add more information on what you want to do. What is the connection between account_num 1 and 4, or between 7 and 18? Is there any? If not, why would you want to split this into two columns, and what is the rule for splitting it?
With what you have posted, you could do something like this:
select 1 as account_num, 4 as account_num1 from dual
union all select 7 as account_num, 18 as account_num1 from dual
...
and so on, but I don't see the use for this.
I have 2 columns in a one-to-many relationship. I want to sort on the "many" and return the first occurrence of the "one". I need to page through the data so, for example, I need to be able to get the 3rd group of 10 unique "one" values.
I have a query like this:
SELECT id, name
FROM table1
INNER JOIN table2 ON table2.fkid = table1.id
ORDER BY name, id;
There can be multiple rows in table2 for each row in table1.
The results of my query look like this:
id | name
----------------
2 | apple
23 | banana
77 | cranberry
23 | dark chocolate
8 | egg
2 | yak
19 | zebra
I need to page through the result set with each page containing n unique ids. For example, if start=1 and n=4 I want to get back
2
23
77
8
in the order they were sorted on (i.e., name), where id is returned in the position of its first occurrence. Likewise if start=3 and n=4 and order = desc I want
8
23
77
2
I tried this:
SELECT * FROM (
SELECT id, ROWNUM rnum FROM (
SELECT DISTINCT id FROM (
SELECT id, name
FROM table1
INNER JOIN table2 ON table2.fkid = table1.id
ORDER BY name, id)
WHERE ROWNUM <= 4)
WHERE rnum >=1)
which gave me the ids in numerical order, instead of being ordered as the names would be.
I also tried:
SELECT * FROM (
SELECT DISTINCT id, ROWNUM rnum FROM (
SELECT id FROM (
SELECT id, name
FROM table1
INNER JOIN table2 ON table2.fkid = table1.id
ORDER BY name, id)
WHERE ROWNUM <= 4)
WHERE rnum >=1)
but that gave me duplicate values.
How can I page through the results of this data? I just need the ids, nothing from the "many" table.
update
I suppose I'm getting closer with changing my inner query to
SELECT id, name, rank() over (order by name, id)
FROM table1
INNER JOIN table2 ON table2.fkid = table1.id
...but I'm still getting duplicate ids.
You may need to debug it a little, but it will be something like this:
SELECT id FROM (
SELECT id, ROWNUM rnum FROM (
SELECT id, name FROM (
SELECT id, name, row_number() over (partition by id order by name) rn
FROM table1
INNER JOIN table2 ON table2.fkid = table1.id
) WHERE rn = 1 ORDER BY name, id
) WHERE ROWNUM <= 4
) WHERE rnum >= 1;
It's a bit convoluted (and I suspect it could be simplified), but it should work. You can put whatever start and end position you want in the WHERE clause. Here, start=2 and n=4 are pulled from a separate table (x), but you could simplify things by using a couple of bind variables instead.
with t as (
select 2 id, 'apple' name from dual union all
select 23, 'banana' from dual union all
select 77, 'cranberry' from dual union all
select 23, 'dark chocolate' from dual union all
select 8, 'egg' from dual union all
select 2, 'yak' from dual union all
select 19, 'zebra' from dual
),
x as (
select 2 start_pos, 4 n from dual
)
select *
from (
select distinct
id,
dense_rank() over (order by min_id_rnk) outer_rnk
from (
select id,
min(rnk) over (partition by id) min_id_rnk
from (
select id,
name,
rank() over (order by name) rnk
from t
)
)
)
where outer_rnk between (select start_pos from x) and (select start_pos+n-1 from x)
order by outer_rnk;
ID OUTER_RNK
---------- ----------
23 2
77 3
8 4
19 5
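For databases that support window functions and LIMIT/OFFSET, the "first occurrence of each id, then page" idea can be sketched compactly. This minimal version uses Python's sqlite3 module (SQLite 3.25+ assumed). As in the answer above, the single table t(id, name) stands in for the joined result of table1 and table2. The hypothetical helper takes 1-based start and page size n.

```python
import sqlite3

QUERY = """
select id
from (select id, name,
             -- 1 for the alphabetically first name of each id
             row_number() over (partition by id order by name) as rn
      from t) firsts
where rn = 1
order by name, id
limit :n offset :skip
"""

ROWS = [(2, "apple"), (23, "banana"), (77, "cranberry"), (23, "dark chocolate"),
        (8, "egg"), (2, "yak"), (19, "zebra")]


def page_of_ids(rows, start, n):
    """Return n ids, starting at 1-based position start, ordered by first name."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (id integer, name text)")
    con.executemany("insert into t values (?, ?)", rows)
    result = [row[0] for row in con.execute(QUERY, {"n": n, "skip": start - 1})]
    con.close()
    return result


print(page_of_ids(ROWS, 1, 4))  # [2, 23, 77, 8]
print(page_of_ids(ROWS, 2, 4))  # [23, 77, 8, 19]
```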