I have two tables.
The first one "DISORDERS" with the fields: "ID" number, "Description" varchar2, "MINIMUM_SYMPTOMS" number
with these data:
The second table is "SYMPTOMS" with the fields: "ID" number, "SYMPTOM" varchar2, "DISORDER_ID" number (FK_DISORDERS_ID)
In my second table I have these data:
With this query I can find the minimum symptoms (records from the second table) in order for my patient to have the disorder:
select DISORDERS.DISORDER as DISORDER, ROUND((COUNT(SYMPTOMS.ID) / 2) ,0) as MINIMUM_SYMPTOMS
from SYMPTOMS SYMPTOMS,
DISORDERS DISORDERS
where SYMPTOMS.DISORDER_ID=DISORDERS.ID
group by DISORDERS.DISORDER
and I am correctly getting this result:
I want to update in my first table the field "MINIMUM_SYMPTOMS" and put the MINIMUM_SYMPTOMS value from the last query.
I didn't feel like typing that much, so I created a smaller sample data set.
SQL> select * from disorders order by id;
ID DISO MINIMUM_SYMPTOMS
---------- ---- ----------------
64 OCRD 0
65 OCD 0
248 GPD 0
SQL> select * from symptoms order by disorder_id, id;
ID SYMPTO DISORDER_ID
---------- ------ -----------
2 test 3 64
3 test 1 64
4 test 5 64
5 test 2 64
7 test 4 64
3 test 1 65
5 test 2 65
7 test 4 65
3 test 1 248
4 test 3 248
5 test 2 248
11 rows selected.
Query - similar to yours - but based only on the symptoms table. Why? You don't need disorders now; this query will be used a little bit later, in MERGE's USING clause:
SQL> select s.disorder_id,
2 round((count(s.id) / 2), 0) minsym
3 from symptoms s
4 group by s.disorder_id;
DISORDER_ID MINSYM
----------- ----------
248 2
64 3
65 2
SQL>
OK, let's now merge those results with the disorders table:
SQL> merge into disorders d
2 using (select s.disorder_id,
3 round((count(s.id) / 2), 0) minsym
4 from symptoms s
5 group by s.disorder_id
6 ) x
7 on (d.id = x.disorder_id)
8 when matched then update set
9 d.minimum_symptoms = x.minsym;
3 rows merged.
Result:
SQL> select * from disorders order by id;
ID DISO MINIMUM_SYMPTOMS
---------- ---- ----------------
64 OCRD 3
65 OCD 2
248 GPD 2
SQL>
So, yes - that's "how" you'll do that (at least, one option).
You can use update with a correlated subquery:
update disorders d
set minimum_symptoms = (select round(count(*) / 2)
from symptoms s
where s.disorder_id = d.id
);
This example updates all rows in disorders; for a disorder with no matching symptoms, COUNT(*) returns 0, so its MINIMUM_SYMPTOMS would be set to 0. If you only wanted to update rows that have a match in symptoms:
update disorders d
set minimum_symptoms = (select round(count(*) / 2)
from symptoms s
where s.disorder_id = d.id
)
where exists (select 1
from symptoms s
where s.disorder_id = d.id
);
Using Oracle 12c DB, I have the following table data example that I need assistance with using SQL and PL/SQL.
Table data is as follows:
Table Name: my_data
ID ITEM ITEM_LOC
------- ----------- ----------------
1 Item-1 0,1
2 Item-2 0,1,2,3,4,7
3 Item-3 0-48
4 Item-4 0,1,2,3,4,5,6,7,8
5 Item-5 1-33
6 Item-6 0,1
7 Item-7 0,1,5,8
Using the data above within the my_data table, what is the best way to process ITEM_LOC, as I need to use the values in this column as individual values, i.e.:
0,1 means the SQL needs to return either 0 or 1, or
range values, i.e.:
0-48 means the SQL needs to return a value between 0 and 48.
The returned values for both scenarios should commence from lowest to highest and can't be re-used once processed.
Based on the above, it would be great to have a function that takes the ID and returns an individual value from ITEM_LOC that hasn't been used, based on my description above. This could be a comma-separated string value or a range string value.
Desired result for ID = 2 could be 7. For this ID = 2, ITEM_LOC = 7 could not be used again.
Desired result for ID = 5 could be 31. For this ID = 5, ITEM_LOC = 31 could not be used again.
For the ITEM_LOC data that can no longer be used against that ID, I am looking at holding another table for this, or perhaps separating all data into individual rows with a new column called VALUE_USED.
This query shows how to extract a list of ITEM_LOC values based on whether they are comma-separated (which means "take exactly those values") or dash-separated (which means "find all values between the starting and end point"). I modified your sample data a little bit (didn't feel like displaying ~50 values if 5 of them do the job).
Lines #1 - 6 represent sample data.
The first select (lines #7 - 15) splits comma-separated values into rows.
The second select (lines #17 - 26) uses a hierarchical query which adds 1 to the starting value, up to the item's end value.
SQL> with my_data (id, item, item_loc) as
2 (select 2, 'Item-2', '0,2,4,7' from dual union all
3 select 7, 'Item-7', '0,1,5' from dual union all
4 select 3, 'Item-3', '0-4' from dual union all
5 select 8, 'Item-8', '5-8' from dual
6 )
7 select id,
8 item,
9 regexp_substr(item_loc, '[^,]+', 1, column_value) loc
10 from my_data
11 cross join table(cast(multiset
12 (select level from dual
13 connect by level <= regexp_count(item_loc, ',') + 1
14 ) as sys.odcinumberlist))
15 where instr(item_loc, '-') = 0
16 union all
17 select id,
18 item,
19 to_char(to_number(regexp_substr(item_loc, '^\d+')) + column_value - 1) loc
20 from my_data
21 cross join table(cast(multiset
22 (select level from dual
23 connect by level <= to_number(regexp_substr(item_loc, '\d+$')) -
24 to_number(regexp_substr(item_loc, '^\d+')) + 1
25 ) as sys.odcinumberlist))
26 where instr(item_loc, '-') > 0
27 order by id, item, loc;
ID ITEM LOC
---------- ------ ----------------------------------------
2 Item-2 0
2 Item-2 2
2 Item-2 4
2 Item-2 7
3 Item-3 0
3 Item-3 1
3 Item-3 2
3 Item-3 3
3 Item-3 4
7 Item-7 0
7 Item-7 1
7 Item-7 5
8 Item-8 5
8 Item-8 6
8 Item-8 7
8 Item-8 8
16 rows selected.
SQL>
I don't know what you meant by saying that "item_loc could not be used again". Used where? If you use the above query in, for example, a cursor FOR loop, then yes - those values would be used only once, as every loop iteration fetches the next item_loc value.
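For illustration, here's a minimal sketch of such a loop, assuming the whole query above has been saved as a view named my_data_locs (a hypothetical name):
begin
  for r in (select id, item, loc
              from my_data_locs
             order by id, item, to_number(loc)) loop
    -- placeholder for whatever "using" a LOC value means in your process
    dbms_output.put_line(r.id || ' / ' || r.item || ': ' || r.loc);
  end loop;
end;
/
Each iteration fetches one LOC value exactly once within that run.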
As others have said, it's a bad idea to store data in this way. You very likely could have input like this, and you likely could need to display the data like this, but you don't have to store the data the way it is input or displayed.
I'm going to store the data as individual LOC elements based on the input. I assume the data contains only integers separated by commas, or pairs of integers separated by a hyphen. Whitespace is ignored. The comma-separated list does not have to be in any order. In pairs, if the left integer is greater than the right integer I return no LOC element.
create table t as
with input(id, item, item_loc) as (
select 1, 'Item-1', ' 0,1' from dual union all
select 2, 'Item-2', '0,1,2,3,4,7' from dual union all
select 3, 'Item-3', '0-48' from dual union all
select 4, 'Item-4', '0,1,2,3,4,5,6,7,8' from dual union all
select 5, 'Item-5', '1-33' from dual union all
select 6, 'Item-6', '0,1' from dual union all
select 7, 'Item-7', '0,1,5,8,7 - 11' from dual
)
select distinct id, item, loc from input, xmltable(
'let $item := if (contains($X,",")) then ora:tokenize($X,"\,") else $X
for $i in $item
let $j := if (contains($i,"-")) then ora:tokenize($i,"\-") else $i
for $k in xs:int($j[1]) to xs:int($j[count($j)])
return $k'
passing item_loc as X
columns loc number path '.'
);
Now to "use" an element I just delete it from the table:
delete from t where rowid = (
select min(rowid) keep (dense_rank first order by loc)
from t
where id = 7
);
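If you want this wrapped as the function the question asks for (take an ID, return one unused value), a sketch could look like this - the function name is my own invention, and it returns NULL once all elements for that ID have been used up:
create or replace function take_next_loc(p_id in number)
  return number
is
  v_loc number;
begin
  -- delete the lowest remaining LOC for this id and hand it back
  delete from t
   where rowid = (select min(rowid) keep (dense_rank first order by loc)
                    from t
                   where id = p_id)
  returning loc into v_loc;
  return v_loc;
end;
/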
To return the data in the same format it was input, use MATCH_RECOGNIZE:
select id, item, listagg(item_loc, ',') within group(order by first_loc) item_loc
from t
match_recognize(
partition by id, item order by loc
measures a.loc first_loc,
a.loc || case count(*) when 1 then null else '-'||b.loc end item_loc
pattern (a b*)
define b as loc = prev(loc) + 1
)
group by id, item;
ID ITEM ITEM_LOC
1 Item-1 0-1
2 Item-2 0-4,7
3 Item-3 0-48
4 Item-4 0-8
5 Item-5 1-33
6 Item-6 0-1
7 Item-7 1,5,7-11
Note that the output here will not be exactly like the input, because any consecutive integers will be compressed into a pair.
My table contains data about employees. However, it is a temporary table and EmployeeID isn't the primary key here. The table may contain a given EmployeeID multiple times.
Now, I have to select batch of records of batchSize, let's consider 200 for now. I'll send these batches to multiple threads.
I have written this query:
WITH SingleBatch AS
(
SELECT
*,
ROW_NUMBER() OVER(ORDER BY EmployeeId) AS RowNumber
FROM
TemperoryTable
)
SELECT *
FROM SingleBatch
WHERE RowNumber BETWEEN 1 AND 200;
the result might be:
RowNumber  EmployeeID  EffectiveDate
1          123         01/01/2016
2          541         01/01/2016
------------------------
------------------------
200        978         18/06/2015
for one batch.
This works fine and row numbers change with thread number.
Now suppose the second batch starts with EmployeeId 978. Then this employee will be in the first batch as well as the second batch. That is, the same employee is being sent to multiple threads, which may cause a conflict.
Although the scenario is very rare, I must avoid this.
What could be the possible solution here?
Sorry, I didn't get it before - you want rows for the same employee to be kept together? Then the total number of rows returned per batch can't be a fixed number. Maybe this is helpful for you.
;WITH t(RowNumber,EmployeeId,other)AS
(
SELECT 1,'a','1' UNION ALL
SELECT 2,'a','12' UNION ALL
SELECT 3,'a','13' UNION ALL
SELECT 4,'b','21' UNION ALL
SELECT 5,'d','41' UNION ALL
SELECT 6,'c','31' UNION ALL
SELECT 7,'c','32'
)
SELECT *, DENSE_RANK() OVER (ORDER BY EmployeeId) AS FilterID, RANK() OVER (ORDER BY EmployeeId) AS RowsCount FROM t
RowNumber EmployeeId other FilterID RowsCount
----------- ---------- ----- -------------------- --------------------
2 a 12 1 1
3 a 13 1 1
1 a 1 1 1
4 b 21 2 4
6 c 31 3 5
7 c 32 3 5
5 d 41 4 7
Rows with the same EmployeeId share a FilterID, and RowsCount controls how many rows are returned.
You should fetch batches by RowsCount, not by RowNumber.
For example: a filter of RowsCount between 1 and 5 actually returns 6 rows, because EmployeeId c has two rows.
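Applied to the original table, a batch query along those lines might look like the sketch below. Note that a batch can come out slightly larger than the batch size, because the rows of one employee are never split across batches:
;WITH numbered AS
(
    SELECT *,
           DENSE_RANK() OVER (ORDER BY EmployeeId) AS FilterID,
           RANK() OVER (ORDER BY EmployeeId) AS RowsCount
    FROM TemperoryTable
)
SELECT *
FROM numbered
WHERE RowsCount BETWEEN 1 AND 200;  -- next batch: BETWEEN 201 AND 400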
BETWEEN means RowNumber >= 1 and RowNumber <= 200.
So next batch should be
RowNumber BETWEEN 201 AND 400
You can also change the where clause to
RowNumber>=1 and RowNumber <200 (1-199)
RowNumber>=200 and RowNumber <400 (200-399)
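For completeness, here is a sketch of the same arithmetic with variables (the variable names are my own):
DECLARE @BatchSize int = 200;
DECLARE @BatchNumber int = 2;  -- 1-based batch index

WITH SingleBatch AS
(
    SELECT *,
           ROW_NUMBER() OVER (ORDER BY EmployeeId) AS RowNumber
    FROM TemperoryTable
)
SELECT *
FROM SingleBatch
WHERE RowNumber >  (@BatchNumber - 1) * @BatchSize
  AND RowNumber <= @BatchNumber * @BatchSize;  -- batch 2: rows 201-400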
I have a table called crewWork as follows:
CREATE TABLE crewWork(
FloorNumber int, AptNumber int, WorkType int, simTime int )
After the table was populated, I need to know how many times a change in apt occurred and how many times a change in floor occurred. Usually I expect to find 10 rows on each apt and 40-50 on each floor.
I could just write a scalar function for that, but I was wondering if there's any way to do that in T-SQL without having to write scalar functions.
Thanks
The data will look like this:
FloorNumber AptNumber WorkType simTime
1 1 12 10
1 1 12 25
1 1 13 35
1 1 13 47
1 2 12 52
1 2 12 59
1 2 13 68
1 1 14 75
1 4 12 79
1 4 12 89
1 4 13 92
1 4 14 105
1 3 12 115
1 3 13 129
1 3 14 138
2 1 12 142
2 1 12 150
2 1 14 168
2 1 14 171
2 3 12 180
2 3 13 190
2 3 13 200
2 3 14 205
3 3 14 216
3 4 12 228
3 4 12 231
3 4 14 249
3 4 13 260
3 1 12 280
3 1 13 295
2 1 14 315
2 2 12 328
2 2 14 346
I need the information for a report, I don't need to store it anywhere.
If you use the accepted answer as written now (1/6/2023), you get correct results with the OP dataset, but I think you can get wrong results with other data.
CONFIRMED: ACCEPTED ANSWER HAS A MISTAKE (as of 1/6/2023)
I explain the potential for wrong results in my comments on the accepted answer.
In this db<>fiddle, I demonstrate the wrong results. I use a slightly modified form of the accepted answer (my syntax works in SQL Server and PostgreSQL). I use a slightly modified form of the OP's data (I change two rows). I demonstrate how the accepted answer can be changed slightly to produce correct results.
The accepted answer is clever but needs a small change to produce correct results (as demonstrated in the above db<>fiddle and described here):
Instead of doing this, as seen in the accepted answer: COUNT(DISTINCT AptGroup)...
You should do this: COUNT(DISTINCT CONCAT(AptGroup, '_', AptNumber))...
DDL:
SELECT * INTO crewWork FROM (VALUES
-- data from question, with a couple changes to demonstrate problems with the accepted answer
-- https://stackoverflow.com/q/8666295/1175496
--FloorNumber AptNumber WorkType simTime
(1, 1, 12, 10 ),
-- (1, 1, 12, 25 ), -- original
(2, 1, 12, 25 ), -- new, changing FloorNumber 1->2->1
(1, 1, 13, 35 ),
(1, 1, 13, 47 ),
(1, 2, 12, 52 ),
(1, 2, 12, 59 ),
(1, 2, 13, 68 ),
(1, 1, 14, 75 ),
(1, 4, 12, 79 ),
-- (1, 4, 12, 89 ), -- original
(1, 1, 12, 89 ), -- new, changing AptNumber 4->1->4
(1, 4, 13, 92 ),
(1, 4, 14, 105 ),
(1, 3, 12, 115 ),
...
DML:
;
WITH groupedWithConcats as (SELECT
*,
CONCAT(AptGroup,'_', AptNumber) as AptCombo,
CONCAT(FloorGroup,'_',FloorNumber) as FloorCombo
-- SQL Server doesn't have the TEMPORARY keyword; Postgres doesn't understand # for temp tables
-- INTO TEMPORARY groupedWithConcats
FROM
(
SELECT
-- the columns shown in Andriy's answer:
-- https://stackoverflow.com/a/8667477/1175496
ROW_NUMBER() OVER ( ORDER BY simTime) as RN,
-- AptNumber
AptNumber,
ROW_NUMBER() OVER (PARTITION BY AptNumber ORDER BY simTime) as RN_Apt,
ROW_NUMBER() OVER ( ORDER BY simTime)
- ROW_NUMBER() OVER (PARTITION BY AptNumber ORDER BY simTime) as AptGroup,
-- FloorNumber
FloorNumber,
ROW_NUMBER() OVER (PARTITION BY FloorNumber ORDER BY simTime) as RN_Floor,
ROW_NUMBER() OVER ( ORDER BY simTime)
- ROW_NUMBER() OVER (PARTITION BY FloorNumber ORDER BY simTime) as FloorGroup
FROM crewWork
) grouped
)
-- if you want to see how the groupings work:
-- SELECT * FROM groupedWithConcats
-- otherwise just run this query to see the counts of "changes":
SELECT
COUNT(DISTINCT AptCombo)-1 as CountAptChangesWithConcat_Correct,
COUNT(DISTINCT AptGroup)-1 as CountAptChangesWithoutConcat_Wrong,
COUNT(DISTINCT FloorCombo)-1 as CountFloorChangesWithConcat_Correct,
COUNT(DISTINCT FloorGroup)-1 as CountFloorChangesWithoutConcat_Wrong
FROM groupedWithConcats;
ALTERNATIVE ANSWER
The accepted answer may eventually get updated to remove the mistake. If that happens I can remove my warning, but I still want to leave you with this alternative way to produce the answer.
My approach goes like this: "check the previous row; if the value in the previous row differs from the one in the current row, then there is a change". SQL doesn't have a notion of row order per se (at least not like Excel does, for example).
Instead, SQL has window functions. With SQL's window functions, you can use ROW_NUMBER plus a self-JOIN technique, as seen here, to put current-row values and previous-row values side by side so you can compare them. Here is a db<>fiddle showing my approach, which I pasted below.
The intermediate table, including the columns that hold 1 if there is a change and 0 otherwise (i.e. FloorChange, AptChange), is shown at the bottom of this post...
DDL:
...same as above...
DML:
;
WITH rowNumbered AS (
SELECT
*,
ROW_NUMBER() OVER ( ORDER BY simTime) as RN
FROM crewWork
)
,joinedOnItself AS (
SELECT
rowNumbered.*,
rowNumberedRowShift.FloorNumber as FloorShift,
rowNumberedRowShift.AptNumber as AptShift,
CASE WHEN rowNumbered.FloorNumber <> rowNumberedRowShift.FloorNumber THEN 1 ELSE 0 END as FloorChange,
CASE WHEN rowNumbered.AptNumber <> rowNumberedRowShift.AptNumber THEN 1 ELSE 0 END as AptChange
FROM rowNumbered
LEFT OUTER JOIN rowNumbered as rowNumberedRowShift
ON rowNumbered.RN = (rowNumberedRowShift.RN+1)
)
-- if you want to see:
-- SELECT * FROM joinedOnItself;
SELECT
SUM(FloorChange) as FloorChanges,
SUM(AptChange) as AptChanges
FROM joinedOnItself;
Below see the first few rows of the intermediate table (joinedOnItself). This shows how my approach works. Note the last two columns, which have a value of 1 when there is a change in FloorNumber compared to FloorShift (noted in FloorChange), or a change in AptNumber compared to AptShift (noted in AptChange).
floornumber  aptnumber  worktype  simtime  rn  floorshift  aptshift  floorchange  aptchange
1            1          12        10       1   NULL        NULL      0            0
2            1          12        25       2   1           1         1            0
1            1          13        35       3   2           1         1            0
1            1          13        47       4   1           1         0            0
1            2          12        52       5   1           1         0            1
1            2          12        59       6   1           2         0            0
1            2          13        68       7   1           2         0            0
Note that instead of using ROW_NUMBER and a JOIN, you could use the window function LAG to compare values in the current row to the previous row directly (no need to JOIN). I don't have that solution here, but it is described in the Wikipedia article example:
Window functions allow access to data in the records right before and after the current record.
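For reference, a minimal sketch of that LAG variant might look like this (an addition here, not from the fiddle). LAG returns NULL for the first row, so the CASE yields 0 there, matching the self-join version:
WITH lagged AS (
    SELECT
        CASE WHEN FloorNumber <> LAG(FloorNumber) OVER (ORDER BY simTime)
             THEN 1 ELSE 0 END AS FloorChange,
        CASE WHEN AptNumber <> LAG(AptNumber) OVER (ORDER BY simTime)
             THEN 1 ELSE 0 END AS AptChange
    FROM crewWork
)
SELECT
    SUM(FloorChange) AS FloorChanges,
    SUM(AptChange)   AS AptChanges
FROM lagged;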
If I am not missing anything, you could use the following method to find the number of changes:
determine groups of sequential rows with identical values;
count those groups;
subtract 1.
Apply the method individually for AptNumber and for FloorNumber.
The groups could be determined like in this answer, only there isn't a Seq column in your case. Instead, another ROW_NUMBER() expression could be used. Here's an approximate solution:
;
WITH marked AS (
SELECT
FloorGroup = ROW_NUMBER() OVER ( ORDER BY simTime)
- ROW_NUMBER() OVER (PARTITION BY FloorNumber ORDER BY simTime),
AptGroup = ROW_NUMBER() OVER ( ORDER BY simTime)
- ROW_NUMBER() OVER (PARTITION BY AptNumber ORDER BY simTime)
FROM crewWork
)
SELECT
FloorChanges = COUNT(DISTINCT FloorGroup) - 1,
AptChanges = COUNT(DISTINCT AptGroup) - 1
FROM marked
(I'm assuming here that the simTime column defines the timeline of changes.)
UPDATE
Below is a table that shows how the distinct groups are obtained for AptNumber.
AptNumber RN RN_Apt AptGroup (= RN - RN_Apt)
--------- -- ------ ---------
1 1 1 0
1 2 2 0
1 3 3 0
1 4 4 0
2 5 1 4
2 6 2 4
2 7 3 4
1 8 5 => 3
4 9 1 8
4 10 2 8
4 11 3 8
4 12 4 8
3 13 1 12
3 14 2 12
3 15 3 12
1 16 6 10
… … … …
Here RN is a pseudo-column that stands for ROW_NUMBER() OVER (ORDER BY simTime). You can see that this is just a sequence of rankings starting from 1.
Another pseudo-column, RN_Apt, contains values produced by the other ROW_NUMBER, namely ROW_NUMBER() OVER (PARTITION BY AptNumber ORDER BY simTime). It contains rankings within individual groups of identical AptNumber values. You can see that, for a newly encountered value, the sequence starts over, and for a recurring one, it continues where it stopped last time.
You can also see from the table that if we subtract RN_Apt from RN (it could be the other way round; it doesn't matter in this situation), we get a value that uniquely identifies every distinct group of same AptNumber values. You might as well call that value a group ID.
So, now that we've got these IDs, it only remains for us to count them (count distinct values, of course). That will be the number of groups, and the number of changes is one less (assuming the first group is not counted as a change).
Add an extra column, changecount:
CREATE TABLE crewWork(
FloorNumber int, AptNumber int, WorkType int, simTime int, changecount int)
Increment the changecount value on each update.
If you want to know the count for each field, then add a corresponding changecount column for each field.
Assuming that each record represents a different change, you can find changes per floor by:
select FloorNumber, count(*)
from crewWork
group by FloorNumber
And changes per apartment (assuming AptNumber uniquely identifies apartment) by:
select AptNumber, count(*)
from crewWork
group by AptNumber
Or (assuming AptNumber and FloorNumber together uniquely identifies apartment) by:
select FloorNumber, AptNumber, count(*)
from crewWork
group by FloorNumber, AptNumber
I am generating some test data and am using dbms_random. I encountered some strange behavior when using dbms_random in the condition of the JOIN that I cannot explain:
------------------------# test-data (ids 1 .. 3)
With x As (
Select Rownum id From dual
Connect By Rownum <= 3
)
------------------------# end of test-data
Select x.id,
x2.id id2
From x
Join x x2 On ( x2.id = Floor(dbms_random.value(1, 4)) )
Floor(dbms_random.value(1, 4)) returns a random number out of (1,2,3), so I would have expected all rows of x to be joined with a random row of x2, or maybe always with the same random row of x2 in case the random number is evaluated only once.
When trying several times, I get results like that, though:
(1)  ID   ID2      (2)  ID   ID2      (3)
     ---- ----          ---- ----     no rows selected.
     1    2             1    3
     1    3             2    3
     2    2             3    3
     2    3
     3    2
     3    3
What am I missing?
EDIT:
SELECT ROWNUM, FLOOR(dbms_random.VALUE (1, 4))
FROM dual CONNECT BY ROWNUM <= 3
would produce the desired result in this case, but why does the original query behave like that?
To generate three rows with one predictable value and one random value, try this:
SQL> with x as (
2 select rownum id from dual
3 connect by rownum <= 3
4 )
5 , y as (
6 select floor(dbms_random.value(1, 4)) floor_val
7 from dual
8 )
9 select x.id,
10 y.floor_val
11 from x
12 cross join y
13 /
ID FLOOR_VAL
---------- ----------
1 2
2 3
3 2
SQL>
edit
Why did your original query return an inconsistent set of rows?
Well, without the random bit in the ON clause your query was basically a CROSS JOIN of X against X - it would have returned nine rows (at least it would have if the syntax had allowed it). Each of those nine rows executes a call to DBMS_RANDOM.VALUE(). Only when the random value matches the current value of X2.ID is the row included in the result set. Consequently the query can return 0-9 rows, randomly.
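If you want to see those per-row evaluations, one quick sketch is to move the random call into the select list over the full cross join; each of the nine candidate rows gets its own random value, which is why the ON-clause filter then keeps a random subset of them:
with x as (
  select rownum id from dual
  connect by rownum <= 3
)
select x.id, x2.id id2,
       floor(dbms_random.value(1, 4)) rnd  -- evaluated once per row
from x
cross join x x2;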
Your solution is obviously simpler - I didn't refactor enough :)