Counting Rows under a Specific Header Row - sql

I am trying to count the number of rows under specific "header rows" - for example, I have a table that looks like this:
Row # | Description | Repair_Code | Data Type
1 | FRONT LAMP | (null) | Header
2 | left head lamp | 1235 | Database
3 | right head lamp | 1236 | Database
4 | ROOF | (null) | Header
5 | headliner | 1567 | Database
6 | WHEELS | (null) | Header
7 | right wheel | 1145 | Database
Rows 1, 4 and 6 are header rows (categories) and the others are descriptors under each of those categories. The Data Type column denotes if the row is a header or not.
I want to be able to count the number of rows under the header rows to return something that looks like:
Header | Occurrences
FRONT LAMP | 2
ROOF | 1
WHEELS | 1
Thank you for the help!

Data model looks wrong. If that's some kind of a hierarchy, table should have yet another column which represents a "parent row#".
The way it is now, it's kind of questionable whether you can - or can not - do what you wanted. The only thing you can rely on is row#, which is sequential in your example. If that's not the case, then you have a problem.
So: if you use a lead analytic function for all header rows, then you could do something like this (sample data in rows #1 - 7; query that might help begins at line #8):
SQL> with test (rn, description, code) as
2 (select 1, 'front lamp' , null from dual union all
3 select 2, 'left head lamp' , 1235 from dual union all
4 select 3, 'right head lamp', 1236 from dual union all
5 select 4, 'roof' , null from dual union all
6 select 5, 'headliner' , 1567 from dual
7 ),
8 hdr as
9 -- header rows
10 (select rn,
11 description,
12 lead(rn) over (order by rn) next_rn
13 from test
14 where code is null
15 )
16 select h.description,
17 count(*)
18 from hdr h join test t on t.rn > h.rn
19 and (t.rn < h.next_rn or h.next_rn is null)
20 group by h.description;
DESCRIPTION COUNT(*)
--------------- ----------
front lamp 2
roof 1
SQL>
If data model was different (note parent_rn column), then you wouldn't depend on sequential row# values, but
SQL> with test (rn, description, code, parent_rn) as
2 (select 0, 'items' , null, null from dual union all
3 select 1, 'front lamp' , null, 0 from dual union all
4 select 2, 'left head lamp' , 1235, 1 from dual union all
5 select 3, 'right head lamp', 1236, 1 from dual union all
6 select 4, 'roof' , null, 0 from dual union all
7 select 5, 'headliner' , 1567, 4 from dual
8 ),
9 calc as
10 (select parent_rn,
11 sum(case when code is null then 0 else 1 end) cnt
12 from test
13 connect by prior rn = parent_rn
14 start with parent_rn is null
15 group by parent_rn
16 )
17 select t.description,
18 c.cnt
19 from test t join calc c on c.parent_rn = t.rn
20 where nvl(c.parent_rn, 0) <> 0;
DESCRIPTION CNT
--------------- ----------
front lamp 2
roof 1
SQL>

I would approach this using window functions. Assign a group to each header by doing a cumulative count of the NULL values of repair_code. Then aggregate:
select max(case when repair_code is null then description end) as description,
count(repair_code) as cnt
from (select t.*,
sum(case when repair_code is null then 1 else 0 end) over (order by row#) as grp
from t
) t
group by grp
order by min(row#);
Here is a db<>fiddle.

Related

How to combine two tables in Oracle SQL on a quantitative base

beacause of a really old db design I need some help. This might be quite simple I'm just not seeing the wood for the trees at the moment.
TABLE A:
ID
1
2
3
4
5
TABLE B:
ID
VALUE B
1
10
1
20
2
10
2
20
3
10
3
20
3
30
4
10
TABLE C:
ID
VALUE C
1
11
1
21
2
11
2
21
2
31
3
11
5
11
Expected result:
where ID = 1
ID
VALUE B
VALUE C
1
10
11
1
20
21
where ID = 2
ID
VALUE B
VALUE C
2
10
11
2
20
21
2
null
31
where ID = 3
ID
VALUE B
VALUE C
3
10
11
3
20
null
3
30
null
where ID = 4
ID
VALUE B
VALUE C
4
10
null
where ID = 5
ID
VALUE B
VALUE C
5
null
11
The entries in table B and C are optional and could be unlimited, the ID from table A is the connection.
B and C are not directly connected. I need a quantitative comparision to find gaps in the database. The number of entries of table B and C should be the same (but not the value), usually entries are missing in either B or C.
I tried it with outer joins but I'm getting two much rows, because I need B or C join only one time per single row.
I hope anybody understand my problem and can help me.
It looks like, for each distinct ID, you want the nth row (ordered by VALUE) from TABLE_A to match with the nth row from TABLE_B. And if one table - A or B - has more values, you want those to match to null.
Your solution will have two parts. First, use row_number() over ( partition by id order by value) to order the rows in both tables. Then, use FULL OUTER JOIN to join on (id, rownumber).
Here is a full example:
-- WITH clauses are just test data...+
with table_a (id) as (
SELECT 1 FROM DUAL UNION ALL
SELECT 2 FROM DUAL UNION ALL
SELECT 3 FROM DUAL UNION ALL
SELECT 4 FROM DUAL UNION ALL
SELECT 5 FROM DUAL ),
table_b (id, value) as (
SELECT 1,10 FROM DUAL UNION ALL
SELECT 1,20 FROM DUAL UNION ALL
SELECT 2,10 FROM DUAL UNION ALL
SELECT 2,20 FROM DUAL UNION ALL
SELECT 3,10 FROM DUAL UNION ALL
SELECT 3,20 FROM DUAL UNION ALL
SELECT 3,30 FROM DUAL UNION ALL
SELECT 4,10 FROM DUAL ),
table_c (id, value) as (
SELECT 1,11 FROM DUAL UNION ALL
SELECT 1,21 FROM DUAL UNION ALL
SELECT 2,11 FROM DUAL UNION ALL
SELECT 2,21 FROM DUAL UNION ALL
SELECT 2,31 FROM DUAL UNION ALL
SELECT 3,11 FROM DUAL UNION ALL
SELECT 5,11 FROM DUAL )
-- Solution begins here
SELECT id, b.value b_value, c.value c_value
FROM ( SELECT b.*,
row_number() OVER ( PARTITION BY b.id ORDER BY b.value ) rn
FROM table_b b ) b
FULL OUTER JOIN ( SELECT c.*,
row_number() OVER ( PARTITION BY c.id ORDER BY c.value ) rn
FROM table_c c ) c USING (id, rn)
ORDER BY id, b_value, c_value;
+----+---------+---------+
| ID | B_VALUE | C_VALUE |
+----+---------+---------+
| 1 | 10 | 11 |
| 1 | 20 | 21 |
| 2 | 10 | 11 |
| 2 | 20 | 21 |
| 2 | | 31 |
| 3 | 10 | 11 |
| 3 | 20 | |
| 3 | 30 | |
| 4 | 10 | |
| 5 | | 11 |
+----+---------+---------+

Filtering SQL using a oracle database

I would like to know if the following is possible
For example I have a shoe factory. In this factory I have a production line, Every step in this production line is recorded into the oracle database.
if the shoe has completed a production step the result is = 1
example table
Shoe_nr production step result
1 1 1
1 2 1
1 3
2 1 1
2 2 1
2 3
3 1
3 2
3 3
Now the question, is it possible to filter out production step 3 where only the shoes have passed production step 2 which is equal to 1 in result.
I know if it can be done it's probably very easy but if you dont know i found out it's a little bit tricky.
Thanks,
Chris
Yes, you can do it with IN and a Subselect
select *
from shoes
where shoe.id in (
select shoe.id
from shoes
where production_step = 2
and result = 1
)
and production_step = 3
This might be one option; see comments within code (lines #1 - 12 represent sample data; you already have that and don't type it. Query you might be interested in begins at line #13).
SQL> with shoes (shoe_nr, production_step, result) as
2 -- sample data
3 (select 1, 1, 1 from dual union all
4 select 1, 2, 1 from dual union all
5 select 1, 3, null from dual union all
6 select 2, 1, 1 from dual union all
7 select 2, 2, 1 from dual union all
8 select 2, 3, null from dual union all
9 select 3, 1, null from dual union all
10 select 3, 2, null from dual union all
11 select 3, 3, null from dual
12 ),
13 -- which shoes' production step #3 should be skipped?
14 skip as
15 (select shoe_nr
16 from shoes
17 where production_step = 2
18 and result = 1
19 )
20 -- finally:
21 select a.shoe_nr, a.production_step, a.result
22 from shoes a
23 where (a.shoe_nr, a.production_step) not in (select b.shoe_nr, 3
24 from skip b
25 )
26 order by a.shoe_nr, a.production_step;
SHOE_NR PRODUCTION_STEP RESULT
---------- --------------- ----------
1 1 1
1 2 1
2 1 1
2 2 1
3 1
3 2
3 3
7 rows selected.
SQL>
If you just want the shoe_nr that satisfy the condition, you can use aggregation and a having clause:
select shoe_nr
from mytable
group by shoe_nr
having
max(case when production_step = 2 then result end) = 0
and max(case when production_step = 3 then 1 end) = 1
If you want the entire row corresponding to this shoe_nr at step 3, use window functions instead:
select 1
from (
select
t.*,
max(case when production_step = 2 then result end)
over(partition by shoe_nr) as has_completed_step_2
from mytable t
) t
where production_step = 3 and has_completed_step_2 = 0

SQL: summing a column starting from row immediately after two consecutive 'trigger' values in another column

How to sum all values after two consecutive YES's in the CONDITION_SATISFIED column?
ID | CONDITION_SATISFIED | VALUE
--------------------------------
1 | NO | 100
2 | NO | 300
3 | NO | 500
4 | YES | 100
5 | YES | 300
6 | NO | 500 <-
7 | NO | 100 <-
8 | YES | 300 <-
9 | NO | 500 <-
--------------------------------
SUM | 1400
Note: further occurrences of YES/NO are ignored once the summation is started.
I've gotten to the point where I am able to generate two extra columns for the CONDITION_SATISFIED column like this:
ID | CONDITION_SATISFIED | VALUE RANK | REPEAT_COUNT
-------------------------------- -------------------
1 | NO | 100 1 | 3
2 | NO | 300 1 | 3
3 | NO | 500 1 | 3
4 | YES | 100 2 | 2
5 | YES | 300 2 | 2
6 | NO | 500 3 | 2 <- start from here
7 | NO | 100 3 | 2
8 | YES | 300 4 | 1
9 | NO | 500 5 | 1
-------------------------------- -------------------
But I'm not able to figure out how to get the first instance of REPEAT_COUNT >= 2 AND CONDITION_SATISFIED = 'YES', and then start the summation immediately after the 2nd YES (as indicated).
Hmmm . . . You can get the first where the two yesses are using lag():
select t.*
from (select t.*,
lag(condition_satisfied) over (order by id) as prev_cs,
lag(condition_satisfied, 2) over (order by id) as prev2_cs
from t
) t
where prev2_cs = 'YES' and prev_cs = 'YES';
Then you can just use this in a query:
select t.*
from t join
(select min(t.id) as id
from (select t.*,
lag(condition_satisfied) over (order by id) as prev_cs,
lag(condition_satisfied, 2) over (order by id) as prev2_cs
from t
) t
where prev2_cs = 'YES' and prev_cs = 'YES'
) yy
on t.id >= yy.id;
Oracle 12c: pattern matching
with t1 (id, condition_satisfied, value) as (
select 1, 'NO' , 100 from dual union all
select 2, 'NO' , 300 from dual union all
select 3, 'NO' , 500 from dual union all
select 4, 'YES', 100 from dual union all
select 5, 'YES', 300 from dual union all
select 6, 'NO' , 500 from dual union all
select 7, 'NO' , 100 from dual union all
select 8, 'YES', 300 from dual union all
select 9, 'NO' , 500 from dual)
select sum(v_value) as sum_value
from t1
match_recognize(
order by id
measures s.value as v_value
all rows per match
pattern (yes{2} s+)
define
yes as condition_satisfied = 'YES'
);
SUM_VALUE
----------
1400
If you have version lower than 12 no need to self-join and generate/prevent duplicates:
with s (id, condition_satisfied, value) as (
select 1, 'NO' , 100 from dual union all
select 2, 'NO' , 300 from dual union all
select 3, 'NO' , 500 from dual union all
select 4, 'YES', 100 from dual union all
select 5, 'YES', 300 from dual union all
select 6, 'NO' , 500 from dual union all
select 7, 'YES' , 100 from dual union all
select 8, 'YES', 300 from dual union all
select 9, 'NO' , 500 from dual)
select sum(value) sum_value
from
(select s.*, min(first_id) over () min_id
from
(select s.*,
case when condition_satisfied = 'YES' and condition_satisfied = lag(condition_satisfied) over (order by id) then id end first_id
from s
) s
)
where id > min_id;
SUM_VALUE
----------
1400

create window group based on value of preceding row

I have a table like so:
#standardSQL
WITH k AS (
SELECT 1 id, 1 subgrp, 'stuff1' content UNION ALL
SELECT 2, 2, 'stuff2' UNION ALL
SELECT 3, 3, 'stuff3' UNION ALL
SELECT 4, 4, 'stuff4' UNION ALL
SELECT 5, 1, 'ostuff1' UNION ALL
SELECT 6, 2, 'ostuff2' UNION ALL
SELECT 7, 3, 'ostuff3' UNION ALL
SELECT 8, 4, 'ostuff4'
)
and like to group based on the subgrp value to re-create the missing grp: if subgrp value is smaller than previous row, belongs to same group.
Intermediate result would be:
| id | grp | subgrp | content |
| 1 | 1 | 1 | stuff1 |
| 2 | 1 | 2 | stuff2 |
| 3 | 1 | 3 | stuff3 |
| 4 | 1 | 4 | stuff4 |
| 5 | 2 | 1 | ostuff1 |
| 6 | 2 | 2 | ostuff2 |
| 7 | 2 | 3 | ostuff3 |
| 8 | 2 | 4 | ostuff4 |
on which I can then apply
SELECT id, grp, ARRAY_AGG(STRUCT(subgrp, content)) rcd
FROM k ORDER BY id, grp
to have I nice nested structure.
Notes:
with 'id' ordered, subgrp is always in sequence so no 3 before 2
groups are not always 4 subgrp's - this is just to illustrate so cannot hardcode
Problem: how can I (re)create the grp column here ? I played with several Window functions to no avail.
EDIT
Although Gordon's answer work, it took 3min over 104M records to run and I had to remove an ORDER BY on the final resultset because of Resources exceeded during execution: The query could not be executed in the allotted memory. ORDER BY operator used too much memory.
Anyone having an alternative solution for large dataset ?
A simple way to assign the group is to do a cumulative count of the subgrp = 1 values:
select k.*,
sum(case when subgrp = 1 then 1 else 0 end) over (order by id) as grp
from k;
You can also do it your way, using lag() and a cumulative sum. That requires a subquery:
select k.*,
sum(case when prev_subgrp = subgrp then 0 else 1 end) over (order by id) as grp
from (select k.*,
lag(subgrp) over (order by id) as prev_subgrp
from k
) k
Below can potentially perform better - but has limitation - I assume there is no gaps in numbering within subgroups and respective ids
#standardSQL
WITH k AS (
SELECT 1 id, 1 subgrp, 'stuff1' content UNION ALL
SELECT 2, 2, 'stuff2' UNION ALL
SELECT 3, 3, 'stuff3' UNION ALL
SELECT 4, 4, 'stuff4' UNION ALL
SELECT 5, 1, 'ostuff1' UNION ALL
SELECT 6, 2, 'ostuff2' UNION ALL
SELECT 7, 3, 'ostuff3' UNION ALL
SELECT 8, 4, 'ostuff4'
)
SELECT
ROW_NUMBER() OVER(ORDER BY id) grp,
rcd
FROM (
SELECT
MIN(id) id,
ARRAY_AGG(STRUCT(subgrp, content)) rcd
FROM k
GROUP BY id - subgrp
)
result is
Row grp rcd.subgrp rcd.content
1 1 1 stuff1
2 stuff2
3 stuff3
4 stuff4
2 2 1 ostuff1
2 ostuff2
3 ostuff3
4 ostuff4

Sorting by max value [duplicate]

This question already has answers here:
How to select records with maximum values in two columns?
(2 answers)
Closed 9 years ago.
I have a table that looks like this in an Oracle DB:
TransactionID Customer_id Sequence Activity
---------- ------------- ---------- -----------
1 85 1 Forms
2 51 2 Factory
3 51 1 Forms
4 51 3 Listing
5 321 1 Forms
6 321 2 Forms
7 28 1 Text
8 74 1 Escalate
And I want to be able to sort out all rows where sequence is the highest for each customer_id.
I there a MAX() function I could use on sequence but based on customer_id somehow?
I would like the result of the query to look like this:
TransactionID Customer_id Sequence Activity
---------- ------------- ---------- -----------
1 85 1 Forms
4 51 3 Listing
6 321 2 Forms
7 28 1 Text
8 74 1 Escalate
select t1.*
from your_table t1
inner join
(
select customer_id, max(Sequence) mseq
from your_table
group by customer_id
) t2 on t1.customer_id = t2.customer_id and t1.sequence = t2.mseq
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE tbl ( TransactionID, Customer_id, Sequence, Activity ) AS
SELECT 1, 85, 1, 'Forms' FROM DUAL
UNION ALL SELECT 2, 51, 2, 'Factory' FROM DUAL
UNION ALL SELECT 3, 51, 1, 'Forms' FROM DUAL
UNION ALL SELECT 4, 51, 3, 'Listing' FROM DUAL
UNION ALL SELECT 5, 321, 1, 'Forms' FROM DUAL
UNION ALL SELECT 6, 321, 2, 'Forms' FROM DUAL
UNION ALL SELECT 7, 28, 1, 'Text' FROM DUAL
UNION ALL SELECT 8, 74, 1, 'Escalate' FROM DUAL;
Query 1:
SELECT
MAX( TransactionID ) KEEP ( DENSE_RANK LAST ORDER BY Sequence ) AS TransactionID,
Customer_ID,
MAX( Sequence ) KEEP ( DENSE_RANK LAST ORDER BY Sequence ) AS Sequence,
MAX( Activity ) KEEP ( DENSE_RANK LAST ORDER BY Sequence ) AS Activity
FROM tbl
GROUP BY Customer_ID
ORDER BY TransactionID
Results:
| TRANSACTIONID | CUSTOMER_ID | SEQUENCE | ACTIVITY |
|---------------|-------------|----------|----------|
| 1 | 85 | 1 | Forms |
| 4 | 51 | 3 | Listing |
| 6 | 321 | 2 | Forms |
| 7 | 28 | 1 | Text |
| 8 | 74 | 1 | Escalate |
Please Try it
with cte as
(
select Customer_id,MAX(Sequence) as p from Tablename group by Customer_id
)
select b.* from cte a join Tablename b on a.p = b.Sequence where a.p = b.Sequence and a.Customer_id=b.Customer_id order by b.TransactionID