SQL: How to add values according to index columns

I have an SQL table which looks like the following:
value | position | relates_to_position | type
------+----------+---------------------+------
  100 |        2 |                NULL |    1
   50 |        6 |                NULL |    2
   20 |        7 |                   6 |    3
From this I need to create a result table that adds the value of every line with a relates_to_position set to the line whose position equals that relates_to_position.
For the above table, the result would be:
value | position | relates_to_position | type
------+----------+---------------------+------
  100 |        2 |                NULL |    1
   70 |        6 |                NULL |    2
I am quite a newbie in SQL, so I would be glad for help. The database I use is Oracle XE 11. There will only ever be a single level of relates_to_position, meaning that if relates_to_position is set on a line, no other line will reference that line.

This assumes only one level of hierarchy; with multiple levels of hierarchy it gets more interesting.
SELECT A.Value + COALESCE(B.Value, 0) AS Value
     , A.Position
     , A.Relates_To_Position
     , A.Type
FROM your_table A
LEFT JOIN your_table B
       ON B.Relates_To_Position = A.Position
WHERE A.Relates_To_Position IS NULL
What this does is a self join, so related records end up on the same row. It then eliminates all records that have a value in relates_to_position, as those have already been added to a parent row.
We use a LEFT JOIN because not all records will have a related value, and we use COALESCE so that NULLs are not added (COALESCE returns the first non-NULL value).
Not sure why you need relates_to_position returned, as it will ALWAYS be NULL.
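For reference, here is a minimal sketch (not part of the original answer) running that query against the sample rows from the question, wrapped in a WITH clause; your_table is just a placeholder name:
-- Sample data from the question as an inline view; the join and filter are
-- exactly those of the query above.
WITH your_table AS (SELECT 100 value, 2 position, NULL relates_to_position, 1 type FROM dual UNION ALL
                    SELECT 50 value, 6 position, NULL relates_to_position, 2 type FROM dual UNION ALL
                    SELECT 20 value, 7 position, 6 relates_to_position, 3 type FROM dual)
SELECT A.value + COALESCE(B.value, 0) AS value
     , A.position
     , A.relates_to_position
     , A.type
FROM your_table A
LEFT JOIN your_table B
       ON B.relates_to_position = A.position
WHERE A.relates_to_position IS NULL;
-- Expected output: (100, 2, NULL, 1) and (70, 6, NULL, 2)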

If you can have more than one level of hierarchy and they all need to sum up to the root position, then the following ought to do the trick:
WITH sample_data AS (SELECT 100 VALUE, 2 position, NULL relates_to_position, 1 TYPE FROM dual UNION ALL
                     SELECT 50 VALUE, 6 position, NULL relates_to_position, 2 TYPE FROM dual UNION ALL
                     SELECT 20 VALUE, 7 position, 6 relates_to_position, 3 TYPE FROM dual UNION ALL
                     SELECT 10 VALUE, 8 position, 7 relates_to_position, 3 TYPE FROM dual)
SELECT SUM(VALUE) VALUE,
       root_position position,
       root_type TYPE
FROM   (SELECT value,
               position,
               TYPE,
               connect_by_root(position) root_position,
               connect_by_root(TYPE) root_type
        FROM   sample_data
        CONNECT BY PRIOR position = relates_to_position
        START WITH relates_to_position IS NULL)
GROUP BY root_position,
         root_type;
     VALUE   POSITION       TYPE
---------- ---------- ----------
       100          2          1
        80          6          2

How to process a column that holds comma-separated or range string values in Oracle

I am using an Oracle 12c DB and have the following example table data that I need assistance with, using SQL and PL/SQL.
Table data is as follows:
Table Name: my_data
ID      ITEM        ITEM_LOC
------- ----------- ----------------
1       Item-1      0,1
2       Item-2      0,1,2,3,4,7
3       Item-3      0-48
4       Item-4      0,1,2,3,4,5,6,7,8
5       Item-5      1-33
6       Item-6      0,1
7       Item-7      0,1,5,8
Using the data above in the my_data table, what is the best way to process ITEM_LOC, given that I need to use the values in this column as individual values? For example:
0,1 means the SQL needs to return either 0 or 1, or
for range values, e.g.:
0-48 means the SQL needs to return a value between 0 and 48.
For both scenarios the returned values should be taken from lowest to highest and can't be re-used once processed.
Based on this, it would be great to have a function that takes the ID and returns an individual value from ITEM_LOC that hasn't been used yet. The column could hold a comma-separated string value or a range string value.
The desired result for ID = 2 could be 7. For ID = 2, ITEM_LOC = 7 could then not be used again.
The desired result for ID = 5 could be 31. For ID = 5, ITEM_LOC = 31 could then not be used again.
To track the ITEM_LOC values that can no longer be used against an ID, I am considering either another table to hold them, or splitting all the data into separate rows with a new column called VALUE_USED.
This query shows how to extract the list of ITEM_LOC values based on whether they are comma-separated (which means "take exactly those values") or dash-separated (which means "find all values between the starting and end point"). I modified your sample data a little bit (didn't feel like displaying ~50 values if 5 of them do the job).
Lines #1 - 6 represent the sample data.
The first SELECT (lines #7 - 15) splits comma-separated values into rows.
The second SELECT (lines #17 - 26) uses a hierarchical query which keeps adding 1 to the starting value, up to the item's end value.
SQL> with my_data (id, item, item_loc) as
  2    (select 2, 'Item-2', '0,2,4,7' from dual union all
  3     select 7, 'Item-7', '0,1,5'   from dual union all
  4     select 3, 'Item-3', '0-4'     from dual union all
  5     select 8, 'Item-8', '5-8'     from dual
  6    )
  7  select id,
  8         item,
  9         regexp_substr(item_loc, '[^,]+', 1, column_value) loc
 10  from my_data
 11       cross join table(cast(multiset
 12           (select level from dual
 13            connect by level <= regexp_count(item_loc, ',') + 1
 14           ) as sys.odcinumberlist))
 15  where instr(item_loc, '-') = 0
 16  union all
 17  select id,
 18         item,
 19         to_char(to_number(regexp_substr(item_loc, '^\d+')) + column_value - 1) loc
 20  from my_data
 21       cross join table(cast(multiset
 22           (select level from dual
 23            connect by level <= to_number(regexp_substr(item_loc, '\d+$')) -
 24                                to_number(regexp_substr(item_loc, '^\d+')) + 1
 25           ) as sys.odcinumberlist))
 26  where instr(item_loc, '-') > 0
 27  order by id, item, loc;
        ID ITEM   LOC
---------- ------ ----------------------------------------
         2 Item-2 0
         2 Item-2 2
         2 Item-2 4
         2 Item-2 7
         3 Item-3 0
         3 Item-3 1
         3 Item-3 2
         3 Item-3 3
         3 Item-3 4
         7 Item-7 0
         7 Item-7 1
         7 Item-7 5
         8 Item-8 5
         8 Item-8 6
         8 Item-8 7
         8 Item-8 8

16 rows selected.
SQL>
I don't know what you meant by saying that "item_loc could not be used again". Used where? If you use the above query in, for example, a cursor FOR loop, then yes - those values would be used only once, as every loop iteration fetches the next item_loc value.
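For illustration only, here is a minimal sketch of such a cursor FOR loop, reduced to the comma-separated branch of the query and inline sample data; the loop body is just a placeholder for whatever "using" a value means in your process:
begin
  for r in (
    with my_data (id, item, item_loc) as
      (select 2, 'Item-2', '0,2,4,7' from dual)
    select id,
           item,
           regexp_substr(item_loc, '[^,]+', 1, column_value) loc
    from my_data
         cross join table(cast(multiset
             (select level from dual
              connect by level <= regexp_count(item_loc, ',') + 1
             ) as sys.odcinumberlist))
  ) loop
    -- each iteration fetches (and thereby "uses") one loc value exactly once
    dbms_output.put_line(r.item || ' uses loc ' || r.loc);
  end loop;
end;
/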
As others have said, it's a bad idea to store data in this way. You very likely could have input like this, and you likely could need to display the data like this, but you don't have to store the data the way it is input or displayed.
I'm going to store the data as individual LOC elements based on the input. I assume the data contains only integers separated by commas, or pairs of integers separated by a hyphen. Whitespace is ignored. The comma-separated list does not have to be in any order. In pairs, if the left integer is greater than the right integer I return no LOC element.
create table t as
with input(id, item, item_loc) as (
select 1, 'Item-1', ' 0,1' from dual union all
select 2, 'Item-2', '0,1,2,3,4,7' from dual union all
select 3, 'Item-3', '0-48' from dual union all
select 4, 'Item-4', '0,1,2,3,4,5,6,7,8' from dual union all
select 5, 'Item-5', '1-33' from dual union all
select 6, 'Item-6', '0,1' from dual union all
select 7, 'Item-7', '0,1,5,8,7 - 11' from dual
)
select distinct id, item, loc from input, xmltable(
'let $item := if (contains($X,",")) then ora:tokenize($X,"\,") else $X
for $i in $item
let $j := if (contains($i,"-")) then ora:tokenize($i,"\-") else $i
for $k in xs:int($j[1]) to xs:int($j[count($j)])
return $k'
passing item_loc as X
columns loc number path '.'
);
Now to "use" an element I just delete it from the table:
delete from t where rowid = (
select min(rowid) keep (dense_rank first order by loc)
from t
where id = 7
);
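If you want that packaged the way the question asks (a function that takes an ID and hands back one unused value), a rough sketch could wrap the same DELETE; the function name use_next_loc is hypothetical and assumes the table t built above:
-- Removes (and thereby "uses") the smallest remaining loc for the given id
-- and returns it; returns NULL when nothing is left for that id.
create or replace function use_next_loc (p_id in t.id%type) return t.loc%type is
  v_loc t.loc%type;
begin
  delete from t
  where rowid = (
    select min(rowid) keep (dense_rank first order by loc)
    from t
    where id = p_id
  )
  returning loc into v_loc;
  return v_loc;
end;
/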
To return the data in the same format it was input, use MATCH_RECOGNIZE:
select id, item, listagg(item_loc, ',') within group (order by first_loc) item_loc
from t
match_recognize(
  partition by id, item order by loc
  measures a.loc first_loc,
           a.loc || case count(*) when 1 then null else '-'||b.loc end item_loc
  pattern (a b*)
  define b as loc = prev(loc) + 1
)
group by id, item;
ID ITEM   ITEM_LOC
-- ------ --------
 1 Item-1 0-1
 2 Item-2 0-4,7
 3 Item-3 0-48
 4 Item-4 0-8
 5 Item-5 1-33
 6 Item-6 0-1
 7 Item-7 1,5,7-11
Note that the output here will not be exactly like the input, because any consecutive integers will be compressed into a pair.

Compare column entry to every other entry in the same column

I have a Column of values in SQLite.
value
-----
1
2
3
4
5
For each value I would like to know how many of the other values are larger, and display the result. E.g. for value 1 there are 4 entries that have higher values.
value | Count
------+------
    1 |     4
    2 |     3
    3 |     2
    4 |     1
    5 |     0
I have tried nested select statements and using the Count(*) function but I do not seem to be able to extract the correct levels. Any suggestions would be much appreciated.
Many Thanks
You can do this with a correlated subquery in SQLite:
select value,
(select count(*) from t t2 where t2.value > t.value) as "count"
from t;
In most other databases, you would use a window/ranking function such as rank() or dense_rank(), but older versions of SQLite (before 3.25) do not support these functions.
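For what it's worth, on databases that do have window functions (and on SQLite 3.25 or later, which added them), a sketch of an equivalent query would be:
-- rank() over a descending order is 1 plus the number of strictly larger
-- values, so subtracting 1 gives the same count as the correlated subquery.
select value,
       rank() over (order by value desc) - 1 as "count"
from t;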

Parent and child relationship is broken when sorting based on parent name

I want to sort by name based on the 1st level (the sort is only applicable where root id is null, i.e. level == 1). If I simply sort based on name, the result set breaks the parent and child relationship.
I have 500,000 records. What is the best way to do this performance-wise?
ID   PARENT ID   ROOT ID   NAME       level
===============================================
1    NULL        NULL      FIRST      1
2    1           1         SECOND     2
3    2           1         THIRD      3
4    1           1         FORTH      4
5    4           1         FIFTH      5
6    NULL        NULL      SIXTH      1
7    6           6         SEVENTH    2
8    7           6         EIGHTH     2
9    NULL        NULL      NINTH      1
10   NULL        NULL      TENTH      1
11   NULL        NULL      ELEVEN     1
12   11          11        TWELVE     2
13   12          11        THIRTEEN   3
14   13          11        FOURTEEN   4
EXPECTED OUTPUT - SORT BY NAME ASC
ID   PARENT ID   ROOT ID   NAME       level
===============================================
11   NULL        NULL      ELEVEN     1
12   11          11        TWELVE     2
13   12          11        THIRTEEN   3
14   13          11        FOURTEEN   4
1    NULL        NULL      FIRST      1
2    1           1         SECOND     2
3    2           1         THIRD      3
4    1           1         FORTH      4
5    4           1         FIFTH      5
9    NULL        NULL      NINTH      1
6    NULL        NULL      SIXTH      1
7    6           6         SEVENTH    2
8    7           6         EIGHTH     2
10   NULL        NULL      TENTH      1
It appears that you want to sort the results by two keys:
first, by the NAME column of the root row associated with each row, which may be that row itself, and
second, by the level column.
You can achieve this by joining the table to itself to make the root-NAME association. For example,
select a.*
from my_table a
join my_table b
  on isnull(a.[root id], a.id) = b.id
order by b.name, a.level
Notice the isnull() -- although you generally want to join based on b.id = a.[root id], you need to avoid excluding the root rows, whose root_id is NULL. With the isnull(), you join those rows based on id instead (i.e. you join them to themselves).
You can use a LEFT JOIN to get the name of the root row, then use that name to ORDER:
SELECT t1.ID, t1.[PARENT ID], t1.[ROOT ID], t1.NAME, t1.level
FROM mytable AS t1
LEFT JOIN mytable AS t2 ON t2.ID = t1.[ROOT ID]
ORDER BY COALESCE(t2.NAME, t1.NAME), t1.level
If the root's name is not available, then the current row is itself a root, so the row's own NAME field is used for sorting instead.
Finally, records that belong to the same root are sorted by level.
This should give you what I think you're looking for:
SELECT
    T1.id,
    T1.parent_id,
    T1.root_id,
    T1.name,
    T1.level
FROM
    Your_Table T1
    LEFT JOIN Your_Table T2 ON T2.id = T1.root_id
ORDER BY
    COALESCE(T2.name, T1.name),
    T1.level

In SQL, find duplicates in one column with unique values for another column

So I have a table of aliases linked to record ids. I need to find duplicate aliases with unique record ids. To explain better:
ID   Alias    Record ID
1    000123   4
2    000123   4
3    000234   4
4    000123   6
5    000345   6
6    000345   7
The result of a query on this table should be something to the effect of
000123 4 6
000345 6 7
This indicates that both records 4 and 6 have an alias of 000123, and both records 6 and 7 have an alias of 000345.
I was looking into using GROUP BY, but if I group by alias then I can't select record id, and if I group by both alias and record id it will only return the first two rows in this example, where both columns are duplicates. The only solution I've found (and it's a terrible one that crashed my server) is to do two different selects for all the data and then join them
ON [T_1].[ALIAS] = [T_2].[ALIAS] AND NOT [T_1].[RECORD_ID] = [T_2].[RECORD_ID]
Are there any solutions out there that would work better? As in, not crash my server when run on a few hundred thousand records?
It looks as if you have two requirements:
Identify all aliases that have more than one record id, and
List the record ids for these aliases horizontally.
The first is a lot easier to do than the second. Here's some SQL that ought to get you where you want with the first:
WITH A -- Get a list of unique combinations of Alias and [Record ID]
AS (
    SELECT DISTINCT
           Alias
         , [Record ID]
    FROM T1
)
, B -- Get a list of all those Alias values that have more than one [Record ID] associated
AS (
    SELECT Alias
    FROM A
    GROUP BY Alias
    HAVING COUNT(*) > 1
)
SELECT A.Alias
     , A.[Record ID]
FROM A
JOIN B
  ON A.Alias = B.Alias
Now, as for the second. If you're satisfied with the data in this form:
Alias    Record ID
000123   4
000123   6
000345   6
000345   7
... you can stop there. Otherwise, things get tricky.
The PIVOT command will not necessarily help you, because it's trying to solve a different problem than the one you have.
I am assuming that you can't necessarily predict how many duplicate Record ID values you have per Alias, and thus don't know how many columns you'll need.
If you have only two, then displaying each of them in a column becomes a relatively trivial exercise. If you have more, I'd urge you to consider whether the destination for these records (a report? A web page? Excel?) might be able to do a better job of displaying them horizontally than SQL Server can do of returning them that way.
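That said, if you do want one row per alias, here is a sketch that builds on CTE A above and uses STRING_AGG (available from SQL Server 2017 onward; earlier versions would need a FOR XML PATH workaround instead):
-- One row per duplicated Alias, with its Record IDs listed horizontally.
WITH A AS (
    SELECT DISTINCT Alias, [Record ID]
    FROM T1
)
SELECT Alias,
       STRING_AGG(CAST([Record ID] AS varchar(20)), ' ')
           WITHIN GROUP (ORDER BY [Record ID]) AS RecordIds
FROM A
GROUP BY Alias
HAVING COUNT(*) > 1;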
Perhaps what you want is just the min() and max() of RecordId:
select Alias, min(RecordID), max(RecordId)
from yourTable t
group by Alias
having min(RecordId) <> max(RecordId)
You can also count the number of distinct values, using count(distinct):
select Alias, count(distinct RecordId) as NumRecordIds, min(RecordID), max(RecordId)
from yourTable t
group by Alias
having count(DISTINCT RecordID) > 1;
This will give all repeated values:
select Alias, count(RecordId) as NumRecordIds
from yourTable t
group by Alias
having count(RecordId) <> count(distinct RecordId);
I agree with Ann L's answer, but would like to show how you can use window functions with CTEs, as you may prefer the readability.
(Re: how to pivot horizontally, I again agree with Ann.)
create temporary table things (
  id serial primary key,
  alias varchar,
  record_id int
);
insert into things (alias, record_id) values
('000123', 4),
('000123', 4),
('000234', 4),
('000123', 6),
('000345', 6),
('000345', 7);
with
  things_with_distinct_aliases_and_record_ids as (
    select distinct on (alias, record_id)
           id,
           alias,
           record_id
    from things
  ),
  things_with_unique_record_id_counts_per_alias as (
    select *,
           COUNT(*) OVER(PARTITION BY alias) as unique_record_ids_count
    from things_with_distinct_aliases_and_record_ids
  )
select * from things_with_unique_record_id_counts_per_alias
where unique_record_ids_count > 1
The first CTE gets all the unique alias/record id combinations. E.g.
id | alias | record_id
----+--------+-----------
1 | 000123 | 4
4 | 000123 | 6
3 | 000234 | 4
5 | 000345 | 6
6 | 000345 | 7
The second CTE simply creates a new column for the above and adds the count of record ids for each alias. This allows you to filter only those aliases which have more than one record id associated with them.
id | alias | record_id | unique_record_ids_count
----+--------+-----------+-------------------------
1 | 000123 | 4 | 2
4 | 000123 | 6 | 2
3 | 000234 | 4 | 1
5 | 000345 | 6 | 2
6 | 000345 | 7 | 2
SELECT A.CitationId, B.CitationId, A.CitationName, A.LoaderID, A.PrimaryReferenceLoaderID,
       B.SecondaryReference1LoaderID, A.SecondaryReference1LoaderID, A.SecondaryReference2LoaderID,
       A.SecondaryReference3LoaderID, A.SecondaryReference4LoaderID, A.CreatedOn, A.LastUpdatedOn
FROM CitationMaster A, CitationMaster B
WHERE A.PrimaryReferenceLoaderID = B.SecondaryReference1LoaderID
  AND ISNULL(A.PrimaryReferenceLoaderID, '') != ''
  AND ISNULL(B.SecondaryReference1LoaderID, '') != ''

Why does CONNECT BY LEVEL on a table return extra rows?

Using CONNECT BY LEVEL seems to return too many rows when performed on a table. What is the logic behind what's happening?
Assuming the following table:
create table a ( id number );
insert into a values (1);
insert into a values (2);
insert into a values (3);
This query returns 12 rows (SQL Fiddle).
select id, level as lvl
from a
connect by level <= 2
order by id, level
One row for each row in table A where the LVL column is 1, and three rows for each row in table A where LVL is 2, i.e.:
ID | LVL
---+-----
1 | 1
1 | 2
1 | 2
1 | 2
2 | 1
2 | 2
2 | 2
2 | 2
3 | 1
3 | 2
3 | 2
3 | 2
It is equivalent to this query, which returns the same results.
select id, level as lvl
from dual
cross join a
connect by level <= 2
order by id, level
I don't understand why these queries return 12 rows or why there are three rows where LVL is 2 and only one where LVL is 1 for each value of the ID column.
Increasing the number of levels that are "connected" to 3 returns 13 rows for each value of ID: 1 where LVL is 1, 3 where LVL is 2 and 9 where LVL is 3. This seems to suggest that the number of rows returned per ID at a given level is the number of rows in table A raised to the power of (LVL - 1).
I would have thought that these queries would be the same as the following, which returns 6 rows:
select id, lvl
from ( select level as lvl
from dual
connect by level <= 2
)
cross join a
order by id, lvl
The documentation isn't particularly clear, to me, in explaining what should occur. What's happening with these powers and why aren't the first two queries the same as the third?
When connect by is used without a start with clause and without the prior operator, there is no restriction on which rows join as children to a parent row. What Oracle does in this situation is return all possible hierarchy permutations, connecting every row, as a child, to every row of the level above.
SQL> select id
  2       , level as lvl
  3       , sys_connect_by_path(id, '->') as ph
  4  from a
  5  connect by level <= 2
  6  ;

        ID        LVL PH
---------- ---------- ----------
         1          1 ->1
         1          2 ->1->1
         2          2 ->1->2
         3          2 ->1->3
         2          1 ->2
         1          2 ->2->1
         2          2 ->2->2
         3          2 ->2->3
         3          1 ->3
         1          2 ->3->1
         2          2 ->3->2
         3          2 ->3->3

12 rows selected
In the first query, you connect by just the level.
So with level <= 1, you get each of the records once. With level <= 2, you get each record once (for level 1) plus N more times (where N is the number of records in the table). It is as if you are cross joining, because you are just picking all records from the table until the level is reached, without any other condition to limit the result. For level <= 3, this is done again for each of those results.
So for 3 records:
Lvl 1: 3 records (all having level 1)
Lvl 2: 3 records having level 1 + 3*3 records having level 2 = 12
Lvl 3: 3 + 3*3 + 3*3*3 = 39 (indeed, 13 rows per value of ID)
Lvl 4: starting to see a pattern? :)
It's not really a cross join, though. A cross join would only return the records that have level 2 in this query's result, while with this connect by you get the records having level 1 as well as the records having level 2, thus resulting in 3 + 3*3 rows instead of just 3*3.
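A quick way to see that pattern against the table a from the question (a small sketch, not part of the original answer):
-- For the 3-row table this returns 3 rows at level 1, 9 at level 2
-- and 27 at level 3: 39 rows in total.
select level as lvl, count(*) as cnt
from a
connect by level <= 3
group by level
order by level;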
You're comparing apples to oranges when comparing the final query to the others, as in that query LEVEL is isolated to the 1-row dual table.
Let's consider this query:
select id, level as lvl
from a
connect by level <= 2
order by id, level
What that is saying is: start with the table set (select * from a). Then, for each row returned, connect this row to the prior row. As you have not defined a join condition in the connect by, this is in effect a Cartesian join, so when you have 3 rows (1, 2, 3), 1 joins to 2, 1->3, 2->1, 2->3, 3->1 and 3->2, and the rows also join to themselves: 1->1, 2->2 and 3->3. These joins are level = 2, so we have 9 joins there, which is why you get 12 rows (3 original "level 1" rows plus the Cartesian set).
So, for level <= 2, the number of rows output = rowcount + rowcount^2.
In the last query you are isolating LEVEL to this:
select level as lvl
from dual
connect by level <= 2
which of course returns 2 rows. This is then cross joined to the original 3 rows, giving 6 rows as output.
You can use the technique below to overcome this issue:
select id, lev.l as lvl
from a
left outer join (select level l from dual connect by level <= 2) lev
  on 1 = 1
order by id, lev.l