Oracle SQL with regex - getting duplicate records

Here are the records in my table (the table name is temp):
1 | java,c,.net
2 | oracle,hadoop,ruby
I am looking for output like this:
1 | java
1 | c
1 | .net
2 | oracle
2 | hadoop
2 | ruby
I wrote the query below, but the result does not match what I expected. Can you please check my query and tell me why it produces duplicates?
select id,
regexp_substr(liked,'[^,]+', 1, level)
from temp
connect by regexp_substr(liked,'[^,]+', 1, level) is not null
order by id

You want to do something like this:
SELECT id, REGEXP_SUBSTR(liked, '[^,]+', 1, LEVEL)
FROM temp
CONNECT BY REGEXP_SUBSTR(liked, '[^,]+', 1, LEVEL) IS NOT NULL
AND PRIOR id = id
AND PRIOR SYS_GUID() IS NOT NULL
ORDER BY id;
This way using DISTINCT is not necessary.
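EDIT: Without a non-deterministic condition such as PRIOR SYS_GUID() IS NOT NULL in the CONNECT BY clause, you will get an error (ORA-01436: CONNECT BY loop in user data). With only PRIOR id = id, each row connects back to itself, so Oracle treats it as an infinite loop.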
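To see why the extra PRIOR id = id condition removes the duplicates, here is a minimal Python sketch (not Oracle itself) that mimics how CONNECT BY builds rows level by level when there is no START WITH clause, so every row is a root. The helper names nth_token and connect_by are made up for illustration, and the model deliberately ignores Oracle's loop detection, which is the part that PRIOR SYS_GUID() IS NOT NULL works around.

```python
# Sample rows from the question: (id, liked)
ROWS = [(1, "java,c,.net"), (2, "oracle,hadoop,ruby")]


def nth_token(liked, n):
    """Rough stand-in for REGEXP_SUBSTR(liked, '[^,]+', 1, n): None if absent."""
    tokens = [t for t in liked.split(",") if t]
    return tokens[n - 1] if n <= len(tokens) else None


def connect_by(rows, same_id):
    """Simulate CONNECT BY <token at LEVEL is not null> [AND PRIOR id = id]."""
    out = []
    level = 1
    frontier = list(rows)  # no START WITH: every row is a level-1 root
    while frontier:
        out.extend((rid, nth_token(liked, level)) for rid, liked in frontier)
        level += 1
        # Each parent is joined to every row that satisfies the CONNECT BY condition.
        frontier = [
            child
            for parent in frontier
            for child in rows
            if nth_token(child[1], level) is not None
            and (not same_id or child[0] == parent[0])
        ]
    return out


without_prior = connect_by(ROWS, same_id=False)
with_prior = connect_by(ROWS, same_id=True)
print(len(without_prior), len(with_prior))  # 14 6
```

Without the id condition, each parent row joins to every row in the table, so the row count grows at each level (2 + 4 + 8 = 14 here). With it, each id only follows itself and the query returns the 6 expected rows.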

To get the exact result you're looking for, you need to both use DISTINCT and order by LEVEL within ID:
SELECT ID, LIKED
FROM (select DISTINCT id, level, regexp_substr(liked,'[^,]+', 1, level) as liked
from temp
connect by regexp_substr(liked,'[^,]+', 1, level) is not null
order by id, level);
Best of luck.

Related

Nested decode statement in ORACLE SQL

I'm writing a simple SQL statement where I need to find my boss's boss. For example, if there are three ranks in the column (lil boss, boss and big boss) and I input a lil boss id, it needs to return the big boss id. I think this query must be written with CASE or DECODE statements, but I am struggling with that. In the code below everything runs fine unless I input a lil boss id. Can I use nested DECODE or CASE statements? I think a UNION statement might also be a solution.
My table has three columns:
id | status | boss_id
1 | big boss | null
2 | big boss | null
3 | lil boss | 4
4 | boss | 2
If I input 3, it has to return 2.
select decode(status,'big boss', id,
'boss',boss_id,
'lil boss',boss_id)
from bosses
where id=123113;
I need to find my boss's boss
Use a hierarchical query:
SELECT CONNECT_BY_ROOT id AS root_id,
CONNECT_BY_ROOT status AS root_status,
id,
status
FROM bosses
WHERE LEVEL = 3
OR (LEVEL < 3 AND CONNECT_BY_ISLEAF = 1)
START WITH id = 3
CONNECT BY PRIOR boss_id = id;
Which, for the sample data:
CREATE TABLE bosses (id, status, boss_id) AS
SELECT 1, 'big boss', null FROM DUAL UNION ALL
SELECT 2, 'big boss', null FROM DUAL UNION ALL
SELECT 3, 'lil boss', 4 FROM DUAL UNION ALL
SELECT 4, 'boss', 2 FROM DUAL;
Outputs:
ROOT_ID | ROOT_STATUS | ID | STATUS
3 | lil boss | 2 | big boss
and if you START WITH id = 2:
SELECT CONNECT_BY_ROOT id AS root_id,
CONNECT_BY_ROOT status AS root_status,
id,
status
FROM bosses
WHERE LEVEL = 3
OR (LEVEL < 3 AND CONNECT_BY_ISLEAF = 1)
START WITH id = 2
CONNECT BY PRIOR boss_id = id;
Then it outputs:
ROOT_ID | ROOT_STATUS | ID | STATUS
2 | big boss | 2 | big boss
I was able to come to a solution to this problem, using hierarchical queries and the solution from @MT0. Here is code that does not need to use LEVEL. I think it is the best and simplest solution to the problem. Thanks for helping me!
select id as bossid,
status as boss_status
from bosses b
where b.status='big boss'
start with id = 4
connect by prior boss_id = id;
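The idea behind this query (walk up the boss_id links and keep the row whose status is 'big boss') can be sketched in Python. This is only an illustration of the traversal, not what Oracle executes; the function name find_big_boss and the dictionary layout are assumptions made for the example.

```python
# Sample data from the question: id -> (status, boss_id)
BOSSES = {
    1: ("big boss", None),
    2: ("big boss", None),
    3: ("lil boss", 4),
    4: ("boss", 2),
}


def find_big_boss(start_id, bosses=BOSSES):
    """Follow boss_id links upward from start_id; return the first 'big boss' id."""
    seen = set()
    current = start_id
    while current is not None and current not in seen:
        seen.add(current)  # guard against cycles in the data
        status, boss_id = bosses[current]
        if status == "big boss":
            return current
        current = boss_id
    return None  # the chain ended without reaching a big boss


print(find_big_boss(3))  # 2
```

Starting from the lil boss (3) the walk goes 3, 4, 2 and stops at 2; starting from a big boss returns that same id.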

How to limit number of groups returned in a query, but not the number of rows in Oracle

How to limit the number of groups in a query, but not the number of rows in Oracle?
If I had to do that manually, I would have to use a DISTINCT.
Would be something like this:
FOR d IN (
SELECT DISTINCT COLUMN_1 FROM myTable
WHERE myDate BETWEEN x AND y
OFFSET o ROWS
FETCH NEXT l ROWS ONLY
) LOOP
And then run a select for each of the ids returned by that query, which, in my opinion, is a terrible solution.
SAMPLE DATA:
If I limit the number of groups to 2 by using COLUMN_2, the expected result should be something like:
I believe you may be looking for something like this:
select *
from my_table
where id in (
select distinct id
from my_table
where my_date between x and y
fetch first :n rows only
)
;
:n is a bind variable, encoding the number of groups you want to select.
This should be more efficient than solutions using analytic functions - even if it must read the base table twice. In tests posted on OTN, I showed that the difference is not small.
EDIT If I remember correctly, FETCH is not implemented in the most efficient way (perhaps for good reasons, having to do with features we don't need in this query - such as how to deal with ties). FETCH itself resembles a DENSE_RANK() implementation rather than the faster row limiting clause (using ROWNUM). I would likely need to modify the query to do away with FETCH, if speed was really important. END EDIT
Further edit to do with performance comparisons
Frequent poster MT0 requested a pointer for the claim that aggregate solutions can be (and often are) more efficient than analytic function approaches, even when the former may require multiple passes through the data where the analytic function approach requires only one.
Alas, OTN (what now calls itself the "Oracle Groundbreakers Developer Community", the discussion board hosted by Oracle itself) went through a massive - and massively botched - platform change at the end of September 2020; that messed up both the search facilities and the formatting of old posts, to the point of rendering them almost unusable.
Instead, I will show here a simple mock-up of the OP's problem in this thread; code that anyone can run so they can repeat the tests on their own machine.
I created a table with two columns, ID and STR - the ID plays the same role as in the OP's question, and STR is just extra payload to mimic real-life data. ID is number and STR is varchar2(100). I populated the table with 9 million rows - 1 million ID's, nine rows for each ID. The task is to select just three "groups" (three distinct ID's, then select all the rows from the base table for those three distinct ID's).
With no index on the ID column, the aggregate solution runs in 0.81 seconds on my machine; with an index on ID, it runs in 0.47 seconds. The analytic functions solution runs in 0.91 seconds, with or without an index (obviously - there is no way an index can benefit the analytic function solution). All these results are for column ID not declared NOT NULL.
Here is the code to create the table, the index on ID, and the two queries I tested. Note: As I explained in my first edit (above), fetch is slow; I replaced it with a standard row-limiting technique using ROWNUM in an over-query.
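-- Test table: ID plus a 100-character payload
drop table t purge;
create table t (id number, str varchar2(100));
-- 3000 x 3000 = 9 million rows; each of the 1 million IDs appears nine times
insert into t
with row_gen as (select level from dual connect by level <= 3000)
select mod(344227 * rownum, 1000000), rpad('x', 100, 'x')
from row_gen cross join row_gen
;
commit;
-- Index on ID (the tests were run with and without it)
create index t_idx on t(id);
-- Aggregate solution: pick three distinct IDs, then fetch all their rows
select *
from t
where id in (
select id from (select distinct id from t)
where rownum <= 3
);
-- Analytic function solution
select *
from ( select t.*, dense_rank() over (order by id) dr from t )
where dr <= 3;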
You can use DENSE_RANK:
SELECT *
FROM (
SELECT t.*,
DENSE_RANK() OVER ( ORDER BY column2 ) AS rnk
FROM table_name t
)
WHERE rnk <= 2;
Which, for the sample data:
CREATE TABLE table_name ( column1, column2, column3, column4 ) AS
SELECT 1, 1, 1.0, 1.0 FROM DUAL UNION ALL
SELECT 2, 2, 2.0, 2.0 FROM DUAL UNION ALL
SELECT 2, 2, 2.2, 2.1 FROM DUAL UNION ALL
SELECT 2, 2, 2.2, 2.2 FROM DUAL UNION ALL
SELECT 2, 2, 2.0, 2.3 FROM DUAL UNION ALL
SELECT 3, 3, 3.0, 3.1 FROM DUAL UNION ALL
SELECT 3, 3, 3.1, 3.1 FROM DUAL UNION ALL
SELECT 3, 3, 3.1, 3.1 FROM DUAL UNION ALL
SELECT 4, 4, 4.2, 4.0 FROM DUAL;
Outputs:
COLUMN1 | COLUMN2 | COLUMN3 | COLUMN4 | RNK
------: | ------: | ------: | ------: | --:
1 | 1 | 1 | 1 | 1
2 | 2 | 2 | 2 | 2
2 | 2 | 2.2 | 2.1 | 2
2 | 2 | 2.2 | 2.2 | 2
2 | 2 | 2 | 2.3 | 2
(and, if you want DISTINCT rows then add DISTINCT to the outer query)
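As a cross-check of the DENSE_RANK logic, here is a small Python sketch that applies the same rule to the sample rows: rank the distinct values of column2 without gaps, then keep every row whose rank is at most 2. The function name first_groups is made up for this illustration.

```python
# Sample rows from the answer above: (column1, column2, column3, column4)
ROWS = [
    (1, 1, 1.0, 1.0),
    (2, 2, 2.0, 2.0),
    (2, 2, 2.2, 2.1),
    (2, 2, 2.2, 2.2),
    (2, 2, 2.0, 2.3),
    (3, 3, 3.0, 3.1),
    (3, 3, 3.1, 3.1),
    (3, 3, 3.1, 3.1),
    (4, 4, 4.2, 4.0),
]


def first_groups(rows, key_index, max_groups):
    """Keep every row whose key is among the max_groups smallest distinct keys."""
    distinct_keys = sorted({row[key_index] for row in rows})
    # Dense rank: consecutive ranks 1, 2, 3, ... over the distinct key values.
    rank = {key: i for i, key in enumerate(distinct_keys, start=1)}
    return [
        row + (rank[row[key_index]],)  # append the rank, like the RNK column
        for row in rows
        if rank[row[key_index]] <= max_groups
    ]


result = first_groups(ROWS, key_index=1, max_groups=2)
print(len(result))  # 5
```

The number of groups is limited to 2, but all five rows belonging to those two groups are returned.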
If I understand correctly, you want ROW_NUMBER():
SELECT t.*
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) as seqnum
FROM myTable t
WHERE t.myDate BETWEEN x AND y
) t
WHERE seqnum = 1;
This returns an arbitrary row for each id meeting the conditions.

Rewrite a query with "LEVEL" in snowflake

How can I write this SQL query into SNOWFLAKE?
SELECT LEVEL lv FROM DUAL CONNECT BY LEVEL <= 3;
Good starting points are the Snowflake documentation for CONNECT BY (https://docs.snowflake.com/en/sql-reference/constructs/connect-by.html) and for hierarchical queries (https://docs.snowflake.com/en/user-guide/queries-hierarchical.html).
Snowflake also supports recursive CTEs.
It seems like you want to duplicate the rows three times. For a fixed, small multiplier such as 3, you could just enumerate the numbers:
select c.lv, a.*
from abc a
cross join (select 1 lv union all select 2 union all select 3) c
A more generic approach, in the spirit of the original query, uses a standard recursive common-table-expression to generate the numbers:
with cte as (
select 1 lv
union all select lv + 1 from cte where lv < 3
)
select c.lv, a.*
from abc a
cross join cte c
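The filter in the recursive member is tested against the previous value of lv, so it decides which row is the last one produced. As a hedged sketch, the snippet below runs the number generator with two candidate conditions, using SQLite through Python's sqlite3 module as a stand-in for Snowflake (an assumption: both follow the usual recursive CTE evaluation). The helper name generate is made up for the example.

```python
import sqlite3
from contextlib import closing


def generate(condition):
    """Run the number-generating recursive CTE with the given filter."""
    if condition not in ("lv < 3", "lv <= 3"):
        raise ValueError("unexpected condition")
    sql = f"""
        WITH RECURSIVE cte(lv) AS (
            SELECT 1
            UNION ALL
            SELECT lv + 1 FROM cte WHERE {condition}
        )
        SELECT lv FROM cte ORDER BY lv
    """
    with closing(sqlite3.connect(":memory:")) as conn:
        return [row[0] for row in conn.execute(sql)]


print(generate("lv < 3"))   # [1, 2, 3]
print(generate("lv <= 3"))  # [1, 2, 3, 4]
```

To multiply each row exactly three times, the recursion has to stop once lv reaches 3, which is what the strict comparison does.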
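LEVEL exists in Snowflake. The differences from Oracle are:
In Snowflake it is necessary to use PRIOR in the CONNECT BY expression.
And you can't select only LEVEL - the select statement must also include an existing column.
Because the condition is tested against the parent row's level, PRIOR LEVEL <= 3 yields levels 1 to 4, as the output of the example shows; use PRIOR LEVEL < 3 to get exactly three rows.
Example: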
SELECT LEVEL, dummy FROM
(select 'X' dummy ) DUAL
CONNECT BY prior LEVEL <= 3;
LEVEL DUMMY
1 X
2 X
3 X
4 X
As per @Danil Suhomlinov's post, we can simplify further by using COLUMN1, the single column of the special table dual:
SELECT LEVEL, column1 FROM dual
CONNECT BY prior LEVEL <= 3;
LEVEL | COLUMN1
------+--------
1 |
2 |
3 |
4 |

Hierarchical queries in Oracle

I have a table with a structure like (id, parent_id) in Oracle11g.
id parent_id
---------------
1 (null)
2 (null)
3 1
4 3
5 3
6 2
7 2
I'd like to query it to get all the rows that are hierarchically linked to each of these ids, so the results should be:
root_id id parent_id
------------------------
1 3 1
1 4 3
1 5 3
2 6 2
2 7 2
3 4 3
3 5 3
I've been struggling with CONNECT BY and START WITH for quite some time now, and all I can get is a fraction of the results I want, with queries like:
select connect_by_root(id) root_id, id, parent_id from my_table
start with id=1
connect by prior id = parent_id
I'd like to avoid using any for loop to get my complete results.
Any ideas?
Best regards,
Jérôme Lefrère
PS: Edited after the first answer, which made me notice I had forgotten some of the results I want.
The query you posted is missing the from clause and left an underscore out of connect_by_root, but I'll assume those aren't actually the source of your problem.
The following query gives you the result you're looking for:
select * from (
select connect_by_root(id) root_id, id, parent_id
from test1
start with parent_id is null
connect by prior id = parent_id)
where root_id <> id
The central problem is that you were specifying a specific value to start from, rather than specifying a way to identify the root rows. Changing id = 1 to parent_id is null allows the entire contents of the table to be returned.
I also added the outer query to filter the root rows out of the result set, which wasn't mentioned in your question, but was shown in your desired result.
Comment Response:
In the version provided, you do get descendants of id = 3, but not in such a way that 3 is the root. This is because we're starting at the absolute root. Resolving this is easy: just omit the START WITH clause:
SELECT *
FROM
(SELECT connect_by_root(id) root_id,
id,
parent_id
FROM test1
CONNECT BY
PRIOR id = parent_id)
WHERE root_id <> id
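What this query computes can be sketched in Python: with no START WITH clause every row acts as a root, each root is expanded to all of its descendants, and the root row itself is left out (the root_id <> id filter). The function name descendants_per_root is made up for this illustration, and the sketch assumes the data contains no cycles.

```python
# Sample table from the question: (id, parent_id)
TABLE = [(1, None), (2, None), (3, 1), (4, 3), (5, 3), (6, 2), (7, 2)]


def descendants_per_root(table):
    """Return (root_id, id, parent_id) for every row below every possible root."""
    children = {}
    for node_id, parent_id in table:
        children.setdefault(parent_id, []).append((node_id, parent_id))

    result = []
    for root_id, _ in table:  # no START WITH: each row acts as a root
        stack = list(children.get(root_id, []))
        while stack:
            node_id, parent_id = stack.pop()
            result.append((root_id, node_id, parent_id))  # the root itself is never added
            stack.extend(children.get(node_id, []))
    return sorted(result)


for row in descendants_per_root(TABLE):
    print(row)
```

For the sample table this yields the seven rows listed in the question, including the rows rooted at 3.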
try this:
select connect_by_root(id) root_id, id, parent_id
from your_table
start with parent_id is null
connect by prior id = parent_id
It will give you the exact result you want:
select connect_by_root(id) as root, id, parent_id
from test1
connect by prior id=parent_id
start with parent_id is not null;

SQL - How to find all linked ids from a table

I have a table that shows the relationships between people a bit like this.
id linked_id
1 2
1 3
3 4
4 1
There is no apparent order to the table or the relationships.
I'm trying to find a way to list all the ids that have any kind of link to a given id. So, for example, from the table above:
id = 1 would return 1, 2, 3 and 4.
id = 2 would also return 1, 2, 3 and 4
It's an Oracle database, and the query would have to be in plain SQL. Thanks for your help, this has been driving me nuts.
You could use something like this:
SELECT linked_id
FROM DATA
START WITH ID = :1
CONNECT BY NOCYCLE PRIOR ID = linked_id
OR ID = PRIOR linked_id
UNION
SELECT ID
FROM DATA
START WITH linked_id = :1
CONNECT BY NOCYCLE PRIOR ID = linked_id
OR ID = PRIOR linked_id
UNION
SELECT :1 FROM dual
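The query treats the table as an undirected graph and collects everything connected to the starting id. A minimal Python sketch of the same idea, using a breadth-first search, is shown below; the function name linked_ids is an assumption for the example, and this is an illustration of the logic rather than of how Oracle evaluates the query.

```python
from collections import defaultdict, deque

# Sample rows from the question: (id, linked_id)
LINKS = [(1, 2), (1, 3), (3, 4), (4, 1)]


def linked_ids(start, links=LINKS):
    """Return every id reachable from start, following links in either direction."""
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = {start}  # like the final "SELECT :1 FROM dual", the start id is included
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current] - seen:
            seen.add(nxt)  # the visited set plays the role of NOCYCLE
            queue.append(nxt)
    return sorted(seen)


print(linked_ids(1))  # [1, 2, 3, 4]
print(linked_ids(2))  # [1, 2, 3, 4]
```

Both starting points give the same set, as the question requires, because direction is ignored when following links.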