How to select hierarchy collection? (mixed with non hierarchy data, etc) - sql

Given a table (originally posted as an image), I need to show the following:
| ID | PERSONID | MASTERID | CHILDID | VALUE | DEPTHLEVEL |
---------------------------------------------------------------
| 1 | 3 | 78452 | 21456 | 100 | 1 |
| 2 | 3 | 21456 | | 0 | 2 |
| 3 | 3 | 652314 | 417859 | 115 | 1 |
| 4 | 3 | 417859 | | 0 | 2 |
| 5 | 4 | 998654 | 223655 | 300 | 1 |
| 6 | 4 | 223655 | | 0 | 2 |
| 7 | 4 | 201302 |789654,441592| 200 | 1 |
| 8 | 4 | 789654 | | 0 | 2 |
| 9 | 4 | 441592 | | 0 | 2 |
| 10 | 5 | 999852 | | 123 | 1 |
Note that the row with id 10 has no relations (children), while the row with id 7 has two children.
I need to zero out the value (set it to 0) for every child/leaf.
For rows 1-9 I tried the following query:
select v.*
from (
    select v.id, v.personid,
           case when level > 1 then 0 else v.value end thevalue,
           v.masterid, v.childid, level depthlevel
    from tmpsimpleexample v
    start with v.childid is not null
    connect by v.masterid = prior v.childid
) v
order by v.id
Results:
Look at the rows with ids 7 and 8: they are the same master with two children, and I need to combine them into one row.
This is the first problem.
Also, I need to show the data with no hierarchy relation (id 10 in the expected result table, id 11 in the image table data).
I think I can query all rows whose masterid is not referenced by any childid, and then union the first query (above) with that query.
The query for rows whose masterid is not referenced by a childid returns both the rows without any relation and the level 1 master rows.
select id, personid, value thevalue, masterid, childid, 1 depthlevel
from TMPSIMPLEEXAMPLE
where masterid not in
      (select childid from TMPSIMPLEEXAMPLE where childid is not null)
With a union of the two queries, the result fits my requirements (except for concatenating the childids on the master row).
select v.*
from (
    select v.id, v.personid,
           case when level > 1 then 0 else v.value end thevalue,
           v.masterid, v.childid, level depthlevel
    from tmpsimpleexample v
    start with v.childid is not null
    connect by v.masterid = prior v.childid
    union
    select id, personid, value thevalue, masterid, childid, 1 depthlevel
    from TMPSIMPLEEXAMPLE
    where masterid not in
          (select childid from TMPSIMPLEEXAMPLE where childid is not null)
) v
order by v.id
Almost final result:
But given that my real table has hundreds of thousands of records, is a union like that a good approach?

I've taken a stab at what I think your source data looks like:
| ID | PERSONID | MASTERID | CHILDID | VALUE |
-----------------------------------------------
| 1 | 3 | 78452 | 21456 | 100 |
| 2 | 3 | 21456 | | -1 |
| 3 | 3 | 652314 | 417859 | 115 |
| 4 | 3 | 417859 | | -1 |
| 5 | 4 | 998654 | 223655 | 300 |
| 6 | 4 | 223655 | | -1 |
| 7 | 4 | 201302 | 441592 | 200 |
| 7 | 4 | 201302 | 789654 | 200 |
| 9 | 4 | 441592 | | -1 |
| 8 | 4 | 789654 | | -1 |
| 10 | 4 | 999852 | | 123 |
-----------------------------------------------
The following query gets you your desired results:
select id,
       personid,
       masterid,
       listagg(childid, ',') within group (order by childid) childid,
       -- Took a guess that all values for a personid were the same and didn't need to be aggregated...
       min(decode(depthlevel, 1, value, null)) value,
       min(depthlevel) depthlevel
from (select v.*, level depthlevel
      from tmpsimpleexample v
      connect by v.masterid = prior v.childid
      -- Trick here is to start with all of the desired starting conditions...
      start with not exists (select 'X' from tmpsimpleexample v2 where v2.childid = v.masterid))
group by id, personid, masterid;
If ordering of your CHILDID is important, you would need to re-join the nested view with TMPSIMPLEEXAMPLE:
select v1.id,
       v1.personid,
       v1.masterid,
       listagg(v1.childid, ',') within group (order by v2.id) childid,
       min(decode(depthlevel, 1, v1.value, null)) value,
       min(depthlevel) depthlevel
from (select v.*, level depthlevel
      from tmpsimpleexample v
      connect by v.masterid = prior v.childid
      start with not exists (select 'X' from tmpsimpleexample v2 where v2.childid = v.masterid)) v1,
     tmpsimpleexample v2
-- Outer Join is important!
where v1.childid = v2.masterid (+)
group by v1.id, v1.personid, v1.masterid;
The real magic here is the LISTAGG function. If you are not yet on 11g Release 2 or later (why not?!?), then the following article can guide you in building your own aggregate function:
http://www.oracle-base.com/articles/misc/string-aggregation-techniques.php
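If you want to try the same idea outside Oracle, here is a minimal sketch that runs against SQLite from Python. It is not the query above: it swaps CONNECT BY for a recursive CTE and LISTAGG for SQLite's group_concat, it loads my guessed source data into an in-memory table, and it returns 0 (rather than NULL) for child rows, as the question asks. The variable names are just for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tmpsimpleexample (
    id INTEGER, personid INTEGER, masterid INTEGER, childid INTEGER, value INTEGER
);
-- Guessed source data: one row per master/child pair.
INSERT INTO tmpsimpleexample VALUES
    (1, 3, 78452, 21456, 100), (2, 3, 21456, NULL, -1),
    (3, 3, 652314, 417859, 115), (4, 3, 417859, NULL, -1),
    (5, 4, 998654, 223655, 300), (6, 4, 223655, NULL, -1),
    (7, 4, 201302, 441592, 200), (7, 4, 201302, 789654, 200),
    (9, 4, 441592, NULL, -1), (8, 4, 789654, NULL, -1),
    (10, 4, 999852, NULL, 123);
""")

HIERARCHY_SQL = """
WITH RECURSIVE tree AS (
    -- Anchor: every row whose masterid is nobody's child (roots and standalone rows).
    SELECT v.id, v.personid, v.masterid, v.childid, v.value, 1 AS depthlevel
    FROM tmpsimpleexample v
    WHERE NOT EXISTS (SELECT 1 FROM tmpsimpleexample p WHERE p.childid = v.masterid)
    UNION ALL
    -- Recursive step: rows whose masterid is the childid of a row already found.
    SELECT c.id, c.personid, c.masterid, c.childid, c.value, t.depthlevel + 1
    FROM tmpsimpleexample c
    JOIN tree t ON c.masterid = t.childid
)
SELECT id, personid, masterid,
       group_concat(childid) AS childid,
       CASE WHEN min(depthlevel) = 1 THEN max(value) ELSE 0 END AS value,
       min(depthlevel) AS depthlevel
FROM tree
GROUP BY id, personid, masterid
ORDER BY id
"""

rows = conn.execute(HIERARCHY_SQL).fetchall()
for row in rows:
    print(row)
```

The anchor condition plays the role of START WITH NOT EXISTS, so no UNION of two separate queries is needed. Note that group_concat does not guarantee the order of the concatenated children.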

Postgres - Unique values for id column using CTE, Joins alongside GROUP BY

I have a table referrals:
id | user_id_owner | firstname | is_active | user_type | referred_at
----+---------------+-----------+-----------+-----------+-------------
3 | 2 | c | t | agent | 3
5 | 3 | e | f | customer | 5
4 | 1 | d | t | agent | 4
2 | 1 | b | f | agent | 2
1 | 1 | a | t | agent | 1
And another table activations
id | user_id_owner | referral_id | amount_earned | activated_at | app_id
----+---------------+-------------+---------------+--------------+--------
2 | 2 | 3 | 3.0 | 3 | a
4 | 1 | 1 | 6.0 | 5 | b
5 | 4 | 4 | 3.0 | 6 | c
1 | 1 | 2 | 2.0 | 2 | b
3 | 1 | 2 | 5.0 | 4 | b
6 | 1 | 2 | 7.0 | 8 | a
I am trying to generate another table from the two tables that has only unique values for referrals.id and returns, as one of the columns, the count for each app as best_selling_app_count.
Here is the query I ran:
with agents
as
(select
referrals.id,
referral_id,
amount_earned,
referred_at,
activated_at,
activations.app_id
from referrals
left outer join activations
on (referrals.id = activations.referral_id)
where referrals.user_id_owner = 1),
distinct_referrals_by_id
as
(select
id,
count(referral_id) as activations_count,
sum(coalesce(amount_earned, 0)) as amount_earned,
referred_at,
max(activated_at) as last_activated_at
from
agents
group by id, referred_at),
distinct_referrals_by_app_id
as
(select id, app_id as best_selling_app,
count(app_id) as best_selling_app_count
from agents
group by id, app_id )
select *, dense_rank() over (order by best_selling_app_count desc) best_selling_app_rank
from distinct_referrals_by_id
inner join distinct_referrals_by_app_id
on (distinct_referrals_by_id.id = distinct_referrals_by_app_id.id);
Here is the result I got:
id | activations_count | amount_earned | referred_at | last_activated_at | id | best_selling_app | best_selling_app_count | best_selling_app_rank
----+-------------------+---------------+-------------+-------------------+----+------------------+------------------------+-----------------------
2 | 3 | 14.0 | 2 | 8 | 2 | b | 2 | 1
1 | 1 | 6.0 | 1 | 5 | 1 | b | 1 | 2
2 | 3 | 14.0 | 2 | 8 | 2 | a | 1 | 2
4 | 1 | 3.0 | 4 | 6 | 4 | c | 1 | 2
The problem with this result is that the table has a duplicate id of 2. I only need unique values for the id column.
I tried a workaround using distinct on that gave the desired result, but I fear the query results may not be reliable and consistent.
Here is the workaround query:
with agents
as
(select
referrals.id,
referral_id,
amount_earned,
referred_at,
activated_at,
activations.app_id
from referrals
left outer join activations
on (referrals.id = activations.referral_id)
where referrals.user_id_owner = 1),
distinct_referrals_by_id
as
(select
id,
count(referral_id) as activations_count,
sum(coalesce(amount_earned, 0)) as amount_earned,
referred_at,
max(activated_at) as last_activated_at
from
agents
group by id, referred_at),
distinct_referrals_by_app_id
as
(select
distinct on (id) id, app_id as best_selling_app,
count(app_id) as best_selling_app_count
from agents
group by id, app_id
order by id, best_selling_app_count desc)
select *, dense_rank() over (order by best_selling_app_count desc) best_selling_app_rank
from distinct_referrals_by_id
inner join distinct_referrals_by_app_id
on (distinct_referrals_by_id.id = distinct_referrals_by_app_id.id);
I need a recommendation on how best to achieve this.
"I am trying to generate another table from the two tables that has only unique values for referrals.id and returns as one of the columns the count for each apps as best_selling_app_count."
Your question is really complicated, with a very complicated SQL query. However, the sentence quoted above looks like the actual question. If so, you can use:
select r.*,
a.app_id as most_common_app_id,
a.cnt as most_common_app_id_count
from referrals r left join
(select distinct on (a.referral_id) a.referral_id, a.app_id, count(*) as cnt
from activations a
group by a.referral_id, a.app_id
order by a.referral_id, count(*) desc
) a
on a.referral_id = r.id;
You have not explained the other columns that are in your result set.
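On the worry about reliability: distinct on picks an arbitrary row when two apps tie on count, unless the order by includes a tie-breaker. Here is a minimal sketch of the same "top app per referral" idea using row_number() instead of distinct on, run against SQLite from Python so it can be tried without Postgres. The trimmed-down tables, the app_id tie-breaker and the variable names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE referrals (id INTEGER PRIMARY KEY, user_id_owner INTEGER, firstname TEXT);
CREATE TABLE activations (id INTEGER PRIMARY KEY, referral_id INTEGER, app_id TEXT);
INSERT INTO referrals VALUES (3, 2, 'c'), (5, 3, 'e'), (4, 1, 'd'), (2, 1, 'b'), (1, 1, 'a');
INSERT INTO activations VALUES
    (2, 3, 'a'), (4, 1, 'b'), (5, 4, 'c'), (1, 2, 'b'), (3, 2, 'b'), (6, 2, 'a');
""")

TOP_APP_SQL = """
WITH counts AS (
    -- One row per (referral, app) with its activation count.
    SELECT referral_id, app_id, count(*) AS cnt
    FROM activations
    GROUP BY referral_id, app_id
), ranked AS (
    -- Rank apps within each referral; app_id breaks ties deterministically.
    SELECT counts.*,
           row_number() OVER (PARTITION BY referral_id
                              ORDER BY cnt DESC, app_id) AS rn
    FROM counts
)
SELECT r.id, r.firstname, k.app_id AS most_common_app_id, k.cnt AS most_common_app_id_count
FROM referrals r
LEFT JOIN ranked k ON k.referral_id = r.id AND k.rn = 1
ORDER BY r.id
"""

top_apps = conn.execute(TOP_APP_SQL).fetchall()
for row in top_apps:
    print(row)
```

In Postgres the equivalent fix is simply to add the tie-breaker to the order by of the distinct on subquery, for example order by a.referral_id, count(*) desc, a.app_id.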

How can I generate sequence number for sql select that gives sub numbers for descendant items?

I would like to generate sequence numbers in a select that give sub-numbers for descendant items.
I want the numbers to be in the following format:
root: 1...n
children of root: 1.1 -> 1.n
sub children: 1.1.1 -> 1.1.n
and so on...
I have an Item table which has an owner_ref foreign key.
The table (the item names are just examples; they can be anything):
id | item_name | parent_id | owner_ref_id
----|------------|-----------|--------------
1 | item_1 | null | 1
2 | item_1.1 | 1 | 1
3 | item_1.1.1 | 2 | 1
4 | item_2 | null | 1
5 | item_2.1 | 4 | 1
6 | item_2.2 | 4 | 1
--------------------------------------------
The outcome should look like:
seq_num | item_name | parent_id | owner_ref_id
---------|------------|-----------|--------------
1 | item_1 | null | 1
1.1 | item_1.1 | 1 | 1
1.1.1 | item_1.1.1 | 2 | 1
2 | item_2 | null | 1
2.1 | item_2.1 | 4 | 1
2.2 | item_2.2 | 4 | 1
--------------------------------------------
Use a recursive CTE to build the tree-like structure. Order the row_number() calls by id so that the numbering of siblings is deterministic:
with recursive nodes(id, item_name, parent_id, lvl, path) as (
    select id, item_name, parent_id, 1,
           row_number() over (order by id)::text as path
    from items
    where parent_id is null
    union all
    select o.id, o.item_name, o.parent_id, n.lvl + 1,
           n.path || '.' || row_number() over (partition by o.parent_id order by o.id)::text
    from items o
    join nodes n on n.id = o.parent_id
)
select *
from nodes
order by id
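Some databases do not allow window functions inside the recursive part of a CTE (SQLite is one of them). A variation that avoids this is to compute each item's position among its siblings first, in an ordinary CTE, and only concatenate in the recursive part. Below is a minimal sketch of that variation, run against SQLite from Python; the ranked CTE, the pos column and the choice of ordering siblings by id are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, item_name TEXT, parent_id INTEGER, owner_ref_id INTEGER);
INSERT INTO items VALUES
    (1, 'item_1', NULL, 1), (2, 'item_1.1', 1, 1), (3, 'item_1.1.1', 2, 1),
    (4, 'item_2', NULL, 1), (5, 'item_2.1', 4, 1), (6, 'item_2.2', 4, 1);
""")

SEQ_SQL = """
WITH RECURSIVE ranked AS (
    -- Position of each item among its siblings (roots share the NULL partition).
    SELECT id, item_name, parent_id,
           row_number() OVER (PARTITION BY parent_id ORDER BY id) AS pos
    FROM items
), nodes AS (
    SELECT id, item_name, parent_id, CAST(pos AS TEXT) AS seq_num
    FROM ranked
    WHERE parent_id IS NULL
    UNION ALL
    -- Append the child's sibling position to the parent's sequence number.
    SELECT r.id, r.item_name, r.parent_id, n.seq_num || '.' || r.pos
    FROM ranked r
    JOIN nodes n ON r.parent_id = n.id
)
SELECT seq_num, item_name, parent_id
FROM nodes
ORDER BY id
"""

seq_rows = conn.execute(SEQ_SQL).fetchall()
for row in seq_rows:
    print(row)
```

Ordering the final result by id only matches the tree order because the sample ids happen to follow it; for real data you would need a sortable path (for example zero-padded positions) to order by.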

Split data by levels in hierarchy

Example of initial data:
| ID | ParentID |
|------|------------|
| 1 | NULL |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
| 5 | NULL |
| 6 | 2 |
| 7 | 3 |
In my initial data I have the ID of each element and its parent ID.
Some elements have a parent, some do not, and some have a parent that itself has a parent.
The maximum number of levels in this hierarchy is 3.
I need to get this hierarchy by levels.
Lvl 1 - elements without a parent
Lvl 2 - elements whose parent has no parent
Lvl 3 - elements whose parent has a parent too.
Expected result looks like:
| Lvl1 | Lvl2 | Lvl3 |
|-------|----------|----------|
| 1 | NULL | NULL |
| 1 | 2 | NULL |
| 1 | 3 | NULL |
| 1 | 2 | 4 |
| 5 | NULL | NULL |
| 1 | 2 | 6 |
| 1 | 3 | 7 |
How can I do it?
For a fixed depth of three, you can use CROSS APPLY.
It can be used like a JOIN, but the extra UNION ALL SELECT NULL branch also returns the additional rows that give you the NULLs.
SELECT
    Lvl1.ID AS lvl1,
    Lvl2.ID AS lvl2,
    Lvl3.ID AS lvl3
FROM
    initial_data AS Lvl1
CROSS APPLY
(
    SELECT ID FROM initial_data WHERE ParentID = Lvl1.ID
    UNION ALL
    SELECT NULL AS ID
)
    AS Lvl2
CROSS APPLY
(
    SELECT ID FROM initial_data WHERE ParentID = Lvl2.ID
    UNION ALL
    SELECT NULL AS ID
)
    AS Lvl3
WHERE
    Lvl1.ParentID IS NULL
ORDER BY
    Lvl1.ID,
    Lvl2.ID,
    Lvl3.ID
But, as per my comment, this is often a sign that you're headed down a non-sql route. It might feel easier to start with, but later it turns and bites you, because SQL benefits tremendously from normalised structures (your starting data).
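If CROSS APPLY is not available, a different technique gives the same shape for a fixed depth of three: start from every element and LEFT JOIN upwards to its parent and grandparent, then shift the IDs into the right columns. The sketch below is that swapped-in approach, not the query above, and is run against SQLite from Python; the aliases e, p and g (element, parent, grandparent) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE initial_data (ID INTEGER PRIMARY KEY, ParentID INTEGER);
INSERT INTO initial_data VALUES
    (1, NULL), (2, 1), (3, 1), (4, 2), (5, NULL), (6, 2), (7, 3);
""")

LEVELS_SQL = """
SELECT COALESCE(g.ID, p.ID, e.ID) AS Lvl1,      -- topmost ancestor that exists
       CASE WHEN g.ID IS NOT NULL THEN p.ID     -- element is on level 3
            WHEN p.ID IS NOT NULL THEN e.ID     -- element is on level 2
       END AS Lvl2,
       CASE WHEN g.ID IS NOT NULL THEN e.ID END AS Lvl3
FROM initial_data e
LEFT JOIN initial_data p ON p.ID = e.ParentID
LEFT JOIN initial_data g ON g.ID = p.ParentID
ORDER BY e.ID
"""

level_rows = conn.execute(LEVELS_SQL).fetchall()
for row in level_rows:
    print(row)
```

This produces exactly one row per element, in the same order as the expected result in the question, but it only works while the maximum depth really is three.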

Oracle CONNECT_BY_ISLEAF equivalent in SQL Server

Here is my query, which is in Oracle PL/SQL syntax. How can I change it to SQL Server format?
Are there any alternatives for connect_by_isleaf?
(
select PARTY_KEY, ltrim(sys_connect_by_path(alt_name, '|'), '|') AS alt_name_list
from
(select PARTY_KEY, alt_name, row_number() over(partition by PARTY_KEY order by alt_name) rno
from (
select party_key, (select alt_name_type_desc from "CRMS"."PRJ_APP_ALT_NAME_TYPE" where alt_name_type_cd = alt_name_type) || ' - ' || alt_name as alt_name
from "CDD_PROFILES"."PRJ_PRF_ALT_NAME" order by party_key, alt_name_type
) alt
)
where connect_by_isleaf = 1
connect by PARTY_KEY = prior PARTY_KEY
and rno = prior rno+1
start with rno = 1
)
I tried to use a WITH ... AS clause, but somehow it is not working.
Thanks in advance
The equivalent in SQL Server is called a "recursive CTE".
You can read about it here:
https://learn.microsoft.com/en-us/sql/t-sql/queries/with-common-table-expression-transact-sql?view=sql-server-2017
Oracle Hierarchical queries can be rewritten as recursive CTE statements in databases that support them (SQL Server included). A classic set of hierarchical data would be an organization hierarchy such as the one below:
MS SQL Server 2017 Schema Setup:
CREATE TABLE ORGANIZATIONS
([ID] int primary key
, [ORG_NAME] varchar(30)
, [ORG_TYPE] varchar(30)
, [PARENT_ID] int foreign key references organizations)
;
INSERT INTO ORGANIZATIONS
([ID], [ORG_NAME], [ORG_TYPE], [PARENT_ID])
VALUES
(1, 'ACME Corp', 'Company', NULL),
(2, 'Finance', 'Division', 1),
(6, 'Accounts Payable', 'Department', 2),
(7, 'Accounts Receivables', 'Department', 2),
(8, 'Payroll', 'Department', 2),
(3, 'Operations', 'Division', 1),
(4, 'Human Resources', 'Division', 1),
(10, 'Benefits Admin', 'Department', 4),
(5, 'Marketing', 'Division', 1),
(9, 'Sales', 'Department', 5)
;
In the recursive CTE t1 below, the select statement before the union all is the anchor query and the select statement after the union all is the recursive part. The recursive part has exactly one reference to t1 in its from clause. The org_path column simulates Oracle's sys_connect_by_path function by concatenating the org_names together. The level column simulates Oracle's LEVEL pseudocolumn and is used in the output query to determine the leaf status (the is_leaf column), similar to Oracle's connect_by_isleaf pseudocolumn:
with t1(id, org_name, org_type, parent_id, org_path, level) as (
select o.*
, cast('|' + org_name as varchar(max))
, 1
from organizations o
where parent_id is null
union all
select o.*
, t1.org_path+cast('|'+o.org_name as varchar(max))
, t1.level+1
from organizations o
join t1
on t1.id = o.parent_id
)
select t1.*
, case when t1.level < lead(t1.level) over (order by org_path) then 0 else 1 end is_leaf
from t1 order by org_path
Results:
| id | org_name | org_type | parent_id | org_path | level | is_leaf |
|----|----------------------|------------|-----------|-------------------------------------------|-------|---------|
| 1 | ACME Corp | Company | (null) | |ACME Corp | 1 | 0 |
| 2 | Finance | Division | 1 | |ACME Corp|Finance | 2 | 0 |
| 6 | Accounts Payable | Department | 2 | |ACME Corp|Finance|Accounts Payable | 3 | 1 |
| 7 | Accounts Receivables | Department | 2 | |ACME Corp|Finance|Accounts Receivables | 3 | 1 |
| 8 | Payroll | Department | 2 | |ACME Corp|Finance|Payroll | 3 | 1 |
| 4 | Human Resources | Division | 1 | |ACME Corp|Human Resources | 2 | 0 |
| 10 | Benefits Admin | Department | 4 | |ACME Corp|Human Resources|Benefits Admin | 3 | 1 |
| 5 | Marketing | Division | 1 | |ACME Corp|Marketing | 2 | 0 |
| 9 | Sales | Department | 5 | |ACME Corp|Marketing|Sales | 3 | 1 |
| 3 | Operations | Division | 1 | |ACME Corp|Operations | 2 | 1 |
To select just the leaf nodes, turn the output query above into another CTE (t2), drop its order by clause (or move it to the final output query), and filter on the is_leaf column:
with t1(id, org_name, org_type, parent_id, org_path, level) as (
select o.*
, cast('|' + org_name as varchar(max))
, 1
from organizations o
where parent_id is null
union all
select o.*
, t1.org_path+cast('|'+o.org_name as varchar(max))
, t1.level+1
from organizations o
join t1
on t1.id = o.parent_id
), t2 as (
select t1.*
, case when t1.level < lead(t1.level) over (order by org_path) then 0 else 1 end is_leaf
from t1
)
select * from t2 where is_leaf = 1
Results:
| id | org_name | org_type | parent_id | org_path | level | is_leaf |
|----|----------------------|------------|-----------|-------------------------------------------|-------|---------|
| 6 | Accounts Payable | Department | 2 | |ACME Corp|Finance|Accounts Payable | 3 | 1 |
| 7 | Accounts Receivables | Department | 2 | |ACME Corp|Finance|Accounts Receivables | 3 | 1 |
| 8 | Payroll | Department | 2 | |ACME Corp|Finance|Payroll | 3 | 1 |
| 10 | Benefits Admin | Department | 4 | |ACME Corp|Human Resources|Benefits Admin | 3 | 1 |
| 9 | Sales | Department | 5 | |ACME Corp|Marketing|Sales | 3 | 1 |
| 3 | Operations | Division | 1 | |ACME Corp|Operations | 2 | 1 |
Alternatively, if you realize that leaf nodes can be identified by their lack of child nodes, you can flip this on its head: start with the leaf nodes and search up the tree, retaining all the original record values, building out the org_path in reverse, and passing along the next parent id as next_id. In the final output stage, selecting only those records whose next_id is null will yield the same results as the prior query:
with t1(id, org_name, org_type, parent_id, org_path, level, next_id) as (
select o.*
, cast('|'+org_name as varchar(max))
, 1
, parent_id
from organizations o
where not exists (select 1 from organizations c where c.parent_id = o.id)
union all
select t1.id
, t1.org_name
, t1.org_type
, t1.parent_id
, cast('|'+p.org_name as varchar(max))+t1.org_path
, level+1
, p.parent_id
from organizations p
join t1
on t1.next_id = p.id
)
select * from t1 where next_id is null order by org_path
Results:
| id | org_name | org_type | parent_id | org_path | level | next_id |
|----|----------------------|------------|-----------|-------------------------------------------|-------|---------|
| 6 | Accounts Payable | Department | 2 | |ACME Corp|Finance|Accounts Payable | 3 | (null) |
| 7 | Accounts Receivables | Department | 2 | |ACME Corp|Finance|Accounts Receivables | 3 | (null) |
| 8 | Payroll | Department | 2 | |ACME Corp|Finance|Payroll | 3 | (null) |
| 10 | Benefits Admin | Department | 4 | |ACME Corp|Human Resources|Benefits Admin | 3 | (null) |
| 9 | Sales | Department | 5 | |ACME Corp|Marketing|Sales | 3 | (null) |
| 3 | Operations | Division | 1 | |ACME Corp|Operations | 2 | (null) |
One of these two methods may prove more performant than the other, but you'll need to try them each out on your data to see which one works better.
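A third variation, if you only need the flag: keep the top-down recursive CTE for org_path and level, but compute is_leaf directly with NOT EXISTS instead of comparing levels with lead(). That avoids depending on how the path strings sort. Here is a minimal sketch of that variation, run against SQLite from Python rather than SQL Server (so || replaces + for concatenation); treat it as an illustration under those assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organizations (id INTEGER PRIMARY KEY, org_name TEXT, org_type TEXT, parent_id INTEGER);
INSERT INTO organizations VALUES
    (1, 'ACME Corp', 'Company', NULL), (2, 'Finance', 'Division', 1),
    (6, 'Accounts Payable', 'Department', 2), (7, 'Accounts Receivables', 'Department', 2),
    (8, 'Payroll', 'Department', 2), (3, 'Operations', 'Division', 1),
    (4, 'Human Resources', 'Division', 1), (10, 'Benefits Admin', 'Department', 4),
    (5, 'Marketing', 'Division', 1), (9, 'Sales', 'Department', 5);
""")

LEAF_SQL = """
WITH RECURSIVE t1 AS (
    SELECT id, org_name, parent_id, '|' || org_name AS org_path, 1 AS level
    FROM organizations
    WHERE parent_id IS NULL
    UNION ALL
    SELECT o.id, o.org_name, o.parent_id, t1.org_path || '|' || o.org_name, t1.level + 1
    FROM organizations o
    JOIN t1 ON t1.id = o.parent_id
)
SELECT id, org_path, level,
       -- A node is a leaf exactly when no row names it as parent.
       NOT EXISTS (SELECT 1 FROM organizations c WHERE c.parent_id = t1.id) AS is_leaf
FROM t1
ORDER BY org_path
"""

org_rows = conn.execute(LEAF_SQL).fetchall()
leaf_ids = sorted(r[0] for r in org_rows if r[3] == 1)
print(leaf_ids)
```

In SQL Server the same flag would be written as a case expression around the exists test, since T-SQL has no boolean column type.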

Efficient ROW_NUMBER increment when column matches value

I'm trying to find an efficient way to derive the column Expected below from only Id and State. What I want is for the number Expected to increase each time State is 0 (ordered by Id).
+----+-------+----------+
| Id | State | Expected |
+----+-------+----------+
| 1 | 0 | 1 |
| 2 | 1 | 1 |
| 3 | 0 | 2 |
| 4 | 1 | 2 |
| 5 | 4 | 2 |
| 6 | 2 | 2 |
| 7 | 3 | 2 |
| 8 | 0 | 3 |
| 9 | 5 | 3 |
| 10 | 3 | 3 |
| 11 | 1 | 3 |
+----+-------+----------+
I have managed to accomplish this with the following SQL, but the execution time is very poor when the data set is large:
WITH Groups AS
(
SELECT Id, ROW_NUMBER() OVER (ORDER BY Id) AS GroupId FROM tblState WHERE State=0
)
SELECT S.Id, S.[State], S.Expected, G.GroupId FROM tblState S
OUTER APPLY (SELECT TOP 1 GroupId FROM Groups WHERE Groups.Id <= S.Id ORDER BY Id DESC) G
Is there a simpler and more efficient way to produce this result? (In SQL Server 2012 or later)
Just use a cumulative sum:
select s.*,
sum(case when state = 0 then 1 else 0 end) over (order by id) as expected
from tblState s;
Another method uses a correlated subquery:
select t.*,
       (select count(*)
        from tblState t1
        where t1.id <= t.id and t1.state = 0
       ) as expected
from tblState t;
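To check the cumulative sum against the sample data, here is a minimal sketch that runs it in SQLite from Python (the question is about SQL Server, but the window expression is the same). The in-memory table and the variable names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblState (Id INTEGER PRIMARY KEY, State INTEGER)")
states = [0, 1, 0, 1, 4, 2, 3, 0, 5, 3, 1]
conn.executemany(
    "INSERT INTO tblState (Id, State) VALUES (?, ?)",
    list(enumerate(states, start=1)),
)

RUNNING_SQL = """
SELECT Id, State,
       -- Running count of rows with State = 0, up to and including this row.
       SUM(CASE WHEN State = 0 THEN 1 ELSE 0 END) OVER (ORDER BY Id) AS Expected
FROM tblState
ORDER BY Id
"""

expected_col = [row[2] for row in conn.execute(RUNNING_SQL)]
print(expected_col)
```

The window version reads the table once, while the correlated subquery is re-evaluated per row, so the window version is usually the one to try first on large data.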