Let me explain my problem with a sample.
I have three tables: ledger, balance, and group.
Columns are
ledger ---> no, name, groupno
balance --> ledgerno, balance
group --> groupno, groupname, undergroupno
I want to show the ledgers that have balance > 0 together with their group hierarchy, starting from the topmost parent.
ledger
no name groupno
1 A 5
2 B 4
balance
ledgerno balance
1 100
2 200
group
groupno groupname undergroupno
1 AA 0
2 BB 0
3 CC 1
4 DD 1
5 EE 1
6 FF 1
7 GG 2
8 HH 2
9 II 2
10 JJ 2
So I want the result like this:
name balance
AA
CC
DD
B 200
EE
A 100
FF
I tried the query below, but it does not return the right results:
WITH rel AS (
SELECT groupname, amount
FROM (
WITH RECURSIVE rel_tree AS (
SELECT groupno, groupname, undergroupno
FROM "group"
WHERE undergroupno = 0
UNION ALL
SELECT groupno, groupname, undergroupno
FROM balance b
INNER JOIN ledger l ON l.no = b.ledgerno
INNER JOIN "group" g ON g.groupno = l.groupno AS tt
INNER JOIN rel_tree r ON r.groupno = tt.undergroupno
)
SELECT *, 0 AS amount
FROM rel_tree
GROUP BY groupno, groupname, undergroupno
)
SELECT *
FROM rel
UNION ALL
SELECT groupname, amount
FROM (
SELECT name AS groupname, balance AS amount, groupname AS ord
FROM balance b
INNER JOIN ledger l ON l.no = b.ledgerno
INNER JOIN "group" g ON g.groupno = l.groupno) AS ta
INNER JOIN rel ON rel.groupname = ta.ord
I am using PostgreSQL 9.3.
First of all, NEVER EVER use a SQL reserved word as a name for a table or a column. NEVER. EVER. Below I use grp instead of group.
Second, use column names that are immediately clear. Below I use parent instead of undergroupno.
Third, this is a really nice problem that I happily spent some time on. I am using recursive data structures myself and getting the query right is always something of a puzzle.
Fourth, what you state that you want is rather impossible. You have output on multiple lines from one table (grp), which is interspersed with data from other tables. I have a solution that comes quite close to what you specified. Here it is:
WITH tree AS (
WITH RECURSIVE rel_tree(parent, groupno, refs, path) AS (
SELECT groupno, groupno, 0, lpad(groupno::text, 4, '0') FROM grp WHERE parent > 0
UNION
SELECT g.parent, t.groupno, t.refs+1, lpad(g.parent::text, 4, '0') || '.' || t.path FROM grp g
JOIN rel_tree t ON t.parent = g.groupno)
SELECT * FROM rel_tree WHERE parent > 0
UNION
SELECT groupno, groupno, 0 AS refs, lpad(groupno::text, 4, '0') FROM grp WHERE parent = 0)
SELECT repeat(' ', t.refs) || grp.groupname AS name, l.name AS ledger, b.balance
FROM grp
JOIN (
SELECT DISTINCT ON (groupno) groupno, parent, max(refs) AS refs, path
FROM tree
GROUP BY parent, groupno, path
ORDER BY groupno, path) t USING (groupno)
LEFT JOIN ledger l USING (groupno)
LEFT JOIN balance b ON b.ledgerno = l.no
ORDER BY t.path
This gives the output:
name, ledger, balance
AA
 CC
 DD, B, 200
 EE, A, 100
 FF
BB
 GG
 HH
 II
 JJ
A few words on the recursive-with query:
This query yields a self-inclusive complete hierarchy. What this means is that it lists for every node of the hierarchy all of its parents, including itself. If you run the tree CTE as a separate query, you will find that it returns more rows than the 10 in the grp table. This is because it lists all grp records with their groupno as groupno but also as parent, and additionally all parent nodes higher up in the hierarchy. Such a self-inclusive complete hierarchy is very handy when analyzing other properties of recursive data structures, such as containment and parentage.
Note that the hierarchy is built from the bottom up, starting with every node having itself as a parent, a refs value of 0 (i.e. 0 referrals between parent and self), and a path which is just the groupno as a text value (padded with 0's). The recursion works its way up the hierarchy with increasing refs values and longer paths. The sub-select in the main query trims the complete list down to a single record for each grp record, which can be ordered by path. (Note that the use of refs can be omitted and replaced by length(path), but refs does have uses in other contexts of using a similar query.)
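As a rough illustration of the path idea described above (not the SQL itself), here is a minimal Python sketch. It assumes the sample grp rows from the question, and the helper name path_of is made up for this example. Each node gets a zero-padded path of its ancestors, and sorting by that path yields the tree order.

```python
# Sample grp rows from the question: groupno -> (groupname, parent).
grp = {
    1: ("AA", 0), 2: ("BB", 0), 3: ("CC", 1), 4: ("DD", 1), 5: ("EE", 1),
    6: ("FF", 1), 7: ("GG", 2), 8: ("HH", 2), 9: ("II", 2), 10: ("JJ", 2),
}

def path_of(groupno):
    """Walk up from a node to its root and return the padded ids, root first."""
    parts = []
    while groupno != 0:                      # parent 0 marks a root
        parts.append(str(groupno).zfill(4))  # same idea as lpad(groupno::text, 4, '0')
        groupno = grp[groupno][1]
    return ".".join(reversed(parts))

# Sort by path; the number of dots is the depth (the role refs plays in the query).
ordered = sorted(grp, key=path_of)
lines = [" " * path_of(g).count(".") + grp[g][0] for g in ordered]
print("\n".join(lines))
```

Unlike the query, this sketch walks up from each node individually instead of building the complete self-inclusive hierarchy, so it only demonstrates why sorting on the padded path gives the right order.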
This query works to greater depths of the hierarchy as well. If you add:
11 KK 8
12 LL 8
13 MM 11
to table grp, the query will output:
name, ledger, balance
AA
 CC
 DD, B, 200
 EE, A, 100
 FF
BB
 GG
 HH
  KK
   MM
  LL
 II
 JJ
Note that the path works with groupno values up to 9999. For larger groupno values increase the leading 0's in the recursive CTE, or consider using the ltree extension for more flexibility.
Let's say I have a table in Postgres that stores a column of strings like this.
animal
cat/dog/bird
dog/lion
bird/dog
dog/cat
cat/bird
What I want to do, is calculate how "correlated" any two animals are to each other in this column, and store that as its own table so that I can easily look up how often "cat" and "dog" show up together.
For example, "cat" shows up a total of 3 times in all of these strings. Of those instances, "dog" shows up in the same string 2 out of the three times. Therefore, the correlation from cat -> dog would be 66%, and the number of co-occurrence instances (we'll call this instance_count) would be 2.
According to the above logic, the resulting table from this example would look like this.
base_animal correlated_animal instance_count correlation
cat cat 3 100
cat dog 2 66
cat bird 2 66
cat lion 0 0
dog dog 4 100
dog cat 2 50
dog bird 2 50
dog lion 1 25
bird bird 3 100
bird cat 2 66
bird dog 2 66
bird lion 0 0
lion lion 1 100
lion cat 0 0
lion dog 1 100
lion bird 0 0
I've come up with a working solution in Python, but I have no idea how to do this easily in Postgres. Anybody have any ideas?
Edit:
Based off Erwin's answer, here's the same idea, except this answer doesn't make a record for animal combinations that never intersect.
with flat as (
select t.id, a
from (select row_number() over () as id, animal from animals) t,
unnest(string_to_array(t.animal, '/')) a
), ct as (select a, count(*) as ct from flat group by 1)
select
f1.a as b_animal,
f2.a as c_animal,
count(*) as instance_count,
round(count(*) * 100.0 / ct.ct, 0) as correlation
from flat f1
join flat f2 using(id)
join ct on f1.a = ct.a
group by f1.a, f2.a, ct.ct
Won't get much simpler or faster than this:
WITH flat AS (
SELECT t.id, a
FROM (SELECT row_number() OVER () AS id, animal FROM tbl) t
, unnest(string_to_array(t.animal, '/')) a
)
, ct AS (SELECT a, count(*) AS ct FROM flat GROUP BY 1)
SELECT a AS base_animal
, b AS corr_animal
, COALESCE(xc.ct, 0) AS instance_count
, COALESCE(round(xc.ct * 100.0 / x.ct), 0) AS correlation
FROM (
SELECT a.a, b.a AS b, a.ct
FROM ct a, ct b
) x
LEFT JOIN (
SELECT f1.a, f2.a AS b, count(*) AS ct
FROM flat f1
JOIN flat f2 USING (id)
GROUP BY 1,2
) xc USING (a,b)
ORDER BY a, instance_count DESC;
db<>fiddle here
Produces your desired result, except for ...
added consistent sort order
rounded correctly
This assumes distinct animals per row in the source data. Else it's unclear how to count the same animal in the same row exactly ...
Step-by-step
CTE flat attaches an arbitrary row number as unique id. (If you have a PRIMARY KEY, use that instead and skip the subquery t.) Then unnest animals to get one pet per row (& id).
CTE ct gets the list of distinct animals & their total count.
The outer SELECT builds the complete raster of animal pairs (a / b) in subquery x, plus total count for a. LEFT JOIN to the actual pair count in subquery xc. Two steps are needed to keep pairs that never met in the result. Finally, compute and round the "correlation" smartly. See:
Look for percentage of characters in a word/phrase within a block of text
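Since the question mentions a Python solution, the step-by-step logic above can also be sketched outside the database. This is only an illustration under the same assumption of distinct animals per row; the names flat, ct and xc mirror the query, while the function name correlation is made up here.

```python
from collections import Counter
from itertools import product

rows = ["cat/dog/bird", "dog/lion", "bird/dog", "dog/cat", "cat/bird"]

# "flat": one set of animals per source row (assumes distinct animals per row).
flat = [set(r.split("/")) for r in rows]

# "ct": total number of rows each animal appears in.
ct = Counter(a for animals in flat for a in animals)

# "xc": how often each ordered pair (a, b) shares a row, including a == b.
xc = Counter((a, b) for animals in flat for a in animals for b in animals)

def correlation(a, b):
    """Return (instance_count, percentage rounded half up) for the pair a -> b."""
    n = xc[(a, b)]                        # Counter yields 0 for pairs that never met
    return n, int(100 * n / ct[a] + 0.5)

# Full raster of pairs, like the cross join of ct with itself in subquery x.
table = {(a, b): correlation(a, b) for a, b in product(ct, repeat=2)}
```

Looking up table[("cat", "dog")] gives the instance count and the rounded percentage for that pair; pairs that never met are still present with a count of 0.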
Updated task
If you need neither pairs that never met nor pairings with self, this could be your query:
-- updated task excluding pairs that never met and same pairing with self
WITH flat AS (
SELECT t.id, a, count(*) OVER (PARTITION BY a) AS ct
FROM (SELECT row_number() OVER () AS id, animal FROM tbl) t
, unnest(string_to_array(t.animal, '/')) a
)
SELECT f1.a AS base_animal
, f1.ct AS base_count
, f2.a AS corr_animal
, count(*) AS instance_count
, round(count(*) * 100.0 / f1.ct) AS correlation
FROM flat f1
JOIN flat f2 USING (id)
WHERE f1.a <> f2.a -- exclude pairing with self
GROUP BY f1.a, f1.ct, f2.a
ORDER BY f1.a, instance_count DESC;
db<>fiddle here
I added the total occurrence count of the base animal as base_count.
Most notably, I dropped the additional CTE ct, and get the base_count from the first CTE with a window function. That's about the same cost by itself, but we then don't need another join in the outer query, which should be cheaper overall.
You can still use that if you include pairs with self. Check the fiddle.
Oh, and we don't need COALESCE any more.
The idea is to split the data into rows (using unnest(string_to_array())) and then cross join the result with itself to get all pairs.
with data1 as (
select *
from corr_tab), data2 as (
select distinct un as base_animal, x.correlated_animal
from corr_tab, unnest(string_to_array(animal,'/')) un,
(select distinct un as correlated_animal
from corr_tab, unnest(string_to_array(animal,'/')) un) X)
select base_animal, correlated_animal,
(case
when
data2.base_animal = data2.correlated_animal
then
(select count(*) from data1 where substring(animal,data2.base_animal) is not NULL)
else
(select count(*) from data1 where substring(animal,data2.base_animal) is not NULL
and substring(animal,data2.correlated_animal) is not NULL)
end) instance_count,
(case
when
data2.base_animal = data2.correlated_animal
then
100
else
ceil(
(select count(*) from data1 where substring(animal,data2.base_animal) is not NULL
and substring(animal,data2.correlated_animal) is not NULL) * 100 /
(select count(*) from data1 where substring(animal,data2.base_animal) is not NULL) )
end) correlation
from data2
order by base_animal
Refer to fiddle here.
I am looking for a workaround that works like a parent-child lookup, but without using a recursive search. I am not able to use temporary tables.
This script works, but slowly; it always runs for 600 seconds:
SELECT CONNECT_BY_ROOT party_id as ANCESTOR,
party_id, role_id, subject_id
FROM onecrm.CRM_PARTY
WHERE LEVEL>1
and party_id = 'text'
CONNECT BY PRIOR Party_id=parent_id;
This works well, but it contains three steps. I need to use only one step because of aggregate tasks.
select internal_id, party_id, parent_id, subject_id, channel_type_id
from onecrm.O_ORDER oo
join onecrm.CRM_PARTY cp on oo.party_ref_no = cp.party_ref_no
where internal_id = 'O7VYECF';
Result:
INTERNAL_ID, PARTY_ID, PARENT_ID, SUBJECT_ID, CHANNEL_TYPE_ID
O7VYECF 110179237 110179236 null CRM
select internal_id, cp.party_id, parent_id
from onecrm.O_ORDER oo
right join onecrm.CRM_PARTY cp on oo.party_ref_no = cp.party_ref_no
where cp.party_id = '110179236';
Result:
INTERNAL_ID, PARTY_ID, PARENT_ID
OAMUAY7 110179236 null
select internal_id, cp.party_id, parent_id, cp.subject_id,
channel_type_id, full_name, phone_no_1, phone_no_2, email, segment
from onecrm.O_ORDER oo
right join onecrm.CRM_PARTY cp on oo.party_ref_no = cp.party_ref_no
left join onecrm.CRM_SUBJECT cs on cs.SUBJECT_ID = cp.SUBJECT_ID
left join onecrm.crm_contact_ref ccr on ccr.conre_ref_no = cs.subj_ref_no
left join onecrm.CRM_CONTACT_EXT cce on cce.contact_id = ccr.contact_id
where cp.party_id = '110179236';
Expected result:
INTERNAL_ID, PARTY_ID, PARENT_ID, SUBJECT_ID, CHANNEL_TYPE_ID, FULL_NAME, PHONE_NO_1, PHONE_NO_2, EMAIL, SEGMENT
OAMUAY7 110179236 null 102219217 TGB great_company s.r.o. xxx xxx TNC RNC
The expected result is that I supply only the internal_id and get the parent's information.
The original connect by query has no start with clause. This means it's calculating the tree for every single row in the table!
It's then applying the where clause to the tree generated.
For example, the following builds a tree starting from each of the rows C1 = 1, C1 = 2, and C1 = 3:
create table t as
select level c1, level - 1 c2
from dual
connect by level <= 3;
select t.*,
connect_by_root c1 rt
from t
connect by prior c1 = c2;
C1 C2 RT
1 0 1
2 1 1
3 2 1
2 1 2
3 2 2
3 2 3
As you load more data into the table, this will very quickly slow your query to a crawl.
Even if your where clause means you only get a few rows back, you're very likely to be processing a huge data set.
To avoid this, you almost certainly want a start with clause. This defines which row is the root of the tree:
select t.*,
connect_by_root c1 rt
from t
start with c1 = 1
connect by prior c1 = c2;
C1 C2 RT
1 0 1
2 1 1
3 2 1
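To make the cost difference concrete, here is a small Python sketch that mimics what the two queries above do with the three-row table t. It is only a model of the behaviour, not how Oracle implements it, and the function name connect_by is invented for this example.

```python
rows = [(1, 0), (2, 1), (3, 2)]  # (c1, c2), the same data as table t

def connect_by(rows, start_rows):
    """Mimic CONNECT BY PRIOR c1 = c2: walk down from each starting row."""
    children = {}
    for row in rows:
        children.setdefault(row[1], []).append(row)  # index rows by their c2
    result = []

    def walk(row, root):
        result.append((row[0], row[1], root))        # (c1, c2, connect_by_root c1)
        for child in children.get(row[0], []):       # rows whose c2 equals this c1
            walk(child, root)

    for row in start_rows:
        walk(row, row[0])
    return result

# No START WITH: every row is treated as a root.
no_start_with = connect_by(rows, rows)
# START WITH c1 = 1: only that row is a root.
with_start = connect_by(rows, [r for r in rows if r[0] == 1])
print(len(no_start_with), len(with_start))
```

With a single chain of n rows, the version without a starting row produces n + (n - 1) + ... + 1 rows, which is why the work grows so quickly as the table grows.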
I want to consolidate a set of records
(id) / (referencedid)
1 10
1 11
2 11
2 10
3 10
3 11
3 12
The result of query should be
1 10
1 11
3 10
3 11
3 12
So, since id=1 and id=2 have the same set of corresponding referenceids {10,11}, they would be consolidated. But id=3's corresponding referenceids are not the same, hence it wouldn't be consolidated.
What would be good way to get this done?
Select id, referenceid
From MyTable
Where Id In (
Select Min( Z.Id ) As Id
From (
Select Z1.id, Group_Concat( Z1.referenceid Order By Z1.referenceid ) As signature
From (
Select id, referenceid
From MyTable
Order By id, referenceid
) As Z1
Group By Z1.id
) As Z
Group By Z.Signature
)
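The signature idea behind the query above can be sketched in Python as follows. This is an illustration using the sample rows from the question, with a set of referenceids standing in for the concatenated string; the variable names are made up for the example.

```python
rows = [(1, 10), (1, 11), (2, 11), (2, 10), (3, 10), (3, 11), (3, 12)]

# Build one signature per id: the set of its referenceids.
refs_by_id = {}
for id_, ref in rows:
    refs_by_id.setdefault(id_, set()).add(ref)

# Keep the smallest id for each distinct signature (the Min( Z.Id ) step).
keeper = {}
for id_ in sorted(refs_by_id):
    keeper.setdefault(frozenset(refs_by_id[id_]), id_)
kept_ids = set(keeper.values())

# Return only the rows that belong to a kept id.
consolidated = sorted((i, r) for i, r in rows if i in kept_ids)
print(consolidated)
```

Using a set makes the signature independent of row order, which is the property the ordered concatenation in the SQL version has to provide.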
-- generate count of elements for each distinct id
with Counts as (
select
id,
count(1) as ReferenceCount
from
tblReferences R
group by
R.id
)
-- generate every pairing of two different id's, along with
-- their counts, and how many are equivalent between the two
,Pairings as (
select
R1.id as id1
,R2.id as id2
,C1.ReferenceCount as count1
,C2.ReferenceCount as count2
,sum(case when R1.referenceid = R2.referenceid then 1 else 0 end) as samecount
from
tblReferences R1 join Counts C1 on R1.id = C1.id
cross join
tblReferences R2 join Counts C2 on R2.id = C2.id
where
R1.id < R2.id
group by
R1.id, C1.ReferenceCount, R2.id, C2.ReferenceCount
)
-- generate the list of ids that are safe to remove by picking
-- out any id's that have the same number of matches, and same
-- size of list, which means their reference lists are identical.
-- since id2 > id, we can safely remove id2 as a copy of id, and
-- the smallest id of which all id2 > id are copies will be left
,RemovableIds as (
select
distinct id2 as id
from
Pairings P
where
P.count1 = P.count2 and P.count1 = P.samecount
)
-- validate the results by just selecting to see which id's
-- will be removed. can also include id in the query above
-- to see which id was identified as the copy
select id from RemovableIds R
-- comment out `select` above and uncomment `delete` below to
-- remove the records after verifying they are correct!
--delete from tblReferences where id in (select id from RemovableIds)
I am trying to create a report that has a summary for each group. For example:
ID NAME COUNT TOTAL TYPE
-------------------------------------------------------------
1 Test 1 10 A
2 Test 2 8 A
18
7 Mr. Test 9 B
12 XYZ 4 B
13
25 ABC 3 C
26 DEF 5 C
19 GHIJK 1 C
9
I have a query that can do everything except the TOTAL columns:
select sd.id DATA_REF_NUM ID, count(sd.DATA_DEF_ID) COUNT, defs.data_name NAME, sd.type
from some_data sd, data_defs defs
where sd.data_def_id = defs.data_def_id
group by some_data.type, some_data.id, defs.data_nam
order by some_data.id asc, count(amv.MSG_ID) desc ;
I'm just not sure how to get a summary on a group. In this case, I'm trying to get a sum of COUNT for each group of ID.
UPDATE:
Groups are by type. Forgot that in the original post.
TOTAL is SUM(COUNT) for each group.
How about using ROLLUP like...
select sd.id ID, count(sd.DATA_DEF_ID) COUNT, defs.data_name NAME, sd.type
from some_data sd, data_defs defs
where sd.data_def_id = defs.data_def_id
group by ROLLUP(sd.type, (sd.id, defs.data_name))
order by sd.id asc, count(sd.DATA_DEF_ID) desc ;
This works for a similar example in my database, but I only did it over two columns, not sure how it will function over more...
Hope this is helpful,
Craig...
EDIT: In a ROLLUP, columns you want to sum over but not subtotal over, like id and data_name, should be lumped together inside the ROLLUP in parentheses.
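To show what rows a ROLLUP over (type, (id, name)) is expected to add, here is a small Python sketch. It is not Oracle code; it assumes the sample report rows from the question and simply emits the detail rows, one subtotal row per type, and a grand total row.

```python
from itertools import groupby

# (id, name, count, type) rows taken from the sample report in the question.
rows = [
    (1, "Test 1", 10, "A"), (2, "Test 2", 8, "A"),
    (7, "Mr. Test", 9, "B"), (12, "XYZ", 4, "B"),
    (25, "ABC", 3, "C"), (26, "DEF", 5, "C"), (19, "GHIJK", 1, "C"),
]

report = []
by_type = lambda r: r[3]
for type_, group in groupby(sorted(rows, key=by_type), key=by_type):
    group = list(group)
    report.extend(group)                                           # detail rows
    report.append((None, None, sum(r[2] for r in group), type_))   # subtotal per type
report.append((None, None, sum(r[2] for r in rows), None))         # grand total

for row in report:
    print(row)
```

Because id and name are treated as one unit, there is no extra subtotal level per id; the only added rows are the per-type subtotals and the grand total.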
Assuming SQL*Plus, you could do something like this:
col d1 noprint
col d2 noprint
WITH q AS
(SELECT sd.id, count(sd.DATA_DEF_ID) COUNT, defs.data_name NAME, sd.type
FROM some_data sd JOIN data_defs defs ON (sd.data_def_id = defs.data_def_id)
GROUP BY sd.type, sd.id, defs.data_name)
-- both branches of the UNION ALL must return the same number of columns
SELECT 1 d1, type d2, id, count, name, null total FROM q
UNION ALL
SELECT 2, type, null, null, null, SUM(count) FROM q GROUP BY type
ORDER BY 2,1,3;
I can't make this work in PL/SQL Developer 8, only SQL*Plus. Not even the command window will work...
Try a subquery that returns the count of all the items of the type:
select sd.id ID, count(sd.DATA_DEF_ID) COUNT,
-- scalar subquery: an inline view in the FROM clause cannot reference sd.type
(select count(sd2.DATA_DEF_ID)
from some_data sd2
where sd2.type = sd.type) TOTAL_FOR_TYPE,
defs.data_name NAME, sd.type
from some_data sd, data_defs defs
where sd.data_def_id = defs.data_def_id
group by sd.type, sd.id, defs.data_name
order by sd.id asc, count(sd.DATA_DEF_ID) desc ;
I have a bit of an SQL problem. Here are my tables:
areas(id, name, sla_id)
areas_groups(id, group_id, area_prefix)
The sla_id is an identifier from a different source - it is unique, but areas has its own auto-incrementing primary key.
The area_prefix field is the interesting one. It just contains the first few digits of the sla_id and is unique. Each area can only exist in one group, so the area belongs to the group with the most specific prefix. Example:
Group 12's area prefixes: 105, 110, 115, 805
Group 13's area prefixes: 1, 8
Area sla_id = 10533071 matches both group 12 (105*) and group 13 (1*)
"105" is longer, so this area is in group 12
Area sla_id = 81031983 matches only group 13 (8*)
The reason it's done like this is so we can easily make a "catch-all" group for areas which don't fall into any other group.
I can find which group an area is in like this:
-- eg: area with sla_id 105055200
SELECT * FROM (
SELECT group_id
FROM areas_groups
WHERE SUBSTR('105055200', 0, LENGTH(area_prefix)) = area_prefix
ORDER BY LENGTH(area_prefix) DESC
)
WHERE rownum = 1;
(Did I mention this is Oracle?)
Going the other way is the tricky one: Given a group Id, I want to find all the areas which belong to that group. That is, given group 13, I want all the areas that start with 1 or 8 but not 105, 110, 115 or 805 (in this example).
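The matching rule in both directions can be sketched in Python like this. It is only a model of the logic using the example prefixes from above, not Oracle code, and the function names group_for and areas_in_group are made up for the illustration.

```python
# Prefix -> group_id, from the example above.
prefixes = {"105": 12, "110": 12, "115": 12, "805": 12, "1": 13, "8": 13}

def group_for(sla_id):
    """Return the group whose prefix is the longest match, or None if none match."""
    matches = [p for p in prefixes if sla_id.startswith(p)]
    return prefixes[max(matches, key=len)] if matches else None

def areas_in_group(group_id, sla_ids):
    """Reverse lookup: the areas whose most specific prefix belongs to group_id."""
    return [s for s in sla_ids if group_for(s) == group_id]

print(group_for("10533071"), group_for("81031983"))
print(areas_in_group(13, ["10533071", "81031983", "12000000"]))
```

The reverse lookup is expressed here as "resolve every area to its best group, then filter", which is the same shape as picking the longest matching prefix per area and keeping only the rows for the wanted group.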
The closest I've come is this:
SELECT a.id, a.sla_id, MAX(LENGTH(ag.area_prefix)), ag.group_id
FROM areas a INNER JOIN areas_groups ag
ON (SUBSTR(a.sla_id, 0, LENGTH(ag.area_prefix)) = ag.area_prefix)
WHERE a.sla_id IS NOT NULL
GROUP BY a.id, a.sla_id, ag.group_id
That returns data like this:
id sla_id leng group_id
583 105308400 3 12
583 105308400 1 13
584 105556700 3 12
584 105556700 1 13
So if I could only grab the group_id which has the longest length for each id... I have a feeling that I'm really close but just missing a tiny little thing... Can anyone help put me out of my misery?
select id
, sla_id
, leng
, group_id
from
(
-- rn has to be computed in an inner query before it can be filtered on
select id, sla_id, leng, group_id
, row_number() over (partition by id order by leng desc) rn
from
(
SELECT a.id, a.sla_id, MAX(LENGTH(ag.area_prefix)) leng, ag.group_id
FROM areas a INNER JOIN areas_groups ag
ON (SUBSTR(a.sla_id, 0, LENGTH(ag.area_prefix)) = ag.area_prefix)
WHERE a.sla_id IS NOT NULL
GROUP BY a.id, a.sla_id, ag.group_id
)
)
where rn = 1
This is untested on Oracle, but I believe Oracle has supported COALESCE as a string function since version 9 so this should be OK unless you're working on an old version of Oracle.
I have assumed that there is also a group of area_prefix records with two characters.
select a.id
,a.sla_id
,coalesce(ag3.area_prefix,ag2.area_prefix,ag1.area_prefix) area_prefix
,coalesce(ag3.group_id,ag2.group_id,ag1.group_id) group_id
from areas a
left join areas_groups ag3
on substr(a.sla_id,1,3) = ag3.area_prefix
left join areas_groups ag2
on substr(a.sla_id,1,2) = ag2.area_prefix
left join areas_groups ag1
on substr(a.sla_id,1,1) = ag1.area_prefix