Match single relationship - cypher

This is a follow-up question to "How to efficiently find multiple relationship size".
Assume we are working with the movie sample graph.
When running the query below
MATCH ()-[:PRODUCED]-() RETURN count(*)
we know that there are 15 PRODUCED relationships.
From the query
MATCH (n)-->(m)
WITH n,m, COUNT(*) as cnt
WHERE cnt=3 RETURN *
We know there are 2 PRODUCED relationships that connect 2 nodes with additional relationships.
How can we find the PRODUCED relationships that are the only relationship between their two nodes (i.e. the nodes are connected by nothing except the PRODUCED relationship)?

Our solution was using this query
MATCH p=(n)-[:PRODUCED]->(m)
WHERE size((n)--(m))=1
RETURN count(p)
which returns 9 (the total is 15, of which 2 have size=3 and 4 have size=2)
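The counting logic can be checked outside Neo4j. Below is a minimal Python sketch over a made-up edge list (the node names and relationships are hypothetical, not the movie sample graph). It counts how many relationships connect each unordered node pair, then keeps the relationships of one type whose pair count is exactly 1, which is the idea behind size((n)--(m))=1.

```python
from collections import Counter

# Hypothetical toy edge list: (start node, relationship type, end node).
EDGES = [
    ("p1", "PRODUCED", "m1"),
    ("p1", "DIRECTED", "m1"),
    ("p2", "PRODUCED", "m1"),
    ("p2", "PRODUCED", "m2"),
    ("p3", "PRODUCED", "m2"),
    ("p3", "WROTE", "m2"),
    ("p3", "DIRECTED", "m2"),
]


def count_exclusive(edges, rel_type):
    """Count rel_type edges whose two endpoints share no other relationship."""
    # Direction is ignored, like the undirected pattern (n)--(m).
    pair_sizes = Counter(frozenset((start, end)) for start, _, end in edges)
    return sum(
        1
        for start, kind, end in edges
        if kind == rel_type and pair_sizes[frozenset((start, end))] == 1
    )


print(count_exclusive(EDGES, "PRODUCED"))
```

In this toy data only the two PRODUCED relationships starting at p2 qualify, because the p1 and p3 pairs carry additional relationships.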


Recursive query within recursive query

I would like to solve a problem consisting of 2 recursions.
In one of the 2 recursions I find out the answer to one question which is "What is the leaf member of a specific input (template)?" This is already solved.
In a second recursion I would like to run this query for a number of other inputs (templates).
1st part of the problem:
I have a tree and would like to find the leaf of it. This part of the recursion can be solved using this query:
with recursive full_tree as (
select id, "previousVersionId", 1 as level
from template
where
template."id" = '5084520a-bb07-49e8-b111-3ea8182dc99f'
union all
select c.id, c."previousVersionId", p.level + 1
from template c
inner join full_tree p on c."previousVersionId" = p.id
)
select * from full_tree
order by level desc
limit 1
The query output is one record including the leaf id I'm interested in. This is fine.
2nd part of the query:
Here's the problem. I would like to run the first query n times.
Currently I can run the query only for a single id ('5084520a-bb07-49e8-b111-3ea8182dc99f' in the example). But what if I have a list of 100 such ids?
My ultimate goal is to get one id in response (the leaf id) for each of the 100 template ids in the list.
In theory, a query that allows me to run the above query for each of my, e.g., 100 template ids would solve my problem.
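One possible approach, sketched here under assumptions, is to seed the recursive CTE with the whole list of ids and carry the starting id along as an extra column, so each row remembers which input it belongs to. The example uses Python's sqlite3 module only to stay runnable; the root_id column, the helper names and the toy rows are all hypothetical, and it assumes each template has a single chain of later versions. The same CTE shape should carry over to other databases that support recursive CTEs, but that has not been verified here.

```python
import sqlite3


def leaf_per_template(conn, template_ids):
    """Return {starting template id: id of its deepest descendant}."""
    ids = list(template_ids)
    placeholders = ",".join("?" for _ in ids)
    sql = f"""
        WITH RECURSIVE full_tree(root_id, id, level) AS (
            -- Anchor: one row per requested template, remembering its own id.
            SELECT id, id, 1
            FROM template
            WHERE id IN ({placeholders})
            UNION ALL
            -- Step: follow newer versions, keeping the original root_id.
            SELECT p.root_id, c.id, p.level + 1
            FROM template c
            JOIN full_tree p ON c."previousVersionId" = p.id
        )
        SELECT root_id, id, level FROM full_tree ORDER BY level
    """
    leaves = {}
    for root_id, node_id, _level in conn.execute(sql, ids):
        # Rows arrive by ascending level, so the last write per root is deepest.
        leaves[root_id] = node_id
    return leaves


def demo_connection():
    """Build a hypothetical in-memory table with two version chains."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE template (id TEXT PRIMARY KEY, "previousVersionId" TEXT)'
    )
    conn.executemany(
        "INSERT INTO template VALUES (?, ?)",
        [("a", None), ("b", "a"), ("c", "b"), ("x", None), ("y", "x")],
    )
    return conn


print(leaf_per_template(demo_connection(), ["a", "x"]))
```

With the toy chains a -> b -> c and x -> y, the leaf for a is c and the leaf for x is y.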

This is the query I am trying to run in neo4j but it takes too long to run:

I am trying to run this query using Neo4j but it takes too long (more than 30 min, for almost 2500 nodes and 1.8 million relationships) to run:
Match (a:Art)-[r1]->(b:Art)
with collect({start:a.url,end:b.url,score:r1.ed_sc}) as row1
MATCH (a:Art)-[r1]->(b:Art)-[r2]->(c:Art)
Where a.url<>c.url
with row1 + collect({start:a.url,end:c.url,score:r1.ed_sc*r2.ed_sc}) as row2
Match (a:Art)-[r1]->(b:Art)-[r2]->(c:Art)-[r3]->(d:Art)
WHERE a.url<>c.url and b.url<>d.url and a.url<>d.url
with row2+collect({start:a.url,end:d.url,score:r1.ed_sc*r2.ed_sc*r3.ed_sc}) as allRows
unwind allRows as row
RETURN row.start as start ,row.end as end , sum(row.score) as final_score limit 10;
Here :Art is the label under which there are 2500 nodes, and there are bidirectional relationships between these nodes, each of which has a property called ed_sc. So basically I am trying to find the score between two nodes by traversing one-, two- and three-degree paths, and then summing these scores.
Is there a more optimized way to do this?
For one, I'd discourage the use of bidirectional relationships. If your graph is densely connected, this kind of modeling will play havoc with most queries like this.
Assuming url is unique for each :Art node, it would be better to compare the nodes themselves rather than their properties.
We should also be able to use variable-length relationships in place of your current approach:
MATCH p = (start:Art)-[*..3]->(end:Art)
WHERE all(node in nodes(p) WHERE single(t in nodes(p) where node = t))
WITH start, end, reduce(score = 1, rel in relationships(p) | score * rel.ed_sc) as score
WITH start, end, sum(score) as final_score
LIMIT 10
RETURN start.url as start, end.url as end, final_score
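To make the intent of the variable-length query concrete, here is a minimal Python sketch of the same computation on a toy graph: enumerate every path of up to three hops that never revisits a node, multiply the ed_sc values along it, and sum the products per (start, end) pair. The edge list and function name are hypothetical, and this is only an illustration of the logic, not a replacement for running it in Neo4j.

```python
from collections import defaultdict

# Hypothetical toy graph: (start, end, ed_sc).
EDGES = [
    ("a", "b", 0.5),
    ("b", "c", 0.5),
    ("a", "c", 0.25),
    ("c", "a", 0.5),
]


def path_score_sums(edges, max_hops=3):
    """Sum, per (start, end), the score products of all simple paths."""
    outgoing = defaultdict(list)
    for start, end, score in edges:
        outgoing[start].append((end, score))

    totals = defaultdict(float)

    def walk(start, node, visited, product, hops):
        for nxt, score in outgoing.get(node, []):
            if nxt in visited:
                continue  # same rule as the all(... single(...)) predicate
            new_product = product * score
            totals[(start, nxt)] += new_product
            if hops + 1 < max_hops:
                walk(start, nxt, visited | {nxt}, new_product, hops + 1)

    for start in list(outgoing):
        walk(start, start, {start}, 1.0, 0)
    return dict(totals)


print(path_score_sums(EDGES))
```

For the pair (a, c) the direct edge contributes 0.25 and the path a -> b -> c contributes 0.5 * 0.5 = 0.25, giving 0.5; paths that loop back to their start are skipped.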

How can I find the number of disk accesses needed for a relational algebra query?

Hello, are any of you very nice people able to explain the concept of query optimization to me with regard to relational algebra?
My preferred method of constructing relational algebra queries is to use temporary values step by step, but the only resources I can find that explain how query optimization determines the number of disk accesses needed use a different notation for relational algebra queries, which confuses me.
So, if I am given the following relations:
department(deptNo, deptName, location)
Employee(empNo, empName, empAddress, jobDesc, deptNo*)
and have produced the following relational algebra query to find all the programmers who work in a Manchester department:
temp1 = department JOIN employee
temp2 = SELECT(jobDesc = 'programmer')(temp1)
result = SELECT(location = 'Manchester')(temp2)
And if I can assume that there are 10,000 tuples in the employee relation, 50 tuples in the department relation, 100 programmers (2 in each department) and one department located in Manchester, how would I work out how many disk accesses are needed?
Thank you in advance!
Yup - Gordon's right. However, this is an academic exercise: you're building sets of data, so assume each element/tuple returned by a sub-query is one disk access. General rule of thumb: filter out as much data as possible as early as possible. Let's assume you do the JOIN first (10000 employees + 50 departments = 10050 disk entries {even though the number of rows returned is 10000!}), then you do the SELECTs (assuming that the sub-query is perfectly indexed) = (100 programmers + 1 department in Manchester); thus the total number of "accesses" = 10050+101 = 10151.
If you do the SELECTS first, the whole exercise changes dramatically: (temp 1=get programmers = 100 rows/disk accesses), (temp 2=get departments = 1 row/disk access), JOIN (again, assuming perfect indexing on temporary views/queries, etc, etc) = 50 rows: therefore total number of "accesses" = 100+1+50 = 151.
Same results, but the way it is interpreted and executed can influence the amount of work the database engine has to perform.
It's been many years, so there is every possibility that I have got this wrong - I don't mind being corrected.
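The two totals above can be reproduced with a few lines of Python. This is only a worked version of the simplified cost model used in this answer (one access per tuple touched, perfect indexing assumed); the constant and function names are made up for illustration, and the join cost of 50 in the second plan is taken directly from the figure quoted above.

```python
# Figures from the question.
EMPLOYEES = 10_000
DEPARTMENTS = 50
PROGRAMMERS = 100
MANCHESTER_DEPARTMENTS = 1


def join_first_cost():
    """JOIN everything, then apply both SELECTs to the result."""
    join_cost = EMPLOYEES + DEPARTMENTS
    select_cost = PROGRAMMERS + MANCHESTER_DEPARTMENTS
    return join_cost + select_cost


def select_first_cost(join_cost=50):
    """Apply both SELECTs first, then JOIN the much smaller inputs."""
    return PROGRAMMERS + MANCHESTER_DEPARTMENTS + join_cost


print(join_first_cost(), select_first_cost())
```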

SQL Server 2008: Recursive query where hierarchy isn't strict

I'm dealing with a large multi-national corp. I have a table (oldtir) that shows ownership of subsidiaries. The fields for this problem are:
cID - PK for this table
dpm_sub - FK for the subsidiary company
dpm_pco - FK for the parent company
year - the year in which this is the relationship (because they change over time)
There are other fields, but not relevant to this problem. (Note that there are no records to specifically indicate the top-level companies, so we have to figure out which they are by having them not appear as subsidiaries.)
I've written the query below:
with CompanyHierarchy([year], dpm_pco, dpm_sub, cID)
as (select distinct oldtir.[year], cast(' ' as nvarchar(5)) as dpm_pco, oldtir.dpm_pco as dpm_sub, cast(0 as float) as cID
from oldtir
where oldtir.dpm_pco not in
(select dpm_sub from oldtir oldtir2
where oldtir.[year] = oldtir2.[year]
and oldtir2.dpm_sub <> oldtir2.dpm_pco)
and oldtir.[year] = 2011
union all
select oldtir.[year], oldtir.dpm_pco, oldtir.dpm_sub, oldtir.cID
from oldtir
join CompanyHierarchy
on CompanyHierarchy.dpm_sub = oldtir.dpm_pco
and CompanyHierarchy.[year] = oldtir.[year]
where oldtir.[year] = 2011
)
select distinct CompanyHierarchy.[Year],
CompanyHierarchy.[dpm_pco],
CompanyHierarchy.dpm_sub
from CompanyHierarchy
order by 1, 2, 3
It fails with msg 530: "The maximum recursion 100 has been exhausted before statement completion."
I believe the problem is that the relationships in the table aren't strictly hierarchical. Specifically, one subsidiary can be owned by more than one company, and you can even have the situation where A owns B and part of C, and B also owns part of C. (One of the other fields indicates percent of ownership.)
For the time being, I've solved the problem by adding a field to track level, and arbitrarily stopping after a few levels. But this feels kludgy to me, since I can't be sure of the maximum number of levels.
Any ideas how to do this generically?
Thanks,
Tamar
Thanks to the commenters. They made me go back and look more closely at the data. There were, in fact, errors in the data, which led to infinite recursion. Fixed the data and the query worked just fine.
Add the OPTION clause and see if it makes a difference. MAXRECURSION accepts values up to 32,767, and a value of 0 removes the limit entirely:
select distinct CompanyHierarchy.[Year],
CompanyHierarchy.[dpm_pco],
CompanyHierarchy.dpm_sub
from CompanyHierarchy
order by 1, 2, 3
option (maxrecursion 0)
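Since the root cause turned out to be bad rows that made the recursion loop forever, it can help to check the ownership data for cycles before running the CTE. The sketch below is a hypothetical Python helper (not part of either answer) that takes (dpm_pco, dpm_sub) pairs for one year and returns one cycle if any exists. Shared ownership, such as A owning B and C while B also owns C, is not a cycle and is left alone. It is recursive, so very deep hierarchies might need an iterative version.

```python
def find_cycle(rows):
    """rows: iterable of (parent, subsidiary). Return one cycle or None."""
    children = {}
    for parent, sub in rows:
        children.setdefault(parent, []).append(sub)

    IN_PROGRESS, DONE = 1, 2
    state = {}

    def visit(node, path):
        state[node] = IN_PROGRESS
        path.append(node)
        for child in children.get(node, []):
            if state.get(child) == IN_PROGRESS:
                # The child is already on the current path: a cycle.
                return path[path.index(child):] + [child]
            if child not in state:
                found = visit(child, path)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in list(children):
        if node not in state:
            found = visit(node, [])
            if found:
                return found
    return None


print(find_cycle([("A", "B"), ("A", "C"), ("B", "C")]))
print(find_cycle([("A", "B"), ("B", "C"), ("C", "A")]))
```

A row where a company is listed as its own parent is reported as a cycle of length one, since the recursive member of the CTE would keep rejoining it.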

A simple MDX question

I am new to MDX and I know that this must be a simple question but I haven't been able to find an answer.
I am modeling a questionnaire that has questions and answers. What I am trying to achieve is to find out the number of people who gave specific answers to questions, e.g. the number of males aged between 20-25.
When I run the query below for the questions individually the correct result is returned
SELECT
[Measures].[Fact Demographics Count] ON Columns
FROM
[Dsv All]
WHERE
[Answer].[Dim Answer].&[1]
[Measures].[Fact Demographics Count] is a count of the primary key column
[Answer].[Dim Answer].&[1] is the key for the Male answer
Result for number of people who are male = 150
Result for number of people who are between 20-25 = 12
But when I run the next query below, rather than getting the number of people who are male and aged between 20-25, I get the sum of the number of people who are male and the number of people who are between 20-25.
SELECT
[Measures].[Fact Demographics Count] ON Columns
FROM
[Dsv All]
WHERE
{[Answer].[Dim Answer].&[1],[Answer].[Dim Answer].&[9]}
result = 162
The structure of the fact table is
FactDemographicsKey,
RespodentKey,
QuestionKey,
AnswerKey
Any help would be greatly appreciated
Thanks
Take a look at the MDX function FILTER - this may give you what you need. A combination of FILTER and member properties to filter against the IDs might do it. You're having a problem because what you're trying to do goes a little against the grain of how an OLAP cube is structured (in my experience). Age and Gender are both members of the same dimension (Answers), which means that they each get their own cells for aggregation. Unlike the case where Age and Gender are each on their own dimension, they don't get aggregated with respect to one another except by being added together. In an OLAP cube, each combination of a member of one dimension with a member of every other dimension gets a "cell" holding the value of each measure for that combination. That is what you want, but members of the same dimension (such as Answers) aren't cross-calculated in that way.
If possible, I would recommend breaking out the individual answers into individual dimensions, i.e. Age and Gender each have their own dimension with their own members; then what you want to do will naturally flow out of your cube. Otherwise, I'm afraid you will have lots of MDX fiddlery to do. (I am not an MDX expert, though, so I could be completely off base on this one, but that is my understanding.)
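The difference between adding the two answer counts and intersecting them per respondent can be illustrated outside the cube. The Python sketch below uses a made-up fact table of (respondent key, answer key) rows; the data and function names are hypothetical. Counting rows whose answer is in the set gives the sum (the 150 + 12 = 162 behaviour from the question), while counting respondents who hold all the requested answers gives the intersection.

```python
# Hypothetical fact rows: (respondent key, answer key).
# Answer 1 stands for "Male" and answer 9 for "20-25", as in the question.
FACT = [(1, 1), (1, 9), (2, 1), (3, 9), (4, 1), (4, 5)]


def count_rows_for_any(fact, answer_keys):
    """Count fact rows matching any of the answers (what a set slicer does)."""
    return sum(1 for _, answer in fact if answer in answer_keys)


def count_respondents_with_all(fact, answer_keys):
    """Count respondents who gave every one of the answers."""
    answers_by_respondent = {}
    for respondent, answer in fact:
        answers_by_respondent.setdefault(respondent, set()).add(answer)
    wanted = set(answer_keys)
    return sum(1 for given in answers_by_respondent.values() if wanted <= given)


print(count_rows_for_any(FACT, {1, 9}))
print(count_respondents_with_all(FACT, {1, 9}))
```

In the toy data three rows are "Male" and two are "20-25", so the first count is 5, but only respondent 1 gave both answers, so the second count is 1.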
Also, definitely read the book previously mentioned, MDX Solutions, unless this is the only MDX query you think you'll need to write. MDX and Multidimensional analysis are nothing like SQL, and a solid understanding of the structure of an OLAP database and MDX in general is absolutely essential, and that book does a very, very nice job of getting you where you need to be in that department.
When trying to figure out problems with where-criteria or slices, I find it helpful to break down the items that you're slicing on into dimensions, rather than measures.
select
[Measures].[Fact Demographics Count] on Columns
from [Dsv All]
where
(
[Answer].[Dim Answer].&[1],
[Dim Age Band].[20-25]
)
Although then you're not really using the power of MDX - you're getting just one value.
select
[Dim Answer].Members on Columns,
[Dim Age Band].Members on Rows
from [Dsv All]
where ( [Measures].[Fact Demographics Count] )
This will give you a pivot table (or crosstable) breaking down gender (on columns) by age bands (on rows).
BTW - if you're learning MDX, this book, MDX Solutions, is far and away the best starting point that I've found.
Firstly, thanks to everyone for their replies. This was an interesting one to solve, and for anyone new to MDX and coming from SQL it's an easy trap to fall into.
So for those interested here is a brief overview of the solution.
I have 3 tables
factDemographics: holds respondents and their answers (who answered what)
dimAnswer: the answers
dimRespondent: the respondents
In the datasource view for the cube I duplicated factDemographics 5 times using Named Queries and I named these fact1, fact2, ..., fact5. (which will create 5 measure groups)
Using Visual Studio's create cube wizard I set the following fact tables
fact1, fact2, ... as fact tables
dimRespondent as a fact table. I use this table to get the number of respondents.
Removed the original factDemographics table.
Once the cube was created I duplicated the dimAnswer dimension 5 times, naming them filter1, filter2, ...
Finally in the Cube Structure's Dimension Usage tab I linked these together as follows
filter1 many to many dimRespondent
filter2 many to many dimRespondent
filter3 many to many dimRespondent
filter4 many to many dimRespondent
filter5 many to many dimRespondent
filter1 regular relationship fact1
filter2 regular relationship fact2
filter3 regular relationship fact3
filter4 regular relationship fact4
filter5 regular relationship fact5
This now enables me to rewrite the query I used in my original post as
SELECT
[Measures].[Dim Respondent Count] On 0
FROM
[DemographicsCube]
WHERE
(
[Filter1].[Answer].&[Male],
[Filter2].[Answer].&[20-25]
)
My query can now be filtered by up to 5 questions.
Although this works I'm sure that there is a more elegant solution. If anyone knows what that is I'd love to hear it.
Thanks
If you are using MSSQL, you can use the "WITH ROLLUP" to get some extra rows which would have the information you want. Also, you are not using a "GROUP BY" which you will need.
Use the GROUP BY to break up the set into groups and then use aggregate functions to get your counts and other stats.
Example:
select AGE, GENDER, count(1)
from MY_TABLE
group by AGE, GENDER
with rollup
This would give you the number of people of each gender in each age group, and the "rollup" would add the total number of people in your table and the numbers in each age group regardless of gender. The numbers of each gender regardless of age (the M and F rows below) additionally require "with cube" in place of "with rollup". Something like
AGE  GENDER  COUNT
---  ------  -----
20   M        1245
21   M        1012
20   F         942
21   F         838
     M        2257
     F        1780
20            2187
21            1850
              4037
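The subtotal rows in that table can be checked with a small sketch. The Python below takes the four base counts from the example output and computes every subtotal combination (per age, per gender and the grand total), which is the full set a cube-style grouping produces. The function name and the use of None for "all values" are illustrative choices, not SQL Server behaviour being reproduced exactly.

```python
from collections import Counter

# Base counts from the example output: (age, gender) -> count.
BASE = {
    (20, "M"): 1245,
    (21, "M"): 1012,
    (20, "F"): 942,
    (21, "F"): 838,
}


def cube_totals(base):
    """Return counts for every (age, gender) grouping; None means 'all'."""
    totals = Counter()
    for (age, gender), count in base.items():
        totals[(age, gender)] += count   # detail row
        totals[(age, None)] += count     # per age, any gender
        totals[(None, gender)] += count  # per gender, any age
        totals[(None, None)] += count    # grand total
    return dict(totals)


for key, count in cube_totals(BASE).items():
    print(key, count)
```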