For later processing with the CAPS project, I need to merge two different graphs into one:
Graph3 = Graph1 + Graph2
I searched for ways to do that and found unionAll, but it doesn't work as I expected. Is there another way to do this with Cypher?
Example:
val Graph1 = session.cypher("""
  |FROM GRAPH mergeGraph
  |MATCH (from)-[via]->(to)
  |WHERE substring(from.geohash, 0, 5) = substring(to.geohash, 0, 5)
  |CONSTRUCT
  |  CREATE (h1:HashNode {geohash: substring(from.geohash, 0, 5)})-[COPY OF via]->(h1)
  |RETURN GRAPH
  """.stripMargin).graph
which contains this pattern:
val Graph2 = session.cypher("""
  |FROM GRAPH mergeGraph
  |MATCH (from)-[via]->(to)
  |WHERE substring(from.geohash, 0, 5) <> substring(to.geohash, 0, 5)
  |CONSTRUCT
  |  CREATE (:HashNode {geohash: substring(from.geohash, 0, 5)})-[COPY OF via]->(:HashNode {geohash: substring(to.geohash, 0, 5)})
  |RETURN GRAPH
  """.stripMargin).graph
which contains this pattern:
With unionAll:
val Graph3 = Graph1.unionAll(Graph2)
I get this graph:
As you can see, the green nodes are the nodes of Graph2, without any relationships! That's not what I expected.
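One alternative that may be worth trying instead of unionAll (a sketch, not verified: it assumes both graphs are registered in the session catalog, and that the CONSTRUCT ... ON clause of CAPS multiple-graph Cypher is available, which builds the new graph on top of the union of the listed base graphs):

// Hypothetical catalog names "g1" and "g2" for the two constructed graphs.
session.catalog.store("g1", Graph1)
session.catalog.store("g2", Graph2)

// CONSTRUCT ... ON starts from the union of the base graphs, so the
// relationships of both inputs should be carried over into Graph3.
val Graph3 = session.cypher("""
  |CONSTRUCT ON g1, g2
  |RETURN GRAPH
  """.stripMargin).graph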
I have the following table:
postgres=# \d so_rum;
                         Table "public.so_rum"
  Column   |          Type           | Collation | Nullable | Default
-----------+-------------------------+-----------+----------+---------
 id        | integer                 |           |          |
 title     | character varying(1000) |           |          |
 posts     | text                    |           |          |
 body      | tsvector                |           |          |
 parent_id | integer                 |           |          |
Indexes:
    "so_rum_body_idx" rum (body)
I wanted to do a phrase search query, so I came up with the query below, for example:
select id from so_rum
where body ## phraseto_tsquery('english','Is it possible to toggle the visibility');
This gives me results, but it only matches the exact phrase. However, there are documents where the distance between lexemes is greater, and the above query doesn't return those. For example, 'it is something possible to do toggle between the ... visibility' doesn't get returned. I know I can get it returned by manually passing a distance operator such as <2> to to_tsquery.
But I wanted to understand how to do this in the SQL statement itself, so that I get results first with a distance of 1, then 2, and so on (maybe up to 6-7), and finally append the results that merely match all of the search words, as in the following query:
select count(id) from so_rum
where body ## to_tsquery('english','string & string . . . ')
Is it possible to do this in a single query with good performance?
I don't see a canned solution to this. It sounds like you need to use plainto_tsquery to get all the results containing all the lexemes, and then implement your own custom ranking function to rank them by the distance between the lexemes, and maybe filter out ones where the lexemes appear in the wrong order.
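There is no built-in way to relax the phrase distance step by step, but the first part can be emulated in one statement (a sketch, using the standard @@ match operator and only two hardcoded lexemes for brevity; the plain to_tsquery fallback from the question could be appended with UNION ALL):

-- For each row, keep the smallest distance n (1..7) at which the
-- phrase query matches; smaller n means a tighter phrase match.
SELECT DISTINCT ON (id) id, n AS distance
FROM generate_series(1, 7) AS n
CROSS JOIN so_rum
WHERE body @@ to_tsquery('english', 'toggle <' || n || '> visibility')
ORDER BY id, n;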
According to the mv-expand documentation:
Expands multi-value array or property bag.
mv-expand is applied on a dynamic-typed column so that each value in the collection gets a separate row. All the other columns in an expanded row are duplicated.
Just like the mv-expand operator creates a row for each element in the list, is there an equivalent operator/way to make each element in a list an additional column?
I checked the documentation and found Bag_Unpack:
The bag_unpack plugin unpacks a single column of type dynamic by treating each property bag top-level slot as a column.
However, it doesn't seem to work on a list; rather, it works on top-level JSON properties.
Using bag_unpack (like the below query):
datatable(d:dynamic)
[
dynamic({"Name": "John", "Age":20}),
dynamic({"Name": "Dave", "Age":40}),
dynamic({"Name": "Smitha", "Age":30}),
]
| evaluate bag_unpack(d)
It will do the following:
|Name  |Age|
|------|---|
|John  |20 |
|Dave  |40 |
|Smitha|30 |
Is there a command/way (see some_command_which_helps below) I can use to achieve the following (convert a list to columns):
datatable(d:dynamic)
[
dynamic(["John", "Dave"])
]
| evaluate some_command_which_helps(d)
That translates to something like:
|Col1|Col2|
|----|----|
|John|Dave|
Is there an equivalent where I can convert a list/array to multiple columns?
For reference: We can run the above queries online on Log Analytics in the demo section if needed (however, it may require login).
You could try something along the following lines.
(That said, from an efficiency standpoint, you may want to check your options for restructuring the data set to begin with, using a schema that matches how you plan to actually consume/query it.)
datatable(d:dynamic)
[
dynamic(["John", "Dave"]),
dynamic(["Janice", "Helen", "Amber"]),
dynamic(["Jane"]),
dynamic(["Jake", "Abraham", "Gunther", "Gabriel"]),
]
| extend r = rand()               // random key used to regroup the expanded rows
| mv-expand with_itemindex = i d  // one row per array element, i = element index
| summarize b = make_bag(pack(strcat("Col", i + 1), d)) by r  // rebuild each row as a bag keyed Col1, Col2, ...
| project-away r
| evaluate bag_unpack(b)          // turn the bag keys into columns
which will output:
|Col1 |Col2 |Col3 |Col4 |
|------|-------|-------|-------|
|John |Dave | | |
|Janice|Helen |Amber | |
|Jane | | | |
|Jake |Abraham|Gunther|Gabriel|
To extract key-value pairs from text and convert them to columns, without hardcoding the key names in the query:
print message="2020-10-15T15:47:09 Metrics: duration=2280, function=WorkerFunction, count=0, operation=copy_into, invocationId=e562f012-a994-4fc9-b585-436f5b2489de, tid=lct_b62e6k59_prd_02, table=SALES_ORDER_SCHEDULE, status=success"
| extend Properties = extract_all(@"(?P<key>\w+)=(?P<value>[^, ]*),?", dynamic(["key","value"]), message)
| mv-apply Properties on (summarize make_bag(pack(tostring(Properties[0]), Properties[1])))
| evaluate bag_unpack(bag_)
| project-away message
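Which should produce a single row with one column per extracted key (column order may vary):

|duration|function      |count|operation|invocationId                        |tid                |table               |status |
|--------|--------------|-----|---------|------------------------------------|-------------------|--------------------|-------|
|2280    |WorkerFunction|0    |copy_into|e562f012-a994-4fc9-b585-436f5b2489de|lct_b62e6k59_prd_02|SALES_ORDER_SCHEDULE|success|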
id | name     | parent_id
---+----------+----------
ab | file     | de
ad | song     | de
bc | Bob      | ad
mn | open.txt | bc
Assuming ab is the ID of file and bc is its parent ID, then to store the data you can either use the bulk-insert utility, or you can use the following Cypher query:
CREATE (A {id:'ab', name: 'file'}), (B {id:'bc', name: 'folder'}), (A)-[:child]->(B)
To query, depending on the data you would like to extract, use a Cypher query similar to:
MATCH (c)-[:child]->(p) RETURN c,p
For the type of query you're running, I believe it would be better if you maintained a reverse edge [:parent] and modified your query as such:
GRAPH.QUERY Makinga "MATCH (r:Resource{Id:'6e3f67da-43ed-11e9-b149-d3f886f8337c'})-[:parent*1..]->(b:Resource) RETURN count(b) as count"
This is related to the way RedisGraph describes connections and applies filters.
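A minimal sketch of keeping the reverse edge in sync when linking two existing resources (the Id values are hypothetical placeholders):

GRAPH.QUERY Makinga "MATCH (c:Resource{Id:'child-id'}), (p:Resource{Id:'parent-id'}) CREATE (c)-[:parent]->(p), (p)-[:child]->(c)"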
I have an array-typed value I want to store in Postgres. One of the major use cases is to check whether any record's array contains a given string.
e.g.
| A | ["NY", "Paris", "Milan"] |
| B | ["Paris", "NY"] |
| C | [] |
| D | ["Milan"] |
Does there exist a row with 'Paris' in the array? Which rows have 'Milan' in the array? And so on.
I have two options for storing the column. I can either make it of type text[], or convert it into JSON such as {"cities": ["NY", "Paris", "Milan"]} and store it as a JSONB field.
However, I am not sure which would allow the fastest querying for this use case. Is either one obviously better? Am I tying myself down in any way by choosing one over the other? And having chosen one, how should I query the DB?
As you seem to be storing simple lists of values, I would recommend using the array datatype over JSON, which better fits more complex cases (nested data structures, associative arrays, ...).
To check for the value of an element at any position in the array, you can use the ANY() construct.
Here is a query that will return all records where the array stored in column cities contains 'Paris':
SELECT t.* FROM mytable t WHERE 'Paris' = ANY(t.cities);
Yields:
id cities
---------------------------
A ["NY","Paris","Milan"]
B ["Paris","NY"]
Demo on DB Fiddle
For more information:
Postgres Arrays Documentation
Postgres Arrays Tutorial
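One more performance note on the array option: the = ANY(...) form cannot use an index, but a GIN index on the column supports the equivalent containment check (a sketch against the same mytable):

-- A GIN index on the array column supports the @> containment operator:
CREATE INDEX mytable_cities_idx ON mytable USING GIN (cities);

-- Equivalent check that can use the index:
SELECT t.* FROM mytable t WHERE t.cities @> ARRAY['Paris'];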
I've noticed it is better to use JSONB when it is a simple key-value store, as in, for instance, when you want to store arbitrary info on a row and you're not sure what the columns (keys) would be:
info = {"a": "apple", "b": "ball"}
For use cases like yours, it would be better to design the DB with simple tables so you can use JOINs and indexes to your advantage.
You could restructure the tables like:
Location
id | name
----------
1 | Paris
2 | NY
3 | Milan
Other table (with a foreign key to the location table)
user | location_id
--------------------
A | 1
A | 3
B | 2
Using this set of tables, it would be easy to query all users with location Paris using JOINs, as sketched below.
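A minimal sketch of that query (the second table is unnamed above, so user_location is a hypothetical name; "user" is quoted because it is a reserved word):

-- All users that have 'Paris' among their locations
SELECT ul."user"
FROM user_location ul
JOIN location l ON l.id = ul.location_id
WHERE l.name = 'Paris';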
I have to build a database structure which allows a totally modular design. Let's take an example; it will be easier to understand.
We have a website record, looking like this:
WEBSITE A
| ----- SECTION A
| |-- SUBSECTION 1
| | | -- Data 1 : Value 1
| | | -- Data 2 : Value 2
| | | ...
| | | -- Data N : Value N
| |
| |-- SUBSECTION 2
| | | -- Data 52 : Value 1
| | | -- Data 53 : Value 2
| | | ...
| | | -- Data M : Value M
| |
| ...
|
| ----- SECTION B
| |
| ...
...
Model 1:
And so on. The trouble is that I have to implement a permission system. For instance, User A has access to sections A, B, D, Z of website 1, whereas User B has access to sections C, V, W, X of website 2.
At first, I thought that building this as a tree would be the most efficient approach.
Here is my first database representation:
TABLE website (id, id_client, name, address)
TABLE section (id, id_website, name)
TABLE sub_section (id, id_section, name)
TABLE data (id, id_sub_section, key, value)
With this representation, it would be easy to give restricted access to the employees.
However, the websites will have data in common. For instance, all websites will have sections A, B, C, D with the same structure, which implies a lot of redundancy: for each website we'll have a lot of common structure, and the only difference will be the value attribute in TABLE data.
The second problem is that this structure has to be totally modular. For instance, the admin should be able to add a section, a subsection or a data item to a website record. That's the reason why I thought that this model is easier to manage.
Model 2 :
I have a second model, easier to store but harder to exploit:
TABLE website (id, id_client, Value 1, Value 2, Value 3 ... Value N)
TABLE section (id, name, Data 1, Data 2, Data 3, ..., Data N, ..., Data 52, Data 53, Data M) (these represent the column names)
TABLE subsection (id, id_section, name, Data 1, Data 2, Data N)
By doing this, I have a table where the data is stored and "structural tables" with the sections and subsections common to both websites. If the admin wants to add a section or subsection, we go back to the tree structure to store the additional data, looking like this:
TABLE additional_section (id, id_website, name)
TABLE additional_subsection (id, id_section, id_additional_section, name)
TABLE additional_data (id, id_subsection, id_additional_subsection, key, value)
This avoids a lot of redundancy and facilitates permission management.
Here's my question:
What's the best model for this kind of application? Model 1? Model 2? Another one?
Thanks for reading and for your answers!
I would suggest that you modify Model 1.
You can eliminate the redundancy in the section table by removing the id_website FK from that table and creating a new table between the website table and it.
This new table, WebsiteSection, has a PK that consists of an FK to website AND an FK to section, allowing each section to be part of multiple websites.
Section data that is common to all websites would then be stored in the section table, while section data that is site-specific would be stored in the WebsiteSection table.
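A minimal DDL sketch of that modification (Postgres-flavored; table and column names follow the answer, and site_specific_name is a hypothetical example of site-specific section data):

CREATE TABLE website (
  id        serial PRIMARY KEY,
  id_client integer NOT NULL,
  name      text,
  address   text
);

-- No id_website column any more: sections are shared across websites.
CREATE TABLE section (
  id   serial PRIMARY KEY,
  name text  -- section data common to all websites
);

-- Junction table: each section can be part of multiple websites.
CREATE TABLE website_section (
  id_website         integer NOT NULL REFERENCES website(id),
  id_section         integer NOT NULL REFERENCES section(id),
  site_specific_name text,  -- hypothetical: site-specific section data
  PRIMARY KEY (id_website, id_section)
);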