I have the database:
Node(nno,color)
edge(eno,head,tail,weight,gno)
graph(gno,gname)
bold here represents primary key
and the graph is directed: head -> tail
How can I build a trigger that checks, on each insert of a node into the graph, whether the graph is still connected, meaning there is a path between every two nodes?
How do I even check whether there is a path between every two nodes?
I am using PostgreSQL.
This is an XY problem, and most likely unworkable with a trigger: if referential integrity is to be maintained with foreign keys (definitely desirable), then a node has to be inserted before any edge that references it, so the insertion of a node by definition creates a disconnected graph and would be rejected.
The solution is to have a stored procedure through which all inserts to all three tables are passed, and which only accepts connected graph additions.
Given the assumption that a given graph is already connected, then for an extension to a given graph to be accepted it is sufficient for every node in the extension to be connected to any one node in the existing graph.
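As a rough illustration of the check itself, here is a minimal Python sketch, assuming "connected" means a path exists in both directions between every pair of nodes, and that edges are given as (head, tail) pairs with direction head -> tail. The function names are made up for this example. It searches forward and backward from one arbitrary node: if both searches reach every node, every pair is connected through that node. In PostgreSQL the same reachability sets could be computed with a recursive CTE (WITH RECURSIVE) inside the stored procedure.

```python
from collections import defaultdict, deque


def reachable(start, adjacency):
    """Return the set of nodes reachable from start by following adjacency."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_strongly_connected(nodes, edges):
    """nodes: iterable of node ids; edges: iterable of (head, tail) pairs."""
    nodes = set(nodes)
    if not nodes:
        return True  # an empty graph is treated as connected here
    forward = defaultdict(list)   # head -> [tail, ...]
    backward = defaultdict(list)  # tail -> [head, ...]
    for head, tail in edges:
        forward[head].append(tail)
        backward[tail].append(head)
    start = next(iter(nodes))
    # Every node must be reachable from start, and start from every node.
    return (reachable(start, forward) >= nodes
            and reachable(start, backward) >= nodes)
```

For example, a three-node cycle 1 -> 2 -> 3 -> 1 passes the check, while the chain 1 -> 2 -> 3 does not, because nothing leads back to node 1.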
Update: Microsoft have identified the problem and will be fixing it!
I am attempting to use Azure Data Factory to load a parent and child table in Azure SQL, which is enforced in the database by a foreign key.
My DataFlow is very simple, reading from staging tables and writing 1-for-1 into the destination tables. One of the reads has an exists constraint against a third table to ensure that only the correct subset of records are loaded.
I have two very similar DataFlows loading two kinds of record with similar parent-child relationships, one of them works just fine, the other fails with a foreign key violation. Sometimes. It's not consistent, and changing seemingly unrelated things such as refreshing a Dataset schema sometimes makes it work.
Both DataFlows have Custom Sink Ordering set to make the parent table insert happen first at Order 1, and the child record happen at Order 2.
Am I using this feature correctly? Is this something that Custom Sink Ordering should give me?
This is the job layout, it's actually loading two child tables:
I tried removing the top sink, so it only loads the Write Order 1 table (sinkSSUSpatialUnit) and the Write Order 2 table (sinkSSUCompartment) that is failing with a foreign key violation, and the problem does not happen in that cut-down clone of the DataFlow.
Microsoft have found a problem with Custom Sink Order not working as intended intermittently, and will be fixing it. I will update this if I find out any more.
In the documentation, MS does not say anything about the order of the source: https://learn.microsoft.com/en-us/azure/data-factory/data-flow-sink#custom-sink-ordering
Maybe you can try using the read committed isolation level for the source, to see whether ADF then waits for the sink before reading the source dataset.
Microsoft support have said that if you have a parent table with Write Order 1, and two child tables with Write Order 2, then it might go wrong.
The answer is to set one of the child tables to Write Order 3, so that there is no ambiguity.
I'm trying to understand how in CloudConnect Designer to model and publish (in my ETL graph) 2 tables with composite keys.
Example:
TableA has columns foo and bar.
TableB has columns foo and baz
Even though the column names are different, our old reports join on both a.foo=b.foo and a.bar=b.baz.
Our schema is a bit of a mess.
For this scenario, I want TableA and TableB loaded in my graph so I can select attributes from both tables in my report.
I don't see any use cases that describe composite keys in the modeling guide.
Is there a common way to handle composite key relationships when bringing those tables into CloudConnect?
NOTE: I'm a software engineer without much data warehouse experience. I've been able to model and publish several other tables and their relationships that have only a single primary key. And, this isn't going straight to production or anything. I'm merely trying to learn and mimic an existing report we have in one of our applications.
I'm not sure I understand the question correctly, but in general CloudConnect has no direct support for composite keys.
If the fields foo and bar (and analogously foo and baz in the second table) should serve as a composite key, you have to create a special attribute in the LDM and, during ETL, load it with a value that concatenates foo and bar for the given row. You can then use this specially created attribute as a primary key (connection point) or reference in the LDM.
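The concatenation step might look like the following Python sketch. It is not CloudConnect code: the helper name, the "|" delimiter, the new column names and the sample rows are all assumptions for illustration. The delimiter matters because without one, (12, "3") and (1, "23") would both become "123".

```python
def composite_key(*parts, sep="|"):
    """Join key parts into one string; sep is assumed not to occur in the data."""
    return sep.join(str(p) for p in parts)


# Hypothetical sample rows standing in for TableA and TableB.
table_a = [{"foo": 1, "bar": "x"}, {"foo": 12, "bar": "3"}]
table_b = [{"foo": 1, "baz": "x"}, {"foo": 1, "baz": "23"}]

# Add the surrogate key column that the LDM would use as connection point.
for row in table_a:
    row["foo_bar_key"] = composite_key(row["foo"], row["bar"])
for row in table_b:
    row["foo_baz_key"] = composite_key(row["foo"], row["baz"])
```

Rows that match on both columns get the same key value, so the two datasets can be linked through that single attribute.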
Composite keys are called grain, and a recent software update to CloudConnect added support for grain. I should mention, though: be very careful when adding grain to existing data. You will likely have to do a full load, replacing all existing data (in the relevant table) in GoodData.
I have had cases where the publish failed in the synchronize step because some existing data went against the grain. In that case, do a synchronize dataset on it.
I want to represent a network using a relational schema.
The entities of my network are:
Node : a point on the network.
Arc : a direct connection between 2 nodes
Path : an ordered sequence of arcs.
Is a relational model suited for representing such a network ?
I am considering SQL and NoSQL as the options.
The size of my data is not expected to grow at a very rapid pace. I do not want to pick SQL or NoSQL based on any predefined query patterns.
Often the best tool to represent a network is a graph database like Neo4j.
But if you want to do it in SQL, both Node (Vertex in graph theory) and Arc (more commonly called Edge) get their own table. A Vertex row contains only the data about the vertex itself, and no information about its relations to others. An Edge row contains the primary keys of the two nodes it links, plus any meta-information about the link itself.
When you need to store paths through multiple nodes, use two tables: a Path table with the path ID and any data about the path as a whole, and a PathVertex table consisting of the path ID, the position within that path, and the primary key of the Edge at that position. Together, the PathVertex rows list all the positions a path consists of.
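A minimal sketch of that four-table layout, using Python's built-in sqlite3 module so it runs without a server. The table and column names (vertex, edge, path, path_vertex, seq and so on) and the sample data are assumptions for illustration, not a prescribed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE vertex (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE edge (
    id          INTEGER PRIMARY KEY,
    from_vertex INTEGER NOT NULL REFERENCES vertex(id),
    to_vertex   INTEGER NOT NULL REFERENCES vertex(id)
);
CREATE TABLE path (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE path_vertex (
    path_id INTEGER NOT NULL REFERENCES path(id),
    seq     INTEGER NOT NULL,              -- position within the path
    edge_id INTEGER NOT NULL REFERENCES edge(id),
    PRIMARY KEY (path_id, seq)
);
""")

# Sample network A -> B -> C and one path that walks both arcs in order.
conn.executemany("INSERT INTO vertex VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])
conn.executemany("INSERT INTO edge VALUES (?, ?, ?)", [(10, 1, 2), (11, 2, 3)])
conn.execute("INSERT INTO path VALUES (100, 'A to C')")
conn.executemany("INSERT INTO path_vertex VALUES (?, ?, ?)", [(100, 1, 10), (100, 2, 11)])

# Read the arcs of path 100 back in path order.
ordered_arcs = conn.execute("""
    SELECT e.from_vertex, e.to_vertex
    FROM path_vertex pv JOIN edge e ON e.id = pv.edge_id
    WHERE pv.path_id = ?
    ORDER BY pv.seq
""", (100,)).fetchall()
```

The composite primary key (path_id, seq) keeps each position in a path unique, which is what makes the sequence ordered.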
We're using ORACLE 11.2.0.3.0, configured as 3 node RAC.
In our application we have Hibernate over UCP and OJDBC, in versions compatible with RAC. Hibernate uses a sequence to get the ID for every record in the database. In the database we have a table with UNIQUE_CONSTRAINT (some_value) on it. It is used to synchronize many instances of the application: every transaction in the application requires a unique row in this table. So application A tries to insert (some_value="A") into this table; if another application has already inserted a row with (some_value="A"), the first instance gets ORA-00001 (unique constraint violated) and retries with another value (some_value="B").
UNIQUE_CONSTRAINT fires very often, roughly once in every 8 transactions.
We run two tests:
service pinned to one node: response time avg 6ms
service on all 3 nodes: response time avg 800-1000ms
The high-level question is: why? What happens in a 3-node RAC when UNIQUE_CONSTRAINT is violated, and why does it slow the application down so much? How can I diagnose this case?
Michal
Use service-level scaling on RAC. Create a "LOADER" service on the RAC side and make it active on one node only. Then let Hibernate use "LOADER" service connections for the loads.
The explanation, in very vague terms: each cluster node masters some subset of the database's address space. With a unique constraint, each node must request the data blocks of the unique index from their mastering node. When a duplicate key is found and both duplicate keys were inserted by transactions that have not committed yet, Oracle has to enqueue one session and make it wait until the other session (belonging to another node) commits or rolls back.
If you need to generate a unique value, you should let the database do it for you. You can create an object called a SEQUENCE. You then get the next value of a sequence simply by
my_seq.nextval
And the current value of the sequence is simply
my_seq.currval
So if you are inserting a record...
insert into my_table values ( my_seq.nextval, 'xxx', yyy, 123, ... )
I'm trying to store the data of a binary space partitioning tree in a relational database. The tricky part about this data structure is that it has two different types of nodes. The first type, which we call a data node, simply holds a certain number of items. We define the maximum number of items it can hold as t. The second type, which we refer to as a container node, holds two other child nodes. When an item is added to the tree, the tree is descended until a data node is found. If the number of items in the data node is less than t, the item is inserted into the data node. Otherwise the data node is split into two new data nodes and is replaced by a container node. When an element is deleted, the reverse process must happen.
I'm a little bit lost. How am I supposed to make this work using a relational model?
Why not have two tables, one for nodes and one for items? (Note that I used the term "leaf" instead of "data" nodes below when I wrote my answer; a "leaf" node has data items, a non-"leaf" node contains other nodes.)
The node table would have columns like these: id primary key, parentid references node, leaf boolean, plus some columns to describe the spatial boundaries of the node and how it will be or has been split. (I don't know whether you're working in 2D or 3D, so I haven't given details on the geometry.)
The data table would have id primary key, leafid references node and whatever data.
You can traverse the tree downward by issuing SELECT * FROM node WHERE parentid = ? queries at each level and checking which child to descend into. Adding a data item to a leaf is a simple INSERT. Splitting a node requires unsetting the leaf flag, inserting two new leaf nodes, and updating all the data items in the node to point to the appropriate child node by changing their leafid values.
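To make the traversal and split steps concrete, here is a rough sketch using Python's built-in sqlite3 module. It makes several assumptions that are not in the question: the space is 1-D, each node covers a half-open interval [lo, hi), a split cuts the interval at its midpoint, t is 2, and item values are distinct enough that repeated splitting eventually separates them. The item table plays the role of the data table above; all names other than id, parentid, leaf and leafid are made up.

```python
import sqlite3

T = 2  # assumed maximum number of items per leaf (the "t" of the question)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (
    id       INTEGER PRIMARY KEY,
    parentid INTEGER REFERENCES node(id),
    leaf     INTEGER NOT NULL,   -- 1 = leaf (data node), 0 = container node
    lo       REAL NOT NULL,      -- assumed 1-D bounds [lo, hi)
    hi       REAL NOT NULL
);
CREATE TABLE item (
    id     INTEGER PRIMARY KEY,
    leafid INTEGER NOT NULL REFERENCES node(id),
    x      REAL NOT NULL
);
INSERT INTO node (id, parentid, leaf, lo, hi) VALUES (1, NULL, 1, 0.0, 100.0);
""")


def find_leaf(x):
    """Descend from the root, one query per level, until a leaf is reached."""
    node_id, leaf = conn.execute(
        "SELECT id, leaf FROM node WHERE parentid IS NULL").fetchone()
    while not leaf:
        node_id, leaf = conn.execute(
            "SELECT id, leaf FROM node WHERE parentid = ? AND lo <= ? AND ? < hi",
            (node_id, x, x)).fetchone()
    return node_id


def split(node_id):
    """Turn a leaf into a container with two new leaves and move its items."""
    lo, hi = conn.execute(
        "SELECT lo, hi FROM node WHERE id = ?", (node_id,)).fetchone()
    mid = (lo + hi) / 2
    conn.execute("UPDATE node SET leaf = 0 WHERE id = ?", (node_id,))
    insert_leaf = "INSERT INTO node (parentid, leaf, lo, hi) VALUES (?, 1, ?, ?)"
    left = conn.execute(insert_leaf, (node_id, lo, mid)).lastrowid
    right = conn.execute(insert_leaf, (node_id, mid, hi)).lastrowid
    # Items below the midpoint go left; whatever still points at the old node goes right.
    conn.execute("UPDATE item SET leafid = ? WHERE leafid = ? AND x < ?",
                 (left, node_id, mid))
    conn.execute("UPDATE item SET leafid = ? WHERE leafid = ?", (right, node_id))


def insert_item(x):
    """Insert x, splitting full leaves on the way (assumes distinct x values)."""
    leaf_id = find_leaf(x)
    while conn.execute("SELECT COUNT(*) FROM item WHERE leafid = ?",
                       (leaf_id,)).fetchone()[0] >= T:
        split(leaf_id)
        leaf_id = find_leaf(x)
    conn.execute("INSERT INTO item (leafid, x) VALUES (?, ?)", (leaf_id, x))


for value in (10, 20, 80):
    insert_item(value)
```

With these three inserts the root fills up after 10 and 20, so inserting 80 splits it: the root becomes a container, 10 and 20 move to the new left leaf, and 80 lands in the new right leaf. Deletion with merging of sibling leaves is left out of the sketch.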
Note that SQL round trips can be expensive, so if you're looking to use this for a real application, consider using a relatively large t in the DB and, once you have the data items, constructing a finer-grained in-memory tree of the leaves you are interested in.