I have a dimension (d_orga) with the following structure: http://dongorath.free.fr/d_orga.png.
As you can see, there is a hierarchy for each parallel branch.
My problem is determining the key path of a member at the l_site level, knowing that each member has a parent in every branch. An example member is: [d_orga].[l_site].&[grp]&[p3]&[e3]&[c3]&[eu]&[DE]&[ber]. This tells me it wants all levels in the order l_grp - l_pol - l_ent - l_com - l_reg - l_cou - l_site for my specific case, but those hierarchies can differ from client to client (this example is our "demo" environment, whereas a client could have different levels, or only 2 hierarchies, etc.). How can I determine the order of the wanted levels without having to hardcode it each time? Does it depend on the creation order of the hierarchies? An alphabetical order I failed to see? Another arcane inner working of SSAS?
It has, in fact, nothing to do with the structure of the dimension. The key path of a member is "simply" the key columns (property KeyColumns) defined on the attribute. They are ordered when defined and this is the order that must be used.
In the example of the question, I defined the key columns of the l_site attribute to be, in order, grp_code - pol_code - ent_code - com_code - reg_code - cou_code - site_code; thus, that is the order to be used.
As for the hierarchies differing between client applications: since the key columns are defined by the application that builds the cube, that same application can safely recompute the expected order rather than hardcoding it.
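For illustration, a minimal sketch in plain Python of composing a member's key path once the KeyColumns order is known (the values come from the demo example in the question):

# KeyColumns of l_site in their defined order: grp_code .. site_code
key_values = ["grp", "p3", "e3", "c3", "eu", "DE", "ber"]
member = "[d_orga].[l_site]." + "".join("&[%s]" % v for v in key_values)
print(member)  # [d_orga].[l_site].&[grp]&[p3]&[e3]&[c3]&[eu]&[DE]&[ber]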
I'm working on an integration with Workday, and am tracking people by their WID (Workday ID), such as 85bb0669d8ac412582c0a473f7074d79. That WID may be interleaved with WIDs from unrelated employers using completely distinct Workday accounts, as well as with IDs (known to be UUIDs) from other (non-Workday) sources.
The only ID source that is not known to be a UUID is the WID.
My standard approach to ensuring uniqueness of IDs from various external sources would be to save two fields, external_source (e.g. "workday") and external_id (e.g. "85bb0669d8ac412582c0a473f7074d79"). When combined, these two fields ensure uniqueness of person IDs across all sources and employers. But if I can confirm that the WID is in fact a UUID, I can make some desirable optimizations.
I've found no explicit definition of the WID in Workday documentation other than: "The unique identifier type. Each 'ID' for an instance of an object contains a type and a value. A single instance of an object can have multiple 'ID' but only a single 'ID' per 'type'." from https://community.workday.com/sites/default/files/file-hosting/productionapi/Human_Resources/v20/Get_Workers.html
All the samples of WIDs I've seen are 32-character hexadecimal strings, matching some non-authoritative articles I've found. I've not found any Workday documentation specifying that they will always be in that format. They are not formatted with hyphens like a UUID, but could be arranged that way.
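For what it's worth, a 32-character hexadecimal string already parses as a UUID; here is a quick Python sanity check (it only validates the format - it proves nothing about Workday's actual contract):

import uuid

wid = "85bb0669d8ac412582c0a473f7074d79"
u = uuid.UUID(wid)  # raises ValueError unless the string is 32 hex digits
print(u)            # 85bb0669-d8ac-4125-82c0-a473f7074d79
print(u.version)    # 4 for this sample, i.e. it at least looks like a random (v4) UUID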
So... does anyone have a reference to Workday documentation that specifies the contents of a WID? Lacking official docs, does anyone have practical knowledge about it?
I have a simple vertex "url":
schema.vertexLabel('url').partitionKey('url_fingerprint', 'prop1').properties("url_complete").ifNotExists().create()
And an edgeLabel called "links" which connects one url to another.
schema.edgeLabel('links').properties("prop1", 'prop2').connection('url', 'url').ifNotExists().create()
It's possible that one url has millions of incoming links (e.g. the front page of ebay.com, linked from all of its subpages).
But that seems to result in really big partitions and a crash of DSE because of wide partitions (from the OpsCenter wide partitions report):
graphdbname.url_e (2284 mb)
How can I avoid that situation? How do I handle these "supernodes"? I've found a "partition" command for labels (see the article [1]), but it is deprecated and will be removed in DSE 6.0; the only hint in the release notes is to model the data in another way - but I have no idea how to do that in this case.
I'm happy about any hint. Thanks!
[1] https://www.experoinc.com/post/dse-graph-partitioning-part-2-taming-your-supernodes
The current recommendation is to use the concept of "bucketing" that drives data model design in the C* world and apply that to the graph by creating an intermediary Vertex that represents groups of links.
2 Vertex Labels:
URL
URL_Group | partition key ((url, group)) … i.e. a composite primary key with 2 partition key components
2 Edges:
URL -> URL_Group
URL_Group <-> URL_Group (replaces the existing self-referencing URL edge)
Store no more than ~100K url_fingerprints per group; create a new group once ~100K edges exist.
This solution requires bookkeeping to determine when a new group is needed.
This could be done through a simple C* table for fast, easy retrieval.
-- "group" is a reserved word in CQL, so the column is named group_id here
CREATE TABLE lookup (
    url_fingerprint text,
    group_id int,
    count counter,
    PRIMARY KEY (url_fingerprint, group_id)
) WITH CLUSTERING ORDER BY (group_id DESC);
The CLUSTERING ORDER BY clause keeps the latest group first; if it is omitted, the SELECT below needs an explicit ORDER BY to get DESC order.
Prior to writing to the Graph, one would need to read the table to find the latest group.
SELECT url_fingerprint, group_id, count FROM lookup WHERE url_fingerprint = ? LIMIT 1;
If the counter is > ~100K, create a new group (increment group_id by 1). During or after writing a new row to the graph, one would need to increment the counter.
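A minimal sketch of that bookkeeping with the DataStax Python driver (cassandra-driver); the contact point, keyspace and fingerprint value are placeholders, and lookup/group_id are the names from the table above:

from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("graphdbname")
fingerprint = "abc123"  # placeholder url_fingerprint

# Find the latest group for this fingerprint (newest first, per the clustering order)
row = session.execute(
    "SELECT group_id, count FROM lookup WHERE url_fingerprint = %s LIMIT 1",
    (fingerprint,),
).one()

group_id = row.group_id if row else 0
if row and row.count > 100_000:  # the bucket is full, start a new group
    group_id += 1

# During/after writing the edge to the graph, bump the counter
session.execute(
    "UPDATE lookup SET count = count + 1 "
    "WHERE url_fingerprint = %s AND group_id = %s",
    (fingerprint, group_id),
)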
Traversing would require something similar to:
g.V().has(some url).out(URL).out(URL_Group).in(URL)
Where conceptually you would traverse the relationships like URL -> URL_Group->URL_Group<-URL
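In gremlin-python that could look roughly like the sketch below; the edge labels in_group and links_group are assumptions, since the schema above does not name the new edges:

from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

g = traversal().withRemote(
    DriverRemoteConnection("ws://localhost:8182/gremlin", "g"))

linked_urls = (
    g.V().has("url", "url_fingerprint", "abc123")
     .out("in_group")     # URL -> its URL_Group buckets
     .out("links_group")  # URL_Group -> URL_Group
     .in_("in_group")     # back from the bucket to the URLs it groups
     .toList())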
I'm new to OOP and would like advice on best practice.
Say for example I have a Course class which holds course information, and a Location class which holds location details. Classes have corresponding repository classes. Now, each Course HAS A Location, so I have added Location as a property.
When I am pulling the details of a Course from the database, is it best practice to:
A – Populate the Location object from within the CourseRepository class, meaning the SQL would return both course and location details
B – Only populate the Course object, returning the location ID, then use the LocationRepository class to find the location details
I'm leaning more towards B as this keeps responsibilities separate; however, the thing that's getting me is performance. Say I need a List instead, which returns 50 results. Would it be wise to query SQL 50 times to fetch the location details? Would appreciate your thoughts on this.
Lewis
In part, you're thinking in the wrong conceptual direction. It should be: one Location can have many Courses, not the other way around.
That said, theoretically, a Course domain object should not contain a Location as a class member, but just a location id. On the other hand, a Location domain object could contain an array of Course objects as a class member, if needed. You see the difference?
Now, in your case, do indeed pass a Location as an argument to each Course object. And, in the Course repository, define a method like fetchCoursesWithLocations() in which you run only one SQL query to fetch the 50 courses TOGETHER WITH the corresponding location details - based on your criteria - into an array. Then loop through the records array. For each record, build a Location object and a Course object (to which you pass the Location object as an argument). Push each Course object so created into another array holding all resulting Course objects, or into a CourseCollection object (which I recommend). In the end, return the Courses array (or the CourseCollection content) from the method.
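A minimal sketch of that method (Python for brevity; the table and column names are made up, and the connection object is assumed to return dict-like rows):

from dataclasses import dataclass

@dataclass
class Location:
    id: int
    name: str

@dataclass
class Course:
    id: int
    title: str
    location: Location

class CourseRepository:
    def __init__(self, connection):
        self.connection = connection

    def fetch_courses_with_locations(self):
        # One query, one round trip: no N+1 lookups for the locations
        rows = self.connection.execute(
            "SELECT c.id, c.title, l.id AS loc_id, l.name AS loc_name "
            "FROM course c JOIN location l ON l.id = c.location_id"
        )
        return [
            Course(row["id"], row["title"],
                   Location(row["loc_id"], row["loc_name"]))
            for row in rows
        ]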
Now, all this is somewhat too complex to present here. But I'll give you three great articles (a series) which will make the whole process very clear to you. You'll also find out there how a CourseCollection should look. In the articles (from the second one onwards), the term "Mapper" is used, which I'm pretty sure is the same as your "repository". Actually, there are two abstraction layers for data access in the db: mappers and repositories. Plus the adapters.
Look at the part with the PostMapper and the CommentMapper. They are the parallels to your CourseRepository and LocationRepository, respectively. The Post and Comment models (domain objects!) play the same roles, as parallels to your Course and Location.
The articles are:
Building a Domain Model - An Introduction to Persistence Agnosticism
Building a Domain Model - Integrating Data Mappers
Handling Collections of Aggregate Roots - the Repository Pattern
I have no idea if I wrote that correctly. I want to start learning higher-end data mining techniques and I'm currently using SQL Server and Access 2016.
I have a system that tracks ID cards. Each ID is tagged to one particular level of a security hierarchy, which has many branches.
For example
Root
  - Maintenance
    - Management
      - Supervisory
      - Manager
      - Executive
    - Vendors
      - Secure
      - Per Diem
      - Inside Trades
There are many other departments like Maintenance, some simple, some with much more convoluted hierarchies.
Each ID card is tagged to a level; in the Maintenance example, e.g. Per Diem:Vendors:Maintenance:Root. Others may be tagged just to Vendors, some to general Maintenance itself (no one has Root, thank god).
So let's say I have 20 ID cards selected; these are available personnel I can task to a job, but since they have different areas of security, I want to find the commonalities they can all work on together as a 20-person group, or whatever other groupings I can make.
So the intended output would be
CommonMatch = Per Diem
  CardID = 1
  CardID = 3
CommonMatch = Vendors
  CardID = 1
  CardID = 3
  CardID = 20
So in the example above, while I could have 2 people working on Per Diem work, because that is their lowest common security similarity, there is also card holder #20, who has rights to the parent group (Vendors) that 1 and 3 also share, so I could have all three of them work at that level.
I'm not looking for anyone to do the work for me (although examples are always welcome), more to point me in the right direction on what I should be studying and what the thing I'm trying to do is called. I know CTEs are a way to go, but that seems like only one tool in a much bigger process.
Thank you all in advance
Well, it is not so much a graph-theory or data-mining problem but rather a data-structure problem and one that has almost solved itself.
The objective is to be able to partition the set of card IDs into disjoint subsets given a security clearance level.
So, the main idea here would be to lay out the hierarchy tree and then assign each card ID to the path implied by its security clearance level. For this purpose, each node of the hierarchy tree now becomes a container of card IDs (e.g. each node of the hierarchy tree holds a) its own name (as unique identification), b) pointers to other nodes, and c) a list of card IDs assigned to its "name").
Then, retrieving the set of cards with clearance UP TO a specific security level is simply a case of traversing the tree from that specific level downwards to the tree's leaves, collecting the card IDs from the node containers as they are encountered.
Suppose that we have this access tree:
A
+-B
+-C
+-D
  +-E
And card ID assignments:
B:[1,2,3]
C:[4,8]
E:[10,12]
At the moment, B, C, E only make sense as tags; there is no structural information associated with them. We therefore need to first "build" the tree. The following example uses Networkx, but the same thing can be achieved in a multitude of ways:
import networkx

G = networkx.DiGraph()  # establish a directed graph
G.add_edge("A", "B")
G.add_edge("A", "C")
G.add_edge("A", "D")
G.add_edge("D", "E")
Now, assign the card IDs to the node containers (in Networkx, node attributes can hold any valid Python object, so I am going to store a plain list under a "cards" attribute):
G.node["B"]=[1,2,3]
G.node["C"]=[4,8]
G.node["E"]=[10,12]
So, now, to get everybody working under "A" (the root of the tree), you can traverse the tree from that level downwards either via Depth First Search (DFS) or Breadth First Search (BFS) and collect the card IDs from the containers. I am going to use DFS here, purely because Networkx has a function that directly returns the visited nodes in visiting order.
# dfs_preorder_nodes returns a generator, an efficient way of iterating very
# large collections in Python; it is cast to a list here to get the actual
# list of visited nodes back.
vis_nodes = list(networkx.dfs_preorder_nodes(G, "A"))  # start from node "A" and DFS downwards

cardIDs = []
# This could be done with a one-line reduce, but a plain loop is clearer
for aNodeID in vis_nodes:
    cardIDs.extend(G.nodes[aNodeID].get("cards", []))
At the end of the above iteration, cardIDs will contain all card IDs from branch "A" downwards in one convenient list.
Of course, this example is ultra simple, but since we are talking about trees, the tree can be as large as you like and you would still traverse it in the same way, requiring only a single point of entry (the top-level branch).
Finally, just as a note: the fact that you are using Access as your backend is not necessarily an impediment, but relational databases do not handle graph-type data with great ease. You might get away easily with something like a simple tree (like what you have here, for example), but the hassle of supporting this probably justifies doing the processing outside of the database (e.g. use the database just for retrieving the data and carry out the graph-type data processing in a different environment; doing a DFS in SQL is the sort of hassle I am referring to).
Hope this helps.
I successfully amended the nice CloudBalancing example to include the fact that I may only have a limited number of computers open at any given time (thanks, OptaPlanner team - easy to do). I believe this is referred to as a bounded-space problem. It works dandy.
The processes come in groups, say 20 processes in a given order per group. I would like to amend the example to have OptaPlanner also change the order of these groups (not the processes within one group). I have therefore added a class ProcessGroup in the domain with a member List<Process>, the instances of ProcessGroup being stored in a List<ProcessGroup>. The desired optimisation would shuffle the members of this list, causing the instances of ProcessGroup to be placed at different indices of the List<ProcessGroup>. The index of a ProcessGroup should be ProcessGroup.index.
The documentation states that "if in doubt, the planning entity is the many side of the many-to-one relationship." This would mean that ProcessGroup is the planning entity, the member index being a planning variable that gets assigned (hopefully) different integers. After every new assignment of indices, I would have to re-sort the List<ProcessGroup> in ascending order of ProcessGroup.index. This seems very odd and cumbersome. Any better ideas?
Thank you in advance!
Philip.
The current design has a few disadvantages:
It requires 2 (genuine) entity classes (each with 1 planning variable): this probably increases the search space (= longer to solve, more difficult to find a good or even feasible solution) and it increases configuration complexity. Don't use multiple genuine entity classes if you can reasonably avoid it.
The Integer variables of ProcessGroup need to be all different and somehow sequential. That smells like a chained planning variable (see the docs about chained variables and the Vehicle Routing example), in which case the entire problem could be represented as a simple VRP with just 1 variable - but does that really apply here?
Train of thought: there's something off in this model:
ProcessGroup has an Integer variable: what does that Integer represent? Shouldn't that Integer variable be on Process instead? Are you ordering Processes or ProcessGroups? If it should be on Process instead, then both of Process's variables can be replaced by a chained variable (like in VRP), which will be far more efficient - see the sketch after this list.
ProcessGroup has a list of Processes, but that is a problem property, which means it doesn't change during planning. I suspect that's correct for your use case, but do assert it.
If none of the reasoning above applies (which would surprise me), then the original model might be valid nonetheless :)
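To make the chained-variable idea concrete, here is a toy sketch in plain Python (not OptaPlanner API): each element points at its predecessor, so the order emerges from the chain itself - there is no integer index to keep all-different or to re-sort by.

class ProcessGroup:
    def __init__(self, name, previous=None):
        self.name = name
        self.previous = previous  # the chained variable: the anchor or another ProcessGroup

anchor = object()  # plays the role of a VRP vehicle: the fixed start of the chain
g1 = ProcessGroup("g1", previous=anchor)
g2 = ProcessGroup("g2", previous=g1)
g3 = ProcessGroup("g3", previous=g2)

def ordered(groups, anchor):
    # Recover the order by walking the chain forwards from the anchor
    successor = {g.previous: g for g in groups}
    out, cur = [], anchor
    while cur in successor:
        cur = successor[cur]
        out.append(cur.name)
    return out

print(ordered([g1, g2, g3], anchor))  # ['g1', 'g2', 'g3']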