I've been trying to define rules in my ontology to infer that if a person has friends who are friends amongst each other then all are friends, but if 1 or more are not friends to each other then my ontology will infer that they all, are not friends.
Thank you
You probably need to get your intended semantics straightened out a little bit more.
From what I gather, you want isFriendWith at least to be symmetric, i.e. when isFriendWith(bob, alice) then also isFriendWith(alice, bob).
Also, if you want to have friendsAll to have any meaning, isFriendWith cannot be transitive. This would also capture the natural meaning, as a friend of my friend is not necessarily my friend.
To elaborate: If isFriendWith where symmetric and transitive every friend of bob would automatically also be a friend of all of bob's friends (because isFriendWith(bob, alice) implies isFriendWith(alice, bob). From there on, with any isFriendWith(bob, carol) transitivity implies that isFriendWith(alice, carol). So if isFriendWith is symmetric and transitive, you get the clique automatically.
But as stated, this is probably not, what you want.
As for formulating this in SWRL, let's give it a try, shall we?
friendsAll is most likely reflexive, i.e. let's just assume everybody is his/her own friend. Now, we need an recursive rule that extends this set while still fulfilling the condition: "In this set, everybody is everybody's friend".
To include bob's friends, you would need to be able to quantify over isFriendWith and check if that any candidate friend of bob is also a friend of all other friends of bob. Since you cannot nest quantifiers in SWRL, I'm more or less sure, you cannot express that algorithm in the rule language alone. However, I maybe wrong here and there is a neat little trick hidden inside the semantics. It is not one that I know of, however and the need for quantifier nesting in the direct formulation leaves me believing that it is not possible.
It basically boils down to a well-known graph-theoretic problem: given a starting point bob friendsAll is the largest subset of bob's friends such that every everbody in the group is friends with everyone else, i.e. bob's Maximal Clique.
Related
The main examples of the words I mean are "object", "value" etc. In many (well, not really, but the chances are on some occasions at least) cases you may happen to find yourself willing to name a variable etc. of yours this way.
Another example I have stumbled upon in my practice is "try" which represents both the keyword (in many C-like and other languages) used in exception handling and the currency of Turkey. But this is an example just for fun, I doubt there are any common practices known for this particular case (though I feel like there may be for the previous).
What do people do in such cases? What are some synonyms for an object, a value etc reasonable in the programming and data modelling context?
For example imagine you are developing an object database, manipulating objects, properties and values (rather than documents, fields and... eh... values) is, for some reason, among the key ideas of its philosophy and you really don't want to use words too distinct from these semantically. What words would you use to replace the reserved ones while keeping the sense very close to that of theirs?
The easiest solution to come into my mind so far it to use misspelled (or spelled in a different language orthography) varieties of the same words like "objekt", "walue" etc. but although this can do the job this just disgusts me so much I really don't want to accept going this way ever.
UPDATE: Indeed, in some specific cases (particular languages) using a different case (which, some times, may go against the case aspect of the commynity and/or the company naming convention by the way) and/or namespaces (which have been introduced almost exactly for this) may solve the problem at least partially but I am still interested in alternatives as I believe actually duplicating a system keyword is a thing one should at least think about avoiding (might there be a way to do it easily without accepting compromises considered too serious) in every case.
I am even considering writing script that would scrape through GitHub to analyse the common code elements naming vocabulary but I think it is always a good idea to ask first rather than to "reinvent a bicycle", perhaps somebody has done something like this already.
UPDATE2: Please do me a favour and consider the following with applicable degree of objectivity before voting to close. With all do respect I would like to emphasise that the actual degree of subjectivity of this question is excusably low (though, I admit, somewhat above zero anyway). The only real flaw of it is that it might perhaps fit the English site better but I believe the audience of StackOverflow is much more relevant (generally informed in a much more relevant way) to the context. The actual goal of publishing this question is to highlight a problem that is fairly easy to understand clearly enough and which can not be denied of existence (though its importance may be questionable so far) but is spoken of too little (as importance of code clarity and semantic relevance is increasing, IMHO, code as a media is quickly moving towards obtaining bigger cultural (in the broad meaning of this word) importance than of books). And to let people share the ways of addressing it in practice they know of.
Capitalization: Often, a different capitalization instead of a synonym does the trick, as most language keywords are case-sensitive. E.g. object = new Object();
Prefix / Postfix: Another often encountered solution is to write myObject = new Object(). Which one you chose really depends on the naming conventions you follow. For private class fields, some developers use an underscore, e.g. this._object indicating a private access modifier.
Specification: In most cases however, you can find a more specific word describing the role - such as instance, parent, child or argument - or the subclass - such as integer or n instead of a generic number datatype - of your object.
In addition to the above, many language communities follow de-facto conventions such as cls for Class, obj for Object, me or self for this etc.
Background
I have 2 resources: courses and professors.
A course has the following attributes:
id
topic
semester_id
year
section
professor_id
A professor has the the following attributes:
id
faculty
super_user
first_name
last_name
So, you can say that a course has one professor and a professor may have many courses.
If I want to get all courses or all professors I can: GET /api/courses or GET /api/professors respectively.
Quandary
My quandary comes when I want to get all courses that a certain professor teaches.
I could use either of the following:
GET /api/professors/:prof_id/courses
GET /api/courses?professor_id=:prof_id
I'm not sure which to use though.
Current solution
Currently, I'm using an augmented form of the latter. My reasoning is that it is more scale-able if I want to add in filtering/sorting criteria.
I'm actually encoding/embedding JSON strings into the query parameters. So, a (decoded) example might be:
GET /api/courses?where={professor_id: "teacher45", year: 2016}&order={attr: "topic", sort: "asc"}
The request above would retrieve all courses that were (or are currently being) taught by the professor with the provided professor_id in the year 2016, sorted according to topic title in ascending ASCII order.
I've never seen anyone do it this way though, so I wonder if I'm doing something stupid.
Closing Questions
Is there a standard practice for using the query string vs the resource path for filtering criteria? What have some larger API's done in the past? Is it acceptable, or encouraged to use use both paradigms at the same time (make both endpoints available)? If I should indeed be using the second paradigm, is there a better organization method I could use besides encoding JSON? Has anyone seen another public API using JSON in their query strings?
Edited to be less opinion based. (See comments)
As already explained in a previous comment, REST doesn't care much about the actual form of the link that identifies a unique resource unless either the RESTful constraints or the hypertext transfer protocol (HTTP) itself is violated.
Regarding the use of query or path (or even matrix) parameters is completely up to you. There is no fixed rule when to use what but just individual preferences.
I like to use query parameters especially when the value is optional and not required as plenty of frameworks like JAX-RS i.e. allow to define default values therefore. Query parameters are often said to avoid caching of responses which however is more an urban legend then the truth, though certain implementations might still omit responses from being cached for an URI containing query strings.
If the parameter defines something like a specific flavor property (i.e. car color) I prefer to put them into a matrix parameter. They can also appear within the middle of the URI i.e. /api/professors;hair=grey/courses could return all cources which are held by professors whose hair color is grey.
Path parameters are compulsory arguments that the application requires to fulfill the request in my sense of understanding otherwise the respective method handler will not be invoked on the service side in first place. Usually this are some resource identifiers like table-row IDs ore UUIDs assigned to a specific entity.
In regards to depicting relationships I usually start with the 1 part of a 1:n relationship. If I face a m:n relationship, like in your case with professors - cources, I usually start with the entity that may exist without the other more easily. A professor is still a professor even though he does not hold any lectures (in a specific term). As a course wont be a course if no professor is available I'd put professors before cources, though in regards to REST cources are fine top-level resources nonetheless.
I therefore would change your query
GET /api/courses?where={professor_id: "teacher45", year: 2016}&order={attr: "topic", sort: "asc"}
to something like:
GET /api/professors/teacher45/courses;year=2016?sort=asc&onField=topic
I changed the semantics of your fields slightly as the year property is probably better suited on the courses rather then the professors resource as the professor is already reduced to a single resource via the professors id. The courses however should be limited to only include those that where held in 2016. As the sorting is rather optional and may have a default value specified, this is a perfect candidate for me to put into the query parameter section. The field to sort on is related to the sorting itself and therefore also belongs to the query parameters. I've put the year into a matrix parameter as this is a certain property of the course itself, like the color of a car or the year the car was manufactured.
But as already explained previously, this is rather opinionated and may not match with your or an other folks perspective.
I could use either of the following:
GET /api/professors/:prof_id/courses
GET /api/courses?professor_id=:prof_id
You could. Here are some things to consider:
Machines (in particular, REST clients) should be treating the URI as an opaque thing; about the closest they ever come to considering its value is during resolution.
But human beings, staring that a log of HTTP traffic, do not treat the URI opaquely -- we are actually trying to figure out the context of what is going on. Staying out of the way of the poor bastard that is trying to track down a bug is a good property for a URI Design to have.
It's also a useful property for your URI design to be guessable. A URI designed from a few simple consistent principles will be a lot easier to work with than one which is arbitrary.
There is a great overview of path segment vs query over at Programmers
https://softwareengineering.stackexchange.com/questions/270898/designing-a-rest-api-by-uri-vs-query-string/285724#285724
Of course, if you have two different URI, that both "follow the rules", then the rules aren't much help in making a choice.
Supporting multiple identifiers is a valid option. It's completely reasonable that there can be more than one way to obtain a specific representation. For instance, these resources
/questions/38470258/answers/first
/questions/38470258/answers/accepted
/questions/38470258/answers/top
could all return representations of the same "answer".
On the /other hand, choice adds complexity. It may or may not be a good idea to offer your clients more than one way to do a thing. "Don't make me think!"
On the /other/other hand, an api with a bunch of "general" principles that carry with them a bunch of arbitrary exceptions is not nearly as easy to use as one with consistent principles and some duplication (citation needed).
The notion of a "canonical" URI, which is important in SEO, has an analog in the API world. Mark Seemann has an article about self links that covers the basics.
You may also want to consider which methods a resource supports, and whether or not the design suggests those affordances. For example, POST to modify a collection is a commonly understood idiom. So if your URI looks like a collection
POST /api/professors/:prof_id/courses
Then clients are more likely to make the associate between the resource and its supported methods.
POST /api/courses?professor_id=:prof_id
There's nothing "wrong" with this, but it isn't nearly so common a convention.
GET /api/courses?where={professor_id: "teacher45", year: 2016}&order={attr: "topic", sort: "asc"}
I've never seen anyone do it this way though, so I wonder if I'm doing something stupid.
I haven't either, but syntactically it looks a little bit like GraphQL. I don't see any reason why you couldn't represent a query that way. It would make more sense to me as a single query description, rather than breaking it into multiple parts. And of course it would need to be URL encoded, etc.
But I would not want to crazy with that right unless you really need to give to your clients that sort of flexibility. There are simpler designs (see Roman's answer)
So my question refers to what precisely is a "Stable Matching"? I am aware of the Stable Marriage Problem, and all solutions seem to indicate that you finish with a "Stable Matching". But I am unsure as to what this actually means.
I would stare at the .gif from the wiki because it is actually a great explanation of the mechanics.
In stable matching we guarantee that all elements from two sets (men & woman, kids & toys, persons & vacation destinations, whatever) are put in a pair with an element from the other set AND that pair is the "best available match"
The way we determine what is the "best available match" is imagine that each element of a set has ranked their preference for the elements of the opposite set. The goal is to satisfy the preferences of everybody. Sticking with the marriage problem, imagine every man has a ranked list of woman and every woman has a ranked list of men. The goal is to match them together making sure everybody gets the highest person on their list possible.
Keep in mind this takes several steps (or rounds) for this to work itself out. In the marriage example another thing to keep in mind is that each side can only do one thing. So in our example women accept, men propose. Also, the proposal is not binding. Meaning that if in round one, a woman gets a proposal from her #2 most desired man and in the next round she gets a proposal from her #1 man, she can reject #2 and take #1. The wiki said it better than I could so...
Upon completion of the algorithm, it is not possible for both Alice and Bob to prefer each other over their current partners. If Bob prefers Alice to his current partner, he must have proposed to Alice before he proposed to his current partner. If Alice accepted his proposal, yet is not married to him at the end, she must have dumped him for someone she likes more, and therefore doesn't like Bob more than her current partner. If Alice rejected his proposal, she was already with someone she liked more than Bob.
Basically, no one gets shafted in the marriage because the pair was the highest preference for one another given limited options.
I am trying to really get a good idea how to think in OOP terms, so I have a semi-hypothetical scenario in my mind and I was looking for some thoughts.
If I wanted to design a simulation for different types of people interacting with each other, each of whom could acquire different proficiency levels in different "skills", what would be an optimal way to do this?
It's really the "skills" thing that I was a bit caught up on. My requirements are as follows:
-Each person either "has" a skill or does not
-If people have skills, they also have a "proficiency level" associated with the skill
-I need a way to find and pick out every person that has certain skills at all, or at a certain level
-The design needs to be extendible (ie, I need to be able to add more "skills" later)
I considered the following options:
have a giant enum for every single skill I include, and have the person class contain an
"int Skills[TOTAL_NUM_SKILLS]" member. The array would have zeros for "unacquired" skills, and 1 to (max) for proficiency levels of "acquired skills".
have the same giant enumeration, and have the person class contain a map of skills (from the enum) and numbers associated with the skills so that you can just add only the acquired skills to the map and associate a number this way.
Create a concrete class for every single skill, and have each inherit from an abstract base class (ISkill, say), and have the person class have a map of ISkill's
Really, option 1 seems like the straightforward no-nonsense way to do it. Please criticize; is there some reason this is not acceptable? Is there a more object oriented way to do this?
I know that option 3 doesn't make much sense right now, but if I decided to extend this later to have skills be more than just things with proficiency associated with them (ie, actually associate new actions with the skills (ISkill::DoAction, etc), does this make sense as an option?
Sorry for the broad question, I just want to see if this line of thought makes sense, or if I'm barking up the wrong tree altogether.
The problem with option 1 is future compatibility. Say you were shipping this framework to customers. Now, the customer has built this array of Skill values, which is length TOTAL_NUM_SKILLS, for each person. But this fails as soon as you try to add another skill, and especially as you try to reorder skills.
What if the customer is using an RPC framework in which a client and server pass Person objects over the wire? Now, unless the customer upgrades the client and server at the exact same time, the RPC calls break, since now the client and server expect arrays of different lengths. This can be particularly tricky because the customer may own only the client, or only the server, and be unable to upgrade both at once.
But it gets worse. Say the client has written out a Person object to disk in some file. If they decided to serialize a person as a simple list of numbers, then a new skill will cause the deserialization code to fail. Worse, if you reorder skills in your enum, the deserialization code may work just fine but give a wrong answer.
I like option 3 exactly for the reason you named: later you can add more functionality, and do so safely (well, except for the fact that every public change is a breaking change if your customers exercised certain edge cases in the language).
If you want to add skills often without changing the overall program structure, I'd consider some kind of external data file that you can change without recompiling your code. Think about how you'd want to do it in a really large project. The person who chooses the skills might be a designer with no programming ability. He could edit the skills in an XML file, but not in C++ code.
If you defined the skills in XML, it would naturally extend to store more data with each skill. Your players could be serialized as XML files too.
When you set up a player's skills at runtime, you could build a hash table keyed on the skill name from the XML file. If it's more common to enumerate a player's skills than to query whether a player has a certain skill, you could just use a vector of strings.
Of course, this solution will use more memory and run slower than your enum solution. But it will probably be good enough unless you're dealing with millions of players in your program.
Alice & Bob are both secret quadruple agents who could be working for the US, Russia or China. They want to come up with a scheme that would:
If they are both working for the same side, prove this to each other so they can talk freely.
If they are working for different sides, not expose any additional information about which side they are on.
Oh, and because of the sensitive nature of what they do, there is no trusted third party who can do the comparison for both of them.
What protocol would be able to satisfy both of these needs?
Ideally, any protocol would also be able to generalize to multiple participants and multiples states but that's not essential.
I've puzzled over it for a while and I can't find a satisfactory solution, mainly owing to condition 2.
edit: Here's the original problem that motivated me to look for a solution. "Charlie" had some personal photos that he shared with me and I later discovered that he had also shared them with "Bob". We both wanted to know if we had the same set of photos but, at the same time, if Charlie hadn't shared a certain photo with either of us, he probably had a good reason not to and we didn't want to leak information.
My first thought would be for each of us to concatenate all the photos and provide the MD5 sum. If they matched, then we had the same photos but if they didn't, neither party would know which photos the other had. However, I realized soon after that this scheme would still leak information because Bob could generate an MD5 for each subset of photos he had and if any of them matched my sum, he would know which photos I didn't have. I've yet to find a satisfactory solution to this particular problem but I thought I would generalize it to avoid people focusing on the particulars of my situation.
For both problems, you could use a Secure two-party computation equality-algorithm. There are many schemes, for example this by Damgard, Fitzi, Kiltz, Nielsen and Toft: Unconditionally Secure Constant Round Multi-Party
Computation for Equality, Comparison, Bits and Exponentiation.
Of course an agent could try to pose as an agent from another side to get a 1/3 chance to discover the true side of another agent, but that seems unavoidable.
A much simpler scheme for the photo-problem, which should be almost as good as the secure multiparty computation, is the following:
Alice and Bob sorts their pictures and generate a SHA-512 hash.
Alice sends the first bit of her hash to Bob.
Bob compares the bit to the first bit of his hash. If it is different, they know that they have received different photos. Otherwise they continue.
Bob sends the second bit of his hash to Alice.
Alice checks this bit and decides whether to continue.
Continue until the protocol aborts or all bits have been checked.
So they are guaranteed to be quadruple agents? That is they are guaranteed to be secretly working for one faction while pretending to work for a second while pretending to work for a third while pretending to work for a fourth? They are limited to just the US, Russia or China? If so then that means that there will always be at least one faction they are both pretending to work for and are simultaneously actually working for. That seems to negate their ability to be quadruple agents, because surely one of them can't be working for the Americans while secretly working for the Americans, while secretly working for the Americans, while secretly working for the Americans.
You say that the ideal solution would generalize to arbitrary numbers of states and spy-stacks. Can the degree of secret agent-ness be either higher, equal or lower than the number of states? This might be important. Also, is Alice always guaranteed to have the same degree of agent-ness as Bob? i.e. They will ALWAYS both be triple agents, or ALWAYS both by quintuple agents? The modulo operator springs to mind...
More details please.
As a potential answer, you can enumerate the states into a bitfield. US=1 Russia=2, China=4, Madagascar=8, Tuva=16 etc. Construct a device that is essentially an AND gate. Alice builds and brings one half and Bob builds and brings the other. Separated by a cloth, they each press the button of the state they're really working for. If the output of the AND gate is high, then they're on the same side. If not, then they quietly take down the cloth, and depart with the respective halves of their machine so that the button can't be determined by fingerprint.
This is not theoretical or rigorous, but practical.
For your photos problem, create hashes for all subsets of your photos; randomly select a subset of these, and shuffle in an agreed quantity of randomly generated hash values. Bob does the same, and you exchange these sets. If the proportion of hashes in what Bob has sent you that matches ones you can generate by hashing subsets of your photos significantly differs from what you expect, it is likely you have a significantly different corpus of photos from him. If the proportion of random hashes you agree on is high, you risk being unable to detect small differences in your collections of photos; if the proportion is low, you risk exposing information about missing photos; you will have to select a suitable point for the tradeoff.
Interesting.
I think, no matter what the scheme, it'll need to involve a component of random failure. This is because of the conflicting requirements. You would need a scheme that, occasionally, even when they are on the same side, doesn't work. Because if it always worked, they would immediately be able to determine they aren't on the same side.
Your point 'B' is also vague. You say you don't want to expose what side they are on. Does that mean that the info can't point to specifically one of the sides? Is it okay if Alice thinks Bob is from either one of the others?
Also, have you tried emailing this to the cryptography mailing list? May get a better response there. It's an interesting one to think about :)
Here's the closest I've come to a solution:
Assume there is a function doubleHash such that
doubleHash(a+doubleHash(b)) == doubleHash(b+doubleHash(a))
Alice generates a 62 bit secret and appends the 2 bit country code to the end of it, hashes it and gives Bob doubleHash(a).
Bob does the same thing and gives Alice doubleHash(b).
Alice appends the original secret to the hash that Bob gave her, hashes it and publishes it as doubleHash(a+doubleHash(b)).
Bob does the same thing and publishes doubleHash(b+doubleHash(a)).
If both the hashes match, then they are from the same country. On the other hand, if they don't match, then Bob can't decipher the hash because he doesn't know Alice's secret and vice versa.
However, such a scheme relies on the existence of a doubleHash function and I'm not sure if such a thing is possible.
The most simple thing I can think of with the photos that would possibly work is as thus:
Hash all the photos with a 4096 bit hash.
Sort the photos by hash value. ( Hashes are afterall, just a string representation of a large number )
using that sort order, use a streaming system to pipe, and hash, those photos, as if they were a singular file.
Share your hashes.
If the hashes match, you have the same files. ( low low risk of incorrect positive match, but at 4K hashing, its a bit unlikely )
There are of course, a few weaknesses here:
Don't share how many photos you have. Doing so could permit the party with the greater number of photos do intelligent permutation of the data and remove photos from the hash set they suspect likely you don't have, using the number as a guide, and find ( at great computational expense mind ) a set of images that matches your hash.
They can do 1 without the number, but its harder, and they're out of luck if they actually have less photos.
They could create a fake hash, simply with a random number generator, and send it to you, giving you the impression you had different datasets when you really had the same.
The above weaknesses are also prevalent in your country code identification system, except of course, you have far less entropy to get in the way, and its far easier to fraud the system. ( and thus, far far far easier to work out who they are by sheer brute force, or have yourself worked out by brute force, regardless of how fancy your hash algorithm is )
If this were not the case, you would have already been found out by the very agencies you work for, because something that reliable and secure would be a sure fire way to do a secure background check.
The Photo Scenario is Impossible to Achieve:
Your scheme is impossible for the reasons that you name.
Consider a function f, which takes two sets of photos, s1 and s2.
f(s1, s2) returns true if s1=s2 and false if s1!=s2.
That is, this function implements the scheme you want.
Bob can always supply a subset of photo's he has, and learn which photo's charlie doesn't have.
There is no way around this, any function which has the property you want can not have the security you want.
The Spy Scenario is Even More Impossible:
As Kent Fredric pointed out the spy scenario has even greater inherent weaknesses.
It has all problems of the photo scenario, plus the additional weakness of having only four secrets.
In the photo scenario it would be highly unlikely that Bob would randomly guess one of Charlies photographs.
It is trivial in the spy scenario for Bob to guess Alices choice (1/4).
The spys only have four countries they can belong to, as they are both quadruple agents they both know all the secret code words for each country.
Thus, Bob could pretend to be working for the Chinese to test Alice.
A Different Type of Solution:
Some posters have noted, the security can be increased if you weaken the accuracy of f.
Of course if it is not accurate what is the point. I propose a different type of solution.
Do not let them compare the same
photographs more than one time.
The party which wishes to initiate the comparison must first show that this is a new comparison and does not use any of the pictures from before.
EDIT: Problems with Double Hash
I am making some assumptions about the doublhash protocol, but...
For the photograph scheme, the doublehash protocol is no better than f, because the 62 bit secret must be constructed from a set of photographs for the comparison to be meaningfull. The subset attack mentioned in the original question still applies here. Try all subsets of photographs to brute force the secrets you can generate, thus Bob can see if he can generate the same secret as Alice.
Using the doublehash property Bob can still brute force the secret.
doubleHash(s1+doubleHash(b)) != doubleHash(aliceSecret+doubleHash(a))
doubleHash(s2+doubleHash(b)) != doubleHash(aliceSecret+doubleHash(a))
doubleHash(s3+doubleHash(b)) == doubleHash(aliceSecret+doubleHash(a))
Bingo, aliceSecret == s3.
DoubleHash is only as strong as it is hard to bruteforce either a or b
Implementating DoubleHash
Instead doubleHash(a + doubleHash(b)), try doubleHash(a, md5(b)).
DoubleHash(a + doubleHash(b)) is bad because Bob could generate colliding hashes like so:
doubleHash((12 + doubleHash(34)) + doubleHash(5678))
= doubleHash((34 + doubleHash(12)) + doubleHash(5678))
= doubleHash(5678 + doubleHash(12 + doubleHash(34))
= doubleHash(5678 + doubleHash(34 + doubleHash(12))
Here is an implementation of doubleHash using the new formulation,
Doublehash(a, hashOfB){
hashOfA = md5(a)
combinedHash = hashOfA xor hashOfB
return md5(combinedHash)
}
One could also use the math behind blind signatures to impliment version of doubleHash.
Wouldn't RSA work here? Each nation knows its private key, you publish your public key, and only nations that are the same can decrypt the info. I guess the second person would know that the first isn't on the same side as they are, however.
Hmm.
How about Public Key Cryptography?