XACML ALFA / Any-of-Any Condition with a Match at Multi Value Comparison / Which is the matching Element? - xacml

When comparing two different bags using an Any-of-Any function, can XACML Version 3 identify which element produced the match (signalled by the boolean true value)? Besides this return value, is there an index value available, either as an integer or as a list of integers?

No, there isn't any such mechanism. As you know, in XACML (and ALFA), attribute values are always bags of values. You can convert a single-valued bag into a single atomic value but you can never know the order of a bag nor can you pick a specific value by first, last, or any index. Bags are unordered.
You could potentially introduce a function that would sort a bag e.g. alphabetically and then return the first element or the last.
Consequently, you cannot know which value produced the match. If, for instance, you have a function that says stringAtLeastOneMemberOf(userQualifications, requiredQualifications), you cannot know which one triggered the match.
HTH,
David.
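To make the limitation concrete, here is a minimal Python sketch of what an any-of-any style comparison computes. It is not ALFA or XACML and not any PDP's API. The names any_of_any and matching_pairs are hypothetical; matching_pairs only illustrates the kind of custom extension function you would have to add yourself, since the standard function reports a boolean and nothing else.

```python
from operator import eq
from typing import Callable, Iterable, Set, Tuple


def any_of_any(predicate: Callable[[str, str], bool],
               bag_a: Iterable[str], bag_b: Iterable[str]) -> bool:
    """True if the predicate holds for at least one (a, b) pair.

    Only the boolean is reported; which pair matched is not.
    """
    bag_b = list(bag_b)
    return any(predicate(a, b) for a in bag_a for b in bag_b)


def matching_pairs(predicate: Callable[[str, str], bool],
                   bag_a: Iterable[str], bag_b: Iterable[str]) -> Set[Tuple[str, str]]:
    """Hypothetical extension (NOT part of XACML): the pairs that matched.

    A set is returned on purpose: bags are unordered, so an index
    into either bag would not be meaningful.
    """
    bag_b = list(bag_b)
    return {(a, b) for a in bag_a for b in bag_b if predicate(a, b)}


# Made-up attribute values for illustration.
user_qualifications = ["welding", "first-aid", "forklift"]
required_qualifications = ["forklift", "crane"]

matched = any_of_any(eq, user_qualifications, required_qualifications)
pairs = matching_pairs(eq, user_qualifications, required_qualifications)
print(matched)  # True
print(pairs)    # {('forklift', 'forklift')}
```

If you need the matching values in a policy decision, something like matching_pairs would have to be registered as a custom function in your PDP, assuming your product supports that.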

Related

SQL UNION - what are the implicit matching conditions?

If I perform an SQL UNION on 2 tables, what are the explicit conditions that UNION uses to decide that any 2 rows are the same?
Looking through the Postgres documentation, it just states "A row is in the set union of two result sets if [it] appears in at least one of the result sets". If [WHAT] appears in the result sets?
Does it match column1, then column2, ...? How does it decide to stop at columnX?
I'm a little stunned that the actual matching rules aren't spelled out explicitly. Or, I just couldn't find them with 10 minutes of Googling.
Thanks for your help.
UNION (as compared to UNION ALL) is similar to DISTINCT in that for duplicates to be removed, all columns have to be identical.
I'm not sure what you are asking, but perhaps this is it.
When you use union (or other set operators) on two tables, the rows are compared by position. All columns must match -- that is, there need to be the same number of columns in both tables. Postgres will decide on the type of each column in the result set based on type precedence rules. The columns in either table will be converted to the specified type.
Does that address what you are asking?
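A quick way to see both points (columns are paired by position, and a row counts as a duplicate only when every column is equal) is a small experiment. The sketch below uses Python's built-in sqlite3 module rather than Postgres, purely so it runs without a server; the tables a and b and their contents are made up. The type resolution rules are Postgres-specific and are not exercised here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER, y TEXT);
    CREATE TABLE b (p INTEGER, q TEXT);   -- different column names on purpose
    INSERT INTO a VALUES (1, 'one'), (2, 'two');
    INSERT INTO b VALUES (1, 'one'), (2, 'TWO');
""")

# UNION pairs x with p and y with q (by position, not by name) and
# removes a row only if ALL of its columns equal those of another row.
union_rows = conn.execute(
    "SELECT x, y FROM a UNION SELECT p, q FROM b"
).fetchall()

# UNION ALL keeps every row, duplicates included.
union_all_rows = conn.execute(
    "SELECT x, y FROM a UNION ALL SELECT p, q FROM b"
).fetchall()

print(sorted(union_rows))   # [(1, 'one'), (2, 'TWO'), (2, 'two')]
print(len(union_all_rows))  # 4
```

The row (1, 'one') appears once in the UNION result because both columns match; (2, 'two') and (2, 'TWO') both survive because the second column differs.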
Actually, this is pretty much specified in the documentation:
SQL UNION constructs must match up possibly dissimilar types to become
a single result set. The resolution algorithm is applied separately to
each output column of a union query. The INTERSECT and EXCEPT
constructs resolve dissimilar types in the same way as UNION. The
CASE, ARRAY, VALUES, GREATEST and LEAST constructs use the identical
algorithm to match up their component expressions and select a result
data type.
Type Resolution for UNION, CASE, and Related Constructs
1. If all inputs are of the same type, and it is not unknown, resolve as that type.
2. If any input is of a domain type, treat it as being of the domain's base type for all subsequent steps.
3. If all inputs are of type unknown, resolve as type text (the preferred type of the string category). Otherwise, unknown inputs are ignored for the purposes of the remaining rules.
4. If the non-unknown inputs are not all of the same type category, fail.
5. Choose the first non-unknown input type which is a preferred type in that category, if there is one.
6. Otherwise, choose the last non-unknown input type that allows all the preceding non-unknown inputs to be implicitly converted to it. (There always is such a type, since at least the first type in the list must satisfy this condition.)
7. Convert all inputs to the selected type. Fail if there is not a conversion from a given input to the selected type.
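The quoted rules can be sketched as a small Python function. This is only a toy model: CATEGORY, PREFERRED and IMPLICIT_CASTS are a tiny made-up catalog loosely modelled on a few Postgres type names, not the real system catalogs, and resolve_union_type is a hypothetical name. Domain types are left out entirely.

```python
from typing import List

# Toy catalog for illustration only; NOT the real PostgreSQL catalogs.
CATEGORY = {"int4": "numeric", "int8": "numeric", "float8": "numeric",
            "varchar": "string", "text": "string"}
PREFERRED = {"float8", "text"}
IMPLICIT_CASTS = {("int4", "int8"), ("int4", "float8"),
                  ("int8", "float8"), ("varchar", "text")}


def can_convert(src: str, dst: str) -> bool:
    """True if src is dst or the toy catalog lists an implicit cast."""
    return src == dst or (src, dst) in IMPLICIT_CASTS


def resolve_union_type(inputs: List[str]) -> str:
    """Resolve one output column's type from its input types."""
    # All inputs the same, and not unknown: use that type.
    if len(set(inputs)) == 1 and inputs[0] != "unknown":
        return inputs[0]
    # (Domain handling omitted: the toy catalog has no domains.)
    # All unknown resolves to text; otherwise ignore the unknowns.
    known = [t for t in inputs if t != "unknown"]
    if not known:
        return "text"
    # Mixed type categories: fail.
    if len({CATEGORY[t] for t in known}) > 1:
        raise TypeError(f"types {known} are not in one category")
    # First preferred type in the category, if there is one.
    chosen = next((t for t in known if t in PREFERRED), None)
    # Otherwise the LAST type that all preceding inputs convert to.
    if chosen is None:
        for i, candidate in enumerate(known):
            if all(can_convert(prev, candidate) for prev in known[:i]):
                chosen = candidate
    # Every known input must be convertible to the selected type.
    if not all(can_convert(t, chosen) for t in known):
        raise TypeError(f"cannot convert all of {known} to {chosen}")
    return chosen


print(resolve_union_type(["int4", "int8"]))        # int8
print(resolve_union_type(["int8", "int4"]))        # int8
print(resolve_union_type(["int4", "float8"]))      # float8
print(resolve_union_type(["unknown", "unknown"]))  # text
```

Note that this resolution decides the result type of each output column. It runs once per column, which is separate from the question of which rows count as duplicates.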

Keen IO mixed property values (integers as strings)

Since Keen is not strongly typed, I've noticed it is possible to send data of different types into the same property. For instance, some events may have a property whose value is a String (sent surrounded by quotes), and some whose value is an integer (sent without quotes). In the case of mathematical operations, what is the expected behavior?
Our comparator will only compute mathematical operations on numbers. If you have a property whose values are mixed, the operation will only apply to the numbers; strings will be ignored. You can see the values in your property by running a select_unique query with that property as the target_property, then (if you're using the Explorer) selecting JSON from the drop-down in the top-right. Any values you see there that are surrounded by quotes will be ignored by a mathematical query type (minimum, maximum, median, average, percentile, and sum).
If you are just starting out, and you know you want to be able to do mathematical operations on this property, we recommend making sure that you always send integers as numbers (without quotes). If you really want to keep your dataset clean, you can even start a new collection once you've made sure you are no longer sending any strings.
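As a rough local model of the behavior described above (this is plain Python, not Keen's implementation or API), a mathematical query over a mixed property acts as if the quoted values were filtered out first. The helper name numeric_only and the sample values are made up for illustration.

```python
from numbers import Real
from typing import Any, Iterable, List


def numeric_only(values: Iterable[Any]) -> List[Real]:
    """Keep real numbers; drop strings (even numeric-looking ones) and booleans."""
    return [v for v in values
            if isinstance(v, Real) and not isinstance(v, bool)]


# Mixed values as they might arrive for one property.
visitor_values = [14558, "14558", 2, "n/a", 3.5]

numbers = numeric_only(visitor_values)
print(numbers)       # [14558, 2, 3.5]
print(sum(numbers))  # 14563.5
print(max(numbers))  # 14558
```

The string "14558" is not coerced to a number in this model, which is why sending integers without quotes matters if you want them counted.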
Yes, you're correct: Keen can accept data of different types as the value for your properties. An example of Keen's lenient typing is that a property such as VisitorID can contain both numbers (e.g. 14558) and strings (e.g. "14558").
This article from the Keen site is useful for seeing where you can check data types: https://keen.io/docs/data-collection/data-modeling-guide-200/#check-for-data-type-mismatch

Lucene - Expected behavior when indexing multiple occurrences of a token within a field

Let's say that I'm indexing the string value "useridA;useridB,userdidC,useridA,useridA"
The field is set to ANALYZED and uses a custom CharTokenizer which looks for a boundary comma char.
What is the expected behavior in the index when the token "useridA" occurs multiple times within the same field?
Will it just re-index the same value and take up the same space as if there had been just one occurrence?
At the most basic level, Lucene is an "inverted term index": it stores term->docID. So if a term occurs many times in a document, it is only recorded once.
Obviously this is a huge simplification. Positional information will also be stored, depending on the TermVector value used when adding the field (you will need this to use phrase and slop queries).
Depending on your use case, I'd suggest you de-dupe the list either when indexing, or by using a HashSet<string> for that property of whatever your class is.
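The "term->docID" idea can be sketched with a toy inverted index in Python. This is not Lucene's data structure or API; build_index and tokenize are hypothetical helpers that split on the comma boundary from the question and record, per term, which documents contain it and at which token positions.

```python
from collections import defaultdict
from typing import Dict, List


def tokenize(value: str, boundary: str = ",") -> List[str]:
    """Split on the boundary character, dropping empty tokens."""
    return [tok for tok in value.split(boundary) if tok]


def build_index(docs: Dict[int, str]) -> Dict[str, Dict[int, List[int]]]:
    """Map term -> {doc_id: [token positions]}."""
    index: Dict[str, Dict[int, List[int]]] = defaultdict(dict)
    for doc_id, value in docs.items():
        for position, term in enumerate(tokenize(value)):
            index[term].setdefault(doc_id, []).append(position)
    return dict(index)


docs = {1: "useridA;useridB,userdidC,useridA,useridA"}
index = build_index(docs)

# Each term has ONE entry for document 1; repeats only add positions.
print(index["useridA"])  # {1: [2, 3]}
print(sorted(index))     # ['userdidC', 'useridA', 'useridA;useridB']

# De-duplicating before indexing, keeping first-seen order.
unique_tokens = list(dict.fromkeys(tokenize(docs[1])))
print(unique_tokens)     # ['useridA;useridB', 'userdidC', 'useridA']
```

With a comma-only boundary, the leading "useridA;useridB" in the sample string stays a single token, so only the last two occurrences of "useridA" are indexed under that term.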

SQL Server 2008 - Default column value - should i use null or empty string?

For some time I've been debating what to do with columns for which I don't know whether data will be passed in: should I set the value to an empty string ('') or just allow NULL?
I would like to hear what the recommended practice is here.
If it makes a difference, I'm using C# as the consuming application.
I'm afraid that...
it depends!
There is no single answer to this question.
As indicated in other responses, at the level of SQL, NULL and empty string have very different semantics: the former indicates that the value is unknown, while the latter indicates that the value is this "invisible thing" (in displays and reports) but is nonetheless a "known value". An example commonly given in this context is that of the middle name. A NULL value in the "middle_name" column would indicate that we do not know whether the underlying person has a middle name or not, and if so what this name is; an empty string would indicate that we "know" that this person does not have a middle name.
This said, two other kinds of factors may help you choose between these options, for a given column.
The very semantics of the underlying data, at the level of the application.
Some considerations in the way SQL works with null values
Data semantics
For example, it is important to know if the empty string is a valid value for the underlying data. If that is the case, we may lose information if we also use the empty string for "unknown info". Another consideration is whether some alternate value may be used when we do not have info for the column; maybe 'n/a', 'unspecified' or 'tbd' are better values.
SQL behavior and utilities
Considering SQL behavior, the choice of using or not using NULL may be driven by space considerations, by the desire to create a filtered index, or by the convenience of the COALESCE() function (which can be emulated with CASE statements, but in a more verbose fashion). Another consideration is whether any query may attempt to concatenate multiple columns (as in SELECT name + ', ' + middle_name AS LongName etc.).
Beyond the validity of the choice of NULL vs. empty string in a given situation, a general consideration is to be as consistent as possible, i.e. to stick to ONE particular way, and to depart from it only purposely and explicitly, for good reasons and in few cases.
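The practical difference shows up in comparisons and concatenation. The sketch below uses Python's built-in sqlite3 module instead of SQL Server so that it runs anywhere; the people table is made up, and SQLite's || operator stands in for the + concatenation in the example above. Treat it as an illustration of standard NULL semantics, not of SQL Server specifics such as storage or filtered indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (name TEXT NOT NULL, middle_name TEXT);
    INSERT INTO people VALUES ('Ann', NULL), ('Bob', ''), ('Cid', 'Lee');
""")


def names(sql: str):
    """Run a single-column query and return the values as a list."""
    return [row[0] for row in conn.execute(sql)]


# An equality test never matches NULL; IS NULL is required.
empty = names("SELECT name FROM people WHERE middle_name = '' ORDER BY name")
unknown = names("SELECT name FROM people WHERE middle_name IS NULL ORDER BY name")
print(empty)    # ['Bob']
print(unknown)  # ['Ann']

# Concatenating with NULL yields NULL, unless COALESCE supplies a default.
raw = conn.execute(
    "SELECT name || ' ' || middle_name FROM people ORDER BY name"
).fetchall()
safe = conn.execute(
    "SELECT name || ' ' || COALESCE(middle_name, '') FROM people ORDER BY name"
).fetchall()
print(raw)   # [(None,), ('Bob ',), ('Cid Lee',)]
print(safe)  # [('Ann ',), ('Bob ',), ('Cid Lee',)]
```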
Don't use empty string if there is no value. If you need to know if a value is unknown, have a flag for it. But 9 times out of 10, if the information is not provided, it's unknown, and that's fine.
NULL means unknown value. An empty string means a known value - a string with length zero. These are totally different things.
Empty when I want a valid default value that may or may not be changed, for example, a user's middle name.
NULL when it is an error if the ensuing code does not set the value explicitly.
However, by initializing strings with the Empty value instead of null, you can reduce the chances of a NullReferenceException occurring.
Theory aside, I tend to view:
Empty string as a known value
NULL as unknown
In this case, I'd probably use NULL.
One important thing is to be consistent: mixing NULLs and empty strings will end in tears.
On a practical implementation level, an empty string takes 2 bytes in SQL Server, whereas NULLs are bitmapped. In some conditions, and for wide or large tables, it makes a difference in performance because there is more data to shift around.

MySQL command to search CSV (or similar array)

I'm trying to write an SQL query that would search within a CSV (or similar) array in a column. Here's an example:
INSERT INTO properties SET
bedrooms = '1,2,3', -- or '1-3'
title = 'nice property',
price = 500;
I'd like to then search where bedrooms = 2+. Is this even possible?
The correct way to handle this in SQL is to add another table for a multi-valued property. It's against the relational model to store multiple discrete values in a single column. Since it's intended to be a no-no, there's little support for it in the SQL language.
The only workaround for finding a given value in a comma-separated list is to use regular expressions, which are in general ugly and slow. You have to deal with edge cases like when a value may or may not be at the start or end of the string, as well as next to a comma.
SELECT * FROM properties WHERE bedrooms RLIKE '[[:<:]]2[[:>:]]';
There are other types of queries that are easy when you have a normalized table, but hard with the comma-separated list. The example you give, of searching for a value that is equal to or greater than the search criteria, is one such case. Also consider:
How do I delete one element from a comma-separated list?
How do I ensure the list is in sorted order?
What is the average number of rooms?
How do I ensure the values in the list are even valid entries? E.g. what's to prevent me from entering "1,2,banana"?
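For comparison, here is a minimal sketch of the separate-table approach recommended above. It uses Python's built-in sqlite3 module rather than MySQL so it runs without a server; the property_bedrooms table and its column names are assumptions, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE properties (
        id INTEGER PRIMARY KEY, title TEXT, price INTEGER
    );
    -- One row per (property, bedroom count) instead of a CSV string.
    CREATE TABLE property_bedrooms (
        property_id INTEGER NOT NULL REFERENCES properties(id),
        bedrooms    INTEGER NOT NULL,
        PRIMARY KEY (property_id, bedrooms)
    );
    INSERT INTO properties VALUES
        (1, 'nice property', 500), (2, 'studio flat', 300);
    INSERT INTO property_bedrooms VALUES (1, 1), (1, 2), (1, 3), (2, 1);
""")

# "bedrooms = 2+" becomes an ordinary comparison.
two_plus = conn.execute("""
    SELECT DISTINCT p.id, p.title
    FROM properties AS p
    JOIN property_bedrooms AS b ON b.property_id = p.id
    WHERE b.bedrooms >= 2
    ORDER BY p.id
""").fetchall()
print(two_plus)  # [(1, 'nice property')]

# Averages and single-element deletes are plain SQL as well.
average = conn.execute(
    "SELECT AVG(bedrooms) FROM property_bedrooms WHERE property_id = 1"
).fetchone()[0]
print(average)  # 2.0

conn.execute(
    "DELETE FROM property_bedrooms WHERE property_id = 1 AND bedrooms = 3"
)
remaining = [row[0] for row in conn.execute(
    "SELECT bedrooms FROM property_bedrooms "
    "WHERE property_id = 1 ORDER BY bedrooms"
)]
print(remaining)  # [1, 2]
```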
If you don't want to create a second table, then come up with a way to represent your data with a single value.
More accurately, I should say I recommend that you represent your data with a single value per column, and Mike Atlas' solution accomplishes that.
Generally, this isn't how you should be storing data in a relational database.
Perhaps you should have a MinBedroom and MaxBedroom column. Eg:
SELECT * FROM properties WHERE MinBedroom > 1 AND MaxBedroom < 3;
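Which comparison operators you want depends on how you read "bedrooms = 2+". The sketch below (Python's built-in sqlite3, made-up rows, hypothetical helper names) shows two possible readings with the MinBedroom/MaxBedroom layout. It is an assumption about the intent, not a replacement for the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE properties (
        title TEXT, price INTEGER, MinBedroom INTEGER, MaxBedroom INTEGER
    );
    INSERT INTO properties VALUES
        ('nice property', 500, 1, 3),
        ('studio flat',   300, 1, 1),
        ('family house',  900, 4, 5);
""")


def with_at_least(n: int):
    """Properties that can offer n or more bedrooms."""
    rows = conn.execute(
        "SELECT title FROM properties WHERE MaxBedroom >= ? ORDER BY title",
        (n,),
    )
    return [row[0] for row in rows]


def offering_exactly(n: int):
    """Properties whose bedroom range includes exactly n."""
    rows = conn.execute(
        "SELECT title FROM properties "
        "WHERE MinBedroom <= ? AND MaxBedroom >= ? ORDER BY title",
        (n, n),
    )
    return [row[0] for row in rows]


print(with_at_least(2))     # ['family house', 'nice property']
print(offering_exactly(2))  # ['nice property']
```

Both queries can use ordinary indexes on the two integer columns, which is the main practical gain over parsing a comma-separated string.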