Abstract data type (ADT) for truth table

Just as there is a nice, common/generic ADT for graphs, is there something similar for truth tables? I'm trying to find 'class' definitions from projects that already implement a truth table.
If not, how would you go about designing a (generic) truth table ADT?
UPDATE: As suggested in the comments, here is what I came up with:
Add (delete) a row: TruthTable.add(input, output) (the first row added is used to extract the length, in bits, of the inputs and outputs; all subsequent row additions are validated against this) and TruthTable.delete(input)
Get the output (image) for a given input: TruthTable.output(input) or TruthTable.image(input)
Get all the inputs: TruthTable.inputs (odometer ordering)
Get all the outputs: TruthTable.outputs (ordered to match the ordering of inputs, or odometer ordering?)
See if a table is completely specified, i.e., whether an output is specified for all 2^n possible inputs: TruthTable.completely_specified?
Other specialized operations can be:
Check if a truth table is invertible: TruthTable.invertible?
Check if two tables are equivalent: TT1 ==? TT2
(I sometimes wish that programming languages allowed method names to end in '?', for methods that return a Bool.)
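For concreteness, here is a minimal sketch of such an ADT in Python; the bit-string representation and all names are my own assumptions, not taken from any existing library:
class TruthTable:
    """Minimal sketch of the truth table ADT described above."""
    def __init__(self):
        self._rows = {}          # maps input bit string -> output bit string
        self._in_bits = None     # input width, fixed by the first row added
        self._out_bits = None    # output width, fixed by the first row added
    def add(self, inp, out):
        if self._in_bits is None:
            self._in_bits, self._out_bits = len(inp), len(out)
        if len(inp) != self._in_bits or len(out) != self._out_bits:
            raise ValueError("row width does not match the first row")
        self._rows[inp] = out
    def delete(self, inp):
        del self._rows[inp]
    def output(self, inp):            # a.k.a. image(input)
        return self._rows[inp]
    @property
    def inputs(self):                 # odometer (ascending binary) ordering
        return sorted(self._rows)
    @property
    def outputs(self):                # ordered to match .inputs
        return [self._rows[i] for i in self.inputs]
    def completely_specified(self):   # are all 2^n inputs present?
        return len(self._rows) == 2 ** self._in_bits
    def invertible(self):             # bijective: no two inputs share an output
        return (self.completely_specified()
                and len(set(self._rows.values())) == len(self._rows))
    def __eq__(self, other):          # TT1 == TT2: same rows
        return self._rows == other._rows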

Related

Explanation of Set vs Tuple in MDX

It seems I am struggling to understand the difference between a set and a tuple in MDX. I've read very fancy definitions comparing the two, but the only difference I can see is that 'a set has same-type members' and 'a tuple has non-same-type members'. Beyond that, every definition I come across (talking about dimensional space or what-not) makes no sense to me. The one-item case I get:
# Tuple
[Team].[Hierarchy].[Code].[DET]
And then multiple items of that same type (dimensionality) form a set, OK:
{[Team].[Hierarchy].[Code].[DET], [Team].[Hierarchy].[Code].[DAL]}
But here are a few examples that don't make sense to me:
# How is this a set? It just has the exact same item twice!
{[Team].[Hierarchy].[Code].[DET], [Team].[Hierarchy].[Code].[DET]}
And another example:
# Tuple (again, same thing -- now adding a duplicate attribute)
(
{[Team].[Hierarchy].[Code].[DET],[Team].[Hierarchy].[Code].[DET]},
[Team].[Name].[Name].[Detroit Lions]
)
Now since both of these are almost doing the same thing (and neither references a measure, so neither would be self-sufficient to pull a 'value'), what is the actual difference between a tuple and a set? These seem to be so loosely defined in the language (for example, above I can have duplicate members in a set, which is usually not allowed in a set).
A related question (some of the answers cover the basics of a one-level set/tuple difference but don't go into too much detail on nesting): Difference between tuple and set in mdx. Also, most of the links on that page are broken.
MDX sets are ordered collections of 0 or more tuples (note that a member is considered to be a tuple containing a single element) with the same dimensionality. Unlike a mathematical set, an MDX set may contain duplicates; it is more of a list of elements. More details here.
And perhaps as a refresher on MDX concepts, here is a gentle introduction to MDX.
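If the "list, not mathematical set" point is the sticking part, the behavior is analogous to a Python list versus a Python set (the analogy is mine, not part of MDX):
# An MDX "set" behaves like a Python list: ordered, duplicates allowed.
mdx_like_set = ["DET", "DET"]        # both members are kept
math_like_set = {"DET", "DET"}       # collapses to a single member
print(mdx_like_set, math_like_set)   # ['DET', 'DET'] {'DET'}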

Dummy Variable Trap And removing one Column

Can anyone explain exactly what is meant by the dummy variable trap? And why do we want to remove one column to avoid that trap? Please provide some links or explain the process; I am not clear on it.
In regression analysis there's often talk about the issue of multicollinearity, which you might be familiar with already. The dummy variable trap is simply perfect collinearity between two or more variables. This can arise if, for one binary variable, two dummies are included. Imagine that you have a variable x which is equal to 1 when something is True. If you were to include x in your regression model along with another variable z that is the opposite of x (i.e., 1 when that same thing is False), you would have two perfectly negatively correlated variables.
Here's a simple demonstration. Let's say your x is a column with True/False values in a pandas dataframe. See what happens when you use pd.get_dummies(df.x) below. The two dummies that are created mirror each other, so one of them is redundant. In simpler terms, you only need one of them, since you can always infer the value of the other from the one you have.
import pandas as pd
df = pd.DataFrame({'x': [True, False]})
pd.get_dummies(df.x)
   False  True
0      0     1
1      1     0
The same applies if you have a categorical variable that can take on more than two values. Whether binary or not, there is always a "base scenario" that is fully determined by the variation in the other case(s). This "base scenario" is therefore redundant and will only introduce perfect collinearity into the model if included.
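In pandas, the standard way out of the trap is the drop_first argument of pd.get_dummies, which drops the redundant base-scenario column for you (the example data below is made up):
import pandas as pd
df = pd.DataFrame({'color': ['red', 'green', 'blue', 'green']})
# drop_first=True drops the first level ('blue' here), which becomes the
# base scenario; the remaining k-1 dummies no longer sum to a constant.
print(pd.get_dummies(df.color, drop_first=True))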
So what's the issue with multicollinearity/linear dependence? The short answer is that with imperfect multicollinearity among your explanatory variables, the coefficient estimates can still be computed but become imprecise (their standard errors blow up). With perfect multicollinearity (which is the case with the dummy variable trap) you can't estimate your model at all. Think of it like this: if you have a variable that can be perfectly explained by another variable, your sample data only contains valuable information about one, not two, truly unique variables, so it would be impossible to obtain two separate coefficient estimates for the same variable.
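To see the "can't estimate at all" part concretely, here is a small check (my own illustration) that a design matrix with an intercept plus both dummies is rank-deficient, so X'X is singular and OLS has no unique solution:
import numpy as np
x = np.array([1, 0, 1, 0])                 # dummy for True
z = 1 - x                                  # dummy for False, perfectly collinear with x
X = np.column_stack([np.ones(4), x, z])    # intercept + both dummies
print(np.linalg.matrix_rank(X))            # 2, not 3: columns are linearly dependent
# np.linalg.inv(X.T @ X) would raise LinAlgError: singular matrix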
Further Reading
Multicollinearity
Dummy Variable Trap

python - pandas - dataframe - data padding multidimensional statistics

I have a dataframe with columns accounting for different characteristics of stars and rows accounting for measurements of different stars, something like this:
property    A    A_error    B    B_error    C    C_error  ...
star1
star2
star3
...
In some measurements the error for a specific property is -1.00, which means the measurement was faulty.
In such a case I want to discard the measurement.
One way to do so is by eliminating the entire row (along with the other properties whose errors were not -1.00).
I think it's possible to fill in the faulty measurement with a value generated from the distribution of all the other measurements, meaning: given the other properties, which are fine, this property should take the value that reduces the error of the entire dataset.
Is there a proper name for the idea I'm referring to?
How would you apply such an algorithm?
I'm a student on a solo project, so I would really appreciate answers that also elaborate on the theory (:
Edit
After further reading, I think what I was referring to is called regression imputation.
So I guess my question is: how can I implement multidimensional linear regression on a dataframe in the most efficient way?
Thanks!
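For what it's worth, here is a minimal sketch of regression imputation with scikit-learn's IterativeImputer, which regresses each feature containing missing values on the remaining features. The column values below are placeholders, and the -1.00 sentinels must first be converted to NaN:
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Placeholder data in the shape described above; -1.00 marks a faulty measurement.
df = pd.DataFrame({'A': [1.2, 3.4, 2.2], 'A_error': [0.1, -1.00, 0.2],
                   'B': [5.0, 4.1, 4.8], 'B_error': [0.3, 0.2, -1.00]})

# 1) Turn the -1.00 sentinels into proper missing values.
error_cols = [c for c in df.columns if c.endswith('_error')]
df[error_cols] = df[error_cols].where(df[error_cols] != -1.00, np.nan)

# 2) Impute each missing entry from a regression on the other columns.
imputer = IterativeImputer(random_state=0)
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns, index=df.index)
print(imputed)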

What is difference between equivalence class testing and input domain partitioning?

I'm learning software testing now, and I'm just wondering what the difference is between equivalence class testing and input domain partitioning; it seems like both of them partition the input domain.
Frankly speaking, during my career as a software testing engineer I haven't encountered many mentions of input domain partitioning.
Nevertheless the term exists, so let's try to see whether there is a difference between equivalence class testing and input domain partitioning.
The equivalence class technique divides the possible test data for, let's say, an application module into partitions of equivalent data. They're "equivalent" because any member of a partition can represent all the other members, so in theory a single test using one member of a partition is sufficient to cover that partition. Moreover, the partitions should not overlap.
Yes, I know, that's a little cumbersome, so let's look at an example: you have an input field on a web page which accepts all kinds of characters, but at most 256 of them. That gives you the following equivalence partitions (simplified):
Char types:
only letters
only numbers
only special chars
mixed chars (letters + numbers + spec. chars)
Char quantity:
0
>0
<256
256
Each of those equivalence partitions has sub-partitions, e.g. "letters":
Big letters
Small letters
Mixed letters
That means that in order to sufficiently test the "letters" partition, you have to design a test case which includes at least one of those sub-partitions. Let's say it is "letters -> Big letters": "TEST INPUT STRING". Note that here we've also combined our test string with the "Char quantity > 0" equivalence partition.
So basically, by combining sub-partitions of the "Char types" and "Char quantity" partitions, you'll be able to design a minimal test set for the input data of that field, as sketched below.
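As a sketch, the resulting minimal test set might look like this in code (validate_input is a hypothetical function under test, and the representatives are picked one per combined partition):
import pytest

def validate_input(s):        # hypothetical function under test:
    return len(s) <= 256      # accepts any characters, up to 256 of them

# One representative value per (combined) equivalence partition.
@pytest.mark.parametrize("value,accepted", [
    ("TEST INPUT STRING", True),   # letters (big), quantity > 0
    ("12345", True),               # only numbers
    ("!@#$%", True),               # only special chars
    ("abc123!?", True),            # mixed chars
    ("", True),                    # quantity = 0 (boundary)
    ("a" * 256, True),             # quantity = 256 (boundary)
    ("a" * 257, False),            # quantity > 256 (invalid partition)
])
def test_input_field_partitions(value, accepted):
    assert validate_input(value) is accepted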
On the other hand, the input domain of a program contains all the possible inputs to that program, which corresponds fairly closely to the equivalence classes of possible inputs of the application module.
Those who speak about the input domain of a program sometimes also speak about regions, which are the same thing as sub-partitions of equivalence partitions. Moreover, those input domains (and accordingly regions) must not overlap (just as partitions must not in equivalence partition testing).
With all that said, I would consider these two terms to describe the same matter in different words.

Appropriate operators for assignment semantics in a non-pure declarative language

I'm designing a declarative language for defining signal networks. I want to use variable bindings to represent groups of nodes in the network. It occurred to me that there are two types of "assignment" I wish to do for these variables.
On the one hand, a variable should represent the output of a specific group of signal operators. This output can then be attached to another input. This is important for directing different outputs to different places, for example:
a, b, c = (SignalA with three outputs)
(SignalB a)
(SignalC c)
(SignalD a)
In this case there would be a SignalA with three outputs, where the first and third outputs get linked to SignalB and SignalC respectively, and SignalD also gets linked to the first output of SignalA. There is only one instance of SignalA.
On the other hand, a variable should represent a common pattern of signal operations, so that it's easy to reproduce a common configuration:
a = (SignalA (SignalB))
(SignalC a)
(SignalD a)
In this case, I'd like a to represent the composition of SignalA and SignalB, and this is reproduced as the input for SignalC and SignalD. There are two instances of SignalA here.
So my question is: in functional/declarative programming, are there common terms for these two assignment semantics? And in my language, which one should get '=', and what would be a conventional operator for the other (perhaps ':=')?
I realize, of course, that if each Signal really represented a pure function, then both of these would be the same; but in my case it's possible for side effects to occur when a signal is processed, so I need to differentiate the two cases.
It's past my bedtime, so I may not be reading carefully enough. But is the second case similar to an anonymous function? Your syntax looks Lisp-like already, so I wonder if Lisp's quote syntax, which holds an expression unevaluated so it can be evaluated anew at each point of use, might be what you want:
a = '(SignalA (SignalB))
If your usage is not actually similar in meaning to lambda, then it will probably cause more confusion.
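That said, the distinction you describe can be mimicked in any impure language by binding a value (one shared instance) versus binding a thunk that rebuilds the pattern on each use; here is a sketch of the idea in Python, with the Signal class as a stand-in:
class SignalA:               # stand-in for a signal operator with side effects
    instances = 0
    def __init__(self):
        SignalA.instances += 1

# Case 1: bind the value -- SignalA is instantiated once and shared.
a = SignalA()
consumers = [a, a]
print(SignalA.instances)     # 1: a single shared instance

# Case 2: bind a thunk -- each use re-instantiates the pattern.
a = lambda: SignalA()
consumers = [a(), a()]
print(SignalA.instances)     # 3: two fresh instances were created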
BTW, in the first case, you could follow Perl's idea for the left-hand side of a list assignment:
(a, b, c) = (SignalA with three outputs)
No idea if this will be helpful; I'm not that experienced outside of imperative languages like Perl and C.