Find the symmetric difference between two sets in Kotlin - kotlin

Is there a Kotlin stdlib function to find the symmetric difference between two sets? So given two sets [1, 2, 3] and [1, 3, 5] the symmetric difference would be [2, 5].
I've written this extension function which works fine, but it feels like an operation that should already exist within the collections framework.
fun <T> Set<T>.symmetricDifference(other: Set<T>): Set<T> {
    val mine = this subtract other
    val theirs = other subtract this
    return mine union theirs
}
EDIT: What is the best way get the symmetric difference between two sets in java? suggests Guava or ApacheCommons, but I'm wondering if Kotlin's stdlib supports this.

Related

Sampling non-repeating integers with given probability distribution

I need to sample n different values taken from a set of integers.
These integers should have different occurrence probabilities, e.g. the larger the value, the likelier it is to be picked.
Using the random package, I can sample a set of distinct values from the set by means of the method
random.sample
However, it doesn't seem to offer a way to associate a probability distribution with the values.
On the other hand, the numpy package lets me specify a distribution, but it returns a sample with repetitions. This can be done with the method
numpy.random.choice
I am looking for a method (or a workaround) that combines what the two methods do.
You can actually use numpy.random.choice, as it has a replace parameter. If it is set to False, the sampling is done without replacement.
Here's a random example:
>>> np.random.choice([1, 2, 4, 6, 9], 3, replace=False, p=[1/2, 1/8, 1/8, 1/8, 1/8])
array([1, 9, 4])
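As a minimal sketch of the "larger is likelier" requirement from the question, the weights can be derived from the values themselves. The proportional weighting and the seed are assumptions made for illustration; any non-negative weights that sum to 1 would work the same way.

```python
import numpy as np

values = np.array([1, 2, 4, 6, 9])

# Assumption: the probability of each value is proportional to the value itself.
weights = values / values.sum()

# Seeded only so that repeated runs give the same draw.
rng = np.random.default_rng(seed=0)

# replace=False guarantees that no value is drawn twice.
sample = rng.choice(values, size=3, replace=False, p=weights)

print(sample)
```

Note that without replacement the weights only describe the first draw exactly; later draws are taken from the remaining values with the weights renormalized.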

row-wise calculation of cosine similarity in pandas without looping

I have a pandas dataframe df with many rows. For each row, I want to calculate the cosine similarity between the row's columns A (first vector) and the row's columns B (second vector). In the end, I want a vector with one cosine similarity value per row. I have found a solution, but it seems to me that it could be done much faster without this loop. Could anyone give me some feedback on this code?
Thank you very much!
for row in np.unique(df.index):
    cos_sim[row]=scipy.spatial.distance.cosine(df[df.index==row][columnsA],
                                               df[df.index==row][columnsB])
df['cos_sim']=cos_sim
Here comes some sample data:
df = pd.DataFrame({'featureA1': [2, 4, 1, 4],
                   'featureA2': [2, 4, 1, 4],
                   'featureB1': [10, 2, 1, 8],
                   'featureB2': [10, 2, 1, 8]},
                  index=['Pit', 'Mat', 'Tim', 'Sam'])
columnsA=['featureA1', 'featureA2']
columnsB=['featureB1', 'featureB2']
This is my desired output (cosine similarity for Pit, Mat, Tim and Sam):
cos_sim=[1, 1, 1, 1]
I am already getting this output with my method, but I am sure the code could be improved from a performance perspective.
Several things you can improve on :)
Take a look at the DataFrame.apply function. pandas already offers you looping "under the hood".
df['cos_sim'] = df.apply(lambda _df: scipy.spatial.distance.cosine(_df[columnsA], _df[columnsB]), axis=1)
or something similar should perform better.
Also take a look at DataFrame.loc
df[df.index==row][columnsA]
and
df.loc[row,columnsA]
should be equivalent
If you really have to iterate over the dataframe (which should be avoided, both because of the performance penalty and because it is harder to read and understand), pandas gives you a generator that yields each row together with its index:
for index, row in df.iterrows():
    scipy.spatial.distance.cosine(row[columnsA], row[columnsB])
Finally as mentioned above to get better answers on stackoverflow, always provide a concrete example where the problem is reproducible. Otherwise it is much harder to interpret the question correctly and to test a solution.
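For reference, here is one way the loop could be avoided entirely, as a sketch using plain NumPy on the sample data from the question. It computes the dot product and the norms for all rows at once. Keep in mind that scipy.spatial.distance.cosine returns the cosine distance (1 minus the similarity), whereas this sketch computes the similarity directly; it also assumes no row has an all-zero vector.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'featureA1': [2, 4, 1, 4],
                   'featureA2': [2, 4, 1, 4],
                   'featureB1': [10, 2, 1, 8],
                   'featureB2': [10, 2, 1, 8]},
                  index=['Pit', 'Mat', 'Tim', 'Sam'])
columnsA = ['featureA1', 'featureA2']
columnsB = ['featureB1', 'featureB2']

# One 2-D array per vector group: shape (n_rows, n_features).
a = df[columnsA].to_numpy(dtype=float)
b = df[columnsB].to_numpy(dtype=float)

# Row-wise dot product divided by the product of the row-wise norms.
dot = (a * b).sum(axis=1)
norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
df['cos_sim'] = dot / norms

print(df['cos_sim'])
```

For the sample data every similarity comes out as 1 (up to floating-point rounding), because each B vector is a positive multiple of the direction of the matching A vector.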
This is a pretty old post, but I am replying for future readers. I created https://github.com/ma7555/evalify for all those row-wise similarity/distance calculations (disclaimer: I am the owner of the package).

TensorFlow: Can data sets contain string category values?

With TensorFlow, it is easy to determine from examples that data contains numeric values. For example:
x_train = [1, 2, 3, 4]
y_train = [0, -1, -2, -3]
However, does it also work with string category values? For example:
x_train = ["sunny", "rainy", "sunny", "cloudy"]
y_train = ["go outside", "stay inside", "go outside", "go outside"]
If it does not, I must assume that TensorFlow has a methodology for working with categorical values. Perhaps by some clever trick such as converting them to numeric values in some systematic way.
Yes, TensorFlow does support datasets with categorical features. Perhaps the easiest way to work with them is to use the Feature Column API, which provides methods such as tf.feature_column.categorical_column_with_vocabulary_list() (for dealing with small, known sets of categories) and tf.feature_column.categorical_column_with_hash_bucket() (for dealing with large and potentially unbounded sets of categories).
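To illustrate the idea of "converting them to numeric values in some systematic way", here is a plain-Python sketch of a vocabulary lookup followed by one-hot encoding. It is not TensorFlow's implementation, and the helper names build_vocabulary and one_hot are hypothetical; it only shows the kind of mapping a vocabulary-list column is built around.

```python
x_train = ["sunny", "rainy", "sunny", "cloudy"]
y_train = ["go outside", "stay inside", "go outside", "go outside"]


def build_vocabulary(values):
    """Map each distinct string to an integer id, in order of first appearance."""
    vocab = {}
    for value in values:
        vocab.setdefault(value, len(vocab))
    return vocab


def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at position `index`."""
    return [1 if i == index else 0 for i in range(size)]


x_vocab = build_vocabulary(x_train)
y_vocab = build_vocabulary(y_train)

# Integer ids for features and labels.
x_ids = [x_vocab[v] for v in x_train]
y_ids = [y_vocab[v] for v in y_train]

# One-hot vectors for the features.
x_one_hot = [one_hot(i, len(x_vocab)) for i in x_ids]

print(x_ids, y_ids)
print(x_one_hot)
```

Integer ids alone imply an ordering between categories that usually does not exist, which is why the ids are typically expanded into one-hot (or embedding) vectors before being fed to a model.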

Fastest way to apply arithmetic operations to System.Array in IronPython

I would like to add two large System.Arrays element-wise in IronPython and store the result in the first array, like this:
for i in range(arrA.Length):
    arrA.SetValue(arrA.GetValue(i) + arrB.GetValue(i), i)
However, this seems very slow. Having a C background, I would like to use pointers or iterators, but I do not know how to apply the IronPython idiom in a fast way. I cannot use Python lists, as my objects are strictly of type System.Array. The type is 3d float.
What is the fastest (or at least a fast) way to perform this computation?
Edit:
The number of elements is approximately 256^3.
3d float means that the array can be accessed like this: array.GetValue(indexX, indexY, indexZ). I am not sure how the respective memory is organized in IronPython's System.Array.
Background: I wrote an interface to an IronPython API, which gives access to data in a simulation software tool. I retrieve 3d scalar data and accumulate it to a temporal array in my IronPython script. The accumulation is performed 10,000 times and should be fast, so that the simulation does not take ages.
Is it possible to use the numpy library developed for IronPython?
https://pytools.codeplex.com/wikipage?title=NumPy%20and%20SciPy%20for%20.Net
It appears to be supported, and as far as I know it is as close as you can get in Python to C-style pointer functionality with arrays and such.
Create an array:
x = np.array([[1, 2, 3], [4, 5, 6]], np.float32)
Multiply all elements by 3.0 in place:
x *= 3.0
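Applied to the accumulation described in the question, the same in-place idea might look like the sketch below. It assumes NumPy is usable from your environment and that the data has already been copied from the System.Array into a NumPy array; that conversion is not shown here. The array size and the names acc and step are made up for illustration.

```python
import numpy as np

shape = (4, 4, 4)  # small for illustration; the question has roughly 256^3 elements

acc = np.zeros(shape, dtype=np.float32)   # running total, updated in place
step = np.ones(shape, dtype=np.float32)   # stand-in for the data of one time step

for _ in range(10):
    # Element-wise addition written straight into acc, with no temporary array.
    np.add(acc, step, out=acc)            # same effect as: acc += step

print(acc[0, 0, 0])
```

The loop over time steps remains in Python, but the loop over the elements runs inside NumPy, which is where most of the time is saved compared with calling GetValue and SetValue per element.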

Lua: Dimensions of a table

This seems like a really easy, "google it for me" kind of question but I can't seem to get an answer to it. How do I find the dimensions of a table in Lua using a command similar to Numpy's .shape method?
E.g. blah = '2 x 3 table'; blah.lua_equivalent_of_shape = {2,3}
Tables in Lua are sets of key-value pairs and do not have dimensions.
You can implement 2d-arrays with Lua tables. In this case, the dimension is given by #t x #t[1], as in the example below:
t={
{11,12,13},
{21,22,23},
}
print(#t,#t[1])
NumPy's arrays are contiguous in memory, whereas Lua's tables are hash tables, so they don't always have the notion of a shape. Tables can be used to implement ragged arrays, sets, objects, etc.
That being said, to find the length of a table t that uses the indices 1..n, use #t:
t = {1, 2, 3}
print(#t) -- prints 3
You could implement an object that behaves more like a NumPy array and add a shape attribute, or implement it in C and make bindings for Lua.
t = {{1, 0}, {2, 3}, {3, 1}, shape={3, 2}}
print(t.shape[1], t.shape[2])
print("dims", #t.shape)
If you really miss NumPy's functionality, you can use torch.Tensor for efficient NumPy-like functionality in Lua.