Julia - Generate 2-matching odd Set - optimization

In Julia, given a Set{Tuple{Int, Int}} named S of length greater than 3, for instance:
julia> S = Set{Tuple{Int,Int}}([(1, 4), (2, 5), (2, 6), (3, 6)])
Set{Tuple{Int64,Int64}} with 4 elements:
(2, 5)
(3, 6)
(2, 6)
(1, 4)
I want to return a subset T of S whose length is odd and at least 3 (3, 5, 7, ...) such that all first values of the tuples are unique. For instance, I can't have both (2, 5) and (2, 6), because the first value, 2, would not be unique. The same applies to the second values, meaning that I can't have both (2, 6) and (3, 6).
If it is not possible, returning an empty Set of Tuple is fine.
Finally for the above minimal example the code should return:
julia> T = Set{Tuple{Int,Int}}([(1, 4), (2, 5), (3, 6)])
Set{Tuple{Int64,Int64}} with 3 elements:
(2, 5)
(3, 6)
(1, 4)
I am truly open to any other type of structure if you think it is better than Set{Tuple{Int, Int}} :)
I know how to do it with integer programming. However, I will run this many times on large instances, and I would like to know if there is a better way, because I strongly suspect it can be done in polynomial time, perhaps in Julia with a clever map or other efficient functions!

What you need is a way to filter the possible combinations of members of a set. So create a filtering function. If the part about an odd [3, 5, 7...] sequence you mentioned applies here, somehow, you might need to add that to the filter logic below:
using Combinatorics
allunique(a) = length(a) == length(unique(a))
slice(tuples, position) = [t[position] for t in tuples]
uniqueslice(tuples, position) = allunique(slice(tuples, position))
is_with_all_positions_unique(tuples) = all(n -> uniqueslice(tuples, n), 1:length(first(tuples)))
Now you can find combinations. With big sets these will explode in number, so make sure to exit when you have enough. You could use Lazy.jl here, or just a function:
function tcombinations(tuples, len, needed)
    printed = 0
    for combo in combinations(collect(tuples), len)
        if is_with_all_positions_unique(combo)
            printed += 1
            println(combo)
            printed >= needed && break
        end
    end
end
tcombinations(tuples, 3, 4)
[(2, 5), (4, 8), (3, 6)]
[(2, 5), (4, 8), (1, 4)]
[(2, 5), (4, 8), (5, 6)]
[(2, 5), (3, 6), (1, 4)]
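On the polynomial-time question: a set of tuples with unique first values and unique second values is a matching in the bipartite graph whose edges are the tuples, and any subset of a matching is still a matching. So one possible route, which is not part of the answer above, is to compute a maximum matching and drop one edge if its size is even. The sketch below uses simple augmenting paths (Kuhn's algorithm) and is written in Python for brevity; the function name odd_matching is hypothetical, and the recursion may need replacing with an explicit stack for very large instances. The same logic should port to Julia directly.

```python
def odd_matching(pairs):
    """Return an odd-sized subset (size >= 3) of `pairs` in which all first
    values and all second values are unique, or an empty set if none exists."""
    # adjacency list: first value -> list of second values it is paired with
    adj = {}
    for a, b in pairs:
        adj.setdefault(a, []).append(b)

    match_of_second = {}  # second value -> first value currently matched to it

    def try_augment(a, seen):
        # try to match `a`, re-routing earlier matches along an augmenting path
        for b in adj[a]:
            if b in seen:
                continue
            seen.add(b)
            if b not in match_of_second or try_augment(match_of_second[b], seen):
                match_of_second[b] = a
                return True
        return False

    for a in adj:
        try_augment(a, set())

    matching = [(a, b) for b, a in match_of_second.items()]
    if len(matching) < 3:
        return set()
    if len(matching) % 2 == 0:
        matching.pop()  # any subset of a matching is still a matching
    return set(matching)


S = {(1, 4), (2, 5), (2, 6), (3, 6)}
T = odd_matching(S)
print(T)
```

For the example S this yields the three tuples (1, 4), (2, 5) and (3, 6), since that is the only matching of size 3. If only three tuples are ever needed, the search could stop as soon as the matching reaches size 3.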

Related

How to exclude business days from date calculations

Dates are in the following format:
YYYY-MM-DD HH:MM:SS
Using the following notebooks and calculation:
import pandas as pd
import numpy as np
import datetime as dt
F52_metrics['Dock to UPS SAP Receipt'] = (F52_metrics['UPS SAP Receipt Date'].dt.date - F52_metrics['Dock Date'].dt.date).astype(str).map(lambda x: x.rstrip('00:00:00.000000000')).str.replace("NaT", "").str.replace("+","").str.replace("days","")
I need to replicate the above calculation to exclude business days. I have tried replacing the calculation entirely with numpy.busday_count, but I keep getting syntax errors.
You can use NumPy's business-day calendar (np.busdaycalendar).
def calendar():
    # set work week mask and optional holidays array
    return np.busdaycalendar(weekmask='1111100', holidays=['2020-01-01','2020-01-20','2020-02-17','2020-05-25','2020-07-03','2020-09-07','2020-10-12','2020-11-11','2020-11-26','2020-12-25'])
def countWeekDays(fromDate='2020-03-03', toDate='2020-06-03'):
    d = np.arange(fromDate, toDate, dtype=np.datetime64)
    weekdays = d[np.is_busday(d, busdaycal=calendar())]
    workDays = [(m, np.array([i for i in weekdays if i.item().month==m]).size) for m in range(1,13)]
    return workDays  # weekdays, months
>>> countWeekDays()
[(1, 0), (2, 0), (3, 21), (4, 22), (5, 20), (6, 2), (7, 0), (8, 0), (9, 0), (10, 0), (11, 0), (12, 0)]
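Since the question mentions np.busday_count on DataFrame columns, here is a minimal sketch of how that call might be applied row-wise. The helper name business_days_between and the sample rows are made up for illustration; only the column names come from the question. np.busday_count needs datetime64[D] inputs and does not accept NaT, so the sketch converts the timestamps to day precision and leaves rows with missing dates as NaN.

```python
import numpy as np
import pandas as pd


def business_days_between(start, end, busdaycal=None):
    """Count business days from `start` (inclusive) to `end` (exclusive)
    for two datetime Series; rows with a missing date become NaN."""
    if busdaycal is None:
        busdaycal = np.busdaycalendar(weekmask="1111100")  # Mon-Fri, no holidays
    valid = start.notna() & end.notna()
    out = pd.Series(np.nan, index=start.index, dtype="float64")
    out.loc[valid] = np.busday_count(
        start[valid].to_numpy().astype("datetime64[D]"),
        end[valid].to_numpy().astype("datetime64[D]"),
        busdaycal=busdaycal,
    )
    return out


# hypothetical sample rows, in the YYYY-MM-DD HH:MM:SS format from the question
F52_metrics = pd.DataFrame({
    "Dock Date": pd.to_datetime(
        ["2020-03-02 08:15:00", "2020-03-06 17:30:00", None]),
    "UPS SAP Receipt Date": pd.to_datetime(
        ["2020-03-09 10:00:00", "2020-03-10 09:00:00", "2020-03-11 12:00:00"]),
})
F52_metrics["Dock to UPS SAP Receipt"] = business_days_between(
    F52_metrics["Dock Date"], F52_metrics["UPS SAP Receipt Date"])
print(F52_metrics["Dock to UPS SAP Receipt"].tolist())
```

A calendar with holidays, such as the one returned by calendar() above, can be passed as the busdaycal argument.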

Finding every points in a sphere in a 3d coordinates

I'm trying to solve this algorithm.
Given a radius and a point,
find every point in the 3D coordinate system that lies inside the sphere of that radius centered at the given point, and store them in a list.
You could do this with numpy, as shown below.
Note that the code here gives you coordinates for a sphere centered at a point you choose, with a radius you choose. You first need to make sure that the input dimension 'dim' below is large enough for the sphere to be fully contained within that volume. It also only works for non-negative indices. If your point has any negative coordinates, use their absolute values, and then flip the signs of the coordinates along that axis in the output yourself.
import numpy as np
dim = 15
# get 3 arrays representing indices along each axis
xx, yy, zz = np.ogrid[:dim, :dim, :dim]
# set the center point and radius you want
center = [4, 5, 6]
radius = 2
# create 3d array with values that are the distance from the
# center squared
d2 = (xx-center[0])**2 + (yy-center[1])**2 + (zz-center[2])**2
# compare distance squared to radius squared to avoid having to use
# slow sqrt()
#
# create a logical true/false array based on whether the values in d2
# above are less than or equal to radius squared, so all the values
# within "radius" of the center are now set to True
mask = d2 <= radius**2
# now you want to get the indices from the mask array where the value of the
# array is True. numpy.nonzero does that, and gives you 3 numpy 1d arrays of
# indices along each axis
s, t, u = np.nonzero(mask)
# finally, to get what you want, which is all those indices in a list, zip them together:
coords = list(zip(s, t, u))
print(coords)
>>>
[(2, 5, 6),
(3, 4, 5),
(3, 4, 6),
(3, 4, 7),
(3, 5, 5),
(3, 5, 6),
(3, 5, 7),
(3, 6, 5),
(3, 6, 6),
(3, 6, 7),
(4, 3, 6),
(4, 4, 5),
(4, 4, 6),
(4, 4, 7),
(4, 5, 4),
(4, 5, 5),
(4, 5, 6),
(4, 5, 7),
(4, 5, 8),
(4, 6, 5),
(4, 6, 6),
(4, 6, 7),
(4, 7, 6),
(5, 4, 5),
(5, 4, 6),
(5, 4, 7),
(5, 5, 5),
(5, 5, 6),
(5, 5, 7),
(5, 6, 5),
(5, 6, 6),
(5, 6, 7),
(6, 5, 6)]
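The restriction to non-negative indices can be avoided by building the grid of offsets around the origin and shifting it to the center afterwards. This is only a sketch of that variation, assuming integer grid points and an integer center; the function name points_in_sphere is hypothetical.

```python
import numpy as np


def points_in_sphere(center, radius):
    """Return all integer grid points within `radius` of the integer `center`."""
    center = np.asarray(center, dtype=int)
    r = int(np.floor(radius))
    # offsets along one axis, covering the bounding cube of the sphere
    offsets = np.arange(-r, r + 1)
    dx, dy, dz = np.meshgrid(offsets, offsets, offsets, indexing="ij")
    # compare squared distances to avoid sqrt()
    mask = dx**2 + dy**2 + dz**2 <= radius**2
    relative = np.stack([dx[mask], dy[mask], dz[mask]], axis=1)
    # shift the offsets to the requested center; negative centers work too
    return [tuple(int(v) for v in p) for p in relative + center]


coords = points_in_sphere((-1, -1, -1), 1)
print(coords)
```

Because the grid is sized from the radius, no separate 'dim' has to be chosen, and the memory used depends only on the radius.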

TF-IDF using in pandas data frame

I am trying to use TF-IDF in pandas with a data set that contains two columns: the first column contains text data and the other contains categorical data. It looks like the sample below:
summary                                            type of attack
unknown african american assailants fired seve...  Armed Assault
unknown perpetrators detonated explosives paci...  Bombing
karl armstrong member years gang threw firebom...  Infrastructure
karl armstrong member years gang broke into un...  Infrastructure
unknown perpetrators threw molotov cocktail in...  Infrastructure
I want to use TF-IDF to vectorize the first column and then use it to build a model that predicts the second column, which contains the attack type.
Here is a short example that processes your df into X and y, ready for training.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder
data = {'summary':['unknown african american assailants fired',
'Armed Assault unknown perpetrators detonated explosives','Bombing karl armstrong member years gang threw'],'type of attack':['bullet','explosion','gang']}
#tfidf
df = pd.DataFrame(data)
tf = TfidfVectorizer()
X = tf.fit_transform(df['summary'])
#label encoding
le = LabelEncoder()
y = le.fit_transform(df['type of attack'])
#your X and y ready to be trained
print('X----')
print(X)
print('y----')
print(y)
Output
X----
(0, 9) 0.4673509818107163
(0, 4) 0.4673509818107163
(0, 1) 0.4673509818107163
(0, 0) 0.4673509818107163
(0, 15) 0.35543246785041743
(1, 8) 0.4233944834119594
(1, 7) 0.4233944834119594
(1, 13) 0.4233944834119594
(1, 5) 0.4233944834119594
(1, 2) 0.4233944834119594
(1, 15) 0.3220024178194947
(2, 14) 0.37796447300922725
(2, 10) 0.37796447300922725
(2, 16) 0.37796447300922725
(2, 12) 0.37796447300922725
(2, 3) 0.37796447300922725
(2, 11) 0.37796447300922725
(2, 6) 0.37796447300922725
y----
[0 1 2]
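To go from X and y to an actual prediction, one option is to put the vectorizer and a classifier into a single scikit-learn pipeline. The sketch below is an assumption about how that step might look, not part of the answer above: LogisticRegression is just one possible classifier, and the three training rows reuse the toy data from the example. String labels can be passed to the classifier directly, so LabelEncoder is optional here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# toy data reused from the example above (far too small for a real model)
summaries = [
    'unknown african american assailants fired',
    'Armed Assault unknown perpetrators detonated explosives',
    'Bombing karl armstrong member years gang threw',
]
attack_types = ['bullet', 'explosion', 'gang']

# the pipeline applies TF-IDF and then fits the classifier on the result
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(summaries, attack_types)

# new text goes through the same fitted vectorizer before prediction
new_summaries = ['unknown perpetrators detonated explosives']
predicted = model.predict(new_summaries)
print(predicted)
```

Keeping the vectorizer inside the pipeline means new text is transformed with the vocabulary learned during fit, which avoids a mismatch between training and prediction features.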

Getting ND behaviour of `np.dot` with `np.tensordot` or 2-D only `np.dot`

I'm trying to express the N-D behaviour of np.dot using only 2-D np.dot or np.tensordot.
To recap, np.dot does something like the following for N-D: It matches/broadcasts the arrays along all dimensions but the last two and performs dot products for all of them. For example, if x.shape is (2, 3, 4, 5) and y.shape is (2, 3, 5, 4), np.dot(x, y).shape is (2, 3, 4, 4) and np.dot(x, y)[i, j] is np.dot(x[i, j], y[i, j]).
Also, if x.shape is just (4, 5), it will first be converted to (2, 3, 4, 5) via np.broadcast.
I tried np.tensordot(x, y, axes=(-1, -2)) but it repeats along every dimension of x, y instead of matching them up.
I realise I could write a loop but I was looking for a vectorised solution.
You got the broadcasting behavior of np.dot wrong:
In [254]: x=np.ones((2,3,4,5)); y=np.ones((2,3,5,4))
In [255]: np.dot(x,y).shape
Out[255]: (2, 3, 4, 2, 3, 4)
In [256]: np.matmul(x,y).shape
Out[256]: (2, 3, 4, 4)
and for the (4,5) x:
In [257]: np.dot(x[0,0],y).shape
Out[257]: (4, 2, 3, 4)
In [258]: np.matmul(x[0,0],y).shape
Out[258]: (2, 3, 4, 4)
matmul was added precisely because np.dot does not act like it is performing np.dot(x[i,j,:,:], y[i,j,:,:]) for all i,j.
The shape in Out[255] is the shape of x minus the 5, plus the shape of y minus its 5. In effect, it is an outer product of everything, with summing on the size 5 dimension.
tensordot uses np.dot. It just reshapes and transposes the inputs to reduce the problem to a 2d dot one. Then it massages the result back to the desired shape and order.
In [259]: np.tensordot(x, y, axes=(-1,-2)).shape
Out[259]: (2, 3, 4, 2, 3, 4) # cf Out[255]
In [261]: np.einsum('ijkl,ijlm->ijkm',x,y).shape
Out[261]: (2, 3, 4, 4) # cf Out[256]
Since sparse matrices are 2d to start with - and end with, I don't understand your question. If you have multiple sparse matrices, you'll have to work with them individually.
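The relationship between these shapes can be checked numerically. The sketch below, using random arrays chosen only for illustration, confirms that np.matmul matches a loop of 2-D np.dot calls, and that the matmul result is the "diagonal" of the larger np.dot result, where the leading indices of x and y coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((2, 3, 4, 5))
y = rng.random((2, 3, 5, 4))

full = np.dot(x, y)        # shape (2, 3, 4, 2, 3, 4): every x[i, j] with every y[k, l]
batched = np.matmul(x, y)  # shape (2, 3, 4, 4): x[i, j] only with y[i, j]

# the same batched result written as a loop of 2-D np.dot calls
looped = np.empty((2, 3, 4, 4))
for i in range(2):
    for j in range(3):
        looped[i, j] = np.dot(x[i, j], y[i, j])

# repeated subscripts in einsum pick out the entries with i == k and j == l
diagonal = np.einsum('ijkijm->ijkm', full)

print(full.shape, batched.shape)
print(np.allclose(batched, looped), np.allclose(batched, diagonal))
```

Extracting the diagonal from np.dot works but computes many unneeded products, which is why np.matmul or np.einsum is the more practical choice.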

Numpy multidimensional transpose not giving expected result

I have an array with dimensions (2, 3, 4, 5).
When I do np.transpose(a, (0, 3, 2, 1)) I get back the expected result with shape (2, 5, 4, 3).
But when I do np.transpose(a, (0, 3, 1, 2)), I expect to get a result with shape (2, 4, 5, 3) but instead I get a shape of (2, 5, 3, 4)...
What is going on?
The dimensions:
0: 2
1: 3
2: 4
3: 5
First transpose (0,3,2,1) -> dims=[2,5,4,3]
Second transpose (0,3,1,2) -> dims=[2,5,3,4]
What's happening is that numpy is doing its job; you're just passing the wrong axes. What you want is np.transpose(a, (0, 2, 3, 1)).
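The rule is that axis i of the result is axis axes[i] of the input. One plausible reading of the confusion, which is an assumption on my part, is that (0, 3, 1, 2) was meant as "where each original axis should go" instead of "which original axis goes here". The sketch below checks the rule and shows that np.argsort converts between the two readings; the helper name transposed_shape is hypothetical.

```python
import numpy as np

a = np.zeros((2, 3, 4, 5))


def transposed_shape(shape, axes):
    # result axis i takes its length from input axis axes[i]
    return tuple(shape[ax] for ax in axes)


for axes in [(0, 3, 2, 1), (0, 3, 1, 2), (0, 2, 3, 1)]:
    result = np.transpose(a, axes)
    assert result.shape == transposed_shape(a.shape, axes)
    print(axes, '->', result.shape)

# reading (0, 3, 1, 2) as destinations: original axis k should end up at dest[k];
# argsort inverts the permutation into the form np.transpose expects
dest = (0, 3, 1, 2)
axes_from_dest = tuple(int(i) for i in np.argsort(dest))
print(axes_from_dest, '->', np.transpose(a, axes_from_dest).shape)
```

Under that reading, the inverted permutation is (0, 2, 3, 1), which gives the shape (2, 4, 5, 3) that was expected.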