QGIS - Retrieve start and end features from a line

I'm creating lines by selecting two features from various layers. When I create a line a form pops up. In this form I want to display data from the start and end features of the line.
What I'm currently doing is retrieving the vertices as points:
geom = feature.geometry()
line = geom.asPolyline()
pointFather = line[0]
pointChild = line[-1]
Then I get the coordinates of each point:
xf = pointFather.x()
yf = pointFather.y()
and then I look into each possible layer to find the features with the same coordinates, just to retrieve the features I just clicked on!
for layer in layerList:
    provider = layer.dataProvider()
    iter = provider.getFeatures()
    for feature in iter:
        geom = feature.geometry().asPoint()
        if geom.x() == xf and geom.y() == yf:
There must be an easier way to directly retrieve the start and end features, isn't there?
EDIT 1:
Here is my attempt after PCamargo's first answer:
def retrieve_feature_from_xy(geom, point, layerList):
    for layer in layerList:
        index = QgsSpatialIndex()
        iter = layer.getFeatures()
        for feat in iter:
            index.insertFeature(feat)
        ids = index.intersects(geom.boundingBox())
        request = QgsFeatureRequest()
        request.setFilterFids(ids)
        iter = layer.getFeatures(request)
        for feat in iter:
            geom2 = feat.geometry().asPoint()
            if geom2.x() == point.x() and geom2.y() == point.y():
                return feat
EDIT 2:
Here is my attempt after PCamargo's second comment:
def retrieve_feature_from_xy2(geom, point, layerList):
    allfeatures = {}
    indexes = []
    ids = []
    for layer in layerList:
        index = QgsSpatialIndex()
        iter = layer.getFeatures()
        for feat in iter:
            index.insertFeature(feat)
            allfeatures[feat.id()] = feat
        indexes.append(index)
    for index in indexes:
        intersect_ids = index.intersects(geom.boundingBox())
        ids.append(intersect_ids)
    for id in ids:
        for i in id:
            feat = allfeatures[i]
            geom2 = feat.geometry().asPoint()
            if geom2.x() == point.x() and geom2.y() == point.y():
                return feat
EDIT 3:
Here is my attempt after PCamargo's third comment:
def retrieve_feature_from_xy3(geom, point, layerList):
    allfeatures = {}
    indexes = []
    indexDict = {}
    intersectsIdsDict = {}
    for layer in layerList:
        index = QgsSpatialIndex()
        iter = layer.getFeatures()
        for feat in iter:
            index.insertFeature(feat)
            allfeatures[layer, feat.id()] = feat
        indexes.append(index)
        indexDict[layer] = index
    for layer, index in indexDict.items():
        intersectsIds = index.intersects(geom.boundingBox())
        intersectsIdsDict[layer] = intersectsIds
    for layer, intersectsIds in intersectsIdsDict.items():
        for id in intersectsIds:
            feat = allfeatures[layer, id]
            geom2 = feat.geometry().asPoint()
            if geom2.x() == point.x() and geom2.y() == point.y():
                return feat

Chris,
You can definitely improve the lookup for matching coordinates (the third part of your code).
Instead of looping through all features in each layer, create a spatial index (https://docs.qgis.org/2.2/en/docs/pyqgis_developer_cookbook/vector.html#using-spatial-index) for each layer and use nearestNeighbor.
It would be something like this:
# You only need to create these indices once
indexes = []
for layer in layerList:
    index = QgsSpatialIndex()
    for feat in layer.getFeatures():
        index.insertFeature(feat)
    indexes.append(index)
Now that we have the indexes, we can use faster geographic search.
geom = feature.geometry()
for index in indexes:
    intersect_ids = index.intersects(geom.boundingBox())
intersect_ids is a much smaller list of feature ids that are candidates to be equivalent, so you can compare only these features with the feature you selected.
You need to organize this a bit more, but that is the idea.
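For example, one possible way to organize it, as a rough sketch (the helper names, the per-layer feature dictionary and the tolerance-based coordinate comparison are my own illustrative choices, not the only way to do it):

from qgis.core import QgsSpatialIndex, QgsRectangle

def build_indexes(layerList):
    # Build one spatial index per layer and keep an id -> feature lookup,
    # so the clicked features can be returned directly later.
    indexes = {}
    allfeatures = {}
    for layer in layerList:
        index = QgsSpatialIndex()
        for feat in layer.getFeatures():
            index.insertFeature(feat)
            allfeatures[layer.id(), feat.id()] = feat
        indexes[layer] = index
    return indexes, allfeatures

def find_feature_at(point, indexes, allfeatures, tol=1e-8):
    # Query each index with a tiny rectangle around the vertex, then confirm
    # the match with a tolerance instead of comparing floats for equality.
    rect = QgsRectangle(point.x() - tol, point.y() - tol,
                        point.x() + tol, point.y() + tol)
    for layer, index in indexes.items():
        for fid in index.intersects(rect):
            feat = allfeatures[layer.id(), fid]
            p = feat.geometry().asPoint()
            if abs(p.x() - point.x()) <= tol and abs(p.y() - point.y()) <= tol:
                return feat
    return None

The indexes would be built once (e.g. when the form opens), and then find_feature_at(pointFather, ...) / find_feature_at(pointChild, ...) would be called from the form.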


How do I add beam search to the inference function of a TensorFlow model?

I'm having a hard time adding beam search to this function.
The initial search always takes the max probability at each position (greedy search); now that I'm trying to add a loop to generate K outputs, it has become complicated, and I could use some help.
Here is the link to the model I'm using (the exact same one):
https://github.com/syedshahzadraza/Encoder-Decoder-Model-with-Attention/blob/master/machine_translation_french_english.ipynb
Here is the function for the greedy search:
def evaluate(sentence):
    sentence = preprocess_sentence(sentence)
    inputs = [inp_lang.word_index[i] for i in sentence.split(' ')]
    inputs = tf.keras.preprocessing.sequence.pad_sequences([inputs],
                                                           maxlen=max_length_inp,
                                                           padding='post')
    inputs = tf.convert_to_tensor(inputs)
    result = ''
    hidden = [tf.zeros((1, units))]
    enc_out, enc_hidden = encoder(inputs, hidden)
    dec_hidden = enc_hidden
    dec_input = tf.expand_dims([targ_lang.word_index['<start>']], 0)
    for t in range(max_length_targ):
        predictions, dec_hidden, attention_weights = decoder(dec_input,
                                                             dec_hidden,
                                                             enc_out)
        # storing the attention weights to plot later on
        # (attention_plot is defined earlier in the linked notebook)
        attention_weights = tf.reshape(attention_weights, (-1, ))
        attention_plot[t] = attention_weights.numpy()
        predicted_id = tf.argmax(predictions[0]).numpy()
        result += targ_lang.index_word[predicted_id] + ' '
        if targ_lang.index_word[predicted_id] == '<end>':
            return result, sentence, attention_plot
        # the predicted ID is fed back into the model
        dec_input = tf.expand_dims([predicted_id], 0)
    return result, sentence
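For reference, a rough sketch of how the greedy argmax step could be replaced by a small beam search loop, assuming the same encoder, decoder, inp_lang and targ_lang objects used above (the beam_width parameter and the scoring by summed log-probabilities are illustrative choices, not code from the linked notebook):

import tensorflow as tf

def evaluate_beam(sentence, beam_width=3):
    # Same preprocessing and encoding as the greedy version above.
    sentence = preprocess_sentence(sentence)
    inputs = [inp_lang.word_index[i] for i in sentence.split(' ')]
    inputs = tf.keras.preprocessing.sequence.pad_sequences(
        [inputs], maxlen=max_length_inp, padding='post')
    inputs = tf.convert_to_tensor(inputs)
    enc_out, enc_hidden = encoder(inputs, [tf.zeros((1, units))])

    start_id = targ_lang.word_index['<start>']
    end_id = targ_lang.word_index['<end>']
    # Each hypothesis: (token ids, summed log-probability, decoder hidden state)
    beams = [([start_id], 0.0, enc_hidden)]
    finished = []

    for _ in range(max_length_targ):
        candidates = []
        for tokens, score, dec_hidden in beams:
            dec_input = tf.expand_dims([tokens[-1]], 0)
            predictions, new_hidden, _ = decoder(dec_input, dec_hidden, enc_out)
            log_probs = tf.nn.log_softmax(predictions[0]).numpy()
            # Expand this hypothesis with its top beam_width tokens.
            for tid in log_probs.argsort()[-beam_width:]:
                candidates.append((tokens + [int(tid)], score + log_probs[tid], new_hidden))
        # Keep only the best beam_width partial translations.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score, hidden in candidates[:beam_width]:
            if tokens[-1] == end_id:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score, hidden))
        if not beams:
            break

    best = max(finished + [(b[0], b[1]) for b in beams], key=lambda c: c[1])
    return ' '.join(targ_lang.index_word[t] for t in best[0][1:] if t != end_id)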

How to show the class distribution of a Dataset object in TensorFlow

I am working on a multi-class classification task using my own images.
filenames = [] # a list of filenames
labels = [] # a list of labels corresponding to the filenames
full_ds = tf.data.Dataset.from_tensor_slices((filenames, labels))
This full dataset will be shuffled and split into train, valid, and test datasets:
full_ds_size = len(filenames)
full_ds = full_ds.shuffle(buffer_size=full_ds_size*2, seed=128) # seed is used for reproducibility
train_ds_size = int(0.64 * full_ds_size)
valid_ds_size = int(0.16 * full_ds_size)
train_ds = full_ds.take(train_ds_size)
remaining = full_ds.skip(train_ds_size)
valid_ds = remaining.take(valid_ds_size)
test_ds = remaining.skip(valid_ds_size)
Now I am struggling to understand how each class is distributed in train_ds, valid_ds and test_ds. An ugly solution is to iterate over all the elements in the dataset and count the occurrences of each class. Is there a better way to solve this?
My ugly solution:
import collections

def get_class_distribution(dataset):
    class_distribution = {}
    for element in dataset.as_numpy_iterator():
        label = element[1]
        if label in class_distribution.keys():
            class_distribution[label] += 1
        else:
            class_distribution[label] = 1
    # sort dict by key
    class_distribution = collections.OrderedDict(sorted(class_distribution.items()))
    return class_distribution
train_ds_class_dist = get_class_distribution(train_ds)
valid_ds_class_dist = get_class_distribution(valid_ds)
test_ds_class_dist = get_class_distribution(test_ds)
print(train_ds_class_dist)
print(valid_ds_class_dist)
print(test_ds_class_dist)
The answer below assumes:
there are five classes.
labels are integers from 0 to 4.
It can be modified to suit your needs.
Define a counter function:
def count_class(counts, batch, num_classes=5):
    labels = batch['label']
    for i in range(num_classes):
        cc = tf.cast(labels == i, tf.int32)
        counts[i] += tf.reduce_sum(cc)
    return counts
Use the reduce operation:
initial_state = dict((i, 0) for i in range(5))
counts = train_ds.reduce(initial_state=initial_state,
                         reduce_func=count_class)
print([(k, v.numpy()) for k, v in counts.items()])
A solution inspired by user650654's answer, using only TensorFlow primitives (tf.unique_with_counts instead of a for loop):
In theory, this should have better performance and scale better to large datasets, batches, or class counts.
num_classes = 5

@tf.function
def count_class(counts, batch):
    y, _, c = tf.unique_with_counts(batch[1])
    return tf.tensor_scatter_nd_add(counts, tf.expand_dims(y, axis=1), c)

counts = train_ds.reduce(
    initial_state=tf.zeros(num_classes, tf.int32),
    reduce_func=count_class)
print(counts.numpy())
A similar and simpler version with numpy that actually had better performance for my simple use case:
count = np.zeros(num_classes, dtype=np.int32)
for _, labels in train_ds:
    y, _, c = tf.unique_with_counts(labels)
    count[y.numpy()] += c.numpy()
print(count)

Is the numpy sum method superfluous in this code?

I am reading a book, and found what I think is an error in the code below:
import numpy as np

def relu(x):
    return (x > 0) * x

def relu2dev(x):
    return (x > 0)

street_lights = np.array([[1,0,1],[0,1,1],[0,0,1],[1,1,1]])
walk_stop = np.array([[1,1,0,0]]).T
alpha = 0.2
hidden_size = 4
weights_0_1 = 2*np.random.random((3,hidden_size)) - 1
weights_1_2 = 2*np.random.random((hidden_size,1)) - 1

for it in range(60):
    layer_2_error = 0
    for i in range(len(street_lights)):
        layer_0 = street_lights[i:i+1]
        layer_1 = relu(np.dot(layer_0, weights_0_1))
        layer_2 = np.dot(layer_1, weights_1_2)
        layer_2_delta = (layer_2 - walk_stop[i:i+1])
        # -> layer_2_delta's shape is (1,1), so why np.sum?
        layer_2_error += np.sum((layer_2_delta)**2)
        layer_1_delta = layer_2_delta.dot(weights_1_2.T) * relu2dev(layer_1)
        weights_1_2 -= alpha * layer_1.T.dot(layer_2_delta)
        weights_0_1 -= alpha * layer_0.T.dot(layer_1_delta)
    if it % 10 == 9:
        print("Error: " + str(layer_2_error))
The line in question is marked with the # -> comment:
layer_2_delta's shape is (1,1), so why would one use np.sum? I think np.sum can be removed, but I'm not quite sure, since the code comes from a book.
As you say, layer_2_delta has a shape of (1,1), which means it is a 2-dimensional array with a single element: layer_2_delta = np.array([[X]]). However, layer_2_error is a scalar. You can get the scalar out of the array either by selecting the value at the first index (layer_2_delta[0,0]) or by summing all the elements (which in this case is just the one). Since the book uses a "sum of squared errors", it seems natural to keep the notation of squaring each element of the array and then adding them all up (for instructional purposes): this is also more general (e.g., for layers with more than one element) than the indexing approach. But you're right, there are other ways to do this :).
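For example (a small standalone illustration, not from the book):

import numpy as np

layer_2_delta = np.array([[0.5]])      # shape (1, 1), as in the loop above
print(np.sum(layer_2_delta ** 2))      # 0.25 -> a plain scalar
print((layer_2_delta ** 2)[0, 0])      # 0.25 -> same value via indexing
print(0 + layer_2_delta ** 2)          # [[0.25]] -> without the reduction, the
                                       # accumulated error would stay a (1, 1) array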

How to map different indices in Pyomo?

I am a new Pyomo/Python user. I need to formulate one set of constraints with index 'n', where all 3 components have different indices but correlate with index 'n'. I am curious how I can map the relationship between these sets.
In my case, I read CSV files whose indices are related to 'n' to generate my sets. For example: a1.n1, a2.n3, a3.n5 /// b1.n2, b2.n4, b3.n6, b4.n7 /// c1.n1, c2.n2, c3.n4, c4.n6 ///. The constraint expressions for indices n1 and n2, for example, are the following:
for n1: P(a1.n1) + L(c1.n1) == D(n1)
for n2: - F(b1.n2) + L(c2.n2) == D(n2)
Now on to the code. The set-creation code is as follows; it lives inside a class:
import pyomo
import pandas
import pyomo.opt
import pyomo.environ as pe

class MyModel:
    def __init__(self, Afile, Bfile, Cfile):
        self.A_data = pandas.read_csv(Afile)
        self.A_data.set_index(['a'], inplace=True)
        self.A_data.sort_index(inplace=True)
        self.A_set = self.A_data.index.unique()
        ... ...
Then I tried to map the relationship in the constraint construction as follows:
def createModel(self):
    self.m = pe.ConcreteModel()
    self.m.A_set = pe.Set(initialize=self.A_set)

    def obj_rule(m):
        return ...

    self.m.OBJ = pe.Objective(rule=obj_rule, sense=pe.minimize)

    def constr(m, n):
        As = self.A_data.reset_index()
        Amap = As[As['n'] == n]['a']
        Bs = self.B_data.reset_index()
        Bmap = Bs[Bs['n'] == n]['b']
        Cs = self.C_data.reset_index()
        Cmap = Cs[Cs['n'] == n]['c']
        return sum(m.P[(p,n)] for p in Amap) - sum(m.F[(s,n)] for s in Bmap) + sum(m.L[(r,n)] for r in Cmap) == self.D_data.ix[n, 'D']

    self.m.cons = pe.Constraint(self.m.D_set, rule=constr)

def solve(self):
    ... ...
Finally, this error is raised when I run it:
KeyError: "Index '(1, 1)' is not valid for indexed component 'P'"
I know it is the wrong way, so I am wondering if there is a good way to map their relationships. Thanks in advance!
Gabriel
I just forgot to post the answer to my own question when I solved this a week ago. The key to this problem is setting up a mapped index.
Let me modify the code in the question. First, we need to modify the dataframes to include the information about the mapped indices. Then the set for the mapped index can be constructed, taking 2 mapped indices as an example:
self.m.A_set = pe.Set( initialize = self.A_set, dimen = 2 )
The names of the two mapped indices are 'alpha' and 'beta' respectively. Then the constraint can be formulated, based on the variables declared at the beginning:
def constr(m, n):
    Amap = self.A_data[self.A_data['alpha'] == n]['beta']
    Bmap = self.B_data[self.B_data['alpha'] == n]['beta']
    return sum(m.P[(i,n)] for i in Amap) + sum(m.L[(r,n)] for r in Bmap) == D.loc[n, 'D']

m.TravelingBal = pe.Constraint(m.A_set, rule=constr)
The summation groups all of the B components associated with A through the mapped index set.
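To make the idea concrete, here is a small self-contained sketch of the mapped-index pattern (the toy dataframes, demand values, and column names are made up for illustration; they stand in for the CSV data in the real model):

import pandas as pd
import pyomo.environ as pe

# Toy mapping: each row links a component (beta) to a node (alpha),
# mirroring the a1.n1, a2.n3, ... style pairs in the question.
A_data = pd.DataFrame({'alpha': ['n1', 'n3', 'n5'], 'beta': ['a1', 'a2', 'a3']})
C_data = pd.DataFrame({'alpha': ['n1', 'n2', 'n4', 'n6'], 'beta': ['c1', 'c2', 'c3', 'c4']})

m = pe.ConcreteModel()
# Two-dimensional sets built from the (beta, alpha) pairs.
m.A_set = pe.Set(initialize=list(zip(A_data['beta'], A_data['alpha'])), dimen=2)
m.C_set = pe.Set(initialize=list(zip(C_data['beta'], C_data['alpha'])), dimen=2)
m.N_set = pe.Set(initialize=sorted(set(A_data['alpha']) | set(C_data['alpha'])))

m.P = pe.Var(m.A_set, within=pe.NonNegativeReals)
m.L = pe.Var(m.C_set, within=pe.NonNegativeReals)
D = {n: 1.0 for n in m.N_set}  # placeholder demand

def constr(m, n):
    # Only the components actually mapped to node n appear in its constraint.
    Amap = A_data.loc[A_data['alpha'] == n, 'beta']
    Cmap = C_data.loc[C_data['alpha'] == n, 'beta']
    return sum(m.P[a, n] for a in Amap) + sum(m.L[c, n] for c in Cmap) == D[n]

m.bal = pe.Constraint(m.N_set, rule=constr)

Indexing m.P and m.L by the 2-dimensional sets is what avoids the KeyError above: only the (component, n) pairs that actually exist are created.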

Best Subset regression on test sample (after k-fold)

I am running a best subset regression analysis in RStudio. I am using the following libraries:
library(foreign)
library(glmnet)
library(caTools)
library(leaps)
library(ISLR)
library(knitr)
library(ggvis)
I want to split my sample into three samples: training, cross-validation, and test (maybe 50%, 30%, 20%).
I have successfully run best subset on the training data and cross-validated those results with the following script:
k = 10
set.seed(1)
folds = sample(1:k, nrow(best_demo_train), replace=TRUE)
table(folds)
cv.errors = matrix(NA, k, 5, dimnames=list(NULL, paste(1:5)))

predict.regsubsets = function(object, newdata, id, ...) {
  form = as.formula(object$call[[2]])
  mat = model.matrix(form, newdata)
  coefi = coef(object, id = id)
  mat[, names(coefi)] %*% coefi
}

for (j in 1:k) {
  best.fit = regsubsets(selfaware ~ ., data=best_demo_train[folds != j, ])
  for (i in 1:5) {
    pred = predict(best.fit, best_demo_train[folds == j, ], id = i)
    cv.errors[j, i] = mean((best_demo_train$selfaware[folds == j] - pred)^2)
  }
}

mean.cv.errors = apply(cv.errors, 2, mean)
mean.cv.errors
which.min(mean.cv.errors)
par(mfrow=c(1,1))
plot(mean.cv.errors, type='b')
points(which.min(mean.cv.errors), mean.cv.errors[which.min(mean.cv.errors)],
       col="red", cex=2, pch=20)
reg.best = regsubsets(selfaware ~ ., data=best_demo_train)
coef(reg.best, 3)
reg.summary = summary(reg.best, 3)
reg.summary$adjr2
So, once I have the "best" variables, I would like to "test" that model on 20% of the data. Can someone help me out with this? I do not know what the script would be to test this model and have been unsuccessful searching online.
Thank you, I appreciate it.
Sarah