I've been trying to learn tensorflow and machine learning and this article was one of the first tutorials I've stumbled onto: https://medium.com/towards-data-science/tensorflow-for-absolute-beginners-28c1544fb0d6. I stepped through the code and thought I understood the vast majority of it but then I got to the final output which was a set of 3 numbers, the weights. How are these weights supposed to be used? That is, how would I put this result to use in a real world scenario?
Weights are what you are trying to optimize.
The goal is to find a set of weights that, when given a set of inputs, will output the right answer.
In this case, the inputs are 1 (True) and -1 (False), plus a bias input that is always 1. The goal is to learn the AND function: it should return 1 (True) only when both inputs are 1 (True), and -1 (False) otherwise.
When given a new input [1, -1, 1] (the bias is always 1 here), the function multiplies these inputs by the weights you computed earlier and sums the result. If the sum is greater than 0 it outputs 1; otherwise it outputs -1.
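For a concrete illustration, here is a minimal NumPy sketch of that inference step (the weight values are hypothetical stand-ins for whatever your training run printed):

import numpy as np

weights = np.array([0.8, 0.7, -0.5])  # hypothetical learned weights: [w_input1, w_input2, w_bias]

def predict(input1, input2):
    x = np.array([input1, input2, 1.0])  # third element is the always-on bias input
    return 1 if np.dot(x, weights) > 0 else -1

print(predict(1, 1))   # 1  -> True AND True
print(predict(1, -1))  # -1 -> True AND False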
Related
I am trying to create a search relevance model where I take the dot product between the query vector and the resulting document vectors. I add a positional bias term on top to account for the fact that position 1 is more likely to be clicked. The final (unnormalised) log-likelihood calculation is as follows:
query = self.query_model(query_input_ids, query_attention_mask)
docs = self.doc_model(doc_input_ids, doc_attention_mask)
positional_bias = self.position_model()

if optimizer_idx is not None:
    # detach the branches this optimizer should not update
    if optimizer_idx == 0:
        docs = docs.detach()
        positional_bias = positional_bias.clone().detach()
    elif optimizer_idx == 1:
        query = query.detach()
        positional_bias = positional_bias.clone().detach()
    else:
        query = query.detach()
        docs = docs.detach()

# batched dot product between each doc vector and the query
similarity = (docs @ query.unsqueeze(-1)).squeeze()
click_log_lik = (similarity + positional_bias)\
    .reshape(doc_mask.shape)\
    .masked_fill_((1 - doc_mask).bool(), float("-inf"))
The query and doc models are simply DistilBERT models with a projection layer on top of the CLS token. The models can be seen here: https://pastebin.com/g21g9MG3
When inspecting the first gradient descent step, it has NaNs, but only for the query model and not the doc model. My hypothesis is that normalizing the return values of the doc and query models (return F.normalize(out, dim=-1)) is somehow interfering with the gradients.
Does anyone know 1. if my hypothesis is true, and more importantly 2. how I can rectify the NaN gradients?
Additional Info:
None of the losses are inf or nan.
query is BS x 768
docs is BS x DOC_RESULTS x 768
positional_bias is DOC_RESULTS
DOC_RESULTS is 10 in my case.
The masked_fill in the last line is because occasionally I have less than 10 data points for a query.
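For locating where the NaNs first appear, a minimal PyTorch debugging sketch like the following can help (loss and model here are placeholders for the objects in the snippet above):

import torch

# raise an error at the first backward op that produces NaN/inf
torch.autograd.set_detect_anomaly(True)

loss.backward()  # placeholder: your training loss
for name, param in model.named_parameters():  # placeholder: your module
    if param.grad is not None and torch.isnan(param.grad).any():
        print(f"NaN gradient in: {name}")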
Update 1
The following changes made no difference to the NaNs:
Changing masked_fill from -inf to 1e5.
Changing the projection from F.normalize(out, dim=-1) to out / 100.
Removing the positional bias altogether.
If it helps anyone who comes across this while using Transformers, this is what I did:
In the end the bug was due to the fact that I was masking away NaNs. Since I had some documents with zero length, the output of the transformer was NaN. I was hoping that masked_fill would fix this problem, but it doesn't. The solution in my case was to put only non-zero-length sequences through the transformer, and then pad with zeros to fill the batch size.
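A minimal sketch of that workaround, assuming sequences stacked along the first dimension and an attention mask whose row sums give each sequence's length (hidden_size is a stand-in for the projection size):

import torch

hidden_size = 768  # stand-in for the projection size
lengths = doc_attention_mask.sum(dim=-1)  # tokens per sequence
nonzero = lengths > 0                     # rows that actually contain tokens

# run only non-empty sequences through the transformer,
# then leave the empty rows as zeros to keep the batch shape
docs = torch.zeros(doc_input_ids.size(0), hidden_size, device=doc_input_ids.device)
docs[nonzero] = self.doc_model(doc_input_ids[nonzero], doc_attention_mask[nonzero])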
Thank you for reading; I'm not good at English.
I am wondering how to predict future time series values after training a model; I would like to get the values N steps ahead, and I also wonder whether the time series has been properly learned and predicted.
How do I correctly get the next values, using something like model.predict?
I have x_test, and x_test[-1] corresponds to time t, so by the next values I mean the predictions for t+1, t+2, ..., t+n.
First
I tried using stock index data:
inputs = total_data[len(total_data) - forecast - look_back:]
inputs = scaler.transform(inputs)

X_test = []
for i in range(look_back, inputs.shape[0]):
    X_test.append(inputs[i - look_back:i])
X_test = np.array(X_test)

predicted = model.predict(X_test)
but the result is shown below.
The predictions for X_test[-20:] and the following 20 predictions look the same. I'm wondering whether the training and prediction are correct.
full source
The method I tried first did not work correctly.
Second
I realized something was wrong, so I tried another official dataset: the time series from the TensorFlow tutorial, to practice prediction.
a = y_val[-look_back:]
for i in range(n_steps):  # predict a new value n times
    tmp = model.predict(a.reshape(-1, look_back, num_feature))  # predicted value
    a = a[1:]              # drop the oldest value
    a = np.append(a, tmp)  # append the predicted value
The predictions come out as a nearly straight line, very different from the real data.
The output, a linear trend independent of the real data:
full source (my code starts after line 25)
I'm really curious what the standard method is for predicting the next values of a stock market.
Thank you for reading this long question; I would appreciate your advice.
Q : "How can I find a standard method of predicting next values of a stock market...?"
First, salutes to a C64 practitioner!
Next, let me say there is no standard method; there cannot be one.
Principally, to draw on a shared field of experience: one can easily predict the near-future flow of laminar fluids (a technically "working" market instrument is a model for which one can derive a better or worse predictive tool).
That will never work, however, for turbulent states of the fluid; just look at the complexity of the attempts to formulate the many-dimensional, high-order PDEs for turbulence, which still only approximate it. And turbulence is the fundamentally "working" market after some expected fundamental figure is released (read: NFP or CPI) or some flash news breaks (read: the Swiss removal of the CHF currency peg, or Cyprus's one-time state tax on all speculative deposits; the financial Big Bangs follow).
So please do not expect one model, much less a simple one, to make reasonably precise predictions for both the laminar and the turbulent regimes; the real world is surely far more complex than that :o)
I am playing around with DeepExplainer to get SHAP values for deep learning models. By following some tutorials I can get some results, i.e. which variables are pushing the model prediction away from the base value, which is the average model output over the training set.
I have around 5,000 observations along with 70 features. The performance of DeepExplainer is quite satisfactory. My code is:
model0 = load_model(model_p+'health0.h5')
background = healthScaler.transform(train[healthFeatures])
e = shap.DeepExplainer(model0, background)
shap_values = e.shap_values(healthScaler.transform(test[healthFeatures]))
test2 = test[healthFeatures].copy()
test2[healthFeatures] = healthScaler.transform(test[healthFeatures])
shap.force_plot(e.expected_value[0], shap_values[0][947,:], test2.iloc[947,:])
And the plot is the following:
Here the base value is 0.012 (can also be seen through e.expected_value[0]) and very close to the output value which is 0.01.
At this point I have some questions:
1) The output value is not identical to the prediction obtained through model0.predict(test[healthFeatures])[947] = -0.103. How should I interpret the output value?
2) As can be seen, I am using the whole training set as the background to approximate the conditional expectations of the SHAP values. What is the difference between using random samples from the training set and using the entire set? Is it only a performance issue?
Many thanks in advance!
Probably too late, but this is still a very common question that will benefit other beginners. To answer (1), the expected and output values will be different. The expected value is, as the name suggests, the average over the scores predicted by your model; e.g., if the model outputs probabilities, it is the average of the probabilities your model produces. For (2), as long as the background set has fewer than 5k samples it won't change much, but above 5k your calculations will take days to finish.
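If the background set does grow past that, a common remedy is to subsample it before building the explainer; a sketch, reusing the objects from the question's code:

import numpy as np

# keep a random 1,000-row subset of the scaled training data as background
rng = np.random.default_rng(0)
idx = rng.choice(background.shape[0], size=1000, replace=False)
e = shap.DeepExplainer(model0, background[idx])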
See this (lines 21-25) for more comprehensive answers.
I have a question. I want the system to return -1 as an unknown character for new, untrained letters. For example, if I have trained on 1/2/3/4, then when I test the character '5' or '6', TensorFlow should return -1 as an unknown character.
Is it possible?
Thanks.
I'd think for simple classifications, you're looking for anything that has less than a certain confidence/score of being a known class.
To be fair, I've only used Keras on top of TensorFlow, so YMMV.
I'd just train it on the 4 categories you know; then, when it classifies, if the top result has less than a certain raw score/weight (say it classifies an unknown 7 as a 4, but with a mediocre score), treat it as a -1.
This might not work with every loss/objective function you train your model on, but it should work with MSE or categorical cross-entropy if you can get the raw final score.
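A minimal sketch of that thresholding idea, assuming a trained Keras classifier with a softmax output over the 4 known classes and a hand-picked cutoff:

import numpy as np

UNKNOWN = -1
THRESHOLD = 0.8  # hypothetical confidence cutoff; tune it on held-out data

def classify_or_reject(model, x):
    probs = model.predict(x[np.newaxis, ...])[0]  # softmax scores for the known classes
    best = int(np.argmax(probs))
    return best if probs[best] >= THRESHOLD else UNKNOWN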
I'm playing around with Encog 3.2 for Java. Starting from the example (http://www.heatonresearch.com/wiki/Hello_World), I made my own network with 4 neurons in the input layer and 2 neurons in the output layer.
1.0,1.0, actual=0.22018401281844316,ideal=1.0
-1.0,-1.0, actual=0.9903002141301814,ideal=0.0
Can someone explain how I should read the result (actual vs. ideal, and the numbers before them)?
Thank you very much.
Note that at this stage, the network has been trained, and you are now in the testing stage.
The network has 2 input neurons and 1 output neuron.
The first two numbers in each result line are given to the trained network as the inputs. Using the internal weights and biases (which are not changed during testing) it computes the result/output, listed as actual.
ideal is what the result should be, i.e. the number listed in the dataset for that sample/row.
Generally, when you want a 0 or 1 output (e.g. one of n classes), you round the actual result.
So in this case the network computes
1 XOR 1 = 0.22 (rounded to 0), which is wrong according to ideal, and
-1 XOR -1 = 0.99 (rounded to 1), which is also wrong.
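As a language-agnostic illustration of that interpretation step (a Python sketch, not the Encog API):

actual = [0.22018401281844316, 0.9903002141301814]
ideal = [1.0, 0.0]

for a, i in zip(actual, ideal):
    predicted = round(a)  # threshold the raw network output to 0 or 1
    print(f"actual={a:.2f} -> rounded={predicted}, ideal={i}, {'OK' if predicted == i else 'wrong'}")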