\cite in LaTeX is not working in TeXStudio - PDF

I have the issue that when I want to cite anything in LaTeX, the PDF shows only the name of the source; in this case it is Berk.2019.
My document (which I shortened as an example):
\documentclass[12pt,a4paper]{article}
\usepackage{amsmath,amssymb}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[ngerman]{babel}
\usepackage[backend=biber, style=ieee]{biblatex}
\addbibresource{literatur.bib}
\title{Musterdokument}
\author{Manfred Muster}
\date{\today}
\begin{document}
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. \cite{Berk.2019}
\newpage
\printbibliography
\end{document}
And the .bib file:
% This file was created with Citavi 6.14.0.0
#book{Berk.2019,
abstract = {Intro -- Preface -- Contents -- 1 Getting Started -- 1.1 Some Introductory Caveats -- 1.2 What Criminal Justice Risk Assessment Is and Is Not -- 1.3 Difficulties Defining Machine Learning -- 1.4 Setting the Stage -- 1.4.1 A Brief Motivating Example -- 2 Some Important Background Material -- 2.1 Policy Considerations -- 2.1.1 Criminal Justice Risk Assessment Goals -- 2.1.2 Decisions to Be Informed by the Forecasts -- 2.1.3 Outcomes to Be Forecasted -- 2.1.4 Real World Constraints -- 2.1.5 Stakeholders -- 2.2 Data Considerations -- 2.2.1 Stability Over Time -- 2.2.2 Training Data, Evaluation Data, and Test Data -- 2.2.3 Representative Data -- 2.2.4 Large Samples -- 2.2.5 Data Continuity -- 2.2.6 Missing Data -- 2.3 Statistical Considerations -- 2.3.1 Actuarial Methods -- 2.3.2 Building in the Relative Costs of Forecasting Errors -- 2.3.3 Effective Forecasting Algorithms -- 3 A Conceptual Introduction to Classification and Forecasting -- 3.1 Populations and Samples -- 3.2 Classification and Forecasting Using Decision Boundaries -- 3.3 Classification by Data Partitions -- 3.4 Forecasting by Data Partitions -- 3.5 Finding Good Data Partitions -- 3.6 Enter Asymmetric Costs -- 3.7 Recursive Partitioning Classification Trees -- 3.7.1 How Many Terminal Nodes? -- 3.7.2 Classification Tree Instability and Adaptive Fitting -- 4 A More Formal Treatment of Classification and Forecasting -- 4.1 Introduction -- 4.2 What Is Being Estimated? -- 4.3 Data Generation Formulations -- 4.4 Notation -- 4.5 From Probabilities to Classification -- 4.6 Computing (G|X) in the Real World -- 4.6.1 Estimation Bias -- 4.6.2 The Traditional Bias-Variance Tradeoff with Extensions -- 4.6.3 Addressing Uncertainty -- 4.7 A Bit More on the Joint Probability Model -- 4.8 Putting It All Together -- 5 Tree-Based Forecasting Methods -- 5.1 Introduction.},
author = {Berk, Richard},
year = {2019},
title = {Machine Learning Risk Assessments in Criminal Justice Settings},
url = {https://ebookcentral.proquest.com/lib/kxp/detail.action?docID=5615389},
address = {Cham},
publisher = {Springer},
isbn = {978-3-030-02272-3}
}
Do you know what is wrong?
I compile it with TeXStudio and Biber. Let me know if you need any further information.

Related

Setting up a Keras model to get the correct output tensor shape of an embedded-text LSTM autoencoder

I'm trying to put together an LSTM autoencoder where the layers (and shapes returned) are:
Embedding -> LSTM -> RepeatVector -> LSTM -> TimeDistributed(Dense).
[6, 18, 8] -> [6, 64] -> [6, 18, 64] -> [6, 18, 64] -> [6, 18, 1]
The last layer is where I'm running into problems and cannot seem to find an answer online. I'm looking for something that would return the same dimensions as the Embedding layer (since I'm trying to make an autoencoder) where I can undo the embedding to get the predicted sequence of words back. Maybe I'm misunderstanding something here..?
Below is an example of what I've tried.
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, RepeatVector, TimeDistributed
dummy_docs = [
    "Lorem Ipsum is simply dummy text of the printing and typesetting industry.",
    "Lorem Ipsum has been the industry's standard dummy text ever since the 1500s.",
    "When an unknown printer took a galley of type and scrambled it to make a type specimen book.",
    "It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.",
    "It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages.",
    "More recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."
]
tokenizer = Tokenizer(
    num_words=None,
    filters='!"#$%&()*+,-./:;<=>?#[\\]^_`{|}~\t\n',
    lower=True,
    split=" ",
    oov_token="<OOV>"
)
tokenizer.fit_on_texts(dummy_docs)
sequences = pad_sequences(
    tokenizer.texts_to_sequences(dummy_docs),
    padding='post'
)
model = Sequential(
    [
        Embedding(
            input_dim=len(tokenizer.word_index) + 1,
            output_dim=8,
            input_shape=(sequences.shape[1],)
        ),
        LSTM(8, activation='tanh'),  # return_sequences..?
        RepeatVector(sequences.shape[1]),
        LSTM(8, activation='tanh', return_sequences=True),  # return_sequences..?
        TimeDistributed(Dense(1))
    ]
)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
model.fit(sequences, sequences, epochs=30, batch_size=1)
y = model.predict(sequences)
When I run the fitted model on the sequences to obtain a prediction, I'm expecting a 6x18 matrix, but I get a single vector of length 18. I'm thinking Dense(1) needs to be something else, but anything other than 1 causes mismatches in the tensor shapes and then the whole model fails. Should I rather be trying a one-hot-encoded input here? Any help will be appreciated.
The solution was to remove the TimeDistributed layer from the model and use the embedding dimension in the final Dense() layer.
Also, I moved the Embedding layer outside of Sequential so that the same embedding can be used to produce the y target for model.fit(X, y, ...).
What remains now is to work the predicted embeddings back to the original word sequences (see the nearest-neighbour sketch after the code below), but that is outside the scope of the original question.
Note, this is not intended for use on sentences, but to (hopefully) clean text data (names) where different users entered the same site name in different ways, so regex is not an option because there is no fixed pattern to follow. Some work is still needed to get to the final solution.
Below is the updated section of the code.
sequences = pad_sequences(
    tokenizer.texts_to_sequences(dummy_docs),
    padding='post'
)
embedding_dim = 8
embedding_layer = Embedding(
    input_dim=len(tokenizer.word_index) + 1,
    output_dim=embedding_dim,
    input_shape=(sequences.shape[1],)
)
model = Sequential(
    [
        embedding_layer,
        LSTM(6, activation='tanh'),
        RepeatVector(sequences.shape[1]),
        LSTM(6, activation='tanh', return_sequences=True),
        Dense(embedding_dim)
    ]
)
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
model.summary()
y = embedding_layer(sequences)
model.fit(sequences, y, epochs=10, batch_size=1)
y_hat = model.predict(sequences)
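As a rough follow-up to the "work the predicted embeddings back to the original word sequences" step mentioned above, here is a minimal sketch (not part of the original answer) that maps each predicted vector to its nearest neighbour in the trained embedding matrix; embedding_layer, tokenizer and y_hat refer to the code above, and the choice of Euclidean distance is an assumption.
import numpy as np

# Trained embedding matrix, shape (vocab_size, embedding_dim)
embedding_matrix = embedding_layer.get_weights()[0]

def embeddings_to_tokens(pred):
    # pred: (timesteps, embedding_dim) -> token ids of the closest embeddings
    dists = np.linalg.norm(embedding_matrix[None, :, :] - pred[:, None, :], axis=-1)
    return dists.argmin(axis=1)

index_to_word = {i: w for w, i in tokenizer.word_index.items()}
for seq in y_hat:  # y_hat: (num_docs, timesteps, embedding_dim)
    ids = embeddings_to_tokens(seq)
    print([index_to_word.get(i, "<PAD>") for i in ids])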

Matterport Mask RCNN TRAIN_ROI_PER_IMAGE explanation?

I have been trying to train a breast cancer segmentation model with Mask R-CNN. I have been able to understand almost all the hyperparameters, but this one variable, TRAIN_ROI_PER_IMAGE, I just can't seem to wrap my head around, and there's little to no documentation available for it.
If anyone could please explain it to me, it would be super helpful for my research.
TRAIN_ROI_PER_IMAGE means how many Region of Interest (ROI) proposals will be fed to the mask head or the classifier.
(Figure omitted; image source: Ren et al., 2016.)
Concretely, this setting is like the batch size for the second stage of the model.
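For context, here is a minimal sketch of how such a setting is usually overridden in a Matterport-style config subclass. The attribute is spelled TRAIN_ROIS_PER_IMAGE in the Matterport code I'm aware of, and the class name and values below are made up, so double-check against your mrcnn version:
from mrcnn.config import Config

class BreastCancerConfig(Config):
    NAME = "breast_cancer"   # hypothetical experiment name
    NUM_CLASSES = 1 + 1      # background + lesion
    # Number of ROI proposals sampled per image and fed to the
    # classifier/mask heads during training (the "second-stage batch size").
    TRAIN_ROIS_PER_IMAGE = 128

config = BreastCancerConfig()
config.display()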

float16 vs float32 for convolutional neural networks

The standard is float32, but I'm wondering under what conditions it's OK to use float16.
I've compared running the same convnet with both datatypes and haven't noticed any issues. With large datasets I prefer float16 because I can worry less about memory issues.
Surprisingly, it's totally OK to use 16 bits, not just for fun but in production as well. For example, in this video Jeff Dean talks about 16-bit calculations at Google, around 52:00. A quote from the slides:
Neural net training very tolerant of reduced precision
Since GPU memory is the main bottleneck in ML computation, there has been a lot of research on precision reduction. E.g.
The Gupta et al. paper "Deep Learning with Limited Numerical Precision" about fixed-point (not floating-point) 16-bit training, but with stochastic rounding.
Courbariaux et al., "Training Deep Neural Networks with Low Precision Multiplications", about 10-bit activations and 12-bit parameter updates.
And this is not the limit. Courbariaux et al., "BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1". Here they discuss 1-bit activations and weights (though higher precision for the gradients), which makes the forward pass super fast.
Of course, I can imagine some networks may require high precision for training, but I would recommend at least trying 16 bits for training a big network and switching to 32 bits if it proves to work worse.
float16 training is tricky: your model might not converge when using standard float16, but float16 does save memory, and is also faster if you are using the latest Volta GPUs. Nvidia recommends "Mixed Precision Training" in the latest doc and paper.
To make good use of float16, you need to manually and carefully choose the loss_scale. If loss_scale is too large, you may get NaNs and Infs; if loss_scale is too small, the model might not converge. Unfortunately, there is no common loss_scale for all models, so you have to choose it carefully for your specific model.
If you just want to reduce the memory usage, you could also try tf.to_bfloat16, which might converge better.
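To illustrate the loss scaling described above, here is a minimal sketch using the Keras mixed-precision API (this assumes TF 2.4+, which postdates the original answer, so treat the exact calls as an assumption):
import tensorflow as tf
from tensorflow.keras import mixed_precision

# Compute in float16 while keeping variables in float32
mixed_precision.set_global_policy('mixed_float16')

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(32,)),
    # Keep the output layer in float32 so the softmax stays numerically stable
    tf.keras.layers.Dense(10, activation='softmax', dtype='float32'),
])

opt = tf.keras.optimizers.Adam()
# Fixed loss scale chosen by hand, as described above; dynamic=True would
# instead let TF search for a working scale automatically.
opt = mixed_precision.LossScaleOptimizer(opt, dynamic=False, initial_scale=2**12)

model.compile(loss='sparse_categorical_crossentropy', optimizer=opt)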
According to this study:
Gupta, S., Agrawal, A., Gopalakrishnan, K., & Narayanan, P. (2015, June). Deep learning with limited numerical precision. In International Conference on Machine Learning (pp. 1737-1746). https://arxiv.org/pdf/1502.02551.pdf
stochastic rounding was required to obtain convergence when using half-precision floating point (float16); however, when that rounding technique was used, they claimed to get very good results.
Here's a relevant quotation from that paper:
"A recent work (Chen et al., 2014) presents a hardware accelerator
for deep neural network training that employs
fixed-point computation units, but finds it necessary to
use 32-bit fixed-point representation to achieve convergence
while training a convolutional neural network on
the MNIST dataset. In contrast, our results show that
it is possible to train these networks using only 16-bit
fixed-point numbers, so long as stochastic rounding is used
during fixed-point computations."
For reference, here's the citation for Chen et al., 2014:
Chen, Y., Luo, T., Liu, S., Zhang, S., He, L., Wang, J., ... & Temam, O. (2014, December). DaDianNao: A machine-learning supercomputer. In Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture (pp. 609-622). IEEE Computer Society. http://ieeexplore.ieee.org/document/7011421/?part=1
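For intuition, the stochastic rounding relied on in these papers can be sketched in a few lines of NumPy (a toy illustration of the idea, not the fixed-point hardware implementation used in the study):
import numpy as np

def stochastic_round(x, step=2**-8):
    # Round to a grid of spacing `step`, picking the upper neighbour with
    # probability equal to the fractional distance from the lower one.
    # In expectation this is unbiased, which is what helps low-precision
    # training converge.
    lower = np.floor(x / step) * step
    frac = (x - lower) / step            # in [0, 1)
    up = np.random.random_sample(np.shape(x)) < frac
    return lower + up * step

print(stochastic_round(np.array([0.01234, -0.00071, 0.5])))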

Attention in Tensorflow (tf.contrib.rnn.AttentionCellWrapper)

How exactly is tf.contrib.rnn.AttentionCellWrapper used? Can someone give a piece of example code?
Specifically, I only managed to come up with the following:
fwd_cell = tf.contrib.rnn.AttentionCellWrapper(tf.contrib.rnn.BasicLSTMCell(hidden_size), 10, 50, state_is_tuple=True)
bwd_cell = tf.contrib.rnn.AttentionCellWrapper(tf.contrib.rnn.BasicLSTMCell(hidden_size), 10, 50, state_is_tuple=True)
but in Bahdanau et al. 2015, the attention operates on the entire bidirectional RNN. I have no idea how to code that in TensorFlow.

How to implement sentence-level log-likelihood in tensorflow?

I want to implement the sentence-level log-likelihood as described in
Collobert et al., p. 14.
To compute transition scores, I could use a CRF, but I don't know how to integrate it into TensorFlow. I thought about using tf.contrib.crf.CrfForwardRnnCell to compute transition scores, but this class returns a pair of [batch_size, num_tags] matrices containing the new alpha values, and not, as I would expect, one [batch_size, num_tags, num_tags] tensor.
Does anyone have an example of how to use CRF in TensorFlow? Thank you!
A good example of using contrib.crf in TensorFlow is given here: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/crf
It's worth noting that the SLL objective described in Collobert et al. 2011 is slightly different from the CRF objective, in that SLL lacks normalization (see Remark 4 on p. 16), but this shouldn't really matter in practice (I'd just use the CRF).
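For completeness, here is a minimal sketch of the usual tf.contrib.crf training pattern (TF 1.x; the shapes and variable names are illustrative, not taken from the question):
import tensorflow as tf

num_tags, max_len = 5, 20

# Per-token unary scores from your network: [batch_size, max_len, num_tags]
unary_scores = tf.placeholder(tf.float32, [None, max_len, num_tags])
tags = tf.placeholder(tf.int32, [None, max_len])
seq_lens = tf.placeholder(tf.int32, [None])

# crf_log_likelihood learns the [num_tags, num_tags] transition matrix for you
log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(
    unary_scores, tags, seq_lens)
loss = tf.reduce_mean(-log_likelihood)
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

# At test time, decode each sequence with Viterbi (in NumPy):
# viterbi_seq, _ = tf.contrib.crf.viterbi_decode(scores_for_one_sequence,
#                                                learned_transition_params)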