Parameterize the number of iterations in the Gremlin pageRank step - datastax

As far as I understand, the Gremlin pageRank step (g.V().pageRank()) runs a PageRank vertex program with 30 iterations.
Is it possible to change the maximum number of iterations?

This should work:
g.V().pageRank().times(5)
There's an example of this in the Apache TinkerPop docs:
gremlin> g.V().pageRank().by('pageRank').times(5).order().by('pageRank').valueMap()
==>[pageRank:[0.15000000000000002],name:[marko],age:[29]]
==>[pageRank:[0.15000000000000002],name:[peter],age:[35]]
==>[pageRank:[0.19250000000000003],name:[vadas],age:[27]]
==>[pageRank:[0.19250000000000003],name:[josh],age:[32]]
==>[pageRank:[0.23181250000000003],name:[ripple],lang:[java]]
==>[pageRank:[0.4018125],name:[lop],lang:[java]]


Optimize batch transform inference on sagemaker

With the current batch transform inference setup I see a lot of bottlenecks:
Each input file can only have close to 1000 records.
Currently it is processing 2000 records/min on 1 instance of ml.g4dn.12xlarge.
GPU instances are not necessarily giving any advantage over CPU instances.
I wonder if this is a limitation of the currently available TensorFlow Serving container v2.8. If that's the case, which config should I play with to increase the performance?
I tried changing max_concurrent_transforms but it doesn't seem to really help.
My current config:
transformer = tensorflow_serving_model.transformer(
    instance_count=1,
    instance_type="ml.g4dn.12xlarge",
    max_concurrent_transforms=0,
    output_path=output_data_path,
)
transformer.transform(
    data=input_data_path,
    split_type='Line',
    content_type="text/csv",
    job_name=job_name + datetime.now().strftime("%m-%d-%Y-%H-%M-%S"),
)
Generally speaking, you should first have a well-performing model (steps 1+2 below) yielding a satisfactory TPS before you move on to batch transform parallelization techniques to push your overall TPS higher with parallelization knobs (a configuration sketch follows the steps below).
Steps:
GPU enabling - Run a manual test to see that your model can utilize GPU instances to begin with (this isn't related to batch transform).
Picking an instance - Use SageMaker Inference Recommender to find the most cost-effective instance type to run inference on.
Batch transform inputs - It sounds like you have multiple input files, which is needed if you want to speed up the job by adding more instances.
Batch transform job single-instance knobs - If you are using the CreateTransformJob API, you can reduce the time it takes to complete batch transform jobs by using optimal values for parameters such as MaxPayloadInMB, MaxConcurrentTransforms, or BatchStrategy. The ideal value for MaxConcurrentTransforms is equal to the number of compute workers in the batch transform job. If you are using the SageMaker console, you can specify these optimal parameter values in the Additional configuration section of the Batch transform job configuration page. SageMaker automatically finds the optimal parameter settings for built-in algorithms. For custom algorithms, provide these values through an execution-parameters endpoint.
Batch transform cluster size - Increase the instance_count to more than 1, using the cost-effective instance you found in (1)+(2).
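As an illustration, here is a hedged sketch of what an adjusted configuration might look like once the steps above are done. The instance type, counts, and knob values are assumptions to be replaced by whatever steps 1+2 and your own measurements suggest; tensorflow_serving_model, input_data_path, and output_data_path are the names from the question.

# Sketch only: illustrative values, not tuned recommendations.
transformer = tensorflow_serving_model.transformer(
    instance_count=2,                  # step 5: scale out once single-instance TPS is acceptable
    instance_type="ml.g4dn.xlarge",    # assumption: use the type Inference Recommender picks in step 2
    max_concurrent_transforms=8,       # step 4: roughly the number of compute workers per instance
    max_payload=6,                     # MaxPayloadInMB: fewer, larger requests per worker
    strategy="MultiRecord",            # BatchStrategy: send many CSV lines per request
    output_path=output_data_path,
)
transformer.transform(
    data=input_data_path,              # an S3 prefix with many files lets SageMaker shard work across instances
    split_type="Line",
    content_type="text/csv",
)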

Train / Test split % for Object Detection - what's the current recommendation?

Using the Tensorflow Object Detection API, what's the current recommendation / best practice around the train / test split percentage for labeled examples? I've seen a lot of conflicting info, anywhere from 70/30 to 95/5. Any recent real world experience is appreciated.
Traditional advice is ~70-75% for training and the rest for test data. More recent articles indeed suggest a different split; I read 95/2.5/2.5 (train / test / dev, with the dev set used for hyperparameter tuning) a lot these days.
I guess your optimal split depends on the amount of available data and the bias/variance characteristics. Poor performance on training data may be caused by underfitting and call for more training data. If your model is fitting well or even overfitting, you should be able to shift some of the training data over to test data.
If you're stuck in the middle, you may also consider cross-validation as a computationally expensive but data-friendly option.
It depends on the size of the dataset, as Andrew Ng suggests (ratios are train / dev (or val) / test):
If the size of the dataset is 100 to 10K: ~60/20/20
If the size of the dataset is 1M to infinity: 98/1/1 or 99.5/0.25/0.25
Note that these are not fixed, just suggestions.
The goal of the test set mentioned here is to give you an unbiased performance measurement of your work. In some work it is OK to have only two sets (then it is called train/test, though the test set here actually works as a dev set, and the ratio can be 70/30).
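As a concrete illustration of one of the splits mentioned above, here is a minimal sketch using scikit-learn's train_test_split on a hypothetical list of labeled example IDs; the 95/2.5/2.5 ratio is just one of the suggestions, not a fixed rule.

from sklearn.model_selection import train_test_split

# Hypothetical labeled example IDs; replace with your own annotation records.
examples = [f"image_{i:05d}" for i in range(10000)]

# First carve off 5% as a holdout, then split the holdout evenly into dev and test.
train, holdout = train_test_split(examples, test_size=0.05, random_state=42)
dev, test = train_test_split(holdout, test_size=0.5, random_state=42)

print(len(train), len(dev), len(test))  # 9500 250 250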

Why does the cifar10 benchmark show slow performance on every n x 100 step?

I have tried to get a performance comparison between source-built and Google-provided .whl files for tensorflow-gpu runs. I have run dozens of benchmark tests, and I always get slow performance on every n x 100 step, i.e. 0, 100, 200, .... I cannot figure out the reason. Can one of you TensorFlow experts answer this for me?
I am running Ubuntu (18.04), Fedora (27, 28), and Windows, with CUDA 9.0/9.1/9.2.
I've tested with TF 1.6, 1.7, 1.8, and 1.9.
My GPU is a 1080 Ti with 11 GB.
My CPU is an Intel 4690K with 32 GB of DRAM.
Attached is one sample.
Thank you very much in advance.
Dae-Chul Jo
dcjo00#gmail.com
It could be for a few different reasons:
Every 100 steps you are saving the model
Every 100 steps you are testing validation data
Every 100 steps you are saving logs to tensorboard
These are my first guesses, in order of probability; if you provide code I could study it more deeply.
Hope it helps! :)
EDIT: It ended up being that tf.train.MonitoredTrainingSession saves summaries every 100 steps by default, which was guess 3 above.
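For reference, a minimal sketch of how the summary-saving interval can be changed, assuming a TF 1.x training loop; the checkpoint directory and train_op are placeholders standing in for the user's own training code.

import tensorflow as tf  # TF 1.x API, matching the versions in the question

# Raising save_summaries_steps (default 100) spreads out the summary writes
# that caused the periodic slowdown; checkpointing is controlled separately.
with tf.train.MonitoredTrainingSession(
        checkpoint_dir="/tmp/cifar10_train",  # assumption: any writable directory
        save_summaries_steps=1000,            # write summaries every 1000 steps instead of 100
        save_checkpoint_secs=600) as sess:
    while not sess.should_stop():
        sess.run(train_op)                    # train_op comes from the model definition (not shown)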

In a dynamic_rnn do I need to bucket samples in order to balance batches?

I am implementing a bidirectional dynamic RNN. Now I face the question of whether I need to bucket my training samples.
My thought (and fear) is that if I don't bucket, I might face the following situation: in a batch with 32 samples, with all but one sample being below 500 characters long and one sample being, say, 10,000 characters long, the backprop will behave basically as if I had a batch size of 1, and might quickly result in NaNs or throw off the learned weights pretty badly every time that situation occurs.
Any experience with this before I write the code and spend days on training and debugging? Thanks.
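For context, a minimal sketch of what length bucketing could look like with the tf.data API, so batches only mix samples of similar length; the dataset name, bucket boundaries, and batch sizes below are assumptions, not a recommendation.

import tensorflow as tf  # tf.data API as available in late TF 1.x

# Assumes `sequences` is a tf.data.Dataset of (char_ids, label) pairs.
def element_length(char_ids, label):
    return tf.shape(char_ids)[0]

bucketing = tf.data.experimental.bucket_by_sequence_length(
    element_length_func=element_length,
    bucket_boundaries=[100, 500, 2000],   # samples are grouped by length range
    bucket_batch_sizes=[32, 32, 16, 4],   # one entry per bucket (len(boundaries) + 1)
    padded_shapes=([None], []))           # pad char_ids to the longest sample in the batch

batched = sequences.apply(bucketing)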

Word2Vec: Any way to train the model faster?

I use Gensim Word2Vec to train on word sets in my database.
I have about 400,000 phrases (each phrase is short; 700 MB total) in my PostgreSQL database.
This is how I train on these data using the Django ORM:
post_vector_list = []
for post in Post.objects.all():
    post_vector = my_tokenizer(post.category.name)
    post_vector.extend(my_tokenizer(post.title))
    post_vector.extend(my_tokenizer(post.contents))
    post_vector_list.append(post_vector)
word2vec_model = gensim.models.Word2Vec(post_vector_list, window=10, min_count=2, size=300)
But this job takes a lot of time and does not feel efficient.
In particular, creating post_vector_list took a lot of time and space.
I want to improve the speed of training but have no idea how to do it.
I'd like to get your advice. Thanks.
To optimize such code, you need to collect good information about where the time is spent.
Is most of the time spent preparing post_vector_list?
If so, you will want to make sure my_tokenizer (whose code is not shown) is as efficient as possible. You may want to try to minimize the number of extend()s and append()s that are done on large lists. You might have to even take a look at your DB's configuration or options to speed up the DB-to-Object mapping started inside Post.objects.all().
Is most of the time spent in the call to Word2Vec()?
If so, other steps may help (a combined sketch follows this list):
ensure you're using gensim's Cython-optimized routines – if not, you should be seeing a logged warning (and training will be up to 100X slower)
consider using a workers=4 or workers=8 optional argument to use more threads, if your machine has at least 4 or 8 CPU cores
consider using a larger min_count, which speeds training somewhat (and since vectors for words where there are only a few examples typically aren't very good anyway, doesn't lose much and can even improve the quality of the surviving words)
consider using a smaller window, since training takes longer for larger windows
consider using a smaller vector_size (previously called size), since training takes longer for larger-size vectors
consider using a more-aggressive (smaller) value for the optional sample argument, which randomly skips more of the most-frequent words. The default is 1e-03, but values of 1e-05 or 1e-06 (especially on larger corpuses) can offer additional speedup, and even often improve the final vectors (by spending relatively less training time on words with an excess of usage examples)
consider using a lower-than-default (5) value for the optional epochs parameter (previously called iter). (I wouldn't recommend this unless the corpus is very large – so it already has many redundant, equally-good examples of the same words throughout.)
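Putting several of the suggestions above together, a hedged sketch of a faster-training configuration; the values are illustrative, not tuned, and post_vector_list is the list from the question.

import gensim

# Illustrative values only; check gensim's logs to see where time actually goes.
word2vec_model = gensim.models.Word2Vec(
    post_vector_list,
    vector_size=200,  # smaller vectors train faster (older gensim calls this `size`)
    window=5,         # smaller window than the original 10
    min_count=5,      # larger min_count drops rare words
    sample=1e-05,     # more aggressive downsampling of very frequent words
    workers=8,        # use more threads if the machine has the cores
    epochs=5,         # the default; `iter` in older gensim versions
)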
You could use a streaming iterable (built around a generator function) instead of loading all the data into a list; Gensim works with such iterables too. The code will look something like this:
class Post_Vectors(object):
    def __init__(self, Post):
        self.Post = Post

    def __iter__(self):
        for post in self.Post.objects.all():
            post_vector = my_tokenizer(post.category.name)
            post_vector.extend(my_tokenizer(post.title))
            post_vector.extend(my_tokenizer(post.contents))
            yield post_vector

post_vectors = Post_Vectors(Post)
word2vec_model = gensim.models.Word2Vec(post_vectors, window=10, min_count=2, size=300, workers=??)
For the gensim speedup, if you have a multi-core CPU, you could use the workers parameter. (By default it is 3)