Is there any comprehensive test framework for testing the performance of a Kafka 2.12 cluster in which the test is run against a predefined dataset?
I looked at JMeter and Pepper-Box, but they don't seem to fit my requirement.
Thanks in advance.
Actually, both of those tools are used for performance testing of Kafka, but Kafka itself also ships with executable scripts to test producer and consumer performance:
kafka-producer-perf-test.sh
kafka-consumer-perf-test.sh
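If the test has to run against a predefined dataset rather than synthetic records, recent versions of the producer perf tool can read payloads from a file (check your version's --help; I'm assuming a 2.x build here). A minimal sketch, with my_dataset.txt standing in for your dataset:
# --payload-file replaces --record-size: records are drawn from the file,
# split on --payload-delimiter (newline by default)
bin/kafka-producer-perf-test.sh \
--topic test \
--num-records 1000000 \
--throughput -1 \
--payload-file my_dataset.txt \
--producer-props bootstrap.servers=localhost:9092 acks=1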
Setup
bin/kafka-topics.sh \
--zookeeper localhost:2181 \
--create \
--topic test-rep-one \
--partitions 6 \
--replication-factor 1
bin/kafka-topics.sh \
--zookeeper localhost:2181 \
--create \
--topic test \
--partitions 6 --replication-factor 3
Producer
A single thread, no replication
bin/kafka-producer-perf-test.sh \
--topic test \
--num-records 50000000 \
--record-size 100 \
--throughput -1 \
--producer-props acks=1 \
bootstrap.servers=localhost:9092 \
buffer.memory=67108864 \
batch.size=8196
Single-thread, async 3x replication
bin/kafka-topics.sh \
--zookeeper zookeeper.example.com:2181 \
--create \
--topic test \
--partitions 6 \
--replication-factor 3
bin/kafka-producer-perf-test.sh \
--topic test \
--num-records 50000000 \
--record-size 100 \
--throughput -1 \
--producer-props acks=1 \
bootstrap.servers=localhost:9092 \
buffer.memory=67108864 \
batch.size=8196
Single-thread, sync 3x replication
bin/kafka-producer-perf-test.sh \
--topic test \
--num-records 50000000 \
--record-size 100 \
--throughput -1 \
--producer-props acks=all \
bootstrap.servers=localhost:9092 \
buffer.memory=67108864 \
batch.size=64000
Three Producers, 3x async replication
bin/kafka-producer-perf-test.sh \
--topic test \
--num-records 50000000 \
--record-size 100 \
--throughput -1 \
--producer-props acks=1 \
bootstrap.servers=localhost:9092 \
buffer.memory=67108864 \
batch.size=8196
Throughput Versus Stored Data
bin/kafka-producer-perf-test.sh \
--topic test \
--num-records 50000000 \
--record-size 100 \
--throughput -1 \
--producer-props acks=1 \
bootstrap.servers=localhost:9092 \
buffer.memory=67108864 \
batch.size=8196
Effect of message size
for i in 10 100 1000 10000 100000; do
echo ""
echo $i
bin/kafka-producer-perf-test.sh \
--topic test \
--num-records $((1000*1024*1024/$i)) \
--record-size $i \
--throughput -1 \
--producer-props acks=1 \
bootstrap.servers=localhost:9092 \
buffer.memory=67108864 \
batch.size=128000
done
Consumer
Consumer throughput
bin/kafka-consumer-perf-test.sh \
--zookeeper localhost:2181 \
--messages 50000000 \
--topic test \
--threads 1
Three consumers. On three servers, run:
bin/kafka-consumer-perf-test.sh \
--zookeeper localhost:2181 \
--messages 50000000 \
--topic test \
--threads 1
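Note that --zookeeper is only accepted by older versions of the consumer perf tool; newer releases read from the brokers directly. A rough equivalent, assuming a version that still takes --broker-list (the most recent ones use --bootstrap-server instead):
bin/kafka-consumer-perf-test.sh \
--broker-list localhost:9092 \
--messages 50000000 \
--topic test \
--threads 1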
End-to-end Latency
bin/kafka-run-class.sh \
kafka.tools.TestEndToEndLatency \
localhost:9092 \
localhost:2181 \
test 5000
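On newer releases the class is kafka.tools.EndToEndLatency and it no longer takes a ZooKeeper address; if I recall the argument order correctly it is broker list, topic, message count, producer acks and message size, but verify against the usage message of your version:
# broker_list topic num_messages producer_acks message_size_bytes
bin/kafka-run-class.sh kafka.tools.EndToEndLatency \
localhost:9092 test 5000 1 100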
Producer and consumer (run the producer test and the consumer test below at the same time; kafka-producer-perf-test.sh is just a wrapper around org.apache.kafka.tools.ProducerPerformance)
bin/kafka-producer-perf-test.sh \
--topic test \
--num-records 50000000 \
--record-size 100 \
--throughput -1 \
--producer-props acks=1 \
bootstrap.servers=localhost:9092 \
buffer.memory=67108864 \
batch.size=8196
bin/kafka-consumer-perf-test.sh \
--zookeeper localhost:2181 \
--messages 50000000 \
--topic test \
--threads 1
Related
I need to understand how to upload a file in chunks to Azure File Storage.
File size: 580 bytes.
Creation:
fileName="bigfile.txt"
file_length=$(wc -m < $fileName)
request_date=$(TZ=GMT date "+%a, %d %h %Y %H:%M:%S %Z")
# Create
curl "https://xxxx.file.core.windows.net/test-fs/$fileName?sv=2021-06-08&ss=f&srt=sco&sp=rwdlc&se=2022-11-23T20:35:52Z&st=2022-11-23T12:35:52Z&spr=https&sig=xxxxxxxxxxxxxx" \
-X 'PUT' \
-H 'Content-Length: 0' \
-H "x-ms-date: $request_date" \
-H "x-ms-content-length: $file_length" \
-H 'x-ms-type: file' \
-H 'x-ms-version: 2021-06-08'
First chunk: OK, the first 512 bytes are added.
curl "https://xxxx.file.core.windows.net/test-fs/$fileName?sv=2021-06-08&ss=f&srt=sco&sp=rwdlc&se=2022-11-23T20:35:52Z&st=2022-11-23T12:35:52Z&spr=https&sig=xxxxxxxxxxxxxx&comp=range" \
-X 'PUT' \
-H "x-ms-range: bytes=0-511" \
-H 'x-ms-version: 2021-06-08' \
-H 'x-ms-write: update' \
-H "Content-Length: 512" \
-T "$fileName"
Second chunk: KO. Instead of the remaining bytes, it adds the first 68 bytes of the file again.
curl "https://xxx.file.core.windows.net/test-fs/$fileName?sv=2021-06-08&ss=f&srt=sco&sp=rwdlc&se=2022-11-23T20:35:52Z&st=2022-11-23T12:35:52Z&spr=https&sig=xxxxxxxxxxxxxx&comp=range" \
-X 'PUT' \
-H "x-ms-range: bytes=512-579" \
-H 'x-ms-version: 2021-06-08' \
-H 'x-ms-write: update' \
-H "Content-Length: 68" \
-T "$fileName"
I downloaded the file first using:
!curl -L -O https://github.com/huggingface/transformers/blob/master/examples/legacy/question-answering/run_squad.py
Then I used the following code:
!python run_squad.py \
--model_type bert \
--model_name_or_path bert-base-uncased \
--output_dir models/bert/ \
--data_dir data/squad \
--overwrite_output_dir \
--overwrite_cache \
--do_train \
--train_file /content/train.json \
--version_2_with_negative \
--do_lower_case \
--do_eval \
--predict_file /content/val.json \
--per_gpu_train_batch_size 2 \
--learning_rate 3e-5 \
--num_train_epochs 2.0 \
--max_seq_length 384 \
--doc_stride 128 \
--threads 10 \
--save_steps 5000
Also tried following:
!python run_squad.py \
--model_type bert \
--model_name_or_path bert-base-cased \
--do_train \
--do_eval \
--do_lower_case \
--train_file /content/train.json \
--predict_file /content/val.json \
--per_gpu_train_batch_size 12 \
--learning_rate 3e-5 \
--num_train_epochs 2.0 \
--max_seq_length 584 \
--doc_stride 128 \
--output_dir /content/
Both commands fail with the same error:
File "run_squad.py", line 7
^ SyntaxError: invalid syntax
What exactly is the issue? How can I run the .py file?
SOLVED: It was giving an error because I was downloading the GitHub page rather than the script itself. Once I used the 'Raw' link to download the script, the code ran.
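For reference, the working download is the same path served from the raw host instead of the HTML page:
!curl -L -O https://raw.githubusercontent.com/huggingface/transformers/master/examples/legacy/question-answering/run_squad.py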
I have a script for building a project that I need to upgrade from using configure to cmake. The original configure command is
CFLAGS="$SLKCFLAGS" \
CXXFLAGS="$SLKCFLAGS" \
./configure \
--with-clang \
--prefix=$PREFIX \
--libdir=$PREFIX/lib${LIBDIRSUFFIX} \
--incdir=$PREFIX/include \
--mandir=$PREFIX/man/man1 \
--etcdir=$PREFIX/etc/root \
--docdir=/usr/doc/$PRGNAM-$VERSION \
--enable-roofit \
--enable-unuran \
--disable-builtin-freetype \
--disable-builtin-ftgl \
--disable-builtin-glew \
--disable-builtin-pcre \
--disable-builtin-zlib \
--disable-builtin-lzma \
$GSL_FLAGS \
$FFTW_FLAGS \
$QT_FLAGS \
--enable-shared \
--build=$ARCH-slackware-linux
I am not familiar enough with cmake to know how to do the equivalent. I would prefer a command line option but am open to modifying the CMakeLists.txt file as well.
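Those flags look like ROOT's configure script. Assuming that is the case, a rough CMake sketch would be along these lines; the option names are taken from ROOT 6's CMake build and should be verified against cmake -LH for your version:
mkdir build && cd build
CFLAGS="$SLKCFLAGS" CXXFLAGS="$SLKCFLAGS" cmake .. \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_INSTALL_PREFIX=$PREFIX \
-Dgnuinstall=ON \
-DCMAKE_INSTALL_LIBDIR=lib${LIBDIRSUFFIX} \
-DCMAKE_INSTALL_MANDIR=man/man1 \
-DCMAKE_INSTALL_SYSCONFDIR=etc/root \
-DCMAKE_INSTALL_DOCDIR=/usr/doc/$PRGNAM-$VERSION \
-Droofit=ON \
-Dunuran=ON \
-Dbuiltin_freetype=OFF \
-Dbuiltin_ftgl=OFF \
-Dbuiltin_glew=OFF \
-Dbuiltin_pcre=OFF \
-Dbuiltin_zlib=OFF \
-Dbuiltin_lzma=OFF
cmake --build . -- -j$(nproc)
cmake --build . --target install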
I get this error when I try to submit my training job.
ERROR: (gcloud.ml-engine.jobs.submit.training) Could not copy [dist/object_detection-0.1.tar.gz] to [packages/10a409168355064d603079b7c34cdd7010a13b181a8f7776751e9110d66a5bdf/object_detection-0.1.tar.gz]. Please retry: HTTPError 404: Not Found
I'm running the following code:
gcloud ml-engine jobs submit training ${train1} \
--job-dir=gs://${object-detection-tutorial-bucket1/}/train \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
--module-name object_detection.train1 \
--region us-central1 \
--config object_detection/samples/cloud/cloud.yml \
--runtime-version=1.4 \
-- \
--train_dir=gs://${object-detection-tutorial-bucket1/}/train \
--pipeline_config_path=gs://${object-detection-tutorial-
bucket1/}/data/ssd_mobilenet_v1_coco.config
It looks like the syntax you're using is incorrect.
If the name of your bucket is object-detection-tutorial-bucket1, then you specify that with:
--job-dir=gs://object-detection-tutorial-bucket1/train
or you can run:
export YOUR_GCS_BUCKET="gs://object-detection-tutorial-bucket1"
and then specify the bucket as:
--job-dir=${YOUR_GCS_BUCKET}/train
The ${} syntax is used for accessing the value of a variable, but object-detection-tutorial-bucket1/ isn't a valid variable name, so the expansion does not produce your bucket name.
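You can see what the shell actually does with that expansion (bash treats ${name-word} as "the value of name, or word if name is unset"):
$ unset object
$ echo "gs://${object-detection-tutorial-bucket1/}/train"
gs://detection-tutorial-bucket1//train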
Sources:
https://cloud.google.com/blog/big-data/2017/06/training-an-object-detector-using-cloud-machine-learning-engine
Difference between ${} and $() in Bash
Just remove the ${ } in the script. Assuming your bucket name is object-detection-tutorial-bucket1, run the script below:
gcloud ml-engine jobs submit training \
--job-dir=gs://object-detection-tutorial-bucket1/train \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
--module-name object_detection.train1 \
--region us-central1 \
--config object_detection/samples/cloud/cloud.yml \
--runtime-version=1.4 \
-- \
--train_dir=gs://object-detection-tutorial-bucket1/train \
--pipeline_config_path=gs://object-detection-tutorial-bucket1/data/ssd_mobilenet_v1_coco.config
A crude fix, but it worked for me: just drop the $variable format completely.
Here is an example:
!gcloud ai-platform jobs submit training anurag_card_fraud \
--scale-tier basic \
--job-dir gs://anurag/credit_card_fraud/models/JOB_20210401_194058 \
--master-image-uri gcr.io/anurag/xgboost_fraud_trainer:latest \
--config trainer/hptuning_config.yaml \
--region us-central1 \
-- \
--training_dataset_path=$TRAINING_DATASET_PATH \
--validation_dataset_path=$EVAL_DATASET_PATH \
--hptune
I am using TF-slim inception-v4 to train a model from scratch.
python train_image_classifier.py \
--train_dir=${TRAIN_DIR} \
--dataset_name=mydata \
--dataset_split_name=train \
--dataset_dir=${DATASET_DIR} \
--model_name=inception_v4 \
--clone_on_cpu=true \
--max_number_of_steps=1000 \
--log_every_n_steps=100
# Run evaluation.
python eval_image_classifier.py \
--checkpoint_path=${TRAIN_DIR} \
--eval_dir=${TRAIN_DIR} \
--dataset_name=mydata \
--dataset_split_name=validation \
--dataset_dir=${DATASET_DIR} \
--model_name=inception_v4 \
--batch_size=32
# # # Fine-tune all the new layers for 500 steps.
python train_image_classifier.py \
--train_dir=${TRAIN_DIR}/all \
--dataset_name=mydata \
--dataset_split_name=train \
--dataset_dir=${DATASET_DIR} \
--model_name=inception_v4 \
--clone_on_cpu=true \
--checkpoint_path=${TRAIN_DIR} \
--max_number_of_steps=1000 \
--log_every_n_steps=100 \
--batch_size=32 \
--learning_rate=0.0001 \
--learning_rate_decay_type=fixed \
--save_interval_secs=600 \
--save_summaries_secs=600 \
--optimizer=rmsprop \
--weight_decay=0.00004
then freeze the graph:
python export_inference_graph.py \
--alsologtostderr \
--model_name=inception_v4 \
--is_training=True \
--labels_offset=999 \
--output_file=${OUTPUT_DIR}/unfrozen_inception_v4_graph.pb \
--dataset_dir=${DATASET_DIR}
#NEWEST_CHECKPOINT=$(cat ${TRAIN_DIR}/all/checkpoint |head -n1|awk -F\" '{print $2}')
NEWEST_CHECKPOINT=$(ls -t1 ${TRAIN_DIR}/all|grep model.ckpt |head -n1)
echo ${NEWEST_CHECKPOINT%.*}
python ${OUTPUT_DIR}/tensorflow/tensorflow/python/tools/freeze_graph.py \
--input_graph=${OUTPUT_DIR}/unfrozen_inception_v4_graph.pb \
--input_checkpoint=${TRAIN_DIR}/all/${NEWEST_CHECKPOINT%.*} \
--input_binary=true \
--output_graph=${OUTPUT_DIR}/frozen_inception_v4.pb \
--output_node_names=InceptionV4/Logits/Predictions \
--input_meta_graph=True
After all this, I got a frozen_inception_v4.pb file.
For this example, https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/label_image/label_image.py,
what is the input layer for inception_v4?
Does anyone know how to solve this?
That depends on the particular slim implementation you used. Look where the input is defined and check what the name of that tensor is.
Try this:
bazel build tensorflow/tools/graph_transforms:summarize_graph
bazel-bin/tensorflow/tools/graph_transforms/summarize_graph \
--in_graph=/path/to/your_frozen.pb
It will show the possible input and output layers.
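If you'd rather not build the Bazel tool, a short Python sketch (assuming TF 1.x, which the slim scripts target; the .pb path is a placeholder) can list candidate input and output nodes of the frozen graph:
import tensorflow as tf  # TF 1.x API

graph_def = tf.GraphDef()
with tf.gfile.GFile("/path/to/frozen_inception_v4.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# Placeholders are usually inputs; nodes that nothing consumes are usually outputs.
consumed = {i.lstrip("^").split(":")[0] for n in graph_def.node for i in n.input}
for n in graph_def.node:
    if n.op == "Placeholder":
        print("input :", n.name)
    elif n.name not in consumed:
        print("output:", n.name)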