I need to simulate the following:
1. SSH (only once)
2. Execute a command on all the rows in a CSV file at once.
The number of rows in the CSV file is dynamic. If there are 10, the command needs to be executed over all 10 rows in parallel.
I'm not sure about using the SSH Command Sampler here, since the SSH connection and the command have to be entered in the same sampler. How do I separate these, i.e. SSH only once and then execute the commands in parallel? Which JMeter components do I use here?
Note: Increasing the number of threads is not an efficient option. When I do this, many sessions get created, which in turn hangs the terminal. This option works fine up to 10 users; I'm not sure if there's a limit on the number of sessions.
Thanks for your support.
Regards,
Ajith
Why do you think that increasing the number of threads is not an efficient option?
I would suggest moving the "SSH (only once)" part to a setUp Thread Group and putting the "execute a command on all the rows in a CSV file at once" bit under the normal Thread Group.
If the number of rows in the CSV file is dynamic, you can make the number of threads dynamic as well using the __groovy() function, like:
${__groovy(new File('/path/to/your/file.csv').readLines().size(),)}
If you want to execute all 10 requests (or whatever the number of lines is) at exactly the same moment, you can add a Synchronizing Timer.
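Roughly, the test plan could be laid out like this (a sketch only; the SSH samplers are assumed to come from the SSH Protocol Support plugin, and whether the connection opened in the setUp Thread Group can be reused by later samplers depends on that plugin):

Test Plan
  setUp Thread Group (1 thread, 1 loop)
    SSH Command sampler  - the one-off "SSH only once" part
  Thread Group (Number of Threads: ${__groovy(new File('/path/to/your/file.csv').readLines().size(),)}, 1 loop)
    CSV Data Set Config  - your CSV file, Sharing mode: All threads
    Synchronizing Timer  - Number of Simulated Users to Group by: 0 (i.e. all threads)
    SSH Command sampler  - runs the command using the variables read from the CSV row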
Using Hue, I've got a Hive query that takes an input (e.g. an ID number) and returns a record based on it. I need to handle multiple numbers to look up in one go (in serial or parallel) and collate the results (i.e. list the records for each, one after the other), so the input might be:
1234567890
45345353
32423422
1323122
etc...
I've got access to Hue (which I'm supposed to use), Hive, Oozie and Beeline. How do I:
1.) extract the number for each line
2.) repeatedly call my HiveQL query passing in each number in turn
3.) supply the total output to the user in one go
I don't know Python, if that's relevant, but I could attempt a shell script.
I'm guessing one way might be to get the multi-line user input via Oozie (can it prompt a user for input?), then pass that to a shell script which extracts the number from each line and uses beeline to repeatedly run my Hive query with the next number as the parameter?
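Something along these lines is what I have in mind (just a sketch; ids.txt, lookup.sql, the table name and the JDBC URL are placeholders, and it assumes beeline's --hivevar substitution):

#!/bin/bash
# Read one ID per line, run the parameterised HiveQL for each,
# and collect all the output in a single file.
while read -r id; do
  beeline -u "jdbc:hive2://hiveserver:10000/default" \
          --hivevar id="$id" \
          -f lookup.sql
done < ids.txt > all_results.txt

where lookup.sql would refer to the value as ${hivevar:id}, e.g. SELECT * FROM my_table WHERE id = ${hivevar:id};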
Thanks
I have a collection of SQL queries that need to run in a specific order using Teradata. How can this be done?
I've considered writing an application in some other language (like Python or C++) to call each query sequentially, but I am unsure how to get live data from Teradata into it. I also want to keep the queries as separate SQL files (as they are currently).
The goal is to minimize the need for human interaction, i.e. I want to hit "Run" and let it take care of the rest.
BTEQ scripts are your go-to solution.
Have each query, or at least each logical block of several statements, in a single BTEQ script (see the sketch below).
Then create a script that will call BTEQ with the needed settings, i.e. the TD logon command, and have this script called from a batch file with parameters like this:
start /wait C:\Teradata\BTEQ.bat Script_1.txt
start /wait C:\Teradata\BTEQ.bat Script_2.txt
start /wait C:\Teradata\BTEQ.bat Script_3.txt
pause
Then you can create several batch files, split into logical blocks, and have them executed at will or on a schedule.
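For reference, a minimal sketch of what one of those Script_N.txt files could contain (the TDPID, credentials and the UPDATE are placeholders; the logon can equally be supplied by the BTEQ.bat wrapper as described above):

.LOGON tdpid/username,password;

UPDATE my_table
SET    some_col = 1
WHERE  some_id  = 42;

.IF ERRORCODE <> 0 THEN .QUIT 1;
.LOGOFF;
.QUIT 0;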
As part of ongoing research work, I am checking whether a URL exists using the cURL command. I have been running a shell script for a couple of days that does some updates for each URL in my database. However, the script seems to update only around 100,000 rows a day.
I was thinking that if I wrote the values to a file first and then did the updates, the execution might be faster.
I am connecting to the database using the command line.
mysql -h servername -u username -ppassword databasename -e "Update Query"
For example, instead of connecting to the database 2 million times like above from the command line and updating 2 million rows, I am planning to connect to the database only once from the command line and update 2 million rows from the file.
So is the second approach better than the first one, or would the time difference be negligible?
Three approaches:
You could use LOAD DATA INFILE.
You could build up a .sql file with all of the updates you need and run it in one go (sketched below).
You could use something other than a CLI to connect to the URLs and the DB. In other words, not using the "curl" and "mysql" commands, but using a real programming language and its libraries for checking URLs and updating databases.
Any of those would probably be faster, though you'll likely get a bigger speed improvement by making the HTTP calls in parallel. You can do that more easily with a real programming language.
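A rough sketch of the second approach (the table, columns and file names are made up; the curl options just do a HEAD request and treat HTTP errors as failures):

#!/bin/bash
# Build one big updates.sql, then run it over a single MySQL connection.
> updates.sql
while read -r id url; do
  if curl --silent --head --fail --output /dev/null "$url"; then
    echo "UPDATE urls SET status = 'alive' WHERE id = $id;" >> updates.sql
  else
    echo "UPDATE urls SET status = 'dead' WHERE id = $id;" >> updates.sql
  fi
done < url_list.txt
mysql -h servername -u username -ppassword databasename < updates.sql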
Is it possible to run a Kettle job more than once simultaneously?
What I am Trying
Say we run this script twice at the same time:
sh kitchen.sh -rep="development" -dir="job_directory" -job="job1"
If I run it only once at a time, the data flow is perfectly fine.
But when I run this command twice at the same time, it throws an error like:
ERROR 09-01 13:34:13,295 - job1 - Error in step, asking everyone to stop because of:
ERROR 09-01 13:34:13,295 - job1 - org.pentaho.di.core.exception.KettleException:
java.lang.Exception: Return code 1 received from statement : mkfifo /tmp/fiforeg
Return code 1 received from statement : mkfifo /tmp/fiforeg
at org.pentaho.di.trans.steps.mysqlbulkloader.MySQLBulkLoader.execute(MySQLBulkLoader.java:140)
at org.pentaho.di.trans.steps.mysqlbulkloader.MySQLBulkLoader.processRow(MySQLBulkLoader.java:267)
at org.pentaho.di.trans.step.RunThread.run(RunThread.java:50)
at java.lang.Thread.run(Thread.java:679)
Caused by: java.lang.Exception: Return code 1 received from statement : mkfifo /tmp/fiforeg
at org.pentaho.di.trans.steps.mysqlbulkloader.MySQLBulkLoader.execute(MySQLBulkLoader.java:95)
... 3 more
It's important to be able to run the job twice at the same time. To accomplish this, I could duplicate every job and run the original and the duplicate together, but that's not a good approach in the long run!
Question:
Does Pentaho not maintain threads?
Am I missing some option, or can I enable some option to make Pentaho create different threads for different job instances?
Of course Kettle maintains threads. A great many of them in fact. It looks like the problem is that the MySQL bulk loader uses a FIFO. You have two instances of a FIFO called /tmp/fiforeg. The first instance to run creates the FIFO just fine; the second then tries to create another instance with the same name and that results in an error.
At the start of the job, you need to generate a unique FIFO name for that instance. I think you can do this by adding a transformation at the start of the job that uses a Generate random value step to generate a random string or even a UUID and store it in a variable in the job via the Set variables step.
Then you can use this variable in the 'Fifo file' field of the MySQL bulk loader.
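For example, something like this (the variable name FIFO_UUID is made up; step and field names are from memory):

First transformation in the job:
  Generate random value  (Type: Universally Unique Identifier) -> field uuid
  Set Variables          (Variable name: FIFO_UUID, Scope: Valid in the parent job)
Bulk-loading transformation, 'Fifo file' field of the MySQL bulk loader:
  /tmp/fiforeg_${FIFO_UUID}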
Hope that works for you. I don't use MySQL, so I have no way to make sure.
I have a shell script in which I have to execute two queries (on different databases), spool their results to text files and finally call a C++ program that processes the information on those text files. Something like this:
sqlplus user1/pw1@database1 @query1.sql
sqlplus user2/pw2@database2 @query2.sql
./process_db_output
Both queries take some time to execute. One of them can take up to 10 minutes, while the other one is usually faster. What I want to do is execute them simultaneously and when both are done, call the processing utility.
Any suggestion on how to do so?
Use & to background the queries, then use wait to wait for all subprocesses to finish, and then run the C++ program that processes the results.
Code:
#!/bin/bash
# start both queries in the background
sqlplus user1/pw1@database1 @query1.sql &
sqlplus user2/pw2@database2 @query2.sql &
# wait for both background jobs to finish
wait
# both spool files are ready; process them
./process_db_output