Can anyone confirm whether my understanding is right? As per the documentation, I understand that at most 8 concurrent queries can run on the pool. Is that right? Or can we run 8 queries for every 12.5% of memory, so 64 in total for 100% of memory? Please let me know which is correct here.
[screenshot: DW200c documentation]
I am new to Spark SQL queries and am trying to understand how it works under the hood.
I have come across the term "Core" in the Spark vocabulary, but I am still struggling to get a hold of it.
I know that - 1 core = 1 task.
My questions -
Can anyone please explain what exactly a core means?
Does the Spark UI show the number of cores currently allocated for my job? If yes, where can I see it?
If I find in the Spark UI that the number of running tasks is low, is there a way to increase the number of cores allocated for my job, so that Spark can submit more tasks and make my job run faster?
Please advise.
Yes, you are right in a way.
In Spark, tasks are distributed across executors, and on each executor the number of tasks running concurrently equals the number of cores on that executor. So basically a core is what executes your task, and a task is the most granular unit of work that needs to be carried out. The hierarchy is:
JOB=>STAGE=>TASK
Yes, the Spark UI shows you the number of tasks currently running on each of your executors. You can check them under the Executors tab. This tab gives you a very detailed view of your task allocation against the number of cores available, along with a lot of other details.
Yes, you can increase the number of cores. You can do that by passing an argument in the spark-submit command:
--executor-cores n
Here n is the number of cores you want per executor. For optimum usage, it should be 5.
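For illustration, a complete spark-submit invocation with this flag might look like the following (the master URL, executor count, memory size, and application file are hypothetical placeholders, not recommendations):

    spark-submit --master yarn --num-executors 10 --executor-cores 5 --executor-memory 8G my_app.py

With a configuration like this, each of the 10 executors can run 5 tasks at a time, so up to 50 tasks run in parallel.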
Note that it is not necessarily true that the more cores you have, the faster your job will run.
Your tasks need to be distributed evenly across all the available cores for the job to run faster.
If you provide more cores than required, they will remain idle most of the time.
I have developed a Laravel API and am looking into picking a server to deploy the project on. There is no big business logic running on the server; it's a simple application. But the application will be accessed by ~100 users per second at its peak time. In that case, what parameters should I be looking at when selecting a server (from a hardware aspect: RAM, storage, processor, etc.)?
The API will be used for shop-floor time reporting. Every hour (when the hour completes), ~150 users will access the system to report their time.
You say you will have 100 users per second, yet you also say employees will access it at a rate of 150 per hour.
While it is likely you will get 100 writes within 30 seconds, that's nothing to a modern database.
I would recommend getting the lowest VPS package from a hosting provider you like and upgrading to a higher plan if needed.
If you want to run a dedicated server on-premises, even an office PC with a low-end SSD will do the job.
I'm going to round up my estimates, because it's better to have slightly more than you need than less. Also, I'm more used to bigger databases, so these estimates may be slightly overkill, but based on my understanding of what you require they shouldn't be too excessive. I'll explain everything as well, so feel free to adjust this based on your requirements.
RAM: 150 people? Minimum 10 GB. But RAM doesn't come in 10 GB, so you might as well go for 16 GB.
Storage: 50 GB is a safe bet for small databases and the like; feel free to use more or less based on your numbers.
OS requirements: if your app takes up 40 GB, then you do not want only 41 GB of space; that will slow everything down.
A good rule of thumb is to reserve 1 GB of RAM for the OS by default, plus an additional 1 GB for every 4 GB installed between 4 and 16 GB, and another 1 GB for every 8 GB installed above 16 GB. In a server with 32 GB of RAM, that works out to roughly 6-7 GB for your OS (1 GB base + 3 GB for the 12 GB between 4 and 16 + 2 GB for the 16 GB above that, rounded up per my note above), with the remaining ~25 GB dedicated to your application.
CPU: whenever I talk about this, people always think it's not a big deal. It kind of is. The number of servers that have been bottlenecked by their CPU? It's more than it should be. Now, you said there will be lots of interactions (150) but small ones (just logging hours), so CPU cores are what you want to look at. Just find something within budget that has a fair few cores; the Intel Xeon E3 1270 V3 is pretty good for its price, I would say. That's all I can think of right now; don't hesitate to follow up if I've missed anything.
I would recommend taking a look at this as well: choose your version and see if you want to make any modifications based on what's shown in the official documentation below.
https://laravel.com/docs/master/installation
I am fairly new to CUDA and would like to find out more about optimising kernel launch conditions to speed up my code. This is quite a specific scenario but I'll try to generalise it as much as possible so anyone else with a similar question can gain from this in the future.
Assume I've got an array of 300 elements (Array A) that is sent to the kernel as an input. This array is made of a few repeating integers with each integer having a device function specific to it. For example, every time 5 appears in Array A, the kernel performs the function specific to 5. These functions are device functions.
How I have parallelised this problem is by launching 320 blocks (probably not the best number) so that each block will perform the device function relevant to its element in parallel.
The CPU would handle the entire problem serially, taking element after element and calling each function one after the other, whereas the GPU would allocate one element to each block so that all 320 blocks can invoke the relevant device functions and calculate simultaneously.
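For concreteness, here is a minimal sketch of this one-element-per-block dispatch pattern; the kernel name, the stand-in device functions, and the switch cases are hypothetical placeholders for the real ones:

    __device__ float op_for_5(float x)   { return x + 5.0f; }  // stand-in for the function tied to value 5
    __device__ float op_default(float x) { return x * 2.0f; }  // stand-in fallback function

    // One thread per element of Array A: read the code and run the matching device function.
    __global__ void dispatch_kernel(const int *codes, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;                      // guard threads beyond the 300 elements
        switch (codes[i]) {                      // diverges where neighbouring codes differ
            case 5:  out[i] = op_for_5(1.0f);    break;
            default: out[i] = op_default(1.0f);  break;
        }
    }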
In theory, for a large number of elements the GPU should be faster - at least I thought so, but in my case it isn't. My assumption is that since 300 elements is a small number, the CPU will always be faster than the GPU.
This is acceptable, BUT what I want to know is how I can cut down the GPU execution time at least a little. Currently, the CPU takes 2.5 ms and the GPU around 12 ms.
Question 1 - How can I choose the optimum number of blocks/threads to launch at the start?
First I tried 320 blocks with 1 thread per block. Then 1 block with 320 threads. No real change in execution time. Will tweaking the number of blocks/threads improve the speed?
Question 2 - If 300 elements is too small, why is that, and roughly how many elements do I need to see the GPU outperforming the CPU?
Question 3 - What optimisation techniques should I look into?
Please let me know if any of this isn't that clear and I'll expand on it.
Thanks in advance.
Internally, CUDA manages threads in groups of 32 (so-called warps). If you have 1 thread per block, the device will still schedule a full warp of 32 - the other 31 lanes will simply be inactive. This is potentially an occupancy issue, though you may not observe it on your device and with your problem size. There is also a limit on the number of blocks a given multiprocessor (SM) can execute. As far as I recall, the GeForce 4xx series can run up to 8 blocks on one SM, hence if you have a device with 8 SMs you can only run 64 threads simultaneously with a block size of 1. You can use a tool called the occupancy calculator to estimate a better block size - or you can use the visual profiler.
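As a rough illustration, a more conventional launch configuration for the 300-element case might look like this (the 128-thread block size is just a common warp-aligned starting point, not a measured optimum; dispatch_kernel, d_codes, and d_out refer to the hypothetical kernel and device buffers sketched in the question):

    int n = 300;
    int threadsPerBlock = 128;                                  // a multiple of the 32-thread warp size
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;   // ceiling division: 3 blocks here
    dispatch_kernel<<<blocks, threadsPerBlock>>>(d_codes, d_out, n);

Profiling a few block sizes (e.g. 64, 128, 256) is usually the quickest way to pick between them.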
This can only be decided by profiling. There are too many unknowns - e.g. what your ratio of memory accesses to actual computation is, how parallelizable your task is, etc.
I would really recommend you start with the CUDA Best Practices Guide.
How much CPU/RAM would I need to host 5 Ruby on Rails 3 applications?
I am talking about applications that will not get more than 300 hits per day each.
That's only a few hits per minute, even after allowing for peak hours and bursts.
It's hard for me to imagine a reasonably new machine that would have any problems with that.
But to answer your question: it depends a bit on which web server you choose, but about 300 MB per Rails server process is a starting point for planning a big application rollout. Five applications at ~300 MB each is about 1.5 GB, and since you won't need lots of simultaneous transactions, a couple of threads should do; therefore a fairly arbitrary 2 GB machine should be more than enough.
I wouldn't really bother deploying a server without at least 8 or 16 GB, though, even if not immediately needed. Given the other costs involved, even a small budget allocation for memory should result in way more than your scenario needs.
In the old (single-threaded) days, we instructed our testing team to always report the CPU time, not the real time, of an application. That way, if they said that an action took 5 CPU seconds in version 1 and 10 CPU seconds in version 2, we knew we had a problem.
Now, with more and more multi-threading, this doesn't seem to make sense anymore. It could be that version 1 of an application takes 5 CPU seconds and version 2 takes 10 CPU seconds, but that version 2 is still faster if version 1 is single-threaded and version 2 uses 4 threads (each consuming 2.5 CPU seconds, so finishing in roughly 2.5 seconds of real time).
On the other hand, using real-time to compare performance isn't reliable either since it can be influenced by lots of other elements (other applications running, network congestion, very busy database server, fragmented disk, ...).
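For what it's worth, the two clocks can be sampled side by side. Here is a minimal C++ sketch, assuming a POSIX system where std::clock() reports process CPU time summed across all threads (work() is a hypothetical placeholder for the action under test):

    #include <chrono>
    #include <cstdio>
    #include <ctime>

    static void work() {
        volatile double x = 0;                        // placeholder workload
        for (long i = 0; i < 100000000L; ++i) x += i;
    }

    int main() {
        auto wall_start = std::chrono::steady_clock::now();
        std::clock_t cpu_start = std::clock();

        work();

        double cpu_s  = double(std::clock() - cpu_start) / CLOCKS_PER_SEC;
        double wall_s = std::chrono::duration<double>(
            std::chrono::steady_clock::now() - wall_start).count();
        std::printf("CPU: %.3f s, wall: %.3f s\n", cpu_s, wall_s);
        return 0;
    }

A CPU-to-wall ratio well above 1 is a quick hint that the code under test is genuinely running in parallel rather than just burning more cycles.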
What is, in your opinion, the best way to quantify performance?
Hopefully it's not intuition, since that is not an objective value and would probably lead to conflicts between the development team and the testing team.
Performance needs to be defined before it is measured.
Is it:
memory consumption?
task completion times?
disk space allocation?
Once defined, you can decide on metrics.