I have a very large data set: 512x512x512 cells. Loading the mesh into memory takes over 120GB, and the memory available on a single node is a problem for me. I am wondering whether ParaView can load the data into memory across multiple nodes so that more memory is available in total?
Thanks.
Generally speaking, yes. ParaView can be run in parallel (http://www.paraview.org/Wiki/Users_Guide_Client-Server_Visualization) to distribute the data across nodes.
What file format is the data in? Depending on the format, the reader may be able to read partitioned data directly on each process; otherwise it reads everything on a single node, and you then have to redistribute the data using filters.
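As a back-of-the-envelope check on the numbers in the question, the sketch below estimates how much memory each node would need if the mesh were split evenly. It assumes "120GB" means 120 GiB, that the partitioning is perfectly even, and that ghost cells and ParaView's own overhead are ignored, so treat the output as a lower bound rather than a measurement.

```python
# Rough per-node memory estimate for a 512x512x512 mesh, assuming an even split.
CELLS = 512 ** 3                # 134,217,728 cells
TOTAL_BYTES = 120 * 1024 ** 3   # "over 120GB" from the question, taken as 120 GiB


def bytes_per_cell(total_bytes=TOTAL_BYTES, cells=CELLS):
    """Average memory footprint of one cell, in bytes."""
    return total_bytes / cells


def per_node_gib(nodes, total_bytes=TOTAL_BYTES):
    """Memory per node in GiB if the data is split evenly (no ghost cells)."""
    return total_bytes / nodes / 1024 ** 3


print(f"{bytes_per_cell():.0f} bytes per cell")
for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} nodes: {per_node_gib(n):6.1f} GiB per node")
```

In practice each node needs some headroom on top of this for ghost layers and for any filters applied to the data.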
I have around 623 GB of data (nodes and relationships) in a CSV, and caching all of it consumes around 670 GB. For that reason we are running a cloud system with 94 cores and 750 GB of memory. My goal is to reduce the system configuration while keeping the same dataset. Is it possible to use Redis here, or do I have to consider another approach, such as splitting the data across multiple smaller instances?
I intend to use chronicle-map instead of Redis. In my scenario, a memoryData module starts every day and loads hundreds of millions of records from the database into chronicle-map, and dozens of JVMs then read records from chronicle-map continuously. Each JVM has hundreds of threads. Probably because I do not understand chronicle-map well, the code performs poorly and keeps getting slower until it runs out of memory. Is this a correct way to use chronicle-map?
Because Chronicle Map stores your data off-heap, it can store more data than fits in main memory, but it performs better when all the data fits in memory. If possible, consider adding memory to the machine; if that is not possible, try to use an SSD. Another reason for poor performance may be how you sized the map in the Chronicle Map builder, for example the maximum number of entries you configured: if this is too large, it will affect performance.
I have implemented a custom extractor for NetCDF files and load the variables into arrays in memory before outputting them. Some arrays can be quite big, so I wonder what the memory limit is in ADLA. Is there some max amount of memory you can allocate?
Each vertex has 6GB available. Keep in mind that this memory is shared between the OS, the U-SQL runtime, and the user code running on the vertex.
In addition to Saveen's reply: please note that a row can contain at most 4MB of data, so your SqlArray will also be limited by the maximum row size once you return it from your extractor.
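To get a feel for what the 4MB row limit means for a large array, here is a small arithmetic sketch. It is plain Python, not U-SQL, and it assumes that "4MB" means 4 MiB and that the row carries no serialization overhead; both are assumptions, and the helper names are made up for illustration.

```python
ROW_LIMIT_BYTES = 4 * 1024 * 1024  # assumption: "4MB" means 4 MiB


def max_elements_per_row(element_size, overhead_bytes=0):
    """How many fixed-size elements fit in one row under the assumed limit."""
    return (ROW_LIMIT_BYTES - overhead_bytes) // element_size


def split_into_rows(values, element_size):
    """Yield (chunk_index, chunk) pairs, each small enough for one row."""
    step = max_elements_per_row(element_size)
    for index, start in enumerate(range(0, len(values), step)):
        yield index, values[start:start + step]


# A variable with one million 8-byte doubles needs two rows under these assumptions.
chunks = list(split_into_rows(list(range(1_000_000)), element_size=8))
print(len(chunks), [len(chunk) for _, chunk in chunks])
```

Emitting the chunk index alongside each chunk lets a later step reassemble the variable in order.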
I was reading some documentation on Pig Latin and could not fully understand why Pig does not need to import the data into the system before applying queries during data analysis.
Can someone please explain? Thanks.
In Hadoop and HDFS there is a concept called data locality, which means bringing the computation (your code) to the data rather than bringing the data to the computation.
This concept applies to all the data processing technologies built on Hadoop, such as MapReduce, Hive and Pig. It is the main reason Pig doesn't import the data into the system: instead, it goes to where the data is and analyzes it there.
Data locality: An important concept with HDFS and MapReduce, data locality can best be described as “bringing the compute to the data.” In other words, whenever you use a MapReduce program on a particular part of HDFS data, you always want to run that program on the node, or machine, that actually stores this data in HDFS. Doing so allows processes to be run much faster, since it prevents you from having to move large amounts of data around.
When a MapReduce job is submitted, part of what the JobTracker does is look to see which machines hold the blocks required for each task. This is why, when a data file is split into blocks in HDFS, each block is replicated three times by default, with the replicas stored on different machines.
Storing the data across three machines thus gives you a much higher chance of achieving data locality, since it’s likely that at least one of the machines will be freed up enough to process the data stored at that particular location.
Reference: http://www.plottingsuccess.com/hadoop-101-important-terms-explained-0314/
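To make the idea concrete, here is a toy model of a locality-aware placement decision. It is not Hadoop's actual scheduler, and the function and node names are made up for illustration: it simply prefers a free node that already holds a replica of the block and falls back to any free node otherwise.

```python
def pick_node(replica_nodes, busy_nodes, all_nodes):
    """Return (node, is_local) for a task that processes one block.

    replica_nodes: nodes that store a copy of the block
    busy_nodes:    nodes with no free task slot
    all_nodes:     every node in the (hypothetical) cluster
    """
    # First choice: a free node that already has the data (data-local).
    for node in replica_nodes:
        if node not in busy_nodes:
            return node, True
    # Fallback: any free node; the block must then be copied over the network.
    for node in all_nodes:
        if node not in busy_nodes:
            return node, False
    return None, False  # nothing is free, the task has to wait


cluster = ["n1", "n2", "n3", "n4"]
print(pick_node(["n1", "n2", "n3"], busy_nodes={"n1"}, all_nodes=cluster))
print(pick_node(["n1"], busy_nodes={"n1"}, all_nodes=cluster))
```

With three replicas the first loop has three chances to find a free node, which is the point made above about replication improving the odds of a data-local task.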
I have a FORTRAN MPI code to solve a flow field.
At the start I want to read data from file and distribute it to the participating processes.
The data consists of several 3-D arrays (velocities in space x, y, z).
Every process stores only a part of the array.
So if every process simply reads the file (the easiest way, I think), it is not going to work, because each process will only store the first part of the file, up to the number of array elements that it can hold.
Can MPI_Bcast work for 3-D arrays? But then things become complex.
Or is there an easier way?
You have, broadly speaking, 2 or 3 choices, depending on your platform.
1. One process reads the input data and sends (parts of) it to the other processes. I wouldn't usually use broadcast for this since it is a collective operation and all processes have to take part. I'd usually just send the necessary information to each process. If it is convenient (and not a memory issue) you could certainly broadcast all the input data to all the processes; it's just not a pattern of operation that I use or see much.
2. All processes read the data that they require. This may involve a process reading an entire input file and only storing those parts it requires. But if you have very large input files you can write routines to read only the necessary part into each process's memory space. This approach may involve processes competing for disk access, which is only slow in a relative sense: if you are running large-scale and long-running parallel computations, waiting a few seconds while all the processes get their data is not much of an overhead.
3. If you have a parallel file system then you can use MPI's parallel I/O routines so that each process reads only those parts of the input data that it requires.
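The second option (each process reads only what it requires) can be sketched as follows. This is a minimal illustration in Python rather than Fortran, and it runs without MPI: `rank` and `size` are plain arguments standing in for the values MPI would provide. It assumes a raw binary file of little-endian float64 values with no header or record markers, split into slabs along the slowest-varying axis (in Fortran's column-major layout that is the last index). The helper names are made up for the example.

```python
import os
import struct
import tempfile


def slab_bounds(n, size, rank):
    """Split n planes among `size` ranks; return (first_plane, plane_count)."""
    base, extra = divmod(n, size)
    count = base + (1 if rank < extra else 0)   # first `extra` ranks get one more
    first = rank * base + min(rank, extra)
    return first, count


def read_slab(path, shape, size, rank, item_bytes=8):
    """Read only this rank's planes from a raw file of float64 values.

    Assumes shape[0] is the slowest-varying axis and the file has no header.
    """
    n0, n1, n2 = shape
    first, count = slab_bounds(n0, size, rank)
    plane_items = n1 * n2
    with open(path, "rb") as f:
        f.seek(first * plane_items * item_bytes)        # skip other ranks' data
        raw = f.read(count * plane_items * item_bytes)  # read only our slab
    return first, struct.unpack(f"<{count * plane_items}d", raw)


# Demo: a 5x2x3 array holding 0..29, "read" by two simulated ranks.
shape = (5, 2, 3)
values = [float(i) for i in range(30)]
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(struct.pack("<30d", *values))
for rank in range(2):
    first, slab = read_slab(tmp.name, shape, size=2, rank=rank)
    print(f"rank {rank}: planes start at {first}, {len(slab)} values")
os.remove(tmp.name)
```

With several arrays stored back to back, the same seek arithmetic applies with an extra offset for the arrays that precede the one being read. A Fortran unformatted sequential file contains record markers, so this kind of offset calculation fits stream or direct access better.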
The canonical way to handle such an I/O pattern in MPI is one of the following:
1. Read the data on rank 0, then use MPI_Scatter to distribute it. If memory is tight, do this blockwise, or use point-to-point communication rather than MPI_Scatter.
2. Use MPI-I/O, and have each rank read its own subset of the data file (to be useful, this of course requires a file format where you can figure out the boundaries without first reading through the entire file).
For extreme scalability, one can combine the two approaches: a subset of the processes (say, sqrt(N) as a rough rule of thumb) performs the MPI I/O, and every MPI process exchanges data with its own assigned I/O process.
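One possible way to assign ranks to I/O processes under the sqrt(N) rule of thumb is sketched below. The grouping scheme (contiguous groups, with the lowest rank of each group doing the I/O) is just an assumption for illustration, as are the function names; the code is plain Python and does not call MPI.

```python
import math


def io_group_size(n_ranks):
    """Ranks per I/O group when aiming for about sqrt(n_ranks) I/O processes."""
    n_io = max(1, math.isqrt(n_ranks))
    return math.ceil(n_ranks / n_io)


def io_rank_for(rank, n_ranks):
    """The I/O process serving `rank`: the lowest rank of its contiguous group."""
    group = io_group_size(n_ranks)
    return (rank // group) * group


def io_ranks(n_ranks):
    """All ranks that act as I/O processes."""
    return sorted({io_rank_for(r, n_ranks) for r in range(n_ranks)})


print(io_ranks(16))         # ranks doing I/O when N = 16
print(io_rank_for(5, 16))   # the I/O process that serves rank 5
```

In a real code, each group would typically become its own communicator (for example via MPI_Comm_split, using the I/O rank as the color), so that the I/O process can scatter to or gather from just its group.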
If you are running your code on fewer than 1000 cores with a good file system (e.g. Lustre), then just use Fortran I/O where each rank opens the file and reads the data it needs (skipping the rest). Yes, it takes a few minutes, but you're only reading the file once at startup.
MPI I/O (binary only) is non-trivial, and you are usually better off using higher-level libraries such as HDF5 or Parallel NetCDF. Performance will depend on how the data is read (contiguous vs non-contiguous and so on). The following links may be helpful ...
http://www.osc.edu/supercomputing/training/pario/parallel-io-nov04.pdf
https://support.scinet.utoronto.ca/wiki/images/0/01/Parallel_io_course.pdf