Python currently uses NumPy for heavy-duty math and image processing.
The earlier Numeric and Numarray packages are obsolete, but there are still many tutorials, notes, code samples and other documents that use them. Some cover special topics of interest, some are well written but haven't been updated or replaced, or are otherwise still useful. Quite a bit is the same across Numeric, Numarray and NumPy, so I usually get good mileage out of these older docs. Occasionally, though, I run into a line of code that results in an error. It happens rarely enough that I don't remember the workarounds, but I usually figure it out at the cost of some time.
What are the main things to watch out for when relying on such older documentation for current NumPy use? Is there a list of the differences and how to translate between them?
Two good resources:
Numarray to numpy guide
Differences between Numeric and numpy
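To give a flavor of the translation involved, here is a minimal sketch of one of the most common patterns: Numeric's typecodes became numpy dtypes, and the typecode() method became the dtype attribute (the commented-out Numeric lines are from memory of the old API, so treat them as illustrative):

```python
import numpy as np

# Old Numeric style (fails under numpy):
#   import Numeric
#   a = Numeric.array([1, 2, 3], Numeric.Float64)
#   print(a.typecode())

# numpy equivalent: typecodes become dtypes, and the
# typecode() method becomes the dtype attribute.
a = np.array([1, 2, 3], dtype=np.float64)
print(a.dtype)  # float64
```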
As a physics major I am by no means good at coding, but it suffices for modeling things. Plotting, though, always ends up being super annoying. It's easy enough to look things up in the Julia documentation if needed.
With plotting it's a totally different story. There are just no good resources for learning to handle the available plotting packages.
I'm lost. There is proper documentation for PyPlot, for example, but only for Python, and the code won't work in Julia. Then there are a few examples that obviously do not answer all my questions. Am I missing something? I feel stupid, but I also know that everyone around me spends too much time on plotting, too.
Any advice on where to look things up?
Thanks in advance.
Plotting in Julia is provided by packages, so you will not find plotting docs in the main Julia documentation.
There are a number of plotting packages to choose between - it's just a question of picking the one you prefer. Here is the documentation for most of the more widely used packages:
Plots: https://docs.juliaplots.org/latest/
Makie: https://makie.juliaplots.org/dev/
PlotlyJS: http://spencerlyon.com/PlotlyJS.jl/
VegaLite: https://www.queryverse.org/VegaLite.jl/stable/
PGFPlotsX: https://kristofferc.github.io/PGFPlotsX.jl/dev/
Gadfly: http://gadflyjl.org/stable/
GR: https://gr-framework.org/julia.html
UnicodePlots (for plotting in terminal): https://github.com/Evizero/UnicodePlots.jl
For PyPlot, the matplotlib syntax should work; it's not clear from your question why it doesn't for you. So there should be ample resources.
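Since PyPlot.jl wraps matplotlib, the Python matplotlib documentation translates almost one-to-one. As a hedged illustration, here is a basic labeled plot in plain matplotlib (Python); in Julia the same calls should work after `using PyPlot` (e.g. `plot(x, sin.(x))`, `xlabel("x")`):

```python
import numpy as np
import matplotlib.pyplot as plt

# A basic labeled line plot; PyPlot.jl exposes the same
# function names, so this maps nearly line-for-line to Julia.
x = np.linspace(0, 2 * np.pi, 200)
plt.plot(x, np.sin(x), label="sin(x)")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()
```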
I'm using scipy and numpy to calculate the matrix exponential of a 6*6 matrix many times over.
Compared to Matlab, it's about 10 times slower.
The function I'm using is scipy.linalg.expm. I have also tried the deprecated methods scipy.linalg.expm2 and scipy.linalg.expm3, which are only about two times faster than expm. My questions are:
What's wrong with expm2 and expm3, given that they are faster than expm?
I'm using the wheel packages from http://www.lfd.uci.edu/~gohlke/pythonlibs/, and I found https://software.intel.com/en-us/articles/building-numpyscipy-with-intel-mkl-and-intel-fortran-on-windows. Is the wheel package compiled with MKL? If not, can I optimize numpy and scipy by compiling them myself with MKL?
Any other ways to optimize the performance?
Well, I think I have found answers to questions 1 and 2 myself:
1. It seems expm2 and expm3 return an array rather than a matrix, but they are about two times faster than expm.
2. After a whole day trying to compile scipy against MKL, I succeeded. It's really hard to build scipy, especially on Windows with x64 and Python 3. And it turned out to be a waste of time: it's not even a bit faster than the whl package from http://www.lfd.uci.edu/~gohlke/pythonlibs/.
Hoping someone can give an answer to question 3.
Your matrix is relatively small, so maybe the numerical part is not the bottleneck. You should use a profiler to make sure that the limitation is in the exponentiation.
You can also take a look at the source code of these implementations and write an equivalent function with fewer conditionals and less checking.
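As a sketch of both suggestions: time expm directly, and compare it against a hand-rolled version with no input checking. The eigendecomposition shortcut below is only valid for diagonalizable matrices (and may return complex values for real input), so treat it as an assumption to verify against expm on your own data:

```python
import timeit

import numpy as np
from scipy.linalg import expm

def expm_eig(a):
    # Matrix exponential via eigendecomposition: exp(A) = V exp(D) V^-1.
    # Assumes `a` is diagonalizable and does no checking at all,
    # which is where the speed (and the risk) comes from.
    w, v = np.linalg.eig(a)
    return (v * np.exp(w)) @ np.linalg.inv(v)

a = np.random.rand(6, 6)
print(timeit.timeit(lambda: expm(a), number=1000))
print(timeit.timeit(lambda: expm_eig(a), number=1000))
print(np.allclose(expm(a), expm_eig(a)))  # sanity check
```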
Now that pandas provides a data frame structure, is there any need for structured/record arrays in numpy? There are some modifications I need to make to an existing code which requires this structured array type framework, but I am considering using pandas in its place from this point forward. Will I at any point find that I need some functionality of structured/record arrays that pandas does not provide?
pandas's DataFrame is a high-level tool, while structured arrays are a very low-level tool, enabling you to interpret a binary blob of data as a table-like structure. One thing that is hard to do in pandas is nested data types with the same semantics as structured arrays, though this can be imitated with hierarchical indexing (structured arrays can't do most things you can do with hierarchical indexing).
Structured arrays are also amenable to working with massive tabular data sets loaded via memory maps (np.memmap). This is a limitation that will be addressed in pandas eventually, though.
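A minimal sketch of both points (the dtype and filename here are invented for illustration):

```python
import numpy as np

# A structured dtype describes the exact byte layout of each record.
dt = np.dtype([("name", "S8"), ("x", np.float64), ("y", np.float64)])

# Interpret a raw binary blob as a table of such records.
blob = np.zeros(3, dtype=dt).tobytes()
table = np.frombuffer(blob, dtype=dt)
print(table["x"])  # access a column by field name

# The same dtype works with a memory-mapped file, so a huge
# on-disk table can be sliced without loading it into RAM.
mm = np.memmap("records.bin", dtype=dt, mode="w+", shape=(1000,))
mm["y"][:] = 1.0
```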
I'm currently in the middle of a transition to Pandas DataFrames from various Numpy arrays. This has been relatively painless since Pandas, AFAIK, is built largely on top of Numpy. What I mean by that is that .mean(), .sum(), etc. all work as you would hope. On top of that, the ability to add a hierarchical index and use the .ix[] (index) attribute and .xs() (cross-section) method to pull out arbitrary pieces of the data has greatly improved the readability and performance of my code (mainly by reducing the number of round-trips to my database).
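For instance, here is a hedged sketch of the hierarchical-index pattern described above, with invented column names (note that .ix has since been deprecated in favor of .loc):

```python
import pandas as pd

# A DataFrame with a two-level (hierarchical) row index.
idx = pd.MultiIndex.from_product(
    [["2011", "2012"], ["Q1", "Q2"]], names=["year", "quarter"]
)
df = pd.DataFrame({"sales": [10, 12, 9, 14]}, index=idx)

print(df.xs("Q1", level="quarter"))  # cross-section: Q1 across all years
print(df.loc["2011"])                # every quarter of 2011
print(df["sales"].mean())            # numpy-style reductions still work
```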
One thing I haven't fully investigated yet is Pandas compatibility with the more advanced functionality of Scipy and Matplotlib. However, in case of any issues, it's easy enough to pull out a single column that behaves enough like an array for those libraries to work, or even convert to an array on the fly. A DataFrame's plotting methods, for instance, rely on matplotlib and take care of any conversion for you.
Also, if you're like me and your main use of Scipy is the statistics module, pystatsmodels is quickly maturing and relies heavily on pandas.
That's my two cents' worth.
I never took the time to dig into pandas, but I use structured arrays quite often in numpy. Here are a few considerations:
Structured arrays are as convenient as recarrays with less overhead, if you don't mind losing the ability to access fields by attribute. But then, have you ever tried to use min or max as a field name in a recarray? (See the sketch after this list.)
NumPy has been developed over a far longer period than pandas, with a larger crew, and it has become ubiquitous enough that a lot of third-party packages rely on it. You can expect structured arrays to be more portable than pandas dataframes.
Are pandas dataframes easily picklable? Can they be sent back and forth with PyTables, for example?
Unless you're 100% sure that you'll never have to share your code with non-pandas users, you might want to keep some structured arrays around.
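On the min/max point in the first consideration, here is a quick sketch of the gotcha: on a recarray, a field whose name collides with an array method is shadowed under attribute access, while item access still reaches it:

```python
import numpy as np

dt = np.dtype([("min", np.float64), ("max", np.float64)])
r = np.zeros(3, dtype=dt).view(np.recarray)

print(r["min"])  # item access reaches the field
print(r.min)     # attribute access finds ndarray.min, not the field!
```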
I have a simple optimization problem and am looking for java software for that.
The Apache math optimization software looks just like what I want, but I can't find documentation to suit my needs (where those needs are to be useful to a beginner / non-maths professional!).
Does anyone know of a worked, simple, example?
In case it helps, the problem is that I want to find the max r where
r1 = s1 * m1
r2 = s2 * m2
and there are some constraints and formulas defining the relationships between the variables. The Excel Solver works fine for this problem. I got LPSolve working great, but this problem requires a multiplication of s and m, and I understand LPSolve can't help, as this makes the problem non-linear.
I recently ported the derivative-free non-linear constrained optimization code COBYLA2 to Java. Since it does not explicitly rely on derivatives, the algorithm may require quite a few iterations for larger problems. Nonetheless, you are able to formulate your problem with both a non-linear objective function and (potentially) non-linear constraints.
You can read more about it and download the source code from here.
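This isn't Java, but since the same COBYLA algorithm is also wrapped by SciPy, here is a hedged sketch of how this class of problem looks in code. The objective (maximize r = s1*m1 + s2*m2) and all the bounds are invented for illustration; substitute your real constraints:

```python
from scipy.optimize import minimize

# Maximize r = s1*m1 + s2*m2 by minimizing its negative.
def neg_r(v):
    s1, m1, s2, m2 = v
    return -(s1 * m1 + s2 * m2)

# COBYLA takes inequality constraints of the form g(v) >= 0.
# These particular bounds are made up purely for illustration.
cons = [
    {"type": "ineq", "fun": lambda v: v[0]},              # s1 >= 0
    {"type": "ineq", "fun": lambda v: v[2]},              # s2 >= 0
    {"type": "ineq", "fun": lambda v: 10 - v[0] - v[2]},  # s1 + s2 <= 10
    {"type": "ineq", "fun": lambda v: v[1]},              # m1 >= 0
    {"type": "ineq", "fun": lambda v: 3 - v[1]},          # m1 <= 3
    {"type": "ineq", "fun": lambda v: v[3]},              # m2 >= 0
    {"type": "ineq", "fun": lambda v: 5 - v[3]},          # m2 <= 5
]

res = minimize(neg_r, x0=[1.0, 1.0, 1.0, 1.0],
               method="COBYLA", constraints=cons)
print(res.x, -res.fun)  # arguments and the maximized r
```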
I am not aware of a simple Java-based NLP solver. (I did find an example of quadratic programming (QP) in Apache Commons Math, but it doesn't qualify, since you asked for a non-math-professional example.)
I have two suggestions for you to solve your non-linear program:
1. Excel's Solver does have the ability to tackle non-linear problems. (Don't use LPSOLVE.) In fact, NLP is the default mode in Solver.
Here are two links to using Excel to solve NLPs: Example 1 - Step by step Solver walk-through that covers NLP and
Example 2 - A General Neural network example in Excel
Also for Excel, I like Paul Jensen's (utexas) ORMM add-ins.
He has a module called Teach NLP. Chapter 10 of his book deals with NLP and is available from his site.
2. If you are going to be doing even some amount of data analysis, then I recommend investing a few hours to download and learn the basics of R.
R has numerous packages and libraries for optimization. optim() and nlme are relevant for solving non-linear programs.
Just for completeness, I mention SAS, MATLAB and CPLEX as other options. If you have access to any of these, they all do a very good job with solving non-linear programs.
Hope these pointers help.
In case somebody doesn't know: a cartogram is a type of map where some country- or region-dependent numeric property scales the respective regions so that that property's density is (close to) constant. An example is the population cartogram from worldmapper.org, in which countries are scaled according to their population, resulting in near-constant population density.
Needless to say, this is really cool. Does anyone know of a Matplotlib-based library for drawing such maps? The method used at worldmapper.org is described in (1), so it would surprise me if no one has implemented this yet...
I'm also interested in hearing about other cartogram libraries, even if they're not made for Matplotlib.
(1) Michael T. Gastner and M. E. J. Newman, "Diffusion-based method for producing density-equalizing maps," Proc. Natl. Acad. Sci. USA 101, 7499-7504 (2004). Available on arXiv.
There's this, though it's based on a different algorithm (and though it's on the ESRI site, it doesn't require ArcGIS). Of course, once you have the cartogram you can plot it in matplotlib.
Here is a JavaScript plugin to make cartograms using D3. It is a good, simple solution if you are not too concerned about the regions being sized accurately. If accuracy is important, there are other options available that give you more freedom to play with the algorithm's parameters to get a more accurate result.
Here are two great standalone programs I know of:
Scapetoad
Carto3F
Scapetoad is very easy to use. Just give it a shapefile, tell it which attribute to use for the scaling, and set a few accuracy parameters. If there is any doubt, this post describes the process.
Carto3F is more complex and allows for greater accuracy, though it is a bit trickier to figure out - lots of parameter settings without much documentation explaining them.
There is also a QGIS cartogram plugin, written in Python. I have not been able to get it to work, though, so I cannot comment on that one.
In short, no. But Newman has an excellent little implementation of his and Gastner's method on his website. Installing it is easy and it works from the command line. Here's an example of a workflow using this software that worked for me.
1. Compute a grid of density estimates over some region, e.g. in Python. Store it as a matrix of numbers.
2. Run the cart program with your density matrix as input, from the command line or as a subprocess in Python.
3. The program returns a list of new coordinates for each grid point.
4. Pipe your shapefile points through the interp program and into a new shapefile to get the transformed map.
There are nice instructions on the main page.
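A hedged sketch of the first two steps in Python follows. The cart command-line arguments below are an assumption from memory, so check them against the instructions on Newman's page before relying on this:

```python
import subprocess
import numpy as np

# Step 1: a density grid over the region (toy data for illustration).
nx, ny = 256, 256
density = np.random.rand(ny, nx) + 0.1  # keep densities strictly positive
np.savetxt("density.dat", density)

# Step 2: run Newman's `cart` program as a subprocess.
# NOTE: the argument order/format here is assumed, not verified.
subprocess.run(["cart", str(nx), str(ny), "density.dat", "cartgrid.dat"],
               check=True)

# Steps 3-4: cartgrid.dat now holds the displaced coordinates of each
# grid point, which the interp program uses to transform your points.
```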
The geoplot.cartogram function in Geoplot (geospatial data visualization, geoplot 0.2.0) is another option: geoplot describes itself as a high-level Python geospatial plotting library and an extension to cartopy and matplotlib.
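If you want to try it, here is a minimal sketch; the file path and column name are placeholders for your own data:

```python
import geopandas as gpd
import geoplot as gplt

# Load any polygon layer with a numeric column to scale by
# (path and column name are hypothetical).
gdf = gpd.read_file("regions.shp")

# Shrink or grow each polygon in proportion to the chosen column.
ax = gplt.cartogram(gdf, scale="population")
ax.figure.savefig("cartogram.png")
```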
If you are using geopandas, try this library; it is quick and doesn't require much customization: https://github.com/mthh/cartogram_geopandas