NumPy broadcasting in pandas

I want to reproduce NumPy's broadcasting using pd.DataFrame, but I don't seem to be able to do it elegantly. If I don't use '.values' or a 'for' loop across keys, I get the wrong answer - please see the screenshot.
Any ideas?
Thanks.
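Without the screenshot I can only guess at the exact mismatch, but the usual cause is that pandas aligns on labels rather than broadcasting by position. A minimal sketch of the common workarounds, with made-up frame and series names:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['a', 'b'])
row = pd.Series([10, 100], index=['a', 'b'])   # one value per column
col = pd.Series([1, 2, 3])                     # one value per row

df - row                         # aligns on column labels, like a NumPy row broadcast
df.sub(col, axis=0)              # broadcast a column down the rows without .values
df.values - col.values[:, None]  # or leave pandas and broadcast purely by position

If the Series labels do not match the DataFrame's columns or index, the aligned result is full of NaN, which is typically the "wrong answer" people see.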

Related

How to create a random NumPy 2D array with full rank, or a particular rank?

In NumPy, I can use the random package to create a 2D array, but I cannot make sure it has full rank or a particular rank. How do I get that? In the case of full rank, I can use linalg.matrix_rank to check, but I want to ensure it in a simpler way. A solution for the case of a square matrix is welcome.
Edit: When I asked this question, I was thinking of a built-in solution in NumPy. But if there is no such solution in NumPy, SciPy (or other Python modules), the situation here may be as in @hpaulj's comment: "But at some level this sounds more like a theoretical math topic than a Numpy programming one." As a math topic, the question becomes: find an algorithm to generate a random matrix (uniformly distributed) that has a particular rank. So whether to close this question or keep it open to wait for nice algorithms in Python is the admins' choice.
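There is no built-in for this as far as I know, but a common construction is to multiply an m×r factor by an r×n factor: the product has rank r with probability 1. A minimal sketch (note the entries of the product are not uniformly distributed, so this only covers the "particular rank" part, not the distribution):

import numpy as np

def random_matrix_with_rank(m, n, r, rng=None):
    # Product of an (m, r) and an (r, n) random factor has rank r almost surely.
    rng = np.random.default_rng() if rng is None else rng
    return rng.random((m, r)) @ rng.random((r, n))

A = random_matrix_with_rank(6, 4, 2)
print(np.linalg.matrix_rank(A))   # expected: 2

For full rank of an n×n matrix, r = n works the same way, and matrix_rank remains the straightforward check.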

Create an octree with a bottom-up approach in PySpark

I have to create an octree in parallel on a PySpark DataFrame, and I have to build it with a bottom-up approach. I am not able to clearly understand the bottom-up approach. Can someone briefly point me in the right direction on what I need to start with? So far I have divided all points into 8 chunks.
Please help.
Have you tried spark3D? Information on Octree partitioning in Python is here.
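To make "bottom-up" concrete, here is a plain-NumPy sketch (my own illustration, not spark3D and not distributed) of the usual idea: give every point a cell index at the finest level, then repeatedly merge groups of 8 sibling cells into their parent by integer-dividing each index by 2. In PySpark the same two steps map naturally onto a withColumn for the leaf keys and one groupBy/agg per level; all names below are made up.

import numpy as np

points = np.random.rand(1000, 3)    # hypothetical point cloud inside the unit cube
max_depth = 4                       # finest level: 2**max_depth cells along each axis

# Bottom level: integer cell index of each point along x, y and z.
cells = np.floor(points * (2 ** max_depth)).astype(int)

# Walk upward: at each level, 8 sibling cells share the same parent (index // 2).
level_counts = {}
for depth in range(max_depth, -1, -1):
    keys, counts = np.unique(cells, axis=0, return_counts=True)
    level_counts[depth] = dict(zip(map(tuple, keys), counts))
    cells = cells // 2              # merge children into their parent cell

print(level_counts[0])              # the root cell contains all 1000 points

The octree structure itself is exactly the set of non-empty keys per level, plus the parent/child relation given by dividing indices by 2.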

TensorFlow: "The graph couldn't be sorted in topological order", but only when running in the terminal

I encountered a problem while running a Python script on a Google Cloud compute instance with Python 3.6 and TensorFlow 1.13.1. Several people on Stack Overflow have hit similar problems with loops in the computational graph, but none of those posts really pin down the culprit. I also observed something interesting, so maybe someone experienced can figure it out.
The error message is like this:
2019-05-28 22:28:57.747339: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-05-28 22:28:57.754195: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
My train.py script looks like this:
import A, B, C
...

def main():
    ...

if __name__ == '__main__':
    main()
Here are the two ways I run this script:
VERSION 1:
In terminal,
python3 train.py
This gives me the error I stated above. When I use only a CPU, I notice it also throws something like failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected. So I added a GPU to my instance, but the loop in the computational graph is still there.
VERSION 2 (this is where the weird thing happens):
I simply copy the code in main, with nothing changed, into a Jupyter notebook and run it there. Suddenly, no error occurs anymore.
I don't really know what's going on under the hood. I just notice that the message printed at the beginning of the run differs between the two ways of running the code.
If you encounter the same problem, copying the code into a Jupyter notebook might help directly. I would really like to share more info if someone has any idea what might possibly cause this. Thank you!
Well, it turns out that, regardless of how I ran it, I had chosen a wrong way to build the graph in the first place, even though from my perspective it should not have produced a loop. The loop error did tip me off that I was doing something wrong. The interesting question I mentioned above is still unanswered, but I'd like to share my mistake so that anyone who sees the loop error can check whether they are doing the same thing as I was.
In my input_fn, I used tensor.eval() to get the corresponding numpy.array mid-pipeline so I could interact with data outside that function. I chose not to use tf.data.Dataset because the whole process is complicated and I couldn't compress it into a Dataset directly. But it turns out this approach sabotages TensorFlow's static computational graph design, so during training it trains on the same batch again and again. My two cents: if you want to do something very complex in your input_fn, you are likely better off (or only doing the right thing) with the old-fashioned modelling style, tf.placeholder.
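For reference, a minimal sketch of the placeholder-style feeding the answer recommends, using the TF 1.x API from the question; the model and the make_batch helper are made up and stand in for whatever complex Python code produces your batches:

import numpy as np
import tensorflow as tf  # TF 1.x, as in the question (1.13)

def make_batch(batch_size=32, n_features=10):
    # Hypothetical stand-in for arbitrary Python/NumPy preprocessing.
    x = np.random.rand(batch_size, n_features).astype(np.float32)
    y = x.sum(axis=1, keepdims=True)
    return x, y

x_ph = tf.placeholder(tf.float32, shape=[None, 10])
y_ph = tf.placeholder(tf.float32, shape=[None, 1])
pred = tf.layers.dense(x_ph, 1)
loss = tf.losses.mean_squared_error(y_ph, pred)
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(100):
        xb, yb = make_batch()   # any amount of Python work can happen here
        _, l = sess.run([train_op, loss], feed_dict={x_ph: xb, y_ph: yb})

Because each batch arrives through feed_dict, no tensor.eval() is needed inside the graph, and the graph itself stays static.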

Visualizing a large data series

I have a seemingly simple problem, but an easy solution is eluding me. I have a very large series (tens or hundreds of thousands of points), and I just need to visualize it at different zoom levels, but generally zoomed well out. Basically, I want to plot it in a tool like MATLAB or pyplot, but knowing that each pixel can't represent the potentially many hundreds of points that map to it, I'd like to see both the min and the max of all the array entries that map to a pixel, so that I can generally understand what's going on. Is there a simple way of doing this?
Try hexbin. By setting the reduce_C_function I think you can get what you want. Ex:
import matplotlib.pyplot as plt
import numpy as np

# x, y are your coordinates and C the value aggregated per bin, i.e. C = f(x, y)
plt.hexbin(x, y, C=C, reduce_C_function=np.max)
plt.show()
would give you a hexagonal heatmap where the color in the pixel is the maximum value in the bin.
If you only want to bin in one direction, see this method.
The first option you may want to try is Gephi: https://gephi.org/
Here is another option, though I'm not quite sure it will work. It's hard to say without seeing the data.
Try going to this link: http://bl.ocks.org/3887118. Do you see, toward the bottom of the page, data.tsv with all of the values? If you can save your data to resemble this, then the HTML code from that example should be able to build your data into the scatter plot shown at that link.
Otherwise, try visiting this link to fashion your data into a more appropriate web page.
There is a set of research tools called TimeSearcher 1-3 that provides some examples of how to deal with large time-series datasets (the original answer included example screenshots from TimeSearcher 2 and 3).
I realized that a simple plot() in MATLAB actually gives me more or less what I want. When zoomed out, it renders all of the data points that map to a pixel column as a vertical line segment from the minimum to the maximum within that set, so as not to obscure the function's actual behavior. I used area() to increase the contrast.
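On the pyplot side, the same min/max-per-pixel-column idea can be done by hand. A minimal sketch, assuming a 1-D series y and roughly one bin per pixel column (the bins value is my own choice):

import numpy as np
import matplotlib.pyplot as plt

y = np.cumsum(np.random.randn(200_000))    # example series with ~200k points
bins = 1000                                # roughly one bin per pixel column
n = len(y) // bins * bins                  # trim so the series reshapes evenly
chunks = y[:n].reshape(bins, -1)

x = np.arange(bins)
plt.fill_between(x, chunks.min(axis=1), chunks.max(axis=1))  # min/max envelope per bin
plt.show()

This keeps the plot fast and faithful to the extremes, no matter how many points fall into each column.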

Plotting data in real time

I have a program which outputs a number to the terminal, one line at a time.
My goal is to have something else read these numbers and graph them as a line plot in real time. matplotlib and wxPython have been suggested, but I'm not sure how to go about implementing them.
See the following links:
What is the best real time plotting widget for wxPython?
Minimalistic Real-Time Plotting in Python
http://eli.thegreenplace.net/2008/08/01/matplotlib-with-wxpython-guis/
http://wxpython-users.1045709.n5.nabble.com/real-time-data-plots-td2344816.html
As some of those point out, you might be able to use wx's PyPlot for something really simple or use Chaco.
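If you just want something quick with plain matplotlib, here is a minimal sketch of my own (not taken from the links above) that pipes the producing program into a Python script and redraws a line plot as each number arrives; interactive mode is only one of several ways to do this:

# live_plot.py  --  run as:  your_program | python live_plot.py
import sys
import matplotlib.pyplot as plt

plt.ion()                          # interactive mode: redraw without blocking
fig, ax = plt.subplots()
line, = ax.plot([], [])
values = []

for raw in sys.stdin:              # one number per line from the piped program
    try:
        values.append(float(raw))
    except ValueError:
        continue                   # skip lines that are not numbers
    line.set_data(range(len(values)), values)
    ax.relim()
    ax.autoscale_view()
    plt.pause(0.01)                # give the GUI event loop time to redraw

For anything fancier (a fixed window, multiple series, higher update rates), matplotlib's FuncAnimation or the wx/Chaco options above scale better.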
I really like this library for HTML5 graphing. Here is a demo of real-time updates: http://dygraphs.com/gallery/#g/dynamic-update
Are you simply asking for recommendations on plotting libs?