Python Outlier Removing - pandas

I want to remove the blue dots that lie over the green-dot region.
I tried the residual-error method on the blue group, and also the z-score method on the blue group, but neither removed them.
There is a correlation between x and y.
Can anyone please share some ideas or links?

What if you create a new feature from both x and y and then filter on a threshold value?
For example:
# Make new feature
df['xy'] = df['x'] * df['y']
# Filter
df = df[~((df['xy'] < threshold_val) & (df['gender'] == 'Male'))]
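One detail worth spelling out: in Python, & binds more tightly than comparison operators such as < and ==, so each condition in a combined pandas mask needs its own parentheses. Below is a minimal sketch on made-up data. The values and the cutoff threshold_val = 6.0 are assumptions for illustration only; the column names follow the answer above.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the answer above.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 1.5, 2.5],
    "y": [2.0, 4.0, 6.0, 1.0, 2.0],
    "gender": ["Female", "Female", "Male", "Male", "Male"],
})
threshold_val = 6.0  # assumed cutoff, in practice chosen by inspecting the data

# Combined feature built from both coordinates
df["xy"] = df["x"] * df["y"]

# Each comparison is parenthesized before combining with &
is_outlier = (df["xy"] < threshold_val) & (df["gender"] == "Male")
filtered = df[~is_outlier]
```

Keeping the mask in its own variable (is_outlier) also makes it easy to count or plot the rows that would be dropped before actually dropping them.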

Related

I am looking for help to group network graph nodes by colour [duplicate]

I created my graph and everything looks great so far, but I want to update the color of my nodes after creation.
My goal is to visualize DFS: I will first show the initial graph and then color nodes step by step as DFS solves the problem.
If anyone is interested, sample code is available on GitHub.
All you need is to specify a color map, which maps a color to each node, and pass it to the nx.draw function. To clarify, for a 20-node graph I want to color the first 10 nodes blue and the rest green. The code is as follows:
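import matplotlib.pyplot as plt
import networkx as nx

G = nx.erdos_renyi_graph(20, 0.1)
# one color per node, in the order the graph iterates over its nodes
color_map = []
for node in G:
    if node < 10:
        color_map.append('blue')
    else:
        color_map.append('green')
nx.draw(G, node_color=color_map, with_labels=True)
plt.show()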
You will find the graph in the attached image.
Refer to node_color parameter:
nx.draw_networkx_nodes(G, pos, node_size=200, node_color='#00b4d9')
This has been answered before, but you can also do this:
# define color map: user node = red, book nodes = green
color_map = ['red' if node == user_id else 'green' for node in G]
nx.draw_networkx(G, pos, node_color=color_map)  # draws nodes, edges and node labels
In my case, I had 2 groups of nodes (from sklearn.model_selection import train_test_split). I wanted to change the color of each group (the default colors are awful!). It took me a while to figure out how to change it, but a Tensor can be converted to a NumPy array, and networkx draws with Matplotlib. Therefore ...
test=data.y
test=test.numpy()
test=test.astype(np.str_)
test[test == '0'] = '#C6442A'
test[test == '1'] = '#9E2AC6'
nx.draw(G, with_labels=True, node_color=test, node_size=400, font_color='whitesmoke')
Long story short: convert the Tensor to a NumPy array of string type, pick your favorite hex color codes (https://htmlcolorcodes.com/), and you are ready to go!
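For the step-by-step DFS coloring the question asks about, one possible approach is to precompute a color list per DFS step and redraw with each list in turn. This is only a sketch: the helper name dfs_color_frames, the two colors, and the toy path graph are all assumptions made for illustration. nx.dfs_preorder_nodes is the networkx function that yields nodes in DFS visiting order.

```python
import networkx as nx


def dfs_color_frames(G, source, visited_color="red", unvisited_color="lightgray"):
    """Return one color list per DFS step, ordered like the nodes in G."""
    visited = set()
    frames = []
    for node in nx.dfs_preorder_nodes(G, source=source):
        visited.add(node)
        # Rebuild the full color list so it can be passed directly as node_color
        frames.append(
            [visited_color if n in visited else unvisited_color for n in G]
        )
    return frames


G = nx.path_graph(5)  # toy graph: 0 - 1 - 2 - 3 - 4
frames = dfs_color_frames(G, source=0)
```

Each frames[i] can then be passed as node_color, for example nx.draw(G, pos, node_color=frames[i], with_labels=True), using a fixed pos such as pos = nx.spring_layout(G) so the nodes do not move between steps.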

Plot axvline from Point to Point in Matplotlib Python 3.6

I am reading data from a simulation out of an Excel file. From this data I generated two DataFrames containing 200 values. Now I want to plot all the values from the first DataFrame in blue and all the values from the second in purple. For that I have the following code:
df = pd.read_excel("###CENSORED####.xlsx", sheetname="Data")
unpatched = df["Unpatched"][:-800]
patched = df["Patched"][:-800]
x = range(0,len(unpatched))
fig = plt.figure(figsize=(10, 5))
plt.scatter(x, unpatched, zorder=10, )
plt.scatter(x, patched, c="purple",zorder=19,)
This results in the following graph:
But now I want to draw lines that visualize the difference between the blue and purple dots. I thought about an orange line going from the blue dot at simulation run x to the purple dot at simulation run x. Since I'm pretty new to matplotlib, I tried to "cheat" with the following code:
scale_factor = 300
for a in x:
    plt.axvline(a, patched[a]/scale_factor, unpatched[a]/scale_factor, c="orange")
But this resulted in an inaccuracy, as seen below:
So is there a smarter way to do this? I've realized that the axvline documentation says ymin and ymax can only be scalars. Can I somehow turn my given values into fitting scalars?
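A note on why the scaling trick drifts: axvline takes ymin and ymax as fractions of the axes height (0 to 1), not as data values. One option, sketched here on random stand-in data rather than the original Excel file, is Axes.vlines (also available as plt.vlines), which accepts arrays of x positions together with lower and upper y values in data coordinates. The variable names mirror the question; the numbers are made up.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in data (assumption): 20 simulation runs instead of the real 200
rng = np.random.default_rng(0)
unpatched = rng.uniform(100, 300, size=20)
patched = unpatched - rng.uniform(0, 80, size=20)
x = np.arange(len(unpatched))

fig, ax = plt.subplots(figsize=(10, 5))
ax.scatter(x, unpatched, zorder=10)
ax.scatter(x, patched, c="purple", zorder=19)

# One vertical segment per run, from the purple dot to the blue dot,
# given directly in data coordinates
segments = ax.vlines(x, patched, unpatched, colors="orange", zorder=5)

# plt.show()  # uncomment to display the figure interactively
```

Because vlines works in data coordinates, the segments stay attached to the dots when the axes limits change, which the fraction-based axvline cannot do.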

matplot pandas plotting multiple y values on the same column

I'm trying to plot with matplotlib, but with lines based on the value of a column that is neither x nor y.
For example this is my DF:
code reqs value
AGB 253319 57010.16528
ABC 242292 35660.58176
DCC 240440 36587.45336
CHB 172441 57825.83052
DEF 148357 34129.71166
This yields the following plot with df.plot(x='reqs',y='value',figsize=(8,4)):
What I'm looking for is a plot with multiple lines, one line for each of the codes. Right now it's drawing just 1 line and ignoring the code column.
I tried searching for an answer, but each one is asking about multiple y's. I don't have multiple y's; I have the same y but with different focuses.
(Surely I'm using the wrong terms to describe what I'm trying to do; hopefully this example and image make sense.)
The result should look something like this:
So I worked out how to do exactly that, if anyone is curious:
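import matplotlib.pyplot as plt

plt_df = df
fig, ax = plt.subplots(figsize=(20, 4))
# draw one line per distinct value in the code column, all on the same axes
for key, grp in plt_df.groupby('code'):
    grp.plot(ax=ax, kind='line', x='reqs', y='value', label=key, title="someTitle")
plt.show()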

How can I set the number of ticks in Julia using Pyplot?

I am struggling to 'translate' the instructions I find for Python to PyPlot in Julia. This must be a simple question, but do you know how to set the number of ticks in a plot in Julia using PyPlot?
If you have
x = [1,2,3,4,5]
y = [1,3,6,8,11]
you can
PyPlot.plot(x,y)
which draws the plot
and then do
PyPlot.xticks([1,3,5])
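for ticks at 1, 3 and 5 on the x-axis
PyPlot.yticks([1,6,11])
for ticks at 1, 6 and 11 on the y-axis
Tick spacing
If you want, for example, 4 evenly spaced ticks and don't mind Floats, you can do
collect(linspace(x[1], x[end], 4))
(on Julia 1.0 and later, linspace was replaced by range: collect(range(x[1], stop=x[end], length=4))).
If you need the ticks to be integers and want about 4 of them, you can do
collect(x[1]:div(x[end],4):x[end])
(the step is rounded down, so the count can differ from 4; with the x above it gives 1, 2, 3, 4, 5).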
Edit
Maybe this doesn't belong here, but at least you'll see it...
Whenever you're looking for a method that's supposed to be in a module X, you can list its names in the REPL by typing X. and pressing the TAB key.
To clarify: if you want to search a module for a method you suspect starts with an x, like xticks, type this in the REPL (terminal/shell)
PyPlot.x
and press TAB twice, and you'll see
julia> PyPlot.x
xkcd xlabel xlim xscale xticks
If you're not sure exactly how the method works (for example, its arguments) and there isn't any help available, you can call
methods(PyPlot.xticks)
to see every "version" that method has.
Bonus
The module for all the standard methods, like maximum, vcat, etc., is Base.
After some trying and searching, I found a way to do it. One can just set the number of bins that should be on each axis. Here is a minimal example:
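using PyPlot
x = range(0, stop=10, length=200)  # linspace(0, 10, 200) before Julia 1.0
y = sin.(x)  # broadcast sin over x
fig, ax = subplots()
# the ax.plot(...) dot syntax needs a recent PyCall; older versions use ax[:plot](...)
ax.plot(x, y, "r-", linewidth=2, label="sine function", alpha=0.6)
ax.legend(loc="upper center")
ax.locator_params(axis="y", nbins=4)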
The last line specifies the number of bins to use on the y-axis. Leaving the axis argument unspecified applies the setting to both axes.
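Since the question is about translating Python instructions, here is a sketch of the Python side for comparison, using the same x and y as the first answer. It shows the two approaches discussed above: explicit tick positions via set_xticks and a cap on the tick count via locator_params. The choice of 4 ticks is just the example value from the answers.

```python
import numpy as np
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 3, 6, 8, 11]

fig, ax = plt.subplots()
ax.plot(x, y)

# Explicit positions: 4 evenly spaced ticks between the first and last x
xticks = np.linspace(x[0], x[-1], 4)
ax.set_xticks(xticks)

# Automatic positions: nbins is the maximum number of intervals on the y-axis
ax.locator_params(axis="y", nbins=4)
```

In PyPlot the same calls are available on the Julia side, which is what the second answer uses.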

sorting data points on a plane for Matplotlib when they are in a random order to start with [duplicate]

This question already has answers here:
Python : 2d contour plot from 3 lists : x, y and rho?
OK... please be patient with me, as I am not really sure how to ask this question.
I am trying to generate a 2-D contour plot of some data (generated by a calculation at points on a Blender plane). The order in which I get these data points is random, but I do know the x,y coordinates for each z value. In other words I have an unsorted collection of [x,y,z] triplets.
My question is... what is the simplest way for me to mash these data points into a set of arrays that I can contour plot with Matplotlib?
This assumes that a) your data is on an evenly spaced grid and b) you have all of the grid points:
import copy
import random

import matplotlib.pyplot as plt
import numpy as np

# make some fake data
X, Y = np.meshgrid(range(10), range(10))
# zip() is lazy in Python 3, so materialize a list of tuples
xyz = list(zip(X.flat, Y.flat, np.random.rand(100)))
xyz_org = copy.copy(xyz)
# randomize the tuples
random.shuffle(xyz)
# check we changed the order
assert xyz != xyz_org
# re-sort them
xyz.sort(key=lambda x: x[-2::-1])  # sort on (y, x), the order meshgrid flattens in
# check we did it right
assert xyz == xyz_org
# extract the points and re-shape to a grid
X_n, Y_n, z = [np.array(_).reshape(10, 10) for _ in zip(*xyz)]
# check we re-created X and Y correctly
assert np.all(X_n == X)
assert np.all(Y_n == Y)
# make the plot
plt.contour(X_n, Y_n, z)
plt.show()
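The same re-sorting idea can also be sketched with NumPy arrays instead of a list of tuples, using np.lexsort and inferring the grid shape from the unique coordinates. This is a variation on the answer above, not the answer's own code, and it relies on the same two assumptions (a complete, regular grid). The small 3 x 4 grid and the z values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake data on a 3 x 4 grid, flattened into shuffled (x, y, z) rows
X, Y = np.meshgrid(np.arange(4), np.arange(3))
Z = X + 10 * Y
triplets = np.column_stack([X.ravel(), Y.ravel(), Z.ravel()])
rng.shuffle(triplets)  # shuffles the rows in place

# lexsort treats the LAST key as the primary one: sort by y, then by x
order = np.lexsort((triplets[:, 0], triplets[:, 1]))
sorted_rows = triplets[order]

# Infer the grid shape from the distinct coordinate values
n_x = np.unique(triplets[:, 0]).size
n_y = np.unique(triplets[:, 1]).size

X_n = sorted_rows[:, 0].reshape(n_y, n_x)
Y_n = sorted_rows[:, 1].reshape(n_y, n_x)
Z_n = sorted_rows[:, 2].reshape(n_y, n_x)
```

The three arrays can then go to plt.contour(X_n, Y_n, Z_n) as in the answer. If the points do not form a complete regular grid, matplotlib's plt.tricontour(x, y, z) accepts the scattered coordinates directly and may be the simpler route.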