Repeat elements from one array based on another - numpy

Given
import numpy as np
a = np.array([1,2,3,4,5,6,7,8])
b = np.array(['a','b','c','d','e','f','g','h'])
c = np.array([1,1,1,4,4,4,8,8])
where a and b 'correspond' to each other element by element, how can I use c to index b and get d, which 'corresponds' to c:
d = np.array(['a','a','a','d','d','d','h','h'])
I know how to do this by looping:
d = np.empty_like(b)
for n in range(c.shape[0]):
    d[n] = b[np.argmax(a == c[n])]
but I want to know whether I can do this without loops.
Thanks in advance!

Since your a is just position + 1 (a[i] == i + 1), you can simply use:
In [33]: b[c - 1]
Out[33]: array(['a', 'a', 'a', 'd', 'd', 'd', 'h', 'h'], dtype='<U1')
I'm tempted to leave it at that, but your example a isn't general enough to distinguish this from the argmax approach.
But we can test all a against all c with:
In [36]: a[:,None]==c
Out[36]:
array([[ True, True, True, False, False, False, False, False],
[False, False, False, False, False, False, False, False],
[False, False, False, False, False, False, False, False],
[False, False, False, True, True, True, False, False],
[False, False, False, False, False, False, False, False],
[False, False, False, False, False, False, False, False],
[False, False, False, False, False, False, False, False],
[False, False, False, False, False, False, True, True]])
In [37]: (a[:,None]==c).argmax(axis=0)
Out[37]: array([0, 0, 0, 3, 3, 3, 7, 7])
In [38]: b[_]
Out[38]: array(['a', 'a', 'a', 'd', 'd', 'd', 'h', 'h'], dtype='<U1')
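The a[:,None]==c comparison builds a len(a) × len(c) boolean array, which can get large. A minimal sketch of a sort-based alternative (np.argsort plus np.searchsorted, a swapped-in technique rather than part of the answer above), assuming every value in c actually occurs in a; the helper name lookup is hypothetical. A stable sort keeps equal values in their original order, so with duplicates in a it picks the first occurrence, just like argmax:

```python
import numpy as np

def lookup(a, b, c):
    """Return b[i] for each value in c, where i is the first position of that value in a.

    Assumes every element of c occurs somewhere in a.
    """
    order = np.argsort(a, kind='stable')          # positions that would sort a, ties kept in original order
    pos = np.searchsorted(a, c, sorter=order)     # leftmost insertion point of each c in sorted a
    return b[order[pos]]                          # map sorted positions back to original positions

a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
b = np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
c = np.array([1, 1, 1, 4, 4, 4, 8, 8])
d = lookup(a, b, c)
print(d)  # ['a' 'a' 'a' 'd' 'd' 'd' 'h' 'h']
```

This costs O((len(a) + len(c)) log len(a)) instead of O(len(a) · len(c)). If some value of c is missing from a, the result is silently wrong (or raises IndexError), so validate first if that can happen.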

Related

How can I get the row of the first True found in a numpy matrix?

I have the following matrix defined:
d = np.array(
[[False, False, False, False, False, True],
[False, False, False, False, False, True],
[False, False, False, False, True, True],
[False, False, False, False, True, True],
[False, False, False, True, True, True],
[False, False, False, True, True, True],
[False, False, True, True, True, True],
[False, False, True, True, True, True],
[False, True, True, True, True, True],
[False, True, True, True, True, True],
[ True, True, True, True, True, True],
[ True, True, True, True, True, True],
[ True, True, True, True, True, True],
[False, True, True, True, True, True],
[False, False, True, True, True, True],
[False, False, False, True, True, True],
[False, False, False, False, True, True],
[False, False, False, False, False, True],
[False, False, False, False, True, True],
[False, False, False, True, True, True],
[False, False, True, True, True, True],
[False, True, True, True, True, True],
[ True, True, True, True, True, True]])
And I would like to get a vector of length 6 containing the index of the first True occurrence in each column.
So the expected output would be:
fo = np.array([10, 8, 6, 4, 2, 0])
If a column contains no True values, it should ideally return NaN for that column.
I have tried:
np.sum(d, axis=0)
array([ 4, 8, 12, 16, 20, 23])
Together with the column length, this would give the index, but only if each column consisted of exactly two contiguous regions: one of False values followed by one of True values.
You can do this with argmax, which finds the first True in each column, and then detect the columns that are entirely False and correct the result for them, because argmax also returns 0 for an all-False column. For example, if the first column were all False:
# argmax reports 0 for an all-False column too, so a mask is needed to tell it apart
ini = np.argmax(d == 1, 0)  # [0 8 6 4 2 0]; to fill with NaN instead, convert first with .astype(float) (or .astype(object))
sec = (d == 0).all(0)       # columns that are entirely False
ini[sec] = 1000             # sentinel value meaning "no True in this column"
# [1000 8 6 4 2 0]
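A minimal sketch of the NaN variant the question asks for, assuming a float result is acceptable (NaN cannot be stored in an integer array, so the indices are converted to float). The function name first_true_index and the small test matrix are hypothetical:

```python
import numpy as np

def first_true_index(mat):
    """Row index of the first True in each column of a 2-D boolean array, NaN where a column has no True."""
    idx = np.argmax(mat, axis=0).astype(float)   # argmax returns the first maximum, i.e. the first True
    idx[~mat.any(axis=0)] = np.nan               # all-False columns would otherwise report 0
    return idx

m = np.array([[False, False, True],
              [False, True,  True],
              [False, True,  False]])
print(first_true_index(m))  # column 0 has no True (NaN), then 1 and 0
```

Applied to the d from the question, this gives 10, 8, 6, 4, 2, 0 (as floats).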
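First, we can iterate through the columns of the array (the rows of d.T). Then we check whether True is in the column we are looking at; if so, we use list.index() to find its position. NumPy arrays have no .index() method, so each column is converted to a list first.
import numpy as np
index_list = []
for column in d.T:                 # iterate over columns, not rows
    col = column.tolist()          # ndarray has no .index(), a list does
    if True in col:
        index_list.append(col.index(True))
    else:
        index_list.append(np.nan)  # no True in this column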

Choropleth Plotly calculation & dropdown

I'm trying to make a simple choropleth map of a branch network in a country with Plotly Express. My aim is to create a map that shows the total fee amount by city and to be able to break it down by fee name. When I run the code I can see the map, and it is colorized, but I can't see the sum and couldn't manage to get it. I also wasn't able to break it down by fee type. Any suggestions?
I've searched the forums but couldn't find any answers. I'm also a beginner and built my code from exercises I found on the internet.
Thanks in advance
from urllib.request import urlopen
import json
with urlopen("https://raw.githubusercontent.com/Babolius/project/62fef3b31fa9e34afb055e493de107d89a50a889/tr-cities-utf8.json") as response:
id = json.load(response)
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/Babolius/project/62fef3b31fa9e34afb055e493de107d89a50a889/komisyon5.csv",encoding ='utf8', dtype={"Fee": int})
import plotly.express as px
fig = px.choropleth_mapbox(df, geojson= id, locations='Id', color= "Fee",
color_continuous_scale="Viridis",
range_color=(0, 5000),
mapbox_style="carto-darkmatter",
zoom=3, center = {"lat": 41.0902, "lon": 28.7129},
opacity=0.5,
)
dropdown_buttons =[{'label': 'A', 'method' : 'restyle', 'args': [{'visible': [True, False, False]}, {'title': 'A'}]},
{'label': 'B', 'method' : 'restyle', 'args': [{'visible': [False, True, False]}, {'title': 'B'}]},
{'label': 'C', 'method' : 'restyle', 'args': [{'visible': [True, False, True]}, {'title': 'C'}]}]
fig.update_layout({'updatemenus':[{'type': 'dropdown', 'showactive': True, 'active': 0, 'buttons': dropdown_buttons}]})
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
I found the solution; I'm posting the answer for people who might have the same problem. The problem was not in the code but in the data: if you are using Plotly Express, your CSV has to have the different category information in each line, so you need to use "long data".
I've adjusted the CSV and updated the dropdown, and it worked just fine.
This is the post that helped me to solve my problem (https://towardsdatascience.com/visualization-with-plotly-express-comprehensive-guide-eb5ee4b50b57)
from urllib.request import urlopen
import json
with urlopen("https://raw.githubusercontent.com/Babolius/project/62fef3b31fa9e34afb055e493de107d89a50a889/tr-cities-utf8.json") as response:
id = json.load(response)
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/Babolius/project/main/komisyon5.csv",encoding ='utf8', dtype={"Toplam": int})
df.groupby(['ID']).sum()  # note: the result is not assigned, so this line leaves df unchanged
import plotly.express as px
fig = px.choropleth_mapbox(df, geojson= id, locations= 'ID', color= "Toplam",
color_continuous_scale="Viridis",
range_color=(0, 1000000),
mapbox_style="carto-darkmatter",
zoom=3, center = {"lat": 41.0902, "lon": 28.7129},
opacity=0.5,
)
# One button per column. The 'update' method takes [trace changes, layout changes],
# so each button swaps the z values of the single choropleth trace and sets the title.
dropdown_buttons = [{'label': col, 'method': 'update', 'args': [{'z': [df[col]]}, {'title': col}]}
                    for col in ['A', 'B', 'C', 'D', 'E', 'Toplam']]
fig.update_layout({'updatemenus':[{'type': 'dropdown', 'showactive': True, 'active': 0, 'buttons': dropdown_buttons}]})
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
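The fix above came down to the shape of the data. As a hedged illustration of the two shapes involved (the column names ID, A, B and all numbers below are made up, not taken from komisyon5.csv), pandas can compute a per-city total with DataFrame.sum and convert between wide and long format with DataFrame.melt:

```python
import pandas as pd

# Hypothetical wide-format data: one row per city, one column per fee type
wide = pd.DataFrame({'ID': [1, 2, 3],
                     'A': [100, 250, 0],
                     'B': [50, 0, 75]})

fee_cols = ['A', 'B']
wide['Toplam'] = wide[fee_cols].sum(axis=1)   # total fee per city

# Long format: one row per (city, fee type) pair
long = wide.melt(id_vars='ID', value_vars=fee_cols, var_name='FeeType', value_name='Fee')

# Summing the long table per city gives the same totals back
totals = long.groupby('ID', as_index=False)['Fee'].sum()
print(totals)
```

The wide table is what the dropdown above indexes column by column; the long table is convenient when you want to filter by FeeType before plotting.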

Do operation based on a current event index, and the index of the next event of another column [pandas]

OK, this one is hard to describe, so I will just put together an example:
import pandas as pd
df = pd.DataFrame({'event_a': [False, True, False, False, False, True, False, False, False, True, False],
                   'event_b': [False, False, False, True, False, False, False, False, True, False, False],
                   'value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]})
Here we have two event columns and a value column. The events always alternate (event a and event b never occur at the same index, and the same event never occurs twice in a row without the other event in between).
The specific operation I want to perform is abs(next_value / current_value - 1), where next_value is the value at the next event of either type.
Given this, the output for this example should be:
output = [na, 1, na, 0.5, na, 0.5, na, na, 0.111, na, na]
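For example, row 2 (index 1) is abs(4 / 2 - 1) = 1, where 4 is the value at the next event and 2 is the value at the current event.
Try:
import numpy as np
cond = df[['event_a', 'event_b']].any(axis=1)  # rows where either event occurs
output = np.full(cond.size, np.nan)
# among the event rows only, shift(-1) brings up the value of the next event
output[cond] = (df.loc[cond, 'value'].shift(-1) / df.loc[cond, 'value']).subtract(1).abs()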
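A self-contained check of the same idea against the expected output from the question, wrapped in a helper whose name, next_event_change, is hypothetical. It keeps the result as a pandas Series so that assignment aligns on the index; the last event has no following event, so it stays NaN:

```python
import numpy as np
import pandas as pd

def next_event_change(df):
    """abs(next_value / current_value - 1) at each event row, NaN elsewhere."""
    cond = df[['event_a', 'event_b']].any(axis=1)   # rows where either event occurs
    vals = df.loc[cond, 'value']                     # values at event rows only
    out = pd.Series(np.nan, index=df.index)
    out.loc[cond] = (vals.shift(-1) / vals - 1).abs()  # shift(-1) within event rows = next event's value
    return out

df = pd.DataFrame({'event_a': [False, True, False, False, False, True, False, False, False, True, False],
                   'event_b': [False, False, False, True, False, False, False, False, True, False, False],
                   'value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]})
print(next_event_change(df).round(3).tolist())
# [nan, 1.0, nan, 0.5, nan, 0.5, nan, nan, 0.111, nan, nan]
```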

Puzzled by TensorFlow nn.in_top_k output

I modified the TensorFlow convnet tutorial to train just two classes.
Then I evaluated the model using cifar10_eval.py.
I tried to understand the output of tf.nn.in_top_k (line 128 of cifar10_eval.py):
top_k_op = tf.nn.in_top_k(logits, labels, 1)
which is printed out as:
in_top_k output:::
[array([ True, False, True, False, True, True, True, True, True, True], dtype=bool)]
while the true labels (two classes, 10 images) are:::
[0 1 1 1 1 1 1 1 1 0]
and the logits are:::
[[ 1.45472026 -1.46666598]
[-1.0181191 1.03441548]
[-1.02658665 1.04306769]
[-1.19205511 1.21065331]
[-1.22167087 1.24064851]
[-0.89583808 0.91119087]
[-0.17517655 0.18206072]
[-0.09379113 0.09957675]
[-1.05578279 1.07254183]
[ 0.73048806 -0.73411369] ]
Question: Why are the second and fourth nn.in_top_k() outputs False instead of True?
It shouldn't happen.
I evaluated the example you gave and got:
In [6]: top_k_op = tf.nn.in_top_k(logits, labels, 1)
In [7]: top_k_op.eval()
Out[7]: array([ True, True, True, True, True, True, True, True, True, True], dtype=bool)
By the way, you can replace in_top_k(A, B, 1) with a simple argmax comparison (tf.argmax returns int64, so the labels are cast to match):
In [14]: tf.equal(tf.argmax(logits, 1), tf.cast(labels, tf.int64)).eval()
Out[14]: array([ True, True, True, True, True, True, True, True, True, True], dtype=bool)
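To double-check the expected result without a TensorFlow session, here is a minimal NumPy sketch of what in_top_k(predictions, targets, k) is documented to compute: the target class counts as "in the top k" when fewer than k classes have a strictly larger logit, so ties at the boundary count as in. It is an illustration only (it ignores the non-finite-prediction case), not TensorFlow's implementation:

```python
import numpy as np

def in_top_k(predictions, targets, k):
    """NumPy illustration of tf.nn.in_top_k semantics for a batch of logits."""
    predictions = np.asarray(predictions)
    targets = np.asarray(targets)
    target_logits = predictions[np.arange(len(targets)), targets]   # logit of the true class per row
    n_larger = (predictions > target_logits[:, None]).sum(axis=1)   # classes strictly ahead of it
    return n_larger < k

logits = np.array([[ 1.45472026, -1.46666598],
                   [-1.0181191 ,  1.03441548],
                   [-1.02658665,  1.04306769],
                   [-1.19205511,  1.21065331],
                   [-1.22167087,  1.24064851],
                   [-0.89583808,  0.91119087],
                   [-0.17517655,  0.18206072],
                   [-0.09379113,  0.09957675],
                   [-1.05578279,  1.07254183],
                   [ 0.73048806, -0.73411369]])
labels = np.array([0, 1, 1, 1, 1, 1, 1, 1, 1, 0])
print(in_top_k(logits, labels, 1))   # all True for these logits and labels
```

With k=1 and no ties this reduces to argmax(logits, 1) == labels, which is why every entry should be True for the data in the question.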

numpy.genfromtxt cannot read boolean data correctly

No matter what the input value is, np.genfromtxt always returns False.
With dtype='u1' I get 1 as expected, but with dtype='b1' (NumPy's bool) I get False.
I don't know if this is a bug or not, but so far, I've been able to get dtype=bool to work (without an explicit converter) only if the file contains the literal strings 'False' and 'True':
In [21]: bool_lines = ['False,False', 'False,True', 'True,False', 'True,True']
In [22]: genfromtxt(bool_lines, delimiter=',', dtype=bool)
Out[22]:
array([[False, False],
[False, True],
[ True, False],
[ True, True]], dtype=bool)
If your data is 0s and 1s, you can read it as integers and then convert to bool:
In [26]: bits = ['0,0', '0,1', '1,0', '1,1']
In [27]: genfromtxt(bits, delimiter=',', dtype=np.uint8).astype(bool)
Out[27]:
array([[False, False],
[False, True],
[ True, False],
[ True, True]], dtype=bool)
Or you can use a converter for each column
In [28]: cnv = lambda s: bool(int(s))
In [29]: converters = {0: cnv, 1: cnv}
In [30]: genfromtxt(bits, delimiter=',', dtype=bool, converters=converters)
Out[30]:
array([[False, False],
[False, True],
[ True, False],
[ True, True]], dtype=bool)