What's the equivalent of Python's list[3:7] in REBOL or Red? - rebol

With Rebol pick I can only get one element:
list: [1 2 3 4 5 6 7 8 9]
pick list 3
In Python, one can get a whole sub-list with
list[3:7]

AT seeks to a position in a list.
COPY copies from a position to the end of the list by default.
The /PART refinement of COPY lets you limit how much is copied.
Passing an integer to /PART specifies how many items to copy:
>> list: [1 2 3 4 5 6 7 8 9]
>> copy/part (at list 3) 5
== [3 4 5 6 7]
If you instead pass a series position as the end, COPY copies up to but not including that position, so for an inclusive range you have to pass the position one past the last item:
>> copy/part (at list 3) (next at list 7)
== [3 4 5 6 7]
There have been some proposals for range dialects, though I can't find any offhand. Here is some simple code to give an idea:
range: func [list [series!] spec [block!] /local start end] [
    if not parse spec [
        set start integer! '.. set end integer!
    ][
        do make error! "Bad range spec, expected e.g. [3 .. 7]"
    ]
    copy/part (at list start) (next at list end)
]
>> list: [1 2 3 4 5 6 7 8 9]
>> range list [3 .. 7]
== [3 4 5 6 7]
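For comparison with the Python slice from the question: Python counts from zero and excludes the end position, while the Rebol examples above count from one and include the end. A minimal sketch of the mapping follows; `inclusive_range` is a hypothetical helper name, not a built-in.

```python
def inclusive_range(items, start, end):
    """Return items start..end, counting from 1 and including both ends.

    Hypothetical helper mirroring `copy/part (at list start) (next at list end)`.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # 1-based inclusive [start, end] is the 0-based half-open slice [start-1, end)
    return items[start - 1:end]


numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(inclusive_range(numbers, 3, 7))  # [3, 4, 5, 6, 7]
print(numbers[3:7])                    # [4, 5, 6, 7]
```

So the Python expression list[3:7] from the question selects 4 through 7 on this data, one item fewer than the Rebol results shown here.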

>> list: [1 2 3 4 5 6 7 8 9]
== [1 2 3 4 5 6 7 8 9]
>> copy/part skip list 2 5
== [3 4 5 6 7]
So, you can skip to the right location in the list, and then copy as many consecutive members as you need.
If you want an equivalent function, you can write your own.

Related

Find common values within groupby in pandas Dataframe based on two columns

I have following dataframe:
period symptoms recovery
1 4 2
1 5 2
1 6 2
2 3 1
2 5 2
2 8 4
2 12 6
3 4 2
3 5 2
3 6 3
3 8 5
4 5 2
4 8 4
4 12 6
I'm trying to find the values that the df['period'] groups (1, 2, 3, 4) have in common, based on the values of the two columns 'symptoms' and 'recovery'.
The result should be:
symptoms recovery period
5 2 [1, 2, 3, 4]
8 4 [2, 4]
where each distinct pair of values from the two columns is listed with the periods in which it occurs, as a list or a column.
Am I approaching the problem in the wrong way? I'd appreciate your help.
I tried turning each period into a dict and looping through to find the values, but that didn't work for me. I also tried groupby().apply(), but I didn't get a meaningful dataframe.
I tried sorting the values based on the 3 columns, but couldn't get the ones common to each period section.
Last attempt :
df2 = df[['period', 'how_long', 'days_to_ex']].copy()
#s = df.groupby(["period", "symptoms", "recovery"]).size()
s = df.groupby(["symptoms", "recovery"]).size()
You were almost there:
from io import StringIO
import pandas as pd
# setup sample data
data = StringIO("""
period;symptoms;recovery
1;4;2
1;5;2
1;6;2
2;3;1
2;5;2
2;8;4
2;12;6
3;4;2
3;5;2
3;6;3
3;8;5
4;5;2
4;8;4
4;12;6
""")
df = pd.read_csv(data, sep=";")
# collect the periods for each (symptoms, recovery) pair
df.groupby(['symptoms','recovery'])[['period']].agg(list).reset_index()
This gives
symptoms recovery period
0 3 1 [2]
1 4 2 [1, 3]
2 5 2 [1, 2, 3, 4]
3 6 2 [1]
4 6 3 [3]
5 8 4 [2, 4]
6 8 5 [3]
7 12 6 [2, 4]
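The grouped output above still includes pairs that occur in a single period. One way to narrow it down, sketched here under the assumption that "common" means "occurs in at least two periods" (the `min_periods` name is illustrative and not from the question):

```python
import pandas as pd

df = pd.DataFrame({
    "period":   [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4],
    "symptoms": [4, 5, 6, 3, 5, 8, 12, 4, 5, 6, 8, 5, 8, 12],
    "recovery": [2, 2, 2, 1, 2, 4, 6, 2, 2, 3, 5, 2, 4, 6],
})

# list the periods in which each (symptoms, recovery) pair occurs
grouped = (
    df.groupby(["symptoms", "recovery"])["period"]
      .agg(list)
      .reset_index()
)

# assumption: "common" means the pair occurs in at least this many periods
min_periods = 2
common = grouped[grouped["period"].map(len) >= min_periods].reset_index(drop=True)
print(common)
```

On the sample data this keeps (4, 2), (5, 2), (8, 4) and (12, 6), so two more pairs than the expected result in the question lists.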

Find pattern in pandas dataframe, reorder it row-wise, and reset index

This is a multipart problem. I have found solutions for each separate part, but when I try to combine these solutions, I don't get the outcome I want.
Let's say this is my dataframe:
df = pd.DataFrame(list(zip([1, 3, 6, 7, 7, 8, 4], [6, 7, 7, 9, 5, 3, 1])), columns = ['Values', 'Vals'])
df
Values Vals
0 1 6
1 3 7
2 6 7
3 7 9
4 7 5
5 8 3
6 4 1
Let's say I want to find the pattern [6, 7, 7] in the 'Values' column.
I can use a modified version of the second solution given here:
Pandas: How to find a particular pattern in a dataframe column?
pattern = [6, 7, 7]
pat_i = [df[i-len(pattern):i] # Get the matching rows
         for i in range(len(pattern), len(df) + 1) # for each 3 consecutive elements
         if all(df['Values'][i-len(pattern):i] == pattern)] # if the pattern matched
pat_i
[ Values Vals
2 6 7
3 7 9
4 7 5]
The only way I've found to narrow this down to just index values is the following:
pat_i = [df.index[i-len(pattern):i] # Get the index
         for i in range(len(pattern), len(df) + 1) # for each 3 consecutive elements
         if all(df['Values'][i-len(pattern):i] == pattern)] # if the pattern matched
pat_i
[RangeIndex(start=2, stop=5, step=1)]
Once I've found the pattern, what I want to do, within the original dataframe, is reorder the pattern to [7, 7, 6], moving the entire associated rows as I do this. In other words, going by the index, I want to get output that looks like this:
df.reindex([0, 1, 3, 4, 2, 5, 6])
Values Vals
0 1 6
1 3 7
3 7 9
4 7 5
2 6 7
5 8 3
6 4 1
Then, finally, I want to reset the index so that the values in all the columns stay in their new, re-ordered places:
Values Vals
0 1 6
1 3 7
2 7 9
3 7 5
4 6 7
5 8 3
6 4 1
In order to use pat_i as a basis for re-ordering, I've tried to modify the second solution given here:
Python Pandas: How to move one row to the first row of a Dataframe?
target_row = 2
# Move target row to first element of list.
idx = [target_row] + [i for i in range(len(df)) if i != target_row]
However, I can't figure out how to exploit the pat_i RangeIndex object to use it with this code. The solution, when I find it, will be applied to hundreds of dataframes, each one of which will contain the [6, 7, 7] pattern that needs to be re-ordered in one place, but not the same place in each dataframe.
Any help appreciated...and I'm sure there must be an elegant, pythonic way of doing this, as it seems like it should be a common enough challenge. Thank you.
I just sort of rewrote your code. I held the first and last indexes to the side, reordered the indexes of interest, and put everything together in a new index. Then I just use the new index to reorder the data.
import pandas as pd
from pandas import RangeIndex
df = pd.DataFrame(list(zip([1, 3, 6, 7, 7, 8, 4], [6, 7, 7, 9, 5, 3, 1])), columns = ['Values', 'Vals'])
pattern = [6, 7, 7]
new_order = [1, 2, 0] # new order of pattern
for i in list(df[df['Values'] == pattern[0]].index):
    if all(df['Values'][i:i+len(pattern)] == pattern):
        pat_i = df[i:i+len(pattern)]
front_ind = list(range(0, pat_i.index[0]))
back_ind = list(range(pat_i.index[-1]+1, len(df)))
pat_ind = [pat_i.index[i] for i in new_order]
new_ind = front_ind + pat_ind + back_ind
df = df.loc[new_ind].reset_index(drop=True)
df
Out[82]:
Values Vals
0 1 6
1 3 7
2 7 9
3 7 5
4 6 7
5 8 3
6 4 1
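Since the question mentions applying this to hundreds of dataframes, the same steps could be wrapped in a function. The following is only a sketch: `reorder_pattern` and its parameter names are hypothetical, it works by position rather than by index label, and it assumes only the first match needs reordering.

```python
import pandas as pd


def reorder_pattern(df, column, pattern, new_order):
    """Reorder the rows of the first match of `pattern` in df[column].

    Hypothetical helper; assumes the pattern occurs in consecutive rows.
    Returns a new frame with a fresh 0..n-1 index.
    """
    values = df[column].tolist()
    n = len(pattern)
    # scan every window of len(pattern) consecutive values, including the last
    for start in range(len(values) - n + 1):
        if values[start:start + n] == list(pattern):
            positions = list(range(len(values)))
            # permute only the positions covered by the match
            positions[start:start + n] = [start + k for k in new_order]
            return df.iloc[positions].reset_index(drop=True)
    raise ValueError("pattern not found")


df = pd.DataFrame({"Values": [1, 3, 6, 7, 7, 8, 4],
                   "Vals":   [6, 7, 7, 9, 5, 3, 1]})
result = reorder_pattern(df, "Values", [6, 7, 7], new_order=[1, 2, 0])
print(result)
```

Raising an error when the pattern is missing is a design choice for this sketch; returning the frame unchanged would be an equally reasonable alternative.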

How to slice continuous and discontinuous index in pandas?

pandas iloc can slice a dataframe in two ways, such as df.iloc[:,2:5] and df.iloc[:,[6,10]].
If I want to select columns 2:5, 6 and 10 together, how do I use iloc to slice df?
Use numpy.r_:
From the docs:
Translates slice objects to concatenation along the first axis.
This is a simple way to build up arrays quickly. There are two use cases.
If the index expression contains comma separated arrays, then stack them along their first axis.
If the index expression contains slice notation or scalars then create a 1-D array with a range indicated by the slice notation.
Demo:
In [16]: df = pd.DataFrame(np.random.rand(3, 12))
In [17]: df.iloc[:, np.r_[2:5, 6, 10]]
Out[17]:
2 3 4 6 10
0 0.760201 0.378125 0.707002 0.310077 0.375646
1 0.770165 0.269465 0.419979 0.218768 0.832087
2 0.253142 0.737015 0.652522 0.474779 0.094145
In [18]: df
Out[18]:
0 1 2 3 4 5 6 7 8 9 10 11
0 0.668062 0.581268 0.760201 0.378125 0.707002 0.249094 0.310077 0.336708 0.847258 0.705631 0.375646 0.830852
1 0.521096 0.798405 0.770165 0.269465 0.419979 0.455890 0.218768 0.833776 0.862483 0.817974 0.832087 0.958174
2 0.211815 0.747482 0.253142 0.737015 0.652522 0.274231 0.474779 0.256119 0.110760 0.224096 0.094145 0.525201
UPDATE: starting from Pandas 0.20.1 the .ix indexer is deprecated, in favor of the more strict .iloc and .loc indexers.
So I updated my answer in order to fix that deprecated feature: changed .ix[] --> df.iloc[...]
I think you need numpy.r_ to concatenate the indices and then iloc to select by position:
import numpy as np
import pandas as pd

ds = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'C':[7,8,9],
                   'D':[1,3,5],
                   'E':[5,3,6],
                   'F':[7,4,3],
                   'G':[1,3,5],
                   'H':[5,3,6],
                   'I':[4,4,3],
                   'J':[6,4,3],
                   'K':[9,4,3]})
print (ds)
A B C D E F G H I J K
0 1 4 7 1 5 7 1 5 4 6 9
1 2 5 8 3 3 4 3 3 4 4 4
2 3 6 9 5 6 3 5 6 3 3 3
print (np.r_[2:5, 6,10])
[ 2 3 4 6 10]
print (ds.iloc[:, np.r_[2:5, 6,10]])
C D E G K
0 7 1 5 1 9
1 8 3 3 3 4
2 9 5 6 5 3
On the ix vs iloc discussion:
The main problem is that ix will be deprecated in pandas 0.20.0. That release seems to be coming soon, in April, so it is better to use iloc.
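In these answers np.r_ only serves to build an array of integer positions, so a plain Python list should work with iloc as well. A small sketch, using a made-up 3 x 12 frame rather than the data above:

```python
import pandas as pd

# made-up frame with 12 integer-labelled columns, for illustration only
df = pd.DataFrame([[col * 10 + row for col in range(12)] for row in range(3)])

# positions 2, 3, 4 from the range, plus 6 and 10
cols = list(range(2, 5)) + [6, 10]
subset = df.iloc[:, cols]
print(subset.columns.tolist())  # [2, 3, 4, 6, 10]
```

np.r_ stays the more compact spelling when several slices have to be combined.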

intersect(A,B) returns the data with no repetitions

I was using "intersect" in my Matlab code where I want the following:
A = [ 4 1 1 2 3];
[B] = sort(A, 'ascend'); % so that B is sorting A in ascending order, so I got B = [1 1 2 3 4]
[same,a] = intersect(B,A);
I want same = [1 1 2 3 4], but the result is same = [1 2 3 4], with the repeated '1' omitted.
I understand that intersect returns data with no repetitions; the documentation says:
C = intersect(A,B) returns the data common to both A and B with no repetitions.
I want the complete data, including the repetitions. What alternatives to "intersect" can I use?
For example:
A = [ 4 1 1 2 3];
[B] = sort(A, 'ascend'); % so that B is sorting A in ascending order, so I got B = [1 1 2 3 4]
[same,a] = intersect(B,A);
So now I want same = [1 1 2 3 4] and a = [2 3 4 5 1].
I need access to ‘a’, which holds the original indices prior to sorting, so I can use it for further processing.
Thank you very much.
Why do you need the intersect of A and B, given that B contains the same values as A?
From what you said, I think you have all the needed results in B.
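If the goal is the sorted values together with their original positions, MATLAB's sort already returns both when called with two outputs, [B, a] = sort(A), which makes intersect unnecessary. The same idea sketched in Python with NumPy (a different language from the question, used here only for illustration); the + 1 converts to MATLAB-style one-based indices.

```python
import numpy as np

A = np.array([4, 1, 1, 2, 3])

# a stable sort keeps equal values in their original relative order
order = np.argsort(A, kind="stable")
same = A[order]   # sorted values, repeats included
a = order + 1     # one-based original positions

print(same.tolist())  # [1, 1, 2, 3, 4]
print(a.tolist())     # [2, 3, 4, 5, 1]
```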

Rebol select like function returning more than just the next value?

Does this exist? If not, what's the best way to create it?
If you want to return all the values after the target value, you can use next find, e.g.:
data: copy [1 2 3 4 5 6 7 8 9]
select data 5
== 6 ;; returns the next value only.
find data 5
== [5 6 7 8 9] ;; returns the series at that point, so ...
next find data 5
== [6 7 8 9] ;; ... returns the series after that point.
If you just want the next N items, wrap it in copy/part with a length of N, e.g. (next three items):
copy/part next find data 5 3
== [6 7 8]
I'll leave it to you to add the error handling for when the value is not found, as in:
next find data 0
Use find/tail:
>> find/tail [a b c d e] 'c
== [d e]
>> find/tail [a b c d e] 'x
== none