Python: Without Itertools, how do I aggregate values in a list of lists for only the years that are included in that list?

I'm attempting to iterate over a list of lists that includes a date range of 1978-2020, but with only built-in Python modules. For instance, my nested list looks something like:
listing = [['0010', 'green', '1978', 'light'], ['0020', 'blue', '1978', 'dark'], ... ['2510', 'red', '2020', 'light']]
As I iterate, I want to count the colors and shades for each year, then append that year's totals to a new list such as:
# ['year', 'blues', 'greens', 'light', 'dark']
annual_totals = [['1978', 12, 34, 8, 16], ['1979', 14, 40, 13, 9], ... , ['2020', 48, 98, 14, 10]]
So my failed code looks something like this:
annual_totals = []
for i in range(1978, 2021):
    for line in listing:
        while i == line[2]:  # if year in list same as year in iterated range, count tally for year
            blue = 0
            green = 0
            light = 0
            dark = 0
            if line[1] == 'blue':
                blue += 1
            if line[1] == 'green':
                green += 1
            if line[3] == 'light':
                light += 1
            if line[3] == 'dark':
                dark += 1
    tally = [i, blue, green, light, dark]
    annual_totals.append(tally)
Of course, I never get out of the while loop to get a new year for iterable i.
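
One possible way around the stuck while loop, as a minimal sketch rather than a definitive answer: make a single pass over listing and keep the running counts in a dict keyed by year, so only years that actually occur get a row. Note that the years in listing are strings, so comparing them with the integers from range(1978, 2021) would never match. The function name tally_by_year and the extra sample rows are assumptions made for illustration.

```python
def tally_by_year(listing):
    """Count blue/green colors and light/dark shades for each year present."""
    totals = {}  # year (str) -> [blue, green, light, dark]
    for _code, color, year, shade in listing:
        # Create the counters the first time a year is seen.
        counts = totals.setdefault(year, [0, 0, 0, 0])
        if color == 'blue':
            counts[0] += 1
        elif color == 'green':
            counts[1] += 1
        if shade == 'light':
            counts[2] += 1
        elif shade == 'dark':
            counts[3] += 1
    # One row per year that occurs in the data, in ascending order.
    return [[year] + totals[year] for year in sorted(totals)]


# Made-up sample rows in the same shape as the question's data.
listing = [
    ['0010', 'green', '1978', 'light'],
    ['0020', 'blue', '1978', 'dark'],
    ['0030', 'blue', '1979', 'light'],
    ['2510', 'red', '2020', 'light'],
]
annual_totals = tally_by_year(listing)
print(annual_totals)
# [['1978', 1, 1, 1, 1], ['1979', 1, 0, 1, 0], ['2020', 0, 0, 1, 0]]
```

Each row follows the ['year', 'blues', 'greens', 'light', 'dark'] layout from the question. Sorting the keys as strings works here because every year has four digits.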

Related

Split column into deciles with equal rows. How to enforce ignoring repeated values

My data are probabilities from predict_proba. I want to split this data frame into deciles with an equal number of rows in each decile. I tried pd.qcut, but because of repeated values at the bin boundaries, the deciles end up with unequal row counts.
The method below gives equal splits, but with this approach I can't get the bins (the value range of each decile).
test_df["TOP_DECILE"] = pd.qcut(test_df["VALIDATION_PROB_1"].rank(method='first'), 10, retbins = False, labels = [ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
For each decile we also need to see the probability range. This is how the final output should look:
Is there a clean way to achieve this?

This is how I finally implemented it:
test_df["TOP_DECILE"] = pd.qcut(test_df["VALIDATION_PROB_1"].rank(method='first'), 10, retbins = False, labels = [100, 90, 80, 70, 60, 50, 40, 30, 20, 10])
test_df = test_df.merge(test_df.groupby('TOP_DECILE')["VALIDATION_PROB_1"].agg(['min', 'max']), right_index=True, left_on='TOP_DECILE')
test_df["PROBILITY_RANGE"] = "[" + (test_df["min"]).astype(str) + " - " + test_df["max"].astype(str) + "]"
But there should be a cleaner approach.
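
To make the idea behind the code above easier to check by hand, here is a plain-Python sketch of the same two steps: break ties by input order (what rank(method='first') is used for), cut the ordered values into equal-sized groups, then record each group's min and max. It illustrates the logic only and is not pandas' exact binning; the function name equal_deciles and the sample values are assumptions.

```python
def equal_deciles(values, n_bins=10):
    """Assign each value to one of n_bins near-equal groups; return labels and per-group (min, max)."""
    # sorted() is stable, so tied values keep their input order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    ranges = {}  # group number -> (min, max); group 0 holds the lowest values
    for rank, i in enumerate(order):
        b = rank * n_bins // len(values)
        labels[i] = b
        lo, hi = ranges.get(b, (values[i], values[i]))
        ranges[b] = (min(lo, values[i]), max(hi, values[i]))
    return labels, ranges


# Made-up probabilities with ties, cut into 3 groups to keep the example short.
probs = [0.1, 0.5, 0.5, 0.5, 0.9, 0.2]
labels, ranges = equal_deciles(probs, n_bins=3)
for b in sorted(ranges):
    print(b, "[%s - %s]" % ranges[b])
```

Because ties are split by position, two adjacent groups can share a boundary value (here 0.5 is the max of one group and the min of the next), which is the price of forcing equal row counts.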

numpy invert stride selection

Consider the following code:
import numpy as np
aa = np.arange(16)
step = 4
bb = aa[::step]
This selects every 4th element. Is there a quick and easy numpy function to select the complement of bb? I'm looking for the following output
array([1, 2, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15])
Yes, I could generate indices and then do np.setdiff1d, but I'm looking for something more elegant than that.

If you're looking for a simple one-liner:
np.delete(aa, slice(None, None, step))
Another solution (I don't know about elegant): define a boolean mask of ones, set every fourth element to False, and use it to index the original array:
o = np.ones_like(aa, dtype=bool)
o[::step] = False
aa[o]

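A flexible way to drop an arbitrary repeated position is a modulo mask. This example drops the last element of each group of step; comparing against 0 instead of step-1 drops the first element of each group, which gives the output asked for in the question:
bb = aa[np.arange(len(aa)) % step != step - 1]
Output:
array([ 0, 1, 2, 4, 5, 6, 8, 9, 10, 12, 13, 14])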
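
To check which positions a modulo mask keeps, the same index logic can be tried on a plain Python list, no NumPy required. This is only a sketch for verifying the arithmetic; drop_position is a hypothetical helper, not a NumPy function.

```python
def drop_position(values, step, position):
    """Keep every element whose index modulo step differs from position."""
    return [v for i, v in enumerate(values) if i % step != position]


aa = list(range(16))
step = 4

# Dropping position 0 removes exactly the elements that aa[::step] selects.
complement = drop_position(aa, step, 0)
# Dropping position step - 1 removes the last element of each group instead.
drop_last = drop_position(aa, step, step - 1)

print(complement)  # [1, 2, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15]
print(drop_last)   # [0, 1, 2, 4, 5, 6, 8, 9, 10, 12, 13, 14]
```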

Creating multiple columns in pandas with lambda function

I'm trying to create a set of new growth-rate columns in my df in a more efficient way than computing them one by one.
My df has 100+ variables, but for simplicity, assume the following:
import pandas as pd
consumption = [5, 10, 15, 20, 25, 30, 35, 40]
wage = [10, 20, 30, 40, 50, 60, 70, 80]
period = [1, 2, 3, 4, 5, 6, 7, 8]
id = [1, 1, 1, 1, 1, 1, 1, 1]
tup = list(zip(id, period, wage, consumption))
df = pd.DataFrame(tup,
                  columns=['id', 'period', 'wage', 'consumption'])
With two variables I could simply do this:
df['wage_chg']= df.sort_values(by=['id', 'period']).groupby(['id'])['wage'].apply(lambda x: (x/x.shift(4)-1)).fillna(0)
df['consumption_chg']= df.sort_values(by=['id', 'period']).groupby(['id'])['consumption'].apply(lambda x: (x/x.shift(4)-1)).fillna(0)
But maybe by using a for loop or something I could iterate over my column names creating new growth rate columns with the name columnname_chg as in the example above.
Any ideas?
Thanks

You can apply the operation to a DataFrame of several columns at once, rather than to one Series at a time, in groupby.apply:
cols = ['wage', 'consumption']
out = df.join(df.sort_values(by=['id', 'period'])
                .groupby(['id'])[cols]
                .apply(lambda g: (g/g.shift(4)-1)).fillna(0)
                .add_suffix('_chg'))
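
The loop-over-column-names idea from the question can also be sketched on plain lists, which makes the arithmetic easy to verify by hand. This is an illustration under stated assumptions, not the pandas answer itself: a dict of lists stands in for the DataFrame, there is a single id (so no grouping is needed), rows are already sorted by period, and the names data and lag are made up for the example.

```python
# A dict of lists stands in for the DataFrame (single id, sorted by period).
data = {
    'period': [1, 2, 3, 4, 5, 6, 7, 8],
    'wage': [10, 20, 30, 40, 50, 60, 70, 80],
    'consumption': [5, 10, 15, 20, 25, 30, 35, 40],
}
lag = 4  # compare each value with the one 4 periods earlier, like shift(4)

for name in ['wage', 'consumption']:
    col = data[name]
    # The first `lag` rows have no earlier value; use 0.0, like fillna(0).
    data[name + '_chg'] = [
        0.0 if i < lag else col[i] / col[i - lag] - 1
        for i in range(len(col))
    ]

print(data['wage_chg'])
print(data['consumption_chg'])
```

The new keys follow the columnname_chg naming from the question, so adding another variable only means adding its name to the list being looped over.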

Filter a range in kotlin

In Kotlin, I want to filter a range of Int into odd and even numbers. So I made a listOf the range 1..50
val angka = listOf(1..50)
followed by applying filter for even and filterNot for odd
val genap = angka.filter { it % 2 == 0}
val ganjil = angka.filterNot { it % 2 == 0 }
then printing both the even/odd lists
println("Genap = $genap")
println("Ganjil = $ganjil")
I do not see any problem with the code, but it fails with the error below:
Unresolved reference. None of the following candidates is applicable because of receiver type mismatch:
public inline operator fun BigDecimal.mod(other: BigDecimal): BigDecimal defined in kotlin

This is creating a List<IntRange> with a single element:
val angka = listOf(1..50)
You should instead directly filter the range:
val angka = 1..50
The rest of the code is correct.

If you are a beginner with Kotlin, either specify the types of your values explicitly or turn on local variable type hints.
That way you would have noticed the problem: your list angka is not a List<Int> but a List<IntRange>.
That means you are not evaluating Int % 2 == 0 but, in fact, IntRange % 2 == 0.
To get a list from a range, call (x..y).toList(). So your code becomes:
val angka = (1..50).toList() //or since you are not using this list anywhere else, just leave it as `1..50` and the filter will work fine on the IntRange.
val genap = angka.filter { it % 2 == 0 }
val ganjil = angka.filterNot { it % 2 == 0 }
println("Genap = $genap")
println("Ganjil = $ganjil")
With output:
Genap = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50]
Ganjil = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49]

Your declaration of angka is wrong: it is equivalent to val angka: List<IntRange> = listOf(1..50), so you created a list containing a single range.
This should work:
val angka = 1..50
val genap = angka.filter { it % 2 == 0}
val ganjil = angka.filterNot { it % 2 == 0 }
println("Genap = $genap")
println("Ganjil = $ganjil")

In Kotlin, how to find all elements of a list which contains 0 at the end?

The given list is:
var list = mutableListOf(5, 700, 8, 9, 660, 53, 90, 36)
I really do not know what to do next. So far I have:
if (0 in list)
    println("in list")
What can I add to find the elements that end in 0?

Your list contains integers, so you want all items that are divisible by 10, meaning that division by 10 leaves a remainder of 0.
You can filter the list like this:
val list = mutableListOf(5, 700, 8, 9, 660, 53, 90, 36)
val newList = list.filter { it % 10 == 0 }
println(newList)
will print
[700, 660, 90]
This creates a list newList containing all the items that end with 0 (are divisible by 10)
