Using a MultiIndex value in a boolean selection (while setting) - pandas

There is a similar question here: Pandas using row labels in boolean indexing
But that one uses a simple index and I can't figure out how to generalize it to a MultiIndex:
from pandas import DataFrame

df = DataFrame({'ssn':  [489, 489, 220, 220],
                'year': [2009, 2010, 2009, 2010],
                'tax':  [300, 600, 800, 900],
                'flag': [0, 0, 0, 0]})
df.set_index(['ssn', 'year'], inplace=True)
Semi-solutions:
df.flag[(df.year == 2010) & (df.tax < 700)] = 9 (works if drop=False in set_index)
df.flag[(df.index == 2010) & (df.tax < 700)] = 9 (works for a simple index)
I've tried several things but I just can't figure out how to generalize from a simple index to a MultiIndex, e.g. df.index.year == 2010 and 20 other guesses...

You can use index.get_level_values(), e.g.
df.flag[(df.index.get_level_values('year') == 2010) & (df.tax < 700)] = 9
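For what it's worth, the same mask also works with .loc, which avoids the chained-assignment pattern of df.flag[...] = 9 (pandas may warn about it, and the write can silently land on a copy). A minimal self-contained sketch:

import pandas as pd

df = pd.DataFrame({'ssn':  [489, 489, 220, 220],
                   'year': [2009, 2010, 2009, 2010],
                   'tax':  [300, 600, 800, 900],
                   'flag': [0, 0, 0, 0]})
df.set_index(['ssn', 'year'], inplace=True)

# Build the mask from one level of the MultiIndex plus an ordinary column,
# then assign through .loc in a single step.
mask = (df.index.get_level_values('year') == 2010) & (df.tax < 700)
df.loc[mask, 'flag'] = 9
print(df)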


Linear Diophantine Equations with Restriction in the GAP System

I am searching for a way to use the GAP System to find a solution of a linear Diophantine equation over the non-negative integers. Explicitly, I have a list L of positive integers for each of which there exists a solution of the linear Diophantine equation s = 11*a + 7*b such that a and b are non-negative integers. I would like to have the GAP System return for each element s of L the ordered pair(s) [a, b] corresponding to the above solution(s).
I am familiar already with the command SolutionIntMat in the GAP System; however, this produces only some solution of the linear Diophantine equation s = 11*a + 7*b. Particularly, it is possible (and far more likely) that one of the coefficients a and b is negative. For instance, I obtain the solution [-375, 600] when I use the aforementioned command on the linear Diophantine equation 75 = 11*a + 7*b.
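As an aside (sketched in Python rather than GAP, purely for intuition): the integer solutions of 11*a + 7*b = s differ by multiples of (7, -11), since 11*7 + 7*(-11) = 0, so a particular solution like the one above can be shifted toward the non-negative quadrant:

a, b = -375, 600              # particular solution of 11*a + 7*b == 75
t = -(a // 7)                 # smallest shift making a non-negative
a, b = a + 7 * t, b - 11 * t
print(a, b)                   # 3 6, and indeed 11*3 + 7*6 == 75

This choice of t makes a as small as possible while non-negative, which makes b as large as possible; so if b is still negative after the shift, s has no representation over the non-negative integers at all.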
For additional context, this query arises when working with numerical semigroups generated by generalized arithmetic sequences. Use the command LoadPackage("numericalsgps"); to implement computations with such objects. For instance, if S := NumericalSemigroup(11, 29, 36, 43, 50, 57, 64, 71);, then each of the minimal generators of S other than 11 is of the form 2*11 + 7*i for some integer i in [1..7]. One can ask the GAP System for the SmallElements(S);, and the GAP System will return all elements of S up to FrobeniusNumber(S) + 1. Clearly, every element of S is of the form 11*a + 7*b for some non-negative integers a and b; I would like to investigate what coefficients a and b arise. In fact, the answer is known (cf. Proposition 2.5 of this paper); I am just trying to get an understanding of the intuition behind the proof.
Thank you in advance for your time and consideration.
Dylan, thank you for your query and for using GAP and numericalsgps.
In this setting you can probably use Factorizations from the numericalsgps package. Internally, it rewrites the output of RestrictedPartitions.
For instance, in your example, you can get all possible "factorizations" of the small elements of S, with respect to the generators of S, by typing List(SmallElements(S), x->[x,Factorizations(x,S)]). A particular example:
gap> Factorizations(104,S);
[ [ 1, 0, 0, 1, 1, 0, 0, 0 ], [ 1, 0, 1, 0, 0, 1, 0, 0 ],
[ 1, 1, 0, 0, 0, 0, 1, 0 ], [ 3, 0, 0, 0, 0, 0, 0, 1 ] ]
If you want to see the factorizations of the elements of S in terms of 11 and 7, then you can do the following:
gap> FactorizationsIntegerWRTList(29,[11,7]);
[ [ 2, 1 ] ]
So, for all minimal generators of S you would do
gap> List(MinimalGenerators(S), g-> FactorizationsIntegerWRTList(g,[11,7]));
[ [ [ 1, 0 ] ], [ [ 2, 1 ] ], [ [ 2, 2 ] ], [ [ 2, 3 ] ],
[ [ 2, 4 ] ], [ [ 2, 5 ] ], [ [ 2, 6 ] ], [ [ 2, 7 ] ] ]
For the set of small elements of S, try List(SmallElements(S), g-> FactorizationsIntegerWRTList(g,[11,7])). If you only want up to some integer, just replace SmallElements(S) with Intersection([1..200], S); or if you want the first, say 200, elements of S, use S{[1..200]}.
You may want to have a look at Chapter 9 of the manual, and in particular to FactorizationsElementListWRTNumericalSemigroup.
I hope this helps.
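For readers who want the same coefficients outside GAP, the underlying search is easy to replicate; here is a minimal Python sketch (an illustration only, not part of numericalsgps):

def nonneg_factorizations(s, gens=(11, 7)):
    """All pairs (a, b) with s == gens[0]*a + gens[1]*b and a, b >= 0."""
    g0, g1 = gens
    return [(a, (s - g0 * a) // g1)
            for a in range(s // g0 + 1)
            if (s - g0 * a) % g1 == 0]

print(nonneg_factorizations(75))   # [(3, 6)]
print(nonneg_factorizations(29))   # [(2, 1)], matching FactorizationsIntegerWRTList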

How to extract separate values from GeoJSON in BigQuery

I have a GeoJSON string for a multipoint geometry. I want to extract each of those points into a table of individual point geometries in BigQuery.
I have been able to produce a point geometry for one of the points, and I want to do it for all the others in an automated fashion. I've already tried converting the string to an array, but it remains an array of size 1 with the entire content as a single string.
This is what worked for me to extract one point and convert it to a geometry:
WITH temp_table AS (
  SELECT '{ "type": "MultiPoint", "coordinates": [ [ 20, 10 ], [ 30, 5 ], [ 90, 50 ], [ 40, 80 ] ] }' AS string
)
SELECT ST_GEOGPOINT(
  CAST(JSON_EXTRACT(string, '$.coordinates[0][0]') AS FLOAT64),
  CAST(JSON_EXTRACT(string, '$.coordinates[0][1]') AS FLOAT64))
FROM temp_table
This results in POINT(20 10).
I can write manual queries for each of these points and do a UNION ALL, but that won't scale or work every time; I want this to happen in an automated fashion. For architectural reasons, we can't do string manipulation in languages like Python.
Below is for BigQuery Standard SQL
#standardSQL
SELECT
  ARRAY(
    SELECT ST_GEOGPOINT(
      CAST(SPLIT(pair)[OFFSET(0)] AS FLOAT64),
      CAST(SPLIT(pair)[SAFE_OFFSET(1)] AS FLOAT64))
    FROM UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(STRING, '$.coordinates'), r'\[(\d+,\d+)\]')) pair
  ) points
FROM `project.dataset.temp_table`
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.temp_table` AS (
  SELECT '{ "type": "MultiPoint", "coordinates": [ [ 20, 10 ], [ 30, 5 ], [ 90, 50 ], [ 40, 80 ] ] }' AS STRING
)
SELECT
  ARRAY(
    SELECT ST_GEOGPOINT(
      CAST(SPLIT(pair)[OFFSET(0)] AS FLOAT64),
      CAST(SPLIT(pair)[SAFE_OFFSET(1)] AS FLOAT64))
    FROM UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(STRING, '$.coordinates'), r'\[(\d+,\d+)\]')) pair
  ) points
FROM `project.dataset.temp_table`
with result:
Row  points
1    POINT(20 10)
     POINT(30 5)
     POINT(90 50)
     POINT(40 80)
Note: in the above version an array of points is produced for each respective original row. Obviously, you can adjust it to flatten the output, as in the example below:
#standardSQL
WITH `project.dataset.temp_table` AS (
  SELECT '{ "type": "MultiPoint", "coordinates": [ [ 20, 10 ], [ 30, 5 ], [ 90, 50 ], [ 40, 80 ] ] }' AS STRING
)
SELECT
  ST_GEOGPOINT(
    CAST(SPLIT(pair)[OFFSET(0)] AS FLOAT64),
    CAST(SPLIT(pair)[SAFE_OFFSET(1)] AS FLOAT64)
  ) points
FROM `project.dataset.temp_table`,
UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(STRING, '$.coordinates'), r'\[(\d+,\d+)\]')) pair
with result:
Row  points
1    POINT(20 10)
2    POINT(30 5)
3    POINT(90 50)
4    POINT(40 80)
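To see what the REGEXP_EXTRACT_ALL step is doing, here is the same extraction replayed locally in Python (purely illustrative, given the constraint about not using Python in production):

import json, re

s = '{ "type": "MultiPoint", "coordinates": [ [ 20, 10 ], [ 30, 5 ], [ 90, 50 ], [ 40, 80 ] ] }'

# JSON_EXTRACT(..., '$.coordinates') returns the coordinates array as compact text;
# the regex then captures each "x,y" pair between brackets.
coords_text = json.dumps(json.loads(s)['coordinates'], separators=(',', ':'))
print(re.findall(r'\[(\d+,\d+)\]', coords_text))
# ['20,10', '30,5', '90,50', '40,80']

One caveat: \d+ only matches non-negative integers, so coordinates with minus signs or decimal points would need a broader pattern, for example r'\[([-\d.]+,[-\d.]+)\]'.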

Lookup smallest value greater than current

I have an objects table and a lookup table. In the objects table, I'm looking to add the smallest value from the lookup table that is greater than the object's number.
I found this similar question but it's about finding a value greater than a constant, rather than changing for each row.
In code:
import pandas as pd

objects = pd.DataFrame([{"id": 1, "number": 10}, {"id": 2, "number": 30}])
lookup = pd.DataFrame([{"number": 3}, {"number": 12}, {"number": 40}])
expected = pd.DataFrame(
    [
        {"id": 1, "number": 10, "smallest_greater": 12},
        {"id": 2, "number": 30, "smallest_greater": 40},
    ]
)
First compare each value of lookup['number'] against objects['number'] to get a 2D boolean mask. Then take the cumulative sum along each row and compare it against 1 to flag the first match; numpy.argmax returns that position, which is used to pick the value from lookup['number'].
The output is generated with numpy.where, which overwrites all rows with no match to NaN.
objects = pd.DataFrame([{"id": 1, "number": 10}, {"id": 2, "number": 30},
{"id": 3, "number": 100},{"id": 4, "number": 1}])
print (objects)
id number
0 1 10
1 2 30
2 3 100
3 4 1
m1 = lookup['number'].values >= objects['number'].values[:, None]
m2 = np.cumsum(m1, axis=1) == 1
m3 = np.any(m1, axis=1)
out = lookup['number'].values[m2.argmax(axis=1)]
objects['smallest_greater'] = np.where(m3, out, np.nan)
print (objects)
   id  number  smallest_greater
0   1      10              12.0
1   2      30              40.0
2   3     100               NaN
3   4       1               3.0
smallest_greater = []
for i in objects['number']:
    # first lookup value strictly greater than i (raises IndexError if none exists)
    smallest_greater.append(
        lookup['number'][lookup[lookup['number'] > i].sort_values(by='number').index[0]])
objects['smallest_greater'] = smallest_greater
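As an aside (a suggestion beyond the answers above): pandas ships a built-in for exactly this alignment. pd.merge_asof with direction='forward' finds the nearest match at or above each key, and allow_exact_matches=False makes it strictly greater. A minimal sketch:

import pandas as pd

objects = pd.DataFrame([{"id": 1, "number": 10}, {"id": 2, "number": 30}])
lookup = pd.DataFrame([{"number": 3}, {"number": 12}, {"number": 40}])

# Both sides must be sorted on the join key; copy the key into a new
# column so the matched value survives the merge.
result = pd.merge_asof(
    objects.sort_values('number'),
    lookup.assign(smallest_greater=lookup['number']).sort_values('number'),
    on='number',
    direction='forward',        # nearest match at or above the key...
    allow_exact_matches=False,  # ...made strictly greater
)
print(result)  # id=1 -> smallest_greater 12, id=2 -> 40; unmatched rows get NaN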

Tensorflow, Reshape like a convolution

I have a tensor of shape [3,3,256], and my final output must be [4,2,2,256]. I have to do a reshape that acts like a 'convolution' without changing the values (in this case using a 2x2 filter). Is there a method to do this using TensorFlow?
If I understand your question correctly, you want to store the original values redundantly in the new structure, like this (without the last dim of 256):
[ [ 1 2 3 ]       [ [ 1 2 ]     [ [ 2 3 ]     [ [ 4 5 ]     [ [ 5 6 ]
  [ 4 5 6 ]  =>     [ 4 5 ] ],    [ 5 6 ] ],    [ 7 8 ] ],    [ 8 9 ] ]
  [ 7 8 9 ] ]
If yes, you can use indexing, like this, with x being the original tensor, and then stack them:
x2 = []
for i in range( 2 ):        # patch row offset
    for j in range( 2 ):    # patch column offset
        x2.append( x[ i : i + 2, j : j + 2, : ] )  # one 2x2 patch, all channels
y = tf.stack( x2, axis = 0 )  # shape [4, 2, 2, 256]
Based on your comment, if you really want to avoid using any loops, you might utilize tf.extract_image_patches, as below (tested code), but you should run some tests, because this might actually be much worse than the above in terms of efficiency and performance:
import tensorflow as tf
sess = tf.Session()
x = tf.constant( [ [ [ 1, -1 ], [ 2, -2 ], [ 3, -3 ] ],
[ [ 4, -4 ], [ 5, -5 ], [ 6, -6 ] ],
[ [ 7, -7 ], [ 8, -8 ], [ 9, -9 ] ] ] )
xT = tf.transpose( x, perm = [ 2, 0, 1 ] ) # have to put channel dim as batch for tf.extract_image_patches
xTE = tf.expand_dims( xT, axis = -1 ) # extend dims to have fake channel dim
xP = tf.extract_image_patches( xTE, ksizes = [ 1, 2, 2, 1 ],
strides = [ 1, 1, 1, 1 ], rates = [ 1, 1, 1, 1 ], padding = "VALID" )
y = tf.transpose( xP, perm = [ 3, 1, 2, 0 ] ) # move dims back to original and new dim up front
print( sess.run(y) )
Output (horizontal separator lines added manually for readability):
[[[[ 1 -1]
   [ 2 -2]]
  [[ 4 -4]
   [ 5 -5]]]
 ----------
 [[[ 2 -2]
   [ 3 -3]]
  [[ 5 -5]
   [ 6 -6]]]
 ----------
 [[[ 4 -4]
   [ 5 -5]]
  [[ 7 -7]
   [ 8 -8]]]
 ----------
 [[[ 5 -5]
   [ 6 -6]]
  [[ 8 -8]
   [ 9 -9]]]]
I had a similar problem and found that tf.contrib.kfac.utils has a function called extract_convolution_patches. Suppose you have a tensor X with shape (1, 3, 3, 256), where the leading 1 is the batch size; you can call
Y = tf.contrib.kfac.utils.extract_convolution_patches(X, (2, 2, 256, 1), padding='VALID')
Y.shape # (1, 2, 2, 2, 2, 256)
The first two 2's are the number of output positions (they make up the 4 in your description). The latter two 2's are the shape of the filter. You can then call
Y = tf.reshape(Y, [4,2,2,256])
to get your final result.
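If you are on TensorFlow 2.x, where tf.contrib no longer exists, the same patch extraction is available as tf.image.extract_patches. A small single-channel sketch (illustrative only; with 256 channels the flattened last dimension interleaves the channels, so the channel-as-batch transpose from the first answer still applies):

import tensorflow as tf  # TF 2.x, eager mode

x = tf.cast(tf.reshape(tf.range(1, 10), (3, 3)), tf.float32)
xTE = x[tf.newaxis, :, :, tf.newaxis]            # [1, 3, 3, 1]: add batch and channel dims

p = tf.image.extract_patches(xTE,
                             sizes=[1, 2, 2, 1],
                             strides=[1, 1, 1, 1],
                             rates=[1, 1, 1, 1],
                             padding='VALID')    # [1, 2, 2, 4]: each 2x2 patch flattened
y = tf.reshape(p, (4, 2, 2))                     # the four 2x2 patches from the diagram above
print(y.numpy())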

Chartist js chart displaying incorrectly

I'm creating a chart using chartist.js, and am using the FixedScaleAxis to control exactly which tick values appear on my y axis.
var data = {
  labels: ["1995", "1996", "1997", "1998", "1999", "2000", "2001",
           "2002", "2003", "2004", "2005", "2006", "2007", "2008",
           "2009", "2010", "2011", "2012", "2013", "2014", "2015"],
  series: [
    [2.92, 2.83, 2.69, 2.57, 2.4, 2.27, 2.19, 2.15, 2.06, 2.06, 1.92,
     1.82, 1.8, 1.76, 1.72, 1.71, 1.61, 1.56, 1.52, 1.41, 1.35]
  ]
};
var options = {
  height: 400,
  seriesBarDistance: 100,
  axisY: {
    type: Chartist.FixedScaleAxis,
    ticks: [0, 0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.1, 2.4, 2.7, 3.0]
  }
};
new Chartist.Line('.ct-chart', data, options);
But when I do this and add a height, the x axis displays within the chart rather than below it. Please see the fiddle:
http://jsfiddle.net/Lnhpwn8x/19/
I suggest using high/low, referenceValue and scaleMinSpace instead. Those are pretty self-explanatory: low is the lowest value on the axis, high is the highest value, referenceValue is a value that is always shown, and scaleMinSpace is the minimum distance between two lines on that axis.
axisY: {
  low: 1.2,
  referenceValue: 3,
  scaleMinSpace: 40
}
Fiddle example on the above.
But if you just want the chart to scale to show all of the data, you could just specify scaleMinSpace (for example, axisY: { scaleMinSpace: 40 }), and the rest will happen automatically. Fiddle example.