Related
Original data
structure(list(Year = c(1999, 1999, 1999, 2000, 2000, 2000),
Country = c("a", "b", "b", "a", "a", "b"), number = c(2,
3, 4, 5, 3, 6), result = c(2, 4, 5, 6, 2, 2)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -6L))
What I need is
Year Country weightresult
weightresult=result*(number/sum(number_year,country))
with weight result by number and the sum number is according to Year,Country group
the process result is
tructure(list(Year = c(1999, 1999, 1999, 2000, 2000, 2000),
Country = c("a", "b", "b", "a", "a", "b"), number = c(2,
3, 4, 5, 3, 6), result = c(2, 4, 5, 6, 2, 2), weight = c(2,
7, 7, 8, 8, 6), wre = c(2, 1.71428571428571, 2.85714285714286,
3.75, 0.75, 2)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
Finally need is
structure(list(Country = c("a", "a", "b", "b"), Year = c(1999,
2000, 1999, 2000), wre = c(2, 4.5, 4.57142857142857, 2)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -4L))
How to get the finally result in Bigquery Standard SQL
SELECT
Year,
Country,
(number/(SUM(number) OVER (PARTITION BY Year, Country))) * result AS wre,
Count(*),
FROM `table`
Where
Year<=2020
GROUP BY Year,Country
ORDER BY Year,Country
And the error is
SELECT list expression references column number which is neither grouped nor aggregated at ...
Use below
select
year,
country,
sum(number * result) / sum(number) as weighted_result
from your_table
where year <= 2020
group by year,country
order by year,country
with output
Hello I wanted to apply a mod function of column % 24 to the hour of time column.
I believe the time column is in a string format,
I was wondering how I should go about performing the operation.
sales_id,date,time,shopping_cart,price,parcel_size,Customer_lat,Customer_long,isLoyaltyProgram,nearest_storehouse_id,nearest_storehouse,dist_to_nearest_storehouse,delivery_cost
ORD0056604,24/03/2021,45:13:45,"[('bed', 3), ('Chair', 1), ('wardrobe', 4), ('side_table', 2), ('Dining_table', 2), ('mattress', 1)]",3152.77,medium,-38.246,145.61984,1,4,Sunshine,78.43,5.8725000000000005
ORD0096594,13/12/2018,54:22:20,"[('Study_table', 4), ('wardrobe', 4), ('side_table', 1), ('Dining_table', 2), ('sofa', 4), ('Chair', 3), ('mattress', 1)]",3781.38,large,-38.15718,145.05072,1,4,Sunshine,40.09,5.8725000000000005
ORD0046310,16/02/2018,17:23:36,"[('mattress', 2), ('wardrobe', 1), ('side_table', 2), ('sofa', 1), ('Chair', 3), ('Study_table', 4)]",2219.09,medium,144.69623,-38.00731,0,2,Footscray,34.2,16.9875
ORD0031675,25/06/2018,17:38:48,"[('bed', 4), ('side_table', 1), ('Chair', 1), ('mattress', 3), ('Dining_table', 2), ('sofa', 2), ('wardrobe', 2)]",4542.1,large,144.65506,-38.40669,1,2,Footscray,72.72,18.274500000000003
ORD0019799,05/01/2021,18:37:16,"[('wardrobe', 1), ('Study_table', 3), ('sofa', 4), ('side_table', 2), ('Chair', 4), ('Dining_table', 4), ('bed', 1)]",3132.71,L,-37.66022,144.94286,1,0,Clayton,17.77,14.931
ORD0041462,25/12/2018,07:29:33,"[('Chair', 3), ('bed', 1), ('mattress', 3), ('side_table', 3), ('wardrobe', 3), ('sofa', 4)]",4416.42,medium,-38.39154,145.87448,0,6,Sunshine,105.91,6.151500000000001
ORD0047848,30/07/2021,34:18:01,"[('Chair', 3), ('bed', 3), ('wardrobe', 4)]",2541.04,small,-37.4654,144.45832,1,2,Footscray,60.85,18.4635
Convert values to timedeltas by to_timedelta and then remove days by indexing - selecting last 8 values:
print (df)
sales_id date time
0 ORD0056604 24/03/2021 45:13:45
1 ORD0096594 13/12/2018 54:22:20
print (pd.to_timedelta(df['time']))
0 1 days 21:13:45
1 2 days 06:22:20
Name: time, dtype: timedelta64[ns]
df['time'] = pd.to_timedelta(df['time']).astype(str).str[-8:]
print (df)
sales_id date time
0 ORD0056604 24/03/2021 21:13:45
1 ORD0096594 13/12/2018 06:22:20
If need also add days to date column solution is add timedeltas to dates and last extract values by Series.dt.strftime:
dates = pd.to_datetime(df['date'], dayfirst=True) + pd.to_timedelta(df['time'])
df['time'] = dates.dt.strftime('%H:%M:%S')
df['date'] = dates.dt.strftime('%d/%m/%Y')
print (df)
sales_id date time
0 ORD0056604 25/03/2021 21:13:45
1 ORD0096594 15/12/2018 06:22:20
Is there a way to remove all duplicate lines from a string. For example i have a string
Mystring = "INSERT INTO Table(Column1, Column2) Values(50, 50), (100, 100), (50, 50)"
How to delete second (50, 50)
How to give input to the model if not a numpy array?
def createmodel():
myInput = Input(shape=(96, 96, 3))
x = ZeroPadding2D(padding=(3, 3), input_shape=(96, 96, 3))(myInput)
x = Conv2D(64, (7, 7), strides=(2, 2), name='conv1')(x)
x = BatchNormalization(axis=3, epsilon=0.00001, name='bn1')(x)
x = Activation('relu')(x)
x = ZeroPadding2D(padding=(1, 1))(x)
x = MaxPooling2D(pool_size=3, strides=2)(x)
x = Lambda(LRN2D, name='lrn_1')(x)
x = Conv2D(64, (1, 1), name='conv2')(x)
x = BatchNormalization(axis=3, epsilon=0.00001, name='bn2')(x)
x = Activation('relu')(x)
x = ZeroPadding2D(padding=(1, 1))(x)
x = Conv2D(192, (3, 3), name='conv3')(x)
x = BatchNormalization(axis=3, epsilon=0.00001, name='bn3')(x)
x = Activation('relu')(x)
x = Lambda(LRN2D, name='lrn_2')(x)
x = ZeroPadding2D(padding=(1, 1))(x)
x = MaxPooling2D(pool_size=3, strides=2)(x)
# Inception3a
inception_3a_3x3 = Conv2D(96, (1, 1), name='inception_3a_3x3_conv1')(x)
inception_3a_3x3 = BatchNormalization(axis=3, epsilon=0.00001, name='inception_3a_3x3_bn1')(inception_3a_3x3)
inception_3a_3x3 = Activation('relu')(inception_3a_3x3)
inception_3a_3x3 = ZeroPadding2D(padding=(1, 1))(inception_3a_3x3)
inception_3a_3x3 = Conv2D(128, (3, 3), name='inception_3a_3x3_conv2')(inception_3a_3x3)
inception_3a_3x3 = BatchNormalization(axis=3, epsilon=0.00001, name='inception_3a_3x3_bn2')(inception_3a_3x3)
inception_3a_3x3 = Activation('relu')(inception_3a_3x3)
inception_3a_5x5 = Conv2D(16, (1, 1), name='inception_3a_5x5_conv1')(x)
inception_3a_5x5 = BatchNormalization(axis=3, epsilon=0.00001, name='inception_3a_5x5_bn1')(inception_3a_5x5)
inception_3a_5x5 = Activation('relu')(inception_3a_5x5)
inception_3a_5x5 = ZeroPadding2D(padding=(2, 2))(inception_3a_5x5)
inception_3a_5x5 = Conv2D(32, (5, 5), name='inception_3a_5x5_conv2')(inception_3a_5x5)
inception_3a_5x5 = BatchNormalization(axis=3, epsilon=0.00001, name='inception_3a_5x5_bn2')(inception_3a_5x5)
inception_3a_5x5 = Activation('relu')(inception_3a_5x5)
inception_3a_pool = MaxPooling2D(pool_size=3, strides=2)(x)
inception_3a_pool = Conv2D(32, (1, 1), name='inception_3a_pool_conv')(inception_3a_pool)
inception_3a_pool = BatchNormalization(axis=3, epsilon=0.00001, name='inception_3a_pool_bn')(inception_3a_pool)
inception_3a_pool = Activation('relu')(inception_3a_pool)
inception_3a_pool = ZeroPadding2D(padding=((3, 4), (3, 4)))(inception_3a_pool)
inception_3a_1x1 = Conv2D(64, (1, 1), name='inception_3a_1x1_conv')(x)
inception_3a_1x1 = BatchNormalization(axis=3, epsilon=0.00001, name='inception_3a_1x1_bn')(inception_3a_1x1)
inception_3a_1x1 = Activation('relu')(inception_3a_1x1)
inception_3a = concatenate([inception_3a_3x3, inception_3a_5x5, inception_3a_pool, inception_3a_1x1], axis=3)
# Inception3b
inception_3b_3x3 = Conv2D(96, (1, 1), name='inception_3b_3x3_conv1')(inception_3a)
inception_3b_3x3 = BatchNormalization(axis=3, epsilon=0.00001, name='inception_3b_3x3_bn1')(inception_3b_3x3)
inception_3b_3x3 = Activation('relu')(inception_3b_3x3)
inception_3b_3x3 = ZeroPadding2D(padding=(1, 1))(inception_3b_3x3)
inception_3b_3x3 = Conv2D(128, (3, 3), name='inception_3b_3x3_conv2')(inception_3b_3x3)
inception_3b_3x3 = BatchNormalization(axis=3, epsilon=0.00001, name='inception_3b_3x3_bn2')(inception_3b_3x3)
inception_3b_3x3 = Activation('relu')(inception_3b_3x3)
inception_3b_5x5 = Conv2D(32, (1, 1), name='inception_3b_5x5_conv1')(inception_3a)
inception_3b_5x5 = BatchNormalization(axis=3, epsilon=0.00001, name='inception_3b_5x5_bn1')(inception_3b_5x5)
inception_3b_5x5 = Activation('relu')(inception_3b_5x5)
inception_3b_5x5 = ZeroPadding2D(padding=(2, 2))(inception_3b_5x5)
inception_3b_5x5 = Conv2D(64, (5, 5), name='inception_3b_5x5_conv2')(inception_3b_5x5)
inception_3b_5x5 = BatchNormalization(axis=3, epsilon=0.00001, name='inception_3b_5x5_bn2')(inception_3b_5x5)
inception_3b_5x5 = Activation('relu')(inception_3b_5x5)
inception_3b_pool = Lambda(lambda x: x**2, name='power2_3b')(inception_3a)
inception_3b_pool = AveragePooling2D(pool_size=(3, 3), strides=(3, 3))(inception_3b_pool)
inception_3b_pool = Lambda(lambda x: x*9, name='mult9_3b')(inception_3b_pool)
inception_3b_pool = Lambda(lambda x: K.sqrt(x), name='sqrt_3b')(inception_3b_pool)
inception_3b_pool = Conv2D(64, (1, 1), name='inception_3b_pool_conv')(inception_3b_pool)
inception_3b_pool = BatchNormalization(axis=3, epsilon=0.00001, name='inception_3b_pool_bn')(inception_3b_pool)
inception_3b_pool = Activation('relu')(inception_3b_pool)
inception_3b_pool = ZeroPadding2D(padding=(4, 4))(inception_3b_pool)
inception_3b_1x1 = Conv2D(64, (1, 1), name='inception_3b_1x1_conv')(inception_3a)
inception_3b_1x1 = BatchNormalization(axis=3, epsilon=0.00001, name='inception_3b_1x1_bn')(inception_3b_1x1)
inception_3b_1x1 = Activation('relu')(inception_3b_1x1)
inception_3b = concatenate([inception_3b_3x3, inception_3b_5x5, inception_3b_pool, inception_3b_1x1], axis=3)
# Inception3c
inception_3c_3x3 = utils.conv2d_bn(inception_3b,
layer='inception_3c_3x3',
cv1_out=128,
cv1_filter=(1, 1),
cv2_out=256,
cv2_filter=(3, 3),
cv2_strides=(2, 2),
padding=(1, 1))
inception_3c_5x5 = utils.conv2d_bn(inception_3b,
layer='inception_3c_5x5',
cv1_out=32,
cv1_filter=(1, 1),
cv2_out=64,
cv2_filter=(5, 5),
cv2_strides=(2, 2),
padding=(2, 2))
inception_3c_pool = MaxPooling2D(pool_size=3, strides=2)(inception_3b)
inception_3c_pool = ZeroPadding2D(padding=((0, 1), (0, 1)))(inception_3c_pool)
inception_3c = concatenate([inception_3c_3x3, inception_3c_5x5, inception_3c_pool], axis=3)
#inception 4a
inception_4a_3x3 = utils.conv2d_bn(inception_3c,
layer='inception_4a_3x3',
cv1_out=96,
cv1_filter=(1, 1),
cv2_out=192,
cv2_filter=(3, 3),
cv2_strides=(1, 1),
padding=(1, 1))
inception_4a_5x5 = utils.conv2d_bn(inception_3c,
layer='inception_4a_5x5',
cv1_out=32,
cv1_filter=(1, 1),
cv2_out=64,
cv2_filter=(5, 5),
cv2_strides=(1, 1),
padding=(2, 2))
inception_4a_pool = Lambda(lambda x: x**2, name='power2_4a')(inception_3c)
inception_4a_pool = AveragePooling2D(pool_size=(3, 3), strides=(3, 3))(inception_4a_pool)
inception_4a_pool = Lambda(lambda x: x*9, name='mult9_4a')(inception_4a_pool)
inception_4a_pool = Lambda(lambda x: K.sqrt(x), name='sqrt_4a')(inception_4a_pool)
inception_4a_pool = utils.conv2d_bn(inception_4a_pool,
layer='inception_4a_pool',
cv1_out=128,
cv1_filter=(1, 1),
padding=(2, 2))
inception_4a_1x1 = utils.conv2d_bn(inception_3c,
layer='inception_4a_1x1',
cv1_out=256,
cv1_filter=(1, 1))
inception_4a = concatenate([inception_4a_3x3, inception_4a_5x5, inception_4a_pool, inception_4a_1x1], axis=3)
#inception4e
inception_4e_3x3 = utils.conv2d_bn(inception_4a,
layer='inception_4e_3x3',
cv1_out=160,
cv1_filter=(1, 1),
cv2_out=256,
cv2_filter=(3, 3),
cv2_strides=(2, 2),
padding=(1, 1))
inception_4e_5x5 = utils.conv2d_bn(inception_4a,
layer='inception_4e_5x5',
cv1_out=64,
cv1_filter=(1, 1),
cv2_out=128,
cv2_filter=(5, 5),
cv2_strides=(2, 2),
padding=(2, 2))
inception_4e_pool = MaxPooling2D(pool_size=3, strides=2)(inception_4a)
inception_4e_pool = ZeroPadding2D(padding=((0, 1), (0, 1)))(inception_4e_pool)
inception_4e = concatenate([inception_4e_3x3, inception_4e_5x5, inception_4e_pool], axis=3)
#inception5a
inception_5a_3x3 = utils.conv2d_bn(inception_4e,
layer='inception_5a_3x3',
cv1_out=96,
cv1_filter=(1, 1),
cv2_out=384,
cv2_filter=(3, 3),
cv2_strides=(1, 1),
padding=(1, 1))
inception_5a_pool = Lambda(lambda x: x**2, name='power2_5a')(inception_4e)
inception_5a_pool = AveragePooling2D(pool_size=(3, 3), strides=(3, 3))(inception_5a_pool)
inception_5a_pool = Lambda(lambda x: x*9, name='mult9_5a')(inception_5a_pool)
inception_5a_pool = Lambda(lambda x: K.sqrt(x), name='sqrt_5a')(inception_5a_pool)
inception_5a_pool = utils.conv2d_bn(inception_5a_pool,
layer='inception_5a_pool',
cv1_out=96,
cv1_filter=(1, 1),
padding=(1, 1))
inception_5a_1x1 = utils.conv2d_bn(inception_4e,
layer='inception_5a_1x1',
cv1_out=256,
cv1_filter=(1, 1))
inception_5a = concatenate([inception_5a_3x3, inception_5a_pool, inception_5a_1x1], axis=3)
#inception_5b
inception_5b_3x3 = utils.conv2d_bn(inception_5a,
layer='inception_5b_3x3',
cv1_out=96,
cv1_filter=(1, 1),
cv2_out=384,
cv2_filter=(3, 3),
cv2_strides=(1, 1),
padding=(1, 1))
inception_5b_pool = MaxPooling2D(pool_size=3, strides=2)(inception_5a)
inception_5b_pool = utils.conv2d_bn(inception_5b_pool,
layer='inception_5b_pool',
cv1_out=96,
cv1_filter=(1, 1))
inception_5b_pool = ZeroPadding2D(padding=(1, 1))(inception_5b_pool)
inception_5b_1x1 = utils.conv2d_bn(inception_5a,
layer='inception_5b_1x1',
cv1_out=256,
cv1_filter=(1, 1))
inception_5b = concatenate([inception_5b_3x3, inception_5b_pool, inception_5b_1x1], axis=3)
av_pool = AveragePooling2D(pool_size=(3, 3), strides=(1, 1))(inception_5b)
reshape_layer = Flatten()(av_pool)
dense_layer = Dense(128, name='dense_layer')(reshape_layer)
norm_layer = Lambda(lambda x: K.l2_normalize(x, axis=1), name='norm_layer')(dense_layer)
# Final Model
return Model(inputs=[myInput], outputs=norm_layer)
Calling the above model as:
from keras import backend as K
from keras.models import Model
from keras.layers import Input, Layer
# Input for anchor, positive and negative images
in_a =np.random.randint(10,100,(1,96,96,3)).astype(float)
in_p =np.random.randint(10,100,(1,96,96,3)).astype(float)
in_n =np.random.randint(10,100,(1,96,96,3)).astype(float)
in_a_a =K.variable(value=in_a)
in_p_p =K.variable(value=in_p)
in_n_n =K.variable(value=in_n)
# # Output for anchor, positive and negative embedding vectors
# # The nn4_small model instance is shared (Siamese network)
emb_a = nn4_small2(in_a_a)
emb_p = nn4_small2(in_p_p)
emb_n = nn4_small2(in_n_n)
class TripletLossLayer(Layer):
def __init__(self, alpha, **kwargs):
self.alpha = alpha
super(TripletLossLayer, self).__init__(**kwargs)
def triplet_loss(self, inputs):
a, p, n = inputs
p_dist = K.sum(K.square(a-p), axis=-1)
n_dist = K.sum(K.square(a-n), axis=-1)
return K.sum(K.maximum(p_dist - n_dist + self.alpha, 0), axis=0)
def call(self, inputs):
loss = self.triplet_loss(inputs)
self.add_loss(loss)
return loss
# # # Layer that computes the triplet loss from anchor, positive and negative embedding vectors
triplet_loss_layer = TripletLossLayer(alpha=0.2, name='triplet_loss_layer')([emb_a, emb_p, emb_n])
# # # Model that can be trained with anchor, positive negative images
nn4_small2_train = Model([in_a, in_p, in_n], triplet_loss_layer)
Gives a type error on this line:
nn4_small2_train = Model([in_a, in_p, in_n], triplet_loss_layer)
c:\users\amark\anaconda3\envs\python3.5\lib\site-packages\keras\legacy\interfaces.py in wrapper(*args, **kwargs)
89 warnings.warn('Update your ' + object_name +
90 ' call to the Keras 2 API: ' +signature,stacklevel=2)
---> 91 return func(*args, **kwargs)
92 wrapper._original_function = func
93 return wrapper
c:\users\amark\anaconda3\envs\python3.5\lib\site-packages\keras\engine\topology.py in init(self, inputs, outputs, name)
1526
1527 # Check for redundancy in inputs.
-> 1528 if len(set(self.inputs)) != len(self.inputs):
1529 raise ValueError('The list of inputs passed to the model '
1530 'is redundant. '
TypeError: unhashable type: 'numpy.ndarray'
If I try to use the following:
nn4_small2_train = Model([in_a_a, in_p_p, in_n_n], triplet_loss_layer)
then the error raised is:
TypeError: Input tensors to a Model must be Keras tensors. Found: (missing Keras metadata)
You are passing numpy arrays as inputs to build a Model, and that is not right, you should pass instances of Input.
In your specific case, you are passing in_a, in_p, in_n but instead to build a Model you should be giving instances of Input, not K.variables (your in_a_a, in_p_p, in_n_n) or numpy arrays. Also it makes no sense to give values to the varibles. First you build the model symbolically, without any specific input values, and then you can train it or predict on it with real input values.
I've got a data set which is a list of tuples in python like this:
dataSet = [(6.1248199999999997, 27), (6.4400500000000003, 4), (5.9150600000000004, 1), (5.5388400000000004, 38), (5.82559, 1), (7.6892199999999997, 2), (6.9047799999999997, 1), (6.3516300000000001, 76), (6.5168699999999999, 1), (7.4382099999999998, 1), (5.4493299999999998, 1), (5.6254099999999996, 1), (6.3227700000000002, 1), (5.3321899999999998, 11), (6.7402300000000004, 4), (7.6701499999999996, 1), (5.4589400000000001, 3), (6.3089700000000004, 1), (6.5926099999999996, 2), (6.0003000000000002, 5), (5.9845800000000002, 1), (6.4967499999999996, 2), (6.51227, 6), (7.0302600000000002, 1), (5.7271200000000002, 49), (7.5311300000000001, 7), (5.9495800000000001, 2), (5.1487299999999996, 18), (5.7637099999999997, 6), (5.5144500000000001, 44), (6.7988499999999998, 1), (5.2578399999999998, 1)]
Where the first element of the tuple is an energy and the second a counter, how many sensor where affected.
I want to create a histogram to study the relation between the number of affected sensors and the energy. I'm pretty new to matplotlib (and python), but this is what I've done so far:
import math
import matplotlib.pyplot as plt
dataSet = [(6.1248199999999997, 27), (6.4400500000000003, 4), (5.9150600000000004, 1), (5.5388400000000004, 38), (5.82559, 1), (7.6892199999999997, 2), (6.9047799999999997, 1), (6.3516300000000001, 76), (6.5168699999999999, 1), (7.4382099999999998, 1), (5.4493299999999998, 1), (5.6254099999999996, 1), (6.3227700000000002, 1), (5.3321899999999998, 11), (6.7402300000000004, 4), (7.6701499999999996, 1), (5.4589400000000001, 3), (6.3089700000000004, 1), (6.5926099999999996, 2), (6.0003000000000002, 5), (5.9845800000000002, 1), (6.4967499999999996, 2), (6.51227, 6), (7.0302600000000002, 1), (5.7271200000000002, 49), (7.5311300000000001, 7), (5.9495800000000001, 2), (5.1487299999999996, 18), (5.7637099999999997, 6), (5.5144500000000001, 44), (6.7988499999999998, 1), (5.2578399999999998, 1)]
binWidth = .2
binnedDataSet = []
#create another list and append the "binning-value"
for item in dataSet:
binnedDataSet.append((item[0], item[1], math.floor(item[0]/binWidth)*binWidth))
energies, sensorHits, binnedEnergy = [[q[i] for q in binnedDataSet] for i in (0,1,2)]
plt.plot(binnedEnergy, sensorHits, 'ro')
plt.show()
This works so far (although it doesn't even look like a histogram ;-) but OK), but now I want to calculate the mean value for each bin and append some error bars.
What's the way to do it? I looked at histogram examples for matplotlib, but they all use one-dimensional data which will be counted, so you get a frequency spectrum… That's not really what I want.
I am somewhat confused by exactly what you are trying to do, but I think this (to first order) will do what I think you want:
bin_width = .2
bottom = 5.0
top = 8.0
binned_data = [0.0] * int(math.ceil(((top - bottom) / bin_width)))
binned_count = [0] * int(math.ceil(((top - bottom) / bin_width)))
n_bins = len(binned_data)
for E, cnt in dataSet:
if E < bottom or E > top:
print 'out of range'
continue
bin_id = int(math.floor(n_bins * (E - bottom) / (top - bottom)))
binned_data[bin_id] += cnt
binned_count[bin_id] += 1
binned_avergaed_data = [C_sum / hits if hits > 0 else 0 for C_sum, hits in zip(binned_data, binned_count)]
bin_edges = [bottom + j * bin_width for j in range(len(binned_data))]
plt.bar(bin_edges, binned_avergaed_data, width=bin_width)
I would also suggest looking into numpy, it would make this much simpler to write.