Generate random numbers from a particular column - pandas

I have a dataset in this form
item_id EAN Price
3434 232 34
3233 412 28
There are 54344 data points in total.
I want to print 40 random values from EAN. I tried some techniques like
df=pd.read_csv('item_desc.csv')
print(df['EAN'].random.rand(40))
but it didn't work. Can someone suggest the code?

You can use sample, which returns 40 randomly chosen rows:
df.sample(n=40)
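To print values from the EAN column only, the same method can be called on that column. A minimal sketch, assuming a small made-up frame in place of item_desc.csv; the `random_state` value is an arbitrary choice that only makes the draw repeatable:

```python
import pandas as pd

# Made-up stand-in for item_desc.csv (the real file has 54344 rows).
df = pd.DataFrame({
    "item_id": range(1000, 1100),
    "EAN": range(200, 300),
    "Price": [20 + i % 15 for i in range(100)],
})

# Series.sample draws n entries without replacement by default.
ean_sample = df["EAN"].sample(n=40, random_state=0)
print(ean_sample.to_list())
```

Dropping `random_state` gives a different selection on every run.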

Related

Taking the average of columns for similar rows

What I am trying to do: if I have rows with the same Prefix, FromMP, and ToMP,
then I take the average of TPCSpeed1 across those rows.
For example, I have
CF 116 117 54.8 56 50 50 50 50 50
CF 116 117 54.8 56 50 50 50 50 50
CF 116 117 54.8 56 50 50 50 50 50
So if the rows share the same FromMP, ToMP, Prefix, and Suffix, I want to take the average TPCSpeed1 of all the rows that share them. For example, for 116 117 I have TPCSpeed1 = (54.8+54.8+54.8)/3.
I want to take the average of the TPCSpeed1 column for all the rows that share the same info. If a row does not share its info with any other row, I just want its own TPCSpeed1. I am not sure how to do this; maybe by handling duplicates.
I am not sure how to do this in pandas.
import pandas as pd
import numpy as np
result=pd.read_csv("result.csv")
a1=result.columns.get_loc("TPCSpeed1")
a2=result.columns.get_loc("TPCSpeed2")
a3=result.columns.get_loc("TPCSpeed3")
a4=result.columns.get_loc("TPCSpeed4")
a5=result.columns.get_loc("TPCSpeed5")
a6=result.columns.get_loc("TPCSpeed6")
a7=result.columns.get_loc("TPCSpeed7")
pre=result.columns.get_loc("Prefix")
suf=result.columns.get_loc("Suffix")
FromMp=result.columns.get_loc("FromMP")
ToMp=result.columns.get_loc("ToMP")
w1=[]
w2=[]
w3=[]
w4=[]
w5=[]
w6=[]
w7=[]
prefix=[]
suffix=[]
begin=[]
end=[]
for index,row in result.iterrows():
    print(index)
    c1=row[pre]
    c2=row[suf]
    c3=row[FromMp]
    c4=row[ToMp]
    prefix.append(c1)
    suffix.append(c2)
    begin.append(c3)
    end.append(c4)
    b1=row[a1]
    w1.append(b1)
    b2=row[a2]
    w2.append(b2)
    b3=row[a3]
    w3.append(b3)
    b4=row[a4]
    w4.append(b4)
    b5=row[a5]
    w5.append(b5)
    b6=row[a6]
    w6.append(b6)
    b7=row[a7]
    w7.append(b7)
This is a good use for groupby().agg().
At its simplest, you can try:
result.groupby(['Prefix', 'FromMP', 'ToMP', 'Suffix']).agg(np.mean)
This will collapse all rows that have the same values in all four named columns, and replace them with a single row holding the mean of each of the other columns. The four grouping columns become the index of the result; you can use reset_index() to turn them back into ordinary columns.
The agg (aka aggregate) function is fairly flexible. You can treat columns differently. It doesn't have to be the average for everything.
https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.aggregate.html
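A minimal sketch of the grouping described above, assuming made-up rows shaped like the question's data (the Suffix values and the fourth row are invented for illustration, and only two of the TPCSpeed columns are shown):

```python
import pandas as pd

# Made-up rows; the first three share all four key columns.
result = pd.DataFrame({
    "Prefix": ["CF", "CF", "CF", "CF"],
    "FromMP": [116, 116, 116, 117],
    "ToMP": [117, 117, 117, 118],
    "Suffix": ["A", "A", "A", "A"],
    "TPCSpeed1": [54.8, 54.8, 54.8, 60.0],
    "TPCSpeed2": [56, 50, 50, 40],
})

keys = ["Prefix", "FromMP", "ToMP", "Suffix"]

# Mean of every remaining column per group; a group of one keeps its own value.
averaged = result.groupby(keys).mean().reset_index()

# A different aggregation per column: mean for one, max for another.
mixed = (
    result.groupby(keys)
    .agg({"TPCSpeed1": "mean", "TPCSpeed2": "max"})
    .reset_index()
)

print(averaged)
print(mixed)
```

One thing to watch for: by default groupby drops rows whose key columns contain NaN, so if Suffix is often empty in the real file, passing `dropna=False` to groupby may be needed to keep those rows.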

Creating new column based on condition and extracting respective value from other column. Pandas Dataframe

I am relatively new to this field and am working with a data set to find meaningful insights into customer behavior. My dataset looks like:
customerId week first_trip_week rides
0 156 44 36 2
1 164 44 38 6
2 224 42 36 5
3 224 43 36 4
4 224 44 36 5
What I want to do is create new columns week 44, week 43, and week 42, and fill them with the values from the "rides" column in the rows for the respective customer id. This is in the hope that I can eventually also make the customerId my index and can get denominations for different weeks. Help would be greatly appreciated!
Thank you!!
If I'm understanding you correctly, you want to create new columns in the same dataframe for weeks 44, 43, and 42, with the correct values for each customerId and NaN for those that don't have one. If your original dataframe has all the user data, I would first filter for the rows that have the correct week number
week42DF = dataset.loc[dataset['week']==42,['customerId','rides']].rename(columns={'rides':'week42Rides'})
This keeps only the customerId and rides columns, and renames rides to make things a little easier for us. Then left join the old dataframe and the new one on customerId
dataset = pd.merge(dataset,week42DF,how='left',on='customerId')
The users that are missing from week42DF will have NaN in the week42Rides column of the merged dataset. You can then use the .fillna(0) method to replace those with zeros. Do this for each week you require.
See Pandas' documentation on merge and the more general concatenate for more info.
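A minimal sketch of the filter, rename, and merge steps repeated in a loop, using the five rows from the question. The `pivot_table` call at the end is a different technique, not part of the answer above; it is included as an assumption about what the asker ultimately wants (one row per customerId with one column per week):

```python
import pandas as pd

dataset = pd.DataFrame({
    "customerId": [156, 164, 224, 224, 224],
    "week": [44, 44, 42, 43, 44],
    "first_trip_week": [36, 38, 36, 36, 36],
    "rides": [2, 6, 5, 4, 5],
})

merged = dataset.copy()
for week in (42, 43, 44):
    col = f"week{week}Rides"
    # Keep only this week's rows, with rides renamed to the new column name.
    week_df = (
        dataset.loc[dataset["week"] == week, ["customerId", "rides"]]
        .rename(columns={"rides": col})
    )
    # Left join so customers without a row for this week are kept.
    merged = merged.merge(week_df, how="left", on="customerId")
    merged[col] = merged[col].fillna(0)

# Alternative: reshape directly to one row per customer, one column per week.
wide = dataset.pivot_table(
    index="customerId", columns="week", values="rides",
    aggfunc="sum", fill_value=0,
)

print(merged)
print(wide)
```

The merge loop keeps the original long layout and repeats the weekly values on every row of a customer, while the pivoted frame already has customerId as its index.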

Excel VBA SubTotals

Excel built-in functions are effective most of the time. However, some functions feel half-implemented, which dictates how they can be used. The SUBTOTAL function is one of them.
My data is presented in the following format:
Value Count
100 20
102 3
105 4
102 5
And I want to build a table in this format:
Value Count
100 20
101 0
102 8
103 0
104 0
105 4
I've read this on SO, but my situation is a bit different. A pivot table can give the subtotals only for the values that appear in the original data, and I don't want a loop that inserts the missing values into the original data (if there is going to be a loop over the original data, that loop could build the table directly, which I would prefer to avoid altogether).

SQL Transform/Pivot

I'm fairly new to this, and have done a lot of searching, but I'm not even 100% sure what exactly to search for, except I know I need to use Transform.
I basically need this:
Column A Column B
Total 184
Half 20
Some 25
None 30
Total 52
Half 25
Some 16
None 86
To become:
Total Half Some None
184 20 25 30
52 25 16 86
Any help would be amazing, it's the last part of a query, then it's done.
Thanks :)
The answer ended up being something like this. Don't do it with the Query Wizard; it won't work out very well. This is Access 2010.
TRANSFORM First(Table.ColumnB) AS FirstOfColumnB
SELECT Table.Columns
FROM Table
GROUP BY Table.Columns
PIVOT Table.ColumnA;

SSRS 2008 display multiple columns of data without a new line

I am creating a report in SSRS 2008 with MS SQL Server 2008 R2. I have data based on the Aggregate value of Medical condition and the level of severity.
Outcome Response Adult Youth Total
BMI GOOD 70 0 70
BMI MONITOR 230 0 230
BMI PROBLEM! 10 0 10
LDL GOOD 5 0 5
LDL MONITOR 4 0 4
LDL PROBLEM! 2 0 2
I need to display the data based on the Response like:
BMI BMI BMI
GOOD MONITOR PROBLEM!
Total 70 230 10
Youth 0 0 0
Adult 70 230 10
LDL LDL LDL
GOOD MONITOR PROBLEM!
Total 5 4 2
Youth 0 0 0
Adult 5 4 2
I first tried to use SSRS to do the grouping based on the Outcome and then the Response, but I got each response on a separate row of data, and I need all Outcomes on a single line. I now believe that a pivot would work, but all the examples I have seen pivot one column of data using another. Is it possible to pivot multiple columns of data based on a single column?
With your existing Dataset you could do something similar to the following.
Create a List item, and change the Details grouping to be based on Outcome.
In the List cell, add a new Matrix with one Column Group based on Response.
You'll note that since you have individual columns for Total, Youth, Adult, you need to add grand total rows to display each group.
The end result is pretty close to your requirements.
For your underlying data, to help with report development it might be useful to have the Total, Youth, Adult as unpivoted columns, but it's not a big deal if the groups are fairly static.