how to append column data from different csv files in one folder - pandas

I have some of csv files which i need to add special column from another file in one folder and save into one file. Lets i give an example below.
1.csv 2.csv 3.csv n.csv (and so on i have several csv file)
a b c d a b c d a b c d a b c d
1 2 3 4 8 3 5 7 2 9 4 6 3 6 8 3
4 2 8 3 6 3 6 7 9 3 4 5 3 6 6 8
3 9 4 8 9 3 4 2 4 7 4 4 1 8 3 5
i want to append only column a and b, like an example below
x.csv
a b a b a b a b
1 2 8 3 2 9 3 6
4 2 6 3 9 3 3 6
3 9 9 3 4 7 1 8
can some one help me how to add a column?

Related

pandas dataframe enforce monotically per row

I have a dataframe:
df = 0 1 2 3 4
1 1 3 2 5
4 1 5 7 8
7 1 2 3 9
I want to enforce monotonically per row, to get:
df = 0 1 2 3 4
1 1 3 3 5
4 4 5 7 8
7 7 7 7 9
What is the best way to do so?
Try cummax
out = df.cummax(1)
Out[80]:
0 1 2 3 4
0 1 1 3 3 5
1 4 4 5 7 8
2 7 7 7 7 9

Stack multiple columns into single column while maintaining other columns in Pandas?

Given pandas multiple columns as below
cl_a cl_b cl_c cl_d cl_e
0 1 a 5 6 20
1 2 b 4 7 21
2 3 c 3 8 22
3 4 d 2 9 23
4 5 e 1 10 24
I would like to stack the column cl_c cl_d cl_e into a single column with the name ax. But, please note that, the columns cl_a cl_b were maintained.
cl_a cl_b ax from_col
1,a,5,cl_c
2,b,4,cl_c
3,c,3,cl_c
4,d,2,cl_c
5,e,1,cl_c
1,a,6,cl_d
2,b,7,cl_d
3,c,8,cl_d
4,d,9,cl_d
5,e,10,cl_d
1,a,20,cl_e
2,b,21,cl_e
3,c,22,cl_e
4,d,23,cl_e
5,e,24,cl_e
So far, the following code does the job
df = pd.DataFrame ( {'cl_a': [1,2,3,4,5], 'cl_b': ['a','b','c','d','e'],
'cl_c': [5,4,3,2,1],'cl_d': [6,7,8,9,10],
'cl_e': [20,21,22,23,24]})
df_new = pd.DataFrame()
for col_name in ['cl_c','cl_d','cl_e']:
df_new=df_new.append (df [['cl_a', 'cl_b', col_name]].rename(columns={col_name: "ax"}))
However, I am curious whether there is Pandas build-in approach that can do the trick
Edit:
Upon Quong answer, I realise of the need to include another column (i.e., from_col) beside the ax. The from_col indicate the origin of ax previous column name.
Yes, it's called melt:
df.melt(['cl_a','cl_b'], value_name='ax').drop(columns='variable')
Output:
cl_a cl_b ax
0 1 a 5
1 2 b 4
2 3 c 3
3 4 d 2
4 5 e 1
5 1 a 6
6 2 b 7
7 3 c 8
8 4 d 9
9 5 e 10
10 1 a 20
11 2 b 21
12 3 c 22
13 4 d 23
14 5 e 24
Or equivalently set_index().stack():
(df.set_index(['cl_a','cl_b']).stack()
.reset_index(level=-1, drop=True)
.reset_index(name='ax')
)
with a slightly different output:
cl_a cl_b ax
0 1 a 5
1 1 a 6
2 1 a 20
3 2 b 4
4 2 b 7
5 2 b 21
6 3 c 3
7 3 c 8
8 3 c 22
9 4 d 2
10 4 d 9
11 4 d 23
12 5 e 1
13 5 e 10
14 5 e 24

How to find the average of multiple columns using a common column in pandas

How to calculate the mean value of all the columns with 'count' column.I have created a dataframe with random generated values in the below code.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(10,10)*100/10).astype(int)
df
output:
A B C D E F G H I J
0 4 3 2 8 5 0 9 9 0 5
1 1 5 8 0 5 9 8 3 9 1
2 9 5 1 1 3 2 6 3 8 3
3 4 0 8 1 7 3 4 2 8 8
4 9 4 8 2 7 9 7 8 9 7
5 1 0 7 3 8 6 1 7 2 0
6 3 6 8 9 6 6 5 0 8 4
7 8 9 9 5 3 9 0 7 5 5
8 5 5 8 7 8 4 3 0 9 9
9 2 4 2 3 0 5 2 0 3 0
I found mean value for a single column like this.How to find the mean for multiple columns with respect to count in pandas.
df['count'] = 1
print(df)
df.groupby('count').agg({'A':'mean'})
A B C D E F G H I J count
0 4 3 2 8 5 0 9 9 0 5 1
1 1 5 8 0 5 9 8 3 9 1 1
2 9 5 1 1 3 2 6 3 8 3 1
3 4 0 8 1 7 3 4 2 8 8 1
4 9 4 8 2 7 9 7 8 9 7 1
5 1 0 7 3 8 6 1 7 2 0 1
6 3 6 8 9 6 6 5 0 8 4 1
7 8 9 9 5 3 9 0 7 5 5 1
8 5 5 8 7 8 4 3 0 9 9 1
9 2 4 2 3 0 5 2 0 3 0 1
A
count
1 4.6
If need mean of all columns per groups by column count use:
df.groupby('count').mean()
If need mean by all rows (like grouping if same values in count) use:
df.mean().to_frame().T

pandas drop duplicate row value from a specific column

I want to remove the duplicate row value from a specific column - in this case the column name is "number".
Before:
number qty status
0 10 2 go
1 10 5 nogo
2 4 6 yes
3 3 1 no
4 2 7 go
5 5 2 nah
6 5 6 go
7 5 3 nogo
8 1 10 yes
9 1 10 go
10 5 2 nah
After:
number qty status
0 10 2 go
5 nogo
1 4 6 yes
2 3 1 no
3 2 7 go
4 5 2 nah
6 go
3 nogo
5 1 10 yes
10 go
6 5 2 nah
It is possible replace values to empty string or NaNs by mask with duplicated by new Series a created by comparing column with shifted column with cumsum:
a = df['number'].ne(df['number'].shift()).cumsum()
#for replace ''
df['number'] = df['number'].mask(a.duplicated(), '')
#for replace NaNs
#df['number'] = df['number'].mask(a.duplicated())
print (df)
number qty status
0 10 2 go
1 5 nogo
2 4 6 yes
3 3 1 no
4 2 7 go
5 5 2 nah
6 6 go
7 3 nogo
8 1 10 yes
9 10 go
10 5 2 nah
Detail:
a = df['number'].ne(df['number'].shift()).cumsum()
print (a)
0 1
1 1
2 2
3 3
4 4
5 5
6 5
7 5
8 6
9 6
10 7
Name: number, dtype: int32

CPlex coding logic

The professor in charge of an industrial engineering design course is faced with the problem of assigning 28 students to 8 projects. Each student must be assigned to one project and each project group must have 3 or 4 students. The students have been asked to rank the projects, with 1 being the best ranking and higher numbers representing lower rankings.
a) Formulate an OPL model for this problem.
b) Solve the assignment problem for the following table of assignments:
A ED EZ G H1 H2 RB SC
Allen 1 3 4 7 7 5 2 6
Black 6 4 2 5 5 7 1 3
Chung 6 2 3 1 1 7 5 4
Clark 7 6 1 2 2 3 5 4
Conners 7 6 1 3 3 4 5 2
Cumming 6 7 4 2 2 3 5 1
Demming 2 5 4 6 6 1 3 7
Eng 4 7 2 1 1 6 3 5
Farmer 7 6 5 2 2 1 3 4
Forest 6 7 2 5 5 1 3 4
Goodman 7 6 2 4 4 5 1 3
Harris 4 7 5 3 3 1 2 6
Holmes 6 7 4 2 2 3 5 1
Johnson 2 2 4 6 6 5 3 1
Knorr 7 4 1 2 2 5 6 3
Manheim 4 7 2 1 1 3 6 5
Morris 7 5 4 6 6 3 1 2
Nathan 4 7 5 6 6 3 1 2
Neuman 7 5 4 6 6 3 1 2
Patrick 1 7 5 4 4 2 3 6
Rollins 6 2 3 1 1 7 5 4
Schuman 4 7 3 5 5 1 2 6
Silver 4 7 3 1 1 2 5 6
Stein 6 4 2 5 5 7 1 3
Stock 5 2 1 6 6 7 4 3
Truman 6 3 2 7 7 5 1 4
Wolman 6 7 4 2 2 3 5 1
Young 1 3 4 7 7 6 2 5
How many students are assigned their second or third choice?
c) Some of the projects are harder than others to reach without a car. Thus, it is desirable that at least a certain number of students assigned to each project must have a car; the numbers vary by project as follows:
A ED EZ G H1 H2 RB SC
1 0 0 2 2 2 1 1
The students who have cars are Chung, Demming, Eng, Holmes, Manheim, Morris, Nathan, Patrick, Rollins and Young.
Modify the model to add this car constraint and solve the problem again. How many more students than before must be assigned second or third choices?
I coded the file for a) & b) but i am getting stuck at c).
can anyone help pls with the logic? even ampl wil suffice
Let C_i be the indicator matrix (input): C_i = 1 if student i has a car and 0 otherwise. I'll assume you have the following decision variables:
x_ij = 1 if student i is assigned to project j; 0 otherwise
then c) constraint can me modeled as follows
sum_i C_i * x_ij >= b_j for all j
where b_j is
j A ED EZ G H1 H2 RB SC
b_j 1 0 0 2 2 2 1 1