Python to convert CSV file to different output - pandas

I have 2 CSV files like so:
sheet1.csv only contains headers
Account_1 Amount_1 Currency_1 Country_1 Date_1
sheet2.csv contains headers and data
Account Currency Amount Date Country
1 GBP 117.89 20/02/2021 UK
2 GBP 129.39 15/02/2021 UK
How can I use pandas to map the data from sheet2 onto the layout of sheet1? I want the data to carry the new column names, in exactly the same order as in sheet1.

First, arrange the columns of sheet2 in the same order as those of sheet1:
sheet2 = sheet2[["Account", "Amount", "Currency", "Country", "Date"]]
This rearranges the columns of sheet2. Then assign the new names:
sheet2.columns = sheet1.columns
The final output of sheet2.head() will be:
Account_1 Amount_1 Currency_1 Country_1 Date_1
1 117.89 GBP UK 20/02/2021
2 129.39 GBP UK 15/02/2021
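The two steps above can be put together in a small runnable sketch. It assumes the files are comma-separated (inline strings stand in for sheet1.csv and sheet2.csv) and that every target name is the source name plus "_1", so the column order can be derived instead of typed by hand. If the names do not follow that pattern, an explicit list as in the answer is the safer choice.

```python
import io
import pandas as pd

# Inline stand-ins for the two files (assumed comma-separated).
sheet1_csv = "Account_1,Amount_1,Currency_1,Country_1,Date_1\n"
sheet2_csv = (
    "Account,Currency,Amount,Date,Country\n"
    "1,GBP,117.89,20/02/2021,UK\n"
    "2,GBP,129.39,15/02/2021,UK\n"
)

sheet1 = pd.read_csv(io.StringIO(sheet1_csv))  # headers only, no rows
sheet2 = pd.read_csv(io.StringIO(sheet2_csv))

# Assumption: each target name is the source name plus this suffix.
suffix = "_1"
source_order = [name[: -len(suffix)] for name in sheet1.columns]

# Reorder to the target layout, then take over the target names.
result = sheet2[source_order].copy()
result.columns = sheet1.columns

print(result)
```

Working on a copy keeps the original sheet2 untouched, which helps when the same source has to be mapped to more than one layout.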

Related

Update database from same table on like matches + reference column

I have a database where I would like to update rows in column B based upon data from like matches in column A and exact matches in column C.
Column A is a SKU. Column C is a pricing category (A - retail, B - small warehouse, C - big warehouse). Rows 1, 7 and 10 are all the same SKU in the same pricing category (A) and I need the pricing data from Column B to match. Row 1 is the correct pricing data, which I want to copy to rows 7 and 10.
The reason Rows 1, 7 and 10 are the same SKU is our ERP utilizes options within the X1234F and X1234P configurations. The main pricing data will always come from SKUs without the X/F/P configuration options.
Sample:
Column A   Column B   Column C
1234       2000       A
1234       1900       B
1234       1800       C
2355       1000       A
2355       900        B
2355       800        C
X1234F     1900       A
X1234F     1800       B
X1234F     1700       C
X1234P     1900       A
X1234P     1800       B
X1234P     1700       C
X2355F     900        A
X2355F     800        B
X2355F     700        C
X2355P     900        A
X2355P     800        B
X2355P     700        C
The data from Column B rows 1-3 should update rows 7-12 and rows 4-6 should update 13-18
UPDATE my_table AS u
SET "Column B" = ref."Column B"
FROM my_table AS ref
WHERE ref."Column A" ~ '^\d*$' -- the ref."Column A" has only digits
AND u."Column A" ~ ('^\D+' || ref."Column A" || '\D+$') -- the u."Column A" is like the ref."Column A" with one or more non-digit characters before and after
AND u."Column C" = ref."Column C"
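To sanity-check the matching rule outside the database, here is a minimal Python sketch that applies the same two patterns to a handful of the sample rows. It is not a replacement for the UPDATE statement: the helper name is_variant_of and the in-memory list are illustrative assumptions, and Python's re module stands in for the database's regex operator.

```python
import re

def is_variant_of(variant_sku, base_sku):
    """Hypothetical helper mirroring the two regex conditions of the UPDATE."""
    # Base SKU: digits only, as in ref."Column A" ~ '^\d*$'.
    if not re.search(r"^\d*$", base_sku):
        return False
    # Variant: one or more non-digits, the base SKU, one or more non-digits.
    pattern = r"^\D+" + re.escape(base_sku) + r"\D+$"
    return re.search(pattern, variant_sku) is not None

# A subset of the sample rows: (Column A, Column B, Column C).
rows = [
    ("1234", 2000, "A"),
    ("1234", 1900, "B"),
    ("2355", 1000, "A"),
    ("X1234F", 1900, "A"),
    ("X1234F", 1800, "B"),
    ("X2355P", 900, "A"),
]

updated = []
for sku, price, category in rows:
    for ref_sku, ref_price, ref_category in rows:
        # Same pricing category and a matching base SKU: copy its price.
        if ref_category == category and is_variant_of(sku, ref_sku):
            price = ref_price
            break
    updated.append((sku, price, category))

for row in updated:
    print(row)
```

The base rows stay as they are, and each X…F or X…P row picks up the price of the digits-only SKU in the same category.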

Pandas Create Variability DF with Multiple Row Averages in Different DF

I've been trying to create a column of variability relative to the mean of the data values for 'A' and 'B' below. I don't understand how to divide each row's single data value, element-wise, by the long-term average for its name. For example, imagine I have data that looks like this in a pandas DataFrame df1:
Year Name Data
1999 A 2
2000 A 4
1999 B 6
2000 B 8
And I have a DF with the long-term mean, called "LTmean", which in this case is 3 for A and 7 for B.
mean_df =
Name Data mean
0 A 3
1 B 7
So, the result would look like this for a new df, where dfnew['Var'] = df1['Data'] / mean_df(???) - 1:
Year Name Var
1999 A -0.3
2000 A 0.3
1999 B -0.14
2000 B 0.14
Thank you for any suggestions on this! Would a loop be the best idea, looping through each column by the "Name" in each DF somehow?
df['Var'] = df1['Data']/LTmean - 1
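For the one-line formula to work row by row, the long-term mean has to be aligned with the rows of df1 first. One way to do that without a loop is to look up each row's mean by its Name with Series.map. The sketch below assumes that the mean column in mean_df is literally named "Data mean"; adjust the name to whatever the real frame uses.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "Year": [1999, 2000, 1999, 2000],
        "Name": ["A", "A", "B", "B"],
        "Data": [2, 4, 6, 8],
    }
)
# Assumption: the long-term mean column is called "Data mean".
mean_df = pd.DataFrame({"Name": ["A", "B"], "Data mean": [3, 7]})

# Look up the long-term mean for each row of df1 by its Name.
lt_mean = df1["Name"].map(mean_df.set_index("Name")["Data mean"])

dfnew = df1[["Year", "Name"]].copy()
dfnew["Var"] = df1["Data"] / lt_mean - 1

print(dfnew.round(2))
```

If the long-term mean is just the per-name mean of df1 itself, df1.groupby("Name")["Data"].transform("mean") yields the same aligned series without a separate mean_df.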

Pandas Display Format on a specific column

So I want to display a single column with a currency format. Basically with a dollar sign, thousand comma separators, and two decimal places.
Input:
Invoice Name Amount Tax
0001 Best Buy 1324 .08
0002 Target 1238593.1 .12
0003 Walmart 10.32 .55
Output:
Invoice Name Amount Tax
0001 Best Buy $1,324.00 .08
0002 Target $1,238,593.10 .12
0003 Walmart $10.32 .55
Note: I still want to be able to do calculations on it, so it would only be a display feature.
If you only need the formatting for printing, you can try:
df.apply(lambda x: [f'${y:,}' for y in x] if x.name == 'Amount' else x)
which creates a new dataframe that looks like:
Invoice Name Amount Tax
0 1 Best Buy $1,324.0 0.08
1 2 Target $1,238,593.1 0.12
2 3 Walmart $10.32 0.55
You can simply add this line (before printing your data frame, of course):
pd.options.display.float_format = '${:,.2f}'.format
It will print every float column in the data frame (and only the float columns) like this:
$12,500.00
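Since the global float_format option affects every float column (Tax included), a per-column alternative may fit the question better. The sketch below uses the formatters argument of DataFrame.to_string so that only Amount is rendered as currency, while the underlying values stay numeric. The frame is rebuilt from the sample input, with Invoice kept as a string (an assumption) to preserve the leading zeros.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Invoice": ["0001", "0002", "0003"],  # assumed stored as strings
        "Name": ["Best Buy", "Target", "Walmart"],
        "Amount": [1324, 1238593.1, 10.32],
        "Tax": [0.08, 0.12, 0.55],
    }
)

# Format only the Amount column at render time; df itself is unchanged.
text = df.to_string(formatters={"Amount": "${:,.2f}".format}, index=False)
print(text)

# The column is still numeric, so calculations keep working.
total = df["Amount"].sum()
print(total)
```

Because the formatting happens only while the text is being built, nothing needs to be converted back before doing arithmetic on the column.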

Reading space delimited text file into SAS

I have a following .txt file:
Mark1[Country1]
type1=1 type2=5
type1=1.50 EUR type2=21.00 EUR
Mark2[Country2]
type1=2 type2=1 type3=1
type1=197.50 EUR type2=201.00 EUR type3= 312.50 EUR
....
I am trying to read it into my SAS program, so that it would look something like this:
Mark Country Type Count Price
1 Mark1 Country1 type1 1 1.50
2 Mark1 Country1 type2 5 21.00
3 Mark1 Country1 type3 NA NA
4 Mark2 Country2 type1 2 197.50
5 Mark2 Country2 type2 1 201.00
6 Mark2 Country2 type3 1 312.50
Or maybe something else, but I need it to be possible to print a two-way report:
Country1 Country2
Type1 ... ...
Type2 ... ...
Type3 ... ...
But the question is how to read that kind of txt file:
read and separate Mark1[Country1] into two columns, Mark and Country;
retain Mark and Country, read the info for each Type (somehow ignoring type1=, maybe using formats) and put it in a table.
Maybe there is a way to use some kind of input templates to achieve that, or nested queries.
You have 3 name/value pairs, but the pairs are split between two rows. An unusual text file requiring creative input. The INPUT statement has a line control feature # to read relative future rows within the implicit DATA Step loop.
Example (Proc REPORT)
Read the mark and country from the current row (relative row #1), the counts from relative row #2 using #2, and the prices from relative row #3 using #3. After the name/value inputs are made for a given mark and country, perform an array-based pivot, transposing two variables (count and price) at a time into a categorical (type) data form.
Proc REPORT produces a 'two-way' listing. The listing is actually a summary report (cells under count and price are a default SUM aggregate), but each cell has only one contributing value so the SUM is the original individual value.
data have(keep=Mark Country Type Count Price);
attrib mark country length=$10;
infile cards delimiter='[ ]' missover;
input mark country;
input #2 @'type1=' count_1 @'type2=' count_2 @'type3=' count_3;
input #3 @'type1=' price_1 @'type2=' price_2 @'type3=' price_3;
array counts count_:;
array prices price_:;
do _i_ = 1 to dim(counts);
Type = cats('type',_i_);
Count = counts(_i_);
Price = prices(_i_);
output;
end;
datalines;
Mark1[Country1]
type1=1 type2=5
type1=1.50 EUR type2=21.00 EUR
Mark2[Country2]
type1=2 type2=1 type3=1
type1=197.50 EUR type2=201.00 EUR type3= 312.50 EUR
;
ods html file='twoway.html';
proc report data=have;
column type country,(count price);
define type / group;
define country / ' ' across;
run;
ods html close;
Combined aggregation
proc means nway data=have noprint;
class type country;
var count price;
output out=stats max(price)=price_max sum(count)=count_sum;
run;
data cells;
set stats;
if not missing(price_max) then
cell = cats(price_max,'(',count_sum,')');
run;
proc transpose data=cells out=twoway(drop=_name_);
by type;
id country;
var cell;
run;
proc print noobs data=twoway;
run;
You can specify the name of a variable in the DLM= option on the INFILE statement. That way you can change the delimiter depending on the type of line being read.
It looks like you have three lines per group. The first one has the MARK and COUNTRY values. The second one has a list of COUNT values and the third one has a list of PRICE values. So something like this should work.
data want ;
length dlm $2 ;
length Mark $8 Country $20 rectype $8 recno 8 type $10 value1 8 value2 $8 ;
infile cards dlm=dlm truncover ;
dlm='[]';
input mark country ;
dlm='= ';
do rectype='Count','Price';
do recno=1 by 1 until(type=' ');
input type value1 @;
if rectype='Price' then input value2 @;
if type ne ' ' then output;
end;
input;
end;
cards;
Mark1[Country1]
type1=1 type2=5
type1=1.50 EUR type2=21.00 EUR
Mark2[Country2]
type1=2 type2=1 type3=1
type1=197.50 EUR type2=201.00 EUR type3= 312.50 EUR
;
Results:
Obs Mark Country rectype recno type value1 value2
1 Mark1 Country1 Count 1 type1 1.0
2 Mark1 Country1 Count 2 type2 5.0
3 Mark1 Country1 Price 1 type1 1.5 EUR
4 Mark1 Country1 Price 2 type2 21.0 EUR
5 Mark2 Country2 Count 1 type1 2.0
6 Mark2 Country2 Count 2 type2 1.0
7 Mark2 Country2 Count 3 type3 1.0
8 Mark2 Country2 Price 1 type1 197.5 EUR
9 Mark2 Country2 Price 2 type2 201.0 EUR
10 Mark2 Country2 Price 3 type3 312.5 EUR

How to copy all column values of a dataframe into new columns of another one according to the index of the first and a column value of the second

I have the following dataframe 1 df1:
index tech_1 tech_2 tech_3 .....
01_es NA NA 1
02_es 1 2 NA
03_es 2 1 2
04_es 1 NA 2
05_es NA NA NA
and another dataframe 2 df2:
index id column_1 column_2 column_3
0 01_es data data data
1 02_es data data data
2 03_es data data data
3 04_es data data data
4 05_es data data data
I want to "merge" df1 into df2 wherever df1.index matches df2.id, adding the columns of df1 as new columns of df2 while keeping all the data in df2. I will do this with several df1 frames.
new df2:
index id column_1 column_2 column_3 tech_1 tech_2 tech_3
0 01_es data data data NA NA 1
1 02_es data data data 1 2 NA
2 03_es data data data 2 1 2
3 04_es data data data 1 NA 2
4 05_es data data data NA NA NA
df1 can be quite large, with a varying number of columns, and it will probably not contain a row for every df2.id. I have several files to run the script over. How can I do it?
Thanks!
See the documentation for DataFrame.merge: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
If the id column of dataframe df2 is guaranteed to be unique, you can set it as index and do the merge.
df2.set_index('id').merge(df1, left_index=True, right_index=True)
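Any extra columns in df1 are simply carried over. For the cases where df1 does not have a row for every id in df2, you can choose which rows are kept by passing the "how" parameter to merge (for example, how='left' keeps every row of df2); read the documentation for details.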
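As a concrete sketch of that idea, the example below keeps id as a regular column by merging with left_on='id' and right_index=True, and uses how='left' so that every row of df2 survives even when df1 has no matching index. The frames are cut-down versions of the ones in the question, with 05_es deliberately missing from df1 to show the effect.

```python
import numpy as np
import pandas as pd

# df1 is indexed by id; 05_es is left out on purpose.
df1 = pd.DataFrame(
    {
        "tech_1": [np.nan, 1, 2, 1],
        "tech_2": [np.nan, 2, 1, np.nan],
        "tech_3": [1, np.nan, 2, 2],
    },
    index=["01_es", "02_es", "03_es", "04_es"],
)

df2 = pd.DataFrame(
    {
        "id": ["01_es", "02_es", "03_es", "04_es", "05_es"],
        "column_1": ["data"] * 5,
        "column_2": ["data"] * 5,
        "column_3": ["data"] * 5,
    }
)

# Match df2.id against df1's index; how="left" keeps all rows of df2.
merged = df2.merge(df1, how="left", left_on="id", right_index=True)

print(merged)
```

Rows of df2 without a match in df1 get NaN in the tech_ columns. To process several df1 frames, the same call can be repeated in a loop, reassigning the result to df2 each time, provided the column names of the different df1 frames do not collide.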