Rename column titles of unstacked data - pandas

I have a data table derived via unstacking an existing dataframe:
Day    0    1    2    3    4    5    6
Hrs
0    223  231  135  122  099  211  217
1    156  564  132  414  156  454  157
2    950  178  121  840  143  648  192
3    025  975  151  185  341  145  888
4    111  264  469  330  671  201  345
--    --   --   --   --   --   --   --
I want to simply change the column titles so I have the days of the week displayed instead of numbered. Something like this:
Day  Mon  Tue  Wed  Thu  Fri  Sat  Sun
Hrs
0    223  231  135  122  099  211  217
1    156  564  132  414  156  454  157
2    950  178  121  840  143  648  192
3    025  975  151  185  341  145  888
4    111  264  469  330  671  201  345
--   --   --   --   --   --   --   --
I've tried .rename(columns = {'original':'new', etc}, inplace = True) and other similar functions, none of which have worked.
I also tried going back to the original dataframe and creating a dt.day_name() column from the parsed dates, but it came out with the days of the week mixed up.
I'm sure it's a simple fix, but I'm living off nothing but caffeine, so help would be appreciated.

One likely reason .rename did nothing is that after unstacking the column labels are the integers 0-6, not strings, so a string-keyed mapping matches nothing. You can assign the new labels directly:
import pandas as pd

df = pd.DataFrame(columns=[0, 1, 2, 3, 4, 5, 6])  # stand-in for the unstacked frame
df.columns = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

Related

create new column from divided columns over iteration

I am working with the following code:
url = 'https://raw.githubusercontent.com/dothemathonthatone/maps/master/fertility.csv'
df = pd.read_csv(url)
year regional_schlüssel Aus15 Deu15 Aus16 Deu16 Aus17 Deu17 Aus18 Deu18 ... aus36 aus37 aus38 aus39 aus40 aus41 aus42 aus43 aus44 aus45
0 2000 5111000 0 4 8 25 20 45 56 89 ... 935 862 746 732 792 660 687 663 623 722
1 2000 5113000 1 1 4 14 13 33 19 48 ... 614 602 498 461 521 470 393 411 397 400
2 2000 5114000 0 11 0 5 2 13 7 20 ... 317 278 265 235 259 228 204 173 213 192
3 2000 5116000 0 2 2 7 3 28 13 26 ... 264 217 206 207 197 177 171 146 181 169
4 2000 5117000 0 0 3 1 2 4 4 7 ... 135 129 118 116 128 148 89 110 124 83
I would like to create a new set of columns fertility_deu15, ..., fertility_deu45 and fertility_aus15, ..., fertility_aus45 such that fertility_aus15 = aus15 / Aus15 and fertility_deu15 = deu15 / Deu15, and likewise fertility_ausi = ausi / Ausi and fertility_deui = deui / Deui for each i in [15, 45].
I'm not sure what is up with that data, but we need to fix it to make it numeric. I'll end up doing that while filtering:
numerator = df.filter(regex=r'^[a-z]+\d+$')  # lower-case columns: aus15 ... aus45, deu15 ...
numerator = numerator.apply(pd.to_numeric, errors='coerce')  # coerce bad values to NaN
denominator = df.filter(regex=r'^[A-Z][a-z]+\d+$').rename(columns=str.lower)
denominator = denominator.apply(pd.to_numeric, errors='coerce')
numerator.div(denominator).add_prefix('fertility_')
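To keep the new columns alongside the originals, assign the result and join it back (same data as above):
fertility = numerator.div(denominator).add_prefix('fertility_')
df = df.join(fertility)  # adds fertility_aus15 ... fertility_deu45 to df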

How to aggregate multiple columns - Pandas

I have this df:
ID Date XXX 123_Var 456_Var 789_Var 123_P 456_P 789_P
A 07/16/2019 1 987 551 313 22 12 94
A 07/16/2019 9 135 748 403 92 40 41
A 07/18/2019 8 376 938 825 14 69 96
A 07/18/2019 5 259 176 674 52 75 72
B 07/16/2019 9 690 304 948 56 14 78
B 07/16/2019 8 819 185 699 33 81 83
B 07/18/2019 1 580 210 847 51 64 87
I want to group the df by ID and Date, aggregate the XXX column by the maximum value, and aggregate 123_Var, 456_Var, 789_Var columns by the minimum value.
* Note: The df contains many such columns; their names follow the pattern {some int}_Var.
This is the current code I've started to write:
df = (df.groupby(['ID','Date'], as_index=False)
.agg({'XXX':'max', list(df.filter(regex='_Var')): 'min'}))
Expected result:
ID Date XXX 123_Var 456_Var 789_Var
A 07/16/2019 9 135 551 313
A 07/18/2019 8 259 176 674
B 07/16/2019 9 690 185 699
B 07/18/2019 1 580 210 847
Create the dictionary dynamically with dict.fromkeys, merge it with the {'XXX':'max'} dict, and pass the result to GroupBy.agg:
d = dict.fromkeys(df.filter(regex='_Var').columns, 'min')
df = df.groupby(['ID','Date'], as_index=False).agg({**{'XXX':'max'}, **d})
print(df)
ID Date XXX 123_Var 456_Var 789_Var
0 A 07/16/2019 9 135 551 313
1 A 07/18/2019 8 259 176 674
2 B 07/16/2019 9 690 185 699
3 B 07/18/2019 1 580 210 847
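The same aggregation map can also be built with a dict comprehension, starting again from the original df; the result is identical:
agg_map = {'XXX': 'max', **{c: 'min' for c in df.columns if c.endswith('_Var')}}
df = df.groupby(['ID', 'Date'], as_index=False).agg(agg_map)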

How to query using an array of columns on SQL Server 2008

Can you please help with this? I'm trying to write a query which retrieves a total amount from an array of columns. I don't know if there is a way to do this. I retrieve the array of columns I need from this query:
USE Facebook_Global
GO
SELECT c.name AS column_name
FROM sys.tables AS t
INNER JOIN sys.columns AS c
ON t.OBJECT_ID = c.OBJECT_ID
WHERE t.name LIKE '%Lifetime Likes by Gender and Age%' and c.name like '%m%'
Which gives me this table
column_name
M#13-17
M#18-24
M#25-34
M#35-44
M#45-54
M#55-64
M#65+
So I need a query that gives me a TotalAmount of the columns listed in that table. Is this possible?
Just to clarify a little:
I have this table
Date F#13-17 F#18-24 F#25-34 F#35-44 F#45-54 F#55-64 F#65+ M#13-17 M#18-24 M#25-34 M#35-44 M#45-54 M#55-64 M#65+
2015-09-06 00:00:00.000 257 3303 1871 572 235 116 71 128 1420 824 251 62 32 30
2015-09-07 00:00:00.000 257 3302 1876 571 234 116 72 128 1419 827 251 62 32 30
2015-09-08 00:00:00.000 257 3304 1877 572 234 116 73 128 1421 825 253 62 32 30
2015-09-09 00:00:00.000 257 3314 1891 575 236 120 73 128 1438 828 254 62 33 30
2015-09-10 00:00:00.000 259 3329 1912 584 245 131 76 128 1460 847 259 66 37 31
2015-09-11 00:00:00.000 259 3358 1930 605 248 136 79 128 1475 856 261 67 39 31
2015-09-12 00:00:00.000 259 3397 1953 621 255 139 79 128 1486 864 264 68 41 31
2015-09-13 00:00:00.000 259 3426 1984 642 257 144 80 129 1499 883 277 74 42 32
And I need one column with a SUM of all the columns whose names contain F, and another for those containing M, instead of writing something like this:
F#13-17+F#18-24+F#25-34+F#35-44+F#45-54+etc.
Is this possible?
Try something like this:
with derivedTable as
(
-- the SELECT from your question goes here (without the USE/GO lines)
)
select column_name
from derivedTable
union
select cast(count(*) as varchar(10)) + ' records'
from derivedTable
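To actually total the matching columns without typing each name, a common approach on SQL Server 2008 is to build the SUM expression dynamically from sys.columns. A sketch, using [Lifetime Likes by Gender and Age] as a hypothetical stand-in for the real table name:
DECLARE @expr nvarchar(max), @sql nvarchar(max);

-- build "[M#13-17]+[M#18-24]+..." from the matching column names
SELECT @expr = STUFF((
    SELECT '+' + QUOTENAME(c.name)
    FROM sys.tables AS t
    INNER JOIN sys.columns AS c ON t.object_id = c.object_id
    WHERE t.name LIKE '%Lifetime Likes by Gender and Age%'
      AND c.name LIKE 'M%'
    FOR XML PATH('')), 1, 1, '');

SET @sql = N'SELECT [Date], ' + @expr + N' AS TotalM FROM [Lifetime Likes by Gender and Age]';
EXEC sp_executesql @sql;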

Compare two files according to first column and print whole line

I will ask my question with an example. I have 2 files:
File1-
TR100013|c0_g1
TR100013|c0_g2
TR10009|c0_g1
TR10009|c0_g2
File2-
TR100013|c0_g1 AT1G01360.1 78.79 165 35 0 301 795 19 183 2E-089 272
TR100013|c0_g2 AT1G01360.1 78.79 165 35 0 301 795 19 183 2E-089 272
TR10009|c0_g1 AT1G16240.3 77.42 62 14 0 261 76 113 174 4E-025 95.9
TR10009|c0_g2 AT1G16240.2 69.17 120 37 0 1007 648 113 232 2E-050 171
TR29295|c0_g1 AT1G22540.1 69.19 172 53 2 6 521 34 200 2E-053 180
TR49005|c5_g1 AT5G24530.1 69.21 302 90 1 909 13 39 340 5E-157 446
Expected Output :
TR100013|c0_g1 AT1G01360.1 78.79 165 35 0 301 795 19 183 2E-089 272
TR100013|c0_g2 AT1G01360.1 78.79 165 35 0 301 795 19 183 2E-089 272
TR10009|c0_g1 AT1G16240.3 77.42 62 14 0 261 76 113 174 4E-025 95.9
TR10009|c0_g2 AT1G16240.2 69.17 120 37 0 1007 648 113 232 2E-050 171
I want to compare the two files: if the first column of a line in File2 also appears in File1, print that whole line from File2.
Using awk:
awk 'NR==FNR{a[$1]++;next};a[$1]' file1 file2
NR==FNR is true only while reading file1, so a[$1]++ records each first column; for file2 the bare a[$1] condition prints the lines whose first field was recorded.
grep can do much the same, though it matches the patterns anywhere on the line rather than only in the first column:
grep -wf file1 file2
-w matches whole words only.
-f reads the patterns from the given file.
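For completeness, the same filter works in pandas as well. A sketch, assuming the files are named file1 and file2 and that file2 is whitespace-delimited:
import pandas as pd

keys = set(pd.read_csv("file1", header=None)[0])    # first-column IDs from file1
f2 = pd.read_csv("file2", header=None, sep=r"\s+")  # one column per field
print(f2[f2[0].isin(keys)].to_string(index=False, header=False))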

GROUP BY clause in SQL command

I have 3 tables: Deliveries, IssuedWarehouse, ReturnedStock.
Deliveries: ID, OrderNumber, Material, Width, Gauge, DelKG
IssuedWarehouse: OrderNumber, IssuedKG
ReturnedStock: OrderNumber, IssuedKG
What I'd like to do is group all the orders by Material, Width and Gauge and then sum the amount delivered, issued to the warehouse and issued back to stock.
This is the SQL that is really quite close:
SELECT
DELIVERIES.Material,
DELIVERIES.Width,
DELIVERIES.Gauge,
Count(DELIVERIES.OrderNo) AS [Orders Placed],
Sum(DELIVERIES.DeldQtyKilos) AS [KG Delivered],
Sum(IssuedWarehouse.[Qty Issued]) AS [Film Issued],
Sum([Film Retns].[Qty Issued]) AS [Film Returned],
[KG Delivered]-[Film Issued]+[Film Returned] AS [Qty Remaining]
FROM (DELIVERIES
INNER JOIN IssuedWarehouse
ON DELIVERIES.OrderNo = IssuedWarehouse.[Order No From])
INNER JOIN [Film Retns]
ON DELIVERIES.OrderNo = [Film Retns].[Order No From]
GROUP BY Material, Width, Gauge, ActDelDate
HAVING ActDelDate Between [start date] And [end date]
ORDER BY DELIVERIES.Material;
This groups the products almost perfectly. However if you take a look at the results:
Material Width Gauge Orders Placed Delivered Qnty Kilos Film Issued Film Returned Qty Remaining
COEX-GLOSS 590 75 1 534 500 124 158
COEX-MATT 1080 80 1 4226 4226 52 52
CPP 660 38 8 6720 2768 1384 5336
CPP 666 47 1 5677 5716 536 497
CPP 690 65 2 1232 717 202 717
CPP 760 38 3 3444 1318 510 2636
CPP 770 38 4 4316 3318 2592 3590
CPP 786 38 2 672 442 212 442
CPP 800 47 1 1122 1122 116 116
CPP 810 47 1 1127 1134 69 62
CPP 810 47 2 2250 1285 320 1285
CPP 1460 38 12 6540 4704 2442 4278
LD 975 75 1 502 502 182 182
LDPE 450 50 1 252 252 50 50
LDPE 520 70 1 250 250 95 95
LDPE 570 65 2 504 295 86 295
LDPE 570 65 2 508 278 48 278
LDPE 620 50 1 252 252 67 67
LDPE 660 50 1 256 256 62 62
LDPE 670 75 1 248 248 80 80
LDPE 690 47 1 476 476 390 390
LDPE 790 38 2 2104 1122 140 1122
LDPE 790 50 1 286 286 134 134
LDPE 790 50 1 250 250 125 125
LDPE 810 30 1 4062 4062 100 100
LDPE 843 33 1 408 408 835 835
LDPE 850 80 1 412 412 34 34
LDPE 855 30 1 740 740 83 83
LDPE 880 60 1 304 304 130 130
LDPE 900 70 2 1000 650 500 850
LDPE 1017 60 1 1056 1056 174 174
OPP 25 1100 1 381 381 95 95
OPP 1000 30 2 1358 1112 300 546
OPP 1000 30 1 1492 1491 100 101
OPP 1200 20 1 418 417 461 462
PET 760 12 3 1227 1876 132 -517
You'll see that there are some materials that have the same width and gauge yet they are not grouped. I think this is because the delivered qty is different on the orders. For example:
Material Width Gauge Orders Placed Delivered Qnty Kilos Film Issued Film Returned Qty Remaining
LDPE 620 50 1 252 252 67 67
LDPE 660 50 1 256 256 62 62
I would like these two rows to be grouped. They have the same material, width and gauge but the delivered qty is different therefore it hasn't grouped it.
Can anyone help me group these strange rows?
Your "problem" is that the deliveries occurred on different dates, and you're grouping by ActDelDate so the data splits, but because you haven't selected the ActDelDate column, this isn't obvious.
The fix: remove ActDelDate from the GROUP BY list.
You should also remove the unnecessary brackets around the first join, and change
HAVING ActDelDate Between [start date] And [end date]
to
WHERE ActDelDate Between [start date] And [end date]
placed before the GROUP BY: HAVING filters after grouping, while WHERE filters the rows before they are grouped, which is what a date-range restriction should do.
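With those changes applied, the query would look something like this (a sketch using the question's own names; if this is MS Access, keep the parentheses around the first join):
SELECT
DELIVERIES.Material,
DELIVERIES.Width,
DELIVERIES.Gauge,
Count(DELIVERIES.OrderNo) AS [Orders Placed],
Sum(DELIVERIES.DeldQtyKilos) AS [KG Delivered],
Sum(IssuedWarehouse.[Qty Issued]) AS [Film Issued],
Sum([Film Retns].[Qty Issued]) AS [Film Returned],
Sum(DELIVERIES.DeldQtyKilos) - Sum(IssuedWarehouse.[Qty Issued]) + Sum([Film Retns].[Qty Issued]) AS [Qty Remaining]
FROM DELIVERIES
INNER JOIN IssuedWarehouse
ON DELIVERIES.OrderNo = IssuedWarehouse.[Order No From]
INNER JOIN [Film Retns]
ON DELIVERIES.OrderNo = [Film Retns].[Order No From]
WHERE DELIVERIES.ActDelDate Between [start date] And [end date]
GROUP BY DELIVERIES.Material, DELIVERIES.Width, DELIVERIES.Gauge
ORDER BY DELIVERIES.Material;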
You are grouping by the delivery date, which is causing the rows to be split. Either omit the delivery date from the results and group by, or take the min/max of the delivery date.