I'd like to create a pivot table with the counts of the values in a list column, broken down by another column, but I'm not sure how to use pandas pivot_table (or another function) with a list.
Here's an example of what I'd like to do:
| Col1 | Col2 |
| --- | ----------- |
| A | ["e", "f"] |
| B | ["g", "f"] |
| C | ["g", "h"] |
| A | ["e", "g"] |
| B | ["g", "f"] |
| C | ["g", "e"] |
Ideal pivot table:
| 1 | 2 | count |
| --- | --- | --- |
| A | e | 2 |
| | f | 1 |
| | g | 1 |
| B | g | 2 |
| | f | 2 |
| C | g | 2 |
| | h | 1 |
| | e | 1 |
I cannot use a list to make a pivot table and am struggling to figure out how to modify the data or find a different method. Any help would be much appreciated!
Try this:
cols = ['Col1','Col2']
df.explode('Col2').groupby(cols).size()
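As a minimal, self-contained sketch of the answer above, assuming `Col2` holds real Python lists (not their string representations), the example data from the question can be counted like this. The `reset_index(name="count")` step is an addition that turns the result into a flat table.

```python
import pandas as pd

# Example data from the question; Col2 holds Python lists.
df = pd.DataFrame({
    "Col1": ["A", "B", "C", "A", "B", "C"],
    "Col2": [["e", "f"], ["g", "f"], ["g", "h"],
             ["e", "g"], ["g", "f"], ["g", "e"]],
})

# explode() gives every list element its own row, so an ordinary
# groupby can then count each (Col1, Col2) pair.
counts = (
    df.explode("Col2")
      .groupby(["Col1", "Col2"])
      .size()
      .reset_index(name="count")
)
print(counts)
```

If `Col2` was read from a CSV file, the lists may arrive as strings and would need parsing (for example with `ast.literal_eval`) before `explode` does anything useful.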
I have some dataframes (df, tmp_df and final_df) and I want to put two columns of tmp_df into two different cells of final_df as lists. My code and dataframes are shown below. However, the loop is not working correctly. Other questions on Stack Overflow and other websites answer this only when a dictionary of the lists is available from the start of the program. Here, the tmp_df dataframe changes during the for loop, and suitable prices are calculated at each iteration. Also, the most closely related data are found and must be placed in the corresponding cell of final_df.
import pandas as pd
df = pd.read_csv('myfile.csv')
tmp_df = pd.DataFrame()
final_df = pd.DataFrame()
tmp_df = df[df['Type'] == True]
cnt = 0
for c in tmp_df['Category']:
    #################
    # Apply some calculations and call different methods to do some changes on Price column of tmp_df.
    #################
    final_df.at[cnt,'Data'] = list(set(tmp_sub['Data']))
    final_df ['Category'], final_df['Acceptable'], final_df['Rank'],final_df['Price'] = \
        tmp_df['Rank'], list(tmp_sub['Price'])
    cnt +=1
df:
| Data | Category | Acceptable | Rank | Price |
| ------- | -------- | ---------- | ---- | ----- |
| 30275 | A | Yes | 1 | 52787 |
| 35881 | C | No | 2 | 14804 |
| 28129 | C | Yes | 3 | 180543|
| 30274 | D | No | 2 | 8066 |
| 30351 | D | Yes | 3 | 273478|
| 35886 | A | Yes | 2 | 10808 |
| 39900 | C | Yes | 1 | 21893 |
| 35887 | A | No | 2 | 2244 |
| 35883 | A | Yes | 1 | 10066 |
| 35856 | D | Yes | 3 | 19011 |
| 35986 | C | No | 2 | 6895 |
| 30350 | D | No | 3 | 5243 |
| 28129 | C | Yes | 1 | 112859|
| 31571 | C | Yes | 1 | 20701 |
tmp_df:
| Data | Category | Acceptable | Rank | Price |
| ------- | -------- | ---------- | ---- | ----- |
| 30275 | A | Yes | 1 | 52787 |
| 38129 | C | Yes | 3 | 180543|
| 30351 | D | Yes | 3 | 273478|
| 35886 | A | Yes | 2 | 10808 |
| 39900 | C | Yes | 1 | 21893 |
| 35883 | A | Yes | 1 | 10066 |
| 35856 | D | Yes | 3 | 19011 |
| 28129 | C | Yes | 1 | 112859|
| 31571 | C | Yes | 1 | 20701 |
The prices in the final dataframe (final_df) are changed because of the calculations over the tmp_df. Now, what should I do if I want the following result?
final_df:
| Data | Category | Acceptable | Rank | Price |
| ------- | -------- | ---------- | ---- | ----- |
| [30275,35886,35883] | A | Yes | [1,2]| 195543|
| [28129,39900,38129,31571] | C | Yes | [1,3]| 210089|
| [30351,35856] | D | Yes | 3 | 113859|
You can aggregate Data and Rank into lists and apply a different aggregation function to Price, e.g. sum, mean or max:
# custom aggregation function for Price
def func(x):
    return x.sum()
d = {'Data':list,'Rank':lambda x: list(set(x)), 'Price':func}
final_df = (tmp_df.groupby(['Category','Acceptable'],as_index=False)
                  .agg(d)
                  .reindex(tmp_df.columns, axis=1))
# or pass a built-in aggregation by name, here 'max'
d = {'Data':list,'Rank':lambda x: list(set(x)), 'Price':'max'}
final_df = (tmp_df.groupby(['Category','Acceptable'],as_index=False)
                  .agg(d)
                  .reindex(tmp_df.columns, axis=1))
print (final_df)
Data Category Acceptable Rank Price
0 [30275, 35886, 35883] A Yes [1, 2] 52787
1 [38129, 39900, 28129, 31571] C Yes [1, 3] 180543
2 [30351, 35856] D Yes [3] 273478
Solution with custom function:
def func1(x):
    return x.sum()
def f(x):
    a = list(x['Data'])
    b = list(set(x['Rank']))
    c = func1(x['Price'])
    return pd.Series({'Data':a,'Rank':b,'Price':c})
final_df = (tmp_df.groupby(['Category','Acceptable'])
                  .apply(f)
                  .reset_index()
                  .reindex(tmp_df.columns, axis=1))
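For reference, here is a runnable sketch of the same grouping on the tmp_df rows shown in the question. It is a variation, not the code above: it uses pandas named aggregation, sums Price (the question's real price calculation is not shown, so `sum` is only a stand-in), and sorts the Rank values so the result is deterministic.

```python
import pandas as pd

# tmp_df rows copied from the question.
tmp_df = pd.DataFrame({
    "Data": [30275, 38129, 30351, 35886, 39900, 35883, 35856, 28129, 31571],
    "Category": list("ACDACADCC"),
    "Acceptable": ["Yes"] * 9,
    "Rank": [1, 3, 3, 2, 1, 1, 3, 1, 1],
    "Price": [52787, 180543, 273478, 10808, 21893, 10066, 19011, 112859, 20701],
})

final_df = (
    tmp_df.groupby(["Category", "Acceptable"], as_index=False)
          .agg(
              Data=("Data", list),                        # all Data values per group
              Rank=("Rank", lambda s: sorted(set(s))),    # unique ranks, sorted
              Price=("Price", "sum"),                     # stand-in aggregation
          )
          .reindex(tmp_df.columns, axis=1)                # restore column order
)
print(final_df)
```

Swapping `"sum"` for `"max"`, `"mean"` or a custom function changes only the Price column.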
I have a set of car data as follows:
| class | car |
| --- | --- |
| S | Hilux |
| M | Hilux |
| M | Toyota|
| M | Hilux |
| S | toyota|
| S | toyota|
| L | toyota|
And I want to show it as below:
| class | Hilux | Toyota |
| --- | --- | --- |
| S | 1 | 2 |
| M | 2 | 1 |
| L | 0 | 1 |
How can this be done using MS Access?
This might work:
TRANSFORM COUNT(car)
SELECT class
FROM Table_name
GROUP BY class
PIVOT car;
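For comparison only, the same cross-tabulation can be sketched outside Access with pandas `crosstab`. This is not part of the Access answer. Normalizing the mixed-case car names with `str.capitalize()` is an assumption made here so that "toyota" and "Toyota" land in one column.

```python
import pandas as pd

# Rows copied from the question, including the mixed-case car names.
cars = pd.DataFrame({
    "class": ["S", "M", "M", "M", "S", "S", "L"],
    "car": ["Hilux", "Hilux", "Toyota", "Hilux", "toyota", "toyota", "toyota"],
})

# Normalize case, then count one row per class and one column per car.
table = pd.crosstab(cars["class"], cars["car"].str.capitalize())
print(table)
```

Unlike a plain `TRANSFORM COUNT(...)`, `crosstab` fills class/car combinations with no rows with 0.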
I am trying to find the SQL command to do something but I don't know how to explain it so I'll use an example. I have a table like so:
| one | two | three | four |
|-----|-----|-------|------|
| a | h | i | j |
| b | k | l | |
| c | m | n | o |
| d | p | | |
| e | q | | |
| f | r | s | |
| g | t | | |
I need to create new columns that take the first non-null column from the right and kind of reverse it going up and joining/concatenating the fields.
| one | 1-up | 2-up | 3-up |
|-----|------|------|---------|
| a | j | j, i | j, i, h |
| b | l | l, k | |
| c | o | o, n | o, n, m |
| d | p | | |
| e | q | | |
| f | s | s, r | |
| g | t | | |
For b, since column four doesn't have data it uses three as the first value. Same for the other rows.
I hope this makes sense. I'm not sure how else to explain this.
You can use COALESCE like this:
select one, COALESCE(four,three,two,'') as '1-up',
       COALESCE(four+','+three,three+','+two,'') as '2-up',
       COALESCE(four+','+three+','+two,'') as '3-up'
from Table1
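To make the intended logic easier to check against the expected table, here is a small Python sketch of the same rule. The function name `up_columns` is hypothetical. It assumes empty cells are represented by `None` and that blanks only occur at the right-hand end of a row, as in the example.

```python
def up_columns(values, depth=3):
    """values: the cells of columns two..four, left to right; None marks an empty cell.

    Returns the 1-up, 2-up, ... strings, leaving a level empty when the
    row does not have enough filled cells to build it.
    """
    # Walk from the right and keep only the filled cells.
    filled = [v for v in reversed(values) if v is not None]
    return [
        ", ".join(filled[:k]) if len(filled) >= k else ""
        for k in range(1, depth + 1)
    ]

print(up_columns(["h", "i", "j"]))    # row a
print(up_columns(["k", "l", None]))   # row b
print(up_columns(["p", None, None]))  # row d
```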
I have a table (in an MS Access DB) with every employee at a company. The table also has a column to indicate each employee's manager. There are columns (with no data) to indicate that employee's full management chain to the CEO. I need to determine/obtain/find and fill this data using SQL (and/or VBA).
I know what needs to be done but I'm drawing a complete blank on how to do it efficiently.
I know I could go row by row but that seems so inefficient. There has to be a better way.
For example, take this table below:
+------------+-----------+----------+----------+----------+
| employeeID | managerID | manager1 | manager2 | manager3 |
+------------+-----------+----------+----------+----------+
| a | | | | |
| b | a | | | |
| c | a | | | |
| d | a | | | |
| e | b | | | |
| f | b | | | |
| g | c | | | |
| h | c | | | |
| i | d | | | |
| j | e | | | |
| k | f | | | |
| l | g | | | |
+------------+-----------+----------+----------+----------+
a has no manager (CEO)
b's manager is a
l's manager is g whose manager is c whose manager is a
etc...
So this would result in the table:
+------------+-----------+----------+----------+----------+
| employeeID | managerID | manager1 | manager2 | manager3 |
+------------+-----------+----------+----------+----------+
| a | | | | |
| b | a | a | | |
| c | a | a | | |
| d | a | a | | |
| e | b | a | b | |
| f | b | a | b | |
| g | c | a | c | |
| h | c | a | c | |
| i | d | a | d | |
| j | e | a | b | e |
| k | f | a | b | f |
| l | g | a | c | g |
+------------+-----------+----------+----------+----------+
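The chain-walking described above ("l's manager is g whose manager is c whose manager is a") can be sketched in Python to pin down the expected result before writing the SQL or VBA. This is only an illustration of the logic, not an Access solution. The function name `management_chain` and the dictionary representation are assumptions.

```python
# employeeID -> managerID, copied from the example table (None marks the CEO).
manager_of = {
    "a": None, "b": "a", "c": "a", "d": "a",
    "e": "b", "f": "b", "g": "c", "h": "c",
    "i": "d", "j": "e", "k": "f", "l": "g",
}

def management_chain(employee, manager_of):
    """Return the managers above `employee`, ordered from the CEO down."""
    chain = []
    seen = {employee}
    manager = manager_of.get(employee)
    while manager is not None:
        if manager in seen:  # guard against circular data
            raise ValueError(f"cycle in management data at {manager!r}")
        seen.add(manager)
        chain.append(manager)
        manager = manager_of.get(manager)
    return chain[::-1]  # reverse so manager1 is the CEO

chains = {emp: management_chain(emp, manager_of) for emp in manager_of}
for emp, chain in chains.items():
    print(emp, chain)
```

Each list maps directly onto the manager1, manager2, manager3 columns of the desired table.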
I have a sheet like this:
| A | B | C | D | E | F | G | H | ...
---------------------------------
| a | 1 | | b | 2 | | c | 7 |
---------------------------------
| b | 2 | | c | 8 | | b | 4 |
---------------------------------
| c |289| | a | 3 | | a |118|
---------------------------------
| d | 6 | | e | 3 | | e |888|
---------------------------------
| e | 8 | | d |111| | d |553|
---------------------------------
I want the sheet to become like this:
| A | B | C | D | E | F | G | H | ...
---------------------------------
| a | 1 | 3 |118| | | | |
---------------------------------
| b | 2 | 2 | 4 | | | | |
---------------------------------
| c |289| 8 | 7 | | | | |
---------------------------------
| d | 6 |111|553| | | | |
---------------------------------
| e | 8 | 3 |888| | | | |
---------------------------------
Col A, Col D and Col G contain letters that are unique within each column, and the column next to each one holds the corresponding weights.
To make it even more clear,
| A | B |
---------
| a | 1 |
---------
| b | 2 |
---------
| c |289|
...
are the weights of a,b,c... in January
Similarly, | D | E | are the weights of a,b,c... in July and | G | H | are the weights of a,b,c... in December.
I need to put them side by side for comparison, but they are NOT in the same order.
How do I approach this?
UPDATE
There are thousands of a,b,c, aa, bb, cc, aaa, avb, as, saf, sfa etc.. and some of them MAY be present in January (Col A) and not in July (Col D)
Something like this
Sub Squeeze()
    [c1:c5] = Application.Index([E1:E5], Evaluate("IF(A1:A5<>"""",MATCH(A1:A5,D1:D5,0),A1:A5)"), 1)
    [d1:d5] = Application.Index([H1:h5], Evaluate("IF(A1:A5<>"""",MATCH(A1:A5,G1:G5,0),A1:A5)"), 1)
    [e1:h5].ClearContents
End Sub
Explanation of first line
Application.Index([E1:E5], Evaluate("IF(A1:A5<>"""",MATCH(A1:A5,D1:D5,0),A1:A5)"), 1)
The MATCH returns a VBA array matching the positions (5) of A1:A5 against D1:D5
INDEX then returns the corresponding values from E1:E5
So to use the key column A1:A100 against M1:M100, with values in N1:N100:
Application.Index([N1:N100], Evaluate("IF(A1:A100<>"""",MATCH(A1:A100,M1:M100,0),A1:A100)"), 1)
Extend as necessary: sort D:E by D ascending, sort G:H by G ascending, then delete columns G, F, D and C. If you want VBA, do this with Record Macro turned on.
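If the sheet can be loaded outside Excel, the same key-based alignment can be sketched with pandas merges. This is an alternative illustration, not the VBA above. The column names `key`, `Jan`, `Jul` and `Dec` are made up here. A left merge keeps every January key in its original order and leaves a missing value where a key is absent from a later month, which is the case raised in the UPDATE.

```python
import pandas as pd

# The three key/weight column pairs from the example sheet.
jan = pd.DataFrame({"key": list("abcde"), "Jan": [1, 2, 289, 6, 8]})
jul = pd.DataFrame({"key": list("bcaed"), "Jul": [2, 8, 3, 3, 111]})
dec = pd.DataFrame({"key": list("cbaed"), "Dec": [7, 4, 118, 888, 553]})

# Align July and December weights to the January keys.
combined = (
    jan.merge(jul, on="key", how="left")
       .merge(dec, on="key", how="left")
)
print(combined)
```

Using `how="outer"` instead would also keep keys that appear only in July or December.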