I have a lot of data for each User ID that needs to be organized by column rather than by row as it is currently. I have tried standard transposition methods but cannot figure this out. Any ideas would be greatly appreciated.
Current data set:
UserId Item Value(mL)
1 AAA 12
1 AAB 21
1 AAC 31
2 AAA 15
2 AAB 21
2 AAC 34
2 AAD 16
Desired outcome:
UserID AAA AAB AAC AAD
1 12 21 31
2 15 21 34 16
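With a formula (user IDs listed down column F starting in F2, item names across row 1 starting in G1), enter this in G2:
=SUMIFS($C:$C,$A:$A,$F2,$B:$B,G$1)
Then copy it across and down. Combinations with no data, such as user 1 and AAD, return 0 rather than a blank.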
As @skkakkar stated, you can also do this with a Pivot Table.
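For anyone doing the same long-to-wide reshape in Python, the pivot-table idea can be sketched in pandas as below. This assumes the data is already in a DataFrame; the column name Value_mL is a stand-in for Value(mL). As in the desired outcome, combinations that don't exist (user 1 with AAD) come out blank (NaN) rather than 0, which is one difference from the SUMIFS grid.

```python
import pandas as pd

# Sample data from the question; Value_mL stands in for "Value(mL)".
data = pd.DataFrame({
    "UserId": [1, 1, 1, 2, 2, 2, 2],
    "Item": ["AAA", "AAB", "AAC", "AAA", "AAB", "AAC", "AAD"],
    "Value_mL": [12, 21, 31, 15, 21, 34, 16],
})

# One row per UserId, one column per Item, summing Value_mL for each
# (UserId, Item) pair, like the SUMIFS grid. Missing pairs become NaN.
wide = data.pivot_table(index="UserId", columns="Item",
                        values="Value_mL", aggfunc="sum")
print(wide)
```

Passing fill_value=0 to pivot_table would reproduce the SUMIFS behaviour of showing 0 instead of a blank.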
There is an Excel paste option called "Transpose" that will let you do this. Select your data and copy it, then go to the target cell, open the paste options and press "T" or click the Transpose button.
EDIT:
There are other ways of solving this, as Scott has shown in his answer. If you are working with a large data set, my solution will be by far the fastest, but his solution is also very sleek. Note, however, that transposing does not remove duplicate headers, so you will need to do a bit more work to get exactly the layout the poster wants.
I am using Camelot to extract borderless tables from a PDF file. I've used the following parameters:
import camelot
budget_tables = camelot.read_pdf(budget_file, pages='all', flavor='stream', edge_tol=80, strip_text='\n')
The issue is that in some tables (there are over 300 tables in this file), values that are too large end up grouped together in the same cell. As a result, I get output like the table below: in some rows each value is in its own column, while in others several values are separated by spaces within a single cell.
I was thinking I'd have to write a function that goes through the DataFrame, checks each cell for the delimiter (' '), splits it and fills the empty cells around it with the pieces (which I think I still need help with, as it is not consistent whether the empty cells are to the left or the right). But if there is an option in the Camelot call that could reduce these kinds of outputs, I'd prefer to start there.
Sorry for the bad table formatting below. Any tips on showing this table a bit better would be appreciated. I can't upload images from my workstation.
0 | 1 | 2 | 3 | 4 | 5 | 6
30 Sales of non-financial assets |173,853 |192,108 |176,957 |226,843 |188,370 |74,022
31 Payments for non-financial asset|-1,274,120 |-866,331 |-1,372,111 -1,100,557 -1,359,568 ...
32 Net cash flows from investments |-1,100,267 |-674,223 -1,195,154| |-873,714 -1,171,198 -1,229,102|
33 in non-financial assets
34 Cash flows from investments in
35 financial assets for policy
36 purposes
37 Receipts
38 Repayment of loans | 30,044 | 29,409 | 1,185 |3,235 |6,136 |9,036
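Before writing a post-processing function, it may be worth trying the stream-flavor options Camelot documents for this situation: columns (explicit x-coordinates of the column separators) and split_text=True. If those don't help, the function described in the question can be sketched as below. It rests on assumptions: the row label sits intact in column 0, every value is a single token without internal spaces (true for numbers like -1,274,120), and any blank cells belong at the right-hand end of the row, because each row is simply re-packed from left to right. That last assumption is what sidesteps the "empty to the left or right" problem, but it will shift values in a row that genuinely has a blank in the middle. The function names are hypothetical.

```python
import pandas as pd


def repack_row(row: pd.Series) -> pd.Series:
    """Keep the label in the first cell; split every other cell on whitespace
    and lay the pieces out again from left to right (assumed layout)."""
    label = row.iloc[0]
    tokens = []
    for cell in row.iloc[1:]:
        if pd.isna(cell):
            continue  # skip missing cells (Camelot usually gives '' instead)
        tokens.extend(str(cell).split())  # '' splits to [], so blanks vanish
    n_values = len(row) - 1
    if len(tokens) > n_values:
        # Refuse rather than silently drop values that don't fit.
        raise ValueError(
            f"row {row.name!r} has {len(tokens)} values for {n_values} columns")
    tokens += [""] * (n_values - len(tokens))  # pad the right-hand end
    return pd.Series([label] + tokens, index=row.index)


def repack_table(df: pd.DataFrame) -> pd.DataFrame:
    """Apply repack_row to every row of a table's DataFrame."""
    return df.apply(repack_row, axis=1)


# Two rows modelled on the question's output (trailing cells assumed blank).
sample = pd.DataFrame([
    ["Payments for non-financial asset", "-1,274,120", "-866,331",
     "-1,372,111 -1,100,557 -1,359,568", "", "", ""],
    ["Net cash flows from investments", "-1,100,267", "-674,223 -1,195,154",
     "", "-873,714 -1,171,198 -1,229,102", "", ""],
])
print(repack_table(sample))
```

Applied to the Camelot output, this would be something like fixed = [repack_table(t.df) for t in budget_tables].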
I wonder if you can help me find a solution to the following problem. Given a DataFrame df1 like this:
import pandas as pd
d1={'L':['aaa','bbb','ccc','aaa','bbb','ddd'],
'w':[1,5,9,13,17,21],
'x':[2,6,10,14,18,22],
'y':[3,7,11,15,19,23],
'z':[4,8,12,16,20,24]}
df1=pd.DataFrame(d1)
and two dictionaries that define the grouping over columns and rows:
dctRowGroups={'aaa':'A','bbb':'B','ccc':'A','ddd':'B'}
dctColGroups={'w':'ALPHA','x':'BETA','y':'ALPHA','z':'BETA'}
I wanted to aggregate over columns as a first step. Applying
g2=df1.groupby(dctColGroups,axis=1)
g2.sum()
results in
but I want to keep the 'L' column for the next step, the row-wise aggregation; i.e., the result should be a DataFrame df2 more like this:
What code do I need to make this happen?
As a next step, I want to aggregate df2 over the rows using the dctRowGroups dictionary
g3=df2.groupby(dctRowGroups,axis=0)
g3.sum()
to get a final result like this:
How can I do all these steps in as few lines of code as possible?
Appreciate your advice on this.
Thanks a lot
Willfried.
You can do:
First, create df2 and insert the 'L' column with the insert() method:
df2=df1.groupby(dctColGroups,axis=1).sum()
df2.insert(0,'L',df1['L']) #use this only when the order matters
#OR (use either insert or assign, not both)
df2=df2.assign(L=df1['L']) #otherwise use this; assign adds 'L' as the last column
Finally, use the assign(), map() and groupby() methods:
result=df2.assign(L=df2['L'].map(dctRowGroups)).groupby('L').sum()
Outputs:
df2:
L ALPHA BETA
0 aaa 4 6
1 bbb 12 14
2 ccc 20 22
3 aaa 28 30
4 bbb 36 38
5 ddd 44 46
result:
ALPHA BETA
L
A 52 58
B 92 98
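In recent pandas versions (2.1 and later), groupby(..., axis=1) is deprecated, so the first step above emits a warning there. A sketch that avoids it transposes the value columns, groups the former column labels with the same dictionary, and transposes back; the row-wise step is the same as in the answer. The data and dictionaries are copied from the question.

```python
import pandas as pd

d1 = {'L': ['aaa', 'bbb', 'ccc', 'aaa', 'bbb', 'ddd'],
      'w': [1, 5, 9, 13, 17, 21],
      'x': [2, 6, 10, 14, 18, 22],
      'y': [3, 7, 11, 15, 19, 23],
      'z': [4, 8, 12, 16, 20, 24]}
df1 = pd.DataFrame(d1)
dctRowGroups = {'aaa': 'A', 'bbb': 'B', 'ccc': 'A', 'ddd': 'B'}
dctColGroups = {'w': 'ALPHA', 'x': 'BETA', 'y': 'ALPHA', 'z': 'BETA'}

# Step 1: column groups. Transpose the numeric columns so w/x/y/z become the
# index, group that index through dctColGroups, sum, then transpose back.
df2 = df1.drop(columns='L').T.groupby(dctColGroups).sum().T
df2.insert(0, 'L', df1['L'])  # put the row labels back as the first column

# Step 2: row groups, mapping each label to its group before summing.
result = df2.assign(L=df2['L'].map(dctRowGroups)).groupby('L').sum()
print(df2)
print(result)
```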
I have a DataGrid that stores the number of pencils produced each day of the month. It looks like this:
Pencil | day 1 | day 2 | day 3 | ... |day 31
Red 0 0 13 0 0
blue 5 1 0 8 0
yellow 0 9 5 0 0
I need to save this data into a SQL table, but I'm not sure what the most efficient way to design the table in SQL is.
I was thinking about creating a table in SQL with the fields:
pencilmodel
date
quantity
and then, in VB.NET, write a loop that saves each cell of the DataGrid into the table one by one. But I don't think this is the best way, since I will have about 30 rows and a month has at most 31 days, so that would be 30 * 31 = 930 inserts.
I'm using VB.NET and SQL Server.
I would create the table the way you suggested:
ID | pencilmodel | ProducedDate | Quantity
1 blue dd-mm-yyyy 7
2 red dd-mm-yyyy 4
3 yellow dd-mm-yyyy 6
Also, don't loop and insert each row into the database one at a time; it's not efficient. Add the rows to a DataSet first and then save them with DataAdapter.Update, or bind a DataSet to the DataGridView:
How to: Bind Data to the Windows Forms DataGridView Control
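As a rough illustration of the long table design plus a batched write, here is a Python sketch. It swaps in Python's sqlite3 and executemany as a stand-in for SQL Server and DataAdapter.Update, and the grid contents (first three days only), the month (December 2016) and the table name Production are all hypothetical. Dropping zero-quantity days is an optional design choice, not something the answer requires.

```python
import sqlite3
from datetime import date

import pandas as pd

# Hypothetical grid: the first three days from the question's example.
grid = pd.DataFrame({
    "Pencil": ["Red", "blue", "yellow"],
    "day 1": [0, 5, 0],
    "day 2": [0, 1, 9],
    "day 3": [13, 0, 5],
})
YEAR, MONTH = 2016, 12  # assumed month the grid belongs to

# Wide -> long: one (pencil, day, quantity) row per grid cell.
long_df = grid.melt(id_vars="Pencil", var_name="day", value_name="Quantity")
long_df["ProducedDate"] = [
    date(YEAR, MONTH, int(label.split()[1])).isoformat()
    for label in long_df["day"]
]
# Optional design choice: don't store days with no production.
long_df = long_df[long_df["Quantity"] != 0]

rows = [
    (pencil, produced, int(qty))  # int() converts numpy ints for sqlite3
    for pencil, produced, qty in long_df[["Pencil", "ProducedDate", "Quantity"]]
    .itertuples(index=False, name=None)
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Production ("
    "ID INTEGER PRIMARY KEY, "
    "PencilModel TEXT NOT NULL, "
    "ProducedDate TEXT NOT NULL, "
    "Quantity INTEGER NOT NULL)"
)
# One batched call instead of a hand-written loop of single INSERTs.
conn.executemany(
    "INSERT INTO Production (PencilModel, ProducedDate, Quantity) VALUES (?, ?, ?)",
    rows,
)
conn.commit()
```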
I don't know if this is relevant, but why don't you create fields based on the date? For example, a date like this on your PC:
12/14/2016
You could write a program that adds a new column every day, so that after a few days the table looks like this:
| 12/14/2016 | 12/15/2016 | 12/16/2016 |
That way you don't need to loop over the DataGridView; you just run your INSERT command.
You would only need a few modifications and validations, like:
If Date_Has_Been_Changed Then
    ' add a column for the new date to the table
End If
Excel built-in functions are, most of the time, effective. However, some functions feel only half-implemented, and that somehow dictates how they can be used. The SUBTOTAL function is one of them.
My data is presented in the following format:
Value Count
100 20
102 3
105 4
102 5
And I want to build a table in this format:
Value Count
100 20
101 0
102 8
103 0
104 0
105 4
I've read this on SO, but my situation is a bit different. A pivot table only gives subtotals for the values that appear in the original data, and I don't want to use a loop to insert the missing values into the original data (if I have to loop over the original data anyway, I could just use that loop to build the table, which I would prefer to avoid altogether).
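For comparison, in Excel one loop-free approach is to list every value from 100 to 105 in a helper column and use SUMIF against the original data, for example =SUMIF(A:A,D2,B:B) (assuming the original values are in column A, the counts in column B and the full list starts in D2; these cell positions are hypothetical). The same idea in pandas, as a sketch: sum the counts per value, then reindex over the full integer range so missing values get a count of 0.

```python
import pandas as pd

# Sample data from the question.
data = pd.DataFrame({"Value": [100, 102, 105, 102], "Count": [20, 3, 4, 5]})

# Every integer between the smallest and largest Value, inclusive.
full_range = range(int(data["Value"].min()), int(data["Value"].max()) + 1)

# Sum Count per Value, then reindex so absent values appear with 0.
table = (
    data.groupby("Value")["Count"].sum()
    .reindex(full_range, fill_value=0)
    .rename_axis("Value")
    .reset_index()
)
print(table)
```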
I wasn't able to find anything like this yet, but here is what I need to do:
I have a query result like this:
ID Data1 Data2 Data3 Data4 ... Data7
1 12 13 15 1 ... 12
2 12 13 15 1 ... 12
3 12 13 15 1 ... 12
4 12 13 15 1 ... 12
I need to make a bar chart with two series: one is the first row (ID=1) and the other is the last row (ID=4). The bars need to be paired by the DataX column headers.
Example:
ID Insured Uninsured Rejected
1 12 3 0
4 16 9 2
In the bar chart I need to see the Insured values for ID=1 and ID=4 next to each other, and the same for Uninsured and Rejected.
I feel like I have tried every possible way, but all I could get was a bar chart where all values for ID=1 were displayed and then all values for ID=4 were displayed next to them.
I'm sure this is a confusing way to describe it, but I hope someone understands what I am looking for.
NOTE: I tried this in Excel and it worked within 2 minutes. I set the filter so that the series were the 2 rows I wanted, set the categories to the DataX columns as described, and everything looked great. When I tried to translate this into SSRS, I was able to set up the series and categories the same way, but then I had to add values, and that broke everything.
PLEASE HELP!
I bet you need to add a grouping to your values by a spanning factor.
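One common way to get bars paired by the DataX headers is to unpivot the query so that each row is (ID, category, value); the chart can then use the former column headers as its category group and ID as its series group. This is an assumption about the intended setup, not part of the answer above. In SQL Server this would be an UNPIVOT in the dataset query; the pandas sketch below only shows the target data shape, using the sample columns from the question.

```python
import pandas as pd

# Sample rows from the question's example.
query = pd.DataFrame({
    "ID": [1, 4],
    "Insured": [12, 16],
    "Uninsured": [3, 9],
    "Rejected": [0, 2],
})

# Unpivot: one row per (ID, Category), so the former column headers can go
# on the category axis and the IDs can form the series.
long_df = query.melt(id_vars="ID", var_name="Category", value_name="Value")

# Side-by-side view of the pair of bars each category would show.
pairs = long_df.pivot(index="Category", columns="ID", values="Value")
print(pairs)
```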