How to do calculations on a pandas DataFrame and put the results in a specific area of an Excel sheet? - pandas

I am new to pandas and am trying to automate some boring Excel tasks. For example, each month I update each of these fields with a new monthly % return.
Below that data, I would like a table that shows a set of statistics on those returns. For example:
For each column, I would like a row labelled Standard Deviation that gives the standard deviation of that column's total returns. I would like another row for the annualised return, which in Excel is =GEOMEAN(1+Cell Range)^12-1.
Below that I could have another row with a modified Sharpe ratio, calculated as the return divided by the standard deviation computed above.
This is all straightforward in Excel, but I don't know how to start on it in pandas. Hopefully the attached image is intuitive.
Example info + what I am after
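One possible starting point, as a minimal sketch rather than a definitive answer: the function below computes the three rows described above for every column, and a second helper writes the original returns and the stats table to the same sheet using the startrow argument of to_excel. The fund names, sample values, sheet name, and the helper names return_stats and write_with_stats are all made up for illustration. It assumes the returns are stored as decimals (0.01 for 1%) and that an Excel engine such as openpyxl is installed for the write step. Note that the ratio divides the annualised return by the monthly standard deviation, exactly as described in the question.

```python
import numpy as np
import pandas as pd


def return_stats(returns: pd.DataFrame) -> pd.DataFrame:
    """Per-column stats for monthly returns given as decimals (0.01 == 1%)."""
    std = returns.std()  # sample standard deviation, like Excel's STDEV
    # exp(mean(log(1 + r))) is the geometric mean of (1 + r),
    # so this mirrors GEOMEAN(1 + range)^12 - 1.
    geo_mean = np.exp(np.log(1 + returns).mean())
    annualised = geo_mean ** 12 - 1
    ratio = annualised / std
    stats = pd.DataFrame(
        {
            "Standard Deviation": std,
            "Annualised Return": annualised,
            "Return / Std Dev": ratio,
        }
    )
    return stats.T  # one row per statistic, one column per return series


def write_with_stats(returns, path, sheet_name="Returns", gap=2):
    """Write the returns, then the stats table a few rows below, on one sheet."""
    stats = return_stats(returns)
    with pd.ExcelWriter(path) as writer:
        returns.to_excel(writer, sheet_name=sheet_name, startrow=0)
        # The returns occupy 1 header row + len(returns) data rows.
        stats.to_excel(writer, sheet_name=sheet_name,
                       startrow=len(returns) + 1 + gap)


# Hypothetical sample data (three months, two return series).
returns = pd.DataFrame(
    {"Fund A": [0.01, -0.02, 0.03], "Fund B": [0.00, 0.01, 0.02]},
    index=["2020-01", "2020-02", "2020-03"],
)
stats = return_stats(returns)
print(stats)
```

Calling write_with_stats(returns, "returns.xlsx") would then place the stats table two blank rows below the monthly data; the gap parameter controls that spacing.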

Related

Excel index match in SQL - lookup a value from a matrix

I'm trying to perform something similar to an Excel INDEX/MATCH in SQL.
I have a matrix looking like the below:
Matrix in excel:
The matrix is defined by the variable Pers (vertical) and the variable Contribution (horizontal), where 5, 6, 7, 8, 9 and 10 are values that Contribution can take. This is just for illustration: my real matrix is 72 columns by 165 rows, for a total of 11644 cells (the universe of possible values).
Using those 2 variables, I need to extract, for any ID, the corresponding 'value' cell in the matrix. An example of the desired output is below:
Output in excel:
I can do this in Excel via INDEX/MATCH, but I would like to import the matrix into SQL and do the lookup there so that the calculation is dynamic. New IDs arrive all the time, each with its own 'pers' and 'contribution' that must be associated with a value in the matrix, so it is inefficient to extract the IDs from SQL, do the calculation in Excel, and then import the result into SQL again.
Obviously I can't use a 'case when', because I have 11644 cases, which would be suicide to write or read.
Is there anything in SQL that can do something similar to INDEX/MATCH?
Any suggestion is appreciated!
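One common way to get INDEX/MATCH behaviour in SQL, sketched here under assumptions: store the matrix in long format, one row per (pers, contribution) pair, and look values up with a join on both keys. The example uses Python's built-in sqlite3 module only so that it is runnable; the question does not say which database is in use, and the table names (lookup, ids), column names and all values below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical long-format version of the matrix: one row per cell.
cur.execute(
    "CREATE TABLE lookup ("
    " pers INTEGER, contribution INTEGER, value REAL,"
    " PRIMARY KEY (pers, contribution))"
)
cur.executemany(
    "INSERT INTO lookup VALUES (?, ?, ?)",
    [(1, 5, 0.10), (1, 6, 0.12), (2, 5, 0.20), (2, 6, 0.24)],
)

# Hypothetical IDs, each with its own pers and contribution.
cur.execute("CREATE TABLE ids (id TEXT PRIMARY KEY, pers INTEGER, contribution INTEGER)")
cur.executemany(
    "INSERT INTO ids VALUES (?, ?, ?)",
    [("A", 1, 6), ("B", 2, 5), ("C", 3, 5)],
)

# The join on both keys plays the role of INDEX/MATCH.
# LEFT JOIN keeps IDs that have no matching cell (value is NULL).
rows = cur.execute(
    """
    SELECT i.id, i.pers, i.contribution, l.value
    FROM ids AS i
    LEFT JOIN lookup AS l
      ON l.pers = i.pers AND l.contribution = i.contribution
    ORDER BY i.id
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
```

With this layout, new IDs only need a row in the ids table; the 11644 cells live in the lookup table as data rather than as CASE branches.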

Pandas Dataframe Functionality

I would like to create dataframes using excel spreadsheets as source data. I need to transform the data series from the format used to store the data in the excel spreadsheets to the dataframe variable end product.
I would like to know if users have experience in using various python methods to accomplish the following:
-data series transform: I have a series with one data value per month, but I would like to expand it to one value per day using an index (or perhaps a column of date values). So if table1 has a monthly index and table2 has a daily index, how can I convert the table1 values to the table2 index?
-dataframe sculpting: the data sets I am working with differ in length; some are longer than others. How can I find the shortest series among the columns of a multicolumn dataframe?
Essentially, I would like to take individual tables from workbooks and combine them into a single dataframe that uses a single index value as the basis for their presentation. My workbook tables may have data point frequencies of daily, weekly, or monthly and I would like to build a dataframe that uses the daily index as a basis for the table elements while including an element for each day in series that are weekly and monthly.
I am looking at the Pandas library, but perhaps there are other libraries that I have overlooked with additional functionality.
Thanks for helping!
For your first question, try something like:
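df1 = df1.resample('1D').first()  # upsample to daily; newly created days are NaN
df2 = df2.merge(df1, left_index=True, right_index=True, how='left')  # align on the date index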
That will upsample your monthly or weekly dataframe to daily frequency and merge it with your daily dataframe. Upsampling leaves the newly created days empty (NaN), so take a look at the interpolate method to fill in the missing values. To get the name of the shortest column, try this out:
df.count().idxmin()
Hope that helps!
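A small end-to-end sketch of the same idea, with made-up data: it assumes both frames have a DatetimeIndex, and it uses forward fill (rather than interpolate) to carry each monthly value across the following days, which is only one of several reasonable fill choices. The frame and column names are hypothetical.

```python
import pandas as pd

# Hypothetical monthly series (one value per month start).
monthly = pd.DataFrame(
    {"monthly_value": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
)

# Hypothetical daily series spanning a month boundary.
daily = pd.DataFrame(
    {"daily_value": range(10)},
    index=pd.date_range("2020-01-28", periods=10, freq="D"),
)

# Upsample to one row per day; days without an observation are NaN.
monthly_daily = monthly.resample("D").first()
# Carry the last known monthly value forward over the following days.
monthly_daily = monthly_daily.ffill()

# Align on the date index, keeping every day of the daily frame.
combined = daily.join(monthly_daily, how="left")
print(combined)

# Name of the column with the fewest non-missing values.
ragged = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [1.0, None, None]})
shortest = ragged.count().idxmin()
print(shortest)
```

Replacing ffill() with interpolate() would instead draw a straight line between consecutive monthly values.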

Conditional cell formatting on SSRS pivot table

I created a pivot table in SQL that has report names along the left side, and hours (00:00, 00:01, etc.) along the top. The values in the table are the number of times each report has been used during that hour over the past three months. I've imported the table into SSRS, and I'm trying to create a heat map of sorts. I want to color the cells darker or lighter across the row based on the number in each cell compared to the value of cells across the row (cell that has the highest value will be the darkest colored).
I've tried following this guide to color the cells, but here the entire row is one field, while I have separate fields for each column. Is there a way to achieve this?
EDIT: Added picture of table design, and preview where coloring is done incorrectly
I understand your problem better now. The function uses the min and max values of a column to determine the range from lightest to darkest, then it probably works out what fraction of that range your actual value represents. In your case, where each column's data comes from a different cell, it'll be a pain unless your columns are fixed, and even then it's more trouble than it needs to be.
I would suggest the following.
DON'T PIVOT your data in SQL; we can do that really easily in SSRS. Your dataset will be simpler too, something like:
ReportName Hour UsageCount
ReportA 0 8
ReportA 1 4
ReportC 22 18
and so on...
Create a new report and add a matrix with reportName as the row group and hour as the column group. The data values will be UsageCount.
That's it for the report design. Then just set the cells' background based on your function, but this time you can pass in Max(Fields!UsageCount.Value) etc. as per the sample.
I've rushed this a bit, so if it's not clear, let me know and I'll post a clearer solution.
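To illustrate the logic outside SSRS, here is a rough Python sketch of what the matrix and the shading calculation amount to: pivot the long dataset, then scale each cell to a 0-1 fraction of its own row's min-max range. This is not SSRS code. The first three rows come from the sample dataset above; the fourth row (ReportC at hour 0) is invented so that each report has more than one value, and treating missing hours as zero usage is an assumption.

```python
import pandas as pd

# Long-format dataset, as suggested above (last row is hypothetical).
usage = pd.DataFrame(
    {
        "ReportName": ["ReportA", "ReportA", "ReportC", "ReportC"],
        "Hour": [0, 1, 22, 0],
        "UsageCount": [8, 4, 18, 6],
    }
)

# Equivalent of the matrix: reports as rows, hours as columns.
matrix = usage.pivot_table(
    index="ReportName", columns="Hour", values="UsageCount",
    aggfunc="sum", fill_value=0,  # assumption: no row means zero usage
)

# Per-row min and max define the lightest and darkest ends of the scale.
row_min = matrix.min(axis=1)
row_max = matrix.max(axis=1)
span = (row_max - row_min).replace(0, 1)  # avoid dividing by zero on flat rows

# Fraction of the row's range for each cell: 0 = lightest, 1 = darkest.
shade = matrix.sub(row_min, axis=0).div(span, axis=0)
print(shade)
```

The resulting fraction is what a colour function would map onto a light-to-dark gradient.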

PowerBI Dynamic Time Series BarChart

Adding on my previous question here: TimeSeries question
I would like to plot a unit capacity chart over a Time series (which contains a range of dates set by the user).
The chart I am trying to plot is as follows:
For each Unit Name, I have start and end date for the unit capacities, as shown in the PowerBI table as below:
4 sub questions:
How to plot these capacities over time? Maybe using some DAX functions?
Do I need the SSAS cube to solve this problem, or can I do all the work inside Power BI Desktop? If not, is there a better way, for example in SSRS?
Is there a way to make the x-axis time series dynamic as specified by the user?
Adding to this, after Leonard's response: after converting the OutageStartDateOrig and OutageEndDateOrig values, I tried to create the calculated column as suggested in the YouTube link. However, the DAX formula shown in the video gives me a syntax error stating that the '.' is incorrect when specifying the range of dates. Any ideas? [Screenshot below]:
To create such a visual, I'd recommend an area chart (or stacked area chart) with the date on the axis, the unit name on the legend, and the capacity on the values. You could also use a stacked column chart, but then each date will be broken into discrete columns. See the image below.
In terms of data manipulation, you'll need to convert the data with the date ranges you have above into a row for each individual date & unit. E.g. the first row, instead of being 11/2 to 13/2, would be expanded into 3 rows, one for each date.
You can do this in Power Query as you bring the data into Power BI Desktop, or in DAX after bringing it in. There are several solutions to this outlined in this thread (https://community.powerbi.com/t5/Desktop/Convert-date-ranges-into-list-of-dates/td-p/129418), but personally, I recommend the technique (and video) posted by MarcelBeug (https://youtu.be/QSXzhb-EwHM).
You'll also want an independent list of dates (with no gaps) to join the final date column to - otherwise your visual will skip dates when no units had capacity. By default, the chart will begin on the first date with data and end on the last date with data, so in that sense it is dynamic, but you can add a date slicer to give the end-user more control.
Area chart on top, column chart on bottom, date slicer on right filtering Jan-Mar.
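For anyone who wants to see the reshaping itself, independent of Power Query or DAX, here is a rough pandas sketch of the two steps described above: expanding each date range into one row per date and unit, then aligning the result to a gap-free calendar. It is not the Power BI solution, only an illustration of the data shape. The unit names, year, capacities and column names are made up.

```python
import pandas as pd

# Hypothetical input: one row per unit with a start and end date.
ranges = pd.DataFrame(
    {
        "UnitName": ["Unit1", "Unit2"],
        "Start": pd.to_datetime(["2017-02-11", "2017-02-15"]),
        "End": pd.to_datetime(["2017-02-13", "2017-02-16"]),
        "Capacity": [100, 50],
    }
)

# Step 1: expand each range into one row per individual date and unit.
rows = []
for r in ranges.itertuples(index=False):
    for day in pd.date_range(r.Start, r.End, freq="D"):
        rows.append({"UnitName": r.UnitName, "Date": day, "Capacity": r.Capacity})
per_day = pd.DataFrame(rows)

# Step 2: an independent calendar with no gaps between first and last date.
calendar = pd.date_range(per_day["Date"].min(), per_day["Date"].max(), freq="D")

# One column per unit, one row per calendar date; dates with no capacity become 0.
chart_data = (
    per_day.pivot_table(index="Date", columns="UnitName",
                        values="Capacity", aggfunc="sum")
    .reindex(calendar)
    .fillna(0)
)
print(chart_data)
```

In this made-up data, 14 February has no unit with capacity, and the calendar step is what keeps that date on the axis with a value of zero.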

Transpose Large Data Set with Headers in Excel VBA

The image above shows the current structure of the time series data that I am working with. It has many columns of time series data which are identified by the customer id in the header row. In order to use this data in a pivot table for analysis, I'd like to convert it to a format like the image below:
Here the customer id becomes a dimension that describes the time series data.
Since this is a large data set, it would be a huge time sink to manually transform the data into the desired format. Also, I do not have fancy add-ins like Power Pivot or Power Query...
Using Excel VBA, how can I write a macro to handle this task?
Thanks,
A solution to this problem can be found here: http://www.excel.solutions/2014/03/unpivot-excel-data
Start off by pressing keys ALT > D > P to open the Pivot Table Wizard dialog box:
Choose the ‘Multiple consolidation ranges’ option, then click ‘Next’
In step 2a of the wizard, choose the ‘I will create the page fields’ option, and click ‘Next’
Now we need to add our crosstab data range as a data source for this pivot table. Enter / select the appropriate range, then click ‘Add’. Then click ‘Next’.
Choose a location for the intermediate pivot table (it’s a good idea to use a new worksheet, as we can simply delete the entire worksheet when we’re finished). Then click ‘Finish’.
We now have an ‘intermediate’ pivot table, which looks very similar to our raw data, but has some grand totals. Now we want to drill into the source data for this pivot table, by double clicking on the overall Grand Total value – the cell intersection of the Grand Total column, and Grand Total row:
By double clicking to drill into the grand total data source, another worksheet is created, containing a table with our unpivoted data:
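If leaving Excel is an option, the same unpivot can be sketched in Python with pandas instead of VBA or the wizard. This is an alternative technique, not what the linked article describes. The column names (Date, CustomerID, Value) and the sample values are assumptions, since the original images are not available; in practice the wide table would be loaded with pd.read_excel.

```python
import pandas as pd

# Hypothetical wide table: one date column, one column per customer id.
wide = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2020-01-01", "2020-01-02"]),
        "1001": [5, 6],
        "1002": [7, 8],
    }
)

# Unpivot: the customer id header becomes a dimension column.
long = wide.melt(id_vars="Date", var_name="CustomerID", value_name="Value")
print(long)
```

The long table has one row per date and customer, which is the shape a pivot table expects as its source.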