Merging two tables while deleting duplicates in the first table and - sql

This is my first post here, and your help would be greatly appreciated! I've read a lot of other posts on this site, but I cannot find the answer to my specific question. I tried using VLOOKUP, INDEX, MATCH, PivotTables, etc., but none of them works out the way I want.
Background information: for my thesis I'm studying the difference in cost of capital among single segment and multi segment firms. My dataset contains two tables, one with the SIC codes of the firm's segments and one with the corresponding sales of that segment. The issue with these tables is that there are duplicate SIC codes in the first table. I want to remove the duplicates from the first table and simultaneously calculate the sum of the sales of these duplicate SIC codes in the second table.
My data looks as follows:
The SIC codes per segment and the sales per segment:
Input of SIC codes and sales per segment (one company for 20 years)
What I want to do is eliminate the duplicate SIC codes. If I change the table with SIC codes, I also need the table with sales to change accordingly. However, the sales of duplicate segments should not be deleted but added to the first duplicate segment. I can compute this manually for one company, but for 1800 companies this would be very time consuming. The manually computed output for the SIC codes and for the sales looks like this (so I don't need to merge the tables; the output is still in two different tables):
Required output for the SIC codes table and Sales table (one company for 20 years)
Thanks a lot!

Here is one way to go about it, in two parts: first get the list of unique SIC codes in each row, then sum the sales that correspond to each unique code.
Part I:
General logic to this part: nested INDEX functions.
Let's assume your output table (range including headers A25:J45) is directly below your main/input table (range including headers A1:J21). This assumes only 20 rows of data, but you can drag the formulae down to as many rows as you need.
The first column should always pick up its values from the first column of the input table: A26 =A2, and so on.
For B26, use this formula =INDEX($A2:$J2,MATCH(0,INDEX(COUNTIF($A26:A26,$A2:$J2),0,0),0))
You can drag this formula across and down to J45, the end of the output table. (You can use "Evaluate Formula" in Excel to step through the logic.)
This should populate your output table with unique SIC codes for each row.
Part II:
General logic to this part: SUM and Arrays
Let's assume your output table (range including headers L25:U45) is directly below your main/input table (range including headers L1:U21). This assumes only 20 rows of data, but you can drag the formulae down to as many rows as you need.
In cell L26, the top-left cell of the output table, you will need an array formula (entered with Ctrl+Shift+Enter). If you are not familiar with array formulas, read up on them first.
Formula for L26 {=SUM(IFERROR($A2:$J2=A26,0)*IFERROR($L2:$U2,0))}.
Type only this in L26: =SUM(IFERROR($A2:$J2=A26,0)*IFERROR($L2:$U2,0)), then hit Ctrl+Shift+Enter. Excel will put the curly brackets around it by itself.
Copy and paste the formula into the rest of the output table.
Screenshot for reference based on your Excel file.
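If a script is an option, the same two-part logic (unique codes per row, then summed sales per code) can be sketched in Python. This is only a rough illustration and not part of the Excel solution above: the function name merge_duplicate_segments and the sample values are made up, and it assumes each row has already been read into two lists of equal length with None for empty cells.

```python
def merge_duplicate_segments(sic_row, sales_row):
    """Collapse duplicate SIC codes in one row, summing their sales.

    Returns (unique_codes, summed_sales), both in first-appearance order.
    """
    totals = {}  # dicts keep insertion order, so the first occurrence fixes the position
    for code, sales in zip(sic_row, sales_row):
        if code is None:  # empty segment cell, nothing to merge
            continue
        totals[code] = totals.get(code, 0) + (sales or 0)
    return list(totals.keys()), list(totals.values())


# Hypothetical company-year in which SIC code 2834 appears twice.
codes, sales = merge_duplicate_segments(
    [2834, 3841, 2834, None],
    [100.0, 50.0, 25.0, None],
)
print(codes)  # [2834, 3841]
print(sales)  # [125.0, 50.0]
```

Running this once per row (one row per company-year) gives the two output tables; the sales of the second 2834 segment are added to the first one, as asked in the question.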

There is a Remove Duplicates function in Excel 2010 on the Data tab.

Related

Convert 2 columns of data in excel power query to one header row with multiple columns

I am trying to figure out how to change 2 columns of data into one header row with multiple columns in Excel Power Query. It's my understanding that Power Query keeps the Excel file size small and is lighter on processing than tons of VLOOKUPs or pivot tables. I'm open to VBA if that's a better option.
For example, I have Column A with a list of names. Then, column B has another list of names with multiple instances of the same name(s). The names in column A are individuals assigned to report to individuals in B.
I'm trying to create a query (or VBA if better) where the names in B become the row headers, and the names in A fall under the corresponding person in each header.
I hope that makes sense. Thank you in advance for your help!
Here's a screenshot, demonstrating what I'm working with, and the end result I'm trying to get:
You can use Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    ListEmployees = Table.Group(Source, {"Supervisor Name"}, {{"Employees", each Text.Combine([Employee Name],","), type text}}),
    CountEmployees = Table.AddColumn(ListEmployees, "Count", each List.Count(Text.Split([Employees],","))),
    SplitEmployees = Table.SplitColumn(ListEmployees, "Employees", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),List.Max(CountEmployees[Count])),
    Transpose = Table.Transpose(SplitEmployees),
    PromoteHeaders = Table.PromoteHeaders(Transpose, [PromoteAllScalars=true])
in
    PromoteHeaders
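Make sure your source data is structured as a Table (ListObject). The query above expects it to be named Table1, with columns named "Employee Name" and "Supervisor Name".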
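For readers who want to see the reshaping step outside Power Query, the group-then-transpose idea can be sketched in Python. This is an illustration only, not a translation of the M code: the function name employees_by_supervisor and the sample names are invented, and the input is assumed to be a list of (employee, supervisor) pairs.

```python
from itertools import zip_longest


def employees_by_supervisor(pairs):
    """Turn (employee, supervisor) pairs into a header row plus data rows."""
    groups = {}
    for employee, supervisor in pairs:
        # Each supervisor becomes one column; employees stack underneath.
        groups.setdefault(supervisor, []).append(employee)
    headers = list(groups)
    # zip_longest pads the shorter columns with "" so every row has one cell per supervisor.
    rows = [list(row) for row in zip_longest(*groups.values(), fillvalue="")]
    return headers, rows


# Hypothetical data: Ann and Cid report to Zoe, Bob reports to Yan.
headers, rows = employees_by_supervisor(
    [("Ann", "Zoe"), ("Bob", "Yan"), ("Cid", "Zoe")]
)
print(headers)  # ['Zoe', 'Yan']
print(rows)     # [['Ann', 'Bob'], ['Cid', '']]
```

Grouping into a dictionary plays the role of Table.Group, and zip_longest does what the split, transpose and promote-headers steps do together.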

How to use more than 200 nested if conditions in excel?

I have the following data in excel sheet A.
Category Name
Fruit Apple
Vegetable Brinjal
XYZ Abc
I want to create a formula which takes a value from the Name column and outputs the corresponding value from the Category column.
If I use VLOOKUP, I have to copy this reference table into each and every Excel sheet where I need this operation.
Hence I am looking for something similar to
IF(input="Apple","Fruit",IF(input="Brinjal","Vegetable",IF(input="Abc","XYZ","")))
But there is a limit on nested IFs in Excel, and the number of cases that a switch-style formula can have is also limited.
I have around 200 rows of this table.
Use the INDEX and MATCH functions: INDEX on the "Category" column, with MATCH looking up the value in the "Name" column.
You certainly don't need so many IF statements (though I note your Q Title), for example:
=CHOOSE(MATCH(D13,{"Apple","Brinjal","Abc"},0),"Fruit","Vegetable","XYZ")
which should not grow at quite the rate your version would - but with 200 'pairs' would be getting close to the limit for CHOOSE.
(D13 as example in spreadsheet.)
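Both suggestions replace a chain of conditions with a single lookup table. As a rough sketch of that idea outside Excel (Python, with the hypothetical names CATEGORY_BY_NAME and category_for; the three pairs come from the question's sample table):

```python
# One table holds all pairs, so adding a 200th entry is one more line, not one more nested IF.
CATEGORY_BY_NAME = {
    "Apple": "Fruit",
    "Brinjal": "Vegetable",
    "Abc": "XYZ",
}


def category_for(name, default=""):
    """Return the category for a name, or the default when the name is unknown."""
    return CATEGORY_BY_NAME.get(name, default)


print(category_for("Brinjal"))  # Vegetable
print(category_for("Mango"))    # prints an empty line: no match, like the final "" in the nested IF
```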

Minimum amount of Markets for Maximum Products

I am trying to determine the smallest number of markets that yields the highest number of unique products. I'm not sure how to get DISTINCT product counts for every combination of markets.
I can put my data in SQL tables if it's easier with a query.
Here is sample data of what I'm trying to achieve.
Attempting to clarify: Essentially, I'm trying to get all the combination of markets and determine what the distinct count of products are. From there I can derive percentages.
Example:
CHI: 4 DISTINCT PRODUCTS
CHI/LA: 5 DISTINCT PRODUCTS
CHI/LA/MIA: 8 DISTINCT PRODUCTS
To count the unique products for a single market it's a scary formula with curly brackets.
I've set up the formula to work on your sheet, so try putting it in cell C2 and pressing CTRL+SHIFT+ENTER to make it an array formula when you enter it.
=SUM(IF(FREQUENCY(IF($E$2:$E$18<>"", IF($E$2:$E$18=B2, MATCH($F$2:$F$18, $F$2:$F$18, 0))), ROW($F$2:$F$18)-ROW($F$1)+1), 1))
You should be able to autofill down to see the rest of the unique counts for the markets you have listed in column B.
Getting the list of market combinations is complicated, but someone has asked that question before, so check out this answer: Creating a list of all possible unique combinations from an array (using VBA)
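If moving the data out of Excel is acceptable, enumerating the combinations is short in a script. The following Python is a sketch under assumptions, not the linked VBA answer: the function names and the sample rows are made up, and the data is assumed to be a list of (market, product) pairs. Note that the number of combinations doubles with every extra market, so this only suits a modest number of markets.

```python
from itertools import combinations


def distinct_counts_by_combination(rows):
    """Return [(market_combo, distinct_product_count), ...], smallest combos first."""
    products_by_market = {}
    for market, product in rows:
        products_by_market.setdefault(market, set()).add(product)
    markets = sorted(products_by_market)
    results = []
    for size in range(1, len(markets) + 1):
        for combo in combinations(markets, size):
            # Union of the product sets gives the distinct products for this combo.
            covered = set().union(*(products_by_market[m] for m in combo))
            results.append((combo, len(covered)))
    return results


def smallest_full_coverage(rows):
    """Return the first smallest market combo that covers every distinct product."""
    total = len({product for _, product in rows})
    for combo, count in distinct_counts_by_combination(rows):
        if count == total:
            return combo
    return ()


# Hypothetical sample: four distinct products spread over three markets.
rows = [
    ("CHI", "A"), ("CHI", "B"),
    ("LA", "B"), ("LA", "C"),
    ("MIA", "C"), ("MIA", "D"),
]
for combo, count in distinct_counts_by_combination(rows):
    print("/".join(combo), count)
print(smallest_full_coverage(rows))  # ('CHI', 'MIA')
```

From the per-combination counts you can derive the percentages mentioned in the question by dividing each count by the total number of distinct products.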

Excel: Selecting Specific Columns and Rows in Pivot Table and Formatting

I'm using a Pivot Table in Excel 2010, and while searching posts I find that a lot of users are frustrated like me because it doesn't keep all formats.
So what I'm trying to do is run a macro that formats columns in a Pivot table, but limited to the last row and column in the table. I have the formatting info, but I just need to know how to apply it to specific columns and rows.
What I was thinking might work is finding the last column of the Values row, in this case "Stops per Rte" which is the last Values column; however, I have months listed at the top, so it repeats across months. If the user filters only certain months then the # of columns will decrease.
Same goes for the # of rows: of course, the user should be able to expand/collapse rows as needed, so I only want the column format to go to the last row or just above "Grand Total", if possible.
Hopefully, this makes sense. Thanks in advance! = )

Excel randomly select name from list with multiple entries

I have an Excel 2007 worksheet with employee names in column A and the total number of entries in column B. I need to be able to randomly select x employee names from the total pool of entries, allowing for the fact that some employees will have multiple entries.
For example:
Amy............30
Brian..........12
Charlene.......15
Michael.........1
Nathan..........7
What is the best way to do this?
My initial thoughts are:
1) find the max() of column B occurrences of a random number in another column, like C. Then find the top values for all of that new column.
2) create a VBA array of all of the potential entries and randomly pick one from there.
3) loop through all of the names in column A and create a temp worksheet with column B instances of each, then assign a random num generator and choose the top n.
Having said that, there may be something a lot easier. I am not sure where to begin. Normally I can find code that is similar to what I need, but I am not having any luck. Any help that you can offer would be appreciated.
Thank you in advance.
I would probably do something like this, if I understand your question correctly (I just read your question title):