How to perform analysis on a huge Excel sheet - vba

I have an Excel sheet with around 50k rows of data and 10 columns. The sheet is about wholesalers and their products. In the current version there are around 30 unique wholesalers, each with between 1000 and 3000 different products (I queried this information from the database). What I want to do is extract the distinct wholesalers, put them in another sheet, and then find the total count of products that each wholesaler offers. I was able to get a distinct list of the wholesalers (via a macro), but now I am unsure how to use it to get the total count of their products: something like for each wholesaler do:
Select Count(*)
From worksheet s
Where s.wholesaler = 'one of the values from the list'
And in general, my question is: what is the best way to query a worksheet with loads of data? (macros, pivot tables or some other Excel magic)

If you have a SQL query, then use it :). Excel allows you to run SQL queries. On the Data ribbon, see Get External Data -> From Other Sources -> From Microsoft Query. Or check out my SQL extension for Excel: http://blog.tkacprow.pl/?page_id=130
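The per-wholesaler count does not need one query per wholesaler: a single GROUP BY returns every distinct wholesaler together with its product count. Here is a minimal sketch of that idea, using Python's built-in sqlite3 module as a stand-in for whichever SQL engine ends up reading the sheet. The table name products, its columns and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows standing in for the worksheet: (wholesaler, product).
rows = [
    ("Acme", "bolt"),
    ("Acme", "nut"),
    ("Birch", "plank"),
    ("Acme", "washer"),
    ("Birch", "beam"),
]


def count_products_per_wholesaler(rows):
    """Return [(wholesaler, product_count), ...] sorted by wholesaler."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE products (wholesaler TEXT, product TEXT)")
        conn.executemany("INSERT INTO products VALUES (?, ?)", rows)
        # One grouped query replaces the "for each wholesaler, SELECT COUNT(*)" loop.
        cursor = conn.execute(
            "SELECT wholesaler, COUNT(*) FROM products "
            "GROUP BY wholesaler ORDER BY wholesaler"
        )
        return cursor.fetchall()
    finally:
        conn.close()


counts = count_products_per_wholesaler(rows)
print(counts)
```

A pivot table with the wholesaler column in Rows and a count in Values gives the same result without any SQL, which may be the simpler route for 50k rows.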

Related

Table Difference in Excel

In Excel, how can I get the difference between 2 tables?
I have 2 tables: A and B.
B is a subset of A. In other words, all rows/records of B are included in A, but not vice versa.
I would like to get
A - B
i.e. I want an output which gives me only the records which are in A but not in B.
Also, more generally, if B were not a subset, how would I get
A∪B - A∩B?
I usually use VLOOKUP for this if it only needs to be done once for a dataset. Just write a VLOOKUP formula on the big table; the rows that are not in the small table will return an #N/A error. When we filter for those error rows, we are left with exactly the rows of A - B.
On the other hand this is also achievable using Power Query which is a cleaner way IMHO. For Excel 2010 you should download and install it. For the newer versions Power Query is included in Excel natively.
I am able to explain the process for Office 365 since I have that version; for previous versions slight changes may apply.
First get your tables into Power Query using Data / From Table/Range menu.
When you have both tables in Power Query, right-click on a blank space in the Queries pane on the left and go to the New Query / Combine / Merge Queries as New menu:
In this screen, select your tables (select the larger table first), Ctrl-click the table fields to be compared and select Left Anti in the bottom combo box. When you click OK you will have a new table containing the difference.
* Select Close & Load in the Home menu and your new table will be available in a new sheet in Excel.
* When there is a change made in the original sheets, just press Data / Refresh in Excel and your generated table will be refreshed accordingly.
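For readers who want to see what the Left Anti merge computes, here is a small sketch in Python rather than Power Query. The rows and both function names are invented for illustration; the second function covers the more general case from the question, (A∪B) - (A∩B).

```python
# Hypothetical rows from tables A and B: (id, name).
table_a = [("1", "apple"), ("2", "pear"), ("3", "plum")]
table_b = [("2", "pear"), ("4", "fig")]


def anti_join(left, right):
    """Rows of `left` that do not appear in `right` (A - B), order preserved."""
    right_rows = set(right)
    return [row for row in left if row not in right_rows]


def symmetric_difference(left, right):
    """Rows that appear in exactly one of the two tables."""
    return anti_join(left, right) + anti_join(right, left)


a_minus_b = anti_join(table_a, table_b)
either_not_both = symmetric_difference(table_a, table_b)
print(a_minus_b)
print(either_not_both)
```

In Power Query terms the second result corresponds to two Left Anti merges (A against B, then B against A) appended to each other.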

How to combine Excel data from 40 Workbooks with 25 Worksheets in each?

I have 40 Excel Workbooks each with 25 Worksheets containing data in columns A:Q and variable number of rows (would be less than 5000 rows).
Each of the 40 Workbooks represents 1 Team. Each of the 25 Worksheets is 1 month's worth of data and the sheets are named 'Jan-15', 'Feb-15' etc... to 'Dec-16', so obviously only 5 sheets have data within them so far (as it is now May-15). Each workbook is set up for future data.
I need to get all the data into one place, which I thought would be best to use a pivot table. So far, I have set up an additional workbook that contains VBA code to pull the required data into one Pivot table. It does this by having checkboxes on a Sheet for each team and more checkboxes for each month. If the checkbox is checked (= TRUE), then the data for that team and month will be pulled through.
E.g. if 'Team 1' and 'Jan-15' are selected, the pivot table will show Team 1's January data. If 'Team 5' and 'Mar-15' are then also selected, Team 1 and Team 5's January and March data will be pulled through.
It does this in VBA by compiling an SQL string of the required data (my SQL knowledge is limited to 2 days' research!), such as:
SELECT *
FROM `Team 13.xlsx`.[JAN-15$]
UNION ALL
SELECT *
FROM `Team 13.xlsx`.[FEB-15$]
UNION ALL
SELECT *
FROM `Team 13.xlsx`.[MAR-15$]
UNION ALL
SELECT *
FROM `Team 15.xlsx`.[JAN-15$]
UNION ALL
SELECT *
FROM `Team 15.xlsx`.[FEB-15$]
UNION ALL
SELECT *
FROM `Team 15.xlsx`.[MAR-15$]
etc....
Sometimes I will only need to see, say Team 3's data for Jan-15 and Feb-15, sometimes I'll need Teams 1 to 10's data for Mar-15 and sometimes I'll need to see all 40 Teams' data over the last 12 months.
However, I am struggling now, because this only works for a certain number of selections. I only need to select up to 12 months at a time, so I have limited the number of 'month' checkboxes that can be selected, but I can't select all 40 of the Teams and all 12 months as I get an error message that says
Run-time error '1004': [Microsoft][ODBC Excel Driver] Query is too complex.
If I only select 10 Teams and 5 Months, or 40 Teams and 1 Month, the Query runs fine, but I need more.
Is my SQL code wrong, or is there a better way to write it?
Or is there another way that I can combine all this data? (I only have Excel, not Access, at my disposal!)
Your query is good; you have, just as the error says, simply created too complex a query for the poor Jet* Engine to handle. I would suggest that you break it down into a simple loop where you handle each team/month checkbox combination as a separate query, each one pulling data into your master sheet and appending it to the end of the data set that the pivot table is built from.
With some testing, you may determine that you can pull a team for a year without overwhelming the database engine, and write your code to work that way (up to 12 UNIONs, for example), but that would make your code more complex. You may find that it executes a bit faster because it makes fewer SQL queries, but then again, because the queries are more complex, they may execute more slowly.
*Yes, I know it's not called "Jet" any more, but I choose to let the Wiki link handle the fine lines, and most people probably still call it that despite what Microsoft wants us to do.
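The batching idea can be sketched independently of VBA. The Python below only builds the SQL strings. Running each one and appending its rows to the master sheet is left out. The function names and the max_selects parameter are hypothetical, and the default of 12 is the untested guess from the answer, not a known driver limit.

```python
def sheet_select(team, month):
    """SELECT for one team workbook and one month sheet, in the question's syntax."""
    return f"SELECT * FROM `{team}.xlsx`.[{month}$]"


def build_queries(teams, months, max_selects=12):
    """Yield UNION ALL queries containing at most `max_selects` SELECTs each."""
    selects = [sheet_select(team, month) for team in teams for month in months]
    for start in range(0, len(selects), max_selects):
        yield "\nUNION ALL\n".join(selects[start:start + max_selects])


teams = [f"Team {n}" for n in range(1, 41)]
months = ["JAN-15", "FEB-15", "MAR-15", "APR-15", "MAY-15"]

# 40 teams x 5 months = 200 SELECTs, split into batches of at most 12.
queries = list(build_queries(teams, months, max_selects=12))
print(len(queries))
print(queries[0])
```

Setting max_selects=1 gives the simplest variant, one query per team/month combination. Raising it trades fewer round trips against a higher risk of hitting the "Query is too complex" error again.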
Thank you to the people that responded to this question. Your help and experience is greatly appreciated :-)
I have managed to sort this out today using the comments posted: I used VBA to loop through the workbooks and worksheets selected by the checked boxes, copy the data from each sheet into a master sheet within another (report) workbook, and then create a pivot table from the master data.
Thank you to FreeMan for posting an answer, however unfortunately my SQL and Query skills are very limited and I didn't understand or know how to implement your suggestion! So I thought my VBA solution would be the easiest for me to do and it works.
I think initially I was over-complicating the issue with trying to use Query/SQL, but I found that that was the way to go based on the research I did trawling the 'net beforehand. I thought that copying all the data across may be too time consuming and difficult, but it turns out it was not.
I can post the code I used to check the checkboxes, open a file, select worksheets, copy data across and create a pivot table if anyone needs to do the same.
Hopefully this can help someone out in the future!
NGH

sorting BIG excel data

Right so, I have been given a LOT of "consumer data" to sort: 3 Excel files, each containing multiple worksheets, up to 7 worksheets, each up to 1M rows (max worksheet size in Excel 2013 is just over 1M rows).
I need to pull out of these all people within a region, so have a list of post codes in this region (say 30 post code areas)
How can I achieve this most easily?
If the data was in SQL Server, I'd just write a long SQL statement selecting all WHERE postcode LIKE 'B75%' OR postcode LIKE 'B74%' etc.
But in Excel I can only run a "filter" on one worksheet at a time... (I think)
Is it going to be easiest to throw all the data into sql server, or have I overlooked a method?
The first solution is to let Excel do the task for you. You need to add filters to the columns and select the sorting options.
The other solution is to import the data into SQL Server table(s). To do this, open SQL Server Management Studio, right-click the database you want to load the data into and choose Tasks -> Import Data, then pick Microsoft Excel as the data source. Do this for each Excel file you have. After importing all the data into the database, sort it using a SQL query.
The second solution is more reliable, but the first is faster. You need to decide which one suits you.
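Whichever tool ends up holding the data, the selection itself is a prefix match applied to every worksheet in turn. Here is a rough sketch of that logic in Python, with the worksheets represented as in-memory lists. The sheet names, rows and function name are invented for illustration.

```python
# Hypothetical rows from several worksheets: (name, postcode).
sheets = {
    "Sheet1": [("Ann", "B75 5AA"), ("Bob", "M1 1AE")],
    "Sheet2": [("Cat", "b74 2QT"), ("Dan", "B7 4BB")],
}
region_prefixes = ("B75", "B74")


def rows_in_region(sheets, prefixes):
    """Collect (sheet, name, postcode) for every row whose postcode matches."""
    matches = []
    for sheet_name, rows in sheets.items():
        for name, postcode in rows:
            # str.startswith accepts a tuple of prefixes, which plays the role
            # of the chain of "LIKE 'B75%' OR LIKE 'B74%'" conditions in SQL.
            if postcode.strip().upper().startswith(prefixes):
                matches.append((sheet_name, name, postcode))
    return matches


selected = rows_in_region(sheets, region_prefixes)
print(selected)
```

Because the loop visits every sheet, this avoids the one-worksheet-at-a-time limitation of Excel's filter that the question mentions.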

Filtering to new sheets from master sheet in Excel

I'm wondering if this is possible with Excel. I have a list of employees with multiple columns counting up their production in my company. It is a list of about 40 names and has about 10 columns of different tasks that are counted up to determine productivity. These people are split up into different teams and I have their names tagged with the team. For example, if Joe Smith was in customer service, his name says CSR-Joe Smith. I want to be able to use this Excel sheet over and over again, so I can simply run this master report and put the data into one sheet to populate a different sheet for each team. Is there a way to do this by looking at the team tag I add to each person's name and extracting from the list of employees the team members that are on the same team? I'm working in Excel 2010 and have some knowledge of VB and C coding in Excel but not a ton of experience with it. I also want to pull the values for each team member in each column.
Here is an example if my words didn't make sense; each employee has a row like this.
name           status 1  status 2  status 3  status 4  status 5
CSR-Joe Smith  251       358       12        58        9
I should mention that this data is gathered from a SQL query that I run in MS SQL Management Studio and then copy into Excel.
To sum it up, my solution for this kind of problem and related:
Solve database-problems in a database, whenever you can ;)
It is far easier to do joins, selection and filtering inside a database than inside Excel.
If you have to do it in Excel, see if subsum, a PivotTable or VLOOKUP can help you, because those will most likely do the job.
Be alarmed when you start trying to solve database problems with VBA: usually there are better ways, or at least there should be. When you have a good data source, like SQL, your first attempt should definitely be to create a better view or table in SQL.
Additionally, you can use .odc connection files with Excel directly. By opening one, you create a new workbook and open your connection as a worksheet.
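If the split does have to happen outside the database, the team-tag logic from the question is small. Here is a sketch in Python, assuming the tag always precedes the first hyphen in the name. Only the Joe Smith row comes from the question; the other rows and the function name are hypothetical.

```python
from collections import defaultdict

# Master rows: tagged name followed by the five status counts.
master = [
    ("CSR-Joe Smith", 251, 358, 12, 58, 9),
    ("CSR-Amy Jones", 190, 301, 20, 44, 3),   # hypothetical
    ("OPS-Raj Patel", 75, 120, 8, 31, 0),     # hypothetical
]


def split_by_team(rows):
    """Map each team tag to its rows, with the tag stripped from the name."""
    teams = defaultdict(list)
    for tagged_name, *counts in rows:
        # Split on the first hyphen only, so hyphenated surnames stay intact.
        team, _, name = tagged_name.partition("-")
        teams[team].append((name, *counts))
    return dict(teams)


by_team = split_by_team(master)
for team, members in by_team.items():
    print(team, members)
```

Each key of by_team corresponds to one per-team sheet. If the team were returned as its own column by the SQL query, the same grouping could be done there with no string parsing at all.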

Oracle SQL - Is there a more efficient way to organise a massive case statement

Currently I have a report which looks at different types of documents. Each document has an assigned timescale it should be completed by (i.e. 2 days, 4 days, etc). There are more than 100 types of document. Currently, this assigned timescale for each document is held in an excel spreadsheet and matched to the data in excel using a vlookup formula (based on assessment ID). Unfortunately there is no place in our database to put this assigned timescale, but I would like to be able to run a report from the database and just send it to users without having to do this extra manipulation in excel. I know that I could achieve this by writing a massive case statement (below is just an example)
i.e.
SELECT
  ID,
  CASE
    WHEN ID = 1 THEN '1 day'
    WHEN ID = 2 THEN '42 days'
    WHEN ID = 3 THEN '16 days'
    ELSE 'CHECK'
  END AS "Timescale"
FROM TABLE1
But I did wonder if there was a more efficient way of doing this in the SQL (besides requesting an additional field in the database to record this!)? It might be that there isn't, but thought it was worth asking! Thanks.
If you have 100 different time scales it would be reasonable to add a TIMESCALE table to your database and get away from storing information which is important to your business in a spreadsheet. Nothing against Excel, fine product, some of my best friends are Excel spreadsheets - but I don't store business-critical information in them.
Share and enjoy.
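To illustrate what the lookup table buys you, here is a small sketch that replaces the CASE expression with a LEFT JOIN. It runs against an in-memory SQLite database through Python's sqlite3 module, purely so that it is self-contained. The timescale table layout is an assumption, and the SELECT uses only standard SQL, so it should carry over to Oracle with little or no change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY);
    -- Hypothetical lookup table holding one row per document type.
    CREATE TABLE timescale (id INTEGER PRIMARY KEY, timescale TEXT NOT NULL);
    INSERT INTO table1 (id) VALUES (1), (2), (3), (4);
    INSERT INTO timescale (id, timescale)
        VALUES (1, '1 day'), (2, '42 days'), (3, '16 days');
""")

# The join replaces the WHEN branches; COALESCE supplies the ELSE 'CHECK' case
# for any ID that has no row in the lookup table.
report = conn.execute("""
    SELECT t.id, COALESCE(ts.timescale, 'CHECK') AS timescale
    FROM table1 t
    LEFT JOIN timescale ts ON ts.id = t.id
    ORDER BY t.id
""").fetchall()
conn.close()
print(report)
```

Adding or changing a timescale then becomes an INSERT or UPDATE on the lookup table instead of an edit to the report query.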
So you want to join between an Oracle table and an excel sheet...
I think this is not entirely impossible. There are 2 ways.
Way 1. You can do the join in Oracle. That means that you have to write a Java Stored Procedure that can read the Excel sheet. The next step is to create PL/SQL wrappers for this Java Stored Procedure. After that you can write an SQL statement that calls the Java Stored Procedure via the PL/SQL wrappers; this SQL statement can then join with your Oracle table.
Yes indeed, this is very complex.
Way 2. I think you can connect from an excel sheet to Oracle via ODBC. It should be possible to fetch data from Oracle within excel. So excel can do the join for you.
Yes indeed, this is very complex.
You can also put this extra data in a new Timescale table (like Bob Jarvis suggested) but you will have to synchronize between the excel sheet and the Oracle table.
You can also move all the data to Oracle. Or maybe you can move all the data to Excel (probably not)?