VBA - How to refer to column names in Array? - vba

To start, I have a scraping function that scrapes a table from a web page, and stores the data in a 2D array.
The 2D array starts from row 0 to however many rows of data are on the page, and columns 1 to however many columns there are.
Row 0 simply contains all the column names.
My ReDim:
ReDim Addresses(0 To lngTotalRecords, 1 To columns.Count) As String
I've then stored this 2D array into a scripting dictionary called dictClients, as there are multiple clients that all have their own entries for the same table on the web page.
So in my dictionary I have something like the following to refer to a particular address table for a particular client:
dictClients(1)("Addresses")
dictClients(2)("Addresses")
I now want to be able to check if a cell in a certain row contains a specific value, however the web page allows the columns to be reorganized so that:
dictClients(1)("Addresses")(1,1) 'row 1 column 1
will not always refer to the "Street Number" column. The street number column could be the following for someone else for example:
dictClients(1)("Addresses")(1,3) ' row 1 column 3
Given that these cells:
dictClients(1)("Addresses")(0,1) '(0,2) (0,3) etc.
all refer to the column's names, what's the best way for me to find the position of a particular named column?
Example: I want to get the value of the Postal Code cell in row 1, so I need to look in
dictClients(1)("Addresses")(1, POSTALCODECOLUMN),
which isn't always in the same position on the web page.
I was thinking of using the following function:
Public Function column(strArr() As String, strColumnName As String)
Dim i As Long
For i = 1 To UBound(strArr, 2)
If strArr(0, i) = strColumnName Then column = i
Next
End Function
But it just feels so lengthy calling it like:
strPostalCode =
dictClients(1)("Addresses")(1,column(dictClients(1)("Addresses"), "Postal Code")
Is there a better and easier way to do this?
Thanks.

Related

MS Access Extract Multiple Matching Text Strings from Long Text Field compared to Table List

Issue: Query is not able to pull all of the restricted words found in a Long Text Field. It is getting the restricted words from a Table Column of ~100 values.
Sample Data
Table: RecipeTable with Long Text Field: RecipeText
Example Contents of RecipeText Field: Add the rutabaga, leeks, carrots and cabbage to the Instant Pot®. Seal and cook on high pressure for 4 minutes. Quick release the steam. Thinly slice the brisket across the grain and transfer to a serving platter. Arrange the vegetables around the meat, sprinkle with the parsley and serve with the sour cream, horseradish and mustard on the side.
Desired Result:
Want to Compare RecipeText Field against every value in this Short Text Field RestrictedItem in Table: RestrictedTable.
RestrictedTable.RestrictedItem contains 100 values. Let's say it contains 6 for this exercise: milk, bake, spoon, carrots, mustard and steam.
Query would find these matched words in no particular order for a single record: carrots mustard steam
I've tried this: How to find words in a memo field with microsoft access
Result: Finds only 1 of many matches within the Long Text field.
Desired Result: Find ALL matched words extracted within the Long Text string. Duplicates & wildcards are fine. Case sensitive is bad.
Example Tried:
SELECT a.Adjectives, b.Content
FROM A, B
WHERE b.Content Like "*" & a.[adjectives] & "*"
LIKE and after is where I believe the issue is. I've tried using %, parentheses, spaces, etc to no avail.
Mine became this:
SELECT RecipeTable.RecipeText, RestrictedTable.RestrictedItem
FROM RecipeTable, RestrictedTable
WHERE RecipeTable.RecipeText LIKE "*" & RestrictedTable.RestrictedItem & "*";
Notes:
I can find lots of advice to find single words, but not comparing whole table columns to one field.
And, lots of advice to find the first substring or nth position, but I want all of the substrings that match. Not the position & I'm afraid that applying trimming, etc, will slow things down on searching 100 words & trimming for each one.
I am fine making this a calculated field on my form that holds the RecipeText field.
Also fine with making a button that would launch a query to compare the RecipeText field with the RestrictedTable.RestrictedItem List & fill in an empty field RestrictedFound on the same form.
The code below are two approaches to find all restricted words that are in a memo field. While this could all be done programmatically without staging/work tables I would recommend using a temporary or permanent table to extract the words from the memo field via the split function in VBA (after accounting for punctuation and other data scrubbing).
After splitting the words from the memo field into an array they could then be inserted into a separate table with a foreign key reference to RecipeTable. This could be a temporary table or permanent if needed and could be part of the workflow process. A field like PendingReview could be added to RecipeTable for processing new records then marked as false afterwards so they won't be processed again.
After the words were added to the other table it could be joined to RecipeTable
by foreign key and you should have all matches of restricted words.
Once you have the information you could store the stats and discard the work record from your temporary table or delete the work records until the process is run again.
You could do it all in VBA with a dictionary lookup of the restricted words, i.e., query restricted words table, add to a dictionary then loop through matching each word in the memo field with lower case or case insensitive comparison, but it may take a while.
First Code Snippet Below
(If you want compile time checks then you must Reference the Microsoft Scripting Runtime my path is C:\Windows\SysWOW64\scrrun.dll)
Dim dic as Dictionary
Dim memoField as string
Dim words() as String
Dim matchCnt as Integer
'Other variables I didnt declare
'Code to populate dictionary
'Do Until rstRestricted.EOF
' dic.add LCase$(rst("restrictedWord")), 0
' rstRestricted.MoveNext
'Loop
'rstRestricted.Close
'Set rstRestricted = Nothing
Set rst = New adodb.Recordset
rst.Open "SELECT [MemoField] FROM RecipeTable;"
lngRowCnt = CLng(rst.RecordCount) - 1
For x = 0 to lngRowCnt
memoField = LCase$(Nz(rst("MemoField")))
'Replace punctuation like commas, periods
'memoField = Replace(memoField, ",","")
'Now split after data scrubbed
words = Split(memoField, " ")
intWordCnt = UBound(words)
For z = 0 to intWordCnt
If LenB(words(z)) <> 0 Then
If dic.Exists(words(z) = True Then
matchCnt = dic(words(z))
dic(words(z)) = matchCnt + 1
End If
End If
Next z
Next x
Dim WordKeys() as Variant
Dim y as Integer
Dim restrictedWord as string
Dim wordCnt as Integer
WordKeys = dic.Keys
For y = 0 to UBound(WordKeys) '-1
restrictedWord = CStr(WordKeys(y))
wordCnt = CInt(WordKeys(restrictedWord))
'code to save or display stats
Next y
rst.Close
Set rst = Nothing
Set conn = Nothing
I would just do the split of all words into a working table with the word field indexed then do an aggregate with counts of restricted words.
Second Code Snippet
'Option Explicit
Dim sql as String
Dim memoDelimitedData() as String
'Other variables declared
'Code to open Recordset table for recipe and also code to open
'Work table with adOpenDynamic (SELECT * from WorkTable)
'loop through records to be processed
'Split Field (May need variant instead of array. My Access VBA is rusty)
words = Split(memoField, " ")
intWordCnt = UBound(words)
For x = 0 to intWordCnt
With rstWorkTable
.AddNew
!Word = words(x)
!ForeignKeyIdToRecipeTable = intForeignKeyId
.Update
End With
Next x
Then when you have the work table records added you can join to the RecipeTable and the RestrictedTable.
So build a WorkTable of delimited Words from the memo field. Have the foreign key reference to the recipe table then join the RestrictedTable to the WorkTable by the RestrictedItem.
If needed this could be a query for a make table or a staging table permanent table. etc.
So something like this would then give you matches, of any words in your restricted table:
SELECT RecipeTable.RecipeText, RestrictedTable.RestrictedItem
FROM RecipeTable
INNER JOIN WorkTable ON
RecipeTable.Id = WorkTable.RecipeTableId
INNER JOIN RestrictedTable ON
WorkTable.ForeignKeyIdToRecipeTable = RestrictedTable.RestrictedItem
MS Access Split Function
At that point you could do counts, sums, and other data.
I'm sorry I thought I had example code, but I couldn't find it. I had to do something like this in college many moons ago using VBA and Access (Word Count/Ranking assignment), but I can't find it. Nowadays I'd do this kind of stuff with SQL Server with numbers tables, XML/JSON functionality or the Full Text Searching capability.
Hopefully this may help point you in the right direction if you need to limit your work inside MS Access.
If you're not comfortable with working with ADODB or DAO recordsets you could build a CSV delimited file with the foreign key and the word then import that file into a work table.

How to output an excel array without knowing how many items it has

Scenario: I have a code that connects to an API (in this case Factset) and retrieves some data. The data output of the formula is an array of dates, which can have any number of dates inside.
Issue: Currently, my code puts the formula into the sheets and updates. The problem is, for an excel array to work (as far as I understand), I have to select as many cells as output items and press "ctrl+shift+enter". The problem is, for each item in my list, I don't know precisely how many output values will be, so I cannot pre-select that range, therefore my output just get the first data row of the array.
For example, the output should be:
15.01.2018
15.02.2018
15.03.2018
15.04.2018
15.05.2018
but since I don't know there will be 5 dates for that item, my current output is only:
15.01.2018
Question: Is there a possible way to do this, without having the final number of data rows that will be in the output? If so, how could I do it?
Code so far:
For i = 2 To numberofitems + 1
If wb.Worksheets("Dates").Cells(1, i).Value <> "" Then
wb.Worksheets("Call Prices").Cells(2, i).Formula = "=FDS(" & Cells(1, i).Address(0, 0) & ",""FI_(CALL,,DATE,,DATE)"")"
Application.Goto ActiveWorkbook.Sheets("Dates").Cells(2, i)
Application.SendKeys("^+{ENTER}")
End If
Next i
Obs: I also read some other posts with the .FormulaArray property, but could not understand how to use it without preselecting the range.
Obs2: The data is retrieved form the API (factset), so it is neither on the sheet nor in a VBA array. Since the data will come from the API, I don't know beforehand how many items it will have.
You can pull the dates into an array and then loop through each of the items in the array.
Sub datesArray()
vDates As Variant
vDates = Sheets("Sheets1").Range("A1:A5")
For i = 1 To UBound(vDates)
'your code
Next i
End Sub

Select cell from column with row value

This simple problem has caused me some recent issues. I have a range of cells which are columns that hold onto different types of information. Using a row value (Integer not Range) which is previously determined I am looking to perform a check with the values within a single cell.
For example, I look through a list of names in column A. If the name is found it holds onto the Row value. Let's assume that the row is 10. This row value will be used in checking the column values for this row (I.e. C10, J10, and K10). How can I select a single cell and then compare the values within those cells?
To get the equivalent to MATCH() / INDEX() or VLOOKUP() in VBA for getting the data for Darth Vader here:
we could use something like:
Sub GetTheRowValue()
Dim RowValue As Long
RowValue = Range("A:A").Find(What:="Darth Vader", After:=Range("A1")).Row
MsgBox Range("B" & RowValue).Value
End Sub
The finds the proper row and then acquires data from other columns in that row.

Extract substring of list based on another list

Using two lists, one consisting of names with added information in various forms (see below for example - list 1) and one consisting of the clear formatted names, i.e. with no added information (list 2)
List 1
--------
Netto City | Value
Imerco City | value
Bilka Suburb | value
Bauhaus, City | Value
City FDB Superb | Value
List 2
------
Netto
Imerco
Bilka
Bauhaus
FDB Super
What I am trying to do is create a filter, so that no matter what the first column of my source data(list 1) looks like, i will be able to sum the values based on (list 2).
Something similar to this: Excel - extracting data based on another list
I tried using vlookup, but that does not search for substrings, then i tried using
=IF(COUNTIF(A$4:A$9;"*"&D5&"*")>0;
INDIRECT(ADDRESS(MATCH("*"&D5&"*";A$4:A$9;0);4));"not found")
But that appears to do the opposite, search list 1 for a single cell value from list 2.
I can't quite get my head around if this works just as well, I havent been able to get it to work anyway, thus my search for the other way. Search List 2, for each item from List 1.
But, ultimately, what I am trying to accomplish is to create a list from the source data, which I can use to categorize each item in list 1 from, based on list 3
List 3
Bilka | Cat1
Imerco | Cat2
FDB Super | Cat1
etc.
For that to work, i need a clean list of the source data, without all the extra information which comes with it.
I use the following sumif
=SUMIFS($F$3:$F$703;$B$3:$B$703;
"="&$H4;$D$3:$D$703;">="&I$2;$D$3:$D$703;"<="&I$3)
to sum all sums belonging to a particular item in List 3 (where i've manually created List 3), between to dates.
The purpose of this is to create a sheet that contains all expenditures to a particular store or category of ones own choosing, for instance the ones listed in List 1, are primarily food stores.
Edit - Clarification.
What I am proposing to do is a multistage process.
Stage 1:
Insert original source data (done)
Stage 2:
Filter source data for unique values (done)
Stage 3:
Create list of approve names for each item in source data
- Ie, Bilka Suburb into Bilka, Netto City into Netto
Here 'Netto' and 'Bilka' are approved names which is manually created to allow for grouping in stage 4. I am looking to automatize this step.
Stage 4:
Group each item from the list of Stage 3, based on name and date-interval, weekly monthly whatever (done) if i could only get Stage 3 to work, as it works on my manually corrected data.
Stage 5:
Select appropriate category, and type for each item in resulting list from Stage 3:
Bilka, is a food place, so it would get the category 'food', same as netto, where Bauhaus would get the category 'Building Supplies', each of these items would get the type 'expense' where say wage would get the type 'income' (done)
the solution to stage 5, is just a vlookup, based on the category into a table that lists each category with a type, so that is simple enough.
Final Solution: Requires that the list to iterate over is in column G, and outputs the list of approved names in column H. There is the error of if not being able to know the difference between an item such as "Super" and "SU", I don't know how to fix that. If anyone has any suggestions on that I am all ears.
Sub LoopCells()
Sheets("RawData").Select
Sheets("RawData").Activate
LRApproved = Cells(Rows.Count, "H").End(xlUp).Row
LRsource = Cells(Rows.Count, "G").End(xlUp).Row
For Each approvedcell In Worksheets("RawData").Range("H2:H" & LRApproved).Cells 'Approved stores entered by users
For Each sourcecell In Worksheets("RawData").Range("G2:G" & LRsource).Cells 'items found from bank statement export
If InStr(UCase(sourcecell.Value), UCase(approvedcell.Value)) <> 0 Then
sourcecell.Offset(0, 2).Value = approvedcell.Value
End If
Next sourcecell
Next approvedcell
End Sub
Thanks for all the help.
Edit: Added final solution and VBA tag.
This works for me:
=SUM(B$3:B$7*NOT(ISERROR(SEARCH(A11,A$3:A$7))))
This assumes that your example list 1 is in range A3:B7 and your list 2 in A11:B15. Paste the above formula in cell B11 and press CtrlShift-Enter to enter it as an array formula. Then you can drag-copy it all the way down to B15.
Explanation: SEARCH for e.g. "Netto" in the cells of List 1. For cells that do not contain that string, SEARCH returns an error. So we're looking for cells that do not return an error. We now have an array of booleans indicating this. Multiply it element-by-element by the array of values. In this multiplication, TRUE is interpreted as 1 and FALSE as zero, so you're screening out the values that don't correspond to "Netto".
Here's a secreenshot of my setup:
Perhaps I've misunderstood but can't you use SUMIF?
=SUMIF(A$4:A$9;"*"&D5&"*";B$4:B$9)
instead of going with VBA, you can extract this with simple small formula. =Index(List2!A2:A10,Match(1,Countif(List1A2,""&List2!A2:A10&""),0)) (Press Ctrl+Shift+Enter). Assume you want to extract the list 2 in to list 1.

Is there a way to check for duplicate values in Excel WITHOUT using the CountIf function?

A lot of the solutions here on SO involve using CountIf to find duplicates. When I have a list of 100,000+ values however, it will often take minutes for CountIf to search for duplicates.
Is there a quicker way to search for duplicates within an Excel column WITHOUT using CountIf?
Thanks!
EDIT #1:
After reading the comments and replies I realize I need to go into greater detail. Let's pretend I'm a birdwatcher, and after I return from a birdwatching trip I input anywhere from 1 to 25 or 50 new birds that I saw on my trip into my "Master List of Birds Seen". This is really a dynamically growing list, and with each addition I want to make sure I'm not duplicating something that already exists in my list.
So, in column A of my file are the names of the birds. Column B-M might contain other attributes of the birds. I want to know if a bird that I just added in column A after my latest birdwatching trip ALREADY exists somewhere ELSE in my list. And, if it does, I would manually merge the data of the 2 entries and throw away some and keep some after careful review. I clearly don't want to have duplicate entries of the same bird in my database.
So, ultimately I want some indication that there is or isn't a duplicate somewhere else, and if there is duplicate please tell me what row to look in (or highlight or color both of the duplicates).
The fastest way that I know of (in case you are using Excel 2007/2010/2011) is to use Data (In Ribbon) | Remove Duplicates to find the total number of duplicates OR to remove duplicates. You might want to move data to a temp sheet before you test this.
The 2nd fastest way is to use Countif. Now Countif can be used in many ways to find duplicates. Here are two main ways.
1) Inserting a New Column next to the data and putting the formula and simply copying it down.
2) Using Countif in Conditional formatting to highlight cells which are duplicates. For more details, please see this link.
suggestions for a macro to find duplicates in a SINGLE column
EDIT:
My Apologies :)
Countif is the 3rd fastest way!
The 2nd fastest way is to use Pivot Tables ;)
What exactly is your main purpose of finding duplicates? Do you want to delete them? Or Do you want to highlight them? Or something else?
FOLLOWUP
Seems like I made a typo in the formula. Yes for large number of rows, CountIf does take minutes as you suggested.
Let me see if I can come up with a VBA code to suit your exact needs.
Sid
You can use VBA - the following function returns a list of unique entries within a list of 100,000 in less than a second. Usage: select a range, type the formula (=getUniqueListFromRange(YourRange)) and validate with CTRL+SHIFT+ENTER.
Public Function getUniqueListFromRange(parRange As Range) As Variant
' Returns a (1 to n,1 to 1) array with all the values without duplicates
Dim i As Long
Dim j As Long
Dim locKey As Variant
Dim locData As Variant
Dim locUniqueDict As Variant
Dim locUniqueList As Variant
On Error GoTo error_handler
locData = Intersect(parRange.Parent.UsedRange, parRange)
Set locUniqueDict = CreateObject("Scripting.Dictionary")
On Error Resume Next
For i = 1 To UBound(locData, 1)
For j = 1 To UBound(locData, 2)
locKey = UCase(locData(i, j))
If locKey <> "" Then locUniqueDict.Add locKey, locData(i, j)
Next j
Next i
If locUniqueDict.Count > 0 Then
ReDim locUniqueList(1 To locUniqueDict.Count, 1 To 1) As Variant
i = 1
For Each locKey In locUniqueDict
locUniqueList(i, 1) = locUniqueDict(locKey)
i = i + 1
Next
getUniqueListFromRange = locUniqueList
End If
error_handler: 'Empty range
End Function
If using Excel 2007 or later (which is likely from the 100,000+ values) you can choose:
Home Tab | Conditional Formatting > Highlight Cell Rules > Duplicate Values...
Right-click a highlighted cell and filter by selected cell color to show just the duplicates (be aware however this can be slow with conditional formatting).
Alternatively run this code and filter for colored cells which takes only a second on 100,000 cells:
Sub HighlightDupes()
Dim i As Long, dic As Variant, v As Variant
Application.ScreenUpdating = False
Set dic = CreateObject("Scripting.Dictionary")
i = 1
For Each v In Selection.Value2
If dic.exists(v) Then dic(v) = "" Else dic.Add v, i
i = i + 1
Next v
Selection.Font.Color = 255
For Each v In dic
If dic(v) <> "" Then Selection(dic(v)).Font.Color = 0
Next v
End Sub
Addendum:
To select only duplicate values without code or formulas, i have found this method useful:
Data Tab | Advanced Filter... Filter in Place, Unique Records Only, OK.
Now select the range of unique values and press Alt+; (Goto Special... Visible cells only). With this selection clear the filter and you will see that all unselected cells are duplicates, you can then press Ctrl+9 (Hide Rows) to show just the duplicates. These rows can be copied to another sheet if needed or marked with an "X".
You do not mention what you want to do when you find them. If you merely want to see where they are...
Sub HighLightCells()
ActiveSheet.UsedRange.Cells.FormatConditions.Delete
ActiveSheet.UsedRange.Cells.FormatConditions.Add Type:=xlCellValue, Operator:=xlEqual, Formula1:=ActiveCell
ActiveSheet.UsedRange.Cells.FormatConditions(1).Interior.ColorIndex = 4
End Sub
Preventing Duplicates with Data Validation
You can use Data Validation to prevent you entering duplicate bird names. See Debra Dalgelish's site here
Handling existing duplicates
My free Duplicate Master addin will let you
Select
Colour
List
Delete
duplicates.
But more importantly it will let you run more complex matching than exact strings, ie
Case Insensitive / Case Sensitive searches (sample below)
Trim/Clean data
Remove all blank spaces (including CHAR(160)) see the " mapgie" and "magpie" example below
Run regular expression matches (for example the sample below replaces s$ with "" to remove plurals)
Match on any combination of columns (ie Column A, all columns, Column A&B etc)
I'm surprised that no one has mentioned the RemoveDuplicates method.
ActiveSheet.Range("A:A").RemoveDuplicates Columns:=1
This will simply remove any duplicate entries on the active worksheet in column A. It takes milliseconds to run (tested with 200k rows). Mind you, this will strictly delete all the duplicate entries. Although that isn't how the original question was worded, I do believe that this still serves your purpose.
One simple way of finding unique values is to use the advance filter and filter for unique values only and copy and paste them into other sheet as when the pivot is removed you will get the whole data with the duplicate in them.
Sort the range
and in next column put `=if(a2=a1;1;if(a2=a3;1;0))
"1" will be displayed for duplicates.