How to remove duplicate values from 2 columns in Excel using VBA

I am new to Excel VBA programming. I have one Excel sheet with two columns, and each column holds some email addresses separated by ##, like this:
ColumnA
aa#yahoo.com##bb#yahoo.com##cc#yahoo.com
x#.com##y#y.com
ColumnB
zz#yahoo.com##aa#yahoo.com
aa#yahoo.com
As you can see, both columns have two rows. I need a third column that contains all the unique values from each row, like this:
ColumnC
aa#yahoo.com##bb#yahoo.com##cc#yahoo.com##zz#yahoo.com
x#.com##y#y.com##aa#yahoo.com
Thanks

Something like this, using variant arrays and a dictionary, is an efficient way of getting your desired outcome.
[updated to remove delimiter at front of string, code is flexible on delimiter length]
Sub GetUniques()
    Dim strDelim As String
    Dim X
    Dim Y
    Dim objDic As Object
    Dim lngRow As Long
    Dim lngRow2 As Long

    strDelim = "##"
    Set objDic = CreateObject("scripting.dictionary")
    X = Range([a1], Cells(Rows.Count, "B").End(xlUp)).Value2

    For lngRow = 1 To UBound(X, 1)
        X(lngRow, 1) = X(lngRow, 1) & strDelim & X(lngRow, 2)
        Y = Split(X(lngRow, 1), strDelim)
        X(lngRow, 1) = vbNullString
        For lngRow2 = 0 To UBound(Y, 1)
            If Not objDic.exists(lngRow & Y(lngRow2)) Then
                X(lngRow, 1) = X(lngRow, 1) & (strDelim & Y(lngRow2))
                objDic.Add (lngRow & Y(lngRow2)), 1
            End If
        Next lngRow2
        If Len(X(lngRow, 1)) > Len(strDelim) Then X(lngRow, 1) = Right(X(lngRow, 1), Len(X(lngRow, 1)) - Len(strDelim))
    Next lngRow

    [c1].Resize(UBound(X, 1), 1).Value2 = X
End Sub

Here's my take. How it works:
Dump columnA and B into a variant array
Combine each row, split into an array of emails, then weed out dupes with a dictionary.
Combine unique list into a single string and store in a new array
Transpose the new array onto column C.
Sub JoinAndUnique()
    Application.ScreenUpdating = False

    Dim varray As Variant, newArray As Variant
    Dim i As Long, lastRow As Long
    Dim temp As Variant, email As Variant
    Dim newString As String, seperator As String
    Dim dict As Object

    Set dict = CreateObject("scripting.dictionary")
    seperator = "##"
    lastRow = Range("A" & Rows.Count).End(xlUp).Row
    varray = Range("A1:B" & lastRow).Value
    ReDim newArray(1 To UBound(varray, 1))

    On Error Resume Next
    For i = 1 To UBound(varray, 1)
        temp = Split(varray(i, 1) & seperator & varray(i, 2), seperator)
        For Each email In temp
            If Not dict.exists(email) Then
                dict.Add email, 1
                newString = newString & (seperator & email)
            End If
        Next
        newArray(i) = Mid$(newString, 3)
        dict.RemoveAll
        newString = vbNullString
    Next

    Range("C1").Resize(UBound(newArray)).Value = Application.Transpose(newArray)
    Application.ScreenUpdating = True
End Sub
Note:
It's fairly similar to brettdj's answer, but there are a few differences worth mentioning:
I used more meaningful names for variables (for readability and to make it easier to edit)
I clean up the "##" at the start of the string
I use a new array rather than overwriting the values of an existing one
I choose to clear the dictionary after each row
I choose to use "on error resume next" and just dump entries into the dictionary instead of checking if they exist or not (personal preference, makes no major difference)

The easiest way to do this would be to use the dictionary object, split function, and join function. Of course, you don't need to use those exact ones, but give it a try and see what you get.
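As a rough illustration of that suggestion, here is a minimal sketch that combines the three pieces. UniqueJoin is a hypothetical helper name, not something taken from the answers above, and the late-bound Scripting.Dictionary is assumed to be available (it is a Windows-only component).

```vba
Option Explicit

' Hypothetical helper: merges two delimited lists and drops repeated entries.
Public Function UniqueJoin(ByVal first As String, ByVal second As String, _
                           Optional ByVal delim As String = "##") As String
    Dim dict As Object
    Dim part As Variant

    Set dict = CreateObject("Scripting.Dictionary")   ' late bound, assumed available

    ' Split the combined text and keep each entry the first time it is seen
    For Each part In Split(first & delim & second, delim)
        If Len(part) > 0 Then                          ' skip blanks left by empty cells
            If Not dict.Exists(part) Then dict.Add part, Empty
        End If
    Next part

    ' The keys come back in insertion order, so Join rebuilds the list
    UniqueJoin = Join(dict.Keys, delim)
End Function
```

It could be called once per row, for example Cells(r, "C").Value = UniqueJoin(Cells(r, "A").Value, Cells(r, "B").Value). The dictionary compares keys case-sensitively by default, so entries that differ only in case are treated as distinct unless CompareMode is changed.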

Related

Sorting by column using character in middle of each cell, without helper column

Is it possible to sort a range by a column, but sort using a single character in the middle of the string in each cell?
So column looks like this:
red(7)
blue(4)
orange(9)
green(2)
etc..
I want to sort it using the number within the brackets.
My current code sorts the columns alphabetically:
With sheetSUMMARY
    .Range(.Cells(summaryFirstRow, summaryReForenameCol), _
           .Cells(summaryLastRow, summaryReColourCol)).Sort _
        Key1:=.Range(.Cells(summaryFirstRow, summaryReColourCol), _
                     .Cells(summaryLastRow, summaryReColourCol)), _
        Order1:=xlAscending, _
        Header:=xlNo
End With
So it looks like this:
blue(4)
green(2)
orange(9)
red(7)
Without making a helper column in Excel (which extracts the numbers), is it possible to sort it like this purely programmatically? (I haven't really got space for a helper column at this stage.)
green(2)
blue(4)
red(7)
orange(9)
You can use a Dictionary to store your values and their corresponding numbers and then there are a number of sorting methods. I opted to use an ArrayList to do the sorting rather than writing a bespoke sorting function.
Public Sub SortByNumber()
Dim arrayList As Object, inputDictionary As Object, outputDictionary As Object 'late binding so you can drop the code in easily
Dim rng As Range, r As Range
Dim num As Double
Dim v As Variant
Set rng = ThisWorkbook.Worksheets("Sheet1").Range("A1:A4")
Set arrayList = CreateObject("System.Collections.ArrayList")
Set inputDictionary = CreateObject("Scripting.Dictionary")
Set outputDictionary = CreateObject("Scripting.Dictionary")
'put current values into dictionary and arraylist
For Each r In rng
num = CLng(Mid(r.Value, InStr(r.Value, "(") + 1, Len(r.Value) - InStr(r.Value, "(") - 1))
Do While inputDictionary.exists(num) 'avoid errors with duplicates numbers (see comments)
num = num + 0.00000001
Loop
inputDictionary.Add Item:=r.Value, Key:=num
arrayList.Add num
Next r
arrayList.Sort
'use sorted arraylist to determine order of items in output dictionary
For Each v In arrayList.toarray
outputDictionary.Add Item:=v, Key:=inputDictionary.Item(v)
Next v
'output values to the next column -- just remove the offset to overwrite original values
rng.Offset(0, 1).Value = WorksheetFunction.Transpose(outputDictionary.keys())
End Sub
The result looks like this:
You can do something interesting if you really do not want to add a helper column. It goes pretty much as follows:
Let's say your inputRange is Range("A1:A4").
Declare a Variant, virtualRange, which is the slightly tricky part: it takes the values of the inputRange and of the next column:
virtualRange = Union(inputRange, inputRange.Offset(0, 1)).Value
Then loop through your inputRange and assign the number extracted from each cell to the second column of virtualRange.
Now the fun part: pass virtualRange to SortDataBySecondValue and it will come back sorted. Here is a really important point: if you pass virtualRange in parentheses, like this: SortDataBySecondValue (virtualRange), nothing useful happens. The parentheses override the ByRef argument in SortDataBySecondValue() and virtualRange remains intact.
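The effect of the parentheses can be seen in isolation with a small sketch. DoubleIt and ParenthesesDemo are hypothetical names used only for illustration; they are not part of the answer's code.

```vba
Option Explicit

' Hypothetical helper: doubles its argument in place.
Public Sub DoubleIt(ByRef n As Variant)
    n = n * 2
End Sub

Public Function ParenthesesDemo() As String
    Dim a As Variant, b As Variant
    a = 5: b = 5

    DoubleIt a      ' passed by reference: a becomes 10
    DoubleIt (b)    ' parentheses evaluate b to a temporary copy: b stays 5

    ParenthesesDemo = a & "," & b
End Function
```

The same rule applies to arrays: a call written as SortDataBySecondValue (virtualRange) sorts a temporary copy and leaves the original array unchanged.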
At the end you have your virtualRange sorted and you have to pass its values correctly to the inputRange. This is achievable with a simple loop:
For Each myCell In inputRange
Dim cnt As Long
cnt = cnt + 1
myCell = virtualRange(cnt, 1)
Next myCell
Now the inputRange is sorted as expected:
The whole code is here:
Option Explicit
Public Sub TestMe()
Dim inputRange As Range
Dim myCell As Range
Dim virtualRange As Variant
Set inputRange = Range("A1:A4")
virtualRange = Union(inputRange, inputRange.Offset(0, 1)).Value
For Each myCell In inputRange.Columns(1).Cells
virtualRange(myCell.Row, 2) = locateNumber(myCell)
Next myCell
SortDataBySecondValue virtualRange
For Each myCell In inputRange
Dim cnt As Long
cnt = cnt + 1
myCell = virtualRange(cnt, 1)
Next myCell
End Sub
Public Function locateNumber(ByVal s As String) As Long
Dim startIndex As Long
Dim endIndex As Long
startIndex = InStr(1, s, "(") + 1
endIndex = InStr(1, s, ")")
locateNumber = Mid(s, startIndex, endIndex - startIndex)
End Function
Sub SortDataBySecondValue(ByRef Data As Variant)
Dim i As Long
Dim j As Long
Dim temp As Variant
Dim sortBy As Long: sortBy = 2
ReDim temp(UBound(Data) - 1, sortBy)
For i = LBound(Data) To UBound(Data)
For j = i To UBound(Data)
If Data(i, sortBy) > Data(j, sortBy) Then
temp(i, 1) = Data(i, 1)
temp(i, sortBy) = Data(i, sortBy)
Data(i, 1) = Data(j, 1)
Data(i, sortBy) = Data(j, sortBy)
Data(j, 1) = temp(i, 1)
Data(j, sortBy) = temp(i, sortBy)
End If
Next j
Next i
End Sub
Try this:
Sub OrderByColumn()
Dim i As Long, tempColumn As Long, colorColumn As Long, color As String
'get table to variable
Dim tableToOrder As Range
'here you have to specify your own range!!
Set tableToOrder = Range("A1:C5")
colorColumn = tableToOrder.Column
tempColumn = colorColumn + tableToOrder.Columns.Count
'insert new column at the end of the table and populate with extracted numbers
Columns(tempColumn).Insert
For i = tableToOrder.Row To (tableToOrder.Rows.Count + tableToOrder.Row - 1)
color = Cells(i, colorColumn).Value
Cells(i, tempColumn).Value = Mid(color, InStr(1, color, "(") + 1, InStr(1, color, ")") - InStr(1, color, "(") - 1)
Next
i = i - 1 'now i points to last row in range
'order whole table accordingly to temporary column
Range(Cells(tableToOrder.Row, tableToOrder.Column), Cells(i, tempColumn)).Sort Key1:=Range(Cells(tableToOrder.Row, tempColumn), Cells(i, tempColumn))
'delete column
Columns(tempColumn).Delete
End Sub

VBA Delete Duplicates code faster

Currently using this code, however, I have a huge set of data, and this runs really slow for that. I need to remove any duplicate information, and keep the highest row of information.
Dim dup As Variant, r As Long, lncheckduplicatescolumn As Long
With wb_DST.Sheets(sWs_DST)
    lncheckduplicatescolumn = .Cells(.Rows.Count, "A").End(xlUp).Row
    For r = lncheckduplicatescolumn To 2 Step -1
        dup = Application.Match(.Cells(r, "A").Value, .Columns(1), 0)
        If dup < r Then .Rows(dup).Delete
    Next r
End With
Data:
Column A Column B
A 1
B 2
C 3
A 3
Result should be:
B 2
C 3
A 3
The order of data in column A doesn't matter as long as it is unique and retains the information that is in the higher row number. While the code I shared works, it is too slow for a large data set.
Another fast method is to use the Dictionary object. You can check whether each value in column A already exists in the Dictionary. If it does (meaning it's a duplicate), don't delete the row immediately, because deleting rows one at a time adds a lot to the code's run time. Instead, collect the duplicate rows in a DelRng object, which is a Range that uses Union to merge multiple rows.
Later on, you can delete the entire duplicates range at once by using DelRng.Delete.
Code
Option Explicit
Sub RemoveDuplicatesUsingDict()
Dim wb_DST As Workbook
Dim sWs_DST As String
' Dictionary variables
Dim Dict As Object
Dim DictIndex As Long, ExistIndex As Long
Dim DelRng As Range
Dim LastRow As Long, i As Long
' --- parameters for my internal testing ---
Set wb_DST = ThisWorkbook
sWs_DST = "Sheet1"
Application.ScreenUpdating = False
Set Dict = CreateObject("Scripting.Dictionary")
With wb_DST.Sheets(sWs_DST)
LastRow = .Cells(.Rows.Count, "A").End(xlUp).Row ' get last row with data in column "A"
For i = LastRow To 2 Step -1
If Not Dict.exists(.Range("A" & i).Value) Then ' value doesn't exists yet in Dictionary >> add this Key
Dict.Add .Range("A" & i).Value, .Range("A" & i).Value
Else ' value already exists in Dictionary >> add it to DelRng (later will delete the entire range)
If Not DelRng Is Nothing Then
Set DelRng = Application.Union(DelRng, .Rows(i)) ' add current row to existing DelRng
Else
Set DelRng = .Rows(i)
End If
End If
Next i
End With
' delete the entire range at 1-shot
If Not DelRng Is Nothing Then DelRng.Delete
Application.ScreenUpdating = True
End Sub
Fast use of data field array
Looping through a range isn't that fast. You can speed it up considerably if you create a data field array from your search data (array = needed range in column "A", see 1) and loop through that instead. As your data set grows, this gets even faster in comparison to the dictionary approach shown above, though that remains a good and reliable method.
Search Method
Any array value is checked against a concatenated search string with already found unique values and added if not yet included - see 2)
The completed string is transformed to an array and written back to a given target column (e.g. "H") - see 3) and 4)
I even added a second column with the corresponding row numbers, so you should be in the position to use them for further action. You could write results to another sheet, too.
Code - method demo
Sub RemoveDuplicates()
Dim t As Double: t = Timer ' stop watch
Dim ws As Worksheet ' source sheet (object)
Dim i As Long ' row index
Dim a, arr, arr2 ' variant
Dim s As String, si As String
Const SEP = "|" ' delimiter
s = SEP: si = SEP
' 0) fully qualified range reference to source sheet
Set ws = ThisWorkbook.Worksheets("Sheet1")
' 1) write column A data to one based 2-dim data field array
a = ws.Range("A1:A" & ws.Cells(ws.Rows.Count, "A").End(xlUp).Row)
' 2) loop through data and check for duplicates in concatenated string s
For i = 2 To UBound(a)
If InStr(s, SEP & a(i, 1) & SEP) = 0 Then
If Len(a(i, 1)) > 0 Then
s = s & a(i, 1) & SEP
si = si & i & SEP
End If
End If
Next i
' 3) transform unique values to zero based 1-dim array
arr = Split(Mid(s, 2), SEP) ' unique values string to array
arr2 = Split(Mid(si, 2), SEP) ' found row numbers
' 4) write result to column H2:H... ' <<< change target to wanted column
ws.Range("H:H").ClearContents '
ws.Range("H2:H" & (2 + UBound(arr))).Value = Application.Transpose(arr)
ws.Range("I2:I" & (2 + UBound(arr2))).Value = Application.Transpose(arr2)
Debug.Print UBound(arr) + 0 & " unique items found", Format(Timer - t, "0.00 seconds needed")
End Sub
=================================================================
EDIT
Version 2 -- includes overwriting original data with unique values
Here you find a slightly modified version overwriting the original data in 35 columns (A2:AI..) with unique values.
Sub RemoveDuplicates2()
' Edit: overwrite original data A2:AI{..} with unique values
Dim t As Double: t = Timer ' stop watch
Dim ws As Worksheet ' source sheet (object)
Dim i As Long ' row index
Dim a, arr, arr2 ' variant
Dim s As String, si As String
Const SEP = "|" ' delimiter
Const MyLastColumn = "AI" ' letter of last column (no 35) = "AI"
s = SEP: si = SEP
' fully qualified range reference to source sheet
Set ws = ThisWorkbook.Worksheets("Sheet1")
' write column A data to one based 2-dim data field array
a = ws.Range("A1:A" & ws.Cells(ws.Rows.Count, "A").End(xlUp).Row)
' loop through data and check for duplicates in concatenated string s
For i = 2 To UBound(a) ' For i = UBound(a) To 2 Step -1
If InStr(s, SEP & a(i, 1) & SEP) = 0 Then
If Len(Trim(a(i, 1))) > 0 Then
s = s & a(i, 1) & SEP
si = si & i & SEP
End If
End If
Next i
' write unique values to zero based 1-dim array (starts with index 0; last delimiter removed in this version)
arr2 = Split(Mid(si, 2, Len(si) - 2), SEP) ' found row numbers
' overwrite original data
For i = LBound(arr2) To UBound(arr2) ' starts with index 0!
s = "A" & arr2(i) & ":" & MyLastColumn & arr2(i)
arr = ws.Range(s) ' create 1-based 1-line (2-dim) array
s = "A" & i + 2 & ":" & MyLastColumn & i + 2 ' 0 + 2 = +2 ... start in row 2
ws.Range(s) = arr ' write back unique row values
Next i
s = "A" & UBound(arr2) + 3 & ":" & MyLastColumn & UBound(a) + 1
ws.Range(s).ClearContents ' clear rest of original data
Debug.Print UBound(arr2) + 1 & " unique items found", Format(Timer - t, "0.00 seconds needed") ' result
End Sub

Excel VBA loop performs very slow if used on a large data set and then crashes

I'm not a developer but I read a bit here and there to be able to understand some of it. This might be a simple problem that I'm facing but I can't seem to figure it out. So thank you for helping me on this!
I wrote with the help of Google a short script that is supposed to turn a CSV export into a readable format. It is supposed to do a few more things but I'm already facing performance issues just with the objective of making a few entries readable.
Here's what I have so far:
Sub MagicButton_Click()
'Find the last non-empty cell in column A
Dim LastRow As Long
LastRow = Range("A" & Rows.Count).End(xlUp).Row
'Set Variables to work with the cell content
Dim CellContent As String
Dim CellContentArr As Variant
'Set looping variables
Dim i, j As Integer
Dim FirstRow As Integer
Dim FirstCol As Integer
Dim ActiveCol As Integer
Dim itm As Variant
FirstRow = 1
FirstCol = 2
Dim x, y As String
'Loop (1) through all rows
For i = FirstRow To LastRow
'Save cell content to string
CellContent = ActiveSheet.Cells(i, 1).Text
'Split string into array
CellContentArr = Split(CellContent, "{")
'Reset column
ActiveCol = FirstCol
'Loop (2) through the array
For Each itm In CellContentArr
'Remove quotations and other symbols
itm = Application.WorksheetFunction.Clean(itm)
itm = Replace(itm, """", "")
'This is the part that creates performance issues
'For j = 1 To Len(itm)
' x = Mid(itm, j, 1)
' If x Like "[A-Z,a-z,0-9 :.-]" Then
' y = y & x
' End If
'Next j
'itm = y
'y = ""
'Write each item in array to an individual cells within the same row
ActiveSheet.Cells(i, ActiveCol) = itm
ActiveCol = ActiveCol + 1
Next itm
Next i
End Sub
This entire script works fine when I test on ~10 rows. When using it on the entire set of 220 rows, it becomes unresponsive and eventually crashes.
In the script I have commented what causes this performance issue. I'm guessing it is because there are three loops. The third loop iterates through every char in the string to check if it is an allowed char or not and then keeps or deletes it.
What can I do to improve performance, or at least, make it so that Excel doesn't turn unresponsive?
Sidenote: It is supposed to work both on Mac & Windows. I don't know if RegEx would have a better performance to filter out the unwanted char, but I also don't know if it is possible to use that for both Mac & Windows.
The answers that have been given would be good adjustments to your code. However, there might be a better approach to this.
Firstly, reading a range into an array and manipulating the resultant array is markedly faster than reading cell by cell.
Secondly, if you are iterating over each character in your array and checking for specific items, with a curly bracket signalling a new column, then couldn't you just do it all in one iteration? It seems a little redundant to split and clean first.
All in all, your code could be as simple as this:
Dim lastCell As Range
Dim v As Variant
Dim r As Long
Dim c As Long
Dim i As Integer
Dim output() As String
Dim b() As Byte
'Read the values into an array
With ThisWorkbook.Worksheets("Sheet1")
Set lastCell = .Cells(.Rows.Count, "A").End(xlUp)
v = .Range(.Cells(1, "A"), lastCell).Value2
End With
ReDim output(1 To UBound(v, 1), 1 To 1)
'Loop through the array rows and characters
For r = 1 To UBound(v, 1)
c = 1
'Convert item to byte array - just personal preference, you could iterate a string
b = StrConv(v(r, 1), vbFromUnicode)
For i = 0 To UBound(b)
Select Case b(i)
Case 45, 46, 58, 65 To 90, 97 To 122, 48 To 57 '-, :, ., A-Z, a-z, 0-9
output(r, c) = output(r, c) & Chr(b(i))
Case 123 '{
'add a column and expand output array if necessary
If Len(output(r, c)) > 0 Then
c = c + 1
If c > UBound(output, 2) Then
ReDim Preserve output(1 To UBound(v, 1), 1 To c)
End If
End If
Case Else
'skip it
End Select
Next
Next
'Write item to worksheet
ThisWorkbook.Worksheets("Sheet1").Cells(1, "B") _
.Resize(UBound(output, 1), UBound(output, 2)).Value = output
Three things: you need to disable screen updating, and you need to declare your variables properly. Do not write "Dim a, b, c, d, e As Integer", because only the last one is an Integer; the others are Variants. Last but not least, avoid Integer in VBA and use Long instead, although that is not your problem here.
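The declaration pitfall can be checked with TypeName. This is only a small sketch, and DescribeDims is a hypothetical name chosen for illustration.

```vba
Option Explicit

' Hypothetical demo: reports the actual types behind a shared Dim line.
Public Function DescribeDims() As String
    Dim i, j As Integer          ' i has no type clause, so it is a Variant
    Dim m As Long, n As Long     ' every variable needs its own As clause

    ' An uninitialised Variant reports "Empty"; the typed variables report their type
    DescribeDims = TypeName(i) & "," & TypeName(j) & "," & _
                   TypeName(m) & "," & TypeName(n)
End Function
```

Running Debug.Print DescribeDims in the Immediate window shows that only j picked up the Integer type from the first line.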
This should work faster:
Sub MagicButton_Click()
'Find the last non-empty cell in column A
Dim LastRow As Long
LastRow = Range("A" & Rows.Count).End(xlUp).Row
'Set Variables to work with the cell content
Dim CellContent As String
Dim CellContentArr As Variant
'Set looping variables
Dim i As Long
dim j as Long
Dim FirstRow As Long
Dim FirstCol As Long
Dim ActiveCol As Long
Dim itm As Variant
FirstRow = 1
FirstCol = 2
Dim x as string
dim y As String
call onstart
'Loop (1) through all rows
For i = FirstRow To LastRow
'Save cell content to string
CellContent = ActiveSheet.Cells(i, 1).Text
'Split string into array
CellContentArr = Split(CellContent, "{")
'Reset column
ActiveCol = FirstCol
'Loop (2) through the array
For Each itm In CellContentArr
'Remove quotations and other symbols
itm = Application.WorksheetFunction.Clean(itm)
itm = Replace(itm, """", "")
'This is the part that creates performance issues
'For j = 1 To Len(itm)
' x = Mid(itm, j, 1)
' If x Like "[A-Z,a-z,0-9 :.-]" Then
' y = y & x
' End If
'Next j
'itm = y
'y = ""
'Write each item in array to an individual cells within the same row
ActiveSheet.Cells(i, ActiveCol) = itm
ActiveCol = ActiveCol + 1
Next itm
Next i
call onend
End Sub
Public Sub OnStart()
Application.AskToUpdateLinks = False
Application.ScreenUpdating = False
Application.Calculation = xlAutomatic
Application.EnableEvents = False
Application.DisplayAlerts = False
End Sub
Public Sub OnEnd()
Application.DisplayAlerts = True
Application.ScreenUpdating = True
Application.EnableEvents = True
Application.StatusBar = False
Application.AskToUpdateLinks = True
End Sub
Task List
Copy source range into an array
Clean array
Copy array back to source range
Split data into multiple columns using TextToColumns
Sub MagicButton_Click2()
Dim arData
Dim LastRow As Long, i As Integer
Dim dataRange As Range
LastRow = Range("A" & Rows.Count).End(xlUp).Row
Set dataRange = Range(Cells(1, 1), Cells(LastRow, 1))
arData = dataRange.value
For i = 1 To UBound(arData)
arData(i, 1) = AlphaNumericOnly(CStr(arData(i, 1)))
Next
dataRange.value = arData
dataRange.TextToColumns Destination:=Range("A1"), DataType:=xlDelimited, _
TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=True, _
Semicolon:=False, Comma:=False, Space:=False, Other:=True, OtherChar _
:="{", TrailingMinusNumbers:=True
End Sub
' http://stackoverflow.com/questions/15723672/how-to-remove-all-non-alphanumeric-characters-from-a-string-except-period-and-sp
Function AlphaNumericOnly(strSource As String) As String
Dim i As Integer
Dim strResult As String
For i = 1 To Len(strSource)
Select Case Asc(Mid(strSource, i, 1))
Case 48 To 57, 65 To 90, 97 To 123: 'include 32 if you want to include space I added 123 to include the {
strResult = strResult & Mid(strSource, i, 1)
End Select
Next
AlphaNumericOnly = strResult
End Function

Combine Rows with duplicate values, merge cells if different

I have a similar question to:
Excel VBA - Combine rows with duplicate values in one cell and merge values in other cell
I have data in this format (rows are sorted)
Pub ID CH Ref
no 15 1 t2
no 15 1 t88
yes 15 2 t3
yes 15 2 t3
yes 15 2 t6
Compare adjacent rows (say rows 4 and 5). If col 2 and 3 match and col 4 is different, merge col 4 and delete the row. If col 2, 3 and 4 all match, delete the row without merging col 4.
Desired Output
key ID CH Text
no 15 1 t2 t88
yes 15 2 t3 t6
This first code section doesn't work right
Sub mergeCategoryValues()
Dim lngRow As Long
With ActiveSheet
Dim columnToMatch1 As Integer: columnToMatch1 = 2
Dim columnToMatch2 As Integer: columnToMatch2 = 3
Dim columnToConcatenate As Integer: columnToConcatenate = 4
lngRow = .Cells(65536, columnToMatch1).End(xlUp).row
.Cells(columnToMatch1).CurrentRegion.Sort key1:=.Cells(columnToMatch1), Header:=xlYes
.Cells(columnToMatch2).CurrentRegion.Sort key1:=.Cells(columnToMatch2), Header:=xlYes
Do
If .Cells(lngRow, columnToMatch1) = .Cells(lngRow - 1, columnToMatch1) Then 'check col 2 row lngRow, lngRow-1
If .Cells(lngRow, columnToMatch2) = .Cells(lngRow - 1, columnToMatch2) Then 'check col 3 row lngRow, lngRow-1
If .Cells(lngRow - 1, columnToConcatenate) = .Cells(lngRow, columnToConcatenate) Then
Else
.Cells(lngRow - 1, columnToConcatenate) = .Cells(lngRow - 1, columnToConcatenate) & "; " & .Cells(lngRow, columnToConcatenate)
End If
.Rows(lngRow).Delete
End If
End If
lngRow = lngRow - 1
Loop Until lngRow = 1
End With
The actual output is incorrect because, once cells have been merged, t3 no longer matches t3; t6, so my comparison on col 4 only works in the very simplest cases.
Actual Output
key ID CH Text
no 15 1 t2; t88
yes 15 2 t3; t3; t6
Therefore, I had to add these two sections to split the Concatenate cells and then remove duplicates
'split cell in Col d to col e+ delimited by ;
With Range("D2:D6", Range("D" & Rows.Count).End(xlUp))
.Replace ";", " ", xlPart
.TextToColumns other:=True
End With
'remove duplicates in each row
Dim x, y(), i&, j&, k&, s$
With ActiveSheet.UsedRange
x = .Value: ReDim y(1 To UBound(x, 1), 1 To UBound(x, 2))
For i = 1 To UBound(x)
For j = 1 To UBound(x, 2)
If Len(x(i, j)) Then
If InStr(s & "|", "|" & x(i, j) & "|") = 0 Then _
s = s & "|" & x(i, j): k = k + 1: y(i, k) = x(i, j)
End If
Next j: s = vbNullString: k = 0
Next i
.Value = y()
End With
End Sub
With additional code output is
Pub ID CH Ref
no 15 1 t2 t88
yes 15 2 t3 t6
Question: There must be a much easier way to do this right than using three different methods. How about inserting new columns 5+ if the col 4 items don't match?
Note: The remove-duplicates code was found from user nilem at excelforum.
Edit: Col 1 will always be the same if Col 2 and 3 match. If it makes the solution much easier, we can assume Col 1 is blank and ignore its data.
I have a printed book lookup table and need to convert it to a simple format for use in equipment that uses a 1960s language with very limited commands. I am trying to preformat this data so that I only need to search for one row that has all the info.
The final output can be in col D with a delimiter or spread across col D-K (only 8 Refs max), because I will parse it for use on the other machine. Whichever method is easier.
The canonical practise for deleting rows is to start at the bottom and work toward the top. In this manner, rows are not skipped. The trick here is to find rows above the current position that match columns B and C and concatenate the strings from column D before removing the row. There are several good worksheet formulas that can acquire the row number of a two-column-match. Putting one of them into practise with application.Evaluate would seem to be the most expedient method of collecting the values from column D.
Sub dedupe_and_collect()
Dim rw As Long, mr As Long, wsn As String
With ActiveSheet '<- set this worksheet reference properly!
wsn = .Name
With .Cells(1, 1).CurrentRegion
.RemoveDuplicates Columns:=Array(2, 3, 4), Header:=xlYes
End With
With .Cells(1, 1).CurrentRegion 'redefinition after duplicate removal
For rw = .Rows.Count To 2 Step -1 'walk backwards when deleting rows
If Application.CountIfs(.Columns(2), .Cells(rw, 2).Value, .Columns(3), .Cells(rw, 3).Value) > 1 Then
mr = Application.Evaluate("MIN(INDEX(ROW(1:" & rw & ")+(('" & wsn & "'!B1:B" & rw & "<>'" & wsn & "'!B" & rw & ")+('" & wsn & "'!C1:C" & rw & "<>'" & wsn & "'!C" & rw & "))*1E+99, , ))")
'concatenate column D
'.Cells(mr, 4) = .Cells(mr, 4).Value & "; " & .Cells(rw, 4).Value
'next free column from column D
.Cells(mr, Columns.Count).End(xlToLeft).Offset(0, 1) = .Cells(rw, 4).Value
.Rows(rw).EntireRow.Delete
End If
Next rw
End With
End With
End Sub
The removal of records on a three-column match is done with the VBA equivalent of the Data ► Data Tools ► Remove Duplicates command. This only considers columns B, C and D and deletes the lower duplicates (keeping the ones closest to row 1). If column A is important in this respect, additional coding would have to be added.
It's unclear to me whether you wanted column D as delimited string or separate cells as an end result. Could you clarify?
As I wrote above, I would iterate through the data and collect things into the User Defined Object. There is no need for the data to be sorted in this method; and duplicate REF's will be omitted.
One advantage of a User Defined Object is that it makes debugging easier as you can see more clearly what you have done.
We combine every line where ID and CH are the same, by using the property of the Collection object to raise an error if identical keys are used.
So far as combining the Ref's in a single cell with a delimiter, vs individual cells in columns D:K, either can be done simply. I chose to separate into columns, but changing it to combine into a single column would be trivial.
After Inserting the Class Module, you must rename it: cID_CH
You will note I placed the results on a separate worksheet. You could overwrite the original data, but I would advise against that.
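The Collection behaviour this relies on can be sketched on its own. UniqueItems is a hypothetical helper written only for illustration; it is not part of the class or module that follow.

```vba
Option Explicit

' Hypothetical helper: keeps the first occurrence of each value.
Public Function UniqueItems(ByVal items As Variant) As Collection
    Dim result As Collection
    Dim entry As Variant

    Set result = New Collection
    On Error Resume Next                  ' a repeated key raises run-time error 457
    For Each entry In items
        result.Add entry, CStr(entry)     ' the value doubles as its own key
    Next entry
    On Error GoTo 0

    Set UniqueItems = result
End Function
```

For example, UniqueItems(Array("t3", "t3", "t6")) holds two items, t3 and t6. Collection keys are compared without regard to case, so T3 and t3 would count as the same key.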
Class Module
Option Explicit
Private pID As Long
Private pCH As Long
Private pPUB As String
Private pREF As String
Private pcolREF As Collection
Public Property Get ID() As Long
ID = pID
End Property
Public Property Let ID(Value As Long)
pID = Value
End Property
Public Property Get CH() As Long
CH = pCH
End Property
Public Property Let CH(Value As Long)
pCH = Value
End Property
Public Property Get PUB() As String
PUB = pPUB
End Property
Public Property Let PUB(Value As String)
pPUB = Value
End Property
Public Property Get REF() As String
REF = pREF
End Property
Public Property Let REF(Value As String)
pREF = Value
End Property
Public Property Get colREF() As Collection
Set colREF = pcolREF
End Property
Public Sub ADD(refVAL As String)
On Error Resume Next
pcolREF.ADD refVAL, refVAL
On Error GoTo 0
End Sub
Private Sub Class_Initialize()
Set pcolREF = New Collection
End Sub
Regular Module
Option Explicit
Sub CombineDUPS()
Dim wsSRC As Worksheet, wsRES As Worksheet
Dim vSRC As Variant, vRES() As Variant, rRES As Range
Dim cI As cID_CH, colI As Collection
Dim I As Long, J As Long
Dim S As String
'Set source and results worksheets and results range
Set wsSRC = Worksheets("sheet1")
Set wsRES = Worksheets("sheet2")
Set rRES = wsRES.Cells(1, 1)
'Get Source data
With wsSRC
vSRC = .Range("A2", .Cells(.Rows.Count, "D").End(xlUp))
End With
'Collect and combine data
Set colI = New Collection
On Error Resume Next
For I = 1 To UBound(vSRC, 1)
Set cI = New cID_CH
With cI
.PUB = vSRC(I, 1)
.ID = vSRC(I, 2)
.CH = vSRC(I, 3)
.REF = vSRC(I, 4)
.ADD .REF
S = CStr(.ID & "|" & .CH)
colI.ADD cI, S
If Err.Number = 457 Then
Err.Clear
colI(S).ADD .REF
ElseIf Err.Number <> 0 Then
Debug.Print Err.Number, Err.Description
Stop
End If
End With
Next I
On Error GoTo 0
'Create and populate Results Array
ReDim vRES(0 To colI.Count, 1 To 11)
'Header row
vRES(0, 1) = "Pub"
vRES(0, 2) = "ID"
vRES(0, 3) = "CH"
vRES(0, 4) = "Ref"
'populate array
For I = 1 To colI.Count
With colI(I)
vRES(I, 1) = .PUB
vRES(I, 2) = .ID
vRES(I, 3) = .CH
For J = 1 To .colREF.Count
vRES(I, J + 3) = .colREF(J)
Next J
End With
Next I
'Write the results to the worksheet
Set rRES = rRES.Resize(UBound(vRES, 1) + 1, UBound(vRES, 2))
With rRES
.EntireColumn.Clear
.Value = vRES
With .Rows(1)
.Font.Bold = True
.HorizontalAlignment = xlCenter
Range(.Cells(4), .Cells(11)).HorizontalAlignment = xlCenterAcrossSelection
End With
.EntireColumn.AutoFit
End With
End Sub
variant using dictionary below
Sub test()
Dim Dic As Object: Set Dic = CreateObject("Scripting.Dictionary")
Dic.Comparemode = vbTextCompare
Dim Cl As Range, x$, y$, i&, Key As Variant
For Each Cl In Range("B2:B" & Cells(Rows.Count, "B").End(xlUp).Row)
x = Cl.Value & "|" & Cl.Offset(, 1).Value
y = Cl.Offset(, 2).Value
If Not Dic.exists(x) Then
Dic.Add x, Cl.Offset(, -1).Value & "|" & y & "|"
ElseIf Dic.exists(x) And Not LCase(Dic(x)) Like "*|" & LCase(y) & "|*" Then
Dic(x) = Dic(x) & "|" & y & "|"
End If
Next Cl
Range("A2:D" & Cells(Rows.Count, "B").End(xlUp).Row).ClearContents
i = 2
For Each Key In Dic
Cells(i, "A") = Split(Dic(Key), "|")(0)
Range(Cells(i, "B"), Cells(i, "C")) = Split(Key, "|")
Cells(i, "D") = Replace(Split(Replace(Dic(Key), "||", ";"), "|")(1), ":", ";")
i = i + 1
Next Key
Set Dic = Nothing
End Sub

VBA script to count string, insert rows, copy row, split cell

The department that provides me a spreadsheet to be used in my database now includes multiple text in a cell. In order to link to that data I have to turn it into multiple rows. Example: LC123/LC463/LC9846 needs to have the entire row copied with just one "LC" string in each row-
cell1 cell2 LC123
cell1 cell2 LC463
cell1 cell2 LC9846
I tried these two subroutines, but they failed:
Sub InSert_Row()
Dim j As Long
j = InputBox(=SUM(LEN(ActiveCell)-LEN(SUBSTITUTE(ActiveCell,"LC",""))-1)
ActiveCell.EntireRow.Copy
ActiveCell.Offset(j).EntireRow.Insert Shift:=xlDown
End Sub
Sub SplitAndTranspose()
Dim N() As String
N = Split(ActiveCell, Chr(10))
ActiveCell.Resize(UBound(N) + 1) = WorksheetFunction.Transpose(N)
End Sub
The second subroutine splits and copies, but it doesn't insert rows; it overwrites the rows below it.
'In memory' method
Inserting rows as necessary is perhaps the simplest approach to understand, but making thousands of separate row inserts performs poorly. That would be fine for a one-off (perhaps that is all you need) and should only take a minute or two to run. Still, I wrote an approach that splits the data in memory using a collection and arrays; it should run in a matter of seconds.
The code is commented to explain what it is doing.
Sub ProcessData()
Dim c As Collection
Dim arr, recordVector
Dim i As Long, j As Long
Dim rng As Range
Dim part, parts
'replace with your code to assign the right range etc
Set rng = ActiveSheet.UsedRange
j = 3 'replace with right column index, or work it out using Range.Find etc
arr = rng.Value 'load the data
'Process the data adding additional rows etc
Set c = New Collection
For i = 1 To UBound(arr, 1)
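parts = Split(arr(i, j), "/") 'split the data based on "/"
If UBound(parts) < 0 Then parts = Array(arr(i, j)) 'blank cell: Split returns an empty array, so keep the row as it is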
For Each part In parts 'loop through each "LC" thing
recordVector = getVector(arr, i) 'get the row data
recordVector(j) = part 'replace the "LC" thing
c.Add recordVector 'add it to our results collection
Next part
Next i
'Prepare to dump the data back to the worksheet
rng.Clear
With rng.Parent
.Range( _
rng.Cells(1, 1), _
rng.Cells(1, 1).Offset(c.Count - 1, UBound(arr, 2) - 1)) _
.Value = getCollectionOfVectorsToArray(c)
End With
End Sub
'Helper method to return a vector representing our row data
Private Function getVector(dataArray, dataRecordIndex As Long)
Dim j As Long, tmpArr
ReDim tmpArr(LBound(dataArray, 2) To UBound(dataArray, 2))
For j = LBound(tmpArr) To UBound(tmpArr)
tmpArr(j) = dataArray(dataRecordIndex, j)
Next j
getVector = tmpArr
End Function
'Helper method to return an array from a collection of vectors
Function getCollectionOfVectorsToArray(c As Collection)
Dim i As Long, j As Long, arr
ReDim arr(1 To c.Count, LBound(c(1), 1) To UBound(c(1), 1))
For i = 1 To c.Count
For j = LBound(arr, 2) To UBound(arr, 2)
arr(i, j) = c(i)(j)
Next j
Next i
getCollectionOfVectorsToArray = arr
End Function
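The column index in ProcessData is hard-coded as 3, and the comment there suggests working it out with Range.Find. A minimal sketch of that idea follows. It assumes the first row of the range holds headers; the function name FindColumnIndex and the header text used in the usage line are hypothetical, not part of the original answer.

```vba
'Hypothetical helper: returns the position of the column whose header cell
'equals headerText, counted from the first column of rng, or 0 if no header
'matches. Assumes the first row of rng contains the headers.
Function FindColumnIndex(rng As Range, headerText As String) As Long
    Dim hit As Range
    'LookAt:=xlWhole matches the entire cell content, not a substring
    Set hit = rng.Rows(1).Find(What:=headerText, LookIn:=xlValues, _
                               LookAt:=xlWhole, MatchCase:=False)
    If Not hit Is Nothing Then
        FindColumnIndex = hit.Column - rng.Column + 1
    End If
End Function
```

It could then replace the hard-coded value, for example j = FindColumnIndex(rng, "Location") where "Location" stands in for your real header text. Check for a result of 0 before continuing.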
Edit:
Alternative "Range Insert" method.
It will be slower, although not too bad: the number of discrete insert and copy operations depends on the original row count rather than on some recursive sweep. It is simpler to understand, and so perhaps easier to tweak if needed. It should run in a couple of minutes.
Sub ProcessData_RangeMethod()
Dim rng As Range
Dim colIndex As Long
Dim parts
Dim currRowIndex As Long
'replace with your code to assign the right range etc
Set rng = ActiveSheet.UsedRange
colIndex = 3 'replace with right column index, or work it out using Range.Find etc
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
currRowIndex = 1
Do Until currRowIndex > rng.Rows.Count
parts = Split(rng.Cells(currRowIndex, colIndex), "/")
If UBound(parts) > 0 Then
'build the ranges from the worksheet: rng.Range(...) is relative to rng and would be offset if rng does not start at A1
With rng.Parent
.Range(rng.Cells(currRowIndex + 1, 1), rng.Cells(currRowIndex + UBound(parts), rng.Columns.Count)).Insert xlShiftDown
rng.Rows(currRowIndex).Copy .Range(rng.Cells(currRowIndex + 1, 1), rng.Cells(currRowIndex + UBound(parts), rng.Columns.Count))
.Range(rng.Cells(currRowIndex, colIndex), rng.Cells(currRowIndex + UBound(parts), colIndex)).Value = Application.Transpose(parts)
End With
End If
'a blank cell gives UBound(parts) = -1, so always advance by at least one row
currRowIndex = currRowIndex + IIf(UBound(parts) > 0, UBound(parts) + 1, 1)
Loop
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
End Sub
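The range method uses UBound(parts) as the number of rows to insert. Split returns a zero-based array, so its upper bound is one less than the number of values in the cell. A minimal sketch of that count as a standalone function follows; the name ExtraRowsNeeded is hypothetical, and the function also covers a blank cell, where Split returns an empty array.

```vba
'Hypothetical helper: number of additional rows needed for a cell value
'whose parts are separated by delim.
Function ExtraRowsNeeded(cellText As String, Optional delim As String = "/") As Long
    Dim parts() As String
    parts = Split(cellText, delim)
    'Split("") returns an empty array whose UBound is -1; the function
    'then keeps its default return value of 0
    If UBound(parts) > 0 Then ExtraRowsNeeded = UBound(parts)
End Function
```

For the example in the question, ExtraRowsNeeded("LC123/LC463/LC9846") returns 2: the original row plus two inserted rows gives the three rows required.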