I'm collecting data from a spreadsheet and storing it in a 2-D array. The idea is that once the script detects that the values it's reading in specific columns match an earlier row, it should skip that entire row, because it would be considered a duplicate.
CODE:
Private Sub LoadData()
cDOC_DEBUG "Loading document data..."
Dim x As Long 'Column Data - there is another function that reads when x = 0 = header; else every other value is considered "data"
Dim y As Long
With dataWS
For x = 1 To LR - 1
For y = 1 To LC - 1
If (IsInArray(.Cells(x + 1, y + 1).value, pData())) Then
cDOC_DEBUG "Added: " & .Cells(x + 1, y + 1).value
pData(x, y) = Trim(.Cells(x + 1, y + 1).value)
End If
Next y
Next x
End With
End Sub
Private Function IsInArray(stringToBeFound As String, arrString As Variant) As Boolean
IsInArray = (UBound(Filter(arrString, stringToBeFound)) > -1)
End Function
Private Sub cDOC_DEBUG(debugText As String)
If (ThisWorkbook.Worksheets("Settings").Cells(3, 2)) Then
Debug.Print debugText
End If
End Sub
Everything loads into the array fine until I start using my IsInArray function. I can see the problem: Filter searches a one-dimensional array, and my array is two-dimensional, so it makes sense that I get a type mismatch error.
Each row within the spreadsheet is a self-contained record of related information.
Initial Data From Spreadsheet:
     A        B        C        D
1    header1  header2  header3  header4
2    a        b        c        d
3    w        x        y        z
4    a        h        j        j
5    a        b        j        d
6    w        x        u        z
Final 2-D array:
     0        1        2        3
0    header1  header2  header3  header4
1    a        b        c        d
2    w        x        y        z
3    a        h        j        j
Because Header1, Header2 and Header4 in Excel rows 5 and 6 have the same values as in Excel rows 2 and 3, those rows will not be read into the array.
Question:
How can I apply the criteria above so that duplicate rows are not included?
Example pseudocode:
If (the row being added matches an existing row on all of Header1, Header2 and Header4) Then
Don't add it to the array
End If
Another issue I'm aware of is that there will be blank slots in this array. Is there a way to remove them, or will I need a separate index to keep track of which slots are filled?
You can loop over the rows or columns, use Index to slice a single row or column out of the array, and use Match to test whether the search value is in that slice. Combine this with Count to test for duplicates: if the count equals the number of columns, ignore the value (or the column count minus 1, if there is an extra empty column). I'm not entirely sure about this extra column: do you intend to dimension the array at the start with an additional empty column?
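A minimal sketch of the slice-and-test idea from the comment above, written in Python so it runs outside Excel. A list of rows stands in for the VBA array, and the function names are hypothetical. Note that Python's comparisons are case-sensitive, while Excel's MATCH is not.

```python
# Sketch: test a 2-D table (list of rows) for a value, row by row or
# column by column, mirroring the Index/Match/Count idea.

def row_slice(table, i):
    """Return row i as a list (like Index(arr, i, 0))."""
    return list(table[i])


def column_slice(table, j):
    """Return column j as a list (like Index(arr, 0, j))."""
    return [row[j] for row in table]


def first_row_containing(table, value):
    """Index of the first row containing value, or None (the 'Exists' check)."""
    for i in range(len(table)):
        if value in row_slice(table, i):
            return i
    return None


def first_column_with_duplicate(table, value):
    """Index of the first column where value occurs more than once, or None."""
    for j in range(len(table[0])):
        if column_slice(table, j).count(value) > 1:
            return j
    return None


if __name__ == "__main__":
    data = [
        ["header1", "header2", "header3", "header4"],
        ["a", "b", "c", "d"],
        ["w", "x", "y", "z"],
        ["a", "h", "j", "j"],
        ["a", "b", "j", "d"],
        ["w", "x", "u", "z"],
    ]
    print(first_row_containing(data, "x"))         # 2
    print(first_column_with_duplicate(data, "b"))  # 1
```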
Row Versions:
Exists:
Option Explicit
Public Sub CheckRow()
Dim arr(), i As Long
arr = [A1:D6].Value '<==2D array created
For i = LBound(arr, 1) To UBound(arr, 1) '<== loop rows
'look in each row for x and if found exit loop and indicate row where found
If Not IsError(Application.Match("x", Application.WorksheetFunction.Index(arr, i, 0), 0)) Then
Debug.Print "value found in row " & i
Exit For
End If
Next
End Sub
Duplicates:
Option Explicit
Public Sub CheckRow()
Dim arr(), i As Long
arr = [A1:D6].Value '<==2D array created
For i = LBound(arr, 1) To UBound(arr, 1) '<== loop rows
'look in each row for more than one "B" and if found exit loop and indicate row where found
If Application.Count(Application.Match(Application.WorksheetFunction.Index(arr, i, 0), "B", 0)) > 1 Then
Debug.Print i
Exit For
End If
Next
End Sub
Columns versions:
Exists:
Option Explicit
Public Sub CheckColumn()
Dim arr(), i As Long
arr = [A1:D6].Value '<==2D array created
For i = LBound(arr, 2) To UBound(arr, 2) '<== loop columns
'look in each column for x and if found exit loop and indicate column where found
If Not IsError(Application.Match("x", Application.WorksheetFunction.Transpose(Application.WorksheetFunction.Index(arr, 0, i)), 0)) Then
Debug.Print "value found in column " & i
Exit For
End If
Next
End Sub
Duplicates:
You can use Count to check for duplicates within an entire column, again sliced with Index:
Option Explicit
Public Sub CheckColumn()
Dim arr(), i As Long
arr = [A1:D6].Value '<==2D array created
For i = LBound(arr, 2) To UBound(arr, 2) '<== loop columns
'look in each column for more than one "B" and if found exit loop and indicate column where found
If Application.Count(Application.Match(Application.WorksheetFunction.Transpose(Application.WorksheetFunction.Index(arr, 0, i)), "B", 0)) > 1 Then
Debug.Print i
Exit For
End If
Next
End Sub
Using sample data in sheet:
Alternative using advanced Index function
This approach, using a (late-bound) dictionary, should be helpful if your data doesn't exceed 65,536 rows. You'll get a 2-D (1-based) array v with the unique data set of columns A, B and D.
In this example code results are written back to e.g. columns F:H and values of column C are omitted; if you want to maintain these values see ► Edit below.
Example code (omitting column C in resulting array)
Sub getUniqueRows()
Dim dict As Object, v, i&, ii&, n&, currRow$
Set dict = CreateObject("Scripting.Dictionary") ' late binding dictionary
With ThisWorkbook.Worksheets("MySheet") ' << change to your sheet name
n = .Cells(.Rows.Count, "A").End(xlUp).Row - 1 ' n items (omitting header line)
' [1] get data
v = .Range("A2:D" & n + 1).Value
' [2a]remove column C (i.e. allow columns 1, 2 and 4 only)
v = Application.Index(v, Evaluate("row(1:" & n & ")"), Array(1, 2, 4))
' [2b] check for unique ones
For i = 1 To n
currRow = Join(Application.Index(v, i, 0), ",") ' build string of cells A,B & D
If Not dict.Exists(currRow) Then dict.Add currRow, i
Next i
' [3] remove duplicate rows
v = Application.Transpose(Application.Index(v, dict.Items, Evaluate("row(1:" & 3 & ")")))
' [4] write data to any wanted range
.Range("F:H") = "" ' clear rows
.Range("F2").Resize(UBound(v), 3) = v ' write data
End With
Set dict = Nothing
End Sub
Note
The dict.Items collection in section [3] is an array of all found item numbers in the dictionary and allows the Index function to get only these items.
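The same first-occurrence logic as getUniqueRows can be sketched in Python as an illustration (not a VBA replacement): a dict maps each key to the index of the first row that produced it, and its values then select the rows, much as dict.Items feeds Application.Index. Column positions are 0-based here, so (0, 1, 3) stands for A, B and D. One design difference: the key is a tuple rather than a comma-joined string, because joined strings can collide when a cell itself contains a comma (the cells "a,b" and "c" join to the same "a,b,c" as the cells "a" and "b,c").

```python
def unique_rows(rows, key_cols=(0, 1, 3)):
    """Return the key columns of each row whose key is seen for the first time."""
    first_seen = {}  # key tuple -> index of the first row with that key
    for i, row in enumerate(rows):
        key = tuple(row[c] for c in key_cols)
        if key not in first_seen:
            first_seen[key] = i
    # dict values keep insertion order, like dict.Items in the VBA version
    return [[rows[i][c] for c in key_cols] for i in first_seen.values()]


sample = [
    ["a", "b", "c", "d"],
    ["w", "x", "y", "z"],
    ["a", "h", "j", "j"],
    ["a", "b", "j", "d"],
    ["w", "x", "u", "z"],
]
print(unique_rows(sample))  # [['a', 'b', 'd'], ['w', 'x', 'z'], ['a', 'h', 'j']]
```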
Additional links
See Insert new first column in datafield array without loops or API call
Edit - maintain values in column C
Due to comment: "ONLY using columns A, B, and D; Column C was not including in the criteria."
If you want to check values only in A, B and D but keep the C values in the resulting array, you can use the following optimized code, which also ignores rows whose criteria columns are all empty.
Sub getUniqueRows2()
Dim dict As Object, v, i&, n&, j&, currRow$
Set dict = CreateObject("Scripting.Dictionary") ' late binding dictionary
With ThisWorkbook.Worksheets("MySheet") ' << change to your sheet name
n = .Cells(.Rows.Count, "A").End(xlUp).Row - 1 ' items counter (omitting header line)
' [1] get data
v = .Range("A2:D" & n + 1).Value
' [2] check for unique ones
For i = 1 To UBound(v)
' assign ONLY criteria of 1st, 2nd & 4th column to string value currRow
currRow = ""
For j = 0 To 2: currRow = currRow & v(i, Array(1, 2, 4)(j)) & ",": Next j
' add first unique occurrence to dictionary
If Not dict.Exists(currRow) Then ' add first occurrence
If Len(currRow) > 3 Then dict.Add currRow, i ' ... and ignore empty values
End If
Next i
' [3] remove duplicate rows
v = Application.Transpose(Application.Index(v, dict.Items, Evaluate("row(1:" & 4 & ")")))
' [4] write resulting array values anywhere, e.g. to columns F:I
.Range("F:I") = "" ' clear rows
.Range("F2").Resize(UBound(v), 4) = v ' write data
End With
Set dict = Nothing
End Sub
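A Python sketch of the getUniqueRows2 behaviour, offered as an illustration under the same assumptions: the key uses only the criteria columns (0-based 0, 1, 3 for A, B, D), the full row including column C is kept for the first occurrence, and rows whose criteria columns are all empty are skipped, similar in spirit to the Len(currRow) > 3 test.

```python
def unique_rows_keep_all(rows, key_cols=(0, 1, 3)):
    """Return full rows whose key columns are unique (first occurrence wins)."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if all(v in ("", None) for v in key):
            continue  # ignore rows with no criteria data at all
        if key not in seen:
            seen.add(key)
            result.append(row)  # column C travels along with the row
    return result
```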
Related
How can I transform two columns like this:
A B
foo foo1
foo foo2
foo foo3
faa faa1
faa faa2
fee fee1
fee fee2
into the following structure:
foo
foo1,foo2,foo3
faa
faa1,faa2
fee
fee1,fee2
What I have tried to do so far is to create my loop.
Sub DowithIf()
rw = 11
cl = 2
erw = 1000
Do While rw < erw
rw = rw + 1
Loop
End Sub
But the first thing I don't know how to do is detect that the first value in column A (foo) is the same as the second value in column A (foo). I want to go through column A: if a value is new, save it as described above together with its value from column B; if A2 equals A1, also save foo2; if A3 equals A2, save foo3; if A4 does not equal A3, it's a new value, and so on.
EDIT:
First, you must learn that a question should contain a piece of code, even if it does not work as you need. It is also good to check your question frequently and answer clarification questions. Now, hoping that you understood my point, I'll make an exception and show a piece of code that does what (I understood) you need. It should be very fast, since it uses arrays and a Dictionary:
Sub processTwoColumns()
Dim sh As Worksheet, lastR As Long, arr, arrFin2C, arrfinUnique
Dim i As Long, k As Long, dict As Object
Set sh = ActiveSheet 'use here the sheet you need
lastR = sh.Range("A" & sh.rows.count).End(xlUp).row 'last row on A:A column
arr = sh.Range("A1:B" & lastR).value 'place the range to be processed in an array, for faster iteration
ReDim arrFin2C(1 To UBound(arr), 1 To 1) 'reDim the array to keep first way of returning
Set dict = CreateObject("Scripting.Dictionary") 'Set the Scripting Dictionary object
For i = 1 To UBound(arr) 'iterate between the array rows
arrFin2C(i, 1) = arr(i, 1) & " " & arr(i, 2) ' building the array to be return, by concatenation of the two columns
dict(arr(i, 1)) = dict(arr(i, 1)) & ", " & arr(i, 2)'place in a dictionary unique keys and items separated by ", "
Next i
sh.Range("D1:D" & lastR).value = arrFin2C 'drop the array content
ReDim arrfinUnique(1 To dict.count * 2, 1 To 1): k = 1 'reDim the array necessary to keep the processed result for the second way
For i = 1 To dict.count - 1 'iterate between the dictionary keys/items
arrfinUnique(k, 1) = dict.Keys()(i): k = k + 1 'place in the array the dictionary key
arrfinUnique(k, 1) = Mid(dict.items()(i), 3): k = k + 1 'place in the array the dictionary item (without leading ", ")
Next i
sh.Range("F2").Resize(UBound(arrfinUnique), 1).value = arrfinUnique 'drop the array content
End Sub
The above code will return the first required version in column "D:D", starting from the first row and the second one in "F:F", starting from the second row. The code can be easily adapted to return in any other place/sheet...
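The dictionary grouping in processTwoColumns can be sketched in Python as follows. This is an illustration only: instead of concatenating ", " and stripping the leading separator with Mid, it appends values to a list per key and joins them at the end. Python dicts keep insertion order, so keys come out in first-seen order, and the join uses "," to match the layout in the question.

```python
def group_pairs(pairs):
    """Group (key, value) pairs by key, keeping first-seen key order."""
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return groups


def to_two_line_layout(groups):
    """Flatten to the requested layout: key line, then comma-joined values."""
    lines = []
    for key, values in groups.items():
        lines.append(key)
        lines.append(",".join(values))
    return lines


pairs = [("foo", "foo1"), ("foo", "foo2"), ("foo", "foo3"),
         ("faa", "faa1"), ("faa", "faa2"), ("fee", "fee1"), ("fee", "fee2")]
for line in to_two_line_layout(group_pairs(pairs)):
    print(line)
```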
I didn't know how to explain the question in words, so I'll attach images to explain my situation. Here is the view of my Excel sheet:
My Excel Sheet
The highlighted cells contain multiple values called IDs, which are associated with the respective Versions in the columns beside them. I use the following macro (details with a great explanation here) to split these values into multiple rows in the same sheet.
Option Explicit
Private Const ID_IDX As Long = 0
Private Const VER_IDX As Long = 1
Private Const RNG_IDX As Long = 2
Private Sub RunMe()
Dim data As Variant, cols As Variant, items As Variant
Dim r As Long, c As Long, i As Long, n As Long
Dim ids() As String, vers() As String
Dim addItems As Collection, concatItems As Collection
Dim dataRng As Range, rng As Range
Dim writeID() As Variant, writeVer() As Variant, writeConcat() As Variant
Dim dataStartRow As Long
On Error Resume Next
'Define the range we're interested in and read into an array.
With Sheet1 'adjust for your worksheet object
Set dataRng = Application.InputBox(prompt:="Select the Range of cells:", Type:=8)
End With
data = dataRng.Value2
dataStartRow = 2
'Find the two target columns
cols = AcquireIdAndVerCol(data, 3, 8)
If IsEmpty(cols) Then
MsgBox "Unable to find Id and Ver columns."
Exit Sub
End If
With dataRng
'Add a column next to the version number column.
.Columns(cols(VER_IDX)).Offset(, 1).Insert Shift:=xlShiftToRight, CopyOrigin:=xlFormatFromLeftOrAbove
'Add a column to our range.
'This is to cover the case that the rightmost column is the version number column.
Set dataRng = .Resize(, .Columns.Count + 1)
End With
'Find the rows that need to be split and concatenate the target strings.
Set addItems = New Collection
Set concatItems = New Collection
For r = dataStartRow To UBound(data, 1)
ids = Split(data(r, cols(ID_IDX)), vbLf)
vers = Split(data(r, cols(VER_IDX)), vbLf)
n = IIf(UBound(ids) >= UBound(vers), UBound(ids), UBound(vers))
If n = 0 Then 'it's just one line of text.
'Add concatenated text to list.
concatItems.Add data(r, cols(ID_IDX)) & " " & data(r, cols(VER_IDX))
ElseIf n > 0 Then 'it's multiple lines of text.
'Transpose the id array.
ReDim writeID(1 To UBound(ids) + 1, 1 To 1)
For i = 0 To UBound(ids)
writeID(i + 1, 1) = ids(i)
Next
'Transpose the version array.
ReDim writeVer(1 To UBound(vers) + 1, 1 To 1)
For i = 0 To UBound(ids)
writeVer(i + 1, 1) = vers(i)
Next
'Add concatenated text to list.
For i = 0 To n
concatItems.Add (IIf(UBound(ids) <= n And UBound(vers) <= n, ids(i) & " " & vers(i), Empty))
Next
'Add the range to be split to the collection.
addItems.Add Array(writeID, writeVer, dataRng.Rows(r + 1).Resize(n))
Else 'it's an empty cell
'Add empty item to concatenated list in order to keep alignment.
concatItems.Add Empty
End If
Next
Application.ScreenUpdating = False
'Split the ranges in the list.
If addItems.Count > 0 Then
For Each items In addItems
'Add the rows.
With items(RNG_IDX)
.Insert Shift:=xlDown, CopyOrigin:=xlFormatFromLeftOrAbove
Set rng = .Offset(-.Rows.Count - 1).Resize(.Rows.Count + 1)
'Note: format your rng Range obect as desired here.
End With
'Write the id and version values.
rng.Columns(cols(ID_IDX)).Value = items(ID_IDX)
rng.Columns(cols(VER_IDX)).Value = items(VER_IDX)
Next
End If
'Write the concatenated values.
If concatItems.Count > 0 Then
ReDim writeConcat(1 To concatItems.Count + dataStartRow - 1, 1 To 1)
'Header to array.
writeConcat(1, 1) = "Concat values"
'Values from the collection to array.
i = dataStartRow
For Each items In concatItems
writeConcat(i, 1) = items
i = i + 1
Next
'Output array to range.
With dataRng.Columns(cols(VER_IDX) + 1)
.Value = writeConcat
.AutoFit
End With
End If
Application.ScreenUpdating = True
End Sub
Private Function AcquireIdAndVerCol(data As Variant, minCol As Long, maxCol As Long) As Variant
Dim result(1) As Long
Dim r As Long, c As Long, i As Long
Dim items() As String
'Check we're not operating outside bounds of data array.
If minCol < LBound(data, 2) Then minCol = LBound(data, 2)
If minCol > UBound(data, 2) Then minCol = UBound(data, 2)
If maxCol < LBound(data, 2) Then maxCol = LBound(data, 2)
If maxCol > UBound(data, 2) Then maxCol = UBound(data, 2)
'Loop through data to find the two columns.
'Once found, leave the function.
For r = 1 To UBound(data, 1)
For c = minCol To maxCol
items = Split(data(r, c), vbLf)
For i = 0 To UBound(items)
If result(ID_IDX) = 0 Then
If IsDocId(items(i)) Then
result(ID_IDX) = c
If result(VER_IDX) = 0 Then
Exit For
Else
AcquireIdAndVerCol = result
Exit Function
End If
End If
End If
If result(VER_IDX) = 0 Then
If IsDocVer(items(i)) Then
result(VER_IDX) = c
If result(ID_IDX) = 0 Then
Exit For
Else
AcquireIdAndVerCol = result
Exit Function
End If
End If
End If
Next
Next
Next
End Function
Private Function IsDocId(val As String) As Boolean
Dim n As Long
n = TryClng(val)
IsDocId = (n > 9999 And n <= 999999999)
End Function
Private Function IsDocVer(val As String) As Boolean
Dim n As Long, m As Long
Dim items() As String
items = Split(val, ".")
If UBound(items) <> 1 Then Exit Function
n = TryClng(items(0))
m = TryClng(items(1))
IsDocVer = (n > 0 And n <= 99) And (m >= 0 And m <= 9)
End Function
'-------------------------------------------------------------------
'Converts a variant to a Long or returns a fail value as a Long
'if the conversion failed.
'-------------------------------------------------------------------
Private Function TryClng(expr As Variant, Optional fail As Long = -1) As Long
Dim n As Long
n = fail
On Error Resume Next
n = CLng(expr)
On Error GoTo 0
TryClng = n
End Function
It gives the following output, with an additional column named Concat Values that contains the combined IDs and corresponding Versions:
Output
Problem:
It works flawlessly if every ID has its own Version specified in the sheet, as described above. However, in cases where there is only one Version number bound to 4 or more IDs (i.e. the same Version number applies to all the IDs), like this:
the output in the Concat Values column gets misaligned, because we are using an array to output the Concat Values and the array does not account for the missing Versions of the corresponding IDs. It looks like this:
Dislocated row values
I am trying to find a way to update the collection and the array with the new Concat Values before outputting them to the column, so that each Concat Value lands in the row of its corresponding ID and Version. I hope that makes sense; please let me know if you need more clarification.
EDIT:
I will try to list all the possible cases and the expected output, including the worst-case scenarios:
Here is the link to my excel sheet.
Usual Scenarios
Number of IDs = number of Versions (works perfectly; the Concat Values are aligned in the corresponding rows)
Multiple IDs - single Version (the one Version should be applied to all the IDs)
Issue:
The macro splits the columns into rows, but the Concat Values get misaligned.
Worst Case Scenarios
Multiple IDs - multiple Versions, but fewer Versions than IDs (the Versions should align with the topmost IDs, and the IDs below should get blanks)
Issue:
The macro splits the columns into rows, but the Concat Values get misaligned.
Here 4 IDs have only 3 Versions, so the top 3 IDs get the 3 Versions and the 4th ID has no Version linked to it.
Similarly,
here 4 IDs have only 2 Versions, so the top 2 IDs get the 2 Versions and the 3rd and 4th IDs have no Version linked to them.
Multiple IDs - no Version (the columns should be split into rows based on the number of IDs, and the corresponding Version cells should be left blank)
Issue:
The macro splits the columns into rows, but the Concat Values get misaligned.
The complexity of the solution will depend on the complexity and variety of 'special cases'. Given your scenarios, it seems as if you could just take the last of the given versions and, for any versions missing below that line, just use that last used version.
When I gave my first answer, I anticipated this kind of issue, so changes to the code are trivial.
Firstly add an additional declaration in the RunMe Sub:
Dim curVer As String
and then you just need to adjust the ElseIf n > 0 case. Replace the code with this:
ElseIf n > 0 Then 'it's multiple lines of text.
'Resize the output arrays to max ('n')
ReDim writeID(1 To n + 1, 1 To 1)
ReDim writeVer(1 To n + 1, 1 To 1)
'Loop through the arrays to align id and versions.
For i = 0 To n
If i <= UBound(ids) Then
writeID(i + 1, 1) = ids(i)
End If
If i <= UBound(vers) Then
curVer = vers(i)
End If
writeVer(i + 1, 1) = curVer
Next
'Add concatenated text to list.
For i = 0 To n
concatItems.Add writeID(i + 1, 1) & " " & writeVer(i + 1, 1)
Next
'Add the range to be split to the collection.
addItems.Add Array(writeID, writeVer, dataRng.Rows(r + 1).Resize(n))
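The alignment rule can also be checked in isolation. The Python sketch below is hypothetical (the function names come from neither answer): split_cell mimics Split(..., vbLf), which returns an empty array for an empty cell, and align_ids_versions pairs IDs with versions. carry_forward=True reproduces the answer above (reuse the last given version); carry_forward=False gives the blanks the question's worst-case scenarios ask for. Unlike the VBA, it strips the trailing space when the version is blank.

```python
def split_cell(text):
    """Like VBA Split(text, vbLf), but an empty cell gives an empty list."""
    return text.split("\n") if text else []


def align_ids_versions(ids, versions, carry_forward=True):
    """Return one (id, version, concat) tuple per output row.

    The number of rows is the larger of the two counts, as with n in RunMe.
    """
    rows = []
    current = ""  # last version seen so far
    for i in range(max(len(ids), len(versions))):
        id_ = ids[i] if i < len(ids) else ""
        if i < len(versions):
            current = versions[i]
            ver = current
        else:
            ver = current if carry_forward else ""
        rows.append((id_, ver, f"{id_} {ver}".strip()))
    return rows


ids = split_cell("101\n102\n103\n104")
print(align_ids_versions(ids, split_cell("1.0\n2.0"), carry_forward=False))
```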
That's too much code for me to read, but I came up with my own solution, if I understood your problem correctly.
It could be a good solution if you adapt it. With my code it would be easier to produce a new table instead of inserting rows; then you could just add the formatting, which should be very easy.
Sub Test()
Dim xRange As Range
Dim xArrRange() As Variant
Dim xNewArrRange() As Variant
Dim xNewArrRangeResize() As Variant
Dim xNumberColumns As Long
Dim i As Long
Dim j As Long
Dim k As Long
Dim l As Long
Dim ii As Long
Dim jj As Long
Set xRange = Range("A2:C5")
xNumberColumns = 3
xArrRange = xRange.Value2
' ReDim Preserve can only resize the LAST dimension of an array, so growing the
' row (first) dimension inside the loop would raise a run-time error.
' Instead, count one output row per ID line first and size the array once.
Dim nOut As Long
For i = LBound(xArrRange, 1) To UBound(xArrRange, 1)
nOut = nOut + UBound(Split(xArrRange(i, 2), vbLf)) + 1
Next i
ReDim xNewArrRange(nOut, xNumberColumns) ' columns 0..3: column A value, ID, version, "ID version"
For i = LBound(xArrRange, 1) To UBound(xArrRange, 1)
Dim xTempArrVer As Variant
Dim xTempArrID As Variant
xTempArrVer = Split(xArrRange(i, 3), vbLf)
If UBound(xTempArrVer) = -1 Then ' If there is no version, initialize it with ""
ReDim xTempArrVer(0)
xTempArrVer(0) = ""
End If
xTempArrID = Split(xArrRange(i, 2), vbLf)
For j = LBound(xTempArrID, 1) To UBound(xTempArrID, 1)
If j > UBound(xTempArrVer, 1) Then
l = UBound(xTempArrVer, 1) ' fewer versions than IDs: reuse the last version
Else
l = j
End If
xNewArrRange(k, 0) = xArrRange(i, 1)
xNewArrRange(k, 1) = xTempArrID(j)
xNewArrRange(k, 2) = xTempArrVer(l)
If xTempArrVer(l) <> "" Then
xNewArrRange(k, 3) = xTempArrID(j) & " " & xTempArrVer(l)
Else
xNewArrRange(k, 3) = xTempArrID(j)
End If
k = k + 1
Next j
Next i
ReDim xNewArrRangeResize(k - 1, xNumberColumns) ' keep only the k rows actually filled
For ii = LBound(xNewArrRangeResize, 1) To UBound(xNewArrRangeResize, 1)
For jj = LBound(xNewArrRangeResize, 2) To UBound(xNewArrRangeResize, 2)
xNewArrRangeResize(ii, jj) = xNewArrRange(ii, jj)
Next jj
Next ii
Range(Cells(2, 6), Cells(UBound(xNewArrRangeResize, 1) + 1, 6 + UBound(xNewArrRangeResize, 2))).Value2 = xNewArrRangeResize
Debug.Print "Finish"
End Sub
This code produces this:
If your code already produces the right number of rows for each ID, the laziest solution would be to populate the columns of your table with the relevant part of the array produced at the end of my code.
Edit:
I saw that something was missing; that was because I had calculated the output range wrongly. The corrected line is:
Range(Cells(2, 6), Cells(UBound(xNewArrRangeResize, 1) + 1, 6 + UBound(xNewArrRangeResize, 2))).Value2 = xNewArrRangeResize
I'm currently using the code below, but I have a huge data set and it runs really slowly. I need to remove duplicate entries in column A and keep the row with the higher row number.
dim dup as variant, r as long, lncheckduplicatescolumn as long
With wb_DST.Sheets(sWs_DST)
lncheckduplicatescolumn = .Cells(.Rows.Count, "A").End(xlUp).row
for r = lncheckduplicatescolumn to 2 step -1
dup = application.match(.cells(r, "A").value, .columns(1), 0)
if dup < r then .rows(dup).delete
next r
end with
Data:
Column A Column B
A 1
B 2
C 3
A 3
Result should be:
B 2
C 3
A 3
The order of the data in column A doesn't matter, as long as the values are unique and the information from the higher row number is retained. The code I shared works, but it is too slow for a large data set.
Another fast method is to use the Dictionary object. Loop from the bottom up and check whether each value in column A already exists in the Dictionary. If it does (meaning it's a duplicate), don't delete the row immediately: deleting rows one at a time adds a lot to the code's run time. Instead, use a DelRng Range object and use Union to merge all the duplicate rows into it.
Later on, you can delete the entire duplicates range at once with DelRng.Delete.
Code
Option Explicit
Sub RemoveDuplicatesUsingDict()
Dim wb_DST As Workbook
Dim sWs_DST As String
' Dictionary variables
Dim Dict As Object
Dim DictIndex As Long, ExistIndex As Long
Dim DelRng As Range
Dim LastRow As Long, i As Long
' --- parameters for my internal testing ---
Set wb_DST = ThisWorkbook
sWs_DST = "Sheet1"
Application.ScreenUpdating = False
Set Dict = CreateObject("Scripting.Dictionary")
With wb_DST.Sheets(sWs_DST)
LastRow = .Cells(.Rows.Count, "A").End(xlUp).Row ' get last row with data in column "A"
For i = LastRow To 2 Step -1
If Not Dict.exists(.Range("A" & i).Value) Then ' value doesn't exists yet in Dictionary >> add this Key
Dict.Add .Range("A" & i).Value, .Range("A" & i).Value
Else ' value already exists in Dictionary >> add it to DelRng (later will delete the entire range)
If Not DelRng Is Nothing Then
Set DelRng = Application.Union(DelRng, .Rows(i)) ' add current row to existing DelRng
Else
Set DelRng = .Rows(i)
End If
End If
Next i
End With
' delete the entire range at 1-shot
If Not DelRng Is Nothing Then DelRng.Delete
Application.ScreenUpdating = True
End Sub
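A quick way to check the keep-the-highest-row rule without Excel is the Python sketch below (illustrative only; the row numbering assumes data starts in sheet row 2). Scanning bottom-up means the first time a value is met is its highest row, so any later sighting is a duplicate to delete; collecting the row numbers and deleting them together corresponds to building DelRng with Union and calling DelRng.Delete once.

```python
def rows_to_delete_keep_last(values, first_data_row=2):
    """Return sheet row numbers of duplicates, keeping each value's LAST row."""
    seen = set()
    to_delete = []
    for offset in range(len(values) - 1, -1, -1):  # bottom-up
        value = values[offset]
        if value in seen:
            to_delete.append(first_data_row + offset)
        else:
            seen.add(value)
    return sorted(to_delete)


def remaining(values, first_data_row=2):
    """Values left after deleting all duplicate rows in one go."""
    doomed = set(rows_to_delete_keep_last(values, first_data_row))
    return [v for i, v in enumerate(values) if first_data_row + i not in doomed]


print(remaining(["A", "B", "C", "A"]))  # ['B', 'C', 'A']
```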
Fast use of data field array
Looping through a range cell by cell isn't fast; you can speed it up considerably by reading your search data (the needed range in column "A", see 1) into a data field array and looping over that array instead. As your data set grows, this gets even faster compared with the dictionary approach shown above, though that remains a good and reliable method.
Search Method
Any array value is checked against a concatenated search string with already found unique values and added if not yet included - see 2)
The completed string is transformed to an array and written back to a given target column (e.g. "H") - see 3) and 4)
I also added a second column with the corresponding row numbers, so you can use them for further action. You could also write the results to another sheet.
Code - method demo
Sub RemoveDuplicates()
Dim t As Double: t = Timer ' stop watch
Dim ws As Worksheet ' source sheet (object)
Dim i As Long ' row index
Dim a, arr, arr2 ' variant
Dim s As String, si As String
Const SEP = "|" ' delimiter
s = SEP: si = SEP
' 0) fully qualified range reference to source sheet
Set ws = ThisWorkbook.Worksheets("Sheet1")
' 1) write column A data to one based 2-dim data field array
a = ws.Range("A1:A" & ws.Cells(ws.Rows.Count, "A").End(xlUp).Row)
' 2) loop through data and check for duplicates in concatenated string s
For i = 2 To UBound(a)
If InStr(s, SEP & a(i, 1) & SEP) = 0 Then
If Len(a(i, 1)) > 0 Then
s = s & a(i, 1) & SEP
si = si & i & SEP
End If
End If
Next i
' 3) transform unique values to zero based 1-dim array
arr = Split(Mid(s, 2), SEP) ' duplicates string to array
arr2 = Split(Mid(si, 2), SEP) ' found row numbers
' 4) write result to column H2:H... ' <<< change target to wanted column
ws.Range("H:H").ClearContents '
ws.Range("H2:H" & (2 + UBound(arr))).Value = Application.Transpose(arr)
ws.Range("I2:I" & (2 + UBound(arr2))).Value = Application.Transpose(arr2)
Debug.Print UBound(arr) + 0 & " unique items found", Format(Timer - t, "0.00 seconds needed")
End Sub
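The delimiter trick in RemoveDuplicates can be sketched in Python like this (illustration only). Wrapping each value in SEP on both sides is what stops "A" from matching inside "AB"; a value that itself contains "|" would still break it, which is why a set or dictionary is the safer choice in general. Also note that the VBA loop runs top-down, so it keeps the first (lowest) row of each value rather than the higher row the question asks for; the keep_last option here iterates in reverse to keep the highest row instead.

```python
SEP = "|"  # delimiter, as in the VBA


def unique_with_rows(values, first_row=2, keep_last=False):
    """Return (unique values, their sheet row numbers), skipping empty values."""
    seen = SEP  # grows like "|A|B|" as unique values are found
    uniques, rows = [], []
    order = range(len(values))
    if keep_last:
        order = reversed(order)  # bottom-up: the highest row wins
    for offset in order:
        text = str(values[offset])
        if text != "" and SEP + text + SEP not in seen:
            seen += text + SEP
            uniques.append(text)
            rows.append(first_row + offset)
    return uniques, rows


print(unique_with_rows(["AB", "A"]))  # (['AB', 'A'], [2, 3])
```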
=================================================================
EDIT
Version 2 -- includes overwriting original data with unique values
Here you find a slightly modified version overwriting the original data in 35 columns (A2:AI..) with unique values.
Sub RemoveDuplicates2()
' Edit: overwrite original data A2:AI{..} with unique values
Dim t As Double: t = Timer ' stop watch
Dim ws As Worksheet ' source sheet (object)
Dim i As Long ' row index
Dim a, arr, arr2 ' variant
Dim s As String, si As String
Const SEP = "|" ' delimiter
Const MyLastColumn = "AI" ' letter of last column (no 35) = "AI"
s = SEP: si = SEP
' fully qualified range reference to source sheet
Set ws = ThisWorkbook.Worksheets("Sheet1")
' write column A data to one based 2-dim data field array
a = ws.Range("A1:A" & ws.Cells(ws.Rows.Count, "A").End(xlUp).Row)
' loop through data and check for duplicates in concatenated string s
For i = 2 To UBound(a) ' For i = UBound(a) To 2 Step -1
If InStr(s, SEP & a(i, 1) & SEP) = 0 Then
If Len(Trim(a(i, 1))) > 0 Then
s = s & a(i, 1) & SEP
si = si & i & SEP
End If
End If
Next i
' write unique values to zero based 1-dim array (starts with index 0; last delimiter removed in this version)
arr2 = Split(Mid(si, 2, Len(si) - 2), SEP) ' found row numbers
' overwrite original data
For i = LBound(arr2) To UBound(arr2) ' starts with index 0!
s = "A" & arr2(i) & ":" & MyLastColumn & arr2(i)
arr = ws.Range(s) ' create 1-based 1-line (2-dim) array
s = "A" & i + 2 & ":" & MyLastColumn & i + 2 ' 0 + 2 = +2 ... start in row 2
ws.Range(s) = arr ' write back unique row values
Next i
s = "A" & UBound(arr2) + 3 & ":" & MyLastColumn & UBound(a) + 1
ws.Range(s).ClearContents ' clear rest of original data
Debug.Print UBound(arr2) + 1 & " unique items found", Format(Timer - t, "0.00 seconds needed") ' result
End Sub
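Why RemoveDuplicates2 can safely overwrite its own source rows is easier to see in a small Python sketch (illustrative; it uses 0-based list positions rather than sheet rows). Because the kept row indices increase, the target position k is never greater than the source index, so every row is read before anything is written over it.

```python
def compact_in_place(rows, keep_indices):
    """Move the kept rows to the top of `rows`, in order, and blank the rest.

    keep_indices must be strictly increasing, so keep_indices[k] >= k.
    """
    for k, src in enumerate(keep_indices):
        rows[k] = list(rows[src])  # src has not been overwritten yet
    for k in range(len(keep_indices), len(rows)):
        rows[k] = [""] * len(rows[k])  # like ClearContents on the remainder
    return rows
```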
I have an array of sheet names called SheetNames, and I want to generate a sub-array of it containing only the sheets that meet a condition (IF). I loop through a cell on the different sheets, evaluating the condition cell.Value = "S". Once that check is done for the first column, D (z = 4), I want to make the same check (IF condition) for columns D to DR in the same row.
I need a result similar to what I would get with this formula in
Diary!C7
= IF (element!D6 = "S",CONCATENATE (element!B1, ", "), ""),
IF (element1!D6 = "S",CONCATENATE (element1!B1, ", "), ""), ....
IF (element!E6 = "S",CONCATENATE (element!B1, ", "), ""),
IF (element1!E6 = "S",CONCATENATE (element1!B1, ", "), "") .... )
Here, element is a sheet name taken from the array of sheet names that meet the condition (code S or another code).
SheetNames is an array of all the sheets in the workbook, and FSheets (filtered sheets) is an array containing only the sheets that pass the IF condition. Once I can populate the FSheets array for the cell being tested, I must concatenate its values into a cell on another sheet and then start the test again for the next cell (E6), and so on. But I'm stuck at the step of creating FSheets.
Sub Test()
Dim ws As Worksheet
Dim SheetNames() As String, FSheets() As String, q As String
Dim element As Variant
Dim lastSheet As Integer, r As Integer, incrSheet As Integer, i As Integer
Dim Rgn As Range
' Enter the sheet names into an array. Redim array's size to the number of sheets (lastSheet)
For Each ws In ActiveWorkbook.Worksheets
ReDim Preserve SheetNames(lastSheet)
SheetNames(lastSheet) = ws.name
lastSheet = lastSheet + 1
Next ws
MsgBox lastSheet
' Test condition for each sheet/cell
For z = 4 To 11
For Each element In SheetNames()
incrSheet = 1
If ActiveWorkbook.Sheets(element).Cells(6, z).Value = "S" Then
ReDim Preserve FSheets(incrSheet)
FSheets(incrSheet) = element
incrSheet = incrSheet + 1
End If
Next element
Next z
i = 3
' Define the sheet to work (total project will have more than one, one for code we need test, S, C, etc)
With Worksheets("Diary")
.Activate
.Range("C7").Select
' Concatenate values at Summary page
Do
Cells(7, i).Select
For r = 1 To UBound(FSheets)
'Concatenate with &:
varConctnt = varConctnt & ", " & FSheets(r)
Next r
'remove the "&" before the first element:
varConctnt = Mid(varConctnt, 2)
q = varConctnt
varConctnt = ""
i = i + 1
ActiveCell.Value = q
Loop While i < 11
' Drag the formula for the rest of the rows
Range("C7:J7").Select
Selection.AutoFill Destination:=Range("C7:J12"), Type:=xlFillDefault
End With
End Sub
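Where you are going wrong is in how you set the range dynamically. Assuming you are testing the value of a single cell, it is much easier to use Cells rather than Range, since Cells takes numeric row and column indexes. Also initialize incrSheet once, before the loops: resetting it to 1 inside the loop makes every match overwrite FSheets(1). Try something like this: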
incrSheet = 1
For z = 4 To 11
For Each element In SheetNames()
If ActiveWorkbook.Sheets(element).Cells(6, z).Value = "S" Then
ReDim Preserve FSheets(incrSheet)
FSheets(incrSheet) = element
MsgBox incrSheet
incrSheet = incrSheet + 1
End If
Next element
Next z
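One way to avoid the counter bookkeeping entirely is to build a fresh list for each column, as in this Python sketch. The workbook is modelled as a dict of sheet name to {(row, column): value}; that model and the function name are assumptions for illustration, not an Excel API. Like the VBA, it collects sheet names; the question's formula concatenates each sheet's B1 value instead, which would only change what is appended.

```python
def flagged_sheets_by_column(sheets, row, columns, code="S"):
    """For each column, join the names of sheets whose cell (row, column) == code."""
    result = {}
    for col in columns:
        names = [name for name, cells in sheets.items()
                 if cells.get((row, col)) == code]
        result[col] = ", ".join(names)  # one output string per column
    return result


workbook = {
    "element": {(6, 4): "S"},
    "element1": {(6, 4): "S", (6, 5): "S"},
    "Diary": {},
}
print(flagged_sheets_by_column(workbook, 6, [4, 5]))
```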
I'm trying to figure out a way to combine rows based on values in two specific columns in vba excel.
For Example:
Let's say I have the following sheet:
Column A Column J Column Z
1 A ?
1 A !
2 B ?
2 B !
And I need to convert it to this:
Column A Column J Column Z
1 A ?, !
2 B ?, !
Here's another method, using a class module and a Collection to iterate through the list and build the combined results. For large data sets, it should be considerably faster than reading through each cell on the worksheet.
I assume that you are grouping on Col J, and that Column A data does not need to be concatenated in the cell. If it does, the modifications to the routine would be trivial.
First, Insert a Class Module, rename it CombData and insert the following code into that module:
Option Explicit
Private pColA As String
Private pColJ As String
Private pColZConcat As String
Public Property Get ColA() As String
ColA = pColA
End Property
Public Property Let ColA(Value As String)
pColA = Value
End Property
Public Property Get ColJ() As String
ColJ = pColJ
End Property
Public Property Let ColJ(Value As String)
pColJ = Value
End Property
Public Property Get ColZConcat() As String
ColZConcat = pColZConcat
End Property
Public Property Let ColZConcat(Value As String)
pColZConcat = Value
End Property
Then insert a regular module and add the code below:
Option Explicit
Sub CombineData()
Dim cCombData As CombData
Dim colCombData As Collection
Dim V As Variant
Dim vRes() As Variant 'Results Array
Dim rRes As Range 'Location of results
Dim I As Long
'read source data into array
V = Range("A1", Cells(Rows.Count, "A").End(xlUp)).Resize(columnsize:=26)
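'Set results range. Here it is set below the source data.
'It could be anywhere, even on a different worksheet, or could overlay the
' original. The area below and to the right is cleared
Set rRes = Range("A1").Offset(UBound(V) + 10)
'Only clear if the used range reaches the results area; otherwise
' Range(rRes, lastCell) would extend back up into the source data
If rRes.SpecialCells(xlCellTypeLastCell).Row >= rRes.Row Then
Range(rRes, rRes.SpecialCells(xlCellTypeLastCell)).Clear
End If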
Set colCombData = New Collection
On Error Resume Next
For I = 1 To UBound(V)
Set cCombData = New CombData
cCombData.ColA = V(I, 1)
cCombData.ColJ = V(I, 10)
cCombData.ColZConcat = V(I, 26)
colCombData.Add cCombData, CStr(cCombData.ColJ)
If Err.Number <> 0 Then
Err.Clear
With colCombData(cCombData.ColJ)
.ColZConcat = .ColZConcat & ", " & V(I, 26)
End With
End If
Next I
On Error GoTo 0
ReDim vRes(1 To colCombData.Count, 1 To 26)
For I = 1 To UBound(vRes)
With colCombData(I)
vRes(I, 1) = .ColA
vRes(I, 10) = .ColJ
vRes(I, 26) = .ColZConcat
End With
Next I
rRes.Resize(UBound(vRes, 1), UBound(vRes, 2)) = vRes
End Sub
EDIT: Note that the source data is read into the Variant array V. If you examine V in the Watch Window, you will see that the first dimension represents the rows and the second dimension the columns. So if you wanted, for example, to perform the same procedure on a different set of columns, you would merely change the second-dimension (column) indices in the lines below Set cCombData = New CombData. For example, column B data would be V(I, 2), and so forth. Of course, you might also want to rename the class and its properties to better reflect what you are doing.
In addition, if your data starts at row 2, merely start the iteration through V with I = 2 instead of I = 1.
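To see that layout without the Watch Window, a small sketch like the following prints the bounds of V and walks one column, starting from a chosen first data row. The procedure name and the firstDataRow variable are illustrative, not part of the answer:

```vba
'Sketch: shows how a range read into a Variant is laid out as V(row, column)
Sub InspectSourceArray()
    Dim V As Variant
    Dim I As Long
    Dim firstDataRow As Long

    V = Range("A1", Cells(Rows.Count, "A").End(xlUp)).Resize(columnsize:=26).Value

    'Both dimensions are 1-based when an array is read from a range
    Debug.Print "Rows: " & LBound(V, 1) & " To " & UBound(V, 1)
    Debug.Print "Columns: " & LBound(V, 2) & " To " & UBound(V, 2)

    firstDataRow = 2 'use 1 if there is no header row
    For I = firstDataRow To UBound(V, 1)
        Debug.Print I, V(I, 2) 'column B of worksheet row I
    Next I
End Sub
```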
EDIT2: To overwrite the original data while also keeping the contents of the columns that are not being processed, use the following modification, written for Columns A, J and Z. You should be able to adapt it to whatever columns you choose to process.
Option Explicit
Sub CombineData()
Dim cCombData As CombData
Dim colCombData As Collection
Dim V As Variant
Dim vRes() As Variant 'Results Array
Dim rRes As Range 'Location of results
Dim I As Long, J As Long, K As Long
'read source data into array
V = Range("A1", Cells(Rows.Count, "A").End(xlUp)).Resize(columnsize:=26)
'Set results range. Here the results overwrite the original data, starting at A1
'(restoring the commented-out Offset would put them below the source data instead).
'The area below and to the right of rRes is cleared
Set rRes = Range("A1") '.Offset(UBound(V) + 10)
Range(rRes, rRes.SpecialCells(xlCellTypeLastCell)).Clear
Set colCombData = New Collection
On Error Resume Next
For I = 1 To UBound(V)
Set cCombData = New CombData
cCombData.ColA = V(I, 1)
cCombData.ColJ = V(I, 10)
cCombData.ColZConcat = V(I, 26)
colCombData.Add cCombData, CStr(cCombData.ColJ)
If Err.Number <> 0 Then
Err.Clear
With colCombData(cCombData.ColJ)
.ColZConcat = .ColZConcat & ", " & V(I, 26)
End With
End If
Next I
On Error GoTo 0
ReDim vRes(1 To colCombData.Count, 1 To 26)
For I = 1 To UBound(vRes)
With colCombData(I)
vRes(I, 1) = .ColA
vRes(I, 10) = .ColJ
vRes(I, 26) = .ColZConcat
'Note the 10 below is the column we are summarizing by
J = WorksheetFunction.Match(.ColJ, WorksheetFunction.Index(V, 0, 10), 0)
For K = 1 To 26
Select Case K 'Decide which columns to copy over
Case 2 To 9, 11 To 25
vRes(I, K) = V(J, K)
End Select
Next K
End With
Next I
rRes.Resize(UBound(vRes, 1), UBound(vRes, 2)) = vRes
End Sub
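For comparison, the same grouping can be sketched with a Scripting.Dictionary in place of the class module and On Error Resume Next. This is a swapped-in technique, not the answer's method; it assumes Windows (the Dictionary comes from the Microsoft Scripting Runtime), data starting in A1, Column J as the key, and, like the first version, writes the results 10 rows below the source, though without clearing that area first:

```vba
'Alternative sketch (not the answer's code): group on column J with a
'Scripting.Dictionary that maps each key to its row in the results array.
Sub CombineDataWithDictionary()
    Dim dict As Object
    Dim V As Variant
    Dim vRes() As Variant
    Dim I As Long, r As Long
    Dim groupKey As String

    'Read A:Z into a 2-D array, as CombineData does
    V = Range("A1", Cells(Rows.Count, "A").End(xlUp)).Resize(columnsize:=26).Value

    Set dict = CreateObject("Scripting.Dictionary")
    dict.CompareMode = vbTextCompare 'case-insensitive keys, like Collection keys

    'Worst case, every row is its own group, so size for all rows
    ReDim vRes(1 To UBound(V, 1), 1 To 26)

    For I = 1 To UBound(V, 1)
        groupKey = CStr(V(I, 10))
        If dict.Exists(groupKey) Then
            'Key seen before: append column Z to that group's results row
            r = dict(groupKey)
            vRes(r, 26) = vRes(r, 26) & ", " & V(I, 26)
        Else
            'New key: start a new results row
            r = dict.Count + 1
            dict.Add groupKey, r
            vRes(r, 1) = V(I, 1)
            vRes(r, 10) = V(I, 10)
            vRes(r, 26) = V(I, 26)
        End If
    Next I

    'Write only the filled rows; the unused rows of vRes are ignored
    Range("A1").Offset(UBound(V, 1) + 10).Resize(dict.Count, 26).Value = vRes
End Sub
```

Mapping each key to a results row avoids both the error-driven duplicate detection and the separate Match lookup, at the cost of depending on the Scripting Runtime.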
This assumes that Column J is the key and that Column A doesn't need to be appended. If Column A needs to be combined as well (because it is not always the same), you would simply add another For Each loop that checks whether the value is already there and appends it if not, as the code does for column 26 (Z).
Sub CombineData()
Dim x As Long, x2 As Long
Dim s As Variant
x = 2
Do Until Cells(x, 1) = "" 'loop through every row starting at row 2, comparing it with the rows above it
For x2 = 1 To x - 1
If Cells(x, 10) = Cells(x2, 10) Then 'this compares column J; if another column is the key, change 10 to that column number
For Each s In Split(CStr(Cells(x2, 26).Value), ", ") 'check whether row x's column Z value is already listed in the earlier row
If s = CStr(Cells(x, 26).Value) Then GoTo alreadyEntered
Next s
Cells(x2, 26) = Cells(x2, 26) & ", " & Cells(x, 26) 'append row x's column Z value to the earlier row
alreadyEntered:
Rows(x).Delete Shift:=xlUp 'delete the duplicate row; the earlier row keeps the combined data
x = x - 1 'stay on the same row number, since the next row has just moved up
Exit For
End If
Next x2
x = x + 1
Loop
End Sub
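The "is this value already in column Z?" check can also be pulled out into a small helper, which makes the loop easier to read and to reuse for another column such as A. A minimal sketch; the function name is hypothetical:

```vba
'Hypothetical helper: True if candidate is one of the entries in a list joined with ", "
Function ListContains(ByVal joined As String, ByVal candidate As String) As Boolean
    Dim part As Variant
    For Each part In Split(joined, ", ")
        If part = candidate Then
            ListContains = True
            Exit Function
        End If
    Next part
    'Falls through with the default value False when no entry matches
End Function
```

For example, ListContains("?, !", "!") returns True, while ListContains("?, !", "?!") returns False because each entry is compared whole rather than searched as a substring.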