Merging Rows of column B with the count of already merged rows A - vba

I want to merge cells in one row (belongs to Column B) with the count of already merged different cell(belongs to Column A) .How can i start coding ?
this is the screenshot that i want

Merging cells in a spreadsheet means taking two or more cells and
constructing a single cell out of them. When you merge two or more
adjacent horizontal or vertical cells, the cells become one larger
cell that is displayed across multiple columns or rows. When you
merge multiple cells, the contents of only one cell (the upper-left
cell for left-to-right languages, or the upper-right cell for
right-to-left languages) appear in the merged cell. The contents of
the other cells that you merge are deleted. For more details please
go through this MSDN article Merge and unmerge
cells
Simple VBA code for Merging Cell
Sub merg_exp_1()
ActiveSheet.Range("A1:C10").Merge
End Sub
Sample data before and after running the program is shown.
Now let us see, If we merge a row what happens. Sample code for this
exercise though general is being tested for one situation only and
it as follow :
Sub Merge_Rows()
Dim rng As Range
Dim rrow As Range
Dim rCL As Range
Dim out As String
Dim dlmt As String
dlmt = ","
Set rng = ActiveSheet.Range("A1:C5")
For Each rrow In rng.Rows
out = ""
For Each rCL In rrow.Cells
If rCL.Value <> "" Then
out = out & rCL.Value & dlmt
End If
Next rCL
Application.DisplayAlerts = False
rrow.Merge
Application.DisplayAlerts = True
If Len(rrow.Cells(1).Value) > 0 Then
rrow.Cells(1).Value = Left(out, Len(out) - 1)
End If
Next rrow
End Sub
Sample data before and after running the program is shown. You can see this won't meet your objective.
Next we can try merging by column approach. Here also we are trying
for one column i.e. Column B to see the effect. Sample code as
follows.
Sub Merge_col_exp()
Dim cnum As Integer
Dim rng As Range
Dim str As String
For i = ActiveSheet.UsedRange.Rows.Count To 1 Step -1
cnum = Cells(i, 1).MergeArea.Count
Set rng = Range(Cells(i, 2), Cells(i - cnum + 1, 2)) ' only to demonstrate working in 2nd column
For Each cl In rng
If Not IsEmpty(cl) Then str = str + "," + cl
Next
If str <> "" Then str = Right(str, Len(str) - 1)
Application.DisplayAlerts = False
rng.Merge
rng = str
Application.DisplayAlerts = True
str = ""
i = i - cnum + 1
Next i
End Sub
Sample data before and after running the program is shown. You can see this is closer to your requirement. You can extend functionality of this program by finding Last Column in the Actively used range. Extend program functionality to cover upto last column.

Related

Sum Values based on unique ID

Just started a new job. I'm automating a month-end report and I'm new at VBA. Been googling most of my issues with success, but I've finally run into a wall. In essence I'm downloading some data from SAP and from there I need to build a report.
My question is: How to do a sumif function using loops in VBA?
Data pull:
Sheet1 contains a product code and purchase amounts (columns A & B) respectively. One product code can have several purchases (several rows with the same product code).
Steps so far:
I arranged the data sheet1 to be in ascending order.
Copied unique values for the product codes onto another sheet (sheet2). So Sheet2 has a list of all the products (in ascending order).
I want to get the sum of all purchases in sheet2 column B (per product code). I know how to do this using formulas, but I need to automate this as much as possible. (+ I'm genuinely interested in figuring this out)
This is what I did in VBA so far:
Sub Macro_test()
Dim tb As Worksheet
Dim tb2 As Worksheet
Dim x As Integer
Dim y As Integer
Dim lrow As Long
Set tb = Sheets("sheet1")
Set tb2 = Sheets("sheet2")
lrow = tb.Cells(Rows.Count, "A").End(xlUp).Row
For x = 2 To lrow
For y = 2 To lrow
If tb2.Cells(x, 1).Value = tb.Cells(y, 1).Value Then
tb2.Cells(x, 2).Value = tb.Cells(y, 2).Value
End If
Next y
Next x
End Sub
If i'm not mistaken, for each product_code in sheet2 col A, I'm looping through all the product codes in sheet1 and getting back the LAST value it finds, instead of the sum of all values... I understand why it doesn't work, I just don't know how to fix it.
Any help would be much appreciated. Thanks!
This statement overwrites the value of tb2.Cells(x, 2).Value at each iteration:
tb2.Cells(x, 2).Value = tb.Cells(y, 2).Value
Instead, I think you need to keep adding to it:
tb2.Cells(x, 2).Value = tb2.Cells(x, 2).Value + tb.Cells(y, 2).Value
But I don't like the looks of your double-loop which uses only one lrow variable to represent the "last row" on the two different worksheets, that could be causing some issues.
Or, in your loop do something like this which I think will avoid the duplicate sum. Still, assumes the second worksheet doesn't initially have any value in
' Base our lRow on Sheet2, we don't care how many rows in Sheet1.
lrow = tb2.Cells(tb2.Rows.Count, 1).End(xlUp).Row
Dim cl as Range
Set cl = tb.Cells(2,1) 'Our initial cell value / ID
For x = 2 to lRow '## Look the rows on Sheet 2
'## Check if the cell on Sheet1 == cell on Sheet2
While cl.Value = tb2.Cells(x,1).Value
'## Add cl.Value t- the tb2 cell:
tb2.Cells(x, 2).Value = tb2.Cells(x, 2).Value + cl.Offset(0,1).Value
Set cl = cl.Offset(1) '## Reassign to the next Row
Wend
Next
But it would be better to omit the double-loop and simply use VBA to do 1 of the following:
1. Insert The Formula:
(See Scott Holtzman's answer).
This approach is better for lots of reasons, not the least of which is that the WorksheetFunction is optimized already, so it should arguably perform better though on a small dataset the difference in runtime will be negligible. The other reason is that it's stupid to reinvent the wheel unless you have a very good justification for doing so, so in this case, why write your own version of code that accomplishes what the built-in SumIf already does and is specifically designed to do?
This approach is also ideal if the reference data may change, as the cell formulas will automatically recalculate based on the data in Sheet1.
2. Evaluate the formula & replace with values only:
If you prefer not to retain the formula, then a simple Value assignment can remove the formula but retain the results:
With .Range(.Range("B2"), .Range("A2").End(xlDown).Offset(, 1))
.FormulaR1C1 = "=SUMIF(Sheet1!C[-1]:C[-1],RC[-1],Sheet1!C:C)"
.Value = .Value 'This line gets rid of the formula but retains the values
End With
Use this approach if you will be removing Sheet1, as removing the referents will break the formula on Sheet2, or if you otherwise want the Sheet2 to be a "snapshot" instead of a dynamic summation.
If you really need this automated, take advantage of VBA to place the formula for you. It's very quick and easy using R1C1 notation.
Complete code (tested):
Dim tb As Worksheet
Dim tb2 As Worksheet
Set tb = Sheets("sheet1")
Set tb2 = Sheets("sheet2")
Dim lrow As Long
lrow = tb.Cells(tb.Rows.Count, 1).End(xlUp).Row
tb.Range("A2:A" & lrow).Copy tb2.Range("A2")
With tb2
.Range("A2").CurrentRegion.RemoveDuplicates 1
With .Range(.Range("B2"), .Range("A2").End(xlDown).Offset(, 1))
.FormulaR1C1 = "=SUMIF(Sheet1!C[-1]:C[-1],RC[-1],Sheet1!C:C)"
End With
End With
Note that with R1C1 notation the C and R are not referring to column or row letters . Rather they are the column and row offsets from the place where the formula is stored on the specific worksheet. In this case Sheet!C[-1] refers to the entire A column of sheet one, since the formula is entered into column B of sheet 2.
I wrote a neat little algorithm (if you can call it that) that does what you want them spits out grouped by totals into another sheet. Basically it loops through the first section to get unique names/labels and stores them into an array. Then it iterates through that array and adds up values if the current iteration matches what the current iteration of the nested loop position.
Private Sub that()
Dim this As Variant
Dim that(9, 1) As String
Dim rowC As Long
Dim colC As Long
this = ThisWorkbook.Sheets("Sheet4").UsedRange
rowC = ThisWorkbook.Sheets("Sheet4").UsedRange.Rows.Count
colC = ThisWorkbook.Sheets("Sheet4").UsedRange.Columns.Count
Dim thisname As String
Dim i As Long
Dim y As Long
Dim x As Long
For i = LBound(this, 1) To UBound(this, 1)
thisname = this(i, 1)
For x = LBound(that, 1) To UBound(that, 1)
If thisname = that(x, 0) Then
Exit For
ElseIf thisname <> that(x, 0) And that(x, 0) = vbNullString Then
that(x, 0) = thisname
Exit For
End If
Next x
Next i
For i = LBound(that, 1) To UBound(that, 1)
thisname = that(i, 0)
For j = LBound(this, 1) To UBound(this, 1)
If this(j, 1) = thisname Then
thisvalue = thisvalue + this(j, 2)
End If
Next j
that(i, 1) = thisvalue
thisvalue = 0
Next i
ThisWorkbook.Sheets("sheet5").Range(ThisWorkbook.Sheets("Sheet5").Cells(1, 1), ThisWorkbook.Sheets("Sheet5").Cells(rowC, colC)).Value2 = that
End Sub
Yay arrays

Applying VBA RIGHT to an entire column - Infinite Loop Issue

I have data that I am working to Parse Out that I have imported from approval emails sent in Outlook. At this point I am just importing the CreationTime and the SubjectLine.
For the subject line I am able to use the Split function to separate out most of the data. I then am left with Job Codes in Column B and Position numbers in Column C which includes the text: "Job Codes: XXXX" and the four digit job code number and "PN XXXX" and either a four digit or 6 digit position number. I am trying to use the Right functionality to loop through the entire column and reformat the column just to show only the four digit job code number for Column B and either just the 4 digit or 6 digit position number (the actual numbers) for Column C
For Job Code Column B:
Currently my code works for Shortening the Job Codes but it involves adding a column, putting the RIGHT formula in that column for the shortened Job Code, then copying and pasting the formula as values back into the column and then deleting the original column.
The problem- Works but perhaps not the most efficient with a larger data set (currently 200 rows but will have 2000 or more)
Code:
Sub ShortenJobCodes()
Application.ScreenUpdating = False
Const R4Col = "=RIGHT(RC3,4)"
Dim oRng As Range
Dim LastRow As Long
Range("B1").EntireColumn.Insert
LastRow = Cells(Rows.Count, "A").End(xlUp).Row
Set oRng = Range("B:B")
Range(oRng, Cells(LastRow, "B")).FormulaR1C1 = R4Col
Set oRng = Nothing
Columns("B").Select
Selection.Copy
Selection.PasteSpecial Paste:=xlPasteValues
Range("C1").EntireColumn.Delete
Application.ScreenUpdating = True
End Sub
For Position Numbers Column C:
Currently I have mirrored the above code but added in an if statement using LEN to count if the characters are less than 8, if so then insert one RIGHT function if not insert the other RIGHT function. This also involves adding an additional column putting the RIGHT formula in that column for the shortened Position Number(Eliminating all but just the number), then copying and pasting the formula as values back into the column and then deleting the original column.
Problem - This works but seems to take forever to process and in fact looks like it is in an infinite loop. When I Esc out of it, it does add the column and then input the proper RIGHT formula (leaving just the numeric values) but the sub never seems to end, nor does it copy and paste the formulas as values or delete the original column. As noted above I realize this is likely a more efficient way to do this but I have tried a bunch of options without any luck.
I am realizing part of the loop might be due to the range itself being an entire column but I cannot find a way to stop that with the last row (even though I have a count in there).
Code:
Sub ShortenPositionNumbers()
Application.ScreenUpdating = False
Const R4Col = "=RIGHT(RC4,4)"
Const R6Col = "=RIGHT(RC4,6)"
Dim oRng As Range
Dim rVal As String
Dim y As Integer
Dim selCol As Range
Dim LastRow As Long
Range("C1").EntireColumn.Insert
LastRow = Cells(Rows.Count, "A").End(xlUp).Row
Set selCol = Range("D:D")
For Each oRng In selCol
oRng.Select
rVal = oRng.Value
If Len(oRng.Value) > 8 Then
oRng.Offset(0, -1).FormulaR1C1 = R6Col
Else
oRng.Offset(0, -1).FormulaR1C1 = R4Col
End If
Next
Set oRng = Nothing
Columns("C").Select
Selection.Copy
Selection.PasteSpecial Paste:=xlPasteValues
Range("D1").EntireColumn.Delete
Application.ScreenUpdating = True
End Sub
Major Question: Is there a way to use RIGHT/TRIM/LEN/LEFT functions to do this within a cell without having to add columns/delete columns and insert functions?
There are a few things you can do here to speed up your code. I'm only going to reference the second code block as you can apply similar logic to the first.
The first issue is that you create a LastRow variable but never reference it again. It looks like you meant to use this in the selCol range. You should change that line to Set selCol = Range("C1:C" & lastRow). This way, when you loop through the rows you only loop through the used rows.
Next, in the For-Each loop you Select every cell you loop through. There really isn't any reason to do this and takes substantially longer. You then create the variable rVal but never use it again. A better way to set up the loop is as follows.
For Each oRng in selCol
rVal = oRng.Value
If Len(rVal) > 8 Then
oRng.Value = Right(rVal, 6)
Else
oRng.Value = Right(rVal, 4)
End If
Next
This is much cleaner and no longer requires creating columns or copying and pasting.
Try this, it uses Evaluate and no loops or added columns.
Sub ShortenPositionNumbers()
Application.ScreenUpdating = False
Dim selCol As Range
Dim LastRow As Long
With ActiveSheet
LastRow = .Cells(Rows.Count, "A").End(xlUp).Row
Set selCol = .Range(.Cells(1, 3), .Cells(LastRow, 3))
selCol.Value = .Evaluate("INDEX(IF(LEN(" & selCol.Address(0, 0) & ")>8,RIGHT(" & selCol.Address(0, 0) & ",6),RIGHT(" & selCol.Address(0, 0) & ",4)),)")
End With
Application.ScreenUpdating = True
End Sub
Or work with arrays
Sub ShortenPositionNumbers()
Dim data As Variant
Dim i As Long
With Range("C3:C" & Cells(Rows.Count, "A").End(xlUp).Row)
data = Application.Transpose(.Value)
For i = LBound(data) to UBound(data)
If Len(data(i)) > 8 Then
data(i) = RIGHT(data(i),6)
Else
data(i) = RIGHT(data(i),4)
End If
Next
.Value = Application.Transpose(data)
End With
End Sub

Excel VBA: How to find first empty row within a Table for a Loop routine?

I reformatted a range of Sheets("Records") in a workbook as a Table (named "RecordsTable") to make it easier to do INDEX(MATCH,MATCH) functions for generating reports.... but now I screwed up my looping routine for filling that range from the input on Sheets("FORM").
It used to be:
Set r = Sheets("Records").Range(A & Rows.Count).End(x1Up).Offset(1, 0)
i = 0
For Each c In Range("dataRange")
'dataRange is a list of cells to reference from the FORM input sheet
r.Offset(0, i).Value = Worksheets("FORM").Range(c)
i = i + 1
Next
However this code is now selecting the first row at the END of "RecordsTable" (row 501, as I defined 500 rows in my table) and inserting the data there.
I tried to change it to this:
Set r = Sheets("Records").ListObjects("RecordsTable").DataBodyRange("A" & Rows.Count).End(x1Up).Offset(1, 0)
i = 0
For Each c In Range("dataRange")
r.Offset(0, i).Value = Worksheets("FORM").Range(c)
i = i + 1
Next
But this code is still selecting row 501 and making that row part of "RecordsTable".
How can I properly Set "r" to = the first empty row in "RecordsTable"?
For reference, Column "A" in "RecordsTable" has the header [INV #]. Also, when I step into the "Set r = ..." line, Rows.Count is returning a value of 1million+ (ie, total rows on the sheet) - if I understand this correctly, I want it to return a value of 500 (ie, total rows in table) - is that correct?
EDIT
"dataRange" is a single column list of cell references (I do have them labeled in column B, as #chrisneilsen suggest:
A
J6
Y6
J8
J10
Y8
etc.
They are the cells on Sheets("FORM") that I need to pull data from and populate into my table, in the order indicated in "dataRange".
Assuming you really have a Table, adding data to a Table (ListObject) using it's properties and methods:
Sub Demo()
Dim lo As ListObject
Dim c As Range
Set lo = Worksheets("Records").ListObjects("RecordsTable")
For Each c In Sheets("V").Range("dataRange")
If Not lo.InsertRowRange Is Nothing Then
lo.InsertRowRange.Cells(1, 1) = Sheets("FORM").Range(c)
Else
lo.ListRows.Add.Range.Cells(1, 1) = Sheets("FORM").Range(c)
End If
Next
End Sub
Note: looping a range on sheet V and using that as a pointer to data on sheet FORM, copied from your answer - I'm assuming you know what you are doing here
Based on OP comment, adding data a single new row
Sub Demo()
Dim lo As ListObject
Dim c As Range, TableRange As Range
Dim i As Long
Set lo = Worksheetsheets("Records").ListObjects("RecordsTable")
If Not lo.InsertRowRange Is Nothing Then
Set TableRange = lo.InsertRowRange
Else
Set TableRange = lo.ListRows.Add.Range
End If
i = 1
For Each c In Sheets("V").Range("dataRange")
TableRange.Cells(1, i) = Sheets("FORM").Range(c)
i = i + 1
Next
End Sub
Note, this assumes that the order of the table columns is the same as the order of dataRange. It may be better to include table field names in dataRange to avoid any mismatch issues
As mentioned in updated OP, if column labels are in the next column, replace the For loop with this (and add Dim r as Range, col as long to declarations)
For Each c In Sheets("V").Range("dataRange")
If Not c = vbNullString Then
Set r = Worksheets("FORM").Range(c.Value)
col = lo.ListColumns(c.Offset(, 1).Value).Index
TableRange.Cells(1, col) = r.Value
End If
Next

Fastest way to check if two ranges are equal in excel vba [duplicate]

This question already has answers here:
How to compare two entire rows in a sheet
(11 answers)
Closed 8 years ago.
Imagine you have two sets of data and the number of rows and columns are the same. Now you want check if data in cells in one set is equal to data in cells with the same relative address in the other set. If thats true for all cells of a row, remove the row from both sets. I can code this very easily by comparing each cell and that's not good for large data sets. See code below for two columns where the two sets of data happen to be in the same sheet side by side with 300 in column offset between them.
Dim RngOb As Range
Dim c As Range
Range("A1", "B1").Select
set RngOb = Range(Selection, Selection.End(xlDown))
For Each c In RngOb.Rows
If c.Cells(1,1).Value = c.Offset(0, 300).Cells(1,1).Value Then
If c.Cells(1,2).Value = c.Offset(0, 300).Cells(1,2).Value Then
c.EntireRow.Delete
End If
End If
Next
My actual data has more than 100 columns and different number of columns from day to day. I'm looking for a smart, fast way to do this for large data sets. I highly appriciate answers, feedback and criticism. :D
Here is a simple way to compare two rows in isomorphic ranges.............in this example row #5 of each range:
Sub RowCompare()
Dim ary1() As Variant
Dim Range1 As Range, Range2 As Range, rr1 As Range, rr2 As Range
Set Range1 = Range("B9:F20")
Set Range2 = Range("I16:M27")
Set rr1 = Range1.Rows(5)
Set rr2 = Range2.Rows(5)
ary1 = Application.Transpose(Application.Transpose(rr1))
ary2 = Application.Transpose(Application.Transpose(rr2))
st1 = Join(ary1, ",")
st2 = Join(ary2, ",")
If st1 = st2 Then
MsgBox "the same"
Else
MsgBox "different"
End If
End Sub
If you have embedded commas in the cells, then choose another character in the JOIN
If I understand your problem correctly, the following code should allow you to do what you want. Within the code, you select the range you wish to process; the first column of each data set, and the number of columns within each data set.
It does assume only two data sets, as you wrote, although that could be expanded. And there are ways of automatically determining the dataset columns, if there is no other data in between.
Option Explicit
Option Base 0
Sub RemoveDups()
Dim I As Long, J As Long
Dim rRng As Range
Dim vRng As Variant, vRes() As Variant
Dim bRng() As Boolean
Dim aColumns, lColumns As Long
Dim colRowsDelete As Collection
'vRng to include from first to last column to be tested
Set rRng = Range("f1", Cells(Rows.Count, "F").End(xlUp)).Resize(columnsize:=100)
vRng = rRng
ReDim bRng(1 To UBound(vRng))
'columns to be tested
'Specify First column of each data set
aColumns = Array(1, 13)
'num columns in each data set
lColumns = 3
For I = 1 To UBound(vRng)
bRng(I) = vRng(I, aColumns(0)) = vRng(I, aColumns(1))
For J = 1 To lColumns - 1
bRng(I) = bRng(I) And (vRng(I, aColumns(0) + J) = vRng(I, aColumns(1) + J))
Next J
Next I
'Rows to Delete
Set colRowsDelete = New Collection
For I = 1 To UBound(bRng)
If bRng(I) = True Then colRowsDelete.Add Item:=I
Next I
'Delete the rows
If colRowsDelete.Count > 0 Then
Application.ScreenUpdating = False
For I = colRowsDelete.Count To 1 Step -1
rRng.Rows(colRowsDelete.Item(I)).EntireRow.Delete
Next I
End If
Application.ScreenUpdating = True
End Sub

Need a better optimized code?

Need a much Optimized code.Well I Got a Project and I have Succefully made it work with the vba (Mostly helped by the stackoverflow programmers Thanks for that)
But Today I got a Feedback. Its deleting 2 more unique entries in the record But I dont know why its deleting Them.
The Algorithm I have applied
I have Used the COUNTIF function Which I found on google
="countif(A$1:A2,A3)=0" A3 is the active cell, Checks A2,A1 for dupes
It Throws False if there is a duplicate in The A column and True If it is a unique.What I have understood about Countif is that
It checks all the above columns values from that cell I mean let us take A4. SO it checks A2,A1,A3 for the duplicate. Similarly A10 checks for A1 to A9 and throws either TRue or False.Well It was working But I dont know what went wrong The code is not working for some entries.Its even showing False for the Unique entries sometimes.
And its taking more time to applye these formula as I have more amount of data. Im trying to make it cleaner and more Optimizing Way.People told me its not a c or some other Language to make it optimize but Im need of code that makes my code more optimized
I need code for these condtions can anyone help me as my countif failed.Im little helpless in doing so.
1)I have a column and I should check for duplicates in that column and delete that row if it is a duplicate
2) I have 35000 Old entries in the column and I have new entries 2000 everyweek these are appended. I need to check these 2000 entries from the total 37000 ( as we appened we get 35000+2000) and these delete operation need to be performed only on the newly appended 2000 entries but it should check the duplicates for entire column
Let me explain you clearly I have 2000 entries newly added,so Only these entries are to be checked for the duplicates from the 35000 entries and also from itself (2000 entries) and delete it if it is a duplicate and no duplicating operation should be performed on the 35000 entries old data.
I have found some codes but they are deleting even the duplicates of the 35000 entries. I have set the range but even though its not working.
Can anyone help me with the best code that takes less time?please thank you
Updating my question with the sample code I have
A B F G H I Y
PTY 39868.5 4 2 540 3 PTY39868.5425403
GTY 34446.1234 2 1 230 1 GTY34446.1234212301
PTY 3945.678 2 2 PTY3945.67822
GTY 34446.1234 2 1 230 1 GTY34446.1234212301
let us say these are old 35000 entries
Explaination to the above example.
The above are the 35000 entries. I have to check A,B,F,G,H,I columns for the dupes, if they are same I have to delete the row, I should not bother about the other columns c,d etc. so what I did is I have used one unused column Y and concatenated these 6 columns values into 1 at Y column using these
= A2 & B2 & F2 & G2 & H2 &I2 with the respective columns
Now checking the Y column for dupes and delete the entire row. as 2003 supports only for one column as far to my knowledge.
Notice that even the 35000 entries may have duplicates in it but I should not delete them. Example you can see the 2 and last row in my example code are dupes but I should not delete
as it is the old data.
A B F G H I Y
PTY 39868.5 4 2 540 3 PTY39868.5425403 'old
GTY 34446.1234 2 1 230 1 GTY34446.1234212301 'old
PTY 3945.678 2 2 PTY3945.67822 'old
GTY 34446.1234 2 1 230 1 GTY34446.1234212301 'old
PTY 3945.678 1 1 230 2 PTY3945.678112302 'new
PTY 39868.5 4 2 540 3 PTY39868.5425403 'new
PTY 3945.678 1 1 230 2 PTY3945.678112302 'new
Now note that New entry PTY (from last 2nd) is a duplicate of the original record(PTY at first) So I hava to delete it.And the last new entry is a duplicate of the new entry itself so I should delete it even that . SO in the above code I have to delete only the last 2 rows which are dupes of original record and also from it . But should not delete the GTY which is the dupe but which is in orginal record.
I think I gave a clear view now. Is concatenating them into one cell . Is it better way to approach? as conactenatin for 40000 entries taking just 2 seconds i think that doesnt matter but any more algorithms to these is much aprreciated
I heard counif treats 45.00 and 45.00000 as different is that right may be that was the problem with it? since I have decimal points in my data. I think I should do
= I2 & H2 & G2 & F2 & A2 & B2
which is better to concatenate? is this or the other i posted before?
BIG UPDATE:
It think the original questions threw me off - there may be a problem with the logic in the question. The following assumes you want to delete the cell, not entire row, for the duplicate entries.
If the 35000 old records do not include duplicates, then all you need to do is remove all duplicates from the entire column - so long as you start from row 1, you run no risk of deleting any of the 'old' rows since no duplicates exist in them.
Here is one way:
Sub UniqueList()
Application.ScreenUpdating = False
Dim vArray As Variant
Dim i As Long, j As Long, lastrow As Long
Dim dictionary As Object
Set dictionary = CreateObject("scripting.dictionary")
lastrow = Range("A" & Rows.Count).End(xlUp).Row
vArray = Range("A1:A" & lastrow).Value
On Error Resume Next
For i = 1 To UBound(vArray, 1)
For j = 1 To UBound(vArray, 2)
If Len(vArray(i, j)) <> 0 Then
dictionary(vArray(i, j)) = 1
End If
Next
Next
Columns("A:A").ClearContents
Range("A1").Resize(dictionary.Count).Value = _
Application.Transpose(dictionary.keys)
Application.ScreenUpdating = True
End Sub
If for some odd reason the 35000 old records DO include dupes and you only want to allow these 35000 records to do so, then you can use 2 dictionaries, but this would be an unusual case since you'd be treating the old records differently than new...
Sub RemoveNewDupes()
Application.ScreenUpdating = False
Dim lastRow As Long
Dim varray As Variant
Dim oldDict As Object, newDict As Object
Set oldDict = CreateObject("scripting.dictionary")
Set newDict = CreateObject("scripting.dictionary")
On Error Resume Next
lastRow = Range("A" & Rows.Count).End(xlUp).Row
'Add old entries to dictionary
varray = Range("A1:A35000").Value
For i = 1 To UBound(varray, 1)
oldDict.Add varray(i, 1), 1
Next
'Check for dupes
varray = Range("A35001:A" & lastRow).Value
For i = 1 To UBound(varray, 1)
If oldDict.exists(varray(i, 1)) = False Then
newDict.Add varray(i, 1), 1
End If
Next
'Delete and slap back on the unique list
Range("A35001", "A" & Rows.Count).ClearContents
Range("A35001").Resize(newDict.Count).Value = _
Application.Transpose(newDict.keys)
Application.ScreenUpdating = True
End Sub
Thanks to Reafidy for the advice and getting me to relook at this.
This is also a response to some of the comments and solutions made by other members so sorry if it does not straight away answer your question.
Firstly I believe that using excel in a database scenario that raw data and presentation data should be separated. This usually means a single worksheet with raw data and multiple other worksheets with presentation data. Then delete the raw data when necessary or archive.
When speed testing it is very difficult to get a level playing field in excel as there are many things that affect the results. Computer specs, available RAM etc.. Code must first be compiled before running any of the procedures. The test data is also important, when considering duplicates - how many duplicates vs how many rows. This sub loads some test data, altering the amount of rows vs the range of random numbers (duplicates) will give very different results for your code. I don't know what your data looks like so we are kind of working blind and your results may be very different.
'// This is still not very good test data, but should suffice for this situation.
Sub TestFill()
'// 300000 rows
For i = 1 To 300000
'// This populates a random number between 1 & 10000 - adjust to suit
Cells(i, "A").value = Int((100000 + 1) * Rnd + 1)
Next
End Sub
If we are talking about advanced filter vs an array & dictonary method then advanced filter will be quicker with a lower amount of rows but once you get above a certain amount of rows then the array method will be quicker. Then see what happens when you change the amount of duplicates.... :)
As a guideline or as a general rule using excels built in functions will be faster and I recommend always develop attempting to use these inbuilt functions, however there are often exceptions, like above when removing duplicates. :)
Deleting rows can be slow when looping if used incorrectly. If looping is used then it is important to keep synchronisation between code and the workbook out of the loop. This usually means read data to an array, loop through the data, then load the data from the array back to the presentation worksheet essentially deleting the unwanted data.
Sub RemoveDuplicatesA()
'// Copy raw data to presentation sheet
Range("A1", Cells(Rows.Count, "A").End(xlUp)).AdvancedFilter _
Action:=xlFilterCopy, CopyToRange:=Sheet2.Range("B1"), Unique:=True
End Sub
This will be the fastest method:
Sub RemoveDuplicatesB()
Dim vData As Variant, vArray As Variant
Dim lCnt As Long, lRow As Long
vData = ActiveSheet.UsedRange.Columns(1).value
ReDim vArray(0 To UBound(vData, 1), 0)
lCnt = 0
With CreateObject("Scripting.Dictionary")
For lRow = 1 To UBound(vData, 1)
If Not .Exists(vData(lRow, 1)) Then
vArray(lCnt, 0) = vData(lRow, 1): lCnt = lCnt + 1
.Add vData(lRow, 1), Nothing
End If
Next lRow
End With
'// Copy raw data to presentation sheet
Sheet2.Range("B1").Resize(lCnt).value = vArray
End Sub
Application transpose has a limitation of 65536 rows but as you are using 2003 you should be fine using it, therefore you can simplify the above code with:
Sub RemoveDuplicatesC()
Dim vData As Variant
Dim lRow As Long
vData = ActiveSheet.UsedRange.Columns(1).value
With CreateObject("Scripting.Dictionary")
For lRow = 1 To UBound(vData, 1)
If Not .exists(vData(lRow, 1)) Then
.Add vData(lRow, 1), Nothing
End If
Next lRow
'// Copy raw data to presentation sheet or replace raw data
Sheet2.Columns(2).ClearContents
Sheet2.Columns(2).Resize(.Count).value = Application.Transpose(.keys)
End With
End Sub
EDIT
Okay so #Issun has mentioned you want the entire row deleted. My suggestion was to improve your spreadsheet layout by having a raw data and presentation sheet which means you dont need to delete anything hence it would have been the fastest method. If you dont want to do that and would like to edit the raw data directly then try this:
Sub RemoveDuplicatesD()
Dim vData As Variant, vArray As Variant
Dim lRow As Long
vData = ActiveSheet.UsedRange.Columns(1).value
ReDim vArray(1 To UBound(vData, 1), 0)
With CreateObject("Scripting.Dictionary")
For lRow = 1 To UBound(vData, 1)
If Not .exists(vData(lRow, 1)) Then
varray(lRow, 0) = "x"
.Add vData(lRow, 1), Nothing
End If
Next lRow
End With
Application.ScreenUpdating = False
'// Modify the raw data
With ActiveSheet
.Columns(2).Insert
.Range("B1").Resize(lRow).value = vArray
.Columns(2).SpecialCells(xlCellTypeBlanks).EntireRow.Delete
.Columns(2).Delete
End With
Application.ScreenUpdating = True
End Sub
Before starting again from scratch your whole code, here are a few things you can try:
Optimize your VBA
There are several tips on the web about optimizing vba. In particular, you can do:
'turn off some Excel functionality so your code runs faster
'these two are especially very efficient
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
'use these if you really need to
Application.DisplayStatusBar = False
Application.EnableEvents = False
'code goes here
'at the end, restore the default behavior
'calculate the formulas
Application.Calculate
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
Application.DisplayStatusBar = True
Application.EnableEvents = True
See here for more information
Optimize your algorithm
Especially when your inserting your COUNTIF formula, you can try to fill in instead of inserting the formula in each row.
On the deleting row part, you should try the solution I gave you in your previous thread: Delete duplicate entries in a column in excel 2003 vba to filter first on the True values and then to delete the visible cells. It is probably the fastest way.
[EDIT] Seems like Doc Brown's answer would be probably the best way to handle this (hey, this is a dictionary solution that wasn't written by Issun :)). Anyway, the VBA optimization tips are still relevant because this is quite a slow language.
OK, here's the advancedfilter method. Don't know if it is faster than the dictionary method. It would be interesting to know though, so let me know after you try it. I also included the delete portion so you would have to stop that portion if you want to do a true comparison. Also, you can make this a function instead of a sub and put in your variables, however you want to change it.
Sub DeleteRepeats()
Dim d1 As Double
Dim r1 As Range, rKeepers As Range
Dim wks As Worksheet
d1 = Timer
Set wks = ActiveSheet
Application.EnableEvents = False
Application.ScreenUpdating = False
'Make sure all rows are visible
On Error Resume Next
wks.ShowAllData
wks.UsedRange.Rows.Hidden = False
wks.UsedRange.Columns.Hidden = False
On Error GoTo 0
'Get concerned range
Set r1 = wks.Range("A1:A35000")
'Filter
r1.AdvancedFilter Action:=xlFilterInPlace, Unique:=True
'Get range of cells not to be deleted
Set rKeepers = r1.SpecialCells(xlCellTypeVisible)
On Error Resume Next
wks.ShowAllData
On Error GoTo 0
rKeepers.EntireRow.Hidden = True
'Delete all undesirables
r1.SpecialCells(xlCellTypeVisible).EntireRow.Delete
'show all rows
On Error Resume Next
wks.UsedRange.Rows.Hidden = False
On Error GoTo 0
Application.EnableEvents = False
Application.ScreenUpdating = False
Debug.Print Timer() - d1
End Sub
OK, here's a take on Doc's and Issun's use of Dictionaries. Before I wasn't convinced but after looking at it and testing it and comparing to advanced filter, I am convinced, dictionaries are better for this application. I don't know why Excel isn't faster on this point since they should be using faster algorithms, it's not the hiding, unhiding of the rows since that happens very quickly. So if anyone knows, let me know. This procedure takes just over 1 second on my slow computer:
Sub FindDupesAndDelete()
Dim d1 As Double
Dim dict As Object
Dim sh As Worksheet
Dim v1 As Variant
' Dim s1() As String
Dim rDelete As Range
Dim bUnion As Boolean
d1 = Timer()
bUnion = False
Set dict = CreateObject("Scripting.Dictionary")
Set sh = ActiveSheet
v1 = Application.Transpose(sh.Range("A1", "A" _
& sh.Cells.SpecialCells(xlCellTypeLastCell).row))
' ReDim s1(1 To UBound(v1))
Dim row As Long, value As String ', newEntry As Boolean
For row = 1 To sh.Cells.SpecialCells(xlCellTypeLastCell).row
value = v1(row)
If dict.Exists(value) Then
' newEntry = False
If bUnion Then
Set rDelete = Union(rDelete, sh.Range("A" & row))
Else
Set rDelete = sh.Range("A" & row)
bUnion = True
End If
Else
' newEntry = True
dict.Add value, 1
End If
' s1(row) = newEntry
Next
rDelete.EntireRow.Delete
' sh.Range("B1", "B" & UBound(v1)) = Application.Transpose(s1)
Debug.Print Timer() - d1
End Sub
Okay so now we have some more info here is a solution. It should execute almost instantly.
The code works by filling column y with your concatenate formula. It then adds all of column y to a dictionary and using the dictionary marks each row as a duplicate in column z. It then removes all the duplicates found after row 35000. Then finally it clears both column y and column z to remove the redundant data.
Sub RemoveDuplicates()
Dim vData As Variant, vArray As Variant
Dim lRow As Long
'// Get used range of column A (excluding header) and offset to get column y
With ActiveSheet.Range("A2", Cells(Rows.Count, "A").End(xlUp)).Offset(, 24)
'// Adds the concatenate formula to the sheet column (y)
.FormulaR1C1 = "=RC[-24]&RC[-23]&RC[-19]&RC[-18]&RC[-17]&RC[-16]"
'// Adds the formula results to an array
vData = .Resize(, 1).value
End With
'// Re dimension the array to the correct size
ReDim vArray(1 To UBound(vData, 1), 0)
'// Create a dictionary object using late binding
With CreateObject("Scripting.Dictionary")
'// Loop through each row in the array
For lRow = 1 To UBound(vData, 1)
'// Check if value exists in the array
If Not .exists(vData(lRow, 1)) Then
'// Value does not exist mark as non duplicate.
vArray(lRow, 0) = "x"
'// Add value to dictionary
.Add vData(lRow, 1), Nothing
End If
Next lRow
End With
'// Turn off screen updating to speed up code and prevent screen flicker
Application.ScreenUpdating = False
With ActiveSheet
'// Populate column z with the array
.Range("Z2").Resize(UBound(vArray, 1)) = vArray
'// Use error handling as speciallcells throws an error when none exist.
On Error Resume Next
'// Delete all blank cells in column z
.Range("Y35001", .Cells(Rows.Count, "Y").End(xlUp)).Offset(, 1).SpecialCells(xlCellTypeBlanks).EntireRow.Delete
'// Remove error handling
On Error GoTo 0
'// Clear columns y and z
.Columns(25).Resize(, 2).ClearContents
End With
'// Turn screen updating back on.
Application.ScreenUpdating = True
End Sub
NOTE: you can change all references "activesheet" to your sheet codename if you want.
NOTE2: it assumes you have headers and has left row 1 alone.
I have used your columns and test data as best I can. Here is the test fill I used:
Sub TestFill()
For i = 1 To 37000
With Range("A" & i)
.value = Choose(Int(2 * Rnd + 1), "PTY", "GTY")
.Offset(, 1).value = Round((40000 * (Rnd + 1)), Choose(Int(4 * Rnd + 1), 1, 2, 3, 4))
.Offset(, 5).value = Int(4 * Rnd + 1)
.Offset(, 6).value = Int(2 * Rnd + 1)
.Offset(, 7).value = Choose(Int(2 * Rnd + 1), "230", "540")
.Offset(, 8).value = Int(3 * Rnd + 1)
End With
Next i
End Sub
Lets say you have your entries in column A, and you want the result of your formula in column B (but much faster). This VBA macro should do the trick:
Option Explicit
Sub FindDupes()
Dim dict As Object
Dim sh As Worksheet
Set dict = CreateObject("Scripting.Dictionary")
Set sh = ActiveSheet
Dim row As Long, value As String
For row = 1 To sh.Cells.SpecialCells(xlCellTypeLastCell).row
value = sh.Range("A" & row).Text
If dict.Exists(value) Then
sh.Range("B" & row) = "False"
Else
sh.Range("B" & row) = "True"
dict.Add value, 1
End If
Next
End Sub
(Using a dictionary gives here almost linear running time, which should be a matter of seconds for 35.0000 rows, where your original formula had quadratic running time complexity).
Edit: due to your comment: you will have to fill the dictionary first by reading each entry at least once, that is something you cannot avoid easily. What you can avoid is to fill the rows of column B again when they are already filled:
Option Explicit
Sub FindDupes()
Dim dict As Object
Dim sh As Worksheet
Set dict = CreateObject("Scripting.Dictionary")
Set sh = ActiveSheet
Dim row As Long, value As String, newEntry As Boolean
For row = 1 To sh.Cells.SpecialCells(xlCellTypeLastCell).row
value = sh.Range("A" & row).Text
If dict.Exists(value) Then
newEntry = False
Else
newEntry = True
dict.Add value, 1
End If
If Trim(sh.Range("B" & row)) = "" Then sh.Range("B" & row) = newEntry
Next
End Sub
But I suspect this won't be much faster than my first solution.
Now that you have updated that you want the entire rows deleted and that the first 35000 rows are allowed to have dupes, here is a function that will do that for you. I think I came up with a clever method and it's blazing fast, too:
Sub RemoveNewDupes()
Application.ScreenUpdating = False
Dim lastRow As Long
Dim varray As Variant
Dim oldDict As Object, newDict As Object
Set oldDict = CreateObject("scripting.dictionary")
Set newDict = CreateObject("scripting.dictionary")
On Error Resume Next
lastRow = Range("A" & Rows.Count).End(xlUp).Row
'Add old entries to dictionary
varray = Range("A1:A35000").Value
For i = 1 To UBound(varray, 1)
oldDict.Add varray(i, 1), 1
Next
'Check for dupes
varray = Range("A35001:A" & lastRow).Value
For i = 35000 + UBound(varray, 1) To 35001 Step -1
If oldDict.exists(varray(i - 35000, 1)) = True Or _
newDict.exists(varray(i - 35000, 1)) = True Then
Range("A" & i).EntireRow.Delete
Else
newDict.Add varray(i - 35000, 1), 1
End If
Next
Application.ScreenUpdating = True
'A status message at the end for finishing touch
MsgBox UBound(varray, 1) - newDict.Count & _
" duplicate row(s) found and deleted."
End Sub
How it works:
First I store the 35000 cells into a dictionary file. Then I take a variant array of every cell 35001 onward and loop through them backwards to see if it's in the 35k dictionary or not, or that we haven't come across the value yet in the loop. If it finds that it's a dupe, it deletes the row.
The cool (if I may say) way that it does the row deletion is that when you create the varray, for say A35001 - A37000, it stores them as (1, 1) (2, 1) (...). So if you set "i" to the Ubound of the array + 35000 and step back to 35001, you will loop through all the additions backwardsfrom A37000 to A35001. Then when you want to delete the row, "i" is perfectly set to the row number the value was found in, so you can delete it. And since it goes backwards, it does not screw up the loop!