Excel [VBA]: find duplicate data and delete the oldest entries

I have a problem in MS Excel. I have a spreadsheet with data like this:
Name | timestamp
------------------------
Smith | 12.05.2015
Smith | 01.01.2015
Smith | 10.05.2015
Simpson | 14.04.2015
Simpson | 10.02.2015
Simpson | 21.03.2015
Simpson | 02.01.2015
My actual data is much bigger and more complex, and there are duplicates with different timestamps. Now I want to delete the oldest ones and get an output like this:
Name | timestamp
Smith | 12.05.2015
Simpson | 14.04.2015
I know how to remove duplicates, but in this case it's a little bit different. I hope you can help me solve the problem.

You may not need VBA.
In my experience, Excel's built-in Remove Duplicates feature keeps the first occurrence of each duplicate in a list.
So sort your data by Name ascending and timestamp descending, then remove duplicates based on the Name field only.
You should be left with the most recent entry for each name.

I did a bit of testing, and Range.RemoveDuplicates appears to keep the first entry for each duplicate value (at least in a sorted range, which is what you'll be using). Here's my solution:
Sub SortAndCondense()
    'This subroutine sorts a table by name and secondarily by descending date. It then removes
    'all duplicates in the name column. By sorting the dates in descending order, only the most
    'recent entry for each name is preserved.
    Dim wrkSht As Worksheet
    Dim dateTable As Range
    Dim header1 As Range, header2 As Range

    Set wrkSht = ThisWorkbook.Worksheets("Sheet1")
    Set dateTable = wrkSht.Range("A2:B7") 'insert your actual table range; modify as necessary for column headers
    Set header1 = wrkSht.Range("A2")
    Set header2 = wrkSht.Range("B2")

    'Sort the table primarily by name and secondarily by descending date. The order in which
    'the names are sorted is irrelevant.
    dateTable.Sort Key1:=header1, Key2:=header2, Order2:=xlDescending

    'Remove all duplicate names. As far as I can tell, RemoveDuplicates keeps only the
    'topmost entry for each duplicate value.
    dateTable.RemoveDuplicates 1
End Sub

Related

'Tidy up' Oracle SQL report output

I am writing some SQL to output reports via SQL*Plus (Oracle Reflection). The output files (xlsx or lst) contain a lot of information that isn't needed in this instance, like below:
Session altered.
Enter value for 1: LIF
old 3: where a.group = '&&1'
new 3: where a.group = '123'
URN |Title |Forename |Middle Name | Surname
--------------------------------------------------------------------------------------------------
123 | Mx | Smith |Bryn | Paul
Rows Selected 1
I am looking to suppress the leading rows so that 'URN' ends up in cell A1, as well as the row of '-' characters separating the headers from the data, and finally the 'rows selected' line at the end.
Thanks in advance!
Try:
SET HEADING OFF
More info on the options here.
Edit: The option for removing the count of rows selected at the end is
SET FEEDBACK OFF
I just noticed it in the question.
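Putting this together, a minimal script preamble might look like the sketch below. Note that SET VERIFY OFF (which suppresses the "old 3: ... / new 3: ..." substitution lines shown in the question) and SET UNDERLINE OFF (which keeps the column headers but drops the '-' separator row) are additions beyond the two settings above, so check them against your SQL*Plus version:

```sql
-- Sketch of a SQL*Plus preamble to suppress the unwanted report output
SET FEEDBACK OFF    -- no "Session altered." or "N rows selected" lines
SET VERIFY OFF      -- no "old 3: ... / new 3: ..." substitution lines
SET UNDERLINE OFF   -- keep the headers but drop the '-' separator row

-- The report query follows, e.g.:
-- SELECT ... WHERE a.group = '&&1';
```

If you want the headers gone entirely rather than kept in row 1, use SET HEADING OFF instead of SET UNDERLINE OFF.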

Get row number when using "MAXIFS" function

I am using MAXIFS (or similar) to identify the wanted line in a table, but I do not need the max value itself; I need data from an adjacent column. Example:
=MAXIFS(TableComments1[CommentDate];TableComments1[T.Number];TableView1[#Number])
Basically, in this example I am searching for lines matching "Number" with the latest date. But as a next step I need to get the row number of that date so I can use INDEX to return the appropriate column (TableComments1[Comment]).
I tried different approaches, with no success.
PS: performance is also important here.
UPDATE: example lookup table "TableComments1":
T.Number | Comment | CommentDate
==============+==============+===========
SCTASK0073347 | correction | 22/07/2018
SCTASK0073347 | update 11 | 25/07/2018
SCTASK0073347 | update 2 | 21/07/2018
PS: sorting "CommentDate" is not an option here.
After days of dabbling, and after finally posting the question above, I found a solution myself. I am not sure it is the best one, but performance seems okay.
Be aware: a simpler solution is possible by sorting the table on "CommentDate". In this use case the sort order could not be guaranteed and was not desired, per the question.
Recap: in table TableView1 we want to add the most recent comment for each "Number", looked up from TableComments1, which contains the comment history:
I got the idea from another post to use a helper column for combination of 2 criteria. New table layout:
T.Number | Comment | CommentDate | Helper1
==============+==============+=============+===================
SCTASK0073347 | correction | 22/07/2018 | 43303SCTASK0073347
SCTASK0073347 | find this! | 25/07/2018 | 43306SCTASK0073347
SCTASK0073347 | update 2 | 21/07/2018 | 43302SCTASK0073347
TASK9999 | comment | 25/07/2018 | 43306TASK9999
Formula breakdown
The formula for the Helper column simply concatenates the two columns:
=[#CommentDate]&[#[T.Number]]
Let's say we want: SCTASK0073347
Note: in the helper column we have the value "43306SCTASK0073347",
where "43306" is the numerical (date serial) representation of "25/07/2018".
This will search for a match of "Number" and return the most recent "CommentDate":
=MAXIFS(TableComments1[CommentDate];TableComments1[T.Number];TableView1[#Number])
Returning "25/07/2018". Lets abbreviate the above to <<MostRecentDate>> for readability in next step(s).
This step, will search for a combination of above formula <<MostRecentDate>> & "Number" in the Helper column:
=MATCH(<<MostRecentDate>>&TableView1[#Number];TableComments1[Helper1];0)
...returning row number (2), matching the helper table value "43306SCTASK0073347".
From this point forward we use MATCH (now returning the wanted row) and INDEX, in the way a VLOOKUP would:
=INDEX(TableComments1[Comment];MATCH(<<MostRecentDate>>&TableView1[#Number];TableComments1[Helper1];0))
...returning the wanted column with desired comment "find this!".
The full/final formula includes the IFNA function to blank out lookups with no comments:
=IFNA(INDEX(TableComments1[Comment];MATCH(MAXIFS(TableComments1[CommentDate];TableComments1[T.Number];TableView1[#Number])&TableView1[#Number];TableComments1[Helper1];0));"")

How to map two column using another column data

I have five columns.
E.g.
Column 1: Name
Column 2: Surname
Column 3: Mapping
Columns 4 and 5: mapped data (Name1 and Surname1)
The columns contain data like
Name  | Surname | Mapping | Name1 | Surname1
1 ABC | 1 AAAA  | 3       | ABC   | QQQQ
2 XYZ | 2 XXXX  | 1       | XYZ   | AAAA
3 OPQ | 3 QQQQ  | 4       | OPQ   | RRRR
4 RST | 4 RRRR  | 2       | RST   | XXXX
My aim is to map the Name column to a Surname via the Mapping column, with the result stored in the Name1 and Surname1 columns. I have more data in the Name and Surname columns; when the user enters a number in the Mapping column, the corresponding surname should automatically be mapped to the name and the result copied into Name1 and Surname1.
I have no idea how to achieve this using VBA. Please help.
Amar, there are certainly plenty of ways to go about this using Excel's built in functions, however, since you asked about a VBA solution, here you go:
Function Map(n)
    Map = Cells(n + 1, 2)
End Function
Placing the above code into the VBA editor of your project will allow you to use this custom function in the same way you would any of Excel's built-in functions. That is, entering =Map(C3) into any cell should give you the result you're after (where C3 is the cell containing your mapping number). The function works by returning the data in row n + 1 (n being the mapping number, plus 1 to account for the header row), column 2 (the column containing the surname). The data in column "Name1" will always be the same as that in column "Name" (so it seems), so the formula in your "Name1" column would simply be =A2.
If this does not solve your problem, or you need further guidance, please let me know.
Supplement
@Amar, the comment by @freakfeuer is spot on. VBA is really overkill for something as simple as this and, as he points out, portability and security are both significant drawbacks. OFFSET is a fine alternative.
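For completeness, a worksheet-formula version of the same lookup (a sketch, assuming the surnames sit in column B with their header in B1, and that C2 holds the mapping number; adjust references and the argument separator to your locale):

```
=OFFSET($B$1, C2, 0)
```

This steps C2 rows down from the header cell B1, which is equivalent to what the Map function does, but without any VBA in the workbook.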

How to format SQL comparison in CSV file

I am running two SQL's on two different databases and comparing the results. I am writing the results to a csv file. Currently I am doing a 1 to 1 comparison of the results such that each element in a row of the result set is a row in the csv file.
table name | source column | source value | target value | difference type | target column
------------------------------------------------------------------------------------------
Table A    | Column A      | A001         | A001         | SAME            | Column A
Table A    | Column B      | A002         | B002         | Different      | Column B
This makes the csv files far too long, and I wish to change the output to display each row of the result sets stacked on top of each other, like this:
A001 A002 A003
A001 B002 A003
But I am not sure of a good way to indicate which columns are different (I cannot color code in a csv file). Adding a column to the end which says which columns differ is an option, but I feel like there must be a better way.
I will also take suggestions on other possible ways to format these results.
I am not sure what your final goal is.
But first, you should include a row_id at the beginning of each row, and also include which DB the row came from.
Then you can prefix each value with one additional character to indicate whether it is Equal (E-) or Not Equal (N-).
Also add a final field to indicate whether the rows as a whole are equal or not:
rowID | DB | FieldA | FieldB | FieldC | Equal
1     | A  | E-A001 | N-A002 | E-A003 | NO
1     | B  | E-A001 | N-B002 | E-A003 | NO
And if you import that csv into Excel, for example, you can filter each column for values starting with N-.

Split delimited field into Different rows

So each day I'm given an Excel worksheet with orders, they look something like this:
Date Vendor OrderID/Quantity Total
12/28/2013 Nike 1111111-8;2222222-12 20
12/29/2013 Adidas 3333333-5;4444444-10 15
12/30/2013 Wrangler 5555555-3 3
It's usable to most people I work with, but not to me, as I want to identify each OrderID separately from its quantity. The "-" after each 7-digit number separates the ID from how many units are associated with it. Essentially, when I import this table into Access, I want to create another table that splits these values:
Date Vendor OrderID Quantity
12/28/2013 Nike 1111111 8
12/28/2013 Nike 2222222 12
12/29/2013 Adidas 3333333 5
12/29/2013 Adidas 4444444 10
12/30/2013 Wrangler 5555555 3
This is much more useful to me, but it has been a daunting task to produce with two delimiters ("-" and ";"). I am OK with VBA, but I am struggling to find a solution. How would I go about doing this?
The most straightforward way I can think of is the VBA Split function. Please note that I set up the tblStaging staging table with all the fields as Text type for the import from Excel, but I set the tblOrders table up with (what I assume are) the correct types: Date as Date, Vendor as Text, OrderID as Number and Quantity as Number. See the comments in the code for details.
Public Sub SplitOrders()
    Dim rsStaging As Recordset
    Dim rsOrder As Recordset
    Dim arrOrders() As String
    Dim arrOrderDetails() As String
    Dim i As Integer

    'Rename these to whatever your tables are called
    Set rsStaging = CurrentDb.OpenRecordset("tblStaging")
    Set rsOrder = CurrentDb.OpenRecordset("tblOrders")

    rsStaging.MoveFirst
    While Not rsStaging.EOF
        'Split into an array of orders
        arrOrders = Split(rsStaging.Fields("OrderID/Quantity"), ";")
        For i = 0 To UBound(arrOrders)
            'Split the OrderID and Quantity for each order
            arrOrderDetails = Split(arrOrders(i), "-")
            'Create the new record in tblOrders
            With rsOrder
                .AddNew
                !Date = CDate(rsStaging!Date)
                !Vendor = rsStaging!Vendor
                !OrderID = CLng(arrOrderDetails(0)) 'If the OrderID can contain letters, symbols or leading zeros, omit the CLng( ... ) call
                !Quantity = CLng(arrOrderDetails(1))
                .Update
            End With
        Next
        rsStaging.MoveNext
    Wend
End Sub
I'd look to break this down into different pieces rather than trying to parse the data all at once. For example, step 1 might be to import the file to a staging table (one that looks like the first data example from your question). Step 2 would be to query the table to detect any rows that contain ';' (perhaps using the InStr function or a wildcard search like '*;*') and parse those into two or more records. Third, identify any records that do not contain ';' and parse those into single records. All of the resulting (clean) records can then go into your destination table for further analysis.
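The detection step described above can be sketched as a pair of Access queries (the table and field names are taken from the question; treat this as an illustration rather than tested code):

```sql
-- Rows whose OrderID/Quantity field holds more than one order (step 2)
SELECT * FROM tblStaging
WHERE [OrderID/Quantity] LIKE '*;*';

-- Rows holding exactly one order (step 3)
SELECT * FROM tblStaging
WHERE [OrderID/Quantity] NOT LIKE '*;*';
```

Note that Access SQL run from the Access query designer or DAO uses * as the wildcard character, as in the prose above; if you run the same query over ODBC/ADO, the wildcard is % instead.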