Get row number when using "MAXIFS" function - excel-2016

I am using MAXIFS (or similar) to identify the wanted row in a table, but I do not need the max value itself; I need data from an adjacent column. Example:
=MAXIFS(TableComments1[CommentDate];TableComments1[T.Number];TableView1[@Number])
In this example I am searching for rows that match "Number" and have the latest date. In the next step, though, I need the row number of that date so that I can use INDEX to return the corresponding value from another column (TableComments1[Comment]).
I tried different approaches - no success.
PS: performance is also important here.
UPDATE: example lookup table "TableComments1":
T.Number      | Comment      | CommentDate
==============+==============+============
SCTASK0073347 | correction   | 22/07/2018
SCTASK0073347 | update 11    | 25/07/2018
SCTASK0073347 | update 2     | 21/07/2018
PS: sorting "CommentDate" is not an option here.

After days of dabbling and finally posting the above question, I found a solution myself. I am not sure it is the best one, but performance seems okay.
Be aware: a simpler solution is possible by sorting the table by "CommentDate". Sorting could not be guaranteed and was not wanted in this use case, as stated in the question.
Recap: in table TableView1 we want to add the most recent comment for column "Number", looked up from TableComments1, which contains the comment history.
I got the idea from another post to use a helper column that combines the two criteria. New table layout:
T.Number      | Comment      | CommentDate | Helper1
==============+==============+=============+===================
SCTASK0073347 | correction   | 22/07/2018  | 43303SCTASK0073347
SCTASK0073347 | find this!   | 25/07/2018  | 43306SCTASK0073347
SCTASK0073347 | update 2     | 21/07/2018  | 43302SCTASK0073347
TASK9999      | comment      | 25/07/2018  | 43306TASK9999
Formula breakdown
The formula for the helper column simply concatenates two columns:
=[@CommentDate]&[@[T.Number]]
Let's say we want: SCTASK0073347
Note: in the helper column we have the value "43306SCTASK0073347",
where "43306" is the numerical representation (date serial number) of the date "25/07/2018".
This searches for a match of "Number" and returns the most recent "CommentDate":
=MAXIFS(TableComments1[CommentDate];TableComments1[T.Number];TableView1[@Number])
It returns "25/07/2018". Let's abbreviate the formula above to <<MostRecentDate>> for readability in the next steps.
This step searches the helper column for the combination of <<MostRecentDate>> and "Number":
=MATCH(<<MostRecentDate>>&TableView1[@Number];TableComments1[Helper1];0)
...returning the row number (2) of the matching helper value "43306SCTASK0073347".
From this point on we combine MATCH (which now returns the wanted row) with INDEX, much as VLOOKUP would work:
=INDEX(TableComments1[Comment];MATCH(<<MostRecentDate>>&TableView1[@Number];TableComments1[Helper1];0))
...returning the desired comment "find this!" from the wanted column.
Full/final formula, including the IFNA function to return a blank when a lookup finds no comments:
=IFNA(INDEX(TableComments1[Comment];MATCH(MAXIFS(TableComments1[CommentDate];TableComments1[T.Number];TableView1[@Number])&TableView1[@Number];TableComments1[Helper1];0));"")
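For readers who want to check the logic outside Excel, the steps above can be sketched in Python. This is only an illustration of the MAXIFS, helper key, MATCH, INDEX and IFNA steps, not a replacement for the formula; the list `comments` and the function name `latest_comment` are hypothetical, and the rows are copied from the example table.

```python
from datetime import date

# Hypothetical in-memory copy of TableComments1 (rows from the example above).
comments = [
    ("SCTASK0073347", "correction", date(2018, 7, 22)),
    ("SCTASK0073347", "find this!", date(2018, 7, 25)),
    ("SCTASK0073347", "update 2", date(2018, 7, 21)),
    ("TASK9999", "comment", date(2018, 7, 25)),
]


def latest_comment(number, rows):
    """Return the comment with the latest date for `number`, or "" if none."""
    # MAXIFS step: latest date among the rows whose T.Number matches.
    dates = [d for n, _, d in rows if n == number]
    if not dates:
        return ""  # IFNA step: nothing to look up for this number
    most_recent = max(dates)
    # Helper column: one (date, number) key per row, like Helper1.
    helper = [(d, n) for n, _, d in rows]
    # MATCH step: position of the first row whose key matches exactly.
    row_index = helper.index((most_recent, number))
    # INDEX step: return the Comment column of that row.
    return rows[row_index][1]


print(latest_comment("SCTASK0073347", comments))  # find this!
```

As with MATCH in exact-match mode, if two comments for the same number share the latest date, the first one in table order is returned.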

Related

mySQL query on matching all values in stored comma separated values in column

I have been searching for quite some time and found multiple solutions for doing what I need in the opposite direction, but none going the direction I need.
I have a column of values; these are requirements that must be met to use the record.
|id | name | required |
|1  | yes  | 2,3,23   |
|2  | no   | 2,3,7    |
Before the query runs I have an array of all the existing required values; this can of course be placed into a comma-delimited string as well.
For example, I have an array (1,2,3,18,23), and I need to pull the correct record, id #1, because my array meets all of its requirements.
I am always amazed at what some queries can do, but I am not finding a solution to this one.
thanks
-Chris
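To make the matching rule in the question concrete, here is a small Python sketch of it: a record qualifies only when every value in its "required" list is present in the available array. It illustrates the rule only and is not a MySQL query; the names `records`, `available` and `usable_ids` are hypothetical, and the data is copied from the example above.

```python
# Rows from the example table; "required" is the comma-separated column.
records = [
    {"id": 1, "name": "yes", "required": "2,3,23"},
    {"id": 2, "name": "no", "required": "2,3,7"},
]

# The array of values that are available before the query runs.
available = {1, 2, 3, 18, 23}


def usable_ids(records, available):
    """Return ids of records whose required values are all available."""
    result = []
    for rec in records:
        # Split the stored string into a set of integers.
        required = {int(v) for v in rec["required"].split(",") if v.strip()}
        # Subset test: every requirement must be in the available set.
        if required <= available:
            result.append(rec["id"])
    return result


print(usable_ids(records, available))  # [1]
```

Record 2 is rejected because 7 is not in the available set.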

Transpose variable number of rows into columns in OpenRefine

I have an xml file containing records from a library catalogue. I have imported it into OpenRefine but all the values are in one column. I want to transpose it so each field in the record has its own column. However, this is complicated by the fact that a) each field is optional so does not exist in all records and b) many fields are repeatable so can appear multiple times in each record. Here's a simplified example of what the data looks like:
| RecordID | Tag  | Data            |
| 1        | 040a | CaABCD          |
| 1        | 245a | Go fish         |
| 1        | 245a | A guide to fish |
| 1        | 246i | Fish series     |
| 1        | 260a | Fishing friends |
| 2        | 040a | CaABDC          |
| 2        | 245a | Happy trails    |
| 2        | 246i | Hiking series   |
| 2        | 260i | The happy hiker |
| 2        | 500a | Notes           |
I have read the Q&A here Openrefine - Transpose rows into columns based on text but the problem with this solution is that if I concatenate all the values together I have no way to be sure what field they belong in anymore, as my data is much more complicated than the data in that question (my actual data has 25+ fields and many thousands of records).
I was able to get closer using Google Sheets and making a pivot table with a calculated field (as in PivotTable to show values, not sum of values - see the answer at the very bottom). However, I still don't know how to handle the repeating fields. In the pivot table the multiple values are there but only the first displays (double-clicking on an individual cell brings up a details table which lists all the values), so when I copy-paste the table I lose the additional values. I would like to concatenate them but I cannot see a way to do so within the pivot table.
Can you think of any other way I could do this, in OpenRefine or another tool? Thanks!
The classic way to fix this in OpenRefine is to use "Transpose -> Columnize by key value". But this feature is poorly documented and can cause headaches even for OpenRefine developers. In your case, repeated fields will be problematic, so here is a possible solution.
1° Go to the "Tag" column, click on "Transpose -> Columnize by key value" and use the following configuration (don't forget the "Note column (optional)")
The result will look like this (my dataset is not exactly the same as yours; I modified a value to run some tests)
2° In the new column "Record ID: 040 a", click on "Edit column -> Move column to beginning".
3° If you want to merge the repeated fields, go to each column that contains them and click on "Edit cells -> Join multi-valued cells", choosing a separator, for example "|".
The end result will look like this.
To get rid of unnecessary columns: click on "Export -> Custom tabular exporter" and deselect the columns whose names start with RecordId.
OpenRefine also has a native MARC importer, which might be worth trying if you need to work with MARC data in the future. MarcEdit also has some specific OpenRefine support built in.
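Since the question also asks about other tools, the same reshaping can be sketched in plain Python. This is a minimal sketch, not what OpenRefine does internally: it groups rows by record, gives each tag its own column, and joins repeated fields with "|". The function name `columnize` and the in-memory `rows` list are hypothetical; the data is the simplified example from the question.

```python
from collections import defaultdict

# (RecordID, Tag, Data) rows from the simplified example in the question.
rows = [
    (1, "040a", "CaABCD"),
    (1, "245a", "Go fish"),
    (1, "245a", "A guide to fish"),
    (1, "246i", "Fish series"),
    (1, "260a", "Fishing friends"),
    (2, "040a", "CaABDC"),
    (2, "245a", "Happy trails"),
    (2, "246i", "Hiking series"),
    (2, "260i", "The happy hiker"),
    (2, "500a", "Notes"),
]


def columnize(rows, sep="|"):
    """Return (tags, table): one dict per record, one key per tag."""
    # Collect every value per (record, tag), preserving input order.
    records = defaultdict(lambda: defaultdict(list))
    for record_id, tag, data in rows:
        records[record_id][tag].append(data)
    # Every tag seen anywhere becomes a column, even if optional.
    tags = sorted({tag for _, tag, _ in rows})
    table = {}
    for record_id, fields in records.items():
        # Missing tags become "", repeated tags are joined with sep.
        table[record_id] = {tag: sep.join(fields.get(tag, [])) for tag in tags}
    return tags, table


tags, table = columnize(rows)
print(tags)              # ['040a', '245a', '246i', '260a', '260i', '500a']
print(table[1]["245a"])  # Go fish|A guide to fish
```

Because the tag stays attached to each value as its column name, joined values can still be traced back to their field, which was the concern with plain concatenation.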

SELECT MAX values for duplicate values in another column

I am having some trouble finding an answer for this one, so I apologize if it was somewhere else.
I have a table 'dbo.MileageImport' that has the following layout which I pulled to find duplicate entries:
| KEY      | DATA   |
---------------------
| V9864653 | 180288 |
| V9864653 | 22189  |
| V9864811 | 11464  |
| V9864811 | 12688  |
What I am having trouble with is when I run the following SQL in a DB2 environment:
SELECT KEY, MIN(DATA)
FROM dbo.MileageImport
GROUP BY KEY
HAVING (COUNT(KEY)>1);
It ends up pulling the following data:
| KEY      | DATA   |
---------------------
| V9864811 | 11464  |
| V9864653 | 180288 |
For some reason it's pulling the MIN value for V9864811, but not for V9864653. If I invert that and put MAX instead of MIN, it pulls the opposite values.
Is there something I am missing here so I can pull the MIN DATA value for only duplicate KEY records, or is there another way to do this? The report where this data comes from changes from month to month, so there could be different keys that end up being duplicated that I need to correct. Ultimately I am turning this into a DELETE statement to delete the lower of the two (or more) duplicated mileage entries.
Is your DATA column numeric, or a VARCHAR? If it is a VARCHAR, MIN compares the values as strings, character by character, so '180288' sorts before '22189' because '1' comes before '2'.
If so, it is better to change the column to a numeric type if you can, for example an integer if the values are whole numbers with no fractions.
If not, you could cast the values to integers, but if there are lots of transactions or it is a big table, this will be slow and not ideal. It is bad practice to cast if you could just change the data type!
SELECT KEY, MIN(CAST(DATA as Int))
FROM dbo.MileageImport
GROUP BY KEY
HAVING (COUNT(KEY)>1)
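The difference between string and numeric comparison can be reproduced with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for DB2, and it assumes (as the answer suspects) that DATA is stored as text; SQLite spells the cast target INTEGER, and KEY is quoted because it is a keyword there.

```python
import sqlite3

# In-memory SQLite database standing in for DB2 (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE MileageImport ("KEY" TEXT, "DATA" TEXT)')
conn.executemany(
    "INSERT INTO MileageImport VALUES (?, ?)",
    [
        ("V9864653", "180288"),
        ("V9864653", "22189"),
        ("V9864811", "11464"),
        ("V9864811", "12688"),
    ],
)

# Same query shape as in the question; only the MIN argument changes.
query = """
    SELECT "KEY", MIN({expr})
    FROM MileageImport
    GROUP BY "KEY"
    HAVING COUNT("KEY") > 1
    ORDER BY "KEY"
"""

# Text comparison: '180288' < '22189' because '1' sorts before '2'.
as_text = conn.execute(query.format(expr='"DATA"')).fetchall()
# Numeric comparison after the cast: 22189 < 180288.
as_int = conn.execute(query.format(expr='CAST("DATA" AS INTEGER)')).fetchall()

print(as_text)  # [('V9864653', '180288'), ('V9864811', '11464')]
print(as_int)   # [('V9864653', 22189), ('V9864811', 11464)]
```

The text result matches the unexpected output in the question, which supports the guess that the column is not numeric.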

MariaDB - embed function to automatically sum columns and store result?

is it possible to store a function IN the table to automatically sum a group of columns and store the result in a final column?
ie:
+----+------------+-----------+-------------+------------+
| id | appleCount | pearCount | bananaCount | totalFruit |
+----+------------+-----------+-------------+------------+
|  1 |        300 |        60 |         120 |        480 |
+----+------------+-----------+-------------+------------+
where the column totalFruit is automatically calculated from the previous three columns and updated as the other columns update. in this specific application, there is ONLY going to be the one row. it would be spanky-handy to be able to just push the updated counts and then pull the calculated total out. i seem to recall reading about this ability somewhere, but for the life of me, i can't recall where... :poop:
if there is no way to do this, that's cool. but if there is... :smile:
TIA!
WR!
Yes, it is possible. But is it worth it? It is simple enough to do
SELECT ...
appleCount + pearCount + bananaCount AS totalFruit
...
See MariaDB Generated Columns for how to generate the extra column, either as a real (stored) extra column or as a "virtual" one. What version of MariaDB are you using? The feature has changed a number of times across versions.
(MySQL users: 5.7.6 has a similar feature, MySQL Generated Columns.)
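The compute-on-read option from the start of this answer can be sketched as follows. The sketch uses Python's built-in sqlite3 module as a stand-in for MariaDB so that it is runnable, and it wraps the sum in a view rather than a generated column; the table name `fruit` and view name `fruit_totals` are hypothetical. The total is never stored, so it always reflects the current counts.

```python
import sqlite3

# In-memory SQLite database standing in for MariaDB (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fruit ("
    " id INTEGER PRIMARY KEY,"
    " appleCount INTEGER, pearCount INTEGER, bananaCount INTEGER)"
)
conn.execute("INSERT INTO fruit VALUES (1, 300, 60, 120)")

# The view computes totalFruit whenever it is read.
conn.execute(
    "CREATE VIEW fruit_totals AS"
    " SELECT id, appleCount, pearCount, bananaCount,"
    "        appleCount + pearCount + bananaCount AS totalFruit"
    " FROM fruit"
)


def total_fruit():
    """Read the computed total for the single row."""
    return conn.execute(
        "SELECT totalFruit FROM fruit_totals WHERE id = 1"
    ).fetchone()[0]


before = total_fruit()
# Push an updated count; the total follows without any extra write.
conn.execute("UPDATE fruit SET pearCount = 70 WHERE id = 1")
after = total_fruit()

print(before, after)  # 480 490
```

A generated column gives the same behaviour from inside the table definition itself, which is closer to what the question asks for.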

sqlite variable and unknown number of entries in column

I am sure this question has been asked before, but I'm so new to SQL, I can't even combine the correct search terms to find an answer! So, apologies if this is a repetition.
The db I'm creating has to be created at run-time, then the data is entered after creation. Some fields will have a varying number of entries, but the number is unknown at creation time.
I'm struggling to come up with a db design to handle this variation.
As an (anonymised) example, please see below:
| salad_name | salad_type | salad_ingredients          | salad_cost |
| apple      | fruity     | apple                      | cheap      |
| unlikely   | meaty      | sausages, chorizo          | expensive  |
| normal     | standard   | leaves, cucumber, tomatoes | mid        |
As you can see, the contents of "salad_ingredients" varies.
My thoughts were:
Just enter a single, comma-separated string and split it at run-time. Seems hacky, and I couldn't search by salad_ingredients!
Have another table for each salad, such as "apple_ingredients", with one row per ingredient and so a varying number of rows. However, I can't do this, because I don't know the salad_name at creation time! :(
Have a separate salad_ingredients table, where each row is a salad_name and there is an arbitrary number of ingredient fields, say 10, so you could have up to 10 ingredients. Again, this seems slightly hacky, as I don't like unused fields, and what happens if a super-complicated salad comes along?
Is there a solution that I've missed?
Thanks,
Dan
In my experience, the best solution is a normalized set of tables:
table salads
    id
    salad_name
    salad_type
    salad_cost
table ingredients
    id
    name
and
table salad_ingredients
    id
    id_salad
    id_ingredients
where id_salad is the corresponding id from salads
and id_ingredients is the corresponding id from ingredients.
Using proper joins you can get (SELECT) and filter (WHERE) all the values you need.
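The three-table layout above can be sketched with Python's built-in sqlite3 module. Table and column names follow the answer; the helper functions `add_salad` and `salads_with`, the column types and the UNIQUE constraint on ingredient names are assumptions added for illustration. Each salad gets as many salad_ingredients rows as it needs, so the number of ingredients does not have to be known when the tables are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE salads (
        id INTEGER PRIMARY KEY,
        salad_name TEXT, salad_type TEXT, salad_cost TEXT
    );
    CREATE TABLE ingredients (
        id INTEGER PRIMARY KEY,
        name TEXT UNIQUE
    );
    CREATE TABLE salad_ingredients (
        id INTEGER PRIMARY KEY,
        id_salad INTEGER REFERENCES salads(id),
        id_ingredients INTEGER REFERENCES ingredients(id)
    );
""")


def add_salad(name, kind, cost, ingredient_names):
    """Insert a salad and link it to any number of ingredients."""
    salad_id = conn.execute(
        "INSERT INTO salads (salad_name, salad_type, salad_cost)"
        " VALUES (?, ?, ?)",
        (name, kind, cost),
    ).lastrowid
    for ing in ingredient_names:
        # Reuse the ingredient row if it already exists.
        conn.execute("INSERT OR IGNORE INTO ingredients (name) VALUES (?)", (ing,))
        ing_id = conn.execute(
            "SELECT id FROM ingredients WHERE name = ?", (ing,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO salad_ingredients (id_salad, id_ingredients)"
            " VALUES (?, ?)",
            (salad_id, ing_id),
        )


def salads_with(ingredient):
    """Return names of salads that contain the given ingredient."""
    rows = conn.execute(
        "SELECT s.salad_name FROM salads AS s"
        " JOIN salad_ingredients AS si ON si.id_salad = s.id"
        " JOIN ingredients AS i ON i.id = si.id_ingredients"
        " WHERE i.name = ? ORDER BY s.salad_name",
        (ingredient,),
    ).fetchall()
    return [r[0] for r in rows]


add_salad("apple", "fruity", "cheap", ["apple"])
add_salad("unlikely", "meaty", "expensive", ["sausages", "chorizo"])
add_salad("normal", "standard", "mid", ["leaves", "cucumber", "tomatoes"])

print(salads_with("chorizo"))  # ['unlikely']
```

Searching by ingredient becomes an ordinary join with a WHERE clause, which the comma-separated string approach could not offer.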