Return multiple rows before and after the match row based on time span in Excel/VBA - vba

I have the following kind of data:
+---------------+-------------------------+----
| time | item | line index number
+---------------+-------------------------+----
| 05:00:00 | | 1
| 05:00:01 | MatchingValue | 2
| 05:15:00 | | 3
| 06:00:00 | B | 4
| 06:01:00 | | 5
| 06:45:00 | | 6
| 07:00:00 | MatchingValue | 7
| 07:15:00 | | 8
| 08:00:00 | | 9
| 09:00:00 | | 10
+---------------+-------------------------+
What I am trying to do is extract the rows before and after each matching row (item == "MatchingValue"), together with the matching row itself. The returned rows must fall within 15 minutes of the time of a matching row.
For example, if I search for "MatchingValue" in the 2nd column, I would like to get rows 1, 2, 3 and 6, 7, 8.
I know that one can return rows 2 and 7 at the same time by using an array formula (e.g. INDEX and MATCH), but I really don't know how to use an array formula for my own question.
I appreciate any assistance.

The easiest way is to add a helper column and filter your data in place, or just use a pivot table to get only the data you need.
Formula in your helper column: =OR(B2="MatchingValue",COUNTIFS(B:B,"MatchingValue",A:A,">="&A2-1/(24*4),A:A,"<="&A2+1/(24*4))>0)
Here 1/(24*4) is 15 minutes expressed as a fraction of a day. Of course you can also write an array formula to collect your data in a new range, but given your already complex criteria and the variable number of results, that would be a really complex formula.

Related

Pandas apply to a range of columns

Given the following dataframe, I would like to add a fifth column that contains a list of column headers when a certain condition is met on a row, but only for a range of dynamically selected columns (i.e. a subset of the dataframe).
| North | South | East | West |
|-------|-------|------|------|
| 8 | 1 | 8 | 6 |
| 4 | 4 | 8 | 4 |
| 1 | 1 | 1 | 2 |
| 7 | 3 | 7 | 8 |
For instance, given that the inner two columns ('South', 'East') are selected and that column headers are to be returned when the row contains the value of one (1), the expected output would look like this:
Headers
|---------------|
| [South] |
| |
| [South, East] |
| |
The following one-liner manages to return column headers for the entire dataframe.
df['Headers'] = df.apply(lambda x: df.columns[x==1].tolist(),axis=1)
I tried adding the dynamic column range condition by using iloc but to no avail. What am I missing?
For reference, these are my two failed attempts (N1 and N2 being column range variables here)
df['Headers'] = df.iloc[N1:N2].apply(lambda x: df.columns[x==1].tolist(),axis=1)
df['Headers'] = df.apply(lambda x: df.iloc[N1:N2].columns[x==1].tolist(),axis=1)
This works:
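import pandas as pd

df = pd.DataFrame({'North': [8, 4, 1, 7], 'South': [1, 4, 1, 3],
                   'East': [8, 8, 1, 7], 'West': [6, 4, 2, 8]})
# Reshape to long format (one row per cell), keeping the original row index
df1 = df.melt(ignore_index=False)
# Keep only cells in the selected columns whose value is 1
condition1 = df1['variable'] == 'South'
condition2 = df1['variable'] == 'East'
condition3 = df1['value'] == 1
df1 = df1.loc[(condition1 | condition2) & condition3]
# Collect the matching column names per original row and join them back
df1 = df1.groupby(df1.index)['variable'].apply(list).rename('Headers')
df = df.join(df1)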
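As for why the two attempts in the question fail: df.iloc[N1:N2] slices rows, not columns. A minimal sketch of the column-range variant follows, assuming N1 and N2 are positional column bounds (the values 1 and 3 are chosen here only so that 'South' and 'East' are selected). Unlike the melt approach, rows without a match get an empty list.

```python
import pandas as pd

df = pd.DataFrame({'North': [8, 4, 1, 7], 'South': [1, 4, 1, 3],
                   'East': [8, 8, 1, 7], 'West': [6, 4, 2, 8]})

# Assumed column bounds: positions 1 and 2, i.e. 'South' and 'East'
N1, N2 = 1, 3

# iloc[:, N1:N2] selects a column range; iloc[N1:N2] would select rows
subset = df.iloc[:, N1:N2]

# For each row of the subset, list the column labels whose value equals 1
df['Headers'] = subset.apply(
    lambda row: row.index[(row == 1).to_numpy()].tolist(), axis=1
)
print(df)
```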

Comparing every row in table with the master row

I have a Redshift table with a single VARCHAR column named "Test" and several float columns. The "Test" column has unique values; one of them is "Control", and the others are not hardcoded.
The table has ~10 rows (not static) and ~10 columns.
I need to generate a Looker report that shows the original data and the difference between the corresponding float columns in "Control" and the other tests.
Input Example:
Test | Metric_1 | Metric_2
----------------------------
Control| 10 | 100
A | 12 | 120
B | 8 | 80
The desirable report:
| Control | A | A-Control | B | B-Control
|---------|----|-----------|---|-----------
Metric_1 | 10 | 12 | 2 | 8 | -2
Metric_2 | 100 | 120| 20 | 80| -20
To calculate the difference between each row and "Control", I tried:
SELECT T.test,
T.metric_1 - Control.metric_1 AS DIFF1,
T.metric_2 - Control.metric_2 AS DIFF2,
...
FROM T, (SELECT * FROM T WHERE test='Control') AS Control
I can do part of the work in Looker (it can transpose) and part in SQL, but I still cannot figure out how to build this report.
You could pivot on the test dimension, which builds part of the report:
| Control | A | B |
|---------|----|---|
Metric_1 | 10 | 12 | 8 |
Metric_2 | 100 | 120| 80|
Then operate on top of these results using table calculations.
You can use the functions pivot_where() or pivot_index().
For example, pivot_where(test = 'A', metric) - pivot_where(test = 'Control', metric)
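To make the target shape concrete, the following Python sketch computes the desired report from the input example: pivot on the test name, then subtract the "Control" value per metric. It only illustrates the arithmetic; it is not Looker or Redshift code, and the function name build_report is made up for this sketch.

```python
# Input example from the question, one dict per table row
rows = [
    {"test": "Control", "metric_1": 10, "metric_2": 100},
    {"test": "A", "metric_1": 12, "metric_2": 120},
    {"test": "B", "metric_1": 8, "metric_2": 80},
]

def build_report(rows, control_name="Control"):
    """Pivot tests into columns and add a '<test>-Control' difference column each."""
    by_test = {row["test"]: row for row in rows}
    control = by_test[control_name]
    metrics = [key for key in control if key != "test"]
    report = {}
    for metric in metrics:
        line = {control_name: control[metric]}
        for name, row in by_test.items():
            if name == control_name:
                continue
            line[name] = row[metric]
            line[f"{name}-{control_name}"] = row[metric] - control[metric]
        report[metric] = line
    return report

report = build_report(rows)
for metric, line in report.items():
    print(metric, line)
```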

Creating a view that joins multiple tables on an ID and a timestamp that needs to be rounded

I have a web application that sends data to my SQLite database, into different tables depending on the information. I would like to make a view that merges multiple tables together based on cownumber and ts (timestamp). There are no updates to my tables, so a change to the same cownumber sends the full record as a new entry with a new timestamp. The AJAX calls are made table by table, so the timestamps do not sync up exactly; they are generally 5-20 seconds off, depending on the connection.
Here is a sample of the three tables
+----master_animal-----+
+----------------------------------------------------+
| cownumber | height | weight | ts |
+-----------+----------+--------+--------------------+
| 1 | 150 | ... | 2017-12-01 12:28:00|
| 2 | 170 | ... | 2017-12-03 17:16:00|
| 3 | 60 | ... | 2017-12-03 08:09:00|
| 4 | 109 | ... | 2017-12-04 23:23:00|
+----animal_inventory-----+
+-------------------------------------------------------------+
| cownumber | brandlocation| dateacquired| ts |
+-----------+--------------+-------------+--------------------+
| 1 | ... | ... | 2017-12-01 12:28:50|
| 2 | ... | ... | 2017-12-03 17:16:30|
| 3 | ... | ... | 2017-12-03 08:09:12|
| 4 | ... | ... | 2017-12-04 23:23:23|
+----experiment-----+
+-------------------------------------------------------------+
| cownumber | ageatwean | birthweight | ts |
+-----------+--------------+-------------+--------------------+
| 1 | ... | ... | 2017-12-01 12:28:20|
| 2 | ... | ... | 2017-12-03 17:16:41|
| 3 | ... | ... | 2017-12-03 08:09:24|
| 4 | ... | ... | 2017-12-04 23:23:11|
The View I wrote
CREATE VIEW testing
AS SELECT a.height,a.weight,a.cownumber,
b.brandlocation,b.dateacquired,
c.ageatwean,c.birthweight
FROM master_animal a, animal_inventory b, experiment c
WHERE a.cownumber=b.cownumber
AND ROUND(a.ts/10000) = ROUND(b.ts/10000)
AND a.cownumber=c.cownumber
AND ROUND(a.ts/10000) = ROUND(c.ts/10000);
The query I wrote
Select * from testing where cownumber = 1;
What I was hoping to get back was
+----testing-----+
+----------------------------------------------------+
| cownumber | height | weight | brandlocation| dateacquired | ageatwean |birthweight |
+-----------+--------+--------+--------------+--------------+-----------+------------+
| 941 | 0 | ... | ... | ... | ... | .. |
There should be one row for cownumber 941 as long as all the correlated records are within a few seconds of each other. I am not sure whether I need to divide by 10000 or by something smaller. Parts of the same record should be no more than 50 seconds apart; anything more than 50 seconds apart should be considered a new record.
When I test this with only one record for that cownumber, it works fine. But let's say I change some information in each table: I provide a new height and a new brandlocation.
Instead of getting two rows (the first being the initial data entry and the second showing the same cownumber with the changed values), I get back 8 rows with partial changes.
height|weight|cownumber|brandlocation|dateacquired|ageatwean|birthweight|
0.0|0.0|941|0|0|0.0|0
0.0|0.0|941|0|0|0.0|0
0.0|0.0|941|Left Hip|0|0.0|0
0.0|0.0|941|Left Hip|0|0.0|0
50.0|0.0|941|0|0|0.0|0
50.0|0.0|941|0|0|0.0|0
50.0|0.0|941|Left Hip|0|0.0|0
50.0|0.0|941|Left Hip|0|0.0|0
I assume the issue is in my WHERE clause, but I am not sure exactly how to fix it.
The timestamps are stored as strings. When you try to divide one, the database tries to convert it to a number, which results in 2017 (the leading digits of the string). So all timestamps end up being the same.
Dividing cannot determine the distance anyway; the values 9999 and 10000 end up in different buckets although they are right next to each other. (And an integer division gives an integer result, so the ROUND() has no effect.)
To compute the distance, convert the timestamp into a number of seconds first, and then use abs():
SELECT ...
FROM master_animal m
JOIN animal_inventory i ON m.cownumber = i.cownumber
AND abs(strftime('%s', m.ts) - strftime('%s', i.ts)) <= 50
JOIN experiment e ON m.cownumber = e.cownumber
AND abs(strftime('%s', m.ts) - strftime('%s', e.ts)) <= 50;
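The join above can be tried against an in-memory database. The following is a minimal sketch using Python's sqlite3 module, with a reduced set of columns. The first set of timestamps for cownumber 1 comes from the sample tables in the question; the second set and all non-timestamp values except 150 and 'Left Hip' are made up to simulate a later change to the same animal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE master_animal (cownumber INTEGER, height INTEGER, ts TEXT);
CREATE TABLE animal_inventory (cownumber INTEGER, brandlocation TEXT, ts TEXT);
CREATE TABLE experiment (cownumber INTEGER, ageatwean INTEGER, ts TEXT);

-- First entry: timestamps from the question's sample tables
INSERT INTO master_animal VALUES (1, 150, '2017-12-01 12:28:00');
INSERT INTO animal_inventory VALUES (1, 'Left Hip', '2017-12-01 12:28:50');
INSERT INTO experiment VALUES (1, 200, '2017-12-01 12:28:20');

-- Hypothetical later change to the same cownumber
INSERT INTO master_animal VALUES (1, 155, '2017-12-02 09:00:00');
INSERT INTO animal_inventory VALUES (1, 'Right Hip', '2017-12-02 09:00:10');
INSERT INTO experiment VALUES (1, 210, '2017-12-02 09:00:05');
""")

# Join on cownumber and on timestamps at most 50 seconds apart
query = """
SELECT m.cownumber, m.height, i.brandlocation, e.ageatwean
FROM master_animal m
JOIN animal_inventory i ON m.cownumber = i.cownumber
  AND abs(strftime('%s', m.ts) - strftime('%s', i.ts)) <= 50
JOIN experiment e ON m.cownumber = e.cownumber
  AND abs(strftime('%s', m.ts) - strftime('%s', e.ts)) <= 50
ORDER BY m.ts
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With this data the query returns one row per entry (two in total) rather than every combination of the six records.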

Combine column x to n in OpenRefine

I have a table with an unknown number of columns, and I need to combine all columns after a certain point. Consider the following:
| A | B | C | D | E |
|----|----|---|---|---|
| 24 | 25 | 7 | | |
| 12 | 3 | 4 | | |
| 5 | 5 | 5 | 5 | |
Columns A-C are known, and the information in them is correct. But columns D to N (an unknown number of columns starting with D) need to be combined, as they are all parts of the same string. How can I combine an unknown number of columns in OpenRefine?
As some columns may have empty cells (the string may be of various lengths), I also need to disregard empty cells.
There is a two-step approach to this that should work for you.
From the first column you want to merge (column D in this case), choose Transpose->Transpose cells across columns into rows
You will be asked to set some options. You'll want to choose 'From Column' D and 'To Column' N. Then choose to transpose into One Column, assign a name to that column, and make sure the option to 'Ignore Blank Cells' is checked (it should be checked by default). Then click Transpose.
You'll get the values that were previously in cols D-N appearing in rows. e.g.
| A | B | C | D | E | F |
|----|----|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 | 6 |
Transposes to:
| A | B | C | new |
|----|----|---|-----|
| 1 | 2 | 3 | 4 |
| | | | 5 |
| | | | 6 |
You can then use the dropdown menu at the head of the 'new' column to choose
Edit cells->Join multi-valued cells
You'll be asked what separator you want to use between the values in the joined cell. In your use case you can probably delete the separator and combine the cells without any joining characters. This will give you:
| A | B | C | new |
|----|----|---|-----|
| 1 | 2 | 3 | 456 |
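The net effect of the two steps (transpose, then join) can be illustrated in a few lines of Python. This is only a sketch of the resulting transformation on plain lists, not OpenRefine code; the function name combine_trailing_columns and the second sample row are made up for the illustration.

```python
def combine_trailing_columns(row, first_index, separator=""):
    """Keep row[:first_index] as-is and join the remaining non-blank cells."""
    head = list(row[:first_index])
    # Skip blank cells, as the 'Ignore Blank Cells' option does
    tail = [str(cell) for cell in row[first_index:] if cell not in ("", None)]
    return head + [separator.join(tail)]

table = [
    [1, 2, 3, 4, 5, 6],        # the example row from the answer
    [1, 2, 3, "x", "", "z"],   # hypothetical row with a blank cell
]

# Columns A-C are positions 0-2, so the merge starts at index 3 (column D)
combined = [combine_trailing_columns(row, 3) for row in table]
print(combined)
```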

Calculate the difference between two non-adjacent columns, based on a "match column" using Excel VBA

I'm looking for the most efficient way to compare two sets of two columns, thus:
Set 1:
A | B | C |
11_22 | 10 | |
33_44 | 20 | |
55_66 | 30 | |
77_88 | 40 | |
99_00 | 50 | |
Set 2:
J | K |
33_44 | 19 |
99_00 | 47 |
77_88 | 40 |
For each match between column A and J, column C should display the difference between the corresponding cells in B and K (in this case for 33_44, 99_00, and 77_88), with the full amount from column B if no match exists in J:
A | B | C
11_22 | 10 | 10
33_44 | 20 | 1
55_66 | 30 | 30
77_88 | 40 | 0
99_00 | 50 | 3
I'm thinking of creating two multi-dimensional arrays containing the values in the ranges (A, B) and (J, K) and comparing them with a nested loop, but I am not sure how to get the result back into column C when a match occurs. Creating a third "result array" and outputting that on a fresh sheet would work too.
It is possible to do a lot with ADO, for example: Excel VBA to match and line up rows
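Independent of ADO, the matching logic in the question can be sketched with a key-to-value lookup instead of a nested loop. The Python below only illustrates that logic using the question's sample data; it is not VBA, and the function name differences is made up. In VBA, a Scripting.Dictionary could play the role of the dict.

```python
# Sample data from the question: (key, amount) pairs
set1 = [("11_22", 10), ("33_44", 20), ("55_66", 30), ("77_88", 40), ("99_00", 50)]
set2 = [("33_44", 19), ("99_00", 47), ("77_88", 40)]

def differences(set1, set2):
    """Return (key, B, C) rows where C = B - K on a match, else the full B."""
    lookup = dict(set2)  # column J -> column K
    return [(key, b, b - lookup.get(key, 0)) for key, b in set1]

result = differences(set1, set2)
for key, b, c in result:
    print(key, b, c)
```

Building the lookup once means each row of Set 1 needs a single lookup rather than a scan of Set 2.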