Include missing dates with missing values with libreoffice-calc


I searched a lot but didn't find an answer to the following question:
Financial data often come as daily data but with missing dates (weekends, bank holidays ...). I would like to have the data on a truly daily basis, with missing values wherever the dates were originally missing.
So far I have done this half-manually in LibreOffice Calc, which takes a lot of time. I haven't found a way to really automate it, as there is no fixed rule for which dates are missing.
Example:
I have:
21/12/18 1
27/12/18 2
28/12/18 3
02/01/19 4
I want:
21/12/18 1
22/12/18
23/12/18
24/12/18
25/12/18
26/12/18
27/12/18 2
28/12/18 3
29/12/18
30/12/18
31/12/18
01/01/19
02/01/19 4

I'm not familiar with LibreOffice Calc. In Excel or Google Sheets, I would use a lookup table.
In one tab of the spreadsheet, I would enter the dates I want in column A. In another tab, I would place the actual data I have. Then, in column B of the first tab, I would look up the value for each day from the data on the second tab.

Assuming 1 is in B2, put a start date in say D2 and in E2:
=IFERROR(INDEX(B:B,MATCH(D2,A:A,0)),"")
then copy both down to suit.
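A minimal sketch of the same lookup idea in Python, for checking the logic outside the spreadsheet. The names observed and fill_calendar are illustrative and not part of the answer; the data is the question's example, and None plays the role of the formula's empty string.

```python
from datetime import date, timedelta

# The question's example data: only some calendar days are present.
observed = {
    date(2018, 12, 21): 1,
    date(2018, 12, 27): 2,
    date(2018, 12, 28): 3,
    date(2019, 1, 2): 4,
}

def fill_calendar(data):
    """Return (day, value) pairs for every day from the first to the last
    observed date; days without data get None."""
    start, end = min(data), max(data)
    n_days = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(n_days)]
    # dict.get mirrors IFERROR(INDEX(..., MATCH(...)), ""): no match, no value.
    return [(day, data.get(day)) for day in days]

filled = fill_calendar(observed)
for day, value in filled:
    print(day.strftime("%d/%m/%y"), "" if value is None else value)
```

As in the spreadsheet version, the full list of dates is generated first and the values are then looked up against it, so no rule about which dates are missing is needed.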

Related

calculate value based on other column values with some step for rows of other columns

Total beginner here. If my question is irrelevant, apologies in advance; I'll remove it. My question: using pandas, I want to calculate an evolution ratio for one week's data compared with the rolling mean of the previous four weeks.
df['rolling_mean_fourweeks'] = df.rolling(4).mean().round(decimals=1)
From here I want to create a new column for the evolution ratio, comparing each week's data with the rolling mean in the previous week's row.
What is the best way to go here? (I don't have big data.) I have tried unsuccessfully with .shift(), but .shift() is very foreign to me. I should get NaN for week 3 (the fourth week) and ~47% for the fifth week.
Any suggestions for retrieving the value at the row with step -1?
Thanks and have a good day!

Your idea of using shift works perfectly well. shift(x) simply shifts a series (a full column, in your case) by x rows: a positive x moves values down, a negative x moves them up.
A simple way to check whether rolling_mean_fourweeks is a good predictor is to shift Column1 up by one row, so that each row holds the next week's value, and then see how far it differs from rolling_mean_fourweeks:
df['column1_shifted'] = df['Column1'].shift(-1)
df['rolling_accuracy'] = ((df['column1_shifted'] - df['rolling_mean_fourweeks'])
                          / df['rolling_mean_fourweeks'])
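A minimal sketch of the comparison the question asks for, using made-up weekly numbers (the values and the column name evolution_ratio are assumptions for illustration). Here the rolling mean is shifted down with shift(1) rather than shifting Column1 up with shift(-1); the ratios are the same, but each ratio stays on the row of the week being evaluated.

```python
import pandas as pd

# Hypothetical weekly values; "Column1" follows the answer's naming.
df = pd.DataFrame({"Column1": [10.0, 12.0, 8.0, 10.0, 15.0, 9.0]})

# Four-week rolling mean: NaN for the first three rows.
df["rolling_mean_fourweeks"] = df["Column1"].rolling(4).mean().round(decimals=1)

# shift(1) moves each rolling mean down one row, so every week sits
# next to the rolling mean computed at the previous week.
previous_mean = df["rolling_mean_fourweeks"].shift(1)
df["evolution_ratio"] = (df["Column1"] - previous_mean) / previous_mean

print(df)
```

With a window of four, the first usable ratio appears in the fifth row, which matches the pattern the question expects (NaN for the fourth week, a value for the fifth).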

Levenshtein for multiple words on multiple columns

I'm trying to make search a bit more friendly and wanted to exploit the Levenshtein distance. This works great, but if a value in a column is 25 characters long, its distance to a 3-character search term is too large. In this case, it performs worse than the LIKE method. I solved this by splitting all words into their own rows using regexp_split_to_table. This is nice, but it still doesn't work if I have multiple words as input.
For example:
Let the data look as follows:
id  col1     col2
1   one two  three
2   two      one
3   horse    tree
4   house    three
Using regexp_split_to_table would transform this to:
id  col
1   one
1   two
1   three
2   one
2   two
2   two
3   horse
3   tree
4   house
4   three
If I search for one tree, I'd like to compare one with each word but also compare tree with each word and then order by the sum of both distances.
I have no idea where to start. I also do not know if this is the best approach to do this (it seems somewhat excessive but I'm also not an expert). Maybe I'm also overthinking this. I'd appreciate a hint into the right direction :).
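One possible reading of the scoring described above, sketched in Python rather than SQL: for each search word, take the smallest Levenshtein distance to any word in the row, then sum those minima and sort by the total. The function names and the choice of the minimum per search word are assumptions, not something the question specifies.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

# The question's first table: id -> (col1, col2).
rows = {
    1: ("one two", "three"),
    2: ("two", "one"),
    3: ("horse", "tree"),
    4: ("house", "three"),
}

def score(search, columns):
    """Sum, over the search words, of the best distance to any row word."""
    words = " ".join(columns).split()
    return sum(min(levenshtein(term, word) for word in words)
               for term in search.split())

scores = {row_id: score("one tree", cols) for row_id, cols in rows.items()}
ranked = sorted(scores, key=scores.get)  # lowest total distance first
print(scores, ranked)
```

Taking the minimum per search word keeps a long row from being penalised for its unrelated words, which is the problem the word-splitting step was meant to solve.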

How do I tally how many times a word appears on a certain row?

I have four sets of data representing a softball schedule. Looks like this:
Day  Team 1  Team 2
M    A Team  B Team
T    C Team  D Team
....
but four times over. I want to be able to change the schedule and have it automatically tally how many times a team plays on a given day. Ideas?

You would use something like this:
=COUNTIF(2:2,"A Team")
Edit:
You can use the SUMPRODUCT() function with the math operators * and +:
=SUMPRODUCT(($A$2:$A$43=H$1)*(($B$2:$B$43=$G2)+($C$2:$C$43=$G2)))
How it works:
TRUE/FALSE are Booleans, and in arithmetic they reduce to 1/0 respectively. Multiplying with * therefore acts like AND, and adding with + acts like OR.
SUMPRODUCT iterates through the ranges and tests each criterion inside the parentheses. It first tests whether the cell in column A equals H1, returning 1 if so and 0 if not. The next part sets up the OR: if the team name is found in column B or column C of the same row, it also returns 1. 1 * 1 = 1. SUMPRODUCT adds all the 1s and 0s together, so you get the count.
If other columns also hold team names, just add those columns to the + part.
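The Boolean arithmetic can be sketched in Python, where True and False also behave as 1 and 0. The schedule below extends the question's two sample rows with one hypothetical third row, and count_games is an illustrative name.

```python
# (day, team 1, team 2); the third row is made up for illustration.
schedule = [
    ("M", "A Team", "B Team"),
    ("T", "C Team", "D Team"),
    ("M", "C Team", "A Team"),
]

def count_games(rows, day, team):
    """Per row: (day matches) * ((team 1 matches) + (team 2 matches)),
    summed over all rows, as SUMPRODUCT does."""
    return sum((d == day) * ((t1 == team) + (t2 == team))
               for d, t1, t2 in rows)

print(count_games(schedule, "M", "A Team"))
```

Each row contributes 1 only when the day matches and the team appears in either team column; every other row contributes 0.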

OK, so let's start by making your table a real table via "Home > Format as Table" and calling it "data". Then you have three columns called data[Day], data[Team 1] and data[Team 2]. For instance:
Day        Team 1  Team 2
Monday     A Team  B Team
Tuesday    C Team  D Team
Wednesday  C Team  A Team
Monday     B Team  C Team
Now comes the ugly part. You need a matrix of 7*10 (days * teams):
(Cell E1)  Team 1  Team 2  Team 3  Team 4  ...
Monday     *1
Tuesday
Wednesday
...
Formula *1:
=SUMPRODUCT((data[Day]=$E2)*((data[Team 1]=F$1)+(data[Team 2]=F$1)))
Now drag down that formula till Sunday and then copy it to the other teams (when I tried dragging it to the other teams, Excel messed up the column names!).
This will automatically fill the matrix and tell you which team plays how often on a specific day.
What does it do? Basically SUMPRODUCT can not only build products, but it can also evaluate boolean conditions. So if on Monday, Team A plays, then the first column would return (for Team A / Monday):
1*(1+0)
SUMPRODUCT does that for each line in the matrix and then sums up the result.
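As a rough cross-check of the matrix, the same per-row term can be evaluated in Python for every day and team in the sample table above. The variable names are illustrative, and only the days and teams that occur in the sample are listed.

```python
# The sample table from this answer: (day, team 1, team 2).
data = [
    ("Monday", "A Team", "B Team"),
    ("Tuesday", "C Team", "D Team"),
    ("Wednesday", "C Team", "A Team"),
    ("Monday", "B Team", "C Team"),
]
days = ["Monday", "Tuesday", "Wednesday"]
teams = ["A Team", "B Team", "C Team", "D Team"]

# One cell per (day, team): sum over rows of day_match * (t1_match + t2_match).
matrix = {
    day: {
        team: sum((d == day) * ((t1 == team) + (t2 == team))
                  for d, t1, t2 in data)
        for team in teams
    }
    for day in days
}

for day in days:
    print(day, [matrix[day][team] for team in teams])
```

For Monday and A Team, the first row contributes 1*(1+0) and every other row contributes 0, so the cell is 1.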

Quick Delta Between Two Rows/Columns in GoodData

Right now, I see there are quick ways to get things like Sum/Avg/Max/Etc. for two or more rows or columns when building a table in GoodData.
I am building a little table that shows last week and the week prior, and I'm trying to show the delta between them.
So if the first column is 100 and the second is 50, I want '-50'.
If the first column is 25 and the second is 100, I want '75'.
Is there an easy way to do this?

Suppose the first column contains the result of metric #1 and the second column contains the result of metric #2. You can simply create a metric #3 defined as (metric #1 - metric #2), or vice versa.
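The two examples in the question pin down which order gives the expected sign. A small Python check (not GoodData metric syntax; the function name delta is illustrative) shows that the asker's numbers correspond to the second column minus the first:

```python
def delta(first, second):
    """Change from the first column to the second column."""
    return second - first

print(delta(100, 50))   # the question's first example
print(delta(25, 100))   # the question's second example
```

So for this table the "vice versa" order, metric #2 minus metric #1, is the one that reproduces the expected values.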

Parse data from Morningstar Direct to worksheet

I have to put together a report every quarter using data pulled from Morningstar Direct. I have to automate the whole process, or at least parts of it. We have put this report together for the last two quarters, and we use the same format each time. So we already have the general templates for the report; now I'm just looking for a way to pull the data from Morningstar and put it into the templates correctly.
Does anyone have a general idea of where I should start?
A B C D E F
Group Name Weight Gross Net Contribution
Equity 25% 10% 8% .25
IBM 5% 15% 12%
AAPL 7% 23% 18%
Fixed Income 25% 5% 4% .17
10 Yr Bond 10% 7% 5%
Emerging Mrkts
And it goes on breaking things into more groups, and there are many more holdings within each group.
What I want it to do is search until it finds "Equity", for example, then go over one row, grab the name of the position, its weight, and its net return, and do that for each holding in Equity. Then it should do the same thing in Fixed Income, and so on, selecting the names, weights, and nets for each holding, and then copy and paste them into another workbook.
Is there any way to do that?

It sounds like you need to parse your information. By using LEFT(), RIGHT(), and MID() you can select the good data and ignore the superfluous. You could separate the data in one cell into multiple cells in the desired format.
A               B
Name            Address
John Q. Public  123 My Street, City, State, Zip
E (First Name)
=LEFT(A2,FIND(" ",A2))
F (Middle Initial) (extra work to program missing data)
=MID(A2,LEN(E2)+1,FIND(" ",MID(A2,LEN(E2)-1,99)))
G (Last Name)
=MID(A2,(LEN(E2)+LEN(F2)+2),99)
H (Street) (assumed helper column: the text before the first comma, which the offsets in the formulas below rely on)
=LEFT(B2,FIND(",",B2)-1)
I (City)
=MID(B2,LEN(H2)+2,FIND(",",MID(B2,LEN(H2)+2,99))-1)
J (State)
=MID(B2,(LEN(I2)+LEN(H2)+4),FIND(",",MID(B2,(LEN(I2)+LEN(H2)+4),99))-1)
K (Zip Code)
=MID(B2,(LEN(H2)+LEN(I2)+LEN(J2)+6),99)
These formulas parse the name in cell A2 and the address in cell B2 into separate fields.
Similar cuts should allow you to get rid of the unwanted data.
==================================================================
7/8/2015
Your data seems to be your desired output. If so, please provide sanitized input data for comparison. You probably need to loop through your input to find the groups. When the group changes, prepare the summary figures.
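The loop suggested here can be sketched in Python. The layout is an assumption, since the question's table does not make it certain: group rows are taken to have a name in the first column, and holding rows are taken to have an empty first column with the position name in the second. The function name holdings_by_group is illustrative.

```python
# Assumed layout: [Group, Name, Weight, Gross, Net, Contribution].
rows = [
    ["Group", "Name", "Weight", "Gross", "Net", "Contribution"],
    ["Equity", "", "25%", "10%", "8%", ".25"],
    ["", "IBM", "5%", "15%", "12%", ""],
    ["", "AAPL", "7%", "23%", "18%", ""],
    ["Fixed Income", "", "25%", "5%", "4%", ".17"],
    ["", "10 Yr Bond", "10%", "7%", "5%", ""],
]

def holdings_by_group(table):
    """Collect (name, weight, net) for each holding under its group."""
    result = {}
    current = None
    for group, name, weight, _gross, net, _contribution in table[1:]:
        if group:                       # a new group starts here
            current = group
            result[current] = []
        elif name and current is not None:
            result[current].append((name, weight, net))
    return result

for group, holdings in holdings_by_group(rows).items():
    print(group, holdings)
```

Once the holdings are grouped like this, writing them into the report template is a separate step that depends on how the templates are laid out.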
