Pattern sequence in Google Sheets - sequence

I have a repeating pattern (car, apple, chair). I want to insert it "n" times in another column. How can I do this with a formula?
Colors don't matter; I just want to repeat the text pattern (car, apple, chair, car, apple, chair...).

Put this formula in cell G2 and it will fill both columns G and H automatically (H1 holds the total number of rows to generate):
={sequence(H1),TRANSPOSE(split(rept(join(";",A2:A)&";",H1/counta(A2:A)),";"))}
The same formula, formatted:
={
  sequence(H1),
  TRANSPOSE(
    split(
      rept(
        join(";",A2:A)&";",
        H1/counta(A2:A)
      ),
      ";")
  )
}
If you only want the formula for column H, here it is:
=TRANSPOSE(split(rept(join(";",A2:A)&";",H1/counta(A2:A)),";"))
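For readers who want to check the expected output outside the spreadsheet, here is a minimal Python sketch of the same idea: cycle through the pattern until the requested number of rows is reached, numbering the rows as sequence(H1) does. The function name repeat_pattern and the row count of 7 are illustrative assumptions, not part of the original answer.

```python
from itertools import cycle, islice

def repeat_pattern(pattern, total_rows):
    """Return (row_number, item) pairs, cycling through pattern."""
    # islice stops the endless cycle after total_rows items.
    items = islice(cycle(pattern), total_rows)
    # Number rows from 1, like sequence(H1) in the sheet formula.
    return list(enumerate(items, start=1))

rows = repeat_pattern(["car", "apple", "chair"], 7)
for number, item in rows:
    print(number, item)
```

Unlike the sheet formula, which divides H1 by the pattern length, this sketch also accepts a row count that is not a multiple of the pattern length and simply stops partway through the last repetition.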

R code for matching multiple strings in two columns and returning into a third separated by a comma

I have two data frames. The first has two columns, b and c (first and job in the example below), each holding multiple strings separated by commas. The second has three columns: one that includes all the strings in column b, one that includes all the strings in column c, and a third holding the resulting string I want to use.
x <- data.frame("uuid" = 1:2, "first" = c("jeff,fred,amy","tina,cat,dog"), "job" = c("bank teller,short cook, sky diver, no job, unknown job","bank clerk,short pet, ocean diver, hot job, rad job"))
x1 <- data.frame("meta" = c("ace", "king", "queen", "jack", 10, 9, 8,7,6,5,4,3), "first" = c("jeff","jeff","fred","amy","tina","cat","dog","fred","amy","tina","cat","dog"), "job" = c("bank teller","short cook", "sky diver", "no job", "unknown job","bank clerk","short pet", "ocean diver", "hot job", "rad job","bank teller","short cook"))
The result would be
result <- data.frame("uuid" = 1:2, "combined" = c("ace,king,queen,jack","5,9,8"))
Thank you in advance!
I tried to beat my head against the wall and it didn't help
Edit: The question below is the first half of the puzzle, but it does not find all matches and concatenate them in a cell; it only returns the first match found.
Is there a way to exactly match a string in one column with couple of strings in another column in R?
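One way to read the expected result is sketched below in plain Python (not R). The matching rule is an assumption inferred from the sample output, not something the question states: a row of x1 counts as a match when its first value is one of the names in the x row and its job value is one of the jobs in that same x row, and matches are listed in the order the names appear. The helper names split_list and combine are hypothetical.

```python
def split_list(text):
    """Split a comma-separated cell and trim stray spaces around each item."""
    return [part.strip() for part in text.split(",")]

# Rows of x as (uuid, first, job), copied from the question.
x = [
    (1, "jeff,fred,amy", "bank teller,short cook, sky diver, no job, unknown job"),
    (2, "tina,cat,dog", "bank clerk,short pet, ocean diver, hot job, rad job"),
]

# Rows of x1 as (meta, first, job), copied from the question.
x1 = [
    ("ace", "jeff", "bank teller"), ("king", "jeff", "short cook"),
    ("queen", "fred", "sky diver"), ("jack", "amy", "no job"),
    ("10", "tina", "unknown job"), ("9", "cat", "bank clerk"),
    ("8", "dog", "short pet"), ("7", "fred", "ocean diver"),
    ("6", "amy", "hot job"), ("5", "tina", "rad job"),
    ("4", "cat", "bank teller"), ("3", "dog", "short cook"),
]

def combine(first_cell, job_cell, lookup):
    """Collect every meta whose (first, job) pair matches, in name order."""
    names = split_list(first_cell)
    jobs = set(split_list(job_cell))
    hits = [
        meta
        for name in names                  # keep the order of names in the cell
        for meta, first, job in lookup     # scan every lookup row, not just the first hit
        if first == name and job in jobs
    ]
    return ",".join(hits)

result = [(uuid, combine(first, job, x1)) for uuid, first, job in x]
print(result)
```

With the sample data this reproduces the combined column shown in the question, which suggests the assumed rule is at least consistent with what was asked.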

Can OpenRefine easily do One Hot Encoding?

I have a dataset like a multiple-choice quiz result. One of the fields is semicolon-delimited. I would like to break it into true/false columns.
Input
Student  Answers
Alice    B;C
Bob      A;B;D
Carol    A;D
Desired Output
Student  A      B      C      D
Alice    False  True   True   False
Bob      True   True   False  True
Carol    True   False  False  True
I've already tried "Split multi-valued cells" and "Split into several columns", but these don't give me what I would like.
I'm aware that I could do a custom grel/python/jython along the lines of "if value in string: return true" for each value, but I was hoping there would be a more elegant solution.
Can anyone suggest a starting point?
GREL in OpenRefine has a somewhat limited number of data structures, but you can still build simple algorithms with it.
For your encoding you need two data structures:
a list (technically an array) of all available categories.
a list of the categories in the current cell.
With this you can check, for each category, whether it is present in the current cell or not.
Assuming that the number of available categories is manageable,
I will use a hard-coded list ["A", "B", "C", "D"].
The list of categories in the current cell we get via value.split(/\s*;\s*/).
Note that I am using an array instead of string matching
and use splitting with a regular expression considering whitespace.
This is mainly defensive programming and hopefully the algorithm will still be understandable.
So let's wrap this all together into a GREL expression and create a new column (or transform the current one):
with(
  value.split(/\s*;\s*/),
  cell_categories,
  forEach(
    ["A", "B", "C", "D"],
    category,
    if(cell_categories.inArray(category), 1, 0)))
.join(";")
You can then split the new column into several columns using ; as the separator.
You have to assign the new column names manually (sorry).
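To make the logic of the GREL expression easier to follow, here is a rough Python equivalent run against the sample data from the question. It is only a sketch of the same algorithm (split the cell on semicolons with optional whitespace, then test each hard-coded category); the names CATEGORIES and encode are illustrative and do not exist in OpenRefine.

```python
import re

# Hard-coded category list, as in the GREL expression.
CATEGORIES = ["A", "B", "C", "D"]

def encode(answers, categories=CATEGORIES):
    """Return one 0/1 flag per category for a semicolon-delimited cell."""
    # Split on ';' and tolerate whitespace around it, like /\s*;\s*/ in GREL.
    chosen = set(re.split(r"\s*;\s*", answers.strip()))
    return [1 if category in chosen else 0 for category in categories]

rows = {"Alice": "B;C", "Bob": "A;B;D", "Carol": "A;D"}
encoded = {student: encode(answers) for student, answers in rows.items()}

for student, flags in encoded.items():
    print(student, ";".join(str(flag) for flag in flags))
```

The printed lines correspond to the joined string that the GREL expression writes into the new column before it is split on the semicolon.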
Update: here is a more elaborate version to automatically extract the categories.
The idea is to create a single record for the whole dataset to be able to access all the entries in the column "Answers" and then extract all available categories from it.
Create a new column "Record" with content "Record".
Move the column "Record" to the beginning.
Blank down the column "Record".
Add a new column "Categories" based on the column "Answers" with the following GREL expression:
if(row.index>0, "",
  row.record.cells["Answers"].value
    .join(";")
    .split(/\s*;\s*/)
    .uniques()
    .sort()
    .join(";"))
Fill down the column "Categories".
Add a new column "Encoding" based on the column "Answers" with the following GREL expression:
with(
  value.split(/\s*;\s*/),
  cell_categories,
  forEach(
    cells["Categories"].value.split(";"),
    category,
    if(cell_categories.inArray(category), 1, 0)))
.join(";")
Split the column "Encoding" on the character ;.
Delete the columns "Record" and "Categories".
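The steps above boil down to two passes over the column: first collect the sorted set of all categories, then encode each cell against that set. A minimal Python sketch of that two-pass idea follows; it is not OpenRefine code, and the function names split_answers, derive_categories and one_hot are hypothetical.

```python
import re

SEPARATOR = re.compile(r"\s*;\s*")

def split_answers(cell):
    """Split one cell on ';', ignoring surrounding whitespace and empty parts."""
    return [part for part in SEPARATOR.split(cell.strip()) if part]

def derive_categories(cells):
    """First pass: every distinct value in the column, sorted."""
    return sorted({part for cell in cells for part in split_answers(cell)})

def one_hot(cells):
    """Second pass: one 0/1 flag per derived category for each cell."""
    categories = derive_categories(cells)
    table = [
        [1 if category in split_answers(cell) else 0 for category in categories]
        for cell in cells
    ]
    return categories, table

answers = ["B;C", "A;B;D", "A;D"]
categories, table = one_hot(answers)
print(categories)
for flags in table:
    print(flags)
```

Because the categories are derived from the data, a new answer option appearing in the column adds a new flag without any change to the code, which is the point of the more elaborate GREL version.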

using list as an argument in groupby() in pandas and none of the key elements match column or index names

I have a DataFrame of random values as below, and a book I am studying uses a list as the groupby key (key_list). How is the DataFrame grouped in this case, since none of the list values match column or index names? The last two lines are confusing to me.
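import numpy as np
import pandas as pd
people = pd.DataFrame(np.random.randn(5,5), columns = ['a','b','c','d','e'], index=['Joe','Steve','Wes','Jim','Travis'])
key_list = ['one','one','one','two','two']
# group rows by position: row i belongs to group key_list[i]
people.groupby(key_list).min()
# group by the length of each index label and by key_list
people.groupby([len, key_list]).min()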
Thank you in advance!
The user guide on groupby explains a lot and I suggest you have a look at it. I'll explain as much as I understand for your use case.
You can verify the groups created using the groups attribute:
people.groupby(key_list).groups
{'one': Index(['Joe', 'Steve', 'Wes'], dtype='object'),
'two': Index(['Jim', 'Travis'], dtype='object')}
The result is a dictionary whose keys, 'one' and 'two', are the groups from key_list. The list is matched to the rows by position, so the first three rows fall in 'one' and the last two in 'two'. When you ask for min, pandas looks at each group and takes the minimum of each column separately. Let's inspect group 'one' using the get_group method:
people.groupby(key_list).get_group('one')
              a         b         c         d         e
Joe   -0.702122  0.277164  1.017261 -1.664974 -1.852730
Steve -0.866450 -0.373737  1.964857 -1.123291  1.251595
Wes   -0.043835 -0.011108  0.214802  0.065022 -1.335713
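You can see that Steve has the lowest value in column 'a', while the lowest values in other columns come from other rows (for example, Joe in 'd' and 'e'). Calling min on the group returns the minimum of each column: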
people.groupby(key_list).get_group('one').min()
a -0.866450
b -0.373737
c 0.214802
d -1.664974
e -1.852730
dtype: float64
The same concept applies when you run it on the second group 'two'. As such, when you run the first part of your groupby code:
people.groupby(key_list).min()
You get the column-wise minimum for each group:
            a         b         c         d         e
one -0.866450 -0.373737  0.214802 -1.664974 -1.852730
two -1.074355 -0.098190 -0.595726 -2.194481  0.232505
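To show what grouping by a plain list amounts to, here is a pandas-free Python sketch: each row is paired with the key at the same position, and the minimum is then taken column by column within each group. The rows for Joe, Steve and Wes are copied from the get_group output; the rows for Jim and Travis are read off the (3, two) and (6, two) lines of the two-key output quoted in this answer, where each of them is alone in its group. The function name group_min is an illustrative assumption, not a pandas API.

```python
columns = ["a", "b", "c", "d", "e"]
people = {
    "Joe":    [-0.702122,  0.277164,  1.017261, -1.664974, -1.852730],
    "Steve":  [-0.866450, -0.373737,  1.964857, -1.123291,  1.251595],
    "Wes":    [-0.043835, -0.011108,  0.214802,  0.065022, -1.335713],
    "Jim":    [-0.928987, -0.098190,  3.025985,  0.702471,  0.232505],
    "Travis": [-1.074355,  1.110879, -0.595726, -2.194481,  0.394216],
}
key_list = ["one", "one", "one", "two", "two"]

def group_min(frame, keys):
    """Pair row i with keys[i], then take the per-column minimum in each group."""
    groups = {}
    for key, row in zip(keys, frame.values()):
        groups.setdefault(key, []).append(row)
    return {
        # zip(*rows) turns the group's rows into columns.
        key: dict(zip(columns, (min(col) for col in zip(*rows))))
        for key, rows in groups.items()
    }

minima = group_min(people, key_list)
for key, values in minima.items():
    print(key, values)
```

The values printed for 'one' and 'two' agree with the table above, and they come from different rows in different columns, which confirms that min works per column rather than picking a single row.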
The second part of your code, which involves len, applies the same grouping concept. A function passed as a key is called on each index label, so the DataFrame is first grouped by the length of the strings in its index: (Jim, Joe, Wes) - 3 letters, (Steve) - 5 letters, (Travis) - 6 letters. These groups are then combined with key_list to give the final output:
              a         b         c         d         e
3 one -0.702122 -0.011108  0.214802 -1.664974 -1.852730
  two -0.928987 -0.098190  3.025985  0.702471  0.232505
5 one -0.866450 -0.373737  1.964857 -1.123291  1.251595
6 two -1.074355  1.110879 -0.595726 -2.194481  0.394216
Note that 3 splits into 'one' and 'two': 'Joe' and 'Wes' are both in group 'one', so that row holds the column-wise minimum of the two, while 'Jim' is the only three-letter name in group 'two'. The same concept applies to the 5-letter and 6-letter names.
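The two-key case can be sketched in plain Python as well. The assumption here is the behavior described above: a callable key is applied to each index label, a list key is matched by position, and the pair of results forms the group. The data is reconstructed from the outputs quoted in this answer, and group_min is a hypothetical helper, not a pandas function.

```python
people = {
    "Joe":    [-0.702122,  0.277164,  1.017261, -1.664974, -1.852730],
    "Steve":  [-0.866450, -0.373737,  1.964857, -1.123291,  1.251595],
    "Wes":    [-0.043835, -0.011108,  0.214802,  0.065022, -1.335713],
    "Jim":    [-0.928987, -0.098190,  3.025985,  0.702471,  0.232505],
    "Travis": [-1.074355,  1.110879, -0.595726, -2.194481,  0.394216],
}
key_list = ["one", "one", "one", "two", "two"]

def group_min(frame, keys):
    """Group rows by a tuple of keys and return per-column minima per group."""
    labels = list(frame)
    # Resolve every key to one value per row: call functions on the
    # index label, take list entries by position.
    resolved = [
        [key(label) for label in labels] if callable(key) else list(key)
        for key in keys
    ]
    groups = {}
    for label, combo in zip(labels, zip(*resolved)):
        groups.setdefault(combo, []).append(frame[label])
    # Sort the group keys so the output order resembles the table above.
    return {
        combo: [min(col) for col in zip(*rows)]
        for combo, rows in sorted(groups.items())
    }

minima = group_min(people, [len, key_list])
for combo, values in minima.items():
    print(combo, values)
```

The (3, 'one') entry mixes values from Joe and Wes, matching the first row of the table, while the remaining groups each contain a single person and therefore repeat that person's row.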

How can I compare two sets of data having two columns in Excel? Picture below will elaborate

Below are two sets of data, each with two columns. I want matching rows to appear next to each other.
This is a manual solution with formulas and sorting.
Imagine the following data in columns A to E:
Enter the following formulas into columns G to K
Column G: =IFERROR(IF(VLOOKUP(D:D,A:B,2,FALSE)=E:E,1,2),3)
Column H: =IF(G:G<3,D:D,"")
Column I: =IFERROR(VLOOKUP(H:H,A:B,2,FALSE),"")
Column J: =D:D
Column K: =IFERROR(VLOOKUP(J:J,D:E,2,FALSE),"")
Column G (the sort key) now shows:
1 if part and quantity matched
2 if only part matched
3 if nothing matched
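The sort key in column G can be illustrated with a short Python sketch. The part names and quantities below are made up for the example, since the original data was only shown in a picture, and the sketch assumes each part appears once in the first data set (VLOOKUP would use the first match if it did not). The function name sort_key is hypothetical.

```python
# Hypothetical stand-in for columns A:B (part -> quantity).
left = {"P1": 10, "P2": 5, "P3": 7}
# Hypothetical stand-in for columns D:E (part, quantity).
right = [("P2", 5), ("P4", 1), ("P1", 8)]

def sort_key(part, quantity, lookup):
    """Mirror of the column G formula: 1, 2 or 3 depending on the match."""
    if part not in lookup:
        return 3                    # lookup failed, like IFERROR(..., 3)
    return 1 if lookup[part] == quantity else 2

# Attach the key to each row, then sort by it; sorted() is stable,
# so rows with the same key keep their original order.
ranked = sorted(
    ((sort_key(part, qty, left), part, qty) for part, qty in right),
    key=lambda row: row[0],
)
for row in ranked:
    print(row)
```

Rows with key 1 end up first, followed by partial matches and then unmatched rows, which is the ordering the manual sort on column G is meant to produce.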
So if you now select data from A3:K10 and sort by column G (sort by) then it will result in this:

Convert cell text into progressive numbers

I have written this SQL in a PostgreSQL environment:
SELECT
ST_X("Position4326") AS lon,
ST_Y("Position4326") AS lat,
"Values"[4] AS ppe,
"Values"[5] AS speed,
"Date" AS "timestamp",
"SourceId" AS smartphone,
"Track" as session
FROM
"SingleData"
WHERE
"OsmLineId" = 44792088
AND
array_length("Values", 1) > 4
AND
"Values"[5] > 0
ORDER BY smartphone, session;
I have imported the result into Matlab, where I have six vectors and one cell array (the UUID text was converted to a cell array), all of size 5710x1.
Now I would like to convert the text in the cell array into progressive numbers, like 1, 2, 3..., one for each distinct session code.
In Excel this is easy with FIND.VERT(obj, matrix, col), but I do not know how to do it in Matlab.
I have a big cell array with a lot of codes like:
ff95465f-0593-43cb-b400-7d32942023e1
I would like to convert this cell array into an array of numbers, where the first occurrence of
ff95465f-0593-43cb-b400-7d32942023e1 -> 1
and so on, assigning 2 when a different code appears, and so forth.
OK, I have solved it.
I put the unique session codes in a second cell array C.
Then, with a for loop, I obtain:
%% Converting of the UUIDs into integer
C = unique(session);
N = length(session);
session2 = zeros(N, 1);
for i = 1:N
    session2(i) = find(strcmp(C, session(i)));
end
Thanks to all!
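For comparison, here is a minimal Python sketch of the numbering described in the question, where codes are numbered in order of first appearance. Note that unique in Matlab returns sorted values by default, so the loop above numbers codes by their sorted position instead, which may or may not matter for the use case. The first UUID is the one from the question; the second is made up for illustration, and number_sessions is a hypothetical name.

```python
def number_sessions(codes):
    """Map each distinct code to 1, 2, 3... in order of first appearance."""
    ids = {}
    # setdefault returns the stored number for a known code and stores
    # the next free number (current count + 1) for a new one.
    return [ids.setdefault(code, len(ids) + 1) for code in codes]

sessions = [
    "ff95465f-0593-43cb-b400-7d32942023e1",   # from the question
    "ff95465f-0593-43cb-b400-7d32942023e1",
    "0a1b2c3d-0000-4000-8000-000000000000",   # hypothetical second session
    "ff95465f-0593-43cb-b400-7d32942023e1",
]
session2 = number_sessions(sessions)
print(session2)
```

The dictionary lookup keeps the conversion linear in the number of rows, whereas comparing every row against the full list of unique codes grows with the number of distinct sessions.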