Add something in a number for a column in data frame by using for loop - dataframe

How can I add a dash after a certain digit in a column's values by using a for loop?
For example, the number 0550 should become 0-550 after the dash is added.
Here is my function for it. But when I apply it, a KeyError: "age" is raised. I have been stuck for a while on using a for loop to update a column's values. Thank you
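A minimal sketch of one way to do this, since the original function is not shown. It assumes the values are stored as strings (an integer column would lose the leading zero of 0550) and that the dash always goes after the first character. The name add_dash, the position parameter and the sample values are hypothetical.

```python
def add_dash(value, position=1):
    """Insert a dash after the first `position` characters of `value`."""
    text = str(value)
    # Values too short to split are returned unchanged.
    if len(text) <= position:
        return text
    return text[:position] + "-" + text[position:]


# Hypothetical column values, kept as strings so the leading zero survives.
numbers = ["0550", "0123", "7"]

# Plain for loop, as asked in the question.
dashed = []
for number in numbers:
    dashed.append(add_dash(number))

print(dashed)
```

With pandas, the same function could probably be applied without an explicit loop, for example df["number"] = df["number"].map(add_dash), where "number" is an assumed column name. A KeyError such as the one above usually means the label used to look up the column ("age" here) is not among the frame's columns.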

Related

How to add a column that shows the difference between two existing columns? GREL in OpenRefine

So I'm trying to find a simple way to create a new column that displays the difference between two existing columns (each with numbers)... I can't seem to find the proper GREL expression....
So I'm trying to find the amount of items sold with a column named "stock_before" and the other named "stock_after".
I click on "Edit column" from the column "stock_before" and then "Add column based on this column".
The GREL expression I have entered is:
value-cells["Stock_after"]
It returns no syntax error but still all of the cells for preview say "null"... I have transformed the value of the columns to numbers.
For Python I have tried:
substract(value,"Stock_after")
Same no syntax error but still everything null.
This seems so ridiculously simple but I couldn't find an answer... You can guess I'm fairly new to all this :) Hope someone out there can help me!
thanks for your having the patience to read this and thanks for your time if you answer!
I'd like something similar to this (3 columns):
Stock_before, Stock_after, dif
1,1,0
3,1,2
4,4,0
2,1,1
In GREL, the expression cells["Stock_after"] returns a Cell object representing the corresponding cell, not the actual value of that cell. To get the value, you need to use cells["Stock_after"].value.
So your final GREL expression should be value - cells["Stock_after"].value.
You should also make sure your values are stored as numerals, not strings: they should appear in green in the table. If they do not, use a "To number" operation on both columns first.
You can find out more about GREL and Cell objects here:
https://github.com/OpenRefine/OpenRefine/wiki/Variables
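As an illustration only, the distinction between a Cell object and its value can be modelled in plain Python. The Cell class below is a hypothetical stand-in, not OpenRefine's API; the data is the sample table from the question.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical stand-in for a cell wrapper that holds a value."""
    value: object


def difference(value, cells):
    # Mirrors the GREL expression: value - cells["Stock_after"].value
    return value - cells["Stock_after"].value


# (Stock_before, Stock_after) pairs from the example in the question.
rows = [(1, 1), (3, 1), (4, 4), (2, 1)]

difs = [difference(before, {"Stock_after": Cell(after)}) for before, after in rows]
print(difs)
```

Subtracting the Cell itself instead of its value would fail in this model too, which is roughly why the original expression produced no usable result.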

Postgres - split number and letter doesn't fill column

I have received help with splitting a column that contains a number and a letter.
The SQL script works perfectly. It runs to completion, with no errors.
But the columns themselves don't get filled.
I have tried creating the columns in advance as text or as integer, but they don't get filled. The SQL query itself turns out OK, but in the table the columns stay empty. What is wrong?
Your question is not completely clear, but it sounds like what you are trying to do is take a value from one column of a table, split it and use the result to update two other columns in the same table.
If that is the case, you would want to be using the SQL UPDATE command instead of SELECT.
UPDATE d1_plz_whatever
SET nr = SUBSTRING(hn FROM '^[0-9]+'),
    zusatz = SUBSTRING(hn FROM '[a-zA-Z]+$');
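To check what the two patterns in the UPDATE extract, here is a small Python sketch using the same regular expressions. The function name and the sample values are assumptions for illustration; None plays the role of SQL NULL when a pattern does not match.

```python
import re


def split_house_number(hn):
    """Return (leading digits, trailing letters) of hn, or None where absent."""
    number = re.search(r"^[0-9]+", hn)      # same pattern as used for nr
    suffix = re.search(r"[a-zA-Z]+$", hn)   # same pattern as used for zusatz
    return (
        number.group(0) if number else None,
        suffix.group(0) if suffix else None,
    )


# Hypothetical house numbers.
for hn in ["12a", "7", "105 B"]:
    print(hn, "->", split_house_number(hn))
```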

OpenRefine - Fill between cells but not at the end of the list

I have a list of stock prices for several stocks. Some of the values are missing due to weekends, holidays and probably other reasons.
The gaps are not consistent. Some are two days and some are more than that.
I want to fill the gaps with the last known value but not at the end of the list.
In Excel I have tried testing a few cells below the current one and doing the fill if one of them is not empty. The problem is that, because the gaps are inconsistent, it's a tedious task to change the function for all the cases.
Is there a way to test for the end of a list?
UPDATE - added a screenshot.
See this screenshot. I want to fill where the blue dots are. The red dots are at the end of the list and I don't want to fill those cells.
I am looking for a way to detect the end of the list and stop the filling when the end is detected.
I think this is pretty difficult in OpenRefine and probably a different tool would work better. The main issue is that OpenRefine does not offer the ability to easily work across rows so 'summing a column' (or part of a column) is tricky - this is mentioned in https://github.com/OpenRefine/OpenRefine/issues/200
However, you can do this by forcing OpenRefine into Record mode with the whole project containing a single record. Once you've done this you can access all values in a column using syntax like:
row.record.cells["Column name"].value
This gives an array of all the non-blank values in the column. Since this ignores blank values, in order to have a true view of the values in the column you have to fill in blank cells with a value.
So I think you could probably achieve what you want as follows:
For each column you are going to work with do a cell transform to put a dummy value in empty cells - e.g. if(isBlank(value),"null",value)
Create a new column at the start of your project and put a single value in the very first cell in that column
Switch to Record mode
At this point you should have a single 'Record' in your project.
You can now access all cells in a column using syntax like row.record.cells["Column 1"].value. You can combine this with 'forRange' to iterate through the contents of this array, using the row.index as the marker for the current row.
I used the following formula to add a new column to the project:
with(row.record.cells["Column 1"].value,w,if(forRange(row.index,w.length(),1,i,w[i].toNumber()).sum()>0,"a","b"))
Then...
Change back to 'Row' mode
Remove the 'null' placeholder from the original column
Create a facet on the 'fill filter' column
In my case I filter to 'a'
Use the 'fill down' option
Remove the filter
And remove the 'record' column
A rather long-winded way of doing it, to say the least, but so far I've not been able to find anything better without going outside OpenRefine. I'm guessing you could probably compress steps 5-11 into a single step or a smaller number of steps.
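For comparison, the intended result (fill gaps with the last known value, but leave everything after the final known value blank) can be sketched outside OpenRefine in a few lines of Python. This is a different technique from the GREL steps above, not a translation of them; the function name and sample data are hypothetical, and None stands for an empty cell.

```python
def fill_gaps(values):
    """Forward-fill None entries, but only up to the last non-empty entry."""
    # Index of the last known value; -1 if the column is entirely empty.
    last_known = max(
        (i for i, v in enumerate(values) if v is not None), default=-1
    )
    filled = list(values)
    # Cells after last_known are the "end of the list" and stay empty.
    for i in range(1, last_known):
        if filled[i] is None:
            filled[i] = filled[i - 1]
    return filled


prices = [10.0, None, None, 10.5, None, 11.0, None, None]
print(fill_gaps(prices))
```

Detecting the end of the list comes down to finding the index of the last non-empty cell first, then limiting the fill to rows before it.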
If you want to access the array of cell values using Jython as suggested by iMitwe you need to use:
row["record"]["cells"]["Column 1"]["value"]
instead of
row.record.cells["Column 1"].value
(step 5)
I am doing this off the top of my head, but I think your best chance may be using the fill down option in record mode:
first move your column to the first column and switch to record mode.
then use the following GREL: row.record.cells["data"].value[-1] where data is the name of your column
The [-1] will take the last value and fill the blank. For the case with the red dot, since there is no value it should remain empty. Let us know how it goes.
Unless there's something I am missing or not seeing...
I would have just sorted reverse (date ascending) on the Date column, then individually use Fill Down on each column, except for that last column where you could then use a Date facet on your column Date to specify the exact Date range you wanted to work with, then fill down on that last column, then remove the Date range facet.

Need a simple search function to display most common value in a column. (with ambiguous choices)

I have a very large array of data with many columns that display different outputs for the values presented. I would like to add a row above the data that displays the most commonly occurring value or word in the column below.
Generally I would like to have each top of the column (right under the column label in row 1) have the most common value below. I will then use this value for various data analysis functions!
Is this possible, and if so, how? Preferably this will not require VBA, but simply a short code in the cell.
One caveat: The exact values may vary, so there is no set list where I can say "it will be one of these."
Any ideas appreciated!
Try a series of =COUNTIF(A:A,"VALUE TO SEARCH") functions if you want to stay away from VBA.
Otherwise, the best method would be to iterate through each column via VBA. With this method, you can even count the "varying" values and return the count and/or the value itself.
http://www.excel-easy.com/examples/most-frequently-occurring-word.html
This is a single formula you would write at the top of each column. It does not require VBA. You can replace the set range with an entire column, such as (A:A) instead of (A1:A7).
If you mean an array as in a data type, it could work differently but it depends what you're trying to do.
With data from A3 through A16, in A2 enter:
=INDEX($A$3:$A$16,MODE(MATCH($A$3:$A$16,$A$3:$A$16,0)))
This will work for text as well as numbers. Adjust this to match the column size.
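The idea behind that formula (find the value that occurs most often, whether text or number) can be sketched in Python with collections.Counter. This only illustrates the logic and is not something that runs inside Excel; the function name and sample column are hypothetical.

```python
from collections import Counter


def most_common_value(column):
    """Return the most frequent non-empty value in column, or None if empty."""
    values = [v for v in column if v is not None and v != ""]
    if not values:
        return None
    # most_common(1) returns a list with one (value, count) pair.
    return Counter(values).most_common(1)[0][0]


# Hypothetical column mixing text and numbers, with one blank cell.
column = ["red", "blue", "red", 3, 3, "", "red"]
print(most_common_value(column))
```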

Use columns.add(...) in Word with non-uniform column widths?

The problem I'm having is that table.Columns.Add(ref Object BeforeColumn) requires a reference to another column in the table. However, when I try to access the last column in the table to pass as a reference using table.Columns.Add(table.Columns[table.Columns.Count])
I get the error:
"Cannot access individual columns in this collection because the table has mixed cell widths."
As my current workaround, I catch the error and call table.Columns.DistributeWidth() to make sure the columns are uniform, then run the rest of the code. However, I lose the formatting of my cell widths this way, which is unfortunate.
Is there any way I can work around this without losing the cell widths?
(I realize one way is to store every cell's width before running this process, and then re-applying the widths afterward, but this seems like a very costly solution to something that should be simpler)
I've found one way to do it. Here's how I approached it.
*Caution, I'm assuming that the table is uniform, i.e. the number of columns is the same across all the rows. (Note, the API has a Table.Uniform member, but its description is not complete. In the API it says "True if all the rows in a table have the same number of columns." However, it also checks whether the columns have uniform width.)
Instead of using table.Columns.Add(table.Columns[table.Columns.Count]) to add a column before the last column, I select a cell in the table and use the insert command:
//assuming table is the name of the table you want to add columns to
table.Cell(1, table.Columns.Count).Select();
word.Selection selection = table.Application.ActiveWindow.Selection;
selection.InsertColumns();
This might actually be a better way to add columns, as the api gives you way more options on how to insert (i.e. use InsertColumnsRight to insert to the right of the column). The Columns.Add(...) function by default inserts to the left of the select