OpenRefine - Converting text column with numbers to numeric type - openrefine

I created an Age column based on the difference between 2 date columns. Initially I was able to create the Age facet without any issues.
But on reopening the project I am unable to create the Age facet. It looks like the Age column has somehow turned into a text column. And I have to keep converting all cells to numeric before being able to facet by Age each time I reopen the project.
Another issue (is this the cause?) with this column: I noticed wrong values in some Age cells and tried to update them to blank (i.e. age unknown), but because I had converted all cells in the column to numeric, the operation wasn't allowed. I am not sure whether I left a cell as a blank (text) space somewhere while making these corrections. If this is the reason Refine treats the column as text, how do I identify those particular cells and then set them to unknown numeric or null values?
Thanks.

Related

How to alter data types of multiple columns in table? (postgresql)

Uploaded a csv file to postgresql database and trying to change the columns that contain numbers from default text to numeric datatype.
I understand that you can manually alter each individual column datatype using ALTER and adding "," for multiple columns. However, the csv file contains 50 columns so trying to find out if there's a more DRY approach to changing the datatypes.
I was thinking about looping through certain sections of the csv file that need to be changed from text to numeric. For example, if columns 10 through 25 (out of 50) need to be changed, how would I select the starting point at column 10 and end at column 25?
If there's a way to loop through all columns and change the data type depending on the values in the column, that would be good to know. I was thinking this might be a problem because all values are set to default "text", so determining datatype based on values would be problematic. If this is not the case, would like to know how to approach the problem this way. Thanks!
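One way to avoid writing 50 clauses by hand is to generate the statement from a list of column names. The sketch below is only an illustration of that idea: the table name `my_table`, the column names `col1`..`col50`, and the helper `alter_statements` are all hypothetical, and it assumes every value in the selected columns can actually be cast to numeric. In practice the ordered column list could be read from `information_schema.columns` sorted by `ordinal_position`.

```python
def alter_statements(table, columns, start, end, new_type="numeric"):
    """Build one ALTER TABLE statement for columns start..end (1-based, inclusive)."""
    selected = columns[start - 1:end]
    clauses = [
        # USING tells PostgreSQL how to convert the existing text values.
        f'ALTER COLUMN "{col}" TYPE {new_type} USING "{col}"::{new_type}'
        for col in selected
    ]
    return f'ALTER TABLE "{table}"\n  ' + ",\n  ".join(clauses) + ";"


# Hypothetical column names standing in for the 50 CSV columns.
columns = [f"col{i}" for i in range(1, 51)]
sql = alter_statements("my_table", columns, 10, 25)
print(sql)
```

The generated text can be reviewed and then run in psql. The cast will fail on any column that contains a non-numeric value, which is a reasonable signal that the column should stay as text.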

Crystal reports linking issue

I have 2 columns, both patient numbers. One is defined as a string and the other is defined as Float, null, so the link is not working. What must I do to the one defined as a number? It comes from an Excel file. I have changed the cells there to text, and the change shows when I upload to SQL Server, but Crystal Reports still sees it as a number and won't link it to the string column.
I thought to add a column to the table then copy in the contents, is that the way to go?
ALTER TABLE [Programmer].[dbo].['Preventive Care-Colon Cancer Sc']
ADD PatNum nvarchar(50)

Excel to SQL table field value appending with 0

I loaded an Excel file into an SQL table. In the Excel file, one field contains VARCHAR data (cell format General). When loaded into the SQL table, some of these values are prefixed with a zero.
Example: in the Excel file it is 1081999, but the same value becomes 01081999 in the SQL table.
What might be the reason for this?
Excel hides leading zeros when it identifies a field's content as a number and displays it as such. I would assume that the Excel worksheet does indeed contain these leading zeros and they are simply not shown by Excel. If you change the type of the column from General to Text, do they show up?
As a side note, if these are indeed numbers you should be storing them in a numeric datatype in the database...
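To illustrate the difference between what is stored and what is displayed, here is a minimal Python sketch. It is not what Excel or SQL Server do internally, just the same effect in miniature, and the fixed width of 8 is an assumption made for the example.

```python
raw = "01081999"            # cell content stored as text, leading zero intact
as_number = int(raw)        # read as a number, the leading zero carries no value
shown = str(as_number)      # a numeric display has no zero to show
restored = shown.zfill(8)   # padding back to an assumed fixed width of 8

print(raw, as_number, shown, restored)
```

So a text field keeps the zero, while a numeric view of the same content drops it.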

OpenRefine - Fill between cells but not at the end of the list

I have a list of stock prices for several stocks. Some of the values are missing due to weekends, holidays and probably other reasons.
The gaps are not consistent. Some are two days and some are more than that.
I want to fill the gaps with the last known value but not at the end of the list.
I have tried in Excel to test a few cells below and, if they are not empty, do the fill. The problem is that, because the gaps are inconsistent, it's tedious to change the function for all the cases.
Is there a way to test for the end of a list?
UPDATE - added a screenshot.
See this screenshot. I want to fill where the blue dots are. The red dots are at the end of the list and I don't want to fill those cells.
I am looking for a way to detect the end of the list and stop the filling when the end is detected.
I think this is pretty difficult in OpenRefine and probably a different tool would work better. The main issue is that OpenRefine does not offer the ability to easily work across rows so 'summing a column' (or part of a column) is tricky - this is mentioned in https://github.com/OpenRefine/OpenRefine/issues/200
However, you can do this by forcing OpenRefine in Record mode with the whole project containing a single record. Once you've done this you can access all values in a column using syntax like:
row.record.cells["Column name"].value
This gives an array of all the non-blank values in the column. Since this ignores blank values, in order to have a true view of the values in the column you have to fill in blank cells with a value.
So I think you could probably achieve what you want as follows:
For each column you are going to work with do a cell transform to put a dummy value in empty cells - e.g. if(isBlank(value),"null",value)
Create a new column at the start of your project and put a single value in the very first cell in that column
Switch to Record mode
At this point you should have a single 'Record' in your project.
You can now access all cells in a column using syntax like row.record.cells["Column 1"].value. You can combine this with 'forRange' to iterate through the contents of this array, using the row.index as the marker for the current row.
I used the following formula to add a new column to the project:
with(row.record.cells["Column 1"].value,w,if(forRange(row.index,w.length(),1,i,w[i].toNumber()).sum()>0,"a","b"))
Then...
Change back to 'Row' mode
Remove the 'null' placeholder from the original column
Create a facet on the 'fill filter' column
In my case I filter to 'a'
Use the 'fill down' option
Remove the filter
And remove the 'record' column
Rather a long-winded way of doing it, to say the least, but so far I've not been able to find anything better without going outside OpenRefine. I'm guessing you could probably compress steps 5-11 into a single step or a smaller number of steps.
If you want to access the array of cell values using Jython as suggested by iMitwe you need to use:
row["record"]["cells"]["Column 1"]["value"]
instead of
row.record.cells["Column 1"].value
(step 5)
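For anyone willing to go outside OpenRefine, the same logic (fill a blank only if some later row still has a value) can be sketched in plain Python. This is an illustration rather than a translation of the GREL above: the function name `fill_interior_gaps` and the sample prices are made up, and blanks are represented as `None`.

```python
def fill_interior_gaps(values):
    """Forward-fill blanks (None) that have a later value; leave trailing blanks alone."""
    # Index of the last non-blank cell; everything after it is the end of the list.
    last_known = max((i for i, v in enumerate(values) if v is not None), default=-1)
    filled = []
    previous = None
    for i, v in enumerate(values):
        if v is None and i < last_known:
            # Interior gap: carry the last known value down.
            filled.append(previous)
        else:
            filled.append(v)
            if v is not None:
                previous = v
    return filled


# Made-up prices: two interior gaps and a trailing gap that must stay blank.
prices = [10.0, None, None, 12.5, None, 13.0, None, None]
result = fill_interior_gaps(prices)
print(result)
```

Finding the last non-blank index first plays the same role as the 'a'/'b' flag column: rows before it are eligible for fill down, rows after it are not.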
I am doing this off the top of my head, but I think your best chance may be using the fill down option in record mode:
First, move your column so that it is the first column and switch to record mode.
Then use the following GREL: row.record.cells["data"].value[-1] where data is the name of your column
The [-1] will take the last value and fill the blank. For the case with the red dot, since there is no value, it should remain empty. Let us know how it goes.
Unless there's something I am missing or not seeing...
I would have just sorted reverse (date ascending) on the Date column, then individually use Fill Down on each column, except for that last column where you could then use a Date facet on your column Date to specify the exact Date range you wanted to work with, then fill down on that last column, then remove the Date range facet.

Importing Excel into SQL2005, issue with currency symbols and text in a number field

I have a field in my Excel as follows
€250
€240
Free
....
In my SQL2005 Preview this looks as follows
250
240
(Blank)
So it doesn't like symbols and text in this column, although it is going to a varchar column.
Any ideas?
Check whether the € is stored in your excel file as part of the value of a cell or as formatting.
I tried to reproduce your problem: when I entered €250 into a cell and later selected the cell to check its value, it was 250 (Excel had decided that I was trying to give the cell a numerical value, formatted as currency).
If that isn't the problem, try using nchar (or nvarchar for variable-length text) as your column type rather than varchar. The n-prefixed types store Unicode, so they allow a wider range of symbols.