Replacing all "0" and "?" in my dataset with NA, except in the response variable coded "0" and "1" (factor), using R - missing-data

Here is an example of what my dataset looks like; the full dataset is large, with over 7,000 observations and 64 variables.

That's in Excel, right?
You can simply highlight "X#" columns, go to Edit -> Find -> Replace and use that tool to replace all "0" and "?" with "NA". Be sure to enable the "Find entire cells only" option.
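
If the replacement has to be repeated, a scripted version may be easier than a manual find-and-replace. The following is only a minimal sketch in Python (not R, which the question asks about). It assumes each row is a dict of strings and that the response column is called "response"; both are hypothetical, and None stands in for NA. Like "Find entire cells only", it matches whole cell values only.

```python
def blank_out_placeholders(rows, response_col, placeholders=("0", "?")):
    """Return copies of the rows with placeholder values replaced by None,
    leaving the response column untouched."""
    cleaned = []
    for row in rows:
        cleaned.append({
            # Only whole-cell matches are replaced, and never in the response column.
            col: (None if col != response_col and value in placeholders else value)
            for col, value in row.items()
        })
    return cleaned


# Hypothetical sample data: X1 and X2 are predictors, "response" is the 0/1 factor.
rows = [
    {"X1": "0", "X2": "3", "response": "0"},
    {"X1": "?", "X2": "0", "response": "1"},
]
cleaned = blank_out_placeholders(rows, response_col="response")
print(cleaned)
```

The "0" in the response column survives because that column is skipped explicitly.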

Power BI- Add leading zeros to number column that contain letters

I am using a table from an excel file which has a TEXT column "Service Code".
When I upload the data into Power BI, it automatically changes the field type to number and removes the leading zeros. For example, "000230" becomes "230", and "010000" becomes "10000". I created a custom column and used = Number.ToText([#"Service Code"],"000000"). This worked because all values in the column are 6 digits long; however, a few of them have a letter at the end, which causes an error. For example, 10014A or 10017Z. Is there a way to do this without causing an error?
Service Code   Custom   Desired output
230            000230   000230
10000          010000   010000
10014A         Error    10014A
10017Z         Error    10017Z
I used the "Custom Column from Examples" feature in PowerQuery and typed in the first two required values.
It created this formula:
Text.PadStart(Text.From([orig], "en-US"), 6, "0")
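
The reason this works where Number.ToText fails is that the value is converted to text first and then padded as text, so a trailing letter is never parsed as a number. A minimal Python sketch of the same idea (the function name pad_service_code is made up for illustration):

```python
def pad_service_code(code, width=6):
    """Convert the value to text, then left-pad it with zeros to the given width."""
    text = str(code)
    return text.rjust(width, "0")


# Values from the question: two numeric codes and two with a trailing letter.
codes = [230, 10000, "10014A", "10017Z"]
padded = [pad_service_code(c) for c in codes]
print(padded)
```

Codes that are already six characters long are returned unchanged.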

Pentaho spoon + redoing field enclosures in output file

I'm new to Pentaho 8.3 CE (Spoon) and am trying to add an extra column to a CSV file by concatenating 3 other text fields together. I've tried 2 options: the Calculator step and the built-in 'Concat fields' step.
The issue I'm facing is that some rows are enclosed by " " while others aren't... e.g.
Field A = "One thing, another thing"
Field B = Yet another thing
Field C = Final thing
Ideally, I want,
New field = "One thing, another thing Yet another thing Final thing",
I find I can't get the final " to enclose each line, so it looks like "One thing, another... Final thing
How do I get Pentaho to add that final " on? I've set to force the enclosure on.
First strip the double quotes with a String operations step or a Replace in String step (the latter allows regexp search and replace).
Then use a Concat strings step to join them all together, comma separated.
Finally, either prepend & append double quotes, or when writing out with e.g. a text file output, add the enclosure character.
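
Outside Pentaho, the three steps above (strip, join, re-enclose) can be sketched in Python as follows. This is an illustration of the logic only, not of any Pentaho API; the function name join_fields is hypothetical, and a space is assumed as the separator because that is what the desired output in the question shows.

```python
def join_fields(fields, separator=" ", enclosure='"'):
    """Strip any existing enclosure from each field, join the fields,
    and wrap the result in exactly one pair of enclosure characters."""
    stripped = [field.strip().strip(enclosure) for field in fields]
    return enclosure + separator.join(stripped) + enclosure


# Field A arrives already quoted, Fields B and C do not (as in the question).
fields = ['"One thing, another thing"', "Yet another thing", "Final thing"]
new_field = join_fields(fields)
print(new_field)
```

Stripping first is what makes the final enclosure consistent, whether or not an input field was quoted.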

Open Refine / Google Refine - Remove blank cells in a column

The task is simple to understand, I have a table like this:
And I would like to edit the column "L1_latitud" to collapse (or remove) all the blank cells:
It looks like a simple task but I can't find out a way to deal with it.
Not sure this is a programming question, but if what you show is a single Refine record (you can check by switching from Row mode to Record mode for viewing), you should be able to use "Join multi-valued" cells to collapse all the values into a single string with separators. From there the split(), filter(), join() methods would allow you to filter out the empty values and put the string back together. Finally, "Split multi-valued cells" would split them out into separate cells again.
I sense that you've already done some processing here, so there might be an easier way to do this if you started a step or two earlier in the process.
Create a facet via "Facet" -> "Customized facets" -> "Facet by null", then simply exclude the "true" choice in the facet.
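
The join, split, filter sequence from the first answer can be sketched in plain Python as follows. This is not GREL and not an OpenRefine API; the cell values and the "|" separator are made up for illustration, and the separator is assumed not to occur inside any value.

```python
def drop_blank_cells(cells, separator="|"):
    """Join the cells into one string, split it again,
    and keep only the values that are not blank."""
    joined = separator.join(cell or "" for cell in cells)   # "Join multi-valued cells"
    values = joined.split(separator)                         # split()
    return [value for value in values if value.strip()]      # filter()


# Hypothetical column content with nulls, empty strings and whitespace-only cells.
column = ["40.41", "", None, "40.42", "  ", "40.45"]
collapsed = drop_blank_cells(column)
print(collapsed)
```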

Column references in formulas

I am a little stuck at the moment. I am working on an array of data and need to find a way to input column numbers into formulas.
- I have used the MATCH function to find the corresponding column number for a value,
  e.g. "XYZ" matched with column 3, which is equivalent to C1:Cxxxxxx.
- Now, for inputting C1:Cxxxxxx into a formula to get data for that particular column, I would like to be able to reference the "column 3" part directly, because I plan on using this workbook in the future and the column needed to run the calculation may or may not be column 3 the next time I use it.
- Is there any way to use a formula to tell Excel which column to use in an equation?
A little more detail: I have the equation
=AND(Sheet3!$C$1:$C$250000=$A$4,Sheet3!$B$1:$B$250000=$B$4)
instead of specifying to use column C, is there a way to use a formula to tell it to use C?
EDIT: additional info:
"I am basically running the equivalent of a SQL WHERE statement where foo and bar are true. I want Excel to spit out a concatenated list of all baz values where foo and bar are true. Ideally I would like it to return ONLY the baz values for rows where the test is true; then I will concatenate them together separately. The way I have it now, the expression tests every row separately to see if it is true; if there are 18K rows, there will be 18K separate tests. It works, but it's not too clean. The goal is to have as much automated as possible. *I do not want to have to go in and change the column references every time I add a new data array.*"
Thanks
You can use INDEX, e.g. if you have 26 possible columns from A to Z then this formula will give you your column C range (which you can use in another formula)
=INDEX(Sheet3!$A$1:$Z$250000,0,3)
The 0 indicates that you want the whole column, and the 3 indicates which column. If you want, the 3 can be generated by another formula, such as a MATCH function.
Note: be careful with AND in
=AND(Sheet3!$C$1:$C$250000=$A$4,Sheet3!$B$1:$B$250000=$B$4)
AND only returns a single result not an array, if you want an array you might need to use * like this
=(Sheet3!$C$1:$C$250000=$A$4)*(Sheet3!$B$1:$B$250000=$B$4)
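
To make the two ideas above concrete outside Excel, here is a small Python sketch. It is not Excel syntax: headers.index stands in for MATCH, picking a column by position stands in for INDEX(range,0,n), and a list of per-row booleans stands in for the array produced by multiplying the two comparisons. The table contents and the names foo, bar and baz are taken loosely from the question's edit and are otherwise hypothetical.

```python
def column_by_header(table, header):
    """Look up a column position by its header (the MATCH step) and
    return that whole column without the header (the INDEX step)."""
    headers = table[0]
    position = headers.index(header)  # 0-based here, unlike MATCH
    return [row[position] for row in table[1:]]


def rows_matching(table, criteria):
    """Return one boolean per data row: True where every criterion holds.
    This is the element-wise result, not a single AND over everything."""
    columns = [column_by_header(table, header) for header in criteria]
    wanted = list(criteria.values())
    row_count = len(table) - 1
    return [
        all(column[i] == value for column, value in zip(columns, wanted))
        for i in range(row_count)
    ]


# Hypothetical data: the column order can change without touching the code.
table = [
    ["baz", "bar", "foo"],
    ["a", "x", "y"],
    ["b", "x", "z"],
    ["c", "x", "y"],
]
mask = rows_matching(table, {"foo": "y", "bar": "x"})
matches = [v for v, keep in zip(column_by_header(table, "baz"), mask) if keep]
print(mask, ", ".join(matches))
```

Because columns are found by header, reordering the data does not require editing any references, which is the same benefit the INDEX/MATCH combination gives.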
You could use ADDRESS to generate the reference as text; you then need to use INDIRECT, as you are passing a string rather than a range to the formula:
=AND(INDIRECT(ADDRESS(1,3,,,"Sheet3") & ":" & ADDRESS(250000,3))=$A$4
,INDIRECT(ADDRESS(1,2,,,"Sheet3") & ":" & ADDRESS(250000,2))=$B$4)
Obviously, replace the 3s and 2s in the ADDRESS formulae with the MATCH function you used to get the column number. The above assumes the column for $B$1:$B$250000 is also found using MATCH; otherwise it is just:
=AND(INDIRECT(ADDRESS(1,3,,,"Sheet3") & ":" & ADDRESS(250000,3))=$A$4
,Sheet3!$B$1:$B$250000=$B$4)
Note a couple of things:
- You only need to use "Sheet3" in the first ADDRESS of each INDIRECT.
- Arguments 3 and 4 of the ADDRESS function are left as default, which means they return an absolute ($C$1) reference in A1 style as opposed to R1C1.
EDIT
Given the additional info maybe using an advanced filter would get you near to what you want. Good tutorial here. Set it up according to the tutorial to familiarise yourself with it and then you can use some basic code to set it up automatically when you drop in a new dataset:
Paste in the dataset and then use VBA to get the range the dataset uses then apply the filter with something like:
Range("A6:F480").AdvancedFilter Action:=xlFilterInPlace, CriteriaRange:= _
Sheets("Sheet1").Range("A1:B3"), Unique:=False
You can also copy the results into a new table, though this has to be in the same sheet as the original data. My suggestion would be to paste your data into hidden columns to the left, leave space for your criteria in rows 1:5 of the visible columns, and then have a button that gets the used range for your data, applies the filter and copies the data below the criteria:
Range("A6:F480").AdvancedFilter Action:=xlFilterCopy, CriteriaRange:= _
Range("H1:M3"), CopyToRange:=Range("H6"), Unique:=False
The button would need to clear the destination cells first, and you would need to make sure you have enough hidden columns, but it's all possible. Hope this helps.

Speeding up looping through a textfile in vbs

Good afternoon,
I have a problem with my code where I'm looping through a text file. The text file has approx. 10,000 lines, so I came up with using the InStr function to find the character position at which the "test name" appears, and then using the Mid function and counting left to find the line number.
eg.
000004###24503###Open Account Web ISA single###2#########Please enter your first name.###False#########Mr############callie####################################################################################################################################################################################################################################################################################################################################666###Imagenericpassword###Ops#######################################################################################################################################Cash ISA 2009 / 2010##########################################################################################################################################################################################################################################
So in this case it finds "Open Account Web ISA single" and counts left to find 000004. This saves me looping through 10,000 lines.
Next I split this line into an array using ### as a delimiter. This results in lots of empty "columns", since they were empty when I concatenated the data from Excel, and leaves me with a total of around 247 columns. My issue is that I don't really want to loop through 247 columns, since lots of them contain... well, nothing. Is there a quicker way for me to do this?
I used to use excel but its far too slow.
You can remove the empty columns:
Set re = New RegExp
re.Pattern = "(###){2,}"
re.Global = True
withoutEmptyCols=re.Replace(input,"###")
This is the result for your example:
000004###24503###Open Account Web ISA single###2###Please enter your first name.###False###Mr###callie###666###Imagenericpassword###Ops###Cash ISA 2009 / 2010###
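
For comparison, the same collapse can be sketched in Python with the standard re module. Note that collapsing the delimiters shifts every later field to a new position, so the second helper (a hypothetical alternative, not part of the answer above) keeps each non-empty field together with its original column index instead. The sample line is a shortened version of the one in the question.

```python
import re


def collapse_empty_columns(line, delimiter="###"):
    """Replace every run of two or more delimiters with a single delimiter."""
    pattern = "(?:" + re.escape(delimiter) + "){2,}"
    return re.sub(pattern, delimiter, line)


def non_empty_fields(line, delimiter="###"):
    """Split the line and keep (original_index, value) pairs for non-empty fields."""
    return [(i, field) for i, field in enumerate(line.split(delimiter)) if field]


# Shortened sample based on the line in the question.
line = ("000004###24503###Open Account Web ISA single###2#########"
        "Please enter your first name.###False#########Mr")
collapsed = collapse_empty_columns(line)
fields = non_empty_fields(line)
print(collapsed)
print(fields)
```

Either way, only the populated fields are left to loop over; which variant fits depends on whether the original column positions still matter downstream.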