RevoScaleR cannot find a file/directory that exists - microsoft-r

I am using RevoScaleR and have successfully converted csv files to xdf files, which I have saved to my local disk.
However, when I try to run functions that call these xdf files, I get an error message saying there is no such file or directory:
The file or directory 'P:/PROPENSITY/CL_Generic_Retail_201506' cannot be found.
Let me walk through the whole process:
My working directory:
> getwd()
[1] "P:/PROPENSITY"
I used this code to convert csv file to xdf:
rx_CL_Generic_Retail_201506 <- rxImport(
  inData = "CL_Generic_Retail_201506_23-05-2017.csv",
  outFile = "CL_Generic_Retail_201506.xdf",
  overwrite = TRUE
)
Then I used this code to check that the conversion was successful:
rxSummary(formula = ~ Avg_Deposits + Total_Num_ + Sumof_CC_AVGBAL_,
          data = "CL_Generic_Retail_201506.xdf")
Summary Statistics Results for: ~Avg_Deposits + Total_Num_ + Sumof_CC_AVGBAL_
Data: "CL_Generic_Retail_201506.xdf" (RxXdfData Data Source)
File name: CL_Generic_Retail_201506.xdf
Number of valid observations: 7155413
Name              Mean         StdDev       Min         Max         ValidObs  MissingObs
Avg_Deposits      4562.914627  128614.5683  -325684032  69317080.0  7155413   0
Total_Num_        7.062068     247.1506     1           224579.0    831567    6323846
Sumof_CC_AVGBAL_  951.484138   2249.3149    0           164746.6    601304    6554109
Up to that point everything was fine.
I continued to convert files to xdf files.
Then I returned to that same file, tried to run the same rxSummary call, and got the following error message:
> rxSummary(formula = ~ Avg_Deposits + Total_Num_ + Sumof_CC_AVGBAL_,
+           data = "CL_Generic_Retail_201506.xdf"
+ )
The file or directory 'CL_Generic_Retail_201506.xdf' cannot be found.
If I repeat the process and run rxImport again, rxSummary works again. But after a while, the same error recurs.
Could this have to do with backslashes?
That is, the message says:
The file or directory 'P:\PROPENSITY\CL_Generic_Retail_201506.xdf' cannot be found.
But when I ask R to print the working directory it returns:
> getwd()
[1] "P:/PROPENSITY"
Note that in the RevoScaleR error message the slashes are backslashes (\) while R's getwd() output uses forward slashes (/).
If this is the problem, what could I do about it?
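As a sanity check, I believe the separators can be ruled in or out directly from base R; a minimal sketch, using the paths from my session:

file.exists("P:/PROPENSITY/CL_Generic_Retail_201506.xdf")    # forward slashes
file.exists("P:\\PROPENSITY\\CL_Generic_Retail_201506.xdf")  # escaped backslashes
# On Windows both spellings refer to the same path, so if both calls agree,
# the slashes themselves are not the problem.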
By the way, this problem occurs on a workstation where Windows and RevoScaleR are installed. On a notebook also running RevoScaleR, the problem does not appear.
I would appreciate any suggestion.
---------------------------------------------------------------------------
Here is an image of the directory where it is apparent that the files exist.
Image of the PROPENSITY folder with the xdf files

Try using append = "rows". The last csv is probably empty, resulting in overwriting the xdf with an empty xdf, which is no file at all.
rx_CL_Generic_Retail_201506 <- rxImport(
  inData = "CL_Generic_Retail_201506_23-05-2017.csv",
  outFile = "CL_Generic_Retail_201506.xdf",
  overwrite = TRUE,
  append = "rows"
)
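If that is the cause, a defensive version of the import is easy to sketch with base R file checks around the same call (a sketch, not a RevoScaleR-specific API; file names as above):

csv_file <- "CL_Generic_Retail_201506_23-05-2017.csv"
xdf_file <- "CL_Generic_Retail_201506.xdf"
# Only import when the source csv is present and non-empty,
# then confirm the xdf really landed on disk.
if (file.exists(csv_file) && file.size(csv_file) > 0) {
  rxImport(inData = csv_file, outFile = xdf_file,
           overwrite = TRUE, append = "rows")
}
stopifnot(file.exists(xdf_file))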

Related

How to find rejected files due to errors in Apache Beam Java SDK

I have N files of the same type to be processed, and I will be giving a wildcard input pattern (C:\\users\\*\\*).
How do I find the file name and record that were rejected while uploading to BigQuery in Java?
I guess BQ writes to the temp location path that you pass to your pipeline, not to a local path (honestly not sure about this).
In my case, with Python, I used to pass the temp location as a GCS bucket, and when an error occurs, the command-line logs usually show the name of the log file that contains the rejected records.
I then use the gsutil cp command to copy it to my local computer and read it.
BigQuery I/O (Java and Python SDKs) supports the dead-letter pattern: https://beam.apache.org/documentation/patterns/bigqueryio/.
Java
result
    .getFailedInsertsWithErr()
    .apply(
        MapElements.into(TypeDescriptors.strings())
            .via(
                x -> {
                  System.out.println(" The table was " + x.getTable());
                  System.out.println(" The row was " + x.getRow());
                  System.out.println(" The error was " + x.getError());
                  return "";
                }));
Python
errors = (
    result['FailedRows']
    | 'PrintErrors' >>
    beam.FlatMap(lambda err: print("Error Found {}".format(err))))

Microsoft R / Tidyverse: If a data file fails to load, create an empty tibble/table in its place

I am really not sure how to phrase this concisely. My question is: is it possible to add error handling so that if a data file (such as a csv) fails to load as a table/tibble, a blank version of it is created in its place?
Here is what I mean:
My normal csv load looks like this:
Monday2 <- paste0("my_file_location/my_file_name", Monday, ".csv")
leads1 <- tibble(read.csv(Monday2))
Tuesday2 <- paste0("my_file_location/My_file_name", Tuesday, ".csv")
leads2 <- tibble(read.csv(Tuesday2))
Wednesday2 <- paste0("my_file_location/my_file_name", Wednesday, ".csv")
leads3 <- tibble(read.csv(Wednesday2))
If for some reason my csv fails to load (for example, the file doesn't exist, or I entered the name incorrectly), can a blank version of it be created?
My idea for the blank tibble would look like this:
Leads21 <- tibble("Column1"= "", "Column2"= "", "Column3"= "")
Leads22 <- tibble("Column1"= "", "Column2"= "", "Column3"= "")
Leads23 <- tibble("Column1"= "", "Column2"= "", "Column3"= "")
This blank tibble would have exactly the same columns as a properly loaded file. I have 5 files I bind each Friday in an automated process, and if a file fails to load I can catch it downstream in my process (one of the columns is the file name/date), but I don't want the whole process to fail.
A typical 'failed to load' error looks like this:
In file(file, "rt") : cannot open file 'my_file_location/My_file_name_2022-03-27.csv': No such
file or directory
The bind of all 5 files then fails with an error message like:
### Join full week's worth of leads into 1 file
Leads <- bind_rows(leads1,leads2,leads3, leads4, leads5)
Error in list2(...) : object 'leads1' not found
This then causes the rest of my code to fail or act incorrectly. If I can bind an empty tibble instead, my code can finish running and I can check for missing files at the end. Ultimately, a missing file is less important than processing the existing files (so stopping my code to locate/fix the failed load is not important).
My background is in Microsoft Access VBA, and I keep trying to write something like:
If tibble leads1 exists, use it; if tibble leads1 does not exist, use Leads21.
I am not sure how to do this in R. I have been trying to read up on the try() wrapper, but I don't understand how to use it in my case.
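For reference, a minimal sketch of the tryCatch() approach this seems to call for, assuming the three-column layout from the question (read_or_blank is a hypothetical helper; give the blank tibble the same column names and types as the real files so the later bind_rows() succeeds):

library(tibble)

# Hypothetical helper: read a csv, or fall back to an empty tibble
# with the expected columns when the file cannot be read.
read_or_blank <- function(path) {
  tryCatch(
    tibble(read.csv(path)),
    error = function(e) tibble(Column1 = character(),
                               Column2 = character(),
                               Column3 = character())
  )
}

leads1 <- read_or_blank(Monday2)
leads2 <- read_or_blank(Tuesday2)
leads3 <- read_or_blank(Wednesday2)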

BigQuery error (ASCII 0) encountered for external table and when loading table

I'm getting this error
"Error: Error detected while parsing row starting at position: 4824. Error: Bad character (ASCII 0) encountered."
The data is not compressed.
My external table points to multiple CSV files, and one of them contains a couple of lines with that character. In my table definition I added "MaxBadRecords", but that had no effect. I also get the same problem when loading the data into a regular table.
I know I could use DataFlow or even try to fix the CSVs, but is there an alternative to that does not include writing a parser, and hopefully just as easy and efficient?
is there an alternative to that does not include writing a parser, and hopefully just as easy and efficient?
Try the following in the Google Cloud SDK Shell (using the tr utility):
gsutil cp gs://bucket/badfile.csv - | tr -d '\000' | gsutil cp - gs://bucket/fixedfile.csv
This will:
read your "bad" file,
remove the ASCII 0 characters, and
save the "fixed" content into a new file.
Once you have the new file, just make sure your table points to that fixed one.
Sometimes a stray final byte appears in the file.
What could help is replacing it with:
tr '\0' ' ' < file1 > file2
You can clean the file using an external tool like Python or PowerShell. There is no way to load a file containing ASCII 0 into BigQuery.
Here is a Python script that can clean the file:
import os
import shutil
from tempfile import mkstemp

def replace_chars(file_path, original_string, new_string):
    # Create a temp file
    fh, abs_path = mkstemp()
    with os.fdopen(fh, 'w', encoding='utf-8') as new_file:
        with open(file_path, encoding='utf-8', errors='replace') as old_file:
            print("\nCurrent line: \t")
            i = 0
            for line in old_file:
                print(i, end="\r", flush=True)
                i = i + 1
                line = line.replace(original_string, new_string)
                new_file.write(line)
    # Copy the file permissions from the old file to the new file
    shutil.copymode(file_path, abs_path)
    # Remove the original file
    os.remove(file_path)
    # Move the new file into place
    shutil.move(abs_path, file_path)
The same but for PowerShell:
(Get-Content "C:\Source.DAT") -replace "`0", " " | Set-Content "C:\Destination.DAT"

writeOGR error: creation of output file failed

I'm an R rookie attempting to create home ranges from fish telemetry data using kernel density estimates in the adehabitatHR package.
kud <- kernelUD(muskydetectdata.P[,6], h="href", extent = 5)
class(kud)
image(kud)
kud[[1]]@h
muskykud.P95 <- getverticeshr(kud, percent = 95)
muskykud.P95
muskykud.P50 <- getverticeshr(kud, percent = 50)
muskykud.P50
When exporting to a shapefile:
writeOGR(muskydetectdata.sp, "musky_kde1", "gps",
         driver = "ESRI Shapefile",
         dataset_options = "FieldName= id")
an error message is displayed:
##creation of output file failed
I have also attempted to use writeSpatialShape, with similar results.
I'm using R version 3.3.2 on 64-bit Windows.
I had the same problem and solved it only when I added the full name of my directory and the name of a layer, plus a .shp suffix:
writeOGR(muskydetectdata.sp, dsn="d:/your directory here/musky_kde.shp", layer="musky_kde", driver="ESRI Shapefile")
I had that same error.
I resolved mine by correcting the directory it was saving to (making sure it existed)
e.g.
writeOGR(muskydetectdata.sp, dsn = save.dir, layer = filename.save, driver = 'ESRI Shapefile')
where save.dir is the directory you want to save to (as a string) and filename.save is the filename you want it saved as (excluding the extension).
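For instance, a minimal sketch that creates the directory first when it is missing (reusing the save.dir and filename.save names from above):

# Make sure the output directory exists before calling writeOGR.
if (!dir.exists(save.dir)) {
  dir.create(save.dir, recursive = TRUE)
}
writeOGR(muskydetectdata.sp, dsn = save.dir, layer = filename.save,
         driver = 'ESRI Shapefile')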
I guess you are trying to write over an existing file, and the writeOGR function doesn't allow that. As far as I remember, this is a known behavior of some drivers supported by OGR (in R as in Python and in the C API).
You have to check whether the file exists prior to writing and remove it (or change the path you want to use).
For example, here the first write operation succeeds but the attempt to overwrite the file fails with your error message:
> rgdal::writeOGR(spdf, 'b.shp', layer="brazil", driver='ESRI Shapefile')
> rgdal::writeOGR(spdf, 'b.shp', layer="brazil", driver='ESRI Shapefile')
Error in rgdal::writeOGR(spdf, "b.shp", layer = "brazil", driver = "ESRI Shapefile") :
Creation of output file failed
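A minimal sketch of that pre-check (a shapefile is really several sidecar files sharing a base name; rgdal::writeOGR also has an overwrite_layer argument that may help, depending on the driver):

# Remove any previous output before rewriting it.
if (file.exists('b.shp')) {
  file.remove(Sys.glob('b.*'))
}
rgdal::writeOGR(spdf, 'b.shp', layer = "brazil", driver = 'ESRI Shapefile')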

How to run same syntax on multiple spss files

I have 24 SPSS files in .sav format in a single folder. All these files have the same structure. I want to run the same syntax on all of them. Is it possible to write code in SPSS for this?
You can use the SPSSINC PROCESS FILES user-submitted command to do this, or write your own macro. First, let's create some very simple fake data to work with.
*FILE HANDLE save /NAME = "Your Handle Here!".
*Creating some fake data.
DATA LIST FREE / X Y.
BEGIN DATA
1 2
3 4
END DATA.
DATASET NAME Test.
SAVE OUTFILE = "save\X1.sav".
SAVE OUTFILE = "save\X2.sav".
SAVE OUTFILE = "save\X3.sav".
EXECUTE.
*Creating a syntax file to call.
DO IF $casenum = 1.
PRINT OUTFILE = "save\TestProcess_SHOWN.sps" /"FREQ X Y.".
END IF.
EXECUTE.
Now we can use the SPSSINC PROCESS FILES command to specify the sav files in the folder and apply the TestProcess_SHOWN.sps syntax to each of those files.
*Now example calling the syntax.
SPSSINC PROCESS FILES INPUTDATA="save\X*.sav"
SYNTAX="save\TestProcess_SHOWN.sps"
OUTPUTDATADIR="save" CONTINUEONERROR=YES
VIEWERFILE= "save\Results.spv" CLOSEDATA=NO
MACRONAME="!JOB"
/MACRODEFS ITEMS.
Another (less advanced) way is to use the INSERT command. To do so, repeatedly GET each sav file, run the syntax with INSERT, and save the file. Probably something like this:
get 'file1.sav'.
insert file='syntax.sps'.
save outf='file1_v2.sav'.
dataset close all.
get 'file2.sav'.
insert file='syntax.sps'.
save outf='file2_v2.sav'.
etc etc.
Good luck!
If the syntax you need to run is completely independent of the files, then you can either use INSERT FILE = 'Syntax.sps' or put the code in a macro, e.g.
Define !Syntax ()
* Put Syntax here
!EndDefine.
You can then run either of these 'manually':
get file = 'file1.sav'.
insert file='syntax.sps'.
save outfile ='file1_v2.sav'.
Or
get file = 'file1.sav'.
!Syntax.
save outfile ='file1_v2.sav'.
Or, if the files follow a reasonably strict naming structure, you can embed either of the above in a simple bit of Python:
Begin Program.
import spss
for i in range(1, 24 + 1):
    syntax = "get file = 'file" + str(i) + ".sav'.\n"
    syntax += "insert file='syntax.sps'.\n"
    syntax += "save outfile ='file" + str(i) + "_v2.sav'.\n"
    print(syntax)
    spss.Submit(syntax)
End Program.