Reading files via a loop - read.table

I am trying to read a set of files via a loop:
for (i in 1:81) {
  Y[i, 1:1001] = t(read.table(sprintf('/Users/pitman/Desktop/Desktop/Desktop/DataSets/code_Yingjie/RDsystem/%d_1000.dat', ss[i])))
}
After some of the files have been read, I get the error message:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file '/Users/ebp/Desktop/Desktop/Desktop/DataSets/code/RDsystem/60_1000.dat': No such file or directory
The first 59 files appear to have been read. Why balk at file 60?
When I repeat this command (after trying some other things), the file number at which the error appears changes -- 34 or 51 or 60...
Any pointers appreciated.
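One way to narrow this down (a minimal debugging sketch, assuming the ss vector and directory layout from the question) is to test every path with file.exists() up front, so any failing file is reported before the loop runs:

# Sketch: list any of the 81 paths that cannot be found right now.
paths <- sprintf('/Users/pitman/Desktop/Desktop/Desktop/DataSets/code_Yingjie/RDsystem/%d_1000.dat', ss)
missing <- paths[!file.exists(paths)]
if (length(missing) > 0) print(missing)

If the set of missing files changes from run to run, that points at the storage (e.g. a synced or network folder) rather than at the loop itself.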

Related

Microsoft R / Tidyverse: If a data file fails to load, create an empty tibble/table in its place

I am really not sure how to phrase this concisely. My question is: is it possible to add an error-handling feature so that if a data file (such as a csv) fails to load as a table/tibble, a blank version of it is created?
Here is what I mean:
My normal csv load looks like this:
Monday2 <- paste0("my_file_location/my_file_name", Monday, ".csv")
leads1 <- tibble(read.csv(Monday2))
Tuesday2 <- paste0("my_file_location/My_file_name", Tuesday, ".csv")
leads2 <- tibble(read.csv(Tuesday2))
Wednesday2 <- paste0("my_file_location/my_file_name", Wednesday, ".csv")
leads3 <- tibble(read.csv(Wednesday2))
If for some reason my csv fails to load (the file doesn't exist, or I entered the name incorrectly, for example), can a blank version of it be created?
My idea for the blank tibble would look like this:
Leads21 <- tibble("Column1"= "", "Column2"= "", "Column3"= "")
Leads22 <- tibble("Column1"= "", "Column2"= "", "Column3"= "")
Leads23 <- tibble("Column1"= "", "Column2"= "", "Column3"= "")
This blank tibble would have exactly the same columns as a properly loaded file. I have 5 files I bind each Friday in an automated process, and if a file fails to load I can catch it downstream (one of the columns is the file name/date), but I don't want the whole process to fail.
A typical 'failed to load' error looks like this:
In file(file, "rt") : cannot open file 'my_file_location/My_file_name_2022-03-27.csv': No such
file or directory
The bind of all 5 files then fails with an error message like:
### Join full week's worth of leads into 1 file
Leads <- bind_rows(leads1,leads2,leads3, leads4, leads5)
Error in list2(...) : object 'leads1' not found
This then causes the rest of my code to fail or act incorrectly. If I can bind an empty tibble instead, my code can finish running and I can check for missing files at the end. Ultimately, a missing file is less important than processing the existing files, so stopping my code to locate/fix the failed load is not worth it.
My background is in Microsoft Access VBA, and I keep trying to write something like:
If tibble leads1 exists, use it; if tibble leads1 does not exist, use Leads21
I'm not sure how to do this in R. I have been trying to read/understand the try() wrapper, but I don't understand how to use it in my case.
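A minimal sketch of one way to do this with tryCatch(), assuming the three-column layout above (read_or_blank is a hypothetical helper, not an existing function):

library(tibble)

# Hypothetical helper: read a csv, or fall back to a blank tibble
# with the same columns if the file cannot be opened.
read_or_blank <- function(path) {
  tryCatch(
    as_tibble(read.csv(path)),  # as_tibble() converts the data.frame from read.csv()
    error = function(e) tibble(Column1 = "", Column2 = "", Column3 = "")
  )
}

leads1 <- read_or_blank(Monday2)
leads2 <- read_or_blank(Tuesday2)
leads3 <- read_or_blank(Wednesday2)

Because every leadsN object then exists, bind_rows() no longer stops with "object 'leads1' not found", and the blank placeholder rows can be filtered out downstream.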

GDX file is not created when trying to export data from Excel to GAMS

I'm trying to export data from Excel to GAMS, and I'm using the following code:
Set c row labels /c1*c10/
x column labels /x1*x2/;
Parameter d(c,x);
$call GDXXRW Data.xlsx trace=3 par=d rng=Sheet1!a1 cdim=9 xdim=1
$GDXIN Data.gdx
$LOAD d
$GDXIN
Display d;
The directory of the Excel file is as follows:
gamsdir\projdir\Data.xlsx
But when it runs, the following errors occur:
Msg : No such file or directory for gamsdir\projdir\Data.gdx.
Unable to open gdx file for $GDXIN
GDXIN file not open - ignore the rest line
What could be the reason for this? Why isn't the gdx file created?
There could be several reasons. The most likely one is that the GDX file does not exist because there was an error calling gdxxrw. What does the gdxxrw log say? In general, I would suggest checking the errorLevel after any $call ... to make sure it succeeded before you continue. In your example, you could do it like this:
$call.checkErrorlevel GDXXRW Data.xlsx trace=3 par=d rng=Sheet1!a1 cdim=9 xdim=1

RevoScaleR can not find file, directory that exists

I am using RevoScaleR, and I have successfully converted csv files to xdf files, which I have saved to my local disk.
However, when I try to run functions that call these xdf files I get an error message that there is no such file or directory:
The file or directory 'P:/PROPENSITY/CL_Generic_Retail_201506' cannot be found.
Let me walk through the whole process:
My working directory:
> getwd()
[1] "P:/PROPENSITY"
I used this code to convert csv file to xdf:
rx_CL_Generic_Retail_201506 <- rxImport(
  inData = "CL_Generic_Retail_201506_23-05-2017.csv",
  outFile = "CL_Generic_Retail_201506.xdf",
  overwrite = TRUE
)
Then I used this code to check that the conversion was successful:
rxSummary(formula = ~ Avg_Deposits + Total_Num_ + Sumof_CC_AVGBAL_,
          data = "CL_Generic_Retail_201506.xdf")
Summary Statistics Results for: ~Avg_Deposits + Total_Num_ + Sumof_CC_AVGBAL_
Data: "CL_Generic_Retail_201506.xdf" (RxXdfData Data Source)
File name: CL_Generic_Retail_201506.xdf
Number of valid observations: 7155413
Name Mean StdDev Min Max ValidObs MissingObs
Avg_Deposits 4562.914627 128614.5683 -325684032 69317080.0 7155413 0
Total_Num_ 7.062068 247.1506 1 224579.0 831567 6323846
Sumof_CC_AVGBAL_ 951.484138 2249.3149 0 164746.6 601304 6554109
Up to that point everything was fine.
I continued to convert files to xdf files.
Then I returned to that same file and tried to run the same function (summary) and I got the following error message:
> rxSummary(formula = ~ Avg_Deposits + Total_Num_ + Sumof_CC_AVGBAL_,
+           data = "CL_Generic_Retail_201506.xdf")
The file or directory 'CL_Generic_Retail_201506.xdf' cannot be found.
If I repeat the process and run rxImport again, rxSummary works again. But then, after a while, the same error comes back.
Could this have to do with backslashes?
That is, the message is:
The file or directory 'P:\PROPENSITY\CL_Generic_Retail_201506.xdf' cannot be found.
But when I ask R to print the working directory it returns:
> getwd()
[1] "P:/PROPENSITY"
Observe that in the RevoScaleR error message the slashes are \ while R's output of getwd() has /.
If this is the problem, what could I do about it?
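For what it's worth, R on Windows accepts both separators, and normalizePath() translates between them; a quick sketch (assuming the P: drive is mounted):

# Both spellings name the same directory on Windows.
normalizePath("P:/PROPENSITY")                    # "P:\\PROPENSITY"
normalizePath("P:\\PROPENSITY", winslash = "/")   # "P:/PROPENSITY"

So the / versus \ difference between the error message and getwd() is almost certainly cosmetic, not the cause.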
By the way, this problem occurs on a workstation where Windows and RevoScaleR are installed. On a notebook that also runs RevoScaleR, the problem does not appear.
I would appreciate any suggestions.
---------------------------------------------------------------------------
Here is an image of the directory where it is apparent that the files exist.
[Image: the PROPENSITY folder, showing the xdf files]
Try using append = "rows". The last csv is probably empty, which results in overwriting the xdf with an empty xdf, i.e. no file at all.
rx_CL_Generic_Retail_201506 <- rxImport(
  inData = "CL_Generic_Retail_201506_23-05-2017.csv",
  outFile = "CL_Generic_Retail_201506.xdf",
  overwrite = TRUE,
  append = "rows"
)
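A related sanity check (a sketch, assuming the working directory from the question): test that the xdf is actually still on disk before each rxSummary() call.

# Hypothetical guard: confirm the xdf still exists before summarising.
xdf_path <- file.path("P:/PROPENSITY", "CL_Generic_Retail_201506.xdf")
if (!file.exists(xdf_path)) {
  stop("xdf file missing: ", xdf_path)
}

If this fires right after a batch of imports, whatever ran in between (for example, another rxImport() with overwrite = TRUE pointed at an empty csv) removed or truncated the file.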

Upload job fails on the same file that was uploaded successfully before

I'm running a regular upload job that loads a csv into BigQuery. The job runs every hour. According to the recent failure log, it says:
Error: [REASON] invalid [MESSAGE] Invalid argument: service.geotab.com [LOCATION] File: 0 / Offset:268436098 / Line:218637 / Field:2
Error: [REASON] invalid [MESSAGE] Too many errors encountered. Limit is: 0. [LOCATION]
I went to line 218638 (the original csv has a header line, so I assume 218638 should be the actual failed line; let me know if I'm wrong) but it seems all right. I checked the corresponding table in BigQuery; it has that line too, which means I actually uploaded this line successfully before.
Then why does it cause a failure now?
project id: red-road-574
Job ID: Job_Upload-7EDCB180-2A2E-492B-9143-BEFFB36E5BB5
This indicates that there was a problem with the data in your file, where it didn't match the schema.
The error message says it occurred at File: 0 / Offset:268436098 / Line:218637 / Field:2. This means the first file (it looks like you just had one), and then the chunk of the file starting at 268436098 bytes from the beginning of the file, then the 218637th line from that file offset.
The reason for the offset portion is that BigQuery processes large files in parallel across multiple workers. Each worker starts at an offset from the beginning of the file; the offset we include is the one that worker started from.
From the rest of the error message, it looks like the string service.geotab.com showed up in the second field, but the second field was a number, and service.geotab.com isn't a valid number. Perhaps there was a stray newline?
You can see what the lines looked like around the error by doing:
cat <yourfile> | tail -c +268436098 | tail -n +218636 | head -3
This will print out three lines... the one before the error (since I used -n +218636 instead of +218637), the one that had the error, and the next line as well.
Note that if this is just one line in the file that has a problem, you may be able to work around the issue by specifying maxBadRecords.

Puzzling Python I/O error: [Errno 2] No such file or directory

I'm trying to grab an XML file from a server (using Python 3.2.3), but I keep getting this error that there's "no such file or directory". I'm sure the URL is correct, since it outputs the URL in the error message, and I can copy-n-paste it and load it in my browser. So I'm very puzzled how this could be happening. Here's my code:
import xml.etree.ElementTree as etree

class Blah(object):
    def getXML(self, xmlurl):
        tree = etree.parse(xmlurl)
        return tree.getroot()

    def pregameData(self, url):
        try:
            x = self.getXML('{0}linescore.xml'.format(url))
        except IOError as err:
            x = "I/O error: {0}".format(err)
        return x

if __name__ == '__main__':
    x = Blah()
    l = ['http://gd2.mlb.com/components/game/mlb/year_2013/month_04/day_15/gid_2013_04_15_anamlb_minmlb_1/',
         'http://gd2.mlb.com/components/game/mlb/year_2013/month_04/day_15/gid_2013_04_15_phimlb_cinmlb_1/',
         'http://gd2.mlb.com/components/game/mlb/year_2013/month_04/day_15/gid_2013_04_15_slnmlb_pitmlb_1/']
    for url in l:
        pre = x.pregameData(url)
        print(pre)
And it always returns this error:
I/O error: [Errno 2] No such file or directory: 'http://gd2.mlb.com/components/game/mlb/year_2013/month_04/day_15/gid_2013_04_15_anamlb_minmlb_1/linescore.xml'
I/O error: [Errno 2] No such file or directory: 'http://gd2.mlb.com/components/game/mlb/year_2013/month_04/day_15/gid_2013_04_15_phimlb_cinmlb_1/linescore.xml'
I/O error: [Errno 2] No such file or directory: 'http://gd2.mlb.com/components/game/mlb/year_2013/month_04/day_15/gid_2013_04_15_slnmlb_pitmlb_1/linescore.xml'
You can copy and paste those URLs and see the files do exist in those locations. I even copied the files and directories to localhost and tried it there, in case the foreign server had some kind of block. It gave me the same errors, so that's not the issue. I wondered if etree's parse() can't handle HTTP, but the documentation doesn't say anything about that, so I'm guessing that's not an issue either.
UPDATE: As suggested in the comments, I went with using open(), but it still returned the error. Importing and trying urllib.request.urlopen(url) returns AttributeError: 'module' object has no attribute 'request'.
You're correct: xml.etree doesn't automatically download and parse URLs. If you want to do that, you'll need to download the file yourself first (using urllib or requests...).
The documentation explicitly states that parse() takes a filename or file object; if it supported URLs, I'm sure it would say so explicitly. lxml.etree.parse(), for example, does.