I am trying to perform a simple linear regression on a CSV data set, but R won't read the data set with read.csv

I am running the code below to read a CSV file so that I can perform a linear regression. A few fixes I found here and on other sites included using the setwd() command and closing the CSV file before running the command, but I am still getting the error.
setwd("C:/Users/Tommy/Desktop/")
dataset = file.choose("Project_subset.csv")
dataset = read.csv("dataset")
> dataset = read.csv("dataset")
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open file 'dataset': No such file or directory
I would appreciate help with what seems like a simple problem.
I have tried several different commands to read the CSV file and none have been successful; I keep getting the error above saying the file does not exist. I also ran file.exists() and it returned FALSE. I am very confused, as this seems like a simple command to use.
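For reference, a minimal sketch of what a working call would usually look like, assuming the file really is named Project_subset.csv and sits on the Desktop set as the working directory (note that file.choose() does not take a file name; it opens an interactive picker and returns the path you select):

# minimal sketch, assuming Project_subset.csv is in the working directory
setwd("C:/Users/Tommy/Desktop/")
file.exists("Project_subset.csv")          # should return TRUE before reading
dataset <- read.csv("Project_subset.csv")  # pass the file name, not the string "dataset"

# alternatively, let file.choose() open a picker and pass along the path it returns
dataset <- read.csv(file.choose())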

Related

GrADS script sometimes throws an error while opening a grib ctl file

I am getting this error from a GrADS job running under SUN Grid Engine. However, the grib file and its associated ctl file exist and are valid. If I re-run the GrADS script job, it succeeds. I just don't understand why it sometimes fails.
opening ctl file
/data/myprogram/20211027/gribs/mygribname.grb.ctl
Open Error: Can't open binary data file
File name = /data/myprogram/20211027/gribs/mygribname.grb
The funny thing is... the code is trying to open the ctl file, not the grib file. I'm not sure why it even tried to open the grib file instead.
I figured out the answer to my question. I had to study a little bit of GrADS so I could interpret the logs better. The error above is thrown when the file does not exist. In my case the file was there, but it was still being generated when the GrADS code accessed it; sometimes the file was being written at the very moment GrADS tried to open it. I added timestamps to the operations in full ISO mode, and the operations were happening within milliseconds of each other.
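If the root cause is indeed that the file is still being written when GrADS opens it, one way to guard against the race is to wait for the file to exist and for its size to stop changing before launching the job. A minimal shell sketch, assuming GNU stat and the path from the log above (the 5-second polling interval is arbitrary):

f=/data/myprogram/20211027/gribs/mygribname.grb
until [ -s "$f" ]; do sleep 5; done       # wait until the file exists and is non-empty
prev=0
cur=$(stat -c %s "$f")
while [ "$cur" -ne "$prev" ]; do          # wait until the size stops changing
  prev=$cur
  sleep 5
  cur=$(stat -c %s "$f")
done
# only now launch the GrADS job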

DATASET_CANT_CLOSE error number 32 "Broken Pipe"

I experienced an error in SAP ABAP that says DATASET_CANT_CLOSE with error number 32 (Broken Pipe). My question is: what triggers this kind of error?
As far as I know, the error was raised by:
CLOSE DATASET dset
But I can't reproduce it, since I don't know exactly what triggers it.
This is the code I use:
method GENERATE_TXT_FILE.
  DATA:
    lwa_data TYPE t_line,
    lv_param TYPE sxpgcolist-parameters.

  "Upload File to Server
*Open Dataset
  OPEN DATASET im_file_name FILTER 'dos2ux'
       FOR OUTPUT IN TEXT MODE ENCODING DEFAULT.

  CLEAR lwa_data.
  LOOP AT it_data INTO lwa_data.
    CATCH SYSTEM-EXCEPTIONS file_access_errors = 4
                            OTHERS = 8.
      TRANSFER lwa_data-lines TO im_file_name.
    ENDCATCH.
    IF sy-subrc <> 0.
      CLEAR lwa_data.
      EXIT.
    ENDIF.
    CLEAR lwa_data.
  ENDLOOP.

*Close Dataset
  CLOSE DATASET im_file_name.
From what I have investigated in the background job log, it seems that the server which ran the background job had not yet been mapped to the text file folder. The solution was to re-map the server to the text file folder.
You are using the FILTER extension to OPEN DATASET - which can be a HUGE security issue as well as raise loads of portability issues unless you know what you're doing, but that's not what the question is about. From the documentation:
When the statement OPEN DATASET is executed, a process is started in the operating system for the specified statement. When the file is opened for reading, a channel (pipe) is linked with STDOUT of the process, from which the data is read during file reading. The file itself is linked with STDIN of the process. When the file is opened for writing, a channel (pipe) is linked to STDIN of the process, to which data is passed when writing. The output of the process is diverted to this file.
In your case, the filter command probably decided to bail out - see this related answer among many. Why it did so is hard to investigate; you may have to go through various system logs to find out. If the problem really is an unmapped network folder, you could try switching to UNC paths.
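To illustrate that last suggestion, here is a minimal ABAP sketch with a hypothetical UNC path and without the FILTER (so no operating-system process or pipe is involved); it reuses it_data/lwa_data from the question's method and only checks sy-subrc in the simplest way:

* minimal sketch: hypothetical UNC path, no FILTER, basic sy-subrc check
  DATA lv_file TYPE string VALUE '\\someserver\share\outfile.txt'.

  OPEN DATASET lv_file FOR OUTPUT IN TEXT MODE ENCODING DEFAULT.
  IF sy-subrc <> 0.
    "target not reachable from this application server
    RETURN.
  ENDIF.
  LOOP AT it_data INTO lwa_data.
    TRANSFER lwa_data-lines TO lv_file.
  ENDLOOP.
  CLOSE DATASET lv_file.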

Storing from wildcard input path

I’m having issues using wildcard input paths in Pig.
If I run the following commands:
A = load '/something/*.csv' using PigStorage(',');
dump A;
I see the output from all csv files in the something folder printed to my console after the job is run.
If, however, I run a store instead:
A = load '/something/*.csv' using PigStorage(',');
store A into 'somedestination';
The job fails with the following error message:
Input(s):
Failed to read data from "/something/*.csv"
It looks like the store is attempting to load from the literal path instead of globbing using the wildcard, but if that’s the case then why does it work during the dump? Is there another way to accomplish this?
You may not have the permission to write to that folder.
The dump essentially writes to the tmp folder (or another folder if the configuration is different) and then prints that to the screen.
Do a dump. Look at the log. It should say something like:
Input(s):
Successfully read 0 records from: "/something/*.csv"
Output(s):
Successfully stored 0 records in: "file:/tmp/temp1865628879/tmp-1573237939"
Then try storing to the folder you saw in the dump output. If that store works, the original failure is most likely a permissions problem on your destination folder.
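For example, a store to an explicit scratch path similar to the one the dump used (the path below is only illustrative):

A = load '/something/*.csv' using PigStorage(',');
store A into 'file:/tmp/pig_glob_test';   -- illustrative scratch path; if this works, check permissions on 'somedestination'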

Internal error while loading to a BigQuery table

I ran this command to load 11 files into a BigQuery table:
bq load --project_id=ardent-course-601 --source_format=NEWLINE_DELIMITED_JSON dw_test.rome_defaults_20140819_test gs://sm-uk-hadoop/queries/logsToBq_transformLogs/rome_defaults/20140819/23af7218-617d-42e8-884e-f213a583094a/part* /opt/sm-analytics/projects/logsTobqMR/jsonschema/rome_defaultsSchema.txt
I got this error:
Waiting on bqjob_r46f38146351d545_00000147ef890755_1 ... (11s) Current status: DONE
BigQuery error in load operation: Error processing job 'ardent-course-601:bqjob_r46f38146351d545_00000147ef890755_1': Too many errors encountered. Limit is: 0.
Failure details:
- File: 5: Unexpected. Please try again.
I tried many times after that and still got the same error.
To debug what went wrong, I instead loaded each file one by one into the BigQuery table. For example:
/usr/local/bin/bq load --project_id=ardent-course-601 --source_format=NEWLINE_DELIMITED_JSON dw_test.rome_defaults_20140819_test gs://sm-uk-hadoop/queries/logsToBq_transformLogs/rome_defaults/20140819/23af7218-617d-42e8-884e-f213a583094a/part-m-00011.gz /opt/sm-analytics/projects/logsTobqMR/jsonschema/rome_defaultsSchema.txt
There are 11 files total and each ran fine.
Could someone please help? Is this a bug on the BigQuery side?
Thank you.
There was an error reading one of the files: gs://...part-m-00005.gz
Looking at the import logs, it appears that the gzip reader encountered an error decompressing the file.
It looks like that file may not actually be compressed. BigQuery samples the header of the first file in the list to determine whether it is dealing with compressed or uncompressed files and to determine the compression type. When you import all of the files at once, it only samples the first file.
When you load the files individually, BigQuery reads the header of the file and determines that it isn't actually compressed (despite having the suffix '.gz'), so it imports it as a normal flat file.
If you run a load that doesn't mix compressed and uncompressed files, it should work successfully.
Please let me know if you think this is not the case and I'll dig in some more.
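One way to confirm this, sketched with an illustrative $GS_PREFIX standing in for the gs:// folder that holds the part files (the exact URI is elided above): pull the suspect object down, check whether it really is gzip, and recompress it if not before re-running bq load.

gsutil cp "$GS_PREFIX/part-m-00005.gz" .
file part-m-00005.gz                      # reports "gzip compressed data" or "ASCII text"

# if it turns out to be plain text, compress it and copy it back before reloading
mv part-m-00005.gz part-m-00005
gzip part-m-00005
gsutil cp part-m-00005.gz "$GS_PREFIX/part-m-00005.gz"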

Is it possible to manage a NO FILE error in Pig?

I'm trying to load a simple file:
log = load 'file_1.gz' using TextLoader AS (line:chararray);
dump log;
And I get an error:
2014-04-08 11:46:19,471 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input Pattern hdfs://hadoop1:8020/pko/file*gz matches 0 files
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:288)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1054)
Is it possible to manage such a situation before the error appears?
Input Pattern hdfs://hadoop1:8020/pko/file*gz matches 0 files
The error means the input file doesn't exist at the given HDFS path.
log = load 'file_1.gz' using TextLoader AS (line:chararray);
Since you haven't given the absolute path of file_1.gz, Pig will look in the HDFS home directory of the user running the Pig script.
Unfortunately, in the current version of Pig (0.15.0) it is impossible to manage these errors without using UDFs.
I suggest creating a Java or Python script using try and catch to take care of this.
Here's a good website that might be of some use to you: https://wiki.apache.org/pig/PigErrorHandlingInScripts
Good luck learning Pig!
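A minimal Python sketch of that idea, assuming the hdfs CLI is on the PATH; the input path is taken from the question above and the Pig script name is a placeholder:

import subprocess
import sys

path = "/pko/file_1.gz"                                 # input to check before running Pig
exists = subprocess.run(["hdfs", "dfs", "-test", "-e", path]).returncode == 0
if not exists:
    sys.exit("Input %s not found on HDFS; skipping the Pig job." % path)
subprocess.run(["pig", "myscript.pig"])                 # placeholder script name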
I'm facing this issue as well. My load command is:
DATA = LOAD '${qurwf_folder_input}/data/*/' AS (...);
I want to load all files from the data subfolders, but the data folder is empty and I got the same error as you. What I did, in my particular case, was to create an empty folder in the data directory, so the LOAD returned an empty dataset and the script did not fail.
By the way, I'm using an Oozie workflow to run the scripts, and in the prepare step I create the empty folders, as sketched below.
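For reference, a rough sketch of what that prepare block might look like inside the Oozie action (the ${nameNode} property and the placeholder subfolder are illustrative, not taken from my workflow):

<prepare>
    <mkdir path="${nameNode}${qurwf_folder_input}/data/placeholder"/>
</prepare>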