Error in data frame creation in R in Spark using as.data.frame - sql

I am trying to convert SparkDataFrame to R data frame.
%python
temp_df.createOrReplaceTempView("temp_df_r")
%r
temp_sql = sql("select * from temp_df_r")
temp_r = as.data.frame(temp_sql)
Error in as.data.frame.default(temp_sql) :
cannot coerce class ‘structure("SparkDataFrame", package = "SparkR")’ to a data.frame
Sometimes I get this error and sometimes I don't; I still don't know why it happens only some of the time.

I need more details. What environment are you using?

Related

Error: ('HY000', 'The driver did not supply an error!') - with string

I am trying to push a pandas DataFrame from Python to Impala but am getting a very uninformative error. The code I am using looks like this:
cursor = connection.cursor()
cursor.fast_executemany = True
cursor.executemany(
f"INSERT INTO table({', '.join(df.columns.tolist())}) VALUES ({('?,' * len(df.columns))[:-1]})",
list(df.itertuples(index=False, name=None))
)
cursor.commit()
connection.close()
This works for the first 23 rows and then suddenly throws this error:
Error: ('HY000', 'The driver did not supply an error!')
This doesn't help me locate the issue at all. I've turned all NaN values into None for compatibility. Unfortunately, I can't share the data.
Does anyone have any ideas or leads on how I could solve this? Thanks.
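One way to localize an error like this is to insert in small chunks and fall back to row-by-row inserts inside the chunk that fails. A minimal sketch of that bisection idea, using the standard-library sqlite3 module as a stand-in for the pyodbc/Impala connection (the table, rows, and chunk size here are made up for illustration):

```python
# Sketch: bisect a bulk insert to find the row the driver chokes on.
# sqlite3 (stdlib) stands in for the pyodbc/Impala connection.
import sqlite3

rows = [(i, f"name{i}") for i in range(50)]
rows[23] = (23, object())  # an unbindable value, like the question's row 24

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")

bad_rows = []
for start in range(0, len(rows), 10):
    chunk = rows[start:start + 10]
    try:
        conn.executemany("INSERT INTO t VALUES (?, ?)", chunk)
    except sqlite3.Error:
        # Retry the failing chunk row by row to pinpoint the culprit.
        # (No rollback here, so some rows may be inserted twice; that
        # is fine for diagnosis.)
        for i, row in enumerate(chunk, start=start):
            try:
                conn.execute("INSERT INTO t VALUES (?, ?)", row)
            except sqlite3.Error:
                bad_rows.append(i)
conn.commit()
print(bad_rows)  # -> [23]
```

Once the offending row index is known, inspecting its values (types, lengths, encodings) usually reveals what the driver could not bind.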

ERROR: MethodError: objects of type Tuple{} are not callable

I have never used Julia and I am following a tutorial. I have reached the instruction written below. When I tried Lon3D, Lat3D, Depth3D = LonLatDepthGrid(lon, lat, depth) I got an error saying ERROR: UndefVarError: LonLatDepthGrid not defined
I tried creating the variable using LonLatDepthGrid = ().
I ran the Lon3D... command again and then got the following error: MethodError: objects of type Tuple{} are not callable.
What am I missing?
Thanks,
K
Here is the order in which I am defining variables:
lat = ncread("APVC.ANT+RF.Ward.2014_kmps.nc","latitude")
lon = ncread("APVC.ANT+RF.Ward.2014_kmps.nc","longitude")
depth = ncread("APVC.ANT+RF.Ward.2014_kmps.nc","depth")
Vs_3D = ncread("APVC.ANT+RF.Ward.2014_kmps.nc","vs")
depth = -1 .* depth
Lon3D,Lat3D,Depth3D = LonLatDepthGrid(lon, lat, depth);
You created a tuple called LonLatDepthGrid:
julia> LonLatDepthGrid =()
julia> typeof(LonLatDepthGrid)
Tuple{}
But then you tried to use the tuple as a function:
julia> LonLatDepthGrid(...)
ERROR: MethodError: objects of type Tuple{} are not callable
Also, it's difficult to help you further without knowing what packages you're using.
Please read the Julia documentation about installing packages from their repositories.
GeophysicalModelGenerator.jl needs to be installed:
julia> ]add GeophysicalModelGenerator
...
julia> using GeophysicalModelGenerator
Now this should compile:
Lat = 1.0:3:10.0;
Lon = 11.0:4:20.0;
Depth = (-20:5:-10)*km;
Lon3D,Lat3D,Depth3D = LonLatDepthGrid(Lon, Lat, Depth);
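For readers who have not used the package, the gridding that LonLatDepthGrid performs can be sketched in plain Python: every combination of the three coordinate vectors becomes a grid point, and the three outputs share one shape. The values and names below are illustrative only, not the package's API:

```python
# Sketch of the 3-D gridding idea behind LonLatDepthGrid:
# every (lon, lat, depth) combination gets its own grid point, and
# the three outputs are same-shaped, indexed [i_lon][i_lat][i_depth].
lon = [11.0, 15.0, 19.0]
lat = [1.0, 4.0, 7.0, 10.0]
depth = [-20, -15, -10]

lon3d = [[[lo for _ in depth] for _ in lat] for lo in lon]
lat3d = [[[la for _ in depth] for la in lat] for _ in lon]
depth3d = [[[d for d in depth] for _ in lat] for _ in lon]

print(lon3d[2][0][0], lat3d[2][3][0], depth3d[0][0][2])  # -> 19.0 10.0 -10
```

The Julia function returns three arrays in exactly this relationship, which is why all three can be unpacked from one call.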

Sparklyr : sql temporary error : argument is not interpretable as logical

Hi, I'm new to sparklyr, and I'm essentially running a query to create a temporary object in Spark.
The code is something like
ts_data<-tbl(sc,"db.table") %>% filter(condition) %>% compute("ts_data")
sc is my spark connection.
I have run the same code before and it works but now I get the following error.
Error in if (temporary) sql("TEMPORARY ") : argument is not
interpretable as logical
I have tried changing filters, and tried it with new tables, R versions, and snapshots, yet it still gives the same exact error. I am positive there are no syntax errors.
Can someone help me understand how to fix this?
I ran into the same problem. Changing compute("x") to compute(name = "x") fixed it for me.
This was a bug in sparklyr, and it is fixed as of version 1.7.0. So either pass the name by keyword (compute(name = "x")) or update your sparklyr version.
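The error message makes sense once you see how positional matching can misfire: under the bug, the string name apparently ended up in a parameter that expects a logical. A toy Python illustration of that pitfall, with a hypothetical signature that is not sparklyr's actual one:

```python
# Hypothetical illustration of the bug mechanism (NOT sparklyr's real
# signature): a name passed positionally lands in a parameter that
# expects a boolean, producing a "not interpretable as logical" error.
def compute(temporary=True, name="sparklyr_tmp"):
    if not isinstance(temporary, bool):
        raise TypeError("argument is not interpretable as logical")
    return ("TEMPORARY " if temporary else "") + "VIEW " + name

try:
    compute("ts_data")           # the string fills the boolean slot
except TypeError as e:
    print(e)                     # -> argument is not interpretable as logical

print(compute(name="ts_data"))   # -> TEMPORARY VIEW ts_data
```

Passing the argument by keyword sidesteps the mismatched binding entirely, which is why compute(name = "x") works even on the buggy version.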

Creating list for Pandas_datareader symbol warning

How can I create a list which would log the symbols for each symbol warning? Every time I execute data = web.DataReader(ticker, 'yahoo', start, end) I get symbol warnings; I want to create a list of the symbols I got the warning for. How can I do that?
SymbolWarning: Failed to read symbol: 'BRK.B', replacing with NaN.
warnings.warn(msg.format(sym), SymbolWarning)
Full code :
start = datetime.date(2008,11,1)
end = datetime.date.today()
# df = web.get_data_yahoo(tickers, start, end)
df = web.DataReader(tickers, 'yahoo', start, end)
It looks like yahoo in your case is rejecting requests after a set limit.
I also ran into the same error. Basically we need to catch the warnings and extract the symbols from them.
Easier said than done: I wasn't able to do it with try/except. (If anybody was able to, please let me know.)
I found out that the warnings module has a catch_warnings context manager which I can use to capture the warning messages and extract the symbols from them.
The way I did it (maybe not the best way):
import warnings
import pandas_datareader

list_ = ["AAPL", "TSLA", "XYZ", "QUAL", "IOV"]  # Sample list of symbols
bad_symbol_list = []
df = pandas_datareader.yahoo.daily.YahooDailyReader(list_, start='2008-01-11', end='2020-01-31', interval='d')
with warnings.catch_warnings(record=True) as err:
    warnings.resetwarnings()
    df_processed = df.read()
for w in err:
    print(w.message)
    tmp = w.message.args[0].replace("'", "")
    bad_symbol_list += [tmp.split(" ")[4].replace(",", "")]
print(bad_symbol_list)
Output:
Failed to read symbol: 'XYZ', replacing with NaN.
Failed to read symbol: 'IOV', replacing with NaN.
['XYZ', 'IOV']
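The split-based parsing above breaks if the message wording shifts; a regular expression keyed on the quoted symbol is sturdier. A sketch with synthetic warnings standing in for pandas_datareader's SymbolWarning, so it runs without a network connection:

```python
# Sturdier extraction of the symbol from the warning text, shown with
# synthetic warnings standing in for pandas_datareader's SymbolWarning.
import re
import warnings

def fetch(symbols):
    # Stand-in for the real data fetch: pretend two symbols fail.
    for sym in symbols:
        if sym in ("XYZ", "IOV"):
            warnings.warn(f"Failed to read symbol: '{sym}', replacing with NaN.")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fetch(["AAPL", "TSLA", "XYZ", "QUAL", "IOV"])

bad = []
for w in caught:
    m = re.search(r"Failed to read symbol: '([^']+)'", str(w.message))
    if m:
        bad.append(m.group(1))
print(bad)  # -> ['XYZ', 'IOV']
```

The regex captures whatever sits between the quotes, so symbols containing dots or commas (like 'BRK.B') come out intact without any split/replace bookkeeping.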

Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast

I wanted to compute the sum of a column which contains long-type numbers.
I tried many possible ways, but the cast error is still not resolved.
My pig code:
raw_ds = LOAD '/tmp/bimallik/data/part-r-00098' using PigStorage(',') AS (
d1:chararray, d2:chararray, d3:chararray, d4:chararray, d5:chararray,
d6:chararray, d7:chararray, d8:chararray, d9:chararray );
parsed_ds = FOREACH raw_ds GENERATE d8 as inBytes:long, d9 as outBytes:long;
X = FOREACH parsed_ds GENERATE (long)SUM(parsed_ds.inBytes) AS inBytes;
dump X;
Error snapshot:
2015-11-20 02:16:26,631 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045:
Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast.
Details at logfile: /users/bimallik/pig_1448014584395.log
2015-11-20 02:17:03,629 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complet
@ManjunathBallur Thanks for the input.
I changed my code as below now
<..same as before ...>
A = GROUP parsed_ds by inBytes;
X = FOREACH A GENERATE SUM(parsed_ds.inBytes) as h;
DUMP X;
Now A is generating one bag per distinct inBytes value, and X gives the sum within each bag, which again consists of multiple rows, whereas I need one single summation value.
In local mode (pig -x local) I was getting the same issue.
I tried all the solutions available on the internet, but nothing seemed to work for me.
I then switched Pig from local mode to mapreduce mode and tried the solutions again; they worked.
In mapreduce mode, all the solutions seem to work.
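For completeness: the follow-up above produced multiple rows because GROUP ... BY inBytes makes one bag per distinct value, whereas grouping everything into a single bag (GROUP parsed_ds ALL in Pig) yields one total. The distinction, sketched in Python:

```python
# Per-key sums vs. a single total, mirroring Pig's GROUP BY vs. GROUP ALL.
from collections import defaultdict

in_bytes = [100, 200, 100, 300]

# GROUP parsed_ds BY inBytes -> one sum per distinct value
per_key = defaultdict(int)
for b in in_bytes:
    per_key[b] += b
print(dict(per_key))  # -> {100: 200, 200: 200, 300: 300}

# GROUP parsed_ds ALL -> one bag, one total
print(sum(in_bytes))  # -> 700
```

In Pig itself, the single total comes from something like: grouped = GROUP parsed_ds ALL; X = FOREACH grouped GENERATE SUM(parsed_ds.inBytes);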