ValueError: Protocol not known: http:https - dataframe

I am trying to upload a dataset of image links and captions to Hugging Face datasets:

import pandas as pd
from datasets import Dataset, Features, Image, Value

df1 = pd.read_csv('xx.csv')
features = Features({
    'link': Image(decode=True),
    'caption': Value(dtype='string'),
})
ds = Dataset.from_pandas(df1, features=features)
ds.features
ds.push_to_hub("xx/xx")

I get the error below:
100%|██████████| 3/3 [14:03<00:00, 281.08s/ba]
Pushing dataset shards to the dataset hub:   1%|▏ | 1/67 [03:27<3:48:29, 207.72s/it]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-58ace97f5644> in <module>
4 ds.features
5
----> 6 ds.push_to_hub("xx/xx")
c:\Users\96654\anaconda3\lib\site-packages\datasets\arrow_dataset.py in push_to_hub(self, repo_id, split, private, token, branch, max_shard_size, num_shards, shard_size, embed_external_files)
4942 max_shard_size=max_shard_size,
4943 num_shards=num_shards,
-> 4944 embed_external_files=embed_external_files,
4945 )
4946 organization, dataset_name = repo_id.split("/")
c:\Users\96654\anaconda3\lib\site-packages\datasets\arrow_dataset.py in _push_parquet_shards_to_hub(self, repo_id, split, private, token, branch, max_shard_size, num_shards, embed_external_files)
4805 desc="Pushing dataset shards to the dataset hub",
4806 total=num_shards,
-> 4807 disable=not logging.is_progress_bar_enabled(),
4808 ):
4809 shard_path_in_repo = path_in_repo(index, shard)
c:\Users\96654\anaconda3\lib\site-packages\tqdm\std.py in __iter__(self)
1163
1164 try:
-> 1165 for obj in iterable:
...
--> 240 raise ValueError("Protocol not known: %s" % protocol)
241 bit = known_implementations[protocol]
242 try:
ValueError: Protocol not known: http:https
I'm uploading an image-caption CSV file to Hugging Face datasets, decoding the image links so they display as an image column.
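The protocol name in the error, http:https, suggests fsspec is being handed links with a doubled scheme (for example http:https://...), which it cannot resolve when push_to_hub tries to embed the external image files. A minimal sketch, assuming that is what the CSV contains (the column name link comes from the code above; the regex is an assumption about the malformed prefix):

import pandas as pd
from datasets import Dataset, Features, Image, Value

df1 = pd.read_csv('xx.csv')

# Assumption: some rows look like "http:https://...", which fsspec parses
# as the unknown protocol "http:https". Strip the redundant leading scheme.
df1['link'] = df1['link'].str.replace(r'^https?:(?=https?://)', '', regex=True)

features = Features({
    'link': Image(decode=True),
    'caption': Value(dtype='string'),
})
ds = Dataset.from_pandas(df1, features=features)
ds.push_to_hub("xx/xx")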

Related

Converting financial information to pandas data frame

I am trying to get stock data such as the balance sheet, income statement, and cash flow for multiple stocks and convert it to a data frame for manipulation.
Here is the data-fetching part of the code:

import yahoo_fin.stock_info as yfs

tickers = ['AMZN', 'AAPL', 'MSFT', 'DIS', 'GOOG']
balance_sheet = []
income_statement = []
cash_flow = []
balance_sheet.append({ticker: yfs.get_balance_sheet(ticker) for ticker in tickers})
income_statement.append({ticker: yfs.get_income_statement(ticker) for ticker in tickers})
cash_flow.append({ticker: yfs.get_cash_flow(ticker) for ticker in tickers})

This part works well and returns a dictionary for each category. I then do this:

my_dict = cash_flow + balance_sheet + income_statement
dff = pd.DataFrame.from_dict(my_dict, orient='columns', dtype=None, columns=None)

Note that when I try orient='index' I get the following error message:
AttributeError                            Traceback (most recent call last)
<ipython-input> in <module>
      1 my_dict=cash_flow+balance_sheet+income_statement
----> 2 dff=pd.DataFrame.from_dict(my_dict, orient='index', dtype=None, columns=None)
      3 # dff=dff.set_index('endDate')
      4 dff
      5 # cash_flow

/opt/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py in from_dict(cls, data, orient, dtype, columns)
   1361         if len(data) > 0:
   1362             # TODO speed up Series case
-> 1363             if isinstance(list(data.values())[0], (Series, dict)):
   1364                 data = _from_nested_dict(data)
   1365             else:

AttributeError: 'list' object has no attribute 'values'
If someone could let me know what I'm doing wrong, that would be very much appreciated! :)
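For what it's worth, the AttributeError happens because cash_flow + balance_sheet + income_statement concatenates three lists, while DataFrame.from_dict(orient='index') expects a dict (it calls data.values()). A minimal sketch, assuming each yfs.get_* call returns a DataFrame: keep plain dicts keyed by ticker instead of wrapping them in lists, then let pd.concat add the tickers as an index level.

import pandas as pd
import yahoo_fin.stock_info as yfs

tickers = ['AMZN', 'AAPL', 'MSFT', 'DIS', 'GOOG']

# Plain dicts of {ticker: DataFrame}, no enclosing lists
cash_flow = {ticker: yfs.get_cash_flow(ticker) for ticker in tickers}
balance_sheet = {ticker: yfs.get_balance_sheet(ticker) for ticker in tickers}
income_statement = {ticker: yfs.get_income_statement(ticker) for ticker in tickers}

# pd.concat on a dict of DataFrames uses the keys as an outer index level
dff = pd.concat(cash_flow, names=['ticker'])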

I can't retrieve footprints from place information

When I try to retrieve footprints from a place name using

import osmnx as ox

tags = {'building': True}
gdf = ox.geometries_from_place('Piedmont, California, USA', tags)
I get the following error message:
IllegalArgumentException: Argument must be Polygonal or LinearRing
PredicateError: Failed to evaluate <_FuncPtr object at 0x13a2ea120>
In the past, I successfully used the old method ox.footprints_from_place() to retrieve footprints. However, it does not work anymore, and neither does the new method. Has anybody had the same issues with the new version (1.0.1) of the osmnx package?
Due to Stack Overflow restrictions I can't post the complete traceback. It seems that osmnx does not create the required polygon. The first error entries are:
---------------------------------------------------------------------------
PredicateError Traceback (most recent call last)
<ipython-input-13-98877af3189c> in <module>
1 import osmnx as ox
2 tags = {'building': True}
----> 3 gdf = ox.geometries_from_place('Piedmont, California, USA', tags)
/opt/anaconda3/envs/gerdaenv/lib/python3.7/site-packages/osmnx/geometries.py in geometries_from_place(query, tags, which_result, buffer_dist)
214
215 # create GeoDataFrame using this polygon(s) geometry
--> 216 gdf = geometries_from_polygon(polygon, tags)
217
218 return gdf
/opt/anaconda3/envs/gerdaenv/lib/python3.7/site-packages/osmnx/geometries.py in geometries_from_polygon(polygon, tags)
264
265 # create GeoDataFrame from the downloaded data
--> 266 gdf = _create_gdf(response_jsons, polygon, tags)
267
268 return gdf
/opt/anaconda3/envs/gerdaenv/lib/python3.7/site-packages/osmnx/geometries.py in _create_gdf(response_jsons, polygon, tags)
428
429 # Apply .buffer(0) to any invalid geometries
--> 430 gdf = _buffer_invalid_geometries(gdf)
431
432 # Filter final gdf to requested tags and query polygon
/opt/anaconda3/envs/gerdaenv/lib/python3.7/site-packages/osmnx/geometries.py in _buffer_invalid_geometries(gdf)
891
892 # create a filter for rows with invalid geometries
--> 893 invalid_geometry_filter = ~gdf["geometry"].is_valid
894
895 # if there are invalid geometries
/opt/anaconda3/envs/gerdaenv/lib/python3.7/site-packages/geopandas/base.py in is_valid(self)
168 """Returns a ``Series`` of ``dtype('bool')`` with value ``True`` for
169 geometries that are valid."""
--> 170 return _delegate_property("is_valid", self)
171
172 #property
The last traceback messages are :
/opt/anaconda3/envs/gerdaenv/lib/python3.7/site-packages/shapely/predicates.py in __call__(self, this)
23 def __call__(self, this):
24 self._validate(this)
---> 25 return self.fn(this._geom)
/opt/anaconda3/envs/gerdaenv/lib/python3.7/site-packages/shapely/geos.py in errcheck_predicate(result, func, argtuple)
582 """Result is 2 on exception, 1 on True, 0 on False"""
583 if result == 2:
--> 584 raise PredicateError("Failed to evaluate %s" % repr(func))
585 return result
586
PredicateError: Failed to evaluate <_FuncPtr object at 0x13a2ea120>
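The traceback bottoms out in shapely's is_valid predicate, so before blaming osmnx it may help to check the boundary polygon that the query geocodes to. A minimal debugging sketch, assuming osmnx 1.0.1, where geocode_to_gdf is the public geocoding helper:

import osmnx as ox

# Geocode the place query and inspect the boundary geometry directly
gdf_place = ox.geocode_to_gdf('Piedmont, California, USA')
polygon = gdf_place['geometry'].iloc[0]
print(polygon.geom_type)  # expect 'Polygon' or 'MultiPolygon'
print(polygon.is_valid)   # if this raises the same PredicateError, the
                          # shapely/GEOS install is broken, not osmnx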

TFLite model load error "RuntimeError: Encountered unresolved custom op: BroadcastArgs.Node number 0 (BroadcastArgs) failed to prepare."

I'm trying to load a TF Lite model. The model was converted from a TF Agents evaluation policy.

import os
import tensorflow as tf

tf__dir = os.path.join(path_to_model)
interpreter = tf.lite.Interpreter(model_path=tf__dir)
interpreter.allocate_tensors()

And this is the error:
RuntimeError Traceback (most recent call last)
<ipython-input-17-ee8c0e46aeb2> in <module>
5 print(interpreter.get_input_details())
6 print(interpreter.get_output_details())
----> 7 interpreter.allocate_tensors()
8
9 # Get input and output tensors.
~/anaconda3/envs/genv/lib/python3.7/site-packages/tensorflow/lite/python/interpreter.py in allocate_tensors(self)
240 def allocate_tensors(self):
241 self._ensure_safe()
--> 242 return self._interpreter.AllocateTensors()
243
244 def _safe_to_run(self):
~/anaconda3/envs/genv/lib/python3.7/site-packages/tensorflow/lite/python/interpreter_wrapper/tensorflow_wrap_interpreter_wrapper.py in AllocateTensors(self)
108
109 def AllocateTensors(self):
--> 110 return _tensorflow_wrap_interpreter_wrapper.InterpreterWrapper_AllocateTensors(self)
111
112 def Invoke(self):
RuntimeError: Encountered unresolved custom op: BroadcastArgs.Node number 0 (BroadcastArgs) failed to prepare.
How can I solve it? I can't find any information about similar errors on the Internet.
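BroadcastArgs has no TFLite builtin kernel in older releases, so the interpreter fails while preparing the graph. A minimal sketch of one common workaround, assuming you control the conversion step: re-convert the policy with select TF ops enabled so unsupported ops fall back to their TensorFlow kernels (saved_model_dir is a placeholder for your policy's SavedModel path):

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # use TFLite builtins where possible
    tf.lite.OpsSet.SELECT_TF_OPS,    # fall back to TF kernels (Flex) otherwise
]
tflite_model = converter.convert()

with open('policy.tflite', 'wb') as f:
    f.write(tflite_model)

The resulting file needs the Flex delegate at runtime; the interpreter bundled with the full tensorflow pip package includes it, unlike the standalone tflite-runtime package.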

I am not able to load a GitHub data link in Google Colab

what I am trying:
url ='https://github.com/Anubhav1107/Machine_Learning_A-Z/blob/master/Part%202%20-%20Regression/Section%205%20-%20Multiple%20Linear%20Regression/50_Startups.csv'
dataset = pd.read_csv(url)
and what I am getting:
ParserError                               Traceback (most recent call last)
<ipython-input> in <module>()
      1 url ='https://github.com/Anubhav1107/Machine_Learning_A-Z/blob/master/Part%202%20-%20Regression/Section%205%20-%20Multiple%20Linear%20Regression/50_Startups.csv'
----> 2 dataset = pd.read_csv(url)

/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in read(self, nrows)
   1993     def read(self, nrows=None):
   1994         try:
-> 1995             data = self._reader.read(nrows)
   1996         except StopIteration:
   1997             if self._first_chunk:

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()
pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error()

ParserError: Error tokenizing data. C error: Expected 1 fields in line 75, saw 7
Use the raw file URL instead:

url = 'https://raw.githubusercontent.com/Anubhav1107/Machine_Learning_A-Z/master/Part%202%20-%20Regression/Section%205%20-%20Multiple%20Linear%20Regression/50_Startups.csv'

Otherwise you'll just get the GitHub HTML page, not the CSV.
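A small sketch of how the raw URL can also be derived programmatically from a standard github.com blob link (the string rewrite is an assumption that holds for this URL shape):

import pandas as pd

url = 'https://github.com/Anubhav1107/Machine_Learning_A-Z/blob/master/Part%202%20-%20Regression/Section%205%20-%20Multiple%20Linear%20Regression/50_Startups.csv'
# Rewrite the HTML "blob" page URL into the raw-content URL
raw_url = url.replace('github.com', 'raw.githubusercontent.com').replace('/blob/', '/')
dataset = pd.read_csv(raw_url)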

Why does this error appear during fit while creating a Decision Tree Classifier?

Hi, I am trying out a Decision Tree Classifier by following the video Hello World - Machine Learning Recipes #1 by Google Developers.
Here is my code:
# Import the Pandas library
import pandas as pd

# Load the train and test datasets to create two DataFrames
train_url = "http://s3.amazonaws.com/assets.datacamp.com/course/Kaggle/train.csv"
train = pd.read_csv(train_url)
# Print the head of the train dataframe
train.head()

test_url = "http://s3.amazonaws.com/assets.datacamp.com/course/Kaggle/test.csv"
test = pd.read_csv(test_url)
# Print the head of the test dataframe
test.head()

# From sklearn import tree
from sklearn import tree

# Find the best features to predict survival rate
# Define X_features and Y_labels
col_names = ['Pclass', 'Age', 'SibSp', 'Parch']
X_features = train[col_names]
# Assign survival to labels
Y_labels = train.Survived

# Create a decision tree classifier
clf = tree.DecisionTreeClassifier()
# Fit (find patterns in data)
clf = clf.fit(X_features, Y_labels)
clf.predict(test[col_names])
I am getting this error:
ValueError                                Traceback (most recent call last)
<ipython-input> in <module>()
     13 #Y_train_sparse=Y_labels.to_sparse()
     14 # fit (find patterns in Data)
---> 15 clf=clf.fit(X_features, Y_labels)
     16 #clf.predict(test[col_names])

C:\Users\nitinahu\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\tree\tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    152         random_state = check_random_state(self.random_state)
    153         if check_input:
--> 154             X = check_array(X, dtype=DTYPE, accept_sparse="csc")
    155         if issparse(X):
    156             X.sort_indices()

C:\Users\nitinahu\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    396                             % (array.ndim, estimator_name))
    397     if force_all_finite:
--> 398         _assert_all_finite(array)
    399
    400     shape_repr = _shape_repr(array.shape)

C:\Users\nitinahu\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\utils\validation.py in _assert_all_finite(X)
     52             and not np.isfinite(X).all()):
     53         raise ValueError("Input contains NaN, infinity"
---> 54                          " or a value too large for %r." % X.dtype)
     55
     56

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
Just check all the values you are getting in the responses.
One or two are giving out-of-bound values, and that is causing an overflow to occur.
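Concretely, the "Input contains NaN" part of the message is usually the culprit here: the Age column of this Titanic dataset has missing entries, and DecisionTreeClassifier cannot handle NaN. A minimal sketch of one common remedy, with median imputation as an assumed (not the only) fill strategy:

# Continues from the question's code: train, test, col_names, Y_labels, tree.
# Fill missing values before fitting; the median is one hypothetical choice.
X_features = train[col_names].fillna(train[col_names].median())
test_features = test[col_names].fillna(test[col_names].median())

clf = tree.DecisionTreeClassifier()
clf = clf.fit(X_features, Y_labels)
clf.predict(test_features)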