I have to import a processed h5ad file, but it seems that X has been passed as a numpy array instead of a numpy matrix. See below:
# Read the data
data_path = "/home/bbb5130/snOMICS/maria/msrna.h5ad"
adata = sn.pp.read_h5ad(data_path, pr_process="Yes")
adata
But the output was:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In [15], line 3
1 # Read the data
2 data_path = "/home/bbb5130/snOMICS/maria/msrna.h5ad"
----> 3 adata = sn.pp.read_h5ad(data_path, pr_process="Yes")
4 adata
File ~/miniconda3/envs/snOMICS/lib/python3.9/site-packages/scanet/preprocessing.py:54, in Preprocessing.read_h5ad(cls, filename, pr_process)
51 return sc.read_h5ad(filename)
52 else:
53 # initial preprocessing as it is required later
---> 54 return cls._intial(adata)
File ~/miniconda3/envs/snOMICS/lib/python3.9/site-packages/scanet/preprocessing.py:35, in Preprocessing._intial(adata)
33 adata.var['mt'] = adata.var_names.str.startswith('MT-')
34 mito_genes = adata.var_names.str.startswith('MT-')
---> 35 adata.obs['percent_mito'] = np.sum(adata[:, mito_genes].X, axis=1).A1 / np.sum(adata.X, axis=1).A1
36 sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, inplace=True)
37 sc.pp.filter_cells(adata, min_genes=0)
AttributeError: 'ArrayView' object has no attribute 'A1'
Is there any way I can change the format so that the file can be read?
Thanks in advance.
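A possible explanation, offered as a sketch rather than a confirmed fix: `.A1` exists only on `numpy.matrix`, which is what `np.sum` returns for a scipy sparse matrix. When `X` is stored as a dense array, the sum is already a plain array and has no `.A1`. The helper names `row_sums` and `percent_mito` below are hypothetical and not part of scanet; they only show a row-sum expression that works for both layouts.

```python
import numpy as np

def row_sums(X):
    """Per-row sums as a flat 1-D array, for dense arrays and matrix-like input."""
    # np.asarray(...).ravel() flattens both an ndarray of shape (n,)
    # and a numpy.matrix of shape (n, 1), so no .A1 is needed.
    return np.asarray(X.sum(axis=1)).ravel()

def percent_mito(X, mito_mask):
    """Fraction of counts per cell that come from the masked (mitochondrial) genes."""
    total = row_sums(X).astype(float)
    mito = row_sums(X[:, mito_mask]).astype(float)
    # Cells with zero total counts get 0 instead of a division warning.
    return np.divide(mito, total, out=np.zeros_like(total), where=total > 0)

# Toy data: 2 cells x 3 genes, only the first gene is "mitochondrial".
X = np.array([[1.0, 3.0, 0.0], [2.0, 2.0, 4.0]])
mask = np.array([True, False, False])
result = percent_mito(X, mask)
```

Since the failing line lives inside the installed package, another option might be to store `X` as a scipy sparse matrix in the file before reading it with scanet; that is an assumption and has not been checked against scanet itself.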
I want to stem my dataset. Before stemming, I tokenized the text with the nltk tokenizer.
You can see the output in the picture:
Dataset
Col Values
But when I apply the stemmer, it returns an error:
TypeError Traceback (most recent call last)
<ipython-input-102-7700a8e3235b> in <module>()
----> 1 df['Message'] = df['Message'].apply(stemmer.stem)
2 df = df[['Message', 'Category']]
3 df.head()
5 frames
/usr/local/lib/python3.7/dist-packages/Sastrawi/Stemmer/Filter/TextNormalizer.py in normalize_text(text)
2
3 def normalize_text(text):
----> 4 result = str.lower(text)
5 result = re.sub(r'[^a-z0-9 -]', ' ', result, flags = re.IGNORECASE|re.MULTILINE)
6 result = re.sub(r'( +)', ' ', result, flags = re.IGNORECASE|re.MULTILINE)
TypeError: descriptor 'lower' requires a 'str' object but received a 'list'
I hope you can help me.
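The traceback says the stemmer received a list where it expected a string, which fits a column that has already been tokenized. A minimal sketch of two ways around that, assuming each row is a list of tokens: stem each token separately, or join the tokens back into one string first. `toy_stem` is a hypothetical stand-in for `stemmer.stem` so the example runs without Sastrawi.

```python
def stem_tokens(tokens, stem):
    """Apply a str -> str stemming function to every token in a list."""
    return [stem(token) for token in tokens]

def stem_joined(tokens, stem):
    """Join the tokens into one string and stem it in a single call."""
    return stem(" ".join(tokens))

def toy_stem(text):
    # Hypothetical stand-in for stemmer.stem: lowercases and strips a
    # trailing "nya" from each word. Not the real Sastrawi algorithm.
    words = text.lower().split()
    return " ".join(w[:-3] if w.endswith("nya") else w for w in words)

rows = [["Bukunya", "bagus"], ["Rumahnya", "besar"]]
stemmed_rows = [stem_tokens(row, toy_stem) for row in rows]
joined_first = stem_joined(rows[0], toy_stem)
```

With the real dataframe this would look something like `df['Message'].apply(lambda tokens: stem_tokens(tokens, stemmer.stem))`.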
I've been trying to transform all my logs into a dict using the xmltodict.parse function.
The thing is, when I convert a single row and assign it to a variable, it works fine:
a = xmltodict.parse(df['CONFIG'][0])
The same goes for:
parsed[1] = xmltodict.parse(df['CONFIG'][1])
But when I try to iterate over the entire dataframe and store the results in a dictionary, I get the following error:
for ind in df['CONFIG'].index:
    parsed[ind] = xmltodict.parse(df['CONFIG'][ind])
---------------------------------------------------------------------------
ExpatError Traceback (most recent call last)
/tmp/ipykernel_31/1871123186.py in <module>
1 for ind in df['CONFIG'].index:
----> 2 parsed[ind] = xmltodict.parse(df['CONFIG'][ind])
/opt/conda/lib/python3.9/site-packages/xmltodict.py in parse(xml_input, encoding, expat, process_namespaces, namespace_separator, disable_entities, **kwargs)
325 parser.ParseFile(xml_input)
326 else:
--> 327 parser.Parse(xml_input, True)
328 return handler.item
329
ExpatError: syntax error: line 1, column 0
Can you try this?
for ind in range(len(df['CONFIG'])):
    parsed[ind] = xmltodict.parse(df['CONFIG'][ind])
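Since single rows parse fine, the error may come from a few rows that do not contain XML at all (plain text, or a missing value). A rough sketch for locating them: parse every row inside a try/except and collect the failures instead of stopping at the first one. `check_xml` is a hypothetical stand-in built on the standard-library expat parser, the same parser the traceback shows xmltodict calling; `parse_all` is also a made-up helper name.

```python
from xml.parsers.expat import ExpatError, ParserCreate

def check_xml(text):
    """Stand-in for xmltodict.parse: raises ExpatError on malformed input."""
    ParserCreate().Parse(text, True)
    return True

def parse_all(items, parse):
    """Parse every (key, value) pair, returning the results and the failures."""
    parsed, failed = {}, {}
    for key, value in items:
        try:
            parsed[key] = parse(value)
        except (ExpatError, TypeError) as exc:
            # TypeError covers non-string values such as NaN.
            failed[key] = repr(exc)
    return parsed, failed

rows = {0: "<cfg><a>1</a></cfg>", 1: "not xml", 2: float("nan"), 3: "<cfg/>"}
parsed, failed = parse_all(rows.items(), check_xml)
```

On the dataframe this could be called as `parse_all(df['CONFIG'].items(), xmltodict.parse)`, after which `failed` lists the index labels worth inspecting.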
I followed the template and changed the link, but it doesn't work:
https://colab.research.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/lite/g3doc/tutorials/model_maker_image_classification.ipynb#scrollTo=3jz5x0JoskPv
This is my dataset:
https://firebasestorage.googleapis.com/v0/b/lol-fypproject.appspot.com/o/lol.tgz?alt=media&token=d07b81bd-442f-4ebe-920e-3772598fbb20
Original code:
image_path = tf.keras.utils.get_file(
    'flower_photos.tgz',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
    extract=True)
image_path = os.path.join(os.path.dirname(image_path), 'flower_photos')
I changed it to:
image_path = tf.keras.utils.get_file(
    'lol.tgz',
    'https://firebasestorage.googleapis.com/v0/b/lol-fypproject.appspot.com/o/lol.tgz?alt=media&token=d07b81bd-442f-4ebe-920e-3772598fbb20',
    extract=True)
image_path = os.path.join(os.path.dirname(image_path), 'lol')
The failing line and the error message are shown below:
data = ImageClassifierDataLoader.from_folder(image_path)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-15-a5e7646aca55> in <module>()
----> 1 data = ImageClassifierDataLoader.from_folder(image_path)
2 train_data, test_data = data.split(0.9)
/usr/local/lib/python3.7/dist-packages/tensorflow_examples/lite/model_maker/core/data_util/image_dataloader.py in from_folder(cls, filename, shuffle)
69 all_image_size = len(all_image_paths)
70 if all_image_size == 0:
---> 71 raise ValueError('Image size is zero')
72
73 if shuffle:
ValueError: Image size is zero
I have found the problem:
the folder structure inside my archive does not match the structure of the sample archive.
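A quick way to check the layout before calling `from_folder` is to count the image files in each subfolder. The sketch below assumes the loader expects one subfolder per class directly under `image_path`, as in the `flower_photos` sample; the helper name `count_images_per_class` and the list of suffixes are my own choices, not part of Model Maker.

```python
from pathlib import Path
import tempfile

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}

def count_images_per_class(root):
    """Map each immediate subfolder of root to the number of image files in it."""
    root = Path(root)
    counts = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts

# Demo on a throwaway directory that mimics the expected layout.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("cat/1.jpg", "cat/2.png", "dog/1.jpg", "notes.txt"):
        path = Path(tmp) / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    demo_counts = count_images_per_class(tmp)
```

An empty result, or counts that are all zero, would suggest that the images sit one level deeper (or shallower) than the loader looks.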
In a machine learning project, suppose I have 3 cat images and 2 dog images, and I build a dataframe for the training data like this:
# preprocess the training data
filenames = os.listdir('/content/train')
categories = []
for filename in os.listdir('/content/train'):
    category = filename.split('.')[0]
    if category == 'dog':
        categories.append(1)  # 1 for dog and 0 for cat
    else:
        categories.append(0)
# build the dataframe from a dictionary
df = pd.DataFrame(
    {
        'filename': filenames,
        'category': categories
    }
)
It gives an error, I think because I don't have the same number of dog and cat images:
ValueError Traceback (most recent call last)
<ipython-input-28-2d4e2440ba41> in <module>()
12 {
13 'filename' : filenames,
---> 14 'category' : categories
15 }
16 )
3 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/internals/construction.py in extract_index(data)
395 lengths = list(set(raw_lengths))
396 if len(lengths) > 1:
--> 397 raise ValueError("arrays must all be same length")
398
399 if have_dicts:
ValueError: arrays must all be same length
Is there any way to fix it without adding any image to the training dataset?
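The error does not depend on how many images each class has; pandas only requires that the `filenames` and `categories` lists have the same length. A likely cause (an assumption, since the whole notebook is not shown) is that `categories` was appended to more than once, for example by re-running the loop cell without resetting the list. A sketch that avoids the mismatch by building one record per file in a single pass, with the hypothetical helper name `label_files`:

```python
def label_files(filenames):
    """Return one {'filename', 'category'} record per file (1 = dog, 0 = cat)."""
    rows = []
    for filename in filenames:
        prefix = filename.split(".")[0]
        rows.append({"filename": filename, "category": 1 if prefix == "dog" else 0})
    return rows

# 3 cat images and 2 dog images, as in the question.
rows = label_files(["cat.0.jpg", "cat.1.jpg", "cat.2.jpg", "dog.0.jpg", "dog.1.jpg"])
categories = [row["category"] for row in rows]
```

Passing the records to `pd.DataFrame(rows)` then yields the two columns directly, and because each record carries both values the columns cannot differ in length.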
I am using an amazon dataset to do sentiment analysis. Dataset content is
https://i.stack.imgur.com/qcKZp.png
The dataset can be found at:
https://www.kaggle.com/PromptCloudHQ/amazon-reviews-unlocked-mobile-phones
I am trying to remove HTML from the Reviews column.
This is what I am doing. Note: dataset is assigned to df.
df_removedNoise = []
def removingHTML(text):
    soup = BeautifulSoup(text, 'lxml').get_text()
    return soup
def removingNoise(text):
    html_removed = removingHTML(text)
    return html_removed
for i in df["Reviews"]:
    text = removingNoise(i)
    df_removedNoise.append(text)
Even though the Reviews column has the object dtype, I still get the following error:
TypeError Traceback (most recent call last)
<ipython-input-83-3591f5d7a54f> in <module>
9
10 for i in df["Reviews"]:
---> 11 df_removedNoise.append(removingNoise(i))
<ipython-input-83-3591f5d7a54f> in removingNoise(text)
5
6 def removingNoise(text):
----> 7 html_removed = removingHTML(text)
8 return html_removed
9
<ipython-input-83-3591f5d7a54f> in removingHTML(text)
1 df_removedNoise = []
2 def removingHTML(text):
----> 3 soup = BeautifulSoup(text, 'lxml').get_text()
4 return soup
5
~/anaconda3/lib/python3.7/site-packages/bs4/__init__.py in __init__(self, markup, features, builder, parse_only, from_encoding, exclude_encodings, **kwargs)
244 if hasattr(markup, 'read'): # It's a file-type object.
245 markup = markup.read()
--> 246 elif len(markup) <= 256 and (
247 (isinstance(markup, bytes) and not b'<' in markup)
248 or (isinstance(markup, str) and not '<' in markup)
TypeError: object of type 'float' has no len()
Any help will be appreciated!
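Check for NaN values with df[df['Reviews'].isnull()]. If you find any, drop them with dropna first. An object column can still hold NaN, which is a float, and that is what the TypeError is reporting.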
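If dropping rows is not desirable, another option is to skip non-string values while cleaning. The sketch below is only an illustration: `strip_html` is a hypothetical stand-in for `BeautifulSoup(text, 'lxml').get_text()` built on the standard library so it runs without bs4, and `clean_reviews` is a made-up helper name.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects only the text content of an HTML fragment."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_html(text):
    # Stand-in for BeautifulSoup(text, 'lxml').get_text().
    extractor = _TextExtractor()
    extractor.feed(text)
    extractor.close()
    return "".join(extractor.parts)

def clean_reviews(values, strip=strip_html):
    """Strip HTML from string values; map NaN and other non-strings to ''."""
    return [strip(v) if isinstance(v, str) else "" for v in values]

reviews = ["<p>Great phone</p>", float("nan"), "Works <b>fine</b>"]
cleaned = clean_reviews(reviews)
```

On the dataframe, `clean_reviews(df["Reviews"])` would keep the row count unchanged, whereas `df.dropna(subset=["Reviews"])` removes the empty reviews entirely.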