pandas read .txt doc with specific separator

I'm trying to read a .txt file with pandas. The file uses ^ as its separator.
I keep running into this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 7: invalid start byte
I tried pd.read_csv(txt.file, sep='^', header=None).
The file has no headers.
Am I missing an argument?
UPDATE:
13065^000000000^aaaaa^test , conditions^123455^^01.01.01:Date^^^^^^
77502^000000123^aaaaa^test, conditions^123456^^^^^^^^
It seems there is an uneven number of ^ separators on each row.
How can I fix this?
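A minimal sketch of one way to read such a file, assuming it is encoded as Latin-1 (byte 0xf8 is ø in that encoding; the question does not say what the real encoding is). The sample bytes are modelled on the two rows above, and the ø is inserted purely to reproduce the decode error. `sep`, `header`, and `encoding` are standard `pd.read_csv` parameters.

```python
import io

import pandas as pd

# Hypothetical sample data modelled on the rows in the question; the "ø"
# (byte 0xf8 in Latin-1) is added here only to trigger the UTF-8 problem.
raw = (
    "13065^000000000^aaaaa^test , cønditions^123455^^01.01.01:Date^^^^^^\n"
    "77502^000000123^aaaaa^test, conditions^123456^^^^^^^^\n"
).encode("latin-1")

# Assumption: the file is Latin-1 (ISO-8859-1). With the default UTF-8
# decoding, the 0xf8 byte would raise UnicodeDecodeError.
df = pd.read_csv(io.BytesIO(raw), sep="^", header=None, encoding="latin-1")

print(df.shape)      # rows and columns that were parsed
print(df.iloc[0, 3])  # the field containing the non-ASCII character
```

With a real file, the `io.BytesIO(raw)` argument would be replaced by the file path. If the actual encoding is something else (for example cp1252), the same call applies with that name.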

Related

Error: 'utf-8' codec can't decode byte 0xb2 in position 181: invalid start byte

I'm trying to run the following code:
import pandas as pd

file_list = ['TC20_test1.csv', 'TC20_test2.csv', 'TC20_test3.csv', 'TC20_test4.csv', 'TC20_test5.csv']
main_dataframe = pd.read_csv(file_list[0])  # Read the first file
for i in range(1, len(file_list)):
    df = pd.read_csv(file_list[i])  # read_csv already returns a DataFrame
    main_dataframe = pd.concat([main_dataframe, df], axis=1)  # Add the new columns side by side
print(main_dataframe)
The main idea is to extract data from the same place in various CSV files and add it to a data frame. However, I get this error every time I try to run it:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 181: invalid start byte.
Any help would be greatly appreciated :))
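One possible way to narrow down the encoding is to try a few candidates in order. The sketch below uses only the standard library; `decode_with_fallback` is a hypothetical helper name, and the candidate list (UTF-8, cp1252, Latin-1) is an assumption based on byte 0xb2 being "²" in the Western European encodings. A clean decode does not prove the guess is right, so the result is worth checking by eye.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes raw cleanly."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError("none of the candidate encodings could decode the data")


# Made-up sample containing byte 0xb2, which is not valid as a UTF-8 start byte.
sample = "area,m\xb2\n10,5\n".encode("cp1252")

text, used = decode_with_fallback(sample)
print(used)
print(text)
```

The encoding name found this way can then be passed as `encoding=` to each `pd.read_csv` call in the loop.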

Error 'utf-8' codec can't decode byte 0xf4

I'm trying to open a CSV file, but I get the error 'utf-8' codec can't decode byte 0xf4 in position 1: invalid continuation byte. What I did is:
df = pd.read_csv("covidrisks.csv")
I had to save the file as CSV UTF-8, and that solved my problem.

'utf-8' codec can't decode byte 0xdb in position 1:

The following code gives me an error:
import pandas as pd
path = 'C:\\Users\\vlac284\\Desktop\\Lighting\\sample_Myer\\sample_2.xlsx'
df = pd.read_csv(path)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdb in position 1: invalid continuation byte
Similar postings did not help.
You have an Excel file (.xlsx) and you are trying to read it as CSV with read_csv. These are two different formats, so that will not work.
Either save your Excel file as .csv or use a reader meant for Excel files, such as pd.read_excel.
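As a rough way to check what kind of file you actually have: an .xlsx workbook is a ZIP archive, while a CSV is plain text. The sketch below uses the standard-library `zipfile.is_zipfile`; `looks_like_xlsx` is a hypothetical helper name, and the in-memory ZIP is only a stand-in for a real workbook.

```python
import io
import zipfile


def looks_like_xlsx(data):
    """Heuristic: .xlsx files are ZIP archives, plain CSV text is not."""
    return zipfile.is_zipfile(io.BytesIO(data))


# Build a tiny in-memory ZIP as a stand-in for a real workbook.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xl/workbook.xml", "<workbook/>")

zip_bytes = buf.getvalue()
csv_bytes = b"name,value\nalpha,1\nbeta,2\ngamma,3\n"

print(looks_like_xlsx(zip_bytes))
print(looks_like_xlsx(csv_bytes))
```

This only tells you the container type; any ZIP file passes the check, so it is a hint and not proof that the file is a workbook.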

Loading data from csv file using pandas

File "<ipython-input-10-1de27a02fcfe>", line 1
df = pd.read_csv('C:\Users\niharika\Downloads\questions-data.csv')
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
Even though I tried placing r in front of C, changing \ to /, and also using \\, nothing worked for me. Could someone please help me?
Just try:
df = pd.read_csv('C:\\Users\\niharika\\Downloads\\questions-data.csv')
Try adding an r in front of the string, like this:
df = pd.read_csv(r'C:\Users\niharika\Downloads\questions-data.csv')
This case seems similar to yours
Error reading csv file unicodeescape
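To illustrate why these suggestions are equivalent, here is a short sketch using the path from the question. The raw string and the doubled-backslash string are the same text, and `pathlib.PureWindowsPath` treats forward slashes as the same separator. Note that the `r` prefix goes before the opening quote, not inside the string.

```python
from pathlib import PureWindowsPath

# Three ways to write the same Windows path without triggering \U or \n escapes.
raw_path = r'C:\Users\niharika\Downloads\questions-data.csv'
escaped_path = 'C:\\Users\\niharika\\Downloads\\questions-data.csv'
slash_path = 'C:/Users/niharika/Downloads/questions-data.csv'

# The raw string and the escaped string contain identical characters.
print(raw_path == escaped_path)

# As Windows paths, the forward-slash form refers to the same location.
print(PureWindowsPath(slash_path) == PureWindowsPath(raw_path))
```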

Python can't decode \x byte

I have a csv file with about 9 million rows. While processing it in Python, I got an error:
UnicodeEncodeError: 'charmap' codec can't encode character '\xe9' in position 63: character maps to <undefined>
It turns out the string is Beyonc\xe9, so I guess it's something like é.
I tried just printing '\xe' in Python and it failed:
>>> print('\xe')
File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-2: truncated \xXX escape
So I can't even replace or strip the backslash by s.replace('\\x', '') or s.strip('\\x').
Is there a quick way to fix this over the whole file? I tried to set the encoding while reading the file:
pandas.read_csv(inputFile, encoding='utf-8')
but it didn't help. Same problem.
Python version:
python --version
Python 3.5.2
although I installed 3.6.5
Windows 10
Update:
Following @Matti's answer, I changed the encoding in pandas.read_csv() to latin1, and now the string became Beyonc\xc3\xa9. \xc3\xa9 is the UTF-8 encoding of é.
This is the line that's failing:
print(str(title) + ' , ' + str(artist))
title = 'Crazy In Love'
artist = 'Beyonc\xc3\xa9'
The API is from lyricsgenius.
The '\xe9' in the error message isn't an actual backslash followed by letters; it's just Python's escaped representation of a single character (é, which is byte 0xE9 in Latin-1). Your file is probably encoded as Latin-1, not UTF-8 as you specified. Pass 'latin1' as the encoding instead.
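A small sketch of how the same name looks under the two encodings discussed here, using only built-in `str.encode` and `bytes.decode`. It also shows what happens when UTF-8 bytes are decoded as Latin-1, which may explain the Beyonc\xc3\xa9 seen in the update. The variable names are just for illustration.

```python
name = "Beyonc\xe9"  # '\xe9' is an escape for the single character é

# The same text as bytes in each encoding.
latin1_bytes = name.encode("latin-1")  # é becomes one byte: 0xe9
utf8_bytes = name.encode("utf-8")      # é becomes two bytes: 0xc3 0xa9

# Decoding UTF-8 bytes as Latin-1 never fails, but it yields two wrong
# characters in place of é.
mojibake = utf8_bytes.decode("latin-1")

# Decoding Latin-1 bytes as UTF-8 fails, because 0xe9 starts a multi-byte
# sequence that is never completed.
try:
    latin1_bytes.decode("utf-8")
    utf8_decode_failed = False
except UnicodeDecodeError:
    utf8_decode_failed = True

print(len(name), latin1_bytes, utf8_bytes)
print(mojibake, utf8_decode_failed)
```

If reading with latin1 produces \xc3\xa9 where é is expected, that pattern suggests the bytes on disk were UTF-8 after all, and the original UnicodeEncodeError may come from the console's encoding when printing and not from reading the file.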