How do you read a txt file (from SQLCMD) into Pandas DataFrame? - sql

I've searched Google but haven't found a way to parse SQL txt file output and import it as a Pandas DataFrame. I have, within the cmd line:
sqlcmd -S server_name -E -Q "select top 10 * from table_name" -o "test.txt"
This produces a text file, which isn't exactly the best format, since it has dashed lines and a comment saying (10 rows affected), but whatever.
Now, I do:
import numpy as np
import pandas as pd
df_test = pd.read_csv('test.txt', sep = ' ')
And it produces an error:
ParserError: Error tokenizing data. C error: Expected 10006 fields in line 3, saw 14963
Anyone know how to parse a SQL text file within Python?
Thanks!
Edit: This would be the first column in the txt file:

Add error handling to the read_csv. Note that read_csv has no errors='coerce' option (that belongs to pd.to_numeric); to skip unparseable rows use on_bad_lines='skip' (error_bad_lines=False before pandas 1.3), and a whitespace regex handles sqlcmd's padded columns better than a single space:
df_test = pd.read_csv('test.txt', sep=r'\s+', skiprows=[1], on_bad_lines='skip')
Here skiprows=[1] drops the dashed underline that sqlcmd prints under the header row.
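Since sqlcmd output is column-aligned rather than truly delimited, pd.read_fwf (pandas' fixed-width reader) may parse it more reliably than read_csv. A minimal sketch using simulated output — the column names and values here are made up, not taken from the question's table:

```python
import io
import pandas as pd

# Simulated sqlcmd-style output: header, dashed underline, space-padded
# columns, and a trailing "(n rows affected)" comment.
raw = (
    "id          name      \n"
    "----------- ----------\n"
    "1           alice     \n"
    "2           bob       \n"
    "\n"
    "(2 rows affected)\n"
)

# Strip the lines that are not data before handing the text to pandas.
lines = []
for line in raw.splitlines():
    s = line.strip()
    if not s:                      # blank line before the row-count comment
        continue
    if set(s) <= {"-", " "}:       # the dashed underline row
        continue
    if s.startswith("("):          # "(n rows affected)"
        continue
    lines.append(line)

# read_fwf infers the fixed-width column boundaries from the remaining lines.
df = pd.read_fwf(io.StringIO("\n".join(lines)))
print(df)
```

For a real file, replace the io.StringIO wrapper with the path to test.txt and adjust the filters to whatever noise your sqlcmd output actually contains.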

Related

How to solve delimiter conflicts

I have a large .TXT file which is delimited by ";". Unfortunately some of my values contain ";" as well, which in that case is not a delimiter but is recognized as one by pandas. Because of this I have difficulty reading the .txt files into pandas, since some lines have more columns than others. Background: I am trying to combine several .txt files into one DataFrame and get the following error: ParserError: Error tokenizing data. C error: Expected 21 fields in line 443, saw 22.
Checking line 443 confirmed that the line had one extra instance of ";" because it was part of one of the values.
Reproduction:
Text file 1:
1;2;3;4
23123213;23123213;23123213;23123213
123;123;123;123
123;123;123;123
1;1;1;1
123;123;123;123
12;12;12;12
3;3;3;3
Text file 2:
1;2;3;4
23123213;23123213;23123213;23123213
123;123;123;123
123;123;12;3;123
1;1;1;1
123;123;123;123
12;12;12;12
3;3;3;3
Code:
import pandas as pd
import glob
import os
path = r'C:\Users\file'
all_files = glob.glob(path + "/*.txt")
li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0, delimiter=';')
    li.append(df)
frame = pd.concat(li, axis=0, ignore_index=True)
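One way to handle the stray delimiter: pandas 1.4+ lets on_bad_lines be a callable (with engine="python") that receives the over-split fields of a bad row and returns a repaired list. The sketch below assumes, as a guess about this data, that the extra ";" belongs inside the third column, so the overflow fields are re-joined there:

```python
import io
import pandas as pd

# Text file 2 from the question, with the bad row "123;123;12;3;123".
text2 = (
    "1;2;3;4\n"
    "23123213;23123213;23123213;23123213\n"
    "123;123;123;123\n"
    "123;123;12;3;123\n"
    "1;1;1;1\n"
    "123;123;123;123\n"
    "12;12;12;12\n"
    "3;3;3;3\n"
)

def fix_row(fields):
    # fields is the over-split row, e.g. ['123', '123', '12', '3', '123'].
    # Re-join the overflow into the third column (an assumption about
    # where the stray ';' belongs); return None instead to drop the row.
    return fields[:2] + [";".join(fields[2:-1])] + fields[-1:]

df = pd.read_csv(io.StringIO(text2), sep=";", header=0,
                 engine="python", on_bad_lines=fix_row)
print(df)
```

If you cannot tell which column the extra ";" belongs to, having the callable return None simply skips the malformed rows, which matches on_bad_lines="skip".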

pandas read_csv creating incorrect columns

I am using pandas to read a csv file
import pandas as pd
data = pd.read_csv('file_name.csv', "ISO-8859-1")
Op:
col1,col2,col3
"sample1","sample2","sample3"
but the DataFrame has just 1 column instead of the 5 it is supposed to have. I checked the CSV and it's fine.
Any suggestions on why this could be happening will be useful.
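One likely cause: in older pandas the second positional parameter of read_csv was sep, so the call above passes "ISO-8859-1" as the separator rather than the encoding, and nothing in the file matches it, leaving one giant column (recent pandas versions require these arguments as keywords). A minimal reproduction of the effect, with the separator written explicitly:

```python
import io
import pandas as pd

csv_text = 'col1,col2,col3\n"sample1","sample2","sample3"\n'

# Passing "ISO-8859-1" positionally made old pandas treat it as sep;
# written out explicitly, the same mistake yields a single column,
# because the string never matches anything in the file.
one_col = pd.read_csv(io.StringIO(csv_text), sep="ISO-8859-1", engine="python")

# With encoding passed as a keyword, the commas split normally.
three_cols = pd.read_csv(io.StringIO(csv_text), encoding="ISO-8859-1")

print(one_col.shape[1], three_cols.shape[1])
```

So the fix is simply `pd.read_csv('file_name.csv', encoding="ISO-8859-1")`.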

Loading CSV file with pandas - Error Tokenizing

I am trying to load a CSV file
import pandas as pd
dfc = pd.read_csv('data/Vehicles0515.csv', sep =',')
but I have the following error
ParserError: Error tokenizing data. C error: Expected 22 fields in line 3004427, saw 23
I have read that including error_bad_lines=False should help,
but it doesn't solve the problem.
Thanks a lot
Sometimes the parser gets confused by the head of the CSV file.
try this:
dfc = pd.read_csv('data/Vehicles0515.csv', header=None)
or
dfc = pd.read_csv('data/Vehicles0515.csv', skiprows=2)
Also, you don't need to provide a comma separator, since comma is the default value in pandas' read_csv method.
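If the file genuinely contains malformed rows, note that error_bad_lines was replaced by on_bad_lines in pandas 1.3; a minimal sketch on made-up data showing how rows with extra fields get dropped:

```python
import io
import pandas as pd

# The second data row has an extra field.
sample = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

# on_bad_lines='skip' (pandas >= 1.3) silently drops rows with too many
# fields; on_bad_lines='warn' keeps parsing and reports each one instead.
df = pd.read_csv(io.StringIO(sample), on_bad_lines="skip")
print(df)
```

On pandas older than 1.3, error_bad_lines=False with warn_bad_lines=True gives the equivalent behaviour.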

How to make pandas read csv input as string, not as an url

I'm trying to load a csv (from an API response) into pandas, but keep getting an error
"ValueError: stat: path too long for Windows" and "FileNotFoundError: [Errno 2] File b'"fwefwe","fwef..."
indicating that pandas interprets it as an url, not a string.
The code below causes the errors above.
fake_csv='"fwefwe","fwefw","fwefew";"2","5","7"'
df = pd.read_csv(fake_csv, encoding='utf8')
df
How do I force pandas to interpret my argument as a csv string?
You can do that using StringIO:
import io
fake_csv='"fwefwe","fwefw","fwefew";"2","5","7"'
df = pd.read_csv(io.StringIO(fake_csv), encoding='utf8', sep=',', lineterminator=';')
df
Result:
Out[30]:
fwefwe fwefw fwefew
0 2 5 7

inputting and aligning protein sequence

I have a script for finding mutated positions in a protein sequence. The following script will do this.
import pandas as pd #data analysis python module
data = 'MTAQDDSYSDGKGDYNTIYLGAVFQLN,MTAQDDSYSDGRGDYNTIYLGAVFQLN,MTSQEDSYSDGKGNYNTIMPGAVFQLN,MTAQDDSYSDGRGDYNTIMPGAVFQLN,MKAQDDSYSDGRGNYNTIYLGAVFQLQ,MKSQEDSYSDGRGDYNTIYLGAVFQLN,MTAQDDSYSDGRGDYNTIYPGAVFQLN,MTAQEDSYSDGRGEYNTIYLGAVFQLQ,MTAQDDSYSDGKGDYNTIMLGAVFQLN,MTAQDDSYSDGRGEYNTIYLGAVFQLN' #protein sequences
df = pd.DataFrame(map(list,data.split(',')))
I = df.columns[(df.ix[0] != df).any()]
J = [pd.get_dummies(df[i], prefix=df[i].name+1, prefix_sep='') for i in I]
print df[[]].join(J)
Here I gave the data hard-coded, i.e., the input protein sequences. Normally, in an application, the user has to give the input sequences, i.e., I mean soft coding.
Also, alignment is not done here. I read the Biopython tutorial and got the following script, but I don't know how to combine these scripts with the one above.
from Bio import AlignIO
alignment = AlignIO.read("c:\python27\proj\data1.fasta", "fasta")
print alignment
How can I do this?
What I have tried :
>>> import sys
>>> import pandas as pd
>>> from Bio import AlignIO
>>> data=sys.stdin.read()
MTAQDDSYSDGKGDYNTIYLGAVFQLN
MTAQDDSYSDGRGDYNTIYLGAVFQLN
MTSQEDSYSDGKGNYNTIMPGAVFQLN
MTAQDDSYSDGRGDYNTIMPGAVFQLN
MKAQDDSYSDGRGNYNTIYLGAVFQLQ
MKSQEDSYSDGRGDYNTIYLGAVFQLN
MTAQDDSYSDGRGDYNTIYPGAVFQLN
MTAQEDSYSDGRGEYNTIYLGAVFQLQ
MTAQDDSYSDGKGDYNTIMLGAVFQLN
MTAQDDSYSDGRGEYNTIYLGAVFQLN
^Z
>>> df=pd.DataFrame(map(list,data.split(',')))
>>> I=df.columns[(df.ix[0]!=df).any()]
>>> J=[pd.get_dummies(df[i],prefix=df[i].name+1,prefix_sep='')for i in I]
>>> print df[[]].join(J)
But it is giving an empty DataFrame as output.
I also tried the following, but I don't know how to load these sequences into my script:
while 1:
    var = raw_input("Enter your sequence here:")
    print "you entered ", var
Please help me.
When you read in data via:
sys.stdin.read()
sequences are separated by '\n' rather than ',' (printing data would confirm whether this is the case; it may be system-dependent), so you should split on that instead:
df = pd.DataFrame(map(list,data.split('\n')))
A good way to check this kind of thing is to step through it line by line, where you would see that df was a one-row DataFrame (which then propagates to make I empty).
Aside: what a well written piece of code you are using! :)
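For anyone running this on a current Python/pandas, note that df.ix and the Python 2 print statement are gone; the same mutated-position logic can be sketched with iloc and print(), assuming '\n'-separated input as discussed above (only the first three sequences from the question are shown):

```python
import pandas as pd

seqs = """MTAQDDSYSDGKGDYNTIYLGAVFQLN
MTAQDDSYSDGRGDYNTIYLGAVFQLN
MTSQEDSYSDGKGNYNTIMPGAVFQLN""".split("\n")

# One column per residue position, one row per sequence.
df = pd.DataFrame(map(list, seqs))

# Positions where any sequence differs from the first one.
mutated = df.columns[(df.iloc[0] != df).any()]

# One indicator column per observed residue at each mutated position,
# named like "3S" for residue S at (1-based) position 3.
J = [pd.get_dummies(df[i], prefix=str(i + 1), prefix_sep="") for i in mutated]
out = df[[]].join(J)
print(out)
```

With the full ten-sequence input, feeding sys.stdin.read().split('\n') into the same pipeline reproduces the original script's intent.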