Converting xls to xlsx using xlrd - openpyxl

I am using the exact script below (except for the file path) to try to convert xls files to xlsx. The script runs without errors but produces no output. Am I missing something basic, like supplying some variable values or file names, or not saving the file properly as xlsx?
import xlrd
import os
from openpyxl.workbook import Workbook

filenames = os.listdir("file_path")
for fname in filenames:
    if fname.endswith(".xls"):
        def cvt_xls_to_xlsx(fname):
            book_xls = xlrd.open_workbook(fname)
            book_xlsx = Workbook()
            sheet_names = book_xls.sheet_names()
            for sheet_index in range(0,len(sheet_names)):
                sheet_xls = book_xls.sheet_by_name(sheet_names[sheet_index])
                if sheet_index == 0:
                    sheet_xlsx = book_xlsx.active
                    sheet_xlsx.title = sheet_names[sheet_index]
                else:
                    sheet_xlsx = book_xlsx.create_sheet(title=sheet_names[sheet_index])
                for row in range(0, sheet_xls.nrows):
                    for col in range(0, sheet_xls.ncols):
                        sheet_xlsx.cell(row = row+1 , column = col+1).value = sheet_xls.cell_value(row, col)
        cvt_xls_to_xlsx(fname)
        book_xlsx.save(fname.xlsx)
I have updated the above script with the last two lines, but now I get the following error message:
File "C:\Users\local\Documents\Tasks\Python\Excelconvert.py", line 26, in <module>
cvt_xls_to_xlsx(fname)
File "C:\Users\local\Documents\Tasks\Python\Excelconvert.py", line 10, in cvt_xls_to_xlsx
book_xls = xlrd.open_workbook(fname)
File "C:\Users\local\Python34\lib\site-packages\xlrd\__init__.py", line 157, in open_workbook
ragged_rows=ragged_rows,
File "C:\Users\local\Python34\lib\site-packages\xlrd\book.py", line 92, in open_workbook_xls
biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
File "C:\Users\local\Python34\lib\site-packages\xlrd\book.py", line 1278, in getbof
bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
File "C:\Users\local\Python34\lib\site-packages\xlrd\book.py", line 1272, in bof_error
raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'# - Copy'
What does the '# - Copy' mean? I have seen others suggest that it relates to the file format, but in those cases the error usually mentions XML, which mine does not. Any solutions would be appreciated.
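A few things in the script seem to get in the way, and the sketch below tries to address each. book_xlsx only exists inside cvt_xls_to_xlsx, so book_xlsx.save(...) at module level cannot see it; fname.xlsx is attribute access on a string, not the name fname plus ".xlsx"; and os.listdir() returns bare names, so xlrd.open_workbook(fname) looks in the current directory rather than in "file_path". As for '# - Copy': xlrd is showing the first eight bytes it read, so that file appears to begin with the text "# - Copy" and is probably not a binary Excel workbook at all, whatever its extension. This is a minimal sketch assuming xlrd and openpyxl are installed; the helper names (xls_paths, xlsx_path_for, looks_like_ole2_xls, convert_directory) are my own, and the signature check only recognises Excel 97-2003 files, not very old BIFF2-4 ones.

```python
import os

# Excel 97-2003 .xls files are OLE2 compound documents and start with these
# eight bytes. (Very old BIFF2-4 files do not, so this check would skip them.)
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def looks_like_ole2_xls(path):
    """Return True if the file begins with the OLE2 signature."""
    with open(path, "rb") as f:
        return f.read(8) == OLE2_SIGNATURE


def xls_paths(directory):
    """Yield full paths; os.listdir() on its own returns bare file names."""
    for name in sorted(os.listdir(directory)):
        if name.lower().endswith(".xls"):
            yield os.path.join(directory, name)


def xlsx_path_for(xls_path):
    """Map 'dir/report.xls' to 'dir/report.xlsx'."""
    return os.path.splitext(xls_path)[0] + ".xlsx"


def cvt_xls_to_xlsx(xls_path):
    """Copy each sheet's cell values into a new openpyxl workbook and return it."""
    # Imported here so the helpers above work even without these packages.
    import xlrd
    from openpyxl import Workbook

    book_xls = xlrd.open_workbook(xls_path)
    book_xlsx = Workbook()
    for index, sheet_xls in enumerate(book_xls.sheets()):
        if index == 0:
            sheet_xlsx = book_xlsx.active  # reuse the default empty sheet
            sheet_xlsx.title = sheet_xls.name
        else:
            sheet_xlsx = book_xlsx.create_sheet(title=sheet_xls.name)
        for row in range(sheet_xls.nrows):
            for col in range(sheet_xls.ncols):
                sheet_xlsx.cell(row=row + 1, column=col + 1,
                                value=sheet_xls.cell_value(row, col))
    return book_xlsx  # returned, so the caller is able to save it


def convert_directory(directory):
    """Convert every genuine .xls file in directory, skipping impostors."""
    for path in xls_paths(directory):
        if not looks_like_ole2_xls(path):
            print("Skipping, not a binary .xls file:", path)
            continue
        cvt_xls_to_xlsx(path).save(xlsx_path_for(path))
```

Usage would be convert_directory("file_path"), with the same placeholder directory as in the question. Files that get skipped are worth opening in a text editor to see what they really contain.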

Related

PDF to txt python3

I'm trying to convert a PDF file to TXT.
import re
import PyPDF2
with open('123.pdf', 'rb') as pdfFileObj:
    pdfreader = PyPDF2.PdfFileReader(pdfFileObj)
    x = pdfreader.numPages
    pageObj = pdfreader.getPage(x + 1)
    text = pageObj.extractText()
    file1 = open(f"C:\\Users\\honorr\\Desktop\\ssssssss\{re.sub('pdf$','txt',pdfFileObj)}", "a")
    file1.writelines(text)
    file1.close()
Errors:
Traceback (most recent call last):
  File "C:\Users\honorr\Desktop\ssssssss\main.py", line 5, in <module>
    pageobj = pdfreader.getPage(x + 1)
  File "C:\Users\honorr\Desktop\ssssssss\venv\lib\site-packages\PyPDF2\_reader.py", line 477, in getPage
    return self._get_page(pageNumber)
  File "C:\Users\honorr\Desktop\ssssssss\venv\lib\site-packages\PyPDF2\_reader.py", line 492, in _get_page
    return self.flattened_pages[page_number]
IndexError: list index out of range
How do I fix it?
I don't know why I get this error. Alternatively, can somebody suggest another way to convert a PDF to TXT?
You're setting x to the number of pages, but then asking for page x + 1, which doesn't exist. PyPDF2 numbers pages from zero, so the valid indices run from 0 to x - 1: pdfreader.getPage(x - 1) returns the last page, while getPage(x) would still be out of range. This will only get the last page of the document, though.
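To get the whole document instead, one option is to loop over the pages rather than compute an index, and to derive the output name from the path string (in the question, re.sub is handed the file object, not its name). This is a minimal sketch assuming PyPDF2 2.x or later, where PdfReader, .pages and extract_text() replace the deprecated PdfFileReader, getPage() and extractText(); the helper names are my own.

```python
import os


def txt_path_for(pdf_path, out_dir=None):
    """'in/123.pdf' -> 'in/123.txt', or '<out_dir>/123.txt' if out_dir is given."""
    name = os.path.splitext(os.path.basename(pdf_path))[0] + ".txt"
    return os.path.join(out_dir or os.path.dirname(pdf_path), name)


def pages_to_text(pages):
    """Join the text of every page; iterating over pages needs no index math."""
    # extract_text() can come back empty (or None) for image-only pages.
    return "\n".join(page.extract_text() or "" for page in pages)


def convert_pdf(pdf_path, out_dir=None):
    """Write all the text of pdf_path to a .txt file and return that path."""
    # Assumes PyPDF2 >= 2.0; its successor package pypdf offers the same names.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    out_path = txt_path_for(pdf_path, out_dir)
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(pages_to_text(reader.pages))
    return out_path
```

For the question's file this would be convert_pdf("123.pdf", r"C:\Users\honorr\Desktop\ssssssss"). Opening the output with "w" rather than "a" also avoids appending duplicate text on every run.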

Unknown report type: xlsx: - Odoo 10

I am trying to create an xlsx report. I tried the code below but I get an error:
Traceback (most recent call last):
File "/home/shar/Projects/git/odoo/addons/web/controllers/main.py", line 72, in wrap
return f(*args, **kwargs)
File "/home/shar/Projects/git/odoo/addons/web/controllers/main.py", line 1485, in index
request.session.db, request.session.uid, request.session.password, report_id])
File "/home/shar/Projects/git/odoo/odoo/http.py", line 118, in dispatch_rpc
result = dispatch(method, params)
File "/home/shar/Projects/git/odoo/odoo/service/report.py", line 35, in dispatch
res = fn(db, uid, *params)
File "/home/shar/Projects/git/odoo/odoo/service/report.py", line 142, in exp_report_get
return _check_report(report_id)
File "/home/shar/Projects/git/odoo/odoo/service/report.py", line 120, in _check_report
raise UserError('%s: %s' % (exc.message, exc.traceback))
UserError: (u"Unknown report type: xlsx: (, NotImplementedError(u'Unknown report type: xlsx',), )", '')
Here is my code:
*.py
# -*- coding: utf-8 -*-
from odoo.addons.report_xlsx.report.report_xlsx import ReportXlsx


class PartnerXlsx(ReportXlsx):

    def generate_xlsx_report(self, workbook, data, partners):
        for obj in partners:
            report_name = obj.name
            # One sheet by partner
            sheet = workbook.add_worksheet(report_name[:31])
            bold = workbook.add_format({'bold': True})
            sheet.write(0, 0, obj.name, bold)


PartnerXlsx('report.module_name.res.partner.xlsx',
            'res.partner')
*.xml
<report
id="partner_xlsx"
model="res.partner"
string="Print to XLSX"
report_type="xlsx"
name="res.partner.xlsx"
file="res.partner.xlsx"
attachment_use="False"
/>
Your code looks right, but remember that all the other Odoo module rules still apply. Don't forget to:
Add 'report_xlsx' as a dependency in the __openerp__.py manifest (Odoo 10 calls it __manifest__.py but still accepts the old name).
Add your .xml file to the data list inside the manifest ('data': ['report/file.xml']).
Add an __init__.py file containing from . import <report_file_name> inside your report folder (where your .py file lives and, preferably, also your .xml file, as declared in the manifest).
Add from . import report to your addon's __init__.py file.
Update your addon from the Apps menu in Odoo.
Should work after that.
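For reference, a manifest covering the checklist above might look like the fragment below. It is a sketch only: 'module_name' is the placeholder from the question, the name and version values are made up, and it assumes the OCA report_xlsx addon is installed, since that addon is what adds the xlsx report type.

```python
# module_name/__manifest__.py (or __openerp__.py)
# Hypothetical values; only 'depends' and 'data' matter for this problem.
{
    'name': 'Partner XLSX report',
    'version': '10.0.1.0.0',
    'depends': ['base', 'report_xlsx'],  # report_xlsx provides report_type="xlsx"
    'data': ['report/file.xml'],         # the file holding the <report> record
    'installable': True,
}

# module_name/__init__.py should contain:
#     from . import report
# module_name/report/__init__.py should contain:
#     from . import <report_file_name>   # the .py file defining PartnerXlsx
```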
Core Odoo does not have an xlsx report_type.
It only has qweb-pdf, qweb-html and controller, and report_type accepts any one of these.
Please refer to the 'ir.actions.report.xml' class for further reference.

How to write a pickle file to S3, as a result of a luigi Task?

I want to store a pickle file on S3, as a result of a luigi Task. Below is the class that defines the Task:
class CreateItemVocabulariesTask(luigi.Task):
    def __init__(self):
        self.client = S3Client(AwsConfig().aws_access_key_id,
                               AwsConfig().aws_secret_access_key)
        super().__init__()

    def requires(self):
        return [GetItem2VecDataTask()]

    def run(self):
        filename = 'item2vec_results.tsv'
        data = self.client.get('s3://{}/item2vec_results.tsv'.format(AwsConfig().item2vec_path),
                               filename)
        df = pd.read_csv(filename, sep='\t', encoding='latin1')
        unique_users = df['CustomerId'].unique()
        unique_items = df['ProductNumber'].unique()
        item_to_int, int_to_item = utils.create_lookup_tables(unique_items)
        user_to_int, int_to_user = utils.create_lookup_tables(unique_users)

        with self.output()[0].open('wb') as out_file:
            pickle.dump(item_to_int, out_file)
        with self.output()[1].open('wb') as out_file:
            pickle.dump(int_to_item, out_file)
        with self.output()[2].open('wb') as out_file:
            pickle.dump(user_to_int, out_file)
        with self.output()[3].open('wb') as out_file:
            pickle.dump(int_to_user, out_file)

    def output(self):
        files = [S3Target('s3://{}/item2int.pkl'.format(AwsConfig().item2vec_path), client=self.client),
                 S3Target('s3://{}/int2item.pkl'.format(AwsConfig().item2vec_path), client=self.client),
                 S3Target('s3://{}/user2int.pkl'.format(AwsConfig().item2vec_path), client=self.client),
                 S3Target('s3://{}/int2user.pkl'.format(AwsConfig().item2vec_path), client=self.client),]
        return files
When I run this task I get the error ValueError: Unsupported open mode 'wb'. The items I'm trying to dump into pickle files are just Python dictionaries.
Full traceback:
Traceback (most recent call last):
File "C:\Anaconda3\lib\site-packages\luigi\worker.py", line 203, in run
new_deps = self._run_get_new_deps()
File "C:\Anaconda3\lib\site-packages\luigi\worker.py", line 140, in _run_get_new_deps
task_gen = self.task.run()
File "C:\Users\user\Documents\python workspace\pipeline.py", line 60, in run
with self.output()[0].open('wb') as out_file:
File "C:\Anaconda3\lib\site-packages\luigi\contrib\s3.py", line 714, in open
raise ValueError("Unsupported open mode '%s'" % mode)
ValueError: Unsupported open mode 'wb'
This issue only happens on Python 3.x. To write a binary file or target in Python 3, set the format parameter of S3Target to luigi.format.Nop and open the target with 'w' rather than 'wb' (S3Target.open only accepts 'r' and 'w'). Like this:
S3Target('s3://path/to/file', client=self.client, format=luigi.format.Nop)
Note that this is just a workaround; it is neither intuitive nor documented.
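Putting the fix together, a sketch assuming luigi's S3 support is installed: the targets are built with format=luigi.format.Nop and the files are opened with 'w', which that format treats as a binary write. write_pickles and make_pickle_targets are hypothetical helper names, and the luigi import sits inside make_pickle_targets only so the pickling helper can be exercised on its own.

```python
import pickle


def write_pickles(targets, objects):
    """Pickle each object into the target at the same position.

    Each target's open('w') must return a binary writable file. For luigi's
    S3Target that means building it with format=luigi.format.Nop, because
    S3Target.open() rejects 'wb' and only accepts 'r' or 'w'.
    """
    if len(targets) != len(objects):
        raise ValueError("expected one target per object")
    for target, obj in zip(targets, objects):
        with target.open('w') as out_file:
            pickle.dump(obj, out_file)


def make_pickle_targets(bucket_path, names, client):
    """Build binary-mode S3Targets such as s3://<bucket_path>/item2int.pkl."""
    import luigi.format  # imported here so write_pickles() runs without luigi
    from luigi.contrib.s3 import S3Target

    return [S3Target('s3://{}/{}'.format(bucket_path, name),
                     client=client, format=luigi.format.Nop)
            for name in names]
```

Inside the task, output() could then return make_pickle_targets(AwsConfig().item2vec_path, ['item2int.pkl', 'int2item.pkl', 'user2int.pkl', 'int2user.pkl'], self.client), and run() could end with write_pickles(self.output(), [item_to_int, int_to_item, user_to_int, int_to_user]).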

Python: index out of range but I can't see why

I'm trying to write a program that opens a file, creates a list containing each line of the file, and removes some words from this list. I get an "index out of range" error.
#! /usr/bin/python3
# open the file
f = open("test.txt", "r")
# a list that contains each line of my file (without \n)
lines = f.read().splitlines()
# close the file
f.close()
# some words I want to delete from the file
data = ["fire", "water"]
# for each line of the file...
for i in range(len(lines)):
    # if this line is in [data]
    if lines[i] in data:
        # delete this line from [lines]
        print(lines[i])
        del lines[i]
This is my text file:
sun
moon
*
fire
water
*
metal
This is my output:
fire
Traceback (most recent call last):
File "debug.py", line 16, in <module>
if lines[i] in data:
IndexError: list index out of range
I'll address the index-out-of-range error first.
What's happening in your first post is that the loop takes the length of the list, which is 7, so range(len(lines)) is range(7) and i runs from 0 to 6:
lines[0] = sun
lines[1] = moon
lines[2] = *
lines[3] = fire
lines[4] = water
lines[5] = *
lines[6] = metal
=>
lines[0] = sun
lines[1] = moon
lines[2] = *
deleted lines[3] = fire
lines[3] = water
lines[4] = *
lines[5] = metal
Once lines[3] is deleted the list has only 6 items, but the loop will still try to read lines[6], which no longer exists.
The loop also moves on to index 4, so water (which has shifted down to index 3) is never checked; it is skipped.
I hope this helps. I suggest doing this instead:
# open the file
f = open("test.txt", "r")
# a list that contains each line of my file (without \n)
lines = f.read().splitlines()
# close the file
f.close()
# some words I want to delete from the file
data = ["fire", "water"]
for i in data:
    if i in lines:
        lines.remove(i)
print(lines)
Output:
['sun', 'moon', '*', '*', 'metal']
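One caveat: list.remove() only deletes the first occurrence, so a word that appears twice in the file would survive the loop above. Two other sketches that remove every match: build a new list with a comprehension, or, if the list must be changed in place, walk the indices backwards so each deletion only shifts items that have already been checked. The function names are my own.

```python
def remove_words(lines, words):
    """Return a new list with every line found in words left out.

    Building a new list avoids the index shifting that happens when
    items are deleted while looping over the list's indices.
    """
    unwanted = set(words)
    return [line for line in lines if line not in unwanted]


def remove_words_in_place(lines, words):
    """Delete matching lines in place by walking the indices backwards.

    Deleting lines[i] only shifts the items after i, which have already
    been visited, so nothing is skipped and no index runs past the end.
    """
    unwanted = set(words)
    for i in reversed(range(len(lines))):
        if lines[i] in unwanted:
            del lines[i]


lines = ["sun", "moon", "*", "fire", "water", "*", "metal"]
print(remove_words(lines, ["fire", "water"]))  # ['sun', 'moon', '*', '*', 'metal']
```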

pypyodbc execute returns list index out of range error

I have a function that runs 3 queries and returns the result of the last one (the first two supply values for the third). When I get to the 3rd query, I get a "list index out of range" error. I have run this exact query as the first query (with manually entered variables) and it worked fine.
This is my code:
import pypyodbc


def sql_conn():
    conn = pypyodbc.connect(r'Driver={SQL Server};'
                            r'Server=HPSQL31\ni1;'
                            r'Database=tq_hp_prod;'
                            r'Trusted_Connection=yes;')
    cursor = conn.cursor()
    return conn, cursor


def get_number_of_jobs(ticket):
    # Get Connection
    conn, cursor = sql_conn()
    # Get asset number
    sqlcommand = "select top 1 item from deltickitem where dticket = {} and cat_code = 'Trq sub'".format(ticket)
    cursor.execute(sqlcommand)
    asset = cursor.fetchone()[0]
    print(asset)
    # Get last MPI date
    sqlcommand = "select last_test from prevent where item = {} and description like '%mpi'".format(asset)
    cursor.execute(sqlcommand)
    last_recal = cursor.fetchone()[0]
    print(last_recal)
    # Get number of jobs since last recalibration
    sqlcommand = """select count(i.item)
    from deltickhdr as d
    join deltickitem as i
    on d.dticket = i.dticket
    where i.start_rent >= '2017-03-03 00:00:00'
    and i.meterstart <> i.meterstop
    and i.item = '002600395'""" #.format(last_recal, asset)
    cursor.execute(sqlcommand)
    num_jobs = cursor.fetchone()[0]
    print(num_jobs)
    cursor.close()
    conn.close()
    return num_jobs


ticketnumber = 14195 # int(input("Ticket: "))
get_number_of_jobs(ticketnumber)
Below is the error(s) i get when i get to the 3rd cursor.execute(sqlcommand)
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2016.3.2\helpers\pydev\pydevd.py", line 1596, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files\JetBrains\PyCharm Community Edition 2016.3.2\helpers\pydev\pydevd.py", line 974, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm Community Edition 2016.3.2\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/bdrillin/PycharmProjects/Torque_Turn_Data/tt_sub_ui.py", line 56, in <module>
get_number_of_jobs(ticketnumber)
File "C:/Users/bdrillin/PycharmProjects/Torque_Turn_Data/tt_sub_ui.py", line 45, in get_number_of_jobs
cursor.execute(sqlcommand)
File "C:\ProgramData\Anaconda3\lib\site-packages\pypyodbc.py", line 1470, in execute
self._free_stmt(SQL_CLOSE)
File "C:\ProgramData\Anaconda3\lib\site-packages\pypyodbc.py", line 1994, in _free_stmt
check_success(self, ret)
File "C:\ProgramData\Anaconda3\lib\site-packages\pypyodbc.py", line 1007, in check_success
ctrl_err(SQL_HANDLE_STMT, ODBC_obj.stmt_h, ret, ODBC_obj.ansi)
File "C:\ProgramData\Anaconda3\lib\site-packages\pypyodbc.py", line 972, in ctrl_err
state = err_list[0][0]
IndexError: list index out of range
Any help would be great
I've had the same error.
Even though I haven't reached a definite conclusion about what this error means, I thought my guess might help anyone else who ends up here.
In my case, the problem was a conflict with a datatype length (NVARCHAR(24) and CHAR(10)).
So my guess is that this IndexError in the ctrl_err function just means there is an error in your SQL that pypyodbc does not know how to handle.
I know this is not much of an answer, but I know it would have saved me a couple of hours had I known this was not some bug in pypyodbc but an inconsistency in the data I was inserting.
Kind regards,
Luka
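Along the same lines, one way to rule out quoting and type problems is to let the driver bind the values instead of pasting them in with str.format(). In the question, the second query writes item = {} without quotes, so an asset like 002600395 would be read as the integer 2600395 and force an implicit conversion. The sketch below uses sqlite3 as a stand-in so it runs without SQL Server; pypyodbc also uses the ? placeholder (DB-API qmark style), so the same SQL text and parameter tuple should carry over, but that part is untested and it may not by itself cure the IndexError. count_jobs_since, make_demo_db and the sample rows are made up.

```python
import sqlite3

# Third query from the question, with ? placeholders instead of
# hard-coded values or str.format().
JOBS_SINCE_SQL = """
    select count(i.item)
    from deltickhdr as d
    join deltickitem as i on d.dticket = i.dticket
    where i.start_rent >= ?
      and i.meterstart <> i.meterstop
      and i.item = ?
"""


def count_jobs_since(cursor, last_recal, asset):
    """Count jobs for asset since last_recal; the driver binds both values."""
    cursor.execute(JOBS_SINCE_SQL, (last_recal, asset))
    return cursor.fetchone()[0]


def make_demo_db():
    """In-memory stand-in for the real tables, filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table deltickhdr (dticket integer)")
    conn.execute("create table deltickitem (dticket integer, item text, "
                 "start_rent text, meterstart integer, meterstop integer)")
    conn.executemany("insert into deltickhdr values (?)", [(1,), (2,), (3,)])
    conn.executemany("insert into deltickitem values (?, ?, ?, ?, ?)", [
        (1, "002600395", "2017-04-01 00:00:00", 0, 5),  # counted
        (2, "002600395", "2017-01-01 00:00:00", 0, 5),  # before the date
        (3, "002600395", "2017-05-01 00:00:00", 7, 7),  # meter unchanged
    ])
    return conn
```

With pypyodbc, the cursor from sql_conn() would be passed instead: count_jobs_since(cursor, last_recal, asset). The first two queries can be converted the same way, e.g. cursor.execute("select last_test from prevent where item = ? and description like '%mpi'", (asset,)).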