Change excel cell number format "General" to "Text", using openpyxl 1.8.6 - openpyxl

How to change excel cell number format General to Text irrespective of data in cell?
I am using openpyxl 1.8.6.

I know this question is really old, but it could still be relevant as I just found it when Googling the same thing. The above will work most of the time, but not in those annoying cases where Excel starts acting like guys I've known in the past, and thinking everything is a date. ;)
I would do it like this:
cell = ws['A1']
cell.number_format = '@'
The '@' is the format code that forces text formatting. The docs suggest this works on version 1.8.6, as well as the latest release.

Based on @SuperScienceGrl's great answer:
from openpyxl.styles import numbers
cell.number_format = numbers.FORMAT_TEXT
You can see the full list of formats on their official readthedocs page.

Number formatting applies only to numbers. If you want to change a number to text then you must change the datatype:
ws['A1'] = str(ws['A1'].value)
Version 1.8.6 is no longer supported. You should consider upgrading to a more recent release.

Applying @run_the_race's answer to a column:
#!/usr/bin/env python3
from openpyxl import load_workbook
from openpyxl.styles import numbers
xlsx_file = 'file.xlsx'
# opening:
wb = load_workbook(filename=xlsx_file)
# set column O of the default sheet to be text:
ws = wb.active
for row in ws[2:ws.max_row]:  # skip the header
    cell = row[14]  # column O (0-based index 14)
    cell.number_format = numbers.FORMAT_TEXT
# saving:
wb.save(xlsx_file)
Same wrapped in a script accepting a column "name", like AA:
#!/usr/bin/env python3
import argparse
from openpyxl import load_workbook
from openpyxl.styles import numbers
from openpyxl.utils import column_index_from_string
# ==============
## parsing args:
desc = """
Converts given column of the xlsx file (default sheet) to a text format.
Dependencies:
pip3 install --user --upgrade openpyxl
"""
parser = argparse.ArgumentParser(description=desc, formatter_class=argparse.RawDescriptionHelpFormatter)
parser.add_argument('--version', action='version', version='%(prog)s 0.01')
parser.add_argument('-f', '--file',
    help="xlsx file",
    dest='xlsx_file',
    type=argparse.FileType('r'),
)
parser.add_argument('-c', '--column',
    help="column (default to %(default)s)",
    dest='column',
    type=str,
    default="A",
)
args = parser.parse_args()
# =========
## program:
xlsx_file = args.xlsx_file.name
# 0-based index of the column within a row ("A" -> 0, "AA" -> 26, "BA" -> 52)
column_number = column_index_from_string(args.column.upper()) - 1
# opening:
wb = load_workbook(filename=xlsx_file)
# convert given column of the default sheet to a text format:
ws = wb.active
for row in ws[2:ws.max_row]:  # skip the header
    cell = row[column_number]
    cell.number_format = numbers.FORMAT_TEXT
# saving:
wb.save(xlsx_file)
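For reference, turning a column name such as AA into a 0-based index is a bijective base-26 conversion. Below is a minimal pure-Python sketch of that arithmetic. The helper name `column_name_to_index` is made up for illustration; openpyxl itself ships `openpyxl.utils.column_index_from_string`, which returns the 1-based equivalent.

```python
def column_name_to_index(name: str) -> int:
    """Return the 0-based index of an Excel column name ("A" -> 0)."""
    if not name.isalpha() or not name.isascii():
        raise ValueError(f"not a column name: {name!r}")
    index = 0
    for char in name.upper():
        # Each letter is a digit in the range 1..26, most significant first.
        index = index * 26 + (ord(char) - ord("A") + 1)
    return index - 1


if __name__ == "__main__":
    for name in ("A", "O", "Z", "AA", "BA"):
        print(name, column_name_to_index(name))
```

Column O comes out as 14, which matches the hard-coded row[14] in the first script.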

Related

Change number format in Excel using names of headers - openpyxl [duplicate]

I have an Excel (.xlsx) file that I'm trying to parse, row by row. I have a header (first row) that has a bunch of column titles like School, First Name, Last Name, Email, etc.
When I loop through each row, I want to be able to say something like:
row['School']
and get back the value of the cell in the current row and the column with 'School' as its title.
I've looked through the OpenPyXL docs but can't seem to find anything terribly helpful.
Any suggestions?
I'm not incredibly familiar with OpenPyXL, but as far as I can tell it doesn't have any kind of dict reader/iterator helper. However, it's fairly easy to iterate over the worksheet rows, as well as to create a dict from two lists of values.
def iter_worksheet(worksheet):
    # It's necessary to get a reference to the generator, as
    # `worksheet.rows` returns a new iterator on each access.
    rows = worksheet.rows
    # Get the header values as keys and move the iterator to the next item
    keys = [c.value for c in next(rows)]
    for row in rows:
        values = [c.value for c in row]
        yield dict(zip(keys, values))
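The same header-to-dict pattern can be tried without a workbook. This sketch assumes the rows arrive as plain tuples of values, which is what `ws.iter_rows(values_only=True)` yields in recent openpyxl releases. The sample data and the name `iter_dict_rows` are invented for illustration.

```python
def iter_dict_rows(rows):
    """Yield one dict per data row, keyed by the values of the header row."""
    rows = iter(rows)
    try:
        keys = next(rows)
    except StopIteration:
        return  # empty sheet: nothing to yield
    for row in rows:
        yield dict(zip(keys, row))


# Hypothetical sheet contents: header row first, then data rows.
sample_rows = [
    ("School", "First Name", "Last Name"),
    ("North High", "Ada", "Example"),
    ("South High", "Bob", "Sample"),
]
records = list(iter_dict_rows(sample_rows))
print(records[0]["School"])
```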
Excel sheets are far more flexible than CSV files so it makes little sense to have something like DictReader.
Just create an auxiliary dictionary from the relevant column titles.
If you have columns like "School", "First Name", "Last Name", "EMail" you can create the dictionary like this.
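values = [cell.value for cell in ws[1]]  # titles from the header row
keys = dict((value, idx) for (idx, value) in enumerate(values))
for row in ws.iter_rows(min_row=2):  # skip the header
    school = row[keys['School']].value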
I wrote DictReader based on openpyxl. Save the second listing to file 'excel.py' and use it as csv.DictReader. See usage example in the first listing.
from excel import DictReader
with open('example01.xlsx', 'rb') as source_data:
    for row in DictReader(source_data, sheet_index=0):
        print(row)
excel.py:
__all__ = ['DictReader']
from openpyxl import load_workbook
from openpyxl.cell import Cell
# Change the default value for the Cell from None to '' the same way as in csv.DictReader
Cell.__init__.__defaults__ = (None, None, '', None)
class DictReader(object):
    def __init__(self, f, sheet_index,
                 fieldnames=None, restkey=None, restval=None):
        self._fieldnames = fieldnames  # list of keys for the dict
        self.restkey = restkey  # key to catch long rows
        self.restval = restval  # default value for short rows
        self.reader = load_workbook(f, data_only=True).worksheets[sheet_index].iter_rows(values_only=True)
        self.line_num = 0
    def __iter__(self):
        return self
    @property
    def fieldnames(self):
        if self._fieldnames is None:
            try:
                self._fieldnames = next(self.reader)
                self.line_num += 1
            except StopIteration:
                pass
        return self._fieldnames
    @fieldnames.setter
    def fieldnames(self, value):
        self._fieldnames = value
    def __next__(self):
        if self.line_num == 0:
            # Used only for its side effect.
            self.fieldnames
        row = next(self.reader)
        self.line_num += 1
        # unlike the basic reader, we prefer not to return blanks,
        # because we will typically wind up with a dict full of None
        # values
        while row == ():
            row = next(self.reader)
        d = dict(zip(self.fieldnames, row))
        lf = len(self.fieldnames)
        lr = len(row)
        if lf < lr:
            d[self.restkey] = row[lf:]
        elif lf > lr:
            for key in self.fieldnames[lr:]:
                d[key] = self.restval
        return d
The following seems to work for me.
header = True
headings = []
for row in ws.rows:
    if header:
        for cell in row:
            headings.append(cell.value)
        header = False
        continue
    rowData = dict(zip(headings, row))
    wantedValue = rowData['myHeading'].value
I was running into the same issue as described above. Therefore I created a simple extension called openpyxl-dictreader that can be installed through pip. It is very similar to the suggestion made by @viktor earlier in this thread.
The package is largely based on the source code of Python's native csv.DictReader class. It allows you to select items based on column names using openpyxl. For example:
import openpyxl_dictreader
reader = openpyxl_dictreader.DictReader("names.xlsx", "Sheet1")
for row in reader:
    print(row["First Name"], row["Last Name"])
Putting this here for reference.

Autocorrect a column in a pandas dataframe using pyenchant

I tried to apply the code from the accepted answer of this question to one of my dataframe columns where each row is a sentence, but it didn't work.
My code looks like this:
from enchant.checker import SpellChecker
checker = SpellChecker("id_ID")
h = df['Jawaban'].astype(str).str.lower()
hayo = []
for text in h:
    checker.set_text(text)
    for s in checker:
        sug = s.suggest()[0]
        s.replace(sug)
    hayo.append(checker.get_text())
I got the following error:
IndexError: list index out of range
Any help is greatly appreciated.
I don't get the error using your code. The only thing I'm doing differently is to import the spell checker.
import pandas as pd
from enchant.checker import SpellChecker
checker = SpellChecker('en_US','en_UK') # not using id_ID
# sample data
ds = pd.DataFrame({ 'text': ['here is a spllng mstke','the wrld is grwng']})
p = ds['text'].str.lower()
hayo = []
for text in p:
    checker.set_text(text)
    for s in checker:
        sug = s.suggest()[0]
        s.replace(sug)
    print(checker.get_text())
    hayo.append(checker.get_text())
print(hayo)
here is a spelling mistake
the world is growing
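One likely cause of the IndexError in the question is a misspelled word for which `suggest()` returns an empty list, so that indexing with `[0]` fails. Below is a minimal sketch of a guard, assuming you would rather keep the original word in that case. `first_suggestion` is a made-up helper name, and the lists stand in for what `s.suggest()` would return.

```python
def first_suggestion(word, suggestions):
    """Return the top suggestion, or the word itself when there is none."""
    return suggestions[0] if suggestions else word


# Stand-ins for (s.word, s.suggest()) pairs produced inside the checker loop.
print(first_suggestion("spllng", ["spelling", "spilling"]))
print(first_suggestion("xqzt", []))
```

Inside the loop this would read `s.replace(first_suggestion(s.word, s.suggest()))`.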

how to make R datafile to Python type

I want to convert R data types to Python data types. Below is the whole code:
import os
import pandas as pd
import rpy2.robjects as robjects
import rpy2.robjects.numpy2ri
from rpy2.robjects import pandas2ri
def convert_datafiles(datasets_folder):
    rpy2.robjects.numpy2ri.activate()
    pandas2ri.activate()
    for root, dirs, files in os.walk(datasets_folder):
        for name in files:
            # sort out .RData files
            if name.endswith('.RData'):
                name_ = os.path.splitext(name)[0]
                name_path = os.path.join(datasets_folder, name_)
                # create sub-directory
                if not os.path.exists(name_path):
                    os.makedirs(name_path)
                file_path = os.path.join(root, name)
                robj = robjects.r.load(file_path)
                # check out subfiles in the data frame
                for var in robj:
                    ###### error happened right here
                    myRData = pandas2ri.ri2py_dataframe( var )
                    ###### error happened right here
                    # convert to DataFrame
                    if not isinstance(myRData, pd.DataFrame):
                        myRData = pd.DataFrame(myRData)
                    var_path = os.path.join(datasets_folder,name_,var+'.csv')
                    myRData.to_csv(var_path)
                os.remove(os.path.join(datasets_folder, name)) # clean up
    print ("=> Success!")
I want to convert the R data types to Python types, but this error keeps popping up: AttributeError: 'str' object has no attribute 'dtype'
What should I do to resolve this error?
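Note that R's load() returns the names of the loaded objects as strings, so each var in the loop is a str rather than an R object; look the object up first (for example with robjects.globalenv[var]) and convert that. The rpy2 documentation is somewhat incomplete when it comes to interaction with pandas, but the unit tests provide examples of conversion. For example:
import rpy2.robjects as robjects
import rpy2.robjects.pandas2ri as rpyp
from rpy2.robjects import default_converter
from rpy2.robjects.conversion import localconverter
rdataf = robjects.r('data.frame(a=1:2, '
                    '           b=I(c("a", "b")), '
                    '           c=c("a", "b"))')
with localconverter(default_converter + rpyp.converter) as cv:
    pandas_df = robjects.conversion.ri2py(rdataf)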

Break document sections into list for export Python

I am very new to Python, and I am trying to break some legal documents into sections for export into SQL. I need to do two things:
1) Define the section numbers by the table of contents, and
2) Break up the document given the defined section numbers
The table of contents lists section numbers: 1.1, 1.2, 1.3, etc.
Then the document itself is broken up by those section numbers:
1.1 "...Text...",
1.2 "...Text...",
1.3 "...Text...", etc.
Similar to the chapters of a book, but delimited by ascending decimal numbers.
I have the document parsed using Tika, and I've been able to create a list of sections with some basic regex:
import tika
import re
from tika import parser
parsed = parser.from_file('test.pdf')
content = (parsed["content"])
headers = re.findall("[0-9]*[.][0-9]",content)
Now I need to do something like this:
splitsections = content.split() by headers
var_string = ', '.join('?' * len(splitsections))
query_string = 'INSERT INTO table VALUES (%s);' % var_string
cursor.execute(query_string, splitsections)
Sorry if all this is unclear. Still very new to this.
Any help you can provide would be most appreciated.
Everything is tested except the last part with the DB. The code could also be improved, but that is a separate task; the main task is done.
The list split_content holds all the pieces of text you wanted, i.e. the text between 2.1 and 2.2, then between 2.2 and 2.3, and so on, EXCLUDING the number and name of each section (i.e. excluding "2.1 Continuation", "2.2 Name" and so on).
I replaced tika with PyPDF2, as tika does not provide the tools needed for this task (I did not find a way to pass the number of the page I need and get its content).
import re
import PyPDF2
def get_pdf_content(pdf_path,
                    start_page_table_contents, end_page_table_contents,
                    first_parsing_page, last_phrase_to_stop):
    """
    :param pdf_path: Full path to the PDF file
    :param start_page_table_contents: The page where the "Contents table" starts
    :param end_page_table_contents: The page where the "Contents Table" ends
        (i.e. the number of the page where Contents Table ENDs, i.e. not the next one)
    :param first_parsing_page: The 1st page where we need to start data grabbing
    :param last_phrase_to_stop: The phrase that tells the code where to stop grabbing.
        The phrase must match exactly what is written in PDF.
        This phrase will be excluded from the grabbed data.
    :return: list with the text of each section, in table-of-contents order
    """
    # ======== GRAB TABLE OF CONTENTS ========
    start_page = start_page_table_contents
    end_page = end_page_table_contents
    table_of_contents_page_nums = range(start_page-1, end_page)
    sections_of_articles = []  # ['2.1 Continuation', '2.2 Name', ... ]
    open_file = open(pdf_path, "rb")
    pdf = PyPDF2.PdfFileReader(open_file)
    for page_num in table_of_contents_page_nums:
        page_content = pdf.getPage(page_num).extractText()
        page_sections = re.findall(r"[\d]+[.][\d][™\s\w;,-]+", page_content)
        for section in page_sections:
            cleared_section = section.replace('\n', '').strip()
            sections_of_articles.append(cleared_section)
    # ======== GRAB ALL NECESSARY CONTENT (MERGE ALL PAGES) ========
    total_num_pages = pdf.getNumPages()
    parsing_pages = range(first_parsing_page-1, total_num_pages)
    full_parsing_content = ''  # Merged pages
    for parsing_page in parsing_pages:
        page_content = pdf.getPage(parsing_page).extractText()
        cleared_page = page_content.replace('\n', '')
        # Remove page num from the start of "page_content"
        # Covers the case with the page 65, 71 and others when the "page_content" starts
        # with, for example, "616.6 Liability to Partners. (a) It is understood that"
        # i.e. "61" is the page num and "6.6 Liability ..." is the section data
        already_cleared = False
        first_50_chars = cleared_page[:51]
        for section in sections_of_articles:
            if section in first_50_chars:
                indx = cleared_page.index(section)
                cleared_page = cleared_page[indx:]
                already_cleared = True
                break
        # Covers all other cases
        if not already_cleared:
            page_num_to_remove = re.match(r'^\d+', cleared_page)
            if page_num_to_remove:
                cleared_page = cleared_page[len(str(page_num_to_remove.group(0))):]
        full_parsing_content += cleared_page
    # ======== BREAK ALL CONTENT INTO PIECES ACCORDING TO TABLE CONTENTS ========
    split_content = []
    num_sections = len(sections_of_articles)
    for num_section in range(num_sections):
        start = sections_of_articles[num_section]
        # Get the last piece, i.e. "11.16 FATCA" (as there is no "end" section after "11.16 FATCA", we can't use
        # the logic like "grab info between sections 11.1 and 11.2, 11.2 and 11.3 and so on")
        if num_section == num_sections-1:
            end = last_phrase_to_stop
        else:
            end = sections_of_articles[num_section + 1]
        # re.escape keeps characters such as "." and "[" in the titles literal
        content = re.search('%s(.*)%s' % (re.escape(start), re.escape(end)), full_parsing_content).group(1)
        cleared_piece = content.replace('™', "'").strip()
        # Drop the full stop that follows the section title, if any
        if cleared_piece.startswith('. '):
            cleared_piece = cleared_piece[2:].lstrip()
        # There are few appearances of "[Signature Page Follows]", as a "last_phrase_to_stop".
        # We need the text between "11.16 FATCA" and the 1st appearance of "[Signature Page Follows]"
        try:
            indx = cleared_piece.index(end)
            cleared_piece = cleared_piece[:indx]
        except ValueError:
            pass
        split_content.append(cleared_piece)
    # ======== INSERT TO DB ========
    # Did not test this section. `cursor` is assumed to be an open DB-API
    # cursor created elsewhere, and the table to have a single text column.
    for piece in split_content:
        cursor.execute('INSERT INTO table VALUES (?);', (piece,))
    return split_content
How to use (one possible way):
1) Save the code above in my_pdf_code.py
2) In the python shell:
import path.to.my_pdf_code as the_code
the_code.get_pdf_content('/home/username/Apollo_Investment_Fund_VIII_LPA_S1.pdf', 2, 4, 24, '[Signature Page Follows]')
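If the text is already available as one string (for example from tika, as in the question), the split and insert steps can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the method above: the sample text, the `sections` table and the helper name `split_by_headers` are invented, and each header is assumed to appear in the body in table-of-contents order.

```python
import sqlite3


def split_by_headers(content, headers):
    """Return (header, body) pairs; each body runs up to the next header."""
    positions = []
    search_from = 0
    for header in headers:
        start = content.index(header, search_from)  # ValueError if missing
        positions.append(start)
        search_from = start + len(header)
    positions.append(len(content))
    pieces = []
    for i, header in enumerate(headers):
        body = content[positions[i] + len(header):positions[i + 1]]
        pieces.append((header, body.strip()))
    return pieces


# Hypothetical document text and table-of-contents entries.
content = ("1.1 Name The name is X. 1.2 Term The term is ten years. "
           "1.3 Purpose Any lawful act.")
headers = ["1.1 Name", "1.2 Term", "1.3 Purpose"]
pieces = split_by_headers(content, headers)

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE sections (header TEXT, body TEXT)")
# One row per section; executemany binds each (header, body) pair.
connection.executemany("INSERT INTO sections VALUES (?, ?)", pieces)
connection.commit()
for row in connection.execute("SELECT header, body FROM sections"):
    print(row)
```

Binding one placeholder per column, rather than one per character of the text, is what keeps each section in a single row.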

How do I import xyz and roll/pitch/yaw from csv file to Blender?

I want to know if it is possible to import data of attitude and position (roll/pitch/yaw & xyz) from a comma separated file to Blender?
I recorded data from a little RC car and I want to represent its movement in a 3D world.
I have timestamps too, so if there's a way to animate the movement of the object it'll be superb!!
Any help will be greatly appreciated!!
Best Regards.
A slight modification, making use of the csv module:
import bpy
import csv
position_vectors = []
filepath = "C:\\Work\\position.log"
csvfile = open(filepath, 'r', newline='')
ofile = csv.reader(csvfile, delimiter=',')
for row in ofile:
    position_vectors.append(tuple([float(i) for i in row]))
csvfile.close()
This will get your points into Blender. Note the delimiter parameter in csv.reader, change that accordingly. With a real example file of your RC car we could provide a more complete solution.
For Blender v2.62:
If you have a file "positions.log" looking like:
-8.691985196313894e-002; 4.119284642631801e-001; -5.832147659661263e-001
1.037146774956164e+000; 8.137243553005405e-002; -5.703274929662892e-001
-3.602584527944123e-001; 8.378614512537046e-001; 2.615265921163826e-001
6.266465707681335e-001; -1.128416901202341e+000; -1.664644365541639e+000
3.327523280880091e-001; 4.488553740582839e-001; -2.449449085462368e+000
-7.311567199869298e-001; -1.860587923723032e+000; -1.297179602213110e+000
-7.453603745688361e-003; 4.770473577895327e-001; -2.319515785100494e+000
1.935170866863264e-001; -2.010280476717868e+000; 3.748000986190077e-001
5.201529166915653e-001; 3.952972788761738e-001; 1.658581747430548e+000
4.719198263774027e-001; 1.526020825619557e+000; 3.187088567866725e-002
you can read it with this python script in blender (watch out for the indentation!)
import bpy
from mathutils import *
from math import *
from bpy.props import *
import os
import time
# Init
position_vector = []
# Open file
file = open("C:\\Work\\position.log", "r")
# Loop over line in file
for line in file:
    # Split line at ";"
    splittet_line = line.split(";")
    # Append new position
    position_vector.append(
        Vector((float(splittet_line[0]),
                float(splittet_line[1]),
                float(splittet_line[2]))))
# Close file
file.close()
# Get first selected object
selected_object = bpy.context.selected_objects[0]
# Move the object to each position in turn
for position in position_vector:
    selected_object.location = position
This reads the file and updates the position of the first selected object accordingly. Way forward: What you have to find out is how to set the keyframes for the animation...
Consider this python snippet to add to the solutions above
obj = bpy.context.object
temporalScale=bpy.context.scene.render.fps
for lrt in locRotArray:
    obj.location = (lrt[0], lrt[1], lrt[2])
    # radians, and do you want XYZ, or ZYX?
    obj.rotation_euler = (lrt[3], lrt[4], lrt[5])
    time = lrt[6]*temporalScale
    obj.keyframe_insert(data_path="location", frame=time)
    obj.keyframe_insert(data_path="rotation_euler", frame=time)
I haven't tested it, but it will probably work, and gets you started.
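The snippet above expects locRotArray to hold rows of x, y, z, three Euler angles in radians, and a time in seconds. Below is a minimal sketch of building such rows from CSV text with the standard library. It assumes a column order of x, y, z, roll, pitch, yaw, timestamp and angles logged in degrees; both are guesses about the log format, and `load_loc_rot_rows` is a made-up helper name.

```python
import csv
import io
import math


def load_loc_rot_rows(csv_text, angles_in_degrees=True):
    """Parse 'x,y,z,roll,pitch,yaw,t' lines into lists of seven floats."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        if not record:
            continue  # skip blank lines
        x, y, z, roll, pitch, yaw, t = (float(field) for field in record)
        if angles_in_degrees:
            # Blender's rotation_euler expects radians.
            roll, pitch, yaw = (math.radians(a) for a in (roll, pitch, yaw))
        rows.append([x, y, z, roll, pitch, yaw, t])
    return rows


# Hypothetical two-sample log.
sample = "0.0,0.0,0.0,0.0,0.0,90.0,0.0\n1.0,2.0,0.0,0.0,0.0,180.0,0.5\n"
locRotArray = load_loc_rot_rows(sample)
print(locRotArray[1])
```

In Blender, the text would come from reading the log file instead of the inline sample string.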
With a spice2xyzv file as the input file, the script written by "Mutant Bob" seems to work.
But I think the last three values are velocities in km/s, not Euler angles, so the import does not work for the angles.
# Records are <jd> <x> <y> <z> <vel x> <vel y> <vel z>
# Time is a TDB Julian date
# Position in km
# Velocity in km/sec
2456921.49775 213928288.518 -446198013.001 -55595492.9135 6.9011736 15.130842 0.54325805
Is there a way to get them into Blender? Should I convert the velocity to Euler angles, and is that possible at all?
I use this script:
import bpy
from mathutils import *
from math import *
from bpy.props import *
import os
import time
# Init
position_vector = []
# Open file
file = open("D:\\spice2xyzv\\export.xyzv", "r")
obj = bpy.context.object
temporalScale=bpy.context.scene.render.fps
for line in file:
    # Split line at spaces
    print("line = %s" % line)
    line = line.replace("\n","")
    locRotArray = line.split(" ")
    print("locRotArray = %s" % locRotArray )
    #for lrt in locRotArray:
    print(locRotArray[1])
    obj.location = (float(locRotArray[1]), float(locRotArray[2]), float(locRotArray[3]))
    # radians, and do you want XYZ, or ZYX?
    obj.rotation_euler = (float(locRotArray[4]), float(locRotArray[5]), float(locRotArray[6]))
    time = float(locRotArray[0])*temporalScale
    print("time = %s" % time)
    obj.keyframe_insert(data_path="location", frame=time)
    obj.keyframe_insert(data_path="rotation_euler", frame=time)
# Close file
file.close()
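On the question of angles: a velocity vector is not a set of Euler angles, but a heading can be derived from its direction. The sketch below is hedged and rests on several assumptions: the object should simply point along its velocity (yaw about Z, pitch from the vertical component, roll left at zero), frames are counted from the first record rather than from the raw Julian date, and the scene runs at 24 fps. The function names are illustrative, and sign conventions may need adjusting for a particular Blender setup.

```python
import math

SECONDS_PER_DAY = 86400.0


def heading_from_velocity(vx, vy, vz):
    """Return (roll, pitch, yaw) in radians for a direction of travel."""
    yaw = math.atan2(vy, vx)
    pitch = math.atan2(vz, math.hypot(vx, vy))
    return (0.0, pitch, yaw)


def frame_from_julian_date(jd, first_jd, fps=24):
    """Convert a Julian date (in days) to a frame offset from first_jd."""
    return round((jd - first_jd) * SECONDS_PER_DAY * fps)


# Sample record from the post: <jd> <x> <y> <z> <vel x> <vel y> <vel z>
line = ("2456921.49775 213928288.518 -446198013.001 -55595492.9135 "
        "6.9011736 15.130842 0.54325805")
jd, x, y, z, vx, vy, vz = (float(field) for field in line.split())
roll, pitch, yaw = heading_from_velocity(vx, vy, vz)
print(frame_from_julian_date(jd, jd), roll, pitch, yaw)
```

Multiplying the raw Julian date by the frame rate, as the script does, gives frame numbers in the tens of millions, so an offset from the first record is usually easier to work with on the timeline.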