AttributeError: 'GPT2TokenizerFast' object has no attribute 'max_len' - tokenize

I am using the Hugging Face transformers library and get the following message when running run_lm_finetuning.py: AttributeError: 'GPT2TokenizerFast' object has no attribute 'max_len'. Has anyone else run into this problem, or does anyone have an idea how to fix it? Thanks!
My full experiment run:
mkdir experiments
for epoch in 5
do
    python run_lm_finetuning.py \
        --model_name_or_path distilgpt2 \
        --model_type gpt2 \
        --train_data_file small_dataset_train_preprocessed.txt \
        --output_dir experiments/epochs_$epoch \
        --do_train \
        --overwrite_output_dir \
        --per_device_train_batch_size 4 \
        --num_train_epochs $epoch
done

The GitHub issue "AttributeError: 'BertTokenizerFast' object has no attribute 'max_len'" contains the fix:
The run_language_modeling.py script is deprecated in favor of language-modeling/run_{clm, plm, mlm}.py.
If you keep using the old script instead, the fix is to change max_len to model_max_length in it.
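If the same code has to run against both older and newer transformers releases, the rename can be handled in one place. The following is only a minimal sketch: the helper name tokenizer_max_length and the two stand-in classes are hypothetical and exist purely for illustration; only the attribute names max_len and model_max_length come from the fix above.

```python
def tokenizer_max_length(tokenizer):
    """Return the tokenizer's maximum sequence length.

    Prefers the newer `model_max_length` attribute and falls back to the
    older `max_len` name. `tokenizer_max_length` is a hypothetical helper,
    not part of the transformers library.
    """
    for name in ("model_max_length", "max_len"):
        value = getattr(tokenizer, name, None)
        if value is not None:
            return value
    raise AttributeError(
        "tokenizer exposes neither model_max_length nor max_len"
    )


# Stand-in classes (assumptions for illustration, not real tokenizers).
class NewStyleTokenizer:
    model_max_length = 1024


class OldStyleTokenizer:
    max_len = 512


print(tokenizer_max_length(NewStyleTokenizer()))  # 1024
print(tokenizer_max_length(OldStyleTokenizer()))  # 512
```

In the fine-tuning script itself, a direct replacement of max_len with model_max_length is simpler; a helper like this is only worth it when the installed version is not under your control.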

I solved it by downgrading transformers to an older release:
pip install transformers==3.0.2


Hi, I am trying to apply covariance risk budget constraints in a Markowitz mean-variance model, but I am not able to use the solver 'solveRdonlp2':

Error in loadNamespace(x) : there is no package called ‘Rdonlp2’
Where can I download the package?
This worked for me:
Install Rtools (https://cran.r-project.org/bin/windows/Rtools/rtools42/rtools.html), then install the package from R-Forge:
install.packages("Rdonlp2", repos="http://R-Forge.R-project.org")

Getting error in Karate v1.1.0 - TypeError: invokeMember (contains) on ["ABC","XYZ","OTHR","NEW"] failed due to: Message not supported

I had been using Karate v0.9.6 until now and recently decided to upgrade to 1.1.0 and then 1.2.0.
One thing is causing a lot of trouble.
Previously, I used 'contains' to verify a value in the schema:
# An array of expected values
* def dept_type_code = ["ABC","XYZ","OTHR","NEW"]
# Then verify in the schema that type_code has one of the values in the array
* def index_department_type_schema = {"code": '#? dept_type_code.contains(_)'}
This worked in 0.9.6, but in 1.1.0 it fails with the error:
TypeError: invokeMember (contains) on ["ABC","XYZ","OTHR","NEW"] failed due to: Message not supported.
I'm sure I'm missing an important part of the release notes. I would really appreciate any solution to this problem.
Thanks!
Replacing .contains with .includes resolved the issue. According to the upgrade guide, Java APIs for maps and lists are no longer visible within JS blocks, so the JavaScript array method has to be used instead:
https://github.com/karatelabs/karate/wiki/1.0-upgrade-guide#java-api-s-for-maps-and-lists-are-no-longer-visible-within-js-blocks

Pylint: same pylint and pandas version on 2 machines, 1 fails

I have 2 places running the same linting job:
Machine 1: Ubuntu over SSH
pandas==1.2.3
pylint==2.7.4
Python 3.8.10
Machine 2: Gitlab CI Docker image, python:3.8.12-buster
pandas==1.2.3
pylint==2.7.4
Python 3.8.12
The Ubuntu machine is able to lint all the code fine, and it has for many months. Same for the CI job, except it had been running Python 3.7.8. Now that I upgraded the Docker image to Python 3.8.12, it throws several no-member linting errors on some Pandas objects. I've tried clearing CI caches etc.
I wish I could provide something more reproducible. But, to check my understanding of what a linter is doing, is it theoretically possible that a small version difference in python messes up pylint like this? For something like a no-member error on Pandas objects, I would think the dominant factor is the pandas version, but those are equal, so I'm confused!
Update:
I've looked at the Pandas code for pd.read_sql_query, which is what's causing the no-member error. It says:
def read_sql_query(
sql,
con,
index_col=None,
coerce_float=True,
params=None,
parse_dates=None,
chunksize: Optional[int] = None,
) -> Union[DataFrame, Iterator[DataFrame]]:
In Docker, I get E1101: Generator 'generator' has no 'query' member (no-member) (because I'm running .query on the returned dataframe). So it seems Pylint thinks that this function returns a generator. But it does not make this assumption in my other setup. (I've also verified the SHA sum of pandas/io/sql.py matches). This seems similar to this issue, but I am still baffled by the discrepancy in environments.
A fix that worked was to raise astroid's inference limit with:
init-hook = "import astroid; astroid.context.InferenceContext.max_inferred = 500"
in my .pylintrc file, as explained here.
I'm unsure why/if this is connected to my change in Python version, but I'm happy to use this and move on for now. It's probably complex.
(Another hack was to write a function that returns the passed argument if it is a dataframe, and returns a single dataframe if the passed argument is an iterable of dataframes. The ambiguous-type object could then be passed through this wrapper to clarify things for Pylint. While this was more intrusive on our codebase, we had dozens of calls to pd.read_csv and pd.read_sql_query, and only about 3 calls confused Pylint, so we almost used this solution.)

How to get an edge's ID using TraCI?

I'm using Python with the traci library to find out whether any vehicles are within a certain distance of a chosen vehicle. To test a solution I'm trying to implement, I need to know a vehicle's current edge.
I'm on Ubuntu 18.04.3 LTS, using Sublime to edit the code, and the os, sys, optparse, subprocess, random, and math libraries. I've tried using getLaneId and getEdgeId; the latter is not in the documentation, but I thought I had seen it somewhere and tried it.
Another option I had was getNeighbors, but I didn't know exactly how to use it, and it returned the same error message as the previous commands.
def run():
    step = 0
    while traci.simulation.getMinExpectedNumber() > 0:
        traci.simulationStep()
        print(step)
        print(distancia("veh1","veh0"))
        step += 1
        if step > 2:
            print(traci.vehicle.getLaneId("veh0"))
    traci.close()
    sys.stdout.flush()
All of them returned the following error message: AttributeError: VehicleDomain instance has no attribute 'getLaneId'. But I think the vehicle domain does have the getLaneId attribute, since it is in the documentation: https://sumo.dlr.de/pydoc/traci._vehicle.html#VehicleDomain-getSpeed.
I was expecting it to return the edge's ID. I would appreciate help with this problem. Thank you in advance.
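The TraCI command for the edge ID is getRoadID, defined on the _vehicle.VehicleDomain class. Its signature is:
traci._vehicle.VehicleDomain.getRoadID(self, vehicleID)
In a script it is called on the domain instance, as traci.vehicle.getRoadID(vehicleID).
For the lane, the method name needs to be getLaneID, with a capital D; getLaneId does not exist.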
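Putting both answers together, the edge and the lane of a vehicle can be read with getRoadID and getLaneID. The sketch below is an illustration under assumptions: describe_position is a hypothetical helper, and StubVehicleDomain with its made-up IDs stands in for traci.vehicle so the example runs without a SUMO simulation. With a live TraCI connection you would pass traci.vehicle instead.

```python
def describe_position(vehicle_domain, veh_id):
    """Return (edge_id, lane_id) for a vehicle.

    `vehicle_domain` is expected to offer getRoadID and getLaneID, as
    traci.vehicle does. `describe_position` is a hypothetical helper.
    """
    edge_id = vehicle_domain.getRoadID(veh_id)  # ID of the current edge
    lane_id = vehicle_domain.getLaneID(veh_id)  # note the capital "ID"
    return edge_id, lane_id


class StubVehicleDomain:
    """Stand-in for traci.vehicle with made-up IDs (illustration only)."""

    def getRoadID(self, veh_id):
        return "edge1"

    def getLaneID(self, veh_id):
        return "edge1_0"


print(describe_position(StubVehicleDomain(), "veh0"))  # ('edge1', 'edge1_0')
```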

How to fix the `TypeError: get_sheet_by_name() missing 1 required positional argument: 'name'` error in `openpyxl`

I am trying to display my worksheet names using openpyxl. I get the error "TypeError: get_sheet_by_name() missing 1 required positional argument: 'name'". How can I fix this?
I am on Windows 10, using Python 3.7 with openpyxl installed via pip.
import os
import openpyxl
os.chdir(r'C:\Users\zhiva\Desktop')
wb = openpyxl.load_workbook('Book1.xlsx')
wb.get_sheet_by_name()
I expected the output to be ['Sheet1','Sheet2','Sheet3'].
Looks like what you want is:
wb.get_sheet_names()
wb.get_sheet_by_name will get a specific sheet, but you have to pass it the name, hence the error.
See documentation
You can also use the sheetnames property, which later openpyxl versions recommend over the deprecated get_sheet_names():
wb_obj.sheetnames
Output :
['Sheet1', 'Sheet2', 'Sheet3']
And if a specific worksheet is required, index the workbook by sheet name:
wb_obj['Sheet1']