I have 2 places running the same linting job:
Machine 1: Ubuntu over SSH
pandas==1.2.3
pylint==2.7.4
Python 3.8.10
Machine 2: Gitlab CI Docker image, python:3.8.12-buster
pandas==1.2.3
pylint==2.7.4
Python 3.8.12
The Ubuntu machine is able to lint all the code fine, and it has for many months. Same for the CI job, except it had been running Python 3.7.8. Now that I upgraded the Docker image to Python 3.8.12, it throws several no-member linting errors on some Pandas objects. I've tried clearing CI caches etc.
I wish I could provide something more reproducible. But, to check my understanding of what a linter does: is it theoretically possible that a small Python version difference messes up Pylint like this? For something like a no-member error on Pandas objects, I would think the dominant factor is the pandas version, but those are equal, so I'm confused!
Update:
I've looked at the Pandas code for pd.read_sql_query, which is what's causing the no-member error. It says:
def read_sql_query(
    sql,
    con,
    index_col=None,
    coerce_float=True,
    params=None,
    parse_dates=None,
    chunksize: Optional[int] = None,
) -> Union[DataFrame, Iterator[DataFrame]]:
In Docker, I get E1101: Generator 'generator' has no 'query' member (no-member) (because I'm running .query on the returned dataframe). So it seems Pylint thinks that this function returns a generator. But it does not make this assumption in my other setup. (I've also verified the SHA sum of pandas/io/sql.py matches). This seems similar to this issue, but I am still baffled by the discrepancy in environments.
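For context, a minimal sketch of the kind of call that trips the checker (the database and table names here are made up for illustration, not from my codebase):

import sqlite3

import pandas as pd

con = sqlite3.connect("example.db")  # stand-in for our real DB connection

# Without chunksize, this returns a DataFrame, but Pylint can end up
# inferring the Iterator[DataFrame] branch of the return annotation.
df = pd.read_sql_query("SELECT * FROM my_table", con)
filtered = df.query("id > 0")  # flagged as E1101 no-member in the Docker env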
A fix that worked was to bump a limit like:
init-hook = "import astroid; astroid.context.InferenceContext.max_inferred = 500"
in my .pylintrc file, as explained here.
I'm unsure why/if this is connected to my change in Python version, but I'm happy to use this and move on for now. It's probably complex.
(Another hack was to write a function that returns the passed arg if it is a dataframe, and returns a single dataframe if the passed arg is an iterable of dataframes. The ambiguous-type object could then be passed through this wrapper to clarify things for Pylint. While this was more intrusive on our codebase, we had dozens of calls to pd.read_csv and pd.read_sql_query, and only about 3 calls confused Pylint, so we almost used this solution.)
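For reference, a rough sketch of that wrapper (the name and typing are mine, not our actual code):

from typing import Iterator, Union

import pandas as pd

def as_dataframe(obj: Union[pd.DataFrame, Iterator[pd.DataFrame]]) -> pd.DataFrame:
    # Narrow the Union so Pylint infers a DataFrame. If an iterator of
    # chunks is passed, return the first chunk; swap in pd.concat(obj)
    # instead if all chunks are needed.
    if isinstance(obj, pd.DataFrame):
        return obj
    return next(iter(obj))

# Usage: df = as_dataframe(pd.read_sql_query(query, con))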
I went to GitHub issues to raise a support ticket, but thought of asking the question here first to avoid noise.
This is what the docs say:
Omit the version completely or use "latest" to load the latest one (not recommended for production usage):
/npm/jquery@latest/dist/jquery.min.js
/npm/jquery/dist/jquery.min.js
According to the docs, we can either use latest or omit the version completely to load the latest one. But I'm seeing a difference:
With latest added (URL 1 - U1)
Example: https://cdn.jsdelivr.net/npm/@letscooee/web-sdk@latest/dist/sdk.min.js
It loads the last released version that is cached for 24 hours. That means if we release v2 & v3 within 24 hours, the above URL will still show v1.
The caching period is 1 week.
Without latest (URL 2 - U2)
Example: https://cdn.jsdelivr.net/npm/@letscooee/web-sdk/dist/sdk.min.js
When we omit latest completely, this loads the latest release immediately (i.e. v3), and the caching period is also 1 week.
I have requested the purge API as per their docs, but I believe this behaviour is not aligned with their docs.
Tried to Google the cause and read their docs 3 times. Am I missing something?
Edit 1
After reading Martin's answer, I did the following:
| Step Taken | Time | U1 | U2 |
| --- | --- | --- | --- |
| Purge cache | 12:39:00 UTC | Purged | Purged |
| See Age header | 12:40 UTC | 0 | 0 |
| See Date header | 12:40 UTC | Sun, 12 Sep 2021 12:40:25 GMT | Sun, 12 Sep 2021 12:40:31 GMT |
| Headers | 12:41:00 UTC | (screenshot) | (screenshot) |
| Result | 12:41:00 UTC | Points to latest release 0.0.3 | Points to latest release 0.0.3 |
| Publish new NPM release 0.0.4 | 12:48:00 UTC | | |
| Refresh both the URLs | 12:49:00 UTC | Shows old release 0.0.3 | Shows latest release 0.0.4 |
The last step shows that I was wrong here. This is working as expected (i.e. showing 0.0.3 only) as per the docs.
The caching time is the same in both cases - 12 hours at the CDN level and 7 days in the browser: cache-control: public, max-age=604800, s-maxage=43200
That doesn't necessarily mean both URLs will always return the same content because both the CDN and your browser calculate the expiration for each URL independently, based on when it was first retrieved, so the CDN may serve different versions for up to 12 hours after the release.
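If you want to verify this yourself, here is a quick sketch (using Python's requests, though any HTTP client works) that prints the relevant headers for both URLs:

import requests

urls = [
    "https://cdn.jsdelivr.net/npm/@letscooee/web-sdk@latest/dist/sdk.min.js",  # U1
    "https://cdn.jsdelivr.net/npm/@letscooee/web-sdk/dist/sdk.min.js",         # U2
]
for url in urls:
    r = requests.head(url, allow_redirects=True)
    # age: seconds the cached copy has been sitting at the CDN edge
    print(url)
    print(r.headers.get("cache-control"), r.headers.get("age"), r.headers.get("date"))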
Seems to me that both links point to the same SDK URL.
The usual way to use a CDN is to pin the version of the SDK, for example:
<script src="https://unpkg.com/three@0.126.0/examples/js/loaders/GLTFLoader.js"></script>
or, as below, to use a URL that always points to the latest version of the SDK:
<script src="https://cdn.rawgit.com/mrdoob/three.js/master/examples/js/loaders/GLTFLoader.js"></script>
I need to use the sportsipy API to get the schedule for all teams in a dataframe. This is what I have:
from sportsreference.nba.schedule import Schedule

league = ['MIL','CHO','LAL','LAC','SAC','ATL','MIA','DAL','POR',
          'HOU','NOP','PHO','WAS','MEM','BOS','DEN','TOR','SAS',
          'PHI','BRK','UTA','IND','OKC','ORL','MIN','DET',
          'NYK','CLE','CHI','GSW']

for i in league:
    mil2019 = Schedule(i, year='2020')
    mil2019.dataframe_extended
The error I get is:
TypeError: unsupported operand type(s) for -: 'NoneType' and 'NoneType'
As mentioned in the comment above, I believe your import is wrong. Using the package sportsipy 0.6.0 and following the docs (https://sportsipy.readthedocs.io/en/stable/), I was able to achieve your desired result with the following code:
from sportsipy.nba.schedule import Schedule

# MIL removed from league list as it is used to initiate league_schedule
league = ['CHO','LAL','LAC','SAC','ATL','MIA','DAL','POR',
          'HOU','NOP','PHO','WAS','MEM','BOS','DEN','TOR','SAS',
          'PHI','BRK','UTA','IND','OKC','ORL','MIN','DET',
          'NYK','CLE','CHI','GSW']

league_schedule = Schedule('MIL', year="2020").dataframe
for team in league:
    league_schedule = league_schedule.append(Schedule(team, year="2020").dataframe)
(Resulting dataframe has dimensions: [2286 rows x 15 columns])
The same should work with dataframe_extended, but it takes a rather long time to get all that data. Maybe double-check whether you need all of it.
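As a side note, DataFrame.append has since been deprecated in newer pandas releases; an equivalent sketch with pd.concat (same imports as above, team list shortened here for brevity):

import pandas as pd
from sportsipy.nba.schedule import Schedule

teams = ['MIL', 'CHO', 'LAL']  # ... plus the rest of the 30 team abbreviations

# Build the frames lazily and concatenate once at the end, which avoids
# the repeated copying of chained .append() calls.
league_schedule = pd.concat(Schedule(team, year="2020").dataframe for team in teams)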
In case I am wrong and the package you refer to in your question is correct, please add additional info to your question, such as where we can get that package.
It appears you are using the module from pip install sportsreference from here, which is on v0.5.2, in which case that is a valid import, even though you mentioned you're using sportsipy, which caused some confusion for others. The latest version has renamed the package to sportsipy.
If it weren't a valid import, it would throw an ImportError on the very first line, so I'm not sure why folks are getting hung up on that.
You really should include the entire Python traceback, not just the final message, so we can determine exactly where in your code and the module's source code this exception is being raised. Also include the specific version of the library you're using, e.g. from pip freeze.
My initial thought is that one of the requests for one of these teams is returning something unexpected and the library is not handling it properly, but without the full traceback that's just a theory.
It's probably a bug in v0.5.2 of sportsipy. I would try using the latest version from git and see if you can reproduce the error. Something, somewhere, isn't validating that things are what it expects before operating on them. If I had the full traceback, I could tell you exactly what.
You could try catching the TypeError and passing on it, to see if skipping it allows everything else to continue working, but without knowing exactly where the error is coming from it's hard to say for sure at this point.
for i in league:
    try:
        mil2019 = Schedule(i, year='2020')
        mil2019.dataframe_extended
    except TypeError:
        pass
This won't fix the problem; it actually hides it. But if it's just one record from one game that is returning something unexpected, this would at least let you get the rest of the results, possibly. It's also possible the issue would create other problems later, depending on exactly what it is. Again, this is where the whole traceback would have been helpful.
I will say that trying your code for just one team works for me. For example:
from sportsreference.nba.schedule import Schedule
mil2019 = Schedule("MIL", year="2020")
print(mil2019.dataframe_extended.head(10))
Returns this:
away_assist_percentage ... winning_name
201910240HOU 67.4 ... Milwaukee Bucks
201910260MIL 71.7 ... Miami Heat
201910280MIL 42.2 ... Cleveland Cavaliers
201910300BOS 55.3 ... Milwaukee Bucks
201911010ORL 51.1 ... Milwaukee Bucks
201911020MIL 70.6 ... Toronto Raptors
201911040MIN 48.0 ... Milwaukee Bucks
201911060LAC 42.9 ... Milwaukee Bucks
201911080UTA 41.2 ... Milwaukee Bucks
201911100OKC 57.4 ... Milwaukee Bucks
[10 rows x 82 columns]
It takes forever just to get the games from one team. The library is not passing around an existing requests.Session() when calling PyQuery (even though PyQuery supports a session kwarg), so every request for every box score renegotiates a fresh TCP connection, which is absurd, but I digress (a session-reuse sketch follows the log below):
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): www.basketball-reference.com:80
DEBUG:urllib3.connectionpool:http://www.basketball-reference.com:80 "GET /teams/MIL/2020_games.html HTTP/1.1" 301 183
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.basketball-reference.com:443
DEBUG:urllib3.connectionpool:https://www.basketball-reference.com:443 "GET /teams/MIL/2020_games.html HTTP/1.1" 200 34571
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.basketball-reference.com:443
DEBUG:urllib3.connectionpool:https://www.basketball-reference.com:443 "GET /boxscores/201910240HOU.html HTTP/1.1" 200 46549
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.basketball-reference.com:443
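To illustrate the session-reuse point, a minimal sketch (this is not the library's code; the box score IDs are taken from the log above):

import requests

# One Session keeps a pooled connection alive across requests,
# avoiding a fresh TCP/TLS handshake for every box score.
session = requests.Session()

for box_score in ("201910240HOU", "201910260MIL"):
    url = f"https://www.basketball-reference.com/boxscores/{box_score}.html"
    resp = session.get(url)
    print(box_score, resp.status_code, len(resp.content))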
I would add some debugging to your code to establish which team your code is working on when this exception is raised. Try first with one team like I did and confirm it generally works, then iterate through the list of teams with logging enabled like:
import logging

from sportsreference.nba.schedule import Schedule

logging.basicConfig(level=logging.DEBUG)

league = ['CHO', 'LAL', 'LAC', 'SAC', 'ATL', 'MIA', 'DAL', 'POR',
          'HOU', 'NOP', 'PHO', 'WAS', 'MEM', 'BOS', 'DEN', 'TOR', 'SAS',
          'PHI', 'BRK', 'UTA', 'IND', 'OKC', 'ORL', 'MIN', 'DET',
          'NYK', 'CLE', 'CHI', 'GSW']

for i in league:
    logging.info("Working on team: %s", i)
    mil2019 = Schedule(i, year="2020")
    print(mil2019.dataframe_extended)
This way you will know specifically which team and which request is responsible for the issue, and that will help you determine the root cause.
Is it some sort of overflow?
phantomjs> new Date("1400-03-01T00:00:00.000Z")
"1400-03-01T00:00:00.000Z"
phantomjs> new Date("1400-02-28T20:59:59.000Z")
"1400-02-27T20:59:59.000Z"
What you would expect:
>>(new Date("1400-03-01T00:00:00.000Z")).toISOString()
"1400-03-01T00:00:00.000Z"
>>(new Date("1400-02-28T20:59:59.000Z")).toISOString()
"1400-02-28T20:59:59.000Z"
Apparently there is a gap of 24 hours when parsing dates between the 28th of February 1400 and the 1st of March 1400.
Any ideas?
PhantomJS is obsolete anyway, but still... our legacy tests are failing when we try to upgrade to headless Chrome...
PhantomJS uses a version of Qt WebKit which is maintained independently of Qt.
The date format you are using is part of the ISO-8601 date and time format. [related]
The version of Qt WebKit that PhantomJS uses has a function that parses dates of the form defined in ECMA-262-5, section 15.9.1.15 (similar to RFC 3339 / ISO 8601: YYYY-MM-DDTHH:mm:ss[.sss]Z).
In the source code, we can see that the function used to parse these types of dates is called:
double parseES5DateFromNullTerminatedCharacters(const char* dateString)
The file that contains this function in the PhantomJS repository has not been updated since July 27, 2014, while the official file was updated as recently as October 13, 2017.
It appears that there is a problem in the logic having to do with handling leap years. Notably, 1400 is divisible by 100 but not by 400, so it is not a Gregorian leap year; a parser that applied a naive divisible-by-4 rule would insert a spurious February 29, 1400, which would be consistent with the one-day shift seen above for dates before the 1st of March 1400.
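To illustrate (in Python for brevity), 1400 is exactly the kind of year where a naive leap-year check and the full Gregorian rule disagree:

def is_gregorian_leap(year: int) -> bool:
    # Gregorian rule: divisible by 4, except century years not divisible by 400
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

print(1400 % 4 == 0)            # True  -> a naive rule would add a Feb 29, 1400
print(is_gregorian_leap(1400))  # False -> the Gregorian calendar has no Feb 29, 1400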
Here is a comparison of DateMath.cpp between the most recent versions from the official qtwebkit repository (left) and the PhantomJS qtwebkit repository (right).
So I am trying to learn Groovy, but I am experiencing some issues with the datetime module.
I looked for an answer for 2 hours, but found only one piece of advice, and it hasn't helped me.
When I run groovysh in the console, I get warnings about a module that could not be loaded:
V 31, 2018 10:10:35 DOP. org.codehaus.groovy.runtime.m12n.MetaInfExtensionModule newModule
WARNING: Module [groovy-datetime] - Unable to load extension class [org.apache.groovy.datetime.extensions.DateTimeExtensions]
V 31, 2018 10:10:35 DOP. org.codehaus.groovy.runtime.m12n.MetaInfExtensionModule newModule
WARNING: Module [groovy-datetime] - Unable to load extension class [org.apache.groovy.datetime.extensions.DateTimeStaticExtensions]
It is Groovy 2.5.0; can anybody help me? The warnings also appear on every script run, which is extremely annoying.