I'm looking for a string.contains or string.indexof method in Python.
I want to do:
if not somestring.contains("blah"):
    continue
Use the in operator:
if "blah" not in somestring:
    continue
If it's just a substring search you can use string.find("substring").
You do have to be a little careful with find, index, and in though, as they are substring searches. In other words, this:
s = "This be a string"
if s.find("is") == -1:
    print("No 'is' here!")
else:
    print("Found 'is' in the string.")
This prints Found 'is' in the string. because "is" occurs inside the word "This", even though it never appears as a separate word. Similarly, if "is" in s: would evaluate to True. This may or may not be what you want.
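If you only want to match "is" as a whole word, one option is a regular expression with word boundaries. This is a minimal sketch; the helper name contains_word is hypothetical and not part of the standard library:

```python
import re

def contains_word(text, word):
    """Return True if `word` occurs in `text` as a whole word."""
    # \b matches a word boundary; re.escape guards against regex metacharacters in `word`.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text) is not None

print(contains_word("This be a string", "is"))  # False
print(contains_word("This is a string", "is"))  # True
```

Plain in remains the right tool when any occurrence of the substring counts.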
Does Python have a string contains substring method?
99% of use cases will be covered using the keyword, in, which returns True or False:
'substring' in any_string
For the use case of getting the index, use str.find (which returns -1 on failure, and has optional positional arguments):
start = 0
stop = len(any_string)
any_string.find('substring', start, stop)
or str.index (like find but raises ValueError on failure):
start = 100
end = 1000
any_string.index('substring', start, end)
Explanation
Use the in comparison operator because
the language intends its usage, and
other Python programmers will expect you to use it.
>>> 'foo' in '**foo**'
True
The opposite (complement), which the original question asked for, is not in:
>>> 'foo' not in '**foo**' # returns False
False
This is semantically the same as not 'foo' in '**foo**' but it's much more readable and explicitly provided for in the language as a readability improvement.
Avoid using __contains__
The "contains" method implements the behavior for in. This example,
str.__contains__('**foo**', 'foo')
returns True. You could also call this function from the instance of the superstring:
'**foo**'.__contains__('foo')
But don't. Methods that start with underscores are considered semantically non-public. The only reason to use this is when implementing or extending the in and not in functionality (e.g. if subclassing str):
class NoisyString(str):
    def __contains__(self, other):
        print(f'testing if "{other}" in "{self}"')
        return super(NoisyString, self).__contains__(other)
ns = NoisyString('a string with a substring inside')
and now:
>>> 'substring' in ns
testing if "substring" in "a string with a substring inside"
True
Don't use find and index to test for "contains"
Don't use the following string methods to test for "contains":
>>> '**foo**'.index('foo')
2
>>> '**foo**'.find('foo')
2
>>> '**oo**'.find('foo')
-1
>>> '**oo**'.index('foo')
Traceback (most recent call last):
  File "<pyshell#40>", line 1, in <module>
    '**oo**'.index('foo')
ValueError: substring not found
Other languages may have no methods to directly test for substrings, and so you would have to use these types of methods, but with Python, it is much more efficient to use the in comparison operator.
Also, these are not drop-in replacements for in. You may have to handle the exception or -1 cases, and if they return 0 (because they found the substring at the beginning) the boolean interpretation is False instead of True.
If you really mean not any_string.startswith(substring) then say it.
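The boolean pitfall described above can be seen in a short sketch (the variable names are only for illustration):

```python
at_start = 'foo**'
missing = '**oo**'

# find() returns an index, so its truthiness does not mean "found".
print(bool(at_start.find('foo')))   # False: index 0 is falsy
print(bool(missing.find('foo')))    # True: -1 is truthy

# Comparing against -1 (or simply using `in`) gives the intended answer.
print(at_start.find('foo') != -1)   # True
print('foo' in missing)             # False
```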
Performance comparisons
We can compare various ways of accomplishing the same goal.
import timeit
def in_(s, other):
    return other in s
def contains(s, other):
    return s.__contains__(other)
def find(s, other):
    return s.find(other) != -1
def index(s, other):
    try:
        s.index(other)
    except ValueError:
        return False
    else:
        return True
perf_dict = {
    'in:True': min(timeit.repeat(lambda: in_('superstring', 'str'))),
    'in:False': min(timeit.repeat(lambda: in_('superstring', 'not'))),
    '__contains__:True': min(timeit.repeat(lambda: contains('superstring', 'str'))),
    '__contains__:False': min(timeit.repeat(lambda: contains('superstring', 'not'))),
    'find:True': min(timeit.repeat(lambda: find('superstring', 'str'))),
    'find:False': min(timeit.repeat(lambda: find('superstring', 'not'))),
    'index:True': min(timeit.repeat(lambda: index('superstring', 'str'))),
    'index:False': min(timeit.repeat(lambda: index('superstring', 'not'))),
}
And now we see that using in is much faster than the others.
Less time to do an equivalent operation is better:
>>> perf_dict
{'in:True': 0.16450627865128808,
'in:False': 0.1609668098178645,
'__contains__:True': 0.24355481654697542,
'__contains__:False': 0.24382793854783813,
'find:True': 0.3067379407923454,
'find:False': 0.29860888058124146,
'index:True': 0.29647137792585454,
'index:False': 0.5502287584545229}
How can in be faster than __contains__ if in uses __contains__?
This is a fine follow-on question.
Let's disassemble functions with the methods of interest:
>>> from dis import dis
>>> dis(lambda: 'a' in 'b')
1 0 LOAD_CONST 1 ('a')
2 LOAD_CONST 2 ('b')
4 COMPARE_OP 6 (in)
6 RETURN_VALUE
>>> dis(lambda: 'b'.__contains__('a'))
1 0 LOAD_CONST 1 ('b')
2 LOAD_METHOD 0 (__contains__)
4 LOAD_CONST 2 ('a')
6 CALL_METHOD 1
8 RETURN_VALUE
so we see that the .__contains__ method has to be separately looked up and then called from the Python virtual machine - this should adequately explain the difference.
if needle in haystack: is the normal use, as @Michael says -- it relies on the in operator, which is more readable and faster than a method call.
If you truly need a method instead of an operator (e.g. to do some weird key= for a very peculiar sort...?), that would be 'haystack'.__contains__. But since your example is for use in an if, I guess you don't really mean what you say;-). It's not good form (nor readable, nor efficient) to use special methods directly -- they're meant to be used, instead, through the operators and builtins that delegate to them.
in Python strings and lists
Here are a few useful examples that speak for themselves concerning the in operator:
>>> "foo" in "foobar"
True
>>> "foo" in "Foobar"
False
>>> "foo" in "Foobar".lower()
True
>>> "foo".capitalize() in "Foobar"
True
>>> "foo" in ["bar", "foo", "foobar"]
True
>>> "foo" in ["fo", "o", "foobar"]
False
>>> ["foo" in a for a in ["fo", "o", "foobar"]]
[False, False, True]
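Caveat: lists are iterables, and the in operator acts on iterables, not just strings. For a list, it tests whether any element equals the value; it does not search inside the elements.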
If you want to compare strings in a more fuzzy way to measure how "alike" they are, consider using the Levenshtein package
Here's an answer that shows how it works.
If you are happy with "blah" in somestring but want it to be a function/method call, you can do this:
import operator
if not operator.contains(somestring, "blah"):
    continue
Most operators in Python, including in, have a function equivalent in the operator module. Note the argument order: operator.contains(a, b) is the same as b in a.
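A function form is mainly useful where a callable is expected. As a rough sketch (the sample strings and the name in_somestring are made up for illustration):

```python
import operator
from functools import partial

haystacks = ["foobar", "spam", "blah blah"]
needles = ["foo", "egg", "blah"]

# operator.contains(a, b) is equivalent to `b in a`; map pairs the arguments up.
pairwise = list(map(operator.contains, haystacks, needles))
print(pairwise)  # [True, False, True]

# partial fixes the container, giving a one-argument "is this in the string?" test.
in_somestring = partial(operator.contains, "some blah string")
print(in_somestring("blah"))  # True
```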
So apparently there is nothing similar for vector-wise comparison. An obvious Python way to do so would be:
names = ['bob', 'john', 'mike']
any(st in 'bob and john' for st in names)
>> True
any(st in 'mary and jane' for st in names)
>> False
You can use str.count().
It returns the number of non-overlapping times a substring appears in a string, so a result greater than 0 means the substring is present.
For example:
string.count("bah") >> 0
string.count("Hello") >> 1
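One detail worth knowing is that count only counts non-overlapping matches. The sketch below illustrates this; count_overlapping is a hypothetical helper written for the comparison, not a built-in:

```python
text = "aaaa"

# str.count counts non-overlapping matches, scanning left to right.
print(text.count("aa"))      # 2
print(text.count("b") > 0)   # False: "b" is not contained in text

def count_overlapping(s, sub):
    """Count occurrences of `sub` in `s`, allowing overlaps (hypothetical helper)."""
    count = 0
    start = s.find(sub)
    while start != -1:
        count += 1
        # Resume one character later so overlapping matches are seen too.
        start = s.find(sub, start + 1)
    return count

print(count_overlapping(text, "aa"))  # 3
```

For a plain containment test, in is still simpler and stops at the first match.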
Here is your answer:
if "insert_char_or_string_here" in "insert_string_to_search_here":
    pass  # DOSTUFF
To check that it is not present:
if not "insert_char_or_string_here" in "insert_string_to_search_here":
    pass  # DOSTUFF
OR:
if "insert_char_or_string_here" not in "insert_string_to_search_here":
    pass  # DOSTUFF
You can use regular expressions to get the occurrences:
>>> import re
>>> print(re.findall(r'( |t)', to_search_in)) # searches for t or space
['t', ' ', 't', ' ', ' ']
Currently I have two data frames. I am trying to get a fuzzy match of client names using fuzzywuzzy's process.extractOne function. When I have run the following script on sample data I get good results and no error, but when I run the following on my current data frames I get both an Attribute and Type error. I am not able to provide the data for security reasons, but if anyone can figure out why I am getting errors based on the script provided I would be much obliged.
names2 = list(dftr3['Common Name'])
names3 = dict(zip(names2,names2))
def get_fuzz_match(row):
    match = process.extractOne(row['CLIENT_NAME'],choices = n3.keys(),score_cutoff = 80)
    if match:
        return n3[match[0]]
    return np.nan
dfmi4['Match Name'] = dfmi4.apply(get_fuzz_match, axis=1)
I know not having some examples makes this more difficult to troubleshoot, so I will answer any question and edit the post to help this process along. The specific errors are:
1. AttributeError: 'dict_keys' object has no attribute 'items'
2. TypeError: expected string or buffer
The AttributeError is straightforward and to be expected, I think. Fuzzywuzzy's process.extract function, which does most of the actual work in process.extractOne, uses a try:... except: clause to determine whether to process the choices parameter as dict-like or list-like. I think you are seeing the exception because the TypeError is raised during the except: clause.
The TypeError is trickier to pin down, but I suspect it occurs somewhere in the StringProcessor class, used in the processor module, again called by extract, which uses several string methods and doesn't catch exceptions. So it seems likely that your apply call is passing something that is not a string. Is it possible that you have any empty cells?
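One way to check for such cells before matching is sketched below. It uses a plain list as a stand-in for the CLIENT_NAME column, and is_matchable is a hypothetical helper, not part of fuzzywuzzy or pandas:

```python
import math

def is_matchable(value):
    """Return True only for non-empty strings (hypothetical helper)."""
    return isinstance(value, str) and value.strip() != ""

# Stand-in for a CLIENT_NAME column; NaN is how pandas usually represents an empty cell.
client_names = ["Acme Corp", math.nan, None, "", "Globex"]

# Positions of the values that would not be safe to hand to a string matcher.
bad_rows = [i for i, name in enumerate(client_names) if not is_matchable(name)]
print(bad_rows)  # [1, 2, 3]
```

If a check like this flags rows in the real data, returning np.nan for them before calling the matcher would be one way to sidestep the TypeError.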
I am stuck with the following question:
I have designed a PsychoPy Experiment in Windows, Version 1.82.01. It runs perfectly there.
Now I have copied the same experiment on a MacBook Air under the version 1.83.01.
Since then, the experiment starts, but after a while, I get the following error message
#Running:
/Users/Kataha/Desktop/Experiment/Experiment_FFOV_Kinder3_lastrun.py #
2015-12-05 15:26:39.876 python[1314:117629]
ApplePersistenceIgnoreState: Existing state will not be touched.
New state will be written to /var/folders/c8/
qy0wd2ws3r115rg30wxxg6940000gn/T/org.psychopy.PsychoPy2.savedState
Traceback (most recent call last):
File
"/Users/Kataha/Desktop/Experiment/Experiment_FFOV_Kinder3_lastrun.py",
line 389, in <module>
if Fix_kreuz.status == STARTED and t >= (0.0 + (SOA-win.monitorFramePeriod*0.75)):
#most of one frame period left
TypeError: unsupported operand type(s) for -: 'unicode' and 'numpy.float64'
The code in line 389 looks like the following:
# *Fix_kreuz* updates
if t >= 0.0 and Fix_kreuz.status == NOT_STARTED:
    # keep track of start time/frame for later
    Fix_kreuz.tStart = t  # underestimates by a little under one frame
    Fix_kreuz.frameNStart = frameN  # exact frame index
    Fix_kreuz.setAutoDraw(True)
if Fix_kreuz.status == STARTED and t >= (0.0 + (SOA-win.monitorFramePeriod*0.75)): #most of one frame period left
    Fix_kreuz.setAutoDraw(False)
The variable SOA is defined in an excel sheet:
Excel Sheet with variables
I cannot figure out what the problem is. I hope someone can help me. Thank you!
In your Excel file, notice that in the SOA column, the '0.5' values are left-justified whereas the '3' values are right justified. In Excel, left-justified cells indicate that those are text values rather than numeric. Compare to the first two columns, which are all text (and all left-justified), and the last column, which is all numeric (and all right-justified).
You could try exporting this file to .csv rather than native .xlsx. If these are actually numeric values, then that column would be read by PsychoPy as numeric. But you should probably identify what is making Excel regard those entries in that column as being text. Can't tell from the font, but sometimes a cause for this is typing "oh point five" rather than "zero point five".
This is consistent with @Michiel's answer that unicode (i.e. text) values are polluting the SOA variable, which should contain floating point values. But it isn't consistent with your type test, which found that the value was a float.
Apparently the variable SOA contains a unicode value:
... SOA-win.monitorFramePeriod ...
unsupported operand type(s) for -: 'unicode' and 'numpy.float64'
You don't show where SOA is defined. Can you show that, or perhaps somewhere post the entire project? To verify what the types exactly are, you could add these lines:
print "SOA=", SOA, type(SOA)
print "monitorFramePeriod=", win.monitorFramePeriod, type(win.monitorFramePeriod)
before the line
if Fix_kreuz.status == STARTED ...
A possible fix might be, but this is a wild guess:
if Fix_kreuz.status == STARTED and t >= (0.0 + (float(SOA)-win.monitorFramePeriod*0.75)):
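As a rough sketch of that conversion with a clearer failure message: to_float is a hypothetical helper, and frame_period is an assumed 60 Hz value standing in for win.monitorFramePeriod.

```python
def to_float(value, name="value"):
    """Convert `value` to float, raising a descriptive error if it is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        raise ValueError("%s is not numeric: %r" % (name, value))

frame_period = 1.0 / 60  # assumed 60 Hz display, standing in for win.monitorFramePeriod

soa = to_float(u"0.5", "SOA")  # a text cell as it might arrive from the spreadsheet
print(soa - frame_period * 0.75)  # the subtraction now works on two floats
```

A cell typed with the letter O instead of a zero would still fail here, but the error would name the offending value instead of reporting an unsupported operand type.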
I am trying to read from a file where each line contains an integer.
But when I run this:
f=open("data.txt")
a=readline(f)
arr=int64[]
push!(arr,int(a))
I am getting
ERROR: no method getindex(Function)
in include_from_node1 at loading.jl:120
The error comes from int64[], since int64 is a function and you are trying to index it with []. To create an array of Int64 (note the case), you should use, e.g., arr = Int64[].
Another problem in your code is the int(a): since you have an array of Int64, you should also specify the same type when parsing, e.g., push!(arr,parseint(Int64,a)). In Julia 1.0 and later, parseint no longer exists; use parse(Int64, a) instead.
Specifically, I have a model that has a field like this
pub_date = models.DateField("date published")
I want to be able to easily grab the object with the most recent pub_date. What is the easiest/best way to do this?
Would something like the following do what I want?
Edition.objects.order_by('pub_date')[:-1]
obj = Edition.objects.latest('pub_date')
You can also simplify things by putting get_latest_by in the model's Meta, then you'll be able to do
obj = Edition.objects.latest()
See the docs for more info. You'll probably also want to set the ordering Meta option.
Harley's answer is the way to go for the case where you want the latest according to some ordering criteria for particular Models, as you do, but the general solution is to reverse the ordering and retrieve the first item:
Edition.objects.order_by('-pub_date')[0]
Note:
Normal Python lists accept negative indexes, which count from the end of the list rather than from the beginning. However, QuerySet objects will raise AssertionError: Negative indexing is not supported. if you use a negative index, which is why you have to do what insin said: reverse the ordering and grab the 0th element.
Be careful of using
Edition.objects.order_by('-pub_date')[0]
as you might be indexing an empty QuerySet. I'm not sure what the correct Pythonic approach is, but the simplest would be to wrap it in an if/else or try/except:
try:
    last = Edition.objects.order_by('-pub_date')[0]
except IndexError:
    # Didn't find anything...
    last = None
But, as @Harley said, when you're ordering by date, latest() is the djangonic way to do it.
This has already been answered, but for more reference, this is what Django Book has to say about Slicing Data on QuerySets:
Note that negative slicing is not supported:
>>> Publisher.objects.order_by('name')[-1]
Traceback (most recent call last):
...
AssertionError: Negative indexing is not supported.
This is easy to get around, though. Just change the order_by()
statement, like this:
>>> Publisher.objects.order_by('-name')[0]
Refer to the link for more details. Hope that helps!