Best way to optimize a query when searching for a word in long text - sql

I have the following requirement.
A poem belongs to a poet, and a poet has many poems.
If a user searches for the word "ruby", the app should show:
The total number of times the word ruby is used in all poems.
All the poems that contain the word ruby.
The number of times the word ruby is used in each poem.
The total number of poets who used the word ruby.
The total number of times each poet used the word ruby.
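So my query in the Poem model is:
poems = where("poem_column like ?", "%#{word}%")
@results = {}
poems.each do |poem|
  words = poem.poem_column.split
  count = 0
  # NOTE: the block variable `word` shadows the search term `word`, so this
  # condition compares each word with itself and every word gets counted
  words.each do |word|
    count += 1 if word.upcase.include?(word.upcase)
  end
  @results[poem] = count # to get each poem using the word ruby
end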
And to get the poets count, in the Poem model:
@poets = poems.select("distinct(poet_id)")
@poets_word_count = []
@poets.each do |poet|
  # counts this poet's matching poems (not individual word occurrences)
  @poets_word_count << poems.where(poet_id: poet.poet_id).count
end
There are around 50k poems, and it takes more than a minute.
I know I'm doing it the wrong way, but I couldn't find any other way to optimize it.
I think the lines below take most of the time, because they loop over every word of every poem:
words.each do |word|
  count += 1 if word.upcase.include?(word.upcase)
end
Can anyone show me how to optimize it? Due to my limited knowledge of queries, I couldn't do it any other way.
Thanks in advance.
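A minimal sketch, assuming the poems have already been narrowed down by the LIKE query: all five numbers in the requirement list can come out of one pass that counts matches with a regular expression instead of splitting every poem into words. Poem here is a hypothetical Struct standing in for the ActiveRecord model, and word_stats is a hypothetical helper name. The regex counts whole words case-insensitively, which may or may not be the matching rule you want. Note that this does not reduce how many rows are loaded from the database, so with 50k poems a database-side (indexed or full-text) search is likely the bigger win.

```ruby
# Hypothetical stand-in for the ActiveRecord Poem model.
Poem = Struct.new(:id, :poet_id, :poem_column)

# Computes the statistics from the question for one search term.
# \b...\b matches whole words only; /i makes it case-insensitive;
# Regexp.escape keeps user input from being read as regex syntax.
def word_stats(poems, term)
  pattern = /\b#{Regexp.escape(term)}\b/i

  # times the word is used in each poem (only poems that contain it)
  per_poem = {}
  poems.each do |poem|
    n = poem.poem_column.scan(pattern).size
    per_poem[poem] = n if n > 0
  end

  # times each poet used the word
  per_poet = Hash.new(0)
  per_poem.each { |poem, n| per_poet[poem.poet_id] += n }

  {
    total: per_poem.values.sum, # times used in all poems
    poems: per_poem.keys,       # poems containing the word
    per_poem: per_poem,         # times used in each poem
    poet_count: per_poet.size,  # number of poets who used it
    per_poet: per_poet          # times used by each poet
  }
end

poems = [
  Poem.new(1, 10, "Ruby lips and a ruby ring"),
  Poem.new(2, 10, "A rubyist writes Ruby"),
  Poem.new(3, 20, "Nothing red here"),
  Poem.new(4, 30, "ruby, ruby, ruby!")
]

stats = word_stats(poems, "ruby")
puts stats[:total] # prints 6
```

In the Rails model you could pass the relation returned by where("poem_column like ?", ...) instead of the hand-built array, since the helper only reads poet_id and poem_column.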

Not a full answer, just an idea to test.
First, reduce the data by extracting the keywords of each poem when it is saved:
rails g resource Keyword word occurrences:integer poem_id:integer
rails db:migrate
Then in your Poem model:
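# add more words
EXCLUDED_WORDS = %w( the a an so that this these those )
has_many :keywords
# after_save: keywords.create needs the poem to exist in the database first
after_save :set_keywords
# { "some" => 3, "word" => 2, "another" => 1 }; only the how_many most frequent words are kept
def keywords_hash(how_many = 5)
  words = Hash.new 0
  poem_column.split.each do |word|
    words[word] += 1 unless word.in? EXCLUDED_WORDS
  end
  # sort by occurrences, most frequent first
  words.sort_by { |_word, occurrences| -occurrences }.first(how_many).to_h
end
def set_keywords
  keywords.destroy_all # remove old keywords so updates don't duplicate them
  keywords_hash.each do |word, occurrences|
    keywords.create :word => word, :occurrences => occurrences
  end
end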
In the Keyword model:
belongs_to :poem
def self.poem_ids
  pluck(:poem_id) # reads only the ids, without loading the poems
end
def self.poems
  Poem.where(id: poem_ids)
end
Then, when you have a word to search for:
keywords = Keyword.where(word: word)
poems = keywords.poems
poets = poems.poets
For the last line to work, you need this in the Poem model:
def self.poet_ids
  pluck(:poet_id) # reads only the ids, without loading the poets
end
def self.poets
  Poet.where(id: poet_ids)
end
As far as I can see, this approach needs just three queries and no joins, so it seems to make sense.
I will think about how to extend it to search the entire content.

In my opinion, you can change the following code quoted from your post:
poems.each do |poem|
  words = poem.poem_column.split
  count = 0
  words.each do |word|
    count += 1 if word.upcase.include?(word.upcase)
  end
  @results[poem] = count # to get each poem using word ruby
end
to:
poems.each { |poem| @results[poem] = poem.poem_column.scan(/#{Regexp.escape(word)}/i).size }
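The one-line scan counts substrings, so a word such as "rubyist" also counts as a match for ruby. A small sketch of a hypothetical helper, count_occurrences, that escapes the user's search term with Regexp.escape and can optionally require whole-word matches through \b word boundaries; which behavior you want is an assumption about the requirements.

```ruby
# Counts case-insensitive occurrences of a user-supplied term in text.
# Regexp.escape stops characters such as "." or "+" in the term from
# being treated as regex syntax; whole_word: true adds \b boundaries.
def count_occurrences(text, term, whole_word: false)
  escaped = Regexp.escape(term)
  pattern = whole_word ? /\b#{escaped}\b/i : /#{escaped}/i
  text.scan(pattern).size
end

text = "Ruby, rubies and a rubyist"
puts count_occurrences(text, "ruby")                   # prints 2 ("Ruby", "rubyist")
puts count_occurrences(text, "ruby", whole_word: true) # prints 1 ("Ruby")
```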

Related

How to avoid double extraction of patterns in spaCy?

I'm using an incident database to identify the causes of accidents. I have defined a pattern and a function to extract the matching patterns. However, this function sometimes returns overlapping results. I saw in a previous post that we can use for span in spacy.util.filter_spans(spans): to avoid repeated results, but I don't know how to rewrite the function with it. I will be grateful for any help you can provide.
from spacy.matcher import Matcher
# nlp is assumed to be a loaded spaCy pipeline with a dependency parser
pattern111 = [{'DEP':'compound','OP':'?'},{'DEP':'nsubj'}]
def get_relation111(x):
    doc = nlp(x)
    matcher = Matcher(nlp.vocab)
    relation = []
    matcher.add("matching_111", [pattern111], on_match=None)
    matches = matcher(doc)
    for match_id, start, end in matches:
        matched_span = doc[start:end]
        relation.append(matched_span.text)
    return relation
filter_spans can be used on any list of spans; when spans overlap, it keeps the (first) longest one. This is a little awkward because you want a list of strings, but you can work around it by collecting the spans first and converting them to strings only after filtering.
from spacy.matcher import Matcher
from spacy.util import filter_spans
def get_relation111(x):
    doc = nlp(x)
    matcher = Matcher(nlp.vocab)
    relation = []
    matcher.add("matching_111", [pattern111], on_match=None)
    matches = matcher(doc)
    for match_id, start, end in matches:
        matched_span = doc[start:end]
        relation.append(matched_span)  # keep Span objects for now
    # Just add this line: drop overlapping spans, then convert to text
    relation = [ss.text for ss in filter_spans(relation)]
    return relation

Creating a function to count the number of POS tags in each row of a pandas DataFrame

I've used NLTK's pos_tag on sentences in a pandas DataFrame from an old Yelp competition. This gives a list of (word, POS) tuples for each review. I'd like to count the parts of speech in each row. How would I, say, write a function that counts the number of being verbs in each review? I know how to apply functions to columns, no problem there. I just can't wrap my head around how to count things inside tuples inside lists inside a pandas column.
The head is here, as a tsv: https://pastebin.com/FnnBq9rf
Thank you @zhangyulin for your help. After two days, I learned some incredibly important things (as a novice programmer!). Here's the solution!
def NounCounter(x):
    # x is a list of (word, pos) tuples; returns the nouns, not their count
    nouns = []
    for (word, pos) in x:
        if pos.startswith("NN"):
            nouns.append(word)
    return nouns
df["nouns"] = df["pos_tag"].apply(NounCounter)
df["noun_count"] = df["nouns"].str.len()
For example, for a DataFrame df, the noun count of the "reviews" column can be saved to a new "noun_count" column with this code:
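from nltk import pos_tag, word_tokenize
def NounCount(x):
    # tokenize and tag the review, then count the tags that start with NN
    nounCount = sum(1 for word, pos in pos_tag(word_tokenize(x)) if pos.startswith('NN'))
    return nounCount
df["noun_count"] = df["reviews"].apply(NounCount)
df.to_csv('./dataset.csv')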
There are a number of ways to do that. One straightforward way is to map the list (or pandas Series) of tuples to an indicator of whether each word is a verb, and then count the 1s.
Assume you have something like this (please correct me if it's not, as you didn't provide an example):
import pandas as pd
a = pd.Series([("run", "verb"), ("apple", "noun"), ("play", "verb")])
You can do something like this to map the Series and sum the count:
a.map(lambda x: 1 if x[1]== "verb" else 0).sum()
This returns 2.
I grabbed a sentence from the link you shared:
import nltk
text = nltk.word_tokenize("My wife took me here on my birthday for breakfast and it was excellent.")
tag = nltk.pos_tag(text)
a = pd.Series(tag)
# NLTK uses Penn Treebank tags: VBD is a past-tense verb;
# use x[1].startswith("VB") to count every verb form
a.map(lambda x: 1 if x[1]== "VBD" else 0).sum()
# this returns 2

How can I count the number of times I successfully compare data in two tables?

I'm using the where method to look up and compare two columns in two separate tables for identical values (order and player, both integers). I'm trying to count the number of times the lookup succeeds (meaning the columns match) for the current user, and then put that into an instance variable (<%= @total %>) to be displayed on the current user's index view (once I have the count I multiply it by 5, but that's arbitrary). The real-world application could be a leaderboard-type thing: users make different predictions and are ranked by how well their predictions match the results.
I searched several answers that seemed to be in the neighborhood (Counting occurrences of a given input), but none quite got me to a solution (counting occurrences of items in an array python) (How to count the number of times two values appear in two columns in any order).
Predictions Controller:
def index
  if current_user.present?
    @predictions = current_user.predictions.order(:order)
  else
    @predictions = Prediction.all
  end
  @total = Result.where({ :order => @current_user.predictions.pluck(:order), :player_id => @current_user.predictions.pluck(:player_id) }).uniq.count * 5
end
def new
  if current_user.predictions.count < 32
    @prediction = Prediction.new(player_id: params[:player_id])
  else
    redirect_to "/predictions", alert: "You've maxed out on predictions"
  end
end
def create
  @prediction = current_user.predictions.build(predictions_params)
  if @prediction.save
    redirect_to "/predictions/new", notice: "This prediction's just been created."
  else
    redirect_to "/predictions", alert: "Something happened."
  end
end
def edit
end
Results Controller:
def new
  @result = Result.new
end
def create
  @result = Result.new(results_params)
  if @result.save
    redirect_to "/results"
  else
    render "new"
  end
end
I've tried different code in the index action of the predictions controller. What I have here comes closest, but I'm not getting consistent results, as it counts every prediction, right or wrong. I've tried setting the variable to '0'.to_i before the lookup, but I still get inconsistent results.
@total = '0'.to_i
I feel like I'm overlooking something fairly obvious to more experienced eyes. I realize also there are more advanced ways to accomplish this that I'm not aware of. A suggestion was made to store the variable in a session or cookie which is outside of my experience but I'm continuing to research this as an option.
Thanks for any feedback.
Example of a matching pair of database entries:
=> #<Prediction id: 126, round: nil, number: nil, created_at: "2017-03-21 05:44:32", updated_at: "2017-03-21 13:05:11", player_id: 2, order: 2, user_id: 2 >
=> #<Result id: 98, info: nil, created_at: "2017-04-20 15:37:17", updated_at: "2017-04-20 15:37:17", player_id: 2, order: 2>
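One likely reason wrong predictions get counted: where(:order => [...], :player_id => [...]) builds two independent IN lists, so a result matches if its order appears anywhere among the user's orders and its player_id appears anywhere among the user's player ids, not necessarily in the same prediction. A minimal pure-Ruby sketch of matching on the (order, player_id) pair instead; the arrays are hypothetical data standing in for the records, and in the app pluck(:order, :player_id) returns the same kind of [order, player_id] pairs. The same pair-wise condition could also be expressed in SQL as a join on both columns, which avoids loading the rows into Ruby.

```ruby
require "set"

# Hypothetical [order, player_id] pairs standing in for
# current_user.predictions and Result.all.
predictions = [[1, 7], [2, 2], [3, 5]]
results     = [[1, 5], [2, 2], [3, 7]]

# Correct only when the *same* (order, player_id) pair exists in results.
result_pairs = results.to_set
correct = predictions.count { |pair| result_pairs.include?(pair) }
total = correct * 5

# What two separate IN lists check: each column on its own.
orders  = predictions.map(&:first)
players = predictions.map(&:last)
loose = results.count { |order, player| orders.include?(order) && players.include?(player) }

puts correct # prints 1: only [2, 2] matches as a pair
puts loose   # prints 3: every result passes the per-column check
```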

Checking errors in my program

I'm trying to make some changes to my dictionary-based word counter in Python, but I'm not making any progress so far. I want my code to show the number of different words.
This is what I have so far:
# import sys module in order to access command line arguments later
import sys
# create an empty dictionary
dicWordCount = {}
# read all words from the file and put them into
#'dicWordCount' one by one,
# then count the occurrence of each word
You can use the Counter class from the collections module:
from collections import Counter
q = Counter(fileSource.read().split())
total = sum(q.values())  # total number of words
different = len(q)  # number of different words
For your first problem, add one variable for the word count and one for the different words: wordCount = 0 and differentWords = 0. In the loop that reads your file, put wordCount += 1 at the top, and put differentWords += 1 inside your first if statement. You can print both variables at the end of the program.
For the second problem, add the check if len(strKey) > 4: to your printing loop.
If you want a full example, here it is:
import sys
fileSource = open(sys.argv[1], "rt")
dicWordCount = {}
wordCount = 0
differentWords = 0
for strWord in fileSource.read().split():
    wordCount += 1
    if strWord not in dicWordCount:
        dicWordCount[strWord] = 1
        differentWords += 1
    else:
        dicWordCount[strWord] += 1
for strKey in sorted(dicWordCount, key=dicWordCount.get, reverse=True):
    if len(strKey) > 4:  # if the word's length is greater than four
        print(strKey, dicWordCount[strKey])
print("Total words: %s\nDifferent Words: %s" % (wordCount, differentWords))
For your first question, you can use a set to count the number of different words (assuming there is exactly one space between every two words):
text = 'apple boy cat dog elephant fox'
different_word_count = len(set(text.split(' ')))
For your second question, using a dictionary to record the word counts is fine.
How about this?
# gives the unique word count
unique_words = len(dicWordCount)
total_words = 0
for k, v in dicWordCount.items():
    total_words += v
# gives the total word count
print(total_words)
You don't need a separate variable for the word count since you're using a dictionary: to get the total number of words, just add up the dictionary's values (which are the counts).

A migration in a Rails 3 app that will increase the value of an integer field by 1

I am trying to write a migration that increases the value of an integer field by 1. I've tried several variations and searched around. What I am looking for is something like:
def self.up
  User.update_all(:category => [:category + 1])
end
Thanks.
Maybe this will do it:
User.update_all("category = (category + 1)") # raw SQL SET clause, run as a single UPDATE