I'm doing a small project that pulls data from tmdb's API.
Right now I have a /tv view that takes an id and requests the TV show associated with that id. That results in a URL like example.com/tv/23521. Looking at TMDB's own site, their URL structure seems to be something like "id-slug-title". Regardless of what comes after the ID, it still redirects you to the right page.
How is that done? It would seem that it takes in the URL, splits it at "-" and uses the first parameter as ID. I am not sure how to do that in Flask though. I was thinking of using before and after request methods, but I'm worried that will just make unnecessary API calls. In order to get the slug title, I would have to make at least one call with the ID to get the title and then slugify that title.
The route accepts both an id and a slug, where the slug is optional:
@app.route('/tv/<int:id>', defaults={'slug': None})
@app.route('/tv/<int:id>-<slug>')
def tv(id, slug):
    # ...
Note that you don't have to do any splitting yourself; the route matches if there is an integer number followed by a dash and some more text, or if it is just a number.
Only the id parameter is needed to find the right page. The slug is simply checked against the canonical one, and you are redirected if it doesn't match:
page = load_page(id)
if slug != page.slug:
    return redirect(url_for('tv', id=id, slug=page.slug))
Don't recalculate the slug each time; just store it in the database. You'll have to load the page info anyway in order to serve it.
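For example, a minimal sketch of what load_page could do, assuming a Flask-SQLAlchemy-style Page model, the python-slugify package, and a placeholder TMDB API key; none of these names are from the original answer:

from slugify import slugify   # python-slugify, an assumed dependency
import requests

def load_page(id):
    # Check the local database first so the slug is only computed once.
    page = Page.query.get(id)  # Page is an assumed model with id/title/slug columns
    if page is None:
        # Hypothetical TMDB lookup; API_KEY is a placeholder for your own key.
        data = requests.get(
            f"https://api.themoviedb.org/3/tv/{id}",
            params={"api_key": API_KEY},
        ).json()
        page = Page(id=id, title=data["name"], slug=slugify(data["name"]))
        db.session.add(page)
        db.session.commit()
    return page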
You could put that behaviour in a decorator and pass in the loaded page data into the view:
@app.route('/tv/<int:id>', defaults={'slug': None})
@app.route('/tv/<int:id>-<slug>')
@tv_page
def tv(page):
    # ...
with tv_page then handling the parameters:
from functools import wraps
from flask import redirect, url_for

def tv_page(view_func):
    @wraps(view_func)
    def wrapper(id, slug):
        page = load_page(id)
        if slug != page.slug:
            return redirect(url_for('tv', id=id, slug=page.slug))
        return view_func(page)
    return wrapper
There is a video content type with a link slug field, and when a new video is created, the GET request returns a null slug ({slug: null} after the API call in Strapi). What's the problem? I haven't installed slugify.
OK, this is what I did.
I made a variable before the POST request, based on one of my form fields (e.g. the name field).
My formValues is an object with the values of the form fields, like this:
formValues = { name: "whatever", description: "whatever" }
make a variable:
const slug = formValues.name.split(" ").join("-") + "-" + Math.random();
Now we might have duplicate names, which is why I appended a random value (you might want to use uuid or something like that).
Then you send it like this:
const res = await axios.post(`${API_URL}/api/events`,{...formValues, slug });
// you may not need to send an object with the shape like this
// but the point is you concat your custom slug to the object you want to send
Notice that I'm adding a custom slug from the frontend that is somewhat random but based on one of the fields; which field doesn't really matter. Right now Strapi does not have any documentation about this common problem. The best solution is probably using strapi-plugin-slugify, but if that doesn't work for you, feel free to use my solution.
Everyone, I've been learning Scrapy for a month. I need assistance with the following problems:
Suppose there are 100-200 URLs and I use a Rule to extract further links from them. I want to limit the requests for those links, say to a maximum of 30 requests per URL. Can I do that?
If I'm searching for a keyword across all URLs and the word is found on a particular URL, I want Scrapy to stop searching that URL and move on to the next one.
I've tried limiting the URLs but it doesn't work at all.
Thanks, I hope everything is clear.
You can use a process_links callback function with your Rule; it will be passed the list of links extracted from each response, and you can trim it down to your limit of 30.
Example (untested):
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class MySpider(CrawlSpider):
    name = "test"
    allowed_domains = ['example.org']

    rules = (
        Rule(LinkExtractor(), process_links="dummy_process_links"),
    )

    def dummy_process_links(self, links):
        links = links[:30]
        return links
If I understand correctly and you want to stop after finding a certain word in the response page, all you need to do is find the word:
def my_parse(self, response):
    if b'word' in response.body:
        offset = response.body.find(b'word')
        # do something with it
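Putting the two pieces together, here is a rough, untested sketch of how the whole spider might look; the domain, start URL, and the keyword 'word' are only placeholders:

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class KeywordSpider(CrawlSpider):
    name = "keyword_test"
    allowed_domains = ["example.org"]
    start_urls = ["https://example.org/"]

    rules = (
        # process_links trims the extracted links, callback handles each page
        Rule(LinkExtractor(), process_links="limit_links", callback="parse_items"),
    )

    def limit_links(self, links):
        # keep at most 30 links per response
        return links[:30]

    def parse_items(self, response):
        if b"word" in response.body:
            # the keyword was found on this page, so record it and stop here
            yield {"url": response.url, "found_at": response.body.find(b"word")}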
I want to access multiple URLs in a single Scenario.
When the url is defined in the Background and another url is used in the Scenario, the url changes.
If I then use path, the behavior is not what I expect.
Can the url be fixed in the Background?
Feature: examples

Background:
  * url 'https://jsonplaceholder.typicode.com'

Scenario: get all users and then get the first user by id
  Given path 'users'
  When method get
  Then status 200

  Given url 'https://api.github.com/search/repositories'
  And param q = 'intuit/karate'
  When method get
  Then status 200

  # The expected behavior is that 'https://jsonplaceholder.typicode.com/users' is accessed.
  # But the actual behavior is that 'https://api.github.com/search/repositories/users' is accessed.
  Given path 'users'
  When method get
  Then status 200
No, but if you move Given url 'https://api.github.com/search/repositories' to a second Scenario: it will work fine.
This is a deliberate design. Look at the hello world example. It makes 2 calls, but the url is mentioned only once, because the second call is just a path addition. This is the typical REST pattern.
So if you really need to make a different API call, you have to use the full url:
Background:
  * def baseUrl = 'https://jsonplaceholder.typicode.com'

Scenario: get all users and then get the first user by id
  Given url baseUrl
  And path 'users'
  When method get
  Then status 200

  Given url 'https://api.github.com/search/repositories'
  And param q = 'intuit/karate'
  When method get
  Then status 200

  Given url baseUrl
  And path 'users'
  When method get
  Then status 200
I'm trying to write a small web crawler with Scrapy.
I wrote a crawler that grabs the URLs of certain links on a certain page and writes them to a CSV file. I then wrote another crawler that loops over those links and downloads some information from the pages they point to.
The loop on the links:
cr = csv.reader(open("linksToCrawl.csv", "rb"))
start_urls = []
for row in cr:
    start_urls.append("http://www.zap.co.il/rate" + ''.join(row[0])[1:])
If, for example, the URL of the page I'm retrieving information from is:
http://www.zap.co.il/ratemodel.aspx?modelid=835959
then more information can (sometimes) be retrieved from following pages, like:
http://www.zap.co.il/ratemodel.aspx?modelid=835959&pageinfo=2
("&pageinfo=2" was added).
Therefore, my rules are:
rules = (
    Rule(SgmlLinkExtractor(allow=("&pageinfo=\d",),
                           restrict_xpaths=('//a[@class="NumBtn"]',)),
         callback="parse_items", follow=True),
)
It seemed to be working fine. However, it seems that the crawler is only retrieving information from the pages with the extended URLs (with the "&pageinfo=\d"), and not from the ones without them. How can I fix that?
Thank you!
You can override the parse_start_url() method in CrawlSpider:
class MySpider(CrawlSpider):

    def parse_items(self, response):
        # put your code here
        ...

    parse_start_url = parse_items
Your rule only allows URLs containing "&pageinfo=\d". In effect, only pages whose URL matches will be processed. You need to change the allow parameter so that URLs without pageinfo are processed as well.
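For example, a rough sketch that combines both suggestions, using the modern LinkExtractor instead of the deprecated SgmlLinkExtractor; the allow pattern and the XPath are only illustrative, based on the question:

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class ZapSpider(CrawlSpider):
    name = "zap"
    allowed_domains = ["zap.co.il"]
    start_urls = ["http://www.zap.co.il/ratemodel.aspx?modelid=835959"]

    rules = (
        # allow both the plain model pages and the ones with &pageinfo=N
        Rule(LinkExtractor(allow=(r"ratemodel\.aspx\?modelid=\d+(&pageinfo=\d+)?",),
                           restrict_xpaths=('//a[@class="NumBtn"]',)),
             callback="parse_items", follow=True),
    )

    def parse_items(self, response):
        # put your scraping code here
        ...

    # make sure the start URLs themselves (without &pageinfo) go through parse_items too
    parse_start_url = parse_items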
Currently my URLs appear as www.website.com/entries/1, and I'd like to make them appear as www.website.com/title-of-entry. I've been messing around with routes and have been able to get the entry title to display in the URL, but Rails is unable to find the entry without supplying an ID. If I send the ID along with the parameters, the URL appears as www.website.com/title-of-entry?=1. Is there any way I can pass the ID without having it appear in the URL as a parameter? Thanks!
Like most things, there's a gem for this.
FriendlyID.
Installation is easy and you'll be up and running in minutes. Give it a whirl.
Usually you'll want to save this title-of-entry part in the database (call the field slug or something like that). Your model could look something like this:
class Entry < ActiveRecord::Base
  before_validation :set_slug

  def set_slug
    self.slug = self.title.parameterize
  end

  def to_param
    self.slug
  end
end
Now your generated routes look like this: /entries/title-of-entry
To find the corresponding entries you'll have to change your controller:
# instead of this
@entry = Entry.find(params[:id])
# use this
@entry = Entry.find_by_slug(params[:id])
Update
A few things to bear in mind:
You'll have to make sure that slug is unique, otherwise Entry.find_by_slug(params[:id]) will always return the first entry it encounters with that slug.
Entry.find_by_slug(params[:id]) will not raise an ActiveRecord::RecordNotFound exception, but instead just return nil. Consider using Entry.find_by_slug!(params[:id]).
If you really want your routes to look like /title-of-entry, you'll probably run into problems later on. The router might give you unexpected results if an entry slug looks the same as another controller's name.