I have about 7 query-string parameters in my URL:
http://www.examplesitname.com/EN/en/tshirt-jeans.aspx?productid=324175730&documentid=295110&producttitle=Pyjama+Tshirt&categoryid=55479572&source=TreeStructureNavigation&numberpage=1&pos=TG_n_n
Broken down, the query-string parameters are:
productid
documentid
producttitle
categoryid
source
numberpage
pos
Of these, I need to expose only productid and documentid to search engines. What is the best approach to achieve this?
I could add one more query-string parameter named "extendedattributes" containing a comma-separated list of the remaining parameters, which I would split apart again on the server to build the response. Is that a good way to achieve this, or is there a better one?
Thanks
Google Webmaster Tools will let you designate URL-string parameters to ignore or not ignore when they index your site. (Look under "Site Configuration" and then "Settings.") Doesn't help you with other crawlers, of course, so this is only a partial solution.
The first thing that comes to mind: put the rest of the parameters after a #, as follows, and then use JavaScript/Ajax to read them and load the content accordingly. However, this method may require design changes, because anything after the # is not sent to the web server.
http://www.examplesitname.com/EN/en/tshirt-jeans.aspx?productid=324175730&documentid=295110#producttitle=Pyjama+Tshirt&categoryid=55479572&source=TreeStructureNavigation&numberpage=1&pos=TG_n_n
Use robots.txt or other techniques to exclude all the alternative URLs, and list only the URLs you need in a sitemap. Search engines should then index only the ones you want.
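To generate the sitemap entries, you could reduce each full URL to the two parameters you want indexed. Here is a minimal Python sketch, assuming the parameter names from the question; INDEXED_PARAMS and canonical_url are illustrative names, not part of any framework:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters that should stay visible to search engines (taken from the question).
INDEXED_PARAMS = ("productid", "documentid")

def canonical_url(url: str) -> str:
    """Return the URL with only the indexed query parameters and no fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() in INDEXED_PARAMS
    ]
    # Rebuild the URL; the empty string drops any #fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

The result is what would go into the sitemap, while the site itself keeps accepting the longer URLs.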
I'm searching against a table of news articles. The 2 relevant columns are ArticleTitle and ArticleText. When I wanted to search an article for a particular term, I started out with
column LIKE '%term%'.
However, that returned a lot of articles with the term inside anchor links, for example <a href="example.com/*term*">, which could make an irrelevant article match.
So then I switched to
column LIKE '% term %'.
The problem with this query is that it didn't find articles whose title or text began or ended with the term. It also didn't match things like term- or term's, which I do want.
It seems like the query I want should be able to do something like this:
'%[^a-z]term[^a-z]%'
This should exclude terms within anchor links but match everything else. I think this query still excludes strings that begin or end with the term. Is there a better solution? Does SQL Server's full-text indexing solve this problem?
Additionally, would it be a good idea to store ArticleTitle and ArticleText as HTML-free columns? Then I could use '%term%' without matching anchor links. These would be 2 extra columns, though, because eventually I will need the original HTML for formatting purposes.
Thanks.
SQL Server's LIKE supports limited regex-like patterns such as the one you described.
A better option is to use fulltext search:
WHERE CONTAINS(ArticleTitle, 'term')
uses the full-text index (a LIKE '%term%' query is slow) and brings other benefits from the search algorithm.
Additionally, you might benefit from storing a plaintext version of the article alongside the HTML version, and run your search queries on it.
SQL is not designed to interpret HTML strings. As such, you'd only be able to postpone the problem till a more difficult issue arrives (for example, a comment node that contains your search terms as part of a plain sentence).
You can still utilize FULL TEXT as a prefilter and then run an HTML analysis on the application layer to further filter your result set.
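The last two suggestions (a plain-text copy, or a second filtering pass in the application layer) could look something like this in Python. This is only a rough sketch: html_to_text and mentions_term are made-up names, it relies on the standard-library HTMLParser, and the "not next to an ASCII letter" rule simply mirrors the [^a-z] idea from the question.

```python
import re
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects text nodes only; tags, attribute values and comments are dropped."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def html_to_text(html: str) -> str:
    """Return the visible text of an HTML fragment (candidate for a plain-text column)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with a space keeps words from adjacent elements apart.
    return " ".join(parser.chunks)

def mentions_term(html: str, term: str) -> bool:
    """True if the term occurs in the text and is not glued to other ASCII letters."""
    pattern = r"(?<![A-Za-z])" + re.escape(term) + r"(?![A-Za-z])"
    return re.search(pattern, html_to_text(html), re.IGNORECASE) is not None
```

Because the lookarounds also succeed at the very start and end of the text, titles that begin or end with the term match, as do term- and term's, while a term that only appears inside an href does not.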
First of all, I have read RESTful URL design for search and How to design RESTful search/filtering? questions. I am trying to design more advanced options for searching in a simple and RESTful way.
The answers to those questions have given me some insight and clues on how to design my previous application url pattern for search/filter functionality.
First, I came up with a quite nice and simple solution for basic filtering options, using the pattern:
Equality search: key = val
IN search: key = val1 & key = val2
But as the application grew, so did the search requirements. I ended up with a rather unpleasant and complex URL pattern for advanced search options, which included:
Negation search: key-N = val
Like search: key-L = val
OR search: key1-O = val1 & key2 = val2
Range search: key1-RS = val1 & key1-RE = val2
What's more, besides filters, the query has to carry pagination and order-by information, so filter parameters have an F- suffix, order-by fields have an O- suffix, and pagination has a P- suffix.
I hope that at this point I do not have to add that parsing such a request is a rather nasty task, with the possibility of ambiguity if a key contains '-'. I have created some regexps to parse it, and it works quite well for now, but...
Now I am starting to write a new web app and I have the chance to redesign this piece from scratch.
I am thinking about creating an object in the browser that contains all the information in a structured, self-explanatory way and sending it to the server as a JSON string, like:
filter = [{"type":"like","field":key,"value":val1,"operator":"and","negation":false}, ...]
But I have a strange feeling that this is not a good idea, and I really don't know why.
So, this would be the definition of my context. Now the question:
I am looking for a simpler and safer pattern for implementing advanced search, including the options I mentioned above, as RESTful GET parameters. Can you share some ideas?
Or maybe some insights on not doing this in a RESTful way?
Also, if you see any pitfalls in the JSON approach, please share them.
EDIT:
I know what makes sending JSON as a GET parameter not such a good idea: encoding it makes it ugly and hard to read.
The links sent by Thierry Templier gave me something to think about, and I managed to design more consistent and safer filter handling in GET parameters. Below is the definition of the syntax.
For filters - multiple F parameters (one for each search criterion):
F = OPERATOR:NEGATION:TYPE:FIELD:VAL[:VAL1,:VAL2...]
allowed values:
[AND|OR]:[T|F]:[EQ|LI|IN|RA]:FIELD_NAME:VALUE1:VALUE2...
For order by - multiple O parameters (one for each ordered field):
O = ORDINAL_NO:DIRECTION:FIELD
allowed values:
[0-9]+:[ASC|DESC]:FIELD_NAME
Pagination - single P parameter:
P = ITEMS_PER_PAGE:FROM_PAGE:TO_PAGE
I think this will be a good solution: it meets all my requirements, it is easy to parse and write, it is readable, and I do not see how the syntax could become ambiguous.
I would appreciate any thoughts on this idea. Do you see any pitfalls?
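To make the proposed syntax concrete, here is a minimal Python sketch of a server-side parser for the F, O and P parameters. The names (Filter, parse_filter, parse_order, parse_page, parse_search) are illustrative, and requiring exactly two values for an RA filter is my own assumption rather than part of the definition above.

```python
import re
from dataclasses import dataclass
from urllib.parse import parse_qs

@dataclass
class Filter:
    operator: str   # "AND" or "OR"
    negated: bool   # True when the NEGATION part is "T"
    kind: str       # "EQ", "LI", "IN" or "RA"
    field: str
    values: list    # one or more raw string values

def parse_filter(raw):
    """Parse one F value: OPERATOR:NEGATION:TYPE:FIELD:VAL[:VAL...]."""
    parts = raw.split(":")
    if len(parts) < 5:
        raise ValueError("filter needs at least 5 parts: %r" % raw)
    operator, negation, kind, field, *values = parts
    if operator not in ("AND", "OR") or negation not in ("T", "F"):
        raise ValueError("bad operator or negation flag: %r" % raw)
    if kind not in ("EQ", "LI", "IN", "RA"):
        raise ValueError("unknown filter type: %r" % raw)
    if kind == "RA" and len(values) != 2:
        # Assumption: a range always carries exactly a start and an end value.
        raise ValueError("range filter needs two values: %r" % raw)
    return Filter(operator, negation == "T", kind, field, values)

def parse_order(raw):
    """Parse one O value: ORDINAL_NO:DIRECTION:FIELD."""
    parts = raw.split(":")
    if (len(parts) != 3 or not re.fullmatch(r"[0-9]+", parts[0])
            or parts[1] not in ("ASC", "DESC")):
        raise ValueError("bad order-by: %r" % raw)
    return int(parts[0]), parts[1], parts[2]

def parse_page(raw):
    """Parse the P value: ITEMS_PER_PAGE:FROM_PAGE:TO_PAGE."""
    parts = raw.split(":")
    if len(parts) != 3 or not all(re.fullmatch(r"[0-9]+", p) for p in parts):
        raise ValueError("bad pagination: %r" % raw)
    return tuple(int(p) for p in parts)

def parse_search(query_string):
    """Return (filters, order-by entries sorted by ordinal, page tuple or None)."""
    params = parse_qs(query_string)
    filters = [parse_filter(f) for f in params.get("F", [])]
    orders = sorted(parse_order(o) for o in params.get("O", []))
    page = parse_page(params["P"][0]) if "P" in params else None
    return filters, orders, page
```

For example, parse_search("F=AND:F:EQ:color:red&O=0:ASC:name&P=25:1:3") yields one equality filter, one order-by entry and the page tuple (25, 1, 3). One thing the sketch makes visible: a value that itself contains ':' is split into several values, so the syntax would need some escaping rule for that case.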
There are several options here. It's clear that if your queries tend to be complex, with operators and so on, you can't use a plain set of query parameters. I see two approaches:
Provide the query as JSON content in the body of a POST request
Provide the query in a single query parameter with a specific format / grammar in a GET request
I think you could have a look at what ElasticSearch does for its queries. They are able to describe very complex queries with JSON content (using several levels). Here is a link to their query DSL: http://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html.
You can also have a look at what OData does for queries. They chose another approach, with a single query parameter, $filter. Here are some links with examples: https://msdn.microsoft.com/en-us/library/hh169248(v=nav.70) and http://www.odata.org/documentation/odata-version-3-0/url-conventions/. This option requires a grammar on the server side to parse your query.
In general, this link could also give you some hints at this level in its section "Filtering data": https://templth.wordpress.com/2014/12/15/designing-a-web-api/.
Hope it gives you some helpful hints to design your queries within your RESTful services ;-)
Thierry
I'm currently using the query
SELECT Url FROM Link WHERE CONTAINS(Url, 'href=blah')
It is including results with href=/blah. Any way I can tell the query to act more like WHERE Url LIKE '%href=blah%' and still use the full-text catalog?
Your problem is that = and / are both word breakers. In other words, SQL full-text search is actually searching for href and blah.
There are a couple of options you could try. First you could filter down the search domain using the fulltext engine, then search the subset of data using LIKE. You'll need to experiment to see how to squeeze out the best performance.
The other option is, if href=blah is a consistent term you could add that to a custom dictionary. A good article on this is here.
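The first option (narrow the rows with full-text search, then check the exact substring with LIKE) can be sketched from the application side like this. It is only an illustration: build_query and like_escape are hypothetical helpers, the ? placeholders assume a driver that uses qmark-style parameters, and the optimizer still decides in which order the two predicates actually run, so measure before relying on it.

```python
def like_escape(text, escape="\\"):
    """Escape the LIKE wildcards (%, _, [) so the text is matched literally."""
    # The escape character itself goes first so it is not escaped twice.
    for ch in (escape, "%", "_", "["):
        text = text.replace(ch, escape + ch)
    return text

def build_query(exact, words):
    """Return (sql, params) combining a full-text prefilter with an exact LIKE check."""
    for word in words:
        if '"' in word:
            raise ValueError("words are assumed to be plain tokens without quotes")
    # Every word must be present according to the full-text index.
    contains = " AND ".join('"%s"' % word for word in words)
    sql = (
        "SELECT Url FROM Link "
        "WHERE CONTAINS(Url, ?) "       # candidate rows from the full-text catalog
        "AND Url LIKE ? ESCAPE '\\'"    # exact substring check on those rows
    )
    return sql, (contains, "%" + like_escape(exact) + "%")
```

Calling build_query("href=blah", ["href", "blah"]) gives the parameters '"href" AND "blah"' and '%href=blah%', which would drop the href=/blah rows.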
If I have a website like:
google.com/index.html?c=123123&p=shoes
Will it be better for SEO to have it as:
google.com/index.html?code=123123&footwear=shoes
I mean, does giving useful names to query string parameters help SEO?
Yes, query-string names could help Google understand the meaning of the page.
What is important with query strings is that you display unique content when the value of a parameter changes.
Example:
google.com/index.html?code=123123&footwear=shoes
google.com/index.html?code=123123&footwear=shoesB
If you display the same content in both cases, you can run into duplicate-content issues.
(You can also use a canonical URL.)
The best option would be to rewrite the query string as a friendly URL like
google.com/footwear/shoes/name-product-ID
A unique URL for each product.
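As a rough illustration of building such a path, here is a small Python sketch. slugify and product_path are made-up names, the category/type/name layout just follows the example above, and the slug rule only keeps ASCII letters and digits.

```python
import re

def slugify(text: str) -> str:
    """Lowercase the text and collapse every run of other characters into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def product_path(category: str, kind: str, name: str, product_id: int) -> str:
    """Build a path of the form /category/kind/name-ID, one per product."""
    return "/{}/{}/{}-{}".format(
        slugify(category), slugify(kind), slugify(name), product_id
    )
```

Keeping the ID at the end means the server can still look the product up by number even if the name part changes.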
Here is a useful resource on duplicate-content issues:
http://www.seomoz.org/learn-seo/duplicate-content
Hope this helps.
What drawbacks can you think of if I design my REST API with query strings without parameter values? Like so:
http://host/path/to/page?edit
http://host/path/to/page?delete
http://host/path/to/page/+commentId?reply
Instead of e.g.:
http://host/api/edit?page=path/to/page
http://host/api/delete?page=path/to/page
http://host/api/reply?page=path/to/page&comment=commentId
( Edit: Any page-X?edit and page-X?delete links would trigger GET requests but wouldn't actually edit or delete the page. Instead, they show a page with a <form>, in which page-X can be edited, or a <form> with a Really delete page-X? confirmation dialog. The actual edit/delete requests would be POST or DELETE requests. In the same manner as host/api/edit?page=path/to/page shows a page with an edit <form>. /Edit. )
Please note that ?action is not how query strings are usually formatted. Instead, they are usually formatted like so: ?key=value;key2=v2;key3=v3
Moreover, sometimes I'd use URLs like this one:
http://host/path/to/page?delete;user=spammer
That is, I'd include a query string parameter with no value (delete) and one parameter with a value (user=spammer) (in order to delete all comments posted by the spammer)
My Web framework copes fine with query strings like ?reply. So I suppose what I'm mostly wondering is: can you think of any client-side issues? Or any problems, should I decide to use another Web framework? (Do you know if the frameworks you use provide information on query strings without parameter values?)
(My understanding from reading http://labs.apache.org/webarch/uri/rfc/rfc3986.html is that the query string format I use is just fine, but what does that matter to all clients and server frameworks everywhere.)
(I currently use the Lift-Web framework. I've tested Play Framework too, and it was possible to get hold of the value-less query-string parameters, so both Play and Lift-Web seem okay from my point of view.)
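For what it's worth, where a framework does not expose value-less parameters, the raw query string is easy to split by hand. Here is a minimal Python sketch, assuming ';' or '&' as separators as in the examples above; parse_query is an illustrative name, not a framework function:

```python
import re
from urllib.parse import unquote_plus

def parse_query(query: str) -> dict:
    """Map each key to its value, or to None when the key has no '=' at all."""
    result = {}
    for part in re.split(r"[;&]", query):
        if not part:
            continue  # skip empty pieces such as a trailing separator
        key, sep, value = part.partition("=")
        # sep is empty when there was no '=', which marks a value-less parameter.
        result[unquote_plus(key)] = unquote_plus(value) if sep else None
    return result
```

This keeps ?delete (no value) distinguishable from ?delete= (empty value). Repeated keys simply overwrite each other in this sketch.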
Here is a related question about query strings with no values. However, it deals with ASP.NET functions returning null in some cases: Access Query string parameters with no values in ASP.NET
Kind regards, Kaj-Magnus
Query parameters without value are no problem, but putting actions into the URI, in particular destructive ones, is.
Are you seriously thinking about "restful" design, and having a GET be a destructive action?