I am currently working on a search infrastructure that uses Elasticsearch as the indexing engine. The requirement is to crawl and index 5 subdomains:
subdomain a is related to products
subdomain b is related to FAQs/Questions
subdomain c is related to internet plans
Now, when someone searches for anything related to products, it is required to prioritize searching in subdomain a -- that is, the top results must belong to subdomain a. If one searches for questions, then the top results must primarily come from subdomain b, and so on.
My idea is to index each subdomain separately based on its URL, then give each index some sort of priority using index.priority in Elasticsearch. However, that proved unstable and still does not produce the desired effect.
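For reference, here is a minimal sketch of what we tried, using the Python Elasticsearch client (the index names, field name, and priority values here are hypothetical):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# One index per subdomain, each created with a different index.priority
# in the hope of influencing which index dominates the results.
for name, priority in [("products", 10), ("faqs", 5), ("plans", 1)]:
    es.indices.create(
        index=name,
        body={"settings": {"index": {"priority": priority}}},
        ignore=400,  # ignore "index already exists" errors
    )

# A search is then run across all of the subdomain indices at once.
results = es.search(
    index="products,faqs,plans",
    body={"query": {"match": {"content": "some product name"}}},
)
```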
Any other possible approaches you can suggest? Thanks in advance!
We've been working successfully with the AdWords API (Version: 201708 - Google Ads Python Client Library) for a good while building internal reports for our application. Until, that is, we hit placements…
I define placements as anywhere an AdWords ad is shown. The placement might be a domain, a page, an ad unit, an app, you name it! Placement is a very broad concept.
For our app to work for placements we need to join API spend data with activity on our website.
To do this we are running AdWords API reports and then collecting session data using AdWords ValueTrack parameters.
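For context, the report side is pulled roughly like this with the googleads library (the report type and field list shown are just an illustrative assumption; we run several of the reports linked below):

```python
from googleads import adwords

# Credentials are loaded from the standard googleads.yaml file.
client = adwords.AdWordsClient.LoadFromStorage()
report_downloader = client.GetReportDownloader(version='v201708')

report = {
    'reportName': 'Placement spend, last 30 days',
    'dateRangeType': 'LAST_30_DAYS',
    'reportType': 'PLACEMENT_PERFORMANCE_REPORT',
    'downloadFormat': 'CSV',
    'selector': {
        'fields': ['Criteria', 'Impressions', 'Clicks', 'Cost'],
    },
}

with open('placement_performance.csv', 'wb') as output:
    report_downloader.DownloadReport(
        report,
        output,
        skip_report_header=True,
        skip_column_header=False,
        skip_report_summary=True,
    )
```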
The ValueTrack parameters are easy enough, as there seems to be only 1 option: {placement}.
However, it's on the API side where things get interesting: the API has numerous options for getting placement data. For example:
https://developers.google.com/adwords/api/docs/reference/v201708/AdGroupCriterionService.MobileApplication
https://developers.google.com/adwords/api/docs/appendix/reports/url-performance-report
https://developers.google.com/adwords/api/docs/appendix/reports/placement-performance-report#criteria
https://developers.google.com/adwords/api/docs/appendix/reports/automatic-placements-performance-report#domain
https://developers.google.com/adwords/api/docs/reference/v201708/AdGroupCriterionService
After spending some time going back and forth on the various options, and burning lots of dev time, we've come to the conclusion that there must be some best practice advice out there for joining placement data from the API and ValueTrack. One that works for all types of placements, including:
Websites
Apps
AdSense
Blogspot
AMP
An example of where we are running into a matching problem is "10060.android.com.nytimes.android.adsenseformobileapps.com"... this is a placement we see coming in from ValueTrack, but it has no match in any of our spend reports. (In fact, there are many, many adsenseformobileapps.com traffic sources for which there are no spend items.)
We're also seeing strings like "mobileapp::2-com.mobilesrepublic.appy". These show up on our spend side but only appear in our ValueTrack data around 10% of the time. Some match; the vast majority don't.
A definitive workflow on this would be SO useful for ourselves and no doubt other users…
Thanks!
According to https://developers.google.com/adwords/api/docs/guides/valuetrack-mapping
the incoming ValueTrack placement should map to the following report fields:
PlacementPerformanceReport.Criteria
CriteriaPerformanceReport.Criteria
AutomaticPlacementsPerformanceReport.DisplayName
In addition to this I have also found this report useful:
UrlPlacementPerformanceReport.Domain and .Url
But I have found it is not so clear in practice. For one thing, each of these reports returns a slightly different subset of results, and none of these subsets exactly matches the ValueTrack data set.
Here are the exceptions I have found:
subdomains
ValueTrack placements have URLs with www on them... some of the time. None of the other reports do, so you will either have to strip www from ValueTrack or add www to your report data in order to match them. But be careful: other subdomains are preserved (like edition.cnn.com), and not all URLs have a subdomain, so you can't just strip all the subdomains from ValueTrack, and you can't just add www to all the URLs in the reports. What I have found actually matches best is the Url field from the UrlPlacementPerformanceReport, but for this field you need to strip everything after the / to get a best-case matching subset. To use the other reports you would need to strip all the subdomain information from ValueTrack and sum the totals from those records. This means you would lose potentially useful data, such as the differences between espn.com, scores.espn.com, insider.espn.com and games.espn.com. Using UrlPlacementPerformanceReport.Url is the only way to preserve that info.
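A minimal sketch of the normalization that has matched best for me (the variable and column names are hypothetical; the www stripping and the path stripping are the point):

```python
def normalize_valuetrack_placement(placement):
    """Lower-case and strip a leading www., leaving other subdomains intact."""
    placement = placement.strip().lower()
    return placement[4:] if placement.startswith("www.") else placement

def normalize_report_url(url):
    """Keep only the host part of UrlPlacementPerformanceReport.Url."""
    return url.strip().lower().split("/", 1)[0]

# Build a spend lookup keyed on the normalized report URL...
spend_by_placement = {}
for row in url_placement_report_rows:  # hypothetical list of report rows
    key = normalize_report_url(row["Url"])
    spend_by_placement[key] = spend_by_placement.get(key, 0) + row["Cost"]

# ...and join the ValueTrack sessions against it.
for session in valuetrack_sessions:  # hypothetical session records
    key = normalize_valuetrack_placement(session["placement"])
    session["cost"] = spend_by_placement.get(key)  # None means no match
```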
mobileapp::
ValueTrack reports on mobileapp:: placements. Many of the reports return these values too, but I have found that each report just gives a subset of the whole. In particular, the CriteriaPerformanceReport.Criteria field gives you many mobileapp:: values that none of the other reports do, but the other reports give you at least a few values that the CriteriaPerformanceReport doesn't. To be complete, you would have to take a union of the mobileapp:: values returned by the CriteriaPerformanceReport and another report such as the UrlPlacementPerformanceReport.Url.
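A sketch of that union, assuming the two reports are already loaded into lists of rows (the variable names are hypothetical):

```python
# mobileapp:: placements seen in each report; each report on its own is incomplete.
criteria_apps = {
    row["Criteria"]
    for row in criteria_performance_rows
    if row["Criteria"].startswith("mobileapp::")
}
url_report_apps = {
    row["Url"]
    for row in url_placement_report_rows
    if row["Url"].startswith("mobileapp::")
}

# The most complete set of app placements is the union of the two.
all_mobileapp_placements = criteria_apps | url_report_apps
```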
anonymous.google
ValueTrack provides subdomains of anonymous.google that look like a8122ac7e5da8e49.anonymous.google. If you want to match this information to your spend, the only report that has this detail is UrlPlacementPerformanceReport.Url.
adsenseformobileapps.com
ValueTrack provides detailed domains such as 1.iphone.com.localtvllc.fox2.adsenseformobileapps.com. None of the AdWords reports can match this. The best you can get is a single summation record for the entire adsenseformobileapps.com group.
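So the only join I have managed there is to collapse the ValueTrack side down to the bare domain before matching against that single summation record (a sketch):

```python
def collapse_adsense_for_mobile_apps(placement):
    """Collapse e.g. 1.iphone.com.localtvllc.fox2.adsenseformobileapps.com
    down to adsenseformobileapps.com, since the reports only carry one
    summation record for that whole group."""
    if placement.endswith("adsenseformobileapps.com"):
        return "adsenseformobileapps.com"
    return placement
```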
A friend of mine has opened a local business and, not knowing what to do, chose a URL with the name of her company (which is a playword). The chosen name is quite bad for Google ranking because it is not meaningful: it indicates neither the nature of the business nor the location (city).
I would like her to buy two new domains:
businessname-business-type-city.com
businessnamebusinesstypecity.com
Is that still OK with Google? I was doing that some years ago and ranked first in search.
You can get as many domain names as you want and set up the DNS to point to the IP of playword. But I don't think it would be worth it, and there's no guarantee it would generate more hits. Google search takes location into consideration, so you're probably best off branding playword and generating buzz in other ways: social media, flyers, Google ads, and sponsored posts on Facebook.
And be sure to have good semantics on your page.
If the domain name is the name of the business, then it is meaningful. The value of keywords in domain names has diminished in recent times. Market the business, not the domain name. (And Google local business results will help.)
I have a client who has presented the following situation:
A parent company works with two distributors of their products. Both distributors want a new website developed. They both sell the same product, so they will want to share content and basic page layouts. For example, product listings will be the same across both sites, as will copy about the products they provide.
My concerns here are with SEO and duplicate content. Google defines duplicate content as:
Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar.
It seems that in this case, where two distributors are selling the same product, having each website carry duplicate content is legitimate. But I have a feeling either site could get penalised, so perhaps having two sites would be too damaging.
Any thoughts on this much appreciated.
Thanks
If it can't be helped, it's fine. Ideally you would want to make unique descriptions and content.
As recently as a couple of months back, I had a staging site that didn't have the noindex tag on it by mistake. The staging site and the actual site were both ranking well for keywords.
While you are probably fine, you should still look into allotting time for content development.
For example, we have 5 landing pages running under the same URL, served randomly based on their weightage.
What I want is to check which page is converting more and increase its weightage automatically so that it gets served more.
This is a simple explanation of my problem. Are there any standard algorithms and techniques available to achieve this? What I don't want is to reinvent the wheel.
Thanks.
I would do something very simple, such as keeping a running count of how many times each landing page converted. After that you have a variety of choices:
a) Sort them by hit count and serve the top one(s)
b) Serve the pages in a fashion weighted by the number of conversions. For example, you could serve the pages based on a probability distribution derived from the conversion count (e.g. if you have two landing pages and page A converts twice as often as page B, you serve page A twice as often as page B). Tweaking the weighting function lets you control the relative rates at which different pages are served.
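A sketch of option (b), assuming you track a conversion count per page (the page names are made up):

```python
import random

# Running conversion counts, updated whenever a landing page converts.
conversions = {"page_a": 40, "page_b": 20, "page_c": 5, "page_d": 5, "page_e": 1}

def pick_landing_page(counts):
    """Serve pages with probability proportional to their conversion count.
    The +1 keeps pages with zero conversions in the rotation."""
    pages = list(counts)
    weights = [counts[page] + 1 for page in pages]
    return random.choices(pages, weights=weights, k=1)[0]

page_to_serve = pick_landing_page(conversions)
```

Raising the counts to a power before using them as weights (squaring them, for example) is one simple way to tweak how aggressively the best page dominates.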
However, this raises a question: if a user returns twice to your site, will they get a different page? If so, wouldn't that be confusing?
I have a client who, over the years, has managed to get their product to the top of Google for many different search terms. They're adamant that the new site shouldn't have a detrimental effect on their Google ranking.
The new site will be replacing the site that is on their current domain, as well as going up on 5 further domains.
Will any of this lose the client their current ranking on Google?
Google regularly re-ranks the sites it has indexed. If the site changes, the ranking very well could change too: if more or fewer people link to it, or if the terms on the site (the content) are different.
The effect might be good or bad, but uploading different content isn't going to make their rank go away overnight or anything like that.
PageRank is mostly about incoming links, so if the incoming links aren't broken, PageRank will not be affected that much.
Though overall ranking is not just PageRank, so... further discussion is needed.
If they retain the current link structure, they should be fine.