Test Google News sitemap? - google-news

How can I test/validate my Google News sitemap?
In Search Console I have an option to add/test a sitemap. However, it says I have an invalid XML tag:
Parent tag: publication
Tag: keywords
But I can see this tag is valid, so I think the validator is testing it as a normal sitemap, not a Google News-specific one:
https://support.google.com/news/publisher/answer/74288?hl=en#submitsitemap
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
<url>
<loc>
http://www.website.com/page
</loc>
<news:news>
<news:publication>
<news:name>Sitename</news:name>
<news:language>en</news:language>
<news:keywords>Shopping</news:keywords>
</news:publication>
<news:title>Page title here</news:title>
<news:publication_date>2015-11-12T14:16:31+00:00</news:publication_date>
</news:news>
</url>
<url>
<loc>
http://www.website.com/other-page
</loc>
<news:news>
<news:publication>
<news:name>Sitename</news:name>
<news:language>en</news:language>
<news:keywords>Shopping</news:keywords>
</news:publication>
<news:title>
Page 2 title here
</news:title>
<news:publication_date>2015-11-12T12:52:03+00:00</news:publication_date>
</news:news>
</url>
...
</urlset>
If I go to the News tools homepage in Google, it tells me that the site is included in Google News. But how can I check that my sitemap is working correctly?

From Google itself: Validating a News Sitemap
The following XML schemas define the elements and attributes that can appear in a News Sitemap file. A News Sitemap can contain both News-specific elements and core Sitemap elements. You can download the schemas from the links below:
For News-specific elements: http://www.google.com/schemas/sitemap-news/0.9/sitemap-news.xsd.
For core Sitemap elements: http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd
There are a number of tools available to help you validate the structure of your Sitemap based on these schemas. You can find a list of XML-related tools at each of the following locations:
http://www.w3.org/XML/Schema#Tools
http://www.xml.com/pub/a/2000/12/13/schematools.html
In order to validate your News Sitemap file against a schema, the XML file will need additional headers as shown below:
<?xml version='1.0' encoding='UTF-8'?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd
http://www.google.com/schemas/sitemap-news/0.9
http://www.google.com/schemas/sitemap-news/0.9/sitemap-news.xsd">
<url>
...
</url>
</urlset>
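Python's standard library cannot validate against an XSD, but it can generate a News Sitemap that already carries the xsi:schemaLocation header above, so the file is ready to be fed to one of the listed schema tools. This is a minimal sketch, assuming each article is a dict with hypothetical keys loc, name, language, title and date; the element order inside news:news follows Google's published example and should be checked against the downloaded XSD.

```python
import xml.etree.ElementTree as ET

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Map namespace URIs to the prefixes used in the header above
ET.register_namespace("", SM_NS)
ET.register_namespace("news", NEWS_NS)
ET.register_namespace("xsi", XSI_NS)

# Pairs of (namespace, schema URL), as in the xsi:schemaLocation header
SCHEMA_LOCATION = " ".join([
    SM_NS, "http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd",
    NEWS_NS, "http://www.google.com/schemas/sitemap-news/0.9/sitemap-news.xsd",
])


def build_news_sitemap(articles):
    """Serialize articles (dicts with hypothetical keys) as a News Sitemap."""
    urlset = ET.Element(f"{{{SM_NS}}}urlset",
                        {f"{{{XSI_NS}}}schemaLocation": SCHEMA_LOCATION})
    for a in articles:
        url = ET.SubElement(urlset, f"{{{SM_NS}}}url")
        ET.SubElement(url, f"{{{SM_NS}}}loc").text = a["loc"]
        news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
        pub = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
        ET.SubElement(pub, f"{{{NEWS_NS}}}name").text = a["name"]
        ET.SubElement(pub, f"{{{NEWS_NS}}}language").text = a["language"]
        # Order mirrors Google's published example; verify against the XSD
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = a["date"]
        ET.SubElement(news, f"{{{NEWS_NS}}}title").text = a["title"]
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_news_sitemap([{
        "loc": "http://www.website.com/page", "name": "Sitename",
        "language": "en", "title": "Page title here",
        "date": "2015-11-12T14:16:31+00:00",
    }]))
```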

The tag is valid but was in the wrong place in the XML structure: <news:keywords> is a child of <news:news>, not of <news:publication>. The corrected entry:
<url>
<loc>
http://www.website.com/page
</loc>
<news:news>
<news:publication>
<news:name>Sitename</news:name>
<news:language>en</news:language>
</news:publication>
<news:title>Page title here</news:title>
<news:publication_date>2015-11-12T14:16:31+00:00</news:publication_date>
<news:keywords>Shopping</news:keywords>
</news:news>
</url>
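Before reaching for a full schema validator, a quick local check can catch both kinds of mistake seen in this thread: markup that is not well-formed (for example a closing tag that lost its "<"), and an element placed inside news:publication that does not belong there. This is a minimal sketch using only Python's standard library; it is not XSD validation, and the set of children allowed under news:publication (name and language) is an assumption taken from the corrected entry above.

```python
import xml.etree.ElementTree as ET

NEWS = "{http://www.google.com/schemas/sitemap-news/0.9}"
# Children the corrected entry keeps inside news:publication (assumption)
PUBLICATION_CHILDREN = {NEWS + "name", NEWS + "language"}


def check_news_sitemap(xml_text):
    """Return a list of problems; an empty list means none were detected."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Not well-formed, e.g. a closing tag missing its "<"
        return [f"not well-formed: {exc}"]
    problems = []
    for pub in root.iter(NEWS + "publication"):
        for child in pub:
            if child.tag not in PUBLICATION_CHILDREN:
                problems.append(f"unexpected {child.tag} inside news:publication")
    return problems
```

For a real schema check, the downloaded XSDs and one of the XML tools listed earlier remain the better option.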

Related

Problem with formatting a date inside a Velocity template

I'm trying to create a sitemap via a Velocity template in dotCMS.
This is my code:
#set($articles = $json.fetch('https://xxxxx.xxxx.xxx/api/content/render/false/type/json/query/+contentType:YmArticlePage%20+(conhost:b5d82078-a844-4104-9664-4348c2420bba%20conhost:SYSTEM_HOST)%20+(wfscheme:d61a59e1-a49c-46f2-a929-db2b4bfa88b2*)%20+(wfstep:dc3c9cd0-8467-404b-bf95-cb7df3fbc293*)%20/limit/1/orderby/score,modDate%20desc'))
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
#foreach($article in $articles.contentlets)
<url>
<loc>$!{host}$!{article.path}</loc>
<changefreq>always</changefreq>
<priority>0.6</priority>
<lastmod>$date.format('medium', $article.modDate) $article.modDate</lastmod>
</url>
#end
</urlset>
I am able to render $article.modDate, but if I try to format it using the date viewtool, it doesn't seem to format it and just displays the string "$date.format('medium', $article.modDate)" when viewed in the browser.
The dotCMS content type API returns the dates as strings, so I have to convert them into dates first, and then I can do the formatting:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
#foreach($article in $articles.contentlets)
#set($md = $date.toDate("yyyy-MM-dd HH:mm:ss",$article.modDate))
<url>
<loc>$!{host}$!{article.path}</loc>
<changefreq>always</changefreq>
<priority>0.6</priority>
## <lastmod> expects W3C Datetime; "yyyy-MM-dd" is its simplest valid form
<lastmod>$date.format("yyyy-MM-dd", $md)</lastmod>
</url>
#end
</urlset>
This works for me.
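The same convert-then-format idea, sketched in Python for readers who generate the sitemap outside dotCMS. It assumes, like the toDate pattern in the answer, that modDate arrives as a string such as "2016-10-11 14:16:31". The output uses the date-only form of W3C Datetime, which the sitemaps protocol accepts for <lastmod>; a full timestamp would also need a timezone offset, which the input string does not carry.

```python
from datetime import datetime


def to_lastmod(mod_date):
    """Convert a 'yyyy-MM-dd HH:mm:ss' string into a W3C Datetime date."""
    # Step 1: parse the string into a real datetime (like $date.toDate)
    parsed = datetime.strptime(mod_date, "%Y-%m-%d %H:%M:%S")
    # Step 2: format it (like $date.format), here as YYYY-MM-DD
    return parsed.strftime("%Y-%m-%d")


print(to_lastmod("2016-10-11 14:16:31"))  # 2016-10-11
```

A string in any other shape raises ValueError, which is usually preferable to silently emitting an invalid <lastmod>.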

One image sitemap per language, possible?

I generate an image sitemap file that looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url>
<loc>https://mywebsite.com/en/VISION-ART/image/boy-with-pipe-pablo-picasso</loc>
<image:image>
<image:loc>https://mywebsite.com/include/php/render/framed/VR/1/image/VndpwKicqcA%3D/ver//frame/gold-bronze-b88-1/borderSize/0.90/mat/fa3caad76c/matSize/6/matSizePercent/12/maxSize/500/minSize/400/le-garcon-a-la-pipe-pablo-picasso.jpg</image:loc>
<image:title>Boy with pipe - PABLO PICASSO</image:title>
<image:caption>Boy with pipe - PABLO PICASSO, art photography of VISION D&apos;ART. , smokehouse, pipe, boy, worker, blue work, painting, smoke, PABLO PICASSO</image:caption>
</image:image>
</url>
<url>
<loc>https://mywebsite.com/en/VISION-ART/image/the-dream-pablo-picasso</loc>
<image:image>
<image:loc>https://mywebsite.com/include/php/render/framed/VR/1/image/VndowKicqcA%3D/ver//frame/gold-bronze-b88-1/borderSize/0.90/mat/26ecae929f/matSize/6/matSizePercent/12/maxSize/500/minSize/400/le-reve-pablo-picasso.jpg</image:loc>
<image:title>The dream - PABLO PICASSO</image:title>
<image:caption>The dream - PABLO PICASSO, art photography of VISION D&apos;ART. , sleep, dream, PABLO PICASSO, oil painting, painting, drawing, abstract, woman</image:caption>
</image:image>
</url>
...
</urlset>
Can I generate other image sitemaps for other languages?
For the same image page, <loc></loc> will be different; for example, for the French version:
https://mywebsite.com/en/VISION-ART/image/the-dream-pablo-picasso
>>>
https://mywebsite.com/fr/VISION-ART/image/le-reve-pablo-picasso
And of course, title and caption will be different as well.
But <image:loc></image:loc> will remain the same.
So, can I generate one image sitemap per language? And how do I tell Google which language each image sitemap is for?

Multiple sitemaps within single sitemap file

I have a site with a sitemap of around 150 entries for URLs that are static on the site. The lastmod element is set to 2012. This sitemap was updated approximately a year ago.
The last couple of lines of this sitemap file are:
<url><loc>http://example.com/sitemap2.xml</loc><changefreq>daily</changefreq></url>
<url><loc>http://example.com/siteMap3.xml</loc><changefreq>daily</changefreq></url>
</urlset>
Sitemap 2 contains the same logic but with links targeting specific products, and sitemap 3 does the same but aimed at categories. These two are generated daily.
The main sitemap.xml is registered. An external SEO advisor ran a test and advised the sitemap is not updated and does not list the links for products and categories.
How could I check if what he has said is correct? If he is correct what could I have done wrong here?
Having multiple sitemaps is fine, but you should link them from a sitemap index file.
For your case, you could use something like this:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://example.com/sitemap1.xml</loc>
<lastmod>2012</lastmod>
</sitemap>
<sitemap>
<loc>http://example.com/sitemap2.xml</loc>
<lastmod>2016-10-11</lastmod>
</sitemap>
<sitemap>
<loc>http://example.com/sitemap3.xml</loc>
<lastmod>2016-10-11</lastmod>
</sitemap>
</sitemapindex>
A sitemap index entry only takes <loc> and an optional <lastmod>, so instead of changefreq you have to use lastmod, which gives the date of the last modification of the listed sitemap file (not of the entries inside it).
This sitemap index file can then be linked in your robots.txt (and/or be submitted to search engines).
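If the daily sitemaps are produced by a script, the index can be regenerated in the same run so that each <lastmod> tracks the file it points to. A minimal sketch in Python's standard library, assuming a hypothetical list of (URL, last-modified date) pairs; the example.com URLs mirror the answer above.

```python
import xml.etree.ElementTree as ET
from datetime import date

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SM_NS)


def build_sitemap_index(children):
    """children: (loc, lastmod) pairs; pass lastmod=None to omit <lastmod>."""
    index = ET.Element(f"{{{SM_NS}}}sitemapindex")
    for loc, lastmod in children:
        entry = ET.SubElement(index, f"{{{SM_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SM_NS}}}loc").text = loc
        if lastmod is not None:
            # W3C Datetime, e.g. "2012" or "2016-10-11"
            ET.SubElement(entry, f"{{{SM_NS}}}lastmod").text = lastmod
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    today = date.today().isoformat()  # the daily files change every day
    print(build_sitemap_index([
        ("http://example.com/sitemap1.xml", "2012"),
        ("http://example.com/sitemap2.xml", today),
        ("http://example.com/sitemap3.xml", today),
    ]))
```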

Multi-tiered sitemap?

From: http://www.sitemaps.org/protocol.html :
If you want to list more than 50,000 URLs, you must create multiple
Sitemap files <...> If you do provide multiple Sitemaps, you should
then list each Sitemap file in a Sitemap index file. Sitemap index
files may not list more than 50,000 Sitemaps and must be no larger
than 10MB (10,485,760 bytes) and can be compressed. You can have more
than one Sitemap index file.
Is it possible then to create a 3- or more tiered chain? For example:
//mysite/sitemap.xml is:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://mysite/sitemaps/index.xml</loc>
<lastmod>2004-10-01T18:23:17+00:00</lastmod>
</sitemap>
</sitemapindex>
//mysite/sitemaps/index.xml is:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://mysite/sitemaps/sitemap-lm.xml.gz</loc>
</sitemap>
<sitemap>
<loc>http://mysite/sitemaps/sitemap-1.xml.gz</loc>
</sitemap>
....
</sitemapindex>
and //mysite/sitemaps/sitemap-lm.xml.gz is a normal gzipped XML-file, passing validation and so on.
That is:
/robots.txt -> /sitemap.xml -> /sitemaps/index.xml -> /sitemaps/sitemap-1.xml.gz
The specification doesn't give a clear answer.
Google and personal input have both yielded inconclusive and contradictory answers, ranging from "sure, why not" to "no, because nobody does it that way".
Any ideas would be welcome!
No, I don't think you can chain sitemap indexes that way, but nothing prevents you from declaring several <sitemapindex> files in your robots.txt, with one Sitemap: <URL-of-a-sitemap-index> line for each.
Your robots.txt would be the first level, the listed <sitemapindex> files the second level, and the actual sitemaps the third level.
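A sketch of that layout using Python's standard library: urllib.robotparser reads the Sitemap: lines from a robots.txt (its site_maps() method needs Python 3.8 or later), and a small helper reports whether a fetched file is an index or a plain urlset, which is handy for spotting an index that points at another index. The robots.txt content and the mysite.example URLs are placeholders.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt: each Sitemap line points at a separate sitemap index
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: http://mysite.example/sitemaps/index-products.xml
Sitemap: http://mysite.example/sitemaps/index-categories.xml
"""


def sitemap_urls(robots_txt):
    """Return the URLs listed on Sitemap: lines (Python 3.8+)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when there are no Sitemap lines
    return parser.site_maps() or []


def root_kind(xml_text):
    """Return 'sitemapindex' or 'urlset', i.e. the root element's local name."""
    return ET.fromstring(xml_text).tag.rsplit("}", 1)[-1]


if __name__ == "__main__":
    for url in sitemap_urls(ROBOTS_TXT):
        print(url)
```

Fetching each listed URL and checking that root_kind() of every file it references is "urlset" would confirm the tree stops at three levels.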

How to keep cts:highlight from matching inside XML tags?

I am trying to search some content and highlight the search strings present in the XML content (like Google) in MarkLogic using the REST API. The problem is that when I include "ME" in the search string, it highlights the 'i' tags (HTML italic tags) along with the "me" in the content. I have created a document with some elements and am running a word-query on the document.
For example XML content:
<resources>
<title> some data from me</title>
<desc> more data <i> from </i> somewhere by me </desc>
</resources>
I have created a document with root node 'resources' and child elements 'title' and 'desc' and searching the search strings within the document using word-query.
Now when I search for "some me", it retrieves content like:
<resources>
<title> <<span class="highlight">some</span> data from <<span class="highlight">me</span>
</title>
<desc> more data <<span class="highlight">i</span>> from <<span class="highlight">i</span>> somewhere by <span class="highlight">me</span> </desc>
</resources>
Url:
localhost:9000/v1/search?q=some me&collection=Data&start=0&pageLength=10&options=Transformation&format=json
I am using cts:highlight for highlighting, something like:
cts:highlight($final-result, $query, fn:concat('<span class="highlight">', $cts:text, '</span>'), $custom-config)
Any ideas on why the HTML elements are highlighted here?
You probably inserted your document in text format, not XML format. I can reproduce your issue by inserting it in text format:
xdmp:document-insert("test.xml", text {"<resources>
<title> some data from me</title>
<desc> more data <i> from </i> somewhere by me </desc>
</resources>"})
then running a cts:highlight on that document:
cts:highlight(doc("test.xml"), cts:parse("some me"), concat('<span class="highlight">', $cts:text, '</span>'))
But if I re-insert the document as XML:
xdmp:document-insert("test.xml", <resources>
<title> some data from me</title>
<desc> more data <i> from </i> somewhere by me </desc>
</resources>)
then the same cts:highlight works better:
<?xml version="1.0" encoding="UTF-8"?>
<resources>
<title> <span class="highlight">some</span> data from <span class="highlight">me</span></title>
<desc> more data <i> from </i> somewhere by <span class="highlight">me</span> </desc>
</resources>
If I add the suggestion from @ehennum and @mholstege and instead run this cts:highlight:
cts:highlight(doc("test.xml"), cts:parse("some me"), <span class="highlight">{$cts:text}</span>)
then I get what I would guess you're looking for:
<?xml version="1.0" encoding="UTF-8"?>
<resources>
<title> <span class="highlight">some</span> data from <span class="highlight">me</span></title>
<desc> more data <i> from </i> somewhere by <span class="highlight">me</span> </desc>
</resources>
What version of MarkLogic is this?
Can you include a more complete example? What is $custom-config, for example? And how are the REST call results linked to cts:highlight? For markup to be highlighted in this way, those results would have to be text rather than XML.
By the way, the third argument to cts:highlight is an expression -- if you want to create markup, just use constructors there, not string concatenation:
cts:highlight($final-result, $query, <span class="highlight">{$cts:text}</span>, $custom-config)
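The difference between passing a string and passing a constructed node can be reproduced outside MarkLogic. This Python sketch is only an analogy built on the standard ElementTree API, not MarkLogic code: markup assembled as a string is stored as a text value and escaped on output, while an element built as a node serializes as real markup. MarkLogic's serializer may differ in detail, but the principle is the same.

```python
import xml.etree.ElementTree as ET


def highlight_as_string(word):
    """Analogue of fn:concat(...): the markup ends up as a text value."""
    title = ET.Element("title")
    title.text = '<span class="highlight">' + word + "</span>"
    return ET.tostring(title, encoding="unicode")


def highlight_as_node(word):
    """Analogue of <span class="highlight">{$cts:text}</span>: a real element."""
    title = ET.Element("title")
    span = ET.SubElement(title, "span", {"class": "highlight"})
    span.text = word
    return ET.tostring(title, encoding="unicode")


print(highlight_as_string("some"))
# <title>&lt;span class="highlight"&gt;some&lt;/span&gt;</title>
print(highlight_as_node("some"))
# <title><span class="highlight">some</span></title>
```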
Try supplying the tags in the cts:highlight() expression as nodes instead of a string.
That is, instead of
fn:concat('<span class="highlight">',$cts:text,'</span>')
try
<span class="highlight">{$cts:text}</span>
For more information, see the first example in:
http://docs.marklogic.com/cts:highlight?q=cts:highlight
Hoping that helps,