Is it possible/wise to NOT link any pages from the index? (SEO, Search Engines)

I have a humble question :)
I plan to set up a rather unusual web project with about a thousand pages, where there won't be a classical navigation (only for an about page and a contact page) and the pages won't link to one another.
It's: index > opens random page > opens random page > opens random page... all via a small PHP action.
I know from basic SEO understanding that you should then generate a static directory like a sitemap that links to all the pages, so that Google finds every page from the index downwards.
BUT I don't want users to see it. It kills the fun of using the site when you can see all the content pages at a glance; this project is all about exploring random things.
Is this somehow possible? To have a dead-end index page and a thousand dead-end HTML pages that are only connected via a PHP script?
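For concreteness, the PHP action I have in mind would just be a small sketch along these lines (hypothetically assuming the content pages sit in a flat pages/ directory):

<?php
// random.php - hypothetical sketch: send the visitor to a random content page.
$pages = glob(__DIR__ . '/pages/*.html');      // list all content pages
if ($pages) {
    $target = $pages[array_rand($pages)];      // pick one at random
    header('Location: /pages/' . basename($target), true, 302);
}
exit;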
Thanks in advance!

From a technical standpoint, there are no issues with what you are planning. From an SEO indexing and Google standpoint, make sure none of the pages you want discovered and indexed by Google are orphans, i.e. pages with no links pointing to them.
These "hidden" pages need not be linked from the home page or a sitemap (one-to-many), instead you can try the breadcrumb method where a page leads only to the next page, which leads to the next page (one-to-one) and so on.
e.g. -
Parent 1 > child 1 > child 1a > child 1b .......
Parent 2 > child 2 > child 2a > child 2b .......
Parent 3 > child 3 > child 3a > child 3b .......
Here, the home page and your sitemap will have links ONLY to Parent 1, Parent 2 & Parent 3.
UPDATE:
Also, not having an HTML sitemap for your users will not affect Google indexing, as long as your XML sitemap is in place for Google to access.
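For example, a minimal XML sitemap in the standard sitemaps.org format that lists only the parent pages (the example.com URLs are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/parent-1</loc></url>
  <url><loc>https://example.com/parent-2</loc></url>
  <url><loc>https://example.com/parent-3</loc></url>
</urlset>

Google can then follow the chain of links from each parent down through its children.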
Hope this helps.

There's no technical problem in having one page generate links to other pages on the fly, but I feel there are some issues with the general idea here.
Firstly, why do you want your "sub pages" to be indexed by Google at all? By definition, this defeats the "random page" idea. A Google search over your site (for instance using Google's "site:" operator) will list all your pages once they're indexed, which makes it easy to navigate between the "secret" pages (even if only via cached versions of them).
Secondly, unless you prevent Google from crawling your pages (via a robots.txt file, for instance), the Googlebot will discover at least a subset of your pages by visiting the index page and following the generated link to a sub page.
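For instance, assuming (hypothetically) that the random pages live under a /pages/ directory, a robots.txt at the site root like this would keep compliant crawlers away from them:

User-agent: *
Disallow: /pages/

Note that this only blocks crawling; it does not by itself remove URLs that are already indexed.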
To conclude, you can create an index which sends users to random pages, but it probably makes little sense to have the website indexed by a search engine if you'd like the sub pages to stay "secret".

Indexed_search only the news detail

I've set up a TYPO3 v9.5 website with the indexed_search extension.
When I search for a word using the search box on the frontend, it shows results from everywhere: the home page, category pages, and news pages.
Is there a way to index/search only the news item detail pages?
There are multiple ways to achieve this.
In my opinion the simplest (without setting up crawler configurations) would be to limit indexing to only this page.
See https://docs.typo3.org/c/typo3/cms-indexed-search/master/en-us/Configuration/General/Index.html
On your root page you would set page.config.index_enable = 0 in the TypoScript setup, and on your news detail page page.config.index_enable = 1. Then clear and rebuild the index.
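In TypoScript that could look like the following (the uid 123 of the detail page is a placeholder):

# Root page TypoScript setup: disable indexing site-wide
page.config.index_enable = 0

# Re-enable indexing for the news detail page only (TYPO3 v9 condition syntax)
[page["uid"] == 123]
page.config.index_enable = 1
[global]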
Another possibility for smaller sites is to filter the shown results in your Fluid template. I wouldn't really suggest that, but it works too.

How do I export all pages below a page in Confluence using a script?

I'm capturing meeting notes in a page structure in Confluence. I want to export all the meeting notes to a shared drive for others to read. I've found notes on how to export a page or a space, but not the pages below a page, e.g. I want everything below "Parent Page" but not anything else.
e.g.
Space
  Unrelated Pages
  Unrelated Pages
  Parent of Parent
    Parent Page
      Child Page 1
      Child Page 2
      Child Page 3
I want to drag the child pages to a shared drive. I'm looking to use one of the following: curl, .bat files, Python, R, etc.
This is on the cloud version of Confluence.
Go to the Space directory, find the relevant space and click the (i) icon there to get to the space details page. From there, click either Space Operations/PDF Export, or Content Tools/Export, then choose Custom Export. You’ll be able to select the list of pages to be exported. (All pages under a given page can be selected by clicking Deselect All and then clicking the checkbox for the parent page. All child pages will be selected automatically.)
My first instinct was to give the "other" folks who want to read the meeting notes (read) access to Confluence itself - that's what Confluence was meant for.
But if you're dead set on living in the 90s and downloading stuff to another drive, you can try the Page Tree Word Exporter plugin (though this is a manual process).
Script-wise, you can do the following:
Get all the child pages with the REST API: Make a GET call to
https://confluence-domain.com/rest/api/content/search?cql=parent={Parent Page-id}
This will return a "results" array with info about the child pages. Parse out the "id" fields. (Hint: if you are using bash, the beautiful jq tool is great for this: https://stedolan.github.io/jq/.)
Once you have the child page IDs, you can export each individually to PDF using:
wget https://confluence-domain.com/spaces/flyingpdf/pdfpageexport.action?pageId=xxxx -O mypage.pdf
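Put together, a rough shell sketch (the domain, credentials, and parent page id are placeholders; Confluence Cloud expects an API token for basic auth):

PARENT_ID=123456   # hypothetical id of "Parent Page"
AUTH="user@example.com:API_TOKEN"

curl -s -u "$AUTH" \
  "https://confluence-domain.com/rest/api/content/search?cql=parent=${PARENT_ID}" \
  | jq -r '.results[].id' \
  | while read -r id; do
      curl -s -u "$AUTH" \
        "https://confluence-domain.com/spaces/flyingpdf/pdfpageexport.action?pageId=${id}" \
        -o "page-${id}.pdf"
    done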
This blog might help you with your coding: http://javamemento.blogspot.no/2016/05/jira-confluence-3.html
To export all pages below a page, check the parent page; its child pages will be selected automatically.
If the export option is absent, an admin needs to grant you the permission for it.

How to search category and fill site parameters automatically

I have seen some sites where you search (in Google) for a particular item category, and when you click the link in the Google results it takes you to the site with the search criteria already filled in, displaying the categorised products.
Hypothetical Example
Go into Google, type in Sony TV, and click search.
Results are displayed.
Clicking one of the links takes me to a website which shows all the Sony TV models beginning with AA.
Looking at the search options on the page, some fields have been automatically filled in (in other words, if you did this search manually the site would prompt you to enter some search criteria). Not sure if this is relevant, but I thought I'd mention it.
How is this done? Do I need to set up something in our Google account to get the same results?
It's fairly simple. You pass parameters in your URLs that identify the product, and then read those URL parameters when pre-populating the search form on the page. When building your site, sitemap, and internal & external links, you use those page URLs, and Google will naturally pick them up.
In your example, you search for Sony TV. One of the results may be
example.com/index.php?product=sony-tv
The website takes the sony-tv value from the URL and pre-populates the search form with it.
The important part to note is that the site will typically have built its URL structure this way, and the page you're presented with just happens to look like the site dynamically searched based on your query (it hasn't).
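A minimal sketch of the receiving side, reusing the product parameter from the example above (the form markup is hypothetical):

<?php
// index.php - pre-populate the search form from the URL parameter.
$product = $_GET['product'] ?? '';            // e.g. "sony-tv"
$query   = str_replace('-', ' ', $product);   // e.g. "sony tv"
?>
<form action="/index.php" method="get">
  <input type="text" name="product" value="<?= htmlspecialchars($query) ?>">
  <button type="submit">Search</button>
</form>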

Canonical tag for content split across multiple pages

We have pages which have been split into multiple pages because they are too in-depth. The current structure...
Page (www.domain.com/page)
We have split this up like so...
Page + Subtitle (www.new-domain.com/page-subtitle-1)
Page + Subtitle (www.new-domain.com/page-subtitle-2)
Page + Subtitle (www.new-domain.com/page-subtitle-3)
I need to know the correct way of adding multiple canonical tags to the original page. Is it search-engine friendly to add, say, 3-4 canonical tags linking to 3-4 separate pages?
Well, this is what you should do:
Keep the complete page even if you are dividing it into component pages.
Use rel="next" and rel="prev" links to indicate the relationship between the component URLs. This markup gives Google a strong hint that you would like it to treat these pages as a logical sequence, consolidating their linking properties and usually sending searchers to the first page.
In each of the component pages, add a rel="canonical" link pointing to the original (all-content) page. This tells Google about the all-content page.
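Concretely, the head of the middle component page might contain something like this (URLs taken from the question):

<!-- on www.new-domain.com/page-subtitle-2 -->
<link rel="canonical" href="http://www.domain.com/page">
<link rel="prev" href="http://www.new-domain.com/page-subtitle-1">
<link rel="next" href="http://www.new-domain.com/page-subtitle-3">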
This strategy is recommended by Google - read here.
Canonical tags are basically there to consolidate link signals for duplicate or similar content. That said, you are not supposed to have multiple canonical tags on a page. You have two options.
If your old page is going to go away, pick one primary page (among the split pages) and do a 301 redirect to it, so the SEO value is carried over to that new primary URL (see the redirect sketch below).
If it's going to stay, you can create internal links to the new pages. But make sure the content is different, so that the pages don't count as duplicates.
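For the first option, a redirect sketch assuming Apache and the URLs from the question (the choice of page-subtitle-1 as the primary page is arbitrary):

# .htaccess on the old domain: send the old page to the chosen primary page
RedirectMatch 301 ^/page$ http://www.new-domain.com/page-subtitle-1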
Hope this helps.

How do I set up a robots.txt which allows all pages EXCEPT the main page?

If I have a site called http://example.com, and under it I have articles, such as:
http://example.com/articles/norwegian-statoil-ceo-resigns
Basically, I don't want the text from the front page to show up in Google results, so that when you search for "statoil ceo" you ONLY get the article itself, and not the front page, which contains this text but is not the article itself.
If you did that, then Google could still display your home page with a note under the link saying they couldn't crawl the page. This is because robots.txt doesn't stop a page from being indexed. You could noindex the home page, though personally I wouldn't recommend it.
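For reference, Google's robots.txt parser does support $ as an end-of-URL anchor, so this (non-standard, Google-specific) pattern would block crawling of just the front page:

User-agent: *
Allow: /
Disallow: /$

As noted above, though, blocking crawling does not deindex the page. The noindex route would be a meta tag in the head of the front page only:

<meta name="robots" content="noindex">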