google-bigquery

I am using BigQuery for SEO purposes. I am a search TC and I am a little confused why you are not using the Google Forum, as I thought that was standard. What I want to use BigQuery for is to detect when my competitors change data on their websites and which pages were changed. So I need the URL that was changed and the date it was changed, so I can also pull the page title and description to see what they are doing differently than I am.
Does anyone know how to use BigQuery to pull:
Date the page was changed
URL
Title
Description

We've switched to using Stack Overflow for support for many of our developer products, such as BigQuery. There's a great community here on Stack Overflow, and the interface for formatting technical questions and interacting with the community is fantastic.
BigQuery does not collect the data for you; it's a cloud service for performing ad hoc queries on massive datasets. Before you can run queries, you need to upload the data to the service (for example, in CSV format).
So, if you have a job which collects this data (URL, title, description, date, and perhaps a hash of the webpage), you could ingest a CSV file of that data into BigQuery and use it to understand when webpages have changed.
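As a rough sketch of that approach, using the google-cloud-bigquery Python client: the project, dataset, table, and column names below are hypothetical placeholders, not anything BigQuery provides for you.

    from google.cloud import bigquery

    client = bigquery.Client()

    # Hypothetical destination table -- replace with your own project/dataset/table.
    table_id = "my-project.seo_tracking.page_changes"

    # Schema matching the columns described above: date, URL, title, description, hash.
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        schema=[
            bigquery.SchemaField("change_date", "DATE"),
            bigquery.SchemaField("url", "STRING"),
            bigquery.SchemaField("title", "STRING"),
            bigquery.SchemaField("description", "STRING"),
            bigquery.SchemaField("page_hash", "STRING"),
        ],
    )

    # Load the locally collected CSV file into the table.
    with open("page_changes.csv", "rb") as f:
        load_job = client.load_table_from_file(f, table_id, job_config=job_config)
    load_job.result()  # wait for the load job to finish

    # Once loaded, a simple query shows which URLs changed on a given day.
    query = """
        SELECT change_date, url, title, description
        FROM `my-project.seo_tracking.page_changes`
        WHERE change_date = '2024-01-15'
        ORDER BY url
    """
    for row in client.query(query).result():
        print(row.change_date, row.url, row.title)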
Of course, there are also 3rd-party services (such as Changedetection.com) which may be easier to use for your purposes.

Related

Is it possible to use BigQuery to find when people request a website?

I'm trying to figure out at what time people request content from a website (such as 'www.netflix.com'). Do any of the available reports contain this data? If so, how would I access it?
I've had a look around and can't see a table that has this data. Would anywhere else store it?
If you meant reports that are available in BigQuery, I would suggest exploring the BigQuery Public Datasets. At the moment I do not see any website-access datasets, but they might still be useful for your reference.
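If it helps, here is a minimal sketch (using the google-cloud-bigquery Python client) of how you could browse what is actually published in the shared public datasets project:

    from google.cloud import bigquery

    client = bigquery.Client()

    # List the datasets published in the shared bigquery-public-data project.
    for dataset in client.list_datasets("bigquery-public-data"):
        print(dataset.dataset_id)

    # List the tables inside one of those datasets to see what it contains.
    for table in client.list_tables("bigquery-public-data.samples"):
        print(table.table_id)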

Firebase Big Query: How can I see realtime data in BigQuery?

I have a Firebase application which is uploading events with parameters. I need to be able to view those events in order to debug some issues we're having in production. I can only see the tables which are generated nightly in BigQuery. I can find references online saying that BigQuery allows viewing real time data. What I can't find is any straightforward instructions on how to create those views.
Is it possible? If so, can someone give me instructions that even a complete newb could follow?
We have decided to use the BigQuery APIs for information we want to see immediately in the database.
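For what it's worth, a minimal sketch of that: if streaming export is enabled for the Firebase project, BigQuery also receives an intraday table alongside the nightly ones, and you can query it through the client API. The project and dataset names below are placeholders for your own export dataset.

    from google.cloud import bigquery

    client = bigquery.Client()

    # events_intraday_* holds the current day's streamed events (when streaming
    # export is enabled); the nightly events_YYYYMMDD tables are the ones you
    # already see.
    query = """
        SELECT event_timestamp, event_name
        FROM `my-firebase-project.analytics_123456789.events_intraday_*`
        ORDER BY event_timestamp DESC
        LIMIT 50
    """
    for row in client.query(query).result():
        print(row.event_timestamp, row.event_name)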

How can I customize a data set for Google BigQuery? Can I export a file? How do I test it to see if it meets my needs?

I would like to improve the quality of existing data by using the Google BigQuery API to help validate the accuracy of existing data.
I don't see information on the types of data elements contained in BigQuery and don't understand how to use an API if I just want to see what types of data are in there.
I tried looking for instructions and data elements in the Google Healthcare API and Google BigQuery documentation and only saw how to set up a payment option.
I am a newbie at programming and wanted to do some preliminary research on these data sets prior to bringing them to our technical team.
I expect to see a list of relevant results based on a custom query.
You can see the data types supported by Google BigQuery here, and the conversion between different types here.
You can also try out the BigQuery APIs in the OAuth Playground.
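To actually see which data elements (columns and their types) a given BigQuery table contains, a small script with the Python client is often enough; the public natality sample table below is just an example, not a recommendation of a specific dataset.

    from google.cloud import bigquery

    client = bigquery.Client()

    # Fetch table metadata only -- no query costs, just the schema.
    table = client.get_table("bigquery-public-data.samples.natality")

    # Print each column name, its BigQuery data type, and its mode
    # (NULLABLE / REQUIRED / REPEATED).
    for field in table.schema:
        print(field.name, field.field_type, field.mode)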

Splunk Database

I understand that Splunk does not need much of the functionality that a MySQL database would provide, and that a relational database might not be a good option for indexing and searching Big Data.
Does Splunk use Lucene as a search engine, or have they built their own on-disk data format?
I am sorry if there are any problems in the way I am asking the question. This is my first question on Stack Overflow.
Splunk uses its own search engine; it is not based on any third-party engine.
Its search engine works on files only, with no database behind it.
It does not store fields, only the raw data. Fields are extracted at search time, which makes them very dynamic.
It is also very fast at finding keywords in the data (needle in a haystack).
To be more detailed, Splunk stores data in the following way:
Breaking the data into time-based events, attaching a time to each raw event
Marking every word found in the events and their location across the index
Storing the events in compressed format (tar.gz)
This approach enables:
Very fast searches for keywords inside the events
Looking at the original raw data
Creating new fields on the raw data and using them with statistics commands
Source:
http://www.splunk.com/web_assets/pdfs/secure/Splunk_for_BigData.pdf
http://docs.splunk.com/Documentation/Splunk/6.5.1/Indexer/Howindexingworks
3+ years of experience as a Splunk architect.
Googling would have helped: http://answers.splunk.com/answers/43533/search-capabilities-of-splunk-how-powerful-is-it-really --> No Lucene
Splunk has a proprietary data format for its indexes. Lucene is not used, and Splunk has its own search language called SPL.
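To make the "fields extracted at search time" point concrete, here is a rough sketch that runs an SPL search over Splunk's REST API (default management port 8089). The host, credentials, index, and sourcetype are placeholders, not anything from your environment.

    import requests

    SPLUNK = "https://localhost:8089"       # management port, not the web UI port
    AUTH = ("admin", "changeme")            # placeholder credentials

    # SPL pulls fields like `status` and `clientip` out of the raw events at
    # search time -- no fixed schema had to define them up front.
    spl = "search index=web sourcetype=access_combined | stats count by status, clientip"

    resp = requests.post(
        f"{SPLUNK}/services/search/jobs/export",
        auth=AUTH,
        data={"search": spl, "output_mode": "json"},
        verify=False,   # Splunk ships with a self-signed certificate by default
        stream=True,
    )
    for line in resp.iter_lines():
        if line:
            print(line.decode())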

Would server traffic software (something like Piwik or Google) make a good case for using NoSQL?

We are trying to develop company-specific tracking software; we are not interested in Google or Piwik. Essentially we would also have a JavaScript tracking snippet. Would the data it captures be best suited to a traditional RDBMS, or could we use a NoSQL solution?
Any thoughts or ideas welcome.
Creating XML files could do the trick for a NoSQL solution, but web analytics can encompass a very large amount of data depending on your tracking software. You'll need some sort of relational data solution if you want to properly analyse the data and see trends, such as how many unique visitors are using a specific browser.
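As a toy illustration of that relational route (the table and column names are invented here), storing the raw pageview events in one table makes that kind of trend query a one-liner; SQLite stands in for whatever RDBMS you would actually use.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE pageviews (
            visitor_id TEXT,
            url        TEXT,
            browser    TEXT,
            viewed_at  TEXT
        )
    """)

    # A few fake events, as if captured by the JavaScript tracking snippet.
    conn.executemany(
        "INSERT INTO pageviews VALUES (?, ?, ?, ?)",
        [
            ("v1", "/home",    "Chrome",  "2024-01-01T10:00:00"),
            ("v1", "/pricing", "Chrome",  "2024-01-01T10:05:00"),
            ("v2", "/home",    "Firefox", "2024-01-01T11:00:00"),
        ],
    )

    # Unique visitors per browser -- the trend mentioned above.
    for browser, uniques in conn.execute(
        "SELECT browser, COUNT(DISTINCT visitor_id) FROM pageviews GROUP BY browser"
    ):
        print(browser, uniques)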