Zeppelin configuration properties file: Can't load BigQuery interpreter configuration - google-bigquery

I am attempting to set my zeppelin.bigquery.project_id (or any BigQuery configuration property) via my zeppelin-site.xml, but my changes are not loaded when I start Zeppelin. The project ID always defaults to ' '. I am able to change other configuration properties (e.g. zeppelin.notebook.storage). I am using Zeppelin 0.7.3 from https://hub.docker.com/r/apache/zeppelin/.
zeppelin-site.xml (created before starting Zeppelin, before an interpreter.json file exists):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>zeppelin.notebook.storage</name>
<value>org.apache.zeppelin.notebook.repo.S3NotebookRepo</value>
<description>notebook persistence layer implementation</description>
</property>
... etc ...
<property>
<name>zeppelin.bigquery.project_id</name>
<value>my-project-id</value>
<description>Google Project Id</description>
</property>
</configuration>
Am I configuring the interpreter incorrectly? Could this parameter be overridden elsewhere?

I am not very familiar with Apache Zeppelin, but I have found some documentation pages suggesting that the BigQuery configuration parameters belong in your interpreter configuration rather than in zeppelin-site.xml:
This entry in the GCP blog explains how to use the BigQuery Interpreter for Apache Zeppelin. It includes some examples of how to use it with Dataproc, Apache Spark and the Interpreter.
The BigQuery Interpreter documentation for Zeppelin 0.7.3 confirms that zeppelin.bigquery.project_id is the right parameter to configure, so the property name is not the issue here. There is also general information on how to configure Zeppelin interpreters here.
The GitHub page of the BigQuery Interpreter states that you have to configure the properties when the interpreter is created, and then enable it by using %bigquery.sql.
Finally, make sure that you are specifying the BigQuery interpreter in the appropriate field in zeppelin-site.xml (as done in the template), or instead enable it by clicking the "Gear" icon and selecting "bigquery".
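The check in the last point can be scripted. Below is a minimal Python sketch, assuming zeppelin-site.xml uses the Hadoop-style `<configuration>`/`<property>` layout shown in the question and that the interpreter list lives in a comma-separated property named `zeppelin.interpreters` containing the class `org.apache.zeppelin.bigquery.BigQueryInterpreter`. Both names are assumptions taken from the 0.7.x template, so verify them against your own file. Note that finding zeppelin.bigquery.project_id in this file only shows it is present, not that the interpreter reads it from there.

```python
import xml.etree.ElementTree as ET

# Assumed names (from the 0.7.x zeppelin-site.xml template; verify locally).
INTERPRETERS_KEY = "zeppelin.interpreters"
BIGQUERY_CLASS = "org.apache.zeppelin.bigquery.BigQueryInterpreter"


def read_site_properties(xml_text):
    """Return a {name: value} dict for every <property> in a site file."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name:
            props[name] = value
    return props


def bigquery_listed(xml_text):
    """True if the BigQuery interpreter class appears in the interpreter list."""
    props = read_site_properties(xml_text)
    classes = [c.strip() for c in props.get(INTERPRETERS_KEY, "").split(",")]
    return BIGQUERY_CLASS in classes
```

Pass it the contents of your zeppelin-site.xml; if `bigquery_listed` returns False, the interpreter class is not in the list Zeppelin loads from that file.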

Edit /usr/lib/zeppelin/conf/interpreter.json, set zeppelin.bigquery.project_id to your project ID, and run sudo systemctl restart zeppelin.service.
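The edit can also be done with a small script. This is a sketch under assumptions, not Zeppelin's own tooling: it assumes interpreter.json has a top-level `interpreterSettings` map whose entries carry a `group` field (`"bigquery"` for the BigQuery interpreter) and a `properties` map. Because the shape of `properties` has varied between Zeppelin versions, the sketch handles both a plain string value and an object with a `value` field. The helper names are hypothetical.

```python
import json

PROJECT_KEY = "zeppelin.bigquery.project_id"


def set_bigquery_project(config, project_id):
    """Set the project id in every bigquery interpreter setting.

    Assumes the "interpreterSettings" / "group" / "properties" layout
    described above. Returns the number of settings changed.
    """
    changed = 0
    for setting in config.get("interpreterSettings", {}).values():
        if setting.get("group") != "bigquery":
            continue  # leave spark, md, etc. untouched
        props = setting.setdefault("properties", {})
        current = props.get(PROJECT_KEY)
        if isinstance(current, dict):
            current["value"] = project_id  # {"name": ..., "value": ...} shape
        else:
            props[PROJECT_KEY] = project_id  # flat {"name": "value"} shape
        changed += 1
    return changed


def update_file(path, project_id):
    """Rewrite interpreter.json in place (needs write permission, e.g. sudo)."""
    with open(path) as f:
        config = json.load(f)
    changed = set_bigquery_project(config, project_id)
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return changed
```

After `update_file("/usr/lib/zeppelin/conf/interpreter.json", "my-project-id")`, restart the service as in the answer above so Zeppelin rereads the file.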

Related

PHP/Java Bridge Error

I am trying to connect PHP with Java using the PHP/Java Bridge library, but I am getting the error shown below.
Please see the screenshot below.
By default, the phpjavabridge servlet tries to connect to a php-cgi daemon. This can be easily fixed, either by starting one of the bundled php-cgi binaries (found in WEB_INF/cgi; follow the error message) or by modifying the configuration in web.xml (see an example here) to provide your system's php-cgi path. Then repackage and redeploy your war file.
That said, if you want to use Java from PHP (and not PHP from Java), you don't need a php-cgi daemon. My recommendation is to have a look at the following project: http://docs.soluble.io/soluble-japha/. This is a reworked version focusing on PHP -> Java integration.
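If you go the web.xml route, you first need the full path of your system php-cgi. A minimal Python sketch for finding it on the PATH follows; the function name and the list of candidate binary names are assumptions, and the exact web.xml parameter that takes the path is not shown here, so check the bridge's documentation or the linked example for that.

```python
import shutil


def find_php_cgi(candidates=("php-cgi", "php-cgi.exe")):
    """Return the full path of the first php-cgi binary found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None
```

A None result means php-cgi is not on the PATH, in which case the bundled binaries in WEB_INF/cgi are the other option mentioned above.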
Feel free to open issues on https://github.com/belgattitude/soluble-japha

Is using ENTITY allowed in SSIS configuration package?

I am trying to define one global configuration package for all my .dtsx files.
I have a login there:
<Configuration ConfiguredType="Property" Path="\Package.Connections[SourceConnectionOLEDB].Properties[UserName]" ValueType="String">
<ConfiguredValue> exampleLoginHere </ConfiguredValue>
</Configuration>
This login appears in many places.
So what I'm trying to do is put this login into a variable, so that I only have to change it in one place instead of in every occurrence.
I found this solution, but when I put
<!DOCTYPE DTSConfiguration [
<!ENTITY sourceLogin "exampleLoginHere">
]>
and then change
<ConfiguredValue> exampleLoginHere </ConfiguredValue>
to
<ConfiguredValue> &sourceLogin; </ConfiguredValue>
my .dtsx package returns this on startup:
Warning: Cannot load the XML configuration file. The XML configuration file may be malformed or not valid
Am I doing something wrong? Did I forget something?
Package configuration files are just regular XML files, so the usual XML rules should apply to them as well. That said, to share a value across multiple packages, you can set it up as an 'Indirect Configuration' and have the value come from a 'SQL Server' table. Here is a link that gives a more detailed breakdown of how it works:
http://bi-blogger.typepad.com/etlbi_blogger/2008/05/using-indirect-configuration-with-ssis.html
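To rule out a plain XML mistake, you can check the entity version of the file with a standards-conforming parser. The Python sketch below uses a simplified DTSConfiguration-like document (not a complete SSIS configuration file) and shows that Python's ElementTree accepts it and expands `&sourceLogin;`. It checks XML well-formedness only; a passing result does not tell you whether the SSIS configuration loader accepts a DOCTYPE at all. It also shows that a DOCTYPE placed before the XML declaration makes the file malformed.

```python
import xml.etree.ElementTree as ET

# Simplified DTSConfiguration-like file. The DOCTYPE must come after the
# XML declaration and before the root element.
CONFIG = r"""<?xml version="1.0"?>
<!DOCTYPE DTSConfiguration [
<!ENTITY sourceLogin "exampleLoginHere">
]>
<DTSConfiguration>
<Configuration ConfiguredType="Property" Path="\Package.Connections[SourceConnectionOLEDB].Properties[UserName]" ValueType="String">
<ConfiguredValue>&sourceLogin;</ConfiguredValue>
</Configuration>
</DTSConfiguration>"""


def is_well_formed(xml_text):
    """True if the text parses as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def configured_values(xml_text):
    """Text of every <ConfiguredValue>, with entities expanded by the parser.

    Spaces inside <ConfiguredValue> are kept as part of the text.
    """
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("ConfiguredValue")]
```

If your file passes this check but SSIS still reports it as malformed, the problem is more likely the loader than the XML, and the indirect configuration above avoids the question entirely.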

Getting exception while running Pig Script

I am getting the below error while running a Pig Script on approximately a 300GB dataset.
Error: Exceeded limits on number of counters - Counters=120 Limit=120
Does anybody have any ideas on how to resolve the issue without modifying counter config in the Pig properties file?
This doesn't quite qualify as a proper answer, since you need to modify configuration files. I don't think there is currently any way to do this without modifying some configuration file.
This is pure nitpicking, but you can actually do this without modifying the Pig properties: all you need to do is configure the counter limit in the Hadoop configuration file.
Add mapreduce.job.counters.max or mapreduce.job.counters.limit, depending on your Hadoop version, to your mapred-site.xml file. For example:
<property>
<name>mapreduce.job.counters.limit</name>
<value>256</value>
</property>
Remember to restart all node managers and also the history server.
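If you manage mapred-site.xml with scripts, the property can be added or updated programmatically. This is a minimal Python sketch with a hypothetical helper name; use whichever property name matches your Hadoop version, as above. ElementTree does not preserve comments or the XML declaration when it re-serializes, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET


def set_property(xml_text, name, value):
    """Return xml_text with <property> `name` set to `value` (added if missing)."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No existing entry: append a new <property> block.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

For example, `set_property(text, "mapreduce.job.counters.limit", "256")` produces the property shown above; the restart step is still needed afterwards.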

Why 'mapred-site.xml' is not included in the latest Hadoop 2.2.0?

The latest build of Hadoop provides mapred-site.xml.template.
Do we need to create a new mapred-site.xml file from it?
Any link on documentation or explanation related to Hadoop 2.2.0 will be much appreciated.
I believe it's still required. For the basic two-node Hadoop 2.2.0 cluster we have working, I followed these steps from the setup documentation:
"
From the base of the Hadoop installation, edit the etc/hadoop/mapred-site.xml file. A new
configuration option for Hadoop 2 is the capability to specify a framework name for
MapReduce, setting the mapreduce.framework.name property. In this install we will use the
value of "yarn" to tell MapReduce that it will run as a YARN application.
First, copy the template file to the mapred-site.xml.
cp mapred-site.xml.template mapred-site.xml
Next, copy the following into Hadoop etc/hadoop/mapred-site.xml file and remove the original empty tags.
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
"
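For scripted installs, the same end state can be reached from Python. This sketch (hypothetical helper name; `conf_dir` stands for your etc/hadoop directory) writes the minimal configuration quoted above instead of copying the template, so anything else in the template is not carried over, and it never overwrites an existing mapred-site.xml.

```python
import os
import shutil
import tempfile

# The minimal mapred-site.xml body from the setup notes quoted above.
YARN_MAPRED_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
"""


def ensure_mapred_site(conf_dir):
    """Create conf_dir/mapred-site.xml if missing; return True if created."""
    target = os.path.join(conf_dir, "mapred-site.xml")
    if os.path.exists(target):
        return False  # leave an existing, possibly customized, file alone
    with open(target, "w") as f:
        f.write(YARN_MAPRED_SITE)
    return True


if __name__ == "__main__":
    # Demo against a throwaway directory instead of a real etc/hadoop.
    demo_dir = tempfile.mkdtemp()
    print(ensure_mapred_site(demo_dir))  # first call creates the file
    print(ensure_mapred_site(demo_dir))  # second call leaves it alone
    shutil.rmtree(demo_dir)
```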
As for documentation, I found this the most useful. Also, the /etc/hosts configuration for the cluster setup and other cluster-related configuration were a bit hard to figure out.

How to fix java IOException: Can't find resource 'solrconfig.xml' in classpath?

I was trying to install Apache Solr 4.0.0 with Apache Tomcat, but it gives an error like this:
SolrCore Initialization Failures
collection1: java.io.IOException:java.io.IOException: Can't find resource 'solrconfig.xml' in classpath or 'solr\collection1\conf/', cwd=C:\apps\tomcat-solr\apache-tomcat-7.0.35\bin
There are no SolrCores running.
Using the Solr Admin UI currently requires at least one SolrCore.
After this I installed Apache Solr 3.6.2, and it works perfectly well. I still cannot understand why I am not able to use Solr 4.0.0 with the same server configuration.
I hope you can tell me what mistake I have made.
The directory structure changed in Solr 4.0. Have a look at the one in the example/solr directory: Solr 4 now has a collection1 directory inside it, and the conf directory is one level lower, inside collection1. That's basically what the error message said.
If you don't like that, I think you can change it by providing a solr.xml with a single core definition and the directory paths set up the way you like.
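To see which layout a Solr home actually has, you can look for solrconfig.xml in both places. This Python sketch (hypothetical helper name) checks the per-core path from the error message, `<solr_home>/collection1/conf/solrconfig.xml`, and the older path one level up, `<solr_home>/conf/solrconfig.xml`. It only inspects the file system; it does not read solr.xml, so a custom core definition there could point somewhere else.

```python
import os
import shutil
import tempfile


def detect_layout(solr_home, core="collection1"):
    """Report where solrconfig.xml lives under solr_home, or None if not found."""
    if os.path.isfile(os.path.join(solr_home, core, "conf", "solrconfig.xml")):
        return "per-core"  # Solr 4.0 example layout: <home>/collection1/conf/
    if os.path.isfile(os.path.join(solr_home, "conf", "solrconfig.xml")):
        return "single-core"  # older layout: conf/ directly under <home>
    return None


if __name__ == "__main__":
    # Build a throwaway 3.x-style home to show the mismatch.
    home = tempfile.mkdtemp()
    os.makedirs(os.path.join(home, "conf"))
    open(os.path.join(home, "conf", "solrconfig.xml"), "w").close()
    print(detect_layout(home))  # conf/ is not under collection1/ here
    shutil.rmtree(home)
```

A "single-core" result for the Solr home Tomcat points at matches the situation in the error message: the 4.0 example setup looks under collection1/conf instead.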