WSO2 Gadget Gen Tool - - hive

I have an external Hadoop cluster (CDH4) with Hive. I used the Gadget Gen tool (BAM 2.3.0) to create a simple table gadget, but no data is populated when I add the gadget to a dashboard using the URL supplied from the gadget gen tool.
Here's my data source settings from the Gadget Generator Wizard
jdbc:hive://x.x.x.x:10000/default
org.apache.hadoop.hive.jdbc.HiveDriver
I added the following jar files to make sure I had everything required for the JDBC connection and restarted wso2server:
hive-exec-0.10.0-cdh4.2.0.jar hive-jdbc-0.10.0-cdh4.2.0.jar
hive-metastore-0.10.0-cdh4.2.0.jar hive-service-0.10.0-cdh4.2.0.jar
libfb303-0.9.0.jar commons-logging-1.0.4.jar slf4j-api-1.6.4.jar
slf4j-log4j12-1.6.1.jar hadoop-core-2.0.0-mr1-cdh4.2.0.jar
I see map reduce jobs running on my cluster during step 2 and 3 of the wizard (and the wizard shows me previews of the actual data), but I don't see any jobs submitted after the gadget is generated.
Any help appreciated.

Gadgen gen tool is for RDBMS database such as MySQL,h2, etc. you can't provide hive URL from the gadget gen tool and run it.
Generally in WSO2 BAM, the hive is used to summarize the collected data which was stored in cassandra and write the summarized final result on RDBMS database. Then from Gadget-gen tool, the gdaget xmls are created by pointing to the final result stored RDBMS database.
You can find more information on WSO2 BAM 2.3.0 documentation. http://docs.wso2.org/wiki/display/BAM230/Gadget+Generation+Tool

Make sure the URL generated for the location of Gadget XML has the correct IP/Host Name. See whether the given gadget xml is located in the registry location of the generated url. You do not have to worry about Hive / Hadoop / Cassandra stuff as they are not relevant to the Gadget. Only the RDBMS (H2 by default) data matters. Hope your problem will be resolved when Gadget location is corrected.

Related

Query blob storage with Get-AzDataLakeGen2ChildItem?

Our powershell test harness used to use Get-AzDataLakeGen2ChildItem to list blobs found in non data lake storage accounts. Today I updated the powershell and Az module versions they were locked at, and now when issuing the command (specifying a Filesystem container, and context), the following error is returned:
Get-AzDataLakeGen2ChildItem: Input string was not in a correct format.
I'm assuming something has changed, and this function cannot process a result from non data lake storage compatibly anymore.
For one reason or another, a while back we changed from using Get-AzStorageBlob. So interested to know if there's any solution to be able to continue working with this call, rather than to deviate from Get-AzDataLakeGen2ChildItem where required.
One of the workaround to list the sub directories and files in a directory or Filesystem from an Azure storage account using the Get-AzDataLakeGen2ChildItem .
To do that we must have enabled Hierarchical Namespace .
Then you will get something like below example;
NOTE:- If you are using existing storage which has not enabled Hierarchical Namespace then you need to upgrade that storage account by doing the below steps:
For more information please refer the below links:-
MS DOC| Get-AzDataLakeGen2ChildItem , Get-AzStorageBlob .
SO THREAD FOR SIMILAR ISSUE.

ELK stack configuration

I dont have much experience with elk stack I basically only know the basics.
Something i.e. filebeat gets data and sends it to logstash
Logstash processes it and sends it Elastic search
Kibana uses elastic search to visualise data
(I hope that thats correct)
I need to create an elk system where data from three different projects is passed, stored and visualised.
Project no1. Uses MongoDB and I need to get all the information from 1 table into kibana
Project no2. Also uses MongoDB and I need to get all the information from 1 table into kibana
Project no3. Uses mysql and I need to get a few tables from that database into kibana
All three of these projects are on the same server
The thing is for Projects 1 and 2 I need the data flow to be constant (i.e. if a user registers I can see that in kabana)
But for Project no3. I only need the data when I need to generate a report (this project functions as a BI of sorts)
So my question is how does one go about creating an elk architecture that gets the inputs from these 3 sources and is able to combine into one elk project.
My best guess is :
Project No1 -> filebeat -> logstash
Project No2 -> filebeat -> logstash
Project No3 -> logstash
(logstash here being a single instance that then feeds into elastic)
Would this be a realistic approach?
I also stumbled upon redis, and from the looks of it it looks like it can combine all the data sources into one and then feed the output to logstash.
What would be the better approach?
Finally, I mentioned filebeat, but from what I understand it basically reads the data from a log file. Would that mean that I would have to re-write all my database entries into a log file in order to feed them into logstash or can logstash tap into the DB without an intermediary.
I tried looking for all of this online, but for some reason the internet is a bit scarce on ELK stack beginner questions.
Thanks
filebeat is used for shipping logs to logstash, you can't use it for reading items from DB. But you can read from DB using logstash's input plugins.
From what you're describing you'll need a logstash instance with 3 pipelines (one per project)
For project 3 you can use Logstash JDBC input plugin to connect to your mysql DB and read new/updated lines based on some "last_updated" column.
JDBC input plugin has a cron confguration value, that allows you to set it up to run periodically and read updated lines with an SQL query that you define in configuration.
For projects 1-2 you can also use the JDBC input plugin with mongoDB.
There is also an Open Source implementation for a mongoDB input plugin on git. You can check this post for how to use it here.
(see the full list of input plugins here)
If that works for you and you manage to set it up, then the rest will be about the same for all three configurations.
i.e. using filter plugins to modify data, and Elasticsearch output plugin to push data to an elastic index.

Arcgis server CreateReplica REST API of feature not working

I created Feature class in enterprise geodatabase (SQLServer2014 express). Feature class is sync enabled and published successfully.
Now I can not generate offline geodatabase from Arcgis Android SDk.
I can see ' Create Replica ' from 'Supported Operations' from 'http://xyz:6080/arcgis/rest/services/MyFeature/FeatureServer'
I tried 'http://xyz:6080/arcgis/rest/services/MyFeature/FeatureServer/createReplica' rest api from feature service. it creates job but no results shown.
Server logs show following error
Error executing tool.: ErrorMsg#SyncGPService:{"code":400,"description":""} Failed to execute (Create Feature Service Replica).
Log source is 'System/SyncTools.GPServer'
First, make sure that there's nothing needed at the DB level where your data is stored. Taking the server out of the equation, can you run the Create Replica tool in ArcMap/ArcGIS Pro against the data source, and does it succeed? If that works (and other operations like Adds, Updates, Deletes etc.), then put ArcGIS Server back in the equation.
What are your ArcGIS Server log levels set at? It may be beneficial to up the logging level to Verbose or Debug, try to create the replica again, and consult the logs to see if more helpful information is returned.
You may also want to check and see if your version of ArcGIS Server needs to be patched. For example, at 10.5.1 there was a patch released specifically for Sync issues.
If all else fails, Esri Support may be a good place to find some help as well.
Have you looked at the requirements for making your data available for offline use? See this link in the ArcGIS Server documentation.
Specifically you need to enable archiving and include Global IDs on the dataset, but there are more details at the above link.
For future reference, and in case that suggestion doesn't work, the Esri GeoNet ArcGIS Enterprise place is a good spot to ask these questions.

Datahub Hybris installation unclear

I am following the Getting Started with Data Hub guide to install standalone data hub. the guide mentions that hsqldb is already configured after datahub.webapp.war file and it will use database instance with the name of 'integration' with an administrative user named 'hybris' with the password 'hybris'.
I wanted to know two things; 1) How to connect to hsql server for datahub and check the item-types 2) Are we supposed to create this database named 'integration' and if yes, how?
Here is the default Hsql configuration for the Datahub :
#main storage
dataSource.className=org.hsqldb.jdbc.JDBCDataSource
dataSource.jdbcUrl=jdbc:hsqldb:mem:datahub
dataSource.username=sa
dataSource.password=
#media storage
media.dataSource.className=org.hsqldb.jdbc.JDBCDataSource
media.dataSource.jdbcUrl=jdbc:hsqldb:mem:dhmedia
media.dataSource.username=sa
media.dataSource.password=
You don't need to create anything for this to work.
If you want to check the item types you could use a client like : http://www.sql-workbench.net, but bear in mind that only one connection can be active at a time, therefore I would suggest to use Mysql (See configuration example below) :
dataSource.className=com.mysql.jdbc.jdbc2.optional.MysqlDataSource
dataSource.jdbcUrl=jdbc:mysql://localhost/integration?useConfigs=maxPerformance
dataSource.username=user
dataSource.password=pass

Merge two Endeca Servers (Endeca 3.1) into one. Including their current data

Let me explain in more detail:
1st: I'm running endeca 3.1, so Endeca Server here refers to 3.0's Data Domain.
I'm required to use an Endeca Server currently present on Endeca (Downloaded a Demo VM). All the info on it, including, groups, attributes and data, must be merged into out Endeca Server. (It can also be the other way around, i could merge my Endeca Server into this one.)
So far, i've tried to do the following:
1) Clone the Endeca Server
2) use the putCollection sconfig operation to create a collection on it with the same name i have on mine.
3) Load configurations using the LoadCollection & LoadAttributes graphs from OEID POC Template 3.1. I point to the new collection on the Configuration.xls file.
This is where i encounter an issue. The LoadAttributes graph gets a T/O message from the server's WS. Then the config WSDL becomes inaccesible for a while. I can't go beyond this point.
I've been able to load data into the collection, but i need to load the attributes first.
THanks in advance for your replies.
Regards
There are a few techniques.
Have you tried exporting the data domain and then importing it?
You can use the endeca-cmd tools to export to a file, and then import from that file. This would enable you to add 2 datastores into one server.
If you want to combine 2 datastores then that is a different question.
The simplest approach in 3.1 if the data collections are small. Extract then as CSV (via a data-table), convert to XLS and add them via self provisioning into separate collections within a single data store. If you are running in the VM this is potentially the easiest approach.
This can also be done using Integrator.
You don't need to load the attributes unless you are using multi-value types. You can call against the conversation web-service to extract data and then load it using 'bulk-load' I would not worry too much about creating the attributes unless this becomes essential due to their type or complexity. If you cannot call against the conversation web-service, then again extract as csv and load using Integrator.