Nutch 2 hsql configuration - hsqldb

How can I configure the data mapping for this database in Nutch? There should probably be some hsql-mapping.xml, but the compiled Nutch distribution doesn't include this file.
I am asking because I sometimes get an SQL error due to data truncation, so I need to increase the length of some fields.
Thanks
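For reference, Nutch 2.x delegates storage to Apache Gora, and for the SQL backend the mapping file is typically named gora-sql-mapping.xml rather than hsql-mapping.xml; if it is missing you can place one in conf/ yourself. A hedged sketch of what such a mapping might look like (the class and field names below are illustrative, loosely modeled on Nutch's WebPage schema; the length attribute is what controls the column size and thus the truncation errors):

```xml
<gora-orm>
  <class name="org.apache.nutch.storage.WebPage" keyClass="java.lang.String" table="webpage">
    <!-- length controls the generated column size; raise it to avoid truncation -->
    <primarykey column="id" length="512"/>
    <field name="title" column="title" length="2048"/>
    <field name="content" column="content" type="BLOB"/>
  </class>
</gora-orm>
```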

Related

Data streaming in Apache superset from BQ?

I am new to Superset and wanted to know if there's any way to perform data streaming into it from BigQuery. Currently, I have set up the database in Apache Superset with BigQuery, but when I update the table data using SQL commands in BigQuery, the change doesn't reflect in Superset. Is there any way to get data streaming into Superset?
I've looked around the Apache Superset documentation and couldn't find anything related to "streaming data" from a source. What I think is happening in this scenario is that you have a dashboard that uses the data from your table in BigQuery, and after adding new information to the table you expect that change to be reflected in the dashboard automatically.
Based on this, my theory is that Superset caches the result of your query in memory, or that it is using BigQuery's cached results; either would prevent the dashboard from automatically picking up the changes. My suggestion is to re-run the query for your table to try to get the latest data. If Superset is using cached results, you'll have to look at Superset's configuration for BigQuery for a way to disable them.
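If caching does turn out to be the culprit, Superset's cache is configured in superset_config.py via Flask-Caching. A minimal sketch, assuming a default Superset setup (the NullCache backend disables result caching entirely, trading dashboard freshness for extra query load on BigQuery):

```python
# superset_config.py -- illustrative sketch, not a complete config.
# "NullCache" is a Flask-Caching backend that stores nothing, so every
# chart render re-queries the database instead of reusing cached results.
CACHE_CONFIG = {
    "CACHE_TYPE": "NullCache",
}

# Alternatively, keep caching but shorten how long results stay fresh:
# CACHE_CONFIG = {
#     "CACHE_TYPE": "SimpleCache",
#     "CACHE_DEFAULT_TIMEOUT": 60,  # seconds
# }
```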

a large dataset to test apache ignite?

I am new to Apache Ignite. Can you please suggest a way to get a large data set (preferably CSVs along with Ignite-compliant DDL statements) which I could use to create schemas and tables in Ignite (using native persistence), to test a few use cases that I have?
You can use Web Console to copy data from a relational DB into Apache Ignite, creating the data structure and project files along the way.
Apply it to an existing database, or to something like the MySQL Employees sample database.
Web Console connects to the existing, internally deployed database through an 'agent' program run locally.
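If you'd rather go the plain-SQL route, Ignite's SQLLine shell can create tables and bulk-load CSVs directly. A sketch, assuming a people.csv whose columns match the table (table, column, and file names below are made up for illustration):

```sql
-- Ignite-compliant DDL: the WITH clause carries Ignite-specific options.
CREATE TABLE person (
  id   INT PRIMARY KEY,
  name VARCHAR,
  city VARCHAR
) WITH "template=partitioned";

-- Bulk-load a local CSV file through SQLLine.
COPY FROM '/tmp/people.csv' INTO person (id, name, city) FORMAT CSV;
```

This pairs well with a sample dataset like the MySQL Employees database exported to CSV.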

Is there a native SQL source in Apache Flume?

I need to create a simple data warehouse. The data sources for the data warehouse are heterogeneous, thus I'm experimenting with Frameworks like Apache Flume for data collection. I went through the documentation but didn't find anything about SQL. (http://flume.apache.org/FlumeDeveloperGuide.html and http://flume.apache.org/FlumeUserGuide.html#flume-sources)
Question: Are there any (native) possibilities to connect an Apache Flume source to an SQL server?
Apache Flume is designed to collect, aggregate and move log data to HDFS.
If you are considering moving large amounts of data from a SQL database, take a look at Apache Sqoop:
http://sqoop.apache.org/
Look into the flume-ng-sql-source project. Here are some examples as well:
http://www.toadworld.com/platforms/oracle/w/wiki/11093.streaming-oracle-database-logs-to-hdfs-with-flume
http://www.toadworld.com/platforms/oracle/w/wiki/11100.streaming-mysql-table-data-to-oracle-nosql-database-with-flume
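For completeness, a flume-ng-sql-source agent is configured through ordinary Flume properties. A sketch based on the property keys documented by that project (connection details, table, and paths below are placeholders, and exact keys may vary between versions):

```properties
# Agent "a1": one SQL source feeding a memory channel (sink omitted here).
a1.sources = sql1
a1.channels = c1

a1.sources.sql1.type = org.keedio.flume.source.SQLSource
a1.sources.sql1.hibernate.connection.url = jdbc:mysql://db-host:3306/mydb
a1.sources.sql1.hibernate.connection.user = flume
a1.sources.sql1.hibernate.connection.password = secret
a1.sources.sql1.table = events
# Poll interval in milliseconds; the status file tracks the last row read.
a1.sources.sql1.run.query.delay = 10000
a1.sources.sql1.status.file.path = /var/lib/flume
a1.sources.sql1.status.file.name = sql1.status
a1.sources.sql1.channels = c1

a1.channels.c1.type = memory
```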

Concern with using external SQL server for DIH

I am looking to import entries into my Solr server using the DIH, connecting to an external PostgreSQL server via the JDBC driver. I will be importing about 50,000 entries each time.
Is connecting to an external SQL server for my data unreliable or risky, or is it instead perfectly reasonable?
My only alternative is to export the SQL file on the other server, download it to my Solr server, import it into my Solr server's copy of PostgreSQL, and then run the DIH against the local database.
The way you're using it is pretty much why the DIH exists. Otherwise, you could just use the /update handler with XML documents. The core I'm working on right now regularly indexes 11,000,000 rows per batch.
This is a standard use case, importing from a remote DB. Proceed with confidence!
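For reference, pointing the DIH at a remote PostgreSQL server is just a matter of the JDBC URL in its data-config.xml. A sketch (host, credentials, table, and column names are placeholders):

```xml
<dataConfig>
  <!-- JDBC connection to the external PostgreSQL server -->
  <dataSource type="JdbcDataSource"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://remote-host:5432/mydb"
              user="solr_reader"
              password="secret"/>
  <document>
    <!-- one Solr document per row; columns map to Solr fields -->
    <entity name="entry" query="SELECT id, title, body FROM entries">
      <field column="id"    name="id"/>
      <field column="title" name="title"/>
      <field column="body"  name="body"/>
    </entity>
  </document>
</dataConfig>
```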

Viewing a grails schema while it runs in memory?

I want to view the Grails schema for the default HSQLDB in-memory database, but when I connect to the in-memory database with SquirrelSQL or DbVisualizer as userid: sa, password: (nothing), I only see two schemas:
INFORMATION_SCHEMA
PUBLIC
And neither contains my Domain tables. What's going on?
You need to set the HSQLDB database to file-based mode and set shutdown to true, as outlined here.
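In Grails that means editing DataSource.groovy for the development environment. A sketch (the database name devDB is arbitrary; shutdown=true makes HSQLDB flush its data files cleanly so other tools can open them):

```groovy
// grails-app/conf/DataSource.groovy -- development environment only
environments {
    development {
        dataSource {
            dbCreate = "update"
            // file-based instead of mem-based, so external tools can open it
            url = "jdbc:hsqldb:file:devDB;shutdown=true"
        }
    }
}
```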
If you want to access the in-memory database, there's a writeup on how to do that here: http://act.ualise.com/blogs/continuous-innovation/2009/07/viewing-grails-in-memory-hsqldb/
There's also a new plugin that gives you access to a web-based database console that can access any database you have a JDBC driver for, including the in-memory HSQLDB. The plugin docs are at http://grails.org/plugin/dbconsole and you install it the usual way, i.e. grails install-plugin dbconsole. Unfortunately the plugin has an artificial restriction to Grails 1.3.6+, so if you're using an older version of Grails you can use the approach from the blog post that inspired the plugin: http://burtbeckwith.com/blog/?p=446
To use the database console, select "Generic HSQLDB" from the settings dropdown and change the values to match what's in DataSource.groovy. This will probably just require changing the URL to jdbc:hsqldb:mem:devDB
You need to set up a shared hsql database: Creating a shared HSQLDB database
edit: There is NO way to expose a purely in-memory HSQLDB to external tools. Either create a Server or WebServer, or use a file URL.