Pig configuration with Spring - apache-pig

Have you come across configuring and running Pig jobs using Spring?
Lately I have been trying to integrate Spring and Pig, but I couldn't get the hdp:pig-runner and hdp:pig-factory tags working. I kept getting the error "cvc-complex-type.2.4.c: The matching wildcard is strict, but no declaration can be found for element 'hdp:pig-factory'." I have tried every approach I could think of but couldn't come up with a solution. Could anyone please help? Even small suggestions would be helpful.
The schemas I used are
<beans xmlns:hdp="http://www.springframework.org/schema/hadoop" xsi:schemaLocation="http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

If you are still facing this problem:
First of all, you have spring-hadoop.xsd declared twice in your xsi:schemaLocation.
Also, have you added the necessary libraries? That is, if you use Maven, have you added the required dependencies?
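For reference, here is a rough sketch of a context declaration with the hadoop namespace mapped once, plus a pig-factory and pig-runner (the script location is just a placeholder), and the kind of Maven dependency that typically provides the namespace handler; treat it as an illustration rather than a drop-in fix:

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:hdp="http://www.springframework.org/schema/hadoop"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
                           http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

    <hdp:configuration/>

    <!-- Elements from the hadoop namespace; the script path is a made-up example -->
    <hdp:pig-factory/>
    <hdp:pig-runner id="pigRunner" run-at-startup="true">
        <hdp:script location="scripts/example.pig"/>
    </hdp:pig-runner>
</beans>

<!-- Example Maven dependency (version is illustrative) -->
<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-hadoop</artifactId>
    <version>2.2.0.RELEASE</version>
</dependency>

A missing spring-data-hadoop jar on the classpath is a common cause of the "matching wildcard is strict" error, since the hadoop schema then cannot be resolved.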

Related

Conflicting class versions in Apache Flink

I have two applications. The first is a Play! Framework (v2.5.1) application, whose job is to read the aggregated data. The second is an Apache Flink (v1.1.2) application, whose job is to write the aggregated data.
The error
java.lang.NoSuchMethodError: com.typesafe.config.ConfigFactory.defaultApplication(Lcom/typesafe/config/ConfigParseOptions;)Lcom/typesafe/config/Config;
This is caused by Play & Flink using different versions of com.typesafe.config (1.3.0 vs 1.2.1).
I've tried
I've tried using shading, but there are further complications when I get to using Akka. Akka also has conflicting versions, so I shade both config and Akka, which leads to a configuration error in Akka. If I duplicate the configuration to the correct path, then the ActorSystem fails to initialize because of an incorrect class version.
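For context, relocating com.typesafe.config is typically expressed with a maven-shade-plugin relocation roughly like the sketch below (plugin version and shaded prefix are only illustrative, and as described above this alone runs into the Akka configuration problem):

<!-- Sketch: relocate the config classes inside the Flink job jar so they cannot
     clash with the version Play brings in; the shadedPattern is arbitrary -->
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.4.3</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <relocations>
                    <relocation>
                        <pattern>com.typesafe.config</pattern>
                        <shadedPattern>shaded.com.typesafe.config</shadedPattern>
                    </relocation>
                </relocations>
            </configuration>
        </execution>
    </executions>
</plugin>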
Research
I don't know this area well, but it seems like a number of JVM servers handle this by doing parent-last class loading. Is that possible in Flink?
There may be other, simpler solutions that I've not tried as well. If there are, let me know and I'll gladly try them.
Thanks for your help!

What is the significance of the data-config.xml file in Solr?

And when should I use it? How is it configured? Can anyone please explain in detail?
The data-config.xml file is the configuration file for the DataImportHandler (DIH) in Solr. It's one way of getting data into Solr, allowing the server to connect through JDBC (or through a few other plugins) to a database server or a set of files and import them into Solr.
DIH has a few issues (for example the non-distributed way it works), so it's usually suggested to write the indexing code yourself and POST documents to Solr from a suitable client, such as SolrJ, Solarium, SolrClient, MySolr, etc.
It has been mentioned that the DIH functionality really should be moved into a separate application, but that hasn't happened yet as far as I know.
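Purely as an illustration (the driver, connection URL, table, and field names below are made up), a JDBC-backed data-config.xml typically looks something like this, and is referenced from a /dataimport request handler in solrconfig.xml:

<dataConfig>
    <!-- Connection details are placeholders -->
    <dataSource type="JdbcDataSource"
                driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost:3306/mydb"
                user="solr"
                password="secret"/>
    <document>
        <!-- Each entity maps rows returned by the query to Solr documents -->
        <entity name="product" query="SELECT id, name, price FROM products">
            <field column="id" name="id"/>
            <field column="name" name="name"/>
            <field column="price" name="price"/>
        </entity>
    </document>
</dataConfig>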

Why does the liquibase:updateDatabase Ant task not support labels?

As I see it, the labels feature was introduced in version 3.3. I would really love to use it, but that currently seems impossible because the Ant database tasks do not handle labels. Is it handled in some other way? Additionally, I would like to know how I could add an issue to the Liquibase Jira.
To add something to the Liquibase Jira, you just need to sign up for an account here:
https://liquibase.jira.com/login?
After your account is created, you will be able to create new issues.
As far as the Ant tasks go, it appears that you should be able to use labels and contexts in them. If you could include an Ant build file that shows the problems you are having, along with the command you are using and the output, we can help address those issues.
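As a sketch only (the jar paths, changelog, connection details, and label values are placeholders, and this assumes the antlib-style tasks shipped with 3.3+ expose a labels attribute as suggested above), a build file exercising labels might look like:

<project name="db-migrate" xmlns:liquibase="antlib:liquibase.integration.ant">
    <!-- Point the classpath at your Liquibase jar and JDBC driver -->
    <taskdef resource="liquibase/integration/ant/antlib.xml"
             uri="antlib:liquibase.integration.ant">
        <classpath path="lib/liquibase.jar:lib/h2.jar"/>
    </taskdef>

    <target name="update">
        <!-- The labels and contexts values here are illustrative -->
        <liquibase:updateDatabase changeLogFile="db/changelog.xml"
                                  labels="release-1.2"
                                  contexts="test">
            <liquibase:database driver="org.h2.Driver"
                                url="jdbc:h2:mem:testdb"
                                user="sa"
                                password=""/>
        </liquibase:updateDatabase>
    </target>
</project>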

How to select the latest file from multiple files in Mule

I have 5000 files in a folder, and new files keep getting loaded into the same folder on a daily basis. I need to pick up only the latest file each day from among all the files.
Is it possible to achieve this scenario in Mule out of the box?
I tried keeping the file component inside a Poll component (to make use of the watermark), but it is not working.
Is there any way we can achieve this? If not, please suggest the best approach (any relevant links).
Mule Studio 5.3, Runtime 3.7.2.
Thanks in advance
Short answer: there is no really quick out-of-the-box solution, but there are other ways. I'm not saying this is the right or only way of solving it, but I have implemented a similar scenario like this before:
A normal file inbound endpoint with a database table as a file log. Each time a new file is processed, a component checks whether its name already appears in the table. Using a choice router or a filter, I only continue if it isn't in there already, and after processing I add the filename to the table.
This is quite a "heavy" solution, though. A simpler approach would be to use an idempotent filter with an object store, for example a Redis server: https://github.com/mulesoft/redis-connector/blob/master/src/test/resources/redis-objectstore-tests-config.xml
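As a rough sketch of that idempotent-filter idea in Mule 3 XML (flow name, paths, and the choice of a simple file-based object store instead of Redis are all just for illustration; note it prevents reprocessing rather than literally selecting the newest file):

<flow name="processNewFilesOnly">
    <!-- Poll the drop folder; the paths are placeholders -->
    <file:inbound-endpoint path="/data/incoming" moveToDirectory="/data/processed"/>

    <!-- Only let a file through the first time its name is seen -->
    <idempotent-message-filter idempotentKeyExpression="#[message.inboundProperties.originalFilename]">
        <simple-text-file-store name="processed-files" directory="/data/filter-store"/>
    </idempotent-message-filter>

    <logger level="INFO" message="Processing #[message.inboundProperties.originalFilename]"/>
</flow>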
It is actually very simple if your incoming filename contains a timestamp: you can configure the file inbound connector with a file:filename-regex-filter, setting pattern="myfilename_#[function:timestamp].csv". I hope this helps.
Maybe you can use a Quartz scheduler (specify the time in a cron expression), followed by a Groovy script in which you start the file connector. Keep the file connector in another flow.

File Addition and Synchronization issues in RavenFS

I am having a very hard time making RavenFS behave properly and was hoping that I could get some help.
I'm running into two separate issues: one where uploading files to RavenFS while using an embedded database inside a service causes RavenDB to fall over, and another where synchronizing two instances set up in the same way makes the destination server fall over.
I have tried to do my best in documenting this... Code and steps to reproduce these issues are located here (https://github.com/punkcoder/RavenFSFileUploadAndSyncIssue), and a video is located here (https://youtu.be/fZEvJo_UVpc). I tried looking for these issues in the issue tracker and didn't find anything that looked directly related, but I may have missed something.
The solution for this problem was to remove Raven from the project and replace it with MongoDB. Binary storage in Mongo can be done on the record without issue.