WSO2 BAM Incremental Analysis - hive

According to the documentation here, this feature is experimental, but I would like to know if anyone is using it successfully. I already have some data, so I am trying to follow use case 4.
I tried to run an update Hive query with the #Incremental annotation, but with it nothing goes into my RDB anymore.
If I remove it, everything works, but I would like to take advantage of this feature because of the large amount of stored data, which makes query execution very slow.
Any suggestion or help is greatly appreciated.

The incremental analysis feature works fine in the partially distributed setup, but it hasn't been thoroughly tested with an external Hadoop cluster, hence it is marked as 'experimental'. If you find any bugs, you can report them in JIRA.
To answer your question, you first need to enable incremental processing for your stream and then add the incremental annotation. The detailed steps are as follows.
1) Add the property 'streams.definitions.defn1.enableIncrementalIndex=true' to the streams.properties file as explained here, and create a toolbox which consists of only the stream definition artefact as explained here.
2) Install the toolbox. This registers the stream definition included in the toolbox with incremental analysis. From this point onwards, the incoming data will be processed incrementally.
3) Now add the #Incremental annotation to the query. The first iteration will consider all of the available data, since you enabled incremental analysis in the middle of processing, but from the next iteration onwards it will only consider the new batch of data.
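As an illustration, here is a rough sketch of the two pieces involved (the analysis, stream, and table names below are hypothetical, and the exact annotation syntax should be verified against the BAM samples for your version):

In streams.properties:
streams.definitions.defn1.enableIncrementalIndex=true

In the Hive script:
@Incremental(name="salesAnalysis", tables="SalesDataTable")
insert overwrite table SalesSummary
select product, count(*) from SalesDataTable group by product;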

This feature is marked as experimental because there may still be some critical bugs. A more stable version of BAM with this feature will be included in the next release.

how to fix the problem of downloading fasttext-model300?

I'm using Windows 10 and Python 3.3. I tried to download fasttext_model300 to calculate soft cosine similarity between documents, but when I run my Python file, it stops after reaching this statement:
fasttext_model300 = api.load('fasttext-wiki-news-subwords-300')
There are no errors and no "not responding" message; it just stops without any reaction.
Does anybody know why it happens?
Thanks
I'd recommend against using the gensim api.load() functionality. It dynamically runs new, unversioned source code from remote servers, which is opaque in its operations and suboptimal for maintaining a secure local configuration or for debugging any issues that occur.
Instead, find the exact data files you trust and download them as plain data. Then use specific library operations, like the KeyedVectors.load_word2vec_format() method, to instantiate exactly the model you need, using precise local-file paths you understand.
Following those steps may make it clearer what, if anything, is going wrong. If it doesn't, try also enabling logging at the INFO level to gather more information about what progress is made before failure (and add any new details as a comment or to your question).
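A minimal sketch along those lines (the local file name below is only an assumption based on the standard fastText distribution; use whatever file you actually downloaded):

import logging
from gensim.models import KeyedVectors

# Surface gensim's INFO-level progress messages so you can see where loading stalls.
logging.basicConfig(level=logging.INFO)

# Path to the vectors you downloaded yourself, e.g. from fasttext.cc (hypothetical file name).
path = 'wiki-news-300d-1M-subword.vec'

# Load the plain-text word2vec-format vectors into a KeyedVectors instance.
fasttext_model300 = KeyedVectors.load_word2vec_format(path, binary=False)
print(fasttext_model300.most_similar('computer')[:3])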
Try downloading the model separately with the gensim downloader:
python3 -m gensim.downloader --download fasttext-wiki-news-subwords-300
Source: https://awesomeopensource.com/project/RaRe-Technologies/gensim-data
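Once the download finishes, the data is cached locally (under ~/gensim-data by default), so a subsequent api.load() call in your script should pick up the local copy instead of re-downloading it:

import gensim.downloader as api

# With the model already downloaded above, this should load from the local ~/gensim-data cache.
fasttext_model300 = api.load('fasttext-wiki-news-subwords-300')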

(How) can I enable allocation data collection when profiling with dotMemory CLT

I'm getting comfortable with dotMemory CLT and I'd like to understand if/how I can enable the collection of allocation data with a command line flag.
With the API, I'm aware of the ability to leverage MemoryProfiler.EnableAllocations, and with the desktop application I simply check a box.
But I find no references to this concept with respect to the CLT.
Attempting to use start doesn't do the trick, and poring over dotMemory help start doesn't reveal anything promising.
Is this simply not-supported, or am I missing/mis-understanding a critical section of documentation?
Line copied from dotMemory.exe help start | more:
[--collect-alloc|-c] Collect callstack allocation data (impacts performance!)
Example: dotMemory start "C:\Path\To\YourProgram.exe" -c

Can I convert a patch for linux kernel into a Loadable Kernel Module?

I have a patch for the vanilla Linux kernel which adds new files as well as changes to existing files in the kernel source tree.
I want to turn this patch into a Loadable Kernel Module so that I can avoid rebuilding the base kernel.
Since the patch changes header files as well as .c and data files of the original kernel source, I doubt whether it can be made into a kernel module at all.
Please ask if you need any more details to clarify the issue.
Thanks,
Sapan
The simple answer is no.
A more nuanced answer: yes, it's theoretically possible to do something like what you're envisioning. However, it's enormously complex (every detail has to be exactly right) and not something you could hope to do in an ad-hoc manner. For example, every data structure that increases in size might cause huge areas of memory to need to be reallocated and relocated, every pointer pointing to one of those pieces of data would then need to be adjusted, and there is a potential cascade of further dependent adjustments. There's simply no way to track all those details.
But see https://www.ksplice.com/, which actually patches a kernel at runtime. I don't know many details about ksplice, but I'm fairly certain it's only possible to do this with very tight constraints on what exactly changes due to the concerns I outlined above, among others.

build script - how to do it

About two months ago I took over the build process at my current company. Even though I don't have much knowledge of it, I was the only one with enough time, so I didn't have much choice.
The situation is not that good, and I would like to do the following:
Labeling files in SourceSafe with version (example ProjectName PV 1.2)
GetFiles from SourceSafe to specific directory
Build VB6/C++/C# projects (yes, there are all kinds of them)
Build InstallShield setups
For now this is partly done using batch scripts (one for labeling and getting, one for building, etc.), so when the build starts I pretty much have to babysit it.
A good part of this code could be reused.
Any recommendations on how to do it better? One big problem is the whole bunch of dependencies between projects. Also, labeling has to increment the version and, if necessary, change PV to EV.
I would like to minimize user interaction as much as possible: one click on one build script (Spolsky is god) and everything is done, with no need to increment the version, set where to get the files from, and similar stuff.
Is batch scripting the best way to go? Should I do some of the functionality with MSBuild? Are there any other options?
Specific code is not needed; for now I just need ideas on how to improve this, though specific code wouldn't hurt.
Tnx,
Marko
Since you already have a build system (even though some of it is currently "manual"), whatever you do, don't start over from scratch.
(1) Make sure you have a test machine (or Virtual Machine) on which to work. Thus you can make changes and improvements without having to worry about breaking anything.
(2) Put all of your build scripts and tools in version control, not just the source code. Then as you make changes, see if they work. If they do, then save them to version control. If they don't, then roll them back.
(3) Choose one area to work on at a time. Don't try to do everything at once. Going from a lot of manual work to "one-click" will take time no matter what build system you're working with.
Sounds like you want a continuous integration solution, like CC.Net. It has configuration options to do all the things you want and a great community to answer questions.
Also, batch scripting is probably not a good option. Sophisticated build and integration tools will let you feed parameters into the build and create different builds for different environments (test, production, etc.). Batch scripting will involve a lot of hand-coding and glue.

Print complete control flow through gdb including values of variables

The idea is that, given a specific input to the program, I want to automatically step through the complete program and dump its control flow along with all the data being used, like classes and their variables. Is there a straightforward way to do this? Can this be done with some scripting over gdb, or does it require modifying gdb?
OK, the reason for this question is an idea for a debugging tool. What it does is this: given two different inputs to a program, one causing an incorrect output and the other a correct one, it tells you which parts of the control flow differ between them.
So what I think will be needed is a complete dump of these two control flows going into a diff engine. If the two inputs follow similar control flows, their diff would (in many cases) give a good idea of why the bug exists.
This could be made into a very engaging tool, with many features built on top of it.
Tell us a little more about the environment. dtrace, for example, will do a marvelous job of this in Solaris or Leopard. gprof is another possibility.
A crude version of this could be done with yes(1) or expect(1).
If you want to get fancy, GDB can be scripted with Python in some versions.
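As a rough illustration of that route (the command name, the default step count, and the use of "step" are my own choices, and this assumes a gdb built with Python support), a script like the following could single-step through the program and log each source line together with the local variables, producing a trace you can redirect to a file and diff between the two runs:

import gdb

class StepTrace(gdb.Command):
    """step-trace [N]: single-step N times (default 100), logging each line and its locals."""

    def __init__(self):
        super(StepTrace, self).__init__("step-trace", gdb.COMMAND_USER)

    def invoke(self, arg, from_tty):
        steps = int(arg) if arg else 100
        for _ in range(steps):
            frame = gdb.selected_frame()
            sal = frame.find_sal()
            where = sal.symtab.filename if sal.symtab else "??"
            print("%s:%d in %s" % (where, sal.line, frame.name() or "??"))
            try:
                # Dump arguments and local variables visible in the current block.
                for sym in frame.block():
                    if sym.is_argument or sym.is_variable:
                        print("    %s = %s" % (sym.name, sym.value(frame)))
            except RuntimeError:
                pass  # no debug/symbol information for this frame
            gdb.execute("step", to_string=True)

StepTrace()

Load it with "source step_trace.py" inside gdb, break at main, and run "step-trace 500"; repeating that for the good and bad inputs gives you two logs to feed into a diff engine.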
What you are describing sounds a bit like gdb's "tracepoint debugging". See gdb's internal help ("help tracepoint"). You can also see a whitepaper here: http://sourceware.org/gdb/talks/esc-west-1999/
Unfortunately, this functionality is not currently implemented for native debugging, but I believe that CodeSourcery is doing some work on it.
Check this out: unlike Coverity, Fenris is free and widely used.
How to print the next N executed lines automatically in GDB?