Apache Flume - send only new file contents - apache

I am a very new user to Flume, please treat me as an absolute noob. I am having a minor issue configuring Flume for a particular use case and was hoping you could assist. Note that I am not using HDFS, which is why this question is different from others you may have seen on forums.
I have two Virtual Machines (VMs) connected to each other through an internal network on Oracle Virtual Box. My goal is to have one VM watch a particular directory that will only ever have one file in it. When the file is changed, I wish for Flume to only send only the new lines/data. I want the other VM to receive this data and update/concatenate the data to a single file in a particular directory on it.
So far, I have this process very close to working. Whenever changes are made in VM1, they are updated on VM2. However, the entire file on VM1 is sent to VM2 every time, not the new lines. For example, if I wrote “Test1” and then a while later underneath wrote “Test2” to the file on VM1, on VM2 the output would be:
Test1
Test1
Test2
What I want to see is:
Test1
Test2
I am not sure how to implement this, and am sending this email after thoroughly examining the Flume user guide documentation and most relevant articles on stackoverflow/stackexchange. For your reference, below are the current configurations(they are working in the manner I mentioned above).
VM1 configuration
VM2 configuration
I realize another solution would be to keep the configuration on VM1 and overwrite the file on VM2 everytime new contents are detected. However, I am also unsure how to implement this.
Any assistance you could provide is greatly appreciated!

Use TailDir source provided in Flume.It periodically writes last position read in position file and its more reliable than exec source as even in case of agent crashes or stops for some reason it will start reading from last position saved in the position file.
agent1.sources.src1.type = TAILDIR
agent1.sources.src1.channels = ch1
agent1.sources.src1.filegroups =f1
agent1.sources.src1.filegroups.f1= //path to log file
agent1.sources.src1.maxBackoffSleep = 10000
Set maxBackoffSleep value as per your need it means how much max time agent should wait before polling for changes in log file , when it didnt find any changes in last attempt made.

Related

ClearCase error, registry does not contain VOB with UUID

I am in the process of migrating a very large multisite installation to newer OS platforms. Running ClearCase 9. In one particular migration stage all the VOBs appear to have migrated correctly, ct lsvob -s -host xxxx shows no VOBs remaining on the old server, but now I am getting packets stuck in the incoming bin on that old server. I assume it has to do with devs who still had views open before the migration, but the problem is that mt lspacket is complaining that it cannot find a VOB with a single UUID in the registry. Packets are piling up, and they are all complaining about the same UUID, so I assume they are all related to one VOB. ct lsvob -uuid xxxx says it cannot find a VOB with that UUID.
How would I go about correcting this?
Looking at multitool lspacket, check if a multitool lspacket –long /usr/tmp/packet1 (one of the packet listed by multitool lspacket) helps (a bit like the old CC7.0 multitool lspacket -l -dump)
If this is linked to dev views, check if you can get the a cleartool rmview --force -vob \avob -uuid an_uuid is still possible, to make sure there is no view referencing the old Vob.
The packets are getting routed to the old server by the other sites. It has nothing to do with developer views.
#VonC's lspacket -long answer will give you the name of the sending replica... Where you'll have to describe the target replica to see what it currently thinks the host is for the moved replica.
In the interim, you can copy/move the sync packets to the new server and the should import fine.
Assuming that you use the default jobs, and don't use -out to change the default packet names, running run multitool lspacket on the receiving host, you will show you names like "sh_o_sync_P50-rep_2022-11-14T160519-0500_17508." In this case, "P50-rep" is the name of the SENDING replica.
You will also see a line reading:
VOB family identifier is: 19fd6066.dbf111e1.9886.44:37:e6:60:fc:96
cleartool lsvob -family {above UUID} will identify the VOB whose sync packet this is.
* \bc-linuxtest \\this-is-the-vob-server-host\vobstore\bc-linuxtest.vbs public (replicated)
You can then combine that information to locate the sending site since the describe would look something like this:
replica "P50-rep"
created 2018-04-10T08:50:15-04:00 by CC VOB Admin (vobadm2.ccusers#Bullwinkle)
"Test replica 3."
replica type: unfiltered
master replica: P50-rep#\bc-linuxtest
request for mastership: enabled
owner: PROD\vobadm
group: PROD\ccusers
host: "this-is-the-vob-server-host"
identities: preserved
permissions: preserved
Once you go there, you will be able to see what IT thinks the replica host is, and then we can make it know where the replica is now... By hook or by crook if need be. However, the "by crook" method would mean that you need to open a support case to get the tool and the steps to use it.
My guess is that the problem replica is:
The problem replica is self mastering, and
Does not send updates to at least one "upstream" replica.

Automatically add database entry after ftp upload

Sorry if this seems stupid but I wonder if it's possible to add a database entry after an ftp upload.
To be more clear, thanks to winSCP I have several folders sending everything I put in there automatically to my server.
However, I would like to create a mysql entry for each uploaded files and once again, automatically. Is it possible to do that? How?
To gives the full details of what I need to do, you can read the following.
I have several folders with pictures and each folders are uploaded automatically.
Each of those folders belong to one user and the goal is to give them an account and allow them to see and download those files through a web interface. Since one account = one folder, that's kinda easy.
And I think a simple .htaccess can simply secure things so one user can only see and download the file in his own repository, no?
However if I want them to be able to see what's new (=something they didn't download or simply mark as read) I think I need a table to manage those files.
Something like id | file (string) | read (bool).
If you think this way to proceed is bad, they I'm open to change how to do things, but to be clear uploading the file need to work this way. Not using any kind of formulary.
Thanks for reading that, sorry for my english.
Your problem contains three steps:
Folders/Files been automatically uploaded to your server directory, as you say, this been efficiently handled by winSCP.
You need to update your database with all the files and folders present in your server directory.
You need to update whether or not it is been read/downloaded by the user.
Since your first step is in place, we don't need anything there. For second step, you should write a script and schedule that script to run at a fixed time interval using CRON (if using LINUX or UNIX, or WINDOWS). The script would be responsible to create a list of file(s) present in the directory, and simply insert the file(s) information that are not present in your database.
EDIT:
This edit is to describe how your script file should work. As I explained, the cron jobs would simply help you run your script file in fixed set of interval (which can be every minute, or every hour, or every day, and so on). Lets say your database table has following columns:
fileid (varchar[20])
filepath (varchar[20])
status (boolean)
Your script file should do following things:
Create a list of existing filepaths in your server directory
Run a select query, create a list of existing filepaths from database table.
Compare list1 with list2, and find the ones that doesn't exist in list2 (This would give you a list of filepath that needs to be inserted into table)
Just insert the list of file paths you got above, and set there status to be false (which means the file is not read/downloaded yet)
NOTE: Please keep in mind that I am not advising right now that how your database table should look like. It can be what you have proposed or can even differ depending on your will or requirements.
For the third step, simply keep the status of your file to be unread when creating entries in your table from the second step, and then when user click on the file link in your application whether to view or download it, send a POST request to your server updating the file status to be marked as read.
Let me know if this helps!

Unresolved reference to WseeFileStore

I am trying run SOA Suite and when I execute startWeblogic.sh I got the following message error:
Unresolved reference to WseeFileStore by [<domain name>]/SAFAgents[ReliableWseeSAFAgent]/Store
at weblogic.descriptor.internal.ReferenceManager.resolveReferences(ReferenceManager.java:310)
at weblogic.descriptor.internal.DescriptorImpl.validate(DescriptorImpl.java:322)
at weblogic.descriptor.BasicDescriptorManager.createDescriptor(BasicDescriptorManager.java:332)
at weblogic.management.provider.internal.DescriptorManagerHelper.loadDescriptor(DescriptorManagerHelper.java:68)
at weblogic.management.provider.internal.RuntimeAccessImpl$IOHelperImpl.parseXML(RuntimeAccessImpl.java:690)
at weblogic.management.provider.internal.RuntimeAccessImpl.parseNewStyleConfig(RuntimeAccessImpl.java:270)
at weblogic.management.provider.internal.RuntimeAccessImpl.<init>(RuntimeAccessImpl.java:115)
... 7 more
Does anyone know how to fix this error?
I am running the system over 64 bits Suse
The quick and dirty way to get your admin server back up:
cd to <domain name>/config
Back up config.xml just in case
Edit config.xml, find and remove the <saf-agent> tags that point to your non-existent WseeFileStore
When you have the admin server back up. You can look at the Store-and-Forward Agents and Persistent Stores links to see what is already configured there. It sounds like a SAF agent was somehow created but the backing Persistent Store was not.
You can always created the Persistent Store later and add that SAF agent back in if you need it.
This happens simply because the automated tool used to adapt the config.xml file to the new cluster structure is... well, far from efficient.
It can create all other relevant structures ok, but the <saf-agent> entry is wrongly created.
Just open and look briefly to the config.xml file and you should see that something is not right with this entry.
I will use my environment as an example for this situation:
I have a single cluster with two managed servers named osb1 and osb2. Both are administered by the cluster's AdminServer and all these components are in a single machine called rdaVM. The whole domain was created with the Configuration wizard and, upon the first AdminServer start, I've got that dreadful error for quite some time.
The solution does reside in the config.xml file located in <DOMAIN_HOME>/config/config.xml
When I opened this file in the editor and did a quick search for WseeFileStore I got some curious entries:
<jms-server>
<name>WseeJmsServer_auto_1</name>
<target>osb1</target>
<persistent-store>WseeFileStore_auto_1</persistent-store>
</jms-server>
<jms-server>
<name>WseeJmsServer_auto_2</name>
<target>osb2</target>
<persistent-store>WseeFileStore_auto_2</persistent-store>
</jms-server>
and
<file-store>
<name>WseeFileStore_auto_1</name>
<directory>WseeFileStore_auto_1</directory>
<target>osb1</target>
</file-store>
<file-store>
<name>WseeFileStore_auto_2</name>
<directory>WseeFileStore_auto_2</directory>
<target>osb2</target>
</file-store>
but looking at the offending entry:
<saf-agent>
<name>ReliableWseeSAFAgent</name>
<store>WseeFileStore</store>
</saf-agent>
Obviously there's something missing here. Looking at the <DOMAIN_HOME> I could see two folders there: WseeFileStore_auto_1 and WseeFileStore_auto_2. So no WseeFileStore and hence that annoying error. Also, the saf-agent element doesn't have a target.
Solution: using just the underlining logic, I adapted the <saf-agent> entry to:
<saf-agent>
<name>ReliableWseeSAFAgent_auto_1</name>
<target>osb1</target>
<store>WseeFileStore_auto_1</store>
</saf-agent>
<saf-agent>
<name>ReliableWseeSAFAgent_auto_2</name>
<target>osb2</target>
<store>WseeFileStore_auto_2</store>
</saf-agent>
I.e, created a <saf-agent> for each of the cluster's managed servers, targeted each entry to a managed server and added the _auto_# suffix, where # is the ordering number for each managed server, to the <name> and <persistent-store> entries.
After it, I was able to run the startWebLogic.sh script without problems (from this source at least...)

Move file in one AccuRev workspace that has been edited in another workspace

We have a need to refactor a code base. The thing is that this will be done by one person and it would be desirable to avoid having the rest of the development team sitting idle while this job takes place.
We therefore tried the following scenario to see if it is possible to work in parallel.
Created file test.txt in directory first in developer A's workspace.
Promoted this file.
Updated developer B's workspace, thereby getting file test.txt
In A's workspace moved file test.txt to directory second.
Promoted this move.
In B's workspace edited file test.txt while it still resides in directory first (no update is made thereby emulating that work is done while refactoring is taking place).
Tried to promote and got a message saying that file test.txt had been modified (correct, file has been moved).
Tried to merge but got an error message saying that AccuRev can't merge since the file is missing in directory second (where it has been moved).
Tried to update B's workspace but that is not allowed since there is a modified file that needs to be merged first.
We are now stuck in a catch 22 situation.
We did try to place a fake file in directory second but that is not being recognized since this file does not belong to the workspace.
Has anyone out there tried something like this and gotten it to work?
It is of course possible to copy files but if there is a better way we would be grateful to hear about this. Or if this is a known bug or limitation in the tool.
We will contact also contact AccuRev support but I thought that I might be able to get some useful tips from the community.
Currently we are using AccuRev client 5.5.0.
Thanks for any suggestions on how to make the tool support this operation.
Referring to your steps 6 & 7: In AccuRev 5.5 after a file is edited and has a (modified) status you first have to keep before you can promote.
At step 8 you could try doing the merge from the Browse Versions view of the file. That way you can select any node to merge with, including the one that has been moved.
Step 9. An AccuRev update will not run successfully if one of the files to be updated is (modified). This is by design. You can keep the file so it has (kept)(member) status then run the update.
David Howland
After contact with AccuRev support the answer is that the only option available is to copy the file to some temp directory, revert the changes, update the workspace and copy the file into the new location in the workspace.
AccuRev will at least tell you which files you have to copy since they will be marked as modified.
I could experimentally verify David's remark to step 9 using AccuRev 5.5.
Let's assume that in the workspace of user A the file was moved and the move was promoted, while in the workspace of user B the file was modified and user B is about to promote his/her change.
Before the file is kept, it will not be possible for user B neither to merge nor to update. But after keeping the modified file the update is possible. The file is first marked as overlap, then the merge succeeds in the new location. Basically, this avoids creating a copy of the file, reverting it and restoring it in the new location after an update, which can be quite cumbersome, as AccuRev does not reveal easily where the move goes.
If user B promotes the modification before user A promotes the move, all goes smoothly, i.e. on update the moved file appears as overlap, but easily merges into the moved file in the new location.
Similar results are obtained when the two users have workspaces connected to different streams and the overlap occurs on a common parent stream. Only if the file is unkept, an error can occur (i.e. only if the move is promoted before the change). Then a simple keep allows to proceed as usual (update, merge, then promote).

Prevent file being overwritten

Imagine there are 3 or more independent locations where a file can be modified. These locations communicate to each other through email or mail (direct flash drive restoration). Though there is a big room for flow - to make simultaneous editing to the file and screw up things, this client won't change too much. He rather call everyone that he is working on the last update or tell the other guys that he is waiting for third guy's last update. Anyway, at some point after several exchanges, due to one of participants unintentional error THE LAST VERSION of the file eventually gets mixed up. From this point everyone searches for the last version BY LOOKING THE CONTENT of the file.
This client wants to have a central location (he has actually, that is his PC's some location) and let everybody (including himself) copy any new or suspected new file to this location but prevent file's last version being copied. From this location he has to easily copy, send or open the file and work.
So, here is my concept (2 steps):
step 1: I made an ad to the main application where this file is created or edited. This ad prompts the user to give a version number to the file with every invoked save command from the editing application. In fact the file can be re-saved multiple times but not considered modified (file attributes creation, save etc. do not have great meaning here). This said the user can cancel my ad-in but have saved the file, not saving a new file version.
step 2: multiple solutions:
solution A: I'm thinking to have a folder/file watch and prevent the last version of the file being overwritten. As you know, FileSystemWatcher will fire the change/delete etc., events AFTER FACT so, I have to back copy overwritten file after the fact (w/ some tricks).
solution B: have a database to store all version of files and built-in some shell extension to extract/view files from the database. Move all copied/pasted files to the database (my program folder) and restore latest file in working folder after watcher fires change/delete event.
solution 3: find out built-in windows tools (API etc.) to greatly rely on it with some programming.
Any ideas?
Thanks in advance.