Apache Jena TDB files locked after creation with web application - Glassfish

I am using Jena to create a triple store (TDB functionality) with the following code:
public void createTDBFromOWL() {
    Dataset dataset = TDBFactory.createDataset(newTripleStoreLocation);
    dataset.begin(ReadWrite.WRITE);
    try {
        // getting the model inside the transaction
        Model model = dataset.getDefaultModel();
        FileManager fileManager = FileManager.get();
        Model holder = fileManager.readModel(model, newOWLFileLocation);
        // committing dataset
        dataset.commit();
        model.close();
        holder.close();
    } finally {
        dataset.end();
        dataset.close();
    }
}
After I create the triple store, the files created are locked by my application server (Glassfish), and I can't delete them until I manually stop Glassfish and it releases its lock. As shown in the above code, I think I am closing everything, so I don't get why a lock is maintained on the files.

When you call Dataset#close(), the implementation delegates that call to an underlying DatasetGraphBase#close(), which then ultimately delegates to DatasetGraphTDB#_close().
This results in calls to TripleTable#close() and QuadTable#close(). Both of these call (several) NodeTupleTable#close(). Continuing with the indirection, this calls NodeTable#close() and TupleTable#close(). The former is an interface, so we'd need to make a proper guess as to which class runs in your implementation. The latter iterates through a collection of TupleIndex objects and calls close() on each of them. TupleIndex is, also, an interface.
There is only one meaningful hierarchy of descendants from TupleIndex that results in something which can lock a file, which leads us to TupleIndexRecord#close(). We can then follow a particular implementation of RangeIndex called BPlusTree all the way down until we see actual ownership of the MappedByteBuffer.
Ultimately, while reading the implementation of BlockAccessMapped#close(), it seems like the entire hierarchy is closing things properly, down to the final classes, but that this longstanding bug may be the culprit. From the documentation:
once a file has been mapped a number of operations on that file will fail until the mapping has been released (e.g. delete, truncating to a size less than the mapped area). However the programmer can't control accurately the time at which the unmapping takes place --- typically it depends on the processing of finalization or a PhantomReference queue.
So there you have it. Despite Jena's best efforts, one cannot yet control when that file will be unmapped in Java. This ends up being the tradeoff for memory-mapped file I/O in Java.
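To see the underlying limitation in isolation, here is a minimal, hedged sketch (the file name is hypothetical, and behaviour is OS-dependent: on Windows the delete typically fails while the mapping is alive, while on Linux it may succeed):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

public class MappedDeleteDemo {
    public static void main(String[] args) throws Exception {
        Path path = Path.of("mapped-demo.dat"); // hypothetical scratch file
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw");
             FileChannel channel = raf.getChannel()) {
            MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, 1024);
            buf.put(0, (byte) 42); // touch the mapping
        } // the channel and file are closed here, but the mapping itself is still live
        // There is no supported API to unmap explicitly; the buffer is only
        // unmapped once it has been garbage collected.
        boolean deleted = path.toFile().delete();
        System.out.println("deleted = " + deleted);
    }
}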


Static instances in a class in a distributed system

I was reading blog posts about "What about the code? Why does the code need to change when it has to run on multiple machines?" and came across one line that I am not getting. Can anyone please help me understand it with a simple example?
"There should be no static instances in the class. Static instances hold application data and when a particular server goes down, all the static data/state is lost. The app is left in an inconsistent state."
Assuming: a static instance is an instance which can exist at most once per process or context - e.g. in Java there is at most one copy of a static class, with all the data (or state) that the class contains.
So the memory model for a static class in a single node/JVM/process is very simple. Since there is a single copy of the data, it is quite straightforward to reason about it. For example, one may update the data and every subsequent reader will see the updated information. This is a bit more complicated for multithreaded programs, but it is still straightforward compared to distributed systems.
Clearly, in a distributed system every node may have at most one static class with state. Which means that if a system contains several nodes - a distributed system - then there are several copies of the data.
Having several copies is a problem. It is hard to reason about such a system: every node may hold some unique data, and the data may differ from node to node. How is it synced? Availability vs. consistency?
For example, take a simple counter. In a single-node system, a static instance may keep the score. If one writer increases the counter, the next reader will see the increased value (assuming the multithreaded part is implemented correctly, which is not that complicated).
The same counter in a distributed system is much more complicated: a writer may write to one node, but a reader may read from another, as the sketch below illustrates.
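A minimal Java sketch of the issue (the class name is hypothetical): each JVM that loads this class gets its own copy of the static field, so two nodes will happily report conflicting counts.

public class HitCounter {
    // One copy of this field exists per JVM, not per system.
    private static int count = 0;

    public static synchronized int increment() {
        return ++count;
    }
}

// Node A: HitCounter.increment() -> returns 1
// Node B: HitCounter.increment() -> returns 1 (its own copy, unaware of node A)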
Basically, having state on nodes is a hard problem to solve. This is the primary reason to use a distributed storage layer, e.g. HBase, Cassandra, or AWS DynamoDB. All of these storage systems have predictable behaviour, which helps in reasoning about the correctness of programs.
For example, suppose there are just two servers which accept payments from clients.
Then somebody decided to create a static class to be friendly with multithreading:
public static class Payment
{
    public static decimal Amount;
    public static bool IsMoneyReceived;
    public static string UserName;
}
Then some client, let's call him John, decided to buy something in the shop. John sent money, and the static class holds the data about this purchase. Some service is going to write the data from the Payment class into the database; however, the electricity was turned off. The load balancer knows that the server is not responding and redirects John's requests to another server, which knows nothing about the data in the Payment class.

Proper logging in a reactive application - WebFlux

Lately I have been thinking about the proper use of loggers in our applications.
For example, I have a controller which returns a stream of users, but in the log I see that "Fetch users" is logged by a different thread than the one on the processing pipeline. Is this a good approach?
@Slf4j
class AwesomeController {
    @GetMapping(path = "/users")
    public Flux<User> getUsers() {
        log.info("Fetch users..");
        return Flux.just(...).subscribeOn(Schedulers.newParallel("my-custom"));
    }
}
In this case two threads are used, which from my perspective is not a good option, but I can't find any good practices for loggers in reactive applications. I think the approach below is better, because the memory allocation happens on the processing thread rather than on the Spring WebFlux thread, which the logger could potentially block.
@GetMapping(path = "/users")
public Flux<User> getUsers() {
    return Flux.defer(() -> {
        return Mono.fromCallable(() -> {
            log.info("Fetch users..");
            .....
        });
    }).subscribeOn(Schedulers.newParallel("my-custom"));
}
The normal thing to do would be to configure the logger as asynchronous (this usually has to be explicit, as per the comments, but all modern logging frameworks support it) and then just include it "normally" (either as a separate line as you have there, or in a side-effect method such as doOnNext() if you want it halfway through the reactive chain).
In any case, I can't see a use case for your second example there - it makes the code rather difficult to follow with no real advantage. If you want to be sure that the logger's call isn't blocking, then use BlockHound to make sure (this is never a bad idea anyway); a minimal setup sketch follows.
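For reference, a hedged sketch of enabling BlockHound (assuming the io.projectreactor.tools:blockhound artifact is on the classpath); it is typically installed once at startup or in test setup:

import reactor.blockhound.BlockHound;

public class Main {
    public static void main(String[] args) {
        // After install(), BlockHound throws an error whenever a blocking
        // call happens on a thread marked as non-blocking.
        BlockHound.install();
        // ... start the application as usual
    }
}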
One final thing to watch out for - remember that if you include the logging statement separately as you have above, rather than as part of the reactive chain, then it'll execute at call time (when the method is invoked) rather than at subscription time. That may not matter in scenarios like this where the two happen near simultaneously, but it would be rather confusing if (for example) you're returning a publisher which may be subscribed to multiple times - in that case, you'd only ever see the "Fetch users.." statement once, which isn't obvious when glancing through the code.
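As an illustration, a hedged sketch of logging as part of the chain (userService.findAll() is a hypothetical reactive source), so the statement fires once per subscription rather than once per method call:

@GetMapping(path = "/users")
public Flux<User> getUsers() {
    return userService.findAll()                              // hypothetical reactive source
            .doOnSubscribe(s -> log.info("Fetch users.."))    // runs on each subscription
            .doOnNext(u -> log.debug("Emitting user {}", u)); // side-effect logging per element
}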

Command Pattern clarification

I cannot see any of the Command pattern classes (e.g. invoker, receiver) in the accepted answer of the following link: Long list of if statements in Java. I have gone with the accepted answer to solve my 30+ if/else statements.
I have one repository to which I am trying to pass DTOs to save to the database. I want the repository to invoke the correct save method for each DTO, so I am checking the instance type at runtime.
Here is the implementation in the Repository:
private Map<Class<?>, Command> commandMap;

public void setCommandMap(Map<Class<?>, Command> commandMap) {
    this.commandMap = commandMap;
}
and a method that will populate the commandMap
commandMap.put(Address.class, new CommandAddress());
commandMap.put(Animal.class, new CommandAnimal());
commandMap.put(Client.class, new CommandClient());
and finally the method that saves
public void getValue() {
    commandMap.get(these.get(0).getClass()).save();
}
The service class that uses the Repo registers the commandMap.
Does the accepted answer represent a sort of (approximate) implementation of the Command pattern?
It seems like an enum that implements an exec interface would eliminate your if/else problem or turn it into a switch; a sketch follows.
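A minimal sketch of that idea (the interface and enum names are hypothetical), using the standard "enum implements interface" pattern where each constant supplies its own body:

interface Exec {
    void save(Object dto);
}

enum SaveCommand implements Exec {
    ADDRESS { public void save(Object dto) { /* persist an Address */ } },
    ANIMAL  { public void save(Object dto) { /* persist an Animal */ } },
    CLIENT  { public void save(Object dto) { /* persist a Client */ } };
}

// Dispatch without an if/else chain, e.g. via a Map<Class<?>, Exec>
// built once: commandMap.get(dto.getClass()).save(dto);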
It does not look like you need the Command pattern.
The GoF says:
Use the Command pattern when you want to
parameterize objects by an action to perform, as MenuItem objects did above. You can express such parameterization in a procedural language with a callback function, that is, a function that's registered somewhere to be called at a later point. Commands are an object-oriented replacement for callbacks.
specify, queue, and execute requests at different times. A Command object can have a lifetime independent of the original request. If the receiver of a request can be represented in an address space-independent way, then you can transfer a command object for the request to a different process and fulfill the request there.
support undo. The Command's Execute operation can store state for reversing its effects in the command itself. The Command interface must have an added Unexecute operation that reverses the effects of a previous call to Execute. Executed commands are stored in a history list. Unlimited-level undo and redo is achieved by traversing this list backwards and forwards calling Unexecute and Execute, respectively.
support logging changes so that they can be reapplied in case of a system crash. By augmenting the Command interface with load and store operations, you can keep a persistent log of changes. Recovering from a crash involves reloading logged commands from disk and reexecuting them with the Execute operation.
structure a system around high-level operations built on primitive operations. Such a structure is common in information systems that support transactions. A transaction encapsulates a set of changes to data. The Command pattern offers a way to model transactions. Commands have a common interface, letting you invoke all transactions the same way. The pattern also makes it easy to extend the system with new transactions.
Which of the above do you want to do?

Keeping SAP's RFC data for consecutive calls of an RFC using JCo

I was wondering if it is possible to keep an RFC function called via JCo open in SAP memory so I can cache data there. This is the scenario I have in mind:
Suppose a simple function increments a number. The function starts with 0, so the first time I call it with import parameter 1 it should return 1.
The second time I call it, it should return 2 and so on.
Is this possible with JCO?
If I have the function object and make two successive calls, it always returns 1.
Can I do what I'm depicting?
Designing an application around the stability of a certain connection is almost never a good idea (unless you're building stability monitoring software). Build your software so that it just works, no matter how often the connection is closed and re-opened, and no matter how often the session is initialized and destroyed on the server side. You may want to persist some state in the database, or you may need to (or want to) use the shared memory mechanisms provided by the system. All of this is inconsequential for the RFC handling itself.
Note, however, that you may need to ensure that a sequence of calls happen in a single context or "business transaction". See this question and my answer for an example. These contexts are short-lived and allow for what you probably intended to get in the first place - just be aware that you should not design your application so that it has to keep these contexts alive for minutes or hours.
The answer is yes. In order to make it work, you need to implement two tasks:
The ABAP code needs to store its variable in the ABAP session memory. A variable in the function group's global section will do that. Or alternatively you could use the standard ABAP technique "EXPORT TO MEMORY/IMPORT FROM MEMORY".
JCo needs to keep the user session between calls. By default, JCo resets the backend-side user session after every call, which of course destroys all data stored in that user session memory. In order to prevent it, you need to use JCoContext.begin() and JCoContext.end() to get a stateful RFC connection that keeps the user session alive on backend side.
Sample code:

JCoDestination dest = ...
JCoFunction func = ...
try {
    JCoContext.begin(dest);
    func.execute(dest); // Will return "1"
    func.execute(dest); // Will return "2"
} catch (JCoException e) {
    // Handle network problems, ABAP exceptions, SYSTEM_FAILUREs
} finally {
    // Make sure to release the stateful connection, otherwise you have
    // a resource leak in your program and on the backend side!
    JCoContext.end(dest);
}
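For completeness, a hedged sketch of how the two objects elided above are typically obtained with JCo 3 (the destination name "ABAP_AS" and the function module name "Z_INCREMENT_COUNTER" are hypothetical; JCoException handling is omitted as in the sample above):

import com.sap.conn.jco.JCoDestination;
import com.sap.conn.jco.JCoDestinationManager;
import com.sap.conn.jco.JCoFunction;

// Looks up a destination configured via a DestinationDataProvider
// (e.g. an ABAP_AS.jcoDestination properties file).
JCoDestination dest = JCoDestinationManager.getDestination("ABAP_AS");

// Fetches the function module's metadata from the backend repository.
JCoFunction func = dest.getRepository().getFunction("Z_INCREMENT_COUNTER");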

Why are my Navigation Properties empty in Entity Framework 4?

The code:
public ChatMessage[] GetAllMessages(int chatRoomId)
{
    using (ChatModelContainer context = new ChatModelContainer(CS))
    {
        //var temp = context.ChatMessages.ToArray();
        ChatRoom cr = context.ChatRooms.FirstOrDefault(c => c.Id == chatRoomId);
        if (cr == null) return null;
        return cr.ChatMessages.ToArray();
    }
}
The problem:
The method (part of a WCF service) returns an empty array. If I uncomment the commented line, it starts working as expected. I have tried turning off lazy loading, but it didn't help.
Also, when it works, I get ChatMessages with a reference to the ChatRoom populated, but not the ChatParticipant. They are both referenced by the ChatMessage entity in the schema, with both an Id and a Navigation Property. The Ids are set and point to the right entities, but on the client side only the ChatRoom reference has been populated.
Related questions:
Is an array the preferred way to return collections of EF entities like this?
When making a change in my model (edmx), I'm required to run the "Generate Database from Model..." option before I can run context.CreateDatabase(). Why? I get some error message pointing to old SSDL, but I can't find where the SSDL is stored. Is it created when I run this "Generate Database..." option?
Is it safe to return entire entity graphs to the client? I've read a bit about "circular reference exceptions", but is this fixed in EF4?
How and when are references populated in EF4? If I have lazy loading turned on, I suspect only the entities I touch are populated? But with lazy loading turned off, should the entire graph always be populated then?
Are there any drawbacks to using self-tracking entities over ordinary entities in EF4? I don't need self-tracking right now, but I might later. Can I upgrade easily, or should I use self-tracking from the start?
Why can't I use entity keys of type string?
Each of your questions needs a separate answer, but I'll try to answer them as briefly as possible.
First of all, in the code sample you provided, you get a ChatRoom object and then try to access a related object that is not included in your query (ChatMessages). If lazy loading is turned off as you had suggested, then you will need the Include("ChatMessages") call in your query, so your LINQ query should look like this:
ChatRoom cr = context.ChatRooms.Include("ChatMessages").FirstOrDefault(c => c.Id == chatRoomId);
Please ensure that your connection string is in your config file as well.
For the related questions:
You can return collections in any way you choose - I have typically returned them as a List (and I think that's the common way), but you could use arrays if you want. To return a list, use the .ToList() method call on your query.
I don't understand what you're trying to do here - are you using code to create your database from your EDMX file or something? I've typically used a database-first approach, so I create my tables etc. and then update my EDMX from the database. Even if you generate your DB from your model, you shouldn't have to run CreateDatabase in code; you should be able to run the generated script against your DB. If you are using code-only, then you need to dump the EDMX file.
You can generally return entity graphs to the client; they should be handled OK.
EF4 should only populate what you need. If you use lazy loading, it will automatically load things that you did not include in your LINQ query when you reference them and execute the query (e.g. do a ToList() operation). This won't work so well if your client is across a physical boundary (e.g. a service boundary), obviously :) If you don't use lazy loading, it will load what you tell it to in your query, and that is all.
Self-tracking entities are used for n-tier apps, where objects have to be passed across physical boundaries (e.g. services). They come with an overhead of generated code for each object to keep track of its changes; they also generate POCO objects which are not dependent on EF4 (but obviously contain generated code that makes the tracked changes work with the EF4 tracker). I'd say it depends on your usage: if you're building a small app that's quite self-contained, and you don't really care about separation for testability without the infrastructure in place, then you don't need self-tracking entities. I say only use framework features when you need them, so if you're not writing an enterprise-scale application (enterprise doesn't have to be big, but something scalable, highly testable, of high quality, etc.), then there's no need to go for self-tracking POCOs.
I haven't tried, but you should be able to do that - that would be a candidate for a separate question if you can't get it to work :)