Query possibilities: GemFire vs. Redis

It seems Redis can provide the same functionality that we get from GemFire OQL queries, but only with a lot of code changes when using Spring/Spring Boot/Java. Any insights would be highly appreciated!
Storing data - GemFire can store a parent object and use its OQL capability to query deeply nested objects, without doing anything special at write time. In Redis, by contrast, data has to be stored according to the querying needs: if we expect a "find x in xList" kind of capability, xList has to be stored as a separate Set/List in addition to the parent object.
Querying data - Storing custom objects in Redis requires appropriate serializers, which effectively binds each RedisTemplate instance to a data type. In a huge codebase, where hundreds of different objects need to be stored and fetched, this doesn't appear very maintainable.
Is this the only way to store and fetch custom objects in Redis when queries are complex and the integration uses Spring Boot? RedisRepository doesn't fit the bill for complex queries.
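For what it's worth, the storage layout being described is language-neutral. Below is a minimal sketch of the "parent blob plus separate Set" layout, written in C# with StackExchange.Redis to match the other snippets on this page (the Parent class, key names and property names are all invented for illustration; the same layout applies with Spring Data Redis on the Java side):
// Sketch only: store the parent object as an opaque JSON blob, and duplicate
// xList into a Redis Set so "find x in xList" becomes a single SISMEMBER call.
using System.Collections.Generic;
using System.Text.Json;
using StackExchange.Redis;
public class Parent
{
    public string Id { get; set; }
    public List<string> XList { get; set; }
}
public static class ParentStore
{
    public static void Save(IDatabase db, Parent parent)
    {
        db.StringSet($"parent:{parent.Id}", JsonSerializer.Serialize(parent));
        foreach (var x in parent.XList)
            db.SetAdd($"parent:{parent.Id}:xlist", x);
    }
    public static bool ContainsX(IDatabase db, string parentId, string x)
        => db.SetContains($"parent:{parentId}:xlist", x);
}
The cost, as noted above, is that every write has to maintain both representations, which is exactly the kind of query-driven modelling that GemFire spares you.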

Related

What design patterns exist for marshalling JSON APIs to/from SQL?

I'm working on my first JSON-RPC/JSON-REST API. One of the conveniences of JSON is that it can easily represent structured data (a user may have multiple email addresses, multiple postal addresses, and so on).
For example, the Facebook Graph API nicely represents the kind of thing that's handy to return as JSON objects:
https://fbcdn-dragon-a.akamaihd.net/hphotos-ak-ash3/851559_339008529558010_1864655268_n.png
However, in implementing an API such as this with a relational database, we end up shattering structured objects into very many tables (at least one for each list in the JSON object), and un-shattering them when responding to requests. So:
requires a lot of modelling (separate models for JSON object and SQL tables).
inconsistencies creep in between the models: e.g. user_id (in SQL) vs. userID (in JSON)
marshaling stuff between one model and the other is very time consuming (tedious, error-prone and pointless boilerplate).
What design-patterns exist to help in this situation?
I'm not sure you are looking for design patterns. I would look for tools that handle this better.
I assume that you want to be able to query these objects, and not just store them in TEXT fields. Many databases support XML fairly well, so I would convert the JSON to XML (with a library) and then store that in the database.
You may also want to consider a JSON document based database. That will definitely get you where you want to go.
If you don't need to be able to query these, or only need to query a very small subset of fields, just store the objects as text, and extract those query-able fields into actual columns. This way you don't need to touch the majority of the data, but you can still query the few fields you care about. (Plus you can index them for speedier lookup.)
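As a rough illustration of that last approach (the Users table, the User class and the use of Json.NET with System.Data.SqlClient are all assumptions here), the object goes in whole as JSON text while the one or two fields you filter on get real, indexable columns:
// Sketch: persist the full object as a JSON text column, plus extracted query-able columns.
using System.Collections.Generic;
using System.Data.SqlClient;
using Newtonsoft.Json;
public class User
{
    public int Id { get; set; }
    public string Email { get; set; }
    public string Country { get; set; }
    public List<string> OtherEmails { get; set; }   // lives only inside the JSON payload
}
public static class UserStore
{
    public static void Save(SqlConnection conn, User user)
    {
        const string sql = "INSERT INTO Users (Id, Email, Country, Payload) " +
                           "VALUES (@id, @email, @country, @payload)";
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@id", user.Id);
            cmd.Parameters.AddWithValue("@email", user.Email);       // extracted, queryable, indexable
            cmd.Parameters.AddWithValue("@country", user.Country);   // extracted, queryable, indexable
            cmd.Parameters.AddWithValue("@payload", JsonConvert.SerializeObject(user));
            cmd.ExecuteNonQuery();
        }
    }
}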
I have always chosen to implement this functionality in a facade pattern. Since the point of the facade is to simplify (abstract) an underlying complexity as a boundary between two or more systems, it seemed like the perfect place to handle this.
I realize however that this does not quite answer the question. I am talking about the container for the marshalling while the question is about how to better manage the contents (the code that does the job).
My approach here is somewhat old fashioned, but since this is an old question maybe that's okay. I employ (as much as possible) stored procedures in the DB. This promotes better encapsulation than one typically finds with a code layer outside the DB that has to "know about" the DB structure. What inevitably happens in the latter case is that more than one system gets written to do this (one large company I worked at had at least 6 competing ESBs) and there will be conflicts. Also, stored procedure scripting usually benefits from some sort of IDE that helps maintain contextual awareness of the DB structure.
So this approach - even though it is not a pattern per se - makes managing the ORM a lot easier.

WCF data serialization: can it go faster?

This question is sort of a sequel to that question.
When we want to build a WCF service which works with some kind of data, it's natural to want it to be fast and efficient. To achieve that, we have to make sure every segment of the data's round trip works as fast as it can, from the data storage back-end (such as SQL Server) to the WCF client that requested the data.
While seeking for an answer on that previous question, we have learned, thanks to Slauma and others who contributed through comments, that the time consuming part of Entity Framework's (first) large query is object materialization and attaching entities to the context when the result from the database is returned. We have seen that everything works much faster on subsequent queries.
Assuming those large queries are used as read-only operations, we came to the conclusion that we could set EF's MergeOption to NoTracking, yielding better first-query performance. With NoTracking we are telling EF to create a separate object for each record retrieved from the database, even when records share the same key. This causes additional processing if we have .Include() statements in our query, and leads to a much larger volume of data being returned.
The data may be so big that we could easily ask ourselves whether we really helped our cause by using NoTracking, even if we made the query faster - and maybe only the first query at that, depending on the number of .Include() statements, because subsequent queries with multiple .Include() statements but without NoTracking run faster, simply because NoTracking causes many more objects to be created when data returns from the server.
The biggest problem is how to efficiently serialize this amount of data - and deserialize it on the client. With serialization already as slow as it is (I am using DataContractSerializer with PreserveObjectReferences set to true, because I am sending EF 4.x generated POCOs to my client and vice versa), do we really want to generate even more data (thanks to NoTracking)? To be honest, I have yet to see the data produced by the NoTracking query - ~11.000 objects, not counting the navigation properties obtained via .Include() - arrive at the client side at all. The last time I tried to pull this off, the 00:10:00 timeout was triggered (!)
So if you are still reading this wall of text, tell me how to solve this situation. Which serializer should I use in order to achieve acceptable results? Currently, if I don't use the NoTracking option, the serialization, transport and deserialization of ~11.000 objects via a wsHttpBinding-like custom binding on the local machine take ~5 seconds. What's scary to me is that this large table is most likely going to contain ~500.000 records eventually.
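For reference, the serializer configuration described in the question is presumably something along these lines (the Person type is a placeholder, and in WCF these settings would normally be applied through a DataContractSerializerOperationBehavior rather than by constructing the serializer directly):
// Sketch of a DataContractSerializer with PreserveObjectReferences enabled, as described above.
using System.Collections.Generic;
using System.Runtime.Serialization;
var serializer = new DataContractSerializer(
    typeof(List<Person>),
    new DataContractSerializerSettings
    {
        PreserveObjectReferences = true,        // keep shared references in the EF object graph intact
        MaxItemsInObjectGraph = int.MaxValue    // WCF's default quota for this is 65536 items
    });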
Have you considered creating a view model for your object and doing a projection in the select statement? That should be a lot faster. For example:
var result = from person in DB.Entities.Persons
.Include("District")
.Include("District.City")
.Include("District.City.State")
.Include("Nationality")
select new PersonViewModel()
{
Name = person.Name,
City = person.District.City.Name,
State = person.District.City.State.Name,
Nationality = person.Nationality.Name
};
This requires you to create a ViewModel class, PersonViewModel, to hold the flattened data.
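A minimal sketch of such a class (the property types are assumed here; they should match whatever the projection above actually selects):
// Flattened, serialization-friendly shape: scalar properties only, no entity graph.
public class PersonViewModel
{
    public string Name { get; set; }
    public string City { get; set; }
    public string State { get; set; }
    public string Nationality { get; set; }
}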
You might be able to further speed up things by creating a database view and letting Entity Framework select directly from there.
If you really want the front-end to populate a grid with 500.000 records, then I'd remove the web service layer altogether and use a DataReader to speed up the process. Entity Framework and WCF aren't suited to transforming that amount of data with adequate performance. What you're basically doing here is:
Database -> TDS -> .NET objects -> XML -> Plain text -> XML -> .NET Objects -> UI
While this could easily be reduced to:
Database -> TDS -> UI
Then use Entity Framework to handle the changes to the entities in your business logic. This is in line with the Command-Query Separation pattern: use a technology suited to high-performance querying of data and link that directly to your app, then use a command strategy to implement your business logic.
OData services might also provide a better way to link your UI directly to the data, as it can be used to quickly query your data allowing you to implement quick filtering without the user really noticing.
If the security settings prohibit direct querying through OData or direct access to the SQL database, consider materializing the objects yourself. Select the data directly from either a view or a query and use an IDataReader to populate your ViewModel directly. That will probably give you the highest performance.
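A sketch of that manual materialization (the view name dbo.PersonOverview is invented, and PersonViewModel is the flattened class from the answer above):
// Sketch: populate ViewModels straight from an IDataReader, bypassing EF materialization entirely.
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
public static List<PersonViewModel> LoadPersons(string connectionString)
{
    var result = new List<PersonViewModel>();
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("SELECT Name, City, State, Nationality FROM dbo.PersonOverview", conn))
    {
        conn.Open();
        using (IDataReader reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                result.Add(new PersonViewModel
                {
                    Name = reader.GetString(0),
                    City = reader.GetString(1),
                    State = reader.GetString(2),
                    Nationality = reader.GetString(3)
                });
            }
        }
    }
    return result;
}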
There are a lot of alternatives to Entity Framework created especially because EF isn't cut out for large datasets; see FluentData, Dapper, Massive or PetaPoco. You might want to use these side by side with Entity Framework to handle your large, flat data queries.
I use Json.Net's implementation of Bson in my RIA application. More info here.
I yield return an IEnumerable, as I read from the database and serialize the rows. I find the speed to be acceptable and I return Entities with roughly 20 properties. This approach should minimize the concurrent memory use on the server.
Based on what I have gathered by looking at various reviews and performance benchmarks, I would choose protobuf-net as a serializer. It's just a matter of design whether it can be plugged into my service configuration. More info about that here.
Although not completely an answer to this question, jessehouwing had the best answer and I am marking it as accepted.

How to go from full SQL querying to something like NoSQL?

In one of my processes I have a SQL query that takes 10-20% of the total execution time. This query filters my database and loads a list of PricingGrid objects.
So I want to improve its performance.
So far I have come up with two possible solutions:
1. Use a NoSQL solution; AFAIK these are good at improving read performance.
But the migration seems hard and needs a lot of work (like importing the data from SQL Server into NoSQL on a regular basis).
I don't have any NoSQL knowledge; I don't even know which one I should use (the first I'd try is RavenDB, because I follow Ayende and it's built by the .NET community).
I might have some things to change in my model to make my objects suitable for a NoSQL database.
2. Load all my PricingGrid objects in memory (in a static IEnumerable).
This might be a problem if my server doesn't have enough memory to load everything.
I might end up reinventing the wheel (indexes, ...) that the NoSQL providers have already built.
I think I'm not the first one wondering about this, so what would be the best solution? Are there any tools that could help me?
.net 3.5, SQL Server 2005, windows server 2005
Migrating your data from SQL is only the first step.
Moving to a document store (like RavenDB or MongoDB) also means that you need to:
Denormalize your data
Perform schema validation in your code
Handle concurrency of complex operations in your code since you no longer have transactions (at least not the same way)
Perform rollbacks in the event of partial commits (changes)
Depending on your updates, reads and network model you might also need to handle conflicts
You provided very limited information but it sounds like your needs include a single database server and that your data fits well in the relational model.
In such a case I would vote against a NoSQL solution, it is more likely that you can speed up your queries with database optimizations and still retain all the added value of a RDBMS.
Non-relational databases are tools for a specific job (no matter how they sell them), if you need them it is usually because your data doesn't fit well in the relational model or if you have a need to distribute your data over multiple machines (size or availability). For instance, I use MongoDB for a write-intensive high throughput job management application. It is centralized and the data is very transient so the "cost" of having low durability is acceptable. This doesn't sound like the case for you.
If you prefer to use a NoSQL-style solution, perhaps you should try Memcached+MySQL (InnoDB). This will give you the speed benefits of an in-memory cache (in the form of a memcached daemon plugin) with the underlying protection and capabilities of an RDBMS (MySQL). It should also ease data migration and somewhat reduce the amount of changes required in your code.
I myself have never used it, I find that I either need NoSQL for the reasons I stated above or that I can optimize the RDBMS using stored procedures, indexes and table views in a way which is sufficient for my needs.
Asaf has provided great information in regards to the usage of NoSQL and when it is most appropriate. Given that your main concern was performance, I would tend to agree with his opinion - it would take you much more time and effort to adopt a completely new (and very different) data persistence platform than it would to trick out your SQL Server cluster. That said, my answer is mainly to address the "how" part of your question.
Addressing misunderstandings:
Denormalizing Data - You do not need to manually denormalize your existing data. This will be done for you when it is migrated over. More than anything you need to simply think about your data in a different fashion - root aggregates, entity and value types, etc.
Concurrency/Transactions - Transactions are possible in both Mongo and Raven, they are simply done in a different fashion. One of the inherent ways Raven does this is by using an ORM-like "unit of work" pattern with its RavenSession objects. Yes, your data validation needs to be done in code, but you already should be doing it there anyway. In my experience this is an over-hyped con.
How:
Install Raven or Mongo on a primary server, run it as a service.
Create or extend an existing application that uses the database you intend to port. This application needs all the model classes/libraries that your SQL database provides persistence for.
a. In your "data layer" you likely have a repository class somewhere. Extract an interface from this, and use it to build another repository class for your Raven/Mongo persistence. Both DBs have plenty of good documentation for using their APIs to push/pull/update changes in the document graphs. It's pretty damn simple.
b. Load your SQL data into C# objects in memory. Pull back your top-level objects (just the entities) and load their inner collections and related data into memory. Your repository is probably already doing this (e.g. when fetching an Order object, making sure not only its properties but also associated collections like Items are loaded into memory).
c. Instantiate your Raven/Mongo repository and push the data to it. Primary entities become "top level documents" or "root aggregates" serialized in JSON, with their collections' data nested within. Save changes and close the repository. Note: you may break this step down into as many little pieces as your data requires (a minimal sketch of this step appears after these steps).
Once your data is migrated, play around with it and make sure you are satisfied. You may want to modify your application models a little to adjust the way they are persisted to Raven/Mongo - for instance, you may want to make both Orders and Items top-level documents and simply use reference values (much like relationships in RDBMS systems). Watch out here though, as doing so somewhat goes against the principle of, and the performance behind, NoSQL: now you have to hit the DB twice to get an Order and its Items.
If satisfied, shard/replicate your mongo/raven servers across your remaining available server boxes.
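A minimal sketch of step (c) using the RavenDB client (the store URL, database name, and the Order type and ordersLoadedFromSql collection from step (b) are all assumptions; the MongoDB driver equivalent is similar):
// Sketch: push already-loaded root aggregates into RavenDB as top-level documents.
using System.Collections.Generic;
using Raven.Client.Document;
public static void PushToRaven(IEnumerable<Order> ordersLoadedFromSql)
{
    var store = new DocumentStore { Url = "http://localhost:8080", DefaultDatabase = "Migrated" };
    store.Initialize();
    using (var session = store.OpenSession())
    {
        foreach (var order in ordersLoadedFromSql)   // orders with their Items collections populated in step (b)
            session.Store(order);                    // serialized to JSON; Items end up nested inside the document
        session.SaveChanges();                       // commits the batch in one round trip
    }
}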
Obviously there are tons of little details I did not explain, but that is the general process, and much of it depends on the applications already consuming the database and may be tricky if more than one app/system talks to it.
Lastly, just to reiterate what Asaf said... learn as much as you can about NoSQL and its best use cases. It is an amazing tool, but not a golden solution for all data persistence. In your case, try to really find the bottlenecks in your current solution and see if they are solvable. As one of my systems guys says, "technology for technology's sake is bullshit".

Why does Redis use integer database numbers?

Why does Redis use integer database numbers instead of strings? It seems like it would be trivial to keep a small internal data structure which maps strings to the “actual” integer.
The reason Redis uses indexes rather than strings as DB names is that the goal (and ability) of Redis databases is not to provide an outer level of dictionary: Redis databases can't scale to a large number, only to a small number (it is a tradeoff), nor do we want to provide nested data structures by design. So these are just "a few namespaces", and as a result a small numerical index seemed like the best option.
Having named databases doesn't really fit the design goals of redis. For a start, in a system designed for maximum performance, adding a string lookup to every call isn't a great idea when most users put everything in DB 0 anyway.
Another of the design goals is keeping the core simple - if a requested new command can be implemented by combining existing commands on the client without a huge performance penalty, it won't get added to the core system. If you really need named databases, it is trivial to update your client code to read a string and send a number to Redis.
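For example, a tiny client-side shim is enough (shown here with C# and StackExchange.Redis; the names and numbers are made up):
// Sketch: map friendly database names to Redis's numeric databases on the client.
using System.Collections.Generic;
using StackExchange.Redis;
var dbNames = new Dictionary<string, int> { ["sessions"] = 0, ["cache"] = 1, ["queues"] = 2 };
var mux = ConnectionMultiplexer.Connect("localhost");
IDatabase cache = mux.GetDatabase(dbNames["cache"]);   // effectively SELECT 1
cache.StringSet("greeting", "hello");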

Object serialization practical uses?

How many of the software projects you have worked on used object serialization? I personally have never come across a scenario where object serialization was used. One use case I can think of is server software storing objects to disk to save memory. Are there other types of software where object serialization is essential or preferred over a database?
I've used object serialization in a lot of my projects. Sometimes we use it to store computer-specific settings locally. I have also used XML serialization to simplify interaction and generation of XML documents. It is also very beneficial in communication protocols. Serialize on one end and re-inflate on the other end.
Well, converting objects to XML or JSON is a form of serialization that is quite common on the web. I've also worked on a project where objects were created and serialized to a binary file in one application and then imported into another custom application (though that's fragile since it uses C# and serialization has broken in the past between versions of the .NET framework). Also, application settings that have a complex structure may be useful to serialize. I also think remoting APIs use serialization to communicate. Basically, serialization in general is simply a way to store the states of your objects, and this has many different uses.
Here are a few uses I can think of:
Sending an object across the network; the most common example is serializing objects across a cluster.
Serializing an object for (a sort of) caching, i.e. saving the state to a file and reading it back later.
Serializing passive/huge data to a file to minimize memory consumption, and reading it back whenever required.
I'm using serialization to pass objects across a TCP socket. You put XmlSerializers on either side, and it parses your data into readily available objects. If you do a little ground work, you can get it so that you're basically passing objects back and forth, and it makes socket communication extremely easy, reducing it to nothing more than socket.Send(myObject);.
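That ground work might look roughly like this (a sketch; the message type is a placeholder, and a real implementation would add length-prefix framing so the reader knows where one object ends and the next begins):
// Sketch: serialize an object onto the NetworkStream on one side, deserialize it on the other.
using System.Net.Sockets;
using System.Xml.Serialization;
public static class XmlSocket
{
    public static void SendObject<T>(NetworkStream stream, T message)
    {
        new XmlSerializer(typeof(T)).Serialize(stream, message);
    }
    public static T ReceiveObject<T>(NetworkStream stream)
    {
        return (T)new XmlSerializer(typeof(T)).Deserialize(stream);
    }
}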
Interprocess communication is a biggie.
You can combine a DB and serialization, for example when you have to store an object with a lot of attributes (often dynamic, i.e. one object's attribute set will differ from another's) in a relational DB and you don't want to create a new column for each attribute.
We started out with a system that serialized all of the thousands of in-memory objects to disk every 15 minutes or so. When that started taking too long we switched over to a mixed mode of saving the objects into a relational db and pickle file (this was a python system btw). Eventually the majority of the data was stored in a relational database. Interestingly, the system was written in such a way that all of the application code couldn't care less what was going on down there. It was all done using XP and thousands of automated tests.
Document based applications such as word processors and vector graphics editors will often serialize the document model to disk when the user invokes the Save command. Serialization is often preferred over complex databases in these apps.
Using serialization saves you time each time you want to implement an import/export functionality.
Every time you need to export your system's data, create backups or store some kind of settings, you could use serialization instead and just save the state of the objects that represent the actual config, data or whatever else.
Only when you need a specific format for the exported/imported data does it make sense to build a custom parser and exporter/importer.
Serialization is also change-tolerant: whenever you change the format of the objects involved in the exchange functionality, they remain exportable automatically and you don't have to change the logic behind your export/import code.
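For instance, an export/import of application settings can be as small as this (the AppSettings type, the GetCurrentSettings helper and the file name are placeholders; Json.NET is assumed):
// Sketch: export = serialize the object graph to a file, import = deserialize it back.
using System.IO;
using Newtonsoft.Json;
AppSettings settings = GetCurrentSettings();   // hypothetical: whatever object holds your current configuration
File.WriteAllText("settings-export.json", JsonConvert.SerializeObject(settings, Formatting.Indented));
var restored = JsonConvert.DeserializeObject<AppSettings>(File.ReadAllText("settings-export.json"));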
We used it for a backup & update functionality. It was basically serialized Hibernate objects being backed up; then the DB schema is altered through the update, and we delivered a helper class that "converted" the old objects to the new DB schema. This way we had a pretty solid update mechanism that wouldn't break easily and does an automatic backup at the same time.
I've used XML serialization heavily on one project. The technique was used to persist to the database data structures that had no common shape, so the data couldn't be stored directly in columns. I also used serialization for application settings that could be changed at runtime.