Will Redis Lua scripting increase read performance in this case?

Let's suppose we have a simplified online store with the following entities: products, related products, and categories. The relations are pretty simple:
Products are linked to categories
Products have related products
Product prices are region-specific
All entities are stored as JSON strings in Redis, something like:
Products (and related products): {"productId":1, "title": "Product", "relatedProductIds":[4,5,6], "categoryIds":[1,2,3]}
Product prices: {"price": 555.5, "region": "eu"}
Category: {"categoryId": 12, "title": "Category"}
Our task is to fetch the whole tree from Redis for some product ids: the products, their categories, their prices, their related products, the categories of the related products, and the related products' prices. Doing this from the client is okay, but it requires multiple round trips to Redis. The whole procedure could be described this way:
Fetch all required products and their prices and JSON-decode them. Now we know the list of the primary products' categories and the related product ids.
Fetch the categories, the related products, and the related products' prices, and JSON-decode them. Now we know the list of the related products' categories.
Fetch the related products' categories.
It works, and it works really fast, but could this be done with Redis Lua scripting? I'm planning to do this job in Lua, combine everything into one big JSON (or several small JSONs), and return it to the client. Is it worth doing, and why?
Thank you, and sorry for my English.

Yes, a Lua script should get you much better latency than a couple of network round trips.
Especially if you don't need to retrieve all the fields, this way you can reduce the data load on the network and return only the relevant fields.
BTW, you might want to check https://redisgears.io and https://redisjson.io in case you have more complex use cases that might require supporting Redis Cluster or more complex JSON documents.
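For illustration, here is a rough sketch of what such a script could look like, invoked from the ioredis Node client. The key layout (product:<id>, price:<id>:<region>, category:<id>) is an assumption, not something from the question, and building key names inside the script ties you to a single, non-clustered Redis node:

import Redis from "ioredis";

const redis = new Redis();

// Assumed key layout: product:<id>, price:<id>:<region>, category:<id>.
const FETCH_PRODUCT_TREE = `
local function getJson(key)
  local raw = redis.call('GET', key)
  if raw then return cjson.decode(raw) end
  return nil
end

local productId = ARGV[1]
local region = ARGV[2]

local product = getJson('product:' .. productId)
if not product then return nil end

local result = {
  product = product,
  price = getJson('price:' .. productId .. ':' .. region),
  categories = {},
  related = {}
}

for _, catId in ipairs(product.categoryIds or {}) do
  table.insert(result.categories, getJson('category:' .. catId))
end

for _, relId in ipairs(product.relatedProductIds or {}) do
  local rel = getJson('product:' .. relId)
  if rel then
    local entry = {
      product = rel,
      price = getJson('price:' .. relId .. ':' .. region),
      categories = {}
    }
    for _, catId in ipairs(rel.categoryIds or {}) do
      table.insert(entry.categories, getJson('category:' .. catId))
    end
    table.insert(result.related, entry)
  end
end

return cjson.encode(result)
`;

// One round trip: EVAL the script and get the whole tree back as a JSON string.
export async function fetchProductTree(productId: number, region: string) {
  const json = (await redis.eval(FETCH_PRODUCT_TREE, 0, String(productId), region)) as string | null;
  return json ? JSON.parse(json) : null;
}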

Lua script performance depends on what you do inside the script. Since Redis executes commands on a single thread, a long-running script will block other commands and slow down Redis's responses.
For your case I would recommend querying the different pieces of data directly from Redis and handling their relations in your application. That way Redis is used only for data storage, not for logic, which keeps it simple and fast.
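If you keep the logic in the application, pipelining (or MGET) still collapses each step of the procedure into a single round trip. A rough sketch with ioredis, reusing the assumed key layout from the script above:

import Redis from "ioredis";

const redis = new Redis();

// Step 1 of the procedure above: fetch N products and their prices in one
// round trip, then derive category / related-product ids from the result.
async function fetchProductsWithPrices(ids: number[], region: string) {
  const pipeline = redis.pipeline();
  for (const id of ids) {
    pipeline.get(`product:${id}`);          // assumed key layout
    pipeline.get(`price:${id}:${region}`);
  }
  const replies = (await pipeline.exec()) ?? [];   // [[err, value], ...]
  const parse = (i: number) => {
    const value = replies[i]?.[1] as string | null;
    return value ? JSON.parse(value) : null;
  };
  // Replies come back in request order: product, price, product, price, ...
  return ids.map((id, n) => ({ id, product: parse(2 * n), price: parse(2 * n + 1) }));
}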

Related

Synchronizing millions of products

My team and I are trying to update a huge number (millions) of products through an integration with our ERP.
We want to use the Sync API.
https://forum.shopware.com/t/sync-api-upsert-mit-productnumber-als-unique-key/68556
explains what we want to do. Our ERP system is not aware of the product id (UUID) from Shopware and only knows the product SKU. This leaves us having to do a product lookup in Shopware for each product number to get the product id, and only then update the product data.
Is there a workaround so we can upsert by product number, or any other good ideas on how we can speed things up?
Kind regards
One idea is to generate our own product UUID based on an MD5 hash of the product number. This way we would always know the UUID without having to do a lookup in the database.
There are several approaches to solve this. These are some that I use in my daily business.
API-only
If you can only work with the Sync API, then I would recommend building a separate storage just for product synchronization. It would contain the following information:
SKU
Shopware-UUID
hash
Every synchronized product will have an existing UUID; if there is no UUID, you know that the product has not been synchronized to Shopware yet. The hash itself can be used internally to check for changes in the data: if there is a change, only that specific entity gets updated. That way a simple and fast lookup is possible without the API (see the sketch below).
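A minimal sketch of how that storage could drive the decision of what to send, assuming a simple SKU/UUID/hash record (all names are invented for illustration):

import { createHash } from "crypto";

// Hypothetical shape of the local sync storage described above.
interface SyncEntry {
  sku: string;
  shopwareId: string | null; // null = never pushed to Shopware yet
  hash: string;              // hash of the last payload we sent
}

// Hash of the payload so unchanged products can be skipped without any
// API call. (For robustness, serialize with a fixed key order.)
function payloadHash(payload: Record<string, unknown>): string {
  return createHash("sha256").update(JSON.stringify(payload)).digest("hex");
}

export function needsSync(entry: SyncEntry | undefined, payload: Record<string, unknown>): boolean {
  if (!entry || !entry.shopwareId) return true;   // not synchronized yet
  return entry.hash !== payloadHash(payload);     // data changed since last sync
}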
Direct access to the db
This is not my recommendation if you do not have a team of developers. Direct access to the database gives you the fastest way to access that information. We actually do this for merchants with lots of product data, but you lose some of the benefits of the Sync API, such as the automatically queued events for updating entities in the catalog, prices, categories, etc. So I only recommend it for read access, which is what you want.
About the UUID
The approach you described is essentially a name-based UUID: UUID v3 uses an MD5 hash, UUID v5 uses SHA-1. Shopware uses UUID v4 internally, which is just a random 128-bit sequence, while v3/v5 derive the value deterministically from a name. So this is good practice.
Shopware's API only allows updates by the primary key/id, so updates by other (unique) fields are not supported, neither in the Sync API nor in the default CRUD API.
But providing the IDs for the products yourself instead of letting Shopware auto-generate them is totally valid, so your idea of generating the id based on the product number sounds reasonable and feasible to me.
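For example, a deterministic id could be derived from the SKU with a name-based UUID. A sketch using the npm uuid package; the namespace constant is an arbitrary value you would pick once and never change:

import { v5 as uuidv5 } from "uuid";

// Arbitrary, fixed namespace UUID: changing it would change every derived id.
const PRODUCT_NAMESPACE = "2f1d4c3e-9a7b-4c6d-8e5f-1a2b3c4d5e6f";

// Shopware 6 expects entity ids as 32-character hex strings,
// i.e. a UUID with the dashes removed.
export function productIdFromSku(sku: string): string {
  return uuidv5(sku, PRODUCT_NAMESPACE).replace(/-/g, "");
}

// productIdFromSku("SKU-1001") always yields the same id,
// so the ERP never needs a lookup before upserting.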
Another solution would be to store the mapping between the Shopware ids and the SKUs somewhere you can access quickly; this would probably mean storing the ids either directly in the ERP (in some form of custom field, etc.) or in a mapping service. But I would prefer having a fast and consistent way of regenerating the ids without the need to store the mapping.
You could use the Shopware CSV import, which has allowed mapping by product number for a while now. The CSV import can also be driven via the API.

Two shops and sync clients between them with passwords

Is it possible to sync customers between two separate PrestaShop 1.7 shops? I don't want to use the multistore option. Is there a module for that, or maybe some database operations?
Customers are stored in a single database table (ps_customer), so if you are able to write a synchronization routine between the two databases you should be able to achieve that.
There are several additional considerations, though:
Both stores must have the same "cookie_key" set in the site parameters for the same passwords to be valid in both shops, so you'll have to start with at least one empty store.
Customers are related to other data through their id_customer auto_increment values (addresses, orders, third-party modules, etc.), so you'll need to know what you're doing and make sure the two shops can't have conflicting customer ids (i.e., you can start one of the two shops with a very high id_customer). I'm also not sure whether you need to handle address synchronization as well; that would add some complexity.
I hope I've given you some good starting points, but I would stick with the native "multishop" PS feature for this; it would be far easier, despite still having a lot of bugs :)
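The "very high id_customer" trick mentioned above could be applied with a one-off statement like the following sketch (using the mysql2 Node driver; connection details and the offset value are placeholders):

import mysql from "mysql2/promise";

// One-off: give shop B its own id range so locally generated customer ids
// can never collide with ids copied over from shop A.
async function offsetCustomerIds(): Promise<void> {
  const shopB = await mysql.createConnection({
    host: "shop-b-db.example",   // placeholder connection details
    user: "sync",
    password: "secret",
    database: "prestashop",
  });
  await shopB.query("ALTER TABLE ps_customer AUTO_INCREMENT = 1000000");
  await shopB.end();
}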

New to Microservices - refactoring a monolith "Marketplace" database

I am new to microservices and have been struggling to wrap my brain around them. On the surface they sound like a good idea, but from a practical standpoint I can't break away from my centralized-database background. As an example, I have this real-world marketplace scenario where I cannot figure out whether microservices would help or hurt. The site was working well until the PO asked for "Private Products." Now it is fragile and slow, so I need to do a major refactor, which seems like a good time to implement microservices. I feel like many systems have this type of coupling, so deconstructing this example would be very instructive.
Current State
This is a b2b marketplace where users belong to companies that are buying products from each other. Currently, there exists a monolithic database: User, Company, Catalog, Product, and Order. (This is a simplification, the actual scenario is much more complicated, users have roles, orders have header/detail, products have inventories, etc.)
Users belong to Companies
Companies have a Catalog of their Products
Companies have Orders for Products from other Companies
So far so good. I could see breaking the app into microservices on the major entity boundaries.
New Requirement
Unfortunately for my architectural aspirations, the product owner wants more features. In this case: Private Products.
Products are Public or Private
Companies send time-bound Invitations to Products or Catalogs to Users of other Companies
This seemingly simple request suddenly complicated everything.
Use Case - User displays a list of products
For example, listing or searching products was once just a simple case of asking the Products to list/search themselves. It is one of the most frequently run queries on the system. Unfortunately, what was a simple use case just got messy.
A User should be able to see all public Products (easy)
A User should be able to see all their own Company's private Products (not horrible)
A User can see any Product that their Company has Ordered in the past regardless of privacy (Uh oh, now the product needs to know about the User Company's Order history)
A User can see any private Product for which they have an active Invitation (Uh oh, now the product needs to know about the User's Product or Catalog Invitations which are time dependent)
Initial Monolithic Approach
This can be solved at the database level, but the SQL joins basically ALL of the tables together (and not just the master data tables, but all the transaction tables as well). While it is a lot slower than before, a DBMS is designed for big joins, so it seems like the appropriate tool, and I can start working on optimizing the query. However, as I said, for this and other reasons the system is feeling fragile.
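Just to make the four rules above concrete, here is the visibility check expressed as a plain predicate. This is only an illustrative sketch with invented types; in the monolith it is effectively what the big join computes:

// Illustrative only: the four visibility rules as one predicate.
interface ProductView {
  id: string;
  ownerCompanyId: string;
  isPrivate: boolean;
}

interface ViewerContext {
  companyId: string;
  orderedProductIds: Set<string>;          // products the viewer's company has ever ordered
  activeInvitationProductIds: Set<string>; // invitations that are valid right now
}

function canSee(product: ProductView, viewer: ViewerContext): boolean {
  if (!product.isPrivate) return true;                           // rule 1: public
  if (product.ownerCompanyId === viewer.companyId) return true;  // rule 2: own company's private product
  if (viewer.orderedProductIds.has(product.id)) return true;     // rule 3: order history
  return viewer.activeInvitationProductIds.has(product.id);      // rule 4: active invitation
}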
Initial Design Thoughts... and ultimately my questions
So, considering a microservices architecture as a potential new direction, I began to think about how to start. Data redundancy seems necessary: if I translate my major entities into services, asking for a list of products without data redundancy would just have all of the services calling each other in a big, slow mess.
I started with the idea of carving out "Product and Catalog" as its own microservice. Since Catalogs are just collections of Products, they seem to belong together, so I'll just call the whole thing the "Product Service". This service would have an API for managing both products and catalogs and, most importantly, for searching and listing them.
As a separate service, performing a Product search would require a lot of redundant data, as the service would have to subscribe to any event that affects product privacy, such as:
Listen for Orders and keep at least a summary of the relationship between purchased Products and Purchasing Companies
Listen to Invitations and maintain a list of active User/Product/Time relationships
Listen to User and Company events to maintain a User to Company relationship
I begin to worry about keeping it all synchronized.
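To illustrate what that replicated state might look like, here is a sketch of a projection the Product Service could build from those events. All event and field names are invented for illustration:

// Sketch of the denormalized state a standalone Product Service would maintain.
type DomainEvent =
  | { type: "OrderPlaced"; companyId: string; productIds: string[] }
  | { type: "InvitationGranted"; userId: string; productId: string; validUntil: Date }
  | { type: "UserAssignedToCompany"; userId: string; companyId: string };

class ProductVisibilityProjection {
  private purchasedBy = new Map<string, Set<string>>();       // productId -> companyIds
  private invitations = new Map<string, Map<string, Date>>(); // productId -> userId -> expiry
  private userCompany = new Map<string, string>();            // userId -> companyId

  apply(event: DomainEvent): void {
    switch (event.type) {
      case "OrderPlaced":
        for (const id of event.productIds) {
          if (!this.purchasedBy.has(id)) this.purchasedBy.set(id, new Set());
          this.purchasedBy.get(id)!.add(event.companyId);
        }
        break;
      case "InvitationGranted":
        if (!this.invitations.has(event.productId)) this.invitations.set(event.productId, new Map());
        this.invitations.get(event.productId)!.set(event.userId, event.validUntil);
        break;
      case "UserAssignedToCompany":
        this.userCompany.set(event.userId, event.companyId);
        break;
    }
  }
}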
In the end, the Product Service would have a large part of the current schema replicated. So I begin to think maybe microservices won't work for this situation. Or am I being melodramatic, and the replicated schema will be simple enough to be more manageable and faster, so it is worth it?
Overall, am I thinking about this whole approach properly? Is this how microservice based designs are intended to be thought through? If not, can somebody give me a push in a different direction?
Try splitting your system into services over and over until it makes sense. Use your gut feeling. Read more books, articles, and forums where other people describe how they did it.
You've mentioned that there is no point in splitting the ProductService into Product and ProductSearch: fair enough, try implementing it like that. If you end up with a pretty complicated schema for some reason, or with performance bottlenecks, that's a good sign to continue splitting further. If not, it is fine like that for your specific domain.
Not all product services are made equal. In some situations you have to be able to create millions or even billions of products per day. In that case you should most likely consider separating the product catalogue from product search. The reason is performance: to make search faster (indexing), you have to slow down inserts. These are two conflicting goals that are hard to meet without separating the data into different microservices (which will lead to data duplication as well).

Realtime queries in deepstream "cache" layer?

I see that by using the RethinkDB connector one can achieve realtime querying capabilities by subscribing to specially named lists. I assume this is not actually the fastest solution, as the query probably updates only after changes to records are written to the database. Is there any recommended approach to achieve realtime querying capabilities on the deepstream side?
There are some favourable properties, like:
Number of unique queries is small compared to number of records or even number of connected clients
All manipulation of records that are subject to querying is done via RPC.
I can imagine multiple ways to do that:
Imitate the RethinkDB connector approach. But for that I am missing a list.listen() method; with it I could create a backend process that creates a list on demand and, on each RPC CRUD operation on records, updates all currently active lists (i.e., queries).
Reimplement basic list functionality on top of records and use the above approach with the .listen() that records do have.
Use .listen() in events?
Or do we have list.listen() and I just missed it? Or is there a more elegant way to do it?
Great question - generally lists are a client-side concept, implemented on top of records. Listen notifies you about clients subscribing to records, not necessarily changing them - change notifications arrive via mylist.subscribe(data => {}) or myRecord.subscribe(data => {}).
The tricky bit is the very limited querying capability of caches. Redis has a basic concept of secondary indices that can be searched for ranges and intersections; memcached and co. are, to my knowledge, pure key-value stores, searchable only by ID. As a result, the actual querying makes the most sense on the database layer, where your data will usually arrive in significantly less than 200ms.
The RethinkDB search provider offers support for RethinkDB's built-in realtime querying capabilities. Alternatively, you could use MongoDB and tail its operations log, or use Postgres and deepstream's built-in subscribe feature for change notifications.
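As a rough sketch of option 1 from the question (a backend process that owns writes via RPC and keeps named "query" lists up to date), using the deepstream.io JavaScript client; the record, list, and RPC names are invented:

// Sketch only: a backend worker owns product writes and maintains query lists.
const deepstream = require("deepstream.io-client-js");

const client = deepstream("localhost:6020");
client.login({}, () => {
  client.rpc.provide("product/upsert", (data: any, response: any) => {
    const recordName = "product/" + data.id;
    const record = client.record.getRecord(recordName);
    record.whenReady(() => {
      record.set(data);
      // Keep the named list (= "query") in sync with the new state.
      const cheap = client.record.getList("query/products-under-100");
      cheap.whenReady(() => {
        const listed = cheap.getEntries().indexOf(recordName) !== -1;
        if (data.price < 100 && !listed) cheap.addEntry(recordName);
        if (data.price >= 100 && listed) cheap.removeEntry(recordName);
        response.send(recordName);
      });
    });
  });
});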

Relational or document database for storing instant messages? Maybe something else?

I started playing around with RavenDB a few days ago. I like it so far, but I am pretty new to the whole NoSQL world. I am trying to think of patterns for when to prefer it (or any other document DB or NoSQL-style data store) over traditional RDBMSs. I understand the advice "when you need to store documents or unstructured/dynamically structured data, opt for a document DB," but that feels way too general to grasp.
Why? Because from what I've read, people have been giving examples of "documents" such as order details in an e-commerce application or form details in a workflow management application. But these have been developed with RDBMSs for ages without too much trouble; for example, the details of an order, such as quantity, total price, discount, etc., are perfectly structured.
So I think there's an overlap here. But I am not asking for general advice on when to use what, because I believe the best thing for me is to figure it out myself through experimenting; I am just going to ask about a concrete case along with my concerns.
So let's say I develop an instant messenger application which stores messages going ages back, like Facebook's messaging system does. I think an RDBMS is not suitable here. My reason is that most people use instant messaging systems like this:
A: hi
B: hey
A: how r u?
B: good, u?
A: me 2
...
The thing to note is that most messages are very short, so storing each in a single row with this structure:
Messages(fromUserId, toUserId, sent, content)
feels very inefficient, because the actual useful information (the content) is very small, whereas the table would contain an incredible number of rows, and therefore the indexes would grow huge. Add to this the fact that messages are sent very frequently, and the size of the indexes would have a huge impact on performance. So a very large number of rows must be managed and stored while every row contains a minimal amount of actual information.
In RavenDB, I would use a structure such as this:
// a Conversation object
{
    "FirstUserId": "users/19395",
    "SecondUserId": "users/19396",
    "Messages": [
        {
            "Order": 0,
            "Sender": "Second",
            "Sent": "2016-04-02T19:27:35.8140061",
            "Content": "lijhuttj t bdjiqzu "
        },
        {
            "Order": 1,
            "Sender": "Second",
            "Sent": "2016-04-02T19:27:35.8200960",
            "Content": "pekuon eul co"
        }
    ]
}
With this structure, I only need to find out which conversation I am looking for: the one between User A and User B. Any message between User A and User B is stored in this object, regardless of which of them was the sender. So once I find the conversation between them (and there are far fewer conversations than actual messages) I can just grab all of the messages associated with it.
However, if the two participants talk a lot (and assuming that messages are stored for, let's say, 3 years), there can be tens of thousands of messages in a single conversation, causing the object to grow very large.
But there is one thing I don't know specifically how it works in RavenDB: does its internal storage and query mechanism allow the DB engine (not the client) to grab just, for example, the 50 most recent messages without reading the whole object? After all, it uses indexing on the properties of objects, but I haven't found any information about whether reading parts of an object is possible DB-side (that is, without the DB engine reading the whole object from disk, parsing it, and then sending back just the required parts to the client).
If it is possible, I think Raven is the better option in this scenario; if not, then I am not sure. So please help me clear this up by answering the issue mentioned in the previous paragraph, along with any advice on which DB model would suit this scenario best. RDBMSs? Document DBs? Maybe something else?
Thanks.
I would say the primary distinctions will be:
Does your application consume the data in JSON? -- Then store it as JSON (in a document DB) and avoid serializing/deserializing it.
Do you need to run analytical workloads on the data? -- Then use SQL
What consistency levels do you need? -- SQL is made for strong consistency; docDBs are optimized for weaker consistency levels
Does your schema change much? -- Then use a (schema-less) docDB
What scale are you anticipating? -- docDBs are usually easier to scale out
Note also that many modern cloud document databases (like Azure DocDB) can give you the best of both worlds as they support geo-replication, schema-less documents, auto-indexing, guaranteed latencies, and SQL queries. SQL Databases (like AWS Aurora) can handle massive throughput rates, but usually still require more hand-holding from a DBA.
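Regarding the worry above about a single Conversation document growing without bound: a common document-database pattern (not specific to RavenDB) is to split the conversation into fixed-size chunk documents and load only the newest chunk for a "last 50 messages" view. A sketch with invented names:

// Sketch of a chunked conversation model: each document holds at most
// CHUNK_SIZE messages, so showing the latest messages means loading one
// small document instead of the whole history.
const CHUNK_SIZE = 100;

interface Message {
  order: number;
  sender: "First" | "Second";
  sent: string;            // ISO timestamp
  content: string;
}

interface ConversationChunk {
  id: string;              // e.g. "conversations/19395-19396/chunk/42"
  conversationId: string;  // "conversations/19395-19396"
  chunkNo: number;
  messages: Message[];
}

// Id of the chunk a new message belongs to, derived from its running order.
function chunkIdFor(conversationId: string, messageOrder: number): string {
  const chunkNo = Math.floor(messageOrder / CHUNK_SIZE);
  return `${conversationId}/chunk/${chunkNo}`;
}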