Problem statement: One of our modules (the Ticket module, a stored procedure) is taking 4-5 seconds to return data from the DB. This stored procedure supports 7-8 filters and joins 4-5 tables to resolve the text for the IDs stored in the Ticket tables (e.g. client name, ticket status, ticket type), and this has hampered the performance of the SP.
Current tech stack: ASP.NET 4.0 Web API, MS SQL Server 2008
We are planning to introduce Redis as a caching server and Node.js, with the aim of improving performance and scalability.
Use case: We have a Service Ticket module and it has the following attributes:
TicketId
ClientId
TicketDate
Ticket Status
Ticket Type
Each user of this module has access to a fixed set of clients, i.e.
User1 has access to tickets of Clients 1, 2, 3, 4......
User2 has access to tickets of Clients 1, 2, 5, 7......
So basically, when User1 accesses the Ticket module he should be able to filter service tickets on TicketId, Client, Ticket Date (from and to), Ticket Status (Open, Hold, In Process ...) and Ticket Type (Request, Complaint, Service .....). Also, since User1 has access to only Clients 1, 2, 3, 4......, the cache should only return the list of tickets of the clients he has access to.
Would appreciate it if you could share your views on how we should structure the data in Redis, i.e. what we should use for each of the above items (hash, set, sorted set ...), and how we should filter the tickets depending on which clients the respective user has access to.
Redis is a key/value store. I would use a hash per ticket with a structure like:
Key: ticketId
Fields: clientId, ticketDate, ticketStatus, ticketType
Handle searching, sorting, etc. programmatically in the application and/or in a Lua script.
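A minimal sketch of that layout using redis-py (Python). The hash per ticket follows the structure above; the client:<id>:tickets and user:<id>:clients sets are one assumed way to handle the per-user client access asked about, and all key names here are illustrative rather than an established convention.

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# One hash per ticket holding its attributes
r.hset("ticket:1001", mapping={
    "clientId": 3,
    "ticketDate": "2016-08-01",
    "ticketStatus": "Open",
    "ticketType": "Request",
})

# Index sets: which tickets belong to a client, and which clients a user may see
r.sadd("client:3:tickets", 1001)
r.sadd("user:1:clients", 1, 2, 3, 4)

# Pull only the tickets of clients this user has access to, then filter in code
client_keys = [f"client:{c}:tickets" for c in r.smembers("user:1:clients")]
visible_ids = r.sunion(client_keys)
tickets = [r.hgetall(f"ticket:{tid}") for tid in visible_ids]
open_requests = [t for t in tickets
                 if t["ticketStatus"] == "Open" and t["ticketType"] == "Request"]

If date-range filtering turns out to be the hot path, a sorted set per client keyed by ticket date (score = epoch timestamp) would let ZRANGEBYSCORE do that part server-side instead of in application code.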
Related
I am trying to build a chat application similar to Slack. I want to understand how they have designed their database so that it returns so much information at once when someone loads a chat, and which database is good for this problem. I am adding a screenshot of the same for reference.
Initially, when I started thinking about this, I wanted to go with PostgreSQL, always keeping the tables normalized to keep things clean, but as I went ahead normalization started to feel like a problem.
Users Table
id | name | email
1 | John | john@gmail.com
2 | Sam | sam@gmail.com
Channels Table
id | channel_name
1 | Channel name 1
2 | Channel name 2
Participants table
id | user_id | channel_id
1 | 1 | 1
2 | 1 | 2
3 | 2 | 1
Chat table
id | user_id | channel_id | parent_id | message_text | total_replies | timestamp
1 | 1 | 1 | null | first message | 0 | -
2 | 1 | 2 | 1 | second message | 10 | -
3 | 1 | 3 | null | third message | 0 | -
The Chat table has a parent_id column which tells whether a message is a parent or a child message. I don't want to go with recursive child messages, so this is fine.
Emojis table
id | user_id | message_id | emoji_uni-code
1 | 1 | 12 | U123
2 | 1 | 12 | U234
3 | 2 | 14 | U456
4 | 2 | 14 | U7878
5 | 3 | 14 | U678
A person can react with many emojis on the same message.
When someone loads the chat, I want to fetch the last 10 messages inserted into the tables, along with all the emojis that have been reacted to each message and the replies, like you can see in the image where it says "1 reply" with the person's profile picture (this can be more than 1).
Now, to fetch this data I have to join all the tables and then fetch the data, which could be a very heavy job on the back-end side, considering this is going to be very frequent.
What I thought is that I would add two more columns to the Chat table, profile_replies and emoji_reactions_count, both of a JSON-style (bson/jsonb) data type to store data something like this.
For the emoji_reactions_count column there are two options. The first is a count-only approach:
{
  "U123": "123", // count of reactions on an emoji
  "U234": "12"
}
When someone reacts, I would update the count and insert or delete the row from the Emojis table. Here I have a question: could very frequent emoji updates on a message become slow, since I need to update the count in the above column every time someone reacts with an emoji?
OR
storing the user ids along with the count like this, which looks better since I can get rid of the Emojis table completely:
{
  "U123": {
    "count": 123, // count of reactions on an emoji
    "userIds": [1,2,3,4] // list of user ids who have reacted
  },
  "U234": {
    "count": 12,
    "userIds": [1,2,3,4]
  }
}
This is for the profile_replies column:
[
  {
    "name": "john",
    "profile_image": "image url",
    "replied_on": "timestamp"
  },
  ... with similar other objects
]
Does this look like a fine solution, or is there anything I can do to improve it, or should I switch to some NoSQL database like MongoDB or Cassandra? I have considered MongoDB, but it does not look very good either, because joins are slow when data grows exponentially, whereas in SQL this does not happen to the same extent.
Even though this is honestly more like a discussion and there is no perfect answer to such a question, I will try to point out things you might want to consider if rebuilding Slack:
Emoji table:
As @Alex Blex already commented, it can be neglected for the very beginning of a chat software. Later on, the emojis could either be injected by some cache in your application, somewhere in middleware or the view or wherever, or stored directly with your message. There is no need to JOIN anything on the database side.
Workspaces:
Slack is organized in Workspaces, where you can participate with the very same user. Every workspace can have multiple channels, every channel can have multiple guests. Every user can join multiple workspaces (as admin, full member, single-channel or multi-channel guest). Try to start with that idea.
Channels:
I would refactor the channel wording to e.g. conversation, because basically (personal opinion here) I think there is not much of a difference between e.g. a channel with 10 members and a direct conversation involving 5 people, except for the fact that users can join (open) channels later on and see previous messages, which is not possible for closed channels and direct messages.
Now for your actual database layout question:
Adding columns like reply_count or profile_replies can be very handy later on when you are developing an admin dashboard with all kinds of statistics but is absolutely not required for the client.
Assuming your client does a small call to "get workspace members" upon joining / launching the client (and then obviously frequently renewing the cache on the client's side), there is no need to store user data with the messages; even if there are 1000 members in the same workspace, it should only be a few MiB of information.
Assuming your client does the same with a call to "get recent workspace conversations" (of course you can filter by whether they are public and joined), you are going to have a nice list of channels you are already in and the last people you have talked to.
create table message
(
id bigserial primary key,
workspace_id bigint not null,
conversation_id bigint not null,
parent_id bigint,
created_dt timestamp with time zone not null,
modified_at timestamp with time zone,
is_deleted bool not null default false,
content jsonb
)
partition by hash (workspace_id);
create table message_p0 partition of message for values with (modulus 32, remainder 0);
create table message_p1 partition of message for values with (modulus 32, remainder 1);
create table message_p2 partition of message for values with (modulus 32, remainder 2);
...
So basically your query against the database whenever a user joins a new conversation is going to be:
SELECT * FROM message WHERE workspace_id = 1234 ORDER BY created_dt DESC LIMIT 25;
And when you start scrolling up it's going to be:
SELECT * FROM message WHERE workspace_id = 1234 AND conversation_id = 1234 and id < 123456789 ORDER BY created_dt DESC LIMIT 25;
and so on... As you can see, you can now select messages very efficiently by workspace and conversation if you additionally add an INDEX like the following (it might differ if you use partitioning):
create index idx_message_by_workspace_conversation_date
on message (workspace_id, conversation_id, created_dt)
where (is_deleted = false);
For the message format I would use something similar to what Twitter does; for more details please check their official documentation:
https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/object-model/tweet
Of course e.g. your client v14 should know how to 'render' all objects from v1 to v14, but that's the great thing about message format versioning: it is backwards compatible, and you can launch a new format supporting more features whenever you feel like it. A primitive example of content could be:
{
"format": "1.0",
"message":"Hello World",
"can_reply":true,
"can_share":false,
"image": {
"highres": { "url": "https://www.google.com", "width": 1280, "height": 720 },
"thumbnail": { "url": "https://www.google.com", "width": 320, "height": 240 }
},
"video": null,
"from_user": {
"id": 12325512,
"avatar": "https://www.google.com"
}
}
The much more complicated question, imo, is efficiently determining which messages have been read by each and every user. I will not go into detail on how to send push notifications, as that should be done by your backend application and not by polling the database.
Using the previously gathered data from "get recent workspace conversations" (something like SELECT * FROM user_conversations ORDER BY last_read_dt DESC LIMIT 25 should do; in your case the Participants table, where you would have to add both last_read_message_id and last_read_dt), you can then do a query to get which messages have not been read yet:
a small stored function returning messages
a JOIN statement returning those messages (see the sketch after this list)
a UNNEST / LATERAL statement returning those messages
maybe something else that doesn't come to my mind at the moment. :)
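For example, the JOIN variant could look roughly like this (Python with psycopg2; the connection string and the participants columns channel_id / last_read_message_id are assumptions pieced together from the tables discussed above):

import psycopg2

conn = psycopg2.connect("dbname=chat")  # assumed connection string

def unread_messages(user_id, limit=25):
    # Join each of the user's conversations (with its last-read marker)
    # against message, returning only rows newer than that marker.
    sql = """
        SELECT m.*
        FROM participants p
        JOIN message m
          ON m.conversation_id = p.channel_id
         AND m.id > COALESCE(p.last_read_message_id, 0)
        WHERE p.user_id = %s
          AND m.is_deleted = false
        ORDER BY m.created_dt DESC
        LIMIT %s
    """
    with conn.cursor() as cur:
        cur.execute(sql, (user_id, limit))
        return cur.fetchall()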
And last but not least I would highly recommend not trying to rebuild Slack as there are so many more topics to cover, like security & encryption, API & integrations, and so on...
I have a segmentation project I am working on for my company, and we have to create a pipeline to gather data from our app users; when they fit a segment, the app will receive that information and do something with it (not in my scope). Currently, the client connects and authenticates to an endpoint that allows their client to send JSON data to an Elasticsearch cluster (app started, level completed, etc.). I'm then using an Azure Function to grab the live data every 5 minutes and store it in Azure Blob Storage, which then creates a queue that Snowflake reads and ingests the JSON files from. We'd then use Snowflake to run a task per segment (to be decided by the analysts or executives) and the data will be output to a table like the one below:
AccountID | Game | SegmentID | CreatedAt | DeletedAt
123456789 | Game 1 | 1 | 2021-04-20 | 2021-04-21
123456789 | Game 1 | 2 | 2021-04-20 |
123456789 | Game 1 | 3 | 2021-04-20 |
Where SegmentID can represent something like
SegmentID | SegmentType | SegmentDescription
1 | 5 Day Streak | User played for 5 consecutive days
2 | 10 Day Streak | User played for 10 consecutive days
3 | 15 Day Streak | User played for 15 consecutive days
In the next step of the pipeline, the same API the user authenticated with should post a request when the game boots up to grab all the segments that the user matches. The dev team will then decide where, when in the session and how to use the information to personalize content. Something like:
select
SegmentID
from
SegmentTable
where
AccountID='{AccountID the App authenticated with}' and
Game='{Game the App authenticated with}' and
DeletedAt is null
Response:
SegmentID
2
3
Serialised:
{"SegmentID": [2,3]}
We expect to have about 300K-500K users per day. My question is: what would be the most efficient and cost-effective way to get this information from Snowflake back to the client, so that this number of users wouldn't have issues when querying the same endpoint and it won't be costly?
OK, so it's a bit of a workaround, but I created an external function on Snowflake (using Azure Functions) that upserts data into a local MongoDB cluster. The API connects to the MongoDB instance, which can handle the large volume of concurrent connections, and since it is on a local server it is quite cheap. The only costs are the data transfer from Snowflake to MongoDB, the App Service Plan on Azure Functions (I could not use the consumption-based plan because, to send data to our internal server, I needed to create a VNET, NAT Gateway and a static outbound IP address in Azure), and the API Management Service I had to create in Azure.
So how does it work? At the end of each stored procedure in Snowflake, I collect the segments that have changed (new row, or DELETED_AT not null) and trigger the external function, which upserts the data into MongoDB using the pymongo client.
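Roughly, the Azure Function behind the external function does something like the following (a simplified pymongo sketch; the connection string, database/collection names and the exact shape of the incoming rows are assumptions):

from pymongo import MongoClient, UpdateOne

client = MongoClient("mongodb://localhost:27017")   # assumed connection string
segments = client["segmentation"]["user_segments"]  # assumed db/collection names

def upsert_segments(rows):
    # rows: changed segment rows sent by Snowflake, e.g.
    # {"AccountID": 123456789, "Game": "Game 1", "SegmentID": 2,
    #  "CreatedAt": "2021-04-20", "DeletedAt": None}
    ops = [
        UpdateOne(
            {"AccountID": r["AccountID"], "Game": r["Game"], "SegmentID": r["SegmentID"]},
            {"$set": r},
            upsert=True,
        )
        for r in rows
    ]
    if ops:
        segments.bulk_write(ops)

def get_active_segments(account_id, game):
    # What the API endpoint runs against MongoDB instead of Snowflake
    cursor = segments.find(
        {"AccountID": account_id, "Game": game, "DeletedAt": None},
        {"SegmentID": 1, "_id": 0},
    )
    return {"SegmentID": [doc["SegmentID"] for doc in cursor]}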
Hello guys, I wanted to ask a few things because I want to upgrade my login security. First of all, this is how my login security looks at the moment:
An SQL query compares the user input (pass / name / id) to my database, and if it's correct the user gets 2 values and is redirected to the main restricted page. One of the values is a random value from a function that stores a limited amount of such values (each time it picks a random one and returns it to the user upon successful login), and the other value is one of the input fields (like company ID, for example).
Both of those are stored in session variables (hopefully it's not easy for a hacker to get at the data stored in those?), and on each of the restricted pages I check 2 conditions in the page load event:
Session ( "the ID" ) <> ""
isLegit ( session ( "the random code" ) <> "false"
I am still learning about security, and I guess my current method is bad?
And that's where my second question comes into play. I've been reading about Microsoft's Membership and wanted to use some of the stuff included, but even after reading about how it works I find myself failing to implement it on my site.
I've got a pretty long registration form and the site is designed in a certain way, and if I try putting the login controls from Visual Studio in, I can't get them to look like part of the page.
I read that there is a way to keep my site as it is and to use
FormsAuthentication.RedirectFromLoginPage("test", false);
to force membership or something of this sort?
Is such a thing possible without having to use the login controls and without storing additional login data (besides what I have in my SQL database)?
P.S. I am using ASP.NET with VB in Visual Studio, with MS SQL Server.
Take a look at these articles:
Introduction to Membership http://msdn.microsoft.com/en-us/library/yh26yfzy(v=vs.100).aspx
Sample Membership Provider Implementation http://msdn.microsoft.com/en-us/library/44w5aswa(v=vs.100).aspx
Introduction to Membership http://msdn.microsoft.com/en-us/library/tw292whz(v=vs.100).aspx
If you are willing to create another table that would hold the user information it should be really straight-forward, works out of the box.
You would need to take out the checks you have in the code, though.
I am working on a login form in Perl. I wish to limit invalid login attempts to 3 within 30 minutes. Which approach should I use? Give me some idea of where to store the invalid login attempts.
You have a few different options:
Cookies - least secure, as different browsers, clearing cookies, etc. can all impact this. Not recommended, but listed for completeness.
Database - if you are using a database then you can create a table that records every failed login attempt. Within this table, login_attempts, you record the following values: date, username. Then during any attempted login you run (pseudo SQL) SELECT COUNT(*) FROM login_attempts WHERE username = '$the_user_name' AND date BETWEEN '$now' AND '$now-30m'. If the returned value is >= 3, deny the login. Make sure you create a clustered index on username and date descending. If you are concerned about space then you can have a job that runs nightly and removes any records with dates older than, say, 8 hours. (Though left intact this serves as an audit log.) A sketch of this option follows after this list.
File system - this option is so Byzantine I'm not going to go in-depth describing it, assuming you have some sort of database backing store already in place. If you have to go this route, then it looks like the database solution, above, without the SELECT statement and likely involves user names as directories with a file containing each login attempt and where the (mtime + 30m < now) will have to be true to permit a successful login. You'd need to make sure you have mtime enabled for the disk of course, or record the time within the file.
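The asker is working in Perl, but as a language-agnostic illustration of the database option, here is a minimal Python/SQLite sketch (table and function names are illustrative; the same logic ports directly to Perl's DBI):

import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect("auth.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS login_attempts (
        username     TEXT NOT NULL,
        attempted_at TEXT NOT NULL   -- ISO-8601 UTC timestamp
    )
""")

def record_failure(username):
    conn.execute(
        "INSERT INTO login_attempts (username, attempted_at) VALUES (?, ?)",
        (username, datetime.utcnow().isoformat()))
    conn.commit()

def locked_out(username, limit=3, window_minutes=30):
    # Count failures within the last 30 minutes; deny the login if >= 3
    cutoff = (datetime.utcnow() - timedelta(minutes=window_minutes)).isoformat()
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM login_attempts WHERE username = ? AND attempted_at >= ?",
        (username, cutoff)).fetchone()
    return count >= limit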
It's fairly obvious how to model a database table that would act as an access control list (ACL) when you're just dealing with discrete users who have some level of access to a discrete resource. Something like this:
TABLE acl (
user_id INT,
resource_id INT,
access_type INT
)
... where access_type is a number representing something like:
0 (or lack of record for user_id and resource_id) means no access
1 means read-only
2 means full control
However it starts getting trickier when you've got scenarios like users can be a member of one or more groups and groups can contain other groups. Then a resource could be a folder that contains other resources.
Other than the obviously poor approach of doing a whole bunch of recursive queries at runtime to determine the level of access a user should have to a resource, how do these scenarios tend to get handled? Are there commonly-accepted designs for modelling an ACL like this?
Are you using a DB with support for connect by, or something similar?
In Oracle, I've implemented the following.
-- Just the parent groups ("group" is a reserved word, so the table is named groups here)
create table groups (
  groupCode varchar2(30) primary key,
  groupDesc varchar2(200)
);

-- Associates groups with other groups
create table groupMap (
  parentGroup varchar2(30),
  childGroup  varchar2(30)
);

-- A user can be assigned to more than one group
create table userGroup (
  userId    number,
  groupCode varchar2(30)
);
Then use CONNECT BY to get all child groups for the user:
SELECT rm.CHILDGroup as roleCode
FROM groupMap rm
CONNECT BY PRIOR rm.CHILDGroup = rm.PARENTGroup
START WITH rm.CHILDGroup in
(SELECT ur.groupCode
FROM userGroup ur
WHERE ur.userId = &userId);
This query will get all the groups that were assigned to the user in userGroup and all the child groups assigned to the groups that the user belongs to.
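If your database does not have CONNECT BY, a recursive CTE gives the same traversal. A small self-contained sketch (Python with SQLite, made-up rows, table names as in the answer above):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE groupMap  (parentGroup TEXT, childGroup TEXT);
    CREATE TABLE userGroup (userId INTEGER, groupCode TEXT);
    INSERT INTO groupMap  VALUES ('admins', 'editors'), ('editors', 'viewers');
    INSERT INTO userGroup VALUES (1, 'admins');
""")

rows = conn.execute("""
    WITH RECURSIVE user_groups(groupCode) AS (
        -- groups assigned to the user directly
        SELECT groupCode FROM userGroup WHERE userId = ?
        UNION
        -- plus, recursively, every child group of a group already found
        SELECT gm.childGroup
        FROM groupMap gm
        JOIN user_groups ug ON gm.parentGroup = ug.groupCode
    )
    SELECT groupCode FROM user_groups
""", (1,)).fetchall()

print(rows)  # e.g. [('admins',), ('editors',), ('viewers',)]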
Spring ACL is a solid implementation of ACL with inheritance for Java. It is open source, so I would check it out if it is what you are looking for.