What are typical lengths of a chat message and a comment in a database? [closed] - sql

I need to create a column in a SQL Server database. Entries in that column will contain messages from a chat. Previously, such messages have been stored as comments.
My main question is:
What is a typical text length for a chat message and for a comment?
By the way:
What would happen if I used varchar(max)? How would it impact database size and performance? Is it better to use powers of 2 or powers of 10 (e.g. 128 instead of 100) when choosing text lengths?

Using VARCHAR(MAX) has a disadvantage: you cannot define an index over this column.
Generally, your application should impose a maximum length for a chat message. How big that limit should be depends very much on what the application is used for, but anything over roughly 1,000 bytes is probably less a legitimate message than an attempt to disrupt your service.
Whether your maximum value is a power of 2, a power of 10, or any other value has no influence on performance, as long as the row fits in one (8 KB) page.

Short answer - it doesn't matter.
From MSDN:
The storage size is the actual length of the data entered + 2 bytes.
So VARCHAR(10) and VARCHAR(10000) consume the same amount of storage as long as the values don't exceed 10 characters.

Definitely use N/VARCHAR(MAX); it can grow to 2 GB (if I remember correctly). It only grows as required, though, so it is very efficient with regard to space unless you are storing only very small amounts of data.
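To make the storage rule quoted above concrete, here is a rough back-of-the-envelope sketch (in Python rather than T-SQL) that applies the "actual length + 2 bytes" rule to a few sample messages. It assumes a single-byte collation and ignores row headers and other per-row overhead:

    PAGE_SIZE = 8 * 1024  # a SQL Server data page is 8 KB

    def varchar_storage_bytes(message: str) -> int:
        # VARCHAR storage is the actual length of the value + 2 bytes, regardless
        # of the declared maximum: VARCHAR(200) and VARCHAR(8000) cost the same
        # for a short value. NVARCHAR would need 2 bytes per character instead.
        return len(message) + 2

    samples = ["hi", "see you at the meeting at 8?", "x" * 1000]
    for msg in samples:
        used = varchar_storage_bytes(msg)
        print(f"{len(msg):5d} chars -> {used:5d} bytes "
              f"({used / PAGE_SIZE:.1%} of one 8 KB page)")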

Related

Query time for a specific entity is 10000 times higher [closed]

We have run into a problem: a select filtering by a certain id takes a very long time. For every other id it takes about 5 ms; for this one, 10 seconds.
Below are the two EXPLAIN plans (shown in a screenshot that is not included here): the left one is the normal case, the right one the slow case. It is exactly the same SQL query; the only difference is the single id in 'where id = ...'.
It is striking that a filter is applied in the right plan but, for some reason, not in the left one, and that there is a huge number of 'rows removed'. Such a number can only be obtained by multiplying the row counts of the joined tables. Once again, the SQL query is exactly the same except for the entity id, and the amount of data retrieved per entity is comparable.
One of the tables also uses a B-tree index. The only special thing about this id is that it comes after a gap in the numbering, e.g. 22, 23, 24, 30. But I was not able to reproduce the problem from that observation alone.
Unfortunately, I cannot show the code, but I hope this information is enough to get some advice.
upd:
I found the reason. Postgres for some reason expects that one of the tables will return only 1 row, when the real result is 10k+ rows, and it therefore chooses the wrong algorithm. For other entity ids it estimates correctly and chooses better plans. How does Postgres estimate the row counts in a plan? What could be the problem?
If I understand correctly, your problem is the data histogram. We cannot go deeper because you cannot provide example code. Briefly, one of your tables has an id column with a heterogeneous (skewed) data distribution. For example: your table has 1 billion records, and for most ids there are about 500 records each, yet some of the ids have, let's say, 20 or 200 million records. If you search for these highly non-selective rows, the database optimizer will not help you.
Check your data histogram!
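One quick way to do that from Python (the connection details, table and column names below are placeholders for your own schema) is to compare the planner's statistics for the id column with the real per-id row counts:

    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=me")   # placeholder connection
    cur = conn.cursor()

    # The planner's view: most common ids and their estimated frequencies.
    cur.execute("""
        SELECT n_distinct, most_common_vals, most_common_freqs
        FROM pg_stats
        WHERE tablename = %s AND attname = %s
    """, ("my_table", "id"))
    print(cur.fetchone())

    # Reality: the actual row counts for the heaviest ids.
    cur.execute("""
        SELECT id, count(*) AS rows_for_id
        FROM my_table
        GROUP BY id
        ORDER BY count(*) DESC
        LIMIT 10
    """)
    for row in cur.fetchall():
        print(row)

If the skewed id is not among the most common values Postgres has sampled, raising the statistics target for that column and re-running ANALYZE usually gives the planner a better row estimate.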

Calculate user size to store in PostgreSQL DB [closed]

I'm creating a PostgreSQL DB where I'll store some users, so I need to know the exact size (in MB) of each user.
This is my reasoning:
Profile picture: JPEG, 15 Mb at 75% = up to 1.8 MB
Name + surname + work: 20 characters each, so = 60 B
Date of birth: timestamp = 8 B
Bio: up to 500 characters = 500 B
For a total of (approximately) 2.5 MB.
So if I have 1 GB of available space in the DB, I can store up to 400 users.
Is that right? Am I missing something?
I would not store the image binary data in the database; that is not a good idea. Store it in Azure Blob Storage and just store the URL to it. At the very least, keep it out of the database: make the database as small and fast as possible or you will get issues later down the line, e.g. indexing large columns will slow queries down over time.
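As a rough illustration of that pattern, here is a hypothetical Python sketch that uploads the JPEG to Azure Blob Storage and stores only its URL in PostgreSQL. The container, table and column names, the connection strings and the user id are all placeholders:

    import psycopg2
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string("<storage-connection-string>")
    blob = service.get_blob_client(container="profile-pictures", blob="user-42.jpg")

    # Upload the picture itself to blob storage.
    with open("avatar.jpg", "rb") as f:
        blob.upload_blob(f, overwrite=True)

    # Store only the resulting URL in the database.
    conn = psycopg2.connect("dbname=mydb user=me")
    cur = conn.cursor()
    cur.execute(
        "UPDATE users SET profile_picture_url = %s WHERE id = %s",
        (blob.url, 42),
    )
    conn.commit()

With this layout a user row in the database is only a few hundred bytes, so the 1 GB budget is spent on blob storage rather than on the database itself.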

How to make a correct function in dynamic programming [closed]

I have the following dynamic programming problem.
A person has a time machine and can move forward in time by either 1 or 2 years at a time. He starts at year 0 and wants to reach year 100. For every step he takes (1 or 2 years) he pays a fee: there is an array of 100 integers representing the fee he has to pay if he passes through a specific year.
I need to find the minimum amount the person can pay to go from year 0 to year 100, using dynamic programming.
From what I have done so far, I think there should be something like
minCost(i) = min{A[i-1], A[i-2]}
and the base cases are years 1 and 2, which cost A[1] and A[2] respectively. But I think this approach is more of a greedy algorithm than dynamic programming.
I have seen the dynamic programming solution to bin packing, and I understood it and the matrix that represents it.
What should the matrix for the problem above look like?
And how should I build the function and the pseudocode for this problem?
You are almost there.
Think about how you reach the i-th year from the (i-1)-th year and from the (i-2)-th year. There is a fee you are forgetting to take into consideration:
MinCostToReachYear(i) = min( MinCostToReachYear(i-1) + fee(i-1), MinCostToReachYear(i-2) + fee(i-2) )
You already know the base cases, year 1 and year 2. Can you extrapolate from them with a for loop, or more easily with a recursive function, using the recurrence above? I leave that as an exercise for you.
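For reference, here is a minimal bottom-up sketch of that recurrence in Python. The exact indexing of the fee array and the base cases are assumptions (in particular, whether year 0's fee is charged), so adjust them to match the exact problem statement:

    import random

    def min_cost_to_reach(fee, target=100):
        # fee[y] is assumed to be the fee for year y (fee[0] = year 0).
        # Implements: MinCostToReachYear(i) =
        #   min(MinCostToReachYear(i-1) + fee(i-1),
        #       MinCostToReachYear(i-2) + fee(i-2))
        min_cost = [0] * (target + 1)
        min_cost[1] = fee[0]          # a single 1-year step out of year 0
        for i in range(2, target + 1):
            min_cost[i] = min(min_cost[i - 1] + fee[i - 1],
                              min_cost[i - 2] + fee[i - 2])
        return min_cost[target]

    # Example with random fees for years 0..99:
    fees = [random.randint(1, 10) for _ in range(100)]
    print(min_cost_to_reach(fees))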

Limiting the chosen variables solved for in OpenSolver [closed]

I've got a linear system of 17 equations and 506 variables that minimises the sum of all the variables. This works fine so far, but the solution is a combination of 19 variables.
In the end, though, I want to limit the number of chosen variables to 10, without knowing in advance which ones are the optimal ones (the solver figures that out for me, as well as their ratio).
I figured I can set a boolean to 1 if a variable's value becomes larger than 0 (meaning the variable is picked), and to 0 if the variable is not picked for the optimal solution,
and then require the sum of the booleans to be at most 10.
However, this seems a bit elaborate, and I was wondering whether there is a built-in option in OpenSolver for this, since I think choosing a small subset from a large set is quite a common problem.
So does anyone have a suggestion on:
Whether my elaborate approach drastically decreases performance? (I have no intrinsic understanding of the OpenSolver algorithms yet.)
A way, more easily or within the OpenSolver options, to enforce my limit of at most 10 solution variables?
Based on the information provided in the comments below, I first scaled down the size of the problem.
I have three lists of data with 18 entries each, in columns
W7:W23, AC7:AD23
which, in a linear combination (set manually with W28 = 6000, AC28 = 600, W29 = 1, AC29 = 1), equal or exceed the target list
EGM34:EGM50
So what I did was put the decision variables in W28:W29 and AC28:AD29.
I added the constraint W28,AC28:AD28 = integer in the solver (both in the original Excel solver and in OpenSolver),
and the constraint W29,AC29:AD29 = Boolean in the solver (again in both).
The product integer * boolean then gives the actual multiplication factor for the lists above (W7:W23 etc.).
To limit the number of chosen variables I have also tried, in addition to the constraints described, to constrain the cell =SUM(W29,AC29:AD29) to <= 10 (effectively reducing the number of booleans set to true to below 11, or so I thought, but the booleans aren't evaluated as booleans by the solver).
The multiplied lists are placed in W34:W50 and AC34:AD50, and their summation is in EGY34:EGY50. Hence the final check is added as the constraint:
EGY34:EGY50 >= EGM34:EGM50
I also have a question about how the linear solver evaluates these constraints. Does it:
a. think the sum of EGY34:EGY50 must be larger than or equal to the sum of EGM34:EGM50,
or
b. think "for every row x, EGYx must be larger than or equal to EGMx"?
So far I have observed b., but I would like to make sure.
But my main question concerns:
After using the Evolutionary algorithm, as was kindly suggested in the comments below, how/why does it try values such as 0.99994 for the decision variables designated as Boolean?
The introduction of binary variables is indeed the standard way to implement such constraints. Unfortunately, it transforms the problem from a linear programming problem into an integer programming problem (specifically, a mixed-integer linear programming problem). A standard approach to such problems is the branch and bound algorithm. This is what Excel's built-in solver seems to use; I'm not sure about OpenSolver, which you are using. In the best case (where there is a lot of bounding) it will run fairly rapidly, even with problems of your size. In the worst case, for your problem, it could be little better than running the simplex algorithm C(506,10) = 2.8 x 10^20 times (once for each possible set of 10 decision variables). In other words, it might be computationally intractable. Integer programming is known to be NP-hard.
If an exact solution is intractable, you could always use a heuristic algorithm such as an evolutionary approach.
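For what it's worth, the binary-variable formulation described above is easy to express in an algebraic modelling layer. Below is a minimal, hypothetical sketch using the Python PuLP library (not OpenSolver); the big-M bound, the objective and the commented-out equation constraints are placeholders for the real data:

    from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

    n_vars = 506
    BIG_M = 10_000   # assumed upper bound on any single variable's value

    prob = LpProblem("limit_chosen_variables", LpMinimize)
    x = [LpVariable(f"x_{i}", lowBound=0) for i in range(n_vars)]    # amounts
    y = [LpVariable(f"y_{i}", cat=LpBinary) for i in range(n_vars)]  # "is picked?"

    prob += lpSum(x)                 # objective: minimise the total of the variables

    for xi, yi in zip(x, y):
        prob += xi <= BIG_M * yi     # x_i can only be non-zero if y_i = 1

    prob += lpSum(y) <= 10           # at most 10 variables may be picked

    # The 17 original equations would be added here as further constraints, e.g.:
    # prob += lpSum(coeff[j][i] * x[i] for i in range(n_vars)) >= rhs[j]

    prob.solve()
    print([xi.name for xi in x if (xi.value() or 0) > 0])

A MILP solver (such as CBC, which PuLP bundles and OpenSolver also uses) keeps the y variables genuinely binary, which avoids the 0.99994-style values the Evolutionary engine produces.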

How to find factors of a very big number [closed]

I need to find the factors of a very big number, say 10^1000. I.e. if the input is 100 then the output should be 10 10, because 10 * 10 = 100. This is very simple if N fits in a long, but I want to know how it is possible to find the factors of a very big number such as 10^1000. Also, I can't use BigInteger.
1) As has been pointed out, factoring large numbers is hard. It is in fact sufficiently hard that it is the basis of RSA public-key cryptography; in other words, every time you buy something online you are counting on the fact that it is hard to factor numbers on the order of 2^2048 (given that 2^10 = 1024 is about 10^3, 2^2048 is about 10^600). While RSA specifically uses two large prime numbers, and your random N may have lots of small factors that will help somewhat, I wouldn't count on being able to factor 10^1000 +/- some random value anytime soon.
2) You can definitely reimplement a big-number library using strings [source: I had a classmate who did exactly that before we learned how to do big-number math], but it is going to be painfully slow, because you basically have to cast your strings back to ints each time. A slightly less painful approach, if you want to reimplement big numbers, is an array of integers (see the sketch after this list). You still need some extra steps, but for basic math it is not super difficult. It still won't be as efficient as specialised big-number libraries, which can use clever algorithms. For example, multiplying two big numbers the straightforward way: let A = P * 2^32 + Q (i.e. A is a 64-bit number represented as an array of two 32-bit numbers) and B = R * 2^32 + S; the straightforward method takes 4 multiplications, plus some additions, plus some handling of carries. As the size of the big numbers increases, there are ways to reduce the number of multiplications required (see e.g. http://en.wikipedia.org/wiki/Karatsuba_algorithm).
3) There are algorithms that factor numbers more efficiently than trial division, but even the current ones are not going to help you factor numbers of the size you are asking about before the heat death of the universe.
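To illustrate point 2), here is a purely illustrative Python sketch of a big number stored as an array of 32-bit limbs (least significant first) with schoolbook multiplication and explicit carries. Python's own integers are already arbitrary precision, so this is only a model of how the array-of-integers approach works in a language that lacks them:

    BASE = 2 ** 32   # each array element ("limb") holds a 32-bit digit

    def multiply(a, b):
        # a, b: lists of limbs, least significant first; returns a * b.
        result = [0] * (len(a) + len(b))
        for i, ai in enumerate(a):
            carry = 0
            for j, bj in enumerate(b):
                total = result[i + j] + ai * bj + carry
                result[i + j] = total % BASE
                carry = total // BASE
            result[i + len(b)] += carry
        return result

    # Example: (1 + 2*2^32) * (3 + 4*2^32) = 3 + 10*2^32 + 8*2^64
    print(multiply([1, 2], [3, 4]))   # [3, 10, 8, 0]

This is the quadratic schoolbook method; Karatsuba and similar algorithms reduce the number of limb multiplications for very large inputs.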
10^1000 has exactly 1,002,001 integer divisors, and they should be very easy to find with a bit of thinking. The prime factorisation is
2 * 2 * 2 * ... * 5 * 5 * 5
with exactly 1,000 twos and exactly 1,000 fives.
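A small Python sketch of that observation: every divisor of 10^k is of the form 2^a * 5^b with 0 <= a, b <= k, so 10^1000 has 1001 * 1001 = 1,002,001 divisors:

    n_twos = n_fives = 1000
    print((n_twos + 1) * (n_fives + 1))   # 1002001 divisors of 10**1000

    def divisors_of_power_of_ten(k):
        # Yield every divisor of 10**k as an exponent pair (a, b) -> 2**a * 5**b.
        for a in range(k + 1):
            for b in range(k + 1):
                yield a, b

    # For small k the divisors can be listed explicitly:
    print(sorted(2**a * 5**b for a, b in divisors_of_power_of_ten(2)))
    # [1, 2, 4, 5, 10, 20, 25, 50, 100]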