Calculate user size to store in PostgreSQL DB [closed]

I'm creating a PostgreSQL DB where I'll store some users, so I need to know the size (in MB) of each user.
This is my reasoning:
Profile picture: JPEG 15 Mb at 75% = up to 1.8 MB
name + surname + work: 20 characters each, so = 60 B
date of birth: timestamp = 8 B
bio: up to 500 characters = 500 B
For a total of approximately 2.5 MB.
So if I have 1 GB of available space in the DB, I will be able to store up to 400 users.
Is it right? Am I missing something?

I would not store the image binary data in the database; that is not a good idea. Store it in Azure Blob Storage and just store the URL to it in the database. Keep the database as small and fast as possible or you will get issues later down the line, e.g. indexing tables with large columns will slow your queries down over time.
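For reference, a minimal PostgreSQL sketch of that layout (all names here are illustrative, not from the question), with the picture kept in blob storage and only its URL stored in the row:

CREATE TABLE app_user (
    id            bigserial PRIMARY KEY,
    name          varchar(20),
    surname       varchar(20),
    work          varchar(20),
    date_of_birth timestamp,    -- 8 bytes, matching the estimate above (a plain date would be 4 bytes)
    bio           varchar(500),
    picture_url   text          -- e.g. an Azure Blob Storage URL, typically well under 1 KB
);

With the image outside the database, each row stays well under 1 KB, so 1 GB of database space holds far more than 400 users; the real capacity limit moves to the blob storage.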

Related

I need some storage ideas and tips for optimization [closed]

I have a large JSONL file (a text file with one JSON object per line) that is over 45 GB in size. Each JSON object contains a company name which I'll query from a web application. Users will type a name into a search box and results have to appear in less than a second. So far I have tried Cosmos DB with the company's city as the partition key, but that is pretty slow and probably also expensive. Now I'm using SQL Server in Azure with a FULL TEXT index on the name column. My table looks like this:
Company ID | Name | JSON
One idea would be to use the "General Purpose" tier in Azure with 4 vCores. That's about 700 EUR/month and the search is <100 ms most of the time (though sometimes my web app times out with queries taking >30 s). But is there a way to further improve this? Maybe there's a better storage option that I'm not considering, or another Azure tier besides "General Purpose" that's a better fit for this scenario? I'm not an expert in SQL Server, so I would appreciate some ideas on how to optimize this besides using the FULL TEXT catalog. I don't know if it matters, but I want to mention that the data set is only updated every 3 months, so it doesn't change often.
Here's how I'm querying the database:
SELECT TOP(5) Id, Name
FROM Companies
WHERE
CONTAINS(Name,'%abc%')
The output with "STATISTICS TIME ON":
SQL Server parse and compile time:
CPU time = 15 ms, elapsed time = 49 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(5 rows affected)
SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 1083 ms.
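As a side note, CONTAINS does not treat % as a LIKE-style wildcard; the full-text way to match the start of a word is a prefix term. A minimal sketch against the same Companies table (the search term "abc" is just the example from the question):

SELECT TOP (5) Id, Name
FROM Companies
WHERE CONTAINS(Name, '"abc*"');   -- prefix term: matches words that start with "abc"

A true infix search (abc anywhere inside a word) cannot be served by the full-text index at all; that requires LIKE '%abc%', which scans, or a precomputed n-gram style lookup table.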

How to make a correct function in dynamic programming [closed]

I have the following problem in dynamic programming.
A person has a time machine and he can move through time either 1 year or 2 years at a time. At the beginning he is at year 0 and he wants to reach year 100. For every step he makes (1 or 2 years) he pays some fixed fee. There is an array of 100 integers representing the fee he needs to pay if he passes through a specific year.
I need to find the minimum amount the person can pay to go from year 0 to year 100 using dynamic programming.
From what I have done so far I think that there should be something like
minCost(i) = min{A[i-1], A[i-2]}
and the base cases are years 1 and 2, which cost A[1] and A[2] respectively. But I think this approach is more of a greedy algorithm than dynamic programming.
I have seen the dynamic programming solution to bin packing and I understood it and the matrix that represents it.
What should the matrix for the problem above look like?
And how should I build the function and the pseudocode for this problem?
You are almost there.
Think about how you will reach the i-th year from the (i-1)-th year and the (i-2)-th year. There is a fee you are forgetting to take into consideration.
MinCostToReachYear(i) = min( MinCostToReachYear(i-1) + fee(i-1), MinCostToReachYear(i-2) + fee(i-2) )
You already know the base cases, year 1 and year 2. Can you think of extrapolating from them with a for loop, or more easily with a recursive function, using the recurrence above? I leave it as an exercise for you.
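For intuition, here is the recurrence worked by hand on a toy instance with 4 years and made-up fees A[1..4] = 3, 1, 4, 2:

MinCostToReachYear(1) = A[1] = 3
MinCostToReachYear(2) = A[2] = 1
MinCostToReachYear(3) = min( MinCostToReachYear(2) + fee(2), MinCostToReachYear(1) + fee(1) ) = min(1 + 1, 3 + 3) = 2
MinCostToReachYear(4) = min( MinCostToReachYear(3) + fee(3), MinCostToReachYear(2) + fee(2) ) = min(2 + 4, 1 + 1) = 2

The loop (or memoised recursion) just fills this table from left to right, and the answer to the original problem is MinCostToReachYear(100).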

How to find factors of a very big number [closed]

I need to find factors of a very big number, say 10^1000. I.e. if the input is 100 then the output should be 10 10, because 10*10 = 100. This is very simple if N fits in a long, but I want to know how it is possible to find the factors of a very big number such as 10^1000. Also, I can't use BigInteger.
1) As has been pointed out, factoring large numbers is hard. It is in fact sufficiently hard that it's the basis for RSA public key cryptography; in other words, every time you buy something online you are counting on the fact that it's hard to factor numbers on the order of 2^2048 (given that 2^10 = 1024, which is about 10^3, 2^2048 is about 10^600). While RSA specifically uses two large prime numbers and your random N may have lots of small factors that will help somewhat, I wouldn't count on being able to factor 10^1000 +/- some random value anytime soon.
2) You can definitely reimplement a big number library using strings [source: I had a classmate who did it before we learned how to do big number math], but it's going to be painfully slow, and you basically have to cast your strings back to ints each time. A slightly less painful approach, if you want to reimplement big numbers, is arrays of integers. You still need some extra steps, but for at least basic math it's not super difficult. (It still won't be as efficient as specialized big number libraries, which can use clever algorithms. For example, multiplying 2 big numbers the straightforward way: let A = P * 2^32 + Q (i.e. A is a 64-bit number represented as an array of two 32-bit numbers) and B = R * 2^32 + S; the straightforward product takes 4 multiplications plus some additions plus some dealing with carries, as written out after this list. As the size of the big numbers increases, there are ways (see e.g. http://en.wikipedia.org/wiki/Karatsuba_algorithm) to reduce the number of multiplications required.)
3) There are algorithms that factor numbers more efficiently than trial division, but the current ones are still not going to factor numbers of the size you're asking about before the heat death of the universe.
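To make the "4 multiplications" in point 2 concrete, with A = P * 2^32 + Q and B = R * 2^32 + S the schoolbook product expands to:

A * B = (P * 2^32 + Q) * (R * 2^32 + S)
      = P*R * 2^64 + (P*S + Q*R) * 2^32 + Q*S

That is four 32-bit multiplications (P*R, P*S, Q*R, Q*S) plus shifts, additions and carry handling. Karatsuba's observation is that the middle term can be obtained as (P + Q)*(R + S) - P*R - Q*S, which needs only three multiplications in total; applied recursively to numbers split into many limbs, that saving becomes substantial.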
10^1000 has exactly 1,002,001 integer divisors, and they should be very easy to find with a bit of thinking. The prime factorisation is
2 * 2 * 2 * ... * 5 * 5 * 5
with exactly 1,000 twos and exactly 1,000 fives.
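The count follows from the standard divisor-counting formula: if n = p^a * q^b for distinct primes p and q, then n has (a + 1)(b + 1) divisors. Here:

d(10^1000) = d(2^1000 * 5^1000) = (1000 + 1) * (1000 + 1) = 1001^2 = 1,002,001

Every divisor has the form 2^i * 5^j with 0 <= i, j <= 1000, so listing them is a double loop, not a factoring problem.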

What are typical lengths of chat message and comment in database? [closed]

I need to create a column in a SQL Server database. Entries in that column will contain messages from a chat. Previously such messages have been stored as comments.
My main question is:
What is typical text length for chat message and comment?
By the way:
What would happen if I used varchar(max)? How would it impact database size and performance? Is it better to use powers of 2 or powers of 10 (e.g. 128 instead of 100) when choosing text lengths?
Using VARCHAR(MAX) has a disadvantage: you cannot define an index over this column.
Generally, your application should impose a maximum length for a chat message. How big that limit is depends very much on what the application is used for, but anything over 1000 bytes is probably less a legitimate message than an attempt to disrupt your service.
Whether your maximum length is a power of 2, a power of 10 or any other value has no influence on performance, as long as the row fits in one (8 KB) page.
Short answer - it doesn't matter.
From MSDN:
The storage size is the actual length of the data entered + 2 bytes.
So VARCHAR(10) and VARCHAR(8000) will consume the same amount of space if the values don't exceed 10 characters.
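A minimal T-SQL sketch of that point (table name, column names and values are made up):

CREATE TABLE dbo.ChatMessage (
    Id      int IDENTITY PRIMARY KEY,
    Msg10   varchar(10),
    Msg8000 varchar(8000)
);

INSERT INTO dbo.ChatMessage (Msg10, Msg8000) VALUES ('hello', 'hello');

SELECT DATALENGTH(Msg10)   AS Bytes10,    -- 5
       DATALENGTH(Msg8000) AS Bytes8000   -- also 5: the declared maximum doesn't change the storage used
FROM dbo.ChatMessage;

The declared length only acts as an upper bound (and documents what the application allows); the actual storage is the data length plus the 2-byte overhead mentioned above.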
Definitely use NVARCHAR(MAX)/VARCHAR(MAX); it can grow to 2 GB (if I remember correctly). It only grows as required, though, so it is very efficient with regard to space unless you are only storing very small amounts of data.

Advertised disk space vs actual disk space [closed]

Why is it that advertised disk space is almost always higher than the disk space reported by the UI? For example, I have an "80 GB" hard drive, but the iTunes UI indicates only 74. I usually see this as well with hard disks and the amount reported under the drive letter.
There are 3 reasons why the amount of space you can actually use is different from that listed for the drive, all of which work against you:
Hard drive manufacturers treat 1GB as one billion bytes, while the operating system calls it 1,073,741,824 bytes (1000 * 1000 * 1000 vs 1024 * 1024 * 1024).
You lose some space for file tables when formatting.
Disk space is divided into chunks larger than 1 byte (typically 4K). Using typical Windows defaults, a 1 byte file takes up 4K of space on disk.
Of these, the first two can influence the amount of space reported by the drive (though IIRC the 2nd one was more of an issue with FAT32 than NTFS). The last one only influences the amount of free space remaining, but will still prevent you from using the full capacity of your drive.
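Worked through for the 80 GB drive in the question:

80 GB advertised = 80,000,000,000 bytes
80,000,000,000 / 1,073,741,824 (bytes per binary GB, 2^30) ≈ 74.5

which is why the UI reports roughly 74 GB even before any formatting overhead is taken into account.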
It's down to the way the OS calculates space versus the way the hard drive manufacturers do:
OS: 1 MB = 1024 KB
Vendor: 1 MB = 1000 KB
The vendor will always use the *1000 definition because it makes the advertised numbers bigger.
The main culprit is using base 10 vs. base 2 to list the storage size. It effectively becomes a rounding error.
There is a movement to try and list storage size with base 2 values instead of base 10 to reflect the true size.
It's the difference between the standard (SI) prefixes (giga, mega, kilo, etc.) which are multiples of 1000 and the binary prefixes which are multiples of 1024.
Marketing considers 80 gigabytes to be 80,000,000,000 bytes. The OS considers 80 gigabytes to be 85,899,345,920 bytes.
http://www.google.com/search?q=80000000000+bytes+in+GB
Usually due to some partitioned space that the OS or some software takes and hides for backup or system purposes.
Say one manufacturer considers a MB to be 1024 KB while others use 1000 KB; similarly for GB, some say 1024 MB and others 1000 MB. Also, the advertised figure refers to the unformatted size; formatting takes up some space.
Additionally, manufacturers often advertise capacities as slightly inaccurate round numbers, which results in further differences. You can see this in the disclaimer text on the outside of most hard drive boxes!