Recommended VLF File Count in SQL Server

What is the recommended VLF file count for a 120 GB database in SQL Server?
I'd appreciate a quick response from anyone.
Thanks,
Govarthanan

There are many excellent articles on managing VLFs in SQL Server, but the crux of all of them is: it depends on your workload!
Some people need really quick recovery, and for them pre-allocating a large log file up front is better.
DB size and VLF count are not really correlated.
You may have a small DB and still do a large amount of updates. Imagine a DB storing daily stock values: it deletes all data every night and inserts fresh data every day. That generates a lot of log activity but may barely affect the mdf file size.
Here's an article about VLF auto-growth settings. Quoting the important section:
Up to 2014, the algorithm for how many VLFs you get when you create, grow, or auto-grow the log is based on the size in question:
Less than 1 MB, complicated, ignore this case.
Up to 64 MB: 4 new VLFs, each roughly 1/4 the size of the growth
64 MB to 1 GB: 8 new VLFs, each roughly 1/8 the size of the growth
More than 1 GB: 16 new VLFs, each roughly 1/16 the size of the growth
So if you created your log at 1 GB and it auto-grew in chunks of 512 MB to 200 GB, you’d have 8 + ((200 – 1) x 2 x 8) = 3192 VLFs. (8 VLFs from the initial creation, then 200 – 1 = 199 GB of growth at 512 MB per auto-grow = 398 auto-growths, each producing 8 VLFs.)
IMHO, 3000+ VLFs is not a huge number, but it is getting into alarming territory. Since you have some idea of your DB size, and assuming you know that your logs typically run about n times your DB size, you can choose auto-growth settings that keep your VLF count in a range you are comfortable with.
I personally would be comfortable starting at 10 GB with 5 GB auto-growth.
So for 120 GB of logs (n = 1) this gives me 16 + 22*16 = 368 VLFs.
And if my logs grow to 500 GB, I'll have 16 + 98*16 = 1584 VLFs.
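As a rough sketch of how you might check and apply this (assuming SQL Server 2016 SP2 or later for sys.dm_db_log_info; on older versions DBCC LOGINFO returns a similar row-per-VLF result; the database and logical log file names below are placeholders):
-- Count the current VLFs (sys.dm_db_log_info returns one row per VLF).
SELECT COUNT(*) AS vlf_count
FROM sys.dm_db_log_info(DB_ID('MyDatabase'));

-- Pre-size the log and set a fixed auto-growth increment,
-- e.g. the 10 GB initial size with 5 GB growth suggested above.
ALTER DATABASE MyDatabase
MODIFY FILE (NAME = N'MyDatabase_log', SIZE = 10GB, FILEGROWTH = 5GB);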

Related

Stored procedure returning 17 million rows throws "out of memory" while accessing the dataset in Delphi

I'm using Delphi 6 to develop a Windows application, and I have a stored procedure that returns around 17 million rows. It takes 3 to 4 minutes to return the data in SQL Server Management Studio.
And I'm getting an "out of memory" exception when I try to access the result dataset. I'm thinking the sp.execute might not have executed fully. Do I need to follow any steps to fix this, or shall I use sleep() to fix this issue?
Delphi 6 can only compile 32-bit executables.
32-bit executables running on 32-bit Windows have a memory limit of 2 GiB. This can be extended to 3 GiB with the /3GB boot switch.
32-bit executables running on 64-bit Windows have the same 2 GiB limit. Using the "large address aware" flag they can address at most 4 GiB of memory.
32-bit Windows executables emulated via WINE under Linux or Unix cannot overcome this either, because 32 bits can store at most the number 4,294,967,295 = 2³² - 1, so the logical limit is 4 GiB in any case.
Fitting 17 million records into the roughly 1.9 GiB currently available means that 1.9 * 1024 * 1024 * 1024 = 2,040,109,465 bytes divided by 17,000,000 gives a mean of just 120 bytes per record. I can hardly imagine that is enough. And that would only be the gross payload; memory for variables is still needed. Even if you managed to put everything into large arrays, you'd still need plenty of overhead memory.
Your software design is wrong. As James Z and Ken White already pointed out: there can't be a scenario where you need all those records at once, much less one where the user views them all at once. I feel sorry for the poor souls who have had to use that software; who knows what else is misconceived in there. The memory consumption should remain at sane levels.
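If the client really cannot be redesigned around aggregates, one common workaround is to page the result set on the server instead of pulling all 17 million rows at once. A minimal T-SQL sketch, assuming a hypothetical table and key column (OFFSET/FETCH requires SQL Server 2012 or later; on older versions a TOP-with-key-filter approach is needed):
-- Hypothetical paging query: fetch 50,000 rows per request instead of 17M at once.
DECLARE @PageSize   int = 50000;
DECLARE @PageNumber int = 0;      -- the client increments this per request

SELECT *
FROM dbo.BigResultTable           -- placeholder for the real source of the rows
ORDER BY Id                       -- placeholder key column; must give a stable order
OFFSET @PageSize * @PageNumber ROWS
FETCH NEXT @PageSize ROWS ONLY;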

Strangely large processing size in BigQuery query on partitioned table

I have just noticed some strange processing size behavior while running BigQuery queries on a Timestamp-partitioned table. We have a table with about 70 devices streaming inserts once per minute, so about 70 inserts/minute or 4,200 inserts/hour. Showing a small example is the easiest way to describe the problem. In the BigQuery UI on GCP, if I query one day of data:
select * from dataset.table where DATE(time) = '2020-09-21';
it says
This query will process 3.2 GB when run.
However, if I query 7 days of data:
select * from dataset.table where DATE(time) >= '2020-09-15' and DATE(time) <= '2020-09-21';
it says
This query will process 3.3 GB when run.
Through some experimentation, I found that there is about a 14 MB increase in processing size for each additional day, which I expect is the true size of each partition.
What is even stranger is that this extra size seems to grow disproportionately with the size of the table. As an example, I created a new table containing only 10 of the 70 devices and added a few months of data from those devices. When I ran the same queries, the query processed 1.4 MB for a 1-day query and 8.9 MB for a 7-day query, an increase of about 1.07 MB/day, meaning this "extra size" is only about 33 KB, or about 2,300x smaller than for the table with all devices.
So my question is: what is this large extra processing size, and why is it there? Is there something related to streaming inserts or caching that I'm missing?
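One way to sanity-check the ~14 MB/day estimate is to look at the partition metadata directly rather than at the query estimator. A sketch, assuming the dataset and table names from the question and that the INFORMATION_SCHEMA.PARTITIONS view is available for the table:
-- List the stored size of each daily partition of the table.
SELECT
  partition_id,
  total_rows,
  ROUND(total_logical_bytes / POW(1024, 2), 1) AS logical_mb
FROM `dataset.INFORMATION_SCHEMA.PARTITIONS`
WHERE table_name = 'table'
ORDER BY partition_id DESC;
Rows that are still in the streaming buffer typically show up under a special __UNPARTITIONED__ partition in this view, which may or may not be related to what you are seeing.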

Neo4j configuration for 4M nodes and 10M relationships

I am new to Neo4j and have run a few graph queries against 4M nodes and 10M relationships. So far I have been completely surprised by the performance of my queries.
SCHEMA
.......
(a:user{data:1})-[:follow]->(:user)-[:next*1..10]-(:activity)
Here the user with data:1 follows another 100,000 users. Each of those 100,000 users has 2-8 next nodes (let's say user activities) attached. Now I want to fetch the activities of those users down to level 3 [:next*1..3]. Each activity has a relevance number property.
So now I have 100,000 * 3 nodes to traverse.
CYPHER
.......
match (u:user{data:1})-[:follow]-(:user)-[:next*1..3]-(a:activity)
return a order by a.relevance desc limit 50
This query takes about 72,000 ms almost every time. I am new to Neo4j, and I am sure I haven't done any OS tuning.
I am using the following parameters:
Initial Java Heap Size (in MB)
wrapper.java.initmemory=2000
Maximum Java Heap Size (in MB)
wrapper.java.maxmemory=2456
Default values for the low-level graph engine
neostore.nodestore.db.mapped_memory=25M
neostore.relationshipstore.db.mapped_memory=50M
neostore.propertystore.db.mapped_memory=90M
neostore.propertystore.db.strings.mapped_memory=130M
neostore.propertystore.db.arrays.mapped_memory=130M
Please tell me where I am going wrong. I read all the documentation on the neo4j website, but the query time didn't improve.
Please tell me how I can configure a high-performing cache. What should I do so that the whole graph loads into memory? When I look at my RAM usage, it is always around 1.8 GB out of 4 GB. I am using an enterprise license on Windows (Neo4j 2.0). Please help.
You are actually following not 100k * 3 paths but 100k * (2-10)^10, i.e. on the order of 10^15 paths.
More memory in your machine would make a lot of sense, so try to get 8 GB or more.
Then you can increase the heap, e.g. to 6GB:
wrapper.java.initmemory=6000
wrapper.java.maxmemory=6000
neo4j.properties
neostore.nodestore.db.mapped_memory=100M
neostore.relationshipstore.db.mapped_memory=500M
neostore.propertystore.db.mapped_memory=200M
neostore.propertystore.db.strings.mapped_memory=200M
neostore.propertystore.db.arrays.mapped_memory=10M
If you want to pull your data through anyway, you would most probably want to invert your query:
match (a:activity), (u:user{data:1})
with a, u
order by a.relevance desc
limit 100
match (followed:user)-[:next*1..3]-(a)
where (followed)-[:follow]-(u)
return a
order by a.relevance desc
limit 50

Should I force MS SQL to consume x amount of memory?

I'm trying to get better performance out of our MS SQL database. One thing I noticed is that the instance is taking up about 20 GB of RAM, and the database in question is taking 19 GB of that 20. Why isn't the instance consuming most of the 32 GB that is on the box? Also, the size of the DB is a lot larger than 32 GB, so it being smaller than the available RAM is not the issue. I was thinking of setting the min server memory to 28 GB or something along those lines; any thoughts? I didn't find anything on the interwebs that threw up red flags about this idea. This is on a VM (VMware). I verified that the host is not overcommitting memory. Also, I do not have access to the host.
This is the query I ran to find out how much buffer pool memory each database is consuming:
SELECT DB_NAME(database_id) AS DatabaseName,
       COUNT(*) * 8 / 1024 AS MBUsed
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY MBUsed DESC;
If data is sitting on disk, but hasn't been requested by a query since the service has started, then there would be no reason for SQL Server to put those rows into the buffer cache, thus the size on disk would be larger than the size in memory.
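If you do decide to pin memory, the usual approach is to set explicit bounds via sp_configure rather than relying on defaults. A sketch with illustrative values only (not a recommendation for your box; leave enough headroom for the OS and anything else running on the VM):
-- Illustrative values only; tune for your own VM.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

EXEC sp_configure 'min server memory (MB)', 20480;  -- 20 GB floor
EXEC sp_configure 'max server memory (MB)', 28672;  -- 28 GB ceiling, leaving ~4 GB for the OS
RECONFIGURE;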

Interpreting and fine-tuning the BATCHSIZE parameter?

So I am playing around with the BULK INSERT statement and am beginning to love it. What was taking the SQL Server Import/Export Wizard 7 hours takes only 1-3 hours using BULK INSERT. However, what I am observing is that the time to completion depends heavily on the BATCHSIZE specification.
Following are the times I observed for a 5.7 GB file containing 50 million records:
BATCHSIZE = 50000, Time Taken: 17.30 mins
BATCHSIZE = 10000, Time Taken: 14:00 mins
BATCHSIZE = 5000 , Time Taken: 15:00 mins
This just makes me curious: is it possible to determine a good number for BATCHSIZE, and if so, what factors does it depend on? Can it be approximated without having to run the same query tens of times?
My next run will be a 70 GB file containing 780 million records. Any suggestions would be appreciated. I will report back the results once I finish.
There is some information here, and it appears the batch size should be as large as is practical; the documentation states that, in general, the larger the batch size the better the performance, but you are not experiencing that at all. It seems that 10k is a good batch size to start with, but I would also look at optimizing the bulk insert from other angles, such as putting the database into the simple recovery model or specifying a TABLOCK hint during your import.
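For reference, a minimal sketch of the kind of statement being tuned, with hypothetical file, table, and delimiter values (BATCHSIZE and TABLOCK are the knobs discussed above):
-- Hypothetical bulk load; path, table name, and delimiters are placeholders.
BULK INSERT dbo.TargetTable
FROM 'D:\data\bigfile.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    BATCHSIZE       = 10000,   -- the value that performed best in the tests above
    TABLOCK                    -- enables minimal logging under simple/bulk-logged recovery
);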