Stored procedure returning 17 million rows throws "out of memory" when accessing the dataset in Delphi

I'm using Delphi 6 to develop a Windows application, and I have a stored procedure that returns around 17 million rows. It takes 3 to 4 minutes to return the data in SQL Server Management Studio.
I'm getting an "out of memory" exception when I try to access the result dataset. I suspect that sp.Execute may not have executed fully. Do I need to follow any steps to fix this, or should I use Sleep() to fix the issue?

Delphi 6 can only compile 32-bit executables.
32-bit executables running on 32-bit Windows have a memory limit of 2 GiB. This can be extended to 3 GiB with the /3GB boot switch.
32-bit executables running on 64-bit Windows have the same memory limit of 2 GiB. With the "large address aware" flag they can address at most 4 GiB of memory.
32-bit Windows executables emulated via Wine under Linux or Unix cannot overcome this either, because 32 bits can store at most the number 4,294,967,295 = 2³² - 1, so the logical limit is 4 GiB any way you look at it.
Wanting 17 million records with currently 1.9 GiB of memory means that 1.9 * 1024 * 1024 * 1024 = 2,040,109,465 bytes divided by 17,000,000 gives a mean of just 120 bytes per record. I can hardly imagine that is enough. And that would only be the gross payload; memory for variables is still needed. Even if you managed to put the data into large arrays, you'd still need plenty of overhead memory for variables.
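To make that arithmetic concrete, here is a quick back-of-the-envelope check (Python used purely as a calculator; the 1.9 GiB figure comes from the paragraph above):

# How many bytes per record does ~1.9 GiB of address space leave
# for 17 million records? (Numbers from the paragraph above.)
available_bytes = int(1.9 * 1024 * 1024 * 1024)   # 2,040,109,465 bytes
records = 17_000_000
print(available_bytes // records)                  # -> 120 bytes per record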
Your software design is wrong. As James Z and Ken White already pointed out: there can't be a scenario where you need all those records at once, much less for the user to view them all at once. I feel sorry for the poor souls who have had to use that software - who knows what else is misconceived there. The memory consumption should remain at sane levels.

Recommended VLF File Count in SQL Server

What is the recommended VLF file count for a 120 GB database in SQL Server?
I'd appreciate a quick response.
Thanks,
Govarthanan
There are many excellent articles on managing VLFs in SQL Server, but the crux of all of them is: it depends on you!
Some people may need really quick recovery, and allocating a large VLF upfront is better.
DB size and VLFs are not really correlated.
You may have a small DB but be doing a large amount of updates. Imagine a DB storing daily stock values: it deletes all data every night and inserts new data into the tables every day! This will generate a large amount of log data but may not impact the mdf file size.
Here's an article about VLF auto-growth settings. Quoting the important section:
Up to 2014, the algorithm for how many VLFs you get when you create, grow, or auto-grow the log is based on the size in question:
Less than 1 MB, complicated, ignore this case.
Up to 64 MB: 4 new VLFs, each roughly 1/4 the size of the growth
64 MB to 1 GB: 8 new VLFs, each roughly 1/8 the size of the growth
More than 1 GB: 16 new VLFs, each roughly 1/16 the size of the growth
So if you created your log at 1 GB and it auto-grew in chunks of 512 MB to 200 GB, you’d have 8 + ((200 – 1) x 2 x 8) = 3192 VLFs. (8 VLFs from the initial creation, then 200 – 1 = 199 GB of growth at 512 MB per auto-grow = 398 auto-growths, each producing 8 VLFs.)
IMHO 3000+ VLFs is not a disastrous number, but it is alarming. Since you have some idea of your DB size, and assuming you know that your logs typically grow to approximately n times your DB size, you can put in the right auto-growth settings to keep your VLFs in a range you are comfortable with.
I personally would be comfortable with a setting of 10 GB initial size and 5 GB auto-growth.
So for 120 GB of logs (n = 1) this will give me 16 + 22 * 16 = 368 VLFs.
And if my logs go up to 500 GB, then I'll have 16 + 98 * 16 = 1584 VLFs.
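As a rough sanity check, here is a small sketch of my own (not an official formula) that applies the pre-2014 algorithm quoted above; it reproduces the 368 and 1584 figures as well as the 3192 from the article's example:

# Estimate VLF count from initial log size, auto-growth increment and
# final log size, using the pre-2014 algorithm quoted above (sizes in MB).
def vlfs_per_event(size_mb):
    if size_mb <= 64:
        return 4
    elif size_mb <= 1024:
        return 8
    else:
        return 16

def estimate_vlf_count(initial_mb, growth_mb, final_mb):
    total = vlfs_per_event(initial_mb)                      # created with the log
    growths = max(0, final_mb - initial_mb) // growth_mb    # auto-grow events
    return total + growths * vlfs_per_event(growth_mb)

print(estimate_vlf_count(10 * 1024, 5 * 1024, 120 * 1024))  # -> 368
print(estimate_vlf_count(10 * 1024, 5 * 1024, 500 * 1024))  # -> 1584
print(estimate_vlf_count(1024, 512, 200 * 1024))            # -> 3192 (article example)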

What is the average consumption of a GPS app (data-wise)?

I'm currently working on a school project to design a network, and we're asked to assess traffic on the network. In our solution (dealing with taxi drivers), each driver will have a smartphone that is used to track his position so that he can be assigned the best ride possible (through Google Maps, for instance).
What would be the size of data sent and received by a single app during one day? (I need a rough estimate, no real need for a precise answer to the closest bit)
Thanks
GPS positions stored compactly, but not compressed, need this number of bytes:
time: 8 (4 bytes is possible, too)
latitude: 4 (as integer or float) or 8 (as double)
longitude: 4 or 8
speed: 2-4 (2 as short, 4 as integer)
course: 2-4
So, stored in binary in main memory, one location including the most important attributes needs 20-24 bytes.
If you store each one as its own location object, an additional ~16 bytes of object overhead per location are needed in a simple (Java) solution.
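As a minimal illustration of the compact binary layout described above (field sizes assumed from the list; Python's struct module used just to show the byte count):

import struct, time

# One GPS fix packed as: 8-byte time, 4-byte lat, 4-byte lon,
# 2-byte speed, 2-byte course (little-endian, no padding).
fix = struct.pack("<dffhh",
                  time.time(),   # time
                  48.8566,       # latitude
                  2.3522,        # longitude
                  50,            # speed, e.g. km/h
                  270)           # course, degrees
print(len(fix))                  # -> 20 bytes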
The maximum recording frequency is usually once per second (1/s). Per hour this needs 3600 s * 40 bytes = 144 kB, so a smartphone easily stores that, even in main memory.
Not sure if you want to transmit the data: when transmitting it to a server, the data volume usually grows, depending on the transmission protocol used.
But it mainly depends on how you transmit the data and how often.
If you transmit one position every 5 minutes, you don't have to care, even
if you use a simple solution that transmits 100 times more bytes than necessary.
For your school project, try not to transmit more often than every 5, or better 10, minutes.
Encryption adds a huge overhead.
To save bytes:
- Collect positions for as long as feasible, then transmit them all at once.
- Favor binary protocols over text-based ones (BSON is better than JSON); this might be out of scope for your school project.
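Putting the numbers above together, a rough per-driver, per-day estimate might look like this (the 10-hour shift, ~100-byte transmitted record and 50% protocol overhead are my assumptions, not figures from the answer):

RECORD_BYTES = 40        # compact fix plus object overhead, from above
HOURS_PER_DAY = 10       # assumed length of a driver's shift

# Stored locally at one fix per second
stored = RECORD_BYTES * 3600 * HOURS_PER_DAY
print(stored / 1024)     # -> ~1406 kB per day

# Transmitted once every 5 minutes as a verbose text record
JSON_BYTES = 100         # assumed size of an uncompressed JSON fix
OVERHEAD = 1.5           # assumed protocol/TLS overhead
transmissions = HOURS_PER_DAY * 60 // 5
print(transmissions * JSON_BYTES * OVERHEAD / 1024)   # -> ~17.6 kB per day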

Should I force MS SQL to consume x amount of memory?

I'm trying to get better performance out of our MS SQL database. One thing I noticed is that the instance is taking up about 20 GB of RAM, and the database in question is taking 19 GB of that 20. Why isn't the instance consuming most of the 32 GB that is on the box? Also, the size of the DB is a lot larger than 32 GB, so it being smaller than the available RAM is not the issue. I was thinking of setting the min server memory to 28 GB or something along those lines; any thoughts? I didn't find anything on the interwebs that threw up red flags on this idea. This is on a VM (VMware). I verified that the host is not overcommitting memory. Also, I do not have access to the host.
This is the query I ran to find out what each database was consuming:
-- Buffer pool pages are 8 KB each, so pages * 8 / 1024 = MB per database
SELECT DB_NAME(database_id) AS DatabaseName,
       COUNT(*) * 8 / 1024 AS MBUsed
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY COUNT(*) * 8 / 1024 DESC;
If data is sitting on disk, but hasn't been requested by a query since the service has started, then there would be no reason for SQL Server to put those rows into the buffer cache, thus the size on disk would be larger than the size in memory.

Redis - configure parameters vm-page-size and vm-pages

I'm using Redis and am currently setting the parameters in redis.conf for using virtual memory.
I have 18 million keys (max 25 chars), each stored as a hash with 4 fields (maximum 256 chars each).
My server has 16 GB of RAM.
I wonder how to optimize the parameters vm-page-size (more than 64?) and vm-pages.
Any ideas? Thanks.
You probably don't need to in this case - your usage is pretty close to standard. It's only when your values are large ( > ~4k iirc) that you can run into issues with insufficient contiguous space.
Also, with 16GB available there won't be much swapping happening, which makes the vm config a lot less important.
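For a quick sanity check against the ~4 KB threshold mentioned above (worst-case sizes taken from the question):

# Worst-case payload per key: a 25-char key plus a hash of 4 fields
# of up to 256 chars each (figures from the question).
key_bytes = 25
value_bytes = 4 * 256
print(key_bytes + value_bytes)   # -> 1049 bytes, well below the ~4 KB threshold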

What makes Apple's PowerPC memcpy so fast?

I've written several copy functions in search of a good memory strategy on PowerPC. Using the Altivec or fp registers with cache hints (dcb*) doubles the performance over a simple byte copy loop for large data. Initially pleased with that, I threw in a regular memcpy to see how it compared... 10x faster than my best! I have no intention of rewriting memcpy, but I do hope to learn from it and accelerate several simple image filters that spend most of their time moving pixels to and from memory.
Shark analysis reveals that their inner loop uses dcbt to prefetch, with 4 vector reads, then 4 vector writes. After tweaking my best function to also haul 64 bytes per iteration, the performance advantage of memcpy is still embarrassing. I'm using dcbz to free up bandwidth, Apple uses nothing, but both codes tend to hesitate on stores.
# prefetch
dcbt   future
dcbt   distant future
# load stuff
lvx    image
lvx    image + 16
lvx    image + 32
lvx    image + 48
image += 64
# prepare to store
dcbz   filtered
dcbz   filtered + 32
# store stuff
stvxl  filtered
stvxl  filtered + 16
stvxl  filtered + 32
stvxl  filtered + 48
filtered += 64
# repeat
Does anyone have some ideas on why very similar code has such a dramatic performance gap? I'd love to marinate the real image filters in whatever secret sauce memcpy is using!
Additional info: All data is vector aligned. I'm making filtered copies of the image, not replacing the original. The code runs on PowerPC G4, G5, and Cell PPU. The Cell SPU version is already insanely fast.
Shark analysis reveals that their inner loop uses dcbt to prefetch, with 4 vector reads, then 4 vector writes. After tweaking my best function to also haul 64 bytes per iteration
I may be stating the obvious, but since you don't mention the following at all in your question, it may be worth pointing it out:
I would bet that Apple's choice of 4 vector reads followed by 4 vector writes has as much to do with the G5's pipeline and its management of out-of-order instruction execution in "dispatch groups" as it has with a magical 64-byte perfect line size. Did you notice the line skips in Nick Bastin's linked bcopy.s? They mean that the developer thought about how the instruction stream would be consumed by the G5. If you want to reproduce the same performance, it's not enough to read data 64 bytes at a time; you must also make sure your instruction groups are well filled. (Basically, I remember that instructions can be grouped into dispatch groups of up to five independent instructions, with the first four being non-jump instructions and the fifth only allowed to be a jump. The details are more complicated.)
EDIT: you may also be interested in the following paragraph on the same page:
The dcbz instruction still zeros aligned 32 byte segments of memory as per the G4 and G3. However, since that is not a full cacheline on a G5 it will not have the performance benefits that you were likely hoping for. There is a dcbzl instruction newly introduced for the G5 that zeros a full 128-byte cacheline.
I don't know exactly what you're doing, since I can't see your code, but Apple's secret sauce is here.
Maybe it's because of CPU caching. Try running Cachegrind:
Cachegrind is a cache profiler. It performs detailed simulation of the I1, D1 and L2 caches in your CPU and so can accurately pinpoint the sources of cache misses in your code. It identifies the number of cache misses, memory references and instructions executed for each line of source code, with per-function, per-module and whole-program summaries. It is useful with programs written in any language. Cachegrind runs programs about 20-100x slower than normal.
Still not an answer, but did you verify that memcpy is actually moving the data? Maybe it was just remapped copy-on-write. You would still see the inner memcpy loop in Shark, since parts of the first and last pages are truly copied.
As mentioned in another answer, dcbz, as defined by Apple on the G5, only operates on 32 bytes, so you will lose performance with this instruction on a G5, which has 128-byte cachelines. You need to use dcbzl to prevent the destination cacheline from being fetched from memory (which would effectively cut your useful read memory bandwidth in half).