Can't replace mongo document - pymongo

I am attempting to save documents to a mongoDB cluster (sharded replica sets) and am having a strange issue. I am using pymongo 2.7.2 and TokuMX 1.5 mongodb 2.4.10.
When I attempt to save (overwrite) existing documents I am getting an exception that looks like the document I am saving is too large:
doc = db.collection.find_one()
db.collection.save(doc)
pymongo.errors.OperationFailure: BSONObj size: 18798961 (0x71D91E01) is invalid. Size must be between 0 and 16793600(16MB) First element: op: "u"
However this works fine:
doc = db.collection.find_one()
db.collection.remove({'_id': doc['_id']})
db.collection.save(doc)
The document in question is about 9mb, so it looks like when I attempt to replace the document it is somehow doubling the size of the document, exceeding the 16mb limit.
Any ideas as to what could cause this behavior?

Apparently this is a known issue with TokuMX. Oplog entries are twice the size of the document, so replacing a 9mb document will result in a 18mb oplog entry- which raises the exception.
The solution would be to limit document writes to less than 8mb so that oplog entries never exceed 16mb.

I think this is a side effect of how save is implemented in PyMongo.
Under the hood if the document has a _id then the save(doc) is turned into an update(doc, doc). That is where the doubling is coming into play since the query+update is 18MB.
When you removed the _id you changed the save(doc) into a insert(doc) of a new document with a new _id. I don't think that is what you wanted.
Rather than use save I would recommend constructing a query with just the _id field from the original document and doing the update call manually. I would even go so far as you should enter a Jira ticket to get PyMongo to do this for you.
HTH,
Rob.

Related

syncfusion.pdf.pdfException"Could Not Find valid signature (%pds-).'

string docuAddr = #"C:\Users\psimmon\source\repos\PDFTESTAPP\PDFTESTAPP\TempForms\forms-www.courts.state.co.us-Forms-PDF-JDF1117.pdf";
byte[] bytes = Encoding.Unicode.GetBytes(docuAddr);
PdfLoadedDocument loadedDocument = new PdfLoadedDocument(bytes, true);---blows here
PdfLoadedForm myForm = loadedDocument.Form;
PdfLoadedFormFieldCollection fields = myForm.Fields;
not sure what I have done wrong here, but the PDF file is opening, either in a browser or a fileexployer window. so it has to be me, guessed at most of this, all you very smart folks, I could use your gray matter. forgive my stupidity.
The reported exception “could not find valid signature (%PDF-)” may occurs due to the file is not a PDF document. We suspect it seems the other format files are saved with the “.pdf” extension. We could not open and repair this type of document on our end, we have already added the details in our documentation,
Please find some of the following corrupted error messages that cannot be repaired:
UG: https://help.syncfusion.com/file-formats/pdf/open-and-save-pdf-file-in-c-sharp-vb-net#possible-error-messages-of-invalid-pdf-documents-while-loading
If you want to find this type of corrupted document, Syncfusion PDF Library provides support to check and report whether the existing PDF document is corrupted or not with corruption details and structure-level syntax errors.
UG: https://help.syncfusion.com/file-formats/pdf/working-with-document#find-corrupted-pdf-document
Blog: https://www.syncfusion.com/blogs/post/how-to-find-corrupted-pdf-files-in-c-sharp.aspx
KB: https://www.syncfusion.com/kb/9686/how-to-identify-the-corrupted-pdf-document-using-c-and-vb-net

RavenDb: Document Refresh feature does not run at nor after the specified time by #refresh flag

I need to mark documets as expired after some time and therefore I am trying to use #refresh feature to re-run subscription and to compute my 'expired' flag. I know there is 'Document expiration' feature but this one removes data which I don't want.
I have turned Refresh feature in settings and added #refresh UTC datetime in metadata for required documents. For example I added manually this document:
{
"Name": "My data",
"#metadata": {
"#collection": "Testing",
"#refresh": "2021-04-30T07:41:35.4845961Z"
}
}
It looks like I am facing non deterministic behavior - sometimes refresh is processed sometimes not. I tried with different combinations of times and set through code or via Raven Studio.
Refresh interval is set to refresh but still says "in less than a minute"
I am using
Community license (Document refresh not mentioned here, but I don't see it mentioned for any other licenses as well)
community license extensions
tried more vresions of RavenDB with same result (5.1.7. was looking more promising as it worked for some time but after a while stopped):
4.2.111 server/studio version in Docker on Windows 10
5.1.7 server/studio version
C# RavenDB.Client 5.1.6
Did not find related issue in bug tracker
https://issues.hibernatingrhinos.com/issues/RavenDB?q=document%20refresh
Any ideas what to check or what might be the case?
EDIT: After turned logging into console I found some error log. It looks like
RavendbProject, Raven.Server.Documents.Expiration.ExpiredDocumentsCleaner, Failed to refresh documents on RavendbProject which are older than 05/17/2021 09:48:47, EXCEPTION: System.NullReferenceException: Object reference not set to an instance of an object.
RavendbProject | at Sparrow.Server.ByteStringContext`1.From(String value, ByteStringType type, ByteString& str) in C:\Builds\RavenDB-Stable-5.1\51024\src\Sparrow.Server\ByteString.cs:line 1297
RavendbProject | at Raven.Server.Documents.DocumentPutAction.PutDocument(DocumentsOperationContext context, String id, String expectedChangeVector, BlittableJsonReaderObject document, Nullable`1 lastModifiedTicks, String changeVector, DocumentFlags flags, NonPersistentDocumentFlags nonPersistentFlags) in C:\Builds\RavenDB-Stable-5.1\51024\src\Raven.Server\Documents\DocumentPutAction.cs:line 190
Also worth mentioning is that my document was stored in ClusterWide transaction and thus I can see in one of my documents corresponding flag:
"#flags": "FromClusterTransaction",
My current suspicion is that it may happen that one of these documents prevented other documents from being refreshed. After deleting cluster-transaction document, other documents in collection were refreshed
The bug related to document that was added via cluster transaction, the workaround for now would be to not use cluster transaction.
I have opened an issue on bug tracker,
https://issues.hibernatingrhinos.com/issue/RavenDB-16710

Using Leigh version of S3Wrapper.cfc Can't get past Init

I am new to S3 and need to use it for image storage. I found a half dozen versions of an s2wrapper for cf but it appears that the only one set of for v4 is one modified by Leigh
https://gist.github.com/Leigh-/26993ed79c956c9309a9dfe40f1fce29
Dropped in the com directory and created a "test" page that contains the following code:
s3 = createObject('component','com.S3Wrapper').init(application.s3.AccessKeyId,application.s3.SecretAccessKey);
but got the following error :
So I changed the line 37 from
variables.Sv4Util = createObject('component', 'Sv4').init(arguments.S3AccessKey, arguments.S3SecretAccessKey);
to
variables.Sv4Util = createObject('component', 'Sv4Util').init(arguments.S3AccessKey, arguments.S3SecretAccessKey);
Now I am getting:
I feel like going through Leigh code and start changing things is a bad idea since I have lurked here for year an know Leigh's code is solid.
Does any know if there are any examples on how to use this anywhere? If not what I am doing wrong. If it makes a difference I am using Lucee 5 and not Adobe's CF engine.
UPDATE :
I followed Leigh's directions and the error is now gone. I am addedsome more code to my test page which now looks like this :
<cfscript>
s3 = createObject('component','com.S3v4').init(application.s3.AccessKeyId,application.s3.SecretAccessKey);
bucket = "imgbkt.domain.com";
obj = "fake.ping";
region = "s3-us-west-1"
test = s3.getObject(bucket,obj,region);
writeDump(test);
test2 = s3.getObjectLink(bucket,obj,region);
writeDump(test2);
writeDump(s3);
</cfscript>
Regardless of what I put in for bucket, obj or region I get :
JIC I did go to AWS and get new keys:
Leigh if you are still around or anyone how has used one of the s3Wrappers any suggestions or guidance?
UPDATE #2:
Even after Alex's help I am not able to get this to work. The Link I receive from getObjectLink is not valid and getObject never does download an object. I thought I would try the putObject method
test3 = s3.putObject(bucketName=bucket,regionName=region,keyName="favicon.ico");
writeDump(test3);
to see if there is any additional information, I received this :
I did find this article https://shlomoswidler.com/2009/08/amazon-s3-gotcha-using-virtual-host.html but it is pretty old and since S3 specifically suggests using dots in bucketnames I don't that it is relevant any longer. There is obviously something I am doing wrong but I have spent hours trying to resolve this and I can't seem to figure out what it might be.
I will give you a rundown of what the code does:
getObjectLink returns a HTTP URL for the file fake.ping that is found looking in the bucket imgbkt.domain.com of region s3-us-west-1. This link is temporary and expires after 60 seconds by default.
getObject invokes getObjectLink and immediately requests the URL using HTTP GET. The response is then saved to the directory of the S3v4.cfc with the filename fake.ping by default. Finally the function returns the full path of the downloaded file: E:\wwwDevRoot\taa\fake.ping
To save the file in a different location, you would invoke:
downloadPath = 'E:\';
test = s3.getObject(bucket,obj,region,downloadPath);
writeDump(test);
The HTTP request is synchronous, meaning the file will be downloaded completely when the functions returns the filepath.
If you want to access the actual content of the file, you can do this:
test = s3.getObject(bucket,obj,region);
contentAsString = fileRead(test); // returns the file content as string
// or
contentAsBinary = fileReadBinary(test); // returns the content as binary (byte array)
writeDump(contentAsString);
writeDump(contentAsBinary);
(You might want to stream the content if the file is large since fileRead/fileReadBinary reads the whole file into buffer. Use fileOpen to stream the content.
Does that help you?

Replacing or recreating a file in Windows 8 RT keeps the old DateCreated value

I'm attempting to cache data in a file for a Windows Store app, and using the DateCreated value to determine if it is out of date.
I first tried doing this:
var file = await rootFolder.CreateFileAsync(filename, Windows.Storage.CreationCollisionOption.ReplaceExisting);
FileIO.WriteTextAsync(file, contents);
but when it saves the file only the DateModified value is changed, even though the comments for the ReplaceExisting option clearly state that it recreates the file and replaces an existing one.
So I decided to force it to delete the file and recreate it with this:
var file = await rootFolder.CreateFileAsync(filename, Windows.Storage.CreationCollisionOption.ReplaceExisting);
// force delete because windows rt is not doing what it's supposed to in the line above!!
await file.DeleteAsync();
file = await rootFolder.CreateFileAsync(filename);
FileIO.WriteTextAsync(file, contents);
but amazingly, I still get the same result! The file is deleted and recreated with the OLD CREATION DATE!
Is this a bug, or am I doing something wrong here?
This is by design, a feature called "File system tunneling". This KB article explains the behavior and rationale.
The workaround it documents requires registry editing, clearly you cannot rely on that in a Store application. You'll need to find a workaround, like using the last-written timestamp or alternating between two files or keep track of age in a separate file.
Thanks for the comments everyone, it turns out the Modified date IS available but you have to get it through the GetBasicPropertiesAsync() method as shown here: http://msdn.microsoft.com/en-us/library/windows/apps/windows.storage.fileproperties.basicproperties.datemodified.aspx

itext outofmemory error while attempting to count the number of pages in a pdf file

I'm trying to execute the following code:
PdfReader reader = new PdfReader("/path/to/file.pdf");
int pages = reader.getNumberOfPages();
It works on most files, but on one particular file, it crashes with error:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2882)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:572)
at java.lang.StringBuffer.append(StringBuffer.java:320)
at com.itextpdf.text.pdf.PRTokeniser.readString(PRTokeniser.java:158)
at com.itextpdf.text.pdf.PRTokeniser.getStartxref(PRTokeniser.java:224)
at com.itextpdf.text.pdf.PRTokeniser.getStartxref(PRTokeniser.java:229)
...goes on for a while
at com.itextpdf.text.pdf.PRTokeniser.getStartxref(PRTokeniser.java:229)
I know that it's something wrong with the input file. I'm just wondering if there's a way of knowing before attempting to make the method call, that the file is going to cause a problem.
It turns out it was a bug with the version of itext I am using (5.0.1). I logged a query with the developers, and a fix was put in - that I tested - and which hopefully will find it's way into the next version (5.0.2)