Alfresco: unable to backup alf_data - backup

I am an Alfresco 3.3c user with an instance holding more than 4 million objects. I'm starting to have problems with backup, because backing up the alf_data/contentstore folder takes too long even in incremental mode (every run still has to analyze all of those files for changes).
I've noticed that alf_data/contentstore is organized internally per year. Could I assume that the older years (e.g. 2012) are no longer changed? (If so, I can simply exclude those directories from the backup process, obviously after taking a full backup first.)
Thanks, kind regards.

Yes, you can assume that no objects will be created (and items are never updated) in old directories within your content store, although items may be removed by the repository's cleanup jobs after being deleted from Alfresco's trash can.
This is the section from org.alfresco.repo.content.filestore.FileContentStore which generates a new content URL. You can easily see that it always uses the current date and time.
/**
 * Creates a new content URL. This must be supported by all
 * stores that are compatible with Alfresco.
 *
 * @return Returns a new and unique content URL
 */
public static String createNewFileStoreUrl()
{
    Calendar calendar = new GregorianCalendar();
    int year = calendar.get(Calendar.YEAR);
    int month = calendar.get(Calendar.MONTH) + 1;  // 0-based
    int day = calendar.get(Calendar.DAY_OF_MONTH);
    int hour = calendar.get(Calendar.HOUR_OF_DAY);
    int minute = calendar.get(Calendar.MINUTE);
    // create the URL
    StringBuilder sb = new StringBuilder(20);
    sb.append(FileContentStore.STORE_PROTOCOL)
      .append(ContentStore.PROTOCOL_DELIMITER)
      .append(year).append('/')
      .append(month).append('/')
      .append(day).append('/')
      .append(hour).append('/')
      .append(minute).append('/')
      .append(GUID.generate()).append(".bin");
    String newContentUrl = sb.toString();
    // done
    return newContentUrl;
}

Actually, no, you can't, because if the file is modified/updated in Alfresco the filesystem path doesn't change (I'm talking about node properties, not the node content). Remember, you can hot-backup the contentstore directory (not the Lucene index folder), and it's not necessary to check every single file for consistency: just launch a shell/batch script that copies without checking, or use a tool like xxcopy.

Related

Migrating from Microsoft.Azure.Storage.Blob to Azure.Storage.Blobs - directory concepts missing

These are great guides for migrating between the different versions of the NuGet package:
https://github.com/Azure/azure-sdk-for-net/blob/Azure.Storage.Blobs_12.6.0/sdk/storage/Azure.Storage.Blobs/README.md
https://elcamino.cloud/articles/2020-03-30-azure-storage-blobs-net-sdk-v12-upgrade-guide-and-tips.html
However, I am struggling to migrate the following concepts in my code:
// Return if a directory exists:
container.GetDirectoryReference(path).ListBlobs().Any();
where GetDirectoryReference is not understood and there appears to be no direct translation.
Also, the concept of a CloudBlobDirectory does not appear to have made it into Azure.Storage.Blobs e.g.
private static long GetDirectorySize(CloudBlobDirectory directoryBlob)
{
    long size = 0;
    foreach (var blobItem in directoryBlob.ListBlobs())
    {
        if (blobItem is BlobClient)
            size += ((BlobClient)blobItem).GetProperties().Value.ContentLength;
        if (blobItem is CloudBlobDirectory)
            size += GetDirectorySize((CloudBlobDirectory)blobItem);
    }
    return size;
}
where CloudBlobDirectory does not appear anywhere in the API.
There's no such thing as physical directories or folders in Azure Blob Storage. The directories you sometimes see are part of the blob name (e.g. folder1/folder2/file1.txt). The List Blobs request allows you to add a prefix and delimiter to the call, which are used by the Azure Portal and Azure Storage Explorer to create a visualization of folders. For example, prefix folder1/ and delimiter / would let you see the content as if folder1 were opened.
That's exactly what happens in your code: GetDirectoryReference() adds a prefix, ListBlobs() fires the request, and Any() checks whether any items are returned.
For v12, the call that lets you do the same is GetBlobsByHierarchy (or its async version). In your particular case, where you only want to know whether any blobs exist in the 'directory', GetBlobs with a prefix would also suffice.
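A minimal sketch of both operations against the v12 SDK, assuming a connection string and container name and using "folder1/" as the virtual directory (the helper names here are illustrative, not part of the SDK):

using System;
using System.Linq;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

public static class BlobDirectoryHelpers
{
    // "Does the directory exist?" becomes "is there at least one blob with this prefix?"
    public static bool DirectoryExists(BlobContainerClient container, string prefix)
    {
        return container.GetBlobs(prefix: prefix).Any();
    }

    // GetBlobs is a flat listing, so summing over the prefix is already recursive.
    public static long GetDirectorySize(BlobContainerClient container, string prefix)
    {
        return container.GetBlobs(prefix: prefix)
            .Sum(blobItem => blobItem.Properties.ContentLength ?? 0);
    }

    // The folder-like view (direct children only) comes from GetBlobsByHierarchy.
    public static void PrintDirectory(BlobContainerClient container, string prefix)
    {
        foreach (BlobHierarchyItem item in container.GetBlobsByHierarchy(prefix: prefix, delimiter: "/"))
        {
            if (item.IsPrefix)
                Console.WriteLine("sub-directory: " + item.Prefix);
            else
                Console.WriteLine("blob: " + item.Blob.Name);
        }
    }
}

You'd create the client once, e.g. new BlobContainerClient(connectionString, "my-container"), and pass it to these helpers.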

Is it possible to load a pre-populated database from local resource using sqldelight

I have a relatively large db that may take 1 to 2 minutes to initialise. Is it possible to load a pre-populated db when using SQLDelight (Kotlin Multiplatform) instead of initialising the db on app launch?
Yes, but it can be tricky. Not just for "Multiplatform". You need to copy the db to the db folder before trying to init sqldelight. That probably means i/o on the main thread when the app starts.
There is no standard way to do this now. You'll need to put the db file in assets on android and in a bundle on iOS and copy them to their respective folders before initializing sqldelight. Obviously you'll want to check if the db exists first, or have some way of knowing this is your first app run.
If you're planning on shipping updates that will have newer databases, you'll need to manage versions outside of just a check for the existence of the db.
Although not directly answering your question, 1 to 2 minutes is really, really long for sqlite. What are you doing? I would first make sure you're using transactions properly. 1-2 minutes of inserting data would (probably) result in a huge db file.
Sorry, but I can't add any comments yet, which would be more appropriate...
"Although not directly answering your question, 1 to 2 minutes is really, really long for sqlite. What are you doing? I would first make sure you're using transactions properly. 1-2 minutes of inserting data would (probably) result in a huge db file."
In my case, the problem that forced me to use a pre-populated database was the large size of the .sq files (more than 30 MB of INSERT statements per table); SQLDelight silently interrupted the code generation without displaying any error messages.
"You'll need to put the db file in assets on android and in a bundle on iOS and copy them to their respective folders before initializing sqldelight."

"Having to load a db from resources on both android and iOS feels like a lot of work + it means the shared project won't be the only place where the data is initialised."
The Kotlin Multiplatform library moko-resources solves the issue of a single source for a database in a shared module. It works the same way for Android and iOS in a KMM project.
Unfortunately, this use of the library is barely shown in its samples. I added a second method (getDriver) to the expect class DatabaseDriverFactory to open the prepared database, and implemented it on each platform. For example, for androidMain:
actual class DatabaseDriverFactory(private val context: Context) {
    actual fun createDriver(schema: SqlDriver.Schema, fileName: String): SqlDriver {
        return AndroidSqliteDriver(schema, context, fileName)
    }

    actual fun getDriver(schema: SqlDriver.Schema, fileName: String): SqlDriver {
        val database: File = context.getDatabasePath(fileName)
        if (!database.exists()) {
            val inputStream = context.resources.openRawResource(MR.files.dbfile.rawResId)
            val outputStream = FileOutputStream(database.absolutePath)
            inputStream.use { input: InputStream ->
                outputStream.use { output: FileOutputStream ->
                    input.copyTo(output)
                }
            }
        }
        return AndroidSqliteDriver(schema, context, fileName)
    }
}
MR.files.dbfile is the FileResource from the class generated by the library; it is associated with the name of the file located in the resources/MR/files directory of the commonMain module. Its rawResId property represents the platform-side resource ID.
The only thing you need to do is specify the path to the DB file when creating the driver.
Let's assume your DB lies in /mnt/my_best_app_dbs/super.db. Now, pass the path as the name property of the driver. Something like this:
val sqlDriver: SqlDriver = AndroidSqliteDriver(Schema, context, "/mnt/my_best_app_dbs/super.db")
Keep in mind that you might need to have permissions that allow you to read a given storage type.

OutOfMemory on custom extractor

I have stitched a lot of small XML files into one file, and then made a custom extractor to return rows with one byte array that corresponds to each file.
Run on remote/master: for one file (gzipped, 11 MB) it works fine; for more than one file, I get a System.OutOfMemoryException.
Run on local/master: for one or more files (gzipped, 500+ MB) it works fine.
Extractor looks like this:
public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
{
    using (var stream = new StreamReader(input.BaseStream))
    {
        var xml = stream.ReadToEnd();
        // Clean stitched XML
        xml = UtilsXml.CleanXml(xml);
        // Get nodes - one for each stitched file
        var d = new XmlDocument();
        d.LoadXml(xml);
        var root = d.FirstChild;
        for (int i = 0; i < root.ChildNodes.Count; i++)
        {
            output.Set<object>(1, Encoding.ASCII.GetBytes(root.ChildNodes[i].OuterXml.ToString()));
            yield return output.AsReadOnly();
        }
        yield break;
    }
}
and error message looks like this:
==== Caught exception System.OutOfMemoryException
at System.Xml.XmlDocument.CreateTextNode(String text)
at System.Xml.XmlLoader.LoadAttributeNode()
at System.Xml.XmlLoader.LoadNode(Boolean skipOverWhitespace)
at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
at System.Xml.XmlDocument.Load(XmlReader reader)
at System.Xml.XmlDocument.LoadXml(String xml)
at Microsoft.Analytics.Tools.Formats.Text.XmlByteArrayRowExtractor.<Extract>d__0.MoveNext()
at ScopeEngine.SqlIpExtractor<ScopeEngine::GZipInput,Extract_0_Data0>.GetNextRow(SqlIpExtractor<ScopeEngine::GZipInput\,Extract_0_Data0>* , Extract_0_Data0* output) in d:\data\ccs\jobs\bc367467-ef86-43d2-a937-46ba2d4cc524_v0\sqlmanaged.h:line 1924
So what am I doing wrong? And how do I debug this on remote?
Thanks!
Unfortunately local run does not enforce memory allocations, so you would have to check memory in local vertex debug yourself.
Looking at your code above, I see that you are loading XML documents into a DOM. Please note that an XML DOM can explode the data size from the string representation up to a factor of 10 or more (I have seen 2 to 12 in my times as the resident SQL XML guru).
Each UDO today only gets 1/2 GB of RAM to play with. So what I assume is that your XML DOM document(s) start going beyond that.
The usual recommendation is to use the XmlReader interface (there is a reader extractor in the samples on http://usql.io as well) and scan through the document(s) for the information you are looking for.
If your documents are always small enough (e.g., <20MB), you may want to make sure that you release the memory of the other documents and operate one document at a time.
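If you do go the streaming route, here is a minimal sketch of such an extractor. It assumes the stitched root element's direct children are the individual documents, keeps the byte-array output column from the question, and skips the UtilsXml.CleanXml step (if the stitched file needs that pass to be well-formed XML, it would have to be applied in a streaming-friendly way):

using System.Collections.Generic;
using System.Text;
using System.Xml;
using Microsoft.Analytics.Interfaces;

[SqlUserDefinedExtractor(AtomicFileProcessing = true)]  // gzipped input cannot be split
public class XmlFragmentExtractor : IExtractor
{
    public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
    {
        // XmlReader streams the input, so only one child fragment is materialized at a time.
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (var reader = XmlReader.Create(input.BaseStream, settings))
        {
            reader.MoveToContent();      // position on the stitched root element
            reader.ReadStartElement();   // step inside it
            while (!reader.EOF && reader.NodeType != XmlNodeType.EndElement)
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    // ReadOuterXml returns this child document and advances past it.
                    string fragment = reader.ReadOuterXml();
                    output.Set<byte[]>(1, Encoding.ASCII.GetBytes(fragment));
                    yield return output.AsReadOnly();
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}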
We do have plans to allow you to annotate your UDO with memory needs, but that is still a bit out.

How to access files stored in SQL Server's FileTable?

As far as I know, SQL Server has had a new feature since version 2012: FileTable. It allows us to store files in the file system and to use them from T-SQL.
I am trying to use this feature but I have no idea how to do it properly.
Generally, I don't know how to access files stored in the file table. Let's suppose I have an ASP.NET MVC app and there are a lot of images which I show on web pages in img tags. I would like to store these images in a FileTable and access them as files from the filesystem. But I don't know where these files are stored or how to use them as files. Right now my images are stored in the web application directory in an images folder, and I write something like this:
<img src='/images/mypicture.png' />
And if I move my images to a FileTable, what should I write in src?
<img src='path-toimage-in-filetable' />
I don't think you still need this, but anyway I'll post my answer for anyone else interested.
First, a FileTable is still a table, so if you want to access data from it you need to use a SELECT statement. You'd need something like:
select name, file_stream from filetable_name
where name = 'file_name'
  and file_type = 'file_extension'
Just execute a statement like this from your ASP.NET app, then fetch the results and use the file_stream column to get the binary data of the stored file. If you want to reference the file from HTML, first you need to create an action in your controller which returns the retrieved file:
public ActionResult GetFile() {
    // ... fetch the row (file_stream and file_type) from the FileTable as shown above
    return File(file.file_stream, file.file_type);
}
After this, put something like the following in your HTML:
<img src="/controller/GetFile" />
Hope this helps!
If you want to know the schema of a FileTable, see the FileTable documentation.
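For completeness, here is one way the elided part of GetFile might look with plain ADO.NET, inside the MVC controller and using the hypothetical filetable_name table and connectionString from above (MimeMapping turns the stored extension into a content type):

using System.Data.SqlClient;
using System.Web;
using System.Web.Mvc;

public ActionResult GetFile(string name)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        "select file_stream, file_type from filetable_name where name = @name", conn))
    {
        cmd.Parameters.AddWithValue("@name", name);
        conn.Open();
        using (var rdr = cmd.ExecuteReader())
        {
            if (!rdr.Read())
                return HttpNotFound();

            var data = (byte[])rdr["file_stream"];      // the file's binary content
            var extension = (string)rdr["file_type"];   // e.g. ".png"
            return File(data, MimeMapping.GetMimeMapping("file" + extension));
        }
    }
}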
I assume by FileTable you actually mean FileStream. A couple of notes about that:
This feature is best used if your files are actually files.
The files should be, on average, greater than 1 MB. There can be exceptions to this rule, but if they're smaller than 1 MB on average, you may be better off using a VARBINARY(MAX) or XML data type as appropriate. If your images are very small on average (only a few KB), consider using a VARBINARY(MAX) column.
Accessing these files requires an open transaction and a database that is properly configured for FILESTREAM.
You can get some significant advantages by bypassing the normal SQL engine/database-file method of data access and telling SQL Server that you want to access the file directly; however, it's not meant for directly accessing the file on the file system, and attempting to do so can break SQL Server's management of these files (transactional consistency, tracking, locking, etc.).
It's pretty likely that your use case here would be better served by using a CDN and storing image URLs in the table, if you really need SQL for this. You can use FILESTREAM to do this (see the code sample below for one implementation), but you'll be hammering your SQL Server for every request unless you store the images somewhere else anyway that the browser can properly cache (my example doesn't do that) - and if you store them somewhere else for rendering in the browser, you might as well store them there to begin with (you won't have transactional consistency for those images once they're copied to some other drive/disk/location anyway).
With all that said, here's an example of how you'd access the FILESTREAM data using ADO.NET:
public static string connectionString = ...; // get your connection string from encrypted config

// assumes your FILESTREAM data column is called Img in a table called ImageTable
const string sql = @"
    SELECT
        Img.PathName(),
        GET_FILESTREAM_TRANSACTION_CONTEXT()
    FROM ImageTable
    WHERE ImageId = @id";

public string RetrieveImage(int id)
{
    string serverPath;
    byte[] txnToken;
    string base64ImageData = null;
    using (var ts = new TransactionScope())
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (SqlCommand cmd = new SqlCommand(sql, conn))
            {
                cmd.Parameters.Add("@id", SqlDbType.Int).Value = id;
                using (SqlDataReader rdr = cmd.ExecuteReader())
                {
                    rdr.Read();
                    serverPath = rdr.GetSqlString(0).Value;
                    txnToken = rdr.GetSqlBinary(1).Value;
                }
            }
            using (var sfs = new SqlFileStream(serverPath, txnToken, FileAccess.Read))
            {
                // sfs will now work basically like a FileStream. You can either copy it locally or return it as a base64 encoded string
                using (var ms = new MemoryStream())
                {
                    sfs.CopyTo(ms);
                    base64ImageData = Convert.ToBase64String(ms.ToArray());
                }
            }
        }
        ts.Complete();
        // assume this is PNG image data; replace png with jpg etc. as appropriate. Might store the type in the table if it will vary...
        return "data:image/png;base64," + base64ImageData;
    }
}
Obviously, if you have lots of images to handle like this, this is not an ideal method - don't try to make an instance of SQL Server into what you should be using a CDN for. However, if you have other really good reasons, you should try to grab as many images as possible in a single request/transaction (e.g. if you know you're displaying 50 images on a page, get all 50 with a single transaction scope; don't use 50 transaction scopes - the code above won't handle that).
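A rough sketch of that batched variant, under the same assumptions and hypothetical names (ImageTable, Img, ImageId, connectionString) as the example above: read all path/token pairs first, then open each SqlFileStream inside the same transaction scope.

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;
using System.Linq;
using System.Transactions;

// Returns base64 data URIs for several images using one transaction scope and one query.
public Dictionary<int, string> RetrieveImages(IEnumerable<int> ids)
{
    var result = new Dictionary<int, string>();
    var rows = new List<(int Id, string Path, byte[] Token)>();

    using (var ts = new TransactionScope())
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();

        // Build a parameterized IN (...) list; fine for a page's worth of ids.
        var idList = ids.ToList();
        var paramNames = idList.Select((_, i) => "@id" + i).ToList();
        var batchSql = @"
            SELECT ImageId, Img.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT()
            FROM ImageTable
            WHERE ImageId IN (" + string.Join(",", paramNames) + ")";

        using (var cmd = new SqlCommand(batchSql, conn))
        {
            for (int i = 0; i < idList.Count; i++)
                cmd.Parameters.Add(paramNames[i], SqlDbType.Int).Value = idList[i];

            using (var rdr = cmd.ExecuteReader())
            {
                while (rdr.Read())
                    rows.Add((rdr.GetInt32(0), rdr.GetSqlString(1).Value, rdr.GetSqlBinary(2).Value));
            }
        }

        // Open each FILESTREAM file while the transaction is still alive.
        foreach (var row in rows)
        {
            using (var sfs = new SqlFileStream(row.Path, row.Token, FileAccess.Read))
            using (var ms = new MemoryStream())
            {
                sfs.CopyTo(ms);
                result[row.Id] = "data:image/png;base64," + Convert.ToBase64String(ms.ToArray());
            }
        }

        ts.Complete();
    }
    return result;
}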

App Folder files not visible after un-install / re-install

I noticed this in the debug environment where I have to do many re-installs in order to test persistent data storage, initial settings, etc... It may not be relevant in production, but I mention this anyway just to inform other developers.
Any files created by an app in its App Folder are not 'visible' to queries after a manual un-install / re-install (from the IDE, for instance). The same applies to the 'Encoded DriveID' - it is no longer valid.
It is probably 'by design', but it effectively creates 'orphans' in the app folder until they are manually cleaned up via 'drive.google.com > Manage Apps > [yourapp] > Options > Delete hidden app data'. It also creates a problem if an app relies on finding files by metadata, title, ..., since these seem to be gone. As I said, not a production problem, but it can create some frustration during development.
Can any of friendly Googlers confirm this? Is there any other way to get to these files after re-install?
Try this approach:
Use requestSync() in onConnected() as:
@Override
public void onConnected(Bundle connectionHint) {
    super.onConnected(connectionHint);
    Drive.DriveApi.requestSync(getGoogleApiClient()).setResultCallback(syncCallback);
}
Then, in its callback, query the contents of the drive using:
final private ResultCallback<Status> syncCallback = new ResultCallback<Status>() {
    @Override
    public void onResult(@NonNull Status status) {
        if (!status.isSuccess()) {
            showMessage("Problem while retrieving results");
            return;
        }
        query = new Query.Builder()
                .addFilter(Filters.and(Filters.eq(SearchableField.TITLE, "title"),
                        Filters.eq(SearchableField.TRASHED, false)))
                .build();
        Drive.DriveApi.query(getGoogleApiClient(), query)
                .setResultCallback(metadataCallback);
    }
};
Then, in its callback, if found, retrieve the file using:
final private ResultCallback<DriveApi.MetadataBufferResult> metadataCallback =
        new ResultCallback<DriveApi.MetadataBufferResult>() {
    @SuppressLint("SetTextI18n")
    @Override
    public void onResult(@NonNull DriveApi.MetadataBufferResult result) {
        if (!result.getStatus().isSuccess()) {
            showMessage("Problem while retrieving results");
            return;
        }
        MetadataBuffer mdb = result.getMetadataBuffer();
        DriveId driveId = null;             // declared outside the loop so it is in scope below
        for (Metadata md : mdb) {
            Date createdDate = md.getCreatedDate();
            driveId = md.getDriveId();
        }
        readFromDrive(driveId);
    }
};
Job done!
Hope that helps!
It looks like Google Play services has a problem. (https://stackoverflow.com/a/26541831/2228408)
For testing, you can do it by clearing Google Play services data (Settings > Apps > Google Play services > Manage Space > Clear all data).
Or, at this time, you need to implement it by using Drive SDK v2.
I think you are correct that it is by design.
By inspection I have concluded that until an app places data in the AppFolder, Drive does not sync it down to the device, however much you try to hassle it. Therefore it is impossible to check for the existence of AppFolder content placed by another device or a prior installation. I'd assume that this was to try to create a consistent clean install.
I can see that there are a couple of strategies to work around this:
1) Place dummy data in the AppFolder and then sync and recheck.
2) Accept that in the first instance there is the possibility of duplicates: as you cannot access the existing file, by definition you will create a new copy, and then use custom metadata to come up with a scheme to differentiate like-named files and choose which one you want to keep (essentially implementing your conflict/merge strategy across the two different files).
I've done the second: I have an update number to compare data from different devices and decide which version I want, and hence whether to upload, download or leave alone. As my data is an SQLite DB, I also have some code to only sync once updates have settled down, and I deliberately consider people updating two devices at once foolish - the results are consistent, but it is undefined which will win.