Is it possible to load a pre-populated database from local resource using sqldelight - kotlin

I have a relatively large db that may take 1 to 2 minutes to initialise, is it possible to load a pre-populated db when using sqldelight (kotlin multiplatform) instead of initialising the db on app launch?

Yes, but it can be tricky. Not just for "Multiplatform". You need to copy the db to the db folder before trying to init sqldelight. That probably means i/o on the main thread when the app starts.
There is no standard way to do this now. You'll need to put the db file in assets on android and in a bundle on iOS and copy them to their respective folders before initializing sqldelight. Obviously you'll want to check if the db exists first, or have some way of knowing this is your first app run.
If you're planning on shipping updates that will have newer databases, you'll need to manage versions outside of just a check for the existance of the db.
Although not directly answering your question, 1 to 2 minutes is really, really long for sqlite. What are you doing? I would first make sure you're using transactions properly. 1-2 minutes of inserting data would (probably) result in a huge db file.

Sorry, but I can't add any comments yet, which would be more appropriate...
Although not directly answering your question, 1 to 2 minutes is
really, really long for sqlite. What are you doing? I would first make
sure you're using transactions properly. 1-2 minutes of inserting data
would (probably) result in a huge db file.
Alternatively, my problem due to which I had to use a pre-populated database was associated with the large size of .sq files (more than 30 MB text of INSERTs per table), and SqlDeLight silently interrupted the generation, without displaying error messages.
You'll need to put the db file in assets on android and in a bundle on
iOS and copy them to their respective folders before initializing
sqldelight.
Having to load a db from resources on both android and ios feels a lot
of work + it means the shared project wont be the only place where the
data is initialised.
Kotlin MultiPlatform library Moko-resources solves the issue of a single source for a database in a shared module. It works for KMM the same way for Android and iOS.
Unfortunately, using this feature are almost not presented in the samples of library. I added a second method (getDriver) to the expected class DatabaseDriverFactory to open the prepared database, and implemented it on the platform. For example, for androidMain:
actual class DatabaseDriverFactory(private val context: Context) {
actual fun createDriver(schema: SqlDriver.Schema, fileName: String): SqlDriver {
return AndroidSqliteDriver(schema, context, fileName)
}
actual fun getDriver(schema: SqlDriver.Schema, fileName: String): SqlDriver {
val database: File = context.getDatabasePath(fileName)
if (!database.exists()) {
val inputStream = context.resources.openRawResource(MR.files.dbfile.rawResId)
val outputStream = FileOutputStream(database.absolutePath)
inputStream.use { input: InputStream ->
outputStream.use { output: FileOutputStream ->
input.copyTo(output)
}
}
}
return AndroidSqliteDriver(schema, context, fileName)
}
}
MR.files.fullDb is the FileResource from the class generated by the library, it is associated with the name of the file located in the resources/MR/files directory of the commonMain module. It property rawResId represents the platform-side resource ID.

The only thing you need is to specify the path to the DB file using the driver.
Let's assume your DB lies in /mnt/my_best_app_dbs/super.db. Now, pass the path in the name property of the Driver. Something like this:
val sqlDriver: SqlDriver = AndroidSqliteDriver(Schema, context, "/mnt/my_best_app_dbs/best.db")
Keep in mind that you might need to have permissions that allow you to read a given storage type.

Related

Neo4j 3.5's embeded database does not seem to persist data

I am trying to build a small command line tool that will store data in a neo4j graph. To do this I have started experimenting with Neo4j3.5's embedded databases. After putting together the following example I have found that either the nodes I am creating are not being saved to the database or the method of database creation is overwriting my previous run.
The Example:
fun main() {
//Spin up data base
val graphDBFactory = GraphDatabaseFactory()
val graphDB = graphDBFactory.newEmbeddedDatabase(File("src/main/resources/neo4j"))
registerShutdownHook(graphDB)
val tx = graphDB.beginTx()
graphDB.createNode(Label.label("firstNode"))
graphDB.createNode(Label.label("secondNode"))
val result = graphDB.execute("MATCH (a) RETURN COUNT(a)")
println(result.resultAsString())
tx.success()
}
private fun registerShutdownHook(graphDb: GraphDatabaseService) {
// Registers a shutdown hook for the Neo4j instance so that it
// shuts down nicely when the VM exits (even if you "Ctrl-C" the
// running application).
Runtime.getRuntime().addShutdownHook(object : Thread() {
override fun run() {
graphDb.shutdown()
}
})
}
I would expect that every time I run main the resulting query count will increase by 2.
That is currently not the case and I can find nothing in the docs that references a different method of opening an already created embedded database. Am I trying to use the embedded database incorrectly or am I missing something? Any help or info would be appreciated.
build Info:
Kotlin jvm 1.4.21
Neo4j-comunity-3.5.35
Transactions in neo4j 3.x have a 3 stage model
create
success / failure
close
you missed the third, which would then commit or rollback.
You can use Kotlin's use as Transaction is an AutoCloseable

Migrating from Microsoft.Azure.Storage.Blob to Azure.Storage.Blobs - directory concepts missing

These are great guides for migrating between the different versions of NuGet package:
https://github.com/Azure/azure-sdk-for-net/blob/Azure.Storage.Blobs_12.6.0/sdk/storage/Azure.Storage.Blobs/README.md
https://elcamino.cloud/articles/2020-03-30-azure-storage-blobs-net-sdk-v12-upgrade-guide-and-tips.html
However I am struggling to migrate the following concepts in my code:
// Return if a directory exists:
container.GetDirectoryReference(path).ListBlobs().Any();
where GetDirectoryReference is not understood and there appears to be no direct translation.
Also, the concept of a CloudBlobDirectory does not appear to have made it into Azure.Storage.Blobs e.g.
private static long GetDirectorySize(CloudBlobDirectory directoryBlob) {
long size = 0;
foreach (var blobItem in directoryBlob.ListBlobs()) {
if (blobItem is BlobClient)
size += ((BlobClient) blobItem).GetProperties().Value.ContentLength;
if (blobItem is CloudBlobDirectory)
size += GetDirectorySize((CloudBlobDirectory) blobItem);
}
return size;
}
where CloudBlobDirectory does not appear anywhere in the API.
There's no such thing as physical directories or folders in Azure Blob Storage. The directories you sometimes see are part of the blob (e.g. folder1/folder2/file1.txt). The List Blobs requests allows you to add a prefix and delimiter in a call, which are used by the Azure Portal and Azure Data Explorer to create a visualization of folders. As example prefix folder1/ and delimiter / would allow you to see the content as if folder1 was opened.
That's exactly what happens in your code. The GetDirectoryReference() adds a prefix. The ListBlobs() fires a request and Any() checks if any items return.
For V12 the command that'll allow you to do the same would be GetBlobsByHierarchy and its async version. In your particular case where you only want to know if any blobs exist in the directory a GetBlobs with prefix would also suffice.

Connecting to existing database in Kotlin

I try to use precreated database for my android app written in Kotlin, so in my databaseHelper class I tried to use the following code for that:
override fun onCreate(db: SQLiteDatabase)
{
db = SQLiteDatabase.openDatabase(DATABASE_PATH + DATABASE_NAME,null,SQLiteDatabase.OPEN_READWRITE)
}
I thought this will worm, but I get the message val cannot be reassigned.
Then in the next methods, which are supposed to perform actions on database (execute all needed queries) I reffer to the database like that:
val db = this.writableDatabase
val cursor = db.rawQuery(query, null)
(or just with db.execSQL(query) if don't need the results of selection; of course query is a string with SQL query)
The point is how can I access the precreated database because when I would just create new "global" variable (in companion object?) then I would have to reffer to it in diffent way... at least I think so... well maybe using the line val db = this.writableDatabase also in onCreate would do, but wouldn't it connect just "localy"(as I overrwrite the refference for db in parameter by local variable) instead of putting this precreated database into what is returned by this.writableDatabase?
Edit:
Well I think I find a solution to how to connect to database here https://discuss.kotlinlang.org/t/how-to-use-pre-populated-sqlite-database-using-kotlin/8068 but it still leave me with one serious problem: how to include that database in AndroidStudio project (to have something to connect to).
I mean that include this database as part of an app (as one of its resources)

Running Google Dataflow pipeline from a Google App Engine app?

I am creating a dataflow job using DataflowPipelineRunner. I tried the following scenarios.
Without specifying any machineType
With g1 small machine
with n1-highmem-2
In all the above scenarios, Input is a file from GCS which is very small file(KB size) and output is Big Query table.
I got Out of memory error in all the scenarios
The size of my compiled code is 94mb. I am trying only word count example and it did not read any input(It fails before the job starts). Please help me understand why i am getting this error.
Note: I am using appengine to start the job.
Note: The same code works with beta versoin 0.4.150414
EDIT 1
As per the suggestions in the answer tried the following,
Switched from Automatic scaling to Basic Scaling.
Used machine type B2 which provides 256MB memory
After these configuration, Java Heap Memory problem is solved. But it is trying to upload a jar into stagging location which is more than 10Mb, hence it fails.
It logs the following exception
com.google.api.client.http.HttpRequest execute: exception thrown while executing request
com.google.appengine.api.urlfetch.RequestPayloadTooLargeException: The request to https://www.googleapis.com/upload/storage/v1/b/pwccloudedw-stagging-bucket/o?name=appengine-api-L4wtoWwoElWmstI1Ia93cg.jar&uploadType=resumable&upload_id=AEnB2Uo6HCfw6Usa3aXlcOzg0g3RawrvuAxWuOUtQxwQdxoyA0cf22LKqno0Gu-hjKGLqXIo8MF2FHR63zTxrSmQ9Yk9HdCdZQ exceeded the 10 MiB limit.
at com.google.appengine.api.urlfetch.URLFetchServiceImpl.convertApplicationException(URLFetchServiceImpl.java:157)
at com.google.appengine.api.urlfetch.URLFetchServiceImpl.fetch(URLFetchServiceImpl.java:45)
at com.google.apphosting.utils.security.urlfetch.URLFetchServiceStreamHandler$Connection.fetchResponse(URLFetchServiceStreamHandler.java:543)
at com.google.apphosting.utils.security.urlfetch.URLFetchServiceStreamHandler$Connection.getInputStream(URLFetchServiceStreamHandler.java:422)
at com.google.apphosting.utils.security.urlfetch.URLFetchServiceStreamHandler$Connection.getResponseCode(URLFetchServiceStreamHandler.java:275)
at com.google.api.client.http.javanet.NetHttpResponse.<init>(NetHttpResponse.java:36)
at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:94)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:965)
at com.google.api.client.googleapis.media.MediaHttpUploader.executeCurrentRequestWithoutGZip(MediaHttpUploader.java:545)
at com.google.api.client.googleapis.media.MediaHttpUploader.executeCurrentRequest(MediaHttpUploader.java:562)
at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:419)
at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:427)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:357)
at java.util.concurrent.FutureTask.run(FutureTask.java:260)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1168)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:605)
at com.google.apphosting.runtime.ApiProxyImpl$CurrentRequestThreadFactory$1$1.run(ApiProxyImpl.java:1152)
at java.security.AccessController.doPrivileged(Native Method)
at com.google.apphosting.runtime.ApiProxyImpl$CurrentRequestThreadFactory$1.run(ApiProxyImpl.java:1146)
at java.lang.Thread.run(Thread.java:745)
at com.google.apphosting.runtime.ApiProxyImpl$CurrentRequestThreadFactory$2$1.run(ApiProxyImpl.java:1195)
I tried directly uploading the jar file - appengine-api-1.0-sdk-1.9.20.jar, still it tries to upload this jar appengine-api-L4wtoWwoElWmstI1Ia93cg.jar.
which i dont know what jar it is. Any idea on what this jar is appreciated.
Please help me to fix this issue.
The short answer is that if you use AppEngine on a Managed VM you will not encounter the AppEngine sandbox limits (OOM when using a F1 or B1 instance class, execution time limit issues, whitelisted JRE classes). If you really want to run within the App Engine sandbox, then your use of the Dataflow SDK most conform to the limits of the AppEngine sandbox. Below I explain common issues and what people have done to conform to the AppEngine sandbox limits.
The Dataflow SDK requires an AppEngine instance class which has enough memory to execute the users application to construct the pipeline, stage any resources, and send the job description to the Dataflow service. Typically we have seen users require using an instance class with more than 128mb of memory to not see OOM errors.
Generally constructing a pipeline and submitting it to the Dataflow service typically takes less than a couple of seconds if the required resources for your application are already staged. Uploading your JARs and any other resources to GCS can take longer than 60 seconds. This can be solved manually by pre-staging your JARs to GCS beforehand (the Dataflow SDK will skip staging them again if it detects they are already there) or using a task queue to get a 10 minute limit (note that for large applications, 10 mins may not be enough to stage all your resources).
Finally, within the AppEngine sandbox environment, you and all your dependencies are limited to using only whitelisted classes within the JRE or you'll get an exception like:
java.lang.SecurityException:
java.lang.IllegalAccessException: YYY is not allowed on ZZZ
...
EDIT 1
We perform a hash of the contents of the jars on the classpath and upload them to GCS with a modified filename. AppEngine runs a sandboxed environment with its own JARs, appengine-api-L4wtoWwoElWmstI1Ia93cg.jar refers to appengine-api.jar which is a jar that the sandboxed environment adds. You can see from our PackageUtil#getUniqueContentName(...) that we just append -$HASH before .jar.
We are working to solve why you are seeing the RequestPayloadToLarge excepton and it is currently recommended that you set the filesToStage option and filter out the jars not required to execute your Dataflow to get around the issue that you face. You can see how we build the files to stage with DataflowPipelineRunner#detectClassPathResourcesToStage(...).
I had the same problem with the 10MB limit. What I did was filtering out the JAR files bigger than that limit (instead of specific files), and then set the renaming files in the DataflowPipelineOptions with setFilesToStage.
So I just copied the method detectClassPathResourcesToStage from the Dataflow SDK and changed it sightly:
private static final long FILE_BYTES_THRESHOLD = 10 * 1024 * 1024; // 10 MB
protected static List<String> detectClassPathResourcesToStage(ClassLoader classLoader) {
if (!(classLoader instanceof URLClassLoader)) {
String message = String.format("Unable to use ClassLoader to detect classpath elements. "
+ "Current ClassLoader is %s, only URLClassLoaders are supported.", classLoader);
throw new IllegalArgumentException(message);
}
List<String> files = new ArrayList<>();
for (URL url : ((URLClassLoader) classLoader).getURLs()) {
try {
File file = new File(url.toURI());
if (file.length() < FILE_BYTES_THRESHOLD) {
files.add(file.getAbsolutePath());
}
} catch (IllegalArgumentException | URISyntaxException e) {
String message = String.format("Unable to convert url (%s) to file.", url);
throw new IllegalArgumentException(message, e);
}
}
return files;
}
And then when I'm creating the DataflowPipelineOptions:
DataflowPipelineOptions dataflowOptions = PipelineOptionsFactory.create().as(DataflowPipelineOptions.class);
...
dataflowOptions.setFilesToStage(detectClassPathResourcesToStage(DataflowPipelineRunner.class.getClassLoader()));
Here's a version of Helder's 10MB-filtering solution that will adapt to the default file-staging behavior of DataflowPipelineOptions even if it changes in a future version of the SDK.
Instead of duplicating the logic, it passes a throwaway copy of the DataflowPipelineOptions to DataflowPipelineRunner to see which files it would have staged, then removes any that are too big.
Note that this code assumes that you've defined a custom PipelineOptions class named MyOptions, along with a java.util.Logger field named logger.
// The largest file size that can be staged to the dataflow service.
private static final long MAX_STAGED_FILE_SIZE_BYTES = 10 * 1024 * 1024;
/**
* Returns the list of .jar/etc files to stage based on the
* Options, filtering out any files that are too large for
* DataflowPipelineRunner.
*
* <p>If this accidentally filters out a necessary file, it should
* be obvious when the pipeline fails with a runtime link error.
*/
private static ImmutableList<String> getFilesToStage(MyOptions options) {
// Construct a throw-away runner with a copy of the Options to see
// which files it would have wanted to stage. This could be an
// explicitly-specified list of files from the MyOptions param, or
// the default list of files determined by DataflowPipelineRunner.
List<String> baseFiles;
{
DataflowPipelineOptions tmpOptions =
options.cloneAs(DataflowPipelineOptions.class);
// Ignore the result; we only care about how fromOptions()
// modifies its parameter.
DataflowPipelineRunner.fromOptions(tmpOptions);
baseFiles = tmpOptions.getFilesToStage();
// Some value should have been set.
Preconditions.checkNotNull(baseFiles);
}
// Filter out any files that are too large to stage.
ImmutableList.Builder<String> filteredFiles = ImmutableList.builder();
for (String file : baseFiles) {
long size = new File(file).length();
if (size < MAX_STAGED_FILE_SIZE_BYTES) {
filteredFiles.add(file);
} else {
logger.info("Not staging large file " + file + ": length " + size
+ " >= max length " + MAX_STAGED_FILE_SIZE_BYTES);
}
}
return filteredFiles.build();
}
/** Runs the processing pipeline with given options. */
public void runPipeline(MyOptions options)
throws IOException, InterruptedException {
// DataflowPipelineRunner can't stage large files;
// remove any from the list.
DataflowPipelineOptions dpOpts =
options.as(DataflowPipelineOptions.class);
dpOpts.setFilesToStage(getFilesToStage(options));
// Run the pipeline as usual using "options".
// ...
}

App Folder files not visible after un-install / re-install

I noticed this in the debug environment where I have to do many re-installs in order to test persistent data storage, initial settings, etc... It may not be relevant in production, but I mention this anyway just to inform other developers.
Any files created by an app in its App Folder are not 'visible' to queries after manual un-install / re-install (from IDE, for instance). The same applies to the 'Encoded DriveID' - it is no longer valid.
It is probably 'by design' but it effectively creates 'orphans' in the app folder until manually cleaned by 'drive.google.com > Manage Apps > [yourapp] > Options > Delete hidden app data'. It also creates problem if an app relies on finding of files by metadata, title, ... since these seem to be gone. As I said, not a production problem, but it can create some frustration during development.
Can any of friendly Googlers confirm this? Is there any other way to get to these files after re-install?
Try this approach:
Use requestSync() in onConnected() as:
#Override
public void onConnected(Bundle connectionHint) {
super.onConnected(connectionHint);
Drive.DriveApi.requestSync(getGoogleApiClient()).setResultCallback(syncCallback);
}
Then, in its callback, query the contents of the drive using:
final private ResultCallback<Status> syncCallback = new ResultCallback<Status>() {
#Override
public void onResult(#NonNull Status status) {
if (!status.isSuccess()) {
showMessage("Problem while retrieving results");
return;
}
query = new Query.Builder()
.addFilter(Filters.and(Filters.eq(SearchableField.TITLE, "title"),
Filters.eq(SearchableField.TRASHED, false)))
.build();
Drive.DriveApi.query(getGoogleApiClient(), query)
.setResultCallback(metadataCallback);
}
};
Then, in its callback, if found, retrieve the file using:
final private ResultCallback<DriveApi.MetadataBufferResult> metadataCallback =
new ResultCallback<DriveApi.MetadataBufferResult>() {
#SuppressLint("SetTextI18n")
#Override
public void onResult(#NonNull DriveApi.MetadataBufferResult result) {
if (!result.getStatus().isSuccess()) {
showMessage("Problem while retrieving results");
return;
}
MetadataBuffer mdb = result.getMetadataBuffer();
for (Metadata md : mdb) {
Date createdDate = md.getCreatedDate();
DriveId driveId = md.getDriveId();
}
readFromDrive(driveId);
}
};
Job done!
Hope that helps!
It looks like Google Play services has a problem. (https://stackoverflow.com/a/26541831/2228408)
For testing, you can do it by clearing Google Play services data (Settings > Apps > Google Play services > Manage Space > Clear all data).
Or, at this time, you need to implement it by using Drive SDK v2.
I think you are correct that it is by design.
By inspection I have concluded that until an app places data in the AppFolder folder, Drive does not sync down to the device however much to try and hassle it. Therefore it is impossible to check for the existence of AppFolder placed by another device, or a prior implementation. I'd assume that this was to try and create a consistent clean install.
I can see that there are a couple of strategies to work around this:
1) Place dummy data on AppFolder and then sync and recheck.
2) Accept that in the first instance there is the possibility of duplicates, as you cannot access the existing file by definition you will create a new copy, and use custom metadata to come up with a scheme to differentiate like-named files and choose which one you want to keep (essentially implement your conflict merge strategy across the two different files).
I've done the second, I have an update number to compare data from different devices and decide which version I want so decide whether to upload, download or leave alone. As my data is an SQLite DB I also have some code to only sync once updates have settled down and I deliberately consider people updating two devices at once foolish and the results are consistent but undefined as to which will win.