How to merge sqlite3 session extension sessions? - sql

I'm using the C API of the sqlite3 session extension and wondering whether the extension can be used to merge sqlite3 sessions that have already been written to file.
Following the tutorial referenced above I was able to register sqlite3 sessions by writing them to file one by one, e.g. for an UPDATE call I end up with a session file, and with another INSERT call I get another file, and so on. These transactions are triggered by UI button callbacks. I wonder if the session files could be somehow merged afterwards into one single session file, so that calling sqlite3changeset_apply() with this merged session file as its parameter I could end up with the same result as if I called sqlite3changeset_apply() on a list of session files. The reason I would like to do this is that I'd like to transfer only one session file instead of a folder of session files.
I tried iterating over a session list, calling sqlite3changeset_apply() for each one on a copy of the original database while recording a new session, but in that case I end up with a session file of zero size (although the copy of the database contains all the expected changes).
I could not find anything on this in the official documentation nor on the web.

🧟‍♀️🧟🧟‍♂️ necromancy alert 🧟‍♂️🧟🧟‍♀️
You're probably looking for the sqlite3changeset_concat function:
This function is used to concatenate two changesets, A and B, into a single changeset. The result is a changeset equivalent to applying changeset A followed by changeset B.
There is also a streaming version, sqlite3changeset_concat_strm.
If you need to combine many changesets, you can use the type sqlite3_changegroup and its associated functions.


Azure Data Factory - delete data from a MongoDb (Atlas) Collection

I'm trying to use Azure Data Factory (V2) to copy data to a MongoDb database on Atlas, using the MongoDB Atlas connector but I have an issue.
I want to do an Upsert but the data I want to copy has no primary key, and as the documentation says:
Note: Data Factory automatically generates an _id for a document if an
_id isn't specified either in the original document or by column mapping. This means that you must ensure that, for upsert to work as
expected, your document has an ID.
This means the first load works fine, but then subsequent loads just insert more data rather than replacing current records.
I also can't find anything native to Data Factory that would allow me to do a delete on the target collection before running the Copy step.
My fallback will be to create a small Function to delete the data in the target collection before inserting fresh, as below. A full wipe and replace. But before doing that I wondered if anyone had tried something similar before and could suggest something within Data Factory that I have missed that would meet my needs.
As per the documentation, you cannot delete multiple documents at once from MongoDB Atlas. As an alternative, you can use the db.collection.deleteMany() method in the embedded MongoDB Shell to delete multiple documents in a single operation.
The recommended approach is to delete via a query in the Mongo Shell. To delete all documents from a collection, pass an empty filter document {} to the db.collection.deleteMany() method.
E.g.: db.movies.deleteMany({})
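If you do go with the fallback Function, the same full wipe can be sketched in Python. This is only a minimal sketch: it assumes the third-party pymongo driver, whose Collection.delete_many() mirrors the shell's deleteMany(), and the helper name wipe_collection and the connection placeholders in the comment are hypothetical.

```python
def wipe_collection(collection):
    """Delete every document in the given collection and return the count.

    `collection` is assumed to be a pymongo Collection (or anything that
    offers the same delete_many() method).
    """
    # An empty filter document matches every document in the collection.
    result = collection.delete_many({})
    return result.deleted_count


# Hypothetical usage (needs pymongo and a reachable Atlas cluster):
#   from pymongo import MongoClient
#   client = MongoClient("<your Atlas connection string>")
#   removed = wipe_collection(client["<database>"]["<collection>"])
```

Running this before the Copy step gives the "full wipe and replace" behaviour described in the question.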

How to performance test an Update endpoint in JMeter while automatically updating row value in payload

I have an Update entity endpoint in my .NET Core Microservice API that needs to be tested for performance. For all other endpoints, I am able to store the ID in a CSV file and load it before processing, however I want to reuse the values in the CSV for update, which requires updating and keeping track of the Row Version attribute for the ID.
I will be testing using 100 Users and 100 Orders, so I will need to match every user to one order so they don't try updating the same entity.
1. Read CSV with ID and current row version
2. Call Update endpoint on the ID and row version, read in new Row Version from response body
3. Store the new row version and the ID within JMeter to reuse in the test
4. Call Update endpoint on the ID and new row version
The problem with storing the values in the CSV is that JMeter would be reading from and writing to the same file. I am looking for a way to use a Java-like collection inside my script so that I don't have to read and write from a file.
The dictionary would look like {'q28937-3423572903485-324875', rowVersion: 42}
Add a Post Processor as a child of your HTTP request (the first update) to extract the new ID and rowVersion.
Then, in the next update, use the JMeter variables ${ID} and ${rowVersion}, which hold the new values that you extracted with the Post Processor.
Note that variables are not shared between threads. From the JMeter user manual, best practices, section 16.13:
Variables are local to a thread; a variable set in one thread cannot be read in another. This is by design.
Also check
Using RegEx (Regular Expression Extractor) with JMeter guide.
CSV data set config Jmeter User Manual

How to watch Changes to SQLite Database and Trigger Shell Script

Note: I believe I may be missing a simple solution to this problem. I'm relatively new to programming. Any advice is appreciated.
The problem: A small team of people (~3-5) want to be able to automate, as far as possible, the filing of downloaded files in appropriate folders. Files will be downloaded into a shared downloads folder. The files in this downloads folder will be sorted into a large shared folder structure according to their file-type, URL the file was downloaded from, and so on and so forth. These files are stored on a shared server, and the actual sorting will be done by some kind of shell script running on the server itself.
Whilst there are some utilities which do this (such as Maid), they don't do everything I want them to do. Maid, for example, doesn't have a way to get the download URL of a file in Linux. Additionally, it is written in Ruby, which I'd like to avoid.
The biggest stumbling block, then, is finding a way to get the URL of the downloaded file so that it can be passed into the shell script. Originally I thought this could be done via getfattr, which would get a file's extended attributes. Frustratingly, however, whilst Chromium saves a file's download URL as an extended attribute, Firefox doesn't seem to do the same thing. So relying on extended attributes seems to be out of the question.
What Firefox does do, however, is store download 'metadata' in the places.sqlite file, in two separate tables: moz_annos and moz_places. Inspired by this, I decided to build a Firefox extension that writes all information about the downloaded file to a SQLite database, downloads.sqlite, on our server upon completion of the download. This includes the URL, MIME type, etc. of the downloaded file.
The idea is that with this data, the server could run a shell script that does some fine-grained sorting of the downloaded file into our shared file system.
However, I am struggling to find out a stable, reliable, and portable way of 'triggering' the script that will actually move the files, as well as passing information about these files to the script so that it can sort them accordingly.
There are a few ways I thought I could go about this. I'm not sure which method is the most appropriate:
1) Watch Downloads Folder
This method would watch for changes to the shared downloads directory, then use the file name of the downloaded file to query downloads.sqlite, getting the matching row, then finally passing the file's attributes into a bash script that sorts said file.
Difficulties: Finding a way to reliably match the downloaded file with the appropriate record in the database. Files may have the same download name but need to be sorted differently, perhaps, for example, if they were downloaded from a different URL. Additionally, I'd like to get additional attributes like whether the file was downloaded in incognito mode.
2) Create Auxiliary 'Helper' File
Upon a file download event, the extension creates a 'helper' text file, whose name is the name of the downloaded file plus some marker, and which contains the additional file attributes.
The server can then watch for the creation of a .txt file in the downloads directory and run the necessary shell script from this.
Difficulties: Whilst this avoids using a SQLite database, it seems rather ungraceful and hacky, and I can see a multitude of ways in which this method would just break or not work.
3) Watch SQLite Database
This method writes to the shared SQLite database downloads.sqlite on the server. Then, by some method, it watches for a new row being inserted into this database. This could either be done by watching the SQLite database for a new INSERT on a table, or by having a SQLite trigger on INSERT that runs a bash script, passing the download information on to a shell script.
Difficulties: There doesn't seem to be any easy way to watch a SQLite database for a new row insert, and a trigger within SQLite doesn't seem to be able to launch an external script/program. I've searched high and low for a method of doing either of these, but I'm struggling to find any documented way to do it that I am able to understand.
What I would like is :
Some feedback on which of these methods is appropriate, or if there is a more appropriate method that I am overlooking.
An example of a system/program that does something similar to this.
Many thanks in advance.
It seems to me that you have put the cart before the horse:
Use cron to periodically check for new downloads. Process them on the command line instead of trying to trigger things from inside sqlite3:
a) Here is an approach using your shared sqlite3 database "downloads.sqlite":
Upfront, once:
1) Add a table to your database containing just an integer record counter and a timestamp field, e.g. "table_counter":
sqlite3 downloads.sqlite "CREATE TABLE table_counter (counter INTEGER PRIMARY KEY NOT NULL, timestamp DATETIME DEFAULT (datetime('now')));" 2>/dev/null
2) Insert an initial record into this new table, setting "counter" to zero and recording a timestamp (datetime('now') already returns UTC, so no 'utc' modifier is needed):
sqlite3 downloads.sqlite "INSERT INTO table_counter VALUES (0, datetime('now'));" 2>/dev/null
Every so often:
3) Query the table containing the downloads with a "SELECT COUNT(*)" statement:
sqlite3 downloads.sqlite "SELECT COUNT(*) from table_downloads;" 2>/dev/null
Result e.g. 20
4) Compare this number to the number stored in the record counter field:
sqlite3 downloads.sqlite "SELECT (counter) from table_counter;" 2>/dev/null
Result e.g. 17
5) If the result from 3) > the result from 4), then you have downloaded more files than you have processed.
If so, query the table containing the downloads with a "SELECT" statement for the oldest download not yet processed, using a subselect:
sqlite3 downloads.sqlite "SELECT * from table_downloads where rowid = (SELECT (counter+1) from table_counter);" 2>/dev/null
In my example this would SELECT all values for the data record with the rowid of 17+1 = 18. (This relies on the rowids in table_downloads being contiguous, i.e. no rows are ever deleted.)
6) Do your magic in regards to the downloaded file stored as record #18.
7) Increase the record counter in "table_counter", again using a subselect:
sqlite3 downloads.sqlite "UPDATE table_counter SET counter = (SELECT (counter) from table_counter)+1;" 2>/dev/null
8) Finally, update the timestamp in "table_counter":
Why? Shit happens on shared drives... This way you can always check how many download records have been processed and when this last happened.
sqlite3 downloads.sqlite "UPDATE table_counter SET timestamp = datetime('now');" 2>/dev/null
If you want to have a log of this processing, then change the SQL statement in 4) to a "SELECT COUNT(*)", the one in 7) to an "INSERT counter", and its subselect to "(SELECT (counter+1) from table_counter)" respectively ...
Please note: The redirection " 2>/dev/null" at the end of each command is just there to suppress this kind of line, which newer versions of sqlite3 print before showing your query results:
-- Loading resources from /usr/home/bernie/.sqliterc
If you don't like timestamps based on UTC, then use localtime instead, e.g. datetime('now','localtime').
Put steps 3) through 8) in a shell script and use a cron entry to run this query/comparison periodically...
Use the complete /path/to/sqlite3 in this shell script (just in case, since it runs on a shared drive: someone could be fooling around with paths and surprise your cron ...)
b) I will give you a simpler answer using awk and some hash like md5 in a separate answer.
So it is easier for future readers and easier for you to "rate" :-)
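The polling loop from approach a) can also be sketched as a single script that cron runs periodically. This is only a sketch using Python's built-in sqlite3 module instead of the sqlite3 command line tool; the table names table_downloads and table_counter come from the steps above, while the function names and the handle_row callback are hypothetical. Like the steps above, it assumes that rowids in table_downloads are contiguous.

```python
import sqlite3


def init_counter(conn):
    """One-time setup: create the counter table and seed it with zero."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS table_counter ("
        "counter INTEGER PRIMARY KEY NOT NULL, "
        "timestamp DATETIME DEFAULT (datetime('now')))"
    )
    if conn.execute("SELECT COUNT(*) FROM table_counter").fetchone()[0] == 0:
        conn.execute("INSERT INTO table_counter VALUES (0, datetime('now'))")
    conn.commit()


def process_new_downloads(conn, handle_row):
    """Hand every not-yet-processed download row to handle_row, oldest first.

    Returns the number of rows processed in this run.
    """
    processed = 0
    while True:
        total = conn.execute("SELECT COUNT(*) FROM table_downloads").fetchone()[0]
        counter = conn.execute("SELECT counter FROM table_counter").fetchone()[0]
        if total <= counter:
            return processed  # nothing new to do
        row = conn.execute(
            "SELECT * FROM table_downloads WHERE rowid = ?", (counter + 1,)
        ).fetchone()
        if row is None:
            return processed  # gap in rowids; stop rather than loop forever
        handle_row(row)  # "do your magic" for this download
        # Advance the counter and record when this happened (UTC).
        conn.execute(
            "UPDATE table_counter SET counter = counter + 1, "
            "timestamp = datetime('now')"
        )
        conn.commit()
        processed += 1
```

The counter is only advanced after handle_row returns, so a download whose handling raises an error is retried on the next run rather than skipped.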

Iterating through folder - handling files that don't fit schema

I have a directory containing multiple xlsx files and what I want to do is to insert the data from the files in to a database.
So far I have solved this by using tFileList -> tFileInputExcel -> tPostgresOutput
My problem begins when one of these files doesn't match the defined schema and returns an error, which interrupts the workflow.
What I need to figure out is whether it's possible to skip that file (moving it to another folder, for instance) and continue iterating over the rest of the files.
If I check the option "Die on error", the process ends and doesn't process the rest of the files.
I would approach this by making your initial input schema on the tFileInputExcel be all strings.
After reading the file I would then validate the schema using a tSchemaComplianceCheck set to "Use another schema for compliance check".
You should be able to then connect a reject link from the tSchemaComplianceCheck to a tFileCopy configured to move the file to a new directory (if you want it to move it then just tick "Remove source file").
Here's a quick example:
With the following set as the other schema for the compliance check (notice how it now checks that id and age are Integers):
And then to move the file:
Your main flow from the tSchemaComplianceCheck can carry on using just strings if you are inserting into a database. You might want to use a tConvertType to change things back to the correct data types after this if you are doing any processing that requires proper data types or you are using your tPostgresOutput component to create the table as well.

How do I completely clear a SQLite3 database without deleting the database file?

For unit testing purposes I need to completely reset/clear SQLite3 databases. All databases are created in memory rather than on the file system when running the test suite so I can't delete any files. Additionally, several instances of a class will be referencing the database simultaneously, so I can't just create a new database in memory and assign it to a variable.
Currently my workaround for clearing a database is to read all the table names from sqlite_master and drop them. This is not the same as completely clearing the database though, since meta data and other things I don't understand will probably remain.
Is there a clean and simple way, like a single query, to clear a SQLite3 database? If not, what would have to be done to an existing database to make it identical to a completely new database?
In case it's relevant, I'm using Ruby 2.0.0 with sqlite3-ruby version 1.3.7 and SQLite3 version 3.8.2.
This works without deleting the file and without closing the db connection:
PRAGMA writable_schema = 1;
DELETE FROM sqlite_master;
PRAGMA writable_schema = 0;
PRAGMA integrity_check;
Another option, if possible to call the C API directly, is by using the SQLITE_DBCONFIG_RESET_DATABASE:
sqlite3_db_config(db, SQLITE_DBCONFIG_RESET_DATABASE, 1, 0);
sqlite3_exec(db, "VACUUM", 0, 0, 0);
sqlite3_db_config(db, SQLITE_DBCONFIG_RESET_DATABASE, 0, 0);
Here is the reference
The simple and quick way
If you use in-memory database, the fastest and most reliable way is to close and re-establish sqlite connection. It flushes any database data and also per-connection settings.
If you want to have some kind of "reset" function, you must assume that no other threads can interrupt that function, otherwise any method will fail. Therefore, even if you have multiple threads working on that database, there needs to be a "stop the world" mutex (or something like that) so that the reset can be performed. While you have exclusive access to the database connection, why not close and re-open it?
The hard way
If there are other limitations and you cannot do it the way above, then you were already pretty close to having a complete solution. If your threads don't touch pragmas explicitly, then only the "schema_version" pragma can be changed silently. But if your threads can change pragmas, then you have to go through the list of pragmas and write a "reset" function which sets each and every pragma to its initial value (you need to read the default values at the beginning).
Note that pragmas in SQLite can be divided into 3 groups:
1. defined initially, immutable, or very limited mutability
2. defined dynamically, per connection, mutable
3. defined dynamically, per database, mutable
Group 1 includes, for example, page_size, page_count, encoding, etc. Those are defined at database creation and usually cannot be modified later, with some exceptions. For example, page_size can be changed prior to "VACUUM", so the new page size will be set then. The page_count cannot be changed by the user, but it changes automatically when adding data (obviously). The encoding is defined at creation time and cannot be modified later.
You should not need to reset pragmas from group 1.
Group 2 includes, for example, cache_size, recursive_triggers, journal_mode, foreign_keys, busy_timeout, etc. These pragmas are always set to their defaults when opening a new connection to the database. If you don't disconnect, you will need to reset them to their defaults manually.
Group 3 are for example schema_version, user_version, maybe some others, you need to look it up. Those will also need manual reset. If you disconnect from in-memory database, the database gets destroyed, so then you don't need to reset those.
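The "read the defaults at the beginning, reset them later" idea can be sketched as follows. This is only a sketch in Python's built-in sqlite3 module (the question uses Ruby); the list of tracked pragmas is illustrative and not complete, and the function names are hypothetical.

```python
import sqlite3

# Illustrative selection of group 2 and group 3 pragmas; extend this to
# whatever your code actually changes.
TRACKED_PRAGMAS = ("cache_size", "foreign_keys", "recursive_triggers", "user_version")


def snapshot_pragmas(conn):
    """Read the current value of every tracked pragma."""
    return {
        name: conn.execute(f"PRAGMA {name}").fetchone()[0]
        for name in TRACKED_PRAGMAS
    }


def restore_pragmas(conn, defaults):
    """Set every tracked pragma back to the value recorded in `defaults`.

    Call this outside of a transaction: some pragmas (foreign_keys, for
    example) are silently ignored while a transaction is open.
    """
    for name, value in defaults.items():
        # Pragma names cannot be bound as parameters; the names come from
        # the fixed tuple above and the values are integers.
        conn.execute(f"PRAGMA {name} = {int(value)}")
```

Take the snapshot right after opening the connection, before any other code has had a chance to change a pragma, and call restore_pragmas() as part of the reset.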
Create an empty memory database.
Use the backup API to copy that database over the actual database.
In the case of sqlite3-ruby, see test/test_backup.rb for an example.
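For comparison, here is the same backup-API idea sketched with Python's built-in sqlite3 module, whose Connection.backup() wraps the SQLite backup API. The function name reset_database is hypothetical, and the sketch assumes that the target connection has no open transaction when it is called.

```python
import sqlite3


def reset_database(conn):
    """Overwrite the database behind `conn` with a brand-new, empty one.

    The connection object itself stays open, so other code holding a
    reference to it keeps working. Commit or roll back any pending
    transaction on `conn` before calling this.
    """
    empty = sqlite3.connect(":memory:")
    try:
        # Copy the empty database over the target database.
        empty.backup(conn)
    finally:
        empty.close()
```

Because the connection is reused rather than replaced, this fits the situation in the question where several objects reference the same database at once.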
SELECT * FROM dbname.sqlite_master WHERE type='table';