Use RStudio to connect to, and run queries on, a locally stored, compressed SQL databse - sql

I'm trying to connect to and run queries on two large, locally-stored SQL databases with file extensions like so:
filename.sql.zstd.part
filename2.sql.zstd
My preference is to use the RMySQL package- however i am finding it hard to find documentation of a) how to access locally stored SQL files, and b) how to deal with the zstd extension.
This may be very basic but help is appreciated!

Seems like you have problems understanding the file extensions.
filename.sql.zstd.part
.part usually means you are downloading a file from the internet, but the download isn't complete yet (so downloads that are in progress or have been stopped)
So to get from filename.sql.zstd.part to filename.sql.zstd you need to complete your download
.zstd means it is a compressed file (to save disk space). You need a decompression program to get from filename.sql.zstd to filename.sql
The compression algorithm used is called Zstandard so you need a decompressor specifically for this program. Look here https://facebook.github.io/zstd/ for such a program.
There was also once an R package for this - but it has been archived. But you could also download an older version
(https://cran.r-project.org/web/packages/zstdr/index.html)
In filename.sql is actually not a database. In an .sql file are usually SQL statements for creating / modifying database structures. You'd have to install a database e.g. MariaDB and then import this .sql file to actually really have the files in a database on your computer. And then you would access this database via R.

Related

Recovering Postgres DB from Program Files

a few weeks ago my computer stopped working and I lost a lot of important data stored in a postgres DB (should have backed it up but, I didn't ** sigh ** ).
I was able to access the hard drive and extract the program files for postgres. How can I try and restore the DB?
Here is what I have tried so far:
download postgres (same version as the lost DB) on a separate computer and swap out the program files with the ones I am trying to recover. I am using a PC, so I stop service->swap out all the files-> restart service. Restarting the service is never successful.
Use PGAdmin -> creat new db (ensure path for binary files is correct) -> restore-> Here i get stuck figuring out what the correct files are

Creating multiple files for uploading to Snowflake

Currently, my company uses SSIS and BCP to export data from SQL Server to CSV files. However, we are only able to create a single file per SQL table (due to the limitations of BCP). Most of these files are quite large; if I am correct, they are too large to get the best performance when loading them into Snowflake. On their website, they state that we should be working with multiple gzip files to offer the best performance.
I am wondering how other people made this work? Splitting up the CSV to multiple files and zipping them? Any good tools that can do this during export from SSIS?
I'd keep the current process that exports the large .csv files using SSIS, then run 7zip via command line to create a split gzip set for each text file, either within the SSIS package or via Powershell.
The -v switch is used to specify the volume size.
https://sevenzip.osdn.jp/chm/cmdline/switches/volume.htm
You may be able to start importing/uploading the completed chunks before the later ones are finished to pick up some additional time savings, but I've not tested that.

SQL Database in GitHub

I am building a Java app that uses an SQLite database to hold most of its data. For the end-user, the database would be almost entirely read-only, with very occasional edits. I'll (theoretically) be displaying/distributing it through my GitHub page, so my question is:
What's the best way to load the database into GitHub? (I'm using IntelliJ with DataGrip.)
I'd prefer to be able to update the database when I commit/push, instead of having to overwrite the whole file. The closest question I can find is How to include MySQL database schema on GitHub? but there could potentially be hundreds or thousands of entries, so I can't just rebuild the tables when the user installs the app.
I'm applying for entry-level developer jobs, and this project is going to be my main portfolio piece during job-hunting. I'm trying to make sure it is not only functional but also makes a good impression. Any help is (very) greatly appreciated.
EDIT:
After moving my .db file into the folder connected to GitHub (same level as my src folder) apparently I can now commit/push it with the rest of my files. How do I make sure that the connection from my Java code to the database stays valid once it is loaded onto another user's system? Can I just stick with
connection = DriverManager.getConnection("jdbc:sqlite:mydatabase.db");
or do I need to rework the path?
Upon starting, if your application can't find a corresponding sqlite database file, have it create one. Then do initial load of your tables from either CSV, JSON or XML files.
You can upload these files to Git, as they are text formats.

what data type to use for storing different files in database

Im trying to make a database accept different files in a postgres database table. The files I want to support are of different mime-types. I want to support pdf, word, plain text, and power point. The problem is that i don't know what datatype to choose. The documentation to pgadmin (the tool im using) is very (let´s say) unsatisfactory. Thanks
While you can store the file contents in the database, consider storing the file path instead and using the file system to store the file.
In the IT world "you can do anything with anything", but that doesn't mean you should.
In this case, you're trying to use a database as a file system, which it can do, but databases are not as efficient or practical as file systems for storing file contents (typically "large" data). It will:
make your backups longer and larger
slow your insert queries down (more I/O)
make your log files larger (slower and fill more often)
make accessing the files slower (query vs simple disk I/O)
require you to go via the database to access the files (hassle, can't use browser etc)
etc
You can use bytea type in PostgreSQL.

Storing Drupal SQL in Git

I have a drupal site, and I am storing the codebase in a git repository. This seems to be working out well, but I'm also making changes to the database. I'm considering doing periodic dumps of the database and committing to git. I had a few questions about this.
If I overwrite the file, will git think it is a brand new file or will it recognize that it is an altered version of the same file.
Will this potentialy make my repo huge (the database is 16mb)
Can I zip this file? or will this mess Git up ... the zipped version is only 3mb
Any other suggestions?
If you have enough space, a non-compressed dump in source control is pretty handy because you can compare using a diff program what rows were added/modified/deleted.
Another solution is to use the features module which is supposed to capture drupal config in code. It stores this captured data as a feature module which you can put into version control.
For my database applications, I store scripts of DDL statements (like CREATE TABLE) in some sort of version control system. These scripts sometimes include static "seed" data as well. All the version control systems I use are good at recognizing differences in these files, and they are much smaller than the full database with data.
For the dynamically-generated data, I store backups (e.g. from mysqldump) in an appropriate location (depending on the importance of the data, that may include offsite backups).
1) It's all text, so GIT will just see it as it would any other file.
2) No, due to the above it should add 16mb to the repo (or less, due to GITs own compression), it won't add a new file every time, just the changes, so the repo will change by the size of the additions to the repository
3) No, or GIT won't be able to see the differences - GIT does it's own compression anyway