Is there a way to use SPARQL to dump all RDF graphs from a triplestore (Virtuoso) to a .sparql file containing all INSERT queries to rebuild the graphs?
Like the mysqldump command?
RDF databases are essentially schemaless, which means a solution like mysqldump is not really necessary: you don't need any queries to re-create the database schema (table structures, constraints, etc), a simple data dump contains all the necessary info to re-create the database.
So you can simply export your entire database to an RDF file, in N-Quads or TriG format (you need to use one of these formats because other formats, like RDF/XML or Turtle, do not preserve named graph information).
I'm not sure about the native Virtuoso approach to do this (perhaps it has an export/data dump option in the client UI), but since Virtuoso is Sesame/RDF4J-compatible you could use the following bit of code to do this programmatically:
Repository rep = ... ; // your Virtuoso repository
File dump = new File("/path/to/file.nq");
try (RepositoryConnection conn = rep.getConnection()) {
conn.export(Rio.createWriter(RDFFormat.NQUADS, new FileOutputStream(dump)));
}
Surprisingly enough, the Virtuoso website and documentation include this information.
You don't get a .sparql file as output, because RDF always uses the same triple (or quad) "schema", so there's no schema definition in such a dump; just data.
The dump procedures are run through the iSQL interface.
To dump a single graph — just a lot of triples — you can use the dump_one_graph stored procedure.
SQL> dump_one_graph ('http://daas.openlinksw.com/data#', './data_', 1000000000);
To dump the whole quad store (all graphs except the Virtuoso-internal virtrdf:), you can use the dump_nquads stored procedure.
SQL> dump_nquads ('dumps', 1, 10000000, 1);
There are many load options; we'd generally recommend the Bulk Load Functions for such a full dump and reload.
(ObDisclaimer: OpenLink Software produces Virtuoso, and employs me.)
Related
We have application which collect huge data daily. So write operation is more, Hence my server slow down. So what we have planned use MongoDB to collect data, By using scheduler will import data to SQL.
So my problem is how can I import that much heavy data from MongoDB to SQL
Any suggestion Please. Like any tool etc.
I don't know any tools, but I'm sure they exist if you google them.
If it was me, without prior knowledge, I may export data to a flat file (.csv) and create either a stored procedure or an SSIS package to import the data into SQL.
Python may be my choice to automate the exports in chunks overnight where SQL can handle the importation and cleanup.
mongoexport --host yourhost --db yourdb --collection yourcollection --csv --out yourfile.csv --fields field1,field2,field3
Doing it this way allows you to define the structure before it hits the SSIS package.
Another way
Here is a good example of doing all collections. This was from another answer.
out = `mongo #{DB_HOST}/#{DB_NAME} --eval "printjson(db.getCollectionNames())"`
collections = out.scan(/\".+\"/).map { |s| s.gsub('"', '') }
collections.each do |collection|
system "mongoexport --db #{DB_NAME} --collection #{collection} --host '#{DB_HOST}' --out #{collection}_dump"
end
We created MongoSluice for this specific reason.
Our application interrogates a MongoDB collection and creates a full, deep schema. It then streams data and meta data to any RDBMS system (Oracle, MySQL, Postgres, HP Vertica...).
What you end up with is a representation of your NoSQL as SQL. A big use case for this is to get unstructured data into analytical databases. BI platforms, particularly.
You can user linked server to mongodb so you can query whatever you want.
Of course before that you have to install to required drivers to MongoDB
EXEC sp_addlinkedserver #server='MongoDB',
#srvproduct='CData.Mongo DB.ODBC.Driver',
#provider='SQLNCLI10',
#datasrc='<Machine IP address>,1434',
#provstr='Network Library=DBMSSOCN;',
#catalog='CDataMongoDB';
I have a pretty big db in the terms of amount of different db object, not the size. I want to backup it (its structure, scheme). Preferably, I'd like to get the sql code. Of course, I can navigate to each its object and just get it but, as I said, there are a lot of them.
How do I do this easily?
You can use pg_dump --schema-only --format=plain > dump_file.sql to dump your schema (db objects without data) into a sql file.
Details here: pg_dump.
I am a C# developer, I am not really good with SQL. I have a simple questions here. I need to move more than 50 millions records from a database to other database. I tried to use the import function in ms SQL, however it got stuck because the log was full (I got an error message The transaction log for database 'mydatabase' is full due to 'LOG_BACKUP'). The database recovery model was set to simple. My friend said that importing millions records using task->import data will cause the log to be massive and told me to use loop instead to transfer the data, does anyone know how and why? thanks in advance
If you are moving the entire database, use backup and restore, it will be the quickest and easiest.
http://technet.microsoft.com/en-us/library/ms187048.aspx
If you are just moving a single table read about and use the BCP command line tools for this many records:
The bcp utility bulk copies data between an instance of Microsoft SQL Server and a data file in a user-specified format. The bcp utility can be used to import large numbers of new rows into SQL Server tables or to export data out of tables into data files. Except when used with the queryout option, the utility requires no knowledge of Transact-SQL. To import data into a table, you must either use a format file created for that table or understand the structure of the table and the types of data that are valid for its columns.
http://technet.microsoft.com/en-us/library/ms162802.aspx
The fastest and probably most reliable way is to bulk copy the data out via SQL Server's bcp.exe utility. If the schema on the destination database is exactly identical to that on the source database, including nullability of columns, export it in "native format":
http://technet.microsoft.com/en-us/library/ms191232.aspx
http://technet.microsoft.com/en-us/library/ms189941.aspx
If the schema differs between source and target, you will encounter...interesting (yes, interesting is a good word for it) problems.
If the schemas differ or you need to perform any transforms on the data, consider using text format. Or another format (BCP lets you create and use a format file to specify the format of the data for export/import).
You might consider exporting data in chunks: if you encounter problems it gives you an easier time of restarting without losing all the work done so far.
You might also consider zipping the exported data files up to minimize time on the wire.
Then FTP the files over to the destination server.
bcp them in. You can use the bcp utility on the destination server for the BULK IMPORT statement in SQL Server to do the work. Makes no real difference.
The nice thing about using BCP to load the data is that the load is what is described as a 'non-logged' transaction, though it's really more like a 'minimally logged' transaction.
If the tables on the destination server have IDENTITY columns, you'll need to use SET IDENTITY statement to disable the identity column on the the table(s) involved for the nonce (don't forget to reenable it). After your data is imported, you'll need to run DBCC CHECKIDENT to get things back in synch.
And depending on what your doing, it can sometimes be helpful to put the database in single-user mode or dbo-only mode for the duration of the surgery: http://msdn.microsoft.com/en-us/library/bb522682.aspx
Another approach I've used to great effect is to use Perl's DBI/DBD modules (which provide access to the bulk copy interface) and write a perl script to suck out the data from the source server, transform it and bulk load it directly into the destination server, without having to save it to disk and move it. Also means you can trap errors and design things for recovery and restart right at the point of failure.
Use BCP to migrate data.
Another approach i have used in the past is to take a backup of the transaction log and shrink the log Prior to the migration. Split the migration script in parts and run the log backup- shrink - migrate iteration a few times.
I'm searching a good (JDBC-compatible) replacement for SQLite in java. I've found hsqldb and it's quite satisfying me, but there are some questions.
First. How it operates when database size will be 3-4GB? Still load all to RAM?
Second. There said, it can support binary format, not only script. How can I enable it?
Class.forName("org.hsqldb.jdbcDriver");
PleaseHsqlUseBinaryFormat(); // What should I write here or somewhere else?
link = DriverManager.getConnection("jdbc:hsqldb:file:/tmp/mydatabase.hsql","sa","");
try
{
// Work with database
}
HSQLDB is flexible and allows you to choose how the data is stored. If you want the data to be stored in disk tables, use CACHED tables. The easiest way to do this is by adding a property to the connection URL:
jdbc:hsqldb:file:/tmp/mydatabase.hsql;hsqldb.default_table_type=cached
All CREATE TABLE and similar statements are stored in the .script file. If you don't want this file to be in text format, add another property:
jdbc:hsqldb:file:/tmp/mydatabase.hsql;hsqldb.default_table_type=cached;hsqldb.script_format=3
The Guide covers the different options:
http://hsqldb.org/doc/2.0/guide/management-chapt.html
How do I copy data from multiple tables within one database to another database residing on a different server?
Is this possible through a BTEQ Script in Teradata?
If so, provide a sample.
If not, are there other options to do this other than using a flat-file?
This is not possible using BTEQ since you have mentioned both the databases are residing in different servers.
There are two solutions for this.
Arcmain - You need to use Arcmain Backup first, which creates files containing data from your tables. Then you need to use Arcmain restore which restores the data from the files
TPT - Teradata Parallel Transporter. This is a very advanced tool. This does not create any files like Arcmain. It directly moves the data between two teradata servers.(Wikipedia)
If I am understanding your question, you want to move a set of tables from one DB to another.
You can use the following syntax in a BTEQ Script to copy the tables and data:
CREATE TABLE <NewDB>.<NewTable> AS <OldDB>.<OldTable> WITH DATA AND STATS;
Or just the table structures:
CREATE TABLE <NewDB>.<NewTable> AS <OldDB>.<OldTable> WITH NO DATA AND NO STATS;
If you get real savvy you can create a BTEQ script that dynamically builds the above statement in a SELECT statement, exports the results, then in turn runs the newly exported file all within a single BTEQ script.
There are a bunch of other options that you can do with CREATE TABLE <...> AS <...>;. You would be best served reviewing the Teradata Manuals for more details.
There are a few more options which will allow you to copy from one table to another.
Possibly the simplest way would be to write a smallish program which uses one of their communication layers (ODBC, .NET Data Provider, JDBC, cli, etc.) and use that to take a select statement and an insert statement. This would require some work, but it would have less overhead than trying to learn how to write TPT scripts. You would not need any 'DBA' permissions to write your own.
Teradata also sells other applications which hides the complexity of some of the tools. Teradata Data Mover handles provides an abstraction layer between tools like arcmain and tpt. Access to this tool is most likely restricted to DBA types.
If you want to move data from one server to another server then
We can do this with the flat file.
First we have fetch data from source table to flat file through any utility such as bteq or fastexport.
then we can load this data into target table with the help of mload,fastload or bteq scripts.