RavenDB ETL with sql database - ravendb

I'm playing with RavenDB ETL and are trying to send data to an sql database. With one Ravendb database that is pretty easy to achive. In my case I have many (20++) RavenDB databases and I would like to send all the data to the SQL reporting database. Since all the databases have the same structure the documentId in the different RavenDB can be the same and the value in the documentId column in SQL database will then not be unique. Do anyone have experierence on how to solve this?

You can simply put extra column which will discriminate source database name.
I know each SQL script will contain different definition, but you can automate this task using any client (c#, java, node, etc).
For API/examples go here: https://github.com/ravendb/ravendb/blob/c8c7f7ac0e8276f6ebee9a40ebacc1320e486a0d/test/SlowTests/Server/Documents/ETL/SQL/SqlEtlTests.cs#L39
Optionally you might concat database name to document id - decision is yours.

Just an idea:
Do the 20++ RavenDB databases have different/unique names ?
If yes, you can concatenate the db name to each document created on those databases
This way each document will get a unique id
The document id can be manipulated with onBeforeStore See: https://ravendb.net/docs/article-page/5.2/Csharp/client-api/session/how-to/subscribe-to-events
So the db name is concatenated to the db is right before storing to disk...
and this way the SQL ETL gets unique document ids

Related

How do I copy data from one Azure database table to a different Azure database table and also convert data types?

I have to copy data from one table to another, the tables are held in two different databases within Azure. I did a quick search for answers to this and whilst a query seems fairly straight forward i.e.
INSERT INTO table1 (make, model, type, serial)
SELECT the_make, the_model, the_type, ref_no
FROM database2.dbo.table2
I encountered issues because I'm using Azure.
Msg 40515, Level 15, State 1, Line 16 Reference to database and/or
server name in 'database2.dbo.table2' is not supported in this version of
SQL Server.
The above issue led me to the Cross-Database Queries articles. My requirements are a little more complicated than some of the scenarios provided and I need some help in making it work.
I also need to convert some columns such as reg_no which is a 'string' to an 'int' and then copy the value to the 'serial' column.
My question is, what the best way to create a script for this that allows me to reference both databases without any errors, copy the data and convert the columns at the same time? I tried the simple way of exporting data and importing it, editing the mappings for the columns, it wasn't that good I found and was causing problems all over the place.
Any guidance is appreciated on this.
You're getting this error because there's no linked server by default. You'll need to add it, in order to access the secondary db server. Here's a link about how to do it:
https://www.sqlshack.com/create-linked-server-azure-sql-database/
In terms of the transformation. It depends on many factors e.g. amount of rows, frequency, etc..
Usually the best alternative is by using an external tool (ETL) such as SSIS / Azure Data Factory because you can schedule it's execution and get the status of each execution.

SQL Server: Create a duplicate of a database without the data

I have a database called AQOA_Core with huge amount of data.
I have a newly created database called AQOA_Core1 which is basically empty. I want to write a query to duplicate AQOA_Core to AQOA_Core1 without the data. I guess to be precise I want to create a skeleton of the primary database into the secondary database.
PS: I use Toad for my database operations.
You can use SQL Server Script Wizard for scripting database objects. You can exclude data, and select the database object types you want to include in your script
Please check the SQL Server guide I referenced above,
I hope it helps you

SQL SELECT WHERE value IN ('Huge list of Values')

Note: C# 3.5 application calling a SQL Server 2005 DB on a remote server.
I'm developing a two step process.
1) I search a Windows Indexing Service for a list of files that contain a given word, such as "Bob".
2) I then need to retrieve a list of rows from a DOCUMENT table in a SQL DB by passing in the list of filenames from the Indexing Service.
At the moment I retrieve a list from the indexing service AND all rows from the DOCUMENT table, then filter them in code. This isn't practical as there are 10,000+ documents and the database is through a firewall.
I considered creating a query such as:
SELECT DocName FROM Documents WHERE DocName IN ({list of files from indexing service})
...but given the list of files could be thousands it won't work.
So, what's the best thing I can do? I don't want to query the DB for all 10,000+ rows and pass them back over the firewall (takes 10 minutes). I somehow need to pass in the list of filenames retrieved from the indexing service.
How would linq work in this scenario?
Any advice greatly appreciated.
If you had SQL Server 2008, you could use Table Valued Parameters, but for 2005, there's nothing quite as elegant.
The simplest solution I can think of is:
Create a table in the database
Bulk Insert the results of your Indexing Service into the table
Join your query to this table to filter the results
Retrieve the filered results
It's not a great solution, but I don't know that a great solution exists - that's why TVPs were created.
You can evaluate different solutions for this kind of "massive" operation, may be not necessary to use linq. For example, try to implement a stored procedure on SQL Server, that receives in input the list of file name and returns the list of documents.
I opted for a solution similar to what Bazzz mentioned.
I've set up a nightly operation to copy the required fields from the database and set meta tags on the document files (PDFs). The meta data can then be used in the Indexing Service ;o)
This has proved to be a good solution for this instance, but otherwise what Hallainzil said would've been the best option albeit painful on Sql Server 2005.

How do you transfer all tables between databases using SQL Management Studio?

When I right click on the database I want to export data from, I only get to select a single table or view, rather than being able to export all of the data. Is there a way to export all of the data?
If this is not possible, could you advise on how I could do the following:
I have two databases, with the same table names, but one has more data than the other
They both have different database names (Table names are identical)
They are both on different servers
I need to get all of the additional data from the larger database, into the smaller database.
Both are MS SQL databases
Being that both are MS SQL Servers, on different hosts... why bother with CSV when you can setup a Linked Server instance so you can access one instance from the other via a SQL statement?
Make sure you have a valid user on the instance you want to retrieve data from - it must have access to the table(s)
Create the Linked Server instance
Reference the name in queries using four name syntax:
INSERT INTO db1.dbo.SmallerTable
SELECT *
FROM linked_server.db.dbo.LargerTable lt
WHERE NOT EXISTS(SELECT NULL
FROM db1.dbo.SmallerTable st
WHERE st.col = lt.col)
Replace WHERE st.col = lt.col with whatever criteria you consider to be duplicate values between the two tables.
There is also a very good tool by Redgate software that syncs data between two databases.
I've also used SQL scripter before to generate a SQL file with insert statements that you can run on the other database to insert the data.
If you right-click on the database, under the Tasks menu, you can use the Generate Scripts option to produce SQL scripts for all the tables and data. See this blog post for details. If you want to sync the second database with the first, then you're better off using something like Redgate as suggested in mpenrow's answer.

How to do a search and replace of a part of a string in all columns in all tables in an sql database

Is it possible to search and replace all occurrences of a string in all columns in all tables of a database? I use Microsoft SQL Server.
Not easily, though I can thing of two ways to do it:
Write a series of stored procedures that identify all varchar and text columns of all tables, and generate individual update statements for each column of each table of the form "UPDATE foo SET BAR = REPLACE(BAR,'foobar','quux')". This will probably involve a lot of queries against the system tables, with a lot of experimentation -- Microsoft doesn't go out of its way to document this stuff.
Export the entire database to a single text file, do a search/replace on that, and then re-import the entire database. Given that you're using MS SQL Server, this is actually the easier approach. Microsoft created the Microsoft SQL Server Database Publishing Wizard for other reasons, but it makes a fine tool for exporting all of the tables of a SQL Server database as a text file containing pure SQL DDL and DML. Run the tool to export all of the tables for a database, edit the resulting file as you need, and then feed the file back to sqlcmd to recreate the database.
Given a choice, I'd use the second method, as long as the DPW works with your version of SQL Server. The last time I used the tool, it met my needs (MS SQL Server 2000 / 2005) but it had some quirks when working with database Roles.
In MySQL, you can do it very easily like this:
update [table_name] set [field_name] = replace([field_name],'[string_to_find]','[string_to_replace]');
I have personally tested this successfully on a production server.
Example:
update users set vct_filesneeded = replace(vct_filesneeded,'.avi','.ai');
Ref: http://www.mediacollege.com/computer/database/mysql/find-replace.html
A good starting point for writing such a query is the "Search all columns in all the tables in a database for a specific value" stored procedure. The full code is at the link (not trivial, but copy/paste it and use it, it just works).
From there on it's relatively trivial to amend the code to do a replace of the found values.