SQL SELECT WHERE value IN ('Huge list of Values') - sql

Note: C# 3.5 application calling a SQL Server 2005 DB on a remote server.
I'm developing a two step process.
1) I search a Windows Indexing Service for a list of files that contain a given word, such as "Bob".
2) I then need to retrieve a list of rows from a DOCUMENT table in a SQL DB by passing in the list of filenames from the Indexing Service.
At the moment I retrieve a list from the indexing service AND all rows from the DOCUMENT table, then filter them in code. This isn't practical as there are 10,000+ documents and the database is through a firewall.
I considered creating a query such as:
SELECT DocName FROM Documents WHERE DocName IN ({list of files from indexing service})
...but given that the list of files could run to thousands, it won't work.
So, what's the best thing I can do? I don't want to query the DB for all 10,000+ rows and pass them back over the firewall (takes 10 minutes). I somehow need to pass in the list of filenames retrieved from the indexing service.
How would LINQ work in this scenario?
Any advice greatly appreciated.

If you had SQL Server 2008, you could use Table Valued Parameters, but for 2005, there's nothing quite as elegant.
The simplest solution I can think of is:
1) Create a table in the database
2) Bulk insert the results of your Indexing Service query into the table
3) Join your query to this table to filter the results
4) Retrieve the filtered results
It's not a great solution, but I don't know that a great solution exists - that's why TVPs were created.
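The staging-table steps above can be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 module with an in-memory database so it can run anywhere, not SQL Server 2005, and the IndexHits table name and the filter_documents helper are made up for the example. On SQL Server the same shape would apply with a temp table and a bulk insert from the client.

```python
import sqlite3


def filter_documents(conn, filenames):
    """Return the DocName values in Documents that also appear in filenames."""
    cur = conn.cursor()
    # Hypothetical staging table holding the Indexing Service hits.
    cur.execute(
        "CREATE TEMP TABLE IF NOT EXISTS IndexHits (DocName TEXT PRIMARY KEY)"
    )
    cur.execute("DELETE FROM IndexHits")
    # One batched insert instead of a giant IN (...) list.
    cur.executemany(
        "INSERT OR IGNORE INTO IndexHits (DocName) VALUES (?)",
        ((name,) for name in filenames),
    )
    # The join does the filtering on the server, so only matches come back.
    cur.execute(
        "SELECT d.DocName FROM Documents AS d "
        "JOIN IndexHits AS h ON h.DocName = d.DocName "
        "ORDER BY d.DocName"
    )
    return [row[0] for row in cur.fetchall()]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Documents (DocName TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO Documents VALUES (?)",
        [(f"doc{i}.pdf",) for i in range(10000)],
    )
    # Five index hits, two of which have no matching row in Documents.
    hits = [f"doc{i}.pdf" for i in range(0, 20000, 4000)]
    return filter_documents(conn, hits)
```

Only the matching rows cross the wire, which is the point of the approach when the database sits behind a slow link.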

You can evaluate different solutions for this kind of "massive" operation; it may not be necessary to use LINQ at all. For example, try implementing a stored procedure on SQL Server that receives the list of file names as input and returns the list of documents.

I opted for a solution similar to what Bazzz mentioned.
I've set up a nightly operation to copy the required fields from the database and set meta tags on the document files (PDFs). The meta data can then be used in the Indexing Service ;o)
This has proved to be a good solution for this instance, but otherwise what Hallainzil said would've been the best option, albeit painful on SQL Server 2005.

Related

RavenDB ETL with sql database

I'm playing with RavenDB ETL and am trying to send data to a SQL database. With one RavenDB database that is pretty easy to achieve. In my case I have many (20+) RavenDB databases and I would like to send all the data to one SQL reporting database. Since all the databases have the same structure, the same documentId can occur in different RavenDB databases, so the value in the documentId column in the SQL database will not be unique. Does anyone have experience with how to solve this?
You can simply add an extra column that identifies the source database by name.
I know each SQL script will then contain a different definition, but you can automate this task using any client (C#, Java, Node, etc).
For API/examples go here: https://github.com/ravendb/ravendb/blob/c8c7f7ac0e8276f6ebee9a40ebacc1320e486a0d/test/SlowTests/Server/Documents/ETL/SQL/SqlEtlTests.cs#L39
Alternatively, you could concatenate the database name to the document id; the decision is yours.
Just an idea:
Do the 20++ RavenDB databases have different/unique names ?
If yes, you can concatenate the db name to each document created on those databases
This way each document will get a unique id
The document id can be manipulated with onBeforeStore See: https://ravendb.net/docs/article-page/5.2/Csharp/client-api/session/how-to/subscribe-to-events
So the db name is concatenated to the id right before storing to disk...
and this way the SQL ETL gets unique document ids
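Both suggestions above come down to making the pair (source database, document id) unique on the SQL side. A minimal sketch of the two variants, using Python's built-in sqlite3 purely as a stand-in for the reporting database; the Orders table, its columns, and the database names are hypothetical, and this is not the RavenDB ETL API itself:

```python
import sqlite3


def qualified_id(source_db, document_id):
    """Variant 1: fold the source database name into the id itself."""
    return f"{source_db}/{document_id}"


def load_rows(conn, source_db, rows):
    """Variant 2: keep the id as-is and add a discriminator column."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS Orders (
               SourceDb   TEXT NOT NULL,
               DocumentId TEXT NOT NULL,
               Total      REAL,
               PRIMARY KEY (SourceDb, DocumentId)
           )"""
    )
    # The composite key lets the same DocumentId arrive from several sources.
    conn.executemany(
        "INSERT OR REPLACE INTO Orders (SourceDb, DocumentId, Total) "
        "VALUES (?, ?, ?)",
        [(source_db, doc_id, total) for doc_id, total in rows],
    )
    conn.commit()
```

With the composite key, reloading the same document from the same source overwrites its row, while an identical id from another source gets a row of its own.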

How do I copy data from one Azure database table to a different Azure database table and also convert data types?

I have to copy data from one table to another, the tables are held in two different databases within Azure. I did a quick search for answers to this and whilst a query seems fairly straight forward i.e.
INSERT INTO table1 (make, model, type, serial)
SELECT the_make, the_model, the_type, ref_no
FROM database2.dbo.table2
I encountered issues because I'm using Azure.
Msg 40515, Level 15, State 1, Line 16
Reference to database and/or server name in 'database2.dbo.table2' is not supported in this version of SQL Server.
The above issue led me to the Cross-Database Queries articles. My requirements are a little more complicated than some of the scenarios provided and I need some help in making it work.
I also need to convert some columns such as reg_no which is a 'string' to an 'int' and then copy the value to the 'serial' column.
My question is: what is the best way to create a script that lets me reference both databases without errors, copy the data, and convert the columns at the same time? I tried the simple route of exporting the data and importing it while editing the column mappings, but it didn't work well and caused problems all over the place.
Any guidance is appreciated on this.
You're getting this error because there's no linked server by default. You'll need to add it, in order to access the secondary db server. Here's a link about how to do it:
https://www.sqlshack.com/create-linked-server-azure-sql-database/
As for the transformation, it depends on many factors, e.g. the number of rows, how often the copy runs, etc.
Usually the best alternative is an external ETL tool such as SSIS or Azure Data Factory, because you can schedule its execution and get the status of each run.
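For a small, one-off copy, the read-convert-insert step that an ETL tool performs can also be sketched in a client script that holds one connection per database. The example below is only an illustration: it uses Python's built-in sqlite3 with in-memory databases in place of the two Azure databases, takes the table and column names from the question's INSERT statement (which selects ref_no), and the choice to skip rows whose value is not a whole number is an assumption, not something the question specifies.

```python
import sqlite3


def to_int_or_none(text):
    """Convert a string such as ' 123 ' to int; return None if it is not numeric."""
    try:
        return int(str(text).strip())
    except (TypeError, ValueError):
        return None


def copy_rows(src, dst):
    """Copy table2 (source connection) into table1 (destination connection)."""
    rows = src.execute(
        "SELECT the_make, the_model, the_type, ref_no FROM table2"
    )
    converted, skipped = [], 0
    for make, model, type_, ref_no in rows:
        serial = to_int_or_none(ref_no)
        if serial is None:
            skipped += 1  # assumption: non-numeric values are skipped, not failed
            continue
        converted.append((make, model, type_, serial))
    dst.executemany(
        "INSERT INTO table1 (make, model, type, serial) VALUES (?, ?, ?, ?)",
        converted,
    )
    dst.commit()
    return len(converted), skipped
```

Returning the copied and skipped counts makes it easy to check after the run how many rows failed the string-to-int conversion.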

SSAS Multidimensional - Table Value Function as a Query for Partition

@GregGalloway was able to answer the question I should have asked. I am adding a more concise question here, while keeping the original lengthy text.
How do I use a table valued function as the query for a partition, when the function is in separate database from my fact and referenced dimensions?
Overview: I am building an SSAS multidimensional cube on top of a single fact table in our application's data warehouse, and I want to use the result set from a table-valued function as my fact table's partition query. We are using SQL Server (and SSAS) 2014.
Condition: For each environment (Dev,Tst,Prd) there are 2 separate databases on the same server, one for the application data warehouse [DW_App], the other for custom objects [DW_Custom]. I cannot create any objects in [DW_App], but have a lot of freedom in [DW_Custom].
Background info: I have not been able to find much information on using a TVF and partitions in this way. My thinking is that it will help streamline future development by giving me a single place to update the SQL if/when I modify the fact table.
So in testing out my crazy idea of using a TVF as the query for my partitions I have run into a bit of a conundrum. I am able to use my TVF when I explicitly state the Database in my FROM clause.
SELECT * FROM [DW_Custom].[dbo].[CubePartition](@StartDate, @EndDate)
However, that will not work, because the cube will be deployed in multiple environments before production, and it needs to point to a different DB in each. So I tried adding a new data source, setting my partition query to point to the new data source, and then removing the database name, i.e.:
SELECT * FROM [dbo].[CubePartition](@StartDate, @EndDate)
I get an error that
The SQL syntax is not valid. The relational database returned the following error message: Deferred prepare could not be completed. Invalid object name 'dbo.CubePartition'
If I click through this error, and the subsequent warnings that the cube cannot be processed if I continue, I am able to build and deploy the cube. However, I cannot process it, because I get an error that one of my dimensions does not exist.
Looking at the generated query, it is clear that it is querying my dimensions as well as the fact table. The dimensions do not exist inside '[DW_Custom]', which explains that error perfectly well.
So I guess 2 questions:
Is it possible to query another DB (on the same server) from inside of an SSAS partition query?
If not, is there any way I can use a variable as the database name in the query, and update that variable based on the project configuration (Dev,Tst,Prd)?
Bonus question: Is the reason that I can not find much about doing it this way because it is an obviously bad idea that I am overlooking, and if so why?
How about creating a second SSAS Data Source pointing to the DW_Custom database (or whatever it's called in the particular environment you're deploying to)? Then when you deploy from Dev to Prod, you need only change that connection string. When you create your partitions, then specify the DW_Custom data source and then specify the query without database name:
SELECT * FROM [dbo].[CubePartition](@StartDate, @EndDate)
As long as the query plan for that table-valued function is efficient compared to a plain SELECT statement, then I don't see a problem with that.

How to do a search and replace of a part of a string in all columns in all tables in an sql database

Is it possible to search and replace all occurrences of a string in all columns in all tables of a database? I use Microsoft SQL Server.
Not easily, though I can think of two ways to do it:
1) Write a series of stored procedures that identify all varchar and text columns of all tables, and generate individual update statements for each column of each table of the form "UPDATE foo SET BAR = REPLACE(BAR,'foobar','quux')". This will probably involve a lot of queries against the system tables, with a lot of experimentation. Microsoft doesn't go out of its way to document this stuff.
2) Export the entire database to a single text file, do a search/replace on that, and then re-import the entire database. Given that you're using MS SQL Server, this is actually the easier approach. Microsoft created the Microsoft SQL Server Database Publishing Wizard for other reasons, but it makes a fine tool for exporting all of the tables of a SQL Server database as a text file containing pure SQL DDL and DML. Run the tool to export all of the tables for a database, edit the resulting file as you need, and then feed the file back to sqlcmd to recreate the database.
Given a choice, I'd use the second method, as long as the DPW works with your version of SQL Server. The last time I used the tool, it met my needs (MS SQL Server 2000 / 2005) but it had some quirks when working with database Roles.
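The first method, generating one UPDATE per column, can also be driven from a client script rather than stored procedures. The sketch below assumes you have already fetched the (schema, table, column) triples for the character columns, for instance from INFORMATION_SCHEMA.COLUMNS; the helper names are made up for this example, and it assumes varchar/nvarchar columns, since REPLACE does not accept the legacy text type without a cast.

```python
def quote_ident(name):
    """Bracket-quote a T-SQL identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(value):
    """Single-quote a T-SQL string literal, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_replace_statements(columns, find, replacement):
    """Return one UPDATE ... REPLACE statement per (schema, table, column)."""
    find_sql = quote_literal(find)
    repl_sql = quote_literal(replacement)
    statements = []
    for schema, table, column in columns:
        col = quote_ident(column)
        statements.append(
            f"UPDATE {quote_ident(schema)}.{quote_ident(table)} "
            f"SET {col} = REPLACE({col}, {find_sql}, {repl_sql}) "
            # Only touch rows that actually contain the search string.
            f"WHERE CHARINDEX({find_sql}, {col}) > 0;"
        )
    return statements
```

Printing the generated statements and reviewing them before running anything against the database is a sensible precaution, since the script only builds text.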
In MySQL, you can do it very easily like this:
update [table_name] set [field_name] = replace([field_name],'[string_to_find]','[string_to_replace]');
I have personally tested this successfully on a production server.
Example:
update users set vct_filesneeded = replace(vct_filesneeded,'.avi','.ai');
Ref: http://www.mediacollege.com/computer/database/mysql/find-replace.html
A good starting point for writing such a query is the "Search all columns in all the tables in a database for a specific value" stored procedure. The full code is at the link (not trivial, but copy/paste it and use it, it just works).
From there on it's relatively trivial to amend the code to do a replace of the found values.

Move Data from Oracle to SQL Server

I would like to copy parts of an Oracle DB to a SQL Server DB. I need to move the data because the Oracle box is being decommissioned. I only need the data for reference purposes, so I don't need indexes, stored procedures, constraints, etc. All I need is the data.
I have a link to the Oracle DB in SQL Server. I have tested the following query, which seemed to work just fine:
select
*
into
NewTableName
from
linkedserver.OracleTable
I was wondering if there are any potential issues with using this approach?
Using SSIS (SQL Server Integration Services) may be a good alternative, especially if your table names are the same on both servers. Use the import wizard and it should create the destination tables for you and let you edit any mappings.
The only issue I see is that you will need to execute that statement for each and every table you need. Glad you are decommissioning the Oracle server :-). Otherwise, if you are not concerned with indexes or any of the existing sprocs, I don't see any issue with what you are doing.
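Since the statement has to be repeated per table, a small script can generate the batch. This is a rough sketch only: it uses OPENQUERY rather than the abbreviated linked-server reference in the question, and the linked server name, the Oracle schema, and the helper names are all placeholders.

```python
def quote_ident(name):
    """Bracket-quote a T-SQL identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def build_copy_statements(linked_server, oracle_schema, tables, target_schema="dbo"):
    """Return one SELECT ... INTO statement per Oracle table name."""
    statements = []
    for table in tables:
        # The pass-through query is a string literal, so double any quotes in it.
        remote = f"SELECT * FROM {oracle_schema}.{table}".replace("'", "''")
        statements.append(
            f"SELECT * INTO {quote_ident(target_schema)}.{quote_ident(table)} "
            f"FROM OPENQUERY({quote_ident(linked_server)}, '{remote}');"
        )
    return statements
```

For example, build_copy_statements("ORA_LINK", "HR", ["CUSTOMERS", "ORDERS"]) yields two statements that can be pasted into a query window and run one at a time.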
The "select * into" approach could be very slow if the tables are large. In that case, consider writing Pro*C or using FastReader: http://www.wisdomforce.com/products-FastReader.html
A faster and easier approach might be to use the Data Transformation Services, depending on the number of objects you're trying to copy over.