Postgres: Restructuring to Schemas - ruby-on-rails-3

I have a Rails 3.2 multi-tenant subdomain based app which I'm trying to migrate over to PostgreSQL's schemas (each account getting its own schema -- right now all of the accounts use the same tables).
So, I'm thinking I need to:
Create a new DB
Create a schema for each Account (its id) and the tables under them
Grab all the data that belongs to each account and insert it into the new DB under the schema of said account
Does that sound correct? If so, what's a good way of doing that? Should I write a Ruby script that uses ActiveRecord, plucks the data, then inserts it into the new DB (pretty inefficient, but it should get the job done)? Or does Postgres provide good tools for doing such a thing?
EDIT:
As Craig recommended, I created schemas in the existing DB. I then looped through all of the Accounts in a Rake task, copying the data over with something like:
Account.all.each do |account|
  PgTools.set_search_path account.id, false
  # admin here is assumed to be looked up earlier (the account's admin user)
  sql = %{INSERT INTO tags SELECT DISTINCT "tags".* FROM "tags" INNER JOIN "taggings" ON "tags"."id" = "taggings"."tag_id" WHERE "taggings"."tagger_id" = #{admin.id} AND "taggings"."tagger_type" = 'User'}
  ActiveRecord::Base.connection.execute sql
  # more such commands
end
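For context, PgTools.set_search_path is a small app-specific helper, not part of Rails; my assumption is that it boils down to pointing the connection's search_path at the account's schema, roughly:

-- Assumed effect of PgTools.set_search_path account.id, false:
SET search_path TO "42", public;  -- "42" being the schema named after the account's id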

I'd do the conversion with SQL personally.
Create the new schemas in the same database as the current one for easy migration, because you can't easily query across databases with PostgreSQL.
Migrate the data using appropriate INSERT INTO ... SELECT queries. To do it without having to disable any foreign keys, you should build a dependency graph of your data. Copy the data into tables that depend on nothing first, then tables that depend on them, and so on.
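For example (table names are illustrative, not from the question), copying one customer's rows in dependency order might look like:

-- Parents copied before children so foreign keys are satisfied.
INSERT INTO customer_123.users    SELECT * FROM public.users    WHERE account_id = 123;
INSERT INTO customer_123.posts    SELECT * FROM public.posts    WHERE account_id = 123;
INSERT INTO customer_123.comments SELECT * FROM public.comments WHERE account_id = 123;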
You'll need to repeat this for each customer schema, so consider creating a PL/pgSQL function that uses EXECUTE ... dynamic SQL to:
Create the schema for a customer
Create the tables within the schema
Copy data in the correct order by looping over a hard-coded array of table names, doing:
EXECUTE 'INSERT INTO '||quote_ident(newschema)||'.'||quote_ident(tablename)||' SELECT * FROM oldschema.'||quote_ident(tablename)||' WHERE customer_id = '||quote_literal(customer_id)||';';
where newschema, tablename and customer_id are PL/pgSQL variables.
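A minimal sketch of such a function, assuming every table has a customer_id column and that the hard-coded list is already in dependency order (the names convert_customer, oldschema and the schema-naming scheme are illustrative):

CREATE OR REPLACE FUNCTION convert_customer(customer_id integer) RETURNS void AS $$
DECLARE
    newschema text := 'customer_' || customer_id;
    tablename text;
BEGIN
    EXECUTE 'CREATE SCHEMA ' || quote_ident(newschema);
    -- Hard-coded in dependency order: parents first, then children.
    FOREACH tablename IN ARRAY ARRAY['tags', 'taggings'] LOOP
        EXECUTE 'CREATE TABLE ' || quote_ident(newschema) || '.' || quote_ident(tablename)
            || ' (LIKE oldschema.' || quote_ident(tablename) || ' INCLUDING ALL)';
        EXECUTE 'INSERT INTO ' || quote_ident(newschema) || '.' || quote_ident(tablename)
            || ' SELECT * FROM oldschema.' || quote_ident(tablename)
            || ' WHERE customer_id = ' || quote_literal(customer_id);
    END LOOP;
END;
$$ LANGUAGE plpgsql;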
You can then invoke that function from SQL. While you could just do SELECT convert_customer(c.id) FROM customer c GROUP BY c.id, I'd probably do it from an external control script, just so each customer's work gets done and committed individually, avoiding the need to start again from scratch if the second-to-last customer conversion fails.
For bonus crazy points, it's even possible to define triggers on the main customer schema's tables that replicate changes to already-migrated customers' data over to the copy in the new schema, so they can keep using the system during the migration. I'd avoid that unless the migration was just too big to do without downtime, as it'd be a nightmare to test, and you'd still need the triggers to throw an error on access by customer x while the migration of x's data was actually in progress, so it wouldn't be fully transparent anyway.
If you're using different login users for different customers (strongly recommended) your function can also:
REVOKE rights on the schema from public
GRANT limited rights on the schema to the user(s) or role(s) who'll be using it
REVOKE rights on each table created from public
GRANT the desired limited rights on each table to the user(s) and role(s)
GRANT on any sequences used by those tables. This is required even if the sequence is created by a SERIAL pseudo-column.
That way all your permissions are consistent and you don't need to go and change them later. Remember that your webapp should never log in as a superuser.
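Inside the same function, those permission steps might look roughly like this (customer_role is an illustrative stand-in for whatever login/role the customer uses):

EXECUTE 'REVOKE ALL ON SCHEMA ' || quote_ident(newschema) || ' FROM public';
EXECUTE 'GRANT USAGE ON SCHEMA ' || quote_ident(newschema) || ' TO customer_role';
EXECUTE 'REVOKE ALL ON ALL TABLES IN SCHEMA ' || quote_ident(newschema) || ' FROM public';
EXECUTE 'GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA ' || quote_ident(newschema) || ' TO customer_role';
EXECUTE 'GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA ' || quote_ident(newschema) || ' TO customer_role';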

Related

I have a default schema that is not dbo, why is the content in that schema not always consumed?

I have worked for 8 months at a company that provides software to about 80 customers, and I have been informed by one of the owners that I am now also the DBA. Until I came here my SQL Server experience was mostly in the ETL and BI realm; I do not mind learning, so I am diving in. Our base content (stored procedures, functions, views, etc.) is stored in the dbo schema. When we create custom content for a customer, it is placed in the cus schema. We do not use the schema designation when consuming content. I have been told that our customers access the database with a single user, access security is handled in the application, and this user has the default schema of cus. Our customers are on a mix of SQL Server editions from 2008 R2 to 2014, mostly Standard or Express.
My problem is this: if I make a change for a customer to the base code (dbo schema) and save it in the cus schema, then I also need to create an object in the cus schema for everything that calls that object up the call stack, otherwise it will not be called. When I was first told this I thought, OK, that does not make sense, but if you say so. My thought was that if content was in the default schema, cus, it would be executed, since SQL Server would look there first before looking in the dbo schema.
I now have the opportunity to create all of this content, and decided to test my belief about how I thought SQL Server worked. I found that what I believe is true, but only sometimes. I hope this is not one of those "it depends" situations that I just do not understand. Anyway, I created 7 very simple functions in the dbo and cus schemas, like this:
CREATE FUNCTION [dbo].[Function_X](@Co int)
RETURNS TABLE
AS RETURN
(
    Select 'dbo' as sche, 'Function_X' as fun, @Co as passed
    union all
    Select * from Function_X_X(1)
)
They are identical except that the ones in the dbo schema say dbo and the ones in the cus schema say cus; that way I know which schema they were called from. (I actually tried to trace this with Extended Events but could not find a way to do it, but I digress.) The main function calls Function_1, Function_2, Function_3 and Function_4. Function_2 calls Function_2_1, and that calls Function_2_1_1; see below.
Function
    Function_1
    Function_2
        Function_2_1
            Function_2_1_1
    Function_3
    Function_4
By renaming various functions in the different schemas, so they would not be called, I was able to see which ones were executing in which schema. The results were not what I expected.
1. If the initial function is not in the cus schema, then no functions in the cus schema are run.
2. Ignoring functions 2_1 and 2_1_1 for the moment: when the initial function is in the cus schema, the four functions 1-4 will run from the cus schema if they exist there, otherwise they run from dbo, so I get a mix of cus and dbo depending on which functions exist.
3. If function 2_1 is not found in cus, then it runs from dbo, as does function 2_1_1, even if function 2_1_1 exists in the cus schema.
Example 2 executed as expected; however, I was not expecting the results from examples 1 or 3. I expected to see the cus function execute if it existed. In example 1, all the functions in the cus schema were ignored. In example 3, when Function_2_1 was not found in cus it ran the one in dbo as expected, but then it also ran Function_2_1_1 from dbo.
Searching far and wide on the web, I could not find anything that touched on my question. I did stumble on a post that talked about the owner of the schema and the fact that owners have a default schema. The cus schema is owned by dbo, and the default schema of the owning login (sa) is also dbo. Thinking that might override the default schema, I created a new server login and database user with a default schema of cus and changed the owner of the cus schema. Alas, no change in the result. I logged off the server and/or restarted the service when I did this, and I attempted it several times. I even created a new schema using the new login but got the same result.
Is there a way I can get SQL Server to use what is in the cus schema? Am I missing a setting or do I have to go and create all of that extra content?
The correct answer is to always fully qualify objects with the schema, both when creating them and when querying them. Relying on defaults is asking for trouble.
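For example, rewriting one of the test functions from the question with every reference schema-qualified (a sketch using the function names from the experiment above):

CREATE FUNCTION [cus].[Function_2_1](@Co int)
RETURNS TABLE
AS RETURN
(
    Select 'cus' as sche, 'Function_2_1' as fun, @Co as passed
    union all
    Select * from cus.Function_2_1_1(1)  -- explicit schema: no silent fallback to dbo
)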
The response from Erland Sommarskog, located here:
https://social.msdn.microsoft.com/Forums/en-US/73678fa7-0b9c-4780-ae16-197bd30aba3a/i-have-a-default-schema-that-is-not-dbo-why-is-the-content-in-that-schema-not-always-consumed?forum=sqlsecurity
answered my question as to this behavior: "once you are inside a module, it is the module that determines the default schema of unqualified objects, never the default schema of the current user."

Creating view with nested 'no-lock' on SQL Server

Here is the scenario: I have a database with about 500K new records every day. The database is almost never updated (only insert and delete statements).
Many users would like to run queries against the database with tools such as Power BI, but I haven't given anyone access, to prevent deadlocking (I only allow a specific IT-managed resource to access the data).
I would like to open up data access, but I must prevent anyone from blocking new record insertions.
Could I create a view with nested NOLOCK hints inside it, assuming no dirty reads are possible since no updates are performed?
Would that be an acceptable design? I know it's not a perfect solution and it's not meant to be.
It's a compromise to allow users with no SQL skills to perform ad-hoc queries and lookups.
Anything I might be missing?
I think you can use WITH (NOLOCK) after the table name in the query, such as:
SELECT * FROM [table name] WITH (NOLOCK)
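And since the question was about a view, a sketch of what that could look like (the table and column names are illustrative, not from the question):

-- A read-only view that applies NOLOCK to every table it touches,
-- so reporting users inherit the hint without having to remember it.
CREATE VIEW dbo.vw_RecordsForReporting
AS
SELECT r.Id, r.CreatedAt, c.Name AS CategoryName
FROM dbo.Records r WITH (NOLOCK)
INNER JOIN dbo.Categories c WITH (NOLOCK)
    ON c.Id = r.CategoryId;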

Use schema name in a JOIN in Redshift

Our database is set up so that each of our clients is hosted in a separate schema (the organizational level above a table in Postgres/Redshift, not the database structure definition). We have a table in the public schema that has metadata about our clients. I want to use some of this metadata in a view I am creating.
Say I have 2 tables:
public.clients
    name_of_schema_for_client
    metadata_of_client
client_name.usage_info
    (whatever columns; they aren't that important here)
I basically want to get the metadata for the client I'm running my query on and use it later:
SELECT *
FROM client_name.usage_info
INNER JOIN public.clients
ON CURRENT_SCHEMA() = public.clients.name_of_schema_for_client
This is not possible because CURRENT_SCHEMA() is a leader-node-only function: a query that calls it returns an error if it also references a user-created table, an STL or STV system table, or an SVV or SVL system view (see https://docs.aws.amazon.com/redshift/latest/dg/r_CURRENT_SCHEMA.html).
Is there another way to do this? Or am I just barking up the wrong tree?
Your best bet is probably to just manually set the search path within the transaction from whatever source you call this from. See this:
https://docs.aws.amazon.com/redshift/latest/dg/r_search_path.html
Let's say you only want to use the table matching your best client:
set search_path to your_best_clients_schema, whatever_other_schemas_you_need_for_this;
Then you can just do:
select * from clients;
This will match the first clients table available on the search path, which, by coincidence, you just set to your client's schema!
You can manually revert afterwards if need be, or just reset the connection to return to the default; up to you.
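Putting it together for the original query (the schema name client_acme is illustrative):

-- Set per session/transaction, then query unqualified names.
SET search_path TO client_acme, public;

SELECT u.*, c.metadata_of_client
FROM usage_info u                 -- resolves to client_acme.usage_info
JOIN public.clients c
    ON c.name_of_schema_for_client = 'client_acme';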

How to repeat multi-step schema change (ETL schema changes?)

I'm new to being a DBA and not much of a SQL person, so please be gentle.
I'd like to restructure a database, which requires adding new columns, tables, and relationships, followed by removing old tables, columns, and relationships. A three-step process seems to be in order.
Change schema to add new stuff
Run SSIS to hook up new data using some of the old data.
Change schema to drop old stuff.
I'm using a SQL database project in VS 2015 to maintain the schema, and using schema compare to update the DB schema. I'd like to make it repeatable or automatic, if possible, so I can test the flow on a non-production database: change schema -> run ETL -> change schema. Is there a way to apply schema changes from within the ETL, or does this require manual operations? Is there a way to store two schemas in files and then apply them, other than VS publish or compare?
There is an Execute SQL Task in SSIS that allows you to do what you want: alter the table (to add columns), move the data from the old columns to the new ones, then drop the old columns.
1) ALTER TABLE tableA ADD column ...
2) UPDATE tableA SET ...
3) ALTER TABLE tableA DROP COLUMN ...
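Spelled out with illustrative names (dbo.Customers and its columns are assumptions, not from the question), with one Execute SQL Task per step so the new column is visible before the UPDATE runs:

-- Step 1: add the new column.
ALTER TABLE dbo.Customers ADD FullName nvarchar(200) NULL;

-- Step 2 (separate task/batch): populate it from the old columns.
UPDATE dbo.Customers SET FullName = FirstName + N' ' + LastName;

-- Step 3: drop the old columns once the data is verified.
ALTER TABLE dbo.Customers DROP COLUMN FirstName, LastName;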
Please test your code carefully before running it.
It worked! Here is an example of the ETL. Note that it's important to set DelayValidation to True for the data flows, and to disable ValidateExternalMetadata for some of the operations within the data flows, because the database is not static.

Run SQL queries under a particular schema context

We are thinking about creating a new schema, with its own 3 tables, which will be created on the fly for each individual customer.
To run a particular query against those tables in a procedure, should we have something like this?
declare @sName nvarchar(200);
select @sName = Schema_Name from [Schema] where Schema_Id = @passed_id_from_code;

-- ALTER USER will not accept a variable directly, so this needs dynamic SQL:
exec('ALTER USER UserName WITH DEFAULT_SCHEMA = ' + @sName);
-- Run the statements here --
...
-- After finishing executing statements
exec('ALTER USER UserName WITH DEFAULT_SCHEMA = db');
In this scenario, can concurrent customers from various schemas update their own schema's tables, or will it conflict?
Your suggestions are welcome.
Anil
Most SQL databases create each table as a unique entity in that database. That means each table can be modified and altered individually, with no relation to the other tables. CUSTOMERA.TABLE_ONE is a different object in the database than CUSTOMERB.TABLE_ONE. They share the same name, but they are not the same object, and they can have different layouts (as they live in different schemas).
So unless there is some restriction in your RDBMS, you can do this. That said, having a different schema for each customer may not be good. If you are developing the same app to handle several customers, you have to make sure it works with all schemas and all customers, potentially across different versions of the schema.
If you are going for a multi-tenant architecture, it may be wiser to use some kind of extension to the table. So you have a single DB.TABLE_ONE with a CUSTOMER_DATA column where you put data in a known and flexible format (say JSON or XML). Some RDBMSs have that as a native feature (I believe DB2 is one of them).
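A sketch of that shape (table and column names are illustrative):

-- One shared table for all tenants; per-customer extras go in a flexible column.
CREATE TABLE DB.TABLE_ONE (
    id            int            NOT NULL PRIMARY KEY,
    customer_id   int            NOT NULL,
    common_col    nvarchar(100)  NULL,
    customer_data nvarchar(max)  NULL   -- JSON/XML blob for customer-specific fields
);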
Hope this helps.