Use schema name in a JOIN in Redshift - sql

Our database is set up so that each of our clients is hosted in a separate schema (the organizational level above a table in Postgres/Redshift, not the database structure definition). We have a table in the public schema that has metadata about our clients. I want to use some of this metadata in a view I am creating.
Say I have 2 tables:
public.clients
    name_of_schema_for_client
    metadata_of_client
client_name.usage_info
    (whatever columns; this isn't that important)
I basically want to get the metadata for the client I'm running my query on and use it later:
SELECT *
FROM client_name.usage_info
INNER JOIN public.clients
ON CURRENT_SCHEMA() = public.clients.name_of_schema_for_client
This is not possible because CURRENT_SCHEMA() is a leader-node function. This function returns an error if it references a user-created table, an STL or STV system table, or an SVV or SVL system view. (see https://docs.aws.amazon.com/redshift/latest/dg/r_CURRENT_SCHEMA.html)
Is there another way to do this? Or am I just barking up the wrong tree?

Your best bet is probably to just manually set the search path within the transaction from whatever source you call this from. See this:
https://docs.aws.amazon.com/redshift/latest/dg/r_search_path.html
Let's say you only want to use the tables belonging to your best client:
set search_path to your_best_clients_schema, whatever_other_schemas_you_need_for_this;
Then you can just do:
select * from clients;
This resolves to the first clients table found along the search path, which is in the schema you just put first: your client's schema.
You can manually revert the search path afterwards if need be, or just reset the connection to return to the default. Up to you.
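If the calling code is an application rather than a SQL console, the statement can be built there. Here is a minimal Python sketch of that idea. The helper names (quote_ident, build_search_path_sql) and the schema names are made up for illustration, and it only builds the statement text; sending it to Redshift through your driver of choice is left out.

```python
def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_search_path_sql(schemas: list[str]) -> str:
    """Return a SET search_path statement listing the schemas in priority order."""
    if not schemas:
        raise ValueError("at least one schema is required")
    return "SET search_path TO " + ", ".join(quote_ident(s) for s in schemas) + ";"


# Hypothetical client schema first, then public so that public.clients still resolves.
statement = build_search_path_sql(["client_name", "public"])
print(statement)
```

Putting the client's schema first means unqualified table names resolve there before falling back to public.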

Related

If not exist clause SQL statement

I found this SQL query in a project I am taking over. This is the first time I have encountered this clause/statement. I understand that it checks whether the table exists before creating it, and that the argument to OBJECT_ID is the name of the table to be created.
My questions are:
Does sysobject mean the database?
What is the Object property?
I know that it is not the columns inside the table to be created.
The columns are : dtb_color_id and description.
Can someone explain this to me, please?
IF NOT EXISTS(SELECT * FROM SYSOBJECTS WHERE ID = OBJECT_ID('DTB_COLOR') AND OBJECTPROPERTY(ID,'ISUserTable') = 1)
BEGIN
.......some query I understand
END
sysobjects, OBJECTPROPERTY and OBJECT_ID are used in Microsoft SQL Server. They are part of the SQL Server system views and system functions used to query and manipulate metadata.
sys.sysobjects is simply the list of all objects (tables, views, SPs, functions, etc.) in the active database. Please note that sys.sysobjects is deprecated and is only available for backward compatibility. Use sys.objects instead.
https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/system-dynamic-management-views?view=sql-server-ver16
It has (as far as I know) no meaning in MySQL, unless somebody specifically created them.
You can also use INFORMATION_SCHEMA which is available in MySQL too (however slightly different in different RDBMS).
MSSQL INFORMATION_SCHEMA: https://learn.microsoft.com/en-us/sql/relational-databases/system-information-schema-views/system-information-schema-views-transact-sql?view=sql-server-ver16
MySQL INFORMATION_SCHEMA: https://dev.mysql.com/doc/refman/8.0/en/information-schema.html
SQL Server has no CREATE TABLE IF NOT EXISTS construct; a variation of the condition above is commonly used to imitate it.
This is a way in SQL Server to check if a table exists in the active database and to perform actions according to the result, like creating the table.
OBJECTPROPERTY simply checks (in this case) if the table is a user created one.
https://learn.microsoft.com/en-us/sql/t-sql/functions/objectproperty-transact-sql?view=sql-server-ver16
I would remove the OBJECTPROPERTY condition if the part you understand is a CREATE TABLE statement. You don't want to create a table with the same name as a system table/view, and you don't want to execute the CREATE TABLE if there is a VIEW with the same name (table creation will fail).
Not quite: sysobjects is not the database itself. It is a system view that lists the objects in the current database.
The OBJECTPROPERTY() function returns information about schema-scoped objects in the current database. Use this to check if an object is a table, view, stored procedure, etc. You can also use it to check if a table has a primary key, foreign key, foreign key reference, etc.
For more details : https://learn.microsoft.com/en-us/sql/t-sql/functions/objectpropertyex-transact-sql?view=sql-server-ver16
In this scenario it is used to check whether the object is a user table. The IsUserTable property returns 1 when the object is a user table and 0 otherwise.
Here the following steps are followed:
First, it executes the SELECT statement inside the IF NOT EXISTS.
If the SELECT returns no rows, the IF NOT EXISTS condition is TRUE.
It then runs the code inside the BEGIN ... END block.
DTB_COLOR is the name of the object being checked; given the IsUserTable test, it is expected to be a table.

Object Variable in a query

I am using SSDT 2017 and I am working on a solution that basically gets a full result set from a query into a variable (1 column only: AccountID), and I need to include the values in that object variable in a query, something like this:
"SELECT * FROM dbo.account WHERE AccountID IN (" + #AccountIDObjectVariable + ")"
I tried with an expression but I get an error, so I am not sure if there's a better way. I also tried a For Each Loop container, but since I have millions of records in the object variable I don't think that's the best way.
Any idea?
It doesn't work that way. Where "it" is going to be a host of things.
The SSIS data types are primitive types (boolean, date, numbers) or Object. The only supported operations for Object is a null check and enumeration.
SSIS parameterization is only for equality based substitutions. There is no concept of a list data type in SQL so there's no analog in SSIS.
I have millions of record in the object variable
Even if you converted your list to a string and used string concatenation, the next problem you're going to run into is the string length limit of 4000 characters.
What is the way?
Let's reset the problem: You have a non-trivial set of identities from a source system. That set of ids needs to be used as the basis for a subsequent extract.
Is the source of identities and the actual data on the same server
While you can empty the ocean with a teaspoon, it's not the correct tool. Same holds true here. Move the query that identifies the recordset to be extracted into a filter condition for your source.
i.e.
Load dataset into #AccountIDObjectVariable
SELECT
OA.AccountId
FROM
dbo.OutstandingAccount AS OA;
Extract that isn't working
"SELECT * FROM dbo.account WHERE AccountID IN (" + #AccountIDObjectVariable + ")"
is rewritten as
SELECT * FROM dbo.account AS A WHERE EXISTS (SELECT * FROM dbo.OutstandingAccount AS OA WHERE OA.AccountID = A.AccountID);
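To see what the EXISTS rewrite returns, here is a small Python sketch that uses an in-memory SQLite database as a stand-in for SQL Server. The sample rows and the Name column are made up for illustration; only the shape of the query mirrors the one above.

```python
import sqlite3

# In-memory SQLite stands in for the SQL Server database (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE account (AccountID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE OutstandingAccount (AccountID INTEGER NOT NULL);
    INSERT INTO account VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO OutstandingAccount VALUES (1), (3), (3);
    """
)

# The semi-join: keep an account row if at least one matching ID exists.
rows = conn.execute(
    """
    SELECT A.AccountID, A.Name
    FROM account AS A
    WHERE EXISTS (
        SELECT 1 FROM OutstandingAccount AS OA WHERE OA.AccountID = A.AccountID
    )
    ORDER BY A.AccountID
    """
).fetchall()
print(rows)
```

Note that account 3 comes back once even though its ID appears twice in OutstandingAccount. EXISTS only asks whether a match exists, so duplicates in the ID list do not multiply rows the way a plain join would.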
There are two reasonable approaches for solving this
Pull it all
If the source ids list and the source table are of similar orders of magnitude, it might be easier to just bring it all down and use the account id generating query in a Lookup Task. If AccountID exists, then it's the data you want. Yes you pulled more than you wanted but you likely would have burned more cycles and complexity trying to selectively pull what you wanted.
Push and pull
This approach is going to work for SQL Server and I have no idea about any other database. Well, I suppose Sybase would be the same given database paternity.
Open SSMS and create a global temporary table on the database where dbo.account lives. Do not disconnect from SSMS.
IF OBJECT_ID('tempdb..##SO_66961235') IS NOT NULL
BEGIN
DROP TABLE ##SO_66961235;
END
GO
CREATE TABLE ##SO_66961235
(
AccountID int NOT NULL
);
Modify the Connection manager to set the RetainSameConnection Property to true for the database connection to dbo.account
Execute SQL Task - Make Temp Table
Use the connection to the account database and the above query. This will ensure the table exists for future sessions of SSIS to work.
DataFlow Load IDs
In the dataflow properties, set DelayValidation to True
Use your source query to generate the list of IDs and select the temporary table as the destination. You might need to have a second connection manager to this system running and pointed at tempdb, it's been a long time since I've done this. Same rule about RetainSameConnection will hold true though.
When this data flow completes, then we will have a temporary table on the data source server that we can reference.
Dataflow 2 Get Data
Again, DelayValidation to true.
Source will be a query
SELECT * FROM dbo.account AS A WHERE EXISTS (SELECT * FROM ##SO_66961235 AS OA WHERE OA.AccountID = A.AccountID);
What's with all the delay validation?
When the SSIS package starts, the first thing it does is validate that all the pieces are in place for it to run successfully, and that the shape of the data is still the same. A temporary table won't exist when the package starts, so the package will fail with a VS_NEEDSNEWMETADATA error. Setting DelayValidation tells SSIS not to check a component's metadata until that component actually gets the signal to start. Since we defined the precursor Execute SQL Task to create the table, the validation should succeed.
I used global temporary tables here. You can use local scoped temporary tables but it makes the already fiddly design process much more so. Were it me, I'd have a package parameter controlling a boolean that uses a global temp table for development sessions and local temp table for actual run-time operations but that's beyond the scope of this question.

How to join a table which is in another database in postgres [duplicate]

I'm going to guess that the answer is "no" based on the below error message (and this Google result), but is there any way to perform a cross-database query using PostgreSQL?
databaseA=# select * from databaseB.public.someTableName;
ERROR: cross-database references are not implemented:
"databaseB.public.someTableName"
I'm working with some data that is partitioned across two databases although data is really shared between the two (userid columns in one database come from the users table in the other database). I have no idea why these are two separate databases instead of schema, but c'est la vie...
Note: As the original asker implied, if you are setting up two databases on the same machine you probably want to make two schemas instead - in that case you don't need anything special to query across them.
postgres_fdw
Use postgres_fdw (foreign data wrapper) to connect to tables in any Postgres database - local or remote.
Note that there are foreign data wrappers for other popular data sources. At this time, only postgres_fdw and file_fdw are part of the official Postgres distribution.
For Postgres versions before 9.3
Versions this old are no longer supported, but if you need to do this in a pre-2013 Postgres installation, there is a function called dblink.
I've never used it, but it is maintained and distributed with the rest of PostgreSQL. If you're using the version of PostgreSQL that came with your Linux distro, you might need to install a package called postgresql-contrib.
dblink() -- executes a query in a remote database
dblink executes a query (usually a SELECT, but it can be any SQL
statement that returns rows) in a remote database.
When two text arguments are given, the first one is first looked up as
a persistent connection's name; if found, the command is executed on
that connection. If not found, the first argument is treated as a
connection info string as for dblink_connect, and the indicated
connection is made just for the duration of this command.
Here is a good example:
SELECT *
FROM table1 tb1
LEFT JOIN (
SELECT *
FROM dblink('dbname=db2','SELECT id, code FROM table2')
AS remote(id int, code text) -- column list must match the remote query
) AS tb2 ON tb2.id = tb1.id; -- join on whichever columns relate the two tables
Note: I am giving this information for future reference.
I have run into this before and came to the same conclusion about cross-database queries as you. What I ended up doing was using schemas to divide the table space. That way I could keep the tables grouped but still query them all.
Just to add a bit more information.
There is no way to query a database other than the current one. Because PostgreSQL loads database-specific system catalogs, it is uncertain how a cross-database query should even behave.
contrib/dblink allows cross-database queries using function calls. Of course, a client can also make simultaneous connections to different databases and merge the results on the client side.
PostgreSQL FAQ
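The client-side merge mentioned in the FAQ can be sketched roughly like this in Python. Two in-memory SQLite databases stand in for the two PostgreSQL databases; the orders table and the sample rows are made up for illustration, while users/userid follow the question. With real PostgreSQL you would open two driver connections instead.

```python
import sqlite3

# Two independent in-memory databases stand in for databaseA and databaseB.
db_a = sqlite3.connect(":memory:")
db_b = sqlite3.connect(":memory:")

# databaseB owns the users table.
db_b.execute("CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT)")
db_b.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])

# databaseA holds rows that carry userid values from databaseB (hypothetical table).
db_a.execute("CREATE TABLE orders (orderid INTEGER PRIMARY KEY, userid INTEGER)")
db_a.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 2), (12, 1)])

# Query each database separately, then join the results in client memory.
users_by_id = dict(db_b.execute("SELECT userid, name FROM users"))
merged = [
    (orderid, users_by_id.get(userid))
    for orderid, userid in db_a.execute(
        "SELECT orderid, userid FROM orders ORDER BY orderid"
    )
]
print(merged)
```

This works well when one side is small enough to hold in memory; for large joins, dblink or postgres_fdw keeps the work on the server.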
Yes, you can, by using dblink (PostgreSQL only), DBI-Link (allows cross-database queries against other systems), and TDS-Link, which allows queries to be run against MS SQL Server.
I have used dblink and TDS-Link before with great success.
I have checked and tried to create a foreign key relationships between 2 tables in 2 different databases using both dblink and postgres_fdw but with no result.
Having read other people's feedback on this, for example here and here and in some other sources, it looks like there is no way to do that currently:
dblink and postgres_fdw do enable you to connect to and query tables in other databases, which is not possible with standard Postgres, but they do not allow you to establish foreign key relationships between tables in different databases.
If performance is important and most queries are read-only, I would suggest replicating the data over to the other database. While this seems like unneeded duplication of data, it might help if indexes are required.
This can be done with simple on insert triggers which in turn call dblink to update another copy. There are also full-blown replication options (like Slony) but that's off-topic.
see https://www.cybertec-postgresql.com/en/joining-data-from-multiple-postgres-databases/ [published 2017]
These days you also have the option to use https://prestodb.io/
You can run SQL on that PrestoDB node and it will distribute the SQL query as required. It can connect to the same node twice for different databases, or it might be connecting to different nodes on different hosts.
It does not support:
DELETE
ALTER TABLE
CREATE TABLE (CREATE TABLE AS is supported)
GRANT
REVOKE
SHOW GRANTS
SHOW ROLES
SHOW ROLE GRANTS
So you should only use it for SELECT and JOIN needs. Connect directly to each database for the above needs. (It looks like you can also INSERT or UPDATE which is nice)
Client applications connect to PrestoDB primarily using JDBC, but other types of connection are possible, including a Tableau-compatible web API.
This is an open source tool governed by the Linux Foundation and Presto Foundation.
The founding members of the Presto Foundation are: Facebook, Uber,
Twitter, and Alibaba.
The current members are: Facebook, Uber, Twitter, Alibaba, Alluxio,
Ahana, Upsolver, and Intel.
In case someone needs a more involved example on how to do cross-database queries, here's an example that cleans up the databasechangeloglock table on every database that has it:
CREATE EXTENSION IF NOT EXISTS dblink;
DO
$$
DECLARE database_name TEXT;
DECLARE conn_template TEXT;
DECLARE conn_string TEXT;
DECLARE table_exists Boolean;
BEGIN
conn_template = 'user=myuser password=mypass dbname=';
FOR database_name IN
SELECT datname FROM pg_database
WHERE datistemplate = false
LOOP
conn_string = conn_template || database_name;
table_exists = (select table_exists_ from dblink(conn_string, '(select Count(*) > 0 from information_schema.tables where table_name = ''databasechangeloglock'')') as (table_exists_ Boolean));
IF table_exists THEN
perform dblink_exec(conn_string, 'delete from databasechangeloglock');
END IF;
END LOOP;
END
$$

Postgres: Restructuring to Schemas

I have a Rails 3.2 multi-tenant subdomain based app which I'm trying to migrate over to PostgreSQL's schemas (each account getting its own schema -- right now all of the accounts use the same tables).
So, I'm thinking I need to:
Create a new DB
Create a schema for each Account (its id) and the tables under them
Grab all the data that belongs to each account and insert it into the new DB under the schema of said account
Does that sound correct? If so, what's a good way of doing that? Should I write a ruby script that uses ActiveRecord, plucks the data, then inserts it (pretty inefficient, but should get the job done) into the new DB? Or does Postgres provide good tools for doing such a thing?
EDIT:
As Craig recommended, I created schemas in the existing DB. I then looped through all of the Accounts in a Rake task, copying the data over with something like:
Account.all.each do |account|
PgTools.set_search_path account.id, false
sql = %{INSERT INTO tags SELECT DISTINCT "tags".* FROM "tags" INNER JOIN "taggings" ON "tags"."id" = "taggings"."tag_id" WHERE "taggings"."tagger_id" = #{admin.id} AND "taggings"."tagger_type" = 'User'}
ActiveRecord::Base.connection.execute sql
#more such commands
end
I'd do the conversion with SQL personally.
Create the new schemas in the same database as the current one for easy migration, because you can't easily query across databases with PostgreSQL.
Migrate the data using appropriate INSERT INTO ... SELECT queries. To do it without having to disable any foreign keys, you should build a dependency graph of your data. Copy the data into tables that depend on nothing first, then tables that depend on them, and so on.
You'll need to repeat this for each customer schema, so consider creating a PL/PgSQL function that uses EXECUTE ... dynamic SQL to:
Create the schema for a customer
Create the tables within the schema
Copy data in the correct order by looping over a hard-coded array of table names, doing:
EXECUTE 'INSERT INTO '||quote_ident(newschema)||'.'||quote_ident(tablename)||' SELECT * FROM oldschema.'||quote_ident(tablename)||' WHERE customer_id = '||quote_literal(customer_id)||';';
where newschema, tablename and customer_id are PL/PgSQL variables.
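If you drive the migration from an external script instead, the dependency ordering and the statement building can be sketched in Python roughly as below. The table names and the depends_on map are hypothetical, customer_id is assumed to be an integer, and the helper names are made up. The script only produces the statement text; executing it is up to your driver.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tables: each key maps to the tables it references via foreign keys.
depends_on = {
    "users": set(),
    "tags": set(),
    "taggings": {"tags", "users"},
}


def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def copy_statements(new_schema: str, customer_id: int) -> list[str]:
    """Build one INSERT ... SELECT per table, referenced tables first."""
    order = TopologicalSorter(depends_on).static_order()
    return [
        f"INSERT INTO {quote_ident(new_schema)}.{quote_ident(table)} "
        f"SELECT * FROM oldschema.{quote_ident(table)} "
        f"WHERE customer_id = {int(customer_id)};"
        for table in order
    ]


statements = copy_statements("acct_42", 42)
for statement in statements:
    print(statement)
```

Because taggings references both other tables, it always comes last, so no foreign keys need to be disabled during the copy.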
You can then invoke that function from SQL. While you could just do SELECT convert_customer(c.id) FROM customer c, I'd probably do it from an external control script so that each customer's work gets done and committed individually, avoiding the need to start again from scratch if the second-to-last customer conversion fails.
For bonus crazy points it's even possible to define triggers on the main customer schema's tables that replicate changes to already-migrated customers over to the copy of their data in the new schema, so they can keep using the system during the migration. I'd avoid that unless the migration was just too big to do without downtime, as it'd be a nightmare to test and you'd still need the triggers to throw an error on access by customer id x while the migration of x's data was actually in-progress, so it wouldn't be fully transparent.
If you're using different login users for different customers (strongly recommended) your function can also:
REVOKE rights on the schema from public
GRANT limited rights on the schema to the user(s) or role(s) who'll be using it
REVOKE rights on public from each table created
GRANT the desired limited rights on each table to the user(s) and role(s)
GRANT on any sequences used by those tables. This is required even if the sequence is created by a SERIAL pseudo-column.
That way all your permissions are consistent and you don't need to go and change them later. Remember that your webapp should never log in as a superuser.
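As a rough illustration of the permission steps, here is a Python sketch that generates the statements for one schema and one role. The schema and role names are hypothetical, the particular privileges granted are just one reasonable choice, and the ALL TABLES / ALL SEQUENCES IN SCHEMA forms assume PostgreSQL 9.0 or later.

```python
def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def permission_statements(schema: str, role: str) -> list[str]:
    """Return REVOKE/GRANT statements restricting a schema to a single role."""
    s, r = quote_ident(schema), quote_ident(role)
    return [
        f"REVOKE ALL ON SCHEMA {s} FROM PUBLIC;",
        f"GRANT USAGE ON SCHEMA {s} TO {r};",
        f"REVOKE ALL ON ALL TABLES IN SCHEMA {s} FROM PUBLIC;",
        f"GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA {s} TO {r};",
        # Sequences need their own grant, even when created by a SERIAL column.
        f"GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA {s} TO {r};",
    ]


# Hypothetical schema and role names.
grants = permission_statements("acct_42", "acct_42_app")
for statement in grants:
    print(statement)
```

Run these after the tables exist, since the IN SCHEMA forms only affect objects that are already there.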

Export/View data from a SQL Server temporary table

I have a temporary table, that isn't going away. I want to see what is in the table to determine what bad data might be in there. How can I view the data in the temporary table?
I can see it in tempdb. I ran
SELECT * FROM tempdb.dbo.sysobjects WHERE Name LIKE '#Return_Records%'
to get the name of the table.
I can see its columns and its object id in
select c.*
from tempdb.sys.columns c
inner join tempdb.sys.tables t ON c.object_id = t.object_id
where t.name like '#Return_Records%'
How can I get at the data?
By the way, this doesn't work
SELECT * FROM #Return_Records
One way of getting at the data in a low-level and not particularly easy to manipulate manner is to use the DBCC PAGE command as described in a blog post by Paul Randal:
http://blogs.msdn.com/sqlserverstorageengine/archive/2006/06/10/625659.aspx
You should be able to find the fileid and page number of the first page in the object by querying on sysindexes .. the last time I did this was on SQL Server 7.
If the data is in the database, then DBCC page will be able to dump it.
pjjH
SQL Server limits access to Local Temp Tables (#TableName) to the connection that created the table. Global temp tables (##TableName) can be accessible by other connections as long as the connection that created it is still connected.
Even though you can see the table in the table catalog, it is not accessible when trying to do a SELECT. It gives you an "Invalid Object Name" error.
There's no documented way of accessing the data in Local Temp Tables created by other connections. I think you may be out of luck in this case.
This seems like something you obviously tried, but since you didn't mention it I thought I would mention it just in case:
Did you try "SELECT * FROM #Return_Records"?
Like José Basilio says, that's a temporary table belonging to another connection. If it's been there for a long time, it must belong to a connection that has been open for a long time. Check Maintenance -> Activity Monitor; you can sort by Login Time.
Check if the Login Time, or Last Batch Time, matches with the create date of the temporary table. That can be retrieved with:
select crdate from tempdb.dbo.sysobjects WHERE Name LIKE '#Return_Records%'
You can shoot down suspect connections (right click and Kill Process.) If the table is gone after killing a process, you've found the culprit.
To just remove the table, restart the SQL Server service. You can attach SQL Profiler right after with a filter to start looking for the connection that creates the temporary table.