What is the source column of each column in a view - SQL

How can I find, for each column in a view or procedure, its source column?
For example,
I have this table:
CREATE TABLE [dbo].[tbl_Address](
[iAddressId] [int] IDENTITY(1,1) NOT NULL,
[nvStreet] [nvarchar](50) NULL,
[iHouseNum] [int] NULL,
[nvEnterance] [nvarchar](10) NULL,
[iFloor] [int] NULL,
[nvNeighberwood] [nvarchar](20) NULL,
[nvCity] [nvarchar](20) NULL,
[nvNote] [nvarchar](50) NULL,
CONSTRAINT [PK_tbl_Address] PRIMARY KEY CLUSTERED
(
[iAddressId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
And this view:
CREATE VIEW ViewALIAS
AS
SELECT a.nvStreet myStreet,a.nvCity manyStreeats FROM tbl_Address a
The required answer:
nameObject | nameColumn | SourceObject | SourceColumn
-----------------------------------------------------
tbl_Address| nvStreet | ViewALIAS | myStreet
tbl_Address| nvCity | ViewALIAS | manyStreeats
I know how to produce the columns nameObject, nameColumn, SourceObject,
or the table nameObject, SourceObject, SourceColumn,
but I do not know how to match the two.

I think that you can't get exactly what you want (see the explanation below), but you can get an approximation. First, you can get all the columns of the view using:
select name, column_id from sys.columns where object_id=object_id('ViewALIAS')
and then you can get the columns referenced by the view using column dependencies. Note that this includes all the columns used in any part of the query, not only the columns in the select list (for example, it would list a column used in a WHERE condition):
select dependencies.referenced_entity_name as referenced_name,
       entities.referenced_minor_id as column_id,
       entities.referenced_minor_name as column_name
from sys.sql_expression_dependencies as dependencies
join sys.objects as objects on objects.object_id = dependencies.referencing_id
join sys.schemas as schemas on schemas.schema_id = objects.schema_id
cross apply sys.dm_sql_referenced_entities(schemas.name + '.' + objects.name, 'OBJECT') as entities
where entities.referenced_entity_name = dependencies.referenced_entity_name
  and (dependencies.is_schema_bound_reference = 0 or entities.referenced_minor_id = dependencies.referenced_minor_id)
  and entities.referenced_minor_id != 0
  and dependencies.referencing_id = object_id('ViewALIAS')
You can't link the two results because, as far as I know, there is no way in SQL Server to get the relation between view columns and referenced columns from the object catalog views. I don't know the exact reason, but I'm quite sure the primary cause is that a view column can be almost anything, so it's not always possible to match it to a table column. A few examples:
CREATE VIEW ExampleView AS
SELECT ID, -- table column
(Value1+Value2)*Value3 as Total, -- multiple columns
12345 as Value, -- literal value
getdate() as CurrentDate, -- scalar function
dbo.fnMyFunction(Value1,Value2) as ScalarColumn, -- scalar function with parameters
TableFunctionColumn, -- column from a table function (which can be almost anything)
CASE WHEN Value1<10000 THEN Value2
WHEN Value1>20000 THEN Value3
ELSE Value2+Value3 END AS CaseColumn, -- different columns based on another value
(SELECT SUM(OtherValue) FROM OtherTable
WHERE OtherID=ID GROUP BY OtherValue) AS SubqueryColumn -- subquery result
FROM ExampleTable
CROSS APPLY dbo.fnTableFunction(ID)
I think the CASE column is the most representative example, because the table columns used by the view column depend on the value of another column that is not used directly for the column value.
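That said, for views that do nothing but alias base columns one-to-one, with the select list following the base table's column order, the two result sets above can be paired positionally. This is only a heuristic of mine, not a supported mapping, and it breaks for every example above; a sketch:
-- Heuristic only: pair view columns with referenced base columns by ordinal.
-- Valid solely for simple one-to-one alias views such as ViewALIAS.
WITH view_cols AS (
    SELECT name, column_id
    FROM sys.columns
    WHERE object_id = object_id('ViewALIAS')
),
ref_cols AS (
    SELECT referenced_entity_name, referenced_minor_name,
           ROW_NUMBER() OVER (ORDER BY referenced_minor_id) AS rn
    FROM sys.dm_sql_referenced_entities('dbo.ViewALIAS', 'OBJECT')
    WHERE referenced_minor_id <> 0
)
SELECT r.referenced_entity_name AS nameObject,
       r.referenced_minor_name  AS nameColumn,
       'ViewALIAS'              AS SourceObject,
       v.name                   AS SourceColumn
FROM view_cols AS v
JOIN ref_cols AS r ON r.rn = v.column_id;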

How to copy table data and structure with an identity column and its values to another table in the same DB (just changing the table name)

I have trouble copying a table's data and structure to another table, because I want to keep the identity ID column and its original values instead of restarting from 1.
I use the SQL below to insert all the data, including the ID column, from MY_TABLE into MY_TABLE_NEW, but it fails with an error saying:
An explicit value for the identity column in table 'My_TABLE_NEW' can only be specified when a column list is used and IDENTITY_INSERT is ON.
But I have already set IDENTITY_INSERT ON, as in the SQL below:
IF NOT EXISTS (select * from sys.objects where name = 'My_TABLE_NEW')
BEGIN
CREATE TABLE [dbo].[My_TABLE_NEW]
(
[ID] [int] IDENTITY(1,1) NOT NULL,
[OBJECT_ID] [int] NOT NULL,
[YEAR_MONTH] [int] NOT NULL,
CONSTRAINT [PK_My_TABLE_NEW]
PRIMARY KEY CLUSTERED ([ID] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
END
GO
IF EXISTS (SELECT * FROM sys.objects WHERE name = 'My_TABLE_NEW')
BEGIN
SET IDENTITY_INSERT My_TABLE_NEW ON
INSERT INTO My_TABLE_NEW
SELECT [ID]
,[OBJECT_ID]
,[YEAR_MONTH]
FROM My_TABLE
SET IDENTITY_INSERT My_TABLE_NEW OFF
END
GO
What is the problem?
Try your insert with the column names:
INSERT INTO My_TABLE_NEW ([ID], [OBJECT_ID], [YEAR_MONTH])
SELECT [ID]
,[OBJECT_ID]
,[YEAR_MONTH]
FROM My_TABLE
From the documentation:
When an existing identity column is selected into a new table, the new column inherits the IDENTITY property, unless one of the following conditions is true:
The SELECT statement contains a join, GROUP BY clause, or aggregate function.
Multiple SELECT statements are joined by using UNION.
The identity column is listed more than one time in the select list.
The identity column is part of an expression.
The identity column is from a remote data source.
That means you can copy the table with SELECT INTO while retaining the identity column and can just add the PK after.
SELECT *
INTO My_TABLE_NEW
FROM My_TABLE
Here is a demo with fiddle.
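For completeness, a minimal sketch of the whole copy, re-adding the primary key afterwards (the constraint name here is my own choice, not from the original post):
-- Copy data and structure; the IDENTITY property on ID carries over
-- because none of the documented exclusions apply to this SELECT.
SELECT *
INTO My_TABLE_NEW
FROM My_TABLE;
-- SELECT INTO copies no constraints, so re-add the primary key.
ALTER TABLE My_TABLE_NEW
ADD CONSTRAINT PK_My_TABLE_NEW PRIMARY KEY CLUSTERED (ID);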
You can use the built-in procedure sp_rename for this, as long as you are just renaming the table, not trying to create a copy of it.
EXEC sp_rename 'My_TABLE', 'My_TABLE_NEW'
GO
https://learn.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-rename-transact-sql?view=sql-server-ver15
If you want to create a copy, then you just have to do the following:
SELECT *
INTO My_TABLE_NEW
FROM My_TABLE
Note: If you do this, you will have to re-add any key constraints, computed value columns, etc.

Postgres Import from different table

I'm still fairly new to Postgres. I have a table named university_table with the fields: name,
nationality, abbreviation, adjective, person.
I found this SQL query to insert data at: https://stackoverflow.com/a/21759321/9469766
A snippet of the query is below.
How can I alter the query to insert these values into my university_country table?
-- Create and load Nationality Table - English
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Nationality]') AND type in (N'U'))
DROP TABLE [dbo].[Nationality]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-------------------------------------------------------------------
-- TABLE: [dbo].[Nationality]
-- Creation Date: 02/12/2014
-- Created by: Dan Flynn, Sr. DBA
--
-------------------------------------------------------------------
CREATE TABLE [dbo].[Nationality]
(
[NationalityID] [int] IDENTITY(1,1) NOT NULL,
[Country] [nvarchar](50) NULL,
[Abbreviation] [nvarchar](5) NULL,
[Adjective] [nvarchar] (130) NULL,
[Person] [nvarchar] (60) NULL
) ON [PRIMARY]
GO
-------------------------------------------------------------------------------
-- INSERT VALUES
-------------------------------------------------------------------------------
INSERT INTO [dbo].[Nationality](Country, Abbreviation, Adjective, Person )
VALUES ( 'AMERICAN - USA','US','US (used attributively only, as in US aggression but not He is US)','a US citizen' ),
( 'ARGENTINA','AR','Argentinian','an Argentinian' ),
( 'AUSTRALIA','AU','Australian','an Australian' ),
( 'BAHAMAS','BS','Bahamian','a Bahamian' ),
( 'BELGIUM','BE','Belgian','a Belgian' ),
GO
-------------------------------------------------------------------------------
-- ADD CLUSTERED INDEX
-------------------------------------------------------------------------------
CREATE CLUSTERED INDEX [idxNationality] ON [dbo].[Nationality]
(
[NationalityID] ASC,
[Country] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
EXEC sys.sp_addextendedproperty @name=N'TableDiscription', @value=N'CreatedBy: Dan Flynn, Sr. SQL Server DBA
CreationDate: 02/12/2014
Nationality table contains five columns, i.e.:
1. NationalityID, 2. Country, 3. Abbreviation, 4. Adjective, 5. Person
IDs 1 to 34 are alphabetical countries that are statistically the most popular as far as interaction with the United States. IDs 35 to 248 are also alphabetical for the rest of the countries.
' , @level0type=N'SCHEMA',@level0name=N'dbo', @level1type=N'TABLE',@level1name=N'Nationality'
GO
To convert T-SQL to be compatible with Postgres' SQL dialect you can use the following steps.
Remove all square brackets (they are illegal in SQL identifiers). If you have identifiers that require quoting, use double quotes ("), but I would highly recommend avoiding quoted identifiers completely (so never use " in SQL).
Remove all GO separators and end each statement with ; (something that is recommended for SQL Server as well).
Remove the [dbo]. schema prefix if you didn't create that schema in Postgres (you typically don't).
Remove the ON [PRIMARY] option; it's not needed in Postgres (the equivalent would be to define a tablespace, but that's hardly ever needed in Postgres).
There is no IF in plain SQL (or Postgres); to conditionally drop a table use DROP TABLE IF EXISTS ....
There are no clustered indexes in Postgres, so just make that a regular index and remove all the options introduced by the WITH keyword.
Comments on tables are defined through COMMENT ON, not by calling a stored procedure.
IDENTITY(x,y) needs to be replaced with the standard SQL GENERATED ALWAYS AS IDENTITY.
There is no nvarchar type; just make everything varchar and make sure your database was created with an encoding that can store multi-byte characters (by default it's UTF-8, so that should be fine).
Not required, but: it's highly recommended to use snake_case identifiers rather than CamelCase in Postgres.
Putting that all together the script should be something like this:
DROP TABLE IF EXISTS Nationality CASCADE;
CREATE TABLE nationality
(
Nationality_id int generated always as IDENTITY NOT NULL,
Country varchar(50) NULL,
Abbreviation varchar(5) NULL,
Adjective varchar (130) NULL,
Person varchar (60) NULL
);
INSERT INTO Nationality (Country, Abbreviation, Adjective, Person )
VALUES ( 'AMERICAN - USA','US','US (used attributively only, as in US aggression but not He is US)','a US citizen' ),
( 'ARGENTINA','AR','Argentinian','an Argentinian' ),
( 'AUSTRALIA','AU','Australian','an Australian' ),
( 'BAHAMAS','BS','Bahamian','a Bahamian' ),
( 'BELGIUM','BE','Belgian','a Belgian' );
CREATE INDEX idx_Nationality ON Nationality
(
Nationality_ID ASC,
Country ASC
);
comment on table nationality is 'CreatedBy: Dan Flynn, Sr. SQL Server DBA
CreationDate: 02/12/2014
Nationality table contains five columns, i.e.:
1. NationalityID, 2. Country, 3. Abbreviation, 4. Adjective, 5. Person
IDs 1 to 34 are alphabetical countries that are statistically the most popular as far as interaction with the United States. IDs 35 to 248 are also alphabetical for the rest of the countries.
';
I am a bit surprised that there is no primary key defined. You probably want to add:
alter table nationality
add primary key (nationality_id);
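Coming back to the original question of loading these values into your own table: a minimal sketch, assuming university_country has the columns listed in the question (the question also lists a nationality field, but how it maps to the source data isn't clear, so it is omitted here; adjust to your actual schema):
-- Hypothetical column mapping: name <- Country, plus the three matching fields.
INSERT INTO university_country (name, abbreviation, adjective, person)
VALUES ('ARGENTINA', 'AR', 'Argentinian', 'an Argentinian'),
       ('AUSTRALIA', 'AU', 'Australian', 'an Australian'),
       ('BAHAMAS', 'BS', 'Bahamian', 'a Bahamian'),
       ('BELGIUM', 'BE', 'Belgian', 'a Belgian');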
There is an SQL standard, but nobody implements it fully. Some database systems like PostgreSQL are better at sticking to the standard, others like Microsoft SQL Server are not.
The upshot of this is that you cannot take SQL that works on one RDBMS and use it with another one. You will have to translate that to the PostgreSQL dialect.

When I retrieve a non-indexed column from a large table, SQL Server looks up the PK

I have a huge table which has millions of rows:
create table BigTable
(
id bigint IDENTITY(1,1) NOT NULL,
value float not null,
vid char(1) NOT NULL,
dnf char(4) NOT NULL,
rbm char(6) NOT NULL,
cnt char(2) not null,
prs char(1) not null,
...
PRIMARY KEY CLUSTERED (id ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
)
There is a non-clustered unique index on this table which is on the following columns:
vid, dnf, rbm, cnt, prs
Now I would like to get some data out of this huge table using only part of that index:
insert into #t
select b.id, b.rbm, b.value
from
(select distinct rbm from #t1) a
join
BigTable b on b.vid = '1'
and b.dnf = '1234'
and b.rbm = a.rbm
where:
create table #t
(
id integer primary key,
rbm varchar(8),
val float null
)
create index tmp_idx_t on #t(rbm)
create table #t1 (rbm char(6) not null)
If I don't include the value column in the query, the execution plan shows no PK lookup on BigTable. But if I include value in the result set, there is a PK lookup on BigTable, and the output list of that lookup is the very column value.
But I am not using the PK anywhere in my query, and the lookup makes it take a lot of time to finish. Is there a way to stop that PK lookup? Or am I writing the SQL wrong?
I know the BigTable is not a great design but I can't change it.
In order to avoid the PK index lookup you'll need to include all the columns (filtering predicate AND result set) in the index. For example:
create index ix1 on BigTable (dnf, vid, rbm) include (value);
This index only uses the first three columns (dnf, vid, rbm) for searching, and adds the value column as data. As pointed out by @DanGusman, the id column is always present in a SQL Server secondary index. This way the index has all the data it needs to resolve the query.
Alternatively, you could use the old way:
create index ix1 on BigTable (dnf, vid, rbm, value);
This will also work, but will generate a heavier index. That being said, this heavier index may also be of use for other queries.
Your index is lacking the value column, so in order to resolve the query the engine needs to go to the data pages. I would advise you to include that column in the index, so the index covers the query.
I am thinking that you might try writing the query as:
select b.id, b.rbm, b.value
from BigTable b
where b.vid = '1' and
b.dnf = '1234' and
exists (select 1 from #t1 t1 where t1.rbm = b.rbm);
This might encourage SQL Server to use a more optimal execution plan, even when id is not in the index.

SQL efficiency rows or tables

I'm creating a database that holds yield values of electric engines. The yield values are stored in an Excel file which I have to transfer to the database. Each test for an engine has 42 rows (torque) and 42 columns (power in kW), with the values stored in these cells:
 (kw)       1.0    1.2   ...(42 columns)
(rpm) 2000  76.2   77.0
      2100  76.7   77.6
      ...
(42 rows)
Well, I thought of creating a column for engine_id, a column for test_id (each engine can have more than one test), and 42 columns for the corresponding yield values. For each test I would have to add 42 rows for one single engine. This seems neither efficient nor easy to implement to me.
If there are 42 records (rows) for a single engine, in a matter of time the database will hold several thousand rows, and searching for a specific engine with the corresponding values will be an exhausting task.
If I made a separate table for each test of a specific engine, again after some time I would probably have thousands of tables. So what should I go for: a table with thousands of records, or a table with 42 columns and 42 rows? Either way, I still have redundant records.
A database is definitely the answer: searching through many millions, or hundreds of millions, of rows is pretty easy once you get the hang of SQL (the language for interacting with databases). I would recommend a table structure of
EngineId, TestId, TourqueId, PowerId, YieldValue
Which would have values...
Engine1, Test1, 2000, 1.0, 73.2
So only 5 columns. This will give you the flexibility to add more yield results in the future should it be required (and even if it's not, it's just an easier schema anyway). You will need to learn SQL, however, to realise the power of the database over a spreadsheet. Also, there are many techniques for importing Excel data to SQL, so you should investigate that (Google it; one possible route is sketched below). If you find you are transferring all that data by hand then you are doing something wrong (well, not wrong really, but inefficient!).
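For example, one common route (my assumption, not something from the original answer) is to save the Excel sheet as CSV and load it with BULK INSERT; the file path is hypothetical:
-- Assumes the sheet was exported as CSV with columns matching the table.
BULK INSERT EngineTestResults
FROM 'C:\data\engine_test_results.csv'
WITH (
    FIELDTERMINATOR = ',',  -- column separator in the CSV
    ROWTERMINATOR = '\n',   -- one row per line
    FIRSTROW = 2            -- skip the header row
);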
Further to your comments, here is the exact schema with an index (in MS SQL Server):
CREATE TABLE [dbo].[EngineTestResults](
[EngineId] [varchar](50) NOT NULL,
[TestId] [varchar](50) NOT NULL,
[Tourque] [int] NOT NULL,
[Power] [decimal](18, 4) NOT NULL,
[Yield] [decimal](18, 4) NOT NULL,
CONSTRAINT [PK_EngineTestResults] PRIMARY KEY CLUSTERED
(
[EngineId] ASC,
[TestId] ASC,
[Tourque] ASC,
[Power] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Index [IX_EngineTestResults] Script Date: 01/14/2012 14:26:21 ******/
CREATE NONCLUSTERED INDEX [IX_EngineTestResults] ON [dbo].[EngineTestResults]
(
[EngineId] ASC,
[TestId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
So note that there is no incrementing primary key: the key is (EngineId, TestId, Tourque, Power). To get the results for a particular engine you would run a query like the following:
Select * from EngineTestResults where engineId = 'EngineABC' and TestId = 'TestA'
Note that I have added an index for that set of criteria.
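And if you ever need the original torque-by-power grid back, the rows can be pivoted. A minimal sketch (not part of the original answer): only two of the 42 power values are shown, and MAX is used only because PIVOT requires an aggregate.
-- Rebuild part of the torque x power grid for one test.
-- Extend the IN lists to cover all 42 power values.
SELECT Tourque, [1.0], [1.2]
FROM (
    SELECT Tourque, Power, Yield
    FROM EngineTestResults
    WHERE EngineId = 'EngineABC' AND TestId = 'TestA'
) AS src
PIVOT (MAX(Yield) FOR Power IN ([1.0], [1.2])) AS p
ORDER BY Tourque;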
The strength of a relational database is the ability to normalize data across multiple tables, so you could have one table for engines, one for tests and one for results. Something like the following:
CREATE TABLE tbl__engines (
`engine_id` SMALLINT UNSIGNED NOT NULL,
`name` VARCHAR(255) NOT NULL,
PRIMARY KEY(engine_id)
);
CREATE TABLE tbl__tests (
`test_id` INT UNSIGNED NOT NULL,
`engine_id` SMALLINT UNSIGNED NOT NULL,
PRIMARY KEY(test_id),
FOREIGN KEY(engine_id) REFERENCES tbl__engines(engine_id)
);
CREATE TABLE tbl__test_result (
`result_id` INT UNSIGNED NOT NULL,
`test_id` INT UNSIGNED NOT NULL,
`torque` INT NOT NULL,
`power` DECIMAL(6,2) NOT NULL,
`yield` DECIMAL(6,2) NOT NULL,
FOREIGN KEY(test_id) REFERENCES tbl__tests(test_id)
);
Then you can simply perform a join across these three tables to return the required results. Something like:
SELECT
*
FROM `tbl__engines` e
INNER JOIN `tbl__tests` t ON e.engine_id = t.engine_id
INNER JOIN `tbl__test_result` r ON r.test_id = t.test_id;

Massive CROSS JOIN in SQL Server 2005

I'm porting a process which creates a MASSIVE CROSS JOIN of two tables. The resulting table contains 15m records (it looks like the process makes a 30m-row cross join of a 2,600-row table and a 12,000-row table and then does some grouping which must cut it in half). The rows are relatively narrow: just 6 columns. It's been running for 5 hours with no sign of completion. I only just noticed the count discrepancy between the known-good output and what I would expect from the cross join, so my output doesn't yet include the grouping or deduping which will halve the final table, but even so it seems like it's not going to complete any time soon.
First I'm going to look to eliminate this table from the process if at all possible - obviously it could be replaced by joining to both tables individually, but right now I do not have visibility into everywhere else it is used.
But given that the existing process does it (in less time, on a less powerful machine, using the FOCUS language), are there any options for improving the performance of large CROSS JOINs in SQL Server 2005? (Hardware is not really an option; this box is a 64-bit 8-way with 32 GB of RAM.)
Details:
It's written this way in FOCUS (I'm trying to produce the same output, which is a CROSS JOIN in SQL):
JOIN CLEAR *
DEFINE FILE COSTCENT
WBLANK/A1 = ' ';
END
TABLE FILE COSTCENT
BY WBLANK BY CC_COSTCENT
ON TABLE HOLD AS TEMPCC FORMAT FOCUS
END
DEFINE FILE JOINGLAC
WBLANK/A1 = ' ';
END
TABLE FILE JOINGLAC
BY WBLANK BY ACCOUNT_NO BY LI_LNTM
ON TABLE HOLD AS TEMPAC FORMAT FOCUS INDEX WBLANK
JOIN CLEAR *
JOIN WBLANK IN TEMPCC TO ALL WBLANK IN TEMPAC
DEFINE FILE TEMPCC
CA_JCCAC/A16=EDIT(CC_COSTCENT)|EDIT(ACCOUNT_NO);
END
TABLE FILE TEMPCC
BY CA_JCCAC BY CC_COSTCENT AS COST CENTER BY ACCOUNT_NO
BY LI_LNTM
ON TABLE HOLD AS TEMPCCAC
END
So the required output really is a CROSS JOIN (it's joining a blank column from each side).
In SQL:
CREATE TABLE [COSTCENT](
[COST_CTR_NUM] [int] NOT NULL,
[CC_CNM] [varchar](40) NULL,
[CC_DEPT] [varchar](7) NULL,
[CC_ALSRC] [varchar](6) NULL,
[CC_HIER_CODE] [varchar](20) NULL,
CONSTRAINT [PK_LOOKUP_GL_COST_CTR] PRIMARY KEY NONCLUSTERED
(
[COST_CTR_NUM] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY
= OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [JOINGLAC](
[ACCOUNT_NO] [int] NULL,
[LI_LNTM] [int] NULL,
[PR_PRODUCT] [varchar](5) NULL,
[PR_GROUP] [varchar](1) NULL,
[AC_NAME_LONG] [varchar](40) NULL,
[LI_NM_LONG] [varchar](30) NULL,
[LI_INC] [int] NULL,
[LI_MULT] [int] NULL,
[LI_ANLZ] [int] NULL,
[LI_TYPE] [varchar](2) NULL,
[PR_SORT] [varchar](2) NULL,
[PR_NM] [varchar](26) NULL,
[PZ_SORT] [varchar](2) NULL,
[PZNAME] [varchar](26) NULL,
[WANLZ] [varchar](3) NULL,
[OPMLNTM] [int] NULL,
[PS_GROUP] [varchar](5) NULL,
[PS_SORT] [varchar](2) NULL,
[PS_NAME] [varchar](26) NULL,
[PT_GROUP] [varchar](5) NULL,
[PT_SORT] [varchar](2) NULL,
[PT_NAME] [varchar](26) NULL
) ON [PRIMARY]
CREATE TABLE [JOINCCAC](
[CA_JCCAC] [varchar](16) NOT NULL,
[CA_COSTCENT] [int] NOT NULL,
[CA_GLACCOUNT] [int] NOT NULL,
[CA_LNTM] [int] NOT NULL,
[CA_UNIT] [varchar](6) NOT NULL,
CONSTRAINT [PK_JOINCCAC_KNOWN_GOOD] PRIMARY KEY CLUSTERED
(
[CA_JCCAC] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY
= OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
With the SQL Code:
INSERT INTO [JOINCCAC]
(
[CA_JCCAC]
,[CA_COSTCENT]
,[CA_GLACCOUNT]
,[CA_LNTM]
,[CA_UNIT]
)
SELECT Util.PADLEFT(CONVERT(varchar, CC.COST_CTR_NUM), '0', 7)
+ Util.PADLEFT(CONVERT(varchar, GL.ACCOUNT_NO), '0', 9) AS CA_JCCAC
,CC.COST_CTR_NUM AS CA_COSTCENT
,GL.ACCOUNT_NO % 900000000 AS CA_GLACCOUNT
,GL.LI_LNTM AS CA_LNTM
,dbo.udf_BUPDEF(GL.ACCOUNT_NO, CC.COST_CTR_NUM, GL.LI_LNTM, 'N') AS CA_UNIT
FROM JOINGLAC AS GL
CROSS JOIN COSTCENT AS CC
Depending on how this table is subsequently used, it should be possible to eliminate it from the process by simply joining to both of the original tables used to build it. However, this is an extremely large porting effort, and I might not find the usage of the table for some time, so I was wondering if there are any tricks to CROSS JOINing big tables like that in a timely fashion (especially given that the existing FOCUS process manages to do it more quickly). That way I could validate the correctness of my replacement query and later factor it out with views or whatever.
I am also considering factoring out the UDFs and string manipulation and performing the CROSS JOIN first, to break the process up a bit; a sketch of that staged approach follows.
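A minimal sketch of that split (the #RawCross name is my own, and whether udf_BUPDEF can safely be deferred to a second pass is an assumption):
-- Pass 1: the bare cross join, with no functions applied.
SELECT CC.COST_CTR_NUM, GL.ACCOUNT_NO, GL.LI_LNTM
INTO #RawCross
FROM JOINGLAC AS GL
CROSS JOIN COSTCENT AS CC;
-- Pass 2: apply the string manipulation and the UDF to the staged rows.
INSERT INTO JOINCCAC (CA_JCCAC, CA_COSTCENT, CA_GLACCOUNT, CA_LNTM, CA_UNIT)
SELECT Util.PADLEFT(CONVERT(varchar, COST_CTR_NUM), '0', 7)
     + Util.PADLEFT(CONVERT(varchar, ACCOUNT_NO), '0', 9),
       COST_CTR_NUM,
       ACCOUNT_NO % 900000000,
       LI_LNTM,
       dbo.udf_BUPDEF(ACCOUNT_NO, COST_CTR_NUM, LI_LNTM, 'N')
FROM #RawCross;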
RESULTS SO FAR:
It turns out that the UDFs do contribute a lot (negatively) to the performance. But there also appears to be a big difference between a 15m row cross join and a 30m row cross join. I do not have SHOWPLAN rights (boo hoo), so I can't tell whether the plan it is using is better or worse after changing indexes. I have not refactored it yet, but am expecting the entire table to go away shortly.
Examining that query shows only one column used from one table, and only two columns used from the other table. Due to the very low numbers of columns used, this query can be easily enhanced with covering indexes:
CREATE INDEX COSTCENTCoverCross ON COSTCENT(COST_CTR_NUM)
CREATE INDEX JOINGLACCoverCross ON JOINGLAC(ACCOUNT_NO, LI_LNTM)
Here are my questions for further optimization:
When you put the query in Query Analyzer and whack the "show estimated execution plan" button, it will show a graphical representation of what it's going to do. (A text-mode alternative is sketched after this list.)
Join Type: There should be a nested loop join in there. (the other options are merge join and hash join). If you see nested loop, then ok. If you see merge join or hash join, let us know.
Order of table access: Go all the way to the top and scroll all the way to the right. The first step should be accessing a table. Which table is that and what method is used(index scan, clustered index scan)? What method is used to access the other table?
Parallelism: You should see the little jaggedy arrows on almost all icons in the plan indicating that parallelism is being used. If you don't see this, there is a major problem!
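If the graphical plan isn't available, the same information can be captured as text; a minimal sketch (note that SET SHOWPLAN_TEXT still requires SHOWPLAN permission, which the asker says they lack):
-- Return the estimated plan as text instead of executing the statement.
SET SHOWPLAN_TEXT ON;
GO
SELECT GL.ACCOUNT_NO, CC.COST_CTR_NUM
FROM JOINGLAC AS GL
CROSS JOIN COSTCENT AS CC;
GO
SET SHOWPLAN_TEXT OFF;
GO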
That udf_BUPDEF concerns me. Does it read from additional tables? Util.PADLEFT concerns me less, but still... what is it? If it isn't a plain database object, then consider using this instead:
RIGHT('z00000000000000000000000000' + columnName, 7)
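Applied to the query above, that idiom would look something like this (a sketch; the varchar lengths are my assumption based on the pad widths of 7 and 9):
-- Replaces the two Util.PADLEFT calls with built-in string padding.
SELECT RIGHT('0000000' + CONVERT(varchar(7), CC.COST_CTR_NUM), 7)
     + RIGHT('000000000' + CONVERT(varchar(9), GL.ACCOUNT_NO), 9) AS CA_JCCAC
FROM JOINGLAC AS GL
CROSS JOIN COSTCENT AS CC;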
Are there any triggers on JOINCCAC? How about indexes? With an insert this large, you'll want to drop all triggers and indexes on that table.
Continuing on what others are saying: DB functions that contain queries, when used in a select, have always made my queries extremely slow. Off the top of my head, I believe I had a query run in 45 seconds; then I removed the function, and the result was 0 seconds :)
So check that udf_BUPDEF is not doing any queries.
Break the query down to make it a plain, simple cross join.
SELECT CC.COST_CTR_NUM, GL.ACCOUNT_NO
,CC.COST_CTR_NUM AS CA_COSTCENT
,GL.ACCOUNT_NO AS CA_GLACCOUNT
,GL.LI_LNTM AS CA_LNTM
-- I don't know what BUPDEF is doing, but remove it from the query for the time being
-- ,udf_BUPDEF(GL.ACCOUNT_NO, CC.COST_CTR_NUM, GL.LI_LNTM, 'N') AS CA_UNIT
FROM JOINGLAC AS GL
CROSS JOIN COSTCENT AS CC
See how fast the simple cross join is (without any functions applied to it).