How can I delete a partition function in SQL - sql

So I have an ETL that stores 3 years: '17 (corrupt), '18 (corrupt), '19:
STG_tables: import data from 3 different DBs and export it to
DWH_tables: this is the relational phase where all the historical information is stored. Here only the normalization and parameterization of the tables and fields are carried out to adapt them to the developed logical model, but no business rules are applied.
DIM_tables: finally, in the dimensional phase, the business rules are applied and the tables and indexes are optimized for the queries, since this is where the analytical tools will hit.
I have 2 types of reloads:
Daily Reload: this job is responsible for executing the SSIS packages necessary to perform the incremental daily load of the Data Warehouse. It only loads the last partition of the large tables (corresponding to the current year) in the dimensional phase.
Full Reload: loads the full 3 years (this one is not working).
This wasn't done by me and I have zero technical documentation, so I'm just trying to figure out how this works. My thinking is that once I manage to run this full reload, the data will be restored.
I'm getting an error in the STG phase:
DROP TABLE DWH_PROD.DWH_XX;
DROP TABLE ...: 'The partition function 'pfPetitions' is being used in one or more partition schemes.' Possible reasons for the error: problems with the query, the 'ResultSet' property was not set correctly, parameters not set correctly, or connection not established correctly.
I don't know how to drop this partition function so I can create it again,
and I can't find the 'ResultSet' property. Please help.
USE DB;
GO
DROP TABLE DWH_PROD.DWH_ALBARANES_TARIFA;
DROP TABLE DWH_PROD.DWH_PETICIONES;
DROP TABLE DWH_PROD.DWH_SOLICITUDES;
DROP TABLE DWH_PROD.DWH_RESULTADOS;
DROP TABLE DWH_PROD.DWH_INCIDENCIAS;
------- I deleted code so the post is not too long -------
Below are the CREATE statements for the tables dropped above
IF NOT EXISTS (SELECT * FROM sys.tables WHERE name = N'DWH_ALBARANES_TARIFA')
CREATE TABLE DWH_PROD.DWH_ALBARANES_TARIFA (
);
IF NOT EXISTS (SELECT * FROM sys.tables WHERE name = N'DWH_INCIDENCIAS')
CREATE TABLE DWH_PROD.DWH_INCIDENCIAS (
);
IF EXISTS (SELECT * FROM sys.partition_functions WHERE name = N'pfPeticiones')
DROP PARTITION FUNCTION pfPeticiones;
CREATE PARTITION FUNCTION pfPeticiones (DATE)
AS RANGE RIGHT FOR VALUES
('2017-01-01', '2018-01-01', '2019-01-01');
IF EXISTS (SELECT * FROM sys.partition_schemes WHERE name = N'psPeticiones')
DROP PARTITION SCHEME psPeticiones;
CREATE PARTITION SCHEME psPeticiones
AS PARTITION pfPeticiones
ALL TO ([Primary]);
IF NOT EXISTS (SELECT * FROM sys.tables WHERE name = N'DWH_PETICIONES')
CREATE TABLE DWH_PROD.DWH_PETICIONES (
) ON psPeticiones(FEC_PETICION);
IF NOT EXISTS (SELECT * FROM sys.tables WHERE name = N'DWH_SOLICITUDES')
CREATE TABLE DWH_PROD.DWH_SOLICITUDES (
) ON psPeticiones(FEC_PETICION);
IF NOT EXISTS (SELECT * FROM sys.tables WHERE name = N'DWH_RESULTADOS')
CREATE TABLE DWH_PROD.DWH_RESULTADOS (
) ON psPeticiones(FEC_PETICION);

You need to perform a few actions, in this order, to delete a partition function (a sketch follows the list):
Drop or move (e.g. if you have a heap, create a clustered index on PRIMARY) all tables that use the partition scheme.
Drop the partition scheme.
Drop the partition function.
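A minimal sketch of that order, using the object names from the question (this assumes DWH_PETICIONES, DWH_SOLICITUDES and DWH_RESULTADOS are the only tables created ON psPeticiones; any table still on the scheme will block the drop):
-- 1. Drop (or move) every table that was created ON the partition scheme
DROP TABLE DWH_PROD.DWH_PETICIONES;
DROP TABLE DWH_PROD.DWH_SOLICITUDES;
DROP TABLE DWH_PROD.DWH_RESULTADOS;
-- 2. Drop the partition scheme that references the function
IF EXISTS (SELECT * FROM sys.partition_schemes WHERE name = N'psPeticiones')
    DROP PARTITION SCHEME psPeticiones;
-- 3. Only now can the partition function itself be dropped
IF EXISTS (SELECT * FROM sys.partition_functions WHERE name = N'pfPeticiones')
    DROP PARTITION FUNCTION pfPeticiones;
Note that the script in the question drops the partition function before the partition scheme; that order is exactly what raises the "is being used in one or more partition schemes" error, so the full-reload package needs the scheme dropped (and the dependent tables dropped or moved) before the function.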

Related

How to generate a script to create all tables in a different schema

I have about 200 tables in a schema.
I need to replicate these tables in a new backup schema with an automatic procedure.
I would like to create a procedure that dynamically recreates all the tables in a schema (a potentially varying number of tables and columns) in a different schema.
I can cycle all the tables and create the SELECT * INTO dbo_b.TABLE FROM dbo.TABLE statement, but I get the error:
Column 'AMBIENTE' has a data type that cannot participate in a columnstore index.
I created a view that simply SELECT * FROM TABLE, and tried to perform the SELECT * INTO dbo_b.TABLE from dbo.VIEW but I got the same issue.
It works only if I create dbo_b.Table first and INSERT INTO it, so I need to generate a script that automatically cycles through all the tables in my schema and creates them in the new schema.
It's not a one time job, it should run every day so I cannot do it manually.
Seems we hit the same issue.
You can try to loop over all the tables and create each table in the new schema this way (a full loop sketch follows the snippet):
IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES
           WHERE TABLE_NAME = 'YYYY' AND TABLE_SCHEMA = 'ZZZZ')
    DROP TABLE [ZZZZ].[YYYY];
-- XXXX = source schema, ZZZZ = target schema, YYYY = table name
CREATE TABLE [ZZZZ].[YYYY]
WITH ( DISTRIBUTION = ROUND_ROBIN, HEAP )
AS
( SELECT * FROM [XXXX].[YYYY] );
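If you want to run that for every table automatically, here is a rough sketch of the loop (Azure SQL Data Warehouse does not support cursors, so this numbers the tables and walks them with WHILE; dbo and dbo_b are the schemas from your question, and the temp-table name is just an illustration):
IF OBJECT_ID('tempdb..#tables') IS NOT NULL DROP TABLE #tables;
CREATE TABLE #tables WITH (DISTRIBUTION = ROUND_ROBIN, HEAP) AS
SELECT ROW_NUMBER() OVER (ORDER BY TABLE_NAME) AS rn, TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'dbo' AND TABLE_TYPE = 'BASE TABLE';

DECLARE @i INT = 1;
DECLARE @max INT;
SELECT @max = MAX(rn) FROM #tables;
DECLARE @name NVARCHAR(128), @sql NVARCHAR(4000);
WHILE @i <= @max
BEGIN
    SELECT @name = TABLE_NAME FROM #tables WHERE rn = @i;
    -- drop the copy in the target schema if it already exists
    SET @sql = N'IF OBJECT_ID(''dbo_b.' + @name + N''') IS NOT NULL DROP TABLE dbo_b.' + @name + N';';
    EXEC (@sql);
    -- recreate it with CTAS from the source schema
    SET @sql = N'CREATE TABLE dbo_b.' + @name
             + N' WITH (DISTRIBUTION = ROUND_ROBIN, HEAP) AS SELECT * FROM dbo.' + @name + N';';
    EXEC (@sql);
    SET @i = @i + 1;
END;
Wrapped in a stored procedure or scheduled step, that gives you the daily refresh the question asks for.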
Let me know. BR

Using PolyBase to load data into an existing table in parallel

Using CTAS we can leverage the parallelism that PolyBase provides to load data into a new table in a highly scalable and performant way.
Is there a way to use a similar approach to load data into an existing table? The table might even be empty.
Creating an external table and using INSERT INTO ... SELECT * FROM ...: I would assume that this goes through the head node and is therefore not done in parallel?
I know that I could also drop the table and use CTAS to recreate it but then I have to deal with all the metadata again (column names, data types, distributions, ...).
You could use partition switching to do this, although remember not to use too many partitions with Azure SQL Data Warehouse. See 'Partition Sizing Guidance' here.
Bear in mind that check constraints are not supported, so the source table has to use the same partition scheme as the target table.
Full example with partitioning and switch syntax:
-- Assume we have a file with the values 1 to 100 in it.
-- Create an external table over it; will have all records in
IF NOT EXISTS ( SELECT * FROM sys.schemas WHERE name = 'ext' )
EXEC ( 'CREATE SCHEMA ext' )
GO
-- DROP EXTERNAL TABLE ext.numbers
IF NOT EXISTS ( SELECT * FROM sys.external_tables WHERE object_id = OBJECT_ID('ext.numbers') )
CREATE EXTERNAL TABLE ext.numbers (
number INT NOT NULL
)
WITH (
LOCATION = 'numbers.csv',
DATA_SOURCE = eds_yourDataSource,
FILE_FORMAT = ff_csv
);
GO
-- Create a partitioned, internal table with the records 1 to 50
IF OBJECT_ID('dbo.numbers') IS NOT NULL DROP TABLE dbo.numbers
CREATE TABLE dbo.numbers
WITH (
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED INDEX ( number ),
PARTITION ( number RANGE LEFT FOR VALUES ( 50, 100, 150, 200 ) )
)
AS
SELECT *
FROM ext.numbers
WHERE number BETWEEN 1 AND 50;
GO
-- DBCC PDW_SHOWPARTITIONSTATS ('dbo.numbers')
-- CTAS the second half of the external table, records 51-100 into an internal one.
-- As check constraints are not available in SQL Data Warehouse, ensure the switch table
-- uses the same scheme as the original table.
IF OBJECT_ID('dbo.numbers_part2') IS NOT NULL DROP TABLE dbo.numbers_part2
CREATE TABLE dbo.numbers_part2
WITH (
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED INDEX ( number ),
PARTITION ( number RANGE LEFT FOR VALUES ( 50, 100, 150, 200 ) )
)
AS
SELECT *
FROM ext.numbers
WHERE number > 50
GO
-- Partition switch it into the original table
ALTER TABLE dbo.numbers_part2 SWITCH PARTITION 2 TO dbo.numbers PARTITION 2;
SELECT *
FROM dbo.numbers
ORDER BY 1;

Best way of selecting 8k+ rows from a table

I have an Excel sheet that contains more than 8k IDs. I have a table in SQL Server that contains those IDs and related entries. What would be the best way to get those rows? The way I am doing it right now is to use the export data function for the specific table with the query:
select * from table_name where uID in (ALL 8K IDs)
Since this has to be done multiple times, I suggest bulk inserting from the CSV file into a temporary SQL table and then using an inner join with that table.
Assuming your CSV file contains the IDs in a single row (i.e. 1,34,345,...), something like this should do the trick:
-- create the temporary table
CREATE TABLE #CSVData
(
IdValue int
)
-- create a clustered index for this table (Note: this doesn't need to be unique)
CREATE CLUSTERED INDEX IX_CSVData on #CSVData (IdValue )
-- insert the csv data to the table
BULK INSERT #CSVData
FROM 'c:\csvData.txt'
WITH
(
ROWTERMINATOR = ','
)
-- select the data
SELECT T.*
FROM table_name T
INNER JOIN #CSVData C ON T.uID = C.IdValue
-- cleanup (the index will be dropped with the table)
DROP TABLE #CSVData
One more link to look at is this article by Pinal Dave on SQLAuthority.

How can I copy a Redshift table but add a sortkey to a column?

I'm currently working on a project that uses a Redshift table with 51 columns. However, the person who made the table forgot to add a sortkey to our time column, which will hurt performance for our use case if we don't add it.
How can I make a version of the table with our time column as the sortkey? I'm aware that you can't make a column a sortkey if it's already a member of an existing table, but I was hoping there's a way to do it that doesn't involve writing out the CREATE TABLE syntax by hand; for example, something like this would be nice:
timecube=# CREATE TABLE foo (like bar) sortkey(time);
ERROR: CREATE TABLE LIKE is not supported with DISTSTYLE, DISTKEY(), or SORTKEY() clauses
but as you can see it's not supported. Is there another way? As we're still developing, we don't need any of the existing data.
Using traditional tools like pg_dump didn't work well because they don't include any of the Redshift extras like encoding.
Redshift supports specifying the DIST and SORT keys as part of CREATE TABLE AS statements, as per the docs.
CREATE TABLE table_name
DISTSTYLE KEY
DISTKEY ( column )
SORTKEY ( column )
AS
(SELECT *
FROM source_table)
;
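Applied to the names in the question (foo, bar and the time column; the quotes around time are just precautionary in case it clashes with a keyword), that would look something like:
CREATE TABLE foo
DISTSTYLE EVEN
SORTKEY ("time")
AS
(SELECT * FROM bar);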
The first step is to get the CREATE TABLE statement for the existing table. Then create the new table, this time adding the sort key.
Check the encoding of the old table (when you load data using the COPY command it automatically adds compression encodings):
select "column", type, encoding
from pg_table_def where tablename = 'old_table'
When creating the new table, add the encoding type for each column, and create the table with the sort key.
Once the new table is created, use a command like the one below:
INSERT INTO new_table SELECT * FROM old_table ORDER BY time ASC;
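Putting those steps together, a rough sketch (the columns other than time are made up for illustration; take the real column list and the encodings reported by pg_table_def above):
CREATE TABLE new_table (
    id        BIGINT        ENCODE lzo,
    "time"    TIMESTAMP     ENCODE raw,
    payload   VARCHAR(256)  ENCODE lzo
)
SORTKEY ("time");

-- reload the data; ordering by the sort key keeps the new table well sorted
INSERT INTO new_table
SELECT * FROM old_table
ORDER BY "time";
Leaving the sort-key column itself as ENCODE raw is the usual recommendation, so range-restricted scans on it stay cheap.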

Create a unique primary key (hash) from database columns

I have this table which doesn't have a primary key.
I'm going to insert some records into a new table to analyze them, and I'm thinking of creating a new primary key with the values from all the available columns.
If this were a programming language like Java I would:
int hash = column1 * 31 + column2 * 31 + column3*31
Or something like that. But this is SQL.
How can I create a primary key from the values of the available columns? It won't work for me to simply mark all the columns as the PK, because what I need to do is compare them with data from another DB's table.
My table has 3 numbers and a date.
EDIT: What my problem is
I think a bit more background is needed. I'm sorry for not providing it before.
I have a database (dm) that is being updated every day from another DB (original source). It has records from the past two years.
Last month (July) the update process broke, and for a month no data was loaded into dm.
I manually created a table with the same structure in my Oracle XE, and I copied the records from the original source into my DB (myxe). I copied only the records from July to create a report needed by the end of the month.
Finally, on Aug 8, the update process got fixed and the records that had been waiting to be migrated by this automatic process got copied into the database (from originalsource to dm).
This process cleans the data out of the original source once it has been copied (into dm).
Everything looked fine, but we have just realized that a chunk of the records got lost (about 25% of July).
So what I want to do is take my backup (myxe) and insert into the database (dm) all those missing records.
The problems here are:
They don't have a well defined PK.
They are in separate databases.
So I thought that if I could create a unique PK from both tables that gave the same number, I could tell which rows were missing and insert them.
EDIT 2
So I did the following in my local environment:
select a.* from the_table@PRODUCTION a, the_table b where
a.idle = b.idle and
a.activity = b.activity and
a.finishdate = b.finishdate
Which returns all the rows that are present in both databases (the intersection?). I've got 2,000 records.
What I'm going to do next is delete them all from the target DB and then just insert them all from my DB into the target table.
I hope I don't get into something worse :-S :-S
The danger of creating a hash value by combining the 3 numbers and the date is that it might not be unique and hence cannot be used safely as a primary key.
Instead I'd recommend using an autoincrementing ID for your primary key.
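A minimal sketch of that in Oracle (mytable_seq and the id column are placeholder names; on 12c and later an identity column achieves the same thing):
-- sequence provides the autoincrementing values
CREATE SEQUENCE mytable_seq;
ALTER TABLE mytable ADD id NUMBER;
UPDATE mytable SET id = mytable_seq.NEXTVAL;
ALTER TABLE mytable MODIFY id NOT NULL;
ALTER TABLE mytable ADD CONSTRAINT mytable_pk PRIMARY KEY (id);
New rows can then take mytable_seq.NEXTVAL in the INSERT (or from a BEFORE INSERT trigger), so the key keeps incrementing automatically.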
Just create a surrogate key:
ALTER TABLE mytable ADD pk_col INT
UPDATE mytable
SET pk_col = rownum
ALTER TABLE mytable MODIFY pk_col INT NOT NULL
ALTER TABLE mytable ADD CONSTRAINT pk_mytable_pk_col PRIMARY KEY (pk_col)
or this:
ALTER TABLE mytable ADD pk_col RAW(16)
UPDATE mytable
SET pk_col = SYS_GUID()
ALTER TABLE mytable MODIFY pk_col RAW(16) NOT NULL
ALTER TABLE mytable ADD CONSTRAINT pk_mytable_pk_col PRIMARY KEY (pk_col)
The latter uses GUIDs, which are unique across databases, but they consume more space and are much slower to generate (your INSERTs will be slow).
Update:
If you need to create same PRIMARY KEYs on two tables with identical data, use this:
MERGE
INTO mytable v
USING (
SELECT rowid AS rid, rownum AS rn
FROM mytable
ORDER BY
col1, col2, col3
)
ON (v.rowid = rid)
WHEN MATCHED THEN
UPDATE
SET pk_col = rn
Note that the tables should be identical down to the individual row (i.e. have the same number of rows with the same data in them).
Update 2:
For your actual problem, you don't need a PK at all.
If you just want to select the records missing in dm, use this one (on the dm side):
SELECT *
FROM mytable@myxe
MINUS
SELECT *
FROM mytable
This will return all records that exist in mytable@myxe but not in the local mytable (on dm).
Note that it will collapse any duplicates.
Assuming that you have ensured uniqueness... you can do almost the same thing in SQL. The only problem will be converting the date to a numeric value so that you can hash it.
SELECT Table2.SomeFields
FROM Table1 LEFT OUTER JOIN Table2 ON
(Table1.col1 * 31) + (Table1.col2 * 31) + (Table1.col3 * 31) +
((DATEPART(year, Table1.date) + DATEPART(month, Table1.date) + DATEPART(day, Table1.date)) * 31) = Table2.hashedPk
The above query would work for SQL Server; the only difference for Oracle would be in how you handle the date conversion. There are other functions for converting dates in SQL Server as well, so this is by no means the only solution.
And you can combine this with Quassnoi's SET statement to populate the new field as well. Just use the left-hand side of the join condition for the value.
If you're loading your new table with values from the old table, and you then need to join the two tables, you can only "properly" do this if you can uniquely identify each row in the original table. Quassnoi's solution will allow you to do this, IF you can first alter the old table by adding a new column.
If you cannot alter the original table, generating some form of hash code based on the columns of the old table would work -- but, again, only if the hash codes uniquely identify each row. (Oracle has checksum functions, right? If so, use them.)
If hash code uniqueness cannot be guaranteed, you may have to settle for a primary key composed of as many columns as are required to ensure uniqueness (e.g. the natural key). If there is no natural key, well, I heard once that Oracle provides a rownum for each row of data; could you use that?