Populate surrogate Datekey based on Date in a column - sql

I have created a table
CREATE TABLE myTable(
DateID INT PRIMARY KEY NOT NULL,
myDate DATE)
Then I want to populate the table with data from a staging table using SSIS. The problem is that I don't understand how to generate DateID based on the incoming value from a staging table. For example:
after the staging table inserted 2020-12-12, I want my DateID to become 20201212 and so on.
I tried to google this issue but didn't find anything related to my case. (But I don't deny that I did it badly). How can I do it?

You can use a persisted computed column:
CREATE TABLE myTable (
DateID AS (YEAR(myDate) * 10000 + MONTH(myDate) * 100 + DAY(myDate)) PERSISTED PRIMARY KEY,
myDate DATE
);
Here is a db<>fiddle.
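To sanity-check the arithmetic behind that computed column outside the database, here is a minimal Python sketch. The function name date_key is hypothetical; it only mirrors the YEAR * 10000 + MONTH * 100 + DAY expression above and is not part of SSIS or SQL Server.

```python
from datetime import date


def date_key(d: date) -> int:
    """Mirror the computed column: YEAR * 10000 + MONTH * 100 + DAY."""
    return d.year * 10000 + d.month * 100 + d.day


# The example from the question: 2020-12-12 should map to 20201212.
print(date_key(date(2020, 12, 12)))
```

Because month and day each occupy two decimal digits, the resulting integers sort in the same order as the dates themselves.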

Related

ADD CONSTRAINT on date part of a datetime column

I want to add a unique constraint on multiple columns. Usually, the following script should do the job:
CREATE UNIQUE INDEX uq_yourtablename
ON dbo.yourtablename(column1, column2);
What if column2 is a DateTime and we want the constraint to apply only to the date part?
One solution could be using triggers but I need to avoid that.
You can use a computed column to virtually store the date part of the datetime column, and use it in the unique index.
create table yourtablename (
column1 int,
column2 datetime,
column2_dt as convert(date, column2)
);
create unique index uq_yourtablename on yourtablename(column1, column2_dt);

how to create date

how to create date format yyyy-mm with postgresql11
CREATE TABLE public."ASSOL"
(
id integer NOT NULL,
"ind" character(50) ,
"s_R" character(50) ,
"R" character(50) ,
"th" character(50),
"C_O" character(50) ,
"ASSOL" numeric(11,3),
date date,
CONSTRAINT "ASSOL_pkey" PRIMARY KEY (id)
);
This is a variation of Kaushik's answer.
You should just use the date data type. There is no need to create another type for this. However, I would implement this using a check constraint:
CREATE TABLE public.ASSOL (
id serial primary key,
ind varchar(50) ,
s_R varchar(50) ,
R varchar(50) ,
th varchar(50),
C_O varchar(50) ,
ASSOL numeric(11,3),
yyyymm date,
constraint chk_assol_date check (yyyymm = date_trunc('month', yyyymm))
);
This only allows you to insert values that are the first day of the month. Other inserts will fail.
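The rule that the check constraint enforces can be illustrated outside the database. This is only a sketch: month_start and passes_check are hypothetical helper names, and month_start stands in for what date_trunc('month', ...) does to a plain date.

```python
from datetime import date


def month_start(d: date) -> date:
    """Rough equivalent of date_trunc('month', d) for a plain date."""
    return d.replace(day=1)


def passes_check(d: date) -> bool:
    """True when the value equals its own month start, as the constraint requires."""
    return d == month_start(d)


print(passes_check(date(2019, 5, 1)))   # first of the month: accepted
print(passes_check(date(2019, 5, 17)))  # any other day: rejected
```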
Additional notes:
Don't use double quotes when creating tables. You then have to refer to the columns/tables using double quotes, which just clutters queries. Your identifiers should be case-insensitive.
An integer primary key would normally be a serial column.
NOT NULL is redundant for a PRIMARY KEY column.
Use reasonable names for columns. If you want a column to represent a month, then yyyymm is more informative than date.
Postgres stores varchar() and char() in the same way, but for most databases, varchar() is preferred because trailing spaces actually occupy bytes on the data pages.
for year and month you can try like below
SELECT to_char(now(),'YYYY-MM') as year_month
year_month
2019-05
You cannot create a date datatype that stores only the year and month component. There's no such option available at the data type level.
If you want to truncate the day component so that it defaults to the start of the month, you can. This is as good as storing only the month and year, because every date will have day = 1 and only the month and year will vary with the time the insert runs.
For Eg:
create table t ( id int, col1 text,
"date" date default date_trunc('month',current_date) );
insert into t(id,col1) values ( 1, 'TEXT1');
select * from t
id col1 date
1 TEXT1 2019-05-01
If you do not want to store a default date, simply use the date_trunc('month', date) expression wherever it is needed, whether in a group by or in a select query.

sql current date constraint

I need to add a constraint to one table in my database. The table name is Experience. And there is a column named ToDate. Every time the select statement executes like following.
select ToDate from Experience
It should return current date.
So every time select statement executes, the ToDate column get updated with current date.
I know I can do this with some type of sql trigger but is there a way to do it by sql constraint.
like
alter table add constraint...
Any help will be appreciated.
Thanks
You can use a computed column. That's specified like colname as <expression>:
create table t1(id int, dt as getdate());
insert t1 values (1);
select * from t1;
To add a default constraint ...
create table tbl (id int identity, dt datetime, colval varchar(10))
ALTER TABLE dbo.tbl
ADD CONSTRAINT col_dt_def
DEFAULT GETDATE() FOR dt;
Example of inserting to the table ..
insert into dbo.tbl(colval)
select 'somevalue'
select * from dbo.tbl
The result will be ..
id dt colval
1 2014-08-19 13:31:57.577 somevalue
You cannot use a constraint, because a constraint is basically a rule about what can go into the table, how the table can relate to others, and so on. It has no bearing on the data once it is in the table. If I understand you correctly, you want to update the ToDate column whenever you select that column. You can't use a trigger for that either, as mentioned here and here. They suggest a stored procedure where you would use an update followed by an insert. This is probably the SQL method I would prefer if you have to do it repeatedly, which you seem to have to do. Though Andomar's answer is probably better.
Try the code from this link; it may be helpful:
http://www.sqlatoms.com/queries/how-to-use-the-getdate-function-in-sql-server-3/
CREATE TABLE ProductOrders
(
OrderId int NOT NULL PRIMARY KEY IDENTITY,
ProductName nvarchar(50) NOT NULL,
OrderDate datetime NOT NULL DEFAULT GETDATE()
)

SQL Server Database unique number generation on any record insertion

I have 11 columns in my database table and I am inserting data into 10 of them. I want to have a unique number like "1101 and so on" in the 11th column.
Any idea what I should do? Thanks in advance.
In SQL Server 2012 and above you can use a sequence:
Create SEQUENCE RandomSeq
start with 1001
increment by 1
Go
Insert into YourTable(Id,col1...)
Select NEXT VALUE FOR RandomSeq,col1....
or else you can use Identity
Identity(seed,increment)
You can start the seed from 1101 and increment the sequence by 1
Create table YourTable
(
id INT IDENTITY(1101,1),
Col varchar(10)
)
If you want to have that unique number in a different field then you can manipulate that field with primary key and insert that value.
If you want in primary key value, then open the table in design mode, go to 'Identity specification', set 'identity increment' and 'identity seed' as you want.
Alternatively you can use table script like,
CREATE TABLE Persons
(
ID int IDENTITY(12,1) PRIMARY KEY,
FName varchar(255) NOT NULL,
)
Here the primary key starts at 12 (the seed) and each new row increases it by 1 (the increment).
If you have your table definition already in place you can alter the column and add Computed column marked as persisted as:
ALTER TABLE tablename drop column column11;
ALTER TABLE tablename add column11 as '11'
+right('000000'+cast(ID as varchar(10)), 2) PERSISTED ;
--You can change the right operator value from 2 to any as per the requirements.
--Also replace ID with the identity column in your table.
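To see what values that computed column would produce, the string expression can be mimicked in a few lines of Python. This is only an illustration; column11 is a hypothetical name, and the digits parameter plays the role of the second argument to right().

```python
def column11(identity_value: int, digits: int = 2) -> str:
    """Mimic '11' + right('000000' + cast(ID as varchar(10)), digits)."""
    padded = "000000" + str(identity_value)
    return "11" + padded[-digits:]


print(column11(1))    # ID 1 with two digits kept
print(column11(123))  # only the last two digits of the ID survive
```

Note that with two digits kept, IDs 23 and 123 map to the same string, so the width has to cover the largest ID you expect if the value must stay unique.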
create table inc
(
id int identity(1100,1),
somec char
)

Type II dimension joins

I have the following lookup table in OLTP
CREATE TABLE TransactionState
(
TransactionStateId INT IDENTITY (1, 1) NOT NULL,
TransactionStateName VarChar (100)
)
When this comes into my OLAP, I change the structure as follows:
CREATE TABLE TransactionState
(
TransactionStateId INT NOT NULL, /* not an IDENTITY column in OLAP */
TransactionStateName VarChar (100) NOT NULL,
StartDateTime DateTime NOT NULL,
EndDateTime DateTime NULL
)
My question is regarding the TransactionStateId column. Over time, I may have duplicate TransactionStateId values in my OLAP, but with the combination of StartDateTime and EndDateTime, they would be unique.
I have seen samples of Type-2 Dimensions where an OriginalTransactionStateId is added and the incoming TransactionStateId is mapped to it, plus a new TransactionStateId IDENTITY field becomes the PK and is used for the joins.
CREATE TABLE TransactionState
(
TransactionStateId INT IDENTITY (1, 1) NOT NULL,
OriginalTransactionStateId INT NOT NULL, /* not an IDENTITY column in OLAP */
TransactionStateName VarChar (100) NOT NULL,
StartDateTime DateTime NOT NULL,
EndDateTime DateTime NULL
)
Should I go with bachelorette #2 or bachelorette #3?
By this phrase:
With the combination of StartDateTime and EndDateTime, they would be unique.
you mean that they never overlap or that they satisfy the database UNIQUE constraint?
If the former, then you can use the StartDateTime in joins, but note that it may be inefficient, since it will use a "<=" condition instead of "=".
If the latter, then just use a fake identity.
Databases in general do not allow an efficient algorithm for this query:
SELECT *
FROM TransactionState
WHERE @value BETWEEN StartDateTime AND EndDateTime
, unless you do arcane tricks with SPATIAL data.
That's why you'll have to use this condition in a JOIN:
SELECT *
FROM factTable
CROSS APPLY
(
SELECT TOP 1 *
FROM TransactionState
WHERE StartDateTime <= factDateTime
ORDER BY
StartDateTime DESC
)
, which will deprive the optimizer of possibility to use HASH JOIN, which is most efficient for such queries in many cases.
See this article for more details on this approach:
Converting currencies
Rewriting the query so that it can use HASH JOIN resulted in a 600% performance gain, though this is only practical if your datetimes have a precision of one day or coarser (otherwise the hash table grows very large).
Since the time component is stripped from your StartDateTime and EndDateTime, you can create a CTE like this:
WITH cal AS
(
SELECT CAST('2009-01-01' AS DATE) AS cdate
UNION ALL
SELECT DATEADD(day, 1, cdate)
FROM cal
WHERE cdate <= '2009-03-01'
),
state AS
(
SELECT cdate, ts.*
FROM cal
CROSS APPLY
(
SELECT TOP 1 *
FROM TransactionState
WHERE StartDateTime <= cdate
ORDER BY
StartDateTime DESC
) ts
WHERE ts.EndDateTime >= cdate
)
SELECT *
FROM factTable
JOIN state
ON cdate = CAST(factDate AS DATE)
If your date ranges span more than 100 dates, adjust MAXRECURSION option on CTE.
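The idea behind the calendar CTE, expanding each validity range into one row per day so that the join becomes a plain equality lookup, can be sketched in Python with a dictionary standing in for the hash table. The sample rows, the names expand_by_day and state_on, and the fact records are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical dimension rows: (surrogate_id, name, start_date, end_date)
states = [
    (1, "Pending", date(2009, 1, 1), date(2009, 1, 31)),
    (2, "Settled", date(2009, 2, 1), date(2009, 3, 1)),
]


def expand_by_day(rows):
    """Build a date -> row mapping with one entry per day each row is valid."""
    by_day = {}
    for row in rows:
        _, _, start, end = row
        day = start
        while day <= end:
            by_day[day] = row
            day += timedelta(days=1)
    return by_day


state_on = expand_by_day(states)

# Hypothetical facts: (fact_id, fact_date). The join is now an equality lookup.
facts = [("f1", date(2009, 1, 15)), ("f2", date(2009, 2, 20))]
joined = [(fact_id, state_on[d][1]) for fact_id, d in facts]
print(joined)
```

The mapping holds one entry per valid day, which is why the approach only stays manageable at day precision or coarser.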
Please be aware that IDENTITY(1,1) is a declaration for auto-generating values in that column. This is different than PRIMARY KEY, which is a declaration that makes a column into a primary key clustered index. These two declarations mean different things and there are performance implications if you don't say PRIMARY KEY.
You could also use SSIS to load the DW. In the slowly changing dimension (SCD) transformation, you can set how to treat each attribute. If a historical attribute is selected, the type 2 SCD is applied to the whole row, and the transformation takes care of details. You also get to configure if you prefer start_date, end_date or a current/expired column.
The thing to differentiate here is the primary key from the business (natural) key. The primary key uniquely identifies a row in the table. The business key uniquely identifies a business object/entity, and it can be repeated in a dimension table. Each time SCD type 2 is applied, a new row is inserted with a new primary key but the same business key; the old row is then marked as expired, while the new one is marked as current -- or the start date and end date fields are populated appropriately.
The DW should not expose primary keys, so incoming data from OLTP contains business keys, while assignment of primary keys is under control of the DW; IDENTITY int is good for PKs in dimension tables.
The cool thing is that SCD transformation in SSIS takes care of this.
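For readers who want to see the type 2 bookkeeping spelled out, here is a minimal in-memory sketch of the steps described above. It is not how the SSIS transformation is implemented; apply_scd2 and the dictionary keys are hypothetical names chosen for illustration.

```python
from datetime import datetime


def apply_scd2(dimension, business_key, new_name, now):
    """Expire the current row for business_key and append a new version."""
    for row in dimension:
        if row["business_key"] == business_key and row["end"] is None:
            row["end"] = now  # mark the old version as expired
    # The warehouse, not the source system, assigns the surrogate primary key.
    next_key = max((r["surrogate_key"] for r in dimension), default=0) + 1
    dimension.append({
        "surrogate_key": next_key,
        "business_key": business_key,
        "name": new_name,
        "start": now,
        "end": None,  # None marks the current version
    })
    return next_key


dim = []
apply_scd2(dim, 7, "Pending", datetime(2020, 1, 1))
apply_scd2(dim, 7, "Awaiting approval", datetime(2020, 6, 1))
for row in dim:
    print(row)
```

Both rows share business key 7, while each gets its own surrogate key, which is the column fact tables would join on.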