1 billion rows DW to DM - SQL

I have a design/performance question.
I have the following table:
CREATE TABLE [dbo].[DW_Visits_2016](
[VisitId] [int] NOT NULL,
[UserId] [int] NOT NULL,
[VisitReferrer] [varchar](512) NULL,
[VisitFirstRequest] [varchar](255) NOT NULL,
[VisitAppName] [varchar](255) NULL,
[VisitCountry] [varchar](50) NULL,
[VisitDate] [smalldatetime] NOT NULL,
[VisitMins] [int] NOT NULL,
[VisitHits] [int] NOT NULL,
[EntryTag] [varchar](100) NOT NULL,
[VisitCount] [int] NOT NULL,
[VisitInitialDate] [datetime] NOT NULL,
[AggregateType] [varchar](50) NULL,
[MemberId] [int] NULL,
[ServerName] [varchar](50) NULL,
[BrowserUserAgent] [varchar](255) NULL,
[LastModifiedDate] [smalldatetime] NULL,
[Guid] [uniqueidentifier] NULL,
[SessionId] [varchar](100) NULL,
[IPAddress] [varchar](40) NULL,
CONSTRAINT [PK_Visits] PRIMARY KEY NONCLUSTERED
(
[VisitId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
)
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[DW_Visits_2016] WITH CHECK ADD CONSTRAINT [CK_Visits_VisitDate] CHECK (([VisitDate]>='2016-01-01' AND [VisitDate]<'2017-01-01'))
GO
ALTER TABLE [dbo].[DW_Visits_2016] CHECK CONSTRAINT [CK_Visits_VisitDate]
And the same table exists for each year from 2015 back to 2010.
Each table has around 150 million rows, so combined we are talking about 1,050 million rows.
I received a requirement from the BI people, who want all of this combined into a single view (something crazy like select * from all_visits).
Luckily, they gave me some WHERE clauses and a list of columns they don't need, so the final result would be 6 columns and about 20% of the rows (210 million). Nonetheless, a view is just a stored query, and even though the box has 60 GB of RAM, it's shared with many other databases.
Options I see:
Instead of a view, materialize the result as tables and move them to a dedicated box.
Create one view per year?
Switch all of this to MongoDB or something like Vertica?!
Any of the previous options combined with columnstore indexes? (See the sketch below.)
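One point worth noting: since each yearly table already carries a trusted CHECK constraint on VisitDate (added WITH CHECK), a UNION ALL view over them behaves as a partitioned view, and the optimizer can skip entire tables when a query filters on VisitDate. A minimal sketch, assuming six hypothetical columns the BI team asked for (substitute the real ones):

-- Partitioned view over the yearly tables; constraint exclusion lets the
-- optimizer eliminate tables whose date range cannot match the query.
CREATE VIEW [dbo].[all_visits]
AS
SELECT [VisitId], [UserId], [VisitCountry], [VisitDate], [VisitMins], [VisitHits]
FROM [dbo].[DW_Visits_2016]
UNION ALL
SELECT [VisitId], [UserId], [VisitCountry], [VisitDate], [VisitMins], [VisitHits]
FROM [dbo].[DW_Visits_2015]
-- ... one SELECT per yearly table, down to [dbo].[DW_Visits_2010]
GO

-- On SQL Server 2012+, a nonclustered columnstore index per table can
-- sharply reduce I/O for scan-heavy BI queries (note: the table becomes
-- read-only on 2012/2014; nonclustered columnstore is updatable from 2016):
CREATE NONCLUSTERED COLUMNSTORE INDEX [NCCI_DW_Visits_2016]
ON [dbo].[DW_Visits_2016] ([VisitId], [UserId], [VisitCountry], [VisitDate], [VisitMins], [VisitHits]);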

Related

Duplicate unique keys have been allowed into my Azure SQL database table

I have a large SaaS system database, running for 8+ years, no problems. It is an Azure SQL database, and we host the corresponding web application through Azure too.
Suddenly, in the early hours of this morning, some of the C# web app reports started failing because duplicate table records were detected. I checked the table in question and, yes, there are identical duplicate records with clashing unique keys in the table.
I've never seen this before. How can a unique key fail to enforce itself during inserts/updates?
EDIT:
Here's the schema:
CREATE TABLE [tenant_clientnamehere].[tbl_cachedstock](
[clusteringkey] [bigint] IDENTITY(1,1) NOT NULL,
[islivecache] [bit] NOT NULL,
[id] [uniqueidentifier] NOT NULL,
[stocklocation_id] [uniqueidentifier] NOT NULL,
[stocklocation_referencecode] [nvarchar](50) NOT NULL,
[stocklocation_description] [nvarchar](max) NOT NULL,
[productreferencecode] [nvarchar](50) NOT NULL,
[productdescription] [nvarchar](max) NOT NULL,
[unitofmeasurename] [nvarchar](50) NOT NULL,
[targetstocklevel] [decimal](12, 3) NULL,
[minimumreplenishmentquantity] [decimal](12, 3) NULL,
[minimumstocklevel] [decimal](12, 3) NULL,
[packsize] [int] NOT NULL,
[isbuffermanageddynamically] [bit] NOT NULL,
[dbmcheckperioddays] [int] NULL,
[dbmcheckperiodbuffergroup_id] [uniqueidentifier] NULL,
[ignoredbmuntildate] [datetime2](7) NULL,
[notes1] [nvarchar](100) NOT NULL,
[notes2] [nvarchar](100) NOT NULL,
[notes3] [nvarchar](100) NOT NULL,
[notes4] [nvarchar](100) NOT NULL,
[notes5] [nvarchar](100) NOT NULL,
[notes6] [nvarchar](100) NOT NULL,
[notes7] [nvarchar](100) NOT NULL,
[notes8] [nvarchar](100) NOT NULL,
[notes9] [nvarchar](100) NOT NULL,
[notes10] [nvarchar](100) NOT NULL,
[seasonaleventreferencecode] [nvarchar](50) NULL,
[seasonaleventtargetstocklevel] [decimal](12, 3) NULL,
[isarchived] [bit] NOT NULL,
[isobsolete] [bit] NOT NULL,
[currentstocklevel] [decimal](12, 3) NULL,
[quantityenroute] [decimal](12, 3) NULL,
[recommendedreplenishmentquantity] [decimal](12, 3) NULL,
[bufferpenetrationpercentage] [int] NOT NULL,
[bufferzone] [nvarchar](10) NOT NULL,
[bufferpenetrationpercentagereplenishment] [int] NOT NULL,
[bufferzonereplenishment] [nvarchar](10) NOT NULL,
CONSTRAINT [PK_tbl_cachedstock] PRIMARY KEY CLUSTERED
(
[clusteringkey] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY],
CONSTRAINT [UK_tbl_cachedstock_1] UNIQUE NONCLUSTERED
(
[islivecache] ASC,
[id] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [tenant_clientnamehere].[tbl_cachedstock] ADD CONSTRAINT [DF__tbl_cache__isarc__1A200257] DEFAULT ((0)) FOR [isarchived]
GO
ALTER TABLE [tenant_clientnamehere].[tbl_cachedstock] ADD CONSTRAINT [DF__tbl_cache__isobs__1B142690] DEFAULT ((0)) FOR [isobsolete]
GO
And the clashing key (two rows with these values are still in the table) is:
islivecache = 1
id = BA7AD2FD-EFAA-485C-A200-095626C583A3
The cause of this turns out to be very simple, but troubling: every single unique key, in every single table, in every single schema, was simultaneously set to "Is Disabled", so while the constraints still exist, they are not being enforced.
I've manually cleared out the duplicate records and rebuilt all indexes to re-enable enforcement, and everything is fine again, but I have no idea how this can technically happen all of a sudden.
I'm currently working with Azure support to get to the bottom of it.
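For reference, disabled indexes can be spotted from the catalog views, and a rebuild re-enables them. A minimal sketch (the duplicate check assumes the unique key columns shown above):

-- List every disabled index (unique keys are enforced by indexes):
SELECT OBJECT_SCHEMA_NAME(i.object_id) AS [schema],
       OBJECT_NAME(i.object_id) AS [table],
       i.name AS [index]
FROM sys.indexes AS i
WHERE i.is_disabled = 1;

-- Find the clashing rows before re-enabling the unique key:
SELECT [islivecache], [id], COUNT(*) AS [copies]
FROM [tenant_clientnamehere].[tbl_cachedstock]
GROUP BY [islivecache], [id]
HAVING COUNT(*) > 1;

-- After removing the duplicates, rebuilding re-enables the index:
ALTER INDEX [UK_tbl_cachedstock_1] ON [tenant_clientnamehere].[tbl_cachedstock] REBUILD;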

Is it good practice to apply multiple foreign keys on the same column

I have a table where I am storing documents from different sources, as follows:
CREATE TABLE [dbo].[Document](
[DocumentId] [int] IDENTITY(1,1) NOT NULL,
[EntityId] [int] NOT NULL,
[DocumentGuid] [uniqueidentifier] NOT NULL,
[DocumentTypeCdId] [int] NOT NULL,
[DocumentName] [nvarchar](500) NOT NULL,
[DocumentType] [nvarchar](500) NOT NULL,
[DocumentData] [nvarchar](max) NOT NULL,
[IsSuppressed] [bit] NULL,
[CreatedBy] [nvarchar](200) NULL,
[CreatedDt] [datetime] NULL,
[UpdatedBy] [nvarchar](200) NULL,
[UpdatedDt] [datetime] NULL,
CONSTRAINT [PK_Document] PRIMARY KEY CLUSTERED
(
[DocumentId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
ALTER TABLE [dbo].[Document] WITH CHECK ADD CONSTRAINT [FK_Document_DocumentTypeCd] FOREIGN KEY([DocumentTypeCdId])
REFERENCES [dbo].[DocumentTypeCd] ([DocumentTypeCdId])
GO
ALTER TABLE [dbo].[Document] CHECK CONSTRAINT [FK_Document_DocumentTypeCd]
GO
EntityId can come from any of several different source tables, so can I make this column a foreign key to all of those source tables? Currently I have nearly 10 source tables. If not, what is a better approach to handle this scenario?
You have a problem in the design of your database. In such a case you need a parent (ancestor) table that holds the keys of all document sources, and multiple child tables, each one specialized for a specific source type.
This is called inheritance (the supertype/subtype pattern), and children must not share the same key values (child tables with exclusive ids), as sketched below.
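A minimal sketch of that supertype/subtype pattern, using hypothetical names (Entity as the ancestor; Invoice as one of the roughly 10 source tables). The persisted discriminator column plus the composite foreign key keeps a given EntityId from appearing in two different child tables:

CREATE TABLE [dbo].[Entity](
[EntityId] [int] IDENTITY(1,1) NOT NULL,
[EntityType] [varchar](20) NOT NULL, -- discriminator: which child table owns this key
CONSTRAINT [PK_Entity] PRIMARY KEY ([EntityId]),
CONSTRAINT [UQ_Entity_Id_Type] UNIQUE ([EntityId], [EntityType])
)
GO
CREATE TABLE [dbo].[Invoice](
[EntityId] [int] NOT NULL,
-- persisted constant discriminator, so the composite FK can reference it:
[EntityType] AS CAST('Invoice' AS varchar(20)) PERSISTED,
[InvoiceNumber] [nvarchar](50) NOT NULL,
CONSTRAINT [PK_Invoice] PRIMARY KEY ([EntityId]),
CONSTRAINT [FK_Invoice_Entity] FOREIGN KEY ([EntityId], [EntityType])
REFERENCES [dbo].[Entity] ([EntityId], [EntityType])
)
GO
-- ...and similarly for the other source tables.
-- Document then needs a single foreign key, to the ancestor only:
ALTER TABLE [dbo].[Document] WITH CHECK ADD CONSTRAINT [FK_Document_Entity]
FOREIGN KEY([EntityId]) REFERENCES [dbo].[Entity] ([EntityId])
GO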

Table partitioning on sql server by foreign key column

All the examples of table partitioning that I have found are quite simple, but I need to partition many tables by one criterion.
For example, I have the tables Contractors and Products, where ContractorId in the Products table is a foreign key.
I created a partition function and scheme for ContractorId. It works perfectly for the Contractors table, but when it comes to the Products table...
I have no idea how I should use it, because when I try I always get this message: "The filegroup 'PRIMARY' specified for the clustered index 'PK_dbo.Products' was used for table 'dbo.Products' even though partition scheme 'scheme_Contractors' is specified for it". My Products table looks like:
CREATE TABLE [dbo].[Products](
[ProductId] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](max) NULL,
[Amount] [int] NULL,
[Color] [nvarchar](max) NULL,
[Price] [decimal](18, 2) NULL,
[Guarantee] [nvarchar](max) NULL,
[GuaranteeType] [int] NULL,
[AdditionalFeatures] [nvarchar](max) NULL,
[Valid] [bit] NULL,
[ContractorId] [int] NOT NULL,
[ProducerId] [int] NOT NULL,
[ProductCategoryId] [int] NOT NULL,
CONSTRAINT [PK_dbo.Products] PRIMARY KEY ( [ProductId] ASC ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] )
GO
ALTER TABLE [dbo].[Products] WITH CHECK ADD CONSTRAINT [FK_dbo.Products_dbo.Contractors_ContractorId] FOREIGN KEY([ContractorId])
REFERENCES [dbo].[Contractors] ([ContractorId])
GO
ALTER TABLE [dbo].[Products] CHECK CONSTRAINT [FK_dbo.Products_dbo.Contractors_ContractorId]
GO
Could anyone tell me, please: is it possible to use my scheme on the ContractorId column, and how? Thank you in advance!
In agreement with Dan Guzman, I'd like to point out that there should be no ON [PRIMARY] filegroup specification in the table definition.
We use partitioning at large scale. It is very convenient to partition all tables on the same partitioning scheme, because the SQL engine will use its multi-processor parallelisation capabilities to the full.
When a certain group of partitions is in one database file and another partition is in another file, you even gain flexibility with disk usage and backups.
So you first need a partition function to define the values of the partitioning scheme:
CREATE PARTITION FUNCTION [ContractorPartitionFunction](int) AS RANGE LEFT
FOR VALUES (contractor1,contractor2,...)
Then you need to create the partition scheme
CREATE PARTITION SCHEME [ContractorPartitionScheme]
AS PARTITION [ContractorPartitionFunction]
TO ([File_001],[File_002],...,[PRIMARY])
Then, for all tables and indexes you create from now on, remove ON [PRIMARY] as the target filegroup from the definitions; instead, use
ON [ContractorPartitionScheme](ContractorId)
So your table definition should now read:
CREATE TABLE [dbo].[Products](
[ProductId] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](max) NULL,
[Amount] [int] NULL,
[Color] [nvarchar](max) NULL,
[Price] [decimal](18, 2) NULL,
[Guarantee] [nvarchar](max) NULL,
[GuaranteeType] [int] NULL,
[AdditionalFeatures] [nvarchar](max) NULL,
[Valid] [bit] NULL,
[ContractorId] [int] NOT NULL,
[ProducerId] [int] NOT NULL,
[ProductCategoryId] [int] NOT NULL)
ON ContractorPartitionScheme(ContractorId)
CREATE UNIQUE NONCLUSTERED INDEX [PK_dbo.Products] ON [dbo].[Products]
(
[ProductId],
[ContractorId]
) WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON ContractorPartitionScheme(ContractorId)
Note that ContractorId has to be part of the index key: a unique index on a partitioned table must include the partitioning column.
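To confirm that rows land where expected, the catalog view sys.partitions and the $PARTITION function can be queried. A minimal sketch (the ContractorId value 42 is just an example):

-- Row counts per partition of dbo.Products:
SELECT [partition_number], [rows]
FROM sys.partitions
WHERE [object_id] = OBJECT_ID('dbo.Products')
AND [index_id] IN (0, 1); -- heap or clustered index

-- Which partition a given ContractorId maps to:
SELECT $PARTITION.[ContractorPartitionFunction](42) AS [partition_number];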

SQL UniqueIdentifier

I have a table called UserCredentials and it has a uniqueidentifier column, [UserCredentialId]. When I try to create a new user, I get 00000000-0000-0000-0000-000000000000 on my first try; then, when I try adding another user, it says the PK cannot be duplicated. At first I had a hard time guessing what this means, but I think it's because my uniqueidentifier is not generating a random id.
What to do?
EDIT
Here is my SQL table structure:
CREATE TABLE [dbo].[UserCredential](
[UserCredentialId] [uniqueidentifier] NOT NULL,
[UserRoleId] [int] NOT NULL,
[Username] [varchar](25) NOT NULL,
[Password] [varchar](50) NOT NULL,
[PasswordSalt] [varchar](max) NOT NULL,
[FirstName] [varchar](50) NOT NULL,
[LastName] [varchar](50) NOT NULL,
[PayorCode] [varchar](20) NOT NULL,
[ProviderCode] [varchar](50) NULL,
[CorporationCode] [varchar](50) NULL,
[Department] [varchar](50) NULL,
[Status] [varchar](1) NOT NULL,
[DateCreated] [datetime] NOT NULL,
[DateActivated] [datetime] NULL,
[Deactivated] [datetime] NULL,
[DateUpdated] [datetime] NULL,
[CreatedBy] [varchar](50) NOT NULL,
[UpdatedBy] [varchar](50) NOT NULL,
[EmailAddress] [varchar](50) NULL,
[ContactNumber] [int] NULL,
[Picture] [varbinary](max) NULL,
CONSTRAINT [PK_UserCredential_1] PRIMARY KEY CLUSTERED
(
[UserCredentialId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
I have already set it to (newid()), but it's still not working.
Set the Id of your user instance to Guid.NewGuid();
user.Id = Guid.NewGuid();
Change your table definition to
[UserCredentialId] UNIQUEIDENTIFIER DEFAULT (NEWSEQUENTIALID()) NOT NULL
See why NEWSEQUENTIALID is preferred over NEWID at http://technet.microsoft.com/en-us/library/ms189786.aspx:
When a GUID column is used as a row identifier, using NEWSEQUENTIALID can be faster than using the NEWID function. This is because the NEWID function causes random activity and uses fewer cached data pages. Using NEWSEQUENTIALID also helps to completely fill the data and index pages.
You can have all values in the table default to a new value during insert, much like an IDENTITY.
Run this to set the default of all inserts to use newid().
ALTER TABLE [dbo].[UserCredential] ADD CONSTRAINT [DF_UserCredential_UserCredentialID] DEFAULT (newid()) FOR [UserCredentialId]
GO
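An all-zero GUID usually means the application itself sent Guid.Empty, which overrides any column default. With the default constraint in place, simply omit UserCredentialId from the INSERT so the default fires. A sketch with hypothetical values (only the NOT NULL columns are listed):

INSERT INTO [dbo].[UserCredential]
([UserRoleId], [Username], [Password], [PasswordSalt], [FirstName], [LastName],
[PayorCode], [Status], [DateCreated], [CreatedBy], [UpdatedBy])
VALUES
(1, 'jdoe', 'hashedpassword', 'salt', 'John', 'Doe',
'PAY01', 'A', GETDATE(), 'admin', 'admin');
-- The server generates UserCredentialId via the DEFAULT (newid()) constraint.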

Converting a computed column to a specific datatype

I am currently trying to execute some SQL queries on SQL Server 2008 R2 from my Java GUI. I am working on a currency management system.
I have to store long (64-bit) values, as the currency figure may exceed 10 digits, but the computed column does not offer any data type option in the design view of the table. I really need help with this: my value exceeds 10 digits, and I need to select the total value from my database. When I try to execute the code, it shows an arithmetic overflow error. Please help.
The following is my script file of the table from database name CNV
USE [CNV]
CREATE TABLE [dbo].[soil_det](
[ID] [int] IDENTITY(1,1) NOT NULL,
[rm_id] [bigint] NULL,
[box_no] [int] NULL,
[weight] [decimal](18, 2) NULL,
[note_state] [varchar](10) NULL,
[dm_state] [varchar](10) NULL,
[1] [int] NULL,
[2] [int] NULL,
[5] [int] NULL,
[10] [int] NULL,
[20] [int] NULL,
[50] [int] NULL,
[100] [int] NULL,
[500] [int] NULL,
[1000] [int] NULL,
[tp] AS (((((((([1]+[2])+[5])+[10])+[20])+[50])+[100])+[500])+[1000]),
[tv] AS (((((((([1]*(1)+[2]*(2))+[5]*(5))+[10]*(10))+[20]*(20))+[50]*(50))+[100]*(100))+[500]*(500))+[1000]*(1000)) PERSISTED,
[tp_ex1] AS ((((((([2]+[5])+[10])+[20])+[50])+[100])+[500])+[1000]),
[tv_ex1] AS ((((((([2]*(2)+[5]*(5))+[10]*(10))+[20]*(20))+[50]*(50))+[100]*(100))+[500]*(500))+[1000]*(1000)),
[val_1] AS ([1]*(1)),
CONSTRAINT [PK_mut_det] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
The solution is to convert one of the operands inside the computed column expression, so the whole expression is evaluated in the wider type; a sketch follows.
Check the full article here: SQL SERVER – Puzzle – Solution – Computed Columns Datatype Explanation
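A minimal sketch of the fix, assuming only the total-value column [tv] needs to grow: casting the first operand to bigint makes SQL Server evaluate the whole expression in bigint (by data type precedence), so the sum can exceed the int range. Dropping the computed column loses nothing, since it is derived from the stored counts.

-- Drop and re-add the computed total so the arithmetic is done in bigint:
ALTER TABLE [dbo].[soil_det] DROP COLUMN [tv];
ALTER TABLE [dbo].[soil_det] ADD [tv] AS
(CONVERT(bigint, [1])*(1) + [2]*(2) + [5]*(5) + [10]*(10) + [20]*(20)
+ [50]*(50) + [100]*(100) + [500]*(500) + [1000]*(1000)) PERSISTED;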