How do I create a wide table in SQL Server 2016?

With the code below, creating a wide table in SQL Server keeps giving me this error:
Msg 1702, Level 16, State 1, Line 11
CREATE TABLE failed because column '2010/12/01' in table 'PriceToBookFinalI' exceeds the maximum of 1024 columns.
USE [Style]
GO
CREATE TABLE [dbo].[PriceToBookFinalI]
(DocID int PRIMARY KEY,
[2006/12/29][Money],
[2007/01/01][Money],
...
SpecialPurposeColumns XML COLUMN_SET FOR ALL_SPARSE_COLUMNS);
GO
(2614 columns)
Looking for a good hint!
Here is the background set of data I want to import into my wide table:

The solution for this is to normalize your design. (As an aside, the immediate error occurs because your date columns are not declared SPARSE, so they count against the regular 1,024-column limit instead of the 30,000-column wide-table limit.) Even if you could fit into the 1,024 limit, your design is not a good idea. For example, what if you wanted to know the average amount a DocID changed per month? That would be a nightmare to write in this model.
Try this instead.
CREATE TABLE dbo.PriceToBookFinalI (
DocID INT PRIMARY KEY,
SpecialPurposeColumns XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
);
CREATE TABLE dbo.PriceToBookFinalMoney (
DocID INT,
DocDate DATE,
DocAmount MONEY,
CONSTRAINT PK_PriceToBookFinalMoney
PRIMARY KEY CLUSTERED
(
DocID,
DocDate
)
);
You can easily join the table with the SpecialPurposeColumns to the table with the dates and amounts for each DocID. You can still pivot the dates into the format you provided above if desired. Having the date as a value in a column gives you much more flexibility in how you use the data, better performance, and naturally handles more dates.
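For example, the monthly-average question becomes a simple aggregate, and a wide view is still available through PIVOT. A minimal sketch against the tables above (the two pivoted dates are just examples):
-- Average amount per DocID per month: trivial in the normalized model
SELECT DocID,
       YEAR(DocDate) AS DocYear,
       MONTH(DocDate) AS DocMonth,
       AVG(DocAmount) AS AvgAmount
FROM dbo.PriceToBookFinalMoney
GROUP BY DocID, YEAR(DocDate), MONTH(DocDate);

-- Pivot selected dates back into columns when the wide shape is wanted
SELECT DocID, [2006-12-29], [2007-01-01]
FROM (SELECT DocID, DocDate, DocAmount FROM dbo.PriceToBookFinalMoney) AS src
PIVOT (MAX(DocAmount) FOR DocDate IN ([2006-12-29], [2007-01-01])) AS p;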

Normalise it, and produce the columns as part of your query:
Create table Price (
DocID INT,
DocRef VARCHAR(30), -- the values from your [DATES] column
DocDate DATE,
DocValue MONEY,
PRIMARY KEY (DocID, DocDate) -- DocID alone cannot be the key once each doc has a row per date
);

Create your table with three columns: ID, Date, Amount. Each ID will have multiple rows in the table (one row for each date that has an amount value).

There is a column-count limitation in SQL Server (https://msdn.microsoft.com/en-us/library/ms143432.aspx):
Columns per nonwide table: 1,024
Columns per wide table: 30,000
You can use a "wide table", which is one built on sparse columns and a column set: https://msdn.microsoft.com/en-us/library/cc280521.aspx
BUT the table will still be limited to 8,060 bytes per row, so most of your columns must hold no data.
So the problem is in your design. It looks like months should be rows, not columns, or maybe some other table structure would be better. It cannot be guessed without seeing how the data is used by the application.
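For reference, this is what the wide-table syntax looks like: every data column must carry the SPARSE keyword, which the CREATE TABLE in the question omits. A minimal sketch:
CREATE TABLE dbo.PriceToBookWide
(
    DocID INT PRIMARY KEY,
    [2006/12/29] MONEY SPARSE NULL,
    [2007/01/01] MONEY SPARSE NULL,
    -- ... the remaining date columns, all marked SPARSE
    SpecialPurposeColumns XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
);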

Related

How to group a table by converting int to text and then grouping the text

First of all, thank you for taking the time to read below.
I have the following table, where the data type of Level is int:
(screenshot: the as-is table)
I need to transform it into a table like the one below:
(screenshot: the to-be table, with the grouped text)
The idea is to group a numeric column into text per unique ID.
Is this achievable at all?
Note: the number of levels is subject to growth, so is it possible to come up with SQL that accommodates the increasing levels without any hardcoding?
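For what it's worth, one common way to fold a growing number of numeric levels into one text value per ID is STRING_AGG (SQL Server 2017+). A sketch against a hypothetical table, since the original screenshots are not shown:
-- Hypothetical structure guessed from the question text
CREATE TABLE dbo.LevelData (UniqueID INT, [Level] INT);

SELECT UniqueID,
       STRING_AGG(CAST([Level] AS VARCHAR(10)), ',')
           WITHIN GROUP (ORDER BY [Level]) AS LevelText
FROM dbo.LevelData
GROUP BY UniqueID;
Because the aggregate picks up however many rows exist per ID, no hardcoding of the level count is needed.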

Very slow SQL join, with limited amount of data

We have a very slow running piece of SQL, and I was wondering if anyone has any advice on speeding it up.
We are collecting the data from a large number of tables (21) into a single table for later processing. The tables are temporary tables, and exist only for the query.
All the tables share three columns (USN, DATASET, and INTERNAL_ID), and the combination of the three is unique in each table, but the same values exist in all the tables. It is possible that INTERNAL_ID is also unique, but I am not sure.
Each table contains six rows of data, and the output table also contains six rows. I.e., each table has the following data, the first three columns being the same in each table, and the remaining columns containing different data for each table:
USN  DATASET  INTERNAL_ID  <more stuff>
20   BEN      67           ...
20   APP      68           ...
30   BEN      70           ...
30   BEN      75           ...
50   CRM      80           ...
70   CRM      85           ...
The server is SQL 2008 R2 with 4 x 2.3GHz cores and 32GB memory, which is sitting idle and should be more than adequate.
The INSERT INTO query itself takes approximately 3 seconds.
What can I do to either find out the reason for the code being so slow, or to speed it up? Is there a maximum number of joins that I should use in a single query?
CREATE TABLE #output (
USN INT,
DATASET VARCHAR(150),
INTERNAL_ID INT,
MASTER_DATA INT,
EX1_DATA INT,
EX2_DATA INT,
EX3_DATA INT,
-- More columns
)
The full output table consists of 247 columns, with 71 integers, 11 floats, 44 datetimes and 121 varchars with a total size of 16,996 characters!!! I would expect each varchar to have around 20-30 characters.
CREATE TABLE #master (
USN INT,
DATASET VARCHAR(150),
INTERNAL_ID INT,
MASTER_DATA INT,
-- More columns
)
CREATE TABLE #ex1 (
USN INT,
DATASET VARCHAR(150),
INTERNAL_ID INT,
EX1_DATA INT,
-- More columns
)
CREATE TABLE #ex2 (
USN INT,
DATASET VARCHAR(150),
INTERNAL_ID INT,
EX2_DATA INT,
-- More columns
)
-- Repeat for ex3 .. ex20
Most of the ex tables are 10-11 columns with a couple in the 20-30 column range.
-- Insert data into master, ex1..ex20
INSERT INTO #output(USN, DATASET, INTERNAL_ID, MASTER_DATA, EX1_DATA, EX2_DATA, ...)
SELECT #master.USN, #master.DATASET, #master.INTERNAL_ID, #master.MASTER_DATA, #ex1.EX1_DATA, #ex2.EX2_DATA, ...
FROM
#master
LEFT JOIN #ex1 ON #master.USN = #ex1.USN AND
#master.DATASET = #ex1.DATASET AND
#master.INTERNAL_ID = #ex1.INTERNAL_ID
LEFT JOIN #ex2 ON #master.USN = #ex2.USN AND
#master.DATASET = #ex2.DATASET AND
#master.INTERNAL_ID = #ex2.INTERNAL_ID
-- continue until we hit ex20
I would add an index to each of the temporary tables, matching the unique data.
I would start with an index on just the two int columns, and if that is not enough I would add the DATASET column to the index.
Sometimes the order in which you JOIN tables makes (or made, in previous versions of MS SQL) a huge difference, so start the JOINs from the smallest table (if possible).
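A sketch of that advice (index names are arbitrary; create the indexes after loading the temp tables and before the INSERT):
-- Start with the two int columns, as suggested above
CREATE INDEX IX_ex1 ON #ex1 (USN, INTERNAL_ID);
CREATE INDEX IX_ex2 ON #ex2 (USN, INTERNAL_ID);
-- ... repeat for #ex3 .. #ex20, and #master

-- If that is not enough, widen to the full unique trio instead:
CREATE UNIQUE INDEX UX_ex1 ON #ex1 (USN, DATASET, INTERNAL_ID);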
If there is more than one row with a given USN, DATASET, INTERNAL_ID in each of these tables, the size of the resulting table will grow multiplicatively with each join, i.e. exponentially across the whole join sequence. If this is the case, consider reworking your statement, or replacing it with a number of simpler ones.
Consider adding an index on the join column with the highest cardinality in each of the #ex1-#ex20 tables (or even a composite index on two columns, or on the entire trio).
And, of course, if there are constraints on the resulting temporary table, you need an index for each such constraint as well.

Select on Row Version

Can I select rows on row version?
I am querying a database table periodically for new rows.
I want to store the last row version and then read all rows from the previously stored row version.
I cannot add anything to the table, the PK is not generated sequentially, and there is no date field.
Is there any other way to get all the rows that are new since the last query?
I am creating a new table that contains all the primary keys of the rows that have been processed, and will join on that table to get new rows, but I would like to know if there is a better way.
EDIT
This is the table structure (screenshot):
Everything except product_id and stock_code is a field describing the product.
You can cast the rowversion to a bigint, then when you read the rows again you cast the column to bigint and compare against your previously stored value. The problem with this approach is the table scan each time you select based on the cast of the rowversion; this could be slow if your source table is large.
I haven't tried a persisted computed column for this; I'd be interested to know if it works well.
Sample code (Tested in SQL Server 2008R2):
DECLARE @TABLE TABLE
(
    Id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Data VARCHAR(10) NOT NULL,
    LastChanged ROWVERSION NOT NULL
)

INSERT INTO @TABLE (Data)
VALUES ('Hello'), ('World')

-- The rowversion column is readable and can be cast to BIGINT
SELECT
    Id,
    Data,
    LastChanged,
    CAST(LastChanged AS BIGINT)
FROM
    @TABLE

-- Remember the highest value seen, and compare against it on the next poll
DECLARE @Latest BIGINT = (SELECT MAX(CAST(LastChanged AS BIGINT)) FROM @TABLE)

SELECT * FROM @TABLE WHERE CAST(LastChanged AS BIGINT) >= @Latest
EDIT: It seems I've misunderstood, and you don't actually have a ROWVERSION column, you just mentioned row version as a concept. In that case, SQL Server Change Data Capture would be the only thing left I could think of that fits the bill: http://technet.microsoft.com/en-us/library/bb500353(v=sql.105).aspx
Not sure if that fits your needs, as you'd need to be able to store the LSN of "the last time you looked" so you can query the CDC tables properly. It lends itself more to data loads than to typical queries.
Assuming you can create a temporary table, the EXCEPT command seems to be what you need:
1. Copy your table into a temporary table.
2. The next time you look, select everything from your table EXCEPT everything from the temporary table, and extract the keys you need from the result.
3. Make sure your temporary table is up to date again.
Note that your temporary table only needs to contain the keys you need. If this is just one column, you can go for a NOT IN rather than EXCEPT.
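A sketch of that flow, with dbo.Products and ProcessedKeys as placeholder names and product_id as the key mentioned in the question:
-- One-time: snapshot the keys already seen
SELECT product_id INTO ProcessedKeys FROM dbo.Products;

-- Each polling cycle: new rows are the ones missing from the snapshot
SELECT product_id FROM dbo.Products
EXCEPT
SELECT product_id FROM ProcessedKeys;

-- ... process them, then bring the snapshot up to date again
INSERT INTO ProcessedKeys (product_id)
SELECT product_id FROM dbo.Products
EXCEPT
SELECT product_id FROM ProcessedKeys;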

Saving Space For Archive Table In SQL Server

I have a SQL 2008 table with 773,705,261 rows in the table. I want to create an archive table to archive off the data, but I want to reduce the amount of space required for this data. Speed of accessing the archived data is not the primary concern, but is always desired.
The current table definition is something like this:
TableID (PK) BIGINT NOT NULL
DocumentID (FK) BIGINT NOT NULL
StatusID (FK) INT NOT NULL
RowCreateDate DATETIME NOT NULL
By my calculation, the current table uses 28 bytes per row. The problem is that each DocumentID can have 6-10 rows in this table (and the number of rows per DocumentID could grow in the future), depending on the number of statuses the system processed.
My first thought of reducing the amount of space required to store this data is to have 1 row for each DocumentID and have an XML field containing all of the StatusIDs and Times they occurred. Something like this:
TableID (PK) BIGINT NOT NULL
DocumentID (FK) BIGINT NOT NULL
Statuses XML NOT NULL
Does anyone have any recommendations for me? Any methods I can research?
Set your archive table to use page compression.
From BOL:
CREATE TABLE dbo.T1
(c1 int, c2 nvarchar(200) )
WITH (DATA_COMPRESSION = PAGE);
If you do not expect to be doing any updates or deletes on your archive table (well, deletes that aren't off either end of the table), then I would also create a clustered index using a fill factor of 100%. That way there will not be any free space left in each page.
Of course I would look at both in BOL before actually applying anything.
You may be able to use INT data type for TableID and DocumentID, and SMALLINT or TINYINT for StatusID. Depending on the precision you need from the RowCreateDate column, you may be able to use SMALLDATETIME or DATE. These data types use less disk space and will save you several GB over your 775,000,000 rows.
Kenneth's suggestions of using page compression and FILLFACTOR = 100 are definitely worth considering.
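Putting both answers together, the archive table could look something like this (a sketch; the narrower types only work if your actual values fit them):
CREATE TABLE dbo.DocumentStatusArchive
(
    TableID       INT NOT NULL,            -- narrowed from BIGINT, if the max id allows
    DocumentID    INT NOT NULL,
    StatusID      TINYINT NOT NULL,        -- if there are fewer than 256 statuses
    RowCreateDate SMALLDATETIME NOT NULL,  -- minute precision, good through 2079
    CONSTRAINT PK_DocumentStatusArchive
        PRIMARY KEY CLUSTERED (TableID)
        WITH (FILLFACTOR = 100)
)
WITH (DATA_COMPRESSION = PAGE);
The narrower types alone cut the fixed data per row from 28 bytes to 13, before page compression does its work.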

SQL Database design question

I have a task to design a SQL database that will record values for the company's commodity products (sku numbers). The company has 7,600 unique product items that need to be tracked, and each product will have approximately 200 values over the period of a year (one value per product per day).
My first guess is that the sku numbers go top to bottom (each sku has a row) and each date is a column.
The data will be used to view in chart / graph format and additional calculations will be displayed against those (such as percentage profit margin etc)
My question is:
- is this layout advisable?
- do I have to be cautious of anything if this type of data goes back about 15 years (each table representing a year)?
Any suggestions?
It is better to have only 3 columns, instead of the many you are suggesting:
sku  date        value
----------------------
1    2011-01-01  10
1    2011-01-02  12
2    2011-01-01  5
This way you can easily add another column if you want to record something else about a given product per date.
I would suggest a table for your products, and a table for the historical values. Maybe create an index for the historical values based on date if you plan to select for specific time periods.
create table products (
    id int primary key,
    sku int,
    name varchar(100),
    description varchar(500)   -- "desc" is a reserved word, so spelled out
);

create table price_history (   -- "values" is a reserved word, so renamed
    id int primary key,
    product_id int,
    price_date date,
    value money,
    constraint fk_prod_price foreign key (product_id) references products (id)
);

create index idx_price_date on price_history (price_date);
NOTE: adjust the names, types and sizes to your own data.
If you do it like fiver wrote, you don't have to have a table for each year either: everything goes in one table. And add indexes on sku/date for faster searching.
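With the single-table design, a charting query is then a simple range scan (a sketch using the tables above; an index on (product_id, price_date) would serve it even better than the date-only index):
-- All values for one product over a date range, ready for charting
SELECT price_date, value
FROM price_history
WHERE product_id = 42
  AND price_date BETWEEN '2011-01-01' AND '2011-12-31'
ORDER BY price_date;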