.NET Framework error when enabling a WHERE clause in a SQL query

I am facing a weird issue: depending on which condition I enable in the WHERE clause, my SELECT query throws a .NET Framework error.
Here are the CREATE TABLE scripts.
Table test_classes:
CREATE TABLE [dbo].[test_classes]
(
[CLASSID] [int] NOT NULL,
[PARENTID] [int] NULL,
[CATID] [int] NOT NULL,
[CLASS_NAME] [nvarchar](255) NOT NULL,
[ORIGINAL_NAME] [nvarchar](255) NULL,
[GEOMETRY] [tinyint] NOT NULL,
[READ_ONLY] [bit] NOT NULL,
[DISPLAY_STYLES] [image] NULL,
[FEATURE_COUNT] [int] NOT NULL,
[TEMPOWNER] [int] NULL,
[OPTIONS] [int] NOT NULL,
[POLYGON_TYPE] [int] NULL,
[CLASS_EXTRA] [nvarchar](1024) NULL,
[MAPID] [int] NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
Table test_polygon:
CREATE TABLE [dbo].[test_polygon]
(
[FID] [nvarchar](36) NOT NULL,
[EXTENT_L] [float] NOT NULL,
[EXTENT_T] [float] NOT NULL,
[EXTENT_R] [float] NOT NULL,
[EXTENT_B] [float] NOT NULL,
[COORDINATES] [image] NULL,
[CHAINS] [smallint] NOT NULL,
[CLASSID] [int] NOT NULL,
[SPATIAL_KEY] [bigint] NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
Because of the post length limit (the image columns make the INSERT statements very long), the INSERT script is available here: GDrive SQL Link
SELECT SQL query:
select
Class_Name, FID,
geometry::STGeomFromWKB(b1+b2,0) as polygon,
Class_ID, Original_Name
from
(Select
cl.Class_Name, p.FID,
substring(CAST(p.Coordinates AS varbinary(max)),1,1) as b1,
substring(CAST(p.Coordinates AS varbinary(max)),3,999999) as b2,
cl.ClassID as Class_ID,
cl.Original_Name
From
test_polygon p
Inner Join
test_classes cl on cl.ClassID = p.ClassID) s_polygon
--where Class_ID = 215 --Filter#1
--where Class_Name = 'L1_County' --Filter#2
Note that Class_ID 215 corresponds to the class name 'L1_County'.
The problem: if I enable only Filter#1, the output is as expected, but if I enable only Filter#2, the query fails with a .NET error.
Expected output:
Class_Name FID polygon Class_ID Original_Name
----------- ---------------- ------------- ----------- ------------------------
L1_County Northamptonshire <long value> 215 B8USR_4DB8184E88092424
The error I get:
Msg 6522, Level 16, State 1, Line 4
A .NET Framework error occurred during execution of user-defined routine or aggregate "geometry":
System.FormatException: 24119: The Polygon input is not valid because the start and end points of the exterior ring are not the same. Each ring of a polygon must have the same start and end points.
System.FormatException:
at Microsoft.SqlServer.Types.GeometryValidator.ValidatePolygonRing(Int32 iRing, Int32 cPoints, Double firstX, Double firstY, Double lastX, Double lastY)
at Microsoft.SqlServer.Types.Validator.Execute(Transition transition)
at Microsoft.SqlServer.Types.ForwardingGeoDataSink.EndFigure()
at Microsoft.SqlServer.Types.WellKnownBinaryReader.ReadLineStringPoints(ByteOrder byteOrder, UInt32 cPoints, Boolean readZ, Boolean readM)
at Microsoft.SqlServer.Types.WellKnownBinaryReader.ReadLinearRing(ByteOrder byteOrder, Boolean readZ, Boolean readM)
at Microsoft.SqlServer.Types.WellKnownBinaryReader.ParseWkbPolygonWithoutHeader(ByteOrder byteOrder, Boolean readZ, Boolean readM)
at Microsoft.SqlServer.Types.WellKnownBinaryReader.ParseWkb(OpenGisType type)
at Microsoft.SqlServer.Types.WellKnownBinaryReader.Read(OpenGisType type, Int32 srid)
at Microsoft.SqlServer.Types.SqlGeometry.GeometryFromBinary(OpenGisType type, SqlBytes binary, Int32 srid)
What I am asking is: why do I get the error when the WHERE clause filters on Class_Name, but not when it filters on Class_ID?
I am using SQL Server 2012 Enterprise Edition; the error also reproduces in SQL Server 2008.
Edit:
Estimated execution plan for Filter#1: [image not available]
Estimated execution plan for Filter#2: [image not available]

To summarise the comments:
You are seeing this issue because your table contains invalid data. You do not see it when filtering on Class_ID because that predicate is pushed down (through the join condition cl.ClassID = p.ClassID) to the table scan of test_polygon, so only rows of class 215 are read. When test_classes.Class_Name is used as the filter, the predicate can only be applied to the scan of test_classes.
Since the Compute Scalar that calls geometry::STGeomFromWKB runs before the Join, every row of test_polygon is passed to the function, including rows that contain invalid data.
Update: even though the two plans look the same, they are not. The predicates differ for the two filters (WHERE conditions), and therefore the table scan operators output different rows.
There is no supported way to force the order of evaluation in a SQL Server query; by design, you are not supposed to rely on it.
There are two options:
Materialise (store in a table) the result of the sub-query. This simply splits the query into two separate queries: one to find the records and a second to compute data on the found results. The intermediate results are stored in a (temp) table.
Use "hacks" that coerce SQL Server into evaluating the query in a certain order.
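A minimal sketch of the first option (materialising the sub-query), written with Python's built-in sqlite3 module so it runs without SQL Server. The function parse_ring, the coordinate text format and the sample rows are hypothetical stand-ins for geometry::STGeomFromWKB and the real data; only the two-step shape carries over: first store the filtered rows, then apply the fragile function to the stored rows. In T-SQL the equivalent would be a SELECT ... INTO #s_polygon followed by a second SELECT against #s_polygon.

```python
import sqlite3


def parse_ring(coords: str) -> int:
    """Hypothetical stand-in for geometry::STGeomFromWKB.

    Parses "x,y;x,y;..." text and, like SQL Server's validator, rejects a
    ring whose first and last points differ. Returns the point count.
    """
    points = [tuple(float(v) for v in p.split(",")) for p in coords.split(";")]
    if points[0] != points[-1]:
        raise ValueError("start and end points of the ring are not the same")
    return len(points)


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.create_function("parse_ring", 1, parse_ring)
    conn.executescript(
        """
        CREATE TABLE test_classes (classid INTEGER, class_name TEXT);
        CREATE TABLE test_polygon (fid TEXT, coords TEXT, classid INTEGER);
        INSERT INTO test_classes VALUES (215, 'L1_County'), (300, 'Other');
        -- A valid (closed) ring for class 215 ...
        INSERT INTO test_polygon VALUES ('Northamptonshire', '0,0;1,0;1,1;0,0', 215);
        -- ... and an invalid (open) ring belonging to a different class.
        INSERT INTO test_polygon VALUES ('Broken', '0,0;1,0;1,1', 300);
        """
    )
    return conn


def materialise_then_parse(conn: sqlite3.Connection) -> list:
    # Step 1: find and store the rows only; no parsing happens here.
    conn.execute(
        """
        CREATE TEMP TABLE s_polygon AS
        SELECT cl.class_name, p.fid, p.coords
        FROM test_polygon AS p
        JOIN test_classes AS cl ON cl.classid = p.classid
        WHERE cl.class_name = 'L1_County'
        """
    )
    # Step 2: apply the fragile function to the stored rows only.
    return conn.execute("SELECT fid, parse_ring(coords) FROM s_polygon").fetchall()


conn = build_db()
print(materialise_then_parse(conn))
```

Because the second query only ever sees what the first one stored, the function never gets a chance to run on rows of other classes, whatever plan is chosen for the first query.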
Below is an example of a "hack":
select
Class_Name, FID,
CASE WHEN Class_Name = Class_Name THEN geometry::STGeomFromWKB(b1+b2,0) ELSE NULL END as polygon,
Class_ID, Original_Name
from
(Select
cl.Class_Name, p.FID,
substring(CAST(p.Coordinates AS varbinary(max)),1,1) as b1,
substring(CAST(p.Coordinates AS varbinary(max)),3,999999) as b2,
cl.ClassID as Class_ID,
cl.Original_Name
From
test_polygon p
Inner Join
test_classes cl on cl.ClassID = p.ClassID) s_polygon
--where Class_ID = 215 --Filter#1
where Class_Name = 'L1_County' --Filter#2
By adding a dummy CASE expression that references test_classes.Class_Name, we coerce SQL Server into evaluating the function after the JOIN has been resolved.
The plan: [image not available]
Useful Article:
http://dataeducation.com/cursors-run-just-fine/
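A runnable variant of the CASE idea, again using Python's sqlite3 with a hypothetical parse_ring standing in for STGeomFromWKB. Unlike the dummy Class_Name = Class_Name test above, this variant puts the filter condition itself inside the CASE, so the function is only called for rows that satisfy it. SQLite evaluates CASE branches lazily. In SQL Server, CASE normally evaluates its WHEN conditions in order, but the documentation lists exceptions (for example with aggregate expressions), so treat this as a strong hint rather than a guarantee.

```python
import sqlite3


def parse_ring(coords: str) -> int:
    # Hypothetical validator: raise if the ring is not closed, else return the point count.
    points = coords.split(";")
    if points[0] != points[-1]:
        raise ValueError("ring is not closed")
    return len(points)


conn = sqlite3.connect(":memory:")
conn.create_function("parse_ring", 1, parse_ring)
conn.executescript(
    """
    CREATE TABLE test_classes (classid INTEGER, class_name TEXT);
    CREATE TABLE test_polygon (fid TEXT, coords TEXT, classid INTEGER);
    INSERT INTO test_classes VALUES (215, 'L1_County'), (300, 'Other');
    INSERT INTO test_polygon VALUES ('Northamptonshire', '0,0;1,0;1,1;0,0', 215),
                                    ('Broken', '0,0;1,0;1,1', 300);
    """
)

# No WHERE clause at all: every joined row reaches the SELECT list, but the
# CASE only calls parse_ring when its WHEN condition is true.
rows = conn.execute(
    """
    SELECT p.fid,
           CASE WHEN cl.class_name = 'L1_County'
                THEN parse_ring(p.coords) ELSE NULL END AS polygon
    FROM test_polygon AS p
    JOIN test_classes AS cl ON cl.classid = p.classid
    ORDER BY p.fid
    """
).fetchall()
print(rows)
```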

Related

Add a range value to an existing table partition in Azure Synapse Analytics

I need to add a new boundary value to an existing table:
CREATE TABLE [STG].[IHS_POLK]
(
[CCYYQ_NBR] [decimal](5, 0) NOT NULL,
[COUNTRY] [varchar](2) NOT NULL,
[STATE] [varchar](35) NOT NULL,
[COUNTY] [varchar](35) NOT NULL,
[ZIP] [varchar](6) NOT NULL,
[TOTAL] [varchar](10) NULL
)
WITH
(
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED COLUMNSTORE INDEX,
PARTITION
([CCYYQ_NBR] RANGE LEFT FOR VALUES (20191, 20192, 20193, 20194, 20201, 20202, 20203, 20204, 20211, 20212, 20213, 20214, 20221, 20222, 20223, 20224, 20231, 20232, 20233, 20234, 20241, 20242, 20243, 20244, 20251, 20252, 20253, 20254))
)
GO
Now I need to add the value 20261 to the range. What query should I use?
You can use an ALTER TABLE statement as below. If the table contains data, disable the columnstore index before running it and rebuild the index afterwards. Refer to this document for more details.
ALTER TABLE [STG].[IHS_POLK] SPLIT RANGE (20261);
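To check where a given CCYYQ_NBR value lands, here is a small Python sketch of RANGE LEFT placement (an illustration of the semantics only, not how Synapse evaluates partitions internally). With RANGE LEFT, a value equal to a boundary belongs to the partition on that boundary's left, so bisect_left over the sorted boundary list gives the 1-based partition number.

```python
from bisect import bisect_left

# Boundary values from the CREATE TABLE statement above (28 boundaries -> 29 partitions).
BOUNDARIES = [
    20191, 20192, 20193, 20194, 20201, 20202, 20203, 20204,
    20211, 20212, 20213, 20214, 20221, 20222, 20223, 20224,
    20231, 20232, 20233, 20234, 20241, 20242, 20243, 20244,
    20251, 20252, 20253, 20254,
]


def partition_number(value: int, boundaries: list) -> int:
    """1-based partition for RANGE LEFT: a value equal to a boundary
    falls into the partition to the left of (below) that boundary."""
    return bisect_left(boundaries, value) + 1


print(len(BOUNDARIES) + 1)                  # partitions before the split
print(partition_number(20254, BOUNDARIES))  # last bounded partition
print(partition_number(20261, BOUNDARIES))  # open-ended last partition

after_split = sorted(BOUNDARIES + [20261])
print(partition_number(20261, after_split))  # now bounded by the new value
print(partition_number(20262, after_split))  # new open-ended last partition
```

Since 20261 is above the current last boundary (20254), the split carves the new partition out of the final, open-ended partition, which is the one holding any rows with values above 20254.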

Dealing with a huge table (100M+ rows)

I have a table with around 100 million rows, and it keeps growing. Because the table is queried frequently, I need to come up with a solution to optimise this.
First, here is the model:
CREATE TABLE [dbo].[TreningExercises](
[TreningExerciseId] [uniqueidentifier] NOT NULL,
[NumberOfRepsForExercise] [int] NOT NULL,
[CycleNumber] [int] NOT NULL,
[TreningId] [uniqueidentifier] NOT NULL,
[ExerciseId] [int] NOT NULL,
[RoutineExerciseId] [uniqueidentifier] NULL)
Here is the Trenings table:
CREATE TABLE [dbo].[Trenings](
[TreningId] [uniqueidentifier] NOT NULL,
[DateTimeWhenTreningCreated] [datetime] NOT NULL,
[Score] [int] NOT NULL,
[NumberOfFinishedCycles] [int] NOT NULL,
[PercentageOfCompleteness] [int] NOT NULL,
[IsFake] [bit] NOT NULL,
[IsPrivate] [bit] NOT NULL,
[UserId] [nvarchar](128) NOT NULL,
[AllRoutinesId] [bigint] NOT NULL,
[Name] [nvarchar](max) NULL,
)
Indexes (other than the PKs, which are clustered):
TreningExercises:
TreningId (also FK)
ExerciseId (also FK)
Trenings:
UserId (also FK)
AllRoutinesId (also FK)
Score
DateTimeWhenTreningCreated (ordered by DateTimeWhenTreningCreated DESC)
And here is the example of the most commonly executed query:
DECLARE @userId VARCHAR(40)
,@exerciseId INT;
SELECT TOP (1) R.[TreningExerciseId] AS [TreningExerciseId]
,R.[NumberOfRepsForExercise] AS [NumberOfRepsForExercise]
,R.[TreningId] AS [TreningId]
,R.[ExerciseId] AS [ExerciseId]
,R.[RoutineExerciseId] AS [RoutineExerciseId]
,R.[DateTimeWhenTreningCreated] AS [DateTimeWhenTreningCreated]
FROM (
SELECT TE.[TreningExerciseId] AS [TreningExerciseId]
,TE.[NumberOfRepsForExercise] AS [NumberOfRepsForExercise]
,TE.[TreningId] AS [TreningId]
,TE.[ExerciseId] AS [ExerciseId]
,TE.[RoutineExerciseId] AS [RoutineExerciseId]
,T.[DateTimeWhenTreningCreated] AS [DateTimeWhenTreningCreated]
FROM [dbo].[TreningExercises] AS TE
INNER JOIN [dbo].[Trenings] AS T ON TE.[TreningId] = T.[TreningId]
WHERE (T.[UserId] = @userId)
AND (TE.[ExerciseId] = @exerciseId)
) AS R
ORDER BY R.[DateTimeWhenTreningCreated] DESC
Execution plan: link
Apologies if it is a bit unreadable or unoptimised; it was generated by an ORM (Entity Framework) and I only edited it slightly.
According to Azure's SQL Analytics tool, this query has the most impact on my DB, and even though it usually doesn't take long to execute, from time to time it causes spikes in DB I/O.
There is also a bit of business logic involved; to simplify it: 99% of the time I need data that is less than a year old.
What are my best options regarding querying and table size?
My thoughts on querying, either:
Create indexed view OR
Add Date and UserId fields to the TreningExercises table OR
Some option that I haven't thought of :)
Regarding table size, either:
Partition table (probably by date) OR
Move most of the data (or all of it) to some NoSQL key-value store OR
Some option that I haven't thought of :)
What are your thoughts about these problems, how should I approach solving them?
If you add the following columns to the index "ix_TreninID":
NumberOfRepsForExercise
ExerciseId
RoutineExerciseId
it becomes a "covering index" and eliminates the key lookup, which accounts for about 95% of the plan's estimated cost.
Give it a go and post back.
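A sketch of why covering the query removes the lookup, using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3. This is SQLite, not SQL Server, and the ids are INTEGER instead of uniqueidentifier; both are assumptions made so the example runs locally. Here TreningExerciseId is SQLite's rowid, which every index carries, much as SQL Server carries the clustered key in each nonclustered index. SQLite has no INCLUDE clause, so the extra columns are appended as key columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE TreningExercises (
        TreningExerciseId INTEGER PRIMARY KEY,  -- rowid, plays the clustered-key role
        NumberOfRepsForExercise INTEGER NOT NULL,
        TreningId INTEGER NOT NULL,
        ExerciseId INTEGER NOT NULL,
        RoutineExerciseId INTEGER
    )
    """
)

QUERY = """
    SELECT TreningExerciseId, NumberOfRepsForExercise, TreningId,
           ExerciseId, RoutineExerciseId
    FROM TreningExercises
    WHERE TreningId = ? AND ExerciseId = ?
"""


def plan(sql: str) -> str:
    # EXPLAIN QUERY PLAN rows end with a human-readable detail column; join them.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (1, 2)).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


# 1) Index on the join key only: the other selected columns need a table lookup.
conn.execute("CREATE INDEX ix_TreningId ON TreningExercises (TreningId)")
before = plan(QUERY)

# 2) Same key plus the remaining selected columns: the index now covers the query.
conn.execute("DROP INDEX ix_TreningId")
conn.execute(
    "CREATE INDEX ix_TreningId ON TreningExercises "
    "(TreningId, ExerciseId, NumberOfRepsForExercise, RoutineExerciseId)"
)
after = plan(QUERY)

print(before)
print(after)
```

In SQL Server the corresponding index would more likely be written with INCLUDE, e.g. CREATE NONCLUSTERED INDEX ix_TreningId_ExerciseId ON dbo.TreningExercises (TreningId, ExerciseId) INCLUDE (NumberOfRepsForExercise, RoutineExerciseId), where the index name is hypothetical. Keeping ExerciseId in the key rather than in INCLUDE is a design choice that lets the seek use both equality predicates.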

SQL Server: Grouping hierarchical data

I have a document table (see below for the definition).
In this table, a root document has an OriginalDocID of NULL.
Every time a revision is made, a new entry is added with the parent's DocumentID as its OriginalDocID.
What I am looking to do is group/partition everything around the original document, i.e. the one whose OriginalDocID is NULL.
Each document can have multiple revisions from one point of origin,
meaning we can have:
Doc Id 1 -> 2 -> 3
2 -> 8 -> 9
1 -> 4 -> 7
5 -> 10
So what I hope to get back is every row with its root document appended.
I hope this makes sense. For the life of me I can't wrap my head around a sufficient query.
CREATE TABLE [dbo].[Document](
[DocumentID] [int] IDENTITY(1,1) NOT NULL,
[DocumentName] [varchar](max) NOT NULL,
[ContentType] [varchar](50) NULL,
[DocumentText] [varchar](max) NULL,
[DateCreated] [datetime] NULL,
[DocumentTypeId] [int] NULL,
[Note] [varchar](8000) NULL,
[RefID] [int] NULL,
[Version] [int] NULL,
[Active] [bit] NULL,
[OriginalDocID] [int] NULL
)
You'll need to use a recursive CTE for this. That's a query that refers back to itself, so it can traverse a hierarchy and gather information as it works its way down (or up) the levels of that hierarchy.
In your case, something like:
WITH docCTE AS  -- T-SQL uses plain WITH; RECURSIVE is not a keyword here
(
    /* Recursive seed: every root document */
    SELECT
        CAST(NULL AS int) AS parentdoc,
        d.DocumentID,
        0 AS depth,
        d.DocumentID AS originalDocument,
        CAST(d.DocumentID AS varchar(100)) AS docpath
    FROM dbo.Document AS d
    WHERE d.OriginalDocID IS NULL
    UNION ALL
    /* Recursive term: children of the rows found so far */
    SELECT
        c.DocumentID AS parentdoc,
        d.DocumentID,
        c.depth + 1 AS depth,
        c.originalDocument,
        -- types must match the seed exactly, hence the outer CAST to varchar(100)
        CAST(c.docpath + '>' + CAST(d.DocumentID AS varchar(10)) AS varchar(100)) AS docpath
    FROM docCTE AS c
    INNER JOIN dbo.Document AS d ON d.OriginalDocID = c.DocumentID
    WHERE c.depth <= 15 /* keep it from cycling in case of a bad hierarchy */
)
SELECT * FROM docCTE;
The recursive CTE is made up of two parts:
The recursive seed, which kicks off the query. This is every document record whose OriginalDocID is NULL.
The recursive term, where we join the table back to the recursive CTE, establishing the parent/child relationship.
In your case, we capture the DocumentID in the recursive seed as originalDocument so that we can carry it down through each record found as we traverse the hierarchy of documents.
These can be a little overwhelming when you get started, but after you write a few, they become second nature (and you'll find them really helpful as you encounter more of this type of data).
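A runnable sketch of the same seed/recursive-term structure, using Python's sqlite3 with a cut-down Document table (only DocumentID and OriginalDocID; dropping the other columns is an assumption for brevity) and the hierarchy from the question. Note that SQLite requires the RECURSIVE keyword, whereas SQL Server uses a plain WITH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Document (DocumentID INTEGER PRIMARY KEY, OriginalDocID INTEGER)")
# Hierarchy from the question: 1 -> 2 -> 3, 2 -> 8 -> 9, 1 -> 4 -> 7, 5 -> 10
conn.executemany(
    "INSERT INTO Document VALUES (?, ?)",
    [(1, None), (2, 1), (3, 2), (8, 2), (9, 8), (4, 1), (7, 4), (5, None), (10, 5)],
)

rows = conn.execute(
    """
    WITH RECURSIVE docCTE (parentdoc, DocumentID, depth, originalDocument, docpath) AS (
        -- Seed: every root document
        SELECT NULL, DocumentID, 0, DocumentID, CAST(DocumentID AS TEXT)
        FROM Document
        WHERE OriginalDocID IS NULL
        UNION ALL
        -- Recursive term: children of the rows found so far
        SELECT c.DocumentID, d.DocumentID, c.depth + 1, c.originalDocument,
               c.docpath || '>' || d.DocumentID
        FROM docCTE AS c
        JOIN Document AS d ON d.OriginalDocID = c.DocumentID
        WHERE c.depth <= 15  -- guard against cycles in bad data
    )
    SELECT DocumentID, originalDocument, docpath FROM docCTE ORDER BY DocumentID
    """
).fetchall()

for row in rows:
    print(row)
```

Grouping or partitioning by originalDocument (e.g. GROUP BY originalDocument, or PARTITION BY originalDocument in a window function) then gives one group per root document.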

Why does the database not recognize the columns and table?

I have created a table, and there is data in it.
I can insert, update, etc.
CREATE TABLE [dbo].[cse_reports_month](
[report_id] [int] IDENTITY(1,1) NOT NULL,
[starburst_dept_name] [varchar](50) NULL,
-- ... more columns
[shooting_total_stars] [float] NULL,
) ON [PRIMARY]
But for some reason, when I hover over a column name, for example in the Select Top 1000 Rows query:
SELECT TOP 1000 [report_id]
,[starburst_dept_name]
,[starburst_dept_average]
-- ... more columns
,[rising_star_dept_name]
FROM [Intranet].[dbo].[cse_reports_month]
It says "Invalid column name 'starburst_dept_name'".
It seems as if it does not recognize the table, although all my other tables are fine.
Why does it not recognize the columns?
I am using Microsoft SQL Server Management Studio 10.0.2531.0.

Creating trigger in SQL Server 2005 (has to work in 2008 too) to prevent duplicates?

I have a table into which I insert data with the following query (from C# code):
INSERT INTO [BazaZarzadzanie].[dbo].[Wycena]
([KlienciPortfeleKontaID]
,[WycenaData]
,[WycenaTyp]
,[WycenaWartosc]
,[WycenaWaluta]
,[WycenaUzytkownik]
,[WycenaUzytkownikData])
VALUES
(@varKlienciPortfeleKontaID
,@varWycenaData
,@varWycenaTyp
,@varWycenaWartosc
,@varWycenaWaluta
,@varWycenaUzytkownik
,@varWycenaUzytkownikData)
Table creation script looks like this:
CREATE TABLE [dbo].[Wycena](
[KlienciPortfeleKontaID] [int] NULL,
[WycenaData] [datetime] NULL,
[WycenaTyp] [int] NULL,
[InID] [int] NULL,
[WycenaIlosc] [decimal](18, 2) NULL,
[WycenaCena] [decimal](18, 2) NULL,
[WycenaWartosc] [decimal](18, 2) NULL,
[WycenaWaluta] [nvarchar](3) NULL,
[WycenaUzytkownik] [nvarchar](50) NULL,
[WycenaUzytkownikData] [datetime] NULL
) ON [PRIMARY]
It also has a couple of foreign keys, but nothing I could make a primary/unique key. So, to prevent duplicates, I thought I would use a trigger, since to know that a row is a duplicate I actually have to test every single value of that row (well, maybe not the last two columns). This table has around 2 million rows.
Is this a good idea, or is there a better way?
Below is the trigger I've created (I haven't tested whether it works):
CREATE TRIGGER [dbo].[trg_WycenaDuplicateCheck]
ON [dbo].[Wycena] FOR INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- FOR INSERT fires after the insert, so the new rows are already in Wycena
    -- and each one matches at least itself; COUNT(*) > 1 therefore means a duplicate.
    -- Caveat: "=" never matches NULLs, so duplicates containing NULLs are not caught.
    IF EXISTS (SELECT 1
               FROM INSERTED AS i
               INNER JOIN [dbo].[Wycena] AS w
                   ON  i.[KlienciPortfeleKontaID] = w.[KlienciPortfeleKontaID]
                   AND i.[WycenaData] = w.[WycenaData]
                   AND i.[WycenaTyp] = w.[WycenaTyp]
                   AND i.[InID] = w.[InID]
                   AND i.[WycenaIlosc] = w.[WycenaIlosc]
                   AND i.[WycenaCena] = w.[WycenaCena]
                   AND i.[WycenaWartosc] = w.[WycenaWartosc]
                   AND i.[WycenaWaluta] = w.[WycenaWaluta]
               GROUP BY i.[KlienciPortfeleKontaID], i.[WycenaData], i.[WycenaTyp], i.[InID],
                        i.[WycenaIlosc], i.[WycenaCena], i.[WycenaWartosc], i.[WycenaWaluta]
               HAVING COUNT(*) > 1)
    BEGIN
        -- Severity 16 (not 10) so the client receives an error, not an informational message.
        RAISERROR('>>>DUPLICATES PREVENTED<<<', 16, 1);
        ROLLBACK TRANSACTION;
    END
END
Create a "unique" index on the fields you care about.
CREATE UNIQUE INDEX IX_YOUR_FAVORITE_NAME
ON [dbo].[Wycena](... list of columns goes here ...)
It seems you need to look at UNIQUE constraints.
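A small sketch of the unique-index approach, run against SQLite through Python's sqlite3 so it can be tried locally. It assumes the eight columns the trigger compares are the ones that define a duplicate; the index name and the sample values are made up. One difference to keep in mind: SQL Server treats NULLs as equal when enforcing a unique index, so two rows that match except for NULLs in the same columns still collide, whereas SQLite treats NULLs as distinct. The sample therefore avoids NULLs.

```python
import sqlite3

# Assumed duplicate-defining columns: the eight the trigger compares.
COLUMNS = (
    "KlienciPortfeleKontaID", "WycenaData", "WycenaTyp", "InID",
    "WycenaIlosc", "WycenaCena", "WycenaWartosc", "WycenaWaluta",
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Wycena (" + ", ".join(COLUMNS)
    + ", WycenaUzytkownik TEXT, WycenaUzytkownikData TEXT)"
)
# One unique index replaces the whole trigger (hypothetical index name).
conn.execute(
    "CREATE UNIQUE INDEX IX_Wycena_NoDuplicates ON Wycena (" + ", ".join(COLUMNS) + ")"
)

INSERT = "INSERT INTO Wycena VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"
# Made-up sample row: eight key columns, then user name and timestamp.
row = (1, "2024-01-31", 1, 7, "10.00", "2.50", "25.00", "PLN", "user1", "2024-02-01")


def try_insert(values) -> bool:
    """Return True if the row was stored, False if the unique index rejected it."""
    try:
        conn.execute(INSERT, values)
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(row))                               # first copy is stored
print(try_insert(row[:8] + ("user2", "2024-02-02")))  # same eight key columns
print(conn.execute("SELECT COUNT(*) FROM Wycena").fetchone()[0])
```

The last two columns are left out of the index, matching the question's remark that they need not be compared.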