get number of rows select statement will fetch (without real fetching) - sql

Lets say I have many sql statements like this one:
select *
from [A]
where a in (
select a
from [B]
where b = 'c'
)
order by d;
Since my database is huge, I just need to determine how many rows this query will fetch. Of course I can really fetch all rows and count it but my intention is to avoid fetching since that would be a big overhead.
I have tried to extend query as follows:
select count (*)
from (
select *
from [A]
where a in (
select a
from [B]
where b = 'c'
)
order by d
) as table;
That works fine for some tables but for some (like this one in example) SQL server throws this:
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.
Consider that I'm not allowed to change any original query, I can just extend it...
Any idea?
Thanks.
EDIT: I'm pretty sure there is some solution related to ##ROWCOUNT field, but not sure how to use it...

Just remove the order by in the subquery. It doesn't affect the number of rows:
select count(*)
from (select *
from [A]
where a in (select a from [B] where b = 'c')
) as table;
Actually, this is better written as:
select count(*)
from [A]
where a in (select a from [B] where b = 'c')
That is, just replace the select * with select count(*).
Finally, if you have to keep the queries the same, then use top 100 percent:
select count(*)
from (select top 100 percent *
from [A]
where a in (select a from [B] where b = 'c')
order by d
) as table;
This does require changing the original queries, but in a way that does not affect what they output and does allow them to be used as ctes/subqueries.
You are allowed to use order by in subqueries when you also use top.
EDIT:
If you are using dynamic SQL, you might have to do something like:
#sql = 'select count(*) from (' +
(case when #sql not like 'SELECT TOP %'
then stuff(#sql, 1, 7, 'SELECT top 100 percent')
else #sql
end) +
+ ')';
The logic could be a bit more complicated if your SQL is not well formatted.

Related

postgres: COUNT, DISTINCT is not implemented for window functions

I am trying to use COUNT(DISTINC column) OVER(PARTITION BY column) when I am using COUNT + window function(OVER).
I get an error like the one in the title and can't get it to work.
I have looked into how to deal with this error, but I have not found an example of how to deal with such a complex query as the one below.
I cannot find an example of how to deal with such a complex query as shown below, and I am not sure how to handle it.
The COUNT part of the problem exists on line 65.
How can such a complex query be resolved without slowing down?
WITH RECURSIVE "cte" AS((
SELECT
"videos_productvideocomment"."id",
"videos_productvideocomment"."user_id",
"videos_productvideocomment"."video_id",
"videos_productvideocomment"."parent_id",
"videos_productvideocomment"."text",
"videos_productvideocomment"."commented_at",
"videos_productvideocomment"."edited_at",
"videos_productvideocomment"."created_at",
"videos_productvideocomment"."updated_at",
"videos_productvideocomment"."id" AS "root_id"
FROM
"videos_productvideocomment"
WHERE
(
"videos_productvideocomment"."parent_id" IS NULL
AND "videos_productvideocomment"."video_id" = 'f264433c-c0af-49cc-8b40-84453da71b2d'
)
) UNION(
SELECT
"videos_productvideocomment"."id",
"videos_productvideocomment"."user_id",
"videos_productvideocomment"."video_id",
"videos_productvideocomment"."parent_id",
"videos_productvideocomment"."text",
"videos_productvideocomment"."commented_at",
"videos_productvideocomment"."edited_at",
"videos_productvideocomment"."created_at",
"videos_productvideocomment"."updated_at",
"cte"."root_id" AS "root_id"
FROM
"videos_productvideocomment"
INNER JOIN
"cte"
ON "videos_productvideocomment"."parent_id" = "cte"."id"
))
SELECT
*,
EXISTS(
SELECT
(1) AS "a"
FROM
"videos_productvideolikecomment" U0
WHERE
(
U0."comment_id" = t."id"
AND U0."user_id" = '3bd3bc86-0335-481e-9fd2-eb2fb1168f48'
)
LIMIT 1
) AS "liked"
FROM
(
SELECT DISTINCT
"cte"."id",
"cte"."created_at",
"cte"."updated_at",
"cte"."user_id",
"cte"."text",
"cte"."commented_at",
"cte"."edited_at",
"cte"."parent_id",
"cte"."video_id",
"cte"."root_id" AS "root_id",
COUNT(DISTINCT "cte"."root_id") OVER(PARTITION BY "cte"."root_id") AS "reply_count", <--- here
COUNT("videos_productvideolikecomment"."id") OVER(PARTITION BY "cte"."id") AS "liked_count"
FROM
"cte"
LEFT OUTER JOIN
"videos_productvideolikecomment"
ON (
"cte"."id" = "videos_productvideolikecomment"."comment_id"
)
) t
WHERE
t."id" = t."root_id"
ORDER BY
CASE
WHEN t."user_id" = '3bd3bc86-0335-481e-9fd2-eb2fb1168f48' THEN 0
ELSE 1
END ASC,
"liked_count" DESC
DISTINCT will look for duplicates and remove it, but in big data it will take a lot of time to process this query, you should process the middle of the record in the programming part I think it will be fast than. Thank

Conditional statements in "WHERE" in SQL Server

What I want to achieve is to have a switch case in the where clause. I want to test if this statement returns something, if it returns null, use this instead.
Sample:
SELECT [THIS_COLUMN]
FROM [THIS_TABLE]
WHERE (IF THIS [ID] RETURNS NULL THEN DO THIS SUBQUERY)
What I mean is that it will do this query first.
SELECT [THIS_COLUMN]
FROM [THIS_TABLE]
WHERE [ID] = 'SOMETHING'
If this returns NULL, do this query instead:
SELECT [THIS_COLUMN]
FROM [THIS_TABLE]
WHERE ID = (SELECT [SOMETHING] FROM [OTHER_TABLE]
WHERE [SOMETHING_SPECIFIC] = 'SOMETHING SPECIFIC')
Note that the expected results from the intended query varies from 30 rows up to 15k rows. Hope it helps.
Adding more information:
The results for this query will be used for another query but will just focus on this query.
Providing a real case scenario:
[THIS_COLUMN] is expected to have a list of VALUES.
[THIS_TABLE] contains the latest data only(let's say 1 year's worth of data) while the [OTHER_TABLE] contains the historical data.
What I want to achieve is when I query for a data that is not with in the 1 year's worth of data, IE 'SOMETHING' is not with in the 1 year scope(or in my case it returns NULL), I will use the other query where I query the 'SOMETHING_SPECIFIC'(Or may be 'SOMETHING' from the first statement makes more sense) from the historical table.
If I as reading through the lines correctly, this might work:
SELECT THIS_COLUMN
FROM dbo.THIS_TABLE TT
WHERE TT.ID = 'SOMETHING'
OR TT.ID = (SELECT OT.SOMETHING
FROM dbo.OTHER_TABLE OT
WHERE OT.SOMETHING_SPECIFIC = 'SOMETHING SPECIFIC'
AND NOT EXISTS (SELECT 1
FROM dbo.THIS_TABLE sq
WHERE sq.ID = 'SOMETHING'
AND THIS_COLUMN IS NOT NULL))
Note, however, that this could easily not be particularly performant.
You an use union all and not exists:
select this_column
from this_table
where id = 'something'
union all
select this_column
from this_table
where
not exists (select this_column from this_table where id = 'something')
and id = (select something from other_table where something_specific = 'something specific')
The first union member attempts to find rows that match the first condition, while the other one uses the subquery - the not exists prevents the second member to return something if the first member found a match.
90% of the time you can use a query-batch (i.e. a sequence of T-SQL statements) in a single SqlCommand object or SQL Server client session, so with that in-mind you could do this:
DECLARE #foo nvarchar(50) = (
SELECT
[THIS_COLUMN]
FROM
[THIS_TABLE]
WHERE
[ID] = 'SOMETHING'
);
IF #foo IS NULL
BEGIN
SELECT
[THIS_COLUMN]
FROM
[THIS_TABLE]
WHERE
[ID] = (
SELECT
[SOMETHING]
FROM
[OTHER_TABLE]
WHERE
[SOMETHING_SPECIFIC] = 'SOMETHING SPECIFIC'
)
END
ELSE
BEGIN
SELECT #foo AS [THIS_COLUMN];
END
That said, SELECT ... FROM ... WHERE x IN ( SELECT y FROM ... ) is a code-smell in a query - you probably need to rethink your solution entirely.

MS SQL does not return the expected top row when ordering by DIFFERENCE()

I have noticed strange behaviour in some SQL code used for address matching at the company I work for & have created some test SQL to illustrate the issue.
; WITH Temp (Id, Diff) AS (
SELECT 9218, 0
UNION
SELECT 9219, 0
UNION
SELECT 9220, 0
)
SELECT TOP 1 * FROM Temp ORDER BY Diff DESC
Returns 9218 but
; WITH Temp (Id, Name) AS (
SELECT 9218, 'Sonnedal'
UNION
SELECT 9219, 'Lammermoor'
UNION
SELECT 9220, 'Honeydew'
)
SELECT TOP 1 *, DIFFERENCE(Name, '') FROM Temp ORDER BY DIFFERENCE(Name, '') DESC
returns 9219 even though the Difference() is 0 for all records as you can see here:
; WITH Temp (Id, Name) AS (
SELECT 9218, 'Sonnedal'
UNION
SELECT 9219, 'Lammermoor'
UNION
SELECT 9220, 'Honeydew'
)
SELECT *, DIFFERENCE(Name, '') FROM Temp ORDER BY DIFFERENCE(Name, '') DESC
which returns
9218 Sonnedal 0
9219 Lammermoor 0
9220 Honeydew 0
Does anyone know why this happens? I am writing C# to replace existing SQL & need to return the same results so I can test that my code produces the same results. But I can't see why the actual SQL used returns 9219 rather than 9218 & it doesn't seem to make sense. It seems it's down to the Difference() function but it returns 0 for all the record in question.
When you call:
SELECT TOP 1 *, DIFFERENCE(Name, '')
FROM Temp l
ORDER BY DIFFERENCE(Name, '') DESC
All three records have a DIFFERENCE value of zero, and hence SQL Server is free to choose from any of the three records for ordering. That is to say, there is no guarantee which order you will get. The same is true for your second query. Actually, it is possible that the ordering for the same query could even change over time. In practice, if you expect a certain ordering, you should provide exact logic for it, e.g.
SELECT TOP 1 *
FROM Temp
ORDER BY Id;

Yield Return equivalent in SQL Server

I am writing down a view in SQL server (DWH) and the use case pseudo code is:
-- Do some calculation and generate #Temp1
-- ... contains other selects
-- Select statement 1
SELECT * FROM Foo
JOIN #Temp1 tmp on tmp.ID = Foo.ID
WHERE Foo.Deleted = 1
-- Do some calculation and generate #Temp2
-- ... contains other selects
-- Select statement 2
SELECT * FROM Foo
JOIN #Temp2 tmp on tmp.ID = Foo.ID
WHERE Foo.Deleted = 1
The result of the view should be:
Select Statement 1
UNION
Select Statement 2
The intended behavior is the same as the yield returnin C#. Is there a way to tell the view which SELECT statements are actually part of the result and which are not? since the small calculations preceding what I need also contain selects.
Thank you!
Yield return in C# returns rows one at a time as they appear in some underlying function. This concept does not exist in SQL statements. SQl is set-based, returning the entire result set, conceptually as a unit. (That said, sometimes queries run slowly and you will see rows returned slowly or in batches.)
You can control the number of rows being returns using TOP (in SQL Server). You can select particular rows to be returned using WHERE statements. However, you cannot specify a UNION statement that conditionally returns rows from some components but not others.
The closest you may be able to come is something like:
if UseTable1Only = 'Y'
select *
from Table1
else if UseTable2Only = 'Y'
select *
from Table2
else
select *
from table1
union
select *
from table2
You can do something similar using dynamic SQL, by constructing the statement as a string and then executing it.
I found a better work around. It might be helpful for someone else. It is actually to include all the calculation inside WITH statements instead of doing them in the view core:
WITH Temp1 (ID)
AS
(
-- Do some calculation and generate #Temp1
-- ... contains other selects
)
, Temp2 (ID)
AS
(
-- Do some calculation and generate #Temp2
-- ... contains other selects
)
-- Select statement 1
SELECT * FROM Foo
JOIN Temp1 tmp on tmp.ID = Foo.ID
WHERE Foo.Deleted = 1
UNION
-- Select statement 2
SELECT * FROM Foo
JOIN Temp2 tmp on tmp.ID = Foo.ID
WHERE Foo.Deleted = 1
The result will be of course the UNION of all the outiside SELECT statements.

Reuse subquery result in WHERE-Clause for INSERT

i am using Microsoft SQL Server 2008
i would like to save the result of a subquery to reuse it in a following subquery.
Is this possible?
What is best practice to do this? (I am very new to SQL)
My query looks like:
INSERT INTO [dbo].[TestTable]
(
[a]
,[b]
)
SELECT
(
SELECT TOP 1 MAT_WS_ID
FROM #TempTableX AS X_ALIAS
WHERE OUTERBASETABLE.LT_ALL_MATERIAL = X_ALIAS.MAT_RM_NAME
)
,(
SELECT TOP 1 MAT_WS_NAME
FROM #TempTableY AS Y_ALIAS
WHERE Y_ALIAS.MAT_WS_ID = MAT_WS_ID
--(
--SELECT TOP 1 MAT_WS_ID
--FROM #TempTableX AS X_ALIAS
--WHERE OUTERBASETABLE.LT_ALL_MATERIAL = X_ALIAS.MAT_RM_NAME
--)
)
FROM [dbo].[LASERTECHNO] AS OUTERBASETABLE
My question is:
Is this correct what i did.
I replaced the second SELECT Statement in the WHERE-Clause for [b] (which is commented out and exactly the same as for [a]), with the result of the first SELECT Statement of [a] (=MAT_WS_ID).
It seems to give the right results.
But i dont understand why!
I mean MAT_WS_ID is part of both temporary tables X_ALIAS and Y_ALIAS.
So in the SELECT statement for [b], in the scope of the [b]-select-query, MAT_WS_ID could only be known from the Y_ALIAS table. (Or am i wrong, i am more a C++, maybe the scope things in SQL and C++ are totally different)
I just wannt to know what is the best way in SQL Server to reuse an scalar select result.
Or should i just dont care and copy the select for every column and the sql server optimizes it by its own?
One approach would be outer apply:
SELECT mat.MAT_WS_ID
, (
SELECT TOP 1 MAT_WS_NAME
FROM #TempTableY AS Y_ALIAS
WHERE Y_ALIAS.MAT_WS_ID = mat.MAT_WS_ID
)
FROM [dbo].[LASERTECHNO] AS OUTERBASETABLE
OUTER APPLY
(
SELECT TOP 1 MAT_WS_ID
FROM #TempTableX AS X_ALIAS
WHERE OUTERBASETABLE.LT_ALL_MATERIAL = X_ALIAS.MAT_RM_NAME
) as mat
You could rank rows in #TempTableX and #TempTableY partitioning them by MAT_RM_NAME in the former and by MAT_WS_ID in the latter, then use normal joins with filtering by rownum = 1 in both tables (rownum being the column containing the ranking numbers in each of the two tables):
WITH x_ranked AS (
SELECT
*,
rownum = ROW_NUMBER() OVER (PARTITION BY MAT_RM_NAME ORDER BY (SELECT 1))
FROM #TempTableX
),
y_ranked AS (
SELECT
*,
rownum = ROW_NUMBER() OVER (PARTITION BY MAT_WS_ID ORDER BY (SELECT 1))
FROM #TempTableY
)
INSERT INTO dbo.TestTable (a, b)
SELECT
x.MAT_WS_ID,
y.MAT_WS_NAME
FROM dbo.LASERTECHNO t
LEFT JOIN x_ranked x ON t.LT_ALL_MATERIAL = x.MAT_RM_NAME AND x.rownum = 1
LEFT JOIN y_ranked y ON x.MAT_WS_ID = y.MAT_WS_ID AND y.rownum = 1
;
The ORDER BY (SELECT 1) bit is a trick to specify an indeterminate ordering, which, accordingly, would result in indeterminate rownum = 1 rows picked by the query. That is to more or less duplicate your TOP 1 without an explicit order, but I would recommend you to specify a more sensible ORDER BY clause to make the results more predictable.