SQL Server 'AS' alias unexpected syntax - sql

I came across the following T-SQL today:
select c from (select 1 union all select 1) as d(c)
which yields the following result:
c
-----------
1
1
The part that confused me was d(c).
While trying to understand what was going on, I modified the T-SQL to:
select c, b from (select 1, 2 union all select 3, 4) m(c, b)
which yields the following result:
c b
----------- -----------
1 2
3 4
It was clear that d and m are table aliases, while the letters in parentheses, c and b, are column names.
I wasn't able to find the relevant documentation on MSDN, but I'm curious:
Are you aware of this syntax?
What would be a useful use case?

select c from (select 1 union all select 1) as d(c)
is the same as
select c from (select 1 as c union all select 1) as d
In the first query you did not name the column(s) inside your subquery, but named them outside it.
In the second query you name the column(s) inside the subquery.
If you try it like this (without naming the column(s) anywhere):
select c from (select 1 union all select 1) as d
you will get the following error:
No column name was specified for column 1 of 'd'
This is also covered in the documentation.
As for usage, some people prefer the first form and some the second; use whichever you prefer. The result is the same.
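A quick way to convince yourself that the two spellings name the same column is to run both and compare the results. The sketch below uses Python's built-in sqlite3 module purely for convenience. SQLite does not accept a column list after a derived-table alias, so the column list is attached to a CTE name instead (WITH d(c) AS ...), which is an analogous but not identical construct to the T-SQL in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column named outside the query body, via the column list on the CTE name.
named_outside = conn.execute(
    "WITH d(c) AS (SELECT 1 UNION ALL SELECT 1) SELECT c FROM d"
).fetchall()

# Column named inside the subquery with an ordinary column alias.
named_inside = conn.execute(
    "SELECT c FROM (SELECT 1 AS c UNION ALL SELECT 1) AS d"
).fetchall()

print(named_outside, named_inside)
```

Both queries return the same two rows; only the place where the name c is attached differs.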

An observation: the table value constructor (values) gives you no way of naming the columns inside it, which makes it necessary to name the columns after the table alias:
select * from
(values
(1,2) -- can't give a column name here
,(3,4)
) as tableName(column1,column2) -- gotta do it here
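The same idea can be tried out without a SQL Server instance. This is only a sketch in Python's sqlite3: SQLite lets a VALUES list be given column names through a CTE column list rather than through a derived-table alias, and the names x and y are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The VALUES rows carry no column names of their own; the list (x, y)
# after the CTE name is what makes them addressable.
cur = conn.execute(
    "WITH t(x, y) AS (VALUES (1, 2), (3, 4)) SELECT x, y FROM t ORDER BY x"
)
column_names = [col[0] for col in cur.description]
rows = cur.fetchall()

print(column_names, rows)
```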

You've already had comments pointing you to the documentation on how derived tables work, but none that answer your question about useful use cases for this functionality.
Personally, I find this functionality useful whenever I want to create a set of addressable values that will be used extensively in a statement, or when I want to duplicate rows for whatever reason.
An example of addressable values would be a much more complex version of the following, in which the calculated values in the v derived table can be used many times over via more sensible names, rather than as repeated calculations that are hard to follow:
select p.ProductName
,p.PackPricePlusVAT - v.PackCost as GrossRevenue
,etc
from dbo.Products as p
cross apply(values(p.UnitsPerPack * p.UnitCost
,p.UnitPrice * p.UnitsPerPack * 1.2
,etc
)
) as v(PackCost
,PackPricePlusVAT
,etc
)
An example of duplicating rows is an exception report for validating data, which outputs one row for every DataError condition that a dbo.Products row satisfies:
select p.ProductName
,e.DataError
from dbo.Products as p
cross apply(values('Missing Units Per Pack'
,case when p.SoldInPacks = 1 and isnull(p.UnitsPerPack,0) < 1 then 1 end
)
,('Unusual Price'
,case when p.Price > (p.UnitsPerPack * p.UnitCost) * 2 then 1 end
)
,(etc)
) as e(DataError
,ErrorFlag
)
where e.ErrorFlag = 1
If you can understand what these two scripts are doing, you should find numerous examples of where being able to generate additional values or additional rows of data would be very helpful.
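If it helps to see the row-duplication idea outside SQL, here is a rough plain-Python sketch of what the exception report computes: every product is paired with each check, and only the pairs whose flag is set survive. The sample products and their values are invented for the illustration, and only the two checks spelled out in the script are modelled.

```python
# Hypothetical sample rows standing in for dbo.Products.
products = [
    {"ProductName": "Widget", "SoldInPacks": 1, "UnitsPerPack": None,
     "UnitCost": 2.0, "Price": 10.0},
    {"ProductName": "Gadget", "SoldInPacks": 1, "UnitsPerPack": 4,
     "UnitCost": 1.0, "Price": 5.0},
    {"ProductName": "Gizmo", "SoldInPacks": 1, "UnitsPerPack": 2,
     "UnitCost": 1.0, "Price": 9.0},
]

def data_errors(p):
    """Return the label of every check that this product row fails."""
    units = p["UnitsPerPack"]
    checks = [
        # isnull(UnitsPerPack, 0) < 1 for products sold in packs
        ("Missing Units Per Pack", p["SoldInPacks"] == 1 and (units or 0) < 1),
        # A NULL UnitsPerPack makes the SQL comparison NULL, so it is not flagged.
        ("Unusual Price",
         units is not None and p["Price"] > units * p["UnitCost"] * 2),
    ]
    return [label for label, flagged in checks if flagged]

# One output row per (product, failed check), like "where e.ErrorFlag = 1".
report = [(p["ProductName"], err) for p in products for err in data_errors(p)]
print(report)
```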

Related

Most efficient way to search SQL table using result from same table until final value is found

In the image above, which represents an SQL table, I would like to search for 1111 and retrieve its last replacement number, which should be 4444; here 1111 is a single number that ends up replaced by the single number 4444. Then I would like to search for 5555, which should return (6666, 9999, 8888). Note that 9999 replaced 7777.
So 1111 was a single part number replaced multiple times, and 5555 was a group number that breaks down into multiple parts, one of which was itself replaced (7777 >> 9999).
What would be the fastest and most efficient method?
If possible, I would like a solution in SQL for efficiency.
If that is not possible in SQL, then from within PHP.
What I have tried:
1) A while loop, but that needs to access the database 1000 times for 1000 replaced numbers, which is too inefficient.
2)
SELECT C.RPLPART
FROM TABLE A
left join TABLE B on A.RPLPART=B.PART#
left join TABLE C on B.RPLPART=C.PART#
WHERE A.PART#='1111' -- unable to know when the last number is reached
A recursive Common Table Expression (CTE) would seem to be the ticket.
Something like so...
with rcte (lvl, topPart, part#, rplpart) as
(select 1 as lvl, part#, part#, rplpart
from MYTABLE
union all
select p.lvl + 1, p.topPart, c.part#, c.rplpart
from rcte p, MYTABLE c
where p.rplpart = c.part#
)
select topPart, rplpart
from rcte
where toppart = 1111
order by lvl desc
fetch first row only;
You can do this using a recursive CTE that generates the complete replacement chain for a given starting part id, and then limit the result to just those parts that don't exist in the part# column:
WITH cte(part) AS
(SELECT replpart FROM parts WHERE part# = 1111
UNION ALL
SELECT parts.replpart FROM parts, cte WHERE parts.part# = cte.part)
SELECT DISTINCT part
FROM cte
WHERE part NOT IN (SELECT part# FROM parts);
Fiddle example
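For anyone who wants to try the recursive approach locally, here is a small sketch using Python's sqlite3. The table layout and the sample rows are assumptions reconstructed from the description in the question (the image is not available), and the columns are called part and rplpart because a # in a column name is awkward outside DB2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (part INTEGER, rplpart INTEGER)")
conn.executemany(
    "INSERT INTO parts VALUES (?, ?)",
    [
        (1111, 2222), (2222, 3333), (3333, 4444),   # single part, replaced 3 times
        (5555, 6666), (5555, 7777), (5555, 8888),   # group with three components
        (7777, 9999),                               # one component replaced later
    ],
)

def final_parts(conn, start):
    """Follow every replacement chain from `start` and return the end points."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(part) AS (
            SELECT rplpart FROM parts WHERE part = ?
            UNION ALL
            SELECT p.rplpart FROM parts AS p JOIN chain AS c ON p.part = c.part
        )
        -- A part is final when nothing replaces it, i.e. it never
        -- appears on the left-hand side of the table.
        SELECT DISTINCT part FROM chain
        WHERE part NOT IN (SELECT part FROM parts)
        ORDER BY part
        """,
        (start,),
    ).fetchall()
    return [r[0] for r in rows]

print(final_parts(conn, 1111), final_parts(conn, 5555))
```

The whole chain is resolved in one round trip to the database. If the data could contain a cycle, UNION ALL would recurse forever; switching to UNION is one way to guard against that.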

Returning several values within a CASE expression in subquery and separate columns again in main query

My test table looks like this:
# select * from a;
source | target | id
--------+--------+----
1 | 2 | 1
2 | 3 | 2
3 | 0 | 3
My query is this one:
SELECT *
FROM (
SELECT
CASE
WHEN id<>1
THEN source
ELSE 0
END
AS source,
CASE
WHEN id<>1
THEN target
ELSE 0
END
AS target
FROM a
) x;
The query seems a bit odd because the CASE expression with the same criteria is repeated for every column. I would like to simplify this and tried the following, but it doesn't work as expected.
SELECT *
FROM (
SELECT
CASE
WHEN id<>1
THEN (source, target)
ELSE (0, 0)
END
AS r
FROM a
) x;
It yields one column with a row value, but I would rather get the two original columns. Separating them with a (r).* or similar doesn't work, because the "record type has not been registered".
I found several questions here with solutions regarding functions returning RECORD values, but none regarding this example with a sub-select.
Actually, there is a quite long list of columns, so repeating the same CASE expression many times makes the whole query quite unreadable.
Since the real problem - as opposed to this simplified case - consists of several CASE expressions and several column groups, a solution with a UNION won't help, because the number of UNIONs would be large and make it unreadable as well as several CASEs.
My actual question is: How can I get the original columns from the row value?
This answers the original question.
If I understood your needs, you want 0 and 0 for source and target when id = 1:
SELECT
0 AS source,
0 AS target
FROM tablename
WHERE id = 1
UNION ALL
SELECT
source,
target
FROM tablename
WHERE id <> 1
Revised answer: You can make your query work (fixing the record type has not been registered issue) by creating a TYPE:
CREATE TYPE stpair AS (source int, target int);
And cast the composite value column to that type:
SELECT id, (cv).source, (cv).target
FROM (
SELECT id, CASE
WHEN id <> 1 THEN (source, target)::stpair
ELSE (0, 0)::stpair
END AS cv
FROM t
) AS x
Having said that, it should be far more convenient to use arrays:
SELECT id, av[1] AS source, av[2] AS target
FROM (
SELECT id, CASE
WHEN id <> 1 THEN ARRAY[source, target]
ELSE ARRAY[0, 0]
END AS av
FROM t
) AS x
Demo on db<>fiddle
Will this work for you?
select source,target,id from a where id <>1 union all select 0 as source,0 as target,id from a where id=1 order by id
I have used union all to include cases where multiple records may have ID=1.
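To check that the UNION ALL rewrite really returns the same rows as the repeated CASE expressions, the two can be compared on the sample table from the question. This is only a sketch in Python's sqlite3 rather than PostgreSQL, and the id column is added to the CASE query so the results line up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (source INTEGER, target INTEGER, id INTEGER)")
conn.executemany("INSERT INTO a VALUES (?, ?, ?)",
                 [(1, 2, 1), (2, 3, 2), (3, 0, 3)])

# Original style: the same condition repeated once per column.
case_rows = conn.execute(
    """
    SELECT CASE WHEN id <> 1 THEN source ELSE 0 END AS source,
           CASE WHEN id <> 1 THEN target ELSE 0 END AS target,
           id
    FROM a
    ORDER BY id
    """
).fetchall()

# Rewrite: the condition is stated once per branch instead of once per column.
union_rows = conn.execute(
    """
    SELECT source, target, id FROM a WHERE id <> 1
    UNION ALL
    SELECT 0 AS source, 0 AS target, id FROM a WHERE id = 1
    ORDER BY id
    """
).fetchall()

print(case_rows == union_rows, union_rows)
```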

where clause with = sign matches multiple records while expected just one record

I have a simple inline view that contains 2 columns.
-----------------
rn | val
-----------------
0 | A
... | ...
25 | Z
I am trying to select a val by matching the rn randomly by using the dbms_random.value() method as in
with d (rn, val) as
(
select level-1, chr(64+level) from dual connect by level <= 26
)
select * from d
where rn = floor(dbms_random.value()*25)
;
My expectation is it should return one row only without failing.
But now and then I get multiple rows returned or no rows at all.
on the other hand,
>>select floor(dbms_random.value()*25) from dual connect by level <1000
returns a whole number for each row and I failed to see any abnormality.
What am I missing here?
The problem is that the random value is recalculated for each row. So, you might get two random values that match the value -- or go through all the values and never get a hit.
One way to get around this is:
select d.*
from (select d.*
from d
order by dbms_random.value()
) d
where rownum = 1;
There are more efficient ways to calculate a random number, but this is intended to be a simple modification to your existing query.
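The effect is easy to reproduce outside the database. The following plain-Python sketch is a simulation, not Oracle code: it mimics a predicate that draws a fresh random number for every row, and compares it with the shuffle-and-take-first approach. randrange(25) stands in for floor(dbms_random.value()*25).

```python
import random

rows = [(n, chr(65 + n)) for n in range(26)]  # (rn, val) pairs: 0/A .. 25/Z

def filter_per_row(rng):
    # A new random number is drawn for each row, as in the original WHERE clause,
    # so zero, one, or several rows can match.
    return [row for row in rows if row[0] == rng.randrange(25)]

def pick_one(rng):
    # Analogue of ORDER BY dbms_random.value() with rownum = 1:
    # shuffle the rows, then keep the first.
    shuffled = rows[:]
    rng.shuffle(shuffled)
    return shuffled[:1]

rng = random.Random(0)
per_row_counts = {len(filter_per_row(rng)) for _ in range(1000)}
pick_counts = {len(pick_one(rng)) for _ in range(1000)}
print(sorted(per_row_counts), sorted(pick_counts))
```

The per-row version produces result sets of varying size, while the shuffle version always yields exactly one row.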
You also might want to ask another question. This question starts with a description of a table that is not used, and then the question is about a query that doesn't use the table. Ask another question, describing the table and the real problem you are having -- along with sample data and desired results.

select in sql server 2005

I have the following table:
ID | first | end
--------------------
a | 1 | 3
b | 3 | 8
c | 8 | 10
I want to select the following:
ID | first | end
---------------------
a-c | 1 | 10
But I can't work out how to do it. Please help me. Thanks!
This works for me:
SELECT MIN(t.id)+'-'+MAX(t.id) AS ID,
MIN(t.[first]) AS first,
MAX(t.[end]) AS [end]
FROM dbo.YOUR_TABLE t
But please, do not use reserved words like "end" for column names.
I believe you can do this using a recursive Common Table Expression as follows, especially if you're not expecting very long chains of records:
WITH Ancestors AS
(
SELECT
InitRow.[ID] AS [Ancestor],
InitRow.[ID],
InitRow.[first],
InitRow.[end],
0 AS [level],
'00000' + InitRow.[ID] AS [hacky_level_plus_ID]
FROM
YOUR_TABLE AS InitRow
WHERE
NOT EXISTS
(
SELECT * FROM YOUR_TABLE AS PrevRow
WHERE PrevRow.[end] = InitRow.[first]
)
UNION ALL
SELECT
ParentRow.Ancestor,
ChildRow.[ID],
ChildRow.[first],
ChildRow.[end],
ParentRow.level + 1 AS [level],
-- Avoids having to build the recursive structure more than once.
-- We know we will not be over 5 digits since CTEs have a recursion
-- limit of 32767.
RIGHT('00000' + CAST(ParentRow.level + 1 AS varchar(5)), 5)
+ ChildRow.[ID] AS [hacky_level_plus_ID]
FROM
Ancestors AS ParentRow
INNER JOIN YOUR_TABLE AS ChildRow
ON ChildRow.[first] = ParentRow.[end]
)
SELECT
Ancestors.Ancestor + '-' + SUBSTRING(MAX([hacky_level_plus_ID]),6,10) AS [IDs],
-- Without the [hacky_level_plus_ID] column, you need to do it this way:
-- Ancestors.Ancestor + '-' +
-- (SELECT TOP 1 Children.ID FROM Ancestors AS Children
-- WHERE Children.[Ancestor] = Ancestors.[Ancestor]
-- ORDER BY Children.[level] DESC) AS [IDs],
MIN(Ancestors.[first]) AS [first],
MAX(Ancestors.[end]) AS [end]
FROM
Ancestors
GROUP BY
Ancestors.Ancestor
-- If needed, add OPTION (MAXRECURSION 32767)
A quick explanation of what each part does:
The WITH Ancestors AS (...) clause creates a Common Table Expression (basically a subquery) with the name Ancestors. The first SELECT in that expression establishes a baseline: all the rows that have no matching entry prior to it.
Then, the second SELECT is where the recursion kicks in. Since it references Ancestors as part of the query, it uses the rows it has already added to the table and then performs a join with new ones from YOUR_TABLE. This will recursively find more and more rows to add to the end of each chain.
The last clause is the SELECT that uses this recursive table we've built up. It does a simple GROUP BY since we've saved off the original ID in the Ancestor column, so the start and end are a simple MIN and MAX.
The tricky part is figuring out the ID of the last row in the chain. There are two ways to do it, both illustrated in the query. You can either join back with the recursive table, in which case it will build the recursive table all over again, or you can attempt to keep track of the last item as you go. (If building the recursive list of chained records is expensive, you definitely want to minimize the number of times you need to do that.)
The way it keeps track as it goes is to keep track of its position in the chain (the level column -- notice how we add 1 each time we recurse), zero-pad it, and then stick the ID at the end. Then, getting the item with the max level is simply a MAX followed by stripping the level data out.
If the CTE has to recurse too much, it will generate an error, but I believe you can tweak that using the MAXRECURSION option. The default is 100. If you have to set it higher than that, you may want to consider not using a recursive CTE to do this.
This also doesn't handle malformed data very well. If you have two records with the same first or a record where first == end, then this won't work right and you may have to tweak the join conditions inside the CTE or go with another approach.
This isn't the only way to do it. I believe it would be easier to follow if you built a custom procedure and did all the steps manually. But this has the advantage of operating in a single statement.
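Here is a cut-down version of the same recursive idea that can be run with Python's sqlite3, in case you want to experiment with it. It is a sketch under a few assumptions: the columns are renamed range_start and range_end to avoid the reserved word, a second chain (x, y) is added to the sample data to show the grouping, and the last ID is found by joining back to the CTE rather than with the zero-padding trick.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, range_start INTEGER, range_end INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("a", 1, 3), ("b", 3, 8), ("c", 8, 10), ("x", 20, 25), ("y", 25, 30)],
)

merged = conn.execute(
    """
    WITH RECURSIVE ancestors(ancestor, id, range_start, range_end, level) AS (
        -- Anchor: rows that no other row leads into start a chain.
        SELECT init.id, init.id, init.range_start, init.range_end, 0
        FROM t AS init
        WHERE NOT EXISTS (
            SELECT 1 FROM t AS prev WHERE prev.range_end = init.range_start
        )
        UNION ALL
        -- Recursive step: append the row that starts where the chain ends.
        SELECT p.ancestor, c.id, c.range_start, c.range_end, p.level + 1
        FROM ancestors AS p
        JOIN t AS c ON c.range_start = p.range_end
    )
    SELECT a.ancestor || '-' || (
               SELECT tail.id FROM ancestors AS tail
               WHERE tail.ancestor = a.ancestor
               ORDER BY tail.level DESC LIMIT 1
           ) AS ids,
           MIN(a.range_start),
           MAX(a.range_end)
    FROM ancestors AS a
    GROUP BY a.ancestor
    ORDER BY a.ancestor
    """
).fetchall()

print(merged)
```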

SQL based data diff: longest common subsequence

I'm looking for research papers or writings on applying the Longest Common Subsequence algorithm to SQL tables to obtain a data diff view. Other suggestions on how to solve a table diff problem are also welcome. The challenge is that SQL tables have this nasty habit of getting rather BIG, and applying straightforward algorithms designed for text processing may result in a program that never ends...
so given a table Original:
Key Content
1 This row is unchanged
2 This row is outdated
3 This row is wrong
4 This row is fine as it is
and the table New:
Key Content
1 This row was added
2 This row is unchanged
3 This row is right
4 This row is fine as it is
5 This row contains important additions
I need to find out the Diff:
+++ 1 This row was added
--- 2 This row is outdated
--- 3 This row is wrong
+++ 3 This row is right
+++ 5 This row contains important additions
If you export your tables into CSV files, you can use http://sourceforge.net/projects/csvdiff/
Quote:
csvdiff is a Perl script to diff/compare two csv files with the
possibility to select the separator. Differences will be shown like:
"Column XYZ in record 999" is different. After this, the actual and the
expected result for this column will be shown.
This is probably too simple for what you're after, and it's not research :-), but just conceptual. I imagine you're looking to compare different methods for processing overhead (?).
--This is half of what you don't want ( A )
SELECT o.Key FROM tbl_ORIGINAL o INNER JOIN tbl_NEW n ON o.Content = n.Content
--This is the other half of what you don't want ( B )
SELECT n.Key FROM tbl_ORIGINAL o INNER JOIN tbl_NEW n ON o.Content = n.Content
--This is half of what you DO want ( C )
SELECT '+++' as diff, n.key, Content FROM tbl_New n WHERE n.KEY NOT IN( B )
--This is the other half of what you DO want ( D )
SELECT '---' as diff, o.key, Content FROM tbl_Original o WHERE o.Key NOT IN ( A )
--Combining C & D
( C )
Union
( D )
Order By diff, key
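The content-matching idea above can be tried on the sample data from the question. This is only a sketch in Python's sqlite3: it uses NOT EXISTS instead of the NOT IN ( A ) / ( B ) placeholders, the table and column names (original, new_rows, row_key) are made up, and it matches rows on content alone, so it is a set difference rather than a true longest-common-subsequence diff.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE original (row_key INTEGER, content TEXT)")
conn.execute("CREATE TABLE new_rows (row_key INTEGER, content TEXT)")
conn.executemany("INSERT INTO original VALUES (?, ?)", [
    (1, "This row is unchanged"),
    (2, "This row is outdated"),
    (3, "This row is wrong"),
    (4, "This row is fine as it is"),
])
conn.executemany("INSERT INTO new_rows VALUES (?, ?)", [
    (1, "This row was added"),
    (2, "This row is unchanged"),
    (3, "This row is right"),
    (4, "This row is fine as it is"),
    (5, "This row contains important additions"),
])

diff = conn.execute(
    """
    -- Rows whose content exists only in the new table.
    SELECT '+++' AS diff, n.row_key AS row_key, n.content AS content
    FROM new_rows AS n
    WHERE NOT EXISTS (SELECT 1 FROM original AS o WHERE o.content = n.content)
    UNION ALL
    -- Rows whose content exists only in the original table.
    SELECT '---', o.row_key, o.content
    FROM original AS o
    WHERE NOT EXISTS (SELECT 1 FROM new_rows AS n WHERE n.content = o.content)
    ORDER BY row_key, diff DESC
    """
).fetchall()

for marker, row_key, content in diff:
    print(marker, row_key, content)
```

With an index on the content column of each table, both NOT EXISTS probes can be index lookups rather than full scans.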
Improvements...
try creating indexed views of the base tables first
try reducing the length of the content field to its minimum for uniqueness (trial and error), and then use that shorter result to do your comparisons
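-- e.g. to get min length (1000 is arbitrary -- just need an exit)
declare @i int
set @i = 1
-- keep lengthening the prefix while any prefix is shared by more than one row
While @i < 1000 and Exists (
Select Left(content,@i) From tbl_ORIGINAL Group By Left(content,@i) Having Count([key]) > 1 )
BEGIN
set @i = @i + 1
END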
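The prefix-shortening loop can also be written as a few lines of Python, which may make the intent clearer. This is a sketch that assumes the content values are distinct to begin with; the function name and the limit parameter are invented for the example, and the sample strings are the Original table from the question.

```python
def min_unique_prefix_length(values, limit=1000):
    """Smallest prefix length at which all values are still distinct.

    Returns None if no length up to `limit` separates them
    (for example when the list contains exact duplicates).
    """
    for length in range(1, limit + 1):
        prefixes = {value[:length] for value in values}
        if len(prefixes) == len(values):
            return length
    return None

original_contents = [
    "This row is unchanged",
    "This row is outdated",
    "This row is wrong",
    "This row is fine as it is",
]

prefix_length = min_unique_prefix_length(original_contents)
print(prefix_length)
```

All four rows share the 12-character prefix "This row is ", so the first length that tells them apart is 13.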