Select exactly 5 rows from a table - sql

I have an odd requirement which ideally should be solved in SQL, not the surrounding app.
I need to select exactly 5 rows regardless of how many are actually available. In practice the number of rows available will usually be less than 5 and on some rare occasions it will be more than 5. The "extra" rows should have null in every column.
The app is written in a technology that isn't Turing Complete. This requirement is much more difficult to solve in the app's code than you might imagine! To describe it, the app is effectively a transformer: It takes in a bunch of queries and spits out a report. So please understand the app is NOT written in a "programming language" in the traditional sense.
So for example, if I have a table:
A | B
-----
1 | X
2 | Y
3 | Z
Then a valid result would be
A | B
-----------
2 | Y
1 | X
3 | Z
null | null
null | null
I know this is an unusual requirement. Sadly it can't be solved in the application due to the technology being used.
Ideally this shouldn't require changes to the database, but if there is no other way, then changes can be arranged.
Any suggestions?

You can do something like this:
select top 5 a, b
from (select a, b, 1 as priority from t
union all
-- five dummy rows, one per value in the values list
select null, null, 2
from (values (1), (2), (3), (4), (5)) v(n)
) x
order by priority;
That is, create dummy rows, append them, and then choose the first five.
I do think that this work should be done in the app, but you can do it in SQL.
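The same padding idea can be tried outside SQL Server. This is only a sketch, run through Python's built-in sqlite3 module so it needs no server: it assumes SQLite's dialect (LIMIT 5 in place of TOP 5), and the function name pad_to_five and the table t with columns a and b are made up for the illustration.

```python
import sqlite3

def pad_to_five(rows):
    """Return exactly five (a, b) rows, padding with all-NULL rows when needed."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT a, b
        FROM (
            SELECT a, b, 1 AS priority FROM t
            UNION ALL
            -- five dummy rows, one per value in the VALUES list
            SELECT NULL, NULL, 2
            FROM (VALUES (1), (2), (3), (4), (5))
        ) AS x
        ORDER BY priority  -- real rows sort ahead of the dummy rows
        LIMIT 5            -- SQLite's counterpart of TOP 5
        """
    ).fetchall()
    conn.close()
    return result

padded = pad_to_five([(1, "X"), (2, "Y"), (3, "Z")])
```

With three real rows, padded holds those three rows followed by two (None, None) rows. With more than five real rows, only real rows come back, because they all sort ahead of the dummies.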

Create Table #Test (A int, B int)
Insert #Test Values (1,1)
Insert #Test Values (2,1)
Insert #Test Values (3,1)
Select Top 5 * From
(
Select A, B From #Test
Union All
Select Null, Null
Union All
Select Null, Null
Union All
Select Null, Null
Union All
Select Null, Null
Union All
Select Null, Null
) A

Wrap this in a stored proc..
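-- assumes dbo.test has two columns and dbo.numbers holds 1, 2, 3, ... in column n
declare @rowcount int
select @rowcount = count(*) from dbo.test
if @rowcount >= 5
select top 5 * from dbo.test
else
Begin
select * from dbo.test
union all
-- one dummy row for each missing row
select null, null from dbo.numbers where n <= 5 - @rowcount
End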

If you use some sort of tally table (although the numbers themselves do not matter, only that the table has enough records), you can use it to create the dummy rows. e.g. using sys.columns:
select top 5 a,b from
(
select a, b, 0 ord from yourTable
union all
select null a,null b, 1 from sys.columns
) t
order by ord
The advantage of the tally would be that if you need another number of rows in the future, you only need to change the top x (provided the tally table has enough rows)

Get the rows you have (3 in your example) from your table.
Keep a counter variable,
and then, from your code, add NULL rows until the counter reaches 5.

Related

SQL - List all pages in between record while maintaining ID key

I'm trying to come up with a useful way to list all pages in between the first and last page of a document as new rows while maintaining the ID number as a key, or cross reference. I have a few ways of getting the pages in between, but I'm not exactly sure how to maintain the key in a programmatic way.
Example Input:
First Page Last Page ID
ABC_001 ABC_004 1
ABC_005 ABC_005 2
ABC_006 ABC_010 3
End Result:
All Pages ID
ABC_001 1
ABC_002 1
ABC_003 1
ABC_004 1
ABC_005 2
ABC_006 3
ABC_007 3
ABC_008 3
ABC_009 3
ABC_010 3
Any help is much appreciated. I'm using SQL mgmt studio.
One approach would be to set up a numbers table, that contains a list of numbers that you may possibly find in the column content:
CREATE TABLE numbers( idx INTEGER);
INSERT INTO numbers VALUES(1);
INSERT INTO numbers VALUES(2);
...
INSERT INTO numbers VALUES(10);
Now, assuming that all page values have 7 characters, with the last 3 being digits, we can JOIN the original table with the numbers table to generate the missing records:
SELECT
CONCAT(
SUBSTRING(t.First_Page, 1, 4),
REPLICATE('0', 3 - LEN(n.idx)),
n.idx
) AS [All Pages],
t.id
FROM
mytable t
INNER JOIN numbers n
ON n.idx >= CAST(SUBSTRING(t.First_Page, 5, 3) AS int)
AND n.idx <= CAST(SUBSTRING(t.Last_Page, 5, 3) AS int)
This demo on DB Fiddle with your sample data returns:
All Pages | id
:-------- | -:
ABC_001 | 1
ABC_002 | 1
ABC_003 | 1
ABC_004 | 1
ABC_005 | 2
ABC_006 | 3
ABC_007 | 3
ABC_008 | 3
ABC_009 | 3
ABC_010 | 3
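For anyone who wants to try the numbers-table join without a SQL Server instance, here is a rough equivalent run through Python's sqlite3 module. It is a sketch under assumptions: SQLite has no REPLICATE, so printf('%03d', ...) does the zero padding, the function name expand_pages is invented, and the numbers table is filled with 1 to 999 so any three-digit page is covered.

```python
import sqlite3

def expand_pages(ranges):
    """Expand (first_page, last_page, id) ranges into one row per page."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (first_page TEXT, last_page TEXT, id INTEGER)")
    conn.execute("CREATE TABLE numbers (idx INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ranges)
    # 1..999 covers every possible three-digit page suffix
    conn.executemany("INSERT INTO numbers VALUES (?)", [(i,) for i in range(1, 1000)])
    rows = conn.execute(
        """
        SELECT substr(t.first_page, 1, 4) || printf('%03d', n.idx) AS all_pages,
               t.id
        FROM mytable AS t
        JOIN numbers AS n
          ON n.idx BETWEEN CAST(substr(t.first_page, 5, 3) AS INTEGER)
                       AND CAST(substr(t.last_page, 5, 3) AS INTEGER)
        ORDER BY t.id, n.idx
        """
    ).fetchall()
    conn.close()
    return rows

sample = [("ABC_001", "ABC_004", 1), ("ABC_005", "ABC_005", 2), ("ABC_006", "ABC_010", 3)]
pages = expand_pages(sample)
```

Like the T-SQL above, this assumes every page value is a four-character prefix followed by three digits.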
To find all pages from First Page to Last Page per Book ID, CAST your page numbers from STRING to INTEGER, then add +1 to each page number until you reach the Last Page.
First, turn your original table into a table variable with the Integer data types using a TRY_CAST.
DECLARE @Book TABLE (
[ID] INT
,[FirstPage] INT
,[LastPage] INT
)
INSERT INTO @Book
SELECT [ID]
,TRY_CAST(RIGHT([FirstPage], 3) AS int) AS [FirstPage]
,TRY_CAST(RIGHT([LastPage], 3) AS int) AS [LastPage]
FROM [YourOriginalTable]
Set the maximum page that your pages will increment to using a variable. This caps your results at the correct number of pages; otherwise the recursion would generate many more rows than you need.
DECLARE @LastPage INT
SELECT @LastPage = MAX([LastPage]) FROM @Book
Turning a three-column table (ID, First Page, Last Page) into a two-column table (ID, Page) will require an UNPIVOT.
We're tucking that UNPIVOT into a CTE (Common Table Expression), which is basically a smart version of a temporary table (like a #TempTable or @TableVariable, but one you can only use once, and which is a little more efficient in certain circumstances).
In addition to the UNPIVOT of your [FirstPage] and [LastPage] columns into a tall table, we're going to append every other combination of page number per ID using a UNION ALL.
;WITH BookCTE AS (
SELECT [ID]
,[Page]
FROM (SELECT [ID]
,[FirstPage]
,[LastPage]
FROM @Book) AS bp
UNPIVOT
(
[Page] FOR [Pages] IN ([FirstPage], [LastPage])
) AS up
UNION ALL
SELECT [ID], [Page] + 1 FROM BookCTE WHERE [Page] + 1 < @LastPage
)
Now that your data is held in a table format using a CTE with all combinations of [ID] and [Page] up to the maximum page in your @Book table, it's time to join your CTE with the @Book table.
SELECT DISTINCT
cte.ID
,cte.Page
FROM BookCTE AS cte
INNER JOIN @Book AS bk
ON bk.ID = cte.ID
WHERE cte.Page <= bk.[LastPage]
ORDER BY
cte.ID
,cte.Page
OPTION (MAXRECURSION 10000)
See also:
How to generate a range of numbers between two numbers (I based my code off of @Jayvee's answer)
Assigning variables using SET vs SELECT
SQL Server UNPIVOT
SQL Server CTE Basics
Recursive CTEs Explained
Note: will update with re-integrating string portion of FirstPage and LastPage (which I assume is based on book title). Stand by.
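A recursive CTE can also carry the string prefix along, so the page names come back whole. The following is a sketch of that variation, not the code above: it runs in SQLite through Python's sqlite3 module, stops each ID at its own last page instead of a global maximum, and uses invented names (expand_pages_recursive, the book table). It assumes the last three characters of each page value are the page number.

```python
import sqlite3

def expand_pages_recursive(ranges):
    """Expand (first_page, last_page, id) ranges with a recursive CTE."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE book (first_page TEXT, last_page TEXT, id INTEGER)")
    conn.executemany("INSERT INTO book VALUES (?, ?, ?)", ranges)
    rows = conn.execute(
        """
        WITH RECURSIVE pages(id, prefix, page, last_page) AS (
            -- anchor: the first page of every range, split into prefix and number
            SELECT id,
                   substr(first_page, 1, length(first_page) - 3),
                   CAST(substr(first_page, -3) AS INTEGER),
                   CAST(substr(last_page, -3) AS INTEGER)
            FROM book
            UNION ALL
            -- step: add one page until this range's own last page is reached
            SELECT id, prefix, page + 1, last_page
            FROM pages
            WHERE page < last_page
        )
        SELECT prefix || printf('%03d', page) AS all_pages, id
        FROM pages
        ORDER BY id, page
        """
    ).fetchall()
    conn.close()
    return rows

book_pages = expand_pages_recursive(
    [("ABC_001", "ABC_004", 1), ("ABC_005", "ABC_005", 2), ("ABC_006", "ABC_010", 3)]
)
```

Because each row remembers its own last page, no DISTINCT or final join is needed to trim the extra rows.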

create a table of duplicated rows of another table using the select statement

I have a table with one column containing different integers.
For each integer in the table I would like to duplicate it as many times as it has digits -
For example:
12345 (5 digits):
1. 12345
2. 12345
3. 12345
4. 12345
5. 12345
I thought of doing it using with recursive t (...) as (), but I didn't manage, since I don't really understand how it works and what is happening "behind the scenes".
I don't want to use insert because I want it to be scalable and automatic for as many integers as needed in a table.
Any thoughts and an explanation would be great.
The easiest way is to join to a table with numbers from 1 to n in it.
SELECT n, x
FROM yourtable
JOIN
(
SELECT day_of_calendar AS n
FROM sys_calendar.CALENDAR
WHERE n BETWEEN 1 AND 12 -- maximum number of digits
) AS dt
ON n <= CHAR_LENGTH(TRIM(ABS(x)))
In my example I abused TD's builtin calendar, but that's not a good choice, as the optimizer doesn't know how many rows will be returned and as the plan must be a Product Join it might decide to do something stupid. So better use a number table...
Create a numbers table that will contain the integers from 1 to the maximum number of digits that the numbers in your table will have (I went with 6):
create table numbers(num int)
insert numbers
select 1 union select 2 union select 3 union select 4 union select 5 union select 6
You already have your table (but here's what I was using to test):
create table your_table(num int)
insert your_table
select 12345 union select 678
Here's the query to get your results:
select ROW_NUMBER() over(partition by b.num order by b.num) row_num, b.num, LEN(cast(b.num as char)) num_digits
into #temp
from your_table b
cross join numbers n
select t.num
from #temp t
where t.row_num <= t.num_digits
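The numbers-table join can be checked quickly in SQLite as well. This sketch goes through Python's sqlite3 module and is not Teradata or SQL Server syntax: length(CAST(abs(num) AS TEXT)) stands in for the digit count, and the function name repeat_by_digit_count and its max_digits parameter are assumptions made for the example.

```python
import sqlite3

def repeat_by_digit_count(values, max_digits=12):
    """Return (n, num) rows, repeating each num once per digit it contains."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (num INTEGER)")
    conn.execute("CREATE TABLE numbers (n INTEGER)")
    conn.executemany("INSERT INTO your_table VALUES (?)", [(v,) for v in values])
    # the numbers table only has to reach the largest digit count expected
    conn.executemany(
        "INSERT INTO numbers VALUES (?)", [(i,) for i in range(1, max_digits + 1)]
    )
    rows = conn.execute(
        """
        SELECT n.n, t.num
        FROM your_table AS t
        JOIN numbers AS n
          ON n.n <= length(CAST(abs(t.num) AS TEXT))  -- abs() drops the minus sign
        ORDER BY t.num, n.n
        """
    ).fetchall()
    conn.close()
    return rows

copies = repeat_by_digit_count([12345, 678])
```

Here 678 comes back three times and 12345 five times, each copy numbered from 1.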
I found a nice way to perform this action. Here goes:
with recursive t (num,num_as_char,char_n)
as
(
select num
,cast (num as varchar (100)) as num_as_char
,substr (num_as_char,1,1)
from numbers
union all
select num
,substr (t.num_as_char,2) as num_as_char2
,substr (num_as_char2,1,1)
from t
where char_length (num_as_char2) > 0
)
select *
from t
order by num,char_length (num_as_char) desc

How do you find a missing number in a table field starting from a parameter and incrementing sequentially?

Let's say I have an sql server table:
NumberTaken CompanyName
2 Fred
3 Fred
4 Fred
6 Fred
7 Fred
8 Fred
11 Fred
I need an efficient way to pass in a parameter [StartingNumber] and to count from [StartingNumber] sequentially until I find a number that is missing.
For example notice that 1, 5, 9 and 10 are missing from the table.
If I supplied the parameter [StartingNumber] = 1, it would check to see if 1 exists, if it does it would check to see if 2 exists and so on and so forth so 1 would be returned here.
If [StartingNumber] = 6 the function would return 9.
In c# pseudo code it would basically be:
int ctr = [StartingNumber]
while([SELECT NumberTaken FROM tblNumbers Where NumberTaken = ctr] != null)
ctr++;
return ctr;
The problem with that code is that it seems really inefficient if there are thousands of numbers in the table. Also, I can write it in c# code or in a stored procedure, whichever is more efficient.
Thanks for the help
Fine, if this question isn't going to be closed, I may as well Copy and paste my answer from the other one:
I called my table Blank, and used the following:
declare @StartOffset int = 2
; With Missing as (
select @StartOffset as N where not exists(select * from Blank where ID = @StartOffset)
), Sequence as (
select @StartOffset as N from Blank where ID = @StartOffset
union all
select b.ID from Blank b inner join Sequence s on b.ID = s.N + 1
)
select COALESCE((select N from Missing),(select MAX(N)+1 from Sequence))
You basically have two cases - either your starting value is missing (so the Missing CTE will contain one row), or it's present, so you count forwards using a recursive CTE (Sequence), and take the max from that and add 1
Tables:
create table Blank (
ID int not null,
Name varchar(20) not null
)
insert into Blank(ID,Name)
select 2 ,'Fred' union all
select 3 ,'Fred' union all
select 4 ,'Fred' union all
select 6 ,'Fred' union all
select 7 ,'Fred' union all
select 8 ,'Fred' union all
select 11 ,'Fred'
go
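To see the two cases play out, the recursive walk can be reproduced in SQLite from Python. This is a simplified sketch rather than the exact query above: the Missing CTE is folded into the COALESCE fallback, and the function name first_missing is made up for the example.

```python
import sqlite3

def first_missing(taken, start):
    """Return the first number >= start that is absent from the taken numbers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blank (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO blank VALUES (?)", [(n,) for n in taken])
    (answer,) = conn.execute(
        """
        WITH RECURSIVE seq(n) AS (
            -- anchor: produces a row only when the start value is taken
            SELECT id FROM blank WHERE id = :start
            UNION ALL
            -- step: keep walking while the next number is also taken
            SELECT b.id FROM blank AS b JOIN seq AS s ON b.id = s.n + 1
        )
        -- an empty seq means the start value itself is the answer
        SELECT COALESCE((SELECT MAX(n) + 1 FROM seq), :start)
        """,
        {"start": start},
    ).fetchone()
    conn.close()
    return answer

taken_numbers = [2, 3, 4, 6, 7, 8, 11]
from_one = first_missing(taken_numbers, 1)
from_six = first_missing(taken_numbers, 6)
```

With the sample data, starting at 1 gives 1 (it is missing) and starting at 6 gives 9, matching the examples in the question.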
I would create a temp table containing all numbers from StartingNumber to EndNumber and LEFT JOIN to it to receive the list of rows not contained in the temp table.
If NumberTaken is indexed you could do it with a join on the same table:
select T.NumberTaken -1 as MISSING_NUMBER
from myTable T
left outer join myTable T1
on T.NumberTaken= T1.NumberTaken+1
where T1.NumberTaken is null and t.NumberTaken >= STARTING_NUMBER
order by T.NumberTaken
EDIT
Edited to get 1 too
1> select 1+ID as ID from #b as b
where not exists (select 1 from #b where ID = 1+b.ID)
2> go
ID
-----------
5
9
12
Take max(1+ID) and/or add your starting value to the where clause, depending on what you actually want.

How to do this data transformation

This is my input data
GroupId Serial Action
1 1 Start
1 2 Run
1 3 Jump
1 8 End
2 9 Shop
2 10 Start
2 11 Run
For each activity sequence in a group I want to find pairs of actions where Action1.SerialNo = Action2.SerialNo + k, and how many times each pair occurs.
Suppose k = 1, then output will be
FirstAction NextAction Frequency
Start Run 2
Run Jump 1
Shop Start 1
How can I do this in SQL, fast enough given the input table contains millions of entries.
tful, this should produce the result you want, but I don't know if it will be as fast as you'd like. It's worth a try.
create table Actions(
GroupId int,
Serial int,
"Action" varchar(20) not null,
primary key (GroupId, Serial)
);
insert into Actions values
(1,1,'Start'), (1,2,'Run'), (1,3,'Jump'),
(1,8,'End'), (2,9,'Shop'), (2,10,'Start'),
(2,11,'Run');
go
declare @k int = 1;
with ActionsDoubled(Serial,Tag,"Action") as (
select
Serial, 'a', "Action"
from Actions as A
union all
select
Serial-@k, 'b', "Action"
from Actions
as B
), Pivoted(Serial,a,b) as (
select Serial,a,b
from ActionsDoubled
pivot (
max("Action") for Tag in ([a],[b])
) as P
)
select
a, b, count(*) as ct
from Pivoted
where a is not NULL and b is not NULL
group by a,b
order by a,b;
go
drop table Actions;
If you will be doing the same computation for various @k values on stable data, this may work better in the long run:
declare @k int = 1;
select
Serial, 'a' as Tag, "Action"
into ActionsDoubled
from Actions as A
union all
select
Serial-@k, 'b', "Action"
from Actions
as B;
go
create unique clustered index AD_S on ActionsDoubled(Serial,Tag);
create index AD_a on ActionsDoubled(Tag,Serial);
go
with Pivoted(Serial,a,b) as (
select Serial,a,b
from ActionsDoubled
pivot (
max("Action") for Tag in ([a],[b])
) as P
)
select
a, b, count(*) as ct
from Pivoted
where a is not NULL and b is not NULL
group by a,b
order by a,b;
go
drop table ActionsDoubled;
SELECT a1.Action AS FirstAction, a2.Action AS NextAction, COUNT(*) AS Frequency
FROM Activities a1 JOIN Activities a2
ON (a1.GroupId = a2.GroupId AND a2.Serial = a1.Serial + @k)
GROUP BY a1.Action, a2.Action;
The problem is this: Your query has to go through EVERY row regardless.
You can make it more manageable for your database by tackling each group separately as separate queries. Especially if the size of each group is SMALL.
There's a lot going on under the hood and when the query has to do a scan of the entire table, this actually ends up being many times slower than if you did small chunks which effectively cover all million rows.
So for instance:
--Stickler for clean formatting...
SELECT
a1.Action AS FirstAction,
a2.Action AS NextAction,
COUNT(*) AS Frequency
FROM
Activities a1 JOIN Activities a2
ON (a1.groupid = a2.groupid
AND a2.Serial = a1.Serial + @k)
WHERE
a1.groupid = 1
GROUP BY
a1.Action,
a2.Action;
By the way, you have an index (GroupId, Serial) on the table, right?
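The self-join can be checked against the sample data from the question. The sketch below runs it in SQLite through Python's sqlite3 module. The function name action_pairs and the snake_case column names are invented for the example, and the join is written so that a1 is the earlier action and a2 the one k serials later within the same group.

```python
import sqlite3

def action_pairs(rows, k=1):
    """Count (first_action, next_action) pairs whose serials differ by k per group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activities ("
        "group_id INTEGER, serial INTEGER, action_name TEXT, "
        "PRIMARY KEY (group_id, serial))"  # also serves as the lookup index
    )
    conn.executemany("INSERT INTO activities VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT a1.action_name, a2.action_name, COUNT(*)
        FROM activities AS a1
        JOIN activities AS a2
          ON a2.group_id = a1.group_id
         AND a2.serial = a1.serial + ?   -- a2 is k steps after a1
        GROUP BY a1.action_name, a2.action_name
        """,
        (k,),
    ).fetchall()
    conn.close()
    return {(first, nxt): freq for first, nxt, freq in result}

sample_actions = [
    (1, 1, "Start"), (1, 2, "Run"), (1, 3, "Jump"), (1, 8, "End"),
    (2, 9, "Shop"), (2, 10, "Start"), (2, 11, "Run"),
]
pairs = action_pairs(sample_actions, k=1)
```

For k = 1 this gives Start/Run twice, Run/Jump once and Shop/Start once. End (serial 8) and Shop (serial 9) are not paired because they belong to different groups.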

Adding Row Numbers To a SELECT Query Result in SQL Server Without use Row_Number() function

I need to add row numbers to a SELECT query without using the Row_Number() function,
and without using user-defined functions or stored procedures.
Select (obtain the row number) as [Row], field1, field2, fieldn from aTable
UPDATE
I am using SAP B1 DIAPI to make a query; this system does not allow the use of the rownumber() function in the select statement.
Bye.
I'm not sure if this will work for your particular situation or not, but can you execute this query with a stored procedure? If so, you can:
A) Create a temp table with all your normal result columns, plus a Row column as an auto-incremented identity.
B) Select-Insert your original query, sans the row column (SQL will fill this in automatically for you)
C) Select * on the temp table for your result set.
Not the most elegant solution, but will accomplish the row numbering you are wanting.
This query will give you the row_number,
SELECT
(SELECT COUNT(*) FROM @table t2 WHERE t2.field <= t1.field) AS row_number,
field,
otherField
FROM @table t1
but it comes with some restrictions. You need one column in your table (in the example it is field) which is unique and numeric, so that you can use it as a reference. For example:
DECLARE @table TABLE
(
field INT,
otherField VARCHAR(10)
)
INSERT INTO @table(field,otherField) VALUES (1,'a')
INSERT INTO @table(field,otherField) VALUES (4,'b')
INSERT INTO @table(field,otherField) VALUES (6,'c')
INSERT INTO @table(field,otherField) VALUES (7,'d')
SELECT * FROM @table
returns
field | otherField
------------------
1 | a
4 | b
6 | c
7 | d
and
SELECT
(SELECT COUNT(*) FROM @table t2 WHERE t2.field <= t1.field) AS row_number,
field,
otherField
FROM @table t1
returns
row_number | field | otherField
-------------------------------
1 | 1 | a
2 | 4 | b
3 | 6 | c
4 | 7 | d
This is the solution without functions and stored procedures, but as I said there are the restrictions. But anyway, maybe it is enough for you.
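The correlated-count trick is plain SQL, so it can be tried in SQLite too. A small sketch through Python's sqlite3 module, with an invented function name (number_rows) and table name (items). It assumes, as stated above, that field is unique.

```python
import sqlite3

def number_rows(rows):
    """Number rows by counting how many rows have a field value <= the current one."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (field INTEGER, other_field TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT (SELECT COUNT(*) FROM items AS t2 WHERE t2.field <= t1.field) AS row_num,
               t1.field,
               t1.other_field
        FROM items AS t1
        ORDER BY t1.field
        """
    ).fetchall()
    conn.close()
    return result

numbered = number_rows([(6, "c"), (1, "a"), (7, "d"), (4, "b")])
```

The insert order does not matter: the row number depends only on how field sorts. If field held duplicates, the tied rows would share a number, which is why uniqueness is required.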
RRUZ, you might be able to hide the use of a function by wrapping your query in a View. It would be transparent to the caller. I don't see any other options, besides the ones already mentioned.