Loop over SQL commands in a file - sql

I have an SQL file that looks like this (clearly the real thing is a bit longer and actualy does stuff :))
DECLARE #Mandatory int = 0
DECLARE #Fish int = 3
DECLARE #InitialPriceID int
if #Mandatory= 0
begin
select #InitialPriceID = priceID from Fishes where FishID = #Fish
end
I have a file of 'Mandatory' and 'Fish' values
Mandatory,Fish
1,3
0,4
1,4
1,3
1,7
I need to write a program that will produce an SQL file (or files) for our DBO to run against the database. but I am not quite sure how to approach the problem...
Cheers

You should generally prefer set based solutions. I've no idea what the full solution would look like, but from the start you've given:
declare #Values table (Mandatory int,Fish int)
insert into #Values(Mandatory,Fish) values
(1,3),
(0,4),
(1,4),
(1,3),
(1,7),
;with Prices as (
select
Mandatory,
Fish,
CASE
WHEN Mandatory = 0 THEN f.PriceID
ELSE 55 /* Calculation for Mandatory = 1? */
END as InitialPriceID
from
#Values v
left join /* Or inner join? */
Fishes f
on
v.Fish = f.Fish
) select * from Prices
You should aim to compute all of the results in one go, rather than trying to "loop through" each calculation. SQL works better this way.

At the risk of over-simplifying things in C# or similar you could use a string processing approach:
class Program
{
static void Main(string[] args)
{
var sb = new StringBuilder();
foreach(var line in File.ReadLines(#"c:\myfile.csv"))
{
string[] values = line.Split(',');
int mandatory = Int32.Parse(values[0]);
int fish = Int32.Parse(values[1]);
sb.AppendLine(new Foo(mandatory, fish).ToString());
}
File.WriteAllText("#c:\myfile.sql", sb.ToString());
}
private sealed class Foo
{
public Foo(int mandatory, int fish)
{
this.Mandatory = mandatory;
this.Fish = fish;
}
public int Mandatory { get; private set; }
public int Fish { get; set; }
public override string ToString()
{
return String.Format(#"DECLARE #Mandatory int = {0}
DECLARE #Fish int = {1}
DECLARE #InitialPriceID int
if #Mandatory=
begin
select #InitialPriceID = priceID from Fishes where FishID = #Fish
end
", this.Mandatory, this.Fish);
}
}
}

There are many article on how to read from a text file through t-sql, check "Stored Procedure to Open and Read a text file" on SO and if you can make change the format of you input files into xml, then you can check SQL SERVER – Simple Example of Reading XML File Using T-SQL

Related

Android custom keyboard suggestions

I am building a custom keyboard for android, the one that atleast supports autocomplete suggestions. To achieve this, I am storing every word that user types (not password fields) in a Room database table which has a simple model, the word and its frequency. Now for showing the suggestions, I am using a Trie which is populated by words from this database table. My query is basically to order by the table based on frequency of the word and limit the results to 5K (I do not feel like overpopulating the Trie, these 5K words can be considered as the users' favourite words that he uses often and needs suggestions for). Now my actual problem is the ORDER BY clause, this is a rapidly growing data set, sorting lets say 0.1M words to get 5K words seems like an overkill. How can i rework this approach to improve the efficiency of this entire suggestions logic.
If not already implemented, an index on the frequency #ColumnInfo(index = true).
Another could be to add a table that maintains the highest 5k. Supported by yet another table (the support table) that has 1 row, with columns for; the highest frequency (not really required), the lowest frequency in the current 5k, and a 3rd column for the number currently held. So you could then, after adding an existing word get whether or not the new/updated word should be added to the 5k table (perhaps a 4th column for the primary key of the lowest to facilitate efficient deletion).
So
if the number currently held is less than 5k insert or update the 5k table and increment the number currently held in the support table.
otherwise if the number is lower then the lowest skip it.
otherwise update if it already exists.
otherwise delete the lowest, insert the replacement and then update the lowest accordingly in the support table.
Note that the 5K table would probably only need to store the rowid as a pointer/reference/map to the core table.
rowid is a column that virtually all tables will have in Room (virtual tables an exception as are table that have the WITHOUT ROWID attribute but Room does not facilitate (as far as I am aware) WITHOUT ROWID table).
The rowid can be up to twice as fast as other indexes. I would suggest using #PrimaryKey Long id=null; (java) or #PrimaryKey var id: Long?=null (Kotlin) and NOT using #PrimaryKey(autogenerate = true).
autogenerate = true equates to SQLite's AUTOINCREMENT, about which the SQLite documentation says "The AUTOINCREMENT keyword imposes extra CPU, memory, disk space, and disk I/O overhead and should be avoided if not strictly needed. It is usually not needed."
see https://www.sqlite.org/rowidtable.html, and also https://sqlite.org/autoinc.html
curiously/funnily the support table mentioned isn't that far away from what coding AUTOINCREMENT does.
a table with a row per table that has AUTOINCREMENT, is used (sqlite_sequence) that stores the table name and the highest ever allocated rowid.
Without AUTOINCREMENT but with <column_name> INTEGER PRIMARY KEY and no value or null for the primary key column's value then SQLite generates a value that is 1 greater than max(rowid).
With AUTOINCREMENT/autogenerate=true then the generated value is the greater of max(rowid) and the value stored, for that table, in the sqlite_sequence table (hence the overheads).
of course those overheads, will very likely be insignificant in comparison to sorting 0.1M rows.
Demonstration
The following is a demonstration albeit just using a basic Word table as the source.
First the 2 tables (#Entity annotated classes)
Word
#Entity (
indices = {#Index(value = {"word"},unique = true)}
)
class Word {
#PrimaryKey Long wordId=null;
#NonNull
String word;
#ColumnInfo(index = true)
long frequency;
Word(){}
#Ignore
Word(String word, long frequency) {
this.word = word;
this.frequency = frequency;
}
}
WordSubset aka the table with the highest occurring 5000 frequencies, it simply has a reference/map/link to the underlying/actual word. :-
#Entity(
foreignKeys = {
#ForeignKey(
entity = Word.class,
parentColumns = {"wordId"},
childColumns = {"wordIdMap"},
onDelete = ForeignKey.CASCADE,
onUpdate = ForeignKey.CASCADE
)
}
)
class WordSubset {
public static final long SUBSET_MAX_SIZE = 5000;
#PrimaryKey
long wordIdMap;
WordSubset(){};
#Ignore
WordSubset(long wordIdMap) {
this.wordIdMap = wordIdMap;
}
}
note the constant SUBSET_MAX_SIZE, hard coded just the once so a simple single change to adjust (lowering it after rows have been added may cause issues)
WordSubsetSupport this will be a single row table that contains the highest and lowest frequencies (highest is not really needed), the number of rows in the WordSubset table and a reference/map to the word with the lowest frequency.
#Entity(
foreignKeys = {
#ForeignKey(
entity = Word.class,
parentColumns = {"wordId"},
childColumns = {"lowestWordIdMap"}
)
}
)
class WordSubsetSupport {
#PrimaryKey
Long wordSubsetSupportId=null;
long highestFrequency;
long lowestFrequency;
long countOfRowsInSubsetTable;
#ColumnInfo(index = true)
long lowestWordIdMap;
WordSubsetSupport(){}
#Ignore
WordSubsetSupport(long highestFrequency, long lowestFrequency, long countOfRowsInSubsetTable, long lowestWordIdMap) {
this.highestFrequency = highestFrequency;
this.lowestFrequency = lowestFrequency;
this.countOfRowsInSubsetTable = countOfRowsInSubsetTable;
this.lowestWordIdMap = lowestWordIdMap;
this.wordSubsetSupportId = 1L;
}
}
For access an abstract class (rather than interface, as this, in Java, allows methods/functions with a body, a Kotlin interface allows these) CombinedDao :-
#Dao
abstract class CombinedDao {
#Insert(onConflict = OnConflictStrategy.IGNORE)
abstract long insert(Word word);
#Insert(onConflict = OnConflictStrategy.IGNORE)
abstract long insert(WordSubset wordSubset);
#Insert(onConflict = OnConflictStrategy.IGNORE)
abstract long insert(WordSubsetSupport wordSubsetSupport);
#Query("SELECT * FROM wordsubsetsupport LIMIT 1")
abstract WordSubsetSupport getWordSubsetSupport();
#Query("SELECT count() FROM wordsubsetsupport")
abstract long getWordSubsetSupportCount();
#Query("SELECT countOfRowsInSubsetTable FROM wordsubsetsupport")
abstract long getCountOfRowsInSubsetTable();
#Query("UPDATE wordsubsetsupport SET countOfRowsInSubsetTable=:updatedCount")
abstract void updateCountOfRowsInSubsetTable(long updatedCount);
#Query("UPDATE wordsubsetsupport " +
"SET countOfRowsInSubsetTable = (SELECT count(*) FROM wordsubset), " +
"lowestWordIdMap = (SELECT word.wordId FROM wordsubset JOIN word ON wordsubset.wordIdMap = word.wordId ORDER BY frequency ASC LIMIT 1)," +
"lowestFrequency = (SELECT coalesce(min(frequency),0) FROM wordsubset JOIN word ON wordsubset.wordIdMap = word.wordId)," +
"highestFrequency = (SELECT coalesce(max(frequency),0) FROM wordsubset JOIN word ON wordsubset.wordIdMap = word.wordId)")
abstract void autoUpdateWordSupportTable();
#Query("DELETE FROM wordsubset WHERE wordIdMap= (SELECT wordsubset.wordIdMap FROM wordsubset JOIN word ON wordsubset.wordIdMap = word.wordId ORDER BY frequency ASC LIMIT 1)")
abstract void deleteLowestFrequency();
#Transaction
#Query("")
int addWord(Word word) {
/* try to add the add word, setting the wordId value according to the result.
The result will be the wordId generated (1 or greater) or if the word already exists -1
*/
word.wordId = insert(word);
/* If the word was added and not rejected as a duplicate, then it may need to be added to the WordSubset table */
if (word.wordId > 0) {
/* Are there any rows in the support table? if not then add the very first entry/row */
if (getWordSubsetSupportCount() < 1) {
/* Need to add the word to the subset */
insert(new WordSubset(word.wordId));
/* Can now add the first (and only) row to the support table */
insert(new WordSubsetSupport(word.frequency,word.frequency,1,word.wordId));
autoUpdateWordSupportTable();
return 1;
}
/* If there are less than the maximum number of rows in the subset table then
1) insert the new subset row, and
2) update the support table accordingly
*/
if (getCountOfRowsInSubsetTable() < WordSubset.SUBSET_MAX_SIZE) {
insert(new WordSubset(word.wordId));
autoUpdateWordSupportTable();
return 2;
}
/*
Last case is that the subset table is at the maximum number of rows and
the frequency of the added word is greater than the lowest frequency in the
subset, so
1) the row with the lowest frequency is removed from the subset table and
2) the added word is added to the subset
3) the support table is updated accordingly
*/
if (getCountOfRowsInSubsetTable() >= WordSubset.SUBSET_MAX_SIZE) {
WordSubsetSupport currentWordSubsetSupport = getWordSubsetSupport();
if (word.frequency > currentWordSubsetSupport.lowestFrequency) {
deleteLowestFrequency();
insert(new WordSubset(word.wordId));
autoUpdateWordSupportTable();
return 3;
}
}
return 4; /* indicates word added but does not qualify for addition to the subset */
}
return -1;
}
}
The addWord method/function is the only method that is used as this automatically maintains the WordSubset and the WordSubsetSupport tables.
TheDatabase is a pretty standard #Database annotated class, other than that it allows use the main thread for the sake of convenience and brevity of the demo:-
#Database( entities = {Word.class,WordSubset.class,WordSubsetSupport.class}, version = TheDatabase.DATABASE_VERSION, exportSchema = false)
abstract class TheDatabase extends RoomDatabase {
abstract CombinedDao getCombinedDao();
private static volatile TheDatabase instance = null;
public static TheDatabase getInstance(Context context) {
if (instance == null) {
instance = Room.databaseBuilder(context,TheDatabase.class,DATABASE_NAME)
.addCallback(cb)
.allowMainThreadQueries()
.build();
}
return instance;
}
private static Callback cb = new Callback() {
#Override
public void onCreate(#NonNull SupportSQLiteDatabase db) {
super.onCreate(db);
}
#Override
public void onOpen(#NonNull SupportSQLiteDatabase db) {
super.onOpen(db);
}
};
public static final String DATABASE_NAME = "the_database.db";
public static final int DATABASE_VERSION = 1;
}
Finally activity code that randomly generates and adds 10,000 words (or thereabouts as some could be duplicate words), each word having a frequency that is also randomly generated (between 1 and 10000) :-
public class MainActivity extends AppCompatActivity {
TheDatabase db;
CombinedDao dao;
#Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
db = TheDatabase.getInstance(this);
dao = db.getCombinedDao();
for (int i=0; i < 10000; i++) {
Word currentWord = generateRandomWord();
Log.d("ADDINGWORD","Adding word " + currentWord.word + " frequency is " + currentWord.frequency);
dao.addWord(generateRandomWord());
}
}
public static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz";
private Word generateRandomWord() {
Random r = new Random();
int wordLength = (abs(r.nextInt()) % 24) + 1;
int frequency = abs(r.nextInt()) % 10000;
StringBuilder sb = new StringBuilder();
for (int i=0; i < wordLength; i++) {
int letter = abs(r.nextInt()) % (ALPHABET.length());
sb.append(ALPHABET.substring(letter,letter+1));
}
return new Word(sb.toString(),frequency);
}
}
Obviously the results would differ per run, also the demo is only really designed to be run once (although it could be run more).
After running, using AppInspection, then
The support table (in this instance) is :-
So as countOfRowsInSubsetTable is 5000 then the subset table has been filled to it's capacity/limit.
The highest frequency encountered as 9999 (as could well be expected)
The lowest frequency in the subset is 4690 and that is for the word with the wordId that is 7412.
The subset table on it's own means little as it just contains a map to the actual word. So it's more informative to use a query to look at what it contains.
e.g.
As can be seen the query shows that the word who's wordId is 7412 is the one with the lowest frequency of 4690 (as expected according to the support table)
Going to the last page shows:-

Arrange the values in a column in ascending order [duplicate]

We have a large database on which we have DB side pagination. This is quick, returning a page of 50 rows from millions of records in a small fraction of a second.
Users can define their own sort, basically choosing what column to sort by. Columns are dynamic - some have numeric values, some dates and some text.
While most sort as expected text sorts in a dumb way. Well, I say dumb, it makes sense to computers, but frustrates users.
For instance, sorting by a string record id gives something like:
rec1
rec10
rec14
rec2
rec20
rec3
rec4
...and so on.
I want this to take account of the number, so:
rec1
rec2
rec3
rec4
rec10
rec14
rec20
I can't control the input (otherwise I'd just format in leading 000s) and I can't rely on a single format - some are things like "{alpha code}-{dept code}-{rec id}".
I know a few ways to do this in C#, but can't pull down all the records to sort them, as that would be to slow.
Does anyone know a way to quickly apply a natural sort in Sql server?
We're using:
ROW_NUMBER() over (order by {field name} asc)
And then we're paging by that.
We can add triggers, although we wouldn't. All their input is parametrised and the like, but I can't change the format - if they put in "rec2" and "rec10" they expect them to be returned just like that, and in natural order.
We have valid user input that follows different formats for different clients.
One might go rec1, rec2, rec3, ... rec100, rec101
While another might go: grp1rec1, grp1rec2, ... grp20rec300, grp20rec301
When I say we can't control the input I mean that we can't force users to change these standards - they have a value like grp1rec1 and I can't reformat it as grp01rec001, as that would be changing something used for lookups and linking to external systems.
These formats vary a lot, but are often mixtures of letters and numbers.
Sorting these in C# is easy - just break it up into { "grp", 20, "rec", 301 } and then compare sequence values in turn.
However there may be millions of records and the data is paged, I need the sort to be done on the SQL server.
SQL server sorts by value, not comparison - in C# I can split the values out to compare, but in SQL I need some logic that (very quickly) gets a single value that consistently sorts.
#moebius - your answer might work, but it does feel like an ugly compromise to add a sort-key for all these text values.
order by LEN(value), value
Not perfect, but works well in a lot of cases.
Most of the SQL-based solutions I have seen break when the data gets complex enough (e.g. more than one or two numbers in it). Initially I tried implementing a NaturalSort function in T-SQL that met my requirements (among other things, handles an arbitrary number of numbers within the string), but the performance was way too slow.
Ultimately, I wrote a scalar CLR function in C# to allow for a natural sort, and even with unoptimized code the performance calling it from SQL Server is blindingly fast. It has the following characteristics:
will sort the first 1,000 characters or so correctly (easily modified in code or made into a parameter)
properly sorts decimals, so 123.333 comes before 123.45
because of above, will likely NOT sort things like IP addresses correctly; if you wish different behaviour, modify the code
supports sorting a string with an arbitrary number of numbers within it
will correctly sort numbers up to 25 digits long (easily modified in code or made into a parameter)
The code is here:
using System;
using System.Data.SqlTypes;
using System.Text;
using Microsoft.SqlServer.Server;
public class UDF
{
[SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic=true)]
public static SqlString Naturalize(string val)
{
if (String.IsNullOrEmpty(val))
return val;
while(val.Contains(" "))
val = val.Replace(" ", " ");
const int maxLength = 1000;
const int padLength = 25;
bool inNumber = false;
bool isDecimal = false;
int numStart = 0;
int numLength = 0;
int length = val.Length < maxLength ? val.Length : maxLength;
//TODO: optimize this so that we exit for loop once sb.ToString() >= maxLength
var sb = new StringBuilder();
for (var i = 0; i < length; i++)
{
int charCode = (int)val[i];
if (charCode >= 48 && charCode <= 57)
{
if (!inNumber)
{
numStart = i;
numLength = 1;
inNumber = true;
continue;
}
numLength++;
continue;
}
if (inNumber)
{
sb.Append(PadNumber(val.Substring(numStart, numLength), isDecimal, padLength));
inNumber = false;
}
isDecimal = (charCode == 46);
sb.Append(val[i]);
}
if (inNumber)
sb.Append(PadNumber(val.Substring(numStart, numLength), isDecimal, padLength));
var ret = sb.ToString();
if (ret.Length > maxLength)
return ret.Substring(0, maxLength);
return ret;
}
static string PadNumber(string num, bool isDecimal, int padLength)
{
return isDecimal ? num.PadRight(padLength, '0') : num.PadLeft(padLength, '0');
}
}
To register this so that you can call it from SQL Server, run the following commands in Query Analyzer:
CREATE ASSEMBLY SqlServerClr FROM 'SqlServerClr.dll' --put the full path to DLL here
go
CREATE FUNCTION Naturalize(#val as nvarchar(max)) RETURNS nvarchar(1000)
EXTERNAL NAME SqlServerClr.UDF.Naturalize
go
Then, you can use it like so:
select *
from MyTable
order by dbo.Naturalize(MyTextField)
Note: If you get an error in SQL Server along the lines of Execution of user code in the .NET Framework is disabled. Enable "clr enabled" configuration option., follow the instructions here to enable it. Make sure you consider the security implications before doing so. If you are not the db admin, make sure you discuss this with your admin before making any changes to the server configuration.
Note2: This code does not properly support internationalization (e.g., assumes the decimal marker is ".", is not optimized for speed, etc. Suggestions on improving it are welcome!
Edit: Renamed the function to Naturalize instead of NaturalSort, since it does not do any actual sorting.
I know this is an old question but I just came across it and since it's not got an accepted answer.
I have always used ways similar to this:
SELECT [Column] FROM [Table]
ORDER BY RIGHT(REPLICATE('0', 1000) + LTRIM(RTRIM(CAST([Column] AS VARCHAR(MAX)))), 1000)
The only common times that this has issues is if your column won't cast to a VARCHAR(MAX), or if LEN([Column]) > 1000 (but you can change that 1000 to something else if you want), but you can use this rough idea for what you need.
Also this is much worse performance than normal ORDER BY [Column], but it does give you the result asked for in the OP.
Edit: Just to further clarify, this the above will not work if you have decimal values such as having 1, 1.15 and 1.5, (they will sort as {1, 1.5, 1.15}) as that is not what is asked for in the OP, but that can easily be done by:
SELECT [Column] FROM [Table]
ORDER BY REPLACE(RIGHT(REPLICATE('0', 1000) + LTRIM(RTRIM(CAST([Column] AS VARCHAR(MAX)))) + REPLICATE('0', 100 - CHARINDEX('.', REVERSE(LTRIM(RTRIM(CAST([Column] AS VARCHAR(MAX))))), 1)), 1000), '.', '0')
Result: {1, 1.15, 1.5}
And still all entirely within SQL. This will not sort IP addresses because you're now getting into very specific number combinations as opposed to simple text + number.
RedFilter's answer is great for reasonably sized datasets where indexing is not critical, however if you want an index, several tweaks are required.
First, mark the function as not doing any data access and being deterministic and precise:
[SqlFunction(DataAccess = DataAccessKind.None,
SystemDataAccess = SystemDataAccessKind.None,
IsDeterministic = true, IsPrecise = true)]
Next, MSSQL has a 900 byte limit on the index key size, so if the naturalized value is the only value in the index, it must be at most 450 characters long. If the index includes multiple columns, the return value must be even smaller. Two changes:
CREATE FUNCTION Naturalize(#str AS nvarchar(max)) RETURNS nvarchar(450)
EXTERNAL NAME ClrExtensions.Util.Naturalize
and in the C# code:
const int maxLength = 450;
Finally, you will need to add a computed column to your table, and it must be persisted (because MSSQL cannot prove that Naturalize is deterministic and precise), which means the naturalized value is actually stored in the table but is still maintained automatically:
ALTER TABLE YourTable ADD nameNaturalized AS dbo.Naturalize(name) PERSISTED
You can now create the index!
CREATE INDEX idx_YourTable_n ON YourTable (nameNaturalized)
I've also made a couple of changes to RedFilter's code: using chars for clarity, incorporating duplicate space removal into the main loop, exiting once the result is longer than the limit, setting maximum length without substring etc. Here's the result:
using System.Data.SqlTypes;
using System.Text;
using Microsoft.SqlServer.Server;
public static class Util
{
[SqlFunction(DataAccess = DataAccessKind.None, SystemDataAccess = SystemDataAccessKind.None, IsDeterministic = true, IsPrecise = true)]
public static SqlString Naturalize(string str)
{
if (string.IsNullOrEmpty(str))
return str;
const int maxLength = 450;
const int padLength = 15;
bool isDecimal = false;
bool wasSpace = false;
int numStart = 0;
int numLength = 0;
var sb = new StringBuilder();
for (var i = 0; i < str.Length; i++)
{
char c = str[i];
if (c >= '0' && c <= '9')
{
if (numLength == 0)
numStart = i;
numLength++;
}
else
{
if (numLength > 0)
{
sb.Append(pad(str.Substring(numStart, numLength), isDecimal, padLength));
numLength = 0;
}
if (c != ' ' || !wasSpace)
sb.Append(c);
isDecimal = c == '.';
if (sb.Length > maxLength)
break;
}
wasSpace = c == ' ';
}
if (numLength > 0)
sb.Append(pad(str.Substring(numStart, numLength), isDecimal, padLength));
if (sb.Length > maxLength)
sb.Length = maxLength;
return sb.ToString();
}
private static string pad(string num, bool isDecimal, int padLength)
{
return isDecimal ? num.PadRight(padLength, '0') : num.PadLeft(padLength, '0');
}
}
Here's a solution written for SQL 2000. It can probably be improved for newer SQL versions.
/**
* Returns a string formatted for natural sorting. This function is very useful when having to sort alpha-numeric strings.
*
* #author Alexandre Potvin Latreille (plalx)
* #param {nvarchar(4000)} string The formatted string.
* #param {int} numberLength The length each number should have (including padding). This should be the length of the longest number. Defaults to 10.
* #param {char(50)} sameOrderChars A list of characters that should have the same order. Ex: '.-/'. Defaults to empty string.
*
* #return {nvarchar(4000)} A string for natural sorting.
* Example of use:
*
* SELECT Name FROM TableA ORDER BY Name
* TableA (unordered) TableA (ordered)
* ------------ ------------
* ID Name ID Name
* 1. A1. 1. A1-1.
* 2. A1-1. 2. A1.
* 3. R1 --> 3. R1
* 4. R11 4. R11
* 5. R2 5. R2
*
*
* As we can see, humans would expect A1., A1-1., R1, R2, R11 but that's not how SQL is sorting it.
* We can use this function to fix this.
*
* SELECT Name FROM TableA ORDER BY dbo.udf_NaturalSortFormat(Name, default, '.-')
* TableA (unordered) TableA (ordered)
* ------------ ------------
* ID Name ID Name
* 1. A1. 1. A1.
* 2. A1-1. 2. A1-1.
* 3. R1 --> 3. R1
* 4. R11 4. R2
* 5. R2 5. R11
*/
ALTER FUNCTION [dbo].[udf_NaturalSortFormat](
#string nvarchar(4000),
#numberLength int = 10,
#sameOrderChars char(50) = ''
)
RETURNS varchar(4000)
AS
BEGIN
DECLARE #sortString varchar(4000),
#numStartIndex int,
#numEndIndex int,
#padLength int,
#totalPadLength int,
#i int,
#sameOrderCharsLen int;
SELECT
#totalPadLength = 0,
#string = RTRIM(LTRIM(#string)),
#sortString = #string,
#numStartIndex = PATINDEX('%[0-9]%', #string),
#numEndIndex = 0,
#i = 1,
#sameOrderCharsLen = LEN(#sameOrderChars);
-- Replace all char that have the same order by a space.
WHILE (#i <= #sameOrderCharsLen)
BEGIN
SET #sortString = REPLACE(#sortString, SUBSTRING(#sameOrderChars, #i, 1), ' ');
SET #i = #i + 1;
END
-- Pad numbers with zeros.
WHILE (#numStartIndex <> 0)
BEGIN
SET #numStartIndex = #numStartIndex + #numEndIndex;
SET #numEndIndex = #numStartIndex;
WHILE(PATINDEX('[0-9]', SUBSTRING(#string, #numEndIndex, 1)) = 1)
BEGIN
SET #numEndIndex = #numEndIndex + 1;
END
SET #numEndIndex = #numEndIndex - 1;
SET #padLength = #numberLength - (#numEndIndex + 1 - #numStartIndex);
IF #padLength < 0
BEGIN
SET #padLength = 0;
END
SET #sortString = STUFF(
#sortString,
#numStartIndex + #totalPadLength,
0,
REPLICATE('0', #padLength)
);
SET #totalPadLength = #totalPadLength + #padLength;
SET #numStartIndex = PATINDEX('%[0-9]%', RIGHT(#string, LEN(#string) - #numEndIndex));
END
RETURN #sortString;
END
I know this is a bit old at this point, but in my search for a better solution, I came across this question. I'm currently using a function to order by. It works fine for my purpose of sorting records which are named with mixed alpha numeric ('item 1', 'item 10', 'item 2', etc)
CREATE FUNCTION [dbo].[fnMixSort]
(
#ColValue NVARCHAR(255)
)
RETURNS NVARCHAR(1000)
AS
BEGIN
DECLARE #p1 NVARCHAR(255),
#p2 NVARCHAR(255),
#p3 NVARCHAR(255),
#p4 NVARCHAR(255),
#Index TINYINT
IF #ColValue LIKE '[a-z]%'
SELECT #Index = PATINDEX('%[0-9]%', #ColValue),
#p1 = LEFT(CASE WHEN #Index = 0 THEN #ColValue ELSE LEFT(#ColValue, #Index - 1) END + REPLICATE(' ', 255), 255),
#ColValue = CASE WHEN #Index = 0 THEN '' ELSE SUBSTRING(#ColValue, #Index, 255) END
ELSE
SELECT #p1 = REPLICATE(' ', 255)
SELECT #Index = PATINDEX('%[^0-9]%', #ColValue)
IF #Index = 0
SELECT #p2 = RIGHT(REPLICATE(' ', 255) + #ColValue, 255),
#ColValue = ''
ELSE
SELECT #p2 = RIGHT(REPLICATE(' ', 255) + LEFT(#ColValue, #Index - 1), 255),
#ColValue = SUBSTRING(#ColValue, #Index, 255)
SELECT #Index = PATINDEX('%[0-9,a-z]%', #ColValue)
IF #Index = 0
SELECT #p3 = REPLICATE(' ', 255)
ELSE
SELECT #p3 = LEFT(REPLICATE(' ', 255) + LEFT(#ColValue, #Index - 1), 255),
#ColValue = SUBSTRING(#ColValue, #Index, 255)
IF PATINDEX('%[^0-9]%', #ColValue) = 0
SELECT #p4 = RIGHT(REPLICATE(' ', 255) + #ColValue, 255)
ELSE
SELECT #p4 = LEFT(#ColValue + REPLICATE(' ', 255), 255)
RETURN #p1 + #p2 + #p3 + #p4
END
Then call
select item_name from my_table order by fnMixSort(item_name)
It easily triples the processing time for a simple data read, so it may not be the perfect solution.
Here is an other solution that I like:
http://www.dreamchain.com/sql-and-alpha-numeric-sort-order/
It's not Microsoft SQL, but since I ended up here when I was searching for a solution for Postgres, I thought adding this here would help others.
EDIT: Here is the code, in case the link goes away.
CREATE or REPLACE FUNCTION pad_numbers(text) RETURNS text AS $$
SELECT regexp_replace(regexp_replace(regexp_replace(regexp_replace(($1 collate "C"),
E'(^|\\D)(\\d{1,3}($|\\D))', E'\\1000\\2', 'g'),
E'(^|\\D)(\\d{4,6}($|\\D))', E'\\1000\\2', 'g'),
E'(^|\\D)(\\d{7}($|\\D))', E'\\100\\2', 'g'),
E'(^|\\D)(\\d{8}($|\\D))', E'\\10\\2', 'g');
$$ LANGUAGE SQL;
"C" is the default collation in postgresql; you may specify any collation you desire, or remove the collation statement if you can be certain your table columns will never have a nondeterministic collation assigned.
usage:
SELECT * FROM wtf w
WHERE TRUE
ORDER BY pad_numbers(w.my_alphanumeric_field)
For the following varchar data:
BR1
BR2
External Location
IR1
IR2
IR3
IR4
IR5
IR6
IR7
IR8
IR9
IR10
IR11
IR12
IR13
IR14
IR16
IR17
IR15
VCR
This worked best for me:
ORDER BY substring(fieldName, 1, 1), LEN(fieldName)
If you're having trouble loading the data from the DB to sort in C#, then I'm sure you'll be disappointed with any approach at doing it programmatically in the DB. When the server is going to sort, it's got to calculate the "perceived" order just as you would have -- every time.
I'd suggest that you add an additional column to store the preprocessed sortable string, using some C# method, when the data is first inserted. You might try to convert the numerics into fixed-width ranges, for example, so "xyz1" would turn into "xyz00000001". Then you could use normal SQL Server sorting.
At the risk of tooting my own horn, I wrote a CodeProject article implementing the problem as posed in the CodingHorror article. Feel free to steal from my code.
Simply you sort by
ORDER BY
cast (substring(name,(PATINDEX('%[0-9]%',name)),len(name))as int)
##
I've just read a article somewhere about such a topic. The key point is: you only need the integer value to sort data, while the 'rec' string belongs to the UI. You could split the information in two fields, say alpha and num, sort by alpha and num (separately) and then showing a string composed by alpha + num. You could use a computed column to compose the string, or a view.
Hope it helps
You can use the following code to resolve the problem:
Select *,
substring(Cote,1,len(Cote) - Len(RIGHT(Cote, LEN(Cote) - PATINDEX('%[0-9]%', Cote)+1)))alpha,
CAST(RIGHT(Cote, LEN(Cote) - PATINDEX('%[0-9]%', Cote)+1) AS INT)intv
FROM Documents
left outer join Sites ON Sites.IDSite = Documents.IDSite
Order BY alpha, intv
regards,
rabihkahaleh#hotmail.com
I'm fashionably late to the party as usual. Nevertheless, here is my attempt at an answer that seems to work well (I would say that). It assumes text with digits at the end, like in the original example data.
First a function that won't end up winning a "pretty SQL" competition anytime soon.
CREATE FUNCTION udfAlphaNumericSortHelper (
#string varchar(max)
)
RETURNS #results TABLE (
txt varchar(max),
num float
)
AS
BEGIN
DECLARE #txt varchar(max) = #string
DECLARE #numStr varchar(max) = ''
DECLARE #num float = 0
DECLARE #lastChar varchar(1) = ''
set #lastChar = RIGHT(#txt, 1)
WHILE #lastChar <> '' and #lastChar is not null
BEGIN
IF ISNUMERIC(#lastChar) = 1
BEGIN
set #numStr = #lastChar + #numStr
set #txt = Substring(#txt, 0, len(#txt))
set #lastChar = RIGHT(#txt, 1)
END
ELSE
BEGIN
set #lastChar = null
END
END
SET #num = CAST(#numStr as float)
INSERT INTO #results select #txt, #num
RETURN;
END
Then call it like below:
declare #str nvarchar(250) = 'sox,fox,jen1,Jen0,jen15,jen02,jen0004,fox00,rec1,rec10,jen3,rec14,rec2,rec20,rec3,rec4,zip1,zip1.32,zip1.33,zip1.3,TT0001,TT01,TT002'
SELECT tbl.value --, sorter.txt, sorter.num
FROM STRING_SPLIT(#str, ',') as tbl
CROSS APPLY dbo.udfAlphaNumericSortHelper(value) as sorter
ORDER BY sorter.txt, sorter.num, len(tbl.value)
With results:
fox
fox00
Jen0
jen1
jen02
jen3
jen0004
jen15
rec1
rec2
rec3
rec4
rec10
rec14
rec20
sox
TT01
TT0001
TT002
zip1
zip1.3
zip1.32
zip1.33
I still don't understand (probably because of my poor English).
You could try:
ROW_NUMBER() OVER (ORDER BY dbo.human_sort(field_name) ASC)
But it won't work for millions of records.
That why I suggested to use trigger which fills separate column with human value.
Moreover:
built-in T-SQL functions are really
slow and Microsoft suggest to use
.NET functions instead.
human value is constant so there is no point calculating it each time
when query runs.

How to convert bit column to int

I want to convert a bit column to a integer column, do I need a case and convert function at the same time for this?
False = 0
True = 1
You do not need a conversion, because bit is already an integer data type:
An integer data type that can take a value of 1, 0, or NULL.
You can use bits in integer expressions without conversion. Here is a short demo:
create table demo (b bit, v int);
insert into demo (b, v) values (1,5), (0,4), (1, -2), (0, -5);
SELECT b, v, b+v AS b_plus_v FROM demo
Running this produces the following output:
B V B_PLUS_V
- - --------
1 5 6
0 4 4
1 -2 -1
0 -5 -5
EDIT : (based on this comment: "I'm using Code first EF")
Entity Framework requires that a bit column mapped to a bool field. One way to work around this requirement is introducing a computed property to your entity class to hide the "Booleanness" of the underlying column, like this:
partial class MyEntity {
// This code assumes that a bool property MyBoolProperty exists,
// and that it is mapped to the table using EF
public int MyIntProperty {
get {
return MyBoolProperty ? 1 : 0;
}
set {
MyBoolProperty = value != 0;
}
}
}

Unit testing calling of stored procedures that return a value with Simple.Data

I have the following stored procedure:
[dbo].[MyStoredProcedure]
#field1 int, #field2 int, #field3 nvarchar(5), #returnValue int output
I call it using Simple.Data and get the return value as follows:
var result = db.MyStoredProcedure(intParam1, intParam2, stringParam3, new int());
int returnValue = result.OutputValues["returnValue"];
I am trying to test the code above to ensure it gets returnValue from the OutputValues dictionary and handles the result of that correctly. To do so, I need to fake the stored procedure MyStoredProcedure.
I can see a good example of how to test stored procedures that add rows to the database in the Simple.Data documentation here:
Simple.Data documentation, but I cannot see how to return values from this function.
How do I simulate a stored procedure that takes a number of arguments (in my case int, int, nvarchar, int) and add my fake returnValue key (with corresponding int value) to the OutputValues dictionary?
I generally try to avoid "unit testing" code that has permanent side-effects, but you can do it by setting up a transaction scope in your unit test:
TransactionScope _trans;
[TestInitialize()]
public void Init()
{
_trans = new TransactionScope(); // start a new transaction
base.InitializeTest();
}
[TestCleanup()]
public void Cleanup()
{
base.CleanupTest();
_trans.Dispose(); // rollback the transaction
}
[TestMethod()]
public void TransactedTest()
{
// set up test parameters
var result = db.MyStoredProcedure(intParam1, intParam2, stringParam3, new int());
int returnValue = result.OutputValues["returnValue"];
// validate return value
}

SQL set-based range

How can I have SQL repeat some set-based operation an arbitrary number of times without looping? How can I have SQL perform an operation against a range of numbers? I'm basically looking for a way to do a set-based for loop.
I know I can just create a small table with integers in it, say from 1 to 1000 and then use it for range operations that are within that range.
For example, if I had that table I could make a select to find the sum of numbers 100-200 like this:
select sum(n) from numbers where n between 100 and 200
Any ideas? I'm kinda looking for something that works for T-SQL but any platform would be okay.
[Edit] I have my own solution for this using SQL CLR which works great for MS SQL 2005 or 2008. See below.
I think the very short answer to your question is to use WITH clauses to generate your own.
Unfortunately, the big names in databases don't have built-in queryable number-range pseudo-tables. Or, more generally, easy pure-SQL data generation features. Personally, I think this is a huge failing, because if they did it would be possible to move a lot of code that is currently locked up in procedural scripts (T-SQL, PL/SQL, etc.) into pure-SQL, which has a number of benefits to performance and code complexity.
So anyway, it sounds like what you need in a general sense is the ability to generate data on the fly.
Oracle and T-SQL both support a WITH clause that can be used to do this. They work a little differently in the different DBMS's, and MS calls them "common table expressions", but they are very similar in form. Using these with recursion, you can generate a sequence of numbers or text values fairly easily. Here is what it might look like...
In Oracle SQL:
WITH
digits AS -- Limit recursion by just using it for digits.
(SELECT
LEVEL - 1 AS num
FROM
DUAL
WHERE
LEVEL < 10
CONNECT BY
num = (PRIOR num) + 1),
numrange AS
(SELECT
ones.num
+ (tens.num * 10)
+ (hundreds.num * 100)
AS num
FROM
digits ones
CROSS JOIN
digits tens
CROSS JOIN
digits hundreds
WHERE
hundreds.num in (1, 2)) -- Use the WHERE clause to restrict each digit as needed.
SELECT
-- Some columns and operations
FROM
numrange
-- Join to other data if needed
This is admittedly quite verbose. Oracle's recursion functionality is limited. The syntax is clunky, it's not performant, and it is limited to 500 (I think) nested levels. This is why I chose to use recursion only for the first 10 digits, and then cross (cartesian) joins to combine them into actual numbers.
I haven't used SQL Server's Common Table Expressions myself, but since they allow self-reference, recursion is MUCH simpler than it is in Oracle. Whether performance is comparable, and what the nesting limits are, I don't know.
At any rate, recursion and the WITH clause are very useful tools in creating queries that require on-the-fly generated data sets. Then by querying this data set, doing operations on the values, you can get all sorts of different types of generated data. Aggregations, duplications, combinations, permutations, and so on. You can even use such generated data to aid in rolling up or drilling down into other data.
UPDATE: I just want to add that, once you start working with data in this way, it opens your mind to new ways of thinking about SQL. It's not just a scripting language. It's a fairly robust data-driven declarative language. Sometimes it's a pain to use because for years it has suffered a dearth of enhancements to aid in reducing the redundancy needed for complex operations. But nonetheless it is very powerful, and a fairly intuitive way to work with data sets as both the target and the driver of your algorithms.
I created a SQL CLR table valued function that works great for this purpose.
SELECT n FROM dbo.Range(1, 11, 2) -- returns odd integers 1 to 11
SELECT n FROM dbo.RangeF(3.1, 3.5, 0.1) -- returns 3.1, 3.2, 3.3 and 3.4, but not 3.5 because of float inprecision. !fault(this)
Here's the code:
using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Collections;
[assembly: CLSCompliant(true)]
namespace Range {
public static partial class UserDefinedFunctions {
[Microsoft.SqlServer.Server.SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true, SystemDataAccess = SystemDataAccessKind.None, IsPrecise = true, FillRowMethodName = "FillRow", TableDefinition = "n bigint")]
public static IEnumerable Range(SqlInt64 start, SqlInt64 end, SqlInt64 incr) {
return new Ranger(start.Value, end.Value, incr.Value);
}
[Microsoft.SqlServer.Server.SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true, SystemDataAccess = SystemDataAccessKind.None, IsPrecise = true, FillRowMethodName = "FillRowF", TableDefinition = "n float")]
public static IEnumerable RangeF(SqlDouble start, SqlDouble end, SqlDouble incr) {
return new RangerF(start.Value, end.Value, incr.Value);
}
public static void FillRow(object row, out SqlInt64 n) {
n = new SqlInt64((long)row);
}
public static void FillRowF(object row, out SqlDouble n) {
n = new SqlDouble((double)row);
}
}
internal class Ranger : IEnumerable {
Int64 _start, _end, _incr;
public Ranger(Int64 start, Int64 end, Int64 incr) {
_start = start; _end = end; _incr = incr;
}
public IEnumerator GetEnumerator() {
return new RangerEnum(_start, _end, _incr);
}
}
internal class RangerF : IEnumerable {
double _start, _end, _incr;
public RangerF(double start, double end, double incr) {
_start = start; _end = end; _incr = incr;
}
public IEnumerator GetEnumerator() {
return new RangerFEnum(_start, _end, _incr);
}
}
internal class RangerEnum : IEnumerator {
Int64 _cur, _start, _end, _incr;
bool hasFetched = false;
public RangerEnum(Int64 start, Int64 end, Int64 incr) {
_start = _cur = start; _end = end; _incr = incr;
if ((_start < _end ^ _incr > 0) || _incr == 0)
throw new ArgumentException("Will never reach end!");
}
public long Current {
get { hasFetched = true; return _cur; }
}
object IEnumerator.Current {
get { hasFetched = true; return _cur; }
}
public bool MoveNext() {
if (hasFetched) _cur += _incr;
return (_cur > _end ^ _incr > 0);
}
public void Reset() {
_cur = _start; hasFetched = false;
}
}
internal class RangerFEnum : IEnumerator {
double _cur, _start, _end, _incr;
bool hasFetched = false;
public RangerFEnum(double start, double end, double incr) {
_start = _cur = start; _end = end; _incr = incr;
if ((_start < _end ^ _incr > 0) || _incr == 0)
throw new ArgumentException("Will never reach end!");
}
public double Current {
get { hasFetched = true; return _cur; }
}
object IEnumerator.Current {
get { hasFetched = true; return _cur; }
}
public bool MoveNext() {
if (hasFetched) _cur += _incr;
return (_cur > _end ^ _incr > 0);
}
public void Reset() {
_cur = _start; hasFetched = false;
}
}
}
and I deployed it like this:
create assembly Range from 'Range.dll' with permission_set=safe -- mod path to point to actual dll location on disk.
go
create function dbo.Range(#start bigint, #end bigint, #incr bigint)
returns table(n bigint)
as external name [Range].[Range.UserDefinedFunctions].[Range]
go
create function dbo.RangeF(#start float, #end float, #incr float)
returns table(n float)
as external name [Range].[Range.UserDefinedFunctions].[RangeF]
go
This is basically one of those things that reveal SQL to be less than ideal. I'm thinking maybe the right way to do this is to build a function that creates the range. (Or a generator.)
I believe the correct answer to your question is basically, "you can't".
(Sorry.)
You can use a common table expression to do this in SQL2005+.
WITH CTE AS
(
SELECT 100 AS n
UNION ALL
SELECT n + 1 AS n FROM CTE WHERE n + 1 <= 200
)
SELECT n FROM CTE
If using SQL Server 2000 or greater, you could use the table datatype to avoid creating a normal or temporary table. Then use the normal table operations on it.
With this solution you have essentially a table structure in memory that you can use almost like a real table, but much more performant.
I found a good discussion here: Temporary tables vs the table data type
Here's a hack you should never use:
select sum(numberGenerator.rank)
from
(
select
rank = ( select count(*)
from reallyLargeTable t1
where t1.uniqueValue > t2.uniqueValue ),
t2.uniqueValue id1,
t2.uniqueValue id2
from reallyLargeTable t2
) numberGenerator
where rank between 1 and 10
You can simplify this using the Rank() or Row_Number functions in SQL 2005