Diff Tool That Ignores Newlines [closed] - sql

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
I frequently need to compare SQL procedures to determine what has changed in the newest version. The problem is, everyone has their own style of formatting, and SQL doesn't (usually) care about where one puts their newlines (e.g. where clauses all on one line vs. newline before each AND).
This makes it very difficult (especially for long procedures) to see the actual differences. I cannot seem to find a free diff/merge utility that will allow me to ignore newlines (i.e. treat as whitespace). So far I've tried WinMerge and Beyond Compare without any luck. Does anyone know of a diff tool (ideally free) that would see these two examples as identical?
Ex. 1:
the quick
brown
Ex. 2:
the
quick
brown
Thanks in advance.
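If you would rather script this than install a tool, here is a minimal Python sketch (standard library only) of the word-by-word idea behind the wdiff answer further down. It splits both inputs on any whitespace, newlines included, and diffs the resulting word lists with difflib. The function names are illustrative. The naive split() also splits string literals that contain spaces, and it treats `a=1` and `a = 1` as different words, so treat it as a starting point rather than a SQL-aware comparison.

```python
import difflib


def words(text: str) -> list[str]:
    # str.split() with no argument splits on any run of whitespace,
    # so a newline counts exactly like a single space.
    return text.split()


def same_ignoring_newlines(old: str, new: str) -> bool:
    return words(old) == words(new)


def word_diff(old: str, new: str) -> str:
    """Render differences wdiff-style: [-removed-] and {+added+}."""
    a, b = words(old), words(new)
    out: list[str] = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(a[i1:i2])
        if op in ("delete", "replace"):
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if op in ("insert", "replace"):
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


ex1 = "the quick\nbrown"
ex2 = "the\nquick\nbrown"
print(same_ignoring_newlines(ex1, ex2))  # True
print(word_diff(ex1 + " fox", ex2))      # the quick brown [-fox-]
```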

I really like SourceGear's DiffMerge!
It works on all platforms and has built-in rulesets, but it also lets you create and add your own, which means you can ignore what you want, when you want.
Bonus: it is free!

In a similar case of my own, I used a SQL prettifier, which automatically lays out two sets of semi-disparate SQL in a very similar way. I then paste the results into WinMerge and compare them.
It's a two-step process, but it's much more palatable than many other options, especially when many lines of code are involved.
Link to a decent web-based SQL pretty printer.
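The prettify-then-diff approach can be roughly sketched in a few lines of Python. The sketch assumes a tiny, hypothetical list of clause keywords. It collapses all whitespace, starts a new line before each clause keyword, and then runs an ordinary line diff. It does not parse SQL, so a keyword inside a string literal or after a dot (such as `t.from`) would also trigger a line break. A real prettifier handles these cases properly.

```python
import difflib
import re

# Hypothetical, deliberately tiny keyword list; a real prettifier knows far more.
# "ORDER BY" / "GROUP BY" come before "OR" so the longer phrases are tried first.
CLAUSES = r"ORDER BY|GROUP BY|SELECT|FROM|WHERE|JOIN|AND|OR"


def normalize_sql(sql: str) -> list[str]:
    flat = " ".join(sql.split())  # collapse all whitespace, newlines included
    # Start a new line before each clause keyword and upper-case the keyword.
    broken = re.sub(r"\s*\b(" + CLAUSES + r")\b",
                    lambda m: "\n" + m.group(1).upper(),
                    flat, flags=re.IGNORECASE)
    return [line.strip() for line in broken.strip().splitlines()]


def diff_sql(old: str, new: str) -> list[str]:
    # Empty list when the two queries differ only in layout or keyword case.
    return list(difflib.unified_diff(normalize_sql(old), normalize_sql(new),
                                     "old.sql", "new.sql", lineterm=""))


old = "select id, name from users where active = 1 and region = 'EU'"
new = "SELECT id, name\nFROM users\nWHERE active = 1\n  AND region = 'US'"
print("\n".join(diff_sql(old, new)))
```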

I love Araxis Merge. It's not free, but it's well worth it. It can, among other things, ignore any kind of whitespace if you want.

You can use the DTP (Data Tools Platform) of the Eclipse IDE.
To show this, I created two almost identical SQL files and let Eclipse show me the differences. After clicking "show next", I took a screenshot.
As you can see, it still highlights the newlines, but the way it does so makes it immediately obvious that they contain no substantive change to the SQL. It's easy to spot where I changed the ID from 1 to 2.
Here's the result.

Compare++ is an option: try "Ignore code style changes" in the 'smart' menu. It supports structured comparison for many languages, such as C/C++, JavaScript, C#, Java, ...

Regardless of your definition of "free" (beer vs. speech/libre), Poor Man's T-SQL Formatter can also do this. You can use it with WinMerge (through the WinMerge plugin), or with Beyond Compare and other comparison tools that allow command-line pre-formatting (through the command-line bulk formatter).
If you prefer to take it for a whirl without downloading anything, it's available for immediate use online (like its non-libre counterparts T-SQL Tidy, Instant SQL Formatter, etc):
http://poorsql.com

Our SD Smart Differencer compares two source programs according to their precise grammatical syntax and structure, rather than according to raw text. It does this by parsing (SQL) source the way a compiler would and comparing the corresponding compiler data structures (e.g., abstract syntax trees). As a result, the Smart Differencer does not care about newlines, whitespace or intervening comments.
It reports differences, not in terms of line breaks, but rather in terms of programming language structures (variables, expressions, statements, blocks, functions, ...) and in terms close to programmer intentions (delete, insert, move, copy, rename) rather than line-insert or line delete.
SQL (like many other computer language names) is the name of a family of computer languages that are similar in syntax but differ in detail. So for the Smart Differencer, the dialect of SQL you are using matters. We have SQL front ends (and therefore Smart Differencers) for PLSQL and SQL2011. To the extent that your SQL stays within the bounds of either of these, the Smart Differencer can work for you. To the extent that you use extra goodies of SQL Server or Postgres, the Smart Differencer presently can't help you. [We develop language parsers as part of our business, so I expect this is a matter of delay rather than never.]
While the OP asked about SQL in the details, his headline question is language-agnostic.
Smart Differencers already exist for many other widely used languages besides SQL: C, C++, C#, Java, ...
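The Smart Differencer's own implementation is not shown here. To illustrate the general idea of comparing parse trees instead of text, here is a small sketch. Python's standard library has no SQL parser, so the sketch uses the built-in ast module on Python source instead. It only answers "same structure or not". A real structural differencer goes further and reports the tree edits (insert, delete, move, rename).

```python
import ast


def same_structure(src_a: str, src_b: str) -> bool:
    # Comments and layout never reach the AST, and ast.dump() leaves out
    # line/column attributes by default, so only structure is compared.
    return ast.dump(ast.parse(src_a)) == ast.dump(ast.parse(src_b))


a = "total = price * qty  # per line item\n"
b = "total = (price\n         * qty)\n"
c = "total = price + qty\n"
print(same_structure(a, b))  # True: only layout, parentheses and comments differ
print(same_structure(a, c))  # False: the operator changed
```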

Another alternative is Emacs' Ediff. Works great if you are not afraid of Emacs.

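You can use the command-line tool wdiff to ignore newlines. wdiff is a GNU tool that compares files word by word, so line breaks are treated like any other whitespace between words. The -n (--avoid-wraps) option also keeps the change markers from extending across line breaks.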
Suppose I put your two examples into ex1.txt and ex2.txt. Then we can run:
$> wdiff -n ex1.txt ex2.txt
the
quick
brown
The output is actually the contents of the first file. Note that there are no [-...-] or {+...+} change markers, which means the files contain the same words.
If I had added "fox" to the end of ex1.txt, then the output would look like this:
the
quick
brown [-fox-]
If seeing the common words still bothers you, you can add -3 or --no-common. Here's the example again where I added "fox" to the first file:
$> wdiff -n -3 /tmp/ex1.txt /tmp/ex2.txt
======================================================================
[-fox-]
======================================================================

PhpStorm's diff tool has an "ignore white space: all" option that does exactly what you want. It also has integrated support for many VCSs, such as SVN and Git, as well as integrated SQL support!
It's not free, but time isn't free either. Want to waste time doing it the hard way? Go ahead.
I still can't believe it's 2014 and this isn't a standard feature of all diff tools!!
BTW I believe WebStorm's diff tool would also work.

Have you tried KDiff? I'm certain you can ignore whitespace with it, and if it's not powerful enough for you it allows you to run a preprocessor over the file. Best of all it's free and open source.

If you're on Windows, WinMerge is pretty slick. Under Linux (and maybe OS X), there's Meld.
Both are free as in beer and work pretty well. Not quite as cool as Araxis, but then we don't want you drooling on your desk.
Both are all-purpose diff tools with such features as white space ignoring. I'm not absolutely certain they ignore blank lines, but chances are good they can.

Related

Processing text of SQL script

I want to develop a tool that will prettify SQL scripts. It should make all special words and commands (SELECT, JOIN, FROM, etc.) upper or lower case, add square brackets, and do a couple of other things (yes, ). I'm going to implement it either as an extension for my IDE or as an external tool; I haven't decided yet.
My plan was to split a script by spaces, brackets, commas and periods to get separate words, and then check whether each word matches one of the keywords. If it does, the tool capitalizes or lowercases the word depending on the settings. If not, it leaves the word as it was.
But then it occurred to me that there might be other solutions.
I thought about using RegEx (unfortunately I don't know much about it). I suppose it would work more efficiently and would therefore be preferable.
Is RegEx the best way to achieve my goal, or is my initial approach also appropriate?
Are there other ways?
P.S. I know that similar tools may already exist, and I would appreciate it if you shared them. But I want to implement my own tool for self-education.
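On the RegEx question: a single regular expression can work well as a tokenizer here, as long as it also recognizes the things that must not be touched. Below is a minimal Python sketch. It assumes a tiny, hypothetical keyword set (T-SQL has hundreds of keywords), T-SQL-style '' quote escapes and [bracketed] identifiers. It changes the case of keywords but leaves string literals, comments and bracketed identifiers alone. Plain splitting on spaces and punctuation would get those wrong.

```python
import re

# A small, hypothetical keyword set for illustration only.
KEYWORDS = {"select", "from", "where", "join", "inner", "left", "on",
            "and", "or", "as", "order", "group", "by"}

# Alternatives are tried left to right at each position in the script.
TOKEN = re.compile(
    r"""
      (?P<comment>--[^\n]*|/\*.*?\*/)    # line or block comment
    | (?P<string>'(?:[^']|'')*')         # string literal; '' escapes a quote
    | (?P<bracketed>\[[^\]]*\])          # [bracketed identifier]
    | (?P<word>[A-Za-z_][A-Za-z0-9_]*)   # keyword or plain identifier
    """,
    re.VERBOSE | re.DOTALL,
)


def recase(sql: str, upper: bool = True) -> str:
    def fix(m: re.Match) -> str:
        word = m.group("word")
        if word is not None and word.lower() in KEYWORDS:
            return word.upper() if upper else word.lower()
        return m.group(0)  # comments, strings, identifiers stay as written
    # Text the pattern does not match (spaces, operators, numbers) passes through.
    return TOKEN.sub(fix, sql)


print(recase("select name from [from] where note = 'select me' -- from here"))
```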

Are there tools that can execute a SQL script one statement at a time?

When I have to run database migration scripts, I tend to exercise a large degree of paranoia and not run the script all at once like dbcli < migration.sql. Instead, I prefer to run the commands one-at-a-time. So far, I've just been using copy/paste which is miserable.
There has to be a tool that can do this kind of thing, but I'm having a hard time finding one using Google, Wikipedia, or StackOverflow (close but no cigar).
This is definitely something I could write myself, but it just has to exist already, doesn't it?
This really needs to be something that can be run from a command-line with a tiny bit of interactivity (like display the statement that will be executed, let you press e.g. ENTER to execute it, then show you the output if there is any) since servers usually don't have any GUI available.
My specific db target is MySQL but there's no need for such a tool to be db-specific.
Update
Meanwhile, I'm writing a utility in Java that will do what I want.
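The basic loop such a utility needs can be sketched in Python: split the script into statements, show each one, wait for Enter, then execute it and print any rows. This is a minimal sketch under stated assumptions. The splitter is naive: it only respects quotes, and it does not understand comments, MySQL's DELIMITER command or stored-procedure bodies. Python's built-in sqlite3 stands in for MySQL because it needs no extra install. The code sticks to generic DB-API cursor calls, so a MySQL driver's connection may work in its place, but that is untested here.

```python
import sqlite3
from typing import Callable, Iterator


def split_statements(script: str) -> Iterator[str]:
    """Yield statements one at a time, cutting at semicolons outside quotes."""
    buf: list[str] = []
    quote = None
    for ch in script:
        buf.append(ch)
        if quote:
            if ch == quote:        # closing quote ('' escapes toggle out and back in)
                quote = None
        elif ch in ("'", '"'):
            quote = ch
        elif ch == ";":
            stmt = "".join(buf).strip()
            buf = []
            if stmt != ";":        # skip empty statements such as ";;"
                yield stmt
    tail = "".join(buf).strip()
    if tail:                       # last statement without a trailing semicolon
        yield tail


def step_through(conn, script: str, ask: Callable[[str], str] = input) -> None:
    """Show each statement and run it only after confirmation."""
    cur = conn.cursor()
    for stmt in split_statements(script):
        print("\n" + stmt)
        answer = ask("Run? [Enter=yes, s=skip, q=quit] ").strip().lower()
        if answer == "q":
            break
        if answer == "s":
            continue
        cur.execute(stmt)
        if cur.description:        # DB-API: None unless the statement returned rows
            for row in cur.fetchall():
                print(row)
        conn.commit()
```

For example, `step_through(sqlite3.connect("scratch.db"), open("migration.sql").read())` would prompt before each statement. Both file names are hypothetical.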
Oracle SQL Developer will run one statement at a time, as long as each statement ends with a semicolon.
You can connect to certain 3rd party databases with additional drivers (http://www.oracle.com/technetwork/products/migration/omwb-getstarted-093461.html)
You could try dbForge. There is a free Express Edition, but I can't quite tell from the feature comparison list if it allows you to step through arbitrary SQL scripts.
Emacs' SQL mode supports sending line by line, region by region and file by file.
Of course, you have to learn Emacs, but it does do what you want.
Long ago, I wrote my own tool for this purpose, and have been using and perfecting it over time. Feel free to use it and offer suggestions for features, etc.
Flyway Teams Edition (commercial license) also supports executing statements one by one via its Stream parameter. This is not an exact answer to your question, but it can at least point you toward existing tools.

Minor mode to make the SQLi buffer more readable

When using Emacs to create a SQL query in SQL mode, the SQLi buffer is the typical, ugly console window of the target database's command-line tool. Most of the output consists of ASCII characters trying to build a visual representation of a table.
Has anyone created a minor mode to make the output more readable? Here are some features that could be useful:
Create a Header. I'm not sure how this could be done, but it would be very cool if there was a way to visually show the user which columns go with the data. If I'm looking at line 300 of an output, it's a bit challenging to count the columns and read the SQL that was used. Maybe it would be something like a hacked up mode-line that doesn't change when scrolling vertically, but does when scrolling horizontally.
Only Show the Last Result. When I run a new query, that's all I want to see in the SQLi window. It'd be nice to have a feature to jump through the history of results from previous SQL queries.
Faces. Create different colors for grid lines and data. Maybe even different colors for different data types (maybe this is too hard).
I think these features would make Emacs more palatable for database developers. There are a ton of packages that do these same tasks and others well, but they aren't nearly as powerful as Emacs. Also, it's quite annoying to switch editors just to write my SQL queries.
EDIT: Something like hexl-mode would be very cool.
All the SQL mode related wisdom of the Emacs community is probably gathered here. Sadly, there is nothing like the mode you wish for. I've been wishing for something like that for quite some time, and that's why I use sql-mode only for simple queries: the results for anything big are totally unreadable. Hopefully the situation will change in the future...
This issue convinced me to switch to jEdit, at least for database projects. It's a bit difficult to configure, but the SQL integration does exactly what I needed. Also, instead of LISP it uses Java. I highly recommend it.

Tools for Generating Mock Data? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I'm looking for recommendations of a good, free tool for generating sample data for the purpose of loading into test databases. By analogy, something that produces "lorem ipsum" text for any RDBMS. Features I'm looking for include:
Flexibility to generate data for an existing table definition.
Ability to generate small and large data sets (1 million rows or more).
Generate in SQL script format (INSERT statements) or else in a flat file format suitable for bulk import (which is usually faster).
A command-line interface for easy scripting.
Extensible, open source, written in a dynamic language (these are nice-to-haves, not strong requirements).
PS: I did search for a duplicate question on StackOverflow, but I didn't find one. If there is one, I'll be grateful to get a pointer to it.
Thanks for the great responses, everyone! I should amend my requirements: I use Mac OS X as my primary development environment, not Windows (though I did say a command-line interface is desirable, and that practically rules out Windows). The Windows-specific suggestions will no doubt be useful to other readers of this question, though, so thanks.
Here is my conclusion:
GenerateData:
PHP web app interface, not command line
limited to generating 200 records (or pay $20 for a license to generate 5,000 records)
RedGate SQL Data Generator
not free, price $295
requires Windows, .NET, SQL Server
Visual Studio 2008 Database Edition
requires Windows
requires costly MSDN or ISV subscription
Banner Datatect
not free, price $595
requires Windows (?)
no support for MySQL (?)
GUI, not command line or scriptable
Ruby Faker gem
way too slow to use ActiveRecord for bulk data load
Super Smack
chiefly a load-testing tool, with a random data generator built in
pretty simple to use nevertheless
overall a good runner-up tool
Databene Benerator
best solution for my needs
XML scripts, compatible with DbUnit
open source (GPL) Java code
command-line usage
access many databases directly via JDBC
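For a quick experiment before picking a tool, the core of such a generator fits in one small Python script. This is a minimal sketch using only the standard library; it is not how any of the products above work. Rows are produced lazily from a seeded random generator, so large row counts never sit in memory, and the same seed gives the same data. The output is either INSERT statements or CSV for bulk import. The column spec is written by hand rather than read from an existing table definition. USERS and FIRST_NAMES are hypothetical examples.

```python
import csv
import io
import random
import string
from typing import Callable, Iterator, List, Tuple

# Hypothetical column spec: (column name, function(rng, row_number) -> value).
Column = Tuple[str, Callable[[random.Random, int], object]]

FIRST_NAMES = ["Ann", "Bob", "Chen", "Dara", "Eve"]

USERS: List[Column] = [
    ("id", lambda rng, i: i),
    ("name", lambda rng, i: rng.choice(FIRST_NAMES) + " "
                            + rng.choice(string.ascii_uppercase) + "."),
    ("age", lambda rng, i: rng.randint(18, 90)),
]


def generate(spec: List[Column], n: int, seed: int = 0) -> Iterator[list]:
    rng = random.Random(seed)          # same seed -> same data
    for i in range(1, n + 1):          # rows are produced one at a time
        yield [make(rng, i) for _, make in spec]


def sql_literal(value: object) -> str:
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"  # double embedded quotes


def to_inserts(table: str, spec: List[Column], n: int, seed: int = 0) -> Iterator[str]:
    cols = ", ".join(name for name, _ in spec)
    for row in generate(spec, n, seed):
        values = ", ".join(sql_literal(v) for v in row)
        yield f"INSERT INTO {table} ({cols}) VALUES ({values});"


def to_csv(spec: List[Column], n: int, out, seed: int = 0) -> None:
    writer = csv.writer(out)
    writer.writerow([name for name, _ in spec])  # header row
    writer.writerows(generate(spec, n, seed))


for stmt in to_inserts("users", USERS, 3):
    print(stmt)

buf = io.StringIO()
to_csv(USERS, 2, buf)
print(buf.getvalue())
```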
Take a look at databene benerator, a test data generator that looks close to your requirements.
it can generate data for an existing table definition (or even anonymize production data)
it can generate large data sets (unlimited size)
it supports various input formats (CSV, Flat Files, DBUnit) and output formats (CSV, Flat Files, DBUnit, XML, Excel, Scripts)
it can be used on the command line or through a maven plugin
it's open source and customizable
I would give it a try.
BTW, a list of similar products is available on databene benerator's web site.
This looks quite promising: generatedata.com. Open-source, has lots of built-in data types.
There are several others listed here: Test (Sample) Data Generators. I don't have experience with any of them, but a few on that list look like they could be pretty decent.
Try http://www.mockaroo.com
This is a tool my company made to help test our own applications. We've made it free for anyone to use. It's basically the Forgery ruby gem with a web app wrapped around it. You can generate data in CSV, txt, or SQL formats. Hope this helps.
I know you said you were looking for a free tool, but this is one case where I would suggest that spending $295 will pay you back quickly in time saved. I've been using the RedGate tool SQL Data Generator for the last year, and it is, in short, an awesome tool. It lets you set dependencies between columns and generates realistic data for business objects such as phone numbers, URLs, names, etc. I can honestly state that this tool has paid for itself time and time again.
If you are looking or willing to use something MySQL-specific, you could take a look at Super Smack. It is currently maintained by Tony Bourke.
Super Smack allows you to generate random data to insert into your database tables. It is customizable, allowing you to use the packaged words.dat file, or any test data of your choice.
One of the nice things about it is that its command-line interface is highly customizable. There are some fairly decent usage examples in the book High Performance MySQL, which is also excerpted here.
Not sure if that is along the lines of what you are looking for, but just a thought.
A Ruby script with one of the available fake data generators should do you just fine.
http://faker.rubyforge.org/ is one such gem. Unfortunately, this doesn't fulfill all your requirements.
Here is another: http://random-data.rubyforge.org/
And a tutorial for using Faker: http://www.rubyandhow.com/how-to-generate-fake-names-addresses-in-ruby/
RE: Flexibility to generate data for an existing table definition. Combine the Faker gem with one of the available ORMs. ActiveRecord would probably be easiest.
It's normally very costly, but if you are a small ISV you can get Visual Studio 2008 Database Edition very cheaply; see the Empower and BizSpark promotions. It provides a lot more functionality than just generating test data (integration with SCC, unit testing, DB refactoring, etc.).
Since I like the fact that Red Gate tools are so easy to learn, I would still look at SQL Data Generator.
A tool that really should not be missing from the list is the Data Generator from Datanamic. It populates databases directly or generates insert scripts, has a large collection of pre-installed generators, and supports multiple databases...
http://www.datanamic.com/datagenerator/index.html
I know you're not looking for actual lorem ipsum text; but in case anyone else searches for an actual lorem ipsum generator and finds this thread: lipsum.com does a great job of it.
Not free, but Visual Studio 2008 Database Edition is a good alternative and it provides a lot more functionality (Integration with SCC, Unit Testing, DB Refactoring, etc...)
I use a tool called Datatect:
Generates data to flat files or any ODBC compliant database.
Extensible via VBScript.
Referentially aware; will populate foreign keys with values from parent table.
Data is context aware; city, state and phone numbers for given zip codes, first names and titles with gender.
Can create custom, complex data types.
Generate over 2 billion proper names, business names, street addresses, cities, states, and zip codes.
I've used this tool to generate as many as 40,000,000 rows of data into a SQL Server database, and 8,000,000 rows into an Oracle database.
I am in no way affiliated with Banner Systems, just a satisfied customer.
Here is the list of such tools (both free and commercial):
http://c2.com/cgi/wiki?TestDataGenerator
For OS X there is Data Creator (US$7). The download is free for testing, so you can evaluate the software and its features.
It requires OS X Lion or later. It can generate many different field types and has a custom export mode plus some presets (TSV, CSV, HTML table, web page with a table inside).
http://www.tensionsoftware.com/osx/datacreator/
It is also available on the App Store:
https://itunes.apple.com/us/app/data-creator/id491686136?mt=12
You can use DbSchema (www.dbschema.com). It's a database management tool with a Random Data Generator that can populate your database.
Not a direct answer to your question, but this can be helpful for certain kinds of data:
Fake Name Generator (http://www.fakenamegenerator.com/) can be useful. It won't cover everything, but it works for user accounts and the like. AFAIK they also support bulk orders.
+1 for Benerator: I tried 3 or 4 of the other tools on offer (including dbmonster) but found Benerator to be very quick, to deliver realistic data and to be flexible. I also got very quick & helpful feedback from the tool's creator when I posted on the forum.

Code formatting: is lining up similar lines ok? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
I recently discovered that our company has a set of coding guidelines (hidden away in a document management system where no one can find it). It generally seems pretty sensible, and keeps away from the usual religious wars about where to put '{'s and whether to use hard tabs. However, it does suggest that "lines SHOULD NOT contain embedded multiple spaces". By which it means don't do this sort of thing:
foo    = 1;
foobar = 2;
bar    = 3;
Or this:
if      ( test_one    ) return 1;
else if ( longer_test ) return 2;
else if ( shorter     ) return 3;
else                    return 4;
Or this:
thing foo_table[] =
{
    { "aaaaa", 0 },
    { "aa",    1 },
    // ...
};
The justification for this is that changes to one line often require every line to be edited. That makes it more effort to change, and harder to understand diffs.
I'm torn. On the one hand, lining up like this can make repetitive code much easier to read. On the other hand, it does make diffs harder to read.
What's your view on this?
2008: Since I supervise daily merges of source code,... I can only recommend against it.
It is pretty, but if you do merges on a regular basis, the benefit of 'easier to read' is quickly far less than the effort involved in merging that code.
Since that format cannot easily be automated, the first developer who does not follow it will trigger non-trivial merges.
Do not forget that in a source code merge, one cannot ask the diff tool to ignore spaces:
Otherwise, "" and " " will look the same during the diff, meaning no merge necessary... the compiler (and the coder who added the space between the String double quotes) would not agree with that!
2020: as noted in the comments by Marco
most code mergers should be able to handle ignoring whitespace, and aligning equals is now an auto-format option in most IDEs.
I still prefer languages which come with their own formatting options, like Go and its gofmt command.
Even Rust has its rustfmt now.
I'm torn. On the one hand, lining up like this can make repetitive code much easier to read. On the other hand, it does make diffs harder to read.
Well, since making code understandable is more important than making diffs understandable, you should not be torn.
IMHO lining up similar lines does greatly improve readability. Moreover, it allows easier cut-n-pasting with editors that permit vertical selection.
I never do this, and I always recommend against it. I don't care about diffs being harder to read. I do care that it takes time to do this in the first place, and it takes additional time whenever the lines have to be realigned. Editing code that has this format style is infuriating, because it often turns into a huge time sink, and I end up spending more time formatting than making real changes.
I also dispute the readability benefit. This formatting style creates columns in the file. However, we do not read in column style, top to bottom. We read left to right. The columns distract from the standard reading style, and pull the eyes downward. The columns also become extremely ugly if they aren't all perfectly aligned. This applies to extraneous whitespace, but also to multiple (possibly unrelated) column groups which have different spacing, but fall one after the other in the file.
By the way, I find it really bizarre that your coding standard doesn't specify tabbing or brace placement. Mixing different tabbing styles and brace placements will damage readability far more than using (or not using) column-style formatting.
I never do this. As you said, it sometimes requires modifying every line to adjust spacing. In some cases (like your conditionals above) it would be perfectly readable and much easier to maintain if you did away with the spacing and put the blocks on separate lines from the conditionals.
Also, if you have decent syntax highlighting in your editor, this kind of spacing shouldn't really be necessary.
There is some discussion of this in the ever-useful Code Complete by Steve McConnell. If you don't own a copy of this seminal book, do yourself a favor and buy one. Anyway, the discussion is on pages 426 and 427 of the first edition, which is the edition I've got on hand.
Edit:
McConnell suggests aligning the equal signs in a group of assignment statements to indicate that they're related. He also cautions against aligning every equal sign in a run of assignments, because it can visually imply a relationship where there is none. For example, this would be appropriate:
Employee.Name  = "Andrew Nelson"
Employee.Bdate = "1/1/56"
Employee.Rank  = "Senator"
CurrentEmployeeRecord = 0
For CurrentEmployeeRecord From LBound(EmployeeArray) To UBound(EmployeeArray)
. . .
While this would not
Employee.Name         = "Andrew Nelson"
Employee.Bdate        = "1/1/56"
Employee.Rank         = "Senator"
CurrentEmployeeRecord = 0
For CurrentEmployeeRecord From LBound(EmployeeArray) To UBound(EmployeeArray)
. . .
I trust that the difference is apparent. There is also some discussion of aligning continuation lines.
Personally I prefer the greater code readability at the expense of slightly harder-to-read diffs. It seems to me that in the long run an improvement to code maintainability -- especially as developers come and go -- is worth the tradeoff.
With a good editor their point is just not true. :)
(See "visual block" mode for vim.)
P.S.: Ok, you still have to change every line but it's fast and simple.
I try to follow two guidelines:
Use tabs instead of spaces whenever possible to minimize the need to reformat.
If you're concerned about the effect on revision control, make your functional changes first, check them in, then make only cosmetic changes.
Public flogging is permissible if bugs are introduced in the "cosmetic" change. :-)
2020-04-19 Update: My, how things change in a dozen years! If I were to answer this question today, it would probably be something like, "Ask your editor to format your code for you and/or tell your diff tool to ignore whitespace when you're making cosmetic changes."
Today, when I review code for readability and think the clarity would be improved by formatting it differently, I always end the suggestion with, "...unless the editor does it this way automatically. Don't fight your tools. They always win."
My stance is that this is an editor problem: While we use fancy tools to look at web pages and when writing texts in a word processor, our code editors are still stuck in the ASCII ages. They are as dumb as we can make them and then, we try to overcome the limitations of the editor by writing fancy code formatters.
The root cause is that your compiler can't ignore formatting statements in the code which say "hey, this is a table" and that IDEs can't create a visually pleasing representation of the source code on the fly (i.e. without actually changing one byte of the code).
One solution would be to use tabs, but our editors can't automatically align tabs in consecutive rows (which would make so many things so much easier). And to add injury to insult, if you change the tab width (basically anything != 8), you can read your own source code but no one else's, say, the example code that comes with the libraries you use. And lastly, our diff tools have no "ignore whitespace except when it counts" option, and compilers can't produce diffs either.
Eclipse can at least format assignments in a tabular manner which will make big sets of global constants much more readable. But that's just a drop of water in the desert.
If you're planning to use automated code standard validation (e.g. Checkstyle, ReSharper or anything like that), those extra spaces will make the rules quite difficult to write and enforce.
You can set your diff tool to ignore whitespace (GNU diff: -w).
This way, your diffs will skip those lines and only show the real changes. Very handy!
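The effect of an "ignore all whitespace" option such as GNU diff's -w can be shown with a rough Python sketch. It is not GNU diff's implementation: lines are simply compared after deleting spaces and tabs. The first example shows that realigning a column produces no diff; only the genuinely new line is reported. The second shows the caveat raised in an earlier answer: whitespace inside a string literal is ignored as well.

```python
import difflib
import re


def squash(line: str) -> str:
    # Roughly what an "ignore all whitespace" mode does: drop spaces and tabs.
    return re.sub(r"[ \t]+", "", line)


def changed_lines(old: list[str], new: list[str]) -> list[str]:
    """Diff two lists of lines, comparing them with whitespace removed."""
    matcher = difflib.SequenceMatcher(
        a=[squash(line) for line in old],
        b=[squash(line) for line in new],
        autojunk=False)
    out: list[str] = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            out += ["- " + line for line in old[i1:i2]]
            out += ["+ " + line for line in new[j1:j2]]
    return out


before = ["foo    = 1;", "foobar = 2;", "bar    = 3;"]
after = ["foo       = 1;", "foobar    = 2;", "bar       = 3;", "foobarbaz = 4;"]
print(changed_lines(before, after))  # ['+ foobarbaz = 4;']

# The caveat: whitespace inside a string literal is ignored too.
print(changed_lines(['s = "";'], ['s = " ";']))  # []
```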
We had a similar issue with diffs at multiple contracts... We found that tabs worked best for everyone. Set your editor to maintain tabs and every developer can choose his own tab length as well.
Example: I like 2-space tabs so the code is very compact on the left, but the default is 4. So although indentation looks very different on our various screens, the diffs are identical and don't cause issues with source control.
I like the first and last, but not the middle so much.
This is PRECISELY the reason the good Lord gave us tabs: adding a character in the middle of the line doesn't screw up alignment.