Why does git interpret sql files as binaries during a merge conflict? - sql

I'm having a problem resolving merge conflicts in SQL files.
MenkeTTA#909086 MINGW64 //FILE0019 (master)
$ git pull
remote: Microsoft (R) Visual Studio (R) Team Services
remote: Found 5 objects to send. (5 ms)
Unpacking objects: 100% (5/5), done.
From https://***
d58a69b..4830c58 master -> origin/master
warning: Cannot merge binary files: example_StoredProcedure.sql (HEAD vs. 4830c5886d3e1eac5ac76d1d49496afb43f444c3)
Auto-merging WRR - example_StoredProcedure.sql
CONFLICT (content): Merge conflict in example_StoredProcedure.sql
Automatic merge failed; fix conflicts and then commit the result.
When the merge conflict arises, Git doesn't create a pre-merged file with the competing changes in the usual structure:
/SQL-File/
<<<<<<< HEAD
competing change A
=======
competing change B
>>>>>>> branch-a
Git is treating both files as binary, but only for the merge-conflict operations (a normal merge without conflicts works properly). I can choose my own version of the file, or the pulled competing file from the remote, as the new head for the next push.
I reproduced this conflict with a normal .txt file. Git then handles the merge conflict as expected, creating one pre-merged file with both competing changes/commits, which I can fix manually however I want.
To make git recognize the sql files as text I added
.sql diff
to the .gitattributes file like it's described here. Does anyone know how I can make Git create an ordinary pre-merged file with both versions of the competing commits when working with SQL files?

First, a quick note:
To make git recognize the sql files as text I added
.sql diff
to the .gitattributes file ...
The .gitattributes line should read *.sql diff (I've fixed the linked answer, which is on a question about getting git diff to treat the file as text). However, if the file really is text, you may want, or even need, *.sql text. Note: this will not help at all if the file is not text. If the file's content is UTF-16, it is not text to Git, at least.
Consider marking the file as example_StoredProcedure.sql text, i.e., not all .sql files, just this one particular file. I'm also curious to see whether just marking it diff suffices! Update, Nov 2019: apparently marking the file as diff is not sufficient, though I have not verified this myself.
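In .gitattributes terms, that per-file suggestion would look like the lines below (one pattern plus its attributes per line; the path is relative to the .gitattributes file):
example_StoredProcedure.sql text
# or, if all your SQL files really are text:
# *.sql text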
(The difference is that the diff attribute tells Git how to show the file in git diff output,1 while the text attribute tells Git that instead of using its built-in guessing algorithm, it should, for all purposes, use the setting to decide whether the file is text. The guessing algorithm consists of scanning an initial chunk of the file's contents to see how many "text-like" characters there are vs "non-text-like" characters. Probably there should be a special allowance for UTF-8 byte order marks at the top, but there isn't. Curiously, the filtering code does have explicit checks.)
1Well, it's actually more involved than just showing, but I think this is a good way to start thinking about the issues. Note that you can augment the diff setting with a driver. It's not clear to me how the low level file merge interacts with a diff driver and I do not have time to experiment with it right now.
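Incidentally, you can check what the guessing algorithm decided: git diff --numstat prints "-" instead of added/deleted line counts for a file it considers binary, and a hex dump of the first two bytes (if xxd is available) will reveal a UTF-16 byte order mark, ff fe for little-endian:
git diff --numstat origin/master -- example_StoredProcedure.sql
head -c 2 example_StoredProcedure.sql | xxd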
Longer explanation
warning: Cannot merge binary files: example_StoredProcedure.sql (... vs ...)
tells us that you are correct, that Git is treating the three versions of example_StoredProcedure.sql as binary. (I see you added this output after the initial question; good thing, since it's the key!)
But why did I say three versions, when the line goes on to say:
HEAD vs. 4830c5886d3e1eac5ac76d1d49496afb43f444c3
Git is being a little lazy here: all merges involve three inputs, not just two. One of these is the one you supply explicitly—or, as in this case, git pull ran git merge and git pull itself supplied the big ugly hash ID 4830c5886d3e1eac5ac76d1d49496afb43f444c3.
The second input to a merge is always the current commit, aka HEAD. You normally get this by being on the branch in the first place: HEAD names the branch-name, the branch-name identifies the commit, and this is where you want the final merge commit to go, so it all fits together.
The third input (internally it is actually the first; the "theirs" version is the third) is one that Git computes for you, based on the HEAD and other, or --theirs, commits: Git walks through enough of the commit graph to find the best common ancestor commit.1 It's this common ancestor commit that determines which files need merging, and if a file does need merging, the built-in merge driver needs to use diffs to get textual changes to merge. For both this and for git diff, Git has a differencing engine built into it (modified from LibXDiff).
Hence Git can, in effect, run:
git diff --find-renames <merge base commit> HEAD
to see what we did to each of our files, and:
git diff --find-renames <merge base commit> <other commit>
to see what they did to each of our files. Then:
If we changed a file and they did not touch it at all, the merge is easy: take ours.
If they changed a file and we did not touch it at all, the merge is easy: take theirs.
If we both changed a file but made the new file exactly the same, the merge is easy: take either one (ours, really, since it's in place).
Otherwise, attempt to combine the changes.
For speed reasons, Git uses the hash IDs ("blob" hashes, for the file's content) to accomplish the first three bullet points without ever having to fire up the file-level diff. This can, and does, merge unconflicted binary-file changes. It's only the final stage, where all three blob hashes differ, that requires a textual diff so as to combine changes.
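You can inspect those three blob hashes yourself while a conflicted merge is stopped; MERGE_HEAD names their commit, and git merge-base finds the common ancestor. A quick sketch (if all three hashes differ, Git had to attempt the content-level merge that failed here):
git rev-parse HEAD:example_StoredProcedure.sql
git rev-parse MERGE_HEAD:example_StoredProcedure.sql
git rev-parse $(git merge-base HEAD MERGE_HEAD):example_StoredProcedure.sql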
Obviously, if Git can't diff the file, it cannot merge the two diff outputs. But does just marking the file as text-diff-able (pattern diff in .gitattributes) make the merge proceed? And if you set a diff driver, does the low-level file merge code use that driver? That code "wants" to use the internal xdiff interface to find hunks, which is a lot easier than interpreting text output from a driver; you probably have to define a merge driver to get a detected-as-binary file to be merged, even if you have marked it as diff.
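For what it's worth, defining a custom merge driver looks roughly like this. This is a sketch, not a verified fix: the driver name sqltext is made up, and git merge-file simply does a line-oriented three-way merge of whatever bytes it is given, so I can't promise sensible results on UTF-16 content.
# in .gitattributes:
*.sql merge=sqltext
# in .git/config (or ~/.gitconfig):
[merge "sqltext"]
    name = three-way text merge for SQL files
    driver = git merge-file --marker-size=%L %A %O %B
Here %A is the current ("ours") version, %O the ancestor, %B the other version, and %L the conflict-marker size; git merge-file writes its result, conflict markers and all, back into %A.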
Additional note, Nov 2019: Since Git 2.18, Git has the ability to convert between committed UTF-8 data and in-work-tree other-format data. To use this, set the working-tree-encoding attribute. For instance, the gitattributes documentation shows an example line:
*.ps1 text working-tree-encoding=UTF-16LE eol=CRLF
that would keep all *.ps1 files in UTF-8 internally (in the frozen, committed files inside each commit) but keep the useful-format versions of those files in your work-tree in UTF-16-LE. I have no data as to whether this would work with these SQL files.
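If a hex dump shows that these SQL files really are UTF-16-LE, the analogous line would presumably be (again, untested with these particular files):
*.sql text working-tree-encoding=UTF-16LE eol=CRLF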
1In all cases, but especially in problem cases where there's more than one best common ancestor, git merge's behavior actually depends on the strategy you chose. The usual recursive strategy will merge the merge bases, commit the result, and then use that commit as the merge base! Other merge strategies work differently.

Related

How to watch Changes to SQlite Database and Trigger Shell Script

Note: I believe I may be missing a simple solution to this problem. I'm relatively new to programming. Any advice is appreciated.
The problem: A small team of people (~3-5) want to automate, as far as possible, the filing of downloaded files in appropriate folders. Files will be downloaded into a shared downloads folder. The files in this downloads folder will be sorted into a large shared folder structure according to their file type, the URL they were downloaded from, and so on and so forth. These files are stored on a shared server, and the actual sorting will be done by some kind of shell script running on the server itself.
Whilst there are some utilities which do this (such as Maid), they don't do everything I want them to do. Maid, for example, doesn't have a way to get the download URL of a file on Linux. Additionally, it is written in Ruby, which I'd like to avoid.
The biggest stumbling block, then, is finding a way to get the URL of the downloaded file so that it can be passed into the shell script. Originally I thought this could be done via getfattr, which gets a file's extended attributes. Frustratingly, however, whilst Chromium saves a file's download URL as an extended attribute, Firefox doesn't seem to do the same thing. So relying on extended attributes seems to be out of the question.
What Firefox does do, however, is store download 'metadata' in the places.sqlite file, in two separate tables: moz_annos and moz_places. Inspired by this, I decided to build a Firefox extension that writes all information about the downloaded file to a SQLite database downloads.sqlite on our server upon the completion of said download. This includes the URL, MIME type, etc. of the downloaded file.
The idea is that with this data, the server could run a shell script that does some fine-grained sorting of the downloaded file into our shared file system.
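For concreteness, a hypothetical schema for that table; every name here (the table table_downloads and all its columns) is an assumption, chosen only to match the attributes mentioned above:
CREATE TABLE table_downloads (
    filename  TEXT NOT NULL,  -- name of the file in the shared downloads folder
    url       TEXT,           -- download URL reported by the extension
    mime      TEXT,           -- MIME type of the downloaded file
    incognito INTEGER,        -- 1 if the download happened in incognito mode
    created   DATETIME DEFAULT (datetime('now','UTC'))
);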
However, I am struggling to find out a stable, reliable, and portable way of 'triggering' the script that will actually move the files, as well as passing information about these files to the script so that it can sort them accordingly.
There are a few ways I thought I could go about this. I'm not sure which method is the most appropriate:
1) Watch Downloads Folder
This method would watch for changes to the shared downloads directory, use the file name of the downloaded file to query downloads.sqlite for the matching row, and finally pass the file's attributes into a bash script that sorts said file.
Difficulties: Finding a way to reliably match the downloaded file with the appropriate record in the database. Files may have the same download name but need to be sorted differently, perhaps, for example, if they were downloaded from a different URL. I'd also like to get additional attributes, like whether the file was downloaded in incognito mode.
2) Create Auxiliary 'Helper' File
Upon a file download event, the extension creates a 'helper' text file, named after the downloaded file plus some marker, that contains the additional file attributes:
/Downloads/
mydownload.pdf
mydownload-downloadhelper.txt
The server can then watch for the creation of a .txt file in the downloads directory and run the necessary shell script from that.
Difficulties: Whilst this avoids using a SQLite database, it seems rather ungraceful and hacky, and I can see a multitude of ways in which this method would just break or not work.
3) Watch SQLite Database
This method writes to the shared SQLite database downloads.sqlite on the server, then, by some means, watches for a new row insert into this database. This could either be done by watching the database for a new INSERT on a table, or by having a SQLite trigger on INSERT that runs a bash script, passing the download information on to a shell script.
Difficulties: There doesn't seem to be any easy way to watch a SQLite database for a new row insert, and a trigger within SQLite doesn't seem to be able to launch an external script/program. I've searched high and low for a method of doing either of these two things, but I'm struggling to find any documented way to do it that I am able to understand.
What I would like is:
Some feedback on which of these methods is appropriate, or if there is a more appropriate method that I am overlooking.
An example of a system/program that does something similar to this.
Many thanks in advance.
It seems to me that you have put the cart before the horse:
Use cron to periodically check for new downloads, and process them on the command line instead of trying to trigger things from inside sqlite3:
a) Here is an approach using your shared sqlite3 database "downloads.sqlite":
Upfront once:
1) Add a table to your database containing just an integer as record counter and a timestamp field, e.g., "table_counter":
sqlite3 downloads.sqlite "CREATE TABLE table_counter (counter INTEGER PRIMARY KEY NOT NULL, timestamp DATETIME DEFAULT (datetime('now','UTC')));" 2>/dev/null
2) Insert an initial record into this new table, setting the "counter" to zero and recording a timestamp:
sqlite3 downloads.sqlite "INSERT INTO table_counter VALUES (0, (SELECT datetime('now','UTC')));" 2>/dev/null
Every so often:
3) Query the table containing the downloads with a "SELECT COUNT(*)" statement:
sqlite3 downloads.sqlite "SELECT COUNT(*) from table_downloads;" 2>/dev/null
Result: e.g., 20
4) Compare this number to the number stored in the record counter field:
sqlite3 downloads.sqlite "SELECT (counter) from table_counter;" 2>/dev/null
Result: e.g., 17
5) If the result from 3) is greater than the result from 4), you have downloaded more files than you have processed. If so, query the table containing the downloads with a "SELECT" statement for the oldest not-yet-processed download, using a "subselect":
sqlite3 downloads.sqlite "SELECT * from table_downloads where rowid = (SELECT (counter+1) from table_counter);" 2>/dev/null
In my example this would SELECT all values for the data record with the rowid of 17+1 = 18.
6) Do your magic with the downloaded file stored as record #18.
7) Increase the record counter in the "table_counter", again using a subselect:
sqlite3 downloads.sqlite "UPDATE table_counter SET counter = (SELECT (counter) from table_counter)+1;" 2>/dev/null
8) Finally, update the timestamp for the "table_counter". Why? Shit happens on shared drives... This way you can always check how many download records have been processed and when that last happened:
sqlite3 downloads.sqlite "UPDATE table_counter SET timestamp = datetime('now','UTC');" 2>/dev/null
If you want a log of this processing, then change the SQL statement in 4) to a "SELECT COUNT(*)" and the one in 7) to an "INSERT" of the counter, with its subselect changed to "(SELECT (counter+1) from table_counter)", respectively ...
Please note: The redirections "2>/dev/null" at the end of the commands are just there to suppress this kind of line, issued by newer versions of SQLite3 before showing your query results:
-- Loading resources from /usr/home/bernie/.sqliterc
If you don't like timestamps based on UTC, then use localtime instead:
(datetime('now','localtime'))
Put steps 3) through 8) in a shell script and use a cron entry to run this query/comparison periodically...
Use the complete /path/to/sqlite3 in this shell script (just in case: when running on a shared drive, someone could be fooling around with paths and surprise your cron job ...).
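A minimal sketch of such a script, under the assumptions above (table names table_downloads and table_counter; all paths are placeholders; the actual sorting logic is left as a stub):
#!/bin/sh
# process_downloads.sh - steps 3) through 8), intended to be run from cron.
SQLITE=/usr/bin/sqlite3                      # full path, per the advice above
DB=/path/to/shared/downloads.sqlite

downloaded=$("$SQLITE" "$DB" "SELECT COUNT(*) FROM table_downloads;" 2>/dev/null)
processed=$("$SQLITE" "$DB" "SELECT counter FROM table_counter;" 2>/dev/null)

while [ "$processed" -lt "$downloaded" ]; do
    # 5) fetch the oldest not-yet-processed download record
    row=$("$SQLITE" "$DB" "SELECT * FROM table_downloads WHERE rowid = (SELECT counter+1 FROM table_counter);" 2>/dev/null)
    # 6) do your sorting magic with "$row" here
    # 7) and 8) bump the counter and refresh the timestamp
    "$SQLITE" "$DB" "UPDATE table_counter SET counter = counter + 1;" 2>/dev/null
    "$SQLITE" "$DB" "UPDATE table_counter SET timestamp = datetime('now','UTC');" 2>/dev/null
    processed=$((processed + 1))
done
A matching crontab entry, e.g., every five minutes:
*/5 * * * * /path/to/process_downloads.sh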
b) I will give you a simpler answer using awk and some hash like md5 in a separate answer, so it is easier for future readers and easier for you to "rate" :-)

Liquibase changelog without MD5SUM

I have a database here which is updated using Liquibase. If I understand correctly, Liquibase will apply a changeset and write a line to the DATABASECHANGELOG table with the execution date and an MD5 checksum. This way, Liquibase can find out when a changeset (unexpectedly) changes.
However, in this database, many (most) of the MD5SUM entries are NULL. I have no idea why that should be the case. Is this in any way a normal mode of operation?
When using the Liquibase status command, I see many ‘unapplied’ changes. How can Liquibase determine that without the MD5 sum? Or are changesets without MD5 sums considered changed?
The DATABASECHANGELOG in question was used by two different programs. This may be questionable, but used to work quite well.
Until… the programs started to use different versions of Liquibase with different formats of the MD5SUM. Upon detecting this, Liquibase writes NULL to the whole column (which is questionable in itself), thereby removing all the checksums for all the other programs. Depending on the order of Liquibase runs, different sets of checksums remained NULL in the end.
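If you want to see which changesets are affected, you can query the table directly; these column names come from the standard DATABASECHANGELOG schema:
SELECT id, author, filename, dateexecuted
FROM databasechangelog
WHERE md5sum IS NULL
ORDER BY dateexecuted;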

How to get the list of unstaged files during git-merge?

Is there a way to find the list of unstaged files during merge?
That is, when we merge branch A into branch B, some files get merged (staged) automatically if there are no conflicts. Suppose I discard some of them (unstaging/discarding); how will I get the list of those files?
Assume I merged a couple of months ago; I wouldn't remember the unstaged files.
Agreed: the merge commit would only reflect staged files (automatically merged, or resolved after a merge conflict).
And don't forget that you could also unstage part of one file while merging the rest of that same file.
What if I want to merge those files now again?
You could, for testing, reset your HEAD to one commit before the last merge commit, effectively cancelling that merge.
Then merge again (the same source commit to your moved HEAD) and list the files being merged: git diff --name-only
You can compare that list to the one of the previous merge commit:
git diff-tree --no-commit-id --name-only -r <commit-ish>
The difference between the two lists would be the list of files that were unstaged during the initial merge.
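Concretely, with M standing in for the old merge commit's hash (a sketch; do this on a throwaway branch):
git checkout -b redo-merge M^1                    # first parent: your branch just before the merge
git merge --no-commit M^2                         # second parent: the commit that was merged in
git diff --cached --name-only                     # files the redone merge stages
git diff-tree --no-commit-id --name-only -r M     # files the original merge recorded
Comparing the two sorted lists (e.g., with comm) yields the files that were unstaged back then.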

How do I merge two heads of a branch on Bitbucket?

I'm not terribly familiar with Mercurial, and I have no idea how I managed to do this in the first place...
https://bitbucket.org/agent154/controlsfx/branch/wizard-before_advance?head=d6dda855fd9885d3413121068e73c0fa73e3cc2e
See above link. My branch "wizard-before_advance" has multiple heads. I've been doing development using IntelliJ IDEA, but I have TortoiseHG installed. How might I fix this?
What you probably did was make two separate commits with revision 22ec847 as the parent. This might have occurred in two separate clones, where you committed and pushed both of them to Bitbucket (except you'd have needed a -f on the second one). It could also have occurred in a single clone because you updated to an old revision and committed there.
Regardless, it's not a problem. All you have to do is merge them. The message tells you which two revisions to work with.
d6dda85
2d5a883
So we update to one of these, and merge the other.
% hg update -r 2d5a883
% hg merge -r d6dda85
<Run checks, resolve conflicts, and basically make sure everything is good>
% hg commit -m 'Merging divergent heads'
This will leave you with a graph like this:
o---o---o---------M---
     \           /
      \-o---o---o
Where M is the merge changeset. Simple!

create function sql server 2005 disable errors

I'm trying to write a *.bat file which runs all SQL scripts in a given folder (every file in this folder contains a create function script):
for /r "%~dp0\Production\Functions" %%X in (*.sql) do (
    sqlcmd -S%1 -d%2 -b -i "%%X"
)
But some functions in the folder depend on others, so I get an "Invalid object name" error. Is there a way to disable this error?
Rename your files so that they're listed in the correct order of precedence. So, for example, if FuncA.sql uses FuncB.sql, then rename the files as 001-FuncB.sql, 002-FuncA.sql.
It is not possible to disable errors generated by SQL Server when you run (what I think of as) code-based objects: stored procedures, functions, views, triggers, and anything else that has to be the sole object of a batch submitted to SQL Server.
It is also awkward at best to work around this problem. Some options:
One way, as Joe Stefanelli recommends, is to name your files such that they get executed in proper order (by name, or perhaps by date created or something more esoteric).
Another way is to group related functions in single scripts, such that referenced objects must be created before referencing objects.
Or combine the above two, putting all your dependent objects in one script you can guarantee will always run first. Not so useful if you have nested references.
A last (and more kludgy) way is to iterate over your scripts several times (assuming your "create" script will properly deal with an object that already exists), until a given pass raises no errors.
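A sketch of that last approach, in the same batch style as the question. It assumes each script can safely be re-run (e.g., it drops the object first if it already exists); the five-pass cap is arbitrary:
@echo off
setlocal enabledelayedexpansion
set PASS=0
:retry
set /a PASS+=1
set ERRORS=0
for /r "%~dp0\Production\Functions" %%X in (*.sql) do (
    rem -b makes sqlcmd return a nonzero exit code on SQL errors
    sqlcmd -S%1 -d%2 -b -i "%%X" || set ERRORS=1
)
rem repeat until a pass is clean, giving up after 5 passes
if !ERRORS!==1 if !PASS! lss 5 goto retry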
For development purposes, we store code-based objects in individual files, but when it comes time to wrap the code up for a push to Production systems, I glom the files together, test, shuffle the contents around, and retest until no more errors are generated.