Diff all files in an AccuRev transaction with the parent stream version - accurev

Given a transaction ID, how would one go about generating a diff between all of the files in the transaction with their corresponding versions in the parent stream using the AccuRev command line tools?

You will need to write a script to accomplish this. Here's a rough algorithm that should work:
1) Run a hist command on the transaction using the "-fvx" switch. This will produce verbose output in XML format. You want the list of elements, the version of each element, and the name of the stream.
2) Run a show streams command with the -fx switch against the stream in which the transaction resides. The goal here is to get the name of the backing (parent) stream.
3) For each element, run a stat command to find the version currently in the backing stream, then run a diff command between the version in the transaction and the version found in the backing stream.
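A minimal shell sketch of those three steps, assuming a depot named MyDepot and transaction 12345 (both hypothetical); the exact stat and diff switches vary between AccuRev releases, so treat the ones shown in the comments as placeholders to verify against "accurev help":
# 1) Verbose XML history for the transaction: elements, their versions, and the stream name.
accurev hist -p MyDepot -t 12345 -fvx > txn.xml
# 2) Stream listing in XML; look up the backing (parent) stream of the stream named in txn.xml.
accurev show -p MyDepot -fx streams > streams.xml
# 3) For each element in txn.xml, find its version in the backing stream and diff the two, e.g.:
#    accurev stat -fx -s <backing-stream> <element>
#    accurev diff -v <txn-version-spec> -V <backing-version-spec> <element>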

Related

Snowflake COPY INTO Command return

I have a question about the Snowflake COPY INTO command; I searched but did not find an answer.
Suppose I want to push data from Snowflake to an S3 bucket using the Snowflake COPY INTO command in my code. How will I know that the file is ready, i.e. that the command has completed, so that I can read the file from the S3 location?
You can do the following things to check whether your COPY INTO was successful or at least to retrieve some useful information about your command:
Set DETAILED_OUTPUT = TRUE and check the result (this means you get information about every single unloaded file as output; if set to "false" you only receive information about the whole unload process)
Query your stage by using the syntax that can be found here https://docs.snowflake.com/en/user-guide/querying-stage.html
Query the metadata of your staged data by using metadata$filename and metadata$file_row_number: https://docs.snowflake.com/en/user-guide/querying-metadata.html
Keep in mind that even a failed COPY command can result in some unloaded files on your stage.
More information can also be found at https://docs.snowflake.com/en/sql-reference/sql/copy-into-location.html#validating-data-to-be-unloaded-from-a-query
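As a minimal sketch of the first two points, run here through the SnowSQL command-line client for illustration (the stage @my_stage, the unload/ path and the table my_table are hypothetical):
# Unload with one result row per unloaded file (file name, size, row count).
snowsql -q 'COPY INTO @my_stage/unload/ FROM my_table FILE_FORMAT = (TYPE = CSV) DETAILED_OUTPUT = TRUE;'
# Check what actually landed on the stage, and peek at the staged data itself.
snowsql -q 'LIST @my_stage/unload/;'
snowsql -q 'SELECT metadata$filename, metadata$file_row_number, t.$1 FROM @my_stage/unload/ t LIMIT 10;'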
It also depends on how you're actually running this:
Any Snowflake interface will run the query synchronously, so it will just spin until it's complete.
Any async call would need extra checks - the easiest being the web interface (it shows the status of the query, and when the query completes the unload is complete).

How to watch Changes to SQLite Database and Trigger Shell Script

Note: I believe I may be missing a simple solution to this problem. I'm relatively new to programming. Any advice is appreciated.
The problem: A small team of people (~3-5) want to be able to automate, as far as possible, the filing of downloaded files in appropriate folders. Files will be downloaded into a shared downloads folder. The files in this downloads folder will be sorted into a large shared folder structure according to their file-type, URL the file was downloaded from, and so on and so forth. These files are stored on a shared server, and the actual sorting will be done by some kind of shell script running on the server itself.
Whilst there are some utilities which do this (such as Maid), they don't do everything I want them to do. Maid, for example, doesn't have a way to get the download URL of a file in Linux. Additionally, it is written in Ruby, which I'd like to avoid.
The biggest stumbling block, then, is finding a way to get the URL of the downloaded file that can be passed into the shell script. Originally I thought this could be done via getfattr, which would get a file's extended attributes. Frustratingly however, whilst Chromium saves a file's download URL as an extended attribute, Firefox doesn't seem to do the same thing. So relying on extended attributes seems to be out of the question.
What Firefox does do, however, is store download 'metadata' in the places.sqlite file, in two separate tables - moz_annos and moz_places. Inspired by this, I decided to build a Firefox extension that writes all information about the downloaded file to a SQLite database downloads.sqlite on our server upon the completion of said download. This includes the URL, MIME type, etc. of the downloaded file.
The idea is that with this data, the server could run a shell script that does some fine-grained sorting of the downloaded file into our shared file system.
However, I am struggling to find out a stable, reliable, and portable way of 'triggering' the script that will actually move the files, as well as passing information about these files to the script so that it can sort them accordingly.
There are a few ways I thought I could go about this. I'm not sure which method is the most appropriate:
1) Watch Downloads Folder
This method would watch for changes to the shared downloads directory, then use the file name of the downloaded file to query downloads.sqlite, getting the matching row, then finally passing the file's attributes into a bash script that sorts said file.
Difficulties: Finding a way to reliably match the downloaded file with the appropriate record in the database. Files may have the same download name but need to be sorted differently, perhaps, for example, if they were downloaded from a different URL. Additionally, I'd like to get additional attributes like whether the file was downloaded in incognito mode.
2) Create Auxiliary 'Helper' File
Upon a file download event, the extension creates a 'helper' text file, named after the downloaded file plus some marker, that contains the additional file attributes:
/Downloads/
mydownload.pdf
mydownload-downloadhelper.txt
The server can then watch for the creation of a .txt file in the downloads directory and run the necessary shell script from this.
Difficulties: Whilst this avoids using a SQLite database, it seems rather ungraceful and hacky, and I can see a multitude of ways in which this method would just break or not work.
3) Watch SQLite Database
This method writes to the shared SQLite database downloads.sqlite on the server and then, by some method, watches for a new row being inserted into this database. This could be done either by watching the SQLite database for a new INSERT on a table, or by having an SQLite trigger on INSERT that runs a bash script, passing the download information on to a shell script.
Difficulties: there doesn't seem to be any easy way to watch an SQLite database for a new row insert, and a trigger within SQLite doesn't seem to be able to launch an external script/program. I've searched high and low for a method of doing either of these two things, but I'm struggling to find any documented way to do it that I am able to understand.
What I would like is :
Some feedback on which of these methods is appropriate, or if there is a more appropriate method that I am overlooking.
An example of a system/program that does something similar to this.
Many thanks in advance.
It seems to me that you have put "the cart before the horse":
Use cron to periodically check for new downloads. Process them on the command line instead of trying to trigger things from inside sqlite3:
a) Here is an approach using your shared sqlite3 database "downloads.sqlite":
Upfront once:
1) Add a table to your database containing just an integer as record counter and a timeStamp field, e.g., "table_counter":
sqlite3 downloads.sqlite "CREATE TABLE table_counter (counter INTEGER PRIMARY KEY NOT NULL, timestamp DATETIME DEFAULT (datetime('now','UTC')));" 2>/dev/null
2) Insert an initial record into this new table, setting the "counter" to zero and recording a timeStamp:
sqlite3 downloads.sqlite "INSERT INTO table_counter VALUES (0, (SELECT datetime('now','UTC')));" 2>/dev/null
Every so often:
3) Query the table containing the downloads with a "SELECT COUNT(*)" statement:
sqlite3 downloads.sqlite "SELECT COUNT(*) from table_downloads;" 2>/dev/null
Result e.g., 20
4) Compare this number to the number stored in the record counter field:
sqlite3 downloads.sqlite "SELECT (counter) from table_counter;" 2>/dev/null
Result e.g., 17
5) If the result from 3) is greater than the result from 4), then you have downloaded more files than you have processed.
If so, query the table containing the downloads with a "SELECT" statement for the oldest not yet processed download, using a "subselect":
sqlite3 downloads.sqlite "SELECT * from table_downloads where rowid = (SELECT (counter+1) from table_counter);" 2>/dev/null
In my example this would SELECT all values for the data record with the rowid of 17+1 = 18;
6) Do your magic with regard to the downloaded file stored as record #18.
7) Increase the record counter in the "table_counter", again using a subselect:
sqlite3 downloads.sqlite "UPDATE table_counter SET counter = (SELECT (counter) from table_counter)+1;" 2>/dev/null
8) Finally, update the timeStamp for the "table_counter":
Why? Shit happens on shared drives... This way you can always check how many download records have been processed and when this has happened last time.
sqlite3 downloads.sqlite "UPDATE table_counter SET timeStamp = datetime('now','UTC');" 2>/dev/null
If you want to have a log of this processing, then change the SQL statement in 4) to a "SELECT COUNT(*)" and the one in 7) to an INSERT of a new counter record, with its subselect changed to "(SELECT (counter+1) from table_counter)" ...
Please note: The redirections " 2>/dev/null" at the end of the SQL statements are just to suppress this kind of line issued by newer versions of SQLite3 before showing your query results.
-- Loading resources from /usr/home/bernie/.sqliterc
If you don't like timeStamps based on UTC then use localtime instead:
(datetime('now','localtime'))
Put steps 3) through 8) in a shell script and use a cron entry to run this query/comparison periodically...
Use the complete /path/to/sqlite3 in this shell script (just in case, since this is running on a shared drive: someone could be fooling around with paths and could surprise your cron job ...)
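A minimal cron-able sketch of steps 3) through 8), assuming the downloads live in a table called table_downloads and that the paths below (which are hypothetical) are adjusted to your setup:
#!/bin/sh
# Compare the number of downloaded files with the processed-record counter
# and process one pending download per run.
SQLITE=/usr/bin/sqlite3                  # full path, as recommended above
DB=/path/to/shared/downloads.sqlite      # hypothetical location of the shared database

downloads=$("$SQLITE" "$DB" "SELECT COUNT(*) FROM table_downloads;" 2>/dev/null)
processed=$("$SQLITE" "$DB" "SELECT counter FROM table_counter;" 2>/dev/null)

if [ "$downloads" -gt "$processed" ]; then
    # Fetch the oldest not yet processed download record.
    row=$("$SQLITE" "$DB" "SELECT * FROM table_downloads WHERE rowid = (SELECT counter+1 FROM table_counter);" 2>/dev/null)

    # ... do your sorting magic with "$row" here ...

    # Mark the record as processed and note when that happened.
    "$SQLITE" "$DB" "UPDATE table_counter SET counter = counter + 1;" 2>/dev/null
    "$SQLITE" "$DB" "UPDATE table_counter SET timestamp = datetime('now','UTC');" 2>/dev/null
fi
A crontab entry such as "*/5 * * * * /path/to/process_downloads.sh" would then run the check every five minutes.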
b) I will give you a simpler answer using awk and some hash like md5 in a separate answer.
So it is easier for future readers and easier for you to "rate" :-)

Why does git interpret sql files as binaries during a merge conflict?

I'm having a problem resolving merge conflicts in SQL files.
MenkeTTA#909086 MINGW64 //FILE0019 (master)
$ git pull
remote: Microsoft (R) Visual Studio (R) Team Services
remote: Found 5 objects to send. (5 ms)
Unpacking objects: 100% (5/5), done.
From https://***
d58a69b..4830c58 master -> origin/master
warning: Cannot merge binary files: example_StoredProcedure.sql (HEAD vs. 4830c5886d3e1eac5ac76d1d49496afb43f444c3)
Auto-merging WRR - example_StoredProcedure.sql
CONFLICT (content): Merge conflict in example_StoredProcedure.sql
Automatic merge failed; fix conflicts and then commit the result.
When the merge conflict is created git isn't creating a pre-merged file with the competing changes as in the usual structure:
/SQL-File/
<<<<<<< HEAD
competing change A
=======
competing change B
>>>>>>> branch-a
Git is treating both files as binaries – but only for the merge-conflict operations (normal merge without conflict works properly). I can choose my own version of the file or the pulled competing file from the remote as the new head for the next push.
I reproduced this conflict with a normal .txt file. Git then treats the merge conflict as expected, creating one pre-merged file with both competing changes/commits, where I can manually fix the code the way I want to.
To make git recognize the sql files as text I added
.sql diff
to the .gitattributes file, as described here. Does anyone know how I can make git create an ordinary pre-merged file with both versions of the competing commits when working with sql files?
First, a quick note:
To make git recognize the sql files as text I added
.sql diff
to the .gitattributes file ...
The .gitattributes line should read *.sql diff (I've fixed the linked answer, which is on a question about getting git diff to treat the file as text). However, if the file really is text, you may want, or even need, *.sql text. Note: this will not help at all if the file is not text. If the file's content is UTF-16, it is not text to Git, at least.
Consider marking the file as example_StoredProcedure.sql text, i.e., not all .sql files, just this one particular file. I'm also curious to see whether just marking it diff suffices! Update, Nov 2019: apparently marking the file as diff is not sufficient, though I have not verified this myself.
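One way to try that (a sketch, not a guaranteed fix):
# Mark just this one file as text (or try 'diff' alone first, as suggested above).
echo 'example_StoredProcedure.sql text' >> .gitattributes
# Confirm which attributes Git now applies to the file.
git check-attr text diff -- example_StoredProcedure.sql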
(The difference is that the diff attribute tells Git how to show the file in git diff output,1 while the text tells Git that instead of using its built in guessing algorithm, it should, for all purposes, use the setting to decide whether the file is text. The guessing algorithm consists of scanning an initial chunk of the file's contents to see how many "text-like" characters there are vs "non-text-like" characters. Probably there should be a special allowance for UTF-8 Byte Order Markers at the top, but there isn't. Curiously, during filtering, there are explicit checks.)
1Well, it's actually more involved than just showing, but I think this is a good way to start thinking about the issues. Note that you can augment the diff setting with a driver. It's not clear to me how the low level file merge interacts with a diff driver and I do not have time to experiment with it right now.
Longer explanation
warning: Cannot merge binary files: example_StoredProcedure.sql (... vs ...)
tells us that you are correct, that Git is treating the three versions of example_StoredProcedure.sql as binary. (I see you added this output after the initial question; good thing, since it's the key!)
But why did I say three versions, when the line goes on to say:
HEAD vs. 4830c5886d3e1eac5ac76d1d49496afb43f444c3
Git is being a little lazy here: all merges involve three inputs, not just two. One of these is the one you supply explicitly—or, as in this case, git pull ran git merge and git pull itself supplied the big ugly hash ID 4830c5886d3e1eac5ac76d1d49496afb43f444c3.
The second input to a merge is always the current commit, aka HEAD. You normally get this by being on the branch in the first place: HEAD names the branch-name, the branch-name identifies the commit, and this is where you want the final merge commit to go, so it all fits together.
The third input—or internally, first; internally the "theirs" version is the third input—is one that Git computes for you, based on the HEAD and other or --theirs commits: Git walks through enough of the commit graph to find the best common ancestor commit.1 It's this common ancestor commit that determines which files need merging, and if a file does need merging, the built in merge driver needs to use diffs to get textual changes to merge. For both this and for git diff, Git has a differencing engine built in to it (modified from LibXDiff).
Hence Git can, in effect, run:
git diff --find-renames <merge base commit> HEAD
to see what we did to each of our files, and:
git diff --find-renames <merge base commit> <other commit>
to see what they did to each of our files. Then:
If we changed a file and they did not touch it at all, the merge is easy: take ours.
If they changed a file and we did not touch it at all, the merge is easy: take theirs.
If we both changed a file but made the new file exactly the same, the merge is easy: take either one (ours, really, since it's in place).
Otherwise, attempt to combine the changes.
For speed reasons, Git uses the hash IDs ("blob" hashes, for the file's content) to accomplish the first three bullet points without ever having to fire up the file-level diff. This can, and does, merge unconflicted binary-file changes. It's only the final stage, where all three blob hashes differ, that requires a textual diff so as to combine changes.
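You can reproduce those two comparisons by hand. A small sketch, using the "theirs" hash from the pull output above:
# Find the merge base (common ancestor) of our commit and theirs.
base=$(git merge-base HEAD 4830c5886d3e1eac5ac76d1d49496afb43f444c3)
# What we changed since the merge base ...
git diff --find-renames "$base" HEAD -- example_StoredProcedure.sql
# ... and what they changed since the merge base.
git diff --find-renames "$base" 4830c5886d3e1eac5ac76d1d49496afb43f444c3 -- example_StoredProcedure.sql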
Obviously, if Git can't diff the file, it cannot merge the two diff outputs. But does just marking the file as text-diff-able (pattern diff in .gitattributes) make the merge proceed? What happens if you set a diff driver, does the low-level file merge code use that driver? It "wants" to use the xdiff internal interface to find hunks; that's a lot easier than interpreting text output from a driver; you probably have to define a merge driver to get a detected-as-binary file to be merged, even if you have marked it as diff.
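If marking the file diff is not enough, one experiment (an assumption to verify, not a confirmed recipe) is to define a custom merge driver that forces a plain text merge via git merge-file, and route .sql files to it:
# .gitattributes: send .sql files to a custom merge driver named "sqltext".
echo '*.sql merge=sqltext' >> .gitattributes
# Define the driver; %O is the ancestor version, %A the current version
# (which also receives the merge result), %B the other branch's version.
git config merge.sqltext.name "plain text merge for SQL files"
git config merge.sqltext.driver "git merge-file -L mine -L base -L theirs %A %O %B"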
Additional note, Nov 2019: Since Git 2.18, Git has the ability to convert between committed UTF-8 data and in-work-tree other-format data. To use this, set the working-tree-encoding attribute. For instance, [the gitattributes documentation] shows an example line:
*.ps1 text working-tree-encoding=UTF-16LE eol=CRLF
that would keep all *.ps1 files in UTF-8 internally (in the frozen, committed files inside each commit) but keep the useful-format versions of those files in your work-tree in UTF-16-LE. I have no data as to whether this would work with these SQL files.
1In all cases, but especially in problem cases where there's more than one best common ancestor, git merge's behavior actually depends on the strategy you chose. The usual recursive strategy will merge the merge bases, commit the result, and then use that commit as the merge base! Other merge strategies work differently.

Azure PowerShell command to get the count of records in an Azure Data Lake file

I have a set of files in an Azure Data Lake Store folder. Is there any simple PowerShell command to get the count of records in a file? I would like to do this without using the Get-AzureRmDataLakeStoreItemContent command on the file item, as the files are gigabytes in size. Using this command on big files gives the error below.
Error:
Get-AzureRmDataLakeStoreItemContent : The remaining data to preview is greater than 1048576 bytes. Please specify a
length or use the Force parameter to preview the entire file. The length of the file that would have been previewed:
749319688
Azure Data Lake operates at the file/folder level. The concept of a record really depends on how an application interprets it. For instance, in one case a file may contain CSV lines, in another a set of JSON objects. In some cases files contain binary data. Therefore, there is no way at the file system level to get the count of records.
The best way to get this information is to submit a job, such as a U-SQL job in Azure Data Lake Analytics. The script will be really simple: an EXTRACT statement followed by a COUNT aggregation and an OUTPUT statement.
If you prefer Spark or Hadoop here is a StackOverflow question that discusses that: Finding total number of lines in hdfs distributed file using command line
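If you go the Hadoop route from that question, the usual command-line approach is to stream the file and count lines; the adl:// account and path below are hypothetical, and this reads the whole file, so it is only practical for text data:
# Count lines by streaming the file through wc -l.
hadoop fs -cat adl://myaccount.azuredatalakestore.net/folder/myfile.csv | wc -l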

How to generate a log in CVS containing the difference between revisions

I am using a CVS repository and cvs diff, but in the diff I can only see the modified file; I also want to know who made these changes.
So I want the difference and the name of the author who made the changes in this revision, in the same log file.
Is there any available command in CVS to do that?
If you're looking for the authoring information (who committed the file) that's
cvs log <filename>
This will give you each revision, the author, the added lines, etc.
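If you want the author information and the textual difference together in one log file, one simple option is to append both commands' output; the file name and revision numbers below are hypothetical:
# Per-revision metadata (author, date, log message) ...
cvs log myfile.sql > changes.log
# ... followed by the actual difference between two revisions.
cvs diff -r 1.4 -r 1.5 myfile.sql >> changes.log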