Dan Newcome, blog

Line-ending bigotism

UPDATE: I’ve added a post that addresses the cause of my issues here.

I was adding a new file to one of my git repositories and was confronted with an error on trying to commit the file:

C:\>git commit -m "adding README"
warning: LF will be replaced by CRLF in README
*
* You have some suspicious patch lines:
*
* In README
* unresolved merge conflict (line 17)
README:17:========
* unresolved merge conflict (line 19)
README:19:========
* unresolved merge conflict (line 34)

…..

Reading up a little bit on the way git handles line endings, I turned up this gem in the man page of git-config:

core.autocrlf
If true, makes git convert CRLF at the end of lines in text files to LF when reading from the filesystem, and convert in reverse when writing to the filesystem. The variable can be set to input, in which case the conversion happens only while reading from the filesystem but files are written out with LF at the end of lines. Currently, which paths to consider “text” (i.e. be subjected to the autocrlf mechanism) is decided purely based on the contents.
core.safecrlf
If true, makes git check if converting CRLF as controlled by core.autocrlf is reversible. Git will verify if a command modifies a file in the work tree either directly or indirectly. For example, committing a file followed by checking out the same file should yield the original file in the work tree. If this is not the case for the current setting of core.autocrlf, git will reject the file. The variable can be set to “warn”, in which case git will only warn about an irreversible conversion but continue the operation.

CRLF conversion bears a slight chance of corrupting data. autocrlf=true will convert CRLF to LF during commit and LF to CRLF during checkout. A file that contains a mixture of LF and CRLF before the commit cannot be recreated by git. For text files this is the right thing to do: it corrects line endings such that we have only LF line endings in the repository. But for binary files that are accidentally classified as text the conversion can corrupt data.

If you recognize such corruption early you can easily fix it by setting the conversion type explicitly in .gitattributes. Right after committing you still have the original file in your work tree and this file is not yet corrupted. You can explicitly tell git that this file is binary and git will handle the file appropriately.

Unfortunately, the desired effect of cleaning up text files with mixed line endings and the undesired effect of corrupting binary files cannot be distinguished. In both cases CRLFs are removed in an irreversible way. For text files this is the right thing to do because CRLFs are line endings, while for binary files converting CRLFs corrupts data.

Note, this safety check does not mean that a checkout will generate a file identical to the original file for a different setting of core.autocrlf, but only for the current one. For example, a text file with LF would be accepted with core.autocrlf=input and could later be checked out with core.autocrlf=true, in which case the resulting file would contain CRLF, although the original file contained LF. However, in both work trees the line endings would be consistent, that is either all LF or all CRLF, but never mixed. A file with mixed line endings would be reported by the core.safecrlf mechanism.
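
The round-trip rule in the quoted text is easier to see with a toy model. The sketch below is not git's implementation; it only mimics the conversions the man page describes, on raw bytes, and the function names (to_repo, to_worktree, is_reversible) are made up for illustration.

```python
def to_repo(data: bytes) -> bytes:
    """Commit side: CRLF becomes LF (the same for "true" and "input")."""
    return data.replace(b"\r\n", b"\n")


def to_worktree(stored: bytes, autocrlf: str) -> bytes:
    """Checkout side: only autocrlf="true" converts LF back to CRLF."""
    if autocrlf == "true":
        return stored.replace(b"\n", b"\r\n")
    if autocrlf == "input":
        return stored
    raise ValueError("this sketch only models 'true' and 'input'")


def is_reversible(data: bytes, autocrlf: str) -> bool:
    """The safecrlf idea: does commit followed by checkout give the file back?"""
    return to_worktree(to_repo(data), autocrlf) == data


lf_only = b"one\ntwo\n"
crlf_only = b"one\r\ntwo\r\n"
mixed = b"one\r\ntwo\n"

# Print the verdict for each kind of file under each modelled setting.
for name, data in [("LF", lf_only), ("CRLF", crlf_only), ("mixed", mixed)]:
    for setting in ("true", "input"):
        print(name, setting, is_reversible(data, setting))
```

Under this model an LF-only file fails the check when core.autocrlf is true, which would be consistent with the "LF will be replaced by CRLF" warning in the commit output above. The mixed file fails under both settings, as the man page says it should.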

Since when has it been the job of a version control system to change a file’s contents when it is checked in or out? I know that Linus is quite opinionated when it comes to the Right Way of handling line endings, but this is absurd. I don’t want my version control system to touch line endings at all. I’m a big boy, and I can figure out how to get my tools and editor to play nice with the appropriate line endings for my system. What I don’t need is a new source of confounding errors that typically take forever to track down. Chasing a line-ending problem can be akin to trying to figure out why a multi-line shell script isn’t running properly, when the problem is some extra whitespace after one of the line continuation characters. All hell would break loose if the version control system took it upon itself to adjust whitespace in your files!
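
Checking which line endings a file actually contains, independent of git, takes only a few lines. What follows is a minimal sketch in Python; the function names count_line_endings and describe are invented for illustration and are not part of any tool.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF, and bare CR sequences in raw file contents."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,   # LFs that are not part of a CRLF
        "CR": data.count(b"\r") - crlf,   # CRs that are not part of a CRLF
    }


def describe(data: bytes) -> str:
    """Name the single style in use, or report "mixed" or "none"."""
    used = [style for style, n in count_line_endings(data).items() if n > 0]
    if not used:
        return "none"
    return used[0] if len(used) == 1 else "mixed"
```

The file has to be read in binary mode, for example describe(open("README", "rb").read()), because Python's default text mode translates line endings while reading and would hide the difference.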

I suppose that this is all a non-issue if you run things on Linux as Linus intends you to.

Written by newcome

October 12, 2009 at 6:05 pm

Posted in Uncategorized

3 Responses

  1. […] a comment » Previously, I had written/complained about the way that the Git version control system handled newline characters. I wanted to update […]

  2. Thanks for the post. This issue is long lasting, hence this reply. It was useful to find the required information distilled here.

    I must, however, challenge your assertion that EOL integrity is not important for the SCM to handle. You have the option to turn it off, and due to the nature of Git (i.e. collaborative) Linus is right to have this as the default. Imagine the mess that would occur when users of the 3 major platforms are altering and committing code. Ugh!

    Nicholas

    March 21, 2012 at 8:11 pm

  3. @nicholas I’ve been using git now pretty much exclusively since I wrote this post. I must say I haven’t had any real issues stemming from newline conversions. I guess this is one of those things that initially feels like the SCM overstepping bounds, but ultimately is for the best.

    When I used SVN, I would sometimes copy my working directory around between different systems to work, and when I tried to do this with git, I ran into this issue when going between Windows and Linux. Turns out that it’s kind of stupid to work that way with a distributed SCM like git anyway!

    newcome

    March 21, 2012 at 8:39 pm

