<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Gray Soft / Character Encodings / Ruby 1.9's Three Default Encodings</title>
  <id>tag:graysoftinc.com,2014-03-20:/posts/81</id>
  <updated>2014-03-27T01:38:28Z</updated>
  <link rel="self" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings/feed.xml"/>
  <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings"/>
  <author>
    <name>James Edward Gray II</name>
  </author>
  <entry>
    <title>The 41st Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_499"/>
    <id>tag:graysoftinc.com,2012-12-28:/comments/499</id>
    <updated>2014-03-27T01:38:28Z</updated>
    <summary>Great article. Thanks very much.</summary>
    <content type="html">&lt;p&gt;Great article. Thanks very much.&lt;/p&gt;</content>
    <author>
      <name>Jonah Burke</name>
    </author>
  </entry>
  <entry>
    <title>The 40th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_498"/>
    <id>tag:graysoftinc.com,2012-12-16:/comments/498</id>
    <updated>2014-04-18T18:30:32Z</updated>
    <summary>Fixed.  Thanks.</summary>
    <content type="html">&lt;p&gt;Fixed.  Thanks.&lt;/p&gt;</content>
    <author>
      <name>James Edward Gray II</name>
    </author>
  </entry>
  <entry>
    <title>The 39th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_497"/>
    <id>tag:graysoftinc.com,2012-12-16:/comments/497</id>
    <updated>2014-04-18T18:30:32Z</updated>
    <summary>Great series of articles, really clarifies the whole encoding mess for me.

The only contribution I can offer at this time is an insignificant typo to note. In point 2. of four things to note in the example,  you start with &amp;quot;Use can use&amp;quot;, but I ...</summary>
    <content type="html">&lt;p&gt;Great series of articles, really clarifies the whole encoding mess for me.&lt;/p&gt;

&lt;p&gt;The only contribution I can offer at this time is an insignificant typo to note. In point 2. of four things to note in the example,  you start with "Use can use", but I think you meant to write "You can use".&lt;/p&gt;

&lt;p&gt;Thanks for putting this all out there so clearly.&lt;/p&gt;</content>
    <author>
      <name>HerbCSO</name>
    </author>
  </entry>
  <entry>
    <title>The 38th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_488"/>
    <id>tag:graysoftinc.com,2012-09-04:/comments/488</id>
    <updated>2014-04-18T18:29:55Z</updated>
    <summary>The default internal and external encodings apply to `IO` based communication, not to the encoding of the source code.  That&amp;#39;s the third encoding type, separate from the other two.

The source encoding is controlled via the &amp;quot;(en)coding&amp;quot; comments...</summary>
    <content type="html">&lt;p&gt;The default internal and external encodings apply to &lt;code&gt;IO&lt;/code&gt; based communication, not to the encoding of the source code.  That's the third encoding type, separate from the other two.&lt;/p&gt;

&lt;p&gt;The source encoding is controlled via the "(en)coding" comments.  Also, as you've noted, &lt;code&gt;-K&lt;/code&gt; can change it as well.  That's really just for backwards compatibility though and you should be using the comments.&lt;/p&gt;

&lt;p&gt;Hope that helps.&lt;/p&gt;</content>
    <author>
      <name>James Edward Gray II</name>
    </author>
  </entry>
  <entry>
    <title>The 37th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_486"/>
    <id>tag:graysoftinc.com,2012-09-04:/comments/486</id>
    <updated>2014-04-18T18:29:16Z</updated>
    <summary>The line endings were still there, but these comments are in Markdown.  I reformatted your comment to indent the code.</summary>
    <content type="html">&lt;p&gt;The line endings were still there, but these comments are in Markdown.  I reformatted your comment to indent the code.&lt;/p&gt;</content>
    <author>
      <name>James Edward Gray II</name>
    </author>
  </entry>
  <entry>
    <title>The 36th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_485"/>
    <id>tag:graysoftinc.com,2012-09-04:/comments/485</id>
    <updated>2014-04-18T18:29:16Z</updated>
    <summary>Sorry for the formatting of the previous comment.  The form ate the linefeeds I pasted from the terminal.</summary>
    <content type="html">&lt;p&gt;Sorry for the formatting of the previous comment.  The form ate the linefeeds I pasted from the terminal.&lt;/p&gt;</content>
    <author>
      <name>jpgeek</name>
    </author>
  </entry>
  <entry>
    <title>The 35th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_484"/>
    <id>tag:graysoftinc.com,2012-09-04:/comments/484</id>
    <updated>2014-04-18T18:29:55Z</updated>
    <summary>Thanks much for this!

Using Ruby 1.9.3, it does not appear that default_external applies to requiring files:

```
$ cat test.rb

puts &amp;#39;internal:&amp;#39;
puts Encoding.default_internal
puts &amp;#39;external:&amp;#39;
puts Encoding.default_external
require &amp;#39;....</summary>
    <content type="html">&lt;p&gt;Thanks much for this!&lt;/p&gt;

&lt;p&gt;Using Ruby 1.9.3, it does not appear that default_external applies to requiring files:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ cat test.rb

puts 'internal:'
puts Encoding.default_internal
puts 'external:'
puts Encoding.default_external
require './test2'

$ cat test2.rb 

def test_str
  return "a string"
end

$ ruby -E UTF-8:UTF-8 test.rb 
internal:
UTF-8
external:
UTF-8
a string
US-ASCII
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;However, the &lt;code&gt;-Ku&lt;/code&gt; option does seem to work:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ ruby -E UTF-8:UTF-8 -Ku test.rb 
internal:
UTF-8
external:
UTF-8
a string
UTF-8
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Any idea if this is by design or a bug?&lt;/p&gt;</content>
    <author>
      <name>jpgeek</name>
    </author>
  </entry>
  <entry>
    <title>The 34th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_480"/>
    <id>tag:graysoftinc.com,2012-08-04:/comments/480</id>
    <updated>2014-04-18T18:26:50Z</updated>
    <summary>Fixed.  Thanks.</summary>
    <content type="html">&lt;p&gt;Fixed.  Thanks.&lt;/p&gt;</content>
    <author>
      <name>James Edward Gray II</name>
    </author>
  </entry>
  <entry>
    <title>The 33rd Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_479"/>
    <id>tag:graysoftinc.com,2012-08-04:/comments/479</id>
    <updated>2014-04-18T18:26:50Z</updated>
    <summary>Thanks for a great series!  Very helpful.

I just noticed a typo:

&amp;gt; That makes sense, as it&amp;#39;s really to late

should be

&amp;gt; That makes sense, as it&amp;#39;s really too late</summary>
    <content type="html">&lt;p&gt;Thanks for a great series!  Very helpful.&lt;/p&gt;

&lt;p&gt;I just noticed a typo:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;That makes sense, as it's really to late&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;should be&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;That makes sense, as it's really too late&lt;/p&gt;
&lt;/blockquote&gt;</content>
    <author>
      <name>Colin Kelley</name>
    </author>
  </entry>
  <entry>
    <title>The 32nd Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_478"/>
    <id>tag:graysoftinc.com,2012-08-03:/comments/478</id>
    <updated>2014-04-18T18:26:07Z</updated>
    <summary>Great article. I have long been trying to understand the encoding of `IO` and have gone through multiple articles, but I was not convinced by any of them. I finally got my hands on this article. Now things are pretty clear.</summary>
    <content type="html">&lt;p&gt;Great article. I have long been trying to understand the encoding of &lt;code&gt;IO&lt;/code&gt; and have gone through multiple articles, but I was not convinced by any of them. I finally got my hands on this article. Now things are pretty clear.&lt;/p&gt;</content>
    <author>
      <name>Kaushal Kishore</name>
    </author>
  </entry>
  <entry>
    <title>The 31st Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_446"/>
    <id>tag:graysoftinc.com,2011-05-10:/comments/446</id>
    <updated>2014-04-18T18:25:34Z</updated>
    <summary>Thanks Edward, it worked. I had a few comments in there.</summary>
    <content type="html">&lt;p&gt;Thanks Edward, it worked. I had a few comments in there.&lt;/p&gt;</content>
    <author>
      <name>skawley</name>
    </author>
  </entry>
  <entry>
    <title>The 30th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_445"/>
    <id>tag:graysoftinc.com,2011-05-09:/comments/445</id>
    <updated>2014-04-18T18:25:34Z</updated>
    <summary>The encoding line must be the very first line of the file, or the second if you have a shebang line.</summary>
    <content type="html">&lt;p&gt;The encoding line must be the very first line of the file, or the second if you have a shebang line.&lt;/p&gt;</content>
    <author>
      <name>James Edward Gray II</name>
    </author>
  </entry>
  <entry>
    <title>The 29th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_444"/>
    <id>tag:graysoftinc.com,2011-05-09:/comments/444</id>
    <updated>2014-04-18T18:25:34Z</updated>
    <summary>Sorry to get in so late on this. 

I tried adding `# encoding: UTF-8` to my file, but I still get the error &amp;quot;invalid multibyte char (US-ASCII) (SyntaxError)&amp;quot;. syntax error, unexpected &amp;#39;|&amp;#39;

It doesn&amp;#39;t seem to like the &amp;quot;|&amp;quot; (pipe) character. My Ruby v...</summary>
    <content type="html">&lt;p&gt;Sorry to get in so late on this. &lt;/p&gt;

&lt;p&gt;I tried adding &lt;code&gt;# encoding: UTF-8&lt;/code&gt; to my file, but I still get the error "invalid multibyte char (US-ASCII) (SyntaxError)". syntax error, unexpected '|'&lt;/p&gt;

&lt;p&gt;It doesn't seem to like the "|" (pipe) character. My Ruby version is ruby 1.9.2p180 (2011-02-18) [i386-mingw32]&lt;/p&gt;</content>
    <author>
      <name>skawley</name>
    </author>
  </entry>
  <entry>
    <title>The 28th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_443"/>
    <id>tag:graysoftinc.com,2011-04-18:/comments/443</id>
    <updated>2014-04-18T18:24:03Z</updated>
    <summary>Even if all the Japanese switch to UTF-8 today, they will have plenty of legacy data to contend with.  They also have [reasons for the slow adoption](/character-encodings/the-unicode-character-set-and-encodings).</summary>
    <content type="html">&lt;p&gt;Even if all the Japanese switch to UTF-8 today, they will have plenty of legacy data to contend with.  They also have &lt;a href="/character-encodings/the-unicode-character-set-and-encodings"&gt;reasons for the slow adoption&lt;/a&gt;.&lt;/p&gt;</content>
    <author>
      <name>James Edward Gray II</name>
    </author>
  </entry>
  <entry>
    <title>The 27th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_442"/>
    <id>tag:graysoftinc.com,2011-04-18:/comments/442</id>
    <updated>2014-04-18T18:24:03Z</updated>
    <summary>I guess somebody never heard of convention over configuration.

I mean would it really kill those Japanese gem developers (or the rest of us) to set text editors to UTF8 instead of SJIS (or US-ASCII)? The whole point of Unicode is that it&amp;#39;s univ...</summary>
    <content type="html">&lt;p&gt;I guess somebody never heard of convention over configuration.&lt;/p&gt;

&lt;p&gt;I mean would it really kill those Japanese gem developers (or the rest of us) to set text editors to UTF8 instead of SJIS (or US-ASCII)? The whole point of Unicode is that it's universal - a unique ID for every symbol of every language on earth. Instead of that, now every Ruby file on earth needs to have an extra line of garbage at the top.&lt;/p&gt;</content>
    <author>
      <name>Tobias Cohen</name>
    </author>
  </entry>
  <entry>
    <title>The 26th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_441"/>
    <id>tag:graysoftinc.com,2011-04-09:/comments/441</id>
    <updated>2014-04-18T18:22:44Z</updated>
    <summary>Thank you for this great article.

Regarding magic comments:
my version of GNU Emacs 22.1.1 complained about an &amp;quot;Invalid coding system&amp;quot; when using this version of the magic comment: 

```ruby
# -*- coding: UTF-8 -*-
```

However, Emacs wa...</summary>
    <content type="html">&lt;p&gt;Thank you for this great article.&lt;/p&gt;

&lt;p&gt;Regarding magic comments:&lt;br&gt;
my version of GNU Emacs 22.1.1 complained about an "Invalid coding system" when using this version of the magic comment: &lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;&lt;span class="c1"&gt;# -*- coding: UTF-8 -*-&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;However, Emacs was happy with this:&lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;&lt;span class="c1"&gt;# -*- coding: utf-8 -*-&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Just thought I'd pass along the info.&lt;/p&gt;

&lt;p&gt;Thanks again for your post.&lt;/p&gt;</content>
    <author>
      <name>jgpawletko</name>
    </author>
  </entry>
  <entry>
    <title>The 25th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_435"/>
    <id>tag:graysoftinc.com,2011-03-21:/comments/435</id>
    <updated>2014-04-18T18:21:38Z</updated>
    <summary>Probably not.  If you are having trouble, it&amp;#39;s best to discuss it on the Ruby Core mailing list, where you can get help from several people smarter than me.</summary>
    <content type="html">&lt;p&gt;Probably not.  If you are having trouble, it's best to discuss it on the Ruby Core mailing list, where you can get help from several people smarter than me.&lt;/p&gt;</content>
    <author>
      <name>James Edward Gray II</name>
    </author>
  </entry>
  <entry>
    <title>The 24th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_434"/>
    <id>tag:graysoftinc.com,2011-03-21:/comments/434</id>
    <updated>2014-04-18T18:21:38Z</updated>
    <summary>Hello,

For now, in Ruby 1.9.2p136, the method of setting `LC_CTYPE` or `LANG` isn&amp;#39;t working. Do you know why?</summary>
    <content type="html">&lt;p&gt;Hello,&lt;/p&gt;

&lt;p&gt;For now, in Ruby 1.9.2p136, the method of setting &lt;code&gt;LC_CTYPE&lt;/code&gt; or &lt;code&gt;LANG&lt;/code&gt; isn't working. Do you know why?&lt;/p&gt;</content>
    <author>
      <name>3ануда</name>
    </author>
  </entry>
  <entry>
    <title>The 23rd Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_423"/>
    <id>tag:graysoftinc.com,2011-01-19:/comments/423</id>
    <updated>2014-04-18T18:20:59Z</updated>
    <summary>You could definitely replace shelling out to `ls` with a call to `Dir.glob()`, but guessing encodings is a lot trickier.  The `rchardet` gem can help figure it out, I believe.</summary>
    <content type="html">&lt;p&gt;You could definitely replace shelling out to &lt;code&gt;ls&lt;/code&gt; with a call to &lt;code&gt;Dir.glob()&lt;/code&gt;, but guessing encodings is a lot trickier.  The &lt;code&gt;rchardet&lt;/code&gt; gem can help figure it out, I believe.&lt;/p&gt;</content>
    <author>
      <name>James Edward Gray II</name>
    </author>
  </entry>
  <entry>
    <title>The 22nd Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_422"/>
    <id>tag:graysoftinc.com,2011-01-19:/comments/422</id>
    <updated>2014-04-18T18:20:59Z</updated>
    <summary>I have a collection of input files in several different encodings: US-ASCII, ISO-8859-1, and UTF-8.

I need to read them all into a utf-8 encoding for further processing (regex, `split`, etc).

My solution, which seems to be working at the moment...</summary>
    <content type="html">&lt;p&gt;I have a collection of input files in several different encodings: US-ASCII, ISO-8859-1, and UTF-8.&lt;/p&gt;

&lt;p&gt;I need to read them all into a utf-8 encoding for further processing (regex, &lt;code&gt;split&lt;/code&gt;, etc).&lt;/p&gt;

&lt;p&gt;My solution, which seems to be working at the moment, is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;[user1@hoho6 ~]$ cat james.rb
# coding: utf-8
puts `ruby -v`

files =  `ls -1 /home/user1/Accounts/Kto*.sta`
files &amp;lt;&amp;lt; `ls -1 /home/user1/Accounts/Kto*.scn`
files &amp;lt;&amp;lt; `ls -1 /home/user1/Accounts/umsMT940*.txt`

pflag = true

files.each_line {|fname|
  fname.chomp!
  enc = `file -bi "#{fname}"`.chomp.split('=')[1]
# puts enc
  mode = "r"
  mode &amp;lt;&amp;lt; ":#{enc}:utf-8" if enc != 'utf-8'
  File.open(fname,mode) {|f|
    f.each_line {|line|
      line.chomp!
      begin
        fields = line.split(':')
      rescue ArgumentError =&amp;gt; e
        puts e.message
        puts fname
        puts line
        exit(1)
      end
      # &amp;lt;further processing of fields&amp;gt;
      if /UEBERZ/.match(line) != nil
        puts line if pflag
        pflag = false
      end
    }
  }
} 
[user1@hoho6 ~]$ 

[user1@hoho6 ~]$ ruby james.rb
ruby 1.9.2p136 (2010-12-25 revision 30365) [x86_64-linux]
:86:809?00UEBERZ.-ZINS?10999116?20ÜBERZIEHUNGSZINSEN?21ZURZEIT       
[user1@hoho6 ~]$
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I determine the encoding of each input file prior to opening it by using the Unix command &lt;code&gt;file -bi&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;mode&lt;/code&gt; is used to avoid spurious messages when trying to open using &lt;code&gt;"r:utf-8:utf-8"&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;split&lt;/code&gt; command was my original problem when porting from Ruby 1.8.6 Rails 2.3.5 to Ruby 1.9.2 Rails 3.0.2. Testing each line of each file with a &lt;code&gt;rescue&lt;/code&gt; around the &lt;code&gt;split&lt;/code&gt; gives me some confidence that I can process these files.&lt;/p&gt;

&lt;p&gt;Any suggestions for code simplification would be appreciated. It would be nice if it could be done without the external Unix command.&lt;/p&gt;</content>
    <author>
      <name>Bob Gustafson</name>
    </author>
  </entry>
  <entry>
    <title>The 21st Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_412"/>
    <id>tag:graysoftinc.com,2010-11-22:/comments/412</id>
    <updated>2014-04-18T18:17:28Z</updated>
    <summary>A note on the script `show_internal.rb`; for each line, it prints the correct encoding (UTF-16LE), but then instead of the expected text it prints the internal byte structure (like it did not recognize the encoding).

I got the same result runni...</summary>
    <content type="html">&lt;p&gt;A note on the script &lt;code&gt;show_internal.rb&lt;/code&gt;; for each line, it prints the correct encoding (UTF-16LE), but then instead of the expected text it prints the internal byte structure (like it did not recognize the encoding).&lt;/p&gt;

&lt;p&gt;I got the same result running 1.9.1, but with 1.9.2 the problem does not occur and the text is the expected one. Just in case you can update it (perhaps printing the text and an &lt;code&gt;unpack&lt;/code&gt; of the line to show the structure). &lt;/p&gt;

&lt;p&gt;Thanks again for this tutorial &lt;/p&gt;</content>
    <author>
      <name>Raul Parolari</name>
    </author>
  </entry>
  <entry>
    <title>The 20th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_411"/>
    <id>tag:graysoftinc.com,2010-11-21:/comments/411</id>
    <updated>2014-04-18T18:15:29Z</updated>
    <summary>I did mean, &amp;quot;before it was written to the file.&amp;quot;  Just think of it in terms of `encode()` instead of `encode!()`, with the returned result being written into the file.

As for your other comment, it&amp;#39;s not always possible for Ruby to know the encodi...</summary>
    <content type="html">&lt;p&gt;I did mean, "before it was written to the file."  Just think of it in terms of &lt;code&gt;encode()&lt;/code&gt; instead of &lt;code&gt;encode!()&lt;/code&gt;, with the returned result being written into the file.&lt;/p&gt;

&lt;p&gt;As for your other comment, it's not always possible for Ruby to know the encoding from just the data, so it leaves it to us to specify what is intended.&lt;/p&gt;

&lt;p&gt;But yeah, it sounds like you have it pretty figured out to me.&lt;/p&gt;</content>
    <author>
      <name>James Edward Gray II</name>
    </author>
  </entry>
  <entry>
    <title>The 19th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_410"/>
    <id>tag:graysoftinc.com,2010-11-21:/comments/410</id>
    <updated>2014-04-18T18:15:29Z</updated>
    <summary>James, thanks for this memorable ride across the enigmatic world of Unicode.

I have a couple of observations on the script `write_internal.rb`:

(1) The sentence _note how my data was transcoded before it was written_ is not clear (it cannot ...</summary>
    <content type="html">&lt;p&gt;James, thanks for this memorable ride across the enigmatic world of Unicode.&lt;/p&gt;

&lt;p&gt;I have a couple of observations on the script &lt;code&gt;write_internal.rb&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;(1) The sentence &lt;em&gt;note how my data was transcoded before it was written&lt;/em&gt; is not clear (it cannot mean "in the block, before it is written to the file", as the printout shows that the data is of course still in UTF-8); we only see it transcoded when we read from the file (or run &lt;code&gt;od -cx data.txt&lt;/code&gt; from the shell), so I lost track of what that meant.&lt;/p&gt;

&lt;p&gt;(2) But the real problem was presented by the &lt;strong&gt;format of the string&lt;/strong&gt; read from the file and printed at the end; I could not make sense of it. Finally I realized that while the string content is encoded as UTF-16LE, Ruby assigned the encoding UTF-8 to the string (as per the script &lt;em&gt;coding&lt;/em&gt; line); thus, the apparent oddity of the string derives from the fact that Ruby is representing a UTF-16LE string as UTF-8.&lt;/p&gt;

&lt;p&gt;Only after applying &lt;code&gt;force_encoding("UTF-16LE")&lt;/code&gt; to the string read did the string make sense (the Unicode triple dot is shown via its Unicode code point, and all those zero bytes disappeared in the printout). And then when encoding the previous result to UTF-8, we find the exact string we had at the beginning.&lt;/p&gt;

&lt;p&gt;It all makes sense (although I confess that I had thought, even without realizing it, that Ruby would do at least the first step above, i.e. read the encoding from the file and present a string whose encoding matched the content).&lt;/p&gt;

&lt;p&gt;If you have a chance, let me know if I am correct in interpreting this, or if I am missing something. In any case, thanks again for this extremely useful series.&lt;/p&gt;

&lt;p&gt;Raul&lt;/p&gt;</content>
    <author>
      <name>raulparolari@gmail.com</name>
    </author>
  </entry>
  <entry>
    <title>The 18th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_404"/>
    <id>tag:graysoftinc.com,2010-10-16:/comments/404</id>
    <updated>2014-04-18T18:09:53Z</updated>
    <summary>Hey James,

thanks so much for the &amp;#39;magic comment&amp;#39; hint.
</summary>
    <content type="html">&lt;p&gt;Hey James,&lt;/p&gt;

&lt;p&gt;thanks so much for the 'magic comment' hint.&lt;/p&gt;</content>
    <author>
      <name>Daniel</name>
    </author>
  </entry>
  <entry>
    <title>The 17th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_370"/>
    <id>tag:graysoftinc.com,2010-04-04:/comments/370</id>
    <updated>2014-04-18T18:09:30Z</updated>
    <summary>Thanks.

This does not work in version 1.9.1.

```ruby
irb(main):001:0&amp;gt; require &amp;#39;stringio&amp;#39;
=&amp;gt; true
irb(main):002:0&amp;gt; sio = StringIO.open(&amp;quot;&amp;quot;, &amp;quot;w:UTF-8&amp;quot;)
=&amp;gt; #&amp;lt;StringIO:0x29d4a00&amp;gt;
irb(main):003:0&amp;gt; sio &amp;lt;&amp;lt; &amp;quot;abc&amp;quot;
=&amp;gt; #&amp;lt;StringIO:0x29d4a00&amp;gt;
irb(main):00...</summary>
    <content type="html">&lt;p&gt;Thanks.&lt;/p&gt;

&lt;p&gt;This does not work in version 1.9.1.&lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;&lt;span class="n"&gt;irb&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;&lt;span class="mo"&gt;001&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s1"&gt;'stringio'&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kp"&gt;true&lt;/span&gt;
&lt;span class="n"&gt;irb&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;&lt;span class="mo"&gt;002&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;sio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StringIO&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"w:UTF-8"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;#&amp;lt;StringIO:0x29d4a00&amp;gt;&lt;/span&gt;
&lt;span class="n"&gt;irb&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;&lt;span class="mo"&gt;003&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;sio&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="s2"&gt;"abc"&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;#&amp;lt;StringIO:0x29d4a00&amp;gt;&lt;/span&gt;
&lt;span class="n"&gt;irb&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;&lt;span class="mo"&gt;004&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;sio&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encoding&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;#&amp;lt;Encoding:CP850&amp;gt;&lt;/span&gt;
&lt;span class="n"&gt;irb&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;&lt;span class="mo"&gt;005&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;RUBY_VERSION&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"1.9.1"&lt;/span&gt;
&lt;span class="n"&gt;irb&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;&lt;span class="mo"&gt;006&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;</content>
    <author>
      <name>Julio Fernández</name>
    </author>
  </entry>
  <entry>
    <title>The 16th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_369"/>
    <id>tag:graysoftinc.com,2010-04-04:/comments/369</id>
    <updated>2014-04-18T18:09:30Z</updated>
    <summary>I think `StringIO` was m17n-enhanced a little later in the 1.9 conversion game:

```ruby
&amp;gt;&amp;gt; require &amp;quot;stringio&amp;quot;
=&amp;gt; true
&amp;gt;&amp;gt; sio = StringIO.open(&amp;quot;&amp;quot;, &amp;quot;w:UTF-8&amp;quot;)
=&amp;gt; #&amp;lt;StringIO:0x00000100848ae0&amp;gt;
&amp;gt;&amp;gt; sio &amp;lt;&amp;lt; &amp;quot;abc&amp;quot;
=&amp;gt; #&amp;lt;StringIO:0x00000100848ae0&amp;gt;...</summary>
    <content type="html">&lt;p&gt;I think &lt;code&gt;StringIO&lt;/code&gt; was m17n-enhanced a little later in the 1.9 conversion game:&lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"stringio"&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kp"&gt;true&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;sio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StringIO&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"w:UTF-8"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;#&amp;lt;StringIO:0x00000100848ae0&amp;gt;&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;sio&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="s2"&gt;"abc"&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;#&amp;lt;StringIO:0x00000100848ae0&amp;gt;&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;sio&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encoding&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;#&amp;lt;Encoding:UTF-8&amp;gt;&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;RUBY_VERSION&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"1.9.2"&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;It doesn't seem to support the &lt;code&gt;Hash&lt;/code&gt;-style arguments in my version though.&lt;/p&gt;</content>
    <author>
      <name>James Edward Gray II</name>
    </author>
  </entry>
  <entry>
    <title>The 15th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_368"/>
    <id>tag:graysoftinc.com,2010-04-04:/comments/368</id>
    <updated>2014-04-18T18:09:30Z</updated>
    <summary>Sorry.

`StringIO.new` does not seem to have an option to set `external_encoding` in its constructor; it says `StringIO#string.encoding` is CP850 on my PC.

I had to set `Encoding.default_external=&amp;quot;UTF-8&amp;quot;` and now `StringIO#string.encoding` is UTF-...</summary>
    <content type="html">&lt;p&gt;Sorry.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;StringIO.new&lt;/code&gt; does not seem to have an option to set &lt;code&gt;external_encoding&lt;/code&gt; in its constructor; it says &lt;code&gt;StringIO#string.encoding&lt;/code&gt; is CP850 on my PC.&lt;/p&gt;

&lt;p&gt;I had to set &lt;code&gt;Encoding.default_external="UTF-8"&lt;/code&gt; and now &lt;code&gt;StringIO#string.encoding&lt;/code&gt; is UTF-8.&lt;/p&gt;</content>
    <author>
      <name>Julio Fernández</name>
    </author>
  </entry>
  <entry>
    <title>The 14th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_364"/>
    <id>tag:graysoftinc.com,2010-03-31:/comments/364</id>
    <updated>2014-04-18T18:05:58Z</updated>
    <summary>Right.  At that point, you will need to call `force_encoding()`, as I was referring to earlier, to set the proper encoding for your data.</summary>
    <content type="html">&lt;p&gt;Right.  At that point, you will need to call &lt;code&gt;force_encoding()&lt;/code&gt;, as I was referring to earlier, to set the proper encoding for your data.&lt;/p&gt;</content>
    <author>
      <name>James Edward Gray II</name>
    </author>
  </entry>
  <entry>
    <title>The 13th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_363"/>
    <id>tag:graysoftinc.com,2010-03-31:/comments/363</id>
    <updated>2014-04-18T18:05:58Z</updated>
    <summary>OK
It returns US-ASCII if all the bytes in the content are &amp;lt; 128
but returns ASCII-8BIT if I put any char &amp;gt;128 in the content.



</summary>
    <content type="html">&lt;p&gt;OK&lt;br&gt;
It returns US-ASCII if all the bytes in the content are &amp;lt; 128&lt;br&gt;
but returns ASCII-8BIT if I put any char &amp;gt;128 in the content.&lt;/p&gt;</content>
    <author>
      <name>Julio Fernández</name>
    </author>
  </entry>
  <entry>
    <title>The 12th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_362"/>
    <id>tag:graysoftinc.com,2010-03-31:/comments/362</id>
    <updated>2014-04-18T18:05:58Z</updated>
    <summary>Well, it obviously can&amp;#39;t return US-ASCII for all cases.  What does it do if the data contains extended characters, like UTF-8?</summary>
    <content type="html">&lt;p&gt;Well, it obviously can't return US-ASCII for all cases.  What does it do if the data contains extended characters, like UTF-8?&lt;/p&gt;</content>
    <author>
      <name>James Edward Gray II</name>
    </author>
  </entry>
  <entry>
    <title>The 11th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_361"/>
    <id>tag:graysoftinc.com,2010-03-31:/comments/361</id>
    <updated>2014-04-18T18:05:58Z</updated>
    <summary>It returns US-ASCII, without any `force_encoding`.</summary>
    <content type="html">&lt;p&gt;It returns US-ASCII, without any &lt;code&gt;force_encoding&lt;/code&gt;.&lt;/p&gt;</content>
    <author>
      <name>Julio Fernández</name>
    </author>
  </entry>
  <entry>
    <title>The 10th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_360"/>
    <id>tag:graysoftinc.com,2010-03-31:/comments/360</id>
    <updated>2014-04-18T18:05:58Z</updated>
    <summary>I believe `Net::HTTP` leaves it to the programmer to manage the conversions.  Thus, it will likely always return content in something like ASCII-8BIT and leave it to you to call `force_encoding()` using information you pull out of the headers, doc...</summary>
    <content type="html">&lt;p&gt;I believe &lt;code&gt;Net::HTTP&lt;/code&gt; leaves it to the programmer to manage the conversions.  Thus, it will likely always return content in something like ASCII-8BIT and leave it to you to call &lt;code&gt;force_encoding()&lt;/code&gt; using information you pull out of the headers, documentation for the service, or whatever.&lt;/p&gt;</content>
    <author>
      <name>James Edward Gray II</name>
    </author>
  </entry>
  <entry>
    <title>The 9th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_359"/>
    <id>tag:graysoftinc.com,2010-03-31:/comments/359</id>
    <updated>2014-04-18T18:05:58Z</updated>
    <summary>What about the `Net::HTTP` class?
It always returns `#&amp;lt;Encoding:US-ASCII&amp;gt;`</summary>
    <content type="html">&lt;p&gt;What about the &lt;code&gt;Net::HTTP&lt;/code&gt; class?&lt;br&gt;
It always returns &lt;code&gt;#&amp;lt;Encoding:US-ASCII&amp;gt;&lt;/code&gt;&lt;/p&gt;</content>
    <author>
      <name>Julio Fernández</name>
    </author>
  </entry>
  <entry>
    <title>The 8th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_301"/>
    <id>tag:graysoftinc.com,2009-08-07:/comments/301</id>
    <updated>2014-03-27T01:38:26Z</updated>
    <summary>It&amp;#39;s worth noting, Ruby currently requires that a source `Encoding` be ASCII compatible.</summary>
    <content type="html">&lt;p&gt;It's worth noting, Ruby currently requires that a source &lt;code&gt;Encoding&lt;/code&gt; be ASCII compatible.&lt;/p&gt;</content>
    <author>
      <name>James Edward Gray II</name>
    </author>
  </entry>
  <entry>
    <title>The 7th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_275"/>
    <id>tag:graysoftinc.com,2009-04-17:/comments/275</id>
    <updated>2014-04-18T18:03:58Z</updated>
    <summary>No worries.  My hope is that we are making things better for all by talking this stuff out.
</summary>
    <content type="html">&lt;p&gt;No worries.  My hope is that we are making things better for all by talking this stuff out.&lt;/p&gt;</content>
    <author>
      <name>James Edward Gray II</name>
    </author>
  </entry>
  <entry>
    <title>The 6th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_274"/>
    <id>tag:graysoftinc.com,2009-04-17:/comments/274</id>
    <updated>2014-04-18T18:03:58Z</updated>
    <summary>Thanks for answering such an obvious question, James. I should have read the [RDoc for `IO#new`](http://www.ruby-doc.org/core-2.1.1/IO.html#method-c-new), which clearly describes the API changes.</summary>
    <content type="html">&lt;p&gt;Thanks for answering such an obvious question, James. I should have read the &lt;a href="http://www.ruby-doc.org/core-2.1.1/IO.html#method-c-new"&gt;RDoc for &lt;code&gt;IO#new&lt;/code&gt;&lt;/a&gt;, which clearly describes the API changes.&lt;/p&gt;</content>
    <author>
      <name>Nathan de Vries</name>
    </author>
  </entry>
  <entry>
    <title>The 5th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_272"/>
    <id>tag:graysoftinc.com,2009-04-16:/comments/272</id>
    <updated>2014-04-18T18:03:58Z</updated>
    <summary>It&amp;#39;s easy to use an `Integer` mode with an `Encoding`.  Most `open()`-like methods now take an optional `Hash` of arguments at the end where you can set things like `:mode`, `:external_encoding`, or `:internal_encoding`.  Thus your example could b...</summary>
    <content type="html">&lt;p&gt;It's easy to use an &lt;code&gt;Integer&lt;/code&gt; mode with an &lt;code&gt;Encoding&lt;/code&gt;.  Most &lt;code&gt;open()&lt;/code&gt;-like methods now take an optional &lt;code&gt;Hash&lt;/code&gt; of arguments at the end where you can set things like &lt;code&gt;:mode&lt;/code&gt;, &lt;code&gt;:external_encoding&lt;/code&gt;, or &lt;code&gt;:internal_encoding&lt;/code&gt;.  Thus your example could be written as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ cat modes_and_encoding.rb 
open( "utf16.txt", File::WRONLY | File::CREAT | File::TRUNC,
                   external_encoding: "UTF-16BE" ) do |f|
  f.puts "Some data."
end
$ ruby modes_and_encoding.rb 
$ ruby -e 'p File.binread("utf16.txt")'
"\x00S\x00o\x00m\x00e\x00 \x00d\x00a\x00t\x00a\x00.\x00\n"
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I do talk about this &lt;a href="/character-encodings/miscellaneous-m17n-details"&gt;later in the series&lt;/a&gt;.  I just had to spread some of these topics out a bit because there's a lot to cover and the articles were already very long.&lt;/p&gt;</content>
    <author>
      <name>James Edward Gray II</name>
    </author>
  </entry>
  <entry>
    <title>The 4th Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_271"/>
    <id>tag:graysoftinc.com,2009-04-16:/comments/271</id>
    <updated>2014-04-18T18:03:58Z</updated>
    <summary>When using `File#open`, is it still possible to specify the file mode using the integer values available through the constants of the File class? How would you represent encoding intentions while specifying a mode of `File::WRONLY | File::CREAT | ...</summary>
    <content type="html">&lt;p&gt;When using &lt;code&gt;File#open&lt;/code&gt;, is it still possible to specify the file mode using the integer values available through the constants of the File class? How would you represent encoding intentions while specifying a mode of &lt;code&gt;File::WRONLY | File::CREAT | File::TRUNC&lt;/code&gt;, for example?&lt;/p&gt;</content>
    <author>
      <name>Nathan de Vries</name>
    </author>
  </entry>
  <entry>
    <title>The 3rd Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_268"/>
    <id>tag:graysoftinc.com,2009-04-07:/comments/268</id>
    <updated>2014-04-18T17:59:44Z</updated>
    <summary>I see that both &amp;#39;coding&amp;#39; and &amp;#39;encoding&amp;#39; are valid but since the example shown immediately before that paragraph used &amp;#39;encoding&amp;#39; and I had my finger on the trigger....

My apologies..</summary>
    <content type="html">&lt;p&gt;I see that both 'coding' and 'encoding' are valid but since the example shown immediately before that paragraph used 'encoding' and I had my finger on the trigger....&lt;/p&gt;

&lt;p&gt;My apologies..&lt;/p&gt;</content>
    <author>
      <name>Saimon Moore</name>
    </author>
  </entry>
  <entry>
    <title>The 2nd Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_267"/>
    <id>tag:graysoftinc.com,2009-04-07:/comments/267</id>
    <updated>2014-04-18T17:59:44Z</updated>
    <summary>James,

A possible typo:

&amp;quot;If the first line of your code is a comment that includes the word &amp;#39;coding&amp;#39; &amp;lt;== (shouldn&amp;#39;t this be &amp;#39;encoding&amp;#39;), followed by a colon and space&amp;quot;</summary>
    <content type="html">&lt;p&gt;James,&lt;/p&gt;

&lt;p&gt;A possible typo:&lt;/p&gt;

&lt;p&gt;"If the first line of your code is a comment that includes the word 'coding' &amp;lt;== (shouldn't this be 'encoding'), followed by a colon and space"&lt;/p&gt;</content>
    <author>
      <name>Saimon Moore</name>
    </author>
  </entry>
  <entry>
    <title>The 1st Comment on "Ruby 1.9's Three Default Encodings"</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings#comment_266"/>
    <id>tag:graysoftinc.com,2009-04-06:/comments/266</id>
    <updated>2014-04-18T17:59:08Z</updated>
    <summary>It&amp;#39;s probably worth noting that using the default `Encoding` setters will trigger warnings:

```
$ ruby -we &amp;#39;Encoding.default_internal = Encoding.default_external = &amp;quot;UTF-8&amp;quot;&amp;#39;
-e:1: warning: setting Encoding.default_external
-e:1: warning: sett...</summary>
    <content type="html">&lt;p&gt;It's probably worth noting that using the default &lt;code&gt;Encoding&lt;/code&gt; setters will trigger warnings:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ ruby -we 'Encoding.default_internal = Encoding.default_external = "UTF-8"'
-e:1: warning: setting Encoding.default_external
-e:1: warning: setting Encoding.default_internal
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That makes sense, as it's really too late to set these in code after &lt;code&gt;IO&lt;/code&gt; objects may have already been created.&lt;/p&gt;</content>
    <author>
      <name>James Edward Gray II</name>
    </author>
  </entry>
  <entry>
    <title>Ruby 1.9's Three Default Encodings</title>
    <link rel="alternate" href="http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings"/>
    <id>tag:graysoftinc.com,2009-04-05:/posts/81</id>
    <updated>2014-04-18T18:40:50Z</updated>
    <summary>Now that we've covered String, we need to talk about how Strings get their initial Encoding.</summary>
    <content type="html">&lt;p&gt;I suspect early contact with the new m17n (multilingualization) engine is going to come to Rubyists in the form of this error message:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;invalid multibyte char (US-ASCII)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Ruby 1.8 didn't care what you stuck in a random &lt;code&gt;String&lt;/code&gt; literal, but 1.9 is a touch pickier.  I think you'll see that the change is for the better, but we do need to spend some time learning to play by Ruby's new rules.&lt;/p&gt;

&lt;p&gt;That takes us to the first of Ruby's three default &lt;code&gt;Encoding&lt;/code&gt;s.&lt;/p&gt;

&lt;h4&gt;The Source Encoding&lt;/h4&gt;

&lt;p&gt;In Ruby's new grown-up world of all encoded data, each and every &lt;code&gt;String&lt;/code&gt; needs an &lt;code&gt;Encoding&lt;/code&gt;.  That means an &lt;code&gt;Encoding&lt;/code&gt; must be selected for a &lt;code&gt;String&lt;/code&gt; as soon as it is created.  One way that a &lt;code&gt;String&lt;/code&gt; can be created is for Ruby to execute some code with a &lt;code&gt;String&lt;/code&gt; literal in it, like this:&lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;&lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"A new String"&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;That's a pretty simple &lt;code&gt;String&lt;/code&gt;, but what if I use a literal like the following instead?&lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;&lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Résumé"&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;What &lt;code&gt;Encoding&lt;/code&gt; is that in?  That fundamental question is probably the main reason we all struggle a bit with character encodings.  You can't tell just from looking at that data what &lt;code&gt;Encoding&lt;/code&gt; it is in.  Now, if I showed you the bytes you may be able to make an educated guess, but the data just isn't wearing an &lt;code&gt;Encoding&lt;/code&gt; name tag.&lt;/p&gt;

&lt;p&gt;That's true of a frightening lot of data we deal with every day.  A plain text file doesn't generally say what &lt;code&gt;Encoding&lt;/code&gt; the data inside is in.  When you think about that, it's a miracle we can successfully read a lot of things.&lt;/p&gt;
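
&lt;p&gt;To make the ambiguity concrete, here is a minimal sketch that prints the bytes behind the same six characters in two &lt;code&gt;Encoding&lt;/code&gt;s.  ISO-8859-1 is only an assumption picked for illustration, and the variable names are hypothetical.  Nothing in either byte sequence says which &lt;code&gt;Encoding&lt;/code&gt; it came from:&lt;/p&gt;

```ruby
# \u escapes always produce a UTF-8 String, whatever the source Encoding is.
utf8   = "R\u00E9sum\u00E9"
latin1 = utf8.encode("ISO-8859-1")

utf8_bytes   = utf8.bytes.to_a    # each accented letter is two bytes (0xC3 0xA9)
latin1_bytes = latin1.bytes.to_a  # each accented letter is one byte (0xE9)

p [utf8.encoding.name,   utf8_bytes]
p [latin1.encoding.name, latin1_bytes]
```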

&lt;p&gt;When we're talking about program code, the problem gets worse.  I may want to write my code in UTF-8, but some Japanese programmer may want to write his code in Shift JIS.  Ruby should support that and, in fact, 1.9 does.  Let's complicate things a bit more though:  imagine that I bundle up that UTF-8 code I wrote in a gem and the Japanese programmer later uses it to help with his Shift JIS code.  How do we make that work seamlessly?&lt;/p&gt;

&lt;p&gt;The Ruby 1.8 strategy of one global variable won't survive a test like this, so it was time to switch strategies.  Ruby 1.9's answer to this problem is the source &lt;code&gt;Encoding&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;All Ruby source code now has some &lt;code&gt;Encoding&lt;/code&gt;.  When you create a &lt;code&gt;String&lt;/code&gt; literal in your code, it is assigned the &lt;code&gt;Encoding&lt;/code&gt; of your source.  That simple rule solves all the problems I just described pretty nicely.  As long as my source &lt;code&gt;Encoding&lt;/code&gt; is UTF-8 and the Japanese programmer's source &lt;code&gt;Encoding&lt;/code&gt; is Shift JIS, my literals will work as I expect and his will work as he expects.  Obviously if we share any data, we will need to establish some rules about our shared formats using documentation or code that can adapt to different &lt;code&gt;Encoding&lt;/code&gt;s, but we should have been doing that all along anyway.&lt;/p&gt;

&lt;p&gt;Thus the only question becomes, what's my source &lt;code&gt;Encoding&lt;/code&gt; and how do I change it?&lt;/p&gt;

&lt;p&gt;There are a few different ways Ruby can select a source &lt;code&gt;Encoding&lt;/code&gt;.  Here are the options:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ cat no_encoding.rb 
p __ENCODING__
$ ruby no_encoding.rb 
#&amp;lt;Encoding:US-ASCII&amp;gt;

$ cat magic_comment.rb 
# encoding: UTF-8
p __ENCODING__
$ ruby magic_comment.rb 
#&amp;lt;Encoding:UTF-8&amp;gt;
$ cat magic_comment2.rb 
#!/usr/bin/env ruby -w
# encoding: UTF-8
p __ENCODING__
$ ruby magic_comment2.rb 
#&amp;lt;Encoding:UTF-8&amp;gt;

$ echo $LC_CTYPE
en_US.UTF-8
$ ruby -e 'p __ENCODING__'
#&amp;lt;Encoding:UTF-8&amp;gt;

$ ruby -KU no_encoding.rb 
#&amp;lt;Encoding:UTF-8&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The first example shows us two important things.  The first is the main rule of source &lt;code&gt;Encoding&lt;/code&gt;s:  source files receive a US-ASCII &lt;code&gt;Encoding&lt;/code&gt;, unless you say otherwise.  &lt;em&gt;[&lt;strong&gt;Update&lt;/strong&gt;:  this was changed to UTF-8 in Ruby 2.0 and up.]&lt;/em&gt;  This is where I expect programmers to run into the error I mentioned earlier.  If you place any non-ASCII content in a &lt;code&gt;String&lt;/code&gt; literal without changing the source &lt;code&gt;Encoding&lt;/code&gt;, Ruby will die with that error.  Thus you need to change the source &lt;code&gt;Encoding&lt;/code&gt; to work with any non-ASCII data.  The second thing we see here is the new &lt;code&gt;__ENCODING__&lt;/code&gt; keyword that can be used to get the source &lt;code&gt;Encoding&lt;/code&gt; that's active where it is executed.&lt;/p&gt;

&lt;p&gt;The second example shows the preferred way to set your source &lt;code&gt;Encoding&lt;/code&gt; and it's called a magic comment.  If the first line of your code is a comment that includes the word &lt;code&gt;coding&lt;/code&gt;, followed by a colon and space, and then an &lt;code&gt;Encoding&lt;/code&gt; name, the source &lt;code&gt;Encoding&lt;/code&gt; for that file is changed to the indicated &lt;code&gt;Encoding&lt;/code&gt;.  If your code has a shebang line, the magic comment must come on the second line, with no spacing between them.  Once set, all &lt;code&gt;String&lt;/code&gt; literals you create in that file will have that &lt;code&gt;Encoding&lt;/code&gt; attached to them.&lt;/p&gt;

&lt;p&gt;The third example shows an exception to the rule for your convenience.  When you feed Ruby some code on the command-line using the &lt;code&gt;-e&lt;/code&gt; switch, it gets a source &lt;code&gt;Encoding&lt;/code&gt; from your environment.  I have UTF-8 set in the &lt;code&gt;LC_CTYPE&lt;/code&gt; environment variable, but some people also use the &lt;code&gt;LANG&lt;/code&gt; variable for this.  This makes scripting easier since Ruby will (hopefully) match the &lt;code&gt;Encoding&lt;/code&gt; of any other commands you chain together.&lt;/p&gt;

&lt;p&gt;The fourth example is another interesting exception to the rule.  Ruby 1.9 still supports the &lt;code&gt;-K*&lt;/code&gt; style switches from Ruby 1.8 including the &lt;a href="http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/19552"&gt;&lt;code&gt;-KU&lt;/code&gt; switch&lt;/a&gt; I've recommended so heavily in this series.  These switches have a couple of effects, but of particular note they are the only non-magic comment way to modify the source &lt;code&gt;Encoding&lt;/code&gt;.  This is good news for backwards compatibility, because some Ruby 1.8 code may be able to run on Ruby 1.9 without &lt;code&gt;Encoding&lt;/code&gt; problems thanks to this.  I must stress that this is just for backwards compatibility though, and magic comments are the future.&lt;/p&gt;

&lt;p&gt;With magic comments the code will include its &lt;code&gt;Encoding&lt;/code&gt; data.  It will probably seem a little tedious to add them to all your source files at first, but it's really not that big of a change.  In the past, I've recommended we stick the following shebang line at the top of our files:&lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;&lt;span class="c1"&gt;#!/usr/bin/env ruby -wKU&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Now, for Ruby 1.9, I'm recommending we switch to something like this:&lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;&lt;span class="c1"&gt;#!/usr/bin/env ruby -w&lt;/span&gt;
&lt;span class="c1"&gt;# encoding: UTF-8&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Note that the magic comment format rules are pretty loose and all of the following examples would work the same:&lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;&lt;span class="c1"&gt;# encoding: UTF-8&lt;/span&gt;

&lt;span class="c1"&gt;# coding: UTF-8&lt;/span&gt;

&lt;span class="c1"&gt;# -*- coding: UTF-8 -*-&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is nice for support in some text editors that also read such comments.&lt;/p&gt;

&lt;p&gt;If we all get into that habit of adding magic comments, our code can work together regardless of the various &lt;code&gt;Encoding&lt;/code&gt;s we personally favor.  Ruby will know how to handle each separate file.  As an added bonus, we programmers also get to see these comments and know more about the code we are working with.  That makes it a good habit to get into, I think.&lt;/p&gt;

&lt;h4&gt;The Default External and Internal Encodings&lt;/h4&gt;

&lt;p&gt;There's another way &lt;code&gt;String&lt;/code&gt;s are commonly created and that's by reading from some &lt;code&gt;IO&lt;/code&gt; object.  It doesn't make sense to give those &lt;code&gt;String&lt;/code&gt;s the source &lt;code&gt;Encoding&lt;/code&gt; because the external data doesn't have to be related to your source code.  Also, you really need to know how data is encoded to read it correctly.  Even a simple concept like reading the next line of data changes if you are talking about UTF-8 or UTF-16LE (the LE stands for a &lt;a href="http://en.wikipedia.org/wiki/Endianness"&gt;Little Endian byte order&lt;/a&gt;) data.  Thus, it makes sense for &lt;code&gt;IO&lt;/code&gt; objects to have at least one &lt;code&gt;Encoding&lt;/code&gt; attached to them.  Ruby 1.9 is generous and gives them two:  the external &lt;code&gt;Encoding&lt;/code&gt; and the internal &lt;code&gt;Encoding&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The external &lt;code&gt;Encoding&lt;/code&gt; is the &lt;code&gt;Encoding&lt;/code&gt; the data is in inside the &lt;code&gt;IO&lt;/code&gt; object.  That affects how data will be read and this is the &lt;code&gt;Encoding&lt;/code&gt; data will be returned in as long as the internal &lt;code&gt;Encoding&lt;/code&gt; isn't set (more on that in a bit).  Let's look at an example of how this plays out in practice:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ cat show_external.rb 
open(__FILE__, "r:UTF-8") do |file|
  puts file.external_encoding.name
  p    file.internal_encoding
  file.each do |line|
    p [line.encoding.name, line]
  end
end
$ ruby show_external.rb 
UTF-8
nil
["UTF-8", "open(__FILE__, \"r:UTF-8\") do |file|\n"]
["UTF-8", "  puts file.external_encoding.name\n"]
["UTF-8", "  p    file.internal_encoding\n"]
["UTF-8", "  file.each do |line|\n"]
["UTF-8", "    p [line.encoding.name, line]\n"]
["UTF-8", "  end\n"]
["UTF-8", "end\n"]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;There are four things to notice in this example:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;I set the external &lt;code&gt;Encoding&lt;/code&gt; by tacking &lt;code&gt;:UTF-8&lt;/code&gt; onto the end of my mode &lt;code&gt;String&lt;/code&gt; when I opened the &lt;code&gt;File&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;You can use &lt;code&gt;external_encoding()&lt;/code&gt; to check the external &lt;code&gt;Encoding&lt;/code&gt; as I have here&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;internal_encoding()&lt;/code&gt; works the same for the internal &lt;code&gt;Encoding&lt;/code&gt;, which will be &lt;code&gt;nil&lt;/code&gt; unless you explicitly set it&lt;/li&gt;
&lt;li&gt;Note how each &lt;code&gt;String&lt;/code&gt; created as I read the data is given the &lt;code&gt;external_encoding()&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;The internal &lt;code&gt;Encoding&lt;/code&gt; just adds one more twist.  When set, data will still be read in the external &lt;code&gt;Encoding&lt;/code&gt;, but transcoded to the internal &lt;code&gt;Encoding&lt;/code&gt; as the &lt;code&gt;String&lt;/code&gt; is created.  It's a convenience for you as the programmer.  Watch how that changes things:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ cat show_internal.rb 
open(__FILE__, "r:UTF-8:UTF-16LE") do |file|
  puts file.external_encoding.name
  puts file.internal_encoding.name
  file.each do |line|
    p [line.encoding.name, line[0..3]]
  end
end
$ ruby show_internal.rb 
UTF-8
UTF-16LE
["UTF-16LE", "o\x00p\x00e\x00n\x00"]
["UTF-16LE", " \x00 \x00p\x00u\x00"]
["UTF-16LE", " \x00 \x00p\x00u\x00"]
["UTF-16LE", " \x00 \x00f\x00i\x00"]
["UTF-16LE", " \x00 \x00 \x00 \x00"]
["UTF-16LE", " \x00 \x00e\x00n\x00"]
["UTF-16LE", "e\x00n\x00d\x00\n\x00"]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;There are a couple differences here:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A second added &lt;code&gt;Encoding&lt;/code&gt; on the mode &lt;code&gt;String&lt;/code&gt; (my &lt;code&gt;:UTF-16LE&lt;/code&gt; in this example) sets the &lt;code&gt;internal_encoding()&lt;/code&gt; as I show with the second &lt;code&gt;puts()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;This little change gets Ruby to translate all of the data for me (I just shortened the output because UTF-16LE is noisy)&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;The external &lt;code&gt;Encoding&lt;/code&gt; works the same when writing.  It still represents the &lt;code&gt;Encoding&lt;/code&gt; in the &lt;code&gt;IO&lt;/code&gt; object, or the &lt;code&gt;Encoding&lt;/code&gt; data is going to.  However, you don't need to specify an internal &lt;code&gt;Encoding&lt;/code&gt; when writing.  Ruby will automatically use the &lt;code&gt;Encoding&lt;/code&gt; of a &lt;code&gt;String&lt;/code&gt; you output as the internal &lt;code&gt;Encoding&lt;/code&gt; and transcode as needed to reach the external &lt;code&gt;Encoding&lt;/code&gt;.  For example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ cat write_internal.rb 
# encoding: UTF-8
open("data.txt", "w:UTF-16LE") do |file|
  puts file.external_encoding.name
  p    file.internal_encoding
  data = "My data…"
  p [data.encoding.name, data]
  file &amp;lt;&amp;lt; data
end
p File.read("data.txt")
$ ruby write_internal.rb 
UTF-16LE
nil
["UTF-8", "My data…"]
"M\x00y\x00 \x00d\x00a\x00t\x00a\x00&amp;amp; "
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Note how my data was transcoded before it was written even though the &lt;code&gt;internal_encoding()&lt;/code&gt; was &lt;code&gt;nil&lt;/code&gt;.  Ruby used the &lt;code&gt;String&lt;/code&gt;'s &lt;code&gt;Encoding&lt;/code&gt; to decide what was needed.&lt;/p&gt;
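
&lt;p&gt;One way to double-check that claim is to compare the bytes on disk with what &lt;code&gt;String#encode()&lt;/code&gt; produces for the same data.  This is only a sketch:  the scratch file name is hypothetical, and it assumes a Unix-like system where text mode doesn't rewrite anything on the way out:&lt;/p&gt;

```ruby
require "tmpdir"

data = "My data\u2026"   # a UTF-8 String ending in U+2026 (the ellipsis)
raw  = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "data.txt")   # hypothetical scratch file
  # Only the external Encoding is given; the String's own Encoding is the source.
  File.open(path, "w:UTF-16LE") { |file| file.write(data) }
  raw = File.binread(path)            # the bytes on disk, untouched
end

# True when the file holds exactly the UTF-16LE form of the data.
p raw.bytes.to_a == data.encode("UTF-16LE").bytes.to_a
```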

&lt;p&gt;Both of those &lt;code&gt;IO&lt;/code&gt; &lt;code&gt;Encoding&lt;/code&gt;s should be pretty straightforward.  The only question left about them is:  what happens if you don't set them?  The answer is that the &lt;code&gt;IO&lt;/code&gt; inherits the default external &lt;code&gt;Encoding&lt;/code&gt; and/or the default internal &lt;code&gt;Encoding&lt;/code&gt; whenever one isn't set.  Now we need to know how Ruby chooses those defaults.&lt;/p&gt;

&lt;p&gt;The default external &lt;code&gt;Encoding&lt;/code&gt; is pulled from your environment, much like the source &lt;code&gt;Encoding&lt;/code&gt; is for code given on the command-line.  Have a look:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ echo $LC_CTYPE
en_US.UTF-8
$ ruby -e 'puts Encoding.default_external.name'
UTF-8
$ LC_CTYPE=ja_JP.sjis ruby -e 'puts Encoding.default_external.name'
Shift_JIS
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The default internal &lt;code&gt;Encoding&lt;/code&gt; is simply &lt;code&gt;nil&lt;/code&gt;.  You must actively change it to get anything else.&lt;/p&gt;
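
&lt;p&gt;Here is a small sketch of that inheritance, using a hypothetical scratch file opened for reading with no &lt;code&gt;Encoding&lt;/code&gt; in the mode &lt;code&gt;String&lt;/code&gt;.  What it prints depends on your environment, so treat the output as something to inspect rather than a fixed result:&lt;/p&gt;

```ruby
require "tmpdir"

external = nil
internal = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "plain.txt")   # hypothetical scratch file
  File.binwrite(path, "just ASCII\n")
  # No Encoding in the mode String, so the IO falls back on the defaults.
  File.open(path, "r") do |file|
    external = file.external_encoding
    internal = file.internal_encoding
  end
end

p [external, Encoding.default_external]   # the two should match
p [internal, Encoding.default_internal]   # nil unless a default internal Encoding is set
```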

&lt;p&gt;Both default &lt;code&gt;IO&lt;/code&gt; &lt;code&gt;Encoding&lt;/code&gt;s have a global setter:  &lt;code&gt;Encoding.default_external=()&lt;/code&gt; and &lt;code&gt;Encoding.default_internal=()&lt;/code&gt;.  You can set them to an &lt;code&gt;Encoding&lt;/code&gt; object or just the &lt;code&gt;String&lt;/code&gt; name of an &lt;code&gt;Encoding&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You can also change these default &lt;code&gt;Encoding&lt;/code&gt;s using some command-line switches.  The new &lt;code&gt;-E&lt;/code&gt; switch can be used to set one or both of the &lt;code&gt;IO&lt;/code&gt; &lt;code&gt;Encoding&lt;/code&gt;s:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ ruby -e 'p [Encoding.default_external, Encoding.default_internal]'
[#&amp;lt;Encoding:UTF-8&amp;gt;, nil]
$ ruby -E Shift_JIS \
&amp;gt; -e 'p [Encoding.default_external, Encoding.default_internal]'
[#&amp;lt;Encoding:Shift_JIS&amp;gt;, nil]
$ ruby -E :UTF-16LE \
&amp;gt; -e 'p [Encoding.default_external, Encoding.default_internal]'
[#&amp;lt;Encoding:UTF-8&amp;gt;, #&amp;lt;Encoding:UTF-16LE&amp;gt;]
$ ruby -E Shift_JIS:UTF-16LE \
&amp;gt; -e 'p [Encoding.default_external, Encoding.default_internal]'
[#&amp;lt;Encoding:Shift_JIS&amp;gt;, #&amp;lt;Encoding:UTF-16LE&amp;gt;]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As you can see, the argument for this switch is just like what you would append to a mode &lt;code&gt;String&lt;/code&gt; in a call to &lt;code&gt;File.open()&lt;/code&gt;.&lt;/p&gt;
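
&lt;p&gt;For comparison, here is the same external:internal pair given to a single &lt;code&gt;File&lt;/code&gt; in its mode &lt;code&gt;String&lt;/code&gt;.  It's a sketch built on a hypothetical scratch file holding ISO-8859-1 bytes, which Ruby transcodes to UTF-8 as it reads:&lt;/p&gt;

```ruby
require "tmpdir"

line = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "latin1.txt")   # hypothetical scratch file
  # Store the raw ISO-8859-1 bytes for the word, with no transcoding.
  File.binwrite(path, "R\xE9sum\xE9\n".force_encoding("BINARY"))
  # External Encoding first, internal Encoding second, just like -E.
  File.open(path, "r:ISO-8859-1:UTF-8") { |file| line = file.gets }
end

p [line.encoding.name, line]
```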

&lt;p&gt;There's one more command-line switch shortcut for those of us who prefer to just use UTF-8 everywhere.  The new &lt;code&gt;-U&lt;/code&gt; switch sets &lt;code&gt;Encoding.default_internal()&lt;/code&gt; to UTF-8.  Using that, you can just set the external &lt;code&gt;Encoding&lt;/code&gt; for your &lt;code&gt;IO&lt;/code&gt; objects, or let it default from your environment, and all &lt;code&gt;String&lt;/code&gt;s you read will be transcoded to the preferred UTF-8.&lt;/p&gt;

&lt;p&gt;Probably the most important thing to note about &lt;code&gt;Encoding.default_external()&lt;/code&gt; and &lt;code&gt;Encoding.default_internal()&lt;/code&gt; is that you should really just treat them as shortcuts for your own scripting.  Pulling &lt;code&gt;Encoding&lt;/code&gt;s from the environment or command-line switches can be handy when you're in control of where the code runs, but you're going to need to be more explicit for code you intend for others to run.  When in doubt, set the external and internal &lt;code&gt;Encoding&lt;/code&gt;s the way you want them for each &lt;code&gt;IO&lt;/code&gt; object.  It's a bit more tedious, but also safer in that it won't mysteriously be changed by some outside force.  Also remember that the defaults are global settings affecting all loaded code, including any libraries you &lt;code&gt;require()&lt;/code&gt;.  That can be a boon or bane, so just remember to factor it into your thinking when you're wondering, "Where does this &lt;code&gt;String&lt;/code&gt; get its &lt;code&gt;Encoding&lt;/code&gt; from?"&lt;/p&gt;</content>
    <author>
      <name>James Edward Gray II</name>
    </author>
  </entry>
</feed>
