<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.6"/>
<title>Boost.Locale: Character Set Conversions</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript">
$(document).ready(initResizable);
$(window).load(resizeHeight);
</script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td id="projectlogo"><img alt="Logo" src="boost-small.png"/></td>
<td style="padding-left: 0.5em;">
<div id="projectname">Boost.Locale
</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.6 -->
<div id="navrow1" class="tabs">
<ul class="tablist">
<li><a href="index.html"><span>Main Page</span></a></li>
<li class="current"><a href="pages.html"><span>Related Pages</span></a></li>
<li><a href="modules.html"><span>Modules</span></a></li>
<li><a href="namespaces.html"><span>Namespaces</span></a></li>
<li><a href="annotated.html"><span>Classes</span></a></li>
<li><a href="files.html"><span>Files</span></a></li>
<li><a href="examples.html"><span>Examples</span></a></li>
</ul>
</div>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
<div id="nav-sync" class="sync"></div>
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
$(document).ready(function(){initNavTree('charset_handling.html','');});
</script>
<div id="doc-content">
<div class="header">
<div class="headertitle">
<div class="title">Character Set Conversions </div> </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><h1><a class="anchor" id="codecvt"></a>
Convenience Interface</h1>
<p>Boost.Locale provides <a class="el" href="group__codepage.html#ga2ca59a735ca28c9d5103e37ef2373ca1">to_utf</a>, <a class="el" href="group__codepage.html#gaef8fb7771dce60511d081770547a4139">from_utf</a> and <a class="el" href="group__codepage.html#gaf0ad39959911b000706e0538ec059d44">utf_to_utf</a> functions in the <code><a class="el" href="namespaceboost_1_1locale_1_1conv.html" title="Namespace that contains all functions related to character set conversion. ">boost::locale::conv</a></code> namespace. These are simple, convenient functions for converting strings between UTF-8/16/32 and other encodings.</p>
<p>For example:</p>
<div class="fragment"><div class="line">std::string utf8_string = to_utf<char>(latin1_string,<span class="stringliteral">"Latin1"</span>);</div>
|
|
<div class="line">std::wstring wide_string = to_utf<wchar_t>(latin1_string,<span class="stringliteral">"Latin1"</span>);</div>
|
|
<div class="line">std::string latin1_string = <a class="code" href="group__codepage.html#gaef8fb7771dce60511d081770547a4139">from_utf</a>(wide_string,<span class="stringliteral">"Latin1"</span>);</div>
|
|
<div class="line">std::string utf8_string2 = utf_to_utf<char>(wide_string);</div>
|
|
</div><!-- fragment --><p>The <code>to_utf</code> and <code>from_utf</code> functions take either an explicit encoding name such as "Latin1" or "ISO-8859-8", or a <code>std::locale</code> from which the encoding is fetched. The conversion functions also accept a policy parameter that tells them how to behave if the conversion cannot be performed (that is, when an illegal or unsupported character is found). By default they skip all illegal characters and do the best they can; however, it is possible to ask them to throw a <a class="el" href="classboost_1_1locale_1_1conv_1_1conversion__error.html">conversion_error</a> exception by passing the <code>stop</code> flag:</p>
<div class="fragment"><div class="line">std::wstring s=to_utf<wchar_t>(<span class="stringliteral">"\xFF\xFF"</span>,<span class="stringliteral">"UTF-8"</span>,<a class="code" href="group__codepage.html#gga8e3c5a274f57107ec5745e227c26ba84aab08f9ee241c405ef40bd3cedb43b383">stop</a>); </div>
|
|
<div class="line"><span class="comment">// Throws because this string is illegal in UTF-8</span></div>
|
|
</div><!-- fragment --><h1><a class="anchor" id="codecvt_codecvt"></a>
std::codecvt facet</h1>
<p>Boost.Locale provides stream codepage conversion facets based on the <code>std::codecvt</code> facet. This allows conversion between wide-character encodings and 8-bit encodings like UTF-8, ISO-8859 or Shift-JIS.</p>
<p>Most compilers provide such facets, but:</p>
<ul>
<li>Under Windows, MSVC does not support UTF-8 encodings at all.</li>
<li>Under Linux, the encodings are supported only if the required locales are generated. For example, it may be impossible to create a <code>he_IL.CP1255</code> locale even when the <code>he_IL</code> locale is available.</li>
</ul>
<p>Thus Boost.Locale provides an option to generate code-page conversion facets for use with Boost.Iostreams filters or <code>std::wfstream</code>. For example:</p>
<div class="fragment"><div class="line">std::locale loc= generator().generate(<span class="stringliteral">"he_IL.UTF-8"</span>);</div>
|
|
<div class="line">std::wofstream file.</div>
|
|
<div class="line">file.imbue(loc);</div>
|
|
<div class="line">file.open(<span class="stringliteral">"hello.txt"</span>);</div>
|
|
<div class="line">file << L<span class="stringliteral">"שלום!"</span> << endl;</div>
|
|
</div><!-- fragment --><p>This creates a file <code>hello.txt</code> encoded as UTF-8 with "שלום!" (shalom) in it.</p>
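<p>To make the expected bytes concrete, here is a minimal sketch that uses only the standard library and is not Boost.Locale code. <code>append_utf8</code> and <code>to_utf8</code> are hypothetical helpers that encode Unicode code points by hand, assuming plain UTF-8 output with no byte order mark.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Boost.Locale): append the UTF-8
// encoding of a single Unicode code point to `out`.
void append_utf8(std::string& out, char32_t cp)
{
    // Only Unicode scalar values are accepted (no surrogates, max U+10FFFF).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {
        out += static_cast<char>(cp);                          // 1 byte
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));            // 2 bytes
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));           // 3 bytes
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));           // 4 bytes
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: encode a whole UTF-32 string as UTF-8.
std::string to_utf8(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text)
        append_utf8(out, cp);
    return out;
}
```

<p>For the Hebrew text in the example, <code>to_utf8(U"\u05E9\u05DC\u05D5\u05DD!")</code> yields nine bytes: <code>D7 A9 D7 9C D7 95 D7 9D 21</code>, that is, two bytes per Hebrew letter and one for the exclamation mark.</p>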
<h1><a class="anchor" id="codecvt_iostreams_integration"></a>
Integration with Boost.Iostreams</h1>
<p>You can use the <code>std::codecvt</code> facet directly, but this is quite tricky and requires careful buffer and error management.</p>
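<p>As a rough illustration of that bookkeeping, the sketch below drives <code>std::codecvt::out</code> by hand using only the standard library. The helper name <code>narrow_with_codecvt</code> and the deliberately tiny buffer are assumptions made for this example, and the error handling is reduced to throwing exceptions.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a wide string to narrow characters by
// calling the locale's codecvt facet directly, chunk by chunk.
std::string narrow_with_codecvt(const std::wstring& in, const std::locale& loc)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& cvt = std::use_facet<facet_type>(loc);

    std::mbstate_t state = std::mbstate_t();   // zero-initialized conversion state
    std::string result;
    char buf[8];                               // small on purpose, to hit "partial"
    const wchar_t* from = in.data();
    const wchar_t* const from_end = from + in.size();

    while (from != from_end) {
        const wchar_t* from_next = from;
        char* to_next = buf;
        const std::codecvt_base::result r =
            cvt.out(state, from, from_end, from_next,
                    buf, buf + sizeof(buf), to_next);

        if (r == std::codecvt_base::error)
            throw std::runtime_error("character cannot be converted");
        if (r == std::codecvt_base::noconv)
            throw std::runtime_error("facet performs no conversion");

        // "ok" and "partial" both may have produced output: keep it.
        assert(to_next >= buf && to_next <= buf + sizeof(buf));
        result.append(buf, static_cast<std::size_t>(to_next - buf));

        // "partial" without consuming input means the buffer is too small.
        if (from_next == from)
            throw std::runtime_error("output buffer too small");
        from = from_next;
    }
    return result;
}
```

<p>With the classic "C" locale this passes plain ASCII text through unchanged, for example <code>narrow_with_codecvt(L"Hello", std::locale::classic())</code>. Even this short version has to track two cursors, distinguish four result codes, and guard against making no progress.</p>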
<p>Instead, you can use the <code>boost::iostreams::code_converter</code> class for stream-oriented conversions between the wide-character set and the narrow locale character set.</p>
<p>This is a sample program that converts wide to narrow characters for an arbitrary stream:</p>
<div class="fragment"><div class="line"><span class="preprocessor">#include <boost/iostreams/stream.hpp></span></div>
|
|
<div class="line"><span class="preprocessor">#include <boost/iostreams/categories.hpp></span> </div>
|
|
<div class="line"><span class="preprocessor">#include <boost/iostreams/code_converter.hpp></span></div>
|
|
<div class="line"></div>
|
|
<div class="line"><span class="preprocessor">#include <boost/locale.hpp></span></div>
|
|
<div class="line"><span class="preprocessor">#include <iostream></span></div>
|
|
<div class="line"></div>
|
|
<div class="line"><span class="keyword">namespace </span>io = boost::iostreams;</div>
|
|
<div class="line"></div>
|
|
<div class="line"><span class="comment">// Device that consumes the converted text,</span></div>
|
|
<div class="line"><span class="comment">// In our case it just writes to standard output</span></div>
|
|
<div class="line"><span class="keyword">class </span>consumer {</div>
|
|
<div class="line"><span class="keyword">public</span>:</div>
|
|
<div class="line"> <span class="keyword">typedef</span> <span class="keywordtype">char</span> char_type;</div>
|
|
<div class="line"> <span class="keyword">typedef</span> io::sink_tag category;</div>
|
|
<div class="line"> std::streamsize write(<span class="keyword">const</span> <span class="keywordtype">char</span>* s, std::streamsize n)</div>
|
|
<div class="line"> {</div>
|
|
<div class="line"> std::cout.write(s,n);</div>
|
|
<div class="line"> <span class="keywordflow">return</span> n;</div>
|
|
<div class="line"> }</div>
|
|
<div class="line">};</div>
|
|
<div class="line"></div>
|
|
<div class="line"></div>
|
|
<div class="line"><span class="keywordtype">int</span> main()</div>
|
|
<div class="line">{ </div>
|
|
<div class="line"> <span class="comment">// the device that converts wide characters</span></div>
|
|
<div class="line"> <span class="comment">// to narrow</span></div>
|
|
<div class="line"> <span class="keyword">typedef</span> io::code_converter<consumer> converter_device;</div>
|
|
<div class="line"> <span class="comment">// the stream that uses this device</span></div>
|
|
<div class="line"> <span class="keyword">typedef</span> io::stream<converter_device> converter_stream;</div>
|
|
<div class="line"></div>
|
|
<div class="line"></div>
|
|
<div class="line"> consumer cons;</div>
|
|
<div class="line"> <span class="comment">// setup out converter to work</span></div>
|
|
<div class="line"> <span class="comment">// with he_IL.UTF-8 locale </span></div>
|
|
<div class="line"> converter_device dev;</div>
|
|
<div class="line"> <a class="code" href="classboost_1_1locale_1_1generator.html">boost::locale::generator</a> gen;</div>
|
|
<div class="line"> dev.imbue(gen(<span class="stringliteral">"he_IL.UTF-8"</span>));</div>
|
|
<div class="line"> dev.open(cons);</div>
|
|
<div class="line"> converter_stream stream;</div>
|
|
<div class="line"> stream.open(dev);</div>
|
|
<div class="line"> <span class="comment">// Now wide characters that are written</span></div>
|
|
<div class="line"> <span class="comment">// to the stream would be given to</span></div>
|
|
<div class="line"> <span class="comment">// our consumer as narrow characters </span></div>
|
|
<div class="line"> <span class="comment">// in UTF-8 encoding</span></div>
|
|
<div class="line"> stream << L<span class="stringliteral">"שלום"</span> << std::flush;</div>
|
|
<div class="line">}</div>
|
|
</div><!-- fragment --><h1><a class="anchor" id="codecvt_limitations"></a>
Limitations of std::codecvt</h1>
<p>The C++ Standard does not specify the contents of <code>std::mbstate_t</code>, which could otherwise be used to save intermediate code-page conversion states. It leaves the definition up to the compiler implementation, making it impossible to reimplement <code>std::codecvt&lt;wchar_t,char,mbstate_t&gt;</code> for stateful encodings. Thus, Boost.Locale's <code>codecvt</code> facet implementation may be used with stateless encodings like UTF-8, ISO-8859, and Shift-JIS, but not with stateful encodings like UTF-7 or SCSU.</p>
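<p>One way to see why stateless multi-byte encodings remain workable without <code>std::mbstate_t</code>: a converter can hold back an unfinished trailing byte sequence and retry once more input arrives, so nothing has to be remembered between calls. The sketch below shows that idea for UTF-8 using only the standard library. <code>complete_utf8_prefix</code> is a hypothetical helper, it does not validate its input, and it is not taken from Boost.Locale's implementation.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: return how many leading bytes of `buf` can be
// converted now, i.e. the length of the prefix that does not end in
// the middle of a UTF-8 sequence. Assumes otherwise well-formed input.
std::size_t complete_utf8_prefix(const std::string& buf)
{
    const std::size_t n = buf.size();
    // An unfinished sequence can start at most 3 bytes before the end.
    for (std::size_t back = 1; back <= 3 && back <= n; ++back) {
        const unsigned char c = static_cast<unsigned char>(buf[n - back]);
        if ((c & 0xC0) == 0x80)
            continue;                         // continuation byte: keep looking
        // Found a lead byte (or ASCII): how long should its sequence be?
        const std::size_t need = (c >= 0xF0) ? 4
                               : (c >= 0xE0) ? 3
                               : (c >= 0xC0) ? 2
                               : 1;
        // Hold the sequence back if its tail has not arrived yet.
        return (need > back) ? n - back : n;
    }
    return n;   // empty input, or only continuation bytes in the tail
}
```

<p>A caller would convert the first <code>complete_utf8_prefix(buf)</code> bytes and prepend the remainder to the next chunk. A stateful encoding such as UTF-7 cannot be handled this way, because the meaning of later bytes depends on a shift state established earlier in the stream.</p>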
<p><b>Recommendation:</b> Prefer the Unicode UTF-8 encoding for <code>char</code> based strings and files in your application.</p>
<dl class="section note"><dt>Note</dt><dd></dd></dl>
|
|
<p>The implementation of codecvt for single byte encodings like ISO-8859-X and for UTF-8 is very efficent and would allow fast conversion of the content, however its performance may be sub-optimal for double-width encodings like Shift-JIS, due to the stateless problem described above. </p>
|
|
</div></div><!-- contents -->
</div><!-- doc-content -->
<div id="nav-path" class="navpath">
<ul>
<li class="footer">
© Copyright 2009-2012 Artyom Beilis, Distributed under the <a href="http://www.boost.org/LICENSE_1_0.txt">Boost Software License</a>, Version 1.0.
</li>
</ul>
</div>
</body>
</html>