Encoding            package:base            R Documentation(latin1)

_R_e_a_d _o_r _S_e_t _t_h_e _D_e_c_l_a_r_e_d _E_n_c_o_d_i_n_g_s _f_o_r _a _C_h_a_r_a_c_t_e_r _V_e_c_t_o_r

_D_e_s_c_r_i_p_t_i_o_n:

     Read or set the declared encodings for a character vector.

_U_s_a_g_e:

     Encoding(x)

     Encoding(x) <- value

_A_r_g_u_m_e_n_t_s:

       x: A character vector.

   value: A character vector of positive length.

_D_e_t_a_i_l_s:

     Character strings in R can be declared to be in '"latin1"' or
     '"UTF-8"'.  'Encoding' reads these declarations and returns a
     character vector of values '"latin1"', '"UTF-8"' or '"unknown"'.
     The replacement form sets them: 'value' is recycled as needed,
     and any other values are silently treated as '"unknown"'.
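
     A minimal sketch of reading and setting declarations, assuming
     the bytes '\xE7' and '\xEF' are meant as Latin-1.  The value
     '"bogus"' is a hypothetical name used only to show the fallback
     to '"unknown"':

```r
x <- c("fa\xE7ile", "na\xEFve")   # bytes intended as latin1
Encoding(x) <- "latin1"            # a single value is recycled over x
enc_set <- Encoding(x)
print(enc_set)

Encoding(x) <- "bogus"             # hypothetical, unrecognised value
enc_other <- Encoding(x)           # expected to fall back to "unknown"
print(enc_other)
```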

     There are other ways for character strings to acquire a declared
     encoding apart from explicitly setting it.  Functions 'scan',
     'read.table', 'readLines', 'parse' and 'source' have an
     'encoding' argument that is used to declare encodings, 'iconv'
     declares encodings from its 'from' argument, and console input in
     suitable locales is also declared.  'intToUtf8' declares its
     output as '"UTF-8"', and output text connections are marked if
     running in a suitable locale.
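
     As a rough illustration of strings acquiring a declaration
     without an explicit 'Encoding<-' call, the sketch below assumes a
     recent version of R, in which converting to UTF-8 with 'iconv'
     yields a string marked '"UTF-8"'.  The variable names are
     illustrative only:

```r
x <- "fa\xE7ile"
Encoding(x) <- "latin1"

# Re-encode from latin1 to UTF-8; the result carries a declaration.
xx <- iconv(x, from = "latin1", to = "UTF-8")
enc_iconv <- Encoding(xx)
print(enc_iconv)

# Build the same word from Unicode code points (231 is c-cedilla).
u <- intToUtf8(c(102, 97, 231, 105, 108, 101))
enc_int <- Encoding(u)
print(enc_int)
```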

     Most character manipulation functions will set the encoding on
     output strings if it was declared on the corresponding input. 
     These include 'chartr', 'strsplit', 'strtrim', 'substr', 'tolower'
     and 'toupper' as well as 'sub(useBytes = FALSE)' and
     'gsub(useBytes = FALSE)'.  (Also, under some circumstances 'paste'
     will set an encoding.)  Note that such functions do not
     _preserve_ the encoding, but if they know the input encoding and
     that the string has been successfully re-encoded to the current
     encoding, they mark the output with the latter (if it is
     '"latin1"' or '"UTF-8"').

     As of R 2.7.0 'substr' does preserve the encoding, and 'chartr',
     'tolower' and 'toupper' preserve UTF-8 encoding on systems with
     Unicode wide characters.  With their 'fixed' and 'perl' options,
     'strsplit', 'sub' and 'gsub' will give a UTF-8 result if any of
     the inputs are UTF-8.
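
     The behaviour of 'substr' and 'toupper' can be checked with a
     short sketch such as the following, which assumes a recent
     version of R on a system with Unicode wide characters.  A
     substring made only of ASCII characters is not expected to carry
     a declaration:

```r
x <- "fa\xE7ile"
Encoding(x) <- "latin1"

# A substring that still contains the non-ASCII byte keeps the mark.
enc_sub <- Encoding(substr(x, 1, 3))
print(enc_sub)

# A purely ASCII substring has no declared encoding.
enc_ascii <- Encoding(substr(x, 1, 2))
print(enc_ascii)

# Case conversion of a UTF-8 string; inspect the declaration.
xx <- iconv(x, from = "latin1", to = "UTF-8")
print(Encoding(toupper(xx)))
```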

     As of R 2.8.0 'paste' and 'sprintf' return a UTF-8 encoded
     element if any of the inputs to that element are UTF-8.
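
     A small sketch of this rule, assuming a recent version of R; the
     strings '"abc"' and '"def"' are arbitrary ASCII inputs chosen for
     illustration:

```r
x <- "fa\xE7ile"
Encoding(x) <- "latin1"
xx <- iconv(x, from = "latin1", to = "UTF-8")

# One UTF-8 input is enough for the pasted element to be UTF-8.
enc_paste <- Encoding(paste(xx, "abc"))
print(enc_paste)

# The same holds for sprintf().
enc_sprintf <- Encoding(sprintf("%s!", xx))
print(enc_sprintf)

# With only ASCII inputs there is nothing to declare.
enc_plain <- Encoding(paste("abc", "def"))
print(enc_plain)
```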

_V_a_l_u_e:

     A character vector.

_E_x_a_m_p_l_e_s:

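     ## x is intended to be in latin1
     x <- "fa\xE7ile"
     Encoding(x)               # declaration before marking (locale-dependent)
     Encoding(x) <- "latin1"   # declare the bytes to be latin1
     x
     ## re-encode to UTF-8 and compare the declared encodings
     xx <- iconv(x, "latin1", "UTF-8")
     Encoding(c(x, xx))
     c(x, xx)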

