Content Encoding Handling (Magic xpi 4.5)
Magic xpi has built-in support for the following textual encodings:
- Ansi – Can represent a single codepage in addition to English.
- Unicode (UTF-16) – Can represent multiple codepages at once. For more information, see Unicode Support.
Binary represents content that is not textual. Textual functions, such as Trim and InStr, should not be used on Binary.
When manipulating textual data, Magic xpi performs the encoding conversion automatically and usually converts the result to Unicode.
There is no built-in support for UTF-8 encoding. See the UTF-8 Content section below.
When updating a Unicode BLOB with the content of an Ansi BLOB, Magic xpi will convert the Ansi content to Unicode first and update the Unicode BLOB with Unicode data.
When updating an Ansi BLOB with the content of a Unicode BLOB, Magic xpi will try to convert the Unicode content to Ansi first and update the Ansi BLOB with Ansi data.
When updating either Unicode or Ansi BLOBs with the content of a Binary BLOB, the data is updated as is. No conversion is done, and the result is probably a mismatch between the BLOB type and the BLOB content.
When updating a Binary BLOB with the content of any BLOB type, the data is passed as is. If the source was Unicode, the Unicode content will be passed as is and the resulting Binary BLOB will have no BOM.
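The four update rules above can be modeled in a short Python sketch. This is an illustration only, not Magic xpi code: the function name `update_blob`, the type labels, the choice of `cp1252` as the Ansi codepage, the little-endian byte order, and the use of "?" for characters that cannot be represented in Ansi are all assumptions made for the example.

```python
# Illustrative model only; this is not Magic xpi code.
ANSI_CODEPAGE = "cp1252"  # assumption: the Ansi codepage in use


def update_blob(source: bytes, source_type: str, target_type: str) -> bytes:
    """Return the bytes a target BLOB would hold after being updated from source.

    Types are the hypothetical labels "ansi", "unicode", and "binary".
    Unicode content is modeled as UTF-16 little-endian without a BOM.
    """
    if source_type == "ansi" and target_type == "unicode":
        # Ansi -> Unicode: convert first, then store Unicode data.
        return source.decode(ANSI_CODEPAGE).encode("utf-16-le")
    if source_type == "unicode" and target_type == "ansi":
        # Unicode -> Ansi: characters outside the codepage cannot survive.
        # Replacing them with "?" is an assumption of this sketch.
        return source.decode("utf-16-le").encode(ANSI_CODEPAGE, errors="replace")
    # Binary source, Binary target, or same type: bytes are copied unchanged.
    return source


ansi_bytes = "\u00e9".encode(ANSI_CODEPAGE)          # "é" as one Ansi byte
as_unicode = update_blob(ansi_bytes, "ansi", "unicode")
mismatch = update_blob(ansi_bytes, "binary", "unicode")  # still Ansi bytes
```

In the last line the target is typed as Unicode but holds a single Ansi byte, which is the type/content mismatch the text warns about.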
When mapping a value to a variable (Flow Data, Data Mapper variables, and Call Flow Destination), you can select the content encoding that this variable will hold. Magic xpi will automatically perform the conversion to the selected content type.
When content is loaded from a file and the file contains a Unicode BOM, the content is loaded as is without the BOM and from there converted to the BLOB type that was selected to hold the content.
If the file contains a UTF-8 BOM, the content is converted from UTF-8 to Unicode and from there converted to the BLOB type that was selected to hold the content.
If the file contains no BOM, the content is considered Ansi and is converted from Ansi to the BLOB type that was selected to hold the content. This applies to Binary, UTF-8, and Ansi content alike.
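The BOM-based decision described above can be sketched in Python as follows. This is a minimal model under stated assumptions, not the product's implementation: `load_text` is a hypothetical name, `cp1252` stands in for the Ansi codepage, and only the little-endian UTF-16 BOM is checked.

```python
import codecs

ANSI_CODEPAGE = "cp1252"  # assumption: the Ansi codepage in use


def load_text(raw: bytes) -> str:
    """Decode file bytes by inspecting the BOM, mirroring the rules above."""
    if raw.startswith(codecs.BOM_UTF8):
        # UTF-8 BOM: strip it and convert from UTF-8.
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    if raw.startswith(codecs.BOM_UTF16_LE):
        # Unicode BOM: strip it and take the content as is.
        return raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    # No BOM: the content is treated as Ansi, whatever it really is.
    return raw.decode(ANSI_CODEPAGE, errors="replace")


with_bom = load_text(codecs.BOM_UTF8 + "\u00e9".encode("utf-8"))
without_bom = load_text("\u00e9".encode("utf-8"))
```

Here `with_bom` is the single character "é", while `without_bom` comes back as the two characters "Ã©", because the two UTF-8 bytes were each read as an Ansi character. This is why BOM-less UTF-8 content is easy to misread.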
When a BLOB is saved to a file and the BLOB is Binary or Ansi, the content of the file will be identical to the content of the BLOB.
In the case of Unicode, a BOM is added at the beginning of the file to indicate that this is a Unicode file (the Unicode BLOB itself contains no BOM, since the container already knows it is Unicode).
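The write-side rule can be illustrated with a few lines of Python. As before, this is a sketch: `file_bytes_for` and the type labels are hypothetical, and little-endian UTF-16 is assumed for Unicode content.

```python
import codecs


def file_bytes_for(blob: bytes, blob_type: str) -> bytes:
    """Return the bytes that would be written to disk for a BLOB.

    blob_type is one of the hypothetical labels "ansi", "unicode", "binary".
    """
    if blob_type == "unicode":
        # The BLOB holds no BOM, so one is placed at the start of the file.
        return codecs.BOM_UTF16_LE + blob
    # Binary and Ansi BLOBs are written byte for byte.
    return blob


unicode_file = file_bytes_for("A".encode("utf-16-le"), "unicode")
ansi_file = file_bytes_for(b"abc", "ansi")
```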
There is no built-in support for UTF-8 encoding. Only conversion to and from UTF-8 is possible (for example, with I/O and functions). You should store UTF-8 in a Binary BLOB, including its BOM characters, if a BOM exists. If textual manipulation is required, you should convert to Unicode using the UTF8toUnicode function. If necessary, the data can be converted back to UTF-8 using the UTF8fromUnicode function.
For UTF-8, if you use any string functions on the BLOB (which is defined internally as Binary), you will lose any multibyte-encoded characters.
To overcome this problem, a set of utility functions is provided that lets you convert the UTF-8 content to other, natively supported content types.
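The following Python sketch shows why byte-wise string handling damages multibyte characters, and why converting first avoids the problem. It only illustrates the encoding behavior; the slicing stands in for any "take the first three characters" style of string function and is not a Magic xpi call.

```python
# "naïve" in UTF-8: the "ï" takes two bytes, so 5 characters occupy 6 bytes.
utf8_blob = "na\u00efve".encode("utf-8")

# Treating the BLOB as single-byte text: taking 3 "characters" really takes
# 3 bytes, which cuts the two-byte "ï" in half.
broken = utf8_blob[:3]

# Convert to a natively supported text form first, manipulate the text,
# then convert back to UTF-8 if needed.
text = utf8_blob.decode("utf-8")
safe = text[:3].encode("utf-8")
```

`broken` ends with a dangling lead byte and is no longer valid UTF-8, while `safe` holds the complete three characters. The convert, manipulate, convert back sequence corresponds to the UTF8toUnicode and UTF8fromUnicode round trip described in the text.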
If you save UTF-8 content to the file system, since UTF-8 is stored in a binary BLOB, the content will be saved “as is”. The file will have a BOM only if the BOM was part of the BLOB data.
Note: A file with UTF-8 content does not have to contain a BOM. However, the accurate identification of UTF-8 content files that do not contain a BOM is not guaranteed, and the identification process is also time-consuming. It is therefore strongly recommended to ensure that UTF-8 files always contain a BOM. You can use the BlobAddBOM function to ensure that the UTF-8 file is always created with a BOM.
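As a rough illustration of what "always created with a BOM" means at the byte level, the Python sketch below adds the three-byte UTF-8 BOM only when it is missing. The helper name `ensure_utf8_bom` is hypothetical, and the sketch makes no claim about how the BlobAddBOM function itself behaves.

```python
import codecs


def ensure_utf8_bom(data: bytes) -> bytes:
    """Return data with a UTF-8 BOM at the start, adding one only if absent."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


once = ensure_utf8_bom("caf\u00e9".encode("utf-8"))
twice = ensure_utf8_bom(once)  # applying it again changes nothing
```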
Recommendations and Tips