I am dabbling in more Chinese character (中文) for OTM (Oracle Transportation Management). I hit an unusual issue when I try to upload Chinese Characters with csv file. The csv is generated from Excel macro VBA. Thanks to the various forums (OTMFAQ and stackoverflow), I managed to find the root cause on UTF encoding with BOM (Byte Order Mark).
BOM is a 3 bytes in the beginning of the file when you save UTF-8 on Windows. If you generate the UTF-8, you can view the csv output encoding with TextPad++. OTM does not allow the upload of UTF-8 BOM file. It will just throw out an error, which is not friendly at all.
The solution to solve this is to convert the csv to UTF-8 without BOM. This will remove the first 3 Bytes from the beginning of the file. This can also be automated in macro with the following codes. Thanks to the solution codes from stackoverflow.
Public Sub PutTextFileUtf8(ByVal PathFileName As String, ByVal FileBody As String)
‘ Outputs FileBody as a text file (UTF-8 encoding without leading BOM)
‘ named PathFileName
‘ Needs reference to “Microsoft ActiveX Data Objects n.n Library”
‘ Addition to original code says version 2.5. Tested with version 6.1.
‘ 1Nov16 Copied from http://stackoverflow.com/a/4461250/973283
‘ but replaced literals with parameters.
‘ 15Aug17 Discovered routine was adding an LF to the end of the file.
‘ Added code to discard that LF.
Dim BinaryStream As Object
Dim UTFStream As Object
Set UTFStream = CreateObject(“adodb.stream”)
UTFStream.Type = adTypeText
UTFStream.Mode = adModeReadWrite
UTFStream.Charset = “UTF-8”
‘ The LineSeparator will be added to the end of FileBody. It is possible
‘ to select a different value for LineSeparator but I can find nothing to
‘ suggest it is possible to not add anything to the end of FileBody
UTFStream.LineSeparator = adLF
UTFStream.WriteText FileBody, adWriteLine
UTFStream.Position = 3 ‘skip BOM
Set BinaryStream = CreateObject(“adodb.stream”)
BinaryStream.Type = adTypeBinary
BinaryStream.Mode = adModeReadWrite
‘ Oriinally I planned to use “CopyTo Dest, NumChars” to not copy the last
‘ byte. However, NumChars is described as an integer whereas Position is
‘ described as Long. I was concerned by “integer” they mean 16 bits.
BinaryStream.Position = BinaryStream.Position – 1
Set UTFStream = Nothing
BinaryStream.SaveToFile PathFileName, adSaveCreateOverWrite
Set BinaryStream = Nothing
PS: Remember to include “Microsoft ActiveX Data Objects 2.5 Object Library” under Tools->References.