OTM CSV and UTF8 Encoding

I am dabbling in more Chinese character (中文) for OTM (Oracle Transportation Management). I hit an unusual issue when I try to upload Chinese Characters with csv file. The csv is generated from Excel macro VBA. Thanks to the various forums (OTMFAQ and stackoverflow), I managed to find the root cause on UTF encoding with BOM (Byte Order Mark).

Source: OTMFAQ

BOM is a 3 bytes in the beginning of the file when you save UTF-8 on Windows. If you generate the UTF-8, you can view the csv output encoding with TextPad++. OTM does not allow the upload of UTF-8 BOM file. It will just throw out an error, which is not friendly at all.

The solution to solve this is to convert the csv to UTF-8 without BOM. This will remove the first 3 Bytes from the beginning of the file. This can also be automated in macro with the following codes. Thanks to the solution codes from stackoverflow.


Public Sub PutTextFileUtf8(ByVal PathFileName As String, ByVal FileBody As String)

‘ Outputs FileBody as a text file (UTF-8 encoding without leading BOM)
‘ named PathFileName

‘ Needs reference to “Microsoft ActiveX Data Objects n.n Library”
‘ Addition to original code says version 2.5. Tested with version 6.1.

‘ 1Nov16 Copied from http://stackoverflow.com/a/4461250/973283
‘ but replaced literals with parameters.
‘ 15Aug17 Discovered routine was adding an LF to the end of the file.
‘ Added code to discard that LF.

‘ References: http://stackoverflow.com/a/4461250/973283
https://www.w3schools.com/asp/ado_ref_stream.asp

Dim BinaryStream As Object
Dim UTFStream As Object

Set UTFStream = CreateObject(“adodb.stream”)

UTFStream.Type = adTypeText
UTFStream.Mode = adModeReadWrite
UTFStream.Charset = “UTF-8”
‘ The LineSeparator will be added to the end of FileBody. It is possible
‘ to select a different value for LineSeparator but I can find nothing to
‘ suggest it is possible to not add anything to the end of FileBody
UTFStream.LineSeparator = adLF
UTFStream.Open
UTFStream.WriteText FileBody, adWriteLine

UTFStream.Position = 3 ‘skip BOM

Set BinaryStream = CreateObject(“adodb.stream”)
BinaryStream.Type = adTypeBinary
BinaryStream.Mode = adModeReadWrite
BinaryStream.Open

UTFStream.CopyTo BinaryStream

‘ Oriinally I planned to use “CopyTo Dest, NumChars” to not copy the last
‘ byte. However, NumChars is described as an integer whereas Position is
‘ described as Long. I was concerned by “integer” they mean 16 bits.
‘Debug.Print BinaryStream.Position
BinaryStream.Position = BinaryStream.Position – 1
BinaryStream.SetEOS
‘Debug.Print BinaryStream.Position

UTFStream.Flush
UTFStream.Close
Set UTFStream = Nothing

BinaryStream.SaveToFile PathFileName, adSaveCreateOverWrite
BinaryStream.Flush
BinaryStream.Close
Set BinaryStream = Nothing

End Sub


PS: Remember to include “Microsoft ActiveX Data Objects 2.5 Object Library” under Tools->References.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s