JBehave / JBEHAVE-374

Upgrade to commons-lang 3.0 to correctly escape characters from the Unicode Supplemental Multilingual Plane

    Details

    • Type: Task
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.0.3
    • Fix Version/s: 3.x
    • Component/s: Core
    • Labels: None
    • Environment: Windows 7, 64-bit
    • Number of attachments: 0

      Description

      If one includes characters from the Unicode Supplementary Multilingual Plane (code points U+10000 upwards) in a story file and then asks for an HTML report from the test run, the characters will not be HTML-escaped correctly.

      For example, given a story file with the following scenario:
      ------------
      Scenario: Some scenario
      Given some situation
      When I do something
      Then the result is 𐐆
      ------------
      (The "dagger"-type character is actually code point U+10406 - see http://en.wikibooks.org/wiki/Unicode/Character_reference/10000-10FFF)

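      The resulting HTML report will have the "dagger" character escaped as &#55297;&#56326; - these are the two surrogate code units of the character (used in UTF-16 only), each written as its own numeric reference, and so the result is rendered as gibberish in HTML. The escape should be a single numeric reference for the code point itself, such as &#66566; (U+10406).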
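      To illustrate the difference, here is a minimal JDK-only sketch. It is not the commons-lang code; the class and method names are made up for this example, and it only handles non-ASCII characters (it does not escape markup characters such as & or <). It contrasts escaping each UTF-16 char separately with escaping each Unicode code point:

```java
class SupplementaryEscapeDemo {

    // Escapes every UTF-16 code unit above 0x7F on its own. A supplementary
    // character is stored as two surrogate chars, so it yields two references.
    static String escapePerChar(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Escapes whole code points, so a surrogate pair is read as one
    // character and yields a single numeric reference.
    static String escapePerCodePoint(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            i += Character.charCount(cp); // 2 for supplementary characters
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String step = "Then the result is " + new String(Character.toChars(0x10406));
        System.out.println(escapePerChar(step));      // Then the result is &#55297;&#56326;
        System.out.println(escapePerCodePoint(step)); // Then the result is &#66566;
    }
}
```

      The first form reproduces the kind of output seen in the report; the second is what a browser can decode back to U+10406.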

      NOTE: This is NOT a bug in JBehave per se - the bug is in the StringEscapeUtils class of commons-lang. A related bug has already been raised (and fixed) in commons-lang: https://issues.apache.org/jira/browse/LANG-617. Although the commons-lang bug report relates to XML escaping rather than HTML escaping, it seems likely that the fix will cover both. Unfortunately, the fix is in commons-lang 3.0...

        Activity

        Alistair Dutton added a comment -

        Ah, the web interface to JIRA is rendering the character escapes as HTML rather than escaping them. Here is what I meant to say in the penultimate paragraph:

        The resulting HTML report will have the "dagger" character escaped as &#55297;&#56326; - these are the two surrogate code units of the character (used in UTF-16 only), each written as its own numeric reference, and so the result is rendered as gibberish in HTML. The escape should be a single numeric reference for the code point itself, such as &#66566; (U+10406).

        Mauro Talevi added a comment -

        We use commons-lang to escape both HTML and XML. So, we'd have to wait for the fix in 3.0 to be released, unless you feel like digging into the commons-lang trunk and cherry-picking the patch that fixed your Unicode problem. We'd be happy to apply it as a stop-gap until commons-lang 3.0 is released.

        Alistair Dutton added a comment -

        The bug doesn't actually impede the functionality of any tests - it's purely cosmetic on the test results. Therefore, to me the importance of the bug doesn't outweigh the risk of introducing more bugs by coming up with a stopgap fix (which someone then has to remember to remove once commons-lang 3.x is introduced as a dependency in JBehave). But I did think that there was value in getting the issue logged so that anyone else encountering it had a reference point.

        So I'm happy to wait for commons-lang 3.x and for the bug I've raised to be archived off in whatever way you think is best.

        Mauro Talevi added a comment -

        Changed to task as reminder to upgrade to commons-lang 3.0 when released.

        Mauro Talevi made changes -
        Field: Original Value -> New Value
        Issue Type: Bug [ 1 ] -> Task [ 3 ]
        Mauro Talevi made changes -
        Summary: Characters from the Unicode Supplemental Multilingual Plane included in story definitions get rendered incorrectly in HTML -> Upgrade to commons-lang 3.0 to correctly escape characters from the Unicode Supplemental Multilingual Plane
        Fix Version/s: (none) -> 3.2 [ 16757 ]
        Component/s: (none) -> Core [ 11086 ]
        Mauro Talevi made changes -
        Fix Version/s: 3.2 [ 16757 ] -> 3.x [ 16979 ]
        Henri Yandell added a comment -

        Noting that Lang3 is released - waiting on it to sync to the central maven repo etc etc.

        Mauro Talevi made changes -
        Assignee: (none) -> Mauro Talevi [ maurotalevi ]
        Fix Version/s: 3.x [ 16979 ] -> 3.5 [ 17393 ]
        Mauro Talevi added a comment -

        Unfortunately, commons-lang 3.x is not backward compatible with 2.x. We need to provide a pluggable and backward-compatible way to escape chars using a given version of commons-lang.
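        One possible shape for such a pluggable escaper is sketched below. Every type name here (Escaper, CodePointEscaper, HtmlReportLine) is hypothetical and none of them exists in JBehave; the JDK-only implementation merely stands in for one that would delegate to StringEscapeUtils.escapeHtml from commons-lang 2.x or StringEscapeUtils.escapeHtml4 from commons-lang 3.x, depending on which jar is on the classpath.

```java
class PluggableEscaperDemo {
    public static void main(String[] args) {
        // The reporter only sees the Escaper interface, so the implementation
        // (and therefore the commons-lang version behind it) can be swapped.
        Escaper escaper = new CodePointEscaper();
        HtmlReportLine line = new HtmlReportLine(escaper);
        String step = "Then the result is " + new String(Character.toChars(0x10406));
        System.out.println(line.render(step));
    }
}

// Hypothetical extension point: one method, no commons-lang types in the signature.
interface Escaper {
    String escapeHtml(String text);
}

// JDK-only stand-in implementation (an assumption for illustration). It escapes
// the basic markup characters and writes one numeric reference per code point.
final class CodePointEscaper implements Escaper {
    @Override
    public String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:
                    if (cp > 0x7F) {
                        out.append("&#").append(cp).append(';');
                    } else {
                        out.append((char) cp);
                    }
            }
        });
        return out.toString();
    }
}

// Hypothetical consumer showing how report code would receive the escaper.
final class HtmlReportLine {
    private final Escaper escaper;

    HtmlReportLine(Escaper escaper) {
        this.escaper = escaper;
    }

    String render(String step) {
        return "<div class=\"step\">" + escaper.escapeHtml(step) + "</div>";
    }
}
```

        With this kind of split, users who stay on commons-lang 2.x keep the current behaviour, while others can plug in an escaper backed by 3.x without the core depending on either version directly.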

        Mauro Talevi made changes -
        Fix Version/s: 3.5 [ 17393 ] -> 3.x [ 16979 ]

          People

          • Assignee: Mauro Talevi
          • Reporter: Alistair Dutton
          • Votes: 0
          • Watchers: 0
