[JBEHAVE-374] Upgrade to commons-lang 3.0 to correctly escape characters from the Unicode Supplemental Multilingual Plane - jira.codehaus.org

Details

Type: Task
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 3.0.3
Fix Version/s: 3.x
Component/s: Core
Labels:
None
Environment:
Windows 7, 64-bit

Number of attachments :
0

Description

If a story file includes characters from the Unicode Supplementary Multilingual Plane (code points U+10000 upwards), those characters are not HTML-escaped correctly in the HTML report generated from the test run.

For example, given a story file with the following scenario:
------------
Scenario: Some scenario
Given some situation
When I do something
Then the result is 𐐆
------------
(The "dagger"-type character is actually code point U+10406 - see http://en.wikibooks.org/wiki/Unicode/Character_reference/10000-10FFF)

The resulting HTML report will have the "dagger" character escaped as &#55297;&#56326; - these are the two halves of a surrogate pair (used in UTF-16 only) and so are rendered as gibberish in HTML. The escape should be &#66566;
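The arithmetic behind the two escapes can be illustrated with a small Java sketch. It is not the commons-lang source: the class and method names are made up for illustration, and it only handles numeric references for non-ASCII characters (no `&amp;`, `&lt;` and so on). Escaping each UTF-16 `char` separately reproduces the faulty output, while escaping each code point gives the expected single reference.

```java
final class SupplementaryEscapeSketch {

    // Escapes one UTF-16 char at a time. A supplementary character is stored
    // as two surrogate chars, so it comes out as two numeric references.
    static String escapeByChar(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Escapes one Unicode code point at a time, so a surrogate pair is
    // combined into a single numeric reference.
    static String escapeByCodePoint(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            i += Character.charCount(cp); // 2 for supplementary characters
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String ch = new String(Character.toChars(0x10406)); // U+10406
        System.out.println(escapeByChar(ch));      // prints &#55297;&#56326;
        System.out.println(escapeByCodePoint(ch)); // prints &#66566;
    }
}
```

55297 and 56326 are 0xD801 and 0xDC06, the high and low surrogates for U+10406; 66566 is 0x10406 itself.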

NOTE: This is NOT a bug in JBehave per se - the bug is in the StringEscapeUtils class of commons-lang. A related bug has already been raised (and fixed) in commons-lang: https://issues.apache.org/jira/browse/LANG-617. Although the commons-lang bug report relates to XML escaping rather than HTML escaping, it seems likely that the fix will cover both. Unfortunately, the fix is in commons-lang 3.0...

Activity

Alistair Dutton added a comment - 25/Oct/10 2:16 AM

Ah, the JIRA web interface is rendering the character escapes as HTML rather than escaping them. Here is what I meant to say in the penultimate paragraph:

The resulting HTML report will have the "dagger" character escaped as &#55297;&#56326; - which represent surrogate-pair code points (used in UTF-16 only) and so is rendered as gibberish in HTML. The escape should be &#66566;

Mauro Talevi added a comment - 27/Oct/10 9:07 AM

We use commons-lang to escape both HTML and XML. So we'd have to wait for the fix in 3.0 to be released, unless you feel like digging into the commons-lang trunk and cherry-picking the patch that fixed your Unicode problem. We'd be happy to apply it as a stopgap until commons-lang 3.0 is released.

Alistair Dutton added a comment - 27/Oct/10 9:37 AM

The bug doesn't actually impede the functionality of any tests - it's purely cosmetic on the test results. Therefore, to me the importance of the bug doesn't outweigh the risk of introducing more bugs by coming up with a stopgap fix (which someone then has to remember to remove once commons-lang 3.x is introduced as a dependency in JBehave). But I did think that there was value in getting the issue logged so that anyone else encountering it had a reference point.

So I'm happy to wait for commons-lang 3.x and for the bug I've raised to be archived off in whatever way you think is best.

Mauro Talevi added a comment - 27/Oct/10 9:42 AM

Changed to task as reminder to upgrade to commons-lang 3.0 when released.

Mauro Talevi made changes - 27/Oct/10 9:42 AM

Field	Original Value	New Value
Issue Type	Bug [ 1 ]	Task [ 3 ]

Mauro Talevi made changes - 27/Oct/10 9:48 AM

Summary	Characters from the Unicode Supplemental Multilingual Plane included in story definitions get rendered incorrectly in HTML	Upgrade to commons-lang 3.0 to correctly escape characters from the Unicode Supplemental Multilingual Plane
Fix Version/s		3.2 [ 16757 ]
Component/s		Core [ 11086 ]

Mauro Talevi made changes - 04/Dec/10 6:42 AM

Fix Version/s		3.x [ 16979 ]
Fix Version/s	3.2 [ 16757 ]

Henri Yandell added a comment - 19/Jul/11 4:11 AM

Noting that Lang3 is released - waiting on it to sync to the central maven repo etc etc.

Mauro Talevi made changes - 19/Jul/11 4:58 AM

Assignee		Mauro Talevi [ maurotalevi ]
Fix Version/s		3.5 [ 17393 ]
Fix Version/s	3.x [ 16979 ]

Mauro Talevi added a comment - 04/Sep/11 7:06 AM

Unfortunately, commons-lang 3.x is not backward compatible with 2.x. We need to provide a pluggable and backward-compatible way to escape chars using a given version of commons-lang.
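One way such a pluggable escaper might look is sketched below. Everything in it is hypothetical: `HtmlEscaper`, `Reporter` and `CODE_POINT_AWARE` are illustrative names, not existing JBehave types. In a real setup one implementation could delegate to `org.apache.commons.lang.StringEscapeUtils.escapeHtml` (2.x) and another to `org.apache.commons.lang3.StringEscapeUtils.escapeHtml4` (3.x); the sketch uses a dependency-free escaper instead so that it stays self-contained.

```java
final class PluggableEscaperSketch {

    // Hypothetical extension point; not an existing JBehave interface.
    interface HtmlEscaper {
        String escape(String text);
    }

    // Hypothetical stand-in for a reporter that takes its escaper as a
    // constructor argument instead of calling commons-lang directly.
    static final class Reporter {
        private final HtmlEscaper escaper;

        Reporter(HtmlEscaper escaper) {
            this.escaper = escaper;
        }

        String renderStep(String step) {
            return "<div class=\"step\">" + escaper.escape(step) + "</div>";
        }
    }

    // Dependency-free escaper that works on code points, so supplementary
    // characters become a single numeric reference.
    static final HtmlEscaper CODE_POINT_AWARE = text -> {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:
                    if (cp > 0x7F) {
                        out.append("&#").append(cp).append(';');
                    } else {
                        out.append((char) cp);
                    }
            }
        });
        return out.toString();
    };

    public static void main(String[] args) {
        Reporter reporter = new Reporter(CODE_POINT_AWARE);
        String step = "Then the result is " + new String(Character.toChars(0x10406));
        System.out.println(reporter.renderStep(step));
    }
}
```

With this shape, the choice of commons-lang version is confined to whichever `HtmlEscaper` implementation is passed in, and the reporting code itself would not need to change.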

Mauro Talevi made changes - 04/Sep/11 7:06 AM

Fix Version/s		3.x [ 16979 ]
Fix Version/s	3.5 [ 17393 ]

People

Assignee:

Mauro Talevi

Reporter:

Alistair Dutton

Votes:

0

Watchers:

0

Dates

Created:

24/Oct/10 3:37 PM

Updated:

04/Sep/11 7:06 AM