JBehave
  1. JBehave
  2. JBEHAVE-992

Provide more efficient step collection for large projects

    Details

    • Type: Improvement Improvement
    • Status: Open Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.0
    • Component/s: Core
    • Labels:
      None
    • Number of attachments :
      0

      Description

      With a large project where there can be 600 candidate steps, the ByLevenshteinDistance PrioritisingStrategy is very slow, can take 4 seconds to complete the step collection.
      The ByPriorityField is much faster, but is not a practical solution for this size project, as it would be difficult to manage the priority on this number of steps.

      So need a more efficient step collection strategy.

        Activity

        Hide
        Tim Walters added a comment -

        It was mentioned in http://jira.codehaus.org/browse/JBEHAVE-216 that

        as a bit of a hack, by sorting the annotations in descending string length.

        This is good solution and by no means a hack.

        Ordering of candidates

        Given the following candidate steps:

        1. the user logs in as $user
        2. the user logs in as $user, with password $password

        Remove the parameters, to be left with:

        1. the user logs in as
        2. the user logs in as , with password

        And the stepAsText is:

        • the user logs in as nobody, with password secret

        This step could match both candidates, so the ordering is important. If ordering longest candidates first, this will always try to match the most specific patterns first. This approach will be completely deterministic, and eliminate the need for prioritising steps manually.

        Filtering of candidates

        Given the stepAsText is:

        • the user logs in as nobody

        Then since the stepAsText.length() is less than the length of candidate 2 (after removal of parameters), its impossible for the stepAsText to match that candidate.

        If the stepAsText is:

        • the user logs in as nobody interesting

        Then since the stepAsText.length() is greater than the length of both candidates (after removal of parameters), its possible for the stepAsText to match either candidate.
        So its possible to immediately eliminate candidates that are longer than the stepAsText.

        Therefore, filtering based on stepAsText length can also be used to improve the efficiency of step collection.

        Show
        Tim Walters added a comment - It was mentioned in http://jira.codehaus.org/browse/JBEHAVE-216 that as a bit of a hack, by sorting the annotations in descending string length. This is good solution and by no means a hack. Ordering of candidates Given the following candidate steps: the user logs in as $user the user logs in as $user, with password $password Remove the parameters, to be left with: the user logs in as the user logs in as , with password And the stepAsText is: the user logs in as nobody, with password secret This step could match both candidates, so the ordering is important. If ordering longest candidates first, this will always try to match the most specific patterns first. This approach will be completely deterministic, and eliminate the need for prioritising steps manually. Filtering of candidates Given the stepAsText is: the user logs in as nobody Then since the stepAsText.length() is less than the length of candidate 2 (after removal of parameters), its impossible for the stepAsText to match that candidate. If the stepAsText is: the user logs in as nobody interesting Then since the stepAsText.length() is greater than the length of both candidates (after removal of parameters), its possible for the stepAsText to match either candidate. So its possible to immediately eliminate candidates that are longer than the stepAsText. Therefore, filtering based on stepAsText length can also be used to improve the efficiency of step collection.
        Hide
        Tim Walters added a comment -

        Have implemented these changes:
        https://github.com/solubris/jbehave-core

        Have submitted pull request:
        https://github.com/jbehave/jbehave-core/pull/62

        Show
        Tim Walters added a comment - Have implemented these changes: https://github.com/solubris/jbehave-core Have submitted pull request: https://github.com/jbehave/jbehave-core/pull/62
        Mauro Talevi made changes -
        Field Original Value New Value
        Fix Version/s 3.9.2 [ 20180 ]
        Mauro Talevi made changes -
        Summary Step collection is not effecient Provide more efficient step collection for large projects
        Hide
        Mauro Talevi added a comment -

        Hi Tim, thanks for the contribution. It looks well done (and tested

        I have a question: why do you need to introduce the interface StepFinder.OrderingStrategy?

        It feels kind of doubling the scope of the StepFinder.PrioritisingStrategy.

        I think the OrderByPatternLength could be an internal implementation detail of FilteringByPatternLength, without the need to expose another public interface.

        What do you think?

        Show
        Mauro Talevi added a comment - Hi Tim, thanks for the contribution. It looks well done (and tested I have a question: why do you need to introduce the interface StepFinder.OrderingStrategy? It feels kind of doubling the scope of the StepFinder.PrioritisingStrategy. I think the OrderByPatternLength could be an internal implementation detail of FilteringByPatternLength, without the need to expose another public interface. What do you think?
        Hide
        Tim Walters added a comment - - edited

        The main difference is that the OrderingStrategy doesn't take the "stepAsString" parameter.
        It defines an interface for ordering a list of candidate steps (most specific to least specific).
        Where as the PrioritisingStrategy does take the "stepAsString" parameter.
        It defines an interface for arranging steps in respect of how well they could match the story step.

        Could have the OrderByPatternLength implement PrioritisingStrategy and not use the stepAsString parameter (as the ByPriorityField does now), but this doesn't correspond with the underlying implementation. As the OrderingStrategy is called once for each scenario where as the PrioritisingStrategy is called for each step in the story.

        Actually, for this approach to work, really need to use both OrderByPatternLength and FilteringByPatternLength in-conjunction. So could have an interface that defines both methods:

        PrioritisingStrategy {
            order(candidates);
            priorities(stepAsString, candidates);
        }
        

        I like this approach, but it would be more impact to the existing code..

        Show
        Tim Walters added a comment - - edited The main difference is that the OrderingStrategy doesn't take the "stepAsString" parameter. It defines an interface for ordering a list of candidate steps (most specific to least specific). Where as the PrioritisingStrategy does take the "stepAsString" parameter. It defines an interface for arranging steps in respect of how well they could match the story step. Could have the OrderByPatternLength implement PrioritisingStrategy and not use the stepAsString parameter (as the ByPriorityField does now), but this doesn't correspond with the underlying implementation. As the OrderingStrategy is called once for each scenario where as the PrioritisingStrategy is called for each step in the story. Actually, for this approach to work, really need to use both OrderByPatternLength and FilteringByPatternLength in-conjunction. So could have an interface that defines both methods: PrioritisingStrategy { order(candidates); priorities(stepAsString, candidates); } I like this approach, but it would be more impact to the existing code..
        Hide
        Tim Walters added a comment -

        NOTES:

        • with these changes, anyone who has explicitly used the ByLevenshteinDistance or ByPriorityField strategies will not benefit. These have been deprecated in my change set, but it might be better to remove them completely, as it will then force anyone upgrading to the new version to update their config.
        • I believe the whole concept of adding a priority to steps to manually organise them is redundant with these improvements. So ideally would remove this throughout the code, but that is not a simple task; something for the future..
        • have added a CachingStepFinder which caches the collected steps. This is a nice-to-have as performance is satisfactory without it.
        Show
        Tim Walters added a comment - NOTES: with these changes, anyone who has explicitly used the ByLevenshteinDistance or ByPriorityField strategies will not benefit. These have been deprecated in my change set, but it might be better to remove them completely, as it will then force anyone upgrading to the new version to update their config. I believe the whole concept of adding a priority to steps to manually organise them is redundant with these improvements. So ideally would remove this throughout the code, but that is not a simple task; something for the future.. have added a CachingStepFinder which caches the collected steps. This is a nice-to-have as performance is satisfactory without it.
        Mauro Talevi made changes -
        Fix Version/s 3.9.2 [ 20180 ]
        Fix Version/s 3.9.3 [ 20321 ]
        Mauro Talevi made changes -
        Fix Version/s 3.9.3 [ 20321 ]
        Fix Version/s 4.0 [ 18486 ]
        Hide
        Mauro Talevi added a comment -

        Hi Tim,

        getting back to this issue. I've moved it to 4.0 so we can break/change existing interfaces.

        I like the idea of unifying ordering and filtering in the same PrioritisingStrategy. We can also deprecate any existing implementation if that is superseded by the new functionality.

        Could you please provide a pull request for the new unified approach for the 4.x branch?

        Cheers

        Show
        Mauro Talevi added a comment - Hi Tim, getting back to this issue. I've moved it to 4.0 so we can break/change existing interfaces. I like the idea of unifying ordering and filtering in the same PrioritisingStrategy. We can also deprecate any existing implementation if that is superseded by the new functionality. Could you please provide a pull request for the new unified approach for the 4.x branch? Cheers
        Hide
        Tim Walters added a comment -

        Sure, am a bit busy at the moment, give me a week or two..

        Show
        Tim Walters added a comment - Sure, am a bit busy at the moment, give me a week or two..

          People

          • Assignee:
            Unassigned
            Reporter:
            Tim Walters
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated: