  1. Blesta Core
  2. CORE-1523

Support Manager: Remove some special characters from article titles in URIs

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.4.0-b1
    • Fix Version/s: 3.4.0-b2
    • Component/s: None
    • Labels:
      None

      Description

      Update the titles to remove the following basic characters:
      +, ?, %, =, #, %

      We may want to also remove
      ! @ # $ % ^ & * ( ) ' " , < . > ; : - _ | { [ } ] ^ ` ~

        Issue Links

          Activity

          tyson Tyson Phillips (Inactive) created issue -
          admin Paul Phillips added a comment -

          Ultimately, I think we want to allow alphanumeric characters only, replacing certain special characters, including spaces, with hyphens and stripping out the rest.

          WordPress does this with a function called sanitize_title_with_dashes(). Its source can be found at https://core.trac.wordpress.org/browser/tags/4.0.1/src/wp-includes/formatting.php#L0 and reviewing it may be useful in determining the best approach.
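
The approach described in this comment (alphanumeric characters only, separators turned into hyphens, everything else stripped) can be sketched roughly as follows. This is a minimal Python illustration, not WordPress's sanitize_title_with_dashes() and not Blesta's code. The helper name `slugify_ascii`, the lowercasing step, and the choice to treat underscores as separators are assumptions made for the example.

```python
import re
import unicodedata


def slugify_ascii(title: str) -> str:
    """Hypothetical helper: reduce a title to ASCII letters, digits, and hyphens."""
    # Decompose accented characters, then drop anything with no ASCII form.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Assumption: whitespace, underscores, and hyphens all act as word separators.
    hyphenated = re.sub(r"[\s_-]+", "-", ascii_title.lower())
    # Strip every remaining character that is not a letter, digit, or hyphen.
    cleaned = re.sub(r"[^a-z0-9-]", "", hyphenated)
    # Collapse hyphen runs left behind by stripping, and trim the ends.
    return re.sub(r"-{2,}", "-", cleaned).strip("-")


print(slugify_ascii("How do I reset my password?"))
```

A title made up entirely of characters with no ASCII equivalent would come out empty under this scheme, which is one reason a stricter ASCII-only policy needs a fallback.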

          tyson Tyson Phillips (Inactive) made changes -
          Field: Description
          Original Value:
            Update the titles to remove the following basic characters:
            +, ?, %, =, #, %

            We may want to also remove
            ! @ # $ % ^ & * ( ) ' " , < . > ; : - _ | { [ } ]
          New Value:
            Update the titles to remove the following basic characters:
            +, ?, %, =, #, %

            We may want to also remove
            ! @ # $ % ^ & * ( ) ' " , < . > ; : - _ | { [ } ] ^ ` ~
          tyson Tyson Phillips (Inactive) added a comment -

          I looked at WP when adding the KB. They use remove_accents() to convert UTF-8 characters to ASCII, then keep only the filtered alphanumeric ASCII characters in the title.

          Since we're allowing UTF-8 characters, each of those will be percent-encoded into its corresponding octets (e.g. %21 is an exclamation point), and the browser decides whether to decode those octets when displaying the URL. If we were to remove the possibility of encoded characters in URLs, then WP's solution would be better since it only allows ASCII. However, removing each of the above characters is our happy medium.
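
To make the octet point concrete, here is a small Python sketch of percent-encoding using the standard library's urllib.parse. The function name `encode_segment` is hypothetical and only illustrates the encoding itself. It does not show how Blesta builds its URIs.

```python
from urllib.parse import quote, unquote


def encode_segment(title: str) -> str:
    """Hypothetical helper: percent-encode a title for use as one URI path segment."""
    # safe="" means even "/" is encoded; only letters, digits, and "_.-~" pass through.
    return quote(title, safe="")


# An ASCII special character becomes a single percent-encoded octet.
print(encode_segment("!"))
# A non-ASCII character becomes one percent-encoded octet per UTF-8 byte.
print(encode_segment("é"))
# Decoding the octets recovers the original text.
print(unquote(encode_segment("é")))
```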

          tyson Tyson Phillips (Inactive) added a comment -

          This has been updated to remove non-alphanumeric characters in the ASCII range (0-127). It also collapses consecutive dashes that can result when removed characters were separated by spaces.
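
The behavior described in this comment might look something like the Python sketch below. It is an illustration under stated assumptions, not the actual Blesta change: the function name `clean_title_for_uri` is hypothetical, and the step that turns spaces into dashes before stripping is inferred from the mention of consecutive dashes.

```python
import re


def clean_title_for_uri(title: str) -> str:
    """Hypothetical helper: strip ASCII punctuation but keep non-ASCII characters."""
    # Assumption: runs of whitespace become a single dash before stripping.
    dashed = re.sub(r"\s+", "-", title.strip())
    # Remove any ASCII character (0-127) that is not a letter, digit, or dash.
    # Characters above 127 never match [\x00-\x7F], so they are left untouched.
    stripped = re.sub(r"(?![A-Za-z0-9-])[\x00-\x7F]", "", dashed)
    # Collapse dash runs left behind where a removed character sat between spaces.
    return re.sub(r"-{2,}", "-", stripped).strip("-")


print(clean_title_for_uri("What is 100% uptime? #1 & more"))
```

Unlike an ASCII-only slug, this keeps accented and other non-ASCII characters in the title, so they would still be percent-encoded when the URI is built.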

          tyson Tyson Phillips (Inactive) added a comment -

          I'm going to mark this resolved, unless there is more you'd like to update here.

          tyson Tyson Phillips (Inactive) made changes -
          Status: Open [ 1 ] -> Resolved [ 5 ]
          Resolution: Fixed [ 1 ]
          admin Paul Phillips added a comment -

          That sounds fine to me.

          tyson Tyson Phillips (Inactive) made changes -
          Status: Resolved [ 5 ] -> Closed [ 6 ]
          tyson Tyson Phillips (Inactive) made changes -
          Link This issue relates to CORE-1526 [ CORE-1526 ]

            People

            • Assignee: tyson Tyson Phillips (Inactive)
            • Reporter: tyson Tyson Phillips (Inactive)
            • Votes: 0
            • Watchers: 2

              Dates

              • Created:
                Updated:
                Resolved:
                Fix Release Date:
                18/Dec/14