# Restricting Text Responses With Regular Expressions

**Last updated:** 25 Mar 2024

A regular expression, or regex, is a search pattern used for matching specific characters and ranges of characters within a string. It is widely used to validate, search, extract, and restrict text in most programming languages.

KoboToolbox supports regex to control the length and characters of the data entered for a particular question _(e.g. restricting a mobile number to exactly 10 digits, or requiring a valid email address)_.

## To use a regex in KoboToolbox, follow these steps

1. Prepare a _Text_ question type.
2. Go to the question's _Settings_.
3. Go to _Validation Criteria_ and choose the _Manually enter your validation logic in XLSForm code_ option.
4. In the _Validation Code_ box, enter your regex formula between the single quotation marks (`' '`) of the `regex(., ' ')` format. For reference, the period (`.`) refers to _'this question'_, while the regular expression inside the quotation marks (`' '`) needs to conform to the established regex rules.
5. (Optional) Add a custom _Error Message_ for the person entering data to see when they don't meet the regex criteria.

![Regex validation settings for a question](/images/restrict_responses/regrex.jpg)

Regex can also be coded in XLSForm, under the _constraint_ column:

| type | name | label                       | appearance | constraint              | constraint_message                 |
| :--- | :--- | :-------------------------- | :--------- | :---------------------- | :--------------------------------- |
| text | q1   | Mobile number of respondent | numbers    | regex(., '^[0-9]{10}$') | This value must be exactly 10 digits |

Alternatively, you can create a `calculate` question type and then define the regex code under the _calculation_ column.
You could then use this variable as many times as needed in the survey:

| type      | name | label                  | calculation                              | constraint      | constraint_message                  |
| :-------- | :--- | :--------------------- | :--------------------------------------- | :-------------- | :---------------------------------- |
| calculate | q0   |                        | '^[A-Z]{1}[a-z]{1,}\s[A-Z]{1}[a-z]{1,}$' |                 |                                     |
| text      | q1   | Name of the Enumerator |                                          | regex(., ${q0}) | Please use this format: Kobe Bryant |
| text      | q2   | Name of the Respondent |                                          | regex(., ${q0}) | Please use this format: Kobe Bryant |
| integer   | q3   | Age of the Respondent  |                                          |                 |                                     |

## How do I build the regex that I need?

In addition to the examples and tips provided below, please visit [this website](http://www.regexr.com) for more help and examples.
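A quick way to check a pattern before adding it to a form is to run it against a few sample answers. The sketch below assumes Python's `re` module as a stand-in: its syntax for these patterns is close to, but not identical to, the regex engines used by Collect and Enketo, so a passing result is a sanity check rather than proof. The helper `passes` and the sample values are hypothetical; the two patterns are the ones from the XLSForm tables above.

```python
import re

# Patterns copied from the XLSForm examples (the text between the quotes).
MOBILE = r'^[0-9]{10}$'
NAME = r'^[A-Z]{1}[a-z]{1,}\s[A-Z]{1}[a-z]{1,}$'  # stored once in q0, reused by q1 and q2


def passes(pattern: str, value: str) -> bool:
    """Hypothetical helper: True if value satisfies pattern (anchors are in the pattern)."""
    return re.search(pattern, value) is not None


# Try candidate answers before putting the constraint in the form.
for value in ['9841234567', '984123456', '98412345678', '98412-3456']:
    print('mobile', repr(value), passes(MOBILE, value))

for value in ['Kobe Bryant', 'kobe bryant', 'Kobe  Bryant', 'K Bryant']:
    print('name  ', repr(value), passes(NAME, value))
```

Only `9841234567` and `Kobe Bryant` satisfy their patterns; the others fail on length, a stray character, a lowercase initial, a double space, or a one-letter first name.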
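Regex in KoboToolbox should always be written between the apostrophes of `regex(., ' ')`, as shown in the examples.

| **Regex** | **Description** |
| :-------- | :-------------- |
| `\|` | A vertical bar matches either of the elements it separates (e.g. `cat\|dog` matches `cat` or `dog`). |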
| `\` | A backslash is used to match the literal value of any metacharacter (e.g. try using `\.` or `\$` while building regex). |
| `\number` | Matches the same text most recently matched by the capturing group with that number (e.g. `\1` repeats whatever group 1 matched). |
| `\s` | Matches any whitespace character, such as a _space_ or _tab_. |
| `\b` | Matches, without consuming any characters, the position immediately between a character matched by `\w` and a character not matched by `\w` (in either order). `\b` is also known as the _word boundary_. |
| `\d` | Matches any digit; equivalent to `[0-9]`. |
| `\D` | Matches any character other than a digit (`0` to `9`). |
| `\w` | Matches any word character (i.e. `a` to `z`, `A` to `Z`, `0` to `9` or `_`). |
| `\W` | Matches any character that `\w` does not match (i.e. punctuation, symbols and spaces). |
| `?` | A question mark placed immediately after a character (or group) makes it optional, matching it zero or one time. |
| `*` | An asterisk placed immediately after a character (or group) matches zero or more consecutive occurrences of it. |
| `+` | A plus sign placed immediately after a character (or group) matches one or more consecutive occurrences of it. |
| `{x}` | Matches exactly `x` consecutive occurrences of the preceding character or group. |
| `{x,}` | Matches at least `x` consecutive occurrences of the preceding character or group. |
| `{x,y}` | Matches between `x` and `y` consecutive occurrences of the preceding character or group. |
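To see the table entries in action, you can run each one against a short sample. The sketch below again assumes Python's `re` module as a stand-in for the form's regex engine; the metacharacters in the table behave the same way there for these simple cases, but edge cases may differ. The sample strings are made up for illustration.

```python
import re

# (pattern, sample, expected) triples. re.fullmatch requires the whole sample
# to match, much like a pattern wrapped in ^ and $.
examples = [
    (r'cat|dog', 'dog', True),            # |    : either alternative
    (r'\$\d+', '$250', True),             # \    : literal $ rather than the end anchor
    (r'(\d)-\1', '7-7', True),            # \1   : repeats what group 1 matched
    (r'(\d)-\1', '7-8', False),
    (r'\w+\s\w+', 'hello world', True),   # \s   : one whitespace character
    (r'\D+', 'abc1', False),              # \D   : digits are not allowed
    (r'\W+', '!? ', True),                # \W   : punctuation, symbols, spaces
    (r'colou?r', 'color', True),          # ?    : the u is optional
    (r'ab*c', 'ac', True),                # *    : zero or more b
    (r'ab+c', 'ac', False),               # +    : at least one b is required
    (r'\d{4}', '202', False),             # {x}  : exactly 4 digits
    (r'\d{2,}', '12345', True),           # {x,} : at least 2 digits
    (r'\d{2,3}', '1234', False),          # {x,y}: 2 to 3 digits
]

for pattern, sample, expected in examples:
    result = re.fullmatch(pattern, sample) is not None
    assert result == expected, (pattern, sample)
    print(f'{pattern:12} {sample!r:14} {result}')

# \b is easiest to see with findall: "cat" is matched only as a whole word,
# so the "cat" inside "concatenate" is skipped.
print(re.findall(r'\bcat\b', 'cat concatenate cat.'))  # ['cat', 'cat']
```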
## Characters with accents
| **Regex** | **Description** |
| :------------------ | :------------------------------------------------------------------------------------------------------------- |
| `[A-zÀ-ú]` | Accepts lowercase and uppercase accented characters up to `ú`, so some letters such as `ü` and `ÿ` are excluded (also accepts `[`, `]`, `^`, `\`, `_`, the backtick, `×` and `÷`) |
| `[A-zÀ-ÿ]` | Accepts lowercase and uppercase accented characters, including letters with an umlaut (also accepts `[`, `]`, `^`, `\`, `_`, the backtick, `×` and `÷`) |
| `[A-Za-zÀ-ÿ]` | Accepts lowercase and uppercase accented characters, but not `[`, `]`, `^`, `\`, `_` or the backtick (still accepts `×` and `÷`) |
| `[A-Za-zÀ-ÖØ-öø-ÿ]` | Accepts lowercase and uppercase accented characters, but not `[`, `]`, `^`, `\`, `_`, the backtick, `×` or `÷` |
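The differences between these character classes come from the code-point ranges they span (for example, `A-z` also covers the symbols that sit between `Z` and `a`). The sketch below, which assumes Python's `re` module as a stand-in and uses made-up sample words, shows which samples each class accepts when repeated with `+` to match a whole word. It also assumes precomposed characters: an accent typed as a separate combining mark would not fall inside these ranges.

```python
import re

# The four character classes from the table above.
classes = ['[A-zÀ-ú]', '[A-zÀ-ÿ]', '[A-Za-zÀ-ÿ]', '[A-Za-zÀ-ÖØ-öø-ÿ]']

# Hypothetical samples: accented letters, an umlaut, an underscore, a symbol.
samples = ['José', 'Müller', 'a_b', '×']

# For each class, keep the samples made up entirely of accepted characters.
results = {cls: [s for s in samples if re.fullmatch(cls + '+', s)] for cls in classes}

for cls, accepted in results.items():
    print(f'{cls:20} accepts {accepted}')
```

Only the last class rejects both the underscore and `×` while still accepting `Müller`, which is why it is the strictest option in the table.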
## Examples related to use of numbers
For all `text` type questions that use numbers, do not forget to type `numbers` under the `appearance` column.
| **Regex** | **Description** |
| :--------------------------------- | :----------------------------------------------------------------------------------------------------------------------------- |
| `regex(., '^[$£]\d{3}$')` | Restrict a currency input of _three digits_ with a currency sign (either dollar or pound) in front (e.g. `$999` or `£500`) |
| `regex(., '^\W*(\w+\b\W*){3}$')` | Restrict input to an exact number of words (e.g. exactly 3 words, such as `I love you.`) |
| `regex(., '^\W*(\w+\b\W*){3,5}$')` | Restrict input to a range of word counts (e.g. between `3` and `5` words) |
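The currency and word-count examples can be checked the same way. The sketch below assumes Python's `re` module as a stand-in and uses hypothetical sample answers. It also shows why the currency class only needs `$` and `£`: inside square brackets, `|` is an ordinary character rather than "or", so a class written as `[\$|\£]` would also accept a leading `|`.

```python
import re

CURRENCY = r'^[$£]\d{3}$'
EXACTLY_3_WORDS = r'^\W*(\w+\b\W*){3}$'
THREE_TO_FIVE_WORDS = r'^\W*(\w+\b\W*){3,5}$'


def passes(pattern, value):
    """Hypothetical helper: True if value satisfies the (anchored) pattern."""
    return re.search(pattern, value) is not None


# Currency: a $ or £ sign followed by exactly three digits.
for value in ['$999', '£500', '$99', '€500', '|999']:
    print('currency', repr(value), passes(CURRENCY, value))

# Inside [...] the | is literal, so this variant also lets "|999" through.
print('with | in the class:', passes(r'^[\$|\£]\d{3}$', '|999'))

# Word counts: \b prevents a word from being split, so each repetition of the
# group consumes exactly one whole word plus any trailing spaces or punctuation.
for value in ['I love you.', 'I love you so much', 'Hello there']:
    print(repr(value), passes(EXACTLY_3_WORDS, value), passes(THREE_TO_FIVE_WORDS, value))
```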
## Examples related to the restriction of valid email inputs
These examples are purely illustrative and should be adjusted for your use-case. Using regex for constraining email addresses does not guarantee that they are valid, only that they follow an expected pattern.
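To illustrate this caveat, the sketch below runs the first email pattern listed in this section against a few hypothetical addresses, using Python's `re` module as an assumed stand-in for the form's regex engine. A well-formed address on a domain that almost certainly does not exist still passes, because the pattern only checks the shape of the text.

```python
import re

# First email pattern from this section (the text between the quotes).
EMAIL = r'^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+[.][A-Za-z.]{2,}$'

candidates = [
    'example@domain.com',      # expected shape
    'example@domain.com.np',   # multi-part domain
    'nobody@no-such-host.zz',  # well-formed, but almost certainly undeliverable
    'example.domain.com',      # missing @
    'example@domain',          # missing the dot and top-level domain
]

results = {address: re.search(EMAIL, address) is not None for address in candidates}

for address, ok in results.items():
    print(f'{address:24} {ok}')
```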
| **Regex** | **Description** |
| :----------------------------------------------------------------------------------------------------------------- | :----------------------------------------------------------------------- |
| `regex(., '^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+[.][A-Za-z.]{2,}$')` or `regex(., '^([\W\d\D]+[@][\D]+[.][\D]{2,})+$')` | Restrict an input to a valid email address format, e.g. `example@domain.com` or `example@domain.com.np`. The second pattern is much more permissive, because `[\W\d\D]` matches any character. |

## Examples related to use of time inputs

| **Regex** | **Description** |
| :------------------------------------------------------- | :---------------------------------------- |
| `regex(., '^([01][0-9]\|2[0-3]):[0-5][0-9]:[0-5][0-9]$')` | Restrict a time input in `24` hour format (`hh:mm:ss`, e.g. `23:59:59`) |
| `regex(., '^(0[1-9]\|1[0-2]):[0-5][0-9]:[0-5][0-9]$')` | Restrict a time input in `12` hour format (`hh:mm:ss`, e.g. `09:30:00`) |

## Considerations when using regex

- If you wish to use a regex constraint on a number in a `text` type question, make sure you _always_ have the value `numbers` under the `appearance` column. This hides letters from the input keyboard, so only numbers are available for entry.
- The Collect Android app and Enketo handle regular expressions differently. Collect behaves as if you have used the anchors `^` and `$` around the expression (even if you have not used them), while Enketo only enforces an exact match when the anchors are included. Including `^` and `$` in every pattern makes the constraint work the same way in both.
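The anchoring difference between Collect and Enketo can be approximated with Python's `re` module: `re.fullmatch` behaves roughly like Collect's implied anchors, and `re.search` behaves roughly like Enketo's match-anywhere check. The helpers `collect_like` and `enketo_like` below are hypothetical and only approximate each client; they are not the clients' actual code.

```python
import re


def collect_like(pattern, value):
    # Approximates Collect: the whole answer must match, as if ^ and $ were present.
    return re.fullmatch(pattern, value) is not None


def enketo_like(pattern, value):
    # Approximates Enketo: a match anywhere in the answer is enough.
    return re.search(pattern, value) is not None


answer = '98412345678'  # 11 digits, one too many for a 10-digit mobile number

for pattern in ['[0-9]{10}', '^[0-9]{10}$']:
    print(f'{pattern:14} collect={collect_like(pattern, answer)} '
          f'enketo={enketo_like(pattern, answer)}')
```

Without anchors, the two helpers disagree on the 11-digit answer (rejected by `collect_like`, accepted by `enketo_like`); with `^` and `$`, both reject it.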