Restricting Text Responses With Regular Expressions

Last updated: 29 Jul 2025

A regular expression, or regex, is a search pattern used to match specific characters and ranges of characters within a string. Regex is widely used to validate, search, extract, and restrict text in most programming languages. KoboToolbox supports regex to control the length and characters entered in response to a particular question (e.g. restricting a mobile number to exactly 10 digits, or requiring an email address in a valid format).

To use a regex in KoboToolbox, follow these steps

Prepare a Text question type.
Go to the question’s Settings.
Go to Validation Criteria and choose the Manually enter your validation logic in XLSForm code option.
In the Validation Code box, enter your regex formula between the quotation marks (' ') of the regex(., ' ') format. For reference, the period (.) refers to ‘this question’, while the regular expression inside the quotation marks (' ') needs to conform to the established regex rules.
(Optional) Add a custom Error Message for the person entering data to see when they don’t meet the regex criteria.

Regex can also be coded in XLSForm, under the constraint column:

survey sheet

type	name	label	appearance	constraint	constraint_message
text	q1	Mobile number of respondent	numbers	regex(., '^[0-9]{10}$')	This value must be exactly 10 digits
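
The constraint above can be tried out before deploying a form. The following is a minimal sketch in Python, assuming that the standard `re` module behaves like the form engine for a pattern this simple; the function name `is_valid_mobile` is hypothetical and not part of KoboToolbox.

```python
import re

# Same pattern as in the constraint column above.
MOBILE_PATTERN = re.compile(r"^[0-9]{10}$")


def is_valid_mobile(value: str) -> bool:
    """Return True when value consists of exactly ten digits (0-9)."""
    # fullmatch requires the whole answer to match, not just a part of it.
    return MOBILE_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for answer in ["0712345678", "071234567", "07123 45678"]:
        print(answer, is_valid_mobile(answer))
```

Only the first answer in the loop has exactly ten digits, so it is the only one that would pass the constraint.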

Alternatively, you can create a calculate question type and then define the regex code under the calculation column. You could then use this variable as many times as needed in the survey:

survey sheet

type	name	label	calculation	constraint	constraint_message
calculate	q0		'^[A-Z]{1}[a-z]{1,}\s[A-Z]{1}[a-z]{1,}$'
text	q1	Name of the Enumerator		regex(., ${q0})	Please use this format: Kobe Bryant
text	q2	Name of the Respondent		regex(., ${q0})	Please use this format: Kobe Bryant
integer	q3	Age of the Respondent
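
The idea of storing a pattern once and reusing it for several questions can be sketched in Python as follows. This is an illustration only: `NAME_PATTERN` plays the role of the calculate question `q0`, and `check_answers` is a hypothetical helper, not a KoboToolbox function.

```python
import re

# Defined once, like the calculate question q0 in the form above.
NAME_PATTERN = r"^[A-Z]{1}[a-z]{1,}\s[A-Z]{1}[a-z]{1,}$"


def check_answers(answers: dict[str, str]) -> dict[str, bool]:
    """Apply the shared pattern to every answer, keyed by question name."""
    return {
        question: re.fullmatch(NAME_PATTERN, value) is not None
        for question, value in answers.items()
    }


if __name__ == "__main__":
    # q1 follows the expected format; q2 does not start with capitals.
    print(check_answers({"q1": "Kobe Bryant", "q2": "kobe bryant"}))
```

Keeping the pattern in one place means that a later change to the naming rule only has to be made once.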

How do I build the regex that I need?

In addition to the examples and tips provided below, please visit this website for more help and examples.

Regex in KoboToolbox should always be written between the single quotes of regex(., ' '), as shown in the examples.

Regex	Description
`^`	The caret symbol matches the start of a string without consuming any character.
`$`	The dollar symbol matches the end of a string without consuming any character.
`[abc]`	Matches either `a`, `b` or `c` from within the square brackets `[ ]`.
`[a-z]`	Matches any lowercase character from `a` to `z`.
`[A-Z]`	Matches any uppercase character from `A` to `Z`.
`[0-9]`	Matches any single digit from `0` to `9`.
`[a-zA-Z0-9]`	Matches any character from `a` to `z` or `A` to `Z` or `0` to `9`.
`[^abc]`	Matches any character except `a`, `b` or `c`.
`[^A-Z]`	Matches any characters except those in the range `A` to `Z`.
`(apple)`	The grouping characters `( )` treat everything within the parentheses as a single unit and capture what it matches.
`|`	A vertical bar matches either the expression before it or the expression after it.
`\`	A backslash is used to match the literal value of any metacharacter (e.g. try using `\.` or `\@` or `\$` while building regex).
`\number`	Matches the same text as most recently matched by the nth capturing group (e.g. `\1` refers to the first group).
`\s`	Matches any whitespace character, such as a space or tab.
`\b`	Matches, without consuming any characters, the position immediately between a character matched by `\w` and a character not matched by `\w` (in either order). `\b` is also known as the word boundary.
`\d`	Matches any digit (equivalent to `[0-9]`).
`\D`	Matches anything other than a digit (`0` to `9`).
`\w`	Matches any word character (i.e. `a` to `z` or `A` to `Z` or `0` to `9` or `_`).
`\W`	Matches anything other than what `\w` matches (i.e. it matches special characters and spaces).
`?`	A question mark placed directly after a character makes that character optional (zero or one occurrence).
`*`	An asterisk placed directly after a character matches zero or more consecutive occurrences of that character.
`+`	A plus symbol placed directly after a character matches one or more consecutive occurrences of that character.
`{x}`	Matches exactly `x` consecutive occurrences of the preceding character.
`{x,}`	Matches at least `x` consecutive occurrences of the preceding character.
`{x,y}`	Matches between `x` and `y` consecutive occurrences of the preceding character.
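
A few of the building blocks in the table can be explored with a short script. This is a sketch in Python, assuming its `re` module agrees with the form engine for these basic constructs; the helper `matches` and the `CHECKS` list are made up for illustration.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True when the whole text matches the pattern (ASCII shorthand classes)."""
    return re.fullmatch(pattern, text, flags=re.ASCII) is not None


# (pattern, sample text, whether the sample is expected to match)
CHECKS = [
    (r"[abc]", "b", True),           # one character from the set
    (r"[^abc]", "b", False),         # negated set rejects a, b and c
    (r"\d{2,4}", "123", True),       # between two and four digits
    (r"\d{2,4}", "12345", False),    # five digits is too many
    (r"(a|b)\1", "aa", True),        # \1 repeats what group 1 matched
    (r"(a|b)\1", "ab", False),       # group 1 matched "a", so "b" fails
    (r"\w+\s\w+", "two words", True),
]

if __name__ == "__main__":
    for pattern, text, expected in CHECKS:
        print(pattern, repr(text), matches(pattern, text), expected)
```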

Characters with accents

Regex	Description
`[A-zÀ-ú]`	Accepts lowercase and uppercase accented characters (the ranges also accept [ ] ^ \ × ÷)
`[A-zÀ-ÿ]`	Accepts lowercase and uppercase accented characters, including letters with an umlaut (also accepts [ ] ^ \ × ÷)
`[A-Za-zÀ-ÿ]`	Accepts lowercase and uppercase accented characters, but not [ ] ^ \
`[A-Za-zÀ-ÖØ-öø-ÿ]`	Accepts lowercase and uppercase accented characters, but not [ ] ^ \ × ÷
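
The difference between these character classes comes from the code points each range spans. The sketch below, in Python, compares the broadest and the strictest class from the table; it assumes the input uses precomposed accented characters (it normalizes to NFC first), and the names `LOOSE`, `STRICT` and `only_contains` are hypothetical.

```python
import re
import unicodedata

LOOSE = "[A-zÀ-ú]"             # A-z also spans [ \ ] ^ _ and the backtick
STRICT = "[A-Za-zÀ-ÖØ-öø-ÿ]"   # skips × (between Ö and Ø) and ÷ (between ö and ø)


def only_contains(char_class: str, text: str) -> bool:
    """True when every character of text belongs to the given class."""
    normalized = unicodedata.normalize("NFC", text)
    return re.fullmatch(f"{char_class}+", normalized) is not None


if __name__ == "__main__":
    for sample in ["José", "a^b", "a×b"]:
        print(sample, only_contains(LOOSE, sample), only_contains(STRICT, sample))
```

Both classes accept `José`, but only the loose one lets `a^b` and `a×b` through, which is usually not what a name field should allow.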

Examples related to use of numbers

For all text type questions that accept numbers, remember to enter numbers under the appearance column.

Regex	Description
`regex(., '^[0-9]{10}$')` or `regex(., '^\d{10}$')`	Restrict mobile number to ten digits
`regex(., '^[0-9]{4}\.[0-9]{2}\.[0-9]{2}$')` or `regex(., '^\d{4}\.\d{2}\.\d{2}$')`	Restrict an input to the format `1234.56.78`
`regex(., '^[0-9]{2}$') and (. >= 1)`	Restrict an input to two digits between `01` and `99`, where a single-digit entry (like `1` or `2`) is not allowed
`regex(., '^(12|345)$')`	Restrict an input to either `12` or `345`
`regex(., '^[1-9][0-9]{8}$')` or `regex(., '^[^0][0-9]{8}$')`	Restrict an input to nine digits where the first digit can’t be `0` (the second form relies on the numbers appearance, because `[^0]` on its own also accepts non-digit characters)
`regex(., '^\d$')`	Restrict an input to one digit in between `0` to `9`
`regex(., '^\d{5}$')`	Restrict an input to five digits in between `0` to `9`
`regex(., '^\d{2}\.\d{3}$')`	Restrict an input to two digits and three decimals (e.g. `12.345`)
`regex(., '^\d{2}(\.\d{3})?$')`	Restrict an input to two digits and three decimals (while the decimals are optional) (e.g. `12` or `12.345`)
`regex(., '^[0-9]{10}$') or regex(., '^[0-9]{13}$') or regex(., '^[0-9]{17}$')` or `regex(., '^([0-9]{10}|[0-9]{13}|[0-9]{17})$')` or `regex(., '^([\d]{10}|[\d]{13}|[\d]{17})$')`	Restrict an input to `10`, `13` or `17` digit numbers only
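
Two of the ideas in this table, an optional group and alternation between fixed lengths, can be checked with a short sketch. It is written in Python under the assumption that `re` treats these patterns like the form engine does; `matches` is a hypothetical helper.

```python
import re

OPTIONAL_DECIMALS = r"^\d{2}(\.\d{3})?$"
TEN_13_OR_17_DIGITS = r"^([0-9]{10}|[0-9]{13}|[0-9]{17})$"


def matches(pattern: str, text: str) -> bool:
    """True when the whole text matches the pattern (ASCII digits only)."""
    return re.fullmatch(pattern, text, flags=re.ASCII) is not None


if __name__ == "__main__":
    # The decimal part is optional, but when present it needs three digits.
    for value in ["12", "12.345", "12.3"]:
        print(value, matches(OPTIONAL_DECIMALS, value))
    # Only the listed lengths pass; an 11-digit value does not.
    for length in [10, 11, 13, 17]:
        print(length, matches(TEN_13_OR_17_DIGITS, "1" * length))
```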

Examples related to use of letters

Regex	Description
`regex(., '^[a-z]{1,6}$')`	Restrict an input to any lowercase letters (up to 6 characters long)
`regex(., '^[A-Z]{1,10}$')`	Restrict an input to any uppercase letters (up to 10 characters long)
`regex(., '^(Apple|Orange|Banana)$')`	Restrict an input to only `Apple`, `Orange` or `Banana`
`regex(., '^p(ea|ai)r$')`	Restrict an input to only `pear` or `pair`
`regex(., '^[A-Z]{1}[a-z]{1,}[ ]{1}[A-Z]{1}[a-z]{1,}$')`	Restrict an input of the beneficiary’s name where the initials of the first name and last name are uppercase e.g. `Kobe Bryant`
`regex(., '^\w{1,}\s(\w{1,})?(\s)?\w{1,}$')`	Restrict an input of the beneficiary’s name with first name, middle name (if any) and last name e.g. `Kobe Bean Bryant`
`regex(., '^([A-Z]{1}[a-z]{1,}\s)([A-Z]{1}[a-z]{1,}\s?)+$')` or `regex(., '^([A-Z]{1}[a-z]{1,}\s?)+$')`	Restrict an input of the beneficiary’s full name where the initial of each name is uppercase and the name is long (often more than 3 words) e.g. `Samayamantri Venkata Rama Naga Butchi Anjaneya Satya Krishna Vijay` (this is an example of a South Indian name)
`regex(., '^(\D+)\s(\D+)\s?\1$')`	Restrict an input of the beneficiary’s name so that enumerators are forced to enter the first name twice, which helps capture the exact spelling e.g. `Kobe Bryant Kobe`. (This can be helpful when documenting beneficiary details where a typo could be costly)
`regex(., '^(\D+)\s(\D+)\s?\2$')`	Restrict an input of the beneficiary’s name so that enumerators are forced to enter the last name twice, which helps capture the exact spelling e.g. `Kobe Bryant Bryant`. (This can be helpful when documenting beneficiary details where a typo could be costly)
`regex(., '^colou?r$')`	Restrict a character within a word by using the `?` (quantifier) e.g. allow either `color` or `colour` as an input
`regex(., '^ah*!$')`	Restrict a character within a word by using the `*` (quantifier) e.g. allow either `a!` or `ah!` or `ahh!` or `ahhh!` and so on as an input
`regex(., '^ah+!$')`	Restrict a character within a word by using the `+` (quantifier) e.g. allow either `ah!` or `ahh!` or `ahhh!` and so on as an input
`regex(., '^\D$')`	Restrict an input to a non-digit character (e.g. `a` or `c` or `!` or `#` or `%` etc.)
`regex(., '^\D{5}$')`	Restrict an input to five non-digit characters (e.g. `aZc!#` etc.)
`regex(., '^[A-Z\s]+$')`	Restrict all the text input to uppercase, excluding special characters (! @ # $ % ^ & * ( ) . , ? / ” ‘ etc.)
`regex(., '^[A-Z\W]+$')`	Restrict all the text input to uppercase, including special characters (! @ # $ % ^ & * ( ) . , ? / ” ‘ etc.)
`regex(., '^[a-z\s]+$')`	Restrict all the text input to lowercase, excluding special characters (! @ # $ % ^ & * ( ) . , ? / ” ‘ etc.)
`regex(., '^[a-z\W]+$')`	Restrict all the text input to lowercase, including special characters (! @ # $ % ^ & * ( ) . , ? / ” ‘ etc.)
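
The quantifier and backreference examples above are easier to follow when run against sample answers. The following Python sketch assumes that `re` handles `?`, `*`, `+` and `\2` in the same way as the form engine; the helper `matches` is hypothetical.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    # ? makes the "u" optional.
    print(matches(r"^colou?r$", "color"), matches(r"^colou?r$", "colour"))
    # * allows zero "h" characters, + requires at least one.
    print(matches(r"^ah*!$", "a!"), matches(r"^ah+!$", "a!"))
    # \2 must repeat exactly what the second group captured.
    repeat_last_name = r"^(\D+)\s(\D+)\s?\2$"
    print(matches(repeat_last_name, "Kobe Bryant Bryant"))
    print(matches(repeat_last_name, "Kobe Bryant Briant"))
```

In the last pair, the misspelled repetition fails because the text after the second name is not identical to what the second group captured.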

Examples related to use of a combination of letters and numbers

XLSForm Regex	Description
`regex(., '^\w$')`	Restrict an input to one character between `a` to `z` or `A` to `Z` or `0` to `9` or `_` (i.e. match one character from `[a-zA-Z0-9_]`)
`regex(., '^\w{3}$')`	Restrict an input to three characters between `a` to `z` or `A` to `Z` or `0` to `9` or `_` (i.e. match three characters from `[a-zA-Z0-9_]`)
`regex(., '^[A-Z]{3}[_][A-Z]{3}[_][0-9]{4}[_][0-9]{4}$')`	Restrict your beneficiary ID to a specific format e.g. `CAR_PRC_2020_0048`
`regex(., '^CAR-PRC-2020-[0-9]{4}$')`	Restrict your beneficiary ID to a specific format e.g. `CAR-PRC-2020-0048` (where the enumerators should enter an exact match from `CAR` to `-` i.e. `CAR-PRC-2020-` and can enter any 4 digit serial number)
`regex(., '^[$£]\d{3}$')`	Restrict a currency input of three digits with a currency sign (either `dollar` or `pound`) in front (e.g. `$999` or `£500`)
`regex(., '^\W*(\w+\b\W*){3}$')`	Restrict an input to an exact number of words (e.g. to restrict to exactly 3 words `I love you.`)
`regex(., '^\W*(\w+\b\W*){3,5}$')`	Restrict an input to a range of words (e.g. between `3` and `5` words)
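
The two beneficiary ID formats from the table can be compared side by side. This is a minimal Python sketch, assuming `re` matches these patterns like the form engine; the constant names and the helper `matches` are hypothetical.

```python
import re

# Any three uppercase letters, three more, then two groups of four digits.
ID_GENERIC = r"^[A-Z]{3}[_][A-Z]{3}[_][0-9]{4}[_][0-9]{4}$"
# A fixed prefix followed by a four digit serial number.
ID_FIXED_PREFIX = r"^CAR-PRC-2020-[0-9]{4}$"


def matches(pattern: str, text: str) -> bool:
    """True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    print(matches(ID_GENERIC, "CAR_PRC_2020_0048"))
    print(matches(ID_GENERIC, "car_prc_2020_0048"))       # lowercase is rejected
    print(matches(ID_FIXED_PREFIX, "CAR-PRC-2020-0048"))
    print(matches(ID_FIXED_PREFIX, "CAR-PRC-2021-0048"))  # prefix must match exactly
```

The generic form suits surveys that cover several sites or years, while the fixed prefix catches more typing errors when every ID shares the same stem.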

Examples related to the restriction of valid email inputs

These examples are purely illustrative and should be adjusted for your use-case. Using regex for constraining email addresses does not guarantee that they are valid, only that they follow an expected pattern.

Regex	Description
`regex(., '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*(\.[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*)*\.[a-zA-Z]{2,}$')`	Restrict input to a valid email address format (e.g. `example@domain.com` or `example@domain.com.np`).
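
An email pattern is worth testing against both good and bad samples before use. The Python sketch below uses one illustrative pattern in which hyphenated parts of the domain are optional; it is an assumption for demonstration, checks the shape of the address only, and `looks_like_email` is a hypothetical name.

```python
import re

EMAIL_PATTERN = re.compile(
    r"^[a-zA-Z0-9._%+-]+"                      # local part before the @
    r"@[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*"           # first domain label, hyphens inside only
    r"(\.[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*)*"       # any further labels
    r"\.[a-zA-Z]{2,}$"                         # final label of two or more letters
)


def looks_like_email(value: str) -> bool:
    """True when value follows the expected address shape."""
    return EMAIL_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    samples = [
        "example@domain.com",
        "example@domain.com.np",
        "user@my-site.org",
        "example.domain.com",   # no @ sign
        "user@domain",          # no final label
    ]
    for sample in samples:
        print(sample, looks_like_email(sample))
```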

Examples related to use of time inputs

Regex	Description
`regex(., '^([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]$')`	Restrict a time input to `24` hour format (`HH:MM:SS`, from `00:00:00` to `23:59:59`)
`regex(., '^(0[1-9]|1[0-2]):[0-5][0-9]:[0-5][0-9]$')`	Restrict a time input to `12` hour format (`HH:MM:SS`, from `01:00:00` to `12:59:59`)
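
Square brackets list individual characters rather than numeric ranges, so a range of hours has to be spelled out digit by digit. The Python sketch below shows one way to do that with alternation, assuming a strict two-digit `HH:MM:SS` format; the constant and function names are hypothetical.

```python
import re

# Hours 00-19 or 20-23, then minutes and seconds 00-59.
TIME_24H = r"^([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]$"
# Hours 01-09 or 10-12, then minutes and seconds 00-59.
TIME_12H = r"^(0[1-9]|1[0-2]):[0-5][0-9]:[0-5][0-9]$"


def is_time(pattern: str, value: str) -> bool:
    """True when value is a time in the format the pattern describes."""
    return re.fullmatch(pattern, value) is not None


if __name__ == "__main__":
    for value in ["23:59:59", "24:00:00", "7:05:00"]:
        print("24h", value, is_time(TIME_24H, value))
    for value in ["12:30:00", "13:00:00", "00:15:00"]:
        print("12h", value, is_time(TIME_12H, value))
```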

Considerations when using regex

If you wish to use a regex constraint on a number in a text type question, make sure you always have the value numbers under the appearance column. This hides the letters during data entry, so that only numbers are available for input.
The Collect Android app and Enketo handle regex differently. Collect behaves as if the expression were wrapped in the anchors ^ and $ (even if you have not used them), while Enketo requires the anchors for an exact match.
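
The effect of missing anchors can be modelled with two matching modes. The Python sketch below is only an analogy for the behaviour described above, not the code either app runs: `whole_answer_match` stands in for implicit anchoring and `partial_match` for matching anywhere in the answer, and both names are hypothetical.

```python
import re


def whole_answer_match(pattern: str, value: str) -> bool:
    """Model of implicit anchoring: the entire answer must match."""
    return re.fullmatch(pattern, value) is not None


def partial_match(pattern: str, value: str) -> bool:
    """Model of unanchored matching: any part of the answer may match."""
    return re.search(pattern, value) is not None


if __name__ == "__main__":
    eleven_digits = "12345678901"
    # Without anchors the two modes disagree on an 11 digit answer.
    print(whole_answer_match(r"[0-9]{10}", eleven_digits))
    print(partial_match(r"[0-9]{10}", eleven_digits))
    # With explicit anchors both modes reject it.
    print(whole_answer_match(r"^[0-9]{10}$", eleven_digits))
    print(partial_match(r"^[0-9]{10}$", eleven_digits))
```

Writing `^` and `$` explicitly makes both modes agree, which is why including the anchors is the safer habit when a form is used in both apps.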
