Restricting Text Responses With Regular Expressions

Last updated: 3 May 2024

A regular expression, or regex, is a search pattern used for matching specific characters and ranges of characters within a string. Regex is widely used to validate, search, extract, and restrict text in most programming languages. KoboToolbox supports regex to control the length and characters of the response entered for a particular question (e.g. restricting a mobile number to exactly 10 digits, or accepting only a valid email address).

To use a regex in KoboToolbox, follow these steps:

  1. Prepare a Text question type.

  2. Go to the question’s Settings.

  3. Go to Validation Criteria and choose the Manually enter your validation logic in XLSForm code option.

  4. In the Validation Code box, enter your regex formula between the quotation marks (' ') of the regex(., ' ') format. For reference, the period (.) refers to ‘this question’, while the regular expression inside the quotation marks (' ') needs to conform to the established regex rules.

  5. (Optional) Add a custom Error Message for the person entering data to see when they don’t meet the regex criteria.


Regex can also be coded in XLSForm, under the constraint column:

| type | name | label | appearance | constraint | constraint_message |
| --- | --- | --- | --- | --- | --- |
| text | q1 | Mobile number of respondent | numbers | regex(., '^[0-9]{10}$') | This value must be only 10 digits |
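To try the 10-digit constraint above before deploying a form, the same pattern can be tested outside KoboToolbox. The following is a minimal Python sketch and not part of KoboToolbox: the helper name `is_valid_mobile` is hypothetical, and Python's `re` module only approximates the regex engines used by the data collection apps.

```python
import re

# Same pattern as in the constraint column: exactly 10 digits.
MOBILE_PATTERN = r"^[0-9]{10}$"

def is_valid_mobile(value: str) -> bool:
    """Return True when the whole value is exactly 10 digits (hypothetical helper)."""
    # fullmatch requires the pattern to cover the entire string.
    return re.fullmatch(MOBILE_PATTERN, value) is not None

# Try a valid value, then values that are too short, too long, or contain letters.
for sample in ["0123456789", "12345", "01234567890", "01234abcde"]:
    print(sample, is_valid_mobile(sample))
```

Only the first sample satisfies the pattern; the others are too short, too long, or contain letters.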

Alternatively, you can create a calculate question type and then define the regex code under the calculation column. You could then use this variable as many times as needed in the survey:

| type | name | label | calculation | constraint | constraint_message |
| --- | --- | --- | --- | --- | --- |
| calculate | q0 | | '^[A-Z]{1}[a-z]{1,}\s[A-Z]{1}[a-z]{1,}$' | | |
| text | q1 | Name of the Enumerator | | regex(., ${q0}) | Please use this format: Kobe Bryant |
| text | q2 | Name of the Respondent | | regex(., ${q0}) | Please use this format: Kobe Bryant |
| integer | q3 | Age of the Respondent | | | |
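The idea of defining the pattern once and reusing it for several questions can be illustrated outside the form as well. This is a rough Python sketch under stated assumptions: `NAME_PATTERN` plays the role of the calculate question q0, `matches_name_format` is a hypothetical helper, and the sample answers are made up for illustration.

```python
import re

# Pattern stored once, like the calculate question q0 in the form above:
# a capitalised word, one whitespace character, then another capitalised word.
NAME_PATTERN = r"^[A-Z]{1}[a-z]{1,}\s[A-Z]{1}[a-z]{1,}$"

def matches_name_format(value: str) -> bool:
    """Return True when the whole value follows the 'Kobe Bryant' format."""
    return re.fullmatch(NAME_PATTERN, value) is not None

# Hypothetical answers to q1 and q2, both checked against the same pattern.
answers = {"q1": "Kobe Bryant", "q2": "kobe bryant"}
results = {name: matches_name_format(value) for name, value in answers.items()}
print(results)
```

Because both questions refer to the same stored pattern, a later change to the pattern applies to every question that uses it.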

How do I build the regex that I need?

In addition to the examples and tips provided below, please visit this website for more help and examples.

Regex in KoboToolbox should always be written between the apostrophes of regex(., ' '), as shown in the examples.

    Regex          Description
    ^              The caret symbol matches the start of a string without consuming any character.
    $              The dollar symbol matches the end of a string without consuming any character.
    [abc]          Matches either a, b or c from within the square brackets [ ].
    [a-z]          Matches any lowercase character from a to z.
    [A-Z]          Matches any uppercase character from A to Z.
    [0-9]          Matches any single digit from 0 to 9.
    [a-zA-Z0-9]    Matches any character from a to z, A to Z or 0 to 9.
    [^abc]         Matches any character except a, b or c.
    [^A-Z]         Matches any character except those in the range A to Z.
    (apple)        Parentheses ( ) group a sub-pattern; here the group matches the exact sequence apple.
    |              A vertical bar matches either of the alternatives it separates (e.g. apple|orange matches apple or orange).
    \              A backslash is used to match the literal value of any metacharacter (e.g. try using \. or \@ or \$ while building regex).
    \number        Matches the same text as most recently matched by the nth (number used) capturing group.
    \s             Matches any whitespace character, such as a space or tab.
    \b             Matches, without consuming any characters, the position between a character matched by \w and a character not matched by \w (in either order). \b is also known as the word boundary.
    \d             Matches any digit (equivalent to [0-9]).
    \D             Matches any character other than a digit (0 to 9).
    \w             Matches any word character (i.e. a to z or A to Z or 0 to 9 or _).
    \W             Matches anything other than what \w matches (e.g. punctuation, symbols and spaces).
    ?              A question mark used just after a character makes that character optional (it matches zero or one occurrence).
    *              An asterisk used just after a character matches zero or more consecutive occurrences of that character.
    +              A plus symbol used just after a character matches one or more consecutive occurrences of that character.
    {x}            Matches exactly x consecutive occurrences of the preceding character.
    {x,}           Matches at least x consecutive occurrences of the preceding character.
    {x,y}          Matches between x and y consecutive occurrences of the preceding character.
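A few of the building blocks above can be explored with a short script. This is an illustrative Python sketch only: the sample patterns and texts are made up, and Python's `re` module may differ in small details from the regex engines used by the data collection apps.

```python
import re

# (pattern, sample text, whether the pattern should match the whole text)
examples = [
    (r"[abc]", "b", True),                # one character from the set
    (r"[^abc]", "b", False),              # anything except a, b or c
    (r"(apple|orange)", "orange", True),  # grouping with alternation
    (r"colou?r", "color", True),          # ? makes the u optional
    (r"ab*", "a", True),                  # * allows zero b characters
    (r"ab+", "a", False),                 # + needs at least one b
    (r"\d{3}", "123", True),              # exactly three digits
    (r"\d{2,4}", "12345", False),         # five digits exceed the maximum of four
    (r"\w+\.\w+", "first.last", True),    # \. matches a literal period
    (r"(\d)\1", "77", True),              # \1 repeats the text matched by group 1
    (r"(\d)\1", "78", False),
]

for pattern, text, expected in examples:
    matched = re.fullmatch(pattern, text) is not None
    print(f"{pattern!r:20} {text!r:14} {matched}")

# \b is a zero-width word boundary, so it is easiest to see with a search.
whole_word = re.search(r"\bcat\b", "a cat sat") is not None      # cat as a whole word
inside_word = re.search(r"\bcat\b", "concatenate") is not None   # cat inside a longer word
print(whole_word, inside_word)
```

Each row pairs a pattern with a sample text and the result of a whole-value match, so an entry can be edited and rerun to test a different pattern.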

Characters with accents

    Regex                 Description
    [A-zÀ-ú]              Accepts lowercase and uppercase letters, including accented characters.
    [A-zÀ-ÿ]              Accepts lowercase and uppercase letters, including accented characters and letters with an umlaut (also matches [ ] ^ \ × ÷).
    [A-Za-zÀ-ÿ]           Accepts lowercase and uppercase letters, including accented characters, but does not match [ ] ^ \.
    [A-Za-zÀ-ÖØ-öø-ÿ]     Accepts lowercase and uppercase letters, including accented characters, but does not match [ ] ^ \ × ÷.
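The differences between these four character classes are easier to see by testing a handful of characters against each one. This is a small Python sketch for illustration: the `accepted` helper and the chosen sample characters are assumptions, not part of KoboToolbox.

```python
import re

# The four character classes from the table above.
CLASSES = ["[A-zÀ-ú]", "[A-zÀ-ÿ]", "[A-Za-zÀ-ÿ]", "[A-Za-zÀ-ÖØ-öø-ÿ]"]

# Sample characters written as escapes: é, ü, [ and ×.
SAMPLES = ["\u00e9", "\u00fc", "[", "\u00d7"]

def accepted(char_class: str) -> list:
    """Return the sample characters that the character class matches."""
    return [ch for ch in SAMPLES if re.fullmatch(char_class, ch)]

for char_class in CLASSES:
    print(char_class, accepted(char_class))
```

Of these samples, only the last class accepts the two accented letters while rejecting both the bracket and the multiplication sign.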

Considerations when using regex

  • If you wish to use a regex constraint on a number in a text type question, make sure you always have the value numbers under the appearance column. This hides the alphabetic keys, so only numbers are available for input.

  • The Collect Android app and Enketo handle regex differently. Collect behaves as if the anchors ^ and $ were placed around the expression (even if you have not used them), while Enketo requires the anchors for an exact match.
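The effect of the anchors can be imitated in a short script. This Python sketch is only an analogy, assuming the behaviours described above: the hypothetical helper `found_anywhere` stands in for a check without implicit anchors, and `matches_whole_value` for a check that behaves as if ^ and $ were always present. It does not run either app's actual regex engine.

```python
import re

def found_anywhere(pattern: str, value: str) -> bool:
    """Unanchored check: the pattern may match any part of the value."""
    return re.search(pattern, value) is not None

def matches_whole_value(pattern: str, value: str) -> bool:
    """Implicitly anchored check: the pattern must cover the entire value."""
    return re.fullmatch(pattern, value) is not None

value = "abc0123456789xyz"

print(found_anywhere(r"[0-9]{10}", value))       # unanchored pattern, partial match allowed
print(matches_whole_value(r"[0-9]{10}", value))  # same pattern, whole value required
print(found_anywhere(r"^[0-9]{10}$", value))     # explicit anchors
```

Writing the anchors explicitly, as in the last call, makes both kinds of check agree, which is why including ^ and $ in every constraint is the safer habit.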