1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
|
# Regular Expression Tokenizer
Tokenizes strings that represent a regular expressions.
[![Build Status](https://secure.travis-ci.org/fent/ret.js.svg)](http://travis-ci.org/fent/ret.js)
[![Dependency Status](https://david-dm.org/fent/ret.js.svg)](https://david-dm.org/fent/ret.js)
[![codecov](https://codecov.io/gh/fent/ret.js/branch/master/graph/badge.svg)](https://codecov.io/gh/fent/ret.js)
# Usage
```js
var ret = require('ret');
var tokens = ret(/foo|bar/.source);
```
`tokens` will contain the following object
```js
{
"type": ret.types.ROOT
"options": [
[ { "type": ret.types.CHAR, "value", 102 },
{ "type": ret.types.CHAR, "value", 111 },
{ "type": ret.types.CHAR, "value", 111 } ],
[ { "type": ret.types.CHAR, "value", 98 },
{ "type": ret.types.CHAR, "value", 97 },
{ "type": ret.types.CHAR, "value", 114 } ]
]
}
```
# Token Types
`ret.types` is a collection of the various token types exported by ret.
### ROOT
Only used in the root of the regexp. This is needed due to the posibility of the root containing a pipe `|` character. In that case, the token will have an `options` key that will be an array of arrays of tokens. If not, it will contain a `stack` key that is an array of tokens.
```js
{
"type": ret.types.ROOT,
"stack": [token1, token2...],
}
```
```js
{
"type": ret.types.ROOT,
"options" [
[token1, token2...],
[othertoken1, othertoken2...]
...
],
}
```
### GROUP
Groups contain tokens that are inside of a parenthesis. If the group begins with `?` followed by another character, it's a special type of group. A ':' tells the group not to be remembered when `exec` is used. '=' means the previous token matches only if followed by this group, and '!' means the previous token matches only if NOT followed.
Like root, it can contain an `options` key instead of `stack` if there is a pipe.
```js
{
"type": ret.types.GROUP,
"remember" true,
"followedBy": false,
"notFollowedBy": false,
"stack": [token1, token2...],
}
```
```js
{
"type": ret.types.GROUP,
"remember" true,
"followedBy": false,
"notFollowedBy": false,
"options" [
[token1, token2...],
[othertoken1, othertoken2...]
...
],
}
```
### POSITION
`\b`, `\B`, `^`, and `$` specify positions in the regexp.
```js
{
"type": ret.types.POSITION,
"value": "^",
}
```
### SET
Contains a key `set` specifying what tokens are allowed and a key `not` specifying if the set should be negated. A set can contain other sets, ranges, and characters.
```js
{
"type": ret.types.SET,
"set": [token1, token2...],
"not": false,
}
```
### RANGE
Used in set tokens to specify a character range. `from` and `to` are character codes.
```js
{
"type": ret.types.RANGE,
"from": 97,
"to": 122,
}
```
### REPETITION
```js
{
"type": ret.types.REPETITION,
"min": 0,
"max": Infinity,
"value": token,
}
```
### REFERENCE
References a group token. `value` is 1-9.
```js
{
"type": ret.types.REFERENCE,
"value": 1,
}
```
### CHAR
Represents a single character token. `value` is the character code. This might seem a bit cluttering instead of concatenating characters together. But since repetition tokens only repeat the last token and not the last clause like the pipe, it's simpler to do it this way.
```js
{
"type": ret.types.CHAR,
"value": 123,
}
```
## Errors
ret.js will throw errors if given a string with an invalid regular expression. All possible errors are
* Invalid group. When a group with an immediate `?` character is followed by an invalid character. It can only be followed by `!`, `=`, or `:`. Example: `/(?_abc)/`
* Nothing to repeat. Thrown when a repetitional token is used as the first token in the current clause, as in right in the beginning of the regexp or group, or right after a pipe. Example: `/foo|?bar/`, `/{1,3}foo|bar/`, `/foo(+bar)/`
* Unmatched ). A group was not opened, but was closed. Example: `/hello)2u/`
* Unterminated group. A group was not closed. Example: `/(1(23)4/`
* Unterminated character class. A custom character set was not closed. Example: `/[abc/`
# Install
npm install ret
# Tests
Tests are written with [vows](http://vowsjs.org/)
```bash
npm test
```
# License
MIT
|