Hide this comment

Ooh Brian you're so unfunctional today... kiddin :-)

It is indeed a nice exercise. Unfortunately it can't be done with regexes because your language is context free (more general than a regular language). It requires a stack machine to be parsed. So here we go with the state machine implementation... The stack state is given by the 'nesting' variable.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
open System


let newToken = ""


let eatChar (nesting, (currentToken : String), (prevTokens : String list)) currentChar =
    match currentChar with
        (* Split a token on spaces only if it is at top level *)
        | ' ' -> if nesting = 0 then (0, newToken, List.append prevTokens [currentToken])
                 else (nesting, currentToken + currentChar.ToString(), prevTokens)
        | '(' -> if nesting = 0 then (1, newToken, List.append prevTokens [currentToken])
                 else (nesting + 1, currentToken + currentChar.ToString(), prevTokens)
        | ')' -> if nesting = 1 then (0, newToken, List.append prevTokens [currentToken])
                 else (nesting - 1, currentToken + currentChar.ToString(), prevTokens)
        | _   -> (nesting, currentToken + currentChar.ToString(), prevTokens)


let parse (string : String) =
    let (_, finalToken, tokens) = Seq.fold eatChar (0, "", []) string
    List.append tokens [finalToken] |> List.map (fun s -> s.Trim()) |> List.filter (String.length >> (<) 0)

It doesn't fully respect your example because it removes () from the generated tokens. I'm guessing it would still be ok. It's easy to extend to generate a tree of tokens (full parsing).

1
2
3
> parse "(alpha (brown) (charlie delta)) echo (foxtrot golf)";;
val it : string list =
  ["alpha (brown) (charlie delta)"; "echo"; "foxtrot golf"]

Mau

-Edited to correct an issue with spaces.-

By on 5/21/2010 2:36 PM ()Reply
Hide this comment

Thanks Brian and Mau.

You guys are really good although I'm bit disappointed that it's not using regex. I thought regex is a magical tool that can deal with everything. I didn't know it can't be done with regex (no wonder why it's so hard to find a right pattern :P).

By on 5/21/2010 5:38 PM ()Reply
Hide this comment

Hey Alex, if you knew that you were going to have at most n of levels of nesting, then you could use a regex. But I think it would grow ugly pretty quick as n increases.

By on 5/22/2010 3:06 AM ()Reply
Hide this comment

I see. I thought it could be done with a recursive function that might be able to handle nested brackets. How about generate regex patterns dynamically? Say evaluate the string once so find out the depth of nested brackets (I guess this one would be really simple and quick.) and then pragmatically produce right regex patterns for it... Does this sound crazy? :P

Thanks for your explanation and the source codes.

Regards,

By on 5/22/2010 5:35 AM ()Reply
Hide this comment

> I see. I thought it could be done with a recursive function that might be able to handle nested brackets.

It can. The recursion gives you the stack.

> How about generate regex patterns dynamically? Say evaluate the string once so find out the depth of nested brackets (I guess this one would be really simple and quick.)

> and then pragmatically produce right regex patterns for it... Does this sound crazy? :P

Well if you scan the input to count brackets you may as well tokenize while you're at it.

I'd like to see your dyamic RE idea implemented though! :-)

By on 5/22/2010 6:54 AM ()Reply
Hide this comment

anyone think about using FParsec for this kind of thing ?

The other useful one I used in Lua(when I need to write a json parser) is PEG(i.e. LPEG in lua). I found it to be very powerful and seems to be very fast too.

By on 5/22/2010 8:12 AM ()Reply
Hide this comment

I hacked this because it was fun, but I don't necessarily claim that the code is good or correct.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
 
type Data =
    | Atom of string
    | List of Data list

let parse (input:string) =
    let max = input.Length 
    let rec skipWS i =
        if i < max && input.[ i] = ' ' then skipWS (i+1) else i
    let rec parseList n =
        assert(input.[ n] = '(')
        let mutable i = skipWS(n+1)
        let l = new ResizeArray<_>()
        while i < max && input.[ i] <> ')' do
            let data, j = parseData i
            l.Add(data)
            i <- j
        if i=max then failwith "bad parse - missing )"
        assert(input.[ i] = ')')
        List(Seq.toList l), i+1
    and parseAtom n =
        let mutable i = skipWS(n)
        if i=max then failwith "bad parse - expected atom"
        let sb = new System.Text.StringBuilder()
        while i < max && input.[ i] <> ' ' && input.[ i] <> '(' && input.[ i] <> ')' do
            sb.Append(input.[ i]) |> ignore
            i <- i + 1
        if sb.Length = 0 then failwith "bad parse - expected atom"
        Atom(sb.ToString()), i
    and parseData n =
        let i = skipWS(n)
        if input.[ i] = '(' then
            parseList i
        else
            parseAtom i
    let r, i = parseList 0
    r
        
let s = "(alpha (brown) (charlie delta)) echo (foxtrot golf)"
printfn "%A" (parse ("("+s+")"))

By on 5/21/2010 11:19 AM ()Reply
IntelliFactory Offices Copyright (c) 2011-2012 IntelliFactory. All rights reserved.
Home | Products | Consulting | Trainings | Blogs | Jobs | Contact Us | Terms of Use | Privacy Policy | Cookie Policy
Built with WebSharper

Logging in...