Tuesday, May 3, 2011

php string parse with look ahead

I have this string in PHP:

$string = "name=Shake & Bake&difficulty=easy";

I want to parse it into this array:

Array
(
    [name] => Shake & Bake 
    [difficulty] => easy
)

NOT:

Array
(
    [name] => Shake
    [difficulty] => easy
)

What is the most efficient way to do this?

From stackoverflow
  • Regular expressions seem to be the best way to do this.

    <html>
    <head>
      <title>Test params</title>
    </head>
    <body>
    <?php
    test_build('a=b');
    test_build('blah=foo&foo=foo2');
    test_build('blah=foo&foo&foo2=foo3&foo');
    
    function test_build($string) {
      echo "<p>Testing: $string</p>\n";
      $params = build_params($string);
      if ($params) {
        echo "<ul>\n";
        foreach ($params as $k => $v) {
          echo "<li>'$k' => '$v'</li>\n";
        }
        echo "</ul>\n";
      } else {
        echo "<p>Found no parameters.</p>\n";
      }
    }
    
    function build_params($string) {
      preg_match_all('!([^=&]+)=([^=]*)(?=(&|$))!', $string, $matches);
      $ret = array();
      for ($i=0; $i<sizeof($matches[1]); $i++) {
        $ret[$matches[1][$i]] = $matches[2][$i];
      }
      return $ret;
    }
    ?>
    </body>
    </html>
    

    Output:

    Testing: a=b
    
        * 'a' => 'b'
    
    Testing: blah=foo&foo=foo2
    
        * 'blah' => 'foo'
        * 'foo' => 'foo2'
    
    Testing: blah=foo&foo&foo2=foo3&foo
    
        * 'blah' => 'foo&foo'
        * 'foo2' => 'foo3&foo'
    
    bob : My assumption is you can end up with a & in the value, but not the key.
    cletus : Like blah=foo&foo&blah2=foo should be { 'blah' => 'foo&foo', 'blah2' => 'foo' }? If so that's a little more complicated.
    bob : yes, blah=foo&foo&blah2=foo should be { 'blah' => 'foo&foo', 'blah2' => 'foo' }
  • There's probably a more effective way of doing this, but try

    $foo = 'name=Shake & Bake&difficulty=easy';
    $pairs = preg_split('{&(?=[^\s])}',$foo);
    //$pairs = preg_split('/&(?=[^\s])/',$foo); //equivalent, using different delimiters.
    //$pairs = preg_split('%&(?=[^\s])%',$foo); //equivalent, using different delimiters.
    $done = Array();
    foreach($pairs as $keyvalue){
     $parts = preg_split('{=}',$keyvalue);
     $done[$parts[0]] = $parts[1];
    }
    print_r($done);
    

    PHP's regex engine is PCRE, and it supports look ahead assertions. Googling around for PCRE, PHP, RegEx, look ahead assertions and zero width assertions should give you more than you ever want to know on the subject.

    bob : Could you explain the meaning of the curly braces? From my understanding, they are used to specify repetition.
    Ben Blank : PHP's PCRE module uses "delimiters" on its expressions. Traditionally, the forward slash is used, but here Alan is using matching braces. The expressions here could also have been written '/&(?=[^\s])/' and '/=/'.
    Alan Storm : Sorry Bob, it's an old quirk of mine. PCRE makes (or lets) you put delimiter characters around regular expressions, which lets you include pattern modifiers directly after the expression, Perl style: /like this/six. They're usually forward slashes, but when I started hacking on regexes it was to munge through HTML, so I found it easier to pick a different delimiter, since I always needed a "/" in the pattern itself and didn't want to escape it. I picked "{}", which probably wasn't the best choice, as those are used for repetition in an expression. Sorry to confuse you.
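
As a follow-up to the look-ahead split above, here is a minimal sketch that tightens the assertion so an "&" only counts as a separator when it is followed by something that looks like "key=". The function name parse_pairs is hypothetical, and the sketch assumes keys never contain "&" or "=", as stated in the comments on the first answer.

```php
<?php
// Hypothetical helper (not from the original answers): split on '&' only
// when it introduces a new "key=" token.
function parse_pairs(string $input): array
{
    $result = [];
    // Lookahead: the '&' must be followed by one or more characters that are
    // neither '&' nor '=', and then an '='.
    foreach (preg_split('/&(?=[^&=]+=)/', $input) as $pair) {
        // A limit of 2 keeps any later '=' inside the value.
        $parts = explode('=', $pair, 2);
        if (count($parts) === 2) {
            $result[$parts[0]] = $parts[1];
        }
    }
    return $result;
}

print_r(parse_pairs('name=Shake&Bake&difficulty=easy'));
print_r(parse_pairs('blah=foo&foo&blah2=foo'));
```

A value that itself contains something like "&x=" would still be split at that point, so this only works under the assumption above.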
  • The function parse_str() does exactly what you need. Just make sure you pass the second parameter, so the values land in an array instead of being set as variables in the current scope. You need to translate your input string first, though:

    $string = "name=Shake & Bake&difficulty=easy";
    parse_str(str_replace(' & ', '+%26+', $string), $array);
    
    jmucchiello : I'm sure "name=Shake&Bake&difficulty=easy" will fail. The spaces were there to highlight the problem, not to suggest the format.
    soulmerge : Then you could add keys with empty values in the resulting array to the previous key. I'll update the answer.
    soulmerge : I just noticed that it would transform urlencoded stuff (%48 => H), which is possibly not what you want. I'll leave the answer as it was.
  • <?php
    $pattern = '/([^&]+)=([^=]+)(?=$|&[^=]+=)/';
    $test = array(
      'name=Shake & Bake&difficulty=easy',
      'name=Shake&Bake&difficulty=easy',
      'difficulty=easy&name=Shake & Bake',
      'difficulty=easy&name=Shake&Bake',
      'name=Shake&Bake',
      'difficulty=easy',
      'name=Shake&Bake&foo&difficulty=easy',
      'name=Shake&Bake&difficulty=easy&',
      'name=Shake&Bake&difficulty='
    );
    foreach ($test as $foo) {
      preg_match_all($pattern, $foo, $m);
      echo $foo, "\n";
      for ($i = 0; $i < count($m[0]); $i++) {
        echo '  ', $m[1][$i], ' => "', $m[2][$i], "\"\n";
      }
      echo "\n";
    }
    ?>
    produces
    name=Shake & Bake&difficulty=easy
      name => "Shake & Bake"
      difficulty => "easy"

    name=Shake&Bake&difficulty=easy
      name => "Shake&Bake"
      difficulty => "easy"

    difficulty=easy&name=Shake & Bake
      difficulty => "easy"
      name => "Shake & Bake"

    difficulty=easy&name=Shake&Bake
      difficulty => "easy"
      name => "Shake&Bake"

    name=Shake&Bake
      name => "Shake&Bake"

    difficulty=easy
      difficulty => "easy"

    name=Shake&Bake&foo&difficulty=easy
      name => "Shake&Bake&foo"
      difficulty => "easy"

    name=Shake&Bake&difficulty=easy&
      name => "Shake&Bake"
      difficulty => "easy&"

    name=Shake&Bake&difficulty=
      name => "Shake&Bake"

    which seems to be working (except for difficulty= not being matched in the last example).
    I'm not sure whether a once-only subpattern matching would improve the speed. You might want to look this up.
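
The unmatched "difficulty=" case mentioned above comes from the value group requiring at least one character. One possible variation, shown here only as a sketch, makes the value lazy and allows it to be empty. The function name match_pairs and the modified pattern are assumptions, not part of the original answer, and keys are again assumed to contain neither "&" nor "=".

```php
<?php
// Hypothetical variant of the pattern above: the value may be empty and is
// matched lazily up to the next "&key=" or the end of the string.
function match_pairs(string $input): array
{
    $pattern = '/([^&=]+)=(.*?)(?=&[^&=]+=|$)/';
    preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

    $result = [];
    foreach ($matches as $m) {
        // $m[1] is the key, $m[2] the (possibly empty) value.
        $result[$m[1]] = $m[2];
    }
    return $result;
}

print_r(match_pairs('name=Shake&Bake&difficulty='));
```

With this pattern the last test string should yield an empty string for difficulty rather than dropping the key.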

  • urlencode() the & in "Shake & Bake" and use parse_str()
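
If you control the code that produces the string, the encoding suggestion above might look like the following sketch. It assumes the data starts out as a PHP array (the variable names are illustrative); http_build_query() encodes each value, so the round trip through parse_str() is unambiguous.

```php
<?php
// Assumed starting point: the data is available as an array before it is
// serialized into a query-style string.
$data = ['name' => 'Shake & Bake', 'difficulty' => 'easy'];

// http_build_query() URL-encodes each value: the '&' becomes %26 and the
// spaces become '+'.
$query = http_build_query($data);
echo $query, "\n"; // name=Shake+%26+Bake&difficulty=easy

// parse_str() with a second argument fills an array and decodes the values.
parse_str($query, $parsed);
print_r($parsed);
```

This only helps when the producer can be changed. For strings that already contain raw "&" characters in values, one of the regex approaches is still needed.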
