cjhaas.com

PHP Base85 encode 128-bit integer (GUID/UUID)

Posted in PHP by Chris Haas on November 12th, 2013

I’m working with GUIDs in PHP and am trying to find the most compact way to represent a 128-bit integer via a string. Searches showed theory and C# code along with a Wikipedia article that referenced many languages except for PHP so I decided to roll my own. The code below relies on having the GNU Multiple Precision extension installed which I’ll leave up to you. You can test this by looking at RFC1924 Section 5 which has the number 21932261930451111902915077091070067066 which encodes to 4)+k&C#VzJ4br>0wv%Yp

  /**
   * Converts a 32-character guid string to a 20-character string using Base85
   *
   * @since  0.3.1
   *
   * @see  http://www.mathsisfun.com/binary-decimal-hexadecimal-converter.html  Hex/Dec/Binary conversion with large numbers.
   * @see  http://tools.ietf.org/html/rfc1924                                   Section 5 has sample 128-bit integers in different formats.
   *
   * @param  string $guid_as_hex_string A guid expressed as a hexidecimal string.
   * @return string                     A 20-character representation of the string encoded as base-85.
   */
  public static function convert_guid_to_base85($guid_as_hex_string){
    //Remove any non-guid characters
    $guid_as_hex_string = preg_replace('/[^0-9A-F]/i', '', $guid_as_hex_string);

    //Not a GUID, fail
    if(strlen($guid_as_hex_string) !== 32){
      return false;
    }

    //Possible characters
    $chars = array('0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
                   'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
                   'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
                   '!', '#', '$', '%', '&', '(', ')', '*', '+', '-', ';', '<', '=', '>', '?', '@', '^', '_', '`', '{', '|', '}', '~'
                  );

    //Build all 20 (nineteen through zero) powers of 85
    $powers = array();
    for($i = 19; $i >= 0; $i--){
      $powers[] = gmp_pow(85, $i);
    }

    //Create our large integer
    $t = gmp_init($guid_as_hex_string, 16);

    //Buf will be our return string, initialize as empty
    $buf = '';

    //Loop through each power in decending order
    foreach($powers as $pow){

      //Divide the current large number by the current power of 85
      //Returns an array, the first element is the integer division of the two, the second is remainder
      $ir = gmp_div_qr($t, $pow);

      //If the result is greater than or equal to 1
      if(gmp_cmp($ir[0], 1) >= 0){
        //Get the character at the integer representation of that position
        $buf .= $chars[gmp_intval($ir[0])];
      }else{
        //Otherwise get the character at the zero position
        $buf .= $chars[0];
      }

      //Reset our large number to just the remainder of the division
      $t = $ir[1];

      //Repeat
    }

    return $buf;
  }

This code isn’t getting run a million times per day so for future reading purposes I’m building the power array manually. The long form of it would be

    $powers = array(
                    gmp_pow(85, 19),
                    gmp_pow(85, 18),
                    gmp_pow(85, 17),
                    gmp_pow(85, 16),
                    gmp_pow(85, 15),
                    gmp_pow(85, 14),
                    gmp_pow(85, 13),
                    gmp_pow(85, 12),
                    gmp_pow(85, 11),
                    gmp_pow(85, 10),
                    gmp_pow(85,  9),
                    gmp_pow(85,  8),
                    gmp_pow(85,  7),
                    gmp_pow(85,  6),
                    gmp_pow(85,  5),
                    gmp_pow(85,  4),
                    gmp_pow(85,  3),
                    gmp_pow(85,  2),
                    gmp_pow(85,  1),
                    gmp_pow(85,  0),
                  );

And if you really want to optimize things (possibly) then you can just use the actual numbers. I didn’t bother testing perf on gmp so I don’t know (or care) what’s faster, math or string parsing.

    $powers = array(
                    gmp_init('4559944833472277161543903350830078125', 10),
                    gmp_init('53646409805556201900516510009765625', 10),
                    gmp_init('631134233006543551770782470703125', 10),
                    gmp_init('7425108623606394726715087890625', 10),
                    gmp_init('87354219101251702667236328125', 10),
                    gmp_init('1027696695308843560791015625', 10),
                    gmp_init('12090549356574630126953125', 10),
                    gmp_init('142241757136172119140625', 10),
                    gmp_init('1673432436896142578125', 10),
                    gmp_init('19687440434072265625', 10),
                    gmp_init('231616946283203125', 10),
                    gmp_init('2724905250390625', 10),
                    gmp_init('32057708828125', 10),
                    gmp_init('377149515625', 10),
                    gmp_init('4437053125', 10),
                    gmp_init('52200625', 10),
                    gmp_init('614125', 10),
                    gmp_init('7225', 10),
                    gmp_init('85', 10),
                    gmp_init('1', 10),
                  );