Atomically update serialized PHP arrays in MySQL

Okay, okay, it’s hard to find a use case for this when it’s so obvious that the correct way to handle one-to-many is with JOIN. But if you’re already committed to your schema and you decide you need to append serialized PHP data to a row atomically, you can cons serialized values with this query:

INSERT INTO tbl
  …
  serialized = "i:1;"
  ON DUPLICATE KEY UPDATE
    serialized = CONCAT(
      'a:3:{i:0;s:4:"cons";i:1;',
      VALUES(serialized),
      'i:2;',
      serialized,
      '}'
    )

After you have performed this three times with the serialized values 1, 2, and 3, the row contains this:

'a:3:{i:0;s:4:"cons";i:1;i:3;i:2;a:3:{i:0;s:4:"cons";i:1;i:2;i:2;i:1;}}'

After unserializing, deconstruct it with this function:

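function decons($list) {
    $res = array();
    while ( $list != array() ) {
        // A cons cell is array('cons', $value, $rest). Check the type first
        // so a bare value is never indexed like an array.
        if ( is_array( $list ) && isset( $list[0] ) && $list[0] === 'cons' ) {
            array_unshift( $res, $list[1] );
            $list = isset( $list[2] ) ? $list[2] : array();
        } else {
            // The oldest value was inserted bare, not wrapped in a cons cell.
            array_unshift( $res, $list );
            break;
        }
    }
    return $res;
}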

The result:

array(1, 2, 3)
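
To try the round trip at home without a database, here is a minimal sketch. cons_append() is a hypothetical helper, not part of the query above; it assumes the row does not exist before the first call and only mimics the string MySQL would leave in the column. decons_sketch() is a compact copy of the deconstruction loop so the block runs on its own.

```php
<?php
// Hypothetical stand-in for the query: returns what the column would hold
// after upserting $incoming (already serialized) on top of $stored.
function cons_append(?string $stored, string $incoming): string {
    if ($stored === null) {
        return $incoming; // no row yet: the plain INSERT stores the bare value
    }
    // Same pieces as the CONCAT() call: array('cons', $incoming, $stored)
    return 'a:3:{i:0;s:4:"cons";i:1;' . $incoming . 'i:2;' . $stored . '}';
}

// Compact copy of the deconstruction loop so this block is self-contained.
function decons_sketch($list): array {
    $res = array();
    while (is_array($list) && isset($list[0]) && $list[0] === 'cons') {
        array_unshift($res, $list[1]);
        $list = isset($list[2]) ? $list[2] : array();
    }
    if ($list !== array()) {
        array_unshift($res, $list); // oldest value, stored bare
    }
    return $res;
}

$column = null;
foreach (array(1, 2, 3) as $value) {
    $column = cons_append($column, serialize($value));
}

var_export(decons_sketch(unserialize($column)));
```

The values should come back in insertion order: 1, 2, 3. The sketch says nothing about atomicity; that part comes from MySQL running the whole upsert as a single statement.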

I haven’t actually used it (probably never will) but you are welcome to try this at home!

Proving that this is stupid is left as an exercise for the reader.

Filed under Coding, MySQL, PHP

Erlang is a hoarder

One day you set aside a shoebox to store newspaper clippings. Suddenly you are trapped under an avalanche of whole newspapers and wondering how long your body will lie there before anyone misses you.

That is what kept happening to my Erlang apps. They would store obsolete binary data in memory until memory filled up. Then they would go into swap and become unresponsive and unrecoverable. Eventually somebody would notice the smell and restart the server.

The problem seems to be related to Erlang’s memory management optimizations. Sometimes an optimization becomes pathological. If you store a piece of binary data for a while (a newspaper clipping) Erlang “optimizes” by remembering the whole binary (the newspaper). When you remove all references to that data (toss the clipping) Erlang sometimes fails to purge the data (lets the newspapers pile up everywhere). If nobody shows up to collect the garbage, Erlang dies an embarrassing death.

The first step to recovery is to monitor the app’s memory footprint and log in every so often to sweep out the detritus. It can be tricky to find the PIDs that need attention and tragic if you arrive too late. The permanent solution is to build periodic garbage collection into the app. It’s not hard to do. The only hazard is doing it too often since it incurs some CPU overhead.

Each time I have found an app doing this, I’ve had to locate the offending module and install explicit garbage collection. If there is a periodic event, such as a timeout that happens every second, I’ll use it to call something like this:

gc(Tick) ->
    case Tick rem 60 of
        0 -> erlang:garbage_collect(self());
        _ -> ok
    end.

Today I installed this simple code and here is the result:

[Figure: Memory footprint reduced drastically]

[Figure: CPU utilization raised slightly]

For the cost of 5% of one CPU core I stopped the cycle of swap and restart. I would like to learn why my binaries are not being garbage collected automatically. The processes involved queue the binaries in lists for a short time, then send them to socket loops which dispose of them via gen_tcp:send/2. Setting fullsweep_after to 0 had no effect. I’ll be interested in any theories. However, I’m not looking for a new solution since mine is satisfactory. I hope other Erlang hackers find it useful.

Filed under Coding, Erlang

2011 in review

The WordPress.com stats helper monkeys prepared a 2011 annual report for this blog.

Here’s an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 46,000 times in 2011. If it were a concert at Sydney Opera House, it would take about 17 sold-out performances for that many people to see it.

Filed under blogging, Stats

Knuth on Knowing

We often fail to realize how little we know about a thing until we attempt to simulate it on a computer.
Donald E. Knuth
The Art of Computer Programming, Volume 1, Third Edition
Section 2.2.5, Exercise 10 (p. 298) 

Filed under Coding, Programming

WCSF 2011 Voodoo

Rarst asks: what magic turns pretty permalinks into query variables?

The setup:

The magic:

foreach ( $rewrite as $match => $query ) {
	if ( preg_match("#^$match#", $request_match, $matches) ) {
		// Got a match.
		$this->matched_rule = $match;
		break;
	}
}
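
For anyone who wants to poke at the loop in isolation, here is a simplified sketch, not WordPress's actual parse_request(). The three rules and the resolve_request() function are made up for illustration, and the query templates leave off the index.php? prefix that real rewrite rules carry. It shows the two halves of the trick: the first regex that matches wins, and its captured groups fill the $matches[N] placeholders in the query template.

```php
<?php
// Hypothetical rule set: regex => query template, in the style of rewrite rules.
$rewrite = array(
    '([0-9]{4})/([0-9]{1,2})/?$' => 'year=$matches[1]&monthnum=$matches[2]',
    'category/(.+?)/?$'          => 'category_name=$matches[1]',
    '([^/]+)/?$'                 => 'name=$matches[1]',
);

// Hypothetical helper: returns the query variables for a request path,
// or an empty array when no rule matches.
function resolve_request(array $rewrite, string $request): array {
    foreach ($rewrite as $match => $query) {
        if (preg_match("#^$match#", $request, $matches)) {
            // Swap each $matches[N] placeholder for the captured group.
            $query = preg_replace_callback(
                '/\$matches\[(\d+)\]/',
                function ($m) use ($matches) {
                    $i = (int) $m[1];
                    return isset($matches[$i]) ? urlencode($matches[$i]) : '';
                },
                $query
            );
            parse_str($query, $query_vars);
            return $query_vars;
        }
    }
    return array();
}

var_export(resolve_request($rewrite, '2011/08'));
var_export(resolve_request($rewrite, 'category/news'));
var_export(resolve_request($rewrite, 'hello-world'));
```

Rule order matters here: category/news only reaches the category rule because the date rule ahead of it fails, and it never reaches the catch-all rule below it.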

The real voodoo is in creating the rewrite rules. Example: bbPress

  • register_post_types
  • register_taxonomies
  • add_rewrite_tags
  • generate_rewrite_rules

Exercise: optimize parse_request by restructuring the rules into a tree.

Nacin suggests: wp-hackers Skip Main Query

  • the grand scheme of things (png, blog post)
  • $wp->init()
  • class freshlypressed_wp extends wp
  • wp() calls $wp->main()
  • $wp->main() calls $this->parse_request()
  • $this is a freshlypressed_wp
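
The chain in that list can be sketched with stand-in classes. Nothing here is the real WordPress code: wp_sketch and freshlypressed_wp_sketch are hypothetical, and query_posts() is only an assumed name for the step being skipped. The point is that main() calls $this->parse_request(), so whichever subclass was instantiated gets to answer.

```php
<?php
// Stand-in for the core class; not the real WordPress class.
class wp_sketch {
    public $log = array();

    public function main() {
        $this->parse_request(); // resolved against the runtime class of $this
        $this->query_posts();
    }

    public function parse_request() { $this->log[] = 'core parse_request'; }
    public function query_posts()   { $this->log[] = 'core query_posts'; }
}

// Hypothetical subclass in the spirit of freshlypressed_wp.
class freshlypressed_wp_sketch extends wp_sketch {
    public function parse_request() { $this->log[] = 'custom parse_request'; }
    public function query_posts()   { /* skip the main query */ }
}

$wp = new freshlypressed_wp_sketch();
$wp->main();
print_r($wp->log);
```

Only the custom parse_request entry ends up in the log, because both overridden methods replace the core ones.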

Problems:

  • Can’t extend a variable class (class my_wp extends $wp_class)
  • No pluggable inheritance chaining
  • No way for several plugins to cooperatively extend a class
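
A small sketch of the first problem, with one possible workaround. The classes are hypothetical stand-ins, and using class_alias() this way is my assumption, not something proposed in the talk. PHP wants a literal class name after extends, so the runtime choice is bound to a fixed alias first and the alias is extended.

```php
<?php
// Stand-in classes; not the real WordPress code.
class wp_sketch {
    public function main() { return 'core'; }
}

class freshlypressed_wp extends wp_sketch {
    public function main() { return 'freshlypressed > ' . parent::main(); }
}

$wp_class = 'freshlypressed_wp'; // chosen at runtime, e.g. by another plugin

// class my_wp extends $wp_class {}  // parse error: extends needs a literal name

// Workaround sketch: bind a fixed alias to the runtime choice, then extend it.
class_alias($wp_class, 'current_wp');

class my_wp extends current_wp {
    public function main() { return 'my_wp > ' . parent::main(); }
}

$wp = new my_wp();
echo $wp->main(), "\n";
```

This prints my_wp > freshlypressed > core. It does nothing for the other two problems: the alias name can only be bound once, so a second plugin cannot repeat the trick under the same name, and the plugins would still have to agree on who extends whom.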

Filed under Asides, wordcamp, WordPress