POPP Diary - Revision Timetable Edition

This was supposed to be the week I really got stuck into writing and revision. To be fair, I made a half-decent start. And then I gained access to the Apress project repository for this edition. In addition to all my old chapters and some familiar style guides, I came across a new possibility. Apress will accept my chapters in Markdown format!

I don’t know how new this is but to me it’s huge – because it means that for the first time changes to editorial files might be tracked in git and maybe even incorporated into dev scripts. But could I face the overhead involved in converting my old chapters to Markdown? Take a wild guess.

A couple of decades ago when I was still starting out as a coder, I worked for Time Out. One of my favourite projects was to automate the conversion of Time Out travel guides from book to Web format – saving hundreds of hours of manual work. I guess that kind of editorial geekiness has never worn off because, it turns out, I still love that stuff.

So this week, here’s what I did:

  • A bit of editorial work
  • Fixed up broken tests
  • Found a tool – Pandoc – to generate Markdown from my old chapters
  • Wrote another tool to take this conversion a few steps further (because Pandoc can’t understand the intentions behind various custom styles)
  • Wrapped all that up into a script and added the lot to Vagrant provisioning
  • Wrote a tool to auto-import listings from their test/run homes into the correct slots in their chapters
  • Wrote yet another tool to renumber all listings on demand to support listing removals and additions

Sound like a blast? Read on for the details. Or maybe just back away and look in next week to see if I’m actually talking about PHP 8 yet. OK. If you’re sure. See you later!

Tuesday 02 June 2020

So after a day off to catch up on other work and to watch the news obsessively, it’s time to actually look at some chapters.

I tend to tackle top and tail chapters (book or section introductions and conclusions) last because those will be shaped by the work I do in between. So a good place to start might be Chapter 3 – a general introduction to PHP and Objects.

I left off last week with some warnings in the tests for Chapter 3. These were:

assertRegExp() is deprecated and will be removed in PHPUnit 10. Refactor your code to use assertMatchesRegularExpression() instead.

Good to know for the testing chapter. I’ll go ahead and fix that up with another find/change script… done.
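
For illustration, the heart of a find/change like that might look roughly like this in PHP. This is a sketch rather than the script I actually ran, and the function name renameAssertion() is made up for the example.

```php
<?php
// Hypothetical helper: rename the deprecated PHPUnit assertion in source text.
function renameAssertion(string $source): string
{
    // Match the method name only when it is followed by "(" so that
    // other names which merely start the same way are left alone.
    return preg_replace(
        '/\bassertRegExp(?=\s*\()/',
        'assertMatchesRegularExpression',
        $source
    );
}

$before = '$this->assertRegExp("/ShopProduct/", $output);';
print renameAssertion($before) . "\n";
```

Run over each test file in turn, something like this leaves assertNotRegExp() calls untouched, so those would need their own pass.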

Now that all my tests run for the original version of that chapter, I can start looking at copy and examples. I don’t yet have a generated version of my old chapter to work on but of course I have the version I submitted after author review so I can begin ploughing through that. Changes will be a bit of a pain because I’ll have to apply them back to the target chapters when I get them – but hopefully there won’t be too many changes in Chapter 3 anyway.

Of course I start Chapter 3 with the object equivalent of ‘Hello World’ – an empty class. Nothing controversial there, but it’s a chance to test some of my scripts. Like this one:

$ ./scripts/poppindex src/ch03/

03.01: 
    src/ch03//batch01/ShopProduct.php
03.02: 
    src/ch03//batch01/Runner.php
03.03: 
    src/ch03//batch01/Runner.php
03.04: 
    src/ch03//batch02/ShopProduct.php

...

This tells me where to find the code examples in source. So I can open up src/ch03//batch01/ShopProduct.php and see the first listing:

<?php
namespace popp\ch03\batch01;

/* listing 03.01 */

class ShopProduct
{
    // class body
}
/* /listing 03.01 */

Of course, I only want the part between the comments. I can generate that with another scratch script:

$ ./scripts/poppout src/ch03/batch01/ShopProduct.php 03.01

// listing 03.01

class ShopProduct
{
    // class body
}
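
The extraction step can be sketched in a few lines of PHP. This is a minimal illustration of the idea, assuming the comment format shown above. It is not the real poppout code, and extractListing() is a hypothetical name.

```php
<?php
// Hypothetical sketch: return the code between the opening and closing
// listing comments for one listing number, or null if it is not found.
function extractListing(string $source, string $number): ?string
{
    $num = preg_quote($number, '~');
    // Opening comment, lazily captured body, then the closing comment.
    $pattern = '~/\*\s*listing\s+' . $num . '\s*\*/(.*?)/\*\s*/listing\s+' . $num . '\s*\*/~s';
    if (! preg_match($pattern, $source, $matches)) {
        return null;
    }
    // Re-label the listing with a single-line comment, as in the output above.
    return "// listing {$number}\n\n" . trim($matches[1]) . "\n";
}

$source = <<<'SRC'
<?php
namespace popp\ch03\batch01;

/* listing 03.01 */

class ShopProduct
{
    // class body
}
/* /listing 03.01 */
SRC;

print extractListing($source, '03.01');
```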

I’ve also begun working on actual text edits. The last edition was written back when many people were still running PHP 5 in their production environments. That number has obviously shrunk a lot now that PHP 5 has reached end of life. The text needs to reflect that, which means a bunch of small changes in language.

Wednesday 03 June 2020

I received access to a shared repository today with all the old chapters ready to edit. There were also various introductory materials and style guides.

So I spent much of today’s work session playing with Word documents. I run Linux everywhere, old contrarian that I am, so I don’t have ready access to Word. I found I could open the chapters directly in Google Docs – but not without losing the named styles for code, results, notes and so on. LibreOffice opens and saves the chapters OK, but I need to find out whether it works for the production team at Apress. I have had mixed results with sharing between LibreOffice/OpenOffice and Word in the past.

One of the intro documents in the archive suggests that Apress would be OK with Markdown. Naturally I’d leap at a plain text solution if I were writing from scratch. It might be a bit of a pain in everyone’s arse, though, to jump horses now after five editions.

I’ve packaged up my findings and a sample chapter edit saved out from LibreOffice and mailed the team at Apress for their take.

Update: It’s confirmed – Markdown is fine. And though it won’t be trivial to switch, I’m beginning to think about the possibility of compiling a chapter – automatically incorporating listings at compile time. That would remove a huge annoyance – which is that edits need to be copy/pasted into the manuscript document every time a change is required. It would be much better to just run a compile script to generate the markdown for submission. Still thinking about this one.

Also, today, Paul Tregoing has agreed once again to fit tech reviewing into his over-busy professional life. Brilliant news.

Thursday 04 June 2020

Markdown it is. The prospect of getting all chapters into git and accessible to scripting is just too tempting. So I devoted today to automating conversion of the old chapters from Word’s docx format to Markdown.

Pandoc is an excellent tool, but it’s not easy to map custom styles to Markdown – which is a problem for code, both inline and in blocks. However, you can get it to output custom styles in span and div elements if you invoke it like this:

$ pandoc --from=docx+styles --to=gfm Ch3.docx > chapter3.md

This output isn’t yet usable. But with an HTML DOM manipulation library like php-html-parser it’s not too hard to build a script to take the chapter most of the rest of the way. It doesn’t have to be perfect, since this is a one shot task per chapter. It’s taken me a day’s work session to get this far, but it now feels nearly trivial to get from docx to the kind of Markdown format that Apress can use without much fiddling for each chapter.
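
To give a flavour of the post-processing, here is a much-simplified sketch. It uses plain regular expressions rather than php-html-parser, and both the style names ("Code Inline", "Code") and the exact shape of the span and div markup are assumptions made for the example, not the real styles in my chapters.

```php
<?php
// Sketch only: style names and markup shape are assumed, and
// stylesToMarkdown() is a hypothetical name.
function stylesToMarkdown(string $markdown): string
{
    $fence = str_repeat('`', 3); // a Markdown code fence

    // Inline code: a styled span becomes a backtick-quoted run.
    $markdown = preg_replace_callback(
        '~<span custom-style="Code Inline">(.*?)</span>~s',
        fn(array $m): string => '`' . html_entity_decode($m[1]) . '`',
        $markdown
    );

    // Code blocks: a styled div becomes a fenced block.
    return preg_replace_callback(
        '~<div custom-style="Code">\s*(.*?)\s*</div>~s',
        fn(array $m): string => $fence . "\n" . html_entity_decode($m[1]) . "\n" . $fence,
        $markdown
    );
}

print stylesToMarkdown('Call <span custom-style="Code Inline">$p-&gt;getTitle()</span> here.') . "\n";
```

A real DOM library copes better with nested elements and odd attribute ordering, which is why regular expressions only stand in here for the sake of brevity.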

Friday 05 June 2020

Formalised yesterday’s work by adding the requirement for Pandoc to my Vagrant provision script. By default – that is, through Yum – CentOS 7 installs a pretty ancient version which does not recognise that +styles modifier. Luckily the Pandoc site offers a standalone binary. Added the php-html-parser package to my main composer.json file and wrote a script to combine the Pandoc call and post-processing for converting last edition’s chapters from docx to Markdown.

./scripts/docx2md_popp chapters/00_Previous_Edition_files/978-1-4842-1995-9_Ch3.docx chapters/01_First_Drafts/Ch3.md

Had a bit of a shock when, while testing vagrant provision, I encountered a fatal error building the latest PHP. Luckily, that was fixed by running make clean and building again.

I feel like I might have blundered a bit into revision timetable territory here (that syndrome where you build ever more intricate colour-coded schedules for exam revision without actually ever quite getting to the work itself). But I know from experience that failing to get set up can cost big time once you’re in the editorial groove. I think I’m ready to get cracking with some real copy and code updates now, though. In summary, here’s where I’m at.

  • Vagrant environment revitalised
  • PHP 8-dev build script
  • Latest PHPUnit version working with new PHP
  • Access to previous chapters for development
  • Excellent expert tech reviewer on board to keep me honest
  • Scripts and tools sourced/written to convert docx to Markdown for chapters

Saturday 06 June 2020

And so to revision and writing.

… I meant it when I wrote that line this morning. I swear I did. But when I got to my first code example I felt the pull of code. The problem is that I had not yet automated the import of listings into the chapter files. This meant that I was forced to keep code examples in chapters in line with the testable/runnable code by hand. This was a great source of frustration in previous editions when I was working with Word/LibreOffice. If you’ve read POPP, you’ll know that duplication is one of the great sins I try to avoid.

So, today’s work went into a tool named gencode. I’ll spare you the blow by blow on the development of this one – here, though, is the application help block:

$ php toolsrc/scripts/gencode.php 
usage: gencode.php [options] <srcdir> <chapterfile.md> [<output.md>]
   -r  reflow. Ignore listing nn.nn and apply listings in sort order
   -f  force. Where available slots do not match listings available in -r mode -- apply anyway. Careful!
   -d  dry-run. Will show the current occupant of a slot against the incoming code index. Nothing written

By default this tool looks for all listing blocks in the Markdown version of the chapter, reading the listing comment:

```
// listing 03.22

// code example here
```

It then fetches listing 03.22 using the libraries that already exist in my toolkit for that purpose. It replaces the current code example in the chapter with an update if needed.

I have also included a ‘reflow’ mode which simply adds code examples to chapter code blocks without checking the listing number – the first code listing found is added to the first code block in the chapter, the second listing to the second block, and so on. This is a risky feature which is designed to support renumbering of code examples. It will be useful but is best applied carefully, in conjunction with git diff, to ensure that only the expected changes appear.
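
The default, number-matching mode boils down to something like the following. This is a minimal sketch and not the real gencode source: applyListings() is a hypothetical name, and I’m assuming here that the listings have already been fetched into an array keyed by listing number.

```php
<?php
// Hypothetical sketch of the default (non-reflow) mode. $listings maps a
// listing number such as "03.22" to the current code for that listing.
function applyListings(string $chapter, array $listings): string
{
    $fence = str_repeat('`', 3); // a Markdown code fence
    // An opening fence line, the listing comment, any old body, a closing fence.
    $pattern = '~^(' . $fence . '[^\n]*)\n// listing (\d+\.\d+)\n.*?^' . $fence . '[ \t]*$~ms';

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($listings, $fence): string {
            [$block, $open, $number] = $m;
            if (! isset($listings[$number])) {
                return $block; // unknown listing: leave the block untouched
            }
            return "{$open}\n// listing {$number}\n\n"
                . rtrim($listings[$number]) . "\n{$fence}";
        },
        $chapter
    );
}
```

Blocks without a listing comment never match the pattern, so ordinary code samples in the chapter are left alone.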

Sunday 07 June 2020

I mentioned yesterday the reflow option to gencode. This is designed to support another tool that solves a particularly thorny problem with numbered code examples – dealing with deletions and additions. In some chapters in POPP there are over a hundred individual code examples, all numbered by chapter and listing. Imagine the fun close to deadline if it becomes necessary to add a listing between 03.48 and 03.49. One solution here would be to do away with code numbering altogether. Another solution is the occasional listing number omission. Neither is terribly satisfactory.

Today I wrote the renum script, which will renumber code examples to take account of either additions or deletions. To delete, of course, you simply remove the listing example (either taking out the file or removing the special comment that indicates a listing). To add, I can mark up a new listing that will appear in sort order between 03.48 and 03.49. Here’s what that looks like:

/* listing 03.48.1 */

// some code here

/* /listing 03.48.1 */

The tools support this dot notation for sorting up to three levels – so it’s easy to insert listings. And then, when ready I can run renum:

$ php toolsrc/scripts/renum.php src/ch03/

# snip

no change: 03.46
no change: 03.47
no change: 03.48
03.48.1 -> 03.49
   src/ch03//batch15/ShopProduct.php
03.49 -> 03.50
   src/ch03//batch15/CdProduct.php
03.50 -> 03.51
   src/ch03//batch15/BookProduct.php
writing src/ch03//batch15/ShopProduct.php
writing src/ch03//batch15/CdProduct.php
writing src/ch03//batch15/BookProduct.php

After running, the same block of code has been relabelled:

/* listing 03.49 */

// some code here

/* /listing 03.49 */

As have all subsequent blocks. Of course the chapter is now out of alignment with the code blocks – what the chapter thinks of as 03.49 is actually my new listing. That’s what the reflow flag to gencode is designed to handle.
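
The renumbering logic itself is simple once the labels sort correctly. Here is a rough sketch of that part only, with hypothetical function names. The real script also has to find the labels in the source files and write the changes back.

```php
<?php
// Hypothetical sketch: compare two dotted labels part by part as integers,
// so that 03.48.1 sorts after 03.48 and before 03.49.
function compareLabels(string $a, string $b): int
{
    $partsA = array_map('intval', explode('.', $a));
    $partsB = array_map('intval', explode('.', $b));
    $depth = max(count($partsA), count($partsB));
    // Pad the shorter label with zeros so 03.48 compares as 03.48.0
    return array_pad($partsA, $depth, 0) <=> array_pad($partsB, $depth, 0);
}

// Map each existing label to its new sequential chapter.listing number.
function renumberMap(array $labels): array
{
    usort($labels, 'compareLabels');
    $map = [];
    $next = []; // last listing number handed out, per chapter
    foreach ($labels as $label) {
        $chapter = explode('.', $label)[0];
        $next[$chapter] = ($next[$chapter] ?? 0) + 1;
        $map[$label] = sprintf('%s.%02d', $chapter, $next[$chapter]);
    }
    return $map;
}

// Fifty existing listings plus one inserted after 03.48
$labels = array_map(fn(int $n): string => sprintf('03.%02d', $n), range(1, 50));
$labels[] = '03.48.1';

foreach (renumberMap($labels) as $old => $new) {
    print ($old === $new) ? "no change: {$old}\n" : "{$old} -> {$new}\n";
}
```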

Whew. OK. I’m finally happy. After two weeks of preparation, I think I can probably dive into real chapter work.