August 4, 2016

Querying and Manipulating HTML using PHP

You know PHP, so why not use it locally to query and manipulate HTML files?

You can use it to automate any number of tasks.

In what circumstances would you want to use PHP scripts with HTML files?
– Cleaning up files that have been converted to HTML
– Automating bulk conversions
– Creating dynamic pages offline
– Link validation, code formatting and transformations

Don’t duplicate what other tools can already do easily: tidy, xmllint, or a good XML editor.

Comparison of manipulating XML in PHP to other languages: Lua, Python, Perl

Ways to make PHP scripts easier to use:
– Simplifying command-line scripts is fairly easy in Linux and Mac OS X
– How to do the same in Windows
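
On Linux and Mac OS X, a script usually becomes a command by starting it with a `#!/usr/bin/env php` line and marking it executable with `chmod +x`. The Windows command prompt does not interpret that line, so one option there is to call `php count-tags.php page.html` explicitly, or to wrap that call in a small `.bat` file. A minimal sketch of such a script follows; the file name `count-tags.php` and the function `countTags` are hypothetical names chosen for illustration.

```php
#!/usr/bin/env php
<?php
// count-tags.php (hypothetical name): counts how often a tag occurs in an HTML file.

/**
 * Count the elements with the given tag name in an HTML string.
 */
function countTags(string $html, string $tag): int
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    return $doc->getElementsByTagName($tag)->length;
}

// Only act as a command when a file name was passed on the command line.
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $html = file_get_contents($argv[1]);
    if ($html === false) {
        fwrite(STDERR, "Cannot read {$argv[1]}" . PHP_EOL);
        exit(1);
    }
    $tag = $argv[2] ?? 'a'; // default to counting links
    echo countTags($html, $tag), PHP_EOL;
}
```

After `chmod +x count-tags.php`, it could be run as `./count-tags.php page.html img`; on Windows the equivalent would be `php count-tags.php page.html img`.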

The different ways you can use PHP with HTML: DOMDocument, XSLTProcessor,
SimpleXMLElement, manually building/altering pages, using regex
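
To give a feel for how these approaches differ, here is a rough sketch that reads a page title in three ways: with DOMDocument, with SimpleXML, and with a regular expression. The function names are made up for this example. XSLTProcessor is left out because it depends on the optional xsl extension. The SimpleXML variant assumes the input is well-formed XML without a default namespace, which ordinary HTML often is not.

```php
<?php
// 1. DOMDocument: tolerant of sloppy HTML.
function titleViaDom(string $html): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $node = $doc->getElementsByTagName('title')->item(0);
    return $node ? trim($node->textContent) : null;
}

// 2. SimpleXML: compact, but the markup must be well-formed XML.
function titleViaSimpleXml(string $xhtml): ?string
{
    libxml_use_internal_errors(true);
    $xml = simplexml_load_string($xhtml);
    libxml_clear_errors();

    if ($xml === false || !isset($xml->head->title)) {
        return null;
    }
    return trim((string) $xml->head->title);
}

// 3. Regex: quick for simple cases, fragile for anything nested.
function titleViaRegex(string $html): ?string
{
    if (preg_match('~<title[^>]*>(.*?)</title>~is', $html, $m)) {
        return trim(html_entity_decode($m[1], ENT_QUOTES, 'UTF-8'));
    }
    return null;
}

$page = '<html><head><title>Demo page</title></head><body><p>Hi</p></body></html>';
echo titleViaDom($page), PHP_EOL;
echo titleViaSimpleXml($page), PHP_EOL;
echo titleViaRegex($page), PHP_EOL;
```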

Discuss XPath, especially in conjunction with DOMDocument
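
As an illustration of the combination, the sketch below loads HTML into a DOMDocument and runs an XPath expression against it with DOMXPath. The helper name `queryHtml` and the sample markup are assumptions made for this example only.

```php
<?php
/**
 * Run an XPath expression against an HTML string and return the
 * string value of every matching node.
 */
function queryHtml(string $html, string $expression): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query($expression);
    if ($nodes === false) {
        return []; // the expression could not be evaluated
    }

    $values = [];
    foreach ($nodes as $node) {
        $values[] = $node->nodeValue;
    }
    return $values;
}

$html = '<p><img src="a.png"><img src="b.png" alt="B">'
      . '<a href="http://example.com/">x</a><a href="/local">y</a></p>';

// Images that have no alt attribute.
print_r(queryHtml($html, '//img[not(@alt)]/@src'));

// Links whose href starts with "http".
print_r(queryHtml($html, '//a[starts-with(@href, "http")]/@href'));
```

The point of XPath here is that conditions such as "has no alt attribute" fit in one expression instead of nested loops over the DOM.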

Ways to alter one file, all files in a directory or recursively
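
One possible way to cover all three cases with the same code is to first build a list of files and then apply a callback to each. This is only a sketch: the function names `htmlFiles` and `alterFiles` are hypothetical, and it assumes that HTML files can be recognised by a `.html` or `.htm` extension.

```php
<?php
/**
 * Return the HTML files for a path: the file itself, the files directly
 * inside a directory, or (with $recursive) all files below it.
 */
function htmlFiles(string $path, bool $recursive = false): array
{
    if (is_file($path)) {
        return [$path];
    }
    if (!is_dir($path)) {
        return [];
    }

    $iterator = $recursive
        ? new RecursiveIteratorIterator(
            new RecursiveDirectoryIterator($path, FilesystemIterator::SKIP_DOTS)
        )
        : new FilesystemIterator($path, FilesystemIterator::SKIP_DOTS);

    $files = [];
    foreach ($iterator as $file) {
        if ($file->isFile() && preg_match('/\.html?$/i', $file->getFilename())) {
            $files[] = $file->getPathname();
        }
    }
    sort($files);
    return $files;
}

/**
 * Apply $transform to the contents of each file and write the file back
 * only if something changed. Returns the number of files rewritten.
 */
function alterFiles(array $files, callable $transform): int
{
    $changed = 0;
    foreach ($files as $file) {
        $old = file_get_contents($file);
        if ($old === false) {
            continue; // unreadable file, skip it
        }
        $new = $transform($old);
        if ($new !== $old) {
            file_put_contents($file, $new);
            $changed++;
        }
    }
    return $changed;
}
```

A call such as `alterFiles(htmlFiles('site', true), 'trim')` would then process every HTML file below a directory named `site`. Because the files are overwritten in place, working on a copy is the safer choice.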

Code examples for each way to manipulate HTML.

Examples of the kinds of things you can do:
– Querying: collecting and validating all links on a website
– Formatting text in <pre> tags
– Transforming HTML structure (changing tables to lists, for example)
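
The table-to-list transformation could look roughly like the following sketch, which uses DOMDocument to replace every table with a `<ul>` holding one item per row. The function name `tablesToLists` and the choice to join cells with ": " are assumptions for this example. It also assumes simple, non-nested tables and ASCII input, since `loadHTML` needs an encoding hint for other character sets.

```php
<?php
/**
 * Replace each <table> in an HTML fragment with a <ul> that has one
 * <li> per row; the cell texts of a row are joined with ": ".
 */
function tablesToLists(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Copy the live node list first, because replacing nodes changes it.
    $tables = iterator_to_array($doc->getElementsByTagName('table'));

    foreach ($tables as $table) {
        $ul = $doc->createElement('ul');
        foreach ($table->getElementsByTagName('tr') as $row) {
            $cells = [];
            foreach ($row->childNodes as $cell) {
                if ($cell instanceof DOMElement
                    && in_array($cell->tagName, ['td', 'th'], true)) {
                    $cells[] = trim($cell->textContent);
                }
            }
            $li = $doc->createElement('li');
            $li->appendChild($doc->createTextNode(implode(': ', $cells)));
            $ul->appendChild($li);
        }
        $table->parentNode->replaceChild($ul, $table);
    }

    // Serialise only the children of <body>, not the wrapper loadHTML adds.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo tablesToLists(
    '<table><tr><td>Name</td><td>Ada</td></tr>'
    . '<tr><td>Role</td><td>Engineer</td></tr></table>'
), PHP_EOL;
```

Working on the DOM rather than on the raw text means the cell contents are escaped again on output, which a regex-based replacement would have to handle by hand.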

Pros and cons of various methods

