Monday, January 6, 2014

Entity Cleaner

The entity cleaner was a workaround for a problem I didn't want to spend more time resolving.  I have a script that queries a MySQL database and outputs an HTML table.  This works great, but I wanted to format some of the content in the table, in this case by aligning the text to the right side of the cell.  The code to right-align the content was easy.  However, when MySQL wrote out the HTML, the markup I had added was converted into HTML entities.  So, instead of showing the content right-aligned, the page showed all the markup around the content.  I needed to post-process the file to convert the HTML entities back to their actual characters.  This Perl script was the answer.  Since my output only contained three entities, they are the only ones I replace here.  However, the script could easily be expanded to include all the basic entities.
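As a rough illustration of that expansion, here is a minimal sketch that handles a few more entities than the three in my output.  The subroutine name decode_basic_entities and the choice of extra entities (&amp; and &#39;) are assumptions for the example, not part of the script described here.  The one detail worth getting right is the order: &amp; should be decoded last, otherwise text such as &amp;lt; would be decoded twice and end up as < instead of the literal &lt;.

```perl
use strict;
use warnings;

# Hypothetical helper (not the original script): decode a handful of
# basic HTML entities in a string and return the result.
sub decode_basic_entities {
    my ($text) = @_;

    # The first three are the entities from the post; &#39; is an
    # assumed extra added for illustration.
    my %entity_for = (
        '&lt;'   => '<',
        '&gt;'   => '>',
        '&quot;' => '"',
        '&#39;'  => "'",
    );

    # Decode everything except &amp; in a single pass.
    $text =~ s/(&lt;|&gt;|&quot;|&#39;)/$entity_for{$1}/g;

    # Decode &amp; last so that "&amp;lt;" becomes the text "&lt;", not "<".
    $text =~ s/&amp;/&/g;

    return $text;
}

# Prints: <td align="right">42</td>
print decode_basic_entities('&lt;td align=&quot;right&quot;&gt;42&lt;/td&gt;'), "\n";
```

For anything beyond a handful of entities, a CPAN module such as HTML::Entities is probably a better fit than a hand-rolled table.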

Line 2 gets the input file name and stores it for later use.  Line 3 sets up the array that will contain the content as it's being cleaned.  Line 4 sets up a counter.  Line 6 opens the file.  Lines 7-14 clean the entities.  Line 8 grabs the next single line from the input file.  Line 9 stores that line in the corresponding element of the array.  Line 10 replaces &lt; with <.  Line 11 replaces &gt; with >.  Line 12 replaces &quot; with ".  Line 13 increments the counter so the next line goes into the next element.
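The read-and-clean loop described above can be sketched roughly as follows.  This is an illustration under assumptions rather than the script itself, so the line numbers in the walkthrough do not map onto it; the names clean_lines, @cleaned, and $count are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical subroutine: read the named file line by line and return
# a list of lines with the three entities replaced.
sub clean_lines {
    my ($file_name) = @_;
    my @cleaned;      # holds the content as it is being cleaned
    my $count = 0;    # index of the current line

    open(my $in, '<', $file_name) or die "Cannot open $file_name: $!";
    while (my $line = <$in>) {
        $cleaned[$count] = $line;            # store the raw line
        $cleaned[$count] =~ s/&lt;/</g;      # &lt;   becomes <
        $cleaned[$count] =~ s/&gt;/>/g;      # &gt;   becomes >
        $cleaned[$count] =~ s/&quot;/"/g;    # &quot; becomes "
        $count++;                            # move to the next line
    }
    close($in);

    return @cleaned;
}

# Example use: perl clean_sketch.pl table.html
# (prints the cleaned lines to standard output)
print clean_lines($ARGV[0]) if @ARGV;
```

The /g modifier matters here: without it, only the first entity on each line would be replaced.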

Lines 15-18 close the input file, set the output file to the input file (so the change is made in place), and open the file for output.

Lines 19-23 output the contents of the array to the output file and give a confirmation message.
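The write-back step might look something like the sketch below.  Again, this is an assumption-laden illustration, not the script itself: write_lines is a hypothetical name, and the temporary file in the demo stands in for the real table file.  Opening a file with '>' truncates it immediately, which is why the whole input has to be read into the array and the input handle closed before the same file is reopened for output.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical subroutine: overwrite the named file with the given lines
# and print a confirmation message.  Returns the number of lines written.
sub write_lines {
    my ($file_name, @lines) = @_;

    # '>' truncates the file, so the input must already be fully read.
    open(my $out, '>', $file_name) or die "Cannot write $file_name: $!";
    print {$out} @lines;
    close($out) or die "Cannot close $file_name: $!";

    print "Cleaned $file_name\n";
    return scalar @lines;
}

# Demo on a throwaway temporary file rather than a real table.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
close($tmp_fh);
write_lines($tmp_path, "<td align=\"right\">42</td>\n");
```

Rewriting in place like this is simple, but if the script dies partway through the write, the original file is already gone; writing to a second file and renaming it afterwards would be a safer variation.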