Belle Nuit Montage - Editing etc. - Lausanne - Suisse - e-mail: matti@belle-nuit.com
The following is an example of how to use Textfilter in a real-world context. If you have not read the Userguide yet, you may want to do that first, so that you can follow each step below.
You can build the project by following the explanations, but you can also open the project (which is a text file) from this link: HTMLCleaningProject. If you are reading this on the internet, click on the link, save it as a text file and then open it with Textfilter.
You regularly receive HTML files from your client to publish on the web. Your client uses Word to export the HTML code from his documents. He then uses Home Page, but does not properly format the page.
Before Textfilter, I spent 15 minutes cleaning up and reformatting each file, using find-replace in Home Page and reformatting by hand. With Textfilter, I can clean up most of the file in one pass through a project and then manually fine-tune things like missing links.
Open a new Textfilter project. An Input file filter is already specified. Select the file Forum1_01.htm. You can download the file if you are using this example online.
Add an Output Folder filter and select a new folder to save the file. Open Netscape and open the file in the output folder. You have now linked Netscape to the output file and can check all changes with a simple Reload command.
To insert filters between the Input file and the Output folder filter, select the Input file filter and add the filter, or select the Output folder filter and add the filter while holding the option-key.
When you analyze the file, you see that logical and physical styles are mixed up: some headings use header tags, while others are marked only by font size or bold text.
First we remove the level-two headers. Add a Replace filter with the parameters find "<H2>(.*)</H2>" and replace "$1" and check the Regex option. The .* notation matches any sequence of text, and the parentheses save it to register 1. So the filter finds any text between the H2 start and end tags and removes the tags. You could also have used two non-regex Replace filters which remove "<H2>" and "</H2>" separately.
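For readers who want to try the pattern outside Textfilter, here is a minimal sketch in Python. It is an illustration only: Python's re module stands in for Textfilter's regex engine, the function name strip_h2 is made up, and the replacement is written \1 instead of $1. I also use the non-greedy .*? so that two headers on the same line are not merged into one match, which is an assumption about how the input might look.

```python
import re

def strip_h2(html: str) -> str:
    """Remove <H2> and </H2> tags but keep the text between them."""
    # (.*?) captures the header text; \1 puts it back without the tags.
    return re.sub(r"<H2>(.*?)</H2>", r"\1", html)

print(strip_h2("<H2>Forum</H2> text"))  # Forum text
```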
The next steps are straightforward. Add a Replace filter with the parameters find "<FONT SIZE\=\"\+4\".*>(.*)</FONT>" and replace "<H1>$1</H1>" and check the Regex option. In this case, the replacement is only possible with the Regex option, because we change the tags and also remove the font name information which comes after the font size inside the font tag.
Add a Replace filter with the parameters find "<FONT SIZE\=\"\+1\".*>(.*)</FONT>" and replace "<H3>$1</H3>" and check the Regex option.
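The two font-size filters follow the same scheme, so they can be sketched as one helper. Again this is Python, not Textfilter, and font_size_to_heading is a hypothetical name. Instead of the .* inside the tag I use [^>]*, which stops at the end of the opening tag; that is my own variation, not the pattern from the project.

```python
import re

def font_size_to_heading(html: str, size: str, level: int) -> str:
    """Turn <FONT SIZE="size" ...>text</FONT> into <Hlevel>text</Hlevel>."""
    # [^>]* swallows any further attributes (such as the font name)
    # that follow the size inside the opening tag.
    pattern = r'<FONT SIZE="' + re.escape(size) + r'"[^>]*>(.*?)</FONT>'
    return re.sub(pattern, rf"<H{level}>\1</H{level}>", html)

sample = '<FONT SIZE="+4" FACE="Arial">Title</FONT>'
print(font_size_to_heading(sample, "+4", 1))  # <H1>Title</H1>
```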
Now we are ready to remove all other font tags. Add a Replace filter with the parameters find "<FONT .*>(.*)</FONT>" and replace "$1" and check the Regex option. You will note that one occurrence of the closing tag is not removed, at this place:
<H3><B><A
NAME=Weiterbildung></A>Weiterbildung
/ Ausbildung</B></FONT> </P>
This happens because there was no opening font tag. To be sure, you can add another Replace filter which removes only "</FONT>" tags.
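If you want to avoid the leftover tag altogether, one alternative is to remove opening and closing font tags independently instead of in pairs. The sketch below shows that idea in Python; it is a different technique from the paired pattern above, and remove_font_tags is a name I invented for the example.

```python
import re

def remove_font_tags(html: str) -> str:
    """Delete every <FONT ...> and </FONT> tag, paired or not."""
    # Opening tags may carry attributes, so they need a regex.
    html = re.sub(r"<FONT\b[^>]*>", "", html)
    # Closing tags are fixed text, so a plain replace is enough.
    return html.replace("</FONT>", "")

print(remove_font_tags('<FONT FACE="Arial">a</FONT> b</FONT>'))  # a b
```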
Now we want to capture the headers of level 4. This is a little more difficult. They are only bold, but a lot of other text is bold, too, and we do not want to change that text to header 4. However, there is always a link anchor at the beginning of the title, so we can include it in the search.
Add a Replace filter with the parameters find "<P><B><A(.*)</B>.*</P>" and replace "<H4><A$1</H4>" and check the Regex option. So we look for all paragraphs which are bold and start with an anchor. I had to put a .* between the closing bold and the closing paragraph, because sometimes there is a nonbreaking space inserted there.
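The following Python sketch shows the same idea and also checks that ordinary bold paragraphs are left alone. It assumes each title paragraph sits on a single line; the function name bold_anchor_to_h4 and the sample strings are made up for the illustration.

```python
import re

def bold_anchor_to_h4(html: str) -> str:
    """Convert bold paragraphs that start with an anchor into <H4> headers."""
    # The group keeps the anchor and the title; the .*? before </P>
    # absorbs extras such as a nonbreaking space after </B>.
    return re.sub(r"<P><B>(<A.*?)</B>.*?</P>", r"<H4>\1</H4>", html)

title = "<P><B><A NAME=Kurse></A>Kurse</B>&nbsp;</P>"
print(bold_anchor_to_h4(title))  # <H4><A NAME=Kurse></A>Kurse</H4>
```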
We are now ready to reformat the table. We will remove all widths from the TABLE tag and from the TD tags.
Add a Replace filter with the parameters find "<TABLE (.*) WIDTH=.*>" and replace "<TABLE $1>" and check the Regex option. This will retain the tag without the width.
Add a Replace filter with the parameters find "<TD (.*) WIDTH=.*>" and replace "<TD $1>" and check the Regex option. This will keep the vertical align, but remove the width.
Now we are almost finished. We still have to remove the em dashes and the empty paragraphs.
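Note that the two patterns above drop everything from WIDTH to the end of the tag, which is fine for this file because the width comes last. As a hedged alternative, this Python sketch removes only the WIDTH attribute itself, wherever it stands in the tag. The name drop_width and the sample tags are assumptions for the example, not taken from the client's file.

```python
import re

def drop_width(html: str) -> str:
    """Remove the WIDTH attribute from TABLE and TD tags, keep the rest."""
    def clean(match: re.Match) -> str:
        # Handles quoted (WIDTH="25%") and unquoted (WIDTH=600) values.
        return re.sub(r'\s+WIDTH=("[^"]*"|[^\s>]+)', "", match.group(0))
    # Only look inside opening TABLE and TD tags.
    return re.sub(r"<(?:TABLE|TD)\b[^>]*>", clean, html)

print(drop_width('<TD VALIGN=TOP WIDTH="25%">x</TD>'))  # <TD VALIGN=TOP>x</TD>
```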
Add a Replace filter with the parameters find "&emdash;" and replace "-".
Add a Replace filter with the parameters find "<P> </P>" and replace "".
Click on the Output folder filter, save the project for later use and you are done. You can now prefilter the HTML files you receive from your client and concentrate on real editing.
You may now also automatically insert the CSS file, if you use one. You may also automatically add meta tags, based on a script which searches for words, so you do not have to write them from scratch for every file.
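To make the idea of a project concrete, here is a small Python sketch of a chain of Replace filters applied in order, some with the Regex option and some without. It only imitates the concept: ReplaceFilter, PROJECT and run_project are hypothetical names, and the list holds just three of the filters from this example.

```python
import re
from typing import NamedTuple

class ReplaceFilter(NamedTuple):
    find: str
    replace: str
    regex: bool = False  # mirrors the Regex checkbox

    def apply(self, text: str) -> str:
        if self.regex:
            return re.sub(self.find, self.replace, text)
        return text.replace(self.find, self.replace)

# A shortened project: one regex filter and two plain filters.
PROJECT = [
    ReplaceFilter(r"<H2>(.*?)</H2>", r"\1", regex=True),
    ReplaceFilter("&emdash;", "-"),
    ReplaceFilter("<P> </P>", ""),
]

def run_project(text: str, filters=PROJECT) -> str:
    """Pass the text through each filter in order, like a Textfilter project."""
    for f in filters:
        text = f.apply(text)
    return text

print(run_project("<H2>A</H2>&emdash;<P> </P>B"))  # A-B
```

The order of the list matters in the same way as the order of filters in the project window: each filter sees the output of the one before it.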
This example was written before I wrote the HTML Tags filter. You may create a much simpler project using this filter.
Textfilter was written with REALbasic http://www.realsoftware.com
e-mail: matti@belle-nuit.com - www.belle-nuit.com - 6.1.02