ASP / VBScript Smart-tag stripper

Input often comes from MS Word and similar programs, which tend to insert a great deal of 'smart tags' (curly quotes, long dashes, ellipses and the like). Combine this with the problems of Euro signs, copyright symbols, accented letters etc., and you soon have a database full of text that displays as square boxes once it has been through your ancient Cold Fusion code.

I spent a long time looking for useful solutions, and here is my own effort: a web page (tag_stripper.asp) where you paste text and click FIX, and a second page (stripper_action.asp) that shows the fixed text along with a preview of how it will look. The fixing first replaces smart characters with plain ones, then loops through character codes 129 to 255 and replaces any that remain with their numeric HTML entities (for example, é becomes &#233;).
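
To see the effect on a sample string, here is a minimal sketch of the same two-step idea in JavaScript. It is an illustration only, not the ASP code used on the page. The function name `fixSmartText` and the replacement table are hypothetical. It also assumes the text is already a Unicode string, so the smart characters show up as U+2018, U+2019 and so on rather than as Chr(145), Chr(146), and it escapes everything outside ASCII rather than just codes 129 to 255.

```javascript
// Hypothetical sketch of the two-step fix; names are illustrative only.
// Assumption: input is a Unicode string, so Word's smart characters are
// U+2018, U+2019, ... instead of the Windows-1252 codes 145, 146, ...
const SMART_REPLACEMENTS = [
  ["\u2018", "'"],   // smart open single quote
  ["\u2019", "'"],   // smart close single quote
  ["\u201C", '"'],   // smart open double quote
  ["\u201D", '"'],   // smart close double quote
  ["\u2013", "-"],   // short dash
  ["\u2014", "--"],  // long dash
  ["\u2026", "..."], // dot dot dot
];

function fixSmartText(text) {
  // Step 1: swap each known smart character for its plain equivalent.
  let out = text;
  for (const [smart, plain] of SMART_REPLACEMENTS) {
    out = out.split(smart).join(plain);
  }
  // Step 2: escape every remaining non-ASCII character as a numeric entity.
  return out.replace(/[\u{80}-\u{10FFFF}]/gu, (ch) => "&#" + ch.codePointAt(0) + ";");
}

console.log(fixSmartText("\u201CCaf\u00E9\u201D \u2013 it\u2019s 5\u20AC\u2026"));
// prints: "Caf&#233;" - it's 5&#8364;...
```

The order matters: the named replacements run first so that quotes and dashes end up as plain characters, and only what is left over gets the numeric escape.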


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<!-- Evil Tag Stripper by Ralpharama.co.uk -->
<html>
<head>
<title>Tag Stripper</title>
</head>
<body>
This utility removes all evil tags that cause display problems on mywebsite.com.
<br>Paste text into the box, click on FIX, then copy the fixed text before pasting it into the required web page.
<form name="stripper" action="stripper_action.asp" method="POST">
<textarea id="txt" name="txt" cols="80" rows="30"></textarea>
<input type="submit" value="FIX">
</form>
</body>
</html>


<!-- Evil Tag Stripper by Ralpharama.co.uk -->
<%
' Read the pasted text from the form and clean it up
sourceTxt = Request("txt")
destnTxt = fixit(sourceTxt)

Function fixit(strText)
    Dim i, c
    ' Pattern for adding your own replacements (left commented out):
    ' strText = Replace(strText,"Source","Dest")
    ' Smart Open Single Quote
    strText = Replace(strText,Chr(145),"'")
    ' Smart Close Single Quote
    strText = Replace(strText,Chr(146),"'")
    ' Smart Open Double Quote
    strText = Replace(strText,Chr(147),Chr(34))
    ' Smart Close Double Quote
    strText = Replace(strText,Chr(148),Chr(34))
    ' Smart Short Hyphen
    strText = Replace(strText,Chr(150),"-")
    ' Smart Long Hyphen
    strText = Replace(strText,Chr(151),"--")
    ' Odd Apostrophe Top-Right
    strText = Replace(strText,Chr(180),"'")
    ' Cedilla without a letter / Odd Comma
    strText = Replace(strText,Chr(184),",")
    ' Bullet
    strText = Replace(strText,Chr(149),"·")
    ' Smart Dot dot dot
    strText = Replace(strText,Chr(133),"...")
    ' Bottom Quote
    strText = Replace(strText,Chr(132),Chr(34))
    ' Approx symbol at top
    strText = Replace(strText,Chr(152),"~")
    ' Approx symbol (long); Chr(126) is already a plain "~", so this changes nothing
    strText = Replace(strText,Chr(126),"~")
    ' CR+LF pair counts as a single line break
    strText = Replace(strText,vbCrLf,"<br>")
    ' Line Feed
    strText = Replace(strText,Chr(10),"<br>")
    ' CR (character 13)
    strText = Replace(strText,Chr(13),"<br>")
    ' Do all Greater than Char 128
    For i = 129 To 255
        c = "&#" & i & ";"
        strText = Replace(strText,Chr(i),c)
    Next
    fixit = strText
End Function
%>
<html>
<head>
<title>Tag Stripper Results</title>
<script type="text/javascript">
// Copy the textarea contents (including any hand edits) into the preview area
function updateIt() {
  var a = document.getElementById("ref");
  var b = document.getElementById("txt");
  a.innerHTML = b.value;
}
</script>
</head>
<body>
Results. All nasty characters removed, some HTML added.
<form name="results" method="POST">
<textarea id="txt" name="txt" cols="80" rows="30"><%=Server.HTMLEncode(destnTxt) %></textarea>
<input type="button" value="Select All" onClick="this.form.txt.focus();this.form.txt.select();">
| <a href="tag_stripper.asp">[Go back and do another]</a>
</form>
<b>Preview of how it will look on website.</b>
<form name="refresh">
<input type="button" value="Refresh this" onClick="updateIt()">
</form>
<div id="ref">
<% Response.Write destnTxt %>
</div>
</body>
</html>

This is a pretty nifty script. I've been trying to modify it so that it will actually search and replace text in an html document when activated. I was wondering if you had any tips for that.

If you remove the lines that replace line feeds and carriage returns with <br> tags, it should leave the HTML intact, but with the smart characters still stripped as usual. Is that what you mean?
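
As a rough illustration of that suggestion, here is a minimal JavaScript sketch in which the line-break step is optional, so the same clean-up can be run over existing HTML without scattering extra <br> tags through it. The function name `cleanText` and the `convertLineBreaks` option are made up for this example, only the quote replacements are shown, and it assumes the text is a Unicode string rather than the Chr() codes the ASP version works with.

```javascript
// Hypothetical sketch: smart-quote clean-up with an optional <br> step.
// cleanText and convertLineBreaks are illustrative names, not part of the ASP page.
function cleanText(text, options) {
  // Line breaks are converted unless the caller explicitly switches that off.
  const convertLineBreaks = !(options && options.convertLineBreaks === false);

  // Replace smart single and double quotes with plain ones.
  let out = text
    .replace(/[\u2018\u2019]/g, "'")
    .replace(/[\u201C\u201D]/g, '"');

  if (convertLineBreaks) {
    // Treat CR+LF as one break so it does not produce two <br> tags.
    out = out.replace(/\r\n|\r|\n/g, "<br>");
  }
  return out;
}

const page = "<p>\u201CHello\u201D</p>\n<p>It\u2019s fine</p>";

// Pasted plain text: line breaks become <br>.
console.log(cleanText(page));
// Existing HTML: markup and line breaks are left alone.
console.log(cleanText(page, { convertLineBreaks: false }));
```

With the option switched off, the tags and line breaks in the document pass through untouched and only the smart quotes change.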
