Converting HTML Unicode encoding
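Characters written like "&#24038;&#36793;", starting with &#, are HTML Unicode encoding (numeric character references). &# is followed by a decimal code point, or &#x by a hexadecimal one, and the reference ends with a semicolon. They can be decoded as follows: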
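import re

s = "&#24038;&#36793;"  # decodes to "左边" ("left side")

# Match decimal (&#24038;) or hexadecimal (&#x5DE6;) numeric character references
_ncr = re.compile(r'&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));')

def to_str(s):
    # Replace each reference with the character for its code point (Python 3 str is Unicode, so no charset step)
    return _ncr.sub(lambda m: chr(int(m.group(1), 16) if m.group(1) else int(m.group(2))), s)

print(to_str(s))  # 左边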
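On Python 3.4 or later, the standard library's `html.unescape` does the same decoding and also handles named references such as `&amp;`. The reverse direction can be done with the `xmlcharrefreplace` error handler of `str.encode`. Because 0x5DE6 = 24038, `&#24038;` and `&#x5DE6;` name the same character. The following is a minimal sketch. The helper names `decode_ncr` and `encode_ncr` are hypothetical and used only for illustration:

```python
import html

def decode_ncr(text: str) -> str:
    # Decode decimal, hexadecimal and named HTML character references
    return html.unescape(text)

def encode_ncr(text: str) -> str:
    # Replace every non-ASCII character with a decimal reference such as &#24038;
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(decode_ncr("&#24038;&#36793;"))  # 左边
print(decode_ncr("&#x5DE6;&#x8FB9;"))  # 左边 (hexadecimal form)
print(encode_ncr("左边"))              # &#24038;&#36793;
```

`encode_ncr` only rewrites non-ASCII characters and leaves `&`, `<` and `>` untouched. If the result will be embedded in HTML, run it through `html.escape` first.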