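Lately I noticed that some of the RSS feeds I fetch come back malformed.
I asked our resident guru, 部落小波, and he pointed me to the [HTML Tidy Library Project].
(部落小波 really does know a lot~)
Tidy can repair markup that has missing or extra tags.
Many blog editors now let you edit the HTML source directly,
and people sometimes leave that source in a mess. That is where Tidy comes in handy:
it fixes those errors for you.
Luckily, PHP supports tidy as well.
Installation goes as follows (borrowed from 部落小波's post):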
(1) Install tidy
I use SuSE, so I tracked down the two RPMs, libtidy and libtidy_devel, and installed them.
(2) Build the tidy extension into PHP
./configure --with-tidy=/path/to/libtidy
That is, add --with-tidy to your existing build options.
(3) Once the build finishes, you are ready to use it.
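After step (3), a quick way to confirm the rebuild actually picked up tidy is to ask PHP itself. This is only a minimal sketch: has_tidy_support() is a name I made up for illustration, while extension_loaded(), class_exists() and tidy_get_release() are standard PHP functions.

```php
<?php
// Hypothetical helper (not part of PHP or tidy): report whether this PHP
// build has the tidy extension compiled in and its OO API available.
function has_tidy_support(): bool
{
    return extension_loaded('tidy') && class_exists('tidy');
}

if (has_tidy_support()) {
    // tidy_get_release() returns the release string of the linked libtidy.
    echo "tidy is available (libtidy release: " . tidy_get_release() . ")\n";
} else {
    echo "tidy is missing; rebuild PHP with --with-tidy\n";
}
```

If the second message shows up, the usual suspects are a configure run that did not find libtidy, or a web server still running the old PHP binary.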
Here are some examples.
First, repairing XML.
I deleted every tag after the text 「醜醜風的老頭~~」:
<?php
header("Content-Type:text/xml");
ob_start();
?>
<rss version="2.0">
<channel>
<title>My Lief</title>
<link>http://blog.xuite.net/chingwei/blog</link>
<description>爽快過生活!!</description>
<item>
<title>【圖】畫了個醜醜風的老頭</title>
<link>http://blog.xuite.net/chingwei/blog/23691532</link>
<description>醜醜風的老頭~~
<?php
$buffer = ob_get_clean();
$tidy_options = array(
    'input-xml'  => true,
    'output-xml' => true,
    'indent'     => true,
    'wrap'       => false,
);
$tidy = new tidy();
$tidy->parseString($buffer, $tidy_options, 'utf8');
$tidy->cleanRepair();
echo $tidy;
?>
In the output, it filled in all the missing tags for me. Nice.
<rss version="2.0">
  <channel>
    <title>My Lief</title>
    <link>http://blog.xuite.net/chingwei/blog</link>
    <description>爽快過生活!!</description>
    <item>
      <title>【圖】畫了個醜醜風的老頭</title>
      <link>http://blog.xuite.net/chingwei/blog/23691532</link>
      <description>醜醜風的老頭~~</description>
    </item>
  </channel>
</rss>
Next, I appended an unclosed <Error tag after 「醜醜風的老頭~~」:
<?php
header("Content-Type:text/xml");
ob_start();
?>
<rss version="2.0">
<channel>
<title>My Lief</title>
<link>http://blog.xuite.net/chingwei/blog</link>
<description>爽快過生活!!</description>
<item>
<title>【圖】畫了個醜醜風的老頭</title>
<link>http://blog.xuite.net/chingwei/blog/23691532</link>
<description>醜醜風的老頭~~<Error
<?php
$buffer = ob_get_clean();
$tidy_options = array(
    'input-xml'  => true,
    'output-xml' => true,
    'indent'     => true,
    'wrap'       => false,
);
$tidy = new tidy();
$tidy->parseString($buffer, $tidy_options, 'utf8');
$tidy->cleanRepair();
echo $tidy;
?>
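This time the result contains an extra Error tag.
That is not what I wanted, though tidy's handling is arguably not wrong.
I can't ask for too much; it is already very capable.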
<rss version="2.0">
  <channel>
    <title>My Lief</title>
    <link>http://blog.xuite.net/chingwei/blog</link>
    <description>爽快過生活!!</description>
    <item>
      <title>【圖】畫了個醜醜風的老頭</title>
      <link>http://blog.xuite.net/chingwei/blog/23691532</link>
      <description>醜醜風的老頭~~ <Error></Error></description>
    </item>
  </channel>
</rss>
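Since tidy will happily "repair" a stray <Error into a real element, it may be worth looking at what it complained about before trusting the output. The sketch below is only one possible approach: repair_xml_with_report() is a hypothetical helper of mine, while tidy_error_count(), tidy_warning_count(), tidy_get_output() and the errorBuffer property are part of PHP's tidy extension. Which messages and counts you actually get depends on the libtidy version, so I am not showing any expected output.

```php
<?php
// Hypothetical helper (not part of the tidy API): repair an XML string and
// return tidy's diagnostics alongside the repaired output.
function repair_xml_with_report(string $xml): array
{
    $options = [
        'input-xml'  => true,
        'output-xml' => true,
        'indent'     => true,
        'wrap'       => 0, // 0 disables line wrapping
    ];

    $tidy = new tidy();
    $tidy->parseString($xml, $options, 'utf8');
    $tidy->cleanRepair();

    return [
        'output'   => tidy_get_output($tidy),
        'errors'   => tidy_error_count($tidy),
        'warnings' => tidy_warning_count($tidy),
        'messages' => (string) $tidy->errorBuffer, // empty string if nothing was reported
    ];
}

// Only run the demo when the extension is present.
if (extension_loaded('tidy')) {
    $broken = '<rss version="2.0"><channel><title>t</title><description>x<Error';
    $report = repair_xml_with_report($broken);
    if ($report['errors'] > 0 || $report['warnings'] > 0) {
        // Something had to be repaired; log the details for inspection.
        error_log($report['messages']);
    }
}
```

A caller could then decide to skip or flag a feed whenever the counts are non-zero, instead of publishing whatever tidy guessed.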
Below is a sample from the official PHP site that repairs HTML (tidy_repair_string):
<?php
ob_start();
?>
<html>
<head>
<title>test</title>
</head>
<body>
<p>error</i>
</body>
</html>
<?php
$buffer = ob_get_clean();
$tidy = tidy_repair_string($buffer);
echo $tidy;
?>
It fixed both the missing tag and the wrong one:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title>test</title>
</head>
<body>
<p>error</p>
</body>
</html>
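The official sample runs with tidy's defaults, which is why it wraps everything in a full document with a DOCTYPE. For a snippet pulled out of a feed item you probably only want the repaired fragment back. tidy_repair_string() accepts an options array and an encoding as its second and third arguments, so something along these lines should work; repair_html_fragment() is a name I invented, and the option choices are just my guess at sensible settings.

```php
<?php
// Hypothetical helper: repair an HTML fragment and return only the body
// content, without the surrounding <html>/<head>/<body> wrapper.
function repair_html_fragment(string $html): string
{
    $options = [
        'show-body-only' => true, // emit only the contents of <body>
        'output-xhtml'   => true, // close every tag, XHTML style
        'wrap'           => 0,    // 0 disables line wrapping
    ];

    $result = tidy_repair_string($html, $options, 'utf8');

    // tidy_repair_string() returns false on failure; fall back to the input.
    return $result === false ? $html : $result;
}

// Only run the demo when the extension is present.
if (extension_loaded('tidy')) {
    echo repair_html_fragment('<p>error</i>');
}
```

Falling back to the original string on failure is a design choice for this sketch; depending on where the markup ends up, rejecting the input outright may be safer.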
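PS.
I rewrote this post three times, and this editor is driving me crazy. Whenever I press Ctrl+V, it replaces the < characters in my textarea. Argh~ So I will not be updating this post again.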
Appendix: HTML Tidy Configuration Options
HTML, XHTML, XML Options
Option | Type | Default |
add-xml-decl | Boolean | no |
add-xml-space | Boolean | no |
alt-text | String | - |
anchor-as-name | Boolean | yes |
assume-xml-procins | Boolean | no |
bare | Boolean | no |
clean | Boolean | no |
css-prefix | String | - |
decorate-inferred-ul | Boolean | no |
doctype | DocType | auto |
drop-empty-paras | Boolean | yes |
drop-font-tags | Boolean | no |
drop-proprietary-attributes | Boolean | no |
enclose-block-text | Boolean | no |
enclose-text | Boolean | no |
escape-cdata | Boolean | no |
fix-backslash | Boolean | yes |
fix-bad-comments | Boolean | yes |
fix-uri | Boolean | yes |
hide-comments | Boolean | no |
hide-endtags | Boolean | no |
indent-cdata | Boolean | no |
input-xml | Boolean | no |
join-classes | Boolean | no |
join-styles | Boolean | yes |
literal-attributes | Boolean | no |
logical-emphasis | Boolean | no |
lower-literals | Boolean | yes |
merge-divs | AutoBool | auto |
merge-spans | AutoBool | auto |
ncr | Boolean | yes |
new-blocklevel-tags | Tag names | - |
new-empty-tags | Tag names | - |
new-inline-tags | Tag names | - |
new-pre-tags | Tag names | - |
numeric-entities | Boolean | no |
output-html | Boolean | no |
output-xhtml | Boolean | no |
output-xml | Boolean | no |
preserve-entities | Boolean | no |
quote-ampersand | Boolean | yes |
quote-marks | Boolean | no |
quote-nbsp | Boolean | yes |
repeated-attributes | enum | keep-last |
replace-color | Boolean | no |
show-body-only | AutoBool | no |
uppercase-attributes | Boolean | no |
uppercase-tags | Boolean | no |
word-2000 | Boolean | no |
Diagnostics Options
Option | Type | Default |
accessibility-check | enum | 0 (Tidy Classic) |
show-errors | Integer | 6 |
show-warnings | Boolean | yes |
Pretty Print Options
Option | Type | Default |
break-before-br | Boolean | no |
indent | AutoBool | no |
indent-attributes | Boolean | no |
indent-spaces | Integer | 2 |
markup | Boolean | yes |
punctuation-wrap | Boolean | no |
sort-attributes | enum | none |
split | Boolean | no |
tab-size | Integer | 8 |
vertical-space | Boolean | no |
wrap | Integer | 68 |
wrap-asp | Boolean | yes |
wrap-attributes | Boolean | no |
wrap-jste | Boolean | yes |
wrap-php | Boolean | yes |
wrap-script-literals | Boolean | no |
wrap-sections | Boolean | yes |
Character Encoding Options
Option | Type | Default |
ascii-chars | Boolean | no |
char-encoding | Encoding | ascii |
input-encoding | Encoding | latin1 |
language | String | - |
newline | enum | Platform dependent |
output-bom | AutoBool | auto |
output-encoding | Encoding | ascii |
Miscellaneous Options
Option | Type | Default |
error-file | String | - |
force-output | Boolean | no |
gnu-emacs | Boolean | no |
gnu-emacs-file | String | - |
keep-time | Boolean | no |
output-file | String | - |
quiet | Boolean | no |
slide-style | String | - |
tidy-mark | Boolean | yes |
write-back | Boolean | no |
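The defaults listed above come from the Tidy documentation; the libtidy you have installed (or a tidy.default_config set in php.ini) may differ. A rough way to check what your own setup uses is to read the configuration back from a tidy object with getConfig(). In this sketch tidy_option_values() is a hypothetical helper, and parsing a trivial document first is just a precaution on my part.

```php
<?php
// Hypothetical helper: return the effective values of selected tidy options
// as reported by the installed libtidy.
function tidy_option_values(array $names): array
{
    $tidy = new tidy();
    $tidy->parseString('<p></p>'); // parse a trivial document so the object is initialized
    $config = $tidy->getConfig();  // associative array: option name => current value

    $values = [];
    foreach ($names as $name) {
        // Options this libtidy version does not know are reported as null.
        $values[$name] = $config[$name] ?? null;
    }
    return $values;
}

// Only run the demo when the extension is present.
if (extension_loaded('tidy')) {
    var_export(tidy_option_values(['wrap', 'indent-spaces', 'tidy-mark', 'input-xml']));
}
```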