This portion of the Fit spe
Contents:
Leading and Trailing Whitespace
Fit parses the tables from HTML do
fat.Do |
|
HTML |
Stru |
<table> <tr><td>1</td></tr> </table> |
[1] |
<table> <tr><td>1</td> <td>2</td></tr> <tr><td>3</td> <td>4</td></tr> </table> |
[1] [2] [3] [4] |
<table> <tr><td>1</td> <td>2</td></tr> <tr><td>3</td> <td>4</td></tr> </table> <table> <tr><td>5</td></tr> <tr><td>6</td></tr> </table> |
[1] [2] [3] [4] ---- [5] [6] |
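The mapping from HTML to table structure shown in the examples above can be sketched in Python. This is an illustration only, not Fit's own parser: `TableStructure` and `table_structure` are hypothetical names, the sketch relies on the standard-library `html.parser` module, and it is deliberately lenient (it does not report errors for missing tags and does not handle nested tables).

```python
from html.parser import HTMLParser


class TableStructure(HTMLParser):
    """Collects tables as lists of rows, each row a list of cell texts.

    Hypothetical helper for illustration; not part of Fit.
    """

    def __init__(self):
        super().__init__()
        self.tables = []    # one entry per <table>
        self._cell = None   # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag == "td" and self.tables and self.tables[-1]:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.tables[-1][-1].append("".join(self._cell))
            self._cell = None

    def handle_data(self, data):
        # Text outside a cell (before, after, or between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def table_structure(html_text):
    parser = TableStructure()
    parser.feed(html_text)
    parser.close()
    return parser.tables
```

Each table becomes a list of rows, so the two-table example yields two nested lists rather than the bracketed rendering used in the tables of this document.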
Everything but table stru
fat.Do |
|
|
HTML |
Stru |
Output() |
<HTML> <body>Text before table... <table> <tr><td>1</td></tr> </table> Text after table...</body> </HTML> |
[1] |
<HTML> <body>Text before table... <table> <tr><td>1</td></tr> </table> Text after table...</body> </HTML> |
<table> Text in table <tr> Text in row <td>Text in more row </tr> more table</table> |
[Text in |
<table> Text in table <tr> Text in row <td>Text in more row </tr> more table</table> |
<table <tr attribute=”yes”><td align=”top”>Cell</td></tr> </table> |
[Cell] |
<table <tr attribute=”yes”><td align=”top”>Cell</td></tr> </table> |
Even whitespa
fat.Do |
|
|
HTML |
Stru |
Output() |
<HTML><body><table> <tr><td>1</td></tr> </table></body></HTML> |
[1] |
<HTML><body><table> <tr><td>1</td></tr> </table></body></HTML> |
<HTML> <body> <table> <tr> <td>1</td> </tr> </table> </body> </HTML> |
[1] |
<HTML> <body> <table> <tr> <td>1</td> </tr> </table> </body> </HTML> |
The
fat.Do |
|
HTML |
Stru |
<table> <tr><td>1</td></tr> <tr><td>2</td> <td>3</td> <td>4</td></tr> <tr><td>5</td> <td>6</td></tr> </table> |
[1] [2] [3] [4] [5] [6] |
<table> <tr><td rowspan=2>1</td> <td>2</td> <td>3</td></tr> <tr><td
</table> |
[1] [2] [3] [4] [5] |
Tables that are missing “table,” “tr,” or “td” tags generate an error.
fat.Do |
|
|
|
HTML |
Stru |
Output() |
Note |
<table> <tr><td>1</td> </table> |
error |
error |
no ending </tr> tag |
<tr><td>1</td></tr> |
error |
error |
no <table> tag |
<table> <td>1</td> </table> |
error |
error |
no <tr> tag |
<table> <tr><td>1</tr> </table> |
error |
error |
no ending </td> tag |
Tables
fat.Do |
|
HTML |
Stru |
<table> <tr><td>1</td></tr>
<table> <tr><td>2</td></tr> </table> |
error |
<table> <tr><td>1</td></tr>
<tr> <tr><td>2</td></tr> </table> |
error |
<table> <tr><td>1</td><td></tr> </table> |
error |
However, ex
fat.Do |
|
HTML |
Stru |
<table> <tr><td>1</td></td></tr> <tr><td>2</td></tr> </tr> </table> </table> |
[1] [2] |
HTML mistakes that aren’t related to tables are ignored.
fat.Do |
|
HTML |
Stru |
<table> <tr><badTag...<td>1</td></tr> </table> |
[1] |
Fixtures (des
Fixtures
fat.TableParseFixture |
|
|
|
HTML |
Row |
Column |
CellBody() |
<table> <tr><td>top left</td><td>top right</td></tr> <tr><td>bottom left</td><td>bottom right</td></tr> </table> |
1 |
1 |
top left |
|
1 |
2 |
top right |
|
2 |
1 |
bottom left |
|
2 |
2 |
bottom right |
When fixtures look at the
fat.TableParseFixture |
|
|
|
HTML |
Row |
Column |
CellBody() |
<table> <tr><td>text with a <tag /></td></tr> </table> |
1 |
1 |
text with a <tag /> |
Fixtures
fat.TableParseFixture |
|
|
|
HTML |
Row |
Column |
CellTag() |
<table> <tr><td align=”top”>text</td></tr> </table> |
1 |
1 |
<td align=”top”> |
This applies to row tags as well...
fat.TableParseFixture |
|
|
HTML |
Row |
RowTag() |
<table> <tr bg <tr bg </table> |
1 |
<tr bg |
|
2 |
<tr bg |
...and even to table tags.
fat.TableParseFixture |
|
HTML |
TableTag() |
<table border=”1”> <tr><td>text</td></tr> </table> |
<table border=”1”> |
Fit implementations may provide fun
Fixtures may ask Fit to
These spe
fat.HtmlToTextFixture |
|
HTML |
Text() |
&amp; |
& |
(&nbsp;) |
( ) |
&lt; |
< |
&gt; |
> |
&quot; |
" |
The non-breaking spa
fat.HtmlToTextFixture |
|
HTML |
Text() |
(\u00a0) |
( ) |
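The entity and non-breaking-space conversions in the two tables above can be approximated in Python. This is a sketch under stated assumptions: `unescape_entities` is a hypothetical name, and the standard-library `html.unescape` decodes many more entities than a Fit implementation is required to handle.

```python
import html


def unescape_entities(text):
    """Decode HTML entities, then turn non-breaking spaces into plain spaces.

    Hypothetical helper for illustration; not part of Fit.
    """
    # html.unescape turns &amp; &lt; &gt; &quot; into their characters
    # and &nbsp; into U+00A0.
    decoded = html.unescape(text)
    # Assumption drawn from the tables above: U+00A0 is reported as an
    # ordinary space.
    return decoded.replace("\u00a0", " ")
```

For example, `unescape_entities("(&nbsp;)")` and `unescape_entities("(\u00a0)")` both give `"( )"`.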
Non-ASCII
fat.HtmlToTextFixture |
|
HTML |
Text() |
ń |
ń |
Line break tags are
fat.HtmlToTextFixture |
|
HTML |
Text() |
intentional<br>line-break |
intentional\nline-break |
another form<br />of line-break |
another form\nof line-break |
yet<br/>more<br />forms< br / > |
yet\nmore\nforms\n |
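The line-break forms listed above can all be matched with one regular expression. The following Python sketch is illustrative only; `BR_TAG` and `convert_line_breaks` are hypothetical names, and matching the tag case-insensitively is an assumption not stated in the table.

```python
import re

# Matches <br>, <br/>, <br /> and spaced-out variants such as "< br / >".
BR_TAG = re.compile(r"<\s*br\s*/?\s*>", re.IGNORECASE)


def convert_line_breaks(html_text):
    """Replace every line-break tag with a newline character."""
    return BR_TAG.sub("\n", html_text)
```

Tags that merely start with `br` are left alone, because the pattern allows only whitespace and an optional slash after the tag name.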
Fit has a few spe
“Smart quotes” are
fat.HtmlToTextFixture |
|
HTML |
Text() |
“double-quotes” |
"double-quotes" |
‘single quotes’ |
'single quotes' |
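The smart-quote handling shown above amounts to a character-for-character substitution. A minimal Python sketch, with the hypothetical names `SMART_QUOTES` and `straighten_quotes`:

```python
# Maps the four typographic quote characters to their ASCII equivalents.
SMART_QUOTES = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})


def straighten_quotes(text):
    """Return text with smart quotes replaced by plain ASCII quotes."""
    return text.translate(SMART_QUOTES)
```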
Word’s use of paragraph tags for line breaks is supported.
fat.HtmlToTextFixture |
|
HTML |
Text() |
<p>Line breaks</p> <p>in Word</p> |
Line breaks\nin Word |
<p>Another</p><p
|
Another\nform |
<p>Don’t think every tag that</p> <poe>starts
with ‘p’ is a paragraph</poe> |
Don’t think every tag that starts with ‘p’ is a paragraph |
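One way to reproduce the paragraph examples above is to treat the boundary between a closing `</p>` and the next opening `<p>` as a line break and then drop every remaining tag. The Python sketch below rests on that assumption, which is inferred from the examples in this document rather than taken from a Fit implementation; `paragraphs_to_text` and the two pattern names are hypothetical.

```python
import re

# A closing </p>, optional whitespace, then an opening <p> (attributes allowed).
# The tag name must be exactly "p", so <poe> does not match.
PARAGRAPH_BREAK = re.compile(r"<\s*/\s*p\s*>\s*<\s*p(\s[^>]*)?>", re.IGNORECASE)
# Any remaining tag, which is simply dropped.
OTHER_TAG = re.compile(r"<[^>]*>")


def paragraphs_to_text(html_text):
    """Turn paragraph boundaries into newlines and strip all other tags."""
    text = PARAGRAPH_BREAK.sub("\n", html_text)
    return OTHER_TAG.sub("", text)
```

Because only the boundary between two paragraphs becomes a newline, an empty first or last paragraph produces a leading or trailing line break.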
Leading and trailing whitespa
fat.HtmlToTextFixture |
|
HTML |
Text() |
spa |
spa |
blank lines |
blank lines |
tabs |
tabs |
The entity and
non-breaking spa
fat.HtmlToTextFixture |
|
HTML |
Text() |
a |
a |
a |
a |
\u00a0 a \u00a0 |
a |
Leading and trailing line breaks are not removed. (Line breaks are
fat.HtmlToTextFixture |
|
HTML |
Text() |
<br />a |
\na |
<p></p><p>a</p> |
\na |
a<br /> |
a\n |
<p>a</p><p></p> |
a\n |
Whitespa
fat.HtmlToTextFixture |
|
HTML |
Text() |
<br /> a <br /> |
\n a \n |
Tags other than line-break tags are ignored.
Leading and trailing whitespa
fat.HtmlToTextFixture |
|
HTML |
Text() |
<ignored> a <tags /> |
a |
Adjoining whitespa
fat.HtmlToTextFixture |
|
HTML |
Text() |
1 + 2 |
1 + 2 |
1 <tag /> 2 |
1 2 |
1 2 |
1 2 |
1 \u00a0\u00a0\u00a02 |
1 2 |
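The whitespace rules illustrated above (trim the ends, reduce interior runs to one space) can be sketched in Python as follows. `condense_whitespace` is a hypothetical name, and treating a run of non-breaking spaces like any other whitespace run is an assumption based on the table as it appears here.

```python
import re

# Any run of whitespace; U+00A0 is listed explicitly for clarity.
WHITESPACE_RUN = re.compile(r"[\s\u00a0]+")


def condense_whitespace(text):
    """Collapse whitespace runs to a single space and trim both ends."""
    return WHITESPACE_RUN.sub(" ", text).strip()
```

Since leading and trailing line breaks are preserved according to this document, a function like this would have to run before line-break tags are turned into newline characters.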
Other HTML markup is ignored.
fat.HtmlToTextFixture |
|
HTML |
Text() |
<b>text</b> |
text |
a more <i> <spell |
a more |
fit.Summary |
|
counts | 70 right, 0 wrong, 0 ignores, 0 exceptions |
input file | spec/input/parse.html |
input update | Sun Apr 17 21:00:14 2005 |
output file | spec/output/parse.html |
run date | Wed Apr 27 16:11:55 2005 |
run elapsed time | 0 wallclock secs ( 0.05 usr + 0.01 sys = 0.06 CPU) |