The GOSH dataset for Web page segmentation is available in HTML format or in Scrapbook format. (The Scrapbook extension is no longer available.)
You can also visit the Web page segmentation repository, which includes segmentations of BoM, VIPS, JavaVIPS (jVIPS), BlockFusion (BF), MIG45 and BoM2.
Related Links:
- Download Scrapbook Firefox extension
- GOSH Collection (Google Search Collection)
- MIG5 Collection (Migration from HTML4 to HTML5)