Final Project Report - December 12, 2008:
The seam carving approach produces satisfactory results for images that do not have a specific structure or layout that is critical to understanding them correctly (e.g., a person's portrait, a scene in nature). However, in the case of documents, the layout and the structure of the document's components, such as blocks of text and figures, need to be preserved for the document to be properly understood.
This issue poses a significant problem for applying the seam carving technique in a practical application such as a document reader. I experimented with various energy functions and other slight modifications to the fundamental technique, but the end result was the same: simply scaling a line of text uniformly to fit on screen (as a hardware-supported blitting operation) is much faster to perform and produces a more readable result than the seam carving technique and its variants. We show a few results below.
[Figure panels: seam carving; seam carving (variant); uniform scaling (blitting)]
Given these results, I settled for using the seam carving technique as a custom image viewer in the application (or alternatively, as an interactive caricature-maker).
Implementing a Document Viewer:
I still had specific interest in creating an improved (or at least, different) document viewer. There are any number of creative approaches one might take that produce a more usable application than those commercially available.
I decided that if the goal was to produce software that is more practical for document viewing than what is provided commercially, my approach to salient lenses would incorporate the following:
Document types such as PDF (Portable Document Format) are not raster-based formats. The application's first step when loading such a document type would be to convert it to a raster format for internal use, at a resolution sufficient for reading when viewed pixel for pixel. This step could potentially be performed on a desktop machine before the document is uploaded to the device. However, I am confident that performing the conversion on the device is not just feasible but can be performed in a reasonable amount of time (a matter of seconds, as demonstrated with the ClearVue PDF viewer).
This step would be a "todo" list item before releasing the software as a fully-usable PDF viewer to the public. During my project demonstration, the part where I show the software working on the device to view the first page of a PDF document was with a pre-processed rasterized version.
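As a rough illustration of what "a resolution sufficient for reading" implies for storage, the arithmetic can be sketched as follows. The 150 dpi figure, the US Letter page size, and the 16-bit pixel format are assumptions chosen for illustration; none of them comes from the project itself. The only fixed fact used is that the default PDF unit is 1/72 inch.

```python
POINTS_PER_INCH = 72  # default PDF user-space unit: 1 point = 1/72 inch


def raster_size(page_width_pt, page_height_pt, dpi):
    """Return (width, height) in pixels for a page rendered at `dpi`."""
    width_px = round(page_width_pt * dpi / POINTS_PER_INCH)
    height_px = round(page_height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


def raster_bytes(width_px, height_px, bytes_per_pixel=2):
    """Uncompressed size of the raster; 2 bytes/pixel is an assumed format."""
    return width_px * height_px * bytes_per_pixel


# Hypothetical example: a US Letter page (612 x 792 points) at 150 dpi.
w, h = raster_size(612, 792, 150)
print(w, h, raster_bytes(w, h))
```

Under these assumptions a single page is about 4 MB uncompressed, which suggests rasterizing one page (or one region) at a time rather than the whole document.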
Extracting and Reordering Rectilinear Regions:
Once the document is rasterized, the next step is to determine its rectilinear regions (e.g., titles, section headings, figures, individual lines of text) and reorder them into a single-column format. This step is non-trivial, but it is especially beneficial for documents laid out in two or more columns, which are very common. By restructuring the document this way, the document will be scaled to fit on screen on a per-column basis, and will thus be more readable (horizontally wider) when in the periphery of the salient lens. I have created a figure below to illustrate this extraction and reordering process.
Rectilinear regions in the document are first identified, and their relative spatial positions are used to determine an appropriate reordering. Each individual region is then scaled horizontally to fit fully onto the device's screen, in a single-column format. This formatting already makes the document significantly more readable despite the 320-pixel horizontal resolution.
This is another "todo" item if the software were to be released. To have something ready for the presentation, I simply hard-coded the values that describe the rectilinear regions and their correct reordering in the single-column result.
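The reordering and per-region scaling described above can be sketched as follows. This is a minimal illustration, not the project's code: the `Region` type, the rule of assigning a region to a column by its left edge, and the sample coordinates are all hypothetical. The report does not say whether region heights are scaled as well, so this sketch leaves them unchanged and computes only the horizontal scale factor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    x: int       # left edge on the rasterized page, in pixels
    y: int       # top edge on the rasterized page, in pixels
    width: int
    height: int


def layout_single_column(regions, page_width, screen_width=320, columns=2):
    """Reorder regions column by column and stack them vertically.

    Assumption: a region belongs to the column containing its left edge,
    and regions are read top to bottom within a column.
    Returns a list of (name, top, horizontal_scale) tuples.
    """
    def column_of(region):
        return min(columns - 1, region.x * columns // page_width)

    ordered = sorted(regions, key=lambda r: (column_of(r), r.y))
    placed, top = [], 0
    for r in ordered:
        scale = screen_width / r.width  # horizontal fit to the screen
        placed.append((r.name, top, scale))
        top += r.height                 # heights left unscaled (assumption)
    return placed


# Hypothetical two-column page, 1200 pixels wide.
page = [
    Region("right1", 620, 200, 480, 30),
    Region("title", 100, 50, 1000, 80),
    Region("left2", 100, 240, 480, 30),
    Region("left1", 100, 200, 480, 30),
]
for name, top, scale in layout_single_column(page, page_width=1200):
    print(name, top, round(scale, 3))
```

A left-edge rule is only one possible heuristic. It handles a full-width title above two columns, but a real extractor would need to deal with figures that span columns partway down a page.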
As the document is transformed into a single-column format that can be observed in its horizontal entirety, the only user interface elements needed are those for navigating vertically and for closing the application. I implemented a scroll bar on the right side which provides a visual affordance of the user's current view position relative to the entire document. The user may use the stylus to interact with the scroll bar: dragging the bar allows quick navigation to a specific point in the document, while dragging on the document directly moves the view up and down over shorter distances. The thumbwheel or the up and down arrow keys can also be used to move up or down shorter distances. A red "X" icon at the upper right can be clicked at any time to terminate the application.
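The relationship between the scroll bar and the view position can be sketched as a pair of proportional mappings. The function names, the integer arithmetic, and the 240-pixel view and track heights in the example are assumptions for illustration; the report does not describe how the scroll bar was computed.

```python
def thumb_geometry(doc_height, view_height, track_height, scroll_y):
    """Return (thumb_top, thumb_height) in track pixels for a scroll offset."""
    if doc_height <= view_height:
        return 0, track_height  # whole document visible: thumb fills the track
    thumb_height = max(1, track_height * view_height // doc_height)
    max_scroll = doc_height - view_height
    scroll_y = max(0, min(scroll_y, max_scroll))
    thumb_top = (track_height - thumb_height) * scroll_y // max_scroll
    return thumb_top, thumb_height


def scroll_from_thumb(doc_height, view_height, track_height, thumb_top):
    """Inverse mapping: scroll offset for a dragged thumb position."""
    if doc_height <= view_height:
        return 0
    thumb_height = max(1, track_height * view_height // doc_height)
    travel = track_height - thumb_height
    if travel <= 0:
        return 0
    thumb_top = max(0, min(thumb_top, travel))
    return (doc_height - view_height) * thumb_top // travel


# Hypothetical sizes: 2400-pixel document, 240-pixel view and track.
print(thumb_geometry(2400, 240, 240, 1080))
print(scroll_from_thumb(2400, 240, 240, 108))
```

Dragging on the document itself would instead change the scroll offset by the stylus displacement directly, which is why it suits shorter movements.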
We have implemented a lens that moves across each line (or rectilinear region) extracted from the document (which may be text, a figure, etc.) and performs a local horizontal stretch. When the lens reaches the end of a line, it smoothly scrolls the vertical viewing position downward to the next line, such that each active line of text is centered vertically on the screen. We call this lens the dynamic reading lens.
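The lens movement and the vertical centering described above can be sketched as three small functions. This is an illustrative sketch only: the function names, the per-frame `speed`, and the easing rule (moving a fixed fraction of the remaining distance each frame) are assumptions, since the report says only that the scroll is smooth.

```python
def centered_scroll(line_top, line_height, view_height, doc_height):
    """Scroll offset that centers a line vertically, clamped to the document."""
    target = line_top + line_height // 2 - view_height // 2
    return max(0, min(target, max(0, doc_height - view_height)))


def step_scroll(current, target, fraction=0.25):
    """One animation frame: move part of the way to the target, then snap."""
    if abs(target - current) <= 1:
        return target
    return current + (target - current) * fraction


def advance_lens(line_index, lens_x, line_width, num_lines, speed):
    """Move the lens right by `speed`; wrap to the start of the next line."""
    lens_x += speed
    if lens_x >= line_width and line_index + 1 < num_lines:
        return line_index + 1, 0
    return line_index, min(lens_x, line_width)


# Hypothetical frame loop: when the lens wraps to a new line, ease the view
# toward the offset that centers that line.
scroll = 0.0
target = centered_scroll(line_top=500, line_height=20,
                         view_height=240, doc_height=2400)
while scroll != target:
    scroll = step_scroll(scroll, target)
print(scroll)
```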
Since the effect of the lens is less noticeable when not animating over time, the figures show the lens as it appears normally on the left, and we show it brighter than the surrounding document on the right.
Within the bounds of the lens (a tunable horizontal distance which we call the lens's radius), a specific portion of the document can be viewed in full detail. The figure below details how the source rectilinear region is scaled to fit within the target screen's horizontal bounds.
In our implementation, we arbitrarily assign lensWidth a value of 120. This value may in fact be best defined using a linear relationship with the width of the source region, for example, lensWidth=lineWidth/4, or the width of the target region, lensWidth=screenWidth/3.0, or even some non-linear combination of these terms. Proper justification for the assignment of the lensWidth value may result from performing a user study.
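One way to realize such a mapping is a piecewise-linear function: source pixels inside the lens are shown at full detail (1:1), and the rest of the region is compressed uniformly into the remaining screen width. This is a sketch under stated assumptions, not necessarily the mapping used in the project. The function name `lens_map`, the piecewise-linear form, and the treatment of `lens_width` as the full width of the 1:1 span are all hypothetical.

```python
def lens_map(x, lens_center, line_width, screen_width=320, lens_width=120):
    """Map a source x in [0, line_width] to a screen x in [0, screen_width].

    Inside the lens the scale is 1; outside it, the remaining source pixels
    are compressed uniformly into the remaining screen pixels (assumption).
    """
    if line_width <= screen_width or lens_width >= screen_width:
        return x * screen_width / line_width  # no lens needed: uniform scale
    # Compression factor applied outside the lens.
    k = (screen_width - lens_width) / (line_width - lens_width)
    # Left edge of the lens in source coordinates, kept inside the region.
    a = min(max(lens_center - lens_width / 2, 0), line_width - lens_width)
    if x < a:
        return x * k
    if x <= a + lens_width:
        return a * k + (x - a)
    return a * k + lens_width + (x - a - lens_width) * k


# Hypothetical 920-pixel-wide line with the lens centered at x = 460.
for x in (0, 400, 460, 520, 920):
    print(x, lens_map(x, lens_center=460, line_width=920))
```

With this form, the candidate rules mentioned above amount to passing `lens_width=line_width / 4` or `lens_width=screen_width / 3.0` in place of the constant 120. A larger lens leaves fewer screen pixels for the periphery, so it compresses the rest of the line more strongly.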
Overall, I feel significant effort and progress was made toward the development of an application for viewing documents such as PDF that is superior in ease of use and readability to what is commercially available. Two implementation points mentioned previously still require further attention: rasterization of specific document types, and extraction of the rectilinear regions. Aside from these, we have developed a solution and demonstrated a software application that shows potential for overcoming the usability problems associated with using document viewer applications on devices with low resolution displays.
Less of an achievement, but still a lesson, was that the seam carving approach was unsuitable for adapting raster images whose content is laid out with a specific structure. As the graphics relating to the practical, intended uses of the device most commonly possess a rigid, specific structure, they do not see improvement from the use of the seam carving approach, which seeks only to preserve regions of high gradient magnitude.
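For reference, the core of the seam carving technique can be sketched in a few lines. This is a generic illustration of the standard approach (a gradient-magnitude energy map and a dynamic-programming search for the cheapest vertical seam), not the code or the specific energy functions used in this project. The forward-difference energy and the tiny grayscale test image are assumptions.

```python
def energy(img):
    """Gradient-magnitude energy (|dx| + |dy|) using forward differences."""
    h, w = len(img), len(img[0])
    e = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = img[y][min(x + 1, w - 1)] - img[y][x]
            dy = img[min(y + 1, h - 1)][x] - img[y][x]
            e[y][x] = abs(dx) + abs(dy)
    return e


def min_vertical_seam(e):
    """Return one x per row describing the lowest-energy connected seam."""
    h, w = len(e), len(e[0])
    cost = [row[:] for row in e]
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 1, w - 1)
            cost[y][x] += min(cost[y - 1][lo:hi + 1])
    # Backtrack from the cheapest cell in the bottom row.
    x = min(range(w), key=lambda i: cost[h - 1][i])
    seam = [x]
    for y in range(h - 1, 0, -1):
        lo, hi = max(x - 1, 0), min(x + 1, w - 1)
        x = min(range(lo, hi + 1), key=lambda i: cost[y - 1][i])
        seam.append(x)
    seam.reverse()
    return seam


def remove_seam(img, seam):
    """Delete one pixel per row, narrowing the image by one column."""
    return [row[:x] + row[x + 1:] for row, x in zip(img, seam)]


# Hypothetical 3x5 grayscale image with one bright vertical stroke.
image = [[0, 0, 9, 0, 0] for _ in range(3)]
seam = min_vertical_seam(energy(image))
print(seam, remove_seam(image, seam))
```

The seam avoids the high-gradient stroke and removes a flat column instead. Applied repeatedly to a line of text, this removes the spacing between and around glyphs while the glyphs themselves stay the same size, which is consistent with the loss of layout described above.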