How to capture text from a PDF file

Capturing text from the Adobe Acrobat application

Important Note:
QuickTest Professional does not have special support for working with Acrobat or .PDF files. The content of this article is provided on an “as is” basis and is not part of QuickTest Professional. It is not guaranteed to work and is not supported by Mercury Customer Support. You are responsible for any and all modifications that may be required.

Please be aware that the steps needed to capture text may change with different versions of Adobe Acrobat Reader.

Acrobat Reader 6.0
QuickTest Professional cannot capture the text from the Acrobat window using its built-in functionality, but it should still be possible to get the text by following these steps:

1. Enable text selection in Acrobat Reader.
2. Select the text you wish to capture.
3. Copy the text to the system clipboard.
4. Use the Clipboard object to retrieve the text.
5. Once the text is in a variable, you can use VBScript string functions (e.g., InStr, Left, Right, Mid, Split) to parse through the string and get information out of it.
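
As a minimal sketch of step 5, the following shows how InStr, Mid, and Split could be used to pull a value that follows a label. The sample text, the "Invoice Number:" and "Customer:" labels, and the function name GetValueAfterLabel are hypothetical and only illustrate the string functions; it also assumes lines are separated by vbCrLf, which may not hold for every PDF.

```vbscript
' Hypothetical helper: returns the text that follows a label on the same line,
' or an empty string if the label is not found.
Function GetValueAfterLabel(text, label)
    Dim startPos, endPos
    GetValueAfterLabel = ""

    ' Find the label (case-insensitive); InStr returns 0 when it is missing
    startPos = InStr(1, text, label, vbTextCompare)
    If startPos = 0 Then Exit Function

    ' Move past the label, then read up to the end of the line
    startPos = startPos + Len(label)
    endPos = InStr(startPos, text, vbCrLf)
    If endPos = 0 Then endPos = Len(text) + 1

    GetValueAfterLabel = Trim(Mid(text, startPos, endPos - startPos))
End Function

' Sample text standing in for the value retrieved from the clipboard (an assumption)
pdfText = "Invoice Number: 12345" & vbCrLf & "Customer: Jane Doe"

invoiceNumber = GetValueAfterLabel(pdfText, "Invoice Number:")
customerName = GetValueAfterLabel(pdfText, "Customer:")

' Split breaks the text into an array with one element per line
pdfLines = Split(pdfText, vbCrLf)
firstLine = pdfLines(0)
```

With the sample text above, invoiceNumber holds "12345" and customerName holds "Jane Doe". Real clipboard text will need labels and separators chosen to match the layout of the document being read.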

Example:
' This function enables Text Selection in Acrobat Reader 6.0
Public Function AcrobatEnableTextSelection()
    ' Press Alt+T, S, X to enable Text Selection in Acrobat Reader
    Window("regexpwndtitle:=Adobe Reader", "regexpwndclass:=AdobeAcrobat").Activate
    Window("regexpwndtitle:=Adobe Reader", "regexpwndclass:=AdobeAcrobat").Type micAltDwn + "t" + micAltUp
    Window("regexpwndtitle:=Adobe Reader", "regexpwndclass:=AdobeAcrobat").Type "s"
    Window("regexpwndtitle:=Adobe Reader", "regexpwndclass:=AdobeAcrobat").Type "x"
    ' Pause for 500 milliseconds
    Wait 0, 500
End Function

' This function copies the selected text to the system clipboard
Public Function AcrobatCopy(obj)
    ' Copy the selected text to the clipboard
    obj.Type micCtrlDwn + "c" + micCtrlUp
End Function

' This function selects all the text in the PDF file
Public Function AcrobatSelectAll(obj)
    ' Click the object to give it focus, then press Ctrl+A
    obj.Click
    obj.Type micCtrlDwn + "a" + micCtrlUp
End Function

' Selects the text between the specified coordinates. NOTE: The coordinates are relative to the object, not the screen.
Public Function AcrobatSelectPartial(obj, x1, y1, x2, y2)
    Dim ax, ay, sx, sy, ex, ey, DeviceReplay

    ' Calculate the screen coordinates for the text
    ax = obj.GetROProperty("abs_x")
    ay = obj.GetROProperty("abs_y")

    sx = ax + x1
    sy = ay + y1
    ex = ax + x2
    ey = ay + y2

    ' Select the text you wish to copy by dragging with the left mouse button (0)
    Set DeviceReplay = CreateObject("Mercury.DeviceReplay")

    DeviceReplay.MouseMove sx, sy
    DeviceReplay.MouseDown sx, sy, 0
    DeviceReplay.MouseMove ex, ey
    DeviceReplay.MouseUp ex, ey, 0

    Set DeviceReplay = Nothing
End Function

' Register the functions to the appropriate Test Object Classes.
RegisterUserFunc "WinObject", "AcrobatSelectPartial", "AcrobatSelectPartial"
RegisterUserFunc "WinObject", "AcrobatSelectAll", "AcrobatSelectAll"
RegisterUserFunc "WinObject", "AcrobatCopy", "AcrobatCopy"

' Instantiate the Clipboard object
Set cb = CreateObject("Mercury.Clipboard")

' Clear the Clipboard contents
cb.Clear

' Enable the Text Selection option
AcrobatEnableTextSelection()

' Select all the text in the PDF document and copy it to the clipboard.
Window("Adobe Reader").Window("QTP_SWT_Support.pdf").WinObject("AVPageView").AcrobatSelectAll
Window("Adobe Reader").Window("QTP_SWT_Support.pdf").WinObject("AVPageView").AcrobatCopy

' Get the text from the clipboard using the Clipboard object
pdfText = cb.GetText
MsgBox pdfText

Set cb = Nothing

The above example selects all the text in the PDF file. You can also select only the text that lies between specified coordinates.

Example:
' Select text in a specified location
' (uses the functions defined and registered in the previous example)

' Instantiate the Clipboard object
Set cb = CreateObject("Mercury.Clipboard")

' Clear the Clipboard contents
cb.Clear

' Put focus on the PDF document
Window("Adobe Reader").Window("QTP_SWT_Support.pdf").WinObject("AVPageView").Click 0, 0

' Specify the coordinates
x1 = 170
y1 = 88
x2 = 543
y2 = 155

' Capture the text from within the specified coordinates. These coordinates are relative to the object, not the screen.
Window("Adobe Reader").Window("QTP_SWT_Support.pdf").WinObject("AVPageView").AcrobatSelectPartial x1, y1, x2, y2
Window("Adobe Reader").Window("QTP_SWT_Support.pdf").WinObject("AVPageView").AcrobatCopy

' Get the text from the clipboard using the Clipboard object
pdfText = cb.GetText
MsgBox pdfText

Set cb = Nothing

Note:
If no text is selected by the AcrobatSelectPartial function, check the coordinates you used. If the mouse is not over an area that can be selected, QuickTest Professional will not be able to select the text. You can use Paint to help determine the coordinates; make sure you calculate the coordinates relative to the specific object and not the entire Acrobat window.
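
One way to catch obviously wrong coordinates before replaying the mouse is to compare them with the size of the object. The sketch below is only an illustration: the function name IsInsideObject is hypothetical, and it assumes the object's width and height are known (for example, read with GetROProperty("width") and GetROProperty("height"), which is an assumption about the object under test). A point inside the object is not guaranteed to lie over selectable text.

```vbscript
' Hypothetical helper: returns True when a point given in object-relative
' coordinates falls inside an object of the given width and height.
Function IsInsideObject(x, y, objWidth, objHeight)
    IsInsideObject = (x >= 0 And x < objWidth And y >= 0 And y < objHeight)
End Function

' Sample values; the 600 x 800 object size is an assumption for illustration
startOk = IsInsideObject(170, 88, 600, 800)
endOk = IsInsideObject(543, 155, 600, 800)
outsideOk = IsInsideObject(650, 155, 600, 800)
```

If either the start or the end point falls outside the object, the drag performed by AcrobatSelectPartial would begin or end over a different part of the screen, so it is worth checking both points.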
