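I have recently been looking into converting PDF to images. Nothing I found online was practical, and I was worried that third-party programs might cause problems, so I decided to use the image export built into Adobe Acrobat 7.0 Professional. There is very little documentation on it. After searching for a long time, the article below was the most useful thing I found. To be honest, I am only reusing someone else's work; none of this is original to me.
Original article: http://fidodido2010.spaces.live.com/blog/cns!42DBF9483C966838!129.entry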
---------------------------------------------------------------------------------------------------------------------------
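A COM solution for converting PDF to other formats
Background:
Lately I have been doing a lot of conversions between image formats. LEADTOOLS R13 used to handle all of them, but it turned out that some PDFs could not be opened with LEADTOOLS, so I had to pick Acrobat up again.
After a lot of digging I finally found an approach, only to get stuck on one method of AcroExch.PDDoc: GetJSObject(). In theory it returns a JavaScript object, but C++ has no type declaration for it. All you get is an IDispatch, and I could not work out the calling mechanism or the type information (ITypeInfo) at all. Nearly everything I could find on Google was VB. Only one poor fellow abroad had asked about "Using GetJSObject() in C++", and the answer he got was "since it involves low level COM API's to get directly to the IDispatch for the object.".
Just as I was about to bang my head against the wall, or move the whole project over to VB, it occurred to me: why not write a COM component in VB whose only job is to call this method, and call that component from C++?
No sooner said than done.
Writing the Acrobat COM component in VB6:
Create a VB6 ActiveX DLL project, rename the project to MPDF2SIMG (Multi-page PDF to Single-page Image), rename its class module to Converter, and add a reference to "Adobe Acrobat 7.0 Type Library". The complete code of the class is as follows: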
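Option Explicit
' Acrobat objects shared by the whole class
Dim oApp As Acrobat.CAcroApp
Dim oMultiPageDoc As Acrobat.CAcroPDDoc
Dim oSinglePageDoc As Acrobat.CAcroPDDoc
Dim JSO As Object
Private Sub Class_Initialize()
    ' VB6 needs Set when assigning object references
    Set oApp = CreateObject("AcroExch.App")
    Set oMultiPageDoc = CreateObject("AcroExch.PDDoc")
End Sub
Public Function ConvertPDF(ByVal SourcePDF As String, _
                           ByVal TargetFolder As String, _
                           ByVal TargetFormat As String, _
                           ByVal StartImgNumber As Integer) As Integer
    Dim iNumbers As Integer
    Dim i As Integer
    Dim OutPath As String
    Dim OutFile As String
    Dim OutExt As String
    ' Make sure the target folder ends with a backslash
    OutPath = TargetFolder
    If Right(OutPath, 1) <> "\" Then OutPath = OutPath & "\"
    ' Use a file extension that is valid for the requested format
    Select Case LCase(TargetFormat)
        Case "tiff": OutExt = ".tif"
        Case "jpeg": OutExt = ".jpg"
        Case "jp2k": OutExt = ".jp2"
        Case "html-3-20", "html-4-01-css-1-00": OutExt = ".htm"
        Case "accesstext", "plain-text": OutExt = ".txt"
        Case "xml-1-00": OutExt = ".xml"
        Case Else: OutExt = "." & LCase(TargetFormat)   ' eps, doc, png, ps, rtf
    End Select
    On Error GoTo err1
    If Not oMultiPageDoc.Open(SourcePDF) Then GoTo err1
    iNumbers = oMultiPageDoc.GetNumPages
    For i = 0 To iNumbers - 1
        ' Copy page i into a new, empty single-page document
        Set oSinglePageDoc = CreateObject("AcroExch.PDDoc")
        oSinglePageDoc.Create
        oSinglePageDoc.InsertPages -1, oMultiPageDoc, i, 1, 0
        ' Save that page through the Acrobat JavaScript Doc object
        Set JSO = oSinglePageDoc.GetJSObject
        OutFile = OutPath & Format(i + StartImgNumber, "00000000") & OutExt
        JSO.SaveAs OutFile, "com.adobe.acrobat." & TargetFormat
        Set JSO = Nothing
        oSinglePageDoc.Close
        Set oSinglePageDoc = Nothing
    Next
    oMultiPageDoc.Close
    oApp.CloseAllDocs
    ConvertPDF = iNumbers
    Exit Function
err1:
    ConvertPDF = -1
End Function
Private Sub Class_Terminate()
    Set JSO = Nothing
    Set oSinglePageDoc = Nothing
    Set oMultiPageDoc = Nothing
    Set oApp = Nothing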
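End Sub
Then compile it into a DLL. To use the DLL:
1. Register the DLL on the target machine by running regsvr32 mpdf2simg.dll.
2. In the C++ program that uses the DLL, import the component's type library:
#import "E:\project\Converter\mpdf2simg.dll"
using namespace MPDF2SIMG;
3. Declare the COM smart pointer and create an instance (COM must already be initialized on the calling thread, e.g. with CoInitialize(NULL)):
_ConverterPtr pConverter;
HRESULT hr = pConverter.CreateInstance(_T("MPDF2SIMG.Converter"));
if (FAILED(hr))
{
    // Creation failed (e.g. the DLL is not registered): handle it here.
    ...
}
4. Call the component's method. Wrapping the strings in _bstr_t (the parameter type the #import wrapper uses for VB String arguments) frees them automatically; BSTRs from CString::AllocSysString() would have to be released with SysFreeString:
int nConv = pConverter->ConvertPDF(
    _bstr_t(_T("xxxxxx\\source.pdf")),
    _bstr_t(_T("d:\\TargetPath")),
    _bstr_t(_T("tiff")),
    nStart);
This call converts the given SourcePDF into consecutive single-page TIFF files in TargetPath. The file names are 8-digit numbers, starting from nStart.
On success it returns the number of pages converted; on failure it returns -1.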
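The naming rule matters as soon as a caller has to find the output files again. Below is a minimal, platform-independent C++ sketch of what the VB code computes: a trailing backslash on the folder, then Format(i + StartImgNumber, "00000000") plus the extension. NormalizeFolder, EightDigits and ExpectedOutputFiles are hypothetical helpers, not part of the MPDF2SIMG component, and ".tif" is assumed as the TIFF extension.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Like the VB code: append "\" if the folder does not already end with one.
std::string NormalizeFolder(std::string folder) {
    if (folder.empty() || folder.back() != '\\') folder += '\\';
    return folder;
}

// Like VB's Format(n, "00000000") for non-negative n: zero-pad to 8 digits.
std::string EightDigits(int n) {
    assert(n >= 0);  // negative numbers would be formatted differently from VB
    char buf[16];
    std::snprintf(buf, sizeof buf, "%08d", n);
    return std::string(buf);
}

// Hypothetical helper: the file names ConvertPDF writes for a PDF with
// pageCount pages when numbering starts at startNumber.
std::vector<std::string> ExpectedOutputFiles(const std::string& folder,
                                             int pageCount, int startNumber,
                                             const std::string& ext) {
    std::vector<std::string> files;
    const std::string dir = NormalizeFolder(folder);
    for (int i = 0; i < pageCount; ++i)
        files.push_back(dir + EightDigits(i + startNumber) + ext);
    return files;
}
```

For instance, ExpectedOutputFiles("d:\\TargetPath", 3, 1, ".tif") yields d:\TargetPath\00000001.tif through d:\TargetPath\00000003.tif, so a caller can check that every page really was written.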
Other supported formats (pass the part after com.adobe.acrobat. as TargetFormat):
Value (cConvID)                           Valid extensions
"com.adobe.acrobat.eps"                   eps
"com.adobe.acrobat.html-3-20"             html, htm
"com.adobe.acrobat.html-4-01-css-1-00"    html, htm
"com.adobe.acrobat.jpeg"                  jpeg, jpg, jpe
"com.adobe.acrobat.jp2k"                  jpf, jpx, jp2, j2k, j2c, jpc
"com.adobe.acrobat.doc"                   doc
"com.adobe.acrobat.png"                   png
"com.adobe.acrobat.ps"                    ps
"com.adobe.acrobat.rtf"                   rtf
"com.adobe.acrobat.accesstext"            txt
"com.adobe.acrobat.plain-text"            txt
"com.adobe.acrobat.tiff"                  tiff, tif
"com.adobe.acrobat.xml-1-00"              xml
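ConvertPDF simply prepends "com.adobe.acrobat." to TargetFormat, so a misspelled format only fails inside Acrobat. A caller could validate it first. This C++17 sketch encodes the table above as a lookup; LookupFormat and ExportFormat are hypothetical names, and choosing the first extension listed for each format is an assumption.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// One row of the format table: the conversion ID passed to saveAs
// and an extension that is valid for it.
struct ExportFormat {
    std::string convId;
    std::string extension;
};

// Hypothetical lookup keyed by the TargetFormat string given to ConvertPDF.
std::optional<ExportFormat> LookupFormat(const std::string& targetFormat) {
    // First extension listed in the table for each format
    static const std::map<std::string, std::string> kExt = {
        {"eps", "eps"},        {"html-3-20", "html"},
        {"html-4-01-css-1-00", "html"},
        {"jpeg", "jpeg"},      {"jp2k", "jpf"},
        {"doc", "doc"},        {"png", "png"},
        {"ps", "ps"},          {"rtf", "rtf"},
        {"accesstext", "txt"}, {"plain-text", "txt"},
        {"tiff", "tiff"},      {"xml-1-00", "xml"},
    };
    auto it = kExt.find(targetFormat);
    if (it == kExt.end()) return std::nullopt;  // not in the table
    return ExportFormat{"com.adobe.acrobat." + targetFormat, "." + it->second};
}
```

A caller could run LookupFormat before ConvertPDF and skip the COM call entirely for a format Acrobat would not recognize.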
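Known issues and bugs:
If the C++ program is compiled with the multi-byte character set, a TargetPath containing Chinese characters breaks the conversion: ConvertPDF brings up an Acrobat dialog saying the file could not be saved, and after clicking OK it returns -1. The Unicode character set has not been tested.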
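Until the multi-byte problem is understood, the C++ caller could at least refuse such target paths up front instead of triggering Acrobat's dialog. A minimal sketch; IsAsciiPath is a hypothetical helper, and treating every byte of 0x80 or above as unsafe is an assumption (it also rejects accented Latin characters, which may or may not hit the same bug).

```cpp
#include <cassert>
#include <string>

// Hypothetical pre-check for the known issue: true only if every byte of
// the path is 7-bit ASCII, i.e. it contains no Chinese (or other non-ASCII)
// characters that an MBCS build would pass as multi-byte text.
bool IsAsciiPath(const std::string& path) {
    for (unsigned char c : path)
        if (c >= 0x80) return false;
    return true;
}
```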
Additional notes:
Using Acrobat COM requires Adobe Acrobat (not Reader) to be installed on the machine.
-----------------------------------Mine is simpler, using Adobe Acrobat Professional-------------------------------
Dim gApp As Acrobat.CAcroApp
Dim oMultiPageDoc As Acrobat.CAcroPDDoc
Dim oSinglePageDoc As Acrobat.CAcroPDDoc
Dim iNumbers As Integer
Dim StartImgNumber As Integer
Dim OutFile As String
Dim i As Integer
Dim jso As Object
Set gApp = CreateObject("AcroExch.App")
Set oMultiPageDoc = CreateObject("AcroExch.PDDoc")
StartImgNumber = 0   ' the first image is 00000000.png
' The PDF and the generated files need to be in the same folder
If oMultiPageDoc.Open("F:\test.pdf") Then
    iNumbers = oMultiPageDoc.GetNumPages
    For i = 0 To iNumbers - 1
        ' Copy page i into a new single-page document and export it as PNG
        Set oSinglePageDoc = CreateObject("AcroExch.PDDoc")
        oSinglePageDoc.Create
        oSinglePageDoc.InsertPages -1, oMultiPageDoc, i, 1, 0
        Set jso = oSinglePageDoc.GetJSObject
        OutFile = Format(i + StartImgNumber, "00000000") & ".png"
        jso.SaveAs "F:\" & OutFile, "com.adobe.acrobat.png"
        Set jso = Nothing
        oSinglePageDoc.Close
        Set oSinglePageDoc = Nothing
    Next
    oMultiPageDoc.Close
End If
gApp.CloseAllDocs
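-----Bonus: the same thing with Ghostscript (the engine behind GSview)----------------
"C:\Program Files\gs\gs8.61\bin\gswin32c.exe" -dSAFER -dBATCH -dNOPAUSE -r300 -sDEVICE=png16m -dGraphicsAlphaBits=4 -sOutputFile="F:\test\%08d.png" "F:\test.pdf"
-sOutputFile names the output images (%08d becomes the page number, starting at 1) and the input PDF comes last. In a .bat file, write %%08d instead of %08d.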
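When Ghostscript is driven from a program rather than typed by hand, the things to get right are the argument order (input PDF last) and the quoting (the gs path contains a space). A minimal sketch that only builds the command string; BuildGsCommand is a hypothetical name, and actually running the result (for example with CreateProcess or std::system) is left out.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds a Ghostscript PDF-to-PNG command line.
// The input PDF goes last; Ghostscript replaces %08d in -sOutputFile with
// the page number, starting at 1.
std::string BuildGsCommand(const std::string& gsExe, const std::string& pdf,
                           std::string outDir, int dpi) {
    if (outDir.empty() || outDir.back() != '\\') outDir += '\\';
    std::string cmd = "\"" + gsExe + "\"";  // quoted because the path has spaces
    cmd += " -dSAFER -dBATCH -dNOPAUSE";
    cmd += " -r" + std::to_string(dpi);
    cmd += " -sDEVICE=png16m -dGraphicsAlphaBits=4";
    cmd += " -sOutputFile=\"" + outDir + "%08d.png\"";
    cmd += " \"" + pdf + "\"";
    return cmd;
}
```

Note that Ghostscript numbers pages from 1, whereas the Acrobat code starts at StartImgNumber.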