Extracting Tables from a PDF in C#
Source: collected from the web. Date: 2024-03-12
In Solution Explorer, right-click "References" and choose "Manage NuGet Packages".

Click "Browse", type the package name (Spire.PDF) in the search box, and click "Install".

Alternatively, install it from the Package Manager Console:
PM> Install-Package Spire.PDF -Version 7.10.4
C# code
1/2
using Spire.Pdf;
using Spire.Pdf.Utilities;
using System.IO;
using System.Text;

namespace ExtractTable
{
    class Program
    {
        static void Main(string[] args)
        {
            // Load the PDF document
            PdfDocument pdf = new PdfDocument();
            pdf.LoadFromFile("sample.pdf");
            StringBuilder builder = new StringBuilder();

            // Extract the tables page by page
            PdfTableExtractor extractor = new PdfTableExtractor(pdf);
            PdfTable[] tableLists = null;
            for (int pageIndex = 0; pageIndex < pdf.Pages.Count; pageIndex++)
            {
                tableLists = extractor.ExtractTable(pageIndex);
                if (tableLists != null && tableLists.Length > 0)
                {
                    foreach (PdfTable table in tableLists)
                    {
                        int row = table.GetRowCount();
                        int column = table.GetColumnCount();
                        for (int i = 0; i < row; i++)
                        {
                            for (int j = 0; j < column; j++)
                            {
                                // Append each cell, separated by a space
                                string text = table.GetText(i, j);
                                builder.Append(text + " ");
                            }
                            // End the current table row
                            builder.Append("\r\n");
                        }
                    }
                }
            }

            // Save the extracted table content to a txt document
            File.WriteAllText("ExtractedTable.txt", builder.ToString());
        }
    }
}
2/2
After finishing the code, run the program to generate the txt document. The table extraction result is shown in the figure:

VB.NET code
Imports Spire.Pdf
Imports Spire.Pdf.Utilities
Imports System.IO
Imports System.Text

Namespace ExtractTable
    Class Program
        Private Shared Sub Main(args As String())
            ' Load the PDF document
            Dim pdf As New PdfDocument()
            pdf.LoadFromFile("sample.pdf")
            Dim builder As New StringBuilder()

            ' Extract the tables page by page
            Dim extractor As New PdfTableExtractor(pdf)
            Dim tableLists As PdfTable() = Nothing
            For pageIndex As Integer = 0 To pdf.Pages.Count - 1
                tableLists = extractor.ExtractTable(pageIndex)
                If tableLists IsNot Nothing AndAlso tableLists.Length > 0 Then
                    For Each table As PdfTable In tableLists
                        Dim row As Integer = table.GetRowCount()
                        Dim column As Integer = table.GetColumnCount()
                        For i As Integer = 0 To row - 1
                            For j As Integer = 0 To column - 1
                                ' Append each cell, separated by a space
                                Dim text As String = table.GetText(i, j)
                                builder.Append(text & Convert.ToString(" "))
                            Next
                            ' End the current table row
                            builder.Append(vbCr & vbLf)
                        Next
                    Next
                End If
            Next

            ' Save the extracted table content to a txt document
            File.WriteAllText("ExtractedTable.txt", builder.ToString())
        End Sub
    End Class
End Namespace
Notes
The PDF file used in the code and the generated .txt file are located at F:\VS2017Project\ExtractTable\bin\Debug\sample.pdf and F:\VS2017Project\ExtractTable\bin\Debug\ExtractedTable.txt. The file paths can also be changed to any other location.
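The code passes bare file names, so both files are resolved against the process's current working directory, which is typically bin\Debug when the program is started from Visual Studio. One way to make the location explicit is sketched below using only standard .NET calls. The PathHelper class and its InAppFolder method are hypothetical names introduced for illustration; they are not part of Spire.PDF.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of Spire.PDF): builds an absolute path
// for a file that sits next to the running executable.
static class PathHelper
{
    public static string InAppFolder(string fileName)
    {
        // BaseDirectory is the folder the application was loaded from,
        // e.g. ...\bin\Debug\ for a default Debug build.
        return Path.Combine(AppDomain.CurrentDomain.BaseDirectory, fileName);
    }
}

class PathDemo
{
    static void Main()
    {
        string input = PathHelper.InAppFolder("sample.pdf");
        string output = PathHelper.InAppFolder("ExtractedTable.txt");

        Console.WriteLine("Input:  " + input);
        Console.WriteLine("Output: " + output);

        // Checking up front gives a clearer message than a load failure later.
        if (!File.Exists(input))
        {
            Console.WriteLine("sample.pdf was not found in the application folder.");
        }
    }
}
```

The resulting strings could then be passed to pdf.LoadFromFile and File.WriteAllText in place of the bare file names. To read or write somewhere else, an absolute path such as @"D:\Data\sample.pdf" (an example location, not one from the article) can be used the same way.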