47 Commits
1_2 ... 1_6_0

SHA1 Message Date
d5d843014a Bump version 1.6.0 2019-10-28 02:58:59 +01:00
b9750745bc FrmPdfInfo: Allow raw coordinates input for GetColumn. 2019-10-28 02:58:28 +01:00
c8c7e32acc PdfTextExtractor: Better column extraction, splitting big TextElements. 2019-10-28 02:57:42 +01:00
781f212289 PdfPageRenderer: Fix Rendering of null pages. 2019-10-28 00:43:50 +01:00
8a966049f6 PdfPageRenderer: Adjust column rendering. 2019-10-27 22:40:52 +01:00
80ab9b9ff3 FrmPdfInfo: Better configuration handling with the Configuration class. 2019-10-27 22:36:54 +01:00
9af363529c PdfTextExtractor: Get results as PdfTextElementColumn, for debugging purposes. 2019-10-27 18:45:13 +01:00
386b38bd21 PdfPageRenderer: Refactor using Rect. 2019-10-27 13:12:11 +01:00
53d07db9c0 Use Rect class for size definition of TextElements and pages. 2019-10-27 13:11:40 +01:00
9bc7854b48 README.md: Adjust year on LICENSE section. 2019-10-27 12:43:32 +01:00
77a5cd1b0e PdfTextExtractor: Adjust public method names. 2019-10-27 12:40:51 +01:00
b6611b6285 Put class PdfTextElement in its own file. 2019-10-27 12:37:16 +01:00
7badc8e4b1 PdfPageRenderer: Better rendering of character size. 2019-10-27 09:59:46 +01:00
203f30e55c FrmPdfInfo: Pages selector. A simple textbox where you can enter page numbers separated by commas, and ranges joined by a dash. 2019-10-27 09:59:08 +01:00
c3967dd439 Set C# lang version to 6.0. 2019-10-27 09:57:24 +01:00
da8b512c1b Move page rendering code to PdfPageRenderer. 2019-10-27 08:58:34 +01:00
beb3b931ea Bump version 1.5.2 2019-10-21 13:09:13 +02:00
8806020036 ignore ".vs" directory. 2019-10-21 13:08:44 +02:00
f3b7cd1b0d PdfTextExtractor: Better joining and splitting heuristics. 2019-10-21 13:08:19 +02:00
33f9723ac6 Bump version: 1.5.1 2017-11-14 13:34:21 +01:00
13ba41f851 PdfTextExtractor: Change Join and Split logic to use max character width of the elements. 2017-11-02 13:27:38 +01:00
06de734658 Bump version: 1.5 2017-10-11 16:52:29 +02:00
901d7e62ca FrmPdfInfo: Change test fields to have multiple actions. 2017-10-11 16:50:58 +02:00
631f8c34b2 PdfTextExtractor: Split text elements with big separations between characters 2017-10-11 16:49:57 +02:00
7ac6b19331 FrmPdfInfo: Show all fonts used on any text element 2017-10-11 16:48:01 +02:00
34e7424273 PdfCharElement: Width attribute 2017-10-11 16:47:10 +02:00
6b8bbc367f Fixes to character size calculations. 2017-10-11 16:44:42 +02:00
6dfc248b9a FrmPdfInfo.Render: Adjust scale. 2017-10-11 10:59:21 +02:00
f3aca2ffa5 Add placeholders for more commands. 2017-10-11 09:39:13 +02:00
7ba320a22c Bump version: 1.4 2017-08-02 13:30:03 +02:00
1edddf17b1 Fix JoinTextElements to only join text elements near m-size. 2017-08-02 13:28:20 +02:00
62120898d2 Bump version: 1.3 2017-06-27 01:10:26 +02:00
dc1b9bc7ca PdfTextExtractor.JoinTextElements: Joins PdfTextElements when they are nearby. 2017-06-27 01:09:50 +02:00
d1ea41474b Reorder Code. 2017-06-27 01:03:27 +02:00
b11a2ac393 Simplify PdfFont.ParseSizes. 2017-06-26 22:17:46 +02:00
36fb20eb2e Remove VisualStudio2015 incompatibilities (Remove C#7.0-isms) 2017-06-26 08:25:30 +02:00
15fbec2470 FrmPdfInfo: Improve rendering, making more accurate the location of the glyphs. 2017-06-26 01:49:48 +02:00
52841de51b PdfFont: Convert "Zero" widths to default 0.5 2017-06-26 01:46:05 +02:00
d4c4615684 PdfTextExtractor: Rework text position calculations. 2017-06-26 01:45:34 +02:00
ae76cab45d PdfTextExtractor: Fix HasText method to match contained text, instead of full PdfTextElements. 2017-06-25 12:38:27 +02:00
8dc54105fd Refactorings 2017-06-25 12:03:41 +02:00
3469593a2a VAR.PdfTools.Workbench: Crude rendering of the parsed PDF. 2017-06-25 02:21:37 +02:00
ebff0c2028 Remove Visual Studio 2010 support 2017-06-11 16:29:24 +02:00
2fd074e041 Add Visual Studio 2017 support to NuGet Generation script. 2017-06-11 16:16:55 +02:00
4223619802 Set "Times-Roman" as default basefont. 2017-06-11 16:05:17 +02:00
771305f5d0 Refactor PdfFont creator. 2017-06-11 16:04:40 +02:00
90c7c5db92 Fix NuGet build script 2017-04-13 08:23:19 +02:00
42 changed files with 1892 additions and 930 deletions
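Commit 203f30e55c describes the Workbench pages selector as page numbers separated by commas, with ranges joined by a dash. The following is a minimal C# sketch of that syntax, assuming a hypothetical `PageSelection.Parse` helper that is not part of VAR.PdfTools. Unlike `GetSelectedPages` in the `FrmPdfInfo` diff, it also accepts reversed ranges such as `7-5` (for which `Enumerable.Range` would throw an `ArgumentOutOfRangeException` because the computed count is negative) and drops page numbers outside `1..maxPages`. It sticks to C# 6 syntax, since commit c3967dd439 pins the language version to 6.0.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of VAR.PdfTools: parses the page-selector
// syntax "1,3,5-7" into 1-based page numbers.
public static class PageSelection
{
    public static List<int> Parse(string spec, int maxPages)
    {
        var pages = new List<int>();
        if (!string.IsNullOrWhiteSpace(spec))
        {
            foreach (string rawPart in spec.Split(','))
            {
                string part = rawPart.Trim();
                int dash = part.IndexOf('-');
                if (dash >= 0)
                {
                    int start, end;
                    if (int.TryParse(part.Substring(0, dash), out start) &&
                        int.TryParse(part.Substring(dash + 1), out end))
                    {
                        // Order the bounds (so "7-5" works) and clamp them to the document,
                        // which also keeps Enumerable.Range from ever getting a negative count.
                        int lo = Math.Max(1, Math.Min(start, end));
                        int hi = Math.Min(maxPages, Math.Max(start, end));
                        if (lo <= hi)
                        {
                            pages.AddRange(Enumerable.Range(lo, hi - lo + 1));
                        }
                    }
                }
                else
                {
                    int page;
                    if (int.TryParse(part, out page))
                    {
                        pages.Add(page);
                    }
                }
            }
        }
        // Keep only existing pages, without duplicates, in first-seen order.
        pages = pages.Where(p => p >= 1 && p <= maxPages).Distinct().ToList();
        // Like the Workbench, fall back to "all pages" when nothing valid was given.
        if (pages.Count == 0)
        {
            pages.AddRange(Enumerable.Range(1, maxPages));
        }
        return pages;
    }
}
```

For example, `Parse("1,3,5-7", 10)` yields pages 1, 3, 5, 6 and 7, and `Parse("7-5", 10)` yields 5, 6 and 7.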

.gitignore

@@ -27,3 +27,5 @@ obj/
_ReSharper*/
*.userprefs
*.nupkg
.vs


@@ -1,6 +1,6 @@
The MIT License (MIT)
Copyright (c) 2016-2017 Valeriano Alfonso Rodriguez
Copyright (c) 2016-2019 Valeriano Alfonso Rodriguez
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal


@@ -5,27 +5,33 @@
### VAR.PdfTools
Add the resulting assembly as a reference in your projects, and add this line to your code:
using VAR.PdfTools;
```csharp
using VAR.PdfTools;
```
Then extract the contents of a data column using:
```csharp
var columnData = new List<string>();
PdfDocument doc = PdfDocument.Load("document.pdf");
foreach (PdfDocumentPage page in doc.Pages)
{
PdfTextExtractor extractor = new PdfTextExtractor(page);
columnData.AddRange(extractor.GetColumn("Column"));
columnData.AddRange(extractor.GetColumnAsStrings("Column"));
}
```
Or the content of a field (text on the right of the indicated text):
```csharp
var fieldData = new List<string>();
PdfDocument doc = PdfDocument.Load("document.pdf");
foreach (PdfDocumentPage page in doc.Pages)
{
PdfTextExtractor extractor = new PdfTextExtractor(page);
fieldData.Add(extractor.GetField("Field"));
fieldData.Add(extractor.GetFieldAsString("Field"));
}
```
### VAR.PdfTools.Workbench
It is a simple Windows.Forms application for testing the basic functionality of the library.
@@ -34,7 +40,8 @@ It is a simple Windows.Forms application, to test basic funcitionallity of the l
Visual Studio 2015 and 2010 solutions are provided. Simply click Build in the IDE.
A .nuget package can be built using:
VAR.PdfTools\Build.NuGet.cmd
VAR.PdfTools\Build.NuGet.cmd
## Contributing
1. Fork it!
@@ -50,7 +57,7 @@ A .nuget package can be build using:
The MIT License (MIT)
Copyright (c) 2016-2017 Valeriano Alfonso Rodriguez
Copyright (c) 2016-2019 Valeriano Alfonso Rodriguez
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal


@@ -1,29 +0,0 @@
Microsoft Visual Studio Solution File, Format Version 11.00
# Visual Studio 2010
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "VAR.PdfTools.Net35", "VAR.PdfTools\VAR.PdfTools.Net35.csproj", "{EB7E003A-6A95-4002-809F-926C7C8A11E9}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "VAR.PdfTools.Workbench.Net35", "VAR.PdfTools.Workbench\VAR.PdfTools.Workbench.Net35.csproj", "{A5825D8E-9F81-49E0-B610-8AE5E46D02EA}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
Release|Any CPU = Release|Any CPU
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{A5825D8E-9F81-49E0-B610-8AE5E46D02EA}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{A5825D8E-9F81-49E0-B610-8AE5E46D02EA}.Debug|Any CPU.Build.0 = Debug|Any CPU
{A5825D8E-9F81-49E0-B610-8AE5E46D02EA}.Release|Any CPU.ActiveCfg = Release|Any CPU
{A5825D8E-9F81-49E0-B610-8AE5E46D02EA}.Release|Any CPU.Build.0 = Release|Any CPU
{EB7E003A-6A95-4002-809F-926C7C8A11E9}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{EB7E003A-6A95-4002-809F-926C7C8A11E9}.Debug|Any CPU.Build.0 = Debug|Any CPU
{EB7E003A-6A95-4002-809F-926C7C8A11E9}.Release|Any CPU.ActiveCfg = Release|Any CPU
{EB7E003A-6A95-4002-809F-926C7C8A11E9}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(MonoDevelopProperties) = preSolution
StartupItem = VAR.PdfTools.Workbench\VAR.PdfTools.Workbench.Net35.csproj
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
EndGlobal


@@ -0,0 +1,117 @@
using System.Collections.Generic;
using System.IO;
using System.Text;
namespace VAR.PdfTools.Workbench
{
public class Configuration
{
private Dictionary<string, string> _configItems = new Dictionary<string, string>();
private static string GetConfigFileName()
{
string location = System.Reflection.Assembly.GetEntryAssembly().Location;
string path = Path.GetDirectoryName(location);
string filenameWithoutExtension = Path.GetFileNameWithoutExtension(location);
string configFile = string.Format("{0}/{1}.cfg", path, filenameWithoutExtension);
return configFile;
}
private static string[] GetConfigurationLines()
{
string configFile = GetConfigFileName();
string[] config;
if (File.Exists(configFile) == false)
{
config = new string[0];
}
else
{
config = File.ReadAllLines(configFile);
}
return config;
}
public void Load()
{
_configItems.Clear();
string[] configLines = GetConfigurationLines();
foreach (string configLine in configLines)
{
int idxSplit = configLine.IndexOf('|');
if (idxSplit < 0) { continue; }
string configName = configLine.Substring(0, idxSplit);
string configData = configLine.Substring(idxSplit + 1);
if (_configItems.ContainsKey(configName))
{
_configItems[configName] = configData;
}
else
{
_configItems.Add(configName, configData);
}
}
}
public void Save()
{
StringBuilder sbConfig = new StringBuilder();
foreach (KeyValuePair<string, string> pair in _configItems)
{
sbConfig.AppendFormat("{0}|{1}\n", pair.Key, pair.Value);
}
string configFileName = GetConfigFileName();
File.WriteAllText(configFileName, sbConfig.ToString());
}
public string Get(string key, string defaultValue)
{
if (_configItems == null) { return defaultValue; }
if (_configItems.ContainsKey(key))
{
return _configItems[key];
}
return defaultValue;
}
public bool Get(string key, bool defaultValue)
{
if (_configItems == null) { return defaultValue; }
if (_configItems.ContainsKey(key))
{
string value = _configItems[key];
return (value == "true");
}
return defaultValue;
}
public void Set(string key, string value)
{
if (_configItems == null) { return; }
if (_configItems.ContainsKey(key))
{
_configItems[key] = value;
}
else
{
_configItems.Add(key, value);
}
}
public void Set(string key, bool value)
{
if (_configItems == null) { return; }
if (_configItems.ContainsKey(key))
{
_configItems[key] = value ? "true" : "false";
}
else
{
_configItems.Add(key, value ? "true" : "false");
}
}
}
}
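The `Configuration` class above persists settings as one `key|value` line per entry in a `.cfg` file next to the executable, and stores booleans as the strings `true`/`false`. The following C# 6 sketch reproduces only its parsing and formatting rules in memory, using a hypothetical `ConfigFormat` helper that is not part of the Workbench, so the edge cases are easy to check: everything after the first `|` is the value, lines without `|` are skipped, and a repeated key keeps its last value.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical, in-memory illustration of the "key|value" line format used by
// the Workbench Configuration class; it never touches the .cfg file on disk.
public static class ConfigFormat
{
    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        var items = new Dictionary<string, string>();
        foreach (string line in lines)
        {
            // Split on the first '|' only, so values may themselves contain '|'.
            int idx = line.IndexOf('|');
            if (idx < 0) { continue; } // lines without a separator are ignored
            // The indexer adds or overwrites, matching the ContainsKey/Add branches
            // in Load(): the last occurrence of a key wins.
            items[line.Substring(0, idx)] = line.Substring(idx + 1);
        }
        return items;
    }

    public static string Format(IDictionary<string, string> items)
    {
        var sb = new StringBuilder();
        foreach (KeyValuePair<string, string> pair in items)
        {
            // Same layout as Save(): one "key|value" line terminated by '\n'.
            sb.AppendFormat("{0}|{1}\n", pair.Key, pair.Value);
        }
        return sb.ToString();
    }
}
```

Keys containing `|` and values containing line breaks do not round-trip through this format, in the sketch or, as far as the code above shows, in `Configuration` itself.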


@@ -34,12 +34,21 @@
this.txtPdfPath = new System.Windows.Forms.TextBox();
this.txtOutput = new System.Windows.Forms.TextBox();
this.btnProcess = new System.Windows.Forms.Button();
this.btnGetColumn = new System.Windows.Forms.Button();
this.txtColumnName = new System.Windows.Forms.TextBox();
this.txtFieldName = new System.Windows.Forms.TextBox();
this.btnGetField = new System.Windows.Forms.Button();
this.txtText = new System.Windows.Forms.TextBox();
this.btnHasText = new System.Windows.Forms.Button();
this.btnGetColumn1 = new System.Windows.Forms.Button();
this.txtField1 = new System.Windows.Forms.TextBox();
this.btnGetField1 = new System.Windows.Forms.Button();
this.btnHasText1 = new System.Windows.Forms.Button();
this.btnRender = new System.Windows.Forms.Button();
this.btnHasText2 = new System.Windows.Forms.Button();
this.btnGetField2 = new System.Windows.Forms.Button();
this.txtField2 = new System.Windows.Forms.TextBox();
this.btnGetColumn2 = new System.Windows.Forms.Button();
this.btnHasText3 = new System.Windows.Forms.Button();
this.btnGetField3 = new System.Windows.Forms.Button();
this.txtField3 = new System.Windows.Forms.TextBox();
this.btnGetColumn3 = new System.Windows.Forms.Button();
this.txtPages = new System.Windows.Forms.TextBox();
this.chkRender = new System.Windows.Forms.CheckBox();
this.SuspendLayout();
//
// lblOutputs
@@ -108,68 +117,166 @@
this.btnProcess.UseVisualStyleBackColor = true;
this.btnProcess.Click += new System.EventHandler(this.btnProcess_Click);
//
// btnGetColumn
// btnGetColumn1
//
this.btnGetColumn.Location = new System.Drawing.Point(163, 51);
this.btnGetColumn.Name = "btnGetColumn";
this.btnGetColumn.Size = new System.Drawing.Size(75, 23);
this.btnGetColumn.TabIndex = 12;
this.btnGetColumn.Text = "GetColumn";
this.btnGetColumn.UseVisualStyleBackColor = true;
this.btnGetColumn.Click += new System.EventHandler(this.btnGetColumn_Click);
this.btnGetColumn1.Location = new System.Drawing.Point(292, 51);
this.btnGetColumn1.Name = "btnGetColumn1";
this.btnGetColumn1.Size = new System.Drawing.Size(69, 23);
this.btnGetColumn1.TabIndex = 12;
this.btnGetColumn1.Text = "GetColumn";
this.btnGetColumn1.UseVisualStyleBackColor = true;
this.btnGetColumn1.Click += new System.EventHandler(this.btnGetColumn1_Click);
//
// txtColumnName
// txtField1
//
this.txtColumnName.Location = new System.Drawing.Point(15, 53);
this.txtColumnName.Name = "txtColumnName";
this.txtColumnName.Size = new System.Drawing.Size(142, 20);
this.txtColumnName.TabIndex = 13;
this.txtField1.Location = new System.Drawing.Point(15, 53);
this.txtField1.Name = "txtField1";
this.txtField1.Size = new System.Drawing.Size(142, 20);
this.txtField1.TabIndex = 13;
//
// txtFieldName
// btnGetField1
//
this.txtFieldName.Location = new System.Drawing.Point(15, 82);
this.txtFieldName.Name = "txtFieldName";
this.txtFieldName.Size = new System.Drawing.Size(142, 20);
this.txtFieldName.TabIndex = 15;
this.btnGetField1.Location = new System.Drawing.Point(226, 51);
this.btnGetField1.Name = "btnGetField1";
this.btnGetField1.Size = new System.Drawing.Size(60, 23);
this.btnGetField1.TabIndex = 14;
this.btnGetField1.Text = "GetField";
this.btnGetField1.UseVisualStyleBackColor = true;
this.btnGetField1.Click += new System.EventHandler(this.btnGetField1_Click);
//
// btnGetField
// btnHasText1
//
this.btnGetField.Location = new System.Drawing.Point(163, 80);
this.btnGetField.Name = "btnGetField";
this.btnGetField.Size = new System.Drawing.Size(75, 23);
this.btnGetField.TabIndex = 14;
this.btnGetField.Text = "GetField";
this.btnGetField.UseVisualStyleBackColor = true;
this.btnGetField.Click += new System.EventHandler(this.btnGetField_Click);
this.btnHasText1.Location = new System.Drawing.Point(163, 51);
this.btnHasText1.Name = "btnHasText1";
this.btnHasText1.Size = new System.Drawing.Size(57, 23);
this.btnHasText1.TabIndex = 16;
this.btnHasText1.Text = "HasText";
this.btnHasText1.UseVisualStyleBackColor = true;
this.btnHasText1.Click += new System.EventHandler(this.btnHasText1_Click);
//
// txtText
// btnRender
//
this.txtText.Location = new System.Drawing.Point(15, 111);
this.txtText.Name = "txtText";
this.txtText.Size = new System.Drawing.Size(142, 20);
this.txtText.TabIndex = 17;
this.btnRender.Anchor = ((System.Windows.Forms.AnchorStyles)((System.Windows.Forms.AnchorStyles.Top | System.Windows.Forms.AnchorStyles.Right)));
this.btnRender.Location = new System.Drawing.Point(397, 52);
this.btnRender.Name = "btnRender";
this.btnRender.Size = new System.Drawing.Size(75, 23);
this.btnRender.TabIndex = 18;
this.btnRender.Text = "Render";
this.btnRender.UseVisualStyleBackColor = true;
this.btnRender.Click += new System.EventHandler(this.btnRender_Click);
//
// btnHasText
// btnHasText2
//
this.btnHasText.Location = new System.Drawing.Point(163, 109);
this.btnHasText.Name = "btnHasText";
this.btnHasText.Size = new System.Drawing.Size(75, 23);
this.btnHasText.TabIndex = 16;
this.btnHasText.Text = "HasText";
this.btnHasText.UseVisualStyleBackColor = true;
this.btnHasText.Click += new System.EventHandler(this.btnHasText_Click);
this.btnHasText2.Location = new System.Drawing.Point(163, 80);
this.btnHasText2.Name = "btnHasText2";
this.btnHasText2.Size = new System.Drawing.Size(57, 23);
this.btnHasText2.TabIndex = 22;
this.btnHasText2.Text = "HasText";
this.btnHasText2.UseVisualStyleBackColor = true;
this.btnHasText2.Click += new System.EventHandler(this.btnHasText2_Click);
//
// btnGetField2
//
this.btnGetField2.Location = new System.Drawing.Point(226, 80);
this.btnGetField2.Name = "btnGetField2";
this.btnGetField2.Size = new System.Drawing.Size(60, 23);
this.btnGetField2.TabIndex = 21;
this.btnGetField2.Text = "GetField";
this.btnGetField2.UseVisualStyleBackColor = true;
this.btnGetField2.Click += new System.EventHandler(this.btnGetField2_Click);
//
// txtField2
//
this.txtField2.Location = new System.Drawing.Point(15, 82);
this.txtField2.Name = "txtField2";
this.txtField2.Size = new System.Drawing.Size(142, 20);
this.txtField2.TabIndex = 20;
//
// btnGetColumn2
//
this.btnGetColumn2.Location = new System.Drawing.Point(292, 80);
this.btnGetColumn2.Name = "btnGetColumn2";
this.btnGetColumn2.Size = new System.Drawing.Size(69, 23);
this.btnGetColumn2.TabIndex = 19;
this.btnGetColumn2.Text = "GetColumn";
this.btnGetColumn2.UseVisualStyleBackColor = true;
this.btnGetColumn2.Click += new System.EventHandler(this.btnGetColumn2_Click);
//
// btnHasText3
//
this.btnHasText3.Location = new System.Drawing.Point(163, 109);
this.btnHasText3.Name = "btnHasText3";
this.btnHasText3.Size = new System.Drawing.Size(57, 23);
this.btnHasText3.TabIndex = 26;
this.btnHasText3.Text = "HasText";
this.btnHasText3.UseVisualStyleBackColor = true;
this.btnHasText3.Click += new System.EventHandler(this.btnHasText3_Click);
//
// btnGetField3
//
this.btnGetField3.Location = new System.Drawing.Point(226, 109);
this.btnGetField3.Name = "btnGetField3";
this.btnGetField3.Size = new System.Drawing.Size(60, 23);
this.btnGetField3.TabIndex = 25;
this.btnGetField3.Text = "GetField";
this.btnGetField3.UseVisualStyleBackColor = true;
this.btnGetField3.Click += new System.EventHandler(this.btnGetField3_Click);
//
// txtField3
//
this.txtField3.Location = new System.Drawing.Point(15, 111);
this.txtField3.Name = "txtField3";
this.txtField3.Size = new System.Drawing.Size(142, 20);
this.txtField3.TabIndex = 24;
//
// btnGetColumn3
//
this.btnGetColumn3.Location = new System.Drawing.Point(292, 109);
this.btnGetColumn3.Name = "btnGetColumn3";
this.btnGetColumn3.Size = new System.Drawing.Size(69, 23);
this.btnGetColumn3.TabIndex = 23;
this.btnGetColumn3.Text = "GetColumn";
this.btnGetColumn3.UseVisualStyleBackColor = true;
this.btnGetColumn3.Click += new System.EventHandler(this.btnGetColumn3_Click);
//
// txtPages
//
this.txtPages.Anchor = ((System.Windows.Forms.AnchorStyles)((System.Windows.Forms.AnchorStyles.Top | System.Windows.Forms.AnchorStyles.Right)));
this.txtPages.Location = new System.Drawing.Point(397, 82);
this.txtPages.Name = "txtPages";
this.txtPages.Size = new System.Drawing.Size(75, 20);
this.txtPages.TabIndex = 27;
//
// chkRender
//
this.chkRender.AutoSize = true;
this.chkRender.Location = new System.Drawing.Point(292, 138);
this.chkRender.Name = "chkRender";
this.chkRender.Size = new System.Drawing.Size(61, 17);
this.chkRender.TabIndex = 28;
this.chkRender.Text = "Render";
this.chkRender.UseVisualStyleBackColor = true;
//
// FrmPdfInfo
//
this.AutoScaleDimensions = new System.Drawing.SizeF(6F, 13F);
this.AutoScaleMode = System.Windows.Forms.AutoScaleMode.Font;
this.ClientSize = new System.Drawing.Size(484, 461);
this.Controls.Add(this.txtText);
this.Controls.Add(this.btnHasText);
this.Controls.Add(this.txtFieldName);
this.Controls.Add(this.btnGetField);
this.Controls.Add(this.txtColumnName);
this.Controls.Add(this.btnGetColumn);
this.Controls.Add(this.chkRender);
this.Controls.Add(this.txtPages);
this.Controls.Add(this.btnHasText3);
this.Controls.Add(this.btnGetField3);
this.Controls.Add(this.txtField3);
this.Controls.Add(this.btnGetColumn3);
this.Controls.Add(this.btnHasText2);
this.Controls.Add(this.btnGetField2);
this.Controls.Add(this.txtField2);
this.Controls.Add(this.btnGetColumn2);
this.Controls.Add(this.btnRender);
this.Controls.Add(this.btnHasText1);
this.Controls.Add(this.btnGetField1);
this.Controls.Add(this.txtField1);
this.Controls.Add(this.btnGetColumn1);
this.Controls.Add(this.lblOutputs);
this.Controls.Add(this.lblInputs);
this.Controls.Add(this.btnBrowse);
@@ -193,11 +300,20 @@
private System.Windows.Forms.TextBox txtPdfPath;
private System.Windows.Forms.TextBox txtOutput;
private System.Windows.Forms.Button btnProcess;
private System.Windows.Forms.Button btnGetColumn;
private System.Windows.Forms.TextBox txtColumnName;
private System.Windows.Forms.TextBox txtFieldName;
private System.Windows.Forms.Button btnGetField;
private System.Windows.Forms.TextBox txtText;
private System.Windows.Forms.Button btnHasText;
private System.Windows.Forms.Button btnGetColumn1;
private System.Windows.Forms.TextBox txtField1;
private System.Windows.Forms.Button btnGetField1;
private System.Windows.Forms.Button btnHasText1;
private System.Windows.Forms.Button btnRender;
private System.Windows.Forms.Button btnHasText2;
private System.Windows.Forms.Button btnGetField2;
private System.Windows.Forms.TextBox txtField2;
private System.Windows.Forms.Button btnGetColumn2;
private System.Windows.Forms.Button btnHasText3;
private System.Windows.Forms.Button btnGetField3;
private System.Windows.Forms.TextBox txtField3;
private System.Windows.Forms.Button btnGetColumn3;
private System.Windows.Forms.TextBox txtPages;
private System.Windows.Forms.CheckBox chkRender;
}
}


@@ -1,7 +1,12 @@
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using VAR.PdfTools.PdfElements;
namespace VAR.PdfTools.Workbench
{
@@ -14,19 +19,27 @@ namespace VAR.PdfTools.Workbench
private void FrmPdfInfo_Load(object sender, EventArgs e)
{
txtPdfPath.Text = Properties.Settings.Default.LastPdfPath;
txtColumnName.Text = Properties.Settings.Default.LastColumnName;
txtFieldName.Text = Properties.Settings.Default.LastFieldName;
txtText.Text = Properties.Settings.Default.LastText;
var configuration = new Configuration();
configuration.Load();
txtPdfPath.Text = configuration.Get("LastPdfPath", string.Empty);
txtField1.Text = configuration.Get("Field1", string.Empty);
txtField2.Text = configuration.Get("Field2", string.Empty);
txtField3.Text = configuration.Get("Field3", string.Empty);
txtPages.Text = configuration.Get("Pages", string.Empty);
chkRender.Checked = configuration.Get("Render", false);
}
private void FrmPdfInfo_FormClosing(object sender, FormClosingEventArgs e)
{
Properties.Settings.Default.LastPdfPath = txtPdfPath.Text;
Properties.Settings.Default.LastColumnName = txtColumnName.Text;
Properties.Settings.Default.LastFieldName = txtFieldName.Text;
Properties.Settings.Default.LastText = txtText.Text;
Properties.Settings.Default.Save();
var configuration = new Configuration();
var configItems = new Dictionary<string, string>();
configuration.Set("LastPdfPath", txtPdfPath.Text);
configuration.Set("Field1", txtField1.Text);
configuration.Set("Field2", txtField2.Text);
configuration.Set("Field3", txtField3.Text);
configuration.Set("Pages", txtPages.Text);
configuration.Set("Render", chkRender.Checked);
configuration.Save();
}
private void btnBrowse_Click(object sender, EventArgs e)
@@ -87,9 +100,25 @@ namespace VAR.PdfTools.Workbench
PdfTextExtractor extractor = new PdfTextExtractor(page);
foreach (PdfTextElement textElement in extractor.Elements)
{
string fontName = textElement.Font == null ? "#NULL#" : textElement.Font.Name;
if (fontName == "#NULL#" && textElement.Childs.Count > 0)
{
var fontNames = textElement.Childs.Select(c => c.Font == null ? "#NULL#" : c.Font.Name);
StringBuilder sbFontName = new StringBuilder();
foreach (string fontNameAux in fontNames)
{
if (sbFontName.Length > 0) { sbFontName.Append(";"); }
sbFontName.Append(fontNameAux);
}
fontName = sbFontName.ToString();
}
lines.Add(string.Format("Text({0}, {1})({2}, {3})[{4}]: \"{5}\"",
textElement.Matrix.Matrix[0, 2], textElement.Matrix.Matrix[1, 2], textElement.VisibleWidth, textElement.VisibleHeight,
textElement.Font == null ? string.Empty : textElement.Font.Name,
Math.Round(textElement.Matrix.Matrix[0, 2], 2),
Math.Round(textElement.Matrix.Matrix[1, 2], 2),
Math.Round(textElement.VisibleWidth, 2),
Math.Round(textElement.VisibleHeight, 2),
fontName,
textElement.VisibleText));
}
}
@@ -97,61 +126,252 @@ namespace VAR.PdfTools.Workbench
txtOutput.Lines = lines.ToArray();
}
private void btnGetColumn_Click(object sender, EventArgs e)
private void btnHasText1_Click(object sender, EventArgs e)
{
if (System.IO.File.Exists(txtPdfPath.Text) == false)
{
MessageBox.Show("File does not exist");
return;
}
string pdfPath = txtPdfPath.Text;
string text = txtField1.Text;
PdfDocument doc = PdfDocument.Load(txtPdfPath.Text);
var columnData = new List<string>();
foreach (PdfDocumentPage page in doc.Pages)
{
PdfTextExtractor extractor = new PdfTextExtractor(page);
columnData.AddRange(extractor.GetColumn(txtColumnName.Text));
}
txtOutput.Lines = columnData.ToArray();
Action_HasText(pdfPath, text);
}
private void btnGetField_Click(object sender, EventArgs e)
private void btnGetField1_Click(object sender, EventArgs e)
{
if (System.IO.File.Exists(txtPdfPath.Text) == false)
string pdfPath = txtPdfPath.Text;
string field = txtField1.Text;
Action_GetField(pdfPath, field);
}
private void btnGetColumn1_Click(object sender, EventArgs e)
{
string pdfPath = txtPdfPath.Text;
string column = txtField1.Text;
Action_GetColumn(pdfPath, column);
}
private void btnHasText2_Click(object sender, EventArgs e)
{
string pdfPath = txtPdfPath.Text;
string text = txtField2.Text;
Action_HasText(pdfPath, text);
}
private void btnGetField2_Click(object sender, EventArgs e)
{
string pdfPath = txtPdfPath.Text;
string field = txtField2.Text;
Action_GetField(pdfPath, field);
}
private void btnGetColumn2_Click(object sender, EventArgs e)
{
string pdfPath = txtPdfPath.Text;
string column = txtField2.Text;
Action_GetColumn(pdfPath, column);
}
private void btnHasText3_Click(object sender, EventArgs e)
{
string pdfPath = txtPdfPath.Text;
string text = txtField3.Text;
Action_HasText(pdfPath, text);
}
private void btnGetField3_Click(object sender, EventArgs e)
{
string pdfPath = txtPdfPath.Text;
string field = txtField3.Text;
Action_GetField(pdfPath, field);
}
private void btnGetColumn3_Click(object sender, EventArgs e)
{
string pdfPath = txtPdfPath.Text;
string column = txtField3.Text;
Action_GetColumn(pdfPath, column);
}
private IEnumerable<int> GetSelectedPages(int maxPages)
{
string pages = txtPages.Text;
if (string.IsNullOrEmpty(pages))
{
return Enumerable.Range(1, maxPages);
}
string[] pagesParts;
if (pages.Contains(","))
{
pagesParts = pages.Split(',');
}
else
{
pagesParts = new string[] { pages };
}
List<int> listPages = new List<int>();
foreach (string part in pagesParts)
{
if (part.Contains("-"))
{
string[] range = part.Split('-');
if (range.Length == 2)
{
int pageStart;
int pageEnd;
if (int.TryParse(range[0], out pageStart) && int.TryParse(range[1], out pageEnd))
{
listPages.AddRange(Enumerable.Range(pageStart, (pageEnd - pageStart) + 1));
}
}
}
else
{
int pageNum;
if (int.TryParse(part, out pageNum))
{
listPages.Add(pageNum);
}
}
}
if (listPages.Count == 0)
{
listPages.AddRange(Enumerable.Range(1, maxPages));
}
return listPages;
}
private void Action_HasText(string pdfPath, string text)
{
if (System.IO.File.Exists(pdfPath) == false)
{
MessageBox.Show("File does not exist");
return;
}
PdfDocument doc = PdfDocument.Load(txtPdfPath.Text);
PdfDocument doc = PdfDocument.Load(pdfPath);
var fieldData = new List<string>();
IEnumerable<int> selectedPages = GetSelectedPages(doc.Pages.Count);
List<string> lines = new List<string>();
int pageNum = 0;
foreach (PdfDocumentPage page in doc.Pages)
{
pageNum++;
if (selectedPages.Contains(pageNum) == false) { continue; }
PdfTextExtractor extractor = new PdfTextExtractor(page);
fieldData.Add(extractor.GetField(txtFieldName.Text));
lines.Add(string.Format("Page({0}) : {1}", pageNum, Convert.ToString(extractor.HasText(text))));
}
txtOutput.Lines = lines.ToArray();
}
private void Action_GetField(string pdfPath, string field)
{
if (System.IO.File.Exists(pdfPath) == false)
{
MessageBox.Show("File does not exist");
return;
}
PdfDocument doc = PdfDocument.Load(pdfPath);
IEnumerable<int> selectedPages = GetSelectedPages(doc.Pages.Count);
var fieldData = new List<string>();
int pageNum = 0;
foreach (PdfDocumentPage page in doc.Pages)
{
pageNum++;
if (selectedPages.Contains(pageNum) == false) { continue; }
PdfTextExtractor extractor = new PdfTextExtractor(page);
fieldData.Add(extractor.GetFieldAsString(field));
}
txtOutput.Lines = fieldData.ToArray();
}
private void btnHasText_Click(object sender, EventArgs e)
private void Action_GetColumn(string pdfPath, string column)
{
if (System.IO.File.Exists(txtPdfPath.Text) == false)
if (System.IO.File.Exists(pdfPath) == false)
{
MessageBox.Show("File does not exist");
return;
}
PdfDocument doc = PdfDocument.Load(pdfPath);
string baseDocumentPath = Path.GetDirectoryName(txtPdfPath.Text);
string baseDocumentFilename = Path.GetFileNameWithoutExtension(txtPdfPath.Text);
IEnumerable<int> selectedPages = GetSelectedPages(doc.Pages.Count);
var columns = new List<string>();
int pageNum = 0;
foreach (PdfDocumentPage page in doc.Pages)
{
pageNum++;
if (selectedPages.Contains(pageNum) == false) { continue; }
PdfTextExtractor extractor = new PdfTextExtractor(page);
PdfTextElementColumn columnData;
if (column.StartsWith("#"))
{
string[] columnParts = column.Substring(1).Split(';');
double y = Convert.ToDouble(columnParts[0]);
double x1 = Convert.ToDouble(columnParts[1]);
double x2 = Convert.ToDouble(columnParts[2]);
columnData = extractor.GetColumn(null, y, x1, x2, x1, x2);
}
else
{
columnData = extractor.GetColumn(column);
}
if (chkRender.Checked)
{
var pdfPageRenderer = new PdfPageRenderer(extractor);
Bitmap bmp = pdfPageRenderer.Render();
pdfPageRenderer.RenderColumn(columnData, bmp);
string fileName = Path.Combine(baseDocumentPath, string.Format("{0}_{1:0000}.png", baseDocumentFilename, pageNum));
bmp.Save(fileName, ImageFormat.Png);
}
columns.AddRange(columnData.Elements.Select(t => t.VisibleText));
}
txtOutput.Lines = columns.ToArray();
}
private void btnRender_Click(object sender, EventArgs e)
{
if (File.Exists(txtPdfPath.Text) == false)
{
MessageBox.Show("File does not exist");
return;
}
PdfDocument doc = PdfDocument.Load(txtPdfPath.Text);
string baseDocumentPath = Path.GetDirectoryName(txtPdfPath.Text);
string baseDocumentFilename = Path.GetFileNameWithoutExtension(txtPdfPath.Text);
List<string> lines = new List<string>();
int pageNum = 1;
lines.Add(string.Format("Filename : {0}", baseDocumentFilename));
lines.Add(string.Format("Number of Pages : {0}", doc.Pages.Count));
IEnumerable<int> selectedPages = GetSelectedPages(doc.Pages.Count);
int pageNum = 0;
foreach (PdfDocumentPage page in doc.Pages)
{
PdfTextExtractor extractor = new PdfTextExtractor(page);
lines.Add(string.Format("Page({0}) : {1}", pageNum, Convert.ToString(extractor.HasText(txtText.Text))));
pageNum++;
if (selectedPages.Contains(pageNum) == false) { continue; }
PdfPageRenderer pdfPageRenderer = new PdfPageRenderer(page);
Bitmap bmp = pdfPageRenderer.Render();
lines.Add(string.Format("Page {0:0000} TextElements : {1}", pageNum, pdfPageRenderer.Extractor.Elements.Count));
// Save image to disk
string fileName = Path.Combine(baseDocumentPath, string.Format("{0}_{1:0000}.png", baseDocumentFilename, pageNum));
bmp.Save(fileName, ImageFormat.Png);
}
txtOutput.Lines = lines.ToArray();
}
}
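Commit b9750745bc lets `GetColumn` take raw coordinates: in `Action_GetColumn` above, a column name starting with `#` is read as `#y;x1;x2` and converted with `Convert.ToDouble`, which follows the current culture. On a machine whose decimal separator is a comma, a value such as `700.5` may therefore be read differently than intended. The sketch below, a hypothetical `RawColumnSpec.TryParse` helper that is not part of VAR.PdfTools, parses the same syntax with `CultureInfo.InvariantCulture` and reports malformed input instead of throwing. Unlike the diff, it requires exactly three values.

```csharp
using System.Globalization;

// Hypothetical helper, not part of VAR.PdfTools, for the raw-coordinate
// column syntax "#y;x1;x2", e.g. "#700.5;72;150".
public static class RawColumnSpec
{
    public static bool TryParse(string column, out double y, out double x1, out double x2)
    {
        y = x1 = x2 = 0;
        if (string.IsNullOrEmpty(column) || column[0] != '#') { return false; }

        string[] parts = column.Substring(1).Split(';');
        if (parts.Length != 3) { return false; }

        // Parse with the invariant culture so '.' is always the decimal separator,
        // regardless of the machine's regional settings.
        NumberStyles style = NumberStyles.Float;
        CultureInfo inv = CultureInfo.InvariantCulture;
        return double.TryParse(parts[0], style, inv, out y)
            && double.TryParse(parts[1], style, inv, out x1)
            && double.TryParse(parts[2], style, inv, out x2);
    }
}
```

The parsed values could then be passed on exactly as the diff does, via `extractor.GetColumn(null, y, x1, x2, x1, x2)`.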


@@ -112,9 +112,9 @@
<value>2.0</value>
</resheader>
<resheader name="reader">
<value>System.Resources.ResXResourceReader, System.Windows.Forms, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</value>
<value>System.Resources.ResXResourceReader, System.Windows.Forms, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</value>
</resheader>
<resheader name="writer">
<value>System.Resources.ResXResourceWriter, System.Windows.Forms, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</value>
<value>System.Resources.ResXResourceWriter, System.Windows.Forms, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</value>
</resheader>
</root>


@@ -11,4 +11,4 @@ using System.Runtime.InteropServices;
[assembly: AssemblyCulture("")]
[assembly: ComVisible(false)]
[assembly: Guid("a5825d8e-9f81-49e0-b610-8ae5e46d02ea")]
[assembly: AssemblyVersion("1.2.*")]
[assembly: AssemblyVersion("1.6.0.*")]

@@ -1,74 +0,0 @@
//------------------------------------------------------------------------------
// <auto-generated>
// This code was generated by a tool.
// Runtime Version:4.0.30319.42000
//
// Changes to this file may cause incorrect behavior and will be lost if
// the code is regenerated.
// </auto-generated>
//------------------------------------------------------------------------------
namespace VAR.PdfTools.Workbench.Properties {
[global::System.Runtime.CompilerServices.CompilerGeneratedAttribute()]
[global::System.CodeDom.Compiler.GeneratedCodeAttribute("Microsoft.VisualStudio.Editors.SettingsDesigner.SettingsSingleFileGenerator", "10.0.0.0")]
internal sealed partial class Settings : global::System.Configuration.ApplicationSettingsBase {
private static Settings defaultInstance = ((Settings)(global::System.Configuration.ApplicationSettingsBase.Synchronized(new Settings())));
public static Settings Default {
get {
return defaultInstance;
}
}
[global::System.Configuration.UserScopedSettingAttribute()]
[global::System.Diagnostics.DebuggerNonUserCodeAttribute()]
[global::System.Configuration.DefaultSettingValueAttribute("")]
public string LastPdfPath {
get {
return ((string)(this["LastPdfPath"]));
}
set {
this["LastPdfPath"] = value;
}
}
[global::System.Configuration.UserScopedSettingAttribute()]
[global::System.Diagnostics.DebuggerNonUserCodeAttribute()]
[global::System.Configuration.DefaultSettingValueAttribute("")]
public string LastColumnName {
get {
return ((string)(this["LastColumnName"]));
}
set {
this["LastColumnName"] = value;
}
}
[global::System.Configuration.UserScopedSettingAttribute()]
[global::System.Diagnostics.DebuggerNonUserCodeAttribute()]
[global::System.Configuration.DefaultSettingValueAttribute("")]
public string LastFieldName {
get {
return ((string)(this["LastFieldName"]));
}
set {
this["LastFieldName"] = value;
}
}
[global::System.Configuration.UserScopedSettingAttribute()]
[global::System.Diagnostics.DebuggerNonUserCodeAttribute()]
[global::System.Configuration.DefaultSettingValueAttribute("")]
public string LastText {
get {
return ((string)(this["LastText"]));
}
set {
this["LastText"] = value;
}
}
}
}

@@ -1,18 +0,0 @@
<?xml version='1.0' encoding='utf-8'?>
<SettingsFile xmlns="http://schemas.microsoft.com/VisualStudio/2004/01/settings" CurrentProfile="(Default)" GeneratedClassNamespace="VAR.PdfTools.Workbench.Properties" GeneratedClassName="Settings">
<Profiles />
<Settings>
<Setting Name="LastPdfPath" Type="System.String" Scope="User">
<Value Profile="(Default)" />
</Setting>
<Setting Name="LastColumnName" Type="System.String" Scope="User">
<Value Profile="(Default)" />
</Setting>
<Setting Name="LastFieldName" Type="System.String" Scope="User">
<Value Profile="(Default)" />
</Setting>
<Setting Name="LastText" Type="System.String" Scope="User">
<Value Profile="(Default)" />
</Setting>
</Settings>
</SettingsFile>

@@ -23,6 +23,7 @@
<DefineConstants>DEBUG;TRACE</DefineConstants>
<ErrorReport>prompt</ErrorReport>
<WarningLevel>4</WarningLevel>
<LangVersion>6</LangVersion>
</PropertyGroup>
<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Release|AnyCPU' ">
<PlatformTarget>AnyCPU</PlatformTarget>
@@ -47,6 +48,7 @@
<Reference Include="System.Xml" />
</ItemGroup>
<ItemGroup>
<Compile Include="Configuration.cs" />
<Compile Include="FrmPdfInfo.cs">
<SubType>Form</SubType>
</Compile>
@@ -55,15 +57,6 @@
</Compile>
<Compile Include="Program.cs" />
<Compile Include="Properties\AssemblyInfo.cs" />
<None Include="Properties\Settings.settings">
<Generator>SettingsSingleFileGenerator</Generator>
<LastGenOutput>Settings.Designer.cs</LastGenOutput>
</None>
<Compile Include="Properties\Settings.Designer.cs">
<AutoGen>True</AutoGen>
<DependentUpon>Settings.settings</DependentUpon>
<DesignTimeSharedInput>True</DesignTimeSharedInput>
</Compile>
</ItemGroup>
<ItemGroup>
<ProjectReference Include="..\VAR.PdfTools\VAR.PdfTools.csproj">
@@ -71,6 +64,11 @@
<Name>VAR.PdfTools</Name>
</ProjectReference>
</ItemGroup>
<ItemGroup>
<EmbeddedResource Include="FrmPdfInfo.resx">
<DependentUpon>FrmPdfInfo.cs</DependentUpon>
</EmbeddedResource>
</ItemGroup>
<Import Project="$(MSBuildToolsPath)\Microsoft.CSharp.targets" />
<!-- To modify your build process, add your task inside one of the targets below and uncomment it.
Other similar extension points exist, see Microsoft.Common.targets.

@@ -1,8 +1,13 @@
@echo off
:: MSBuild and tools path
if exist "%ProgramFiles%\MSBuild\14.0\bin" set PATH=%ProgramFiles%\MSBuild\14.0\bin;%PATH%
if exist "%ProgramFiles(x86)%\MSBuild\14.0\bin" set PATH=%ProgramFiles(x86)%\MSBuild\14.0\bin;%PATH%
if exist "%windir%\Microsoft.Net\Framework\v4.0.30319" set MsBuildPath=%windir%\Microsoft.NET\Framework\v4.0.30319
if exist "%windir%\Microsoft.Net\Framework64\v4.0.30319" set MsBuildPath=%windir%\Microsoft.NET\Framework64\v4.0.30319
if exist "C:\Program Files (x86)\MSBuild\14.0\Bin" set MsBuildPath=C:\Program Files (x86)\MSBuild\14.0\Bin
if exist "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\MSBuild\15.0\Bin" set MsBuildPath=C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\MSBuild\15.0\Bin
set PATH=%MsBuildPath%;%PATH%
echo %MsBuildPath%
:: NuGet
set nuget="nuget"
@@ -18,7 +23,7 @@ msbuild VAR.PdfTools.csproj /t:Build /p:Configuration="Release .Net 4.6.1" /p:Pl
:: Packing Nuget
Title Packing Nuget
%nuget% pack VAR.PdfTools.csproj -Verbosity detailed -OutputDir "NuGet" -MSBuildVersion "14.0" -Properties Configuration="Release .Net 4.6.1" -Prop Platform=AnyCPU
%nuget% pack VAR.PdfTools.csproj -Verbosity detailed -OutputDir "NuGet" -Properties Configuration="Release .Net 4.6.1" -Prop Platform=AnyCPU
title Finished
pause

@@ -0,0 +1,121 @@
using System;
namespace VAR.PdfTools.Maths
{
public class Matrix3x3
{
#region Declarations
public double[,] _matrix = new double[3, 3];
#endregion
#region Properties
public double[,] Matrix { get { return _matrix; } }
#endregion
#region Creator
public Matrix3x3()
{
Idenity();
}
public Matrix3x3(double a, double b, double c, double d, double e, double f)
{
Set(a, b, c, d, e, f);
}
#endregion
#region Public methods
public void Idenity()
{
_matrix[0, 0] = 1.0;
_matrix[0, 1] = 0.0;
_matrix[0, 2] = 0.0;
_matrix[1, 0] = 0.0;
_matrix[1, 1] = 1.0;
_matrix[1, 2] = 0.0;
_matrix[2, 0] = 0.0;
_matrix[2, 1] = 0.0;
_matrix[2, 2] = 1.0;
}
public void Set(double a, double b, double c, double d, double e, double f)
{
_matrix[0, 0] = a;
_matrix[1, 0] = b;
_matrix[2, 0] = 0;
_matrix[0, 1] = c;
_matrix[1, 1] = d;
_matrix[2, 1] = 0;
_matrix[0, 2] = e;
_matrix[1, 2] = f;
_matrix[2, 2] = 1;
}
public Vector3D Multiply(Vector3D vect)
{
Vector3D vectResult = new Vector3D();
vectResult.Vector[0] = (vect.Vector[0] * _matrix[0, 0]) + (vect.Vector[1] * _matrix[0, 1]) + (vect.Vector[2] * _matrix[0, 2]);
vectResult.Vector[1] = (vect.Vector[0] * _matrix[1, 0]) + (vect.Vector[1] * _matrix[1, 1]) + (vect.Vector[2] * _matrix[1, 2]);
vectResult.Vector[2] = (vect.Vector[0] * _matrix[2, 0]) + (vect.Vector[1] * _matrix[2, 1]) + (vect.Vector[2] * _matrix[2, 2]);
return vectResult;
}
public Matrix3x3 Multiply(Matrix3x3 matrix)
{
Matrix3x3 newMatrix = new Matrix3x3();
newMatrix._matrix[0, 0] = (_matrix[0, 0] * matrix._matrix[0, 0]) + (_matrix[1, 0] * matrix._matrix[0, 1]) + (_matrix[2, 0] * matrix._matrix[0, 2]);
newMatrix._matrix[0, 1] = (_matrix[0, 1] * matrix._matrix[0, 0]) + (_matrix[1, 1] * matrix._matrix[0, 1]) + (_matrix[2, 1] * matrix._matrix[0, 2]);
newMatrix._matrix[0, 2] = (_matrix[0, 2] * matrix._matrix[0, 0]) + (_matrix[1, 2] * matrix._matrix[0, 1]) + (_matrix[2, 2] * matrix._matrix[0, 2]);
newMatrix._matrix[1, 0] = (_matrix[0, 0] * matrix._matrix[1, 0]) + (_matrix[1, 0] * matrix._matrix[1, 1]) + (_matrix[2, 0] * matrix._matrix[1, 2]);
newMatrix._matrix[1, 1] = (_matrix[0, 1] * matrix._matrix[1, 0]) + (_matrix[1, 1] * matrix._matrix[1, 1]) + (_matrix[2, 1] * matrix._matrix[1, 2]);
newMatrix._matrix[1, 2] = (_matrix[0, 2] * matrix._matrix[1, 0]) + (_matrix[1, 2] * matrix._matrix[1, 1]) + (_matrix[2, 2] * matrix._matrix[1, 2]);
newMatrix._matrix[2, 0] = (_matrix[0, 0] * matrix._matrix[2, 0]) + (_matrix[1, 0] * matrix._matrix[2, 1]) + (_matrix[2, 0] * matrix._matrix[2, 2]);
newMatrix._matrix[2, 1] = (_matrix[0, 1] * matrix._matrix[2, 0]) + (_matrix[1, 1] * matrix._matrix[2, 1]) + (_matrix[2, 1] * matrix._matrix[2, 2]);
newMatrix._matrix[2, 2] = (_matrix[0, 2] * matrix._matrix[2, 0]) + (_matrix[1, 2] * matrix._matrix[2, 1]) + (_matrix[2, 2] * matrix._matrix[2, 2]);
return newMatrix;
}
public Matrix3x3 Copy()
{
Matrix3x3 newMatrix = new Matrix3x3();
newMatrix._matrix[0, 0] = _matrix[0, 0];
newMatrix._matrix[0, 1] = _matrix[0, 1];
newMatrix._matrix[0, 2] = _matrix[0, 2];
newMatrix._matrix[1, 0] = _matrix[1, 0];
newMatrix._matrix[1, 1] = _matrix[1, 1];
newMatrix._matrix[1, 2] = _matrix[1, 2];
newMatrix._matrix[2, 0] = _matrix[2, 0];
newMatrix._matrix[2, 1] = _matrix[2, 1];
newMatrix._matrix[2, 2] = _matrix[2, 2];
return newMatrix;
}
public bool IsCollinear(Matrix3x3 otherMatrix, double horizontalDelta = 0.00001, double verticalDelta = 0.00001)
{
double epsilon = 0.00001;
return (
Math.Abs(_matrix[0, 0] - otherMatrix.Matrix[0, 0]) <= epsilon &&
Math.Abs(_matrix[1, 0] - otherMatrix.Matrix[1, 0]) <= epsilon &&
Math.Abs(_matrix[0, 1] - otherMatrix.Matrix[0, 1]) <= epsilon &&
Math.Abs(_matrix[1, 1] - otherMatrix.Matrix[1, 1]) <= epsilon &&
Math.Abs(_matrix[0, 2] - otherMatrix.Matrix[0, 2]) <= horizontalDelta &&
Math.Abs(_matrix[1, 2] - otherMatrix.Matrix[1, 2]) <= verticalDelta &&
true);
}
#endregion
}
}
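
In PDF, the six operands `a b c d e f` of operators such as `cm` and `Tm` denote the matrix `[a b 0; c d 0; e f 1]`, applied to row vectors `[x y 1]`. `Matrix3x3.Set` stores the transpose of that matrix, with rows `(a, c, e)`, `(b, d, f)`, `(0, 0, 1)`, so that `Multiply(Vector3D)` can work on column vectors. As we read the code, `Multiply(Matrix3x3 matrix)` returns `matrix × this` in column form. That means `first.Multiply(second)` applies `first` and then `second`, which matches the PDF row-vector product `first × second`. The standalone sketch below mirrors that layout to make the order visible. `AffineSketch`, `Apply` and `Then` are hypothetical names, not part of VAR.PdfTools.

```csharp
using System;

// Sketch mirroring the storage layout of Matrix3x3.Set:
// row 0 = (a, c, e), row 1 = (b, d, f), row 2 = (0, 0, 1), used with column vectors (x, y, 1).
public sealed class AffineSketch
{
    public readonly double[,] M = new double[3, 3];

    public AffineSketch(double a, double b, double c, double d, double e, double f)
    {
        M[0, 0] = a; M[0, 1] = c; M[0, 2] = e;
        M[1, 0] = b; M[1, 1] = d; M[1, 2] = f;
        M[2, 0] = 0; M[2, 1] = 0; M[2, 2] = 1;
    }

    private AffineSketch() { }

    // Transforms the point (x, y): x' = a*x + c*y + e, y' = b*x + d*y + f.
    public double[] Apply(double x, double y)
    {
        return new[]
        {
            M[0, 0] * x + M[0, 1] * y + M[0, 2],
            M[1, 0] * x + M[1, 1] * y + M[1, 2],
        };
    }

    // Same product as Matrix3x3.Multiply(Matrix3x3): result = other * this (column-vector form),
    // i.e. the combined transform applies "this" first and "other" second.
    public AffineSketch Then(AffineSketch other)
    {
        var result = new AffineSketch();
        for (int i = 0; i < 3; i++)
        {
            for (int j = 0; j < 3; j++)
            {
                double sum = 0;
                for (int k = 0; k < 3; k++) { sum += other.M[i, k] * M[k, j]; }
                result.M[i, j] = sum;
            }
        }
        return result;
    }
}
```

Here is a usage example. Scaling by 2 and then translating by (10, 5) maps (1, 1) to (12, 7). Reversing the order maps it to (22, 12). Getting this order wrong is an easy way to misplace text when combining a text matrix with the current transformation matrix.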

@@ -0,0 +1,25 @@
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace VAR.PdfTools.Maths
{
public class Rect
{
public double XMin { get; set; }
public double XMax { get; set; }
public double YMin { get; set; }
public double YMax { get; set; }
public void Add(Rect rect)
{
if (rect.XMax > XMax) { XMax = rect.XMax; }
if (rect.YMax > YMax) { YMax = rect.YMax; }
if (rect.XMin < XMin) { XMin = rect.XMin; }
if (rect.YMin < YMin) { YMin = rect.YMin; }
}
}
}
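
`Rect.Add` only ever grows the bounds. A `Rect` created with `new Rect()` therefore starts at (0, 0, 0, 0), and any union built on it always contains the origin. Whether that matters depends on how callers such as `PdfTextExtractor.GetRect` seed the rectangle, which this diff does not show. The sketch below uses a hypothetical `RectSketch` copy of the class. Its `Union` helper is also hypothetical: it seeds the result from the first rectangle so the default bounds do not leak in.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for VAR.PdfTools.Maths.Rect with the same four bounds.
public class RectSketch
{
    public double XMin { get; set; }
    public double XMax { get; set; }
    public double YMin { get; set; }
    public double YMax { get; set; }

    // Grows this rectangle so it also covers "rect" (same logic as Rect.Add).
    public void Add(RectSketch rect)
    {
        if (rect.XMax > XMax) { XMax = rect.XMax; }
        if (rect.YMax > YMax) { YMax = rect.YMax; }
        if (rect.XMin < XMin) { XMin = rect.XMin; }
        if (rect.YMin < YMin) { YMin = rect.YMin; }
    }

    // Bounding box of a sequence: starts from a copy of the first rectangle,
    // so the default (0, 0, 0, 0) bounds are never part of the result. Returns null for an empty sequence.
    public static RectSketch Union(IEnumerable<RectSketch> rects)
    {
        RectSketch result = null;
        foreach (RectSketch rect in rects)
        {
            if (result == null)
            {
                result = new RectSketch { XMin = rect.XMin, XMax = rect.XMax, YMin = rect.YMin, YMax = rect.YMax };
                continue;
            }
            result.Add(rect);
        }
        return result;
    }
}
```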

@@ -0,0 +1,33 @@
namespace VAR.PdfTools.Maths
{
public class Vector3D
{
#region Declarations
public double[] _vector = new double[3];
#endregion
#region Properties
public double[] Vector { get { return _vector; } }
#endregion
#region Creator
public Vector3D()
{
Init();
}
public void Init()
{
_vector[0] = 0.0;
_vector[1] = 0.0;
_vector[2] = 1.0;
}
#endregion
}
}
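
`Vector3D.Init` sets the third component to 1.0. The vector therefore represents the point (x, y) in homogeneous coordinates, and multiplying it by a `Matrix3x3` adds the translation column (e, f). A vector whose third component is 0 would be treated as a direction and left untranslated. The sketch below reproduces the formula of `Matrix3x3.Multiply(Vector3D)` on plain arrays to show both cases. `HomogeneousSketch` is a hypothetical name.

```csharp
// Same formula as Matrix3x3.Multiply(Vector3D): result[i] = v[0]*m[i,0] + v[1]*m[i,1] + v[2]*m[i,2].
public static class HomogeneousSketch
{
    public static double[] Multiply(double[,] m, double[] v)
    {
        var result = new double[3];
        for (int i = 0; i < 3; i++)
        {
            result[i] = (v[0] * m[i, 0]) + (v[1] * m[i, 1]) + (v[2] * m[i, 2]);
        }
        return result;
    }
}
```

Take a pure translation by (10, 5) as an example. It moves the point (3, 4, 1) to (13, 9, 1) but leaves the direction (3, 4, 0) unchanged.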

@@ -1,4 +1,5 @@
using System.Collections.Generic;
using VAR.PdfTools.PdfElements;
namespace VAR.PdfTools
{

@@ -2,6 +2,7 @@
using System.Collections.Generic;
using System.IO;
using System.Linq;
using VAR.PdfTools.PdfElements;
namespace VAR.PdfTools
{

@@ -1,5 +1,6 @@
using System;
using System.Collections.Generic;
using VAR.PdfTools.PdfElements;
namespace VAR.PdfTools
{
@@ -68,7 +69,8 @@ namespace VAR.PdfTools
{
PdfParser parser = new PdfParser(_content);
_contentActions = parser.ParseContent();
}else
}
else
{
_contentActions = new List<PdfContentAction>();
}

@@ -1,202 +0,0 @@
using System.Collections.Generic;
using System.IO;
namespace VAR.PdfTools
{
public enum PdfElementTypes
{
Undefined,
Boolean,
Integer,
Real,
String,
Name,
Array,
Dictionary,
Null,
ObjectReference,
Object,
Stream,
};
public interface IPdfElement
{
PdfElementTypes Type { get; }
}
public class PdfBoolean : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Boolean; } }
public bool Value { get; set; }
}
public class PdfInteger : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Integer; } }
public long Value { get; set; }
}
public class PdfReal : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Real; } }
public double Value { get; set; }
}
public class PdfString : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.String; } }
public string Value { get; set; }
}
public class PdfName : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Name; } }
public string Value { get; set; }
}
public class PdfArray : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Array; } }
private List<IPdfElement> _values = new List<IPdfElement>();
public List<IPdfElement> Values { get { return _values; } }
}
public class PdfDictionary : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Dictionary; } }
private Dictionary<string, IPdfElement> _values = new Dictionary<string, IPdfElement>();
public Dictionary<string, IPdfElement> Values { get { return _values; } }
public string GetParamAsString(string name)
{
if (Values.ContainsKey(name) == false) { return null; }
IPdfElement value = Values[name];
if (value is PdfArray)
{
value = ((PdfArray)value).Values[0];
}
if (value is PdfName)
{
return ((PdfName)value).Value;
}
if (value is PdfString)
{
return ((PdfString)value).Value;
}
return null;
}
public long? GetParamAsInt(string name)
{
if (Values.ContainsKey(name) == false) { return null; }
IPdfElement value = Values[name];
if (value is PdfArray)
{
value = ((PdfArray)value).Values[0];
}
if (value is PdfInteger)
{
return ((PdfInteger)value).Value;
}
return null;
}
public byte[] GetParamAsStream(string name)
{
if (Values.ContainsKey(name) == false) { return null; }
IPdfElement value = Values[name];
if (value is PdfArray)
{
PdfArray array = value as PdfArray;
MemoryStream memStream = new MemoryStream();
foreach (IPdfElement elem in array.Values)
{
PdfStream stream = elem as PdfStream;
if (stream == null) { continue; }
memStream.Write(stream.Data, 0, stream.Data.Length);
}
if (memStream.Length > 0)
{
return memStream.ToArray();
}
return null;
}
if (value is PdfStream)
{
return ((PdfStream)value).Data;
}
return null;
}
}
public class PdfNull : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Null; } }
}
public class PdfObjectReference : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.ObjectReference; } }
public int ObjectID { get; set; }
public int ObjectGeneration { get; set; }
}
public class PdfStream : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Stream; } }
public PdfDictionary Dictionary { get; set; }
public byte[] Data { get; set; }
public byte[] OriginalData { get; set; }
public IPdfElement OriginalFilter { get; set; }
}
public class PdfObject : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Object; } }
public int ObjectID { get; set; }
public int ObjectGeneration { get; set; }
public IPdfElement Data { get; set; }
public int UsageCount { get; set; }
}
public static class PdfElementUtils
{
public static double GetReal(IPdfElement elem, double defaultValue)
{
if (elem == null)
{
return defaultValue;
}
if (elem is PdfInteger)
{
return ((PdfInteger)elem).Value;
}
if (elem is PdfReal)
{
return ((PdfReal)elem).Value;
}
return defaultValue;
}
public static long GetInt(IPdfElement elem, long defaultValue)
{
if (elem == null)
{
return defaultValue;
}
if (elem is PdfInteger)
{
return ((PdfInteger)elem).Value;
}
if (elem is PdfReal)
{
return (long)((PdfReal)elem).Value;
}
return defaultValue;
}
}
}

@@ -0,0 +1,7 @@
namespace VAR.PdfTools.PdfElements
{
public interface IPdfElement
{
PdfElementTypes Type { get; }
}
}

@@ -0,0 +1,11 @@
using System.Collections.Generic;
namespace VAR.PdfTools.PdfElements
{
public class PdfArray : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Array; } }
private List<IPdfElement> _values = new List<IPdfElement>();
public List<IPdfElement> Values { get { return _values; } }
}
}

@@ -0,0 +1,8 @@
namespace VAR.PdfTools.PdfElements
{
public class PdfBoolean : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Boolean; } }
public bool Value { get; set; }
}
}

@@ -0,0 +1,77 @@
using System.Collections.Generic;
using System.IO;
namespace VAR.PdfTools.PdfElements
{
public class PdfDictionary : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Dictionary; } }
private Dictionary<string, IPdfElement> _values = new Dictionary<string, IPdfElement>();
public Dictionary<string, IPdfElement> Values { get { return _values; } }
public string GetParamAsString(string name)
{
if (Values.ContainsKey(name) == false) { return null; }
IPdfElement value = Values[name];
if (value is PdfArray)
{
value = ((PdfArray)value).Values[0];
}
if (value is PdfName)
{
return ((PdfName)value).Value;
}
if (value is PdfString)
{
return ((PdfString)value).Value;
}
return null;
}
public long? GetParamAsInt(string name)
{
if (Values.ContainsKey(name) == false) { return null; }
IPdfElement value = Values[name];
if (value is PdfArray)
{
value = ((PdfArray)value).Values[0];
}
if (value is PdfInteger)
{
return ((PdfInteger)value).Value;
}
return null;
}
public byte[] GetParamAsStream(string name)
{
if (Values.ContainsKey(name) == false) { return null; }
IPdfElement value = Values[name];
if (value is PdfArray)
{
PdfArray array = value as PdfArray;
MemoryStream memStream = new MemoryStream();
foreach (IPdfElement elem in array.Values)
{
PdfStream stream = elem as PdfStream;
if (stream == null) { continue; }
memStream.Write(stream.Data, 0, stream.Data.Length);
}
if (memStream.Length > 0)
{
return memStream.ToArray();
}
return null;
}
if (value is PdfStream)
{
return ((PdfStream)value).Data;
}
return null;
}
}
}
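
`GetParamAsStream` accepts either a single stream or an array of streams. This mirrors the PDF rule that a page's `/Contents` may be an array whose streams behave as if concatenated in order. The spec only allows that split at token boundaries, but a part does not have to end with whitespace. Joining the raw bytes can therefore glue the last token of one part onto the first token of the next, for example `Q` followed by `q` becoming `Qq`. The sketch below is a defensive variant, not the project's code: it writes a newline between parts. `ContentStreamSketch.Concatenate` is a hypothetical helper that works on already-decoded byte arrays.

```csharp
using System.Collections.Generic;
using System.IO;

public static class ContentStreamSketch
{
    // Concatenates decoded content-stream parts in order, writing a newline between parts
    // so adjacent tokens cannot merge. Null or empty parts are skipped; returns null when
    // nothing was written (the same "no data" convention as GetParamAsStream).
    public static byte[] Concatenate(IEnumerable<byte[]> parts)
    {
        using (var memStream = new MemoryStream())
        {
            foreach (byte[] part in parts)
            {
                if (part == null || part.Length == 0) { continue; }
                if (memStream.Length > 0) { memStream.WriteByte((byte)'\n'); }
                memStream.Write(part, 0, part.Length);
            }
            return memStream.Length > 0 ? memStream.ToArray() : null;
        }
    }
}
```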

@@ -0,0 +1,18 @@
namespace VAR.PdfTools.PdfElements
{
public enum PdfElementTypes
{
Undefined,
Boolean,
Integer,
Real,
String,
Name,
Array,
Dictionary,
Null,
ObjectReference,
Object,
Stream,
};
}

@@ -0,0 +1,56 @@
namespace VAR.PdfTools.PdfElements
{
public static class PdfElementUtils
{
public static double GetReal(IPdfElement elem, double defaultValue)
{
if (elem == null)
{
return defaultValue;
}
if (elem is PdfInteger)
{
return ((PdfInteger)elem).Value;
}
if (elem is PdfReal)
{
return ((PdfReal)elem).Value;
}
return defaultValue;
}
public static long GetInt(IPdfElement elem, long defaultValue)
{
if (elem == null)
{
return defaultValue;
}
if (elem is PdfInteger)
{
return ((PdfInteger)elem).Value;
}
if (elem is PdfReal)
{
return (long)((PdfReal)elem).Value;
}
return defaultValue;
}
public static string GetString(IPdfElement elem, string defaultValue)
{
if (elem == null)
{
return defaultValue;
}
if (elem is PdfString)
{
return ((PdfString)elem).Value;
}
if (elem is PdfName)
{
return ((PdfName)elem).Value;
}
return defaultValue;
}
}
}

@@ -0,0 +1,8 @@
namespace VAR.PdfTools.PdfElements
{
public class PdfInteger : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Integer; } }
public long Value { get; set; }
}
}

@@ -0,0 +1,8 @@
namespace VAR.PdfTools.PdfElements
{
public class PdfName : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Name; } }
public string Value { get; set; }
}
}

@@ -0,0 +1,7 @@
namespace VAR.PdfTools.PdfElements
{
public class PdfNull : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Null; } }
}
}

@@ -0,0 +1,11 @@
namespace VAR.PdfTools.PdfElements
{
public class PdfObject : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Object; } }
public int ObjectID { get; set; }
public int ObjectGeneration { get; set; }
public IPdfElement Data { get; set; }
public int UsageCount { get; set; }
}
}

@@ -0,0 +1,9 @@
namespace VAR.PdfTools.PdfElements
{
public class PdfObjectReference : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.ObjectReference; } }
public int ObjectID { get; set; }
public int ObjectGeneration { get; set; }
}
}

@@ -0,0 +1,8 @@
namespace VAR.PdfTools.PdfElements
{
public class PdfReal : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Real; } }
public double Value { get; set; }
}
}

@@ -0,0 +1,12 @@
namespace VAR.PdfTools.PdfElements
{
public class PdfStream : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Stream; } }
public PdfDictionary Dictionary { get; set; }
public byte[] Data { get; set; }
public byte[] OriginalData { get; set; }
public IPdfElement OriginalFilter { get; set; }
}
}

@@ -0,0 +1,8 @@
namespace VAR.PdfTools.PdfElements
{
public class PdfString : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.String; } }
public string Value { get; set; }
}
}

@@ -1,5 +1,5 @@
using System;
using System.Collections.Generic;
using System.Collections.Generic;
using VAR.PdfTools.PdfElements;
namespace VAR.PdfTools
{
@@ -45,6 +45,19 @@ namespace VAR.PdfTools
_tainted = true;
}
PrepareSizes(baseData);
}
#endregion
#region Private methods
private void PrepareSizes(PdfDictionary baseData)
{
// Set "Times-Roman" as default basefont sizes
_widths = PdfStandar14FontMetrics.Times_Roman.Widths;
_height = PdfStandar14FontMetrics.Times_Roman.ApproxHeight;
if (baseData.Values.ContainsKey("ToUnicode"))
{
byte[] toUnicodeStream = ((PdfStream)baseData.Values["ToUnicode"]).Data;
@@ -52,106 +65,107 @@ namespace VAR.PdfTools
_toUnicode = parser.ParseToUnicode();
}
string baseFont = _baseData.GetParamAsString("BaseFont");
if (string.IsNullOrEmpty(baseFont) == false)
{
SetBaseFontSizes(baseFont);
}
if (_baseData.Values.ContainsKey("FirstChar") && _baseData.Values.ContainsKey("LastChar") && _baseData.Values.ContainsKey("Widths"))
{
double glyphSpaceToTextSpace = 1000.0; // FIXME: SubType:Type3 Uses a FontMatrix that may not correspond to 1/1000th
_widths = new Dictionary<char, double>();
char firstChar = (char)_baseData.GetParamAsInt("FirstChar");
char lastChar = (char)_baseData.GetParamAsInt("LastChar");
PdfArray widths = _baseData.Values["Widths"] as PdfArray;
char actualChar = firstChar;
foreach (IPdfElement elem in widths.Values)
{
PdfReal widthReal = elem as PdfReal;
if (widthReal != null)
{
_widths.Add(actualChar, widthReal.Value / glyphSpaceToTextSpace);
actualChar++;
continue;
}
PdfInteger widthInt = elem as PdfInteger;
if (widthInt != null)
{
_widths.Add(actualChar, widthInt.Value / glyphSpaceToTextSpace);
actualChar++;
continue;
}
}
// FIXME: Calculate real height
ParseSizes();
}
else
}
private void ParseSizes()
{
double glyphSpaceToTextSpace = 1000.0; // FIXME: SubType:Type3 Uses a FontMatrix that may not correspond to 1/1000th
_widths = new Dictionary<char, double>();
char firstChar = (char)_baseData.GetParamAsInt("FirstChar");
char lastChar = (char)_baseData.GetParamAsInt("LastChar");
PdfArray widths = _baseData.Values["Widths"] as PdfArray;
char actualChar = firstChar;
foreach (IPdfElement elem in widths.Values)
{
string baseFont = _baseData.GetParamAsString("BaseFont");
if (baseFont == "Times-Roman")
{
_widths = PdfStandar14FontMetrics.Times_Roman.Widths;
_height = PdfStandar14FontMetrics.Times_Roman.ApproxHeight;
}
if (baseFont == "Times-Bold")
{
_widths = PdfStandar14FontMetrics.Times_Bold.Widths;
_height = PdfStandar14FontMetrics.Times_Bold.ApproxHeight;
}
if (baseFont == "Times-Italic")
{
_widths = PdfStandar14FontMetrics.Times_Italic.Widths;
_height = PdfStandar14FontMetrics.Times_Italic.ApproxHeight;
}
if (baseFont == "Times-BoldItalic")
{
_widths = PdfStandar14FontMetrics.Times_BoldItalic.Widths;
_height = PdfStandar14FontMetrics.Times_BoldItalic.ApproxHeight;
}
if (baseFont == "Helvetica")
{
_widths = PdfStandar14FontMetrics.Helvetica.Widths;
_height = PdfStandar14FontMetrics.Helvetica.ApproxHeight;
}
if (baseFont == "Helvetica-Bold")
{
_widths = PdfStandar14FontMetrics.Helvetica_Bold.Widths;
_height = PdfStandar14FontMetrics.Helvetica_Bold.ApproxHeight;
}
if (baseFont == "Helvetica-Oblique")
{
_widths = PdfStandar14FontMetrics.Helvetica_Oblique.Widths;
_height = PdfStandar14FontMetrics.Helvetica_Oblique.ApproxHeight;
}
if (baseFont == "Helvetica-BoldOblique")
{
_widths = PdfStandar14FontMetrics.Helvetica_BoldOblique.Widths;
_height = PdfStandar14FontMetrics.Helvetica_BoldOblique.ApproxHeight;
}
if (baseFont == "Courier")
{
_widths = PdfStandar14FontMetrics.Courier.Widths;
_height = PdfStandar14FontMetrics.Courier.ApproxHeight;
}
if (baseFont == "Courier-Bold")
{
_widths = PdfStandar14FontMetrics.Courier_Bold.Widths;
_height = PdfStandar14FontMetrics.Courier_Bold.ApproxHeight;
}
if (baseFont == "Courier-Oblique")
{
_widths = PdfStandar14FontMetrics.Courier_Oblique.Widths;
_height = PdfStandar14FontMetrics.Courier_Oblique.ApproxHeight;
}
if (baseFont == "Courier-BoldOblique")
{
_widths = PdfStandar14FontMetrics.Courier_BoldOblique.Widths;
_height = PdfStandar14FontMetrics.Courier_BoldOblique.ApproxHeight;
}
if (baseFont == "Symbol")
{
_widths = PdfStandar14FontMetrics.Symbol.Widths;
_height = PdfStandar14FontMetrics.Symbol.ApproxHeight;
}
if (baseFont == "ZapfDingbats")
{
_widths = PdfStandar14FontMetrics.ZapfDingbats.Widths;
_height = PdfStandar14FontMetrics.ZapfDingbats.ApproxHeight;
}
double width = PdfElementUtils.GetReal(elem, 500);
if (width < 0.0001f && width > -0.0001f) { width = 500; }
_widths.Add(actualChar, width / glyphSpaceToTextSpace);
actualChar++;
}
// FIXME: Calculate real height
}
private void SetBaseFontSizes(string baseFont)
{
if (baseFont == "Times-Roman")
{
_widths = PdfStandar14FontMetrics.Times_Roman.Widths;
_height = PdfStandar14FontMetrics.Times_Roman.ApproxHeight;
}
if (baseFont == "Times-Bold")
{
_widths = PdfStandar14FontMetrics.Times_Bold.Widths;
_height = PdfStandar14FontMetrics.Times_Bold.ApproxHeight;
}
if (baseFont == "Times-Italic")
{
_widths = PdfStandar14FontMetrics.Times_Italic.Widths;
_height = PdfStandar14FontMetrics.Times_Italic.ApproxHeight;
}
if (baseFont == "Times-BoldItalic")
{
_widths = PdfStandar14FontMetrics.Times_BoldItalic.Widths;
_height = PdfStandar14FontMetrics.Times_BoldItalic.ApproxHeight;
}
if (baseFont == "Helvetica")
{
_widths = PdfStandar14FontMetrics.Helvetica.Widths;
_height = PdfStandar14FontMetrics.Helvetica.ApproxHeight;
}
if (baseFont == "Helvetica-Bold")
{
_widths = PdfStandar14FontMetrics.Helvetica_Bold.Widths;
_height = PdfStandar14FontMetrics.Helvetica_Bold.ApproxHeight;
}
if (baseFont == "Helvetica-Oblique")
{
_widths = PdfStandar14FontMetrics.Helvetica_Oblique.Widths;
_height = PdfStandar14FontMetrics.Helvetica_Oblique.ApproxHeight;
}
if (baseFont == "Helvetica-BoldOblique")
{
_widths = PdfStandar14FontMetrics.Helvetica_BoldOblique.Widths;
_height = PdfStandar14FontMetrics.Helvetica_BoldOblique.ApproxHeight;
}
if (baseFont == "Courier")
{
_widths = PdfStandar14FontMetrics.Courier.Widths;
_height = PdfStandar14FontMetrics.Courier.ApproxHeight;
}
if (baseFont == "Courier-Bold")
{
_widths = PdfStandar14FontMetrics.Courier_Bold.Widths;
_height = PdfStandar14FontMetrics.Courier_Bold.ApproxHeight;
}
if (baseFont == "Courier-Oblique")
{
_widths = PdfStandar14FontMetrics.Courier_Oblique.Widths;
_height = PdfStandar14FontMetrics.Courier_Oblique.ApproxHeight;
}
if (baseFont == "Courier-BoldOblique")
{
_widths = PdfStandar14FontMetrics.Courier_BoldOblique.Widths;
_height = PdfStandar14FontMetrics.Courier_BoldOblique.ApproxHeight;
}
if (baseFont == "Symbol")
{
_widths = PdfStandar14FontMetrics.Symbol.Widths;
_height = PdfStandar14FontMetrics.Symbol.ApproxHeight;
}
if (baseFont == "ZapfDingbats")
{
_widths = PdfStandar14FontMetrics.ZapfDingbats.Widths;
_height = PdfStandar14FontMetrics.ZapfDingbats.ApproxHeight;
}
}
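
`ParseSizes` maps the i-th entry of `/Widths` to character `FirstChar + i` and divides glyph-space units by 1000 to get text-space units. As its FIXME notes, the 1/1000 factor does not hold for Type3 fonts, which use a `FontMatrix`. Zero entries fall back to 500. `GetCharWidth` then turns unknown or near-zero widths into 0.5, but returns 0 when no width table exists. The standalone sketch below restates both rules. `FontWidthsSketch` is a hypothetical name, and it assumes the entries have already been converted to `double?`, with `null` standing in for a non-numeric entry. The real code reads `PdfInteger`/`PdfReal` through `PdfElementUtils.GetReal(elem, 500)`.

```csharp
using System.Collections.Generic;

public static class FontWidthsSketch
{
    // Builds the char -> width table like ParseSizes: entry i belongs to firstChar + i,
    // missing or zero entries become 500, and glyph-space units are divided by 1000.
    public static Dictionary<char, double> BuildWidths(char firstChar, IList<double?> widths)
    {
        const double glyphSpaceToTextSpace = 1000.0; // not valid for Type3 fonts (FontMatrix)
        var result = new Dictionary<char, double>();
        char actualChar = firstChar;
        foreach (double? entry in widths)
        {
            double width = entry ?? 500;
            if (width < 0.0001 && width > -0.0001) { width = 500; }
            result.Add(actualChar, width / glyphSpaceToTextSpace);
            actualChar++;
        }
        return result;
    }

    // Lookup with the same rules as GetCharWidth: no table -> 0; unknown or near-zero width -> 0.5.
    public static double GetCharWidth(Dictionary<char, double> widths, char character)
    {
        if (widths == null) { return 0; }
        double charWidth = 0;
        if (widths.ContainsKey(character)) { charWidth = widths[character]; }
        if (charWidth <= 0.0001) { charWidth = 0.5; }
        return charWidth;
    }
}
```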
@@ -177,15 +191,23 @@ namespace VAR.PdfTools
public double GetCharWidth(char character)
{
double charWidth = 0;
if (_widths == null)
{
return 0;
return charWidth;
}
if (_widths.ContainsKey(character))
{
return _widths[character];
charWidth = _widths[character];
}
return 0;
// NOTE: Convert "Zero" to default width of 0.5
if (charWidth <= 0.0001)
{
charWidth = 0.5;
}
return charWidth;
}
#endregion

@@ -0,0 +1,210 @@
using System;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;
using VAR.PdfTools.Maths;
namespace VAR.PdfTools
{
public class PdfPageRenderer
{
private PdfDocumentPage _page;
private PdfTextExtractor _pdfTextExtractor;
private Rect _pageRect;
private int _pageWidth;
private int _pageHeight;
private int _scale = 10;
private const int MaxSize = 10000;
public PdfTextExtractor Extractor { get { return _pdfTextExtractor; } }
public PdfPageRenderer(PdfDocumentPage page)
{
_page = page;
_pdfTextExtractor = new PdfTextExtractor(_page);
InitPage();
}
public PdfPageRenderer(PdfTextExtractor pdfTextExtractor)
{
_pdfTextExtractor = pdfTextExtractor;
_page = pdfTextExtractor.Page;
InitPage();
}
private void InitPage()
{
_pageRect = _pdfTextExtractor.GetRect();
_pageWidth = (int)Math.Ceiling(_pageRect.XMax - _pageRect.XMin);
_pageHeight = (int)Math.Ceiling(_pageRect.YMax - _pageRect.YMin);
while ((_pageWidth * _scale) > MaxSize) { _scale--; }
while ((_pageHeight * _scale) > MaxSize) { _scale--; }
if (_scale <= 0) { _scale = 1; }
}
public Bitmap Render()
{
if (_pdfTextExtractor.Elements.Count == 0)
{
// Nothing to render
Bitmap emptyBmp = new Bitmap(100, 200, PixelFormat.Format32bppArgb);
using (Graphics gcEmpty = Graphics.FromImage(emptyBmp))
gcEmpty.Clear(Color.White);
return emptyBmp;
}
// Prepare image
Bitmap bmp = new Bitmap(_pageWidth * _scale, _pageHeight * _scale, PixelFormat.Format32bppArgb);
Graphics gc = Graphics.FromImage(bmp);
gc.Clear(Color.White);
// Draw text elements of the page
using (Pen penTextElem = new Pen(Color.Blue))
using (Pen penCharElem = new Pen(Color.Navy))
{
foreach (PdfTextElement textElement in _pdfTextExtractor.Elements)
{
DrawTextElement(textElement, gc, penTextElem, penCharElem, _scale, _pageHeight, _pageRect.XMin, _pageRect.YMin, Brushes.Black);
}
}
gc.Dispose();
return bmp;
}
public Bitmap RenderColumn(PdfTextElementColumn columnData, Bitmap bmp = null)
{
Graphics gc;
if (bmp == null)
{
bmp = new Bitmap(_pageWidth * _scale, _pageHeight * _scale, PixelFormat.Format32bppArgb);
gc = Graphics.FromImage(bmp);
gc.Clear(Color.White);
}
else
{
gc = Graphics.FromImage(bmp);
}
// Draw text elements of the column header
using (Pen penTextElem = new Pen(Color.Green))
using (Pen penCharElem = new Pen(Color.DarkGreen))
{
DrawTextElement(columnData.HeadTextElement, gc, penTextElem, penCharElem, _scale, _pageHeight, _pageRect.XMin, _pageRect.YMin, Brushes.Olive);
}
// Draw text elements of the column
using (Pen penTextElem = new Pen(Color.Red))
using (Pen penCharElem = new Pen(Color.DarkRed))
{
foreach (PdfTextElement textElement in columnData.Elements)
{
DrawTextElement(textElement, gc, penTextElem, penCharElem, _scale, _pageHeight, _pageRect.XMin, _pageRect.YMin, Brushes.OrangeRed);
}
}
// Draw column extents
using (Pen penColumn = new Pen(Color.Red))
{
float y = (float)(_pageRect.YMax - columnData.Y);
float x1 = (float)(columnData.X1 - _pageRect.XMin);
float x2 = (float)(columnData.X2 - _pageRect.XMin);
gc.DrawLine(penColumn, x1 * _scale, y * _scale, x2 * _scale, y * _scale);
gc.DrawLine(penColumn, x1 * _scale, y * _scale, x1 * _scale, _pageHeight * _scale);
gc.DrawLine(penColumn, x2 * _scale, y * _scale, x2 * _scale, _pageHeight * _scale);
}
gc.Dispose();
return bmp;
}
private static void DrawTextElement(PdfTextElement textElement, Graphics gc, Pen penTextElem, Pen penCharElem, int scale, int pageHeight, double pageXMin, double pageYMin, Brush brushText)
{
if (textElement == null) { return; }
double textElementX = textElement.GetX() - pageXMin;
double textElementY = textElement.GetY() - pageYMin;
double textElementWidth = textElement.VisibleWidth;
double textElementHeight = textElement.VisibleHeight;
string textElementText = textElement.VisibleText;
string textElementFontName = (textElement.Font == null ? string.Empty : textElement.Font.Name);
if (textElementHeight < 0.0001) { return; }
double textElementPageX = textElementX;
double textElementPageY = pageHeight - textElementY;
if (penTextElem != null)
{
DrawRoundedRectangle(gc, penTextElem,
(int)(textElementPageX * scale),
(int)(textElementPageY * scale),
(int)(textElementWidth * scale),
(int)(textElementHeight * scale),
5);
}
// Use a float size: truncating to int can yield 0, which the Font constructor rejects
using (Font font = new Font("Arial", (float)(textElementHeight * scale), GraphicsUnit.Pixel))
{
foreach (PdfCharElement c in textElement.Characters)
{
gc.DrawString(c.Char,
font,
brushText,
(int)((textElementPageX + c.Displacement) * scale),
(int)(textElementPageY * scale));
if (penCharElem != null)
{
DrawRoundedRectangle(gc, penCharElem,
(int)((textElementPageX + c.Displacement) * scale),
(int)(textElementPageY * scale),
(int)(c.Width * scale),
(int)(textElementHeight * scale),
5);
}
}
}
}
public static GraphicsPath RoundedRect(int x, int y, int width, int height, int radius)
{
int diameter = radius * 2;
Rectangle arc = new Rectangle(x, y, diameter, diameter);
GraphicsPath path = new GraphicsPath();
// top left arc
path.AddArc(arc, 180, 90);
// top right arc
arc.X = (x + width) - diameter;
path.AddArc(arc, 270, 90);
// bottom right arc
arc.Y = (y + height) - diameter;
path.AddArc(arc, 0, 90);
// bottom left arc
arc.X = x;
path.AddArc(arc, 90, 90);
path.CloseFigure();
return path;
}
public static void DrawRoundedRectangle(Graphics graphics, Pen pen, int x, int y, int width, int height, int cornerRadius)
{
if (graphics == null)
throw new ArgumentNullException("graphics");
if (pen == null)
throw new ArgumentNullException("pen");
using (GraphicsPath path = RoundedRect(x, y, width, height, cornerRadius))
{
graphics.DrawPath(pen, path);
}
}
}
}
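
As a reading aid (derived from the code above, not from separate documentation), the renderer maps PDF user-space coordinates (origin at the bottom left, y pointing up) to bitmap pixels (origin at the top left, y pointing down). In DrawTextElement, let s be the scale, x_min and y_min the page rectangle minima, H the page height, and δ_j the displacement of the j-th character. A text element at (x, y) and its characters then land at the pixel positions below. Here trunc is the truncation performed by the (int) casts. The last equality assumes `_pageHeight` equals `_pageRect.YMax - _pageRect.YMin`; that computation is not shown in this diff.

```latex
\begin{aligned}
p_x &= \operatorname{trunc}\big(s\,(x - x_{\min})\big), \qquad
p_{x,j} = \operatorname{trunc}\big(s\,(x - x_{\min} + \delta_j)\big), \\
p_y &= \operatorname{trunc}\big(s\,(H - (y - y_{\min}))\big)
     = \operatorname{trunc}\big(s\,(y_{\max} - y)\big)
\end{aligned}
```

The column extents use the same vertical flip without truncation. The header line is drawn at height s(y_max - Y), and the two vertical lines at s(X1 - x_min) and s(X2 - x_min).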

@@ -4,6 +4,7 @@ using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;
using VAR.PdfTools.PdfElements;
namespace VAR.PdfTools
{

@@ -0,0 +1,149 @@
using System.Collections.Generic;
using System.Linq;
using VAR.PdfTools.Maths;
namespace VAR.PdfTools
{
public struct PdfCharElement
{
public string Char;
public double Displacement;
public double Width;
}
public class PdfTextElement
{
#region Properties
public PdfFont Font { get; set; }
public double FontSize { get; set; }
public Matrix3x3 Matrix { get; set; }
public string RawText { get; set; }
public string VisibleText { get; set; }
public double VisibleWidth { get; set; }
public double VisibleHeight { get; set; }
public List<PdfCharElement> Characters { get; set; }
public List<PdfTextElement> Childs { get; set; }
#endregion
#region Public methods
public double GetX()
{
return Matrix.Matrix[0, 2];
}
public double GetY()
{
return Matrix.Matrix[1, 2];
}
public PdfTextElement SubPart(int startIndex, int endIndex)
{
PdfTextElement blockElem = new PdfTextElement
{
Font = null,
FontSize = FontSize,
Matrix = Matrix.Copy(),
RawText = RawText.Substring(startIndex, endIndex - startIndex),
VisibleText = VisibleText.Substring(startIndex, endIndex - startIndex),
VisibleWidth = 0,
VisibleHeight = VisibleHeight,
Characters = new List<PdfCharElement>(),
Childs = new List<PdfTextElement>(),
};
double displacement = Characters[startIndex].Displacement;
blockElem.Matrix.Matrix[0, 2] += displacement;
for (int j = startIndex; j < endIndex; j++)
{
blockElem.Characters.Add(new PdfCharElement
{
Char = Characters[j].Char,
Displacement = Characters[j].Displacement - displacement,
Width = Characters[j].Width,
});
}
PdfCharElement lastChar = blockElem.Characters[blockElem.Characters.Count - 1];
blockElem.VisibleWidth = lastChar.Displacement + lastChar.Width;
foreach (PdfTextElement elem in Childs)
{
blockElem.Childs.Add(elem);
}
return blockElem;
}
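// Despite its name, this returns the average character width; JoinTextElements and SplitTextElements use it as a gap tolerance
public double MaxWidth()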
{
return Characters.Average(c => c.Width);
}
public Rect GetRect()
{
double x = GetX();
double y = GetY();
return new Rect
{
XMin = x,
YMax = y,
XMax = x + VisibleWidth,
YMin = y - VisibleHeight,
};
}
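// Gap between the end of the previous character and the start of character `index` (0 for the first character)
public double GetCharacterPreviousSpacing(int index)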
{
if (index <= 0) { return 0; }
double previousEnd = Characters[index - 1].Displacement + Characters[index - 1].Width;
double spacing = Characters[index].Displacement - previousEnd;
return spacing;
}
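// Gap between the end of character `index` and the start of the next character (0 for the last one), i.e. the spacing that follows it
public double GetCharacterPrecedingSpacing(int index)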
{
if (index >= (Characters.Count - 1)) { return 0; }
double currentEnd = Characters[index].Displacement + Characters[index].Width;
double spacing = Characters[index + 1].Displacement - currentEnd;
return spacing;
}
#endregion
}
public class PdfTextElementColumn
{
public PdfTextElement HeadTextElement { get; private set; }
public IEnumerable<PdfTextElement> Elements { get; private set; }
public double Y { get; private set; }
public double X1 { get; private set; }
public double X2 { get; private set; }
public static PdfTextElementColumn Empty { get; } = new PdfTextElementColumn();
private PdfTextElementColumn()
{
Elements = new List<PdfTextElement>();
}
public PdfTextElementColumn(PdfTextElement head, IEnumerable<PdfTextElement> elements, double y, double x1, double x2)
{
HeadTextElement = head;
Elements = elements;
Y = y;
X1 = x1;
X2 = x2;
}
}
}
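
As a reading aid for PdfTextElement (derived from this diff, not from separate documentation), the Matrix3x3.Set(a, b, c, d, e, f) layout shown in the PdfTextExtractor.cs hunk places the translation in the third column. GetX and GetY therefore return e and f. GetRect treats that point as the top-left corner of a box with width W = VisibleWidth and height H = VisibleHeight. SubPart(s, t) moves the origin to the first kept character and rebases the kept displacements δ_j and widths w_j as follows:

```latex
\begin{aligned}
M &= \begin{pmatrix} a & c & e \\ b & d & f \\ 0 & 0 & 1 \end{pmatrix},
\qquad x = M_{0,2} = e, \qquad y = M_{1,2} = f, \\
\mathrm{GetRect} &= [\,x,\; x + W\,] \times [\,y - H,\; y\,], \\
e' &= e + \delta_s, \qquad
\delta'_j = \delta_j - \delta_s \quad (s \le j < t), \qquad
W' = \delta'_{t-1} + w_{t-1}
\end{aligned}
```

W' is read from the last kept character, so SubPart requires t > s. With an empty range, the index into the new character list goes out of range.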

@@ -2,195 +2,11 @@
using System.Collections.Generic;
using System.Linq;
using System.Text;
using VAR.PdfTools.Maths;
using VAR.PdfTools.PdfElements;
namespace VAR.PdfTools
{
public class Vector3D
{
#region Declarations
public double[] _vector = new double[3];
#endregion
#region Properties
public double[] Vector { get { return _vector; } }
#endregion
#region Creator
public Vector3D()
{
Init();
}
public void Init()
{
_vector[0] = 0.0;
_vector[1] = 0.0;
_vector[2] = 1.0;
}
#endregion
}
public class Matrix3x3
{
#region Declarations
public double[,] _matrix = new double[3, 3];
#endregion
#region Properties
public double[,] Matrix { get { return _matrix; } }
#endregion
#region Creator
public Matrix3x3()
{
Idenity();
}
public Matrix3x3(double a, double b, double c, double d, double e, double f)
{
Set(a, b, c, d, e, f);
}
#endregion
#region Public methods
public void Idenity()
{
_matrix[0, 0] = 1.0;
_matrix[0, 1] = 0.0;
_matrix[0, 2] = 0.0;
_matrix[1, 0] = 0.0;
_matrix[1, 1] = 1.0;
_matrix[1, 2] = 0.0;
_matrix[2, 0] = 0.0;
_matrix[2, 1] = 0.0;
_matrix[2, 2] = 1.0;
}
public void Set(double a, double b, double c, double d, double e, double f)
{
_matrix[0, 0] = a;
_matrix[1, 0] = b;
_matrix[2, 0] = 0;
_matrix[0, 1] = c;
_matrix[1, 1] = d;
_matrix[2, 1] = 0;
_matrix[0, 2] = e;
_matrix[1, 2] = f;
_matrix[2, 2] = 1;
}
public Vector3D Multiply(Vector3D vect)
{
Vector3D vectResult = new Vector3D();
vectResult.Vector[0] = (vect.Vector[0] * _matrix[0, 0]) + (vect.Vector[1] * _matrix[0, 1]) + (vect.Vector[2] * _matrix[0, 2]);
vectResult.Vector[1] = (vect.Vector[0] * _matrix[1, 0]) + (vect.Vector[1] * _matrix[1, 1]) + (vect.Vector[2] * _matrix[1, 2]);
vectResult.Vector[2] = (vect.Vector[0] * _matrix[2, 0]) + (vect.Vector[1] * _matrix[2, 1]) + (vect.Vector[2] * _matrix[2, 2]);
return vectResult;
}
public Matrix3x3 Multiply(Matrix3x3 matrix)
{
Matrix3x3 newMatrix = new Matrix3x3();
newMatrix._matrix[0, 0] = (_matrix[0, 0] * matrix._matrix[0, 0]) + (_matrix[1, 0] * matrix._matrix[0, 1]) + (_matrix[2, 0] * matrix._matrix[0, 2]);
newMatrix._matrix[0, 1] = (_matrix[0, 1] * matrix._matrix[0, 0]) + (_matrix[1, 1] * matrix._matrix[0, 1]) + (_matrix[2, 1] * matrix._matrix[0, 2]);
newMatrix._matrix[0, 2] = (_matrix[0, 2] * matrix._matrix[0, 0]) + (_matrix[1, 2] * matrix._matrix[0, 1]) + (_matrix[2, 2] * matrix._matrix[0, 2]);
newMatrix._matrix[1, 0] = (_matrix[0, 0] * matrix._matrix[1, 0]) + (_matrix[1, 0] * matrix._matrix[1, 1]) + (_matrix[2, 0] * matrix._matrix[1, 2]);
newMatrix._matrix[1, 1] = (_matrix[0, 1] * matrix._matrix[1, 0]) + (_matrix[1, 1] * matrix._matrix[1, 1]) + (_matrix[2, 1] * matrix._matrix[1, 2]);
newMatrix._matrix[1, 2] = (_matrix[0, 2] * matrix._matrix[1, 0]) + (_matrix[1, 2] * matrix._matrix[1, 1]) + (_matrix[2, 2] * matrix._matrix[1, 2]);
newMatrix._matrix[2, 0] = (_matrix[0, 0] * matrix._matrix[2, 0]) + (_matrix[1, 0] * matrix._matrix[2, 1]) + (_matrix[2, 0] * matrix._matrix[2, 2]);
newMatrix._matrix[2, 1] = (_matrix[0, 1] * matrix._matrix[2, 0]) + (_matrix[1, 1] * matrix._matrix[2, 1]) + (_matrix[2, 1] * matrix._matrix[2, 2]);
newMatrix._matrix[2, 2] = (_matrix[0, 2] * matrix._matrix[2, 0]) + (_matrix[1, 2] * matrix._matrix[2, 1]) + (_matrix[2, 2] * matrix._matrix[2, 2]);
return newMatrix;
}
public Matrix3x3 Copy()
{
Matrix3x3 newMatrix = new Matrix3x3();
newMatrix._matrix[0, 0] = _matrix[0, 0];
newMatrix._matrix[0, 1] = _matrix[0, 1];
newMatrix._matrix[0, 2] = _matrix[0, 2];
newMatrix._matrix[1, 0] = _matrix[1, 0];
newMatrix._matrix[1, 1] = _matrix[1, 1];
newMatrix._matrix[1, 2] = _matrix[1, 2];
newMatrix._matrix[2, 0] = _matrix[2, 0];
newMatrix._matrix[2, 1] = _matrix[2, 1];
newMatrix._matrix[2, 2] = _matrix[2, 2];
return newMatrix;
}
public bool IsCollinear(Matrix3x3 otherMatrix, double horizontalDelta = 0.00001, double verticalDelta = 0.00001)
{
double epsilon = 0.00001;
return (
Math.Abs(_matrix[0, 0] - otherMatrix.Matrix[0, 0]) <= epsilon &&
Math.Abs(_matrix[1, 0] - otherMatrix.Matrix[1, 0]) <= epsilon &&
Math.Abs(_matrix[0, 1] - otherMatrix.Matrix[0, 1]) <= epsilon &&
Math.Abs(_matrix[1, 1] - otherMatrix.Matrix[1, 1]) <= epsilon &&
Math.Abs(_matrix[0, 2] - otherMatrix.Matrix[0, 2]) <= horizontalDelta &&
Math.Abs(_matrix[1, 2] - otherMatrix.Matrix[1, 2]) <= verticalDelta &&
true);
}
#endregion
}
public class PdfTextElement
{
#region Properties
public PdfFont Font { get; set; }
public double FontSize { get; set; }
public Matrix3x3 Matrix { get; set; }
public string RawText { get; set; }
public string VisibleText { get; set; }
public double VisibleWidth { get; set; }
public double VisibleHeight { get; set; }
private List<PdfTextElement> _childs = new List<PdfTextElement>();
public List<PdfTextElement> Childs { get { return _childs; } }
#endregion
#region Public methods
public double GetX()
{
return Matrix.Matrix[0, 2];
}
public double GetY()
{
return Matrix.Matrix[1, 2];
}
#endregion
}
public class PdfTextExtractor
{
#region Declarations
@@ -206,15 +22,17 @@ namespace VAR.PdfTools
// Text state
private PdfFont _font = null;
private double _fontSize = 1;
private double _charSpacing = 0;
private double _wordSpacing = 0;
private double _textLeading = 0;
// Text object state
private bool inText = false;
private Matrix3x3 _textMatrix = new Matrix3x3();
private Matrix3x3 _textMatrixCurrent = new Matrix3x3();
private StringBuilder _sbText = new StringBuilder();
private double _textWidth = 0;
PdfTextElement _currentTextElement = null;
private List<PdfCharElement> _listCharacters = new List<PdfCharElement>();
#endregion
@@ -231,7 +49,9 @@ namespace VAR.PdfTools
public PdfTextExtractor(PdfDocumentPage page)
{
_page = page;
ProcessPage();
ProcessPageContent();
JoinTextElements();
SplitTextElements();
}
#endregion
@@ -258,42 +78,26 @@ namespace VAR.PdfTools
PdfTextElement textElem = new PdfTextElement();
textElem.Font = _font;
textElem.FontSize = _fontSize;
textElem.Matrix = _textMatrix.Multiply(_graphicsMatrix);
textElem.Matrix = _textMatrixCurrent.Multiply(_graphicsMatrix);
textElem.RawText = _sbText.ToString();
textElem.VisibleText = PdfString_ToUnicode(textElem.RawText, _font);
textElem.VisibleWidth = _textWidth * textElem.Matrix.Matrix[0, 0];
PdfCharElement lastChar = _listCharacters[_listCharacters.Count - 1];
textElem.VisibleWidth = (lastChar.Displacement + lastChar.Width) * textElem.Matrix.Matrix[0, 0];
textElem.VisibleHeight = (_font.Height * _fontSize) * textElem.Matrix.Matrix[1, 1];
textElem.Characters = new List<PdfCharElement>();
foreach (PdfCharElement c in _listCharacters)
{
textElem.Characters.Add(new PdfCharElement
{
Char = c.Char,
Displacement = (c.Displacement * textElem.Matrix.Matrix[0, 0]),
Width = (c.Width * textElem.Matrix.Matrix[0, 0]),
});
}
textElem.Childs = new List<PdfTextElement>();
return textElem;
}
private void FlushTextElementSoft()
{
if (_sbText.Length == 0)
{
return;
}
PdfTextElement textElem = BuildTextElement();
if (_currentTextElement == null)
{
_currentTextElement = new PdfTextElement();
_currentTextElement.Font = null;
_currentTextElement.FontSize = -1;
_currentTextElement.Matrix = textElem.Matrix.Copy();
_currentTextElement.RawText = string.Empty;
_currentTextElement.VisibleText = string.Empty;
_currentTextElement.VisibleWidth = 0;
_currentTextElement.VisibleHeight = 0;
}
_currentTextElement.VisibleText += textElem.VisibleText;
_currentTextElement.VisibleWidth += textElem.VisibleWidth;
_currentTextElement.VisibleHeight = System.Math.Max(_currentTextElement.VisibleHeight, textElem.VisibleHeight);
_currentTextElement.Childs.Add(textElem);
_sbText = new StringBuilder();
_textWidth = 0;
}
private void AddTextElement(PdfTextElement textElement)
{
if (string.IsNullOrEmpty(textElement.VisibleText.Trim()))
@@ -307,27 +111,16 @@ namespace VAR.PdfTools
{
if (_sbText.Length == 0)
{
if (_currentTextElement != null)
{
AddTextElement(_currentTextElement);
_currentTextElement = null;
}
_textWidth = 0;
return;
}
PdfTextElement textElem = BuildTextElement();
AddTextElement(textElem);
if (_currentTextElement != null)
{
FlushTextElementSoft();
AddTextElement(_currentTextElement);
_currentTextElement = null;
}
else
{
PdfTextElement textElem = BuildTextElement();
AddTextElement(textElem);
}
_textMatrixCurrent.Matrix[0, 2] += _textWidth;
_sbText = new StringBuilder();
_listCharacters.Clear();
_textWidth = 0;
}
@@ -362,6 +155,29 @@ namespace VAR.PdfTools
return null;
}
private List<PdfTextElement> FindElementsContainingText(string text, bool fuzzy)
{
List<PdfTextElement> list = new List<PdfTextElement>();
string matchingText = fuzzy ? SimplifyText(text) : text;
foreach (PdfTextElement elem in _textElements)
{
string elemText = fuzzy ? SimplifyText(elem.VisibleText) : elem.VisibleText;
if (elemText.Contains(matchingText))
{
list.Add(elem);
}
}
return list;
}
private bool TextElementVerticalIntersection(PdfTextElement elem1, double elem2X1, double elem2X2)
{
double elem1X1 = elem1.GetX();
double elem1X2 = elem1.GetX() + elem1.VisibleWidth;
return elem1X2 >= elem2X1 && elem2X2 >= elem1X1;
}
private bool TextElementVerticalIntersection(PdfTextElement elem1, PdfTextElement elem2)
{
double elem1X1 = elem1.GetX();
@@ -391,36 +207,47 @@ namespace VAR.PdfTools
_graphicsMatrixStack.Add(_graphicsMatrix.Copy());
}
private void OpSetGraphMatrix(double a, double b, double c, double d, double e, double f)
{
_graphicsMatrix.Set(a, b, c, d, e, f);
}
private void OpPopGraphState()
{
_graphicsMatrix = _graphicsMatrixStack[_graphicsMatrixStack.Count - 1];
_graphicsMatrixStack.RemoveAt(_graphicsMatrixStack.Count - 1);
}
private void OpSetGraphMatrix(double a, double b, double c, double d, double e, double f)
{
_graphicsMatrix.Set(a, b, c, d, e, f);
}
private void OpBeginText()
{
_textMatrix.Idenity();
_textMatrixCurrent.Idenity();
inText = true;
}
private void OpEndText()
{
FlushTextElementSoft();
FlushTextElement();
inText = false;
}
private void OpTextFont(string fontName, double size)
{
FlushTextElementSoft();
FlushTextElement();
_font = _page.Fonts[fontName];
_fontSize = size;
}
private void OpTextCharSpacing(double charSpacing)
{
_charSpacing = charSpacing;
}
private void OpTextWordSpacing(double wordSpacing)
{
_wordSpacing = wordSpacing;
}
private void OpTextLeading(double textLeading)
{
_textLeading = textLeading;
@@ -433,6 +260,7 @@ namespace VAR.PdfTools
newMatrix.Matrix[0, 2] = x;
newMatrix.Matrix[1, 2] = y;
_textMatrix = newMatrix.Multiply(_textMatrix);
_textMatrixCurrent = _textMatrix.Copy();
}
private void OpTextLineFeed()
@@ -442,35 +270,10 @@ namespace VAR.PdfTools
private void OpSetTextMatrix(double a, double b, double c, double d, double e, double f)
{
double halfSpaceWidth = 0;
double horizontalDelta = 0;
Matrix3x3 newMatrix = new Matrix3x3(a, b, c, d, e, f);
if (_font != null)
{
halfSpaceWidth = _font.GetCharWidth(' ') * _fontSize;
}
horizontalDelta = (_textWidth + halfSpaceWidth);
if (_textMatrix.IsCollinear(newMatrix, horizontalDelta: horizontalDelta))
{
return;
}
if (_currentTextElement != null)
{
if (_currentTextElement.Font != null)
{
halfSpaceWidth = _currentTextElement.Font.GetCharWidth(' ') * _currentTextElement.FontSize;
}
horizontalDelta = (_currentTextElement.VisibleWidth + halfSpaceWidth);
if (_currentTextElement.Matrix.IsCollinear(newMatrix, horizontalDelta: horizontalDelta))
{
FlushTextElementSoft();
_textMatrix = newMatrix;
return;
}
}
FlushTextElement();
_textMatrix = newMatrix;
_textMatrixCurrent = _textMatrix.Copy();
}
private void OpTextPut(string text)
@@ -481,7 +284,12 @@ namespace VAR.PdfTools
{
foreach (char c in text)
{
_textWidth += _font.GetCharWidth(c) * _fontSize;
string realChar = _font.ToUnicode(c);
if (realChar == "\0") { continue; }
double charWidth = _font.GetCharWidth(c) * _fontSize;
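_listCharacters.Add(new PdfCharElement { Char = realChar, Displacement = _textWidth, Width = charWidth });
_textWidth += charWidth;
// Character spacing applies to every glyph; word spacing is added on top for the space character (code 32)
_textWidth += _charSpacing + ((c == 0x20) ? _wordSpacing : 0);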
}
}
}
@@ -491,17 +299,16 @@ namespace VAR.PdfTools
if (inText == false) { return; }
foreach (IPdfElement elem in array.Values)
{
if(elem is PdfString)
if (elem is PdfString)
{
OpTextPut(((PdfString)elem).Value);
}
else if(elem is PdfInteger || elem is PdfReal)
else if (elem is PdfInteger || elem is PdfReal)
{
double spacing = PdfElementUtils.GetReal(elem, 0);
// FIXME: Apply correctly spacing
//_textWidth += spacing;
_textWidth -= (spacing / 1000) * _fontSize;
}
else if(elem is PdfArray)
else if (elem is PdfArray)
{
OpTextPutMultiple(((PdfArray)elem));
}
@@ -512,13 +319,17 @@ namespace VAR.PdfTools
#region Private methods
private void ProcessPage()
private void ProcessPageContent()
{
int unknowCount = 0;
int lineCount = 0;
int strokeCount = 0;
int pathCount = 0;
for (int i = 0; i < _page.ContentActions.Count; i++)
{
PdfContentAction action = _page.ContentActions[i];
// Graphics Operations
// Special graphics state
if (action.Token == "q")
{
OpPushGraphState();
@@ -549,11 +360,13 @@ namespace VAR.PdfTools
}
else if (action.Token == "Tc")
{
// FIXME: Char spacing
double charSpacing = PdfElementUtils.GetReal(action.Parameters[0], 0);
OpTextCharSpacing(charSpacing);
}
else if (action.Token == "Tw")
{
// FIXME: Word spacing
double wordSpacing = PdfElementUtils.GetReal(action.Parameters[0], 0);
OpTextWordSpacing(wordSpacing);
}
else if (action.Token == "Tz")
{
@@ -561,7 +374,7 @@ namespace VAR.PdfTools
}
else if (action.Token == "Tf")
{
string fontName = ((PdfName)action.Parameters[0]).Value;
string fontName = PdfElementUtils.GetString(action.Parameters[0], string.Empty);
double fontSize = PdfElementUtils.GetReal(action.Parameters[1], 0);
OpTextFont(fontName, fontSize);
}
@@ -607,18 +420,23 @@ namespace VAR.PdfTools
}
else if (action.Token == "Tj")
{
OpTextPut(((PdfString)action.Parameters[0]).Value);
string text = PdfElementUtils.GetString(action.Parameters[0], string.Empty);
OpTextPut(text);
}
else if (action.Token == "'")
{
string text = PdfElementUtils.GetString(action.Parameters[0], string.Empty);
OpTextLineFeed();
OpTextPut(((PdfString)action.Parameters[0]).Value);
OpTextPut(text);
}
else if (action.Token == "\"")
{
double wordSpacing = PdfElementUtils.GetReal(action.Parameters[0], 0);
double charSpacing = PdfElementUtils.GetReal(action.Parameters[1], 0);
OpTextPut(((PdfString)action.Parameters[2]).Value);
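string text = PdfElementUtils.GetString(action.Parameters[2], string.Empty);
OpTextCharSpacing(charSpacing);
OpTextWordSpacing(wordSpacing);
// aw ac string " : set the spacings, move to the next line (as ' does), then show the string
OpTextLineFeed();
OpTextPut(text);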
}
else if (action.Token == "TJ")
{
@@ -664,6 +482,45 @@ namespace VAR.PdfTools
{
// FIXME: Interpret this
}
else if (action.Token == "m")
{
// FIXME: Interpret this "moveto: Begin new subpath"
}
else if (action.Token == "l")
{
// FIXME: Interpret this "lineto: Append straight line segment to path"
lineCount++;
}
else if (action.Token == "h")
{
// FIXME: Interpret this "closepath: Close subpath"
pathCount++;
}
else if (action.Token == "W")
{
// FIXME: Interpret this "clip: Set clipping path using nonzero winding number rule"
}
else if (action.Token == "W*")
{
// FIXME: Interpret this "eoclip: Set clipping path using even-odd rule"
}
else if (action.Token == "w")
{
// FIXME: Interpret this "setlinewidth: Set line width"
}
else if (action.Token == "G")
{
// FIXME: Interpret this "setgray: Set gray level for stroking operations"
}
else if (action.Token == "S")
{
// FIXME: Interpret this "stroke: Stroke path"
strokeCount++;
}
else if (action.Token == "M")
{
// FIXME: Interpret this "setmiterlimit: Set miter limit"
}
else
{
unknowCount++;
@@ -672,23 +529,157 @@ namespace VAR.PdfTools
FlushTextElement();
}
private void JoinTextElements()
{
var textElementsCondensed = new List<PdfTextElement>();
while (_textElements.Count > 0)
{
PdfTextElement elem = _textElements[0];
_textElements.Remove(elem);
double blockY = elem.GetY();
double blockXMin = elem.GetX();
double blockXMax = blockXMin + elem.VisibleWidth;
// Prepare first neighbour
var textElementNeighbours = new List<PdfTextElement>();
textElementNeighbours.Add(elem);
// Search Neighbours
int i = 0;
while (i < _textElements.Count)
{
PdfTextElement neighbour = _textElements[i];
if (neighbour.Font != elem.Font || neighbour.FontSize != elem.FontSize)
{
i++;
continue;
}
double neighbourY = neighbour.GetY();
if (Math.Abs(neighbourY - blockY) > 0.001) { i++; continue; }
double maxWidth = neighbour.MaxWidth();
double neighbourXMin = neighbour.GetX();
double neighbourXMax = neighbourXMin + neighbour.VisibleWidth;
double auxBlockXMin = blockXMin - maxWidth;
double auxBlockXMax = blockXMax + maxWidth;
if (auxBlockXMax >= neighbourXMin && neighbourXMax >= auxBlockXMin)
{
_textElements.Remove(neighbour);
textElementNeighbours.Add(neighbour);
if (blockXMax < neighbourXMax) { blockXMax = neighbourXMax; }
if (blockXMin > neighbourXMin) { blockXMin = neighbourXMin; }
i = 0;
continue;
}
i++;
}
if (textElementNeighbours.Count == 1)
{
textElementsCondensed.Add(elem);
continue;
}
// Join neighbours
var chars = new List<PdfCharElement>();
foreach (PdfTextElement neighbour in textElementNeighbours)
{
double neighbourXMin = neighbour.GetX();
foreach (PdfCharElement c in neighbour.Characters)
{
chars.Add(new PdfCharElement
{
Char = c.Char,
Displacement = (c.Displacement + neighbourXMin) - blockXMin,
Width = c.Width,
});
}
}
chars = chars.OrderBy(c => c.Displacement).ToList();
var sbText = new StringBuilder();
foreach (PdfCharElement c in chars)
{
sbText.Append(c.Char);
}
PdfTextElement blockElem = new PdfTextElement
{
Font = null,
FontSize = elem.FontSize,
Matrix = elem.Matrix.Copy(),
RawText = sbText.ToString(),
VisibleText = sbText.ToString(),
VisibleWidth = blockXMax - blockXMin,
VisibleHeight = elem.VisibleHeight,
Characters = chars,
Childs = textElementNeighbours,
};
blockElem.Matrix.Matrix[0, 2] = blockXMin;
textElementsCondensed.Add(blockElem);
}
_textElements = textElementsCondensed;
}
private void SplitTextElements()
{
var textElementsSplitted = new List<PdfTextElement>();
while (_textElements.Count > 0)
{
PdfTextElement elem = _textElements[0];
_textElements.Remove(elem);
double maxWidth = elem.MaxWidth();
int prevBreak = 0;
for (int i = 1; i < elem.Characters.Count; i++)
{
double prevCharEnd = elem.Characters[i - 1].Displacement + elem.Characters[i - 1].Width;
double charSeparation = elem.Characters[i].Displacement - prevCharEnd;
if (charSeparation > maxWidth)
{
PdfTextElement partElem = elem.SubPart(prevBreak, i);
textElementsSplitted.Add(partElem);
prevBreak = i;
}
}
if (prevBreak == 0)
{
textElementsSplitted.Add(elem);
continue;
}
PdfTextElement lastElem = elem.SubPart(prevBreak, elem.Characters.Count);
textElementsSplitted.Add(lastElem);
}
_textElements = textElementsSplitted;
}
#endregion
#region Public methods
public List<string> GetColumn(string column)
public Rect GetRect()
{
return GetColumn(column, true);
Rect rect = null;
foreach (PdfTextElement textElement in _textElements)
{
Rect elementRect = textElement.GetRect();
if (rect == null) { rect = elementRect; }
rect.Add(elementRect);
}
return rect;
}
public List<string> GetColumn(string column, bool fuzzy)
public PdfTextElementColumn GetColumn(string column, bool fuzzy = true)
{
PdfTextElement columnHead = FindElementByText(column, fuzzy);
if(columnHead == null)
if (columnHead == null)
{
return new List<string>();
return PdfTextElementColumn.Empty;
}
double headY = columnHead.GetY();
double headY = columnHead.GetY() - columnHead.VisibleHeight;
double headX1 = columnHead.GetX();
double headX2 = headX1 + columnHead.VisibleWidth;
@@ -697,7 +688,7 @@ namespace VAR.PdfTools
double extentX2 = double.MaxValue;
foreach (PdfTextElement elem in _textElements)
{
if(elem == columnHead){continue;}
if (elem == columnHead) { continue; }
if (TextElementHorizontalIntersection(columnHead, elem) == false) { continue; }
double elemX1 = elem.GetX();
double elemX2 = elemX1 + elem.VisibleWidth;
@@ -716,14 +707,20 @@ namespace VAR.PdfTools
extentX2 = elemX1;
}
}
}
PdfTextElementColumn columnData = GetColumn(columnHead, headY, headX1, headX2, extentX1, extentX2);
return columnData;
}
public PdfTextElementColumn GetColumn(PdfTextElement columnHead, double headY, double headX1, double headX2, double extentX1, double extentX2)
{
// Get the elements that overlap the column head horizontally and lie below it, then sort them top to bottom
var columnDataRaw = new List<PdfTextElement>();
foreach (PdfTextElement elem in _textElements)
{
if (TextElementVerticalIntersection(columnHead, elem) == false) { continue; }
if (TextElementVerticalIntersection(elem, headX1, headX2) == false) { continue; }
// Only items below the column head
double elemY = elem.GetY();
@@ -733,32 +730,94 @@ namespace VAR.PdfTools
}
columnDataRaw = columnDataRaw.OrderByDescending(elem => elem.GetY()).ToList();
// Only items completely inside extents, and break on the first element outside
var columnData = new List<PdfTextElement>();
// Keep items completely inside the extents, try splitting big elements, and break on big elements that can't be split
var columnElements = new List<PdfTextElement>();
foreach (PdfTextElement elem in columnDataRaw)
{
double elemX1 = elem.GetX();
double elemX2 = elemX1 + elem.VisibleWidth;
if (elemX1 < extentX1 || elemX2 > extentX2) { break; }
columnData.Add(elem);
// Add elements completely inside
if (elemX1 > extentX1 && elemX2 < extentX2)
{
columnElements.Add(elem);
continue;
}
// Try to split elements intersecting extents of the column
double maxSpacing = elem.Characters.Average(c => c.Width) / 10;
int indexStart = 0;
int indexEnd = elem.Characters.Count - 1;
bool indexStartValid = true;
bool indexEndValid = true;
if (elemX1 < extentX1)
{
// Search best start
int index = 0;
double characterPosition = elemX1 + elem.Characters[index].Displacement;
while (characterPosition < extentX1 && index < (elem.Characters.Count - 1))
{
index++;
characterPosition = elemX1 + elem.Characters[index].Displacement;
}
double spacing = elem.GetCharacterPreviousSpacing(index);
while (spacing < maxSpacing && index < (elem.Characters.Count - 1))
{
index++;
spacing = elem.GetCharacterPreviousSpacing(index);
}
if (spacing < maxSpacing) { indexStartValid = false; }
indexStart = index;
}
if (elemX2 > extentX2)
{
// Search best end
int index = elem.Characters.Count - 1;
double characterPosition = elemX1 + elem.Characters[index].Displacement + elem.Characters[index].Width;
while (characterPosition > extentX2 && index > 0)
{
index--;
characterPosition = elemX1 + elem.Characters[index].Displacement + elem.Characters[index].Width;
}
double spacing = elem.GetCharacterPrecedingSpacing(index);
while (spacing < maxSpacing && index > 0)
{
index--;
spacing = elem.GetCharacterPrecedingSpacing(index);
}
if (spacing < maxSpacing) { indexEndValid = false; }
indexEnd = index;
}
// Break when neither side has a good split point (the element spans the whole extent)
if (indexStartValid == false && indexEndValid == false) { break; }
// Skip when only one side is invalid (an outside element intersecting the column extents)
if (indexStartValid == false || indexEndValid == false) { continue; }
// Add the split element, skipping it if the start and end searches crossed over
if (indexEnd < indexStart) { continue; }
columnElements.Add(elem.SubPart(indexStart, indexEnd + 1));
}
var columnData = new PdfTextElementColumn(columnHead, columnElements, headY, extentX1, extentX2);
return columnData;
}
public List<string> GetColumnAsStrings(string column, bool fuzzy = true)
{
PdfTextElementColumn columnData = GetColumn(column, fuzzy);
// Emit result
var result = new List<string>();
foreach (PdfTextElement elem in columnData)
foreach (PdfTextElement elem in columnData.Elements)
{
result.Add(elem.VisibleText);
}
return result;
}
public string GetField(string field)
{
return GetField(field, true);
}
public string GetField(string field, bool fuzzy)
public string GetFieldAsString(string field, bool fuzzy = true)
{
PdfTextElement fieldTitle = FindElementByText(field, fuzzy);
if (fieldTitle == null)
@@ -778,7 +837,7 @@ namespace VAR.PdfTools
fieldData.Add(elem);
}
if(fieldData.Count == 0)
if (fieldData.Count == 0)
{
return null;
}
@@ -786,19 +845,10 @@ namespace VAR.PdfTools
return fieldData.OrderBy(elem => elem.GetX()).FirstOrDefault().VisibleText;
}
public bool HasText(string text)
public bool HasText(string text, bool fuzzy = true)
{
return HasText(text, true);
}
public bool HasText(string text, bool fuzzy)
{
PdfTextElement fieldTitle = FindElementByText(text, fuzzy);
if (fieldTitle == null)
{
return false;
}
return true;
List<PdfTextElement> list = FindElementsContainingText(text, fuzzy);
return (list.Count > 0);
}
#endregion
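
The glyph advance accumulated in `_textWidth` can be compared with the horizontal displacement rule in the PDF specification. After each glyph, the text position moves by t_x. The terms are:

- w_0: the glyph's horizontal displacement in unscaled text-space units.
- T_J: the adjustment number from a TJ array (0 for a plain glyph).
- T_fs: the font size (`_fontSize`).
- T_c: the character spacing (`_charSpacing`).
- T_w: the word spacing (`_wordSpacing`), which applies only to the single-byte character code 32.
- T_h: the horizontal scaling set by Tz.

A bare number inside a TJ array moves the position by the T_J term alone, with no glyph, T_c or T_w. The sketch assumes `PdfFont.GetCharWidth` returns w_0, because the code multiplies it directly by `_fontSize`.

```latex
t_x = \left( \left( w_0 - \frac{T_J}{1000} \right) T_{fs} + T_c + T_w \right) T_h,
\qquad
t_x^{\mathrm{TJ\ number}} = -\frac{T_J}{1000}\, T_{fs}\, T_h
```

With T_h = 1, the second expression is exactly the `_textWidth -= (spacing / 1000) * _fontSize` update applied to numbers in a TJ array.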

@@ -6,9 +6,9 @@ using System.Runtime.InteropServices;
[assembly: AssemblyConfiguration("")]
[assembly: AssemblyCompany("VAR")]
[assembly: AssemblyProduct("VAR.PdfTools")]
[assembly: AssemblyCopyright("Copyright © VAR 2016-2017")]
[assembly: AssemblyCopyright("Copyright © VAR 2016-2019")]
[assembly: AssemblyTrademark("")]
[assembly: AssemblyCulture("")]
[assembly: ComVisible(false)]
[assembly: Guid("eb7e003a-6a95-4002-809f-926c7c8a11e9")]
[assembly: AssemblyVersion("1.2.*")]
[assembly: AssemblyVersion("1.6.0.*")]

@@ -1,63 +0,0 @@
<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<Import Project="$(MSBuildExtensionsPath)\$(MSBuildToolsVersion)\Microsoft.Common.props" Condition="Exists('$(MSBuildExtensionsPath)\$(MSBuildToolsVersion)\Microsoft.Common.props')" />
<PropertyGroup>
<Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration>
<Platform Condition=" '$(Platform)' == '' ">AnyCPU</Platform>
<ProjectGuid>{EB7E003A-6A95-4002-809F-926C7C8A11E9}</ProjectGuid>
<OutputType>Library</OutputType>
<AppDesignerFolder>Properties</AppDesignerFolder>
<RootNamespace>VAR.PdfTools</RootNamespace>
<AssemblyName>VAR.PdfTools</AssemblyName>
<TargetFrameworkVersion>v3.5</TargetFrameworkVersion>
<FileAlignment>512</FileAlignment>
<TargetFrameworkProfile />
<ProductVersion>10.0.0</ProductVersion>
<SchemaVersion>2.0</SchemaVersion>
</PropertyGroup>
<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|AnyCPU' ">
<DebugSymbols>true</DebugSymbols>
<DebugType>full</DebugType>
<Optimize>false</Optimize>
<OutputPath>bin\Debug\</OutputPath>
<DefineConstants>DEBUG;TRACE</DefineConstants>
<ErrorReport>prompt</ErrorReport>
<WarningLevel>4</WarningLevel>
</PropertyGroup>
<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Release|AnyCPU' ">
<DebugType>pdbonly</DebugType>
<Optimize>true</Optimize>
<OutputPath>bin\Release\</OutputPath>
<DefineConstants>TRACE</DefineConstants>
<ErrorReport>prompt</ErrorReport>
<WarningLevel>4</WarningLevel>
</PropertyGroup>
<ItemGroup>
<Reference Include="System" />
<Reference Include="System.Core" />
<Reference Include="System.Xml.Linq" />
<Reference Include="System.Data.DataSetExtensions" />
<Reference Include="System.Data" />
<Reference Include="System.Xml" />
</ItemGroup>
<ItemGroup>
<Compile Include="PdfContentAction.cs" />
<Compile Include="PdfDocument.cs" />
<Compile Include="PdfDocumentPage.cs" />
<Compile Include="PdfElements.cs" />
<Compile Include="PdfFilters.cs" />
<Compile Include="PdfFont.cs" />
<Compile Include="PdfParser.cs" />
<Compile Include="PdfStandar14FontMetrics.cs" />
<Compile Include="PdfTextExtractor.cs" />
<Compile Include="Properties\AssemblyInfo.cs" />
</ItemGroup>
<Import Project="$(MSBuildToolsPath)\Microsoft.CSharp.targets" />
<!-- To modify your build process, add your task inside one of the targets below and uncomment it.
Other similar extension points exist, see Microsoft.Common.targets.
<Target Name="BeforeBuild">
</Target>
<Target Name="AfterBuild">
</Target>
-->
</Project>

@@ -22,6 +22,7 @@
<ErrorReport>prompt</ErrorReport>
<WarningLevel>4</WarningLevel>
<TargetFrameworkVersion>v4.6.1</TargetFrameworkVersion>
<LangVersion>6</LangVersion>
</PropertyGroup>
<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Release .Net 4.6.1|AnyCPU' ">
<DebugType>pdbonly</DebugType>
@@ -54,22 +55,41 @@
<ItemGroup>
<Reference Include="System" />
<Reference Include="System.Core" />
<Reference Include="System.Drawing" />
<Reference Include="System.Xml.Linq" />
<Reference Include="System.Data.DataSetExtensions" />
<Reference Include="System.Data" />
<Reference Include="System.Xml" />
</ItemGroup>
<ItemGroup>
<Compile Include="Maths\Matrix3x3.cs" />
<Compile Include="Maths\Rect.cs" />
<Compile Include="PdfContentAction.cs" />
<Compile Include="PdfDocument.cs" />
<Compile Include="PdfDocumentPage.cs" />
<Compile Include="PdfElements.cs" />
<Compile Include="PdfElements\IPdfElement.cs" />
<Compile Include="PdfElements\PdfArray.cs" />
<Compile Include="PdfElements\PdfBoolean.cs" />
<Compile Include="PdfElements\PdfDictionary.cs" />
<Compile Include="PdfElements\PdfElementTypes.cs" />
<Compile Include="PdfElements\PdfElementUtils.cs" />
<Compile Include="PdfFilters.cs" />
<Compile Include="PdfFont.cs" />
<Compile Include="PdfElements\PdfInteger.cs" />
<Compile Include="PdfElements\PdfName.cs" />
<Compile Include="PdfElements\PdfNull.cs" />
<Compile Include="PdfElements\PdfObject.cs" />
<Compile Include="PdfElements\PdfObjectReference.cs" />
<Compile Include="PdfElements\PdfReal.cs" />
<Compile Include="PdfElements\PdfStream.cs" />
<Compile Include="PdfElements\PdfString.cs" />
<Compile Include="PdfParser.cs" />
<Compile Include="PdfPageRenderer.cs" />
<Compile Include="PdfStandar14FontMetrics.cs" />
<Compile Include="PdfTextElement.cs" />
<Compile Include="PdfTextExtractor.cs" />
<Compile Include="Properties\AssemblyInfo.cs" />
<Compile Include="Maths\Vector3D.cs" />
</ItemGroup>
<ItemGroup>
<None Include="NuGet\keep.txt" />