63 Commits
1_1 ... Main

Author SHA1 Message Date
062f80f84c Ignore .issues directory 2025-05-11 20:00:30 +02:00
98a6b8e746 Update README.md 2021-06-17 03:05:38 +02:00
a46bf67f81 Update README.md 2021-06-17 01:36:26 +02:00
6faf2d0fbe Rearrangements
Migrate to Sdk projects.
Bump version 1.6.1.
Migrate to NetStandar2.0.
2021-06-17 01:31:53 +02:00
70c6272113 Merge branch 'master' of https://github.com/Kableado/VAR.PdfTools 2019-12-06 11:17:12 +01:00
3108d03a75 Rename all the "FIXME"s to "TODO"s.
For better visibility on Visual Studio.
2019-12-06 11:13:28 +01:00
c651a22209 Ignore official documentation and testing files.
To avoid copyright problems.
2019-12-06 11:12:40 +01:00
f946a1bc1a FrmPdfInfo: Better memory handling. 2019-10-28 09:14:01 +01:00
cfd8c37ab8 Rect: Remove unused usings. 2019-10-28 03:06:36 +01:00
d5d843014a Bump version 1.6.0 2019-10-28 02:58:59 +01:00
b9750745bc FrmPdfInfo: Allow raw coordinates input for GetColumn. 2019-10-28 02:58:28 +01:00
c8c7e32acc PdfTextExtractor: Better column extraction, spliting big TextElements. 2019-10-28 02:57:42 +01:00
781f212289 PdfPageRenderer: Fix Rendering of null pages. 2019-10-28 00:43:50 +01:00
8a966049f6 PdfPageRenderer: Adjust column rendering. 2019-10-27 22:40:52 +01:00
80ab9b9ff3 FrmPdfInfo: Better configuration handling with the Configuration class. 2019-10-27 22:36:54 +01:00
9af363529c PdfTextExtractor: Get results as PdfTextElementColumn, for debugging purposes. 2019-10-27 18:45:13 +01:00
386b38bd21 PdfPageRenderer: Refactor using Rect. 2019-10-27 13:12:11 +01:00
53d07db9c0 Use Rect class for size definition of TextElements and pages. 2019-10-27 13:11:40 +01:00
9bc7854b48 README.md: Adjust year on LICENSE section. 2019-10-27 12:43:32 +01:00
77a5cd1b0e PdfTextExtractor: Adjust public method names. 2019-10-27 12:40:51 +01:00
b6611b6285 Put class PdfTextElement in his own file. 2019-10-27 12:37:16 +01:00
7badc8e4b1 PdfPageRenderer: Better rendering of character size. 2019-10-27 09:59:46 +01:00
203f30e55c FrmPdfInfo: Pages selector.
A simple textbox where you can put page numbers separated by comma. And ranges joined by dash.
2019-10-27 09:59:08 +01:00
c3967dd439 Set C# lang version to 6.0. 2019-10-27 09:57:24 +01:00
da8b512c1b Move page rendering code to PdfPageRenderer. 2019-10-27 08:58:34 +01:00
beb3b931ea Bump version 1.5.2 2019-10-21 13:09:13 +02:00
8806020036 ignore ".vs" directory. 2019-10-21 13:08:44 +02:00
f3b7cd1b0d PdfTextExtractor: Better joining and splitting heuristics. 2019-10-21 13:08:19 +02:00
33f9723ac6 Bump version: 1.5.1 2017-11-14 13:34:21 +01:00
13ba41f851 PdfTextExtractor: Change Join and Split logic to use max character width of the elements. 2017-11-02 13:27:38 +01:00
06de734658 Bump version: 1.5 2017-10-11 16:52:29 +02:00
901d7e62ca FrmPdfInfo: Change test fields to have multiple actions. 2017-10-11 16:50:58 +02:00
631f8c34b2 PdfTextExtractor: Split text elements with big separations between characters 2017-10-11 16:49:57 +02:00
7ac6b19331 FrmPdfInfo: Show all fonts used on any text element 2017-10-11 16:48:01 +02:00
34e7424273 PdfCharElement: Width attribute 2017-10-11 16:47:10 +02:00
6b8bbc367f Fixes to character size calculations. 2017-10-11 16:44:42 +02:00
6dfc248b9a FrmPdfInfo.Render: Adjust scale. 2017-10-11 10:59:21 +02:00
f3aca2ffa5 Add placeholders for more commands. 2017-10-11 09:39:13 +02:00
7ba320a22c Bump version: 1.4 2017-08-02 13:30:03 +02:00
1edddf17b1 Fix JoinTextElements to only join text elements near m-size. 2017-08-02 13:28:20 +02:00
62120898d2 Bump version: 1.3 2017-06-27 01:10:26 +02:00
dc1b9bc7ca PdfTextExtractor.JoinTextElements: Joins PdfTextElements when they are nearby. 2017-06-27 01:09:50 +02:00
d1ea41474b Reorder Code. 2017-06-27 01:03:27 +02:00
b11a2ac393 Simplify PdfFont.ParseSizes. 2017-06-26 22:17:46 +02:00
36fb20eb2e Remove VisualStudio2015 incompatibilities (Remove C#7.0-isms) 2017-06-26 08:25:30 +02:00
15fbec2470 FrmPdfInfo: Improve rendering, making more accurate the location of the glyphs. 2017-06-26 01:49:48 +02:00
52841de51b PdfFont: Convert "Zero" widths to default 0.5 2017-06-26 01:46:05 +02:00
d4c4615684 PdfTextExtractor: Rework text position calculations. 2017-06-26 01:45:34 +02:00
ae76cab45d PdfTextExtractor: Fix HasText method to match contained text, instead of full PdfTextElements. 2017-06-25 12:38:27 +02:00
8dc54105fd Refactorings 2017-06-25 12:03:41 +02:00
3469593a2a VAR.PdfTools.Workbench: Crude rendering of the parsed PDF. 2017-06-25 02:21:37 +02:00
ebff0c2028 Remove Visual Studio 2010 support 2017-06-11 16:29:24 +02:00
2fd074e041 Add Visual Studio 2017 support to NuGet Generation script. 2017-06-11 16:16:55 +02:00
4223619802 Set "Times-Roman" as default basefont. 2017-06-11 16:05:17 +02:00
771305f5d0 Refactor PdfFont creator. 2017-06-11 16:04:40 +02:00
90c7c5db92 Fix NuGet buid script 2017-04-13 08:23:19 +02:00
b474fc1257 Bump version 1.2 2017-04-12 22:51:12 +02:00
a5879ec9c2 PdfTextExtractor: Apply simple heuristics to join different text blocks checking matrix "collinearity". 2017-04-12 22:49:00 +02:00
0938553510 Add NuGet building files 2017-02-13 21:09:09 +01:00
c1fd18f355 PdfTextExtractor: Fix text size calculation 2016-09-07 09:06:20 +02:00
c0a8de2617 Merge branch 'master' of https://github.com/Kableado/VAR.PdfTools 2016-09-06 17:42:55 +02:00
4d92f144f8 PdfParser: Parse inline images 2016-09-06 17:42:13 +02:00
c388e9daae Fixes on project files to be compatible with Monodevelop 2016-07-04 07:15:09 +02:00
45 changed files with 2045 additions and 1088 deletions
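
Commit 203f30e55c describes the page selector as a textbox holding page numbers separated by commas, with ranges joined by a dash. The following is a minimal standalone sketch of that parsing idea, not the repository's code: the class name `PageSelectionSketch` and the method `Parse` are hypothetical. It assumes 1-based page numbers and that an empty or unparsable selector means "all pages". It also skips reversed ranges such as `5-3`, which would otherwise hand `Enumerable.Range` a negative count.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, written only to illustrate the selector format.
public static class PageSelectionSketch
{
    // Parses a selector such as "1,3-5" into 1-based page numbers.
    // Falls back to every page when nothing usable was parsed.
    public static List<int> Parse(string selector, int pageCount)
    {
        var pages = new List<int>();
        foreach (string rawPart in (selector ?? string.Empty).Split(','))
        {
            string part = rawPart.Trim();
            string[] range = part.Split('-');
            int first;
            int last;
            if (range.Length == 2
                && int.TryParse(range[0], out first)
                && int.TryParse(range[1], out last)
                && first <= last)
            {
                // Inclusive range: "3-5" yields 3, 4, 5.
                pages.AddRange(Enumerable.Range(first, last - first + 1));
            }
            else if (range.Length == 1 && int.TryParse(part, out first))
            {
                pages.Add(first);
            }
            // Anything else (reversed or malformed parts) is ignored.
        }
        if (pages.Count == 0)
        {
            pages.AddRange(Enumerable.Range(1, pageCount));
        }
        return pages;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Parse("1,3-5", 10))); // prints 1,3,4,5
        Console.WriteLine(string.Join(",", Parse("", 3)));       // prints 1,2,3
    }
}
```

The sketch does not clamp page numbers to `pageCount`; a caller that filters pages by membership, as the workbench appears to do, simply never matches out-of-range entries.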

.gitignore (7 lines changed)

@@ -25,3 +25,10 @@ Thumbs.db
obj/ obj/
[Rr]elease*/ [Rr]elease*/
_ReSharper*/ _ReSharper*/
*.userprefs
*.nupkg
.vs
PDFTests
Doc
/.issues/


@@ -1,6 +1,6 @@
The MIT License (MIT) The MIT License (MIT)
Copyright (c) 2014-2015 Valeriano Alfonso Rodriguez Copyright (c) 2016-2019 Valeriano Alfonso Rodriguez
Permission is hereby granted, free of charge, to any person obtaining a copy Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal of this software and associated documentation files (the "Software"), to deal


@@ -5,33 +5,41 @@
### VAR.PdfTools ### VAR.PdfTools
Add the resulting assembly as reference in your projects, and this line on code: Add the resulting assembly as reference in your projects, and this line on code:
```csharp
using VAR.PdfTools; using VAR.PdfTools;
```
Then extract the contents of a data column using: Then extract the contents of a data column using:
```csharp
var columnData = new List<string>(); var columnData = new List<string>();
PdfDocument doc = PdfDocument.Load("document.pdf"); PdfDocument doc = PdfDocument.Load("document.pdf");
foreach (PdfDocumentPage page in doc.Pages) foreach (PdfDocumentPage page in doc.Pages)
{ {
PdfTextExtractor extractor = new PdfTextExtractor(page); PdfTextExtractor extractor = new PdfTextExtractor(page);
columnData.AddRange(extractor.GetColumn("Column")); columnData.AddRange(extractor.GetColumnAsStrings("Column"));
} }
```
Or the content of a field (text on the right of the indicated text): Or the content of a field (text on the right of the indicated text):
```csharp
var fieldData = new List<string>(); var fieldData = new List<string>();
PdfDocument doc = PdfDocument.Load("document.pdf"); PdfDocument doc = PdfDocument.Load("document.pdf");
foreach (PdfDocumentPage page in doc.Pages) foreach (PdfDocumentPage page in doc.Pages)
{ {
PdfTextExtractor extractor = new PdfTextExtractor(page); PdfTextExtractor extractor = new PdfTextExtractor(page);
fieldData.Add(extractor.GetField(txtFieldName.Text)); fieldData.Add(extractor.GetFieldAsString(txtFieldName.Text));
} }
```
### VAR.PdfTools.Workbench ### VAR.PdfTools.Workbench
It is a simple Windows.Forms application, to test basic funcitionallity of the library. It is a simple Windows.Forms application, to test basic funcitionallity of the library.
## Building ## Building
A Visual Studio 2015 and 2010 solutions are provided. Simply, click build on the IDE. A Visual Studio solution is provided. Simply, click build on the IDE.
The build generates a DLL and a Nuget package.
## Contributing ## Contributing
1. Fork it! 1. Fork it!
@@ -43,26 +51,3 @@ A Visual Studio 2015 and 2010 solutions are provided. Simply, click build on the
## Credits ## Credits
* Valeriano Alfonso Rodriguez. * Valeriano Alfonso Rodriguez.
## License
The MIT License (MIT)
Copyright (c) 2014-2015 Valeriano Alfonso Rodriguez
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


@@ -1,26 +0,0 @@
Microsoft Visual Studio Solution File, Format Version 11.00
# Visual Studio 2010
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "VAR.PdfTools.Net35", "VAR.PdfTools\VAR.PdfTools.Net35.csproj", "{EB7E003A-6A95-4002-809F-926C7C8A11E9}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "VAR.PdfTools.Workbench.Net35", "VAR.PdfTools.Workbench\VAR.PdfTools.Workbench.Net35.csproj", "{A5825D8E-9F81-49E0-B610-8AE5E46D02EA}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
Release|Any CPU = Release|Any CPU
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{EB7E003A-6A95-4002-809F-926C7C8A11E9}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{EB7E003A-6A95-4002-809F-926C7C8A11E9}.Debug|Any CPU.Build.0 = Debug|Any CPU
{EB7E003A-6A95-4002-809F-926C7C8A11E9}.Release|Any CPU.ActiveCfg = Release|Any CPU
{EB7E003A-6A95-4002-809F-926C7C8A11E9}.Release|Any CPU.Build.0 = Release|Any CPU
{A5825D8E-9F81-49E0-B610-8AE5E46D02EA}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{A5825D8E-9F81-49E0-B610-8AE5E46D02EA}.Debug|Any CPU.Build.0 = Debug|Any CPU
{A5825D8E-9F81-49E0-B610-8AE5E46D02EA}.Release|Any CPU.ActiveCfg = Release|Any CPU
{A5825D8E-9F81-49E0-B610-8AE5E46D02EA}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
EndGlobal


@@ -0,0 +1,117 @@
using System.Collections.Generic;
using System.IO;
using System.Text;
namespace VAR.PdfTools.Workbench
{
public class Configuration
{
private Dictionary<string, string> _configItems = new Dictionary<string, string>();
private static string GetConfigFileName()
{
string location = System.Reflection.Assembly.GetEntryAssembly().Location;
string path = Path.GetDirectoryName(location);
string filenameWithoutExtension = Path.GetFileNameWithoutExtension(location);
string configFile = string.Format("{0}/{1}.cfg", path, filenameWithoutExtension);
return configFile;
}
private static string[] GetConfigurationLines()
{
string configFile = GetConfigFileName();
string[] config;
if (File.Exists(configFile) == false)
{
config = new string[0];
}
else
{
config = File.ReadAllLines(configFile);
}
return config;
}
public void Load()
{
_configItems.Clear();
string[] configLines = GetConfigurationLines();
foreach (string configLine in configLines)
{
int idxSplit = configLine.IndexOf('|');
if (idxSplit < 0) { continue; }
string configName = configLine.Substring(0, idxSplit);
string configData = configLine.Substring(idxSplit + 1);
if (_configItems.ContainsKey(configName))
{
_configItems[configName] = configData;
}
else
{
_configItems.Add(configName, configData);
}
}
}
public void Save()
{
StringBuilder sbConfig = new StringBuilder();
foreach (KeyValuePair<string, string> pair in _configItems)
{
sbConfig.AppendFormat("{0}|{1}\n", pair.Key, pair.Value);
}
string configFileName = GetConfigFileName();
File.WriteAllText(configFileName, sbConfig.ToString());
}
public string Get(string key, string defaultValue)
{
if (_configItems == null) { return defaultValue; }
if (_configItems.ContainsKey(key))
{
return _configItems[key];
}
return defaultValue;
}
public bool Get(string key, bool defaultValue)
{
if (_configItems == null) { return defaultValue; }
if (_configItems.ContainsKey(key))
{
string value = _configItems[key];
return (value == "true");
}
return defaultValue;
}
public void Set(string key, string value)
{
if (_configItems == null) { return; }
if (_configItems.ContainsKey(key))
{
_configItems[key] = value;
}
else
{
_configItems.Add(key, value);
}
}
public void Set(string key, bool value)
{
if (_configItems == null) { return; }
if (_configItems.ContainsKey(key))
{
_configItems[key] = value ? "true" : "false";
}
else
{
_configItems.Add(key, value ? "true" : "false");
}
}
}
}
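
The `Configuration` class above persists settings as one `key|value` line per entry and splits each line on the first `|` when loading. A minimal in-memory sketch of that round trip follows, without any file I/O. The names `ConfigFormatSketch`, `Serialize`, and `Parse` are hypothetical and exist only for illustration; the sample path is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical illustration of the "key|value" line format; not the repository's class.
public static class ConfigFormatSketch
{
    // Writes one "key|value" line per entry.
    public static string Serialize(IDictionary<string, string> items)
    {
        var sb = new StringBuilder();
        foreach (KeyValuePair<string, string> pair in items)
        {
            sb.AppendFormat("{0}|{1}\n", pair.Key, pair.Value);
        }
        return sb.ToString();
    }

    // Reads the lines back, splitting on the FIRST '|' only,
    // so a value may itself contain '|' but a key may not.
    public static Dictionary<string, string> Parse(string text)
    {
        var items = new Dictionary<string, string>();
        foreach (string line in text.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int idxSplit = line.IndexOf('|');
            if (idxSplit < 0) { continue; } // skip lines without a separator
            items[line.Substring(0, idxSplit)] = line.Substring(idxSplit + 1);
        }
        return items;
    }

    public static void Main()
    {
        var items = new Dictionary<string, string>
        {
            { "LastPdfPath", "C:\\docs\\a|b.pdf" }, // made-up value containing '|'
            { "Render", "true" },
        };
        Dictionary<string, string> parsed = Parse(Serialize(items));
        Console.WriteLine(parsed["LastPdfPath"]);
        Console.WriteLine(parsed["Render"] == "true");
    }
}
```

Two limits of the format follow from this sketch: a key containing `|` would be cut at the separator, and a value containing a line break would be split across entries on load.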


@@ -34,12 +34,21 @@
this.txtPdfPath = new System.Windows.Forms.TextBox(); this.txtPdfPath = new System.Windows.Forms.TextBox();
this.txtOutput = new System.Windows.Forms.TextBox(); this.txtOutput = new System.Windows.Forms.TextBox();
this.btnProcess = new System.Windows.Forms.Button(); this.btnProcess = new System.Windows.Forms.Button();
this.btnGetColumn = new System.Windows.Forms.Button(); this.btnGetColumn1 = new System.Windows.Forms.Button();
this.txtColumnName = new System.Windows.Forms.TextBox(); this.txtField1 = new System.Windows.Forms.TextBox();
this.txtFieldName = new System.Windows.Forms.TextBox(); this.btnGetField1 = new System.Windows.Forms.Button();
this.btnGetField = new System.Windows.Forms.Button(); this.btnHasText1 = new System.Windows.Forms.Button();
this.txtText = new System.Windows.Forms.TextBox(); this.btnRender = new System.Windows.Forms.Button();
this.btnHasText = new System.Windows.Forms.Button(); this.btnHasText2 = new System.Windows.Forms.Button();
this.btnGetField2 = new System.Windows.Forms.Button();
this.txtField2 = new System.Windows.Forms.TextBox();
this.btnGetColumn2 = new System.Windows.Forms.Button();
this.btnHasText3 = new System.Windows.Forms.Button();
this.btnGetField3 = new System.Windows.Forms.Button();
this.txtField3 = new System.Windows.Forms.TextBox();
this.btnGetColumn3 = new System.Windows.Forms.Button();
this.txtPages = new System.Windows.Forms.TextBox();
this.chkRender = new System.Windows.Forms.CheckBox();
this.SuspendLayout(); this.SuspendLayout();
// //
// lblOutputs // lblOutputs
@@ -108,68 +117,166 @@
this.btnProcess.UseVisualStyleBackColor = true; this.btnProcess.UseVisualStyleBackColor = true;
this.btnProcess.Click += new System.EventHandler(this.btnProcess_Click); this.btnProcess.Click += new System.EventHandler(this.btnProcess_Click);
// //
// btnGetColumn // btnGetColumn1
// //
this.btnGetColumn.Location = new System.Drawing.Point(163, 51); this.btnGetColumn1.Location = new System.Drawing.Point(292, 51);
this.btnGetColumn.Name = "btnGetColumn"; this.btnGetColumn1.Name = "btnGetColumn1";
this.btnGetColumn.Size = new System.Drawing.Size(75, 23); this.btnGetColumn1.Size = new System.Drawing.Size(69, 23);
this.btnGetColumn.TabIndex = 12; this.btnGetColumn1.TabIndex = 12;
this.btnGetColumn.Text = "GetColumn"; this.btnGetColumn1.Text = "GetColumn";
this.btnGetColumn.UseVisualStyleBackColor = true; this.btnGetColumn1.UseVisualStyleBackColor = true;
this.btnGetColumn.Click += new System.EventHandler(this.btnGetColumn_Click); this.btnGetColumn1.Click += new System.EventHandler(this.btnGetColumn1_Click);
// //
// txtColumnName // txtField1
// //
this.txtColumnName.Location = new System.Drawing.Point(15, 53); this.txtField1.Location = new System.Drawing.Point(15, 53);
this.txtColumnName.Name = "txtColumnName"; this.txtField1.Name = "txtField1";
this.txtColumnName.Size = new System.Drawing.Size(142, 20); this.txtField1.Size = new System.Drawing.Size(142, 20);
this.txtColumnName.TabIndex = 13; this.txtField1.TabIndex = 13;
// //
// txtFieldName // btnGetField1
// //
this.txtFieldName.Location = new System.Drawing.Point(15, 82); this.btnGetField1.Location = new System.Drawing.Point(226, 51);
this.txtFieldName.Name = "txtFieldName"; this.btnGetField1.Name = "btnGetField1";
this.txtFieldName.Size = new System.Drawing.Size(142, 20); this.btnGetField1.Size = new System.Drawing.Size(60, 23);
this.txtFieldName.TabIndex = 15; this.btnGetField1.TabIndex = 14;
this.btnGetField1.Text = "GetField";
this.btnGetField1.UseVisualStyleBackColor = true;
this.btnGetField1.Click += new System.EventHandler(this.btnGetField1_Click);
// //
// btnGetField // btnHasText1
// //
this.btnGetField.Location = new System.Drawing.Point(163, 80); this.btnHasText1.Location = new System.Drawing.Point(163, 51);
this.btnGetField.Name = "btnGetField"; this.btnHasText1.Name = "btnHasText1";
this.btnGetField.Size = new System.Drawing.Size(75, 23); this.btnHasText1.Size = new System.Drawing.Size(57, 23);
this.btnGetField.TabIndex = 14; this.btnHasText1.TabIndex = 16;
this.btnGetField.Text = "GetField"; this.btnHasText1.Text = "HasText";
this.btnGetField.UseVisualStyleBackColor = true; this.btnHasText1.UseVisualStyleBackColor = true;
this.btnGetField.Click += new System.EventHandler(this.btnGetField_Click); this.btnHasText1.Click += new System.EventHandler(this.btnHasText1_Click);
// //
// txtText // btnRender
// //
this.txtText.Location = new System.Drawing.Point(15, 111); this.btnRender.Anchor = ((System.Windows.Forms.AnchorStyles)((System.Windows.Forms.AnchorStyles.Top | System.Windows.Forms.AnchorStyles.Right)));
this.txtText.Name = "txtText"; this.btnRender.Location = new System.Drawing.Point(397, 52);
this.txtText.Size = new System.Drawing.Size(142, 20); this.btnRender.Name = "btnRender";
this.txtText.TabIndex = 17; this.btnRender.Size = new System.Drawing.Size(75, 23);
this.btnRender.TabIndex = 18;
this.btnRender.Text = "Render";
this.btnRender.UseVisualStyleBackColor = true;
this.btnRender.Click += new System.EventHandler(this.btnRender_Click);
// //
// btnHasText // btnHasText2
// //
this.btnHasText.Location = new System.Drawing.Point(163, 109); this.btnHasText2.Location = new System.Drawing.Point(163, 80);
this.btnHasText.Name = "btnHasText"; this.btnHasText2.Name = "btnHasText2";
this.btnHasText.Size = new System.Drawing.Size(75, 23); this.btnHasText2.Size = new System.Drawing.Size(57, 23);
this.btnHasText.TabIndex = 16; this.btnHasText2.TabIndex = 22;
this.btnHasText.Text = "HasText"; this.btnHasText2.Text = "HasText";
this.btnHasText.UseVisualStyleBackColor = true; this.btnHasText2.UseVisualStyleBackColor = true;
this.btnHasText.Click += new System.EventHandler(this.btnHasText_Click); this.btnHasText2.Click += new System.EventHandler(this.btnHasText2_Click);
//
// btnGetField2
//
this.btnGetField2.Location = new System.Drawing.Point(226, 80);
this.btnGetField2.Name = "btnGetField2";
this.btnGetField2.Size = new System.Drawing.Size(60, 23);
this.btnGetField2.TabIndex = 21;
this.btnGetField2.Text = "GetField";
this.btnGetField2.UseVisualStyleBackColor = true;
this.btnGetField2.Click += new System.EventHandler(this.btnGetField2_Click);
//
// txtField2
//
this.txtField2.Location = new System.Drawing.Point(15, 82);
this.txtField2.Name = "txtField2";
this.txtField2.Size = new System.Drawing.Size(142, 20);
this.txtField2.TabIndex = 20;
//
// btnGetColumn2
//
this.btnGetColumn2.Location = new System.Drawing.Point(292, 80);
this.btnGetColumn2.Name = "btnGetColumn2";
this.btnGetColumn2.Size = new System.Drawing.Size(69, 23);
this.btnGetColumn2.TabIndex = 19;
this.btnGetColumn2.Text = "GetColumn";
this.btnGetColumn2.UseVisualStyleBackColor = true;
this.btnGetColumn2.Click += new System.EventHandler(this.btnGetColumn2_Click);
//
// btnHasText3
//
this.btnHasText3.Location = new System.Drawing.Point(163, 109);
this.btnHasText3.Name = "btnHasText3";
this.btnHasText3.Size = new System.Drawing.Size(57, 23);
this.btnHasText3.TabIndex = 26;
this.btnHasText3.Text = "HasText";
this.btnHasText3.UseVisualStyleBackColor = true;
this.btnHasText3.Click += new System.EventHandler(this.btnHasText3_Click);
//
// btnGetField3
//
this.btnGetField3.Location = new System.Drawing.Point(226, 109);
this.btnGetField3.Name = "btnGetField3";
this.btnGetField3.Size = new System.Drawing.Size(60, 23);
this.btnGetField3.TabIndex = 25;
this.btnGetField3.Text = "GetField";
this.btnGetField3.UseVisualStyleBackColor = true;
this.btnGetField3.Click += new System.EventHandler(this.btnGetField3_Click);
//
// txtField3
//
this.txtField3.Location = new System.Drawing.Point(15, 111);
this.txtField3.Name = "txtField3";
this.txtField3.Size = new System.Drawing.Size(142, 20);
this.txtField3.TabIndex = 24;
//
// btnGetColumn3
//
this.btnGetColumn3.Location = new System.Drawing.Point(292, 109);
this.btnGetColumn3.Name = "btnGetColumn3";
this.btnGetColumn3.Size = new System.Drawing.Size(69, 23);
this.btnGetColumn3.TabIndex = 23;
this.btnGetColumn3.Text = "GetColumn";
this.btnGetColumn3.UseVisualStyleBackColor = true;
this.btnGetColumn3.Click += new System.EventHandler(this.btnGetColumn3_Click);
//
// txtPages
//
this.txtPages.Anchor = ((System.Windows.Forms.AnchorStyles)((System.Windows.Forms.AnchorStyles.Top | System.Windows.Forms.AnchorStyles.Right)));
this.txtPages.Location = new System.Drawing.Point(397, 82);
this.txtPages.Name = "txtPages";
this.txtPages.Size = new System.Drawing.Size(75, 20);
this.txtPages.TabIndex = 27;
//
// chkRender
//
this.chkRender.AutoSize = true;
this.chkRender.Location = new System.Drawing.Point(292, 138);
this.chkRender.Name = "chkRender";
this.chkRender.Size = new System.Drawing.Size(61, 17);
this.chkRender.TabIndex = 28;
this.chkRender.Text = "Render";
this.chkRender.UseVisualStyleBackColor = true;
// //
// FrmPdfInfo // FrmPdfInfo
// //
this.AutoScaleDimensions = new System.Drawing.SizeF(6F, 13F); this.AutoScaleDimensions = new System.Drawing.SizeF(6F, 13F);
this.AutoScaleMode = System.Windows.Forms.AutoScaleMode.Font; this.AutoScaleMode = System.Windows.Forms.AutoScaleMode.Font;
this.ClientSize = new System.Drawing.Size(484, 461); this.ClientSize = new System.Drawing.Size(484, 461);
this.Controls.Add(this.txtText); this.Controls.Add(this.chkRender);
this.Controls.Add(this.btnHasText); this.Controls.Add(this.txtPages);
this.Controls.Add(this.txtFieldName); this.Controls.Add(this.btnHasText3);
this.Controls.Add(this.btnGetField); this.Controls.Add(this.btnGetField3);
this.Controls.Add(this.txtColumnName); this.Controls.Add(this.txtField3);
this.Controls.Add(this.btnGetColumn); this.Controls.Add(this.btnGetColumn3);
this.Controls.Add(this.btnHasText2);
this.Controls.Add(this.btnGetField2);
this.Controls.Add(this.txtField2);
this.Controls.Add(this.btnGetColumn2);
this.Controls.Add(this.btnRender);
this.Controls.Add(this.btnHasText1);
this.Controls.Add(this.btnGetField1);
this.Controls.Add(this.txtField1);
this.Controls.Add(this.btnGetColumn1);
this.Controls.Add(this.lblOutputs); this.Controls.Add(this.lblOutputs);
this.Controls.Add(this.lblInputs); this.Controls.Add(this.lblInputs);
this.Controls.Add(this.btnBrowse); this.Controls.Add(this.btnBrowse);
@@ -193,11 +300,20 @@
private System.Windows.Forms.TextBox txtPdfPath; private System.Windows.Forms.TextBox txtPdfPath;
private System.Windows.Forms.TextBox txtOutput; private System.Windows.Forms.TextBox txtOutput;
private System.Windows.Forms.Button btnProcess; private System.Windows.Forms.Button btnProcess;
private System.Windows.Forms.Button btnGetColumn; private System.Windows.Forms.Button btnGetColumn1;
private System.Windows.Forms.TextBox txtColumnName; private System.Windows.Forms.TextBox txtField1;
private System.Windows.Forms.TextBox txtFieldName; private System.Windows.Forms.Button btnGetField1;
private System.Windows.Forms.Button btnGetField; private System.Windows.Forms.Button btnHasText1;
private System.Windows.Forms.TextBox txtText; private System.Windows.Forms.Button btnRender;
private System.Windows.Forms.Button btnHasText; private System.Windows.Forms.Button btnHasText2;
private System.Windows.Forms.Button btnGetField2;
private System.Windows.Forms.TextBox txtField2;
private System.Windows.Forms.Button btnGetColumn2;
private System.Windows.Forms.Button btnHasText3;
private System.Windows.Forms.Button btnGetField3;
private System.Windows.Forms.TextBox txtField3;
private System.Windows.Forms.Button btnGetColumn3;
private System.Windows.Forms.TextBox txtPages;
private System.Windows.Forms.CheckBox chkRender;
} }
} }


@@ -1,7 +1,12 @@
using System; using System;
using System.Collections.Generic; using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Linq; using System.Linq;
using System.Text;
using System.Windows.Forms; using System.Windows.Forms;
using VAR.PdfTools.PdfElements;
namespace VAR.PdfTools.Workbench namespace VAR.PdfTools.Workbench
{ {
@@ -14,19 +19,27 @@ namespace VAR.PdfTools.Workbench
private void FrmPdfInfo_Load(object sender, EventArgs e) private void FrmPdfInfo_Load(object sender, EventArgs e)
{ {
txtPdfPath.Text = Properties.Settings.Default.LastPdfPath; var configuration = new Configuration();
txtColumnName.Text = Properties.Settings.Default.LastColumnName; configuration.Load();
txtFieldName.Text = Properties.Settings.Default.LastFieldName; txtPdfPath.Text = configuration.Get("LastPdfPath", string.Empty);
txtText.Text = Properties.Settings.Default.LastText; txtField1.Text = configuration.Get("Field1", string.Empty);
txtField2.Text = configuration.Get("Field2", string.Empty);
txtField3.Text = configuration.Get("Field3", string.Empty);
txtPages.Text = configuration.Get("Pages", string.Empty);
chkRender.Checked = configuration.Get("Render", false);
} }
private void FrmPdfInfo_FormClosing(object sender, FormClosingEventArgs e) private void FrmPdfInfo_FormClosing(object sender, FormClosingEventArgs e)
{ {
Properties.Settings.Default.LastPdfPath = txtPdfPath.Text; var configuration = new Configuration();
Properties.Settings.Default.LastColumnName = txtColumnName.Text; var configItems = new Dictionary<string, string>();
Properties.Settings.Default.LastFieldName = txtFieldName.Text; configuration.Set("LastPdfPath", txtPdfPath.Text);
Properties.Settings.Default.LastText = txtText.Text; configuration.Set("Field1", txtField1.Text);
Properties.Settings.Default.Save(); configuration.Set("Field2", txtField2.Text);
configuration.Set("Field3", txtField3.Text);
configuration.Set("Pages", txtPages.Text);
configuration.Set("Render", chkRender.Checked);
configuration.Save();
} }
private void btnBrowse_Click(object sender, EventArgs e) private void btnBrowse_Click(object sender, EventArgs e)
@@ -87,9 +100,25 @@ namespace VAR.PdfTools.Workbench
PdfTextExtractor extractor = new PdfTextExtractor(page); PdfTextExtractor extractor = new PdfTextExtractor(page);
foreach (PdfTextElement textElement in extractor.Elements) foreach (PdfTextElement textElement in extractor.Elements)
{ {
string fontName = textElement.Font == null ? "#NULL#" : textElement.Font.Name;
if (fontName == "#NULL#" && textElement.Childs.Count > 0)
{
var fontNames = textElement.Childs.Select(c => c.Font == null ? "#NULL#" : c.Font.Name);
StringBuilder sbFontName = new StringBuilder();
foreach (string fontNameAux in fontNames)
{
if (sbFontName.Length > 0) { sbFontName.Append(";"); }
sbFontName.Append(fontNameAux);
}
fontName = sbFontName.ToString();
}
lines.Add(string.Format("Text({0}, {1})({2}, {3})[{4}]: \"{5}\"", lines.Add(string.Format("Text({0}, {1})({2}, {3})[{4}]: \"{5}\"",
textElement.Matrix.Matrix[0, 2], textElement.Matrix.Matrix[1, 2], textElement.VisibleWidth, textElement.VisibleHeight, Math.Round(textElement.Matrix.Matrix[0, 2], 2),
textElement.Font == null ? string.Empty : textElement.Font.Name, Math.Round(textElement.Matrix.Matrix[1, 2], 2),
Math.Round(textElement.VisibleWidth, 2),
Math.Round(textElement.VisibleHeight, 2),
fontName,
textElement.VisibleText)); textElement.VisibleText));
} }
} }
@@ -97,61 +126,256 @@ namespace VAR.PdfTools.Workbench
txtOutput.Lines = lines.ToArray(); txtOutput.Lines = lines.ToArray();
} }
private void btnGetColumn_Click(object sender, EventArgs e) private void btnHasText1_Click(object sender, EventArgs e)
{ {
if (System.IO.File.Exists(txtPdfPath.Text) == false) string pdfPath = txtPdfPath.Text;
string text = txtField1.Text;
Action_HasText(pdfPath, text);
}
private void btnGetField1_Click(object sender, EventArgs e)
{
string pdfPath = txtPdfPath.Text;
string field = txtField1.Text;
Action_GetField(pdfPath, field);
}
private void btnGetColumn1_Click(object sender, EventArgs e)
{
string pdfPath = txtPdfPath.Text;
string column = txtField1.Text;
Action_GetColumn(pdfPath, column);
}
private void btnHasText2_Click(object sender, EventArgs e)
{
string pdfPath = txtPdfPath.Text;
string text = txtField2.Text;
Action_HasText(pdfPath, text);
}
private void btnGetField2_Click(object sender, EventArgs e)
{
string pdfPath = txtPdfPath.Text;
string field = txtField2.Text;
Action_GetField(pdfPath, field);
}
private void btnGetColumn2_Click(object sender, EventArgs e)
{
string pdfPath = txtPdfPath.Text;
string column = txtField2.Text;
Action_GetColumn(pdfPath, column);
}
private void btnHasText3_Click(object sender, EventArgs e)
{
string pdfPath = txtPdfPath.Text;
string text = txtField3.Text;
Action_HasText(pdfPath, text);
}
private void btnGetField3_Click(object sender, EventArgs e)
{
string pdfPath = txtPdfPath.Text;
string field = txtField3.Text;
Action_GetField(pdfPath, field);
}
private void btnGetColumn3_Click(object sender, EventArgs e)
{
string pdfPath = txtPdfPath.Text;
string column = txtField3.Text;
Action_GetColumn(pdfPath, column);
}
private IEnumerable<int> GetSelectedPages(int maxPages)
{
string pages = txtPages.Text;
if (string.IsNullOrEmpty(pages))
{
return Enumerable.Range(1, maxPages);
}
string[] pagesParts;
if (pages.Contains(","))
{
pagesParts = pages.Split(',');
}
else
{
pagesParts = new string[] { pages };
}
List<int> listPages = new List<int>();
foreach (string part in pagesParts)
{
if (part.Contains("-"))
{
string[] range = part.Split('-');
if (range.Length == 2)
{
int pageStart;
int pageEnd;
if (int.TryParse(range[0], out pageStart) && int.TryParse(range[1], out pageEnd))
{
listPages.AddRange(Enumerable.Range(pageStart, (pageEnd - pageStart) + 1));
}
}
}
else
{
int pageNum;
if (int.TryParse(part, out pageNum))
{
listPages.Add(pageNum);
}
}
}
if (listPages.Count == 0)
{
listPages.AddRange(Enumerable.Range(1, maxPages));
}
return listPages;
}
private void Action_HasText(string pdfPath, string text)
{
if (System.IO.File.Exists(pdfPath) == false)
{ {
MessageBox.Show("File does not exist"); MessageBox.Show("File does not exist");
return; return;
} }
PdfDocument doc = PdfDocument.Load(txtPdfPath.Text); PdfDocument doc = PdfDocument.Load(pdfPath);
var columnData = new List<string>(); IEnumerable<int> selectedPages = GetSelectedPages(doc.Pages.Count);
List<string> lines = new List<string>();
int pageNum = 0;
foreach (PdfDocumentPage page in doc.Pages) foreach (PdfDocumentPage page in doc.Pages)
{ {
pageNum++;
if (selectedPages.Contains(pageNum) == false) { continue; }
PdfTextExtractor extractor = new PdfTextExtractor(page); PdfTextExtractor extractor = new PdfTextExtractor(page);
columnData.AddRange(extractor.GetColumn(txtColumnName.Text)); lines.Add(string.Format("Page({0}) : {1}", pageNum, Convert.ToString(extractor.HasText(text))));
} }
txtOutput.Lines = columnData.ToArray(); txtOutput.Lines = lines.ToArray();
} }
private void btnGetField_Click(object sender, EventArgs e) private void Action_GetField(string pdfPath, string field)
{ {
if (System.IO.File.Exists(txtPdfPath.Text) == false) if (System.IO.File.Exists(pdfPath) == false)
{ {
MessageBox.Show("File does not exist"); MessageBox.Show("File does not exist");
return; return;
} }
PdfDocument doc = PdfDocument.Load(txtPdfPath.Text); PdfDocument doc = PdfDocument.Load(pdfPath);
IEnumerable<int> selectedPages = GetSelectedPages(doc.Pages.Count);
var fieldData = new List<string>(); var fieldData = new List<string>();
int pageNum = 0;
foreach (PdfDocumentPage page in doc.Pages) foreach (PdfDocumentPage page in doc.Pages)
{ {
pageNum++;
if (selectedPages.Contains(pageNum) == false) { continue; }
PdfTextExtractor extractor = new PdfTextExtractor(page); PdfTextExtractor extractor = new PdfTextExtractor(page);
fieldData.Add(extractor.GetField(txtFieldName.Text)); fieldData.Add(extractor.GetFieldAsString(field));
} }
txtOutput.Lines = fieldData.ToArray(); txtOutput.Lines = fieldData.ToArray();
} }
private void btnHasText_Click(object sender, EventArgs e) private void Action_GetColumn(string pdfPath, string column)
{ {
if (System.IO.File.Exists(txtPdfPath.Text) == false) if (System.IO.File.Exists(pdfPath) == false)
{
MessageBox.Show("File does not exist");
return;
}
PdfDocument doc = PdfDocument.Load(pdfPath);
string baseDocumentPath = Path.GetDirectoryName(txtPdfPath.Text);
string baseDocumentFilename = Path.GetFileNameWithoutExtension(txtPdfPath.Text);
IEnumerable<int> selectedPages = GetSelectedPages(doc.Pages.Count);
var columns = new List<string>();
int pageNum = 0;
foreach (PdfDocumentPage page in doc.Pages)
{
pageNum++;
if (selectedPages.Contains(pageNum) == false) { continue; }
PdfTextExtractor extractor = new PdfTextExtractor(page);
PdfTextElementColumn columnData;
if (column.StartsWith("#"))
{
string[] columnParts = column.Substring(1).Split(';');
double y = Convert.ToDouble(columnParts[0]);
double x1 = Convert.ToDouble(columnParts[1]);
double x2 = Convert.ToDouble(columnParts[2]);
columnData = extractor.GetColumn(null, y, x1, x2, x1, x2);
}
else
{
columnData = extractor.GetColumn(column);
}
if (chkRender.Checked)
{
var pdfPageRenderer = new PdfPageRenderer(extractor);
Bitmap bmp = pdfPageRenderer.Render();
pdfPageRenderer.RenderColumn(columnData, bmp);
string fileName = Path.Combine(baseDocumentPath, string.Format("{0}_{1:0000}.png", baseDocumentFilename, pageNum));
bmp.Save(fileName, ImageFormat.Png);
bmp.Dispose();
GC.Collect();
}
columns.AddRange(columnData.Elements.Select(t => t.VisibleText));
}
txtOutput.Lines = columns.ToArray();
}
private void btnRender_Click(object sender, EventArgs e)
{
if (File.Exists(txtPdfPath.Text) == false)
{ {
MessageBox.Show("File does not exist"); MessageBox.Show("File does not exist");
return; return;
} }
PdfDocument doc = PdfDocument.Load(txtPdfPath.Text); PdfDocument doc = PdfDocument.Load(txtPdfPath.Text);
string baseDocumentPath = Path.GetDirectoryName(txtPdfPath.Text);
string baseDocumentFilename = Path.GetFileNameWithoutExtension(txtPdfPath.Text);
List<string> lines = new List<string>(); List<string> lines = new List<string>();
int pageNum = 1; lines.Add(string.Format("Filename : {0}", baseDocumentFilename));
lines.Add(string.Format("Number of Pages : {0}", doc.Pages.Count));
IEnumerable<int> selectedPages = GetSelectedPages(doc.Pages.Count);
int pageNum = 0;
foreach (PdfDocumentPage page in doc.Pages) foreach (PdfDocumentPage page in doc.Pages)
{ {
PdfTextExtractor extractor = new PdfTextExtractor(page); pageNum++;
lines.Add(string.Format("Page({0}) : {1}", pageNum, Convert.ToString(extractor.HasText(txtText.Text)))); if (selectedPages.Contains(pageNum) == false) { continue; }
PdfPageRenderer pdfPageRenderer = new PdfPageRenderer(page);
Bitmap bmp = pdfPageRenderer.Render();
lines.Add(string.Format("Page {0:0000} TextElements : {1}", pageNum, pdfPageRenderer.Extractor.Elements.Count));
// Save image to disk
string fileName = Path.Combine(baseDocumentPath, string.Format("{0}_{1:0000}.png", baseDocumentFilename, pageNum));
bmp.Save(fileName, ImageFormat.Png);
bmp.Dispose();
GC.Collect();
} }
txtOutput.Lines = lines.ToArray(); txtOutput.Lines = lines.ToArray();
} }
} }
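
Review note on `GetSelectedPages` in the form diff above: it accepts a comma-separated page list with optional `start-end` ranges and falls back to all pages when nothing parses. As written, a reversed range such as `5-3` passes a negative count to `Enumerable.Range`, which throws `ArgumentOutOfRangeException`, and page numbers are not clamped to `maxPages`. The following is a minimal sketch of a guarded variant. The names `PageSpecSketch` and `Parse`, and the decision to clamp and to skip reversed ranges, are assumptions made for illustration and are not part of this commit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of VAR.PdfTools.
public static class PageSpecSketch
{
    // Parses specs such as "1,3-5" into 1-based page numbers.
    // Unparseable parts are skipped; an empty result means "all pages".
    public static List<int> Parse(string spec, int maxPages)
    {
        var pages = new List<int>();
        if (!string.IsNullOrEmpty(spec))
        {
            foreach (string part in spec.Split(','))
            {
                string[] range = part.Split('-');
                int first = 0;
                int last = 0;
                bool parsed = false;
                if (range.Length == 1)
                {
                    parsed = int.TryParse(range[0], out first);
                    last = first;
                }
                else if (range.Length == 2)
                {
                    parsed = int.TryParse(range[0], out first) && int.TryParse(range[1], out last);
                }
                if (!parsed) { continue; }

                // Assumption: clamp to the document instead of trusting the input.
                first = Math.Max(first, 1);
                last = Math.Min(last, maxPages);
                if (first > last) { continue; } // reversed or out-of-document range

                pages.AddRange(Enumerable.Range(first, last - first + 1));
            }
        }
        if (pages.Count == 0)
        {
            // Same fallback as the original: nothing usable means every page.
            pages.AddRange(Enumerable.Range(1, maxPages));
        }
        return pages;
    }
}
```

With this sketch, `PageSpecSketch.Parse("1,3-5", 10)` yields pages 1, 3, 4, 5, while `Parse("5-3", 10)` falls back to all ten pages instead of throwing.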

@@ -112,9 +112,9 @@
<value>2.0</value> <value>2.0</value>
</resheader> </resheader>
<resheader name="reader"> <resheader name="reader">
<value>System.Resources.ResXResourceReader, System.Windows.Forms, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</value> <value>System.Resources.ResXResourceReader, System.Windows.Forms, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</value>
</resheader> </resheader>
<resheader name="writer"> <resheader name="writer">
<value>System.Resources.ResXResourceWriter, System.Windows.Forms, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</value> <value>System.Resources.ResXResourceWriter, System.Windows.Forms, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</value>
</resheader> </resheader>
</root> </root>

@@ -1,14 +0,0 @@
using System.Reflection;
using System.Runtime.InteropServices;
[assembly: AssemblyTitle("VAR.PdfTools.Workbench")]
[assembly: AssemblyDescription("PdfTools Workbench")]
[assembly: AssemblyConfiguration("")]
[assembly: AssemblyCompany("VAR")]
[assembly: AssemblyProduct("VAR.PdfTools.Workbench")]
[assembly: AssemblyCopyright("Copyright © VAR 2016")]
[assembly: AssemblyTrademark("")]
[assembly: AssemblyCulture("")]
[assembly: ComVisible(false)]
[assembly: Guid("a5825d8e-9f81-49e0-b610-8ae5e46d02ea")]
[assembly: AssemblyVersion("1.1.*")]

@@ -0,0 +1,19 @@
<?xml version="1.0" encoding="utf-8"?>
<!--
https://go.microsoft.com/fwlink/?LinkID=208121.
-->
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<PropertyGroup>
<Configuration>Release</Configuration>
<Platform>Any CPU</Platform>
<PublishDir>bin\Release\net5.0-windows\publish\</PublishDir>
<PublishProtocol>FileSystem</PublishProtocol>
<TargetFramework>net5.0-windows</TargetFramework>
<RuntimeIdentifier>win-x64</RuntimeIdentifier>
<SelfContained>true</SelfContained>
<PublishSingleFile>True</PublishSingleFile>
<PublishReadyToRun>False</PublishReadyToRun>
<IncludeNativeLibrariesForSelfExtract>True</IncludeNativeLibrariesForSelfExtract>
<PublishTrimmed>True</PublishTrimmed>
</PropertyGroup>
</Project>

@@ -1,74 +0,0 @@
//------------------------------------------------------------------------------
// <auto-generated>
// This code was generated by a tool.
// Runtime Version:4.0.30319.42000
//
// Changes to this file may cause incorrect behavior and will be lost if
// the code is regenerated.
// </auto-generated>
//------------------------------------------------------------------------------
namespace VAR.PdfTools.Workbench.Properties {
[global::System.Runtime.CompilerServices.CompilerGeneratedAttribute()]
[global::System.CodeDom.Compiler.GeneratedCodeAttribute("Microsoft.VisualStudio.Editors.SettingsDesigner.SettingsSingleFileGenerator", "10.0.0.0")]
internal sealed partial class Settings : global::System.Configuration.ApplicationSettingsBase {
private static Settings defaultInstance = ((Settings)(global::System.Configuration.ApplicationSettingsBase.Synchronized(new Settings())));
public static Settings Default {
get {
return defaultInstance;
}
}
[global::System.Configuration.UserScopedSettingAttribute()]
[global::System.Diagnostics.DebuggerNonUserCodeAttribute()]
[global::System.Configuration.DefaultSettingValueAttribute("")]
public string LastPdfPath {
get {
return ((string)(this["LastPdfPath"]));
}
set {
this["LastPdfPath"] = value;
}
}
[global::System.Configuration.UserScopedSettingAttribute()]
[global::System.Diagnostics.DebuggerNonUserCodeAttribute()]
[global::System.Configuration.DefaultSettingValueAttribute("")]
public string LastColumnName {
get {
return ((string)(this["LastColumnName"]));
}
set {
this["LastColumnName"] = value;
}
}
[global::System.Configuration.UserScopedSettingAttribute()]
[global::System.Diagnostics.DebuggerNonUserCodeAttribute()]
[global::System.Configuration.DefaultSettingValueAttribute("")]
public string LastFieldName {
get {
return ((string)(this["LastFieldName"]));
}
set {
this["LastFieldName"] = value;
}
}
[global::System.Configuration.UserScopedSettingAttribute()]
[global::System.Diagnostics.DebuggerNonUserCodeAttribute()]
[global::System.Configuration.DefaultSettingValueAttribute("")]
public string LastText {
get {
return ((string)(this["LastText"]));
}
set {
this["LastText"] = value;
}
}
}
}

@@ -1,18 +0,0 @@
<?xml version='1.0' encoding='utf-8'?>
<SettingsFile xmlns="http://schemas.microsoft.com/VisualStudio/2004/01/settings" CurrentProfile="(Default)" GeneratedClassNamespace="VAR.PdfTools.Workbench.Properties" GeneratedClassName="Settings">
<Profiles />
<Settings>
<Setting Name="LastPdfPath" Type="System.String" Scope="User">
<Value Profile="(Default)" />
</Setting>
<Setting Name="LastColumnName" Type="System.String" Scope="User">
<Value Profile="(Default)" />
</Setting>
<Setting Name="LastFieldName" Type="System.String" Scope="User">
<Value Profile="(Default)" />
</Setting>
<Setting Name="LastText" Type="System.String" Scope="User">
<Value Profile="(Default)" />
</Setting>
</Settings>
</SettingsFile>

@@ -1,5 +1,5 @@
<?xml version="1.0" encoding="utf-8"?> <?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="14.0" DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003"> <Project ToolsVersion="4.0" DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<Import Project="$(MSBuildExtensionsPath)\$(MSBuildToolsVersion)\Microsoft.Common.props" Condition="Exists('$(MSBuildExtensionsPath)\$(MSBuildToolsVersion)\Microsoft.Common.props')" /> <Import Project="$(MSBuildExtensionsPath)\$(MSBuildToolsVersion)\Microsoft.Common.props" Condition="Exists('$(MSBuildExtensionsPath)\$(MSBuildToolsVersion)\Microsoft.Common.props')" />
<PropertyGroup> <PropertyGroup>
<Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration> <Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration>
@@ -13,6 +13,8 @@
<FileAlignment>512</FileAlignment> <FileAlignment>512</FileAlignment>
<AutoGenerateBindingRedirects>true</AutoGenerateBindingRedirects> <AutoGenerateBindingRedirects>true</AutoGenerateBindingRedirects>
<TargetFrameworkProfile /> <TargetFrameworkProfile />
<ProductVersion>10.0.0</ProductVersion>
<SchemaVersion>2.0</SchemaVersion>
</PropertyGroup> </PropertyGroup>
<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|AnyCPU' "> <PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|AnyCPU' ">
<PlatformTarget>AnyCPU</PlatformTarget> <PlatformTarget>AnyCPU</PlatformTarget>
@@ -33,16 +35,12 @@
<ErrorReport>prompt</ErrorReport> <ErrorReport>prompt</ErrorReport>
<WarningLevel>4</WarningLevel> <WarningLevel>4</WarningLevel>
</PropertyGroup> </PropertyGroup>
<PropertyGroup>
<StartupObject />
</PropertyGroup>
<ItemGroup> <ItemGroup>
<Reference Include="System" /> <Reference Include="System" />
<Reference Include="System.Core" /> <Reference Include="System.Core" />
<Reference Include="System.Xml.Linq" /> <Reference Include="System.Xml.Linq" />
<Reference Include="System.Data.DataSetExtensions" /> <Reference Include="System.Data.DataSetExtensions" />
<Reference Include="System.Data" /> <Reference Include="System.Data" />
<Reference Include="System.Deployment" />
<Reference Include="System.Drawing" /> <Reference Include="System.Drawing" />
<Reference Include="System.Windows.Forms" /> <Reference Include="System.Windows.Forms" />
<Reference Include="System.Xml" /> <Reference Include="System.Xml" />
@@ -56,9 +54,6 @@
</Compile> </Compile>
<Compile Include="Program.cs" /> <Compile Include="Program.cs" />
<Compile Include="Properties\AssemblyInfo.cs" /> <Compile Include="Properties\AssemblyInfo.cs" />
<EmbeddedResource Include="FrmPdfInfo.resx">
<DependentUpon>FrmPdfInfo.cs</DependentUpon>
</EmbeddedResource>
<None Include="Properties\Settings.settings"> <None Include="Properties\Settings.settings">
<Generator>SettingsSingleFileGenerator</Generator> <Generator>SettingsSingleFileGenerator</Generator>
<LastGenOutput>Settings.Designer.cs</LastGenOutput> <LastGenOutput>Settings.Designer.cs</LastGenOutput>
@@ -69,12 +64,6 @@
<DesignTimeSharedInput>True</DesignTimeSharedInput> <DesignTimeSharedInput>True</DesignTimeSharedInput>
</Compile> </Compile>
</ItemGroup> </ItemGroup>
<ItemGroup>
<ProjectReference Include="..\VAR.PdfTools\VAR.PdfTools.csproj">
<Project>{eb7e003a-6a95-4002-809f-926c7c8a11e9}</Project>
<Name>VAR.PdfTools</Name>
</ProjectReference>
</ItemGroup>
<Import Project="$(MSBuildToolsPath)\Microsoft.CSharp.targets" /> <Import Project="$(MSBuildToolsPath)\Microsoft.CSharp.targets" />
<!-- To modify your build process, add your task inside one of the targets below and uncomment it. <!-- To modify your build process, add your task inside one of the targets below and uncomment it.
Other similar extension points exist, see Microsoft.Common.targets. Other similar extension points exist, see Microsoft.Common.targets.
@@ -83,4 +72,10 @@
<Target Name="AfterBuild"> <Target Name="AfterBuild">
</Target> </Target>
--> -->
<ItemGroup>
<ProjectReference Include="..\VAR.PdfTools\VAR.PdfTools.Net35.csproj">
<Project>{EB7E003A-6A95-4002-809F-926C7C8A11E9}</Project>
<Name>VAR.PdfTools.Net35</Name>
</ProjectReference>
</ItemGroup>
</Project> </Project>

@@ -1,86 +1,26 @@
<?xml version="1.0" encoding="utf-8"?> <Project Sdk="Microsoft.NET.Sdk">
<Project ToolsVersion="14.0" DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<Import Project="$(MSBuildExtensionsPath)\$(MSBuildToolsVersion)\Microsoft.Common.props" Condition="Exists('$(MSBuildExtensionsPath)\$(MSBuildToolsVersion)\Microsoft.Common.props')" />
<PropertyGroup> <PropertyGroup>
<Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration> <TargetFramework>net5.0-windows</TargetFramework>
<Platform Condition=" '$(Platform)' == '' ">AnyCPU</Platform>
<ProjectGuid>{A5825D8E-9F81-49E0-B610-8AE5E46D02EA}</ProjectGuid>
<OutputType>WinExe</OutputType> <OutputType>WinExe</OutputType>
<AppDesignerFolder>Properties</AppDesignerFolder> <UseWindowsForms>true</UseWindowsForms>
<RootNamespace>VAR.PdfTools.Workbench</RootNamespace>
<AssemblyName>VAR.PdfTools.Workbench</AssemblyName>
<TargetFrameworkVersion>v4.6.1</TargetFrameworkVersion>
<FileAlignment>512</FileAlignment>
<AutoGenerateBindingRedirects>true</AutoGenerateBindingRedirects>
<TargetFrameworkProfile />
</PropertyGroup>
<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|AnyCPU' ">
<PlatformTarget>AnyCPU</PlatformTarget>
<DebugSymbols>true</DebugSymbols>
<DebugType>full</DebugType>
<Optimize>false</Optimize>
<OutputPath>bin\Debug\</OutputPath>
<DefineConstants>DEBUG;TRACE</DefineConstants>
<ErrorReport>prompt</ErrorReport>
<WarningLevel>4</WarningLevel>
</PropertyGroup>
<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Release|AnyCPU' ">
<PlatformTarget>AnyCPU</PlatformTarget>
<DebugType>pdbonly</DebugType>
<Optimize>true</Optimize>
<OutputPath>bin\Release\</OutputPath>
<DefineConstants>TRACE</DefineConstants>
<ErrorReport>prompt</ErrorReport>
<WarningLevel>4</WarningLevel>
</PropertyGroup> </PropertyGroup>
<PropertyGroup> <PropertyGroup>
<StartupObject /> <PackageId>VAR.PdfTools.Workbench</PackageId>
<Title>VAR.PdfTools.Workbench</Title>
<Version>1.6.1</Version>
<Description>PdfTools Workbench</Description>
<Authors>VAR</Authors>
<Company>VAR</Company>
<Copyright>Copyright © VAR 2016-2019</Copyright>
<RequireLicenseAcceptance>false</RequireLicenseAcceptance>
<PackageLicenseFile>LICENSE.txt</PackageLicenseFile>
<PackageProjectUrl>https://github.com/Kableado/VAR.PdfTools</PackageProjectUrl>
<PackageTags>PDF;PDF Tool</PackageTags>
</PropertyGroup> </PropertyGroup>
<ItemGroup> <ItemGroup>
<Reference Include="System" /> <Content Include="..\LICENSE.txt" Link="LICENSE.txt" Pack="true" PackagePath="" />
<Reference Include="System.Core" />
<Reference Include="System.Xml.Linq" />
<Reference Include="System.Data.DataSetExtensions" />
<Reference Include="System.Data" />
<Reference Include="System.Deployment" />
<Reference Include="System.Drawing" />
<Reference Include="System.Windows.Forms" />
<Reference Include="System.Xml" />
</ItemGroup> </ItemGroup>
<ItemGroup> <ItemGroup>
<Compile Include="FrmPdfInfo.cs"> <ProjectReference Include="..\VAR.PdfTools\VAR.PdfTools.csproj" />
<SubType>Form</SubType>
</Compile>
<Compile Include="FrmPdfInfo.Designer.cs">
<DependentUpon>FrmPdfInfo.cs</DependentUpon>
</Compile>
<Compile Include="Program.cs" />
<Compile Include="Properties\AssemblyInfo.cs" />
<EmbeddedResource Include="FrmPdfInfo.resx">
<DependentUpon>FrmPdfInfo.cs</DependentUpon>
</EmbeddedResource>
<None Include="Properties\Settings.settings">
<Generator>SettingsSingleFileGenerator</Generator>
<LastGenOutput>Settings.Designer.cs</LastGenOutput>
</None>
<Compile Include="Properties\Settings.Designer.cs">
<AutoGen>True</AutoGen>
<DependentUpon>Settings.settings</DependentUpon>
<DesignTimeSharedInput>True</DesignTimeSharedInput>
</Compile>
</ItemGroup> </ItemGroup>
<ItemGroup>
<ProjectReference Include="..\VAR.PdfTools\VAR.PdfTools.csproj">
<Project>{eb7e003a-6a95-4002-809f-926c7c8a11e9}</Project>
<Name>VAR.PdfTools</Name>
</ProjectReference>
</ItemGroup>
<Import Project="$(MSBuildToolsPath)\Microsoft.CSharp.targets" />
<!-- To modify your build process, add your task inside one of the targets below and uncomment it.
Other similar extension points exist, see Microsoft.Common.targets.
<Target Name="BeforeBuild">
</Target>
<Target Name="AfterBuild">
</Target>
-->
</Project> </Project>

@@ -1,12 +1,18 @@
Microsoft Visual Studio Solution File, Format Version 12.00 Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio 14 # Visual Studio Version 16
VisualStudioVersion = 14.0.25123.0 VisualStudioVersion = 16.0.31402.337
MinimumVisualStudioVersion = 10.0.40219.1 MinimumVisualStudioVersion = 10.0.40219.1
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "VAR.PdfTools", "VAR.PdfTools\VAR.PdfTools.csproj", "{EB7E003A-6A95-4002-809F-926C7C8A11E9}" Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "VAR.PdfTools", "VAR.PdfTools\VAR.PdfTools.csproj", "{EB7E003A-6A95-4002-809F-926C7C8A11E9}"
EndProject EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "VAR.PdfTools.Workbench", "VAR.PdfTools.Workbench\VAR.PdfTools.Workbench.csproj", "{A5825D8E-9F81-49E0-B610-8AE5E46D02EA}" Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "VAR.PdfTools.Workbench", "VAR.PdfTools.Workbench\VAR.PdfTools.Workbench.csproj", "{A5825D8E-9F81-49E0-B610-8AE5E46D02EA}"
EndProject EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Notes", "Notes", "{CE2D7584-5D82-401E-9A88-A9961CBB6959}"
ProjectSection(SolutionItems) = preProject
LICENSE.txt = LICENSE.txt
README.md = README.md
EndProjectSection
EndProject
Global Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU Debug|Any CPU = Debug|Any CPU
@@ -25,4 +31,7 @@ Global
GlobalSection(SolutionProperties) = preSolution GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE HideSolutionNode = FALSE
EndGlobalSection EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {7E5F981A-8918-4C9E-AC9C-A798E2F3DA69}
EndGlobalSection
EndGlobal EndGlobal

@@ -0,0 +1,121 @@
using System;
namespace VAR.PdfTools.Maths
{
public class Matrix3x3
{
#region Declarations
public double[,] _matrix = new double[3, 3];
#endregion
#region Properties
public double[,] Matrix { get { return _matrix; } }
#endregion
#region Creator
public Matrix3x3()
{
Idenity();
}
public Matrix3x3(double a, double b, double c, double d, double e, double f)
{
Set(a, b, c, d, e, f);
}
#endregion
#region Public methods
public void Idenity()
{
_matrix[0, 0] = 1.0;
_matrix[0, 1] = 0.0;
_matrix[0, 2] = 0.0;
_matrix[1, 0] = 0.0;
_matrix[1, 1] = 1.0;
_matrix[1, 2] = 0.0;
_matrix[2, 0] = 0.0;
_matrix[2, 1] = 0.0;
_matrix[2, 2] = 1.0;
}
public void Set(double a, double b, double c, double d, double e, double f)
{
_matrix[0, 0] = a;
_matrix[1, 0] = b;
_matrix[2, 0] = 0;
_matrix[0, 1] = c;
_matrix[1, 1] = d;
_matrix[2, 1] = 0;
_matrix[0, 2] = e;
_matrix[1, 2] = f;
_matrix[2, 2] = 1;
}
public Vector3D Multiply(Vector3D vect)
{
Vector3D vectResult = new Vector3D();
vectResult.Vector[0] = (vect.Vector[0] * _matrix[0, 0]) + (vect.Vector[1] * _matrix[0, 1]) + (vect.Vector[2] * _matrix[0, 2]);
vectResult.Vector[1] = (vect.Vector[0] * _matrix[1, 0]) + (vect.Vector[1] * _matrix[1, 1]) + (vect.Vector[2] * _matrix[1, 2]);
vectResult.Vector[2] = (vect.Vector[0] * _matrix[2, 0]) + (vect.Vector[1] * _matrix[2, 1]) + (vect.Vector[2] * _matrix[2, 2]);
return vectResult;
}
public Matrix3x3 Multiply(Matrix3x3 matrix)
{
Matrix3x3 newMatrix = new Matrix3x3();
newMatrix._matrix[0, 0] = (_matrix[0, 0] * matrix._matrix[0, 0]) + (_matrix[1, 0] * matrix._matrix[0, 1]) + (_matrix[2, 0] * matrix._matrix[0, 2]);
newMatrix._matrix[0, 1] = (_matrix[0, 1] * matrix._matrix[0, 0]) + (_matrix[1, 1] * matrix._matrix[0, 1]) + (_matrix[2, 1] * matrix._matrix[0, 2]);
newMatrix._matrix[0, 2] = (_matrix[0, 2] * matrix._matrix[0, 0]) + (_matrix[1, 2] * matrix._matrix[0, 1]) + (_matrix[2, 2] * matrix._matrix[0, 2]);
newMatrix._matrix[1, 0] = (_matrix[0, 0] * matrix._matrix[1, 0]) + (_matrix[1, 0] * matrix._matrix[1, 1]) + (_matrix[2, 0] * matrix._matrix[1, 2]);
newMatrix._matrix[1, 1] = (_matrix[0, 1] * matrix._matrix[1, 0]) + (_matrix[1, 1] * matrix._matrix[1, 1]) + (_matrix[2, 1] * matrix._matrix[1, 2]);
newMatrix._matrix[1, 2] = (_matrix[0, 2] * matrix._matrix[1, 0]) + (_matrix[1, 2] * matrix._matrix[1, 1]) + (_matrix[2, 2] * matrix._matrix[1, 2]);
newMatrix._matrix[2, 0] = (_matrix[0, 0] * matrix._matrix[2, 0]) + (_matrix[1, 0] * matrix._matrix[2, 1]) + (_matrix[2, 0] * matrix._matrix[2, 2]);
newMatrix._matrix[2, 1] = (_matrix[0, 1] * matrix._matrix[2, 0]) + (_matrix[1, 1] * matrix._matrix[2, 1]) + (_matrix[2, 1] * matrix._matrix[2, 2]);
newMatrix._matrix[2, 2] = (_matrix[0, 2] * matrix._matrix[2, 0]) + (_matrix[1, 2] * matrix._matrix[2, 1]) + (_matrix[2, 2] * matrix._matrix[2, 2]);
return newMatrix;
}
public Matrix3x3 Copy()
{
Matrix3x3 newMatrix = new Matrix3x3();
newMatrix._matrix[0, 0] = _matrix[0, 0];
newMatrix._matrix[0, 1] = _matrix[0, 1];
newMatrix._matrix[0, 2] = _matrix[0, 2];
newMatrix._matrix[1, 0] = _matrix[1, 0];
newMatrix._matrix[1, 1] = _matrix[1, 1];
newMatrix._matrix[1, 2] = _matrix[1, 2];
newMatrix._matrix[2, 0] = _matrix[2, 0];
newMatrix._matrix[2, 1] = _matrix[2, 1];
newMatrix._matrix[2, 2] = _matrix[2, 2];
return newMatrix;
}
public bool IsCollinear(Matrix3x3 otherMatrix, double horizontalDelta = 0.00001, double verticalDelta = 0.00001)
{
double epsilon = 0.00001;
return (
Math.Abs(_matrix[0, 0] - otherMatrix.Matrix[0, 0]) <= epsilon &&
Math.Abs(_matrix[1, 0] - otherMatrix.Matrix[1, 0]) <= epsilon &&
Math.Abs(_matrix[0, 1] - otherMatrix.Matrix[0, 1]) <= epsilon &&
Math.Abs(_matrix[1, 1] - otherMatrix.Matrix[1, 1]) <= epsilon &&
Math.Abs(_matrix[0, 2] - otherMatrix.Matrix[0, 2]) <= horizontalDelta &&
Math.Abs(_matrix[1, 2] - otherMatrix.Matrix[1, 2]) <= verticalDelta &&
true);
}
#endregion
}
}
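
Review note on `Matrix3x3`: given where `Set(a, b, c, d, e, f)` stores its arguments and how `Multiply(Vector3D)` combines them, a point (x, y, 1) maps to x' = a*x + c*y + e and y' = b*x + d*y + f, with the third component left at 1. The standalone sketch below restates that arithmetic without the class. `AffineSketch` and `Transform` are hypothetical names used only for this illustration.

```csharp
using System;

// Hypothetical helper, not part of VAR.PdfTools.
public static class AffineSketch
{
    // Applies the six-number matrix [a b c d e f] to the point (x, y),
    // matching the element layout used by Matrix3x3.Set and Multiply(Vector3D).
    public static (double X, double Y) Transform(
        double a, double b, double c, double d, double e, double f,
        double x, double y)
    {
        double xOut = (x * a) + (y * c) + e; // row 0: [a c e]
        double yOut = (x * b) + (y * d) + f; // row 1: [b d f]
        return (xOut, yOut);
    }
}
```

For example, scaling by 2 and translating by (10, 20) sends the point (3, 4) to (16, 28): `AffineSketch.Transform(2, 0, 0, 2, 10, 20, 3, 4)`.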

@@ -0,0 +1,19 @@
namespace VAR.PdfTools.Maths
{
public class Rect
{
public double XMin { get; set; }
public double XMax { get; set; }
public double YMin { get; set; }
public double YMax { get; set; }
public void Add(Rect rect)
{
if (rect.XMax > XMax) { XMax = rect.XMax; }
if (rect.YMax > YMax) { YMax = rect.YMax; }
if (rect.XMin < XMin) { XMin = rect.XMin; }
if (rect.YMin < YMin) { YMin = rect.YMin; }
}
}
}
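
Review note on `Rect.Add`: the method grows the receiver until it encloses the argument. Because the four auto-properties default to 0, calling `Add` on a freshly constructed `Rect` also keeps the origin inside the result. One way to avoid that is to seed the bounds from the first rectangle, sketched below. `Box` is a hypothetical stand-in that mirrors `Rect` so the example is self-contained, and `BoundsSketch.Union` is likewise an assumed name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in mirroring VAR.PdfTools.Maths.Rect.
public class Box
{
    public double XMin { get; set; }
    public double XMax { get; set; }
    public double YMin { get; set; }
    public double YMax { get; set; }

    // Same comparisons as Rect.Add in the diff above.
    public void Add(Box other)
    {
        if (other.XMax > XMax) { XMax = other.XMax; }
        if (other.YMax > YMax) { YMax = other.YMax; }
        if (other.XMin < XMin) { XMin = other.XMin; }
        if (other.YMin < YMin) { YMin = other.YMin; }
    }
}

public static class BoundsSketch
{
    // Returns the smallest box enclosing all inputs, or null for an empty sequence.
    public static Box Union(IEnumerable<Box> boxes)
    {
        Box result = null;
        foreach (Box box in boxes)
        {
            if (result == null)
            {
                // Seed from the first box so (0, 0) is not silently included.
                result = new Box { XMin = box.XMin, XMax = box.XMax, YMin = box.YMin, YMax = box.YMax };
                continue;
            }
            result.Add(box);
        }
        return result;
    }
}
```

Combining a box spanning x 10..20, y 5..8 with one spanning x 15..30, y 6..12 gives x 10..30, y 5..12, whereas starting from `new Box()` would pull both minimums down to 0.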

@@ -0,0 +1,33 @@
namespace VAR.PdfTools.Maths
{
public class Vector3D
{
#region Declarations
public double[] _vector = new double[3];
#endregion
#region Properties
public double[] Vector { get { return _vector; } }
#endregion
#region Creator
public Vector3D()
{
Init();
}
public void Init()
{
_vector[0] = 0.0;
_vector[1] = 0.0;
_vector[2] = 1.0;
}
#endregion
}
}

@@ -1,4 +1,5 @@
using System.Collections.Generic; using System.Collections.Generic;
using VAR.PdfTools.PdfElements;
namespace VAR.PdfTools namespace VAR.PdfTools
{ {

@@ -2,6 +2,7 @@
using System.Collections.Generic; using System.Collections.Generic;
using System.IO; using System.IO;
using System.Linq; using System.Linq;
using VAR.PdfTools.PdfElements;
namespace VAR.PdfTools namespace VAR.PdfTools
{ {
@@ -37,26 +38,50 @@ namespace VAR.PdfTools
private static void ApplyFilterToStream(PdfStream stream, string filter) private static void ApplyFilterToStream(PdfStream stream, string filter)
{ {
if (filter == "FlateDecode") if(filter == "ASCIIHexDecode")
{
// TODO: Implement ASCIIHexDecode Filter
}
else if (filter == "ASCII85Decode" || filter == "A85")
{
// TODO: Implement ASCII85Decode Filter
}
else if (filter == "LZWDecode")
{
// TODO: Implement LZWDecode Filter
}
else if (filter == "FlateDecode")
{ {
byte[] decodedStreamData = PdfFilters.FlateDecode.Decode(stream.Data); byte[] decodedStreamData = PdfFilters.FlateDecode.Decode(stream.Data);
stream.Data = decodedStreamData; stream.Data = decodedStreamData;
} }
else if (filter == "ASCII85Decode" || filter == "A85") else if (filter == "RunLengthDecode")
{ {
// FIXME: Implement this filter // TODO: Implement RunLengthDecode Filter
} }
else if (filter == "CCITTFaxDecode") else if (filter == "CCITTFaxDecode")
{ {
// FIXME: Implement this filter // TODO: Implement CCITTFaxDecode Filter
}
else if (filter == "JBIG2Decode")
{
// TODO: Implement JBIG2Decode Filter
} }
else if (filter == "DCTDecode") else if (filter == "DCTDecode")
{ {
// FIXME: Implement this filter // TODO: Implement DCTDecode Filter
}
else if (filter == "JPXDecode")
{
// TODO: Implement JPXDecode Filter
}
else if (filter == "Crypt")
{
// TODO: Implement Crypt Filter
} }
else else
{ {
// FIXME: Implement the rest of filters // TODO: Handle unknown filters
} }
} }

@@ -1,5 +1,6 @@
using System; using System;
using System.Collections.Generic; using System.Collections.Generic;
using VAR.PdfTools.PdfElements;
namespace VAR.PdfTools namespace VAR.PdfTools
{ {
@@ -68,7 +69,8 @@ namespace VAR.PdfTools
{ {
PdfParser parser = new PdfParser(_content); PdfParser parser = new PdfParser(_content);
_contentActions = parser.ParseContent(); _contentActions = parser.ParseContent();
}else }
else
{ {
_contentActions = new List<PdfContentAction>(); _contentActions = new List<PdfContentAction>();
} }

@@ -1,202 +0,0 @@
using System.Collections.Generic;
using System.IO;
namespace VAR.PdfTools
{
public enum PdfElementTypes
{
Undefined,
Boolean,
Integer,
Real,
String,
Name,
Array,
Dictionary,
Null,
ObjectReference,
Object,
Stream,
};
public interface IPdfElement
{
PdfElementTypes Type { get; }
}
public class PdfBoolean : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Boolean; } }
public bool Value { get; set; }
}
public class PdfInteger : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Integer; } }
public long Value { get; set; }
}
public class PdfReal : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Real; } }
public double Value { get; set; }
}
public class PdfString : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.String; } }
public string Value { get; set; }
}
public class PdfName : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Name; } }
public string Value { get; set; }
}
public class PdfArray : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Array; } }
private List<IPdfElement> _values = new List<IPdfElement>();
public List<IPdfElement> Values { get { return _values; } }
}
public class PdfDictionary : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Dictionary; } }
private Dictionary<string, IPdfElement> _values = new Dictionary<string, IPdfElement>();
public Dictionary<string, IPdfElement> Values { get { return _values; } }
public string GetParamAsString(string name)
{
if (Values.ContainsKey(name) == false) { return null; }
IPdfElement value = Values[name];
if (value is PdfArray)
{
value = ((PdfArray)value).Values[0];
}
if (value is PdfName)
{
return ((PdfName)value).Value;
}
if (value is PdfString)
{
return ((PdfString)value).Value;
}
return null;
}
public long? GetParamAsInt(string name)
{
if (Values.ContainsKey(name) == false) { return null; }
IPdfElement value = Values[name];
if (value is PdfArray)
{
value = ((PdfArray)value).Values[0];
}
if (value is PdfInteger)
{
return ((PdfInteger)value).Value;
}
return null;
}
public byte[] GetParamAsStream(string name)
{
if (Values.ContainsKey(name) == false) { return null; }
IPdfElement value = Values[name];
if (value is PdfArray)
{
PdfArray array = value as PdfArray;
MemoryStream memStream = new MemoryStream();
foreach (IPdfElement elem in array.Values)
{
PdfStream stream = elem as PdfStream;
if (stream == null) { continue; }
memStream.Write(stream.Data, 0, stream.Data.Length);
}
if (memStream.Length > 0)
{
return memStream.ToArray();
}
return null;
}
if (value is PdfStream)
{
return ((PdfStream)value).Data;
}
return null;
}
}
public class PdfNull : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Null; } }
}
public class PdfObjectReference : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.ObjectReference; } }
public int ObjectID { get; set; }
public int ObjectGeneration { get; set; }
}
public class PdfStream : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Stream; } }
public PdfDictionary Dictionary { get; set; }
public byte[] Data { get; set; }
public byte[] OriginalData { get; set; }
public IPdfElement OriginalFilter { get; set; }
}
public class PdfObject : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Object; } }
public int ObjectID { get; set; }
public int ObjectGeneration { get; set; }
public IPdfElement Data { get; set; }
public int UsageCount { get; set; }
}
public static class PdfElementUtils
{
public static double GetReal(IPdfElement elem, double defaultValue)
{
if (elem == null)
{
return defaultValue;
}
if (elem is PdfInteger)
{
return ((PdfInteger)elem).Value;
}
if (elem is PdfReal)
{
return ((PdfReal)elem).Value;
}
return defaultValue;
}
public static long GetInt(IPdfElement elem, long defaultValue)
{
if (elem == null)
{
return defaultValue;
}
if (elem is PdfInteger)
{
return ((PdfInteger)elem).Value;
}
if (elem is PdfReal)
{
return (long)((PdfReal)elem).Value;
}
return defaultValue;
}
}
}

@@ -0,0 +1,7 @@
namespace VAR.PdfTools.PdfElements
{
public interface IPdfElement
{
PdfElementTypes Type { get; }
}
}

@@ -0,0 +1,11 @@
using System.Collections.Generic;
namespace VAR.PdfTools.PdfElements
{
public class PdfArray : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Array; } }
private List<IPdfElement> _values = new List<IPdfElement>();
public List<IPdfElement> Values { get { return _values; } }
}
}

@@ -0,0 +1,8 @@
namespace VAR.PdfTools.PdfElements
{
public class PdfBoolean : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Boolean; } }
public bool Value { get; set; }
}
}

@@ -0,0 +1,77 @@
using System.Collections.Generic;
using System.IO;
namespace VAR.PdfTools.PdfElements
{
public class PdfDictionary : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Dictionary; } }
private Dictionary<string, IPdfElement> _values = new Dictionary<string, IPdfElement>();
public Dictionary<string, IPdfElement> Values { get { return _values; } }
public string GetParamAsString(string name)
{
if (Values.ContainsKey(name) == false) { return null; }
IPdfElement value = Values[name];
if (value is PdfArray)
{
value = ((PdfArray)value).Values[0];
}
if (value is PdfName)
{
return ((PdfName)value).Value;
}
if (value is PdfString)
{
return ((PdfString)value).Value;
}
return null;
}
public long? GetParamAsInt(string name)
{
if (Values.ContainsKey(name) == false) { return null; }
IPdfElement value = Values[name];
if (value is PdfArray)
{
value = ((PdfArray)value).Values[0];
}
if (value is PdfInteger)
{
return ((PdfInteger)value).Value;
}
return null;
}
public byte[] GetParamAsStream(string name)
{
if (Values.ContainsKey(name) == false) { return null; }
IPdfElement value = Values[name];
if (value is PdfArray)
{
PdfArray array = value as PdfArray;
MemoryStream memStream = new MemoryStream();
foreach (IPdfElement elem in array.Values)
{
PdfStream stream = elem as PdfStream;
if (stream == null) { continue; }
memStream.Write(stream.Data, 0, stream.Data.Length);
}
if (memStream.Length > 0)
{
return memStream.ToArray();
}
return null;
}
if (value is PdfStream)
{
return ((PdfStream)value).Data;
}
return null;
}
}
}
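
The three `GetParamAs…` helpers above share one lookup pattern: fetch the entry, unwrap the first element when the value is an array, then convert according to the element type. The sketch below illustrates that pattern in isolation. The types `IElem`, `NameElem`, `StringElem`, `ArrayElem` and the helper `DictLookup.GetString` are hypothetical stand-ins, not part of VAR.PdfTools, and the empty-array guard is an assumption that the code in this diff does not make.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for IPdfElement, PdfName, PdfString and PdfArray.
public interface IElem { }
public class NameElem : IElem { public string Value { get; set; } }
public class StringElem : IElem { public string Value { get; set; } }
public class ArrayElem : IElem
{
    public List<IElem> Values { get; } = new List<IElem>();
}

public static class DictLookup
{
    // Same shape as PdfDictionary.GetParamAsString: missing key -> null,
    // array -> first element, name or string -> its text, anything else -> null.
    public static string GetString(Dictionary<string, IElem> values, string name)
    {
        IElem value;
        if (values.TryGetValue(name, out value) == false) { return null; }

        ArrayElem array = value as ArrayElem;
        if (array != null)
        {
            // Assumption: an empty array is treated as "no value" instead of throwing.
            if (array.Values.Count == 0) { return null; }
            value = array.Values[0];
        }

        NameElem nameElem = value as NameElem;
        if (nameElem != null) { return nameElem.Value; }

        StringElem stringElem = value as StringElem;
        if (stringElem != null) { return stringElem.Value; }

        return null;
    }
}
```

`TryGetValue` performs a single dictionary lookup where `ContainsKey` followed by the indexer performs two; the observable result is the same for keys that are present or absent.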

@@ -0,0 +1,18 @@
namespace VAR.PdfTools.PdfElements
{
public enum PdfElementTypes
{
Undefined,
Boolean,
Integer,
Real,
String,
Name,
Array,
Dictionary,
Null,
ObjectReference,
Object,
Stream,
};
}

@@ -0,0 +1,56 @@
namespace VAR.PdfTools.PdfElements
{
public static class PdfElementUtils
{
public static double GetReal(IPdfElement elem, double defaultValue)
{
if (elem == null)
{
return defaultValue;
}
if (elem is PdfInteger)
{
return ((PdfInteger)elem).Value;
}
if (elem is PdfReal)
{
return ((PdfReal)elem).Value;
}
return defaultValue;
}
public static long GetInt(IPdfElement elem, long defaultValue)
{
if (elem == null)
{
return defaultValue;
}
if (elem is PdfInteger)
{
return ((PdfInteger)elem).Value;
}
if (elem is PdfReal)
{
return (long)((PdfReal)elem).Value;
}
return defaultValue;
}
public static string GetString(IPdfElement elem, string defaultValue)
{
if (elem == null)
{
return defaultValue;
}
if (elem is PdfString)
{
return ((PdfString)elem).Value;
}
if (elem is PdfName)
{
return ((PdfName)elem).Value;
}
return defaultValue;
}
}
}
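
The coercion rules in `PdfElementUtils` are easy to misread, so here is a compact sketch of the numeric half. It assumes stand-in element types declared inline (they copy only the `Value` property of the real classes), and `NumericCoercion` is a hypothetical name used for illustration. The behaviour to notice is that `GetInt` truncates a real toward zero and that any non-numeric element falls back to the caller's default.

```csharp
// Stand-ins that mirror the shape of the PdfElements types in this diff.
public interface IPdfElement { }
public class PdfInteger : IPdfElement { public long Value { get; set; } }
public class PdfReal : IPdfElement { public double Value { get; set; } }
public class PdfName : IPdfElement { public string Value { get; set; } }

public static class NumericCoercion
{
    // Integer or real -> double; null or any other element -> defaultValue.
    public static double GetReal(IPdfElement elem, double defaultValue)
    {
        PdfInteger integer = elem as PdfInteger;
        if (integer != null) { return integer.Value; }

        PdfReal real = elem as PdfReal;
        if (real != null) { return real.Value; }

        return defaultValue;
    }

    // Integer or real -> long; the cast truncates toward zero (2.9 becomes 2).
    public static long GetInt(IPdfElement elem, long defaultValue)
    {
        PdfInteger integer = elem as PdfInteger;
        if (integer != null) { return integer.Value; }

        PdfReal real = elem as PdfReal;
        if (real != null) { return (long)real.Value; }

        return defaultValue;
    }
}
```

Because `as` yields `null` for a `null` input, the explicit null check of the original becomes unnecessary in this form.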

@@ -0,0 +1,8 @@
namespace VAR.PdfTools.PdfElements
{
public class PdfInteger : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Integer; } }
public long Value { get; set; }
}
}

@@ -0,0 +1,8 @@
namespace VAR.PdfTools.PdfElements
{
public class PdfName : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Name; } }
public string Value { get; set; }
}
}

@@ -0,0 +1,7 @@
namespace VAR.PdfTools.PdfElements
{
public class PdfNull : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Null; } }
}
}

@@ -0,0 +1,11 @@
namespace VAR.PdfTools.PdfElements
{
public class PdfObject : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Object; } }
public int ObjectID { get; set; }
public int ObjectGeneration { get; set; }
public IPdfElement Data { get; set; }
public int UsageCount { get; set; }
}
}

@@ -0,0 +1,9 @@
namespace VAR.PdfTools.PdfElements
{
public class PdfObjectReference : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.ObjectReference; } }
public int ObjectID { get; set; }
public int ObjectGeneration { get; set; }
}
}

@@ -0,0 +1,8 @@
namespace VAR.PdfTools.PdfElements
{
public class PdfReal : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Real; } }
public double Value { get; set; }
}
}

@@ -0,0 +1,12 @@
namespace VAR.PdfTools.PdfElements
{
public class PdfStream : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.Stream; } }
public PdfDictionary Dictionary { get; set; }
public byte[] Data { get; set; }
public byte[] OriginalData { get; set; }
public IPdfElement OriginalFilter { get; set; }
}
}

@@ -0,0 +1,8 @@
namespace VAR.PdfTools.PdfElements
{
public class PdfString : IPdfElement
{
public PdfElementTypes Type { get { return PdfElementTypes.String; } }
public string Value { get; set; }
}
}

@@ -1,5 +1,5 @@
using System; using System.Collections.Generic;
using System.Collections.Generic; using VAR.PdfTools.PdfElements;
namespace VAR.PdfTools namespace VAR.PdfTools
{ {
@@ -45,6 +45,19 @@ namespace VAR.PdfTools
_tainted = true; _tainted = true;
} }
PrepareSizes(baseData);
}
#endregion
#region Private methods
private void PrepareSizes(PdfDictionary baseData)
{
// Set "Times-Roman" as default basefont sizes
_widths = PdfStandar14FontMetrics.Times_Roman.Widths;
_height = PdfStandar14FontMetrics.Times_Roman.ApproxHeight;
if (baseData.Values.ContainsKey("ToUnicode")) if (baseData.Values.ContainsKey("ToUnicode"))
{ {
byte[] toUnicodeStream = ((PdfStream)baseData.Values["ToUnicode"]).Data; byte[] toUnicodeStream = ((PdfStream)baseData.Values["ToUnicode"]).Data;
@@ -52,9 +65,21 @@ namespace VAR.PdfTools
_toUnicode = parser.ParseToUnicode(); _toUnicode = parser.ParseToUnicode();
} }
string baseFont = _baseData.GetParamAsString("BaseFont");
if (string.IsNullOrEmpty(baseFont) == false)
{
SetBaseFontSizes(baseFont);
}
if (_baseData.Values.ContainsKey("FirstChar") && _baseData.Values.ContainsKey("LastChar") && _baseData.Values.ContainsKey("Widths")) if (_baseData.Values.ContainsKey("FirstChar") && _baseData.Values.ContainsKey("LastChar") && _baseData.Values.ContainsKey("Widths"))
{ {
double glyphSpaceToTextSpace = 1000.0; // FIXME: SubType:Type3 Uses a FontMatrix that may not correspond to 1/1000th ParseSizes();
}
}
private void ParseSizes()
{
double glyphSpaceToTextSpace = 1000.0; // TODO: PdfFont.ParseSizes: SubType:Type3 Uses a FontMatrix that may not correspond to 1/1000th
_widths = new Dictionary<char, double>(); _widths = new Dictionary<char, double>();
char firstChar = (char)_baseData.GetParamAsInt("FirstChar"); char firstChar = (char)_baseData.GetParamAsInt("FirstChar");
char lastChar = (char)_baseData.GetParamAsInt("LastChar"); char lastChar = (char)_baseData.GetParamAsInt("LastChar");
@@ -62,26 +87,16 @@ namespace VAR.PdfTools
char actualChar = firstChar; char actualChar = firstChar;
foreach (IPdfElement elem in widths.Values) foreach (IPdfElement elem in widths.Values)
{ {
PdfReal widthReal = elem as PdfReal; double width = PdfElementUtils.GetReal(elem, 500);
if (widthReal != null) if (width < 0.0001f && width > -0.0001f) { width = 500; }
{ _widths.Add(actualChar, width / glyphSpaceToTextSpace);
_widths.Add(actualChar, widthReal.Value / glyphSpaceToTextSpace);
actualChar++; actualChar++;
continue;
}
PdfInteger widthInt = elem as PdfInteger;
if (widthInt != null)
{
_widths.Add(actualChar, widthInt.Value / glyphSpaceToTextSpace);
actualChar++;
continue;
}
} }
// FIXME: Calculate real height // FIXME: Calculate real height
} }
else
private void SetBaseFontSizes(string baseFont)
{ {
string baseFont = _baseData.GetParamAsString("BaseFont");
if (baseFont == "Times-Roman") if (baseFont == "Times-Roman")
{ {
_widths = PdfStandar14FontMetrics.Times_Roman.Widths; _widths = PdfStandar14FontMetrics.Times_Roman.Widths;
@@ -153,7 +168,6 @@ namespace VAR.PdfTools
_height = PdfStandar14FontMetrics.ZapfDingbats.ApproxHeight; _height = PdfStandar14FontMetrics.ZapfDingbats.ApproxHeight;
} }
} }
}
#endregion #endregion
@@ -163,7 +177,7 @@ namespace VAR.PdfTools
{ {
if (_toUnicode == null) if (_toUnicode == null)
{ {
// FIXME: use standard tables // TODO: PdfFont.ToUnicode: use standard tables
return new string(character, 1); return new string(character, 1);
} }
@@ -177,15 +191,23 @@ namespace VAR.PdfTools
public double GetCharWidth(char character) public double GetCharWidth(char character)
{ {
double charWidth = 0;
if (_widths == null) if (_widths == null)
{ {
return 0; return charWidth;
} }
if (_widths.ContainsKey(character)) if (_widths.ContainsKey(character))
{ {
return _widths[character]; charWidth = _widths[character];
} }
return 0;
// NOTE: Convert "Zero" to default width of 0.5
if (charWidth <= 0.0001)
{
charWidth = 0.5;
}
return charWidth;
} }
#endregion #endregion
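
The width handling in this hunk boils down to two rules: raw `Widths` entries are divided by 1000 to go from glyph space to text space (with a near-zero entry replaced by 500), and a lookup that yields nothing usable falls back to 0.5. The sketch below restates those rules as pure functions so they can be checked in isolation. `FontWidthSketch` and its method names are hypothetical; the raw widths are passed as plain doubles instead of PDF elements, and the 1/1000 factor is the same simplification the TODO in the diff flags for Type3 fonts.

```csharp
using System.Collections.Generic;

public static class FontWidthSketch
{
    // Assumption: glyph space is 1/1000 of text space (not guaranteed for Type3 fonts).
    private const double GlyphSpaceToTextSpace = 1000.0;
    private const double DefaultGlyphWidth = 500.0;

    // Maps consecutive character codes, starting at firstChar, to text-space widths.
    public static Dictionary<char, double> BuildWidths(char firstChar, IEnumerable<double> rawWidths)
    {
        var widths = new Dictionary<char, double>();
        char current = firstChar;
        foreach (double raw in rawWidths)
        {
            double width = raw;
            // A (near) zero entry is replaced by the default glyph width.
            if (width < 0.0001 && width > -0.0001) { width = DefaultGlyphWidth; }
            // The indexer overwrites duplicates, where Dictionary.Add would throw.
            widths[current] = width / GlyphSpaceToTextSpace;
            current++;
        }
        return widths;
    }

    // Lookup with the same fallbacks as GetCharWidth: no table -> 0,
    // unknown or (near) zero width -> 0.5.
    public static double GetCharWidth(Dictionary<char, double> widths, char character)
    {
        if (widths == null) { return 0; }

        double charWidth;
        if (widths.TryGetValue(character, out charWidth) == false) { charWidth = 0; }
        if (charWidth <= 0.0001) { charWidth = 0.5; }
        return charWidth;
    }
}
```

For example, a `Widths` array of `250, 0, 722` starting at `'A'` gives `'A'` a width of 0.25, `'B'` the default 0.5, and `'C'` approximately 0.722.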

@@ -0,0 +1,210 @@
using System;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;
using VAR.PdfTools.Maths;
namespace VAR.PdfTools
{
public class PdfPageRenderer
{
private PdfDocumentPage _page;
private PdfTextExtractor _pdfTextExtractor;
private Rect _pageRect;
private int _pageWidth;
private int _pageHeight;
private int _scale = 10;
private const int MaxSize = 10000;
public PdfTextExtractor Extractor { get { return _pdfTextExtractor; } }
public PdfPageRenderer(PdfDocumentPage page)
{
_page = page;
_pdfTextExtractor = new PdfTextExtractor(_page);
InitPage();
}
public PdfPageRenderer(PdfTextExtractor pdfTextExtractor)
{
_pdfTextExtractor = pdfTextExtractor;
_page = pdfTextExtractor.Page;
InitPage();
}
private void InitPage()
{
_pageRect = _pdfTextExtractor.GetRect();
_pageWidth = (int)Math.Ceiling(_pageRect.XMax - _pageRect.XMin);
_pageHeight = (int)Math.Ceiling(_pageRect.YMax - _pageRect.YMin);
while ((_pageWidth * _scale) > MaxSize) { _scale--; }
while ((_pageHeight * _scale) > MaxSize) { _scale--; }
if (_scale <= 0) { _scale = 1; }
}
public Bitmap Render()
{
if (_pdfTextExtractor.Elements.Count == 0)
{
// Nothing to render
Bitmap emptyBmp = new Bitmap(100, 200, PixelFormat.Format32bppArgb);
using (Graphics gcEmpty = Graphics.FromImage(emptyBmp))
gcEmpty.Clear(Color.White);
return emptyBmp;
}
// Prepare image
Bitmap bmp = new Bitmap(_pageWidth * _scale, _pageHeight * _scale, PixelFormat.Format32bppArgb);
Graphics gc = Graphics.FromImage(bmp);
gc.Clear(Color.White);
// Draw text elements of the page
using (Pen penTextElem = new Pen(Color.Blue))
using (Pen penCharElem = new Pen(Color.Navy))
{
foreach (PdfTextElement textElement in _pdfTextExtractor.Elements)
{
DrawTextElement(textElement, gc, penTextElem, penCharElem, _scale, _pageHeight, _pageRect.XMin, _pageRect.YMin, Brushes.Black);
}
}
gc.Dispose();
return bmp;
}
public Bitmap RenderColumn(PdfTextElementColumn columnData, Bitmap bmp = null)
{
Graphics gc;
if (bmp == null)
{
bmp = new Bitmap(_pageWidth * _scale, _pageHeight * _scale, PixelFormat.Format32bppArgb);
gc = Graphics.FromImage(bmp);
gc.Clear(Color.White);
}
else
{
gc = Graphics.FromImage(bmp);
}
// Draw text elements of the column header
using (Pen penTextElem = new Pen(Color.Green))
using (Pen penCharElem = new Pen(Color.DarkGreen))
{
DrawTextElement(columnData.HeadTextElement, gc, penTextElem, penCharElem, _scale, _pageHeight, _pageRect.XMin, _pageRect.YMin, Brushes.Olive);
}
// Draw text elements of the column
using (Pen penTextElem = new Pen(Color.Red))
using (Pen penCharElem = new Pen(Color.DarkRed))
{
foreach (PdfTextElement textElement in columnData.Elements)
{
DrawTextElement(textElement, gc, penTextElem, penCharElem, _scale, _pageHeight, _pageRect.XMin, _pageRect.YMin, Brushes.OrangeRed);
}
}
// Draw column extents
using (Pen penColumn = new Pen(Color.Red))
{
float y = (float)(_pageRect.YMax - columnData.Y);
float x1 = (float)(columnData.X1 - _pageRect.XMin);
float x2 = (float)(columnData.X2 - _pageRect.XMin);
gc.DrawLine(penColumn, x1 * _scale, y * _scale, x2 * _scale, y * _scale);
gc.DrawLine(penColumn, x1 * _scale, y * _scale, x1 * _scale, _pageHeight * _scale);
gc.DrawLine(penColumn, x2 * _scale, y * _scale, x2 * _scale, _pageHeight * _scale);
}
gc.Dispose();
return bmp;
}
private static void DrawTextElement(PdfTextElement textElement, Graphics gc, Pen penTextElem, Pen penCharElem, int scale, int pageHeight, double pageXMin, double pageYMin, Brush brushText)
{
if (textElement == null) { return; }
double textElementX = textElement.GetX() - pageXMin;
double textElementY = textElement.GetY() - pageYMin;
double textElementWidth = textElement.VisibleWidth;
double textElementHeight = textElement.VisibleHeight;
string textElementText = textElement.VisibleText;
string textElementFontName = (textElement.Font == null ? string.Empty : textElement.Font.Name);
if (textElementHeight < 0.0001) { return; }
double textElementPageX = textElementX;
double textElementPageY = pageHeight - textElementY;
if (penTextElem != null)
{
DrawRoundedRectangle(gc, penTextElem,
(int)(textElementPageX * scale),
(int)(textElementPageY * scale),
(int)(textElementWidth * scale),
(int)(textElementHeight * scale),
5);
}
using (Font font = new Font("Arial", (int)(textElementHeight * scale), GraphicsUnit.Pixel))
{
foreach (PdfCharElement c in textElement.Characters)
{
gc.DrawString(c.Char,
font,
brushText,
(int)((textElementPageX + c.Displacement) * scale),
(int)(textElementPageY * scale));
if (penCharElem != null)
{
DrawRoundedRectangle(gc, penCharElem,
(int)((textElementPageX + c.Displacement) * scale),
(int)(textElementPageY * scale),
(int)(c.Width * scale),
(int)(textElementHeight * scale),
5);
}
}
}
}
public static GraphicsPath RoundedRect(int x, int y, int width, int height, int radius)
{
int diameter = radius * 2;
Size size = new Size(diameter, diameter);
Rectangle arc = new Rectangle(x, y, diameter, diameter);
GraphicsPath path = new GraphicsPath();
// top left arc
path.AddArc(arc, 180, 90);
// top right arc
arc.X = (x + width) - diameter;
path.AddArc(arc, 270, 90);
// bottom right arc
arc.Y = (y + height) - diameter;
path.AddArc(arc, 0, 90);
// bottom left arc
arc.X = x;
path.AddArc(arc, 90, 90);
path.CloseFigure();
return path;
}
public static void DrawRoundedRectangle(Graphics graphics, Pen pen, int x, int y, int width, int height, int cornerRadius)
{
if (graphics == null)
throw new ArgumentNullException("graphics");
if (pen == null)
throw new ArgumentNullException("pen");
using (GraphicsPath path = RoundedRect(x, y, width, height, cornerRadius))
{
graphics.DrawPath(pen, path);
}
}
}
}
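
Two small calculations carry most of the geometry in `PdfPageRenderer`: choosing an integer scale that keeps the bitmap under `MaxSize`, and flipping the Y axis, because PDF coordinates grow upward while bitmap rows grow downward. The sketch below isolates both without any `System.Drawing` dependency. `RenderScaleSketch` and its method names are hypothetical, and the two `while` loops of `InitPage` are condensed into a single loop that gives the same result.

```csharp
using System;

public static class RenderScaleSketch
{
    // Largest scale in [1, startScale] for which both bitmap dimensions
    // stay within maxSize; never returns less than 1.
    public static int ComputeScale(double pageWidth, double pageHeight, int startScale, int maxSize)
    {
        int width = (int)Math.Ceiling(pageWidth);
        int height = (int)Math.Ceiling(pageHeight);
        int scale = startScale;
        while (scale > 1 && (width * scale > maxSize || height * scale > maxSize))
        {
            scale--;
        }
        return Math.Max(scale, 1);
    }

    // Converts a PDF Y coordinate (origin at the bottom of the page) into a
    // top-down Y coordinate in page units, before scaling.
    public static double ToBitmapY(double pdfY, double pageYMin, int pageHeight)
    {
        return pageHeight - (pdfY - pageYMin);
    }
}
```

A US Letter page (612 by 792 points) keeps the starting scale of 10, while a page 1500 units wide drops to a scale of 6, since 1500 * 7 already exceeds 10000.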

@@ -4,6 +4,7 @@ using System.Globalization;
using System.IO; using System.IO;
using System.Linq; using System.Linq;
using System.Text; using System.Text;
using VAR.PdfTools.PdfElements;
namespace VAR.PdfTools namespace VAR.PdfTools
{ {
@@ -851,7 +852,7 @@ namespace VAR.PdfTools
string token = ParseToken(); string token = ParseToken();
if (token == "startxref") if (token == "startxref")
{ {
// FIXME: Ignoring startxref for now // TODO: PdfParser: Ignoring startxref for now
SkipEndOfLine(); SkipEndOfLine();
SkipToEndOfLine(); SkipToEndOfLine();
SkipEndOfLine(); SkipEndOfLine();
@@ -862,7 +863,7 @@ namespace VAR.PdfTools
} }
if (token == "xref") if (token == "xref")
{ {
// FIXME: Ignoring xref for now // TODO: PdfParser: Ignoring xref for now
SkipToEndOfLine(); SkipToEndOfLine();
SkipEndOfLine(); SkipEndOfLine();
do do
@@ -890,7 +891,7 @@ namespace VAR.PdfTools
} }
if (token == "trailer") if (token == "trailer")
{ {
// FIXME: Ignoring trailer for now // TODO: PdfParser: Ignoring trailer for now
SkipEndOfLine(); SkipEndOfLine();
ParseElement(); ParseElement();
SkipWhitespace(); SkipWhitespace();
@@ -1021,6 +1022,20 @@ namespace VAR.PdfTools
PdfContentAction action = new PdfContentAction(token, elems); PdfContentAction action = new PdfContentAction(token, elems);
elems = new List<IPdfElement>(); elems = new List<IPdfElement>();
actions.Add(action); actions.Add(action);
if (action.Token == "ID")
{
// Embedded inline image
byte lineFeed = 0x0A;
byte carriageReturn = 0x0D;
long distToObject = MeasureToMarkers(new char[][] {
new char[] {(char)lineFeed, 'E', 'I'},
new char[] {(char)carriageReturn, (char)lineFeed, 'E', 'I'},
});
byte[] imageBody = GetRawData(distToObject);
SkipEndOfLine();
string endToken = ParseToken();
action.Parameters.Add(new PdfStream { OriginalData = imageBody, });
}
} }
} while (IsEndOfStream() == false); } while (IsEndOfStream() == false);
return actions; return actions;
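
The inline-image branch added in this hunk treats everything after the `ID` operator as raw bytes up to an end-of-line followed by `EI`. The sketch below shows one way such a marker search could work over a byte array. It is an illustration only: `InlineImageSketch.MeasureToEndMarker` is a hypothetical helper, and the actual behaviour and return value of `MeasureToMarkers` are not visible in this diff. Binary image data can contain the bytes `\nEI` by coincidence, so a search like this is a heuristic.

```csharp
public static class InlineImageSketch
{
    private const byte LineFeed = 0x0A;
    private const byte CarriageReturn = 0x0D;

    // Returns the number of bytes between 'start' and the first "\nEI" or
    // "\r\nEI" marker, or -1 when no marker is found.
    public static int MeasureToEndMarker(byte[] data, int start)
    {
        for (int i = start; i + 2 < data.Length; i++)
        {
            // "\r\nEI": report the position of the carriage return.
            if (data[i] == CarriageReturn && i + 3 < data.Length &&
                data[i + 1] == LineFeed && data[i + 2] == (byte)'E' && data[i + 3] == (byte)'I')
            {
                return i - start;
            }
            // "\nEI": report the position of the line feed.
            if (data[i] == LineFeed && data[i + 1] == (byte)'E' && data[i + 2] == (byte)'I')
            {
                return i - start;
            }
        }
        return -1;
    }
}
```

Checking the carriage-return form first keeps the `\r` out of the measured image body when the stream uses CRLF line endings.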

@@ -0,0 +1,149 @@
using System.Collections.Generic;
using System.Linq;
using VAR.PdfTools.Maths;
namespace VAR.PdfTools
{
public struct PdfCharElement
{
public string Char;
public double Displacement;
public double Width;
}
public class PdfTextElement
{
#region Properties
public PdfFont Font { get; set; }
public double FontSize { get; set; }
public Matrix3x3 Matrix { get; set; }
public string RawText { get; set; }
public string VisibleText { get; set; }
public double VisibleWidth { get; set; }
public double VisibleHeight { get; set; }
public List<PdfCharElement> Characters { get; set; }
public List<PdfTextElement> Childs { get; set; }
#endregion
#region Public methods
public double GetX()
{
return Matrix.Matrix[0, 2];
}
public double GetY()
{
return Matrix.Matrix[1, 2];
}
public PdfTextElement SubPart(int startIndex, int endIndex)
{
PdfTextElement blockElem = new PdfTextElement
{
Font = null,
FontSize = FontSize,
Matrix = Matrix.Copy(),
RawText = RawText.Substring(startIndex, endIndex - startIndex),
VisibleText = VisibleText.Substring(startIndex, endIndex - startIndex),
VisibleWidth = 0,
VisibleHeight = VisibleHeight,
Characters = new List<PdfCharElement>(),
Childs = new List<PdfTextElement>(),
};
double displacement = Characters[startIndex].Displacement;
blockElem.Matrix.Matrix[0, 2] += displacement;
for (int j = startIndex; j < endIndex; j++)
{
blockElem.Characters.Add(new PdfCharElement
{
Char = Characters[j].Char,
Displacement = Characters[j].Displacement - displacement,
Width = Characters[j].Width,
});
}
PdfCharElement lastChar = blockElem.Characters[blockElem.Characters.Count - 1];
blockElem.VisibleWidth = lastChar.Displacement + lastChar.Width;
foreach (PdfTextElement elem in Childs)
{
blockElem.Childs.Add(elem);
}
return blockElem;
}
public double MaxWidth()
{
return Characters.Average(c => c.Width);
}
public Rect GetRect()
{
double x = GetX();
double y = GetY();
return new Rect
{
XMin = x,
YMax = y,
XMax = x + VisibleWidth,
YMin = y - VisibleHeight,
};
}
public double GetCharacterPreviousSpacing(int index)
{
if (index <= 0) { return 0; }
double previousEnd = Characters[index - 1].Displacement + Characters[index - 1].Width;
double spacing = Characters[index].Displacement - previousEnd;
return spacing;
}
public double GetCharacterPrecedingSpacing(int index)
{
if (index >= (Characters.Count - 1)) { return 0; }
double currentEnd = Characters[index].Displacement + Characters[index].Width;
double spacing = Characters[index + 1].Displacement - currentEnd;
return spacing;
}
#endregion
}
public class PdfTextElementColumn
{
public PdfTextElement HeadTextElement { get; private set; }
public IEnumerable<PdfTextElement> Elements { get; private set; }
public double Y { get; private set; }
public double X1 { get; private set; }
public double X2 { get; private set; }
public static PdfTextElementColumn Empty { get; } = new PdfTextElementColumn();
private PdfTextElementColumn()
{
Elements = new List<PdfTextElement>();
}
public PdfTextElementColumn(PdfTextElement head, IEnumerable<PdfTextElement> elements, double y, double x1, double x2)
{
HeadTextElement = head;
Elements = elements;
Y = y;
X1 = x1;
X2 = x2;
}
}
}
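
`SubPart`, `VisibleWidth` and the two spacing getters all rely on the same per-character layout: each character stores a displacement from the start of its text element plus its own width. The sketch below reproduces that arithmetic on a stand-in struct. `CharBox` and `CharBoxSketch` are hypothetical names, and the bounds checks are additions that the methods in this diff do not perform.

```csharp
using System.Collections.Generic;

// Stand-in for PdfCharElement.
public struct CharBox
{
    public string Char;
    public double Displacement;
    public double Width;
}

public static class CharBoxSketch
{
    // Copies the range [startIndex, endIndex) and rebases displacements so
    // the first copied character starts at 0, as SubPart does.
    public static List<CharBox> Slice(List<CharBox> chars, int startIndex, int endIndex)
    {
        var result = new List<CharBox>();
        // Assumption: an invalid or empty range yields an empty list.
        if (startIndex < 0 || endIndex > chars.Count || startIndex >= endIndex) { return result; }

        double origin = chars[startIndex].Displacement;
        for (int i = startIndex; i < endIndex; i++)
        {
            result.Add(new CharBox
            {
                Char = chars[i].Char,
                Displacement = chars[i].Displacement - origin,
                Width = chars[i].Width,
            });
        }
        return result;
    }

    // Width of a run: where the last character ends.
    public static double VisibleWidth(List<CharBox> chars)
    {
        if (chars.Count == 0) { return 0; }
        CharBox last = chars[chars.Count - 1];
        return last.Displacement + last.Width;
    }

    // Gap between the character at 'index' and the one before it.
    public static double GapBefore(List<CharBox> chars, int index)
    {
        if (index <= 0 || index >= chars.Count) { return 0; }
        CharBox previous = chars[index - 1];
        return chars[index].Displacement - (previous.Displacement + previous.Width);
    }
}
```

With characters at displacements 0, 5 and 12 and widths 5, 5 and 4, the gap before the third character is 2, and slicing off the last two characters gives a run of width 11 whose first character sits at displacement 0.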

@@ -1,164 +1,12 @@
using System.Collections.Generic; using System;
using System.Collections.Generic;
using System.Linq; using System.Linq;
using System.Text; using System.Text;
using VAR.PdfTools.Maths;
using VAR.PdfTools.PdfElements;
namespace VAR.PdfTools namespace VAR.PdfTools
{ {
public class Vector3D
{
#region Declarations
public double[] _vector = new double[3];
#endregion
#region Properties
public double[] Vector { get { return _vector; } }
#endregion
#region Creator
public Vector3D()
{
Init();
}
public void Init()
{
_vector[0] = 0.0;
_vector[1] = 0.0;
_vector[2] = 1.0;
}
#endregion
}
public class Matrix3x3
{
#region Declarations
public double[,] _matrix = new double[3, 3];
#endregion
#region Properties
public double[,] Matrix { get { return _matrix; } }
#endregion
#region Creator
public Matrix3x3()
{
Idenity();
}
#endregion
#region Public methods
public void Idenity()
{
_matrix[0, 0] = 1.0;
_matrix[0, 1] = 0.0;
_matrix[0, 2] = 0.0;
_matrix[1, 0] = 0.0;
_matrix[1, 1] = 1.0;
_matrix[1, 2] = 0.0;
_matrix[2, 0] = 0.0;
_matrix[2, 1] = 0.0;
_matrix[2, 2] = 1.0;
}
public Vector3D Multiply(Vector3D vect)
{
Vector3D vectResult = new Vector3D();
vectResult.Vector[0] = (vect.Vector[0] * _matrix[0, 0]) + (vect.Vector[1] * _matrix[0, 1]) + (vect.Vector[2] * _matrix[0, 2]);
vectResult.Vector[1] = (vect.Vector[0] * _matrix[1, 0]) + (vect.Vector[1] * _matrix[1, 1]) + (vect.Vector[2] * _matrix[1, 2]);
vectResult.Vector[2] = (vect.Vector[0] * _matrix[2, 0]) + (vect.Vector[1] * _matrix[2, 1]) + (vect.Vector[2] * _matrix[2, 2]);
return vectResult;
}
public Matrix3x3 Multiply(Matrix3x3 matrix)
{
Matrix3x3 newMatrix = new Matrix3x3();
newMatrix._matrix[0, 0] = (_matrix[0, 0] * matrix._matrix[0, 0]) + (_matrix[1, 0] * matrix._matrix[0, 1]) + (_matrix[2, 0] * matrix._matrix[0, 2]);
newMatrix._matrix[0, 1] = (_matrix[0, 1] * matrix._matrix[0, 0]) + (_matrix[1, 1] * matrix._matrix[0, 1]) + (_matrix[2, 1] * matrix._matrix[0, 2]);
newMatrix._matrix[0, 2] = (_matrix[0, 2] * matrix._matrix[0, 0]) + (_matrix[1, 2] * matrix._matrix[0, 1]) + (_matrix[2, 2] * matrix._matrix[0, 2]);
newMatrix._matrix[1, 0] = (_matrix[0, 0] * matrix._matrix[1, 0]) + (_matrix[1, 0] * matrix._matrix[1, 1]) + (_matrix[2, 0] * matrix._matrix[1, 2]);
newMatrix._matrix[1, 1] = (_matrix[0, 1] * matrix._matrix[1, 0]) + (_matrix[1, 1] * matrix._matrix[1, 1]) + (_matrix[2, 1] * matrix._matrix[1, 2]);
newMatrix._matrix[1, 2] = (_matrix[0, 2] * matrix._matrix[1, 0]) + (_matrix[1, 2] * matrix._matrix[1, 1]) + (_matrix[2, 2] * matrix._matrix[1, 2]);
newMatrix._matrix[2, 0] = (_matrix[0, 0] * matrix._matrix[2, 0]) + (_matrix[1, 0] * matrix._matrix[2, 1]) + (_matrix[2, 0] * matrix._matrix[2, 2]);
newMatrix._matrix[2, 1] = (_matrix[0, 1] * matrix._matrix[2, 0]) + (_matrix[1, 1] * matrix._matrix[2, 1]) + (_matrix[2, 1] * matrix._matrix[2, 2]);
newMatrix._matrix[2, 2] = (_matrix[0, 2] * matrix._matrix[2, 0]) + (_matrix[1, 2] * matrix._matrix[2, 1]) + (_matrix[2, 2] * matrix._matrix[2, 2]);
return newMatrix;
}
public Matrix3x3 Copy()
{
Matrix3x3 newMatrix = new Matrix3x3();
newMatrix._matrix[0, 0] = _matrix[0, 0];
newMatrix._matrix[0, 1] = _matrix[0, 1];
newMatrix._matrix[0, 2] = _matrix[0, 2];
newMatrix._matrix[1, 0] = _matrix[1, 0];
newMatrix._matrix[1, 1] = _matrix[1, 1];
newMatrix._matrix[1, 2] = _matrix[1, 2];
newMatrix._matrix[2, 0] = _matrix[2, 0];
newMatrix._matrix[2, 1] = _matrix[2, 1];
newMatrix._matrix[2, 2] = _matrix[2, 2];
return newMatrix;
}
#endregion
}
public class PdfTextElement
{
#region Properties
public PdfFont Font { get; set; }
public double FontSize { get; set; }
public Matrix3x3 Matrix { get; set; }
public string RawText { get; set; }
public string VisibleText { get; set; }
public double VisibleWidth { get; set; }
public double VisibleHeight { get; set; }
private List<PdfTextElement> _childs = new List<PdfTextElement>();
public List<PdfTextElement> Childs { get { return _childs; } }
#endregion
#region Public methods
public double GetX()
{
return Matrix.Matrix[0, 2];
}
public double GetY()
{
return Matrix.Matrix[1, 2];
}
#endregion
}
public class PdfTextExtractor public class PdfTextExtractor
{ {
#region Declarations #region Declarations
@@ -174,15 +22,17 @@ namespace VAR.PdfTools
// Text state // Text state
private PdfFont _font = null; private PdfFont _font = null;
private double _fontSize = 1; private double _fontSize = 1;
private double _charSpacing = 0;
private double _wordSpacing = 0;
private double _textLeading = 0; private double _textLeading = 0;
// Text object state // Text object state
private bool inText = false; private bool inText = false;
private Matrix3x3 _textMatrix = new Matrix3x3(); private Matrix3x3 _textMatrix = new Matrix3x3();
private Matrix3x3 _textMatrixCurrent = new Matrix3x3();
private StringBuilder _sbText = new StringBuilder(); private StringBuilder _sbText = new StringBuilder();
private double _textWidth = 0; private double _textWidth = 0;
private List<PdfCharElement> _listCharacters = new List<PdfCharElement>();
PdfTextElement _currentTextElement = null;
#endregion #endregion
@@ -199,7 +49,9 @@ namespace VAR.PdfTools
public PdfTextExtractor(PdfDocumentPage page) public PdfTextExtractor(PdfDocumentPage page)
{ {
_page = page; _page = page;
ProcessPage(); ProcessPageContent();
JoinTextElements();
SplitTextElements();
} }
#endregion #endregion
@@ -226,42 +78,26 @@ namespace VAR.PdfTools
PdfTextElement textElem = new PdfTextElement(); PdfTextElement textElem = new PdfTextElement();
textElem.Font = _font; textElem.Font = _font;
textElem.FontSize = _fontSize; textElem.FontSize = _fontSize;
textElem.Matrix = _textMatrix.Multiply(_graphicsMatrix); textElem.Matrix = _textMatrixCurrent.Multiply(_graphicsMatrix);
textElem.RawText = _sbText.ToString(); textElem.RawText = _sbText.ToString();
textElem.VisibleText = PdfString_ToUnicode(textElem.RawText, _font); textElem.VisibleText = PdfString_ToUnicode(textElem.RawText, _font);
textElem.VisibleWidth = _textWidth * textElem.Matrix.Matrix[0, 0]; PdfCharElement lastChar = _listCharacters[_listCharacters.Count - 1];
textElem.VisibleWidth = (lastChar.Displacement + lastChar.Width) * textElem.Matrix.Matrix[0, 0];
textElem.VisibleHeight = (_font.Height * _fontSize) * textElem.Matrix.Matrix[1, 1]; textElem.VisibleHeight = (_font.Height * _fontSize) * textElem.Matrix.Matrix[1, 1];
textElem.Characters = new List<PdfCharElement>();
foreach (PdfCharElement c in _listCharacters)
{
textElem.Characters.Add(new PdfCharElement
{
Char = c.Char,
Displacement = (c.Displacement * textElem.Matrix.Matrix[0, 0]),
Width = (c.Width * textElem.Matrix.Matrix[0, 0]),
});
}
textElem.Childs = new List<PdfTextElement>();
return textElem; return textElem;
} }
private void FlushTextElementSoft()
{
if (_sbText.Length == 0)
{
return;
}
PdfTextElement textElem = BuildTextElement();
if (_currentTextElement == null)
{
_currentTextElement = new PdfTextElement();
_currentTextElement.Font = null;
_currentTextElement.FontSize = -1;
_currentTextElement.Matrix = textElem.Matrix.Copy();
_currentTextElement.RawText = string.Empty;
_currentTextElement.VisibleText = string.Empty;
_currentTextElement.VisibleWidth = 0;
_currentTextElement.VisibleHeight = 0;
}
_currentTextElement.VisibleText += textElem.VisibleText;
_currentTextElement.VisibleWidth += textElem.VisibleWidth;
_currentTextElement.VisibleHeight = System.Math.Max(_currentTextElement.VisibleHeight, textElem.VisibleHeight);
_currentTextElement.Childs.Add(textElem);
_sbText = new StringBuilder();
_textWidth = 0;
}
private void AddTextElement(PdfTextElement textElement) private void AddTextElement(PdfTextElement textElement)
{ {
if (string.IsNullOrEmpty(textElement.VisibleText.Trim())) if (string.IsNullOrEmpty(textElement.VisibleText.Trim()))
@@ -275,27 +111,16 @@ namespace VAR.PdfTools
{ {
if (_sbText.Length == 0) if (_sbText.Length == 0)
{ {
if (_currentTextElement != null) _textWidth = 0;
{
AddTextElement(_currentTextElement);
_currentTextElement = null;
}
return; return;
} }
if (_currentTextElement != null)
{
FlushTextElementSoft();
AddTextElement(_currentTextElement);
_currentTextElement = null;
}
else
{
PdfTextElement textElem = BuildTextElement(); PdfTextElement textElem = BuildTextElement();
AddTextElement(textElem); AddTextElement(textElem);
}
_textMatrixCurrent.Matrix[0, 2] += _textWidth;
_sbText = new StringBuilder(); _sbText = new StringBuilder();
_listCharacters.Clear();
_textWidth = 0; _textWidth = 0;
} }
@@ -330,6 +155,29 @@ namespace VAR.PdfTools
return null; return null;
} }
private List<PdfTextElement> FindElementsContainingText(string text, bool fuzzy)
{
List<PdfTextElement> list = new List<PdfTextElement>();
string matchingText = fuzzy ? SimplifyText(text) : text;
foreach (PdfTextElement elem in _textElements)
{
string elemText = fuzzy ? SimplifyText(elem.VisibleText) : elem.VisibleText;
if (elemText.Contains(matchingText))
{
list.Add(elem);
}
}
return list;
}
private bool TextElementVerticalIntersection(PdfTextElement elem1, double elem2X1, double elem2X2)
{
double elem1X1 = elem1.GetX();
double elem1X2 = elem1.GetX() + elem1.VisibleWidth;
return elem1X2 >= elem2X1 && elem2X2 >= elem1X1;
}
private bool TextElementVerticalIntersection(PdfTextElement elem1, PdfTextElement elem2) private bool TextElementVerticalIntersection(PdfTextElement elem1, PdfTextElement elem2)
{ {
double elem1X1 = elem1.GetX(); double elem1X1 = elem1.GetX();
@@ -359,44 +207,47 @@ namespace VAR.PdfTools
_graphicsMatrixStack.Add(_graphicsMatrix.Copy()); _graphicsMatrixStack.Add(_graphicsMatrix.Copy());
} }
private void OpSetGraphMatrix(double a, double b, double c, double d, double e, double f)
{
_graphicsMatrix.Set(a, b, c, d, e, f);
}
private void OpPopGraphState() private void OpPopGraphState()
{ {
_graphicsMatrix = _graphicsMatrixStack[_graphicsMatrixStack.Count - 1]; _graphicsMatrix = _graphicsMatrixStack[_graphicsMatrixStack.Count - 1];
_graphicsMatrixStack.RemoveAt(_graphicsMatrixStack.Count - 1); _graphicsMatrixStack.RemoveAt(_graphicsMatrixStack.Count - 1);
} }
private void OpSetGraphMatrix(double a, double b, double c, double d, double e, double f)
{
_graphicsMatrix.Matrix[0, 0] = a;
_graphicsMatrix.Matrix[1, 0] = b;
_graphicsMatrix.Matrix[2, 0] = 0;
_graphicsMatrix.Matrix[0, 1] = c;
_graphicsMatrix.Matrix[1, 1] = d;
_graphicsMatrix.Matrix[2, 1] = 0;
_graphicsMatrix.Matrix[0, 2] = e;
_graphicsMatrix.Matrix[1, 2] = f;
_graphicsMatrix.Matrix[2, 2] = 1;
}
private void OpBeginText() private void OpBeginText()
{ {
_textMatrix.Idenity(); _textMatrix.Idenity();
_textMatrixCurrent.Idenity();
inText = true; inText = true;
} }
private void OpEndText() private void OpEndText()
{ {
FlushTextElementSoft(); FlushTextElement();
inText = false; inText = false;
} }
private void OpTextFont(string fontName, double size) private void OpTextFont(string fontName, double size)
{ {
FlushTextElementSoft(); FlushTextElement();
_font = _page.Fonts[fontName]; _font = _page.Fonts[fontName];
_fontSize = size; _fontSize = size;
} }
private void OpTextCharSpacing(double charSpacing)
{
_charSpacing = charSpacing;
}
private void OpTextWordSpacing(double wordSpacing)
{
_wordSpacing = wordSpacing;
}
private void OpTextLeading(double textLeading) private void OpTextLeading(double textLeading)
{ {
_textLeading = textLeading; _textLeading = textLeading;
@@ -409,6 +260,7 @@ namespace VAR.PdfTools
newMatrix.Matrix[0, 2] = x; newMatrix.Matrix[0, 2] = x;
newMatrix.Matrix[1, 2] = y; newMatrix.Matrix[1, 2] = y;
_textMatrix = newMatrix.Multiply(_textMatrix); _textMatrix = newMatrix.Multiply(_textMatrix);
_textMatrixCurrent = _textMatrix.Copy();
} }
private void OpTextLineFeed() private void OpTextLineFeed()
@@ -418,16 +270,10 @@ namespace VAR.PdfTools
private void OpSetTextMatrix(double a, double b, double c, double d, double e, double f) private void OpSetTextMatrix(double a, double b, double c, double d, double e, double f)
{ {
Matrix3x3 newMatrix = new Matrix3x3(a, b, c, d, e, f);
FlushTextElement(); FlushTextElement();
_textMatrix.Matrix[0, 0] = a; _textMatrix = newMatrix;
_textMatrix.Matrix[1, 0] = b; _textMatrixCurrent = _textMatrix.Copy();
_textMatrix.Matrix[2, 0] = 0;
_textMatrix.Matrix[0, 1] = c;
_textMatrix.Matrix[1, 1] = d;
_textMatrix.Matrix[2, 1] = 0;
_textMatrix.Matrix[0, 2] = e;
_textMatrix.Matrix[1, 2] = f;
_textMatrix.Matrix[2, 2] = 1;
} }
private void OpTextPut(string text) private void OpTextPut(string text)
@@ -438,7 +284,12 @@ namespace VAR.PdfTools
{ {
foreach (char c in text) foreach (char c in text)
{ {
_textWidth += _font.GetCharWidth(c) * _fontSize; string realChar = _font.ToUnicode(c);
if (realChar == "\0") { continue; }
double charWidth = _font.GetCharWidth(c) * _fontSize;
_listCharacters.Add(new PdfCharElement { Char = realChar, Displacement = _textWidth, Width = charWidth });
_textWidth += charWidth;
_textWidth += ((c == 0x20) ? _wordSpacing : _charSpacing);
} }
} }
} }
@@ -455,7 +306,7 @@ namespace VAR.PdfTools
else if (elem is PdfInteger || elem is PdfReal) else if (elem is PdfInteger || elem is PdfReal)
{ {
double spacing = PdfElementUtils.GetReal(elem, 0); double spacing = PdfElementUtils.GetReal(elem, 0);
_textWidth += spacing; _textWidth -= (spacing / 1000) * _fontSize;
} }
else if (elem is PdfArray) else if (elem is PdfArray)
{ {
@@ -468,11 +319,17 @@ namespace VAR.PdfTools
#region Private methods #region Private methods
private void ProcessPage() private void ProcessPageContent()
{ {
foreach (PdfContentAction action in _page.ContentActions) int unknowCount = 0;
int lineCount = 0;
int strokeCount = 0;
int pathCount = 0;
for (int i = 0; i < _page.ContentActions.Count; i++)
{ {
// Graphics Operations PdfContentAction action = _page.ContentActions[i];
// Special graphics state
if (action.Token == "q") if (action.Token == "q")
{ {
OpPushGraphState(); OpPushGraphState();
@@ -503,19 +360,21 @@ namespace VAR.PdfTools
} }
else if (action.Token == "Tc") else if (action.Token == "Tc")
{ {
// FIXME: Char spacing double charSpacing = PdfElementUtils.GetReal(action.Parameters[0], 0);
OpTextCharSpacing(charSpacing);
} }
else if (action.Token == "Tw") else if (action.Token == "Tw")
{ {
// FIXME: Word spacing double wordSpacing = PdfElementUtils.GetReal(action.Parameters[0], 0);
OpTextWordSpacing(wordSpacing);
} }
else if (action.Token == "Tz") else if (action.Token == "Tz")
{ {
// FIXME: Horizontal Scale // TODO: PdfTextExtractor: Horizontal Scale
} }
else if (action.Token == "Tf") else if (action.Token == "Tf")
{ {
string fontName = ((PdfName)action.Parameters[0]).Value; string fontName = PdfElementUtils.GetString(action.Parameters[0], string.Empty);
double fontSize = PdfElementUtils.GetReal(action.Parameters[1], 0); double fontSize = PdfElementUtils.GetReal(action.Parameters[1], 0);
OpTextFont(fontName, fontSize); OpTextFont(fontName, fontSize);
} }
@@ -526,11 +385,11 @@ namespace VAR.PdfTools
} }
else if (action.Token == "Tr") else if (action.Token == "Tr")
{ {
// FIXME: Rendering mode // TODO: PdfTextExtractor: Rendering mode
} }
else if (action.Token == "Ts") else if (action.Token == "Ts")
{ {
// FIXME: Text rise // TODO: PdfTextExtractor: Text rise
} }
else if (action.Token == "Td") else if (action.Token == "Td")
{ {
@@ -561,44 +420,266 @@ namespace VAR.PdfTools
} }
else if (action.Token == "Tj") else if (action.Token == "Tj")
{ {
OpTextPut(((PdfString)action.Parameters[0]).Value); string text = PdfElementUtils.GetString(action.Parameters[0], string.Empty);
OpTextPut(text);
} }
else if (action.Token == "'") else if (action.Token == "'")
{ {
string text = PdfElementUtils.GetString(action.Parameters[0], string.Empty);
OpTextLineFeed(); OpTextLineFeed();
OpTextPut(((PdfString)action.Parameters[0]).Value); OpTextPut(text);
} }
else if (action.Token == "\"") else if (action.Token == "\"")
{ {
double wordSpacing = PdfElementUtils.GetReal(action.Parameters[0], 0); double wordSpacing = PdfElementUtils.GetReal(action.Parameters[0], 0);
double charSpacing = PdfElementUtils.GetReal(action.Parameters[1], 0); double charSpacing = PdfElementUtils.GetReal(action.Parameters[1], 0);
OpTextPut(((PdfString)action.Parameters[2]).Value); string text = PdfElementUtils.GetString(action.Parameters[2], string.Empty);
OpTextCharSpacing(charSpacing);
OpTextWordSpacing(wordSpacing);
OpTextPut(text);
} }
else if (action.Token == "TJ") else if (action.Token == "TJ")
{ {
OpTextPutMultiple(((PdfArray)action.Parameters[0])); OpTextPutMultiple(((PdfArray)action.Parameters[0]));
} }
else if (action.Token == "re")
{
// TODO: PdfTextExtractor: Interpret this
}
else if (action.Token == "f")
{
// TODO: PdfTextExtractor: Interpret this
}
else if (action.Token == "g")
{
// TODO: PdfTextExtractor: Interpret this
}
else if (action.Token == "rg")
{
// TODO: PdfTextExtractor: Interpret this
}
else if (action.Token == "BI")
{
// TODO: PdfTextExtractor: Interpret this
}
else if (action.Token == "ID")
{
// TODO: PdfTextExtractor: Interpret this
}
else if (action.Token == "EI")
{
// TODO: PdfTextExtractor: Interpret this
}
else if (action.Token == "W")
{
// TODO: PdfTextExtractor: Interpret this
}
else if (action.Token == "n")
{
// TODO: PdfTextExtractor: Interpret this
}
else if (action.Token == "Do")
{
// TODO: PdfTextExtractor: Interpret this
}
else if (action.Token == "m")
{
// TODO: PdfTextExtractor: Interpret this "moveto: Begin new subpath"
}
else if (action.Token == "l")
{
// TODO: PdfTextExtractor: Interpret this "lineto: Append straight line segment to path"
lineCount++;
}
else if (action.Token == "h")
{
// TODO: PdfTextExtractor: Interpret this "closepath: Close subpath"
pathCount++;
}
else if (action.Token == "W")
{
// TODO: PdfTextExtractor: Interpret this "clip: Set clipping path using nonzero winding number rule"
}
else if (action.Token == "W*")
{
// TODO: PdfTextExtractor: Interpret this "eoclip: Set clipping path using even-odd rule"
}
else if (action.Token == "w")
{
// TODO: PdfTextExtractor: Interpret this "setlinewidth: Set line width"
}
else if (action.Token == "G")
{
// TODO: PdfTextExtractor: Interpret this "setgray: Set gray level for stroking operations"
}
else if (action.Token == "S")
{
// TODO: PdfTextExtractor: Interpret this "stroke: Stroke path"
strokeCount++;
}
else if (action.Token == "M")
{
// TODO: PdfTextExtractor: Interpret this "setmiterlimit: Set miter limit"
}
else
{
unknowCount++;
}
} }
FlushTextElement(); FlushTextElement();
} }
private void JoinTextElements()
{
var textElementsCondensed = new List<PdfTextElement>();
while (_textElements.Count > 0)
{
PdfTextElement elem = _textElements[0];
_textElements.Remove(elem);
double blockY = elem.GetY();
double blockXMin = elem.GetX();
double blockXMax = blockXMin + elem.VisibleWidth;
// Prepare first neighbour
var textElementNeighbours = new List<PdfTextElement>();
textElementNeighbours.Add(elem);
// Search Neighbours
int i = 0;
while (i < _textElements.Count)
{
PdfTextElement neighbour = _textElements[i];
if (neighbour.Font != elem.Font || neighbour.FontSize != elem.FontSize)
{
i++;
continue;
}
double neighbourY = neighbour.GetY();
if (Math.Abs(neighbourY - blockY) > 0.001) { i++; continue; }
double maxWidth = neighbour.MaxWidth();
double neighbourXMin = neighbour.GetX();
double neighbourXMax = neighbourXMin + neighbour.VisibleWidth;
double auxBlockXMin = blockXMin - maxWidth;
double auxBlockXMax = blockXMax + maxWidth;
if (auxBlockXMax >= neighbourXMin && neighbourXMax >= auxBlockXMin)
{
_textElements.Remove(neighbour);
textElementNeighbours.Add(neighbour);
if (blockXMax < neighbourXMax) { blockXMax = neighbourXMax; }
if (blockXMin > neighbourXMin) { blockXMin = neighbourXMin; }
i = 0;
continue;
}
i++;
}
if (textElementNeighbours.Count == 1)
{
textElementsCondensed.Add(elem);
continue;
}
// Join neighbours
var chars = new List<PdfCharElement>();
foreach (PdfTextElement neighbour in textElementNeighbours)
{
double neighbourXMin = neighbour.GetX();
foreach (PdfCharElement c in neighbour.Characters)
{
chars.Add(new PdfCharElement
{
Char = c.Char,
Displacement = (c.Displacement + neighbourXMin) - blockXMin,
Width = c.Width,
});
}
}
chars = chars.OrderBy(c => c.Displacement).ToList();
var sbText = new StringBuilder();
foreach (PdfCharElement c in chars)
{
sbText.Append(c.Char);
}
PdfTextElement blockElem = new PdfTextElement
{
Font = null,
FontSize = elem.FontSize,
Matrix = elem.Matrix.Copy(),
RawText = sbText.ToString(),
VisibleText = sbText.ToString(),
VisibleWidth = blockXMax - blockXMin,
VisibleHeight = elem.VisibleHeight,
Characters = chars,
Childs = textElementNeighbours,
};
blockElem.Matrix.Matrix[0, 2] = blockXMin;
textElementsCondensed.Add(blockElem);
}
_textElements = textElementsCondensed;
}
private void SplitTextElements()
{
var textElementsSplitted = new List<PdfTextElement>();
while (_textElements.Count > 0)
{
PdfTextElement elem = _textElements[0];
_textElements.Remove(elem);
double maxWidth = elem.MaxWidth();
int prevBreak = 0;
for (int i = 1; i < elem.Characters.Count; i++)
{
double prevCharEnd = elem.Characters[i - 1].Displacement + elem.Characters[i - 1].Width;
double charSeparation = elem.Characters[i].Displacement - prevCharEnd;
if (charSeparation > maxWidth)
{
PdfTextElement partElem = elem.SubPart(prevBreak, i);
textElementsSplitted.Add(partElem);
prevBreak = i;
}
}
if (prevBreak == 0)
{
textElementsSplitted.Add(elem);
continue;
}
PdfTextElement lastElem = elem.SubPart(prevBreak, elem.Characters.Count);
textElementsSplitted.Add(lastElem);
}
_textElements = textElementsSplitted;
}
#endregion #endregion
#region Public methods #region Public methods
public List<string> GetColumn(string column) public Rect GetRect()
{ {
return GetColumn(column, true); Rect rect = null;
foreach (PdfTextElement textElement in _textElements)
{
Rect elementRect = textElement.GetRect();
if (rect == null) { rect = elementRect; }
rect.Add(elementRect);
}
return rect;
} }
public List<string> GetColumn(string column, bool fuzzy) public PdfTextElementColumn GetColumn(string column, bool fuzzy = true)
{ {
PdfTextElement columnHead = FindElementByText(column, fuzzy); PdfTextElement columnHead = FindElementByText(column, fuzzy);
if (columnHead == null) if (columnHead == null)
{ {
return new List<string>(); return PdfTextElementColumn.Empty;
} }
double headY = columnHead.GetY(); double headY = columnHead.GetY() - columnHead.VisibleHeight;
double headX1 = columnHead.GetX(); double headX1 = columnHead.GetX();
double headX2 = headX1 + columnHead.VisibleWidth; double headX2 = headX1 + columnHead.VisibleWidth;
@@ -626,14 +707,20 @@ namespace VAR.PdfTools
extentX2 = elemX1; extentX2 = elemX1;
} }
} }
} }
PdfTextElementColumn columnData = GetColumn(columnHead, headY, headX1, headX2, extentX1, extentX2);
return columnData;
}
public PdfTextElementColumn GetColumn(PdfTextElement columnHead, double headY, double headX1, double headX2, double extentX1, double extentX2)
{
// Get all the elements that intersect vertically and are below the header, then sort the results // Get all the elements that intersect vertically and are below the header, then sort the results
var columnDataRaw = new List<PdfTextElement>(); var columnDataRaw = new List<PdfTextElement>();
foreach (PdfTextElement elem in _textElements) foreach (PdfTextElement elem in _textElements)
{ {
if (TextElementVerticalIntersection(columnHead, elem) == false) { continue; } if (TextElementVerticalIntersection(elem, headX1, headX2) == false) { continue; }
// Only items down the column // Only items down the column
double elemY = elem.GetY(); double elemY = elem.GetY();
@@ -643,32 +730,94 @@ namespace VAR.PdfTools
} }
columnDataRaw = columnDataRaw.OrderByDescending(elem => elem.GetY()).ToList(); columnDataRaw = columnDataRaw.OrderByDescending(elem => elem.GetY()).ToList();
// Only items completely inside extents, and break on the first element outside // Only items completely inside extents; try splitting big elements and break on big elements that can't be split
var columnData = new List<PdfTextElement>(); var columnElements = new List<PdfTextElement>();
foreach (PdfTextElement elem in columnDataRaw) foreach (PdfTextElement elem in columnDataRaw)
{ {
double elemX1 = elem.GetX(); double elemX1 = elem.GetX();
double elemX2 = elemX1 + elem.VisibleWidth; double elemX2 = elemX1 + elem.VisibleWidth;
if (elemX1 < extentX1 || elemX2 > extentX2) { break; }
columnData.Add(elem); // Add elements completely inside
if (elemX1 > extentX1 && elemX2 < extentX2)
{
columnElements.Add(elem);
continue;
} }
// Try to split elements intersecting extents of the column
double maxSpacing = elem.Characters.Average(c => c.Width) / 10;
int indexStart = 0;
int indexEnd = elem.Characters.Count - 1;
bool indexStartValid = true;
bool indexEndValid = true;
if (elemX1 < extentX1)
{
// Search best start
int index = 0;
double characterPosition = elemX1 + elem.Characters[index].Displacement;
while (characterPosition < extentX1 && index < (elem.Characters.Count - 1))
{
index++;
characterPosition = elemX1 + elem.Characters[index].Displacement;
}
double spacing = elem.GetCharacterPreviousSpacing(index);
while (spacing < maxSpacing && index < (elem.Characters.Count - 1))
{
index++;
spacing = elem.GetCharacterPreviousSpacing(index);
}
if (spacing < maxSpacing) { indexStartValid = false; }
indexStart = index;
}
if (elemX2 > extentX2)
{
// Search best end
int index = elem.Characters.Count - 1;
double characterPosition = elemX1 + elem.Characters[index].Displacement + elem.Characters[index].Width;
while (characterPosition > extentX2 && index > 0)
{
index--;
characterPosition = elemX1 + elem.Characters[index].Displacement + elem.Characters[index].Width;
}
double spacing = elem.GetCharacterPrecedingSpacing(index);
while (spacing < maxSpacing && index > 0)
{
index--;
spacing = elem.GetCharacterPrecedingSpacing(index);
}
if (spacing < maxSpacing) { indexEndValid = false; }
indexEnd = index;
}
// Break when there is no good split, spanning the whole extent
if (indexStartValid == false && indexEndValid == false) { break; }
// Continue when only one of the sides is invalid (outside elements intersecting the extents of the column)
if (indexStartValid == false || indexEndValid == false) { continue; }
// Add split element
columnElements.Add(elem.SubPart(indexStart, indexEnd + 1));
}
var columnData = new PdfTextElementColumn(columnHead, columnElements, headY, extentX1, extentX2);
return columnData;
}
public List<string> GetColumnAsStrings(string column, bool fuzzy = true)
{
PdfTextElementColumn columnData = GetColumn(column, fuzzy);
// Emit result // Emit result
var result = new List<string>(); var result = new List<string>();
foreach (PdfTextElement elem in columnData) foreach (PdfTextElement elem in columnData.Elements)
{ {
result.Add(elem.VisibleText); result.Add(elem.VisibleText);
} }
return result; return result;
} }
public string GetField(string field) public string GetFieldAsString(string field, bool fuzzy = true)
{
return GetField(field, true);
}
public string GetField(string field, bool fuzzy)
{ {
PdfTextElement fieldTitle = FindElementByText(field, fuzzy); PdfTextElement fieldTitle = FindElementByText(field, fuzzy);
if (fieldTitle == null) if (fieldTitle == null)
@@ -696,19 +845,10 @@ namespace VAR.PdfTools
return fieldData.OrderBy(elem => elem.GetX()).FirstOrDefault().VisibleText; return fieldData.OrderBy(elem => elem.GetX()).FirstOrDefault().VisibleText;
} }
public bool HasText(string text) public bool HasText(string text, bool fuzzy = true)
{ {
return HasText(text, true); List<PdfTextElement> list = FindElementsContainingText(text, fuzzy);
} return (list.Count > 0);
public bool HasText(string text, bool fuzzy)
{
PdfTextElement fieldTitle = FindElementByText(text, fuzzy);
if (fieldTitle == null)
{
return false;
}
return true;
} }
#endregion #endregion

@@ -1,14 +0,0 @@
using System.Reflection;
using System.Runtime.InteropServices;
[assembly: AssemblyTitle("VAR.PdfTools")]
[assembly: AssemblyDescription("PdfTools Library")]
[assembly: AssemblyConfiguration("")]
[assembly: AssemblyCompany("VAR")]
[assembly: AssemblyProduct("VAR.PdfTools")]
[assembly: AssemblyCopyright("Copyright © VAR 2016")]
[assembly: AssemblyTrademark("")]
[assembly: AssemblyCulture("")]
[assembly: ComVisible(false)]
[assembly: Guid("eb7e003a-6a95-4002-809f-926c7c8a11e9")]
[assembly: AssemblyVersion("1.1.*")]

@@ -1,61 +0,0 @@
<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="14.0" DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<Import Project="$(MSBuildExtensionsPath)\$(MSBuildToolsVersion)\Microsoft.Common.props" Condition="Exists('$(MSBuildExtensionsPath)\$(MSBuildToolsVersion)\Microsoft.Common.props')" />
<PropertyGroup>
<Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration>
<Platform Condition=" '$(Platform)' == '' ">AnyCPU</Platform>
<ProjectGuid>{EB7E003A-6A95-4002-809F-926C7C8A11E9}</ProjectGuid>
<OutputType>Library</OutputType>
<AppDesignerFolder>Properties</AppDesignerFolder>
<RootNamespace>VAR.PdfTools</RootNamespace>
<AssemblyName>VAR.PdfTools</AssemblyName>
<TargetFrameworkVersion>v3.5</TargetFrameworkVersion>
<FileAlignment>512</FileAlignment>
<TargetFrameworkProfile />
</PropertyGroup>
<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|AnyCPU' ">
<DebugSymbols>true</DebugSymbols>
<DebugType>full</DebugType>
<Optimize>false</Optimize>
<OutputPath>bin\Debug\</OutputPath>
<DefineConstants>DEBUG;TRACE</DefineConstants>
<ErrorReport>prompt</ErrorReport>
<WarningLevel>4</WarningLevel>
</PropertyGroup>
<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Release|AnyCPU' ">
<DebugType>pdbonly</DebugType>
<Optimize>true</Optimize>
<OutputPath>bin\Release\</OutputPath>
<DefineConstants>TRACE</DefineConstants>
<ErrorReport>prompt</ErrorReport>
<WarningLevel>4</WarningLevel>
</PropertyGroup>
<ItemGroup>
<Reference Include="System" />
<Reference Include="System.Core" />
<Reference Include="System.Xml.Linq" />
<Reference Include="System.Data.DataSetExtensions" />
<Reference Include="System.Data" />
<Reference Include="System.Xml" />
</ItemGroup>
<ItemGroup>
<Compile Include="PdfContentAction.cs" />
<Compile Include="PdfDocument.cs" />
<Compile Include="PdfDocumentPage.cs" />
<Compile Include="PdfElements.cs" />
<Compile Include="PdfFilters.cs" />
<Compile Include="PdfFont.cs" />
<Compile Include="PdfParser.cs" />
<Compile Include="PdfStandar14FontMetrics.cs" />
<Compile Include="PdfTextExtractor.cs" />
<Compile Include="Properties\AssemblyInfo.cs" />
</ItemGroup>
<Import Project="$(MSBuildToolsPath)\Microsoft.CSharp.targets" />
<!-- To modify your build process, add your task inside one of the targets below and uncomment it.
Other similar extension points exist, see Microsoft.Common.targets.
<Target Name="BeforeBuild">
</Target>
<Target Name="AfterBuild">
</Target>
-->
</Project>

@@ -1,61 +1,30 @@
<?xml version="1.0" encoding="utf-8"?> <Project Sdk="Microsoft.NET.Sdk">
<Project ToolsVersion="14.0" DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<Import Project="$(MSBuildExtensionsPath)\$(MSBuildToolsVersion)\Microsoft.Common.props" Condition="Exists('$(MSBuildExtensionsPath)\$(MSBuildToolsVersion)\Microsoft.Common.props')" />
<PropertyGroup> <PropertyGroup>
<Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration> <TargetFramework>netstandard2.0</TargetFramework>
<Platform Condition=" '$(Platform)' == '' ">AnyCPU</Platform>
<ProjectGuid>{EB7E003A-6A95-4002-809F-926C7C8A11E9}</ProjectGuid>
<OutputType>Library</OutputType> <OutputType>Library</OutputType>
<AppDesignerFolder>Properties</AppDesignerFolder> <IsPackable>true</IsPackable>
<RootNamespace>VAR.PdfTools</RootNamespace> <GeneratePackageOnBuild>true</GeneratePackageOnBuild>
<AssemblyName>VAR.PdfTools</AssemblyName>
<TargetFrameworkVersion>v4.6.1</TargetFrameworkVersion>
<FileAlignment>512</FileAlignment>
<TargetFrameworkProfile />
</PropertyGroup> </PropertyGroup>
<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|AnyCPU' "> <PropertyGroup>
<DebugSymbols>true</DebugSymbols> <PackageId>VAR.PdfTools</PackageId>
<DebugType>full</DebugType> <Title>VAR.PdfTools</Title>
<Optimize>false</Optimize> <Version>1.6.1</Version>
<OutputPath>bin\Debug\</OutputPath> <Description>PdfTools Library</Description>
<DefineConstants>DEBUG;TRACE</DefineConstants> <Authors>VAR</Authors>
<ErrorReport>prompt</ErrorReport> <Company>VAR</Company>
<WarningLevel>4</WarningLevel> <Copyright>Copyright © VAR 2016-2019</Copyright>
</PropertyGroup> <RequireLicenseAcceptance>false</RequireLicenseAcceptance>
<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Release|AnyCPU' "> <PackageLicenseFile>LICENSE.txt</PackageLicenseFile>
<DebugType>pdbonly</DebugType> <PackageProjectUrl>https://github.com/Kableado/VAR.PdfTools</PackageProjectUrl>
<Optimize>true</Optimize> <PackageTags>PDF;PDF Library</PackageTags>
<OutputPath>bin\Release\</OutputPath>
<DefineConstants>TRACE</DefineConstants>
<ErrorReport>prompt</ErrorReport>
<WarningLevel>4</WarningLevel>
</PropertyGroup> </PropertyGroup>
<ItemGroup> <ItemGroup>
<Reference Include="System" /> <Content Include="..\LICENSE.txt" Link="LICENSE.txt" Pack="true" PackagePath="" />
<Reference Include="System.Core" />
<Reference Include="System.Xml.Linq" />
<Reference Include="System.Data.DataSetExtensions" />
<Reference Include="System.Data" />
<Reference Include="System.Xml" />
</ItemGroup> </ItemGroup>
<ItemGroup> <ItemGroup>
<Compile Include="PdfContentAction.cs" /> <PackageReference Include="System.Drawing.Common" Version="5.0.2" />
<Compile Include="PdfDocument.cs" />
<Compile Include="PdfDocumentPage.cs" />
<Compile Include="PdfElements.cs" />
<Compile Include="PdfFilters.cs" />
<Compile Include="PdfFont.cs" />
<Compile Include="PdfParser.cs" />
<Compile Include="PdfStandar14FontMetrics.cs" />
<Compile Include="PdfTextExtractor.cs" />
<Compile Include="Properties\AssemblyInfo.cs" />
</ItemGroup> </ItemGroup>
<Import Project="$(MSBuildToolsPath)\Microsoft.CSharp.targets" /> <Target Name="CopyPackage" AfterTargets="Pack">
<!-- To modify your build process, add your task inside one of the targets below and uncomment it. <Copy SourceFiles="$(OutputPath)..\$(PackageId).$(PackageVersion).nupkg" DestinationFolder="Nuget\" />
Other similar extension points exist, see Microsoft.Common.targets.
<Target Name="BeforeBuild">
</Target> </Target>
<Target Name="AfterBuild">
</Target>
-->
</Project> </Project>