We're in the last days of 2019, and this will be my last blog post this year. What better way to end a good year than by releasing a new PowerShell module? In case the title didn't give it away, today I want to share PSWritePDF, a PowerShell module that can help you create and modify (split/merge) PDF documents. It joins my other PowerShell modules for creating different types of documents: PSWriteWord, PSWriteExcel, and PSWriteHTML. I know PSWriteExcel is relatively basic, but both PSWriteHTML and PSWriteWord offer robust document-building capabilities.
PSWritePDF is by no means a finished product. As with most of my modules, I build a concept that matches my view of how I'd like it to look, and over the next months I'll probably update its functionality to match my expectations. But just because it isn't finished doesn't mean it's not functional. PSWritePDF is based on the .NET iText 7 library, and its licensing is tied to that library's requirements, which means it's licensed under the AGPL. I'd be more than happy to make the PowerShell part MIT, but I'm no licensing expert, so for now (or forever) it will stay licensed the same way iText 7 is. Since PSWritePDF is based on iText 7, it should be possible, with some work, to bring all of iText 7's functionality into PowerShell. That gives this module excellent potential for a wide range of use cases.
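For now, I've divided the module functionality into two categories: standalone functions that work with existing PDF files (merging, splitting, and extracting text) and functions that create new PDF documents.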
Like all my PowerShell modules, PSWritePDF is published to the PowerShell Gallery. That means all you have to do to start working with it is install it:
Install-Module PSWritePDF -Force
And when the time comes that I update it, all you have to do is run:
Update-Module PSWritePDF
The module should work on PowerShell 5.1, PowerShell 6, and PowerShell 7, on Windows, Linux, and macOS. However, I've noticed issues with some PDF files on PowerShell Core, which seem to be related either to iText 7 or to my implementation of it. I'm not sure what the problem is yet, but iText 7 seems a bit more stable on PowerShell 5.1.
Since PSWritePDF, like most of my modules, is under development most of the time, all sources are published on GitHub. If you want to contribute to this project or take a peek at the sources, you can do so there. Please keep in mind that the PowerShell Gallery version is optimized and better suited for production use. If you find bugs or missing features, please submit them on GitHub.
Once PSWritePDF is installed, merging two or more PDF files is as easy as running a single command, Merge-PDF, with two parameters.
$FilePath1 = "$PSScriptRoot\Input\OutputDocument0.pdf"
$FilePath2 = "$PSScriptRoot\Input\OutputDocument1.pdf"
$OutputFile = "$PSScriptRoot\Output\OutputDocument.pdf" # Shouldn't exist / will be overwritten
Merge-PDF -InputFile $FilePath1, $FilePath2 -OutputFile $OutputFile
That's it.
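If you have a whole folder of PDFs to combine, you don't have to list every file by hand. The sketch below is my own illustration, not part of PSWritePDF: it assumes the documents sit in an Input folder and should be merged in file-name order. Get-PdfMergeList is a hypothetical helper that only collects and sorts the paths before they're passed to Merge-PDF -InputFile, and the output file name is made up.

```powershell
# Hypothetical helper (not part of PSWritePDF): returns full paths of all PDFs in a folder,
# sorted by file name so the merge order is predictable (01-Intro.pdf before 02-Body.pdf).
function Get-PdfMergeList {
    param(
        [Parameter(Mandatory)][string] $Folder
    )
    Get-ChildItem -Path $Folder -Filter '*.pdf' -File |
        Sort-Object -Property Name |
        ForEach-Object { $_.FullName }
}

# Assumed layout: source PDFs in .\Input, merged result written to .\Output
$InputFolder = "$PSScriptRoot\Input"
$MergedFile = "$PSScriptRoot\Output\Merged.pdf"

# Only merge when PSWritePDF is available, the folder exists and there are at least two files
if ((Get-Command -Name Merge-PDF -ErrorAction SilentlyContinue) -and (Test-Path -Path $InputFolder)) {
    $Files = @(Get-PdfMergeList -Folder $InputFolder)
    if ($Files.Count -ge 2) {
        Merge-PDF -InputFile $Files -OutputFile $MergedFile
    }
}
```

Sorting by name is just one choice; if the order matters in a different way (for example by date), change the Sort-Object property.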
Now that you know how to merge PDF files, it's time to learn how to split them. Right now, I've only implemented splitting by pages: given a file, Split-PDF splits it into X files, where X is the number of pages in the PDF.
Split-PDF -FilePath "$PSScriptRoot\SampleToSplit.pdf" -OutputFolder "$PSScriptRoot\Output"
That's it.
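Because Split-PDF writes one file per page into the output folder, splitting several documents into the same folder would mix their pages together. One possible workaround, sketched below under my own assumptions, is to give every source PDF its own subfolder. Get-PdfSplitPlan is a hypothetical helper, the ToSplit folder name is made up, and the output folders are created up front because I'm not certain whether Split-PDF creates a missing folder by itself.

```powershell
# Hypothetical helper (not part of PSWritePDF): pairs each PDF in a folder with its own
# output subfolder named after the file, e.g. Report.pdf -> <OutputRoot>\Report
function Get-PdfSplitPlan {
    param(
        [Parameter(Mandatory)][string] $SourceFolder,
        [Parameter(Mandatory)][string] $OutputRoot
    )
    Get-ChildItem -Path $SourceFolder -Filter '*.pdf' -File | ForEach-Object {
        [PSCustomObject] @{
            FilePath     = $_.FullName
            OutputFolder = Join-Path -Path $OutputRoot -ChildPath $_.BaseName
        }
    }
}

# Assumed layout: PDFs to split in .\ToSplit, page files written under .\Output\<FileName>
$SourceFolder = "$PSScriptRoot\ToSplit"
if ((Get-Command -Name Split-PDF -ErrorAction SilentlyContinue) -and (Test-Path -Path $SourceFolder)) {
    $SplitPlan = Get-PdfSplitPlan -SourceFolder $SourceFolder -OutputRoot "$PSScriptRoot\Output"
    foreach ($Item in $SplitPlan) {
        # Created up front in case Split-PDF expects the folder to exist (an assumption)
        New-Item -ItemType Directory -Path $Item.OutputFolder -Force | Out-Null
        Split-PDF -FilePath $Item.FilePath -OutputFolder $Item.OutputFolder
    }
}
```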
Another standalone function lets you extract text from a PDF. Of course, the text has to be computer-generated; sadly, it doesn't do any OCR.
# Get text from all pages
Convert-PDFToText -FilePath "$PSScriptRoot\Example04.pdf"
# Get text from page 1 only
Convert-PDFToText -FilePath "$PSScriptRoot\Example04.pdf" -Page 1
Using the command above, you can extract text from one or more pages.
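Once the text is extracted, regular PowerShell tools take over. The sketch below assumes Convert-PDFToText returns the text as one or more strings (I haven't verified its exact output type). Select-PdfTextLine is a hypothetical helper that splits that text into lines and keeps only those matching a regular expression, for example to pull the totals out of an invoice.

```powershell
# Hypothetical helper (not part of PSWritePDF): splits extracted text into lines
# (Windows or Unix line endings) and returns only the lines matching a regex.
# Note that -match is case-insensitive by default.
function Select-PdfTextLine {
    param(
        [string[]] $Text,
        [Parameter(Mandatory)][string] $Pattern
    )
    $Text -split '\r?\n' | Where-Object { $_ -match $Pattern }
}

$PdfPath = "$PSScriptRoot\Example04.pdf"
# Assumption: Convert-PDFToText emits the extracted text as string output
if ((Get-Command -Name Convert-PDFToText -ErrorAction SilentlyContinue) -and (Test-Path -Path $PdfPath)) {
    $PageText = Convert-PDFToText -FilePath $PdfPath -Page 1
    Select-PdfTextLine -Text $PageText -Pattern '^Total'
}
```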
Creating new PDF files takes a similar approach to what I've built for PSWriteHTML and Documentimo (which will be migrated back to PSWriteWord at some point). It uses a DSL (Domain-Specific Language) to help you build your document in an easy-to-use way. I've created a few basic functions so far, and I'll try to add more in the future so that it's possible to create feature-rich PDF files.
New-PDF {
    New-PDFText -Text 'Hello ', 'World' -Font HELVETICA, TIMES_ITALIC -FontColor GRAY, BLUE -FontBold $true, $false, $true
    New-PDFText -Text 'Testing adding text. ', 'Keep in mind that this works like array.' -Font HELVETICA -FontColor RED
    New-PDFText -Text 'This text is going by defaults.', ' This will continue...', ' and we can continue working like that.'
    New-PDFList -Indent 3 {
        New-PDFListItem -Text 'Test'
        New-PDFListItem -Text '2nd'
    }
    New-PDFText -Text 'Hello ', 'World' -Font HELVETICA, TIMES_ITALIC -FontColor GRAY, BLUE -FontBold $true, $false, $true
    New-PDFText -Text 'Testing adding text. ', 'Keep in mind that this works like array.' -Font HELVETICA -FontColor RED
    New-PDFText -Text 'This text is going by defaults.', ' This will continue...', ' and we can continue working like that.'
    New-PDFList -Indent 3 {
        New-PDFListItem -Text 'Test'
        New-PDFListItem -Text '2nd'
    }
} -FilePath "$PSScriptRoot\Example01_Simple.pdf" -Show
Above, we created a PDF document, added a few pieces of text to it using New-PDFText, and created a list with two bullet points (the whole set appears twice). What's important here is that iText 7 provides constant values for colors, fonts, and other types of styling (such as HELVETICA, TIMES_ITALIC, GRAY, or BLUE). It's most likely possible to go beyond what's built in using a different approach, but I haven't had time to explore those options, so the styling is very basic for now.
New-PDF -MarginTop 100 {
    New-PDFPage -PageSize A5 {
        New-PDFText -Text 'Hello ', 'World' -Font HELVETICA, TIMES_ITALIC -FontColor GRAY, BLUE -FontBold $true, $false, $true
        New-PDFText -Text 'Testing adding text. ', 'Keep in mind that this works like array.' -Font HELVETICA -FontColor RED
        New-PDFText -Text 'This text is going by defaults.', ' This will continue...', ' and we can continue working like that.'
        New-PDFList -Indent 3 {
            New-PDFListItem -Text 'Test'
            New-PDFListItem -Text '2nd'
        }
    }
    New-PDFPage -PageSize A4 -Rotate {
        New-PDFText -Text 'Hello 1', 'World' -Font HELVETICA, TIMES_ITALIC -FontColor GRAY, BLUE -FontBold $true, $false, $true
        New-PDFText -Text 'Testing adding text. ', 'Keep in mind that this works like array.' -Font HELVETICA -FontColor RED
        New-PDFText -Text 'This text is going by defaults.', ' This will continue...', ' and we can continue working like that.'
        New-PDFList -Indent 3 {
            New-PDFListItem -Text 'Test'
            New-PDFListItem -Text '2nd'
        }
    }
} -FilePath "$PSScriptRoot\Example01_WithSections.pdf" -Show
As you can see above, the code produced two pages with different page sizes and rotations. It's important to understand that, despite its name, New-PDFPage doesn't create exactly a page. It's more of an area or a section: if you had enough text on the first "page", it would span multiple pages, and the next New-PDFPage would start a new area on a new page. Maybe it should be called New-PDFArea, but that seemed less intuitive.
There's also a New-PDFOptions function that allows you to define margins for the whole document, but it isn't necessary. Both New-PDF and New-PDFPage have their own margin parameters, which makes it more explicit where the margins get applied. As we saw above, margins set on New-PDF apply to all pages. Margins set on New-PDFPage, however, apply only to that "page", so each one can have different margins. If you want to control the margins for all pages, setting them on New-PDF is the best choice.
New-PDF -MarginLeft 120 -MarginRight 20 -MarginTop 20 -MarginBottom 20 -PageSize B4 -Rotate {
    New-PDFText -Text 'Test ', 'Me', 'Oooh' -FontColor BLUE, YELLOW, RED
    New-PDFList {
        New-PDFListItem -Text 'Test'
        New-PDFListItem -Text '2nd'
    }
} -FilePath "$PSScriptRoot\Example01_MoreOptions.pdf" -Show
Below is another example that shows margins set at different levels and how they're applied.
New-PDF -MarginTop 200 {
    New-PDFPage -PageSize A5 {
        New-PDFText -Text 'Hello ', 'World' -Font HELVETICA, TIMES_ITALIC -FontColor GRAY, BLUE -FontBold $true, $false, $true
        New-PDFText -Text 'Testing adding text. ', 'Keep in mind that this works like array.' -Font HELVETICA -FontColor RED
        New-PDFText -Text 'This text is going by defaults.', ' This will continue...', ' and we can continue working like that.'
        New-PDFList -Indent 3 {
            New-PDFListItem -Text 'Test'
            New-PDFListItem -Text '2nd'
        }
    }
    New-PDFPage -PageSize A4 -Rotate -MarginLeft 10 -MarginTop 50 {
        New-PDFText -Text 'Hello 1', 'World' -Font HELVETICA, TIMES_ITALIC -FontColor GRAY, BLUE -FontBold $true, $false, $true
        New-PDFText -Text 'Testing adding text. ', 'Keep in mind that this works like array.' -Font HELVETICA -FontColor RED
        New-PDFText -Text 'This text is going by defaults.', ' This will continue...', ' and we can continue working like that.'
        New-PDFList -Indent 3 {
            New-PDFListItem -Text 'Test'
            New-PDFListItem -Text '2nd'
        }
    }
} -FilePath "$PSScriptRoot\Example01_WithSectionsMargins.pdf" -Show

$Document = Get-PDF -FilePath "$PSScriptRoot\Example01_WithSections.pdf"
$Details = Get-PDFDetails -Document $Document
$Details | Format-List
$Details.Pages | Format-Table
Close-PDF -Document $Document
You may also notice that at the end of the code, I've added a few lines that open one of the PDFs I've just created and read its details.
Here's what the output of Get-PDFDetails looks like:
$Document = Get-PDF -FilePath "$PSScriptRoot\Example01_WithSections.pdf"
$Details = Get-PDFDetails -Document $Document
$Details | Format-List
$Details.Pages | Format-Table
Close-PDF -Document $Document
Notice the additional details for pages. You'll probably also notice that the margins tell a slightly different story. This is a known issue: I'm not sure yet how to get the margins for each page separately. Hopefully, sooner or later, I'll figure it out and this will get updated.
While there isn't a lot of functionality yet, I've also added the ability to add tables to a PDF. It's as simple as New-PDFTable -DataTable $YourData.
$DataTable1 = @(
    [PSCustomObject] @{ Test = 'Name'; Test2 = 'Name2'; Test3 = 'Name3' }
    [PSCustomObject] @{ Test = 'Name'; Test2 = 'Name2'; Test3 = 'Name3' }
    [PSCustomObject] @{ Test = 'Name'; Test2 = 'Name2'; Test3 = 'Name3' }
    [PSCustomObject] @{ Test = 'Name'; Test2 = 'Name2'; Test3 = 'Name3' }
)
$DataTable2 = @(
    [ordered] @{ Test = 'Name'; Test2 = 'Name2'; Test3 = 'Name3' }
    [ordered] @{ Test = 'Name'; Test2 = 'Name2'; Test3 = 'Name3' }
    [ordered] @{ Test = 'Name'; Test2 = 'Name2'; Test3 = 'Name3' }
    [ordered] @{ Test = 'Name'; Test2 = 'Name2'; Test3 = 'Name3' }
)
New-PDF {
    New-PDFText -Text 'Hello ', 'World' -Font HELVETICA, TIMES_ITALIC -FontColor GRAY, BLUE -FontBold $true, $false, $true
    New-PDFText -Text 'Testing adding text. ', 'Keep in mind that this works like array.' -Font HELVETICA -FontColor RED
    New-PDFText -Text 'This text is going by defaults.', ' This will continue...', ' and we can continue working like that.'
    New-PDFText -Text 'This table is representation of ', 'PSCustomObject', ' or other', ', ', 'standard types' -FontColor BLACK, RED, BLACK, BLACK, RED -FontBold $false, $true, $false, $false, $true
    New-PDFTable -DataTable $DataTable1
    New-PDFText -Text 'This shows how to create a list' -FontColor MAGENTA
    New-PDFList -Indent 3 {
        New-PDFListItem -Text 'Test'
        New-PDFListItem -Text '2nd'
    }
    New-PDFText -Text 'This table is representation of ', 'Hashtable/OrderedDictionary' -FontColor BLACK, BLUE
    New-PDFTable -DataTable $DataTable2
} -FilePath "$PSScriptRoot\Example06.pdf" -Show
As you can see above, I've built the data in the $DataTable1 and $DataTable2 variables manually, but it should work with any other data. Again, this functionality is limited and only shows what's possible; it will need further enhancement.
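To try New-PDFTable with data that isn't typed in by hand, you can shape any command's output with Select-Object, which produces the same kind of flat PSCustomObject rows as $DataTable1. The sketch below uses a listing of the PowerShell installation folder ($PSHOME) purely as sample data. ConvertTo-PdfTableData is a hypothetical helper, the output file name is made up, and the values are converted to strings up front in case New-PDFTable doesn't format numbers or dates the way you'd like (an assumption on my part).

```powershell
# Hypothetical helper (not part of PSWritePDF): turns a folder listing into flat
# PSCustomObject rows (one per file), with every value already formatted as a string.
function ConvertTo-PdfTableData {
    param(
        [Parameter(Mandatory)][string] $Folder
    )
    Get-ChildItem -Path $Folder -File |
        Sort-Object -Property Name |
        Select-Object -Property Name,
            @{ Name = 'SizeKB'; Expression = { '{0:N1}' -f ($_.Length / 1KB) } },
            @{ Name = 'Modified'; Expression = { $_.LastWriteTime.ToString('yyyy-MM-dd') } }
}

# Only build the PDF when PSWritePDF is available
if (Get-Command -Name New-PDF -ErrorAction SilentlyContinue) {
    $Files = @(ConvertTo-PdfTableData -Folder $PSHOME)
    New-PDF {
        New-PDFText -Text 'Files in ', $PSHOME -FontColor BLACK, BLUE
        New-PDFTable -DataTable $Files
    } -FilePath "$PSScriptRoot\Example06_Files.pdf"
}
```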