C# - Remove JavaScript from PDF using iTextSharp


This seems like it should be quick to do, but in practice there seems to be a problem. I have a bunch of PDF forms that include both form fields and embedded JavaScript. I would like to remove the JavaScript code safely but leave the PDF form fields intact.

So far I've been able to find lots of solutions, but all of them have either eliminated both the JavaScript and the form fields, or left both intact.

Here's solution A; it copies both the form fields and the JavaScript:

    var pdfReader = new PdfReader(inFileName);
    using (MemoryStream memoryStream = new MemoryStream())
    {
        PdfCopyFields copy = new PdfCopyFields(memoryStream);
        copy.AddDocument(pdfReader);
        copy.Close();
        File.WriteAllBytes(rawFileName, memoryStream.ToArray());
    }

Alternatively, I have solution B, which strips out both the form fields and the JavaScript:

    Document document = new Document();
    using (MemoryStream memoryStream = new MemoryStream())
    {
        PdfWriter writer = PdfWriter.GetInstance(document, memoryStream);
        document.Open();
        document.AddDocListener(writer);
        for (int p = 1; p <= pdfReader.NumberOfPages; p++)
        {
            document.SetPageSize(pdfReader.GetPageSize(p));
            document.NewPage();
            PdfContentByte cb = writer.DirectContent;
            PdfImportedPage pageImport = writer.GetImportedPage(pdfReader, p);
            int rot = pdfReader.GetPageRotation(p);
            if (rot == 90 || rot == 270)
            {
                // Rotated page: apply a rotation matrix when placing the template.
                cb.AddTemplate(pageImport, 0, -1.0F, 1.0F, 0, 0, pdfReader.GetPageSizeWithRotation(p).Height);
            }
            else
            {
                cb.AddTemplate(pageImport, 1.0F, 0, 0, 1.0F, 0, 0);
            }
        }
        document.Close();
        File.WriteAllBytes(rawFile, memoryStream.ToArray());
    }

Does anyone know how to modify either solution A or B to eliminate the JavaScript but leave the form fields in place?

Edit: The solution code is here!

    using (MemoryStream memoryStream = new MemoryStream())
    {
        PdfStamper stamper = new PdfStamper(pdfReader, memoryStream);
        // Walk every indirect object in the cross-reference table.
        for (int i = 0; i < pdfReader.XrefSize; i++)
        {
            object o = pdfReader.GetPdfObject(i);
            PdfDictionary pd = o as PdfDictionary;
            if (pd != null)
            {
                pd.Remove(PdfName.AA);
                pd.Remove(PdfName.JS);
                pd.Remove(PdfName.JAVASCRIPT);
            }
        }
        stamper.Close();
        pdfReader.Close();
        File.WriteAllBytes(rawFile, memoryStream.ToArray());
    }

To manipulate a single PDF you should use the class PdfStamper and manipulate its contents, in your case by iterating over the existing form fields and removing the JavaScript entries.

The iTextSharp sample AddJavaScriptToForm.cs, corresponding to AddJavaScriptToForm.java from chapter 13 of iText in Action, 2nd Edition, shows how JavaScript actions are added to fields, the central code being:

    PdfStamper stamper = new PdfStamper(reader, ms);

    AcroFields form = stamper.AcroFields;
    AcroFields.Item fd = form.GetFieldItem("married");

    PdfDictionary dictYes = (PdfDictionary) PdfReader.GetPdfObject(fd.GetWidgetRef(0));
    PdfDictionary yesAction = ...;
    dictYes.Put(PdfName.AA, yesAction);

Thus, to remove such JavaScript form field actions you have to iterate over the PDF form fields and remove the /AA values in the associated dictionaries:

    dictXXX.Remove(PdfName.AA);
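A minimal sketch of that iteration, assuming iTextSharp 5.x, where `AcroFields.Fields` maps field names to `AcroFields.Item` objects. The class and method names (`FieldScriptRemover`, `RemoveFieldActions`) are hypothetical and only serve the illustration. Note that /AA can also hold non-JavaScript actions, which this sketch removes as well.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text.pdf;

static class FieldScriptRemover
{
    // Hypothetical helper: removes /AA from every form field but keeps the fields.
    public static byte[] RemoveFieldActions(byte[] pdfBytes)
    {
        PdfReader reader = new PdfReader(pdfBytes);
        using (MemoryStream output = new MemoryStream())
        {
            PdfStamper stamper = new PdfStamper(reader, output);
            foreach (KeyValuePair<string, AcroFields.Item> field in stamper.AcroFields.Fields)
            {
                AcroFields.Item item = field.Value;
                for (int i = 0; i < item.Size; i++)
                {
                    // /AA may sit on the widget annotation or on the field dictionary.
                    item.GetWidget(i).Remove(PdfName.AA);
                    item.GetValue(i).Remove(PdfName.AA);
                }
            }
            stamper.Close();
            reader.Close();
            return output.ToArray();
        }
    }
}
```

A caller would pass in the bytes of the original file and write the returned array to the output file, for example with `File.ReadAllBytes` and `File.WriteAllBytes`.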

Edit: (provided by Ted Spence) Here is the final code that removes the JavaScript while leaving the form fields intact:

    using (MemoryStream memoryStream = new MemoryStream())
    {
        PdfStamper stamper = new PdfStamper(pdfReader, memoryStream);
        for (int i = 0; i < pdfReader.XrefSize; i++)
        {
            PdfDictionary pd = pdfReader.GetPdfObject(i) as PdfDictionary;
            if (pd != null)
            {
                pd.Remove(PdfName.AA); // Removes additional actions (/AA), which are executed automatically
                pd.Remove(PdfName.JS); // Removes JavaScript (/JS) entries
                pd.Remove(PdfName.JAVASCRIPT); // Removes other JavaScript (/JavaScript) entries
            }
        }
        stamper.Close();
        pdfReader.Close();
        File.WriteAllBytes(rawFile, memoryStream.ToArray());
    }

Edit: (by mkl) The solution above is somewhat overachieving because it touches each and every indirect dictionary object. On the other hand, it ignores inline dictionaries (I haven't checked the spec, though; maybe /AA, /JS, and /JavaScript entries only appear in dictionaries that have to be indirect objects, or that are at least de-referenced by this code).

If fulfilling this task were my job, I would try to access the objects possibly carrying JavaScript more specifically.
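As a rough sketch of what such a more specific approach might look like for the document-level locations (again assuming iTextSharp 5.x): the `/JavaScript` name tree under the catalog's `/Names` dictionary, a JavaScript `/OpenAction`, and `/AA` on the catalog and on each page. The names `DocumentScriptRemover` and `RemoveDocumentLevelScripts` are hypothetical. The list of locations is not claimed to be complete: annotation /A actions, chained /Next actions, and field-level /AA are not covered here.

```csharp
using iTextSharp.text.pdf;

static class DocumentScriptRemover
{
    // Hypothetical helper; call it on the reader before stamper.Close().
    public static void RemoveDocumentLevelScripts(PdfReader reader)
    {
        PdfDictionary catalog = reader.Catalog;

        // Document-level scripts are registered in the /JavaScript name tree under /Names.
        PdfDictionary names = catalog.GetAsDict(PdfName.NAMES);
        if (names != null)
        {
            names.Remove(PdfName.JAVASCRIPT);
        }

        // /OpenAction is either a destination array (left alone) or an action dictionary.
        PdfDictionary openAction = catalog.GetAsDict(PdfName.OPENACTION);
        if (openAction != null && PdfName.JAVASCRIPT.Equals(openAction.GetAsName(PdfName.S)))
        {
            catalog.Remove(PdfName.OPENACTION);
        }

        // Additional actions on the document and on each page (also drops non-JavaScript ones).
        catalog.Remove(PdfName.AA);
        for (int p = 1; p <= reader.NumberOfPages; p++)
        {
            reader.GetPageN(p).Remove(PdfName.AA);
        }
    }
}
```

The trade-off against the xref loop is the one described here: this touches far fewer objects, but it only finds JavaScript in the places it explicitly looks.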

The advantage of the overachieving procedure might be, though, that it also inspects PDF objects that are not currently specified as carrying JavaScript but might be in later PDF versions.

