Nordlund, Eric
2014-07-31 05:20:46 UTC
Hello docbook-apps.
I have a large set of projects that I am looking to scrub for unused graphics and XML files prior to sending off to localization.
Some of my colleagues have created some very basic bash and batch scripts that scan the folders for files that aren't referenced in any of the source files so we can delete them. I worry that these scripts don't catch everything: an unused XML file in the base directory that references images will "bless" those images, so we could still have extraneous files left over, or accidentally delete important ones without knowing it.
Each project has a book.xml file that is the gold master for the outputs. If neither book.xml nor any of its includes references a file in the project, that file is safe to delete. I was hoping I could use xmllint to tell me which files are loaded when I validate book.xml, but I haven't found the magic formula yet.
I've tried the following command to list all of the files loaded during a pass, but it doesn't seem to list the referenced image files, which is mostly the point of this exercise, and I get a lot of noise from the DTD module files on every include.
$ xmllint --load-trace book.xml --xinclude --noout &> test1
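To show the kind of noise filtering I have in mind, here is a rough Python sketch that reads the trace file and keeps only the non-DTD entries. It assumes each trace line looks like Loaded URL="..." ID="...", and the function name and the suffix list (.dtd, .mod, .ent) are placeholders of mine rather than anything xmllint defines. It still would not surface images, since they never appear in the trace.

```python
import re

# Assumed shape of an xmllint --load-trace line: Loaded URL="..." ID="..."
LOADED = re.compile(r'Loaded URL="([^"]*)"')


def project_files_from_trace(trace_text, noise_suffixes=(".dtd", ".mod", ".ent")):
    """Return unique loaded URLs in first-seen order, minus DTD module files."""
    seen = []
    for line in trace_text.splitlines():
        match = LOADED.search(line)
        if not match:
            continue  # not a load-trace line (e.g. a validation message)
        url = match.group(1)
        if url.lower().endswith(noise_suffixes):
            continue  # DTD, module, or entity file: treated as noise here
        if url not in seen:
            seen.append(url)
    return seen
```

Something like project_files_from_trace(open("test1").read()) would then give the de-duplicated list of XML files that were actually pulled in.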
Has anyone had a similar problem to solve? Am I going about this the right way?
Thanks, and I'm open to any suggestion. If bash and xmllint don't work here, I am partial to Python as an alternative. Just saying.
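In case it helps frame the question, this is roughly the Python I picture: start at book.xml, follow xi:include links, collect every fileref along the way, and report whatever else is in the project folder. It is only a sketch under several assumptions: the files parse with the standard library (no entities that need the DTD), images are referenced through fileref attributes, relative paths resolve against the file that contains them, and the function names are my own.

```python
import os
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def referenced_files(root_file):
    """Collect files reachable from root_file via xi:include href and fileref."""
    used = set()
    pending = [os.path.abspath(root_file)]
    while pending:
        path = pending.pop()
        if path in used or not os.path.isfile(path):
            continue  # already visited, or a dangling reference
        used.add(path)
        base = os.path.dirname(path)
        for elem in ET.parse(path).iter():
            href = elem.get("href")
            fileref = elem.get("fileref")
            if elem.tag == XI_INCLUDE and href:
                target = os.path.normpath(os.path.join(base, href))
                if elem.get("parse") == "text":
                    used.add(target)  # text include: referenced, not parsed
                else:
                    pending.append(target)  # XML include: follow it later
            elif fileref:
                used.add(os.path.normpath(os.path.join(base, fileref)))
    return used


def unused_files(project_dir, root_name="book.xml"):
    """List files under project_dir that the root document never reaches."""
    project_dir = os.path.abspath(project_dir)
    used = referenced_files(os.path.join(project_dir, root_name))
    every = set()
    for folder, _dirs, names in os.walk(project_dir):
        for name in names:
            every.add(os.path.join(folder, name))
    return sorted(every - used)
```

Because only files reachable from book.xml are parsed, an orphaned XML file cannot "bless" the images it points to; both the orphan and its images show up in the unused list.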
Eric Nordlund
Senior Technical Writer
Amazon Web Services
Ph: 206-266-8048 | ***@amazon.com