Size of Document

Posted: Wed Oct 20, 2010 3:55 pm
by jnkarthik
Hello All,
I have a requirement.
We receive two files.
The first is a binary file containing many customer policies (actually PDF files), which we need to load into our ECM tool using an IBM utility.
The second is an index file that lists the index fields for the ECM table. We need to identify the size of each PDF in the first file.

For example, the first file contains 10 policy documents in binary format, in variable-length records.
The second file contains the list of metadata items for those 10 documents. I need to identify the size of each document in the first file and place it in the second (index) file.

I don't know whether this is the right place to raise this question, but I would be glad if someone could guide me on how to identify the size of the documents from the first file.
If you require more details, please let me know.

Re: Size of Document

Posted: Wed Oct 20, 2010 11:41 pm
by NicC
No idea, as this is probably a proprietary thing. I suggest you find the documentation for whatever creates the files and read it to find out the file formats and what they contain. Maybe the length is not there, in which case you are up a creek with no paddle.

Re: Size of Document

Posted: Thu Oct 21, 2010 3:18 am
by dick scherrer
Hello and welcome to the forum,

Are these files received on the mainframe?

Are they used on the mainframe?

If the size is known/found, how will it be used?

Re: Size of Document

Posted: Thu Oct 21, 2010 3:21 pm
by jnkarthik
Hello,

Thanks for the response. Yes, these files are received on the mainframe and they are loaded into ECM from the mainframe.
The challenge is identifying where each PDF starts and ends in the first file. From Google and from my colleagues I found that PDF layouts can have shared resources, meaning common fonts, headers and footers for the complete document. In that case, if I try to split by bytes using my IBM utility, I think the result will be a corrupted file. I have decided to go back to IBM with this query (we use an IBM ECM tool and an IBM utility to load the big binary data into the tool, and we use the HP Dialogue tool to create the PDF files).

I hope the above gives some idea. If anyone can suggest a solution, I will be very grateful. If I get anything from IBM, I will post an update here.

I doubted anyone would respond to the query, but it is good to see two replies. :-)
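
For readers following the start/end problem described in this post, here is a minimal sketch of one naive way to look for PDF boundaries in a raw byte stream. It assumes (and these are assumptions, not something confirmed in the thread) that every document is a complete, standalone PDF that begins with the `%PDF-` header and ends with an `%%EOF` marker, and that the data has already been read as plain bytes without any record descriptor words. The function name `find_pdf_spans` is hypothetical. If the generating tool really shares resources across documents, or if `%PDF-` happens to appear inside a document's binary data, this approach will not give usable results.

```python
import re

PDF_HEADER = b"%PDF-"    # every PDF file begins with this marker
PDF_TRAILER = b"%%EOF"   # end-of-file marker written at the end of a PDF


def find_pdf_spans(data: bytes):
    """Return a list of (offset, length) pairs, one per PDF found in data.

    Assumption: each document runs from its %PDF- header to the last
    %%EOF that appears before the next header (or before end of data).
    """
    starts = [m.start() for m in re.finditer(re.escape(PDF_HEADER), data)]
    spans = []
    for i, start in enumerate(starts):
        # Search only up to the next header, or to the end of the data.
        limit = starts[i + 1] if i + 1 < len(starts) else len(data)
        end = data.rfind(PDF_TRAILER, start, limit)
        if end == -1:
            raise ValueError(f"no EOF marker for PDF starting at offset {start}")
        end += len(PDF_TRAILER)
        spans.append((start, end - start))
    return spans
```

The lengths returned here stop at the end of `%%EOF` and do not count any trailing line-end bytes, so they may differ by a byte or two from the size the generating tool reports.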

Re: Size of Document

Posted: Thu Oct 21, 2010 11:53 pm
by dick scherrer
Hello,

Yes, if the file you receive contains multiple, embedded "things" and a simple split is used, the result will probably not be what you want (i.e. corrupt).

If all of the activity is within the IBM set of tools, what is the additional requirement?

Re: Size of Document

Posted: Fri Oct 22, 2010 4:20 pm
by jnkarthik
Thanks for the response.
I have now asked the HP Dialogue guys to generate a PDS that contains the PDF documents as individual members, so that each document has the proper resources within it.
Then I will have to identify the size of each member, merge the members into a PS file using IEBCOPY, and load it through the IBM utility into our ECM tool. That is the current plan.

The challenge here will be identifying the size of each member and then embedding the size in hexadecimal into my index file (I hope I can do this through DFSORT, but I need to dig into the details in this area).

If someone has any suggestions or a better idea, please feel free to share.

Thanks a lot again.
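
As an illustration of the "size in hexadecimal" step in this plan, the sketch below shows two ways a byte count could be represented: as zero-padded hex text and as a 4-byte big-endian binary field. Which of the two the IBM utility expects is not stated in the thread, so both are shown. The helper names, the 8-character name column and the field widths are all hypothetical choices for the example, not the actual index layout.

```python
from typing import Dict, List


def size_as_hex_text(size: int, width: int = 8) -> str:
    """Format a byte count as zero-padded uppercase hex text."""
    if size < 0:
        raise ValueError("size must be non-negative")
    text = format(size, "X").zfill(width)
    if len(text) > width:
        raise ValueError(f"size {size} does not fit in {width} hex digits")
    return text


def size_as_binary(size: int) -> bytes:
    """Encode a byte count as a 4-byte big-endian binary field."""
    return size.to_bytes(4, "big")


def build_index_lines(members: Dict[str, bytes]) -> List[str]:
    """Build one index line per member: name padded to 8 chars, then hex size.

    The layout is an assumption made for this example only.
    """
    return [
        f"{name:<8}{size_as_hex_text(len(content))}"
        for name, content in members.items()
    ]
```

For instance, a member of 4660 bytes would be written as `00001234` in the text form, or as the bytes `00 00 12 34` in the binary form.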

Re: Size of Document

Posted: Fri Oct 22, 2010 7:19 pm
by Bill Dennis
My experience is with IBM Content Manager OnDemand. It allowed the INDEX metadata to include an OFFSET and LENGTH of each document within the PS File and it would find and display the document. Isn't the document size known when the metadata file is being created? The control record should be inserted into the metadata by the HP dialogue guys.
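
The offset-and-length idea mentioned above can be sketched generically as follows. This is not the OnDemand index format, only an illustration of the bookkeeping: documents are concatenated into one byte stream, and for each one the starting offset and length are recorded. The function name `concatenate_with_index` and the zero-based byte offsets are assumptions made for the example.

```python
from typing import Iterable, List, Tuple


def concatenate_with_index(
    documents: Iterable[Tuple[str, bytes]],
) -> Tuple[bytes, List[Tuple[str, int, int]]]:
    """Concatenate documents and record (name, offset, length) for each.

    Offsets are zero-based byte positions in the combined stream.
    """
    combined = bytearray()
    index = []
    for name, content in documents:
        # The next document starts where the combined stream currently ends.
        index.append((name, len(combined), len(content)))
        combined += content
    return bytes(combined), index
```

With an index like this, any document can be recovered from the combined stream by slicing `combined[offset:offset + length]`, which is why the lengths have to be known at the time the stream is built.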

Re: Size of Document

Posted: Sat Oct 23, 2010 1:56 am
by dick scherrer
Hello,

If each pdf is completely contained in a different member, why is the length needed "externally"?

Also, if all of the members in a pds are to be ftp'ed to a target, all of the members can be transmitted in one GET/PUT.

Re: Size of Document

Posted: Tue Oct 26, 2010 7:39 pm
by jnkarthik
Hello Bill,
Thanks for the update. We are using Content Manager for DB2, but you are right: we tried to have the HP Dialogue guys generate the document size, but it does not work when they generate all documents in a single PDF file. So now we are trying the PDS option, i.e. each document as a member, so that at least the file size will be accurate. I might have an update on the progress tomorrow.

Hello d.Sch,
Thanks for the response.
As I mentioned earlier, the IBM utility requires a single PS file to load into the ECM tool, and it also requires the actual byte count of each PDF file.
Hence, I need to get the bytes occupied by each member in the PDS.
Then I convert the PDS to PS. (When I use IEBPTPCH, it reformats the file to a record length of 81, whereas I require the output in 12880 bytes. I tried IEBGENER/IEBCOPY, which also does not allow me to use DCB. I am working out what other options are available to convert PDS to PS or PDSE to PS.)
And then I load the file using the IBM utility.

Does that explain it, d.Sch? Sorry if I have confused you a lot.

Re: Size of Document

Posted: Tue Oct 26, 2010 11:11 pm
by dick scherrer
Hello,

jnkarthik wrote: Does that explain it, d.Sch? Sorry if I have confused you a lot.
Not to worry - sometimes it takes me a while, but I'll usually get there eventually :)

You might consider having each pdf sent as a separate file. . .