ecm-guide.com


January 27, 2008

Capturing Content

Filed under: Enterprise Content Management — Admin @ 10:58 am

Capturing Content Under Enterprise Content Management

Enterprise content consists of:

  • Unstructured electronic content such as word-processed documents, spreadsheets, emails, instant messages and the like,
  • Rich media content such as audio, video and photographs,
  • Content captured through structured and unstructured electronic forms, and
  • Content in paper-based documents and forms.

All of these forms of content contribute to the knowledge base of the enterprise and could also constitute evidence in litigation. It is therefore important to bring all of the content under the control of a unified content management system. The process starts with capturing content in all of the above formats and sending it to a repository that is accessible from anywhere in the enterprise (or even by external entities such as suppliers and customers).

Electronic documents can go directly into the enterprise content repositories once they are categorized appropriately.
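As a rough illustration of categorizing electronic documents before they enter the repository, the sketch below sorts files into the content types listed earlier by file extension. The class names, the extension lists and the classify_electronic_document function are assumptions made for this example; a real system would more likely inspect the content itself.

```python
from pathlib import PurePath

# Hypothetical mapping from content class to file extensions (illustrative, not exhaustive).
CONTENT_CLASSES = {
    "unstructured": {".doc", ".docx", ".xls", ".xlsx", ".eml", ".txt"},
    "rich_media": {".mp3", ".wav", ".mp4", ".avi", ".jpg", ".png"},
    "forms": {".xml", ".xfdf"},
    "scanned_paper": {".tif", ".tiff"},
}


def classify_electronic_document(filename: str) -> str:
    """Return the content class for a file name, or 'uncategorized' if unknown."""
    suffix = PurePath(filename).suffix.lower()
    for content_class, extensions in CONTENT_CLASSES.items():
        if suffix in extensions:
            return content_class
    return "uncategorized"


if __name__ == "__main__":
    for name in ("Q4-report.XLSX", "site-visit.mp4", "README"):
        print(name, "->", classify_electronic_document(name))
```

Files that fall through to "uncategorized" would need a person, or a smarter classifier, to assign them a category before storage.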

Capturing Paper Documents

Moving paper around an enterprise with geographically widespread operations is not practical. The time and cost involved, the risk of loss or damage in transit, and security concerns all make such movement unsuitable.

Instead, paper documents are scanned into digital images, which are then converted into machine-readable text using technologies such as Optical Character Recognition (OCR). OCR reliability matters, because similar-looking characters (for example, the letter "O" and the digit "0") can be misrecognized, leaving the converted text unintelligible.
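One common safeguard is to check recognized text against what a field is expected to contain. The sketch below is a minimal example, assuming a field that should hold only digits (such as an invoice number). The confusion table and the repair_numeric_field function are hypothetical and not part of any OCR product.

```python
from typing import Tuple

# Hypothetical table of letters that OCR output may contain in place of digits.
CONFUSABLE_TO_DIGIT = {"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"}


def repair_numeric_field(raw: str) -> Tuple[str, bool]:
    """Return (repaired_text, needs_review) for a field expected to hold only digits.

    needs_review is True when a character was substituted or when the
    repaired text still contains something other than digits.
    """
    repaired = "".join(CONFUSABLE_TO_DIGIT.get(ch, ch) for ch in raw)
    substituted = repaired != raw
    still_invalid = not repaired.isdigit()
    return repaired, substituted or still_invalid


if __name__ == "__main__":
    print(repair_numeric_field("2O08"))  # letter O substituted, flagged for review
    print(repair_numeric_field("2008"))  # clean value, no review needed
```

Flagging every substitution for review keeps an operator in the loop rather than silently trusting the repair.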

This conversion of paper into electronic documents can take place centrally, or in a distributed fashion at the locations where the documents originate.

One major source of paper documents is the mailroom. Enterprise mailrooms can use advanced equipment that automates the process of extracting documents from envelopes, scanning them and converting scanned images into machine-readable text. With minimal operator intervention, the mail documents can be sorted and categorized for storage in the content repository, making them accessible from anywhere in the world.
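To make the sorting step concrete, here is a minimal sketch of keyword-based routing applied to the text recovered from a scanned mail item. The categories, the keyword lists and the categorize_mail function are assumptions for illustration; commercial mailroom systems use considerably more sophisticated classification.

```python
# Hypothetical routing rules: category name -> keywords that suggest it.
ROUTING_RULES = {
    "invoices": ("invoice", "amount due", "remittance"),
    "contracts": ("agreement", "hereinafter", "party"),
    "complaints": ("complaint", "dissatisfied", "refund"),
}


def categorize_mail(ocr_text: str, default: str = "manual_review") -> str:
    """Pick the category whose keywords occur most often in the text.

    Items with no keyword hits fall back to `default`, so an operator
    only has to look at the mail the rules cannot place.
    """
    text = ocr_text.lower()
    scores = {
        category: sum(text.count(keyword) for keyword in keywords)
        for category, keywords in ROUTING_RULES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


if __name__ == "__main__":
    print(categorize_mail("Invoice 123: amount due 40 USD"))
    print(categorize_mail("Hello there"))
```

The fallback category is what keeps operator intervention minimal: only unmatched items reach a person.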

Categorization and Indexing

The vast collection of documents in the enterprise repository would be useless unless each document is categorized in a meaningful way and the document collection is indexed.

Indexing attaches metadata to each document and makes it findable whenever it is relevant to a query. Users would then be able to search the enterprise content to extract the information they want.
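The idea can be sketched as a small inverted index over document metadata. This is a minimal in-memory example, assuming metadata arrives as simple field and value pairs. The MetadataIndex class and its method names are hypothetical, and a production repository would use a dedicated search engine.

```python
from collections import defaultdict


class MetadataIndex:
    """Toy inverted index: (field, token) -> set of document ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, metadata):
        """Index every whitespace-separated token of every metadata value."""
        for field, value in metadata.items():
            for token in str(value).lower().split():
                self._postings[(field, token)].add(doc_id)

    def search(self, **criteria):
        """Return ids of documents matching all given field=value criteria."""
        result = None
        for field, value in criteria.items():
            for token in str(value).lower().split():
                matches = self._postings.get((field, token), set())
                result = set(matches) if result is None else result & matches
        return result or set()


if __name__ == "__main__":
    index = MetadataIndex()
    index.add("doc-1", {"type": "invoice", "supplier": "Acme Ltd"})
    index.add("doc-2", {"type": "invoice", "supplier": "Globex"})
    print(index.search(type="invoice", supplier="acme"))
```

Because lookups go through the index rather than the documents themselves, a query touches only the entries for its own terms, however large the repository grows.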

Enterprise Content Management systems can store all enterprise data in a separate repository of their own, such as a data warehouse, or use XML-based tools to make data held in the disparate repositories of different applications look as though it comes from a single source.
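The second approach can be illustrated with a short sketch that maps records from two differently shaped sources into one common XML structure and then queries them together. The source names "crm" and "erp", the record layouts, and the functions to_common_xml, unified_view and find_titles are all assumptions made for this example, which uses only Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET


def to_common_xml(source, doc_id, title):
    """Wrap one record in a shared <document> element."""
    doc = ET.Element("document", {"source": source, "id": doc_id})
    ET.SubElement(doc, "title").text = title
    return doc


def unified_view(crm_rows, erp_records):
    """Merge two hypothetical repositories into a single XML tree.

    crm_rows: iterable of (id, title) tuples.
    erp_records: iterable of dicts with 'ref' and 'name' keys.
    """
    root = ET.Element("documents")
    for row_id, title in crm_rows:
        root.append(to_common_xml("crm", str(row_id), title))
    for record in erp_records:
        root.append(to_common_xml("erp", record["ref"], record["name"]))
    return root


def find_titles(root, word):
    """Return (source, title) pairs whose title contains `word`, ignoring case."""
    needle = word.lower()
    return [
        (doc.get("source"), doc.findtext("title"))
        for doc in root.iter("document")
        if needle in doc.findtext("title").lower()
    ]


if __name__ == "__main__":
    view = unified_view(
        [(1, "Supplier contract")],
        [{"ref": "E-7", "name": "Contract renewal"}],
    )
    print(find_titles(view, "contract"))
```

The caller searches one tree and never needs to know how each underlying application stores its records.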

Either way, authorized users can then access relevant enterprise content from their workstations or mobile devices, wherever they happen to be located.

It is important to provide such access and improve findability of the content to enable informed decision making across the enterprise as a whole.

In an Enterprise Content Management environment, capturing content is not a simple task. The capture functionality must deal with issues such as huge volumes, varying document formats, numerous points of origin, different languages, and documents that are difficult to scan or read.

The strategies and tools selected for content capture must tackle these problems, and must be tailored to the specific needs and environment of each organization.