Content ETL

Industrializing your transfers and your transformations

"Content ETL is to documents what standard ETL is to structured data!!"
Nicolas Maquaire, EntropySoft CEO

 


Content ETL

Traditionally, ETL software performs Extract, Transform and Load operations between structured data repositories such as databases.

EntropySoft’s Content ETL holds a leading position on the market as the first scalable and fully flexible ETL product for unstructured data.


Content ETL handles all the content transfers that are required in today’s complex information systems.

The main use cases for EntropySoft’s Content ETL product are:

- on-demand content transfers
- intra-enterprise content transfers on a very regular basis
- cloud repository integration and synchronization

With Content ETL you can:
- simplify content transfers between previously non-compatible content repositories
- graphically design and plan document transfers between repositories by building "content bridges"
- map metadata, security and properties in any repository
- delegate transfer management to end-users

Content ETL uses EntropySoft’s exclusive portfolio of read/write connectors and Content Federation Server to interact with more than 30 different content repositories. Because the connectors expose most of the features of popular content-centric applications, Content ETL can rely on a vast set of features to manage any kind of transfer and to move complex objects securely from one content repository to another.

Content ETL consists of two clients of EntropySoft’s Content Federation Server: Content ETL Studio and Content ETL Web.

Content ETL Studio is used to design the content transfer processes, or "bridges", between repositories. Content ETL Web is a simple user interface that lets anybody browse all repositories easily and move documents between them effortlessly.

Content ETL can build "content bridges" between incompatible content repositories, so as to move content from one repository to another on a permanent basis. A bridge is a series of logical steps that manage the transfer of a complex information "parcel". Bridges can be tested and visually debugged, and steps can be added to ensure seamless content transfers. Bridges are designed through a user-friendly graphical interface and stored in the content hub. They can easily be duplicated and elaborated upon to enable new transfers. Once the bridges are ready, they can be published to Content Federation Server.

Source and target applications can be of very different types: document-centric applications, messaging systems, records management platforms, collaborative frameworks, CSV files, XML files… Sources and targets can either be on-premise, in the cloud, or both.

Content ETL takes the documents from the source application and puts them through a number of basic operations before finally injecting the result into the target application. Each of the basic operations (or “stages”) can either be dragged and dropped from a palette or specifically developed. Stages can include time stamping, PDF rendering, data de-duplication, categorization, etc.
Each of the stages can be monitored and graphically debugged. Visual debugging enables you to place a breakpoint on a particular stage and visually check data before and after processing. Detailed exception reports are available.
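The idea of a bridge as a chain of stages can be illustrated with a short sketch. This is not EntropySoft's API: the `Document` class, the stage functions and `run_bridge` are hypothetical names invented for illustration, and the source and target repositories are reduced to plain Python lists.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Iterable, Optional


@dataclass
class Document:
    """Hypothetical stand-in for a document travelling through a bridge."""
    name: str
    content: bytes
    metadata: dict = field(default_factory=dict)


# A stage receives a document and returns it (possibly modified),
# or returns None to drop it from the transfer.
Stage = Callable[[Document], Optional[Document]]


def timestamp_stage(doc: Document) -> Document:
    """Record the transfer time (UTC) in the document metadata."""
    doc.metadata["transferred_at"] = datetime.now(timezone.utc).isoformat()
    return doc


def make_dedup_stage() -> Stage:
    """Build a stage that drops documents whose content was already seen."""
    seen = set()

    def dedup(doc: Document) -> Optional[Document]:
        digest = hashlib.sha256(doc.content).hexdigest()
        if digest in seen:
            return None
        seen.add(digest)
        return doc

    return dedup


def run_bridge(source: Iterable[Document], stages: list, target: list) -> None:
    """Pass every source document through the stages, then load it into the target."""
    for doc in source:
        current = doc
        for stage in stages:
            current = stage(current)
            if current is None:  # a stage dropped the document
                break
        if current is not None:
            target.append(current)


source = [
    Document("a.txt", b"hello"),
    Document("b.txt", b"hello"),  # same content as a.txt
    Document("c.txt", b"world"),
]
target = []
run_bridge(source, [make_dedup_stage(), timestamp_stage], target)
print([d.name for d in target])  # ['a.txt', 'c.txt']
```

Because every stage has the same shape, adding a new operation to the bridge amounts to inserting one more function in the list, which mirrors dropping a new stage into the graphical representation.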

Content ETL has full capabilities for monitoring and auditing transfers, including JMX monitoring. This live monitoring allows you to adapt your processes and maximize performance. Processes can also be multi-threaded for better performance.


Content ETL keeps track of all processed documents and supplies audit trails. Thanks to this unique feature, it is now possible to trace the whole cross-application transfer history of a document. By federating the transfer logs over time, the whole life-cycle of a document can be traced. This full traceability is a key feature for information management as well as compliance.
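Federating transfer logs can be pictured as merging the log of each bridge into one time-ordered history per document. The sketch below assumes a made-up log format (dictionaries with `doc_id`, `at`, `source` and `target` keys) and a hypothetical `federate_logs` helper; the actual audit-trail format of the product is not described here.

```python
from collections import defaultdict
from datetime import datetime


def federate_logs(*logs):
    """Merge per-bridge transfer logs into one time-ordered history per document."""
    history = defaultdict(list)
    for log in logs:
        for entry in log:
            history[entry["doc_id"]].append(entry)
    for entries in history.values():
        entries.sort(key=lambda e: e["at"])
    return dict(history)


# Two illustrative logs, one per bridge (all values are invented).
crm_to_ecm = [
    {"doc_id": "D-1", "at": datetime(2010, 1, 5), "source": "crm", "target": "ecm"},
]
ecm_to_rm = [
    {"doc_id": "D-1", "at": datetime(2010, 3, 2), "source": "ecm", "target": "rm"},
]

trail = federate_logs(ecm_to_rm, crm_to_ecm)
path = [(e["source"], e["target"]) for e in trail["D-1"]]
print(path)  # [('crm', 'ecm'), ('ecm', 'rm')]
```

The order in which the logs are passed does not matter, since each document's entries are sorted by transfer time before being returned.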

Content ETL’s benefits for your company are:
- easier content transfers between repositories
- better information life-cycle control
- stronger compliance with rules and regulations

EntropySoft brings Content Management Intelligence to its customers.

 

 

Content ETL for Records Management

Every day, the number of content silos used in a company increases. Each department wants to have its own wikis or collaboration application, and each new vertical solution (BPM, CRM system, etc.) comes with its own content repository. Mergers, acquisitions, the introduction of new applications or the use of cloud repositories all contribute to the multiplication of content repositories. Products and applications come in all shapes and sizes and cannot work with each other because of technological differences.

The scattering of documents and information in various non-compatible content silos is increasing content fragmentation in companies.

At the same time, Records Management (RM) rules and regulations are frequently changing and becoming more complex.

The following RM best practices increase compliance:

- keeping all information in the RM platform identical to that in the original repository.

- making the lifecycle of a document fully traceable.

- using one single technology to connect all applications to the RM platform.

- being able to add specific RM operations such as timestamping.

Content ETL automatically implements all of the above for more than 30 applications.

- Full mapping of metadata / permissions / aspects etc… of source and target repositories is provided.

- Content ETL keeps track of all processed documents and supplies audit trails.

Thanks to this unique feature, it is now possible to trace the whole cross-application transfer history of a document. By federating the transfer logs over time, the whole life-cycle of a document can be traced.

- Content ETL can connect more than 30 applications to the RM platform, and all connectors are built with a single API, which ensures content homogeneity.

Content ETL for Cloud integration

The development of Software as a Service (SaaS) means that more and more companies work with document repositories that are no longer managed “in-house”. Using external document repositories is the fastest way to deliver document-based solutions without the pain of deploying and maintaining complex architectures internally. Additionally, a cloud strategy is very common when mergers and acquisitions happen: organizations with different IT architectures need to pool their resources as fast as possible.

External document repositories increase content fragmentation and the need to exchange documents securely and easily across the firewall. The use of cloud repositories must go hand in hand with a strategy for easy transfers between cloud and corporate repositories.

EntropySoft's Standalone Content ETL is specifically designed to be an ideal product for such transfers.

In all companies, the most common practice for sharing information inside or outside the company is to send an e-mail with an attachment. It is common to first download a document from a content-centric application and then mail it. The receiving party uploads and saves it in its own repository of choice.
This common practice highlights the dangers of not managing company-wide content transfers. Taking the document out of its initial repository comes with major risks regarding permissions, lifecycle management and mail server overload. A protected document extracted from its initial repository becomes a non-protected document. The complete history of the document is left behind or lost. Knowledge Management becomes impossible.

Content ETL is the best product in its category to manage the cross-repository document lifecycle. It can manage real-time, on-demand or scheduled transfers between more than 30 applications, without losing permissions and history.

Applications can be fundamentally different, especially with regard to permissions, metadata models or users and groups directories. Content ETL deploys “bridges” between repositories. The role of the bridges is first to map the different models (permissions, metadata, users and groups…) and then to manage the actual transfer.
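As a rough illustration of what mapping the different models involves, the sketch below translates metadata fields, permission levels and principals from a source vocabulary to a target vocabulary. All field names, permission names and user names are invented for the example, and the mapping tables stand in for whatever a real bridge would be configured with.

```python
# Hypothetical mapping tables: source vocabulary -> target vocabulary.
METADATA_MAP = {"Title": "title", "Author": "creator", "Modified": "last_modified"}
PERMISSION_MAP = {"read": "viewer", "write": "editor", "admin": "manager"}
PRINCIPAL_MAP = {"CORP\\alice": "alice@example.com", "CORP\\sales": "group:sales"}


def map_metadata(source_metadata):
    """Rename known fields; report unknown ones instead of dropping them silently."""
    mapped, unmapped = {}, []
    for key, value in source_metadata.items():
        if key in METADATA_MAP:
            mapped[METADATA_MAP[key]] = value
        else:
            unmapped.append(key)
    return mapped, unmapped


def map_permissions(acl):
    """Translate (principal, permission) pairs.

    Raises KeyError on an unknown principal or permission, so that a
    security entry is never lost without notice.
    """
    return [(PRINCIPAL_MAP[who], PERMISSION_MAP[level]) for who, level in acl]


mapped, unmapped = map_metadata({"Title": "Q4 report", "Author": "Alice", "Color": "red"})
print(mapped)    # {'title': 'Q4 report', 'creator': 'Alice'}
print(unmapped)  # ['Color']

acl = map_permissions([("CORP\\alice", "write"), ("CORP\\sales", "read")])
print(acl)       # [('alice@example.com', 'editor'), ('group:sales', 'viewer')]
```

Failing loudly on an unmapped permission is a design choice made for this sketch: for security data, stopping the transfer is usually safer than guessing a default.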

Thanks to EntropySoft’s state-of-the-art connectors, the bridges can be deployed between more than 30 enterprise content repositories including Microsoft SharePoint, Documentum, Google Docs, IBM FileNet, Open Text LiveLink, Alfresco and many others.

If requirements include data cleansing, de-duplication or rendering, Content ETL includes a large selection of pre-programmed actions that can easily be added in the content bridge. Adding the required features is very easy, since it is done by dragging and dropping a new stage in the graphical representation of the bridge.

Standalone Content ETL also has the ability to dynamically track changes in a repository and deliver a list of added, updated and deleted documents. Changes to content, permissions and metadata are taken into account immediately. This is a vital feature for all companies that want to have an accurate view of their content and take real-time actions, such as immediate Records Management, to minimize the risk of critical data loss.
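One simple way to picture a list of added, updated and deleted documents is to compare two snapshots of a repository. The sketch below assumes each snapshot is a dictionary mapping a document ID to a fingerprint of its content, metadata and permissions; both the snapshot format and the `diff_snapshots` helper are hypothetical, and the product may well detect changes by other means.

```python
def diff_snapshots(previous, current):
    """Compare two {doc_id: fingerprint} snapshots and classify the changes."""
    added = sorted(current.keys() - previous.keys())
    deleted = sorted(previous.keys() - current.keys())
    updated = sorted(
        doc_id
        for doc_id in current.keys() & previous.keys()
        if current[doc_id] != previous[doc_id]
    )
    return {"added": added, "updated": updated, "deleted": deleted}


# Fingerprints are (content hash, metadata version, permission version); values are invented.
previous = {"D-1": ("h1", 1, 1), "D-2": ("h2", 1, 1), "D-3": ("h3", 1, 1)}
current = {"D-1": ("h1", 1, 1), "D-2": ("h2", 1, 2), "D-4": ("h4", 1, 1)}

changes = diff_snapshots(previous, current)
print(changes)  # {'added': ['D-4'], 'updated': ['D-2'], 'deleted': ['D-3']}
```

Here D-2 counts as updated even though its content hash is unchanged, because its permission version differs, which matches the idea that permission and metadata changes are tracked alongside content changes.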

Standalone Content ETL comes with two scheduling options for transferring documents: pre-programmed transfers on a regular basis or real-time immediate transfers.

Standalone Content ETL is an all-in-one version of Content ETL. It combines an embedded Content Federation Server with Content ETL so as to simplify product download and deployment.

Customers deploying Standalone Content ETL can, at any time, upgrade their deployment by adding a separate EntropySoft Content Federation Server. The full architecture gives customers a third scheduling option: the ability to manage on-demand transfers with Content ETL Web.

By using a pre-defined content transfer bridge, SaaS companies can now offer their customers a downloadable package that can be easily deployed, so as to seamlessly integrate any corporate content repository with their online service, while preserving permissions and delivering full traceability.


Standalone Content ETL’s immediate benefits are easier content transfers between repositories, better information life-cycle control and stronger compliance with rules and regulations.



Benefits of Content ETL

Easy transfers between incompatible repositories

Cloud repository integration and on-demand transfers

Full information transfer tracking and increased security

Better compliance and corporate governance

 

 


Available connectors

Alfresco
DS Enovia MatrixOne
EMC Documentum CenterStage
EMC Documentum Cont. Serv.
EMC Documentum eRoom
File Systems NTLM
FTP
HP Trim Context (TowerSoft)
IBM DB2 Content Manager OD
IBM Content Manager
IBM FileNET Doc. Services
IBM FileNET Image Services
IBM FileNET ISRA
IBM FileNET P8
IBM Lotus Notes
IBM Lotus Quickplace
IBM Lotus QuickR
IBM WebSphere Portal PDM
Interwoven TeamSite
Interwoven WorkSite NT
I.R.I.S. Archea
JCR
Microsoft Exchange CDO
Microsoft Exchange WebStorage
Microsoft SharePoint 2003
Microsoft SharePoint 2007
Microsoft SharePoint 2010
Open Text eDocs (Humm. DM)
Open Text LiveLink
Open Text Vignette
Oracle Stellent Universal CM
WebDAV server
Xerox Docushare

Updated: January 2010