Metadata To File
Use the Metadata to File process node to extract metadata from incoming files and jobs and store that information in a separate file. You can output to a variety of metadata file formats to fit your own specific needs.
For each Metadata to File node you add to a workflow, you specify the following:
- The type of file in which to store that information.
- The format of the new metadata file’s extension.
- Whether to output the original file.
The newly-created metadata file is often used for:
- Record keeping.
- Import into other systems for search purposes.
- Routing to other nodes in a Dispatcher workflow.
Using the Metadata to File Node
To open the Metadata to File properties window, add a Metadata to File process node to a workflow and double-click on it. The following illustration shows the Metadata to File properties window.
Note: The illustration may not display all possible metadata.
General Settings
-
Enabled - To enable this node in the current workflow, check this box. By default, the box is checked. If you uncheck it, the workflow ignores the node and documents pass through as if the node were not present; the node is then displayed with a red X in the workflow. Note that a disabled node does not check for logic or error conditions.
-
Node Name - This editable field contains a default node name that appears in the workflow under the node icon. Use this field to specify a meaningful name for the node that indicates its use in the workflow.
-
Description - Enter an optional description for this node. A description can help you remember the purpose of the node in the workflow or distinguish nodes from each other.
-
Help - To access Dispatcher Online Help, click this button.
-
Save - To preserve your node definition and exit the window, click this button.
-
Cancel - To exit the window without saving any changes, click this button.
Output Settings
This area contains the following options:
-
Output the original file - To output the original file along with the newly created metadata file, check the box. To output only the metadata, uncheck this box. By default, this box will be checked.
-
Metadata file format - Choose the format of the file in which to store the metadata. You have the following options:
- XML (Extensible Markup Language)
- INI (Initialization)
- CSV (Comma-Separated Values)
- JSON (JavaScript Object Notation)
- Custom - For more information, see the Creating Custom Output Files section below.
- Custom (Batch) - This option allows for the creation of a single CSV file that contains all metadata from all processed documents in the batch.
-
Metadata file extension - Select the extension for the newly created metadata file. Choices are:
-
Output file extension plus metadata file extension - This option appends the metadata file’s extension to the output file’s full name, including its extension. For example, if you choose this option and the INI format, and the node processes a file named “123.TIFF”, the newly created metadata file is named “123.TIFF.INI”.
-
Metadata file extension only - This option replaces the output file’s extension with the metadata file’s extension. For example, if you choose this option and the INI format, and the node processes a file named “123.TIFF”, the newly created metadata file is named “123.INI”.
Note: The newly created metadata file uses the same file name as the output file.
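The two naming options above can be sketched as a small Lua function. This is only an illustration of the rule as described; the function name metadata_filename and its parameters are hypothetical and are not part of Dispatcher Stratus.

```lua
-- Hypothetical helper that mirrors the two "Metadata file extension" options.
-- keep_output_ext = true  -> "Output file extension plus metadata file extension"
-- keep_output_ext = false -> "Metadata file extension only"
local function metadata_filename(output_name, metadata_ext, keep_output_ext)
  if keep_output_ext then
    return output_name .. "." .. metadata_ext
  end
  -- Drop the last extension, if there is one, before adding the new one.
  local base = output_name:match("^(.+)%.[^%.]+$") or output_name
  return base .. "." .. metadata_ext
end

print(metadata_filename("123.TIFF", "INI", true))   -- 123.TIFF.INI
print(metadata_filename("123.TIFF", "INI", false))  -- 123.INI
```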
Select Metadata to Extract
This area lists all of the metadata types available within the workflow. For example, if you have added an Advanced OCR node to the workflow with zones defined, a check box for Advanced OCR will appear. Check the box next to one or more metadata sets you want to extract to a separate file.
Options include:
- Advanced OCR
- Annotate
- Collect from Dispatcher Phoenix - (While this node does not generate any of its own metadata, the metadata will be available because of this node.)
- Dropbox In
- External Form
- Forms
- Internal Form
- Job History
- Job Notes
- Microsoft Exchange In
- ODBC
- Upland InterFAX In
Note: For more information about how to use metadata in other nodes, see Metadata Browsing.
Extracting Metadata Generated in Dispatcher Phoenix
The Collect from Dispatcher Phoenix node can import metadata from Dispatcher Phoenix workflows into Dispatcher Stratus. However, note the following:
-
At this time, only file-level and page-level metadata is passed between Dispatcher Stratus and Dispatcher Phoenix.
-
When sharing metadata between Dispatcher Phoenix and Dispatcher Stratus, metadata keys generated by one system will not be available for selection in the other system’s Metadata to File node. Instead, select “Custom” at the Metadata File Format field, then create a custom Lua script to export the metadata.
-
Metadata keys generated in Dispatcher Phoenix and collected by Dispatcher Stratus must be manually edited in Dispatcher Stratus for use in workflows. Conversely, metadata keys flowing from Dispatcher Stratus do not need to be modified in Dispatcher Phoenix.
Structure of XML File
XML files consist of one root element named <file> with the following required attributes:
-
name - the name of the file that this XML document is associated with.
-
size - the size of the ‘name’ file, in bytes.
-
mtime - the time the ‘name’ file was last modified, in seconds since the UNIX epoch.
The <file> element may contain one or more <meta> elements. No other XML element may exist as a direct child of <file>.
The <meta> element has two required attributes:
-
group - A short, friendly descriptor that identifies where the variable came from.
-
name - The variable name, either defined by the system or by the Index Form designer.
The <meta> element may have one of two possible child elements:
-
document - Contains the variable’s value.
-
page - This element contains the variable’s value as it relates to specific pages.
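As a rough illustration of the structure described above, the following Lua sketch builds a <file> document that holds one document-level <meta> entry. It is an assumption-laden sketch, not the node's actual output: the helper names xml_escape and build_xml are hypothetical, the whitespace and element order are guesses, and the <page> element is left out because its exact layout is not described here.

```lua
-- Escape the five characters that are special in XML text and attributes.
local function xml_escape(s)
  local map = { ["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;",
                ['"'] = "&quot;", ["'"] = "&apos;" }
  return (tostring(s):gsub("[&<>\"']", map))
end

-- Build a <file> root with one <meta> child per entry.
-- `info` and `entries` are hypothetical plain tables, not Dispatcher objects.
local function build_xml(info, entries)
  local lines = {}
  lines[#lines + 1] = string.format('<file name="%s" size="%d" mtime="%d">',
    xml_escape(info.name), info.size, info.mtime)
  for _, e in ipairs(entries) do
    -- Only the document-level child is shown; <page> is omitted on purpose.
    lines[#lines + 1] = string.format(
      '  <meta group="%s" name="%s"><document>%s</document></meta>',
      xml_escape(e.group), xml_escape(e.name), xml_escape(e.value))
  end
  lines[#lines + 1] = "</file>"
  return table.concat(lines, "\n")
end

local xml = build_xml(
  { name = "sample-file.pdf", size = 205491, mtime = 1622651543 },
  { { group = "pdf", name = "editable", value = "true" } })
print(xml)
```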
Structure of an INI File
The format of an INI file follows this structure:
[file]
name=sample-file.pdf
size=205491
ctime=0
mtime=1622651543
[group]
metadata-variable-name1=metadata-value1
metadata-variable-name2=metadata-value2
Note: ctime refers to the time the file was created, and mtime refers to the time the file was last modified. As in the XML format, mtime is expressed in seconds since the UNIX epoch.
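If a downstream system needs to read this INI layout back, a small parser along the following lines may be enough. This is a minimal sketch that assumes one key=value pair per line and no comments or escaping; the function name parse_ini is hypothetical.

```lua
-- Parse INI text into nested tables: result[section][key] = value.
local function parse_ini(text)
  local result, section = {}, nil
  for line in text:gmatch("[^\r\n]+") do
    local name = line:match("^%[(.-)%]%s*$")
    if name then
      -- A "[section]" line starts a new group of keys.
      section = {}
      result[name] = section
    elseif section then
      local key, value = line:match("^([^=]+)=(.*)$")
      if key then section[key] = value end
    end
  end
  return result
end

local sample = [=[
[file]
name=sample-file.pdf
size=205491
ctime=0
mtime=1622651543
[group]
metadata-variable-name1=metadata-value1
metadata-variable-name2=metadata-value2
]=]

local ini = parse_ini(sample)
print(ini.file.name, tonumber(ini.file.mtime))
print(ini.group["metadata-variable-name1"])
```

All values are returned as strings, so numeric fields such as size and mtime need tonumber before they are used in arithmetic.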
Structure of a CSV File
The format of a CSV file follows this structure:
file,name,sample-file.pdf
file,size,205491
file,ctime,0
file,mtime,1622651543
group,metadata-variable-name1,"metadata-value1"
group,metadata-variable-name2,"metadata-value2"
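The three-column layout above (group, name, value) can be read back with a short Lua routine. The sketch below assumes that values contain no embedded line breaks and that a quoted value escapes a double quote by doubling it; neither detail is confirmed by this page, and the function name parse_metadata_csv is hypothetical.

```lua
-- Parse "group,name,value" rows into result[group][name] = value.
local function parse_metadata_csv(text)
  local result = {}
  for line in text:gmatch("[^\r\n]+") do
    -- The first two columns never contain commas in this layout;
    -- everything after the second comma is treated as the value.
    local group, name, value = line:match("^([^,]*),([^,]*),(.*)$")
    if group then
      local quoted = value:match('^"(.*)"$')
      if quoted then
        value = quoted:gsub('""', '"')  -- assumed escaping rule
      end
      result[group] = result[group] or {}
      result[group][name] = value
    end
  end
  return result
end

local rows = parse_metadata_csv(table.concat({
  'file,name,sample-file.pdf',
  'file,size,205491',
  'group,metadata-variable-name1,"metadata-value1"',
  'group,metadata-variable-name2,"value, with a comma"',
}, "\n"))

print(rows.file.name)
print(rows.group["metadata-variable-name2"])
```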
Structure of JSON File
The format of a JSON file follows this structure:
{name:'', size:0, ctime:0, mtime:0, meta: [
{group:'', name:'', values: {
Document = {doc: %VALUE%}
Per Page = {%PAGE%: %VALUE%}
}}
]
}
For example:
{
"name" : "pdf_form_maker1_new.pdf",
"size" : 58750,
"ctime" : 0,
"mtime" : 1427808953,
"meta" : [
{
"group" : "pdf",
"name" : "editable",
"values" : {
"doc" : "true"
}
}
]
}
Creating Custom Output Files
Dispatcher Stratus also gives you the option to create your own file format in which to store metadata. This is useful if you need to control the output of the file to fit your system, for example when the other standard formats such as XML will not work with the content management system or other application that you are working with.
To create a customized file format, choose the Custom option from the Metadata file format drop-down list. Once selected, the Select metadata to extract box containing the various metadata types disappears and is replaced by the Create script to extract metadata box. This text box provides syntax highlighting for Lua and is pre-populated with a very simple example script that you can use to get started. See the following illustration:
Export Function
The Lua script must implement a function called “export” that accepts the following two arguments:
-
file (representing the basic file information and accessor for file metadata)
-
file.name
-
file.ext
-
file.fullname
-
file.size
-
file.mtime
-
file.ctime
-
file.{metadata group name} (e.g., pdf, bates, ocr)
-
file.pdf.author
-
file.pdf['author']
-
file.bates.Stamp1
-
file.ocr['zone.MyZone'][1] = value of zone.MyZone on 1st page
-
file.ocr['zone.MyZone'][2] = value of zone.MyZone on 2nd page
Note: If the metadata name contains periods, it must be wrapped in brackets and quotes, like ["zone.MyZone"].
-
jid (representing user information)
-
jid.user
-
export all {user:***} as variables
-
Examples: jid.user.name, jid.user.domain, jid.user.email (if defined)
-
jid.fs
-
export all {fs:***} as variables
-
Examples: jid.fs.DesktopDirectory, jid.fs.CommonApplicationData
The “export” function should return an instance of ‘File.new()’ or ‘nil’. To create your output file, use the following syntax:
local out = File.new("my filename here")
OR
local out = File.new(file.fullname)
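Putting the pieces above together, an export function might look like the following sketch. Several things here are assumptions rather than documented behavior: the File table at the top is a stand-in so the script runs in a plain Lua interpreter (inside the node, File is supplied by Dispatcher Stratus and the stub should be left out), methods are assumed to be called with colon syntax, and the sample file and jid tables contain made-up values.

```lua
-- Stand-in for the File object that the node provides (hypothetical stub).
local File = {}
File.__index = File
function File.new(name)
  return setmetatable({ name = name, buffer = {}, newline = "\n" }, File)
end
function File:writeln(data)
  self.buffer[#self.buffer + 1] = data .. self.newline
end

-- The node looks for a function named "export" taking (file, jid).
function export(file, jid)
  -- Return nil when the PDF metadata group is missing.
  if not file.pdf then return nil end

  local out = File.new(file.fullname)
  out:writeln("name=" .. file.name)
  out:writeln("size=" .. tostring(file.size))
  out:writeln("author=" .. tostring(file.pdf.author))

  -- Names that contain periods need bracket-and-quote access.
  local zone = file.ocr and file.ocr["zone.MyZone"]
  if zone then
    out:writeln("zone.MyZone.page1=" .. tostring(zone[1]))
  end

  out:writeln("user=" .. tostring(jid.user.name))
  return out
end

-- Made-up input, used only to exercise the sketch outside the node.
local sample_file = {
  name = "invoice.pdf", ext = "pdf", fullname = "invoice.pdf",
  size = 58750, mtime = 1427808953, ctime = 0,
  pdf = { author = "A. Author" },
  ocr = { ["zone.MyZone"] = { "INV-001", "INV-002" } },
}
local sample_jid = { user = { name = "jdoe" }, fs = {} }

local result = export(sample_file, sample_jid)
io.write(table.concat(result.buffer))
```

What the node does when export returns nil is not described on this page, so treat the early return as a placeholder for whatever handling fits your workflow.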
File Object Methods
Here are the file object methods that you can use:
-
write("some data")
-
writeln("some data that has a newline appended to it")
-
ext("change file extension")
-
ext("txt")
-
ext("xml")
-
eol("change newline character")
-
eol("\n")
-
eol("\r\n")
There is also a global function called ‘Print’ that can be used to log messages.
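The following sketch shows how the four file object methods might be combined. The File table is again a hypothetical stand-in written only so the example runs outside Dispatcher Stratus; its internals (a name field, a buffer, and how ext rewrites the name) are assumptions, as is the colon call syntax.

```lua
-- Hypothetical stub that imitates the documented method names.
local File = {}
File.__index = File
function File.new(name)
  return setmetatable({ name = name, buffer = {}, newline = "\n" }, File)
end
function File:write(data) self.buffer[#self.buffer + 1] = data end
function File:writeln(data) self:write(data .. self.newline) end
function File:eol(chars) self.newline = chars end
function File:ext(extension)
  -- Replace an existing extension, or append one if there is none.
  local base = self.name:match("^(.*)%.[^%.]*$") or self.name
  self.name = base .. "." .. extension
end

local out = File.new("123.TIFF")
out:ext("txt")                  -- the stub renames the file to 123.txt
out:eol("\r\n")                 -- Windows-style line endings from here on
out:writeln("invoice=INV-001")  -- data plus the current newline sequence
out:write("total=")             -- no newline appended
out:write("42.50")
out:writeln("")                 -- finish the line
print(out.name)
```

Calling eol before the first writeln keeps line endings consistent across the whole file, which matters when the receiving system is strict about them.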
Example
In a workflow that extracts data from invoices, the following XML file is created:
However, the format and content of this XML will not work with a customer’s existing system. In this case, the following custom file was created:
The following custom script was used to create this custom file: