Salesforce
Connect Salesforce to your preprocessing pipeline, and use the Unstructured Ingest CLI or the Unstructured Ingest Python library to batch process all your documents and store structured outputs locally on your filesystem.
The requirements are as follows.
- Your Salesforce username. To get this username, do the following:
  - Log in to your Salesforce account.
  - In the top navigation bar, click the Quick Settings (gear) icon, and then click Open Advanced Setup.
  - In the Home tab, under Administration, expand Users, and then click Users.
  - Note the User Name value (not the Name value) for your user.
- The names of the Salesforce categories (objects) that you want to access, specified as a comma-separated list. Available categories include Account, Campaign, Case, EmailMessage, and Lead.
- A Salesforce connected app in your Salesforce account.

  If you do not already have a Salesforce connected app, start by creating or getting the certificate (.crt) and private key (.pem) files that you will associate with the connected app. If you do not have these files, and if your organization allows it, you can use the openssl utility on your local machine to create your own private key and self-signed certificate by running the following commands, one command at a time:

  You can change the example filenames in these commands as needed. Be sure to store the generated files in a secure location.
  To create a Salesforce connected app, do the following:

  - Log in to your Salesforce account.
  - In the top navigation bar, click the Quick Settings (gear) icon, and then click Open Advanced Setup.
  - In the Home tab, under Platform Tools, expand Apps, and then click App Manager.
  - Click New Connected App.
  - With Create a Connected App selected, click Continue.
  - At a minimum, fill in the following, and then click Save:
    - Connected App Name
    - API Name (can be the same as Connected App Name, but do not use spaces or punctuation)
    - Contact Email
    - Under API (Enable OAuth Settings), check Enable OAuth Settings.
    - For Callback URL, entering https://localhost is okay if you won’t be using this connected app for other special authentication scenarios.
    - Check Use digital signatures, click Choose File, and browse to and select your certificate (.crt) file.
    - For Selected OAuth Scopes, move the following entries from the Available OAuth Scopes list to the Selected OAuth Scopes list:
      - Manage user data via APIs (api)
      - Perform requests on your behalf at any time (refresh_token, offline_access)
    - Uncheck Require Proof Key for Code Exchange (PKCE) Extension for Supported Authorization Flows.
    - Leave Require Secret for Web Server Flow checked.
    - Leave Require Secret for Refresh Token Flow checked.
    - Check Enable Authorization Code and Credentials Flow.
  - On the connected app’s details page, click Manage, click Edit Policies, set the following under OAuth Policies, and then click Save:
    - Set Permitted Users to All users may self-authorize.
    - Set IP Relaxation to Relax IP restrictions.
    - Set Refresh Token Policy to Refresh token is valid until revoked.
- The OAuth consumer key (client ID) for the Salesforce connected app.
  To get the Salesforce connected app’s consumer key, do the following:
  - Log in to your Salesforce account.
  - In the top navigation bar, click the Quick Settings (gear) icon, and then click Open Advanced Setup.
  - In the Home tab, under Platform Tools, expand Apps, and then click App Manager.
  - In the list of apps, click the arrow next to the target connected app, and click View.
  - Click Manage Consumer Details.
  - Complete the on-screen security verification.
  - Note the Consumer Key value.
- A one-time approval of the Salesforce connected app, which you must do with your Salesforce account by using the app's consumer key and callback URL. To do this, while you are logged in to your Salesforce account, browse to the following URL, replacing <client-id> with the consumer key value (this URL assumes that the callback URL is https://localhost):

- The contents of the private key (.pem) file as a string. To ensure maximum compatibility across Unstructured service offerings, give Unstructured a string that contains the contents of the private key file, not the private key file itself.

  To print this string for copying, you can run one of the following commands from your Terminal or Command Prompt. In these commands, replace <path-to-private-key-file> with the path to the private key file.

  - For macOS or Linux:
  - For Windows:
- The Salesforce connector dependencies:

  You might also need to install additional dependencies, depending on your needs.
- The following environment variables:
  - SALESFORCE_USERNAME - The Salesforce username that has access to the required Salesforce categories, represented by --username (CLI) or username (Python).
  - SALESFORCE_CONSUMER_KEY - The consumer key (client ID) for the Salesforce connected app, represented by --consumer-key (CLI) or consumer_key (Python).
  - SALESFORCE_PRIVATE_KEY - The contents of the private key (PEM) associated with the consumer key for the Salesforce connected app, represented by --private-key (CLI) or private_key (Python), or
  - SALESFORCE_PRIVATE_KEY_PATH - The local path to the private key (PEM) file associated with the consumer key for the Salesforce connected app, represented by --private-key-path (CLI) or private_key_path (Python).
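Before you call the CLI or the Python library, you might want to check these requirements programmatically. The following Python sketch is an illustration only and is not part of the Unstructured Ingest library: the names SalesforceSettings, parse_categories, load_private_key, and load_settings are hypothetical, and it assumes that the five categories listed above are the only accepted ones. It reads the environment variables described above, prefers SALESFORCE_PRIVATE_KEY over SALESFORCE_PRIVATE_KEY_PATH, and checks that the key text looks like PEM.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import List, Mapping, Optional

# Assumption: only the categories named on this page are accepted.
KNOWN_CATEGORIES = {"Account", "Campaign", "Case", "EmailMessage", "Lead"}


@dataclass
class SalesforceSettings:
    """Hypothetical container whose fields mirror the Python parameter names above."""
    username: str
    consumer_key: str
    private_key: str
    categories: List[str]


def parse_categories(value: str) -> List[str]:
    """Split a comma-separated list such as "Account, Case" and reject unknown names."""
    categories = [item.strip() for item in value.split(",") if item.strip()]
    unknown = [name for name in categories if name not in KNOWN_CATEGORIES]
    if unknown:
        raise ValueError(f"Unsupported categories: {', '.join(unknown)}")
    if not categories:
        raise ValueError("At least one category is required")
    return categories


def load_private_key(env: Mapping[str, str]) -> str:
    """Return PEM text from SALESFORCE_PRIVATE_KEY, else read SALESFORCE_PRIVATE_KEY_PATH."""
    key = env.get("SALESFORCE_PRIVATE_KEY")
    if not key:
        path = env.get("SALESFORCE_PRIVATE_KEY_PATH")
        if not path:
            raise KeyError("Set SALESFORCE_PRIVATE_KEY or SALESFORCE_PRIVATE_KEY_PATH")
        key = Path(path).read_text()
    # PEM text typically starts with a header such as "-----BEGIN PRIVATE KEY-----".
    if not key.lstrip().startswith("-----BEGIN"):
        raise ValueError("The private key does not look like PEM text")
    return key


def load_settings(categories: str, env: Optional[Mapping[str, str]] = None) -> SalesforceSettings:
    """Collect the connector settings from environment variables (os.environ by default)."""
    env = os.environ if env is None else env
    required = ("SALESFORCE_USERNAME", "SALESFORCE_CONSUMER_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return SalesforceSettings(
        username=env["SALESFORCE_USERNAME"],
        consumer_key=env["SALESFORCE_CONSUMER_KEY"],
        private_key=load_private_key(env),
        categories=parse_categories(categories),
    )
```

For example, load_settings("Account, Case") returns the username, consumer key, private key string, and category list that you would then supply to the connector. Failing early like this can surface a missing variable or a malformed key before any connection to Salesforce is attempted.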
Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the supported destination connectors. This example uses the local destination connector.
This example sends data to Unstructured API services for processing by default. To process data locally instead, see the instructions at the end of this page.
For the Unstructured Ingest CLI and the Unstructured Ingest Python library, you can use the --partition-by-api option (CLI) or partition_by_api parameter (Python) to specify where files are processed:

- To do local file processing, omit --partition-by-api (CLI) or partition_by_api (Python), or explicitly specify partition_by_api=False (Python).

  Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:

  - --api-key $UNSTRUCTURED_API_KEY (CLI) or api_key=os.getenv("UNSTRUCTURED_API_KEY") (Python)
  - --partition-endpoint $UNSTRUCTURED_API_URL (CLI) or partition_endpoint=os.getenv("UNSTRUCTURED_API_URL") (Python)
  - The environment variables UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL
- To send files to Unstructured API services for processing, specify --partition-by-api (CLI) or partition_by_api=True (Python).

  Unstructured API services also require an Unstructured API key and API URL, which you provide by adding the following:

  - --api-key $UNSTRUCTURED_API_KEY (CLI) or api_key=os.getenv("UNSTRUCTURED_API_KEY") (Python)
  - --partition-endpoint $UNSTRUCTURED_API_URL (CLI) or partition_endpoint=os.getenv("UNSTRUCTURED_API_URL") (Python)
  - The environment variables UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL, representing your API key and API URL, respectively.
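The choice between local and API processing can be kept in one place. The following Python sketch is hypothetical: partition_options, partition_cli_flags, and _api_credentials are illustrative helper names, not Unstructured APIs. It only assembles the option names listed above (partition_by_api, api_key, partition_endpoint, and their CLI flags); where you pass the resulting values depends on how you call the Unstructured Ingest library or CLI.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple


def _api_credentials(env: Mapping[str, str]) -> Tuple[str, str]:
    """Return (api_key, api_url) from the environment, failing if either is unset."""
    api_key = env.get("UNSTRUCTURED_API_KEY")
    api_url = env.get("UNSTRUCTURED_API_URL")
    if not api_key or not api_url:
        raise KeyError("Set UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL for API processing")
    return api_key, api_url


def partition_options(by_api: bool, env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Python-style keyword arguments for local or API partitioning."""
    if not by_api:
        # Local processing: no API key or API URL is needed.
        return {"partition_by_api": False}
    api_key, api_url = _api_credentials(os.environ if env is None else env)
    return {"partition_by_api": True, "api_key": api_key, "partition_endpoint": api_url}


def partition_cli_flags(by_api: bool, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """The matching CLI flags as argv items, for use with subprocess (no shell expansion)."""
    if not by_api:
        # Local processing: omit --partition-by-api and the API flags entirely.
        return []
    api_key, api_url = _api_credentials(os.environ if env is None else env)
    return ["--partition-by-api", "--api-key", api_key, "--partition-endpoint", api_url]
```

Returning argv items instead of a single shell string avoids quoting problems when the values are passed to a subprocess. The rest of the command, including the Salesforce source and local destination options, is not covered by this sketch.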