Docker Installation
The instructions below show how to use the unstructured library inside a Docker container.
Prerequisites
If Docker is not yet installed on your machine, follow the official Docker installation guide for your operating system first.
We build multi-platform images to support both x86_64 and Apple silicon hardware. Running docker pull should download the appropriate image for your architecture. If needed, you can specify the platform explicitly with the --platform flag, e.g., --platform linux/amd64.
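As a minimal sketch of the platform override: the commands below print the host architecture and then request the x86_64 variant explicitly. The image reference stored in IMAGE is an assumption (it is not stated in this section); check it against the project README before relying on it.

```shell
# Print the host architecture: x86_64 on Intel/AMD, arm64 or aarch64 on Apple silicon.
uname -m

# ASSUMPTION: registry path of the published image; verify it in the project README.
IMAGE="downloads.unstructured.io/unstructured-io/unstructured"

# Force the linux/amd64 variant instead of letting Docker pick one for the host.
docker pull --platform linux/amd64 "${IMAGE}:latest"
```

Forcing linux/amd64 on Apple silicon runs the image under emulation, which is typically slower than the native variant.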
We do not support GPU usage with the unstructured library inside a Docker container.
Pulling the Docker Image
We create Docker images for every push to the main branch. These images are tagged with the respective short commit hash (like fbc7a69) and the application version (e.g., 0.5.5-dev1). The most recent image also receives the latest tag. To use these images, pull them from our repository.
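A pull might look like the following sketch. The registry path in IMAGE is an assumption, since this section does not give it; the TAG value can be latest, a short commit hash, or a version string of the kind described above.

```shell
# ASSUMPTION: registry path of the published image; verify it in the project README.
IMAGE="downloads.unstructured.io/unstructured-io/unstructured"

# "latest" tracks the most recent build; use a commit hash or version tag to pin one.
TAG="latest"

docker pull "${IMAGE}:${TAG}"
```

Pinning a specific tag instead of latest keeps a deployment reproducible, because latest moves with every push to the main branch.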
Using the Docker Image
After pulling the image, you can create and start a container from it.
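The two steps can be sketched as follows. The image path and the container name unstructured are illustrative assumptions, and the second command assumes the image ships a bash shell.

```shell
# ASSUMPTION: registry path of the published image; verify it in the project README.
IMAGE="downloads.unstructured.io/unstructured-io/unstructured"

# Hypothetical container name, chosen here for illustration.
CONTAINER="unstructured"

# Create and start the container detached (-d) with a pseudo-TTY (-t) so it keeps running.
docker run -dt --name "$CONTAINER" "${IMAGE}:latest"

# Open an interactive bash shell inside the running container.
docker exec -it "$CONTAINER" bash
```

When you are done, `docker stop "$CONTAINER"` followed by `docker rm "$CONTAINER"` stops and removes the container.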
Building Your Own Docker Image
You can also build your own Docker image. If you only plan to parse a single type of data, you can accelerate the build process by excluding certain packages or requirements needed for other data types. Refer to the Dockerfile to determine which lines are necessary for your requirements.
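A local build could be sketched as below, run from the root of a checkout of the unstructured repository. The Makefile targets docker-build and docker-start-bash are assumptions based on the project's build tooling and may differ in your checkout; the grep line only lists the Dockerfile's RUN steps so you can see which install lines you might trim.

```shell
# Run from the repository root, where the Dockerfile lives.
DOCKERFILE="Dockerfile"

# List the RUN instructions with line numbers to spot installs you may not need.
grep -n '^RUN' "$DOCKERFILE"

# ASSUMPTION: the repository's Makefile provides these targets.
make docker-build        # build the image from the local Dockerfile
make docker-start-bash   # start a container from it and open a bash shell
```

If you remove lines from the Dockerfile, rebuild and try to parse a sample of your own documents before relying on the slimmer image.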
Interacting with Python Inside the Container
Once inside the running Docker container, you can test the library directly using Python’s interactive mode.
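As a quick smoke test, the sketch below writes a small text file and partitions it with partition_text from unstructured.partition.text. The path /tmp/sample.txt and its contents are made up for illustration; the same Python lines can be typed one by one after starting python3 interactively.

```shell
# Run these commands inside the container.
# Hypothetical sample file, created only for this check.
printf 'Hello from the container.\n\nThis is a second paragraph.\n' > /tmp/sample.txt

# Feed a short script to the Python interpreter.
python3 - <<'EOF'
from unstructured.partition.text import partition_text

# Split the file into document elements and print each element's type and text.
elements = partition_text(filename="/tmp/sample.txt")
for element in elements:
    print(type(element).__name__, "|", element.text)
EOF
```

Other file types have their own partition functions in sibling modules, which you can try the same way if the image includes the dependencies they need.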