Transform a JSON file into a different schema
Task
You want to convert a JSON file that Unstructured produces into a separate JSON file that uses a different JSON schema than the one that Unstructured uses.
Approach
Use a Python package such as json-converter in your Python code project to transform your source JSON file into a target JSON file that conforms to your own schema.
The json-converter package is not owned or supported by Unstructured. For questions and requests, see the Issues tab of the json-converter repository in GitHub.
Code
Install dependencies
In your local Python code project, install the json-converter package.
Identify the JSON file to transform
- Find the local source JSON file that you want to transform.
- Note the JSON field names and structures that you want to transform. For example, the JSON file might look like the following (the ellipses indicate content omitted for brevity):
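One way to take stock of the field names is to list every field path that appears in the source file. The following is a minimal sketch that uses only the Python standard library. The values in SAMPLE (abc123, example.pdf, and so on) are hypothetical placeholders; only the field names are taken from the mappings described in this article.

```python
def field_paths(value, prefix=""):
    """Yield a dotted path for every leaf field in a JSON object.

    Nested objects are walked recursively; lists and scalars count as leaves.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from field_paths(child, path)
    else:
        yield prefix


def summarize(elements):
    """Return the sorted set of field paths found across a list of elements."""
    paths = set()
    for element in elements:
        paths.update(field_paths(element))
    return sorted(paths)


# Hypothetical sample element; real files contain many such elements.
SAMPLE = [
    {
        "type": "Title",
        "element_id": "abc123",
        "text": "Example title",
        "metadata": {
            "filetype": "application/pdf",
            "languages": ["eng"],
            "page_number": 1,
            "filename": "example.pdf",
        },
    }
]

for path in summarize(SAMPLE):
    print(path)
```

To inspect a real file, load it with json.load and pass the resulting list to summarize instead of SAMPLE.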
Create the JSON field mappings file
- Decide what you want the JSON schema in the transformed file to look like. For example, the transformed JSON file might look like the following (the ellipses indicate content omitted for brevity):
- Create the JSON field mappings file, for example:
This file declares the following mappings:
- The type field is renamed to content_type.
- The element_id field is renamed to content_id.
- The text field is renamed to content.
- The page_number field nested inside metadata is renamed to page and is nested inside content_properties.
- All of the other fields (filetype, languages, and filename) are dropped.
For more information about the format of this JSON field mappings file, see the Project Description in the json-converter page on PyPI or the README in the json-converter repository in GitHub.
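To make the effect of these mappings concrete, the following sketch applies the same renames in plain Python. It is a hand-written stand-in, not the json-converter API or its mappings file format: FIELD_MAP, get_path, set_path, and transform_element are hypothetical names introduced for illustration, and the sample values are placeholders.

```python
# Hypothetical mapping table: target path -> source path (dotted notation).
FIELD_MAP = {
    "content_type": "type",
    "content_id": "element_id",
    "content": "text",
    "content_properties.page": "metadata.page_number",
}


def get_path(data, path):
    """Return the value at a dotted path, or None if any part is missing."""
    current = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def set_path(data, path, value):
    """Set a value at a dotted path, creating nested objects as needed."""
    keys = path.split(".")
    current = data
    for key in keys[:-1]:
        current = current.setdefault(key, {})
    current[keys[-1]] = value


def transform_element(element, field_map=FIELD_MAP):
    """Build a new element that contains only the mapped fields.

    Fields that are not listed in field_map (filetype, languages, filename)
    are dropped, and mapped fields that are missing from the source are skipped.
    """
    result = {}
    for target, source in field_map.items():
        value = get_path(element, source)
        if value is not None:
            set_path(result, target, value)
    return result


# Hypothetical source element with placeholder values.
SOURCE_ELEMENT = {
    "type": "Title",
    "element_id": "abc123",
    "text": "Example title",
    "metadata": {
        "filetype": "application/pdf",
        "languages": ["eng"],
        "page_number": 1,
        "filename": "example.pdf",
    },
}

print(transform_element(SOURCE_ELEMENT))
```

In this sketch the result keeps content_type, content_id, content, and a content_properties object that holds page; everything else is left out because only mapped fields are copied.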
Add and run the transform code
- Set the following local environment variables:
  - Set LOCAL_FILE_INPUT_PATH to the local path to the source JSON file.
  - Set LOCAL_FILE_OUTPUT_PATH to the local path to the target JSON file.
  - Set LOCAL_FIELD_MAPPINGS_PATH to the local path to the JSON field mappings file.
- Add the following Python code file to your project:
- Run the Python code file.
- Check the path specified by LOCAL_FILE_OUTPUT_PATH for the transformed JSON file.
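The overall flow of these steps (read the three environment variables, load the source file and the mappings file, transform, write the result) could be sketched as follows. This is an illustrative outline rather than a definitive implementation: transform_file, run_from_environment, and rename_top_level are hypothetical names, and the transform argument marks the spot where a call into json-converter would go. The package's exact call signature is not shown here, so check its README before wiring it in.

```python
import json
import os


def transform_file(input_path, mappings_path, output_path, transform):
    """Load the source and mappings files, transform, and write the target file.

    `transform` is any callable that takes (source_data, mappings) and
    returns the transformed data.
    """
    with open(input_path, encoding="utf-8") as f:
        source = json.load(f)
    with open(mappings_path, encoding="utf-8") as f:
        mappings = json.load(f)

    target = transform(source, mappings)

    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(target, f, indent=2)
    return target


def run_from_environment(transform):
    """Read the three paths from the environment and run the transform.

    Raises KeyError if one of the environment variables is not set.
    """
    return transform_file(
        input_path=os.environ["LOCAL_FILE_INPUT_PATH"],
        mappings_path=os.environ["LOCAL_FIELD_MAPPINGS_PATH"],
        output_path=os.environ["LOCAL_FILE_OUTPUT_PATH"],
        transform=transform,
    )


def rename_top_level(source, mappings):
    """Hypothetical stand-in transform: rename top-level fields only.

    `mappings` is a flat {target_name: source_name} object. This is NOT the
    json-converter mappings format; it only makes the sketch runnable.
    """
    return [
        {target: element[src] for target, src in mappings.items() if src in element}
        for element in source
    ]
```

Calling run_from_environment(rename_top_level) at the end of the script would run the stand-in transform; swapping in a function that wraps json-converter keeps the file handling unchanged.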
Troubleshooting
Error when trying to import Mapping from collections
Issue: When you run your Python code file, the following error message appears: "ImportError: cannot import name 'Mapping' from 'collections'".
Cause: The json-converter package uses an outdated import that newer versions of Python, such as 3.11 and later, no longer support.
Solution: Update the json-converter package’s source code to use a different import, as follows:
- In your Python project, find the json-converter package’s source location by running the pip show command:
  Note the path in the Location field.
- Use your code editor to open the path to the json-converter package’s source code.
- In the source code, open the file named json_mapper.py.
- Change the following line of code…
  …to the following line of code, by adding .abc:
- Save this source code file.
- Run your Python code file again.
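The following sketch illustrates both parts of this fix in Python. It shows that Mapping is importable from collections.abc, while the legacy import from collections (the form named in the error message) fails on newer Python versions. It also includes a hypothetical helper, find_package_dir, that locates an installed package's directory as an alternative to reading the pip show output. The import name json_converter is an assumption inferred from the package name and the json_mapper.py file mentioned above.

```python
import importlib.util

try:
    # Legacy location of Mapping; newer Python versions raise ImportError here.
    from collections import Mapping
    legacy_import_works = True
except ImportError:
    legacy_import_works = False

# Supported location of Mapping; this is the ".abc" form the fix switches to.
from collections.abc import Mapping


def find_package_dir(import_name):
    """Return the directory of an installed package, or None if not found.

    Hypothetical helper: looks the package up without importing it.
    """
    spec = importlib.util.find_spec(import_name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


print("Legacy import works:", legacy_import_works)
print("dict is a Mapping:", isinstance({}, Mapping))
# Assumed import name; prints None if the package is not installed.
print("Package directory:", find_package_dir("json_converter"))
```

If the helper prints a directory, json_mapper.py should be inside it; otherwise, fall back to the Location field reported by pip show.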