# DocuPipe Documentation

## Guides

- [Getting Started with DocuPipe](https://docupipe.readme.io/docs/getting-started.md): This page will help you get started with DocuPipe. You'll be up and running in a jiffy!

## API Reference

- [Health Check Post](https://docupipe.readme.io/reference/health_check_post.md): Health check endpoint to confirm the service is operational.
- [Health Check](https://docupipe.readme.io/reference/health_check.md): Health check endpoint to confirm the service is operational.
- [Root](https://docupipe.readme.io/reference/root-1.md)
- [Get Account Information](https://docupipe.readme.io/reference/get_account-1.md): Get information about your account, including the plan name, the number of remaining credits, the number of overage credits, and the details of the upcoming invoice.
- [Delete Multiple Analyses](https://docupipe.readme.io/reference/delete_analyses-1.md): Delete multiple analyses at once by providing a list of analysis IDs.
- [Retrieve Analysis](https://docupipe.readme.io/reference/get_analysis-1.md): Retrieve an analysis object, which contains the questions and answers, by providing the analysis ID.
- [List Analyses](https://docupipe.readme.io/reference/list_analyses-1.md): List all analyses that have been performed.
- [Analyze Data](https://docupipe.readme.io/reference/post_analyze_data-1.md): Analyze multiple documents at once, either by passing a list of document IDs or by passing a dataset name. If both are passed, we use the intersection of the two. Analysis works by passing a list of questions in natural language. If a `schemaId` is passed, the AI first uses the standardizations of those documents under the provided schema to narrow down which documents are relevant. Only then does it analyze the documents and answer the questions, along with confidence scores and citations. If a `schemaId` is not passed, the AI examines each document directly, up to a limit of 50 documents. When a `schemaId` is passed, the AI can optionally also run database queries for statistical calculations and answer the questions based on those results.
- [Analyze Document](https://docupipe.readme.io/reference/post_analyze_document-1.md): Analyze a single document by passing a `documentId` and a list of questions in natural language. If the `pages` parameter is provided, the AI analyzes only the specified pages. Poll for the results using the `GET /job/{jobId}` endpoint with the returned `jobId`.
- [Delete a Class](https://docupipe.readme.io/reference/delete_class-1.md): Delete a class from the taxonomy of classes by providing the class ID.
- [List Classes](https://docupipe.readme.io/reference/list_classes-1.md): List all classes that have been defined in the taxonomy.
- [Add a Class](https://docupipe.readme.io/reference/post_add_class-1.md): Add a new class to the taxonomy of classes, including its name and description.
- [Classify Documents](https://docupipe.readme.io/reference/post_classify_batch-1.md): Classify a batch of one or more documents at once by passing a list of document IDs and an optional list of class IDs to use for classification. If no class IDs are provided, all classes are used. To use the `unknown` class, either pass its class ID (`unknown`) or set the `includeUnknown` flag to `true`.
- [Upload and Standardize Multiple](https://docupipe.readme.io/reference/upload-and-standardize-multiple.md): Upload multiple documents to DocuPipe, standardize them, and retrieve the results.
- [Workflow: Upload and Standardize](https://docupipe.readme.io/reference/upload-and-standardize-using-workflow.md): Upload and standardize a document in a single POST request using workflows.
- [Upload Multiple](https://docupipe.readme.io/reference/upload-multiple.md): Upload multiple documents to DocuPipe and retrieve the results.
- [Upload](https://docupipe.readme.io/reference/upload.md): Upload a document to DocuPipe and retrieve the parsed results.
- [Workflow: Upload, Classify and Standardize](https://docupipe.readme.io/reference/workflow-upload-classify-and-standardize.md): Upload and classify a document, then standardize it for certain classes, all in a single POST request using workflows.
- [Delete a Document](https://docupipe.readme.io/reference/delete_document-1.md): Delete a document that has been previously submitted to DocuPipe for processing.
- [Delete Multiple Documents](https://docupipe.readme.io/reference/delete_documents-1.md): Delete multiple documents that have been previously submitted to DocuPipe for processing by providing a list of document IDs.
- [Download OCR URL](https://docupipe.readme.io/reference/download_ocr_url-1.md): Generate an OCR PDF by providing a document ID. The OCR text is added as a layer on top of the PDF, so you can search the PDF by its OCR text. Returns a presigned URL to download the OCR PDF. This URL is valid for a limited time (e.g., 1 hour) and allows secure access to the OCR PDF stored on DocuPipe. This endpoint works only for documents with a PDF file type.
- [Download Original URL](https://docupipe.readme.io/reference/download_original_url-1.md): Generate and retrieve a presigned URL for accessing the original file of a document by its ID. The URL is valid for a limited time (e.g., 1 hour) and allows secure access to the document stored on DocuPipe.
- [Retrieve Detailed Processing Result](https://docupipe.readme.io/reference/get_document_detailed.md): Access the fine-grained document parsing result. This representation includes individual word locations on the page.
- [Get Document Count](https://docupipe.readme.io/reference/get_document_summary-1.md): Get a count of your documents, including the total number of documents and the list of unique datasets.
- [Retrieve a Processed Document](https://docupipe.readme.io/reference/get_document-1.md): Access the analysis results of your submitted document. The `status` field indicates the document's current processing stage, and the `result` field provides the extracted plain text for AI comprehension, as well as more granular structured information such as bounding boxes for detected tables and text blocks.
- [Get Schema Proposals](https://docupipe.readme.io/reference/get_proposed_schemas.md): Get schema proposals for a document by providing the document's ID. The proposals are generated by the AI based on the document's content.
- [Retrieve Split Job](https://docupipe.readme.io/reference/get_split_job-1.md): Retrieve the status of a document splitting job.
- [List Dataset Names](https://docupipe.readme.io/reference/list_datasets-1.md): List all dataset names for the documents you have submitted so far.
- [List Documents](https://docupipe.readme.io/reference/list_documents-1.md): List all documents that have been submitted to DocuPipe for processing. You can filter the results by providing a dataset name.
- [Submit a Document for Processing](https://docupipe.readme.io/reference/post_document-1.md): Use this endpoint to submit a document to DocuPipe for processing. You can upload a local file or provide a URL to a remote file. Upon submission, you receive a unique `documentId`, which you can use to retrieve the document's results or apply subsequent workflows to it. The maximum document size is 1,000 pages and 2,000 MB. *Advanced*: you may also provide a `workflowId` to apply a predefined workflow to the document.
- [Split a Document](https://docupipe.readme.io/reference/post_split-1.md): Split a document into multiple documents intelligently using AI. If no splitting is needed, no new documents are created. Otherwise, the new sub-documents are generated automatically.
- [Update Dataset](https://docupipe.readme.io/reference/update_documents_dataset.md): Update the dataset of a list of documents by providing a list of document IDs and the new dataset name. This updates the dataset of all those documents and of the standardizations associated with them.
- [DocuPipe Rate Limits](https://docupipe.readme.io/reference/docupipe-rate-limits.md)
- [Getting Started With DocuPipe](https://docupipe.readme.io/reference/getting-started-with-docupipe.md): Take your first steps by uploading a document and getting its parsed text and tables.
- [Using LLMs With These Docs](https://docupipe.readme.io/reference/using-llms-with-these-docs.md): Learn how to copy and paste into ChatGPT like a boss.
- [Webhooks](https://docupipe.readme.io/reference/webhooks.md): Architect event-driven workflows.
- [Delete Jobs](https://docupipe.readme.io/reference/delete_jobs-1.md): Delete multiple jobs that have been submitted to DocuPipe for processing by providing a list of job IDs. Since jobs are just a record of events, deleting them only hides them from you; the underlying records remain stored in the database. For jobs that contain actual data, such as Analyze Document, the data is deleted.
- [Get Job Count](https://docupipe.readme.io/reference/get_job_summary-1.md): Get a count of your jobs broken down by job type. For each job type, you see the number of jobs and the number of credits consumed. You may pass `start_date` as an optional query parameter in ISO format (yyyy-mm-dd). The output includes 3 versions of the summary:
  1. All-time summary including deleted jobs
  2. All-time summary excluding deleted jobs
  3. Count since `start_date` (defaults to the previous billing date; includes deleted jobs)
- [Retrieve a Job](https://docupipe.readme.io/reference/get_job-1.md): Retrieve the details of a specific job by providing the job's ID. This includes the job's status, timestamp, and any other relevant information.
- [List Jobs](https://docupipe.readme.io/reference/list_jobs-1.md): List all jobs that have been submitted to DocuPipe for processing. Every document upload, standardization, or other credit-consuming event results in a job, which lets you audit your credit consumption. You can optionally filter the results by a date range in the format yyyy-mm-dd.
- [Delete Reviews](https://docupipe.readme.io/reference/delete_reviews-1.md): Use this endpoint to delete multiple review objects. To delete a single review, pass a list containing one review ID.
- [Generate a Presigned URL for a Review](https://docupipe.readme.io/reference/get_presigned_url.md): This endpoint generates a presigned URL containing a signature and expiration time for accessing or acting on a review.
- [Retrieve a Review by ID](https://docupipe.readme.io/reference/get_review_by_id.md): This endpoint is used to retrieve a review object by its unique ID.
- [Retrieve review by standardization ID](https://docupipe.readme.io/reference/get_standardization_review-1.md): This endpoint is used to retrieve a review object by its associated standardization ID.
- [List Reviews](https://docupipe.readme.io/reference/list_reviews-1.md): This endpoint is used to list all review objects.
- [Generate a Visual Review](https://docupipe.readme.io/reference/post_review_batch-1.md): This endpoint is used to generate a visual review of the standardization results. For every value in the standardization payload, we generate a confidence score and a list of locations. A location is a page number plus an (x1, y1, x2, y2) bounding box on that page, where (x1, y1) is the top-left corner and (x2, y2) is the lower-right corner. This indicates where in the document the value was found.
- [Update a Review](https://docupipe.readme.io/reference/update_review.md): This endpoint is used to update a review object with new data or status.
- [Delete a Schema](https://docupipe.readme.io/reference/delete_schema-1.md): Delete a schema by its ID.
- [Retrieve AutoGenerate Schema Job](https://docupipe.readme.io/reference/get_schema_autogenerate_job-1.md): Retrieve the status of an autogenerate schema job.
- [Retrieve a Schema](https://docupipe.readme.io/reference/get_schema-1.md): Retrieve an existing schema by providing the schema's ID.
- [List Schemas](https://docupipe.readme.io/reference/list_schemas-1.md): List all of your schemas. The output includes the `jsonSchema` data as well.
- [Edit a Schema](https://docupipe.readme.io/reference/post_edit_schema.md): Edit a schema by providing a schema ID and the parameters you want to edit. This does not create a new schema; it updates the existing one. Changing the schema name is purely cosmetic, but changing the description and guidelines affects the behavior of the schema in future standardizations. You can edit:
  1. `schemaName` - The name of the schema
  2. `description` - The description of the schema
  3. `guidelines` - The guidelines for the schema
- [Refine a Schema](https://docupipe.readme.io/reference/post_refine_schema-1.md): Refine a schema by providing free-text feedback to edit the structured output.
- [AutoGenerate a Schema](https://docupipe.readme.io/reference/post_schema_autogenerate-1.md): Generate a schema based on a list of documents. Leave the instructions empty if you want the AI to use its best judgment, or provide instructions to indicate how you prefer the schema to be generated. Best results come from a varied list of documents that represents the full range of document types you expect to process, together with clear instructions about what the schema should capture and how it should be structured.
- [Add a Schema](https://docupipe.readme.io/reference/post_schema.md): Create a new schema by posting a valid JSON schema that represents the structured output you want to extract from documents.
- [Update a Schema](https://docupipe.readme.io/reference/post_update_schema-1.md): Update an existing schema by posting a valid JSON schema. This does not overwrite the existing schema; it creates a new one.
- [Bulk Download Standardization XMLs](https://docupipe.readme.io/reference/bulk_xml_download.md): Download multiple standardization results as XML files in a single zip archive. Provide a list of standardization IDs and receive a presigned URL to download a zip file containing all the XML files. Maximum 100 standardizations per request. The download URL expires after 24 hours.
- [Delete a Standardization](https://docupipe.readme.io/reference/delete_standardization-1.md): Delete a standardization by providing the standardization ID.
- [Delete Multiple Standardizations](https://docupipe.readme.io/reference/delete_standardizations-1.md): Delete multiple standardizations at once by providing a list of standardization IDs.
- [Download Excel URL](https://docupipe.readme.io/reference/download_excel_url.md): Generate an Excel file from the standardization JSON by providing a standardization ID. Arrays in the JSON are put in separate sheets, and all non-array fields are put in the main sheet. Returns a presigned URL to download the Excel file. This URL is valid for a limited time (e.g., 1 hour) and allows secure access to the Excel file stored on DocuPipe.
- [Get Standardization Count](https://docupipe.readme.io/reference/get_standardization_summary-1.md): Get a count of your standardizations, including the total number and the list of unique schema names.
- [Retrieve a Standardization XML](https://docupipe.readme.io/reference/get_standardization_xml.md): Retrieve the standardization results of a document as an XML object.
- [Retrieve a Standardization JSON](https://docupipe.readme.io/reference/get_standardization-1.md): Retrieve the standardization results of a document as a JSON object.
- [List Standardizations](https://docupipe.readme.io/reference/list_standardizations-1.md): Retrieve all standardizations of documents that have been processed using a specific schema.
- [Match a standardization to a list of candidates](https://docupipe.readme.io/reference/match_standardization.md): Use this endpoint to match a standardization to a list of candidates. Provide a standardization ID and a list of candidates, where each candidate must have an ID and a record detailing all its properties. You can optionally provide instructions to clarify the task rules.
- [Standardize Documents](https://docupipe.readme.io/reference/post_standardize_batch_v2.md): Standardize a batch of documents, either by passing a list of document IDs or by passing a dataset name. Pass a `schemaId` to standardize the documents using a specific structure, or leave it empty to let the AI create an ad-hoc structure as it sees fit. Standardization handles lists (arrays) by splitting documents into smaller sub-documents behind the scenes; the AI does its best to decide how and when splitting is appropriate. *Advanced*: you can set the following parameters. By default they are left as `auto`, which lets the AI decide.
  1. `displayMode` - Controls how the AI sees the document. The options are:
     - `auto` - Automatically determine the best mode based on the document content.
     - `spatial` - Represent text in the document according to its spatial layout.
     - `sections` - Represent the document as a list of sections (paragraphs, tables, images, etc.) as seen in the web UX.
  2. `splitMode` - Controls how the AI splits the document into sub-documents. The options are:
     - `auto` - Automatically determine the best mode based on the document content.
     - `all` - Split the document into single-page sub-documents, so each page is handled separately.
     - `never` - Do not split the document at all, so the entire document is handled as a single unit. This can lead to poor performance for long documents, or for documents with lots of dense data that needs to be extracted.
  3. `effortLevel` - Controls how much effort the AI puts into the standardization. The options are:
     - `standard` - Use the standard effort level.
     - `high` - Use the high effort level, which takes longer but can produce better results. Costs +2 credits per page.
- [Deregister an Endpoint](https://docupipe.readme.io/reference/delete_endpoint-1.md): Deregister a webhook endpoint for your application; it will stop receiving all events. You can also manage this in the dashboard portal under account/settings in the web dashboard.
- [Register an Endpoint](https://docupipe.readme.io/reference/generate_endpoint-1.md): Generate a webhook endpoint for your application. The specified URL will receive ALL events. For more granular subscriptions, use the dashboard portal under account/settings in the web dashboard.
- [Get Webhook Portal URL](https://docupipe.readme.io/reference/get_portal_link-1.md): Generates a magic link that logs you in to the app portal. From the portal, you can configure webhook subscriptions in a user-friendly interface.
- [Delete a Workflow](https://docupipe.readme.io/reference/delete_workflow-1.md): Delete a workflow by providing the workflow ID.
- [List your Workflows](https://docupipe.readme.io/reference/list_workflows-1.md): This endpoint returns a list of all your workflows.
- [Create a Workflow](https://docupipe.readme.io/reference/post_workflow_on_submit_document-1.md): Use this endpoint to create a workflow that triggers when a document is submitted. To run the workflow, use the `POST /document` endpoint with the `workflowId` returned by this endpoint. The workflow can be configured in one of two ways; you must provide one of these inputs, but not both:
  1. `standardizeStep` - Always run the specified schema(s) on the document.
  2. `classifyStandardizeStep` - Conditionally run one or more schemas based on the document's `classId`.
- [Update a Workflow](https://docupipe.readme.io/reference/update_workflow.md): Update an existing workflow by posting the same parameters as `POST /workflow/on-submit-document`. The workflow retains its original `workflowId`.

## Recipes

- [Get Magic](https://docupipe.readme.io/recipes/get-magic.md)
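
The **Analyze Document** entry says to poll `GET /job/{jobId}` with the returned `jobId`. Below is a minimal polling sketch with exponential backoff. To stay runnable without network access, it takes a hypothetical `fetch_job` callable that would perform that GET request; the `status` key and the terminal values `"completed"` and `"error"` are assumptions, so check the **Retrieve a Job** reference for the actual response shape.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal status values; confirm them against the Retrieve a Job reference.
DONE_STATUSES = {"completed"}
FAILED_STATUSES = {"error"}


def poll_job(
    fetch_job: Callable[[str], Dict[str, Any]],
    job_id: str,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a job until it reaches a terminal status, doubling the delay each round.

    `fetch_job` is a hypothetical callable that performs GET /job/{jobId}
    and returns the decoded JSON body as a dict.
    """
    delay = initial_delay
    waited = 0.0
    while True:
        job = fetch_job(job_id)
        status = job.get("status")  # assumed field name
        if status in DONE_STATUSES:
            return job
        if status in FAILED_STATUSES:
            raise RuntimeError(f"job {job_id} failed: {job}")
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still {status!r} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

In a real client, `fetch_job` would wrap an HTTP call to `GET /job/{jobId}`; the base URL and authentication scheme are not given in this index, so they are left out. Capping the delay keeps long jobs from being polled too rarely, and injecting `sleep` keeps the loop easy to test.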
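
**Generate a Visual Review** describes each location as a page number plus an (x1, y1, x2, y2) box whose first corner is the top-left and second is the lower-right. The sketch below is a small data type for working with such locations client-side. The class and field names are hypothetical, y is assumed to grow downward (which is what "top-left is (x1, y1)" implies), and the coordinate units and page indexing are not stated in this index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """One review location: a page plus a box from top-left (x1, y1) to lower-right (x2, y2).

    Field names are hypothetical; units and page indexing are not specified.
    """

    page: int
    x1: float
    y1: float
    x2: float
    y2: float

    def __post_init__(self) -> None:
        # With y growing downward, the top-left corner has the smaller x and y.
        if self.x2 < self.x1 or self.y2 < self.y1:
            raise ValueError(f"corners out of order: {self}")

    @property
    def area(self) -> float:
        return (self.x2 - self.x1) * (self.y2 - self.y1)

    def contains(self, x: float, y: float) -> bool:
        """True if the point (x, y) lies inside or on the edge of the box."""
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2
```

Validating corner order on construction catches swapped coordinates early, for example when drawing highlight boxes over a rendered page.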
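
**Bulk Download Standardization XMLs** accepts at most 100 standardization IDs per request. To export more, one approach is to split the IDs into batches and issue one request per batch. A minimal chunking helper follows; the function name is illustrative, and removing duplicate IDs is a design choice of this sketch, not documented endpoint behavior.

```python
from typing import Iterator, List, Sequence

MAX_IDS_PER_REQUEST = 100  # limit stated for Bulk Download Standardization XMLs


def batches(ids: Sequence[str], size: int = MAX_IDS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` IDs, keeping order and dropping duplicates."""
    if size < 1:
        raise ValueError("size must be positive")
    unique = list(dict.fromkeys(ids))  # de-duplicate while keeping first-seen order
    for start in range(0, len(unique), size):
        yield unique[start:start + size]
```

Each batch yields its own presigned zip URL, and each URL expires after 24 hours, so download every archive soon after requesting it.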
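
**Download Excel URL** puts arrays in separate sheets and all non-array fields in the main sheet. The sketch below previews that split for the top-level fields of a standardization JSON, which can help you anticipate the workbook's layout. How DocuPipe names the sheets or flattens nested objects is not described here, so the sketch only inspects top-level keys and uses each array's key as a hypothetical sheet name.

```python
from typing import Any, Dict, List, Tuple


def preview_excel_layout(data: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, List[Any]]]:
    """Split top-level fields into (main-sheet fields, {array key: rows}).

    Only the array vs. non-array rule comes from the endpoint description;
    sheet naming by key is an assumption.
    """
    main: Dict[str, Any] = {}
    sheets: Dict[str, List[Any]] = {}
    for key, value in data.items():
        if isinstance(value, list):
            sheets[key] = value  # each array would land on its own sheet
        else:
            main[key] = value  # every non-array field stays on the main sheet
    return main, sheets
```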
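
The **Standardize Documents** entry lists three advanced options and their allowed values. The sketch below validates them client-side before a request is sent, and estimates the surcharge for `effortLevel: high` (+2 credits per page, as stated in that entry). Only `schemaId`, `displayMode`, `splitMode`, and `effortLevel` appear in this index; the body keys `documentIds` and `dataset` and both function names are hypothetical.

```python
from typing import Any, Dict, List, Optional

# Allowed values as listed in the Standardize Documents entry.
ALLOWED = {
    "displayMode": {"auto", "spatial", "sections"},
    "splitMode": {"auto", "all", "never"},
    "effortLevel": {"standard", "high"},
}


def build_standardize_body(
    document_ids: Optional[List[str]] = None,
    dataset: Optional[str] = None,
    schema_id: Optional[str] = None,
    **options: str,
) -> Dict[str, Any]:
    """Build a hypothetical request body; unset options are omitted so server defaults apply."""
    if not document_ids and not dataset:
        raise ValueError("pass document_ids or dataset")
    body: Dict[str, Any] = {}
    if document_ids:
        body["documentIds"] = list(document_ids)  # key name assumed
    if dataset:
        body["dataset"] = dataset  # key name assumed
    if schema_id:
        body["schemaId"] = schema_id  # omit to let the AI create an ad-hoc structure
    for name, value in options.items():
        if name not in ALLOWED:
            raise ValueError(f"unknown option {name!r}")
        if value not in ALLOWED[name]:
            raise ValueError(f"{name} must be one of {sorted(ALLOWED[name])}, got {value!r}")
        body[name] = value
    return body


def extra_credits_for_high_effort(pages: int) -> int:
    """Surcharge for effortLevel="high": +2 credits per page, on top of the base cost."""
    return 2 * pages
```

Omitting unset options, rather than sending `auto` explicitly, leaves the choice to the server-side default.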
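
**Create a Workflow** requires exactly one of `standardizeStep` or `classifyStandardizeStep`. A client-side guard for that rule is sketched below. The inner structure of each step is not described in this index, so the sketch passes the step objects through unchanged and covers no other fields the endpoint may accept; the function name is hypothetical.

```python
from typing import Any, Dict, Optional


def build_workflow_body(
    standardize_step: Optional[Dict[str, Any]] = None,
    classify_standardize_step: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return a body for POST /workflow/on-submit-document with exactly one step set."""
    provided = [s is not None for s in (standardize_step, classify_standardize_step)]
    if sum(provided) != 1:
        raise ValueError("provide exactly one of standardizeStep or classifyStandardizeStep")
    if standardize_step is not None:
        return {"standardizeStep": standardize_step}
    return {"classifyStandardizeStep": classify_standardize_step}
```

The same guard applies to **Update a Workflow**, which takes the same parameters.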