
Document Importer

Document Importer provides API endpoints for adding, deleting and replacing JSON documents inside Elasticsearch indices.

Importing a document

Adding documents through the API is the easiest way to integrate existing datasets and systems with TEXTA Toolkit. For security reasons, however, users may only insert documents into indices that have already been added to their Project. These indices must also be set up with the proper schema to work with tools like Tagger and Tagger Groups; please refer to the Index API documentation.

Parameters

documents:

Collection of raw Elasticsearch documents. Each element in the list has the following fields:

  1. “_id”: The ID under which Elasticsearch should insert the document. If it is omitted, Elasticsearch generates one itself.

  2. “_index”: The already existing index into which Elasticsearch should insert the document. When this field is missing, the documents are sent to an index with the name format “texta-{DEPLOY_KEY}-import-project-{project_id}”, where DEPLOY_KEY is “1” by default.

  3. “_type”: The doc_type of the Elasticsearch document. If set manually it should be “_doc”, which is also the default.

  4. “_source”: The actual JSON content of the document. All documents should follow the same schema, as conflicts will cause errors.
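The fields above can be assembled programmatically. The following is a minimal sketch, not part of TEXTA Toolkit: the helper names make_document and default_index_name are hypothetical, and the sketch assumes that “_source” is the only field every document must carry.

```python
import json


def default_index_name(project_id, deploy_key="1"):
    """Return the fallback index name used when a document has no "_index" field."""
    return f"texta-{deploy_key}-import-project-{project_id}"


def make_document(source, doc_id=None, index=None):
    """Build one element of the "documents" list, leaving out unset optional fields."""
    doc = {"_source": source}
    if doc_id is not None:
        doc["_id"] = str(doc_id)
    if index is not None:
        doc["_index"] = index
    return doc


# A document without "_index" ends up in the automatically named index.
doc = make_document({"hello": "general kenobi!"}, doc_id=3)
print(json.dumps(doc))
print(default_index_name(project_id=1))
```

Leaving “_id” and “_index” out of the dictionary, rather than sending empty values, lets the documented defaults apply.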

split_text_in_fields:

Specifies which text fields should be split into smaller pieces; defaults to the field named “text” if none is given. By default, texts are split at a limit of 3000 characters. Users who do not want their documents split should set this field to an empty list.
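To get a feel for how many pieces a long text may produce, the limit can be illustrated with a naive fixed-width split. This is only a sketch under an assumption: the exact splitting strategy of the Toolkit is not described here and may, for example, respect word or sentence boundaries. The function name split_text is hypothetical.

```python
def split_text(text, max_length=3000):
    """Cut text into consecutive pieces of at most max_length characters.

    Assumption: a plain fixed-width cut; the server-side splitter may behave differently.
    """
    return [text[i:i + max_length] for i in range(0, len(text), max_length)]


pieces = split_text("a" * 7000)
print([len(piece) for piece in pieces])  # three pieces: 3000, 3000 and 1000 characters
```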

Note

If a document has no “_index” and “_type” fields, the name of the index is generated automatically.

Examples

Endpoint: /projects/{project_pk}/elastic/documents/

Example with index name added automatically and no splitting:

curl -X POST "http://localhost:8000/api/v2/projects/1/elastic/documents/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
      "documents": [{"_id": "3", "_source": {"hello": "general kenobi!"}}],
      "split_text_in_fields": []
    }'

Example with given index name and splitting:

curl -X POST "http://localhost:8000/api/v2/projects/1/elastic/documents/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
      "documents": [{"_id": "30", "uuid": "aa15-ghh4-41af-af51", "_index": "texta_test_index", "_type": "texta_test_index", "_source": {"hello": "general kenobi! Here is a very long text that should be splitted", "date": "2015-01-01T12:10:30Z"}}],
      "split_text_in_fields": ["hello"]
    }'
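The same import request can be prepared from Python. The sketch below uses only the standard library and builds the request without sending it; the function name build_import_request and the placeholder token are assumptions, while the URL, headers and body mirror the curl examples above.

```python
import json
import urllib.request


def build_import_request(base_url, project_id, token, documents, split_text_in_fields=None):
    """Prepare a POST request for the document import endpoint (not sent yet)."""
    url = f"{base_url}/api/v2/projects/{project_id}/elastic/documents/"
    payload = {"documents": documents}
    # Only include the field when given, so the server-side default ("text") still applies.
    if split_text_in_fields is not None:
        payload["split_text_in_fields"] = split_text_in_fields
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Token {token}",
        },
        method="POST",
    )


request = build_import_request(
    "http://localhost:8000",
    project_id=1,
    token="<your-token>",  # placeholder, replace with a real API token
    documents=[{"_id": "3", "_source": {"hello": "general kenobi!"}}],
    split_text_in_fields=[],  # empty list: do not split any field
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```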

Viewing a document

Endpoint: /projects/{project_pk}/elastic/documents/{index_name}/{document_id}/

curl -X GET "http://localhost:8000/api/v2/projects/1/elastic/documents/texta_test_index/30/"

Deleting a document

Endpoint: /projects/{project_pk}/elastic/documents/{index_name}/{document_id}/

curl -X DELETE "http://localhost:8000/api/v2/projects/1/elastic/documents/texta_test_index/30/"

Updating a split document

Parameters

id_field:

Name of the field used as the ID marker that groups split documents into a single entity.

id_value:

Value of the ID field that identifies which split documents belong to the same entity.

text_field:

Specifies the name of the text field you wish to update.

content:

New content that replaces the old content of the text field.

Example

Endpoint: /projects/{project_pk}/elastic/documents/{index_name}/update_split

Note

The absence of a trailing “/” is important for this endpoint!

curl -X POST "http://localhost:8000/api/v2/projects/1/elastic/documents/texta_test_index/update_split" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
      "content": "general kenobi! Here is a very long text that should be splitted and now there is more text I forgot to add before and am replacing now",
      "text_field": "hello",
      "id_field": "uuid",
      "id_value": "aa15-ghh4-41af-af51"
    }'
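A sketch of the same update from Python, again built with the standard library and not sent. The function name build_update_split_request and the placeholder token are hypothetical. Following the parameter descriptions above, id_field carries the field name (“uuid”) and id_value carries its value.

```python
import json
import urllib.request


def build_update_split_request(base_url, project_id, index_name, token,
                               id_field, id_value, text_field, content):
    """Prepare a POST request for the update_split endpoint (not sent yet)."""
    # Note: this endpoint is documented without a trailing slash.
    url = f"{base_url}/api/v2/projects/{project_id}/elastic/documents/{index_name}/update_split"
    payload = {
        "id_field": id_field,      # name of the field that groups the split pieces
        "id_value": id_value,      # value identifying one original document
        "text_field": text_field,  # text field whose content is replaced
        "content": content,        # full new text
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Token {token}",
        },
        method="POST",
    )


update_request = build_update_split_request(
    "http://localhost:8000",
    project_id=1,
    index_name="texta_test_index",
    token="<your-token>",  # placeholder, replace with a real API token
    id_field="uuid",
    id_value="aa15-ghh4-41af-af51",
    text_field="hello",
    content="general kenobi! Here is the replacement text.",
)
print(update_request.get_method(), update_request.full_url)
# To actually send it: urllib.request.urlopen(update_request)
```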