Document Importer

Document Importer provides API endpoints for adding, deleting and replacing JSON documents in Elasticsearch indices.

Importing documents

Adding documents through the API is the easiest way to integrate existing datasets and systems with TEXTA Toolkit. For security reasons, however, users may only insert documents into indices that already belong to their Project. Such indices must also be set up with the proper schema to work with tools like Tagger and Tagger Groups; refer to the Index API documentation for details.

Parameters

documents:

A collection of Elasticsearch documents. Each element of the list has the following fields:

  1. "_id": the ID under which Elasticsearch stores the document. If it is missing, Elasticsearch generates an ID itself.

  2. "_index": the already existing index into which Elasticsearch should insert the document. When this field is missing, the documents are sent to an index named "texta-{DEPLOY_KEY}-import-project-{project_id}", where DEPLOY_KEY is "1" by default.

  3. "_type": the doc_type of the Elasticsearch document. Defaults to "_doc"; if set manually, it should be "_doc".

  4. "_source": the actual JSON content of the document. All documents should follow the same schema, as conflicts will cause errors.
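The field list above can be sketched as a small helper that assembles the "documents" list. This is only an illustration: `make_document` and `default_index_name` are hypothetical names that are not part of TEXTA Toolkit, and the fallback index name is computed purely from the format string quoted above.

```python
import json


def default_index_name(project_id, deploy_key="1"):
    """Fallback import index name, following the format quoted in the text."""
    return f"texta-{deploy_key}-import-project-{project_id}"


def make_document(source, doc_id=None, index=None):
    """Wrap a JSON-serialisable dict in the document envelope (hypothetical helper)."""
    doc = {"_source": source}
    if doc_id is not None:
        doc["_id"] = str(doc_id)  # leave out to let Elasticsearch generate the ID
    if index is not None:
        doc["_index"] = index     # leave out to fall back to the default import index
    return doc


documents = [
    make_document({"hello": "general kenobi!"}, doc_id=3),
    make_document({"hello": "there"}, index="texta_test_index"),
]

# Request body fragment that would be sent to the import endpoint.
print(json.dumps({"documents": documents}, indent=2))
# Index that receives documents without an "_index" field in project 1.
print(default_index_name(project_id=1))
```

Keeping "_type" out of the helper relies on the stated default of "_doc".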

split_text_in_fields:

Specifies which text fields should be split into smaller pieces; defaults to the field named "text" if none is given. By default, texts are split at a 3000-character limit. Users who do not want their documents split should set this field to an empty list.
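To get a feel for what a 3000-character limit means for a long text, the sketch below cuts a string into consecutive fixed-width pieces. It is an assumption-laden illustration only: the text above states the limit but not the algorithm, so the Toolkit's actual splitting may choose different boundaries. `split_text` is a hypothetical name.

```python
def split_text(text, limit=3000):
    """Cut text into consecutive pieces of at most `limit` characters.

    Naive fixed-width chunking, assumed for illustration; not taken
    from the Toolkit's implementation.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [text[i:i + limit] for i in range(0, len(text), limit)] or [""]


pieces = split_text("x" * 7000)
# Lengths of the resulting pieces for a 7000-character input.
print([len(piece) for piece in pieces])
```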

Note

If the documents have no "_index" and "_type" fields, the index name is generated automatically.

Examples

Endpoint: /projects/{project_pk}/elastic/documents/

Example where the index name is added automatically and the texts are not split:

curl -X POST "http://localhost:8000/api/v2/projects/1/elastic/documents/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
      "documents": [{"_id": "3", "_source": {"hello": "general kenobi!"}}],
      "split_text_in_fields": []
    }'

Example where the index name is given and the text is split:

curl -X POST "http://localhost:8000/api/v2/projects/1/elastic/documents/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
      "documents": [{"_id": "30", "uuid": "aa15-ghh4-41af-af51", "_index": "texta_test_index", "_type": "texta_test_index", "_source": {"hello": "general kenobi! Here is a very long text that should be splitted", "date": "2015-01-01T12:10:30Z"}}],
      "split_text_in_fields": ["hello"]
    }'
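The same import call can be prepared from Python. The sketch below uses only the standard library and mirrors the URL, headers and body of the curl examples; `build_import_request` is a hypothetical helper, and the token is a placeholder. The request is built but not sent, so the snippet runs without a server.

```python
import json
import urllib.request


def build_import_request(base_url, project_id, token, documents, split_fields=None):
    """Prepare a POST request for the document import endpoint (hypothetical helper)."""
    url = f"{base_url}/api/v2/projects/{project_id}/elastic/documents/"
    payload = {"documents": documents}
    if split_fields is not None:
        # An empty list disables splitting; omitting the key keeps the default.
        payload["split_text_in_fields"] = split_fields
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Token {token}",
        },
        method="POST",
    )


request = build_import_request(
    "http://localhost:8000",
    project_id=1,
    token="<your-token>",
    documents=[{"_id": "3", "_source": {"hello": "general kenobi!"}}],
    split_fields=[],
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```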

Viewing a document

Endpoint: projects/{project_pk}/elastic/documents/{index_name}/{document_id}/

curl -X GET "http://localhost:8000/api/v2/projects/1/elastic/documents/texta_test_index/30/" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049"

Deleting a document

Endpoint: projects/{project_pk}/elastic/documents/{index_name}/{document_id}/

curl -X DELETE "http://localhost:8000/api/v2/projects/1/elastic/documents/texta_test_index/30/" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049"
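Viewing and deleting share one URL pattern and differ only in the HTTP method, which the following sketch makes explicit. `document_request` is a hypothetical helper built on the standard library; the token header is assumed to be required here in the same way as for the import examples. Nothing is sent over the network.

```python
import urllib.request
from urllib.parse import quote


def document_request(base_url, project_id, index_name, document_id, token, method="GET"):
    """Prepare a GET or DELETE request for a single document (hypothetical helper)."""
    if method not in ("GET", "DELETE"):
        raise ValueError("method must be 'GET' or 'DELETE'")
    # Percent-encode the path segments so unusual index names or IDs stay valid.
    index_part = quote(index_name, safe="")
    id_part = quote(str(document_id), safe="")
    url = (
        f"{base_url}/api/v2/projects/{project_id}"
        f"/elastic/documents/{index_part}/{id_part}/"
    )
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Token {token}"},
        method=method,
    )


view = document_request("http://localhost:8000", 1, "texta_test_index", 30, "<your-token>")
delete = document_request(
    "http://localhost:8000", 1, "texta_test_index", 30, "<your-token>", method="DELETE"
)
print(view.get_method(), view.full_url)
print(delete.get_method(), delete.full_url)
```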

Updating a split document

Parameters

id_field:

The field to use as the ID marker for grouping the pieces of a split document back into a single unit.

id_value:

The value of the ID field of the document to be replaced.

text_field:

Specifies the text field you want to update.

content:

The content that replaces the selected text field.

Example

Endpoint: projects/{project_pk}/elastic/documents/{index_name}/update_split

Note

The missing trailing "/" is important for this endpoint!

curl -X POST "http://localhost:8000/api/v2/projects/1/elastic/documents/texta_test_index/update_split" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
      "content": "general kenobi! Here is a very long text that should be splitted and now there is more text I forgot to add before and am replacing now",
      "text_field": "hello",
      "id_field": "uuid",
      "id_value": "aa15-ghh4-41af-af51"
    }'
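A small sketch of how the four parameters fit together, following the parameter descriptions: id_field names the field and id_value holds its value. `build_update_split` is a hypothetical helper; it only returns the URL and JSON body, so it runs offline.

```python
import json


def build_update_split(base_url, project_id, index_name,
                       id_field, id_value, text_field, content):
    """Return (url, json_body) for an update_split call (hypothetical helper)."""
    # No trailing slash on purpose, as required for this endpoint.
    url = (
        f"{base_url}/api/v2/projects/{project_id}"
        f"/elastic/documents/{index_name}/update_split"
    )
    body = {
        "id_field": id_field,      # name of the field that marks pieces of one document
        "id_value": id_value,      # value of that field for the document to replace
        "text_field": text_field,  # text field whose content is replaced
        "content": content,        # new content for the text field
    }
    return url, json.dumps(body)


url, body = build_update_split(
    "http://localhost:8000", 1, "texta_test_index",
    id_field="uuid",
    id_value="aa15-ghh4-41af-af51",
    text_field="hello",
    content="general kenobi! Here is the replacement text.",
)
print(url)
print(body)
```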