Tutorial: API¶

The purpose of this tutorial is to get you started with using Toolkit API. The tutorial gives you an overview of the most fundamental API operations together with illustrating examples. For more detailed documentations please see API reference.

Health of Toolkit¶

For checking the health of a running Toolkit instance, one can access the /health endpoint for operating statistics. The endpoint responds with information abouth the availability of services (e.g. Elasticsearch and TEXTA MLP) and system resources (e.g. disk, memory, GPU usage, etc.).

{
    "elastic": {
        "url": "http://elasticsearch:9200",
        "alive": true,
        "status": {
            "name": "elastic-dev",
            "cluster_name": "TEXTA-elastic",
            "cluster_uuid": "hyho3XLIS4qj9HijALguUA",
            "version": {
                "number": "6.8.6",
                "build_flavor": "default",
                "build_type": "deb",
                "build_hash": "3d9f765",
                "build_date": "2019-12-13T17:11:52.013738Z",
                "build_snapshot": false,
                "lucene_version": "7.7.2",
                "minimum_wire_compatibility_version": "5.6.0",
                "minimum_index_compatibility_version": "5.0.0"
            },
            "tagline": "You Know, for Search"
        }
    },
    "mlp": {
        "url": "http://texta-mlp:5000",
        "alive": true,
        "status": {
            "loaded entities": [
                "/opt/texta-mlp/entity_mapper/data/addresses.json",
                "/opt/texta-mlp/entity_mapper/data/atc.json",
                "/opt/texta-mlp/entity_mapper/data/companies.json",
                "/opt/texta-mlp/entity_mapper/data/substances.json",
                "/opt/texta-mlp/entity_mapper/data/drugs.json"
            ],
            "service": "TEXTA MLP",
            "version": "2.2.1"
        }
    },
    "version": "2.3.10",
    "disk": {
        "free": 316.9075050354004,
        "total": 1599.4990272521973,
        "used": 1282.5915222167969,
        "unit": "GB"
    },
    "memory": {
        "free": 22.222003936767578,
        "total": 31.4151611328125,
        "used": 8.80206298828125,
        "unit": "GB"
    },
    "cpu": {
        "percent": 0.8
    },
    "gpu": {
        "count": 0,
        "devices": []
    },
    "active_tasks": 0
}

Registration¶

Endpoint: /rest-auth/registration/

Example:

curl -X POST "http://localhost:8000/api/v1/rest-auth/registration/" \
-H  "accept: application/json" \
-H  "Content-Type: application/json" \
-d '{
        "username": "myname",
        "email": "myname@example.com",
        "password1": "a123s456789",
        "password2": "a123s456789"
    }'

Response:

{"key":"7cd98b388e85b82bd084c80418d56a185b3a35ba"}

Response is the Token key that you will later need to authenticate requests.

Logging in¶

Endpoint: /rest-auth/login/

Example:

curl -X POST "http://localhost:8000/api/v1/rest-auth/login/" \
-H "Content-Type: application/json" \
-d '{
        "username": "admin",
        "password": "1234"
    }'

Response:

{"key":"8229898dccf960714a9fa22662b214005aa2b049"}

Response is the Token key that you will later need to authenticate requests.

Projects¶

Create a new project¶

Endpoint: /projects/

Example:

curl -X POST "http://localhost:8000/api/v1/projects/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
        "title": "My project",
        "users": ["http://localhost:8000/api/v1/users/1/"],
        "indices": ["texta_test_index"]
    }'

Anonymizers¶

Create a new anonymizer¶

Endpoint: /projects/{project_pk}/anonymizers/

Example:

curl -X POST "http://localhost:8000/api/v1/projects/11/anonymizers/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
        "description": "My anonymizer",
        "replace_misspelled_names": true,
        "replace_single_last_names": true,
        "replace_single_first_names": true,
        "misspelling_threshold": 0.9,
        "mimic_casing": true,
        "auto_adjust_threshold": true
    }'

Anonymize text¶

Endpoint /projects/{project_pk}/anonymizers/{id}/anonymize_text/

Example:

curl -X POST "http://localhost:8000/api/v1/projects/11/anonymizers/1/anonymize_text/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
        "text": "Bonnie Parker and Clyde Barrow are believed to have murdered at least nine police officers.",
        "names": ["Parker, Bonnie Elizabeth", "Chestnut Barrow, Clyde"]
    }'

Response:

"N.Q and X.R are believed to have murdered at least nine police officers."

Anonymize texts¶

Endpoint /projects/{project_pk}/anonymizers/{id}/anonymize_texts/

Example:

curl -X POST "http://localhost:8000/api/v1/projects/11/anonymizers/1/anonymize_texts/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
       "texts": [
            "Bonnie Parker and Clyde Barrow are believed to have murdered at least nine police officers.",
            "Bonnie and Clyde were killed in May 1934."
        ],
       "names": ["Parker, Bonnie Elizabeth", "Chestnut Barrow, Clyde"],
       "consistent_replacement": true
    }'

Response:

[
    "F.Q and T.T are believed to have murdered at least nine police officers.",
    "F.Q and T.T were killed in May 1934."
]

MLP¶

Apply MLP to texts¶

Endpoint: /mlp/texts/

Example:

curl -X POST "http://localhost:8000/api/v1/projects/11/embeddings/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
        "texts": ["Mis su nimi on?", "Ettepanek minna üle neljapäevasele töönädalale lükati tagasi."]
    }'

Response:

[
    {
        "text": {"text":"Mis su nimi on ?","lang":"et","lemmas":"mis sina nimi olema ?","pos_tags":"P P S V Z"},
        "texta_facts":[]
    },
    {
        "text": {"text":"Ettepanek minna üle neljapäevasele töönädalale lükati tagasi .","lang":"et","lemmas":"ettepanek minema üle neljapäevane töönädal lükkama tagasi .","pos_tags":"S V K A S V D Z"},
        "texta_facts":[]
    }
]

Embeddings¶

Train a new embedding¶

Endpoint: /projects/{project_pk}/embeddings/

Example:

curl -X POST "http://localhost:8000/api/v1/projects/11/embeddings/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
        "description": "My embedding",
        "indices": [{"name": "texta_test_index"}],
        "fields": ["comment_content_lemmas"],
        "num_dimensions": 100,
        "max_documents": 10000,
        "min_freq": 5
    }'

Taggers¶

Train a new tagger¶

Endpoint: /projects/{project_pk}/taggers/

Example:

curl -X POST "http://localhost:8000/api/v1/projects/11/taggers/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
        "description": "My tagger",
        "fields": ["comment_content_lemmas"],
        "vectorizer": "Hashing Vectorizer",
        "classifier": "Logistic Regression",
        "indices": [{"name": "texta_test_index"}]
    }'

Tag text¶

Endpoint /projects/{project_pk}/taggers/{id}/tag_text/

Example:

curl -X POST "http://localhost:8000/api/v1/projects/11/taggers/2/tag_text/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
        "text": "mis su nimi on?",
        "lemmatize": true
    }'

Response:

{
    "tag":"My tagger",
    "probability":0.9898217973842874,
    "tagger_id":2,
    "result":true
}

Regex Taggers¶

Create a new Regex Tagger¶

Endpoint: /projects/{project_pk}/regex_taggers/

Example:

curl -X POST "http://localhost:8000/api/v1/projects/11/regex_taggers/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
        "description": "monsters",
        "lexicon": ["poltergeist", "vampire", "werewolf", "beast", "zombie", "ghost"],
        "counter_lexicon": ["no", "not", "neither", "nor"],
        "operator": "or",
        "match_type": "prefix",
        "required_words": 1.0,
        "phrase_slop": 0,
        "counter_slop": 2,
        "n_allowed_edits": 0,
        "return_fuzzy_match": true,
        "ignore_case": true,
        "ignore_punctuation": false

    }'

Tag doc¶

Endpoint /projects/{project_pk}/regex_taggers/{id}/tag_doc/

Example:

curl -X POST "http://localhost:8000/api/v1/projects/11/regex_taggers/1/tag_doc/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
        "doc": {
            "title": "Horrendous werewolf attack in Yorkshire",
            "body": "An American tourist was attacked by a werewolf. The beast escaped.",
            "id": 12
        },
        "fields": ["title", "body"]
    }'

Response:

{
    "tagger_id": 1,
    "tag": "monsters",
    "result": true,
    "matches": [
        {
            "str_val": "werewolf",
            "span": [
                11,
                19
            ],
            "field": "title"
        },
        {
            "str_val": "werewolf",
            "span": [
                38,
                46
            ],
            "field": "body"
        },
        {
            "str_val": "beast",
            "span": [
                52,
                57
            ],
            "field": "body"
        }
    ]
}

Tag random doc¶

Endpoint /projects/{project_pk}/regex_taggers/{id}/tag_random_doc/

Example:

curl -X POST "http://localhost:8000/api/v1/projects/11/regex_taggers/1/tag_random_doc/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
        "indices": [{"name": "news_articles"}],
        "fields": ["title", "body"]
    }'

Response:

{
    "tagger_id": 1,
    "tag": "monsters",
    "result": true,
    "matches": [
        {
            "str_val": "zombie",
            "span": [
                25,
                30
            ],
            "field": "title"
        },
        {
            "str_val": "zombie",
            "span": [
                46,
                51
            ],
            "field": "body"
        }
    ],
    "document": {
        "title": "Local boy infected by a zombie virus",
        "body": "John Smith, 13, claims to have symptoms of a zombie virus.",
        "id": 16
    }
}

Tag text¶

Endpoint /projects/{project_pk}/regex_taggers/{id}/tag_text/

Example:

curl -X POST "http://localhost:8000/api/v1/projects/11/regex_taggers/1/tag_text/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
        "text": "The old mansion was now a home for 217 ghosts and 10 vampires.",
    }'

Response:

{
  "tagger_id": 1,
  "tag": "monsters",
  "result": true,
  "matches": [
      {
          "str_val": "ghosts",
          "span": [
              39,
              45
          ]
      },
      {
          "str_val": "vampires",
          "span": [
              53,
              61
          ]
      }
  ]
}

Tag texts¶

Endpoint /projects/{project_pk}/regex_taggers/{id}/tag_texts/

Example:

curl -X POST "http://localhost:8000/api/v1/projects/11/regex_taggers/1/tag_texts/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
        "texts": [
          "Two poltergeist were seen at the end of the hallway.",
          "It was neither a ghost nor a human.",
          "A vampire was out for a walk."
        ]
    }'

Response:

[
    {
        "tagger_id": 1,
        "tag": "monsters",
        "result": true,
        "matches": [
            {
                "str_val": "poltergeist",
                "span": [
                    4,
                    15
                ]
            }
        ]
    },
    {
        "tagger_id": 1,
        "tag": "monsters",
        "result": false,
        "matches": []
    },
    {
        "tagger_id": 1,
        "tag": "monsters",
        "result": true,
        "matches": [
            {
                "str_val": "vampire",
                "span": [
                    2,
                    9
                ]
            }
        ]
    }
]

Multitag text¶

Endpoint /projects/{project_pk}/regex_taggers/multitag_text/

Example:

curl -X POST "http://localhost:8000/api/v1/projects/11/regex_taggers/multitag_text/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
        "text": "Michael Myers had an unfortunate encounter with a pack of werewolfs.",
        "taggers": [1, 2]
    }'

Response:

[
    {
        "tagger_id": 1,
        "tag": "monsters",
        "matches": [
            {
                "str_val": "werewolfs",
                "span": [
                    58,
                    67
                ]
            }
        ]
    },
    {
        "tagger_id": 2,
        "tag": "serial killers",
        "matches": [
            {
                "str_val": "michael myers",
                "span": [
                    0,
                    13
                ]
            }
        ]
    }
]

Tagger Groups¶

Train a new tagger group¶

Endpoint: /projects/{project_pk}/tagger_groups/

Example:

curl -X POST "http://localhost:8000/api/v1/projects/11/tagger_groups/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
        "description": "My tagger group",
        "fact_name": "TEEMA",
        "tagger":
                {
                    "fields": ["comment_content_lemmas"],
                    "vectorizer": "TfIdf Vectorizer",
                    "classifier": "Logistic Regression",
                    "indices": [{"name": "texta_test_index"}]
                }
    }'

Tag text¶

Endpoint: /projects/{project_pk}/tagger_groups/{id}/tag_text/

Example:

curl -X POST "http://localhost:8000/api/v1/projects/11/tagger_groups/1/tag_text/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
        "text": "AINUS ettepanek - alla põhihariduse isikutele sõidulubasid mitte anda - sai kriitika osaliseks.",
        "lemmatize": true,
        "n_similar_docs": 10,
        "n_candidate_tags": 10
    }'

Response:

[
    {
        "tag": "foo",
        "probability": 0.6659222999240199,
        "tagger_id": 4,
        "result": true
    },
    {
        "tag": "bar",
        "probability": 0.5107991699285356,
        "tagger_id": 3,
        "result": true
    }
]

Regex Tagger Groups¶

Create a new regex tagger group¶

Endpoint: /projects/{project_pk}/regex_tagger_groups/

Example:

curl -X POST "http://localhost:8000/api/v1/projects/11/regex_tagger_groups/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
      "regex_taggers": [1, 2],
      "description": "horror"
    }'

Response:

{
    "id": 1,
    "url": "http://localhost:8000/api/v1/projects/11/regex_tagger_groups/1/",
    "regex_taggers": [
        1,
        2
    ],
    "author_username": "admin",
    "task": null,
    "description": "horror",
    "tagger_info": [
        {
            "tagger_id": 1,
            "description": "monsters"
        },
        {
            "tagger_id": 2,
            "description": "serial killers"
        }
    ]
}

Apply regex tagger group¶

Endpoint: /projects/{project_pk}/regex_tagger_groups/{id}/apply_tagger_group/

Example:

curl -X POST "http://localhost:8000/api/v1/projects/11/regex_tagger_groups/1/apply_tagger_group/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
      "description": "Apply my regex tagger group",
      "indices": [{"name": "news_articles"}],
      "fields": ["title", "body"],
      "query": {}
    }'

Response:

{
    "message": "Started process of applying RegexTaggerGroup with id: 1"
}

Tag doc¶

Endpoint: /projects/{project_pk}/regex_tagger_groups/{id}/tag_doc/

Example:

curl -X POST "http://localhost:8000/api/v1/projects/11/regex_tagger_groups/1/tag_doc/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
       "doc": {
          "title": "Michael Myers returns to Haddonfield",
          "body": "Michael Myers returns to Haddonfield to take back his family home from a couple of belligerent poltergeists.",
          "id": 3
        },
       "fields": ["title", "body"]
    }'

Response:

{
    "tagger_group_id": 1,
    "tagger_group_tag": "horror",
    "result": true,
    "tags": [
        {
            "tagger_id": 2,
            "tag": "serial killers",
            "matches": [
                {
                    "str_val": "michael myers",
                    "span": [
                        0,
                        13
                    ],
                    "field": "title"
                },
                {
                    "str_val": "michael myers",
                    "span": [
                        0,
                        13
                    ],
                    "field": "body"
                }
            ]
        },
        {
            "tagger_id": 1,
            "tag": "monsters",
            "matches": [
                {
                    "str_val": "poltergeists",
                    "span": [
                        95,
                        107
                    ],
                    "field": "body"
                }
            ]
        }
    ]
}

Tag random doc¶

Endpoint: /projects/{project_pk}/regex_tagger_groups/{id}/tag_random_doc/

Example:

curl -X POST "http://localhost:8000/api/v1/projects/11/regex_tagger_groups/1/tag_random_doc/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
        "indices": [{"name": "news_articles"}],
        "fields": ["title", "body"]
    }'

Response:

{
    "tagger_group_id": 1,
    "tagger_group_tag": "horror",
    "result": true,
    "tags": [
        {
            "tagger_id": 2,
            "tag": "serial killers",
            "matches": [
                {
                    "str_val": "michael myers",
                    "span": [
                        0,
                        13
                    ],
                    "field": "title"
                },
                {
                    "str_val": "michael myers",
                    "span": [
                        0,
                        13
                    ],
                    "field": "body"
                }
            ]
        },
        {
            "tagger_id": 1,
            "tag": "monsters",
            "matches": [
                {
                    "str_val": "poltergeists",
                    "span": [
                        95,
                        107
                    ],
                    "field": "body"
                }
            ]
        }
    ],
    "document": {
       "title": "Michael Myers returns to Haddonfield",
       "body": "Michael Myers returns to Haddonfield to take back his family home from a couple of belligerent poltergeists.",
       "id": 3
     },
}

Tag text¶

Endpoint: /projects/{project_pk}/regex_tagger_groups/{id}/tag_text/

Example:

curl -X POST "http://localhost:8000/api/v1/projects/11/regex_tagger_groups/1/tag_text/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
      "text": "Jason Voorhees was having a peaceful camping trip until his roads crossed with two rowdy vampires accompanied by Freddy Krueger."
    }'

Response:

{
    "tagger_group_id": 4,
    "tagger_group_tag": "horror",
    "result": true,
    "tags": [
        {
            "tag": "monsters",
            "tagger_id": 7,
            "matches": [
                {
                    "str_val": "vampires",
                    "span": [
                        89,
                        97
                    ]
                }
            ]
        },
        {
            "tag": "serial killers",
            "tagger_id": 8,
            "matches": [
                {
                    "str_val": "jason voorhees",
                    "span": [
                        0,
                        14
                    ]
                },
                {
                    "str_val": "freddy krueger",
                    "span": [
                        113,
                        127
                    ]
                }
            ]
        }
    ]
}

Tag texts¶

Endpoint: /projects/{project_pk}/regex_tagger_groups/{id}/tag_texts/

Example:

curl -X POST "http://localhost:8000/api/v1/projects/11/regex_tagger_groups/1/tag_texts/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
        "texts": [
            "Norman Bates went knife-shopping with Michael Myers.",
            "The weather was nice.",
            "Jason Voorhees was scared of the ghost living in his cupboard."
        ]
    }'

Response:

{
    "tagger_group_id": 4,
    "tagger_group_tag": "horror",
    "tags": [
        [
            {
                "tag": "serial killers",
                "tagger_id": 8,
                "matches": [
                    {
                        "str_val": "norman bates",
                        "span": [
                            0,
                            12
                        ]
                    },
                    {
                        "str_val": "michael myers",
                        "span": [
                            38,
                            51
                        ]
                    }
                ]
            }
        ],
        [],
        [
            {
                "tag": "monsters",
                "tagger_id": 7,
                "matches": [
                    {
                        "str_val": "ghost",
                        "span": [
                            33,
                            38
                        ]
                    }
                ]
            },
            {
                "tag": "serial killers",
                "tagger_id": 8,
                "matches": [
                    {
                        "str_val": "jason voorhees",
                        "span": [
                            0,
                            14
                        ]
                    }
                ]
            }
        ]
    ]
}

Torch Tagger¶

Train a new Torch Tagger¶

Endpoint: /projects/{project_pk}/torchtaggers/

Example:

curl -X POST "http://localhost:8000/api/v1/projects/11/torchtaggers/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
        "description": "My Torch Tagger",
        "fields": ["comment_content_clean.text"],
        "model_architecture": "TextRNN",
        "embedding": 2,
        "num_epochs": 5,
        "fact_name": "TEEMA",
        "indices": [{"name": "texta_test_index"}]
    }'

Tag text¶

Endpoint: /projects/{project_pk}/torchtaggers/{id}/tag_text/

Example:

curl -X POST "http://localhost:8000/api/v1/projects/11/torchtaggers/1/tag_text/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
        "text": "AINUS ettepanek - alla põhihariduse isikutele sõidulubasid mitte anda - sai kriitika osaliseks.",
        "lemmatize": false
    }'

Response:

{
    "result": "foo",
    "probability": 0.36259710788726807
}

Dataset Importer¶

This module allows the user to insert jsonlines, csv and excel files into Elasticsearch to make them accessible by the Toolkit. Please note that this process reads the whole file into the memory and can thus create memory issues when trying to process bigger files, it is advisable to split such files up into smaller chunks and process each one separately.

This whole process is asynchronous so the response to the call will be instantaneous and it since it takes a bit time to load the file into the memory, the first 10 seconds might not display any signs of the progress changing.

Parameters:¶

description - Normal description to separate any given task from the other ones.
index - Name of the newly created index, please note that Elasticsearch index naming restrictions apply.
separator - Only needed for .csv files, defaults to a coma (,). Allows to configure the separator for csv files.

Example:

@ is special syntax for reading the binary of the given file name.

curl -H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-F "description=Articles" \
-F "index=en_articles" \
-F "file=@FILE_NAME.csv" \
http://localhost:8000/api/v1/projects/11/dataset_imports/

Browsable API¶

http://localhost:8000/api/v1/

TODO

API Reference¶

Reference for Toolkit API is available when running the Toolkit: