Tutorial: API
This tutorial shows how to get started with the TEXTA Toolkit API. It gives an overview of the most basic API operations, with illustrative examples. More detailed documentation is available in the API reference.
Toolkit health
The state of a running Toolkit can be inspected through the /health endpoint. It reports whether services (e.g. Elasticsearch and TEXTA MLP) are available, along with system resource usage (e.g. disk, memory and GPU). Example response:
{
"elastic": {
"url": "http://elasticsearch:9200",
"alive": true,
"status": {
"name": "elastic-dev",
"cluster_name": "TEXTA-elastic",
"cluster_uuid": "hyho3XLIS4qj9HijALguUA",
"version": {
"number": "6.8.6",
"build_flavor": "default",
"build_type": "deb",
"build_hash": "3d9f765",
"build_date": "2019-12-13T17:11:52.013738Z",
"build_snapshot": false,
"lucene_version": "7.7.2",
"minimum_wire_compatibility_version": "5.6.0",
"minimum_index_compatibility_version": "5.0.0"
},
"tagline": "You Know, for Search"
}
},
"mlp": {
"url": "http://texta-mlp:5000",
"alive": true,
"status": {
"loaded entities": [
"/opt/texta-mlp/entity_mapper/data/addresses.json",
"/opt/texta-mlp/entity_mapper/data/atc.json",
"/opt/texta-mlp/entity_mapper/data/companies.json",
"/opt/texta-mlp/entity_mapper/data/substances.json",
"/opt/texta-mlp/entity_mapper/data/drugs.json"
],
"service": "TEXTA MLP",
"version": "2.2.1"
}
},
"version": "2.3.10",
"disk": {
"free": 316.9075050354004,
"total": 1599.4990272521973,
"used": 1282.5915222167969,
"unit": "GB"
},
"memory": {
"free": 22.222003936767578,
"total": 31.4151611328125,
"used": 8.80206298828125,
"unit": "GB"
},
"cpu": {
"percent": 0.8
},
"gpu": {
"count": 0,
"devices": []
},
"active_tasks": 0
}
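A client can act on this response programmatically by reducing it to a few checks. The sketch below is an illustration, not part of the Toolkit. The function name summarize_health is hypothetical, and it reads only fields that appear in the example response (alive, used, total, count, active_tasks). The sample values are copied from the response above and trimmed to those fields.

```python
import json

def summarize_health(health: dict) -> dict:
    """Condense a /health response into availability flags and usage figures."""

    def used_pct(resource: dict):
        # Percentage in use, rounded to one decimal; None if no total is reported.
        total = resource.get("total")
        return round(100 * resource["used"] / total, 1) if total else None

    return {
        "elastic_alive": bool(health.get("elastic", {}).get("alive")),
        "mlp_alive": bool(health.get("mlp", {}).get("alive")),
        "disk_used_pct": used_pct(health.get("disk", {})),
        "memory_used_pct": used_pct(health.get("memory", {})),
        "gpu_count": health.get("gpu", {}).get("count", 0),
        "active_tasks": health.get("active_tasks", 0),
    }

# Trimmed copy of the example response above.
sample = json.loads("""
{
  "elastic": {"alive": true},
  "mlp": {"alive": true},
  "disk": {"total": 1599.4990272521973, "used": 1282.5915222167969, "unit": "GB"},
  "memory": {"total": 31.4151611328125, "used": 8.80206298828125, "unit": "GB"},
  "gpu": {"count": 0, "devices": []},
  "active_tasks": 0
}
""")
summary = summarize_health(sample)
print(summary)
```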
Registration
Endpoint: /rest-auth/registration/
Example:
curl -X POST "http://localhost:8000/api/v1/rest-auth/registration/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d '{
"username": "myname",
"email": "myname@example.com",
"password1": "a123s456789",
"password2": "a123s456789"
}'
Response:
{"key":"7cd98b388e85b82bd084c80418d56a185b3a35ba"}
The response contains the token key, which is used later to authenticate requests.
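Every later request in this tutorial sends that key in an "Authorization: Token <key>" header. The following is a minimal sketch of turning the registration (or login) response into request headers. It assumes the response body is exactly the JSON object shown above, and auth_headers is a hypothetical helper name.

```python
import json

def auth_headers(response_body: str) -> dict:
    """Build request headers from the token returned by registration or login."""
    key = json.loads(response_body)["key"]
    return {
        "accept": "application/json",
        "Content-Type": "application/json",
        # Same scheme as in the curl examples: "Token <key>".
        "Authorization": f"Token {key}",
    }

headers = auth_headers('{"key":"7cd98b388e85b82bd084c80418d56a185b3a35ba"}')
print(headers["Authorization"])
```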
Login
Endpoint: /rest-auth/login/
Example:
curl -X POST "http://localhost:8000/api/v1/rest-auth/login/" \
-H "Content-Type: application/json" \
-d '{
"username": "admin",
"password": "1234"
}'
Response:
{"key":"8229898dccf960714a9fa22662b214005aa2b049"}
As with registration, the response contains the token key used to authenticate subsequent requests.
Projects
Create a new project
Endpoint: /projects/
Example:
curl -X POST "http://localhost:8000/api/v1/projects/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
"title": "My project",
"users": ["http://localhost:8000/api/v1/users/1/"],
"indices": ["texta_test_index"]
}'
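All the curl calls in this tutorial follow the same pattern: a JSON POST with three headers. The same request can be prepared with Python's standard urllib.request, as sketched below. BASE_URL and build_post are illustrative names, and the base URL is the localhost:8000 default used in the examples. The request is only built, not sent; sending it requires a running Toolkit.

```python
import json
import urllib.request

# Default from the examples; adjust for your deployment.
BASE_URL = "http://localhost:8000/api/v1"

def build_post(path: str, token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request to a Toolkit endpoint."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Token {token}",
        },
        method="POST",
    )

req = build_post(
    "/projects/",
    "8229898dccf960714a9fa22662b214005aa2b049",
    {
        "title": "My project",
        "users": ["http://localhost:8000/api/v1/users/1/"],
        "indices": ["texta_test_index"],
    },
)
# To send it against a running Toolkit:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```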
Anonymizers
Create a new anonymizer
Endpoint: /projects/{project_pk}/anonymizers/
Example:
curl -X POST "http://localhost:8000/api/v1/projects/11/anonymizers/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
"description": "My anonymizer",
"replace_misspelled_names": true,
"replace_single_last_names": true,
"replace_single_first_names": true,
"misspelling_threshold": 0.9,
"mimic_casing": true,
"auto_adjust_threshold": true
}'
Anonymize a text
Endpoint: /projects/{project_pk}/anonymizers/{id}/anonymize_text/
Example:
curl -X POST "http://localhost:8000/api/v1/projects/11/anonymizers/1/anonymize_text/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
"text": "Bonnie Parker and Clyde Barrow are believed to have murdered at least nine police officers.",
"names": ["Parker, Bonnie Elizabeth", "Chestnut Barrow, Clyde"]
}'
Response:
"N.Q and X.R are believed to have murdered at least nine police officers."
Anonymize texts
Endpoint: /projects/{project_pk}/anonymizers/{id}/anonymize_texts/
Example:
curl -X POST "http://localhost:8000/api/v1/projects/11/anonymizers/1/anonymize_texts/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
"texts": [
"Bonnie Parker and Clyde Barrow are believed to have murdered at least nine police officers.",
"Bonnie and Clyde were killed in May 1934."
],
"names": ["Parker, Bonnie Elizabeth", "Chestnut Barrow, Clyde"],
"consistent_replacement": true
}'
Response:
[
"F.Q and T.T are believed to have murdered at least nine police officers.",
"F.Q and T.T were killed in May 1934."
]
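The anonymizer runs server-side, so a client may want a quick check that none of the supplied name parts survive in the output. The hypothetical helper below does only exact, case-insensitive, whole-word matching. It therefore does not catch misspelled variants, which replace_misspelled_names targets on the server.

```python
import re

def name_tokens(names: list) -> set:
    """Split names like "Parker, Bonnie Elizabeth" into individual name parts."""
    tokens = set()
    for name in names:
        for part in re.split(r"[,\s]+", name):
            if part:
                tokens.add(part)
    return tokens

def leaked_names(texts: list, names: list) -> list:
    """Return the name parts that still appear as whole words in any text."""
    leaks = set()
    for token in name_tokens(names):
        pattern = re.compile(r"\b" + re.escape(token) + r"\b", re.IGNORECASE)
        if any(pattern.search(text) for text in texts):
            leaks.add(token)
    return sorted(leaks)

names = ["Parker, Bonnie Elizabeth", "Chestnut Barrow, Clyde"]
anonymized = [
    "F.Q and T.T are believed to have murdered at least nine police officers.",
    "F.Q and T.T were killed in May 1934.",
]
print(leaked_names(anonymized, names))
```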
MLP
Applying MLP to texts
Endpoint: /mlp/texts/
Example:
curl -X POST "http://localhost:8000/api/v1/mlp/texts/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
"texts": ["Mis su nimi on?", "Ettepanek minna üle neljapäevasele töönädalale lükati tagasi."]
}'
Response:
[
{
"text": {"text":"Mis su nimi on ?","lang":"et","lemmas":"mis sina nimi olema ?","pos_tags":"P P S V Z"},
"texta_facts":[]
},
{
"text": {"text":"Ettepanek minna üle neljapäevasele töönädalale lükati tagasi .","lang":"et","lemmas":"ettepanek minema üle neljapäevane töönädal lükkama tagasi .","pos_tags":"S V K A S V D Z"},
"texta_facts":[]
}
]
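In the example response, lemmas and pos_tags are space-separated strings with one entry per token, so they can be zipped together. The sketch below assumes this one-to-one alignment holds, which is true for both example texts, and raises an error otherwise. lemmas_and_tags is a hypothetical name.

```python
def lemmas_and_tags(mlp_response: list) -> list:
    """Pair every lemma with its POS tag, one list of pairs per analysed text."""
    result = []
    for item in mlp_response:
        analysis = item["text"]
        lemmas = analysis["lemmas"].split()
        tags = analysis["pos_tags"].split()
        # Assumption: both strings list the same tokens in the same order.
        if len(lemmas) != len(tags):
            raise ValueError("lemmas and pos_tags are not aligned")
        result.append(list(zip(lemmas, tags)))
    return result

# The example response above, without texta_facts.
mlp_response = [
    {"text": {"text": "Mis su nimi on ?", "lang": "et",
              "lemmas": "mis sina nimi olema ?", "pos_tags": "P P S V Z"}},
    {"text": {"text": "Ettepanek minna üle neljapäevasele töönädalale lükati tagasi .",
              "lang": "et",
              "lemmas": "ettepanek minema üle neljapäevane töönädal lükkama tagasi .",
              "pos_tags": "S V K A S V D Z"}},
]
for pairs in lemmas_and_tags(mlp_response):
    print(pairs)
```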
Embeddings
Train a new embedding model
Endpoint: /projects/{project_pk}/embeddings/
Example:
curl -X POST "http://localhost:8000/api/v1/projects/11/embeddings/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
"description": "My embedding",
"indices": [{"name": "texta_test_index"}],
"fields": ["comment_content_lemmas"],
"num_dimensions": 100,
"max_documents": 10000,
"min_freq": 5
}'
Taggers
Train a new Tagger
Endpoint: /projects/{project_pk}/taggers/
Example:
curl -X POST "http://localhost:8000/api/v1/projects/11/taggers/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
"description": "My tagger",
"fields": ["comment_content_lemmas"],
"vectorizer": "Hashing Vectorizer",
"classifier": "Logistic Regression",
"indices": [{"name": "texta_test_index"}]
}'
Tag a text
Endpoint: /projects/{project_pk}/taggers/{id}/tag_text/
Example:
curl -X POST "http://localhost:8000/api/v1/projects/11/taggers/2/tag_text/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
"text": "mis su nimi on?",
"lemmatize": true
}'
Response:
{
"tag":"My tagger",
"probability":0.9898217973842874,
"tagger_id":2,
"result":true
}
Regex Taggers
Create a new Regex Tagger
Endpoint: /projects/{project_pk}/regex_taggers/
Example:
curl -X POST "http://localhost:8000/api/v1/projects/11/regex_taggers/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
"description": "monsters",
"lexicon": ["poltergeist", "vampire", "werewolf", "beast", "zombie", "ghost"],
"counter_lexicon": ["no", "not", "neither", "nor"],
"operator": "or",
"match_type": "prefix",
"required_words": 1.0,
"phrase_slop": 0,
"counter_slop": 2,
"n_allowed_edits": 0,
"return_fuzzy_match": true,
"ignore_case": true,
"ignore_punctuation": false
}'
Tag a document
Endpoint: /projects/{project_pk}/regex_taggers/{id}/tag_doc/
Example:
curl -X POST "http://localhost:8000/api/v1/projects/11/regex_taggers/1/tag_doc/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
"doc": {
"title": "Horrendous werewolf attack in Yorkshire",
"body": "An American tourist was attacked by a werewolf. The beast escaped.",
"id": 12
},
"fields": ["title", "body"]
}'
Response:
{
"tagger_id": 1,
"tag": "monsters",
"result": true,
"matches": [
{
"str_val": "werewolf",
"span": [
11,
19
],
"field": "title"
},
{
"str_val": "werewolf",
"span": [
38,
46
],
"field": "body"
},
{
"str_val": "beast",
"span": [
52,
57
],
"field": "body"
}
]
}
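The span values are character offsets into the field named by field, and the end offset is exclusive: in the title, "werewolf" occupies positions 11 to 18. The sketch below cuts the matches back out of the original document, which is useful for checking results or for recovering the original casing. matched_substrings is a hypothetical helper.

```python
def matched_substrings(doc: dict, response: dict) -> list:
    """Cut each match out of the original document using its field and span."""
    out = []
    for match in response["matches"]:
        start, end = match["span"]  # end offset is exclusive
        out.append(doc[match["field"]][start:end])
    return out

doc = {
    "title": "Horrendous werewolf attack in Yorkshire",
    "body": "An American tourist was attacked by a werewolf. The beast escaped.",
    "id": 12,
}
response = {
    "tagger_id": 1,
    "tag": "monsters",
    "result": True,
    "matches": [
        {"str_val": "werewolf", "span": [11, 19], "field": "title"},
        {"str_val": "werewolf", "span": [38, 46], "field": "body"},
        {"str_val": "beast", "span": [52, 57], "field": "body"},
    ],
}
print(matched_substrings(doc, response))
```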
Tag a random document
Endpoint: /projects/{project_pk}/regex_taggers/{id}/tag_random_doc/
Example:
curl -X POST "http://localhost:8000/api/v1/projects/11/regex_taggers/1/tag_random_doc/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
"indices": [{"name": "news_articles"}],
"fields": ["title", "body"]
}'
Response:
{
"tagger_id": 1,
"tag": "monsters",
"result": true,
"matches": [
{
"str_val": "zombie",
"span": [
24,
30
],
"field": "title"
},
{
"str_val": "zombie",
"span": [
46,
51
],
"field": "body"
}
],
"document": {
"title": "Local boy infected by a zombie virus",
"body": "John Smith, 13, claims to have symptoms of a zombie virus.",
"id": 16
}
}
Tag a text
Endpoint: /projects/{project_pk}/regex_taggers/{id}/tag_text/
Example:
curl -X POST "http://localhost:8000/api/v1/projects/11/regex_taggers/1/tag_text/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
"text": "The old mansion was now a home for 217 ghosts and 10 vampires."
}'
Response:
{
"tagger_id": 1,
"tag": "monsters",
"result": true,
"matches": [
{
"str_val": "ghosts",
"span": [
39,
45
]
},
{
"str_val": "vampires",
"span": [
53,
61
]
}
]
}
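Because tag_text returns spans without a field, the spans index directly into the submitted text. The sketch below wraps each match in markers and assumes the spans do not overlap. highlight is a hypothetical helper.

```python
def highlight(text: str, matches: list, left: str = "[", right: str = "]") -> str:
    """Wrap every matched span in the text with the given markers."""
    # Work from the last span to the first so earlier offsets stay valid.
    for match in sorted(matches, key=lambda m: m["span"][0], reverse=True):
        start, end = match["span"]
        text = text[:start] + left + text[start:end] + right + text[end:]
    return text

text = "The old mansion was now a home for 217 ghosts and 10 vampires."
matches = [
    {"str_val": "ghosts", "span": [39, 45]},
    {"str_val": "vampires", "span": [53, 61]},
]
print(highlight(text, matches))
```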
Tag texts
Endpoint: /projects/{project_pk}/regex_taggers/{id}/tag_texts/
Example:
curl -X POST "http://localhost:8000/api/v1/projects/11/regex_taggers/1/tag_texts/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
"texts": [
"Two poltergeist were seen at the end of the hallway.",
"It was neither a ghost nor a human.",
"A vampire was out for a walk."
]
}'
Response:
[
{
"tagger_id": 1,
"tag": "monsters",
"result": true,
"matches": [
{
"str_val": "poltergeist",
"span": [
4,
15
]
}
]
},
{
"tagger_id": 1,
"tag": "monsters",
"result": false,
"matches": []
},
{
"tagger_id": 1,
"tag": "monsters",
"result": true,
"matches": [
{
"str_val": "vampire",
"span": [
2,
9
]
}
]
}
]
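In the example, the response list has one entry per input text, in the same order, so results can be paired with texts by position. The sketch below relies on that ordering assumption. positive_texts is a hypothetical name, and the results are trimmed to the result flag.

```python
def positive_texts(texts: list, results: list) -> list:
    """Return (index, text) for every input text the tagger marked as positive."""
    if len(texts) != len(results):
        raise ValueError("expected one result per input text")
    return [(i, text) for i, (text, res) in enumerate(zip(texts, results)) if res["result"]]

texts = [
    "Two poltergeist were seen at the end of the hallway.",
    "It was neither a ghost nor a human.",
    "A vampire was out for a walk.",
]
results = [{"result": True}, {"result": False}, {"result": True}]
for index, text in positive_texts(texts, results):
    print(index, text)
```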
Multitag text
Endpoint: /projects/{project_pk}/regex_taggers/multitag_text/
Example:
curl -X POST "http://localhost:8000/api/v1/projects/11/regex_taggers/multitag_text/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
"text": "Michael Myers had an unfortunate encounter with a pack of werewolfs.",
"taggers": [1, 2]
}'
Response:
[
{
"tagger_id": 1,
"tag": "monsters",
"matches": [
{
"str_val": "werewolfs",
"span": [
58,
67
]
}
]
},
{
"tagger_id": 2,
"tag": "serial killers",
"matches": [
{
"str_val": "michael myers",
"span": [
0,
13
]
}
]
}
]
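The multitag response contains one entry per tagger. In this example, both requested taggers matched. The sketch below groups the matched strings by tag name. matches_by_tag is hypothetical.

```python
from collections import defaultdict

def matches_by_tag(response: list) -> dict:
    """Map each tag name to the list of strings it matched."""
    grouped = defaultdict(list)
    for entry in response:
        for match in entry["matches"]:
            grouped[entry["tag"]].append(match["str_val"])
    return dict(grouped)

response = [
    {"tagger_id": 1, "tag": "monsters",
     "matches": [{"str_val": "werewolfs", "span": [58, 67]}]},
    {"tagger_id": 2, "tag": "serial killers",
     "matches": [{"str_val": "michael myers", "span": [0, 13]}]},
]
print(matches_by_tag(response))
```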
Tagger Groups
Train a new Tagger Group
Endpoint: /projects/{project_pk}/tagger_groups/
Example:
curl -X POST "http://localhost:8000/api/v1/projects/11/tagger_groups/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
"description": "My tagger group",
"fact_name": "TEEMA",
"tagger":
{
"fields": ["comment_content_lemmas"],
"vectorizer": "TfIdf Vectorizer",
"classifier": "Logistic Regression",
"indices": [{"name": "texta_test_index"}]
}
}'
Tag a text
Endpoint: /projects/{project_pk}/tagger_groups/{id}/tag_text/
Example:
curl -X POST "http://localhost:8000/api/v1/projects/11/tagger_groups/1/tag_text/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
"text": "AINUS ettepanek - alla põhihariduse isikutele sõidulubasid mitte anda - sai kriitika osaliseks.",
"lemmatize": true,
"n_similar_docs": 10,
"n_candidate_tags": 10
}'
Response:
[
{
"tag": "foo",
"probability": 0.6659222999240199,
"tagger_id": 4,
"result": true
},
{
"tag": "bar",
"probability": 0.5107991699285356,
"tagger_id": 3,
"result": true
}
]
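Each entry carries a probability, so a client can apply its own stricter cut-off on top of result. The threshold of 0.6 below is an arbitrary illustration, not a Toolkit default. confident_tags is a hypothetical helper.

```python
def confident_tags(predictions: list, min_probability: float = 0.6) -> list:
    """Keep positive predictions at or above a client-side threshold, best first."""
    kept = [p for p in predictions if p["result"] and p["probability"] >= min_probability]
    return [p["tag"] for p in sorted(kept, key=lambda p: p["probability"], reverse=True)]

predictions = [
    {"tag": "foo", "probability": 0.6659222999240199, "tagger_id": 4, "result": True},
    {"tag": "bar", "probability": 0.5107991699285356, "tagger_id": 3, "result": True},
]
print(confident_tags(predictions))
```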
Regex Tagger Groups
Create a new Regex Tagger Group
Endpoint: /projects/{project_pk}/regex_tagger_groups/
Example:
curl -X POST "http://localhost:8000/api/v1/projects/11/regex_tagger_groups/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
"regex_taggers": [1, 2],
"description": "horror"
}'
Response:
{
"id": 1,
"url": "http://localhost:8000/api/v1/projects/11/regex_tagger_groups/1/",
"regex_taggers": [
1,
2
],
"author_username": "admin",
"task": null,
"description": "horror",
"tagger_info": [
{
"tagger_id": 1,
"description": "monsters"
},
{
"tagger_id": 2,
"description": "serial killers"
}
]
}
Apply a Regex Tagger Group
Endpoint: /projects/{project_pk}/regex_tagger_groups/{id}/apply_tagger_group/
Example:
curl -X POST "http://localhost:8000/api/v1/projects/11/regex_tagger_groups/1/apply_tagger_group/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
"description": "Apply my regex tagger group",
"indices": [{"name": "news_articles"}],
"fields": ["title", "body"],
"query": {}
}'
Response:
{
"message": "Started process of applying RegexTaggerGroup with id: 1"
}
Tag a document
Endpoint: /projects/{project_pk}/regex_tagger_groups/{id}/tag_doc/
Example:
curl -X POST "http://localhost:8000/api/v1/projects/11/regex_tagger_groups/1/tag_doc/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
"doc": {
"title": "Michael Myers returns to Haddonfield",
"body": "Michael Myers returns to Haddonfield to take back his family home from a couple of belligerent poltergeists.",
"id": 3
},
"fields": ["title", "body"]
}'
Response:
{
"tagger_group_id": 1,
"tagger_group_tag": "horror",
"result": true,
"tags": [
{
"tagger_id": 2,
"tag": "serial killers",
"matches": [
{
"str_val": "michael myers",
"span": [
0,
13
],
"field": "title"
},
{
"str_val": "michael myers",
"span": [
0,
13
],
"field": "body"
}
]
},
{
"tagger_id": 1,
"tag": "monsters",
"matches": [
{
"str_val": "poltergeists",
"span": [
95,
107
],
"field": "body"
}
]
}
]
}
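A group response nests matches under each tagger. To list them in reading order per field, one option is to flatten them into rows. The sketch below (a hypothetical flatten_group_matches) sorts the rows by field name and then by start offset.

```python
def flatten_group_matches(response: dict) -> list:
    """Turn nested group matches into (tag, field, str_val, start, end) rows."""
    rows = []
    for tag in response["tags"]:
        for match in tag["matches"]:
            start, end = match["span"]
            rows.append((tag["tag"], match["field"], match["str_val"], start, end))
    # Sort by field name, then by position within the field.
    return sorted(rows, key=lambda row: (row[1], row[3]))

response = {
    "tagger_group_id": 1,
    "tagger_group_tag": "horror",
    "result": True,
    "tags": [
        {"tagger_id": 2, "tag": "serial killers", "matches": [
            {"str_val": "michael myers", "span": [0, 13], "field": "title"},
            {"str_val": "michael myers", "span": [0, 13], "field": "body"},
        ]},
        {"tagger_id": 1, "tag": "monsters", "matches": [
            {"str_val": "poltergeists", "span": [95, 107], "field": "body"},
        ]},
    ],
}
for row in flatten_group_matches(response):
    print(row)
```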
Tag a random document
Endpoint: /projects/{project_pk}/regex_tagger_groups/{id}/tag_random_doc/
Example:
curl -X POST "http://localhost:8000/api/v1/projects/11/regex_tagger_groups/1/tag_random_doc/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
"indices": [{"name": "news_articles"}],
"fields": ["title", "body"]
}'
Response:
{
"tagger_group_id": 1,
"tagger_group_tag": "horror",
"result": true,
"tags": [
{
"tagger_id": 2,
"tag": "serial killers",
"matches": [
{
"str_val": "michael myers",
"span": [
0,
13
],
"field": "title"
},
{
"str_val": "michael myers",
"span": [
0,
13
],
"field": "body"
}
]
},
{
"tagger_id": 1,
"tag": "monsters",
"matches": [
{
"str_val": "poltergeists",
"span": [
95,
107
],
"field": "body"
}
]
}
],
"document": {
"title": "Michael Myers returns to Haddonfield",
"body": "Michael Myers returns to Haddonfield to take back his family home from a couple of belligerent poltergeists.",
"id": 3
}
}
Tag a text
Endpoint: /projects/{project_pk}/regex_tagger_groups/{id}/tag_text/
Example:
curl -X POST "http://localhost:8000/api/v1/projects/11/regex_tagger_groups/1/tag_text/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
"text": "Jason Voorhees was having a peaceful camping trip until his roads crossed with two rowdy vampires accompanied by Freddy Krueger."
}'
Response:
{
"tagger_group_id": 4,
"tagger_group_tag": "horror",
"result": true,
"tags": [
{
"tag": "monsters",
"tagger_id": 7,
"matches": [
{
"str_val": "vampires",
"span": [
89,
97
]
}
]
},
{
"tag": "serial killers",
"tagger_id": 8,
"matches": [
{
"str_val": "jason voorhees",
"span": [
0,
14
]
},
{
"str_val": "freddy krueger",
"span": [
113,
127
]
}
]
}
]
}
Tag texts
Endpoint: /projects/{project_pk}/regex_tagger_groups/{id}/tag_texts/
Example:
curl -X POST "http://localhost:8000/api/v1/projects/11/regex_tagger_groups/1/tag_texts/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
"texts": [
"Norman Bates went knife-shopping with Michael Myers.",
"The weather was nice.",
"Jason Voorhees was scared of the ghost living in his cupboard."
]
}'
Response:
{
"tagger_group_id": 4,
"tagger_group_tag": "horror",
"tags": [
[
{
"tag": "serial killers",
"tagger_id": 8,
"matches": [
{
"str_val": "norman bates",
"span": [
0,
12
]
},
{
"str_val": "michael myers",
"span": [
38,
51
]
}
]
}
],
[],
[
{
"tag": "monsters",
"tagger_id": 7,
"matches": [
{
"str_val": "ghost",
"span": [
33,
38
]
}
]
},
{
"tag": "serial killers",
"tagger_id": 8,
"matches": [
{
"str_val": "jason voorhees",
"span": [
0,
14
]
}
]
}
]
]
}
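Here tags is a list of lists. Each input text gets one inner list, in request order, and the list is empty when nothing matched (as for the second text). The sketch below reduces this to tag names per text and assumes that ordering. tags_per_text is hypothetical, and the response is trimmed to the tag names.

```python
def tags_per_text(response: dict) -> list:
    """Return the sorted tag names found in each input text, in request order."""
    return [sorted(entry["tag"] for entry in per_text) for per_text in response["tags"]]

texts = [
    "Norman Bates went knife-shopping with Michael Myers.",
    "The weather was nice.",
    "Jason Voorhees was scared of the ghost living in his cupboard.",
]
response = {
    "tagger_group_id": 4,
    "tagger_group_tag": "horror",
    "tags": [
        [{"tag": "serial killers", "tagger_id": 8}],
        [],
        [{"tag": "monsters", "tagger_id": 7}, {"tag": "serial killers", "tagger_id": 8}],
    ],
}
for text, tags in zip(texts, tags_per_text(response)):
    print(tags, text)
```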
Torch Tagger
Train a new Torch Tagger
Endpoint: /projects/{project_pk}/torchtaggers/
Example:
curl -X POST "http://localhost:8000/api/v1/projects/11/torchtaggers/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
"description": "My Torch Tagger",
"fields": ["comment_content_clean.text"],
"model_architecture": "TextRNN",
"embedding": 2,
"num_epochs": 5,
"fact_name": "TEEMA",
"indices": [{"name": "texta_test_index"}]
}'
Tag a text
Endpoint: /projects/{project_pk}/torchtaggers/{id}/tag_text/
Example:
curl -X POST "http://localhost:8000/api/v1/projects/11/torchtaggers/1/tag_text/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-d '{
"text": "AINUS ettepanek - alla põhihariduse isikutele sõidulubasid mitte anda - sai kriitika osaliseks.",
"lemmatize": false
}'
Response:
{
"result": "foo",
"probability": 0.36259710788726807
}
Dataset Importer
This module lets users import files in JSON Lines, CSV and Excel formats into Elasticsearch, making them accessible to the Toolkit. Note that the whole file is read into memory, which can cause memory problems when uploading larger files. We recommend splitting large files into smaller chunks and uploading each chunk separately.
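One way to split a large CSV file before uploading is to stream it row by row and write chunks that each repeat the header line. The following is a client-side sketch using Python's csv module, not a Toolkit feature. The chunk size is an arbitrary assumption, and split_csv is a hypothetical name. Each part can then be uploaded separately with the import request shown in the example.

```python
import csv
import io

def _to_csv(header: list, rows: list) -> str:
    """Serialize a header plus data rows into a CSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

def split_csv(source, rows_per_chunk: int):
    """Yield CSV strings with at most rows_per_chunk data rows, each with the header."""
    reader = csv.reader(source)
    header = next(reader, None)
    if header is None:  # empty input: nothing to split
        return
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == rows_per_chunk:
            yield _to_csv(header, chunk)
            chunk = []
    if chunk:
        yield _to_csv(header, chunk)

# Usage with a real file (open with newline="" as the csv module expects):
# with open("FILE_NAME.csv", newline="", encoding="utf-8") as f:
#     for i, part in enumerate(split_csv(f, 50_000)):
#         with open(f"part_{i}.csv", "w", newline="", encoding="utf-8") as out:
#             out.write(part)
```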
The whole process is asynchronous: the response to the request may arrive immediately, while loading the file into memory can take a little longer. During the first 10 seconds, the progress may show no changes at all.
Parameters:
description - A free-form description that helps distinguish this task from others.
index - Name of the newly created index. Note that Elasticsearch index naming restrictions apply.
file - The file to upload (JSON Lines, CSV or Excel).
separator - Only needed for .csv files; defaults to a comma (,). Sets the separator used in the .csv file.
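Before creating an import, the index name can be checked locally against Elasticsearch's documented naming rules. Names must be lowercase and at most 255 bytes long. They must not contain \ / * ? " < > |, a space, a comma or #, must not start with -, _ or +, and must not be . or .. on their own. This sketch also rejects colons, because Elasticsearch deprecated them in index names (the health example shows version 6.8.6). It is a sketch of those rules, not the Toolkit's own validation, and index_name_problems is a hypothetical name.

```python
# Characters Elasticsearch does not accept in index names (colon is deprecated).
FORBIDDEN_CHARS = set('\\/*?"<>| ,#:')

def index_name_problems(name: str) -> list:
    """Return a list of rule violations; an empty list means the name looks valid."""
    if not name:
        return ["name is empty"]
    problems = []
    if name != name.lower():
        problems.append("must be lowercase")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append("forbidden characters: " + " ".join(repr(c) for c in bad))
    if name[0] in "-_+":
        problems.append("must not start with '-', '_' or '+'")
    if name in (".", ".."):
        problems.append("must not be '.' or '..'")
    if len(name.encode("utf-8")) > 255:
        problems.append("longer than 255 bytes")
    return problems

print(index_name_problems("en_articles"))
print(index_name_problems("En Articles"))
```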
Example:
In curl's -F option, the @ prefix tells curl to read the named file and send its contents as a file upload.
curl -H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
-F "description=Articles" \
-F "index=en_articles" \
-F "file=@FILE_NAME.csv" \
http://localhost:8000/api/v1/projects/11/dataset_imports/
API reference
The Toolkit API reference is available while the Toolkit is running: