`EN `_
`ET `_

.. _lang_det:

######################
Language Detector
######################

:ref:`Language Detector ` is a tool for detecting the languages of the documents in the chosen :ref:`indices `. It uses the `langdetect python module `_. This is useful for getting a quick overview of the languages present in your dataset and for picking out documents in a certain language for further work.

Creation
******************

.. _lang_det_creation_parameters:

Parameters
============

.. _param_description:

**description**: Name of the Language Detector application task. It is needed only to tell apart the Language Detector tasks in the project.

.. _param_indices:

**indices**: List of Elasticsearch :ref:`indices ` containing the documents to analyze.

NB! Indices should be formatted as a list of dicts, each with the key "name" and the index name as its value, e.g.:

.. code-block:: json

    [{"name": "my_dataset"}]

.. _param_fields:

**fields**: List of field names (as strings) containing the content to analyze.

.. _param_query:

**query**: The :ref:`query ` restricting the set of documents to analyze. In the API, the query should be formatted as a JSON string. In the GUI, :ref:`saved searches ` can be used. Empty by default, in which case all the documents in the chosen indices are analyzed.

GUI
====================

To create a new Language Detector task, navigate to **"Tools"** -> **"Language Detector"** and click on the button **"CREATE"** in the upper left corner of the page. A new window with the title "Apply Language Detector task" opens as a result. Fill in all the required fields and then click on the button "Create" in the bottom right corner of the window (:numref:`lang_det_create`). The new Language Detector task should now appear as a new row in the list of Language Detector tasks on the same page (if not, try refreshing the page).

.. _lang_det_create:
.. figure:: images/lang_det/lang_det_GUI.png

    *Language Detector creation window*

After the task has finished (status is "completed"), you can view the results in Search. The output of language analysis is stored in the `field ` ``_mlp.language_detected``.

API
===================

Endpoint for /api/v1/ : **/projects/{project_pk}/lang_index/**

Endpoint for /api/v2/ : **/projects/{project_pk}/lang_index/**

Example:

.. code-block:: bash

    curl -X POST "http://localhost:8000/api/v2/projects/1/lang_index/" \
    -H "accept: application/json" \
    -H "Content-Type: application/json" \
    -H "Authorization: Token 8229898dccf960714a9fa22662b214005aa2b049" \
    -d '{
        "indices": [{"name": "index_name"}],
        "description": "job_description",
        "field": "field_name_to_detect_on"
    }'

Response:

.. code-block:: json

    {
        "id": 6,
        "url": "http://localhost:8000/api/v2/projects/1/lang_index/6/",
        "author_username": "test_user",
        "indices": [
            {
                "id": 3949,
                "is_open": true,
                "url": "http://localhost:8000/api/v2/elastic/index/3949/",
                "name": "index_name",
                "description": "",
                "added_by": "test_user",
                "test": true,
                "source": "",
                "client": "",
                "domain": "",
                "created_at": "2021-07-27T13:56:46.118000+03:00"
            }
        ],
        "description": "job_description",
        "task": {
            "id": 163542,
            "status": "completed",
            "progress": 100.0,
            "step": "",
            "errors": "[]",
            "time_started": "2021-07-27T16:58:46.886043+03:00",
            "last_update": null,
            "time_completed": "2021-07-27T16:59:09.632845+03:00",
            "total": 0,
            "num_processed": 0
        },
        "query": "{\"query\": {\"match_all\": {}}}",
        "field": "field_name_to_detect_on"
    }
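
The request body can also be assembled programmatically, which helps with the two formatting rules described under Parameters: indices are a list of dicts keyed by "name", and the query is passed as a JSON string. The following is a minimal Python sketch and not part of the toolkit itself. The helper names ``build_lang_index_payload`` and ``task_is_completed`` are hypothetical, and the payload keys simply mirror the curl example and the response shown in this section.

.. code-block:: python

    import json


    def build_lang_index_payload(index_names, description, field, query=None):
        """Build a request body for a Language Detector task (hypothetical helper)."""
        payload = {
            # Indices must be a list of dicts, each with the key "name".
            "indices": [{"name": name} for name in index_names],
            "description": description,
            # Key name taken from the curl example in this section.
            "field": field,
        }
        if query is not None:
            # The API expects the query as a JSON string, not a nested object.
            payload["query"] = json.dumps(query)
        return payload


    def task_is_completed(response_body):
        """Return True if the task in a parsed response reports status "completed"."""
        return response_body.get("task", {}).get("status") == "completed"


    payload = build_lang_index_payload(
        ["index_name"],
        "job_description",
        "field_name_to_detect_on",
        query={"query": {"match_all": {}}},
    )
    print(json.dumps(payload, indent=4))

The printed body can be sent with any HTTP client to the ``lang_index`` endpoint. Once the task exists, ``task_is_completed`` can be applied to the parsed JSON of a later response to decide whether the results are ready to be viewed in Search.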