=========
Searching
=========

-------------
Simple Search
-------------

1. First go to the **Search** view and select a project and an index to view. Look at the **Searcher** window on the left. The default and simplest search option is **Simple search**, which looks for a match in all fields of all documents.

.. image:: /_static/simple-search.png
   :scale: 80%

2. To do a simple search, type in the search field (with the text **Type here to search**) and press **Search** to see the results. For more advanced searches, either switch to :ref:`Advanced Search` or click to open :ref:`Searcher Options`.

The search text will be highlighted in a random color, and the document count on the right will show how many matching documents were found (more information about query matches in :ref:`Change documents per page` and :ref:`Navigating through documents`).

.. figure:: /_static/simple-search-result2.png
   :scale: 60%

   *An example of simple search results with the word "kool". The results are highlighted in pink in this example, but could be in another color for another search. The simple search also looks for matches at the start of a word.*

|br|

------

Choosing Search Field
=====================

When using simple search, if similar data appears in several fields, all of those fields will be shown in the results.

.. figure:: /_static/simple-search-result2.png
   :scale: 60%

   *An example of finding similar results in both lemmatized text and regular text.*

To get results from a single field, you can use :ref:`Advanced search` and specify the exact search field. Alternatively, use **Toggle columns** to view only the fields you are interested in (see more about this in :ref:`Search view options`).

|br|

------

---------------
Advanced Search
---------------

Advanced search allows you to search in specific fields and to build more complex queries.

1. Pick **Advanced Search** in the Searcher.
2. Select a field to search in.

.. image:: /_static/advanced-search1.png
   :scale: 80%

*The drop-down menu for choosing the search field.*

.. image:: /_static/search-fact1.png
   :scale: 80%

3. Choose the **texta_facts** fact name field.

.. image:: /_static/search-fact2.png
   :scale: 80%

4. After clicking away, a search box for the field appears.

.. image:: /_static/search-fact3.png
   :scale: 80%

You can refine an advanced search with :ref:`Operators` and by choosing the :ref:`match type <Match Types>` and the :ref:`slop <Slop>` value.

The available :ref:`Operators` are:

* and
* or
* not

The available :ref:`Match Types` are:

* phrase prefix
* best fields
* phrase
* regex
* exact term
* fuzzy term
* fuzzy phrase

You can also add more fields, or more constraints on the same field, to narrow the search further. To remove any selected fields or fact values, press the **X** button next to the field name.

.. image:: /_static/advanced-search2.png

*An example of the options that pop up when selecting a text field.*

For text values, you can also enter several search words or phrases by putting each item on a new line.

.. image:: /_static/advanced-search-and.png
   :scale: 60%

*An example of searching for two words. Each word is on a new line.*

Additionally, you can use the :ref:`Slop` value, for example to find phrases that may have one or more words between their parts.

.. note::

   .. image:: /_static/search-constraints-field.png

   You can make the **Text to search** box larger or smaller by clicking and dragging its lower right corner (marked with an icon of diagonal stripes).

   .. image:: /_static/short-search.png
      :scale: 80%

   Expanding the search field:

   .. image:: /_static/long-search.png
      :scale: 80%

|br|

------

Operators
=========

The operators **and**, **or** and **not** can be used to specify a search. Click on the **Operator** field to open a drop-down menu.
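Behind the Searcher, the three operators roughly correspond to the clauses of an Elasticsearch ``bool`` query: **and** to ``must``, **or** to ``should`` and **not** to ``must_not``. The Python sketch below builds such query bodies as plain dictionaries. It is an illustration under assumptions, not TEXTA Toolkit's own code: the field name ``text`` and the helpers ``operator_query()`` and ``combine()`` are hypothetical, and the eye icon in the Searcher shows the query that is actually generated.

.. code-block:: python

   # A minimal sketch, NOT TEXTA Toolkit's actual code: how the and / or / not
   # operators could map onto an Elasticsearch bool query. The field name "text"
   # and the helpers operator_query() and combine() are hypothetical.


   def operator_query(field: str, words: list[str], operator: str) -> dict:
       """Build a bool query requiring all, any, or none of `words` in `field`."""
       clauses = [{"match": {field: word}} for word in words]
       if operator == "and":
           # every word must match
           return {"bool": {"must": clauses}}
       if operator == "or":
           # at least one word must match
           return {"bool": {"should": clauses, "minimum_should_match": 1}}
       if operator == "not":
           # none of the words may match
           return {"bool": {"must_not": clauses}}
       raise ValueError(f"unknown operator: {operator!r}")


   def combine(*constraints: dict) -> dict:
       """Join several search constraints with "and" (every constraint must hold)."""
       return {"bool": {"must": list(constraints)}}


   # "klass, but not kool": two constraints on the same field
   query = combine(
       operator_query("text", ["klass"], "and"),
       operator_query("text", ["kool"], "not"),
   )

A ``bool`` query that contains only ``must_not`` matches every document that does not contain the listed words, which is how a **not** constraint can stand on its own.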
In the drop-down menu, you can choose whether a matching document must contain:

* all the words in the search (**and**)
* at least one of the words in the search (**or**)
* none of the words in the search (**not**)

.. image:: /_static/advanced-search-op.png
   :scale: 80%

*Specifying the operator type for an advanced search.*

.. image:: /_static/advanced-search-and.png
   :scale: 40%

*Finding documents (sentences) that contain the words kool and klass. Both words must be present.*

.. image:: /_static/advanced-search-or.png
   :scale: 40%

*Finding documents (sentences) that contain the words kool or klass. At least one of the words must be present.*

.. image:: /_static/advanced-search-not.png
   :scale: 40%

*Finding documents (sentences) that don’t contain the words kool or klass. Neither word can occur in the document.*

.. image:: /_static/advanced-search-not-and.png
   :scale: 80%

To conduct a search where one word must be present and another must not be present:

* Select the field and the word that must be present, with the **and** or **or** operator.
* Next, select the same field again and add a new search constraint with the **not** operator, containing the word that should not be in the text.

For example, the query in the screenshot above finds documents (sentences) that contain the word klass, but not kool: one word must be present and the other cannot appear in the same document.

.. image:: /_static/advanced-search-not-and-res.png
   :scale: 40%

*The results for the query: documents that contain klass, but not kool.*

|br|

------

Match Types
===========

You can change the match type by clicking on the **Match** field. The different match types are described below.

Phrase prefix
-------------

**Phrase prefix** searches for any words that begin with the search term. For example, if we search for **klass**, we will find words like *klass, klassika, klassikaaslane, klassifikaator* etc. This is useful for finding the conjugated or declined forms of a word.
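Assuming the **Phrase prefix** match type is built on Elasticsearch's ``match_phrase_prefix`` query (an assumption; inspect the raw query with the eye icon to confirm), the sketch below shows what such a query body might look like, together with a toy in-memory prefix filter that illustrates which words a prefix such as **klass** or **mine** would catch. The field name ``text`` and both helper functions are hypothetical.

.. code-block:: python

   # A minimal sketch, assuming the Phrase prefix match type corresponds to
   # Elasticsearch's match_phrase_prefix query. The field name "text" and both
   # helper functions are hypothetical.


   def phrase_prefix_query(field: str, text: str) -> dict:
       """match_phrase_prefix: the words must appear in order, and the last
       word is treated as a prefix (klass -> klassika, klassis, ...)."""
       return {"match_phrase_prefix": {field: {"query": text}}}


   def prefix_matches(prefix: str, words: list[str]) -> list[str]:
       """Toy, in-memory version of prefix matching on single words."""
       prefix = prefix.lower()
       return [word for word in words if word.lower().startswith(prefix)]


   print(prefix_matches("klass", ["klass", "klassika", "klassis", "kool"]))
   # ['klass', 'klassika', 'klassis']
   print(prefix_matches("mine", ["mine", "minevik", "mineraalne", "läks"]))
   # ['mine', 'minevik', 'mineraalne']

The second call also shows why prefix searches can return unrelated words such as *minevik*, and why a changed stem like *läks* needs its own search term.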
For the Estonian example **klass**, we will also find inflected forms such as *klassis, klassi, klassile, klassiski* etc. For words whose stem changes, you can enter several forms and use the **or** operator, for example searching for different forms of **minema** (Estonian for *to go*) with forms like *läheb, läks, mine*.

.. image:: /_static/prefix-search.png
   :scale: 40%

*Searching for different forms of a verb using the phrase prefix match type.*

.. note::

   Searching with the phrase prefix match type can find a lot of irrelevant documents. For example, the phrase prefix search shown above could also find very different words like *mineraalne*, *minestus*, *minevik* etc. You can make the search more accurate by providing longer word forms like *minem*. However, for more exact searches the regex match type is recommended.

|br|

Best fields
------------

**Best fields** tries to find the documents where all the search words or search phrases are present and close together, but also finds partial matches based on individual words.

.. image:: /_static/advanced-search-best-fields.png
   :scale: 40%

|br|

Phrase
------

**Phrase** is used to search for an exact phrase or an exact word, for example when we want to find only the exact word **klass** and not its other forms.

.. image:: /_static/advanced-search-phrase.png
   :scale: 60%

*Searching for an exact phrase.*

.. image:: /_static/advanced-search-phrase.png
   :scale: 60%

*Searching for an exact word with the phrase match type.*

|br|

Regex
-----

The **Regex** match type can be used to search with regular expressions. Read more about the supported syntax on the *Regular expression syntax* page of the Elasticsearch reference documentation.

.. image:: /_static/advanced-search-regex-match.png
   :scale: 80%

* in ``lähe(b|ks)``, ``()`` marks a group in which ``|`` separates two options, so it finds *läheb* or *läheks*
* ``minem[a-z]*`` finds *minem* followed by any number of lowercase letters (``[a-z]`` is any lowercase letter and ``*`` repeats it), to find forms like *minemine*, *minemisest* etc.
* ``mine|#`` finds only the string *mine*: ``#`` matches nothing, so it adds no other options
* ``läks.*`` finds the string *läks* with anything after it

.. image:: /_static/advanced-search-regex-match2.png
   :scale: 40%

|br|

Exact term
----------

**Exact term** matches the entire content of one field, for example one entire document or sentence, including punctuation. There is an option to ignore case. It is mainly useful for searching keyword fields; in most cases it is better to use the :ref:`Phrase` match type.

|br|

Fuzzy term
----------

**Fuzzy term** can be used to find misspelled or similar words. You can control the results with the fuzziness number (the maximum number of single-letter edits, such as substitutions, insertions or deletions). The prefix length can also be adjusted (the number of letters at the start of the search term that must match exactly).

.. image:: /_static/advanced-search-fuzzy-term.png
   :scale: 40%

*Searching for words that are one letter off from the search term “kuod”.*

|br|

Fuzzy phrase
------------

**Fuzzy phrase** is similar to fuzzy term, but also allows you to find phrases. The fuzziness number and the prefix length work in the same way as for fuzzy term.

.. image:: /_static/advanced-search-fuzzy-phrase.png
   :scale: 40%

*Searching for phrases that are one letter off from the search term “kuod seo”.*

|br|

Slop
====

Perhaps you only know the start and end of a phrase, but can’t find it by searching for those two parts separately. Or maybe there is an interesting phrase that you would like to find all variations of. You can use the slop function for this.
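In Elasticsearch terms, the slop value most likely corresponds to the ``slop`` parameter of a ``match_phrase`` query, which limits how many word positions apart the phrase words may be. The sketch below assumes that mapping; the field name ``text``, the helper functions and the sample sentence are made up for illustration, and the in-order check ignores the reordering that Elasticsearch's slop also permits.

.. code-block:: python

   # A minimal sketch, assuming the slop value maps to the "slop" parameter of
   # Elasticsearch's match_phrase query. The field name "text" and the helper
   # functions are hypothetical.


   def phrase_query(field: str, text: str, slop: int = 0) -> dict:
       """match_phrase with slop: how far apart the phrase words may be."""
       return {"match_phrase": {field: {"query": text, "slop": slop}}}


   def within_slop(tokens: list[str], first: str, second: str, slop: int) -> bool:
       """Simplified, in-order check: True if `second` follows `first` with at
       most `slop` other words in between. Elasticsearch's slop also allows
       reordered words (at a higher cost), which this toy ignores."""
       for i, token in enumerate(tokens):
           if token != first:
               continue
           # positions i+1 .. i+1+slop leave at most `slop` words in between
           for j in range(i + 1, min(i + 2 + slop, len(tokens))):
               if tokens[j] == second:
                   return True
       return False


   tokens = "kool ja selle elupiirkond".split()  # made-up sample sentence
   print(within_slop(tokens, "kool", "elupiirkond", slop=3))  # True: 2 words between
   print(within_slop(tokens, "kool", "elupiirkond", slop=1))  # False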
Slop specifies how many words may occur between the search words.

.. image:: /_static/advanced-search-slop.png
   :scale: 40%

*A phrase search for "kool elupiirkond" with a slop of 3. This means up to three words can be between the two parts of the phrase.*

|br|

------

----------------
Searcher Options
----------------

Searcher options change how search results are displayed.

**The different options are:**

* changing the highlights on facts
* changing the highlights on searcher matches
* changing the highlights on hyperlinks
* only highlighting matching facts
* viewing the short version of documents (up to 15 words)

.. figure:: /_static/searcher-options.png

1. Click on **Searcher options** in the Searcher.
2. Change any of the options and press **Search** in the Searcher to update how the search results are displayed.

Another way to customize your display is to use the collapse arrows on the three Searcher components: **Search constraints**, **Saved queries** and **Aggregations**.

.. image:: /_static/searcher-search-constraints.png

*Click on the arrow pointing up (˄) to collapse the Search Constraints.*

.. image:: /_static/searcher-collapsed.png

*A collapsed Searcher, which can be opened again by pressing the arrows pointing down (˅).*

The searcher options are described in more detail below.

|br|

Highlighting options
====================

**Highlight facts** highlights any facts in the displayed columns, regardless of whether they match the search query.

.. image:: /_static/highlight-facts.png
   :scale: 40%

*Highlighting facts while searching.*

.. image:: /_static/highlight-facts2.png
   :scale: 40%

*Not highlighting facts during a search.*

**Highlight matching facts** only highlights the facts that match the current query. For this, the query must also search for fact names or values.

.. image:: /_static/highlight-match.png
   :scale: 40%

*Highlighting matching facts.*

**Highlight searcher matches** turns the highlighting of search matches on or off.

.. image:: /_static/no-searcher-matches.png
   :scale: 40%

*Not highlighting searcher matches.*

The **Highlight hyperlinks** option can be turned off to display hyperlinks as regular text, which is useful if your dataset is full of hyperlinks.

|br|

Short version
=============

Viewing the short version of documents is a good way to reduce loading times for long documents, even without using any other search features. You can choose how many words are shown before the cutoff, up to 15. Without any other search functions, it shows the start of the document. However, when used together with a search, it centers on the found match(es).

.. image:: /_static/short-version.png
   :scale: 40%

*An example of the Show short version option with an active search query.*

|br|

------

--------------
Search Queries
--------------

Saved search queries are a useful way to keep the queries you need to access regularly. They are also very useful for separating out a subset of interesting documents without creating a new index. You can later use them for processing, reindexing, adding/editing/deleting facts and more.

View and Save Queries
=====================

.. image:: /_static/save-view-queries.png

Below the search constraints are options to view and save search queries. You can view the search query in its raw form by clicking on the eye icon (|grey-eye|) shown in the screenshot above. This can be useful for debugging queries, but is only recommended for advanced users.

.. |grey-eye| image:: /_static/grey-eye.png

1. You can save the search query by clicking on the save icon (|grey-save|) shown in the screenshot above.

.. |grey-save| image:: /_static/grey-save.png

.. image:: /_static/save-query1.png
   :scale: 80%

2. You can either save a new query or overwrite an existing query.

.. image:: /_static/save-query2.png
   :scale: 80%

3. The description you enter for a new query will be shown in the **Saved Queries** list below. The description should be relatively short, informative and unique.
4. After writing the description, press **Save**.

.. image:: /_static/save-query3.png
   :scale: 80%

*The saved query description.*

|br|

Access a Saved Query
====================

To access a saved query, click on the desired query description in **Saved Queries**. This shows the saved query’s selected fields, operators and match types in the **Search Constraints** box; press **Search** to run the query.

.. image:: /_static/saved-queries.png
   :scale: 80%

1. An example of saved queries. We select the first one, with the description "töö koht".

.. image:: /_static/saved-queries2.png
   :scale: 80%

2. After clicking, we can see the search constraints. Press **Search** to view the documents matching this query.

|br|

Edit a Search Query
===================

Editing Search Query Content
----------------------------

1. To edit the content of a search query, click on the query.
2. Make the changes you want to the search constraints, then press the save button.

.. image:: /_static/overwrite-query.png
   :scale: 80%

3. When saving, pick the **Overwrite existing query** option.

.. image:: /_static/overwrite-query2.png
   :scale: 80%

4. Pick the query you wish to change, then press **Save**.

|br|

Editing Search Query Description
--------------------------------

.. |blue-pencil| image:: /_static/blue-pencil.png

.. image:: /_static/edit-delete-query1.png
   :scale: 80%

1. To edit the description of a search query, click on the pencil button (|blue-pencil|) in the Edit column for the query you wish to edit.

.. image:: /_static/edit-query.png
   :scale: 80%

2. Type in a new description for your query and press **Edit**.

.. image:: /_static/edit-query2.png
   :scale: 80%

*The result of the description edit.*

|br|

Delete a Search Query
=====================

There are checkboxes in front of the saved search query descriptions.

1. Clicking on the first box selects all queries, or you can select specific queries by clicking on the boxes in front of the queries you wish to delete.

.. image:: /_static/saved-queries4.png
   :scale: 80%

*The list of saved queries.*

.. image:: /_static/select-all-queries.png
   :scale: 80%

*Selecting all queries to delete.*

.. image:: /_static/select-queries.png
   :scale: 80%

*Selecting one query to delete.*

2. Press the Delete button (|grey-bin|) to delete the selected queries.

.. |grey-bin| image:: /_static/grey-bin.png

3. A window will appear asking you to confirm that you want to delete the selected search queries. Press **Delete** to confirm the deletion.

|br|

------

-----------------------
Searching with Metadata
-----------------------

To use metadata while searching, use advanced search and select texta_facts[**fact_name**] or texta_facts[**fact_value**] as the search field.

Depending on the chosen operator, the **fact_name** option finds documents that have all of the selected fact names, at least one of them, or none of them. It can be useful for finding missing items while adding metadata.

In the Bwrite corpus, **fact_name** can also be used to select either theses or journal articles by searching for metadata that is only associated with those genres. For example, search for **University** to access all theses, or for **Publication** to access all journal articles (and yearbooks, if they are defined in the metadata).

.. image:: /_static/advanced-search-fact-value.png
   :scale: 80%

In a **fact_name** search, you can add fact names to the search by using a drop-down menu and choosing the :ref:`operator <Operators>` for the search.

Filtering with **fact_value** has even more options: it returns only those documents that have a matching fact name as well as a matching fact value.

.. image:: /_static/advanced-search-fact-value2.png
   :scale: 80%

For example, by selecting the genre and pressing the plus sign, we can find journal articles from one publication.
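The sketch below shows how a fact name plus fact value constraint could be expressed as an Elasticsearch ``nested`` query. It assumes that ``texta_facts`` is a nested field whose entries store the fact name in ``fact`` and the value in ``str_val``; these field names, the helper ``fact_value_query()`` and the fact names ``Genre`` and ``Publication`` are assumptions for illustration. The ``nested`` wrapper is the important design choice: it makes both conditions apply to the same fact entry, so a document cannot match by having the right name in one fact and the right value in another.

.. code-block:: python

   # A minimal sketch, NOT TEXTA Toolkit's actual code. It assumes texta_facts is
   # a nested field whose entries keep the fact name in "fact" and the fact value
   # in "str_val"; check the raw query (eye icon) for the real field names.


   def fact_value_query(name: str, value: str) -> dict:
       """Match documents that have ONE fact entry with this name AND this value."""
       return {
           "nested": {
               "path": "texta_facts",
               # both conditions are checked against the same nested fact entry
               "query": {
                   "bool": {
                       "must": [
                           {"term": {"texta_facts.fact": name}},
                           {"term": {"texta_facts.str_val": value}},
                       ]
                   }
               },
           }
       }


   # Journal articles from one publication: two fact constraints joined with "and".
   # The fact names "Genre" and "Publication" are illustrative placeholders.
   query = {
       "bool": {
           "must": [
               fact_value_query("Genre", "Journal articles"),
               fact_value_query("Publication", "Ariadne Lõng"),
           ]
       }
   }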
In the screenshot above, we are looking for journal articles from the publication *Ariadne Lõng*. This type of search can also be used for other genres and metadata, for example to find all the MA theses at one university that were published between 2010 and 2019.

For **fact values**, you can also use the :ref:`Operators` **and**, **or** and **not** to pick and combine different searches.

.. note::

   Constraints on different fields are combined with **and** by default.

.. image:: /_static/advanced-search-fact-value3.png
   :scale: 80%

For example, the screenshot above shows a query for documents in the *Journal articles* or *BA thesis* genres that have the publication year *2011*. To remove any selected fields or fact values, press the **X** button.

|br|

------

------------
Aggregations
------------

Search queries can be used to aggregate metadata.

.. note::

   Aggregation counts can be inaccurate for very large datasets due to Elasticsearch limitations, but they still give a rough overview of the data in those cases.

In this aggregation example, a query is used for finding all MA and BA theses in a smaller Estonian subcorpus. Our research questions are: which **disciplines** and **publication years** are the most common, and what is the **proportion of MA and BA** documents?

1. Make a search query. This is the example query:

.. image:: /_static/aggregations1.png
   :scale: 80%

2. Next, go to **Aggregations**, select the field **texta_facts** and aggregate by frequent items with a minimum number of 1 document and an aggregation size of 1.

.. image:: /_static/aggregation1.png
   :scale: 80%

3. Make sure that the option to apply the current search to aggregations is active, then press **Aggregate**.

.. image:: /_static/aggregations3.png
   :scale: 80%

In the aggregation results, you can access metadata about the documents matching the query by clicking on any of the arrows pointing right (**>**).
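The expandable fact names and values resemble a two-level Elasticsearch ``terms`` aggregation inside a ``nested`` aggregation. The sketch below builds such a request body, with the minimum document count and the aggregation size from step 2 as parameters. It assumes the same ``texta_facts`` layout as before (``fact`` for names, ``str_val`` for values), and the function name is hypothetical; it is not the request TEXTA Toolkit actually sends.

.. code-block:: python

   from typing import Optional

   # A minimal sketch, NOT TEXTA Toolkit's actual code: a frequent-items
   # aggregation over facts, assuming a nested texta_facts field with
   # "fact" (name) and "str_val" (value) entries.


   def frequent_facts_request(size: int = 10, min_doc_count: int = 1,
                              query: Optional[dict] = None) -> dict:
       """Top fact names, and under each name its top values (the > arrows)."""
       values_agg = {
           "terms": {"field": "texta_facts.str_val", "size": size,
                     "min_doc_count": min_doc_count},
           # buckets inside a nested aggregation count fact entries;
           # reverse_nested counts the parent documents instead
           "aggs": {"documents": {"reverse_nested": {}}},
       }
       body = {
           "size": 0,  # only the aggregation is needed, not the matching hits
           "aggs": {
               "facts": {
                   "nested": {"path": "texta_facts"},
                   "aggs": {
                       "fact_names": {
                           "terms": {"field": "texta_facts.fact", "size": size,
                                     "min_doc_count": min_doc_count},
                           "aggs": {"fact_values": values_agg},
                       }
                   },
               }
           },
       }
       if query is not None:
           # "apply the current search to aggregations": aggregate matching docs only
           body["query"] = query
       return body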
We are most interested in Discipline, Genre and Publication year, so we can safely ignore the University metadata.

.. image:: /_static/aggregations2.png
   :scale: 70%

In this example, we can see that there are many more BA thesis documents than MA thesis documents. The overwhelming majority of the documents are in the social sciences discipline, and 2011 is the most common publication year in this dataset.

|br|

Aggregation options
===================

You can view the aggregation in raw form using the eye icon (|grey-eye|; for debugging by advanced users) and add more fields to aggregate using the plus button. Different types of fields have different aggregation options:

* **texta_facts** only has the frequent items option.
* Fields containing **text** have options for frequent items, significant items, significant words and string stats.
* **Numerical fields** have options for extended stats and percentiles.
* There are also aggregation options for date fields (but no date fields occur in this corpus at this point).

.. image:: /_static/agg-options.png
   :scale: 80%

|br|

Aggregation overview
--------------------

* **frequent items** - displays the most frequent items in the field.

  .. image:: /_static/agg-freq-items.png
     :scale: 40%

  *An example of frequent items in the lemmas field.*

* **significant items** - for keyword fields, finds items that are unusually frequent in the documents matching the search query, compared with the rest of the corpus.

  .. image:: /_static/agg-sig-items.png
     :scale: 40%

  *An example of significant items in the lemmas field.*

* **significant words** - for longer text fields, finds words that are unusually frequent in the documents matching the search query, compared with the rest of the corpus.

  .. image:: /_static/agg-sig-words.png
     :scale: 40%

  *An example of significant words in the lemmas field.*

* **string stats** - this feature doesn’t work in this version of Elasticsearch.
* **extended stats** - statistics about numerical fields (for example, the sum, average, minimum and maximum of the selected field).
* **percentiles** - shows percentiles for numerical fields.

For **facts**, you can also view the results of an aggregation in table form or as a plot by pressing the different icons next to the text **Aggregation results**.

.. image:: /_static/aggregations4.png
   :scale: 40%

You can also combine the results of different fields by pressing the plus (**+**) button.

.. note::

   For a large number of long documents, the aggregation may fail or load very slowly. Try out potential aggregations on smaller subsets or on shorter documents first.

.. image:: /_static/agg-combo.png
   :scale: 40%

*An example of combining lemmas/frequent items with facts/frequent items. The results show the most frequent lemmas and the fact values associated with those documents/lemmas.*

Aggregation parameters
-----------------------

* **Aggregation size** controls how many different terms or words are in the output (10-500).
* **Apply current search to aggregations** lets you create an aggregation with or without the current search constraints.
* **Only show saved search aggregations** lets you aggregate saved queries together (for example, two different saved queries) while ignoring the current search.

.. image:: /_static/agg-saved-queries.png
   :scale: 40%

*An example of two saved queries used to find significant lemmas.*

.. |br| raw:: html

   <br />
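The aggregation types listed above roughly correspond to standard Elasticsearch aggregations: ``terms`` (frequent items), ``significant_terms`` (significant items), ``significant_text`` (significant words), ``extended_stats`` and ``percentiles``. The sketch below assumes that mapping and shows how **Aggregation size** and **Apply current search to aggregations** could enter the request body. The mapping, the field names ``lemmas`` and ``year`` and the function ``aggregation_request()`` are hypothetical; string stats is left out because it does not work in this version of Elasticsearch.

.. code-block:: python

   from typing import Optional

   # A hypothetical mapping from the aggregation names in the Searcher to
   # Elasticsearch aggregation types; an assumption, not TEXTA Toolkit's code.
   AGG_TYPES = {
       "frequent items": "terms",
       "significant items": "significant_terms",
       "significant words": "significant_text",
       "extended stats": "extended_stats",
       "percentiles": "percentiles",
   }
   # Aggregation types that accept a "size" (number of buckets to return)
   SIZED_TYPES = {"terms", "significant_terms", "significant_text"}


   def aggregation_request(field: str, ui_type: str, size: int = 10,
                           query: Optional[dict] = None) -> dict:
       """Build a request body for one aggregation on one field."""
       es_type = AGG_TYPES[ui_type]
       params: dict = {"field": field}
       if es_type in SIZED_TYPES:
           params["size"] = size
       body: dict = {"size": 0, "aggs": {"result": {es_type: params}}}
       if query is not None:
           # "Apply current search to aggregations"; significant_* aggregations
           # compare the documents matching this query with the whole index
           body["query"] = query
       return body

In Elasticsearch, a ``terms`` aggregation on an analyzed text field usually needs a keyword sub-field or enabled ``fielddata``, so the real field names may differ from the ones shown in the Searcher.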