Searching



Searcher Options

Searcher options change the display of search results.

The different options are:

  • changing the highlights on facts

  • changing highlights on searcher matches

  • changing highlights on hyperlinks

  • only highlighting matching facts

  • viewing the short version of documents (up to 15 words)

_images/searcher-options.png
  1. Click on the Searcher options in Searcher.

  2. Change any of the options and press Search in the Searcher to update the search results display.

Another way to customize your display is to use the collapse arrows on the three Searcher components: Search constraints, Saved queries and Aggregations.

_images/searcher-search-constraints.png

Click on the arrow pointing up (˄) to collapse the Search Constraints.

_images/searcher-collapsed.png

A collapsed Searcher, which can be opened up again by pressing on the arrows pointing down (˅).

More on the searcher options below:


Highlighting options

Highlight facts highlights any facts in the displayed columns, regardless of whether they match the search query.

_images/highlight-facts.png

Highlight facts while searching.

_images/highlight-facts2.png

Not highlighting facts during a search.

Highlight matching facts highlights only the facts that match the current query. For this to have an effect, the query must also search for fact names or values.

_images/highlight-match.png

Highlighting matching facts.

Highlight searcher matches turns the highlighting of search matches on or off.

_images/no-searcher-matches.png

Not highlighting searcher matches.

The Highlight hyperlinks option controls whether hyperlinks are highlighted. Turning it off displays hyperlinks as regular text, which is useful if you have a dataset full of hyperlinks.


Short version

Viewing the short version of documents is a good way to reduce loading times for long documents, even without using any other search features.

You can choose how many words are shown before the cutoff, up to 15.

When no search is active, the short version shows the start of the document. When a search is active, it is centered on the found match(es) instead.

_images/short-version.png

An example of Show short version option with an active search query.
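The behaviour described above can be pictured with a minimal Python sketch. It is an illustration only, not the toolkit's implementation: the function name `short_version`, its parameters and the "..." markers are all hypothetical, and a match is approximated as the first word containing the query string.

```python
def short_version(text, max_words=15, query=None):
    """Return an excerpt of `text` that is at most `max_words` words long.

    Without a query, the excerpt starts at the beginning of the document.
    With a query, the excerpt is centred on the first word containing it.
    """
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words)

    start = 0
    if query:
        needle = query.lower()
        hits = [i for i, word in enumerate(words) if needle in word.lower()]
        if hits:
            # Centre the window on the first match, clamped to the text bounds.
            start = max(0, min(hits[0] - max_words // 2, len(words) - max_words))

    excerpt = words[start:start + max_words]
    prefix = "... " if start > 0 else ""
    suffix = " ..." if start + max_words < len(words) else ""
    return prefix + " ".join(excerpt) + suffix


document = " ".join(f"w{i}" for i in range(40))
print(short_version(document, max_words=5))
print(short_version(document, max_words=5, query="w20"))
```

The first call shows the start of the document, the second shows a five-word window around the match.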



Search Queries

Saved queries are a useful way to keep search queries you need to access regularly.

They are also very useful for separating out a subset of interesting documents without creating a new index. You can later use them for processing, reindexing, adding/editing/deleting facts and more.

View and Save Queries

_images/save-view-queries.png

Below the search constraints are options to view and save search queries.

You can view the search query in its raw form by clicking on the eye icon shown in the screenshot above. This can be useful for debugging queries, but is only recommended for advanced users.

  1. You can save the search query by clicking on the save icon shown in the screenshot above.

_images/save-query1.png
  2. You can either save a new query or overwrite an existing query.

_images/save-query2.png
  3. The description you enter for a new query will be shown in the saved queries below. The description of a new query should be relatively short, informative and unique.

  4. After writing the description, press Save.

_images/save-query3.png

The saved query description.


Access a Saved Query

To access a saved query, click on the desired query description in Saved Queries.

This loads the saved query’s selected fields, operators and match types into the Search Constraints box. Press Search to run the query.

_images/saved-queries.png
  1. Example of saved queries. We select the first one with the description “töö koht”.

_images/saved-queries2.png
  2. After clicking, we can see the search constraints. Press Search to view the documents matching this query.


Edit a Search Query

Editing Search Query Content

  1. To edit the content of a search query, click on the query.

  2. Make any changes you wish to the search constraints, then press the save button.

_images/overwrite-query.png
  3. While saving, pick the Overwrite existing query option.

_images/overwrite-query2.png
  4. Pick the query you wish to change, then press Save.


Editing Search Query Description

_images/edit-delete-query1.png
  1. To edit the description of a search query, click on the pencil button in the Edit column for the query you wish to edit.

_images/edit-query.png
  2. Type in a new description for your query and press Edit.

_images/edit-query2.png

The result of the description edit.


Delete a Search Query

There are small boxes in front of the saved search query descriptions.

  1. Clicking on the first box will select all queries, or you can select specific queries by clicking on the boxes in front of the queries you wish to delete.

_images/saved-queries4.png

The list of saved queries.

_images/select-all-queries.png

Selecting all queries to delete.

_images/select-queries.png

Selecting one query to delete.

  2. Press the Delete button (the bin icon) to delete the selected queries.

  3. A window will appear asking you to confirm the deletion and showing the number of selected search queries. Press Delete to confirm.



Searching with Metadata

To use metadata while searching, open advanced search and select texta_facts[fact_name] or texta_facts[fact_value] as the search field.

The fact_name option lets you search for documents that have all of the selected facts, have at least one of them, or don’t have them. It can be useful for finding missing items while adding metadata.

In the Bwrite corpus, fact_name can also be used to select either theses or journal articles by searching for metadata that is only associated with those genres. For example, University to access all theses; or Publication to access all Journal Articles (and Yearbooks, if they are defined in the metadata).

_images/advanced-search-fact-value.png

In fact_name search you can add fact names to the search by using a drop-down menu and choosing the Operator for the search.
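As a rough mental model of the three fact_name operators, the following Python sketch filters documents by the fact names attached to them. It only illustrates the assumed semantics (and = all selected names present, or = at least one present, not = none present); the function name and the sample documents are hypothetical and are not part of the toolkit.

```python
def matches_fact_names(doc_fact_names, selected, operator="and"):
    """Check a document's fact names against the selected fact names."""
    doc = set(doc_fact_names)
    wanted = set(selected)
    if operator == "and":
        return wanted <= doc            # every selected name is present
    if operator == "or":
        return bool(wanted & doc)       # at least one selected name is present
    if operator == "not":
        return not (wanted & doc)       # none of the selected names is present
    raise ValueError(f"unknown operator: {operator!r}")


# Hypothetical documents, described only by the names of their facts.
documents = {
    "thesis_1": ["Genre", "University", "Publication year"],
    "article_1": ["Genre", "Publication", "Publication year"],
    "untagged_1": [],
}

theses = [name for name, facts in documents.items()
          if matches_fact_names(facts, ["University"], "and")]
missing_genre = [name for name, facts in documents.items()
                 if matches_fact_names(facts, ["Genre"], "not")]
print(theses)         # documents that carry the University fact
print(missing_genre)  # documents that still lack Genre metadata
```

The second filter corresponds to the use case of finding items that are still missing metadata.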

Filtering by fact_value has even more options: it returns only those documents that have both the matching fact name and a matching fact value.

_images/advanced-search-fact-value2.png

For example, by selecting the genre and pressing the plus sign, we can find journal articles from one publication. In the screenshot above, we are looking for journal articles from the publication Ariadne Lõng.

This type of search can also be used for different genres and metadata, for example finding all the MA theses at one University that were published during 2010-2019.

For fact values, you can also use the Operators and, or and not to pick and combine different searches.

Note

When a search combines different fields, the fields are joined with and by default.

_images/advanced-search-fact-value3.png

For example, the screenshot above is a query for documents in the Journal articles or BA thesis genres that have the publication year 2011.
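For readers who inspect the raw query, the sketch below shows how a combination like this one could be written in Elasticsearch's query DSL, where `bool` clauses express and (`must`), or (`should`) and not (`must_not`). It is a sketch under stated assumptions, not the query the toolkit actually generates: treating texta_facts as a nested field, the sub-field names `fact` and `str_val`, the helper names and the exact fact values are all assumptions made for illustration.

```python
import json

FACTS_FIELD = "texta_facts"  # field name as it appears in the search field menu


def fact_value_clause(fact_name, fact_value):
    """One clause: a single fact with this name and this value.

    ASSUMPTION: the sub-field names "fact" and "str_val" are hypothetical.
    """
    return {"nested": {"path": FACTS_FIELD, "query": {"bool": {"must": [
        {"term": {f"{FACTS_FIELD}.fact": fact_name}},
        {"term": {f"{FACTS_FIELD}.str_val": fact_value}},
    ]}}}}


def all_of(clauses):
    """Operator "and": every clause must match."""
    return {"bool": {"must": list(clauses)}}


def any_of(clauses):
    """Operator "or": at least one clause must match."""
    return {"bool": {"should": list(clauses), "minimum_should_match": 1}}


def none_of(clauses):
    """Operator "not": no clause may match."""
    return {"bool": {"must_not": list(clauses)}}


# (Genre is "Journal article" or "BA thesis") and (Publication year is 2011).
query = {"query": all_of([
    any_of([
        fact_value_clause("Genre", "Journal article"),
        fact_value_clause("Genre", "BA thesis"),
    ]),
    fact_value_clause("Publication year", "2011"),
])}
print(json.dumps(query, indent=2))
```

Keeping the name and value checks inside one clause reflects the requirement that the same fact must carry both the matching name and the matching value.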

If you want to get rid of any selected fields or fact values, press the X button.



Aggregations

Search queries can be used to aggregate metadata.

Note

Aggregation results can be inaccurate for very large datasets due to Elasticsearch limitations, but they still give you a rough overview of the data in those cases.

In this example of aggregations, a query will be used for finding all MA and BA theses in a smaller Estonian subcorpus.

Our research questions are to find out the most popular disciplines and publication years, as well as the proportion of MA and BA documents.

  1. Make a search query. This is the example query:

_images/aggregations1.png
  2. Next, go to Aggregations, select the field texta_facts and aggregate by frequent items with a minimum number of 1 document and aggregation size of 1.

_images/aggregation1.png
  3. Check that the option to apply the current search to aggregations is active, then press Aggregate.

_images/aggregations3.png

In aggregations you can access metadata about the documents matching the query by clicking on any of the arrows pointing right (>).

We are most interested in Discipline, Genre and Publication year, so we can safely ignore the University metadata.

_images/aggregations2.png

In this example, we can see that there are many more BA thesis documents than MA thesis documents.

The overwhelming majority of the matching documents are in the social sciences discipline, and 2011 is the most common publication year in this dataset.
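What a frequent items aggregation over facts computes can be sketched in plain Python. This is a simplified local model, not the toolkit's code: the function name and the sample documents are hypothetical, and the parameter names `size` and `min_doc_count` are borrowed from Elasticsearch's terms aggregation on the assumption that the aggregation size and minimum document count settings behave similarly.

```python
from collections import Counter


def frequent_items(docs, fact_name, size=10, min_doc_count=1):
    """Rank the values of one fact by the number of documents they occur in.

    `docs` is a list of documents, each a list of (fact name, fact value) pairs.
    """
    counts = Counter()
    for facts in docs:
        # Count each value once per document, so totals are document counts.
        counts.update({value for name, value in facts if name == fact_name})
    # Sort by descending count, then alphabetically to keep ties stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(value, n) for value, n in ranked if n >= min_doc_count][:size]


# Hypothetical documents matching the example query.
docs = [
    [("Genre", "BA thesis"), ("Publication year", "2011")],
    [("Genre", "BA thesis"), ("Publication year", "2011")],
    [("Genre", "MA thesis"), ("Publication year", "2012")],
    [("Genre", "BA thesis"), ("Publication year", "2013")],
]
print(frequent_items(docs, "Genre"))
print(frequent_items(docs, "Publication year", min_doc_count=2))
```

Raising the minimum document count drops rare values, while the size limits how many of the top values are returned.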


Aggregation options

You can view the aggregation in raw form using the eye icon (useful for debugging, recommended for advanced users) and add more fields to aggregate using the plus button.

Different types of fields have different aggregation options:

  • The texta_facts field only has the option of frequent items.

  • Any fields containing text have options for frequent items, significant items, significant words and string stats.

  • Any numerical fields have options for Extended stats and percentiles.

  • There are also aggregation options for date fields (but no date fields occur in this corpus at this point).

_images/agg-options.png


Aggregation overview

  • frequent items - displays the items that occur most often in the selected field

_images/agg-freq-items.png

An example of frequent items in the lemmas field.

  • significant items - for keyword fields, finds words that are unusually frequent in the documents matching the search query compared with the rest of the corpus.

_images/agg-sig-items.png

An example of significant items in the lemmas field.

  • significant words - for longer text fields, finds words that are unusually frequent in the documents matching the search query compared with the rest of the corpus.

_images/agg-sig-words.png

An example of significant words in the lemmas field.

  • string stats - this feature doesn’t work in this version of Elasticsearch

  • extended stats - statistics about numerical fields (for example, sum, average, minimum and maximum of selected field)

  • percentiles - shows percentiles for numerical fields.
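To make the difference between frequent and significant items concrete, the sketch below scores terms by comparing their share in the matching documents with their share in the whole corpus. The formula (absolute change multiplied by relative change) follows the idea of the JLH heuristic described in the Elasticsearch documentation for significant terms; whether the toolkit uses this exact scoring is an assumption, and the function name and counts are made up for illustration.

```python
def significance_scores(subset_counts, subset_size, corpus_counts, corpus_size):
    """Rank terms that are over-represented in the subset versus the corpus.

    Counts are numbers of documents containing the term; the corpus counts
    include the subset documents.
    """
    scores = {}
    for term, subset_count in subset_counts.items():
        corpus_count = corpus_counts.get(term, 0)
        if corpus_count == 0:
            continue  # no background information for this term
        subset_share = subset_count / subset_size
        corpus_share = corpus_count / corpus_size
        if subset_share <= corpus_share:
            continue  # not more common in the subset than in the corpus
        # Absolute change multiplied by relative change.
        scores[term] = (subset_share - corpus_share) * (subset_share / corpus_share)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical counts: 10 matching documents in a corpus of 100.
subset_counts = {"the": 10, "thesis": 8, "method": 3}
corpus_counts = {"the": 100, "thesis": 10, "method": 20}
for term, score in significance_scores(subset_counts, 10, corpus_counts, 100):
    print(term, round(score, 2))
```

A word that appears in every document, such as "the" here, would top a frequent items list but is dropped from the significant list because it is equally common in the rest of the corpus.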

For facts, you can also view the results of an aggregation as a table or as a plot by pressing the icons next to the text Aggregation results.

_images/aggregations4.png

You can also combine the results of different fields by pressing on the plus (+) button.

Note

With a very large number of long documents, the aggregation may fail or load very slowly. Try out potential aggregations on smaller subsets or on shorter documents first.

_images/agg-combo.png

An example of combining lemmas/frequent items with facts/frequent items. The results show the most frequent lemmas and the fact values associated with the documents containing those lemmas.

Aggregation parameters

  • Aggregation size controls how many different terms or words are in the output (10-500).

  • Apply current search to aggregations allows you to create an aggregation with or without the search constraints.

  • Only show saved search aggregations allows you to aggregate together two different saved queries while ignoring the current search.

_images/agg-saved-queries.png

An example of two saved queries used to find significant lemmas.