corpustools - Managing, Querying and Analyzing Tokenized Text
Provides text analysis in R, focusing on the use of a tokenized text format. In this format, the positions of tokens are maintained, and each token can be annotated (e.g., part-of-speech tags, dependency relations). Prominent features include advanced Lucene-like querying for specific tokens or contexts (e.g., documents, sentences), similarity statistics for words and documents, exporting to DTM for compatibility with many text analysis packages, and the possibility to reconstruct original text from tokens to facilitate interpretation.
cpp
7.25 score 32 stars 1 dependents 184 scripts 535 downloads
RNewsflow - Tools for Comparing Text Messages Across Time and Media
A collection of tools for measuring the similarity of text messages and tracing the flow of messages over time and across media.
cpp
6.93 score 38 stars 2 dependents 37 scripts 425 downloads
tokenbrowser - Create Full Text Browsers from Annotated Token Lists
Create browsers for reading full texts from a token list format. Information obtained from text analyses (e.g., topic modeling, word scaling) can be used to annotate the texts.
5.10 score 7 stars 4 dependents 15 scripts 450 downloads
rsyntax - Extract Semantic Relations from Text by Querying and Reshaping Syntax
Various functions for querying and reshaping dependency trees, such as those created with the 'spacyr' or 'udpipe' packages. This enables the automatic extraction of useful semantic relations from texts, such as quotes (who said what) and clauses (who did what). The method was proposed in Van Atteveldt et al. (2017) <doi:10.1017/pan.2016.12>.
3.05 score 3 dependents 25 scripts 464 downloads