corpustools - Managing, Querying and Analyzing Tokenized Text
Provides text analysis in R, focusing on the use of a tokenized text format. In this format, the positions of tokens are maintained, and each token can be annotated (e.g., part-of-speech tags, dependency relations). Prominent features include advanced Lucene-like querying for specific tokens or contexts (e.g., documents, sentences), similarity statistics for words and documents, exporting to DTM for compatibility with many text analysis packages, and the ability to reconstruct the original text from tokens to facilitate interpretation.
Last updated 5 months ago
cpp
7.45 score 30 stars 1 dependents 174 scripts 1.8k downloads
RNewsflow - Tools for Comparing Text Messages Across Time and Media
A collection of tools for measuring the similarity of text messages and tracing the flow of messages over time and across media.
Last updated 11 months ago
cpp
7.02 score 38 stars 2 dependents 31 scripts 1.5k downloads
tokenbrowser - Create Full Text Browsers from Annotated Token Lists
Create browsers for reading full texts from a token list format. Information obtained from text analyses (e.g., topic modeling, word scaling) can be used to annotate the texts.
Last updated 4 years ago
cpp
5.39 score 7 stars 5 dependents 13 scripts 1.8k downloads
rsyntax - Extract Semantic Relations from Text by Querying and Reshaping Syntax
Various functions for querying and reshaping dependency trees, such as those created with the 'spacyr' or 'udpipe' packages. This enables the automatic extraction of useful semantic relations from texts, such as quotes (who said what) and clauses (who did what). Method proposed in Van Atteveldt et al. (2017) <doi:10.1017/pan.2016.12>.
Last updated 3 years ago
3.11 score 4 dependents 18 scripts 1.2k downloads