Package: corpustools
Version: 0.5.2
Date: 2025-07-07
Title: Managing, Querying and Analyzing Tokenized Text
Description: Provides text analysis in R, focusing on the use of a tokenized
    text format. In this format, the positions of tokens are maintained, and
    each token can be annotated (e.g., part-of-speech tags, dependency
    relations). Prominent features include advanced Lucene-like querying for
    specific tokens or contexts (e.g., documents, sentences), similarity
    statistics for words and documents, exporting to DTM for compatibility
    with many text analysis packages, and the possibility to reconstruct
    original text from tokens to facilitate interpretation.
Authors@R: c(person(given = "Kasper", family = "Welbers",
                    role = c("aut", "cre"),
                    email = "kasperwelbers@gmail.com"),
             person(given = "Wouter", family = "van Atteveldt",
                    role = "aut"))
Maintainer: Kasper Welbers <kasperwelbers@gmail.com>
Depends: R (>= 3.5.0)
Imports: methods, wordcloud (>= 2.5), stringi, Rcpp (>= 0.12.12), R6,
    udpipe (>= 0.8.3), digest, data.table (>= 1.10.4), quanteda (>= 1.5.1),
    igraph, tokenbrowser (>= 0.1.5), RNewsflow (>= 1.2.1), Matrix (>= 1.2),
    parallel, pbapply (>= 1.4), rsyntax (>= 0.1.1)
Suggests: testthat, tm (>= 0.6), topicmodels, knitr, rmarkdown
LinkingTo: Rcpp, RcppProgress
LazyData: true
Encoding: UTF-8
License: GPL-3
URL: https://github.com/kasperwelbers/corpustools
RoxygenNote: 7.3.2
VignetteBuilder: knitr
Config/pak/sysreqs: libglpk-dev libicu-dev libpng-dev libxml2-dev
Repository: https://kasperwelbers.r-universe.dev
Date/Publication: 2025-07-10 09:09:04 UTC
RemoteUrl: https://github.com/kasperwelbers/corpustools
RemoteRef: HEAD
RemoteSha: cf98223c175e39b65e15a50e65675f8407ffc452
NeedsCompilation: yes
Packaged: 2026-06-09 05:58:53 UTC; root
Author: Kasper Welbers [aut, cre], Wouter van Atteveldt [aut]