Choose and add multitoken strings based on multitoken categories | add_multitoken_label |
Helper function for aggregate_rsyntax | agg_label |
Aggregate the tokens data | agg_tcorpus |
Aggregate rsyntax annotations | aggregate_rsyntax |
Force an object to be a tCorpus class | as.tcorpus |
Force an object to be a tCorpus class | as.tcorpus.default |
Force an object to be a tCorpus class | as.tcorpus.tCorpus |
Extract the backbone of a network | backbone_filter |
View hits in a browser | browse_hits |
Create and view a full text browser | browse_texts |
Vectorized computation of chi^2 statistic for a 2x2 crosstab containing the values [a, b] [c, d] | calc_chi2 |
Compare tCorpus vocabulary to that of another (reference) tCorpus | compare_corpus |
Calculate the similarity of documents | compare_documents |
Compare vocabulary of a subset of a tCorpus to the rest of the tCorpus | compare_subset |
coreNLP example sentences | corenlp_tokens |
Count results of search hits, or of a given feature in tokens | count_tcorpus |
Create a tCorpus | create_tcorpus create_tcorpus.character create_tcorpus.corpus create_tcorpus.data.frame create_tcorpus.factor |
Support function for subset method | docfreq_filter |
Compare two document term matrices | dtm_compare |
Plot a word cloud from a dtm | dtm_wordcloud |
Create an ego network | ego_semnet |
Export span annotations | export_span_annotations |
Get common nearby features given a query or query hits | feature_associations |
Feature statistics | feature_stats |
Fold rsyntax annotations | fold_rsyntax |
Support function for subset method | freq_filter |
Create a document term matrix | get_dfm get_dtm |
Compute global feature positions | get_global_i |
Get keyword-in-context (KWIC) strings | get_kwic |
Get a character vector of stopwords | get_stopwords |
Laplace (i.e. add constant) smoothing | laplace |
Convert a quanteda dictionary to a long data.table format | melt_quanteda_dict |
Merge tCorpus objects | merge_tcorpora |
Visualize a semnet network | plot_semnet |
Plot a wordcloud with words ordered and coloured according to a dimension (x) | plot_words |
S3 plot for contextHits class | plot.contextHits |
Visualize feature associations | plot.featureAssociations |
S3 plot for featureHits class | plot.featureHits |
Visualize vocabularyComparison | plot.vocabularyComparison |
Preprocess tokens in a character vector | preprocess_tokens |
S3 print for contextHits class | print.contextHits |
S3 print for featureHits class | print.featureHits |
S3 print for tCorpus class | print.tCorpus |
Refresh a tCorpus object using the current version of corpustools | refresh_tcorpus |
Check if package with given version exists | require_package |
Search for documents or sentences using Boolean queries | search_contexts |
Dictionary lookup | search_dictionary |
Find tokens using a Lucene-like search query | search_features |
Create a semantic network based on the co-occurrence of tokens in documents | semnet |
Create a semantic network based on the co-occurrence of tokens in token windows | semnet_window |
Set some default network attributes for pretty plotting | set_network_attributes |
Simple Good Turing smoothing | sgt |
Show the names of udpipe models | show_udpipe_models |
State of the Union addresses | sotu_texts |
Basic stopword lists | stopwords_list |
Subset tCorpus token data using a query | subset_query |
S3 subset for tCorpus class | subset.tCorpus |
S3 summary for contextHits class | summary.contextHits |
S3 summary for featureHits class | summary.featureHits |
Summary of a tCorpus object | summary.tCorpus |
Visualize a dependency tree | tc_plot_tree |
A tCorpus with a small sample of sotu paragraphs parsed with udpipe | tc_sotu_udpipe |
tCorpus: a corpus class for tokenized texts | tCorpus tcorpus |
Corpus comparison | tCorpus_compare |
Creating a tCorpus | tCorpus_create |
Methods and functions for viewing, modifying and subsetting tCorpus data | tCorpus_data |
Document similarity | tCorpus_docsim |
Preprocessing, subsetting and analyzing features | tCorpus_features |
Modify tCorpus by reference | tCorpus_modify_by_reference |
Use Boolean queries to analyze the tCorpus | tCorpus_querying |
Feature co-occurrence based semantic network analysis | tCorpus_semnet |
Topic modeling | tCorpus_topmod |
Annotate tokens based on rsyntax queries | annotate_rsyntax tCorpus$annotate_rsyntax |
Dictionary lookup | code_dictionary tCorpus$code_dictionary |
Code features in a tCorpus based on a search string | code_features tCorpus$code_features |
Get a context vector | context tCorpus$context |
Deduplicate documents | deduplicate tCorpus$deduplicate |
Delete columns from the data and meta data | delete_columns delete_meta_columns tCorpus$delete_columns tCorpus$delete_meta_columns |
Cast the "feats" column in UDpipe tokens to columns | feats_to_columms tCorpus$feats_to_columns |
Filter features | feature_subset tCorpus$feature_subset |
Fold rsyntax annotations | tCorpus$fold_rsyntax |
Access the data from a tCorpus | get get_meta tCorpus$get tCorpus$get_meta |
Estimate an LDA topic model | lda_fit tCorpus$lda_fit |
Merge the token and meta data.tables of a tCorpus with another data.frame | merge merge_meta tCorpus$merge |
Preprocess features | preprocess tCorpus$preprocess |
Replace tokens with dictionary match | replace_dictionary tCorpus$replace_dictionary |
Recode features in a tCorpus based on a search string | search_recode tCorpus$search_recode |
Modify the token and meta data.tables of a tCorpus | set set_meta tCorpus$set tCorpus$set_meta |
Change levels of factor columns | set_levels set_meta_levels tCorpus$set_levels tCorpus$set_meta_levels |
Change column names of data and meta data | set_meta_name set_name tCorpus$set_meta_name tCorpus$set_name |
Subset a tCorpus | subset subset_meta tCorpus$subset tCorpus$subset_meta |
Subset tCorpus token data using a query | tCorpus$subset_query |
Add columns indicating who did what | tCorpus$udpipe_clauses udpipe_clauses |
Add columns indicating who said what | tCorpus$udpipe_quotes udpipe_quotes |
Create a tCorpus based on tokens (i.e. preprocessed texts) | tokens_to_tcorpus |
Gives the window in which a term occurred in a matrix | tokenWindowOccurence |
Show top features | top_features |
Apply rsyntax transformations | transform_rsyntax |
Get a list of tqueries for extracting who did what | udpipe_clause_tqueries |
Get a list of tqueries for extracting quotes | udpipe_quote_tqueries |
Simplify tokenIndex created with the udpipe parser | udpipe_simplify |
Get a list of tqueries for finding candidates for span quotes | udpipe_spanquote_tqueries |
Create a tCorpus using udpipe | udpipe_tcorpus udpipe_tcorpus.character udpipe_tcorpus.corpus udpipe_tcorpus.data.frame udpipe_tcorpus.factor |
Reconstruct original texts | untokenize |