Title: | Create Full Text Browsers from Annotated Token Lists |
---|---|
Description: | Create browsers for reading full texts from a token list format. Information obtained from text analyses (e.g., topic modeling, word scaling) can be used to annotate the texts. |
Authors: | Kasper Welbers and Wouter van Atteveldt |
Maintainer: | Kasper Welbers <[email protected]> |
License: | GPL-3 |
Version: | 0.1.5 |
Built: | 2024-11-04 03:48:23 UTC |
Source: | https://github.com/kasperwelbers/tokenbrowser |
Wrap values in an HTML tag
add_tag( x, tag, attr_str = NULL, ignore_na = F, span_adjacent = F, doc_id = NULL )
Argument | Description |
---|---|
x | A vector of values to be wrapped in a tag. |
tag | A character vector of length 1, specifying the html tag (e.g., "div", "h1", "span"). |
attr_str | A character string with the tag attributes (e.g., created with tag_attr()), of the same length as x or of length 1. |
ignore_na | If TRUE, do not add the tag if the value is NA. |
span_adjacent | If TRUE, include adjacent tokens with identical attr_str within the same tag. |
doc_id | If span_adjacent is TRUE, the document ids are required to ensure that tags do not span from one document to another. |
Returns a character vector.
x = c("Obama","Bush")
add_tag(x, 'span')
## add attributes with the tag_attr function
add_tag(x, 'span', tag_attr(class = "president"))
## add style attributes with the attr_style function within tag_attr
add_tag(x, 'span', tag_attr(class = "president", style = attr_style(`background-color` = 'rgba(255, 255, 0, 1)')))
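The following base-R sketch illustrates what span_adjacent does. It is not the package's own implementation. The helper span_adjacent_sketch() is hypothetical, and it assumes that the opening tag is attached to the first value of a run and the closing tag to the last. A run breaks whenever either the attribute string or the document id changes, which is why doc_id is needed.

```r
# Hypothetical helper (not part of tokenbrowser): wrap runs of adjacent values
# that share the same attribute string AND document id in a single tag.
span_adjacent_sketch <- function(x, attr_str, doc_id, tag = "span") {
  n <- length(x)
  # a run continues only while both the document and the attributes are unchanged
  key <- paste(doc_id, attr_str, sep = "\r")
  starts <- c(TRUE, key[-1] != key[-n])  # first value of each run
  ends <- c(starts[-1], TRUE)            # last value of each run
  has_attr <- !is.na(attr_str)           # values without attributes get no tag
  out <- as.character(x)
  open <- starts & has_attr
  close <- ends & has_attr
  out[open] <- paste0("<", tag, " ", attr_str[open], ">", out[open])
  out[close] <- paste0(out[close], "</", tag, ">")
  out
}

span_adjacent_sketch(c("the", "Obama", "Bush", "said"),
                     attr_str = c(NA, 'class="president"', 'class="president"', NA),
                     doc_id = c(1, 1, 1, 1))
```

Here "Obama" and "Bush" share one span. With doc_id = c(1, 1, 2, 2), each would get its own span, because the tag may not cross a document boundary.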
Create the content of an html style attribute. Designed to be used together with the tag_attr function.
attr_style(...)
Argument | Description |
---|---|
... | Named arguments are used as settings in the html style attribute, with the name being the name of the setting (e.g., background-color). All arguments must be vectors of the same length. NA values can be used to ignore a setting, and if all settings are NA then NA is returned (instead of an empty string for style settings). |
Returns a character vector with the content of the html style attribute.
tag_attr(class = c('x','y'), style = attr_style(`background-color` = 'rgba(255, 255, 0, 1)'))
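The NA rules for attr_style() can be illustrated with the following base-R sketch. It is not the package's code. The name style_sketch() is hypothetical, and the exact string format ("name: value;" pairs separated by spaces) is an assumption. Only the NA handling follows the description above.

```r
# Hypothetical sketch of attr_style()-like behaviour (not tokenbrowser's code).
style_sketch <- function(...) {
  settings <- list(...)
  n <- length(settings[[1]])
  stopifnot(all(lengths(settings) == n))  # all settings must have the same length
  out <- character(n)
  for (i in seq_len(n)) {
    parts <- character(0)
    for (name in names(settings)) {
      value <- settings[[name]][i]
      # NA values skip this setting for this element
      if (!is.na(value)) parts <- c(parts, paste0(name, ": ", value, ";"))
    }
    # if all settings are NA, return NA rather than an empty string
    out[i] <- if (length(parts) == 0) NA_character_ else paste(parts, collapse = " ")
  }
  out
}

style_sketch(color = c("red", NA), `background-color` = c("yellow", NA))
```

The second element comes back as NA, not "", so a later tag_attr()-style step can skip the style attribute entirely.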
Convert tokens into full texts in an HTML file with category highlighting
categorical_browser( tokens, category, alpha = 0.3, labels = NULL, meta = NULL, colors = NULL, doc_col = "doc_id", token_col = "token", filename = NULL, unfold = NULL, span_adjacent = T, ... )
Argument | Description |
---|---|
tokens | A data.frame with a column for document ids (doc_col) and a column for tokens (token_col). |
category | Either a numeric vector with values representing categories, or a factor vector, in which case the values are used as labels. If a numeric vector is used, the labels can also be specified in the labels argument. |
alpha | Optionally, the alpha (transparency) can be specified, with 0 being fully transparent and 1 being fully colored. This can be a vector to specify a different alpha for each value. |
labels | A character vector giving names to the unique category values. If category is a factor vector, the factor levels are used. |
meta | A data.frame with a column for document ids (doc_col). All other columns are added to the browser as document meta. |
colors | A character vector with color names for the unique values of the category argument. Must have the same length as unique(na.omit(category)). |
doc_col | The name of the document id column. |
token_col | The name of the token column. |
filename | Name of the output file. Default is a temp file. |
unfold | Either a character vector or a named list of vectors of the same length as tokens. If given, all tokens with a tag can be clicked on to unfold the given text. If a list of vectors is given, the values of the columns are concatenated with the column name; e.g., list(doc_id = 1, sentence = 1) will be shown as [doc_id = 1, sentence = 1]. |
span_adjacent | If TRUE, include adjacent tokens with identical attributes within the same tag. |
... | Additional formatting arguments passed to create_browser(). |
Returns the name of the file where the browser is saved. It can be opened from within R with browseURL().
## as an example, use simple grep to code tokens
code = rep(NA, nrow(sotu_data$tokens))
code[grep('war', sotu_data$tokens$token)] = 'War'
code[grep('mother|father|child', sotu_data$tokens$token)] = 'Family'
code = as.factor(code)
url = categorical_browser(sotu_data$tokens, category=code, meta=sotu_data$meta)
view_browser(url)   ## view browser in the Viewer
if (interactive()) {
  browseURL(url)    ## view in default webbrowser
}
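The text shown when a tagged token is unfolded can be reproduced with a short base-R sketch. The helper unfold_label_sketch() is hypothetical. The bracketed "name = value" format is inferred from the list(doc_id = ..., sentence = ...) example in the unfold description, not taken from the package source.

```r
# Hypothetical helper (not part of tokenbrowser): build the unfold text per token.
# A character vector is used as-is; a named list becomes "[name = value, ...]".
unfold_label_sketch <- function(unfold) {
  if (!is.list(unfold)) return(as.character(unfold))
  # one "name = value" string per column and token
  parts <- mapply(function(name, values) paste(name, "=", values),
                  names(unfold), unfold, SIMPLIFY = FALSE)
  # join the columns token by token, then wrap in brackets
  paste0("[", do.call(paste, c(unname(parts), sep = ", ")), "]")
}

unfold_label_sketch(list(doc_id = c(1, 1), sentence = c(1, 2)))
```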
This is a convenience wrapper for tag_tokens() that can be used if tokens need to be colored per category.
category_highlight_tokens( tokens, category, labels = NULL, alpha = 0.4, class = NULL, colors = NULL, unfold = NULL, span_adjacent = F, doc_id = NULL )
Argument | Description |
---|---|
tokens | A character vector of tokens. |
category | Either a factor, or a numeric vector with values representing category indices. If a numeric vector is used, labels must also be given. |
labels | A character vector with labels for the categories. |
alpha | Optionally, the alpha (transparency) can be specified, with 0 being fully transparent and 1 being fully colored. This can be a vector to specify a different alpha for each value. |
class | Optionally, a character vector of the class to add to the span tags. If NA, no class is added. |
colors | A character vector with color names for the unique values of the category argument. Must have the same length as unique(na.omit(category)). |
unfold | Either a character vector or a named list of vectors of the same length as tokens. If given, all tokens with a tag can be clicked on to unfold the given text. If a list of vectors is given, the values of the columns are concatenated with the column name; e.g., list(doc_id = 1, sentence = 1) will be shown as [doc_id = 1, sentence = 1]. This only works if the tagged tokens are used in an html browser created with this package. |
span_adjacent | If TRUE, include adjacent tokens with identical attributes within the same tag. |
doc_id | If span_adjacent is TRUE, the document ids are required to ensure that tags do not span from one document to another. |
Returns a character vector of color-tagged tokens.
tokens = c('token_1','token_2','token_3','token_4')
category = c('a','a',NA,'b')
category_highlight_tokens(tokens, category)
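The following base-R sketch gives a rough idea of how category values could be turned into per-token colors. It is not tokenbrowser's code. The function category_col_sketch() is hypothetical, the "rgba(r, g, b, alpha)" output format is assumed from the rgba strings in this manual's examples, and grDevices::rainbow() is an arbitrary stand-in for whatever default palette the package uses.

```r
# Hypothetical sketch (not tokenbrowser's code): one color per category level,
# returned as rgba strings; tokens with an NA category get no color (NA).
category_col_sketch <- function(category, colors = NULL, alpha = 0.4) {
  category <- as.factor(category)
  lev <- levels(category)                       # factor levels act as labels
  # assumed default palette, only used when no colors are given
  if (is.null(colors)) colors <- grDevices::rainbow(length(lev))
  stopifnot(length(colors) == length(lev))      # one color per category
  rgb <- unname(grDevices::col2rgb(colors))     # 3 x n_levels integer matrix
  idx <- as.integer(category)                   # NA categories stay NA
  out <- sprintf("rgba(%d, %d, %d, %s)", rgb[1, idx], rgb[2, idx], rgb[3, idx], alpha)
  out[is.na(idx)] <- NA
  out
}

category_col_sketch(c("a", "a", NA, "b"), colors = c("red", "blue"))
```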
This is a convenience wrapper for tag_tokens() that can be used if tokens only need to be colored.
colorscale_tokens( tokens, value, alpha = 0.4, class = NULL, col_range = c("red", "blue"), unfold = NULL, span_adjacent = F, doc_id = NULL )
Argument | Description |
---|---|
tokens | A character vector of tokens. |
value | A numeric vector with values between -1 and 1. Determines the color mixture of the scale colors specified in col_range. |
alpha | Optionally, the alpha (transparency) can be specified, with 0 being fully transparent and 1 being fully colored. This can be a vector to specify a different alpha for each value. |
class | Optionally, a character vector of the class to add to the span tags. If NA, no class is added. |
col_range | The colors used in the scale ramp. |
unfold | Either a character vector or a named list of vectors of the same length as tokens. If given, all tokens with a tag can be clicked on to unfold the given text. If a list of vectors is given, the values of the columns are concatenated with the column name; e.g., list(doc_id = 1, sentence = 1) will be shown as [doc_id = 1, sentence = 1]. This only works if the tagged tokens are used in an html browser created with this package. |
span_adjacent | If TRUE, include adjacent tokens with identical attributes within the same tag. |
doc_id | If span_adjacent is TRUE, the document ids are required to ensure that tags do not span from one document to another. |
Returns a character vector of color-tagged tokens.
colorscale_tokens(c('token_1','token_2','token_3'), value = c(-1,0,1))
Convert tokens into full texts in an HTML file with color ramp highlighting
colorscaled_browser( tokens, value, alpha = 0.4, meta = NULL, col_range = c("red", "blue"), doc_col = "doc_id", token_col = "token", doc_nav = NULL, token_nav = NULL, filename = NULL, unfold = NULL, span_adjacent = T, ... )
Argument | Description |
---|---|
tokens | A data.frame with a column for document ids (doc_col) and a column for tokens (token_col). |
value | A numeric vector with values between -1 and 1. Determines the color mixture of the scale colors specified in col_range. |
alpha | Optionally, the alpha (transparency) can be specified, with 0 being fully transparent and 1 being fully colored. This can be a vector to specify a different alpha for each value. |
meta | A data.frame with a column for document ids (doc_col). All other columns are added to the browser as document meta. |
col_range | The colors used in the scale ramp. |
doc_col | The name of the document id column. |
token_col | The name of the token column. |
doc_nav | The name of a column in meta, used to set a navigation tag. |
token_nav | Alternative to doc_nav: a column in the tokens, used to set a navigation tag. |
filename | Name of the output file. Default is a temp file. |
unfold | Either a character vector or a named list of vectors of the same length as tokens. If given, all tokens with a tag can be clicked on to unfold the given text. If a list of vectors is given, the values of the columns are concatenated with the column name; e.g., list(doc_id = 1, sentence = 1) will be shown as [doc_id = 1, sentence = 1]. |
span_adjacent | If TRUE, include adjacent tokens with identical attributes within the same tag. |
... | Additional formatting arguments passed to create_browser(). |
Returns the name of the file where the browser is saved. It can be opened from within R with browseURL().
## as an example, scale word colors based on number of characters
scale = nchar(as.character(sotu_data$tokens$token))
scale[scale>6] = scale[scale>6] +20
scale = rescale_var(sqrt(scale), -1, 1)
scale[abs(scale) < 0.5] = NA
url = colorscaled_browser(sotu_data$tokens, value = scale, meta=sotu_data$meta)
view_browser(url)   ## view browser in the Viewer
if (interactive()) {
  browseURL(url)    ## view in default webbrowser
}
Convert tokens into full texts in an HTML file
create_browser( tokens, meta = NULL, doc_col = "doc_id", token_col = "token", space_col = NULL, doc_nav = NULL, token_nav = NULL, filename = NULL, css_str = NULL, header = "", subheader = "", n = TRUE, navfilter = TRUE, top_nav = NULL, thres_nav = 1, colors = NULL, style_col1 = "#7D1935", style_col2 = "#F5F3EE", drop_missing_meta = FALSE )
Argument | Description |
---|---|
tokens | A data.frame with a column for document ids (doc_col) and a column for tokens (token_col). |
meta | A data.frame with a column for document ids (doc_col). All other columns are added to the browser as document meta. |
doc_col | The name of the document id column. |
token_col | The name of the token column. |
space_col | Optionally, a column with space indications (" ", "\n", etc.) per token (which is how some NLP parsers indicate spaces). |
doc_nav | The name of a column (factor or character) in meta, used to create a navigation bar for selecting document groups. |
token_nav | Alternative to doc_nav: a column in the tokens. Navigation filters will then be used to select documents in which the value occurs at least once. |
filename | Name of the output file. Default is a temp file. |
css_str | A character string, to be added directly to the css style header. |
header | Optionally, specify the header. |
subheader | Optionally, specify a subheader. |
n | If TRUE, report N in the header. |
navfilter | If TRUE (default), enable filtering with the navigation bar. |
top_nav | A number. If token_nav is used, navigation filters will only apply to the top x values with the highest token occurrence in a document. |
thres_nav | Like top_nav, but specifying a threshold for the minimum number of tokens. |
colors | Optionally, a vector with color names for the navigation bar. Its length must equal the number of unique non-NA items in the navigation. |
style_col1 | Color of the browser header. |
style_col2 | Color of the browser background. |
drop_missing_meta | If TRUE, omit missing meta rows instead of printing an empty value. |
Returns the name of the file where the browser is saved. It can be opened from within R with browseURL().
url = create_browser(sotu_data$tokens, sotu_data$meta, token_col = 'token', header = 'Speeches')
view_browser(url)   ## view browser in the Viewer
if (interactive()) {
  browseURL(url)    ## view in default webbrowser
}
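The following sketch shows how top_nav and thres_nav might narrow token-based navigation for a single document. It is one reading of the argument descriptions above, not the package's implementation, and nav_values_sketch() is a hypothetical name. Values are counted, values occurring fewer than thres_nav times are dropped, and at most the top_nav most frequent remaining values are kept.

```r
# Hypothetical helper (not part of tokenbrowser): which token_nav values of one
# document would be used as navigation filters.
nav_values_sketch <- function(nav, top_nav = NULL, thres_nav = 1) {
  tab <- table(nav)                                    # NA values are not counted
  counts <- stats::setNames(as.vector(tab), names(tab))
  counts <- sort(counts, decreasing = TRUE)            # most frequent first
  counts <- counts[counts >= thres_nav]                # drop values below the threshold
  if (!is.null(top_nav)) counts <- counts[seq_len(min(top_nav, length(counts)))]
  names(counts)
}

nav_tokens <- c("war", "war", "family", "economy", "war", "family")
nav_values_sketch(nav_tokens)                 # "war" "family" "economy"
nav_values_sketch(nav_tokens, thres_nav = 2)  # "war" "family"
nav_values_sketch(nav_tokens, top_nav = 1)    # "war"
```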
Each row of the data.frame is transformed into an html table with two columns: name and value. The column names of meta are used as names.
create_meta_tables(meta, ignore_col = NULL, drop_missing = FALSE)
Argument | Description |
---|---|
meta | A data.frame where each row represents the meta data for a document. |
ignore_col | Optionally, a character vector with names of metadata columns to ignore. |
drop_missing | If TRUE, omit missing meta rows instead of printing an empty value. |
Returns a character vector in which each value contains the string for an html table.
tabs = create_meta_tables(sotu_data$meta)
tabs[1]
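A base-R sketch of the row-to-table conversion may make the output easier to picture. The markup (<table>, <tr>, <td>) and the helper name meta_tables_sketch() are assumptions, not the package's actual template. The sketch follows the ignore_col and drop_missing descriptions above.

```r
# Hypothetical helper (not part of tokenbrowser): one name/value html table per row.
meta_tables_sketch <- function(meta, ignore_col = NULL, drop_missing = FALSE) {
  meta <- meta[, setdiff(colnames(meta), ignore_col), drop = FALSE]
  vapply(seq_len(nrow(meta)), function(i) {
    # the row as a named character vector (names = column names)
    values <- vapply(meta[i, , drop = FALSE], as.character, character(1))
    keep <- if (drop_missing) !is.na(values) else rep(TRUE, length(values))
    values[is.na(values)] <- ""            # missing values print as empty cells
    rows <- sprintf("<tr><td>%s</td><td>%s</td></tr>", names(values)[keep], values[keep])
    paste0("<table>", paste(rows, collapse = ""), "</table>")
  }, character(1))
}

meta <- data.frame(doc_id = c("d1", "d2"), president = c("Obama", NA))
meta_tables_sketch(meta, drop_missing = TRUE)
```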
Designed to be used together with the attr_style function. The return value can directly be used to set the color in an html tag attribute (e.g., color, background-color)
highlight_col(value, col = "yellow")
Argument | Description |
---|---|
value | Either a logical vector or a numeric vector with values between 0 and 1. If a logical vector is used, then tokens with TRUE will be highlighted (with the color specified in col). If a numeric vector is used, the value determines the alpha (transparency), with 0 being fully transparent and 1 being fully colored. |
col | The color used to highlight. |
Returns the string used to specify a color in an html tag attribute.
highlight_col(c(NA, 0, 0.1,0.5, 1))
## used in combination with attr_style()
attr_style(color = highlight_col(c(NA, 0, 0.1,0.5, 1)))
## note that for background-color you need backquotes (`) to deal
## with the hyphen in an argument name
attr_style(`background-color` = highlight_col(c(NA, 0, 0.1,0.5, 1)))
tag_attr(class = c(1, 2), style = attr_style(`background-color` = highlight_col(c(FALSE,TRUE))))
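The mapping from value to color can be illustrated with a base-R sketch. It is not tokenbrowser's code, and highlight_col_sketch() is a hypothetical name. Three parts are assumptions: the rgba() format (taken from the rgba strings in this manual's examples), TRUE meaning alpha 1, and FALSE or NA meaning "no color" (NA). Numeric values are used directly as the alpha, as described above.

```r
# Hypothetical sketch of highlight_col()-style behaviour (not tokenbrowser's code).
highlight_col_sketch <- function(value, col = "yellow") {
  # assumption: TRUE = full color, FALSE = no color (NA)
  alpha <- if (is.logical(value)) ifelse(value, 1, NA) else value
  rgb <- unname(grDevices::col2rgb(col)[, 1])  # red, green, blue of col
  out <- sprintf("rgba(%d, %d, %d, %s)", rgb[1], rgb[2], rgb[3], alpha)
  out[is.na(alpha)] <- NA                      # NA values give no color
  out
}

highlight_col_sketch(c(NA, 0, 0.5, 1))
highlight_col_sketch(c(FALSE, TRUE))
```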
This is a convenience wrapper for tag_tokens() that can be used if tokens only need to be colored.
highlight_tokens( tokens, value, class = NULL, col = "yellow", unfold = NULL, span_adjacent = F, doc_id = NULL )
Argument | Description |
---|---|
tokens | A character vector of tokens. |
value | Either a logical vector or a numeric vector with values between 0 and 1. If a logical vector is used, then tokens with TRUE will be highlighted (with the color specified in col). If a numeric vector is used, the value determines the alpha (transparency), with 0 being fully transparent and 1 being fully colored. |
class | Optionally, a character vector of the class to add to the span tags. If NA, no class is added. |
col | The color used to highlight. |
unfold | Either a character vector or a named list of vectors of the same length as tokens. If given, all tokens with a tag can be clicked on to unfold the given text. If a list of vectors is given, the values of the columns are concatenated with the column name; e.g., list(doc_id = 1, sentence = 1) will be shown as [doc_id = 1, sentence = 1]. This only works if the tagged tokens are used in an html browser created with this package. |
span_adjacent | If TRUE, include adjacent tokens with identical attributes within the same tag. |
doc_id | If span_adjacent is TRUE, the document ids are required to ensure that tags do not span from one document to another. |
Returns a character vector of color-tagged tokens.
highlight_tokens(c('token_1','token_2','token_3'), value = c(FALSE,FALSE,TRUE))
highlight_tokens(c('token_1','token_2','token_3'), value = c(0,0.3,0.6))
Convert tokens into full texts in an HTML file with highlighted tokens
highlighted_browser( tokens, value, meta = NULL, col = "yellow", doc_col = "doc_id", token_col = "token", doc_nav = NULL, token_nav = NULL, filename = NULL, unfold = NULL, span_adjacent = T, ... )
Argument | Description |
---|---|
tokens | A data.frame with a column for document ids (doc_col) and a column for tokens (token_col). |
value | Either a logical vector or a numeric vector with values between 0 and 1. If a logical vector is used, then tokens with TRUE will be highlighted (with the color specified in col). If a numeric vector is used, the value determines the alpha (transparency), with 0 being fully transparent and 1 being fully colored. |
meta | A data.frame with a column for document ids (doc_col). All other columns are added to the browser as document meta. |
col | The color used to highlight. |
doc_col | The name of the document id column. |
token_col | The name of the token column. |
doc_nav | The name of a column in meta, used to set a navigation tag. |
token_nav | Alternative to doc_nav: a column in the tokens, used to set a navigation tag. |
filename | Name of the output file. Default is a temp file. |
unfold | Either a character vector or a named list of vectors of the same length as tokens. If given, all tokens with a tag can be clicked on to unfold the given text. If a list of vectors is given, the values of the columns are concatenated with the column name; e.g., list(doc_id = 1, sentence = 1) will be shown as [doc_id = 1, sentence = 1]. |
span_adjacent | If TRUE, include adjacent tokens with identical attributes within the same tag. |
... | Additional formatting arguments passed to create_browser(). |
Returns the name of the file where the browser is saved. It can be opened from within R with browseURL().
## as an example, highlight words based on word length
highlight = nchar(as.character(sotu_data$tokens$token))
highlight = highlight / max(highlight)
highlight[highlight < 0.3] = NA
url = highlighted_browser(sotu_data$tokens, value = highlight, sotu_data$meta)
view_browser(url)   ## view browser in the Viewer
if (interactive()) {
  browseURL(url)    ## view in default webbrowser
}
Create the html template
html_template(template, css_str = NULL, col1 = "#7D1935", col2 = "#F5F3EE")
Argument | Description |
---|---|
template | The name of the template to be used. |
css_str | A character string, to be added directly to the css style header. |
col1 | The first style color (top bar color). |
col2 | The second style color (background color). |
Returns a list with the html header and footer.
Rescale a numeric variable
rescale_var(x, new_min = 0, new_max = 1, x_min = min(x), x_max = max(x))
Argument | Description |
---|---|
x | A numeric vector. |
new_min | The minimum value of the output. |
new_max | The maximum value of the output. |
x_min | The lowest possible value in x. By default this is the actual lowest value in x. |
x_max | The highest possible value in x. By default this is the actual highest value in x. |
Returns a numeric vector.
rescale_var(1:10)
rescale_var(1:10, new_min = -1, new_max = 1)
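Given its arguments, rescale_var() presumably performs a linear rescaling: a value at x_min maps to new_min and a value at x_max maps to new_max. The sketch below assumes the standard formula new_min + (x - x_min) / (x_max - x_min) * (new_max - new_min). The name rescale_sketch() is hypothetical, and this is not the package source.

```r
# Hypothetical re-implementation of a linear rescale (assumed formula).
rescale_sketch <- function(x, new_min = 0, new_max = 1, x_min = min(x), x_max = max(x)) {
  # position of x within [x_min, x_max], as a proportion
  prop <- (x - x_min) / (x_max - x_min)
  new_min + prop * (new_max - new_min)
}

rescale_sketch(c(0, 5, 10))                             # 0.0 0.5 1.0
rescale_sketch(c(0, 5, 10), new_min = -1, new_max = 1)  # -1 0 1
```

Setting x_min and x_max explicitly keeps the scale fixed across batches. For example, rescale_sketch(5, x_min = 0, x_max = 10) gives 0.5 even though 5 is the only value.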
Wrap html body in the template and save
save_html(data, template, filename = NULL)
Argument | Description |
---|---|
data | The html body data. |
template | The html header/footer template. |
filename | The name of the file to save the html. Default is a temp file. |
Returns the (local) URL of the html file.
Designed to be used together with the attr_style function. The return value can directly be used to set the color in an html tag attribute (e.g., color, background-color)
scale_col(value, alpha = 1, col_range = c("red", "blue"))
Argument | Description |
---|---|
value | A numeric vector with values between -1 and 1. Determines the color mixture of the scale colors specified in col_range. |
alpha | Optionally, the alpha (transparency) can be specified, with 0 being fully transparent and 1 being fully colored. This can be a vector to specify a different alpha for each value. |
col_range | The colors used in the scale. |
Returns the string used to specify a color in an html tag attribute.
scale_col(c(NA, -1, 0, 0.5, 1))
## used in combination with attr_style()
attr_style(color = scale_col(c(NA, -1, 0, 0.5, 1)))
## note that for background-color you need backquotes (`)
## to deal with the hyphen in an argument name
attr_style(`background-color` = scale_col(c(NA, -1, 0, 0.5, 1)))
tag_attr(class = c(1, 2), style = attr_style(`background-color` = scale_col(c(-1,1))))
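The color mixing can be sketched with grDevices::colorRamp(), which interpolates between colors on [0, 1]. This is not tokenbrowser's implementation, and scale_col_sketch() is hypothetical. The sketch makes three assumptions: -1 maps to the first color in col_range, 1 maps to the second, and the output uses the rgba() format from this manual's examples.

```r
# Hypothetical sketch of a scale_col()-like helper (not tokenbrowser's code).
# Values are assumed to lie in [-1, 1]; NA values give NA.
scale_col_sketch <- function(value, alpha = 1, col_range = c("red", "blue")) {
  ramp <- grDevices::colorRamp(col_range)     # maps [0, 1] to RGB rows (0-255)
  out <- rep(NA_character_, length(value))
  ok <- !is.na(value)
  if (any(ok)) {
    rgb <- round(ramp((value[ok] + 1) / 2))   # map [-1, 1] onto [0, 1]
    a <- rep_len(alpha, length(value))[ok]    # per-value alpha
    out[ok] <- sprintf("rgba(%d, %d, %d, %s)",
                       as.integer(rgb[, 1]), as.integer(rgb[, 2]), as.integer(rgb[, 3]), a)
  }
  out
}

scale_col_sketch(c(NA, -1, 1))
```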
Convert a color into the string format used in html attributes
set_col(col, alpha = 1)
Argument | Description |
---|---|
col | The name of the color. |
alpha | Optionally, the alpha (transparency), with 0 being fully transparent and 1 being fully colored. |
Returns the string used to specify a color in an html tag attribute.
set_col('red')
set_col('red', alpha=0.5)
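The conversion can be approximated in base R with grDevices::col2rgb(), which accepts color names and hex codes. This sketch assumes the "rgba(r, g, b, alpha)" format seen in this manual's examples. The name set_col_sketch() is hypothetical, and this is not the package's own code.

```r
# Hypothetical sketch of a set_col()-like helper (not tokenbrowser's code).
set_col_sketch <- function(col, alpha = 1) {
  rgb <- unname(grDevices::col2rgb(col))  # 3 x n integer matrix: red, green, blue
  sprintf("rgba(%d, %d, %d, %s)", rgb[1, ], rgb[2, ], rgb[3, ], alpha)
}

set_col_sketch("red")               # "rgba(255, 0, 0, 1)"
set_col_sketch("red", alpha = 0.5)  # "rgba(255, 0, 0, 0.5)"
```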
Tokens from Bush's and Obama's State of the Union addresses
data(sotu_data)
sotu_data: a list with a data.frame of tokens (sotu_data$tokens) and a data.frame of meta data (sotu_data$meta).
Word assignments, docXtopic matrix and topicXword matrix of an LDA model of the SOTU data
data(sotu_lda)
sotu_lda: the word assignments are a data.frame with document, lemma and topic columns; topic_word_mat and doc_topic_mat are matrices.
Create attribute strings for html tags
tag_attr(...)
Argument | Description |
---|---|
... | Named arguments are used as attributes, with the name being the name of the attribute (e.g., class, style). All arguments must be vectors of the same length, or of length 1 (used as a constant). NA values can be used to skip an attribute. If all attributes are NA, NA is returned. |
Returns a character vector with attribute strings, designed to be usable as the attr_str in add_tag(). If ... is empty, NA is returned.
add_tag('TEXT', 'span')
add_tag('TEXT', 'span', tag_attr(class='CLASS'))
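The following base-R sketch illustrates the rules for combining attributes: length-1 values act as constants, NA skips an attribute, and an element where every attribute is NA becomes NA. It is not tokenbrowser's code. The name attr_sketch() is hypothetical, and the name="value" output format is an assumption.

```r
# Hypothetical sketch of tag_attr()-like behaviour (not tokenbrowser's code).
attr_sketch <- function(...) {
  attrs <- list(...)
  if (length(attrs) == 0) return(NA_character_)  # empty input gives NA
  n <- max(lengths(attrs))
  out <- character(n)
  for (i in seq_len(n)) {
    parts <- character(0)
    for (name in names(attrs)) {
      v <- attrs[[name]]
      value <- if (length(v) == 1) v else v[i]   # length-1 values act as constants
      if (!is.na(value)) parts <- c(parts, sprintf('%s="%s"', name, value))
    }
    # all attributes NA for this element: return NA instead of ""
    out[i] <- if (length(parts) == 0) NA_character_ else paste(parts, collapse = " ")
  }
  out
}

attr_sketch(class = c("x", NA), id = "main")
```

The resulting strings (or NA) are the kind of value that could be passed on as attr_str.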
This is the main function for adding colors, onclick effects, etc. to tokens, for which <span> tags are used. The named arguments are used to set the attributes.
tag_tokens( tokens, tag = "span", span_adjacent = F, doc_id = NULL, unfold = NULL, ... )
Argument | Description |
---|---|
tokens | A vector of tokens. |
tag | The name of the tag to be used. |
span_adjacent | If TRUE, include adjacent tokens with identical attributes within the same tag. |
doc_id | If span_adjacent is TRUE, the document ids are required to ensure that tags do not span from one document to another. |
unfold | Either a character vector or a named list of vectors of the same length as tokens. If given, all tokens with a tag can be clicked on to unfold the given text. If a list of vectors is given, the values of the columns are concatenated with the column name; e.g., list(doc_id = 1, sentence = 1) will be shown as [doc_id = 1, sentence = 1]. This only works if the tagged tokens are used in an html browser created with this package. |
... | Named arguments are used as attributes in the span tag for each token, with the name being the name of the attribute (e.g., class, style). Each argument must be a vector of the same length as the number of tokens. NA values can be used to ignore an attribute for a token, and if a token has NA for every attribute, it is not given a span tag. |
If a token does not have any attributes, the <span> tag is not added.
Note that the attr_style() function can be used to conveniently set the style attribute. Also, the set_col(), highlight_col() and scale_col() functions can be used to set the color of style attributes. See the example for illustration.
Returns a character vector of tagged tokens.
tag_tokens(tokens = c('token_1','token_2', 'token_3'),
           class = c(1,1,2),
           style = attr_style(color = set_col('red'),
                              `background-color` = highlight_col(c(FALSE,FALSE,TRUE))))
## tokens without attributes are not given a span tag
tag_tokens(tokens = c('token_1','token_2', 'token_3'),
           class = c(1,NA,NA),
           style = attr_style(color = highlight_col(c(TRUE,TRUE,FALSE))))
## span_adjacent can be used to put tokens with identical tags within one tag,
## but then a doc_id has to be given as well
tag_tokens(tokens = c('token_1','token_2', 'token_3'),
           class = c(1,1,NA),
           span_adjacent=TRUE, doc_id = c(1,1,1))
View a browser (HTML) in the R viewer
view_browser(url)
Argument | Description |
---|---|
url | A URL, created with one of the *_browser functions. |
url = create_browser(sotu_data$tokens, sotu_data$meta, token_col = 'token', header = 'Speeches')   ## the url
view_browser(url)   ## view browser in the Viewer
Pastes the tokens into articles, and returns an <article> html element.
wrap_documents( tokens, meta, doc_col = "doc_id", token_col = "token", space_col = NULL, nav = doc_col, token_nav = NULL, top_nav = NULL, thres_nav = NULL, drop_missing_meta = FALSE )
Argument | Description |
---|---|
tokens | A data.frame with a column for document ids (doc_col) and a column for tokens (token_col). |
meta | A data.frame with a column for document ids (doc_col). All other columns are added to the browser as document meta. |
doc_col | The name of the document id column. |
token_col | The name of the token column. |
space_col | Optionally, a column with space indications (e.g., newline) per token (which is how some NLP parsers indicate spaces). |
nav | The column in meta used for navigation. Defaults to doc_col. |
token_nav | Alternative to nav (which uses meta): a column in tokens used for navigation. |
top_nav | If token_nav is used, navigation filters will only apply to the top x values with the highest token occurrence in a document. |
thres_nav | Like top_nav, but specifying a threshold for the minimum number of tokens. |
drop_missing_meta | If TRUE, omit missing meta rows instead of printing an empty value. |
Returns a named vector, with document ids as names and the document html strings as values.
docs = wrap_documents(sotu_data$tokens, sotu_data$meta)
head(names(docs))
docs[[1]]
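The core of wrap_documents() is pasting tokens back into one string per document. The base-R sketch below shows one way to do this. It is not the package's implementation. The helper wrap_documents_sketch() is hypothetical, and three details are assumptions: space_col holds the whitespace that follows each token (as in some NLP parsers), newlines become <br>, and the documents are wrapped in bare <article> tags without meta.

```r
# Hypothetical helper (not part of tokenbrowser): paste tokens into one
# <article> string per document, named by document id.
wrap_documents_sketch <- function(tokens, doc_col = "doc_id", token_col = "token",
                                  space_col = NULL) {
  tok <- as.character(tokens[[token_col]])
  # assumption: without a space column, tokens are separated by a single space
  space <- if (is.null(space_col)) rep(" ", length(tok)) else as.character(tokens[[space_col]])
  space[is.na(space)] <- ""
  space <- gsub("\n", "<br>", space, fixed = TRUE)  # newlines become html line breaks
  doc <- as.character(tokens[[doc_col]])
  # split by document, keeping documents in order of first appearance
  docs <- split(paste0(tok, space), factor(doc, levels = unique(doc)))
  vapply(docs, function(x) paste0("<article>", trimws(paste(x, collapse = "")), "</article>"),
         character(1))
}

toks <- data.frame(doc_id = c("a", "a", "b"), token = c("Hello", "world", "Bye"),
                   space = c(" ", "\n", ""))
wrap_documents_sketch(toks, space_col = "space")
```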