common
Module Contents
- common.prepare_nodes(uast: bblfsh.Node)
- common.order_nodes(uast, excluded_internal_roles)
Select nodes that either have a token or belong to specific node types, and that carry correct position information, then order them by start_position.offset.
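The selection and ordering described above can be illustrated with a minimal sketch. It is not the module's implementation: `SketchNode` and `order_nodes_sketch` are hypothetical names, the input is a flat list rather than a UAST tree, and treating `excluded_internal_roles` as the set of "specific node types" is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for a UAST node; not the bblfsh.Node API."""
    internal_type: str
    token: str = ""
    start_offset: Optional[int] = None  # None models missing position info


def order_nodes_sketch(nodes: Iterable[SketchNode],
                       excluded_internal_roles: Iterable[str]) -> List[SketchNode]:
    """Keep nodes with a token or a special type, then sort by start offset."""
    special = set(excluded_internal_roles)
    selected = [
        n for n in nodes
        if (n.token or n.internal_type in special) and n.start_offset is not None
    ]
    return sorted(selected, key=lambda n: n.start_offset)


nodes = [
    SketchNode("Identifier", token="b", start_offset=8),
    SketchNode("Block", start_offset=0),           # no token, not special: dropped
    SketchNode("Identifier", token="a", start_offset=4),
    SketchNode("StringLiteral", start_offset=12),  # kept because of its type
    SketchNode("Identifier", token="c"),           # no position: dropped
]
ordered = order_nodes_sketch(nodes, ["StringLiteral"])
print([(n.internal_type, n.start_offset) for n in ordered])
# [('Identifier', 4), ('Identifier', 8), ('StringLiteral', 12)]
```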
- common.transform_content(content: str, uast: bblfsh.Node, filler, excluded_internal_roles)
Visualize the code with every node that has a token and position information removed, filling the positions of those nodes with filler.
Parameters: - content – content or text of source code.
- uast – UAST of content.
- filler – string that is used to fill the nodes.
- excluded_internal_roles – internal types that require special handling.
Returns: updated content.
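As a rough illustration of the filling step, the sketch below overwrites given character spans with a filler. It assumes the spans have already been derived from the UAST and that the filler is a single character; `fill_spans` is a hypothetical helper, not part of this module.

```python
from typing import Iterable, Tuple


def fill_spans(content: str, spans: Iterable[Tuple[int, int]], filler: str = "_") -> str:
    """Overwrite every [start, end) character span of content with filler.

    Assumes a single-character filler so that offsets stay aligned.
    """
    chars = list(content)
    for start, end in spans:
        for i in range(start, end):
            chars[i] = filler
    return "".join(chars)


content = "x = foo(1)"
# Hypothetical spans of token-bearing nodes: "x", "foo" and "1".
spans = [(0, 1), (4, 7), (8, 9)]
print(fill_spans(content, spans))
# _ = ___(_)
```

What remains visible is exactly the text not covered by token nodes: whitespace, operators and punctuation.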
- common._token_to_seq(token, to_check: Iterable[str])
- common.token_to_seq(token, to_check: Iterable[str])
- common.split_whitespaces_reserved(text, reserved_tokens: Iterable[str])
Split text into whitespace (including newlines and similar characters) and reserved keywords/operators.
Parameters: - text – text with whitespaces and reserved keywords.
- reserved_tokens – list of reserved keywords and operators.
Returns: list of operators and whitespaces.
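One way such a split could work is sketched below. This is an assumption-laden illustration, not the module's code: `split_sketch` is a hypothetical name, runs of whitespace are grouped into single items, and longer reserved tokens are tried first so that `==` is not split into two `=`.

```python
import re
from typing import Iterable, List


def split_sketch(text: str, reserved_tokens: Iterable[str]) -> List[str]:
    """Split text into whitespace runs and reserved tokens."""
    # Longest tokens first so that "==" wins over "=".
    alternatives = sorted(set(reserved_tokens), key=len, reverse=True)
    pattern = re.compile("|".join([r"\s+"] + [re.escape(t) for t in alternatives]))
    parts, pos = [], 0
    while pos < len(text):
        match = pattern.match(text, pos)
        if match is None:
            # The text contains something that is neither whitespace nor reserved.
            raise ValueError("unexpected text at offset %d" % pos)
        parts.append(match.group())
        pos = match.end()
    return parts


print(split_sketch(" == \n  return ", ["=", "==", "return"]))
# [' ', '==', ' \n  ', 'return', ' ']
```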
- common.find_common_ancestor(node1, node2)
- common.split_whitespaces_reserved_to_nodes(start, start_line, start_col, end, common_anc, content, reserved_tokens: Iterable[str])
- common.extract_nodes(content, uast, reserved_tokens: Iterable[str], excluded_internal_roles: Iterable[str])
Extract a list of nodes ordered by position.
Parameters: - content – content or text of source code.
- uast – UAST extracted from source code.
- reserved_tokens – list of reserved words ordered by length.
- excluded_internal_roles – list of exceptional internal types that require special handling.
Returns: list of nodes.
- common.collect_unique_features(contents, uasts, reserved_tokens: Iterable[str], excluded_internal_roles: Iterable[str], filenames: Iterable[str], ignore_errors: bool = False)
- common.extract_features(filenames: Iterable[str], contents: List[str], uasts: List[bblfsh.Node], reserved_tokens: Iterable[str], excluded_internal_roles: Iterable[str], seq_len: int = 5, depth: int = 5, unique_features: Iterable[str] = None, use_features_after: bool = True, use_parents: bool = True, ignore_errors: bool = False, use_siblings: bool = False)
Extract features:
* before the label
* after the label, if use_features_after
* information about parents, if use_parents
and extract the label and metadata (filename, min and max position of the features in the code, and position of the label).
Parameters: - filenames – list of filenames.
- contents – list of contents of files.
- uasts – list of extracted UASTs.
- reserved_tokens – list of reserved tokens.
- excluded_internal_roles – list of exceptional internal types that require special handling.
- seq_len – sequence length for features (before and after).
- depth – how many parents to use.
- unique_features – list of unique features. If None, it is collected from the data.
- use_features_after – if context after label should be used.
- use_parents – if context about parent nodes should be used.
- ignore_errors – if True, files with problems are skipped.
- use_siblings – if context about sibling nodes should be used.
Returns: list of features, list of labels, list of metadata.
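The roles of seq_len and use_features_after can be illustrated with a small context-window sketch over a flat token sequence. It is only an illustration under stated assumptions: `context_windows` is a hypothetical helper, parents, siblings and metadata are left out, and padding short windows with `<pad>` is an assumption rather than documented behaviour.

```python
from typing import List, Sequence, Tuple


def context_windows(tokens: Sequence[str],
                    seq_len: int = 5,
                    use_features_after: bool = True,
                    pad: str = "<pad>") -> List[Tuple[List[str], str]]:
    """For each position, return (context features, label)."""
    samples = []
    for i, label in enumerate(tokens):
        # Up to seq_len items before the label, left-padded to a fixed length.
        before = list(tokens[max(0, i - seq_len):i])
        features = [pad] * (seq_len - len(before)) + before
        if use_features_after:
            # Up to seq_len items after the label, right-padded.
            after = list(tokens[i + 1:i + 1 + seq_len])
            features = features + after + [pad] * (seq_len - len(after))
        samples.append((features, label))
    return samples


samples = context_windows(["a", "=", "b", "+", "c"], seq_len=2)
print(samples[2])
# (['a', '=', '+', 'c'], 'b')
```

With use_features_after set to False, each sample would contain only the seq_len items preceding the label.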