common

Module Contents

common.prepare_nodes(uast:bblfsh.Node)
common.order_nodes(uast, excluded_internal_roles)

Select nodes that carry a token or have one of the specified internal types, and that have correct position information, then order them by start_position.offset.
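The selection and ordering described above can be sketched roughly as follows. This is a minimal illustration, not the module's implementation: ToyNode, Position and order_nodes_sketch are hypothetical stand-ins introduced here so the example runs without bblfsh, and whether the real function descends into the children of a selected node is an assumption.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class Position:
    """Hypothetical position record; only the offset is modelled."""
    offset: int


@dataclass
class ToyNode:
    """Hypothetical stand-in for bblfsh.Node, used only in this sketch."""
    internal_type: str = ""
    token: str = ""
    start_position: Optional[Position] = None
    end_position: Optional[Position] = None
    children: List["ToyNode"] = field(default_factory=list)


def order_nodes_sketch(root: ToyNode,
                       excluded_internal_roles: Iterable[str]) -> List[ToyNode]:
    """Keep positioned nodes that have a token or a listed type, sorted by offset."""
    excluded = set(excluded_internal_roles)
    selected = []
    stack = [root]
    while stack:  # iterative depth-first walk over the tree
        node = stack.pop()
        has_pos = node.start_position is not None and node.end_position is not None
        if has_pos and (node.token or node.internal_type in excluded):
            selected.append(node)
        # Assumption: children are always visited, even below a selected node.
        stack.extend(node.children)
    return sorted(selected, key=lambda n: n.start_position.offset)
```

A node without position information is dropped even if it has a token, which matches the "correct pos information" condition in the description.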

common.transform_content(content:str, uast:bblfsh.Node, filler, excluded_internal_roles)

Visualize the code without the nodes that have a token and position information: the positions of those nodes are filled with filler.

Parameters:
  • content – content (text) of the source code.
  • uast – UAST of content.
  • filler – string that is used to fill the nodes.
  • excluded_internal_roles – internal types that require special handling.
Returns:

updated content.
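A rough sketch of the filling step is shown below. It assumes the token positions have already been reduced to (start, end) character offsets with an exclusive end, and that filler is a single character so the text keeps its length; fill_spans is a hypothetical helper, not part of this module.

```python
from typing import Iterable, Tuple


def fill_spans(content: str, spans: Iterable[Tuple[int, int]],
               filler: str = "_") -> str:
    """Overwrite every [start, end) span of content with the filler character."""
    if len(filler) != 1:
        # Assumption of this sketch: one filler character per source character.
        raise ValueError("this sketch expects a single-character filler")
    chars = list(content)
    for start, end in spans:
        for i in range(start, end):
            chars[i] = filler
    return "".join(chars)


# Blank out the identifiers "x" and "foo", leaving operators and whitespace.
print(fill_spans("x = foo(1)", [(0, 1), (4, 7)]))
```

What remains visible after filling is exactly the text that no token-bearing node covers, which is what makes this view useful for inspecting whitespace and punctuation.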

common._token_to_seq(token, to_check:Iterable[str])
common.token_to_seq(token, to_check:Iterable[str])
common.split_whitespaces_reserved(text, reserved_tokens:Iterable[str])

Split text into whitespaces (including newlines, etc.) and reserved keywords/operators.

Parameters:
  • text – text with whitespaces and reserved keywords.
  • reserved_tokens – list of reserved keywords and operators.
Returns:

list of operators and whitespaces.
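One way such a split could work is sketched below, using longest-match-first alternation so that "===" is not broken into three "=" tokens. This is an illustration only: split_ws_reserved_sketch is a hypothetical name, and grouping consecutive whitespace into a single piece is an assumption rather than documented behaviour.

```python
import re
from typing import Iterable, List


def split_ws_reserved_sketch(text: str, reserved_tokens: Iterable[str]) -> List[str]:
    """Split text made only of whitespace and reserved tokens into pieces."""
    # Try longer tokens first so "===" wins over "==" and "=".
    by_length = sorted(reserved_tokens, key=len, reverse=True)
    alternatives = [re.escape(tok) for tok in by_length if tok] + [r"\s+"]
    pattern = re.compile("|".join(alternatives))
    pieces = []
    pos = 0
    while pos < len(text):
        match = pattern.match(text, pos)
        if match is None:
            raise ValueError("unexpected text at offset %d" % pos)
        pieces.append(match.group())
        pos = match.end()
    return pieces


print(split_ws_reserved_sketch(" === \n=", ["=", "==", "==="]))
```

Concatenating the returned pieces reproduces the input, so no character of the gap between two nodes is lost.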

common.find_common_ancestor(node1, node2)
common.split_whitespaces_reserved_to_nodes(start, start_line, start_col, end, common_anc, content, reserved_tokens:Iterable[str])
common.extract_nodes(content, uast, reserved_tokens:Iterable[str], excluded_internal_roles:Iterable[str])

Extract list of Nodes ordered by position.

Parameters:
  • content – content or text of source code.
  • uast – UAST extracted from source code.
  • reserved_tokens – list of reserved words ordered by length.
  • excluded_internal_roles – list of exceptional internal types - special handling for them.
Returns:

list of nodes.

common.collect_unique_features(contents, uasts, reserved_tokens:Iterable[str], excluded_internal_roles:Iterable[str], filenames:Iterable[str], ignore_errors:bool=False)
common.extract_features(filenames:Iterable[str], contents:List[str], uasts:List[bblfsh.Node], reserved_tokens:Iterable[str], excluded_internal_roles:Iterable[str], seq_len:int=5, depth:int=5, unique_features:Iterable[str]=None, use_features_after:bool=True, use_parents:bool=True, ignore_errors:bool=False, use_siblings:bool=False)

Extract features:
  • before label
  • after label, if use_features_after
  • information about parents, if use_parents
and extract label + metadata (filename, min & max position of features in code and position of label).

Parameters:
  • filenames – list of filenames.
  • contents – list of contents of files.
  • uasts – list of extracted UASTs.
  • reserved_tokens – list of reserved tokens.
  • excluded_internal_roles – list of exceptional internal types - special handling for them.
  • seq_len – sequence length for features (before and after).
  • depth – how many parents to use.
  • unique_features – list of unique features. If None it will be collected from data.
  • use_features_after – if context after label should be used.
  • use_parents – if context about parent nodes should be used.
  • ignore_errors – if True, files with problems will be skipped.
  • use_siblings – if context about sibling nodes should be used.
Returns:

list of features, list of labels, list of metadata.
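To make the roles of seq_len and use_features_after concrete, the sketch below builds the before/after context for one label position in a flat token sequence. It is a simplified illustration under stated assumptions: context_window and the "<pad>" marker are hypothetical, parent and sibling features are left out, and how the real function handles sequence boundaries is not documented here.

```python
from typing import List, Sequence


def context_window(tokens: Sequence[str], index: int, seq_len: int = 5,
                   use_features_after: bool = True,
                   pad: str = "<pad>") -> List[str]:
    """Return seq_len tokens before index and, optionally, seq_len after it."""
    before = list(tokens[max(0, index - seq_len):index])
    # Assumption: short contexts are left-padded to a fixed length.
    before = [pad] * (seq_len - len(before)) + before
    if not use_features_after:
        return before
    after = list(tokens[index + 1:index + 1 + seq_len])
    after = after + [pad] * (seq_len - len(after))
    return before + after


tokens = ["if", " ", "(", "x", ")", " ", "{"]
# The label is the whitespace at index 1; its context is the surrounding tokens.
print(context_window(tokens, 1, seq_len=2))
```

With use_features_after=False the feature vector has seq_len entries instead of 2 * seq_len, so the flag changes the feature layout and not only its content.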