:mod:`common`
=============

.. py:module:: common


Module Contents
---------------

.. function:: prepare_nodes(uast:bblfsh.Node)


.. function:: order_nodes(uast, excluded_internal_roles)

   Select the nodes that have tokens (or are of specific node types) and valid
   position information, and order them by ``start_position.offset``.


.. function:: transform_content(content:str, uast:bblfsh.Node, filler, excluded_internal_roles)

   Visualize the code with the nodes that have tokens and positions removed,
   filling their positions with ``filler``.

   :param content: source code content.
   :param uast: UAST of the content.
   :param filler: string used to fill the positions of the removed nodes.
   :param excluded_internal_roles: internal types that require special handling.
   :return: updated content.


.. function:: _token_to_seq(token, to_check:Iterable[str])


.. function:: token_to_seq(token, to_check:Iterable[str])


.. function:: split_whitespaces_reserved(text, reserved_tokens:Iterable[str])

   Split text into whitespace (including newlines, etc.) and reserved keywords/operators.

   :param text: text containing whitespace and reserved keywords.
   :param reserved_tokens: list of reserved keywords and operators.
   :return: list of operators and whitespace.


.. function:: find_common_ancestor(node1, node2)


.. function:: split_whitespaces_reserved_to_nodes(start, start_line, start_col, end, common_anc, content, reserved_tokens:Iterable[str])


.. function:: extract_nodes(content, uast, reserved_tokens:Iterable[str], excluded_internal_roles:Iterable[str])

   Extract a list of nodes ordered by position.

   :param content: source code text.
   :param uast: UAST extracted from the source code.
   :param reserved_tokens: list of reserved words ordered by length.
   :param excluded_internal_roles: list of exceptional internal types that require special handling.
   :return: list of nodes.


.. function:: collect_unique_features(contents, uasts, reserved_tokens:Iterable[str], excluded_internal_roles:Iterable[str], filenames:Iterable[str], ignore_errors:bool=False)


.. function:: extract_features(filenames:Iterable[str], contents:List[str], uasts:List[bblfsh.Node], reserved_tokens:Iterable[str], excluded_internal_roles:Iterable[str], seq_len:int=5, depth:int=5, unique_features:Iterable[str]=None, use_features_after:bool=True, use_parents:bool=True, ignore_errors:bool=False, use_siblings:bool=False)

   Extract features:

   * before the label
   * after the label, if ``use_features_after``
   * information about parents, if ``use_parents``
   * information about siblings, if ``use_siblings``

   and extract the label plus metadata (filename, minimum and maximum position
   of the features in the code, and position of the label).

   :param filenames: list of filenames.
   :param contents: list of file contents.
   :param uasts: list of extracted UASTs.
   :param reserved_tokens: list of reserved tokens.
   :param excluded_internal_roles: list of exceptional internal types that require special handling.
   :param seq_len: sequence length for features (before and after the label).
   :param depth: how many parents to use.
   :param unique_features: list of unique features. If None, it is collected from the data.
   :param use_features_after: whether the context after the label should be used.
   :param use_parents: whether the context about parent nodes should be used.
   :param ignore_errors: if True, files with problems are skipped.
   :param use_siblings: whether the context about sibling nodes should be used.
   :return: list of features, list of labels, list of metadata.
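
The splitting described for ``split_whitespaces_reserved`` can be illustrated with a minimal sketch. It is a hypothetical re-implementation, not the module's code, and rests on assumptions: ``text`` contains only whitespace and reserved tokens, each run of consecutive whitespace becomes one item, and longer reserved tokens are tried first (in line with ``extract_nodes`` taking ``reserved_tokens`` ordered by length). The name ``split_whitespaces_reserved_sketch`` and the ``ValueError`` raised on unexpected characters are also assumptions.

.. code-block:: python

   import re
   from typing import Iterable, List


   def split_whitespaces_reserved_sketch(text: str,
                                         reserved_tokens: Iterable[str]) -> List[str]:
       """Split ``text`` into whitespace runs and reserved tokens (hypothetical sketch)."""
       # Longest tokens first: Python's regex alternation takes the first
       # alternative that matches, so "==" must be tried before "=".
       tokens = sorted({t for t in reserved_tokens if t}, key=len, reverse=True)
       # Every alternative matches at least one character, so the loop always advances.
       pattern = re.compile("|".join([r"\s+"] + [re.escape(t) for t in tokens]))
       result = []
       pos = 0
       while pos < len(text):
           match = pattern.match(text, pos)
           if match is None:
               # Assumption: anything that is neither whitespace nor reserved is an error.
               raise ValueError("unexpected character %r at offset %d" % (text[pos], pos))
           result.append(match.group())
           pos = match.end()
       return result

For example, ``split_whitespaces_reserved_sketch("==  =", ["=", "=="])`` returns ``["==", "  ", "="]``: trying ``==`` before ``=`` keeps the two-character operator whole instead of splitting it into two ``=`` items.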