DataConnector

class DataConnector()

Base class for data connectors. Extend it to generate documents and passages from a custom data source.

generate_documents

def generate_documents() -> Iterator[Tuple[str, Dict]]

Generate document text and metadata from a data source.

Returns:

  • documents Iterator[Tuple[str, Dict]] - Yields one tuple of text (string) and metadata (dictionary) per document.
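
As an illustration, a subclass might implement generate_documents roughly as follows. This is only a sketch: the DataConnector class below is a local stand-in so the snippet runs without the library, and InMemoryConnector, its texts argument, and the "source" metadata key are hypothetical names chosen for the example.

```python
from typing import Dict, Iterator, Tuple


class DataConnector:
    """Local stand-in for the documented base class (assumption, not the library's code)."""

    def generate_documents(self) -> Iterator[Tuple[str, Dict]]:
        raise NotImplementedError


class InMemoryConnector(DataConnector):
    """Hypothetical connector that serves documents from a dict of name -> text."""

    def __init__(self, texts: Dict[str, str]):
        self.texts = texts

    def generate_documents(self) -> Iterator[Tuple[str, Dict]]:
        for name, text in self.texts.items():
            # Yield one (text, metadata) tuple per document.
            yield text, {"source": name}


connector = InMemoryConnector({"a.txt": "first document", "b.txt": "second document"})
docs = list(connector.generate_documents())
```

Because the method is a generator, documents are produced lazily; wrapping the call in list() as above is only needed when all of them are wanted at once.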

generate_passages

def generate_passages(documents: List[Document],
                      chunk_size: int = 1024) -> Iterator[Tuple[str, Dict]]

Generate passage text and metadata from a list of documents.

Arguments:

  • documents List[Document] - List of documents to generate passages from.
  • chunk_size int, optional - Chunk size used when splitting documents into passages. Defaults to 1024.

Returns:

  • passages Iterator[Tuple[str, Dict]] - Yields one tuple of text (string) and metadata (dictionary) per passage.
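
One possible shape for generate_passages is sketched below as a standalone function. It rests on several assumptions that the reference above does not confirm: the Document dataclass is a hypothetical stand-in with id and text fields, chunk_size is treated as a character count, and the "document_id" and "offset" metadata keys are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass
class Document:
    """Hypothetical stand-in for the library's Document type."""
    id: str
    text: str


def generate_passages(documents: List[Document],
                      chunk_size: int = 1024) -> Iterator[Tuple[str, Dict]]:
    # Assumption: chunk_size counts characters; the real unit may differ.
    for doc in documents:
        for start in range(0, len(doc.text), chunk_size):
            chunk = doc.text[start:start + chunk_size]
            # Yield one (text, metadata) tuple per passage.
            yield chunk, {"document_id": doc.id, "offset": start}


passages = list(generate_passages([Document(id="doc-1", text="abcdefghij")], chunk_size=4))
```

With a ten-character document and chunk_size=4, this yields three passages, the last one shorter than the others. A real connector might instead split on sentence or token boundaries.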