
Class: SentenceSplitter

SentenceSplitter is the default text splitter. It supports splitting text into sentences, paragraphs, or fixed-length chunks with overlap.

One advantage of SentenceSplitter is that, even when producing fixed-length chunks, it tries to keep sentences together.
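
To illustrate what "keeping sentences together" means for fixed-length chunks, here is a minimal, self-contained sketch. It does not call the library and is not its implementation: `countTokens` and `packSentences` are hypothetical helpers, and whitespace-separated words are assumed as a stand-in for real tokens.

```typescript
// Hypothetical helper (not part of the library): counts whitespace-separated
// words as a rough stand-in for a real tokenizer.
function countTokens(text: string): number {
  return text.trim().split(/\s+/).filter((w) => w.length > 0).length;
}

// Greedily pack whole sentences into chunks of at most `chunkSize` tokens.
// A sentence that is longer than `chunkSize` is kept whole in its own chunk.
function packSentences(sentences: string[], chunkSize: number): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  let currentTokens = 0;

  for (const sentence of sentences) {
    const n = countTokens(sentence);
    // Start a new chunk when adding this sentence would exceed the limit.
    if (current.length > 0 && currentTokens + n > chunkSize) {
      chunks.push(current.join(" "));
      current = [];
      currentTokens = 0;
    }
    current.push(sentence);
    currentTokens += n;
  }
  // Flush whatever is left in the last, partially filled chunk.
  if (current.length > 0) chunks.push(current.join(" "));
  return chunks;
}

const sentences = [
  "The cat sat on the mat.",
  "It was warm.",
  "Then the dog arrived and barked loudly.",
];
const packed = packSentences(sentences, 10);
// packed[0] is "The cat sat on the mat. It was warm."
// packed[1] is "Then the dog arrived and barked loudly."
```

The third sentence would push the first chunk past the 10-word limit, so it starts a new chunk instead of being cut in the middle.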

Constructors

constructor

new SentenceSplitter(options?)

Parameters

Name                           Type
options?                       Object
options.chunkOverlap?          number
options.chunkSize?             number
options.chunkingTokenizerFn?   (text: string) => null | RegExpMatchArray
options.paragraphSeparator?    string
options.splitLongSentences?    boolean
options.tokenizer?             any
options.tokenizerDecoder?      any

Defined in

packages/core/src/TextSplitter.ts:67

Properties

chunkOverlap

Private chunkOverlap: number

Defined in

packages/core/src/TextSplitter.ts:60


chunkSize

Private chunkSize: number

Defined in

packages/core/src/TextSplitter.ts:59


chunkingTokenizerFn

Private chunkingTokenizerFn: (text: string) => null | RegExpMatchArray

Type declaration

▸ (text): null | RegExpMatchArray

Parameters

Name   Type
text   string

Returns

null | RegExpMatchArray

Defined in

packages/core/src/TextSplitter.ts:64
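
As an illustration of the signature above, a function of this shape could be written with `String.prototype.match` and a global regular expression. This is a sketch under assumptions: `simpleSentenceTokenizer` is a hypothetical name and the regular expression is not the library's default.

```typescript
// Hypothetical sentence tokenizer matching the documented signature
// (text: string) => null | RegExpMatchArray. Each match is a run of
// non-terminator characters followed by one or more of ".", "!" or "?".
const simpleSentenceTokenizer = (text: string): null | RegExpMatchArray =>
  text.match(/[^.!?]+[.!?]+/g);

const matches = simpleSentenceTokenizer("Hello world. How are you? Fine!");
// matches holds "Hello world.", " How are you?" and " Fine!"
// (leading spaces are part of each match).

// With no terminator in the input, match() finds nothing and returns null,
// which is why the return type includes null.
const noMatch = simpleSentenceTokenizer("no terminator here");
```

Note that this simple pattern drops trailing text that lacks a sentence terminator, so a production tokenizer would need to handle that case.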


paragraphSeparator

Private paragraphSeparator: string

Defined in

packages/core/src/TextSplitter.ts:63


splitLongSentences

Private splitLongSentences: boolean

Defined in

packages/core/src/TextSplitter.ts:65


tokenizer

Private tokenizer: any

Defined in

packages/core/src/TextSplitter.ts:61


tokenizerDecoder

Private tokenizerDecoder: any

Defined in

packages/core/src/TextSplitter.ts:62

Methods

combineTextSplits

combineTextSplits(newSentenceSplits, effectiveChunkSize): TextSplit[]

Parameters

Name                 Type
newSentenceSplits    SplitRep[]
effectiveChunkSize   number

Returns

TextSplit[]

Defined in

packages/core/src/TextSplitter.ts:205


getEffectiveChunkSize

Private getEffectiveChunkSize(extraInfoStr?): number

Parameters

Name            Type
extraInfoStr?   string

Returns

number

Defined in

packages/core/src/TextSplitter.ts:104


getParagraphSplits

getParagraphSplits(text, effectiveChunkSize?): string[]

Parameters

Name                  Type
text                  string
effectiveChunkSize?   number

Returns

string[]

Defined in

packages/core/src/TextSplitter.ts:121


getSentenceSplits

getSentenceSplits(text, effectiveChunkSize?): string[]

Parameters

Name                  Type
text                  string
effectiveChunkSize?   number

Returns

string[]

Defined in

packages/core/src/TextSplitter.ts:147


processSentenceSplits

Private processSentenceSplits(sentenceSplits, effectiveChunkSize): SplitRep[]

Splits sentences into chunks if necessary.

This behavior is not ideal: it can split in the middle of a word or, in non-English text, in the middle of a Unicode code point, so splitting is turned off by default. If you need it, set the splitLongSentences option to true.

Parameters

Name                 Type
sentenceSplits       string[]
effectiveChunkSize   number

Returns

SplitRep[]

Defined in

packages/core/src/TextSplitter.ts:176
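
The caveat about splitting in the middle of a code point can be illustrated by analogy with plain string slicing. The sketch below is not how the library splits (it works on tokens); it only shows, under that assumption, how cutting at a fixed index can break a character apart, and how iterating by code point avoids it.

```typescript
// "😀" is U+1F600, which JavaScript stores as two UTF-16 code units.
const sample = "ok😀";

// Cutting by code-unit index can land between the two halves of the pair,
// leaving a lone high surrogate at the end of the slice.
const naiveHead = sample.slice(0, 3);
const endsWithLoneSurrogate = /[\uD800-\uDBFF]$/.test(naiveHead);

// Array.from iterates a string by code point, so the emoji stays intact.
const codePoints = Array.from(sample);
const safeHead = codePoints.slice(0, 3).join("");
```

Here `sample.length` is 4 while `codePoints.length` is 3, which is the mismatch that makes fixed-index cuts unsafe.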


splitText

splitText(text, extraInfoStr?): string[]

Parameters

Name            Type
text            string
extraInfoStr?   string

Returns

string[]

Defined in

packages/core/src/TextSplitter.ts:297


splitTextWithOverlaps

splitTextWithOverlaps(text, extraInfoStr?): TextSplit[]

Parameters

Name            Type
text            string
extraInfoStr?   string

Returns

TextSplit[]

Defined in

packages/core/src/TextSplitter.ts:269
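
The shape of TextSplit is not described on this page, so the following sketch does not reproduce it. It only illustrates what fixed-length chunks with overlap look like, assuming words as the unit. `chunkWithOverlap` is a hypothetical helper; its `chunkSize` and `chunkOverlap` parameters mirror the constructor option names but the logic is not the library's implementation.

```typescript
// Hypothetical helper: slide a window of `chunkSize` words, stepping by
// chunkSize - chunkOverlap so consecutive chunks share `chunkOverlap` words.
function chunkWithOverlap(
  words: string[],
  chunkSize: number,
  chunkOverlap: number,
): string[][] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be in the range [0, chunkSize)");
  }
  const step = chunkSize - chunkOverlap;
  const result: string[][] = [];
  for (let start = 0; start < words.length; start += step) {
    result.push(words.slice(start, start + chunkSize));
    // Stop once the current window has reached the end of the input.
    if (start + chunkSize >= words.length) break;
  }
  return result;
}

const words = "a b c d e f g".split(" ");
const overlapped = chunkWithOverlap(words, 4, 2);
// overlapped is [["a","b","c","d"], ["c","d","e","f"], ["e","f","g"]]
```

Each chunk repeats the last two words of the previous one, which preserves some context across chunk boundaries at the cost of duplicated text.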