Chapter 10: Text Feature Engineering

When Long-Context Encoders Earn Their Keep (intermediate)

Use full-document encoding only when the label depends on interactions between distant parts of the document; otherwise chunking is usually the better default.
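As a rough illustration of the chunking default, the sketch below splits a token sequence into overlapping windows, encodes each window independently, and mean-pools the results into one document vector. The names `chunk_tokens`, `mean_pool`, and `encode_document`, the `size` and `overlap` parameters, and the toy encoder are all hypothetical and chosen for illustration; in practice a real encoder would stand in for `encode_chunk`.

```python
from typing import Callable, List, Sequence


def chunk_tokens(tokens: Sequence, size: int, overlap: int = 0) -> List[list]:
    """Split tokens into windows of at most `size`, sharing `overlap` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + size]))
        # Stop once a window has reached the end of the document.
        if start + size >= len(tokens):
            break
    return chunks


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average equal-length vectors element by element."""
    if not vectors:
        raise ValueError("cannot pool an empty list of vectors")
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def encode_document(
    tokens: Sequence,
    encode_chunk: Callable[[list], Sequence[float]],
    size: int,
    overlap: int = 0,
) -> List[float]:
    """Encode each chunk on its own, then mean-pool the chunk vectors."""
    chunks = chunk_tokens(tokens, size, overlap)
    return mean_pool([encode_chunk(chunk) for chunk in chunks])


def toy_encoder(chunk: list) -> List[float]:
    """Placeholder for a real encoder: returns (length, sum) of the chunk."""
    return [float(len(chunk)), float(sum(chunk))]


doc_vector = encode_document(list(range(10)), toy_encoder, size=4, overlap=1)
```

Because each chunk is encoded in isolation, no token in one window can attend to a token in another. That is the limitation the rule above points at: when the label depends on interactions between distant parts of the document, pooling independent chunk vectors cannot represent them, and full-document encoding becomes worth its cost.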
