Chapter 10: Text Feature Engineering
When Long-Context Encoders Earn Their Keep (intermediate)
Use full-document encoding only when the label depends on interactions between distant parts of the document; otherwise chunking is usually the better default.
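As a rough illustration of the chunking default, the sketch below splits a token sequence into overlapping windows, embeds each window independently, and mean-pools the results into one document vector. Everything here is an assumption made for illustration: `chunk_tokens`, `toy_embed`, and `embed_by_chunks` are hypothetical names, the window size and overlap are arbitrary, and `toy_embed` is a deterministic stand-in for a real encoder, not an actual model.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def chunk_tokens(tokens: Sequence[str], size: int, overlap: int) -> List[List[str]]:
    """Split tokens into windows of at most `size` tokens; neighbours share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


def toy_embed(tokens: Sequence[str], dim: int = 8) -> Vector:
    """Hypothetical stand-in for an encoder: a normalized histogram of token lengths."""
    vec = [0.0] * dim
    for tok in tokens:
        vec[len(tok) % dim] += 1.0
    total = sum(vec) or 1.0
    return [v / total for v in vec]


def embed_by_chunks(
    tokens: Sequence[str],
    embed: Callable[[Sequence[str]], Vector],
    size: int = 512,
    overlap: int = 64,
) -> Vector:
    """Embed each chunk on its own, then mean-pool the chunk vectors."""
    if not tokens:
        raise ValueError("document has no tokens")
    vectors = [embed(chunk) for chunk in chunk_tokens(tokens, size, overlap)]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


if __name__ == "__main__":
    doc = "the covenant in section two is waived by the clause in section nine".split()
    print(len(chunk_tokens(doc, size=6, overlap=2)), "chunks")
    print(embed_by_chunks(doc, toy_embed, size=6, overlap=2))
```

Because each chunk is encoded in isolation and the vectors are only averaged afterwards, no chunk ever sees tokens from another chunk beyond the overlap. That is the trade-off behind the rule above: when the label depends on how distant passages relate to each other, pooling independent chunks cannot represent that relationship, and a full-document encoder becomes worth its cost.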