What language model to train if you have one million GPU hours?

Teven Le Scao, Thomas Wang, Daniel Hesslow, Lucile Saulnier
arXiv preprint arXiv:2210.15424

11
2022
Holographic Global Convolutional Networks for Long-Range Prediction Tasks in Malware Detection

Mohammad Mahmudul Alam, Edward Raff, Stella R Biderman, Tim Oates
International Conference on Artificial Intelligence and Statistics, 4042-4050

1
2024
The Pile: An 800GB Dataset of Diverse Text for Language Modeling

Travis Hoppe, Jason Phang, Horace He, Stella Biderman
arXiv: Computation and Language

108
2020
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

Julia Kreutzer, Isaac Caswell, Lisa Wang, Ahsan Wahab
arXiv: Computation and Language

66
2021
GPT-NeoX-20B: An Open-Source Autoregressive Language Model

Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony
arXiv preprint arXiv:2204.06745

73
2022
VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance

Katherine Crowson, Stella Biderman, Daniel Kornis, Dashiell Stander
European Conference on Computer Vision (ECCV), 88-105

68
2022
GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-TensorFlow

Sid Black, Leo Gao, Phil Wang, Connor Leahy
Software release (Zenodo)

252
2021
BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing

Jason Fries, Leon Weber, Natasha Seelam, Gabriel Altay
Advances in Neural Information Processing Systems 35, 25792-25806

5
2022
Data Governance in the Age of Large-Scale Data-Driven Language Technology

Yacine Jernite, Huu Nguyen, Stella Biderman, Anna Rogers
ACM Conference on Fairness, Accountability, and Transparency (FAccT), 2206-2222

13
2022
EleutherAI: Going Beyond "Open Science" to "Science in the Open"

Jason Phang, Herbie Bradley, Leo Gao, Louis Castricato
arXiv preprint arXiv:2210.06413

2022
Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources

Angelina McMillan-Major, Zaid Alyafeai, Stella Biderman, Kimbo Chen
arXiv preprint arXiv:2201.10066

4
2022
Datasheet for the Pile

Stella Biderman, Kieran Bicheno, Leo Gao
arXiv preprint arXiv:2201.07311

6
2022
Cut the CARP: Fishing for zero-shot story evaluation

Shahbuland Matiana, JR Smith, Ryan Teehan, Louis Castricato
arXiv preprint arXiv:2110.03111

4
2021
OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization

Gustaf Ahdritz, Nazim Bouatta, Christina Floristean, Sachin Kadyan
bioRxiv 2022-11

11
2022
Fooling MOSS Detection with Pretrained Language Models

Stella Biderman, Edward Raff
ACM International Conference on Information and Knowledge Management (CIKM), 2933-2943

1
2022
Llemma: An Open Language Model for Mathematics

Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos
arXiv preprint arXiv:2310.10631

87
2023
Crosslingual Generalization through Multitask Finetuning

Niklas Muennighoff, Thomas Wang, Lintang Sutawika, Adam Roberts
arXiv preprint arXiv:2211.01786

409
2022
You Reap What You Sow: On the Challenges of Bias Evaluation Under Multilingual Settings

Zeerak Talat, Aurélie Névéol, Stella Biderman, Miruna Clinciu
Proceedings of BigScience Episode #5: Workshop on Challenges & Perspectives in Creating Large Language Models, 26-41

73
2022
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick

1,242
2023
BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting

Zheng-Xin Yong, Hailey Schoelkopf, Niklas Muennighoff, Alham Fikri Aji
arXiv preprint arXiv:2212.09535

36
2022