Abstract: Modern language models (LMs) increasingly require two critical resources: computational resources and data resources. Data selection techniques can effectively reduce the amount of training ...
Abstract: In many real-world applications, sorting is a crucial operation on data. Sorting algorithms are methods for rearranging a collection of unsorted items into a desired format or order. A lot of ...
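As a minimal illustration of what "rearranging a collection of unsorted items into a desired order" means, the sketch below uses insertion sort. It is one simple algorithm among many and is not necessarily one the abstract discusses. The function name `insertion_sort` and its `key` parameter, which lets the caller define the desired order, are illustrative assumptions.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def insertion_sort(items: Iterable[T], key: Callable[[T], object] = lambda x: x) -> List[T]:
    """Return a new list with items in ascending order of key(item).

    Illustrative insertion sort: stable, O(n^2) comparisons in the worst case.
    """
    result = list(items)  # copy so the input is left untouched
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift elements whose key is larger than current's key one slot right.
        while j >= 0 and key(result[j]) > key(current):
            result[j + 1] = result[j]
            j -= 1
        # Place current into the gap left by the shifted elements.
        result[j + 1] = current
    return result


if __name__ == "__main__":
    print(insertion_sort([5, 2, 9, 1, 5]))
    print(insertion_sort(["bb", "a", "ccc"], key=len))
```

Because equal keys are never swapped past each other, the sort is stable, which matters when sorting records by one field at a time.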
This project investigates token quality from a noisy-label perspective and proposes a generic token cleaning pipeline for SFT tasks. Our method filters out uninformative tokens while preserving those ...
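The snippet does not say how token informativeness is measured. The following is a minimal sketch that assumes a per-token score is already available. It shows one common way to exclude tokens from an SFT loss: their labels are replaced with -100, which is the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The `clean_token_labels` function, the `scores` input, and the `threshold` value are hypothetical and are not the project's actual method.

```python
from typing import List, Sequence

# Label value skipped by the loss; -100 is PyTorch CrossEntropyLoss's default ignore_index.
IGNORE_INDEX = -100


def clean_token_labels(labels: Sequence[int], scores: Sequence[float], threshold: float) -> List[int]:
    """Mask the labels of tokens whose (hypothetical) informativeness score is below threshold.

    Tokens that are already masked (e.g. prompt tokens) stay masked.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    cleaned = []
    for label, score in zip(labels, scores):
        if label == IGNORE_INDEX or score < threshold:
            cleaned.append(IGNORE_INDEX)  # token contributes nothing to the loss
        else:
            cleaned.append(label)  # informative token is kept as a training target
    return cleaned


if __name__ == "__main__":
    print(clean_token_labels([101, 7, 42, 9], [0.9, 0.1, 0.6, 0.05], threshold=0.5))
```

Masking labels rather than deleting tokens keeps every token in the input context, so only the training signal changes, not what the model conditions on.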