Katsurai Laboratory / Doshisha University

Numeric Information Extraction from Dataset Descriptions (2025)

2025-12-16

NIED: A Corpus for Numeric Information Extraction from Dataset Descriptions

Abstract: Although a large number of machine learning datasets have been proposed, their descriptions are often insufficiently structured, making it difficult to search datasets using quantitative criteria. To address this issue, we construct NIED, an annotated corpus consisting of 3,926 dataset descriptions collected from academic papers and data repositories, designed for the task of extracting numerical information from dataset descriptions. In addition, we propose a two-stage labeling scheme that distinguishes numerical entities from contextual non-numerical information.
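To make the extraction task concrete, the following is a minimal rule-based sketch of what pulling a number and its surrounding context out of a dataset description might look like. It is not the NIED annotation scheme or the authors' method: the regular expression, the function name `extract_numeric_mentions`, the `window` parameter, and the output fields are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical pattern (an assumption, not taken from the paper): numbers with
# thousands separators such as "3,926", or plain integers/decimals such as "0.85".
NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")


def extract_numeric_mentions(text, window=3):
    """Find number spans, then attach the words that follow each as context."""
    mentions = []
    for match in NUMBER.finditer(text):
        # Step 1: the numeric span itself, normalized to a float.
        value = float(match.group().replace(",", ""))
        # Step 2: up to `window` following words, stripped of punctuation,
        # as a crude stand-in for the non-numerical context.
        following = text[match.end():].split()[:window]
        context = " ".join(word.strip(".,;:()") for word in following)
        mentions.append(
            {"span": (match.start(), match.end()), "value": value, "context": context}
        )
    return mentions


example = "The corpus consists of 3,926 dataset descriptions."
mentions = extract_numeric_mentions(example)
```

A fixed word window is a deliberately naive stand-in for context. An annotated corpus is useful precisely because rules like this break down on real descriptions, where the quantity a number refers to may sit far from the number itself.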

Authors: Moriyuki Kamoto, Akihiro Tamura, Marie Katsurai

Publication venue: JCDL 2025

Corpus

  • GitHub

Reference

Moriyuki Kamoto, Akihiro Tamura, and Marie Katsurai, “NIED: A Corpus for Numeric Information Extraction from Dataset Descriptions,” JCDL 2025, to appear.
