It is often stated that 80% of the time in an analysis project is devoted to data cleaning. This is certainly a challenge for libraries, as their data comes from so many different sources. The getting, cleaning, and transforming phase feeds into the visualization and modeling phase, yet we often gloss over this part of our work. This presentation will break down the challenges of collecting and creating collections datasets and merging them into interactive visualizations. These challenges include gathering and cleaning data, fuzzy merging messy text strings, reshaping data from wide to long format, and making decisions on handling duplicate and missing values. Wrangling data into an interactive visualization with data filters adds immense value by enlarging the context of decision-making.
This presentation will discuss case studies demonstrating ways that data expertise has elevated our work with collections and in the creation and dissemination of scholarship. Discussing the challenges of data wrangling openly will make assessment more feasible for librarians who want to review their collections and projects. It will also serve as another call to data providers to supply clean, standardized, and interoperable data.
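
As a rough illustration of the wrangling steps named above (fuzzy merging messy text strings, reshaping from wide to long format, and deciding how to handle duplicate and missing values), the following is a minimal sketch using only the Python standard library. The sample records, the title list, the function names `fuzzy_match` and `clean_and_reshape`, and the 0.75 similarity cutoff are all hypothetical and chosen for illustration; they are not the data or methods used in the case studies.

```python
from difflib import get_close_matches

# Hypothetical usage report in wide format: one column per year.
usage_wide = [
    {"title": "Journal of Library Admin.", "2021": 120, "2022": None},
    {"title": "Collection Managment", "2021": 45, "2022": 60},
    {"title": "Collection Managment", "2021": 45, "2022": 60},  # duplicate row
]

# Hypothetical authoritative list of titles to merge against.
catalog_titles = ["Journal of Library Administration", "Collection Management"]


def fuzzy_match(name, choices, cutoff=0.75):
    """Return the closest title in choices, or None if nothing is similar enough.

    The cutoff is an assumed value; a real project would tune it and
    review borderline matches by hand.
    """
    matches = get_close_matches(name, choices, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def clean_and_reshape(rows, choices):
    """Standardize titles, drop duplicates, and reshape wide rows to long rows."""
    seen = set()
    long_rows = []
    for row in rows:
        # Fall back to the original string when no confident match exists.
        title = fuzzy_match(row["title"], choices) or row["title"]
        for year in (key for key in row if key != "title"):
            if (title, year) in seen:
                continue  # skip duplicate title/year combinations
            seen.add((title, year))
            # Missing values stay as None instead of being silently set to 0.
            long_rows.append({"title": title, "year": year, "uses": row[year]})
    return long_rows


long_rows = clean_and_reshape(usage_wide, catalog_titles)
for record in long_rows:
    print(record)
```

Keeping missing values explicit and keeping only the first of each duplicate are just one possible set of choices; the appropriate decision depends on the dataset and on the question the visualization is meant to answer.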