Lab 3 - Data Cleaning and Validating

Created Monday 17 November 2014

By the end of this lab we should be able to clean data (validity, accuracy, completeness, consistency and uniformity) and should have gained the ability to assess the best method or tool for cleaning a given data set.

  • Clear formatting for further processing
  • Importing, Paste, Paste Special
  • Changing rows into columns (Paste Special - Transpose)
  • Remove white space or new lines inside cells
  • Find missing data or empty cells
  • Deal with typos (filters, Data Validation)
  • Column formatting - Decimals, Date, Currency, % etc.
  • Apply conditional formatting to the columns
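
Several of the steps listed above (removing white space and new lines inside cells, finding empty cells, changing rows into columns) can also be sketched outside a spreadsheet. The snippet below is a minimal illustration in plain Python, not the lab's prescribed method. The sample rows and the function names clean_cell, clean_rows, find_empty_cells and transpose are made up for this example.

```python
# Illustrative sketch only: the sample rows and all function names are hypothetical.

def clean_cell(value):
    """Collapse new lines and repeated spaces inside a cell and trim both ends."""
    return " ".join(str(value).split())


def clean_rows(rows):
    """Apply clean_cell to every cell of every row."""
    return [[clean_cell(cell) for cell in row] for row in rows]


def find_empty_cells(rows):
    """Return (row_index, column_index) pairs for cells that are empty after cleaning."""
    return [
        (r, c)
        for r, row in enumerate(rows)
        for c, cell in enumerate(row)
        if clean_cell(cell) == ""
    ]


def transpose(rows):
    """Turn rows into columns, similar to Paste Special - Transpose.

    Assumes every row has the same number of cells.
    """
    return [list(column) for column in zip(*rows)]


# Made-up data with stray spaces, a new line and one blank cell.
raw = [
    ["Name", "Maths", "Science"],
    ["  Asha \n", "78", " "],
    ["Ravi", " 65", "81\n"],
]

cleaned = clean_rows(raw)
missing = find_empty_cells(cleaned)
flipped = transpose(cleaned)

print(cleaned)
print(missing)
print(flipped)
```

Here missing lists the positions of blank cells so they can be reviewed by hand, which is roughly what filtering on empty cells does in a spreadsheet.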

Sample Exercise Questions

  • Clean the school marks data for analysis
  • Clean the BBMP data for analysis

Example Data Sets for Lab


LibreOffice Calc

  • LibreOffice is available for Linux, Windows and Mac
  • Calc is its spreadsheet application

Data Explorer

Other tools

  • OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase.