Lab 3 - Data Cleaning and Validating
Created Monday 17 November 2014
By the end of this lab we should be able to clean data along five dimensions (validity, accuracy, completeness, consistency and uniformity) and to assess which method or tool is best suited to cleaning a given data set.
- Clear formatting for further processing
- Importing, Paste, Paste Special
- Changing rows into columns (Paste Special - Transpose)
- Remove white space or new lines inside cells
- Find missing data or empty cells
- Deal with typos (filters, Data Validation)
- Column formatting - Decimals, Date, Currency, % etc
- Apply conditional formatting to columns
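Several of the steps above (removing white space and new lines inside cells, finding empty cells, changing rows into columns) can also be sketched outside a spreadsheet. The following is a minimal illustration in Python using only the standard library, not the method used in the lab itself. The sample rows and the function names (clean_cell, clean_rows, find_empty_cells, transpose) are hypothetical and exist only for this example.

```python
import re

# Hypothetical sample rows standing in for a range pasted from a spreadsheet.
rows = [
    ["School ", "Ward", "Students"],
    ["  Govt High\nSchool", "12", " 340 "],
    ["Girls  Primary School", "", "125"],
]


def clean_cell(value):
    """Collapse runs of white space (including new lines) and trim both ends."""
    return re.sub(r"\s+", " ", value).strip()


def clean_rows(table):
    """Apply clean_cell to every cell of the table."""
    return [[clean_cell(cell) for cell in row] for row in table]


def find_empty_cells(table):
    """Return (row_index, column_index) for every empty cell."""
    return [
        (r, c)
        for r, row in enumerate(table)
        for c, cell in enumerate(row)
        if cell == ""
    ]


def transpose(table):
    """Turn rows into columns; assumes every row has the same length."""
    return [list(column) for column in zip(*table)]


cleaned = clean_rows(rows)
empty = find_empty_cells(cleaned)
columns = transpose(cleaned)

print(cleaned)
print("Empty cells:", empty)
print("First column:", columns[0])
```

Cleaning before searching for empty cells matters here: a cell that holds only spaces looks filled until it is trimmed.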
Sample Exercise Questions
- Clean the school marks data for analysis
- Clean the BBMP data for analysis
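For the school marks exercise, the kind of check that Data Validation performs in a spreadsheet can be sketched as a small script. This is only an illustration under stated assumptions: the column names (student, subject, mark), the allowed subject list, the 0 to 100 mark range and the sample records are all hypothetical and are not taken from the lab data sets.

```python
# Hypothetical list of accepted subject names, like a validation drop-down list.
ALLOWED_SUBJECTS = {"Maths", "Science", "English"}

# Hypothetical records; marks are kept as text, as they would arrive from a CSV file.
records = [
    {"student": "A", "subject": "Maths", "mark": "78"},
    {"student": "B", "subject": "Sceince", "mark": "65"},   # typo in subject
    {"student": "C", "subject": "English", "mark": "105"},  # mark out of range
    {"student": "D", "subject": "Science", "mark": ""},     # missing mark
]


def validate_record(record):
    """Return a list of problems found in one record (empty list if none)."""
    problems = []
    if record["subject"] not in ALLOWED_SUBJECTS:
        problems.append("unknown subject: " + repr(record["subject"]))
    try:
        mark = float(record["mark"])
    except ValueError:
        problems.append("mark is not a number: " + repr(record["mark"]))
    else:
        # Assumed valid range for this example: 0 to 100 inclusive.
        if not 0 <= mark <= 100:
            problems.append("mark out of range: " + record["mark"])
    return problems


# Map each student with at least one problem to the list of problems found.
report = {}
for record in records:
    problems = validate_record(record)
    if problems:
        report[record["student"]] = problems

for student, problems in report.items():
    print(student, problems)
```

Reporting problems rather than silently correcting them keeps the decision with the person cleaning the data, which is closer to how filters and validation rules are used in Calc.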
Example Data Sets for Lab
- BBMP Schools info
- BBMP Park Info
- Assets and Liabilities of BBMP Councillors
- DICE School Data
Tools
LibreOffice Calc
- LibreOffice is available for Linux, Windows and Mac
- Calc is its spreadsheet application
Data Explorer
- Data Explorer is an online tool
Other tools
- OpenRefine - a powerful tool for working with messy data: cleaning it, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase.