Like it or not, one of the biggest problems with data is that humans tend to refer to the same item/ place in different ways. Humans can interpret minor differences. So, if I say Washington or Washington DC you will associate them together. However, a computer needs them to be (EXACTLY) the same.
You will get data that needs cleaning because you are typically asked questions about issues that are not strictly controlled in the IT system.
Cleaning data in Excel is a separate course and can involve almost anything. We have just highlighted some useful clean up methods to help but if your issue is more about getting the data in the correct format, look at our Data Clean up course.
Before we show you this though. One rule for cleaning up the data.
ALWAYS WORK ON THE COPY OF THE DATA, NEVER THE ORIGINAL- YOU NEVER KNOW WHEN YOU NEED TO GO BACK TO THE ORIGINAL
You also need to decide what tools to use depending on how often you will need to create this report. If it is a once off exercise, then certain tools will be useful to quickly do it. But if you are going to get the same dirty data every month and redo all the steps then it might be worthwhile spending a bit more time and creating a template to do the cleanup for you.
In the next lessons we will cover quicker ways to:
- Change all the Y’s and N’s to Yes and No and addressing similar issues
- Create a template to automatically clean errors that you know will appear every time you download the data