The structure of data
Data is everywhere, and it can be stored in many ways. Two general categories of data are:
- Structured data: Organized in a certain format, such as rows and columns.
- Unstructured data: Not organized in any easy-to-identify way.
For example, when we rate our favorite restaurant online, we’re creating structured data. But when we use Google Earth to check out a satellite image of a restaurant location, we’re using unstructured data.
Here’s a refresher on the characteristics of structured and unstructured data:
Structured Data
Structured data is organized in a specific way, which makes it easier for businesses to store the data and find the information they need. When you export structured data, its format remains intact.
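To make the idea of an intact format concrete, here is a minimal Python sketch. It assumes a small, made-up table of restaurant ratings (the restaurant names, column names, and helper functions are hypothetical, not from any real dataset) and uses Python's standard csv module to export the rows and read them back.

```python
import csv
import io

# Hypothetical restaurant ratings, stored as rows with named columns.
# All values are kept as text, which is how CSV stores them.
ratings = [
    {"restaurant": "Blue Door Cafe", "stars": "5", "visit_date": "2023-04-01"},
    {"restaurant": "Noodle House", "stars": "3", "visit_date": "2023-04-09"},
]


def export_to_csv(rows):
    """Write the rows to CSV text, keeping the column layout."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def import_from_csv(text):
    """Read CSV text back into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


exported = export_to_csv(ratings)
restored = import_from_csv(exported)

# Because the data is structured, finding information is a simple lookup
# by column name.
five_star = [row["restaurant"] for row in restored if row["stars"] == "5"]
```

In this sketch the restored rows have the same columns and values as the originals, and finding the five-star restaurants takes a single lookup by column name. A satellite image or a free-text review offers no such columns to look up.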
Unstructured Data
Unstructured data doesn’t have a clear organization. There is much more unstructured data than structured data in the world. Examples of unstructured data include videos, audio files, text files, social media content, images, presentations, PDFs, open-ended survey responses, and websites.
The Fairness Issue
Because unstructured data lacks organization, it is challenging to search, manage, and analyze. However, recent advancements in artificial intelligence and machine learning are making these tasks easier. The new challenge for data scientists is to ensure that these tools are fair and unbiased. If they are not, some elements of the data will be given more importance or representation than others. An unfair dataset leads to skewed outcomes, low accuracy, and unreliable analysis, all of which should be avoided.
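One simple way to spot uneven representation is to measure how much of a dataset each group accounts for. The Python sketch below is only an illustration: the review records, the neighborhood field, the helper functions, and the 15% cutoff are all assumptions made for this example, and representation is just one aspect of fairness.

```python
from collections import Counter


def representation_shares(records, key):
    """Return each group's share of the records, as a fraction of the total."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def underrepresented(shares, min_share):
    """List the groups whose share falls below the chosen cutoff."""
    return sorted(group for group, share in shares.items() if share < min_share)


# Hypothetical restaurant reviews, labeled by neighborhood.
reviews = (
    [{"neighborhood": "Downtown"}] * 7
    + [{"neighborhood": "Riverside"}] * 2
    + [{"neighborhood": "Hillcrest"}] * 1
)

shares = representation_shares(reviews, "neighborhood")

# The 0.15 cutoff is an arbitrary assumption chosen for this example.
flagged = underrepresented(shares, min_share=0.15)
```

Here Downtown supplies 7 of the 10 reviews and Hillcrest only 1, so Hillcrest is flagged. An analysis built on these reviews would mostly reflect Downtown, which is the kind of skewed outcome described above.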