Data Science Consultancy

Getting started

Publish date: 2020-06-22
Last updated: 2020-08-25
Tags:
  • general

Table of contents

  1. Introduction
  2. Data sources
  3. Use cases
  4. Conclusion

I’ll be using this blog to share some general ideas, how-tos and projects. To stay tuned, feel free to subscribe to the RSS feed.

This first blog post is about ideation for a data science project.

Introduction

Let’s assume you are a business owner. You have heard good things about using data for automation and decision making, but you’re not sure how this could apply to your business. What you need are project ideas. Ideally, you’ll assemble a group of people representing business, IT/data engineering and data science and set up a workshop to brainstorm. Each group has its own role:

  • business: identifies business problems and defines metrics for success
  • data science: proposes solutions
  • IT/data engineering: makes sure the solutions are feasible in production

To give you a leg up before you actually get to the workshop, check out this excellent checklist and then let’s review some types of data that many businesses have at their disposal [1] and some generic use cases.

Data sources

When you want to start a data-driven project, it makes sense to first look at what data you have. You have probably accumulated several legacy systems over the years - e.g. at some point you started using Customer Relationship Management software. The design decisions taken back then are still relevant: they determine what customer data you gather and what structure that data has. The same goes for the other legacy systems - they were built for a specific business purpose, not for data management. This makes sense and it’s fine, just something to keep in mind going forward.

What a simple data model looks like. License: Wuser6 / CC BY-SA Source

Below you will find an overview of data sources, with a more detailed explanation for each source.

Examples are fabricated and for illustrative purposes only.

Master data is also known as fact data, or reference data. To get a bit abstract here, every sales transaction is just a contract between two legal entities, having a specific subject matter. It is common to store the abstract sales transaction (see Transactional data) separately from the detailed description of the legal entities involved (your customer and you) and the subject matter (product). So common, in fact, that it has its own name: master data.

  • Product
    name | description | category | price
    Phone | 64GB 3GB RAM | Consumer Electronics | 563.99€
  • Customer
    name | postal code | email
    Allison Hill | 01352 | al.hill@gmail.com
  • Employee
    name | job title | job description
    Ashley James | 3D Modeler | Multimedia Artists and Animators
  • Accounting
    E.g. taxes
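
The separation between master data and transactional data can be illustrated with a minimal Python sketch. It reuses the fabricated Product and Customer records from above. The identifiers `P1` and `C1`, the field names and the function `describe_sale` are assumptions made up for this illustration, not part of any real system.

```python
from dataclasses import dataclass


# Master data: detailed descriptions of the product and the customer.
@dataclass(frozen=True)
class Product:
    name: str
    description: str
    category: str
    price_eur: float


@dataclass(frozen=True)
class Customer:
    name: str
    postal_code: str
    email: str


# Fabricated records, keyed by hypothetical identifiers.
products = {"P1": Product("Phone", "64GB 3GB RAM", "Consumer Electronics", 563.99)}
customers = {"C1": Customer("Allison Hill", "01352", "al.hill@gmail.com")}

# Transactional data: only identifiers plus transaction-specific fields.
sales = [
    {"customer_id": "C1", "product_id": "P1", "fulfillment_status": "shipped"},
]


def describe_sale(sale):
    """Resolve the identifiers in a sale against the master data."""
    customer = customers[sale["customer_id"]]
    product = products[sale["product_id"]]
    return f"{customer.name} bought {product.name} ({sale['fulfillment_status']})"


print(describe_sale(sales[0]))
```

Because the sale only stores identifiers, a corrected email address or product description has to be changed in one place only.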

For machine data there are two major groups: logs and measurements. Logs are among the most reliable data you can find, and I think they’re extremely under-represented in popular data-driven news. Sensor measurements are not machine data in the strict sense, but they are recorded by a machine. Measurement technology makes use of physical properties and can be anything from a photo tube (converts incoming photons to electric current) to a piezoelectric sensor (a crystal that converts e.g. pressure change to electric current).

  • Logs

    • Database
      E.g. timestamp, query, execution time

    • Website
      E.g. timestamp, query, status, size, referrer

    • Production
      E.g. timestamp, part identifier, production line stage, quality control passed

  • Measurements

    • Sensors, e.g. light
      E.g. timestamp, value, unit, sensor ID

  • Sales
    E.g. customer identifier, product identifier, fulfillment status

  • Email
    E.g. timestamp, from, to, subject, X-Folder, Message-Text

  • Calls
    E.g. timestamp, number from, number to, duration

  • Social media
    E.g. timestamp, network, text

  • Tracking inventory
    E.g. name, description, category, price

  • Fulfilling orders

  • Supporting customers
    E.g. service tickets

  • Domain wiki
    E.g. title, text, category

  • Strategy
    E.g. verifiable forecast, associated management actions

  • Constraints
    E.g. require email address to contain “@”, password to contain special characters, sensor measurement to be in sensor value range

  • Quality
    E.g. value (1-good, 2-mediocre, 3-bad), description, category

E.g. active, creation time, last update time
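
The constraints mentioned in the list above (email address containing “@”, password containing special characters, sensor measurement within the sensor value range) could be checked roughly as follows. This is a sketch under assumptions: the function names are made up, “special character” is interpreted as anything that is not an ASCII letter or digit, and the sensor range of 0 to 100 is a placeholder.

```python
import re

# Hypothetical sensor value range; a real range would come from the sensor's data sheet.
SENSOR_MIN, SENSOR_MAX = 0.0, 100.0


def email_ok(email: str) -> bool:
    """The address must contain "@" (a deliberately weak check, as in the text)."""
    return "@" in email


def password_ok(password: str) -> bool:
    """The password must contain at least one character that is not an ASCII letter or digit."""
    return re.search(r"[^A-Za-z0-9]", password) is not None


def measurement_ok(value: float) -> bool:
    """The measurement must lie within the assumed sensor value range."""
    return SENSOR_MIN <= value <= SENSOR_MAX


print(email_ok("al.hill@gmail.com"), password_ok("s3cret!"), measurement_ok(42.0))
```

Records that fail such checks are worth counting rather than silently dropping, since the failure rate itself says something about data quality.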

Use cases

Recommendation

Recommendation can increase revenue by suggesting products that customers wouldn’t have thought to look for. It can also reduce lost revenue from customers who are looking for something you actually offer but can’t find it. It can even take the form of a search engine that increases employee productivity by recommending relevant company wiki articles.

Recommendation is more than an opportunity for a business to increase sales - it can be a real convenience for customers (or employees), even if it’s as simple as “you’re buying a screw, maybe you need a wall plug”.

The key is other people’s transactions, be it customer purchases or employees’ access logs for wiki articles. You’re leveraging the effort people put into e.g. finding new products: by mining patterns in your sales, you can suggest those products to other customers.
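
To make “mining patterns in your sales” a bit more concrete, here is a minimal sketch that counts which products are bought together and recommends the most frequent companion. The baskets are fabricated, and `baskets`, `co_purchases` and `recommend` are names chosen for this illustration. Real recommender systems are considerably more involved.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Fabricated baskets: each set is one customer's purchase.
baskets = [
    {"screw", "wall plug"},
    {"screw", "wall plug", "drill"},
    {"drill", "drill bit"},
    {"hammer", "nail"},
]

# For every product, count how often each other product appears in the same basket.
co_purchases = defaultdict(Counter)
for basket in baskets:
    for item, other in permutations(basket, 2):
        co_purchases[item][other] += 1


def recommend(item, k=1):
    """Return up to k products most often bought together with `item`."""
    return [other for other, _ in co_purchases[item].most_common(k)]


print(recommend("screw"))
```

With these baskets, a customer buying a screw is pointed to the wall plug, because the two were bought together more often than any other pair involving screws.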

Classification

In the business of reselling, it’s crucial to gather enough structured data on the items being sold, so that buyers can find what they’re looking for. This can mean manually entering extremely detailed data, a pain point for sellers. Enter image classification: it can suggest e.g. a car’s make and automatically pull all available associated data, e.g. engine capacity. The seller only has to review this data instead of entering everything by hand. [1]

Similarly, you might want to use communication data (emails, social media posts) to derive metrics you care about - e.g. percentage of negative interactions, recurring topics, etc.
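
As a rough illustration of deriving a “percentage of negative interactions”, the following sketch trains a tiny naive Bayes text classifier in plain Python and applies it to two new messages. The technique is my choice for the example, not something prescribed above. The training messages, labels and all names are fabricated, and four training examples are far too few for real use.

```python
import math
from collections import Counter

# Fabricated, hand-labeled training messages.
training = [
    ("the delivery was late and the phone is broken", "negative"),
    ("terrible support, I want a refund", "negative"),
    ("great phone, fast delivery", "positive"),
    ("thank you for the quick and friendly support", "positive"),
]


def tokenize(text):
    """Split on whitespace, strip basic punctuation, lowercase."""
    return [word.strip(".,!?").lower() for word in text.split()]


# Count words per label and messages per label.
word_counts = {"negative": Counter(), "positive": Counter()}
label_counts = Counter()
for text, label in training:
    word_counts[label].update(tokenize(text))
    label_counts[label] += 1
vocabulary = set(word_counts["negative"]) | set(word_counts["positive"])


def classify(text):
    """Return the label with the highest naive Bayes log-probability."""
    scores = {}
    for label, counts in word_counts.items():
        total = sum(counts.values())
        score = math.log(label_counts[label] / sum(label_counts.values()))
        for word in tokenize(text):
            # Laplace smoothing so unseen words do not zero out the probability.
            score += math.log((counts[word] + 1) / (total + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


messages = ["my phone arrived broken", "friendly and quick, thank you"]
labels = [classify(m) for m in messages]
share_negative = labels.count("negative") / len(labels)
print(labels, share_negative)
```

The metric itself is just the share of messages labeled negative. The hard part in practice is getting enough labeled examples for the classifier to be trustworthy.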

Forecasting

Forecasting is meant to improve decision making. If you want to manage your supply chain to reduce storage costs, it makes sense to forecast shop sales. The same goes for energy producers forecasting energy consumption to better match production at any given moment. One more example: mining your past project data can give you a better estimate of the resources to allocate to future projects.
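
A very simple baseline for forecasting shop sales is a moving average: predict next week’s sales as the mean of the last few weeks. The sketch below assumes fabricated weekly sales figures and a window of three weeks. Both are arbitrary choices for illustration, and a real forecast would usually also account for trend and seasonality.

```python
# Fabricated weekly unit sales for one shop.
weekly_sales = [120, 135, 128, 140, 150, 146]


def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(history) < window:
        raise ValueError("need at least `window` observations")
    recent = history[-window:]
    return sum(recent) / window


next_week = moving_average_forecast(weekly_sales)
print(round(next_week, 1))
```

A baseline like this is mainly useful as a yardstick: a more sophisticated model should at least beat it before it is used for supply chain decisions.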

Regression

Regression means predicting a number from input data - e.g. a house price based on location, number of rooms and area. One application is price optimization, such as adjusting airplane ticket prices based on demand.
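
The house price example can be sketched with a simple linear regression on a single input, the area. The listings below are fabricated, and the function name `fit_line` is made up for this illustration. A realistic model would use more features (location, number of rooms) and much more data.

```python
# Fabricated listings: (area in square metres, price in thousand euros).
listings = [(50, 150), (70, 200), (90, 250), (110, 300)]


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sums of cross-deviations and of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(listings)
predicted_price = slope * 80 + intercept  # price estimate for an 80 m² house
print(slope, intercept, predicted_price)
```

The fitted slope reads directly as “additional price per extra square metre”, which is often as useful to the business as the prediction itself.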

Conclusion

Improving business processes - either by solving pain points or by making use of a growth opportunity - is a constant challenge, and defining projects that achieve such improvements is hard. One resource at your disposal is the data available from business processes. I hope this short overview of different types of such data, including examples of how they can be leveraged, will prove helpful.


© 2020 Datalign UG · Except where noted, content is licensed CC-BY