Data Engineer: Getting to the heart of the data
Sometimes referred to as the "Big Data engineer", the data engineer is the first actor in the data-processing chain. Their work comes upstream of the data scientist's, directly after the technical infrastructure has been put in place by the architects and administrators.
As an engineer, they design and build, but in a very specific field: data. Their daily work consists of connecting to multiple data sources, cross-referencing data, performing cleansing and filtering operations, managing data storage across different databases, handling various data formats, and potentially producing cross-referenced reports from these data.

Their missions
- Develop and implement processes for data collection, organization, storage and modeling.
- Ensure access to, and the quality of, the various data sources.
- Put the predictive models created by data scientists into production.
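The daily work described above — collecting records, cleansing them, and storing them in a database — can be sketched as a tiny extract-transform-load pipeline. This is a minimal illustration, not a production design: the record fields, table name, and sample data are all hypothetical.

```python
import sqlite3

# Hypothetical raw records, as if pulled from an upstream source.
# Field names and values are illustrative only.
raw_records = [
    {"id": 1, "name": " Alice ", "age": "34"},
    {"id": 2, "name": "Bob", "age": None},   # incomplete row -> dropped
    {"id": 3, "name": "carol", "age": "29"},
]

def clean(records):
    """Transform: drop incomplete rows and normalize field formats."""
    cleaned = []
    for r in records:
        if r["age"] is None:
            continue
        cleaned.append({
            "id": r["id"],
            "name": r["name"].strip().title(),  # trim and standardize case
            "age": int(r["age"]),               # coerce to a numeric type
        })
    return cleaned

def load(records, conn):
    """Load: persist the cleaned rows in a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS people "
        "(id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
    )
    conn.executemany("INSERT INTO people VALUES (:id, :name, :age)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
rows = clean(raw_records)
load(rows, conn)
print(conn.execute("SELECT name, age FROM people ORDER BY id").fetchall())
# → [('Alice', 34), ('Carol', 29)]
```

In a real pipeline the same three steps survive, only the scale changes: the in-memory list becomes a stream from Kafka or files on a distributed filesystem, and SQLite becomes a warehouse.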
Their key competencies
- Master the languages and tools used for data management, such as Java, Python, R, SQL, NoSQL databases, and Hadoop
- Understand and know how to apply data modeling techniques
- Have strong software development skills and know how to write clean code
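Of the competencies above, data modeling is the most concrete to illustrate. A minimal sketch, assuming a classic customers/orders example (the schema and names are hypothetical, not from the original text): a normalized relational model splits data into tables linked by keys, and a join recombines them for reporting.

```python
import sqlite3

# A normalized schema: orders reference customers through a foreign key,
# so each customer's name is stored exactly once.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL
);
""")
conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders VALUES (10, 1, 42.5), (11, 1, 7.5)")

# A join recombines the normalized tables into a report row.
total = conn.execute(
    "SELECT c.name, SUM(o.amount) FROM customers c "
    "JOIN orders o ON o.customer_id = c.customer_id GROUP BY c.name"
).fetchone()
print(total)
# → ('Alice', 50.0)
```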
An inspiring data engineer
A good example of a data engineer is Jeffrey Dean, known as Jeff Dean. He worked on MapReduce, a system for processing high-volume data that is widely used by data engineers. He also worked on other systems well known to the data community, such as Bigtable, an important Google Cloud service, and TensorFlow, essential today as a prediction tool.
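The idea behind MapReduce can be sketched in a few lines. This toy single-machine version of the classic word-count example shows only the programming pattern — a map phase that emits key/value pairs and a reduce phase that aggregates them by key; the real system distributes both phases across many workers, with shuffling and fault tolerance in between.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    """Map: emit a (word, 1) pair for every word in a document."""
    return [(word, 1) for word in doc.split()]

def reduce_phase(pairs):
    """Reduce: sum the emitted counts, grouped by word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data big systems", "data engineers process data"]
result = reduce_phase(chain.from_iterable(map_phase(d) for d in docs))
print(result)
# → {'big': 2, 'data': 3, 'systems': 1, 'engineers': 1, 'process': 1}
```

Because the map calls are independent per document and the reduce groups by key, both phases parallelize naturally, which is what made the pattern so useful at Google's scale.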