Digital - Content - DataLake Engineer

in Paris, Ile-de-France, France
Permanent, Full time
Role/Department Description:
The Content Engineering department is responsible for all data collection within FactSet, as well as for making this data available to our clients and other departments. The department is looking to adopt cloud infrastructure and technologies to revamp its processes and to leverage the data lake concept as a single store for all acquired or extracted documents, feeding all collection systems whatever their core content. The team aims to provide APIs and services to feed the data lake, to browse its contents in a structured way, and to cross-check and mine any valuable information already tagged as being of interest for another content set. This will ultimately provide better material and services for cognitive computing, data analysis and data collection processes to filter, select and extract other metadata.

  • Join the team implementing the data platform architecture providing access to large datasets, data ingestion pipelines and data infrastructure
  • Explore and evaluate new data technologies to build a scalable, cloud oriented data platform
  • Make large and/or complex data more accessible, understandable and usable by implementing advanced APIs for storage and querying
  • Create unified enterprise data models for analytics, mining and reporting
  • Interface with various teams to support their needs (cognitive computing, content, other engineering teams)
  • Collaborate with engineering, cloud infrastructure and security teams to understand the requirements and develop highly scalable systems
  • As part of an Agile development team, contribute to architecture, tooling and development process improvements, with temporary assignment to other teams for the duration of projects

Required Skills:
  • Bachelor's or Master's degree in Computer Science, Math, or Engineering
  • 1 to 3+ years of working experience in software development
  • 1 to 3+ years of relevant experience in Data Science or Machine Learning
  • Strong experience and proficiency with Python, pandas, NumPy and AWS APIs
  • Organized, self-directed, and resourceful with the ability to appropriately prioritize work in a fast-paced environment
  • Able to work in a team of data scientists, as well as on temporary assignment to other teams for the duration of projects

Desirable Requirements:
If possible, but not essential, some experience in one or more of the following areas:
  • Experience with the AWS environment
  • Experience with modern data platforms such as Spark or other map/reduce big data systems and services
  • Experience with a variety of data stores (NoSQL, graph databases)