Software Engineer, Data Acquisition (Paris/London)

Permanent contract (CDI)
Python
Senior
Remote France
International

Description

About Mistral

  • At Mistral AI, we are a tight-knit, nimble team dedicated to bringing our cutting-edge AI technology to the world
  • Our mission is to make AI ubiquitous and open
  • We are creative, low-ego, team-spirited, and have been passionate about AI for years
  • We hire people who thrive in competitive environments, because they find them more fun to work in
  • We hire passionate women and men from all over the world
  • Our teams are distributed across France, the UK, and the USA


Role Summary

  • We are seeking a skilled and motivated Web Crawling and Data Indexing Engineer to join our dynamic engineering team
  • The ideal candidate will have a strong background in web scraping, data extraction and indexing, with a focus on leveraging advanced tools and technologies to gather and process large-scale data from various web sources
  • The role is based in Paris or London


Key Responsibilities

  • Develop and maintain web crawlers using Python libraries such as Beautiful Soup to extract data from target websites
  • Use headless browsing techniques, such as driving Chrome through the Chrome DevTools Protocol, to automate and optimize data collection processes
  • Collaborate with cross-functional teams to identify, scrape, and integrate data from APIs to support business objectives
  • Create and implement efficient parsing patterns using regular expressions, XPaths, and CSS selectors to ensure accurate data extraction
  • Design and manage distributed job queues using technologies such as Redis, Kubernetes, and Postgres to handle large-scale data processing tasks
  • Develop strategies to monitor and ensure data quality, accuracy, and integrity throughout the crawling and indexing process
  • Continuously improve and optimize existing web crawling infrastructure to maximize efficiency and adapt to new challenges


Qualifications & Profile

  • Bachelor’s or master’s degree in computer science, information systems, or information technology
  • Strong understanding of web technologies, data structures, and algorithms
  • Knowledge of database management systems and data warehousing
  • Programming Languages: Proficiency in programming languages such as Python, Java, or C++ is essential
  • Mastery of Web Technologies: Understanding of HTML, CSS, and JavaScript is crucial to navigate and scrape data from websites
  • Knowledge of HTTP and HTTPS protocols
  • A good understanding of data structures (like queues, stacks, and hash maps) and algorithms is necessary
  • Knowledge of databases (SQL or NoSQL) is important to store and manage the crawled data
  • Understanding of distributed systems and technologies like Hadoop or Spark
  • Experience using web scraping libraries and frameworks like Scrapy, BeautifulSoup, Selenium, or MechanicalSoup
  • Understanding of how search engines work and how to optimize web crawling
  • Experience in Machine Learning to improve the efficiency and accuracy of web crawling
  • Familiarity with tools such as Pandas, NumPy, and Matplotlib to analyze and visualize data


Benefits

  • Daily lunch vouchers
  • Contribution to a Gympass subscription
  • Monthly contribution to a mobility pass
  • Full health insurance for you and your family
  • Generous parental leave policy

Company

Mistral AI

Location

Paris, France

Published on

11/11/2024

Company size

11-50
