Data Acquisition Engineer

Job description

The Role

We’re looking for a Data Acquisition Engineer to join OpenCorporates, the world’s largest open database of company info, and one of the most exciting and important data companies in the world. We’re expanding fast, and seeking someone who lives and breathes cloud, who is passionate about open source, and who can cope with near-real-time data pipelines without breaking a sweat.

OpenCorporates is revolutionising access to company data, and with it genuinely changing the world for the better. Our data is more trusted, more transparent and fresher, and is depended upon by everyone from investigative journalists and anti-corruption investigators to law enforcement, major banks and fintech unicorns. We are an innovative, fast-growing scale-up with aggressive goals and a public benefit mission at our core.

Main Responsibilities:

  • Managing and supporting our data pipeline: You will work with other members of the team to specify, document and manage our data feeds, ensuring OpenCorporates data is of the highest quality. You will solve day-to-day problems with our existing data feeds.

  • Analyse data sources. You'll know how to make sense of complex data sources and can apply this to our data sets, such as company registers, to understand their data and its issues.

  • Help us improve the way we work. You will be a key player in our goal to have a ruthless focus on efficiency and productivity improvements in order to maintain our competitive advantage. You will work with the team to structure, systemise and automate wherever possible.

  • Develop our products. You will work alongside the Commercial and Technical teams to implement compelling and innovative products by understanding the use cases, the data and desired client experience, and to develop our data pipeline.

  • Write bots to source publicly available data (scraping websites, consuming data published via APIs or CSV, or extracting data from PDFs) in order to create new data feeds, and also help solve problems with our existing feeds

  • Overall you will manage our BAU data pipeline (our systems that fetch and ingest incoming data into OpenCorporates) and ensure the smooth running of our data operations.

  • You will monitor the data environments (e.g. through our dashboards and logs) to identify and escalate issues

  • You'll be diagnosing and resolving issues that occur, working as needed with other team members in our Data and Technical teams

  • You will work with colleagues to ensure that clients and users receive timely answers to questions about our data.


Relevant Technical Skills

You will have a number of years of experience with data in either data analyst or engineering roles. Above all, we are looking for smart people who we think will fit in well. During the interview process you should be able to demonstrate that:

  • You know data. Our product is data – arranging it, linking it, making it accessible – so you should enjoy dealing with it, including handling complex concepts, managing big datasets and knowing how to keep it high quality.

  • You understand process. The OpenCorporates data pipeline has a variety of data flows and processes which need to work together seamlessly. You must understand and be able to deal with the challenges this raises.

  • You have a keen eye for detail. Accuracy, attention to detail and the ability to spot trends are key to keeping data quality levels high.

  • You know how to work in a team. The problems we deal with require a lot of collaboration and communication.

Desirable skills / experience:

  • Python and/or Ruby

  • Query and understand structured data: SQL (SQLite/MySQL or similar), JSON, XML

  • An understanding of / experience of working with ETL processes and data pipelines (you will not actually be building them) and experience with data testing/quality assurance processes, scripting & tools

  • Root cause analysis & data remediation experience

  • Excellent verbal and written communication skills

  • Bot writing experience a plus (Web Scraping)

  • Additional experience with web scraping, Nokogiri, TinyProxy, ftp/sftp etc.

Our Values

Our values outline the shared principles that define the OpenCorporates culture and team environment. They underpin everything we do, from who we choose to work with (and who we choose not to) to day-to-day decision making, teamwork, supporting our clients and evaluating individual and company performance. They are the lens we look through in everything we do. All our employees are driven by our values and use them as a compass to guide their work and collaboration with colleagues and clients.

Be Bold & Beat The Odds

  • Our work is hard - and matters. We will succeed by being more ambitious, more imaginative and more daring than our competitors

We Put Users First 

  • Success will only come if we focus obsessively on the success of our users in everything we do

Learn & Adapt

  • There is no straight line to success. We will excel by taking a scientific approach to all our work

One Team

  • We win together. We fail together. And diversity – of backgrounds, of views, of personalities – is a critical asset