Data Engineer
RESPONSIBILITIES
* Design, develop and deploy data tables, views and marts in data warehouses, operational data stores, data lakes and data virtualization layers.
* Perform data extraction, cleaning and transformation, and manage data flows. Data extraction may also include web scraping.
* Design, build, launch and maintain efficient and reliable large-scale batch and real-time data pipelines with data processing frameworks.
* Integrate and collate data from siloed sources in a scalable and compliant manner.
* Collaborate with Project Managers, Data Architects, Business Analysts, Frontend Developers, Designers and Data Analysts to build scalable data-driven products.
* Develop backend APIs and work on the databases that support the applications.
* Work in an Agile environment that practices Continuous Integration and Continuous Delivery.
* Work closely with fellow developers through pair programming and code reviews.
REQUIREMENTS
* Proficient in general data cleaning and transformation (e.g. SQL, pandas, R) to ensure data accuracy and consistency.
* Proficient in building ETL pipelines (e.g. SQL Server Integration Services (SSIS), AWS Database Migration Service (DMS), Python, AWS Lambda, Amazon ECS tasks, Amazon EventBridge, AWS Glue, Spring).
* Proficient in database design and in various databases and data stores (e.g. SQL databases, PostgreSQL/PostGIS, MySQL, SQLite, VoltDB, MongoDB, Cassandra, AWS S3, Athena).
* Experience with, and passion for, data engineering in a big data environment using cloud platforms such as GPC and GCC (i.e. AWS, Azure, Google Cloud).
* Experience building production-grade data pipelines and ETL/ELT data integration.
* Knowledge of system design, data structures and algorithms.
* Familiar with data modelling, data access and data storage infrastructure, such as data marts, data lakes, data virtualisation and data warehouses, for efficient storage and retrieval.
* Familiar with REST APIs and with web requests and protocols in general.
* Familiar with big data frameworks and tools (e.g. Hadoop, Spark, Kafka, RabbitMQ).
* Familiar with the W3C Document Object Model (DOM) and custom web scraping (e.g. BeautifulSoup, CasperJS, PhantomJS, Selenium, Node.js).
* Familiar with data governance policies, access control and security best practices.
* Comfortable with at least one scripting language (e.g. SQL, Python).
* Comfortable in both Windows and Linux development environments.
* Interest in acting as a bridge between engineering and analytics.