JobsEQ by Chmura

Senior Big Data Developer

eTeam Inc

Location: Tampa, FL 33637
Type: Temporary (unspecified), Temporary (short-term), Non-Remote
Posted on: August 24, 2020
This job is no longer available from the source.
Job Title: Senior Lead Big Data Developer
Work Location: 7701 E Telecom Pkwy, Temple Terrace, FL 33637, USA
Duration: 6 Months
Minimum years of experience: 5+ years
JOB DESCRIPTION: -
We are looking for senior lead developers who are excited about building distributed data pipelines. We want you to help us shape our brand-new internal and external data warehouses, leveraging the latest advances in big data processing: a combination of Kafka Streams, Hadoop, Spark, Hive, Oozie, and traditional RDBMSs.
5+ years of total development experience with Big Data: Hive and Hadoop, Spark, Scala, Python/PySpark, AWS, and other cloud-related technologies.
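For a concrete sense of the batch side of this stack, here is a minimal PySpark sketch that reads a Hive table and writes a daily rollup back to the warehouse; the table and column names (warehouse.events, event_date, user_id) are hypothetical, for illustration only.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hive-enabled session; assumes the cluster's Hive metastore is configured.
spark = (
    SparkSession.builder
    .appName("daily-event-aggregates")
    .enableHiveSupport()
    .getOrCreate()
)

# Hypothetical source table: one row per event.
events = spark.table("warehouse.events")

# Daily distinct-user counts, a typical warehouse rollup.
daily = events.groupBy("event_date").agg(
    F.countDistinct("user_id").alias("daily_users")
)

# Write the summary back to the warehouse.
daily.write.mode("overwrite").saveAsTable("warehouse.daily_user_counts")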
JOB DUTIES: -
Be a part of our Big Data team and of an overall Data Organization spanning multiple offices.
Participate in architecture discussions and bring your experience in scalable data pipelines (batch and streaming) using Kafka/Spark/PySpark/Oozie and/or other Big Data tools.
Take ownership of design and implementation of scalable and fault-tolerant projects.
Maintain and incrementally improve existing solutions.
Get to build brand-new pipelines with a technology stack including Spark, Spark Structured Streaming, Kafka, Hadoop, MySQL, Python, and Java/Scala (see the streaming sketch after this list).
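As one illustration of the streaming side of these duties, here is a minimal Spark Structured Streaming sketch that consumes a Kafka topic and maintains running per-key counts; the broker address, topic name, and checkpoint path are placeholders, and the job assumes the spark-sql-kafka-0-10 connector is on the classpath.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Read a Kafka topic as a streaming DataFrame (placeholder broker/topic).
stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# Kafka keys/values arrive as bytes; cast the key and count messages per key.
counts = stream.select(F.col("key").cast("string")).groupBy("key").count()

# Stream running counts to the console; a production job would target
# HDFS/MySQL and use a durable checkpoint location.
query = (
    counts.writeStream
    .outputMode("complete")
    .format("console")
    .option("checkpointLocation", "/tmp/checkpoints/kafka-stream-sketch")
    .start()
)
query.awaitTermination()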
MUST-HAVE SKILLSET: -
Independent/lead developer who can work with minimal supervision.
Solid understanding of distributed system fundamentals.
Solid understanding of Hadoop security; familiarity with Kerberos, keytabs, etc.; and hands-on experience working with Spark/Hive/Oozie/Kafka on a Kerberized cluster.
Experience in developing, troubleshooting, diagnosing, and performance tuning of distributed batch & real-time data pipelines using Spark/PySpark at scale.
Develop scalable and reliable data solutions to move data across systems from multiple sources in real time (NiFi, Kafka) as well as in batch mode (Sqoop).
Demonstrated professional experience working with various components of the Big Data ecosystem: Spark/Spark Streaming, Hive, Kafka/KSQL, and Hadoop (or a similar NoSQL ecosystem), orchestrating these pipelines with Oozie and similar tools in a production system.
Construct data staging layers and fast real-time systems to feed BI applications and machine learning algorithms.
Strong software engineering skills with Python or Scala/Java.
Knowledge of some flavor of SQL (MySQL, Oracle, Hive, Impala), including the fundamentals of data modeling and performance.
Skills in real-time streaming applications.
Experienced in data engineering, with a good understanding of data warehouses, data lakes, data modeling, parsing, data wrangling, cleansing, transformation, and sanitization.
Agile work experience; experience building CI/CD pipelines using Jenkins, Git, Artifactory, Ansible, etc.
Hands-on development experience with Scala and Python on Spark 2.x, including Spark internals and Spark job performance improvement.
Good understanding of YARN, the Spark UI, Spark resource management, Hadoop resource management, and efficient Hadoop storage mechanisms.
Good understanding of, and experience with, performance tuning in cloud environments for complex software projects, mainly around large scale and low latency (see the tuning sketch after this list).
AWS knowledge is essential, with good working experience in AWS technologies: EMR, S3, cluster management, and Airflow automation; Snowflake knowledge is a plus.
An AWS development certification and/or a Spark certification is an advantage.
Expert in data analysis in Python (NumPy, SciPy, scikit-learn, pandas, etc.); see the pandas sketch after this list.
Strong UNIX shell scripting experience to support data warehousing solutions.
Process-oriented, focused on standardization, streamlining, and implementation of delivery best practices.
Excellent problem-solving and analytical skills; excellent verbal and written communication skills.
Proven teamwork in multi-site/multi-geography organizations.
Ability to multi-task and function efficiently in a fast-paced environment.
Strong background in Scala or Java; experience with streaming technologies such as Flink, Kafka, Kinesis, and Firehose; experience with EMR, Spark, Parquet, and Airflow.
Excellent interpersonal skills, ability to handle ambiguity and learn quickly.
Exposure to data architecture & governance is helpful.
A degree in Computer Science or a related technical field, or equivalent work experience.
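To make the Spark tuning items above concrete, here is a small PySpark sketch of the knobs such work typically involves: shuffle partitions, executor sizing, caching, and repartitioning. The settings, values, and input path are illustrative assumptions, not recommendations.

from pyspark.sql import SparkSession

# Illustrative resource and shuffle settings; real values depend on data
# volume, cluster size, and YARN queue limits.
spark = (
    SparkSession.builder
    .appName("tuning-sketch")
    .config("spark.sql.shuffle.partitions", "400")
    .config("spark.executor.memory", "8g")
    .config("spark.executor.cores", "4")
    .config("spark.dynamicAllocation.enabled", "true")
    .getOrCreate()
)

df = spark.read.parquet("/data/events")  # placeholder input path

# Repartition on the join/group key before wide operations to spread the
# shuffle evenly, and cache a DataFrame reused by several actions.
df = df.repartition(400, "user_id").cache()
df.count()  # materialize the cache once, up front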
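Likewise, for the Python data-analysis item, a brief pandas/NumPy sketch of the everyday load-clean-summarize pattern; the CSV file and column names are hypothetical.

import numpy as np
import pandas as pd

# Hypothetical extract pulled from the warehouse.
df = pd.read_csv("events_sample.csv", parse_dates=["event_date"])

# Basic cleansing: drop rows missing the key fields.
df = df.dropna(subset=["user_id", "event_date"])

# Distinct users per day, plus simple summary statistics.
daily = df.groupby(df["event_date"].dt.date)["user_id"].nunique()
print(daily.describe())
print("median daily users:", np.median(daily.values))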
Employment Type: CONTRACTOR