Data Architect (with OpenShift)
Rivago Infotech Inc - Charlotte, NC
Job Description
Project: Data Modernization

Migration of an on-premises SQL data warehouse to a modern enterprise Data Lake platform, enabling analytics and GenAI use cases. The platform leverages PySpark-based processing, CI/CD pipelines, and containerized deployments on OpenShift (OCP), with GCP as the preferred cloud platform, to deliver scalable, secure, and high-performance data solutions.

About the Program/Project

The IAM Data Modernization program focuses on transforming legacy data platforms into a scalable and cloud-compatible architecture.

Key Highlights:
- Integration scope: 30+ source systems with multiple downstream integrations
- Capabilities: metrics, reporting, advanced analytics, and GenAI use cases (natural-language querying, summarization, cross-domain insights)

Benefits:
- Scalable and resilient data platform
- High-performance semantic and analytics layer
- Single source of truth for enterprise-wide reporting and analytics

Role Summary

We are looking for a Data Architect with strong expertise in OpenShift (OCP), PySpark, and CI/CD pipelines to design and govern scalable data platforms. The role requires defining end-to-end data architecture, containerized deployment patterns, orchestration strategies (Airflow/Autosys), and platform standards, along with hands-on involvement in implementation.

Key Responsibilities

Data Architecture & Platform Design
- Define the enterprise data architecture for the IAM data lake and analytics platform
- Design scalable, modular, and containerized data pipeline architectures on OCP
- Establish data models, schema governance, and data lifecycle strategies
- Define best practices for data partitioning, performance optimization, and cost efficiency

OpenShift (OCP) & Platform Engineering
- Architect and govern containerized data workloads on OpenShift (OCP)
- Define standards for deployment, scaling, and workload isolation
- Collaborate with DevOps teams on platform engineering and infrastructure alignment

Big Data & Processing (PySpark Focus)
- Define the architecture for PySpark-based batch and near-real-time processing pipelines
- Provide guidance on distributed processing design, optimization, and performance tuning
- Establish reusable frameworks for ETL/ELT processing

Data Ingestion & Orchestration
- Architect data ingestion frameworks (batch, streaming, CDC)
- Define orchestration strategies using Airflow / Autosys
- Implement standards for retries, backfills, dependency management, and error handling

DevOps / CI/CD
- Define and oversee the CI/CD strategy for data and platform deployments
- Enable automation of build, test, and deployment processes
- Ensure integration of CI/CD pipelines with OCP-based environments

Cloud & Data Platforms (Preferred)
- Provide architecture guidance for GCP-based data platforms (preferred, not mandatory)
- Define integration patterns for cloud-native and on-premises hybrid environments
- Guide teams on cloud migration strategies and modern data platform adoption

Data Governance, Quality & Observability
- Define frameworks for data quality, validation, and lineage
- Define frameworks for metadata management and cataloging
- Establish monitoring, logging, alerting, and SLOs for platform reliability
- Ensure compliance with data security and audit requirements

Stakeholder Collaboration
- Work closely with client architects, IAM teams, and business stakeholders
- Translate business requirements into scalable technical architecture
- Provide architectural guidance and mentorship to engineering teams

Required Skills

Core Skills (Must Have)
- Strong experience with OpenShift (OCP) / Kubernetes-based platforms
- Strong experience with the PySpark / Spark ecosystem
- Strong experience with CI/CD implementation for data platforms
- Strong experience with Airflow / Autosys orchestration tools
- Solid understanding of data lake architectures (layered models), ETL/ELT design patterns, and distributed data processing concepts

Data Engineering & Storage
- Expertise in data formats: Parquet, ORC, Avro
- Expertise in partitioning and performance tuning
- Expertise in large-scale data modeling for analytics

Cloud (Preferred, Not Mandatory)
- Experience with Google Cloud Platform (GCP) preferred
- Exposure to services such as BigQuery, Dataproc, Dataflow, and GCS is a plus

Observability & Reliability
- Experience defining monitoring, logging, and alerting frameworks
- Experience defining dashboards, SLOs, and operational runbooks

Good to Have
- Experience with the IAM domain / cybersecurity data
- Understanding of data security and access control frameworks
- Exposure to GenAI-enabled data platforms
- Experience in Agile delivery and team leadership

Qualifications

Experience:
- 10-14+ years in Data Architecture / Data Engineering
- Strong experience with OCP, PySpark, CI/CD, and orchestration frameworks
- Prior experience in data modernization / migration programs

Education:
- Bachelor's or Master's in Computer Science, Information Systems, or equivalent

Certifications (Preferred):
- OpenShift / Kubernetes certifications
- GCP certifications (preferred, not mandatory)
Created: 2026-05-13