Data Engineer II
UNIVERSITY OF TEXAS AT AUSTIN - Austin, TX
Job Description
Maintains and optimizes data pipeline architecture by designing, building, and managing ETL processes that extract, transform, and load data from diverse sources. Assembles large, complex data sets to meet both functional and non-functional requirements, and develops scalable architectures for structured and unstructured data.

Integrates and consolidates data from multiple systems, such as disparate databases and electronic health records, into unified repositories like data warehouses or data lakes. Develops and enhances the underlying data infrastructure using SQL and cloud technologies to ensure scalability and reliability.

Creates and supports analytics tools that empower analysts and data scientists to access and analyze data efficiently. Builds custom queries, scripts, and dashboards that enable insight generation and data product optimization. Collaborates with analytics experts to organize, query, and visualize data for reporting and research.

Identifies and implements process improvements to enhance data operations. Automates manual workflows, optimizes data delivery pipelines, and redesigns system architecture to support scalability and performance. Continuously evaluates workflows and technologies to recommend improvements that accommodate growing data complexity.

Ensures data governance and security by validating data for accuracy and consistency and by maintaining secure, compliant data environments. Follows best practices and regulatory standards (e.g., HIPAA) to protect sensitive information and uphold data integrity.

Collaborates with stakeholders across departments, including executives, product managers, researchers, and designers, to address data infrastructure needs and resolve technical issues. Translates non-technical requirements into effective data solutions and advises on best practices for data architecture.

Manages and executes data projects from planning through deployment.
Applies light project management techniques to coordinate tasks, communicate with team members, and ensure timely delivery. Exercises independent judgment to overcome obstacles and align project outcomes with organizational goals.

MARGINAL OR PERIODIC FUNCTIONS:
Adheres to internal controls and reporting structure.
Performs related duties as required.

KNOWLEDGE/SKILLS/ABILITIES

Systems Knowledge: Broad understanding of system-level concepts in computing, including programming and scripting, operating systems, database query languages (SQL), and data mining techniques, as well as familiarity with IT infrastructure (servers, networking, cloud services). Such knowledge enables the Data Engineer II to troubleshoot and optimize across the technology stack.

Big Data Processing: Proficiency with big data frameworks such as Apache Spark for distributed data processing and large-scale computations. Experience optimizing Spark jobs for performance is often required.

Workflow Orchestration: Experience with workflow orchestration tools like Apache Airflow (or similar platforms) to schedule and manage complex data pipelines. Ability to design reliable job workflows and handle dependencies between tasks.

Programming and Databases: Strong programming skills in Python (especially using PySpark) and solid knowledge of SQL for querying and manipulating data. Familiarity with both relational databases (SQL) and NoSQL databases, with the ability to design and optimize database schemas and queries for each.

Version Control: Experience using Git or other version control systems for managing codebases and collaborating on data projects. Follows best practices in code versioning and documentation to maintain a clear history of changes.

Cloud Data Pipelines: Hands-on experience building data pipelines on cloud or modern data platforms. This could include using services in Microsoft Fabric (e.g., Azure Data Factory within Fabric) or similar ETL tools to move and transform data at scale. Knowledge of cloud ecosystems and services for data processing (such as AWS Glue or Azure Synapse pipelines) is beneficial.

Data Warehousing: Familiarity with cloud-based data warehousing and analytics services such as Google BigQuery, Microsoft Fabric (Synapse Analytics), or AWS Redshift for storing and querying large datasets. Ability to optimize data models and SQL queries on these platforms to ensure fast performance and cost-efficiency.

Domain Expertise (if applicable): Experience working with healthcare or clinical data is highly valuable. For example, familiarity with electronic health record (EHR) systems and clinical registries, experience using tools like REDCap for data capture, or involvement in healthcare analytics projects. Ability to create quality/outcome reports and develop data visualizations for non-technical stakeholders is a plus.

Technical Learning
- Quickly grasps technical concepts and applies them effectively.
- Learns new tools and platforms independently.
- Applies new techniques to improve data pipelines.
- Shares technical knowledge with peers.

Problem Solving
- Uses logic and data to solve complex problems effectively.
- Diagnoses root causes of data issues.
- Designs scalable solutions.
- Anticipates and mitigates risks.

Action Oriented
- Takes initiative and acts with urgency.
- Proactively addresses data quality issues.
- Suggests improvements without being prompted.
- Delivers results under tight deadlines.

Collaboration
- Works effectively with others to achieve shared goals.
- Communicates clearly with non-technical stakeholders.
- Participates in cross-functional teams.
- Resolves conflicts constructively.

Planning and Organizing
- Prioritizes tasks and manages time effectively.
- Breaks down complex projects into manageable steps.
- Tracks progress and adjusts plans as needed.
- Meets deadlines consistently.
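For candidates unfamiliar with the extract-transform-load pattern named in the duties above, here is a minimal sketch using only Python's standard-library sqlite3 module; the table names, columns, and sample records are invented purely for illustration and do not reflect any system at the university:

```python
import sqlite3

# Extract: read raw rows from a hypothetical source database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE raw_visits (patient_id TEXT, visit_date TEXT, charge TEXT)")
source.executemany(
    "INSERT INTO raw_visits VALUES (?, ?, ?)",
    [("p1", "2024-01-05", "120.50"), ("p1", "2024-02-11", "80.00"), ("p2", "2024-01-20", "200.00")],
)
rows = source.execute("SELECT patient_id, visit_date, charge FROM raw_visits").fetchall()

# Transform: cast text charges to numbers and aggregate per patient.
totals = {}
for patient_id, _visit_date, charge in rows:
    totals[patient_id] = totals.get(patient_id, 0.0) + float(charge)

# Load: write the cleaned, aggregated result into a warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE patient_totals (patient_id TEXT PRIMARY KEY, total_charge REAL)")
warehouse.executemany("INSERT INTO patient_totals VALUES (?, ?)", sorted(totals.items()))
```

In production this same three-step shape would typically be expressed with PySpark jobs or cloud ETL services rather than in-memory SQLite, but the extract/transform/load structure is the same.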
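The "handle dependencies between tasks" skill under Workflow Orchestration can be sketched without Airflow itself, using Python's standard-library graphlib to compute a valid run order for a pipeline; the task names here are hypothetical examples, not an actual DAG from this role:

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it depends on, as an orchestrator's DAG would.
pipeline = {
    "extract": set(),
    "validate": {"extract"},
    "transform": {"validate"},
    "load_warehouse": {"transform"},
    "build_dashboard": {"load_warehouse"},
    "data_quality_report": {"load_warehouse"},
}

# static_order() yields an execution order in which every task runs
# only after all of its dependencies have completed.
order = list(TopologicalSorter(pipeline).static_order())
print(order)
```

Airflow performs the same dependency resolution (plus scheduling, retries, and monitoring); the point of the sketch is only the ordering guarantee.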
Created: 2026-04-04