MLOps Engineer

Apr 24

Company Overview

TECQMIND is an end-to-end AI professional services and staffing partner focused on the "how." We build the teams and the infrastructure required to turn AI potential into production-scale reality. From building proprietary "Model Gardens" to ensuring performance scaling, we move beyond the pilot phase to provide the technical foundation for businesses to thrive in an AI-first world.

The Role

As an MLOps Engineer at TECQMIND, you are the architect and guardian of our production environments. While our AI/ML Engineers focus on the intelligence of the models, you focus on their reliability, scalability, and security. Your mission is to build the automated "factory" that allows custom AI models to run seamlessly at scale.

You will act as a critical technical bridge, synchronizing deployment standards and infrastructure management with our specialized technology partners in Taiwan.

Key Responsibilities

ML Pipeline Automation: Design, build, and maintain robust CI/CD and CT (Continuous Training) pipelines to automate the deployment of machine learning models.
Infrastructure as Code (IaC): Use tools like Terraform or Pulumi to manage and scale cloud infrastructure (AWS, GCP, or Azure) specifically optimized for heavy AI workloads and GPU orchestration.
Model Observability & Monitoring: Implement advanced monitoring, logging, and alerting frameworks to track model performance, latency, and data drift, so that models can be continuously optimized in production.
Containerization & Orchestration: Manage production-grade AI clusters using Docker and Kubernetes (specifically leveraging KServe, Ray, or Kubeflow).
Security & Compliance: Ensure all AI infrastructure meets enterprise-grade security standards (SOC 2, HIPAA) and that proprietary client IP remains protected within our "Model Gardens."
Global Technical Synchronization: Collaborate daily with our Taiwan-based engineering teams to standardize environments, manage cross-region deployments, and troubleshoot infrastructure bottlenecks.

Required Qualifications

Experience: 3–5 years of professional experience in MLOps, DevOps, or Site Reliability Engineering (SRE) with a focus on machine learning systems.
Core Technical Stack: Advanced proficiency in Python and Bash scripting. Expert-level knowledge of Docker and Kubernetes.
Cloud Mastery: Proven experience managing production environments on at least one major cloud ML platform (AWS SageMaker, GCP Vertex AI, or Azure Machine Learning).
Tooling: Experience with MLOps platforms such as MLflow, DVC, or BentoML, and monitoring tools like Prometheus and Grafana.
Automation: Experience with CI/CD tools (GitHub Actions, GitLab CI, or Jenkins).

Preferred Qualifications

Bilingual Proficiency: Professional fluency in both English and Mandarin (written and verbal) is highly desired to facilitate deep technical collaboration with our Taiwan-based partners.
Scale Experience: Experience managing large-scale vector databases (Pinecone, Weaviate) or high-throughput real-time inference systems.
Cultural Competency: Experience working within global, distributed teams and managing technical workflows across significantly different time zones.

Katie Hsieh