Job ID : 1573179 Date Updated : February 19th, 2026

Minatomirai Station | Inference Systems Engineer @AI startup

Hiring Company	Unsung Fields Corp.
Location	Kanagawa Prefecture, Yokohama-shi Nishi-ku
Job Type	Permanent Full-time
Salary	8 million yen ~ 14 million yen

Work Style

Casual Clothing, Minimal Overtime

Job Description

Inference Systems Engineer (LLM Serving Runtime + Performance)

Role overview

As an inference & serving engineer, your objective is to build a high-performance, multi-tenant serving stack that squeezes maximum utilization out of heterogeneous hardware. This involves navigating the trade-offs between various state-of-the-art inference frameworks and engines, selecting and optimizing the right runtime for the right workload. The scope of work is not limited to Large Language Models; it extends to the frontier of Generative AI, including high-throughput Video generation and complex Multimodal systems where memory pressure and compute requirements are significantly more demanding.

Beyond just deploying models at scale, this role is responsible for building a robust system that bridges the gap between boutique, high-performance clusters and massive, multi-node deployments as the company grows. This requires a deep understanding of the "Inference Triangle": constantly tuning the stack to find the optimal balance among low latency (TTFT/ITL), high throughput, and inference quality (precision/quantization). The ideal candidate is a hands-on engineer who views the entire GPU fleet as a single, programmable compute fabric and is eager to get their hands dirty at every level of the stack.

Responsibilities

Runtime Selection & Deep Optimization: Lead the evaluation, integration, and continuous tuning of diverse inference frameworks to ensure best-in-class performance across LLM, Video, and Multimodal workloads.
Latency & Throughput Engineering: Own the end-to-end performance profile of the model lifecycle, implementing advanced strategies such as disaggregated prefill/decode, speculative decoding, and continuous batching to minimize TTFT and maximize tokens-per-second.
Scalable Systems Evolution: Design and implement serving architectures that function seamlessly on small experimental clusters while providing a clear, robust path to massive-scale, multi-node deployments.
Advanced Memory & Cache Orchestration: Implement and optimize memory management techniques to maximize KV-cache reuse and minimize redundant computations in multi-turn or high-concurrency scenarios.
Day 0 Model Support: Working with the ecosystem, craft a Day 0 model support strategy ensuring our stack provides stable, high-performance support for new models when they are released.
Cross-Stack Integration: Collaborate with the Backend/Gateway and Compute Orchestration teams to ensure the inference engine’s telemetry, failure domains, and lifecycle management are perfectly aligned with the global load balancer and API layers.
Hands-on Technical Leadership: Maintain a high level of personal agency by writing production code, debugging complex distributed system "hangs," and contributing to architectural decisions in a flat, fast-moving team environment.
Collaborative Communication: Function as a primary technical peer to engineering leads, translating complex hardware and model constraints into clear product and infrastructure strategies.
Inference Strategy & Trade-offs: Define the path forward when balancing model precision and quantization against the physical limits of HBM bandwidth and compute throughput.
(Optional) Kernel-Level Development: Dive into the lowest levels of the execution stack to develop and refine custom CUDA or Triton kernels, eliminating overhead in the execution loop and optimizing for specific hardware primitives.

[Employment Type]
Full-time employee
*Probationary period: 3 months

[Salary]
Annual Salary: ¥8,000,000 - ¥14,000,000
Monthly Salary: ¥666,667 - ¥1,166,667 (Monthly Base Salary: ¥666,667 - ¥1,166,667)
■Salary Increases: Available

[Working Hours]
9:00 AM - 6:00 PM (60-minute break)

[Work Location]
Queen's Tower A, 10th Floor, 2-3-1 Minatomirai, Nishi-ku, Yokohama, Kanagawa Prefecture, 220-6010
■Access: 7-minute walk from Sakuragicho Station (all lines), direct access from Minatomirai Station (Toyoko Line, Minatomirai Line)
■Non-smoking workplace
■Changes to work location: Company-designated offices
■Transfers/Secondments: None

[Holidays and Leave]
120 days off per year
Full two-day weekend
Annual paid vacation (minimum 10 days after the seventh month of employment)

[Benefits]
Partial transportation allowance (up to ¥15,000 per month)
Social insurance (health insurance, employee pension insurance, employment insurance, workers' compensation insurance)
Overtime pay: Standard overtime pay

General Requirements

Minimum Experience Level	Over 6 years
Career Level	Mid Career
Minimum English Level	Business Level
Minimum Japanese Level	None
Minimum Education Level	Bachelor's Degree
Visa Status	No permission to work in Japan required

Required Skills

You may be a fit if you have the following skills:

Inference Engine: Deep experience with the internals of modern runtimes. You are a prominent contributor to inference engine ecosystems, including but not limited to OSS projects or proprietary engines at top-tier AI labs.
Multimodal Domain Knowledge: Understanding of the specific challenges involved in serving Large Language Models alongside Video and Vision-based generative models.
Scale-First Engineering: A track record of building and managing distributed systems that have evolved from small-scale proofs-of-concept to large-scale production deployments.
Great Team Spirit: A mission-driven approach to engineering, valuing clear communication, hands-on execution, and collective success over individual silos.

Job Location

Kanagawa Prefecture, Yokohama-shi Nishi-ku
Minatomirai Line, Minatomirai Station

Work Conditions

Job Type	Permanent Full-time
Salary	8 million yen ~ 14 million yen
Work Hours	09:00 - 18:00 (60-minute break)
Holidays	Two-day weekends, holidays, special leave, 120+ days off annually
Industry	Software

Company Details

Company Type

Small/Medium Company (300 employees or less)
