Job ID: 1573179 | Updated: January 22, 2026

Minatomirai Station | Inference Systems Engineer @AI startup

Hiring company	株式会社Unsung Fields
Location	Nishi-ku, Yokohama, Kanagawa Prefecture
Employment type	Full-time employee
Salary	¥8 million - ¥14 million

Job Description

Inference Systems Engineer (LLM Serving Runtime + Performance)

Role overview

As an Inference Systems Engineer, you will own the serving runtime that powers production LLM inference. This is a deeply technical role focused on system performance and stability: optimizing request lifecycle behavior, streaming correctness, batching/scheduling strategy, cache and memory behavior, and runtime execution efficiency. You will ship changes that improve TTFT, p95/p99 latency, throughput, and cost efficiency—while preserving correctness and reliability under multi-tenant load.

You will collaborate closely with platform/infrastructure operations, networking, and API/control-plane teams to ensure the serving system behaves predictably in production and can be debugged quickly when incidents occur. This role is for engineers who can reason about the entire inference pipeline, validate improvements with rigorous measurement, and operate with production-grade discipline.

Responsibilities

Own the end-to-end serving runtime behavior: request lifecycle, streaming semantics, cancellation, interaction with retries, timeouts, and consistent failure modes.
Design and implement batching and scheduling strategy: dynamic batching, admission control, fairness under mixed tenants, priority lanes, and backpressure mechanisms to prevent cascading failures.
Optimize performance at the systems level: reduce time-to-first-token, improve tail latency stability, increase tokens/sec throughput, and improve accelerator utilization under realistic workloads.
Improve memory behavior and cache efficiency: KV-cache policies, fragmentation control, eviction strategies, and safeguards against OOM cliffs and performance thrash.
Drive runtime execution optimizations: operator-level improvements, quantization integration, compilation/tuning paths where appropriate, and parameterization that produces stable performance across deployments.
Establish a performance measurement discipline: reproducible benchmarks, realistic traffic traces, profiling workflows, regression detection gates, and dashboards tied to production outcomes.
Build production readiness into the system: feature-flagged rollouts, canarying, safe configuration changes, and incident playbooks that reduce MTTR.
Partner with networking and infrastructure operations to align deployment topology, failure domains, and capacity constraints to performance and reliability goals.
Collaborate with product and API teams to ensure the serving layer’s guarantees are reflected accurately in external interfaces and customer expectations.

[Employment Type]
Full-time employee
*Probationary period: 3 months

[Salary]
Annual Salary: ¥8,000,000 - ¥14,000,000
Monthly Salary: ¥666,667 - ¥1,166,667 (Monthly Base Salary: ¥666,667 - ¥1,166,667)
■Salary Increases: Available

[Working Hours]
9:00 AM - 6:00 PM (60-minute break)

[Work Location]
Queen's Tower A, 10th Floor, 2-3-1 Minatomirai, Nishi-ku, Yokohama, Kanagawa Prefecture, 220-6010
■Access: 7-minute walk from Sakuragicho Station (all lines), direct access from Minatomirai Station (Toyoko Line, Minatomirai Line)
■Non-smoking workplace
■Changes to work location: Company-designated offices
■Transfers/Secondments: None

[Holidays and Leave]

120 days off per year
Full two-day weekend
Annual paid vacation (minimum 10 days after the seventh month of employment)

[Benefits]
Partial transportation allowance (up to ¥15,000 per month)
Social insurance (health insurance, employee pension insurance, employment insurance, workers' compensation insurance)
Overtime pay: Standard overtime pay

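Application Requirements

Work experience	3 years or more
Career level	Mid-career (experienced)
English level	Business conversation level
Japanese level	None
Education	Bachelor's degree
Current visa	Work authorization in Japan is not required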

Skills and Qualifications

Requirements

5+ years building high-performance systems (model serving, GPU systems, performance engineering, or low-latency distributed systems).
Strong understanding of LLM inference tradeoffs: batching vs latency, prefill vs decode dynamics, cache behavior, memory pressure, and tail latency causes.
Comfort working across Python/C++ stacks with production profiling and debugging tools.
Track record of shipping performance improvements that hold up under production variance and operational constraints.
Strong engineering hygiene: tests, instrumentation, documentation, and careful rollout discipline.
Ability to communicate clearly across teams and operate calmly during incidents.

Work Location

Nishi-ku, Yokohama, Kanagawa Prefecture
Minatomirai Line, Minatomirai Station

Working Conditions

Employment type	Full-time employee
Salary	¥8 million - ¥14 million
Working hours	09:00 - 18:00 (60-minute break)
Holidays and leave	Two-day weekends, holidays, special leave, 120+ days off annually
Industry	Software

Job Category

ICT Specialist (IT, Web, Telecommunications) jobs > Systems Architect jobs

Company Overview

Company type

Small or medium-sized enterprise (300 employees or fewer)
