Job ID: 1560023 Updated: April 1, 2026

AI QA Specialist (LLM Evaluation)

Hiring company	AI QA Specialist (LLM Evaluation)
Location	Tokyo (23 wards), Shinjuku-ku
Employment type	Permanent, full-time
Salary	JPY 7,000,000 to 14,000,000

Work style

Casual dress code, side jobs permitted, flextime

Job details

As an AI QA Specialist, you will lead the design, construction, and operation of the quality evaluation infrastructure for AI agents.

Own the entire process from evaluation metric selection and design to integrating automated evaluation pipelines into CI/CD
Plan and execute red teaming to detect safety risks before release
Quantitatively verify the effectiveness of quality improvements through A/B test analysis based on statistical experimental design
Feed evaluation signals back to the research and development teams, creating a compounding feedback loop for model improvement
Ensure the quality of products used in production by ~200 companies through a "science of quality" approach

Job Description

Evaluation Infrastructure Design & Development
- Design, build, and maintain evaluation sets (synthetic data + real logs)
- Select and design evaluation metrics (win rate, task success, factuality, harm detection)
- Build automated evaluation pipelines and integrate them into CI/CD
- Design agent harnesses (multi-turn, tool use, long-context support)

Safety & Quality Verification
- Plan and execute red-teaming (adversarial testing)
- Build safety and policy compliance verification frameworks
- Design and run prompt/tool regression tests
- Analyze and improve issues related to hallucination, bias, and output quality

Statistical Analysis & Reporting
- Design and analyze statistical experiments (A/B tests, significance testing)
- Create quality reports and improvement proposals
- Visualize regression detection and quality trends
- Feed evaluation signals back to research and development teams

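Application requirements

Work experience	6 years or more
Career level	Mid-career
English level	Business conversation level (English usage: about 25%)
Japanese level	None
Minimum education	Vocational school graduate
Visa status	Permission to work in Japan is required

Skills and qualifications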

You May Be a Good Fit If You Have

Bachelor's degree or equivalent practical experience in Computer Science, Software Engineering, Artificial Intelligence, Machine Learning, Mathematics, Physics, or related fields
3+ years of practical experience as a software engineer or QA engineer
Knowledge of LLM / generative AI evaluation methods (prompt evaluation, quantitative output quality measurement, hallucination detection, etc.)
Foundational knowledge of statistics and experimental design
Experience building evaluation pipelines in Python
Experience integrating tests into CI/CD pipelines
Experience designing prompt / tool regression tests

Strong Candidates May Also Have

NLP / ML evaluation benchmark design experience
Knowledge of AI safety / Responsible AI
Red teaming / penetration testing experience
Experience evaluating multi-agent workflows, tool use, and long-context scenarios
Large-scale data processing experience (Spark / BigQuery, etc.)
Ability to read, comprehend, and reproduce research papers
Technical communication ability in English

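Location

Tokyo (23 wards), Shinjuku-ku

Working conditions

Employment type	Permanent, full-time
Salary	JPY 7,000,000 to 14,000,000
Working hours	10:00 to 19:00
Industry	Internet, Web Services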

Company overview

Company type	Small or medium-sized company (300 employees or fewer)
Non-Japanese ratio	About half non-Japanese