Job ID : 1560023 Date Updated : April 1st, 2026

AI QA Specialist (LLM Evaluation)

Hiring Company	AI QA Specialist (LLM Evaluation)
Location	Tokyo - 23 Wards, Shinjuku-ku
Job Type	Permanent Full-time
Salary	7 million yen ~ 14 million yen

Work Style

Casual Clothing Side Business Ok Flex Time

Job Description

As an AI QA Specialist, you will lead the design, construction, and operation of the quality evaluation infrastructure for AI agents.

Own the entire process from evaluation metric selection and design to integrating automated evaluation pipelines into CI/CD
Plan and execute red teaming to detect safety risks before release
Quantitatively verify the effectiveness of quality improvements through A/B test analysis based on statistical experimental design
Feed evaluation signals back to the research and development teams, creating a compounding feedback loop for model improvement
Ensure the quality of products used in production by ~200 companies through a "science of quality" approach

Evaluation Infrastructure Design & Development
- Design, build, and maintain evaluation sets (synthetic data + real logs)
- Select and design evaluation metrics (win rate, task success, factuality, harm detection)
- Build automated evaluation pipelines and integrate them into CI/CD
- Design agent harnesses (multi-turn, tool use, long-context support)

Safety & Quality Verification
- Plan and execute red-teaming (adversarial testing)
- Build safety and policy compliance verification frameworks
- Design and run prompt/tool regression tests
- Analyze and improve issues related to hallucination, bias, and output quality

Statistical Analysis & Reporting
- Design and analyze statistical experiments (A/B tests, significance testing)
- Create quality reports and improvement proposals
- Visualize regression detection and quality trends
- Feed evaluation signals back to research and development teams

General Requirements

Minimum Experience Level	Over 6 years
Career Level	Mid Career
Minimum English Level	Business Level (Amount Used: English usage about 25%)
Minimum Japanese Level	None
Minimum Education Level	Technical/Vocational College
Visa Status	Permission to work in Japan required

Required Skills

You May Be a Good Fit If You Have

Bachelor's degree or equivalent practical experience in Computer Science, Software Engineering, Artificial Intelligence, Machine Learning, Mathematics, Physics, or related fields
3+ years of practical experience as a software engineer or QA engineer
Knowledge of LLM / generative AI evaluation methods (prompt evaluation, quantitative output quality measurement, hallucination detection, etc.)
Foundational knowledge of statistics and experimental design
Experience building evaluation pipelines in Python
Experience integrating tests into CI/CD pipelines
Experience designing prompt / tool regression tests

Strong Candidates May Also Have

NLP / ML evaluation benchmark design experience
Knowledge of AI safety / Responsible AI
Red teaming / penetration testing experience
Experience evaluating multi-agent workflows, tool use, and long-context scenarios
Large-scale data processing experience (Spark / BigQuery, etc.)
Ability to read, comprehend, and reproduce research papers
Technical communication ability in English

Job Location

Tokyo - 23 Wards, Shinjuku-ku

Work Conditions

Job Type	Permanent Full-time
Salary	7 million yen ~ 14 million yen
Work Hours	10:00～19:00
Industry	Internet, Web Services

Company Details

Company Type	Small/Medium Company (300 employees or less)
Non-Japanese Ratio	About half Japanese
