| Hiring Company | AI Startup |
| Location | Tokyo - 23 Wards, Shinjuku-ku |
| Job Type | Permanent Full-time |
| Salary | 8 million yen ~ 16 million yen |
1. Evaluation Metric R&D
・Research and implement LLM-as-Judge calibration (rubric design, bias detection, scoring rules).
・Design and validate bespoke evaluation benchmarks to ensure construct validity.
・Apply Reward Modeling and preference learning to improve evaluation accuracy.
2. Automated Pipeline Engineering
・Build scalable automated evaluation pipelines integrated into CI/CD.
・Develop agent evaluation harnesses supporting multi-turn dialogues, tool-use, and long-context scenarios.
3. Red Teaming & Safety
・Automate adversarial testing and build policy compliance verification frameworks.
| Minimum Experience Level | Over 6 years |
| Career Level | Mid Career |
| Minimum English Level | Business Level (Amount Used: English Only) |
| Minimum Japanese Level | None |
| Minimum Education Level | Post Grad Degree (PhD/MBA etc.) |
| Visa Status | Permission to work in Japan required |
Minimum Qualifications
・Education: Master’s degree or higher in CS, Machine Learning, Statistics, Physics, or related quantitative fields.
・Experience: 3+ years as an ML Engineer, Data Scientist, or Research Engineer.
・Technical: Proficiency in Python and ML frameworks (PyTorch, JAX, etc.).
・Domain Knowledge: Deep understanding of Generative AI evaluation (benchmarking, quantitative quality measurement).
・Language: Business-level English proficiency.
Preferred Qualifications
・Publication record at top-tier conferences (NeurIPS, ICML, ACL, etc.).
・Experience with RLHF, DPO, or preference learning.
・Expertise in AI Safety, Responsible AI, and automated red teaming.
| Work Hours | 10:00~19:00 |
| Industry | Internet, Web Services |
| Company Type | Small/Medium Company (300 employees or less) |
| Non-Japanese Ratio | About half Japanese |