An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning
attributed to: Dhruv Malik, Malayandi Palaniappan, Jaime F. Fisac, Dylan Hadfield-Menell, Stuart Russell, Anca D. Dragan
Our goal is for AI systems to correctly identify and act according to their human user's objectives. Cooperative Inverse Reinforcement Learning (CIRL) formalizes this value alignment problem as a two-player game between a human and robot, in which only the human knows the parameters of the reward function: the robot needs to learn them as the interaction unfolds. Previous work showed that CIRL can be solved as a POMDP, but with an action space size exponential in the size of the reward parameter space.
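The exponential blow-up arises because an "action" in the reduced POMDP pairs a robot action with a full human decision rule, a mapping from reward parameters to human actions. As a hedged illustration (not the paper's implementation), the sketch below compares a naive one-step backup that enumerates all such decision rules against one that pushes the per-parameter max inside the expectation; all names, problem sizes, and the myopic simplification (a payoff table standing in for reward plus continuation value) are assumptions made for the example.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 reward parameters, 3 robot actions, 3 human actions.
n_theta, n_aR, n_aH = 4, 3, 3
b = rng.dirichlet(np.ones(n_theta))             # robot's belief over theta
# Q[theta, a_R, a_H]: stand-in for reward plus a per-theta continuation value.
Q = rng.standard_normal((n_theta, n_aR, n_aH))

def naive_backup(b, Q):
    """Enumerate every human decision rule delta: theta -> a_H,
    n_aH ** n_theta of them, as in the unstructured POMDP reduction."""
    best = -np.inf
    for aR in range(n_aR):
        for delta in itertools.product(range(n_aH), repeat=n_theta):
            val = sum(b[t] * Q[t, aR, delta[t]] for t in range(n_theta))
            best = max(best, val)
    return best

def efficient_backup(b, Q):
    """The expectation over theta decomposes across parameters, so the max
    over decision rules moves inside it: O(n_theta * n_aH) per robot action."""
    return max(float(b @ Q[:, aR, :].max(axis=1)) for aR in range(n_aR))

# Both backups agree, but the second avoids the exponential enumeration.
assert np.isclose(naive_backup(b, Q), efficient_backup(b, Q))
```

In this myopic case the swap of max and expectation is elementary; the paper's contribution is showing that an analogous restructuring remains sound in the full sequential Bellman update, where continuation values depend on the evolving belief.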
Vulnerabilities & Strengths