and if so, how do we know, and how do we measure it? Student evaluations are problematic instruments because they are strongly related to expected grades and even to the physical appearance of the instructor.
In a very interesting new NBER paper, Scott Carrell and James West exploit the particular characteristics of the US Air Force Academy system to try to provide an answer. At the USAFA, students are assigned randomly to professors in a range of core courses and are also randomly assigned to sections of required follow-up courses.
Though there is a lot of heterogeneity across subjects, they generally find that while less experienced professors produce better grades in the initial core course, their students tend to do worse in the follow-on courses.
Since the exams are common across all sections of the core courses and grading is done by a committee of the professors teaching the classes, this is not due to the less experienced professors inflating grades. Instead, it is likely due to the less experienced professors "teaching to the test," while more experienced professors teach in ways that benefit students in later related coursework, such as teaching them general methods of working in the subject field.
Here is a link to an ungated version of the paper.