There are really two separate questions here.
1. Why are the second and following passes faster than the first?
2. Is the first pass as fast as it can be?
Caching in -B is the answer to the first one: the first pass has to pull the blocks off disk, and later passes find them already in the buffer pool. 30000 is not a large -B for a production system, but might be reasonable for a development box. Use promon during the second pass to see what percentage of reads you are satisfying from the cache, and that will tell you whether making -B bigger will make any difference. Or just make it bigger and see what happens.
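To put rough numbers on that, here is a small Python sketch of the arithmetic. It is only an illustration: the 8192-byte block size and the read counts are made-up values, not figures from your system, and the function names are mine. Plug in your real database block size and the logical and disk read counts you see in promon.

```python
def buffer_pool_mb(buffers: int, block_size_bytes: int) -> float:
    """Approximate memory used by -B: one database block per buffer."""
    return buffers * block_size_bytes / (1024 * 1024)


def hit_percentage(logical_reads: int, disk_reads: int) -> float:
    """Share of read requests satisfied from the buffer pool, in percent."""
    if logical_reads == 0:
        return 0.0  # nothing read yet, so there is no ratio to report
    return 100.0 * (logical_reads - disk_reads) / logical_reads


if __name__ == "__main__":
    # Assumed block size of 8192 bytes; check your own database.
    print(buffer_pool_mb(30000, 8192))
    # Illustrative counts only: 200,000 requests, 5,000 of which hit disk.
    print(hit_percentage(200000, 5000))
```

If the percentage is already very high on the second pass, a bigger -B is unlikely to buy much for this query.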
As for the second, you haven't really given us any information except for the contrast with the second pass, and that contrast doesn't mean much, since there is a good chance the second pass is hardly reading the disk at all. If you are sure that you are bracketing the index properly (check with COMPILE LISTING), then you are reading only what you need, and that is the speed you can hope for. 200,000 isn't 20,000,000, but it isn't 200 either.