It took me a while to figure out what you're doing there, but I got it eventually. The code is a reasonable enough method, though using BC for an 8-bit count is a bit wasteful. You can get the same result by replacing ld bc with ld b, and the dec bc;ld a,b;or c;jp nz, with djnz. The push bc and pop bc have to stay as they are, since the Z80 can only push and pop register pairs.
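As a rough sketch of that substitution (the label, the count value and the loop body are placeholders of mine, not taken from your listing), the counter handling might end up looking like this:

```asm
count   equ  8            ; hypothetical loop count, anything from 1 to 255

        ld   b,count      ; the counter lives in B alone
outer:  push bc           ; save the counter (push works on pairs, so C goes along)
        ; ... loop body that is free to use BC ...
        pop  bc           ; restore the counter
        djnz outer        ; B = B - 1, jump back while B is not zero
```

Two things to watch: djnz is a relative jump, so the loop body has to stay within roughly 128 bytes, and starting with b = 0 gives 256 passes rather than none.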
Also, not sure why you start with 16383, since the first byte of the screen is 16384. By coincidence, the addressing means you do actually get the same effect, but you might not want to rely on that.
I can't help feeling you're not taking advantage of the screen layout to do this more easily. Your code is essentially doing 32 8-pixel blocks horizontally, and for each block filling one pixel line.
This is good if you're working with actual characters (and your ld (hl), source is something other than a constant), but if you're just filling the top 8 pixel lines of the screen (and you really don't want to cheat by filling in with attributes, which is easier and quicker) then I'm sure there's an easier way.
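For comparison, the attribute "cheat" could look something like the sketch below. It assumes you want the whole top character row in one colour. The attribute value is a placeholder of mine; the attribute file starts at 22528 and has one byte per character cell, so the top row is 32 bytes.

```asm
attr    equ  0            ; hypothetical value: black ink on black paper

        ld   hl,22528     ; first attribute byte (top-left cell)
        ld   de,22529     ; the cell after it
        ld   (hl),attr    ; set the first cell directly
        ld   bc,31        ; the remaining 31 cells of the top row
        ldir              ; copy each cell into the next one along
```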
This sets up a loop of 8 passes, each time adding the pass count (in b) to hl and de. Note that it adds to the upper byte, thus b=1 actually adds 256, which is the distance between one pixel line and the next within a character row. Then it does the previous trick of copying byte n to byte n+1 using ldir and a count of 32 (note that ldir always uses bc as its counter, even though the value is less than 256, so b has to be saved around it).
Lastly, it copies b to a so we can use cp 8. The effect of cp is to perform a subtraction but not store the result. Thus when b = 8, 8 - 8 = 0, so the zero flag is set. Until b = 8, the loop continues.
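Putting that description together, a minimal sketch of such a loop might look like this. The label and the fill value are my own choices, and it assumes the routine is called as a subroutine. I've used a count of 31 here because the first byte of each line is written directly; a count of 32 would also write the first byte of the next character row.

```asm
        ld   b,0          ; pass count, 0 to 7
pass:   ld   hl,16384     ; first byte of the screen
        ld   de,16385     ; the byte after it
        ld   a,h
        add  a,b          ; add the pass count to the upper byte: +256 per pass
        ld   h,a
        ld   d,a          ; D and H both start at 64, so the same value serves
        ld   (hl),255     ; hypothetical fill value: all 8 pixels set
        push bc           ; ldir needs BC, so save the pass count
        ld   bc,31        ; the remaining 31 bytes of this pixel line
        ldir              ; copy byte n to byte n+1 along the line
        pop  bc           ; get the pass count back
        inc  b
        ld   a,b
        cp   8            ; computes A - 8 without storing it; zero flag set when B = 8
        jr   nz,pass      ; keep going until all 8 pixel lines are filled
        ret
```

The eight passes hit 16384, 16640, 16896 and so on up to 18176, which are pixel lines 0 to 7 of the top character row.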