First, sorry to bring up this old question again, but I recently received this inquiry from a friend who is also on his way into digital streaming. As we were discussing whether a network switch would help or not, like everyone else does, he sorta struck me with this: "Based on some research, I found that most high-end or high-quality digital audio devices reclock and de-jitter signals on arrival anyway, so what's the possible benefit of doing that in a network switch prior to arrival?"
Based on the question, I'm assuming there is a network delivering audio data to a streamer, which then feeds a DAC. The streamer may be built into the DAC.
There are at least two different protocols involved here: the network protocol and the digital audio protocol. Because the network communication has to be translated into digital audio somewhere, you can basically consider the two halves of the transmission independently, provided you assume the "translation block" in the middle (i.e. where the network data is received, interpreted, and then sent out as digital audio data) is ideal, or you can consider the translation block separately.
Since the network transmission will be using a reliable communications protocol, jitter (or anything else) that actually results in a data error will cause that packet to be rejected and the data to be resent. If this becomes enough of a problem that the somewhat-real-time conversion to digital audio data is delayed, you will notice a gap/pause in the audio playback. You may have experienced this with streaming video: everything pauses while it rebuffers, or buffers more, to catch up. (This is not the same thing as frame drops.)
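To make that concrete, here is a toy simulation of the mechanism (just a sketch with made-up packet sizes and timings, not any real streamer's code): the network delivers every packet bit-perfectly, so the only failure mode you can hear is the buffer running dry, which shows up as a rebuffering pause rather than as distortion.

```python
import random

# Toy model of a playback buffer fed by a reliable network link.
# Every packet arrives intact (it gets retransmitted on error), but a
# badly delayed packet can still drain the buffer and force an audible
# rebuffering pause. All numbers are made up for illustration.

PACKET_MS = 10          # each packet carries 10 ms of audio
BUFFER_TARGET_MS = 200  # wait for 200 ms of audio before starting playback

def simulate(inter_arrival_ms):
    buffered_ms, playing, pauses = 0, False, 0
    for gap in inter_arrival_ms:
        if playing:
            buffered_ms -= gap            # playback drains the buffer while we wait
            if buffered_ms < 0:           # underrun: audible pause, rebuffer
                pauses += 1
                buffered_ms, playing = 0, False
        buffered_ms += PACKET_MS          # packet arrives, bit-perfect
        if not playing and buffered_ms >= BUFFER_TARGET_MS:
            playing = True
    return pauses

# Packets normally arrive much faster than real time; the occasional slow
# delivery (e.g. a retransmission) is simply absorbed by the buffer.
gaps = [random.choice([2, 2, 2, 2, 50]) for _ in range(2000)]
print("rebuffering pauses:", simulate(gaps))
```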
Digital audio signals, like S/PDIF, have used some method to deal with jitter for many decades. The issue has been that, depending on the method used, while the data will be captured and interpreted correctly (absent extremely bad jitter), jitter on the digital audio signal could be carried through to the clock that drives the digital-to-analog conversion. In other words, jitter on the incoming S/PDIF signal would show up as jitter on the outgoing analog audio signal. (How much jitter is needed for it to be audible is the subject of research papers.)
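For a rough feel of the magnitudes involved, you can use the standard slew-rate approximation (a back-of-the-envelope estimate, not a summary of those papers): a timing error of dt on a sine of frequency f produces a worst-case output error of about 2*pi*f*dt of full scale.

```python
import math

# Back-of-the-envelope: how much output error does conversion-clock jitter cause?
# For a full-scale sine of frequency f, a sampling-instant error of dt seconds
# gives a worst-case error of roughly 2*pi*f*dt of full scale (slew rate x dt).
# The jitter values below are arbitrary examples, not measurements of any device.

def jitter_error_dbfs(freq_hz, jitter_s):
    return 20 * math.log10(2 * math.pi * freq_hz * jitter_s)

for jitter_ns in (10, 1, 0.1):
    print(f"{jitter_ns} ns of jitter on a 20 kHz full-scale tone -> "
          f"error around {jitter_error_dbfs(20_000, jitter_ns * 1e-9):.0f} dBFS")
```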
But a few decades ago (instead of many decades ago), people started doing things differently in order to isolate the DAC stage from the incoming jitter: buffering and re-clocking. Extreme jitter would still be a problem, but anything below a certain threshold would be negated/rejected. I'd hazard a guess that almost anything designed in the last 10 years does something like this, so jitter on the incoming digital audio should not be a concern anymore. For instance, pretty much everything measured by Stereophile shows high immunity to jitter these days.
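Conceptually, buffer-and-reclock looks something like the sketch below (a toy model of an asynchronous FIFO, not any particular product's design): samples are written in with whatever timing the incoming interface has, and a clean local oscillator reads them out, so as long as the FIFO neither empties nor overflows, the output timing depends only on the local clock.

```python
from collections import deque

class ReclockingFifo:
    """Toy asynchronous FIFO between a jittery input clock and a clean local clock."""

    def __init__(self, depth):
        self.fifo = deque(maxlen=depth)

    def write(self, sample):
        # Called at the rate of the (jittery) incoming digital-audio clock.
        if len(self.fifo) == self.fifo.maxlen:
            raise OverflowError("input ran too fast for too long")
        self.fifo.append(sample)

    def read(self):
        # Called at the rate of the clean local oscillator that clocks the DAC.
        # Incoming jitter only matters if it is bad enough to empty or overflow the FIFO.
        if not self.fifo:
            raise BufferError("underrun: input ran too slow for too long")
        return self.fifo.popleft()
```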
Also, since the data transmission rate of the network is going to be significantly higher than that of the digital audio stream, and since packetized data has to be converted into a bitstream anyway, the translation block is effectively one gigantic buffer and re-clock stage. There should be no reason for jitter on the network/TCP side to influence jitter on the S/PDIF stream in any way, unless the translation block is too underpowered (extremely unlikely considering the minimum requirements).
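For a sense of the scale difference (rough payload-only numbers, ignoring protocol overhead on both sides):

```python
# Rough bandwidth comparison: even a high-resolution stereo stream uses only
# a small fraction of a gigabit Ethernet link, so the translation block has
# enormous timing headroom for buffering and re-clocking.

audio_bps = 192_000 * 24 * 2          # 192 kHz, 24-bit, stereo
gigabit_ethernet_bps = 1_000_000_000  # nominal line rate

print(f"audio stream:  {audio_bps / 1e6:.1f} Mbit/s")   # ~9.2 Mbit/s
print(f"link headroom: ~{gigabit_ethernet_bps / audio_bps:.0f}x")
```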
If the network switch is actually impacting the sound, it is not going to be due to jitter or any sort of dejittering. It has to be something else, like Tom (W9TR) says.