Problem with service discovery (2012-10-15) #2

Closed
opened 2020-07-18 13:44:06 +00:00 by df · 12 comments

Jens Kristian Søgaard

Hi,

Continuing from the blog post comments at:

http://irq5.wordpress.com/2011/04/10/publishing-services-over-mdns-in-c/

I have fetched the latest commits from 12th of October 2012, but I still experience the same problem.

Scenario:

I have two servers running Linux and tinysvcmdns. They call mdnsd_set_hostname() to register the hostnames "Server1" and "Server2" respectively, with two different IP addresses. Afterwards they call mdns_register_svc() to register a service named "Server1 Service" and "Server2 Service" respectively, with the same protocol and port number.

My client computer is running Windows and Apple's Bonjour service.

If I don’t do any service discovery from the client PC for a little while (a few minutes) and then do a discovery, it will only find one of the two servers.

If I do a discovery again a second later, it will find both servers – and it will do that again and again without any problems… until I wait for a few minutes, then the first time I try it will only find 1 server again.


geekman (2012-10-15) repo owner

Hi Jens, I just pushed another fix to address the issue. Could you try it and let me know if it works for you now?


Jens Kristian Søgaard (2012-10-15) reporter

Thanks a lot!

I have tested for a few minutes, but everything seems to be working perfectly now!

I'll do some more testing over the rest of the day!

Sorry about that... but the issue popped up again after some more testing :-(


geekman (2012-10-16) repo owner

is the bug reproducible? i'd like to see if i can reproduce the problem on my end.


Jens Kristian Søgaard (2012-10-16) reporter

Well, it has happened 20-30 times today for me, so reproducible for me.

I haven't tried to set up any alternative client systems, so I don't know if this is only something that happens with Windows clients or Apple Bonjour clients... but both Windows and Apple Bonjour are in use in a lot of places, so the system would not be useful for me if it did not support those.

Let me know what, if any, specifics you need from me?


geekman (2012-10-16) repo owner

Firstly, how do you perform service discovery on Windows? Secondly, when does the bug occur? Does it consistently happen at a certain time, for example the 5th discovery, or after 10mins? This is what I mean by reproducible, along with the conditions under which the bug can be reproduced.

Other than the issue I have fixed, I can't think of anything that would cause tinysvcmdns to not respond. I have also verified that it is responding correctly so far, to keep the service records in Bonjour "alive". I have both Windows and Apple clients running Bonjour as well.

Unless I can reproduce the bug that you are facing, it will not be possible for me to fix it.


Jens Kristian Søgaard (2012-10-16) reporter

Service discovery on Windows is performed using the Apple Bonjour SDK. I call the function DNSServiceBrowse in domain "local." for protocol "_myprotocol._tcp", which is the same protocol as advertised using tinysvcmdns.

The number of discoveries does not matter - I can do 100 discoveries right after each other and they will all succeed.

The bug occurs when I haven't done discoveries for a period of time, and then the first discovery after the pause will only find one of the servers. I have tried to do tests here to determine what the period of time specifically is. My tests indicate that it is somewhere between 90 seconds and 120 seconds.

I looked through the tinysvcmdns source code after I found the time to be approx. two minutes.

I stumbled over this line in mdns.c:

#define DEFAULT_TTL 120

I tried changing that to 1200 seconds instead of 120 sec.

Now I discover both servers every time even when I wait 5 minutes between discoveries.

I don't know if this helps you in any way?


geekman (2012-10-16) repo owner

Nope, that TTL does not affect anything. If a client is interested in a particular service, it will keep track of the TTL and broadcast a query again as the 120s is about to expire. This is when tinysvcmdns will respond to extend the client's TTL by another 120s.
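
As a rough illustration of that refresh timing, here is a minimal sketch assuming the client follows the schedule recommended in RFC 6762, section 5.2: a refresh query at 80% of the record's TTL, then at 85%, 90% and 95% if no answer arrives. The helper name `refresh_schedule` is hypothetical and is not part of tinysvcmdns or the Bonjour SDK, and the small random jitter the RFC also recommends is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Percentages of the TTL at which a querier plans its refresh queries
 * (RFC 6762, section 5.2). Jitter is omitted so the sketch stays
 * deterministic. */
static const uint32_t REFRESH_PERCENT[] = { 80, 85, 90, 95 };
#define REFRESH_STEPS (sizeof REFRESH_PERCENT / sizeof REFRESH_PERCENT[0])

/* Hypothetical helper: writes into out[] the offsets, in seconds after
 * the record was received, at which a refresh query would be sent.
 * Returns the number of offsets written (at most out_len). */
size_t refresh_schedule(uint32_t ttl_seconds, uint32_t *out, size_t out_len)
{
    assert(out != NULL);

    size_t n = 0;
    for (size_t i = 0; i < REFRESH_STEPS && n < out_len; i++) {
        /* 64-bit intermediate avoids overflow for very large TTLs. */
        out[n++] = (uint32_t)((uint64_t)ttl_seconds * REFRESH_PERCENT[i] / 100);
    }
    return n;
}
```

Under these assumptions, a 120s TTL gives refresh queries at 96, 102, 108 and 114 seconds; if none of them is answered, the record drops out of the client's cache at 120 seconds.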


Jens Kristian Søgaard (2012-10-16) reporter

It does seem to have an effect here. Could something cause that re-query not to happen, or to fail?

I haven't learned how mDNS works in detail, so I'm a bit in the blind (for now).


geekman (2012-10-16) repo owner

I'm guessing the effect is that you don't see the problem happening so soon. Previously there were bugs that caused tinysvcmdns to not respond, so services would sometimes disappear as soon as the TTL expired, but this has since been fixed.

Until you can see a pattern of when the bug is occurring, there's really nothing much I can do.


Jens Kristian Søgaard (2012-10-16) reporter

What kind of pattern description do you need exactly? - I'm willing to do tests, if you can give me some direction.

The pattern I see right now is that I do a discovery and find both servers. Then I wait 120 seconds and do a discovery again - and now I find only one server.

Which one I will detect as the "lone server" is seemingly random, as it changes from test to test.


geekman (2012-10-16) repo owner

Would you mind doing a network packet capture of the mDNS packets? You can use Wireshark for that. This way I can see what tinysvcmdns is broadcasting, and how your Windows client is doing service discovery. I'd like to see what is happening when only one of the servers responds.

You need only to capture mDNS traffic. The capture filter looks like this:

![mDNS Capture Filter](https://irq5.files.wordpress.com/2012/10/mdns-capture-filter.png)

geekman (2012-10-18) repo owner

The bug, reported and reproduced, was described as follows:

If I don’t do any service discovery from the client PC for a little
while (a few minutes) and then do a discovery, it will only find
one of the two servers.

This problem occurs because when one of the servers responds to the query _http._tcp with server1._http._tcp, the other server checks the known-answer list and thinks that its own _http._tcp was already added and therefore does not respond. This causes server2 to not be found.

This was fixed in rev ea6495c by calling rr_entry_match(), which specifically checks the target of the PTR as well, ensuring that server2 does not incorrectly recognize someone else's record as its own.

In testing, both server1 and server2 now respond to the queries they are supposed to.
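
To make the difference concrete, here is a minimal sketch of the two comparisons. It is not the tinysvcmdns code: `struct ptr_record`, `match_name_only`, `match_name_and_target` and `should_respond` are hypothetical names used only for illustration, and names are compared with plain `strcmp` although real DNS names compare case-insensitively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified PTR record: the owner name that was queried
 * plus the service instance it points to. */
struct ptr_record {
    const char *name;    /* e.g. "_http._tcp.local" */
    const char *target;  /* e.g. "server1._http._tcp.local" */
};

typedef bool (*match_fn)(const struct ptr_record *, const struct ptr_record *);

/* Buggy comparison: only the owner name is checked, so any PTR for the
 * same service type looks like our own record. */
bool match_name_only(const struct ptr_record *a, const struct ptr_record *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a->name, b->name) == 0;
}

/* Fixed comparison: the PTR target must match as well. */
bool match_name_and_target(const struct ptr_record *a, const struct ptr_record *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a->name, b->name) == 0 &&
           strcmp(a->target, b->target) == 0;
}

/* Returns true when `mine` still has to be sent, i.e. none of the
 * records already seen as answers matches it under `match`. */
bool should_respond(const struct ptr_record *mine,
                    const struct ptr_record *known, size_t n_known,
                    match_fn match)
{
    for (size_t i = 0; i < n_known; i++) {
        if (match(mine, &known[i]))
            return false;  /* treated as already answered: stay silent */
    }
    return true;
}
```

With server1's PTR already in the list, the name-only comparison makes server2 stay silent, while the name-and-target comparison lets server2 answer and still suppresses a true duplicate of server1's record.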

changed status to resolved

df closed this issue 2020-07-18 14:07:20 +00:00
Reference: df/zeroconf#2