Bittorrent implementation in java && need some info on swarm behaviour [duplicate]


This question already has an answer here:
Creating a torrent client in Java? [duplicate]
(1 answer)
Closed 7 years ago.
I'm developing a BitTorrent client in Java. I know there are a lot of libraries out there, online, but I can't help it; I want my own. Anyway, I noticed some weird behaviors and maybe you guys know something that I'm missing:
About 80% of all peers I'm trying to connect to result in unsuccessful connections (either socketTimeOut or "can't connect" errors). Obviously, the list of peers is received from the trackers. I also tested some of the IPs at random by pinging them; the ping is usually successful.
When I do connect:
50% drop connection after HandShake,
on 30% I noticed a weird behaviour: I receive Handshake, I receive BitField (they have all pieces), then I get bombarded with 20+ Have messages (I checked the piece indexes, and they had already been announced in the BitField), and then they drop the connection, which is weird.
(For all statistics, figures are not precise.)
Some BitTorrent questions:
UPDATE #4: I'm striking out some questions because I consider them answered.
this was the '80% failed connect rate question': What could be the reason for my 80% failed-to-connect rate? This can't be bad luck, in the sense that every client I tried to connect to had no more room for me. I'm listening on 6881, but also tested with other ports. Yesterday I had great success, a bunch of connections accepted (same code, a few changes in past week), Piece messages started flowing.. so my code is not totally useless.
Do torrent clients send, before closing, a last message to the tracker with event=stopped, so that it updates its internal database and stops handing out useless peer info in its responses? Or is it just that they should? It really seems I'm receiving dead peers.
Is the order of received peers of any importance? Maybe percentage of completion.. or really random.
Also, every now and then I receive a peer with port 0, which makes my Socket constructor throw an exception. What does port 0 mean? Can I contact it on any port?
Can my PeerId (that I send in Handshake or when I announce myself to the tracker) influence whether the torrent clients I'm trying to communicate with will continue a started connection? Meaning what if I lie and say that I'm an Azureus client by using '-AZ2060-' as my ID?
this was the 'piece availability scaring off peers question': Does my piece availability scare off peers? I'm trying to connect, and I send an empty bitfield (I have no pieces, [length: 1][Id = 5][payload: {}]); it seems that they send bitfield, I send bitfield.. (some send Have messages like crazy), they realise I'm poor, they drop me.. some drop connection after handshake. (How rude.)
Is there a benefit of not using the classic port interval: 6881 - 6889?
this was the 'list of bad peers question': Do torrent clients keep internally a list of bad peers (like a black list)? Sometimes after finding a nice peer, I continually used its info in my tests but only 1/3 connection was accepted. Sometimes 10 minutes had to pass to have a successful connection again.
UPDATE #1: it seems that connections with μTorrent clients behave in the aforementioned pattern (BITFIELD, HAVE bombardment, close connection). I tested locally with a bunch of BitTorrent clients (μTorrent, BitTorrent, Vuze, BitComet, Deluge) and only noticed this pattern on μTorrent. On the others, communication was fine (HS, BITFIELD, UNCHOKE & happy piece sharing). Now, this μTorrent is probably the most popular BitTorrent client (6/8 connections started were μTorrent), so… any ideas?
UPDATE #2: In terms of keeping a "bad list," it does seem so (and it actually makes sense to do so). For example, with μTorrent, I noticed the following no-connect intervals (30s, 1min, 1min30s, 2min.. ). By "no-connect" I mean that, after the previous connection ended, no new connection was accepted for x seconds.
UPDATE #3: That HAVE message bombardment might have been the so-called "lazy bitfield" (I did a couple of tests, and each piece mentioned in a HAVE was not present in the BITFIELD). I see that μTorrent and BitTorrent use this approach.
Another conclusion: Some clients are more restrictive in terms of respecting the BitTorrent specs and will close the connection if you break a rule. Ex: I noticed with BitTorrent and BitTornado that if you send a bitfield message but have no pieces they will close the connection (no pieces = empty bitfield.. but specs say "It is optional, and need not be sent if a client has no pieces"), while others close the connection if you send any type of message before they send an UNCHOKE message (not even INTERESTED).
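A possible reading of the spec on this point: the bitfield payload is expected to be exactly ceil(numPieces / 8) bytes, with piece 0 in the high bit of the first byte, so a peer with no pieces either skips the message or sends that many zero bytes. A message framed as [length: 1][Id = 5][payload: {}] declares zero payload bytes, which a strict client may treat as a wrong-size bitfield. The sketch below shows one way to frame the message under that assumption; the class and method names (`BitfieldMessage`, `encode`) are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.util.BitSet;

final class BitfieldMessage {
    static final byte ID_BITFIELD = 5;

    /** Encodes [4-byte length][id = 5][bitfield] for a torrent with numPieces pieces. */
    static byte[] encode(BitSet have, int numPieces) {
        int payloadLen = (numPieces + 7) / 8;           // one bit per piece, rounded up to whole bytes
        byte[] bits = new byte[payloadLen];
        for (int i = have.nextSetBit(0); i >= 0 && i < numPieces; i = have.nextSetBit(i + 1)) {
            bits[i / 8] |= (byte) (0x80 >> (i % 8));    // piece 0 is the high bit of the first byte
        }
        return ByteBuffer.allocate(4 + 1 + payloadLen)  // ByteBuffer is big-endian by default
                .putInt(1 + payloadLen)                 // length prefix counts the id byte plus the payload
                .put(ID_BITFIELD)
                .put(bits)
                .array();
    }
}
```

For a 10-piece torrent and no pieces, `BitfieldMessage.encode(new BitSet(), 10)` yields the 7 bytes `00 00 00 03 05 00 00`: the payload is all zeros but still two bytes long.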
UPDATE #4:
Since I'm mostly interested in the first question (What could be the reason for my 80% failed-to-connect rate? The struck-out questions are more than probably linked), here are some explanations of why connections were sometimes unsuccessful:
1) If I start a connection with a peer shortly after stopping a previous connection (by stop I mean closing the socket), the peer on the other side won't know until its next read/write.
Details:
- I noticed this a bunch of times; it is most obvious after finishing a download. If I close the connection, the peer won't realise it until it tries to send a new KEEP_ALIVE (~2 minutes). But if I close in the middle of a REQUEST-PIECE exchange, the peer realises pretty fast. In the first scenario, after closing the connection I am still present in the uTorrent peer tab. If I look inside the logger tab, after about 2 minutes it realises that I am gone.
2) It seems that uTorrent sees my BITFIELD message as corrupted (and obviously should close the connection after receiving it). This doesn't always happen; I also checked and rechecked, the message is OK, and with other BT clients there were no such problems.
Details:
- if I look inside the uTorrent logger tab, it displays "Disconnected: Bad packet" right after I send the bitfield
- I'm planning to try an implementation of lazy bitfield; maybe that lets me avoid this (I also see that the majority of BT clients do this)
3) (more than probably linked to #1) When uTorrent doesn't allow me to reconnect, I see in the logger tab: "Disconnect: already have equal connection (dropped extra connection)". Currently I choose a random local port when initiating a new connection (I saw this implemented in the majority of BT clients), but this doesn't trick it; it still sees that I'm a peer already present in its "peer list" (it probably matches on IP). But in 30% of the tests, same scenario, it does allow me to reconnect :) I have no explanation for that yet.
4) One more thing: it seems that the listener for incoming connections is still alive after you close a torrent in uTorrent (by close I mean: right click + stop). This means that I can still start a connection and send HANDSHAKE; after this, I'm disconnected (it doesn't HANDSHAKE back). Message in the uTorrent logger: "Disconnect: No such torrent: 80FF40A75A3B907C0869B798781D97938CE146AE", this long string being my info hash. I've seen this while testing with other BT clients too.
Some more info:
- Scenarios with uTorrent of type full-upload/partial-upload and full-download are successful; those of partial-download not so much, probably due to #2.
- With uTorrent I still get the BITFIELD + HAVE bombardment + close connection pattern. As I remember, the logger tab shows the same message, "Disconnected: Bad packet", probably due to #2.
- Besides uTorrent, I've tested with BitTorrent, BitTornado, BitComet, qBittorrent, FlashGet (communication was OK) and with Vuze, FrostWire, Shareaza (with these guys, it was super OK).
- Not all clients behave the same. Ex: FlashGet and uTorrent (and BitComet?) don't unchoke until you send INTERESTED, while others seem to unchoke right after BITFIELD. In this sense I'm planning to treat clients differently somehow (I really think this is necessary), probably by guessing their name from the peer ID (there are only 2 naming conventions) and starting from there. I already have something implemented; this is how I know that I connected to a client of type uTorrent.
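Guessing the remote client is usually done from the 20-byte peer ID received in the handshake. As a rough sketch, the Azureus-style convention (the one behind '-AZ2060-') is a dash, a two-character client code, four version characters and another dash, followed by arbitrary bytes. The class below only handles that convention and returns nothing for anything else; `PeerIdParser` and `ClientTag` are illustrative names, not part of any library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

final class PeerIdParser {
    /** Two-character client code plus the four-character version field. */
    static final class ClientTag {
        final String code;
        final String version;

        ClientTag(String code, String version) {
            this.code = code;
            this.version = version;
        }
    }

    /** Returns the client tag if peerId follows the "-XXvvvv-" layout, otherwise empty. */
    static Optional<ClientTag> parseAzureusStyle(byte[] peerId) {
        if (peerId == null || peerId.length != 20) {
            return Optional.empty();                    // peer IDs are always 20 bytes
        }
        if (peerId[0] != '-' || peerId[7] != '-') {
            return Optional.empty();                    // some other naming convention
        }
        // Bytes 1..6 hold the client code and version; ISO-8859-1 maps bytes to chars one to one.
        String head = new String(peerId, 1, 6, StandardCharsets.ISO_8859_1);
        return Optional.of(new ClientTag(head.substring(0, 2), head.substring(2)));
    }
}
```

A caller could then branch on `tag.code` (for example "AZ") to pick client-specific behaviour, bearing in mind that a peer ID can be faked.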

OK, I have an answer for you, but I must warn you that I have never written a BitTorrent client myself and some answers might not be 100% accurate; everything I wrote comes from my understanding of the global view of how BitTorrent works. So I apologize if I waste your time, but I still think my answer can teach you something about the core of what you are asking.
•What could be reason of my 80% failed to connect rate?
Very complicated to explain in one linear explanation, but:
- the BitTorrent ideology is tit-for-tat: if you're not giving/having tit, you ain't getting tat..
UNLESS you have just started to download, in which case you might get a "donation" to start with...
OR the other side is a dedicated seeding machine, in which case it might check whether you are a giver or just a taker... OR many peers are currently downloading this... OR (fill in your idea..)
So you see, there are many, and actually very smart, mechanisms to keep the swarm agile and efficient. While some of them can be traced from your machine, most of them cannot even be monitored by your machine, let alone controlled by it.
•Do torrent clients send, before closing, a last message to tracker with event=stopped to make it update its internal database with peer info so that it won't send, as a response, a list with useless peer info? Or just they should.. because it really seems I'm receiving dead peers.
It depends on the client code - some might do that some not.. (keep reading)
•Is the order of received peers of any importance? Maybe percentage of completion.. or really random.
It depends on the server code - some might do that some not.. (keep reading)
A note on those two (keep reading) remarks: keep in mind that in a P2P network there is no authority that strictly binds clients, or even servers, to uphold the protocol to the letter. Even if the protocol states that something should be done, that does not mean every client will implement it, or act the same way upon it or upon its absence.
•Also, every now and then I receive a peer with port 0, which makes my Socket constructor throw an exception. What does port 0 mean? Can I contact it on any port?
Port 0 is kind of a wildcard, but only on the local side: if you bind a socket to port 0, the OS automatically assigns the next available port (some say the next available port above 1023, but I never tested that). It is not a usable destination port, so a peer advertised with port 0 cannot be contacted as listed and is best skipped.
•Can my PeerId (that I send in Handshake or when I announce myself to the tracker) influence whether the torrent clients I'm trying to communicate with will continue a started connection? Meaning what if I lie and say that I'm an Azureus client by using '-AZ2060-' as my ID?
The other side will think you are Azureus, and if other Azureus clients favour connections to fellow Azureus clients on that basis (and that's a big if), you will benefit from it.
•Does my piece availability scare off peers? I'm trying to connect, and I send an empty bitfield (I have no pieces, [length: 1][Id = 5][payload: {}]); it seems that they send bitfield, I send bitfield.. (some send like crazy Have messages), they realise I'm poor, they drop me.. some drop connection after handshake. (How rude.)
possible..
•Is there a benefit of not using the classic port interval: 6881 - 6889?
I don't think so - except maybe confusing your ISP..
• Do torrent clients keep internally a list of bad peers (like a black list)? Sometimes after finding a nice peer, I continually used its info in my tests but only 1/3 connection was accepted. Sometimes 10 minutes had to pass to have a successful connection again.
Depends on client Code.
Summary
It's a jungle out there: everyone can write their own logic as long as they send the correct protocol commands. Your questions focus on the logical behaviour of clients, but there is no common ground, as you have probably understood by now. This is also the beauty of BitTorrent and probably the main reason for its success.
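On the practical side of the port 0 question, one defensive option is to drop unusable entries while decoding the tracker response. The sketch below assumes the tracker answered in the compact format (6 bytes per peer: 4 bytes of IPv4 address followed by a 2-byte big-endian port), which the question does not confirm; `CompactPeers` is an illustrative name.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CompactPeers {
    /** Decodes a compact peer string and skips entries whose port is 0. */
    static List<InetSocketAddress> parse(byte[] peers) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (int off = 0; off + 6 <= peers.length; off += 6) {
            // Port is unsigned 16-bit, network byte order.
            int port = ((peers[off + 4] & 0xFF) << 8) | (peers[off + 5] & 0xFF);
            if (port == 0) {
                continue;                               // not a connectable destination port
            }
            try {
                InetAddress ip = InetAddress.getByAddress(Arrays.copyOfRange(peers, off, off + 4));
                result.add(new InetSocketAddress(ip, port));
            } catch (UnknownHostException e) {
                throw new AssertionError("a 4-byte address is always valid", e);
            }
        }
        return result;
    }
}
```

Connecting to the surviving entries with an explicit timeout, for example `socket.connect(address, timeoutMillis)`, keeps dead peers from stalling the client for long.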

Related

How ServerSocket deal with multiple connection from clients at the same time?

OK, so let's clarify the questions...
I'm studying sockets in Java. From what I have understood so far about this subject:
To let multiple clients connect to a single address (port) on the server, it is necessary to assign each client connection to another thread.
Based on that, I got confused about some things AND could not find any acceptable answer here or on Google so far.
If Socket is synchronous, what happens if 2 clients try to connect AT THE SAME TIME, and how does the server decide who connects first?
How does the server process multiple messages from one client? I mean, does it process them in order? Are the responses returned in order?
The same question as above, BUT with multiple messages from multiple clients?
If the messages are not ordered, how can I achieve that (in Java)?
Sorry about all those questions, but for me all of them are related...
Edit:
As the comment said, I misunderstood the concept of synchronization, so changed that part.
Guys, we ask here to LEARN, not to get judged by others, so think about that before giving a -1 vote, OK?

what happens if 2 clients try to connect AT THE SAME TIME
It is impossible for 2 clients to connect at exactly the same time: networking infrastructure guarantees it. Two requests happening at the exact same time is called a collision (wikipedia), and the network handles it in some way: it can be through detection or through avoidance.
How the server process multiple messages from one client? I mean, does it process in order?
Yes. The Socket class API uses the TCP/IP protocol, which includes sequence numbers in every segment, and re-orders segments so that they are processed in the order they are sent, which may be different from the order they are received.
If you used DatagramSocket instead, that would use UDP, which does not guarantee ordering.
Same question above BUT with multiple messages from multiple clients?
There are no guarantees of the relative ordering of segments sent from multiple sources.
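To make the per-connection ordering concrete, here is a minimal thread-per-connection echo server, written as a sketch rather than a production design. Each client gets its own thread that reads lines from its own TCP stream, so the lines of one client are always seen in the order they were sent, while nothing orders one client's lines relative to another's. The class name `OrderedEchoServer` and the `echo:` prefix are made up for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class OrderedEchoServer implements AutoCloseable {
    private final ServerSocket server;

    OrderedEchoServer() throws IOException {
        // Port 0 asks the OS for any free port; loopback keeps the demo local.
        server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread acceptor = new Thread(this::acceptLoop, "acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    int port() {
        return server.getLocalPort();
    }

    private void acceptLoop() {
        while (true) {
            try {
                Socket client = server.accept();        // pending connections are handed over one at a time
                Thread worker = new Thread(() -> serve(client), "client-worker");
                worker.setDaemon(true);
                worker.start();
            } catch (IOException e) {
                return;                                 // accept() fails once the server socket is closed
            }
        }
    }

    private static void serve(Socket client) {
        try (Socket s = client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println("echo:" + line);            // replies leave in the same order the lines arrived
            }
        } catch (IOException ignored) {
            // client went away; nothing to clean up beyond the try-with-resources
        }
    }

    @Override
    public void close() throws IOException {
        server.close();
    }
}
```

If a global order across clients is needed, it has to be imposed by the application, for example by funnelling all messages through one queue.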

Socket Programming - Does Server Queue Requests?

I am working on socket programming on Java recently and something is confusing me. I have three questions about it.
First one is;
Java has a ServerSocket constructor that can take up to 3 parameters: port, backlog and IP address. Backlog means the number of clients that can wait in a queue to connect to the server. Now let's think about this situation.
What happens if 10 clients try to connect to this server at the same time?
Does the server drop the last 5 clients that tried to connect? Let's increase the number of clients to 1 million per hour. How can I handle all of them?
Second question is;
Can a client send messages concurrently without waiting for the server's response? What happens if a client sends 5 messages to a server that has a backlog size of 5?
The last one is not actually a question. I have a plan in mind for load balancing. Let's assume we have 3 servers running on a machine.
Let the servers' names be A, B and C, all of them running smoothly. According to my plan, I give each of them a priority based on incoming messages, so the smallest priority means the most available server. For example:
Initial priorities -> A(0), B(0), C(0), and the response arrives at the end of the 5th time unit.
1.Message -> A (1), B(0), C(0)
2.Message -> A (1), B(1), C(0)
3.Message -> A (1), B(1), C(1)
4.Message -> A (2), B(1), C(1)
5.Message -> A (2), B(2), C(1)
6.Message -> A (1), B(2), C(2)
.
.
.
Is this logic good? I bet there is a far better logic. What do I do to handle more or less a few million requests in a day?
PS: All this logic is going to be implemented into Java Spring-Boot project.
Thanks

What happens if 10 clients try to connect this server at the same time?
The javadoc explains it:
The backlog argument is the requested maximum number of pending connections on the socket. Its exact semantics are implementation specific. In particular, an implementation may impose a maximum length or may choose to ignore the parameter altogther.
Lets increase the number of clients up to 1 million per hour. How can I handle all of them?
By accepting them fast enough to handle them all in one hour. Either the conversations are so quick that you can just handle them one after another. Or, more realistically, you will handle the various messages in several threads, or use non-blocking IO.
Can a client send messages concurrently without waiting server's response?
Yes.
What happens if a client sends 5 messages into server that has 5 backlog size?
Sending messages has nothing to do with the backlog size. The backlog is for pending connections. Messages can only be sent once you're connected.
All this logic is going to be implemented into Java Spring-Boot project.
Spring Boot is, most of the time, not used for low-level socket communication, but to expose web services. You should probably do that, and let standard solutions (a reverse proxy, software or hardware) do the load-balancing for you. Especially given that you don't seem to understand how sockets, non-blocking IO, threads, etc. work yet.

So for your first question, the backlog queue is where clients are held waiting while you are busy handling other stuff (e.g. IO with an already connected client). If the list grows beyond the backlog, those new clients will get a connection refused. You should be OK with 10 clients connecting at the same time. It's a long discussion, but keep a thread pool: as soon as you get a connected socket from accept, hand it to your thread pool and go back to waiting in accept. You can't "practically" support millions of clients on one single server, period! You'll need to load balance.
Your second question is not clear. Clients can't send messages as long as they are in the queue; they are taken off the queue once you accept them, and then it's no longer relevant how long the queue is.
And lastly, your question about load balancing: if you are going to serve millions of clients, I'd suggest investing in a good dedicated load balancer :) that can do round robin as well as what you mentioned.
With all that said, don't reinvent the wheel :) There are some open source Java servers; my favorite: https://netty.io/
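The "accept, hand off to a pool, go back to accept" loop described above could look roughly like this. It is a sketch under simple assumptions: a fixed-size pool, a loopback-only listener, and a handler that just writes one greeting line. `PooledServer` and its constructor parameters are illustrative, and the backlog value is only a request that the OS may cap or ignore.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

final class PooledServer implements AutoCloseable {
    private final ServerSocket server;
    private final ExecutorService workers;

    PooledServer(int port, int backlog, int threads) throws IOException {
        // backlog = requested maximum number of connections waiting to be accepted
        server = new ServerSocket(port, backlog, InetAddress.getLoopbackAddress());
        workers = Executors.newFixedThreadPool(threads);
        Thread acceptor = new Thread(this::acceptLoop, "acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    int port() {
        return server.getLocalPort();
    }

    private void acceptLoop() {
        try {
            while (true) {
                Socket client = server.accept();        // takes one connection off the backlog queue
                workers.submit(() -> handle(client));   // hand off, then go straight back to accept()
            }
        } catch (IOException | RejectedExecutionException e) {
            // Either the server socket was closed or the pool was shut down: stop accepting.
        }
    }

    private static void handle(Socket client) {
        try (Socket s = client; OutputStream out = s.getOutputStream()) {
            out.write("hello\n".getBytes(StandardCharsets.US_ASCII));
        } catch (IOException ignored) {
            // client disconnected early
        }
    }

    @Override
    public void close() throws IOException {
        server.close();
        workers.shutdown();
    }
}
```

Messages sent by an already accepted client never touch the backlog; they sit in that connection's own receive buffer until the handler reads them.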

Java multicast listening and IGMP

I have an issue that is driving me crazy! Both design-wise and tech-wise.
I have a need to listen to a LOT of multicast addresses. They are divided into 3 groups per item that I am monitoring/collecting. I have gone down the road of having one process spin-up 100 threads. Each thread uses 2 ports, and three addresses/groups. (2 of the groups are on same port) I am using MulticastChannel for each port, and using SELECT to monitor for data. (I have used datagram but found NIO MulticastChannel much better).
Anyway, I am seeing issues where I can subscribe to about a thousand of these threads, and data hums along nicely. Problem is, after a while I will have some of them stop receiving data. I have confirmed with the system (CentOS) that I am still subscribed to these addresses, but data just stops. I have monitors in my threads that monitor data drops and out-of-order via the RTP headers. When I detect that a thread has stopped getting data, I do a DROP/JOIN, and data then resumes.
I am thinking that a router in my path is dropping my subscription.
I am at my wits' end writing code to stabilize this process.
Has anyone ever sent IGMP joins out the network to keep the data flowing? Is this possible, or even reasonable?
BTW: The computer is a HP DL380 Gen-9 with a 10G fiber connection to a 6509 switch.
Any pointers on where to look would really help.
Please do not ask for any code examples.

The joinGroup() operation already sends out IGMP requests on the network. It shouldn't be necessary to send them out yourself, and it isn't possible in pure Java anyway.
You could economize on sockets and threads. A socket can join up to about 20 groups on most operating systems, and if you're using NIO and selectors there's no need for more than one thread anyway.
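As a rough illustration of the "fewer sockets, one selector" idea with the NIO API the question mentions, the sketch below joins several groups on one `DatagramChannel`, registers it with a shared `Selector`, and keeps the `MembershipKey` objects so that a stalled group can be dropped and re-joined (the DROP/JOIN workaround from the question). The class and method names are hypothetical, the network interface is whatever the caller passes in, and nothing here has been tuned for the poster's setup.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.StandardProtocolFamily;
import java.net.StandardSocketOptions;
import java.net.UnknownHostException;
import java.nio.channels.DatagramChannel;
import java.nio.channels.MembershipKey;
import java.nio.channels.MulticastChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.List;

final class MulticastReceiver {
    /** Parses literal addresses and rejects anything that is not a multicast group. */
    static List<InetAddress> parseGroups(List<String> literals) {
        List<InetAddress> groups = new ArrayList<>();
        for (String s : literals) {
            try {
                InetAddress a = InetAddress.getByName(s);
                if (!a.isMulticastAddress()) {
                    throw new IllegalArgumentException("not a multicast address: " + s);
                }
                groups.add(a);
            } catch (UnknownHostException e) {
                throw new IllegalArgumentException("bad address: " + s, e);
            }
        }
        return groups;
    }

    /** Opens one channel on the port, joins every group, and registers it for reads. */
    static List<MembershipKey> openAndJoin(Selector selector, NetworkInterface nic,
                                           int port, List<InetAddress> groups) throws IOException {
        DatagramChannel ch = DatagramChannel.open(StandardProtocolFamily.INET);
        ch.setOption(StandardSocketOptions.SO_REUSEADDR, true);
        ch.bind(new InetSocketAddress(port));
        ch.configureBlocking(false);
        List<MembershipKey> keys = new ArrayList<>();
        for (InetAddress group : groups) {
            keys.add(ch.join(group, nic));              // the OS sends the IGMP membership report on join
        }
        ch.register(selector, SelectionKey.OP_READ);
        return keys;
    }

    /** Drops a membership and joins the same group again on the same channel and interface. */
    static MembershipKey rejoin(MembershipKey stale) throws IOException {
        InetAddress group = stale.group();
        NetworkInterface nic = stale.networkInterface();
        MulticastChannel ch = stale.channel();
        stale.drop();
        return ch.join(group, nic);
    }
}
```

A single thread can then loop on `selector.select()` and read from whichever channels are ready, instead of dedicating a thread to each group.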
(I have used datagram but found NIO MulticastChannel much better).
I don't know what this means. If you're referring to DatagramSocket, you can't use it for receiving multicasts, so the sentence is pointless. If you aren't, the sentence is meaningless.

How to get a socket object without a reference variable?

I've been thinking about this all day. I'm not sure the title is the correct one, but here it goes; let me explain my situation: I'm working on a project, a server made in Java for clients made in Delphi. Connections are good: multiple clients with their own threads, I/O working well. The clients send Strings to the server, which I read with BufferedReader. Depending on the reserved words the server receives, it performs an action. Before the client sends the string, it inserts information into a SQL Server database so the server can go and check it after getting the order/command via socket. The server obtains the information from the database, processes it, and sends it to... let's call it "The Dark Side".
At the moment the transaction is done and the info is sent to the dark side, the server inserts the information... cough cough, dark information into a database table so the client can go and take what it requested. BUT I need to report that to the client! ("Yo, check the database again bro, what you want is there :3").
The connection, the socket, is made in another class, not the one that I want to use to answer the client. So if I don't have the socket, I don't have the OutputStream, which I need to talk back. That class, the one processing and sending information to the dark side, is going to be working with hundreds of transactions in groups.
My issue is here: I can't report to the client that it is done because I don't have the socket references in that class. I instantiate the client threads like:
new Client(socket).start();
These are objects without reference variables, but there is an option I can take: store the sockets and their IPs in a HashMap at the moment a new connection is made, like this:
sockets.put(newSocket.getInetAddress().getHostAddress(), newSocket);
Then I can get the socket (so I can get the OutputStream and answer) by calling a static method like this:
public static Socket getSocket(String ip) {
    // Returns the socket stored for this IP when the connection was accepted, or null if there is none
    return sockets.get(ip);
}
But I want you to tell me if there is a better way of doing this, better than storing all of those sockets in a list/HashMap. How can I get those objects without reference variables? Or maybe that's a good way of doing it and I'm just trying to push past the limits.
P.S.: I tried to store the Client objects in the database by serializing them, but the sockets can't be serialized.
Thanks.

This is a design issue for you. You will need to keep track of them somewhere, one solution might be to simply create a singleton class [SocketMapManager] for instance that holds the hashmap, so that you can access it statically from other classes. http://www.javaworld.com/javaworld/jw-04-2003/jw-0425-designpatterns.html
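A minimal sketch of such a holder is shown below. It assumes a `ConcurrentHashMap`, because connection threads and the processing class would touch it concurrently, and it takes an arbitrary string key: keying by IP alone, as in the question, would collide when two clients share an address, so something like "ip:port" or a client-supplied ID may be safer. The method names are illustrative.

```java
import java.net.Socket;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class SocketMapManager {
    private static final SocketMapManager INSTANCE = new SocketMapManager();

    // Thread-safe map: client threads register, the processing class looks up.
    private final ConcurrentMap<String, Socket> sockets = new ConcurrentHashMap<>();

    private SocketMapManager() {
    }

    static SocketMapManager getInstance() {
        return INSTANCE;
    }

    /** Stores the socket when a connection is accepted. */
    void register(String clientKey, Socket socket) {
        sockets.put(clientKey, socket);
    }

    /** Returns the socket for this key, or empty if the client is unknown or already gone. */
    Optional<Socket> lookup(String clientKey) {
        return Optional.ofNullable(sockets.get(clientKey));
    }

    /** Removes the entry when the client disconnects, so closed sockets do not pile up. */
    void unregister(String clientKey) {
        sockets.remove(clientKey);
    }
}
```

The accepting code would call `SocketMapManager.getInstance().register(key, socket)` before starting the client thread, and the processing class would call `lookup(key)` when a transaction finishes.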

Any solution that tells you to keep a reference to the socket/connection/stream is bad, as that means your connections are going to be held open while the server does its work.
You have a couple of options open:
1. Have the clients act as servers too. When they connect, they give the server their IP, port and some secret string as part of the handshake. This assumes you have control over the client code to make this happen.
2. The servers have a protocol to either take new jobs or check the status of old jobs. The client polls the server periodically.
3. Clients connect to the database, or to another application (a web service or a plain socket like the original app) that connects to the database, to get the status of the job. This means the server gives the client a job ID.
If a socket is open, then one OS resource is open. You can read up on this in "Network Programming: to maintain sockets or not?"
It all depends on:
1. how many clients connect at a time or within 5 minutes.
2. how many seconds/minutes one client's request takes to process.
If the number of clients is at most 300 (over the next 3 years) at a time or in any 5-minute period, and each request takes at most 50 seconds to process, then a dedicated server with a maximum of 50,000 sockets should suffice. Otherwise you need async IO or more servers (and DNS, a web server, port forwarding or another method for load balancing).

I'm having a bit of a problem trying to understand what is the flow of the operations, and what exactly you have at disposition. Is this sequence correct?
1. client writes to database (delphi)
2. client writes to server (delphi)
3. server writes to database (java)
4. server writes to client (java)
5. client reads database (delphi)
And the problem is step 4?
More important: are you saying that there isn't a socket in the Client class, and that you don't have a list of Clients either?
Are you able to use reflection to search for/obtain a socket reference from Client?
If you say you don't have the socket, how is it that you can add that socket to a HashMap?
Last but not least: why do you need to store the socket? Maybe every client opens one connection which is used for multiple requests?
It would be beautiful if all the answers could be conveyed to just one ip:port...

Java Sockets - Need help understanding them better

Okay, so I've read around on the Oracle site and some questions on this site. I'm still having kind of a hard time understanding a few things about sockets, so I'll see if anyone here could spend the time to explain it to my slow brain. What I'm doing is setting up a chat client and chat server (to learn Swing and sockets in one swoop). Despite all the examples I've seen, I still don't quite grasp how they work. I know how 1 socket with an input stream and 1 socket with an output stream work, but beyond that I'm having trouble understanding, because that is as far as most of the resources I find explain. Here is my volley of questions regarding this.
If I want to be able to handle input and output to a client at the same time, what would I do? Wait for out, then if there is a change in the server switch to the input stream and get the changes, then switch back to the output stream? Or can I run both an input and an output stream at once?
Let's say the server has to handle several clients at once. I'll have to make a socket for each client, right? What would you suggest is a good way to handle this?
Let's say the client wants to change the IP address or port of their current socket and connect to a different server. Would I just create a new socket, or is there some way to change the current one?
Those are the main questions I have. If I can get that much understood, I'm pretty sure I could figure out the rest I need on my own.

Here's an excellent guide to sockets. It's not "Java sockets" per se, but I think you'll find it very useful:
Beej's Guide to Network Programming
To answer your questions:
Q: If I want to be able to handle input and output to a client at the same time what would I do?
A: You don't have to do anything special. Sockets are automatically "bi-modal": you can read (if there's any data) or write at any time.
Q: Lets say the server has to handle several clients at once. I'll have to make a socket for each client right?
A: Actually, the system gives you the socket for each new client connection. You don't "create" one - it's given to you.
Each new connection is a new socket.
Often, your server will spawn a new thread to service each new client connection.
Q: Lets say the client wants to change the IP address or port of their current socket and connect to a different server. Would I just create a new socket, or is there some way to change the current one?
A: The client would terminate the existing connection and open a new connection.
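For the chat client in the question, reading and writing "at any time" usually means one thread blocked on the input stream while another (for example the Swing event thread) writes to the output stream of the same socket. The sketch below shows that shape under simple assumptions: a line-based text protocol and a callback for incoming lines. `ChatConnection` and `onMessage` are illustrative names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

final class ChatConnection implements AutoCloseable {
    private final Socket socket;
    private final PrintWriter out;

    ChatConnection(String host, int port, Consumer<String> onMessage) throws IOException {
        socket = new Socket(host, port);
        out = new PrintWriter(
                new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
        BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));

        // The reader thread blocks on the input stream; writes go through send() independently.
        Thread reader = new Thread(() -> {
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    onMessage.accept(line);
                }
            } catch (IOException ignored) {
                // socket closed; the reader thread simply ends
            }
        }, "chat-reader");
        reader.setDaemon(true);
        reader.start();
    }

    /** Can be called from any thread while the reader thread is waiting for input. */
    void send(String line) {
        out.println(line);
    }

    @Override
    public void close() throws IOException {
        socket.close();
    }
}
```

Connecting to a different server means closing this object and creating a new one, in line with the last answer above. In a Swing client, the callback should hand the line to the UI with `SwingUtilities.invokeLater`.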

I'll try to do my best here, but I really don't think this is the place for that kind of question:
First of all, you need to understand that sockets are an abstraction over the underlying operating system sockets (Unix sockets, Winsock, etc.).
These kinds of sockets model the connection-oriented services of the transport layer (look at the OSI model). This means that a socket offers you a stream of bytes from the client and a stream of bytes to the client, so to answer your first question, these streams are independent. Of course, the design of the protocol you speak over these streams is your responsibility.
To answer your second question, you need to know how TCP connections work. Basically, your server is listening on one or more network interfaces on one port (ports are the TCP addressing mechanism) and can handle a configurable backlog of simultaneous incoming connections. So the answer is: it is common that, for each incoming connection, a new Thread is created on the server or obtained from a Thread pool.
To answer your third question, connections are made between hosts, so if you need to change either of them, you will need to create a new connection.
Hope this helps.
Cheers

1.- If I want to be able to handle input and output to a client at the same time what would I do? Wait for out, then if there is a change in the server switch to input stream and get the changes, then switch back to output stream? Or can I run both an input and output stream at once?
It depends on your protocol. If your client starts the connection, then your server waits for input before going to the output stream and sending something. Every connection, whether it is a TCP connection or even file access, has an input stream and an output stream.
2.- Lets say the server has to handle several clients at once. I'll have to make a socket for each client right? What would you suggest is a good way handle this?
There are different strategies for this, including multithreading, so for now focus on streams. Or keep it to one server and one client.
3.- Lets say the client wants to change the IP address or port of their current socket and connect to a different server. Would I just create a new socket, or is there some way to change the current one?
Yes. A socket is by definition a connection to an IP address through a specific port; if either of those changes, you need a new socket.
