Hi, my name is James Royalty and I’m a Pando blog n00b. We recently opened our blog to contributions from all staff members at Pando, and personally, I’m excited. I’m looking forward to seeing the talented people I work with daily share their thoughts and recent projects here. Look for more soon.
In addition to being a P.blog n00b, I’m also a server-side engineer here at Pando. The server team handles all the stuff you don’t see when using Pando: trackers, storage proxies, web services, etc. So, I’m happy to have the chance to write about a cool feature we’ve been working on: P4P. P4P received quite a bit of press coverage recently (as Laird mentioned in his last post), and we thought this would be a great time to give you an idea of how it works. After all, P4P comes from an open working group.
In terms of peer-to-peer infrastructure, P4P integration is done at the tracker level. This post is fairly technical and geared toward developers wondering how they might add P4P capabilities to their tracker. P4P is, at the end of the day, a fairly simple concept, and integrating it is straightforward, but there is a lot to explain, so I’ve decided to break up the necessary information into two posts:
- API and P4P data. How to obtain P4P data (at runtime) and what the data look like. We’ll cover this here.
- Using P4P data. How to apply P4P during peer announcement.
That’s the current plan at least. I’ll watch the comments to this post, so if you’d like for me to elaborate on something or go in a different direction, please speak up. :)
Okay, enough preamble. Let’s get to it!
Basics
Openp4p.net has some excellent background information under the Q&A and Field Tests sections. In particular, the Field Tests section has a nice picture of an integrated P4P system under the “Information Flow” heading. If you are unfamiliar with P4P, go check those out and come back.
Hi, welcome back. All caught up? Good. Let’s move on to some terminology.
- iTracker. A special tracker with knowledge of one or more ISPs’ network topologies. The iTracker provides this data to your tracker and makes peer recommendations based on the state of your swarm.
- AS ID. An autonomous system identifier. For simplicity’s sake, think of an AS ID as the name of a particular ISP.
- PID. A point of presence (POP) identifier. Consider this an aggregation of all networks within a certain “region”.
- Network location. An (AS ID, PID) pair.
To summarize the basics: the iTracker provides data to determine the network location of peers in a swarm. Your tracker communicates the state of a (P4P-enabled) swarm to the iTracker and gets a set of peer recommendations (with respect to ISP topology) in return. So, you need to communicate with the iTracker… here’s how.
iTracker API and Data
Implementers of P4P interact with the iTracker using a very simple SOAP-based API. We have not made a WSDL publicly available, as some semantics are likely to change. So, instead of talking in specifics here, I’ll discuss interaction with the iTracker in terms of pseudocode. Keep these structures in mind, as I’ll refer to them later.
Method:
GetASIDsResponse GetASIDs()
Input:
None
Output:
GetASIDsResponse {
ARRAY asids
}
Integration with the iTracker starts here. Normally, on (your) tracker startup, you’ll invoke this method to discover the AS IDs the iTracker has data for. Even though AS IDs are often human-readable strings, you should treat them as opaque values; you’ll use them as input to the next method.
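Since the WSDL isn't public, here is only a minimal Python sketch of the startup step, assuming a client object that wraps the SOAP call. The ITrackerClient protocol, get_asids(), FakeITracker and discover_asids() are hypothetical names chosen for illustration; the point is that AS IDs are stored exactly as returned, never parsed.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass(frozen=True)
class GetASIDsResponse:
    # Mirrors the pseudo-structure: an array of opaque AS IDs.
    asids: List[str] = field(default_factory=list)


class ITrackerClient(Protocol):
    # Hypothetical interface; the real SOAP binding is not public.
    def get_asids(self) -> GetASIDsResponse: ...


class FakeITracker:
    """Stand-in for a real iTracker, for local testing only."""

    def __init__(self, asids: List[str]) -> None:
        self._asids = list(asids)

    def get_asids(self) -> GetASIDsResponse:
        return GetASIDsResponse(asids=list(self._asids))


def discover_asids(client: ITrackerClient) -> List[str]:
    # Called once at tracker startup. AS IDs are kept exactly as returned
    # (no parsing, trimming, or case folding) because they are opaque.
    return list(client.get_asids().asids)
```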
Method:
GetPrefixResponse GetPrefix( asid )
Input:
An AS ID returned by GetASIDs().
Output:
GetPrefixResponse {
ARRAY PidPrefixes {
pid
ARRAY networkPrefix
}
}
This method returns a description of the given AS in terms of network structure. The iTracker’s data model is such that ASes contain one or more PIDs and PIDs contain one or more network addresses. So, let’s consider an AS called “sampleAS” with two PIDs, “samplePID1” and “samplePID2”, where each PID contains two /24 networks. Here’s an outline of the data returned by GetPrefix().
- AS ID = sampleAS
  - PID = samplePID1
    - 192.168.22.0/24
    - 192.168.23.0/24
  - PID = samplePID2
    - 192.168.24.0/24
    - 192.168.25.0/24
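One way to hold a GetPrefix() result in a Python tracker is sketched below. The dataclass names mirror the pseudo-structures but are otherwise hypothetical, and the sketch assumes prefixes arrive as CIDR strings. Each prefix is normalized with the standard ipaddress module; strict=False masks off any host bits, so entries written as 192.168.22.1/24 and 192.168.22.2/24 both become the single network 192.168.22.0/24.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


@dataclass(frozen=True)
class PidPrefixes:
    pid: str
    network_prefix: List[str]  # CIDR strings as returned by the iTracker


@dataclass(frozen=True)
class GetPrefixResponse:
    pid_prefixes: List[PidPrefixes]


def normalize(resp: GetPrefixResponse) -> Dict[str, List[Network]]:
    """Map each PID to a sorted, de-duplicated list of network objects."""
    out: Dict[str, List[Network]] = {}
    for entry in resp.pid_prefixes:
        # strict=False clears host bits: 192.168.22.1/24 -> 192.168.22.0/24
        nets = {ipaddress.ip_network(p, strict=False) for p in entry.network_prefix}
        # Sort by IP version first so mixed IPv4/IPv6 lists stay comparable.
        out[entry.pid] = sorted(nets, key=lambda n: (n.version, n))
    return out
```

Because duplicates collapse in the set, two entries that differ only in host bits end up as one network rather than two.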
You’ll need to keep this data easily accessible within your tracker (i.e., in memory if possible) because, as peers in a P4P-enabled swarm announce, you’ll use this information to determine their P4P network location. How? By finding the longest prefix match for a peer’s IP address. (Note that network addresses are returned in CIDR block form.) This lookup should be quick, so choosing an efficient data structure is important. Good choices include tries or Patricia tries. One approach I came across while adding P4P to Pando’s tracker is described in a SIGCOMM ’97 paper titled “Scalable High Speed IP Routing Lookups.” It shows a pretty neat way to do binary search on prefix lengths over a set of hash tables storing prefixes. It might be a little exotic for some implementations but is worth keeping in mind.
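To make the trie option concrete, here is a minimal sketch of an uncompressed binary trie (one level per address bit) doing longest-prefix match. It is IPv4-only by assumption, and PrefixTrie is a hypothetical name. A Patricia trie would additionally collapse chains of single-child nodes to save memory and hops, but the lookup logic is the same: walk the address bits and remember the last node that carried a value.

```python
import ipaddress
from typing import List, Optional, Tuple

Location = Tuple[str, str]  # (AS ID, PID)


class _Node:
    __slots__ = ("children", "value")

    def __init__(self) -> None:
        self.children: List[Optional["_Node"]] = [None, None]  # bit 0, bit 1
        self.value: Optional[Location] = None


class PrefixTrie:
    """Uncompressed binary trie over IPv4 prefixes."""

    def __init__(self) -> None:
        self.root = _Node()

    def add(self, asid: str, pid: str, cidr: str) -> None:
        net = ipaddress.IPv4Network(cidr, strict=False)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):  # walk only the prefix bits
            b = (bits >> (31 - i)) & 1
            if node.children[b] is None:
                node.children[b] = _Node()
            node = node.children[b]
        node.value = (asid, pid)

    def lookup(self, ip: str) -> Optional[Location]:
        bits = int(ipaddress.IPv4Address(ip))
        node, best = self.root, self.root.value
        for i in range(32):
            node = node.children[(bits >> (31 - i)) & 1]
            if node is None:
                break  # no longer prefix exists below this point
            if node.value is not None:
                best = node.value  # longest match seen so far
        return best
```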
Getting back to the method at hand, you should call GetPrefix() for each AS ID (as returned by GetASIDs()) you are interested in. Store the network addresses in a data structure that supports longest-prefix-match lookup. In this structure, network addresses should be keys and values should be (AS ID, PID) pairs (the network location). Here’s an example.
// Assume allPrefixes is your prefix lookup structure
forall as in GetASIDs() {
forall pref in GetPrefix( as ).PidPrefixes {
forall netprefix in pref.networkPrefix {
allPrefixes.add( as, pref.pid, netprefix )
}
}
}
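A runnable Python version of that loop might look like the sketch below. The get_asids/get_prefix callables are hypothetical wrappers standing in for the SOAP calls, with get_prefix(asid) assumed to yield one {"pid": ..., "networkPrefix": [...]} dict per PidPrefixes entry. The AllPrefixes class here uses one hash table per IPv4 prefix length and probes lengths from longest to shortest; the SIGCOMM ’97 scheme instead binary-searches over those lengths, which requires extra marker entries.

```python
import ipaddress
from typing import Callable, Dict, Iterable, Optional, Tuple

Location = Tuple[str, str]  # (AS ID, PID)


class AllPrefixes:
    """Longest-prefix match with one hash table per IPv4 prefix length."""

    def __init__(self) -> None:
        self.tables: Dict[int, Dict[int, Location]] = {}

    def add(self, asid: str, pid: str, cidr: str) -> None:
        net = ipaddress.IPv4Network(cidr, strict=False)
        table = self.tables.setdefault(net.prefixlen, {})
        table[int(net.network_address)] = (asid, pid)

    def lookup(self, ip: str) -> Optional[Location]:
        addr = int(ipaddress.IPv4Address(ip))
        # Try longer prefixes first; at most 33 hash probes.
        for length in sorted(self.tables, reverse=True):
            mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
            hit = self.tables[length].get(addr & mask)
            if hit is not None:
                return hit
        return None


def build_all_prefixes(
    get_asids: Callable[[], Iterable[str]],
    get_prefix: Callable[[str], Iterable[dict]],
) -> AllPrefixes:
    all_prefixes = AllPrefixes()
    for asid in get_asids():
        for pref in get_prefix(asid):
            for netprefix in pref["networkPrefix"]:
                all_prefixes.add(asid, pref["pid"], netprefix)
    return all_prefixes
```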
Later, when a peer announces in your P4P-enabled swarm, you’ll assign it a network location using the allPrefixes structure as follows.
if ( peer.networkLocation is empty OR peer.ipAddress changed ) {
peer.networkLocation = allPrefixes.lookup( peer.ipAddress )
}
The “if” clause is there to save some work; you only need to make this assignment once. However, if the same peer announces from a different IP address, you’ll need to update that peer’s network location. (This assumes you use something other than IP:port, such as the BitTorrent peer ID, to uniquely identify a peer within a swarm.)
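A minimal Python sketch of the announce-time assignment follows, assuming peers are keyed by a stable peer ID. Peer, on_announce() and the lookup callable are hypothetical names. Instead of testing whether the location is empty, this version remembers which IP the cached location was computed for; that way a peer whose address has no P4P data is not looked up again on every announce.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Location = Tuple[str, str]  # (AS ID, PID)


@dataclass
class Peer:
    peer_id: str                           # stable identity (assumption)
    ip_address: str
    network_location: Optional[Location] = None
    located_ip: Optional[str] = None       # IP the cached location is for


def on_announce(peer: Peer, announced_ip: str,
                lookup: Callable[[str], Optional[Location]]) -> None:
    peer.ip_address = announced_ip
    # Redo the longest-prefix lookup only for a new or changed address.
    if peer.located_ip != announced_ip:
        peer.network_location = lookup(announced_ip)
        peer.located_ip = announced_ip
```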
The data returned by calls to GetASIDs() and GetPrefix() changes very infrequently, but it does change. Pando’s tracker calls GetPrefix() every 24 hours and GetASIDs() only on startup.
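A refresh schedule like that can be as simple as a timer checked from the tracker’s housekeeping loop. In the minimal sketch below, PrefixRefresher and its injectable clock are hypothetical. The refresh callback is assumed to rebuild the prefix table off to the side and then swap it in, so announces never see a half-built table.

```python
import time
from typing import Callable


class PrefixRefresher:
    """Decides when to re-call GetPrefix(); GetASIDs() runs only at startup."""

    def __init__(self, refresh: Callable[[], None],
                 interval_s: float = 24 * 60 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.refresh = refresh
        self.interval_s = interval_s
        self.clock = clock  # injectable so the schedule can be tested
        self.last = clock()

    def tick(self) -> bool:
        """Call periodically; returns True when a refresh was triggered."""
        now = self.clock()
        if now - self.last >= self.interval_s:
            self.refresh()  # e.g. rebuild the prefix table, then swap it in
            self.last = now
            return True
        return False
```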
Method:
GetPeeringWeightResponse GetPeeringWeight( GetPeeringWeightRequest )
Input:
GetPeeringWeightRequest {
swarmId
ARRAY SwarmState {
asid
pid
numLeeches
numSeeds
uploadCapacity
downloadCapacity
}
}
Output:
GetPeeringWeightResponse {
swarmId
ARRAY PeeringWeight {
sourceASID
sourcePID
destinationASID
destinationPID
weight
}
}
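For a Python tracker, the request and response pseudo-structures translate directly into dataclasses, sketched below with snake_case versions of the field names. These classes and index_weights() are hypothetical; the actual SOAP binding isn’t public, and the sketch makes no assumption about what the weights mean beyond keying each one by its source and destination network location.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Location = Tuple[str, str]  # (AS ID, PID)


@dataclass
class SwarmState:
    asid: str
    pid: str
    num_leeches: int = 0
    num_seeds: int = 0
    upload_capacity: float = 0.0
    download_capacity: float = 0.0


@dataclass
class GetPeeringWeightRequest:
    swarm_id: str
    swarm_state: List[SwarmState] = field(default_factory=list)


@dataclass
class PeeringWeight:
    source_asid: str
    source_pid: str
    destination_asid: str
    destination_pid: str
    weight: float


@dataclass
class GetPeeringWeightResponse:
    swarm_id: str
    peering_weights: List[PeeringWeight] = field(default_factory=list)


def index_weights(resp: GetPeeringWeightResponse) -> Dict[Tuple[Location, Location], float]:
    """Key each weight by (source location, destination location)."""
    return {
        ((w.source_asid, w.source_pid), (w.destination_asid, w.destination_pid)): w.weight
        for w in resp.peering_weights
    }
```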
The methods we’ve discussed up to this point involve fetching relatively static data from the iTracker. Using GetPeeringWeight(), you communicate what’s going on inside your tracker. This method is central to P4P and sets it apart from other peer selection methods based on network location: the iTracker makes dynamic peer recommendations based on the state of your swarm (as expressed by GetPeeringWeightRequest) and an ISP’s preferences regarding traffic flow. Exactly how the iTracker makes these recommendations is beyond the scope of this post, so let’s focus on communicating the state of your swarm.
Information in GetPeeringWeightRequest is aggregated by (AS ID, PID) pair. That is, for each (AS ID, PID) in a given swarm, sum up the number of leeches and seeds along with the estimated total upload and download capacity (more on that in a second). For example, say we have a swarm XYZZY (this is an arbitrary unique identifier; a simple choice is the BitTorrent info hash) with peers from one AS ID (“sampleAS”) and two PIDs (“samplePID1” and “samplePID2”). Here’s an outline of the data sent to GetPeeringWeight().
- swarmId = XYZZY
- asid = sampleAS, pid = samplePID1
  - numLeeches = 2000
  - numSeeds = 500
  - uploadCapacity = 100.0
  - downloadCapacity = 256.0
- asid = sampleAS, pid = samplePID2
  - numLeeches = 200
  - numSeeds = 1000
  - uploadCapacity = 0.0
  - downloadCapacity = 0.0
As I mentioned, the numbers of leeches and seeds are summed within each (AS ID, PID). Peers that have no network location are omitted from the above. So, in actuality, the size of this swarm may be larger, but in the P4P sense, there are 2200 leeches and 1500 seeds (across all AS IDs).
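The aggregation step can be sketched in a few lines of Python. PeerInfo and aggregate_swarm_state() are hypothetical names, and the per-peer rate fields are assumed to be filled in elsewhere (0.0 when unknown). The output dict uses the GetPeeringWeightRequest field names; peers without a network location are skipped, exactly as in the XYZZY example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

Location = Tuple[str, str]  # (AS ID, PID)


@dataclass
class PeerInfo:
    location: Optional[Location]  # None if the peer has no P4P location
    is_seed: bool
    upload_rate: float = 0.0      # estimated rate; 0.0 when unknown
    download_rate: float = 0.0


@dataclass
class _Totals:
    num_leeches: int = 0
    num_seeds: int = 0
    upload_capacity: float = 0.0
    download_capacity: float = 0.0


def aggregate_swarm_state(swarm_id: str, peers: Iterable[PeerInfo]) -> dict:
    """Build a GetPeeringWeightRequest-shaped dict for one swarm."""
    totals: Dict[Location, _Totals] = defaultdict(_Totals)
    for peer in peers:
        if peer.location is None:
            continue  # no network location: left out of the P4P view
        t = totals[peer.location]
        if peer.is_seed:
            t.num_seeds += 1
        else:
            t.num_leeches += 1
        t.upload_capacity += peer.upload_rate
        t.download_capacity += peer.download_rate
    return {
        "swarmId": swarm_id,
        "swarmState": [
            {"asid": asid, "pid": pid,
             "numLeeches": t.num_leeches, "numSeeds": t.num_seeds,
             "uploadCapacity": t.upload_capacity,
             "downloadCapacity": t.download_capacity}
            for (asid, pid), t in sorted(totals.items(), key=lambda kv: kv[0])
        ],
    }
```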
Back to upload/download capacity: if you are unable to obtain this information (e.g., your peers don’t announce the number of bytes transferred), just set both to 0.0. If your peers do announce the number of bytes transferred, you can compute these capacities as a function of a peer’s announce interval.
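As one possible reading of “a function of the announce interval,” the sketch below divides the change in a peer’s cumulative byte counters (in BitTorrent, the uploaded and downloaded announce parameters) by the time between two consecutive announces. AnnounceSample and estimate_rates() are hypothetical. The iTracker’s expected units aren’t stated here, so bytes per second is an assumption. Summing these per-peer rates within an (AS ID, PID) would give uploadCapacity and downloadCapacity.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AnnounceSample:
    time_s: float      # when the announce arrived (tracker clock, seconds)
    uploaded: int      # cumulative bytes uploaded, as reported by the peer
    downloaded: int    # cumulative bytes downloaded, as reported by the peer


def estimate_rates(prev: Optional[AnnounceSample],
                   cur: AnnounceSample) -> Tuple[float, float]:
    """Return (upload, download) in bytes/second between two announces.

    Falls back to (0.0, 0.0) when there is no earlier sample, the interval
    is not positive, or a counter went backwards (e.g. a client restart).
    """
    if prev is None:
        return 0.0, 0.0
    dt = cur.time_s - prev.time_s
    du = cur.uploaded - prev.uploaded
    dd = cur.downloaded - prev.downloaded
    if dt <= 0 or du < 0 or dd < 0:
        return 0.0, 0.0
    return du / dt, dd / dt
```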
Now you are ready to make calls to GetPeeringWeight(). You don’t need to fetch the weights on every announce; computing swarm state and interpreting GetPeeringWeightResponse can take some time, especially for large swarms. So how often should you call GetPeeringWeight()? A simple approach is to invoke it at fixed intervals: every five minutes, for example. While simple, calling only at fixed intervals could mean that new peers don’t benefit from P4P recommendations on their first couple of announces (assuming you have no existing peers in their (AS ID, PID)).
A more sophisticated approach is to determine the frequency of calls to GetPeeringWeight() as a function of the arrival rate of new peers. If new peers are arriving at a high rate, invoke GetPeeringWeight() every few seconds. As the arrival rate drops and your swarm stabilizes, make calls less frequently (up to a maximum interval of 5 minutes, say).
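A sketch of the arrival-rate-driven schedule follows. The class name and all four tuning knobs (the counting window, the target number of new peers per call, and the 5-second and 300-second bounds) are illustrative assumptions, not values from Pando’s tracker. The fixed five-minute approach is the special case where both bounds are 300 seconds.

```python
from collections import deque
from typing import Deque


class WeightRefreshPolicy:
    """Pick the delay before the next GetPeeringWeight() call."""

    def __init__(self, window_s: float = 60.0, arrivals_per_call: float = 20.0,
                 min_interval_s: float = 5.0, max_interval_s: float = 300.0) -> None:
        self.window_s = window_s                    # how far back to count arrivals
        self.arrivals_per_call = arrivals_per_call  # new peers to allow between calls
        self.min_interval_s = min_interval_s
        self.max_interval_s = max_interval_s
        self.arrivals: Deque[float] = deque()

    def record_arrival(self, now: float) -> None:
        self.arrivals.append(now)

    def next_interval(self, now: float) -> float:
        # Forget arrivals that fell out of the sliding window.
        while self.arrivals and self.arrivals[0] <= now - self.window_s:
            self.arrivals.popleft()
        rate = len(self.arrivals) / self.window_s  # new peers per second
        if rate == 0:
            return self.max_interval_s
        delay = self.arrivals_per_call / rate
        return max(self.min_interval_s, min(self.max_interval_s, delay))
```

Because the delay is recomputed after every call, a flash crowd pushes it down toward min_interval_s, and a quiet swarm drifts back up to the ceiling.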
Wrap up
As this post is already pretty long I’ll save the details of GetPeeringWeightResponse for the next post. In the meantime, please leave questions and suggestions for the next post in the comments. Cheers! –James