geolocation - Algorithm for determining most probable geo-location based on multiple bits of information


I'm looking for pointers in the right direction for designing an algorithm.

The situation is simple: there are many pieces of information that can hint at a person's geographical location. Examples are a recent IP address, the TLD of an e-mail address, or explicitly provided information such as a city or postal code.

These bits of information may or may not be present, and they have varying levels of accuracy (a postal code is more accurate than a national TLD) and reliability (an IP address is more reliable than a postal code, even though the postal code is more accurate). On top of that, the information may be out of date.

I am trying to create an algorithm that determines the most likely location based on this information. I have several ideas, most of which involve estimating and adding up scores for accuracy and reliability, but it is very easy to poke holes in such an approach.

Are there specific algorithms for this or for similar problems? Perhaps algorithms that deal with data reliability / accuracy in general, or actual statistics on the reliability / accuracy of geographic information?

You want to find the most likely location L given some information I. That is, you want to maximize the conditional probability

  P(L | I) -> max

Because the function P(L | I) is difficult to estimate directly, one usually applies Bayes' theorem here:

  P(L | I) = P(I | L) * P(L) / P(I)

Since the information I is fixed, the denominator P(I) is constant and plays no role in the search for the maximum. P(L) is the unconditional probability of being at a certain location; something like population density can serve as a good estimate here. Finally, P(I | L) requires a model: it is the probability of receiving the information I given the location L. For several pieces of information, this will be the product of the individual probabilities:

  P(I | L) = P(I1 | L) * P(I2 | L) * ...

This works if the individual pieces I1, I2, ... are conditionally independent given the location L, which seems to be the case here. As an example, the probabilities of a certain postal code and of a certain cell tower are usually strongly correlated, but as soon as we consider a specific location L, the postal code no longer affects the probability of the cell tower.
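The product rule above can be tried out on a small discrete set of candidate locations. The following is only a minimal sketch: the city list, the population figures used as the prior P(L), and the likelihood tables P(Ik | L) are all made-up illustration values, and the function name `posterior` is hypothetical.

```python
import math

# Hypothetical candidate locations with rough, invented population
# figures; they only serve as weights for the prior P(L).
population = {"Berlin": 3_600_000, "Munich": 1_500_000, "Vienna": 1_900_000}

# Hypothetical likelihood tables P(Ik | L), one per piece of information.
# The numbers are invented for illustration, not measured.
likelihoods = [
    {"Berlin": 0.60, "Munich": 0.30, "Vienna": 0.05},  # I1: IP geolocated near Berlin
    {"Berlin": 0.45, "Munich": 0.45, "Vienna": 0.02},  # I2: e-mail address under .de
]

def posterior(population, likelihoods):
    """Return P(L | I) per candidate, assuming I1, I2, ... are
    conditionally independent given L."""
    total_pop = sum(population.values())
    log_score = {}
    for loc, pop in population.items():
        # log P(L) + sum_k log P(Ik | L); logs avoid underflow with many factors
        score = math.log(pop / total_pop)
        for table in likelihoods:
            score += math.log(table[loc])
        log_score[loc] = score
    # Normalising plays the role of dividing by P(I); it does not move the maximum
    top = max(log_score.values())
    unnorm = {loc: math.exp(s - top) for loc, s in log_score.items()}
    z = sum(unnorm.values())
    return {loc: v / z for loc, v in unnorm.items()}

post = posterior(population, likelihoods)
best = max(post, key=post.get)
print(best, post)
```

With these invented numbers the IP address and the prior both favour Berlin, so Berlin comes out on top; the `.de` address mainly rules out Vienna.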

The individual probabilities P(I1 | L), ... represent the reliability and accuracy of the information, and they have to be provided externally. You will have to come up with some assumptions here. As a general rule, when in doubt it is better to be pessimistic about the reliability and accuracy of the information. If you are too pessimistic, your result will be somewhat off, but if you are too optimistic, your result can quickly become completely wrong. Another thing to keep in mind is the feasibility of the maximization: a very accurate model for P(I1 | L) is useless if the effort needed to find the maximum is too high. Typically, choosing a smooth function for the model simplifies the optimization at the end.
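One possible way to encode accuracy, reliability and age in a single smooth, deliberately pessimistic model is sketched below. Everything in it is an assumption made for illustration: the Gaussian falloff with distance, the exponential ageing with a `half_life_days` parameter, the constant `floor`, and the function name `likelihood` itself. The value is meant as P(Ik | L) up to a constant factor, which is enough for finding the maximum.

```python
import math

def likelihood(distance_km, sigma_km, reliability,
               age_days=0.0, half_life_days=365.0, floor=1e-3):
    """Hypothetical model for P(Ik | L), up to a constant factor.

    distance_km: distance between candidate L and the place the evidence points to
    sigma_km:    accuracy of the evidence (small means precise)
    reliability: chance in [0, 1] that fresh evidence is right at all
    """
    # Assumed ageing: reliability halves every half_life_days
    r = reliability * 0.5 ** (age_days / half_life_days)
    # Smooth Gaussian falloff with distance (accuracy)
    peak = math.exp(-0.5 * (distance_km / sigma_km) ** 2)
    # Mix with a constant floor so that wrong evidence can lower the
    # score of a location but never drive it to exactly zero
    return r * peak + (1.0 - r) * floor

fresh = likelihood(0.0, sigma_km=5.0, reliability=0.9)
old = likelihood(0.0, sigma_km=5.0, reliability=0.9, age_days=365.0)
far = likelihood(500.0, sigma_km=5.0, reliability=0.9)
print(fresh, old, far)
```

The floor is what makes the model pessimistic: a single unreliable piece of information can reduce the score of the true location, but it cannot eliminate it. Raising `floor` or lowering `reliability` trades sharpness for robustness.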

