How I scored top 1% on Kaggle — my solution to “Indoor Location & Navigation” competition

Recently I took part in my first Kaggle competition — “Indoor Location & Navigation” — organized by a Chinese company XYZ10 and Microsoft Research. The objective was to identify the position (floor and coordinates) of a smartphone in a shopping mall. The following solution scored top 1% (13th out of 1170 teams, 6th individually).

Problem statement and definitions

As mentioned earlier, the objective was to predict the floor and the coordinates of a smartphone. More precisely we were given data from many paths and a few timestamps at which the prediction had to be made. It could be assumed that the true position had to be in the corridor (although there were some exceptions).

It is important to notice that there is a strong correlation between subsequent predictions (position at timestamp t gives valuable information about position at timestamp t+1, as they should not be far away from each other). This is why recurrent models are the most suitable to tackle this problem.

Data from each path was given in one .txt file. It contained a building’s name (there were 24 of them) as well as information collected from WiFi and from smartphone’s motion sensors. Moreover, there was data collected from bluetooth beacons, but its contribution to the prediction was negligible.

There are a few important terms that I will use to describe my solution:

  • Access Point (AP) — this is a device that lets other devices connect to the WiFi network. One WiFi might comprise of a few APs that are located in different parts of the building

Below you can see what training .txt files looked like. The most important parts are: building name, floor name, TYPE_WAYPOINT reading which describes a smartphone’s location, IMU readings, RSSI values from different APs and corresponding MAC addresses.

Why not WiFi triangulation?

One idea that came to my mind was determining APs’ positions and using triangulation to locate the smartphone. Unfortunately there are lots of walls and other obstacles in the building that hinder signal’s transfer. Therefore a more robust solution was needed.

created with


The following solution combined 2 main sources of data: WiFi related and motion related. The whole pipeline can be broken down into 2 parts: floor prediction and coordinates prediction.

created with

Floor prediction

It was sufficient to use only RSSI values to predict the floor. To take advantage of the sequential nature of WiFi signal strength, RNN was trained. It achieved 99,8% accuracy. In the coordinates prediction part, the predicted floor was treated as input.

Coordinates prediction

Coordinates prediction pipeline consisted of 4 main stages.

created with

Stage 1.

In this stage two machine learning architectures were utilized — LSTM and MLP. Main difference between those 2 models was that LSTM’s input was sequential (as well as output), while MLP’s input was not. MLP’s input was just a snapshot of RSSI values from all APs in the building at the timestamp at which the position measurement was done (so it lacked context and therefore achieved slightly worse accuracy than LSTM).

Ensemble of those two models gave ~6 m accuracy. Below you can see my LSTM’s architecture.

And here you can see predicted locations from different paths.

Stage 2.

The competition organizers provided participants with code that calculated the approximate delta vector. The main idea behind this code was to utilize accelerometer data to detect steps and AHRS data to infer direction of motion.

Average delta vector’s length was 6.82 m, while the average error of the organizers’ code calculation was 2.65 m. It was not accurate enough, but it was a good starting point. I decided to build a RNN which corrected this calculation. It took as input IMU data, the code’s calculation as well as building and floor name. It reduced the Mean Absolute Error from 2.65 m to 1.69 m.

I decided to build a RNN which corrected this calculation. It took as input IMU data, the code’s calculation as well as building and floor name. It reduced the Mean Absolute Error from 2.65 m to 1.69 m.

To wrap it up, the RNN predicted how many meters along X and Y axis a smartphone moved between 2 timestamps. Moreover, another RNN was trained which predicted distance moved, but without the direction. It achieved 0.7 m MAE.

Stage 3.

And now comes the magic part. In this stage the goal was to somehow merge results from stage 1. and stage 2. The following observations helped to come up with appropriate merging method:

  • WiFi based predictions are not very accurate

In order to merge results of the previous stages, I defined the following cost function which incorporated different sources of information.

The parameters were tuned so that they prioritized the more reliable source of information (when the time difference between measurements was substantial, then WiFi based prediction had more importance, otherwise motion sensors were trusted more).

The goal was to find the minimum of the function for each path. This is a differentiable function without constraints, so the minimum can be found using Broyden–Fletcher–Goldfarb–Shanno algorithm.

Below you can see the paths that minimize the function. We can see that now the paths’ shapes look more plausible considering floor plan, than they looked after stage 1.

Stage 4.

When looking at the output from the previous stage you can see that we can do even better. Let’s take as an example the blue path. It should most probably be shifted a bit to the left as it would fit into the corridor better. But how do we make a program to automate such shifting?

For this purpose I defined another cost function. It was calculated in the following manner:

  • for each point calculate the squared distance to the corridor (I used Breadth First Search algorithm on the floor plan bitmask to find the distance)
floor bitmask

The goal is to find the local minimum of the cost function. The iterative shifting procedure was:

  1. calculate what would be the cost if the path was shifted by a vector, where the vector was from the following set:
    {1, 0.5, 0, -0.5, -1} * {1, 0.5, 0, -0.5, -1} (where “*” denotes the Cartesian product)

Below you can see the iterative shifting procedure performed on one path.

And below you can see the final predictions.


My solution was able to locate a smartphone with average ~3 m accuracy (2.55 m on Kaggle’s public leaderboard, 3.36 m on Kaggle’s private leaderboard). In my opinion it is impressive that such result was achievable considering the following facts:

  • some buildings were huge — they had even 10 floors, while the area of some floors was even over 100,000 square meters

For training I used GPUs and TPUv3–8 available on Kaggle.

If you have any questions about the solution or would like to hear more details about any stage, please let me know in the comments.

I strongly recommend taking part in Kaggle competitions. It is definitely a great way to learn from the wonderful community and boost your machine learning skills.

Computer Science student