General description
===================

As described in the paper, each dataset is divided into:
1. Training data that contains the first 40 scenarios
2. Test data that contains the last 20 scenarios

Moreover, the paper deals with four optimization models:
1. Time-dependent and stochastic (DD-TD-TSP-STW)
2. Time-dependent (TD-TSP-STW)
3. Stochastic (DD-TSP-STW)
4. Deterministic and time-independent (TSP-STW) 

This document explains how to generate our problem instances for each optimization model. Note that the test set is used only to evaluate how all models perform under a stochastic and time-dependent reality.

In the paper, we present instances of 4 sizes: 12, 24, 30 and 36. 
In addition, there are three types of problem input that constitute the problem:
1. Time windows
2. Service times
3. Travel times

The time windows and the service times data for the instances with size <custmer_number> are stored in a folder named "Instances" + <custmer_number>. For example, the Instances12 folder stores the time windows and service times of instances of size 12.

Each "Instances" +<custmer_number> folder contains two sub-folders. 

The first folder is named "ServiceTimes". Each ServiceTimes folder contains 20 CSV files, one for each instance. Each file contains <custmer_number> rows and 60 columns. The item in the i'th row and j'th column represents the service time of customer i in scenario j.
Thus, the service times for the training dataset are in the first 40 columns and the service times for the test dataset are in the last 20 columns.

The service times for the DD-TD-TSP-STW model and for the DD-TSP-STW model are simply the service times as given in the first 40 columns. The service times for the TD-TSP-STW model and the TSP-STW model are calculated as the average of the first 40 scenarios rounded to the nearest integer.       
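
The column split and the averaging described above can be sketched as follows. This is a minimal illustration, not the code used in the paper: the helper names are hypothetical, the files are assumed to have no header row, and ties in the rounding are assumed to round half up (the tie rule is not stated here).

```python
import csv
import io
import math

N_TRAIN = 40  # first 40 scenario columns (training)
N_TEST = 20   # last 20 scenario columns (test)


def read_service_times(handle):
    """Parse a ServiceTimes CSV into a list of rows of floats.

    Assumption: no header row, comma-separated numeric values.
    """
    return [[float(x) for x in row] for row in csv.reader(handle) if row]


def split_scenarios(rows):
    """Return (training, test) column slices for every customer row."""
    training = [row[:N_TRAIN] for row in rows]
    test = [row[-N_TEST:] for row in rows]
    return training, test


def nominal_service_times(training):
    """Average the training scenarios per customer and round to an integer.

    Assumption: ties are rounded half up.
    """
    return [math.floor(sum(row) / len(row) + 0.5) for row in training]


# Tiny synthetic file: 2 customers, 60 scenarios each (not real data).
sample = io.StringIO(
    ",".join(str(v) for v in range(60)) + "\n"
    + ",".join(["5"] * 40 + ["9"] * 20) + "\n"
)
rows = read_service_times(sample)
training, test = split_scenarios(rows)
nominal = nominal_service_times(training)
print(nominal)
```

For a real file, pass `open(path, newline="")` instead of the in-memory sample.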

The second folder is named "TimeWindows". Each TimeWindows folder contains 20 CSV files, one for each instance. Each file contains <custmer_number> rows and 2 columns. The item in the i'th row and first column represents the opening of customer i's time window (a[i]), while the item in the same row and second column represents the end of customer i's time window (b[i]). Recall that the time windows are the same for the training and test datasets as well as for all optimization models.
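
A minimal sketch of reading one TimeWindows file into the a[i] and b[i] lists is given below. The function names are hypothetical, and the file is assumed to have no header row and exactly two numeric columns.

```python
import csv
import io


def read_time_windows(handle):
    """Return (a, b): window openings and window ends, one entry per row.

    Assumption: no header row, two numeric columns per row.
    """
    a, b = [], []
    for row in csv.reader(handle):
        if not row:
            continue
        a.append(float(row[0]))
        b.append(float(row[1]))
    return a, b


def inverted_windows(a, b):
    """Return the row indices whose window ends before it opens."""
    return [i for i, (lo, hi) in enumerate(zip(a, b)) if hi < lo]


# Synthetic two-customer example (not real data).
sample = io.StringIO("0,120\n60,300\n")
a, b = read_time_windows(sample)
print(a, b, inverted_windows(a, b))
```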

Next, we explain how to generate travel times data.
Due to the relatively large size of the final travel times data in our experiments, this kit contains the raw data that we collected using Google Maps and a Python function that uses the raw data to generate the travel times data used in the experiments, for each optimization model.

The file "TravelTimesAllDays1Minute.pickle" stores the data gathered in the 60 working days between 19 representative locations (we call these times scenario time-dependent travel times). This data is stored as a 4-dimensional list: the first dimension is the working day (60 days), the second and third dimensions are the origin and destination locations (19 each), and the fourth is the time period (we gathered 6 representative time periods).
In addition, the file "MatrixBig.csv" contains 21 matrices, each of size 60 * 60. These matrices represent the typical travel times between the locations every 30 minutes, starting at 8:00. In the paper, we refer to these times as nominal time-dependent travel times.
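
The layout of the raw data can be checked with a sketch like the one below. The helper names are hypothetical. The 0-based numbering of the nominal matrices (k = 0 for 8:00) is an assumption made for illustration. The layout of the 21 matrices inside MatrixBig.csv is not described here, so the sketch does not attempt to parse that file.

```python
import pickle

N_DAYS, N_LOCS, N_PERIODS = 60, 19, 6  # sizes stated in the text above


def load_scenario_travel_times(path="TravelTimesAllDays1Minute.pickle"):
    """Load the raw 4-dimensional list from the pickle file."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def has_expected_shape(data):
    """Check the [day][origin][destination][period] nesting sizes."""
    return (
        len(data) == N_DAYS
        and all(len(day) == N_LOCS for day in data)
        and all(len(row) == N_LOCS for day in data for row in day)
        and all(len(cell) == N_PERIODS
                for day in data for row in day for cell in row)
    )


def nominal_matrix_clock_time(k):
    """Clock time (HH:MM) of the k-th nominal matrix, one every 30 minutes.

    Assumption: k is 0-based, so k = 0 is 8:00 and k = 20 is the last matrix.
    """
    if not 0 <= k <= 20:
        raise ValueError("k must be in 0..20")
    minutes = 8 * 60 + 30 * k
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


# Synthetic stand-in with the stated sizes (not real data).
fake = [[[[0] * N_PERIODS for _ in range(N_LOCS)]
         for _ in range(N_LOCS)] for _ in range(N_DAYS)]
print(has_expected_shape(fake), nominal_matrix_clock_time(20))
```

With the real file present, `has_expected_shape(load_scenario_travel_times())` performs the same check on the collected data.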

As we have explained in the paper, each problem instance of each type contains a subset of the 60 locations. The file "Combinations" + <custmer_number> + ".csv" contains the locations used for the instances of size <custmer_number>. For example, Combinations12.csv contains 20 rows, one row for each instance. Each row contains <custmer_number>+1 items. The first item in all rows is always the depot, which is fixed in our experiments. The next <custmer_number> items were chosen randomly (from the other 59 locations) to create instances with one depot and <custmer_number> other locations.
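
Picking the row of a given instance from a Combinations file can be sketched as below. The function name is hypothetical, and the file is assumed to have no header row. Instance numbers are taken to be 1-based, so that the 4'th instance is the 4'th row.

```python
import csv
import io


def read_combination(handle, instance_number):
    """Return the location list of a 1-based instance number.

    Assumption: one comma-separated row per instance, no header row.
    """
    rows = [[int(x) for x in row] for row in csv.reader(handle) if row]
    if not 1 <= instance_number <= len(rows):
        raise IndexError("instance_number out of range")
    return rows[instance_number - 1]


# Synthetic file with two instances of size 2 (not real data).
sample = io.StringIO("0,5,7\n0,3,9\n")
combination = read_combination(sample, 2)
print(combination)
```

For a real file, pass for example `open("Combinations24.csv", newline="")` as the handle.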

The "CreateTravelTimeMatrix.py" file contains a Python function called "CreateMatrixesAndScenarios" that receives a subset of locations, an optimization model (the options are: 'DD-TD-TSP-STW', 'TD-TSP-STW', 'DD-TSP-STW', 'TSP-STW') and a dataset type (the options are: 'Training', 'Test'), and returns the full travel time data used in our experiments for this subset, optimization model and dataset type. We note again that travel times for the 'Test' dataset type should be produced only for the 'DD-TD-TSP-STW' model, since these times are used only to evaluate how the other models perform under a stochastic and time-dependent reality.
 
This function reads the raw data stored in TravelTimesAllDays1Minute.pickle and MatrixBig.csv files and performs the required processing to generate the travel time data. Short documentation on the function is given below.
The function receives three parameters:
1. Combination - a list with the subset of locations for which the data is required
2. OptType - the optimization model, one of the following: 'DD-TD-TSP-STW', 'TD-TSP-STW', 'DD-TSP-STW', 'TSP-STW'
3. DataSetType - the dataset type, one of the following: 'Training', 'Test'
It returns one of four types of data structures, depending on the requested optimization model. In the time-dependent models (DD-TD-TSP-STW, TD-TSP-STW) the data contain 540 travel time matrices, one for each minute starting at 8:00. Note that we assume that the travel times after the 540'th period are the same as the travel times in the 540'th period.
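
The assumption about departures after the last period amounts to clamping the period index. A minimal sketch follows. The function name is hypothetical, and 0-based indexing (index 0 for the minute starting at 8:00) is an assumption.

```python
N_PERIODS = 540  # one travel time matrix per minute, starting at 8:00


def period_index(minutes_after_8am):
    """Map a departure time to a 0-based period index.

    Departures after the last period reuse the last period's travel times.
    Assumption: index 0 is the minute starting at 8:00.
    """
    if minutes_after_8am < 0:
        raise ValueError("departures before 8:00 are not covered")
    return min(int(minutes_after_8am), N_PERIODS - 1)


print(period_index(0), period_index(539), period_index(700))
```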

For the 'DD-TD-TSP-STW' type, a 4-dimensional list is returned. The first dimension represents the scenarios (of length 40 for training set and 20 for the test set). The second and third dimensions represent the origin and destination locations (of length equal to the number of locations) and the fourth dimension represents the time (540 periods). This data structure stores the travel time in all relevant (training/test) scenarios, between all locations in the given combination, in all 540 periods. 

For the 'TD-TSP-STW' type, a 3-dimensional list is returned. The first and second dimensions represent the origin and destination locations (of length equal to the number of locations) and the third dimension represents the time (540 periods). This data structure stores the travel time between all locations in the given combination, in all 540 periods.  

For the 'DD-TSP-STW' type, a 3-dimensional list is returned. The first dimension represents the scenarios (of length 40 for the training set and 20 for the test set). The second and third dimensions represent the origin and destination locations (of length equal to the number of locations). This data structure stores the travel time in all relevant (training/test) scenarios, between all locations in the given combination.

For the 'TSP-STW' type, a 2-dimensional list is returned. The first and second dimensions represent the origin and destination locations (of length equal to the number of locations).
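
The four structures above can be summarized by their expected dimensions. The helpers below are a hypothetical sketch for checking a returned structure and are not part of the kit. Here n_locations is the length of the combination passed to the function.

```python
def expected_shape(opt_type, dataset_type, n_locations, n_periods=540):
    """Dimensions described above for each optimization model."""
    n_scenarios = {"Training": 40, "Test": 20}[dataset_type]
    shapes = {
        "DD-TD-TSP-STW": (n_scenarios, n_locations, n_locations, n_periods),
        "TD-TSP-STW": (n_locations, n_locations, n_periods),
        "DD-TSP-STW": (n_scenarios, n_locations, n_locations),
        "TSP-STW": (n_locations, n_locations),
    }
    return shapes[opt_type]


def nested_shape(data):
    """Measure the nesting sizes of a rectangular nested list."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        if not data:
            break
        data = data[0]
    return tuple(shape)


# Synthetic 3 x 3 matrix standing in for a 'TSP-STW' result (not real data).
fake = [[0.0] * 3 for _ in range(3)]
print(nested_shape(fake) == expected_shape("TSP-STW", "Training", 3))
```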

If the optimization type or the dataset type is invalid, a string indicating the error is returned. Similarly, if the list of combinations is empty, an error string is returned. No other validity checks on the content of the combination are performed. Recall that the locations are indexed in the range 0 to 59.
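
Because errors are reported as strings rather than exceptions, a caller may want to check the return type and validate the combination beforehand. The sketch below is one possible way to do this. Both helpers are hypothetical, and `fake_generator` is only a stand-in used to make the example runnable without the raw data files.

```python
def call_checked(generator, combination, opt_type, dataset_type):
    """Call the generator and raise if it reports an error as a string."""
    result = generator(combination, opt_type, dataset_type)
    if isinstance(result, str):
        raise ValueError(result)
    return result


def validate_combination(combination, n_locations=60):
    """Checks that the generator itself does not perform."""
    problems = []
    if len(set(combination)) != len(combination):
        problems.append("duplicate locations")
    if any(not 0 <= loc < n_locations for loc in combination):
        problems.append("location index outside 0..59")
    return problems


def fake_generator(combination, opt_type, dataset_type):
    """Stand-in for CreateMatrixesAndScenarios (illustration only)."""
    if not combination:
        return "Error: empty combination"
    return [[0.0] * len(combination) for _ in combination]


print(validate_combination([0, 13, 35]))
print(len(call_checked(fake_generator, [0, 13, 35], "TSP-STW", "Training")))
```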

Further information on the process carried out by the function can be found at the end of this file.   

Example
=======
If one wishes to generate the training scenario-based time-dependent travel time data for the 4'th instance of size 24, one has to:
1. Go to the file "Combinations24.csv". The relevant subset of locations of the 4'th instance is in the 4'th row, that is 0,13,35,11,37,10,18,32,49,52,8,33,41,50,38,47,45,57,17,25,23,53,58,28,29
2. Call the function "CreateMatrixesAndScenarios" in the file "CreateTravelTimeMatrix.py" with this subset of locations, optimization type 'DD-TD-TSP-STW' and dataset type 'Training', that is:

from CreateTravelTimeMatrix import CreateMatrixesAndScenarios

Data = CreateMatrixesAndScenarios([0,13,35,11,37,10,18,32,49,52,8,33,41,50,38,47,45,57,17,25,23,53,58,28,29], 'DD-TD-TSP-STW', 'Training')


CreateMatrixesAndScenarios - the process
=========================================
First, the function reads the raw data from TravelTimesAllDays1Minute.pickle and MatrixBig.csv and stores them in appropriate data structures in memory. Next, the function generates a data structure that holds time-dependent and stochastic travel times for all 60 locations and the 6 representative times of the day, based on spatial similarity as detailed in the paper. Next, the travel times of the other 534 periods of the day are estimated by interpolation. Finally, the travel times between the required locations are adjusted to the problem size and corrected to repair some minor violations of the FIFO property. If the desired optimization type is not 'DD-TD-TSP-STW', further manipulations are performed to average the data according to the requirements of the model.
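
To illustrate what a FIFO violation means for per-period travel times, the sketch below checks one origin-destination series and applies a simple forward repair. This is an illustrative assumption, not the correction implemented in CreateMatrixesAndScenarios. The function names are hypothetical, and period_length must be given in the same units as the travel times. FIFO here means that a later departure never arrives earlier than a preceding one.

```python
def fifo_violations(travel_times, period_length=1.0):
    """Return indices t where departing at period t+1 arrives before
    departing at period t (arrival = departure time + travel time)."""
    return [
        t for t in range(len(travel_times) - 1)
        if (t + 1) * period_length + travel_times[t + 1]
        < t * period_length + travel_times[t]
    ]


def repair_fifo(travel_times, period_length=1.0):
    """Raise later travel times just enough that arrivals never decrease.

    Assumption: a simple forward pass; the kit's own repair may differ.
    """
    repaired = list(travel_times)
    for t in range(1, len(repaired)):
        repaired[t] = max(repaired[t], repaired[t - 1] - period_length)
    return repaired


# Synthetic series for four consecutive periods (not real data).
series = [10.0, 10.0, 7.0, 9.0]
fixed = repair_fifo(series)
print(fifo_violations(series), fixed, fifo_violations(fixed))
```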
  