OneDrive SharePoint Migration Tool (SPMT)

I have a Office 365 Business Premium account. I wanted to sync many large folders with OneDrive for Business. However, it was taking very long to upload. While I was searching for improvements, I saw many other people have observed that OneDrive is much slower than Dropbox, Box, AWS, Google Files etc. Apparently this is a ‘known issue’ and Microsoft is addressing it.

One of the things they have done is provide a standalone desktop application called ‘SharePoint Migration Tool (SPMT)’. While the name doesn’t specifically include OneDrive, the tool is intended for use with Sharepoint and OneDrive. However, there are a few pieces of information I learned to use it for OneDrive. That is the purpose of this post.

First, the user who will be using SPMT must have permissions for the OneDrive for Business account you want to upload files to.

Then you need to download SPMT from here https://www.microsoft.com/en-us/download/details.aspx?id=53598

The SPMT tool can be used in any of several migration ‘scenarios’, but the one I am focusing on here is the ‘File share to OneDrive and SharePoint’ scenario. I wanted to use SPMT as a potentially faster way to upload files from my desktop computer to OneDrive for Business.

The guide for this scenario is here https://docs.microsoft.com/en-us/sharepointmigration/fileshare-to-odsp-migration-guide.

After you download, install and start SPMT the start screen will ask you to sign into to your Office 365 account. This will be user that has permission on OneDrive.

After successfully signing in you will see 3 scenarios to choose from:

The two that I was interested in testing were:

  • File Share
  • CSV file for bulk migration

Note / spoiler alert
While SPMT provides much more flexibility in terms of identifying source and target, and capability to batch migration jobs, it was not dramatically faster than just using OneDrive desktop sync tool. It took me a couple of hours to upload only 1000 files total 3 GB in size.

The Microsoft documentation refers to migrations with multiple TB’s of files but my experience suggests that would be impossible if the files were coming from outside of Microsoft cloud. Therefore, it is clear that for large migrations, the SPMT is intended primarily for use within Microsoft cloud.

File Share
The File share option is pretty straightforward. Use SPMT to select the local folder to upload, then manually enter the OneDrive location url to upload them to. However, there are some things to pay attention to here:

  • When you select a local (aka source) folder only it’s contents will be uploaded, not the entire folder with selected local folder name.
  • The string you enter for the OneDrive (aka target) location url is not 100% intuitive or completely described in the instructions. See below.

OneDrive (aka target) location url
The exact specifications for this are not given in a way that is intuitively helpful. There are subtle differences in the url between SharePoint and OneDrive. The linked documentation does indicate these but I had to search for additional clarification which I pass on below.

If my Office 365 account domain is ‘mycompany’ (eg mycompany.onmicrosoft.com) and the user is ‘myuser’ (eg myuser@mycompany.onmicrosoft.com) then myuser’s OneDrive target location url:

https://mycompany-my.sharepoint.com/personal/myuser_mycompany_onmicrosoft_com/

You can also simply copy this from your browser address bar when you are viewing myuser’s OneDrive folder online.

CSV file for bulk migration
This option uses similar information as above but you put the information for source and target into a CSV (or JSON) file format.

CSV file format:

  • Has 6 columns
    • Column A: Mandatory. Source folder path regular Windows path. Use double quotes if path has spaces.
    • Column B: Optional. For Sharepoint migration only. Not for file migration like we are talking about here.
    • Column C: Optional. Subfolder of source path.
    • Column D: Mandatory. OneDrive (aka target) location url. Note do not include anything other than what is above. Column D and E are used to specify folders.
    • Column D: Mandatory. Standard value “Documents”
    • Column F: Optional. Subfolders that will be created if they are not already present under your OneDrive root url. Enter as “Folder/subfolder/”. Note the use of fore slash and double quotes if any spaces.
  • The file has one row per ‘job’. For example, in File Share method above you select a single folder to migrate. That would be one row in CSV (or JSON) file.
  • Use SPMT to upload the CSV (or JSON) file and it processes all of the migration ‘jobs’.
  • However, it processes them in parallel which seems counterproductive. I would rather it process them in sequence, one row at a time.
  • This bulk migration method would be useful for processing many folders.
  • I did not try JSON method but would expect it would requir same information just in specified JSON foramt.

In both scenarios above when the migration is happening you will see screen similar to below.

Note that SPMT ‘scans’ the files to be uploaded first in any migration scenario to see if they meet specifications. There are strict file name prohibitions but it did rename ‘bad’ files which is nice.

Once it indicates files are being uploaded you should see corresponding folders and files in OneDrive online.

At any time during migration you can click the “View Status” link on the form to open Windows folder where SPMT keeps job status including fails etc in csv files. These are updated like log files so you can monitor status. This was helpful to see what failed and why.

I experienced a few quirky behaviour such as pausing or stopping a migration freezing SPMT app and had to close it using Task Manager. Also one very weird result of closing a job was that a local file was corrupted and I could not move or delete it with error “windows 10 cannot delete file the tag present in the reparse point buffer”. I had to run chkdsk to fix the file.

Overall SPMT does the job advertised. As mentioned it does not dramatically speed uploading files from local computer into OneDrive but it does offer more control and flexibility over OneDrive desktop sync for large uploads with many folders and files.

Use OneDrive API to upload files to Office 365 Sharepoint Site

I have automated uploading files from my web site host’s server to my Office 365 Sharepoint site using scheduled cron jobs running Python scripts on my web host.

The Python scripts use Microsoft’s Azure Active Directory Library (ADAL) to authenticate off Azure Active Directory (Azure AD or ADD), and  OneDrive API and Python Requests to use the authentication to upload the files to Sharepoint from my web host.

Here is the code

import adal
import urllib
import requests

## set variables
username = 'user@contoso.onmicrosoft.com'
password = 'password'
authorization_url = 'https://login.windows.net/contoso.onmicrosoft.com' # aka Authority
redirect_uri = 'https://login.microsoftonline.com/login.srf' # from Azure AD application
client_id = 'd84cbf4f-dc23-24d1-8a7d-08ff8359879a' # from Azure AD application
file_url = 'https://contoso.sharepoint.com/_api/v2.0/drive/root:/myfoldername/myfilename.csv:/content'

## use ADAL to create token response
token_response = adal.acquire_token_with_username_password(
        authorization_url,
        username,
        password
    )

## Use ADAL to create refresh token and save as text file to reuse
refresh_token = token_response['refreshToken']
refresh_token_file = open('refresh_token.txt', 'w')
refresh_token_file.write(refresh_token)
refresh_token_file.close()

## Get saved refresh token and use it to get new token response
refresh_token = open('refresh_token.txt', 'r').read()
token_response = adal.acquire_token_with_refresh_token(authorization_url, str(refresh_token))

## get access_token from token response JSON string
access_token = token_response.get('accessToken')

## create http header to send access token to authenticate
headers = {'Authorization':'BEARER ' + str(access_token)}

## example to upload file
upload_file = requests.put(file_url, data = open('myfilename.csv', 'rb'), headers=headers)

 

There are many things to consider when working with Microsoft’s APIs to work with its online services such as Office 365.

The first is how to authenticate. Microsoft is trying to move everyone to use Azure AD to do oAuth authentication. Microsoft services still have their own authentication methods but this exercise I used Azure AD.

The second is what API to use. Microsoft has recently released their Graph API that is ‘one endpoint to rule them all’. However Microsoft services still have their own API’s so while Graph API looks tempting for this exercise I used the OneDrive API.

Azure AD Authentication

The authentication will be done in two parts.

  1. Create Azure AD application to do the authentication for the Microsoft service(s) you want to interact with.
  2. Use ADAL to interact with Azure AD to do the oAuth flow.

Setup Azure AD – create application

Microsoft provides free use of Azure AD for light authentication needs. You can register and create account. Once you have your account you need to create a new application.

For my purposes I created an Azure AD native client application. Azure AD also has web application and web APIs but both require user to enter username and password in web browser. The native client application does technically also require user to enter these too but I hacked past this by using ADAL user authentication and hard coding username and password into the Python code. Since these are going onto my web host in protected directory to run as cron jobs they will be safe.

I am not going to go through the detail of creating an Azure AD application there are some good blog posts and Microsoft does good job of describing it. For example take a look at this site which has decent information about creating a new Azure AD application.

The Azure AD applications allow you to choose which Microsoft services it will be used to authenticate. Confusingly these are also called ‘applications’ too. They are represent Microsoft Services such as Office 365 Sharepoint Online, OneNote, Power BI, etc and is the place where you assign the permissions (also called ‘scopes’) that authentication will allows with that Microsoft service.

An Azure AD application might provide authentication for more than one Microsoft Service. But my native client application has only Windows Azure Active Directory permissions (which are there by default) and Office 365 Sharepoint Online permissions set to Read and write user files and Read and write items in all site collections.

After you have created your client application make sure to copy the client_id and resource_uri to use in code below. The client_id is automatically assigned and the resource_uri for a native client app can be any url and is just a unique identifier. I chose the Office 365 login url. The web applications need a real url because that is where the user will be prompted to enter credentials.

Azure Active Directory Library (ADAL)

Microsoft’s Azure Active Directory Library (ADAL) authentication libraries are created for developer’s to use with Azure AD. I used the ADAL Python SDK which was easily installed with pip install adal. 

The oAuth authentication flow can seem very complex but you don’t have to worry about that if you use ADAL. ADAL uses your Azure AD application credentials (client_id, resource_uri in case of native client application) to retrieve a token response which is a text string in JSON format.

This JSON string includes the actual access token that is used to authenticate accessing Sharepoint and upload the files. You can use Python to retrieve the access token (it is a Dictionary). Then you simply put the access token into a header that will be used in the Put Request as the method of passing the access token to the OneDrive API.

ADAL also takes care of refreshing tokens which expire. In my case where the scripts are running on the server as cron jobs I want the token to refresh automatically. ADAL gets a refresh token that you can save to get a new access token when previous one expires. I actually write the refresh token to a text file on the server and refresh the access token each time code is run. I could only refresh it if the previous one expires.

OneDrive API

The OneDrive API has different configurations depending on whether you are using it to access a OneDrive Personal, OneDrive Business or Sharepoint Online account.

Be warned that the documentation for OneDrive API can be very dense and there are different ways of presenting required syntax to identify interactions. Of course the representations vary with different SDKs too.  Also there are different versions of Microsofts file storage services over the years. So I recommend to focus on the newest OneDrive API and make sure you are looking at documentation relevant to newest version.

The Gotchas

ADAL Default Values

ADAL has default client_id and resource values that it uses for the username authentication. I changed these default values to match my Azure AD application.

Before changing these I was getting an Invalid audience Uri error

{“error”:”invalid_client”,”error_description”:”Invalid audience Uri ‘https:\/\/m
anagement.core.windows.net\/’.”}

This error means the url being used to create the token response was not same as the one that the file was being uploaded to.

EDIT August 21, 2016 Microsoft has updated the ADAL library so that you can specify the client id and the resource value because authentication against different services needs different client id and resource endpoint urls. That means the hack I used below is no longer required. For more details seehttps://github.com/AzureAD/azure-activedirectory-library-for-python

In ADAL’s __init__.py file look for the class _DefaultValues class at bottom of code and replace the default values:

  • I changed client_id to my application’s client_id
  • I changed resource from https://management.core.windows.net/ to https://tenant.sharepoint.com/

The acquire_token_with_username_password function sets these to None so they get set to default values. So this could be changed so they accept values from the code.

Sharepoint Site and Folder Paths

The OneDrive API dev documentation https://dev.onedrive.com/getting-started.htm demonstrates the different service urls:

  • OneDrive – https://api.onedrive.com/v1.0
  • OneDrive for Business – https://{tenant}-my.sharepoint.com/_api/v2.0
  • SharePoint Online – https://{tenant}.sharepoint.com/{site-relative-path}/_api/v2.0

The {site-relative-path} notation indicates the Sharepoint site name. My site didn’t have this because it was default site. However you might have to add your site relative path.

Also the Sharepoint url for the file I was uploading looked like this:

https://contoso.sharepoint.com/Shared%20Documents/myfoldername/myfilename.csv

However you will note that the file_url in the code doesn’t make any reference to the Shared%20Documents:

https://contoso.sharepoint.com/_api/v2.0/drive/root:/myfoldername/myfilename.csv:/content

 

Power BI Online – get data from Office 365 Sharepoint file

I want to create a Power BI Online report with a data source from a file on a remote web server that updates automatically so my Power BI report is always up to date.

Power BI Desktop and Online have lots of data connectors to third party ‘Online Services’ eg Salesforce, Mailchimp, Github, etc, as well as file and database connectors. But none of these help to get the file from my remote server directly.

There is no feature to connect to a file on a remote server. I could put my remote file data into MySQL or Postgres database and Power BI could connect to those but my remote server doesn’t allow external connections to hosted databases. So that is not an option for me.

A Power BI Online report can get data from a Sharepoint site file that will update automatically on schedule.

Since I have an Office 365 E3 account which has Sharepoint site I upload my remote file to the Sharepoint site and create a Power BI Online report linked to that Sharepoint file.

I would still have to figure out how to automate uploading my remote server file to my Office 365 E3 account Sharepoint site. But I am pretty sure I can do that with the OneDrive API but more on that in Part 2.

Here is a diagram outlining what I think my solution could be.

powerbidatapath

In the meantime to test using Office 365 Sharepoint file as data source for a Power BI Online report, I created a Power BI report in Desktop with file data source from my Office 365 Sharepoint site and Published it to my Power BI Online account.

After publishing the report to my Power BI Online account, I logged into Power BI Online, opened the newly published report and went to data source options and selected ‘Schedule Refresh’ which produced screen below.

onedriverefresh

I set ‘Keep your data up to date’ to ‘Yes’ and selected ‘Connect Directly’ which gave me error message telling me I had to update credentials.

Not surprisingly the report I published to Power BI Online didn’t ‘remember’ that I had already authorized the Power BI Desktop report to get file from my Office 365 Sharepoint site so I have to do it again in Power BI Online.

So I selected ‘Edit Credentials’ and then selected ‘oAuth’ as type of credentials which popped up Office 365 login screen where I entered my user and password clicked login and was returned back to Power BI online page.

The error message was gone so this must have created oAuth authentication to link the file data source from my Office 365 Sharepoint site into the data source in Power BI Online.

Then I set the automatic refresh to one of the daily 6-hour windows (below i have selected 12 pm to 6 pm) for refresh to run (hourly refresh is a Power BI Pro feature).

The report data source now refreshes from my Sharepoint file on daily automatic schedule so it looks like I am half way to my solution.

I will write another blog post detailing how I will automate moving my data file from my remote server to my Office 365 Sharepoint site. Pretty sure I will be using the OneDrive API https://dev.onedrive.com but there are other options too.

In the meantime talk a look at the previous blog post summarizing the OneDrive and Sharepoint API options.

One challenge I have encountered so far is that the OneDrive Python SDK is made for web apps and I want to setup server app (native client app).  More to come.

OneDrive API features

Microsoft has three file storage options:

  1. OneDrive Personal
  2. OneDrive Business
  3. Sharepoint

These have recently been unified into one new OneDrive API https://dev.onedrive.com and oAuth is preferred method of authentication.

However there are some key differences how the API:

  • OneDrive Personal authenticates against oAuth account created at Microsoft Application Registration Portal using a Microsoft account (Live, Microsoft.com). Authentication url is: https://login.live.com/oauth20_authorize.srf
  • OneDrive Business and Sharepoint authenticate against oAuth account created in Azure Active Directory and must be done with Office 365 account.  Authentication url is: https://login.microsoftonline.com/common/oauth2/

You can create two types of applications that will have different methods and parameters:

  • Web application – web site based application that user can sign into. Require definition of an active redirect url and definition of client secret. Scopes or permissions are identified on the fly when authentication is made.
  • Native client application – Android, ‘head-less’ server app, etc. Requires only definition of an unique redirect uri and scopes or permissions that Office 365 account / user have available eg read & write files, etc.

The process for authentication is similar:

  • Sign-in with user account/password or send authentication url to authentication server to get authentication code.
  • Server sends back url with authentication code.
  • Retrieve authentication code from url.
  • Send another url comprised of code and other parameters back to server to get tokens.
  • Use tokens to list, view, upload, download files etc.

There are development SDKs available for popular languages.

I was only interested in thePython SDK . Some key notes about it include:

  • It is created specifically for web applications, not native client applications. The SDK authentication method relies on using a web browser to pass urls with parameters, codes and tokens back and forth. A native client application will not use web browser. A work around was to use head-less browser but that is a bit hacky.
  • It only has OneDrive Personal authentication urls. To use this with OneDrive Business or Sharepoint these urls are easily replaced with the OneDrive Business authentication urls in two files: auth_provider.py and the onedrivesdk_helper.py.

The change to the unified OneDrive API and oAuth authentication only happened in late 2015 so this is pretty new.

There weren’t many well developed or well documented OneDrive Python example code available.

Note it is still also possible to work with OneDrive Business, Sharepoint and OneDrive Personal without using oAuth authentication and this new OneDrive API by simply using urllib, request and templating along with hard coded Office 365 username and password in your code to authenticate.

Finally Microsoft Graph API can be used to interact with OneDrive Business, Sharepoint and OneDrive Personal  once oAuth is setup.