OneDrive SharePoint Migration Tool (SPMT)

I have a Office 365 Business Premium account. I wanted to sync many large folders with OneDrive for Business. However, it was taking very long to upload. While I was searching for improvements, I saw many other people have observed that OneDrive is much slower than Dropbox, Box, AWS, Google Files etc. Apparently this is a ‘known issue’ and Microsoft is addressing it.

One of the things they have done is provide a standalone desktop application called ‘SharePoint Migration Tool (SPMT)’. While the name doesn’t specifically include OneDrive, the tool is intended for use with Sharepoint and OneDrive. However, there are a few pieces of information I learned to use it for OneDrive. That is the purpose of this post.

First, the user who will be using SPMT must have permissions for the OneDrive for Business account you want to upload files to.

Then you need to download SPMT from here https://www.microsoft.com/en-us/download/details.aspx?id=53598

The SPMT tool can be used in any of several migration ‘scenarios’, but the one I am focusing on here is the ‘File share to OneDrive and SharePoint’ scenario. I wanted to use SPMT as a potentially faster way to upload files from my desktop computer to OneDrive for Business.

The guide for this scenario is here https://docs.microsoft.com/en-us/sharepointmigration/fileshare-to-odsp-migration-guide.

After you download, install and start SPMT the start screen will ask you to sign into to your Office 365 account. This will be user that has permission on OneDrive.

After successfully signing in you will see 3 scenarios to choose from:

The two that I was interested in testing were:

  • File Share
  • CSV file for bulk migration

Note / spoiler alert
While SPMT provides much more flexibility in terms of identifying source and target, and capability to batch migration jobs, it was not dramatically faster than just using OneDrive desktop sync tool. It took me a couple of hours to upload only 1000 files total 3 GB in size.

The Microsoft documentation refers to migrations with multiple TB’s of files but my experience suggests that would be impossible if the files were coming from outside of Microsoft cloud. Therefore, it is clear that for large migrations, the SPMT is intended primarily for use within Microsoft cloud.

File Share
The File share option is pretty straightforward. Use SPMT to select the local folder to upload, then manually enter the OneDrive location url to upload them to. However, there are some things to pay attention to here:

  • When you select a local (aka source) folder only it’s contents will be uploaded, not the entire folder with selected local folder name.
  • The string you enter for the OneDrive (aka target) location url is not 100% intuitive or completely described in the instructions. See below.

OneDrive (aka target) location url
The exact specifications for this are not given in a way that is intuitively helpful. There are subtle differences in the url between SharePoint and OneDrive. The linked documentation does indicate these but I had to search for additional clarification which I pass on below.

If my Office 365 account domain is ‘mycompany’ (eg mycompany.onmicrosoft.com) and the user is ‘myuser’ (eg myuser@mycompany.onmicrosoft.com) then myuser’s OneDrive target location url:

https://mycompany-my.sharepoint.com/personal/myuser_mycompany_onmicrosoft_com/

You can also simply copy this from your browser address bar when you are viewing myuser’s OneDrive folder online.

CSV file for bulk migration
This option uses similar information as above but you put the information for source and target into a CSV (or JSON) file format.

CSV file format:

  • Has 6 columns
    • Column A: Mandatory. Source folder path regular Windows path. Use double quotes if path has spaces.
    • Column B: Optional. For Sharepoint migration only. Not for file migration like we are talking about here.
    • Column C: Optional. Subfolder of source path.
    • Column D: Mandatory. OneDrive (aka target) location url. Note do not include anything other than what is above. Column D and E are used to specify folders.
    • Column D: Mandatory. Standard value “Documents”
    • Column F: Optional. Subfolders that will be created if they are not already present under your OneDrive root url. Enter as “Folder/subfolder/”. Note the use of fore slash and double quotes if any spaces.
  • The file has one row per ‘job’. For example, in File Share method above you select a single folder to migrate. That would be one row in CSV (or JSON) file.
  • Use SPMT to upload the CSV (or JSON) file and it processes all of the migration ‘jobs’.
  • However, it processes them in parallel which seems counterproductive. I would rather it process them in sequence, one row at a time.
  • This bulk migration method would be useful for processing many folders.
  • I did not try JSON method but would expect it would requir same information just in specified JSON foramt.

In both scenarios above when the migration is happening you will see screen similar to below.

Note that SPMT ‘scans’ the files to be uploaded first in any migration scenario to see if they meet specifications. There are strict file name prohibitions but it did rename ‘bad’ files which is nice.

Once it indicates files are being uploaded you should see corresponding folders and files in OneDrive online.

At any time during migration you can click the “View Status” link on the form to open Windows folder where SPMT keeps job status including fails etc in csv files. These are updated like log files so you can monitor status. This was helpful to see what failed and why.

I experienced a few quirky behaviour such as pausing or stopping a migration freezing SPMT app and had to close it using Task Manager. Also one very weird result of closing a job was that a local file was corrupted and I could not move or delete it with error “windows 10 cannot delete file the tag present in the reparse point buffer”. I had to run chkdsk to fix the file.

Overall SPMT does the job advertised. As mentioned it does not dramatically speed uploading files from local computer into OneDrive but it does offer more control and flexibility over OneDrive desktop sync for large uploads with many folders and files.

How to update Office 365 password in Power BI dataset refresh

I recently changed my Office 365 user password for an account that I was using for a Power BI Dataset Scheduled Refresh.

The result was that my Power BI Refresh failed which looked like screenshot below.

powerbi sharepoint dataset - update password-01

So all I had to do was update the authentication Power BI was using to access my Office 365 Sharepoint folder.

It was clear that the Edit credentials link was where I needed to update the password.

That link got me the following page where I selected oAuth2 which is referring to the authentication that Power BI uses with my Office 365 user credentials.

powerbi sharepoint dataset - update password-02

Selecting oAuth2 popped a new browser window where I could enter my Office 365 user and new password and authenticate Power BI.

powerbi sharepoint dataset - update password-1

After clicking Sign In I was returned to the Power BI page and the credential errors above were gone and I could successfully refresh the dataset from my Office 365 Sharepoint files.

powerbi sharepoint dataset - update password-4

The Refresh Schedule log showed the previous failed refresh attempt and the just completed successful refresh. Back in business!

powerbi sharepoint dataset - update password-3

Use OneDrive API to upload files to Office 365 Sharepoint Site

I have automated uploading files from my web site host’s server to my Office 365 Sharepoint site using scheduled cron jobs running Python scripts on my web host.

The Python scripts use Microsoft’s Azure Active Directory Library (ADAL) to authenticate off Azure Active Directory (Azure AD or ADD), and  OneDrive API and Python Requests to use the authentication to upload the files to Sharepoint from my web host.

Here is the code

import adal
import urllib
import requests

## set variables
username = 'user@contoso.onmicrosoft.com'
password = 'password'
authorization_url = 'https://login.windows.net/contoso.onmicrosoft.com' # aka Authority
redirect_uri = 'https://login.microsoftonline.com/login.srf' # from Azure AD application
client_id = 'd84cbf4f-dc23-24d1-8a7d-08ff8359879a' # from Azure AD application
file_url = 'https://contoso.sharepoint.com/_api/v2.0/drive/root:/myfoldername/myfilename.csv:/content'

## use ADAL to create token response
token_response = adal.acquire_token_with_username_password(
        authorization_url,
        username,
        password
    )

## Use ADAL to create refresh token and save as text file to reuse
refresh_token = token_response['refreshToken']
refresh_token_file = open('refresh_token.txt', 'w')
refresh_token_file.write(refresh_token)
refresh_token_file.close()

## Get saved refresh token and use it to get new token response
refresh_token = open('refresh_token.txt', 'r').read()
token_response = adal.acquire_token_with_refresh_token(authorization_url, str(refresh_token))

## get access_token from token response JSON string
access_token = token_response.get('accessToken')

## create http header to send access token to authenticate
headers = {'Authorization':'BEARER ' + str(access_token)}

## example to upload file
upload_file = requests.put(file_url, data = open('myfilename.csv', 'rb'), headers=headers)

 

There are many things to consider when working with Microsoft’s APIs to work with its online services such as Office 365.

The first is how to authenticate. Microsoft is trying to move everyone to use Azure AD to do oAuth authentication. Microsoft services still have their own authentication methods but this exercise I used Azure AD.

The second is what API to use. Microsoft has recently released their Graph API that is ‘one endpoint to rule them all’. However Microsoft services still have their own API’s so while Graph API looks tempting for this exercise I used the OneDrive API.

Azure AD Authentication

The authentication will be done in two parts.

  1. Create Azure AD application to do the authentication for the Microsoft service(s) you want to interact with.
  2. Use ADAL to interact with Azure AD to do the oAuth flow.

Setup Azure AD – create application

Microsoft provides free use of Azure AD for light authentication needs. You can register and create account. Once you have your account you need to create a new application.

For my purposes I created an Azure AD native client application. Azure AD also has web application and web APIs but both require user to enter username and password in web browser. The native client application does technically also require user to enter these too but I hacked past this by using ADAL user authentication and hard coding username and password into the Python code. Since these are going onto my web host in protected directory to run as cron jobs they will be safe.

I am not going to go through the detail of creating an Azure AD application there are some good blog posts and Microsoft does good job of describing it. For example take a look at this site which has decent information about creating a new Azure AD application.

The Azure AD applications allow you to choose which Microsoft services it will be used to authenticate. Confusingly these are also called ‘applications’ too. They are represent Microsoft Services such as Office 365 Sharepoint Online, OneNote, Power BI, etc and is the place where you assign the permissions (also called ‘scopes’) that authentication will allows with that Microsoft service.

An Azure AD application might provide authentication for more than one Microsoft Service. But my native client application has only Windows Azure Active Directory permissions (which are there by default) and Office 365 Sharepoint Online permissions set to Read and write user files and Read and write items in all site collections.

After you have created your client application make sure to copy the client_id and resource_uri to use in code below. The client_id is automatically assigned and the resource_uri for a native client app can be any url and is just a unique identifier. I chose the Office 365 login url. The web applications need a real url because that is where the user will be prompted to enter credentials.

Azure Active Directory Library (ADAL)

Microsoft’s Azure Active Directory Library (ADAL) authentication libraries are created for developer’s to use with Azure AD. I used the ADAL Python SDK which was easily installed with pip install adal. 

The oAuth authentication flow can seem very complex but you don’t have to worry about that if you use ADAL. ADAL uses your Azure AD application credentials (client_id, resource_uri in case of native client application) to retrieve a token response which is a text string in JSON format.

This JSON string includes the actual access token that is used to authenticate accessing Sharepoint and upload the files. You can use Python to retrieve the access token (it is a Dictionary). Then you simply put the access token into a header that will be used in the Put Request as the method of passing the access token to the OneDrive API.

ADAL also takes care of refreshing tokens which expire. In my case where the scripts are running on the server as cron jobs I want the token to refresh automatically. ADAL gets a refresh token that you can save to get a new access token when previous one expires. I actually write the refresh token to a text file on the server and refresh the access token each time code is run. I could only refresh it if the previous one expires.

OneDrive API

The OneDrive API has different configurations depending on whether you are using it to access a OneDrive Personal, OneDrive Business or Sharepoint Online account.

Be warned that the documentation for OneDrive API can be very dense and there are different ways of presenting required syntax to identify interactions. Of course the representations vary with different SDKs too.  Also there are different versions of Microsofts file storage services over the years. So I recommend to focus on the newest OneDrive API and make sure you are looking at documentation relevant to newest version.

The Gotchas

ADAL Default Values

ADAL has default client_id and resource values that it uses for the username authentication. I changed these default values to match my Azure AD application.

Before changing these I was getting an Invalid audience Uri error

{“error”:”invalid_client”,”error_description”:”Invalid audience Uri ‘https:\/\/m
anagement.core.windows.net\/’.”}

This error means the url being used to create the token response was not same as the one that the file was being uploaded to.

EDIT August 21, 2016 Microsoft has updated the ADAL library so that you can specify the client id and the resource value because authentication against different services needs different client id and resource endpoint urls. That means the hack I used below is no longer required. For more details seehttps://github.com/AzureAD/azure-activedirectory-library-for-python

In ADAL’s __init__.py file look for the class _DefaultValues class at bottom of code and replace the default values:

  • I changed client_id to my application’s client_id
  • I changed resource from https://management.core.windows.net/ to https://tenant.sharepoint.com/

The acquire_token_with_username_password function sets these to None so they get set to default values. So this could be changed so they accept values from the code.

Sharepoint Site and Folder Paths

The OneDrive API dev documentation https://dev.onedrive.com/getting-started.htm demonstrates the different service urls:

  • OneDrive – https://api.onedrive.com/v1.0
  • OneDrive for Business – https://{tenant}-my.sharepoint.com/_api/v2.0
  • SharePoint Online – https://{tenant}.sharepoint.com/{site-relative-path}/_api/v2.0

The {site-relative-path} notation indicates the Sharepoint site name. My site didn’t have this because it was default site. However you might have to add your site relative path.

Also the Sharepoint url for the file I was uploading looked like this:

https://contoso.sharepoint.com/Shared%20Documents/myfoldername/myfilename.csv

However you will note that the file_url in the code doesn’t make any reference to the Shared%20Documents:

https://contoso.sharepoint.com/_api/v2.0/drive/root:/myfoldername/myfilename.csv:/content

 

Careful! Don’t click “Try Power BI for free”

This was a weird quirk.

I have Power BI Free account and uploaded a report to Power BI Service. The report has dataset that gets data from a Sharepoint file.

In Power BI Service I went to the dataset “Schedule Refresh”, selected “Connect Directly”, “Enter Credentials” as oAuth, then entered my Office 365 credentials. This setup the connection successfully to the Sharepoint file, and then I could switch the “Keep your data up to date” to “Yes”.

Then I accidentally clicked the “Try Pro for Free” button.

From that point on, every time I selected the Power BI Service dataset or the report, I got a pop up blocking message “To see this report upgrade to power bi pro”.

pro upgrade

The only way to make it stop was to switch the “Keep your data up to date” to “No”.

The only Pro feature is hourly updates. The Free Power BI Service version only allows daily updates. I hadn’t selected hourly updates so that wasn’t the problem. Just some weird quirk.

The resolution was to delete the report and dataset that I just scheduled refresh for, and then upload the report again and then redo the schedule refresh as per above (without accidentally clicking on the Try Pro for free button) to make it work again.

How to schedule Power BI dataset refresh

Do you want to create a Power BI Report that gets a daily scheduled refresh of data from a Sharepoint csv file?

The first step is to create your Power BI report in Power BI Desktop using the Sharepoint csv file as data source.

In Power BI Desktop use Get Data – File – Sharepoint Folder to connect to your Sharepoint Folder.

The resulting dataset query (Power Query) will look something like mine below. You will replace “mydomain” with your Sharepoint account name or domain.

You will also replace “datafile.csv” with your csv file name. The Power BI connection is to a Sharepoint folder which might have more than one file like I did. If you have only one file in the folder the filter will be redundant but can’t hurt.

let
Source = SharePoint.Files("https://mydomain.sharepoint.com", [ApiVersion = 15]),
#"Filtered Rows" = Table.SelectRows(Source, each ([Name] = "datafile.csv")),
#"Combined Binaries" = Binary.Combine(#"Filtered Rows"[Content]),
#"Imported CSV" = Csv.Document(#"Combined Binaries",[Delimiter=",", Columns=11, Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Promoted Headers" = Table.PromoteHeaders(#"Imported CSV")
in
#"Promoted Headers"

After you publish your report to your Power BI Online account you can select your newly uploaded dataset’s “Schedule Refresh” property where you can set up the refresh schedule.

schedule_refresh

 

First go to “Gateway connection”.

I selected “Connect Directly” which requires that you also enter Sharepoint credentials in the “Edit credentials” link which pops up a web page that prompts you to login into your Sharepoint account. This gives Power BI Service permission to access your Sharepoint account to refresh file.

If you have an enterprise gateway setup you could try “Enterprise Gateway” and enter the required credentials for that.

 

gateway_connection

 

If you entered credentials correctly you should now be able to select the “Keep your data up to date” switch to “Yes”.

Then you can select which four 6-hour window you want refresh to run. Power BI Service free accounts can do daily refreshes. Pro accounts can have hourly updates.

As an aside be warned that if you click the “Try Pro for free” button you might get a blocking message that you are using Pro feature. This happened to me and was clearly a quirky error. I had to delete my report and dataset and re-upload them and redo the scheduling to get rid of the error.

schedule_refresh_option

 

You can try refreshing the dataset manually (On demand) or wait for the next scheduled refresh (Scheduled) to happen to see if the data does refresh. You can see refreshes are successful and when they ran by clicking the “refresh history ” link.

refresh_history