With the August 2018 update, Power BI now supports a Python integration. This means you can use Python and all its modules to do your data manipulation before it hits the Power BI model. Let's take a look at how we can get started!
First, you need to enable Python in the options. Open the Power Query editor and navigate to File -> Options and Settings -> Options -> Preview Features -> Python Support
Check the 'Python support' checkbox and hit close. You'll need to restart Power BI as well for the features to be fully available.
After restarting, open the query editor once more. The Python data option is under the 'Other' category.
A small window will open up for you to enter your Python code. But before we do that, you'll need to make sure Power BI is pointing to the correct path for Python.
If you have both Python 2 and 3 installed, you want to make sure the path is for the one you intend to use. If everything checks out, you can enter your code right in this window. The normal commenting convention '#' works here as well. Here I've inserted a simple Python script using pandas to import an excel file. Obviously, if you don't have pandas installed, you'll need to do that first with pip install, or your other preferred method.
# import data from excel file with Python!
import pandas as pd fileName = "C:\Users\....\Sample Restaurant Data.xlsm" data = pd.read_excel(fileName)
Click 'OK' and Power BI will open the familiar navigation panel where you can preview the data. You'll notice that Power Query (M) takes over at this point. Loading up the data will trigger the standard steps for navigation and changing the data types.
What if I need to change the Python code?
All the Python code is still available to be edited within the Advanced Editor in Power Query - although it looks a bit messy. If you do end up editing the code, Power Query will need you to edit permissions on the script, just to safe guard any existing models that may be affected. Simply click 'Run' and it will work as expected.
Here, I've rearranged the M code to be a bit more readable. I've taken out the #"Changed Type" step as well, since it's uneccessary.
let Source = Python.Execute(" # import data from excel file with Python! #(lf)#(lf)import pandas as pd #(lf)#(lf)fileName = ""C:\Users\....\Sample Restaurant Data.xlsm"" #(lf)#(lf)data = pd.read_excel(fileName) "), data = Source{[Name="data"]}[Value] in data
Here we can see that our Python code is pretty much intact, with some additional pieces applied. Let's dive into how M is handling Python.
First, we see that the actual Python script is being fed into the Python.Execute() function. This looks pretty easy to understand and simple. It only needs 1 parameter, our Python script, wrapped in quotes.
The most unusual part is the #(lf)#(lf). These signal line separations. There cannot be a space between these characters and the corresponding Python code. For example:
This works: #(lf)#(lf)import pandas
This doesn't work: #(lf)#(lf) import pandas
One thing of interest to note... Power Query throws in two #(lf)'s, but in fact, it doesn't seem to affect the code if you only have one. For example, this M code will work just fine:
let Source = Python.Execute(" # import data from excel file with Python! #(lf)import pandas as pd #(lf)fileName = ""C:\Users\....\Sample Restaurant Data.xlsm"" #(lf)data = pd.read_excel(fileName) "), data = Source{[Name="data"]}[Value] in data
What about loops... how will that affect the M script? Take this simple loop. Here I'm just making an array and dumping it into a pandas dataframe:
import pandas a = [] for x in range(8): row = [] for y in range(8): row.append(y) a.append(row) df = pandas.DataFrame(data = a)
Transalting this to the M code and we get this:
let Source = Python.Execute(" #(lf)import pandas #(lf)a = [] #(lf)for x in range(8): #(lf) row = [] #(lf) for y in range(8): #(lf) row.append(y) #(lf) a.append(row) #(lf)df = pandas.DataFrame(data = a) "), df1 = Source{[Name="df"]}[Value] in df1
Alien Tip: Indeed, if we were to remove the indents for any of the iteration sections, the Python code will crash, leading to an error on the Power Query side. Also, refrain from any user input Python code like "raw_input()" as it will cause an error as well.
I think having Python available for Power BI is an enormous asset for any user! What do you think? Let me know in the comments!
Getting started with Power BI? Check out Power Pivot and Power BI by Rob Collie and Avichal Singh: