Data is vital to the wind industry

New Perspectives: Wind Turbines and Data Science

Published 24 June 2020

by Edward Anderson, Data Scientist, ORE Catapult

I’ve got data coming out of my ears!

I’m a few months into my new career in offshore renewable energy, working as a data scientist here at ORE Catapult. My first challenge? How to make myself useful, fast. There’s so much data streaming off wind turbines, I can see opportunities all around for implementing machine learning, neural networks and natural language processing – it’s an obvious fit. But, can I trust the data?

Trust in that data is important. Although I can start creating models and predictions and wow everyone with how a computer can generate a number now, to really make a difference, I need them to have confidence in the result. If I know the underlying data is questionable, then I’m not going to be convincing when I sell the result.

I see four potential challenges with the integrity of the data:

  1. It can be unreliable as sensors degrade, are miscalibrated or are offline.
  2. It is often aggregated into 10-minute averages, obscuring the true distribution of values.
  3. It does not fully describe the operating conditions, i.e. the information that best tells us how a turbine will perform may not be contained in the sensor signals we have recorded.
  4. It can be poorly documented, making it unclear if the signal is raw data from a sensor, or corrected data.

I can’t do much about the first, although trying to detect poorly calibrated sensors from their data sounds like an interesting challenge. And, while being able to better describe the data is important and would address point number four, that seems like a long-term problem. I wonder if there are industry standards?  If not, let’s make some! So that leaves points two and three: data aggregation and the ability of our data to accurately and fully describe the real conditions the turbine is operating in.

How can data help a wind farm run more efficiently?

Consider wind speed – it’s the primary energy source for a wind turbine and vital for predicting its power output. Accurate power predictions are extremely attractive because they can be used to make production forecasts, as well as to understand whether a turbine is performing as expected.

The grid operator also has an interest in accurate power predictions, as they need to balance the load on the grid. Therefore, a wind farm with an accurate power model will be a much more attractive power generator.  Wind speed can also be used to determine the lost production of a turbine, which is an important financial consideration.  Indeed, any time you might use a power curve, then you’re almost certainly using the wind speed.

Check out this 1-minute video if you want to know more about a power curve.

A deep dive into some detail

Each wind farm has a SCADA (supervisory control and data acquisition) system that aggregates sensors and comms to enable the assets to be controlled. Typically, for long-term data storage and retrospective analysis, SCADA data is aggregated into 10-minute periods, so that for each signal, like wind speed, power and pitch, you have a set of derived values: mean, min, max, and standard deviation.  On the surface of it, this seems fine.  However, consider using a power curve to predict the power produced with a given wind speed.  To do this, we have to assume that the mean wind speed for that 10-minute period is a good representation of the wind that would have produced the associated mean power for that 10-minute period.  I’m not convinced that it is, as the power produced is proportional to the cube of the wind speed and wind speed is mapped to power via the non-linear power curve (seen above).

To provide some evidence, I’ve created some dummy values in the table below.  I’ve broken down a 10-minute period into five equal parts, so that the overall mean wind speed remains the same, but the wind speed for each part differs.  I’ve then calculated the power for each part and taken an average.  Even though the average wind speed is the same (5, 10 or 15 m/s), the power that would be produced for the entire 10 minutes differs depending on how the wind varies over that time period.

Wind speed (m/s)
T1    T2    T3    T4    T5    Mean    Power
5     5     5     5     5     5       148
1     1     5     9     9     5       452
10    10    10    10    10    10      1,419
5     5     10    15    15    10      1,259
15    15    15    15    15    15      2,291
10    10    15    20    20    15      1,946
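
The table can be reproduced with a few lines of Python. This is only a sketch: the power values at 5, 10 and 15 m/s are read straight from the table, while those at 1, 9 and 20 m/s are not stated anywhere in this article and have been back-calculated from the mixed rows, on the assumption that 1 m/s is below cut-in and produces nothing. The names `POWER_AT` and `mean_power` are purely illustrative.

```python
# Power at each integer wind speed used in the table (same units as the table).
# 5, 10 and 15 m/s come from the table; 1, 9 and 20 m/s are inferred values
# (assumption: 1 m/s is below cut-in, so it produces zero power).
POWER_AT = {1: 0, 5: 148, 9: 1056, 10: 1419, 15: 2291, 20: 2300}


def mean_power(wind_speeds):
    """Average the power of each part, rather than taking the power of the average."""
    return sum(POWER_AT[v] for v in wind_speeds) / len(wind_speeds)


periods = [
    [5, 5, 5, 5, 5],
    [1, 1, 5, 9, 9],
    [10, 10, 10, 10, 10],
    [5, 5, 10, 15, 15],
    [15, 15, 15, 15, 15],
    [10, 10, 15, 20, 20],
]

for period in periods:
    mean_speed = sum(period) / len(period)
    print(f"{period}  mean speed {mean_speed:.0f} m/s  mean power {mean_power(period):.0f}")
```

Each pair of rows shares a mean wind speed but not a mean power, which is the whole point: the 10-minute mean alone does not pin down what the turbine produced.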


It’s interesting to see how this plays out from cut-in to rated power.  The following chart uses a generated dataset of every possible combination of integer wind speeds for a 10-minute period, split into 10 parts.  These combinations are then filtered so that the difference between the maximum and minimum wind speed is reflective of conditions at a known wind farm, i.e. the difference is less than 13 m/s.

The “Min” line is the minimum power that could have been produced for each mean wind speed, whilst the “Max” line is the maximum. This theoretical plot suggests that at low wind speeds, below 7 m/s, a power curve is likely to underestimate potential production, whilst at higher wind speeds, above 11 m/s, it is likely to overestimate it.
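
For anyone who wants to try this themselves, here is a rough sketch of the idea. It is not the code behind the chart: it splits the period into 5 parts rather than 10 to keep the number of combinations small, and it uses a made-up power curve (cubic up to a rated speed, flat after that) with hypothetical values for `CUT_IN`, `RATED_SPEED` and `RATED_POWER`. Only the spread filter of less than 13 m/s is taken from the description above.

```python
from itertools import combinations_with_replacement

# Hypothetical turbine parameters, chosen only for illustration.
CUT_IN, RATED_SPEED, RATED_POWER = 3, 12, 2300.0


def power(v):
    """Toy power curve: zero below cut-in, cubic up to rated speed, flat above."""
    if v < CUT_IN:
        return 0.0
    return min(RATED_POWER, RATED_POWER * (v / RATED_SPEED) ** 3)


def power_envelope(parts=5, speeds=range(0, 21), max_spread=13):
    """Return {mean wind speed: (min mean power, max mean power)}."""
    envelope = {}
    # Order within the period does not affect the averages, so sorted
    # combinations are enough; combo[0] is the minimum, combo[-1] the maximum.
    for combo in combinations_with_replacement(speeds, parts):
        if combo[-1] - combo[0] >= max_spread:
            continue  # keep only periods whose spread is less than max_spread
        mean_speed = sum(combo) / parts
        mean_power = sum(power(v) for v in combo) / parts
        low, high = envelope.get(mean_speed, (mean_power, mean_power))
        envelope[mean_speed] = (min(low, mean_power), max(high, mean_power))
    return envelope


envelope = power_envelope()
for mean_speed in (5.0, 10.0, 15.0):
    low, high = envelope[mean_speed]
    print(f"{mean_speed:4.1f} m/s  curve {power(mean_speed):7.1f}  "
          f"min {low:7.1f}  max {high:7.1f}")
```

With this toy curve, the power curve value sits at the bottom of the envelope at low mean wind speeds and at the top of it once the mean is above rated speed, which is the same pattern as the chart.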

I’ve also compared the power curve to the actual power produced.  The chart below shows the actual error in MWh.  The shape of the plot matches what we would expect, if my conclusions about the power curve were correct.

To understand the impact of this, consider a wind farm site manager who is trying to determine how much production was lost on a day when a grid outage caused total wind farm downtime.  If the power curve is used to determine potential production that day, the site manager is likely to underestimate the lost MWh by an average of 0.011 MWh per wind turbine per 10 minutes.  With an electricity price of £60/MWh, this equates to £97 per turbine per day. Given the distribution of the errors, if the wind speeds were in the range of 4 to 10 m/s, which is typical in this example, then the magnitude of the average error would be greater, approaching 0.016 MWh, or £138 per turbine per day.  Let’s say there is a grid outage at a 25-turbine site that lasts a week and the site manager is preparing a claim. By using a power curve lookup with 10-minute SCADA wind speed data, the site manager may end up as much as £24,150 out of pocket!
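
The arithmetic behind that last figure is simple enough to sketch. The function names here are my own, and the daily figure is rounded to whole pounds before scaling up, which is how the numbers above work out. Running the same sum with the rounded 0.011 MWh gives roughly £95 per turbine per day rather than £97, presumably because the average error quoted above has itself been rounded.

```python
PERIODS_PER_DAY = 24 * 6      # number of 10-minute periods in a day
PRICE_GBP_PER_MWH = 60        # electricity price used in the example above


def daily_cost_per_turbine(error_mwh_per_period):
    """Value of the power curve error for one turbine over one day, in pounds."""
    return error_mwh_per_period * PERIODS_PER_DAY * PRICE_GBP_PER_MWH


def outage_shortfall(error_mwh_per_period, turbines, days):
    """Shortfall for a whole site, rounding the daily figure to whole pounds first."""
    return round(daily_cost_per_turbine(error_mwh_per_period)) * turbines * days


print(round(daily_cost_per_turbine(0.016)))          # 138
print(outage_shortfall(0.016, turbines=25, days=7))  # 24150
```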

That’s just the tip of the iceberg though…

This is just one example of how we can’t take the data at face value, or at least, doing so would add considerable noise to any machine learning model that uses it.  I’ve started looking into how I can better estimate the true distribution of wind speeds given 10-minute aggregations.  Fortunately, it looks like a normal distribution is a pretty good estimate and starting point, but there are lots of other areas to explore:

  1. Average wind speed is not representative, so how can we generate an accurate wind profile from the mean, min, max and standard deviation? Assuming a normal distribution seems reasonably accurate, but there may be benefit in exploring a Weibull distribution instead.
  2. Typical power curves are rather coarse, and provide values for every 0.5 m/s, so linearly interpolating between these values will add further error.
  3. The air density affects how much power can be produced at a given wind speed. What sources of data can give us this information so that we can correct for it?
  4. Yaw error, i.e. when the turbine is not facing precisely into the wind, will also affect power production. Is this routinely calculated?  Is it possible to correct for it if it’s known or is prevention the only sensible approach?
  5. As turbines grow bigger, wind shear may become a more dominant factor. It seems that you would normally want wind speed measurements at multiple heights to correct for this.  Can met masts provide this data?  Is LIDAR now cheap enough to install on every turbine?  Is that justifiable?
  6. Not all turbines perform the same and will be impacted by the condition of the individual components. How can you factor this into your model? A turbine’s age would be a general approach, but how would you modify this age if a component is replaced?
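
To make the first point a little more concrete, here is a minimal sketch of how the four SCADA statistics could be turned into an expected power, assuming the wind speed within the period follows a normal distribution truncated at the recorded min and max. The curve points are illustrative only (5, 10 and 15 m/s are the dummy values from the table earlier, the rest are assumed), the curve is linearly interpolated despite the caveat in point two, and cut-out is ignored. `CURVE`, `curve_power` and `expected_power` are hypothetical names.

```python
from bisect import bisect_right
from statistics import NormalDist

# Coarse, illustrative power curve as (wind speed in m/s, power) pairs.
# Values at 1, 9 and 20 m/s are assumptions, not measured data.
CURVE = [(1, 0.0), (5, 148.0), (9, 1056.0), (10, 1419.0), (15, 2291.0), (20, 2300.0)]
SPEEDS = [speed for speed, _ in CURVE]


def curve_power(v):
    """Linearly interpolate the coarse curve, clamping outside its range."""
    if v <= SPEEDS[0]:
        return CURVE[0][1]
    if v >= SPEEDS[-1]:
        return CURVE[-1][1]
    i = bisect_right(SPEEDS, v)
    (v0, p0), (v1, p1) = CURVE[i - 1], CURVE[i]
    return p0 + (p1 - p0) * (v - v0) / (v1 - v0)


def expected_power(mean, std, v_min, v_max, steps=200):
    """Average the curve over a normal distribution truncated to [v_min, v_max]."""
    if std <= 0 or v_max <= v_min:
        return curve_power(mean)  # no spread recorded: fall back to a plain lookup
    dist = NormalDist(mean, std)
    dv = (v_max - v_min) / steps
    weighted_power = total_weight = 0.0
    for k in range(steps):
        v = v_min + (k + 0.5) * dv       # midpoint of each slice
        weight = dist.pdf(v)
        weighted_power += weight * curve_power(v)
        total_weight += weight
    return weighted_power / total_weight  # renormalise for the truncation


print(curve_power(5), expected_power(5, 2, 1, 9))
print(curve_power(15), expected_power(15, 3, 9, 21))
```

Swapping `NormalDist` for a Weibull density would be the obvious next experiment, as would replacing the linear interpolation with something smoother.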

Let’s work together on these challenges

Here at ORE Catapult, our Data and Digitalisation Team is seeking ways to better understand and utilise data from operational renewable assets. If you have data-driven challenges you would like us to explore, or have a data-driven innovation that you would like to bring to the offshore renewable energy industry, then please get in touch.