by Edward Anderson, Data Scientist, ORE Catapult
I’ve got data coming out of my ears!
I’m a few months into my new career in offshore renewable energy, working as a data scientist here at ORE Catapult. My first challenge? How to make myself useful, fast. There’s so much data streaming off wind turbines that I can see opportunities all around for implementing machine learning, neural networks and natural language processing – it’s an obvious fit. But can I trust the data?
Trust in that data is important. I could start building models and making predictions, and wow everyone with the numbers a computer can now generate, but to really make a difference I need people to have confidence in the results. If I know the underlying data is questionable, I’m not going to be convincing when I sell the result.
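I see four potential challenges with the integrity of the data:
1. Poorly calibrated sensors.
2. Data aggregation.
3. Whether the data accurately and fully describes the real conditions the turbine is operating in.
4. How well the data itself is described.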
I can’t do much about the first, although trying to detect poorly calibrated sensors from their data sounds like an interesting challenge. And while being able to better describe the data is important and would address point four, that seems like a long-term problem. I wonder whether there are industry standards. If not, let’s make some! That leaves points two and three: data aggregation, and the ability of our data to accurately and fully describe the real conditions the turbine is operating in.
How can data help a wind farm run more efficiently?
Consider wind speed – wind is a turbine’s energy source, so its speed is vital for predicting power output. Accurate power predictions are extremely attractive because they can be used to make production forecasts and to understand whether a turbine is performing as expected.
The grid operator also has an interest in accurate power predictions, as they need to balance the load on the grid, so a wind farm with an accurate power model will be a much more attractive power generator. Wind speed can also be used to determine the lost production of a turbine, which is an important financial consideration. Indeed, any time you use a power curve, you’re almost certainly using the wind speed.
Check out this 1-minute video if you want to know more about a power curve.
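For anyone who hasn’t used one, a power curve lookup is usually just interpolation in a table of wind speed against power. Below is a minimal Python sketch of that idea. The table values are made up for illustration (not a real turbine), and the linear interpolation between points and zero output below 3 m/s and above 25 m/s are my assumptions. The `lost_production_mwh` helper (a hypothetical name) shows the kind of lookup on 10-minute mean wind speeds that the lost-production estimate later in this article relies on.

```python
from bisect import bisect_right

# Made-up tabulated power curve for illustration only (not a real turbine).
CURVE_SPEEDS_MS = [3.0, 5.0, 7.0, 9.0, 11.0, 12.0, 25.0]  # wind speed, m/s
CURVE_POWER_MW = [0.0, 0.25, 0.7, 1.3, 1.9, 2.0, 2.0]    # power, MW


def lookup_power_mw(wind_speed):
    """Linearly interpolate the table; zero below cut-in and above cut-out."""
    if wind_speed < CURVE_SPEEDS_MS[0] or wind_speed > CURVE_SPEEDS_MS[-1]:
        return 0.0
    i = bisect_right(CURVE_SPEEDS_MS, wind_speed)
    if i == len(CURVE_SPEEDS_MS):  # exactly at the last tabulated speed
        return CURVE_POWER_MW[-1]
    x0, x1 = CURVE_SPEEDS_MS[i - 1], CURVE_SPEEDS_MS[i]
    y0, y1 = CURVE_POWER_MW[i - 1], CURVE_POWER_MW[i]
    return y0 + (y1 - y0) * (wind_speed - x0) / (x1 - x0)


def lost_production_mwh(mean_wind_speeds):
    """Potential production over a run of 10-minute periods, read from the
    curve at each period's mean wind speed (1/6 hour per period)."""
    return sum(lookup_power_mw(v) for v in mean_wind_speeds) * (10 / 60)


print(lookup_power_mw(6.0))            # halfway between the 5 and 7 m/s points
print(lost_production_mwh([6.0] * 6))  # one hour at a steady 6 m/s mean
```

Note that the lookup only ever sees the mean wind speed for each period, which is exactly the assumption questioned in the next section.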
A deep dive into some detail
Each wind farm has a SCADA (supervisory control and data acquisition) system that aggregates sensor and communications data so that the assets can be controlled. Typically, for long-term storage and retrospective analysis, SCADA data is aggregated into 10-minute periods, so that for each signal, such as wind speed, power and pitch, you have a set of derived values: mean, min, max and standard deviation. On the surface, this seems fine. However, consider using a power curve to predict the power produced at a given wind speed. To do this, we have to assume that the mean wind speed for a 10-minute period is a good representation of the wind that produced the associated mean power for that period. I’m not convinced that it is. The power available in the wind is proportional to the cube of the wind speed, and the turbine maps wind speed to power via the non-linear power curve (seen above). For a non-linear curve, the power at the mean wind speed is generally not the same as the mean of the power produced at each of the wind speeds seen during the period.
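As a rough illustration of what the 10-minute aggregation does, here is a minimal Python sketch that buckets timestamped samples into 10-minute windows and keeps only the mean, min, max and standard deviation. The function name and the `(seconds, value)` input format are hypothetical, and I’ve used the population standard deviation; a real SCADA system may sample faster and compute its statistics differently.

```python
import statistics
from collections import defaultdict

WINDOW_SECONDS = 600  # 10-minute aggregation period


def aggregate_10min(samples):
    """Summarise (seconds_since_start, value) samples per 10-minute window.

    Returns {window_start_seconds: (mean, min, max, std)}; the population
    standard deviation is used here, which is an assumption.
    """
    windows = defaultdict(list)
    for t, value in samples:
        windows[int(t // WINDOW_SECONDS) * WINDOW_SECONDS].append(value)
    return {
        start: (statistics.mean(vals), min(vals), max(vals), statistics.pstdev(vals))
        for start, vals in sorted(windows.items())
    }


# Hypothetical one-sample-per-minute wind speeds (m/s): gusty, then steady.
samples = [(60 * i, 4 if i % 2 == 0 else 6) for i in range(10)]
samples += [(60 * i, 8) for i in range(10, 20)]
for start, (mean, lo, hi, std) in aggregate_10min(samples).items():
    print(f"window starting {start} s: mean={mean}, min={lo}, max={hi}, std={std}")
```

In the first window the wind alternates between 4 and 6 m/s, but all that survives is a mean of 5 m/s with a standard deviation of 1 m/s; the individual speeds the turbine actually responded to are gone.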
To provide some evidence, I’ve created some dummy values in the table below. I’ve broken down a 10-minute period into five equal parts, so that the overall mean wind speed remains the same, but the wind speed for each part differs. I’ve then calculated the power for each part and taken an average. Even though the average wind speed is the same (5, 10 or 15 m/s), the power that would be produced for the entire 10 minutes differs depending on how the wind varies over that time period.
Table: Wind speed (m/s) in each part of the 10-minute period, and the resulting power.
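The same kind of calculation can be sketched in Python. The power curve below is hypothetical (cut-in at 3 m/s, a cubic rise to 2 MW rated power at 12 m/s, cut-out above 25 m/s) and the wind speeds for each part are made-up examples, so the numbers will not match the table. The point is only the comparison between the power read off the curve at the mean wind speed and the mean of the power for each part.

```python
def power_curve_mw(v):
    """Hypothetical power curve for illustration (not a real turbine):
    zero below 3 m/s cut-in and above 25 m/s cut-out, a cubic rise
    between cut-in and 12 m/s, then flat at 2 MW rated power."""
    if v < 3 or v > 25:
        return 0.0
    if v >= 12:
        return 2.0
    return 2.0 * (v**3 - 3**3) / (12**3 - 3**3)


def compare_period(part_speeds):
    """Return (mean wind speed, power at the mean speed, mean of per-part power)
    for a 10-minute period split into equal-length parts."""
    n = len(part_speeds)
    mean_speed = sum(part_speeds) / n
    power_at_mean = power_curve_mw(mean_speed)
    mean_power = sum(power_curve_mw(v) for v in part_speeds) / n
    return mean_speed, power_at_mean, mean_power


# Illustrative wind speeds for five equal parts (made up, not the table's values).
for parts in ([5, 5, 5, 5, 5], [3, 4, 5, 6, 7], [15, 15, 15, 15, 15], [11, 13, 15, 17, 19]):
    mean_speed, at_mean, mean_power = compare_period(parts)
    print(f"{parts}: mean {mean_speed:.0f} m/s, curve at mean {at_mean:.3f} MW, "
          f"mean of parts {mean_power:.3f} MW")
```

With this made-up curve, the varying low-wind period averages about 0.15 MW against about 0.115 MW read at the mean, while the varying high-wind period averages about 1.91 MW against the rated 2 MW.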
It’s interesting to see how this plays out from cut-in to rated power. The following chart uses a generated dataset in which a 10-minute period is split into 10 parts, each taking an integer wind speed, covering every possible combination. These combinations are then filtered so that the difference between the maximum and minimum wind speed reflects conditions at a known wind farm, i.e. the difference is less than 13 m/s.
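A scaled-down version of that brute-force search can be sketched in Python. To keep the run time short it uses 5 parts rather than 10, integer speeds from 0 to 25 m/s, and a hypothetical power curve (cut-in 3 m/s, cubic rise to 2 MW at 12 m/s, cut-out above 25 m/s); all three are my assumptions. Because the order of the parts does not change the mean power, it enumerates combinations with replacement rather than every ordering, and it applies the same spread filter of less than 13 m/s.

```python
from collections import defaultdict
from itertools import combinations_with_replacement


def power_curve_mw(v):
    """Hypothetical power curve for illustration (not a real turbine)."""
    if v < 3 or v > 25:
        return 0.0
    if v >= 12:
        return 2.0
    return 2.0 * (v**3 - 3**3) / (12**3 - 3**3)


def power_envelope(n_parts=5, max_speed=25, max_spread=13):
    """For each achievable mean wind speed, return [min, max] mean power over
    all integer wind-speed combinations whose spread is below max_spread."""
    envelope = defaultdict(lambda: [float("inf"), float("-inf")])
    # Order within the period does not change the mean power, so sorted
    # combinations with replacement cover every distinct case.
    for combo in combinations_with_replacement(range(max_speed + 1), n_parts):
        if combo[-1] - combo[0] >= max_spread:  # combo is sorted: last - first = spread
            continue
        mean_speed = sum(combo) / n_parts
        mean_power = sum(power_curve_mw(v) for v in combo) / n_parts
        bounds = envelope[mean_speed]
        bounds[0] = min(bounds[0], mean_power)
        bounds[1] = max(bounds[1], mean_power)
    return envelope


env = power_envelope()
for v in (5.0, 10.0, 15.0):
    lo, hi = env[v]
    print(f"mean {v:.0f} m/s: curve {power_curve_mw(v):.3f} MW, min {lo:.3f}, max {hi:.3f}")
```

At a mean of 5 m/s the minimum equals the steady-wind curve value and only the maximum rises above it, while at 15 m/s the maximum is the rated 2 MW and the minimum falls below it, consistent with the pattern described for the chart.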
The “Min” line is the minimum power that could have been produced for each mean wind speed, whilst the “Max” line is the maximum. This theoretical plot suggests that at low wind speeds (below 7 m/s) a power curve is likely to underestimate potential production, whilst at higher wind speeds (above 11 m/s) it is likely to overestimate it.
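One way to read this result, offered as my own framing rather than something derived from the charts, is Jensen’s inequality. Write the wind speeds in the n equal parts of a 10-minute period as v_1, …, v_n and the power curve as P(v). A typical power curve bends upwards (convex) above cut-in, where power grows steeply with wind speed, and bends over (concave) as it flattens towards rated power:

```latex
\begin{aligned}
\bar{v} &= \frac{1}{n}\sum_{i=1}^{n} v_i, \qquad
\bar{P} = \frac{1}{n}\sum_{i=1}^{n} P(v_i) \\
P \text{ convex on } [\min_i v_i,\ \max_i v_i] &\;\Longrightarrow\; \bar{P} \ge P(\bar{v})
  \quad \text{(the curve at } \bar{v} \text{ underestimates)} \\
P \text{ concave on } [\min_i v_i,\ \max_i v_i] &\;\Longrightarrow\; \bar{P} \le P(\bar{v})
  \quad \text{(the curve at } \bar{v} \text{ overestimates)}
\end{aligned}
```

When the wind speeds in a period straddle both regions, neither inequality applies on its own, which may be why the sign of the error changes somewhere between 7 and 11 m/s; the exact thresholds will depend on the turbine’s curve and on how variable the wind is.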
I’ve also compared the power curve with the actual power produced. The chart below shows the actual error in MWh. The shape of the plot matches what we would expect if my conclusions about the power curve are correct.
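A minimal sketch of one way to produce an error-by-wind-speed comparison like this from 10-minute SCADA records, assuming each record holds the 10-minute mean wind speed (m/s) and mean power (MW). The function name, the 1 m/s binning and the sign convention (actual minus predicted, so positive values mean the power curve under-predicted) are my assumptions, not necessarily how the chart was made. The power curve is passed in as a function so any lookup can be used.

```python
from collections import defaultdict

INTERVAL_HOURS = 10 / 60  # length of one SCADA aggregation period in hours


def binned_error_mwh(records, power_curve):
    """Mean power-curve error per 1 m/s wind speed bin, in MWh per 10 minutes.

    records: iterable of (mean_wind_speed_ms, mean_power_mw) tuples.
    power_curve: callable mapping wind speed (m/s) to power (MW).
    Error is actual minus predicted, so positive means the curve under-predicted.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for wind_speed, actual_mw in records:
        error_mwh = (actual_mw - power_curve(wind_speed)) * INTERVAL_HOURS
        speed_bin = int(wind_speed + 0.5)  # nearest whole m/s (speeds are non-negative)
        totals[speed_bin] += error_mwh
        counts[speed_bin] += 1
    return {b: totals[b] / counts[b] for b in sorted(totals)}


# Toy example with a flat 0.5 MW "curve", just to show the shape of the output.
print(binned_error_mwh([(5.0, 0.56), (5.2, 0.44), (14.9, 0.44)], lambda v: 0.5))
```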
To understand the impact of this, consider a wind farm site manager who is trying to determine how much production was lost on a day when a grid outage caused total wind farm downtime. If the power curve is used to determine potential production that day, the site manager is likely to underestimate the lost production by an average of 0.011 MWh per wind turbine per 10 minutes. With an electricity price of £60/MWh, this equates to £97 per turbine per day. Given the distribution of the errors, if the wind speeds were between 4 and 10 m/s, which is typical in this example, the magnitude of the average error would be greater, approaching 0.016 MWh per 10 minutes, or £138 per turbine per day. Let’s say there is a grid outage at a 25-turbine site that lasts a week and the site manager is preparing a claim. By using a power curve lookup on 10-minute SCADA wind speed data, the site manager could leave the site up to £24,150 (£138 × 25 turbines × 7 days) out of pocket!
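The arithmetic behind these figures, as a small Python sketch: there are 144 ten-minute periods in a day, so an average error of e MWh per period at a price of p £/MWh costs e × 144 × p per turbine per day. The function names are hypothetical. Rounding the daily figure to whole pounds before multiplying reproduces the £24,150 quoted above; without that rounding the total is about £24,192.

```python
INTERVALS_PER_DAY = 24 * 6  # number of 10-minute periods in a day


def daily_error_cost(error_mwh_per_period, price_per_mwh):
    """£ per turbine per day for an average error in MWh per 10-minute period."""
    return error_mwh_per_period * INTERVALS_PER_DAY * price_per_mwh


def claim_shortfall(error_mwh_per_period, price_per_mwh, turbines, days):
    """Total £ shortfall, rounding the daily per-turbine figure to whole pounds first."""
    per_turbine_per_day = round(daily_error_cost(error_mwh_per_period, price_per_mwh))
    return per_turbine_per_day * turbines * days


print(round(daily_error_cost(0.016, 60)))               # 138
print(claim_shortfall(0.016, 60, turbines=25, days=7))  # 24150
```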
That’s just the tip of the iceberg though…
This is just one example of why we can’t take the data at face value; at the very least, doing so would add considerable noise to any machine learning model that uses it. I’ve started looking into how I can better estimate the true distribution of wind speeds within each 10-minute aggregation. Fortunately, a normal distribution looks like a pretty good estimate and starting point, but there are lots of other areas to explore.
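The normal-distribution idea can be sketched in a few lines of Python (3.8 or later, for `statistics.NormalDist`). Given a 10-minute mean and standard deviation from SCADA, it assumes the wind speeds within the period are normally distributed and averages a power curve over that distribution instead of reading the curve at the mean. The power curve is a made-up 2 MW example, and the quantile-averaging approach and `n_points` parameter are my own illustrative choices, not a description of the work mentioned above.

```python
from statistics import NormalDist


def power_curve_mw(v):
    """Hypothetical power curve for illustration (not a real turbine)."""
    if v < 3 or v > 25:
        return 0.0
    if v >= 12:
        return 2.0
    return 2.0 * (v**3 - 3**3) / (12**3 - 3**3)


def expected_power_mw(mean_speed, std_speed, n_points=200):
    """Estimate the mean power over a 10-minute period, assuming the wind
    speeds within it are normally distributed with the SCADA mean and
    standard deviation.

    Averages the power curve at n_points equally spaced quantiles of the
    assumed distribution (a midpoint rule in probability space).
    """
    if std_speed == 0:
        return power_curve_mw(mean_speed)
    dist = NormalDist(mean_speed, std_speed)
    speeds = [dist.inv_cdf((i + 0.5) / n_points) for i in range(n_points)]
    # Any negative speeds from the lower tail fall below cut-in and give zero power.
    return sum(power_curve_mw(v) for v in speeds) / n_points


print(power_curve_mw(5.0), expected_power_mw(5.0, 1.5))
print(power_curve_mw(15.0), expected_power_mw(15.0, 2.0))
```

With these inputs the estimate sits above the curve value at a 5 m/s mean and below rated power at a 15 m/s mean, in the same directions as the charts suggest.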
Let’s work together on these challenges
Here at ORE Catapult, our Data and Digitalisation Team is seeking ways to better understand and utilise data from operational renewable assets. If you have data-driven challenges you would like us to explore, or have a data-driven innovation that you would like to bring to the offshore renewable energy industry, then please get in touch.