Part 2: Why don’t we see more Machine Learning in the wind industry?

Published 14 July 2020

By Dr Conaill Soraghan, Data & Digitalisation Team Lead

This is the second part of a blog that examines why machine learning is underutilised in the wind sector compared to other data-heavy industries. We recommend starting with part 1, which sets the context for the arguments laid out here.

What are the main ingredients?

The four main ingredients required to deploy effective ML are:

  1. Data – and lots of it.
  2. Appropriate domain challenges – asking the right questions of the data.
  3. Data science capability – to manipulate data, develop algorithms, then interpret the outputs.
  4. Time – to develop models to disrupt traditional processes.

Let’s run through the four ingredients to identify what’s missing:

Our data lacks context

The wind industry is drowning in data – predominantly consisting of unlabelled time series data and images. Unlike the worlds of online retail or health, where there are labels such as ‘user x purchased y’, the wind industry has an overwhelming volume of sensor measurements (temperatures, pressures, rotor speeds, wind speeds, etc.) and images/video (blades, welds, cable routes, etc.). Often, these lack sufficient context or labels such as ‘this turbine is healthy’ or ‘this blade has level 2 erosion’.

Most data owners (OEMs and Owner/Operators) believe there is value in all of this unlabelled data, so they store it all. Yet, only a tiny fraction is used to inform decisions. When it is used, routine trending and periodic dashboards are often the extent of the analyses.

In the wind sector, alarm logs are a form of labelled data. In a sense, they label your SCADA and other sensor data from the same time period. Unfortunately, there is no standard for this data source, so the data quality, format and documentation vary significantly between turbine manufacturers. There is also the labelled data from maintenance reports, which would be like gold dust if it were connected to the SCADA.

So yes, there is plenty of data, but a lack of context or labels, and the complementary data streams are not linked very well, limiting the types of ML that can be deployed.
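As a minimal sketch of what linking these data streams could look like, the snippet below labels each SCADA timestamp with whichever alarm was active at that time. The timestamps, alarm codes and the `label_for` helper are all hypothetical, and the alarm log is assumed to be sorted by start time with no overlapping alarms; real alarm logs vary by manufacturer and would need cleaning first.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Hypothetical 10-minute SCADA timestamps for one turbine.
start = datetime(2020, 1, 1, 0, 0)
scada_times = [start + timedelta(minutes=10 * i) for i in range(12)]

# Hypothetical alarm log: (alarm start, alarm end, alarm code).
# Assumed sorted by start time and non-overlapping.
alarms = [
    (datetime(2020, 1, 1, 0, 25), datetime(2020, 1, 1, 0, 45), "GEN_OVERTEMP"),
    (datetime(2020, 1, 1, 1, 30), datetime(2020, 1, 1, 1, 40), "PITCH_FAULT"),
]
alarm_starts = [alarm[0] for alarm in alarms]


def label_for(timestamp):
    """Return the code of the alarm active at `timestamp`, or 'no_alarm'."""
    # Index of the last alarm that started at or before this timestamp.
    i = bisect_right(alarm_starts, timestamp) - 1
    if i >= 0 and timestamp <= alarms[i][1]:
        return alarms[i][2]
    return "no_alarm"


# One label per SCADA row, ready to be used as a supervised-learning target.
labels = [label_for(t) for t in scada_times]
for t, label in zip(scada_times, labels):
    print(t.strftime("%H:%M"), label)
```

The same join could be applied to maintenance records, provided they carry reliable start and end times.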

The wrong questions are being asked

There is no shortage of data-driven challenges, of which the wind sector is very aware. For the operational assets (where I specialise) the questions all stem from the three key industry drivers of reducing cost, improving production and mitigating health and safety risks. A logical first step of exploring ML for many organisations and researchers is to assume that with all the data available, and the predictive capabilities of ML, ML can be used to predict wind turbine failures. This would reduce costs and improve production – which sounds ideal. However, it doesn’t take long to realise that there are not enough instances of similar failures or even reliable failure records to generate the accuracy and confidence required for a new process to be adopted by operations teams.

So, I would argue that the wrong questions are being asked of the copious unlabelled data in the wind sector. Put differently, the wrong ML paradigms and techniques are being applied. There are more exotic ML techniques for handling rare events, but the first step should be better labelling of data. Linking of contextual data (such as alarms and maintenance records) is one approach. Another approach is to use unsupervised methods to improve understanding of how assets and components behave, then feed the resulting operational state labels into supervised ML algorithms to produce actionable insights.

We need a different skillset

Most large data owners have established data science teams in some form or another over the past five years to strategically leverage data. This is an excellent trend that is well overdue in such a data-rich industry. However, I see a significant gap between data scientists who are mechanical engineers and have learned some data science on the job out of necessity, and data scientists who have a university-level education in data-driven thinking and code development.

Furthermore, many data scientists in the wind sector find themselves wearing many hats, which limits the value they can contribute. Often there is no prioritisation of which data-driven challenges to investigate. Once some analysis begins, they soon discover that the data pipeline and data quality are not optimised. In addition to recruiting or training data scientists, we need to accept that there is a need for a suite of complementary roles including data engineers, business analysts and software developers.

We need time to fail and learn

An organisation can have all the data necessary, understand what types of ML are best suited to that data, and have a pool of talented engineers and analysts ready to deploy ML. However, if the analysis team are not given the time to experiment, innovate and fail (initially), ML will remain in the pilot demonstration phase for a very long time, while other organisations develop the competitive advantage that is possible by letting ML exploit data in a way that more traditional methods never will.

So, why is ML not more prevalent in the wind sector? To summarise, one significant issue is the lack of labelled data in the wind industry, which limits the types of ML approaches that are applicable and causes us to ask the wrong questions of the data. Ultimately, many pilot projects fail to get off the ground or show insufficient accuracy to be moved into production. However, I would argue that this issue is, in fact, secondary to the lack of time that talented people in the right organisations with the right data have to explore and experiment with this new technology.

A lot of work is required to understand and demonstrate the value of ML in a wind-specific domain, and this is where ORE Catapult can support.

What is ORE Catapult doing?

We have confidence that ML has clear value for such a data-rich industry and ORE Catapult exists to stimulate and deliver innovation to help the industry evolve and improve. Therefore, we have set an internal goal to articulate the value of ML on behalf of the offshore renewables sector.

We aim to demystify the technology by increasing ML awareness, exploring data with ML collaboratively, and presenting real findings and value cases to drive significant improvements across the industry.

Our ML track record

Currently, we have experts across the business deploying ML models to innovate and improve on traditional wind O&M processes. The data and questions come from owning a 7MW offshore wind turbine and our involvement in over 50 live collaborative innovation projects with industry partners. There are data engineers and data scientists in ORE Catapult’s Data & Digitalisation Team working alongside wind turbine technology experts in our Research & Disruptive Innovation Team. And critically, we have time to explore this technology and fail fast – a luxury that site operational teams rarely have.

Some examples of our ML outputs are shown below:

Figure 1: Main bearing anomaly detection using neural networks

A neural network was trained to predict the main bearing temperature of multiple turbines in an operational offshore wind farm. On the right, the thick lines show the unhealthy turbine, and the orange lines cover the six months before a failure. The neural network has trouble predicting temperatures in that final half-year for the unhealthy turbine, which shows promise for anomaly detection or possibly failure prediction.
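The idea behind this kind of anomaly detection can be sketched as a normal-behaviour model: fit a model on a period assumed to be healthy, then watch the gap between measured and predicted temperature. The sketch below is not the model used for Figure 1. To stay short it swaps the neural network for a linear least-squares fit, and the data, the injected fault and the three-sigma threshold are all synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for SCADA inputs (not real turbine data).
n = 2000
rotor_speed = rng.uniform(5, 15, n)   # rpm
ambient = rng.uniform(0, 20, n)       # deg C
bearing_temp = 20 + 2.0 * rotor_speed + 0.8 * ambient + rng.normal(0, 0.5, n)

# Hypothetical fault: the bearing runs progressively hotter over the
# last quarter of the record.
fault_start = int(0.75 * n)
bearing_temp[fault_start:] += np.linspace(0, 6, n - fault_start)

# Fit the normal-behaviour model on the first half only (assumed healthy).
# A linear model is used here in place of a neural network.
X = np.column_stack([np.ones(n), rotor_speed, ambient])
train = slice(0, n // 2)
coef, *_ = np.linalg.lstsq(X[train], bearing_temp[train], rcond=None)

# Residual = measured minus predicted temperature.
residual = bearing_temp - X @ coef
threshold = 3 * residual[train].std()

# Rolling mean of the residual suppresses sample-to-sample noise.
window = 50
rolling = np.convolve(residual, np.ones(window) / window, mode="valid")

# rolling[i] averages residual[i : i + window], so report the index of the
# last sample in the first window that exceeds the threshold.
exceed = np.flatnonzero(rolling > threshold)
first_alarm = int(exceed[0]) + window - 1 if exceed.size else None
print("fault injected at sample", fault_start, "and first flagged at", first_alarm)
```

A growing residual does not say what is wrong, only that the turbine no longer behaves as the model learned it should, which is why the result is described as promising rather than as a failure prediction.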

Figure 2: Clustering wind turbine SCADA using k-means clustering with k = 5 (5 clusters)

SCADA data from one offshore wind turbine was fed into a k-means clustering algorithm, with k = 5 to partition the timestamps into five groups. The chart in the middle of the leftmost column is a power curve, which shows this approach is starting to identify modes of operation, but it has not added significant value (as I said, we fail fast to learn). It does reveal how modes of operation easily overlap when looking at a traditional two-dimensional view, which is the justification for using PCA and ML in general, because the human eye has trouble seeing the relationships.
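For readers unfamiliar with k-means, the following is a small sketch of the algorithm with k = 5 applied to synthetic SCADA-like channels. The channel definitions, the idealised power curve and the `kmeans` and `assign` helpers are illustrative assumptions, not the pipeline behind Figure 2.

```python
import numpy as np


def assign(X, centroids):
    """Label each row of X with the index of its nearest centroid."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Move each centroid to the mean of its rows; keep it if the cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign(X, centroids), centroids


# Synthetic SCADA-like channels for one turbine (assumed, not measured).
rng = np.random.default_rng(1)
n = 1000
wind = rng.uniform(0, 25, n)                                   # m/s
power = 7000 * np.clip((wind - 3) / 9, 0, 1) ** 3 + rng.normal(0, 100, n)  # kW
rotor = np.clip(wind, 0, 12) + rng.normal(0, 0.2, n)           # rpm
pitch = 2 * np.clip(wind - 12, 0, None) + rng.normal(0, 0.5, n)  # deg

X = np.column_stack([wind, power, rotor, pitch])
# Standardise so that no channel dominates the distance purely through its units.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

labels, centroids = kmeans(X_scaled, k=5)
print("timestamps per cluster:", np.bincount(labels, minlength=5))
```

Each timestamp ends up with one of five labels. Whether those labels correspond to meaningful modes of operation still has to be judged by someone who knows the turbine.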

Figure 3: Comparison of power curve lookups to a neural network for predicting power of a wind turbine

A neural network was trained to predict the power of a target turbine based on the anemometers of all other wind turbines in the same wind farm. The algorithm used a stochastic gradient descent optimiser and a mean squared error loss function, with three hidden layers that use a tanh activation function. When compared to looking up the power curve using the wind speed at the target turbine, the ML approach shows a significant improvement: its overall error is lower and more symmetric about the zero-error line. The power curve approach systematically underestimates production, so any lost production analysis based on it will also be underestimated.
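The ingredients named above (three tanh hidden layers, a mean squared error loss and stochastic gradient descent) can be illustrated in a few dozen lines of NumPy. This is a sketch under stated assumptions rather than the model behind Figure 3: the data are synthetic, and the number of neighbouring turbines, the layer widths, the learning rate and the batch size are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: wind speed seen by 6 neighbouring anemometers (assumed),
# and the normalised power (0 to 1) of the target turbine.
n, n_neighbours = 2000, 6
true_wind = rng.uniform(3, 20, n)
X = true_wind[:, None] + rng.normal(0, 0.5, (n, n_neighbours))
y = (np.clip((true_wind - 3) / 9, 0, 1) ** 3)[:, None]
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Three hidden layers of 16 units each, one linear output unit.
sizes = [n_neighbours, 16, 16, 16, 1]
weights = [rng.normal(0, np.sqrt(1 / n_in), (n_in, n_out))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]


def forward(inputs):
    """Return the activations of every layer, input first, prediction last."""
    acts = [inputs]
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = acts[-1] @ W + b
        # tanh on the hidden layers, identity on the output layer.
        acts.append(z if i == len(weights) - 1 else np.tanh(z))
    return acts


def mse(inputs, targets):
    return float(np.mean((forward(inputs)[-1] - targets) ** 2))


loss_before = mse(X, y)
learning_rate, batch_size = 0.01, 32
for epoch in range(50):
    order = rng.permutation(n)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        acts = forward(X[idx])
        # Gradient of the mean squared error with respect to the prediction.
        delta = 2 * (acts[-1] - y[idx]) / len(idx)
        for i in reversed(range(len(weights))):
            grad_W = acts[i].T @ delta
            grad_b = delta.sum(axis=0)
            if i > 0:
                # Back-propagate through tanh: d/dz tanh(z) = 1 - tanh(z)^2.
                delta = (delta @ weights[i].T) * (1 - acts[i] ** 2)
            weights[i] -= learning_rate * grad_W
            biases[i] -= learning_rate * grad_b
loss_after = mse(X, y)
print(f"MSE before training: {loss_before:.4f}, after: {loss_after:.4f}")
```

A fair comparison with a power curve lookup would evaluate both approaches on held-out data, which this sketch leaves out for brevity.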

Figure 4: Dimension reduction applied to seek outliers in a wind farm

Dimension reduction was applied to SCADA data from seven offshore wind turbines in the same wind farm, where each colour represents a different turbine, and the size of the bubble is the power output for that timestep. The feature space plot can be used to identify outliers; in this case, the dark green turbine was found to be underperforming.

Figure 5: Principal Component Analysis applied to 200 channels of 1Hz SCADA data for one wind turbine

Principal Component Analysis was applied to 200 channels of 1Hz SCADA data for one turbine. It was found that 50 principal components out of 200 were required to explain 80% of the variance in the data. Initially, this appeared to be of limited value and simply reflected the complexity of 1Hz SCADA data. However, when the first two principal components are plotted against each other (shown above), two distinct operative modes are evident. This could be used by an asset owner to track operative state in real-time and identify curtailment to trigger discussions with OEMs and asset managers.
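The "how many components explain 80% of the variance" question can be answered directly from the singular values of the standardised data. The sketch below uses synthetic data with 200 channels driven by a handful of latent signals plus noise, which is an assumption made purely for illustration, so the component count it prints will not match the figure reported above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: 200 channels mixed from 10 latent signals plus noise.
n_samples, n_channels, n_latent = 3000, 200, 10
latent = rng.normal(size=(n_samples, n_latent))
mixing = rng.normal(size=(n_latent, n_channels))
X = latent @ mixing + rng.normal(scale=2.0, size=(n_samples, n_channels))

# Standardise each channel so that units do not dominate the variance.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via singular value decomposition; rows of Vt are the principal axes.
U, s, Vt = np.linalg.svd(X_scaled, full_matrices=False)
explained = s ** 2 / np.sum(s ** 2)
cumulative = np.cumsum(explained)

# Smallest number of components whose cumulative explained variance reaches 80%.
n_needed = int(np.searchsorted(cumulative, 0.80)) + 1
print("components needed for 80% of the variance:", n_needed)

# Scores on the first two components, i.e. the coordinates one would
# scatter-plot to look for distinct operating modes.
scores = X_scaled @ Vt[:2].T
print("shape of the 2D projection:", scores.shape)
```

On real 1Hz SCADA the projection would normally be coloured by something interpretable, such as power or time, to help decide whether separate clouds of points reflect operating modes or curtailment.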

Engage with us through the Wind Digital Innovation Forum

We have seen the value of ML first-hand and are keen to help the industry understand and unlock the value of this powerful digital technology.

We’re inviting industry to engage with us through our Wind Digital Innovation (DI) Forum. This is a relatively new programme and its goal is to increase the digital maturity of the wind industry by unlocking the value of digital innovations, including ML.

The Wind DI Forum is open to any organisation in the value chain including owner/operators, supply chain, consultancies and academia. Joining the Wind Digital Innovation Forum will provide you with:

  • Regular opportunities to make connections with other data owners from the wind sector and the digital supply chain.
  • Access to our data scientists and engineers to explore ML collaboratively, develop proof of concepts and discover the value of ML.
  • Training masterclasses to upskill your organisation – the first of which is on how to get an organisation ready for ML.
  • The ability to influence and participate in industry competitions to bring in the best-in-class digital solutions from outside the wind industry.

Get in touch

ML is a fascinating technology for the wind industry, but as this article argues, it has yet to make a significant business impact due to four key barriers: a lack of data context; the wrong questions being asked; the need for a different skillset; and the need for time to fail and learn.

Are you a data owner who wants to unlock the value of your data, or are you a solution provider with a technology or offering that will help the industry exploit the power of ML? At ORE Catapult, we’d love to hear from you to support our goal of articulating the value of ML on behalf of the offshore renewables sector. We are keen to help the industry understand and unlock the value of this powerful digital technology.