[Explainer] PyTorch starter 0.857 F1-Score on public LB

Over 4 years ago

Your code looks really clean and nice. Can you please elaborate more on the process of inference? I am asking because I’ve come to the opposite conclusion: the fewer overlaps I do, the better the final score is

[Explainer] Need extra features? Different input approach? Try Seismic Attributes!

Over 4 years ago

I’ve tried following several approaches. Assuming that we have K attributes, we can either:

stack all them into a (K+1, WIDTH, HEIGHT) array and use as input to the NN (+1 comes from the seismic image itself)
make K+1 separate branches in NN for inputs

The first one is more general, the second requires less computational resources. In either case, it is possible for NN to learn those augmentations; but that requires data, and it can be the case that adding those features can indeed be helpful.

Also, there are some coherency functions in my colab notebook; to use them during training process, one should pre-convert the whole cube and use the stored data as it is way more effecient than computing coherency on the fly

[Explainer] Need extra features? Different input approach? Try Seismic Attributes!

Over 4 years ago

Are you planning on releasing your code after the contest is over? I will do so, and it seems that our approaches are absolutely different, pretty much orthogonal, so it would be amazing to look at one another’s work!

[Explainer] EDA in details, baseline and advanced models

Over 4 years ago

Yep, it is zero-centered normal noise. That is not that necessary in this task and usually used to help model generalize between multiple seismic cubes

📝 Explained by the Community | Win 4 x DJI Mavic Drones

Over 4 years ago

I have added the new notebooks with EDA and baseline models, so be sure to check it out in a separate topic!

explainer

Notebook on COLAB with EDA

Notebook on NBVIEWER with EDA

Notebook on COLAB with baseline model

Notebook on NBVIEWER with baseline model

[Explainer] EDA in details, baseline and advanced models

Over 4 years ago

For those of you, who have already read my previous post: I’ve added the EDA and baseline notebooks, so be sure to check them out!

The next model I am planning to add is the one that achieves ~0.88+ score, so leave that like to speed me up

[Explainer] EDA in details, baseline and advanced models

Over 4 years ago

Hi everyone!

I am Sergey, and I am a machine learning researcher in Gazpromneft, a Russian Oil&Gas company. I am working in the field of seismic exploration for some time now, and this hackathon is a perfect chance to showcase our skills to the rest of the world. Nevertheless, it is our very first time working with such type of structures on seismic data, so it is as new to us as to any other ML practitioners.

In this post, I will try to cover everything that I’ve done so far to produce submissions for this contest: from data exploration to generating predictions, with descriptions of pretty much every line of code. Also, as I had enough time for testing various approaches to work with seismic data, I will share them as well: libraries and tricks to use for geological analysis.

Currently, I use our own servers to run my experiments on; in the next few days, I will gradually transfer everything to COLAB to share the progress. That can take some time, so be sure to like this post and check it for updates!

0. seismiQB

To solve various tasks of seismic interpretation, we’ve developed an open-sourced Python framework seismiQB . It is capable of:

Working with seismic cubes at any stage of processing: pre-stack, post-stack, and 2d data. It may not be the most flashy thing from the standpoint of this contest, but allows us to conveniently work with any data we are provided with. Not only that, but our custom format of data storage can access data up to 30 times faster than regular SEG-Y can.
It is designed to work with 100+ GB cubes, so may not be that necessary for the needs of this competition where it is easier to load everything into memory.
Creating pipelines of data generation. Apply augmentations on the fly with a chosen parallelism engine: for example, if you have code that transforms one image from your batch, we provide a method of applying them in parallel to every image in the batch via multithreading, multiparallel or async.
Configuring sophisticated neural networks with just a few lines of code. We’ve written our custom wrappers for TensorFlow and PyTorch, which allow us to use short configs to define even the most complex architectures. Our model zoo has a huge number of pre-implemented architectures, ranging from plain old UNet to the DeepLab v3+ with all the modern bells and whistles. If you don’t like any of them, it is easy to build your own model from pre-defined blocks like ResNet block, ASPP, Xception block, etc.
Training on arbitrary 3D crops, sampled from the cube according to the desired distribution. We can work with any shapes and have rich tools to cut the data from cubes and merge individual crops back together: that allows us to make overlapping predictions to improve the overall quality of our models.
Exploring hyperparameter space by running a number of experiments simultaneously. With all the wealth of parameters that is presented by our framework, we need to make a lot of experiments to narrow the exact architecture down. To this end, we have a special module, that runs them in separate processes, while keeping a detailed log of each of them.
Profiling of pretty much everything. If you do something – you can time it and check the memory consumption. As we write production-quality code that is used by our geologists on daily basis, we optimize everything to the nth degree. Our tools enable us to do so.

Our code has a lot of documentation inside, but if you have any questions about it, be sure to contact us! We also have dedicated libraries for seismic processing and petrophysics .

As I already said, we have never worked with facies like in this challenge previously: the tasks that we tackled so far are:

Horizon picking
Horizon extension and enhancement
Estuaries detection
Faults extraction
Reef outlining
For most of them, we have openly available notebooks that demonstrate in great detail, how to solve the problem. We are advocates of a rigorous approach to our researches, so each of them follows our company-wide standard. Be sure to check them out!

In this contest, unlike any of our previous affairs, we need to do cube-to-cube prediction: to this end, I had to implement some new primitives, but that relates mostly to data loading. From the moment we have images and segmentation masks, it is business as usual.

1. Exploratory data analysis

We can extract a ton of information just by looking at the cubes from various axes: it is an invaluable insight into the data itself. As my specialization is horizon detection, I tried to look at borders between facies and analyze them. As was already mentioned by other participants, facies picking is very subjective; yet, I was able to find interesting patterns of labeling that can be used for the improving models.

Overall, it is pretty much the same as in any other contest: check the distributions, create visualizations, get familiar with the data itself. I explicitly note possible directions for improving the overall prediction score, which I made by observing the results of the analysis.

It also contains a huge number of geological attributes, as well as a detailed explanation, why each and every one of them can be easily learned by the neural network during the training process, rendering them completely useless.

Notebook on COLAB with EDA

Notebook on NBVIEWER with EDA

I actually prefer this one, as it loads 3D images way faster

2. Segmentation model

The next notebook shows everything for submission creation:

Data loading: generating data along both spatial axis
Applying a few image-related augmentations
NN architecture with all the training hyperparameters
Validation on left-off data
Sequential prediction on the test data and storing the prediction in the needed format

There are quite a few parameters that can be changed: from the architecture to the way we generate data during the training phase. There is also an ever-present question of whether to add seismic attributes to the model input or just hope that a huge neural network can figure our everything it needs on its own. All of these variables can have a huge impact on the outcomes and the scoring of the submission!

Notebook on COLAB with baseline model

Notebook on NBVIEWER with baseline model

The baseline model create a submission with a score of ~0.75 out of the box. By tweaking some knobs inside of it, it is easy to reach a ~0.88 score!

That is it for now! Feedback, suggestions and criticism is always welcome!

It is also worth mentioning, that we have a lot of articles and talks and conferences on the subject: right now, we are in the middle of running EAGE workshop on ML in Oil&Gas. If you are interested in any of this, be sure to contact us!

📝 Explained by the Community | Win 4 x DJI Mavic Drones

Over 4 years ago

Hi everyone!

I am Sergey, and I am a machine learning researcher in Gazpromneft, a Russian Oil&Gas company. I am working in the field of seismic exploration for some time now, and this hackathon is a perfect chance to showcase our skills to the rest of the world. Nevertheless, it is our very first time working with such type of structures on seismic data, so it is as new to us as to any other ML practitioners.

In this post, I will try to cover everything that I’ve done so far to produce submissions for this contest: from data exploration to generating predictions, with descriptions of pretty much every line of code. Also, as I had enough time for testing various approaches to work with seismic data, I will share them as well: libraries and tricks to use for geological analysis.

Currently, I use our own servers to run my experiments on; in the next few days, I will gradually transfer everything to COLAB to share the progress. That can take some time, so be sure to like this post and check it for updates!

0. seismiQB

To solve various tasks of seismic interpretation, we’ve developed an open-sourced Python framework seismiQB. It is capable of:

Working with seismic cubes at any stage of processing: pre-stack, post-stack, and 2d data. It may not be the most flashy thing from the standpoint of this contest, but allows us to conveniently work with any data we are provided with. Not only that, but our custom format of data storage can access data up to 30 times faster than regular SEG-Y can.
It is designed to work with 100+ GB cubes, so may not be that necessary for the needs of this competition where it is easier to load everything into memory.
Creating pipelines of data generation. Apply augmentations on the fly with a chosen parallelism engine: for example, if you have code that transforms one image from your batch, we provide a method of applying them in parallel to every image in the batch via multithreading, multiparallel or async.
Configuring sophisticated neural networks with just a few lines of code. We’ve written our custom wrappers for TensorFlow and PyTorch, which allow us to use short configs to define even the most complex architectures. Our model zoo has a huge number of pre-implemented architectures, ranging from plain old UNet to the DeepLab v3+ with all the modern bells and whistles. If you don’t like any of them, it is easy to build your own model from pre-defined blocks like ResNet block, ASPP, Xception block, etc.
Training on arbitrary 3D crops, sampled from the cube according to the desired distribution. We can work with any shapes and have rich tools to cut the data from cubes and merge individual crops back together: that allows us to make overlapping predictions to improve the overall quality of our models.
Exploring hyperparameter space by running a number of experiments simultaneously. With all the wealth of parameters that is presented by our framework, we need to make a lot of experiments to narrow the exact architecture down. To this end, we have a special module, that runs them in separate processes, while keeping a detailed log of each of them.
Profiling of pretty much everything. If you do something – you can time it and check the memory consumption. As we write production-quality code that is used by our geologists on daily basis, we optimize everything to the nth degree. Our tools enable us to do so.

Our code has a lot of documentation inside, but if you have any questions about it, be sure to contact us! We also have dedicated libraries for seismic processing and petrophysics.

As I already said, we have never worked with facies like in this challenge previously: the tasks that we tackled so far are:

Horizon picking
Horizon extension and enhancement
Estuaries detection
Faults extraction
Reef outlining
For most of them, we have openly available notebooks that demonstrate in great detail, how to solve the problem. We are advocates of a rigorous approach to our researches, so each of them follows our company-wide standard. Be sure to check them out!

In this contest, unlike any of our previous affairs, we need to do cube-to-cube prediction: to this end, I had to implement some new primitives, but that relates mostly to data loading. From the moment we have images and segmentation masks, it is business as usual.

1. Exploratory data analysis

We can extract a ton of information just by looking at the cubes from various axes: it is an invaluable insight into the data itself. As my specialization is horizon detection, I tried to look at borders between facies and analyze them. As was already mentioned by other participants, facies picking is very subjective; yet, I was able to find interesting patterns of labeling that can be used for the improving models.

Overall, it is pretty much the same as in any other contest: check the distributions, create visualizations, get familiar with the data itself.

Placeholder for a link to an amazing COLAB notebook.

2. Segmentation model

The next notebook shows everything for submission creation:

Data loading: generating data along both spatial axis
Applying a few image-related augmentations
NN architecture with all the training hyperparameters
Validation on left-off data
Sequential prediction on the test data and storing the prediction in the needed format

There are quite a few parameters that can be changed: from the architecture to the way we generate data during the training phase. There is also an ever-present question of whether to add seismic attributes to the model input or just hope that a huge neural network can figure our everything it needs on its own. All of these variables can have a huge impact on the outcomes and the scoring of the submission!

Placeholder for a link to an amazing COLAB notebook.

That is it for now! I will try to update every section with new insights and put everything to the COLAB as soon as possible. Feedback, suggestions and criticism is always welcome!

It is also worth mentioning, that we have a lot of articles and talks and conferences on the subject: right now, we are in the middle of running EAGE workshop on ML in Oil&Gas. If you are interested in any of this, be sure to contact us!

graded	89432	Sun, 18 Oct 2020 21:21:34
graded	89430	Sun, 18 Oct 2020 21:18:21
graded	89321	Sun, 18 Oct 2020 14:19:14

graded	89432	Sun, 18 Oct 2020 21:21:34
graded	89430	Sun, 18 Oct 2020 21:18:21
graded	89321	Sun, 18 Oct 2020 14:19:14

Location

Badges

Activity

Challenge Categories

Challenges Entered

Data Purchasing Challenge 2022

Latest submissions

Seismic Facies Identification Challenge

Latest submissions

Learning to Smell

Latest submissions

AI Blitz #4

Latest submissions

Seismic Facies Identification Challenge

[Explainer] PyTorch starter 0.857 F1-Score on public LB

[Explainer] Need extra features? Different input approach? Try Seismic Attributes!

[Explainer] Need extra features? Different input approach? Try Seismic Attributes!

[Explainer] EDA in details, baseline and advanced models

📝 Explained by the Community | Win 4 x DJI Mavic Drones

explainer

Notebook on COLAB with EDA

Notebook on NBVIEWER with EDA

Notebook on COLAB with baseline model

Notebook on NBVIEWER with baseline model

[Explainer] EDA in details, baseline and advanced models

[Explainer] EDA in details, baseline and advanced models

0. seismiQB

1. Exploratory data analysis

Notebook on COLAB with EDA

Notebook on NBVIEWER with EDA

2. Segmentation model

Notebook on COLAB with baseline model

Notebook on NBVIEWER with baseline model

📝 Explained by the Community | Win 4 x DJI Mavic Drones

0. seismiQB

1. Exploratory data analysis

2. Segmentation model

Notebooks

Notebooks