Challenges Entered
Self-driving RL on DeepRacer cars - From simulation to real world
3D Seismic Image Interpretation by Machine Learning
Predicting smell of molecular compounds
Latest submissions
Status | Submission |
---|---|
graded | 98203 |
graded | 98202 |
graded | 98183 |
5 PROBLEMS 3 WEEKS. CAN YOU SOLVE THEM ALL?
Participant | Rating |
---|---|
spiglerg | 0 |
contrebande | 0 |
Learning to Smell
Question following the townhall meeting
Over 4 years ago
Dear @guillaumegodin,
I have a question regarding one of your statements in yesterday's townhall meeting for the Learning to Smell challenge.
You mentioned that rearranging the SMILES can improve accuracy on tasks. I have been trying to find a way to use this, but have not yet been successful. I found your contribution to RDKit for this, which works fine, but now I am stuck on how to use these additional SMILES. Any fingerprint-type embedding is the same for all of the generated SMILES, so extra SMILES add nothing when using fingerprint embeddings. I have tried several ways to represent SMILES without embeddings, such as char_to_int conversion with zero padding fed into LSTMs, but none of them predict above chance level. My background is not in chemistry, so I am likely missing something obvious due to my lack of domain knowledge.
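A minimal sketch of the char_to_int encoding with zero padding mentioned above, assuming plain per-character tokens (so two-letter atoms such as Cl or Br get split, a simplification). The function names are hypothetical. It shows why this kind of input differs from a fingerprint: a fingerprint is computed from the molecule and is the same for every equivalent SMILES, while a character-level sequence gives each rearranged SMILES of ethanol a different input.

```python
def build_vocab(smiles_list):
    """Map each character seen in the data to an integer >= 1 (0 is reserved for padding)."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(smiles, char_to_int, max_len):
    """Convert a SMILES string to a fixed-length list of ints, right-padded with 0."""
    ids = [char_to_int[ch] for ch in smiles[:max_len]]
    return ids + [0] * (max_len - len(ids))


# Three equivalent SMILES for ethanol: the same molecule written in different orders.
variants = ["CCO", "OCC", "C(O)C"]
vocab = build_vocab(variants)
encoded = [encode(s, vocab, max_len=6) for s in variants]

for s, ids in zip(variants, encoded):
    print(s, ids)
# CCO [3, 3, 4, 0, 0, 0]
# OCC [4, 3, 3, 0, 0, 0]
# C(O)C [3, 1, 4, 2, 3, 0]
```

Padded integer sequences like these are the usual input to an embedding layer followed by an LSTM, with index 0 treated as padding.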
Could you please point us in the direction of an input representation that can make use of these newly generated SMILES?
Thank you in advance.
Best,
Cas van Boekholdt
Explained by the Community | 200 CHF Cash Prize X 5
Over 4 years ago
Hi everyone,
I wrote a Google Colab tutorial/explainer on how to use vectors created with the SMILESVec package to train a fully-connected neural network with TensorFlow Keras on the Learning to Smell dataset:
https://colab.research.google.com/drive/1cePlnWwWOsYxwqs8NWebVHFwRr624tNc?usp=sharing
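A rough sketch of the idea behind SMILESVec-style vectors, as I understand it: split each SMILES into overlapping character n-grams ("words"), look up an embedding for each word, and average them into one fixed-size vector per molecule. The n-gram length, the vector size and the `toy_embedding` lookup are hypothetical; in the real package the word vectors come from a trained word2vec model, and this is not the SMILESVec API.

```python
import random


def ngrams(smiles, n):
    """Overlapping character n-grams ("words") of a SMILES string."""
    return [smiles[i:i + n] for i in range(len(smiles) - n + 1)]


def toy_embedding(word, dim):
    """Hypothetical stand-in for a trained word2vec lookup: a fixed pseudo-random vector per word."""
    rng = random.Random(word)  # a str seed gives the same vector on every run
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def smiles_vector(smiles, n=3, dim=4):
    """Average the embeddings of all n-grams into one fixed-size vector."""
    words = ngrams(smiles, n) or [smiles]  # fall back to the whole string if shorter than n
    vecs = [toy_embedding(w, dim) for w in words]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


vec = smiles_vector("C(O)C")  # 3-grams: "C(O", "(O)", "O)C"
print(len(vec))  # 4
```

Fixed-length vectors like this are what a fully-connected (Dense) network takes as input. Because the n-grams depend on how the string is written, "CCO" and "OCC" get different vectors here, unlike with a fingerprint.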
Let me know if you have any suggestions or questions, always happy to help out!
Cheers,
Cas
Test labels
Over 4 years ago
You can evaluate your model either by making predictions on the test set and uploading them, or by splitting the labeled training set into a training and a validation set.
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
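If you would rather not pull in scikit-learn, a dependency-free sketch of the same idea (shuffle, then hold out a fraction) could look like this. The function name, the 20% default and the seed are arbitrary choices for illustration, not part of any library.

```python
import random


def train_val_split(rows, val_fraction=0.2, seed=42):
    """Shuffle a copy of the labeled rows and hold out a fraction for validation."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical (smiles, labels) rows standing in for the training file.
rows = [(f"mol{i}", f"label{i}") for i in range(10)]
train, val = train_val_split(rows)
print(len(train), len(val))  # 8 2
```

scikit-learn's `train_test_split` does the same job with more options (for example `test_size`, `random_state` and `shuffle`).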
Notebooks
SMILESVec package to train a fully-connected neural network
Explains how to use vectors created with the SMILESVec package to train a fully-connected neural network using TensorFlow Keras.
casvanboekholdt · Over 4 years ago
Question following the townhall meeting
Over 4 years ago
Thank you for the response, @guillaumegodin.
I can see how the augmentation would work in practice. However, when I create fingerprint embeddings of, e.g., Smiles 1 and its augmented form (Smiles1, aug1), they are identical. So how does this replication add any value to the data? What kind of input representation preserves the difference between these augmented SMILES?
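A minimal sketch of what the augmentation step looks like in practice, assuming the training data is a list of (smiles, labels) pairs and that a hypothetical `variants_of` function returns alternative SMILES for the same molecule (for example produced with RDKit's randomized output, `Chem.MolToSmiles(mol, doRandom=True)` in recent RDKit versions). Each molecule then contributes several distinct input strings with identical labels. That only adds information for a representation that reads the string itself, such as a character-level sequence fed to an LSTM; a fingerprint maps all copies to the same vector.

```python
def augment(dataset, variants_of):
    """Expand (smiles, labels) rows so each molecule appears once per SMILES variant.

    variants_of is a hypothetical callable returning alternative SMILES strings
    for the same molecule; the labels are copied unchanged to every variant.
    """
    expanded = []
    for smiles, labels in dataset:
        for variant in [smiles, *variants_of(smiles)]:
            expanded.append((variant, labels))  # same labels, different input string
    return expanded


# Toy lookup of pre-generated variants for ethanol; labels are placeholders.
precomputed = {"CCO": ["OCC", "C(O)C"]}
train = [("CCO", ["label_a", "label_b"])]
augmented = augment(train, lambda s: precomputed.get(s, []))
for row in augmented:
    print(row)
# ('CCO', ['label_a', 'label_b'])
# ('OCC', ['label_a', 'label_b'])
# ('C(O)C', ['label_a', 'label_b'])
```

When validating, split by molecule first and augment only the training part, so variants of one molecule do not end up on both sides of the split.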
Best,
Cas