A review of unusual datasets for your models
When starting out in machine learning, people typically use datasets such as MNIST, Iris, or 20 Newsgroups, among others. But there are hundreds of strange and interesting datasets to be found online. At Immune Technology Institute we asked our teachers to create a list of the strangest datasets they have encountered. Here we go!
Price of marijuana
This is a repository that contains a record of marijuana prices over the years, which vary considerably from one state to another. But the real question here is how the data was obtained…
Although it may seem like a useless set of data, it can be very relevant in the times we live in, since many countries are considering legalising marijuana.
What is the optimal size for a chopstick?
If you have never wondered what the optimal size of chopsticks is, don't worry: someone already has. A team of researchers evaluated the effects of chopstick length on the eating performance of adults and children. To that end, they created this dataset to find the optimal length of the chopsticks.

They concluded that the ability to pinch food was significantly affected by the length of the chopsticks. The researchers suggested that families with children should provide chopsticks of both 240 mm and 180 mm in length, for adults and children respectively. Restaurants should provide chopsticks 210 mm long to find a balance between ergonomics and price.
Images of rice grains
This dataset contains more than 3500 images of rice grains of two different species. Different properties were extracted from each rice grain, such as:
- The longest line that can be drawn on the grain of rice.
- The shortest line that can be drawn on the grain of rice.
- The perimeter of each grain.
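As a rough illustration of those three properties, here is a minimal sketch that computes them from a grain outline given as a list of (x, y) points. The function name `grain_features` and the sample outline are hypothetical, and the dataset's authors may well have used a different method (for example, fitting an ellipse to each grain), so treat this as one possible reading rather than their procedure.

```python
import math
from itertools import combinations


def grain_features(outline):
    """Return (longest_line, shortest_line, perimeter) for a closed outline.

    `outline` is a list of (x, y) points ordered around the grain boundary.
    Assumption: "longest line" is the largest distance between two boundary
    points, and "shortest line" is the grain's extent measured at a right
    angle to that longest line.
    """
    # Perimeter: sum of the segments between consecutive points, closing the loop.
    perimeter = sum(
        math.dist(outline[i], outline[(i + 1) % len(outline)])
        for i in range(len(outline))
    )

    # Longest line: the pair of boundary points that are farthest apart.
    a, b = max(combinations(outline, 2), key=lambda pair: math.dist(*pair))
    longest = math.dist(a, b)

    # Unit vector perpendicular to the longest line.
    ux, uy = (b[0] - a[0]) / longest, (b[1] - a[1]) / longest
    px, py = -uy, ux

    # Shortest line: spread of the outline along that perpendicular direction.
    offsets = [x * px + y * py for x, y in outline]
    shortest = max(offsets) - min(offsets)

    return longest, shortest, perimeter


# Hypothetical outline: a diamond 6 units long and 2 units wide.
outline = [(-3, 0), (0, 1), (3, 0), (0, -1)]
longest, shortest, perimeter = grain_features(outline)
```

With a real image you would first need to segment the grain and trace its boundary. The pairwise search is quadratic in the number of boundary points, which is fine for a sketch but slow for dense contours.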
Popular dog names in Sweden
Did you know that the most popular dog name in Sweden is Molly?
This dataset lists the most popular dog names in Sweden in 2018 by number of animals. Bella was the second most popular name, with almost six thousand animals, followed by Charlie with approximately 4600 animals.
Flags
I'm pretty sure Sheldon would love this one. This dataset contains the flags and details of various countries, such as:
- The religion of each country.
- The predominant colour of the flag.
- Whether the flag contains a crescent moon, a sun, or stars.
- Whether it contains an eagle, a tree, …
It might be interesting to try to predict the religion of a country by its size and the colours of its flag.
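To make that idea concrete, here is a minimal baseline sketch: a lookup table that predicts the most frequent religion label among flags with the same size bucket and predominant colour. Everything in it is an assumption for illustration: the names `rows`, `features`, `fit` and `predict`, the 500 threshold, and the toy rows with placeholder labels "A", "B" and "C" are invented and not taken from the dataset. A serious attempt would use a proper classifier and a held-out test set.

```python
from collections import Counter, defaultdict

# Toy rows invented for illustration only; they are NOT taken from the dataset.
# Each row: (area in thousands of km2, predominant flag colour, religion label).
rows = [
    (50, "red", "A"),
    (70, "red", "A"),
    (900, "green", "B"),
    (1200, "green", "B"),
    (40, "green", "A"),
    (1500, "blue", "C"),
]


def features(area, colour, big_threshold=500):
    """Turn a row into a simple categorical key: (is_big, colour)."""
    return (area >= big_threshold, colour)


def fit(rows):
    """Count, for every feature key, how often each religion label appears."""
    counts = defaultdict(Counter)
    for area, colour, religion in rows:
        counts[features(area, colour)][religion] += 1
    return counts


def predict(counts, area, colour, default=None):
    """Predict the most frequent label seen for this key, or `default` if unseen."""
    seen = counts.get(features(area, colour))
    return seen.most_common(1)[0][0] if seen else default


model = fit(rows)
guess = predict(model, 1000, "green")
```

A baseline like this is mainly useful as a reference point: if a more elaborate model cannot beat it, the features probably carry little signal.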
Sometimes it is also interesting to see how people extract relationships in data where they are not visible to the naked eye. This website is an expert at finding correlations where no one else can find them, for example:
Cheese consumption vs. number of people who died from entanglement in their bed sheets.
PhDs in mathematics vs. stored uranium in US nuclear power plants.
Total revenue generated by arcades vs. computer science PhDs.
You can discover new correlations on this website. Share your results with us!
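If you want to check a pair of series yourself, the usual measure is the Pearson correlation coefficient. The sketch below implements it with the standard library only. The function name `pearson` and the two short series are made up for illustration and do not come from that website.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences.

    Raises ZeroDivisionError if either sequence is constant (zero variance).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up yearly values for two unrelated quantities that both happen to rise.
series_a = [1.0, 2.0, 3.0, 4.0, 5.0]
series_b = [10.0, 12.0, 15.0, 19.0, 24.0]
r = pearson(series_a, series_b)
```

Any two series that trend in the same direction over time will score close to 1 here, which is exactly why a high coefficient alone says nothing about cause.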
Who are we?
At Immune Technology Institute we try to apply and teach the most advanced technology in the field of computing. In addition, we love to share knowledge, as we believe that is when it becomes powerful.
So if you want to learn how to develop real-world applications or handle large amounts of data, you may be interested in our Master of Data Science. It is a programme aimed at professionals who want to specialise in Data Science and learn the main techniques of data mining, analysis, and Artificial Intelligence, and how to apply them in different industries.
On 24 September we will have an online information session with the director of the master's degree, Monica Villas. IMMUNE can help you boost your career through its partner companies and contacts with recruiters and industry professionals. You can register HERE.
Wait, one more thing: Datathon
Want to be a data scientist through and through? Sign up for the Datathon organised by Immune Technology Institute in cooperation with Spanish Startups on 19 September. It will be an online event featuring top data experts and a great challenge to test your knowledge. There is a prize too! You can register HERE.
This article was written by Alejandro Diaz Santos (LinkedIn, GitHub) for IMMUNE Technology Institute.




