Deep Learning is Easy – Learn Something Harder

These days I come across many people who want to get into machine learning/AI, particularly deep learning. Some ask me what the best way is to get started and learn. Clearly, at the speed things are evolving, there seems to be no time for a PhD. Universities are sometimes a bit behind the curve on applications, technology and infrastructure, so is a master's worth doing? A couple of companies now offer residency programmes (extended internships), which supposedly allow you to kickstart a successful career in machine learning without a PhD. Which option is best for you depends largely on your circumstances, but also on what you want to achieve.

Source: Deep Learning is Easy – Learn Something Harder

 

Scrapy Tips from the Pros: Part 1 | The Scrapinghub Blog

Use Extruct to Extract Microdata from Websites

I am sure each and every developer of web crawlers has had a reason to curse web developers who use messy layouts for their websites. Websites with no semantic markup, and especially those based on HTML tables, are the absolute worst. These types of websites make scraping much harder because there are few, if any, clues about what each element means. Sometimes you even have to trust that the order of the elements on each page will remain the same to grab the data you need.

Which is why we are so grateful for Schema.org, a collaborative effort to bring semantic markup to the web. This project provides web developers with schemas to represent a range of different objects in their websites, including Person, Product, and Review, using a metadata format such as Microdata, RDFa, or JSON-LD. It makes the job of search engines easier because they can extract useful information from websites without having to dig into the HTML structure of every website they crawl.

For example, AggregateRating is a schema used by online retailers to represent user ratings for their products. Here’s the markup that describes user ratings for a product in an online store using the Microdata format:
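
To give a feel for what such markup looks like and how its values can be read, here is a minimal sketch. The snippet in `SAMPLE_HTML` is hypothetical (the product name and rating values are invented for illustration). The `ItempropCollector` class and `extract_itemprops` helper are likewise assumptions of this sketch, built on Python's standard `html.parser`. This is not how Extruct itself is implemented, and it ignores nested item scopes and attribute-based values.

```python
from html.parser import HTMLParser

# Hypothetical markup: the product name and rating values are invented.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">Example Widget</span>
  <div itemprop="aggregateRating" itemscope
       itemtype="http://schema.org/AggregateRating">
    Rated <span itemprop="ratingValue">4.4</span>/5
    based on <span itemprop="reviewCount">89</span> reviews
  </div>
</div>
"""


class ItempropCollector(HTMLParser):
    """Collect the text of leaf elements that carry an itemprop attribute.

    Deliberately simplified: all properties land in one flat dict, and only
    text content is read (no content= or href= attribute handling).
    """

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current = None  # itemprop name whose text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # An element that opens a nested item (itemscope) has no text value.
        if "itemprop" in attrs and "itemscope" not in attrs:
            self._current = attrs["itemprop"]
            self.properties[self._current] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.properties[self._current] += data

    def handle_endtag(self, tag):
        # Stop collecting text once any element closes.
        self._current = None


def extract_itemprops(html):
    """Return a flat {itemprop: text} dict for the given HTML string."""
    parser = ItempropCollector()
    parser.feed(html)
    parser.close()
    return {name: text.strip() for name, text in parser.properties.items()}


if __name__ == "__main__":
    print(extract_itemprops(SAMPLE_HTML))
```

On the sample above, `extract_itemprops(SAMPLE_HTML)` yields `name`, `ratingValue`, and `reviewCount` as strings. For real pages a dedicated library such as Extruct is the better choice, since it handles nested items and the other metadata formats.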

 

Source: Scrapy Tips from the Pros: Part 1 | The Scrapinghub Blog