Search - "google crawler"
-
Lab needs a crawler to download some assets, none of my business though
But why not
Haven't touched a crawler in two years
Googled the latest state of the art
Found Scrapy
I have to define a class for a crawling script?
Got scared
Went back to BeautifulSoup and requests
Got the job done in 20 mins
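For anyone curious what a 20-minute job like that can look like, here is a rough sketch. It swaps in the standard library (a regular expression plus `urllib`) for BeautifulSoup and requests so it runs without installs; the regex is a crude stand-in for a real HTML parser and will miss unquoted attributes. The function names, the file extensions, and the `assets` folder are all made up for illustration, not taken from the rant.

```python
import os
import re
import shutil
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

# Matches quoted src="..." or href="..." attribute values.
ATTR_RE = re.compile(r'(?:src|href)\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def find_assets(html, base_url, extensions=(".png", ".jpg", ".zip")):
    """Return absolute URLs of linked files whose path ends in a wanted extension."""
    urls = []
    for value in ATTR_RE.findall(html):
        absolute = urljoin(base_url, value)
        if urlparse(absolute).path.lower().endswith(tuple(extensions)):
            urls.append(absolute)
    return urls


def download(url, dest_dir="assets"):
    """Save one URL into dest_dir, named after the last path segment."""
    os.makedirs(dest_dir, exist_ok=True)
    filename = os.path.basename(urlparse(url).path) or "index"
    target = os.path.join(dest_dir, filename)
    with urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Fetch the page, pass its HTML to `find_assets`, then call `download` on each result. No class required.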
Fuck yeah
-
So I go into Google Search Console to try to determine why it's saying pages are not mobile-compliant and whatnot.
3 hours later I come out realizing that what Google REALLY wants is for everyone to build every web page as static HTML with no script tags and never a call to an external website. Just dump all that JavaScript and CSS and HTML into one BIG FRIGGIN FILE so their crawler feels satisfied that it's loading everything in one request.
No CMSes allowed unless GOOGLE built it.
Let's just all revert to HTML 1.0 and be done with it.
-
Question about Google crawler (?)
So, got a question, hopefully you have an answer.
I have a personal website that went up about 2 months ago that has a contact form.
Today I got two emails sent to me. This is the way I have coded it up.
But take a look at the name and message fields. I wonder if this is a Google crawler submitting the form, by any chance. I also got another email around the same time where the message and name fields are reversed.
Anyone else experience this?
-
Just posted this in another thread, but I think you'll all like it too:
I once had a dev who was allowing his site elements to be embedded everywhere in the world (intentional), and it was vulnerable to clickjacking (not intentional). I told him to restrict the frame origin and then implement a whitelist.
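The rant doesn't say which mechanism was used. One common way to restrict framing to a whitelist is the `frame-ancestors` directive of the `Content-Security-Policy` response header, so here is a sketch under that assumption. The function name and the example origins are hypothetical.

```python
def frame_ancestors_header(allowed_origins):
    """Build a CSP header that limits which origins may embed the page in a frame."""
    if not allowed_origins:
        # With an empty whitelist, 'none' forbids all framing.
        sources = "'none'"
    else:
        # Each entry is a host source, e.g. "https://partner.example".
        sources = " ".join(allowed_origins)
    return ("Content-Security-Policy", f"frame-ancestors {sources}")


# Example: allow one partner site and any subdomain of example.org.
name, value = frame_ancestors_header(
    ["https://partner.example", "https://*.example.org"]
)
print(f"{name}: {value}")
```

The header has to go on the response for the embedded element itself, since the browser checks it against the pages that try to frame it.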
My man comes back a month later with an issue: someone on Google Sites can't embed the element. GOOGLE FUCKING SITES!!!!! I didn't even know that shit existed! So naturally I go through all the extremely in-depth and nuanced explanations first: we start looking at web traffic logs and find out that it's not the Google Sites name that's trying to access the element, but one of Google's web crawler-type things. Whatever. Whitelist that URL. Nothing.
Another weird thing was that Google Sites referenced the iframe through a copy of it stored on a Google subsite???? Something like "googleusercontent.com" instead of the actual site we were referencing. Whatever. Whitelisted it. Nothing.
We even looked at other solutions, like opening the whitelist completely for a span of time to see if we could get it to work without the whitelist, as the dev was convinced that the whitelist was the issue. It STILL didn't work!
After this development I got more frustrated, because this wasn't tested beforehand, and finally asked the question: do other web template sites like Squarespace or Wix have this issue?
Nope. Just google sites.
We concluded it's not an issue with the whitelist, but with either Google Sites or the way the webapp is designed. Considering it works on LITERALLY ANYTHING ELSE, I doubt the latter is the answer.
-
When I looked for a food recipes dataset (especially Indian food), I could not find one online that I could use. So, I decided to create one.
The dataset can be found here: https://lnkd.in/djdh9nX
It contains the following fields (self-explanatory): ['RecipeName', 'TranslatedRecipeName', 'Ingredients', 'TranslatedIngredients', 'Prep', 'Cook', 'Total', 'Servings', 'Cuisine', 'Course', 'Diet', 'Instructions', 'TranslatedInstructions']. The dataset contains a CSV and an XLS file. Sometimes the Hindi content is not visible in the CSV format.
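The post doesn't say why the Hindi text sometimes fails to show. One possible cause, assuming the CSV is UTF-8 and is being opened in a spreadsheet program, is that the program guesses the wrong encoding. A minimal sketch of writing and reading such a file with a byte order mark (`utf-8-sig`), which helps some programs detect UTF-8; the function names and any sample rows are made up, not taken from the dataset:

```python
import csv

FIELDS = [
    "RecipeName", "TranslatedRecipeName", "Ingredients", "TranslatedIngredients",
    "Prep", "Cook", "Total", "Servings", "Cuisine", "Course", "Diet",
    "Instructions", "TranslatedInstructions",
]


def write_recipes_csv(path, rows):
    """Write rows as UTF-8 with a leading BOM; missing fields are left empty."""
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def read_recipes_csv(path):
    """Read the file back; utf-8-sig strips the BOM if one is present."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))
```

Reading with `utf-8-sig` also works on files that have no BOM, so it is a safe default when loading the CSV in Python.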
You might be wondering what the columns with the prefix 'Translated' are. A lot of entries in the dataset were in Hindi. To handle such entries and translate them to English for consistency, I used 'googletrans', a Python library that calls the Google Translate API underneath.
The code for the crawler, cleaning and transformation is on GitHub (Repo: https://lnkd.in/dYp3sBc) (@kanishk307).
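As a sketch of how the 'Translated' columns could be filled: the code below is not the author's, and it deliberately takes the translator as a plain function argument rather than calling googletrans directly. It assumes, hypothetically, that only values containing Devanagari characters get translated and everything else is copied over unchanged.

```python
def contains_devanagari(text):
    """True if any character falls in the Devanagari block (U+0900 to U+097F)."""
    return any("\u0900" <= ch <= "\u097f" for ch in text)


def add_translated_fields(row, translate,
                          fields=("RecipeName", "Ingredients", "Instructions")):
    """Return a copy of row with a 'Translated<field>' entry for each field.

    `translate` is any callable mapping a string to its English version,
    for example a thin wrapper around a translation library.
    """
    out = dict(row)
    for field in fields:
        value = row.get(field, "")
        if contains_devanagari(value):
            out["Translated" + field] = translate(value)
        else:
            # Already in a non-Devanagari script: keep the original text.
            out["Translated" + field] = value
    return out
```

Passing the translator in keeps the cleaning step testable offline; a fake function can stand in for the real service.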
The dataset has been created using Archana's Kitchen Website (https://lnkd.in/d_bCPWV). It is a great website and hosts a ton of useful content. You should definitely consider viewing it if you are interested.
#python #dataAnalytics #Crawler #Scraper #dataCleaning #dataTransformation
-
Does Google crawl every website and page on the internet?
I want to know whether Google's crawler will visit all my pages or only a few. I have more than 10 million pages, so approximately how long will it take to index all of them?
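There is no fixed answer, since Google doesn't guarantee that it will crawl or index every page, and the crawl rate varies per site. Still, the back-of-the-envelope arithmetic is simple once you plug in a rate. The pages-per-day figures below are purely hypothetical placeholders, not numbers from Google; a site's real rate would have to come from its own server logs or Search Console crawl stats.

```python
import math


def days_to_crawl(total_pages, pages_per_day):
    """Whole days needed to fetch every page once at a constant daily rate."""
    if pages_per_day <= 0:
        raise ValueError("pages_per_day must be positive")
    return math.ceil(total_pages / pages_per_day)


# Hypothetical rates, only to show how strongly the answer depends on them.
for rate in (5_000, 50_000, 500_000):
    print(f"{rate} pages/day: {days_to_crawl(10_000_000, rate)} days")
```

This only covers fetching each page once; being crawled does not mean a page ends up indexed.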