devRant - A fun community for developers to connect over code, tech & life as a programmer

Search - "google crawler"

10

K-ASS

2540

5y

Lab needs a crawler to download some assets, none of my business though

But why not

Haven't touched a crawler for two years

Googled the latest state of the art

Found scrapy

I have to define a class for a crawling script?

Got scared

Went back to BeautifulSoup and requests

Got the job done in 20 mins

Fuck yeah
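For anyone curious what a 20-minute asset grabber can look like, here is a minimal sketch. It is not the poster's script: it swaps in the standard library (`html.parser` and `urllib`) for BeautifulSoup and requests so it runs without extra installs, and the extension list, the `assets` folder and all function names are assumptions made up for illustration.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

ASSET_EXTENSIONS = (".png", ".jpg", ".pdf", ".zip")  # assumed asset types


class LinkCollector(HTMLParser):
    """Collect every href/src attribute value found on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def find_assets(html, base_url, extensions=ASSET_EXTENSIONS):
    """Return absolute, de-duplicated URLs whose path ends in a wanted extension."""
    collector = LinkCollector()
    collector.feed(html)
    found = []
    for link in collector.links:
        url = urljoin(base_url, link)  # resolve relative links against the page
        if urlparse(url).path.lower().endswith(extensions) and url not in found:
            found.append(url)
    return found


def download_assets(page_url, out_dir="assets"):
    """Fetch one page and save every matching asset into out_dir."""
    with urlopen(page_url, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    Path(out_dir).mkdir(exist_ok=True)
    for url in find_assets(html, page_url):
        target = Path(out_dir) / Path(urlparse(url).path).name
        with urlopen(url, timeout=10) as resp:
            target.write_bytes(resp.read())
```

Calling `download_assets("https://example.com/lab/")` (a placeholder URL) would pull one page's assets. A real job would probably also want retries, rate limiting and a check of the site's robots.txt.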

rant

6
9

stackodev

13653

6y

So I go into Google Search Console to try to determine why it's saying pages are not compliant with mobile and whatnot.

3 hours later I come out realizing that what Google REALLY wants is for everyone to build every web page as static HTML with no script tags and never a call to an external website. Just dump all that javascript and css and HTML into one BIG FRIGGIN FILE so our crawler feels satisfied that it's loading everything all in one request.

No CMSes allowed unless GOOGLE built it.

Let's just all revert to HTML 1.0 and be done with it.

rant madness google search console google

1
7

Scipio

692

6y

Question about google crawler (?)

So, got a question, hopefully you have an answer.

I have a personal website that went up about 2 months ago that has a contact form.

Today I got two emails sent to me. This is the way I have coded it up.

But take a look at the name and message fields. I wonder if this is a google crawler submitting the form by any chance. I also got another email around the same time where the message and name field are reversed.

Anyone else experience this?

question google crawler

9
6

HampusMa

3116

7y

Would be fun if you could connect to a Google crawler bot. But you can't :(

random google google bot web crawler

1
5

arcsector

2370

5y

Just posted this in another thread, but I think you'll all like it too:

I once had a dev who was allowing his site elements to be embedded everywhere in the world (intentional) and it was vulnerable to clickjacking (not intentional). I told him to restrict frame origin and then implement a whitelist.

My man comes back a month later with this issue of someone in Google Sites not being able to embed the element. GOOGLE FUCKING SITES!!!!! I didn't even know that shit existed! So naturally I go through all the extremely in-depth and nuanced explanations first: we start looking at web traffic logs and find out that it's not the Google site name that's trying to access the element, but one of Google's web crawler-type things. Whatever. Whitelist that URL. Nothing.

Another weird thing was that Google Sites referenced the iframe through a copy of it stored in a Google subsite???? Something like "googleusercontent.com" instead of the actual site we were referencing. Whatever. Whitelisted it. Nothing.

We even looked at other solutions, like opening the whitelist completely for a span of time to see if we could get it to work without the whitelist, as the dev was convinced that the whitelist was the issue. It STILL didn't work!

Because of this development I got more frustrated, since this wasn't tested beforehand, and finally asked the question: do other web template sites like Squarespace or Wix have this issue?

Nope. Just google sites.

We concluded it's not an issue with the whitelist, but with either Google Sites or the way the webapp is designed. Considering it works on LITERALLY ANYTHING ELSE, I doubt that the latter is the answer.
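For reference, one common way to "restrict frame origin and implement a whitelist" is the `Content-Security-Policy: frame-ancestors` response header. The rant doesn't say which mechanism the dev actually used, so treat this as a sketch under that assumption. The function name and the example origins are made up, and it is not claimed to fix the Google Sites problem above.

```python
def frame_ancestors_header(allowed_origins):
    """Build a (name, value) header pair limiting which origins may frame the page."""
    sources = ["'self'"]  # always allow same-origin framing
    for origin in allowed_origins:
        origin = origin.strip().rstrip("/")  # normalise stray spaces and slashes
        if origin and origin not in sources:
            sources.append(origin)
    return ("Content-Security-Policy", "frame-ancestors " + " ".join(sources))


# Hypothetical allowlist: the embedding site plus the wrapper host it serves frames from.
name, value = frame_ancestors_header(
    ["https://sites.google.com", "https://*.googleusercontent.com/"]
)
print(f"{name}: {value}")
```

Worth keeping in mind that `frame-ancestors` is checked against every ancestor in the frame chain, so a page embedded through an intermediate wrapper frame needs both the wrapper's origin and the top-level site's origin in the list.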

rant google sites xframeorigin website whitelist sameorigin iframe

2
2

HAIM

33

9y

Does Facebook have a new logo?!

undefined facebook google google crawler

2
2

kanishk

4740

4y

When I searched for a food recipes dataset (especially Indian food), I could not find one online that I could use. So I decided to create one.

The dataset can be found here: https://lnkd.in/djdh9nX
It contains the following fields (self-explanatory) - ['RecipeName', 'TranslatedRecipeName', 'Ingredients', 'TranslatedIngredients', 'Prep', 'Cook', 'Total', 'Servings', 'Cuisine', 'Course', 'Diet', 'Instructions', 'TranslatedInstructions']. The dataset contains a CSV and an XLS file. Sometimes the content in Hindi is not visible in the CSV format.

You might be wondering what the columns with the prefix 'Translated' are. A lot of entries in the dataset were in Hindi. To handle such entries and translate them to English for consistency, I used 'googletrans'. It is a Python library that implements the Google Translate API underneath.
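A rough sketch of how the 'Translated' columns could be filled in. This is not the code from the repo: the function name is made up, and the translator is passed in as a plain callable so the example runs offline. With googletrans the callable might look like `lambda t: Translator().translate(t, dest="en").text`, but that interface has changed between releases, so treat it as an assumption.

```python
def add_translated_columns(row, translate,
                           fields=("RecipeName", "Ingredients", "Instructions")):
    """Return a copy of row with a 'Translated<Field>' entry for each field."""
    out = dict(row)  # leave the caller's row untouched
    for field in fields:
        text = row.get(field, "")
        # Skip the translator for empty cells to avoid pointless calls.
        out["Translated" + field] = translate(text) if text else ""
    return out


# Stand-in translator for demonstration only; a real one would return English text.
recipe = {"RecipeName": "masala dosa", "Ingredients": "", "Cuisine": "Indian"}
print(add_translated_columns(recipe, str.upper))
```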

The code for the crawler, cleaning and transformation is on GitHub (repo: https://lnkd.in/dYp3sBc) (@kanishk307).

The dataset has been created using Archana's Kitchen Website (https://lnkd.in/d_bCPWV). It is a great website and hosts a ton of useful content. You should definitely consider viewing it if you are interested.
#python #dataAnalytics #Crawler #Scraper #dataCleaning #dataTransformation

random project datasetcreation dataset
1

KineJacob

1

5y

Does Google crawl every website and page on the internet?

I want to know whether Google's crawler will visit all of my pages or only a few. I have more than 10 million pages, so approximately how long will it take to index all of them?
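A back-of-the-envelope way to think about the timing, assuming a steady crawl rate. The 5 pages per second below is a made-up number for illustration, not a rate Google publishes, and Google does not promise to crawl or index every page of a site.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_crawl(total_pages, pages_per_second):
    """Days needed to fetch total_pages at a constant, assumed crawl rate."""
    if pages_per_second <= 0:
        raise ValueError("pages_per_second must be positive")
    return total_pages / pages_per_second / SECONDS_PER_DAY


# 10 million pages at a hypothetical 5 pages/second is roughly 23 days.
print(round(days_to_crawl(10_000_000, 5), 1))
```

The actual rate can be read from the server logs or from the crawl stats in Search Console and plugged in instead.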

question

4
