Malware University

Class is in Session

Manual Scraping

Posted on August 3, 2024 by admin

Sometimes it’s better to do things by hand. Google Dorks are useful, but pulling enough results in an automated fashion can become difficult. With a little JavaScript fu we can dig out of logistical messes with a human touch:

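// Save the links from a Google results page to a text file.
// Select the result links by their container class. Google changes this
// class name from time to time, so update '.yuRUbf' if nothing matches.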
let links = document.querySelectorAll('.yuRUbf a');

// Extract the href attributes (URLs)
let urls = Array.from(links).map(link => link.href);

// Create a Blob from the URLs, one per line.
// This gives us an easy-to-download text file for local consumption.
let blob = new Blob([urls.join('\n')], { type: 'text/plain' });

// Create a link element (to "download" later)
let link = document.createElement('a');

// Set the download attribute with a filename
link.download = 'urls.txt';

// Create a URL for the Blob and set it as the href attribute
link.href = window.URL.createObjectURL(blob);

// Append the link to the document body
document.body.appendChild(link);

// Programmatically click the link to trigger the download
link.click();

// Remove the link from the document
document.body.removeChild(link);

Rerun the snippet on each results page as you click through the page numbers; your browser should number the downloaded files sequentially. From there, in a directory that holds only those downloads, you can “cat * > somefile.txt” to aggregate the results.
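Plain concatenation keeps every line, so a URL that shows up on more than one results page ends up in the aggregate more than once. A minimal sketch of a de-duplicating merge follows; `mergeUrlLists` is a hypothetical helper (not part of the original snippet) and the URLs are placeholders:

```javascript
// Hypothetical helper: merge several downloaded URL lists into one
// de-duplicated list. `fileContents` is an array of strings, one per file.
function mergeUrlLists(fileContents) {
  const seen = new Set();
  for (const text of fileContents) {
    // Accept both Unix and Windows line endings.
    for (const line of text.split(/\r?\n/)) {
      const url = line.trim();
      if (url !== '') seen.add(url);
    }
  }
  // A Set keeps insertion order, so URLs stay in first-seen order.
  return Array.from(seen);
}

// Example with two in-memory "files" (placeholder URLs).
const merged = mergeUrlLists([
  'https://example.com/a\nhttps://example.com/b\n',
  'https://example.com/b\nhttps://example.org/c',
]);
console.log(merged.join('\n'));
```

Under Node, the strings could come from `fs.readFileSync(path, 'utf8')` on each downloaded file; that wiring is left out here to keep the sketch short.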

Happy link building, friends.

How to run: Open the Developer Tools for your browser (Ctrl + Shift + I). Click the “Console” tab. Copy/paste the code and press Enter. Enjoy.

How this works: We search Google manually, then download the list of sites returned on each results page. A little JavaScript magic creates a downloadable element on the fly to quickly “scrape” data using Google’s coveted search capabilities.
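As a rough illustration of that flow, here is a variant that splits the URL extraction into a small function and drops duplicates and non-http(s) links before the download step. `collectUrls` is a hypothetical name, the selector is the one from the post and may need updating, and the DOM part only runs where a `document` exists:

```javascript
// Hypothetical helper: pull unique http(s) URLs out of a list of
// anchor-like objects (anything with a string `href` property).
function collectUrls(anchors) {
  const urls = Array.from(anchors, (a) => a.href)
    .filter((href) => typeof href === 'string' && /^https?:\/\//i.test(href));
  // A Set removes duplicates while keeping first-seen order.
  return Array.from(new Set(urls));
}

// Only touch the DOM when one exists (i.e. in the browser console).
if (typeof document !== 'undefined') {
  // '.yuRUbf a' is the selector from the post; assume it may have changed.
  const urls = collectUrls(document.querySelectorAll('.yuRUbf a'));

  // Same download trick as above: Blob -> object URL -> temporary link.
  const blob = new Blob([urls.join('\n')], { type: 'text/plain' });
  const link = document.createElement('a');
  link.download = 'urls.txt';
  link.href = URL.createObjectURL(blob);
  document.body.appendChild(link);
  link.click();
  document.body.removeChild(link);
}
```

Keeping the `const` declarations inside the `if` block scopes them to that block, which should make it easier to paste the snippet again on the next results page without name clashes.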

Posted in Utilities. Tagged dorks, javascript, reconnaissance, scraping
