The Art of Automation: Creating Your Own Alfred


~ 12 minute read
Crafted: 3 months ago | Updated: 3 months ago
Tags:
#infosec #golang #scraping #web-scraping #automation #coding #fun #alfred

Hello luvs, it's been a month since I last updated this blog. I'd like to write something once a week, but that doesn't seem to be possible. I decided to continue my studies in computer science, I've been working on our startup, and I'm busy living. Besides, the world isn't in its best shape, and we hear a big dose of sad news daily, which makes writing even harder. 


Introduction

Enough nagging, let's jump into it. If you've ever watched or read Batman, you know about his friend Alfred Pennyworth. Alfred handles everything in the shadows so Batman can be a shining star. Even though these characters are fictional, the idea of a loyal, tireless friend is perfectly valid in the real world. 

 

Probably in the whole history of humanity, we have never been this close to creating phenomenal tools. Modern technology can offer you something that was marked as impossible just a few years ago, no matter what your profession is. Modern technology is so advanced that we can use it to create an Alfred! In this post, we'll try to build an early version of Alfred, one that helps us with a sample task: scraping and categorizing some useful data.

 

What is an ideal Alfred?

Now, as this blog is mostly about infosec, we are using an infosec case study for this post, but you can apply the idea to whatever you are doing. 

Let's imagine my job is offensive security engineer, which means I do penetration testing, red teaming, bug bounties, etc. What would my ideal Alfred do?

 

  • Execute at any time 
  • Find and update resources 
  • Learn from the updated resources 
  • Test what he learned on clients and bounty programs 
  • Earn money using the learned techniques 
  • Read my emails and respond with proper answers 
  • Use the earned money to invest in other businesses 
  • Make me a coffee in the morning and order groceries 
  • Set up interview times so I get credit for things he has done!  
  • At the end of each year, write a book about the whole year 
  • Set up my calendar 
  • So on and so forth 

 

Technically speaking, everything listed here is possible using machine learning, scraping, consuming various APIs, and IoT devices. So if you own a restaurant or work as tech support for a hosting company, you can still have your own version of Alfred. So you may ask: if it's literally possible to automate everything, why don't people do it?

 

1- Oh, dear, it's easier said than done.

 

 

2- They are doing it, and they are talking about it. Tesla? Yes, autonomous cars are an example of automation; they automate driving.  

3- They are doing it, and they don't talk about it. There are many examples I'd like you to think about. Hint? Bots. 

 

The art of scraping 

 

Let's begin creating our own Alfred. We can't create a fully functional Alfred in a single post; we can barely scratch the surface. So remember our hypothetical case study job? It starts with executing at any time and gathering articles and resources in the related field. 
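As a quick aside, the "execute at any time" part doesn't need anything fancy. A minimal sketch, assuming a hypothetical gatherAll function that wraps the scraping we'll build below, is just a ticker loop:

package main

import (
   "log"
   "time"
)

// gatherAll stands in for the scraping functions built later in this post.
func gatherAll() {
   log.Println("gathering resources...")
}

func main() {
   // Run once at startup, then on a fixed interval (six hours is arbitrary).
   gatherAll()

   ticker := time.NewTicker(6 * time.Hour)
   defer ticker.Stop()

   for range ticker.C {
      gatherAll()
   }
}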

Okay, for our infosec guy, this is the chosen list. 

- Gather recent exploits, advisories, and write-ups; he needs to know about new vulnerabilities (sources: Exploit-DB, GitHub Advisory, HackerOne Hacktivity, pentesterland write-ups)

- Gather news; he wants to know what's going on in the industry (sources: thehackernews.com, Reddit netsec, NewsAPI with filters on specific keywords: "0-day", "hacker", "data-breach", "bug-bounty", "vulnerability", "malware")

- Gather new jobs; well, he wants to know the trending jobs (source: infosec-jobs.com)

 

Okay, so how do we want to gather this data? By using a well-known technique called web scraping. 

Scraping is one of the base techniques used by most automation software, from SEO tools to fancy threat intelligence software. It's genuinely an art, because there are unlimited ways to scrape a particular piece of data. Working with multiple sources, you may have to use a different technique for each source. We are going to use Golang for our examples, but you can use any programming language you like. I believe Python suits web scraping better than any other language. 

 

Let's start with something easy: pentesterland. Let's say we want the last ten write-ups. 
 

...
c.OnHTML("#bug-bounty-writeups-published-in-2020", func(e *colly.HTMLElement) {

   e.DOM.Next().Find("a").Each(func(i int, selection *goquery.Selection) {

      link, _ := selection.Attr("href")
      title := selection.Text()

      if i >= 18 {
         return
      }

      // filters and ten records
      if strings.Contains(selection.Text(), "@") || strings.Contains(link, "twitter") {
         return
      }

      pentesterLandArr[i] = []string{title, link}

      log.Println(title)
      log.Println(link)
...




What I've done here is find the element with the ID "#bug-bounty-writeups-published-in-2020" and discover every link after it. Finally, I make sure the link isn't a Twitter link and filter out the first few results. 
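For context, that snippet runs inside a Colly collector. Here's a minimal, self-contained sketch of the surrounding setup based on Colly's standard usage; the collector creation, the pentesterLandArr map, and the Visit URL are filled in by me and may differ from the original code:

package main

import (
   "log"
   "strings"

   "github.com/PuerkitoBio/goquery"
   "github.com/gocolly/colly"
)

var pentesterLandArr = make(map[int][]string)

func main() {
   c := colly.NewCollector()

   c.OnHTML("#bug-bounty-writeups-published-in-2020", func(e *colly.HTMLElement) {
      e.DOM.Next().Find("a").Each(func(i int, selection *goquery.Selection) {
         link, _ := selection.Attr("href")
         title := selection.Text()

         // look at the first ~18 anchors; after filtering, roughly ten records remain
         if i >= 18 {
            return
         }
         // skip author handles and Twitter links
         if strings.Contains(title, "@") || strings.Contains(link, "twitter") {
            return
         }

         pentesterLandArr[i] = []string{title, link}
      })
   })

   // URL is illustrative: point it at the pentesterland write-ups page
   if err := c.Visit("https://pentester.land/list-of-bug-bounty-writeups.html"); err != nil {
      log.Println(err)
   }
}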

The same goes for infosec-jobs.com, thehackernews.com, and github.com/advisory. 

 

//infosec-jobs
c.OnHTML("#job-list", func(e *colly.HTMLElement) {

   e.DOM.Find("a").Each(func(i int, s *goquery.Selection) {

      // only last 10 entries
      if i >= 10 {
         return
      }

      if s.Find("p").HasClass("job-list-item-company") {

         //fmt.Print(s.Find("p").First().Text()) // company
         link, _ := s.Attr("href")
         title := s.Find("p").Next().Text()

         infoSecJobsArr[i] = []string{title, link}

         //log.Print(link)
         //log.Println(title)
      }

 

// thehackernews
c.OnHTML("#Blog1", func(e *colly.HTMLElement) {

   e.DOM.Find(".story-link").Each(func(i int, s *goquery.Selection) {

      // only last 10 entries
      if i >= 10 {
         return
      }

      link, _ := s.Attr("href")
      title := strings.TrimSpace(s.Find(".home-title").Text())

      //log.Println(title)
      hackerNewsArr[i] = []string{title, link}
      //log.Println(link)

// Github Advisory
c.OnHTML(".Box", func(e *colly.HTMLElement) {

   e.DOM.Find(".Box-row").Each(func(i int, s *goquery.Selection) {

      // only last 10 entries
      if i >= 10 {
         return
      }

      link, _ := s.First().Find("a").Attr("href")
      title := strings.TrimSpace(s.First().Find("a").Text())

      //log.Println(title)
      githubAdvisoryArr[i] = []string{link, title}
      log.Println(title)

      //hackerNewsArr = append(arr["URL"], &hackerNewsArr)

   })

 

Just find the ID and the location of the links and grab them. Now let's make the game more interesting. Let's say we want to scrape Reddit netsec items, and we can't use our classic HTML parsing technique; what else can we do? We can always try to find another endpoint which gives us the information we want. In this case, I found an endpoint I didn't know existed.

It returns ten items as JSON from the subreddit I want. What else could I wish for? Nothing. Now there is an issue: if you try to cURL the discovered endpoint, you will get {"message": "Too Many Requests", "error": 429}. Huh? When I refresh the endpoint in the browser, it still shows me the JSON. Can you guess what's going on here? A silly user-agent check. Let's bypass it.

....

client := &http.Client{}

// Didn't know such endpoint exists
req, err := http.NewRequest("GET", "https://www.reddit.com/r/netsec/.json?count=10", nil)
if err != nil {
   log.Println(err)
   return nil, err
}

// well, we also need a bypass for the reddit client check
req.Header.Set("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36")

resp, err := client.Do(req)
if err != nil {
   log.Println(err)
   return nil, err
}

....
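Putting the Reddit piece together: the listing JSON nests posts under data.children[].data, so one self-contained way to fetch and decode it could look like the sketch below. The struct and the fetchRedditNetsec name are mine, and only the fields we care about are mapped:

package main

import (
   "encoding/json"
   "net/http"
)

// redditListing models only the parts of the listing we read.
type redditListing struct {
   Data struct {
      Children []struct {
         Data struct {
            Title string `json:"title"`
            URL   string `json:"url"`
         } `json:"data"`
      } `json:"children"`
   } `json:"data"`
}

func fetchRedditNetsec() (map[int][]string, error) {
   req, err := http.NewRequest("GET", "https://www.reddit.com/r/netsec/.json?count=10", nil)
   if err != nil {
      return nil, err
   }
   // same user-agent trick as above, otherwise Reddit answers 429
   req.Header.Set("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36")

   resp, err := http.DefaultClient.Do(req)
   if err != nil {
      return nil, err
   }
   defer resp.Body.Close()

   var listing redditListing
   if err := json.NewDecoder(resp.Body).Decode(&listing); err != nil {
      return nil, err
   }

   redditArr := make(map[int][]string)
   for i, child := range listing.Data.Children {
      redditArr[i] = []string{child.Data.Title, child.Data.URL}
   }
   return redditArr, nil
}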

Let's move on to Exploit-DB; this one is interesting because it uses JavaScript to create the table dynamically. In this case, we can't scrape it using a classic scraping technique. We have to use either a headless browser with a JS engine or find an endpoint like in the Reddit case. Here is my solution: we send a request to the draw endpoint, and to ensure it returns JSON, we make a fool out of it using the ("x-requested-with", "XMLHttpRequest") header, which spoofs AJAX behavior. 

....

// url-encoded query is already filtered for 10 entities

dtQuery := "&columns%5B0%5D%5Bdata%5D=date_published&columns%5B0%5D%5Bname%5D=date_published&columns%5B0%5D%5Bsearchable%5D=true&columns%5B0%5D%5Borderable%5D=true&columns%5B0%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B0%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B1%5D%5Bdata%5D=download&columns%5B1%5D%5Bname%5D=download&columns%5B1%5D%5Bsearchable%5D=false&columns%5B1%5D%5Borderable%5D=false&columns%5B1%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B1%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B2%5D%5Bdata%5D=application_md5&columns%5B2%5D%5Bname%5D=application_md5&columns%5B2%5D%5Bsearchable%5D=true&columns%5B2%5D%5Borderable%5D=false&columns%5B2%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B2%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B3%5D%5Bdata%5D=verified&columns%5B3%5D%5Bname%5D=verified&columns%5B3%5D%5Bsearchable%5D=true&columns%5B3%5D%5Borderable%5D=false&columns%5B3%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B3%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B4%5D%5Bdata%5D=description&columns%5B4%5D%5Bname%5D=description&columns%5B4%5D%5Bsearchable%5D=true&columns%5B4%5D%5Borderable%5D=false&columns%5B4%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B4%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B5%5D%5Bdata%5D=type_id&columns%5B5%5D%5Bname%5D=type_id&columns%5B5%5D%5Bsearchable%5D=true&columns%5B5%5D%5Borderable%5D=false&columns%5B5%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B5%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B6%5D%5Bdata%5D=platform_id&columns%5B6%5D%5Bname%5D=platform_id&columns%5B6%5D%5Bsearchable%5D=true&columns%5B6%5D%5Borderable%5D=false&columns%5B6%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B6%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B7%5D%5Bdata%5D=author_id&columns%5B7%5D%5Bname%5D=author_id&columns%5B7%5D%5Bsearchable%5D=false&columns%5B7%5D%5Borderable%5D=false&columns%5B7%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B7%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B8%5D%5Bdata%5D=code&columns%5B8%5D%5Bname%5D=code.code&columns%5B8%5D%5Bsearchable%5D=true&columns%5B8%5D%5Borderable%5D=true&columns%5B8%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B8%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B9%5D%5Bdata%5D=id&columns%5B9%5D%5Bname%5D=id&columns%5B9%5D%5Bsearchable%5D=false&columns%5B9%5D%5Borderable%5D=true&columns%5B9%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B9%5D%5Bsearch%5D%5Bregex%5D=false&order%5B0%5D%5Bcolumn%5D=9&order%5B0%5D%5Bdir%5D=desc&start=0&length=10"



client := &http.Client{}

// dataTables scraping technique
requestUrl := "https://www.exploit-db.com/?draw=1" + dtQuery

req, err := http.NewRequest("GET", requestUrl, nil)
if err != nil {
   log.Println(err)
}

// well, let's make a fool out of it
req.Header.Add("x-requested-with", "XMLHttpRequest")

resp, _ := client.Do(req)

....
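To actually read the rows, you can lean on the generic DataTables server-side response shape, which wraps the rows in a "data" array keyed by the column names from the dtQuery above ("id", "description", and so on). This is a rough sketch that continues from the resp variable above and assumes encoding/json is imported; I decode the rows loosely on purpose, since Exploit-DB's exact field layout may change:

// Generic DataTables-style response: the rows come back under "data".
// Per-row field names follow the columns[...][data] names in dtQuery.
type dtResponse struct {
   Data []map[string]interface{} `json:"data"`
}

var table dtResponse
if err := json.NewDecoder(resp.Body).Decode(&table); err != nil {
   log.Println(err)
}
resp.Body.Close()

for i, row := range table.Data {
   // "id" maps to the public URL https://www.exploit-db.com/exploits/<id>
   log.Println(i, row["id"], row["description"])
}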

 

The same goes for HackerOne. We can't extract the data classically because the site responds differently to a requester without a JS engine. So again, we can use a headless browser or another endpoint. This time, let's use HackerOne's GraphQL endpoint to achieve our goal. 

...

   client := graphql.NewClient("https://hackerone.com/graphql")

   // make a request
   req := graphql.NewRequest(`

      query HacktivityPageQuery($querystring: String, $orderBy: HacktivityItemOrderInput, $secureOrderBy: FiltersHacktivityItemFilterOrder, $where: FiltersHacktivityItemFilterInput, $maxShownVoters: Int) {

  me {

    id

    __typename

  }

  hacktivity_items(last: 25, after: "MjU", query: $querystring, order_by: $orderBy, secure_order_by: $secureOrderBy, where: $where) {

    total_count

    ...HacktivityList

    __typename

  }

}

fragment HacktivityList on HacktivityItemConnection {

  total_count

  pageInfo {

    endCursor

    hasNextPage

    __typename

  }

  edges {

    node {

      ... on HacktivityItemInterface {

        id

        databaseId: _id

        ...HacktivityItem

        __typename

      }

      __typename

    }

    __typename

  }

  __typename

}

fragment HacktivityItem on HacktivityItemUnion {

  type: __typename

  ... on HacktivityItemInterface {

    id

    votes {

      total_count

      __typename

    }

    voters: votes(last: $maxShownVoters) {

      edges {

        node {

          id

          user {

            id

            username

            __typename

          }

          __typename

        }

        __typename

      }

      __typename

    }

    upvoted: upvoted_by_current_user

    __typename

  }

  ... on Undisclosed {

    id

    ...HacktivityItemUndisclosed

    __typename

  }

  ... on Disclosed {

    id

    ...HacktivityItemDisclosed

    __typename

  }

  ... on HackerPublished {

    id

    ...HacktivityItemHackerPublished

    __typename

  }

}

fragment HacktivityItemUndisclosed on Undisclosed {

  id

  reporter {

    id

    username

    ...UserLinkWithMiniProfile

    __typename

  }

  team {

    handle

    name

    medium_profile_picture: profile_picture(size: medium)

    url

    id

    ...TeamLinkWithMiniProfile

    __typename

  }

  latest_disclosable_action

  latest_disclosable_activity_at

  requires_view_privilege

  total_awarded_amount

  currency

  __typename

}

fragment TeamLinkWithMiniProfile on Team {

  id

  handle

  name

  __typename

}

fragment UserLinkWithMiniProfile on User {

  id

  username

  __typename

}

fragment HacktivityItemDisclosed on Disclosed {

  id

  reporter {

    id

    username

    ...UserLinkWithMiniProfile

    __typename

  }

  team {

    handle

    name

    medium_profile_picture: profile_picture(size: medium)

    url

    id

    ...TeamLinkWithMiniProfile

    __typename

  }

  report {

    id

    title

    substate

    url

    __typename

  }

  latest_disclosable_action

  latest_disclosable_activity_at

  total_awarded_amount

  severity_rating

  currency

  __typename

}

fragment HacktivityItemHackerPublished on HackerPublished {

  id

  reporter {

    id

    username

    ...UserLinkWithMiniProfile

    __typename

  }

  team {

    id

    handle

    name

    medium_profile_picture: profile_picture(size: medium)

    url

    ...TeamLinkWithMiniProfile

    __typename

  }

  report {

    id

    url

    title

    substate

    __typename

  }

  latest_disclosable_activity_at

  severity_rating

  __typename

}

   `)

...
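The query is only half the job; we still have to execute it and pull out what we need. Assuming the github.com/machinebox/graphql client (which matches the graphql.NewClient and graphql.NewRequest calls above), the remaining plumbing looks roughly like this; the hackerOneArr map and the response struct are mine, shaped after the fields requested in the query, and context and log need to be imported:

// Only $maxShownVoters actually needs a value here; the other variables
// declared by the query are nullable and can be left unset.
req.Var("maxShownVoters", 10)

// Model just the fields we want out of the response.
var respData struct {
   HacktivityItems struct {
      Edges []struct {
         Node struct {
            Report struct {
               Title string `json:"title"`
               URL   string `json:"url"`
            } `json:"report"`
         } `json:"node"`
      } `json:"edges"`
   } `json:"hacktivity_items"`
}

if err := client.Run(context.Background(), req, &respData); err != nil {
   log.Println(err)
}

hackerOneArr := make(map[int][]string)
for i, edge := range respData.HacktivityItems.Edges {
   // Undisclosed items carry no report, so some titles/URLs may be empty.
   hackerOneArr[i] = []string{edge.Node.Report.Title, edge.Node.Report.URL}
}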



 

Finally, we can always use the actual API if one is available. It's far more robust than scraping. Let's do that for the NewsAPI.org API. 

// selected keywords
keywords := []string{"0-day", "hacker", "data-breach", "bug-bounty", "vulnerability", "malware"}

newsAPIArr := make(map[int][]string)

counter := 0

var newsApi NewsApiResp
for i := 0; i < len(keywords); i++ {
   query := fmt.Sprintf("?qInTitle=%s&pagesize=%d&sortBy=publishedAt&language=en&apiKey=%s", keywords[i], pageSize, key)

   resp, err := http.Get(endPoint + query)
   if err != nil {
      log.Println(err)
      return nil, err
   }

   decoder := json.NewDecoder(resp.Body)
   err = decoder.Decode(&newsApi)
   if err != nil {
      log.Println(err)
      return nil, err
   }
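One piece that isn't shown is NewsApiResp, the struct the response is decoded into. NewsAPI.org wraps results in an articles array, so a minimal version covering only the fields we actually read might look like this (the exact struct in the repo may differ):

// Minimal model of a NewsAPI.org response; only the fields we read are mapped.
type NewsApiResp struct {
   Status       string `json:"status"`
   TotalResults int    `json:"totalResults"`
   Articles     []struct {
      Title       string `json:"title"`
      URL         string `json:"url"`
      PublishedAt string `json:"publishedAt"`
   } `json:"articles"`
}

Each decoded article's title and URL can then go into newsAPIArr, using the counter from the loop above as the key.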

 

Now we have just barely created our scraper code. We need to store everything nicely together. Something like this will do:

 

func WriteNewsAPIToDB(newsArr map[int][]string, entity Entity, db *gorm.DB) (int, error) {

   totalFound := 0

   for _, key := range newsArr {

      entity.Title = key[0]
      entity.URL = key[1]
      entity.Source = "NewsAPI"

      if err := db.Create(&entity).Error; err != nil {
         log.Println(err)
      } else {
         totalFound++
      }
      entity.ID++
   }

   return totalFound, nil
}
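WriteNewsAPIToDB expects an Entity model and an open *gorm.DB handle, neither of which is shown. Here's a minimal sketch, assuming GORM v1 with a local SQLite file; the model fields mirror how entity is used above, while the file name and driver choice are my assumptions:

package main

import (
   "github.com/jinzhu/gorm"
   _ "github.com/jinzhu/gorm/dialects/sqlite" // SQLite driver for GORM v1
)

// Entity is the single table every source writes into.
// (Sketch only; the real model in the repo may have more fields.)
type Entity struct {
   ID     uint `gorm:"primary_key"`
   Title  string
   URL    string
   Source string
}

func openDB() (*gorm.DB, error) {
   db, err := gorm.Open("sqlite3", "alfred.db")
   if err != nil {
      return nil, err
   }
   db.AutoMigrate(&Entity{})
   return db, nil
}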









Let's wrap everything with a simple UI for one-click usage. We use fyne.io here for our UI, which is pretty awesome but still too young for production; our Alfred at this stage is for personal usage, so we are doing fine. You may be thinking: a GUI, seriously? Hey! This one is my Alfred, so my rules! :D What I'm really trying to say is that the possibilities are limitless; otherwise, a GUI like this might not be that useful. Yours can be a command line tool, ship with a web UI, or even come with hardware. 

...

myApp := app.New()
myWindow := myApp.NewWindow("InfoSec Alfred")

// fun
greet := canvas.NewText("Hello master "+master.Name, color.RGBA{
   R: 189,
   G: 147,
   B: 249,
   A: 0,
})
centered := fyne.NewContainerWithLayout(layout.NewHBoxLayout(),
   layout.NewSpacer(), greet, layout.NewSpacer())

image := canvas.NewImageFromResource(resourceAlfredLgPng) // NewImageFromFile("./assets/alfred-lg.png")

myWindow.Resize(fyne.NewSize(300, 300))

image.FillMode = canvas.ImageFillOriginal

progress := widget.NewProgressBar()
progress.SetValue(0)

status := widget.NewLabel("Idle...")

statusContainer := fyne.NewContainerWithLayout(layout.NewHBoxLayout(),
   layout.NewSpacer(), status, layout.NewSpacer())

...
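For the actual "one click", the window still needs a button that kicks everything off and a layout to hold the widgets. Here's a rough sketch of that wiring, continuing the snippet above; runAllScrapers is a hypothetical helper that would call the gather functions we built earlier and move the progress bar:

// Button that runs everything in a goroutine so the UI stays responsive.
runButton := widget.NewButton("Run Alfred", func() {
   status.SetText("Gathering...")
   go func() {
      // runAllScrapers is hypothetical: it would call the pentesterland,
      // Reddit, Exploit-DB, HackerOne and NewsAPI gatherers and report progress.
      runAllScrapers(progress)
      status.SetText("Done.")
   }()
})

content := fyne.NewContainerWithLayout(layout.NewVBoxLayout(),
   image, centered, statusContainer, progress, runButton)

myWindow.SetContent(content)
myWindow.ShowAndRun()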

Demo Time.  

 

I know the coding parts might remind you of this photo: 

 

but that's not the case here. The source code is available here.

Keep in mind that if you scrape data you aren't supposed to, you may get yourself into legal trouble, so be careful. 

 

Conclusion 

 

We have barely scratched the surface of both web scraping and automation.

The main idea of this post is to make you believe more in automation. Keep in mind, don't overdo it; automation is useful only when it makes sense to automate something. If you have a brilliant idea for automation, it's your turn to create your own Alfred. Your Alfred can perform different tasks; it's all about what you need and what helps you. Don't worry if you can't code; if your plan is compelling enough, you can probably find existing software for it or outsource it. You can also always contact me if you need help building your stuff. 

 

If you love the idea of having an Alfred and you work in infosec, make sure to check out HunterSuite, which is our sophisticated Alfred for offensive-security related tasks. That's it for this post. Don't forget, the repo is here.

I also need to thank all of my lovely readers. I started this blog just because I felt an urge to talk and share, and now it has thousands of readers. Even though you barely speak to me directly (which you can!), I still appreciate your presence; it gives me the courage to write more. 

 

0xSha

Assist me:
Buy Me a Coffee at ko-fi.com