<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>Hi,</p>
    <p>I assume you're referring to the daily dumps that we release
      here:<br>
      <a class="moz-txt-link-freetext" href="https://data-store.ripe.net/datasets/atlas-daily-dumps/">https://data-store.ripe.net/datasets/atlas-daily-dumps/</a><br>
    </p>
    <p>Two things I find relatively slow to deal with on the command
      line are the standard bzip2 tooling and jq for JSON parsing, so I
      lean on a couple of other tools to speed things up:</p>
    <p>- the lbzip2 suite parallelises bzip2 compression and
      decompression across multiple cores<br>
      - GNU parallel can split data arriving on a pipe across one
      process per core<br>
    </p>
    <p>So, for example, on my laptop I can reasonably quickly pull out
      all of the traceroutes my own probe ran:<br>
      lbzcat traceroute-2018-07-23T0700.bz2 | parallel -q --pipe jq '. | select(.prb_id == 14277)'<br>
      <br>
      Stéphane has also written on labs.ripe.net about using jq to
      process Atlas results:
<a class="moz-txt-link-freetext" href="https://labs.ripe.net/Members/stephane_bortzmeyer/processing-ripe-atlas-results-with-jq">https://labs.ripe.net/Members/stephane_bortzmeyer/processing-ripe-atlas-results-with-jq</a></p>
    <p>Happy to hear from others what tools they use for data
      processing!</p>
    <p>Cheers,</p>
    <p>S.</p>
    <div class="moz-cite-prefix">On 21/07/2018 19:09, BELLAFKIH hayat
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAL6eEe2q78O9LQWUoHOvvKMZ3L=X23Nqqg5bO8jseZMhmjP37g@mail.gmail.com">
      <div dir="ltr">
        <div><font size="2"><span
              style="font-family:arial,helvetica,sans-serif">Dear RIPE
              Atlas users,</span></font></div>
        <div><font size="2"><span
              style="font-family:arial,helvetica,sans-serif"><br>
            </span></font></div>
        <font size="2"><span
            style="font-family:arial,helvetica,sans-serif">I am studying
            the processing of the data collected by the probes as a Big
            Data problem. For instance, one hour of <span
              class="gmail-un">traceroute</span> data amounts to 500 MB
            (bzip2-compressed), i.e. about 7 GB as plain text. Could you
            share how you deal with this data in practice?<br>
          </span></font>
        <div><font size="2"><span
              style="font-family:arial,helvetica,sans-serif">Are you
              using a very powerful machine, or Big Data tools?</span></font></div>
        <div><font size="2"><span
              style="font-family:arial,helvetica,sans-serif"><br>
            </span></font></div>
        <div>
          <div style="color:rgb(0,0,0)" class="gmail_default"><font
              size="2"><span
                style="font-family:arial,helvetica,sans-serif">Best
                regards,</span></font></div>
          <div style="color:rgb(0,0,0)" class="gmail_default"><font
              size="2"><span
                style="font-family:arial,helvetica,sans-serif">Hayat</span></font></div>
          <br>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>