LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

02-12-2022, 06:23 PM   #1
Varister
LQ Newbie
 
Registered: Oct 2021
Posts: 7

Rep: Reputation: Disabled
How can I fix Unicode error in my program?


I am trying to use a download.pcap file with this program, but when I run it, I keep getting a "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 0: invalid continuation byte" error. I am not sure how to fix this.

Code:
import dpkt
import optparse
import socket
THRESH = 1000

def findDownload(pcap):
    for (ts, buf) in pcap:
        try:
            eth = dpkt.ethernet.Ethernet(buf)
            ip = eth.data
            src = socket.inet_ntoa(ip.src)
            tcp = ip.data
            http = dpkt.http.Request(tcp.data)
            if http.method == 'GET':
                uri = http.uri.lower()
                if '.zip' in uri and 'loic' in uri:
                    print ('[!] ' + src + ' Downloaded LOIC.')
        except:
            pass

def findHivemind(pcap):
    for (ts, buf) in pcap:
        try:
            eth = dpkt.ethernet.Ethernet(buf)
            ip = eth.data
            src = socket.inet_ntoa(ip.src)
            dst = socket.inet_ntoa(ip.dst)
            tcp = ip.data
            dport = tcp.dport
            sport = tcp.sport
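            # tcp.data is bytes under Python 3, so compare against a bytes
            # literal and decode before concatenating with str.
            if dport == 6667:
                if b'!lazor' in tcp.data.lower():
                    print ('[!] DDoS Hivemind issued by: '+src)
                    print ('[+] Target CMD: ' + tcp.data.decode('ascii', 'replace'))
            if sport == 6667:
                if b'!lazor' in tcp.data.lower():
                    print ('[!] DDoS Hivemind issued to: '+src)
                    print ('[+] Target CMD: ' + tcp.data.decode('ascii', 'replace'))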
        except:
            pass

def findAttack(pcap):
    pktCount = {}
    for (ts, buf) in pcap:
        try:
            eth = dpkt.ethernet.Ethernet(buf)
            ip = eth.data
            src = socket.inet_ntoa(ip.src)
            dst = socket.inet_ntoa(ip.dst)
            tcp = ip.data
            dport = tcp.dport
            if dport == 80:
                stream = src + ':' + dst
                if stream in pktCount:  # dict.has_key() was removed in Python 3
                    pktCount[stream] = pktCount[stream] + 1
                else:
                    pktCount[stream] = 1
        except:
            pass

    for stream in pktCount:
        pktsSent = pktCount[stream]
        if pktsSent > THRESH:
            src = stream.split(':')[0]
            dst = stream.split(':')[1]
            print ("[+] "+src+" attacked "+dst+" with " \
                + str(pktsSent) + " pkts.")

def main():
    global THRESH  # let -t override the module-level threshold used by findAttack
    parser = optparse.OptionParser(
        "usage %prog -p <pcap file> -t <thresh>")
    parser.add_option("-p", dest='pcapFile', type="string",\
      help='specify pcap filename')
    parser.add_option("-t", dest="thresh", type="int",\
      help="specify threshold count ")

    (options, args) = parser.parse_args()
    if options.pcapFile is None:
        print (parser.usage)
        exit(0)
    if options.thresh is not None:
        THRESH = options.thresh
    pcapFile = options.pcapFile
    f = open(pcapFile)
    pcap = dpkt.pcap.Reader(f)
    with open(pcapFile, 'rb') as f:
        pcap = dpkt.pcap.Reader(f)
        findDownload(pcap)
    with open(pcapFile, 'rb') as f:
        pcap = dpkt.pcap.Reader(f)
        findHivemind(pcap)
    with open(pcapFile, 'rb') as f:
        pcap = dpkt.pcap.Reader(f)
        findAttack(pcap)

if __name__ == "__main__":
   main()
 
02-13-2022, 03:52 AM   #2
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 22,041

Rep: Reputation: 7348
Without seeing that pcap file it is hard to say anything. Probably the message is correct and the file is corrupted. It would also be nice to post the full error message, not only one line.
On the other hand, you can use open like this (for example):
Code:
with open(filename, encoding="something") as datafile:
    # work on datafile here
 
1 members found this post helpful.
02-13-2022, 08:36 AM   #3
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,880
Blog Entries: 1

Rep: Reputation: 1871
I suppose your file is a binary log of network traffic; it might contain any bytes, there is no point assuming any part of it is in utf8.
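To illustrate the point about binary data, here is a minimal sketch using only the standard library. It assumes a classic little-endian pcap file, which starts with the magic bytes d4 c3 b2 a1; the file written here is a synthetic 24-byte header rather than a real capture, and the name demo.pcap is made up for the example.

```python
import os
import struct
import tempfile

# Synthetic 24-byte pcap global header (little-endian):
# magic, version major/minor, thiszone, sigfigs, snaplen, link type (1 = Ethernet).
header = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)

path = os.path.join(tempfile.mkdtemp(), "demo.pcap")  # hypothetical file name
with open(path, "wb") as f:
    f.write(header)

# Text mode: Python tries to decode the bytes as UTF-8 and fails on the first one.
try:
    with open(path, encoding="utf-8") as f:
        f.read()
    text_mode_error = None
except UnicodeDecodeError as exc:
    text_mode_error = exc

# Binary mode: the bytes come back untouched, so the header can be parsed.
with open(path, "rb") as f:
    raw = f.read(24)
magic = struct.unpack("<I", raw[:4])[0]

print(text_mode_error)
print(hex(magic))
```

If this is what is happening, the byte 0xd4 at position 0 is simply the start of the pcap magic number, not a sign of corruption. In the posted script the most likely culprit is the open(pcapFile) call without 'rb'; the later "with open(pcapFile, 'rb')" blocks already open the file the way a binary reader needs.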
 
2 members found this post helpful.
02-13-2022, 11:03 AM   #4
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,249

Rep: Reputation: 5323
Generally speaking, you can sometimes deal with that by calling .decode('utf-8') or .encode('utf-8') on the strings. This works if the data is actually UTF-8.
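A small sketch of what "works if the data is actually UTF-8" means in practice. The sample values are invented for illustration; the four binary bytes are the little-endian pcap magic number that matches the 0xd4 in the reported error.

```python
# Bytes that really are UTF-8 decode cleanly.
payload = "caf\u00e9 GET /loic.zip".encode("utf-8")
decoded = payload.decode("utf-8")

# Arbitrary binary data does not.
binary = bytes([0xD4, 0xC3, 0xB2, 0xA1])
try:
    binary.decode("utf-8")
    strict_result = "decoded"
except UnicodeDecodeError as exc:
    strict_result = exc.reason

# errors="replace" never raises, but undecodable bytes become U+FFFD,
# so the original bytes can no longer be recovered from the string.
lossy = binary.decode("utf-8", errors="replace")

print(decoded)
print(strict_result)
print(repr(lossy))
```

The lossy variant is fine for showing a payload to a human, but it should not be used on data that has to be parsed afterwards.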

If NevemTeve is correct (and he is), then the usual way to deal with it is to hex dump the data, not to print it.

I don't know if you're familiar with hex dumps, but there's a Wikipedia article about them:

https://en.wikipedia.org/wiki/Hex_dump

Do be aware that the usual way to determine whether a file is binary (as opposed to text) is to check if it has null bytes.
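The null-byte check can be sketched roughly like this. The function name looks_binary, the sample size, and the two temporary files are assumptions made for the example, not part of any library.

```python
import os
import tempfile

def looks_binary(path, sample_size=8192):
    """Heuristic: treat a file as binary if its first bytes contain a NUL."""
    with open(path, "rb") as f:
        return b"\x00" in f.read(sample_size)

tmpdir = tempfile.mkdtemp()

# A plain text file: no NUL bytes expected.
text_path = os.path.join(tmpdir, "notes.txt")
with open(text_path, "w", encoding="utf-8") as f:
    f.write("GET /loic.zip HTTP/1.1\n")

# The start of a pcap header: magic, version 2.4, then zero-filled fields.
pcap_path = os.path.join(tmpdir, "demo.pcap")
with open(pcap_path, "wb") as f:
    f.write(bytes([0xD4, 0xC3, 0xB2, 0xA1, 2, 0, 4, 0]) + b"\x00" * 8)

text_is_binary = looks_binary(text_path)
pcap_is_binary = looks_binary(pcap_path)
print(text_is_binary, pcap_is_binary)
```

It is only a heuristic: UTF-16 text, for instance, contains NUL bytes and would be classified as binary by this check.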

Last edited by dugan; 02-13-2022 at 11:12 AM.
 
1 members found this post helpful.
02-14-2022, 10:07 AM   #5
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,691
Blog Entries: 4

Rep: Reputation: 3947
There is, in fact, a hexdump utility which can, among other things, print the data in rows of hexadecimal bytes with the corresponding characters (if printable ASCII) beside them.
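If you would rather stay inside Python, the same kind of listing can be approximated in a few lines. This is only a sketch loosely modelled on that row layout, not a reimplementation of the hexdump utility; the function name and the sample bytes are made up.

```python
def hexdump(data, width=16):
    """Return rows of offset, hex bytes, and printable ASCII for a bytes object."""
    pad = width * 3 - 1  # width of a full row of "xx " groups without the trailing space
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  |{text_part}|")
    return "\n".join(lines)

sample = bytes([0xD4, 0xC3, 0xB2, 0xA1]) + b"GET /loic.zip"
dump = hexdump(sample)
print(dump)
```

In the original script this would replace printing tcp.data directly, for example print(hexdump(tcp.data)).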

A raw data stream like this is certainly not UTF-8 encoded. Any algorithm which is told that it is will try to decode it and probably fail, although it may seem to succeed some of the time.
 
  



Tags
python


