LinuxQuestions.org


johnnybao 05-01-2024 04:19 PM

awk pickle
 
Hi I am trying to write a proper awk statement to only return hostname entries from a logfile from a week ago to present time.

Logfile format is like this:
27-04-2024_00:04 hostname1 EverythingElseAfterHere
28-04-2024_02:05 hostname2 EverythingElseAfterHere

I thought I could reformat the date to a single string and compare like so:

#!/bin/bash
# get the date from a week ago:
lastweek=$(date +"%Y-%m-%d" --date="1 week ago")
# run today (5/1/24), this returns:
20240424

Then I tried converting field $1 in my file via awk to a similar format:
awk 'n=split($1,a,"[-_]") {print a[3] a[2] a[1]}' mylogfile
# this also looks good, returning as an example:
20240427

Here is where I get stuck. I want to (if possible) use the value of n to compare with lastweek and see if the date (value) is greater:
awk -v lastweek="$lastweek" 'n=split($1,a,"[-_]") {print a[3] a[2] a[1]} n > lastweek {print $2}' mylogfile
# this just returns more dates like '20240427' but I want field 2 with the hostname

I don't even know if I am doing the compare correctly or if it's even possible.
I am trying to push the output from the split/print subcommand into 'n' and then compare that timestamp as text to the lastweek text, and if n is greater then output $2 (hostname). It's getting messy and I am getting confused now as I am not very familiar with awk.

Any help would be greatly appreciated.

Thanks!

boughtonp 05-01-2024 05:43 PM


 
Here's the fixed version of the method you're trying to do:
Code:

awk -v lastweek="$lastweek" 'split($1,a,"[_-]") && (a[3]"-"a[2]"-"a[1]) > lastweek {print $2}' input-file
The date command you used has hyphens, hence why we are re-inserting them here.

Otherwise, the return value of split is not needed, nor is print needed to concatenate, and we use && to make it a single condition/action item.
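
As a quick check of the string comparison above, here is a minimal sketch run against made-up data. The hostname0 line and the fixed cut-off value are assumptions added for illustration; in real use lastweek would come from the date command in the first post.

```shell
# Hypothetical sample data in the log format from the first post.
sample_log='23-04-2024_23:59 hostname0 EverythingElseAfterHere
27-04-2024_00:04 hostname1 EverythingElseAfterHere
28-04-2024_02:05 hostname2 EverythingElseAfterHere'

# Fixed cut-off so the run is reproducible (assumption for this sketch).
lastweek='2024-04-24'

# Rebuild each date as YYYY-MM-DD and compare it with the cut-off as a string.
recent_hosts=$(printf '%s\n' "$sample_log" | awk -v lastweek="$lastweek" \
  'split($1,a,"[_-]") && (a[3]"-"a[2]"-"a[1]) > lastweek {print $2}')

echo "$recent_hosts"
```

This should print hostname1 and hostname2 only. Note that with > a line dated exactly on the cut-off day is left out; >= would include it.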

-

Alternatively, with GNU Awk, there are date functions available, so we can re-format the date into descending order, and use mktime to output a timestamp, e.g:
Code:

awk -F '[ _:-]' '{print mktime($3" "$2" "$1" "$4" "$5" 00")}' input-file
There are two ways to set the date cut-off. Either subtract the appropriate number of seconds from the current time:
Code:

awk -vDaysAgo=4 'split($1,d,"[_:-]") && mktime(d[3]" "d[2]" "d[1]" "d[4]" "d[5]" 00")>(systime()-86400*DaysAgo) {print $2}' input-file
Or take advantage of a useful Gawk feature correcting out of range values:
Code:

awk -vDaysAgo=4 'split($1,d,"[_:-]") && mktime(d[3]" "d[2]" "d[1]+DaysAgo" "d[4]" "d[5]" 00")>systime() {print $2}' input-file
i.e. Adding 7 to 28 April results in "35 April" but gets corrected to "5 May"
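
To see the correction in isolation, a minimal sketch that feeds an out-of-range day to mktime and prints the result with strftime. It assumes GNU Awk is available as gawk and only falls back to plain awk, which may not provide these functions.

```shell
# Prefer GNU Awk; fall back to plain awk, which may or may not provide mktime().
AWK=$(command -v gawk || command -v awk)

# 28 April plus 7 days gives day 35, which mktime() normalizes into May.
normalized=$("$AWK" 'BEGIN {
  day = 28 + 7
  print strftime("%Y-%m-%d", mktime("2024 04 " day " 00 00 00"))
}')

echo "$normalized"
```

With GNU Awk this should print 2024-05-05.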

If the hostnames are internal and can be guaranteed to not include hyphens or underscores, it can be simplified to:
Code:

awk -vDaysAgo=4 -F '[ _:-]' 'mktime($3" "$2" "$1+DaysAgo" "$4" "$5" 00")>systime() {print $6}' input-file
(Using 4 days ago here, because (at time of posting) that's the difference in the two rows of sample data.)


Turbocapitalist 05-02-2024 12:36 PM

Or using a slightly different date format output from the date utility will make the comparison easier:

Code:

awk -v lastweek="$(date -d 'last week' +'%Y%m%d')" \
    '{ n=split($1,a,"[-_]");
      date = a[3] a[2] a[1]; }
    date > lastweek { print; }' \
    mylogfile


johnnybao 05-02-2024 05:04 PM

Thank you @boughtonp - that worked perfectly!
awk -vDaysAgo=4 'split($1,d,"[_:-]") && mktime(d[3]" "d[2]" "d[1]" "d[4]" "d[5]" 00")>(systime()-86400*DaysAgo) {print $2}' input-file

@Turbocapitalist I just saw your response and will check it out also for the alternate formatting.

Thanks all!

astrogeek 05-02-2024 05:46 PM

Welcome to LQ johnnybao!

You have already attracted replies from two of the sharp pencils who share their knowledge here, so nothing to add! But I invite you to visit the Programming forum here at LQ where you may find others eager to offer help with any programming question when needed!

Again, welcome and good luck!

michaelk 05-02-2024 06:41 PM

I know you posted that it works perfectly, and I have not actually played with the code, but it depends on which dates/times you actually want to "extract". Does "1 week ago" mean, based on today 2/5 (or 5/2), anything > 25/4 or >= 25/4? Is the log file in UTC (I would guess) or local time?

boughtonp's script works on seconds so that if you were running the script at say 0900 you would not necessarily see time stamps from 25/4 (again 1 week ago from today 2/5) < 0900.

On the other hand, Turbocapitalist's script should output anything > 25/4 (based on 2/5) regardless of time.

Assuming I am awake enough to follow everything...

MadeInGermany 05-03-2024 02:02 AM

split() returns the number of fields i.e. the number of resulting array elements.

A simple string concatenation is done as (a[3] a[2] a[1])
String concatenation in awk does not have an operator; for clarity I wrap it in parentheses.
An alternative is sprintf("%s%s%s", a[3], a[2], a[1])
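
A small sketch to illustrate both points, using one sample line from the first post. The variable names n, ymd and alt are only for this example.

```shell
split_demo=$(echo '27-04-2024_00:04 hostname1 EverythingElseAfterHere' | awk '{
  n = split($1, a, "[-_]")                    # n is the element count, not the date
  ymd = (a[3] a[2] a[1])                      # concatenation by juxtaposition
  alt = sprintf("%s%s%s", a[3], a[2], a[1])   # same result via sprintf
  print n, ymd, alt
}')

echo "$split_demo"
```

This prints "4 20240427 20240427": the first field is split's return value (four pieces, because the trailing "00:04" counts as one), so comparing n with lastweek compares a count against a date rather than two dates.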

boughtonp 05-03-2024 09:30 AM


 
Quote:

Originally Posted by michaelk (Post 6499546)
Is the log file in UTC (I would guess) or local time?

Two good points I meant to mention - I got distracted by wrestling with the idiotic LQ "security" filter not letting me post.

My view is that log files should be UTC (or include timezone), but that's definitely not guaranteed, so it might be necessary to add/remove hours as appropriate.
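
As an alternative to adjusting hours by hand, one option (an assumption on my part, not tested against the real log) is to run awk with TZ=UTC so that mktime reads the log timestamps as UTC. A minimal sketch:

```shell
# Prefer GNU Awk; fall back to plain awk, which may or may not provide mktime().
AWK=$(command -v gawk || command -v awk)

# With TZ=UTC, mktime() interprets "2024 04 27 00 04 00" as a UTC wall-clock time.
epoch_utc=$(TZ=UTC "$AWK" 'BEGIN { print mktime("2024 04 27 00 04 00") }')

echo "$epoch_utc"
```

Since systime() returns seconds since the epoch regardless of timezone, the same TZ=UTC prefix on the filtering one-liners would make the comparison consistent for a log written in UTC.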


Quote:

boughtonp's script works on seconds so that if you were running the script at say 0900 you would not necessarily see time stamps from 25/4 (again 1 week ago from today 2/5) < 0900.
This was a deliberate choice to do it that way - again I meant to make it clear but forgot.

If desired, one can set the hour and minute values to zero for midnight and have it work the other way. (Or indeed, some other fixed time of day if that makes sense for the use-case.)
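
One way to read that, sketched below, is to move the cut-off itself to midnight at the start of the day DaysAgo days back. The sample lines, the now variable and the pinned run time are assumptions made for a reproducible example; GNU Awk is assumed for mktime, strftime and systime.

```shell
# Prefer GNU Awk; fall back to plain awk, which may or may not provide mktime().
AWK=$(command -v gawk || command -v awk)

# Hypothetical sample data around the cut-off day.
sample_log='24-04-2024_23:59 hostname9 EverythingElseAfterHere
25-04-2024_08:30 hostname0 EverythingElseAfterHere
25-04-2024_09:30 hostname1 EverythingElseAfterHere'

# "now" is pinned to 2 May 2024 09:00 local time for a reproducible run;
# leave out -v now=... to use the real clock.
midnight_hosts=$(printf '%s\n' "$sample_log" | "$AWK" -v DaysAgo=7 -v now='2024 05 02 09 00 00' '
  BEGIN {
    t = (now != "") ? mktime(now) : systime()
    # Midnight at the start of the calendar day that lies DaysAgo days back.
    cutoff = mktime(strftime("%Y %m %d 00 00 00", t - 86400 * DaysAgo))
  }
  split($1, d, "[_:-]") && mktime(d[3]" "d[2]" "d[1]" "d[4]" "d[5]" 00") >= cutoff { print $2 }
')

echo "$midnight_hosts"
```

Here hostname0 (08:30 on 25/4) is included along with hostname1, whereas the seconds-based version run at 09:00 would drop it. The >= is a choice: it keeps an entry stamped exactly at midnight.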


michaelk 05-03-2024 09:44 AM

I had thought about setting the default time to midnight. There are a couple of odd cases where the OP might not get the exact desired data in either script. Depending on the data, the OP's timezone and when the script was set to run, the starting results could be either the day before or day after.

syg00 05-03-2024 05:15 PM

These are issues only the OP can determine - or more likely not give a damn about. "logs from a week ago" is sufficiently vague to not worry about IMHO. Plenty of good (awk) ideas already presented for the OP to work with.


All times are GMT -5.