Since we all work on laptops or desktop computers, I want to share a little bit about memory management that I learned in my course. Over time, systems have become more demanding of memory resources, while at the same time RAM prices have decreased and performance has improved. Controlling the memory subsystem can be a complicated process: memory usage and I/O throughput are intrinsically related, since in most cases memory is used to cache the contents of files on disk. Thus, changing memory parameters can have a large effect on I/O performance, and changing I/O parameters can have a converse effect on virtual memory.
When tweaking parameters in /proc/sys/vm, the usual best practice is to adjust one thing at a time and observe the effects before changing anything else.
Moreover, it is often the case that bottlenecks in overall system performance and throughput are memory-related; the CPUs and the I/O subsystem can be waiting for data to be retrieved from or written to memory.
The simplest tool to use is free
$ free -m
total used free shared buff/cache available
Mem: 3665 2178 459 292 1027 954
Swap: 26107 281 25826
The /proc/sys/vm directory contains many tunable knobs to control the Virtual Memory system. Exactly what appears in this directory will depend somewhat on the kernel version. Almost all of the entries are writable (by root).
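For example, to inspect and adjust a single knob such as vm.swappiness (the value 10 here is purely illustrative, not a recommendation), you can read the /proc file directly and write it with sysctl:
$ cat /proc/sys/vm/swappiness
60
$ sudo sysctl vm.swappiness=10
vm.swappiness = 10
A change made this way lasts only until the next reboot; persistent settings go into /etc/sysctl.d/.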
Next, let us look at the vmstat tool. vmstat is a multi-purpose tool that displays information about memory, paging, I/O, processor activity and processes. It has many options; the general form of the command is:
$ vmstat [options] [delay] [count]
If a delay is given (in seconds), the report is repeated at that interval, count times; if no count is given, vmstat keeps reporting statistics until it is killed by a signal such as CTRL+C.
$ vmstat 2 4
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 805696 511216 67312 1013556 6 37 154 154 263 597 29 10 59 2 0
0 0 805696 506856 67328 1018736 0 0 0 54 632 592 2 1 95 2 0
1 0 805696 517460 67332 1007956 0 0 0 254 380 413 1 1 98 0 0
0 0 805696 516044 67332 1006092 0 0 0 0 532 654 2 1 97 0 0
I tried the vmstat command and this is the result.
Procs
r: The number of processes waiting for run time.
b: The number of processes in uninterruptible sleep.
Memory
swpd: the amount of virtual memory used.
free: the amount of idle memory.
buff: the amount of memory used as buffers.
cache: the amount of memory used as cache.
inact: the amount of inactive memory. (-a option)
active: the amount of active memory. (-a option)
Swap
si: Amount of memory swapped in from disk (/s).
so: Amount of memory swapped to disk (/s).
IO
bi: Blocks received from a block device (blocks/s).
bo: Blocks sent to a block device (blocks/s).
System
in: The number of interrupts per second, including the clock.
cs: The number of context switches per second.
CPU
These are percentages of total CPU time.
us: Time spent running non-kernel code. (user time, including nice time)
sy: Time spent running kernel code. (system time)
id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle.
st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.
With the -a option, vmstat displays information about active and inactive memory. Active memory pages are those which have been used recently; they may be clean (disk contents are up to date) or dirty (they need to be flushed to disk eventually). By contrast, inactive memory pages have not been used recently, are more likely to be clean, and are released sooner under memory pressure.
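For example, to watch the active/inactive breakdown refresh every two seconds, four times:
$ vmstat -a 2 4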
To get a table of disk statistics use -d:
$ vmstat -d
disk- ------------reads------------ ------------writes----------- -----IO------
total merged sectors ms total merged sectors ms cur sec
sda 144661 47639 8344492 3668697 144191 140734 8400849 4636600 0 1869
dm-0 149991 0 6931698 3907196 108628 0 4206722 3434151 0 890
dm-1 144 0 6512 3917 0 0 0 0 0 0
zram0 39474 0 315792 248 252095 0 2016760 2560 0 9
dm-2 40086 0 1351274 3145931 174672 0 4166338 3953480 0 1316
This post was first sent to my newsletter on November 16th, 2020.
Content Security Policy (or CSP) is a way of avoiding certain types of website-related attacks like cross-site scripting and malicious data injections. It is a way by which website developers can tell the browser what content origins are approved so that everything else is blocked. One needs to add a Content-Security-Policy
HTTP header mentioning the sources which they allow for loading scripts, styles, images, etc.
To read in detail about CSP, check Content Security Policy Level 3 working draft.
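For example, a minimal policy header (the sources listed here are purely illustrative) could look like this:
Content-Security-Policy: default-src 'self'; script-src 'self' https://scripts.example.com; style-src 'self'; img-src 'self' https://images.example.com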
We are going to discuss why sha256 hashes often don't let inline styles pass in Chromium browsers. The Chromium console complains about a style-src hash mismatch even though it shows the hashes to be the same. Why? And how do you solve it?
TL;DR: If you are using <style> tags, use style-src. If you are using the style="" attribute on an HTML tag, use style-src-attr.
Now, if you are interested in more information, let's dive a little deeper into what's going on.
The usual practice of having a tight, secure CSP is to not allow any inline style or inline scripts. This helps mitigate malicious scripts entered via data injection from getting executed.
When I say inline scripts, one might understand 2 different scenarios:
<!-- Scenario 1 -->
<script>alert('Hello world');</script>
or
<!-- Scenario 2 -->
<button onclick="alert('Hello world');">
Click me!
</button>
Now, the easiest way to allow this would be to add unsafe-inline
in script-src
of the CSP. But then we are back to the problem of executing malicious scripts entered by data injection. There are two ways to still allow only these scripts to work: nonce
and sha256
hashes. We are going to talk about sha256
hashes here.
The idea is to get the sha256 hash of the entire script and add it to the script-src
. So in this case, it would be something like this:
script-src 'self' 'sha256-DUTqIDSUj1HagrQbSjhJtiykfXxVQ74BanobipgodCo='
You can get the hash from https://report-uri.com/home/hash. Also, chromium browsers will usually show the hash that should be added for a particular inline script.
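If you prefer to compute the hash locally, a quick way (shown here for the Scenario 1 snippet) is to pipe the exact script body through openssl; note that the hash covers exactly the text between the <script> tags, whitespace included:
echo -n "alert('Hello world');" | openssl dgst -sha256 -binary | openssl base64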
Now, all this sounds good, and in Firefox, just adding the above to your CSP will make both scripts work. However, in Chromium, the above CSP will work only in Scenario 1, not in Scenario 2. You can read more about the discussion here: https://bugs.chromium.org/p/chromium/issues/detail?id=546106#c8
In JavaScript, scenario 1 is generally encouraged much more than scenario 2, so scenario 2 might not be encountered that often. However, the situation changes when it comes to styles (or CSS).
In the case of inline styles, these are the scenarios:
<!-- scenario 1 -->
<style>p{color: blue;}</style>
and
<!-- scenario 2 -->
<p style="color: blue;">This is a text</p>
In CSS, scenario 2 is much more common than scenario 1 when someone writes inline styles. But again, in this case, adding a sha256 hash to style-src won't allow scenario 2 to work in Chromium browsers.
This is because styles added in scenario 2 are part of the style attribute in the HTML tag which in CSP terms are essentially event handlers. According to w3c CSP draft, the hash in style-src
allows the inline styles mentioned inside <style>
tag to pass but doesn't allow event handlers (as is the case in scenario 2). There's more on this discussion here.
Yes, it is a feature. In chromium browsers, adding a hash to style-src
only allows any inline style written inside the <style>
tags to execute. This is by design. If you need to execute the inline styles present in style=
attribute of HTML tags, you need to use another directive in CSP called style-src-attr
. Similarly, script-src-attr
should be used if you are doing JavaScript event handling in the HTML tag itself.
So, for example, if you want to only allow an inline CSS such as this:
<p style="color: blue;">This is a text</p>
all you need to do is put the sha256 hash in style-src-attr along with 'unsafe-hashes'. This tells the browser to allow any inline style whose hash is listed in style-src-attr to be applied.
So the CSP will have something like this:
style-src-attr 'unsafe-hashes' 'sha256-C8uD/9cXZAvqgnwxgdb67jgkSDq7f8xjP8F6lhY1Gtk='
And, that's it! That will do the trick in any Chromium browser. The related code for Chromium can be found here. According to caniuse.com, all Chromium-based browsers above version 75 support this behaviour.
Even though Firefox still doesn't support style-src-attr, it allows inline styles and scripts of all types to pass based on style-src and script-src hashes. So as long as the hash is mentioned in both style-src and style-src-attr, it should work in most browsers.
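Putting it together, a policy that covers both <style> blocks and style attributes could carry the hash in both directives (reusing the example hash from above):
style-src 'sha256-C8uD/9cXZAvqgnwxgdb67jgkSDq7f8xjP8F6lhY1Gtk='; style-src-attr 'unsafe-hashes' 'sha256-C8uD/9cXZAvqgnwxgdb67jgkSDq7f8xjP8F6lhY1Gtk='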
As for the explanation behind why 'unsafe-hashes'
, there is a pretty good explainer document written by andypaicu talking about exactly this.
Also, read more about style-src-attr
in detail in the w3c draft to understand exactly what's happening and what kind of risk it may still pose.
PS: Inline JavaScript event handlers allowed via script-src-attr can be very risky, given that an attacker can trigger an allow-listed script from within an unrelated HTML tag.
One of the essential tasks in monitoring your system is keeping track of the processes that are running (or sleeping). The ps command has long been used for this in UNIX-based operating systems.
To monitor your computer's processes you need some tools, and in Linux these tools are commands such as top, ps and pstree. Let me list some of these commands and what each of them is useful for.
In my last blog I wrote about /proc. The /proc filesystem can also be helpful in monitoring processes, as well as other items on the system. You can view your process states with the help of ps. Some common invocations are:
ps aux
ps -elf
ps -eL
~ ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.2 176404 9252 ? Ss 13:32 0:04 /usr/lib/systemd/systemd --switched-root --system --deserialize 30
root 2 0.0 0.0 0 0 ? S 13:32 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? I< 13:32 0:00 [rcu_gp]
root 4 0.0 0.0 0 0 ? I< 13:32 0:00 [rcu_par_gp]
root 6 0.0 0.0 0 0 ? I< 13:32 0:00 [kworker/0:0H-events_highpri]
root 9 0.0 0.0 0 0 ? I< 13:32 0:00 [mm_percpu_wq]
root 10 0.0 0.0 0 0 ? S 13:32 0:06 [ksoftirqd/0]
root 11 0.0 0.0 0 0 ? I 13:32 0:20 [rcu_sched]
root 12 0.0 0.0 0 0 ? S 13:32 0:00 [migration/0]
root 13 0.0 0.0 0 0 ? S 13:32 0:00 [cpuhp/0]
root 14 0.0 0.0 0 0 ? S 13:32 0:00 [cpuhp/1]
ps aux shows all processes; the commands surrounded by square brackets are kernel threads that exist entirely within the kernel. The output above shows various fields like VSZ, RSS and STAT.
VSZ is the process's virtual memory size (in KB).
RSS is the resident set size; the non-swapped physical memory being used (in KB).
STAT describes the state of the process; in our example we see only S for sleeping, or R for running.
- < for high priority (not nice)
- N for low priority (nice)
- L for having pages locked in memory
- s for session leader
- l for multi-threaded
You can also customize your ps
output. If you use the -o option, followed by a comma-separated list of field identifiers, you can print out a customized list of ps fields:
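For example, combined with -e to list every process, the following prints only the PID, owner, memory and CPU usage, state and command:
$ ps -eo pid,user,%mem,%cpu,stat,command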
These are a few commands for viewing your process monitoring stats.
Anyone who has ever created an OpenPGP key knows that it is a terrifying day in their life. Be it someone skilled with computers or someone who just cares about their privacy, creating an OpenPGP key is a daunting experience. Add moving all the subkeys properly to a Yubikey along with managing all the passphrases, and the terror just increases manifold.
Well, do not fear, Tumpa is here.
For most journalists, lawyers, activists, or anyone who wants to have secure communication, an OpenPGP key is a great way to send and receive encrypted messages. But most people dread a black terminal (or command line) with some text menu, and that has probably been the only way to create OpenPGP keys and transfer them to a smartcard (e.g., a Yubikey) till now. So, when Kushal showed me johnnycanencrypt, his Python library for various OpenPGP key operations, we had this idea that it would be simply amazing if we could provide a Graphical User Interface (GUI) for people to create keys and transfer them to a Yubikey.
Being a digital security trainer, I can vouch that most journalists, lawyers, activists and anyone who doesn't sit in front of a terminal all day would rather have a desktop application to click a few buttons, fill up a few forms, and get their result, rather than typing command after command in a black screen.
And that's exactly what Tumpa does!
Tumpa provides a simple form where you need to add your name, all emails that you want to associate with your OpenPGP key, a passphrase for your OpenPGP key, click on the big "Generate" button, and boom!
That's it!
You have your OpenPGP key with proper subkeys and everything!
Well, what about transferring the key to the smart card? Just plug your Yubikey, click on the big "Upload to SmartCard" button, add the necessary passphrases, and done!
You have your key transferred to a physical key!
Usually, a training session to teach someone to create OpenPGP key properly and transferring everything properly to a smartcard like yubikey takes about 3-4 hours. And after such a session, usually, everyone loses a bit of their sanity in the process.
The first time Kushal and I got the first draft working and went through the entire flow, we were both positively surprised and probably laughing hysterically (thanks, Anwesha, for tolerating us for the last few days).
Tumpa reduces work that would otherwise take hours to a few minutes, and it also lets everyone keep their sanity. Most of the operations that would require typing a lot of commands and understanding some command-line options can be achieved with a few clicks.
You can download the .deb
package from the release page.
Then, install it using dpkg -i ./tumpa_0.1.0+buster+nmu1_all.deb, preferably on an air-gapped computer inside Tails.
Tumpa is at a very early stage of development. We have tried to make Tumpa feature complete to the most necessary ones and make the initial release. But there's still a lot of work left to be done.
We want to make Tumpa even easier to use for people who don't want to get into all the intricacies of OpenPGP key while giving more advanced options to the more experimental and curious users.
Right now, Tumpa uses Curve25519 to create keys and subkeys, with an expiration date of 3 years. We want to give options to select these based on a user's needs, in case they really care and want to change things. There are many such customizations and also simplifications that we will slowly add in the next releases, trying to improve the entire user experience even more.
We have started conducting user interviews. We would really love more people to do usability studies with a varied group of technologists, lawyers, journalists, activists, or anyone interested, to improve the UX manifold.
The UI, for now, is very simple and probably not the best. So we can definitely use any feedback or suggestions.
We are available on #tumpa
channel on Freenode. Feel free to drop by with all your comments.
Also, read Kushal's release blog on Tumpa to know more about installation and packaging.
Generating OpenPGP keys on an offline, air-gapped system and then moving them onto a smart card has always been a difficult task for me. Remembering the steps and command-line options of gpg2 correctly, and then following them in the right order, is difficult, and I have had trouble doing so often enough. When I think about someone who is not that much into the command line, I realize how difficult these steps are for them.
While having a chat with Saptak a few weeks ago, we came up with the idea of writing a small desktop tool to help. I started adding more features into my Johnnycanencrypt for the same. The OpenPGP operations are possible due to the amazing Sequoia project.
The work on the main application started during the holiday break, and today I am happy to release version 0.1.0 of Tumpa to make specific OpenPGP operations simple to use. It uses Johnnycanencrypt inside and does not depend on gpg.
Here is a small demo of the application running in a Tails (VM) environment. I am creating a new OpenPGP key with encryption and signing subkeys, and then putting them into a Yubikey. We are also setting the card holder's name via our tool.
We can also reset any Yubikey with just a click.
You can download the Debian Buster package for Tails from the release page from Github. You can run from the source in Mac or Fedora too. But, if you are doing any real key generation, then you should try to do it in an air-gapped system.
You can install the package as dpkg -i ./tumpa_0.1.0+buster+nmu1_all.deb
inside of Tails.
A lot of work :) This is just the beginning. There are a ton of features we planned, and we will slowly add those. The UI also requires a lot of work and touch from a real UX person.
The default application will be very simple to use, and we will also have many advanced features, say changing subkey expiration dates, creating new subkeys, etc. for the advanced users.
We are also conducting user interviews (which takes around 20 minutes of time). If you have some time to spare to talk to us and provide feedback, please feel free to ping us via Twitter/mastodon/IRC.
We are available on #tumpa
channel on Freenode. Come over and say hi :)
There are a lot of people I should thank for this release. Here is a quick list at random. Maybe I miss many names here, but you know that we could not do this without your help and guidance.
How to get a TLS certificate for a domain inside of my local network? This was
a question for me for a long time. I thought of creating a real subdomain,
getting the certificate, and copying over the files locally, and then enforcing
local domain names via the DNS or /etc/hosts
. But, during the TLS training
from Scott Helme, I learned about getting
certificates via DNS challenge
using acme.sh.
I use DreamHost nameservers for most of my domains. I got an API_KEY from them scoped to DNS manipulation only.
Next, I just had to execute one single command along with the API_KEY to fetch a fresh, hot certificate from Let's Encrypt.
The following command fetches for fire.das.community
subdomain.
DH_API_KEY=MYAPIKEY acme.sh --issue --dns dns_dreamhost -d fire.das.community
There is a wiki page listing how to use acme.sh tool for various DNS providers.
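Once the certificate is issued, acme.sh can also copy it to wherever your web server expects it and reload the service; something along these lines (the paths and the reload command are placeholders for your own setup):
acme.sh --install-cert -d fire.das.community \
  --key-file /etc/ssl/private/fire.das.community.key \
  --fullchain-file /etc/ssl/certs/fire.das.community.pem \
  --reloadcmd "systemctl reload nginx"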
I’m actually really good at procrastinating things when I’m uncomfortable.
- me, back in Feb 2020
As you’re likely aware, the human world has been quite a different place since my last post here. My transition to this change wasn’t super smooth, but hey, it could’ve been a lot worse. My mental health has definitely had a really fun roller coaster ride. I have been fortunate enough to be able to stay safe and also work remotely with some wonderful folks over this time.
As you might know, I've been working on pip's dependency resolver since 2017; it was originally my Google Summer of Code project. There is now a public pip release that uses a new, written-from-the-ground-up dependency resolver, which replaces the old pseudo-dependency-resolver it had. 🎉
A large part of why this project was finally pushed over the line was that the Packaging-WG at the Python Software Foundation was able to secure funding toward this, from Mozilla (through its Mozilla Open Source Support Awards) and the Chan Zuckerberg Initiative.
Huge thanks to Bernard Tyers, Ernest W. Durbin III, Georgia Bullen, Nicole Harris, Paul Moore, Sumana Harihareswara and Tzu-Ping Chung for being amazing colleagues (and friends!) as we worked on this together.
Hurray! I made it out of that place. As a faculty once told me, “you got to where you are despite this college, not because of it”.
My college has made me sign paperwork that prohibits me from speaking out against university management (something that should speak for itself). Here’s a link to a relevant section on Wikipedia.
Update: I had to change the link above to a permalink to a specific version of the document, because the linked section was edited out the day I put up this post.
I mentored Raphael McSinyx, who worked on pip, exploring speedups to pip's dependency resolution process. You can read more about their work in their final GSoC report.
I am now working at Bloomberg Engineering as, currently, a part of the Python Infrastructure team in London.
I’m officially no longer living in my parent’s home. Onward to new adventures, I guess.
I made a Sphinx documentation theme: Furo, modernised sphinx-themes.org and am collaborating with the amazing folks of the Executable Books project.
CZI conducted an EOSS kickoff meeting, bringing lots of members of their first cohort of 32 EOSS grantees into one room. This included me!
It was a really well conducted event, and the room was filled with brilliant people. I felt like a squirrel in a room full of elephants.
This happened right before COVID-19 was deemed serious enough to care about it. It was the last trip I had in the before-times, and it was definitely a good one.
The blog post "Email setup with isync, notmuch, afew, msmtp and Emacs" prompted a few questions. The questions were around synchronizing email in general.
I did promise to write up more blog posts to explain the pieces I brushed over quickly for brevity and ease of understanding. Or so I thought !
Let's talk Maildir. Wikipedia defines it as the following.
The Maildir e-mail format is a common way of storing email messages in which each message is stored in a separate file with a unique name, and each mail folder is a file system directory. The local file system handles file locking as messages are added, moved and deleted. A major design goal of Maildir is to eliminate the need for program code to handle file locking and unlocking.
It is basically what I mentioned before. Think of your emails as folders and files. The image will get clearer, so let's dig even deeper.
If you go into a Maildir directory, let's say Inbox, and list all the directories in there, you'll find three of them.
$ ls
cur/ new/ tmp/
These directories have a purpose.
- tmp/: This directory stores all temporary files and files in the process of being delivered.
- new/: This directory stores all new files that have not yet been seen by any email client.
- cur/: This directory stores all the files that have been previously seen.

This is basically how emails are going to be represented on your disk. You will need to find an email client which can parse these files and work with them.
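As an illustration (the exact file name will differ on your system), a message that has already been read might show up in cur/ with its flags appended after the ':2,' marker, S meaning seen:
$ ls cur/
1607688475.29123_0.myhost,U=102:2,S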
The Internet Message Access Protocol, shortened to IMAP, is an Internet standard protocol used by email clients to retrieve email messages from a mail server over a TCP/IP connection.
In simple terms, it is a way of communication that allows synchronization between a client and an email server.
Now, you have all the pieces of the puzzle to figure out how to think about your email on disk and how to synchronize it. It might be a good idea to dive a little bit into my configuration and why I chose these settings to begin with. Shall we ?
Most email servers nowadays offer you an IMAP (POP3 was another protocol used widely back in the day) endpoint to connect to. You might be using Outlook or Thunderbird or maybe even Claws-mail as an email client. They usually show you the emails in a neat GUI (Graphical User Interface) with all the read and unread mail and the folders. If you've had the chance to configure one of these clients a few years ago, you would've needed to find the IMAP host and port of the server. These clients talk IMAP too.
isync is an application to synchronize mailboxes. I use it to connect to my email server using IMAP and synchronize my emails to my hard drive as a Maildir.
The very first section of the configuration is the IMAP section.
IMAPAccount Personal
Host email.hostname.com
User personal@email.hostname.com
Pass "yourPassword"
# One can use a command which returns the password
# Such as a password manager or a bash script
#PassCmd sh script/path
SSLType IMAPS
CertificateFile /etc/ssl/certs/ca-certificates.crt
IMAPStore personal-remote
Account Personal
In here, we configure the IMAP settings. Most notably here is of course Host
, User
and Pass/PassCmd
. These settings refer to your server and you should populate them with that information.
The IMAPStore
is used further in the configuration, this gives a name for the IMAP Store. In simple terms, if you want to refer to your server you use personal-remote
.
The next section of the configuration is the Maildir part. You can think of this as where do you want your emails to be saved on disk.
MaildirStore personal-local
Subfolders Verbatim
Path ~/.mail/
Inbox ~/.mail/Inbox
This should be self explanatory but I'd like to point out the MaildirStore
key. This refers to email on disk. So, if you want to refer to your emails on disk you use personal-local
.
At this point, you are thinking to yourself what the hell does that mean ? What is this dude talking about ! Don't worry, I got you.
This is where all what you've learned comes together. The fun part ! The part where you get to choose how you want to do things.
Here's what I want. I want to synchronize my server Inbox with my on disk Inbox both ways. If the Inbox folder does not exist on disk, create it. The name of the Inbox on the server is Inbox
.
This can be translated to the following.
Channel sync-personal-inbox
Master :personal-remote:"Inbox"
Slave :personal-local:Inbox
Create Slave
SyncState *
CopyArrivalDate yes
I want to do the same with Archive
and Sent
.
Channel sync-personal-archive
Master :personal-remote:"Archive"
Slave :personal-local:Archive
Create Slave
SyncState *
CopyArrivalDate yes
Channel sync-personal-sent
Master :personal-remote:"Sent"
Slave :personal-local:Sent
Create Slave
SyncState *
CopyArrivalDate yes
At this point, I still have my trash. The trash on the server is called Junk
but I want it to be Trash
on disk. I can do that easily as follows.
Channel sync-personal-trash
Master :personal-remote:"Junk"
Slave :personal-local:Trash
Create Slave
SyncState *
CopyArrivalDate yes
I choose to synchronize my emails both ways. If you prefer, for example, not to download the sent emails and only synchronize them up to the server, you can control that per channel; check the Sync and SyncState options in the mbsync manual pages.
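As an illustration based on the manual page (not part of my actual setup), a channel that only pushes local changes up to the server could look like this:
Channel push-personal-sent
Master :personal-remote:"Sent"
Slave :personal-local:Sent
Sync Push
Create Master
SyncState *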
At the end, add all the channels configured above to a Group with the same account name.
Group Personal
Channel sync-personal-inbox
Channel sync-personal-archive
Channel sync-personal-sent
Channel sync-personal-trash
This is pretty much it. It is that simple. This is how I synchronize my email. How do you ?
I was asked recently about how I have my email client setup. As I naturally do, I replied with something along the lines of the following.
I use isync, notmuch, afew and msmtp with emacs as an interface, let me get you a link on how I did my setup from my blog.
To my surprise, I never wrote about the topic. I guess this is as good a time as any to do so.
Let's dig in.
Looking at the big list of tools mentioned in the title, I could understand how one could get intimidated but I assure you these are very basic, yet very powerful, tools.
The first task is to divide and conquer, as usual. We start with the first piece of the puzzle: understanding email.
A very simplified way of thinking about email is that each email is simply a file. This file has all the information needed as to who sent it to whom, from which server, etc… The bottom line is that it's simply a file in a folder somewhere on a server. Even though this might not be the case on the server, in this setup it will most certainly be the case locally on your filesystem. Thinking about it in terms of files in directories also makes sense because it will most likely be synchronized back to the server that way as well.
Now you might ask, what tool would offer us such a way to synchronize emails and my answer would be… Very many, of course… come on this is Linux and Open Source ! Don't ask silly questions… But to what's relevant to my setup it's isync.
Now that I have the emails locally on my filesystem, I need a way to interact with them. Some prefer to work with directories, I prefer to work with tags instead. That's where notmuch comes in. You can think of it as an email tagging and querying system. To make my life simpler, I utilize afew to handle a few basic email tasks to save me from writing a lot of notmuch rules.
I already make use of emacs extensively in my day to day life and having a notmuch interface in emacs is great. I can use emacs to view, tag, search and send email.
Oh wait, right… I wouldn't be able to send email without msmtp.
isync is defined as
a command line application which synchronizes mailboxes.
While isync currently supports Maildir and IMAP4 mailboxes, it has the very logical command of mbsync
. Of course !
Now, isync is very well documented in the man
pages.
man mbsync
Everything you need is there, have fun reading.
While you read the man
pages to figure out what you want, I already did that and here's what I want in my ~/.mbsyncrc
.
##########################
# Personal Configuration #
##########################
# Name Account
IMAPAccount Personal
Host email.hostname.com
User personal@email.hostname.com
Pass "yourPassword"
# One can use a command which returns the password
# Such as a password manager or a bash script
#PassCmd sh script/path
SSLType IMAPS
CertificateFile /etc/ssl/certs/ca-certificates.crt
IMAPStore personal-remote
Account Personal
MaildirStore personal-local
Subfolders Verbatim
Path ~/.mail/
Inbox ~/.mail/Inbox
Channel sync-personal-inbox
Master :personal-remote:"Inbox"
Slave :personal-local:Inbox
Create Slave
SyncState *
CopyArrivalDate yes
Channel sync-personal-archive
Master :personal-remote:"Archive"
Slave :personal-local:Archive
Create Slave
SyncState *
CopyArrivalDate yes
Channel sync-personal-sent
Master :personal-remote:"Sent"
Slave :personal-local:Sent
Create Slave
SyncState *
CopyArrivalDate yes
Channel sync-personal-trash
Master :personal-remote:"Junk"
Slave :personal-local:Trash
Create Slave
SyncState *
CopyArrivalDate yes
# Get all the channels together into a group.
Group Personal
Channel sync-personal-inbox
Channel sync-personal-archive
Channel sync-personal-sent
Channel sync-personal-trash
The preceding configuration will synchronize the following folders both ways: Inbox, Archive, Sent and Trash (mapped from Junk on the server).
Those are the only directories I care about.
With the configuration in place, we can try to sync the emails.
mbsync -C -a -V
You can read more about notmuch on their webpage. Their explanation is interesting to say the least.
What notmuch does, is create a database where it saves all the tags and relevant information for all the emails. This makes it extremely fast to query and do different operations on large numbers of emails.
I use notmuch mostly indirectly through emacs, so my configuration is very simple. All I want from notmuch is to tag all new emails with the new
tag.
# .notmuch-config - Configuration file for the notmuch mail system
#
# For more information about notmuch, see https://notmuchmail.org
# Database configuration
#
# The only value supported here is 'path' which should be the top-level
# directory where your mail currently exists and to where mail will be
# delivered in the future. Files should be individual email messages.
# Notmuch will store its database within a sub-directory of the path
# configured here named ".notmuch".
#
[database]
path=/home/user/.mail/
# User configuration
#
# Here is where you can let notmuch know how you would like to be
# addressed. Valid settings are
#
# name Your full name.
# primary_email Your primary email address.
# other_email A list (separated by ';') of other email addresses
# at which you receive email.
#
# Notmuch will use the various email addresses configured here when
# formatting replies. It will avoid including your own addresses in the
# recipient list of replies, and will set the From address based on the
# address to which the original email was addressed.
#
[user]
name=My Name
primary_email=user@email.com
# other_email=email1@example.com;email2@example.com;
# Configuration for "notmuch new"
#
# The following options are supported here:
#
# tags A list (separated by ';') of the tags that will be
# added to all messages incorporated by "notmuch new".
#
# ignore A list (separated by ';') of file and directory names
# that will not be searched for messages by "notmuch new".
#
# NOTE: *Every* file/directory that goes by one of those
# names will be ignored, independent of its depth/location
# in the mail store.
#
[new]
tags=new;
#tags=unread;inbox;
ignore=
# Search configuration
#
# The following option is supported here:
#
# exclude_tags
# A ;-separated list of tags that will be excluded from
# search results by default. Using an excluded tag in a
# query will override that exclusion.
#
[search]
exclude_tags=deleted;spam;
# Maildir compatibility configuration
#
# The following option is supported here:
#
# synchronize_flags Valid values are true and false.
#
# If true, then the following maildir flags (in message filenames)
# will be synchronized with the corresponding notmuch tags:
#
# Flag Tag
# ---- -------
# D draft
# F flagged
# P passed
# R replied
# S unread (added when 'S' flag is not present)
#
# The "notmuch new" command will notice flag changes in filenames
# and update tags, while the "notmuch tag" and "notmuch restore"
# commands will notice tag changes and update flags in filenames
#
[maildir]
synchronize_flags=true
Now that notmuch is configured the way I want it to, I use it as follows.
notmuch new
Yup, that simple.
This will tag all new emails with the new
tag.
Once all the new emails have been properly tagged with the new
tag by notmuch, afew comes in.
afew is defined as an initial tagging script for notmuch. The reason for using it will become evident very soon, but let me quote some of what their GitHub page says.
It can do basic things such as adding tags based on email headers or maildir folders, handling killed threads and spam.
In move mode, afew will move mails between maildir folders according to configurable rules that can contain arbitrary notmuch queries to match against any searchable attributes.
This is where the bulk of the configuration is, in all honesty. At this stage, I had to make a decision about how I would like to manage my emails.
I think it should be simple if I save them as folders on the server, since the server doesn't support tags. I can derive the basic tags from the folders and keep a backup of my notmuch database for all the rest of the tags.
My configuration looks similar to the following.
# ~/.config/afew/config
[global]
[SpamFilter]
[KillThreadsFilter]
[ListMailsFilter]
[SentMailsFilter]
[ArchiveSentMailsFilter]
sent_tag = sent
[DMARCReportInspectionFilter]
[Filter.0]
message = Tagging Personal Emails
query = 'folder:.mail/'
tags = +personal
[FolderNameFilter.0]
folder_explicit_list = .mail/Inbox .mail/Archive .mail/Drafts .mail/Sent .mail/Trash
folder_transforms = .mail/Inbox:personal .mail/Archive:personal .mail/Drafts:personal .mail/Sent:personal .mail/Trash:personal
folder_lowercases = true
[FolderNameFilter.1]
folder_explicit_list = .mail/Archive
folder_transforms = .mail/Archive:archive
folder_lowercases = true
[FolderNameFilter.2]
folder_explicit_list = .mail/Sent
folder_transforms = .mail/Sent:sent
folder_lowercases = true
[FolderNameFilter.3]
folder_explicit_list = .mail/Trash
folder_transforms = .mail/Trash:deleted
folder_lowercases = true
[Filter.1]
message = Untagged 'inbox' from 'archive'
query = 'tag:archive AND tag:inbox'
tags = -inbox
[MailMover]
folders = .mail/Inbox
rename = True
max_age = 7
.mail/Inbox = 'tag:deleted':.mail/Trash 'tag:archive':.mail/Archive
# what's still new goes into the inbox
[InboxFilter]
Basically, I make sure that all the emails, in their folders, are tagged properly. I make sure the emails which need to be moved are moved to their designated folders. The rest is simply the inbox.
Note
The read/unread tag is automatically handled between notmuch and isync. It's seamlessly synchronized between the tools.
With the configuration in place, I run afew.
afew -v -t --new
For moving the emails, I use afew as well but I apply it on all emails and not just the ones tagged with new
.
afew -v -m --all
msmtp is an SMTP client. It sends email.
The configuration is very simple.
# Set default values for all following accounts.
defaults
auth on
tls on
tls_trust_file /etc/ssl/certs/ca-certificates.crt
logfile ~/.msmtp.log
# Mail
account personal
host email.hostname.com
port 587
from personal@email.hostname.com
user personal@email.hostname.com
password yourPassword
# One can use a command which returns the password
# Such as a password manager or a bash script
# passwordeval sh script/path
# Set a default account
account default : personal
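To verify the setup, you can pipe a minimal message to msmtp from the command line (the recipient address here is just a placeholder):
printf 'Subject: msmtp test\n\nHello from msmtp.' | msmtp -a personal recipient@example.com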
I use Doom as a configuration framework for Emacs. notmuch comes as a module which I enabled, but you might want to check notmuch's Emacs documentation page for help with installation and configuration.
I wanted to configure the notmuch interface a bit to show me what I'm usually interested in.
(setq +notmuch-sync-backend 'mbsync)
(setq notmuch-saved-searches '((:name "Unread"
:query "tag:inbox and tag:unread"
:count-query "tag:inbox and tag:unread"
:sort-order newest-first)
(:name "Inbox"
:query "tag:inbox"
:count-query "tag:inbox"
:sort-order newest-first)
(:name "Archive"
:query "tag:archive"
:count-query "tag:archive"
:sort-order newest-first)
(:name "Sent"
:query "tag:sent or tag:replied"
:count-query "tag:sent or tag:replied"
:sort-order newest-first)
(:name "Trash"
:query "tag:deleted"
:count-query "tag:deleted"
:sort-order newest-first))
)
Now, all I have to do is simply open the notmuch
interface in Emacs.
To put everything together, I wrote a bash script with the commands provided above in series. This script can be called by a cron or even manually to synchronize emails.
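A minimal sketch of such a script, simply chaining the commands shown earlier, could look like this:
#!/usr/bin/env bash
set -e

# Synchronize the Maildir with the server
mbsync -C -a -V

# Index new mail and tag it with "new"
notmuch new

# Run the afew tagging filters on the newly tagged mail
afew -v -t --new

# Move mail between folders according to the MailMover rules
afew -v -m --all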
From the Emacs interface I can do pretty much everything I need to do.
Future improvements I have to think about is the best way to do email notifications. There are a lot of different ways I can approach this. I can use notmuch to query for what I want. I could maybe even try querying the information out of the Xapian database. But that's food for thought.
I want email to be simple and this makes it simple for me. How are you making email simple for you ?
lone strollers with dogs and bottles of wine
going out for walks under basswood and pine
children singing and whirling around
playin' their favorite games on a crowded playground
an army of joggers on bouncing feet
elderly people tryin' to cross the street
the twilight sky, dipped in orange and blue
the silvery loom of the crescent moon
more faces than I have seen in weeks
November wind gives them rosy cheeks
with happy, sad, excited expressions
from football games and from jam sessions
engrossed in music engrossed in sports
engrossed in coffee, talks and thoughts
a spirit of life and truth and hope
and the occasional smell of dope
like collecting pieces to make me whole
like an emergency bandage for my soul
Thunderbird is the free and open source email client from the Mozilla Foundation. I have been using it for some years now. Until now, Thunderbird users had to use the Enigmail extension to work with GnuPG. Thunderbird 78 now uses a different implementation of OpenPGP called RNP.
Since the RNP library still does not support using a secret key on a smartcard, to use a Yubikey or any other GnuPG-enabled smartcard we need to manually configure Thunderbird to use GnuPG. The steps are the following:
dnf install gpgme
GPGME (GnuPG Made Easy) is a library that makes GnuPG easily accessible by providing a high-level crypto API for encryption, decryption, signing, verification and key management. I already have GnuPG installed on my Fedora 33 machine and my Yubikey ready.
Go to the Preferences menu then click on the config editor button at the very end.
Click on the I accept the risk.
Search for mail.openpgp.allow_external_gnupg and switch to true.
Remember to restart the Thunderbird after that.
Now go to the Account Settings and then to End-To-End Encryption in the sidebar. Select the Use your external key through GnuPG (e.g. from a smartcard) option and click on Continue.
Type your Secret Key ID in the box and click on Save key ID.
Now open the OpenPGP Key Manager and import your public key and then verify.
Now you can start using your hardware token in Thunderbird.
In this case we have to use two keyrings: GnuPG's and RNP's (internal to Thunderbird). This is an extra step which I hope can be avoided in the future.
Mailvelope is a browser extension to send end-to-end encrypted emails. It is a good option for users who want to send end-to-end encrypted mail without changing the email service they use. It is licensed under AGPL v3, making it Free and Open Source software, and the code is on GitHub for the community to look at. It can be added as an extension to Chrome, Firefox and Edge to securely encrypt emails with PGP while keeping your existing email provider.
Mailvelope does provide end-to-end encryption for the email content, but it does not protect the metadata (subject, IP address of the sender) from third parties. Like most email encryption tools, it does not work in mobile browsers. There is a detailed user guide on Mailvelope from the Freedom of the Press Foundation, which is really helpful for new users.
By default, Mailvelope uses its own keyring. To use my Yubikey along with GnuPG keyring, I had to take the following steps:
We need gpgme installed. On my Fedora 33 I did
$ sudo dnf install gpgme -y
We have to create a gpgmejson.json file in the ~/.config/google-chrome/NativeMessagingHosts directory and write the following JSON in there.
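If the directory does not exist yet, create it first:
mkdir -p ~/.config/google-chrome/NativeMessagingHosts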
{
"name": "gpgmejson",
"description": "Integration with GnuPG",
"path": "/usr/bin/gpgme-json",
"type": "stdio",
"allowed_origins": [
"chrome-extension://kajibbejlbohfaggdiogboambcijhkke/"
]
}
mkdir -p ~/.mozilla/native-messaging-hosts
After creating the native-messaging-hosts directory inside the Mozilla directory, add gpgmejson.json file there with the following content.
vim ~/.mozilla/native-messaging-hosts/gpgmejson.json
{
"name": "gpgmejson",
"description": "Integration with GnuPG",
"path": "/usr/bin/gpgme-json",
"type": "stdio",
"allowed_extensions": [
"jid1-AQqSMBYb0a8ADg@jetpack"
]
}
Remember to restart the respective browser after you add the .json file. Then go to the Mailvelope extension to select the GnuPG keyring.
I recently read this paper titled, Understanding Real-World Concurrency Bugs in Go (PDF), that studies concurrency bugs in Golang and comments on the new primitives for messages passing that the language is often known for.
I am not a very good Go programmer, so this was an informative lesson in various ways to achieve concurrency and synchronization between different threads of execution. It is also a good read for experienced Go developers as it points out some important gotchas to look out for when writing Go code. The fact that it uses real world examples from well known projects like Docker, Kubernetes, gRPC-Go, CockroachDB, BoltDB etc. makes it even more fun to read!
The authors analyzed a total of 171 concurrency bugs from several prominent Go open source projects and categorized them in two orthogonal dimensions, one each for the cause of the bug and the behavior. The cause is split between two major schools of concurrency
Along the cause dimension, we categorize bugs into those that are caused by misuse of shared memory and those caused by misuse of message passing
and the behavior dimension is similarly split into
we separate bugs into those that involve (any number of) goroutines that cannot proceed (we call them blocking bugs) and those that do not involve any blocking (non-blocking bugs)
Interestingly, they chose the behavior dimension to be blocking instead of deadlock, since the former implies that at least one thread of execution is blocked due to some concurrency bug while the rest of them might continue execution, so it is not necessarily a deadlock situation.
Go has primitive shared memory protection mechanisms like Mutex
, RWMutex
etc. with a caveat
Write lock requests in Go have a higher privilege than read lock requests.
as compared to pthread in C. Go also has a new primitive called sync.Once
that can be used to guarantee that a function is executed only once. This can be useful in situations where some callable is shared across multiple threads of execution but it shouldn't be called more than once. Go also has sync.WaitGroups
, which is similar to pthread_join
to wait for various threads of execution to finish executing.
Go also uses channels for message passing between different threads of execution, called goroutines. Channels can be buffered or un-buffered (the default), the difference being that with a buffered channel the sender and receiver don't block on each other (until the buffered channel is full).
The study of the usage patterns of these concurrency primitives in various code bases, along with the occurrence of bugs in those code bases, concluded that even though message passing was used in fewer places, it accounted for a larger share of the bugs (58%).
Implication 1: With heavier usages of goroutines and new types of concurrency primitives, Go programs may potentially introduce more concurrency bugs
Also interesting to note is this observation in the paper:
Observation 5: All blocking bugs caused by message passing are related to Go's new message passing semantics like channel. They can be difficult to detect, especially when message passing operations are used together with other synchronization mechanisms
The authors also talk about various ways in which the Go runtime can detect some of these concurrency bugs. The Go runtime includes a deadlock detector which can detect when there are no goroutines running in a thread, although it cannot detect all the blocking bugs that the authors found by manual inspection.
For shared memory bugs, Go also includes a data race detector which can be enabled by adding the -race option when building the program. It can find races in memory/data shared between multiple threads of execution and uses a happens-before algorithm underneath to track objects and their lifecycle. Although it can only detect a subset of the bugs discovered by the authors, the patterns and classification in the paper can be leveraged to improve the detection and build more sophisticated checkers.
TLDR; Trying to learn new things, I tried writing a URL shortener called shorty. This is a first draft, and I am trying to approach it from a first-principles basis, breaking everything down to the simplest components.
I decided to write my own URL shortener, and the reason for doing that was to dive a little deeper into golang and to learn more about systems. I plan not only to document my learning but also to point out different ways in which this application can be made scalable, resilient and robust.
The high-level idea is to write a server which takes a long url and returns a short url for it. I have one more requirement: I want to be able to provide a slug, i.e. a custom short URL path. So for a link like https://play.google.com/store/apps/details?id=me.farhaan.bubblefeed, I want to have a URL like url.farhaan.me/linktray, which is easy to remember and distribute.
The way I am thinking of implementing this is with two components: a CLI interface which talks to my Server. I don't want a fancy UI for now because I want it to be used exclusively through the terminal. It is a client-server architecture, where my CLI client sends a request to the server with a URL and an optional slug. If a slug is present, the short URL will contain that slug; if it isn't, the server generates a random string and makes the URL short. Seen from a higher level, it's not just a URL shortener but also a URL tagger.
The way a simple URL shortener works:
A client makes a request to shorten a given URL. The server takes the URL and stores it in the database, then generates a random string, maps the URL to that string, and returns a URL like url.farhaan.me/<randomstring>.
Now when a client requests url.farhaan.me/<randomstring>, the request goes to the same server, which looks up the original URL and redirects the request to that website.
The slug implementation is very straightforward: given a word, I have to search the database, and if it is already present we raise an error, but if it isn't we add it to the database and return the URL.
One optimization, since it’s just me who is going to use this, I can optimize my database to see if the long URL already exists and if it does then no need to create a new entry. But this should only happen in case of random string and not in case of slugs. Also this is a trade off between reducing the redundancy and latency of a request.
But when it comes to generating a random string, things get a tiny bit complicated. The way these random strings are generated decides how many URLs you can store. There are various hashing and encoding schemes I can use to generate a string, such as md5, base10 or base64. I also need to make sure that the result is unique and not repeated.
Unique hash can be maintained using a counter, the count either can be supplied from a different service which can help us to scale the system better or it can be internally generated, I have used database record number for the same.
If you look at this on a system design front. We are using the same Server to take the request and generate the URL and to redirect the request. This can be separated into two services where one service is required to generate the URL and the other just to redirect the URL. This way we increase the availability of the system. If one of the service goes down the other will still function.
The next step is to write and integrate a CLI tool to talk to the server and fetch the URL: a client that can be used by an end user. I am also planning to integrate a caching mechanism, but not something off the shelf; rather, I want to write a simple caching system with some cache-eviction policy and use it.
Till then I will be waiting for the feedback. Happy Hacking.
I now have a Patreon open so that you folks can support me to do this stuff for a longer time and sustain myself too. So feel free to subscribe and help me keep doing this, with added benefits.
TLDR; Link Tray is a utility we recently wrote to curate links from different places and share them with your friends. The blog post has technical details and probably some productivity tips.
Link Bubble got my total attention when I got to know about it. I felt it was a very novel idea: it helps to save time and helps you curate the websites you visited. On the whole, and believe me I am downplaying it when I say this, Link Bubble does two things:
It’s a better tab management system, what I felt weird was building a whole browser to do that. Obviously, I am being extremely naive when I am saying it because I don’t know what it takes to build a utility like that.
Now, since they discontinued it for a while, I never got a chance to use it. So I thought, let me try building something very similar, but my use case was totally different. Generally when I go through blogs or articles, I open the links mentioned in them in different tabs to come back to later. This has bitten me plenty of times because I just get lost in so many links.
I thought that if there were a utility which could capture the links on the fly and then let me quickly go through them by looking at their titles, it might ease my job. I bounced the idea off Abhishek and we ended up prototyping LinkTray.
Our first design was highly inspired by Facebook Messenger, but instead of chat heads we have opened links. If you think about it the idea feels very beautiful, but the design is "highly" not scalable. For example, if you have as many as 10 links opened, we had trouble finding the links of interest, which was one of the interesting design problems we faced.
We quickly went to the whiteboard and put up a list of requirements, first principles; The ask was simple:
We took inspiration from an actual Drawer where we flick out a bunch of links and go through them. In a serendipitous moment the design came to us and that’s how link tray looks like the way it looks now.
Link Tray was a technical challenge as well. There is a plethora of things I learnt about the Android ecosystem and application development that I knew existed but never ventured into exploring it.
Link Tray is written in Java, and I was using a very loosely maintained library to get the overlay activity to work. Yes, the floating activity or application that we see is called an overlay activity; it allows the application to be opened over an already running application.
The library that I was using doesn't have support for Android O and above. It took me a few nights to figure that out, also because I was hacking on the project during nights. After reading a lot of GitHub issues I figured out the problem and put in support for the required operating system versions.
One of the really exciting features that I explored about Android is Services
. I think I might have read most of the blogs out there and all the documentation available and I know that I still don't know enough
. I was able to pick up enough pointers to make my utility work.
Just like Uncle Bob says: make it work and then make it better. There was a persistent problem: the service needs to keep running in the background for the app to work. This was not a functional issue, but it was certainly a performance issue, and our users of version 1.0 did have a problem with it. People got misled because there was a constant notification that LinkTray is running, and it was annoying. This looked like a simple problem on the surface but was a monster in the depths.
The solution to the problem was simple: stop the service when the tray is closed, and start the service when a link is shared back to Link Tray. I tried it; the service did stop, but when a new link was shared the application kept crashing. Later I figured out that the bound service started by the library I am using sets a bound flag to True, but when they try to reset this flag they do it in the wrong place. This prompted me to write this StackOverflow answer to help people understand the lifecycle of services. Finally, after a lot of logs and debugging sessions, I found the issue and fixed it. It was one of the most exciting moments and it helped me learn a lot of key concepts.
The other key learning I got while developing Link Tray was about multi-threading. When a link is shared to Link Tray, we need the title of the page (if it has one) and the favicon of the website. Initially I was doing this on the main UI thread, which is not only an anti-pattern but also a usability hazard: it was a network call which blocked the application until it completed. I learnt how to make a network call on a different thread and keep the application smooth.
The initial approach was to get a webview to work: we were literally opening the links in a browser and pulling the title and favicon out, which was a very heavy process. We were literally spawning a browser to get information about links; in the initial design it made sense because we were giving an option to consume the links. Over time our design improved and we came to a point where we don't give the option to consume but to curate. Hence we opted for web scraping; I used custom headers so that we don't get caught by robots.txt. And after so much effort it got to a place where it is stable and performing great.
It did take quite some time to reach the point where it is right now; it is fully functional and stable. Do give it a go if you haven't, and you can shoot any queries to me.
Link to Link Tray: https://play.google.com/store/apps/details?id=me.farhaan.bubblefeed
Happy Hacking!
I'm on vacation at the North Sea with my family, and like exactly one year ago I was facing the problem of having too many postcards to write. Last year, I had written a small Python script that would take a yaml
file and compile it to an HTML postcard.
The yaml file describes all adjustable parts of the postcard, like the content and address, but also a title, stamp and front image. A jinja2 template, a bit of CSS and JavaScript create a flippable postcard that can be sent via email - which is very convenient if you, like me, are too lazy to buy postcards and stamps, and have more email addresses in your address book than physical addresses.
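The compile step itself can be tiny. Below is a minimal sketch of the idea - not the actual repository code; the template file name and variable layout are assumptions - using PyYAML and Jinja2:

# Minimal sketch: turn a YAML description of postcards into a single HTML file.
# Assumes a Jinja2 template "postcard.html.j2" whose variables match the YAML keys.
import sys
import yaml
from jinja2 import Environment, FileSystemLoader

def compile_postcards(yaml_path, output_path="postcard.html"):
    with open(yaml_path) as f:
        cards = yaml.safe_load(f)          # the YAML file holds a list of cards
    env = Environment(loader=FileSystemLoader("."))
    template = env.get_template("postcard.html.j2")
    with open(output_path, "w") as out:
        out.write(template.render(cards=cards))

if __name__ == "__main__":
    compile_postcards(sys.argv[1])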
A postcard yaml
could look like this (click the card to flip it around):
---
- name: Holiday Status 2020
  front_image: 'private_images/ninja.jpg'
  address: |
    Random Reader
    Schubisu's Blog
    World Wide Web
  title: I'm fine, thanks :)
  content: |
    Hey there!
    I'm currently on vacation and was stumbling over the same problem I had last year;
    writing greeting cards for friends and family. Luckily I've solved that issue last year,
    I simply had totally forgotten about it. This is an electronic postcard, made of HTML,
    CSS and a tiny bit of javascript, compiling my private photos and messages to a nice
    looking card. Feel free to fork, use and add whatever you like!
    Greets, Schubisu
  stamp: 'private_images/leuchtturm_2020.jpg'
and will be rendered by the script to this:
I was curious anyway how this would be rendered on my blog. I've added a small adjustment to my CSS to scale the iframe tag by a factor of 0.75, and I'm okay with the result ;)
Write your own postcard or add some features! You can find the repository here: https://gitlab.com/schubisu/postcard.
What color should I paint the bike-shed?
[Published in Open Source For You (OSFY) magazine, October 2017 edition.]
This article is the eighth in the DevOps series. In this issue, we shall learn to set up Docker in the host system and use it with Ansible.
Docker provides operating system level virtualisation in the form of containers. These containers allow you to run standalone applications in an isolated environment. The three important features of Docker containers are isolation, portability and repeatability. All along we have used Parabola GNU/Linux-libre as the host system, and executed Ansible scripts on target Virtual Machines (VM) such as CentOS and Ubuntu.
Docker containers are extremely lightweight and fast to launch. You can also specify the amount of resources that you need, such as CPU, memory and network. The Docker technology was launched in 2013, and released under the Apache 2.0 license. It is implemented using the Go programming language. A number of frameworks have been built on top of Docker for managing these clusters of servers. The Apache Mesos project, Google's Kubernetes, and the Docker Swarm project are popular examples. These are ideal for running stateless applications and help you to easily scale them horizontally.
The Ansible version used on the host system (Parabola GNU/Linux-libre x86_64) is 2.3.0.0. Internet access should be available on the host system. The ansible/ folder contains the following file:
ansible/playbooks/configuration/docker.yml
The following playbook is used to install Docker on the host system:
---
- name: Setup Docker
  hosts: localhost
  gather_facts: true
  become: true
  tags: [setup]

  tasks:
    - name: Update the software package repository
      pacman:
        update_cache: yes

    - name: Install dependencies
      package:
        name: "{{ item }}"
        state: latest
      with_items:
        - python2-docker
        - docker

    - service:
        name: docker
        state: started

    - name: Run the hello-world container
      docker_container:
        name: hello-world
        image: library/hello-world
The Parabola package repository is updated before proceeding to install the dependencies. The python2-docker package is required for use with Ansible. Hence, it is installed along with the docker package. The Docker daemon service is then started and the library/hello-world container is fetched and executed. A sample invocation and execution of the above playbook is shown below:
$ ansible-playbook playbooks/configuration/docker.yml -K --tags=setup
SUDO password:
PLAY [Setup Docker] *************************************************************
TASK [Gathering Facts] **********************************************************
ok: [localhost]
TASK [Update the software package repository] ***********************************
changed: [localhost]
TASK [Install dependencies] *****************************************************
ok: [localhost] => (item=python2-docker)
ok: [localhost] => (item=docker)
TASK [service] ******************************************************************
ok: [localhost]
TASK [Run the hello-world container] ********************************************
changed: [localhost]
PLAY RECAP **********************************************************************
localhost : ok=5 changed=2 unreachable=0 failed=0
With the verbose '-v' option to ansible-playbook, you will see an entry for LogPath, such as /var/lib/docker/containers//-json.log. In this log file you will see the output of the execution of the hello-world container. This output is the same as when you run the container manually, as shown below:
$ sudo docker run hello-world
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://cloud.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/engine/userguide/
A Deep Learning (DL) Docker project is available (https://github.com/floydhub/dl-docker) with support for a number of frameworks, libraries and software tools. We can use Ansible to build the entire DL container from the source code of the tools. The base OS of the container is Ubuntu 14.04, and the image includes the deep learning frameworks and tools supported by the project.
The playbook to build the DL Docker image is given below:
- name: Build the dl-docker image
  hosts: localhost
  gather_facts: true
  become: true
  tags: [deep-learning]

  vars:
    DL_BUILD_DIR: "/tmp/dl-docker"
    DL_DOCKER_NAME: "floydhub/dl-docker"

  tasks:
    - name: Download dl-docker
      git:
        repo: https://github.com/saiprashanths/dl-docker.git
        dest: "{{ DL_BUILD_DIR }}"

    - name: Build image with buildargs
      docker_image:
        path: "{{ DL_BUILD_DIR }}"
        name: "{{ DL_DOCKER_NAME }}"
        dockerfile: Dockerfile.cpu
        buildargs:
          tag: "{{ DL_DOCKER_NAME }}:cpu"
We first clone the Deep Learning docker project sources. The docker_image module in Ansible helps us to build, load and pull images. We then use the Dockerfile.cpu file to build a Docker image targeting the CPU. If you have a GPU in your system, you can use the Dockerfile.gpu file. The above playbook can be invoked using the following command:
$ ansible-playbook playbooks/configuration/docker.yml -K --tags=deep-learning
Depending on the CPU and RAM you have, it will take a considerable amount of time to build the image with all the software, so be patient!
The built dl-docker image contains the Jupyter Notebook, which can be launched when you start the container. An Ansible playbook for the same is provided below:
- name: Start Jupyter notebook
  hosts: localhost
  gather_facts: true
  become: true
  tags: [notebook]

  vars:
    DL_DOCKER_NAME: "floydhub/dl-docker"

  tasks:
    - name: Run container for Jupyter notebook
      docker_container:
        name: "dl-docker-notebook"
        image: "{{ DL_DOCKER_NAME }}:cpu"
        state: started
        command: sh run_jupyter.sh
You can invoke the playbook using the following command:
$ ansible-playbook playbooks/configuration/docker.yml -K --tags=notebook
The Dockerfile already exposes the port 8888, and hence you do not need to specify the same in the above docker_container configuration. After you run the playbook, using the ‘docker ps’ command on the host system, you can obtain the container ID as indicated below:
$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a876ad5af751 floydhub/dl-docker:cpu "sh run_jupyter.sh" 11 minutes ago Up 4 minutes 6006/tcp, 8888/tcp dl-docker-notebook
You can now login to the running container using the following command:
$ sudo docker exec -it a876 /bin/bash
You can then run an ‘ifconfig’ command to find the local IP address (“172.17.0.2” in this case), and then open http://172.17.0.2:8888 in a browser on your host system to see the Jupyter Notebook. A screenshot is shown in Figure 1:
TensorBoard consists of a suite of visualization tools to understand TensorFlow programs. It is installed and available inside the Docker container. After you login to the Docker container, at the root prompt, you can start TensorBoard by passing it a log directory as shown below:
# tensorboard --logdir=./log
You can then open http://172.17.0.2:6006/ in a browser on your host system to see the TensorBoard dashboard, as shown in Figure 2:
The docker_image_facts Ansible module provides useful information about a Docker image. We can use it to obtain the image facts for our dl-docker container as shown below:
- name: Get Docker image facts
  hosts: localhost
  gather_facts: true
  become: true
  tags: [facts]

  vars:
    DL_DOCKER_NAME: "floydhub/dl-docker"

  tasks:
    - name: Get image facts
      docker_image_facts:
        name: "{{ DL_DOCKER_NAME }}:cpu"
The above playbook can be invoked as follows:
$ ANSIBLE_STDOUT_CALLBACK=json ansible-playbook playbooks/configuration/docker.yml -K --tags=facts
The ANSIBLE_STDOUT_CALLBACK environment variable is set to ‘json’ to produce a JSON output for readability. Some important image facts from the invocation of the above playbook are shown below:
"Architecture": "amd64",
"Author": "Sai Soundararaj <saip@outlook.com>",
"Config": {
"Cmd": [
"/bin/bash"
],
"Env": [
"PATH=/root/torch/install/bin:/root/caffe/build/tools:/root/caffe/python:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"CAFFE_ROOT=/root/caffe",
"PYCAFFE_ROOT=/root/caffe/python",
"PYTHONPATH=/root/caffe/python:",
"LUA_PATH=/root/.luarocks/share/lua/5.1/?.lua;/root/.luarocks/share/lua/5.1/?/init.lua;/root/torch/install/share/lua/5.1/?.lua;/root/torch/install/share/lua/5.1/?/init.lua;./?.lua;/root/torch/install/share/luajit-2.1.0-beta1/?.lua;/usr/local/share/lua/5.1/?.lua;/usr/local/share/lua/5.1/?/init.lua",
"LUA_CPATH=/root/torch/install/lib/?.so;/root/.luarocks/lib/lua/5.1/?.so;/root/torch/install/lib/lua/5.1/?.so;./?.so;/usr/local/lib/lua/5.1/?.so;/usr/local/lib/lua/5.1/loadall.so",
"LD_LIBRARY_PATH=/root/torch/install/lib:",
"DYLD_LIBRARY_PATH=/root/torch/install/lib:"
],
"ExposedPorts": {
"6006/tcp": {},
"8888/tcp": {}
},
"Created": "2016-06-13T18:13:17.247218209Z",
"DockerVersion": "1.11.1",
"Os": "linux",
"task": { "name": "Get image facts" }
You are encouraged to read the ‘Getting Started with Docker’ user guide available at http://docs.ansible.com/ansible/latest/guide_docker.html to know more about using Docker with Ansible.
This is an attempt to summarize the broader software architecture around dependency resolution in pip and how testing is being done around this area.
The motivation behind writing this, is to make sure all the developers working on this project are on the same page, and to have a written record about the state of affairs.
The “legacy” resolver in pip is implemented as part of pip’s codebase and has been a part of it for many years. It’s very tightly coupled with the existing code, isn’t easy to work with, and there are severe backward-compatibility concerns with modifying it directly – which is why we’re implementing a separate “new” resolver in this project, instead of trying to improve the existing one.
The “new” resolver that is under development is not implemented as part of pip’s codebase; not completely, anyway. We’re using an abstraction that separates all the metadata-generation-and-handling stuff from the core algorithm. This allows us to work on the core algorithm logic (i.e. the NP-hard search problem) separately from the pip-specific logic (e.g. downloading, building etc). The abstraction and core algorithm are written/maintained in https://github.com/sarugaku/resolvelib right now. The pip-specific logic for implementing the “other side” of the abstraction is in https://github.com/pypa/pip/tree/master/src/pip/_internal/resolution/resolvelib.
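To make the shape of that abstraction a little more concrete, here is a rough, purely illustrative sketch of the kind of "provider" object the pip side supplies to the core algorithm. The class and method names below are invented for this post and are not resolvelib's actual API; see the two repositories linked above for the real interface:

# Illustrative only: the split between "core algorithm questions" and
# "pip-specific answers". Names here are hypothetical, not resolvelib's API.
class PipSideProvider:
    def find_candidates(self, requirement):
        """Return concrete candidates (pinned versions) that could satisfy
        the requirement, e.g. by querying an index and applying specifiers."""
        raise NotImplementedError

    def is_satisfied_by(self, requirement, candidate):
        """Check whether a chosen candidate really meets the requirement."""
        raise NotImplementedError

    def get_dependencies(self, candidate):
        """Return the candidate's own requirements; for pip this may mean
        downloading and building a distribution to read its metadata."""
        raise NotImplementedError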
In terms of testing, we have dependency-resolution-related tests in both resolvelib and pip.
The tests in resolvelib are intended more as “check if the algorithm does things correctly” and even contains tests that are agnostic to the Python ecosystem (eg. we’ve borrowed tests from Ruby, Swift etc). The goal here is to make sure that the core algorithm we implement is capable of generating correct answers (for example: not getting stuck in looping on the same “requirement”, not revisiting rejected nodes etc).
The tests in pip are where I’ll start needing more words to explain what’s happening. :)
We have “YAML” tests which I’d written back in 2017, as a format to easily write tests for pip’s new resolver when we implement it. However, since we didn’t have a need for it to be working completely back then (there wasn’t a new resolver to test with it!), the “harness” for running these tests isn’t complete and would likely need some work to be as feature complete as we’d want it to be, for writing good tests.
YAML tests: https://github.com/pypa/pip/tree/master/tests/yaml
YAML test “harness”: https://github.com/pypa/pip/blob/master/tests/functional/test_yaml.py and https://github.com/pypa/pip/blob/master/tests/lib/yaml_helpers.py
We have some unit tests for the new resolver implementation. These cover very basic “sanity checks” to ensure it follows the “contract” of the abstraction, like “do the candidates returned by a requirement actually satisfy that requirement?”. These likely don’t need to be touched, since they’re fairly well scoped and test fairly low-level details (i.e. ideal for unit tests).
New resolver unit tests: https://github.com/pypa/pip/tree/master/tests/unit/resolution_resolvelib
We also have “new resolver functional tests”, which are being written as part of the current work. These exist because how to work with the YAML tests was not an easy question to answer, and work still needs to be done (both on the YAML format and the YAML test harness) to flag which tests should run with which resolver (both, only legacy, only new) and to make it possible to run these tests in CI easily.
New resolver functional tests: https://github.com/pypa/pip/blob/master/tests/functional/test_new_resolver.py
There might also be some dependency-resolution-related tests in the test_install*.py files. These files test all the functionality of the install command (like: does it use the right build dependencies, does it download the correct files, does it write the correct metadata etc). They contain a lot of tests, so, ideally, at some point, someone would go through and de-duplicate the dependency-resolution-related tests from them as well.
If you use pip, there are multiple ways that you can help us!
First and most fundamentally, please help us understand how you use pip by talking with our user experience researchers. You can do this right now! You can take a survey, or have a researcher interview you over a video call. Please sign up and spread the word to anyone who uses pip (even a little bit).
Right now, even before we release the new resolver as a beta, you can help by running pip check
on your current environment. This will report if you have any inconsistencies in your set of installed packages. Having a clean installation will make it much less likely that you will hit issues when the new resolver is released (and may address hidden problems in your current environment!). If you run pip check
and run into stuff you can’t figure out, please ask for help in our issue tracker or chat.
Thanks to Paul Moore and Tzu-Ping for help in reviewing and writing this post, as well as Sumana Harihareswara for suggesting to put this up on my blog!
[NOTE: This post originally appeared on deepsource.io, and has been posted here with due permission.]
In the early part of the last century, when David Hilbert was working on a stricter formalization of geometry than Euclid's, Georg Cantor had worked out a theory of different types of infinities: the theory of sets. This theory would soon unveil a series of confusing paradoxes, leading to a crisis in the mathematics community regarding the stability of the foundational principles of the math of that time.
Central to these paradoxes was Russell’s paradox (or, more generally, as we’ll see later, the Epimenides Paradox). Let’s see what it is.
In those simpler times, you were allowed to define a set if you could describe it in English. And, owing to mathematicians’ predilection for self-reference, sets could contain other sets.
Russell then, came up with this:
\(R\) is a set of all the sets which do not contain themselves.
The question was "Does \(R \) contain itself?" If it doesn’t, then according to the second half of the definition it should. But if it does, then it no longer meets the definition.
The same can symbolically be represented as:
Let \(R = \{ x \mid x \not \in x \} \), then \(R \in R \iff R \not \in R \)
Cue mind exploding.
“Grelling’s paradox” is a startling variant which uses adjectives instead of sets. If adjectives are divided into two classes, autological (self-descriptive) and heterological (non-self-descriptive), then, is ‘heterological’ heterological? Try it!
Or take the so-called Liar Paradox, another such paradox, which shredded whatever concept of ‘computability’ existed at that time - the notion that things could either be true or false.
Epimenides was a Cretan, who made one immortal statement:
“All Cretans are liars.”
If all Cretans are liars, and Epimenides was a Cretan, then he was lying when he said that “All Cretans are liars”. But wait, if he was lying then, how can we ‘prove’ that he wasn’t lying about lying? Ein?
This is what makes it a paradox: a statement so rudely violating the assumed dichotomy of statements into true and false, because if you tentatively think it’s true, it backfires on you and makes you think that it is false. And a similar backfire occurs if you assume that the statement is false. Go ahead, try it!
If you look closely, there is one common culprit in all of these paradoxes, namely ‘self-reference’. Let’s look at it more closely.
If self-reference, or what Douglas Hofstadter - whose prolific work on the subject has inspired this blog post - calls ‘Strange Loopiness’, was the source of all these paradoxes, it made perfect sense to just banish self-reference, or anything which allowed it to occur. Russell and Whitehead, two rebel mathematicians of the time who subscribed to this point of view, set out and undertook the mammoth exercise that is “Principia Mathematica”, which, as we will see in a little while, was utterly demolished by Gödel’s findings.
The main thing which made it difficult to ban self-reference was that it was hard to pinpoint where exactly the self-reference occurred. It may as well be spread out over several steps, as in this ‘expanded’ version of Epimenides:
The next statement is a lie.
The previous statement is true.
Russell and Whitehead, in P.M., then came up with a multi-hierarchy set theory to deal with this. The basic idea was that a set of the lowest ‘type’ could only contain ‘objects’ as members (not sets). A set of the next type could then only contain objects, or sets of lower types. This implicitly banished self-reference.
Since all sets must have a type, a set ‘which contains all sets which are not members of themselves’ is not a set at all, and thus you can say that Russell’s paradox was dealt with.
Similarly, if an attempt is made to apply the expanded Epimenides to this theory, it must fail as well: for the first sentence to make a reference to the second one, it has to be hierarchically above it - in which case the second one can’t loop back to the first.
Thirty-one years after David Hilbert set the task before academia to rigorously demonstrate that the system defined in Principia Mathematica was both consistent (contradiction-free) and complete (i.e. every true statement could be proved within the methods provided by P.M.), Gödel published his famous Incompleteness Theorem. By importing the Epimenides Paradox right into the heart of P.M., he proved that not just the axiomatic system developed by Russell and Whitehead, but no axiomatic system whatsoever, was complete without being inconsistent.
Clearly enough, P.M. lost its charm in the realm of academics.
Even before Gödel’s work, though, P.M. wasn’t particularly loved.
Why?
It isn’t just limited to this blog post; we humans, in general, have an appetite for self-reference - and this quirky theory severely limits our ability to abstract away details, something which we love not only as programmers but as linguists too. So much so that the preceding paragraph, “It isn’t … this blog … we humans …”, would be doubly forbidden: the ‘right’ to mention ‘this blog post’ is reserved for something hierarchically above blog posts (‘metablog-posts’), and secondly, I (presumably a human), belonging to the class ‘we’, can’t mention ‘we’ either.
Since we humans love self-reference so much, let’s discuss some ways in which it can be expressed in written form.
One way of making such a strange loop, and perhaps the ‘simplest’, is using the word ‘this’, as in: “This sentence is false.”
Another amusing trick for creating a self-reference without using the phrase ‘this sentence’ is to quote the sentence inside itself.
Someone may come up with:
The sentence ‘The sentence contains five words’ contains five words.
But, such an attempt must fail, for to quote a finite sentence inside itself would mean that the sentence is smaller than itself. However, infinite sentences can be self-referenced this way.
The sentence
"The sentence
"The sentence
...etc
...etc
is infinitely long"
is infinitely long"
is infinitely long"
There’s a third method as well, which you already saw in the title - the Quine method. The term ‘Quine’ was coined by Douglas Hofstadter in his book “Gödel, Escher, Bach” (which heavily inspires this blog post). When using this method, the self-reference is ‘generated’ by describing a typographical entity isomorphic to the quine sentence itself. This description is carried out in two parts - one is a set of ‘instructions’ about how to ‘build’ the sentence, and the other, the ‘template’, contains information about the construction materials required.
The Quine version of Epimenides would be:
“yields falsehood when preceded by its quotation” yields falsehood when preceded by its quotation
Before going on with ‘quining’, let’s take a moment and realize how awfully powerful our cognitive capacities are, and what goes on in our heads when a cognitive payload full of self-references is delivered - in order to decipher it, we not only need to know the language, but also need to work out the referent of the phrase analogous to ‘this sentence’ in that language. This parsing depends on our complex, yet totally assimilated, ability to handle the language.
The idea of referring to itself is quite mind-blowing, and we keep doing it all the time - which is perhaps why it feels so ‘easy’ for us to do. But we aren’t born that way, we grow that way. This can be better realized by telling someone much younger “This sentence is wrong.” They’d probably be confused - what sentence is wrong? The reason why it’s so simple for self-reference to occur, and hence allow paradoxes, in our language is, well, our language. It allows our brain to do the heavy lifting of working out what the author is trying to get across to us, without being verbose.
Back to Quines.
Now that we are aware of how ‘quines’ can manifest as self-reference, it would be interesting to see how the same technique can be used by a computer program to ‘reproduce’ itself.
To make it further interesting, we shall choose the language most apt for the purpose - brainfuck:
>>>>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>++++++++[->++++++++<]>--....[-]<<[<]<<++++++++[->+++++>++++++++<<]>+++>-->[[-<<<+>.>>]<.[->+<]<[->+<]>>>]<<<<[<]>[.>]
Running the program above produces itself as the output. I agree, it isn’t the most descriptive program in the world, so the Python below is the nearest we can get to describing what’s happening inside those horrible chains of +’s and >’s:
THREE_QUOTES = '"' * 3
def eniuq(template): print(
f'{template}({THREE_QUOTES}{template}{THREE_QUOTES})')
eniuq("""THREE_QUOTES = '"' * 3
def eniuq(template): print(
f'{template}({THREE_QUOTES}{template}{THREE_QUOTES})')
eniuq""")
The first line generates """ on the fly, which marks multiline strings in Python.
The next two lines define the eniuq function, which prints the argument template twice - once plain, and then surrounded with triple quotes.
The last 4 lines cleverly call this function so that the output of the program is the source code itself.
Since we are printing in the order opposite to quining, the name of the function is ‘quine’ reversed -> eniuq (name stolen from Hofstadter again).
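For comparison, a more conventional Python quine - one that does not reverse the printing order the way eniuq does - can be written with the %r formatting trick (the comment line below is, of course, not part of what gets reproduced):

# The two lines below print themselves: %r reproduces the string with its
# quotes and escapes, while %% survives formatting as a literal %.
s = 's = %r\nprint(s %% s)'
print(s % s)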
Remember the discussion about how self-reference capitalizes on the processor? What if ‘quining’ was a built-in feature of the language, providing what we in programmer lingo call ‘syntactic sugar’?
Let’s assume that an asterisk, *, in the brainfuck interpreter would copy the instructions before executing them. What would then be the output of the following program?
*
It’d be an asterisk again. You could make an argument that this is silly, and should be counted as ‘cheating’. But, it’s the same as relying on the processor, like using “this sentence” to refer to this sentence - you rely on your brain to do the inference for you.
What if eniuq was a built-in keyword in Python? A perfect self-rep would then be just a call away:
eniuq('eniuq')
What if quine
was a verb in the English language? We could reduce a lot of
explicit cognitive processes required for inference. The Epimenides paradox
would then be:
“yields falsehood if quined” yields falsehood if quined
Now that we are talking about self-rep, here’s one last piece of entertainment for you.
This formula is defined through an inequality:
\({1 \over 2} < \left\lfloor \mathrm{mod}\left(\left\lfloor {y \over 17} \right\rfloor 2^{-17 \lfloor x \rfloor - \mathrm{mod}(\lfloor y\rfloor, 17)},2\right)\right\rfloor\)
If you take that absurd thing above, and move around in the cartesian plane for the coordinates \(0 \le x \le 106, k \le y \le k + 17\), where \(k\) is a 544 digit integer (just hold on with me here), color every pixel black for True, and white otherwise, you'd get:
This doesn't end here. If \(k\) is now replaced with another integer containing 291 digits, we get yours truly:
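Out of curiosity, here is a minimal sketch (not from the original post) of how that inequality turns into pixels: for each point in the strip, the inequality just reads off one bit of floor(y/17), so the 106 x 17 bitmap can be rendered directly. The constant k is assumed to be supplied by the reader - the 500-odd digit number is not reproduced here - and depending on the plotting convention the image may need to be mirrored:

# Minimal sketch: render the 106 x 17 strip selected by Tupper's inequality.
# `k` (the huge published integer) must be provided; it is not reproduced here.
def tupper_strip(k):
    rows = []
    for dy in range(17):                      # y runs over k .. k + 16
        y = k + dy
        bits = []
        for x in range(106):
            # The inequality is equivalent to testing one bit of y // 17.
            bit = (y // 17) // (2 ** (17 * x + y % 17)) % 2
            bits.append('#' if bit else ' ')
        rows.append(''.join(bits))
    return '\n'.join(rows)

# print(tupper_strip(k))   # k is the published 500+ digit constant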
The Tex User Group 2019 conference was held between August 9-11, 2019 at Sheraton Palo Alto Hotel, in Palo Alto, California.
I wanted to attend TUG 2019 for two main reasons - to present my work on the “XeTeX Book Template”, and also to meet my favourite computer scientist, Prof. Donald Knuth. He does not travel much, so it was one of those rare opportunities for me to meet him in person. His creation, the TeX computer typesetting system - in which you can represent any character mathematically, and also program and transform it - is beautiful, powerful, and the best typesetting software in the world. I have been using TeX extensively for my documentation and presentations over the years.
I reached the hotel venue only in the afternoon of Friday, August 9, 2019, as I was also visiting Mountain View/San Jose on official work. I quickly checked into the hotel and completed my conference registration formalities. When I entered the hall, Rishi T from STM Document Engineering Private Limited, Thiruvananthapuram was presenting a talk on “Neptune - a proofing framework for LaTeX authors”. His talk was followed by an excellent poetic narration by Pavneet Arora, who happened to be a Vim user, but, also mentioned that he was eager to listen to my talk on XeTeX and GNU Emacs.
After a short break, Shreevatsa R shared his experiences on trying to understand the TeX source code, and the lessons learnt in the process. It was a very informative user experience report on the challenges he faced in navigating and learning the TeX code. Petr Sojka, from Masaryk University, Czech Republic, shared his students’ experience in using TeX with a detailed field report. I then proceeded to give my talk on the “XeTeX Book Template”, on creating multi-lingual books using GNU Emacs and XeTeX. It was well received by the audience. The final talk of the day was by Jim Hefferon, who analysed questions from newbies in different LaTeX groups and on StackExchange, and gave a wonderful summary of what newbies want. He is a professor of Mathematics at Saint Michael’s College, and is well-known for his book on Linear Algebra, prepared using LaTeX. It was good to meet him, as he is also a Free Software contributor.
The TUG Annual General Meeting followed with discussions on how to grow the TeX community, the challenges faced, membership fees, financial reports, and plan for the next TeX user group conference.
The second day of the conference began with Petr Sojka and Ondřej Sojka presenting on “The unreasonable effectiveness of pattern generation”. They discussed the Czech hyphenation patterns along with a pattern generation case study. This talk was followed by Arthur Reutenauer presenting on “Hyphenation patterns in TeX Live and beyond”. David Fuchs, a student who worked with Prof. Donald Knuth on the TeX project in 1978, then presented on “What six orders of magnitude of space-time buys you”, where he discussed the design trade-offs in TeX implementation between olden days and present day hardware.
After a short break, Tom Rokicki, who was also a student at Stanford and worked with Donald Knuth on TeX, gave an excellent presentation on searching and copying text in PDF documents generated by TeX for Type-3 bitmap fonts. This session was followed by Martin Ruckert’s talk on “The design of the HINT file format”, which is intended as a replacement of the DVI or PDF file format for on-screen reading of TeX output. He has also authored a book on the subject - “HINT: The File Format: Reflowable Output for TeX”. Doug McKenna had implemented an interactive iOS math book with his own TeX interpreter library. This allows you to dynamically interact with the typeset document in a PDF-free ebook format, and also export the same. We then took a group photo:
I then had to go to Stanford, so missed the post-lunch sessions, but, returned for the banquet dinner in the evening. I was able to meet and talk with Prof. Donald E. Knuth in person. Here is a memorable photo!
He was given a few gifts at the dinner, and he stood up and thanked everyone, saying that he stood on the shoulders of giants like Isaac Newton and Albert Einstein.
I had a chance to meet a number of other people who valued the beauty, precision and usefulness of TeX. Douglas Johnson had come to the conference from Savannah, Georgia and is involved in the publishing industry. Rohit Khare, from Google, who is active in the Representational State Transfer (ReST) community shared his experiences with typesetting. Nathaniel Stemen is a software developer at Overleaf, which is used by a number of university students as an online, collaborative LaTeX editor. Joseph Weening, who was also once a student to Prof. Donald Knuth, and is at present a Research Staff member at the Institute for Defense Analyses Center for Communications Research in La Jolla, California (IDA/CCR-L) shared his experiences in working with the TeX project.
The final day of the event began with Antoine Bossard talking on “A glance at CJK support with XeTeX and LuaTeX”. He is an Associate Professor of the Graduate School of Science, Kanagawa University, Japan. He has been conducting research regarding Japanese characters and their memorisation. This session was followed by a talk by Jaeyoung Choi on “FreeType MF Module 2: Integration of Metafont and TeX-oriented bitmap fonts inside FreeType”. Jennifer Claudio then presented the challenges in improving Hangul to English translation.
After a short break, Rishi T presented “TeXFolio - a framework to typeset XML documents using TeX”. Boris Veytsman then presented the findings of research done at the College of Information and Computer Science, University of Massachusetts, Amherst on “BibTeX-based dataset generation for training citation parsers”. The last talk before lunch was by Didier Verna, on “Quickref: A stress test for Texinfo”. He teaches at École Pour l’Informatique et les Techniques Avancées, and is a maintainer of XEmacs, Gnus and BBDB. He is also an avid Lisper and one of the organizers of the European Lisp Symposium!
After lunch, Uwe Ziegenhagen demonstrated using LaTeX to prepare and automate exams. This was followed by a field report by Yusuke Terada on how they use TeX to develop a large-scale digital exam grading system in Japan. Chris Rowley, from the LaTeX project, then spoke on “Accessibility in the LaTeX kernel - experiments in tagged PDF”. Ross Moore joined remotely for the final session of the day to present on “LaTeX 508 - creating accessible PDFs”. The videos of the last two talks are available online.
A number of TeX books were made available for free for the participants, and I grabbed quite a few, including a LaTeX manual written by Leslie Lamport. Overall, it was a wonderful event, and it was nice to meet so many like-minded Free Software people.
A special thanks to Karl Berry, who put in a lot of effort in organizing the conference, but could not make it due to a car accident.
The TeX User Group Conference in 2020 is scheduled to be held at my alma mater, Rochester Institute of Technology.
Happy Birthday Dear me!
I was still up at this unearthly hour, thinking about life for a while now - fumbled thoughts about where I had come, where I started, and quite expectedly, Omar Bhai, your name popped in.
The stream continued. I started thinking about everything I’ve learned from you and was surprised by the sheer volume of thoughts that followed. I felt nostalgic!
I made a mental note to type this out the next day.
I wanted to do this when we said our final goodbyes and you left for the States, but thank God, I didn’t - I knew that I would miss you, but never could I have guessed that it would be so overwhelming - I would’ve never written it as passionately as I do today.
For those of you who don’t already know him, here’s a picture:
I’m a little emotional right now, so please bear with me.
You have been warned - the words “thank you” and “thanks” appear irritatingly often below. I tried changing, but none other has quite the same essence.
Well, let’s start with this - thank you for kicking me on my behind, albeit civilly, whenever I would speak nuisance (read chauvinism). I can’t thank you enough for that!
I still can’t quite get how you tolerated the bigot I was and managed to be calm and polite. Thank You for teaching me what tolerance is!
Another thing which I learnt from you was what it meant to be privileged. I can no longer see things the way I used to, and this has made a huge difference. Thank You!
I saw you through your bad times and your good. The way you tackled problems, and how easy you made it look. Well, it taught me [drum roll] how to think (before acting and not the other way round). Thank You for that too!
And, thank you for buying me books, and even more so, lending away so many of them! and even more so, educating me about why to read books and how to read them. I love your collection.
You showed all of us, young folks, how powerful effective communication is. Thank You again for that! I know, you never agree on this, but you are one hell of a speaker. I’ve always been a fan of you and your puns.
I wasn’t preparing for the GRE, but I sat in your sessions anyways, just to see you speak. The way you connect with the audience is just brilliant.
For all the advice you gave me on my relationships with people - telling me to back off when I was being toxic and dragging me off when I was on the receiving side - I owe you big time. Thank You!
Also, a hearty thank you for making me taste the best thing ever - yes, fried cheese it is. :D
Thank You for putting your trust and confidence in me!
Thank you for all of this, and much more!
Yours Truly, Rahul
September 11, 2019
It’s been a very long time since I last wrote here.
The reason is nothing big but mainly because:
And, yes, there is no denying the fact that I was procrastinating too, because writing seems to be really hard most of the time.
I did work on many things throughout this time, though, and I’ll try to write them down as short and quick logs below.
This one question always came up: many times, the students managed to destroy their systems by doing random things. As Kushal Das says, rm -rf is always one of the various commands in this regard. I had once run rm -rf to clean some of the left-over dependencies of some mail packages, but that eventually resulted in the machine crashing. And that was not the end of the mess. Meanwhile I made another extremely big mistake: I was trying to back up the crashed system onto an external hard disk using dd, but because I had never used dd before, I again did something wrong, and this time I ended up losing ~500 GB of backed-up data. This is “the biggest mistake” and “the biggest lesson” I have learnt so far. The external disk now carries an ext4 file system for the Linux backups and an ntfs partition for everything else. Thank you so much jasonbraganza for all the help and the extremely useful suggestions during that time.
Thank you very much kushal for the RPi and an another huge thanks for providing me with all the guidance and support that made me reach to even what I am today.
Voila, I have finally finished compiling the logs from the last 20 or so days of work and other stuff (and thus I am also finally finishing my long-pending task of writing this post).
I will definitely try to be more consistent with my writing from now onwards.
That’s all for now. o/
In my last blog, I quoted
I'm an advocate of using SSH authentication and connecting to services like Github, Gitlab, and many others.
On this, I received a bunch of messages over IRC asking why I prefer SSH for Git over HTTPS.
I find the Github documentation quite helpful when it comes to learning the basic operations of Git and Github. So, what does Github have to say about "SSH vs HTTPS"?
Github earlier used to recommend using SSH, but they later changed it to HTTPS. The reason for Github's current recommendation is presumably that HTTPS is simpler to set up and works even on networks where SSH is blocked.
SSH keys provide Github with a way to trust a computer. For every machine that I have, I maintain a separate set of keys. I upload the public keys to Github or whichever Git-forge I'm using. I also maintain a separate set of keys for the websites. So, for example, if I have 2 machines and I use Github and Pagure then I end up maintaining 4 keys. This is like a 1-to-1 connection of the website and the machine.
SSH is secure until you end up losing your private key. If you do end up losing your key, you can still just log in using your username/password and delete the particular key from Github. I agree that the attacker can do nasty things, but that would be limited to the repositories, and you would still have control of your account to quickly mitigate the problem.
On the other hand, if you end up losing your Github username/password to an attacker, you lose everything.
I also once benefitted from using SSH with Github, but IMO, exposing that also exposes a vulnerability so I'll just keep it a secret :)
Also, if you are on a network that has SSH blocked, you can always tunnel it over HTTPS.
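For reference, GitHub documents an SSH endpoint on port 443 for exactly this case. A snippet along these lines in ~/.ssh/config (hostname taken from their docs, so do verify it there) routes the traffic over the HTTPS port:

# ~/.ssh/config - reach GitHub over port 443 when port 22 is blocked
Host github.com
    Hostname ssh.github.com
    Port 443
    User git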
But, above all, do use 2-factor authentication that Github provides. It's an extra layer of security to your account.
If you have other thoughts on the topic, do let me know over twitter @yudocaa, or drop me an email.
Photo by Christian Wiediger on Unsplash
This blog post is more like a bookmark for me; the solution was scavenged from the internet. Recently I have been working on an analytics project where I had to generate pivot/transpose tables from the data. This was the first time I ran into the limitations of a Postgres database. Since it is a pivot, one of my columns gets transposed and its values are used as column names, and this is where things started breaking. Writing to Postgres failed with an error stating that the column names were not unique. After some digging I realized Postgres has a column name limit of 63 bytes; anything longer is truncated, and after truncation multiple keys became identical, causing this issue.
The next step was to look at the data in my column; the values ranged from 20 to 300 characters in length. I checked Redshift and BigQuery, and they have similar limitations too (128 bytes). After looking around for some time I found a solution: download the Postgres source, change NAMEDATALEN to 301 in src/include/pg_config_manual.h (remember, the maximum column name length is always NAMEDATALEN - 1), then follow the steps from the Postgres docs to compile, install and run Postgres. This has been tested on Postgres 9.6 as of now, and it works.
Next up, I faced issues with the maximum number of columns: my pivot table had 1968 columns, and Postgres has a limit of 1600 columns per table. Following an answer I found, I looked into the source comments, and that looked quite overwhelming. Also, I do not have control over how many columns there will be post-pivot, so no matter what value I set, in the future I might need more columns. Instead, I handled the scenario in my application code by splitting the data across multiple tables and storing them there, as sketched below.
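As a sketch of that last workaround - not the project's actual code; the key column, base table name, connection string and chunk size are assumptions - a very wide pivoted DataFrame can be split column-wise into several tables that share a key:

# Minimal sketch: split a very wide pivoted DataFrame across several tables,
# keeping a shared key column so the pieces can be joined back later.
import pandas as pd
from sqlalchemy import create_engine

MAX_COLS = 1500                      # stay safely under Postgres' 1600-column limit

def write_wide_frame(df, key_col, engine, base_name="pivot_part"):
    value_cols = [c for c in df.columns if c != key_col]
    for i in range(0, len(value_cols), MAX_COLS):
        chunk_cols = [key_col] + value_cols[i:i + MAX_COLS]
        table_name = f"{base_name}_{i // MAX_COLS}"
        df[chunk_cols].to_sql(table_name, engine, if_exists="replace", index=False)

engine = create_engine("postgresql://user:password@localhost:5432/analytics")
# write_wide_frame(pivoted_df, "entity_id", engine)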
References:
Last updated:
January 28, 2021 08:01 AM
All times are UTC.